A variety of tools offer ways to back up to Amazon S3, but none of the ones I tried is as easy and straightforward as the Debian package s3cmd. I tried everything else before trying s3cmd, which was a mistake, so I recommend not wasting your time and starting off right. s3cmd is very easy to install, configure, and use. For automatic backups, just drop your s3cmd command into a cron job and you're set. Many people go with the s3sync.rb Ruby script, but it takes far more effort and more packages to install, when the end result is the same backup that s3cmd gives you.
The first thing I did was create a user called backupadmin and give that user appropriate permissions on the locations I wanted to back up. I recommend doing this, but it is not required: s3cmd can run under any user, and you can configure it for multiple users. When you configure s3cmd, it creates a file named .s3cfg in the user's home directory, where the S3 settings are stored. To install s3cmd, run the following command as superuser:
apt-get install s3cmd
Now that it is installed, you need to configure it:
s3cmd --configure
You will be prompted for the access key and secret key, which you can find or generate in your S3 account portal. You will also be asked whether to use SSL/HTTPS for data transfer. HTTPS is slower, so if your data is not security-sensitive, answer no. If you are backing up sensitive data, answer yes. In my case I had both sensitive and public data, so I set up a second user for the HTTPS backups, ‘sbackupadmin’ (secure backup admin), and ran the configuration again under that user.
After you run s3cmd --configure, a file called .s3cfg is created in your home directory. If you run configure as root, you will find the config file at /root/.s3cfg. You can then use the generated .s3cfg file as a template to copy to other users or servers. .s3cfg is easy to edit when you need to adjust things such as your SSL and GPG settings.
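Copying the generated config to another user can be sketched as a small helper. This is only an illustration: the function name copy_s3cfg is made up for this example, and tightening the permissions to 600 is my own suggestion (the file holds your secret key), not something s3cmd requires.

```shell
#!/bin/sh
# copy_s3cfg SRC_CFG DEST_HOME
# Hypothetical helper: copies an existing .s3cfg into another user's home
# directory and restricts it to owner read/write, since it contains the
# S3 secret key.
copy_s3cfg() {
    src_cfg="$1"      # e.g. /root/.s3cfg
    dest_home="$2"    # e.g. /home/backupadmin
    cp "$src_cfg" "$dest_home/.s3cfg" || return 1
    chmod 600 "$dest_home/.s3cfg"
}
```

Run as root, a call such as copy_s3cfg /root/.s3cfg /home/backupadmin would normally be followed by a chown so that the target user owns the copied file.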
Here are a few examples of how I use s3cmd.
To create a remote bucket and put a single file in the new bucket:
s3cmd mb s3://server1
s3cmd mb s3://server1/mysql
s3cmd -p put /home/backupadmin/server1/mysqldump.tar.gz s3://server1/mysql/
s3cmd ls s3://server1/mysql
(The last command lists the remote bucket so you can confirm the file was copied.)
To sync a folder's contents to a remote bucket (like rsync):
s3cmd -p sync /home/backupadmin/server2/public_html/ s3://server2/public_html
With sync, once the files have been copied the first time, only new files and files modified since the last S3 backup are copied. This works well for almost every backup scenario; even when I am backing up a single file I use sync. The -p option preserves the user/group ownership and permissions, and you will probably always want to use it.
A simple way to back up all of your servers is to drop your s3cmd command into a cron job on each server and let it rip!
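As a rough sketch of what "dropping it into a cron" can look like, here is a small wrapper around the sync command from above. The script name s3-nightly.sh, the log file, the schedule, and the S3CMD override are all assumptions made for this example; only the s3cmd -p sync call comes from the examples above.

```shell
#!/bin/sh
# Hypothetical wrapper, saved for example as /usr/local/bin/s3-nightly.sh.
# Example crontab line (nightly at 01:30), with source and destination
# passed as arguments:
#   30 1 * * * sh /usr/local/bin/s3-nightly.sh /home/backupadmin/server2/public_html/ s3://server2/public_html >> /home/backupadmin/s3-nightly.log 2>&1

# S3CMD can be overridden (e.g. S3CMD=echo) to preview the command
# without uploading anything.
S3CMD="${S3CMD:-s3cmd}"

nightly_sync() {
    src="$1"     # local directory to back up
    dest="$2"    # remote bucket/prefix, e.g. s3://server2/public_html
    echo "sync started: $(date)"
    "$S3CMD" -p sync "$src" "$dest"
    status=$?
    echo "sync finished with exit status $status: $(date)"
    return "$status"
}

# Only run when both arguments are supplied, as in the crontab line above.
if [ "$#" -eq 2 ]; then
    nightly_sync "$1" "$2"
fi
```

Logging the start and end times makes it easier to see later how long each backup takes, which matters once the backups get large.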
Warning:
You probably do not want to use both HTTPS and encryption together, or your backups will run very slowly.
Check your .s3cfg file and edit it accordingly.
Advanced backup section:
s3cmd does not handle large directories well! If your command covers more than 200 folders, you will want to break it up and sync one folder at a time using a shell script. In my case I am working with a public_html folder that has 1800 folders and is growing. Here is what I did. It not only solves the resource and speed issue but also gives control over backup strength (1X, 2X, 3X, etc.), meaning how many backup scripts run in parallel.
1. ls /home/example/public_html/ > folder-list.txt
2. Create an OpenOffice Calc spreadsheet and paste the contents of folder-list.txt into column B.
3. Paste the beginning of the command (s3cmd -p sync /home/example/public_html/) into column A, then fill it down to the last row of column B.
4. In column C enter a unique character such as $ and drag it down to the last row of the set (it is replaced with a space later, to separate source from destination in the command).
5. Paste the destination (s3://example/public_html/) into column D and drag it down to the bottom row of the set.
6. Copy all of column B to column E for the destination folder.
7. Select the entire data set, copy it, and paste it into gedit.
8. Search and replace the tab characters between the fields with nothing, then replace each $ with a single space.
Result:
s3cmd -p sync /home/example/public_html/client1 s3://example/public_html/client1
s3cmd -p sync /home/example/public_html/client2 s3://example/public_html/client2
s3cmd -p sync /home/example/public_html/client3 s3://example/public_html/client3
s3cmd -p sync /home/example/public_html/file.php s3://example/public_html/file.php
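The spreadsheet steps above can also be sketched as a short shell loop that prints one sync line per entry. This is an alternative, not the method described above; the function name gen_sync_script is made up for this example, and it assumes the file and folder names contain no spaces or quote characters (as in the sample output).

```shell
#!/bin/sh
# gen_sync_script SRC_DIR DEST_PREFIX
# Hypothetical generator: prints one "s3cmd -p sync" line for every
# non-hidden entry directly inside SRC_DIR (the same entries plain ls shows).
# Assumes names without whitespace or quotes.
gen_sync_script() {
    src_dir="${1%/}"    # strip one trailing slash, if present
    dest="${2%/}"
    for path in "$src_dir"/*; do
        [ -e "$path" ] || continue    # empty directory: glob did not match
        name=$(basename "$path")
        printf 's3cmd -p sync %s/%s %s/%s\n' "$src_dir" "$name" "$dest" "$name"
    done
}
```

For example, gen_sync_script /home/example/public_html s3://example/public_html > s3-pub_html.sh would write the generated lines to a script file, which is easy to regenerate as new folders are added.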
Name the file something like s3-pub_html.sh, copy it to /usr/local/bin on the server, and make it executable:
chmod +x /usr/local/bin/s3-pub_html.sh
At this point you are good to go. Depending on how big your backup is, you may or may not need the next step, which breaks the script into chunks that can be run in parallel for 2X, 3X or higher backups. In the example below I am doing a 2X backup.
To split the script, work out the chunk size first. I have 1827 lines, and 1827 divided by 8 = 228.375, so I split at 225 lines per file and then put the leftovers in the last file:
split -l 225 s3-pub_html.sh
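The chunk-size arithmetic can also be left to the shell. The sketch below rounds up instead of down, which is a different choice from the 225 used above: with 1827 lines and 8 chunks it yields 229 lines per file, so split produces exactly 8 files and there are no leftovers to move by hand. The function name chunk_lines is made up for this example.

```shell
#!/bin/sh
# chunk_lines TOTAL_LINES CHUNKS
# Hypothetical helper: prints the number of lines per chunk, rounded up,
# so that CHUNKS files are enough to hold TOTAL_LINES lines.
chunk_lines() {
    total="$1"
    chunks="$2"
    echo $(( (total + chunks - 1) / chunks ))
}
```

It could be combined with split along the lines of split -l "$(chunk_lines 1827 8)" s3-pub_html.sh.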
List the directory and you will see the split files xaa, xab, xac, etc., each containing 225 lines except the last one, which holds the remainder. Rename those files to xaa.sh, xab.sh, etc. Then create 2 new master scripts and add the split scripts to them:
s3-pub_html-1x.sh
sh /usr/local/bin/xaa.sh | mail -s "s3-pub_html-bkup1" s3report@rezik.net
sh /usr/local/bin/xab.sh | mail -s "s3-pub_html-bkup2" s3report@rezik.net
sh /usr/local/bin/xac.sh | mail -s "s3-pub_html-bkup3" s3report@rezik.net
sh /usr/local/bin/xad.sh | mail -s "s3-pub_html-bkup4" s3report@rezik.net
s3-pub_html-2x.sh
sh /usr/local/bin/xae.sh | mail -s "s3-pub_html-bkup5" s3report@rezik.net
sh /usr/local/bin/xaf.sh | mail -s "s3-pub_html-bkup6" s3report@rezik.net
sh /usr/local/bin/xag.sh | mail -s "s3-pub_html-bkup7" s3report@rezik.net
sh /usr/local/bin/xah.sh | mail -s "s3-pub_html-bkup8" s3report@rezik.net
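The split and rename steps described above can be sketched as one helper. The function name split_script and the output directory argument are assumptions made for this example; it relies on split's default two-letter suffixes (xaa, xab, ...) and expects an output directory that does not already contain other three-character files starting with x.

```shell
#!/bin/sh
# split_script SCRIPT LINES OUT_DIR
# Hypothetical helper: splits SCRIPT into chunks of LINES lines named
# OUT_DIR/xaa, OUT_DIR/xab, ... and then renames each chunk to end in .sh.
split_script() {
    script="$1"
    lines="$2"
    out_dir="$3"
    split -l "$lines" "$script" "$out_dir/x" || return 1
    for chunk in "$out_dir"/x??; do
        [ -e "$chunk" ] || continue    # nothing produced (empty input)
        mv "$chunk" "$chunk.sh"
    done
}
```

With an 1827-line script and 225 lines per chunk this leaves nine files, xaa.sh through xai.sh, the last one holding the 27 leftover lines.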
crontab example for 2X
15 22 * * * sh /usr/local/bin/s3-pub_html-1x.sh
0 0 * * * sh /usr/local/bin/s3-pub_html-2x.sh
The first backup begins at 10:15pm and the second session starts at midnight. This lets our backups finish by 4:00am, where they would take until 8:00am at 1X.
Below are the details for s3cmd. You can get them by typing s3cmd --help on your command line. Enjoy!
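If you would rather start the sessions together than stagger them in cron, a single launcher can run the master scripts side by side. This is a hypothetical alternative to the crontab above, not what I run; the function name run_parallel is made up for this example, and it does not collect the exit status of the individual scripts.

```shell
#!/bin/sh
# run_parallel SCRIPT [SCRIPT...]
# Hypothetical launcher: starts every script in the background,
# then waits until all of them have finished.
run_parallel() {
    for script in "$@"; do
        sh "$script" &
    done
    wait
}
```

A call such as run_parallel /usr/local/bin/s3-pub_html-1x.sh /usr/local/bin/s3-pub_html-2x.sh could then replace the two crontab lines with one.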
Usage: s3cmd [options] COMMAND [parameters]
S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.
Options:
  -h, --help            show this help message and exit
  --configure           Invoke interactive (re)configuration tool.
  -c FILE, --config=FILE
                        Config file name. Defaults to /root/.s3cfg
  --dump-config         Dump current configuration after parsing config files
                        and command line options and exit.
  -e, --encrypt         Encrypt files before uploading to S3.
  --no-encrypt          Don't encrypt files.
  -f, --force           Force overwrite and other dangerous operations.
  -P, --acl-public      Store objects with ACL allowing read for anyone.
  --acl-private         Store objects with default ACL allowing access for you
                        only.
  --delete-removed      Delete remote objects with no corresponding local file
                        [sync]
  --no-delete-removed   Don't delete remote objects.
  -p, --preserve        Preserve filesystem attributes (mode, ownership,
                        timestamps). Default for [sync] command.
  --no-preserve         Don't store FS attributes
  --exclude=GLOB        Filenames and paths matching GLOB will be excluded
                        from sync
  --exclude-from=FILE   Read --exclude GLOBs from FILE
  --rexclude=REGEXP     Filenames and paths matching REGEXP (regular
                        expression) will be excluded from sync
  --rexclude-from=FILE  Read --rexclude REGEXPs from FILE
  --debug-syncmatch, --debug-exclude
                        Output detailed information about remote vs. local
                        filelist matching and --exclude processing and then
                        exit
  --bucket-location=BUCKET_LOCATION
                        Datacentre to create bucket in. Either EU or US
                        (default)
  -m MIME/TYPE, --mime-type=MIME/TYPE
                        Default MIME-type to be set for objects stored.
  -M, --guess-mime-type
                        Guess MIME-type of files by their extension. Falls
                        back to default MIME-Type as specified by --mime-type
                        option
  -H, --human-readable-sizes
                        Print sizes in human readable form.
  -v, --verbose         Enable verbose output.
  -d, --debug           Enable debug output.
  --version             Show s3cmd version (0.9.8.3) and exit.
Commands:
  Make bucket
      s3cmd mb s3://BUCKET
  Remove bucket
      s3cmd rb s3://BUCKET
  List objects or buckets
      s3cmd ls [s3://BUCKET[/PREFIX]]
  List all object in all buckets
      s3cmd la
  Put file into bucket
      s3cmd put FILE [FILE...] s3://BUCKET[/PREFIX]
  Get file from bucket
      s3cmd get s3://BUCKET/OBJECT LOCAL_FILE
  Delete file from bucket
      s3cmd del s3://BUCKET/OBJECT
  Synchronize a directory tree to S3
      s3cmd sync LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR
  Disk usage by buckets
      s3cmd du [s3://BUCKET[/PREFIX]]
  Get various information about Buckets or Objects
      s3cmd info s3://BUCKET[/OBJECT]
See program homepage for more information at
http://s3tools.logix.cz