Recently we had a requirement to move a large number of files from a Rackspace Cloud Files container to an Amazon S3 bucket, to consolidate with the other Amazon Web Services (AWS) we were using on a project. We needed to move the bulk of the files in the first instance, but as the related functionality wasn’t due to go live immediately, we would also need to periodically synchronise the bucket right up until launch.
After a quick Google I came across the multi-cloud-mirror project on Google Code: a Python script that, according to its author Joe Masters Emison, “provides an easy, multi-processing way of synchronizing a bucket at Amazon S3 to a container at Rackspace Cloud Files, or vice versa”.
I found the script worked really well. Initially I planned to set it up on an EC2 machine and trigger it with a cron job wrapped by Lockrun, as the author suggests. However, after a couple of runs it became clear I could complete the initial import in a matter of hours, so I ran it locally on OS X instead, with a plan to sync on an ad-hoc basis as necessary. The only issue I found was that, because of the names given to files once they’ve been copied to the machine running multi-cloud-mirror, their MIME types weren’t being inferred correctly. I corrected that, and should the fix be of use to anyone it’s available in our forked version of the project on the Box UK GitHub account.
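To illustrate the class of problem (this is a hedged sketch of the general issue, not the fork's actual patch): extension-based MIME detection, such as Python's standard `mimetypes` module performs, breaks down when an intermediate copy of a file carries a name without a meaningful extension. The `content_type_for` helper below is a hypothetical name introduced for this example.

```python
import mimetypes

def content_type_for(filename, default="application/octet-stream"):
    """Guess a MIME type from the filename's extension, falling back to a
    generic binary type when the extension is missing or unrecognised."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default

# A name with a recognisable extension is inferred correctly...
print(content_type_for("photo.jpg"))    # image/jpeg
# ...but an extension-less intermediate name falls through to the default,
# which is the kind of mis-inference described above.
print(content_type_for("tmp_transfer_0001"))  # application/octet-stream
```

The practical fix is to derive the content type from the original object's name (or its stored metadata) rather than from the temporary local filename, and to set it explicitly when uploading to the destination bucket.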