A software package a day. Day 4: Amazon's S3 (Simple Storage Service)

As S3 has been mentioned repeatedly in connection with the project I was working on, I thought a more detailed examination of Amazon’s S3 and related services would make sense. It is not really a “software package”, but I think that underlines an important feature of software today: the line between software packages and services is getting blurrier and blurrier. Components such as Google Gears make such lines even harder to define, allowing a website’s data to be accessed and manipulated offline and then synchronized with the site later. Considering that the offerings online include such things as image editors (Splashup), office suites (Zoho) and more, what we think of as desktop applications is mostly legacy. Therefore I feel justified adding S3 and similar products to the mix discussed here.

The basic idea behind Amazon’s S3 is simple enough in our scenario: upload the content, set it readable by the public, and then link to this content from the main site. The principal advantage of the S3 system is that you get to leverage Amazon’s massive distribution capability and avoid consuming bandwidth and connection time on your primary server. It is a great way to deliver images, music, software downloads and video data. Of course there is no free lunch here, but at around $2.15 for 10GB of data transfer (including storage and request fees) it is quite competitive with traditional file distribution solutions while providing more fault tolerance and redundancy than traditional inexpensive file hosting offers.
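
To make that concrete, here is a minimal sketch of the upload-and-publish step using the boto3 Python library. The library choice, bucket name and file paths are my own illustrative assumptions (the post itself doesn’t prescribe tooling), and the bucket must permit public ACLs:

    import boto3

    # Minimal sketch: upload one file to S3 and make it world-readable.
    # "example-media-bucket" and the file paths are placeholders.
    s3 = boto3.client("s3")

    s3.upload_file(
        "site-assets/logo.png",            # local file to push up
        "example-media-bucket",            # existing S3 bucket
        "images/logo.png",                 # key (path) inside the bucket
        ExtraArgs={"ACL": "public-read"},  # anyone may GET this object
    )

    # The main site can now link to the object at a URL such as:
    # https://example-media-bucket.s3.amazonaws.com/images/logo.png

From that point on, every request for the image is served off Amazon’s infrastructure rather than your own connection.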

There have been some noticeable outages with S3 in the past, so I don’t consider it the be-all solution for mission-critical delivery of data or services, but for our purposes a couple of hours of downtime a year really isn’t as life-threatening as some make it out to be. Frankly, I’m a bit shocked at how many people were affected in ways where they couldn’t fail delivery over to another service (their own or yet another hosting solution). Threat analysis should consider not only the hostile outsider, but also unexpected events like hosting failures.

There are some potential problems with S3. First, as pointed out in Day 2: CloudBerry, there is no native user interface for uploading files to S3. Second, a denial of service attack can be more dangerous when using a service like S3: as it stands today there is no “cutoff” point past which your file stops being served, so a distributed denial of service attack could theoretically become very expensive. The trade-off is that your site will probably still run; if your site’s revenue per hour exceeds the risk, S3 distribution makes a lot of sense in mitigating simplistic denial of service attacks. There are mitigation strategies available as well, such as putting rate limiters on requests from individual IP addresses, or periodically changing the name of the at-risk S3 resource (your service knows the new name, but attackers can no longer hammer the S3 resource directly); a sketch of the renaming trick follows.
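
As an illustration of that renaming idea, here is a hedged sketch, again assuming Python and boto3; the function, bucket and key names are hypothetical, and a real deployment would also update whatever generates the links on the main site:

    import uuid
    import boto3

    s3 = boto3.client("s3")

    def rotate_key(bucket: str, old_key: str) -> str:
        """Copy an at-risk object to a fresh, hard-to-guess key and
        delete the old one. Traffic hammering the old URL then gets
        404s, while the site links to the new key."""
        new_key = uuid.uuid4().hex + "-" + old_key.rsplit("/", 1)[-1]
        s3.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": old_key},
            ACL="public-read",   # keep the new copy publicly readable
        )
        s3.delete_object(Bucket=bucket, Key=old_key)
        return new_key           # the site switches its links to this

    # e.g. new = rotate_key("example-media-bucket", "images/logo.png")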

There are more features to S3, though, such as the ability to use it for storage in conjunction with the EC2 (Elastic Compute Cloud) service, moving files from one S3 bucket to another directly (useful when managing a large number of domains that need the same large resource), sharing files between users of S3, and more. Add-on features such as CloudFront (still in beta at this time) allow your content to migrate to a server on the continent where the user is, making for fast access even for international customers.
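
The bucket-to-bucket copy is worth a quick sketch of its own, because the copy runs server-side inside S3 and the data never transits your connection. The bucket names here are made up for illustration:

    import boto3

    s3 = boto3.client("s3")

    # One large asset shared across several per-domain buckets.
    # copy_object happens inside S3; nothing is downloaded locally.
    source = {"Bucket": "master-assets", "Key": "video/intro.mp4"}
    for bucket in ("site-a-media", "site-b-media", "site-c-media"):
        s3.copy_object(Bucket=bucket, Key="video/intro.mp4",
                       CopySource=source)

(For objects over 5GB a single copy_object call won’t work; boto3’s managed s3.copy() handles the multipart copy that larger objects require.)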

There is much to like about Amazon’s S3 and related services. They aren’t for everyone, but if you need high-speed transfer of data and quickly provisioned server resources, they deliver.