Video Backup and Archiving With Amazon Glacier

Amazon recently introduced an online data backup and archiving service called Amazon Glacier. For just $0.01 per GB / month, you can upload and securely store as much data as you want, with 99.999999999% durability.

A secure, reliable, pay-as-you-go service with unlimited storage capacity and no up-front fees sounds perfect for production companies and agencies with large libraries of video content.

But is it?

Amazon certainly thinks so. In press interviews and the use case on its site, Amazon highlights that media companies need safe, secure, and easily accessible storage for their core assets and that Glacier fits the bill. For these companies, and others like them, Glacier promises to free them of the tyranny of managing tape libraries.

This post provides a detailed look at how Amazon Glacier works, how much it would cost under different usage scenarios for a video professional, the pros and cons of using the service, and some ideas on how it could form the basis of a very compelling archiving service.

A follow-up post later this week will compare the cost of using Glacier with alternatives like using LTO, backing up onto a library of hard drives, keeping data in near-line storage on your SAN, or burning it onto Blu-ray.

Whew! Let's get started.

Glacier's Cold Storage Explained

To determine whether Glacier makes sense for archiving your video assets, you need a basic understanding of what the service is designed for and how it's priced. In Amazon's words:

Glacier is designed for long-term storage of assets that are not accessed very often, and do not need to be accessed immediately.

In keeping with this, data retrieval requests take 3-5 hours to complete (i.e. before your data is ready for download). As we will see below, pricing is also designed to reinforce that the service is for archiving rather than for storage that is accessed regularly (Amazon's S3 service is optimized for regular access and costs 10x as much).

Storage

Storage is relatively cheap at $10.24 per terabyte per month ($0.01 per GB / Month) regardless of how much data is stored. There is no charge for data transfer into the storage archive. Not much complexity here.

Retrieval

Retrieving data from the storage archive is broken up into two components: a data retrieval request, and a charge for the amount of data transferred.

  • Data retrieval requests are where things get complicated and can get unexpectedly pricey. Customers can retrieve up to 5% of their average monthly storage for free. However, this allowance is pro-rated daily, meaning that on any given day you can retrieve a maximum of about 0.17% of your stored data for free (5% / 30 days). For example, with a 50 TB archive you can retrieve about 83 GB for free each day.

    A retrieval fee is charged when you exceed your daily quota. The fee is calculated based upon the peak hourly usage from the days in which you exceeded your allowance. Calculating this gets complicated, and rather than explaining all the nuances, I'll show some examples of what retrieval costs would look like in the next section. If you are really interested in the details, I suggest you slam back your favorite energy drink and check out this Glacier FAQ.

  • Data transfer out is charged in tiers based on the volume of data transferred in a month. The first GB is free; beyond that, data transfer out costs $0.12 per GB up to 10 TB / month. Above that threshold, the price per GB steadily decreases.

This pricing model is consistent with the stated purpose of the service. Relatively steep retrieval fees and restore times provide a disincentive for people that would otherwise use this as a regular storage product.

I suspect that this pricing works very well for things like financial / medical / legal record keeping, where the amount of data restored at any one time is very small in relation to years' worth of records. However, with video, retrieving the source files for even a modest project (e.g. 100 minutes of P2 footage comes in at 100 GB) would easily exceed the free daily retrieval amount, even with an archive of 50 TB.
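To make the daily allowance concrete, here is a minimal Python sketch of the "5% / 30 days" rule described above. It assumes 1 TB = 1,000 GB and a 30-day month, and the function name is purely illustrative. Rounding the daily rate up to 0.17% gives slightly higher figures than the exact calculation used here.

```python
def free_daily_gb(archive_gb, monthly_free_share=0.05, days_in_month=30):
    """Free retrieval allowance per day in GB (5% of storage, pro-rated daily)."""
    return archive_gb * monthly_free_share / days_in_month


project_gb = 100  # roughly 100 minutes of P2 footage at about 1 GB per minute

for archive_tb in (5, 10, 50, 500):
    allowance = free_daily_gb(archive_tb * 1000)  # assumption: 1 TB = 1,000 GB
    verdict = "exceeds" if project_gb > allowance else "fits within"
    print(f"{archive_tb:>3} TB archive: {allowance:6.1f} GB free per day; "
          f"a {project_gb} GB request {verdict} the allowance")
```

Under these assumptions, only the 500 TB archive has a daily allowance large enough to cover a 100 GB request.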

Glacier Storage and Retrieval Pricing Examples

To better illustrate the costs of using Amazon Glacier, it's helpful to look at some use cases for different archive sizes.

For the purposes of this article, I have selected archive sizes of 5 terabytes, 10 terabytes, and 50 terabytes. This is characteristic of the archives that I think freelancers, small production companies and small agencies would maintain. I have also included a 500 TB example for larger companies. Regardless of archive size, the principles would be the same.

If you want to model your own costs, I would suggest checking out this unofficial Glacier calculator.

The table below shows the monthly cost of maintaining archives of various sizes. The 10 terabyte archive would cost $100 per month for storage (treating 1 TB as 1,000 GB), and you would be allowed to request a maximum of 16.67 GB for free each day of the month. If you were to retrieve your daily maximum, the data transfer component would cost only about $2.00 for that day (16.67 GB at $0.12 per GB), which is relatively insignificant.

Glacier storage pricing
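If you want to check the storage figures yourself, the following rough Python sketch applies the prices quoted in this article ($0.01 per GB per month for storage, $0.12 per GB for transfer out in the first tier). It assumes 1 TB = 1,000 GB, ignores the single free GB of transfer per month, and uses hypothetical function names.

```python
STORAGE_PRICE_PER_GB = 0.01   # $ per GB per month, as quoted in the article
TRANSFER_PRICE_PER_GB = 0.12  # $ per GB out, first tier (up to 10 TB / month)


def monthly_storage_cost(archive_gb):
    """Monthly storage bill in dollars."""
    return archive_gb * STORAGE_PRICE_PER_GB


def daily_free_allowance_gb(archive_gb):
    """5% of stored data per month, pro-rated over 30 days."""
    return archive_gb * 0.05 / 30


for archive_tb in (5, 10, 50, 500):
    archive_gb = archive_tb * 1000  # assumption: 1 TB = 1,000 GB
    allowance = daily_free_allowance_gb(archive_gb)
    # Transfer cost for pulling exactly one day's free allowance.
    transfer = allowance * TRANSFER_PRICE_PER_GB
    print(f"{archive_tb:>3} TB: ${monthly_storage_cost(archive_gb):8,.2f} per month, "
          f"{allowance:7.2f} GB free per day, ${transfer:6.2f} transfer per day")
```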

If you request any more than your free download quota on a given day, you are subject to the retrieval request charges outlined in the table below.

Glacier retrieval pricing

I figure requesting 100 GB would be representative of a situation where you need to retrieve all the footage from an old corporate video shoot (assuming your source was P2 footage at approximately 1 GB per minute). For someone with a 10 terabyte archive, the retrieval request would be $150 and the data transfer would be $12, for a total cost of $162. This isn't the end of the world, and it's certainly cheaper than re-shooting material, but it's questionable whether this is a cost that you could pass on to a client. For the larger 50 TB archive, the total retrieval cost would only be $42.

I figure the two scenarios of restoring 500 GB and 1 TB would be representative of restoring some of the material for an episode of a TV show or a documentary. In the case of a company with a 50 TB library, the 500 GB retrieval job would cost $810 in total and the 1 TB retrieval job would cost $1,770 in total.
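The figures in these examples can be reproduced with a short Python sketch. This is my own simplified model, not Amazon's official formula: it assumes a request is served over a 4-hour window, that the daily free allowance is spread over those same 4 hours, that the peak hourly rate is billed at $0.01 per GB for all 720 hours of the month, and that 1 TB = 1,000 GB. The function name is hypothetical.

```python
HOURS_IN_MONTH = 720          # assumption: 30-day month
FEE_PER_GB = 0.01             # assumption: $ per GB applied to the peak hourly rate
TRANSFER_PRICE_PER_GB = 0.12  # first transfer tier quoted in the article
JOB_HOURS = 4                 # assumption: a request is served over about 4 hours


def all_at_once_cost(archive_gb, request_gb):
    """Return (retrieval_fee, transfer_fee) in dollars for a single request."""
    free_daily_gb = archive_gb * 0.05 / 30
    peak_hourly_gb = request_gb / JOB_HOURS
    free_hourly_gb = free_daily_gb / JOB_HOURS
    billable_hourly_gb = max(0.0, peak_hourly_gb - free_hourly_gb)
    retrieval_fee = billable_hourly_gb * FEE_PER_GB * HOURS_IN_MONTH
    transfer_fee = request_gb * TRANSFER_PRICE_PER_GB
    return retrieval_fee, transfer_fee


for archive_tb, request_gb in ((10, 100), (50, 100), (50, 500), (50, 1000)):
    retrieval, transfer = all_at_once_cost(archive_tb * 1000, request_gb)
    print(f"{request_gb:>5} GB from {archive_tb} TB: retrieval ${retrieval:,.0f} "
          f"+ transfer ${transfer:,.0f} = ${retrieval + transfer:,.0f}")
```

Under these assumptions the totals come out at $162, $42, $810 and $1,770, matching the examples above.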

Finally, the case of restoring a full library is extremely expensive in all scenarios. Since this is archival footage, it's fairly unlikely that anyone would do a full restore even if there were a catastrophic failure at their facility. They would likely still only restore the project files they need to work with locally, because the Glacier archive would still provide a reliable, durable, and redundantly stored archive.

The more likely scenario where you would incur the charge for retrieving your entire archive is if you decided to move it to another provider. In this case, you could spread your transfer over days, weeks, or months in order to cut retrieval costs, as will be explained in more detail below.

Minimizing Glacier Retrieval Costs

The key assumption underlying all of the scenarios above is that the data is requested all at once. Amazon bases the retrieval fee on the peak hourly usage from the days on which you exceeded your allowance. This means it essentially takes the single hour in the month where you requested the most data, subtracts your free retrieval allowance for that period, and then applies the resulting rate to every hour of the month.

Retrieval requests could be reduced significantly by spreading them out over a longer period of time, thus driving down peak hourly usage. For example, instead of requesting 500 GB worth of files from a 10 TB archive all at once, you could request a 21 GB chunk every hour over the course of the day (21 * 24 = 504). Doing so would cut the retrieval charge from $870 down to $145 (assuming these are still your peak downloading hours in the month). Spreading out the download over two days would reduce the charge down to $70. If you could spread out the request evenly over the month, the retrieval fee would fall to $0.

Essentially, the most effective way to retrieve large amounts of data is to spread the request out evenly over long periods of time. So for big retrievals you would definitely want to manage your requests.
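The effect of spreading a request can be sketched in a few lines of Python. Again, this is a simplified model rather than Amazon's published formula: it assumes the request is split evenly across the hours it is spread over, that the daily free allowance is pro-rated over the hours used in a day (capped at 24), and that the peak hourly rate is billed at $0.01 per GB for 720 hours. The function and parameter names are illustrative.

```python
def retrieval_fee(archive_gb, request_gb, spread_hours):
    """Estimated retrieval fee in dollars when a request is spread evenly."""
    free_daily_gb = archive_gb * 0.05 / 30
    peak_hourly_gb = request_gb / spread_hours
    # The free allowance is per day, so at most 24 hours share it.
    free_hourly_gb = free_daily_gb / min(spread_hours, 24)
    billable_hourly_gb = max(0.0, peak_hourly_gb - free_hourly_gb)
    return billable_hourly_gb * 0.01 * 720  # assumed rate and hours per month


archive_gb = 10 * 1000  # 10 TB archive, assuming 1 TB = 1,000 GB
for label, hours in (("all at once", 4), ("one day", 24),
                     ("two days", 48), ("whole month", 720)):
    fee = retrieval_fee(archive_gb, 500, hours)
    print(f"500 GB over {label:<11}: ${fee:,.0f}")
```

With these assumptions the sketch gives $870, $145, $70 and $0, in line with the example above.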

Throttling retrieval requests in this manner isn't as far-fetched as it would seem, because the limiting factor in retrieving data is the speed of the Internet connection. Why request 500 GB all at once if it's going to take you well over a day to download all the data? (Even with a 50 Mbps Internet connection, it would realistically take a couple of days.) The challenge with this approach is that it requires either manual intervention or a software program that intelligently spreads out requests to minimize costs.
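As a rough illustration of what such a program might do, the Python sketch below sizes hourly retrieval chunks so that they never outrun the download link. It is only a sketch under stated assumptions: decimal units (1 GB = 8,000 megabits), a hypothetical `efficiency` factor for real-world throughput, and made-up function names. It does not call any Amazon API.

```python
import math


def gb_per_hour(link_mbps, efficiency=1.0):
    """GB that a link can download in one hour (1 GB = 8,000 megabits)."""
    return link_mbps / 8 * 3600 / 1000 * efficiency


def hourly_chunks(request_gb, link_mbps, efficiency=1.0):
    """Split a request into equal hourly chunks no larger than the link can carry."""
    hours = math.ceil(request_gb / gb_per_hour(link_mbps, efficiency))
    return [request_gb / hours] * hours


# Ideal 50 Mbps link versus the same link at an assumed 50% real-world efficiency.
for efficiency in (1.0, 0.5):
    chunks = hourly_chunks(500, 50, efficiency)
    print(f"efficiency {efficiency:.0%}: {len(chunks)} hourly requests "
          f"of {chunks[0]:.1f} GB each")
```

At full line rate, a 50 Mbps connection carries 22.5 GB per hour, which is close to the 21 GB hourly chunks used in the example above. At half that throughput, the same 500 GB takes about two days.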

Pros and Cons of Using Glacier for Your Media Company

As with any purchase decision, it's important to take stock of your own situation and clarify your needs. This will help you filter out whether Glacier is your best bet for media archiving, or whether you would be better served by investing in LTO5, hard drives, or Blu-ray. The strengths and weaknesses of Glacier for video archiving below should help you make a decision. I'll cover how Glacier compares with these other options in a follow-up post.

Glacier Strengths

  • Cost of storage. This is an extremely low price for access to an unlimited amount of fully redundant storage with no up-front fees. The storage component of the service is cheap... but to keep things cheap, you must make sure that your usage of this service falls within the design parameters set out by Amazon (see weaknesses).

  • Reliable. Glacier provides annual durability of 99.999999999% per archive. To achieve this, Amazon backs up data in several different geographically distinct facilities and the service can withstand concurrent failure of two of them. Amazon also ensures that data is written to multiple facilities before it acknowledges that the backup was successful. Further, it regularly checks the integrity of data, and heals itself based on the different copies of each file.

    It is highly unlikely that your existing archiving system offers this level of durability. This is certainly the case if you are just stashing hard drives with client files in your edit suite, or if you just have one or two copies of each file on an LTO tape. An archive is only as good as your ability to restore it.

  • Secure. A common fear with cloud-based services is the security of that data. Amazon has you covered here. Uploads can be done over SSL and data is automatically encrypted with Advanced Encryption Standard (AES) 256, a secure symmetric-key encryption standard. Prying eyes are not going to see your files.

  • No upfront investment and no capacity planning required. For a small company this is fantastic. No need to invest in a $2500 LTO drive before you can start archiving. For larger companies the benefits are even more apparent. The upfront costs of storage appliances, and automated tape libraries are significant. Time is spent researching, negotiating prices, and maintaining that hardware. Typically, you also purchase far more capacity than you need, because the decision is a one-shot deal. This means that there is expensive excess capacity that can sit unused for long periods of time.

  • Low ongoing operational costs. With Amazon, once your archiving system is set up (and I'm assuming that there will be front-end software to manage it), there is really little else to do other than make sure that the uploads are being processed as intended. With options like LTO, hard drives, and Blu-ray, you need to spend time on activities like creating the archives, duplicating the archives for redundancy, ensuring the archives work (rotating LTO tapes, testing hard drives), and transporting archives to secure locations (to meet Amazon-like reliability, this means one local copy for easy retrieval, and two copies stored at different facilities for redundancy).

    For a smaller shop this is likely done by an employee as part of their regular duties. It's hard to put a cost to this, because it's something that typically gets done in idle time, but it's important to highlight that there is still an opportunity cost to that time. Your employee (or you) could be spending time on another activity with a higher return (e.g. talking to your existing customers). With larger facilities, someone is likely employed to take care of backing up data, or this is a formal part of an employee's role. In this case, the cost of operational complexity and staffing overhead must be included in an analysis of whether Glacier is more or less cost effective than the alternatives.

  • Archives are accessible from anywhere. It's easy to request archives from wherever you are and whenever you want, with no delay beyond the 3-5 hour retrieval window and the download itself.

Glacier Weaknesses

  • Not plug and play (yet). Amazon Web Services are designed for developers, hence the API driven services aren't particularly friendly for end users. This means that the real value of Glacier will need to be unlocked by talented software developers that will make it easier to use.

    At the lower end of the spectrum, companies that build FTP and backup clients, like CloudBerry Lab and Haystack Software, will create front-end tools that make it easy to archive files and manage retrieval costs.

    Higher up the food chain, video specific solutions that make heavy use of meta-data to speed search and retrieval will be needed to unlock the value of Glacier for broadcasters, studios, and large creative agencies with complex media workflows.

  • Upload times. The biggest barrier to cloud video editing is the upload bottleneck. The same applies to archiving video online.

    • Many smaller production companies are using cable Internet or ADSL connections that offer fast download speeds, but have upload speeds in the 1-2 Mbps range. At these speeds, it would take over an hour per GB of footage uploaded, and that 5 TB archive would literally take hundreds of days. Even a company using Verizon's FIOS service with 50 Mbps upload speeds would still need at least 10 days to upload an archive of this size.

    • Medium to large production companies and studios with symmetrical Internet connections that have upload speeds of 100 Mbps and up may still experience long upload times due to network latency and packet loss, combined with the likelihood that they will be uploading much larger archives. For these companies, an accelerated upload tool like Aspera would likely be needed to reduce upload times.

    • Another option for ingesting archives into Glacier is AWS Import / Export. This service allows you to bypass the Internet by sending your storage devices directly to Amazon and having them transfer your data directly to Glacier using their high-speed internal network. Pricing for this is based on a combination of devices processed ($80 each) and the amount of processing time spent transferring data. Here is a link to a pricing calculator.

  • Retrieval requests and download times. For many applications, the 3-5 hour retrieval request time shouldn't be an issue, although I can see there being a wrinkle for some use cases like news productions where pulling an old archive is going to be very deadline driven. Download times could be an issue in environments where large volumes of source material need to be retrieved quickly.

  • Complex pricing for data retrieval. While overall storage pricing is cheap, there is a good deal of complexity involved in understanding and modeling data restoration scenarios. If retrieval requests are going to be large in relation to the size of the overall archive, then requests will have to be carefully managed to reduce costs and align them with the speed of the Internet connection. If there are going to be frequent large retrievals, then this may not be the service for you.
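To put the upload-time weakness into numbers, here is a minimal Python sketch that estimates best-case upload times. It assumes decimal units (1 GB = 8,000 megabits, 1 TB = 1,000 GB) and a link that runs at full speed around the clock, so real uploads will take longer. The function name is illustrative.

```python
def upload_days(size_gb, upload_mbps):
    """Best-case days needed to upload size_gb over a link of upload_mbps."""
    seconds = size_gb * 8000 / upload_mbps  # 1 GB = 8,000 megabits
    return seconds / 86400


archive_gb = 5 * 1000  # the 5 TB archive from the examples above

for mbps in (1, 2, 50, 100):
    hours_per_gb = upload_days(1, mbps) * 24
    print(f"{mbps:>3} Mbps: {hours_per_gb:5.2f} hours per GB, "
          f"{upload_days(archive_gb, mbps):6.1f} days for 5 TB")
```

These are ideal figures. At 2 Mbps the 5 TB archive takes over 230 days, and even at 50 Mbps it takes more than 9 days before any protocol overhead or interruptions are counted.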

Turning Glacier into a Dream Service for Video

Right now Glacier is a complex service to use. With the right front-end management features, I think it could form the basis of a great media archiving service. Here are some of the core elements that would add value to the platform.

  • Rich meta-data management. If all of the footage is indexed and tagged with rich and meaningful meta-data, then it's going to be much easier to unlock the value of the footage. This would make it easy to search and identify exactly what needs to be restored.

  • Accelerated video upload/download. This would help ingest assets and remove file size barriers associated with web-based uploaders and FTP.

  • Automatic proxy creation. By doing this, it would be easy for users to view proxies and preview media before incurring the cost of retrieving the source files from the deep archive.

  • Intelligent management of retrieval requests. Before a retrieval request is initiated, a system could provide options for automatically breaking up retrieval requests in order to reduce costs and align requests with realistic download speeds.

How Do You Back Up and Archive Video Files?

We would love to hear how you back up and archive data and whether you think Amazon Glacier would work for your business.

Look out for our follow-up post that will compare Glacier with LTO and some of the other alternatives.