I just finished writing a very complicated proposal for a video-sharing social network project that was essentially YouTube built with Rails. The online video-sharing arena is already pretty crowded but apparently there are still latecomers who want to get in on the action.
What was most surprising to me — at least, during the course of my research — was exactly how cheap it was to build our very own YouTube clone. Obviously as you approach YouTube’s size (45 terabytes of data and counting), it becomes more and more difficult to keep your operating costs down, but starting up is getting cheaper by the minute. (Which probably explains why everyone is jumping on the bandwagon.)
A big part of the steadily falling technology costs is that you can now get pay-as-you-go grid computing and storage at very affordable rates. There are a whole bunch of services available out there, but for this pitch we decided to go with Amazon EC2 and S3.
Our project requirements called for an application that could handle 5,000 videos of 3 minutes or less per day, or as many as 300 submissions per hour. Those are piddling numbers compared to YouTube’s, but they were big enough that we were understandably worried about how to manage the processing and conversion to FLV (streaming Flash video).
The process we were proposing to build worked like this:
- User uploads a video.
  The average 3-minute MPEG video weighs in at 15MB.
- Video is encoded or queued for encoding.
  The video, assuming it is not corrupted or obviously broken in some way, is encoded into FLV using an open-source encoder such as FFmpeg or MPlayer’s MEncoder. Given the stated 300-videos-per-hour scenario, it’s entirely possible that you will run out of CPU resources to handle the load during peak hours of operation. (My MacBook Pro took about 10 seconds to do one full conversion of a 3-minute WMV during my informal tests. At that rate, 300 videos would take 50 minutes, almost the entire hour.) When this happens, videos are held in a queue until enough resources are available to process them.
- Video is queued for approval.
  (This was a client request and is probably in response to the recent spats between YouTube and Viacom.) After a video has been encoded, it enters another queue, this time for editorial approval. Our estimates indicated that an editor would need an average of 60 seconds per submitted video. This immediately presents a problem, as it suggests that a peak hour of 300 submissions would take a single editor 5 hours to process. If you have several peak hours in a row (which is often the case), you will have approval queues spilling over into the next day. A management solution must thus be in place to allow several editors to work in parallel.
- Approved video is displayed on the website.
  At this point, the video is available for public viewing. Flash video streaming would be accomplished by using the lighttpd web server, which rather conveniently supports FLV streaming out of the box. You could optionally also use flvtool2 to insert metadata into each video. Every approved video is taggable, rateable, commentable, emailable, etc. (It’s a YouTube clone, what can I say.)
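To make the two queueing problems above a bit more concrete, here is a rough back-of-the-envelope sketch of the peak-hour load. It only uses the figures quoted in the steps (300 submissions per hour, about 10 seconds per encode, about 60 seconds per review). The function names are my own, and it assumes every job takes exactly the average time, which real uploads certainly won't.

```python
import math

# Figures quoted in the process description.
PEAK_SUBMISSIONS_PER_HOUR = 300
ENCODE_SECONDS_PER_VIDEO = 10   # informal laptop test, not a server benchmark
REVIEW_SECONDS_PER_VIDEO = 60   # estimated editor time per video


def busy_hours(items: int, seconds_per_item: float) -> float:
    """Hours a single worker needs to get through `items` jobs."""
    return items * seconds_per_item / 3600


def workers_needed(items_per_hour: int, seconds_per_item: float) -> int:
    """Smallest number of parallel workers that keeps the queue from growing."""
    return math.ceil(items_per_hour * seconds_per_item / 3600)


encode_hours = busy_hours(PEAK_SUBMISSIONS_PER_HOUR, ENCODE_SECONDS_PER_VIDEO)
review_hours = busy_hours(PEAK_SUBMISSIONS_PER_HOUR, REVIEW_SECONDS_PER_VIDEO)
editors = workers_needed(PEAK_SUBMISSIONS_PER_HOUR, REVIEW_SECONDS_PER_VIDEO)

print(f"Encoding one peak hour: {encode_hours * 60:.0f} min on one machine")
print(f"Reviewing one peak hour: {review_hours:.0f} h for one editor")
print(f"Editors needed to keep up at peak: {editors}")
```

With these inputs a single encoder is busy for most of the hour, and it takes five editors working in parallel just to stop the approval queue from growing during a peak.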
So how much does this whole thing cost?
Well, first let’s talk about some more numbers.
The big issue here is storage. Each new FLV weighs in at 2-4MB. If we assume that our maximum growth rate is 5,000 FLVs per day, we’re looking at a storage ceiling of about 450GB in one month (150,000 videos at roughly 3MB each). If growth continues at a steady pace, that works out to about 5.4TB, so you would want at least 6TB of storage for your first year. And that doesn’t even include the raw originals, which would be 3-4 times larger. Neither does it include any kind of regular backup.
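Here is the storage arithmetic as a small sketch, in case you want to plug in your own numbers. The 30-day month, the decimal units (1GB = 1,000MB) and the 3MB midpoint are my assumptions; the 2-4MB range and the 5,000-per-day rate come from the estimates above.

```python
VIDEOS_PER_DAY = 5_000
DAYS_PER_MONTH = 30              # assumption: a flat 30-day month
FLV_MB_LOW, FLV_MB_HIGH = 2, 4   # size range of one encoded FLV


def storage_gb(videos: int, mb_per_video: float) -> float:
    """Storage in decimal gigabytes (1 GB = 1,000 MB)."""
    return videos * mb_per_video / 1000


videos_per_month = VIDEOS_PER_DAY * DAYS_PER_MONTH
average_mb = (FLV_MB_LOW + FLV_MB_HIGH) / 2   # midpoint of the range

monthly_gb = storage_gb(videos_per_month, average_mb)
monthly_gb_low = storage_gb(videos_per_month, FLV_MB_LOW)
monthly_gb_high = storage_gb(videos_per_month, FLV_MB_HIGH)
yearly_tb = monthly_gb * 12 / 1000

print(f"New FLV storage per month: {monthly_gb:.0f} GB "
      f"(range {monthly_gb_low:.0f}-{monthly_gb_high:.0f} GB)")
print(f"FLV storage after 12 months: {yearly_tb:.1f} TB")
```

If every FLV came in at the top of the range, the monthly figure would climb to 600GB, so the estimate is quite sensitive to the encoding settings you pick.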
After the storage, you then have to worry about bandwidth. Let’s do some simple math and assume that each video on this theoretical YouTube clone would only be watched 20 times on average for the duration of its existence. At up to 4MB per FLV, that means each video on the website represents a bandwidth cost of about 80MB or so. 150,000 videos per month multiplied by 80MB equals 12TB of monthly bandwidth.
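The bandwidth estimate is just as easy to play with. This sketch assumes the 4MB upper bound per FLV (which is what gives 80MB per video), decimal units, and that each month's traffic comes only from that month's uploads, which is a simplification. The alternative view counts are made up purely to show how quickly the number moves.

```python
VIDEOS_PER_MONTH = 150_000
FLV_MB = 4   # assumption: upper end of the 2-4 MB range


def monthly_bandwidth_tb(videos: int, views_per_video: int, mb_per_video: float) -> float:
    """Outbound transfer in decimal terabytes (1 TB = 1,000,000 MB)."""
    return videos * views_per_video * mb_per_video / 1_000_000


baseline_tb = monthly_bandwidth_tb(VIDEOS_PER_MONTH, 20, FLV_MB)
print(f"20 views per video: {baseline_tb:.0f} TB per month")

# Hypothetical what-if scenarios, not figures from the proposal.
for views in (50, 100):
    tb = monthly_bandwidth_tb(VIDEOS_PER_MONTH, views, FLV_MB)
    print(f"{views} views per video: {tb:.0f} TB per month")
```

Bandwidth scales linearly with views, so a handful of popular videos could blow well past the 20-view average used here.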
When you apply these numbers to a solution powered by EC2/S3, you get some fairly affordable figures:
EC2 grid-computing costs for 1 year = $1,800.00
S3 storage costs for 1 year = $7,800.00
S3 bandwidth costs for 1 year = $18,000.00
Your total technology cost is less than $30,000.00, for a website that would (theoretically) end up being about 1/10th the current size of YouTube.com. When you consider that YouTube was valued at $1.65 billion, $30,000.00 is chump change. Of course, you would actually need someone to develop and maintain this darned thing for you. You’d also need to spend on a full-time editorial team to view each and every one of those pesky videos:
Development cost = $30,000.00
Maintenance cost = $12,000.00
Editorial team = $15,000.00
All together you come up with a grand total of about $85,000.00. That is still only about 0.005% of that $1.65 billion.
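For anyone who wants to check the sums, here is a quick tally of the line items listed above. The dictionary layout and variable names are mine; the dollar amounts are the first-year figures from this post, and the valuation is the $1.65 billion mentioned earlier.

```python
# First-year line items from the proposal, in US dollars.
technology_costs = {
    "EC2 grid computing": 1_800,
    "S3 storage": 7_800,
    "S3 bandwidth": 18_000,
}
people_costs = {
    "Development": 30_000,
    "Maintenance": 12_000,
    "Editorial team": 15_000,
}
YOUTUBE_VALUATION = 1_650_000_000

technology_total = sum(technology_costs.values())
grand_total = technology_total + sum(people_costs.values())
share_percent = grand_total / YOUTUBE_VALUATION * 100

print(f"Technology total: ${technology_total:,}")
print(f"Grand total: ${grand_total:,}")
print(f"Share of the valuation: {share_percent:.4f}%")
```

Note that the people costs come to about twice the technology costs, so hosting is not where most of the first-year money goes.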