The Pros and Cons of Cloud Computing

Several years ago, a new buzzword entered the fray of internet speak: The Cloud. In the past, I’ve written critically about cloud computing, and the reservations I originally outlined still stand. However, there is real value in the cloud as well.

Known for applications delivered as “Software as a Service”, or SaaS, The Cloud represents the idea that data processing and storage live not on physical hardware maintained by an organization or its designated data center, but out on the internet. The data and processing are handled by distributed computing across grids and data centers. Hundreds of thousands of servers spread across server farms can manage the processing needs of a service, application, or company far better than a single server or small cluster of servers. Economically, cloud computing is also far cheaper than traditional computing paradigms.

In an enterprise sense, cloud delivery is often handled by Content Delivery Networks (CDNs) such as Limelight, Panther or Akamai. These enterprise solutions run from as little as $0.25 per GB of bandwidth/transfer to $1.00 per GB. Other lower-cost solutions include Amazon’s EC2 or Google AppEngine (which now supports Java in addition to its longstanding Python commitment).

Companies looking to expand their offerings with web apps, downloadable materials, or services that rely heavily on rich media (images, video [especially HD video], or streaming audio) might consider the Cloud a viable scaling solution. Based on my experience, there are both benefits and drawbacks to the Cloud.

Benefits of the Cloud

Cloud computing is cheap. Dirt cheap. If you’re scaling up an application – one with active growth projections and a real model, not simply a prototype – Amazon EC2 will give you computing power for as little as 10 cents an hour, and you are billed only while your instances are running. At roughly 10 cents per gigabyte of bandwidth, it’s also extremely cheap to begin a large-scale rollout.
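For perspective, here is a back-of-the-envelope sketch using the rates quoted above (purely illustrative; these are the circa-2009 figures, not a billing calculator):

    # Rough EC2 cost estimate using the illustrative rates quoted above.
    INSTANCE_RATE = 0.10   # dollars per instance-hour
    BANDWIDTH_RATE = 0.10  # dollars per GB transferred

    def monthly_cost(instances, hours_per_day, gb_transferred):
        """Approximate monthly cost, assuming a 30-day month."""
        compute = instances * hours_per_day * 30 * INSTANCE_RATE
        bandwidth = gb_transferred * BANDWIDTH_RATE
        return compute + bandwidth

    # One instance around the clock plus 200 GB of transfer:
    # 1 * 24 * 30 * $0.10 + 200 * $0.10 = $72 + $20 = $92/month
    print(monthly_cost(instances=1, hours_per_day=24, gb_transferred=200))

Under a hundred dollars a month for an always-on server plus real bandwidth is the kind of math that makes the Cloud attractive.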

A second benefit of using the cloud is your ability to fire your IT staff. Within reason, of course. The absence of physical hardware and its attendant security requirements lets your company devote more resources to developing your technology product, as opposed to positioning watchmen on the wall, so to speak.

Third, the cloud is effectively infinitely scalable. There is no need to worry about clustering, nodes, or GeoIP content serving (serving a user in Germany from a UK-based data center rather than from Southeast Asia, for example). Simply put, the Cloud lets you buy as much capacity and bandwidth as you’re willing to pay for.

Drawbacks of the Cloud

Of course, since there is no such thing as a free lunch, there are also drawbacks to leveraging the cloud. In a siloed environment of physical hardware and CAT-5 cabling across server closets, you can scale by diversifying vendors, hardware, and data centers. From a business perspective, this means you can play one vendor against another and possibly incite a bidding war, and you can use all or none of the vendors that show interest. In the cloud, it is difficult, if not impossible, to pair two different CDN providers with Amazon EC2, for instance. Tim Bray wrote a piece about this last year that raises significant questions.

Second, especially with cheaper solutions like Amazon S3 (different from EC2 in that it is simply storage in the cloud), latency is a real risk. Simply put, the network is often not fast enough to rely on for serving rich media at scale. Many services, including WordPress.com, opt to use Amazon S3 as a “cold cache” – the last place content is served from, and then only if needed. Depending on the application, network latency makes hitting S3 on every page load prohibitive.
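A minimal sketch of that “cold cache” pattern, with stand-in dictionaries for each tier (the tier names and contents are hypothetical, not any particular service’s API; real code would call a CDN and an S3 client):

    # Sketch of the "cold cache" idea: S3 is the *last* tier consulted,
    # and only on a miss in the faster tiers. The tiers here are stand-in
    # dictionaries, invented for illustration.
    cdn_cache = {}                                       # edge cache: fastest
    cluster_cache = {"logo.png": b"png bytes"}           # our own servers
    s3_store = {"logo.png": b"png bytes",
                "old-post.jpg": b"jpg bytes"}            # cold cache: slow, complete

    def serve_media(path):
        for tier in (cdn_cache, cluster_cache, s3_store):
            if path in tier:
                return tier[path]
        return None  # genuinely missing everywhere

    print(serve_media("old-post.jpg") is not None)  # served, but from the slow tier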

Finally, reliance on the Cloud can cause significant problems when downtime and outages are outside your control. Over the past year, Amazon has had significant downtime incidents (8 hours in one case!), and when time is money, that hurts.

In the past, we have advocated a hybrid approach to cloud computing. It is perfectly okay and reasonable (even expected!) for companies to leverage the cloud. Economically, it frees them to pour resources into building the business, and in a down economy, the economics of the Cloud over physical hardware are a no-brainer. However, we continue to advocate a failover plan that will help an agile company dodge the effects of downtime. A hybrid environment is also attractive, allowing companies to directly manage and control critical operational systems while still benefiting from the Cloud’s enormous scale.

Cloud Computing Does Not Spell the End for Common Sense I.T. Management

Sometimes I think I might be the only one who retains common sense. Really. At least in the area of I.T. management. Though we had our share of growing pains at b5media, the knowledge I gained working in an enterprise environment at Northrop Grumman was only reinforced by my tenure as the Director of Technology at b5media.

Unfortunately, common best practices in building infrastructure are often set aside by those with shiny object syndrome surrounding “cloud computing”.

Let me explain.

You may have noticed a severe hampering of many internet services over the weekend. The culprit was a rare but heavy-duty outage of Amazon S3 (Simple Storage Service) cloud storage. S3 is used by many companies, including Twitter, WordPress.com, FriendFeed, and SmugMug, to name a few. Even more individuals use S3 for online data backup or for small projects requiring always-on virtual disk space. Startups often pick S3 for its always-on storage, its de facto CDN role, and the inexpensive nature of the service… it really is cheap!

And that’s good. I’m a fan of using the cheapest, most reliable service for anything. Whatever gets you to the next level quickest with the least outlay of dollars is good in my book, for the same reason I’m a fan of prototyping ideas in Ruby on Rails (but to be clear, after the prototype, build on something more reliable and capable of handling multi-threaded processes, kthxbai).

However, sound I.T. management practice says there should never be a single point of failure. Ever. Take a step back and map out the infrastructure. If you see any place where there’s only one connecting line between major resource A and major resource B, start looking there for bottlenecks and potential company-sinking aggravation.
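To make “map out the infrastructure” concrete, here is a toy sketch that models resources as a graph and flags any link whose removal cuts the app off from a resource (the topology is invented purely for illustration):

    # Toy single-point-of-failure check: model the infrastructure as a
    # graph and flag any link whose removal disconnects it. The topology
    # below is invented for illustration.
    from collections import defaultdict

    links = [("app", "s3"), ("app", "db"),
             ("app", "db-replica"), ("db", "db-replica")]

    def reachable(edges, start):
        graph = defaultdict(set)
        for a, b in edges:
            graph[a].add(b)
            graph[b].add(a)
        seen, stack = {start}, [start]
        while stack:
            for neighbor in graph[stack.pop()]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen

    nodes = {n for edge in links for n in edge}
    for link in links:
        remaining = [e for e in links if e != link]
        if reachable(remaining, "app") != nodes:
            print("single point of failure:", link)

In the invented topology above, the database has a redundant path but the lone app-to-S3 link does not – exactly the kind of line to stare at.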

Thus was the case for many companies using S3. Depending on how they used S3 and whether they had failover to other caches, some companies were affected more than others. Twitter, for instance, uses S3 for avatar storage but had no other “cold cache” for that data, leaving the service without user images – bad, but not deadly.

SmugMug shrugged the whole thing off, which I thought was a bit odd (and a far cry from the disastrous admission back in February, when Amazon went down, that their “hot cache” covered very little). Their entire company revolves around photos hosted on Amazon S3, yet they waved off an 8 hour outage as okay because everyone goes down once in awhile. Yeah, and occasionally people get mugged on dark city streets, but as long as it’s not me, it’s okay! Maybe it was the fact that the outage occurred on a Sunday. Who knows? To me, this sort of outage rates a 9.5/10 on the critical scale: their entire business is wrapped up in S3 storage with no failover. For perspective, a single 8 hour outage in July works out to 98.9% uptime – a far cry from the five 9’s (99.999%) that are the minimal risk mitigation for enterprise, mission-critical services.
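The math behind those figures is a quick sanity check (assuming a 31-day month):

    # Sanity check on the uptime figures quoted above.
    hours_in_july = 31 * 24            # 744 hours
    outage_hours = 8

    uptime = (hours_in_july - outage_hours) / hours_in_july
    print("uptime: %.3f%%" % (uptime * 100))   # ~98.925%, the "98.9%" above

    # Five nines over the same month allows only a sliver of downtime:
    allowed_minutes = (1 - 0.99999) * hours_in_july * 60
    print("five nines allows %.1f minutes down" % allowed_minutes)  # ~0.4 minutes

Eight hours of downtime versus an allowance of under half a minute: that is the gap between S3’s weekend and mission-critical expectations.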

WordPress.com, as always, comes through as a shining example of a company that benefits economically from using S3 as a cold cache rather than as primary access or “warm cache”.

Let me stop and provide some definitions. Warm (or hot) cache is always preferable to cold cache: it is data loaded into memory or some other readily accessible location – typically memory. Cold cache is file-based storage of cached data, accessed less frequently because it is consulted only when the warm cache data has expired or doesn’t exist.
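In code, the distinction looks roughly like this (a minimal sketch; the local directory is a stand-in for file-based storage like S3, and the names and TTL are invented):

    # Minimal sketch of warm vs. cold cache. The warm tier is an in-memory
    # dict with expiry; the cold tier is file-based storage (a stand-in
    # for something like S3). Paths and TTL are invented for the sketch.
    import os, time

    warm = {}                      # key -> (value, expires_at)
    COLD_DIR = "/tmp/cold-cache"   # hypothetical cold-cache location
    TTL = 300                      # warm entries live five minutes

    def get(key):
        entry = warm.get(key)
        if entry and entry[1] > time.time():
            return entry[0]        # warm hit: served straight from memory
        path = os.path.join(COLD_DIR, key)
        if os.path.exists(path):   # cold hit: slower, file-based read
            with open(path, "rb") as f:
                value = f.read()
            warm[key] = (value, time.time() + TTL)  # re-warm for next time
            return value
        return None                # miss everywhere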

WordPress.com has multiple levels of caching because they are smart and understand the basic premise of eliminating single points of failure. Image data is primarily served from their server cluster via a CDN, with S3 used as a cold cache. When S3 collapsed over the weekend, WordPress.com, from my checking, remained unaffected.

This is the basic principle of enterprise I.T. computing that is lost on so much of the “web world”. If companies have built and scaled (particularly if they have scaled!) while relying on S3 with no failover, shame on them. Does it give Amazon a black eye? Absolutely. However, at the end of the day SmugMug, WordPress.com, FriendFeed, Twitter and all the other companies utilizing S3 answer to their customers and do not have the luxury of pointing the finger at Amazon. If their business is negatively affected, they have no one to blame but themselves. The companies that understood this planned accordingly and were not negatively affected by the S3 outage. Those that didn’t were left, well, holding the bag.

Added: GNIP gets it, and they are new to the game. Even startups have no excuse.

Early Adopters Are Useless

We are early adopters. We use. We try. We evangelize. We bury. We filter.

That’s what we think anyway.

In reality, we are pretty useless.

Late last year, Amazon released the Kindle to the joy and enthusiasm of many early adopters. Robert Scoble, the poster child for early adopters, gleefully got his Kindle on the first day and wrote about how beautiful it was and how much pleasure it brought him. One week later, he hated the Kindle, listing a laundry list of problems from usability to the inability to send gifts to other Kindle owners.

Increasingly, I’m seeing common people (read: not tech early adopters) who own and love the Kindle. And the numbers bear that out, if we’re to believe TechCrunch’s statement that by 2010 Amazon will have sold $750M worth of Kindles – 1-3% of the company’s total revenue. (Update: For clarity, the TechCrunch article cites a CitiGroup analyst and is not TechCrunch’s own assessment. My point is, that’s where I heard the number in the first place – regardless of the original source.)

A few years ago, Brad Feld wrote an amazing article titled The First 25,000 Users are Irrelevant, about the effect of early adopters on companies and products. The oh-so-typical scenario goes like this: TechCrunch or Mashable covers a new product; there is a surge of traffic, registrations, or sign-ups for private beta invites from early adopters, or “tire kickers”; then they go away. Some remain and become “evangelists” for the company or product, but most simply stop caring. Later on, if the company has mainstream staying power, the real buy-in happens organically, without the say-so of the early adopters who largely came and went.

See, we like to tell people we are filters. We like to think we are influencers, and powerful. We like to think we have an inside angle on what works and what doesn’t, but we are small, insignificant people in the grand scheme of things, and largely irrelevant.

Amazon knows this. They don’t really care about us. And that’s why they might hit the $750M mark by 2010 and completely bypass the early adopters, placing their Kindle directly in the hands of mainstream commuters and book lovers.

Update: Corvida at SheGeeks thinks this is generational and makes a thoughtful, intelligent argument for it. However, I’m not convinced that everything is generational; I think early adoption is also a matter of personality.