Cloud Computing Does Not Spell the End for Common Sense I.T. Management

Sometimes I think I might be the only one who retains common sense. Really. At least in the area of I.T. Management. Though we had our share of growing pains at b5media, the knowledge gained from working in an enterprise environment at Northrop Grumman was only accentuated by my tenure as the Director of Technology at b5media.

Unfortunately, some common best-use practices in developing infrastructure are often put aside by those with shiny object syndrome surrounding “cloud computing”.

Let me explain.

You may have noticed a severe hampering of many internet services over the weekend. The culprit was a rare but heavy-duty outage of Amazon S3 (Simple Storage Service) cloud storage. S3 is used by many companies including Twitter, WordPress.com, FriendFeed, and SmugMug, to name a few. Even more individuals are using S3 for online data backup or for small projects requiring always-on virtual disk space. Startups often use S3 due to the “always on” storage, de facto CDN and the inexpensive nature of the service… it really is cheap!

And that’s good. I’m a fan of using the cheapest, most reliable service for anything. Whatever gets you to the next level quickest and with as little output of dollars is good in my book, for the same reason I’m a fan of prototyping ideas in Ruby on Rails (but to be clear, after the prototype, build on something more reliable and capable of handling multi-threaded processes, kthxbai.)

However, sound I.T. management practice says that there should never be a single point of failure. Ever. Take a step back and map out the infrastructure. If you see anyplace where there’s only one of those connecting lines between major resource A and major resource B – start looking there for bottlenecks and potential company-sinking aggravation.
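That mapping exercise can be roughed out in a few lines of code. This is only a minimal sketch: the resource names, the `LINKS` list and the `single_points_of_failure` helper are all made up for illustration, and a real infrastructure map would be far messier.

```python
from collections import defaultdict


def single_points_of_failure(links):
    """Return every (source, target) pair served by exactly one provider.

    `links` is a list of (source, target, provider) tuples, where each
    provider is one "connecting line" between two major resources.
    """
    providers = defaultdict(set)
    for source, target, provider in links:
        providers[(source, target)].add(provider)
    # A pair with a single provider has no redundancy at all.
    return {
        pair: next(iter(names))
        for pair, names in providers.items()
        if len(names) == 1
    }


# Hypothetical infrastructure map: the database has two providers,
# image storage has only one.
LINKS = [
    ("web-app", "database", "db-primary"),
    ("web-app", "database", "db-replica"),
    ("web-app", "image-storage", "s3"),
]

print(single_points_of_failure(LINKS))
# {('web-app', 'image-storage'): 's3'}
```

Anything this returns is a place to start looking.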

Such was the case for many companies using S3. Depending on how they used S3, and whether they had failover to other caches, some companies were affected more than others. Twitter, for instance, uses S3 for avatar storage but had no other “cold cache” for that data, rendering a service without user images – bad, but not deathly.
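For the sake of illustration, failover at its simplest looks something like the sketch below. Every name in it (`StoreUnavailable`, `fetch_with_failover`, `s3_down`, `local_copy`) is hypothetical and stands in for real storage clients; it is not how any of the companies mentioned actually do it.

```python
class StoreUnavailable(Exception):
    """Raised by a store that cannot serve requests (hypothetical)."""


def fetch_with_failover(key, stores):
    """Try each (name, fetch) pair in order.

    Returns (value, name) from the first store that answers, and raises
    StoreUnavailable only if every store fails.
    """
    errors = []
    for name, fetch in stores:
        try:
            return fetch(key), name
        except StoreUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise StoreUnavailable("; ".join(errors))


def s3_down(key):
    # Stand-in for a primary store in the middle of an outage.
    raise StoreUnavailable("outage")


def local_copy(key):
    # Stand-in for a secondary copy kept somewhere else.
    return {"avatar:42": b"png-bytes"}[key]


print(fetch_with_failover("avatar:42", [("s3", s3_down), ("local", local_copy)]))
# (b'png-bytes', 'local')
```

With only one store in the list, the same call simply raises, which is roughly what a service without user images looks like from the inside.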

SmugMug shrugged the whole thing off (which is a far cry from the disastrous admission that “hot cache” was used very little when Amazon went down back in February), which I thought was a bit odd. Their entire company revolves around hosted photos on Amazon S3 and they simply shrugged off an 8-hour outage as “ok because everyone goes down once in awhile”. Yeah, and occasionally people get mugged in dark city streets, but as long as it’s not me it’s okay! Maybe it was the fact that the outage occurred on a Sunday. Who knows? To me, this sort of outage rates as a 9.5/10 on the critical scale. Their entire business is wrapped up in S3 storage with no failover. For perspective, one 8-hour outage in July constitutes 98.9% uptime for the month – a far cry from five 9’s (99.999%), which is the minimum level of risk mitigation for enterprise, mission-critical services.
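The arithmetic behind that 98.9% figure is easy to check. A quick sketch, assuming a 31-day July and measuring uptime over that single month (the function names are mine, purely for illustration):

```python
HOURS_IN_JULY = 31 * 24  # 744 hours


def uptime_percent(outage_hours, period_hours=HOURS_IN_JULY):
    """Percentage of the period during which the service was up."""
    return 100.0 * (1 - outage_hours / period_hours)


def allowed_downtime_seconds(target_percent, period_hours=HOURS_IN_JULY):
    """Seconds of downtime a given uptime target permits in the period."""
    return period_hours * 3600 * (1 - target_percent / 100.0)


print(round(uptime_percent(8), 1))                 # 98.9
print(round(allowed_downtime_seconds(99.999), 1))  # 26.8
```

In other words, five 9’s leaves room for under half a minute of downtime in a month like July, against the 8 hours that were actually lost.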

WordPress.com, as always, comes through as a shining example of a company that economically benefits from the use of S3 as a cold cache and not primary access or “warm cache”.

Let me stop and provide some definition. Warm (or hot) cache is always preferable to cold cache. It is data that has been loaded into memory or a more reasonably accessible location – but typically memory. Cold cache is a file based storage of cached data. It is less frequently accessed because access only occurs if warm cache data has expired or doesn’t exist.
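Here is a rough sketch of that lookup order, assuming an in-memory dict as the warm cache and any dict-like object standing in for the file-based cold cache. The `TieredCache` class, its TTL and the key names are illustrative only and are not taken from any particular service.

```python
import time


class TieredCache:
    """Warm (in-memory) cache in front of a cold (file-style) cache."""

    def __init__(self, cold_store, ttl_seconds=300):
        self._warm = {}          # key -> (value, expires_at)
        self._cold = cold_store  # dict-like stand-in for file-based storage
        self._ttl = ttl_seconds

    def get(self, key, now=None):
        """Return (value, tier), where tier is 'warm', 'cold' or 'miss'."""
        now = time.time() if now is None else now
        entry = self._warm.get(key)
        if entry is not None and entry[1] > now:
            return entry[0], "warm"
        # Warm data has expired or doesn't exist: fall back to cold cache.
        value = self._cold.get(key)
        if value is None:
            return None, "miss"
        # Re-warm the entry so the next read stays in memory.
        self._warm[key] = (value, now + self._ttl)
        return value, "cold"


cache = TieredCache({"avatar:42": b"png-bytes"}, ttl_seconds=60)
print(cache.get("avatar:42", now=0))   # (b'png-bytes', 'cold')
print(cache.get("avatar:42", now=30))  # (b'png-bytes', 'warm')
print(cache.get("avatar:42", now=61))  # (b'png-bytes', 'cold')
```

The `now` parameter exists only to make the expiry behaviour easy to demonstrate; in normal use it would be left out.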

WordPress.com has multiple levels of caching because they are smart and understand the basic premise of eliminating single points of failure. Image data is primarily accessed over their server cluster via a CDN; however, S3 is used as a cold cache. With the collapse of S3 over the weekend, WordPress.com, from my checking, remained unaffected.

This is the basic principle of I.T. enterprise computing that is lost on so much of the “web world”. If companies have built and scaled (particularly if they have scaled!) and rely on S3 with no failover, shame on them. Does it give Amazon a black eye? Absolutely. However, at the end of the day SmugMug, WordPress.com, FriendFeed, Twitter and all the other companies utilizing S3 answer to their customers and do not have the luxury of pointing the finger at Amazon. If their business is negatively affected, they have no one to blame but themselves. The companies who understood this planned accordingly and were not negatively affected by the S3 outage. Those who didn’t were left, well, holding the bag.

Added: GNIP gets it, and they are new to the game. Even startups have no excuse.

Identi.ca and the Art of the Launch

Ask any startup. The most difficult decision leading up to a public release is when and what. Some might argue that getting funding is the most difficult, but a good startup avoids funding until later, if at all. Others might argue that the difficult part is getting the right mix of people and hitting milestones. That also is important, but not as important as the when and what.

Usually, a good launch product is the result of a perceived need. Or maybe a need not yet realized – it’s hard to say for sure. There’s some black magic involved in all that.

FriendFeed launched not long ago because there was an empty hole in Twitter – that was aggregation and conversation. FriendFeed figured out that, to be successful, it was going to target that emptiness in the highly popular Twitter experience.

Disqus and Intense Debate figured that, in order to be successful, they needed to target the missing piece in blog comments – that was reputation and reputation management across blogs. The two fight it out, post-launch, over which is going to differentiate itself from the other.

In these cases, the timing of the launches was critical to the uptake. Twitter started experiencing significant problems and influential early adopters began getting itchy to be somewhere that scratched their itch.

Putting aside timing, the most important part of a launch is what. It’s feature-sets. It’s determining the balance between a fully developed roadmap of features and what is needed to “hook” early adopters and get them to stay.

Take Identi.ca, the new Twitter clone that is completely open source and is timely in that Twitter faithful are really, really close to throwing in the towel and simply abandoning it altogether. The timing could not be more perfect. Folks have been talking for some time about distributing Twitter and relieving the strain of a single centralized service. Open sourcing the product does this, to a degree.

However, Identi.ca gets a big “FAIL” for its launch for a few very important reasons.

  1. There is no coherent way to deal with “replies”. Folks used to Twitter realize that when there is a river of content, and that’s what Twitter is, there must be a way to manage conversations. There must be a way to keep up with followers who are talking to you. In my working with Identi.ca, there is no way to do that and, while that might be coming, it wasn’t there at launch. Very conceivably, I’ve been lost forever and I generally have tons of followers as an early adopter. FAIL.
  2. XMPP doesn’t work. The one reliable way to reply that folks on Identi.ca were talking about last night was with XMPP, the protocol used for various IM clients including Google Talk. I could deal with replies that way if it worked but at some point, XMPP stopped working. I could receive, but I could not send. A one way conversation is a monologue. FAIL.
  3. OpenID integration must be seamless. I was pleased to see OpenID supported when I signed up. Unfortunately, today, I could not log in with my OpenID account. If I can’t get in, I can’t use it. FAIL.

Some would say I’m being too hard on this startup. Screw that. Perform or get off the stage. There are very obvious and defined features that must be included in a microcontent site at launch. I’m not saying an entire roadmap needs to be worked out. No, get a working beta up and get testers in there. However, without replies, without reliable “offline” access (i.e. IM, SMS or client integration) I’m not going to stick around. Finally, direct messages would be a nice feature.

While I have high hopes for Identi.ca, I will remain there only to squat on the name “technosailor”. Bye, guys.

The Mind of Dave Winer

Dave Winer has a bad reputation. He’s got a reputation for challenging anyone who disagrees with him. He’s got a reputation for blocking people by default on Twitter.

Yeah. It’s the rule, not the exception.

See, blocking on Twitter is an acceptable action. I’ve blocked people that are so troll-like, I can’t deal with them. These are people who have indicated in the context of their tweets that Christianity is responsible for pedophilia, nearly all murder and bloodshed in the world, etc. While I won’t argue that Christianity has historically included bloodshed and murder in the name of Jesus and that there are sad cases of unacceptable sexual actions in the name of Jesus, that does not qualify for an ongoing, destructive attack on a religion that has done much good, has a significant number of followers, etc. Blocked.

I also have blocked people who belligerently disparage people unprovoked. But very few, and only after a long period of time where my tolerance level has been diminished.

Blocking is an acceptable action in some cases. Most people looking to filter noise simply don’t follow people in return and if it turns out that a person is creating too much noise, unfollowing is the socially acceptable thing to do. Blocking is an ultimate action that is usually only taken when there are no other options. See, Twitter is all about opt-in. I opt to see your updates and vice versa. It’s a “pull” technology, not a “push” technology. I cannot control who hears my messages, but with a block I can control who doesn’t.

Dave has opted to take the ultimate action on gads of people, and while that is within his right to do (the action is not necessarily in question), the perception is a different story. The perception is that he is silencing those who disagree with him. Like Stalin did. Like Mao Zedong did. Like Fidel Castro did. Like the government of Myanmar is currently doing.

Dave’s inability to tolerate those in opposition to him flies in the face of his political fantasies of inclusion for everyone. Here’s a tweet where, in broad strokes, he paints the Republican party as racists. Another one where he quantifies the use of “average white person” as meaning “racist” – more broad strokes from a guy who demonstrates his own inability to get along with people.

Here’s what Dave needs to understand. While he is, without prejudice, responsible for many of the technologies we use today – RSS and blogs – he is past his time, out of touch with reality, and quite possibly a lunatic. His inability to behave in socially acceptable ways pushes him to the fringe not only of the social and new media space, but of civilized society as a whole. His knack for building technologies that someone else has created and calling them his own innovations (whether explicitly or implicitly) is getting tiresome. See Dave’s Twitter uptime monitor of May 23, 2008 vs. Pingdom’s report from Dec 19, 2007. Also, Dave’s decentralizing Twitter “idea” from May 4, 2008 is something I talked about on Twitter quite a bit, months before he came up with his groundbreaking idea.

So Dave, instead of building silly apps that do nothing particularly fancy and using Comcast bandwidth, why don’t you go re-inherit your seat at the table and write a whitepaper/spec for decentralized Twitter? Think of it as a protocol, much like email, and go from there. It should include SMS gateways and APIs for handing messages around. And for a business plan, make the open APIs accessible via a pay model. You might be on to something then and it will allow you to be productive as opposed to squashing dissent and blocking people for no apparent reason.