Make the Web, Cloud Do Your Work So You Don’t Have To

Photo by Balleyne

While perusing the web yesterday (after sifting through my email post-vacation), I came across this Ars Technica article discussing the new Firefox upgrade timeline. It follows a timeline similar to the one WordPress adopted after WordPress 2.0 was released.

The new policy outlines a 3-4 month window for new major releases, with only limited security updates for releases other than the current stable release.

The Ars article goes on to describe the angst this has stirred up in the corporate community, which has settled into a process of testing every new release of software for compatibility with internal, firewalled web apps that have, in no small part, been built for a specific browser – usually Internet Explorer 6 or 7.

Browser Stagnation Caused IT Stagnation

A few years ago, that stagnation in browsers broke as Firefox and Opera started a race to implement CSS3 features that were not the status quo (a status quo largely set by Internet Explorer) and, in some cases, were not even blessed as part of an official spec. The browser makers just started doing it.

Notably, some of these browser-specific “add-ons” to CSS – vendor-prefixed properties – dealt with things that had long been desired but were only achievable with browser hacks: rounded corners, opacity, etc.

Apple came on the scene, particularly with iOS (then iPhone OS), and put a tremendous amount of development effort into WebKit. WebKit is a browser engine like Gecko, the engine that Firefox – and the old Netscape before it – is built on. Apple’s take on WebKit was Safari. Google followed suit with Chrome a while later, also built on WebKit.

What we end up with is a browser war with higher stakes than the famed Internet Explorer-Netscape war of the 1990s. We also see a lot more innovation and one-upmanship… something that can only be good for consumers.

The Ars article describes a tenuous balance for enterprise customers. That balance is the need to support internal firewalled applications while giving users access to the public web. The money quote from the article sums up the balance nicely:

The Web is a shared medium. It’s used for both private and public sites, and the ability to access these sites is dependent on Web browsers understanding a common set of protocols and file formats (many corporate intranet sites may not in fact be accessible from the Internet itself, but the browsers used to access these sites generally have to live in both worlds).

[...]

If developers could be sure that only Internet Explorer 9, Firefox 5, and Chrome 13 were in use on the Internet, they would be able to make substantial savings in development and testing, and would have a wealth of additional features available to use.

But they can’t assume that, and so they have to avoid desirable features or waste time working around their absence. And a major reason—not the only reason, but a substantial one—is corporate users. Corporate users who can’t update their browsers because of some persnickety internal application they have to use, but who then go and use that same browser on the public Internet. By unleashing these obsolete browsers on the world at large, these corporate users make the Web worse for everyone. Web developers have to target the lowest common denominator, and the corporations are making that lowest common denominator that much lower.

As someone who has worked on the web for more than 10 years and who has also worked in the enterprise, I agree.

I remember when I worked for the Navy and the Navy-Marine Corps Intranet (NMCI) was in deployment. It was a massive headache for everyone involved, because the assumption behind that contract was that systems could be uniformly tied together and standardized. By my understanding, they finally achieved that last year, but only after running years late and hundreds of millions of dollars over budget.

I don’t know the details of the final deployment, as my contract with the Navy ended back in 2004. I do know that proprietary systems were in place that were designed to a function and not to a standard. When standards were introduced as requirements for any system in that ecosystem, the implications were huge.

This is the world we live in today, where, as the Ars article points out, browsers that must remain compatible with internal applications while still accessing the public web drag the rest of us down.

Outsource Your Shit and Focus on Your Core Business

But Ars already makes that point. I’m not making it again except to highlight the validity of their thoughts. My point is aimed more at startups, small businesses, and entrepreneurs, and I make it delicately, as it in some ways runs counter to positions I have taken in the past.

Why should you worry about building applications to a function when you can build them to a standard? Or better yet, why should you build from the ground up to a function when you can use external, cloud-based services built to a standard?

Take the just-announced Microsoft Office 365. Now, I don’t know anything about this product, so don’t take my commentary as an endorsement in any way. We use Google Apps at WP Engine (another good example of exactly what I’m saying here).

With Office 365, you have a common piece of line-of-business software (Microsoft Office) available by subscription and hosted in the cloud. That eliminates IT administrators’ requirement to test on the internal network. It’s on the web! Everyone has the web! And it doesn’t need (and in fact cannot work with) non-standard browsers. You don’t even need Microsoft’s browser to use it.

Suddenly, IT administrators, along with Microsoft, have saved the enterprise tens, if not hundreds, of thousands of dollars in man-hours of testing and re-testing for OS compatibility. Suddenly, they have taken the chains off users, who now have freedom of choice in their browsers (which, by the way, is more than a pie-in-the-sky idealistic thing… it’s also a cost-saving efficiency thing). And suddenly, Microsoft has freed the web to thrive instead of being held back by corporate requirements.

This kind of thing makes perfect sense. Why re-invent the wheel? Why put resources into something you don’t have to? Why not let a third party, like Microsoft or Google, worry about the compatibility issues in line-of-business software?

After all, your company isn’t in the core business of building these applications. You are in the line of business of doing something else… building a product, a social network, a mobile app, a hosting company, etc. Your software should not define the cost of doing business. Your people and your product should.

99.96% Uptime is Bogus Marketing

Reliability Update

Twitter has been making great progress in terms of uptime and reliability. Fail Whale sightings are far less frequent these days thanks to our efforts, but we still have a long journey ahead. Last month we saw 99.88% uptime, and so far this month we are at 99.96%. Our engineering and operations teams have been taking a very methodical approach to improving Twitter. We’re using the word “craftsmanship” to characterize our work here at the office. Reliability and dependability continue to be top on our list of key goals.

The above passage is from an email Biz Stone at Twitter sent today. After a horrid June, things could only go up at Twitter HQ. Fortunately, it looks like they got serious about the uptime issues they had there, and things have been better.

In the meantime, they purchased the super-reliable and speedy Summize and rebranded it as Twitter Search at search.twitter.com.

This could only be a good thing, right?

Well, you’d think. Except the purchase of the super speedy and efficient Summize has only driven the tool into the pond. To be fair, it’s not horrible, but it suffers from the same weaknesses that Twitter does.

That is, it can’t keep up.

As an example, I’ve been following #dnc08 and #rnc08 searches on Summize to watch what people are saying about the political conventions. During the high-traffic windows on convention evenings, Twitter is reliable. That is, it is reliably late – usually 1-2 hours behind the actual tweet stream.

This is completely unacceptable, and it is complete spin, I guess in the spirit of the conventions, for Biz to tout 99.96% uptime.

Let me be clear: when things are slow and not performing up to standard, you cannot claim 99.96% uptime. Well, technically you can. Uptime is conventionally measured by whether the web server returns a successful status code (200 OK) or an error page – the Fail Whale, in Twitter’s case. It says nothing about how long the response took.

But from a common-sense, user-experience standpoint, this is not uptime. And to claim so is disingenuous.
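To make that concrete, here is a minimal sketch of an availability probe that counts slow responses against uptime, not just hard errors. The URL and the two-second threshold are hypothetical, not anything Twitter publishes:

```python
import time
import urllib.request

CHECK_URL = "https://example.com/"  # hypothetical endpoint to probe
SLOW_THRESHOLD = 2.0                # seconds; beyond this, "up" is meaningless to a user

def probe(url: str) -> str:
    """Classify a single check as 'up', 'degraded', or 'down'."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            elapsed = time.monotonic() - start
            if response.status != 200:
                return "down"        # error response: clearly down
            if elapsed > SLOW_THRESHOLD:
                return "degraded"    # a 200 that took too long is not really "up"
            return "up"
    except Exception:
        return "down"                # timeout, connection failure, or HTTP error

if __name__ == "__main__":
    print(probe(CHECK_URL))
```

Run something like this every minute and the resulting availability number reflects what users actually experience. A search stream running 1-2 hours behind would never survive that kind of measurement.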

I appreciate the efforts Twitter has put into improving, but why are we fighting to use the tools during high-demand times? In essence, that makes the tools useless precisely when they matter most.

I look forward to better results, but my skepticism about Twitter remains. They do not have the staying power to make it.

I have been on a three-month hiatus from blogging about Twitter. I have refused to write about it, but there’s another post that has been in the back of my mind for some time: what happens when companies and businesses trying to use Twitter as a marketing and communications tool cannot? What happens when your brand relies on those communication lines and the lines die?

Another day, another post. For today, the spin needs to be exposed.

Cloud Computing Does Not Spell the End for Common Sense I.T. Management

Sometimes I think I might be the only one who retains common sense. Really. At least in the area of I.T. management. Though we had our share of growing pains at b5media, the knowledge I gained working in an enterprise environment at Northrop Grumman was only accentuated by my tenure as Director of Technology at b5media.

Unfortunately, some common best practices in building infrastructure are often put aside by those with shiny-object syndrome surrounding “cloud computing”.

Let me explain.

You may have noticed a severe hampering of many internet services over the weekend. The culprit was a rare but heavy-duty outage of Amazon S3 (Simple Storage Service) cloud storage. S3 is used by many companies, including Twitter, WordPress.com, FriendFeed, and SmugMug, to name a few. Even more individuals use S3 for online data backup or for small projects requiring always-on virtual disk space. Startups often use S3 for the “always on” storage, the de facto CDN, and the inexpensive nature of the service… it really is cheap!
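For a sense of why it’s so popular for backup, here is a minimal sketch using Amazon’s Python SDK, boto3 (the bucket and file names are hypothetical, and credentials are assumed to be configured in the environment):

```python
import boto3

# Assumes AWS credentials are already configured (environment or config file).
s3 = boto3.client("s3")

# Push a local file into an S3 bucket; bucket and key names are made up.
s3.upload_file("backups/site-db.sql.gz", "my-backup-bucket",
               "nightly/site-db.sql.gz")
print("backup uploaded")
```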

And that’s good. I’m a fan of using the cheapest, most reliable service for anything. Whatever gets you to the next level quickest and with the least outlay of dollars is good in my book, for the same reason I’m a fan of prototyping ideas in Ruby on Rails (but to be clear, after the prototype, build on something more reliable and capable of handling multi-threaded processes, kthxbai).

However, sound I.T. management practice says that there should never be a single point of failure. Ever. Take a step back and map out the infrastructure. If you see any place where there’s only one connecting line between major resource A and major resource B, start looking there for bottlenecks and potential company-sinking aggravation.

Such was the case for many companies using S3. Depending on how S3 was used, and whether there was failover to other caches, some companies were affected more than others. Twitter, for instance, uses S3 for avatar storage but had no “cold cache” fallback for that data, leaving the service without user images – bad, but not deadly.

SmugMug shrugged the whole thing off, which I thought was a bit odd (and a far cry from the disastrous admission, when Amazon went down back in February, that their “hot cache” covered very little). Their entire company revolves around photos hosted on Amazon S3, and they simply waved off an 8-hour outage as “okay because everyone goes down once in a while”. Yeah, and occasionally people get mugged on dark city streets, but as long as it’s not me, it’s okay! Maybe it was the fact that the outage occurred on a Sunday. Who knows? To me, this sort of outage rates as a 9.5/10 on the critical scale. Their entire business is wrapped up in S3 storage with no failover. For perspective, one 8-hour outage in July constitutes 98.9% uptime – a far cry from five 9′s (99.999%), which is the minimum mitigation of risk for enterprise, mission-critical services.
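The arithmetic behind those numbers is simple; a quick sketch, assuming July’s 744 hours:

```python
HOURS_IN_JULY = 31 * 24          # 744 hours
OUTAGE_HOURS = 8

uptime = (HOURS_IN_JULY - OUTAGE_HOURS) / HOURS_IN_JULY
print(f"Uptime: {uptime:.2%}")   # Uptime: 98.92%

# For contrast, five 9's allows only about 27 seconds of downtime in July.
allowed_downtime_s = HOURS_IN_JULY * 3600 * (1 - 0.99999)
print(f"Five 9's budget: {allowed_downtime_s:.0f} seconds")  # ~27 seconds
```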

WordPress.com, as always, comes through as a shining example: a company that economically benefits from using S3 as a cold cache rather than for primary access, or “warm cache”.

Let me stop and provide some definitions. A warm (or hot) cache is always preferable to a cold cache: it is data that has been loaded into memory or some other readily accessible location – typically memory. A cold cache is file-based storage of cached data; it is accessed less frequently, because it is consulted only when the warm-cache data has expired or doesn’t exist.

WordPress.com has multiple levels of caching because they are smart and understand the basic premise of eliminating single points of failure. Image data is primarily served from their server cluster via a CDN, while S3 is used as a cold cache. When S3 collapsed over the weekend, WordPress.com, from my checking, remained unaffected.
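As a rough sketch of that pattern (the tier names and fetch functions here are hypothetical stand-ins, not WordPress.com’s actual stack), the lookup logic is simply: try the warm tiers first, fall back to the cold one, and never depend on a single tier.

```python
from typing import Callable, Optional

def fetch_from_memory(key: str) -> Optional[bytes]:
    """Warm cache: in-process or memcached-style lookup (stub)."""
    return None  # simulate a cache miss, for illustration

def fetch_from_cdn(key: str) -> Optional[bytes]:
    """Primary serving tier: CDN in front of the server cluster (stub)."""
    return None

def fetch_from_s3(key: str) -> Optional[bytes]:
    """Cold cache: S3, consulted only when the warm tiers miss (stub)."""
    return b"image-bytes"

TIERS: list[Callable[[str], Optional[bytes]]] = [
    fetch_from_memory,   # warm
    fetch_from_cdn,      # primary
    fetch_from_s3,       # cold fallback
]

def fetch(key: str) -> Optional[bytes]:
    """Walk the tiers in order; any single tier failing is survivable."""
    for tier in TIERS:
        try:
            data = tier(key)
        except Exception:
            continue         # a down tier (say, an S3 outage) is skipped, not fatal
        if data is not None:
            return data
    return None              # every tier missed or failed

print(fetch("avatar/123.png"))
```

The point isn’t the specific tiers; it’s that no single line on the infrastructure map is load-bearing on its own.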

This is the basic principle of enterprise I.T. computing that is lost on so much of the “web world”. If companies have built and scaled (particularly if they have scaled!) while relying on S3 with no failover, shame on them. Does it give Amazon a black eye? Absolutely. However, at the end of the day, SmugMug, WordPress.com, FriendFeed, Twitter, and all the other companies utilizing S3 answer to their customers and do not have the luxury of pointing the finger at Amazon. If their business is negatively affected, they have no one to blame but themselves. The companies who understood this planned accordingly and were not negatively affected by the S3 outage. Those who didn’t were left, well, holding the bag.

Added: GNIP gets it, and they are new to the game. Even startups have no excuse.