Wednesday, July 13, 2005

Buy or Lease Servers?

Every week I get bombarded with emails and advertisements for server hardware and software solutions. Obviously there are a lot of people out there who like to bring things in house, but I am here to tell you today that owning your own hardware is like buying your own cow to get your daily milk. Or do you remember the scene in the cartoon The Incredibles where the seamstress says "no capes," the hero says "but...", and the seamstress rattles off examples of why? Let me give you a few examples of my own.

A year ago I walked into a situation at a company that had recently been sold in which the data of 20,000 customers was sitting on two servers up in a data center. If those servers were to fail and the data were lost, the company would probably cease to exist. Most of the former IT employees no longer worked at the company as a result of the uncertainty that existed before the sale of the company, so there weren't many resources around to address the issue of how to manage the servers.

I started asking questions such as "Where is the data backed up?" and "What is the recovery plan if something goes down on the servers?" It turned out that configuring a new server to support the customers was not a copy/paste recovery process. It would take days of configuration to get things running again on a new box if a processor or motherboard died (and the servers were already many years old and running 24 hours a day under tremendous loads).

As you look at this problem, you can see it could happen anywhere. All you have to do is put a box in a data center and your problems begin. What do you do? Do you buy redundant hardware at $3,000-5,000 per server? That is a lot of money to just have sitting around turned off! Do you use tape drives and arrays of disks? Who are you supposed to call to put Humpty together again if the guy who installed the proprietary tape drive system is no longer around and you have a component failure in the server and need to get the data out? What if you have everything on your array of disks but you can't get the sucker installed in the latest box because the first time you installed it was a couple of operating systems ago? What if one disk fails in your array, they haven't made that drive for 4 years, and you can't buy one? What if the RAM fails and it takes you 48 hours to ship in the proprietary type that works with your big-name server?

Another company I know of loses an incredible amount of money each year because their programmers are busy fiddling with the firewalls and network that protect a couple of really boring, low-traffic servers. The wasted payroll of this effort is in the tens of thousands of dollars each year, and the lost opportunity cost that occurs as projects don't get completed and customers go to other vendors reaches into the millions.

At the end of the day, it is very expensive to own hardware and hire the resources to maintain those servers, especially considering that the maintenance should only happen once a year when something fails.

I think it is important to qualify this perspective. Some companies are like grocery stores in that they are huge and need a lot of "milk" or servers in this case, and it makes sense that they have people and extra hardware to handle these things. Interestingly, however, most businesses are nowhere near this big, but far too many of them pay big money and put themselves in bad situations when disasters happen, and it doesn't need to be that way.

In our case, for about $200 a month, we can pay for a leased server that is top of the line in a data center that is top notch. By going with one of the big boys, we are guaranteed hardware replacement when anything goes down, and guaranteed there are lots of spare parts for our server for years to come. The cost savings and reliability of such a solution are amazing. We have to watch our backups and know how to get back up again if there are any problems, but for the most part we are freed up to do what is most important, and that is make the company lots of money.

Tuesday, July 12, 2005

Java vs PHP and other scripting languages

It is amazing to me that after 8 years of creating so many large applications in so many languages for so many companies, I still find myself in heated conversations with developers about which language to use to create the next web application. In the back of my mind, I have some pretty solid answers to most problems, but when I talk with very educated people who are in complete disagreement with me, it is only reasonable that I second-guess myself.

Some recent events brought a lot of clarity to this argument, however. As my team and I prepared to develop a major application, I decided it would be fun to revisit a test I did about 5 years ago and compare Resin (a Java application server known to be one of the fastest on the market) to PHP. Five years ago I found that when simulating 50 simultaneous requests to applications with a MySQL back-end database, the server came under almost no load when using PHP, while the Java stuff not only used 50% of CPU and RAM, but also ran considerably slower (2-3 times).

And so a few weeks ago I grabbed an old box, probably around a 1.5 GHz Pentium box with maybe 500 Megs of RAM, and installed the latest RedHat Fedora Linux. I figured that since our data center boxes are much more powerful than this hardware, it would be a good test. I put a lot of thought into how I would benchmark the applications in Java and PHP under various scenarios, and I ran the first test under PHP. I created a 500 character block of text and inserted it into an empty database 1 million times. This took about 2 1/2 minutes and the server did show that it was feeling the load, although I can't remember exactly what the load was because I planned to do the test again and then document the results. Something to keep in mind is that I was running the desktop environment on the box rather than just using the command line, and it seems that the desktop GUI takes a ton of processing power away from the computer on such old hardware. So there were a number of factors contributing to a less than optimal running environment for the PHP application, but it was still really fast.
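For anyone who wants to try something similar, here is a rough sketch of that kind of insert test. It is not the script I ran: it is written in Python rather than PHP, it uses an in-memory SQLite database as a stand-in for MySQL, and the table name `entries` and the function `run_insert_benchmark` are made up for illustration. Timings from it say nothing about PHP, Java, or MySQL.

```python
import sqlite3
import time


def run_insert_benchmark(row_count, text_length=500):
    """Insert the same block of text row_count times and time the loop."""
    block = "x" * text_length  # stand-in for the 500 character block of text
    conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, body TEXT)")

    start = time.perf_counter()
    for _ in range(row_count):
        conn.execute("INSERT INTO entries (body) VALUES (?)", (block,))
    conn.commit()
    elapsed = time.perf_counter() - start

    # Confirm that every row actually landed in the table.
    stored = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    conn.close()
    return stored, elapsed


if __name__ == "__main__":
    # Small run by default; raise row_count toward 1,000,000 to mirror the
    # test above (an in-memory database that size needs roughly 500 MB).
    rows, seconds = run_insert_benchmark(10_000)
    print(f"inserted {rows} rows in {seconds:.2f} s")
```

Pointing the same loop at a real database server on a real disk would be closer to what I did, and would give very different numbers.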

I then asked our top Java engineer, who has written significant J2EE applications for years, to set up the Java environment to do the same test as I did in PHP on that server. He said it would take just a few minutes. Well, 4-5 hours later he gave up on setting up the environment because there were some libraries missing when he tried installing the new Java 5.

Now, it was not the intention of the test to highlight that one of the differences between Java or other "Enterprise" development environments and scripting languages is the speed of getting set up to do work. It is interesting to note, however, that I used PHP out of the box and had it running a heavy test within a very short time.

How fast you get your environment up and running is not really central to the discussion of when and where to use Java or a scripting language, so let me return to that discussion.

Among the many things that cloud the issue of whether to use a scripting language or a language like Java are the discussions of frameworks, design patterns, and object-relational database mapping. These topics are a combination of both theory and best practices and tend to be driven largely by the academic realm of universities and classrooms. But after years of indoctrination both in the classroom and in the real world about the many frameworks and patterns and object-oriented programming techniques, let me compare and contrast applications written in both scripting languages and enterprise platforms in production scenarios.

One of the largest applications I ever built in a scripting language consisted of 120 dynamic pages. It was an intranet application used by many different departments and every page was needed (no redundancy). The processing for every page was very simple, similar to the actual needs of most web applications I have ever worked on. For every form, there was a corresponding block of processing code which would clean up the data a bit and update some information in the database, and do it all in just a few lines of readable code.
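To give a feel for what one of those processing blocks amounts to, here is a minimal sketch. It is not code from that intranet application: it is in Python with SQLite, and the `employees` table, the form fields, and the function names `clean` and `handle_profile_form` are all hypothetical.

```python
import sqlite3


def clean(value):
    """Trim the ends and collapse runs of whitespace inside the value."""
    return " ".join(value.split())


def handle_profile_form(conn, form):
    """Clean the submitted fields and update the matching database row."""
    name = clean(form["name"])
    email = clean(form["email"]).lower()
    # Parameterized query, so the form values are never pasted into the SQL.
    conn.execute(
        "UPDATE employees SET name = ?, email = ? WHERE id = ?",
        (name, email, int(form["id"])),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.execute("INSERT INTO employees (id, name, email) VALUES (1, '', '')")
    submitted = {"id": "1", "name": "  Ada   Lovelace ", "email": " Ada@Example.com "}
    handle_profile_form(conn, submitted)
    print(conn.execute("SELECT name, email FROM employees WHERE id = 1").fetchone())
```

The point is the shape: one form, one short handler, a bit of cleanup, one update.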

One of the largest projects I have worked on in the Java realm was a 300+ page online banking application. It had all the buzzwords: it used model/view/controller and it was multi-tiered with presentation, business, and database layers. Obviously it was a bit complicated to work on, but everyone at the company felt that in a mere 6 months an average Java developer could get around the software effectively enough to do things without having to ask others how. Although I've expressed a little sarcasm, I'll be honest in saying that the software was actually very well architected.

So what does this all mean? Well, recently I read an article about the 10th anniversary of Java and it featured an interview with James Gosling, the creator of Java. He was asked about his perception of scripting languages. He said something to the effect that Java for web applications (J2EE) and scripting languages are different animals. He said in essence that scripting languages are great for prototyping and getting an application out the door with an emphasis only on getting something working as quickly as possible. He said that Java is not concerned with getting something out the door as quickly as possible; it is concerned with building a scalable, dependable platform as quickly as possible (something to that effect).

When James says scalable, what he means, I believe, is that if you need to go from 10,000 users a day to 10,000,000 or more users a day, you can do so by putting in clusters of servers handling loads at different tiers. In the case of the banking application I worked on, this is relevant, because every Friday there are 10,000 people logged in simultaneously reading data from their bank accounts and it takes some planning to pull that off smoothly.

But I think James dismisses scripting languages a bit too quickly in calling them "prototyping" languages. If I can take an old crummy box that pales in comparison to my data center top-of-the-line machines, and get it to do a million database operations in less than 3 minutes, then just think about the implications of that. Over the course of a day, I think I can easily handle 1 million page views with database operations on a live box in a data center on a relatively cheap server. Just for fun, let's say that I do 1 million sales a day on my website at $1.00 each. That means that I could do $365 million in business on a cheap box in a data center over the course of the year. That isn't anything to sneeze at.
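Here is that back-of-the-envelope math spelled out as a small Python sketch, using only the numbers above and treating "about 2 1/2 minutes" as 150 seconds. The variable names are mine. Keep in mind that the benchmark was a tight loop of inserts, not real page views, and that a daily average says nothing about peak hours, so treat the headroom figure as a rough indication only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def ops_per_second(operations, seconds):
    """Average rate for a number of operations over a time span."""
    return operations / seconds


# Benchmark: 1 million inserts in about 2 1/2 minutes (taken as 150 seconds).
benchmark_rate = ops_per_second(1_000_000, 150)

# Target: 1 million page views spread evenly over one day.
daily_rate = ops_per_second(1_000_000, SECONDS_PER_DAY)

# How many times faster the old box ran than the daily average requires.
headroom = benchmark_rate / daily_rate

# 1 million sales a day at $1.00 each, for 365 days.
yearly_revenue = 1_000_000 * 1.00 * 365

print(f"benchmark rate:      {benchmark_rate:,.0f} inserts per second")
print(f"needed on average:   {daily_rate:.1f} page views per second")
print(f"headroom:            about {headroom:.0f}x")
print(f"revenue over a year: ${yearly_revenue:,.0f}")
```

In other words, the old box did a day's worth of database operations in 150 of the day's 86,400 seconds.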

What would the code for that look like? Well, for the most part exactly what I described for the 120-page scripting example above. It would display pages of dynamic data and have short action pages of inserts and updates to the database with a little text formatting thrown in.

What I take away from these examples and from years of experience is this: Most web software applications that need to be built today are squarely in the problem domain of a scripting language. Some applications are not, such as high-transaction systems like stock exchanges and banking, but most of the time you don't have to worry about those scenarios. Understanding these principles and delineations can be worth tens of thousands or millions of dollars to your business because they have an enormous effect on not only your costs but also your opportunities.

Although understanding these principles is important, you aren't out of the woods simply by knowing when to use a scripting language and when to use a language like Java. One problem area is which scripting language to use: ASP, ASP 2.0, Ruby, Perl, PHP, Python, Cold Fusion, etc. Other problem areas include whether to use frameworks or not, whether to use object-oriented features, which database to use and which features of that database to use, and how good your programmers are, because bad programmers can mess up any language or programming approach. Those questions and answers are out of the scope of this article but are also important to address.