J. R. Boynton

Website network and systems architecture

Some notes about the standard wisdom, and some alternatives.

Generally, we imagine a chain of components: a router, a load balancer, multiple web servers, multiple application servers, a second router, a single database server, and a content management server.

Database servers

The decision to use a single database server is the standard wisdom. Oracle certainly wants you to buy a big Oracle database license, to run on a large server. With database software like Oracle, you are better off using as few databases as possible. Each instance of Oracle is expensive to configure well and maintain.

I wonder, though, whether you couldn't separate your data into two sets. One would be for transactions and other information for which security is critical. The other would be for content: articles, product data, maybe even user session data. This second set could be kept in lightweight databases and mirrored onto many database servers. That would leave a relatively small amount of data, used by a relatively small number of transactions. Even if you use Oracle for that data, the license will be cheaper because you need less hardware, and performance tuning becomes both less necessary and less difficult.
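As a rough sketch of that split, the application could route each request by the kind of data it touches. The category names, server names, and the DataRouter class below are all hypothetical, and a real system would hand out database connections rather than names.

```python
import itertools
from typing import Iterable, Iterator

# Hypothetical category names; the real split depends on your data.
CRITICAL_CATEGORIES = {"orders", "payments", "accounts"}


class DataRouter:
    """Pick a database server based on the kind of data a request touches."""

    def __init__(self, transactional_server: str, content_mirrors: Iterable[str]) -> None:
        self.transactional_server = transactional_server
        mirrors = list(content_mirrors)
        if not mirrors:
            raise ValueError("at least one content mirror is required")
        # Cycle through the mirrors so content reads are spread evenly.
        self._mirrors: Iterator[str] = itertools.cycle(mirrors)

    def server_for(self, category: str) -> str:
        # Security-critical data always goes to the single transactional server.
        if category in CRITICAL_CATEGORIES:
            return self.transactional_server
        # Everything else is content and can come from any mirror.
        return next(self._mirrors)
```

With two mirrors, successive content requests alternate between them, while every request for "orders" goes to the one transactional server.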

I'm also a proponent of pre-processing database information. Maintain one set of tables, and automatically generate tables that are optimized for the use of application software. The generated tables could easily be mirrored to other systems. Database gurus seem to be more interested in processing data at request time – which requires significantly larger hardware.
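Here is a minimal sketch of that pre-processing idea, using Python's built-in sqlite3 module as a stand-in for a lightweight database. The table and column names (articles, authors, article_listing) are invented for illustration. The join is done once, when the generated table is rebuilt, rather than on every page request.

```python
import sqlite3


def rebuild_article_listing(conn: sqlite3.Connection) -> None:
    """Regenerate the flattened, read-optimized table from the master tables."""
    conn.execute("DROP TABLE IF EXISTS article_listing")
    # Do the join now, so the application can read one flat table later.
    conn.execute(
        """
        CREATE TABLE article_listing AS
        SELECT a.id AS article_id, a.title AS title, au.name AS author_name
        FROM articles AS a
        JOIN authors AS au ON au.id = a.author_id
        """
    )
    conn.commit()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    # Master tables: the one set that is maintained by hand or by the CMS.
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE articles (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ann');
        INSERT INTO authors VALUES (2, 'Bob');
        INSERT INTO articles VALUES (10, 1, 'Caching');
        INSERT INTO articles VALUES (11, 2, 'Mirrors');
        """
    )
    rebuild_article_listing(conn)
    # The application only ever runs simple single-table queries like this one.
    return conn.execute(
        "SELECT article_id, title, author_name FROM article_listing ORDER BY article_id"
    ).fetchall()
```

The generated table holds no information that the master tables lack, so it can be dropped, rebuilt, or copied to another machine at any time.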

Web servers and application servers

As for web servers and application servers, the best number of each to use depends a lot on the software you run.

In general, you are better off having more servers. If you have only one of each, then you are in serious trouble if something breaks. If you have two of each and one breaks, you aren't dead in the water, but you have to fix the problem immediately. If you have three of each – and two of each are powerful enough to handle your load – then if one breaks, you can wait for morning before you fix it.
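The reasoning above comes down to simple arithmetic: work out how many servers peak load requires, and see how many are left over. The function name and the numbers in the usage note are made up for illustration.

```python
def spare_servers(servers: int, capacity_per_server: int, peak_load: int) -> int:
    """Return how many servers can fail while the rest still carry peak load.

    A negative result means the pool cannot carry peak load even when
    every server is healthy.
    """
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Ceiling division: servers needed to carry the peak load.
    needed = -(-peak_load // capacity_per_server)
    return servers - needed
```

For example, with a peak of 1000 requests per second and servers that each handle 500, three servers leave one spare (spare_servers(3, 500, 1000) returns 1), while two servers leave none.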

On the other hand, some software is so fragile that if you have more servers, the software will break more often. Then you would need more staff to keep the servers going.

A lot of systems people prefer buying big machines, but some software doesn't make use of multiple processors, so much of a big machine's capacity can sit idle.

I generally prefer having more servers, and using simpler software that is more robust. You could probably reduce costs by a factor of five or more this way. But you probably have to be smarter about design than if you – for example – simply buy one big computer.

Direct access to web servers

Even though your customers all access your web servers through the load balancer, you should make sure you have a way to reach each physical machine directly – so that you can tell "from outside" if one is broken. It's very difficult to diagnose a problem that's only occurring on one of several machines if you are looking at the website through a load balancer.
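One way to use that direct access is a small script that requests the same page from each machine in turn, bypassing the load balancer. This is only a sketch: the host names are hypothetical, it assumes each server answers plain HTTP on its own address, and the fetch function can be swapped out so the logic can be tested without a network.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar problems.
        return 0


def check_backends(
    hosts: Iterable[str],
    path: str = "/",
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, int]:
    """Request the same path from every server directly and record each status."""
    return {host: fetch(f"http://{host}{path}") for host in hosts}


def broken_backends(results: Dict[str, int]) -> List[str]:
    """List the hosts that did not answer with 200 OK."""
    return sorted(host for host, status in results.items() if status != 200)
```

Calling check_backends(["web1.internal.example", "web2.internal.example"]) and passing the result to broken_backends names the exact machine that is misbehaving, which is the information a load-balanced view of the site hides.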

Systems people might think this isn't important, but it's a lot easier for everyone if the content people can identify the exact problem on the exact server.

Content management system

A common, and pathetic, approach is to force writers to use HTML forms to edit content. This deprives them of tools like spell checking and automatic saves. It pushes them to write in Word and paste the text into the HTML form, and to do the reverse when they revise. Copy/paste adds potential for errors, and most CMS software does not even fix the operating-system-specific characters (like 'curly' quotes and long dashes).
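Fixing those characters is not hard. The sketch below maps the usual offenders to plain ASCII. The mapping is an illustrative choice, not a complete list, and a CMS might prefer to emit HTML entities instead. The \x91 to \x97 entries cover the case where Windows-1252 bytes were mis-read as Latin-1, which is an assumption about how the text got damaged.

```python
# Characters that typically arrive when text is pasted from a word processor.
REPLACEMENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    # Windows-1252 bytes that were decoded as Latin-1 control characters.
    "\x91": "'",
    "\x92": "'",
    "\x93": '"',
    "\x94": '"',
    "\x96": "-",
    "\x97": "--",
    "\x85": "...",
}

_TABLE = str.maketrans(REPLACEMENTS)


def normalize_punctuation(text: str) -> str:
    """Replace word-processor punctuation with plain ASCII equivalents."""
    return text.translate(_TABLE)
```

Running this on every submitted form field, before the text is stored, keeps the damage out of the database no matter which desktop program the writer pasted from.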

A better choice is to have the CMS provide access to documents that can be edited with desktop software.

The problem with this approach is the Unix security model. You want the CMS to copy and move documents around, so it needs write permission, but you would prefer that it didn't run setuid. You also want each user to be able to edit the files she checks out, while preventing other users from editing those checked-out files. That is, you don't want the CMS software to run in the users' group, because then all users would be able to edit anyone's files.

The first architectural approach here is to put the CMS on an isolated machine, and let it run setuid. If someone breaks into your LAN, and then breaks into your CMS, they still won't be able to damage anything else.

A second approach is to use WebDAV. This lets Word, Dreamweaver, and many other desktop applications load and save files over HTTP. The CMS can control who is allowed to edit any file, so you aren't restricted to Unix's security model. You could get access permissions from LDAP to aid scalability.
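To illustrate what "the CMS controls who may edit" can mean, here is a sketch of check-out bookkeeping kept by the CMS itself rather than by Unix file permissions. The CheckoutRegistry class is hypothetical and is not part of any WebDAV library. WebDAV does define LOCK and UNLOCK methods, and a CMS handling a save request could consult a table like this one before accepting the write.

```python
from typing import Dict


class CheckoutRegistry:
    """Track which user, if any, has each document checked out."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def check_out(self, path: str, user: str) -> bool:
        """Give user the exclusive right to edit path.

        Returns False if a different user already holds the check-out.
        """
        owner = self._owners.setdefault(path, user)
        return owner == user

    def can_write(self, path: str, user: str) -> bool:
        """Only the user who checked the document out may save changes."""
        return self._owners.get(path) == user

    def check_in(self, path: str, user: str) -> bool:
        """Release the check-out; only its holder may do so."""
        if self._owners.get(path) != user:
            return False
        del self._owners[path]
        return True
```

Because the decision is made in the application, the CMS process can own every file on disk while each writer can still change only the documents she has checked out.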

Load balancing

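Don't even think about round-robin DNS. Resolvers all over the world cache your IP addresses, outside of your control. If a web server goes down, some visitors keep being sent to the dead address until every cached record expires, and some caches ignore the expiry time you set, so round-robin DNS takes forever to recover.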
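A back-of-the-envelope model shows the scale of the problem. It assumes the dead address is removed from DNS at once, every cache honors the record's time to live (TTL), and cache ages are spread evenly. All three are optimistic, so treat the result as a best case. The function name and the numbers are illustrative only.

```python
def fraction_hitting_dead_server(
    servers: int, ttl_seconds: float, seconds_since_removal: float
) -> float:
    """Estimate the share of visitors still sent to the failed server.

    Best-case model: one of `servers` addresses has failed and been removed
    from DNS, and cached records expire evenly over one TTL.
    """
    if servers <= 0 or ttl_seconds <= 0:
        raise ValueError("servers and ttl_seconds must be positive")
    # Share of caches that still hold the old record set.
    still_cached = max(0.0, 1.0 - seconds_since_removal / ttl_seconds)
    # Of those, about one in `servers` picks the dead address.
    return still_cached / servers
```

With four servers and a one-hour TTL, a quarter of visitors hit the dead machine at the moment of failure, and an eighth still do half an hour later. A load balancer stops sending traffic to a dead machine as soon as it notices the failure.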




Copyright © 1998-2008 J. R. Boynton