Sunday | 23 November, 2008
LinuxWorld.com.au

Web applications without databases

George Belotsky 02/02/2007 13:10:00

The World Wide Web started out as a collection of static pages linked to one another. Of course, this was not enough for long. We now live in the era of the Web application, especially since the arrival of Web 2.0. While static pages still abound, the most popular and most exciting sites are dynamic.

The typical dynamic site depends heavily on a relational database backend (although the most visited sites do not necessarily use this approach). The application builds pages by querying the database and assembling the returned data into HTML documents. This is a complex process, often implemented in several software layers (N-tier architecture). The database, in particular, must be very robust. After all, the system's state lives there. If the database goes down (or, worse, becomes corrupted), the entire site stops functioning.
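To make the conventional approach concrete, here is a minimal Python sketch of per-request page assembly. It assumes an in-memory SQLite database with a hypothetical `posts` table; `build_schema` and `render_front_page` are illustrative names, not part of any framework. The point to notice is that both the query and the HTML assembly run on every request, even when nothing in the database has changed.

```python
import sqlite3
from html import escape


def build_schema(conn):
    """Create and fill a hypothetical 'posts' table (illustration only)."""
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (title, body) VALUES (?, ?)",
        [("Hello", "First post"), ("Tips & tricks", "Use <b>caching</b>")],
    )
    conn.commit()


def render_front_page(conn):
    """Query the database and assemble the rows into an HTML document.

    In the conventional dynamic design, this function runs for every request.
    """
    rows = conn.execute("SELECT title, body FROM posts ORDER BY id").fetchall()
    # Escape database text so it cannot inject markup into the page.
    items = "\n".join(
        f"<li><h2>{escape(title)}</h2><p>{escape(body)}</p></li>"
        for title, body in rows
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    print(render_front_page(conn))
```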

Very often, expensive hardware (frequently coupled with expensive software) and teams of database administrators must nurse the database. This scenario is common -- even in systems where the rest of the application runs on Free/Open Source Software (FOSS) with generic hardware.

There are tools to help developers of dynamic sites deal with the resulting scalability problems. For example, you could use memcached -- a distributed in-memory cache -- as part of your application. This can greatly reduce the strain on the database. Caching proxies such as Squid ease the load on multiple layers of the system. Adding Squid to a live site may even compensate for some errors in design or implementation, allowing the application to perform sufficiently well with little change.
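A common way to put memcached in front of a database is the cache-aside pattern: look the key up in the cache, and only on a miss query the database and store the result for later readers. The sketch below is self-contained, so it assumes a hypothetical `TTLCache` class as an in-process stand-in for a real memcached client (it models only get and set with an expiry time) and a hypothetical `CountingDB` stub that records how often it is queried.

```python
import time


class TTLCache:
    """Hypothetical in-process stand-in for a memcached client.

    Only get/set with an expiry are modelled; a real deployment would talk to
    memcached servers over the network through a client library.
    """

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, time.monotonic() + ttl)


class CountingDB:
    """Hypothetical database stub that counts how often it is queried."""

    def __init__(self):
        self.queries = 0

    def load_page_data(self, page_id):
        self.queries += 1
        return f"data for {page_id}"


def get_page_data(cache, db, page_id, ttl=60):
    """Cache-aside lookup: try the cache first, fall back to the database."""
    key = f"page:{page_id}"
    value = cache.get(key)
    if value is None:
        value = db.load_page_data(page_id)  # cache miss: query the database
        cache.set(key, value, ttl)          # populate the cache for later reads
    return value
```

Calling `get_page_data(cache, db, "home")` a hundred times within the expiry window queries the stub database only once. The expiry time bounds how stale cached data can become; invalidating keys as soon as the underlying data changes is a separate, application-specific problem.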

Yet the standard architecture for dynamic sites may be less effective than its popularity suggests. First of all, the ubiquitous relational database management system (RDBMS) is not the best tool for text processing (see "One Size Fits All? -- Part 2: Benchmarking Results"). Programming for the Web, of course, is still primarily about text processing. Google does not use an RDBMS for this purpose, nor did the earlier search engines (such as Lycos and Inktomi).

Second, serving static pages remains the simplest and most effective solution. The article "Linux & Scaling: The Essentials" provides a fascinating account of how a surge of traffic crippled a dynamic site. Converting everything to static content restored the system. The creators of LiveJournal have also observed that static content is easy to handle (see page 30 of "Inside LiveJournal's Backend").

From the above discussion, we may surmise that the paradigm of dynamically generating a page for every request is often the wrong choice. On the Web, pure reads greatly outnumber writes. It therefore makes more sense to generate a page only when its contents change, then serve the resulting static HTML file in response to all read requests. The site is effectively dynamic, but the Web server operates on plain HTML files. This "Relative Static" (with respect to the Web server) approach is the focus of this article.
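Here is a minimal sketch of the "Relative Static" idea, assuming a hypothetical `publish()` hook that the application calls whenever an article is created or edited. The page is rendered once and written into the Web server's document root; every subsequent read is an ordinary static file request that runs no application code. The function names and the one-file-per-slug layout are illustrative assumptions, not the author's implementation.

```python
import os
import tempfile
from html import escape
from pathlib import Path


def render_article(title, body):
    """Assemble a complete HTML page from (hypothetical) article data."""
    return (
        "<html><head><title>{t}</title></head>"
        "<body><h1>{t}</h1><p>{b}</p></body></html>"
    ).format(t=escape(title), b=escape(body))


def publish(docroot, slug, title, body):
    """Regenerate one static page; call this only when its content changes."""
    docroot = Path(docroot)
    docroot.mkdir(parents=True, exist_ok=True)
    target = docroot / f"{slug}.html"
    # Write to a temp file in the same directory so the rename stays on one
    # filesystem, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=docroot, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(render_article(title, body))
        # mkstemp creates the file mode 0600; make it readable by the Web server.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)  # never leave a stray temp file behind
        raise
    return target
```

Writing to a temporary file and renaming it with `os.replace()` means readers see either the old page or the new one, never a half-written file; on POSIX systems the rename is atomic when both paths are on the same filesystem, which is why the temporary file is created in the document root itself.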
