Advanced Performance Techniques
Introduction
It’s not often that a developer proudly pounds away for weeks on his input device, only to end up with a small compressed tool.
This may resemble Barbara Windsor’s opening line from ‘Carry On Coding’, but it is important to remember that the speed of your website plays an integral role in its usability, availability and appeal. Companies such as Google and Yahoo! have unquestionably verified this theory, gaining and retaining a large percentage of the web audience.
The advent of broadband, inexpensive technology, and the current buzz surrounding XML may have shifted the IT community’s focus away from online efficiency, but performance issues are as relevant today as they were a year ago, if not more so. Broadband still hasn’t arrived for the majority. Cheaper technology is allowing more people to connect, which means more Internet traffic. And our saviour XML – owing to its extensible plain-text format – is actually quite a bit bulkier than most of what it replaces.
Given the need for speed, it seems a little ironic that the sites gaining the press’s attention - and usually those winning the awards - are those that appear to have made little attempt to employ efficiency measures. In fact, they are usually heavy with Flash imagery and content. This article, therefore, concentrates on areas that can increase performance (or appear to increase performance) without changing the look and feel of the site. Widely covered topics, such as image compression, will not be included – instead, some lesser-known techniques are exposed.
Style over Substance, not Style with Substance
No formatting information should be contained within the body of the document. Instead, all formatting should be specified through CSS, which means removing all font, bold, em tags, etc. The CSS declarations should also be moved out of each HTML file into a single external file, which is then referenced at the top of each HTML page. The browser can then cache this CSS file, giving an additional boost to the download speed.
Validation’s what you need, if you wanna be a record breaker
Web browsers are now mature enough to have fairly standard JavaScript implementations. Therefore, JavaScript should now be used for most client-side form validation. Although this may seem like an additional piece of code for the user to download, they will appreciate being told – almost in real time – when they have not filled out a form correctly. This gives them a ‘quicker’ experience, and also saves the server the bandwidth and processing power needed to validate the form and return the page again. As with the CSS, the JavaScript should be in an external file for caching purposes, and should be re-used across multiple forms.
Compression solves depression
Most modern browsers (Netscape 4.5+, IE4.01+) allow for content to be ‘gzip’ (or ‘deflate’) compressed. Plain text (e.g. HTML) compresses very well, and as long as the web server has some spare CPU power for the compression algorithm, heavily content-driven sites can benefit significantly from gzip encoding. Recent Box UK statistics show that compressing content-heavy sites can reduce the size of the HTML download by an average of 75-80%, with a corresponding improvement in download speed.
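As a rough illustration of the kind of saving involved, the sketch below gzip-compresses a made-up block of repetitive HTML using Python’s standard gzip module. The sample markup and the helper name compression_saving are assumptions for illustration only; real pages will compress by different amounts, and on a live site the web server would normally do this work for you.

```python
import gzip


def compression_saving(html: str, level: int = 6) -> float:
    """Return the fraction of bytes saved by gzip-compressing the given HTML."""
    raw = html.encode("utf-8")
    compressed = gzip.compress(raw, compresslevel=level)
    return 1 - len(compressed) / len(raw)


# Hypothetical sample page: repetitive markup, as found on list-heavy sites.
rows = "".join(
    f'<tr><td class="name">Item {i}</td><td class="price">{i}.99</td></tr>\n'
    for i in range(200)
)
page = f"<html><body><table>\n{rows}</table></body></html>"

print(f"Saved {compression_saving(page):.0%} of the original size")
```

The compression level is a trade-off: higher levels squeeze out a few more bytes at the cost of more CPU time per request.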
You thought Vanilla Ice was a bad Wrapper?
Placing your content in frames can seem like an excellent method for speeding up a site – after all, the menu need only be downloaded once, etc. However, the advantages are outweighed by the difficulty in achieving cross-platform/browser compatibility, and the relatively slow speed at which browsers render framed content. Add to this the various search engine problems that frames introduce, and they should generally be avoided.
Wrapping all of your HTML inside a single table can also seem like a good idea – you can limit the width of your site, or scale it all by changing the percentage width of the outer table. However, most browsers do not render the contents of a table until the HTML for the entire table has been loaded – therefore, a 50k page will display a blank page until it has all been loaded. This can APPEAR to be a very slow page, although it is just as fast as any other 50k page. The user will get confused and agitated by the lack of any content appearing, and will try elsewhere. Instead, try placing content in a few separate tables, or better still, use CSS for positioning of content. This may add a few more bytes to your code, but the IMPRESSION of speed will be greater – the user can read the top of the page while the remainder downloads.
Body Beautiful
All code should be well-formed (X)HTML, with the correct nesting of elements, etc. Browsers will then render content faster, search engine spiders will be comfortable within the site, etc. However, be careful with backwards compatibility – older versions of Netscape, in particular, do not render XHTML perfectly.
Back to basics
The speed of your site depends partly on the HTML, and partly on the hardware/software that it runs on. If you mainly serve static HTML pages, consider changing to an NT/IIS combination. If you serve mainly dynamic content, consider a Linux/Apache/PHP/MySQL combination. Research CPU and memory considerations, too, especially if your web server is noticeably struggling at peak times in the day. Separate database servers and load balancing should also be amongst the items you research.
Just because you know HTML, there’s no need to get carried away…
Remove all unnecessary HTML from your code, e.g. the border="0" attribute on images which are not links. However, keep width, height and alt attributes: although they add extra bytes, they can give the impression of faster downloads and a more professional site, because browsers like to position all the elements before rendering the document. If you have used a WYSIWYG editor to produce the code, try re-structuring the document, for example by shortening the indent width (say, from four spaces to two).
Another trick is to rename all internal directory structures so that they are as short as possible (e.g. /images to /img) – thus, if your template calls 12 images, it will have at least 3*12 = 36 fewer bytes in the HTML. Similarly, template images should not be called ‘template_image_row_1_col_3’, but rather something shorter, like ‘tp_r1c3’.
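The arithmetic is easy to check for your own templates. The sketch below counts the bytes saved by a rename; the 12-image template and the helper name bytes_saved are made up for illustration.

```python
def bytes_saved(html: str, old: str, new: str) -> int:
    """Bytes saved in the HTML by replacing every occurrence of old with new."""
    before = len(html.encode("utf-8"))
    after = len(html.replace(old, new).encode("utf-8"))
    return before - after


# Hypothetical template that references 12 images under /images/.
template = "".join(f'<img src="/images/tp_r1c{i}.gif">' for i in range(12))

# Each reference loses 3 bytes, so 12 references save 36 bytes.
print(bytes_saved(template, "/images/", "/img/"))
```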
Database optimisation
Spend some time examining all the different SQL statements that are made, and ensure that indices are built on commonly queried columns (i.e. those which crop up in WHERE clauses). You could also consider re-structuring the database, so as to de-normalise multiple tables into a single table. Although all database design tutorials will specify that tables should be normalised, de-normalising can be a good method for speeding up sites, as joining tables is a slow process in most database servers.
If you are running MySQL version 3, consider upgrading to MySQL 4 or switching to SQL Server - both of these databases employ query caching, which can give a dramatic performance increase for database-driven sites.
For SQL Server applications, also consider using triggers and stored procedures for common queries - these will lessen the amount of SQL transferred between applications, and the database server can precompile the queries for additional speed.
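To see what an index does to a query, the following sketch uses Python’s built-in sqlite3 module purely as a convenient stand-in for whichever database you run. The orders table, its columns and the index name are invented for the example, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# A typical query whose WHERE clause filters on customer_id.
QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's description of how it would run QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: a scan of the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn)   # the plan should now mention the new index

print(plan_before)
print(plan_after)
```

Most database servers offer an equivalent of EXPLAIN, and it is worth running it against your slowest queries before and after adding an index.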
Static over dynamic
Not everything needs to be dynamically created. If there are pieces of information that change only rarely, embed them statically, but perhaps allow for new items to be re-embedded easily, through a pseudo-dynamic process.
Show me the cache
We have already covered the caching of CSS and JavaScript files. Caching can also be used for common database queries and also for compiled versions of scripting languages, etc. For example, Zend produce a PHP compiling module.
At Box UK, we also implement what we call 'layered caching' - that is, caching at various architectural levels. At the highest level, we keep track of database update times for each table. Assuming no personalised, rotating or random content, you can then cache each request for a dynamic page based on a combination of GET and POST variables. If the database has not been updated since the last (cached) request, you can return the fully cached page without needing to regenerate it dynamically.
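The page-level idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Box UK implementation: the names table_updated_at, request_key and cached_page are hypothetical, timestamps are passed in explicitly, and the cache lives in memory for simplicity.

```python
from typing import Callable, Dict, Mapping, Tuple

# Hypothetical bookkeeping: last update time (e.g. a Unix timestamp) per table.
table_updated_at: Dict[str, float] = {}

# Cache of rendered pages: request key -> (time rendered, HTML).
_page_cache: Dict[Tuple, Tuple[float, str]] = {}


def request_key(path: str, get: Mapping[str, str], post: Mapping[str, str]) -> Tuple:
    """Build a hashable key from the path plus the sorted GET and POST variables."""
    return (path, tuple(sorted(get.items())), tuple(sorted(post.items())))


def cached_page(
    path: str,
    get: Mapping[str, str],
    post: Mapping[str, str],
    now: float,
    render: Callable[[], str],
) -> str:
    """Return the cached page unless any table changed after it was rendered."""
    key = request_key(path, get, post)
    last_update = max(table_updated_at.values(), default=0.0)
    entry = _page_cache.get(key)
    if entry is not None and entry[0] > last_update:
        return entry[1]          # cache hit: nothing has changed since rendering
    html = render()              # cache miss: build the page dynamically
    _page_cache[key] = (now, html)
    return html
```

Checking against the latest update across all tables is the bluntest possible rule. A finer-grained version would record which tables each page depends on and compare against those alone.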
At the next level, for XML/XSL based systems, the actual XSL transformation can be cached - given an XML and XSL string, take a hash of each, and cache the resulting transform against these hash strings. On each subsequent request, you can then check the hashes against the cached values, saving valuable processor power by removing the transformation process for previously transformed XML/XSL.
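A minimal sketch of the hash-keyed transform cache follows. Python’s standard library has no XSLT processor, so the transform argument is a hypothetical callable standing in for whichever processor you use; the choice of SHA-256 and the in-memory dictionary are likewise assumptions.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Cache of finished transforms: (hash of XML, hash of XSL) -> output.
_transform_cache: Dict[Tuple[str, str], str] = {}


def _digest(text: str) -> str:
    """Return a hex digest that identifies the given string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cached_transform(xml: str, xsl: str, transform: Callable[[str, str], str]) -> str:
    """Run transform(xml, xsl) only if this exact pair has not been seen before."""
    key = (_digest(xml), _digest(xsl))
    if key not in _transform_cache:
        _transform_cache[key] = transform(xml, xsl)
    return _transform_cache[key]
```

Hashing both strings costs far less than running the transformation, which is what makes the check worthwhile on every request.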
At a lower level, for modular systems, modular processes can be cached. Given a module for a specific page, the output from the module (or indeed the serialized object code) can be cached on the filesystem or in the user’s session (depending on whether you want site-wide caching or per-user caching). On subsequent requests, the date/time of this cache can be checked against database update times - if the cache is more recent than the latest database update, then use the cache.
At the lowest level, individual functions and methods within the code can 'cache' (remember) the results of commonly called algorithms - use static variables within functions, and class properties within objects. These can then be checked for a value before processing the algorithm; if a value already exists, use the value rather than the algorithm.
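For the filesystem variant, the check can lean on the cache file’s modification time. The sketch below assumes a site-wide cache in a temporary directory and a module name that is safe to use as a file name; cached_module_output and its parameters are hypothetical names.

```python
import os
import tempfile
from typing import Callable

# Hypothetical cache location; a real site would use a fixed, writable directory.
CACHE_DIR = tempfile.mkdtemp(prefix="module_cache_")


def cached_module_output(name: str, db_updated_at: float, build: Callable[[], str]) -> str:
    """Reuse a module's cached output if it is newer than the last database update."""
    # Assumes name is a plain identifier that is safe to use as a file name.
    path = os.path.join(CACHE_DIR, name + ".html")
    if os.path.exists(path) and os.path.getmtime(path) > db_updated_at:
        with open(path, encoding="utf-8") as fh:
            return fh.read()       # cache is more recent than the database
    output = build()               # cache missing or stale: rebuild the module
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(output)
    return output
```

For per-user caching, the same check applies with the output stored in the session rather than in a shared file.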
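Python has no static function variables as such, so the sketch below swaps in functools.lru_cache for the function case and a plain instance attribute for the class case. The shipping_cost formula and the Basket class are invented purely for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shipping_cost(weight_kg: int) -> float:
    """Hypothetical calculation; the result is remembered for each argument."""
    return 2.5 + 0.75 * weight_kg


class Basket:
    """Remembers its total in a property so the sum is only computed once."""

    def __init__(self, prices):
        self.prices = list(prices)
        self._total = None  # None means "not computed yet"

    def total(self) -> float:
        if self._total is None:      # run the calculation on the first call only
            self._total = sum(self.prices)
        return self._total
```

The class version only stays correct if the cached value is reset whenever the prices change, which is the usual price of this kind of caching.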
Conclusion
Increasing the performance of an online application will have a ‘circular’ effect on speed: smaller files will take up less bandwidth; less bandwidth will allow for better performance under stress, and better performance will reduce the server stress and give the user faster downloads.
Note, though, that accessibility should always have a higher priority than performance. Do not increase performance by removing alt attributes, reducing style sheet information, etc.
