As a precursor to a design checklist we will be publishing next week, we thought it would be interesting to try some different experiments on the home pages of popular websites, to see if it was possible to automatically (programmatically) calculate whether a design is good or not. We chose Apple, Microsoft, 37 Signals, YouTube and Myspace. Analysis was carried out on the afternoon of Wednesday 10 June 2009.
Note that all images (except the typography hierarchies in the final section) can be clicked to view larger versions.
Although it is not a hard-and-fast rule, a good design should typically take the balance of the composition into consideration (unless the idea is to create tension through imbalance): smaller dark areas can be balanced out by larger light areas.
We brainstormed some mathematical approaches to this; eventually Ben and Carey opted for a vector-based center of mass algorithm, using a weighted luminosity measure of each pixel to determine the weight.
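To make the idea concrete, here is a minimal sketch of a luminosity-weighted center of mass. It is not the code Ben and Carey wrote. The function names, the Rec. 601 luma coefficients and the choice to treat darker pixels as heavier (weight = 255 minus luminance) are all assumptions made for illustration.

```python
def luminance(rgb):
    """Approximate perceived brightness (0-255) using Rec. 601 coefficients."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def geometric_center(pixels):
    """Absolute middle of the image, in pixel coordinates (x, y)."""
    height = len(pixels)
    width = len(pixels[0])
    return ((width - 1) / 2, (height - 1) / 2)


def center_of_mass(pixels):
    """Weighted center of a non-empty image.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    Assumption: darker pixels carry more visual weight, so each pixel
    weighs 255 minus its luminance (clamped at zero).
    """
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            weight = max(0.0, 255.0 - luminance(rgb))
            total += weight
            sum_x += weight * x
            sum_y += weight * y
    if total <= 1e-9:
        # An (almost) entirely white page has no weight to balance.
        return geometric_center(pixels)
    return (sum_x / total, sum_y / total)


# A white 3x3 image with a single black pixel on the right-hand edge.
image = [[(255, 255, 255)] * 3 for _ in range(3)]
image[1][2] = (0, 0, 0)
print(geometric_center(image), center_of_mass(image))
```

The distance between the two points, relative to the page size, gives one rough number for how far a layout leans away from its middle.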
In the results below, the green dot represents the absolute middle of each screen, with the red dot identifying our calculated center of mass.
As can be seen, most home pages are fairly well balanced, with the two largest discrepancies (relative to the size of the page) being YouTube and Myspace. Coincidentally (or perhaps not), these could be considered the least professional of the designs, so this does begin to support the assertion that some aspects of good design can be calculated automatically.
The colors used in a design should be harmonious, or at the very least, not clash (again, unless that is the specific intention). The pie charts below show the distribution of colors on each home page.
Note: The chart on the left includes the background color of each page; the chart on the right does not.
We have not yet used color theory to analyze this data (e.g. to test for complementary colors), but even the rudimentary charts above highlight the simplicity of the Apple design.
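For readers who want to try this themselves, the sketch below shows one way the data behind such pie charts could be tallied. It is an illustration, not our original code. The function names, the bucket size of 32 and the assumption that the most frequent color is the background are all hypothetical choices.

```python
from collections import Counter


def quantize(rgb, step=32):
    """Snap each channel down to a multiple of `step` so near-identical
    shades fall into the same bucket."""
    return tuple((channel // step) * step for channel in rgb)


def color_distribution(pixels, exclude_background=False, step=32):
    """Return {bucketed_color: share_of_pixels} for a flat list of RGB tuples.

    Assumption: the most frequent bucket is the page background, so
    `exclude_background=True` simply drops that bucket before computing shares.
    """
    counts = Counter(quantize(pixel, step) for pixel in pixels)
    if exclude_background and len(counts) > 1:
        background, _ = counts.most_common(1)[0]
        del counts[background]
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


# Ten pixels: mostly white, some black, one dark red.
sample = [(255, 255, 255)] * 6 + [(0, 0, 0)] * 3 + [(200, 0, 0)]
print(color_distribution(sample))
print(color_distribution(sample, exclude_background=True))
```

Calling the function once with and once without `exclude_background` mirrors the left-hand and right-hand charts respectively.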
The text on a design should ideally follow a pre-defined typographical hierarchy, where every step in the hierarchy should be clearly differentiated: by size (at least one pixel), weight, style (e.g. italics) and/or font family.
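The rule above can be expressed as a simple pairwise check. The following is a minimal sketch under stated assumptions: the `TextStyle` class and both function names are invented for illustration, and any difference in weight, style or family is treated as sufficient, which means it would not flag two similar typefaces used side by side.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TextStyle:
    size_px: float
    weight: str = "normal"
    style: str = "normal"
    family: str = "sans-serif"


def clearly_different(a, b, min_size_gap=1.0):
    """Two styles are distinct if they differ by at least `min_size_gap`
    pixels in size, or in weight, style or font family."""
    return (
        abs(a.size_px - b.size_px) >= min_size_gap
        or a.weight != b.weight
        or a.style != b.style
        or a.family != b.family
    )


def hierarchy_problems(styles, min_size_gap=1.0):
    """Return every pair of distinct styles that is too similar to tell apart."""
    unique = list(dict.fromkeys(styles))  # drop exact duplicates, keep order
    ordered = sorted(unique, key=lambda s: -s.size_px)  # largest first
    return [
        (a, b)
        for a, b in combinations(ordered, 2)
        if not clearly_different(a, b, min_size_gap)
    ]


# Two bold headings only half a pixel apart are reported as a problem.
styles = [TextStyle(24, "bold"), TextStyle(23.5, "bold"), TextStyle(12)]
for a, b in hierarchy_problems(styles):
    print("Too similar:", a, b)
```

An empty result means every step in the hierarchy is separated from every other by at least one of the four properties.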
As can be seen, the Apple typography is as close to perfect as you could imagine: at least 1 pixel and/or a weight difference between a very restricted number of styles. Microsoft almost gets it right, but with the largest two styles having less than a pixel between them, the difference becomes a little blurred (these differences can be caused by inaccurate CSS inheritance rather than being purposefully designed, but the net result is the same).
37 Signals, although opting for a much larger set of styles, is actually extremely systematic about how the family, weight, style and size cascade down the styles, with a clear hierarchy established.
YouTube also performs well in this test. Unfortunately, possibly as expected, Myspace fails badly, not only with a lack of clear hierarchy, but also by committing the typographical sin of mixing similar typefaces (Arial and Verdana).
These prototype experiments were designed to assess the possibility that a tool could partially critique a design, based on long-established rules of good design. The results are promising, showing at least some relationship between what many people would consider the ‘best’ designs and the output of these algorithms.
This is just the beginning, though. What other tests could be run? How else could this data be used and visualized? Would tests like these, if popular, cause a fear of creativity? Let us know your thoughts in the comments.