Gary Stock of Nexcerpt tells Steve Outing that 90% of some news pages consist of stuff the reader never sees. Only 10% is actual “content”. The rest is structure. This has a few implications.
First, it takes four or five times as long to load the page as it should.
Second, all that markup has to be parsed and the JavaScript executed, which adds processing overhead on top of the download time. Table tags in particular take a long time to process.
Third, this is yet another powerful argument for cascading style sheets, which load only once and persist in the reader’s cache.
I tried this out on the San Jose Mercury News home page, and Stock is right. There are 81756 characters in the page source, of which roughly 8000 (just under 10%) are content, and that count includes navigation text and some script elements.
Here are my (rough) calculations of the ratio of content to total HTML on the home pages of some popular news sites:
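For anyone who wants to repeat the measurement, here is a minimal sketch in Python using the standard library's `html.parser`. It is not the method used for the figures in this post, which were rough hand counts. The names `VisibleTextCounter` and `content_ratio` are made up for illustration, and the sketch assumes that "content" means text outside `<script>` and `<style>` elements, so it will come out somewhat lower than a count that includes script elements.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts characters of text found outside <script> and <style>.

    Hypothetical helper: "content" is assumed to mean text nodes only.
    """

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.text_chars = 0   # running total of content characters

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse whitespace so source indentation is not counted.
            self.text_chars += len(" ".join(data.split()))


def content_ratio(html: str) -> float:
    """Return content characters divided by total characters in the source."""
    if not html:
        return 0.0
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.text_chars / len(html)


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><p>Hello world</p><script>var x=1;</script></body></html>"
    )
    print(f"{content_ratio(sample):.0%} of the sample is content")
```

To try it on a real page, save the page source to a file and pass its text to `content_ratio`. Expect the result to differ from the numbers here, since what counts as "content" is a judgment call.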

  • Wall Street Journal: 22380/59810 = 37%
  • Google News: 16908/62881 = 27%
  • News.com: 9355/46857 = 20%
  • Yahoo News: 11000/56601 = 19%
  • New York Times: 10253/75262 = 14%
  • San Jose Mercury News: 8000/81756 = 10%

It’s hardly a coincidence that the sites with the highest content-to-HTML ratios also have the most news on them. The Wall Street Journal fits nearly three times as much news as the Mercury News into about three-quarters of the page size.
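The percentages in the list and the Wall Street Journal comparison can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the character counts are copied from the list above, and the names `sites` and `percent` are made up for illustration.

```python
# (content characters, total characters) as given in the list above
sites = {
    "Wall Street Journal": (22380, 59810),
    "Google News": (16908, 62881),
    "News.com": (9355, 46857),
    "Yahoo News": (11000, 56601),
    "New York Times": (10253, 75262),
    "San Jose Mercury News": (8000, 81756),
}


def percent(content: int, total: int) -> int:
    """Content as a whole-number percentage of the total page source."""
    return round(100 * content / total)


for name, (content, total) in sites.items():
    print(f"{name}: {content}/{total} = {percent(content, total)}%")

# Compare the Wall Street Journal with the Mercury News.
wsj_content, wsj_total = sites["Wall Street Journal"]
merc_content, merc_total = sites["San Jose Mercury News"]
content_multiple = wsj_content / merc_content  # how much more content
size_fraction = wsj_total / merc_total         # how much smaller the page
print(f"WSJ content is {content_multiple:.1f}x the Mercury News content")
print(f"WSJ page is {size_fraction:.2f} of the Mercury News page size")
```

The two comparison figures come out to about 2.8 and 0.73, which is where "nearly three times" and "three-quarters" come from.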
Imagine a technology that could quadruple your bandwidth (and reduce your overhead) at no additional cost. Would you adopt it?