The dusty archive of websites
The British Library has long had a duty to preserve archival copies of the UK's books and magazines. Its remit has now changed to include "ephemeral materials such as websites".
There are six "Legal Deposit" libraries in the British Isles. The best known is the world-famous British Library in London. The others are the National Library of Scotland, the National Library of Wales, the Bodleian Library Oxford, the University Library Cambridge, and the Library of Trinity College Dublin. Under the Legal Deposit Act, everyone who publishes a book or magazine in the UK must send a copy to the British Library, to be added to the national archive, and must supply additional copies to the other deposit libraries if requested. The British Library adds around three million items a year to its legal deposit collection, which forms a valuable historical record for future generations. But as more and more publishing is done online only, the question arose as to whether these online-only documents should also become part of our historical archive.
At the start of April, the remit of deposit libraries was expanded to include ephemeral documents such as websites, and an amendment to the Copyright Act came into force which allows the deposit libraries to make archival copies of any UK online publication. So far the project has cost £3m and the British Library has said that its first crawl of the UK webspace could be available to researchers by the end of the year.
Such a project is, to say the least, ambitious. The first question is what constitutes a UK website, and how many of them there are. That isn't an easy question to answer. Even counting the total number of websites in existence is difficult, and not just because it changes every minute of the day. Where two domain names show identical information, do you count that as one site or two? Some sites are communities, providing microsites for members and small businesses. Do you count those as one website or a thousand? A conservative estimate is that at the end of 2012 there were around 630 million websites worldwide and at least 50 billion pages.
How many of those 630 million websites are UK sites? This question is even harder. If you only consider domain names ending with dot uk, you exclude all the sites which use a non-geographic domain name, such as skillzone.net, yet you include the sites which use dot uk domain names to pass themselves off as legitimate UK businesses. The physical hosting location is also misleading. Here at SKILLZONE we have hosted sites on our UK server for companies based in Ireland, Cyprus and even Australia. Likewise, it is not unusual for UK sites to use hosting services based in Germany, the Netherlands or the USA. So again, there is no definitive answer, but the best current estimate is that there are about 8 million "UK" websites.
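To make the problem concrete, here is a minimal sketch in Python of the two obvious tests, domain suffix and hosting location, applied to a handful of sites. Apart from skillzone.net, every domain and country code below is an invented placeholder, and the function names are ours for illustration only. It is not how the British Library decides what to crawl.

```python
from dataclasses import dataclass


@dataclass
class Site:
    domain: str
    hosting_country: str  # where the server sits (illustrative values)
    owner_country: str    # where the business is really based (illustrative values)


def uk_by_domain(site: Site) -> bool:
    """Rule 1: a site is 'UK' if its domain name ends with dot uk."""
    return site.domain.lower().endswith(".uk")


def uk_by_hosting(site: Site) -> bool:
    """Rule 2: a site is 'UK' if it is hosted on a server in the UK."""
    return site.hosting_country == "GB"


def misclassified(sites, rule):
    """Return the domains where the rule disagrees with the owner's real location."""
    return [s.domain for s in sites if rule(s) != (s.owner_country == "GB")]


# Only skillzone.net comes from the article; the rest are made-up examples.
SITES = [
    Site("skillzone.net", "GB", "GB"),            # UK business, non-geographic name
    Site("made-up-hotel.co.uk", "DE", "GB"),      # UK business hosted in Germany
    Site("made-up-shop.ie", "GB", "IE"),          # Irish business on a UK server
    Site("made-up-lookalike.co.uk", "US", "US"),  # overseas site using a dot uk name
]

if __name__ == "__main__":
    print("Wrong by domain: ", misclassified(SITES, uk_by_domain))
    print("Wrong by hosting:", misclassified(SITES, uk_by_hosting))
```

Each rule gets two of the four sites wrong, and they are not the same two, which is why neither test on its own gives a trustworthy count.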
Then the question arises: what do you want to archive? If you archive news sites because they report current events relevant to future historians, do you also archive the journals, the blogs, and the tweets? Do you archive sites which are online brochures for businesses and companies, the hotels with dish of the day menus, the technical forums, the football fan sites, and the endless number of sites which mash up content from other sites and throw their own banner ads on top? And do you try to keep an archive of changes to all these sites which, unlike printed books, can change monthly, weekly, daily, even minute to minute?
Laudable as it sounds to archive the UK internet, I'm not sure it would be worth the money it will eat up. The concept of a deposit library for books and magazines was good because, in years past, the inherent difficulties and costs of publishing created a quality filter which kept the volumes down. The costs were shared between the publisher and the library. Most of all, it wasn't technically complicated and it was possible to preserve a copy of all the notable publications, just in case one of them became important to future generations. All of that changes with internet publishing. The deposit libraries now have the legal authority to archive whatever they wish, but I hope they archive wisely, and not just because "that's what we do".
24th April 2013
This article comes from the SKILLZONE email newsletter, published monthly since January 2008, and covering topics related to technology and the internet. All articles and artwork in the SKILLZONE newsletter are original content. If you would like to receive the newsletter direct to your inbox each month, please SUBSCRIBE here. It is free, and you don't get added to any other mailing lists. It uses best-practice confirmed opt-in only, and you may unsubscribe at any time.