Digital preservation Long-term, error-free storage of digital information, with means for retrieval and interpretation, for as long as the information is required.
Metadata Data about data. Metadata describes how and when and by whom a particular set of data was collected, and how the data is formatted.
Semantic Web The Semantic Web is an evolving extension of the World Wide Web in which web content can be expressed not only in natural language, but also in a form that can be understood, interpreted, and used by software agents, thus permitting them to find, share, and integrate information more easily.
XML The Extensible Markup Language (XML) is a W3C-recommended general-purpose markup language that supports a wide variety of applications. Its primary purpose is to facilitate the sharing of data across different information systems, particularly systems connected via the Internet.
The influence of the Internet in our everyday lives is so pervasive that it is hard to credit it as a still-immature technology. Twenty years ago it did not exist. By July 2005, according to Hobbes’ Internet Timeline, there were 353 million Internet hosts, that is, computer systems connected to the network with registered Internet (IP) addresses. The extent of Internet penetration in the developing world is still patchy, although this is a rapidly changing picture. In December 2005, only 2.5% of African households had Internet access, whereas for the United States the figure was 68%, and for the United Kingdom 63% (http://www.internetworldstats.com/).
The origins of the Internet can be traced back to the Cold War and the Space Race. In 1957 the USSR launched the first Sputnik; in response, the USA formed the Advanced Research Projects Agency (ARPA). In 1969 ARPANET was commissioned to provide a computer network for the US Department of Defense which connected the military with university and civilian contractors. The intention was also to provide a distributed system of computers capable of withstanding a nuclear strike. In 1971 the first e-mail program was invented; e-mail rapidly grew to comprise 75% of all traffic on ARPANET, and in 1973 the first international connection was made to link University College London with ARPANET. Within a short space of time academic networks were established and from 1981 BITNET provided e-mail and file transfer (FTP) facilities at the City University of New York.
By the mid-1980s, therefore, the Internet existed as a network of networks across which it was possible to transmit computer files. However, for this to be transformed into the World Wide Web, three things were required: (1) a means of specifying the internet addresses of specific files, (2) a protocol for information transfer, and (3) a uniform way of structuring hypertext documents providing links to other documents. The first was provided by the development of Uniform Resource Identifiers (URIs), and later Uniform Resource Locators (URLs). The domain names within these addresses could be mapped to the numeric addresses of each computer connected to the Internet, and specific files hosted on the computers could be referenced. Second, a HyperText Transfer Protocol (HTTP) provided an agreed means of transmitting a file from the computer, or web server, on which it was held, to the user’s own computer. Third, the specification of HTML, a HyperText Markup Language, provided a standard for machine-readable instructions to software browsers for how to display interlinked text files.
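The three requirements described above can be sketched in a few lines of Python. This is an illustrative sketch only: the address, the page content, and the class name LinkCollector are hypothetical and do not come from any real site, and no network connection is made.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical address and page content, used only for illustration.
URL = "http://www.example.org/reports/trench1.html"
PAGE = (
    '<p>See the <a href="finds.html">finds list</a> and '
    'the <a href="plan.html">site plan</a>.</p>'
)

# (1) Addressing: the URL names the host computer and a file held on it.
parts = urlsplit(URL)
host, path = parts.netloc, parts.path

# (2) Transfer: the text of a minimal HTTP request asking the server for that file.
request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


# (3) Markup: HTML marks which pieces of text are links to other documents.
class LinkCollector(HTMLParser):
    """Collect the target of every hypertext link (<a href=...>) in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


collector = LinkCollector()
collector.feed(PAGE)

print(host, path)
print(collector.links)
```

A browser performs the same three steps each time a link is followed: it resolves the address, requests the file, and interprets the markup it receives.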
In 1991, the World Wide Web was effectively ‘released’ to the world when CERN (the European Organisation for Nuclear Research) made its linemode browser available. This was able to parse and display blocks of text, and to highlight hypertext links, but it was not suitable for a Windows environment. In 1993, access was therefore revolutionized when the US National Center for Supercomputing Applications (NCSA) released Mosaic, the first Windows-based browser. This was followed, the next year, by Netscape Navigator, and eventually by Microsoft’s Internet Explorer. Most early web applications were academic in aim but this soon began to change. In 1993 the website http://www.whitehouse.gov/ was launched, closely followed, in 1994, by the first internet shopping malls.
Archaeology has made innovative use of the Internet for dissemination and communication since the beginning. In common with other disciplines, e-mail was rapidly adopted, first by the academic sector, followed by private individuals. A number of discussion lists, such as ARCH-L and BRITARCH, provided active fora for heated debate, as well as more routine announcements. ARCH-L has online archives going back to May 1992 (http://listserv.tamu.edu/archives/arch-l.html). More focused lists create online communities of archaeological specialists, although some regret the time required just to monitor all the lists in which they have an interest.
The most significant impact of the Internet on archaeology has come about through the use of the Web for publication and dissemination. By creating its own website, any project, whether major state-funded research, or local archaeology society, or community-based, is able to promote itself online. New web tools such as ‘blogs’ (weblogs, which are online journals or diaries) and ‘wikis’ (collaborative online environments to which a number of people can contribute) allow higher levels of interaction. Other innovations, including automated web newsfeeds, webcams, and webcasts, have been used very effectively by fieldwork projects to broadcast regular updates on new discoveries (see Popular Culture and Archaeology).
Enthusiasts have created websites for every possible archaeological subject. While this can be seen as a democratizing process, allowing everyone access to the means of self-publication, the downside is the lack of traditional academic control and review, leading to the easy promulgation of unscientific nonsense. The multivocality allowed by the Internet has resonated with many postmodern archaeological thinkers, but it demands high-level skills of source criticism from its users.
There are, however, a number of authoritative archaeological websites, which have followed traditional publication models. The first peer-reviewed e-journal for archaeology, Internet Archaeology (http://intarch.ac.uk), was first published in 1996. Antiquity (http://antiquity.ac.uk) has also developed a web presence with additional features. Most archaeological journals have responded to the growth of the Internet by providing parallel online editions, using the PDF format. The JSTOR archive (http://www.jstor.org/) now provides access to a large number of scholarly journals, including American Antiquity. However, this is really a means of using the Internet for electronic distribution of a traditional printed journal, rather than electronic publication in its own right. Other projects have been highly innovative in exploiting the potential of the Internet to provide access to elaborate graphics, including multimedia, online Geographic Information Systems (GIS), and searchable databases. The Digital Archaeological Archive of Comparative Slavery (http://www.daacs.org/) provides an excellent example of a highly interactive excavation publication.
The Internet can also provide access to material which would never have been published in traditional form. Archaeology has suffered from a growing problem of gaining access to unpublished grey literature, reporting the results of contract archaeology. In the UK, the OASIS project has provided an index to that literature and is gradually making it available online, together with an online recording form for reporting new fieldwork (http://ads.ahds.ac.uk/project/oasis/). In the United States, the National Park Service has set up NADB-Reports, a bibliographic database of gray literature. The Internet has also allowed unparalleled access to primary sources. In the UK, the Archaeology Data Service (ADS) provides online access to a growing number of archaeological fieldwork archives (http://ads.ahds.ac.uk/); in the USA the Perseus Project has a particular focus on classical texts and archaeology (http://www.perseus.tufts.edu/).
However, the majority of archaeological content available via the Internet is only accessible by searching online databases in what is known as the ‘deep web’, as opposed to the ‘surface web’ of indexable web pages and images. Examples of deep web content include the National Monuments Records, or sites and monument inventories of a number of countries or regions. Among the first to go online was Scotland, which now provides sophisticated map-based searching of its CANMORE inventory (http://www.rcahms.gov.uk/). The online database of the Portable Antiquities Scheme (http://www.finds.org.uk), which holds information about finds reported by members of the public under the Treasure Act for England and Wales, is an interesting example of a community-fed online resource.
As the Internet grows as a primary means of providing access to archaeological information, fears have been expressed that it disenfranchises those who lack the technical skills or high-speed connections that are commonly assumed. Although there are parts of the developing world, as well as sections of the population of the developed world (particularly the aged), for whom this is true, it is not clear that these problems are greater than those of access to academic libraries, and in most cases they are substantially less.
As the quantity of Internet information grows, another difficulty is finding what you want. A number of sites have developed which attempt to provide guides to archaeology on the Internet, or a subset of it. The Archaeology pages at About.com (http://archaeology.about.com/) or those provided by the popular archaeology magazine Current Archaeology (http://www.archaeology.co.uk/) are among the most successful. The former has a useful set of links to archaeological blogs. Other sites, sometimes described as web gateways, provide lists of links to sites, often with summary abstracts. The ARCHNET site, currently hosted at Arizona State University (http://archnet.asu.edu/), is one of the most useful resources; one of the most innovative sites is that provided by the British Archaeological Jobs Resource (http://www.bajr.org/). However, one of the greatest problems facing these sites is keeping up to date with the exponential growth of the web; most depend upon considerable investment of time and expertise, and it is difficult to see how they can be sustainable in the long term without institutional support. Most Internet users will also turn to a search engine to find what they are looking for, and the dominance of ‘Google’ makes it the first port of call for most people looking for archaeology on the web. Google is an extremely valuable tool in some circumstances and is difficult to beat as a free text index, but it does not provide access to the ‘deep web’, and it cannot take advantage of structured information held in databases. Thus it cannot distinguish, for instance, between Barrow as a place, a person, or an excavation tool. Library catalogs use simple metadata (‘data about data’) for author, title, date, place of publication, and subject, in order to help us find the book we want. The Dublin Core has been adopted as the agreed metadata standard for resource discovery for electronic resources. The Open Archives Initiative has also agreed a standard (OAI-PMH) for metadata harvesting, which allows the cross-searching of structured information held in distributed databases. Nonetheless, such searching can be effective only where there is adherence to standards and the adoption of agreed archaeological vocabularies and subject thesauri.
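The difference between free-text and structured searching, and the form of a harvesting request, can be illustrated with a minimal Python sketch. The two records, the subject terms, the function names, and the repository address are all hypothetical; only the Dublin Core element names (title, subject, type) and the OAI-PMH parameters (the ListRecords verb and the oai_dc metadata prefix) are taken from the published standards.

```python
from urllib.parse import urlencode

# Hypothetical catalogue records; the keys are Dublin Core element names.
records = [
    {
        "title": "Excavation of a Bronze Age barrow",
        "subject": "barrow (monument)",
        "type": "Text",
    },
    {
        "title": "A parish history of Barrow",
        "subject": "Barrow (place)",
        "type": "Text",
    },
]


def free_text_search(records, word):
    """Match a word anywhere in the title, as a free-text index would."""
    return [r["title"] for r in records if word.lower() in r["title"].lower()]


def subject_search(records, term):
    """Match only records whose subject carries the agreed vocabulary term."""
    return [r["title"] for r in records if r["subject"] == term]


# An OAI-PMH harvesting request is an ordinary web address with parameters.
BASE_URL = "http://repository.example.org/oai"  # hypothetical repository
harvest_url = BASE_URL + "?" + urlencode(
    {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
)

print(free_text_search(records, "barrow"))            # both records match
print(subject_search(records, "barrow (monument)"))   # only the monument
print(harvest_url)
```

The subject search succeeds only because both records draw their subject terms from the same controlled list, which is why agreed vocabularies matter as much as the harvesting protocol itself.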
It is envisaged that in the future, Internet tools will be able to make much more effective use of structured data. The mark-up language HTML controls only how a web page is rendered by the browser software. Tim Berners-Lee’s vision for a Semantic Web shows how XML will be used to tag the content of web pages, allowing more automated harvesting and assimilation of relevant information. In the United States the term ‘cyberinfrastructure’ has been used to describe a web-based research environment in which multiple datasets can be integrated and analyzed; elsewhere the term ‘e-Science’ has been adopted to reflect new ways of undertaking collaborative research using the Internet.
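The contrast between presentational and content mark-up can be shown with a short Python sketch. The sentence, the element names (report, monument, place, year), and the values are invented for illustration and do not follow any published archaeological schema.

```python
import xml.etree.ElementTree as ET

# Presentational mark-up: the tags say only how the words should look.
HTML_VERSION = "<p>A <b>barrow</b> was excavated at <i>Barrow</i> in 1957.</p>"

# Content mark-up: hypothetical tags that say what each piece of text means.
XML_VERSION = """
<report>
  <monument type="barrow">A barrow</monument>
  <place>Barrow</place>
  <year>1957</year>
</report>
"""

root = ET.fromstring(XML_VERSION.strip())

# Software can now tell the monument type from the place name.
monument_type = root.find("monument").get("type")
place = root.findtext("place")
year = int(root.findtext("year"))

print(monument_type, place, year)
```

In the HTML version a program sees only bold and italic text; in the XML version the same word is unambiguous because its role is stated by the tag that contains it.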
Finally, there is an underlying concern that because of the lack of central control of the Internet there are problems with the longevity and sustainability of archaeological pages. Many sites are transient and web addresses change. The ‘Internet Archive’ takes regular snapshots of many websites but there are long-term issues about the preservation of digital data. In the UK the ADS is seeking to address some of these problems, and other countries are becoming aware of the need to archive digital data, but there are inevitably challenges in coordinating preservation activity at an international level, while the Internet continues to evolve at such a rapid rate.
See also: Interpretation of Archaeology for the Public; Popular Culture and Archaeology.