Web Archiving @ the University of Melbourne

Advanced Systems Access

1. Machine Readable Collection Description

A machine may want to read or use the collection-level metadata. This is available as an XML feed, obtained by parameterising a URL with the relevant account identifier and the collection identifier:
http://www.archive-it.org/archiveit/feed.xml?accountId=ACCOUNTID&sc=COLLECTIONID

eg http://www.archive-it.org/archiveit/feed.xml?accountId=197&sc=938

For the University of Melbourne, our ACCOUNTID is 197 and the main collection is 938.
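
The feed URL can also be assembled programmatically. The following is a minimal Python sketch, assuming only the URL pattern shown above; the function name collection_feed_url is illustrative and not part of any Archive-IT library.

```python
from urllib.parse import urlencode

# Base address of the collection feed, as given in the text above.
FEED_BASE = "http://www.archive-it.org/archiveit/feed.xml"


def collection_feed_url(account_id, collection_id):
    """Build the XML feed URL for one collection (hypothetical helper)."""
    query = urlencode({"accountId": account_id, "sc": collection_id})
    return f"{FEED_BASE}?{query}"


# University of Melbourne: account 197, main collection 938.
print(collection_feed_url(197, 938))
```

The printed URL should match the example given above for account 197 and collection 938.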

2. OAI-PMH Data Provider

A web archive collection can be exposed to federated collection catalogue services through industry-standard protocols such as OAI-PMH. There is an OAI-PMH data provider built in to Archive-IT, supporting collection metadata only. The OAI-PMH repository base URL is http://oai.archive-it.org:7090/oai

This service is designed for an OAI-PMH harvester, which can issue the following six types of requests (each request string is appended to the base URL). All responses are in XML.

?verb=Identify
    returns basic information about the OAI-PMH repository
?verb=ListMetadataFormats
    returns a list of all metadata formats available in the repository
?verb=ListIdentifiers&metadataPrefix=oai_dc 
    returns a list of all record identifiers (Archive-IT's are oai:archive-it.org:archiveit/[collectionid]) 
    with date of last modification
?verb=ListRecords&metadataPrefix=oai_dc 
    returns all of the metadata records 
?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:archive-it.org:archiveit/938 
    returns the metadata record for collection 938
?verb=ListSets 
    returns nothing for now. Institution-based "sets" are planned very soon, so that all of the
    records for a given institution can be pulled out (via a "set=" argument to ListRecords above).

For more information on OAI-PMH please see Open Archives Initiative.
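
The six request types can be generated from the base URL with a small helper. This is a sketch only, assuming the base URL and identifier pattern quoted above; the names oai_request_url and record_identifier are illustrative and not part of any Archive-IT or OAI-PMH library.

```python
from urllib.parse import urlencode

# OAI-PMH repository base URL, as given in the text above.
OAI_BASE = "http://oai.archive-it.org:7090/oai"


def record_identifier(collection_id):
    """Return the OAI identifier for a collection, following the pattern above."""
    return f"oai:archive-it.org:archiveit/{collection_id}"


def oai_request_url(verb, **params):
    """Append an OAI-PMH verb and its arguments to the base URL (hypothetical helper)."""
    # Keep ':' and '/' unescaped so identifiers read as they do in the list above.
    query = urlencode({"verb": verb, **params}, safe=":/")
    return f"{OAI_BASE}?{query}"


print(oai_request_url("Identify"))
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
print(oai_request_url("GetRecord", metadataPrefix="oai_dc",
                      identifier=record_identifier(938)))
```

Each printed line is a complete request URL that a harvester, or a browser, could fetch to receive the XML response.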

3. Full Text Search by URL of Selected Archive Collection

Since Archive-IT also supports search and retrieve via URL (SRU), it is possible to construct a URL that both points to a collection and contains a query string. The syntax is:

http://index.archive-it.org:8080/nutchwax/?query=QUERY_TERM&go=Search+Web+Archive&collection=COLLECTION_ID
where QUERY_TERM is the normalised form of the search query (spaces replaced by +), and COLLECTION_ID is the identifier of the particular collection.
As an example, the following URL searches the main collection (938) for the placeholder query "Enter search terms":

http://index.archive-it.org:8080/nutchwax/?query=Enter+search+terms&go=Search+Web+Archive&collection=938
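
The normalisation of the query can be left to a standard URL-encoding routine. The sketch below assumes the URL pattern shown above and nothing more; full_text_search_url is an illustrative name, not an Archive-IT function.

```python
from urllib.parse import urlencode

# Full text search endpoint, as given in the text above.
SEARCH_BASE = "http://index.archive-it.org:8080/nutchwax/"


def full_text_search_url(query, collection_id):
    """Build a full text search URL for one collection (hypothetical helper)."""
    params = {
        "query": query,              # urlencode turns spaces into '+'
        "go": "Search Web Archive",  # fixed value taken from the syntax above
        "collection": collection_id,
    }
    return f"{SEARCH_BASE}?{urlencode(params)}"


print(full_text_search_url("Enter search terms", 938))
```

With these arguments the result should be identical to the example URL above.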

4. URL Search by URL of Selected Archive Collection

Again, since Archive-IT supports search and retrieve via URL (SRU), it is possible to construct a URL that both points to a collection and contains a specific target URL. The syntax is:

http://wayback.archive-it.org/COLLECTION_ID/query?type=urlquery&url=QUERY_URL&go=Search+Web+Archive&type=urlquery

where QUERY_URL is the normalised (percent-encoded) form of the URL, and COLLECTION_ID is the identifier of the particular collection (938 in most cases). As an example, for http://www.unimelb.edu.au in collection 938:

eg http://wayback.archive-it.org/938/query?type=urlquery&url=http%3A%2F%2Fwww.unimelb.edu.au&go=Search+Web+Archive&type=urlquery

or restricted to a date range, by adding the date parameter (YYYY, YYYY-MM or YYYY-MM-DD):

http://wayback.archive-it.org/938/query?type=urlquery&url=http%3A%2F%2Fwww.unimelb.edu.au&type=urlquery&date=2007
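
The percent-encoding of the target URL and the optional date can be handled in code. This is a sketch under stated assumptions: it follows the URL pattern shown above, but emits type=urlquery once rather than twice (assuming the repetition in the examples is redundant), and it passes the date through without validating it. The name wayback_query_url is illustrative only.

```python
from urllib.parse import urlencode

# Wayback endpoint for Archive-IT collections, as given in the text above.
WAYBACK_BASE = "http://wayback.archive-it.org"


def wayback_query_url(target_url, collection_id=938, date=None):
    """Build a URL search link for one collection (hypothetical helper)."""
    params = [
        ("type", "urlquery"),          # assumption: one occurrence is enough
        ("url", target_url),           # urlencode percent-encodes ':' and '/'
        ("go", "Search Web Archive"),
    ]
    if date is not None:
        # Expected as YYYY, YYYY-MM or YYYY-MM-DD; not validated here.
        params.append(("date", date))
    return f"{WAYBACK_BASE}/{collection_id}/query?{urlencode(params)}"


print(wayback_query_url("http://www.unimelb.edu.au"))
print(wayback_query_url("http://www.unimelb.edu.au", date="2007"))
```

If the service turns out to require the second type=urlquery, another ("type", "urlquery") pair can be appended to the list.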

