Compliance with PROV Web Archiving
Draft Version 1.0
As stated in the PROV Advice to Agencies 20a: Web-generated Records Version 1, 2007 – “Agencies are reminded that they have a legal obligation to create and maintain records of website activity. Public records will be created, and need to be identified, captured and maintained appropriately, in accordance with PROV Standards and all other relevant legislation and policy requirements” (p.3).
| 1. | Ensure that all staff and management who have a role in the creation, maintenance and functioning of agency websites understand that websites create public records and are therefore bound by a range of legal and recordkeeping requirements. |
| UOM response: The UOM has developed a Web Archiving Policy and a Web Archiving website aimed at staff involved in the creation, management and functioning of UOM websites. The aim of the policy and its associated procedures is to help staff understand that websites create corporate University records and are therefore bound by a range of legal and recordkeeping requirements. | |
| 2. | Assess the functions delivered or documented by their websites. |
UOM response: Records Services has created an Excel spreadsheet that contains the UOM's Enterprise Classification Scheme (ECS) and lists each of its functions and activities. Against each function, a number of 'seed' URLs have been listed (a 'seed' is the initial starting point of a Crawl in the process of creating a Collection of websites). For example, two 'seeds' are currently identified for the ALUMNI RELATIONS function: http://www.unimelb.edu.au/alumni/ and http://www.unimelb.edu.au/advancement/. These URLs were chosen as 'seeds' because most of the pages containing information or records for this function at the activity level appear to stem from one of them. Activities found under the ALUMNI RELATIONS function include, for example:
Bequests: http://www.unimelb.edu.au/alumni/giving/
Donations: http://www.unimelb.edu.au/alumni/giving/
Procedures: http://www.unimelb.edu.au/advancement/manual/policy_procedures.html
As these examples show, sample URLs can be matched against each activity, and in many cases the URL for an activity stems from the higher-level URL matched against the FUNCTION.
The spreadsheet, however, is not comprehensive. It does not attempt to list and appraise every website on the University of Melbourne domain, nor to match every possible URL against a function or activity. Its goal is to provide a sample listing of URLs that can be checked against quarterly web crawls to ensure that information (and web records) related to core University functions is being captured. It is anticipated that this area of appraisal will continue to be refined over time. At this early stage, however, the spreadsheet does provide evidence of Records Services' ability to assess the functions delivered or documented on the website. |
|
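As an illustration only, the check described for item 2 (confirming that sample activity URLs stem from a function's seed and were picked up by a quarterly crawl) could be sketched in Python roughly as follows. The spreadsheet extract, the host-plus-path-prefix rule used to decide whether a URL "stems from" a seed, and the set of crawled URLs are all assumptions; this is not the actual Records Services spreadsheet or ArchiveIT output.

```python
from urllib.parse import urlsplit

# Hypothetical extract of the appraisal spreadsheet: function -> seed URLs.
SEEDS = {
    "ALUMNI RELATIONS": [
        "http://www.unimelb.edu.au/alumni/",
        "http://www.unimelb.edu.au/advancement/",
    ],
}

# Hypothetical sample URLs matched against (function, activity) pairs.
SAMPLES = {
    ("ALUMNI RELATIONS", "Bequests"): "http://www.unimelb.edu.au/alumni/giving/",
    ("ALUMNI RELATIONS", "Donations"): "http://www.unimelb.edu.au/alumni/giving/",
    ("ALUMNI RELATIONS", "Procedures"):
        "http://www.unimelb.edu.au/advancement/manual/policy_procedures.html",
}


def stems_from(url: str, seed: str) -> bool:
    """Assumed rule: same host, and the URL path starts with the seed path."""
    u, s = urlsplit(url), urlsplit(seed)
    return u.netloc == s.netloc and u.path.startswith(s.path)


def check_samples(samples, seeds, captured):
    """Return (function, activity, url, under_seed, was_captured) per sample."""
    report = []
    for (function, activity), url in samples.items():
        under_seed = any(stems_from(url, seed) for seed in seeds.get(function, []))
        report.append((function, activity, url, under_seed, url in captured))
    return report


if __name__ == "__main__":
    # Hypothetical set of URLs captured by one quarterly crawl.
    crawl = {"http://www.unimelb.edu.au/alumni/giving/"}
    for row in check_samples(SAMPLES, SEEDS, crawl):
        print(row)
```

A sample that sits under its seed but is missing from the crawl (the Procedures page in this made-up crawl) is the kind of gap the quarterly check is meant to surface.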
| 3. | Based on the functions identified, perform a risk analysis relating to the web content (which should include all relevant regulatory requirements, including any PROV Retention & Disposal Authorities). |
UOM response: Records Services is currently (as at January 2008) reviewing its Retention & Disposal Authorities; the review is now due for completion in March/April 2008. As a result of this review, each function/activity combination will have multiple disposal sentences associated with it. In principle, this means that any URL attached to a specific function/activity combination will also have a disposal option attached to it. The UOM Retention & Disposal Authorities are sourced from, and benchmarked against, the PROV Retention & Disposal Authorities. As a further risk assessment measure, a decision has been made to capture the whole of the UOM domain on a quarterly basis. Although crawling the entire UOM domain daily was not resource efficient, any interval longer than quarterly was considered risky, and it is reasonable to expect that most significant changes to pages on the UOM domain will be captured quarterly. The spreadsheet described above, which matches URLs against the functions/activities in the UOM ECS, has also highlighted a number of URLs that may, on the basis of a risk assessment, need to be captured more frequently. For example, the University's front page is to be captured daily because of several factors, including its potential long-term historical interest and short-term reputational risk management. Although work is underway to address this recommendation, the UOM understands that further work and development may be required in this area. Progress will be recorded on this webpage as it is made. |
|
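The capture-frequency rule described for item 3 (the whole domain captured quarterly, with risk-flagged URLs captured more often) can be expressed as a simple lookup. This is a hypothetical Python sketch: the `Frequency` names, the override table, and the front-page URL used in it are assumptions for illustration, not UOM configuration.

```python
from enum import Enum


class Frequency(Enum):
    """Capture frequencies, valued as captures per year (assumed encoding)."""
    DAILY = 365
    WEEKLY = 52
    MONTHLY = 12
    QUARTERLY = 4


# Baseline from the risk assessment: the whole UOM domain is captured quarterly.
DEFAULT = Frequency.QUARTERLY

# Hypothetical overrides for URLs flagged as higher risk in the spreadsheet.
# The front-page URL here is an assumption for illustration.
OVERRIDES = {
    "http://www.unimelb.edu.au/": Frequency.DAILY,
}


def capture_frequency(url: str) -> Frequency:
    """Return the flagged frequency for an exact URL match, else the quarterly baseline."""
    return OVERRIDES.get(url, DEFAULT)


if __name__ == "__main__":
    for url in ("http://www.unimelb.edu.au/", "http://www.unimelb.edu.au/alumni/"):
        print(url, capture_frequency(url).name)
```

Exact matching flags individual pages such as the front page; if whole sections were flagged instead, a prefix rule would be the natural substitute.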
| 4. | Determine what content from websites should be and will be captured as records and how often this will occur. |
UOM response: Although the quarterly captures of the UOM domain will go a long way towards recording the University's ongoing website activity, the question of identifying UOM 'web records' still requires further development (as at January 2008). Matching URLs against the functions/activities of the UOM ECS provides a starting point for identifying UOM web records, although it is openly acknowledged that this method may not yet identify every UOM web record in existence.
Many of the activities listed in the UOM ECS create transactions that may never appear in published form. A good example is the LEGAL SERVICES function: many of the activities listed under it (for example, enquiries, claims and investigations) would never be represented on a UOM website. These records would, for the most part, be considered confidential and would exist only as paper or electronic files, kept safe and secure in the University's corporate recordkeeping system.
Other activities listed in the UOM ECS could potentially be used to flag or categorise content related to a particular subject. Many activities under the PROPERTY, ASSETS AND EQUIPMENT function, for instance, relate readily to content on UOM webpages: a search across the UOM domain may easily find pages that mention insurance, facilities hiring, fit-outs, maintenance and so on. In many instances, however, these pages simply provide information about the activities rather than recording that a particular activity has taken place. Information on a website about whom to contact with maintenance enquiries, for example, differs from the records Property and Campus Services might retain to document the actual maintenance work, such as invoices and work plans. Again, within the UOM context it is highly unlikely that records of these kinds of transactions would be found on a UOM website.
For some activities, however, such as procedures, publishing and policy (activities aligned against a number of different functions in the UOM ECS), the material published to the web is evidence of the activity taking place and therefore becomes a record in its own right. Within the UOM context, therefore, each quarterly capture of the UOM domain is considered a record, as it documents the University's website activity at that particular time. This is particularly relevant in cases relating to the protection of the University's reputation. These quarterly captures may also become records in their own right, with long-term historical value for researchers who wish to track the history of the UOM website or particular changes in organisational structure or campus life. In terms of corporate recordkeeping, the UOM will identify 'web records' as defined by the UOM ECS. |
|
| 5. | Formulate a strategy to facilitate the capture, maintenance and integration of records from websites (this may be assisted with reference to the companion document to this PROV Advice, Advice to Agencies 20b: Technical Issues for Capturing Web Records). |
| UOM response:
The strategy to facilitate the capture, maintenance and integration of records from websites at the UOM is based on the Web Archiving Policy and a service agreement with the Internet Archive that enables the UOM to use the “ArchiveIT” software to capture webpages from the UOM domain.
Details of the University's technical partnership with the Internet Archive and of our software service agreement to use “ArchiveIT” are documented under the 'technology' part of our website. The Service allows Records Services staff (only) to log in and create web collections (“Collections”), catalogue the websites associated with a Collection, archive the websites in a Collection, monitor the archiving process, search and browse the Collections when complete, and administer access to these Collections. The Internet Archive will host and manage any and all Collections created under the service agreement for the duration of the agreement and will make the Collections publicly accessible. The UOM has determined (as at January 2008) to create the following Collections: Quarterly Collections, Daily Collections, Weekly Collections, Monthly Collections, and various 'snapshot' or 'ad-hoc' Collections. The name of each Collection reflects the frequency of its capture; Quarterly captures, for example, occur four times a year, starting in January. The “ArchiveIT” software can record metadata against both Collections and seeds, and information about the relevant function/activity combinations will be associated with each Collection via this metadata. Metadata about each Collection will also be registered in the UOM's corporate recordkeeping system (TRIM). |