Imaging Centre

Test Cases

Some materials have already been rendered into electronic form and samples follow. These examples illustrate the scanning system's capability to produce a Word document which represents the original text and formatting with a good degree of accuracy.

Test case 1 – PhD Thesis

This thesis was produced in the UK in 1996 using a word-processing program. The author found himself in Australia with only his bound copy of the thesis, the electronic version having long since succumbed to format creep. The Imaging Centre thanks the author for his kind permission to reproduce these pages.

Two pages are featured: one from the main body of the text, which demonstrates the rendering of several aspects of formatting common in academic writing (varied line spacing, indentation, different point sizes, italics and footnotes), and one that shows the rendering of a bibliography. Both the PDF image file and the Word file of these sample pages are shown so that you can assess the accuracy of the rendering for yourself.

Please note that the Word document is “as is” from the scanner: no editing has been done. There is one OCR error on this page, which Word's spell-checker will find (unless you have “divmitory” saved in your dictionary). In addition, the footnote is not “linked”, one whole paragraph has come through in italics rather than just one word, and one full stop has come through as a comma, so some editing is needed.

Text page: Image, Word document

Bibliography page: Image, Word document

Test case 2 – Archive box list

Detailed lists of the contents of archive boxes are produced as part of the control documentation when materials are transferred for intermediate or permanent storage. If a list exists only in analogue form, its information cannot easily be transferred into databases to facilitate searching. One such box list has been scanned and, because the information is largely in table form, the text was copied and pasted from Word into Excel.

Please note that the Word document is “as is” from the scanner: no editing has been done. There are OCR errors in the headers of these pages, caused by shading on the original document. Such shading can be reduced further than it was here, so the OCR could potentially be improved in cases such as this. Nevertheless, little effort would be needed to edit the Word version: correct the OCR once in the table header, delete the other occurrences of the header, and then transfer the table to Excel. This editing would be the work of moments. Please note also that the last page of the document was not properly formatted when it was originally typed.

Box list: Image, Word document, Excel spreadsheet

 
