Outsourcing a digitisation assignment

From Tracks
Revision as of 15:09, 12 June 2024 by Nastasia

It is often recommended to outsource the digitisation of your archive or collection to an external partner. An external partner has the expertise to deliver high-quality digitised files, but there are many things to consider.
In this article, you’ll learn:

  • What is the advantage of outsourcing a digitisation project?
  • What agreements should you make with the provider?
  • How do you follow up on the digitisation process?
  • How do you perform quality control after digitisation?

Outsourcing a digitisation assignment means asking an external company to digitise your archive or collection for you.

Because specialists take care of the work, you are more likely to obtain high-quality reproductions. Outsourcing is therefore certainly recommended if you have large amounts of valuable content, or items that you simply can’t digitise yourself, for example because they are too large or come in tricky formats such as video tapes.

But be careful: outsourcing a digitisation project isn’t as easy as simply calling a company and asking them to do it for you. Many issues need to be settled as far as possible before you request quotes from suppliers, and then included in any agreements you make.

Outsourcing a digitisation assignment involves three phases: before, during and after digitisation.

  • Before: prepare your content, look for a supplier, make agreements, and perform initial tests;
  • During: follow up any queries from the supplier and carry out interim checks;
  • After: check that all your content has been returned to you correctly, and carry out final quality control checks on the reproductions.


Foreword

Before going any further, please read through the general principles for high-quality digitisation, explained in the High-quality text and image digitisation section. The general requirements for good digitisation (sufficient storage space, descriptions, reproduction quality and file format quality) also apply here.

Before digitisation: agreements with the supplier

Practical agreements

Make sure there are clear agreements in place for the logistics, pick-up, delivery date, contact persons, digital file delivery method, etc.

Rights

Agree with the supplier that they cannot claim any intellectual property or usage rights to the reproductions. If your collection contains privacy-sensitive content, ask for a confidentiality clause.

Logistics and storage conditions

Make sure your supplier clearly describes where the originals will be stored and how they will be transported. It’s advisable to draw up a condition report for the collection before it leaves your premises, so that you can check afterwards that nothing has been damaged.

Conditions about completeness

Ask for guarantees that the supplier will scan everything properly. To be able to check this, describe and record your collection in as much detail as possible (see the High-quality text and image digitisation section). This lets you check not only that everything has been digitised properly, but also that the supplier has returned all the originals to you.

Example spreadsheet for digitising slides. Each slide is given a unique number (left column) and an indication of the number of slides per holder (fourth column), so you can check that all slides have been scanned and returned. The register also gives the supplier clear instructions about what filenames need to be given to the reproductions (third column).
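If the register is kept as a spreadsheet, part of the completeness check can be automated. The Python sketch below is one possible approach, not a TRACKS tool: it assumes the register has been exported to CSV with a column named filename (a hypothetical column name; adjust it to your own register) and that filenames are unique across the whole delivery, so only names are compared, not folders.

```python
import csv
from pathlib import Path


def compare_register(register_csv: Path, delivery_dir: Path,
                     column: str = "filename") -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) filenames.

    missing: listed in the register but not delivered.
    unexpected: delivered but not listed in the register.
    """
    expected = set()
    with register_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = (row.get(column) or "").strip()
            if name:  # skip rows without a filename, e.g. notes
                expected.add(name)
    # Search recursively so nested folder structures are covered too.
    delivered = {p.name for p in delivery_dir.rglob("*") if p.is_file()}
    return expected - delivered, delivered - expected
```

Calling compare_register(Path("register.csv"), Path("delivery")) returns two sets: the files the supplier still owes you, and delivered files that do not appear in the register, which often points to a naming error.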

Conditions about the reproduction quality and process

  • Make clear agreements about the resolutions, bit depths and colour profiles to be used for the digitisation.
  • Make sure the supplier clearly describes what recording equipment they are going to use.
  • The supplier should state that the equipment will be calibrated every day at the start of the process, and again whenever the equipment is restarted or settings are changed.
  • Ask how the supplier is going to check the capture and calibration quality, and ask them to provide evidence of their control processes.
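When agreeing on resolution and bit depth, it can help to translate them into pixel dimensions and file sizes, which you can later compare with the delivered files and with your available storage. The Python sketch below does that arithmetic. The A4 size and 300 ppi in the example are illustrative values only, not a recommendation; the relevant image quality standard specifies the required resolution per type of material.

```python
MM_PER_INCH = 25.4


def pixel_dimensions(width_mm: float, height_mm: float, ppi: int) -> tuple[int, int]:
    """Pixel dimensions of a scan of an original with the given physical size."""
    return (round(width_mm / MM_PER_INCH * ppi), round(height_mm / MM_PER_INCH * ppi))


def uncompressed_size_bytes(width_px: int, height_px: int,
                            channels: int, bits_per_channel: int) -> int:
    """Size of the raw pixel data of an uncompressed image (headers and metadata ignored)."""
    return width_px * height_px * channels * bits_per_channel // 8
```

For an A4 original (210 × 297 mm) at 300 ppi, pixel_dimensions(210, 297, 300) gives (2480, 3508). As 24-bit RGB (3 channels of 8 bits) that is 26,099,520 bytes of pixel data, roughly 26 MB per uncompressed master file; at 16 bits per channel the size doubles.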

Image quality standards

Include in your specifications that the tenderer must comply with internationally recognised standards such as Metamorfoze, FADGI or Digitisation guidelines for photographic materials. This obliges the supplier to follow these standards and carry out the quality controls they prescribe.

Post-processing

Agree whether post-processing will take place, and how it should be done (e.g. straightening, cropping, sharpening). If you want to be sure that a specific process is not applied, state this explicitly.

For text documents, it’s a good idea to have OCR carried out straight away. OCR (Optical Character Recognition) converts the text in your document into machine-readable text, so that you can also search for words in your reproductions. This is often possible at little extra cost. Agree how the OCR data will be delivered. A text file per scan is the minimum requirement and should always be requested; in addition, you can for example have the OCR text embedded in a PDF for optimum searchability.

The deliverable files

Agree whether the supplier will deliver master, archive and/or reference files. (See the High-quality text and image digitisation section for more information.) Agree which file formats will be used: uncompressed baseline TIFF v6 for master and archive files, JPEG for reference files if they contain image content, and PDF for reference files if they have text content.

Make sure the supplier also delivers a checksum for each file, so you can check that no errors have occurred, e.g. when copying files. The easiest way to do this is to ask your supplier for a text document or spreadsheet listing all files with their checksums.

An example text file with checksums per file. You can use these checksums to trace any errors in the copying process. It’s also useful for checking that your files have not been modified.
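A checksum file like this can be checked automatically. The Python sketch below assumes the common format produced by tools such as sha256sum or md5sum (one line per file: the checksum, whitespace, then the path relative to the delivery folder) and SHA-256 as the default algorithm; agree the actual algorithm and format with your supplier. On Linux, running sha256sum -c manifest.txt from the delivery folder performs the same check.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Compute the checksum of one file."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large TIFF masters don't have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest: Path, base_dir: Path, algorithm: str = "sha256") -> list[str]:
    """Return a list of problems; an empty list means every listed file matched."""
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # '*' marks binary mode in sha256sum/md5sum output
        path = base_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif file_digest(path, algorithm) != expected.lower():
            problems.append(f"checksum mismatch: {name}")
    return problems
```

verify_manifest(Path("delivery/manifest.txt"), Path("delivery")) returns an empty list if every listed file is present and unchanged; otherwise it names the missing and altered files. Pass algorithm="md5" if the supplier delivers MD5 checksums.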

The metadata

Agree how the supplier needs to name the files and what the folder structure should be. Ideally, draw up a spreadsheet listing all the content that the supplier needs to digitise, with the required filename for each item in a separate column. You can find an example spreadsheet for magazine digitisation on the VAi website.
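If you agree a naming convention, delivered filenames can be checked against it with a regular expression. The convention in this Python sketch (a collection code in capitals, a five-digit item number, a three-digit page number and the .tif extension, e.g. ARCH_00012_001.tif) is purely hypothetical; replace the pattern with the convention you actually agreed.

```python
import re
from collections.abc import Iterable

# Hypothetical convention: <COLLECTION>_<5-digit item>_<3-digit page>.tif
NAME_PATTERN = re.compile(r"[A-Z]{2,10}_[0-9]{5}_[0-9]{3}\.tif")


def invalid_names(filenames: Iterable[str]) -> list[str]:
    """Return the filenames that do not follow the agreed convention, in input order."""
    return [name for name in filenames if not NAME_PATTERN.fullmatch(name)]
```

For example, invalid_names(p.name for p in Path("delivery").rglob("*") if p.is_file()) lists every delivered file whose name breaks the convention, including files with an unexpected extension.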

If you want the supplier to add any other metadata during the scanning process, then make clear agreements about how they will enter and deliver this, and how they will check the quality.

You can also ask the supplier to embed certain technical and content-related metadata in the files. See the Embedded metadata for photos section for more information.

A test phase

It is recommended to arrange a test phase with the supplier, so you can observe the digitisation process in person on site. Have them digitise a number of items as a test first. If you are not satisfied, you can still make adjustments until you have a ‘reference scan’ you are happy with. This sets the bar for the quality of all subsequent digitisation.

Ensure adequate time for checking

Ideally, you should check that the supplier does everything you have asked for. See below for more information about how to tackle the quality control.

Agree with your supplier on a period in which you can check the deliveries. Preferably, the supplier will also provide interim results over the internet, so you can check them straight away and intervene if anything goes wrong. Agree in advance with your supplier which methods and software you will use to check the content.

During the digitisation

Follow up queries from the supplier and ideally carry out interim quality controls while the digitisation process is still ongoing. These tips can help:

  • Spend adequate time determining the reference scan;
  • The sooner a fault is noticed, the better. Ask your supplier to provide you with interim results, so you can respond to any errors straight away. Target scans, which test the correctness of the calibration, can for example be sent over the internet immediately after calibration and checking. Generated reference files can also be sent over the internet, so you can assess the visual quality straight away too;
  • If you have made specific requests for post-processing after the digitisation (e.g. to add OCR and metadata), then add an intermediate stage to check the digitisation quality. Only give your authorisation to proceed with the further post-processing of the files once you are certain that the digitisation itself has been done properly. Otherwise you run the risk of the supplier spending time processing faulty files, which can lead to disputes later on.

After the digitisation: quality control

Outsourcing a digitisation assignment saves you a lot of time because you don’t have to create the reproductions yourself, but you do need to allocate lots of time for the quality control if you want to do it properly. Do not underestimate this; all kinds of things can go wrong. Generally speaking, you should check the content for:

  • completeness;
  • image quality;
  • correctness of file format;
  • post-processing quality (e.g. OCR);
  • metadata quality.

Content completeness

This shouldn’t take long if you have drawn up a good list of all the content in advance. Check that each folder contains the same number of files as there are listed items (pages, photos, etc.).
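If your list records how many items each folder should contain (for example the number of pages per magazine issue), this count can be automated. In this Python sketch the expected counts are passed as a dictionary mapping folder name to number of files, and only files with a given extension (.tif by default) are counted; both are assumptions about how your delivery is organised.

```python
from pathlib import Path


def count_mismatches(delivery_dir: Path, expected: dict[str, int],
                     suffix: str = ".tif") -> dict[str, tuple[int, int]]:
    """Return {folder: (expected, actual)} for every folder whose file count differs."""
    mismatches = {}
    for folder, n_expected in expected.items():
        path = delivery_dir / folder
        # A missing folder counts as zero files; the extension check ignores case (.tif/.TIF).
        n_actual = (
            sum(1 for p in path.iterdir() if p.is_file() and p.suffix.lower() == suffix.lower())
            if path.is_dir()
            else 0
        )
        if n_actual != n_expected:
            mismatches[folder] = (n_expected, n_actual)
    return mismatches
```

An empty dictionary means every folder has the expected number of files; any entry tells you which folder to inspect and by how much it is off.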

Also check that no errors have crept into the files, e.g. while they were being copied or transferred over the internet. This can be done automatically if you have agreed a good checksum file with your supplier. (See the checksums section.)

Finally, it’s always a good idea to check that the original materials have been returned in their entirety.

Image quality

Try to perform this check as far as possible while the digitisation is still in progress. After all, the only way to correct poor image quality is to carry out the digitisation again.

We’ve already mentioned above that it’s a good idea to ask your supplier to follow an international digitisation standard (Metamorfoze, FADGI or Digitisation guidelines for photographic materials). This means the supplier is contractually obliged to undertake certain steps to guarantee the digitisation quality.

But just because a supplier says they will respect a certain standard doesn’t necessarily mean they actually will. The standards offer a framework in which you can use ‘target scans’ to check for yourself whether this has happened. (See the relevant standard for more information.)

Bear in mind that these standards are intended for specialists, and the checks require specific software. If that is not feasible for you, at least always check the images visually. Are they sharp enough? Are they readable? Do the images show any discolouration or imperfections (e.g. moiré)? Can the image be used for publication in books and on the internet?

If you would like help to check whether a reproduction has been created correctly in accordance with the relevant standards, please contact a TRACKS partner.

Correctness of file format

Try to perform this check as far as possible while the digitisation is still in progress.

Focus your efforts mainly on the quality controls for the master and archive files. The reference files are less critical because new ones can always be generated from the archive files.

Please note: just because a file has a ‘.tif’ extension doesn’t necessarily mean that it’s a TIFF file. You can use a tool such as DROID to test whether your TIFF really is a TIFF. See the Identifying your digital archive files with DROID section.

You can use the DPF Manager tool to test whether your TIFF file is an Uncompressed Baseline TIFF v6.0 (i.e. the most sustainable type of TIFF file). In our experience, it’s quite common for TIFF files not to meet the required standard, so make sure you carry out adequate testing. Do not hesitate to contact a TRACKS partner if you would like additional help.
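To illustrate what such tools look at, the Python sketch below performs two of the most basic checks by hand: whether the file starts with a TIFF header (byte order II or MM followed by the number 42) and whether the first image is uncompressed (TIFF tag 259, Compression, equal to 1). It is no substitute for DROID or DPF Manager: it ignores all other baseline requirements, only inspects the first image in the file, and does not handle truncated or corrupt files gracefully.

```python
import struct
from pathlib import Path

COMPRESSION = 259  # TIFF tag number of the Compression field; value 1 = uncompressed
SHORT = 3          # TIFF field type for a 16-bit unsigned integer


def check_tiff(path: Path) -> str:
    """Check the TIFF header and the Compression tag of the first image (IFD)."""
    with path.open("rb") as f:
        header = f.read(8)
        if len(header) < 8 or header[:2] not in (b"II", b"MM"):
            return "not a TIFF"
        bo = "<" if header[:2] == b"II" else ">"  # II = little-endian, MM = big-endian
        magic, ifd_offset = struct.unpack(bo + "HI", header[2:])
        if magic != 42:  # 43 would be BigTIFF, which is not baseline TIFF 6.0
            return f"not baseline TIFF (magic number {magic})"
        f.seek(ifd_offset)
        (n_entries,) = struct.unpack(bo + "H", f.read(2))
        for _ in range(n_entries):
            # Each IFD entry: tag, field type, value count, 4-byte value/offset field.
            tag, field_type, _count, raw = struct.unpack(bo + "HHI4s", f.read(12))
            if tag == COMPRESSION:
                # A single SHORT is stored in the first 2 bytes of the 4-byte value field.
                fmt = bo + ("H" if field_type == SHORT else "I")
                (value,) = struct.unpack(fmt, raw[: struct.calcsize(fmt)])
                return "uncompressed" if value == 1 else f"compressed (Compression={value})"
    # TIFF 6.0 defines 1 (no compression) as the default when the tag is absent.
    return "uncompressed"
```

A result such as "compressed (Compression=5)" (5 is LZW) or "not a TIFF" is a reason to run the file through DPF Manager or DROID and raise it with the supplier.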

Post-processing quality (OCR)

OCR is rarely, if ever, completely flawless. Its quality depends on the software used and, above all, on the source material. To be able to check the quality, always request the OCR data in separate text files and not just in PDF files. This allows you to open a random sample of the OCR text files and check them.
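Drawing such a sample can be scripted. The Python sketch below first lists scans that have no OCR text file next to them, then picks a random sample of text files for you to open and read. It assumes each text file has the same name as its scan, with .txt instead of .tif or .tiff, in the same folder; the sample size of 20 is an arbitrary example value.

```python
import random
from pathlib import Path


def ocr_sample(delivery_dir: Path, k: int = 20, seed: int = 1) -> tuple[list[Path], list[Path]]:
    """Return (scans without an OCR text file, random sample of OCR text files)."""
    scans = sorted(p for p in delivery_dir.rglob("*")
                   if p.is_file() and p.suffix.lower() in (".tif", ".tiff"))
    missing = [p for p in scans if not p.with_suffix(".txt").is_file()]
    texts = sorted(p for p in delivery_dir.rglob("*.txt") if p.is_file())
    # A fixed seed makes the sample reproducible, so you can discuss the same files with the supplier.
    sample = random.Random(seed).sample(texts, min(k, len(texts)))
    return missing, sample
```

Reading the sampled files against the corresponding scans gives a rough impression of the OCR quality; scans listed as missing indicate that the minimum delivery (one text file per scan) has not been met.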

Please contact a TRACKS partner for more information.

The quality of the metadata

If you ask your supplier to add metadata (such as titles) to the digitised files, the only way to check it is to actually look at it, possibly using a random sample. Regardless of the method the supplier uses to deliver the metadata, it’s best to also request an export of the metadata as a spreadsheet, as this makes manual checking easier.

Digital files also have embedded technical metadata, which you can also assess if you wish. See the embedded metadata section for more info.


Authors: this article was originally based on text by Wim Lowet (Flanders Architecture Institute) in collaboration with Nastasia Vanderperren and Bart Magnus (meemoo).