High-quality text and image digitisation

From Tracks

Digitising analogue text and image material can offer many advantages, for example by making it easier to consult. However, digitisation is an intensive process that requires a lot of time and resources, so good planning and a sound approach are crucial.
In this tool you will learn the following:

  • Where should you store your digitised images?
  • How do you describe your digitised images?
  • How do you ensure good image quality for your digitised images?
  • Which file formats should you use for your digitised images?

Digitisation is the conversion of analogue media, such as paper or photographs, into a digital form. But how do you ensure high-quality digitisation of your archive and collections?

Four issues determine the quality of digitisation, and this article explains each of them:

  • the location where digitised items are stored;
  • the information (metadata) saved for the file;
  • the visual quality of the capture;
  • the quality of the file format used to save the image information.
A book scanner in action. Book scanners are designed specifically for scanning books.

Good to know before reading

You can read more about digitising text and image content – such as photos, posters and drawings – in this section. If you have other media, such as audio tapes or film reels, then go to the Digitising audio and video recordings section.

For your born-digital archive content (i.e. files created on a computer), please see the Digital storage section.

You can also outsource your digitisation work to a professional company. Such companies often have more in-house expertise and can deliver higher quality, but you should still assess their expertise and quality on the basis of the issues listed above. You can find more information in Outsourcing a digitisation assignment.

If you decide to do the digitisation yourself, think carefully about who will take on which tasks. What can you achieve with the time and skills of your staff? For which tasks might you bring in external consultants? Will you work with students or volunteers? Sometimes it can be useful to give people the opportunity to develop certain skills. If you plan to involve volunteers or students, be sure to also read the tool (Vrijwilligers-)werk in de archiefzorg, on (volunteer) work in archive management.

The storage location

Imagine you spend months digitising your entire photographic collection, saving all the photos on your computer, and then your computer is stolen or you spill coffee on it! Or what if all your images somehow disappear following a system update?

All sorts of things can go wrong with your digitised archive, so you need to make sure your digital files are stored properly. At the very least, you need a good back-up strategy. Read more about this in the How do you make a back-up? section.
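One part of a good back-up strategy is checking that your copies actually match the originals. As an illustration only (Python, with hypothetical folder names), the sketch below compares each file in a folder of digitised files with its counterpart in a back-up folder using SHA-256 checksums, and reports files that are missing or different.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_backup_problems(original_dir: Path, backup_dir: Path) -> list:
    """List files that are missing from the back-up or whose contents differ."""
    problems = []
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        copy = backup_dir / relative
        if not copy.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(original) != sha256_of(copy):
            problems.append(f"differs: {relative}")
    return problems
```

For example, find_backup_problems(Path("digitised"), Path("backup/digitised")) returns an empty list when every file has an identical copy in the back-up folder. The folder names are placeholders for your own locations.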

The flatbed scanner, with its characteristic lid, is perhaps the best-known type of scanner. It is useful for digitising two-dimensional objects such as documents, photos and drawings, and can also scan transparent media such as slides. Flatbed scanners are available for sizes from A4 up to A0 and larger.

The description

A digital reproduction loses much of its value if you don’t know what the original is, or who made the copy and when. You should therefore keep full records of what has been digitised and where the original can be found.

It’s best to register or describe the collection before starting the digitisation process. Another option is to do this systematically during the digitisation process itself, but you need to make sure you work out exactly how you want to do this in advance. You can create the description in a spreadsheet such as Excel or in a database, for example. It’s best not to use Word or other unstructured text formats.

Ideally there will already be a finding aid or inventory for the collection, which you can use as the basis for registering your digitisation work. If there isn’t one but you want to start digitising anyway, always note what each file is and where the analogue source can be found in your collection, so you always know where the original is.

A digitisation spreadsheet

A spreadsheet can serve as an overview of your digitisation work and record all the links between originals and their digital reproductions. Keep a record of at least the following details:

The minimum columns and their contents:

  • Unique number: a unique number ensures clear identification of the original and its reproduction. It is very important to include this number in the reproduction’s filename. It can also be written on the original (e.g. in pencil), and is often a combination of your inventory number and a serial number (e.g. for photo albums).
  • Type of document: if your collection contains different types of content, you can indicate the type here, e.g. "photo", "text document", "poster"...
  • Brief description: a brief description of the original content, e.g. "Photo taken during a study trip to Prague", "Poster of a show at the Beursschouwburg"...
  • Place code: if the inventory number alone doesn’t tell you where the original is located, record the location in this field, e.g. the number of the box where the original is kept.
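If you keep the register as a CSV file, which Excel and other spreadsheet programs can open, the minimum columns can be written and read back in a few lines of Python. This is a minimal sketch with invented example rows; it also refuses duplicate unique numbers, since each number must identify exactly one original.

```python
import csv
import io

# The minimum columns of the digitisation spreadsheet.
FIELDNAMES = ["Unique number", "Type of document", "Brief description", "Place code"]

# Invented example rows, for illustration only.
records = [
    {"Unique number": "000123-001", "Type of document": "photo",
     "Brief description": "Photo taken during a study trip to Prague", "Place code": "Box 12"},
    {"Unique number": "000124", "Type of document": "poster",
     "Brief description": "Poster of a show at the Beursschouwburg", "Place code": "Box 7"},
]


def write_register(rows, stream):
    """Write the register as CSV with a header row."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def read_register(stream):
    """Index the register by unique number, refusing duplicates."""
    register = {}
    for row in csv.DictReader(stream):
        number = row["Unique number"]
        if number in register:
            raise ValueError(f"duplicate unique number: {number}")
        register[number] = row
    return register


buffer = io.StringIO()
write_register(records, buffer)
buffer.seek(0)
register = read_register(buffer)
print(register["000124"]["Place code"])  # Box 7
```

For a real file, open it with open("register.csv", "w", newline="", encoding="utf-8") instead of using the in-memory buffer (the filename is a placeholder); the Python csv documentation recommends newline="" for CSV files.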

These columns are the minimum requirements and are enough to get started with high-quality digitisation. You can of course add columns of your own, depending on the content or your requirements. Typical additional columns are:

  • start and end dates;
  • the projects that the photo relates to (e.g. exhibitions for art houses, productions by performing arts organisations);
  • the people in the image.

In general, the following rule applies: the simpler the registration, the smoother the digitisation process itself will be. Bear in mind that you can also add content descriptions after digitisation, based on the reproductions.

Think carefully before adding extra descriptive metadata to your spreadsheet or to your inventory and location list.

The filename and folder structure for your reproductions

As well as a description in a spreadsheet or database, it’s important to consider which filenames to use for your digital reproductions. As mentioned previously, there should always be a link between your filenames and the unique numbers in your spreadsheet. Each filename should preferably start with the unique number (possibly preceded by a code identifying your organisation). You can then add further text or information after an underscore if desired.
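The sketch below shows one possible way to build such filenames and to recover the unique number from them again. The organisation code "ORG", the hyphen between code and number, and the restriction to letters, digits and hyphens are assumptions made for this example, not a fixed standard.

```python
import re

# Letters, digits and hyphens only, so the underscore stays a clear separator.
ID_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def make_filename(unique_number: str, org_code: str = "", extra: str = "", ext: str = "tif") -> str:
    """Build a filename that starts with the (optionally prefixed) unique number."""
    for part in (org_code, unique_number):
        if part and not ID_PATTERN.fullmatch(part):
            raise ValueError(f"unsupported characters in {part!r}")
    stem = f"{org_code}-{unique_number}" if org_code else unique_number
    # Replace spaces and other special characters in the free text with hyphens.
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "-", extra).strip("-")
    if cleaned:
        stem += "_" + cleaned
    return f"{stem}.{ext}"


def unique_number_from(filename: str, org_code: str = "") -> str:
    """Recover the unique number, i.e. the link back to the spreadsheet row."""
    stem = filename.rsplit(".", 1)[0].split("_", 1)[0]
    prefix = f"{org_code}-" if org_code else ""
    if not stem.startswith(prefix):
        raise ValueError(f"{filename!r} does not start with {prefix!r}")
    return stem[len(prefix):]


name = make_filename("000123-001", org_code="ORG", extra="Study trip Prague")
print(name)                             # ORG-000123-001_Study-trip-Prague.tif
print(unique_number_from(name, "ORG"))  # 000123-001
```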

See the Naming files and folders section for this.

When you’re digitising originals that require multiple reproductions (such as photo albums, books or magazines), pay careful attention to the filenames: make sure that, when the files are sorted by name, the pages appear in the correct order.

The codes "-r" (recto) and "-v" (verso) are often used in filenames if you need to digitise both the front and rear of an original.
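Whether filenames sort in page order depends largely on zero-padding the page number. A minimal sketch, assuming a naming pattern of unique number, page number and an optional -r/-v suffix:

```python
def page_filenames(unique_number: str, pages: int, both_sides: bool = False, ext: str = "tif") -> list:
    """Filenames for a multi-page original, in reading order."""
    # Zero-padding makes alphabetical order match page order (002 before 010).
    width = max(3, len(str(pages)))
    names = []
    for page in range(1, pages + 1):
        base = f"{unique_number}-{page:0{width}d}"
        if both_sides:
            names += [f"{base}-r.{ext}", f"{base}-v.{ext}"]  # recto (front), then verso (back)
        else:
            names.append(f"{base}.{ext}")
    return names


# Without padding, plain alphabetical sorting puts page 10 before page 2:
print(sorted(["album-1.tif", "album-2.tif", "album-10.tif"]))
# ['album-1.tif', 'album-10.tif', 'album-2.tif']
print(page_filenames("000123", 2, both_sides=True))
# ['000123-001-r.tif', '000123-001-v.tif', '000123-002-r.tif', '000123-002-v.tif']
```

The padding width (at least three digits here) is an assumption; choose one wide enough for your longest item.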

Magazines are even more complicated. They have volumes, issues and sometimes also supplements, so you need to consider how to record this information logically in your filenames or folder structure. You can’t rely entirely on your spreadsheet to recreate the magazine’s structure.
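One possible approach, sketched here with a hypothetical naming scheme: one folder per volume and issue (and supplement), and a filename that repeats all these levels, so that a file still identifies itself if it is moved out of its folder. The title code plays the role of the unique number; codes such as v012, n03 and s01 are invented for this example.

```python
from pathlib import PurePosixPath
from typing import Optional


def magazine_path(title_code: str, volume: int, issue: int, page: int,
                  supplement: Optional[int] = None, ext: str = "tif") -> PurePosixPath:
    """Folder per volume and issue; the filename repeats every level."""
    levels = [title_code, f"v{volume:03d}", f"n{issue:02d}"]
    if supplement is not None:
        levels.append(f"s{supplement:02d}")
    filename = "-".join(levels) + f"-p{page:03d}.{ext}"
    return PurePosixPath(*levels, filename)


print(magazine_path("MAG", 12, 3, 4))
# MAG/v012/n03/MAG-v012-n03-p004.tif
print(magazine_path("MAG", 12, 3, 1, supplement=1))
# MAG/v012/n03/s01/MAG-v012-n03-s01-p001.tif
```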

It’s not difficult, but make sure the files are ordered and organised in a clear and logical way.

The visual quality of the recording

The visual quality of a recording starts with the recording equipment: the better your scanner or camera, the better your images. But the better your equipment, the more knowledge you need as a user to configure it correctly.

The sharpness that your scanner or camera achieves, and how precisely it reproduces colours, depends on how well it is calibrated (see below). Your recording environment has a strong influence too, especially if you’re taking photographs: you need to be able to control the lighting at all times. Many recordings will also need post-processing, such as straightening and cropping.

You can find a good basic guide for configuring equipment on the FARO website (in Dutch).

A document scanner with a document feeder, for automatically scanning large numbers of documents. Scanners with document feeders are generally not recommended for high-quality scans, and old, fragile or valuable items are at risk of being damaged.

General rules

  • Scan the entire document, leaving a margin of approximately 0.5 cm around the edge to prove that the full document has been digitised. You can always remove the margin again later, e.g. for publication.
  • The image must have a resolution of at least 300 ppi at actual size (ppi stands for ‘pixels per inch’). This means that 300 pixels are recorded for every inch (2.54 cm) of your document. The more pixels, the sharper the image and the further you can zoom in without losing quality. If you’re digitising documents that you know will need to be enlarged (e.g. passport photos and slides), the standard 300 ppi is not sufficient: to enlarge the document 2x, choose 600 ppi; to enlarge it 4x, choose 1200 ppi, and so on.
  • If you’re scanning or photographing in colour, choose a bit depth of 24 bit (8 bits each for red, green and blue). Bit depth is the number of bits (zeros and ones) used to record the colour of each pixel: the greater the bit depth, the greater the range of colours that can be saved.
  • If you’re scanning or taking photographs in grey tones, choose a bit depth of 8 or 16 bit.
  • Make sure the colours are captured and saved using a sufficiently rich colour profile. An RGB colour profile is usual for digitisation projects, and the heritage sector mostly opts for ECI RGB v2 or Adobe RGB. Another common colour profile is sRGB, but do not use it for your archive or master files (see below), because the range of colours sRGB can represent is too narrow.
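To see what these rules mean in practice, you can calculate the pixel dimensions and the approximate uncompressed file size of a capture. The sketch below (Python, an illustration rather than a tool) includes the 0.5 cm margin on every side and counts pixel data only, ignoring the small TIFF header overhead.

```python
CM_PER_INCH = 2.54


def capture_plan(width_cm: float, height_cm: float, ppi: int = 300,
                 bits_per_pixel: int = 24, margin_cm: float = 0.5):
    """Pixel dimensions and approximate uncompressed size (pixel data only) of one capture."""
    # The margin is added on every side of the original.
    width_px = round((width_cm + 2 * margin_cm) / CM_PER_INCH * ppi)
    height_px = round((height_cm + 2 * margin_cm) / CM_PER_INCH * ppi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes


# 24-bit colour = 8 bits for each of red, green and blue = 2**24 possible colours.
print(2 ** 24)                   # 16777216
print(capture_plan(21.0, 29.7))  # A4 at 300 ppi: (2598, 3626, 28261044), about 28 MB
print(capture_plan(21.0, 29.7, ppi=600))
```

The last line shows why high resolutions take up so much space: doubling the ppi quadruples the number of pixels and therefore the file size.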

Calibrate the recording equipment

You’re already on the right track if you follow the general rules above, but they’re not enough on their own. In order to create high-quality reproductions, you need to properly calibrate your recording equipment and screen. Environmental factors such as lighting also need to be optimal.

Unless you outsource the work to a professional (see the Outsourcing a digitisation assignment section), you need to be prepared to spend time delving into the subject to work it all out. If you want to do it yourself, go through the user guides and do some experiments, or seek advice and training. Keep the general rules above in mind.

Calibrating recording equipment and achieving the required standards for high-quality digitisation is quite a technical affair. You can of course create reproductions of a decent standard (without attaching great importance to exact colour reproduction) if you don’t have time to delve into all this in more detail. A digital recording is better than no recording at all. But try to stick to the general rules.

Equipment

What kind of equipment should you buy: a scanner or a camera? In principle, good-quality equipment of either kind can meet the standards required for high-quality scans.

A scanner is often simpler for beginners to use, but a good camera usually offers more possibilities for taking good pictures because you can configure more parameters. Bear in mind that photography has a steep learning curve and that you need an environment where you can control the light. Photos taken without good knowledge of photography, or in poorly lit conditions, result in poorer quality images than those produced by scanners.

If you buy a scanner, make sure the software at least allows you to configure the resolution, bit depth and colour profile, and that the scanner can produce uncompressed TIFF files (see below).

Tip: read user reviews for the device, and seek advice from sellers or TRACKS partners.

A photographic set-up. Good lighting and exposure control are required for high-quality photos.

Software

Good image processing software for editing files and saving them in the right format (see below) is recommended. Adobe Photoshop is very well known and very suitable. You can achieve a lot with it, especially in combination with Adobe Lightroom, which can apply the same edits to many images at once. Professionals often use Capture One.

There are free options available too, such as GIMP as an alternative to Photoshop. Examples of free software for batch-processing images (without us wishing to recommend them in particular) include XnView and FastStone Image Viewer.

File format quality

Which file format should you choose: JPEG, TIFF or PNG? The answer is that you should save more than one copy of your file. Create at least an archive file and a reference file. You can also keep a master file if desired.

The archive file

The archive file is the copy in which you save all your information in the best possible quality, without risking any loss of information. The archive copy serves as your back-up which you can always return to when you need the highest quality.

Choose uncompressed Baseline TIFF 6.0 for your archive copy. This file format requires more storage space than the others, but it is used worldwide for saving high-quality image data. Make sure you do not use any compression in your archive files. Compression is often achieved by discarding information that isn’t immediately visible to the eye but can become visible when you process the file (e.g. in Photoshop for a book publication).

Make sure the colours in the archive file are coded using the ECI RGB v2 or Adobe RGB colour space. You can configure this in Photoshop.

Check your TIFF files: not every TIFF is a well-made TIFF. The file is produced by your scanner software, and because that software is made by people, something can always go wrong.
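One thing you can check yourself is whether a TIFF really is uncompressed. The Python sketch below reads the Compression tag (number 259) of the first image in a classic TIFF file; according to the TIFF 6.0 specification, value 1 means no compression, and a missing tag also means no compression. It is a minimal illustration that checks only this one property and is no substitute for a full TIFF validator.

```python
import struct

COMPRESSION_TAG = 259
# A selection of Compression codes from the TIFF specification.
COMPRESSION_NAMES = {1: "none", 5: "LZW", 7: "JPEG", 8: "Deflate", 32773: "PackBits"}


def first_ifd_tags(f) -> dict:
    """Read the single-value SHORT/LONG tags of the first image in a classic TIFF file."""
    f.seek(0)
    header = f.read(8)
    order = {b"II": "<", b"MM": ">"}.get(header[:2])  # little- or big-endian
    if order is None or len(header) < 8:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (for example BigTIFF, or a corrupt header)")
    f.seek(ifd_offset)
    (count,) = struct.unpack(order + "H", f.read(2))
    entries = f.read(12 * count)
    tags = {}
    for i in range(count):
        tag, field_type, n, value = struct.unpack(order + "HHI4s", entries[12 * i:12 * i + 12])
        if n == 1 and field_type == 3:    # SHORT, stored in the first 2 bytes of the value field
            tags[tag] = struct.unpack(order + "H", value[:2])[0]
        elif n == 1 and field_type == 4:  # LONG
            tags[tag] = struct.unpack(order + "I", value)[0]
    return tags


def compression_of(f) -> str:
    """Name of the compression used by the first image; a missing tag means none."""
    code = first_ifd_tags(f).get(COMPRESSION_TAG, 1)
    return COMPRESSION_NAMES.get(code, f"other ({code})")
```

For example, opening an archive file with open(path, "rb") and passing the file object to compression_of should return "none".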

The reference file

Uncompressed TIFF files are usually too big for day-to-day use and publication on websites. Use a JPEG copy instead. We call this copy the reference file. The easiest way to create one of these files is with software such as Adobe Lightroom or other alternatives, which can convert TIFFs into JPEGs in batches.
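Batch tools such as Lightroom do the actual conversion, but it helps to decide in advance where reference files go and how large they are. This sketch rests on assumptions of our own (a separate reference folder that mirrors the archive folders, and a maximum long edge of 2000 pixels) and keeps the filename stem, so each JPEG still carries the unique number of its original.

```python
from pathlib import PurePath, PurePosixPath


def reference_path(archive_file: PurePath, archive_root: PurePath, reference_root: PurePath) -> PurePath:
    """Mirror the archive folders and keep the filename stem (and so the unique number)."""
    return reference_root / archive_file.relative_to(archive_root).with_suffix(".jpg")


def reference_size(width: int, height: int, max_edge: int = 2000) -> tuple:
    """Scale down so the longer side is at most max_edge pixels; never scale up."""
    scale = min(1.0, max_edge / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


print(reference_path(PurePosixPath("archive/box12/ORG-000123.tif"),
                     PurePosixPath("archive"), PurePosixPath("reference")))
# reference/box12/ORG-000123.jpg
print(reference_size(2598, 3626))  # (1433, 2000)
```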

The master file

You can make a distinction between master files and archive files. Both are high-quality TIFF files, but your master files contain the unprocessed information as it comes out of the scanner or camera. Your archive file, on the other hand, is a processed image that might have been straightened, cropped, and so on.

If you save master, archive and reference files, you can be sure you have the right image format for every purpose. It does, however, mean that you need twice as much storage space for TIFF files, since each image is saved as two TIFFs.
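A quick estimate makes the storage cost visible. In the sketch below, the 1 MB per JPEG reference file is an assumed figure, the TIFF size corresponds to an A4 page with a 0.5 cm margin at 300 ppi and 24 bit (about 28.3 MB of pixel data), and the copies parameter lets you include the extra copies of your back-up strategy.

```python
def storage_bytes(images: int, tiff_bytes: int, reference_bytes: int,
                  keep_master: bool = True, copies: int = 1) -> int:
    """Total storage for archive (and master) TIFFs plus JPEG reference files."""
    tiffs_per_image = 2 if keep_master else 1  # master + archive, or archive only
    return copies * images * (tiffs_per_image * tiff_bytes + reference_bytes)


# 1,000 A4 colour scans; the 1 MB reference size is an assumption.
with_master = storage_bytes(1000, 28_261_044, 1_000_000)
archive_only = storage_bytes(1000, 28_261_044, 1_000_000, keep_master=False)
print(round(with_master / 1e9, 1), round(archive_only / 1e9, 1))  # 57.5 29.3 (GB)
```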

Read more

Authors: This article was originally based on text written by Wim Lowet (Flanders Architecture Institute) in collaboration with Nastasia Vanderperren and Bart Magnus (meemoo).