Capturing data from 3.5-inch diskettes of House for Electronic Arts (HeK)
Version of 21 Sep 2020 14:27
In May 2018, HeK (House for Electronic Arts)[1] asked PACKED vzw to capture data from its 3.5-inch diskettes. The data consisted of the digital art works Raoul A. Pictor cherche son style (1993)[2] by Hervé Graumann and Über Sehen (1993)[3] by Studer / Van den berg. There were nine high-density diskettes in total, some made for Mac and the others for Windows. HeK didn’t have the right reading equipment to capture the data, so PACKED vzw developed a workflow to retrieve it from the diskettes. Diskettes are fragile carriers. If they become too damaged, there’s a very real chance that the reading equipment won’t be able to read them and the art works will be lost.
Issue
Diskettes are data carriers that use magnetism to store data, with a capacity ranging from 80 KB (first generation) to 2.88 MB (latest generation). They were ubiquitous from the 1980s until the emergence of the CD-R and USB sticks at the end of the 1990s and in the early 2000s.
There are various types of diskettes and some variants are not compatible. Many require their own specific reading device, which cannot write or read other types.[4] Diskettes can differ, for example, in:
- size: the first diskettes, invented in the late 1960s by IBM, had an 8-inch diameter. The 5.25-inch diskette was introduced for home computers in the mid 1970s. The 3.5-inch diskette became the most popular data storage medium in 1988. Diskettes were also available in 2, 2.5, 3, 3.25 and 4-inch formats, but they never fully broke through.
- the number of tracks and sectors: data is organised in tracks and sectors on diskettes. Tracks are concentric circles around the centre of the diskette with spaces left in between. Nothing is written in these spaces. Sectors are blocks of a constant size (expressed in bytes), each with its own identification number so the operating system can find the data on the diskette. Diskettes can differ in the number of tracks per side[5] and per inch, the number of sectors per track, and the number of bytes per sector.
- the number of writeable sides: there are single-sided and double-sided diskettes. A diskette reader that can read single-sided diskettes can’t necessarily read double-sided diskettes, and vice versa.
- density: this is the efficiency with which data can be stored on a magnetic carrier. The higher the density, the more data a diskette can store. A greater density is achieved for example by coding improvements for data storage, the magnetic strength at which the data can be written and the material used. There are single-density (SD or 1D), double-density (DD or 2D), quad-density (QD or 4D), high-density (HD), extra-high density (ED) and triple-density (TD) diskettes.
- logical format: the logical format is the encoding scheme that determines how the data is written to the carrier. The most common formats are FM (for DOS-formatted, single-density diskettes), MFM (for DOS-formatted double-density and high-density diskettes) and GCR, which has an Apple variant and a Commodore variant. There are also separate formats for Atari and Amiga, among others.
The consequence of all these differences is, for example, that a 3.5-inch diskette station cannot read every 3.5-inch diskette.
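To make the relationship between geometry and capacity concrete, here is a minimal sketch. The function name is illustrative, and the two geometries shown are the common IBM PC-compatible 3.5-inch layouts with 512-byte sectors; other platforms used different values.

```python
def diskette_capacity_bytes(tracks_per_side: int, sides: int,
                            sectors_per_track: int, bytes_per_sector: int) -> int:
    """Formatted capacity implied by a diskette's geometry."""
    return tracks_per_side * sides * sectors_per_track * bytes_per_sector


# Common IBM PC-compatible 3.5-inch geometries (512-byte sectors).
hd = diskette_capacity_bytes(80, 2, 18, 512)  # high-density, sold as "1.44 MB"
dd = diskette_capacity_bytes(80, 2, 9, 512)   # double-density, sold as "720 KB"

print(hd, hd // 1024)  # 1474560 1440
print(dd, dd // 1024)  # 737280 720
```

A raw disk image of an error-free high-density PC diskette would therefore be expected to be 1,474,560 bytes long, which is a quick sanity check on an image file.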
The many variants mean that capturing data from diskettes can be a challenge. Diskette readers with a USB connection, which you can still buy today, can usually only read high-density 1.44 MB diskettes, which was the most popular format after the mid 1990s. Diskettes are also fragile carriers. They’re sensitive to dust, condensation and temperature fluctuations, and can’t be stored near magnets or magnetic devices. Any damage can render them unreadable, making it very difficult or even impossible to retrieve data from them.
Status
We captured the content from the nine 3.5-inch diskettes. The files were retrieved from the disk image, identified and saved to a contemporary data carrier.
Method
We decided to create disk images to capture the data. Disk images are bit-for-bit copies of the diskettes. A disk image stores not just the files but also all the system information on the carrier, so the content of the carrier is copied as completely as possible and remains as close to the original as possible. You can then retrieve the files from the disk image and identify them. Disk images can be created with software that calculates a checksum for both the source (the original disk content) and the disk image (the copy) and compares them.[6] This confirms that no errors occurred when creating the disk image, and that the disk image is an identical copy of the original.
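The checksum comparison can be sketched as follows. This is not how the imaging software itself is implemented; it is a minimal illustration using Python's standard hashlib module, and the function names are our own.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file (or device node), read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_identical_copy(source_path: str, image_path: str) -> bool:
    """True when the source and the disk image have the same MD5 checksum."""
    return md5_of(source_path) == md5_of(image_path)
```

On Linux, the source path would typically be the device node of the diskette drive and the image path the file that was just written; both paths depend on the local setup.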
The copied carriers were listed in a spreadsheet with the following columns:
- UI (unique identifier): to create the unique identifier, we used the code assigned to the art work by the institution, and then added a consecutive 3-figure number for each carrier, starting with 001. For example, the unique identifier 2008_199_001 refers to the first carrier processed for the art work with number 2008/199.
- Institution: the name of the museum, i.e. HeK.
- Carrier type: the type of diskette. For HeK, these were 3.5-inch DS HD diskettes[7].
- Carrier format: the logical format on the diskette. In the case of the high-density diskettes from HeK, this was MFM.
- Information on the carrier: all the information from the label on the diskette.
- Functional? If the disk image could be opened and the files retrieved from it, then the diskette was considered to be functional.
- Copied with no errors? This field indicates if a disk image could be created without the software encountering any errors while reading the carrier.
- MD5 checksum: an MD5 checksum was created for every disk image. These checksums are used to check the file integrity.
- Notes: this column includes relevant information about the carrier, e.g. it was an empty diskette, not all files could be retrieved from the diskette, or the error messages that we received when we tried to open the disk image.
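The identifier scheme and the column layout above can be sketched in a few lines. Only the identifier pattern and the HeK values for institution, carrier type and carrier format come from the text; the function name and the empty cells are placeholders.

```python
import csv
import io


def make_uid(artwork_code: str, carrier_number: int) -> str:
    """Build an identifier such as 2008_199_001 from code 2008/199 and carrier 1."""
    return f"{artwork_code.replace('/', '_')}_{carrier_number:03d}"


COLUMNS = ["UI", "Institution", "Carrier type", "Carrier format",
           "Information on the carrier", "Functional?",
           "Copied with no errors?", "MD5 checksum", "Notes"]

# Illustrative row: the remaining cells are left empty on purpose.
row = {column: "" for column in COLUMNS}
row.update({
    "UI": make_uid("2008/199", 1),
    "Institution": "HeK",
    "Carrier type": "3.5-inch DS HD diskette",
    "Carrier format": "MFM",
})

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```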
In order to prevent our computer from writing files to the carriers, we used write blockers. 3.5-inch diskettes have a built-in write-protect slider in the bottom left corner, which makes the diskette read-only. We also used a hardware write blocker. This equipment prevents a computer from writing data to the connected carrier.
Create disk images
When testing a reading device with a USB connection, we established that it could read the HeK 3.5-inch high-density diskettes. We used Guymager[8] software to create a disk image from the diskettes. Guymager is open source software that’s used to create disk images of evidence in forensic examinations. It’s extremely important that data is captured unchanged for forensic examinations, and Guymager makes this possible. It has various features to check that the copy is the same as the original. Capturing data unchanged is just as important for digital preservation. Another of Guymager’s advantages is that it automatically creates metadata during the capturing process, such as the checksums for both the carrier and the disk image, and writes it to a text file.
The software can create an MD5 checksum for both the original carrier and the disk image, and compare the two to ensure that the disk image and carrier are identical. We opted for Linux dd raw image as the file format because it’s an open format supported by all operating systems. Expert Witness Format, by contrast, is a proprietary format and can only be opened with a limited number of applications.
This enabled us to make identical copies of the nine diskettes.
Exporting disk image files
A disk image is not a file that you can simply open to look up data. It differs from a copy of files from a single location because the disk image saves all the system information as well as the files from the carrier. For a computer, a disk image is therefore equivalent to an external drive or carrier that needs to be read in. To read or use the files and folders from a disk image, you need to mount the disk image on your computer. This can be risky because some operating systems (invisibly) write files to the connected storage media. Sometimes it’s also not possible to mount a disk image because of its file system. A file system is the structure an operating system uses to organise the data on a storage medium (e.g. a hard drive or external carrier), so that it can display the data as files and use them in applications. Some file systems can only be used on a specific operating system, whereas others are accessible to multiple operating systems.[9] It’s possible, for example, that a disk image from an (external) drive that’s been formatted for Windows cannot be opened on a Mac computer, or vice versa.
In order to ensure that HeK had access to the files on the disk images, we first exported and identified them, using software that could export all the files, including hidden files, without altering the disk images. Before we could start exporting files, we needed to know which file system each disk image had. After all, the choice of tool depends on the file system. This information is also needed if you want to open the files in an emulation environment, because the appropriate emulation environment is selected on the basis of the file system.
We always performed the following actions for the export:
- Determine the file system
- Create an index file with an overview of all files on the disk image
- Retrieve the files from the disk image
- Identify the file formats. This step is required to know which software you can use to open the files (if the computer doesn’t automatically find this out itself).
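The text does not say which tool produced the index file. As one possible sketch, a comparable overview can be generated for a folder of exported files using only the Python standard library; the function name and the column names are our own choices.

```python
import csv
import hashlib
import os


def write_index(export_dir: str, index_path: str) -> int:
    """Write a CSV listing every file under export_dir; return the file count."""
    count = 0
    with open(index_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["relative_path", "size_bytes", "md5"])
        for folder, _subdirs, names in os.walk(export_dir):
            for name in sorted(names):
                full_path = os.path.join(folder, name)
                with open(full_path, "rb") as handle:
                    data = handle.read()  # diskette files are small enough to read whole
                writer.writerow([os.path.relpath(full_path, export_dir),
                                 len(data), hashlib.md5(data).hexdigest()])
                count += 1
    return count
```

The index file should be written outside the export folder, otherwise it ends up listing itself.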
Determining the file system
The most common file systems for diskettes used with MS-DOS/Windows and Classic Macintosh are FAT12[10] and HFS[11] respectively. FAT is a file system that was developed for MS-DOS and Windows; FAT12 is the variant used specifically for diskettes. It is widely supported, including by almost all modern operating systems (Windows, Mac and Linux). HFS is an obsolete file system that was developed by Apple and used for diskettes and hard drives. HFS disk images can only be read on Mac (both Classic Macintosh and the modern OS X/macOS).
We used Disktype to determine the file system. This is a command line tool that can be used in UNIX environments such as Linux or Mac, or via Cygwin[12] on Windows, to establish the file systems on a disk or disk image. We used the command disktype image.img > disktype.txt to write the information about the disk image named image.img to the text file disktype.txt (see screenshot).
This is how we established that seven disk images had the FAT12 file system. The other two had the HFS file system.
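To illustrate the kind of evidence such a tool relies on, the sketch below checks a raw disk image for two well-known on-disk signatures. It is a simplified stand-in, not Disktype's actual logic, and the function name is our own. It assumes the HFS Master Directory Block signature 'BD' at byte 1024 and the informational type label that FAT12 volumes formatted by later DOS and Windows versions usually carry at bytes 54 to 61 of the boot sector; diskettes formatted by very old DOS versions may lack that label and would be reported as unknown.

```python
def sniff_file_system(image_path: str) -> str:
    """Guess 'FAT12', 'HFS' or 'unknown' from well-known on-disk signatures."""
    with open(image_path, "rb") as handle:
        head = handle.read(1536)  # boot sector plus the start of the HFS MDB

    # HFS: the Master Directory Block starts at byte 1024 with the signature 'BD'.
    if head[1024:1026] == b"BD":
        return "HFS"

    # FAT boot sector: 0x55 0xAA at bytes 510-511, and (usually) an
    # informational file system label at bytes 54-61.
    if head[510:512] == b"\x55\xaa" and head[54:62] == b"FAT12   ":
        return "FAT12"

    return "unknown"
```

A dedicated tool remains the better choice in practice, because it inspects far more structures than these two fields.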
We then used the Bitcurator Disk Image Access Tool to retrieve the files from disk images with the FAT12 file system. Bitcurator[13] is a specialist version of Ubuntu that consists of a collection of forensic tools to help with the preservation of data on external carriers. Bitcurator Disk Image Access Tool is software with which you can see all files on a disk image (including deleted files) and export them.
The Bitcurator Disk Image Access Tool cannot handle disk images with the HFS file system. Similar software exists for HFS: HFSExplorer. It also lets you export all files (including hidden files) while preserving the original metadata, such as the date of last modification.
With this software, we were able to export the files from the disk images of all the diskettes.
Identifying files
Once all the files had been retrieved from the disk images, they could be identified. We used DROID for this. DROID identifies files in two ways: by the file extension, and by a signature stored in the bitstream of the file. It uses the PRONOM database for this. DROID did not manage to identify all the files. This is because files in the HFS file system (the classic Mac environment) had no extension, or because files had the wrong extension. If DROID doesn’t know a file’s internal signature and can only fall back on the extension, it cannot recognise files whose extension is missing or wrong.
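The two identification routes, and why files without an extension can fall through both, can be sketched as follows. This is a simplified illustration and not DROID's implementation: the signature table holds a handful of hand-picked, well-known magic numbers rather than the PRONOM database, and all names are our own.

```python
import os

# A tiny, hand-picked signature table for illustration; PRONOM is far larger.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF-": "PDF document",
}
EXTENSIONS = {".png": "PNG image", ".gif": "GIF image", ".pdf": "PDF document"}


def identify(name: str, first_bytes: bytes) -> tuple[str, str]:
    """Return (format, method); method is 'signature', 'extension' or 'none'."""
    for magic, label in SIGNATURES.items():
        if first_bytes.startswith(magic):
            return label, "signature"
    extension = os.path.splitext(name)[1].lower()
    if extension in EXTENSIONS:
        return EXTENSIONS[extension], "extension"
    return "unidentified", "none"


print(identify("Picture 1", b"GIF89a..."))    # ('GIF image', 'signature')
print(identify("scan.pdf", b"\x00\x00\x00"))  # ('PDF document', 'extension')
print(identify("Untitled", b"\x00\x01\x02"))  # ('unidentified', 'none')
```

The last call mirrors the classic Mac case: there is no extension to fall back on and no known signature, so the file stays unidentified.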
Conclusion
Data on obsolete carriers is fragile and at risk of disappearing, partly because the reading equipment is becoming rare, but also because the carriers age and can no longer be read properly. That’s why the data needs to be transferred to a contemporary data carrier as soon as possible. Using a diskette reader with a USB connection, a write blocker and software such as Disktype, Guymager, HFSExplorer and Bitcurator, we were able to transfer all nine diskettes to a contemporary data carrier.
If you find a diskette in your own archive, please contact us before you try to read the carrier yourself. Send us all the information you have about the carrier, such as the period in which it was used, the computer on which it was used (Mac or Windows/MS-DOS) and a photo of the carrier. This makes it easier for us to identify the carrier and to determine which strategy to use to retrieve the data from it.
Author: Nastasia Vanderperren (PACKED vzw)
- ↑ For more information, see http://www.hek.ch/en.html
- ↑ For more information, see http://www.hek.ch/en/collection/collection-single/collection/raoul-a-pictor-cherche-son-style.html
- ↑ Über Sehen is a screensaver. See http://www.studervandenberg.ch/works.html
- ↑ A non-exhaustive list of diskette types: https://en.wikipedia.org/wiki/List_of_floppy_disk_formats
- ↑ The most common number of tracks is 40 or 80.
- ↑ Such as Guymager, Isobuster, FTK imager and Disk Utility
- ↑ DS stands for double-sided, HD for high-density.
- ↑ http://guymager.sourceforge.net/
- ↑ For more information, see https://en.wikipedia.org/wiki/File_system.
- ↑ For more information, see https://en.wikipedia.org/wiki/File_Allocation_Table#FAT12.
- ↑ For more information, see https://en.wikipedia.org/wiki/Hierarchical_File_System.
- ↑ Cygwin aims to allow programs of Unix-like systems to be recompiled and run natively on Windows with minimal source code modifications, https://en.wikipedia.org/wiki/Cygwin
- ↑ For more information, see https://bitcurator.net/bitcurator