
CHITO N. ANGELES

Are there standards for digitization or digital
archiving?

Yes, but limited to certain aspects only.

ISO/TR 13028:2010 - Information and
documentation - Implementation guidelines
for digitization of records.
Not applicable to: technical specifications for
the digital capture of records; technical
specifications for the long-term preservation
of digital records; or digitization of existing
archival holdings for preservation purposes,
etc.

ISO 19005-1:2005; ISO 19005-2:2011; ISO 19005-3:2012
(under development) - Document
management - Electronic document file
format for long-term preservation.
Specifies how to use the Portable Document
Format (PDF) for long-term preservation of
electronic documents.
Standard is known as PDF/A.

Unlike preservation microfilming and
photocopying, there are no formal standards
that govern the capture, processing, and
storage of digital images.
There are, however, a number of projects and
publications that have set forth best practices
for creating high-quality digital images,
access systems, and storage systems.

Digitization - also known as imaging or scanning,
is the means of converting hard-copy, or
non-digital, records into digital format.
Hard-copy or non-digital records include
audio, visual, image or text.
Digitization may also be undertaken by taking
digital photographs of the source records,
where appropriate.

Source: Government Recordkeeping Group, Archives New Zealand. Continuum
Create and Maintain: Digitisation Standard (2005).

A process by which digital data is preserved in
digital form in order to ensure the usability,
durability and intellectual integrity of the
information contained therein.
A more precise definition is: the storage,
maintenance, and accessibility of a digital object
over the long term, usually as a consequence of
applying one or more digital preservation
strategies.
These strategies may include technology
preservation, technology emulation or data
migration.

Source: The NINCH Guide to Good Practice in the Digital Representation and
Management of Cultural Heritage Materials (2002).

Born Digital - digital materials which are
created and retained in digital form.

May or may not have a non-digital equivalent.

Source: Government Recordkeeping Group, Archives New Zealand. Continuum
Create and Maintain: Digitisation Standard (2005).

Digital Repository / Archive - a digital repository is
where digital content, assets, are stored and can
be searched and retrieved for later use.
A repository supports mechanisms to import,
export, identify, store and retrieve digital assets.
Putting digital content into a repository enables
staff and institutions to then manage and
preserve it, and therefore derive maximum value
from it.
Digital repositories may include research outputs
and journal articles, theses, e-learning objects and
teaching materials, or research data.

Source: Digital Repositories: Helping universities and colleges. JISC, August
2005.

Master - a faithful digital reproduction of
a document, optimized for longevity and
for production of a range of delivery
versions (derivatives).

Masters are captured at the highest
practicable quality or resolution and stored
for long-term usage.
Typically, masters are stored in an off-line
mode on tape or CD and are accessed only
for the production of derivative images.

Source: Government Recordkeeping Group, Archives New Zealand.
Continuum Create and Maintain: Digitisation Standard (2005).

Derivative - an image created from the master
image, through some kind of image editing
process to create a user or working copy.
The process usually involves a loss of
information to reduce the size by sampling it to
a lower resolution, using lossy compression
techniques, or altering an image using image
processing techniques.
Typically, derivatives are made for purposes
such as web access, including thumbnail
images, or as reference or service images
that should fit completely within an average
monitor.

Source: Government Recordkeeping Group, Archives New Zealand.
Continuum Create and Maintain: Digitisation Standard (2005).

Digital images - electronic snapshots taken of
a scene or scanned from documents, such as
photographs, manuscripts, printed texts, and
artwork.

The digital image is sampled and mapped as
a grid of dots or picture elements (pixels).
Each pixel is assigned a tonal value (black,
white, shades of gray or color), which is
represented in binary code (zeros and ones).

Resolution - a measure of the ability to capture
detail in the original work.

The spatial frequency at which a digital image
is sampled (the sampling frequency) is often
a good indicator of resolution.
Dots-per-inch (dpi) or pixels-per-inch (ppi)
are common and synonymous terms used to
express resolution for digital images.

Pixel Dimensions - the horizontal and vertical
measurements of an image expressed in
pixels.

May be determined by multiplying both the
width and the height by the dpi.
Example: an 8" x 10" document scanned at
300 dpi has the pixel dimensions of 2,400
pixels (8" x 300 dpi) by 3,000 pixels (10" x
300 dpi).
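The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `pixel_dimensions` is a hypothetical helper, not part of any scanning software.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width_px, height_px) for a document scanned at the given dpi."""
    # Each physical inch is sampled `dpi` times along both axes.
    return round(width_in * dpi), round(height_in * dpi)


# The 8" x 10" document from the example, scanned at 300 dpi.
print(pixel_dimensions(8, 10, 300))  # (2400, 3000)
```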

Bit Depth - determined by the number of bits
used to define each pixel.

The greater the bit depth, the greater the
number of tones (grayscale or color) that can
be represented.
Digital images may be produced in black and
white (bitonal), grayscale, or color.

Bit Depth

A bitonal image is represented by pixels
consisting of 1 bit each, which can represent
two tones (typically black and white), using
the values 0 for black and 1 for white or vice
versa.
A grayscale image is composed of pixels
represented by multiple bits of information,
typically ranging from 2 to 8 bits or more.

Bit Depth

A color image is typically represented by a bit
depth ranging from 8 to 24 or higher.
With a 24-bit image, the bits are often
divided into three groupings: 8 for red, 8 for
green, and 8 for blue. Combinations of those
bits are used to represent other colors.
A 24-bit image offers 16.7 million (2^24)
color values.
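The relationship between bit depth and the number of representable tones is a power of two. A minimal sketch, in which the helper name `tonal_values` is an assumption made for illustration:

```python
def tonal_values(bit_depth: int) -> int:
    """Number of distinct tones a pixel of the given bit depth can represent."""
    return 2 ** bit_depth


# Bitonal, 8-bit grayscale, and 24-bit color.
for bits in (1, 8, 24):
    print(bits, tonal_values(bits))
# 1 2
# 8 256
# 24 16777216
```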

File Size - calculated by multiplying the surface
area of a document (height x width) to be
scanned by the bit depth and the dpi^2.
Because image file size is represented in
bytes, which are made up of 8 bits, divide
this figure by 8.

Formula 1 for File Size

FS = (height x width x bit depth x dpi^2) / 8

File Size

Example: Compute the file size of a US Letter
size page captured in 8-bit grayscale at
100 dpi.
FS = (8.5 x 11 x 8 x 100^2) / 8
FS = 935,000 bytes.
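Formula 1 translates directly into code. The sketch below assumes an uncompressed image with no header overhead; `file_size_bytes` is a hypothetical helper name.

```python
def file_size_bytes(height_in: float, width_in: float, bit_depth: int, dpi: int) -> float:
    """Uncompressed file size in bytes: (height x width x bit depth x dpi^2) / 8."""
    total_bits = height_in * width_in * bit_depth * dpi ** 2
    return total_bits / 8  # 8 bits per byte


# US Letter page (8.5" x 11") in 8-bit grayscale at 100 dpi.
print(file_size_bytes(11, 8.5, 8, 100))  # 935000.0
```

Because dpi is squared, doubling the scanning resolution quadruples the uncompressed file size.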

File Size

If the pixel dimensions are given, multiply
them by each other and the bit depth to
determine the number of bits in an image file,
then divide by 8 to convert bits to bytes.

Formula 2 for File Size

FS = (pixel dimensions x bit depth) / 8

File Size

Example: Compute the file size of a 24-bit
image captured with a digital camera with
pixel dimensions of 2,048 x 3,072.
FS = (2048 x 3072 x 24)/8
FS = 18,874,368 bytes.
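Formula 2 can be sketched the same way. The helper name `file_size_from_pixels` is an assumption, and the result is rounded up to a whole byte so that 1-bit images whose bit count is not a multiple of 8 are handled sensibly.

```python
def file_size_from_pixels(width_px: int, height_px: int, bit_depth: int) -> int:
    """Uncompressed file size in bytes: (pixel dimensions x bit depth) / 8."""
    total_bits = width_px * height_px * bit_depth
    return (total_bits + 7) // 8  # round up to a whole number of bytes


size = file_size_from_pixels(2048, 3072, 24)
print(size)             # 18874368
print(size / 1024 ** 2)  # 18.0 (mebibytes)
```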

Compression - algorithms designed to reduce the
size of the image for storage or transmission.

Lossless schemes (e.g., ITU-T.6) abbreviate the
binary code without discarding any information,
so that when the image is "decompressed" it is
bit for bit identical to the original. Most often
used with bitonal scanning of textual material.
Lossy schemes (e.g., JPEG) utilize a means for
averaging or discarding the least significant
information, based on an understanding of visual
perception. Typically used with tonal images.
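The difference between the two families can be demonstrated with Python's standard library. This is a rough sketch under stated assumptions: zlib (DEFLATE) stands in for a lossless scheme, and discarding the four least significant bits of each pixel is a crude stand-in for lossy behavior. It is not how JPEG actually works, and the sample pixel data is made up.

```python
import zlib

# Hypothetical 8-bit grayscale data: a slow gradient with a little variation.
pixels = bytes((i // 4 + (i % 3)) % 256 for i in range(1024))

# Lossless: the decompressed data is bit for bit identical to the original.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
print(restored == pixels)  # True

# Lossy (illustrative only): throw away the least significant bits first.
quantized = bytes(p & 0xF0 for p in pixels)
packed_lossy = zlib.compress(quantized)
print(quantized == pixels)  # False, the discarded detail cannot be recovered

# Compare original, lossless, and lossy sizes in bytes.
print(len(pixels), len(packed), len(packed_lossy))
```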

File Formats - consist of both the bits that
comprise the image and header information
on how to read and interpret the file.

File formats vary in terms of resolution, bit depth,
color capabilities, and support for
compression and metadata.
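To make the "header plus bits" idea concrete, the sketch below builds a file in the binary PGM (Netpbm "P5") format, chosen here only because its header is simple enough to read at a glance. PGM is not one of the formats discussed in these slides. The function names are hypothetical, and the parser assumes exactly the header layout written by `encode_pgm` (no comment lines).

```python
def encode_pgm(width: int, height: int, pixels: bytes) -> bytes:
    """Build an 8-bit grayscale PGM file: a text header followed by raw pixel bytes."""
    if len(pixels) != width * height:
        raise ValueError("pixel data does not match the stated dimensions")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


def decode_pgm(data: bytes) -> tuple[str, int, int, int, bytes]:
    """Read the header back to learn how to interpret the pixel bytes."""
    magic, dims, maxval, body = data.split(b"\n", 3)
    width, height = map(int, dims.split())
    return magic.decode("ascii"), width, height, int(maxval), body


image = encode_pgm(3, 2, bytes([0, 64, 128, 192, 255, 32]))
print(decode_pgm(image)[:4])  # ('P5', 3, 2, 255)
```

Without the header, the six pixel bytes could equally be read as a 2 x 3 or a 6 x 1 image, which is why the header is part of the format.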

Optical Character Recognition (OCR) - a
technology that enables you to convert
different types of documents, such as
scanned paper documents, PDF files or
images captured by a digital camera into
editable and searchable data.

Source: http://finereader.abbyy.com/about_ocr/whatis_ocr/

Quality (usability, functionality)
Persistence (long-term access)
Interoperability (e.g., across platforms and
software environments)
Storage Space (file size)
Storage Hardware
Storage Media (e.g., DVDs, CDs)

Master copies should be created to the
highest technical standards achievable.
Image formats should be open (non-proprietary),
with published technical specifications
available in the public domain.
Image formats should be widely supported by
many software applications and operating
systems.

Digitize an original or first generation (i.e.,
print rather than microfilm) of the source
material to achieve the best quality image
possible.
Create backup copies of all files on servers
and storage media (e.g., DVDs) and have an
off-site backup strategy.
Create meaningful metadata for image files or
collections.

Prior to digitization, third-party copyright
or other constraints inherent in the record
should be resolved.
OCR should be performed on all digital
reproductions where the content is primarily
textual and computer-processed. Collections
that are photographic in nature and those not
computer-processed do not require OCR.
Plan for future technological developments
and migration.

Tagged Image File Format (TIFF)
Extensions: .tif, .tiff
Bit-depths: 1-bit bitonal; 4- or 8-bit
grayscale or palette color; up to 64-bit color.
Compression:
Uncompressed
Lossless: ITU-T.6, LZW, etc.
Lossy: JPEG
Standard/Proprietary: De facto standard.
Web Support: Plug-in or external application.
Supports multiple images/file (multi-page).

Joint Photographic Experts Group (JPEG) / JPEG
File Interchange Format (JFIF)
Extensions: .jpg, .jpeg, .jif, .jfif
Bit-depths: 8-bit grayscale; 24-bit color.
Compression: Lossless; Lossy: JPEG.
Standard/Proprietary: JPEG: ISO 10918-1/2;
JFIF: de facto standard.
Web Support: Native since Microsoft Internet
Explorer 2, Netscape Navigator 2.

JP2-JPX / JPEG 2000
Extensions: .jp2, .jpx, .j2k, .j2c
Bit-depths: supports up to 2^14 channels, each
with 1-38 bits; gray or color.
Compression:
Uncompressed
Lossless/Lossy: Wavelet.
Standard/Proprietary: JPEG 2000: ISO/IEC 15444
parts 1-6, 8-11.
Web Support: Plug-in.

Portable Document Format (PDF)
Extension: .pdf
Bit-depths: 4-bit grayscale; 8-bit color; up to
64-bit color support.
Compression:
Uncompressed
Lossless: ITU-T.6, LZW, JBIG
Lossy: JPEG
Standard/Proprietary: De facto standard.
Web Support: Plug-in or external application.
Contains OCR text layer.

DjVu (pronounced "déjà vu")
Extension: .djvu
Bit-depths: 1-bit bitonal, 4- to 8-bit
grayscale; 24-bit color support.
Compression: Lossless: JB2, IW44; Lossy.
Standard/ Proprietary: Emerging standard.
Web Support: Plug-in or external application.
Supports multiple images/file (multi-page).
Contains OCR text layer.

DjVu
High-quality image compression technique:
Scanned bitonal at 300 dpi: 5-40 KB per page (3-10
times smaller than TIFF/G4).
5-10 times smaller than JPEG or PDF.

Image Masters
Preservation / Archive Copy
Uncompressed
Highest possible quality recommended

Derivatives
Display / Viewing / Reading
Printing
Thumbnails

Image Masters
TIFF
JPEG (if using digital cameras)

Derivatives / Deliverables
Text/ Documents: PDF, DjVu
Photographs: PNG, DjVu

Black and White
File Format: TIFF
Compression: Uncompressed or lossless
compressed using CCITT Group 4 (ITU-T.6)
Resolution and Bit Depth: 600 dpi, bitonal

Grayscale
File Format: TIFF
Compression: Uncompressed or lossless
compressed using LZW or JPEG 2000
Resolution and Bit Depth: 300 dpi, 8-bit grayscale

Color
File Format: TIFF
Compression: Uncompressed or lossless
compressed using LZW or JPEG 2000
Resolution and Bit Depth: 300 dpi, 24-bit color
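To get a feel for the storage these master settings imply, the sketch below applies the file size formula to an uncompressed US Letter page (8.5" x 11") under each of the three profiles above. The page size, the dictionary `MASTER_PROFILES`, and the helper `uncompressed_bytes` are assumptions made for illustration.

```python
# Resolution and bit depth for each master profile listed above.
MASTER_PROFILES = {
    "bitonal": {"dpi": 600, "bit_depth": 1},
    "grayscale": {"dpi": 300, "bit_depth": 8},
    "color": {"dpi": 300, "bit_depth": 24},
}


def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bit_depth: int) -> int:
    """Uncompressed size in bytes: (height x width x bit depth x dpi^2) / 8."""
    return int(width_in * height_in * bit_depth * dpi ** 2 / 8)


for name, profile in MASTER_PROFILES.items():
    print(name, uncompressed_bytes(8.5, 11, **profile))
# bitonal 4207500
# grayscale 8415000
# color 25245000
```

Under these assumptions, a color master is about six times the size of a bitonal one, even though the bitonal page is scanned at twice the resolution.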

Thumbnail
File Format: JPEG
Compression: Lossy
Resolution: 72-100 dpi

View / Service copy
File Format: JPEG / PDF / DjVu
Compression: Lossy
Resolution: 72-100 dpi

Print Copy (PDF/DjVu)
File Format: PDF / DjVu
Compression: Lossy
Resolution: 100-150 dpi
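When a derivative is produced by downsampling a master, its pixel dimensions follow from the ratio of the two resolutions. The sketch below only does the arithmetic (it does not resample any image), and the helper name `derivative_dimensions` and the sample master size are assumptions.

```python
def derivative_dimensions(master_px: tuple[int, int], master_dpi: int, target_dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a derivative downsampled from master_dpi to target_dpi."""
    width_px, height_px = master_px
    return (
        round(width_px * target_dpi / master_dpi),
        round(height_px * target_dpi / master_dpi),
    )


# A US Letter master scanned at 300 dpi is 2550 x 3300 pixels.
master = (2550, 3300)
print(derivative_dimensions(master, 300, 72))   # (612, 792)   thumbnail / view copy
print(derivative_dimensions(master, 300, 100))  # (850, 1100)  view copy
print(derivative_dimensions(master, 300, 150))  # (1275, 1650) print copy
```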

Flatbed Scanner
Best-known and largest-selling scanner type.

Sheet Feed Scanner
Uses the same basic technology as flatbeds, but
maximizes throughput, usually at the expense of
quality.
Designed for high-volume scanning.

Overhead Scanner
High-speed book scanner.
Sometimes referred to as a planetary scanner.
Bound volumes can be placed face up for scanning.

V-Shaped Book Scanner
Uses digital SLR cameras and a unique v-shaped,
auto-adjusting book cradle and platen to capture
sharp images at up to 700 pages an hour.
Natively captures flat images.
No need for page curvature correction.

Image Capture and Processing

IrfanView (Freeware)
Image capture, conversion, processing

Adobe Acrobat (Proprietary)
PDF creation, conversion, processing
OCR
Watermarks

Document Express Editor (Proprietary)
DjVu creation, conversion, processing
OCR

Image Capture
Image Processing
Quality Control
Delivery
Storage and Backup

Document(s) or other materials are captured
in digital form using a scanner or digital
camera.
Guidelines and Procedures:
Pre-scanning
Preparing item level inventory list

Copyright Statement
Should accompany each digital file.
If accessed from the web, copyright statement can be
displayed on the website (if the same rights apply to
all items on the site).

Image editing (if necessary)
Compression of files, sharpening of images,
deskewing, image rotation, cropping, deleting and
reordering pages.

Optical Character Recognition
Creating Derivatives
Adding Watermarks
Adding Security (e.g., restrictions on copying,
printing, or extraction, and password
protection)
Creation of metadata describing the scanned
materials.

What to look for when checking digital
images for quality:
Missing pages.
Incorrect order of pages.
Pages of different sizes.
Readability of text.
Black or white areas on parts of the page that
cover the content.
Image not the correct size
Image in wrong resolution
Image in wrong file format

Image in wrong mode or bit-depth
Overall light problems (e.g., too dark)
Loss of detail in highlights or shadows
Poor contrast
Uneven tone or flares
Missing scan lines or dropped-out pixels
Lack of sharpness
Excessive sharpening
Image in wrong orientation

Image not centered or skewed
Incomplete or cropped images
Excessive noise (see dark areas)
Misaligned color channels
Image processing and scanner artifacts (e.g.,
extraneous lines, noise, banding)
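Some of the checks above, such as missing pages, can be partly automated. The sketch below assumes a hypothetical naming scheme in which each scanned page is saved as `page_NNNN.tif`. The pattern and the function name `find_missing_pages` are illustrative, not something the source prescribes. It can only detect gaps between the lowest and highest page numbers present, so pages missing from the very start or end still need a manual check against the inventory list.

```python
import re


def find_missing_pages(filenames: list[str], pattern: str = r"page_(\d+)\.tif$") -> list[int]:
    """Return page numbers missing between the lowest and highest numbers found."""
    numbers = set()
    for name in filenames:
        match = re.search(pattern, name)
        if match:
            numbers.add(int(match.group(1)))
    if not numbers:
        return []
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in numbers]


scans = ["page_0001.tif", "page_0002.tif", "page_0004.tif", "page_0005.tif"]
print(find_missing_pages(scans))  # [3]
```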

The process of getting the scanned images to
the user through computer networks/Web,
monitors, and printers.
Delivery Methods
Removable Storage Devices
Optical Media (CDs, DVDs)
Static Web Pages
Digital Repositories

Recommended Digital Repository software:
EPrints
DSpace
Greenstone

Strategies for storage and backup may
include:
Dedicated server or shared storage solution.
Database Systems
File-based Systems (FTP, WebDAV, Shared Folders)
Writing the digitized records to magnetic tape.
Writing the digitized records to optical media (e.g.,
CD, DVD).
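Whichever storage strategy is chosen, it helps to confirm that a backup copy is identical to the file it was made from. One common way to do this, not prescribed by the source, is to compare checksums. The sketch below uses SHA-256 from Python's standard library. The function names and the example paths are hypothetical.

```python
import hashlib


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original_path: str, backup_path: str) -> bool:
    """True if the backup copy has the same checksum as the original file."""
    return file_checksum(original_path) == file_checksum(backup_path)


# Example usage with hypothetical paths:
# backup_matches("masters/page_0001.tif", "/mnt/backup/masters/page_0001.tif")
```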
