ECONOMIC GROWTH CENTER DIGITAL LIBRARY
Supporting Economic Development Research:
A Collaborative Project to Create Access to Statistical Sources
Not Born Digital
Report (PDF, 1.2 MB)
submitted to the Andrew W. Mellon Foundation on April 27, 2005
Social Science Research Services and Social Science Libraries and
Information Services at Yale University have built
a prototype statistical digital library called the Economic Growth
Center Digital Library (EGCDL). This project, funded by the Andrew
W. Mellon Foundation, digitized a selection of Mexican state
statistical abstracts and a selection of Nigerian commodity price
statistics volumes from the Yale University Library's Economic
Growth Center Library Collection.
The Mexican statistical abstracts (Anuarios Estadísticos
de los Estados, Instituto Nacional de Estadística, Geografía
e Informática [INEGI]) provide annual data at the state
and municipal levels and cover a variety of social and economic
indicators, including education, employment, agricultural and industrial
production, and service sector activity. The digital collection spans
the years 1994-2000 for all 31 Mexican states.
The project was extended to a series of Nigerian commodity price
statistics volumes covering Nigeria and 16 Nigerian states over
the years 1977-2000. This extension allowed the digitization standards,
procedures, costs, outputs, and metadata production processes to be
compared with those used for the Mexican series.
In a departure from most digital libraries, which concentrate on
images or texts, EGCDL focuses on statistical tables. The EGCDL
project brings the expertise of data archivists and data management
experts into the world of digital library production, and contributes
to the advancement of digital libraries by addressing issues and
challenges unique to statistical materials, including:
- Evaluating whether common digitization practices and standards, which were generally developed for images and text, are ideally suited to statistically intensive documents
- Automating metadata production for thousands of PDF files and Excel tables
- Designing a user interface to present the PDF versions of the statistical abstracts along with individual tables from the series.
The EGCDL is an extension of the Economic Growth Center Library Collection (EGCLC) at Yale University.
The EGCLC, one of the most comprehensive collections of its kind in the United States, focuses on materials relating to statistics,
economics, and planning in over 100 developing countries. It provides a historical perspective on current research in globalization,
urban studies, and development policies.
Digital equivalents of the Mexico statistical series and the Nigeria
commodity price volumes were produced in Adobe Portable Document
Format (PDF) and as archival TIFF images. From the Mexico statistical
abstracts, tables from the demographic and economic chapters for
even-numbered years in the series (1994, 1996, 1998, and 2000) were
digitized into Microsoft Excel spreadsheets. The table digitization
was an automated process; numeric values from the tables were not
manually keyed. PDF and Excel files are available on the project website.
The project team also produced detailed metadata for the Excel
statistical tables in XML according to the Data Documentation Initiative
(DDI) specification for numeric data. Users can search for and display
specific individual tables, and download tables and metadata from
the statistical series for use in statistical analysis packages.
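
As a rough illustration of what table-level metadata in XML can look like, the sketch below builds a minimal record using element names from the DDI Codebook vocabulary (codeBook, stdyDscr, fileDscr, dataDscr, var). It is not the project's actual metadata workflow or schema profile: the function name build_table_record, the example title, the file name, and the column names are all hypothetical, and a real DDI record would carry many more elements and would need to be validated against the DDI schema.

```python
import xml.etree.ElementTree as ET


def build_table_record(title, file_name, columns):
    """Return a minimal DDI-Codebook-style XML string for one table.

    title     -- descriptive title of the statistical table (hypothetical input)
    file_name -- name of the spreadsheet file holding the table
    columns   -- list of (name, label) pairs, one per table column
    """
    root = ET.Element("codeBook")

    # Study-level description: here only the table title.
    stdy = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(stdy, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title

    # File-level description: which file contains the table.
    file_dscr = ET.SubElement(root, "fileDscr")
    file_txt = ET.SubElement(file_dscr, "fileTxt")
    ET.SubElement(file_txt, "fileName").text = file_name

    # Variable-level description: one <var> per column, with a label.
    data_dscr = ET.SubElement(root, "dataDscr")
    for name, label in columns:
        var = ET.SubElement(data_dscr, "var", name=name)
        ET.SubElement(var, "labl").text = label

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only; not taken from the EGCDL collection.
    print(build_table_record(
        "Example table title",
        "example_table.xls",
        [("municipality", "Municipality name"),
         ("population", "Total population")],
    ))
```

Generating records from a function like this, fed by a list of tables and their column headings, is one way metadata production for a large number of files could be automated; the project's own tooling is not described here.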
This project also addresses the long-term preservation of the digital materials it produced
and their relationship to the original printed source materials in the collection.
Principal Investigators and Project Staff