Supporting Economic Development Research:
A Collaborative Project to Create Access to Statistical Sources Not Born Digital

Final Report (PDF, 1.2 MB)
submitted to the Andrew W. Mellon Foundation on April 27, 2005


Social Science Research Services and Social Science Libraries and Information Services at Yale University have built a prototype statistical digital library called the Economic Growth Center Digital Library (EGCDL). This project, funded by the Andrew W. Mellon Foundation, digitized a selection of Mexican state statistical abstracts and a selection of Nigerian commodity price statistics volumes from the Yale University Library's Economic Growth Center Library Collection.

The Mexican statistical abstracts (Anuarios Estadísticos de los Estados, Instituto Nacional de Estadística, Geografía e Informática [INEGI]) provide annual data at the state and municipal levels and cover a variety of social and economic indicators, including education, employment, agricultural and industrial production, and service sector activity. The digital collection spans the years 1994-2000 for all 31 Mexican states.

The project was then extended to a series of Nigerian commodity price statistics volumes covering Nigeria and 16 Nigerian states over the years 1977-2000. This extension allowed the digitization standards, procedures, costs, outputs, and metadata production processes to be compared with those used for the Mexican series.

In a departure from most digital libraries, which concentrate on images or texts, EGCDL focuses on statistical tables. The EGCDL project brings the expertise of data archivists and data management experts into the world of digital library production, and contributes to the advancement of digital libraries by addressing issues and challenges unique to statistical materials, including:

  • Evaluating whether common digitization practices and standards, which were generally developed for images and text, are well suited to statistically intensive documents
  • Automating metadata production for thousands of PDF files and Excel tables
  • Designing a user interface to present the PDF versions of the statistical abstracts along with individual tables from the series

The EGCDL is an extension of the Economic Growth Center Library Collection (EGCLC) at Yale University. The EGCLC, one of the most comprehensive collections of its kind in the United States, focuses on materials relating to statistics, economics, and planning in over 100 developing countries. It provides a historical perspective on current research in globalization, urban studies, and development policies.

Digital equivalents of the Mexico statistical series and the Nigeria commodity price volumes were produced in Adobe Portable Document Format (PDF) and as archival TIFF images. From the Mexico statistical abstracts, tables from the demographic and economic chapters for even-numbered years in the series (1994, 1996, 1998, and 2000) were digitized into Microsoft Excel spreadsheets. The table digitization was an automated process; numeric values from the tables were not manually keyed. PDF and Excel files are available on the project web site.

The project team also produced detailed metadata for the Excel statistical tables in XML according to the Data Documentation Initiative (DDI) specification for numeric data. Users can search for and display specific individual tables, and download tables and metadata from the statistical series for use in statistical analysis packages.
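As an illustration of what table-level metadata in this style can look like, the following is a minimal sketch in Python. It is not the project's actual tooling: the function name build_table_record and the sample values are hypothetical, and the element names (codeBook, stdyDscr, fileDscr, dataDscr, var, labl) are taken from the DDI Codebook structure as an assumption about which parts of the specification would apply to a single table.

```python
import xml.etree.ElementTree as ET


def build_table_record(title, file_name, columns):
    """Return a minimal DDI-Codebook-style XML string for one table.

    Hypothetical helper for illustration only.
    columns is a list of (name, label) pairs, one per table column.
    """
    root = ET.Element("codeBook")

    # Study-level description: only the title is filled in here.
    stdy = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(stdy, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title

    # File-level description: the spreadsheet that holds the table.
    file_dscr = ET.SubElement(root, "fileDscr")
    file_txt = ET.SubElement(file_dscr, "fileTxt")
    ET.SubElement(file_txt, "fileName").text = file_name

    # Variable-level description: one <var> per column, with a label.
    data_dscr = ET.SubElement(root, "dataDscr")
    for name, label in columns:
        var = ET.SubElement(data_dscr, "var", name=name)
        ET.SubElement(var, "labl").text = label

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values are invented for the example, not taken from the collection.
    print(build_table_record(
        "Example table title",
        "example_table.xls",
        [("col1", "First column label"), ("col2", "Second column label")],
    ))
```

Running a function of this kind over a list of tables is one way metadata production for thousands of files could be automated; a real record would carry far more detail than this sketch.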

The project also addresses the long-term preservation of the digital materials it produced and their relationship to the original printed source materials in the collection.

Principal Investigators and Project Staff

