StatCat Search Help

Words in a search are automatically connected with AND. That is, a search on income education will find records with both the word income AND the word education.

Use Boolean operators (AND, OR, NOT) to combine search terms. The operators AND, OR, and NOT must be typed in capital letters.
For example: income OR education finds records containing either word.
For example: income NOT education finds records containing the word income but not the word education.
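The Boolean operators can be pictured as set operations on the words in each record. The following Python lines are only an illustrative sketch: the sample records and the function names are made up for this example, and this is not how StatCat itself is implemented.

```python
# Hypothetical sample records; real StatCat records have many fields.
records = [
    "income and education in urban households",
    "household income survey",
    "education attainment by state",
]

def words(record):
    """Return the set of lowercase words in a record."""
    return set(record.lower().split())

def search_and(a, b):
    """Records containing both words (the default when no operator is typed)."""
    return [r for r in records if a in words(r) and b in words(r)]

def search_or(a, b):
    """Records containing either word."""
    return [r for r in records if a in words(r) or b in words(r)]

def search_not(a, b):
    """Records containing the first word but not the second."""
    return [r for r in records if a in words(r) and b not in words(r)]

print(search_and("income", "education"))  # only the first record
print(search_or("income", "education"))   # all three records
print(search_not("income", "education"))  # only "household income survey"
```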

Use quotation marks to search phrases. For example: "current population survey"

Search terms are not case-sensitive.

Truncation and wildcards:
Use ? for a single character wildcard search.
For example: wom?n finds women or woman.
Use * for a multiple character wildcard search or to truncate a word.
For example: test* finds test, tests, testing, etc.
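
The two wildcards behave much like the ? and * patterns in Python's standard fnmatch module, which can be used to try patterns out. This is a rough sketch, not StatCat's own matching code: the helper name wildcard_match is invented for this example, and fnmatch also gives square brackets a special meaning that the search syntax described here does not mention.

```python
from fnmatch import fnmatchcase

def wildcard_match(pattern, word):
    """Return True if word matches pattern, where ? stands for exactly one
    character and * stands for any run of characters (including none).
    Both sides are lowercased because search terms are not case-sensitive."""
    return fnmatchcase(word.lower(), pattern.lower())

print(wildcard_match("wom?n", "women"))    # True
print(wildcard_match("wom?n", "woman"))    # True
print(wildcard_match("test*", "testing"))  # True
print(wildcard_match("test*", "contest"))  # False: the word must start with "test"
```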

Use + to indicate that a term must appear. For example, income +education will locate records containing the word "education"; if the record also contains the word "income", it will be ranked higher in the results.
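
The effect of + can be sketched as a two-step process: keep only the records that contain every required term, then rank them by how many of the other terms they also contain. The Python below is a simplified illustration under that assumption. The records are invented, and counting matching terms stands in for the search engine's real relevance scoring, which is more elaborate.

```python
def search(query, records):
    """Filter on +required terms, then rank by the number of other terms matched."""
    terms = query.lower().split()
    required = [t[1:] for t in terms if t.startswith("+")]
    optional = [t for t in terms if not t.startswith("+")]
    hits = []
    for record in records:
        found = set(record.lower().split())
        if all(t in found for t in required):
            # Assumed scoring: one point per optional term present.
            score = sum(t in found for t in optional)
            hits.append((score, record))
    # Highest score first; the sort is stable, so ties keep their input order.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [record for _, record in hits]

# Hypothetical sample records.
sample = [
    "education attainment by state",
    "income and education in urban households",
    "household income survey",
]
for record in search("income +education", sample):
    print(record)
# income and education in urban households
# education attainment by state
```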

See the Jakarta Lucene Query Parser Syntax page for information on constructing additional types of searches.

"Keywords anywhere" searches the following fields (for field definitions, see below):

Abstract
Author
Bibliographic Citation
Data Collector
Distributor
Geographic Coverage
Geographic Unit
Holding Notes
Keyword
Producer
Related Materials
Related Publications
Related Studies
Series Name
Series Information
Title

Subject headings are no longer used in StatCat and the "Browse Subjects" function is no longer available.

StatCat field definitions

Abstract
Author
Bibliographic Citation
Case Count
Class or Status of the Study
Collection Notes
Data Access Information
Data Appraisal
Data Collector
Data Format
Data Source
Date of Collection
Date of Production
Distributor
Extent of Collection
Extent of Processing Checks
File Structure
File Type
Frequency of Data Collection
Funding Agency
Geographic Coverage
Geographic Unit
Grant Number
Holding Notes
Keyword
Kind of Data
Logical Record Length
Media
Mode of Data Collection
Nation
Place of Production
Producer
Records Per Case
Related Materials
Related Publications
Related Studies
Response Rates
Restrictions
Sampling
Series Name and Series Description
Study Number
Time Method
Time Period
Total Number of Records
Unit of Observation
Universe
Variable Count
Version History

Abstract
A summary description of the data collection.

Author
The name of the study's principal investigator(s). Authors can be individuals, organizations, or a combination of both.

Bibliographic Citation
Use the bibliographic citation in papers or publications based on the data. For more information on citing ICPSR versions of studies, see Citing Electronic Data Files.

Case Count
The number of cases or observations in the data file. For hierarchical data files and non-data files, this field is filled in as "inap."

Class or Status of the Study
Indicates the processing status of the study; data distributors may use a class or study status number to indicate processing status.

Collection Notes
Used to describe details about the data collection that are not recorded in other fields.

Data Access Information
Links to details of data holdings available on CD-ROM or on the Statlab server at Yale or available on the Internet.

Data Appraisal
Describes issues such as response variance, nonresponse rate and testing for bias, interviewer and response bias, confidence levels, question bias, etc.

Data Collector
The individual, agency, or institution responsible for administering the questionnaire or interview or compiling the data.

Data Format
Physical format of the data file, such as: logical record length format (LRECL), card image (i.e. data with multiple records per case), OSIRIS, SPSS Portable, SAS Transport, delimited format, etc. For more information, see ICPSR's Types of Data Formats and Types of Data Structures.

Data Source
Describes the type of technique or data collection instrument used to collect the data. Also used to list any book(s), article(s), serial(s), and/or machine-readable data file(s) that served as the source(s) of the data file.

Date of Collection
Date(s) when the data were collected (as distinguished from Date of Production or Time Period).

Date of Production
Date the data collection was produced (as distinguished from Date of Collection or Time Period).

Distributor
The organization designated by the author or producer to generate copies of a particular data collection including any necessary editions or revisions. Examples: ICPSR, Roper Center.

Extent of Collection
Summarizes the number of physical files that exist in a collection, recording the number of files that contain data and noting whether the collection contains machine-readable documentation and/or other supplementary files and information.

Extent of Processing Checks
This field contains abbreviations that describe processing activities and checks performed on data collections either by ICPSR or by others.

ICPSR's Extent of Processing Key:

CDBK.ICPSR = ICPSR produced a codebook for this collection.
CONCHK.PR = Consistency checks performed by Data Producer/ Principal Investigator.
CONCHK.ICPSR = Consistency checks performed by ICPSR.
DDEF.ICPSR = ICPSR generated SAS and/or SPSS data definition statements for this collection.
FREQ.PR = Frequencies provided by Data Producer/Principal Investigator.
FREQ.ICPSR = Frequencies provided by ICPSR.
MDATA.PR = Missing data codes standardized by Data Producer/Principal Investigator.
MDATA.ICPSR = Missing data codes standardized by ICPSR.
RECODE = ICPSR performed recodes and/or calculated derived variables.
REFORM.DATA = Data reformatted by ICPSR.
REFORM.DOC = Documentation reformatted by ICPSR.
SCAN = Hardcopy documentation converted to machine-readable form by ICPSR.
UNDOCCHK.PR = Checks for undocumented codes performed by Data Producer/Principal Investigator.
UNDOCCHK.ICPSR = Checks for undocumented codes performed by ICPSR.
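
For readers who handle these abbreviations in bulk, the key above can be held in a simple lookup table. The Python sketch below is an illustration only: the function name expand_codes is hypothetical, and the assumption that a field value lists its codes separated by commas or semicolons is a guess, so check it against actual records.

```python
# Lookup table copied from ICPSR's Extent of Processing Key above.
PROCESSING_KEY = {
    "CDBK.ICPSR": "ICPSR produced a codebook for this collection.",
    "CONCHK.PR": "Consistency checks performed by Data Producer/Principal Investigator.",
    "CONCHK.ICPSR": "Consistency checks performed by ICPSR.",
    "DDEF.ICPSR": "ICPSR generated SAS and/or SPSS data definition statements for this collection.",
    "FREQ.PR": "Frequencies provided by Data Producer/Principal Investigator.",
    "FREQ.ICPSR": "Frequencies provided by ICPSR.",
    "MDATA.PR": "Missing data codes standardized by Data Producer/Principal Investigator.",
    "MDATA.ICPSR": "Missing data codes standardized by ICPSR.",
    "RECODE": "ICPSR performed recodes and/or calculated derived variables.",
    "REFORM.DATA": "Data reformatted by ICPSR.",
    "REFORM.DOC": "Documentation reformatted by ICPSR.",
    "SCAN": "Hardcopy documentation converted to machine-readable form by ICPSR.",
    "UNDOCCHK.PR": "Checks for undocumented codes performed by Data Producer/Principal Investigator.",
    "UNDOCCHK.ICPSR": "Checks for undocumented codes performed by ICPSR.",
}

def expand_codes(field_value):
    """Split a field value into codes and pair each with its description.
    Assumes codes are separated by commas or semicolons (not confirmed)."""
    parts = field_value.replace(";", ",").split(",")
    codes = [p.strip().upper() for p in parts if p.strip()]
    return [(code, PROCESSING_KEY.get(code, "unknown code")) for code in codes]

for code, meaning in expand_codes("CONCHK.ICPSR, SCAN"):
    print(code, "=", meaning)
```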

File Structure
Used to describe the structure of the file/part -- rectangular, hierarchical, relational. Note, "inap." will appear in this field for codebook files, dictionary files, and other non-data files. For more information, see ICPSR's Types of Data Structures.

File Type
Types of data files include raw data (ASCII, EBCDIC, etc.) and software-dependent files (SAS, SPSS, etc.).

Frequency of Data Collection
Used if data were collected at more than one point in time (e.g. monthly, quarterly).

Funding Agency
The source(s) of funds for production of the data collection.

Geographic Coverage
Geographic scope of the data; may include additional levels of geographic coding provided in the variables.

Geographic Unit
Lowest level of geographic aggregation covered by the data (e.g. state).

Grant Number
The grant or contract number of the project that sponsored the data collection.

Holding Notes
Details distinguishing a particular holding of a dataset.

Keyword
Words or phrases that describe a data collection's content. In StatCat, "keywords anywhere" searches the keyword field as well as several others (see above).

Kind of Data
Examples of different kinds of data include:

  1. census/enumeration data--data collected from all members of a population
  2. aggregate data--summarized statistical data for an entire population
  3. clinical data--when the data deal with psychological or medical-related testing
  4. event/transaction data--when the data deal with a succession of events or transactions that occur over a specified time period
  5. survey data--data collected from a sample of respondents, generally through structured interviews or self-administered questionnaires
  6. program source code--when the file consists of computer program language
  7. machine-readable text--when the file is composed solely of computer-readable text
  8. administrative records data--information collected on individuals or groups as part of the routine administrative procedures of an agency, business, or institution. Such data are not usually collected with research purposes in mind, may be voluminous, and may require preparation such as coding to be usable by researchers. (Examples: income tax forms, patent applications, naturalization records, death certificates.)
  9. experimental data--data gleaned from experiments

Logical Record Length
Number of characters in the logical record of each file or part.

Media
The medium or media on which the data are available (e.g. CD-ROM, Statlab server, Internet).

Mode of Data Collection
Method used to collect the data (e.g. telephone interviews, mail questionnaires, etc.).

Nation
Country or countries covered in the file.

Place of Production
Address of the archive or agency that produced the data collection (see Producer).

Producer
The producer of the data collection is the person or organization with the financial or administrative responsibility for the physical processes whereby the data collection was brought into existence.

Records Per Case
Used for card-image data or other files in which there are multiple records per case. Note, "inap." will appear in this field for codebook files, dictionary files, and other non-data files as well as hierarchical data files.

Related Materials
Describes materials related to the study description, such as appendices, additional information on sampling found in other documents, etc.

Related Publications
Contains information about primary or related publications that are based on the data, such as articles and reports.

Related Studies
Information on the relationship of the current data collection to others (e.g., predecessors, successors, other waves or rounds) or to other editions of the same file. This would include the names of additional data collections generated from the same data collection vehicle plus other collections directed at the same general topic.

Response Rates
The proportion of respondents from the selected sample who provided information.

Restrictions
Contains information regarding any limitations on use or restrictions on access to the file(s). Example: "Data may be used by current Yale faculty, students, and staff."

Sampling
Describes how the cases that appear in the study were selected. The sample is a selection out of the universe of all possible relevant cases (e.g. adults in the United States, housing units in three counties of Michigan, etc.) that could have been included in the study.

Series Name and Series Description
The name of and information about the data series to which the collection belongs, if any.

Study Number
Unique number assigned by the distributor. ICPSR numbers are four digits; Roper numbers are a combination of letters and numbers. Other distributors may not assign numbers to studies.

Time Method
Types of time methods include: panel survey, cross-section, trend study, time series, etc.

Time Period
The time period covered by the data (as distinguished from Date of Collection or Date of Production).

Total Number of Records
Overall record count in the file. Used in instances such as files with multiple cards/decks or records per case.

Unit of Observation
Describes who or what are being studied: individuals, families/households, groups, institutions, etc.

Universe
The group of persons or other elements that are the object of the study and to which the study results refer. Age, nationality, and residence commonly help to delineate a given universe, but any of a number of factors may be involved, such as sex, race, income, veteran status, criminal convictions, etc. The universe may consist of elements other than persons, such as housing units, court cases, deaths, countries, etc.

Variable Count
Number of variables in the file.

Version History
This field is used to explain changes that have been made to the data collection since its last release.

DDI: Metadata powered by the Data Documentation Initiative
Some of these definitions are adapted or copied from the Data Documentation Initiative (DDI) and the Inter-university Consortium for Political and Social Research (ICPSR) field descriptions.

Yale University Social Science Statistical Laboratory and Social Science Libraries and Information Services
Reference services: Social Science Data Librarian
Statistical consulting: Statlab
This page last modified: June 14, 2005
© 2005 Yale University