
cancer-incidence-v5

Code associated with the manuscript on age-specific cancer incidence rates by Richardson, Anghel, and Deng.

U.S. Census Data Processing

Raw Data

The population data from the United States Census for the years 2000 and 2010 were downloaded from the (now decommissioned) United States Census Bureau American FactFinder website, https://factfinder.census.gov/ (accessed on 2018-10-21 and 2018-10-22). We used the Advanced Search with the following options:

For instance, to obtain the information for the Detroit metropolitan area registry, the “County” geographic type was selected, followed by the three counties ‘Macomb County, Michigan’, ‘Oakland County, Michigan’, and ‘Wayne County, Michigan’.

For each query, the table of sex by age in single-year intervals was downloaded: PCT012 for the 2000 Census and PCT12 for the 2010 Census. For Alaska, the corresponding PCT012C and PCT12C tables were downloaded instead, so the Alaska population counts are restricted to American Indian/Alaska Native individuals.
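As an illustration of how one of these downloads might be read into R, here is a minimal sketch. It assumes that the annotated .csv file carries two header rows (short column codes, then descriptive labels) followed by one row per geography. The function name read_aff_table, the column codes, and the counts in the toy file are all hypothetical stand-ins and are not taken from the repository or from a real download.

```r
# Toy stand-in for a *_with_ann.csv download; every value here is made up.
toy <- c(
  'GEO.id,GEO.id2,GEO.display-label,D001,D002,D003',
  'Id,Id2,Geography,Total:,Male:,Male: - Under 1 year',
  '0500000US26099,26099,"Macomb County, Michigan",100,48,2',
  '0500000US26125,26125,"Oakland County, Michigan",200,97,3'
)
path <- tempfile(fileext = ".csv")
writeLines(toy, path)

# Hypothetical helper: split the two header rows from the count rows.
read_aff_table <- function(path) {
  # Row 1: short column codes (assumed layout).
  codes <- read.csv(path, header = FALSE, nrows = 1,
                    stringsAsFactors = FALSE)
  codes <- unlist(codes, use.names = FALSE)
  # Row 2: human-readable labels for the same columns (assumed layout).
  labels <- read.csv(path, header = FALSE, skip = 1, nrows = 1,
                     stringsAsFactors = FALSE)
  labels <- unlist(labels, use.names = FALSE)
  # Remaining rows: one geography per row, named by the short codes.
  counts <- read.csv(path, header = FALSE, skip = 2,
                     col.names = codes, check.names = FALSE,
                     stringsAsFactors = FALSE)
  list(data = counts, labels = setNames(labels, codes))
}

tab <- read_aff_table(path)
print(tab$data)
print(tab$labels)
```

Keeping the labels in a separate named vector makes it possible to look up which sex and age a column code refers to without carrying long column names through the rest of the processing.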

The states and counties that make up each SEER registry are listed on the SEER website, and more detailed information is given in an appendix of the SEER manual (2016).

Raw Data Directory Structure

The files downloaded above are saved in the data/population folder, in two subfolders labelled 2000 and 2010:

+---2000
|   +---2000-Alaska-native
|   |       aff_download_readme_ann.txt
|   |       DEC_00_SF1_PCT012C.txt
|   |       DEC_00_SF1_PCT012C_metadata.csv
|   |       DEC_00_SF1_PCT012C_with_ann.csv
|   |
|   +---2000-California-counties
|   |       aff_download_readme_ann.txt
|   |       DEC_00_SF1_PCT012.txt
|   |       DEC_00_SF1_PCT012_metadata.csv
|   |       DEC_00_SF1_PCT012_with_ann.csv
|   |
|   +---2000-Detroit-metro
|   |       aff_download_readme_ann.txt
|   |       DEC_00_SF1_PCT012.txt
|   |       DEC_00_SF1_PCT012_metadata.csv
|   |       DEC_00_SF1_PCT012_with_ann.csv
|   |
|   +---2000-Georgia-counties
|   |       aff_download_readme_ann.txt
|   |       DEC_00_SF1_PCT012.txt
|   |       DEC_00_SF1_PCT012_metadata.csv
|   |       DEC_00_SF1_PCT012_with_ann.csv
|   |
|   +---2000-states
|   |       aff_download_readme_ann.txt
|   |       DEC_00_SF1_PCT012.txt
|   |       DEC_00_SF1_PCT012_metadata.csv
|   |       DEC_00_SF1_PCT012_with_ann.csv
|   |
|   \---2000-Washington-counties
|           aff_download_readme_ann.txt
|           DEC_00_SF1_PCT012.txt
|           DEC_00_SF1_PCT012_metadata.csv
|           DEC_00_SF1_PCT012_with_ann.csv
|
\---2010
    +---2010-Alaska-native
    |       aff_download_readme_ann.txt
    |       DEC_10_SF1_PCT12C.txt
    |       DEC_10_SF1_PCT12C_metadata.csv
    |       DEC_10_SF1_PCT12C_with_ann.csv
    |
    +---2010-California-counties
    |       aff_download_readme_ann.txt
    |       DEC_10_SF1_PCT12.txt
    |       DEC_10_SF1_PCT12_metadata.csv
    |       DEC_10_SF1_PCT12_with_ann.csv
    |
    +---2010-Detroit-metro
    |       aff_download_readme_ann.txt
    |       DEC_10_SF1_PCT12.txt
    |       DEC_10_SF1_PCT12_metadata.csv
    |       DEC_10_SF1_PCT12_with_ann.csv
    |
    +---2010-Georgia-counties
    |       aff_download_readme_ann.txt
    |       DEC_10_SF1_PCT12.txt
    |       DEC_10_SF1_PCT12_metadata.csv
    |       DEC_10_SF1_PCT12_with_ann.csv
    |
    +---2010-states
    |       aff_download_readme_ann.txt
    |       DEC_10_SF1_PCT12.txt
    |       DEC_10_SF1_PCT12_metadata.csv
    |       DEC_10_SF1_PCT12_with_ann.csv
    |
    \---2010-Washington-counties
            aff_download_readme_ann.txt
            DEC_10_SF1_PCT12.txt
            DEC_10_SF1_PCT12_metadata.csv
            DEC_10_SF1_PCT12_with_ann.csv
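Before running the processing scripts, it can help to confirm that the downloads sit where the tree above says they should. The following is a minimal sketch that rebuilds the expected paths of the annotated .csv files from the naming pattern in the tree and reports any that are missing. The helper name aff_path and the default root data/population (taken relative to the current working directory) are assumptions for illustration and are not part of the repository's scripts.

```r
# Hypothetical helper: build the path of one annotated download from the
# folder and file naming pattern shown in the tree above.
aff_path <- function(year, region, table,
                     root = file.path("data", "population")) {
  year <- as.character(year)
  yy <- substr(year, 3, 4)  # "00" for 2000, "10" for 2010
  file.path(root, year, paste0(year, "-", region),
            paste0("DEC_", yy, "_SF1_", table, "_with_ann.csv"))
}

# Sub-folders that use the all-population table (PCT012 / PCT12).
regions <- c("California-counties", "Detroit-metro", "Georgia-counties",
             "states", "Washington-counties")

expected <- c(
  aff_path(2000, regions, "PCT012"),
  aff_path(2000, "Alaska-native", "PCT012C"),
  aff_path(2010, regions, "PCT12"),
  aff_path(2010, "Alaska-native", "PCT12C")
)

# List the expected files that are not present on disk.
missing_files <- expected[!file.exists(expected)]
if (length(missing_files) > 0) {
  message("Missing downloads:\n", paste(missing_files, collapse = "\n"))
} else {
  message("All ", length(expected), " annotated files were found.")
}
```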

Processing Steps

Information about the different registries is stored in the R file popRegistryInfo.R, which the subsequent scripts source.

The processing of the population files is done by running the following script in R, called from the working directory scripts/census-population:

source('popProcessAll.R')

which calls four scripts to process the data from each of the Census years:

Processed files are saved in the outputs/population directory. Note that each filename is prefixed with the system date on which the file was created, so subsequent scripts that load these files may need to be edited to use the new date.
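One way to avoid editing the date by hand is to look up the most recent file with a given name ending. The sketch below is only an illustration: it assumes the date prefix has the form YYYY-MM-DD (the default format of Sys.Date()), and the helper latest_dated_file and the demo filenames are hypothetical rather than names used by the repository's scripts.

```r
# Hypothetical helper: return the newest file in `dir` whose name starts
# with a YYYY-MM-DD date (assumed prefix format) and ends with `suffix`.
latest_dated_file <- function(dir, suffix) {
  files <- list.files(dir)
  keep <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", files) & endsWith(files, suffix)
  files <- files[keep]
  if (length(files) == 0) {
    stop("No dated file ending in '", suffix, "' found in ", dir)
  }
  # Parse the first ten characters as a date and pick the latest one.
  dates <- as.Date(substr(files, 1, 10))
  file.path(dir, files[which.max(as.numeric(dates))])
}

# Demo on a temporary folder with made-up filenames.
demo_dir <- file.path(tempdir(), "outputs-population-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c(
  "2018-10-22-pop2000.csv",
  "2019-01-05-pop2000.csv",
  "2019-03-01-pop2010.csv"
))))

newest <- latest_dated_file(demo_dir, "pop2000.csv")
print(basename(newest))
```

A downstream script could then call read.csv() on the returned path instead of hard-coding a dated filename.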

Older fraction by gender and registry

The output .csv files for the older fraction of the 85+ population for the 2000 Census (by different age groups and different registries) are given below:

* Male
* Female
* Both
