cancer-incidence-v5

Code associated with the manuscript on age-specific cancer incidence rates by Richardson, Anghel and Deng.

Calculation of crude rate and confidence intervals

The crude age-specific incidence rate is computed by dividing the number of new cancer cases diagnosed among persons in a 5-year age group by the population at risk, and then multiplying by 100,000. The rates are computed for two time periods, 2000-2003 and 2010-2013. The number of cancer cases and the population at risk are taken from the geographical regions covered by the 17 selected SEER cancer registries (omitting Alaska).
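As a minimal sketch of the calculation described above, the following R snippet computes the crude rate per 100,000 for a few age groups. The data frame, its column names and the counts are made up for illustration; they are not SEER data and do not reflect the table layout used by the repository scripts.

```r
# Hypothetical counts for three 5-year age groups (illustrative values, not SEER data)
counts <- data.frame(
  age_group  = c("70-74", "75-79", "80-84"),
  cases      = c(120, 95, 60),
  population = c(400000, 250000, 150000)
)

# Crude rate per 100,000: new cases divided by the population at risk, times 100,000
counts$crude_rate <- counts$cases / counts$population * 1e5

print(counts)
```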

The main difficulty in computing the age-specific rate arises for the ‘oldest old’ category, where the population of each 5-year age group older than 85 years has to be inferred from the Census data for the 17 registries.

The two-sided confidence intervals were computed as in Harding et al. (2008): “Where x is the count of new diagnoses and n is the person-years at risk, +/- 34.1% two-sided confidence intervals for incidence rates were calculated according to the normal distribution (for x > 10), according to the Poisson distribution (for x < 10 and n > 1000), and by exact binomial proportion (for x < 10 and n < 1,000).”
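The selection rule in the quotation can be sketched in R roughly as follows. This is an illustration under stated assumptions, not the code in seerErrorCalulation.R: the function name `crude_rate_ci` is hypothetical, "+/- 34.1%" is read here as a central 68.2% interval (one standard deviation on each side), the Poisson limits use the standard chi-square formulation, and the boundary cases the quotation leaves open (x = 10, n = 1000) are assigned to the Poisson and binomial branches respectively.

```r
# Hypothetical helper: confidence interval for a crude rate per 100,000.
# x = count of new diagnoses, n = person-years at risk.
# level = 0.682 is an ASSUMED reading of "+/- 34.1%" (about one SD each side).
crude_rate_ci <- function(x, n, level = 0.682, per = 1e5) {
  alpha <- 1 - level
  if (x > 10) {
    # Normal approximation: the standard error of a Poisson count is sqrt(x)
    z <- qnorm(1 - alpha / 2)
    ci <- (x + c(-1, 1) * z * sqrt(x)) / n
    method <- "normal"
  } else if (n > 1000) {
    # Exact Poisson limits via chi-square quantiles (x = 10 handled here by assumption)
    lower <- if (x == 0) 0 else qchisq(alpha / 2, 2 * x) / 2
    upper <- qchisq(1 - alpha / 2, 2 * (x + 1)) / 2
    ci <- c(lower, upper) / n
    method <- "poisson"
  } else {
    # Exact binomial (Clopper-Pearson) interval; binom.test needs an integer n
    ci <- as.numeric(binom.test(x, round(n), conf.level = level)$conf.int)
    method <- "binomial"
  }
  list(method = method,
       rate   = x / n * per,
       lower  = ci[1] * per,
       upper  = ci[2] * per)
}

# Illustrative calls with made-up counts
res_normal   <- crude_rate_ci(x = 100, n = 1e6)
res_poisson  <- crude_rate_ci(x = 4,   n = 5e4)
res_binomial <- crude_rate_ci(x = 3,   n = 500)
```

Each call returns the method chosen, the point estimate and the interval limits on the same per-100,000 scale, so the three branches can be compared directly.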

Processing Steps

The following R scripts compute the crude incidence rate and the confidence intervals; they are run from the working directory scripts/incidence-seer. Note that data files were saved with the system date as a prefix in the filename, so scripts that load those files may need to be updated with the correct date.

source('seerRunOneYear.R')

which calls three scripts to process each year (2000, 2001, 2002, 2003, 2010, 2011, 2012 and 2013) of SEER and Census data. The per-year tables are then combined and the confidence intervals computed with:

rm(list=ls())
source('seerMultipleYears.R')
rm(list=ls())
source('seerErrorCalulation.R')

The script seerMultipleYears.R uses the tables generated for each year to build a table of the case counts, population, and crude rate for all 5-year age groups for the time periods 2000-2003 and 2010-2013. Note: At the top of the script, the variables year and popyear must be set to indicate which time period is to be calculated.
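To illustrate the kind of pooling this step performs, the sketch below combines per-year tables into one period table using base R. It assumes that cases and population are summed over the years of a period (so the pooled population acts as person-years); the column names and values are hypothetical, and the actual logic in seerMultipleYears.R may differ.

```r
# Hypothetical per-year tables stacked into one data frame (illustrative values, not SEER data)
yearly <- data.frame(
  year       = rep(2000:2003, each = 2),
  age_group  = rep(c("85-89", "90-94"), times = 4),
  cases      = c(40, 12, 42, 15, 38, 11, 44, 14),
  population = c(100000, 30000, 102000, 31000, 104000, 32000, 106000, 33000)
)

# Sum cases and population within each age group over the whole period
pooled <- aggregate(cbind(cases, population) ~ age_group, data = yearly, FUN = sum)

# Crude rate per 100,000 for the pooled period
pooled$crude_rate <- pooled$cases / pooled$population * 1e5

print(pooled)
```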

The script seerErrorCalulation.R computes confidence intervals for the crude rate, as described by Harding et al. (2008). Note: At the top of the script, the variable fileToLoad (different for 2000 and 2010) must be specified.

The processed tables and output are saved in the outputs/seer/count-data directory.
