Title: | Individualized Coherent Absolute Risk Estimation (iCARE) |
---|---|
Description: | An R package to build, validate and apply absolute risk models |
Authors: | Parichoy Pal Choudhury, Paige Maas, William Wheeler, Nilanjan Chatterjee |
Maintainer: | Parichoy Pal Choudhury <[email protected]> |
License: | GPL-3 + file LICENSE |
Version: | 1.35.0 |
Built: | 2024-12-23 06:03:28 UTC |
Source: | https://github.com/bioc/iCARE |
Example data for computeAbsoluteRisk
, computeAbsoluteRiskSplitInterval
,
ModelValidation
, and plotModelValidation
.
bc_model_cov_info
: a main list containing information on family history, age at menarche (years),
parity, age at first birth (years), age at menopause (years), height (meters), Body Mass Index (kg/sq.m.),
use of hormone replacement therapy, use of estrogen and progesterone combined therapy,
use of estrogen only therapy, current use of hormone replacement therapy, alcohol (drinks/week), smoking status.;
information on each risk factor is given as a list
bc_model_formula
: formula for the specification of the models with risk factors
bc_72_snps
: contains published SNP information from reference:
Michailidou K, Lindstrom S, Dennis J, Beesley J, Hui S, Kar S, Lemacon A, Soucy P, Glubb D, Rostamianfar A,
et al. (2017) Association analysis identifies 65 new breast cancer risk loci. Nature 551:92-94
bc_model_log_or
: vector of log-odds ratios of family history, age at menarche (years), parity,
age at first birth (years), age at menopause (years), height (meters), Body Mass Index (kg/sq.m.),
use of hormone replacement therapy, use of estrogen and progesterone combined therapy,
use of estrogen only therapy, current use of hormone replacement therapy, alcohol (drinks/week), smoking status.
bc_model_log_or_post_50
: vector of log-odds ratios of family history, age at menarche (years), parity, age at first birth (years),
age at menopause (years), height (meters), Body Mass Index (kg/sq.m.), use of hormone replacement therapy,
use of estrogen and progesterone combined therapy, use of estrogen only therapy,
current use of hormone replacement therapy, alcohol (drinks/week), smoking status. for women 50 years or older
ref_cov_dat
: contains individual level reference dataset of risk factors representative of the underlying population imputed using reference (4) and (5)
ref_cov_dat_post_50
: contains individual level reference dataset of the risk factors for women aged 50 years or older
bc_inc
: contains age-specific incidence rates of breast cancer from reference (3)
mort_inc
: contains age-specific incidence rates of all-cause mortality from reference (1) below
new_cov_prof
: Information on family history, age at menarche (years), parity, age at first birth (years), age at menopause (years), height (meters),
Body Mass Index (kg/sq.m.), use of hormone replacement therapy, use of estrogen and progesterone combined therapy,
use of estrogen only therapy, current use of hormone replacement therapy, alcohol (drinks/week),
smoking status for three women (given for illustration of absolute risk prediction)
new_snp_prof
: Information on 72 breast cancer associated SNPs for three women (given for illustration of absolute risk prediction)
validation.cohort.data
: Simulated full cohort dataset of 50,000 women for illustration
of model validation. The variables are:
id
: Subject id
famhist
: Family history; binary indicator of presence/absence of disease among first degree relatives
parity
: number of child births categorized as nulliparous (ref), 1 births, 2 births, 3 births, 4+ births
menarche_dec
: categories of age at menarche (years) with levels: less than 11,11-11.5,11.5-12,12-13(ref),13-14,14-15, greater than 15
birth_dec
: categories of age at first birth (years) with levels: less than 19 (ref), 19-22,22-23,23-25,25-27,27-30,30-34,34-38, greater than 38
agemeno_dec
: categories of age at menopause (years) with levels: less than 40 (ref), 40-45, 45-47, 47-48, 48-50, 50-51, 51-52, 52-53, 53-55, greater than 55
height_dec
: categories of height (meters) with levels: less than 1.55, 1.55-1.57, 1.57-1.60, 1.60-1.61, 1.61-1.63, 1.63-1.65, 1.65-1.66, 1.66-1.68, 1.68-1.71
bmi_dec
: categories of body mass index (kg/sq.m.) with levels: less than 21.5 (ref), 21.5-23, 23-24.2, 24.2-25.3, 25.3-26.5, 26.5-27.8, 27.8-29.3, 29.3-31.4, 31.4-34.6
rd_menohrt
: use of hormone replacement therapy with levels: premenopausal (ref), postmenopausal and never HRT user, postmenopausal and ever HRT user
rd2_everhrt_c
: binary indicator for postmenopausal and ever user of estrogen and progesterone combined therapy
rd2_everhrt_e
: binary indicator for postmenopausal and ever user of estrogen only therapy
rd2_currhrt
: binary indicator of postmenopausal and current HRT user
alcoholdweek_dec
: alcohol in drinks per week categorized into levels: none (ref), 0-0.4, 0.4-0.8, 0.8-1.5, 1.5-3.2, 3.2-5.7, 5.7-9.8, >9.8
ever_smoke
: binary indicator for ever smoker
study.entry.age
: age of study entry
study.exit.age
: age of study exit
observed.outcome
: binary indicator of disease status (yes/no)
time.of.onset
: time (in years) since study entry to the development of disease;
for subjects who have not developed disease beyond the observed followup, it is set to Inf
observed.followup
: number of years the subject is followed up in the study
(difference between the age of study exit and age of study entry)
validation.nested.case.control.data
: A simulated example of a case-control study of 5285 women,
nested within the full cohort.
In addition to the variables given above, it has information on the 72 breast cancer associated SNPs with
variable names being the rs-identifiers.
output
: object returned from computeAbsoluteRisk
(1) Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS). Underlying Cause of Death 1999-2011 on CDC WONDER Online Database, released 2014. Data are from the Multiple Cause of Death Files, 1999-2011, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program.
Accessed at http://wonder.cdc.gov/ucd-icd10.html on Aug 26, 2014.
(2) Michailidou K, Beesley J, Lindstrom S, et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nature genetics 2015;47:373-80.
(3) Surveillance, Epidemiology, and End Results (SEER) Program SEER*Stat Database: Incidence - SEER 18 Regs Research Data, Nov 2011 Sub, Vintage 2009 Pops (2000-2009) <Katrina/Rita Population Adjustment> - Linked To County Attributes - Total U.S., 1969-2010 Counties. In: National Cancer Institute D, Surveillance Research Program, Surveillance Systems Branch, ed. SEER18 ed.
(4) 2010 National Health Interview Survey (NHIS) Public Use Data Release, NHIS Survey Description. 2011.
(Accessed at ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/NHIS/2010/srvydesc.pdf.)
(5) Centers for Disease Control and Prevention (CDC). National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Questionnaire. Hyattsville, MD: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention; 2010.
temp <- data(bc_data, package="iCARE") # Display the object names temp
temp <- data(bc_data, package="iCARE") # Display the object names temp
This function is used to build absolute risk models and apply them to estimate absolute risks.
computeAbsoluteRisk(model.formula = NULL, model.cov.info = NULL, model.snp.info = NULL, model.log.RR = NULL, model.ref.dataset = NULL, model.ref.dataset.weights = NULL, model.disease.incidence.rates, model.competing.incidence.rates = NULL, model.bin.fh.name = NA, n.imp = 5, apply.age.start, apply.age.interval.length, apply.cov.profile = NULL, apply.snp.profile = NULL, use.c.code = 1, return.lp = FALSE, return.refs.risk = FALSE)
computeAbsoluteRisk(model.formula = NULL, model.cov.info = NULL, model.snp.info = NULL, model.log.RR = NULL, model.ref.dataset = NULL, model.ref.dataset.weights = NULL, model.disease.incidence.rates, model.competing.incidence.rates = NULL, model.bin.fh.name = NA, n.imp = 5, apply.age.start, apply.age.interval.length, apply.cov.profile = NULL, apply.snp.profile = NULL, use.c.code = 1, return.lp = FALSE, return.refs.risk = FALSE)
model.formula |
an object of class |
model.cov.info |
contains information about the risk factors in the model ;
a main list containing a list for each covariate, which must have the fields:
If factor variable, then:
|
model.snp.info |
dataframe with three columns, named: [ "snp.name", "snp.odds.ratio", "snp.freq" ] |
model.log.RR |
vector with log odds ratios corresponding to the model params; no intercept;
names must match design matrix arising from |
model.ref.dataset |
dataframe of risk factors for a sample of subjects representative of underlying population, no missing values.
Variables must be in same order with same names as in |
model.ref.dataset.weights |
optional vector of sampling weights for |
model.disease.incidence.rates |
two column matrix [ integer ages, incidence rates] or three column matrix [start age, end age, rate] with incidence rate of disease. Must fully cover age interval for estimation. |
model.competing.incidence.rates |
two column matrix [ integer ages, incidence rates] or three column matrix [start age, end age, rate] with incidence rate of competing events. Must fully cover age interval for estimation. |
model.bin.fh.name |
string name of family history variable, if in model. This must refer to a variable that only takes values 0,1, NA. |
n.imp |
integer value for number of imputations for handling missing SNPs. |
apply.age.start |
single integer or vector of integer ages for the start of the interval over which to compute absolute risk. |
apply.age.interval.length |
single integer or vector of integer years over which absolute risk should be computed. |
apply.cov.profile |
dataframe containing the covariate profiles for which absolute risk will be computed. Covariates must be in same order
with same names as in |
apply.snp.profile |
data frame with observed SNP data (coded 0,1, 2, or NA). May have missing values. |
use.c.code |
binary indicator of whether to run the c program for fast computation. |
return.lp |
binary indicator of whether to return the linear predictor for each subject in apply.cov.profile. |
return.refs.risk |
binary indicator of whether to return the absolute risk prediction for each subject in |
Individualized Coherent Absolute Risk Estimators (iCARE) is a tool that allows researchers to quickly build models for absolute risk and apply them to estimate individuals' risk based on a set of user defined input parameters. The software gives users the flexibility to change or update models rapidly based on new risk factors or tailor models to different populations based on the specification of simply three input arguments:
(1) a model for relative risk assumed to be externally derived
(2) an age-specific disease incidence rate and
(3) the distribution of risk factors for the population of interest.
The tool can handle missing information on risk factors for risk estimation using an approach where all estimates are derived from a single model through appropriate model averaging.
This function returns a list of results objects, including:
risk
: absolute risk estimates over the specified interval for subjects given by apply.cov.profile
details
: dataframe with the start of the interval, the end of the interval, the covariate profile, and the risk estimates for each individual
beta.used
: the log odds ratios used in the model
lps
: linear predictors for subjects in model.cov.profile
, if requested by return.lp
refs.risk
: absolute risk estimates for subjects in model.ref.dataset
, if requested by return.refs.risk
;
computes for first age interval provided
data(bc_data, package="iCARE") results = computeAbsoluteRisk(model.formula=bc_model_formula, model.cov.info = bc_model_cov_info, model.snp.info = bc_72_snps, model.log.RR = bc_model_log_or, model.ref.dataset = ref_cov_dat, model.disease.incidence.rates = bc_inc, model.competing.incidence.rates = mort_inc, model.bin.fh.name = "famhist", apply.age.start = 50, apply.age.interval.length = 30, apply.cov.profile = new_cov_prof, apply.snp.profile = new_snp_prof, return.refs.risk = TRUE) summary(results) plot(results, main="Risk") boxplot(results$risk ~ new_cov_prof$famhist, na.rm=TRUE)
data(bc_data, package="iCARE") results = computeAbsoluteRisk(model.formula=bc_model_formula, model.cov.info = bc_model_cov_info, model.snp.info = bc_72_snps, model.log.RR = bc_model_log_or, model.ref.dataset = ref_cov_dat, model.disease.incidence.rates = bc_inc, model.competing.incidence.rates = mort_inc, model.bin.fh.name = "famhist", apply.age.start = 50, apply.age.interval.length = 30, apply.cov.profile = new_cov_prof, apply.snp.profile = new_snp_prof, return.refs.risk = TRUE) summary(results) plot(results, main="Risk") boxplot(results$risk ~ new_cov_prof$famhist, na.rm=TRUE)
This function is used to build an absolute risk model that incorporates different input parameters before and after a given time point. The model is then applied to estimate absolute risks.
computeAbsoluteRiskSplitInterval(apply.age.start, apply.age.interval.length, apply.cov.profile, model.formula, model.disease.incidence.rates, model.log.RR, model.ref.dataset, model.ref.dataset.weights=NULL, model.cov.info, use.c.code=1, model.competing.incidence.rates=NULL, return.lp=FALSE, apply.snp.profile=NULL, model.snp.info=NULL, model.bin.fh.name=NULL, cut.time=NULL, apply.cov.profile.2=NULL, model.formula.2=NULL, model.log.RR.2=NULL, model.ref.dataset.2=NULL, model.ref.dataset.weights.2=NULL, model.cov.info.2=NULL, model.bin.fh.name.2=NULL, n.imp=5, return.refs.risk=FALSE)
computeAbsoluteRiskSplitInterval(apply.age.start, apply.age.interval.length, apply.cov.profile, model.formula, model.disease.incidence.rates, model.log.RR, model.ref.dataset, model.ref.dataset.weights=NULL, model.cov.info, use.c.code=1, model.competing.incidence.rates=NULL, return.lp=FALSE, apply.snp.profile=NULL, model.snp.info=NULL, model.bin.fh.name=NULL, cut.time=NULL, apply.cov.profile.2=NULL, model.formula.2=NULL, model.log.RR.2=NULL, model.ref.dataset.2=NULL, model.ref.dataset.weights.2=NULL, model.cov.info.2=NULL, model.bin.fh.name.2=NULL, n.imp=5, return.refs.risk=FALSE)
apply.age.start |
single integer or vector of integer ages for the start of the interval over which to compute absolute risk. |
apply.age.interval.length |
single integer or vector of integer years over which absolute risk should be computed. |
apply.cov.profile |
dataframe containing the covariate profiles for which absolute risk will be computed.
Covariates must be in same order with same names as in |
model.formula |
an object of class |
model.disease.incidence.rates |
two column matrix [ integer ages, incidence rates] or three column matrix [start age, end age, rate] with incidence rate of disease. Must fully cover age interval for estimation. |
model.log.RR |
vector with log odds ratios corresponding to the model params; no intercept;
names must match design matrix arising from |
model.ref.dataset |
dataframe of risk factors for a sample of subjects representative of underlying population, no missing values.
Variables must be in same order with same names as in |
model.ref.dataset.weights |
optional vector of sampling weights for |
model.cov.info |
contains information about the risk factors in the model ; a main list containing a list for each covariate, which must have the fields:
If factor variable, then:
|
use.c.code |
binary indicator of whether to run the c program for fast computation. |
model.competing.incidence.rates |
two column matrix [ integer ages, incidence rates] or three column matrix [start age, end age, rate] with incidence rate of competing events. Must fully cover age interval for estimation. |
return.lp |
binary indicator of whether to return the linear predictor for each subject in apply.cov.profile. |
apply.snp.profile |
data frame with observed SNP data (coded 0,1, 2, or NA). May have missing values. |
model.snp.info |
dataframe with three columns [ rs number, odds ratio, allele frequency ] |
model.bin.fh.name |
string name of family history variable, if in model. This must refer to a variable that only takes values 0,1, NA. |
cut.time |
integer age for which to split computation into before and after |
apply.cov.profile.2 |
see |
model.formula.2 |
see |
model.log.RR.2 |
see |
model.ref.dataset.2 |
see |
model.ref.dataset.weights.2 |
see |
model.cov.info.2 |
see |
model.bin.fh.name.2 |
see |
n.imp |
integer value for number of imputations for handling missing SNPs. |
return.refs.risk |
binary indicator of whether to return the absolute risk prediction for each subject in |
Individualized Coherent Absolute Risk Estimators (iCARE) is a tool that allows researchers to quickly build models for absolute risk and apply them to estimate individuals' risk based on a set of user defined input parameters. The software gives users the flexibility to change or update models rapidly based on new risk factors or tailor models to different populations based on the specification of simply three input arguments:
(1) a model for relative risk assumed to be externally derived
(2) an age-specific disease incidence rate and
(3) the distribution of risk factors for the population of interest.
The tool can handle missing information on risk factors for risk estimation using an approach where all estimates are derived from a single model through appropriate model averaging.
This function returns a list of results objects, including:
risk
: absolute risk estimates over the specified interval for subjects given by apply.cov.profile
details
: dataframe with the start of the interval, the end of the interval, the covariate profile, and the risk estimates for each individual
beta.used
: the log odds ratios used in the model
lps.1
: linear predictors based on first set of parameters for subjects in model.cov.profile
, if requested by return.lp
lps.2
: linear predictors based on second set of parameters for subjects in model.cov.profile
, if requested by return.lp
refs.risk
: absolute risk estimates for subjects in model.ref.dataset
, if requested by return.refs.risk
; computes for first age interval provided
data(bc_data, package="iCARE") results <- computeAbsoluteRiskSplitInterval(model.formula=bc_model_formula, cut.time = 50, model.cov.info = bc_model_cov_info, model.snp.info = bc_72_snps, model.log.RR = bc_model_log_or, model.log.RR.2 = bc_model_log_or_post_50, model.ref.dataset = ref_cov_dat, model.ref.dataset.2 = ref_cov_dat_post_50, model.disease.incidence.rates = bc_inc, model.competing.incidence.rates = mort_inc, model.bin.fh.name = "famhist", apply.age.start = 30, apply.age.interval.length = 40, apply.cov.profile = new_cov_prof, apply.snp.profile = new_snp_prof, return.refs.risk = TRUE) summary(results) plot(results) boxplot(results$risk ~ new_cov_prof$famhist, na.rm=TRUE)
data(bc_data, package="iCARE") results <- computeAbsoluteRiskSplitInterval(model.formula=bc_model_formula, cut.time = 50, model.cov.info = bc_model_cov_info, model.snp.info = bc_72_snps, model.log.RR = bc_model_log_or, model.log.RR.2 = bc_model_log_or_post_50, model.ref.dataset = ref_cov_dat, model.ref.dataset.2 = ref_cov_dat_post_50, model.disease.incidence.rates = bc_inc, model.competing.incidence.rates = mort_inc, model.bin.fh.name = "famhist", apply.age.start = 30, apply.age.interval.length = 40, apply.cov.profile = new_cov_prof, apply.snp.profile = new_snp_prof, return.refs.risk = TRUE) summary(results) plot(results) boxplot(results$risk ~ new_cov_prof$famhist, na.rm=TRUE)
Individualized Coherent Absolute Risk Estimators (iCARE) is a tool that allows researchers to quickly build models for absolute risk and apply them to estimate individuals' risk based on a set of user defined input parameters. The software gives users the flexibility to change or update models rapidly based on new risk factors or tailor models to different populations based on the specification of simply three input arguments: (1) a model for relative risk assumed to be externally derived (2) an age-specific disease incidence rate and (3) the distribution of risk factors for the population of interest. The tool can handle missing information on risk factors for risk estimation using an approach where all estimates are derived from a single model through appropriate model averaging.
The main functions for building and applying an absolute risk model are computeAbsoluteRisk
and computeAbsoluteRiskSplitInterval
.
The first of these computes absolute risks over the specified time interval using a single set of paramters.
The second provides more advanced functionality and computes absolute risk over the interval in two parts.
computeAbsoluteRiskSplitInterval
allows the user compute absolute risk over the interval in two parts,
incorporating two different sets of paramters before and after a specified cutpoint.
This function allows a different cutpoint for each covariate profile if desired.
The function for validating an absolute risk model is ModelValidation
, and
plotModelValidation
can be called for producing plots
for model calibration, model discrimination and incidence rates.
Paige Maas, Parichoy Pal Choudhury, Nilanjan Chatterjee and William Wheeler <[email protected]>
This function is used to validate absolute risk models.
ModelValidation(study.data, total.followup.validation = FALSE, predicted.risk = NULL, predicted.risk.interval = NULL, linear.predictor = NULL, iCARE.model.object = list(model.formula = NULL, model.cov.info = NULL, model.snp.info = NULL, model.log.RR = NULL, model.ref.dataset = NULL, model.ref.dataset.weights = NULL, model.disease.incidence.rates = NULL, model.competing.incidence.rates = NULL, model.bin.fh.name = NA, apply.cov.profile = NULL, apply.snp.profile = NULL, n.imp = 5, use.c.code = 1, return.lp = TRUE, return.refs.risk = TRUE), number.of.percentiles = 10, reference.entry.age = NULL, reference.exit.age = NULL, predicted.risk.ref = NULL, linear.predictor.ref = NULL, linear.predictor.cutoffs = NULL, dataset = "Example Dataset", model.name = "Example Risk Prediction Model")
ModelValidation(study.data, total.followup.validation = FALSE, predicted.risk = NULL, predicted.risk.interval = NULL, linear.predictor = NULL, iCARE.model.object = list(model.formula = NULL, model.cov.info = NULL, model.snp.info = NULL, model.log.RR = NULL, model.ref.dataset = NULL, model.ref.dataset.weights = NULL, model.disease.incidence.rates = NULL, model.competing.incidence.rates = NULL, model.bin.fh.name = NA, apply.cov.profile = NULL, apply.snp.profile = NULL, n.imp = 5, use.c.code = 1, return.lp = TRUE, return.refs.risk = TRUE), number.of.percentiles = 10, reference.entry.age = NULL, reference.exit.age = NULL, predicted.risk.ref = NULL, linear.predictor.ref = NULL, linear.predictor.cutoffs = NULL, dataset = "Example Dataset", model.name = "Example Risk Prediction Model")
study.data |
Data frame which includes the variables below.
|
total.followup.validation |
logical; TRUE if risk validation is performed over the total followup, for all other cases (e.g., 5 year or 10 year risk validation) it is FALSE |
predicted.risk |
vector of predicted risks; should be supplied if risk prediction is done by some
method other than that implemented in |
predicted.risk.interval |
scalar or vector denoting the number of years after entering
the study over which risk validation is desired (e.g., 5 for validating a model for 5 year risk)
if |
linear.predictor |
vector of risk scores for each subject, i.e. |
iCARE.model.object |
A named list containing the input arguments to the function |
number.of.percentiles |
the number of percentiles of the risk score that determines the number of strata over which the risk prediction model is to be validated, default = 10 |
reference.entry.age |
age of entry to be specified for computing absolute risk of the reference population |
reference.exit.age |
age of exit to be specified for computing absolute risk of the reference population |
predicted.risk.ref |
predicted absolute risk in the reference population assuming the entry age to be
as specified in |
linear.predictor.ref |
vector of risk scores for the reference population |
linear.predictor.cutoffs |
user specified cut-points for the linear predictor to define categories for absolute risk calibration and relative risk calibration |
dataset |
name and type of dataset to be displayed in the output, e.g., "PLCO Full Cohort" or "Full Cohort Simulation" |
model.name |
name of the model to be displayed in output, e.g., "Synthetic Model" or "Simulation Setting" |
This function returns a list of the following objects:
Subject_Specific_Observed_Outcome
: observed outcome after adjusting the observed followup according
to the risk prediction interval: 1 if disease has occurred by the end of followup, 0 if censored
Risk_Prediction_Interval
: Character object showing the interval of risk prediction
(e.g., 5 years). If the risk prediction is over the total followup of the study,
this reads "Observed Followup"
Adjusted.Followup
: followup time (in years) after adjusting the observed followup according to the risk
prediction interval
Subject_Specific_Predicted_Absolute_Risk
: predicted absolute risk of disease for each subject
Reference_Absolute_Risk
: predicted absolute risk in the reference population
Subject_Specific_Risk_Score
: estimated risk score for each subject; the missing covariates are handled
internally using the imputation in iCARE
Reference_Risk_Score
: risk score for the reference population
Population_Incidence_Rate
: age specific disease incidence rate in the population
Study_Incidence_Rate
: estimated age specific incidence rate in the study
Category_Results
: observed and predicted absolute risks and observed and predicted relative risks
in each category defined by the risk score
Category_Specific_Observed_Absolute_Risk
: Observed absolute risk in each category defined by the risk score
Category_Specific_Predicted_Absolute_Risk
: Predicted absolute risk in each category defined by the risk score
Category_Specific_Observed_Relative_Risk
: Observed relative risk in each category defined by the risk score
Category_Specific_Predicted_Relative_Risk
: Predicted relative risk in each category defined by the risk score
Variance_Matrix_Absolute_Risk
: Variance-covariance matrix of the vector of cateogry specific absolute risks
Variance_Matrix_LogRelative_Risk
: Variance-covariance matrix of the vector of cateogry specific relative risks
Hosmer_Lemeshow_Results
: results of the Hosmer-Lemeshow type chisquare test comparing the observed and predicted absolute risks
HL_pvalue
: pvalue of the Hosmer-Lemeshow type chisquare test
RR_test_result
: results of the chisquare test comparing the observed and predicted relative risks
RR_test_pvalue
: pvalue of the chisquare test of relative risk
AUC
: estimate of the Area Under the Curve (AUC) defined as the probability that for a randomly sampled
case-control pair the case has a higher risk score than the control; for the full cohort setting we compute
the empirical proportion and for the nested case-control setting we compute the inverse probability weighted estimator
Variance_AUC
: estimate of the variance of Area Under the Curve (AUC): for the full cohort setting the regular
asymptotic variance is estimated and for the nested case-control setting the influence function based variance
estimate of the inverse probability weighted variance estimator is computed
CI_AUC
: 95 percent Wald based confidence interval of Area Under the Curve (AUC) using the asymptotic variance
Overall_Expected_to_Observed_Ratio
: The overall ratio of the expected risk to the observed risk
CI_Overall_Expected_to_Observed_Ratio
: 95 percent Wald based confidence interval of the overall ratio of the expected risk to the observed risk
data(bc_data, package="iCARE") validation.cohort.data$inclusion = 0 subjects_included = intersect(validation.cohort.data$id, validation.nested.case.control.data$id) validation.cohort.data$inclusion[subjects_included] = 1 validation.cohort.data$observed.followup = validation.cohort.data$study.exit.age - validation.cohort.data$study.entry.age selection.model = glm(inclusion ~ observed.outcome * (study.entry.age + observed.followup), data = validation.cohort.data, family = binomial(link = "logit")) validation.nested.case.control.data$sampling.weights = selection.model$fitted.values[validation.cohort.data$inclusion == 1] set.seed(50) data = validation.nested.case.control.data snpDat = bc_72_snps form = diagnosis ~ famhist + as.factor(parity) info = list(bc_model_cov_info[[1]], bc_model_cov_info[[3]]) vars = all.vars(form)[-1] risk.model = list(model.formula = form, model.cov.info = info, model.snp.info = snpDat, model.log.RR = bc_model_log_or[c(1, 8:11)], model.ref.dataset = ref_cov_dat[, vars], model.ref.dataset.weights = NULL, model.disease.incidence.rates = bc_inc, model.competing.incidence.rates = mort_inc, model.bin.fh.name = "famhist", apply.cov.profile = data[,vars], apply.snp.profile = data[,snpDat$snp.name], n.imp = 5, use.c.code = 1, return.lp = TRUE, return.refs.risk = TRUE) # Not run since it can take a few minutes # output = ModelValidation(study.data = data, total.followup.validation = TRUE, # predicted.risk.interval = NULL, iCARE.model.object = risk.model, # number.of.percentiles = 10) output
data(bc_data, package="iCARE") validation.cohort.data$inclusion = 0 subjects_included = intersect(validation.cohort.data$id, validation.nested.case.control.data$id) validation.cohort.data$inclusion[subjects_included] = 1 validation.cohort.data$observed.followup = validation.cohort.data$study.exit.age - validation.cohort.data$study.entry.age selection.model = glm(inclusion ~ observed.outcome * (study.entry.age + observed.followup), data = validation.cohort.data, family = binomial(link = "logit")) validation.nested.case.control.data$sampling.weights = selection.model$fitted.values[validation.cohort.data$inclusion == 1] set.seed(50) data = validation.nested.case.control.data snpDat = bc_72_snps form = diagnosis ~ famhist + as.factor(parity) info = list(bc_model_cov_info[[1]], bc_model_cov_info[[3]]) vars = all.vars(form)[-1] risk.model = list(model.formula = form, model.cov.info = info, model.snp.info = snpDat, model.log.RR = bc_model_log_or[c(1, 8:11)], model.ref.dataset = ref_cov_dat[, vars], model.ref.dataset.weights = NULL, model.disease.incidence.rates = bc_inc, model.competing.incidence.rates = mort_inc, model.bin.fh.name = "famhist", apply.cov.profile = data[,vars], apply.snp.profile = data[,snpDat$snp.name], n.imp = 5, use.c.code = 1, return.lp = TRUE, return.refs.risk = TRUE) # Not run since it can take a few minutes # output = ModelValidation(study.data = data, total.followup.validation = TRUE, # predicted.risk.interval = NULL, iCARE.model.object = risk.model, # number.of.percentiles = 10) output
This function is used to create plots for model calibration, model discrimination and incidence rates.
plotModelValidation(study.data, validation.results, dataset = "Example Dataset", model.name = "Example Model", x.lim.absrisk = "", y.lim.absrisk = "", x.lab.absrisk = "Expected Absolute Risk (%)", y.lab.absrisk = "Observed Absolute Risk (%)", x.lim.RR = "", y.lim.RR = "", x.lab.RR = "Expected Relative Risk", y.lab.RR = "Observed Relative Risk", risk.score.plot.kernel = "gaussian", risk.score.plot.bandwidth = "nrd0", risk.score.plot.percent.smooth = 50)
plotModelValidation(study.data, validation.results, dataset = "Example Dataset", model.name = "Example Model", x.lim.absrisk = "", y.lim.absrisk = "", x.lab.absrisk = "Expected Absolute Risk (%)", y.lab.absrisk = "Observed Absolute Risk (%)", x.lim.RR = "", y.lim.RR = "", x.lab.RR = "Expected Relative Risk", y.lab.RR = "Observed Relative Risk", risk.score.plot.kernel = "gaussian", risk.score.plot.bandwidth = "nrd0", risk.score.plot.percent.smooth = 50)
study.data |
See |
validation.results |
List returned from |
dataset |
Name and type of dataset to be displayed in the output, e.g., "PLCO Full Cohort" or "Full Cohort Simulation" |
model.name |
Name of the model to be displayed in output, e.g., "Synthetic Model" or "Simulation Setting" |
x.lim.absrisk |
Vector of length two specifying the x-axes limits in the absolute risk calibration plot. If not specified, then default limits will be computed. |
y.lim.absrisk |
Vector of length two specifying the y-axes limits in the absolute risk calibration plot. If not specified, then default limits will be computed. |
x.lab.absrisk |
String specifying the x-axes label in the absolute risk calibration plot. The default is "Expected Absolute Risk (%)". |
y.lab.absrisk |
String specifying the y-axes label in the absolute risk calibration plot. The default is "Observed Absolute Risk (%)." |
x.lim.RR |
Vector of length two specifying the x-axes limits in the relative risk calibration plot. If not specified, then default limits will be computed. |
y.lim.RR |
Vector of length two specifying the y-axes limits in the relative risk calibration plot. If not specified, then default limits will be computed. |
x.lab.RR |
String specifying the x-axes label in the relative risk calibration plot. The default is "Expected Relative Risk". |
y.lab.RR |
String specifying the y-axes label in the relative risk calibration plot. The default is "Observed Relative Risk". |
risk.score.plot.kernel |
Character string giving the smoothing kernel to be used by the density function used internally to plot the density of the risk scores. It should be one of "gaussian", "rectangular", "triangular", "epanechnikov", "biweight", "cosine" or "optcosine", with default "gaussian". |
risk.score.plot.bandwidth |
The options for bandwidth selection used by the density function internally to plot
the density of the risk scores. The options are one of the following: "nrd0", "nrd", "ucv", "bcv", "SJ-ste", "SJ-dpi"
with the default being "nrd0". More information on these different options is available in the help pages that can be
accessed from R using the command |
risk.score.plot.percent.smooth |
Percentage of the number of sample points used for determining the number of equally spaced points at which the density of the risk score is to be estimated. This number supplies the input parameter "n" to the density function used internally to plot the densities of the risk score. The default value is 50. |
This function returns NULL
data(bc_data, package="iCARE") validation.cohort.data$inclusion = 0 subjects_included = intersect(validation.cohort.data$id, validation.nested.case.control.data$id) validation.cohort.data$inclusion[subjects_included] = 1 validation.cohort.data$observed.followup = validation.cohort.data$study.exit.age - validation.cohort.data$study.entry.age selection.model = glm(inclusion ~ observed.outcome * (study.entry.age + observed.followup), data = validation.cohort.data, family = binomial(link = "logit")) validation.nested.case.control.data$sampling.weights = selection.model$fitted.values[validation.cohort.data$inclusion == 1] set.seed(50) data = validation.nested.case.control.data snpDat = bc_72_snps form = diagnosis ~ famhist + as.factor(parity) info = list(bc_model_cov_info[[1]], bc_model_cov_info[[3]]) vars = all.vars(form)[-1] risk.model = list(model.formula = form, model.cov.info = info, model.snp.info = snpDat, model.log.RR = bc_model_log_or[c(1, 8:11)], model.ref.dataset = ref_cov_dat[, vars], model.ref.dataset.weights = NULL, model.disease.incidence.rates = bc_inc, model.competing.incidence.rates = mort_inc, model.bin.fh.name = "famhist", apply.cov.profile = data[,vars], apply.snp.profile = data[,snpDat$snp.name], n.imp = 5, use.c.code = 1, return.lp = TRUE, return.refs.risk = TRUE) # Not run since it can take a few minutes #output = ModelValidation(study.data = data, total.followup.validation = TRUE, # predicted.risk.interval = NULL, iCARE.model.object = risk.model, # number.of.percentiles = 10) plot(output)
data(bc_data, package="iCARE") validation.cohort.data$inclusion = 0 subjects_included = intersect(validation.cohort.data$id, validation.nested.case.control.data$id) validation.cohort.data$inclusion[subjects_included] = 1 validation.cohort.data$observed.followup = validation.cohort.data$study.exit.age - validation.cohort.data$study.entry.age selection.model = glm(inclusion ~ observed.outcome * (study.entry.age + observed.followup), data = validation.cohort.data, family = binomial(link = "logit")) validation.nested.case.control.data$sampling.weights = selection.model$fitted.values[validation.cohort.data$inclusion == 1] set.seed(50) data = validation.nested.case.control.data snpDat = bc_72_snps form = diagnosis ~ famhist + as.factor(parity) info = list(bc_model_cov_info[[1]], bc_model_cov_info[[3]]) vars = all.vars(form)[-1] risk.model = list(model.formula = form, model.cov.info = info, model.snp.info = snpDat, model.log.RR = bc_model_log_or[c(1, 8:11)], model.ref.dataset = ref_cov_dat[, vars], model.ref.dataset.weights = NULL, model.disease.incidence.rates = bc_inc, model.competing.incidence.rates = mort_inc, model.bin.fh.name = "famhist", apply.cov.profile = data[,vars], apply.snp.profile = data[,snpDat$snp.name], n.imp = 5, use.c.code = 1, return.lp = TRUE, return.refs.risk = TRUE) # Not run since it can take a few minutes #output = ModelValidation(study.data = data, total.followup.validation = TRUE, # predicted.risk.interval = NULL, iCARE.model.object = risk.model, # number.of.percentiles = 10) plot(output)