check.associations.Rd
This function computes different measures of association between features and the label and stores the results in the associations slot of the SIAMCAT object.
check.associations(siamcat, formula="feat~label", test='wilcoxon',
alpha=0.05, mult.corr="fdr", log.n0=1e-06, pr.cutoff=1e-06,
probs.fc=seq(.1, .9, .05), paired=NULL, feature.type='filtered',
verbose = 1)
siamcat: object of class siamcat-class
formula: string, formula used for testing, see Details for more information, defaults to "feat~label"
test: string, statistical test used for the association testing, can be either 'wilcoxon' or 'lm', see Details for more information, defaults to 'wilcoxon'
alpha: float, significance level, defaults to 0.05
mult.corr: string, multiple hypothesis correction method, see p.adjust, defaults to "fdr"
log.n0: float, pseudo-count to be added before log-transformation of the data, defaults to 1e-06. Will be ignored if feature.type is "normalized".
pr.cutoff: float, cutoff for the prevalence computation, defaults to 1e-06
probs.fc: numeric vector, quantiles used to calculate the generalized fold change between groups, see Details for more information, defaults to seq(.1, .9, .05)
paired: character, column name of the meta-variable containing information for a paired test, defaults to NULL
feature.type: string, the type of features the function should work on. Can be one of "original", "filtered", or "normalized", defaults to "filtered". Please only change this parameter if you know what you are doing! If feature.type is "normalized", the normalized abundances will not be log10-transformed.
verbose: integer, control output: 0 for no output at all, 1 for only information about progress and success, 2 for normal level of information and 3 for full debug information, defaults to 1
object of class siamcat-class with the slot associations filled
The function uses the Wilcoxon test as the default statistical test for binary classification problems. Alternatively, a simple linear model (as implemented in lm) can be used as well. For regression problems, the function defaults to the linear model.
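To make the testing step concrete, here is a minimal base-R sketch of per-feature association testing. It assumes log10-transformed abundances with the pseudo-count log.n0 and a label coded as -1/1; the toy data and this coding are illustrative assumptions, and SIAMCAT's internal implementation may differ. Each feature gets a Wilcoxon rank-sum test and, alternatively, a simple lm, and the p-values are then corrected with p.adjust, as mult.corr = "fdr" would request.

```r
# Hypothetical toy data: 3 features (rows) x 20 samples (columns)
set.seed(42)
feat <- matrix(runif(60), nrow = 3,
               dimnames = list(paste0("feat_", 1:3), NULL))
# Assumed label coding: -1 = control, 1 = case
label <- rep(c(-1, 1), each = 10)
# Make feat_1 clearly enriched in cases (complete separation)
feat[1, label == 1] <- feat[1, label == 1] + 1

# Log-transform with a pseudo-count, as log.n0 does
log.n0 <- 1e-06
feat.log <- log10(feat + log.n0)

# Wilcoxon rank-sum test per feature (test = 'wilcoxon')
p.wilcox <- apply(feat.log, 1, function(x) {
  wilcox.test(x[label == 1], x[label == -1], exact = FALSE)$p.value
})

# Alternative: simple linear model feat ~ label per feature (test = 'lm')
p.lm <- apply(feat.log, 1, function(x) {
  fit <- lm(x ~ factor(label))
  coef(summary(fit))[2, 4]  # p-value of the label coefficient
})

# Multiple hypothesis correction (mult.corr = "fdr"), threshold alpha
p.adj <- p.adjust(p.wilcox, method = "fdr")
significant <- names(p.adj)[p.adj < 0.05]
print(significant)
```

The Wilcoxon test only uses ranks, so it is robust to the heavy zero inflation typical of microbiome data, while the linear model assumes roughly normal residuals on the log scale.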
The function calculates several measures for the effect size of the associations between microbial features and the label. For binary classification problems, these associations are:
AUROC (area under the Receiver Operating Characteristic curve) as a non-parametric measure of enrichment,
the generalized fold change (gFC), a pseudo-fold change which is calculated as the geometric mean of the differences between quantiles across both groups,
prevalence shift (difference in prevalence between the two groups).
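The three binary effect sizes can be illustrated for a single feature with the base-R sketch below. The toy abundances are invented. As assumptions for illustration, the gFC is computed as the mean difference between log10 quantiles (equivalently, the log10 of the geometric mean of quantile ratios), prevalence counts samples at or above pr.cutoff, and the AUROC is computed from all pairwise case/control comparisons. SIAMCAT may compute these quantities with different routines.

```r
# Hypothetical toy relative abundances of one feature in each group
x.case    <- c(0, 0.02, 0.05, 0.10, 0.20, 0, 0.15, 0.08)
x.control <- c(0, 0, 0.01, 0.02, 0, 0.03, 0.01, 0)

# Generalized fold change (assumed definition): mean difference between
# log10 quantiles of both groups at the probabilities in probs.fc
gfc <- function(pos, neg, probs = seq(.1, .9, .05), n0 = 1e-06) {
  q.pos <- quantile(log10(pos + n0), probs)
  q.neg <- quantile(log10(neg + n0), probs)
  mean(q.pos - q.neg)
}

# Prevalence shift: difference in the fraction of samples in which the
# feature is detected (abundance >= pr.cutoff, assumed comparison)
prev.shift <- function(pos, neg, cutoff = 1e-06) {
  mean(pos >= cutoff) - mean(neg >= cutoff)
}

# AUROC via the Mann-Whitney formulation: probability that a random case
# value exceeds a random control value, with ties counted as 1/2
auroc <- function(pos, neg) {
  cmp <- outer(pos, neg, "-")
  mean((cmp > 0) + 0.5 * (cmp == 0))
}

gfc(x.case, x.control)         # positive: enriched in cases
prev.shift(x.case, x.control)  # 0.25
auroc(x.case, x.control)       # 0.7890625 (= 50.5 / 64)
```

Averaging over many quantiles makes the gFC less sensitive to the many zeros than a plain ratio of means, which is one reason to prefer it for sparse abundance data.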
For regression problems, the effect sizes are:
Spearman correlation between the feature and the label.
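For a continuous label, the effect size can be sketched in base R with cor(method = "spearman"), alongside the p-value of the default linear model feat ~ label. The toy values are hypothetical and only illustrate the computation, not SIAMCAT's internals.

```r
# Hypothetical toy data: one feature and a continuous label
x     <- c(0.01, 0.03, 0.02, 0.08, 0.10, 0.07)
label <- c(1.2, 2.5, 2.0, 4.1, 3.9, 5.3)

# Effect size: Spearman (rank) correlation between feature and label
rho <- cor(x, label, method = "spearman")

# Default test for regression problems: linear model feat ~ label
p.lm <- coef(summary(lm(x ~ label)))[2, 4]

print(rho)   # 27/35, about 0.771
print(p.lm)
```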
To correct for possible confounders while testing for association, the function uses linear mixed-effects models as implemented in the lmerTest package. To do so, the test formula needs to be adjusted to include the confounder. For example, when correcting for the metadata variable Sex, the formula would be 'feat~label+(1|Sex)' (see also the example below).
Please note that modifying the formula parameter in this function might lead to unexpected results!
For paired testing, e.g. when the same patient has been sampled before and after an intervention, the `paired` parameter can be supplied to the function. It indicates the column in the metadata table that holds the pairing information.
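The idea behind paired testing can be sketched in base R: samples are matched by the pairing column (here a hypothetical Individual_ID, mirroring the example below) and a paired Wilcoxon signed-rank test is run on the log10-transformed values. The metadata, sample names, and label coding are invented for illustration; SIAMCAT's own handling (for example, of individuals lacking one of the two samples) may differ.

```r
# Hypothetical metadata: four individuals, each sampled before (-1)
# and after (1) an intervention
meta <- data.frame(
  Sample        = paste0("S", 1:8),
  Individual_ID = rep(c("P1", "P2", "P3", "P4"), times = 2),
  label         = rep(c(-1, 1), each = 4),
  stringsAsFactors = FALSE
)
# Hypothetical abundances of one feature, named by sample
feat <- c(S1 = 0.01, S2 = 0.02, S3 = 0.05, S4 = 0.03,
          S5 = 0.04, S6 = 0.06, S7 = 0.09, S8 = 0.05)

# Order both groups by individual so that the pairs line up
before <- meta[meta$label == -1, ]
after  <- meta[meta$label ==  1, ]
before <- before[order(before$Individual_ID), ]
after  <- after[order(after$Individual_ID), ]
stopifnot(identical(before$Individual_ID, after$Individual_ID))

# Log-transform with a pseudo-count, then test within-individual changes
x.before <- log10(feat[before$Sample] + 1e-06)
x.after  <- log10(feat[after$Sample] + 1e-06)
res <- wilcox.test(x.after, x.before, paired = TRUE)

print(res$statistic)  # V = 10: all four individuals increased
print(res$p.value)    # 0.125, the smallest two-sided exact p for n = 4
```

With only four pairs the exact signed-rank test cannot reach p < 0.05 even when every individual changes in the same direction, so paired designs need enough individuals to have any power.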
# Example data
data(siamcat_example)
# Simple example
siamcat_example <- check.associations(siamcat_example)
#> + Enrichments have already been calculated!
# Confounder-corrected testing (corrected for Sex)
#
# this is not run during checks
# siamcat_example <- check.associations(siamcat_example,
# formula='feat~label+(1|Sex)', test='lm')
# Paired testing
#
# this is not run during checks
# siamcat_paired <- check.associations(siamcat_paired,
# paired='Individual_ID')