This function computes different measures of association between features and the label and stores the results in the associations slot of the SIAMCAT object.

check.associations(siamcat, formula="feat~label", test='wilcoxon', 
alpha=0.05, mult.corr="fdr", log.n0=1e-06, pr.cutoff=1e-06, 
probs.fc=seq(.1, .9, .05), paired=NULL, feature.type='filtered', 
verbose = 1)

Arguments

siamcat

object of class siamcat-class

formula

string, formula used for testing, see Details for more information, defaults to "feat~label"

test

string, statistical test used for the association testing, can be either 'wilcoxon' or 'lm', see Details for more information, defaults to 'wilcoxon'

alpha

float, significance level, defaults to 0.05

mult.corr

string, multiple hypothesis correction method, see p.adjust, defaults to "fdr"

log.n0

float, pseudo-count to be added before log-transformation of the data, defaults to 1e-06. Will be ignored if feature.type is "normalized".

pr.cutoff

float, cutoff for the prevalence computation, defaults to 1e-06

probs.fc

numeric vector, quantiles used to calculate the generalized fold change between groups, see Details for more information, defaults to seq(.1, .9, .05)

paired

character, column name of the meta-variable containing information for a paired test, defaults to NULL

feature.type

string, type of features the function should work on. Can be one of "original", "filtered", or "normalized"; defaults to "filtered". Please only change this parameter if you know what you are doing!

If feature.type is "normalized", the normalized abundances will not be log10-transformed.

verbose

integer, control output: 0 for no output at all, 1 for only information about progress and success, 2 for normal level of information and 3 for full debug information, defaults to 1

Value

object of class siamcat-class with the slot associations filled

Statistical testing

The function uses the Wilcoxon test as default statistical test for binary classification problems. Alternatively, a simple linear model (as implemented in lm) can be used as well. For regression problems, the function defaults to the linear model.
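The general idea of per-feature testing followed by multiple hypothesis correction can be illustrated in base R. This is a minimal sketch on simulated data, not the package's internal implementation; the object names (feat, label, p.val, p.adj) are made up for the illustration.

```r
# Toy data: 5 features (rows) x 20 samples (columns); all values are simulated
set.seed(42)
label <- factor(rep(c("CTR", "CASE"), each = 10), levels = c("CTR", "CASE"))
feat <- matrix(rlnorm(100, meanlog = -8), nrow = 5,
               dimnames = list(paste0("feat_", 1:5), NULL))
# Make one feature clearly enriched in the CASE group
feat[1, label == "CASE"] <- feat[1, label == "CASE"] * 50

# One Wilcoxon test per feature (normal approximation, exact = FALSE)
p.val <- apply(feat, 1, function(x) {
    wilcox.test(x ~ label, exact = FALSE)$p.value
})

# Multiple hypothesis correction, as selected by mult.corr (see p.adjust)
p.adj <- p.adjust(p.val, method = "fdr")

# Features passing the significance level alpha
alpha <- 0.05
significant <- names(p.adj)[p.adj < alpha]
```

Because the Wilcoxon test is rank-based, a log-transformation of the abundances would not change these P values; it does matter for the linear model.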

Effect sizes

The function calculates several measures for the effect size of the associations between microbial features and the label. For binary classification problems, these measures are:

  • AUROC (area under the Receiver Operating Characteristics curve) as a non-parametric measure of enrichment,

  • the generalized fold change (gFC), a pseudo-fold change which is calculated as geometric mean of the differences between quantiles across both groups,

  • prevalence shift (difference in prevalence between the two groups).

For regression problems, the effect sizes are:

  • Spearman correlation between the feature and the label.
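The effect sizes for binary classification problems listed above can be sketched for a single feature in base R. This is an illustration under stated assumptions rather than the package's own code: it assumes that the gFC is the mean difference between matching quantiles of the log10-transformed abundances (which corresponds to a geometric mean of quantile ratios on the original scale) and that prevalence counts samples with an abundance above pr.cutoff. The data are simulated.

```r
set.seed(1)
# Simulated relative abundances of one feature in two groups (hypothetical)
ctr  <- rlnorm(30, meanlog = -9)
case <- rlnorm(30, meanlog = -7)

# Parameter values taken from the function defaults
log.n0    <- 1e-06
pr.cutoff <- 1e-06
probs.fc  <- seq(.1, .9, .05)

# Generalized fold change (assumed definition): mean difference between
# matching quantiles of the log10-transformed abundances
q.case <- quantile(log10(case + log.n0), probs = probs.fc)
q.ctr  <- quantile(log10(ctr + log.n0), probs = probs.fc)
gfc <- mean(q.case - q.ctr)

# Prevalence shift (assumed definition): difference in the fraction of
# samples with an abundance above pr.cutoff
pr.shift <- mean(case > pr.cutoff) - mean(ctr > pr.cutoff)

# AUROC as the probability that a case sample exceeds a control sample,
# with ties counted as 0.5 (Mann-Whitney formulation)
auroc <- mean(outer(case, ctr, ">") + 0.5 * outer(case, ctr, "=="))
```

Averaging over a range of quantiles, rather than comparing only the medians, lets the gFC pick up shifts in features that are absent from many samples.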

Confounder-corrected testing

To correct for possible confounders while testing for association, the function uses linear mixed effect models as implemented in the lmerTest package. To do so, the test formula needs to be adjusted to include the confounder. For example, when correcting for the metadata information Sex, the formula would be: 'feat~label+(1|Sex)' (see also the example below).

Please note that modifying the formula parameter in this function might lead to unexpected results!

Paired testing

For paired testing, e.g. when the same patient has been sampled before and after an intervention, the `paired` parameter can be supplied to the function. This indicates a column in the metadata table that holds the information about pairing.
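What a pairing column provides can be sketched in base R: samples are matched by individual before a paired test is run. The metadata table, the label values "before" and "after", and the simulated abundances below are all hypothetical, and the code is not taken from the package.

```r
set.seed(7)
# Hypothetical metadata: 8 individuals, each sampled before and after
meta <- data.frame(
    Individual_ID = rep(paste0("P", 1:8), times = 2),
    label = rep(c("before", "after"), each = 8)
)
# Shuffle the samples to show that pairing does not rely on row order
meta <- meta[sample(nrow(meta)), ]
x <- rnorm(nrow(meta))  # simulated log-abundance of one feature

# Align the two groups by individual, so that element i of each vector
# belongs to the same person
before <- meta$label == "before"
ids <- sort(unique(as.character(meta$Individual_ID)))
x.before <- x[before][match(ids, meta$Individual_ID[before])]
x.after  <- x[!before][match(ids, meta$Individual_ID[!before])]

# Paired Wilcoxon (signed-rank) test on the matched values
res <- wilcox.test(x.after, x.before, paired = TRUE)
```

A paired test of this kind needs exactly one sample per individual in each group, which is why the pairing information has to be complete and unambiguous.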

Examples

# Example data
data(siamcat_example)

# Simple example
siamcat_example <- check.associations(siamcat_example)
#> + Enrichments have already been calculated!


# Confounder-corrected testing (corrected for Sex)
#
# this is not run during checks
# siamcat_example <- check.associations(siamcat_example, 
#     formula='feat~label+(1|Sex)', test='lm')

# Paired testing
#
# this is not run during checks
# siamcat_paired <- check.associations(siamcat_paired, 
#     paired='Individual_ID')