`check.associations.Rd`

This function computes different measures of association between features and the label and stores the results in the `associations` slot of the SIAMCAT object.

```
check.associations(siamcat, formula="feat~label", test='wilcoxon',
alpha=0.05, mult.corr="fdr", log.n0=1e-06, pr.cutoff=1e-06,
probs.fc=seq(.1, .9, .05), paired=NULL, feature.type='filtered',
verbose = 1)
```

- siamcat
object of class siamcat-class

- formula
string, formula used for testing, see Details for more information, defaults to `"feat~label"`

- test
string, statistical test used for the association testing, can be either `'wilcoxon'` or `'lm'`, see Details for more information, defaults to `'wilcoxon'`

- alpha
float, significance level, defaults to `0.05`

- mult.corr
string, multiple hypothesis correction method, see `p.adjust`, defaults to `"fdr"`

- log.n0
float, pseudo-count to be added before log-transformation of the data, defaults to `1e-06`. Will be ignored if `feature.type` is `"normalized"`.

- pr.cutoff
float, cutoff for the prevalence computation, defaults to `1e-06`

- probs.fc
numeric vector, quantiles used to calculate the generalized fold change between groups, see Details for more information, defaults to `seq(.1, .9, .05)`

- paired
character, column name of the meta-variable containing information for a paired test, defaults to `NULL`

- feature.type
string, type of features the function should work on, can be either `"original"`, `"filtered"`, or `"normalized"`. Please only change this parameter if you know what you are doing! If `feature.type` is `"normalized"`, the normalized abundances will not be log10-transformed.

- verbose
integer, control output: `0` for no output at all, `1` for only information about progress and success, `2` for normal level of information and `3` for full debug information, defaults to `1`

object of class siamcat-class with the slot `associations` filled

The function uses the Wilcoxon test as the default statistical test for binary classification problems. Alternatively, a simple linear model (as implemented in `lm`) can be used. For regression problems, the function defaults to the linear model.
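As a rough illustration of what per-feature testing with multiple hypothesis correction looks like, the following base R sketch runs one Wilcoxon test per feature and adjusts the p-values with `p.adjust`. The toy matrix `feat`, the group names `CTR` and `CASE`, and the feature names are made up for this example; this is not the package's internal code.

```r
# Hypothetical toy data: 3 features (rows) x 10 samples (columns)
label <- rep(c("CTR", "CASE"), each = 5)
feat <- rbind(
  featA = c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15),  # clearly shifted in CASE
  featB = c(3, 8, 1, 6, 4, 5, 2, 7, 9, 0),       # no obvious shift
  featC = c(2, 9, 4, 7, 5, 6, 3, 8, 1, 10)       # no obvious shift
)

# One Wilcoxon rank-sum test per feature (CASE vs. CTR)
p.val <- apply(feat, 1, function(x) {
  wilcox.test(x[label == "CASE"], x[label == "CTR"])$p.value
})

# Multiple hypothesis correction, analogous to mult.corr = "fdr"
p.adj <- p.adjust(p.val, method = "fdr")

# Features passing the significance level, analogous to alpha = 0.05
significant <- names(p.adj)[p.adj < 0.05]
print(significant)
```

The adjusted p-values are compared against `alpha`, which is why both `alpha` and `mult.corr` affect which features are reported as significant.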

The function calculates several measures for the effect size of the associations between microbial features and the label. For binary classification problems, these measures are:

- AUROC (area under the Receiver Operating Characteristics curve) as a non-parametric measure of enrichment,

- the generalized fold change (gFC), a pseudo-fold change which is calculated as geometric mean of the differences between quantiles across both groups,

- prevalence shift (difference in prevalence between the two groups).
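The three measures above can be sketched in base R for a single feature. The vectors `ctr` and `case` are hypothetical relative abundances, and the gFC here is taken as the mean difference between log10-transformed quantiles (which corresponds to a geometric mean of quantile ratios on the original scale). That reading is an assumption based on the description above, and the package's exact computation may differ.

```r
# Hypothetical relative abundances of one feature in the two groups
ctr  <- c(0, 0, 1e-5, 2e-5, 5e-5, 1e-4)
case <- c(0, 1e-4, 2e-4, 5e-4, 1e-3, 2e-3)

# Values mirroring the function defaults
log.n0    <- 1e-06
pr.cutoff <- 1e-06
probs.fc  <- seq(.1, .9, .05)

# AUROC: fraction of (case, control) pairs in which the case value is
# larger, counting ties as one half
auroc <- mean(outer(case, ctr, ">") + 0.5 * outer(case, ctr, "=="))

# gFC (assumed form): mean difference between log10 quantiles
q.case <- quantile(log10(case + log.n0), probs = probs.fc)
q.ctr  <- quantile(log10(ctr + log.n0), probs = probs.fc)
gfc <- mean(q.case - q.ctr)

# Prevalence shift: difference in the fraction of samples above pr.cutoff
pr.shift <- mean(case > pr.cutoff) - mean(ctr > pr.cutoff)

print(c(auroc = auroc, gfc = gfc, pr.shift = pr.shift))
```

Using a range of quantiles rather than only the median is what makes the gFC sensitive to shifts in sparse features, where the median of both groups may be zero.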

For regression problems, the effect size is the Spearman correlation between the feature and the label.
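For a continuous label, the Spearman correlation can be computed with base R's `cor`. The vectors below are hypothetical. Because Spearman correlation is rank-based, a monotone transformation such as adding a pseudo-count and taking log10 leaves it unchanged, as this sketch checks.

```r
# Hypothetical continuous label and abundances of one feature
label <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.3)
feat  <- c(1e-5, 3e-5, 2e-5, 8e-5, 1e-4, 4e-4)

# Spearman correlation on the raw abundances
rho <- cor(feat, label, method = "spearman")

# Same correlation after a log10 transformation with pseudo-count
rho.log <- cor(log10(feat + 1e-06), label, method = "spearman")

print(c(rho = rho, rho.log = rho.log))
```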

To correct for possible confounders while testing for association, the function uses linear mixed-effects models as implemented in the lmerTest package. To do so, the test formula needs to be adjusted to include the confounder. For example, when correcting for the metadata information `Sex`, the formula would be: `'feat~label+(1|Sex)'` (see also the example below).

Please note that modifying the formula parameter in this function might lead to unexpected results!

For paired testing, e.g. when the same patient has been sampled before and after an intervention, the `paired` parameter can be supplied to the function. It names a column in the metadata table that holds the information about pairing.
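To illustrate why the pairing column matters, here is a minimal base R sketch, assuming the paired test is a Wilcoxon signed-rank test. The data frame `samples` and its columns `Timepoint` and `abundance` are hypothetical; only `Individual_ID` is taken from the example at the end of this page. Samples are first aligned by individual and only then compared.

```r
# Hypothetical metadata plus one feature; the "after" rows are deliberately
# listed in a different order than the "before" rows
samples <- data.frame(
  Individual_ID = c("P1", "P2", "P3", "P4", "P5", "P6",
                    "P6", "P5", "P4", "P3", "P2", "P1"),
  Timepoint = rep(c("before", "after"), each = 6),
  abundance = c(1.0, 2.1, 3.3, 4.2, 5.6, 6.4,
                10.0, 8.5, 6.5, 5.0, 3.2, 1.5)
)

before <- samples[samples$Timepoint == "before", ]
after  <- samples[samples$Timepoint == "after", ]

# Align the "after" samples to the "before" samples by individual
after <- after[match(before$Individual_ID, after$Individual_ID), ]

# Paired (signed-rank) Wilcoxon test on the aligned samples
res <- wilcox.test(after$abundance, before$abundance, paired = TRUE)
print(res$p.value)
```

Without the `match` step, the test would compare samples from different individuals, which is the mistake the pairing column is meant to prevent.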

```
# Example data
data(siamcat_example)
# Simple example
siamcat_example <- check.associations(siamcat_example)
#> + Enrichments have already been calculated!
# Confounder-corrected testing (corrected for Sex)
#
# this is not run during checks
# siamcat_example <- check.associations(siamcat_example,
# formula='feat~label+(1|Sex)', test='lm')
# Paired testing
#
# this is not run during checks
# siamcat_paired <- check.associations(siamcat_paired,
# paired='Individual_ID')
```