This function computes different measures of association between features and the label and visualizes the results.

check.associations(siamcat, fn.plot = NULL, color.scheme = "RdYlBu",
    alpha = 0.05, mult.corr = "fdr", sort.by = "fc",
    detect.lim = 1e-06, pr.cutoff = 1e-06, max.show = 50,
    plot.type = "quantile.box",
    panels = c("fc", "auroc"), prompt = TRUE,
    feature.type = "filtered", paired = NULL, verbose = 1)



siamcat: object of class siamcat-class


fn.plot: string, filename for the pdf-plot. If fn.plot is NULL, the plot will be produced in the active graphics device.


color.scheme: valid R color scheme or vector of valid R colors (must be of the same length as the number of classes), defaults to 'RdYlBu'


alpha: float, significance level, defaults to 0.05


mult.corr: string, multiple hypothesis correction method, see p.adjust, defaults to "fdr"


sort.by: string, sort features by p-value ("p.val"), by fold change ("fc") or by prevalence shift ("pr.shift"), defaults to "fc"


detect.lim: float, pseudocount to be added before log-transformation of the data, defaults to 1e-06. Will be ignored if feature.type is "normalized".


pr.cutoff: float, cutoff for the prevalence computation, defaults to 1e-06


max.show: integer, how many associated features should be shown, defaults to 50


plot.type: string, specify how the abundance should be plotted, must be one of these: c("bean", "box", "quantile.box", "quantile.rect"), defaults to "quantile.box"


panels: vector, names of the panels to be plotted next to the abundances, possible entries are c("fc", "auroc", "prevalence"), defaults to c("fc", "auroc")


prompt: boolean, turn on/off prompting user input when not plotting into a pdf-file, defaults to TRUE


feature.type: string, on which type of features should the function work? Can be one of c("original", "filtered", "normalized"), defaults to "filtered". Please only change this parameter if you know what you are doing!

If feature.type is "normalized", the normalized abundances will not be log10-transformed.


paired: character, column name of the meta-variable containing information for a paired test, defaults to NULL


verbose: integer, control output: 0 for no output at all, 1 for only information about progress and success, 2 for normal level of information and 3 for full debug information, defaults to 1


object of class siamcat-class with the slot associations filled


For each feature, this function calculates different measures of association between the feature and the label. In detail, these associations are:

  • Significance as computed by a Wilcoxon test followed by multiple hypothesis testing correction.

  • AUROC (Area Under the Receiver Operating Characteristic curve) as a non-parametric measure of enrichment (corresponds to the effect size of the Wilcoxon test).

  • The generalized Fold Change (gFC) is a pseudo fold change which is calculated as the geometric mean of the differences between the quantiles for the different classes found in the label.

  • The prevalence shift between the two different classes found in the label.
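The four measures listed above can be illustrated for a single feature with base R only. This is a minimal sketch, not the package's internal code: the abundance values are made up, the quantile grid seq(0.1, 0.9, 0.05) used for the gFC is an assumption, and so is the exact comparison used for the prevalence. The gFC is computed here as the average difference between log10 quantiles, which on the original scale corresponds to a geometric mean of quantile ratios.

```r
# Made-up relative abundances of one feature in two classes
ctrl <- c(0, 0, 1e-5, 2e-5, 5e-5, 1e-4, 2e-4, 0)
case <- c(1e-4, 3e-4, 5e-4, 0, 1e-3, 2e-3, 4e-4, 8e-4)

detect.lim <- 1e-06  # pseudocount added before the log10 transformation
pr.cutoff  <- 1e-06  # abundance above which a feature counts as present

# Significance: Wilcoxon rank-sum test (normal approximation because of ties)
wt    <- wilcox.test(case, ctrl, exact = FALSE)
p.val <- wt$p.value

# Multiple testing correction over all features; the two extra p-values
# stand in for other features and are purely illustrative
p.adj <- p.adjust(c(p.val, 0.2, 0.8), method = "fdr")

# AUROC: the Mann-Whitney statistic divided by the number of case/control pairs
auroc <- unname(wt$statistic) / (length(case) * length(ctrl))

# Generalized fold change: mean difference between log10 quantiles
# (quantile grid is an assumption of this sketch)
probs <- seq(0.1, 0.9, 0.05)
q.case <- quantile(log10(case + detect.lim), probs = probs)
q.ctrl <- quantile(log10(ctrl + detect.lim), probs = probs)
gfc <- mean(q.case - q.ctrl)

# Prevalence shift: difference in the fraction of samples above pr.cutoff
pr.shift <- mean(case > pr.cutoff) - mean(ctrl > pr.cutoff)
```

An AUROC above 0.5 together with a positive gFC and a positive prevalence shift indicates that the feature is enriched in the case group of this toy example.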

Finally, the function produces a plot of the top associated features at a user-specified significance level alpha, showing the distribution of the log10-transformed abundances for both classes, and user-selected panels for the effect (AUROC, prevalence shift, and fold change).
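How alpha and max.show might interact when choosing which features to plot can be sketched on a toy table. The data frame, its column names, and the selection rule (keep adjusted p-values below alpha, order by adjusted p-value, keep at most max.show rows) are assumptions made for illustration; the function's actual selection logic may differ.

```r
# Hypothetical association table with made-up values
assoc <- data.frame(
    feature = c("sp_A", "sp_B", "sp_C", "sp_D", "sp_E"),
    fc      = c(1.2, -0.8, 0.1, 2.0, -1.5),
    p.adj   = c(0.01, 0.03, 0.60, 0.001, 0.20),
    stringsAsFactors = FALSE
)

alpha    <- 0.05  # significance level
max.show <- 2     # maximum number of features to display

# Keep only features that are significant after correction
sig <- assoc[assoc$p.adj < alpha, ]

# Order by adjusted p-value and truncate to max.show features
sig <- sig[order(sig$p.adj), ]
top <- head(sig, max.show)
```

In this toy table three features pass the alpha threshold, and max.show = 2 then limits the selection to the two with the smallest adjusted p-values.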


# Example data
data(siamcat_example)

# Simple example
siamcat_example <- check.associations(siamcat_example,
    fn.plot='./assoc_plot.pdf')
#> Plotted associations between features and label successfully to: ./assoc_plot.pdf

# Plot associations as box plot
siamcat_example <- check.associations(siamcat_example,
    fn.plot='./assoc_plot_box.pdf', plot.type='box')
#> Plotted associations between features and label successfully to: ./assoc_plot_box.pdf

# Additionally, sort by p-value instead of by fold change
siamcat_example <- check.associations(siamcat_example,
    fn.plot='./assoc_plot_fc.pdf', plot.type='box', sort.by='p.val')
#> Plotted associations between features and label successfully to: ./assoc_plot_fc.pdf

# Custom colors
siamcat_example <- check.associations(siamcat_example,
    fn.plot='./assoc_plot_blue_yellow.pdf', plot.type='box',
    color.scheme=c('cornflowerblue', '#ffc125'))
#> Plotted associations between features and label successfully to: ./assoc_plot_blue_yellow.pdf