This function performs unsupervised feature filtering. Features can be filtered based on abundance, prevalence, or variance. Additionally, unmapped reads may be removed.

filter.features(siamcat, filter.method = "abundance",
    cutoff = 0.001, rm.unmapped = TRUE,
    feature.type='original', verbose = 1)

Arguments

siamcat

an object of class siamcat-class

filter.method

string, method used for filtering the features, can be one of these: c('abundance', 'cum.abundance', 'prevalence', 'variance', 'pass'), defaults to 'abundance'

cutoff

float, abundance, prevalence, or variance cutoff, defaults to 0.001 (see Details below)

rm.unmapped

boolean, should unmapped reads be discarded? Defaults to TRUE

feature.type

string, on which type of features should the function work? Can be either "original", "filtered", or "normalized", defaults to "original". Please only change this parameter if you know what you are doing!

verbose

integer, control output: 0 for no output at all, 1 for only information about progress and success, 2 for normal level of information and 3 for full debug information, defaults to 1

Value

siamcat an object of class siamcat-class

Details

This function filters the features in a siamcat-class object in an unsupervised manner.

The different filter methods work in the following way:

  • 'abundance' - remove features whose maximum abundance across all samples does not exceed the cutoff

  • 'cum.abundance' - remove features with very low abundance in all samples, i.e. those that are never among the most abundant entities that collectively make up (1-cutoff) of the reads in any sample

  • 'prevalence' - remove features with low prevalence across samples, i.e. those that are undetected (relative abundance of 0) in more than a fraction of (1 - cutoff) of the samples

  • 'variance' - remove features with low variance across samples, i.e. those that have a variance lower than cutoff

  • 'pass' - pass-through filtering will not change the features
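The filter rules listed above can be sketched in base R on a plain relative-abundance matrix. This is only an illustration of the descriptions, not SIAMCAT's internal code: the helper names (keep_abundance and so on), the toy matrix, and the handling of values exactly equal to the cutoff are assumptions.

```r
# Toy relative-abundance matrix (hypothetical data):
# rows are features, columns are samples, each column sums to 1.
rel_abund <- matrix(
    c(0.6000, 0.5000, 0.7000,
      0.3995, 0.4996, 0.0000,
      0.0005, 0.0004, 0.0000,
      0.0000, 0.0000, 0.3000),
    nrow = 4, byrow = TRUE,
    dimnames = list(paste0("feat", 1:4), paste0("sample", 1:3)))

# 'abundance': keep features whose maximum abundance exceeds the cutoff.
keep_abundance <- function(x, cutoff) apply(x, 1, max) > cutoff

# 'prevalence': drop features that are undetected (abundance of 0)
# in more than a fraction of (1 - cutoff) of the samples.
keep_prevalence <- function(x, cutoff) rowMeans(x == 0) <= 1 - cutoff

# 'variance': keep features whose variance across samples reaches the cutoff.
keep_variance <- function(x, cutoff) apply(x, 1, var) >= cutoff

# 'cum.abundance': in each sample, the least abundant features that
# together sum to less than the cutoff count as "low". A feature is
# kept if it is not "low" in at least one sample.
keep_cum_abundance <- function(x, cutoff) {
    keep <- setNames(rep(FALSE, nrow(x)), rownames(x))
    for (s in seq_len(ncol(x))) {
        ord <- order(x[, s])               # ascending abundance
        low <- cumsum(x[ord, s]) < cutoff  # low-abundance tail
        keep[ord[!low]] <- TRUE
    }
    keep
}

names(which(keep_abundance(rel_abund, 0.001)))      # feat1, feat2, feat4
names(which(keep_prevalence(rel_abund, 0.5)))       # feat1, feat2, feat3
names(which(keep_variance(rel_abund, 0.001)))       # feat1, feat2, feat4
names(which(keep_cum_abundance(rel_abund, 0.001)))  # feat1, feat2, feat4
```

In this toy data, feat3 never rises above 0.0005 and is dropped by the abundance-based and variance filters, while feat4 is detected in only one of three samples and is dropped by the prevalence filter at a cutoff of 0.5.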

Features can also be filtered repeatedly with different methods, e.g. first by maximum abundance and then by prevalence. However, because feature.type defaults to 'original', a second call with default settings filters the original features again, not the already filtered ones.
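A minimal sketch of such chained filtering, assuming that feature.type='filtered' in the second call makes it operate on the output of the first, as the argument description suggests. The block requires the SIAMCAT package and does nothing if it is not installed; the variable names sc_step1 and sc_step2 are hypothetical.

```r
# Assumption: SIAMCAT is installed; otherwise the block is skipped.
if (requireNamespace("SIAMCAT", quietly = TRUE)) {
    library(SIAMCAT)
    data(siamcat_example)

    # Step 1: maximum abundance filtering on the original features
    sc_step1 <- filter.features(siamcat_example,
        filter.method = 'abundance', cutoff = 1e-03)

    # Step 2: prevalence filtering on the already filtered features.
    # With the default feature.type = 'original', this call would
    # start again from the original features instead.
    sc_step2 <- filter.features(sc_step1,
        filter.method = 'prevalence', cutoff = 0.05,
        feature.type = 'filtered')
}
```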

Examples

# Example dataset
data(siamcat_example)

# Simple examples
siamcat_filtered <- filter.features(siamcat_example,
    filter.method='abundance',
    cutoff=1e-03)
#> Features successfully filtered

# 5% prevalence filtering
siamcat_filtered <- filter.features(siamcat_example,
    filter.method='prevalence',
    cutoff=0.05)
#> Features successfully filtered