Perform unsupervised feature filtering. — filter.features • SIAMCAT

This function performs unsupervised feature filtering.

filter.features(siamcat, filter.method = "abundance",
    cutoff = 0.001, rm.unmapped = TRUE, feature.type = "original", verbose = 1)

Arguments

siamcat: an object of class siamcat-class
filter.method: string, method used for filtering the features, can be one of these: c('abundance', 'cum.abundance', 'prevalence', 'variance', 'pass'), defaults to 'abundance'
cutoff: float, abundance, prevalence, or variance cutoff, defaults to 0.001 (see Details below)
rm.unmapped: boolean, whether unmapped reads should be discarded, defaults to TRUE
feature.type: string, the type of features the function should work on. Can be either "original", "filtered", or "normalized", defaults to "original". Please only change this parameter if you know what you are doing!
verbose: integer, control output: 0 for no output at all, 1 for only information about progress and success, 2 for normal level of information and 3 for full debug information, defaults to 1

Value

siamcat: an object of class siamcat-class

Details

This function filters the features in a siamcat-class object in an unsupervised manner.

The different filter methods work in the following way:

'abundance' - remove features whose maximum abundance across all samples does not exceed the cutoff
'cum.abundance' - remove features with very low abundance in all samples, i.e. those that are never among the most abundant entities that collectively make up (1-cutoff) of the reads in any sample
'prevalence' - remove features with low prevalence across samples, i.e. those that are undetected (relative abundance of 0) in more than a fraction (1 - cutoff) of the samples
'variance' - remove features with low variance across samples, i.e. those that have a variance lower than cutoff
'pass' - pass-through filtering will not change the features
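
The rules above can be illustrated with a small base R sketch on a toy matrix. This is not SIAMCAT's implementation: the matrix values, the helper outside_tail, and the exact handling of ties at the cutoff (> versus >=) are assumptions made for illustration, following the wording of the descriptions above.

```r
# Toy relative-abundance matrix (features in rows, samples in columns).
# All values are invented for illustration; each column sums to 1.
feat <- rbind(
    featA = c(0.6000, 0.5000, 0.7000, 0.55),
    featB = c(0.3996, 0.4995, 0.2997, 0.40),
    featC = c(0.0004, 0.0005, 0.0003, 0.00),
    featD = c(0.0000, 0.0000, 0.0000, 0.05)
)
colnames(feat) <- paste0("sample", 1:4)

# 'abundance': keep features whose maximum abundance exceeds the cutoff
keep_abundance <- apply(feat, 1, max) > 0.001

# 'cum.abundance': per sample, sort features by increasing abundance and
# flag the low-abundance tail whose cumulative sum stays below the cutoff.
# A feature is kept if it lies outside that tail in at least one sample.
# (outside_tail is a hypothetical helper, not part of SIAMCAT.)
outside_tail <- function(x, cutoff) {
    ord <- order(x)
    res <- logical(length(x))
    res[ord] <- cumsum(x[ord]) >= cutoff
    res
}
keep_cum_abundance <- rowSums(apply(feat, 2, outside_tail, cutoff = 0.001)) > 0
names(keep_cum_abundance) <- rownames(feat)

# 'prevalence': keep features that are undetected (abundance of 0) in at
# most a fraction (1 - cutoff) of the samples; the cutoff here is 0.5
keep_prevalence <- rowMeans(feat == 0) <= 1 - 0.5

# 'variance': keep features whose variance across samples reaches the cutoff
keep_variance <- apply(feat, 1, var) >= 0.001

# One row per filter method, one column per feature
print(rbind(keep_abundance, keep_cum_abundance, keep_prevalence, keep_variance))
```

On this toy matrix, the abundance and cumulative abundance rules drop only featC, the prevalence rule (cutoff 0.5) drops only featD, which is detected in a single sample, and the variance rule (cutoff 0.001) drops both featC and featD.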

Features can also be filtered repeatedly with different methods, e.g. first using the maximum abundance filtering and then using prevalence filtering. However, even if a filtering method has already been applied to the dataset, SIAMCAT will by default fall back to the original features for filtering. To filter the already filtered features instead, set feature.type='filtered' in the subsequent call (see the last example below).

Examples

# Example dataset
data(siamcat_example)

# Simple example: abundance filtering with a cutoff of 0.001
siamcat_filtered <- filter.features(siamcat_example,
    filter.method='abundance',
    cutoff=1e-03)
#> Features successfully filtered

# 5% prevalence filtering
siamcat_filtered <- filter.features(siamcat_example,
    filter.method='prevalence',
    cutoff=0.05)
#> Features successfully filtered

# filter first for abundance and then for prevalence
siamcat_filt <- filter.features(siamcat_example, 
    filter.method='abundance', cutoff=1e-03)
#> Features successfully filtered
siamcat_filt <- filter.features(siamcat_filt, filter.method='prevalence', 
    cutoff=0.05, feature.type='filtered')
#> Features successfully filtered