This function performs feature normalization according to user-specified parameters.

normalize.features(siamcat, norm.method = c("rank.unit", "rank.std",
"log.std", "log.unit", "log.clr", "std", "pass"), 
norm.param = list(log.n0 = 1e-06, sd.min.q = 0.1, n.p = 2, norm.margin = 1),
feature.type='filtered', verbose = 1)

Arguments

siamcat

an object of class siamcat-class

norm.method

string, normalization method, can be one of these: c('rank.unit', 'rank.std', 'log.std', 'log.unit', 'log.clr','std', 'pass')

norm.param

list, specifying the parameters of the different normalization methods, see Details for more information

feature.type

string, the type of features the function should work on. Can be either "original", "filtered", or "normalized". Please only change this parameter if you know what you are doing!

verbose

integer, control output: 0 for no output at all, 1 for only information about progress and success, 2 for normal level of information and 3 for full debug information, defaults to 1

Value

an object of class siamcat-class with normalized features

Implemented methods

Seven normalization methods are available. Some of them need additional parameters, which are passed via the norm.param list:

  • 'rank.unit' - converts features to ranks and normalizes each column (=sample) by the square root of the sum of ranks. This method does not require additional parameters.

  • 'rank.std' - converts features to ranks and applies z-score standardization. This method requires sd.min.q (minimum quantile of the standard deviation to be added to all features in order to avoid underestimation of standard deviation) as an additional parameter.

  • 'log.clr' - centered log-ratio transformation. This method requires a pseudocount (log.n0), which is added before log-transformation.

  • 'log.std' - log-transforms features and applies z-score standardization. This method requires both a pseudocount (log.n0) and sd.min.q.

  • 'log.unit' - log-transforms features and normalizes by features or samples using one of several vector norms. This method requires a pseudocount (log.n0) and, additionally, the parameters norm.margin (margin over which to normalize, similar to the apply syntax: allowed values are 1 for normalization over features, 2 for normalization over samples, and 3 for normalization by the global maximum) and n.p (vector norm to be used: either 1 for x/sum(x) or 2 for x/sqrt(sum(x^2))).

  • 'std' - z-score standardization without any other transformation. This method only requires the sd.min.q parameter.

  • 'pass' - pass-through normalization, which does not change the features.
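To make the roles of log.n0, sd.min.q, and n.p more concrete, here is a minimal base R sketch of the arithmetic described in the list above. It is not the package's implementation: the helper names log_std_sketch and unit_norm_sketch, the toy matrix, the use of a base-10 logarithm, and the layout with features in rows and samples in columns are all assumptions made for illustration.

```r
# Toy feature matrix (hypothetical values): rows = features, columns = samples
feat <- matrix(c(0.5, 0.2, 0.0,
                 0.1, 0.3, 0.4,
                 0.4, 0.5, 0.6),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("f", 1:3), paste0("s", 1:3)))

# Sketch of a 'log.std'-style normalization (assumes log10):
# add the pseudocount, log-transform, then z-score each feature,
# adding the sd.min.q quantile of all standard deviations to each sd
log_std_sketch <- function(x, log.n0 = 1e-06, sd.min.q = 0.1) {
  x.log <- log10(x + log.n0)
  m <- rowMeans(x.log)               # per-feature mean
  s <- apply(x.log, 1, sd)           # per-feature standard deviation
  q <- quantile(s, probs = sd.min.q, names = FALSE)
  (x.log - m) / (s + q)              # vectors recycle down columns, i.e. per row
}

# Sketch of the two vector norms selected by n.p
unit_norm_sketch <- function(x, n.p = 2) {
  if (n.p == 1) x / sum(x) else x / sqrt(sum(x^2))
}

z <- log_std_sketch(feat)
u1 <- unit_norm_sketch(c(1, 2, 3), n.p = 1)
u2 <- unit_norm_sketch(c(3, 4), n.p = 2)
```

In this sketch the offset q keeps features with a very small standard deviation from being inflated by the division, which is the purpose the sd.min.q description above gives.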

Frozen normalization

The function additionally allows you to perform a frozen normalization on a different dataset. After normalizing the first dataset, the norm_feat slot in the SIAMCAT object contains all parameters of the normalization, which you can access via the norm_params accessor.

In order to perform a frozen normalization of a new dataset, you can run the function supplying the normalization parameters as argument to norm.param: norm.param=norm_params(siamcat_reference). See also the example below.
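The idea behind frozen normalization can be illustrated without the package: parameters are estimated once on the reference data and then reused unchanged on the new data. The following base R sketch assumes a 'log.std'-style method with a base-10 logarithm; the functions fit_log_std and apply_frozen and the list fields feat.mean and feat.adj.sd are hypothetical names for this example and are not meant to describe the actual contents of norm_params.

```r
# Estimate normalization parameters on the reference data only
fit_log_std <- function(x, log.n0 = 1e-06, sd.min.q = 0.1) {
  x.log <- log10(x + log.n0)
  s <- apply(x.log, 1, sd)
  list(log.n0 = log.n0,
       feat.mean = rowMeans(x.log),
       feat.adj.sd = s + quantile(s, probs = sd.min.q, names = FALSE))
}

# Apply stored parameters to new data, matching features by name
apply_frozen <- function(x, params) {
  stopifnot(all(names(params$feat.mean) %in% rownames(x)))
  x <- x[names(params$feat.mean), , drop = FALSE]
  (log10(x + params$log.n0) - params$feat.mean) / params$feat.adj.sd
}

# Hypothetical reference and new datasets sharing the same features
set.seed(1)
reference <- matrix(runif(12), nrow = 3,
                    dimnames = list(paste0("f", 1:3), paste0("ref", 1:4)))
new_data <- matrix(runif(6), nrow = 3,
                   dimnames = list(paste0("f", 1:3), paste0("new", 1:2)))

params <- fit_log_std(reference)
new_norm <- apply_frozen(new_data, params)   # uses reference means and sds
ref_norm <- apply_frozen(reference, params)  # same result as a direct fit
```

Because the new samples are centered and scaled with the reference statistics, their normalized values stay on the same scale as the reference data, even though the new data's own feature means will generally not be zero.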

Examples

# Example data
data(siamcat_example)

# Simple example
siamcat_norm <- normalize.features(siamcat_example,
    norm.method='rank.unit')
#> Features normalized successfully.

# log.unit example
siamcat_norm <- normalize.features(siamcat_example,
    norm.method='log.unit',
    norm.param=list(log.n0=1e-05, n.p=1, norm.margin=1))
#> Features normalized successfully.

# log.std example
siamcat_norm <- normalize.features(siamcat_example,
    norm.method='log.std',
    norm.param=list(log.n0=1e-05, sd.min.q=.1))
#> Features normalized successfully.

# Frozen normalization
# normalize the object siamcat with the same parameters as used in 
# siamcat_reference
# 
# this is not run
# siamcat_norm <- normalize.features(siamcat,
#   norm.param=norm_params(siamcat_reference))