SuperSCC.feature_selection.find_signature_genes

SuperSCC.feature_selection.find_signature_genes(data, label_column='cluster', n_features_to_select=0.15, ratio_of_none_zero_counts=0.1, class_weight='balanced', HVG=True, save=False, filename=None, n_jobs=-1, logger=None, **kwargs)

A simple wrapper around the feature_selection function that finds highly variable genes across all clusters or marker genes for individual clusters.

Parameters:

data – A log-normalized expression matrix (rows are cells; columns are features) with an extra column containing clustering or cell type labels.
label_column – A string specifying the name of the column in data that holds the cluster or cell type labels. Default is ‘cluster’.
n_features_to_select – An int or float controlling the number of features to select. If an integer, it is the absolute number of features to select. If a float between 0 and 1, it is the fraction of features to select. Default is 0.15.
ratio_of_none_zero_counts – A float setting the cutoff on the ratio of non-zero counts: a feature whose ratio falls below this value is omitted. A higher value leaves fewer features for the calculation. Default is 0.1.
class_weight – A string (or None) deciding whether class weights are used. If None, all classes are assumed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). Default is ‘balanced’.
HVG – A bool deciding whether only the detected highly variable features are returned. Default is True.
save – A bool deciding whether the result is written to disk. Default is False.
filename – A string to name the output file. Default is None.
n_jobs – An int setting the number of threads to use. Default is -1, meaning all available threads are used.
logger – A log_file object to write log information into disk. Default is None.
**kwargs – Other parameters passed to the feature_selection function.
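
To make three of the parameters above concrete, the following is a minimal sketch, not SuperSCC's actual implementation. It shows the ‘balanced’ class weight formula quoted for class_weight, one plausible reading of how n_features_to_select turns into a feature count, and a non-zero ratio filter in the spirit of ratio_of_none_zero_counts. The helper names (balanced_class_weights, resolve_n_features, nonzero_ratio_mask) are hypothetical, and both the rounding rule and the use of >= at the cutoff are assumptions.

```python
import numpy as np


def balanced_class_weights(labels):
    """Weights computed as n_samples / (n_classes * np.bincount(y))."""
    # Encode arbitrary labels as integers 0..n_classes-1 so bincount applies.
    classes, y = np.unique(np.asarray(labels), return_inverse=True)
    counts = np.bincount(y)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


def resolve_n_features(n_features_to_select, n_total):
    """Assumed reading: an int is an absolute count, a float in (0, 1) a fraction."""
    if isinstance(n_features_to_select, float):
        # Rounding to the nearest integer is an assumption of this sketch.
        return max(1, int(round(n_features_to_select * n_total)))
    return int(n_features_to_select)


def nonzero_ratio_mask(expr, ratio_of_none_zero_counts=0.1):
    """Boolean mask over columns: True where the non-zero fraction reaches the cutoff."""
    # Rows are cells and columns are features, matching the layout of data.
    ratios = (np.asarray(expr) != 0).mean(axis=0)
    # Keeping features exactly at the cutoff (>=) is an assumption of this sketch.
    return ratios >= ratio_of_none_zero_counts


if __name__ == "__main__":
    print(balanced_class_weights(["A", "A", "A", "B"]))
    print(resolve_n_features(0.15, 2000))
    print(nonzero_ratio_mask([[0, 1, 0], [0, 2, 0], [0, 0, 0], [0, 3, 5]], 0.5))
```

With three cells labelled A and one labelled B, the rarer class B receives the larger weight, which is how the ‘balanced’ mode offsets uneven cluster sizes.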