SuperSCC.clustering.global_consensus_cluster

SuperSCC.clustering.global_consensus_cluster(data, n_components=30, resolution=1, only_positive=True, ratio_of_none_zero_counts=0.1, n_features_to_select=0.15, cut_off=0.1, focus='expression1', class_weight='balanced', ep_cut_off=1, pct_cut_off=0.5, robust=True, save=True, file_name=None, n_jobs=-1, logger=None, **kwargs)

A function to merge global clusters and find markers for each global cluster.
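The merge step can be pictured with the sketch below. It is not SuperSCC's implementation. It assumes a precomputed symmetric matrix of combined scores between clusters, and it assumes that a pair is merged when its score is at or above cut_off. The helper name merge_clusters is hypothetical. A union-find structure makes merging transitive, so if cluster 0 merges with 1 and 1 with 2, all three end up in one global cluster.

```python
from typing import List

import numpy as np


def merge_clusters(scores: np.ndarray, cut_off: float = 0.1) -> List[int]:
    """Hypothetical helper: map each original cluster to a merged label.

    scores: symmetric (k, k) matrix of combined scores between k clusters.
    Assumption: a pair is merged when its score is >= cut_off.
    """
    k = scores.shape[0]
    parent = list(range(k))  # union-find forest, one root per merged group

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(k):
        for j in range(i + 1, k):
            if scores[i, j] >= cut_off:
                parent[find(j)] = find(i)  # union the two groups

    # Relabel roots as consecutive integers in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(k)]


scores = np.array([[1.0, 0.5, 0.0],
                   [0.5, 1.0, 0.5],
                   [0.0, 0.5, 1.0]])
print(merge_clusters(scores, cut_off=0.1))  # [0, 0, 0]: 0-1 and 1-2 chain together
```

Raising cut_off to 0.6 in this toy example leaves every cluster separate ([0, 1, 2]).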

Parameters:

data – A log normalized counts matrix. Rows are cells; Columns are features.
n_components – An int to decide how many principal components will be used for KNN clustering. Default is 30.
resolution – An int to control the coarseness of the clustering. Higher values lead to more clusters. Default is 1.
only_positive – A Bool value to control whether only positive features are used. Default is True.
ratio_of_none_zero_counts – A float setting the minimum ratio of non-zero counts a feature must have; features below this value are omitted. A higher value means fewer features are kept for calculation. Default is 0.1.
n_features_to_select – An int or float to control the number of features to select. If an int, it is the absolute number of features to select. If a float between 0 and 1, it is the fraction of features to select. Default is 0.15.
cut_off – An int or float setting the cutoff on the combined score (Jaccard index * distance correlation) that is used to decide whether a pair of clusters should be merged. Default is 0.1.
focus – A string to decide which values are used for the distance correlation calculation. “expression1” - the mean expression of each feature is used. “rank1” - the rank of each feature based on expression is used. Default is ‘expression1’.
class_weight – A string or None to decide whether class weights are considered. If None, all classes are given weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). Default is ‘balanced’.
ep_cut_off – An int or float to set the expression cutoff used to select positive informative features. Default is 1.
pct_cut_off – An int or float to set the expression percentage cutoff used to select positive informative features. Default is 0.5.
robust – A Bool value to decide whether to recalculate the markers of clusters after merging. Default is True.
save – A Bool value to decide whether to write the output to disk. Default is True.
file_name – A string to control the name of the output. Default is None.
n_jobs – An int to decide the number of threads used. Default is -1, meaning all available threads are used.
logger – A log_file object. Default is None.
**kwargs – Other parameters passed to the feature_selection function.
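The two feature-count parameters ratio_of_none_zero_counts and n_features_to_select can be illustrated with the sketch below. It assumes ratio_of_none_zero_counts is the share of cells (rows) in which a feature has a non-zero count; the function names nonzero_feature_mask and resolve_n_features are hypothetical. The int-versus-float handling mirrors the convention described above, which is the same convention scikit-learn's RFE uses for its n_features_to_select argument.

```python
from numbers import Integral

import numpy as np


def nonzero_feature_mask(data: np.ndarray, ratio_of_none_zero_counts: float = 0.1) -> np.ndarray:
    """Keep features (columns) whose share of cells with non-zero counts
    is at least ratio_of_none_zero_counts (assumed interpretation)."""
    nonzero_ratio = (np.asarray(data) != 0).mean(axis=0)
    return nonzero_ratio >= ratio_of_none_zero_counts


def resolve_n_features(n_features_to_select, n_features: int) -> int:
    """Turn an int (absolute count) or a float in (0, 1) (fraction) into a count."""
    if isinstance(n_features_to_select, Integral):
        return int(n_features_to_select)
    if 0 < n_features_to_select < 1:
        # Round down, but never select zero features.
        return max(1, int(n_features_to_select * n_features))
    raise ValueError("n_features_to_select must be an int or a float in (0, 1)")


data = np.array([[0, 1, 2],   # rows: cells, columns: features
                 [0, 0, 3],
                 [0, 0, 1],
                 [1, 0, 4]])
mask = nonzero_feature_mask(data, 0.3)  # column ratios: 0.25, 0.25, 1.0
print(mask)                             # [False False  True]
print(resolve_n_features(0.25, 100))    # 25
```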
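The “balanced” class_weight formula n_samples / (n_classes * np.bincount(y)) can be checked with a few lines of NumPy. The helper balanced_class_weights is hypothetical. It maps arbitrary labels to integer indices first, because np.bincount only accepts non-negative integers.

```python
import numpy as np


def balanced_class_weights(y) -> dict:
    """Weights n_samples / (n_classes * count) for each class label in y."""
    classes, y_idx = np.unique(np.ravel(y), return_inverse=True)
    counts = np.bincount(y_idx)  # samples per class, aligned with `classes`
    weights = len(y_idx) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


labels = ["A", "A", "A", "B"]
print(balanced_class_weights(labels))  # {'A': 0.6666666666666666, 'B': 2.0}
```

The rarer class receives the larger weight. scikit-learn's sklearn.utils.class_weight.compute_class_weight("balanced", classes=..., y=...) returns the same values.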
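The combined score used with cut_off (Jaccard index * distance correlation) and the focus option can be sketched as follows. This is an illustration under assumptions, not SuperSCC's code. The Jaccard index is taken over the two clusters' marker sets. The distance correlation (Székely's sample statistic, implemented here from scratch) is taken over per-feature profiles that the caller aligns to the same feature order: mean expression for “expression1”, or the rank of that mean expression for “rank1”. The names distance_correlation, jaccard and combined_score are hypothetical.

```python
import numpy as np


def distance_correlation(x, y) -> float:
    """Sample distance correlation (Szekely et al.) of two 1-D vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distances within x
    b = np.abs(y[:, None] - y[None, :])
    # Double-centre each distance matrix.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dvar = (A * A).mean() * (B * B).mean()
    if dvar == 0:  # a constant vector has no distance variance
        return 0.0
    dcov2 = max((A * B).mean(), 0.0)  # guard against tiny negative round-off
    return float(np.sqrt(dcov2 / np.sqrt(dvar)))


def jaccard(markers_a, markers_b) -> float:
    a, b = set(markers_a), set(markers_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def combined_score(markers_a, markers_b, profile_a, profile_b, focus="expression1") -> float:
    """Hypothetical: Jaccard of marker sets times distance correlation of
    per-feature profiles (mean expression, or its rank when focus="rank1")."""
    if focus == "rank1":
        profile_a = np.argsort(np.argsort(profile_a))  # 0-based ranks
        profile_b = np.argsort(np.argsort(profile_b))
    return jaccard(markers_a, markers_b) * distance_correlation(profile_a, profile_b)


score = combined_score({"g1", "g2", "g3"}, {"g2", "g3", "g4"},
                       [0.2, 1.5, 2.0, 0.8], [0.5, 3.1, 4.1, 1.7])
print(round(score, 3))  # 0.5
```

Here the marker sets share two of four genes (Jaccard 0.5) and the second profile is a linear function of the first (distance correlation 1), so the product is 0.5. Under the assumed merge rule, this pair would be merged at the default cut_off of 0.1.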
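One plausible reading of ep_cut_off and pct_cut_off is sketched below. It is an assumption, since the exact rule is not documented here: a feature counts as a positive informative feature for a cluster when its mean expression within that cluster is at least ep_cut_off and the fraction of that cluster's cells with expression above zero is at least pct_cut_off. The function name positive_informative_features is hypothetical.

```python
import numpy as np


def positive_informative_features(data, in_cluster, ep_cut_off=1.0, pct_cut_off=0.5):
    """Hypothetical rule: a feature is kept for a cluster when its mean
    expression in that cluster is >= ep_cut_off AND the fraction of the
    cluster's cells expressing it (> 0) is >= pct_cut_off."""
    sub = np.asarray(data, dtype=float)[np.asarray(in_cluster, dtype=bool)]
    mean_expr = sub.mean(axis=0)       # per-feature mean within the cluster
    pct_expr = (sub > 0).mean(axis=0)  # per-feature fraction of expressing cells
    return (mean_expr >= ep_cut_off) & (pct_expr >= pct_cut_off)


data = np.array([[2.0, 0.0, 0.5],  # rows: cells, columns: features
                 [1.5, 0.0, 0.0],
                 [0.0, 3.0, 0.0],
                 [0.0, 4.0, 0.0]])
in_cluster = [True, True, False, False]
print(positive_informative_features(data, in_cluster))  # [ True False False]
```

The third feature is expressed in half of the cluster's cells but its mean (0.25) falls below the default ep_cut_off of 1, so it is not kept.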