SuperSCC.label_transfer.model_training

SuperSCC.label_transfer.model_training(data, label_column, features, model, normalization_method='Min-Max', feature_selection_params=None, parameters=None, probability=False, random_state=10, cv=5, n_jobs=-1, filename=None, save=True, logger=None)

Train a cell-type classifier of the chosen model type on the selected features.
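The page does not spell out the training steps. The following is a minimal sketch of one plausible flow, assuming scikit-learn: keep only the selected features, scale them with the chosen normalization, and tune the chosen classifier with grid search and cross-validation. The function name train_sketch, the mapping from model strings to scikit-learn estimators, and mapping the probability flag to SVC(probability=...) are all hypothetical. This is not the SuperSCC source.

```python
from typing import Optional

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC


def train_sketch(
    data: pd.DataFrame,
    label_column: str,
    features: list,
    model: str,
    normalization_method: str = "Min-Max",
    parameters: Optional[dict] = None,
    probability: bool = False,
    random_state: int = 10,
    cv: int = 5,
    n_jobs: int = -1,
):
    """Hypothetical stand-in for model_training; NOT the SuperSCC implementation."""
    # Keep only the selected features; the label column supplies the targets.
    X = data[features]
    y = data[label_column]

    # Assumed column-wise (per-feature) scaling for the two documented methods.
    scalers = {"Min-Max": MinMaxScaler(), "Standardization": StandardScaler()}
    # Assumed mapping from the documented model strings to scikit-learn classifiers.
    # random_state is passed only to the random forest and logistic models.
    estimators = {
        "random_foreast": RandomForestClassifier(random_state=random_state),
        "svm": SVC(probability=probability),
        "logistic": LogisticRegression(random_state=random_state, max_iter=1000),
    }
    if model not in estimators:
        raise ValueError(f"unknown model: {model!r}")

    # Scaling inside the pipeline is refit on each training fold during CV.
    pipe = Pipeline([("scale", scalers[normalization_method]), ("clf", estimators[model])])

    # Pipeline grid keys need the step-name prefix, e.g. "C" -> "clf__C".
    grid = {f"clf__{name}": values for name, values in (parameters or {}).items()}
    search = GridSearchCV(pipe, grid, cv=cv, n_jobs=n_jobs)
    search.fit(X, y)
    return search.best_estimator_
```

In this sketch, keys in parameters are written without the pipeline prefix, for example {"C": [0.1, 1, 10]} for "svm" or "logistic". An empty or missing grid simply fits the model once with its defaults.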

Parameters:

data – A log-normalized expression matrix in which rows are cells and columns are features. It must also contain the cell-type column named by label_column.
label_column – A string giving the name of the cell-type column in data.
features – A list of the features retained for model training.
model – A string that selects the model to train: “random_foreast”, “svm”, or “logistic”.
normalization_method – A string that decides how the data are normalized. Default is “Min-Max”. Other accepted values include “Standardization”. If features were returned by the feature_selection function, this value should match the one used when running feature_selection.
feature_selection_params – A dict of the parameters used by the feature_selection function to extract the informative features used for model training. Default is None.
parameters – A dict mapping each parameter name to the candidate values that grid search should try for the chosen model. Default is None.
probability – A bool that decides whether the logits are returned when model is “svm”. Default is False.
random_state – An int that controls the randomness of the bootstrapping of the samples. It takes effect only when model is “random_foreast” or “logistic”. Default is 10.
cv – An int giving the number of cross-validation folds used by the grid search. Default is 5.
n_jobs – An int giving the number of threads used by the program. Default is -1, meaning all available threads are used.
filename – A string giving the name of the output file. Default is None.
save – A bool that decides whether the result is written to disk. Default is True.
logger – A log_file object used to write log information to disk. Default is None.
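The two normalization methods give different values for the same matrix, which is presumably why normalization_method should match the setting used in feature_selection: features chosen on one scaling are then trained on the same kind of input. A minimal sketch of the two transforms follows, assuming they are applied column-wise (per feature); the helper names min_max and standardize and the toy matrix are hypothetical.

```python
import numpy as np


def min_max(x: np.ndarray) -> np.ndarray:
    """Rescale each column to [0, 1] via (x - min) / (max - min)."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0 instead of dividing by zero
    return (x - lo) / span


def standardize(x: np.ndarray) -> np.ndarray:
    """Center each column to mean 0 and scale it to unit (population) standard deviation."""
    mu, sd = x.mean(axis=0), x.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)  # leave constant columns centered but unscaled
    return (x - mu) / sd


# Toy matrix: 3 cells (rows) x 2 features (columns).
expr = np.array([[0.0, 2.0], [1.0, 4.0], [3.0, 6.0]])
print(min_max(expr))
print(standardize(expr))
```

Min-Max keeps every feature inside [0, 1], while Standardization centers each feature on 0 with unit spread, so the same cell can look quite different to a model depending on which transform was used.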