SuperSCC.gene_module.get_gene_module
- SuperSCC.gene_module.get_gene_module(data, intersect_size=10, intersect_group_size=5, parallel_num=8)
A function to iteratively find gene modules from a collection of gene sets.
- Args:
- data (pd.DataFrame): A DataFrame in which each column holds one gene set, with
entries given as strings (e.g., “GENE/SCORE”). Pad shorter columns with NaN when the sets differ in length.
intersect_size (int): The minimum intersection size to consider merging two sets.
intersect_group_size (int): The minimum number of high-intersection pairs required
to proceed with a merge.
parallel_num (int): The number of parallel processes to use.
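The input layout described for data can be sketched as follows. This is a minimal illustration, not code from the package: the gene sets, column names, and score values are hypothetical, and the import path in the trailing comment is only inferred from the dotted name documented here.

```python
import pandas as pd

# Hypothetical gene sets of unequal length, written as "GENE/SCORE" strings.
gene_sets = {
    "set_a": ["CD3D/0.91", "CD3E/0.88", "CD2/0.75"],
    "set_b": ["CD3D/0.80", "MS4A1/0.95"],
    "set_c": ["CD79A/0.90"],
}

# Wrapping each list in a Series lets pandas align the columns on the row
# index and fill the shorter ones with NaN.
data = pd.DataFrame({name: pd.Series(genes) for name, genes in gene_sets.items()})

# The call itself, using the documented signature with its default values
# (import path assumed from the dotted name, not verified):
# from SuperSCC.gene_module import get_gene_module
# result = get_gene_module(data, intersect_size=10, intersect_group_size=5, parallel_num=8)
```

Building the frame from Series rather than raw lists avoids the length-mismatch error that pandas raises when a dict of unequal lists is passed directly.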
- Returns:
- dict: A dictionary containing:
'gene_module': A list of the identified gene modules (each a list of strings).
'module_members': A list of the original column names that formed each module.
'remained_gene_sets': The final DataFrame of gene sets that were not merged.
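One way to consume the returned dictionary is sketched below. The result shown is a hand-built placeholder with the three documented keys, not real output. The sketch assumes that 'gene_module' and 'module_members' are aligned by position and that each entry of 'module_members' is itself a list of column names; neither assumption is confirmed by the description above.

```python
import pandas as pd

# Placeholder standing in for the value returned by get_gene_module.
# Only the three keys come from the documentation; the values are hypothetical.
result = {
    "gene_module": [["CD3D", "CD3E"]],
    "module_members": [["set_a", "set_b"]],
    "remained_gene_sets": pd.DataFrame({"set_c": ["CD79A/0.90"]}),
}

# One summary row per module: its size and the columns that formed it
# (assumes the two lists are aligned by position).
summary = pd.DataFrame(
    {
        "n_genes": [len(module) for module in result["gene_module"]],
        "members": [", ".join(names) for names in result["module_members"]],
    }
)

# Names of the gene sets that were never merged into a module.
unmerged = list(result["remained_gene_sets"].columns)

print(summary)
print("unmerged:", unmerged)
```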