SuperSCC.gene_module.get_gene_module

SuperSCC.gene_module.get_gene_module(data, intersect_size=10, intersect_group_size=5, parallel_num=8)

Iteratively find gene modules from a collection of gene sets.

Args:
data (pd.DataFrame): A DataFrame where each column contains a gene set as a list of strings (e.g., “GENE/SCORE”). Columns should be padded with NaN for unequal lengths.

intersect_size (int): The minimum intersection size to consider merging two sets.

intersect_group_size (int): The minimum number of high-intersection pairs required to proceed with a merge.

parallel_num (int): The number of parallel processes to use.

Returns:
dict: A dictionary containing:
  • ‘gene_module’: A list of the identified gene modules (each a list of strings).

  • ‘module_members’: A list of the original column names that formed each module.

  • ‘remained_gene_sets’: The final DataFrame of gene sets that were not merged.
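
Example:
The sketch below shows one way to build the NaN-padded input DataFrame described under Args. The set names, gene symbols, and scores are hypothetical values chosen for illustration, and the commented-out call assumes SuperSCC is installed and importable under the module path shown in the signature above.

```python
import pandas as pd

# Hypothetical gene sets; each entry follows the "GENE/SCORE" string form.
gene_sets = {
    "set_a": ["CD3D/0.91", "CD3E/0.88", "CD2/0.75"],
    "set_b": ["CD3D/0.93", "CD2/0.70"],
    "set_c": ["MS4A1/0.95"],
}

# Wrapping each list in a Series lets pandas align rows by position
# and fill the shorter columns with NaN automatically.
data = pd.DataFrame({name: pd.Series(genes) for name, genes in gene_sets.items()})

# Assumed usage, not executed here (requires SuperSCC):
# from SuperSCC.gene_module import get_gene_module
# result = get_gene_module(data, intersect_size=10, intersect_group_size=5, parallel_num=8)
# modules = result["gene_module"]        # list of gene modules
# members = result["module_members"]     # original column names per module
# leftover = result["remained_gene_sets"]  # DataFrame of unmerged gene sets
```

With real data, each column would typically hold many more entries than the defaults for intersect_size and intersect_group_size, so the toy sets above only illustrate the expected layout.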