Gene module detection and evaluation

Here, we show how SuperSCC’s hierarchical markers, detected at different levels across studies, can yield conserved gene modules that offer biological insight.

[1]:
import SuperSCC as scc
import pandas as pd
[2]:
# read in the M-level and F-level gene sets
m_gene_sets = pd.read_csv("m_ensemble_id_files.csv", index_col = 0)

f_gene_sets = pd.read_csv("f_ensemble_id_files.csv", index_col = 0)
m_gene_sets.head(5)
[2]:
Travaglini_2020_cluster5 No_public_cluster5 Ma_2019_cluster4 He_2022_cluster0 Suo_2022_Bone_marrow_cluster5 Galen_2019_cluster5 Morse_2019_cluster7 Xing_2021_cluster6 He_2022_cluster2 Qian_2020_lung_cluster1 ... Bharat_2020_cluster1 Steen_2021_cluster1 Yoshida_2022_cluster0 Alonso_2022_cluster6 Madissoon_2019_cluster1 Krishna_2021_cluster4 Chen_2022_cluster11 Neftel_2019_cluster0 Kumar_2022_cluster4 Deprez_2020_cluster2
1 ENSG00000121966/1738.2351897769943 ENSG00000271503/1734.6190121609518 ENSG00000106538/1385.5679976354565 ENSG00000164692/2468.1337875547238 ENSG00000250722/1667.2164093886356 ENSG00000019582/1414.0449372661146 ENSG00000117632/1399.6887628457075 ENSG00000011600/1013.176199610937 ENSG00000159713/2473.7380315811993 ENSG00000171345/1320.574483940035 ... ENSG00000011465/1865.0514703170616 ENSG00000271503/1315.6645825474366 ENSG00000227507/1442.2838063594952 ENSG00000019582/2287.361485126351 ENSG00000008517/1553.726620595502 ENSG00000142089/1051.6387484268378 ENSG00000167483/1997.0511977031802 ENSG00000131981/1259.2501484315092 ENSG00000142089/1180.2160799334624 ENSG00000090382/1583.774763951293
2 ENSG00000023445/1712.7058973832948 ENSG00000163736/1698.7479738188138 ENSG00000118785/1382.7571181252606 ENSG00000168542/2467.7523831213666 ENSG00000110077/1663.85722621107 ENSG00000204287/1413.137006847736 ENSG00000123975/1376.739482084917 ENSG00000101439/1011.6857952721534 ENSG00000115461/2473.209902285476 ENSG00000124107/1319.5039148666324 ... ENSG00000152583/1864.1649480656831 ENSG00000113088/1315.5788184204819 ENSG00000198851/1441.4623821948296 ENSG00000142669/2256.9022557941184 ENSG00000277734/1545.845008371674 ENSG00000163453/1049.452735078961 ENSG00000170476/1993.7598078727938 ENSG00000172020/1257.4388151263192 ENSG00000163453/1180.1276273681042 ENSG00000196154/1565.871026444962
3 ENSG00000172183/1711.3438097256449 ENSG00000064601/1666.4037913705797 ENSG00000124107/1380.3373387623137 ENSG00000100097/2467.4978313223114 ENSG00000168209/1659.6000979136218 ENSG00000100097/1410.1419352433252 ENSG00000163221/1272.5457579541662 ENSG00000158869/1009.609298896034 ENSG00000170421/2452.2836977046118 ENSG00000111057/1319.170776591765 ... ENSG00000197766/1863.0891180071826 ENSG00000105374/1315.4072957510782 ENSG00000111716/1439.4524714844001 ENSG00000234745/2244.281396655737 ENSG00000167286/1541.3411033675386 ENSG00000113140/1041.8106206913862 ENSG00000129824/1992.3170782932825 ENSG00000182718/1256.4367007245362 ENSG00000113140/1177.8488889484258 ENSG00000130208/1564.9462437446584
4 ENSG00000211592/1672.2984801738332 ENSG00000169756/1643.9154834566546 ENSG00000291137/1378.549435772063 ENSG00000166482/2466.732322365024 ENSG00000172889/1653.3447399171896 ENSG00000101347/1399.5280975265284 ENSG00000173207/1268.4867016395046 ENSG00000173372/1009.1489588275941 ENSG00000187244/2442.43740436204 ENSG00000008394/1318.4073921533343 ... ENSG00000111341/1862.8357915864788 ENSG00000077984/1315.021160612585 ENSG00000131507/1417.5592763650513 ENSG00000204525/2238.732647136744 ENSG00000211772/1540.903491946536 ENSG00000129538/1040.046048479781 ENSG00000198624/1981.0806438418783 ENSG00000032219/1244.3645352057724 ENSG00000102265/1177.5932079608735 ENSG00000121552/1564.2291269628986
5 ENSG00000211895/1653.701290107666 ENSG00000120885/1633.0141097582223 ENSG00000106565/1378.2663427426346 ENSG00000152583/2466.5423493463045 ENSG00000177606/1652.1638095466308 ENSG00000196126/1396.8798181308327 ENSG00000164611/1185.2089098205347 ENSG00000119655/1001.5303466732067 ENSG00000119888/2440.769555734281 ENSG00000167642/1315.2900562381037 ... ENSG00000167779/1860.0335261766566 ENSG00000145649/1314.7204884182925 ENSG00000104660/1416.0830400138318 ENSG00000164733/2225.215424973359 ENSG00000227507/1520.1419539111037 ENSG00000142192/1024.934266829235 ENSG00000184226/1974.2149793436588 ENSG00000147588/1241.0681456651373 ENSG00000165949/1175.1865024595227 ENSG00000109861/1560.692029208657

5 rows × 527 columns

Each cell contains a gene ID (or symbol) and its feature importance as detected by SuperSCC, separated by a slash.
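Because each cell packs the identifier and the score into one string, downstream work usually starts by separating the two. The sketch below is a minimal, standard-library illustration using three values copied from the first column of the table above; `parse_cell` is a hypothetical helper and not part of SuperSCC.

```python
from typing import Tuple


def parse_cell(cell: str) -> Tuple[str, float]:
    """Split an 'ID/importance' string into (gene_id, importance).

    Hypothetical helper for illustration; it splits on the last slash so
    that an identifier containing a slash would still parse.
    """
    gene_id, score = cell.rsplit("/", 1)
    return gene_id, float(score)


# Values copied from the Travaglini_2020_cluster5 column shown above.
column = [
    "ENSG00000121966/1738.2351897769943",
    "ENSG00000023445/1712.7058973832948",
    "ENSG00000172183/1711.3438097256449",
]

parsed = [parse_cell(cell) for cell in column]
gene_ids = [gene_id for gene_id, _ in parsed]
scores = [score for _, score in parsed]

print(gene_ids)
print(scores)
```

The same helper could be applied to a whole column of `m_gene_sets`, for example with `m_gene_sets[col].map(parse_cell)`, provided the column has no missing values.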

[ ]:
# find gene modules across m-cluster level gene sets
m_gms = scc.gene_module.get_gene_module(
    data = m_gene_sets,
    parallel_num = 32
)

The output includes gene_module, module_members and remained_gene_sets.

  1. gene_module contains the final conserved gene sets (50 genes) with updated SuperSCC importance scores across studies

  2. module_members contains the contributors for each module

  3. remained_gene_sets contains the remaining gene sets that are not included in the gene modules in each round.

[5]:
m_gms.keys()
[5]:
dict_keys(['gene_module', 'module_members', 'remained_gene_sets'])
[12]:
m_gms["gene_module"][0][0:5]
[12]:
array(['ENSG00000163513/997.624181884792',
       'ENSG00000175899/992.2225469499747',
       'ENSG00000003436/990.4354006923714',
       'ENSG00000184113/989.2463610909931',
       'ENSG00000116016/988.3663976243083'], dtype='<U33')
[11]:
m_gms["module_members"][0][0:5]
[11]:
array(['Pal_2021_3_cluster2', 'Glasner_2023_cluster3',
       'Pan_2023_cluster3', 'Habermann_2020_cluster2',
       'Deprez_2020_cluster4'], dtype='<U30')
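The member labels shown above appear to follow a `<dataset>_cluster<N>` pattern. Assuming that convention holds for every label (an assumption based only on the labels visible in this section), the contributors of a module can be grouped by dataset as sketched below. `split_member` is a hypothetical helper, not a SuperSCC function.

```python
from collections import defaultdict

# Labels copied from the module_members output shown above.
members = [
    "Pal_2021_3_cluster2",
    "Glasner_2023_cluster3",
    "Pan_2023_cluster3",
    "Habermann_2020_cluster2",
    "Deprez_2020_cluster4",
]


def split_member(label):
    """Split '<dataset>_cluster<N>' into (dataset, 'cluster<N>').

    Assumes the last occurrence of '_cluster' separates the dataset name
    from the cluster number.
    """
    dataset, sep, cluster_no = label.rpartition("_cluster")
    if not sep:
        raise ValueError("unexpected member label: " + label)
    return dataset, "cluster" + cluster_no


# Group the contributing clusters by the dataset they come from.
clusters_by_dataset = defaultdict(list)
for label in members:
    dataset, cluster = split_member(label)
    clusters_by_dataset[dataset].append(cluster)

for dataset, clusters in clusters_by_dataset.items():
    print(dataset, clusters)
```

Knowing which dataset each contributor belongs to is what makes it possible to link modules across levels within the same study.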
[ ]:
# find gene modules across f-cluster level gene sets
f_gms = scc.gene_module.get_gene_module(
    data = f_gene_sets,
    parallel_num = 32,
    lib_loc = "/home/fengtang/R/x86_64-pc-linux-gnu-library/4.3/"
)

After obtaining the modules at different levels, we can connect them by tracking their contributors. For instance, within a single dataset, M cluster A contributes to M gene module 1 while F cluster B contributes to F gene module 2. Since M cluster A is the parent cluster of F cluster B, a bridge can be constructed between M gene module 1 and F gene module 2. We can count such events for each pair of modules and also explore gene module function via enrichment analysis. Finally, we obtain a data frame like:

image.png
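The bridge counting described above can be sketched with plain dictionaries. Everything in this example is hypothetical toy data: the cluster labels, the parent mapping `parent_of`, and the two membership dictionaries are invented for illustration and do not come from the SuperSCC output. The placeholder `"No"` for an F cluster whose parent contributes to no M module mirrors the value that appears in the example table in this section.

```python
from collections import Counter

# Hypothetical: which gene module each M cluster contributes to.
m_module_of = {
    "StudyA_M_cluster1": "M_gene_module_1",
    "StudyB_M_cluster0": "M_gene_module_2",
}

# Hypothetical: which gene module each F cluster contributes to.
f_module_of = {
    "StudyA_F_cluster3": "F_gene_module_1",
    "StudyB_F_cluster2": "F_gene_module_1",
    "StudyB_F_cluster5": "F_gene_module_2",
    "StudyC_F_cluster0": "F_gene_module_2",
}

# Hypothetical: parent M cluster of each F cluster within the same dataset.
parent_of = {
    "StudyA_F_cluster3": "StudyA_M_cluster1",
    "StudyB_F_cluster2": "StudyB_M_cluster0",
    "StudyB_F_cluster5": "StudyB_M_cluster0",
    "StudyC_F_cluster0": "StudyC_M_cluster4",  # parent is in no M module
}

# Count one bridge per F cluster: (module of its parent, its own module).
bridges = Counter()
for f_cluster, f_module in f_module_of.items():
    m_cluster = parent_of[f_cluster]
    m_module = m_module_of.get(m_cluster, "No")
    bridges[(m_module, f_module)] += 1

for (m_module, f_module), n in sorted(bridges.items()):
    print(m_module, "->", f_module, n)
```

Each key of `bridges` corresponds to one candidate row of the module-pair data frame, and the count is the number of parent-child cluster pairs supporting that link.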

The flow between gene modules at different levels can be visualized in a Sankey plot. You can also use SuperSCC for this:

[2]:
df = pd.read_csv("/home/fengtang/jupyter_notebooks/working_script/gene_module/F_cluster/sankey_df.csv", index_col=0)
df = df.iloc[:, [6,0,5,1]]
df.head(5)
[2]:
M_func M_gene_module F_func F_gene_module
1 M_angiogenesis M_gene_module_1 F_angiogenesis F_gene_module_1
2 M_defense_response M_gene_module_2 F_angiogenesis F_gene_module_1
3 M_angiogenesis M_gene_module_9 F_angiogenesis F_gene_module_1
4 Unknown source No F_angiogenesis F_gene_module_1
5 M_immune_response M_gene_module_11 F_immune_response F_gene_module_2
[ ]:
sankey = scc.clustering.get_sankey_dataframe(df, plot = True)
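For readers who want to see what a Sankey plot needs as input, the sketch below turns the five module pairs shown in the table above into a node list and weighted links. This is a generic illustration of the source/target/value encoding that Sankey plotting tools commonly expect; it is not a description of what `get_sankey_dataframe` does internally or of the structure it returns, neither of which is shown in this section.

```python
from collections import Counter

# (M_gene_module, F_gene_module) pairs copied from the table above.
rows = [
    ("M_gene_module_1", "F_gene_module_1"),
    ("M_gene_module_2", "F_gene_module_1"),
    ("M_gene_module_9", "F_gene_module_1"),
    ("No", "F_gene_module_1"),
    ("M_gene_module_11", "F_gene_module_2"),
]

# Weight of each link = number of rows with that module pair.
flows = Counter(rows)

# Assign every distinct module an integer node index.
nodes = sorted({m for m, _ in flows} | {f for _, f in flows})
index = {name: i for i, name in enumerate(nodes)}

# Encode links as source index, target index and weight.
links = [
    {"source": index[m], "target": index[f], "value": n}
    for (m, f), n in flows.items()
]

print(nodes)
for link in links:
    print(link)
```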

SuperSCC employs LLMs to evaluate the function of gene modules. To compare two gene modules:

[ ]:
res = scc.gene_module.compare_gene_modules(
    module1 = ["ADA","C17orf99","PARP3","MICA","MAD2L2","BATF","CD226","TNFSF13B","LILRB1"],
    module2 = ["MIR302E", "ADAM8", "PARK7", "NLRP3", "CCR7", "CNR1", "ADORA1", "C2CD4A", "DNASE1"],
    api_key = "*********" # api key for DeepSeek LLM
)
/home/fengtang/anaconda3/envs/SuperSCC/lib/python3.11/site-packages/SuperSCC/SuperSCC.py:2792: LangChainDeprecationWarning: The class `ChatOpenAI` was deprecated in LangChain 0.0.10 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-openai package and should be used instead. To use it run `pip install -U :class:`~langchain-openai` and import as `from :class:`~langchain_openai import ChatOpenAI``.
  model = ChatOpenAI(
[4]:
res.keys()
[4]:
dict_keys(['common_genes', 'unique_to_module1', 'unique_to_module2', 'comparison_analysis'])

The output contains four keys:

  1. common_genes: shared genes between gene modules.

  2. unique_to_module1: unique genes for module 1.

  3. unique_to_module2: unique genes for module 2.

  4. comparison_analysis: the functional annotation, covering common biological pathways between modules, unique pathways in each module, potential functional relationships between modules, disease associations shared between modules, and tissue/cell type specificity differences.
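The first three keys describe a simple overlap between the two gene lists. Assuming they are computed by exact, case-sensitive matching of gene symbols (an assumption; the source does not show how SuperSCC derives them), the same quantities can be reproduced with Python sets for the two modules used above:

```python
# Gene lists copied from the compare_gene_modules call above.
module1 = ["ADA", "C17orf99", "PARP3", "MICA", "MAD2L2",
           "BATF", "CD226", "TNFSF13B", "LILRB1"]
module2 = ["MIR302E", "ADAM8", "PARK7", "NLRP3", "CCR7",
           "CNR1", "ADORA1", "C2CD4A", "DNASE1"]

set1, set2 = set(module1), set(module2)

# Sorted lists make the result reproducible and easy to read.
overlap = {
    "common_genes": sorted(set1 & set2),
    "unique_to_module1": sorted(set1 - set2),
    "unique_to_module2": sorted(set2 - set1),
}

for key, genes in overlap.items():
    print(key, len(genes), genes)
```

For these two modules the intersection is empty, so any relationship between them has to come from the functional comparison rather than from shared genes.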

[6]:
res["comparison_analysis"]
[6]:
'### Analysis of Gene Modules\n\n#### 1. **Common Biological Pathways Between Modules**\nBoth Module 1 and Module 2 contain genes involved in immune regulation and inflammation. For example:\n- **Module 1**: Genes like **MICA**, **CD226**, and **TNFSF13B** are associated with immune cell activation, particularly in natural killer (NK) cells and T cells. **LILRB1** is involved in immune checkpoint regulation.\n- **Module 2**: Genes like **NLRP3** and **CCR7** are key players in inflammasome activation and immune cell migration, respectively. **ADAM8** is also implicated in immune cell adhesion and signaling.\n\nThe overlapping pathways include:\n- **Immune response regulation**: Both modules contribute to the modulation of immune cell activity, though through different mechanisms.\n- **Inflammatory signaling**: **NLRP3** (Module 2) and **TNFSF13B** (Module 1) are linked to inflammatory pathways, suggesting a shared role in inflammation-related processes.\n\n#### 2. **Unique Pathways in Each Module**\n- **Module 1**:\n  - **DNA repair and cell cycle regulation**: **PARP3** and **MAD2L2** are involved in DNA repair and mitotic regulation, respectively.\n  - **Transcription regulation**: **BATF** is a transcription factor that regulates immune cell differentiation.\n  - **NK cell and T cell activation**: **MICA** and **CD226** are specifically involved in NK and T cell signaling.\n\n- **Module 2**:\n  - **Neuroinflammation and neurodegeneration**: **PARK7** and **CNR1** are associated with neuroprotection and cannabinoid signaling, respectively.\n  - **MicroRNA regulation**: **MIR302E** is involved in post-transcriptional gene regulation, potentially influencing cell differentiation and development.\n  - **Chemokine signaling**: **CCR7** is critical for lymphocyte trafficking and immune cell migration.\n\n#### 3. **Potential Functional Relationships Between Modules**\nThe two modules may interact in the context of immune regulation and inflammation. 
For instance:\n- **Module 1** genes like **TNFSF13B** and **LILRB1** could modulate immune responses that are influenced by **Module 2** genes like **NLRP3** and **CCR7**, which drive inflammatory signaling and immune cell migration.\n- **PARP3** (Module 1) and **PARK7** (Module 2) both have roles in cellular stress responses, suggesting a potential interplay in maintaining cellular homeostasis under stress conditions.\n\n#### 4. **Disease Associations Shared Between Modules**\nBoth modules are implicated in diseases involving immune dysregulation and inflammation:\n- **Autoimmune diseases**: Genes like **TNFSF13B** (Module 1) and **NLRP3** (Module 2) are linked to autoimmune conditions such as lupus and rheumatoid arthritis.\n- **Cancer**: **MICA** (Module 1) and **ADAM8** (Module 2) are associated with tumor immune evasion and metastasis, respectively.\n- **Neurodegenerative diseases**: **PARK7** (Module 2) is linked to Parkinson’s disease, while **PARP3** (Module 1) may play a role in DNA damage-related neurodegeneration.\n\n#### 5. **Tissue/Cell Type Specificity Differences**\n- **Module 1**:\n  - **Immune cells**: **MICA**, **CD226**, and **LILRB1** are highly expressed in NK cells, T cells, and myeloid cells.\n  - **Proliferating tissues**: **PARP3** and **MAD2L2** are active in tissues with high cell turnover, such as the bone marrow and gut.\n\n- **Module 2**:\n  - **Neuronal tissues**: **PARK7** and **CNR1** are predominantly expressed in the brain and nervous system.\n  - **Immune cells**: **NLRP3** and **CCR7** are active in macrophages, dendritic cells, and lymphocytes.\n  - **Epithelial tissues**: **ADAM8** is often expressed in epithelial cells and is involved in tissue remodeling.\n\n### Summary\nModule 1 is more focused on immune cell regulation, DNA repair, and cell cycle control, with strong associations to immune-related diseases and proliferating tissues. 
Module 2, on the other hand, emphasizes neuroinflammation, microRNA regulation, and chemokine signaling, with links to neurodegenerative diseases and neuronal tissues. Despite their differences, both modules converge on immune and inflammatory pathways, suggesting potential crosstalk in diseases like autoimmunity and cancer.'

Analysis of Gene Modules

1. Common Biological Pathways Between Modules

Both Module 1 and Module 2 contain genes involved in immune regulation and inflammation. For example:

  • Module 1: Genes like MICA, CD226, and TNFSF13B are associated with immune cell activation, particularly in natural killer (NK) cells and T cells. LILRB1 is involved in immune checkpoint regulation.

  • Module 2: Genes like NLRP3 and CCR7 are key players in inflammasome activation and immune cell migration, respectively. ADAM8 is also implicated in immune cell adhesion and signaling.

The overlapping pathways include:

  • Immune response regulation: Both modules contribute to the modulation of immune cell activity, though through different mechanisms.

  • Inflammatory signaling: NLRP3 (Module 2) and TNFSF13B (Module 1) are linked to inflammatory pathways, suggesting a shared role in inflammation-related processes.

2. Unique Pathways in Each Module

  • Module 1:

    • DNA repair and cell cycle regulation: PARP3 and MAD2L2 are involved in DNA repair and mitotic regulation, respectively.

    • Transcription regulation: BATF is a transcription factor that regulates immune cell differentiation.

    • NK cell and T cell activation: MICA and CD226 are specifically involved in NK and T cell signaling.

  • Module 2:

    • Neuroinflammation and neurodegeneration: PARK7 and CNR1 are associated with neuroprotection and cannabinoid signaling, respectively.

    • MicroRNA regulation: MIR302E is involved in post-transcriptional gene regulation, potentially influencing cell differentiation and development.

    • Chemokine signaling: CCR7 is critical for lymphocyte trafficking and immune cell migration.

3. Potential Functional Relationships Between Modules

The two modules may interact in the context of immune regulation and inflammation. For instance:

  • Module 1 genes like TNFSF13B and LILRB1 could modulate immune responses that are influenced by Module 2 genes like NLRP3 and CCR7, which drive inflammatory signaling and immune cell migration.

  • PARP3 (Module 1) and PARK7 (Module 2) both have roles in cellular stress responses, suggesting a potential interplay in maintaining cellular homeostasis under stress conditions.

4. Disease Associations Shared Between Modules

Both modules are implicated in diseases involving immune dysregulation and inflammation:

  • Autoimmune diseases: Genes like TNFSF13B (Module 1) and NLRP3 (Module 2) are linked to autoimmune conditions such as lupus and rheumatoid arthritis.

  • Cancer: MICA (Module 1) and ADAM8 (Module 2) are associated with tumor immune evasion and metastasis, respectively.

  • Neurodegenerative diseases: PARK7 (Module 2) is linked to Parkinson’s disease, while PARP3 (Module 1) may play a role in DNA damage-related neurodegeneration.

5. Tissue/Cell Type Specificity Differences

  • Module 1:

    • Immune cells: MICA, CD226, and LILRB1 are highly expressed in NK cells, T cells, and myeloid cells.

    • Proliferating tissues: PARP3 and MAD2L2 are active in tissues with high cell turnover, such as the bone marrow and gut.

  • Module 2:

    • Neuronal tissues: PARK7 and CNR1 are predominantly expressed in the brain and nervous system.

    • Immune cells: NLRP3 and CCR7 are active in macrophages, dendritic cells, and lymphocytes.

    • Epithelial tissues: ADAM8 is often expressed in epithelial cells and is involved in tissue remodeling.

Summary

Module 1 is more focused on immune cell regulation, DNA repair, and cell cycle control, with strong associations to immune-related diseases and proliferating tissues. Module 2, on the other hand, emphasizes neuroinflammation, microRNA regulation, and chemokine signaling, with links to neurodegenerative diseases and neuronal tissues. Despite their differences, both modules converge on immune and inflammatory pathways, suggesting potential crosstalk in diseases like autoimmunity and cancer.
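The report is returned as a single Markdown-formatted string, with headings marked by `#` characters. If only one part is needed, such as the summary, the string can be split on its headings. The sketch below is a minimal standard-library illustration; `split_sections` is a hypothetical helper, and `report` is a shortened excerpt of the output above rather than the full text.

```python
import re

# Shortened excerpt of the Markdown report shown above.
report = (
    "### Analysis of Gene Modules\n\n"
    "#### 1. **Common Biological Pathways Between Modules**\n"
    "Both Module 1 and Module 2 contain genes involved in immune "
    "regulation and inflammation.\n\n"
    "#### 2. **Unique Pathways in Each Module**\n"
    "- **Module 1**:\n"
    "  - **Transcription regulation**: **BATF** is a transcription factor "
    "that regulates immune cell differentiation.\n\n"
    "### Summary\n"
    "Module 1 is more focused on immune cell regulation, DNA repair, "
    "and cell cycle control.\n"
)


def split_sections(markdown_text):
    """Map each Markdown heading (without '#' and '*') to its body text."""
    sections = {}
    title = None
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            # Start a new section; strip bold markers from the heading.
            title = match.group(1).replace("*", "").strip()
            sections[title] = []
        elif title is not None:
            sections[title].append(line)
    return {t: "\n".join(body).strip() for t, body in sections.items()}


sections = split_sections(report)
print(list(sections))
print(sections["Summary"])
```

With the real output, `split_sections(res["comparison_analysis"])` would be the corresponding call, assuming the model keeps the same heading layout, which an LLM does not guarantee from run to run.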

When you only want to assess a single gene module, use analyse_one_gene_module:

[ ]:
res = scc.gene_module.analyse_one_gene_module(
    module_genes = ["ADA","C17orf99","PARP3","MICA","MAD2L2","BATF","CD226","TNFSF13B","LILRB1"],
    api_key = "*********" # api key for DeepSeek LLM
)
[3]:
res
[3]:
'The gene module consisting of **ADA, C17orf99, PARP3, MICA, MAD2L2, BATF, CD226, TNFSF13B, and LILRB1** represents a functionally diverse yet interconnected group of genes involved in immune regulation, DNA repair, and cellular signaling. Below is a detailed functional interpretation of this gene module:\n\n---\n\n### 1. **Common Biological Pathways**\nThe genes in this module are primarily associated with **immune response pathways** and **DNA damage repair mechanisms**. \n- **ADA (Adenosine Deaminase)** is critical for purine metabolism and immune function, as it prevents the accumulation of toxic deoxyadenosine, which can impair lymphocyte development and function.\n- **PARP3 (Poly(ADP-Ribose) Polymerase 3)** and **MAD2L2 (Mitotic Arrest Deficient 2 Like 2)** are involved in DNA repair and genomic stability. PARP3 participates in the repair of DNA double-strand breaks, while MAD2L2 is associated with error-prone DNA repair processes.\n- **MICA (MHC Class I Polypeptide-Related Sequence A)** and **LILRB1 (Leukocyte Immunoglobulin-Like Receptor B1)** are key players in immune regulation. MICA is a stress-induced ligand recognized by natural killer (NK) cells, while LILRB1 is an inhibitory receptor that modulates immune cell activity.\n- **BATF (Basic Leucine Zipper ATF-Like Transcription Factor)** and **TNFSF13B (Tumor Necrosis Factor Superfamily Member 13B)** are involved in cytokine signaling and immune cell differentiation. BATF regulates T-cell differentiation, while TNFSF13B (also known as BAFF) is critical for B-cell survival and maturation.\n- **CD226 (DNAX Accessory Molecule-1)** is a co-stimulatory molecule involved in NK and T-cell activation.\n- **C17orf99** is a less characterized gene but has been implicated in immune modulation and inflammation.\n\n---\n\n### 2. 
**Cellular Processes Involved**\nThe genes in this module are involved in several key cellular processes:\n- **Immune Regulation**: ADA, MICA, LILRB1, CD226, BATF, and TNFSF13B are central to immune cell activation, differentiation, and tolerance. These genes collectively regulate the balance between immune activation and suppression.\n- **DNA Repair and Genomic Stability**: PARP3 and MAD2L2 are critical for maintaining genomic integrity by repairing DNA damage and ensuring proper cell cycle progression.\n- **Cell Signaling and Communication**: MICA, LILRB1, and CD226 mediate cell-cell interactions, particularly in the context of immune surveillance and tumor recognition.\n- **Transcription Regulation**: BATF acts as a transcription factor that modulates gene expression in immune cells, influencing their functional states.\n\n---\n\n### 3. **Potential Tissue/Cell Type Specificity**\nThis gene module is likely to be highly expressed in **immune-related tissues and cell types**, including:\n- **Lymphoid tissues** (e.g., lymph nodes, spleen) and **bone marrow**, where immune cell development and maturation occur.\n- **Peripheral blood mononuclear cells (PBMCs)**, including T cells, B cells, and NK cells, which express genes like MICA, LILRB1, CD226, and BATF.\n- **Tumor microenvironments**, where MICA and LILRB1 play roles in immune evasion and tumor surveillance.\n- **Epithelial tissues**, where PARP3 and MAD2L2 may contribute to DNA repair in response to environmental stressors.\n\n---\n\n### 4. 
**Disease Associations**\nThe genes in this module are associated with a variety of diseases, particularly those involving immune dysregulation and cancer:\n- **Autoimmune Diseases**: ADA deficiency causes severe combined immunodeficiency (SCID), while TNFSF13B is linked to systemic lupus erythematosus (SLE) and rheumatoid arthritis.\n- **Cancer**: MICA and LILRB1 are implicated in tumor immune evasion, while PARP3 and MAD2L2 are associated with cancer progression due to their roles in DNA repair and genomic instability.\n- **Infectious Diseases**: CD226 and MICA are involved in viral recognition and clearance, making them relevant to infectious disease outcomes.\n- **Inflammatory Disorders**: BATF and TNFSF13B are associated with chronic inflammation and inflammatory bowel disease (IBD).\n\n---\n\n### 5. **Functional Relationships Between Genes**\nThe genes in this module exhibit strong functional relationships, particularly in the context of immune regulation and DNA repair:\n- **Immune Activation vs. Suppression**: MICA and CD226 promote immune activation by engaging NK and T cells, while LILRB1 acts as an inhibitory receptor to dampen immune responses. This balance is critical for maintaining immune homeostasis.\n- **DNA Repair and Immune Crosstalk**: PARP3 and MAD2L2, while primarily involved in DNA repair, may also influence immune responses by modulating genomic stability and cell survival in immune cells.\n- **Cytokine Signaling and Transcription**: BATF and TNFSF13B work in concert to regulate cytokine production and immune cell differentiation, linking transcriptional regulation to immune function.\n- **Metabolic Regulation**: ADA connects immune function to cellular metabolism by regulating purine levels, which are critical for lymphocyte proliferation and function.\n\n---\n\n### Summary\nThis gene module represents a functionally cohesive network of genes involved in immune regulation, DNA repair, and cellular signaling. 
The interplay between these genes suggests a critical role in maintaining immune homeostasis, responding to DNA damage, and modulating immune responses in diseases such as cancer, autoimmune disorders, and infections. The tissue-specific expression of these genes in immune-related tissues further underscores their importance in immune surveillance and disease pathogenesis.'

The gene module consisting of ADA, C17orf99, PARP3, MICA, MAD2L2, BATF, CD226, TNFSF13B, and LILRB1 represents a functionally diverse yet interconnected group of genes involved in immune regulation, DNA repair, and cellular signaling. Below is a detailed functional interpretation of this gene module:


1. Common Biological Pathways

The genes in this module are primarily associated with immune response pathways and DNA damage repair mechanisms.

  • ADA (Adenosine Deaminase) is critical for purine metabolism and immune function, as it prevents the accumulation of toxic deoxyadenosine, which can impair lymphocyte development and function.

  • PARP3 (Poly(ADP-Ribose) Polymerase 3) and MAD2L2 (Mitotic Arrest Deficient 2 Like 2) are involved in DNA repair and genomic stability. PARP3 participates in the repair of DNA double-strand breaks, while MAD2L2 is associated with error-prone DNA repair processes.

  • MICA (MHC Class I Polypeptide-Related Sequence A) and LILRB1 (Leukocyte Immunoglobulin-Like Receptor B1) are key players in immune regulation. MICA is a stress-induced ligand recognized by natural killer (NK) cells, while LILRB1 is an inhibitory receptor that modulates immune cell activity.

  • BATF (Basic Leucine Zipper ATF-Like Transcription Factor) and TNFSF13B (Tumor Necrosis Factor Superfamily Member 13B) are involved in cytokine signaling and immune cell differentiation. BATF regulates T-cell differentiation, while TNFSF13B (also known as BAFF) is critical for B-cell survival and maturation.

  • CD226 (DNAX Accessory Molecule-1) is a co-stimulatory molecule involved in NK and T-cell activation.

  • C17orf99 is a less characterized gene but has been implicated in immune modulation and inflammation.


2. Cellular Processes Involved

The genes in this module are involved in several key cellular processes:

  • Immune Regulation: ADA, MICA, LILRB1, CD226, BATF, and TNFSF13B are central to immune cell activation, differentiation, and tolerance. These genes collectively regulate the balance between immune activation and suppression.

  • DNA Repair and Genomic Stability: PARP3 and MAD2L2 are critical for maintaining genomic integrity by repairing DNA damage and ensuring proper cell cycle progression.

  • Cell Signaling and Communication: MICA, LILRB1, and CD226 mediate cell-cell interactions, particularly in the context of immune surveillance and tumor recognition.

  • Transcription Regulation: BATF acts as a transcription factor that modulates gene expression in immune cells, influencing their functional states.


3. Potential Tissue/Cell Type Specificity

This gene module is likely to be highly expressed in immune-related tissues and cell types, including:

  • Lymphoid tissues (e.g., lymph nodes, spleen) and bone marrow, where immune cell development and maturation occur.

  • Peripheral blood mononuclear cells (PBMCs), including T cells, B cells, and NK cells, which express genes like MICA, LILRB1, CD226, and BATF.

  • Tumor microenvironments, where MICA and LILRB1 play roles in immune evasion and tumor surveillance.

  • Epithelial tissues, where PARP3 and MAD2L2 may contribute to DNA repair in response to environmental stressors.


4. Disease Associations

The genes in this module are associated with a variety of diseases, particularly those involving immune dysregulation and cancer:

  • Autoimmune Diseases: ADA deficiency causes severe combined immunodeficiency (SCID), while TNFSF13B is linked to systemic lupus erythematosus (SLE) and rheumatoid arthritis.

  • Cancer: MICA and LILRB1 are implicated in tumor immune evasion, while PARP3 and MAD2L2 are associated with cancer progression due to their roles in DNA repair and genomic instability.

  • Infectious Diseases: CD226 and MICA are involved in viral recognition and clearance, making them relevant to infectious disease outcomes.

  • Inflammatory Disorders: BATF and TNFSF13B are associated with chronic inflammation and inflammatory bowel disease (IBD).


5. Functional Relationships Between Genes

The genes in this module exhibit strong functional relationships, particularly in the context of immune regulation and DNA repair:

  • Immune Activation vs. Suppression: MICA and CD226 promote immune activation by engaging NK and T cells, while LILRB1 acts as an inhibitory receptor to dampen immune responses. This balance is critical for maintaining immune homeostasis.

  • DNA Repair and Immune Crosstalk: PARP3 and MAD2L2, while primarily involved in DNA repair, may also influence immune responses by modulating genomic stability and cell survival in immune cells.

  • Cytokine Signaling and Transcription: BATF and TNFSF13B work in concert to regulate cytokine production and immune cell differentiation, linking transcriptional regulation to immune function.

  • Metabolic Regulation: ADA connects immune function to cellular metabolism by regulating purine levels, which are critical for lymphocyte proliferation and function.


Summary

This gene module represents a functionally cohesive network of genes involved in immune regulation, DNA repair, and cellular signaling. The interplay between these genes suggests a critical role in maintaining immune homeostasis, responding to DNA damage, and modulating immune responses in diseases such as cancer, autoimmune disorders, and infections. The tissue-specific expression of these genes in immune-related tissues further underscores their importance in immune surveillance and disease pathogenesis.