Gene module detection and evaluation
Here, we show how SuperSCC’s hierarchical markers at different levels across studies can yield conserved gene modules that offer biological insight.
[1]:
import SuperSCC as scc
import pandas as pd
[2]:
# read m or f level gene sets in
m_gene_sets = pd.read_csv("m_ensemble_id_files.csv", index_col = 0)
f_gene_sets = pd.read_csv("f_ensemble_id_files.csv", index_col = 0)
m_gene_sets.head(5)
[2]:
| | Travaglini_2020_cluster5 | No_public_cluster5 | Ma_2019_cluster4 | He_2022_cluster0 | Suo_2022_Bone_marrow_cluster5 | Galen_2019_cluster5 | Morse_2019_cluster7 | Xing_2021_cluster6 | He_2022_cluster2 | Qian_2020_lung_cluster1 | ... | Bharat_2020_cluster1 | Steen_2021_cluster1 | Yoshida_2022_cluster0 | Alonso_2022_cluster6 | Madissoon_2019_cluster1 | Krishna_2021_cluster4 | Chen_2022_cluster11 | Neftel_2019_cluster0 | Kumar_2022_cluster4 | Deprez_2020_cluster2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ENSG00000121966/1738.2351897769943 | ENSG00000271503/1734.6190121609518 | ENSG00000106538/1385.5679976354565 | ENSG00000164692/2468.1337875547238 | ENSG00000250722/1667.2164093886356 | ENSG00000019582/1414.0449372661146 | ENSG00000117632/1399.6887628457075 | ENSG00000011600/1013.176199610937 | ENSG00000159713/2473.7380315811993 | ENSG00000171345/1320.574483940035 | ... | ENSG00000011465/1865.0514703170616 | ENSG00000271503/1315.6645825474366 | ENSG00000227507/1442.2838063594952 | ENSG00000019582/2287.361485126351 | ENSG00000008517/1553.726620595502 | ENSG00000142089/1051.6387484268378 | ENSG00000167483/1997.0511977031802 | ENSG00000131981/1259.2501484315092 | ENSG00000142089/1180.2160799334624 | ENSG00000090382/1583.774763951293 |
| 2 | ENSG00000023445/1712.7058973832948 | ENSG00000163736/1698.7479738188138 | ENSG00000118785/1382.7571181252606 | ENSG00000168542/2467.7523831213666 | ENSG00000110077/1663.85722621107 | ENSG00000204287/1413.137006847736 | ENSG00000123975/1376.739482084917 | ENSG00000101439/1011.6857952721534 | ENSG00000115461/2473.209902285476 | ENSG00000124107/1319.5039148666324 | ... | ENSG00000152583/1864.1649480656831 | ENSG00000113088/1315.5788184204819 | ENSG00000198851/1441.4623821948296 | ENSG00000142669/2256.9022557941184 | ENSG00000277734/1545.845008371674 | ENSG00000163453/1049.452735078961 | ENSG00000170476/1993.7598078727938 | ENSG00000172020/1257.4388151263192 | ENSG00000163453/1180.1276273681042 | ENSG00000196154/1565.871026444962 |
| 3 | ENSG00000172183/1711.3438097256449 | ENSG00000064601/1666.4037913705797 | ENSG00000124107/1380.3373387623137 | ENSG00000100097/2467.4978313223114 | ENSG00000168209/1659.6000979136218 | ENSG00000100097/1410.1419352433252 | ENSG00000163221/1272.5457579541662 | ENSG00000158869/1009.609298896034 | ENSG00000170421/2452.2836977046118 | ENSG00000111057/1319.170776591765 | ... | ENSG00000197766/1863.0891180071826 | ENSG00000105374/1315.4072957510782 | ENSG00000111716/1439.4524714844001 | ENSG00000234745/2244.281396655737 | ENSG00000167286/1541.3411033675386 | ENSG00000113140/1041.8106206913862 | ENSG00000129824/1992.3170782932825 | ENSG00000182718/1256.4367007245362 | ENSG00000113140/1177.8488889484258 | ENSG00000130208/1564.9462437446584 |
| 4 | ENSG00000211592/1672.2984801738332 | ENSG00000169756/1643.9154834566546 | ENSG00000291137/1378.549435772063 | ENSG00000166482/2466.732322365024 | ENSG00000172889/1653.3447399171896 | ENSG00000101347/1399.5280975265284 | ENSG00000173207/1268.4867016395046 | ENSG00000173372/1009.1489588275941 | ENSG00000187244/2442.43740436204 | ENSG00000008394/1318.4073921533343 | ... | ENSG00000111341/1862.8357915864788 | ENSG00000077984/1315.021160612585 | ENSG00000131507/1417.5592763650513 | ENSG00000204525/2238.732647136744 | ENSG00000211772/1540.903491946536 | ENSG00000129538/1040.046048479781 | ENSG00000198624/1981.0806438418783 | ENSG00000032219/1244.3645352057724 | ENSG00000102265/1177.5932079608735 | ENSG00000121552/1564.2291269628986 |
| 5 | ENSG00000211895/1653.701290107666 | ENSG00000120885/1633.0141097582223 | ENSG00000106565/1378.2663427426346 | ENSG00000152583/2466.5423493463045 | ENSG00000177606/1652.1638095466308 | ENSG00000196126/1396.8798181308327 | ENSG00000164611/1185.2089098205347 | ENSG00000119655/1001.5303466732067 | ENSG00000119888/2440.769555734281 | ENSG00000167642/1315.2900562381037 | ... | ENSG00000167779/1860.0335261766566 | ENSG00000145649/1314.7204884182925 | ENSG00000104660/1416.0830400138318 | ENSG00000164733/2225.215424973359 | ENSG00000227507/1520.1419539111037 | ENSG00000142192/1024.934266829235 | ENSG00000184226/1974.2149793436588 | ENSG00000147588/1241.0681456651373 | ENSG00000165949/1175.1865024595227 | ENSG00000109861/1560.692029208657 |
5 rows × 527 columns
Each cell contains a gene ID (or symbol) and its feature importance as detected by SuperSCC, separated by a slash.
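If the ID and the score are needed separately, the cells can be split on the last slash. The following is a minimal sketch on a toy frame whose values are copied from the table above; the names `toy` and `long` are illustrative and not part of SuperSCC.

```python
import pandas as pd

# Toy stand-in for two columns of m_gene_sets (values copied from the table above).
toy = pd.DataFrame(
    {
        "Travaglini_2020_cluster5": [
            "ENSG00000121966/1738.2351897769943",
            "ENSG00000023445/1712.7058973832948",
        ],
        "Deprez_2020_cluster2": [
            "ENSG00000090382/1583.774763951293",
            "ENSG00000196154/1565.871026444962",
        ],
    },
    index=[1, 2],
)

# Reshape to long format: one row per (rank, cluster) pair.
long = toy.melt(ignore_index=False, var_name="cluster", value_name="cell")
long = long.rename_axis("rank").reset_index()

# Split each "ID/score" string on its last slash into two columns.
parts = long["cell"].str.rsplit("/", n=1, expand=True)
long["gene_id"] = parts[0]
long["importance"] = parts[1].astype(float)

print(long[["cluster", "rank", "gene_id", "importance"]])
```

Splitting from the right keeps the sketch safe should an identifier itself ever contain a slash.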
[ ]:
# find gene modules across m-cluster level gene sets
m_gms = scc.gene_module.get_gene_module(
data = m_gene_sets,
parallel_num = 32
)
The output includes gene_module, module_members and remained_gene_sets.
gene_module contains the final conserved gene sets (50 genes) with updated SuperSCC importance scores across studies.
module_members contains the contributors for each module.
remained_gene_sets contains the remaining gene sets that are not included in the gene modules in each round.
[5]:
m_gms.keys()
[5]:
dict_keys(['gene_module', 'module_members', 'remained_gene_sets'])
[12]:
m_gms["gene_module"][0][0:5]
[12]:
array(['ENSG00000163513/997.624181884792',
'ENSG00000175899/992.2225469499747',
'ENSG00000003436/990.4354006923714',
'ENSG00000184113/989.2463610909931',
'ENSG00000116016/988.3663976243083'], dtype='<U33')
[11]:
m_gms["module_members"][0][0:5]
[11]:
array(['Pal_2021_3_cluster2', 'Glasner_2023_cluster3',
'Pan_2023_cluster3', 'Habermann_2020_cluster2',
'Deprez_2020_cluster4'], dtype='<U30')
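To get a quick overview of which studies feed each module, the two entries can be walked in parallel. This is a sketch on a toy dictionary shaped like the output above; it assumes that gene_module and module_members are index-aligned sequences and that member names follow the "<study>_cluster<k>" pattern seen in the printed arrays. The helper `summarize_modules` is hypothetical.

```python
# Toy stand-in shaped like the get_gene_module output shown above.
toy_gms = {
    "gene_module": [
        ["ENSG00000163513/997.624181884792", "ENSG00000175899/992.2225469499747"],
    ],
    "module_members": [
        ["Pal_2021_3_cluster2", "Glasner_2023_cluster3", "Pan_2023_cluster3"],
    ],
}


def summarize_modules(gms):
    """Return one summary dict per module (assumes index-aligned entries)."""
    rows = []
    for i, (genes, members) in enumerate(
        zip(gms["gene_module"], gms["module_members"])
    ):
        # Assumption: the study name is everything before the final "_cluster".
        studies = sorted({m.rsplit("_cluster", 1)[0] for m in members})
        rows.append(
            {
                "module": i,
                "n_genes": len(genes),
                "n_members": len(members),
                "studies": studies,
            }
        )
    return rows


summary = summarize_modules(toy_gms)
print(summary)
```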
[ ]:
# find gene modules across f-cluster level gene sets
f_gms = scc.gene_module.get_gene_module(
data = f_gene_sets,
parallel_num = 32,
lib_loc = "/home/fengtang/R/x86_64-pc-linux-gnu-library/4.3/"
)
After obtaining the modules at different levels, we can connect them by tracking their contributors. For instance, within one dataset, M cluster A contributes to M gene module 1 while F cluster B contributes to F gene module 2. Since M cluster A is the parent cluster of F cluster B, a bridge can be constructed between M gene module 1 and F gene module 2. We can count such events for each module pair and also explore gene module function via enrichment analysis. Finally, we obtain a data frame like the one shown below.
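The bridge counting described above can be sketched as follows. All three input mappings are hypothetical toy data: how cluster-to-module membership and parent-child relations are stored in practice is an assumption here, not something SuperSCC is known to expose in this form.

```python
from collections import Counter

# Hypothetical inputs: which module each cluster contributes to,
# and which M cluster is the parent of each F cluster.
m_membership = {"studyA_M0": "M_gene_module_1", "studyB_M0": "M_gene_module_1"}
f_membership = {
    "studyA_F0": "F_gene_module_2",
    "studyB_F0": "F_gene_module_2",
    "studyB_F1": "F_gene_module_3",
    "studyC_F0": "F_gene_module_2",
}
parent_of = {
    "studyA_F0": "studyA_M0",
    "studyB_F0": "studyB_M0",
    "studyB_F1": "studyB_M0",
    "studyC_F0": "studyC_M0",  # parent contributes to no M module
}


def count_bridges(m_membership, f_membership, parent_of):
    """Count (M module, F module) pairs linked through a parent-child cluster."""
    bridges = Counter()
    for f_cluster, f_module in f_membership.items():
        parent = parent_of.get(f_cluster)
        m_module = m_membership.get(parent)
        if m_module is not None:  # skip F clusters without a contributing parent
            bridges[(m_module, f_module)] += 1
    return bridges


bridges = count_bridges(m_membership, f_membership, parent_of)
print(bridges)
```

F clusters whose parent contributes to no M module are skipped in this sketch; they could instead be counted under a placeholder label if they should appear in the final table.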
The flow between gene modules at different levels can be visualized as a Sankey plot. You can also use SuperSCC for this:
[2]:
df = pd.read_csv("/home/fengtang/jupyter_notebooks/working_script/gene_module/F_cluster/sankey_df.csv", index_col=0)
df = df.iloc[:, [6,0,5,1]]
df.head(5)
[2]:
| | M_func | M_gene_module | F_func | F_gene_module |
|---|---|---|---|---|
| 1 | M_angiogenesis | M_gene_module_1 | F_angiogenesis | F_gene_module_1 |
| 2 | M_defense_response | M_gene_module_2 | F_angiogenesis | F_gene_module_1 |
| 3 | M_angiogenesis | M_gene_module_9 | F_angiogenesis | F_gene_module_1 |
| 4 | Unknown source | No | F_angiogenesis | F_gene_module_1 |
| 5 | M_immune_response | M_gene_module_11 | F_immune_response | F_gene_module_2 |
[ ]:
sankey = scc.clustering.get_sankey_dataframe(df, plot = True)
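For readers who want to build the plot with another tool, a Sankey diagram generally needs a list of source, target and value triples. The sketch below derives such links from consecutive column pairs of a toy frame copied from the table above. It is one common way to prepare Sankey input and is not necessarily what get_sankey_dataframe does internally; `sankey_links` is a hypothetical helper.

```python
import pandas as pd

# Toy frame with the five rows shown in the table above.
toy_df = pd.DataFrame(
    {
        "M_func": ["M_angiogenesis", "M_defense_response", "M_angiogenesis",
                   "Unknown source", "M_immune_response"],
        "M_gene_module": ["M_gene_module_1", "M_gene_module_2", "M_gene_module_9",
                          "No", "M_gene_module_11"],
        "F_func": ["F_angiogenesis"] * 4 + ["F_immune_response"],
        "F_gene_module": ["F_gene_module_1"] * 4 + ["F_gene_module_2"],
    }
)


def sankey_links(df):
    """Count flows between each pair of adjacent columns."""
    frames = []
    cols = list(df.columns)
    for src, dst in zip(cols[:-1], cols[1:]):
        counts = df.groupby([src, dst]).size().reset_index(name="value")
        counts.columns = ["source", "target", "value"]
        frames.append(counts)
    return pd.concat(frames, ignore_index=True)


links = sankey_links(toy_df)
print(links)
```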
SuperSCC primarily relies on LLMs to evaluate the function of gene modules. To compare two gene modules:
[ ]:
res = scc.gene_module.compare_gene_modules(
module1 = ["ADA","C17orf99","PARP3","MICA","MAD2L2","BATF","CD226","TNFSF13B","LILRB1"],
module2 = ["MIR302E", "ADAM8", "PARK7", "NLRP3", "CCR7", "CNR1", "ADORA1", "C2CD4A", "DNASE1"],
api_key = "*********" # api key for DeepSeek LLM
)
/home/fengtang/anaconda3/envs/SuperSCC/lib/python3.11/site-packages/SuperSCC/SuperSCC.py:2792: LangChainDeprecationWarning: The class `ChatOpenAI` was deprecated in LangChain 0.0.10 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-openai package and should be used instead. To use it run `pip install -U :class:`~langchain-openai` and import as `from :class:`~langchain_openai import ChatOpenAI``.
model = ChatOpenAI(
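Rather than pasting the key into the notebook, it can be read from the environment. This is a small sketch; the variable name DEEPSEEK_API_KEY and the helper `load_api_key` are assumptions made for illustration, not something SuperSCC defines.

```python
import os


def load_api_key(var_name="DEEPSEEK_API_KEY"):
    """Read the LLM API key from an environment variable (name is an assumption)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before calling the LLM helpers."
        )
    return key


# Usage (not executed here): api_key = load_api_key()
```

The returned value can then be passed as the api_key argument, which keeps the secret out of saved notebook cells.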
[4]:
res.keys()
[4]:
dict_keys(['common_genes', 'unique_to_module1', 'unique_to_module2', 'comparison_analysis'])
The output contains four keys:
common_genes: shared genes between gene modules.
unique_to_module1: unique genes for module 1.
unique_to_module2: unique genes for module 2.
comparison_analysis: the functional annotation covering common biological pathways between modules, unique pathways in each module, potential functional relationships between modules, disease associations shared between modules and tissue/cell type specificity differences.
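The first three keys correspond to plain set operations, which can be reproduced locally without an LLM call as a sanity check. The sketch below mirrors the key names of the output; whether SuperSCC computes them exactly this way is an assumption, and `split_modules` is a hypothetical helper.

```python
def split_modules(module1, module2):
    """Return shared and module-specific genes using set operations."""
    s1, s2 = set(module1), set(module2)
    return {
        "common_genes": sorted(s1 & s2),
        "unique_to_module1": sorted(s1 - s2),
        "unique_to_module2": sorted(s2 - s1),
    }


# The two modules used in the call above.
module1 = ["ADA", "C17orf99", "PARP3", "MICA", "MAD2L2",
           "BATF", "CD226", "TNFSF13B", "LILRB1"]
module2 = ["MIR302E", "ADAM8", "PARK7", "NLRP3", "CCR7",
           "CNR1", "ADORA1", "C2CD4A", "DNASE1"]

overlap = split_modules(module1, module2)
print(overlap["common_genes"])
```

For these two modules the lists share no gene, so the functional comparison rests entirely on the pathway-level analysis.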
[6]:
res["comparison_analysis"]
[6]:
'### Analysis of Gene Modules\n\n#### 1. **Common Biological Pathways Between Modules**\nBoth Module 1 and Module 2 contain genes involved in immune regulation and inflammation. For example:\n- **Module 1**: Genes like **MICA**, **CD226**, and **TNFSF13B** are associated with immune cell activation, particularly in natural killer (NK) cells and T cells. **LILRB1** is involved in immune checkpoint regulation.\n- **Module 2**: Genes like **NLRP3** and **CCR7** are key players in inflammasome activation and immune cell migration, respectively. **ADAM8** is also implicated in immune cell adhesion and signaling.\n\nThe overlapping pathways include:\n- **Immune response regulation**: Both modules contribute to the modulation of immune cell activity, though through different mechanisms.\n- **Inflammatory signaling**: **NLRP3** (Module 2) and **TNFSF13B** (Module 1) are linked to inflammatory pathways, suggesting a shared role in inflammation-related processes.\n\n#### 2. **Unique Pathways in Each Module**\n- **Module 1**:\n - **DNA repair and cell cycle regulation**: **PARP3** and **MAD2L2** are involved in DNA repair and mitotic regulation, respectively.\n - **Transcription regulation**: **BATF** is a transcription factor that regulates immune cell differentiation.\n - **NK cell and T cell activation**: **MICA** and **CD226** are specifically involved in NK and T cell signaling.\n\n- **Module 2**:\n - **Neuroinflammation and neurodegeneration**: **PARK7** and **CNR1** are associated with neuroprotection and cannabinoid signaling, respectively.\n - **MicroRNA regulation**: **MIR302E** is involved in post-transcriptional gene regulation, potentially influencing cell differentiation and development.\n - **Chemokine signaling**: **CCR7** is critical for lymphocyte trafficking and immune cell migration.\n\n#### 3. **Potential Functional Relationships Between Modules**\nThe two modules may interact in the context of immune regulation and inflammation. 
For instance:\n- **Module 1** genes like **TNFSF13B** and **LILRB1** could modulate immune responses that are influenced by **Module 2** genes like **NLRP3** and **CCR7**, which drive inflammatory signaling and immune cell migration.\n- **PARP3** (Module 1) and **PARK7** (Module 2) both have roles in cellular stress responses, suggesting a potential interplay in maintaining cellular homeostasis under stress conditions.\n\n#### 4. **Disease Associations Shared Between Modules**\nBoth modules are implicated in diseases involving immune dysregulation and inflammation:\n- **Autoimmune diseases**: Genes like **TNFSF13B** (Module 1) and **NLRP3** (Module 2) are linked to autoimmune conditions such as lupus and rheumatoid arthritis.\n- **Cancer**: **MICA** (Module 1) and **ADAM8** (Module 2) are associated with tumor immune evasion and metastasis, respectively.\n- **Neurodegenerative diseases**: **PARK7** (Module 2) is linked to Parkinson’s disease, while **PARP3** (Module 1) may play a role in DNA damage-related neurodegeneration.\n\n#### 5. **Tissue/Cell Type Specificity Differences**\n- **Module 1**:\n - **Immune cells**: **MICA**, **CD226**, and **LILRB1** are highly expressed in NK cells, T cells, and myeloid cells.\n - **Proliferating tissues**: **PARP3** and **MAD2L2** are active in tissues with high cell turnover, such as the bone marrow and gut.\n\n- **Module 2**:\n - **Neuronal tissues**: **PARK7** and **CNR1** are predominantly expressed in the brain and nervous system.\n - **Immune cells**: **NLRP3** and **CCR7** are active in macrophages, dendritic cells, and lymphocytes.\n - **Epithelial tissues**: **ADAM8** is often expressed in epithelial cells and is involved in tissue remodeling.\n\n### Summary\nModule 1 is more focused on immune cell regulation, DNA repair, and cell cycle control, with strong associations to immune-related diseases and proliferating tissues. 
Module 2, on the other hand, emphasizes neuroinflammation, microRNA regulation, and chemokine signaling, with links to neurodegenerative diseases and neuronal tissues. Despite their differences, both modules converge on immune and inflammatory pathways, suggesting potential crosstalk in diseases like autoimmunity and cancer.'
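Because the analysis is returned as a single markdown string, it can be handy to split it by heading before further processing. The following is a minimal standard-library sketch on a shortened toy string; `split_sections` is a hypothetical helper, not part of SuperSCC.

```python
import re


def split_sections(markdown_text):
    """Map each markdown heading to the text that follows it."""
    sections = {}
    title = "preamble"
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            # Strip bold markers so keys are plain titles.
            title = match.group(1).replace("*", "").strip()
            sections[title] = []
        else:
            sections.setdefault(title, []).append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


# Shortened toy string in the same layout as the output above.
toy_text = (
    "### Analysis of Gene Modules\n\n"
    "#### 1. **Common Biological Pathways Between Modules**\n"
    "Both Module 1 and Module 2 contain genes involved in immune regulation.\n\n"
    "### Summary\n"
    "Module 1 is more focused on immune cell regulation."
)

sections = split_sections(toy_text)
print(list(sections))
```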
Analysis of Gene Modules
1. Common Biological Pathways Between Modules
Both Module 1 and Module 2 contain genes involved in immune regulation and inflammation. For example:
Module 1: Genes like MICA, CD226, and TNFSF13B are associated with immune cell activation, particularly in natural killer (NK) cells and T cells. LILRB1 is involved in immune checkpoint regulation.
Module 2: Genes like NLRP3 and CCR7 are key players in inflammasome activation and immune cell migration, respectively. ADAM8 is also implicated in immune cell adhesion and signaling.
The overlapping pathways include:
Immune response regulation: Both modules contribute to the modulation of immune cell activity, though through different mechanisms.
Inflammatory signaling: NLRP3 (Module 2) and TNFSF13B (Module 1) are linked to inflammatory pathways, suggesting a shared role in inflammation-related processes.
2. Unique Pathways in Each Module
Module 1:
DNA repair and cell cycle regulation: PARP3 and MAD2L2 are involved in DNA repair and mitotic regulation, respectively.
Transcription regulation: BATF is a transcription factor that regulates immune cell differentiation.
NK cell and T cell activation: MICA and CD226 are specifically involved in NK and T cell signaling.
Module 2:
Neuroinflammation and neurodegeneration: PARK7 and CNR1 are associated with neuroprotection and cannabinoid signaling, respectively.
MicroRNA regulation: MIR302E is involved in post-transcriptional gene regulation, potentially influencing cell differentiation and development.
Chemokine signaling: CCR7 is critical for lymphocyte trafficking and immune cell migration.
3. Potential Functional Relationships Between Modules
The two modules may interact in the context of immune regulation and inflammation. For instance:
Module 1 genes like TNFSF13B and LILRB1 could modulate immune responses that are influenced by Module 2 genes like NLRP3 and CCR7, which drive inflammatory signaling and immune cell migration.
PARP3 (Module 1) and PARK7 (Module 2) both have roles in cellular stress responses, suggesting a potential interplay in maintaining cellular homeostasis under stress conditions.
4. Disease Associations Shared Between Modules
Both modules are implicated in diseases involving immune dysregulation and inflammation:
Autoimmune diseases: Genes like TNFSF13B (Module 1) and NLRP3 (Module 2) are linked to autoimmune conditions such as lupus and rheumatoid arthritis.
Cancer: MICA (Module 1) and ADAM8 (Module 2) are associated with tumor immune evasion and metastasis, respectively.
Neurodegenerative diseases: PARK7 (Module 2) is linked to Parkinson’s disease, while PARP3 (Module 1) may play a role in DNA damage-related neurodegeneration.
5. Tissue/Cell Type Specificity Differences
Module 1:
Immune cells: MICA, CD226, and LILRB1 are highly expressed in NK cells, T cells, and myeloid cells.
Proliferating tissues: PARP3 and MAD2L2 are active in tissues with high cell turnover, such as the bone marrow and gut.
Module 2:
Neuronal tissues: PARK7 and CNR1 are predominantly expressed in the brain and nervous system.
Immune cells: NLRP3 and CCR7 are active in macrophages, dendritic cells, and lymphocytes.
Epithelial tissues: ADAM8 is often expressed in epithelial cells and is involved in tissue remodeling.
Summary
Module 1 is more focused on immune cell regulation, DNA repair, and cell cycle control, with strong associations to immune-related diseases and proliferating tissues. Module 2, on the other hand, emphasizes neuroinflammation, microRNA regulation, and chemokine signaling, with links to neurodegenerative diseases and neuronal tissues. Despite their differences, both modules converge on immune and inflammatory pathways, suggesting potential crosstalk in diseases like autoimmunity and cancer.
When you only want to assess a single gene module:
[ ]:
res = scc.gene_module.analyse_one_gene_module(
module_genes = ["ADA","C17orf99","PARP3","MICA","MAD2L2","BATF","CD226","TNFSF13B","LILRB1"],
api_key = "*********" # api key for DeepSeek LLM
)
/home/fengtang/anaconda3/envs/SuperSCC/lib/python3.11/site-packages/SuperSCC/SuperSCC.py:2842: LangChainDeprecationWarning: The class `ChatOpenAI` was deprecated in LangChain 0.0.10 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-openai package and should be used instead. To use it run `pip install -U :class:`~langchain-openai` and import as `from :class:`~langchain_openai import ChatOpenAI``.
model = ChatOpenAI(
[3]:
res
[3]:
'The gene module consisting of **ADA, C17orf99, PARP3, MICA, MAD2L2, BATF, CD226, TNFSF13B, and LILRB1** represents a functionally diverse yet interconnected group of genes involved in immune regulation, DNA repair, and cellular signaling. Below is a detailed functional interpretation of this gene module:\n\n---\n\n### 1. **Common Biological Pathways**\nThe genes in this module are primarily associated with **immune response pathways** and **DNA damage repair mechanisms**. \n- **ADA (Adenosine Deaminase)** is critical for purine metabolism and immune function, as it prevents the accumulation of toxic deoxyadenosine, which can impair lymphocyte development and function.\n- **PARP3 (Poly(ADP-Ribose) Polymerase 3)** and **MAD2L2 (Mitotic Arrest Deficient 2 Like 2)** are involved in DNA repair and genomic stability. PARP3 participates in the repair of DNA double-strand breaks, while MAD2L2 is associated with error-prone DNA repair processes.\n- **MICA (MHC Class I Polypeptide-Related Sequence A)** and **LILRB1 (Leukocyte Immunoglobulin-Like Receptor B1)** are key players in immune regulation. MICA is a stress-induced ligand recognized by natural killer (NK) cells, while LILRB1 is an inhibitory receptor that modulates immune cell activity.\n- **BATF (Basic Leucine Zipper ATF-Like Transcription Factor)** and **TNFSF13B (Tumor Necrosis Factor Superfamily Member 13B)** are involved in cytokine signaling and immune cell differentiation. BATF regulates T-cell differentiation, while TNFSF13B (also known as BAFF) is critical for B-cell survival and maturation.\n- **CD226 (DNAX Accessory Molecule-1)** is a co-stimulatory molecule involved in NK and T-cell activation.\n- **C17orf99** is a less characterized gene but has been implicated in immune modulation and inflammation.\n\n---\n\n### 2. 
**Cellular Processes Involved**\nThe genes in this module are involved in several key cellular processes:\n- **Immune Regulation**: ADA, MICA, LILRB1, CD226, BATF, and TNFSF13B are central to immune cell activation, differentiation, and tolerance. These genes collectively regulate the balance between immune activation and suppression.\n- **DNA Repair and Genomic Stability**: PARP3 and MAD2L2 are critical for maintaining genomic integrity by repairing DNA damage and ensuring proper cell cycle progression.\n- **Cell Signaling and Communication**: MICA, LILRB1, and CD226 mediate cell-cell interactions, particularly in the context of immune surveillance and tumor recognition.\n- **Transcription Regulation**: BATF acts as a transcription factor that modulates gene expression in immune cells, influencing their functional states.\n\n---\n\n### 3. **Potential Tissue/Cell Type Specificity**\nThis gene module is likely to be highly expressed in **immune-related tissues and cell types**, including:\n- **Lymphoid tissues** (e.g., lymph nodes, spleen) and **bone marrow**, where immune cell development and maturation occur.\n- **Peripheral blood mononuclear cells (PBMCs)**, including T cells, B cells, and NK cells, which express genes like MICA, LILRB1, CD226, and BATF.\n- **Tumor microenvironments**, where MICA and LILRB1 play roles in immune evasion and tumor surveillance.\n- **Epithelial tissues**, where PARP3 and MAD2L2 may contribute to DNA repair in response to environmental stressors.\n\n---\n\n### 4. 
**Disease Associations**\nThe genes in this module are associated with a variety of diseases, particularly those involving immune dysregulation and cancer:\n- **Autoimmune Diseases**: ADA deficiency causes severe combined immunodeficiency (SCID), while TNFSF13B is linked to systemic lupus erythematosus (SLE) and rheumatoid arthritis.\n- **Cancer**: MICA and LILRB1 are implicated in tumor immune evasion, while PARP3 and MAD2L2 are associated with cancer progression due to their roles in DNA repair and genomic instability.\n- **Infectious Diseases**: CD226 and MICA are involved in viral recognition and clearance, making them relevant to infectious disease outcomes.\n- **Inflammatory Disorders**: BATF and TNFSF13B are associated with chronic inflammation and inflammatory bowel disease (IBD).\n\n---\n\n### 5. **Functional Relationships Between Genes**\nThe genes in this module exhibit strong functional relationships, particularly in the context of immune regulation and DNA repair:\n- **Immune Activation vs. Suppression**: MICA and CD226 promote immune activation by engaging NK and T cells, while LILRB1 acts as an inhibitory receptor to dampen immune responses. This balance is critical for maintaining immune homeostasis.\n- **DNA Repair and Immune Crosstalk**: PARP3 and MAD2L2, while primarily involved in DNA repair, may also influence immune responses by modulating genomic stability and cell survival in immune cells.\n- **Cytokine Signaling and Transcription**: BATF and TNFSF13B work in concert to regulate cytokine production and immune cell differentiation, linking transcriptional regulation to immune function.\n- **Metabolic Regulation**: ADA connects immune function to cellular metabolism by regulating purine levels, which are critical for lymphocyte proliferation and function.\n\n---\n\n### Summary\nThis gene module represents a functionally cohesive network of genes involved in immune regulation, DNA repair, and cellular signaling. 
The interplay between these genes suggests a critical role in maintaining immune homeostasis, responding to DNA damage, and modulating immune responses in diseases such as cancer, autoimmune disorders, and infections. The tissue-specific expression of these genes in immune-related tissues further underscores their importance in immune surveillance and disease pathogenesis.'
The gene module consisting of ADA, C17orf99, PARP3, MICA, MAD2L2, BATF, CD226, TNFSF13B, and LILRB1 represents a functionally diverse yet interconnected group of genes involved in immune regulation, DNA repair, and cellular signaling. Below is a detailed functional interpretation of this gene module:
1. Common Biological Pathways
The genes in this module are primarily associated with immune response pathways and DNA damage repair mechanisms.
ADA (Adenosine Deaminase) is critical for purine metabolism and immune function, as it prevents the accumulation of toxic deoxyadenosine, which can impair lymphocyte development and function.
PARP3 (Poly(ADP-Ribose) Polymerase 3) and MAD2L2 (Mitotic Arrest Deficient 2 Like 2) are involved in DNA repair and genomic stability. PARP3 participates in the repair of DNA double-strand breaks, while MAD2L2 is associated with error-prone DNA repair processes.
MICA (MHC Class I Polypeptide-Related Sequence A) and LILRB1 (Leukocyte Immunoglobulin-Like Receptor B1) are key players in immune regulation. MICA is a stress-induced ligand recognized by natural killer (NK) cells, while LILRB1 is an inhibitory receptor that modulates immune cell activity.
BATF (Basic Leucine Zipper ATF-Like Transcription Factor) and TNFSF13B (Tumor Necrosis Factor Superfamily Member 13B) are involved in cytokine signaling and immune cell differentiation. BATF regulates T-cell differentiation, while TNFSF13B (also known as BAFF) is critical for B-cell survival and maturation.
CD226 (DNAX Accessory Molecule-1) is a co-stimulatory molecule involved in NK and T-cell activation.
C17orf99 is a less characterized gene but has been implicated in immune modulation and inflammation.
2. Cellular Processes Involved
The genes in this module are involved in several key cellular processes:
Immune Regulation: ADA, MICA, LILRB1, CD226, BATF, and TNFSF13B are central to immune cell activation, differentiation, and tolerance. These genes collectively regulate the balance between immune activation and suppression.
DNA Repair and Genomic Stability: PARP3 and MAD2L2 are critical for maintaining genomic integrity by repairing DNA damage and ensuring proper cell cycle progression.
Cell Signaling and Communication: MICA, LILRB1, and CD226 mediate cell-cell interactions, particularly in the context of immune surveillance and tumor recognition.
Transcription Regulation: BATF acts as a transcription factor that modulates gene expression in immune cells, influencing their functional states.
3. Potential Tissue/Cell Type Specificity
This gene module is likely to be highly expressed in immune-related tissues and cell types, including:
Lymphoid tissues (e.g., lymph nodes, spleen) and bone marrow, where immune cell development and maturation occur.
Peripheral blood mononuclear cells (PBMCs), including T cells, B cells, and NK cells, which express genes like MICA, LILRB1, CD226, and BATF.
Tumor microenvironments, where MICA and LILRB1 play roles in immune evasion and tumor surveillance.
Epithelial tissues, where PARP3 and MAD2L2 may contribute to DNA repair in response to environmental stressors.
4. Disease Associations
The genes in this module are associated with a variety of diseases, particularly those involving immune dysregulation and cancer:
Autoimmune Diseases: ADA deficiency causes severe combined immunodeficiency (SCID), while TNFSF13B is linked to systemic lupus erythematosus (SLE) and rheumatoid arthritis.
Cancer: MICA and LILRB1 are implicated in tumor immune evasion, while PARP3 and MAD2L2 are associated with cancer progression due to their roles in DNA repair and genomic instability.
Infectious Diseases: CD226 and MICA are involved in viral recognition and clearance, making them relevant to infectious disease outcomes.
Inflammatory Disorders: BATF and TNFSF13B are associated with chronic inflammation and inflammatory bowel disease (IBD).
5. Functional Relationships Between Genes
The genes in this module exhibit strong functional relationships, particularly in the context of immune regulation and DNA repair:
Immune Activation vs. Suppression: MICA and CD226 promote immune activation by engaging NK and T cells, while LILRB1 acts as an inhibitory receptor to dampen immune responses. This balance is critical for maintaining immune homeostasis.
DNA Repair and Immune Crosstalk: PARP3 and MAD2L2, while primarily involved in DNA repair, may also influence immune responses by modulating genomic stability and cell survival in immune cells.
Cytokine Signaling and Transcription: BATF and TNFSF13B work in concert to regulate cytokine production and immune cell differentiation, linking transcriptional regulation to immune function.
Metabolic Regulation: ADA connects immune function to cellular metabolism by regulating purine levels, which are critical for lymphocyte proliferation and function.
Summary
This gene module represents a functionally cohesive network of genes involved in immune regulation, DNA repair, and cellular signaling. The interplay between these genes suggests a critical role in maintaining immune homeostasis, responding to DNA damage, and modulating immune responses in diseases such as cancer, autoimmune disorders, and infections. The tissue-specific expression of these genes in immune-related tissues further underscores their importance in immune surveillance and disease pathogenesis.