- Open Access
Determination of a six-gene prognostic model for cervical cancer based on WGCNA combined with LASSO and Cox-PH analysis
World Journal of Surgical Oncology volume 19, Article number: 277 (2021)
This study aimed to establish a risk model of hub genes to evaluate the prognosis of patients with cervical cancer.
Based on TCGA and GTEx databases, the differentially expressed genes (DEGs) were screened and then analyzed using GO and KEGG analyses. The weighted gene co-expression network (WGCNA) was then used to perform modular analysis of DEGs. Univariate Cox regression analysis combined with LASSO and Cox-pH was used to select the prognostic genes. Then, multivariate Cox regression analysis was used to screen the hub genes. The risk model was established based on hub genes and evaluated by risk curve, survival state, Kaplan-Meier curve, and receiver operating characteristic (ROC) curve.
We screened 1265 DEGs between cervical cancer and normal samples, of which 620 were downregulated and 645 were upregulated. GO and KEGG analyses revealed that most of the upregulated genes were related to the metastasis of cancer cells, while the downregulated genes mostly acted on the cell cycle. Then, WGCNA mined six modules (red, blue, green, brown, yellow, and gray), and the brown module with the most DEGs and related to multiple cancers was selected for the follow-up study. Eight genes were identified by univariate Cox regression analysis combined with the LASSO Cox-pH model. Then, six hub genes (SLC25A5, ENO1, ANLN, RIBC2, PTTG1, and MCM5) were screened by multivariate Cox regression analysis, and SLC25A5, ANLN, RIBC2, and PTTG1 could be used as independent prognostic factors. Finally, we determined that the risk model established by the six hub genes was effective and stable.
This study supplies the prognostic value of the risk model and the new promising targets for the cervical cancer treatment, and their biological functions need to be further explored.
Cervical cancer is one of the most common tumors in gynecology with high incidence and mortality rate, ranking fourth in the world for common female malignancies [1, 2]. The prognosis of patients with early cervical cancer is favorable, and the 5-year survival rate can reach 90%, while the survival rate of advanced patients, especially those with distant metastasis or recurrence, is significantly decreased [3, 4]. Clinically, traditional treatment methods for cervical cancer still have certain drawbacks. For example, surgical treatment is only suitable for early-stage patients, and postoperative radiotherapy will cause irreversible damage to the ovaries and uterus [5, 6]. Hence, emerging therapies such as targeted therapy and immunotherapy have become a hot spot in cervical cancer research , and new diagnostic molecular markers and therapeutic targets are still to be discovered.
The identification of tumor-related genes is an important basis for the study of tumor pathogenesis and the formulation of prevention and treatment measures . Understanding cervical cancer at the gene level provides important clues for etiology research, early diagnosis, and treatment . During the last two decades, the advances of markers screened by high throughput genome sequencing and bioinformatics analysis received more and more attention . Bioinformatics analysis based on gene chip is a common method to detect the difference of RNA expression between different samples on a high-throughput platform, which can screen out differentially expressed genes (DEGs) in a large amount of data [11,12,13]. Analyses of the microarray data from the public databases, such as The Cancer Genome Atlas (TCGA) database, would provide valuable clues for the investigation of various diseases . The phenotypic variation of complex life is often not caused by a single gene, but by the interaction of multiple genes in the gene network . Therefore, the analysis of gene function, pathway and interaction is the basis of molecular targeted therapy.
Weighted gene co-expression network analysis (WGCNA) can be used to find clusters (modules) of highly related genes, and associate modules with external sample features, thus to identifying hub genes that can be used as therapeutic targets . Until now, WGCNA has been successfully applied in various biological fields, including cancer [17,18,19]. Horvath et al. found the key pathogenic gene ASPM for glioblastoma using WGCNA, which has become a key gene for related treatment . However, there is still no study on identifying key nodules and genes in cervical cancer using the WGCNA method. Herein, we used WGCNA for the first time to identify key modules related to cervical cancer. Based on the genes contained in these key modules, we obtained a set of genes as prognostic biomarkers using univariate Cox regression analysis, in combination with Cox proportional hazard (PH) model based on LASSO estimation. We expected to provide more powerful biomarkers for the diagnosis and prognosis of patients with cervical cancer.
Materials and methods
The gene expression information of cervical cancer was downloaded from TCGA and GTEx databases. Ten normal samples were obtained from the GTEx database as well as 3 normal samples and 306 tumor samples were obtained from the TCGA database. Totally, 13 normal samples and 306 tumor samples were obtained from the two databases for subsequent analysis.
Screening for DEGs
Thirteen normal samples and 306 tumor samples offered from TCGA and GTEx databases were analyzed for DEGs. We used the limma package in R to obtain differentially expressed genes with FDR < 0.05 and |logFC| > 1 as screening conditions. The final results are represented by a volcano map and heat map.
GO function annotation and KEGG enrichment analysis
GO function annotation was carried out for the differentially expressed genes in Database for Annotation Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/), including biological process (BP), cell composition (CC), and molecular function (MF). Simultaneously, KEGG function enrichment analysis was carried out to determine the main biological functions and pathways affected by the differentially expressed genes. P < 0.05 indicated that the difference was statistically significant.
WGCNA package (version 1.61) was used to identify the cervical cancer-related modules according to the previous description . In brief, scale-free topology criterion was used to calculate the soft threshold. The optimal soft threshold was selected and the minimum module size was set as 30 genes. The modules were identified using the dynamic tree cut, and the MEDissThres parameter was set to 0.25. After the modules were correlated with clinical features, the modules with Pearson correlation coefficient > 0.75 were selected and merged to obtain the target module.
Construction of prognostic model
Univariable Cox regression analysis was used to screen out the significant genes. Then, the most useful prognostic marker was selected using Lasso regression analysis, and tenfold cross-validation was utilized to determine the optimal value of penalty parameter λ. The Cox-PH model was used to validate the identified prognosis-related genes. Subsequently, a Multivariate Cox regression model was used to determine the hub genes and independent prognostic factors. The risk curve and survival state of the risk model established by the prognosis-related genes were analyzed. In addition, Kaplan-Meier analysis and receiver operating characteristic (ROC) curve analysis were performed for further verification.
All statistical tests were bilateral, and statistical analyses in this work were performed using R software and SPSS 22.0 (IBM, Armonk, NY, USA). The comparisons were conducted with T test, and P < 0.05 was regards as significant statistically.
Based on the TCGA and GTEx databases, 1265 differentially expressed genes were obtained using the limma package in R with FDR < 0.05 and |logFC| > 1, of which 620 were downregulated and 645 were upregulated in cervical cancer (Fig. 1A, B).
Then, we used GO and KEGG enrichment to analyze the upregulated and downregulated DEGs respectively. Herein, Fig. 2A, B presented the enrichment of downregulated genes. For GO-BP analysis, downregulated DEGs were markedly enriched in mitosis chromosomes segregation, sister chromatid segregation, and organelle fission. These are all related to the cell cycle and sex cell generation. GO-CC analysis indicated that DEGs are mainly concentrated in chromosomal region and immunoglobulin complex, which are also related to cell cycle and immunity. The mainly enriched MF terms are antigen binding, immunoglobulin receptor binding, and cell-cell adhesion, which are all about regulating the immune and cellular interactions. In addition, KEGG analysis revealed that the top significantly enriched pathway for DEGs is the cell cycle. Combined with GO analysis, most of the downregulated genes act on the cell cycle. Hence, we speculated that after the downregulation of these genes, the cell cycle may be disordered. Meanwhile, the results of upregulated genes were displayed in Fig. 2C, D. GO-BP analysis indicated that the top three enriched GO terms were extracellular matrix organization, extracellular structure organization, and cell-substrate adhesion, which are associated with the occurrence and metastasis of cancer. The results of GO-CC analysis were similar to GO-BP analysis that the upregulated DEGs were mainly enriched in collage-containing extracellular matrix and focal adhesion. Also, for GO-MF analysis, upregulated DEGs were mainly enriched in extracellular matrix structural constituents. Moreover, KEGG analysis indicated that the markedly enriched pathways for DEGs are focal adhesion, vascular smooth muscle contraction, ribosome, and extracellular matrix receptor responses. It follows that most of the upregulated DEGs are related to the changes of internal environment as well as the focus and metastasis of cancer cells.
WGCNA network module mining
The DEGs screened from TCGA and GTEX databases were applied for WGCNA. Firstly, we calculate the soft threshold using scale-free topology criterion. The connectivity between genes in the gene network satisfied the scale-free network distribution when soft threshold power of β = 13 (scale-free R2 = 0.9) (Fig. 3A, B). Then, co-expression modules (cut height, ≥0.25) were mined using the phylogenetic tree (Fig. 3C). The modules were analyzed by hierarchical clustering, and the modules on the same branch showed similar gene expression patterns (Fig. 3D). Similar gene modules are then identified and combined, and then 6 co-expression modules (red, blue, green, brown, yellow, and gray) were obtained (Fig. 3E). Subsequently, the gene clustering was visualized (Fig. 3E) and the correlation between modules was analyzed (Fig. 3F). All data revealed that the higher correlation was presented between the blue and red modules as well as the brown and yellow modules.
Additionally, KEGG enrichment analysis was performed for each module (Fig. 4). The genes in the brown module (Fig. 4B) are mainly involved in the cell cycle, DNA replication, and signaling pathways and are associated with multiple cancers. In addition, the brown module has the highest concentration, so we chose the brown module as the research object.
Establishment of prognostic model based on independent prognostic gene
Univariate cox regression analysis was performed to obtain the meaningful genes in brown module. As presented in Fig. 5A, 22 genes were considered to be associated with prognosis. The immunization scoring model of the training cohort was established using LASSO Cox regression analysis (Fig. 5B, C). Eight prognostic genes (SLC25A5, ENO1, RNASEH2A, ANLN, RIBC2, CHAF1A, PTTG1, and MCM5) were screened and PH (proportional hazards) test was performed on them. As presented in Fig. 6A–H, the risk ratio of gene expression changed little over time and was independent of survival time. Hence, all of the above genes can be used for risk regression analysis and modeling.
Then, a risk model with six genes (SLC25A5, ENO1, ANLN, RIBC2, PTTG1, and MCM5) as influencing factors was obtained by multivariate Cox regression analysis (Fig. 7A). Wherein, the P values of SLC25A5, ANLN, RIBC2, and PTTG1 were all less than 0.05 and could be used as independent prognostic factors. Furthermore, we assessed the risk model with six hub genes. First we analyzed the risk curve and survival state. As shown in Fig. 7B, C, the high risk group presented more deaths and a lower average number of years of survival. Then, we plotted the KM curve and the time-dependent ROC curve. The KM curve indicated that the survival rate of the high-risk group was significantly lower than that of the low-risk group (Fig. 7D). Meanwhile, the ROC curve revealed that 1 year, 5 years, 10 years and 15 years survival rate can be all predicted by using the risk model (Fig. 7E–H). Also, nomogram indicated that the total score of nomogram was helpful to provide a quantitative method for predicting the prognosis of cervical cancer (Fig. 7I). These data illustrated that six genes (SLC25A5, ENO1, ANLN, RIBC2, PTTG1, and MCM5) can be used to establish the risk model, and the model was stable.
For the moment, targeted therapy has become a hot spot in the treatment of cervical cancer . Several biomarkers for cervical cancer have been identified in recent years. For example, Heng Zou et al. reported a novel circ_0018289/miR-183-5p/TMED5 regulatory network as a novel molecular basis underlying the modulation of cervical cancer . Studies provided evidence that targeting IGF1 by miR-186-3p can regulate cervical cancer progression, and miR-125 inhibited cervical cancer progression and development by inhibiting VEGF and PI3K/AKT signaling pathway, providing more insights into the treatment of cervical cancer [2, 24]. However, the biomarkers of targeted therapy for cervical cancer are complex and diverse, which also need to be fully excavated and explored . In this work, we found 1265 DEGs between cervical cancer and normal samples, of which 620 were downregulated and 645 were upregulated. The upregulated DEGs were mainly related to tumor cell metastasis, while the downregulated genes were mainly related to the cell cycle. Moreover, immune cells can successfully distinguish tumor tissue from normal tissue. We excavated six modules using WGCNA, and then analyzed the brown module with the most DEGs and related to multiple cancers. The hub genes such as SLC25A5, ENO1, ANLN, RIBC2, PTTG1, and MCM5 can be used to establish the risk model for CC, and the model was stable. Furthermore, SLC25A5, ANLN, RIBC2, and PTTG1 could be used as independent prognostic factors.
Increasing evidence demonstrated that genomic instability, oncogene activation, aberrant expression of important genes, and alterations of signaling pathways are vital factors and contribute to cancer pathogenesis . WGCNA has been widely utilized for identifying gene modules associated with several diseases and screening potential therapeutic targets . WGCNA clusters highly related genes into different modules to construct co-expression modules, and then analyzes the functions of these modules . For example, KEGG is used to analyze the pathways associated with DEGs in these modules . By WGCNA algorithm, Wenxi Yan et al. identified 5 significantly stable gene modules related to colon cancer and constructed a 12-gene prognostic model via further bioinformatics analysis . In this work, we identified six modules using WGCNA. We found that the genes of the brown module mainly act on the cell cycle, DNA replication and are associated with multiple cancers. In order to identify the prognostic genes in the brown module, we performed univariate Cox-Lasso regression analysis, and finally screened eight genes. Then, six prognostic hub genes, SLC25A5, ENO1, ANLN, RIBC2, PTTG1, and MCM5, were identified by multivariate Cox regression analysis.
SLC25A5 is a by-product of ninucleotide transferase, encoding the transporter ADP/ATP . SLC25A5 deletion can lead to mitochondrial dysfunction and oxidative stress. Hence, the development of B cells may be related to SLC25A5 . Moreover, SLC25A5 is considered to be associated with lymph node metastasis of hepatocellular carcinoma . ENO1 is expressed on the surface of tumor cells and promoted tumor cells invasion, which has diagnostic and prognostic value in many cancers . ANLN, a protein coding gene, is highly expressed in many cancers . Upregulation of ANLN leads to shorter survival time in patients with colorectal cancer  and breast cancer . Importantly, Leilei Xia et al. found that ANLN is a prognostic factor for cervical cancer . RIBC2 is abnormally expressed in renal clear cell carcinoma, breast cancer, and ovarian serous cystadenocarcinoma . Importantly, RIBC2 has been identified as a key gene related to the progression and prognosis of cervical squamous cell carcinoma . PTTG1, an oncogene, has the ability to regulate the self-renewal and epithelial mesenchymal transformation of cancer stem cells . In terms of biological function, PTTG1 is regulated by various lncRNAs and miRNAs and plays a role in the malignant progression of cervical cancer [39,40,41]. MCM5, a minichromosome maintenance protein, is upregulated in cervical cancer and promoted cell proliferation. Also, MCM5 is closely related to clinical stage, lymph node metastasis, distant metastasis, and histological grade . In this paper, we used these six genes to establish an effective and stable risk model for the prognosis of cervical cancer.
In summing, we screened six prognostic hub genes by WGCNA combined with LASSO Cox-PH analysis and established a risk model. We also proved that the risk model is effective and stable. Our results provide a new strategy for targeted therapy of cervical cancer. However, in vitro and in vivo biological function experiments are needed to verify our results.
Availability of data and materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Bray F, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.
Fu K, et al. MiR-125 inhibited cervical cancer progression by regulating VEGF and PI3K/AKT signaling pathway. World J Surg Oncol. 2020;18(1):115.
Cohen PA, et al. Cervical cancer. Lancet. 2019;393(10167):169–82.
Ji Y, Wang H. Prognostic prediction of systemic immune-inflammation index for patients with gynecological and breast cancers: a meta-analysis. World J Surg Oncol. 2020;18(1):197.
Andrae B, et al. Screening and cervical cancer cure: population based cohort study. Bmj. 2012;344:e900.
Brucker SY, Ulrich UA. Surgical Treatment of Early-Stage Cervical Cancer. Oncol Res Treat. 2016;39(9):508–14.
Menderes G, et al. Immunotherapy and targeted therapy for cervical cancer: an update. Expert Rev Anticancer Ther. 2016;16(1):83–98.
Tamura G. Alterations of tumor suppressor and tumor-related genes in the development and progression of gastric cancer. World J Gastroenterol. 2006;12(2):192–8.
Ibeanu OA. Molecular pathogenesis of cervical cancer. Cancer Biol Ther. 2011;11(3):295–306.
Fang Z, et al. Identification of a prognostic gene signature of colon cancer using integrated bioinformatics analysis. World J Surg Oncol. 2021;19(1):13.
Zhang Y, Szustakowski J, Schinke M. Bioinformatics analysis of microarray data. Methods Mol Biol. 2009;573:259–84.
Fryer RM, et al. Global analysis of gene expression: methods, interpretation, and pitfalls. Exp Nephrol. 2002;10(2):64–74.
Zhu T. Global analysis of gene expression using GeneChip microarrays. Curr Opin Plant Biol. 2003;6(5):418–25.
Zhu Y, et al. Identification of six candidate genes for endometrial carcinoma by bioinformatics analysis. World J Surg Oncol. 2020;18(1):161.
Kugler KG, et al. Integrative network biology: graph prototyping for co-expression cancer networks. PLoS One. 2011;6(7):e22843.
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559.
Zhai X, et al. Colon cancer recurrence-associated genes revealed by WGCNA co-expression network analysis. Mol Med Rep. 2017;16(5):6499–505.
Ding M, et al. A comprehensive analysis of WGCNA and serum metabolomics manifests the lung cancer-associated disordered glucose metabolism. J Cell Biochem. 2019;120(6):10855–63.
Di Y, et al. Bladder cancer stage-associated hub genes revealed by WGCNA co-expression network analysis. Hereditas. 2019;156:7.
Horvath S, et al. Analysis of oncogenic signaling networks in glioblastoma identifies ASPM as a molecular target. Proc Natl Acad Sci U S A. 2006;103(46):17402–7.
Lou Y, et al. Characterization of transcriptional modules related to fibrosing-NAFLD progression. Sci Rep. 2017;7(1):4748.
Kumar L, et al. Chemotherapy and targeted therapy in the management of cervical cancer. Curr Probl Cancer. 2018;42(2):120–8.
Zou H, et al. Identification of a novel circ_0018289/miR-183-5p/TMED5 regulatory network in cervical cancer development. World J Surg Oncol. 2021;19(1):246.
Lu X, et al. MiR-186-3p attenuates tumorigenesis of cervical cancer by targeting IGF1. World J Surg Oncol. 2021;19(1):207.
Kori, M. and K. Yalcin Arga, Potential biomarkers and therapeutic targets in cervical cancer: Insights from the meta-analysis of transcriptomics data within network biomedicine perspective. 2018. 13(7): p. e0200717.
He Y, et al. Pan-cancer analysis of m5C regulator genes reveals consistent epigenetic landscape changes in multiple cancers. World J Surg Oncol. 2021;19(1):224.
Wan Q, et al. Co-expression modules construction by WGCNA and identify potential prognostic markers of uveal melanoma. Exp Eye Res. 2018;166:13–20.
Clémençon B, Babot M, Trézéguet V. The mitochondrial ADP/ATP carrier (SLC25 family): pathological implications of its dysfunction. Mol Asp Med. 2013;34(2-3):485–93.
Cho J, et al. Mitochondrial ATP transporter Ant2 depletion impairs erythropoiesis and B lymphopoiesis. Cell Death Differ. 2015;22(9):1437–50.
Lee CF, et al. Genomic-wide analysis of lymphatic metastasis-associated genes in human hepatocellular carcinoma. World J Gastroenterol. 2009;15(3):356–65.
Cappello P, et al. Alpha-Enolase (ENO1), a potential target in novel immunotherapies. Front Biosci (Landmark Ed). 2017;22:944–59.
Wang A, et al. ANLN-induced EZH2 upregulation promotes pancreatic cancer progression by mediating miR-218-5p/LASP1 signaling axis. J Exp Clin Cancer Res. 2019;38(1):347.
Wang G, et al. Overexpression of Anillin (ANLN) is correlated with colorectal cancer progression and poor prognosis. Cancer Biomark. 2016;16(3):459–65.
Zhou W, et al. Knockdown of ANLN by lentivirus inhibits cell growth and migration in human breast cancer. Mol Cell Biochem. 2015;398(1-2):11–9.
Xia L, et al. ANLN functions as a key candidate gene in cervical cancer as determined by integrated bioinformatic analysis. Cancer Manag Res. 2018;10:663–70.
Xiong M, et al. A common variant rs2272804 in the 5'UTR of RIBC2 inhibits downstream gene expression by creating an upstream open reading frame. Eur Rev Med Pharmacol Sci. 2020;24(7):3839–48.
Meng H, et al. Identification of Key Genes in Association with Progression and Prognosis in Cervical Squamous Cell Carcinoma. DNA Cell Biol. 2020;39(5):848–63.
Parte S, et al. PTTG1: a Unique Regulator of Stem/Cancer Stem Cells in the Ovary and Ovarian Cancer. Stem Cell Rev Rep. 2019;15(6):866–79.
Guo XC, et al. The long non-coding RNA PTTG3P promotes growth and metastasis of cervical cancer through PTTG1. Aging (Albany NY). 2019;11(5):1333–41.
Robles-Ramírez Mdel C, et al. A peptide fraction from germinated soybean protein down-regulates PTTG1 and TOP2A mRNA expression, inducing apoptosis in cervical cancer cells. J Exp Ther Oncol. 2012;9(4):255–63.
Li L, et al. Pituitary tumor-transforming gene 1 enhances metastases of cervical cancer cells through miR-3666-regulated ZEB1. Tumour Biol. 2015.
Wang D, et al. The role of MCM5 expression in cervical cancer: Correlation with progression and prognosis. Biomed Pharmacother. 2018;98:165–72.
Consent to publication
Ethics approval and consent to participate
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Li, S., Han, F., Qi, N. et al. Determination of a six-gene prognostic model for cervical cancer based on WGCNA combined with LASSO and Cox-PH analysis. World J Surg Onc 19, 277 (2021). https://doi.org/10.1186/s12957-021-02384-2
- Cervical cancer