Identification of novel antioxidant gene signature to predict the prognosis of patients with gastric cancer

Background: Gastric cancer (GC) is commonly associated with a dismal prognosis and lacks effective biomarkers. This study aimed to establish an antioxidant-related gene signature and a comprehensive nomogram to explore novel biomarkers and predict GC prognosis.
Methods: Clinical and expression data of GC patients were extracted from The Cancer Genome Atlas database. Univariate and multivariate Cox analyses were used to construct a score-based gene signature, and survival analyses were conducted between high- and low-risk groups. Furthermore, we established a prognostic nomogram integrating clinical variables and the antioxidant-related gene signature. Its predictive ability was validated by Harrell's concordance index and calibration curves, and an independent internal cohort verified the consistency of the antioxidant gene signature-based nomogram.
Results: Four antioxidant-related genes (CHAC1, GGT5, GPX8, and PXDN) were significantly associated with overall survival of GC patients, but only two genes, CHAC1 (HR = 0.803, P < 0.05) and GPX8 (HR = 1.358, P < 0.05), were confirmed as independent factors. A score-based signature was constructed and could act as an independent prognostic predictor (P < 0.05). Patients with lower scores showed significantly better prognosis (P < 0.05). A comprehensive nomogram combining the antioxidant-related gene signature and clinical parameters (age, gender, grade, and stage) was established and effectively predicted overall survival of GC patients [3-year survival AUC = 0.680, C-index = 0.665 (95% CI 0.614–0.716)]. The independent internal validation cohort verified the reliability and good consistency of the model [3-year survival AUC = 0.703, C-index = 0.706 (95% CI 0.612–0.800)].
Conclusions: The novel antioxidant-related gene signature and nomogram performed well in assessing GC prognosis. This study encourages further investigation of the antioxidant system and provides novel tools for GC patient management.
Supplementary Information The online version contains supplementary material available at 10.1186/s12957-021-02328-w.


Background
Gastric cancer (GC) remains a common cancer worldwide. An estimated 27,600 new GC cases and 11,010 GC-related deaths occurred in the USA in 2020 [1]. Although lifestyle recommendations and combined treatment have helped improve the clinical outcome of GC patients, the overall 5-year survival rate remains approximately 20% globally [2]. This poor outcome is mainly due to diagnosis at late stages [3]. Therefore, there is an urgent need to find promising biomarkers for identifying high-risk patients and to build a risk model that evaluates their prognosis and guides clinical practice.
Previous studies have explored biomarkers for GC prognosis, including gene expression profiles, and most of them demonstrated that differentially expressed genes were associated with patients' overall survival [4,5]. In addition, a growing number of studies have tried to establish molecular signatures or combine multiple biomarkers to provide a more convincing prediction of GC prognosis [6,7]. Nomograms incorporating these prognostic biomarkers and clinical variables have also been developed to further improve prediction accuracy [8-10].
Researchers have noticed that reactive oxygen species (ROS) and antioxidants participate in carcinogenesis and cancer treatment [11]. Limited ROS can be anti-tumorigenic while excessive levels can be tumor-promoting [12]. Similarly, recent studies have reported conflicting results about the role of antioxidants in cancer treatment [13,14]. Therefore, more studies are needed to explore their functions. Meanwhile, clinical researchers have used antioxidants to develop new therapies for GC or to explain pharmacologic action [15,16], and some of them further studied the expression profiles of antioxidant-related genes in GC, which might affect the function of ROS and antioxidants [17,18]. Antioxidant-related genes might therefore be promising biomarker candidates and informative for prognostic prediction.
However, studies on antioxidant-related gene signatures are few, and their prognostic significance in GC remains unexplored. Hence, in this study, predictive antioxidant-related genes were identified based on data from The Cancer Genome Atlas (TCGA) database, and a risk model was constructed to evaluate the outcome of GC patients. This model may also shed light on the potential mechanisms of molecular antioxidants in gastric cancer progression and offer more potential targets for treatment. Furthermore, a comprehensive nomogram based on the antioxidant-related gene signature and clinical variables was built to assess the prognosis of GC patients effectively in clinical practice.

Data collection
First, clinical information and gene expression data of GC patients were extracted from the TCGA database (https://portal.gdc.cancer.gov/) and matched. A flow chart summarizes the analysis procedure of this study (Fig. 1).

Screening of the differentially expressed genes
From the gene set enrichment analysis website (GSEA, https://www.gsea-msigdb.org/gsea/index.jsp), we obtained four antioxidant-related gene sets (antioxidant activity, GO antioxidant activity, GO glutathione catabolic process, and GO glutathione metabolic process). Then, in the R environment, gene expression data from the TCGA database were screened and processed with the "limma" R package to select the differentially expressed antioxidant-related genes in GC patients [19].

Establishment of the gene signature
Using the "survival" R package, univariate and multivariate Cox regression analyses were performed to select the genes with independent prognostic value, and a linear risk score formula was established. The risk score of each GC sample was calculated as follows: risk score = ∑ (expression of gene n × βn), where n indexes the independent prognostic genes and βn is the regression coefficient of gene n. All GC patients were assigned risk scores and were then divided into high- and low-risk groups by the median risk score. Kaplan-Meier curves and log-rank tests for the two groups were used to assess the prognostic significance of the risk score. Furthermore, we conducted overall survival analyses in stratified subgroups to further explore the prognostic ability of the risk score using the "survival" and "survminer" R packages.
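The scoring, grouping, and survival-curve steps described above can be illustrated with a minimal sketch. It is written in plain Python rather than with the R packages used in this study, and the gene names, coefficients, expression values, and follow-up data are hypothetical. Assigning patients whose score equals the median to the low-risk group is also an assumption, since the handling of ties is not stated here.

```python
from collections import Counter
from statistics import median

def risk_score(expression, coefficients):
    """Linear risk score: sum over genes of (expression x Cox coefficient)."""
    return sum(coefficients[g] * expression[g] for g in coefficients)

def split_by_median(scores):
    """Label each score 'high' if it exceeds the cohort median, else 'low'.
    Putting ties in the low-risk group is an assumption of this sketch."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.
    times: follow-up durations; events: 1 = death observed, 0 = censored."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"geneA": 0.5, "geneB": -0.25}
patients = [
    {"geneA": 2.0, "geneB": 4.0},
    {"geneA": 4.0, "geneB": 2.0},
    {"geneA": 1.0, "geneB": 8.0},
    {"geneA": 6.0, "geneB": 4.0},
]
scores = [risk_score(p, coefficients) for p in patients]
groups = split_by_median(scores)

# Hypothetical follow-up times (months) and event indicators for one group.
curve = kaplan_meier([5, 8, 12, 20, 24], [1, 1, 0, 1, 0])
```

In practice, one curve would be estimated per risk group and the two curves compared with a log-rank test, which this sketch omits.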

Construction and evaluation of the nomogram
A comprehensive nomogram predicting the survival probability of GC patients was built with the "rms" R package by integrating the antioxidant-related gene signature and clinicopathologic variables. Based on regression analyses, the nomogram predicts the 3- and 5-year survival probability of GC patients. To assess its performance, Harrell's C-index, the area under the ROC curve (AUC), and calibration curves were generated. A higher Harrell's C-index indicates better discrimination by the nomogram, and an ideal calibration curve lies close to the 45-degree dotted line. In addition, internal validation using a cohort from the TCGA database was performed to further confirm the feasibility of the model. Bootstrap resampling was used in these evaluations.
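For readers unfamiliar with Harrell's C-index, the following plain-Python sketch shows the pairwise logic behind it. It is not the "rms" implementation used in this study: pairs with tied survival times are simply skipped, no bootstrap confidence interval is computed, and the toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs in which the patient who died
    earlier also has the higher predicted risk (ties in risk count 0.5).

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (events[i] == 1). Pairs with equal times are
    ignored, which is a simplification of this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable

# Hypothetical follow-up times (months), event flags, and risk scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = harrell_c_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the C-indices reported in this study should be read.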

Statistical analysis
Cox analyses were used to select the variables with independent prognostic value, and Kaplan-Meier curve analysis was performed to evaluate the clinical significance of risk factors. Statistical analyses were conducted with R software version 4.0.2 (http://www.R-project.org/) and Excel software (Microsoft Corporation, Redmond, Washington). The R packages "limma," "survival," "survminer," and "rms" were used for screening differentially expressed genes, Cox analyses, survival analysis, and construction of the nomogram, respectively. In addition, the "pheatmap," "ggplot2," and "ggpubr" packages were used to draw the plots. P < 0.05 was considered statistically significant unless otherwise noted.

Characteristics of GC patients enrolled in this study
Clinical and transcriptome data were extracted separately from the TCGA database and matched by sample ID, and 375 GC and 32 normal cases were selected for subsequent analysis. Clinical information for the 371 matched cases included age, gender, grade, stage, follow-up time, and survival status; the detailed clinicopathologic features are listed in Table 1.

Differentially expressed antioxidant-related genes between GC and normal tissues
Based on the four antioxidant-related gene sets from GSEA, the gene expression of all specimens from the TCGA database was assessed, and 62 antioxidant-related genes were differentially expressed (30 downregulated and 32 upregulated) in GC tissues (Supplementary Figure 1). Ranked by |logFC|, eight of the top 10 differentially expressed genes were downregulated (APOA4, GSTA3, GSTA2, GSTA1, GSTM5, GPX3, HBA1, and HBB) and the other two genes
Subsequently, alterations in the two genes were evaluated in the TCGA samples using the cBioPortal database (http://www.cbioportal.org). The results showed that 10 (2.67%) of all sequenced cases carried alterations. Among them, GPX8 had two amplifications and four deep deletions. CHAC1 was altered in 1% of cases, including one amplification, one deep deletion, and two missense mutations (Fig. 2A). The specific mutation sites are shown in Fig. 2B. No mutation occurred inside the domain of GPX8, but there were two mutation sites inside the domain of CHAC1, which could affect its function.
In addition, the expression of CHAC1 and GPX8 in GC and normal tissues was compared. CHAC1 expression was significantly lower in GC than in normal cases (P < 0.05), whereas GPX8 expression was significantly higher in GC cases (P < 0.01, Fig. 2C). Furthermore, using other databases, we verified the differential expression of the four antioxidant-related genes in GC by Oncomine analysis [20] and their prognostic value using the Kaplan-Meier plotter (www.kmplot.com) [21] (Supplementary Figure 2).

Construction of antioxidant-related gene signature as a risk model
On the basis of Cox regression analysis, a two-gene signature was established, with the risk score calculated as a linear combination of the expression levels of CHAC1 and GPX8 weighted by their regression coefficients: risk score = (−0.2200 × expression of CHAC1) + (0.3058 × expression of GPX8). Risk scores of all the GC patients were calculated, and the patients were then divided into high- and low-risk groups by the median risk score (Fig. 3A). The distribution of risk score and survival time is shown in Fig. 3B, and patients in the high-risk group showed poorer prognoses than those in the low-risk group. In addition, the expression profiles of CHAC1 and GPX8 are shown in a heatmap (Fig. 3C). With increasing risk score, the expression of GPX8 was upregulated while the expression of CHAC1 was downregulated. Furthermore, a receiver operating characteristic (ROC) curve was drawn to evaluate the performance of the risk model (Fig. 3D). The area under the curve (AUC) was 0.719, indicating good sensitivity and specificity of the score-based risk model in predicting the prognosis of GC patients. In overall survival analysis, Kaplan-Meier survival curves and log-rank tests confirmed that patients with lower risk scores had better prognoses (P < 0.05, Fig. 3E).
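As a worked illustration of the formula above, the short Python sketch below applies the two published coefficients to one patient and converts each coefficient to a hazard ratio by exponentiation. Exponentiating the coefficients gives approximately 0.803 for CHAC1 and 1.358 for GPX8, in agreement with the hazard ratios reported in the abstract. The expression values are hypothetical, and the expression units and any normalization used in the original analysis are not specified here.

```python
import math

# Regression coefficients reported in the text for the two-gene signature.
COEFFICIENTS = {"CHAC1": -0.2200, "GPX8": 0.3058}

def two_gene_risk_score(chac1_expression, gpx8_expression):
    """Risk score = (-0.2200 x CHAC1 expression) + (0.3058 x GPX8 expression)."""
    return (COEFFICIENTS["CHAC1"] * chac1_expression
            + COEFFICIENTS["GPX8"] * gpx8_expression)

# A Cox coefficient beta corresponds to a hazard ratio of exp(beta)
# per unit increase in expression.
hazard_ratios = {gene: round(math.exp(beta), 3)
                 for gene, beta in COEFFICIENTS.items()}

# Hypothetical expression values for a single patient (units assumed).
example_score = two_gene_risk_score(chac1_expression=2.0, gpx8_expression=3.0)
```

Because the CHAC1 coefficient is negative and the GPX8 coefficient is positive, higher CHAC1 expression lowers the score and higher GPX8 expression raises it, which matches the expression trends described for Fig. 3C.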

Validation of prediction ability of the two-gene signature
Univariate and multivariate Cox analyses were then used to estimate the prognostic value of the antioxidant-related gene signature as well as other clinicopathological features of GC (Fig. 3G).
According to the two regression analyses, age, stage, and risk score were independent predictors of overall survival in GC patients, and these results were further confirmed by Kaplan-Meier survival curves (Fig. 4A-D). Patients older than 65 years and those at stages III-IV had a poorer survival probability, whereas patients at T1-2, N0, and M0 had better prognoses (Fig. 4E-G).
Further stratified analysis was then conducted to confirm the performance of the antioxidant-related gene signature in different subgroups. As shown in the Kaplan-Meier curves (Fig. 5A-N), the two-gene risk model acted as a reliable prognostic predictor, separating patients into high- and low-risk groups, among GC patients who were ≤ 65 years old, female, or at T3-4 or M0 stages.

Construction and validation of a nomogram model
A nomogram model for evaluating the overall survival (OS) probability of GC patients was constructed (Fig. 6A), combining clinicopathological features and the antioxidant-related gene signature. Harrell's concordance index for survival prediction was 0.665 (95% CI 0.614-0.716). And in Fig. 6B, C,

Discussion
In normal cells, the antioxidant system helps maintain an appropriate level of reactive oxygen species (ROS) through various signaling pathways [22]. Tumor cells, however, are characterized by high levels of ROS, which can modulate pathways and alter gene epigenetics, influencing various cellular and molecular processes in tumor cells and the microenvironment [12]. Antioxidant proteins are also elevated to reach a new redox balance with ROS in tumor cells and maintain a pro-tumorigenic environment [23]. These findings suggest that antioxidants and ROS are closely related to the initiation and progression of cancer. A growing body of research has focused on the correlation between antioxidants and GC, and some scholars have demonstrated the significant role of antioxidants in GC development. For example, previous research found that the exogenous antioxidant alpha-lipoic acid (ALA) inhibited proliferation and invasion of GC cells by mediating the expression of the MUC4 gene [18]. In xenograft models, GC growth was significantly suppressed after intratumoral injection of an antioxidative enzyme, nicotinamide nucleotide transhydrogenase [24]. Furthermore, more studies have focused on antioxidant-related genes in the signaling pathways of the antioxidant system [25]. The expression of these genes might be crucial to GC development and might inform the diagnosis, evaluation, and treatment of GC, which requires further study.
In recent years, scholars have shown interest in novel models that assess the prognosis of cancer patients more efficiently and precisely than traditional predictive methods such as TNM stage and pathological grade [26]. Molecular biomarkers such as mRNAs have been regarded as potential prognostic predictors, implying their clinical significance [27,28]. For instance, expression of MYOZ2 was significantly higher in GC tissues than in normal tissues, suggesting that it might be involved in the carcinogenesis of GC [29]. Similarly, the excessive level of HBO1 mRNA in GC tissues and its negative correlation with GC patient survival indicated that HBO1 might act as a potential biomarker to predict patient prognosis [30]. Nevertheless, single genes can be affected by multiple factors, and individual biomarkers are insufficient to predict patient prognosis independently [31,32]. Therefore, gene signatures, which are statistical models made up of several marker genes, have been used to overcome this limitation of consistency and to predict survival outcome based on a combined effect [33]. Some scholars have identified and validated prognostic gene signatures of GC and built specific score formulas to measure risk, but these signatures have not been widely accepted or put into practice [34]. Moreover, studies on antioxidant-related gene signatures of GC are still absent to date.
Therefore, in this study, we identified two genes (CHAC1 and GPX8) associated with the antioxidant system and revealed their prognostic value in GC by bioinformatics methods. Unlike previous prediction tools, this score-based risk model could act as a more efficient indicator for predicting the OS of GC patients and could support patient classification and individualized treatment in clinical application. Kaplan-Meier curves verified that patients with higher risk scores showed worse prognoses. Furthermore, we established a comprehensive nomogram model to provide a more efficient prediction tool for clinical practice and to help make a more accurate assessment of GC patients.
As for the two antioxidant-related genes, GPX8, a member of the Cys-glutathione peroxidase family, mainly resides in mitochondrial endoplasmic reticulum membranes and supports oxidative protein folding [35,36]. In addition, using glutathione (GSH), it can reduce hydrogen peroxide, lipid hydroperoxides, and other damage related to oxidative stress, which is closely associated with carcinogenesis [37,38]. Scholars identified the GPX8/IL-6/STAT3 axis as an essential pathway regulating cell aggressiveness in breast cancer [39]. In GC, GPX8 expression has been shown to be increased in patients with worse OS and was confirmed to be an independent prognostic predictor [40], in accordance with our result. However, its regulatory pathway and cellular functions have not been fully elucidated. CHAC1, a newly discovered enzyme with γ-glutamyl cyclotransferase activity, can degrade intracellular GSH, which might cause oxidative stress and contribute to necroptosis and ferroptosis in cancer [41,42]. In previous studies, higher expression of CHAC1 played a protective role by accelerating apoptotic death of glioma cells through various pathways [43], and CHAC1 was suggested for inclusion in prognostic prediction to aid treatment planning in breast cancer [44]. Our results also indicated a protective role of CHAC1 and showed its significant predictive value. In contrast, some scholars found that overexpression of CHAC1 in H. pylori-infected parietal cells could increase the risk of GC [45]; overall, however, there are few studies and little direct evidence illustrating the relationship between CHAC1 and GC. As analyzed above, these two key antioxidant-related enzymes play important roles in the growth and proliferation of GC and show prognostic value in GC patients. Furthermore, oxidative stress and the antioxidant system play a vital part in the tumorigenesis and progression of GC.
In conclusion, an antioxidant-related gene signature was identified for the first time, and the prognosis of GC patients could be quantified more efficiently and accurately by this risk model. The nomogram integrating the gene signature with clinical factors provides an efficient tool for predicting the prognosis of GC patients in clinical practice. Our results help shed light on the potential mechanisms of the molecular antioxidant system in GC progression and offer more potential biomarkers for early diagnosis as well as therapeutic targets for GC treatment.