Identication of hepatocellular carcinoma risk using a novel prognostic model based on apoptosis-related genes

Dysregulation of the balance between proliferation and apoptosis is the basis in human hepatocarcinogenesis. There is a possible association of apoptosis dysregulation with poor prognosis in many malignant tumors, such as hepatocellular carcinoma (HCC). However, the prognostic effect of Apoptosis-related genes (ARGs) on HCC is still unclear. treatment of HCC patients.


Introduction
Liver cancer has been ranked sixth among the most common tumors worldwide and the fourth leading cause of cancer-related mortality [1]. Among the frequent primary liver cancer, hepatocellular carcinoma (HCC) accounts for approximately 75% cases [1]. Despite the advancements in diagnosing, techniques and treatment of HCC, the survival of the patients worsens following the high rate of recurrence and metastasis [2][3][4]. TNM staging system is a traditional method for prognostic prediction, lacks performance accuracy in its clinical application due to substantive variations in HCC clinical outcomes [5]. Over the past decades, serum alpha-fetoprotein (AFP) has been the only biomarker available for detecting and predicting prognosis, through with low sensitivity limits its clinical utility [6]. Therefore, identi cation of a novel prognostic biomarker and an advanced prognostic model for HCC patients is of paramount importance.
Bioinformatics analysis had been extensively adopted over the past few years as an effective tool in exploring functions of numerous differentially expressed genes (DEGs) and determining the complexity of HCC occurrence and development [7,8]. Hub genes and pathways associated with pathogenesis and prognosis of HCC via a series of bioinformatics analyses were established by Meng et al [9]. However, these studies usually ignore clinical information such as sex, age, grade, and stage of tumors. It may be very innovative to construct a prognostic model combining patient's gene expression level and clinical information. Hepatocarcinogenesis develops following the imbalance between proliferation and apoptosis [10]. Results of a recent study had revealed that the spindle and kinetochore-related complex subunit 3 (SKAT3) overexpressed in HCC potentially inhibits p53 activation by binding to cyclindependent kinase 2 (CDK2), before impeding cell apoptosis and promoting cancer cell proliferation [11].
Besides, several biomolecules may in uence HCC prognosis by regulating apoptosis-related genes or apoptosis-related pathways [12][13][14]. Association of the haplotype of the apoptosis-related gene CDKN1B with the prognosis of HCC patients who underwent surgical resection were reported by Yu et al [15]. CCT/ACT haplotype patients had lower overall survival rates than those with common ACT/CCT haplotype. Thus, ARGs can potentially assess HCC prognosis.
Herein, based on the gene expression and clinical characteristics data obtained from the Cancer Genome Atlas (TCGA) database, we determined the ARG related to the prognosis and develop a prognostic prediction model. The model potentially calculates the risk score to predict and evaluate the prognosis of HCC patients.

Data collection
The mRNA expression data and clinical information of HCC patients were obtained from TCGA database (https://cancergenome.nih.gov/). The clinical information includes age, gender, TNM stage, T stage, N stage, M stage, and histological grade. A total of 161 apoptosis-related genes (ARGs) were acquired from the gene set "HALLMARK_APOPTOSIS" in the Molecular Signatures Database v7.1 in GSEA.

Gene Set Enrichment Analysis and Differentially Expressed ARGs
Gene set with prominent differences between in HCC and normal samples were explored by GSEA.
Subsequently, the limma package and the Wilcoxon signed-rank test in R software 3.6.2(|logFC| > 1, FDR < 0.05) were used to show the signi cantly differentiated expressed ARGs in mRNA expression pro les of HCC cohort. The pheatmap and ggpubr packages in R software were used to develop volcano plots, heatmaps, and box plots.
Functional enrichment, KEGG pathway and PPI network analysis Gene ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed to explore the potential biological functions of ARG participation before visualized through R software with packages such as ggplot2, DOSE, clusterPro ler, enrichplot, GOplot, digest, etc. Interactions among the selected ARGs were determined through protein-protein interaction (PPI) network from the STRING database (http://www.string-db.org/) and visualized by Cytoscape software.

Establishment of Prognostic risk model based on ARGs
Univariate and multivariate Cox regression analyses were conducted to identify the prognosis-associated ARGs in HCC. A prognosis-associated prediction formula acquired from multivariate Cox regression was used to construct a prognostic model using package "glmnet" in R. According to the prognostic model, the Kaplan-Meier (K-M) method was performed to analyze the survival rate of the high-and the low-risk groups. Subsequently, we use the area under the ROC curve (AUC), and KEGG enrichment analysis to assess the predictive value of the prognostic model. Finally, the R package (rms) was used to develop a risk model-based nomogram for predicting the prognosis of HCC patients.

Statistical Analysis
The gene expression data were normalized by log2 transformation. Thereafter, the statistical analyses and plots were performed and constructed using R software 3.6.2 (https://www.r-project.org/) and Perl language packages. Respectively, p-value < 0.05 was considered statistically signi cant.

Acquisition of apoptosis-related genes set and GSE analysis
Following the search in the Molecular Signatures Database v7.1 in GSEA, 161 ARGs was identi ed from the gene set "HALLMARK_APOPTOSIS". Thereafter, the 161 ARGs in this gene set was further assessed using GSE analysis to ascertain the presence of biological signi cance in HCC. Results in (Fig. 1a) showed this ARG set was signi cantly differentially expressed between HCC and normal specimens.

Identi cation of differentiated expressed ARGs
The mRNA sequence data of 374 cases of HCC and 50 cases of normal tissue were obtained from TCGA.
The limma package and the Wilcoxon signed-rank test in R (|logFC| > 1, FDR < 0.05) were used to screen the differentiated expressed ARGs in HCC and non-tumor samples. In this study, 43 and 8 ARGs were signi cantly upregulated and downregulated, respectively. In addition, their results were plotted by volcano plot, heatmap, and box plot ( Fig. 1(b-d)).

Functional enrichment and PPI network analysis of differentiated expressed ARGs
To further study the main biological functions and signi cant pathways of these 51 identi ed genes, the GO enrichment and KEGG pathway enrichment analysis were performed. The plotted results are shown in (Fig. 2). The 51 identi ed ARGs were involved in the pathways related to cellular apoptosis ( Fig. 2a and  2b). According to KEGG enrichment analysis results, the 51 ARGs possible contribute to platinum drug resistance and transcriptional misregulation in cancer as well as associated with some oncogenic pathways such as MAPK, P53, TNF, PI3K-Akt signaling pathways (Fig. 2c). The STRING online tool was used to establish a PPI network to explore the interactions among the proteins coded by ARGs. Results observed using the Cytoscape software (Fig. 2d).

Association of ARGs with HCC patient's survival and prognosis
Univariate Cox regression analyses were performed on both mRNA expressional and corresponding clinical data of 51 selected ARGs to identify their prognosis-associated ARGs in HCC (Fig. 3a). There were 20 upregulated ARGs and 1 downregulated ARG which were statistically signi cant. These 21 identi ed genes were subsequently analyzed by multivariate Cox regression (p < 0.05) to determine ARGs association with the prognosis of patients and acquire corresponding regression coe cients. Following the multivariate Cox regression, ve prognosis-related ARGs were screened which contained PPP2R5B, SQSTM1, TOP2A, BMF, and LGALS3. Expression levels in normal and HCC specimens were further compared to establish the prognostic value of these 5 genes. The 5 identi ed ARGs in HCC specimens had signi cantly higher expression levels than normal specimens as indicated by heatmap and boxplot ( Fig. 3b and 3c). Besides, the K-M curve was constructed from the comparisons of the survival rates differences in high-and low-expressed groups in the identi ed ARGs. Interestingly, the high expression levels in the 5 identi ed ARGs indicated a low survival rate (Fig. 3d).
Simultaneously, we revealed that these 5 genes were signi cantly upregulated in HCC from the immunohistochemical results of the Human Protein Atlas (HPA) database (Fig. 4).

Establishment of prognostic risk signature based on ARGs
We combined the expression level of ARGs and the regression coe cients of multiple Cox regression analysis to summarize the risk scoring formula. The risk score = (0.385327 * expression value of PPP2R5B) + (0.2787 * expression value of SQSYM1) + (0.152062 * expression value of TOP2A) + (0.172177 * expression value of BMF) + (0.110211 * expression value of LGALS3). All genes had a positive coe cient indicating that the high expressions of identi ed genes were negatively associated with prognosis. The risk score in HCC patients was calculated by this formula, and these patients were divided into high-and low-risk groups based on the median risk score as a cutoff (Fig. 5). The plotted heatmap exhibit the 5 ARGs expression levels which showed that patients in the same group had distinct expression levels (Fig. 5a). Patients scores were ranked in ascending order (Fig. 5b) and their survival time displayed in the scatterplot (Fig. 5c) revealed that the low-risk patients had a longer survival time and higher survival rate than those with high-risk.
Survival analysis of high-and low-risk groups were performed to further illustrate the correlation between the risk score and the prognosis of patients. First, we demonstrated the prognostic signi cance of the risk score using K-M curves and the log-rank method as shown in (Fig. 6). Patients in low-risk groups had a signi cantly high two-year or ve-year survival probability (p < 0.0001; Fig. 6) compared with high-risk groups, revealing a worse prognosis. Independent prediction of HCC prognosis by the risk score Both univariate and multivariate Cox regression analyses were used to compare the predictive value of the prognostic model and other clinical variables on prognosis. The risk score was established as an independent prognosis predictor in patients with HCC ( Fig. 7a and 7b). On the contrary, clinical variables were not independent prognosis predictor as shown in the result of multivariate cox regression analysis (p > 0.05; Fig. 7b). Moreover, the ROC curve was plotted to validate the predictive value of the risk score (Fig. 7c). The curves demonstrated that the AUC of the risk score was 0.741 as the highest value than that of the clinical variables. The results suggested that the risk score had a higher predictive value than the clinical variables.

KEGG enrichment analysis between different risk patients
The KEGG analysis was utilized to explore the potential biological pathways which were associated with 5 ARGs. The results showed that the major 5 correlated pathways were apoptosis, cell cycle, mTOR signaling pathway, WNT signaling pathway, and phosphatidylinositol signaling system (Fig. 7d).

Construction of a nomogram to predict the prognosis of HCC patients
To better apply the risk score to predict the prognosis of HCC patients, we combined the risk score with the corresponding clinical variables to construct a nomogram to predict the OS of patients at 1, 2, and 3 years. Given the risk scores and clinical variables, an average point for a patient can be established and extrapolated to determine the 1-, 2-and 3-year OS among the HCC patients (Fig. 8).
The correlations between the risk score/ 5 ARGs and clinical variables Based on the gene expression and corresponding clinical data acquired from the TCGA database, we analyzed the correlations between these clinical variables and the risk score formed by 5 ARGs. The results indicated that the risk score was correlated with the stage; BMF was associated with age, gender, grade, and stage; PPP2R5B and LGALS3 were related to the stage; SQSTM1 was associated with age and gender; TOP2A was connected with grade and stage (Fig. 9).

Discussion
HCC is the most common type of malignancy with a high recurrence rate and worse prognosis [16,17]. Studies have shown limitations in achieving early diagnosis and treatment of HCC [18]. Therefore, the identi cation of a prognosis-related biomarker and the development of a prognostic model is of critical importance for HCC patients. Currently, bioinformatics analysis plays an increasingly signi cant role in identifying therapeutic targets for diagnosis, treatment, and prognosis of numerous tumors [19]. The process of apoptosis plays a crucial role in liver tumor development and regeneration [12,20]. Insu cient apoptosis potentially leads to occurrence and development of liver tumors [21].
In this work, we developed a novel prognosis-predictive model for HCC based on the expression of ARGs. Based on the 161 ARGs list from the GSEA and the data from the TCGA database, 43 signi cantly upregulated ARGs and 8 signi cantly downregulated ARGs in HCC specimens compared with normal specimens were screened out. These 51 genes mainly participated in the pathways associated with cellular apoptosis according to GO enrichment analysis. KEGG analyzes revealed pathways correlated with these 51 genes which included MAPK, P53, TNF, PI3K-Akt signaling pathways. Besides, ve prognosis-related ARGs were screened after the multivariate Cox regression which contained PPP2R5B, SQSTM1, TOP2A, BMF, and LGALS3. The high expressions having these ve ARGs were negatively associated with prognosis. Based on this, we designed a risk model that considered an independent prognostic model of HCC with multivariate regression analysis. The predictive value of the model was con rmed by performing the K-M curve and ROC curve that showed the highest prognostic predictive power of the model with other clinical features of HCC patients. Therefore, by combining the risk score and clinical features as the basis of the nomogram, facilitate e ciency in predicting the prognosis of HCC patients.
The potential impact of these ve hub ARGs on the progression of HCC were earlier observed. For example, Topoisomerase II alpha (TOP2A), was identi ed as a potential biomarker for cancer therapy in ovarian cancer, colon cancer, pancreatic adenocarcinoma and HCC [9,19,22,23]. However, TOP2A was up-regulated in HCC [24]. Studies have proved that overexpressed TOP2A in HCC leads to a worse prognosis and its reactive agents has a potential therapeutic effect on HCC patients [25]. SQSTM1, an autophagy-related protein, degrades during autophagy. Moreover, autophagy was demonstrated to suppress spontaneous tumorigenesis in the liver [26]. Besides, BMF, one of the Bcl-2 family members, promotes apoptosis by inactivating pro-survival Bcl-2-like proteins via BH3 domain following its activation by stress signals [27]. Laura and Xinping et al [28,29] revealed a close relationship between BMF and activated caspase-3as well as apoptosis inhibition using miR-221inhibits by targeting BMF in HCC and ovarian cancer cells. Besides, BMF inhibition promotes survival of multiple myeloma [30].
LGALS3 plays a signi cant role in the progression and metastasis of colon cancer, acute myeloid leukemia, melanoma and pituitary tumors [31][32][33][34]. Recent research had indicated that LGALS3 enhanced the tumorigenesis and metastasis of HCC cells via β-catenin signaling [35]. PPP2R5B (B56β) as the regulatory subunit of PP2A can in uence cell growth, survival, and metabolism [36]. PPP2R5B mutation is hypothesized to cause human overgrowth [37], though its role in HCC had remained elusive.
In summary, the present study established a 5-gene risk model and constructed a nomogram for predicting outcomes of HCC patients for classi cation in clinical practice. Moreover, the study results provided an advanced survival prediction tool for HCC patients and revealed the association between ARGs and HCC that can further be con rmed by corresponding experimental studies.

Declarations
Availability of data and meterials All the data and materials are available.

Ethics approval and consent participate
There were no cell, tissue, or animal studies. No ethical requirements are involved.

Consent for publication
Not applicable.

Con icts of Interest
The authors declare that they have no competing interests.
Authors' contributions RL and GW designed the study; RL collected data and wrote the manuscript; GW performed the data analysis and drew gures and tables; RL, GW, and DB reviewed and revised the manuscript. Besides, the organization, revision and submission of this manuscript were completed by DB. Final draft were read and approved by all authors.

Funding
This project was funded by The National Natural Science Foundation of China (No. 81871909), "13th ve-year Plan" Science and Education strong Health Project leading personnel of Yangzhou (LJRC20181; YZCXTD201801), Provincial-level discipline leader of the NJPH (DTRC201809), Research Funds for Tian Qing Liver Diseases (TQGB20200180).  Figure 4 Boxplot and immunohistochemical results of the expression levels of 5 screened ARGs between HCC and para-carcinoma tissues according to the Human Protein Atlas (HPA) database.