Identifying hepatocellular carcinoma patients with survival benefits from surgery combined with chemotherapy: based on machine learning model
World Journal of Surgical Oncology volume 20, Article number: 377 (2022)
Hepatocellular carcinoma (HCC) is still fatal even after surgical resection. The purpose of this study was to analyze the prognostic factors for 5-year survival and to establish a model that identifies HCC patients who benefit from surgery combined with chemotherapy.
All patients with HCC who underwent surgery from January 2010 to December 2015 were selected from the Surveillance, Epidemiology, and End Results (SEER) database. Univariate and multivariate logistic analyses were used to identify prognostic factors, and a risk prediction model for the 5-year survival rate of HCC patients was established using the classical decision tree method. Propensity score matching was used to balance confounding factors between patients who did and did not receive chemotherapy within the high-risk and low-risk groups.
A total of 1625 eligible HCC patients were included in the study. Marital status, α-fetoprotein (AFP), vascular infiltration, tumor size, number of lesions, and grade were independent prognostic factors affecting the 5-year survival rate of HCC patients. The area under the curve of the 5-year survival risk prediction model constructed from these variables was 0.76, and the classification accuracy, precision, recall, and F1 score were 0.752, 0.83, 0.842, and 0.836, respectively. High-risk patients classified by the prediction model had a better 5-year survival rate after chemotherapy, whereas in the low-risk group there was no difference in 5-year survival rate between patients who received chemotherapy and those who did not.
The 5-year survival risk prediction model constructed in this study provides accurate survival prediction information. High-risk patients identified by the prediction model may gain a 5-year survival benefit from surgery combined with chemotherapy.
Liver cancer ranks fourth among malignancies in mortality worldwide, accounting for about 782,000 deaths each year, of which 85% are hepatocellular carcinoma (HCC). At present, surgery is the most important curative treatment for patients with HCC, but the 5-year recurrence rate is more than 50%, and the overall 5-year survival rate is only 18% [2, 3]. How, then, can postoperative recurrence be reduced and postoperative survival improved in HCC patients? Recently, adjuvant therapy has been shown to improve survival in patients after HCC surgery. In a study of 200 patients with postoperative HCC, adjuvant transarterial chemoembolization significantly improved disease-free survival in patients with tumor size > 5 cm. In a systematic review of 277 patients after HCC surgery, adjuvant immunotherapy was found to reduce the recurrence rate of the disease. Some trials have also found that antiviral therapy can improve the prognosis of patients with HBV or HCV after HCC surgery [6, 7]. However, it is not yet clear which patients benefit from adjuvant therapy, and its indications remain controversial. Accurately predicting prognosis and rationally selecting patients for adjuvant therapy are therefore important issues to explore next.
A large number of indicators have been explored for predicting the survival of patients with HCC after surgery. However, serum α-fetoprotein (AFP) remains the only indicator used for postoperative prognosis prediction and follow-up in clinical practice, although its predictive efficiency is limited [8, 9]. The most effective way to improve prediction accuracy is to combine multiple indicators into a prediction model. In this study, we used a decision tree model, a prediction tool that uses categorical and numerical data to assign samples to specific categories. Unlike models such as artificial neural networks (ANN), the thresholds and category predictions calculated by decision tree models often have practical explanations that give clinicians an intuitive basis for decisions. The decision tree model is also especially suitable for small-sample datasets. Recently, it has been gradually incorporated into tumor staging because it can use selected factors to classify patients into subgroups with different prognoses [11, 12].
Therefore, a decision tree model was constructed based on the clinical information of postoperative HCC patients from the Surveillance, Epidemiology, and End Results (SEER) database, and the survival benefit of chemotherapy was evaluated in high- and low-risk patients identified by this model. The present study may provide a new method and reference for the postoperative management of patients with HCC in the future.
Material and methods
Data acquisition and study design
Data for all patients diagnosed with HCC between January 2010 and December 2015 were downloaded from the SEER database (Fig. 1). We mainly wanted to study the prognosis of adult primary liver cancer with no lymph node involvement and no distant metastasis after hepatectomy. Inclusion criteria were as follows: patients who underwent resection or lobectomy; localized stage; AJCC staging N0, M0, and not TX; alive or dead due to hepatocellular carcinoma; only one primary tumor; and no benign or borderline tumors [13, 14]. Exclusion criteria were as follows: clinical diagnosis only or unknown, reporting source of autopsy only, survival time of 0 months or unknown, and age at diagnosis < 18 [15,16,17]. The endpoint of this study was 5-year cancer-specific death (CSD).
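The inclusion and exclusion criteria above can be read as a single eligibility filter. The following is a minimal sketch only: the `Record` fields and their string codes are hypothetical (SEER exports use different column names and coding), and only a subset of the listed criteria is represented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical field names and codes, not actual SEER variables.
    age: int
    surgery: str                     # e.g. "resection", "lobectomy", "none"
    n_stage: str
    m_stage: str
    t_stage: str
    survival_months: Optional[int]   # None when survival time is unknown
    primary_tumors: int
    diagnosis_source: str            # e.g. "histology", "clinical_only", "autopsy_only"


def is_eligible(r: Record) -> bool:
    """Apply a subset of the stated inclusion and exclusion criteria."""
    included = (
        r.surgery in {"resection", "lobectomy"}
        and r.n_stage == "N0"
        and r.m_stage == "M0"
        and r.t_stage != "TX"
        and r.primary_tumors == 1
    )
    excluded = (
        r.diagnosis_source in {"clinical_only", "unknown", "autopsy_only"}
        or r.survival_months is None
        or r.survival_months == 0
        or r.age < 18
    )
    return included and not excluded
```

Keeping inclusion and exclusion as two separate expressions mirrors how the criteria are stated in the text and makes each one easy to audit.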
Model construction and validation
A total of 1625 eligible patients with HCC were divided into a training group and a validation group at a 4:1 ratio using block randomization. Risk factors for 5-year CSD were determined by univariate and multivariate logistic analysis. Next, a risk prediction model for 5-year CSD of patients with HCC after surgery was established using the classical decision tree method. The classical decision tree was based on a binary output variable and predictor variables, and all input variables were optimized for binary classification. For a continuous variable, a cutoff value was selected to maximize the purity of the two resulting categories. The reliability of the model was evaluated by the receiver operating characteristic (ROC) curve. Optimal sensitivity and specificity were considered when determining the cutoff value used to identify high- and low-risk patients. The validation group was used to verify the prediction performance of the model.
The decision tree model was constructed with Orange3 software, and the remaining analyses were performed with SPSS and R software. Continuous variables were presented as mean ± SD and compared using the t-test, and categorical variables were compared using the χ2 test. Logistic analysis was used for univariate and multivariate analyses. The decision tree method was used for model construction. Area under the curve (AUC), F1 score, precision, and recall were used for model evaluation. After variables with > 30% missing data were removed, the missForest package was used to impute the remaining missing values. The propensity score-matching (PSM) method was used to correct for significant baseline differences between patients with and without chemotherapy within the high- and low-risk groups. P < 0.05 was considered statistically significant.
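Choosing a cutoff that maximizes the purity of the two resulting categories can be illustrated with a small sketch. It assumes Gini impurity as the purity measure, which the text does not specify, and the function names are illustrative only; it is not the Orange3 implementation.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_cutoff(values, labels):
    """Return (cutoff, weighted_impurity) for the split value <= cutoff vs value > cutoff.

    Candidate cutoffs are midpoints between consecutive distinct values;
    the one with the lowest size-weighted impurity is returned.
    """
    pairs = list(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        cut = (lo + hi) / 2
        left = [y for x, y in pairs if x <= cut]
        right = [y for x, y in pairs if x > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (cut, score)
    return best


# Toy data: tumor size in cm and a 0/1 outcome label (made-up values).
sizes = [1, 2, 3, 6, 7, 8]
outcome = [0, 0, 0, 1, 1, 1]
print(best_cutoff(sizes, outcome))  # (4.5, 0.0)
```

On this toy input the split at 4.5 separates the two classes completely, so the weighted impurity is zero. Real data rarely separate this cleanly, and the selected cutoff is simply the least impure candidate.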
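As a rough illustration of the matching step in PSM, the sketch below pairs treated and untreated patients whose propensity scores are close. The matching ratio, caliper, and algorithm used in the study are not stated, so greedy 1:1 nearest-neighbor matching with a caliper of 0.05 is purely an assumption, and the propensity scores are taken as given rather than estimated.

```python
def match_one_to_one(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping a patient id to a propensity score.
    Returns a list of (treated_id, control_id) pairs whose scores differ
    by at most `caliper`; unmatched patients are dropped.
    """
    available = dict(control)  # copy so the caller's dict is not modified
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Made-up scores for illustration.
chemo = {"t1": 0.30, "t2": 0.70}
no_chemo = {"c1": 0.31, "c2": 0.69, "c3": 0.10}
print(match_one_to_one(chemo, no_chemo))  # [('t1', 'c1'), ('t2', 'c2')]
```

After matching, baseline variables such as AFP would be compared again between the paired groups to confirm that the imbalance has been removed.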
Subjects grouping and clinical characteristics
In accordance with the 4:1 rule, 1625 eligible patients were randomly divided into the training cohort (n = 1300) and the validation cohort (n = 325). There were differences in race and 5-year CSD between the two groups and no differences in age at diagnosis, gender, marital status, grade, AFP level, vascular invasion, tumor size, number of lesions, AJCC_T stage, and whether to receive chemotherapy (Table 1).
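A 4:1 split of 1625 patients yields exactly 1300 and 325 when every block of five patients contributes four to training and one to validation. The sketch below shows one way block randomization could produce that split; the block size, seed, and function name are assumptions, since the text does not describe the procedure in detail.

```python
import random


def block_randomize(ids, block_size=5, n_validation=1, seed=0):
    """Within each consecutive block, randomly send n_validation ids to
    validation and the remainder to training."""
    rng = random.Random(seed)
    train, valid = [], []
    for start in range(0, len(ids), block_size):
        block = ids[start:start + block_size]  # slicing copies the block
        rng.shuffle(block)
        valid.extend(block[:n_validation])
        train.extend(block[n_validation:])
    return train, valid


train_ids, valid_ids = block_randomize(list(range(1625)))
print(len(train_ids), len(valid_ids))  # 1300 325
```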
Determination of independent risk factors
Univariate (Fig. 2A) and multivariate (Fig. 2B) logistic analyses were conducted in the training group to obtain independent risk factors. Univariate analysis of the clinical parameters showed that marital status, grade, AFP level, vascular invasion, tumor size, number of lesions, and T stage were related to the 5-year CSD of patients. Multivariate analysis showed that marital status, grade, AFP, vascular invasion, tumor size, and number of lesions were independent risk factors for 5-year CSD of patients. Being married was a favorable prognostic factor for HCC, whereas AFP positivity and vascular invasion indicated a poor prognosis. In addition, the lower the degree of differentiation, the larger the tumor, and the greater the number of tumors, the worse the prognosis.
Construction and verification of decision tree model
The independent risk factors derived from multivariate logistic analysis of the training group were used to construct a risk prediction model for 5-year CSD using a decision tree algorithm. The model is shown in Figs. 3 and 4. Figure 3 shows the results of classifying patients without vascular invasion using the decision tree model. Of these patients, 171 (16.3%, 171/1047) were at high risk of 5-year CSD. The figure shows that tumor size > 5 cm is a risk factor for 5-year CSD (32.5%, 129/397) and that patients with poorly differentiated or undifferentiated tumors are a high-risk group for 5-year CSD (77.9%, 74/95). Figure 4 shows the results of classifying patients with vascular invasion using the decision tree model. Of these patients, 166 (65.6%, 166/253) were at high risk of 5-year CSD. Consistent with the above results, tumor size > 5 cm (55%, 44/80) and poorly differentiated or undifferentiated grade (93%, 93/100) were the main risk factors for 5-year CSD after liver cancer surgery. We then calculated the calibration curve of the model and found that the model fit well (Fig. 5A). We compared the ROC curves (Fig. 5B) of the decision tree and logistic regression and found that the decision tree model (AUC = 0.76) had stronger predictive ability than logistic regression (AUC = 0.679). We then determined the threshold of the model (threshold = 0.64) according to precision and recall (Fig. 5C). Patients were classified as high risk (survival rate ≤ 0.64) or low risk (survival rate > 0.64) according to this threshold. We also calculated the F1 score (F1 = 0.836, Fig. 5D) and classification accuracy (classification accuracy = 0.752, Fig. 5E) of the model at a threshold of 0.64. In the validation set, at a threshold of 0.64, the AUC, classification accuracy, precision, recall, and F1 score were 0.729, 0.757, 0.873, 0.824, and 0.848, respectively (Table 2).
According to the model, all patients (n = 1625) with HCC undergoing surgery could be divided into two groups, of which 413 cases were high-risk group and 1212 cases were low-risk group (Additional file 1). These data suggested that the decision tree model had good prediction performance.
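The reported F1 scores can be checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below does this and also shows the threshold rule for risk grouping. The function names are illustrative, and treating the model output as a predicted survival probability is an assumption based on the wording "survival rate ≤ 0.64".

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def risk_group(predicted_survival, threshold=0.64):
    """High risk if the predicted survival probability is at or below the threshold."""
    return "high" if predicted_survival <= threshold else "low"


print(round(f1_score(0.83, 0.842), 3))   # 0.836 (training set)
print(round(f1_score(0.873, 0.824), 3))  # 0.848 (validation set)
print(risk_group(0.64), risk_group(0.65))  # high low
print(413 + 1212)  # 1625, the full cohort
```

Both recomputed F1 values agree with the figures reported for the training and validation sets, and the two risk groups add up to the full cohort.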
Effect of surgery combined with chemotherapy on high-risk and low-risk patients
To further explore the effect of surgery combined with chemotherapy on the prognosis of HCC patients, the high-risk group and low-risk group were each divided into two subgroups according to whether the patients had received chemotherapy. In the high-risk group, there was a significant difference in AFP between surgery alone and surgery combined with chemotherapy. To eliminate this confounding factor, we applied PSM. After PSM correction, there was a significant difference in 5-year CSD between the two subgroups. The 5-year survival rate of patients treated with surgery alone was 15.5% (11/71), and that of patients treated with surgery combined with chemotherapy was 35.2% (25/71) (Table 3). In the low-risk group, there were significant differences in AFP, number of lesions, and grade between surgery alone and surgery combined with chemotherapy. We used PSM to eliminate these confounding factors and found no difference in 5-year CSD between the two subgroups. These data suggested that surgery combined with chemotherapy can significantly improve the prognosis of HCC patients in the high-risk group but has no effect on the prognosis of HCC patients in the low-risk group (Table 4).
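The comparison in the matched high-risk group can be laid out as a 2 × 2 table built from the reported counts (11 of 71 survivors with surgery alone, 25 of 71 with surgery plus chemotherapy). The sketch below applies a Pearson χ2 test without continuity correction to that table. This is an illustrative assumption: the text does not state which test was applied after matching, and a test that accounts for the matched pairs may be more appropriate.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: surgery alone, surgery plus chemotherapy. Columns: survived, died.
stat, p = chi_square_2x2(11, 71 - 11, 25, 71 - 25)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

The value computed here is a check on the reported counts only and should not be read as the P value reported in Table 3.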
Advances in surgical resection, ablation, and liver transplantation have improved the prognosis of HCC patients to some extent, but compared with other common human cancers, the long-term survival rate of HCC patients is still not ideal because of the high recurrence rate and the lack of effective adjuvant therapy [20, 21]. Therefore, postoperative patients with different risk levels need stratified management and targeted treatment in order to improve the long-term survival rate of patients with liver cancer. In this study, univariate and multivariate logistic regression analysis showed that tumor size, vascular invasion, AFP level, and number of lesions were independent risk factors for 5-year CSD. Being married was a favorable prognostic factor for HCC, whereas AFP positivity and vascular invasion indicated a poor prognosis. In addition, the lower the degree of differentiation, the larger the tumor, and the greater the number of tumors, the worse the prognosis. Previous studies have shown that tumor size, vascular invasion, AFP level, and number of lesions may affect the prognosis of patients with HCC, which is consistent with the results of this study [22,23,24]. Interestingly, this study found that marital status was also an independent risk factor for 5-year CSD. This is in keeping with previous reports that married patients had better 5-year HCC cause-specific survival than unmarried patients (46.7% vs 37.8%). Marital status is an important prognostic factor for survival in patients with HCC treated with surgical resection.
Postoperative prognosis models for HCC have been reported previously. Shim et al. established a survival nomogram for postoperative HCC patients (AUC = 0.66). This study also constructed a logistic regression model (AUC = 0.679). In contrast, the decision tree model in this study (AUC = 0.760) had better prediction performance and therefore appears to have greater potential for clinical application. In the present study, vascular invasion, tumor size, and poor differentiation were the main risk factors for 5-year CSD in HCC patients after surgery, which is in keeping with previous studies [27, 28]. The prognosis of patients with vascular invasion, tumor size > 5 cm, or poorly differentiated tumors is poor. The decision tree prediction model in this study can accurately identify patients at high risk of 5-year CSD after HCC surgery, help to realize patient-specific early diagnosis and treatment, and further improve the prognosis of HCC patients.
In recent years, some studies have found that surgical resection of HCC combined with chemotherapy can improve the postoperative survival rate [29,30,31]. However, no clinical guidelines recommend the routine use of surgery combined with chemotherapy for HCC patients because it remains uncertain who benefits. In this study, among patients classified as high risk by the decision tree model, the prognosis was significantly improved after surgery combined with chemotherapy, whereas among low-risk patients, surgery combined with chemotherapy produced no significant change in 5-year CSD. This means that the prognostic model established in this study can provide a reference for guiding the management of postoperative adjuvant chemotherapy.
The data source of this study is the SEER database, which is an important resource for practical research in oncology. A total of 1625 HCC patients with complete clinical data were included. The characteristic distribution of the data is normal, and the model has good prediction performance in both the training set and the validation set, which provides a sufficient and reliable basis for further clinical application. However, this study also has some limitations. Because this study is based on a public database, the collection of clinical data is limited to the items provided in the data set, and it is impossible to explore more possible prognostic factors. In addition, the prognostic risk prediction model constructed in this study still needs external validation to further confirm its effectiveness.
The 5-year CSD prediction model based on the decision tree algorithm provides accurate prediction information. High-risk patients identified by the prediction model may gain a 5-year survival benefit from surgery combined with chemotherapy. The prediction model is expected to provide a reference for the postoperative management of patients with HCC in the future.
Availability of data and materials
Publicly available datasets were analyzed in this study. This data can be found here: Surveillance, Epidemiology, and End Results (SEER) database (https://seer.cancer.gov/).
Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2018;68(6):394–424.
Bolondi L, Sofia S, Siringo S, Gaiani S, Casali A, Zironi G, et al. Surveillance programme of cirrhotic patients for early diagnosis and treatment of hepatocellular carcinoma: a cost effectiveness analysis. Gut. 2001;48(2):251–9.
Siegel RL, Miller KD, Jemal A. Cancer statistics, 2020. CA Cancer J Clin. 2020;70(1):7–30.
Qi YP, Zhong JH, Liang ZY, Zhang J, Chen B, Chen CZ, et al. Adjuvant transarterial chemoembolization for patients with hepatocellular carcinoma involving microvascular invasion. Am J Surg. 2019;217(4):739–44.
Zhu GQ, Shi KQ, Yu HJ, He SY, Braddock M, Zhou MT, et al. Optimal adjuvant therapy for resected hepatocellular carcinoma: a systematic review with network meta-analysis. Oncotarget. 2015;6(20):18151–61.
Xu J, Li J, Chen J, Liu ZJ. Effect of adjuvant interferon therapy on hepatitis b/c virus-related hepatocellular carcinoma after curative therapy - meta-analysis. Adv Clin Exp Med. 2015;24(2):331–40.
Xia BW, Zhang YC, Wang J, Ding FH, He XD. Efficacy of antiviral therapy with nucleotide/nucleoside analogs after curative treatment for patients with hepatitis B virus-related hepatocellular carcinoma: a systematic review and meta-analysis. Clin Res Hepatol Gastroenterol. 2015;39(4):458–68.
Yao FY, Ferrell L, Bass NM, Watson JJ, Bacchetti P, Venook A, et al. Liver transplantation for hepatocellular carcinoma: expansion of the tumor size limits does not adversely impact survival. Hepatology. 2001;33(6):1394–403.
Park H, Park JY. Clinical significance of AFP and PIVKA-II responses for monitoring treatment outcomes and predicting prognosis in patients with hepatocellular carcinoma. Biomed Res Int. 2013;2013:310427.
Podgorelec V, Kokol P, Stiglic B, Rozman I. Decision trees: an overview and their use in medicine. J Med Syst. 2002;26(5):445–63.
Mitra AP, Skinner EC, Miranda G, Daneshmand S. A precystectomy decision model to predict pathological upstaging and oncological outcomes in clinical stage T2 bladder cancer. BJU Int. 2013;111(2):240–8.
Cao F, Shen L, Qi H, Xie L, Song Z, Chen S, et al. Tree-based classification system incorporating the HVTT-PVTT score for personalized management of hepatocellular carcinoma patients with macroscopic vascular invasion. Aging (Albany NY). 2019;11(21):9544–55.
Wu Z, Chen W, Ouyang T, Liu H, Cao L. Management and survival for patients with stage-I hepatocellular carcinoma: an observational study based on SEER database. Medicine (Baltimore). 2020;99(41):e22118.
Zheng L, Zhang CH, Lin JY, Song CL, Qi XL, Luo M. Comparative effectiveness of radiofrequency ablation vs. surgical resection for patients with solitary hepatocellular carcinoma smaller than 5 cm. Front Oncol. 2020;10:399.
Li W, Xiao H, Wu H, Xu X, Zhang Y. Liver transplantation versus liver resection for stage I and II hepatocellular carcinoma: results of an instrumental variable analysis. Front Oncol. 2021;11:592835.
Golabi P, Fazel S, Otgonsuren M, Sayiner M, Locklear CT, Younossi ZM. Mortality assessment of patients with hepatocellular carcinoma according to underlying disease and treatment modalities. Medicine (Baltimore). 2017;96(9):e5904.
Yang D, Hanna DL, Usher J, LoCoco J, Chaudhari P, Lenz HJ, et al. Impact of sex on the survival of patients with hepatocellular carcinoma: a Surveillance, Epidemiology, and End Results analysis. Cancer. 2014;120(23):3707–16.
Godec P, Pancur M, Ilenic N, Copar A, Strazar M, Erjavec A, et al. Democratized image analytics by visual programming through integration of deep models and small-scale machine learning. Nat Commun. 2019;10(1):4551.
Stekhoven DJ, Buhlmann P. MissForest--non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112–8.
Llovet JM, Schwartz M, Mazzaferro V. Resection and liver transplantation for hepatocellular carcinoma. Semin Liver Dis. 2005;25(2):181–200.
Poon RT. Prevention of recurrence after resection of hepatocellular carcinoma: a daunting challenge. Hepatology. 2011;54(3):757–9.
Pawlik TM, Delman KA, Vauthey JN, Nagorney DM, Ng IO, Ikai I, et al. Tumor size predicts vascular invasion and histologic grade: implications for selection of surgical treatment for hepatocellular carcinoma. Liver Transpl. 2005;11(9):1086–92.
Liu Y, Wang ZX, Cao Y, Zhang G, Chen WB, Jiang CP. Preoperative inflammation-based markers predict early and late recurrence of hepatocellular carcinoma after curative hepatectomy. Hepatobiliary Pancreat Dis Int. 2016;15(3):266–74.
Vogel A, Cervantes A, Chau I, Daniele B, Llovet JM, Meyer T, et al. Hepatocellular carcinoma: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2019;30(5):871–3.
Wu C, Chen P, Qian JJ, Jin SJ, Yao J, Wang XD, et al. Effect of marital status on the survival of patients with hepatocellular carcinoma treated with surgical resection: an analysis of 13,408 patients in the surveillance, epidemiology, and end results (SEER) database. Oncotarget. 2016;7(48):79442–52.
Shim JH, Jun MJ, Han S, Lee YJ, Lee SG, Kim KM, et al. Prognostic nomograms for prediction of recurrence and survival after curative liver resection for hepatocellular carcinoma. Ann Surg. 2015;261(5):939–46.
Kokudo T, Hasegawa K, Matsuyama Y, Takayama T, Izumi N, Kadoya M, et al. Survival benefit of liver resection for hepatocellular carcinoma associated with portal vein invasion. J Hepatol. 2016;65(5):938–43.
Liu PH, Hsu CY, Hsia CY, Lee YH, Su CW, Huang YH, et al. Prognosis of hepatocellular carcinoma: assessment of eleven staging systems. J Hepatol. 2016;64(3):601–8.
Cheng YC, Chen TW, Fan HL, Yu CY, Chang HC, Hsieh CB. Transarterial chemoembolization for intrahepatic multiple recurrent HCC after liver resection or transplantation. Ann Transplant. 2014;19:309–16.
Choi JW, Park JY, Ahn SH, Yoon KT, Ko HK, Lee DY, et al. Efficacy and safety of transarterial chemoembolization in recurrent hepatocellular carcinoma after curative surgical resection. Am J Clin Oncol. 2009;32(6):564–9.
Peng BG, He Q, Li JP, Zhou F. Adjuvant transcatheter arterial chemoembolization improves efficacy of hepatectomy for patients with hepatocellular carcinoma and portal vein tumor thrombus. Am J Surg. 2009;198(3):313–8.
We would like to thank the entire staff of the National Cancer Institute who participated in the Surveillance, Epidemiology, and End Results (SEER) project.
The project was funded by The Special Project of the First Affiliated Hospital, Chengdu Medical College [Grant No. CYFY2019ZD03], the School Foundation of Chengdu Medical College [Grant No. CYZYB21-05], and the Project of Chengdu Medical Research [Grant No. 2021015], and Science and Technology Department of Sichuan Province [Grant No. 2023NSFSC1249].
Ethics approval and consent to participate
Not applicable. SEER is a publicly available anonymous data source, so this study was not reviewed by a Human Subjects Committee.
Consent for publication
The authors declare that they have no competing interests.
Hu, J., Gong, N., Li, D. et al. Identifying hepatocellular carcinoma patients with survival benefits from surgery combined with chemotherapy: based on machine learning model. World J Surg Onc 20, 377 (2022). https://doi.org/10.1186/s12957-022-02837-2
- Hepatocellular carcinoma
- Machine learning