Nomogram based on homogeneous and heterogeneous associated factors for predicting distant metastases in patients with colorectal cancer

Background: The identification of the homogeneous and heterogeneous risk factors for different types of metastases in colorectal cancer (CRC) may shed light on the aetiology and help individualize prophylactic treatment. The present study characterized the incidence differences and identified the homogeneous and heterogeneous risk factors associated with distant metastases in CRC.
Methods: CRC patients registered in the SEER database between 2010 and 2016 were included in this study. Logistic regression was used to analyse homogeneous and heterogeneous risk factors for the occurrence of different types of metastases. Nomograms were constructed to predict the risk for developing metastases, and the performance was quantitatively assessed using the receiver operating characteristics (ROC) curve and calibration curve.
Results: A total of 204,595 eligible CRC patients were included in our study, and 17.07% of them had distant metastases. The overall incidences of liver metastases, lung metastases, bone metastases, and brain metastases were 15.34%, 5.22%, 1.26%, and 0.29%, respectively. The incidence of distant metastases differed by age, gender, and the original CRC sites. Poorly differentiated grade, more lymphatic metastasis, higher carcinoembryonic antigen (CEA), and different metastatic organs were all positively associated with four patterns of metastases. In contrast, age, sex, race, insurance status, position, and T stage were heterogeneously associated with metastases. The calibration and ROC curves exhibited good performance for predicting distant metastases.
Conclusions: The incidence of distant metastases in CRC exhibited distinct differences, and the patients had homogeneous and heterogeneous associated risk factors. Although limited risk factors were included in the present study, the established nomogram showed good prediction performance.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12957-021-02140-6.


Background
Colorectal cancer (CRC) ranks as the third most commonly diagnosed malignancy and the second leading cause of cancer death worldwide [1]. Approximately 1.8 million new cases and 880,000 deaths were estimated by the International Agency for Research on Cancer in 2018 [2]. Distant metastases have a significant impact on the prognosis of CRC. A previous study showed that the 5-year survival rate of patients with distant metastases was only 14%, whereas that of patients with localized-stage CRC was 90% [3]. Studies investigating the incidence of liver, lung, bone, and brain metastases in CRC are relatively rare, and the findings remain controversial [4][5][6][7]. Few studies have investigated the risk factors for specific organ metastases in CRC [8,9]. Overall, there has been no systematic research examining the homogeneous and heterogeneous risk factors for distant metastases in patients with CRC, and existing predictive models are not ideal because of their limited sample sizes.
The present study characterized the differences in incidence and in risk factors for synchronous liver, lung, bone, and brain metastases in CRC patients based on the Surveillance, Epidemiology, and End Results (SEER) database. We constructed a nomogram model to predict the probability of specific organ metastases. Early identification of risk factors for distant metastases may help predict the probability of metastases, improve survival, and provide a deeper understanding of the pathogenesis of different organ metastases in CRC patients.

Population
Data in this population-based study were retrieved from the SEER database, an open public database of the US National Cancer Institute (NCI). Data collection for metastatic sites, such as the liver and lung, started in 2010, and the latest data are available through December 31, 2016. Distant metastases in the SEER database were recorded at the initial diagnosis of CRC, which means that the distant metastases were all synchronous metastases. CRC patients diagnosed between 2010 and 2016, with or without synchronous liver, lung, bone, or brain metastases, were included in the present study. Cases diagnosed at autopsy or via death certificates, with unspecified follow-up, or with an unknown first tumour site were excluded. Patients without distant metastasis information were also excluded. Because this study used previously collected data, it was exempt from the ethical review of the ethics board of the First Affiliated Hospital of Chongqing Medical University. SEER*Stat version 8.3.5 (https://seer.cancer.gov/seerstat/) (Information Management Services, Inc., Calverton, MD, USA) was used for case listing.

Statistical analysis
Quantitative data are presented as the mean ± standard deviation (SD), and categorical data are described as numbers and percentages (N, %). Univariate and multivariable logistic regression models were used to determine the factors associated with distant metastases in CRC. Factors with P < 0.05 in the univariate analysis were incorporated into the multivariable regression model. Based on the results of the logistic analysis, the risk factors for the four types of metastases were intersected: factors shared by all four types were considered homogeneous, and the remaining factors were considered heterogeneous. Predictive nomograms for liver metastases, lung metastases, bone metastases, and brain metastases were formulated. The receiver operating characteristic (ROC) curve, area under the curve (AUC), C-index, and calibration curves were used to evaluate their performance. All tests were two-tailed, and statistical significance was set at P < 0.05. Statistical analyses were performed using the IBM Statistical Package for the Social Sciences (SPSS) version 23.0 software package for Windows (SPSS, Inc., Chicago, IL, USA). The nomogram was plotted using the "rms" and "dca.R" packages in R version 3.4.1 (R Foundation for Statistical Computing, Vienna, Austria; www.r-project.org), and all ROC curves were generated using MedCalc 18.2.1.
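For readers unfamiliar with the discrimination metrics named above, the following minimal Python sketch shows how an AUC can be computed from predicted risks and observed outcomes. For a binary outcome, the C-index equals the AUC: it is the proportion of case/non-case pairs in which the case receives the higher predicted risk, with ties counted as one half. The function name `auc` and the example values are hypothetical and are not taken from the study, which used MedCalc for its ROC analysis.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise concordance (Mann-Whitney)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # patients with metastasis
    neg = [s for s, y in zip(scores, labels) if y == 0]  # patients without
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # case ranked above non-case
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted risks and observed outcomes (1 = metastasis present).
predicted = [0.05, 0.10, 0.20, 0.35, 0.40, 0.70, 0.80, 0.90]
observed = [0, 0, 1, 0, 0, 1, 1, 1]
example_auc = auc(predicted, observed)
print(example_auc)
```

The pairwise loop is quadratic in the number of patients, so for a cohort of the size used here a rank-based implementation from a statistics library would be preferable; the sketch only illustrates the definition.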

Demographic and clinical characteristics
A total of 211,266 eligible CRC patients from 2010 to 2016 were selected from the SEER database. After excluding patients with unknown distant metastasis information, this study ultimately included 204,595 patients with/without distant metastases. Of these patients, 31,288 had liver metastases, 10,598 had lung metastases, 2,553 had bone metastases, and 587 had brain metastases (Fig. 1). The mean age of all patients with distant metastases was 64.92 ± 14.33 years (range 4 to 108), 52.0% of the patients were male (N = 106,488), and 48.0% were female (N = 98,107). Most of the patients were white (76.8%, N = 157,037), and 51.2% were married (N = 104,780). Rectal cancer (23.43%, N = 47,933) was the most frequent tumour among the CRC patients. Most CRC patients (41.55%, N = 85,002) had stage T3 cancer, and 58.89% of patients (N = 120,476) had grade IV cancer. The detailed demographic and clinical characteristics are displayed in Table 1.
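The percentages in this paragraph can be re-derived from the reported counts. The short Python sketch below does so using only numbers stated above; the variable and function names (`TOTAL`, `SCREENED`, `pct`) are illustrative choices, not part of the original analysis.

```python
TOTAL = 204_595     # patients retained after exclusions (from the text)
SCREENED = 211_266  # eligible patients before exclusions (from the text)

# Counts as reported in the paragraph above.
counts = {
    "male": 106_488,
    "female": 98_107,
    "white": 157_037,
    "married": 104_780,
    "rectal primary": 47_933,
    "stage T3": 85_002,
}


def pct(n, total=TOTAL, digits=2):
    """Share of the cohort, as a percentage rounded to `digits` places."""
    return round(100 * n / total, digits)


excluded = SCREENED - TOTAL  # patients dropped for unknown metastasis status
shares = {name: pct(n) for name, n in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share}%")
print("excluded:", excluded)
```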
The incidence of distant metastases fluctuated with age. It first increased with the age of patients, and this rising trend was most rapid in patients aged 31 to 51 years. The incidence decreased rapidly in patients > 70 years. The trends of distant metastases in males and females were roughly similar, but the incidence in males aged > 71 years decreased much faster than in females (Fig. 2a). Males also had a significantly higher incidence of distant metastases than females (9.42% vs. 7.65%; P < 0.001).
The incidence of distant metastases differed by gender and primary CRC site (Fig. 2b). The highest incidence of distant metastases was observed for the sigmoid colon (3.53%), followed by the rectum (3.47%) and cecum (2.54%), and the lowest incidence was observed for the appendix (0.14%). A similar incidence was observed for the right and left colon sites (6.09% vs. 6.00%, P > 0.05). The incidence of total distant metastases was higher in males than in females (9.42% vs. 7.65%, P < 0.001). However, right colon cancer was more frequent than left colon cancer in females (39.49% vs. 33.34%, P < 0.001), whereas the opposite was true in males (31.66% vs. 37.61%, P < 0.001). The incidence of rectal tumours was lower than that of colon tumours, and there was no difference by sex.
With regard to the combination of primary and metastatic sites, the liver was the most common metastatic site for sigmoid colon cancer (3.25%), and lung, bone, and brain were the most common metastatic sites for rectal cancer (1.43%, 0.34%, and 0.06%, respectively). Bone metastases were least frequently observed in splenic flexure cancer (0.01%). Metastases to the other three organs were least frequently observed for appendix cancer. The left colon had a higher metastatic rate than the right colon, specifically to the liver (5.58% vs 5.53%), lung (1.76% vs 1.53%), and bone (0.38% vs 0.36%). The right colon had a higher incidence rate of brain metastases than the left colon (0.10% vs 0.08%). Overall, the incidence of distant metastases was highest for liver metastases, followed by lung, bone, and brain metastases, although the pattern varied by primary site (Fig. 2c).

Risk factors associated with synchronous distant metastatic CRC
Age, sex, race, marital status, insurance status, left/right colon, histological grade, lymphatic metastasis, T stage, and carcinoembryonic antigen (CEA) correlated with the occurrence of distant metastases by univariate analysis. The multivariable logistic regression model results indicated that male sex, black race, uninsured status, left/right colon, poor histological grade, T stage, and higher CEA were all positively associated with developing metastases (see Table 2). CRC exhibited homogeneity and heterogeneity for the factors associated with metastases in various organs. The associated factors for different sites of metastases are presented in Tables S1, S2, S3, and S4. Poor differentiation grade, more lymphatic metastasis, higher CEA, and different metastatic organs were all positively associated with distant metastatic CRC. Younger age, male sex, black race, uninsured status, left/right colon, and T4/T1 stage were more positively associated with liver metastases. Older age, black race, uninsured status, site, and T4/T1 stage were more positively associated with lung metastases. Younger age, male sex, and rectum/right colon were more positively associated with bone metastases, and younger age, white race, and right colon were more positively associated with brain metastases (Fig. 3).
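The distinction between homogeneous and heterogeneous factors can be illustrated as a set operation. The Python sketch below encodes the factors listed in this paragraph as (factor, direction) pairs and takes their intersection across the four metastatic sites. The encoding, including the label "direction not stated" for the lung "site" factor and the variable names, is an illustrative assumption rather than the authors' implementation.

```python
# Factors reported as positively associated with all four metastatic sites.
shared = {
    ("histological grade", "poor"),
    ("lymphatic metastasis", "more"),
    ("CEA", "higher"),
    ("other metastatic organs", "present"),
}

# Site-specific factors as listed in the text; the encoding is illustrative.
specific = {
    "liver": {("age", "younger"), ("sex", "male"), ("race", "black"),
              ("insurance", "uninsured"), ("primary site", "left vs right colon"),
              ("T stage", "T4 vs T1")},
    "lung": {("age", "older"), ("race", "black"), ("insurance", "uninsured"),
             ("primary site", "direction not stated"), ("T stage", "T4 vs T1")},
    "bone": {("age", "younger"), ("sex", "male"),
             ("primary site", "rectum vs right colon")},
    "brain": {("age", "younger"), ("race", "white"),
              ("primary site", "right colon")},
}

# Full factor set per metastatic site.
factors = {site: shared | extra for site, extra in specific.items()}

# Homogeneous: same factor with the same direction at every site.
homogeneous = set.intersection(*factors.values())

# Heterogeneous: factor names that appear somewhere but are not homogeneous.
all_names = {name for fs in factors.values() for name, _ in fs}
heterogeneous = all_names - {name for name, _ in homogeneous}

print(sorted(name for name, _ in homogeneous))
print(sorted(heterogeneous))
```

Encoding the direction of effect matters here: age is associated with all four sites, but with opposite directions for lung metastases, so it falls in the heterogeneous group.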

Discussion
Previous studies reported incidences of liver metastases, lung metastases, bone metastases, and brain metastases that ranged from 14.5 to 26.5% [10][11][12][13], 2.4 to 6.9% [8,14], 2.7 to 10% [4,15,16], and 0.23 to 3% [4,6,7,15], respectively. Differences in the sample size of each study may have led to these inconsistent results for the same metastatic sites. To the best of our knowledge, the present study is the largest study on incidence, and we found that liver metastases were the most common metastatic pattern in CRC patients, followed by lung metastases and bone metastases. In contrast, metastases to the brain were relatively rare. This result is consistent with the above studies. The incidence of total distant metastases in our study was also similar to that of a previous study [17]. The present study also found that the incidence of distant metastases fluctuated with age. Our study shows that more than 40% of patients developed distant metastases at the age of 51-70 years, and younger patients and older patients tended to have a lower prevalence. Screening programmes can identify patients at an early stage, and these programmes are cost effective [18]. Therefore, it is necessary to perform early screening for CRC patients [19,20]. Our findings showed that males had a higher risk of developing distant metastases than females, and females had a higher incidence of right colon cancer. In contrast, males had a higher incidence of left colon cancer. Although some studies indicated sex and gender differences in colorectal cancer development [21,22], the reasons for this difference are not clear. The biological and pathophysiological differences in CRC distant metastasis development between males and females must also be addressed in the future.
In addition to age and sex, the incidence of distant metastases was also different between CRC sites, which was seldom reported. The present study found that the highest incidence of distant metastases was observed for the sigmoid colon, followed by the rectum, cecum, ascending colon, rectosigmoid junction, transverse colon, descending colon, hepatic flexure, splenic flexure, and appendix. A German study found a similar anatomic site distribution, but that study investigated only colon cancers [23]. Regardless of colon cancer or metastatic CRC, these differences in incidence distribution may be due to molecular biological differences. For example, different anatomical sites exhibit different mutation rates in Ki-ras, p53, and epidermal growth factor receptor [24][25][26]. The present study also showed that colon cancer had a higher incidence of distant metastases than rectal cancer, which is partially consistent with a previous study [27]. Knowledge of these different behaviours based on primary sites may help guide targeted screening and introduce timely individualized interventions. Previous studies showed that there were differences in the incidence of different organ metastases between different types of cancer and different histological types of the same cancer [28,29]. The high incidence of distant metastases in CRC and different incidences in metastatic sites (liver, lung, bone, brain) may partially reflect the homogeneity and heterogeneity of distant metastases from CRC. The present study found that different metastatic sites showed homogeneity and heterogeneity in the factors associated with distant metastases from CRC. Four factors (poorly differentiated grade, more lymphatic metastasis, different metastatic organs, and higher CEA) were positively associated with the four types of metastases (liver, lung, bone, and brain). To our knowledge, the present study is the first study to describe these homogeneous factors for CRC distant metastases. 
These homogeneous associated factors may help with early detection in CRC patients and the development of individualized treatments to improve the prognosis.
However, the heterogeneous factors identified in our study are not entirely consistent with the results of previous studies. For example, we found that age, histological grade, and N stage were associated with brain metastases, which contrasts with previous findings [6]. We also found that age, histological grade, serum levels of CEA, and the number of positive lymph nodes were associated with lung metastases from CRC, and that male sex and rectal cancer were positively associated with bone metastases. These results are consistent with previous studies [5,8]. The heterogeneities in the risk factors may partially be attributed to the different sample sizes because our study included over 200,000 CRC patients, which is more than in previous studies. More studies with larger sample sizes are needed. The biological and pathophysiological mechanisms behind the different risk factors specific to the site of metastasis of CRC are important but remain unclear. For example, male sex was a risk factor for bone metastases, which may be related to sex hormones and their receptors in the colon and lead to the differential development of colon cancer by sex [30,31]. However, the detailed mechanism must be studied in the future.
The present study summarized the homogeneity and heterogeneity of the four types of metastatic CRC that were not comprehensively studied previously. The homogeneous and heterogeneous associated factors mentioned above may help in the surveillance of different types of distant metastases in CRC patients. To assist clinicians in identifying high-risk CRC patients, four predictive nomograms were constructed based on the factors associated with distant metastases. The results of internal validation revealed that the nomograms showed good prediction performance. Traditional early clinical metastatic screening and early diagnosis generally require extra techniques and equipment support, but predictive nomograms based on homogeneous and heterogeneous associated factors may be more cost effective. The nomograms provide a rapid metastatic screening tool. Previous studies have shown that nomograms provide considerable benefits to CRC patients, such as timely targeted therapy, improving the survival rate [32], and reducing the risk of emergency surgery [33]. Therefore, we recommend that CRC patients be screened using the predictive model first; then, the high-risk groups should be examined using PET scans or staging laparoscopies more frequently.
However, the present study has several limitations. A previous study found that never-smokers had a lower mortality risk than current smokers (HR 0.79, 95% CI, 0.64 to 0.99) among CRC patients [34], but smoking status was not included in our analysis. Some treatments, such as surgery, chemotherapy, and radiotherapy, were also not included. Our study investigated synchronous metastases, which are present at the initial diagnosis, so subsequent surgery, radiotherapy, or chemotherapy cannot affect their occurrence. Therefore, we did not include therapy in the analysis of the development of distant metastases. Other clinical factors, such as perforation and obstruction, were not studied because these factors were not available from the SEER database. However, these factors adversely impact outcomes and may affect the survival of CRC patients [35,36]. The SEER database includes only the US population, and the results of this study may not be generalizable to other countries. The nomograms should be validated in other countries before they are used there.
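As an illustration of how a nomogram derived from a logistic regression model turns patient characteristics into a predicted risk, consider the minimal Python sketch below. The intercept, the coefficients, and the predictor names are hypothetical placeholders chosen for readability; they are not the fitted values from this study, and the point scaling shown is only one common convention.

```python
import math

# Hypothetical log-odds coefficients; NOT the fitted values from this study.
INTERCEPT = -3.0
COEFFICIENTS = {
    "poor_grade": 0.6,
    "node_positive": 0.9,
    "cea_elevated": 1.2,
    "t4_stage": 0.8,
}


def predicted_risk(patient):
    """Logistic model: probability = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear = INTERCEPT + sum(
        beta * patient.get(name, 0) for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def nomogram_points(patient):
    """Rescale each contribution so the largest coefficient maps to 100 points."""
    scale = 100.0 / max(abs(beta) for beta in COEFFICIENTS.values())
    return {
        name: round(beta * patient.get(name, 0) * scale, 1)
        for name, beta in COEFFICIENTS.items()
    }


# Hypothetical patients: no risk factors present vs all four present (1 = yes).
low_risk = {}
high_risk = {name: 1 for name in COEFFICIENTS}

print(round(predicted_risk(low_risk), 3), round(predicted_risk(high_risk), 3))
print(nomogram_points(high_risk))
```

In a screening workflow such as the one recommended above, patients whose predicted risk exceeds a chosen threshold would be referred for further imaging; the choice of threshold is a clinical decision that this sketch does not address.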

Conclusion
The present study demonstrated that 17.07% of CRC patients had distant metastases, and the incidences of liver metastases, lung metastases, bone metastases, and brain metastases were 15.34%, 5.22%, 1.26%, and 0.29%, respectively. The incidence of distant metastases differed by age, gender, and primary CRC site. Poor differentiation grade, more lymphatic metastasis, different metastatic organs, and higher CEA were positively associated with these four types of distant metastases, and heterogeneous factors were also identified. Nomograms for predicting distant metastases in CRC patients were constructed. Although limited risk factors were included in this study, the established nomograms showed good prediction performance. These results may assist clinicians in identifying high-risk populations and providing individualized treatments.