Identification and validation of a miRNA-related expression signature for tumor mutational burden in colorectal cancer

Background Tumor mutational burden (TMB) is a promising predictor, which could stratify colorectal cancer (CRC) patients based on the response to immune checkpoint inhibitors (ICIs). MicroRNAs (miRNAs) act as the key regulators of anti-cancer immune response. However, the relationship between TMB and miRNA expression profiles is not elucidated in CRC. Methods Differentially expressed miRNAs (DE miRNAs) between the TMBhigh group and the TMBlow group were identified for the CRC cohort of the TCGA database. In the training cohort, a miRNA-related expression signature for predicting TMB level was developed by the least absolute shrinkage and selection operator (LASSO) method and tested with reference to its discrimination, calibration, and decision curve analysis (DCA) in the validation cohort. Functional enrichment analysis of these TMB-related miRNAs was performed. The correlation between this miRNA-related expression signature and three immune checkpoints was analyzed. Results Twenty-one out of 43 DE miRNAs were identified as TMB-related miRNAs, which were used to develop a miRNA-related expression signature. This TMB-related miRNA signature demonstrated great discrimination (AUCtest set = 0.970), satisfactory calibration (P > 0.05), and clinical utility in the validation cohort. Functional enrichment results revealed that these TMB-related miRNAs were mainly involved in biological processes associated with immune response and signaling pathways related with cancer. This miRNA-related expression signature showed a median positive correlation with PD-L1 (R = 0.47, P < 0.05) and CTLA4 (R = 0.39, P < 0.05) and a low positive correlation with PD-1 (R = 0.16, P < 0.05). Conclusion This study presents a miRNA-related expression signature which could stratify CRC patients with different TMB levels.


Introduction
Colorectal cancer (CRC) is a commonly diagnosed cancer, and its incidence and mortality rate rank the third and second among all the malignant tumors, respectively [1]. With the increasing incidence of CRC in the young, there will be about 2.5 million newly diagnosed CRC cases in 2035 [2]. Although the current treatment strategy of surgical resection combined with radiotherapy and chemotherapy has extended survival time for early-stage CRC patients, poor prognosis still remains a serious problem for metastatic CRC [2,3].
In recent years, several studies have identified that the interaction between programmed death receptor (PD-1) and its ligand, programmed death ligand (PD-L1), could serve as the mechanism for tumors to evade an antigenspecific T cell immunologic response [4]. Based on this hypothesis, immunotherapy is introduced and has revolutionized the approach to treatment for CRC [5]. Conventional chemotherapy kills tumor cells by interfering directly with DNA or targeting key proteins required for cell proliferation [6]. By contrast, the immunotherapy could induce cell death by restoring the dysfunctional antitumor T cells [7]. The most widely used immunotherapy in CRC is the immune checkpoint inhibitors (ICIs), which included PD-1/PD-L1 inhibitors and cytotoxic T-lymphocyte antigen 4 (CTLA-4) inhibitors. At present, PD-L1 expression by immunohistochemistry test has been used to identify CRC patients who can benefit from ICIs [8]. However, the PD-L1 expression could be regulated by the tumor microenvironment, and the correlation between PD-L1 expression and immunotherapy efficacy is not clear. Microsatellite instability (MSI) is also an established biomarker for predicting response to ICIs [9]. However, the response rate of ICIs is variable among CRC patients with high microsatellite instability (MSI-H), and responders have more somatic mutations and neoantigen loads than non-responders [10], indicating that additional predictive biomarkers are required.
Tumor mutational burden (TMB) is a promising independent predictor, which could stratify patients based on the response to ICIs [11,12]. The definition of TMB is the number of somatic variants in the coding region of tumor genes. A recent study reveals that the response rate to ICIs in patients with TMB high level is higher than that in patients with TMB low level, suggesting that TMB high level is positively correlated with immunotherapy efficacy [13]. However, TMB has not been widely used in clinical practice, mainly due to the nonstandardization of TMB detection [14]. A large number of genetic mutations in TMB could produce "non-self" neoantigen proteins which could activate anti-tumor immune response [15,16]. The post-transcriptional regulation is essential in the translation of these mutated genes into neoantigen proteins, and microRNAs (miRNAs) are involved in this process. miRNA is one type of endogenous non-coding RNAs consisting of approximately 21-25 nucleotides that participates in the post-transcriptional modification process, and abnormal miRNA expression is involved in the pathogenesis of various types of cancer [17]. Some research has reported that miRNAs could serve as promising predictors for TMB levels and are involved in the regulation of anti-cancer immune response [18,19]. For example, Lv et al. revealed that the expression profiles of miRNAs were related with TMB levels, and a miRNArelated signature classifier was developed to predict TMB level in lung adenocarcinoma [20]. Zhao et al. reported that miR-138-5p could bind to 3′ untranslated region (UTR) of immune checkpoint PD-L1, consequently leading to the inhibition of its translation [21]. However, the relationship between TMB and miRNA expression patterns is not elucidated in CRC. Thus, the purpose of this research lies in the development of a miRNA-related expression signature, which could identify CRC patients with different TMB levels.

Data acquisition
Both somatic mutation data and miRNA expression profiles of the CRC cohort were downloaded from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/), which provided comprehensive genomic information in various types of cancer. For somatic mutation data, the workflow type used in this research was VarScan2 Variant Aggregation and Masking. We measured the TMB of each sample based on the number of somatic mutations per DNA megabase [22]. Thirty-eight megabase was used to estimate the exome size [23]. We selected 10 mutations per megabase as a cutoff point, which separated CRC patients into TMB high samples and TMB low samples [24,25]. The data type for miRNA expression profiles in our study was isoform expression quantification, which contained preprocessed mature miRNAs in 539 CRC samples and 9 control samples. A total of 457 samples with both somatic mutation data and miRNA expression profiles were extracted as the overall cohort. We then randomly assigned CRC samples in the overall cohort to either the training cohort (60%) or the validation cohort (40%).

Identification of differentially expressed miRNAs
In the training cohort, we removed the miRNAs which contained missing values in more than 10% of the CRC samples. Differentially expressed miRNAs (DE miRNAs) between the TMB high group and the TMB low group were determined by the limma package in the R software. The |log2 fold change (FC)| > 1.5 and false discovery rate (FDR) < 0.01 were chosen as the threshold criteria. To visualize the expression patterns of these DE miRNAs, the heatmap was plotted using the pheatmap package in the R software.
miRNA-related expression signature building and enrichment analysis of TMB-related miRNAs The expression values of the above DE miRNAs for each CRC sample were extracted from the training cohort. The least absolute shrinkage and selection operator (LASSO) method, which could select optimal features from high-dimensional data [26], was used to screen the most useful predictive miRNAs for TMB level. The miR-NAs, whose regression coefficients were non-zero in the LASSO regression analysis, were identified as TMBrelated miRNAs. The miRNA-related expression signature was developed based on these TMB-related miRNA expression value multiplied by their corresponding LASSO regression coefficient. To further understand the biological significance and essential pathways of these TMB-related miRNAs, Gene Ontology (GO) term for biological process (BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis were performed in DIANA-mirPath web tool [27]. To provide visual information, the enriched results for GO and KEGG analysis were presented as the bubble plot by the ggplot2 R package.

Principal component analysis of DE miRNAs and TMBrelated miRNAs and validation of the miRNA-related expression signature
To evaluate whether miRNAs could make a distinction between TMB high samples and TMB low ones, PCA was conducted based on the expression profiles of DE miR-NAs and TMB-related miRNAs. Two PCA plots were displayed by the ggplot2 R package, in which the correlations among all CRC samples were converted into a two-dimensional graph. The validation cohort was then used to evaluate the robustness of this miRNA-related expression signature. First, the area under the receiver operating characteristic curve (AUC) was measured to assess the discrimination ability of this expression signature by the pROC R package [28]. Generally, AUC above 90% means that this signature is almost perfect. Then, the calibration performance of this expression signature was assessed using the "rms" package in the R software [29]. The calibration plot was presented accompanied with the unreliability test, in which P value > 0.05 means that this classifier calibrated perfectly with the ideal signature. Finally, we performed decision curve analysis (DCA) to assess the clinical utility of this expression signature, and net benefit at different threshold probabilities was calculated by the rmda R package [30].
The correlation analysis between the miRNA-related expression signature and three immune checkpoints The expression profiles of three immune checkpoints (PD-1, PD-L1, and CTLA-4) in RNA sequencing were downloaded from TCGA. Log2 (count+1) transformation was used to normalize the expression profiles of these three immune checkpoints. In the overall cohort, the expression signature value of each sample was calculated, and the correlation between the miRNA-related expression signature and three immune checkpoints was analyzed. Besides, we used the TargetScan webserver to identify which immune checkpoint could serve as a potential target gene of these TMB-related miRNAs [31].

Statistical analysis
The clinical information between the training cohort and validation cohort was displayed as categorical data and analyzed by the χ 2 test in the R software. The Wilcoxon test was used to analyze the miRNA expression levels between the TMB high group and the TMB low group and was applied using the R software. A P value < 0.05 was regarded as a statistical difference.

Identification of DE miRNAs
As was shown in Table 1 (Fig. 1a).
miRNA-related expression signature building and enrichment analysis of these TMB-related miRNAs Based on the expression profiles of DE miRNAs in the training cohort, we used LASSO method to identify the most useful predictive miRNAs for TMB level. 21 miR-NAs, whose coefficients were non-zero under the biggest AUC, were regarded as optimal features and used to develop a miRNA-related expression signature (Fig. 1b) Fig. 2a, the GO enrichment terms for BP were mainly associated with immune response, such as the innate immune response, leukocyte migration, and toll-like receptor signaling pathway, and the like. Besides, the KEGG results revealed that these miRNAs were related with cancer-related pathways, such as the colorectal cancer pathway, Ras signaling pathway, TGF-beta signaling pathway, and Wnt signaling pathway (Fig. 2b).

PCA and validation of the miRNA-related expression signature
As was shown in Fig. 3, the PCA results for either DE miRNAs or TMB-related miRNAs revealed that the above two types of miRNAs could obviously make a distinction between TMB high samples and TMB low ones. The AUC of this miRNA-related expression signature was 0.991, 0.970, and 0.984 in the training, validation, and overall cohort, respectively (Fig. 4). Besides, the specificity (SP), negative predictive value (NPV), sensitivity (SE), positive predictive value (PPV), and accuracy values were very high ( Table 2), indicating that this signature could perfectly discriminate patients with different TMB levels. The calibration plot revealed this signature had no departure from perfect fit, and no statistical difference was found between this signature and ideal signature (P > 0.05) (Fig. 5a). DCA demonstrated that whatever the threshold probability was, this signature could produce more benefit than either treating-no-one (See figure on previous page.) Fig. 1 Identification of TMB-related miRNAs. a The heatmap of differentially expressed miRNAs (DE miRNAs). Each column represented each sample. The red dots in the heatmap represented upregulation, the green dots represented downregulation, and black dots represented miRNAs without differential expression. b Development of a TMB-related miRNA expression signature by the least absolute shrinkage and selection operator (LASSO) method. The optimal miRNAs with non-zero regression coefficients (λ) was selected by 10-fold cross-validation and "AUC" measure type curve or treating-everyone curve, suggesting this was a perfect signature (Fig. 5b).

Discussion
Though PD-L1 expression and MSI-H are established biomarkers for predicting response to ICIs in CRC [8,9], the response rate to ICIs among CRC patients with high PD-L1 expression or MSI-H is variable. TMB is an emerging independent biomarker for predicting response to ICIs in CRC [11]. However, the current TMB assessment has not been standardized, affecting its wide application in clinical practice [14]. Considering that miRNAs were found involved in the regulation of anti-cancer immune response [21,[32][33][34][35][36], we developed a miRNA-related expression signature, which could be a good addition to PD-L1 or MSI-H and help identify CRC patients with different TMB levels. The relationship between TMB and miRNA expression profile was not analyzed in previous studies. In this study, DE miRNAs between the TMB high group and the TMB low group were identified and then incorporated into the LASSO regression analysis. Some miRNAs with non-zero coefficients in the LASSO method as TMBrelated miRNAs were used to develop a miRNA-related expression signature. In the validation cohort, this signature demonstrated satisfactory discrimination, great calibration, and more benefit in clinical utility, indicating that this signature was a robust classifier for TMB levels. In particular, the SP, NPV, SE, and PPV of this signature were very high, suggesting that this signature had great recognition ability for TMB low samples and TMB high samples, respectively. Previous researches revealed that TMB high patients could produce more somatic mutations and neoantigen loads which could activate anti-cancer immune response [10]. Interestingly, functional enrichment analysis for TMB-related miRNAs revealed that these miRNAs were mainly involved in biological processes related with immune response and signaling pathways associated with cancer. Besides, the miRNA-related expression signature showed a median positive correlation with TMB, indicating that this miRNA-related expression signature predicted the TMB level from a biological perspective of the anti-cancer immune response.
Several studies have revealed that miRNAs could decrease PD-L1 expression by binding to 3′ UTR of PD-L1, suggesting that miRNAs were negatively related with PD-L1 expression [21,33,34]. However, in our research, this miRNA expression signature was positively related with three immune checkpoints, especially for PD-L1 expression. Thus, future researches are required to explore the underlying mechanism between these TMB-related miRNAs and PD-L1 expression.
Some limitations should be acknowledged in our research. First, the cutoff point to separate samples into the TMB high group and the TMB low group may vary for different TMB detection methods. Second, a larger independent cohort is required to validate the robustness of this miRNA-related expression signature. Finally, the potential mechanisms between TMB-related miRNAs and CRC immune response are not elucidated. Future studies are required to explore the underlying mechanisms.

Conclusion
In summary, we developed a miRNA-related expression signature, which could stratify CRC patients with different TMB levels and further help clinicians evaluate the efficacy of ICIs. This signature was well-validated with reference to its discrimination, calibration, and DCA.

Funding
No sources of funding were required for this work.

Availability of data and materials
All the data and materials are available.
Ethics approval and consent to participate There were no cell, tissue, or animal studies. No ethical requirements are involved.

Consent for publication
All authors agree to publish the paper.