Pan-cancer analysis of m5C regulator genes reveals consistent epigenetic landscape changes in multiple cancers

Background 5-Methylcytosine (m5C) is a reversible modification to both DNA and various cellular RNAs. However, its roles in developing human cancers are poorly understood, including the effects of mutant m5C regulators and the outcomes of modified nucleobases in RNAs. Methods Based on The Cancer Genome Atlas (TCGA) database, we uncovered that mutations and copy number variations (CNVs) of m5C regulatory genes were significantly correlated across many cancer types. We then assessed the correlation between the expression of individual m5C regulators and the activity of related hallmark pathways of cancers. Results After validating m5C regulators’ expression based on their contributions to cancer development and progression, we observed their upregulation within tumor-specific processes. Notably, our research connected aberrant alterations to m5C regulatory genes with poor clinical outcomes among various tumors that may drive cancer pathogenesis and/or survival. Conclusion Our results offered strong evidence and clinical implications for the involvement of m5C regulators. Supplementary Information The online version contains supplementary material available at 10.1186/s12957-021-02342-y.


Background
Cancers have become the second life-threatening malignancies, which contribute to almost 18.1 million people occurred and 9.6 million death globally in 2018 [1]. Lack of efficient diagnosis indicators at an early stage and high rate of postoperative recurrence contribute to poor clinical prognosis and high mortality [2,3]. Growing evidence demonstrated that genomic instability [4,5], oncogene activation, aberrant methylation modifications, alterations in epigenetic changes [6][7][8], aberrant expression of microRNAs [9], and alterations of signaling pathways are crucial factors and contribute to cancer pathogenesis [10][11][12]. Methylation is an essential epigenetic modification and is closely related to the pathogenesis of cancers [13][14][15][16][17]. The 5-methylcytosine (m 5 C), N6-methyladenine (m 6 A), and N1-methyladenosine (m 1 A) have become the most common types of epigenetic modifications in eukaryotes [13]. Emerging evidence has demonstrated that m 5 C modification has the potential to serve as novel epigenetic markers with remarkable biological significance in biological processes [18][19][20]. m 5 C modification distributes in different types of RNAs and DNAs [21][22][23]. m 5 C modifications can even modify the destiny of cancer cells [24]. m 5 C regulators contain writers, erasers, and readers, which function as common epigenetic modification and contribute to pre-mRNA splicing, gene expression, gene silencing, nuclear export, genomic maintenance, and translation initiation modifications [25,26]. m 5 C could therefore be used as a biomarker for disease progression, including various types of cancers [27]. m 5 C maintains open and closed chromatin states to control gene expression, genome editing, organismal development, and cellular differentiation [23]. In this context, writers act within a methyltransferase complex to methylate targets, and erasers remove m 5 C methylation, while readers recognize and bind to m 5 C-methylated RNA and implement corresponding functions [25,28,29]. The anomalous interplay between writers and erasers, arising from alterations to their expression, has been linked to cancer pathogenesis and progression [23,30]. However, pan-cancer effects of changes to m 5 C regulatory gene expression have not been fully defined. Next-generation sequencing (NGS) provides us effective tools to comprehensively view the m 5 C distribution landscape throughout the global transcriptome [27].
In this study, we identified the potential prognostic value of m 5 C regulators and provided a comprehensive understanding of m 5 C modifications in pan-cancers, which will help to find novel opportunities for cancer early detection, treatment, and prevention.

Study workflow
We downloaded fragments of kilobase transcripts based on fragments per kilobase of transcript per million (FPKM) gene expression from The Cancer Genome Atlas (TCGA, https:// www. cancer. gov/) dataset among 33 different cancer types. m 5 C regulator patterns were investigated in 5480 samples among 33 different cancer types, including somatic mutations, copy number variations (CNVs), gene expression, and RNA-seq data (Fig. 1B).

Genomic data collection of m 5 C regulators
Thirteen m 5 C regulators were identified from published papers. Information of m 5 C regulators was collected from Gene Cards (www. genec ards. org). Ensemble gene IDs and HUGO Gene Nomenclature Committee symbols were assigned to each m 5 C regulator-associated gene.

Whole genomics data analysis of 33 pan-cancers
The integrated OMICS datasets based on the TCGA database of 33 pan-cancers were applied in this study. We collected the mutation annotation format profile from TCGA, which contains over 10,000 cancer patient's information. The level 3 data of copy number alterations profiles in TCGA was acquired for secondary analysis. In addition, we downloaded pan-cancers RNAseq data of genomic variations profiles and corresponding clinical information from the Genomic Data Commons Data Portal using the R package "TCGAbiolinks. "

Differently expression genes (DEGs) identification
The DESeq package in R language was utilized to validate the DEGs between 33 pan-cancer samples and adjacent samples. Genes with a mean value > 0 were included in the screening of DEGs. To establish proper DEGs among 33 cancers, we settled the adjusted P value less than 0.01 and |log2 fold change (log2Fc) | no less than two as the statistical threshold value for differentially expressed genes. The results were screened as significant DEGs and methylated sites.

Functional annotations and pathway enrichment analysis
To evaluate the biological functions of each m 5 C modification-related gene, we transformed the RNA-seq data of all samples into transcripts per million (TPM) values. The methods have been described in a previous study [31]. The insufficient, duplicated, and zero expression genes will be eliminated. Furthermore, the gene set variation analysis (GSVA) was applied to determine transcriptomic activities and explore the biological processes of m 5 C regulators. To further explore m 5 C regulators related inhibition and activation factors, we performed the Pearson correlation coefficient (PCC) and defined the absolute value of the PCC greater than 0.5 and p value of less than 0.01 as the screen cut-off. The results could be recognized as significantly correlated m 5 C regulators.

The internships between m 5 C regulators
To visualize the intercorrelations among m 5 C regulators, we adopted the "CORPRRAP" R package (https:// github. com/ taiyun/ corrp lot). Besides, the STRING database was also applied for the exploration and analysis of these associations between m 5 C regulators and 325 related genes [32]. The 325 genes were obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http:// www. kegg. jp/ or http:// www. genome. jp/ kegg/). The correlations between m 5 C regulators and 325 genes were visualized through Cytoscape (https:// cytos cape. org/).

Clinical characteristic of m 5 C regulators
To explore the m 5 C regulators' related clinical characteristics, we classified genes into high and low expression groups based on genes' median expression. Correlations between outcomes of the two groups were then analyzed through a log-rank test via R software (https:// cran.r-proje ct. org/ web/ packa ges/ survi val/ index. html). The log-rank test was performed to weigh the overall survival rates that differ between the high and low expression groups. The CRAN Package survival (https:// cran.r-proje ct. org/ web/ packa ges/ survi val/ index. htm) was performed, and we defined the p value of less than 0.05 as significant difference.

The roles of m 5 C regulators in cell growth
The CRISPR-CAS9 gene scale screening of cell lines from 33 cancer types was collected from previous study [32]. We calculated the proportion of every regulator as an essential gene in the cell lines.

Results m5C regulators identification and its genomic extensive genetic changes
In this study, we identified 13 m 5 C regulators, as shown in Fig. 1A, eleven writers (NSUN1-7, DNMT1, DNMT2, DNMT3A, and DNMT3B), one eraser (TET2), and one reader (ALYREF). This study validated the frequency of m 5 C regulator patterns among 33 cancers by integrating somatic mutations and CNVs data. Table 1 illustrated detailed information. The results indicated that the overall average mutation frequency of regulatory factors is low, ranging from 0 to 9% ( Fig. 2A). The uterine corpus endometrial carcinoma (UCEC) is characterized as a high tumor mutation burden [33]. The UCEC showed significantly higher mutation frequency. Horizontal analysis indicated that TET2, DNMT3B, DNMT3A, and DNMT1 demonstrated much higher mutation frequency among 33 cancers. Furthermore, we uncovered the CNV mutation frequency of m 5 C regulators was common. Regulators such as DNMT3B, ALYREF, and NSUN5 displayed extensive CNVs. On the contrary, TET2 and NSUN4 showed significant lack of m 5 C modification related CNV mutations among pan-cancers (Fig. 2B, and Table 2).
To further investigate whether the genomic mutations affect m 5 C regulators expression, we intensively detected the m 5 C regulators' gene expression disturbances in thirty-three pan-cancers and five standard control samples. The result implied that the CNV alterations (amplification and deletion) might profoundly affect the m 5 C regulator's expression (Fig. 3A). The m 5 C regulators with CNV amplification showed significantly increased expression in pan-cancers (such as DNMT3B), and m 5 C regulators with CNV deletion exhibited remarkably decreased expression, like TET2. In addition, we comparatively analyzed the m 5 C regulators' expression levels in cancers and corresponding normal tissues and found out that DNMT3B was significantly overexpressed in thirty-three tumor or cancer tissues compared with adjacent normal tissues (Fig. 3B). These results uncovered that the m 5 C regulators among various cancers showed significant heterogeneity in gene expression and genetics. Collectively, our results demonstrated that aberrant m 5 C regulations were crucial for carcinogenesis and progression, which provided a clue for further functional detection.

m 5 C regulators related pan-carcinogenic pathways
To comprehensively explore the molecular mechanism m 5 C regulators involved in cancers, we evaluated the correlations between m 5 C regulators' proteins expression and KEGG enrichment analysis-related activities.
The results indicated that m 5 C regulator proteins have a close relationship with tumor-related pathways' activation and inactivation (Fig. 4A, and Table 3). ALYREF, DNMT1, and TET2 were involved in the cell cycle, DNA replication, and prostate cancer-related pathways. Notably, ALYREF was involved in multiple pathways, including cell cycle, DNA replication, and prostate cancer-related pathways. DNMT1 was involved in drug metabolism, lipid metabolism, and nucleic acid biosynthesis signaling pathways [34]. ALYREF, DNMT1, NSUN5, NSUN1, and TET2 showed active involvement in KEGG enrichment pathways (Fig. 4B). In addition, genes will not function alone [35]. Growing evidence indicated that genes always co-effect with multiple genes and always have multiple functions [35,36]. We further explored the internal connections between m 5 C regulators gene expression. Results indicated that the readers, writers, and erasers also have high correlations with each other. The eraser TET2 was significant correlated with the writer NSUN3. Writers such as ALYREF and NSUN5 also showed obvious correlations (R = 0.55, P < 0.01) (Fig. 4C). Additionally, to visualize the interactions between m 5 C regulators, we utilized the proteinprotein interaction (PPI) analysis in m 5 C regulators related proteins. The results showed that writers, readers, and erase were particularly frequent (Fig. 4D). These results indicated that interactions among m 5 C regulators play crucial roles in the development and progression of cancers.

Clinical significance of m 5 C regulators in pan cancers
To evaluate the clinical prognosis of m 5 C regulators, we calculated the overall survival (OS), overall median progression-free interval (PFI), disease-specific survival (DSS), and disease-free interval (DFI) of m 5 C regulators. The OS analysis implied a significant correlation between m 5 C regulators and thirty-three pan-cancers. The heat map demonstrated that m 5 C regulators were significantly correlated with survival of patients, including OS, PFI, DSS, and DFI. In detail, the OS in adrenocortical carcinoma (ACC), kidney chromophobe (KICH), kidney renal clear cell carcinoma (KIRC), brain lower grade glioma (LGG), and liver hepatocellular carcinoma (LIHC) showed significant correlations with m 5 C regulators (Fig. 5A). The highly expressed DNMT3A, DNMT3B, DNMT1, and ALYREF were significantly related to poor prognosis. Collectively, the DNMT3A, DNMT3B, DNMT1, and ALYREF might function as poor prognosis predictors in cancer progression. Moreover, we evaluated the prediction of PFI at fixed time points in patients with thirty-two solid tumors. PFIs of DNMT3B and DNMT1 showed significantly higher hazard ratio values, indicating that they have the potential to be utilized as unfavorable prognosis prediction factors. The PFI in ACC, KICH, KIRC, LGG, and uveal melanoma (UVM) showed significant correlation ships with most m 5 C regulators (Fig. 5B). Similarly, The DSS and DFI analyses indicated that ACC and LGG showed remarkably correlations with most m 5 C regulators (Fig. 5C, D). These results indicated that m 5 C regulators have crucial prognostic prediction values in a variety of cancer types.

Effect of m 5 C regulators in LIHC and cholangiocarcinoma (CHOL)
Studies have validated that m 5 C-related genes alterations have a close relationship with advanced tumor progression and advanced tumor stages [37], based on the above pieces of evidence that most m5C regulators are associated with patients' OS in LIHC and CHOL (Fig. 5A). Based on the overall expression patterns of m 5 C regulators, all patients in these cancer groups were categorized into two subgroups. The first subgroup consisted of 112 patients indicating high expression of m 5 C regulators (Reg-high), and the second subgroup consisted of 295 patients with low m 5 C regulators expression (Reg-low) (Fig. 6A). Compared with the Reg-high subgroup, the survival probability of patients in the Reg-low subgroup was significantly better (P < 0.001) (Fig. 6B). These results indicated that m 5 C regulators have a potential function as prognostic indicators in hepatocellular carcinoma and cholangiocarcinoma.
Original reports described that NSUN2 participates in catalyzing biological reactions of m 5 C formation in RNAs and regulating cell cycle [52], linked to stem cell differentiation and involved in progression [53]. It is also reported that m 5 C regulators such as NSUN2 and binding partner ALYREF participant in promoting mRNA export coordinately [54], and NSUN6, in complex with a full-length tRNA substrate targeting cytosine accessible to the enzyme for methylation [14,55,56]. The NSUN3 is required for the deposition of m 5 C at the anticodon loop in the mitochondria encoded transfer RNA methionine [57]. The NSUN5 demonstrates suppression characteristics in vivo glioma models [58]. NSUN5 gene mutation leads to an un-methylated condition at the C3782 position of 28S rRNA, which leads to a total depletion of protein synthesis and inducing an adaptive translational program under stress collectively [59], as is illustrated that the m 5 C regulators may influence a wide variety of biological functions and metabolism.
In the present study, we applied certain methodological particularities to build a model and evaluated a catalog of genomic characteristics of tumors associated with m 5 C regulators. We obtained a total of 13 m 5 C regulators. The mutations and CNVs of m 5 C regulators are linked to several tumor developments. All cancers carry somatic mutations [60]. UCEC exhibited a significantly higher number of mutations across pan-cancers, analogously TET2, DNMT3B, DNMT3A, and DNMT1 have placed a moderate burden in m 5 C regulator genes. DNMT3B gene mutation was generally higher expression level among various cancers. Here, we sequenced the m 5 C regulator genomes of pan-cancer and providing the first comprehensive remarkable insights into the forces that have shaped various cancer genomes. CNVs play an important role in tumor genesis and progression [61], including amplification and deletion of oncogenes, which may significantly increase the risk of cancer [62]. In this research, the DNMT3B, ALYREF, and NSUN5 showed extensive CNV amplification. In contrast, CNVs such as TET2 and NSUN4 are generally deletion. These results indicated that CNV and the associated gene signatures are useful for early cancer detection and diagnosis, targeted therapeutics, and prediction of prognosis.
The genomic and transcriptomic parameters of various cancers are associated with m 5 C regulators gene expression and activity of the KEGG pathways [63]. We also investigate the gene expression perturbations of m 5 C regulators through 33 cancer types with parallel normal controls. The expression of ALYREF, DNMT1, NSUN2, and TET2 are more positively correlated with the majority of pathways, such as the cell cycle, DNA replication, spliceosome, and nucleotide excision repair pathways. The DNMT1 expression is related to the activation of multiple metabolic pathways, including drug metabolism, lipid metabolism, and nucleic acid biosynthesis signaling pathways. The m 5 C regulators' pathways are significantly essential for a wide range of biological processes. m5C regulators were also validated involved in malignant activities [64]. Recent studies have demonstrated that the m5C modification in pyruvate kinase muscle isozyme M2 was involved in bladder cancer proliferation and migration. M5C regulator Aly/REF export factor regulated pyruvate kinase muscle isozyme M2 promote the glucose metabolism of bladder cancer [64]. At the same time, the precise molecular modification mechanisms and cellular processes among pan-cancer need further study and deeper exploration for a better prognosis.
For a deeper exploration of the relationship between m 5 C regulators and their clinical outcomes, we describe

Conclusion
The m5C regulators were differently expressed and showed significantly different CNVs in pan-cancers, which also involved multiple oncogene pathways. In addition, m5C regulators also exhibited prognosis prediction value in pan-cancers. Therefore, our study provides a better understanding of the biology of m5C regulators in pan-cancers, indicating that m5C RNA methylation regulators have the potential to become novel biomarkers and therapeutic targets for various tumors.