Genome-wide analysis of cell-free DNA methylation profiling with MeDIP-seq identified potential biomarkers for colorectal cancer

Background
Colorectal cancer is one of the most common malignancies and a leading cause of cancer-related death worldwide. This study aimed to identify potential diagnostic biomarkers for colorectal cancer by genome-wide plasma cell-free DNA (cfDNA) methylation analysis.

Methods
Peripheral blood from colorectal cancer patients and healthy controls was collected for cfDNA extraction. Genome-wide cfDNA methylation profiling, in particular differential methylation profiling between colorectal cancer patients and healthy controls, was performed by methylated DNA immunoprecipitation coupled with high-throughput sequencing (MeDIP-seq). Logistic regression models were established, and the accuracy of the resulting diagnostic model for colorectal cancer was verified using tissue-sourced data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) because cfDNA methylation data are lacking in public datasets.

Results
Compared with the control group, 939 differentially methylated regions (DMRs) located in promoter regions were found in colorectal cancer patients; 16 of these DMRs were hypermethylated, and the remaining 923 were hypomethylated. In addition, the hypermethylated genes, mainly PRDM14, RALYL, ELMOD1, and TMEM132E, were validated in colorectal cancer using publicly available DNA methylation data.

Conclusions
MeDIP-seq can be used as an optimal approach for analyzing cfDNA methylomes, and 12 probes of four differentially methylated genes identified by MeDIP-seq (PRDM14, RALYL, ELMOD1, and TMEM132E) could serve as potential biomarkers for clinical application in patients with colorectal cancer.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12957-022-02487-4.

cancer [8], but the sensitivity of serum CEA is often low [9]. The fecal occult blood test (FOBT) is the most widely used method for colorectal cancer screening, but its sensitivity for the early detection of colorectal cancer is also low [8]. In reality, there are still many obstacles to the early diagnosis of colorectal cancer. If a novel biomarker can be developed for the early detection of colorectal cancer, it will have profound benefits for the general public.
Genetic and epigenetic aberrations of tumor cells occur at the initiation of tumorigenesis [10,11]. DNA methylation is an important component of epigenetic modification [12]. Epigenetics is a promising field in cancer research and includes the study of DNA methylation in gene promoters [13]. Alterations in DNA methylation can affect gene expression in different ways; for example, hypermethylation of a tumor suppressor gene, especially in its promoter region, can downregulate the gene and promote carcinogenesis, a mechanism that plays a key role in many cancers [2,3]. Therefore, aberrantly methylated CpG sites located in promoter regions are considered promising cancer biomarkers.
When apoptotic or necrotic tumor cells lyse, DNA fragments are released into the bloodstream and become part of the cfDNA pool [14]. The detection of cfDNA could be helpful for early diagnosis and follow-up monitoring of tumors, as it is non-invasive and provides results in real time [15-17]. Many reports have pointed out that liquid biopsy, including cfDNA tests, may be helpful in clinical practice for tumor diagnosis, drug screening, efficacy evaluation, prognosis prediction, and tumor surveillance [14,18-20]. The tumor-derived DNA fragments released into the blood after apoptotic or necrotic tumor cell lysis are commonly referred to as circulating tumor DNA (ctDNA) [14,21]. ctDNA has methylation patterns similar to those found in tumor cells [22].
The main experimental approaches for profiling genome-wide DNA methylation include whole-genome bisulfite sequencing (WGBS), reduced-representation bisulfite sequencing (RRBS), and MeDIP-seq (methylated DNA immunoprecipitation coupled with high-throughput sequencing) [23]. Both RRBS and WGBS cause substantial DNA degradation during bisulfite treatment, and WGBS is less cost-effective [23]. Recently, cfDNA methylated DNA immunoprecipitation followed by high-throughput sequencing (cfMeDIP-seq) has been reported to be more sensitive, accurate, and economical than other detection approaches for the early diagnosis of tumors [24]. In recent years, a few studies have used MeDIP-seq for genome-wide cfDNA methylation profiling to screen potential tumor biomarkers. Xu et al. [25] identified hypermethylated DMRs in promoter regions that could be used as early diagnostic markers for lung cancer. Li et al. [26] identified hypermethylated DMRs located in promoter regions that completely overlapped with CpG islands and could be used for the non-invasive diagnosis of pancreatic cancer. To the best of our knowledge, there have been few reports on cfDNA methylation profiling by MeDIP-seq among colorectal cancer patients in China.
Therefore, in this study, we performed cfDNA methylation profiling in colorectal cancer patients by MeDIP-seq, followed by data analysis and validation.

Sample collection and cfDNA extraction
All colorectal cancer blood samples (n = 4) were obtained from patients with adenocarcinoma in Shanghai General Hospital, and control blood samples (n = 3) were obtained from healthy volunteers. Informed consent was obtained from all individuals. Specimens were collected and analyzed with the approval of the Ethics Committees of Shanghai General Hospital and Qingpu Branch of Zhongshan Hospital affiliated with Fudan University.
Blood from colorectal cancer patients and controls (~5 ml) was collected in tubes containing EDTA as the anticoagulant. Blood samples were centrifuged for 10 min at 1900×g and 4 °C. The plasma supernatant was carefully collected and centrifuged for 10 min at 16,000×g in a fixed-angle rotor at 4 °C. The resulting plasma supernatant was carefully collected and frozen at −80 °C.
Plasma cfDNA was extracted using the QIAamp Circulating Nucleic Acid Kit (Qiagen, 55114) according to the instructions. Qubit (Invitrogen) was used to analyze the concentration of cfDNA in plasma. An Agilent Bioanalyzer 2100 system was used to estimate the distribution of cfDNA size.

MeDIP-seq library construction and sequencing
The MeDIP-seq library was prepared from cfDNA as previously described, with some modifications [27]. Briefly, ~50 ng of cfDNA was ligated to Illumina adapters using the NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB, E7645) according to the manufacturer's instructions. The resulting library was denatured at 95 °C for 10 min, immediately incubated on ice for 10 min, and then immunoprecipitated with a 5-methylcytosine (5-mC) monoclonal antibody (Epigentek, A-1014). The MeDIP DNA was amplified with Q5 high-fidelity DNA polymerase (NEB, M0491), and the amplified products were purified with AMPure XP beads (Beckman). The amplified libraries were evaluated using a Bioanalyzer 2100 system (Agilent Technologies), and deep sequencing was performed using an Illumina HiSeq 2000 system.

Data processing and analysis
All qualified reads in the cfDNA MeDIP-seq raw data from colorectal cancer patients and healthy individuals were mapped to the reference genome (Human hg38) using Bowtie (version 1.0.1) [28]. The MEDIPS analysis package (version 1.24.0) was used for the analysis and comparison of DNA methylation datasets between the patients and controls [29]. The 450K methylation array data (Illumina, San Diego, CA, USA) from normal colorectal tissue and colorectal cancer patient samples were obtained from the TCGA-COAD (colon adenocarcinoma) Samples Report (https://gdac.broadinstitute.org/runs/stddata__latest/samples_report/COAD.html) and the GEO database (GSE42752, GSE52270, GSE77718). Independent-sample t tests between normal samples and patient samples were performed in the R statistical programming language (3.4.3, http://www.R-project.org) on beta (β) values (proportion of the methylated signal over the total signal), and the hypermethylated target genes with a p value < 0.05 were selected.
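The per-probe comparison described above can be sketched as follows. This is only an illustration in Python (the study itself used R), and several details are assumptions: the signal intensities are made up, the β value is computed as the plain ratio given in the text without any array-specific offset, and the pooled-variance (Student) form of the t statistic is used because the text does not say which variant was applied. The helper names `beta_value` and `student_t` are hypothetical.

```python
import math
from statistics import mean, variance


def beta_value(methylated, unmethylated):
    """Beta value: methylated signal divided by total signal."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("total signal must be positive")
    return methylated / total


def student_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Made-up (methylated, unmethylated) intensities for a single probe.
tumor_signals = [(800, 200), (700, 300), (900, 100), (750, 250)]
normal_signals = [(200, 800), (300, 700), (250, 750), (150, 850)]

tumor_beta = [beta_value(m, u) for m, u in tumor_signals]
normal_beta = [beta_value(m, u) for m, u in normal_signals]

# A positive t statistic means higher methylation in the tumor group.
t_stat, dof = student_t(tumor_beta, normal_beta)
print(f"t = {t_stat:.2f}, df = {dof}")
```

Turning the statistic into a p value requires the t distribution, which the Python standard library does not provide; a statistics package would normally handle that step.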

Whole-genome MeDIP-seq analysis of cfDNA
Plasma was collected from colorectal cancer patients (n = 4) and healthy controls (n = 3) for analysis in this study. The clinicopathological information of the patients is shown in Table 1. cfDNA was extracted from plasma using the QIAamp Circulating Nucleic Acid Kit. cfDNA derived from colorectal cancer patients (n = 4) and healthy controls (n = 3) was used for the construction of the MeDIP-seq libraries, followed by next-generation sequencing.
An Illumina HiSeq 2000 system was used to sequence the MeDIP-seq libraries. On average, 27 million and 52 million raw sequencing reads were obtained from the colorectal cancer patient group and the control group, respectively. The proportions of reads matched with the reference genome (Human hg38) were 66.2% and 52.9%, respectively. After filtering out the repetitive reads, the patient group had an average of 15 million unique reads, and the control group had an average of 5 million unique reads (Table 2).

Distinctive cfDNA methylation patterns between colorectal cancer patients and healthy controls
To determine the overall cfDNA methylation patterns in the patients and healthy controls, we performed heuristic cluster analysis and unsupervised cluster analysis on cfDNA MeDIP data from colorectal cancer samples and normal samples. Heuristic cluster analysis showed that the methylation patterns were distinct between the patient group and the control group (Fig. 1a). Genome-wide unsupervised cluster analysis also confirmed distinct methylation patterns between the two groups (Fig. 1b).

Differentially methylated regions (DMRs) in colorectal cancer patients
Using the MEDIPS analysis package, a total of 8398 DMRs were identified genome-wide between patients and controls (p value < 0.05). Among these DMRs, 1875 (22.3%) were hypermethylated, and 6523 (77.7%) were hypomethylated (Supplementary Table 1). We examined the genomic distributions of the hypomethylated and hypermethylated DMRs and found that the proportion of hypermethylated DMRs was higher in the intergenic and intronic regions (Fig. 2a). The distribution of DMRs across the chromosomes of the whole genome is shown in Fig. 2b. The 8398 DMRs exhibiting distinct patterns between colorectal cancer patients and normal controls are shown in Fig. 2c. Hypermethylation in the promoter region of tumor suppressor genes is known to be positively correlated with the occurrence and development of tumors [21,30]. Therefore, we further analyzed the DMRs and identified 939 DMRs located in promoter regions (Fig. 2d and Supplementary Table 2), including 16 hypermethylated regions and 923 hypomethylated regions. Furthermore, these 939 DMRs in the promoter regions also exhibited distinct patterns between the patients and the controls.
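As a quick arithmetic check of the DMR proportions, the short Python sketch below recomputes the shares from the counts reported in this section. The helper `percent` is a hypothetical name used only for illustration.

```python
def percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


# Counts taken from the text above.
genome_wide = {"hypermethylated": 1875, "hypomethylated": 6523}
promoter = {"hypermethylated": 16, "hypomethylated": 923}

for name, counts in (("genome-wide", genome_wide), ("promoter", promoter)):
    total = sum(counts.values())
    shares = {label: percent(n, total) for label, n in counts.items()}
    print(name, total, shares)
```

The two totals come out to 8398 and 939, matching the reported numbers of genome-wide and promoter DMRs.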

Validation of differentially methylated genes by using publicly available DNA methylation data
As mentioned above, we found that 16 of the DMRs located in the promoter region were hypermethylated, so we next wanted to determine whether the methylation levels of these corresponding genes could help to distinguish colorectal cancer patients from healthy individuals.
After annotating the 16 hypermethylated promoter DMRs, 13 genes were obtained, and the microarray probes in their promoter regions were screened. During screening, probes located on the sex chromosomes, in 3′UTR regions, or in gene body regions were excluded, as were SNP-related probes. Only probes located in UCSC (University of California, Santa Cruz) CpG island regions were retained, leaving a total of 12 probes (Supplementary Table 3). These 12 probes correspond to four genes: PRDM14, RALYL, ELMOD1, and TMEM132E.
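The screening rules above can be expressed as a simple filter. The Python sketch below is illustrative only: the probe records and IDs are invented, and the field names and category labels (for example "Body", "3'UTR", "Island") are assumptions modeled on typical array annotation columns, not the annotation actually used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    probe_id: str
    chrom: str
    gene_region: str      # e.g. "TSS200", "5'UTR", "Body", "3'UTR"
    island_relation: str  # e.g. "Island", "N_Shore"
    snp_related: bool


EXCLUDED_CHROMS = {"chrX", "chrY"}   # sex chromosomes
EXCLUDED_REGIONS = {"Body", "3'UTR"}  # gene body and 3'UTR


def keep_probe(p):
    """Apply the four screening rules described in the text."""
    return (
        p.chrom not in EXCLUDED_CHROMS
        and p.gene_region not in EXCLUDED_REGIONS
        and not p.snp_related
        and p.island_relation == "Island"
    )


# Invented example records, one per rule.
probes = [
    Probe("probe_01", "chr8", "TSS200", "Island", False),   # passes all rules
    Probe("probe_02", "chr8", "Body", "Island", False),     # gene body
    Probe("probe_03", "chrX", "TSS1500", "Island", False),  # sex chromosome
    Probe("probe_04", "chr11", "3'UTR", "Island", False),   # 3'UTR
    Probe("probe_05", "chr17", "TSS200", "N_Shore", False), # outside CpG island
    Probe("probe_06", "chr17", "1stExon", "Island", True),  # SNP-related
    Probe("probe_07", "chr11", "5'UTR", "Island", False),   # passes all rules
]

kept = [p.probe_id for p in probes if keep_probe(p)]
print(kept)
```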
The 450K methylation array data were obtained from the TCGA and GEO datasets and included both colorectal cancer patient samples (n = 295) and normal colorectal tissue samples (n = 193). Based on the aforementioned 12 probes, a logistic regression predictive model was established, and the 488 samples were divided into a training dataset and a validation dataset at a ratio of 4:1. The predictive ability of the model in the two datasets is shown in Fig. 3. According to the receiver operating characteristic (ROC) curves shown in the figure, the areas under the curve (AUCs) of the training dataset and the validation dataset were 0.928 and 0.915, respectively. Figure 3a and b show the confusion matrices of the training dataset and the validation dataset, respectively. These results suggest high validity for the diagnosis of colorectal cancer based on the methylation levels of the 12 probes. We then used the 12 probes for unsupervised cluster analysis of the 488 samples in the 450K methylation array dataset, and the results showed that the methylation data of these probes were generally distinct between tumor and normal tissues (Fig. 4a). We also compared the methylation levels of the 12 probes between normal colorectal tissue and colorectal cancer tissue samples in the dataset and found that they differed significantly (p value < 0.05). Compared with normal colorectal tissue, the 12 probes were hypermethylated in tumor tissue (Fig. 4b). These results suggest that detecting the methylation levels of these 12 probes and their corresponding genes is helpful for the diagnosis of colorectal cancer.
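The modeling workflow described above (12 probe features, a 4:1 training/validation split, and evaluation by AUC) can be sketched in plain Python. Everything specific here is an assumption: the β values are synthetic, the model is fitted by unregularized batch gradient descent, the split is a simple seeded shuffle, and the software, regularization, and splitting procedure actually used are not stated in the text. The AUC printed by this sketch therefore says nothing about the reported values of 0.928 and 0.915.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Predicted probability of the tumor class for each sample."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)


def synthetic_sample(label, n_probes=12):
    """Synthetic beta values: tumor samples are centred higher than normals."""
    centre = 0.6 if label == 1 else 0.2
    return [min(1.0, max(0.0, rng.gauss(centre, 0.15))) for _ in range(n_probes)]


# 295 tumor and 193 normal samples, as in the text; values are synthetic.
data = [(synthetic_sample(l), l) for l in [1] * 295 + [0] * 193]
rng.shuffle(data)

cut = len(data) * 4 // 5  # 4:1 split
train, valid = data[:cut], data[cut:]

w, b = fit_logistic([x for x, _ in train], [y for _, y in train])
train_auc = auc(predict([x for x, _ in train], w, b), [y for _, y in train])
valid_auc = auc(predict([x for x, _ in valid], w, b), [y for _, y in valid])
print(f"train AUC = {train_auc:.3f}, validation AUC = {valid_auc:.3f}")
```

Computing the AUC as a pairwise ranking probability avoids building the ROC curve explicitly, which keeps the sketch short; it gives the same value as the area under the empirical ROC curve.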

Discussion
Abnormal patterns of DNA methylation, including the hypermethylation of gene promoter regions accompanied by gene silencing, play a key role in many types of cancer [13]. When apoptotic or necrotic tumor cells lyse, they release DNA fragments that form part of the cfDNA in the bloodstream [14]. Moreover, the methylation pattern of cfDNA in peripheral blood is similar to that found in tumor cells [22]. In this study, we performed genome-wide epigenetic profiling of patients with colorectal cancer using MeDIP-seq to screen for potential cfDNA biomarkers. Our analysis revealed global changes in cfDNA methylation patterns in colorectal cancer patients. We found 8398 DMRs in cfDNA collected from patients with colorectal cancer at the genome-wide level, among which 1875 (22.3%) were hypermethylated and 6523 (77.7%) were hypomethylated. Among the DMRs located in promoter regions, 16 (1.7%) were hypermethylated and 923 (98.3%) were hypomethylated. This finding suggests that demethylation is widespread in cancer patients at the genome-wide level [31], with an even higher proportion of hypomethylation observed in promoter regions. Studies have shown that DNA demethylation plays an important role in activating specific gene expression and in the initiation of reprogramming [32]. After screening and annotating the 16 hypermethylated DMRs in promoter regions, we obtained 12 probes from four differentially methylated genes: PRDM14, RALYL, ELMOD1, and TMEM132E. Several reports have described these genes. PRDM14 has been reported to be hypermethylated in lung cancer and to have high accuracy in the diagnosis of lung cancer [33,34]. An RRBS study also found several hypermethylated CpG sites in PRDM14 in African-American colorectal cancer patients [35]. In contrast, we used MeDIP-seq to study cfDNA in the peripheral blood of Chinese patients with colorectal cancer. Although the research methods, populations, and specimens differed, the results were consistent. RALYL has been reported to be downregulated in clear cell renal cell carcinoma, and its reduced expression is associated with poor prognosis [36], suggesting that it may act as a tumor suppressor gene. Li et al. [37] identified a TMEM132E mutation as the most likely cause of autosomal recessive non-syndromic hearing loss by whole-exome sequencing. Johnson et al. [38] found that mutations in ELMOD1 may cause cochlear hair cell dysfunction, eventually leading to deafness in mice. The methylation of the latter three genes in colorectal cancer has rarely been reported and warrants further study and verification.
Subsequently, to evaluate the diagnostic value of the hypermethylated genes in colorectal cancer, we used tissue methylation data from publicly available datasets because cfDNA methylation data are lacking in public datasets. A predictive model based on the aforementioned 12 probes was constructed to assess their diagnostic validity. As shown in the Results section, methylation levels in peripheral blood cfDNA distinguished colorectal cancer patients from healthy controls. In the training cohort (AUC = 0.928) and the validation cohort (AUC = 0.915), the diagnostic prediction model also distinguished colorectal cancer tissues from normal tissues. These results provide new candidate methylation biomarkers for the early diagnosis of colorectal cancer and indicate that the methylated genes identified from the plasma cfDNA of colorectal cancer patients may have clinical application value. Therefore, cfDNA combined with MeDIP-seq, as a non-invasive and real-time diagnostic technique, is expected to be an effective method for the early clinical diagnosis of a variety of cancers [25,26].

Conclusions
In summary, the results of our study indicate that MeDIP-seq can be used as an optimal approach for analyzing cfDNA methylomes, and 12 probes of four