Identification of differentially methylated genes as diagnostic and prognostic biomarkers of breast cancer

Background Aberrant DNA methylation is significantly associated with breast cancer. Methods In this study, we aimed to identify novel methylation biomarkers with clinical value for breast cancer diagnosis and prognosis using a bioinformatics approach. First, differentially methylated DNA patterns were detected in breast cancer samples by comparing publicly available datasets (GSE72245 and GSE88883). Methylation levels of 7 selected methylation biomarkers were also estimated using the online tool UALCAN. Next, we evaluated the diagnostic value of these selected biomarkers in two independent cohorts, as well as in two mixed cohorts, through ROC curve analysis. Finally, the prognostic value of the selected methylation biomarkers in breast cancer was evaluated by Kaplan-Meier plot analysis. Results A total of 23 significant differentially methylated sites, corresponding to 9 different genes, were identified in breast cancer datasets. Among the 9 identified genes, ADCY4, CPXM1, DNM3, GNG4, MAST1, mir129-2, PRDM14, and ZNF177 were hypermethylated. Importantly, the individual AUC of each selected gene was greater than 0.9. The AUC for the combined signature of 7 genes (ADCY4, CPXM1, DNM3, GNG4, MAST1, PRDM14, ZNF177) was 0.9998 [95% CI 0.9994–1], and the AUC for the combined signature of 3 genes (MAST1, PRDM14, and ZNF177) was 0.9991 [95% CI 0.9976–1]. Results from additional validation analyses showed that MAST1, PRDM14, and ZNF177 had high sensitivity, specificity, and accuracy for breast cancer diagnosis. Lastly, patient survival analysis revealed that high expression of ADCY4, CPXM1, DNM3, PRDM14, PRKCB, and ZNF177 was significantly associated with better overall survival.
Conclusions The methylation patterns of MAST1, PRDM14, and ZNF177 may represent new diagnostic biomarkers for breast cancer, while methylation of ADCY4, CPXM1, DNM3, PRDM14, PRKCB, and ZNF177 may hold prognostic potential for breast cancer. Supplementary Information The online version contains supplementary material available at 10.1186/s12957-021-02124-6.


Background
Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer-associated death among women worldwide [1]. Early diagnosis and accurate prognostic assessment of breast cancer are crucial for timely targeted treatment [2]. Accumulating evidence suggests that DNA methylation may play an important role in the development and progression of breast cancer [3][4][5].
DNA methylation consists of the addition of a methyl group to the carbon-5 position of cytosine within a cytosine-guanine (CpG) dinucleotide [6]. This molecular process is critical for several important cellular mechanisms, including embryonic development, regulation of gene expression, X-chromosome inactivation, and genomic imprinting and stability [7]. Aberrant hypo- and hypermethylation patterns of the DNA have been identified as critical players in tumorigenesis, promoting the expression of oncogenes or the silencing of tumor suppressor genes, respectively [8][9][10]. Therefore, abnormal DNA methylation, acting as a cancer-related biomarker, could be helpful for early cancer detection and prognosis, as well as for predicting response to cancer treatment.
DNA methylation markers are not currently in use in clinical settings for breast cancer assessment. This is mostly due to a lack of evidence on their clinical value in large cohorts of breast cancer patients [4][11][12][13]. Indeed, available data on the clinical potential of cancer-specific methylated markers rely on platforms with low genomic coverage, small sample datasets, or datasets lacking appropriate healthy counterparts for comparison [14].
In the present study, we aimed to evaluate methylation changes specific to breast cancer that could be used as tools in the clinical setting for diagnostic and prognostic assessment of patients. To achieve this goal, we used different bioinformatics approaches to analyze several publicly available methylation datasets of samples collected from cancer patients and healthy counterparts.

Description of breast cancer and control samples
Breast cancer and control samples publicly available at the Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/) were used for the different bioinformatics analyses. Cancer samples were obtained from GSE72308, which includes three sets (GSE72245, N = 118; GSE72251, N = 119; GSE72254, N = 58), as well as from GSE141338 (N = 42), GSE100850 (N = 34), and GSE117439 (N = 52). DNA methylation data from normal tissue samples were used as control and were obtained from the GSE88883 (N = 100), GSE74214 (N = 18), GSE141338 (N = 6), GSE100850 (N = 5), and GSE101961 (N = 121) datasets. Data from GSE41169 (N = 95) were used as a blood control dataset. Information on all samples is compiled and available in the supplementary information (Fig. 1; Table S1).
Fig. 1 Workflow of the study. A multistep marker discovery analysis was performed to identify differentially methylated gene-based biomarkers of breast cancer

Differential methylation analyses
Data from 118 breast cancer samples (GSE72245) and 50 normal samples (GSE88883) were analyzed with the R package ChAMP, according to a previously described protocol [15]. Probe signal was removed when the detection p value was above 0.05, or when more than 1% of the dataset contained no information. Briefly, differential methylation analysis was performed at the probe level (lmFit from limma; adjusted p ≤ 1 × 10^-35; minimum delta beta value of 0.35) or at the region level (bumphunter from minfi; regions represented by at least two probes, L ≥ 2). In order to minimize the risk of false positive detection in blood tests, probes methylated in leukocytes were excluded (GSE41169; maximum beta value allowed = 0.1). Differentially methylated probes were limited to those overlapping differentially methylated regions, separated by at most 150 bp, and not located in centromeres or telomeres. Lastly, the methylation level of differentially methylated genes in control and breast cancer samples was plotted using GraphPad Prism software.
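The probe-level filters described above can be illustrated with a minimal Python sketch. This is not the ChAMP/limma pipeline used in the study; it only applies the three quoted thresholds to values that are assumed to be already computed. The input layout, the key names, and the probe IDs are hypothetical, and taking the maximum over all blood samples is one possible reading of "maximum beta value allowed".

```python
from statistics import mean

# Thresholds quoted in the Methods text.
MAX_ADJ_P = 1e-35          # adjusted p value cutoff
MIN_DELTA_BETA = 0.35      # minimum absolute difference in mean beta
MAX_LEUKOCYTE_BETA = 0.1   # maximum beta tolerated in blood (assumed: max over samples)

def select_probes(probes):
    """Return the IDs of probes that pass all three filters.

    `probes` maps a probe ID to a dict with hypothetical keys:
    'adj_p'  : adjusted p value from the differential test
    'tumor'  : list of beta values in cancer samples
    'normal' : list of beta values in control samples
    'blood'  : list of beta values in leukocyte samples
    """
    kept = []
    for probe_id, d in probes.items():
        delta_beta = mean(d["tumor"]) - mean(d["normal"])
        if d["adj_p"] > MAX_ADJ_P:
            continue  # not significant enough
        if abs(delta_beta) < MIN_DELTA_BETA:
            continue  # effect size too small
        if max(d["blood"]) > MAX_LEUKOCYTE_BETA:
            continue  # methylated in leukocytes, risk of false positives in blood
        kept.append(probe_id)
    return kept

# Toy data with made-up probe IDs and values, for illustration only.
example = {
    "probe_A": {"adj_p": 1e-40, "tumor": [0.8, 0.9, 0.85], "normal": [0.1, 0.2, 0.15], "blood": [0.02, 0.05]},
    "probe_B": {"adj_p": 1e-40, "tumor": [0.4, 0.4, 0.4], "normal": [0.2, 0.2, 0.2], "blood": [0.02, 0.05]},
    "probe_C": {"adj_p": 1e-40, "tumor": [0.8, 0.9, 0.85], "normal": [0.1, 0.2, 0.15], "blood": [0.30, 0.05]},
    "probe_D": {"adj_p": 1e-10, "tumor": [0.8, 0.9, 0.85], "normal": [0.1, 0.2, 0.15], "blood": [0.02, 0.05]},
}
selected = select_probes(example)
```

In this toy input only probe_A survives: probe_B fails the delta beta filter, probe_C is methylated in blood, and probe_D fails the p value cutoff.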

UALCAN database analysis
The UALCAN online tool (http://ualcan.path.uab.edu) is designed to provide easy access to publicly available cancer transcriptome data (TCGA and MET500 transcriptome sequencing), including 793 breast cancer samples and 97 normal samples. It was therefore used to perform a comprehensive analysis of promoter DNA methylation patterns in control and breast cancer samples [16]. In this study, the beta value indicated the level of DNA methylation, ranging from 0 (unmethylated) to 1 (fully methylated), and different beta value cutoffs were considered to indicate hypomethylation [beta value 0.3-0.25] or hypermethylation [beta value 0.7-0.5]. Additionally, mRNA expression of the identified genes in breast cancer was analyzed using UALCAN.

Marker discovery analysis
Receiver operating characteristic (ROC) analyses were performed in GSE72251 and GSE88883 with the pROC package in R to establish thresholds, considering normal and adjacent mucosa as positive outcome and cancer as negative; only loci showing a threshold below 0.35 were kept. ROC curves were generated, and the area under the curve (AUC) with the binomial exact confidence interval was calculated. For AUC values above 0.9, the differentially methylated gene was deemed able to distinguish between control and breast cancer with excellent specificity and sensitivity. The AUC for the combined epigenetic signature was assessed using a logistic regression model [17]. Each threshold was used to stratify the two mixed cohorts, defining a positive predictive value and a negative predictive value for discriminating normal adjacent from tumor tissue. The two mixed cohorts were as follows: mixed cohort 1 included breast cancer (GSE141338, GSE100850, and GSE117439) and control (GSE101961) samples, whereas mixed cohort 2 included breast cancer (GSE72254) and control (GSE74214, GSE141338, and GSE100850) samples.
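As an illustration of the quantities used in this section, the following Python sketch computes an AUC and the sensitivity, specificity, and accuracy obtained at a fixed beta value threshold. It is not the pROC implementation: the AUC is computed from its rank interpretation (the fraction of cancer/control pairs in which the cancer sample has the higher beta value), no confidence interval is derived, and cancer is treated as the class of interest for sensitivity, which is an assumption made for readability. All beta values are invented.

```python
def auc(cancer, control):
    """AUC as the fraction of (cancer, control) pairs in which the cancer
    sample has the higher beta value; ties count as half a pair."""
    wins = 0.0
    for c in cancer:
        for n in control:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer) * len(control))

def diagnostic_metrics(cancer, control, threshold):
    """Classify a sample as cancer when its beta value exceeds `threshold`
    and return (sensitivity, specificity, accuracy)."""
    true_pos = sum(b > threshold for b in cancer)
    true_neg = sum(b <= threshold for b in control)
    sensitivity = true_pos / len(cancer)
    specificity = true_neg / len(control)
    accuracy = (true_pos + true_neg) / (len(cancer) + len(control))
    return sensitivity, specificity, accuracy

# Invented beta values for one hypothetical locus.
cancer_betas = [0.62, 0.71, 0.30, 0.55]
control_betas = [0.05, 0.12, 0.40, 0.08]

locus_auc = auc(cancer_betas, control_betas)
metrics = diagnostic_metrics(cancer_betas, control_betas, threshold=0.35)
```

With these toy values, 15 of the 16 cancer/control pairs are ordered correctly, and the 0.35 threshold misclassifies one sample in each group.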

Survival analysis
The prognostic value of the selected DNA methylation-driven genes was evaluated through Kaplan-Meier plot assessment (http://kmplot.com/analysis/) with data from the mRNA breast cancer database [18]. The median expression value of each gene was used as the threshold to separate cases with high or low gene expression. p < 0.05 was considered significant.
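The median split and the Kaplan-Meier estimate behind this kind of analysis can be sketched in a few lines of Python. This is not the Kaplan-Meier plotter's code; it is a minimal product-limit estimator on invented data, and assigning samples strictly above the median to the high-expression group is an assumption.

```python
from statistics import median

def split_by_median(expression):
    """Return a list of booleans: True when a sample's expression is above the median."""
    cutoff = median(expression)
    return [x > cutoff for x in expression]

def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time of each patient
    events : True if the patient died at that time, False if censored
    Returns a list of (time, survival probability) at each time with a death.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Invented example: four patients, the second one censored.
high_group = split_by_median([1.0, 5.0, 3.0, 7.0])
curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

In practice one curve would be computed per expression group and the two compared with a log-rank test, which is omitted here.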

Identification of differentially methylated genes
To evaluate the DNA methylation pattern in breast cancer, we started by comparing 118 breast cancer and 50 control samples. We identified 105,143 differentially methylated positions and 8764 regions in breast cancer cases compared to controls (Fig. 2). Next, we filtered these differentially methylated sites as described in the "Methods" section, allowing us to refine our findings to a total of 23 differentially methylated sites. Importantly, these sites were directly linked to the transcription of 9 genes, namely adenylyl cyclase 4 (ADCY4), carboxypeptidase X (CPXM1), dynamin 3 (DNM3), guanine nucleotide-binding protein gamma subunit 4 (GNG4), microtubule associated serine-threonine kinase 1 (MAST1), microRNA 129-2 (mir129-2), PR domain zinc finger protein 14 (PRDM14), protein kinase C beta (PRKCB), and zinc finger protein 177 (ZNF177) (Table S2; Fig. 3; Table 1). All genes, with the exception of PRKCB, had significantly higher levels of DNA methylation in breast cancer samples compared to controls.
In order to validate the correlation between DNA methylation levels and mRNA expression of the identified genes in breast cancer, we used the online tool UALCAN. As expected, methylation levels of ADCY4, CPXM1, DNM3, GNG4, MAST1, PRDM14, and ZNF177 were found to be increased in breast cancer, all with p values lower than 0.001 (Fig. 4). Note that information related to mir129-2 was not available in UALCAN, so we could not conduct this analysis. We then found that the mRNA expression of ADCY4, CPXM1, GNG4, and ZNF177 was significantly decreased in breast cancer and that the mRNA expression of MAST1 was significantly upregulated, but there was [...] (0.9883), and ZNF177 (0.9786) were all above 0.9 (Fig. 6). Then, we validated the diagnostic value of the combined logistic regression model in these two cohorts, and found that the AUC for the combined signature of 7 genes (ADCY4, CPXM1, DNM3, GNG4, MAST1, PRDM14, ZNF177) was 0.9998 [95% CI 0.9994-1] (Fig. 7a) and the AUC for the combined signature of 3 genes (MAST1, PRDM14, and ZNF177) was 0.9991 [95% CI 0.9976-1] (Fig. 7b). Next, each threshold was used to stratify the two mixed cohorts. Our results showed that the breast cancer specificity of each gene ranged from 50.41 to 98.35%, while the sensitivity ranged from 84.25 to 97.64%, and accuracy from 67.82 to 91.13% in mixed cohort 1 (Table 2). In particular, the specificity, sensitivity, and accuracy of MAST1 were 81.82%, 97.64%, and 89.92%; those of PRDM14 were 97.52%, 84.25%, and 90.73%; and those of ZNF177 were 80.17%, 89.76%, and 85.08%, respectively (Table 2). Results obtained in mixed cohort 2 followed the same trend, with specificity, sensitivity, and accuracy of MAST1 being 75.86%, 100%, and 91.95%; of PRDM14 being 89.66%, 86.21%, and 87.36%; and of ZNF177 being 89.66%, 93.10%, and 91.95%, respectively (Table 3).

Prognosis analyzed by K-M plotter
To further explore the clinical value of these biomarkers, we evaluated whether 6 of our differentially methylated genes (ADCY4, CPXM1, DNM3, PRDM14, PRKCB, and ZNF177) had any relation with overall survival of breast cancer patients. Hazard ratios of these 6 genes showed significant differences between the high-expression and the low-expression groups, with high expression of all genes being significantly associated with longer overall survival (Fig. 8).
CHR and MAPINFO represent chromosome and position information; McaM represents the mean methylation percentage of the cases, and McoM represents the mean methylation percentage of the controls; DMR represents differentially methylated region; p value is calculated through the Wilcoxon rank-sum test followed by FDR (false discovery rate) adjustment for multiple correction

Discussion
Breast cancer has extremely high mortality worldwide, mostly due to late diagnosis. Cancer-specific DNA methylation patterns are correlated with gene silencing or activation in several types of cancers [19,20]. Recent studies highlight that aberrant DNA methylation is significantly associated with breast cancer and have demonstrated that DNA methylation analysis may help predict the outcome of patients with breast cancer [12,21]. In this study, we identified differentially methylated genes and confirmed the diagnostic and prognostic value of 6 of these methylation-based biomarkers in breast cancer using a bioinformatics approach. We first identified 23 differentially methylated CpG sites in breast cancer samples as compared to control counterparts. These 23 sites correspond to 9 genes, 8 of which (ADCY4, CPXM1, DNM3, GNG4, MAST1, mir129-2, PRDM14, and ZNF177) had significantly higher levels of DNA methylation in breast cancer. Similarly, methylation levels described in the UALCAN analysis for all these genes were found to be significantly higher in patients with breast cancer, with the exception of mir129-2, which could not be assessed. Further analysis revealed the potential of these 8 differentially methylated genes to accurately predict the outcome of patients in training and validation datasets, suggesting that they could be used as biomarkers for breast cancer diagnosis. Additionally, the combination of 7 of these methylation markers significantly improved our ability to predict the outcome of breast cancer patients. Overall, we found that MAST1, PRDM14, and ZNF177 had high sensitivity, specificity, and accuracy for the diagnosis of breast cancer. Growing evidence shows a strong relationship between epigenetic and genetic aberrations of MAST1, PRDM14 [22], and ZNF177 [17] in tumorigenesis.
Previous studies reported that abnormal MAST1 expression is significantly associated with worse cancer prognosis [23,24]. Oishi et al. found that aberrant promoter demethylation of MAST1 could be responsible for overexpression of this gene in malignant pheochromocytoma and paraganglioma [25]. Other studies have shown that silencing of PRDM14 can suppress the tumorigenicity and metastatic potential of breast cancer cells [26], while methylation-mediated gene silencing of PRDM14 leads to apoptosis evasion in human papillomavirus-positive cancer cells [27]. Several reports have also shown that methylation of ZNF177 is associated with different types of cancer, including gastric and endometrial cancers [28], as well as non-small cell lung carcinoma (NSCLC). ZNF177 is methylation-silenced in gastric cancer cell lines, whereas methylation of its promoter is a frequent epigenetic event in endometrial cancer. Indeed, ROC analysis of ZNF177 has demonstrated that it can identify endometrial carcinoma cases with a sensitivity, specificity, and accuracy of 92.3%, 94.4%, and 95.1%, respectively. Furthermore, hypermethylated CpG islands within ZNF177 were selected as a candidate biomarker for further validation in NSCLC. Nakakido et al. demonstrated that ZNF177 is overexpressed in breast cancer and plays a critical role in cancer cell proliferation [29]. However, the role of MAST1, PRDM14, and ZNF177 in the diagnosis and prognosis of breast cancer remains unclear.
Our findings add a new layer of evidence to the epigenetic landscape of breast cancer, providing convincing clues that MAST1, PRDM14, and ZNF177 are differentially methylated in breast cancer and that they may serve as potential drivers and biomarkers for breast cancer. Furthermore, our study demonstrates that high expression of ADCY4, CPXM1, DNM3, PRDM14, PRKCB, and ZNF177 is significantly associated with longer patient survival. This finding supports the hypothesis that methylation-driven genes are likely to be associated with clinical outcomes in cancer and can be used as potential biomarkers for predicting the outcome of breast cancer patients.
The AUC for the combined signature of 3 genes (MAST1, PRDM14, and ZNF177) using a logistic regression model
Fig. 8 Analysis of differentially methylated genes' prognosis in breast cancer patients using the Kaplan-Meier plotter. Log-rank p < 0.05 was considered statistically significant. HR, hazard ratio; the greater the absolute value of (HR-1), the greater the difference between groups in overall survival

Conclusions
In summary, we have identified and independently validated abnormal DNA methylation patterns in MAST1, PRDM14, and ZNF177 as potential biomarkers for breast cancer diagnosis. Moreover, we showed that the DNA methylation landscape of ADCY4, CPXM1, DNM3, PRDM14, PRKCB, and ZNF177 could provide accurate biomarkers for the prognosis of breast cancer. Overall, these findings provide a novel epigenetic predictive model that may help improve the diagnosis and prognosis of breast cancer.
Additional file 1: Table S1. The information of all samples. Table S2. A total of 23 differentially methylated sites were identified.