Clinical value of miR-198-5p in lung squamous cell carcinoma assessed using microarray and RT-qPCR

Background: To examine the clinical value of miR-198-5p in lung squamous cell carcinoma (LUSC).
Methods: Gene Expression Omnibus (GEO) microarray datasets were used to explore miR-198-5p expression and its diagnostic value in LUSC. Real-time reverse transcription quantitative polymerase chain reaction (RT-qPCR) was used to evaluate the expression of miR-198-5p in 23 formalin-fixed, paraffin-embedded (FFPE) LUSC tissues and corresponding non-cancerous tissues. The correlation between miR-198-5p expression and clinicopathological features was assessed. Meanwhile, putative target messenger RNAs of miR-198-5p were identified based on the analysis of differentially expressed genes in the Cancer Genome Atlas (TCGA) and 12 miRNA prediction tools. Subsequently, the putative target genes were subjected to Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses.
Results: MiR-198-5p was expressed at low levels in LUSC tissues. The combined standardized mean difference (SMD) values of miR-198-5p expression based on GEO datasets were −0.30 (95% confidence interval (CI) −0.54, −0.06) and −0.39 (95% CI −0.83, 0.05) using the fixed effect model and the random effect model, respectively. The sensitivity and specificity were not sufficiently high, as the area under the curve (AUC) was 0.7749 (Q* = 0.7143) based on summarized receiver operating characteristic (SROC) curves constructed using GEO datasets. Based on the in-house RT-qPCR, miR-198-5p expression was 4.3826 ± 1.7660 in LUSC tissues and 4.4522 ± 1.8263 in adjacent normal tissues (P = 0.885). The expression of miR-198-5p was significantly higher in patients with early TNM stages (I-II) than in those with advanced TNM stages (III-IV) (5.4400 ± 1.5277 vs 3.5690 ± 1.5228, P = 0.008). Continuous variable-based meta-analysis of GEO and PCR data yielded SMD values of −0.26 (95% CI −0.48, −0.04) and −0.34 (95% CI −0.71, 0.04) based on the fixed and random effect models, respectively. As for the diagnostic value of miR-198-5p, the AUC based on the SROC curve using GEO and PCR data was 0.7351 (Q* = 0.6812). In total, 542 genes were identified as the targets of miR-198-5p. The most enriched Gene Ontology terms were epidermis development among biological processes, cell junction among cellular components, and protein dimerization activity among molecular functions. The pathway of non-small cell lung cancer was the most significant pathway identified using Kyoto Encyclopedia of Genes and Genomes analysis.
Conclusion: The expression of miR-198-5p is related to the TNM stage. Thus, miR-198-5p might play an important role via its target genes in LUSC.


Background
Lung cancer ranks first among all cancers in terms of incidence, and it is also the leading cause of cancer death worldwide [1]. Non-small cell lung cancer (NSCLC) accounts for around 85% of all lung cancers, with 30% of NSCLC cases being classified as lung squamous cell carcinoma (LUSC) [2][3][4][5][6][7]. Many patients are diagnosed with NSCLC at an advanced stage, which contributes to the high mortality [8]. Therefore, more effective biomarkers are urgently needed in LUSC.
MicroRNAs (miRNAs) are ∼22-nt long endogenous RNAs that play significant roles in various cellular processes [9]. Previous studies have shown that miRNAs can target mRNAs involved in most developmental processes and are thus associated with many diseases [10]. Moreover, miRNAs have also been found to be involved in cancer [11]. Studies have found that miR-198-5p plays a vital role in many human cancers, including lung cancer [12]. A previous study [12] investigated the relationship between FGFR1 and miR-198-5p. However, the expression pattern of miR-198-5p in LUSC remains unknown. Additionally, the prospective target genes of miR-198-5p in LUSC have not yet been identified. Therefore, the relationship between miR-198-5p and lung squamous cell carcinoma as well as the underlying mechanism remains unknown.
To evaluate the clinical significance of miR-198-5p in LUSC, we examined miR-198-5p expression in LUSC tissues and carried out additional specific analyses to uncover its clinicopathological role. Furthermore, we performed big data analysis based on the Gene Expression Omnibus (GEO) and the Cancer Genome Atlas (TCGA). Subsequently, bioinformatics examinations were conducted to investigate the probable mechanism of miR-198-5p in LUSC.

Retrieval of data and publications from TCGA and GEO
The flowchart representing the main design of our study is shown in Fig. 1. We downloaded miRNA expression data from the Cancer Genome Atlas (TCGA) associated with LUSC. All data were converted to a log2 scale. We also retrieved data from the Gene Expression Omnibus (GEO) and ArrayExpress to assess the expression pattern of miR-198-5p in LUSC and corresponding non-tumor samples. The search terms were as follows: ("lung" OR "pulmonary" OR "respiratory" OR "bronchioles" OR "bronchi" OR "alveoli" OR "pneumocytes" OR "air way" [MeSH]) AND ("cancer" OR "carcinoma" OR "tumor" OR "neoplas" OR "malignan" "squamous cell carcinoma" OR "adenocarcinoma" [MeSH]) OR/AND ("MicroRNA" OR "miRNA" OR "MicroRNA" OR "Small Temporal RNA" OR "noncoding RNA" OR "ncRNA" OR "small RNA" [MeSH]). Datasets with expression levels of miR-198-5p in LUSC and corresponding non-tumor samples were included. Other types of tumor or other miRNA were excluded. The number of samples in the tumor and non-tumor groups was at least three. The expression level of miR-198-5p in the datasets was converted to a log2 scale. The number, mean, and standard deviation of miR-198-5p levels in the tumor and non-tumor groups were calculated. We also searched PubMed, Web of Science, Science Direct, Google Scholar, Ovid, LILACS, Wiley Online Library, EMBASE, Cochrane Central Register of Controlled Trials, Chong Qing VIP, CNKI, Wan Fang, and China Biology Medicine disc; however, no publications regarding miR-198-5p expression in LUSC were found in these databases.

RT-qPCR
After the formalin-fixed, paraffin-embedded (FFPE) sections were dewaxed, total RNA was extracted from these sections with the miRNeasy FFPE Kit (QIAGEN) according to the manufacturer's instructions. The concentration of RNA was measured using a Nanodrop 2000. MiR-191 (CAACGGAAUCCCAAAAGCAGCU) and miR-103 (AGCAGCAUUGUACAGGGCUAUGA) were used as stably expressed control miRNAs as previously reported [13]. An Applied Biosystems 7900 PCR system was used to perform real-time quantitative PCR and detect the expression of miR-198-5p (GGUCCAGAGGGGAGAUAGGUUC). The relative expression of miR-198-5p was calculated using the formula 2^(−ΔCq).
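The 2^(−ΔCq) calculation can be illustrated with a minimal sketch. The text does not state how the two control miRNAs were combined, so averaging their Cq values is an assumption here, and the function name and the Cq values are hypothetical.

```python
from statistics import mean


def relative_expression(cq_target, cq_references):
    """Return 2^(-dCq), with dCq = Cq(target) - mean Cq(reference miRNAs).

    Assumption: the reference Cq is the arithmetic mean of the control
    miRNAs (e.g. miR-191 and miR-103); the source does not specify this.
    """
    delta_cq = cq_target - mean(cq_references)
    return 2.0 ** (-delta_cq)


# Hypothetical Cq values for a single sample (not taken from the study).
value = relative_expression(cq_target=30.0, cq_references=[27.0, 29.0])
print(value)  # dCq = 30 - 28 = 2, so 2^-2 = 0.25
```

A lower Cq for the target relative to the controls gives a larger relative expression value, which is why the sign of ΔCq is inverted in the exponent.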

Statistical analysis
Paired sample t test and independent sample t test were performed using SPSS 23.0 to determine the association between miR-198-5p expression and various clinicopathological parameters based on real-time RT-qPCR and microarray data. P < 0.05 was regarded as being statistically significant. Receiver operating characteristic (ROC) curves were constructed using SPSS 23.0.
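As a rough illustration of what the area under a ROC curve measures, the sketch below computes it directly from two groups of expression values. It is not the SPSS procedure used in the study; it assumes that lower miR-198-5p expression indicates tumor tissue, and the function name and sample values are hypothetical.

```python
def roc_auc_lower_is_tumor(tumor, normal):
    """AUC for the rule 'lower expression indicates tumor'.

    Equals the probability that a randomly chosen tumor sample has a
    lower value than a randomly chosen normal sample; ties count as 0.5.
    """
    wins = 0.0
    for t in tumor:
        for n in normal:
            if t < n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(normal))


# Hypothetical log2 expression values (not taken from the study).
tumor_values = [1.0, 2.0, 3.0]
normal_values = [2.0, 4.0, 5.0]
print(roc_auc_lower_is_tumor(tumor_values, normal_values))
```

An AUC of 0.5 corresponds to no discrimination between the groups, and 1.0 to perfect separation.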
Fig. 6 Pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic score, and odds ratio obtained using MetaDisc 1.4 based on GEO datasets
Concerning the meta-analysis based on all accessible data, we used Stata 14 to determine the combined expression value of miR-198-5p in tumor and non-tumor groups and its relationship with both the standardized mean difference (SMD) and the summarized receiver operating characteristic (SROC). The fixed effect model was initially used. The random effect model was used when heterogeneity was detected. The data were considered heterogeneous when I² > 50%. Subgroup analysis and sensitivity analysis were carried out to identify the source of heterogeneity. We tested publication bias with funnel plots. SPSS 23.0 was employed to explore the diagnostic value of miR-198-5p in LUSC based on GEO data. Then, we performed a diagnostic meta-analysis using MetaDiSc 1.4. Meta-regression and threshold effect analysis were performed to determine the source of heterogeneity. The data from TCGA were not included in the analysis because of missing data.
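The pooling logic described above (fixed effect first, random effect when I² exceeds 50%) can be sketched as follows. This is not the Stata routine used in the study: it assumes Cohen's d with a common large-sample variance approximation and the DerSimonian-Laird estimator for the between-study variance, and all function names and input values are hypothetical.

```python
import math


def smd(n1, m1, s1, n2, m2, s2):
    """Cohen's d (group 1 minus group 2) and an approximate variance."""
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var


def pool(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects pooled estimates."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    # Cochran's Q and the I^2 heterogeneity statistic (in percent).
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Between-study variance tau^2 (DerSimonian-Laird).
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    random = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum(w)),
        "random": random,
        "random_se": math.sqrt(1.0 / sum(w_re)),
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical per-dataset summaries: (n, mean, sd) for tumor, then non-tumor.
datasets = [(20, 4.1, 1.2, 20, 4.6, 1.1), (15, 3.9, 1.5, 12, 4.0, 1.4)]
pairs = [smd(*row) for row in datasets]
result = pool([d for d, _ in pairs], [v for _, v in pairs])
model = "random" if result["I2"] > 50.0 else "fixed"
print(model, result[model])
```

A 95% confidence interval follows from the pooled estimate plus or minus 1.96 times the corresponding standard error.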

Differentially expressed mRNAs in LUSC based on TCGA
The expression level of each mRNA, transformed into the log2 scale, was evaluated using the DESeq R package. We obtained 9860 differentially expressed genes in LUSC, including 6092 upregulated and 3768 downregulated genes.

Selection of putative target genes of miR-198-5p
Predictions were conducted in silico with miRWalk 2.0 (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/). Genes that were predicted by more than 5 of the 12 online prediction tools were selected for further analysis. The selected genes were cross-referenced with the differentially expressed genes in TCGA. The overlapping genes were considered the putative targets of miR-198-5p in LUSC.
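The selection rule above amounts to a vote count followed by a set intersection, as in the minimal sketch below. The data structure, function name, tool names, and gene labels are all hypothetical; the actual miRWalk 2.0 output format is not reproduced here.

```python
def select_putative_targets(predictions, differentially_expressed, min_tools=6):
    """Keep genes predicted by at least `min_tools` tools that are also
    differentially expressed.

    `predictions` maps each gene to the set of tools that predicted it.
    'More than 5 of the 12 tools' corresponds to min_tools=6.
    """
    supported = {
        gene for gene, tools in predictions.items() if len(tools) >= min_tools
    }
    return supported & set(differentially_expressed)


# Hypothetical inputs (placeholder tool and gene names).
tools = [f"tool_{i}" for i in range(1, 13)]
predictions = {
    "GENE_A": set(tools[:6]),   # 6 tools: passes the threshold
    "GENE_B": set(tools[:5]),   # 5 tools: fails the threshold
    "GENE_C": set(tools),       # 12 tools, but not differentially expressed
}
deg = {"GENE_A", "GENE_B", "GENE_D"}
print(select_putative_targets(predictions, deg))  # {'GENE_A'}
```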

Bioinformatics analyses
Gene Ontology (GO) annotation via DAVID (https://david.ncifcrf.gov/) was performed, including biological processes (BP), cellular components (CC), and molecular functions (MF). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with the putative target genes were also analyzed in DAVID. The results of GO annotation and KEGG pathway analysis were visualized using BiNGO and EnrichmentMap plugins in Cytoscape version 3.5.0. We used STRING (https://string-db.org/) to build interaction maps of the proteins encoded by the putative target genes.

Validation of the putative target genes in the most significant KEGG pathway based on TCGA data
We selected the genes in the most significant KEGG pathway "non-small cell lung cancer" for further analysis. The differences in the expression levels of E2F2, E2F3, TGFA, PRKCG, CDK6, EGF, and CDK4 between LUSC and non-tumor tissues were analyzed based on TCGA data.

GEO data mining to determine the expression and diagnostic value of miR-198-5p
For the meta-analysis of miR-198-5p expression based on GEO data, eight datasets were included in our study. The scatter plots based on the GEO datasets are shown in Fig. 2. Forest plots using both the fixed effect model (Fig. 3a) and the random effect model (Fig. 3b) show the expression level of miR-198-5p in LUSC. The combined effect sizes were −0.30 (95% CI −0.54, −0.06) and −0.39 (95% CI −0.83, 0.05) based on the fixed and random effect models, respectively. Subgroup analysis showed that there was no heterogeneity among the studies from Asia (I² = 0.0%) (Fig. 3c). The corresponding funnel plot is shown in Fig. 4a (P > 0.05). Sensitivity analysis (Fig. 4b) showed that the dataset GSE40738 might be a source of heterogeneity. The forest plot after the removal of GSE40738 is shown in Fig. 4c. The adjusted combined SMD value was −0.56 (95% CI −0.86, −0.27) with I² = 33.1%. The receiver operating characteristic curves based on the included datasets from the GEO database are shown in Fig. 5. The pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and diagnostic odds ratio were 0.47 (95% CI 0. 40 6). The area under the curve (AUC) based on the summarized receiver operating characteristic (SROC) curve was 0.7749 (Q* = 0.7143) (Fig. 7). We did not find a threshold effect of miR-198-5p in the study (P = 0.058). Only study region was determined to be a covariate in the meta-regression, and thus it was likely not a source of heterogeneity (P = 0.0550) (Table 1).

Clinical value of miR-198-5p in LUSC assessed using RT-qPCR
Using RT-qPCR, the expression of miR-198-5p was 4.3826 ± 1.7660 in the LUSC tissues, compared with 4.4522 ± 1.8263 in the non-tumor tissues (P = 0.885) (Fig. 8a, b). The other clinicopathological features of the LUSC cases are shown in Table 2. Notably, the expression level of miR-198-5p was 5.4400 ± 1.5277 in patients with early TNM stage (I-II), compared with 3.5690 ± 1.5228 in patients with advanced TNM stage (III-IV) (P = 0.008) (Fig. 8c, d).

Discussion
MiR-198-5p was clearly under-expressed in LUSC tissues in comparison with non-cancer lung tissues. Patients with LUSC in Asia expressed lower levels of miR-198-5p than did the healthy controls; however, the expression pattern in other regions was unclear. Our RT-qPCR indicated that the expression of miR-198-5p might be related to the tumor TNM stage, which suggests that miR-198-5p likely plays a role in tumor growth, lymph node metastasis, or distant metastasis. However, the downregulation of miR-198-5p was not obvious in our in-house RT-qPCR analysis. The diagnostic validation and meta-analysis based on GEO and RT-qPCR data indicated that miR-198-5p might be a biomarker of LUSC, but the sensitivity and specificity were not sufficiently high. Similarly, study region may be a source of heterogeneity, which suggested that the diagnostic screening might be suitable for Asian populations but not for others.
Fig. 11 Pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic score, and odds ratio obtained using MetaDisc 1.4 based on GEO datasets and in-house RT-qPCR results
Apart from LUSC, several studies have explored the expression pattern and mechanism of miR-198-5p in other diseases. MiR-198-5p has been reported to be upregulated in multiple myeloma [14], chronic pancreatitis or pancreatic ductal adenocarcinoma [15], Parkinson's disease [16], esophageal cancer [17], preeclampsia [18], pancreatic adenocarcinoma, ampullary adenocarcinoma [19], lupus nephritis [20], retinoblastoma [21], anencephaly [22], and squamous cell carcinoma of tongue [23]. On the other hand, low expression of miR-198-5p has been found in prostate cancer [24], breast cancer [25], glioblastoma [26,27], hepatocellular carcinoma [28], especially hepatitis C virus-associated hepatocellular carcinoma [29,30], osteosarcoma [31], gastric cancer [32], colorectal cancer [33], pancreatic cancer [34], and respiratory syncytial virus (RSV) infection [35]. The overexpression of miR-198-5p has also been documented in CD8+ T cells in renal cell carcinoma [36]. In prostate cancer, a recent study indicated that miR-198-5p is targeted by the long noncoding RNA SChLAP1, leading to the activation of the MAPK1 pathway, thereby promoting cancer cell proliferation and metastasis [24]. Another study suggested that miR-198-5p may be involved in prostate cancer [37]. In hepatocellular carcinoma, miR-198-5p has been shown to target the HGF/c-MET pathway [38]. Several studies have revealed that the expression of miR-198-5p is closely related to lymph node metastasis or distant metastasis in different malignant diseases, such as breast cancer [25], osteosarcoma [31], gastric cancer [32], and colorectal cancer [33]. Some studies have also shown that miR-198-5p is closely related to cell proliferation, apoptosis, and migration [12,14,39,40]. The relationship between miR-198-5p and cancer prognosis is controversial [15,17,27,32,33,34]. Thus, we
In the case of lung adenocarcinoma, two reports have verified that miR-198-5p is under-expressed [12,41]. However, studies on the characteristics of miR-198-5p in LUSC are lacking. One study assessed the diagnostic significance of miR-198-5p in lung adenocarcinoma, with sensitivity = 71.1%, specificity = 95.2%, and AUC = 0.887 (95% CI 0.801, 0.945) [41]. Our study highlighted the diagnostic value of miR-198-5p in LUSC. Yang et al. showed that miR-198-5p was able to suppress proliferation and promote apoptosis in lung cancer cells by targeting FGFR1 [12], and Wu et al. showed that miR-198-5p promotes apoptosis, represses cell proliferation, and leads to cell cycle arrest in lung adenocarcinoma cells by directly targeting SHMT1 [39]. Our study showed that the expression pattern and diagnostic value of miR-198-5p varied according to the race of the patient population, which should be further validated in larger samples.
Although many studies use prediction tools to determine miRNA target genes, relying on too few prediction tools can lead to unreliable data. We used 12 online prediction tools based on miRWalk 2.0, and this method had not been previously utilized in LUSC. The predicted genes were cross-referenced with the differentially expressed genes in TCGA, which further enhanced the specificity and accuracy of our investigation. Because miR-198-5p is downregulated in LUSC, we chose the upregulated genes from TCGA. Via bioinformatics analyses, the putative target genes of miR-198-5p were most significantly enriched in the KEGG pathways of non-small cell lung cancer, pathways in cancer, pancreatic cancer, glioma, and ECM-receptor interactions. For the GO biological processes, the putative target genes of miR-198-5p were involved in epidermis development, negative regulation of neuron apoptotic processes, positive regulation of cell proliferation, the canonical Wnt signaling pathway, and nervous system development, which indicated that the putative target genes might regulate epidermis proliferation, cell apoptosis, or carcinoma of nervous tissues. In addition, for the GO cellular components, the putative target genes of miR-198-5p were enriched in cell junction, transcription factor complex, synaptic vesicle, cell surface proteins, and synapses, which are related to migration, metastasis, and intercellular exchange of molecules. In terms of the GO molecular functions, the putative target genes of miR-198-5p were associated with protein dimerization activity, calcium ion binding, hydrolase activity on carbon-nitrogen (but not peptide) bonds, frizzled binding, and RNA polymerase II transcription co-activator activity.
Fig. 18 Protein-protein interaction (PPI) networks with 537 nodes and 848 edges were constructed using STRING. The PPI enrichment P value was 1.7E−14. Disconnected nodes were hidden in the network
To verify the accuracy of our analysis, we selected several genes and determined their expression based on TCGA. The genes included in the non-small cell lung cancer pathway were E2F2, E2F3, TGFA, PRKCG, CDK6, EGF, and CDK4, which were all expressed at significantly higher levels in LUSC tissues than in the non-cancer group. Thus, these genes are probably the targets of miR-198-5p. The putative target genes of miR-198 should be validated further in the future.