Development and internal validation of PI-RADs v2-based model for clinically significant prostate cancer

Background Our objective was to build a model based on the Prostate Imaging Reporting and Data System version 2 (PI-RADS v2) and to assess its accuracy by internal validation.
Methods Patients who underwent prostate biopsy from 2014 to 2015 and met the inclusion criteria were retrospectively collected to form the training cohort, and patients biopsied in 2016 formed the validation cohort. Diagnostic performance was evaluated using the area under the curve (AUC), calibration curves, and decision curves.
Results Of the 441 patients included, the detection rates of clinically significant prostate cancer (csPCa) were 40.6% (114/281) and 43.8% (70/160) in the training and validation cohorts, respectively, and the detection rates of PCa were 50.2% (141/281) and 53.8% (86/160). Age, prostate-specific antigen density (PSAD)*10, and PI-RADS v2 score composed the models for PCa (model 1) and csPCa (model 2). The AUCs of models 1 and 2 were 0.870 (95% CI 0.827–0.912) and 0.753 (95% CI 0.678–0.828) in the training cohort and 0.845 (95% CI 0.786–0.904) and 0.834 (95% CI 0.787–0.882) in the validation cohort. Both models showed good calibration, and decision curve analyses showed good performance in predicting PCa or csPCa when the threshold probability was 0.35 or above.
Conclusions The model based on age, PSAD*10, and PI-RADS v2 score showed internally validated high predictive value for both PCa and csPCa. It could be used to improve the diagnostic performance for suspected PCa.


Background
Prostate cancer (PCa) ranks as the second most common malignancy in men and is the second leading cause of cancer-related mortality in Western men [1]. Despite this high morbidity and mortality, advances in early diagnosis have contributed much to improved life expectancy. The conventional screening pathway has relied mainly on elevated prostate-specific antigen (PSA) and abnormal digital rectal examination (DRE). However, both the sensitivity and the specificity of this pathway have been found to be suboptimal for early detection [2].
Multiparametric magnetic resonance imaging (mpMRI) is preferred for visualization of the prostate because of its high soft-tissue contrast, high resolution, and ability to image functional parameters simultaneously [3]. To standardize reporting and propose criteria for interpreting mpMRI data, the European Society of Urogenital Radiology (ESUR) published a reporting system termed Prostate Imaging Reporting and Data System version 1 (PI-RADS v1) in 2012, which was based on four MRI sequences (T2-weighted imaging (T2WI), diffusion-weighted imaging (DWI), dynamic contrast-enhanced MRI (DCE-MRI), and MR spectroscopy) [4]. Although the accuracy and reproducibility of PI-RADS v1 have been validated, it did not specify exactly how to combine the MRI sequences to derive an overall assessment category, which caused confusion in its application. To address this issue, the ESUR and the American College of Radiology agreed on the improved PI-RADS version 2 (PI-RADS v2), released online in 2014 [5]. PI-RADS v2 is intended for diagnostic evaluation as well as risk assessment; the assessment category of transition zone lesions is mainly determined by the T2WI score, while that of peripheral zone lesions is determined by the DWI score [6]. Several studies have validated the high sensitivity and specificity of PI-RADS v2 in the diagnosis of prostate cancer [4][5][6][7], and PI-RADS v2 shows significant improvement over PI-RADS v1.
Several risk calculators for PCa exist, such as the European Randomized Study for Screening of Prostate Cancer Risk Calculator (ERSPC-RC), the Prostate Cancer Prevention Trial Risk Calculator (PCPT-RC), and the Chinese Prostate Cancer Consortium Risk Calculator (CPCC-RC) [8]. All of these have been validated in previous studies. However, none of them incorporates PI-RADS v2. The primary objective of this study was to build a model based on PI-RADS v2 and to assess its accuracy by internal validation.

Study population and data collection
Five hundred forty-three biopsy-naive men with suspicion of PCa (elevated PSA levels and/or suspicious DRE) were collected and registered in a retrospective database after approval by the Ethical Committee of Beijing Friendship Hospital. Transrectal ultrasound (TRUS)-guided 24-core biopsy was performed from January 2014 to December 2016. The exclusion criteria were as follows: (1) patients with urinary tract infection, urinary retention, or persistent catheterization within the past 2 weeks; (2) patients who received 5α-reductase inhibitors within the last 2 months; (3) those older than 90 years or with a PSA level greater than 100 ng/ml; (4) those with a previous history of transurethral resection of the prostate (TURP); and (5) patients without a recorded PSA value, age, or MRI-measured prostate volume (PV). After exclusion, a total of 441 patients were included: 281 patients in the training cohort from 2014 to 2015 and 160 patients in the validation cohort from 2016. All of them underwent mpMRI before biopsy.

mpMRI protocol
Prostate mpMRI was performed at 3 Tesla (T) as recommended [5]. The acquisition protocol included T2WI, T1WI, DWI with an apparent diffusion coefficient (ADC) map and a calculated b value of 1000 or above, and DCE sequences. Each sequence (except DCE) was scored on a five-point scale that graded the level of suspicion for the presence of PCa from 1 to 5 (very low to very high) [5]. The dominant sequence was chosen according to zonal anatomy: DWI was the primary determining sequence for the peripheral zone (PZ), while T2WI was the primary determining sequence for the transition zone (TZ). DCE made a limited contribution, being recorded merely as the presence or absence of early focal enhancement when T2WI and DWI were of adequate diagnostic quality; however, it played a supporting role in indeterminate category 3 PZ lesions. A urologist who was experienced with PI-RADS v2 and blinded to histopathology and clinical data reviewed all the images and performed the scoring.
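The dominant-sequence logic described above can be sketched in a few lines of Python. This is an illustration only, not the scoring tool used in the study: the function name and arguments are hypothetical, and the TZ upgrade rule (T2WI 3 with DWI 5 becomes category 4) is taken from the PI-RADS v2 document rather than from the text above.

```python
def pirads_v2_category(zone: str, t2w: int, dwi: int, dce_positive: bool = False) -> int:
    """Overall PI-RADS v2 category for one lesion (hypothetical helper).

    zone: "PZ" (peripheral zone) or "TZ" (transition zone)
    t2w, dwi: sequence scores from 1 to 5
    dce_positive: early focal enhancement present on DCE
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"{name} score must be an integer from 1 to 5, got {score!r}")

    zone = zone.upper()
    if zone == "PZ":
        # DWI is dominant; DCE only upgrades an indeterminate DWI score of 3.
        return 4 if (dwi == 3 and dce_positive) else dwi
    if zone == "TZ":
        # T2WI is dominant; assumption from the PI-RADS v2 document:
        # a T2WI score of 3 is upgraded to 4 when the DWI score is 5.
        return 4 if (t2w == 3 and dwi == 5) else t2w
    raise ValueError(f"zone must be 'PZ' or 'TZ', got {zone!r}")
```

For example, `pirads_v2_category("PZ", t2w=2, dwi=3, dce_positive=True)` yields category 4, whereas the same lesion without early focal enhancement stays at category 3.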

Histopathological analysis
TRUS-guided systematic 24-core biopsy (20 cores in the PZ and 4 cores in the TZ) was performed within 3 months after MRI. A uropathologist with more than 20 years of experience in urological pathology reviewed the histopathology results and assigned Gleason scores. Clinically significant prostate cancer (csPCa) was defined as Gleason score (GS) ≥ 4 + 3, or GS 3 + 4 with PSA > 10 ng/ml, > 3 positive biopsy cores, or at least one biopsy core with > 50% involvement, according to the Epstein criteria [9].
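The csPCa definition above can be read as a simple decision rule. The sketch below is one possible encoding, assuming that for GS 3 + 4 any single one of the three additional features is sufficient; the class and field names are hypothetical and not taken from the study database.

```python
from dataclasses import dataclass


@dataclass
class BiopsyResult:
    """Per-patient biopsy summary (hypothetical field names)."""
    primary_pattern: int             # primary Gleason pattern, e.g. 3 in "3 + 4"
    secondary_pattern: int           # secondary Gleason pattern, e.g. 4 in "3 + 4"
    psa_ng_ml: float                 # serum PSA in ng/ml
    positive_cores: int              # number of positive biopsy cores
    max_core_involvement_pct: float  # highest tumor involvement of any core, in %


def is_cspca(b: BiopsyResult) -> bool:
    """Return True if the biopsy meets the csPCa definition given in the text."""
    score = b.primary_pattern + b.secondary_pattern
    # GS >= 4 + 3: any sum of 8 or more, or a sum of 7 with primary pattern 4.
    if score >= 8 or (score == 7 and b.primary_pattern == 4):
        return True
    # GS 3 + 4 counts only with an additional feature
    # (assumption: any one of the three is enough).
    if score == 7 and b.primary_pattern == 3:
        return (
            b.psa_ng_ml > 10
            or b.positive_cores > 3
            or b.max_core_involvement_pct > 50
        )
    return False
```

Under this reading, a GS 3 + 4 tumor with PSA 8 ng/ml, two positive cores, and at most 30% core involvement would be counted as PCa but not as csPCa.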

Statistical analysis
Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) with 95% confidence intervals (CI) were calculated to assess the diagnostic accuracy of PI-RADS v2 against histological findings. The independent-samples t test and chi-square test were used to test for significant differences in baseline characteristics. Univariate and multivariate logistic regression were performed to explore the relationship between variables and biopsy results (PCa or csPCa), and multivariate logistic regression models for predicting PCa and csPCa were constructed. The diagnostic performance of the models was assessed with receiver operating characteristic (ROC) curves, and accuracy, measured by the area under the curve (AUC), was compared in the validation cohort. Calibration curves were used to assess the extent of over- or underestimation by the models. Decision curves were applied to determine the clinical net benefit derived from use of the models. A P value less than 0.05 was considered statistically significant. All analyses were performed with SPSS software (version 21.0, IBM) and R version 3.0.0, and the figures were drawn using GraphPad Prism 5.
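As a reminder of how the four accuracy measures follow from a 2 x 2 table of test result against histology, a minimal Python sketch is given below. The study itself used SPSS and R; the function names here are hypothetical, and the Wilson score interval is an assumed choice, since the text does not state which confidence interval method was used.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (assumed CI method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Point estimates from a 2 x 2 table (test positive/negative vs. histology)."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients with a positive test
        "specificity": tn / (tn + fp),  # non-diseased patients with a negative test
        "ppv": tp / (tp + fp),          # positive tests that are truly diseased
        "npv": tn / (tn + fn),          # negative tests that are truly non-diseased
    }
```

A confidence interval for sensitivity, for instance, would be obtained as `wilson_ci(tp, tp + fn)`.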

Characteristics and biopsy outcomes
All enrolled patients were Asian. There were 281 patients in the development cohort and 160 patients in the validation cohort. One hundred forty-one patients (50.2%) and 86 patients (53.8%) were diagnosed with PCa in the development and validation cohorts, respectively (P = 0.06), while 114 patients (40.6%) and 70 patients (43.8%) were diagnosed with csPCa in the two cohorts (P = 0.12). In the training cohort, 94 patients were diagnosed with csPCa on the basis of GS ≥ 4 + 3 and 20 patients on the basis of GS = 3 + 4 with PSA > 10 ng/ml, > 3 positive biopsy cores, or at least one biopsy core with > 50% involvement. In the validation cohort, 66 patients were diagnosed with csPCa on the basis of GS ≥ 4 + 3 and 4 patients on the basis of GS = 3 + 4 with PSA > 10 ng/ml, > 3 positive biopsy cores, or at least one biopsy core with > 50% involvement.

Efficiency of PI-RADS v2 alone in diagnosis of PCa and csPCa
Used alone for the diagnosis of PCa, PI-RADS v2 had a sensitivity of 76.6% and a specificity of 83.6%, with a positive predictive value of 67.9% and a negative predictive value of 73%. For csPCa, the sensitivity and specificity were 85.9% and 63.5%, respectively, with a positive predictive value of 61.6% and a negative predictive value of 86.9%. Table 2 shows the diagnostic performance of PI-RADS v2 alone.

Construction of prediction models
On univariate analysis, all variables were significant predictors of PCa and csPCa (all p < 0.05, Table 3). Multivariate analysis showed that only age, PSAD*10, and PI-RADS v2 score were significantly associated with biopsy results (PCa or csPCa). These parameters were therefore entered into the prediction models for PCa (model 1) and csPCa (model 2). In the development cohort, model 1 achieved an area under the curve (AUC) of 0.870 (95% CI 0.827–0.912) for predicting PCa, and model 2 achieved an AUC of 0.753 (95% CI 0.678–0.828) for predicting csPCa. In the validation cohort, the AUC was 0.845 (95% CI 0.786–0.904) for predicting PCa and 0.834 (95% CI 0.787–0.882) for predicting csPCa (Table 4, Fig. 1). The diagnostic performance of the two models was significantly better than that of each single variable (p < 0.05), as shown in Table 4.
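For orientation, both models can be written in the standard logistic regression form. This is a sketch only: the fitted coefficients are not reported in this section, so the β symbols are placeholders, and entering the PI-RADS v2 score as a single numeric term is an assumption. PSAD is taken in its usual sense of PSA divided by prostate volume.

```latex
\[
\mathrm{PSAD} = \frac{\mathrm{PSA}\ (\text{ng/ml})}{\mathrm{PV}\ (\text{ml})},
\qquad
\ln\frac{p}{1-p}
  = \beta_0
  + \beta_1 \,\mathrm{Age}
  + \beta_2 \,(\mathrm{PSAD} \times 10)
  + \beta_3 \,\mathrm{PIRADS}
\]
\[
p = \frac{1}{1 + \exp\!\bigl[-\bigl(\beta_0 + \beta_1 \,\mathrm{Age}
  + \beta_2 \,(\mathrm{PSAD} \times 10) + \beta_3 \,\mathrm{PIRADS}\bigr)\bigr]}
\]
```

Here p is the predicted probability of PCa (model 1) or csPCa (model 2). Scaling PSAD by 10 means that exp(β2) is the odds ratio per 0.1-unit increase in PSAD, which is presumably why the scaled variable was used.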

Calibration and decision curves of the models
Figure 2 displays the calibration curves of models 1 and 2. On each calibration plot, the predicted risk from the model is represented on the x axis and the actual risk of biopsy-proven PCa or csPCa is represented on the y axis. Within the internal validation cohort, both models showed excellent calibration. Decision curves showed that the models resulted in a higher net benefit when the threshold probability was 0.35 or above for both csPCa and PCa (Fig. 3).
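The net benefit plotted on a decision curve is usually computed as the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability. The Python sketch below shows that calculation under this standard definition; it is not the R code used in the study, and the function names and toy data are hypothetical.

```python
def net_benefit(y_true: list[int], y_prob: list[float], threshold: float) -> float:
    """Net benefit of biopsying patients whose predicted risk is >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true: list[int], threshold: float) -> float:
    """Reference strategy "treat all": every patient is biopsied."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data for illustration only (not study data).
outcomes = [1, 1, 0, 0]
predicted = [0.9, 0.6, 0.4, 0.1]
model_nb = net_benefit(outcomes, predicted, threshold=0.35)
all_nb = net_benefit_treat_all(outcomes, threshold=0.35)
```

The "treat none" strategy has a net benefit of 0 by definition, so a model is useful at a given threshold when its net benefit exceeds both 0 and the "treat all" value.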

Discussion
Our study showed that PI-RADS v2 had a higher sensitivity and negative predictive value for the detection of csPCa than for PCa. The validation provided evidence supporting both models 1 and 2, which were based on PI-RADS v2, age, and PSAD*10, in predicting PCa and csPCa. The performance of the two models was significantly better than that of each single variable. Calibration was good for both PCa and csPCa, and these findings were further supported by decision curve analysis. Several recent studies have validated the diagnostic performance of the PI-RADS v2 scoring system in the detection of csPCa or PCa. Although the outcomes varied among studies, PI-RADS v2 was shown to have high accuracy for predicting csPCa [2-4, 6, 7]. One of these studies [6] reported an AUC for PI-RADS v2 alone of 0.83 for PCa and 0.91 for csPCa, which was higher than ours. A possible reason is that our AUC analysis was based on pathological results and laboratory examinations, whereas theirs was based on lesions. In their study, patients with suspicious findings, defined as at least one lesion with a PI-RADS v1 assessment category of ≥ 3, were selected for biopsy and included in the cohort, which made a great difference. In addition, targeted in-bore MR-guided biopsy helped find more csPCa than our TRUS-guided systematic 24-core biopsy.
Another study [10] showed accurate prediction of high-grade PCa by a PI-RADS v2-based model that also comprised PI-RADS v2, age, and PSAD. Compared with that work, the present study enrolled more patients (441 versus 247) and showed a lower AUC (83% versus 86%). A possible reason is that their biopsies targeted lesions with a PI-RADS v1 sum score > 9, which led to a high detection rate of csPCa. In addition, csPCa in the present study was defined as GS ≥ 4 + 3, or GS 3 + 4 with PSA > 10 ng/ml, > 3 positive biopsy cores, or at least one biopsy core with > 50% involvement; compared with their definition of GS ≥ 7, fewer csPCa cases were observed in our cohort.
Several prediction tools have been developed and validated for use in PCa screening, such as the European Randomized Study for Screening of Prostate Cancer Risk Calculator (ERSPC-RC) and the Prostate Cancer Prevention Trial Risk Calculator (PCPT-RC). Although their variables differ, they are mainly based on age, family history, PSA level, DRE, PV, and previous biopsy status [11,12]. The Chinese Prostate Cancer Consortium Risk Calculator (CPCC-RC), which performed better in prostate biopsy decision making in Chinese and other Asian populations, includes PSA, PV, age, free PSA ratio, and DRE but does not involve family history or prior biopsy [8]. However, none of the risk calculators above takes mpMRI into account. The model established in this study highlighted the dominance of PI-RADS v2 scoring in prediction and showed an AUC of 0.845 (0.786–0.904) for PCa and 0.834 (0.787–0.882) for csPCa in the validation cohort, which outperformed the CPCC-RC (AUC 0.801 and 0.826).
The relationship between PSA screening and PCa has been evaluated in both Chinese and Western populations, and it differs importantly between them [13,14]. A previous study [1] carried out a comprehensive epidemiological analysis of global PCa incidence and mortality using high-quality data. China has an increasing incidence and stable mortality compared with Western countries. Prostate volume has been shown to be higher in Chinese than in Western populations, which could theoretically lead to a higher PSA value and to PCa being missed at biopsy [14]. PSAD, which can eliminate the influence of PV on PSA, has been shown to be a significant predictor for PI-RADS 3-5 lesions [15,16]. A recent study [17] also validated the incremental value of PSAD in combination with PI-RADS for the accuracy of PCa screening and showed that the NPV of PI-RADS could be improved by the inclusion of PSAD and that unnecessary biopsies could be reduced. Even for men with PCa on active surveillance, combining PSAD and PI-RADS score could predict upstaging when the PI-RADS score is ≥ 3 with PSAD > 0.15 [18]. We entered PSAD into the model, and it resulted in excellent diagnostic performance.
Given that mpMRI is becoming an increasingly important aspect of urologic practice [19], there are several reasons to favor this model. First, it combines PI-RADS v2 with the clinical factors PSAD and age, resulting in good clinical performance among both urologists and radiologists. Although moderate interobserver inconsistency still exists, PI-RADS v2 reduces variability in imaging interpretation by establishing guidelines, summarizing suspicion levels, and standardizing reports. Clinical urologists could improve their diagnostic ability by learning the diagnostic process of PI-RADS v2. Second, all patients included in the study received 24-core systematic TRUS-guided biopsy, so the impact of different biopsy methods on tumor detection was avoided. TRUS-guided systematic biopsy has been shown to have overall detection similar to that of MRI-targeted biopsy or MRI-TRUS fusion biopsy [20,21], although its detection rate of csPCa might be lower. Last but not least, the model includes only three variables, which keeps it simple and applicable for both urologists and radiologists, unlike previous models.
This study has several limitations that should be noted. The main limitation is its retrospective single-center design; prospective multicenter external validation is required to better validate the model's accuracy. In addition, our outcomes were based on biopsy-proven Gleason score rather than postprostatectomy pathological grading, which may have resulted in fewer csPCa diagnoses and an underestimate of the predictive accuracy of the model [22,23]. Furthermore, we did not enter DRE, which was previously shown to be an even better predictor than PSA, into the model, because we wanted the model to be as objective as possible. Moreover, DRE was often performed by resident physicians in our center, which led to wide variation in whether results were recorded as positive or negative.
We recommend a further study on how the model would perform if a PI-RADS v2 score of 3, rather than 4 as in the current study, were taken as the threshold. Whether this model could be used to assess the diagnostic concordance of csPCa between biopsy results and postprostatectomy pathological results will be explored in our next study.

Conclusion
The model based on age, PSAD, and PI-RADS v2 score showed internally validated high predictive value for both PCa and csPCa. It could be used to improve the diagnostic performance for suspected PCa. However, further multicenter external validation should be performed before its wide application.

Fig. 3 Decision curve analysis demonstrating the net benefit associated with the use of model 1 (a) and model 2 (b). None means "treat none," and all means "treat all." Model PCa (csPCa) means "treat those with PCa (or csPCa) predicted by the model"