Background & Summary
Prostate cancer is the most prevalent malignancy in males worldwide1, requiring accurate diagnostic and treatment approaches. For diagnosis, the global Prostate Imaging Reporting and Data System (PI-RADS) guidelines are used to assess prostatic lesions on multi-parametric magnetic resonance imaging (mpMRI), with different primary sequences depending on their zonal location2. The anatomical zones of the prostate are defined as the peripheral zone (PZ), central zone (CZ), transition zone (TZ), and anterior fibromuscular stroma (AFS), and present different characteristics and histological features3. Radiotherapy treatment of prostate cancer has traditionally been delivered with a homogeneous dose to the entire prostate. Recent studies have demonstrated the efficacy of a local dose escalation to a subvolume of the prostate, a so-called focal boost4. The relationship between urethral dose and urinary toxicity highlights the importance of limiting the dose to the intraprostatic urethra to minimize side effects from treatment5,6. Additionally, focal boost treatment could potentially be further optimized by using the zonal information to individualize treatment and risk stratification, as cancers in different zones differ in incidence, prognosis, and outcome, making treatment zone-dependent instead of zone-agnostic7.
Manual segmentation of the prostate, its anatomical zones, and the urethra on MRI is tedious and time-consuming. Therefore, the development of an individualized, automatic method to segment the prostate, its internal zones, and the urethra is relevant in current medical practice, with implications both for treatment and diagnosis. The current literature on machine learning methods for automatic segmentation of prostate zonal anatomy on MRI was recently discussed in a topical review8. The review highlighted current limitations in clinical applicability regarding the data used. Terminology and delineated structures were inconsistent, inter-reader variability against which to compare model performance was generally lacking, and most critically, an open external dataset for comparison of model performances was absent.
In this work, we present a dataset with manual segmentations of the prostate and its internal zones, with terminology consistent with the PI-RADS v2.1 guidelines, as well as the prostatic urethra. It comprises 240 segmentations for 200 patients from the PROSTATEx dataset9, totalling 1200 individual structures. For 40 patients, independent duplicate segmentations are provided, presenting inter-reader variability data useful for comparing the performance of automatic segmentation models. This dataset can be used either as a training and testing dataset for machine learning purposes or as an external dataset for models trained on private datasets to allow for unbiased testing of model performances.
To the best of our knowledge, it is the first publicly available dataset containing all the prostatic zones and the first publicly available dataset containing the prostatic urethra delineated on MRI.
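As a quick consistency check of the counts stated above (the figure of five structures per segmentation is inferred from the stated totals and is not listed explicitly in this section):

$$200+40=240\ {\rm{segmentations}},\qquad \frac{1200\ {\rm{structures}}}{240\ {\rm{segmentations}}}=5\ {\rm{structures\ per\ segmentation}}$$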
Methods
Image data
The dataset consists of 200 patients, programmatically randomly selected, from the PROSTATEx dataset available at the Cancer Imaging Archive9,10. Each sample contains mpMRI exams acquired on one of two different Siemens 3 T MR scanners (MAGNETOM Trio and Skyra). T2-weighted (T2w) images were acquired with an in-plane resolution of 0.3 to 0.7 mm and a slice thickness of 3.0 to 4.0 mm, with 0.5 × 0.5 × 3.0 mm being the most frequent resolution (74%). The selected data consisted of patients with (66) and without (134) clinically significant prostate cancer (csPCa), defined as a biopsy-proven Gleason Grade Group ≥2. Other image sequences, as well as the patient selection criteria, are available11.
Segmentations
The prostatic zones and urethra were manually segmented slice by slice on the axial T2-weighted images by two experienced radiologists working in collaboration with three junior colleagues. The two more experienced radiologists had interpreted >1000 and ~500 prostate MRIs, respectively, while the juniors had interpreted <100 prostate MRIs each. All delineations by junior colleagues were checked and, if necessary, corrected by one of the more experienced radiologists. The delineations were based on the zonal description by McNeal3 and the PI-RADS guidelines (v2.1)2. A retrospective dose analysis for focal boost treatment was used as the basis for the urethra12. A short description of the segmented structures is provided in Table 1.
The delineation procedure was performed with 3D Slicer13 v.5.2.2 and started by initially delineating the prostate contour followed by the prostatic urethra, as a circular structure with a 6 mm diameter in each slice. Thereafter, the prostatic zones were generally delineated in the order of PZ, TZ, CZ, and AFS. If the urethra was difficult to delineate in any individual slice, it was considered acceptable to interpolate that slice based on adjacent slices if the result was corrected to follow the desired structure.
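To give a sense of scale for the 6 mm urethra contour, the sketch below rasterizes a disc of that diameter on a single axial slice. It is an illustration only: the delineations in this dataset were drawn manually in 3D Slicer, and the function name `disc_pixels`, the centre position, and the rule of including a pixel when its centre lies within the disc are assumptions made for this example.

```python
import math


def disc_pixels(center_rc, diameter_mm, spacing_mm):
    """Return the set of (row, col) pixels whose centres lie within a disc.

    center_rc   -- disc centre as (row, col) pixel indices (hypothetical input)
    diameter_mm -- disc diameter in mm (6 mm for the urethra in this dataset)
    spacing_mm  -- in-plane pixel spacing as (row_mm, col_mm)
    """
    radius = diameter_mm / 2.0
    row_mm, col_mm = spacing_mm
    # Half-widths of a bounding box (in pixels) that is guaranteed to contain the disc.
    dr_max = math.ceil(radius / row_mm)
    dc_max = math.ceil(radius / col_mm)
    pixels = set()
    for dr in range(-dr_max, dr_max + 1):
        for dc in range(-dc_max, dc_max + 1):
            # Keep the pixel if its centre is at most one radius from the disc centre.
            if math.hypot(dr * row_mm, dc * col_mm) <= radius:
                pixels.add((center_rc[0] + dr, center_rc[1] + dc))
    return pixels


# Example at the most frequent in-plane resolution (0.5 x 0.5 mm).
urethra_slice = disc_pixels(center_rc=(100, 120), diameter_mm=6.0, spacing_mm=(0.5, 0.5))
print(len(urethra_slice))  # 113 pixels
```

At 0.5 mm in-plane spacing the disc is only 13 pixels across, so a shift of a single pixel is a noticeable fraction of the structure.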
Out of the total 200 patients selected, 40 were delineated independently by two experienced radiologists. This resulted in two independent segmentations for all test patients, creating an inter-reader variability baseline, which allows for a comparison of the model performance. After completion, all segmentations were reviewed by a multi-professional team, who performed minor adjustments, removing isolated pixels and harmonizing segmentations between slices in Hero v.2023.1.1 (Hero Imaging AB, Umeå, Sweden; https://www.heroimaging.com/). Two different individuals performed the minor adjustments for the duplicate segmentations to ensure reliability. Subsequently, all segmentations were submitted for final approval from one of the two experienced radiologists, with the duplicates approved by the initial delineator using the online evaluation tool naesView (https://naesview.com). In the end, the resulting dataset includes 160 samples with single segmentations for training, and 40 samples with duplicate segmentations for testing.
Data Records
The dataset is available at Zenodo14.
Each patient segmentation is saved as an NRRD file, which contains image geometry but no patient metadata. The segmentations are named after their respective patients, following the PROSTATEx naming convention. In short, the segmentation for patient 0000 is called Seg-0000, and duplicate segmentations are distinguished by reader as Seg-####_R1 and Seg-####_R2 for reader 1 and reader 2, respectively. If metadata is needed, it can be gathered from the corresponding PROSTATEx DICOM images.
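The naming convention can be handled programmatically. The following is a minimal sketch, assuming four-digit patient numbers (as in the example Seg-0000) and names passed without a file extension; `SegName` and `parse_seg_name` are hypothetical helpers and are not part of the published repository.

```python
import re
from typing import NamedTuple, Optional

# Pattern for the naming convention: Seg-0000, Seg-0000_R1, Seg-0000_R2.
# Assumption: patient numbers are exactly four digits.
_SEG_NAME = re.compile(r"Seg-(\d{4})(?:_R([12]))?")


class SegName(NamedTuple):
    patient: str           # patient number, e.g. "0000"
    reader: Optional[int]  # 1 or 2 for duplicate segmentations, None otherwise


def parse_seg_name(name: str) -> SegName:
    """Parse a segmentation name (without file extension) into patient and reader."""
    match = _SEG_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised segmentation name: {name!r}")
    patient, reader = match.groups()
    return SegName(patient, int(reader) if reader else None)


print(parse_seg_name("Seg-0000"))     # single segmentation, reader is None
print(parse_seg_name("Seg-0123_R2"))  # duplicate segmentation by reader 2
```

Names for which `reader` is not `None` belong to the 40 duplicate samples, which gives a simple way to separate them from the 160 single segmentations.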
A summary file containing information from PROSTATEx about the presence of csPCa in the samples included in this cohort is available in Excel format. Full information is available at the image repository9.
Technical Validation
An example of the delineated structures for a representative patient is displayed in Fig. 1.
Fig. 1 Example segmentation. The structures delineated for one patient displayed in the axial (left), sagittal (top right), and coronal (bottom right) views.
For the 40 duplicate samples, inter-reader variability is presented as mean ± standard deviation in Table 2. The inter-reader variability metrics comprise overlap and boundary measures: the Dice Similarity Coefficient (DSC), the Hausdorff Distance (HD), and the Average Symmetric Surface Distance (ASSD)15. Additionally, the Centre Line Distance (CLD) is provided for the urethra. All metrics are volumetrically based (i.e. not measured slice-wise). They provide context for the consistency of the delineations and can be used when comparing the performance of automatic segmentation methods.
- Dice Similarity Coefficient (DSC):
The DSC measures the volumetric overlap between a reference mask and a prediction,
$${DSC}=\frac{2\left|{V}_{{\rm{ref}}}\cap {V}_{{\rm{pred}}}\right|}{\left|{V}_{{\rm{ref}}}\right|+\left|{V}_{{\rm{pred}}}\right|}$$
where \({V}_{{\rm{ref}}}\) and \({V}_{{\rm{pred}}}\) represent the reference and predicted volumes, respectively.
Hence, DSC is one for a complete overlap between the two volumes and zero for no overlap.
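The DSC definition above can be sketched in a few lines. This is a minimal standard-library illustration, not the Hero workflow used for Table 2; representing each volume as a set of voxel index tuples and returning 1.0 when both volumes are empty are assumptions made for this example.

```python
def dice(ref_voxels, pred_voxels):
    """Dice Similarity Coefficient between two volumes given as voxel index tuples."""
    ref, pred = set(ref_voxels), set(pred_voxels)
    if not ref and not pred:
        # Assumption: two empty volumes are treated as perfect agreement.
        return 1.0
    return 2 * len(ref & pred) / (len(ref) + len(pred))


# Toy volumes: four reference voxels, three predicted voxels, two shared.
v_ref = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
v_pred = {(0, 0, 1), (0, 1, 1), (0, 0, 2)}
print(dice(v_ref, v_pred))  # 2 * 2 / (4 + 3) = 4/7
```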
- Hausdorff Distance (HD):
The HD is a boundary metric that calculates the maximum of all shortest distances for all voxels from one surface to the other,
$${HD}({S}_{{\rm{ref}}},{S}_{{\rm{pred}}})=\max (h\left({S}_{{\rm{ref}}},{S}_{{\rm{pred}}}\right),h\left({S}_{{\rm{pred}}},{S}_{{\rm{ref}}}\right))$$
where \({S}_{{\rm{ref}}}\) and \({S}_{{\rm{pred}}}\) are the reference and predicted surfaces, respectively, as derived from the original volumes, and \(h({S}_{{\rm{ref}}},\,{S}_{{\rm{pred}}})\) is given by,
$$h({S}_{{\rm{ref}}},{S}_{{\rm{pred}}})=\mathop{\max }\limits_{r{\rm{\in }}{S}_{{\rm{ref}}}}\,\mathop{\min }\limits_{p{\rm{\in }}{S}_{{\rm{pred}}}}||r-p||$$
where \(|\left|r-p\right||\) is the Euclidean distance.
Ideally, HD should be as close to zero as possible, but it is sensitive to outliers as the maximum distance of all shortest distances between the surfaces is returned.
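A brute-force sketch of the two HD equations is given below. It assumes the surfaces have already been extracted and are given as lists of surface voxel coordinates in millimetres; the function names are illustrative, and the quadratic cost makes it suitable for small examples only.

```python
import math


def directed_hausdorff(s_a, s_b):
    """h(S_a, S_b): the largest of all nearest-neighbour distances from S_a to S_b."""
    return max(min(math.dist(a, b) for b in s_b) for a in s_a)


def hausdorff(s_ref, s_pred):
    """HD: maximum of the two directed distances."""
    return max(directed_hausdorff(s_ref, s_pred), directed_hausdorff(s_pred, s_ref))


# Toy surfaces (coordinates in mm): one predicted point lies far from the reference.
s_ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
s_pred = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(directed_hausdorff(s_ref, s_pred))  # 1.0
print(directed_hausdorff(s_pred, s_ref))  # 3.0
print(hausdorff(s_ref, s_pred))           # 3.0
```

The single outlying point at 4 mm determines the result on its own, which illustrates the sensitivity to outliers noted above.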
- Average Symmetric Surface Distance (ASSD):
The ASSD measures the average of all distances for every voxel from one surface to the other and vice versa. It is given by,
$${ASSD}\left({S}_{{\rm{ref}}},{S}_{{\rm{pred}}}\right)=\frac{d\left({S}_{{\rm{ref}}},{S}_{{\rm{pred}}}\right)+d({S}_{{\rm{pred}}},{S}_{{\rm{ref}}})}{{N}_{{\rm{ref}}}+{N}_{{\rm{pred}}}}$$
where \({S}_{{\rm{ref}}}\) and \({S}_{{\rm{pred}}}\) as well as \({N}_{{\rm{ref}}}\,\) and \({N}_{{\rm{pred}}}\) are the reference and predicted surfaces and their number of surface voxels, respectively. The distance \(d({S}_{{\rm{ref}}},{S}_{{\rm{pred}}})\) is determined as,
$$d({S}_{{\rm{ref}}},{S}_{{\rm{pred}}})=\sum _{r\in {S}_{{\rm{ref}}}}\,\mathop{\min }\limits_{p{\rm{\in }}{S}_{{\rm{pred}}}}||r-p||$$
where \(|\left|r-p\right||\) is the Euclidean distance.
Like HD, ASSD would ideally be as close to zero as possible, although it is less sensitive to outliers as it represents an average over all shortest distances between the voxels of the two surfaces.
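The ASSD equations translate directly into a short brute-force sketch. As an assumption for this example, each surface is given as a list of surface voxel coordinates in millimetres; the function names are illustrative and this is not the Hero workflow used for Table 2.

```python
import math


def summed_surface_distance(s_a, s_b):
    """d(S_a, S_b): sum of nearest-neighbour distances from every point of S_a to S_b."""
    return sum(min(math.dist(a, b) for b in s_b) for a in s_a)


def assd(s_ref, s_pred):
    """Average Symmetric Surface Distance between two non-empty point lists."""
    if not s_ref or not s_pred:
        raise ValueError("both surfaces must contain at least one point")
    total = summed_surface_distance(s_ref, s_pred) + summed_surface_distance(s_pred, s_ref)
    return total / (len(s_ref) + len(s_pred))


# Toy surfaces (coordinates in mm) with one outlying predicted point.
s_ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
s_pred = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(assd(s_ref, s_pred))  # (0 + 1 + 0 + 3) / 4 = 1.0
```

For these toy surfaces the HD is 3.0 mm while the ASSD is 1.0 mm, since the outlying point is averaged with the three smaller distances.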
- Centre Line Distance (CLD):
The CLD is calculated as the ASSD for two skeletonized structures, meaning the delineations are reduced to a single voxel per slice.
The CLD should ideally be as close to zero as possible as this indicates smaller deviations between the skeletonized delineations.
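A possible reading of the CLD is sketched below. The skeletonization method is not specified here, so the sketch assumes that the single point per slice is the in-plane centroid of the delineation, converted to millimetres using the voxel spacing; this choice, the `(slice, row, col)` voxel indexing, and all function names are assumptions for illustration.

```python
import math
from collections import defaultdict


def centre_line(voxels, spacing_mm):
    """Reduce a structure to one point per axial slice (assumed here: the slice centroid).

    voxels     -- iterable of (slice, row, col) voxel indices
    spacing_mm -- voxel spacing as (slice_mm, row_mm, col_mm)
    """
    by_slice = defaultdict(list)
    for z, r, c in voxels:
        by_slice[z].append((r, c))
    z_mm, r_mm, c_mm = spacing_mm
    points = []
    for z, rc in sorted(by_slice.items()):
        mean_r = sum(r for r, _ in rc) / len(rc)
        mean_c = sum(c for _, c in rc) / len(rc)
        points.append((z * z_mm, mean_r * r_mm, mean_c * c_mm))
    return points


def assd(points_a, points_b):
    """ASSD between two non-empty point lists (brute force)."""
    if not points_a or not points_b:
        raise ValueError("both point lists must be non-empty")
    d_ab = sum(min(math.dist(a, b) for b in points_b) for a in points_a)
    d_ba = sum(min(math.dist(b, a) for a in points_a) for b in points_b)
    return (d_ab + d_ba) / (len(points_a) + len(points_b))


def centre_line_distance(ref_voxels, pred_voxels, spacing_mm):
    """CLD: ASSD between the two centre lines."""
    return assd(centre_line(ref_voxels, spacing_mm), centre_line(pred_voxels, spacing_mm))


# Toy urethra delineations on two slices, 0.5 x 0.5 x 3.0 mm voxels.
spacing = (3.0, 0.5, 0.5)
ref = {(0, 10, 10), (0, 10, 12), (1, 10, 10), (1, 10, 12)}
pred = {(0, 10, 14), (1, 10, 14)}
print(centre_line_distance(ref, pred, spacing))  # 1.5 (mm)
```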
For a more detailed analysis of the inter-reader variability, a confusion matrix is presented in Table 3. The matrix shows the segmentations of Reader 1 along the rows and Reader 2 along the columns. Discrepancies are indicative of individual tendencies in the segmentation approach of each reader. Reader 1 tends to delineate a prostate that is 5% larger than that delineated by Reader 2. This enlargement is predominantly observed in the size of the PZ and CZ, which explains most of the differences observed in the matrix for these structures and the background. Reader 2 outlines a larger AFS, which then overlaps with the TZ as classified by Reader 1. Other discrepancies show less reader-specific tendencies, and most disagreements are between adjacent structures, presumably along their borders. This is an anticipated outcome given the absence of visually distinct boundaries, underscoring the complexity of the task.
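A confusion matrix of this kind can be computed by counting voxel-wise label pairs. The sketch below assumes each reader's segmentation has been flattened into a sequence of per-voxel labels in the same voxel order; the label names and their order are hypothetical, and the sketch returns raw voxel counts without any normalization.

```python
from collections import Counter

# Hypothetical label names and order; the encoding in the actual files may differ.
LABELS = ["Background", "PZ", "TZ", "CZ", "AFS", "Urethra"]


def confusion_matrix(reader1, reader2, labels=LABELS):
    """Voxel counts with Reader 1 labels along the rows and Reader 2 along the columns."""
    if len(reader1) != len(reader2):
        raise ValueError("both label sequences must cover the same voxels")
    counts = Counter(zip(reader1, reader2))
    return [[counts[(r1, r2)] for r2 in labels] for r1 in labels]


# Toy example with six voxels.
r1 = ["PZ", "PZ", "TZ", "TZ", "AFS", "Background"]
r2 = ["PZ", "Background", "TZ", "AFS", "AFS", "Background"]
for label, row in zip(LABELS, confusion_matrix(r1, r2)):
    print(f"{label:<10}", row)
```

In the toy example, the voxel labelled TZ by Reader 1 and AFS by Reader 2 appears off the diagonal in the TZ row, mirroring the type of disagreement described above.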
Usage Notes
For the simplest usage of the dataset, a description with Python code is available (https://github.com/UMU-DDI/ProstateZones) of how to extract the relevant images from the PROSTATEx dataset and to set up a folder structure containing images and their corresponding segmentations upon download.
Code availability
No code is needed to use the data, but a Python script for simple structuring is available (https://github.com/UMU-DDI/ProstateZones). The GitHub repository includes a requirements.txt file, a Python script, and other complementary files. Additionally, for full transparency, the GitHub repository includes the Hero workflow used to calculate the inter-reader variability metrics.
References
Ferlay, J. et al. Global Cancer Observatory: Cancer Today (version 1.1). Lyon, France: International Agency for Research on Cancer. Available from: https://gco.iarc.who.int/today, accessed 22 March 2024.
Turkbey, B. et al. Prostate imaging reporting and data system version 2.1: 2019 update of prostate imaging reporting and data system version 2. European urology 76, 340–351 (2019).
McNeal, J. E. Normal histology of the prostate. The American journal of surgical pathology 12, 619–633 (1988).
Kerkmeijer, L. G. W. et al. Focal Boost to the Intraprostatic Tumor in External Beam Radiotherapy for Patients With Localized Prostate Cancer: Results From the FLAME Randomized Phase III Trial. J Clin Oncol 39, 787–796, https://doi.org/10.1200/JCO.20.02873 (2021).
Draulans, C. et al. Primary endpoint analysis of the multicentre phase II hypo-FLAME trial for intermediate and high risk prostate cancer. Radiother Oncol 147, 92–98, https://doi.org/10.1016/j.radonc.2020.03.015 (2020).
Leeman, J. E. et al. Radiation Dose to the Intraprostatic Urethra Correlates Strongly With Urinary Toxicity After Prostate Stereotactic Body Radiation Therapy: A Combined Analysis of 23 Prospective Clinical Trials. Int J Radiat Oncol Biol Phys 112, 75–82, https://doi.org/10.1016/j.ijrobp.2021.06.037 (2022).
Ali, A. et al. Prostate zones and cancer: lost in transition? Nat Rev Urol 19, 101–115, https://doi.org/10.1038/s41585-021-00524-7 (2022).
Wu, C. et al. Automatic segmentation of prostate zonal anatomy on MRI: a systematic review of the literature. Insights Imaging 13, 202, https://doi.org/10.1186/s13244-022-01340-2 (2022).
Litjens, G., Debats, O., Barentsz, J., Karssemeijer, N., & Huisman, H. ProstateX Challenge data. The Cancer Imaging Archive. https://doi.org/10.7937/K9TCIA.2017.MURS5CL (2017).
Clark, K. et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging 26, 1045–1057, https://doi.org/10.1007/s10278-013-9622-7 (2013).
Litjens, G., Debats, O., Barentsz, J., Karssemeijer, N. & Huisman, H. Computer-aided detection of prostate cancer in MRI. IEEE Trans Med Imaging 33, 1083–1092, https://doi.org/10.1109/TMI.2014.2303821 (2014).
Groen, V. H. et al. Urethral and bladder dose-effect relations for late genitourinary toxicity following external beam radiotherapy for prostate cancer in the FLAME trial. Radiother Oncol 167, 127–132, https://doi.org/10.1016/j.radonc.2021.12.027 (2022).
Fedorov, A. et al. 3D Slicer as an image computing platform for the Quantitative Imaging Network. Magn Reson Imaging 30, 1323–1341, https://doi.org/10.1016/j.mri.2012.05.001 (2012).
Holmlund, W. et al. ProstateZones – Segmentations of the prostatic zones and urethra for the PROSTATEx dataset. Zenodo. https://doi.org/10.5281/zenodo.10718469 (2024).
Reinke, A. et al. Common limitations of image processing metrics: A picture story. arXiv preprint arXiv:2104.05642 (2021).
Funding
Open access funding provided by Umeå University.
Author information
Authors and Affiliations
Umeå University, Department of Diagnostics and Intervention, Umeå, Sweden
William Holmlund, Attila Simkó, Karin Söderkvist, Patrik Brynolfsson & Tufve Nyholm
University of Szeged, Albert Szent-Györgyi Medical School, Department of Radiology, Szeged, Hungary
Péter Palásti, Szilvia Tótin, Kamilla Kalmár, Zsófia Domoki, Zsuzsanna Fejes & Zsigmond Tamás Kincses
Skåne University Hospital, Department of Haematology, Oncology and Radiation Physics, Lund, Sweden
Patrik Brynolfsson
Authors
- William Holmlund
- Attila Simkó
- Karin Söderkvist
- Péter Palásti
- Szilvia Tótin
- Kamilla Kalmár
- Zsófia Domoki
- Zsuzsanna Fejes
- Zsigmond Tamás Kincses
- Patrik Brynolfsson
- Tufve Nyholm
Contributions
Conceptualization: W.H., T.N., P.B., K.S. Data creation: P.P., S.T., K.K., Z.D., Z.F. Quality assurance: W.H., A.S., K.S., P.P., Z.F. Drafting of manuscript: W.H. All authors (W.H., A.S., K.S., P.P., S.T., K.K., Z.D., Z.F., T.Z.K., P.B., T.N.) critically reviewed and approved the final manuscript.
Corresponding authors
Correspondence to William Holmlund or Tufve Nyholm.
Ethics declarations
Competing interests
T.N. and P.B. are co-owners of, and A.S. is employed by, Hero Imaging AB, which develops the software used during evaluations.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Holmlund, W., Simkó, A., Söderkvist, K. et al. ProstateZones – Segmentations of the prostatic zones and urethra for the PROSTATEx dataset. Sci Data 11, 1097 (2024). https://doi.org/10.1038/s41597-024-03945-2
DOI: https://doi.org/10.1038/s41597-024-03945-2