Data Splits
Data is sampled into four splits, with the following use-cases:
- Public Training and Development Dataset (1500 cases):
  Available to all participants and researchers for training and developing AI models. All data is fully anonymized and released under a non-commercial CC BY-NC 4.0 license. Includes 328 cases from the ProstateX Challenge. For all updates/fixes regarding this dataset, please join the challenge and check out our dedicated forum post on this topic.
  Imaging data has been released via: zenodo.org/record/6624726 (DOI: 10.5281/zenodo.6624726)
  Annotations have been released and are maintained via: github.com/DIAGNijmegen/picai_labels
- Private/Sequestered Training Dataset (7607 cases):
  Used exclusively by the organizers to retrain the five top-ranking AI algorithms on large-scale data during the Closed Testing Phase.
- Hidden Tuning Cohort (100 cases):
  Used for a live, public leaderboard that enables model selection and tuning during the Open Development Phase.
- Hidden Testing Cohort (1000 cases):
  Used to determine the top 5 AI algorithms at the end of the Open Development Phase, and to benchmark AI and radiologists and test all hypotheses at the end of the Closed Testing Phase. Includes internal testing data (unseen cases from seen centers) and external testing data (unseen cases from an unseen center). A subset of 400 cases from this cohort is used to facilitate the PI-CAI: Reader Study.
Imaging Data
The complete dataset used for the PI-CAI challenge comprises a cohort of 9000 to 11,000 prostate MRI exams, curated from three Dutch centers {Radboud University Medical Center (RUMC), Ziekenhuis Groep Twente (ZGT), University Medical Center Groningen (UMCG)} and one Norwegian center {Norwegian University of Science and Technology (NTNU)}. Institutional review boards of all four centers have waived the need for informed patient consent, with respect to the retrospective scientific use of anonymized clinical data in this challenge.
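As a quick cross-check, the split sizes listed under Data Splits can be tallied against the cohort size stated here. This is only a sanity check, assuming each listed case corresponds to one exam; the dictionary name `SPLIT_SIZES` is introduced purely for illustration.

```python
# Case counts copied from the Data Splits list.
SPLIT_SIZES = {
    "Public Training and Development Dataset": 1500,
    "Private/Sequestered Training Dataset": 7607,
    "Hidden Tuning Cohort": 100,
    "Hidden Testing Cohort": 1000,
}

total_cases = sum(SPLIT_SIZES.values())
print(total_cases)  # 10207
```

The total of 10,207 falls inside the stated range.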
All patient exams are of men suspected of harboring clinically significant prostate cancer (csPCa), e.g. due to elevated levels of prostate-specific antigen (PSA) or abnormal digital rectal exam (DRE) findings. Patients are included only if they do not have a history of treatment or prior ISUP ≥ 2 findings.
All patient exams include basic clinical variables {patient age, prostate volume, PSA level, PSA density} as reported in their diagnostic reports, basic acquisition variables {scanner manufacturer, scanner model name, diffusion b-value}, and bpMRI scans, acquired using Siemens Healthineers or Philips Medical Systems-based scanners with surface coils. Imaging consists of the following sequences:
- Axial, sagittal and coronal T2-weighted imaging (T2W).
- Axial high b-value (≥ 1000 s/mm²) diffusion-weighted imaging (DWI).
- Axial apparent diffusion coefficient maps (ADC).
⚠️ Absolute intensity values of ADC scans used in the PI-CAI challenge are not universal or clinically meaningful on their own (e.g., unlike Hounsfield units (HU) in CT scans, where -1000 HU will always indicate air), due to non-standardized acquisition protocols across centers and/or inconsistent image scaling (T.L. Chenevert et al., 2014). Furthermore, PI-RADS v2 recommends that absolute ADC values should be used with caution, as these can vary substantially depending on the value and number of b-values selected, the magnet strength, the vendor, and inter-patient variability (T. Barrett et al., 2015).
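Since absolute ADC intensities are not comparable across centers, fixed intensity thresholds are unlikely to transfer between scans. One possible workaround is to rescale each scan using its own statistics. The sketch below illustrates that idea with NumPy; it is not the organizers' preprocessing, and the helper name `normalize_scan` and its percentile bounds are assumptions chosen for illustration.

```python
import numpy as np


def normalize_scan(volume, lower_pct=1.0, upper_pct=99.0):
    """Clip a scan to its own percentile range, then z-score it.

    Hypothetical helper: the percentile bounds are illustrative defaults,
    not values prescribed by the PI-CAI challenge.
    """
    volume = np.asarray(volume, dtype=np.float64)
    lo, hi = np.percentile(volume, [lower_pct, upper_pct])
    clipped = np.clip(volume, lo, hi)
    std = clipped.std()
    if std == 0:
        # Constant image: return zeros instead of dividing by zero.
        return np.zeros_like(clipped)
    return (clipped - clipped.mean()) / std
```

With this kind of per-scan rescaling, two scans that differ only by a positive scale factor and an offset map to the same normalized volume, which is the property that makes it a candidate for inconsistently scaled ADC maps.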
For the Public Training and Development Dataset and the Private/Sequestered Training Dataset:
- Every patient case will have at least three imaging sequences: axial T2W, axial DWI and axial ADC scans (i.e. files ending in _t2w.mha, _hbv.mha, _adc.mha). Additionally, a case can have either, both or none of these optional imaging sequences: sagittal and coronal T2W scans (i.e. files ending in _sag.mha, _cor.mha). No patient case will include dynamic contrast-enhanced (DCE) sequences.
For the Hidden Tuning Cohort and the Hidden Testing Cohort:
- Every patient case will have exactly five imaging sequences: axial, sagittal and coronal T2W; axial DWI and axial ADC scans (i.e. files ending in _t2w.mha, _sag.mha, _cor.mha, _hbv.mha, _adc.mha). For part of the Hidden Testing Cohort, DCE sequences will be available only to radiologists participating in the PI-CAI: Reader Study. They will not be available to AI algorithms at any stage of this grand challenge.
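The sequence requirements above can be checked from file names alone. The following is a minimal sketch, assuming every file of a case sits in one directory and is identified by the suffixes listed above; the directory layout and the function names `find_sequences` and `check_case` are hypothetical and may not match how the released archives are organized.

```python
from pathlib import Path

# Suffixes named in the dataset description.
REQUIRED_SUFFIXES = ("_t2w.mha", "_hbv.mha", "_adc.mha")
OPTIONAL_SUFFIXES = ("_sag.mha", "_cor.mha")


def find_sequences(case_dir):
    """Map each known suffix to the matching file in case_dir, if present."""
    found = {}
    for path in sorted(Path(case_dir).iterdir()):
        for suffix in REQUIRED_SUFFIXES + OPTIONAL_SUFFIXES:
            if path.name.endswith(suffix):
                found[suffix] = path
    return found


def check_case(case_dir, hidden_cohort=False):
    """Return the suffixes that are missing for this case.

    Training cases need the three axial sequences; hidden tuning and
    testing cases are expected to have all five.
    """
    found = find_sequences(case_dir)
    expected = REQUIRED_SUFFIXES + (OPTIONAL_SUFFIXES if hidden_cohort else ())
    return [suffix for suffix in expected if suffix not in found]
```

An empty list from `check_case` means the case satisfies the rule for its cohort. Code written against the training data should treat the sagittal and coronal scans as optional rather than assume they exist.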
To dive deeper into the clinical significance of different prostate MRI sequences, and why they are useful for csPCa detection/diagnosis, feel free to have a look at:
Clinical and Scanner Information
For the Public Training and Development Dataset and the Private/Sequestered Training Dataset:
- PSA⁰, prostate volume⁰, PSA density⁰, patient age^, MRI scanner manufacturer^, MRI scanner model name^ and diffusion b-value of the high b-value DWI/HBV scan^ will be available to every AI algorithm per case.
For the Hidden Tuning Cohort and the Hidden Testing Cohort:
- PSA^, prostate volume^¹, PSA density^², patient age^, MRI scanner manufacturer^, MRI scanner model name^ and diffusion b-value of the high b-value DWI/HBV scan^ will be available to every AI algorithm per case.
⁰ available if the value is reported during clinical routine
¹ if the value is not reported during clinical routine, it is retrospectively calculated by an expert radiologist
² if the value is not reported during clinical routine, it is retrospectively calculated from the PSA and prostate volume
^ always available
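In the training datasets, PSA density may be missing when it was not reported during clinical routine. PSA density is conventionally defined as PSA divided by prostate volume, so a missing value can be derived whenever both inputs are present. The sketch below assumes PSA in ng/mL and prostate volume in mL; the function name `fill_psa_density` and the use of `None` for missing values are illustrative choices, not part of the challenge's data format.

```python
def fill_psa_density(psa, prostate_volume, psa_density=None):
    """Return the reported PSA density, or derive it as PSA / prostate volume.

    Assumed units: PSA in ng/mL and prostate volume in mL, giving a density
    in ng/mL^2. Returns None when the value cannot be derived.
    """
    if psa_density is not None:
        # A value reported during clinical routine takes precedence.
        return psa_density
    if psa is None or prostate_volume is None or prostate_volume <= 0:
        return None
    return psa / prostate_volume
```

For example, `fill_psa_density(7.5, 50.0)` yields 0.15, while a case without a prostate volume yields `None` and has to be handled as missing by the model.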