Projects

This page gives a brief overview of projects that I significantly shaped throughout their entire life cycle. If you're interested in a full list of projects I have been involved in, please check out my CV.

2026

Upcoming
2026
ADNEX2: updating the ADNEX model for surgically and conservatively managed patients
NA
We updated the ADNEX model to target a more clinically relevant population by including patients who were managed conservatively (i.e., not operated) in the training data.
2025
Dec
2025
Research Synthesis Methods
Model calibration may vary across hospitals or datasets, yet existing tools rarely account for this clustering. We propose and compare three approaches—clustered group calibration, two-stage meta-analysis, and mixed-model calibration—validated in simulations and a multi-centre case study (N = 2,489 ovarian tumours). We recommend two-stage meta-analysis with splines for estimating the overall curve and prediction interval, and mixed-model calibration for cluster-specific curves.
Dec
2025
The Lancet Digital Health
We evaluated 32 performance measures across five domains (discrimination, calibration, overall performance, classification, and clinical utility) for validating predictive AI models in medicine. We identified two key properties for selecting appropriate measures: whether the measure is proper and whether it captures statistical vs. decision-analytical performance. Classification measures—including the F1 score—performed poorly in most clinically relevant settings. We recommend AUC, calibration plots, and net benefit as essential reporting measures.
Oct
2025
BMJ Open
We conducted a systematic review and meta-analysis of 11 studies (8,271 tumours) directly comparing the ADNEX and RMI models for ovarian malignancy diagnosis. ADNEX (AUC 0.92) substantially outperformed RMI (AUC 0.85), and was clinically useful in 96% of operated patients compared to 15% for RMI at standard decision thresholds. Most included studies were at high risk of bias due to incomplete reporting and poor methodology.
Jun
2025
arXiv preprint
Clinical prediction models are evaluated at the population level, yet decisions are made for individuals. Using real and synthetic ovarian cancer data, we trained 59,400 model variants and show that individual risk estimates are far more uncertain than commonly assumed—model uncertainty and applicability uncertainty often dominate estimation uncertainty, even for models that perform well overall. We argue that predictive algorithms should inform rather than dictate clinical care.
2024
Sep
2024
Diagnostic and Prognostic Research
Random forests routinely produce near-perfect training c-statistics in clinical risk prediction, raising concerns about overfitting. Through heatmap visualizations and a simulation study across 192 scenarios, we show this arises from local probability spikes around training events—a structural feature of the algorithm. Despite this, test c-statistics remained competitive, though calibration was consistently poor, challenging the recommendation to use fully grown trees when the goal is probability estimation.
Feb
2024
BMJ Medicine
We conducted a systematic review and meta-analysis of 47 external validation studies (17,007 tumours) of the ADNEX model for ovarian cancer diagnosis across 43 centres in 18 countries. The summary AUC was 0.93 (95% PI 0.85–0.98) both with and without CA125, with a 95% estimated probability of clinical usefulness in a new centre. A key limitation was that 91% of validations had a high risk of bias, and model calibration was rarely assessed.
2023
Oct
2023
International Society of Ultrasound in Obstetrics and Gynecology
2022
Nov
2022
Statistical Journal of the IAOS
This was my master's thesis project, where the goal was to estimate missing values of the industrial turnover index in Spain using random forests.
Aug
2022
International Association for Official Statistics