Predicting Spontaneous Pneumothorax Recurrence with Machine Learning: A Synthetic Example
DOI: https://doi.org/10.5281/zenodo.18680928
Keywords: Pneumothorax, machine learning, prediction model
Abstract
Aim: Recurrence after primary spontaneous pneumothorax (PSP) remains clinically relevant and may influence the intensity of follow-up and the choice of interventions. Reported recurrence rates vary widely across cohorts. Machine learning (ML) can complement conventional risk stratification by combining multiple predictors into an individualized probability estimate.
Methodology: We generated a synthetic dataset of 1,000 patients with a 12-month recurrence prevalence of 50% to demonstrate an end-to-end supervised ML workflow. Predictors were constructed to mimic common clinical and imaging-derived variables (age, sex, smoking exposure, bleb size, emphysema score, prior pneumothorax, treatment strategy, and a muscle-mass proxy). We compared penalized logistic regression with a random forest classifier using a stratified train/test split. Model performance was assessed by discrimination (ROC-AUC), overall performance (Brier score), calibration (intercept and slope), and clinical utility (decision curve analysis, DCA).
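The Methodology can be sketched end to end in plain Python using only the standard library. This is a minimal illustration under stated assumptions, not the study's code: the effect sizes in `make_cohort`, the 70/30 split, the L2 (ridge) form of the penalty, the penalty strength `lam`, and all helper names are hypothetical, and the random forest comparator is omitted because a library implementation would normally be used. Drawing the outcome first and then shifting each predictor's distribution by outcome is one simple way to fix prevalence at exactly 50%.

```python
import math
import random

# Hypothetical feature order; all effect sizes below are illustrative assumptions.
FEATURES = ["age", "male", "smoker", "bleb_size", "emphysema",
            "prior_ptx", "surgery", "muscle_mass"]


def make_cohort(n=1000, prevalence=0.5, seed=42):
    """Draw the outcome first (so prevalence is exact), then draw predictors
    whose distributions shift with the outcome (assumed effect sizes)."""
    rng = random.Random(seed)
    n_pos = round(n * prevalence)
    ys = [1] * n_pos + [0] * (n - n_pos)
    rng.shuffle(ys)

    def bern(prob):
        return 1.0 if rng.random() < prob else 0.0

    rows = []
    for y in ys:
        rows.append([
            rng.gauss(-0.2 if y else 0.2, 1.0),   # standardized age
            bern(0.8),                             # male sex (no effect assumed)
            bern(0.6 if y else 0.4),               # smoking exposure
            rng.gauss(0.5 if y else 0.0, 1.0),     # standardized bleb size
            rng.gauss(0.3 if y else 0.0, 1.0),     # emphysema score
            bern(0.4 if y else 0.2),               # prior pneumothorax
            bern(0.3 if y else 0.5),               # surgical treatment (assumed protective)
            rng.gauss(-0.3 if y else 0.0, 1.0),    # muscle-mass proxy
        ])
    return rows, ys


def stratified_split(ys, test_frac=0.3, seed=0):
    """Split indices so train and test keep the same outcome prevalence."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in (0, 1):
        idx = [i for i, y in enumerate(ys) if y == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam=1.0, lr=0.5, epochs=300):
    """Minimize mean log-loss + lam/(2n) * ||w||^2 by batch gradient descent.
    The intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * (gj + lam * wj) / n for wj, gj in zip(w, gw)]
    return w, b


X, y = make_cohort()
train, test = stratified_split(y)
w, b = fit_ridge_logistic([X[i] for i in train], [y[i] for i in train])
for name, coef in zip(FEATURES, w):
    print(f"{name:12s} {coef:+.3f}")
```

Stratifying by outcome keeps the 50% prevalence in both the 700 training and 300 test patients of this sketch.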
Results: On the held-out test set, logistic regression achieved ROC-AUC 0.7633 and Brier score 0.1989; the random forest achieved ROC-AUC 0.7501 and Brier score 0.2055. Calibration intercept/slope were -0.0910/1.1853 for logistic regression and -0.0438/1.2649 for the random forest. Both models showed positive net benefit at decision thresholds of 0.30 and 0.50.
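Each measure reported in the Results can be computed from test-set outcomes and predicted probabilities. The sketch below assumes common definitions (Steyerberg et al., 2010; Vickers & Elkin, 2006): ROC-AUC as the probability that a random recurrence is ranked above a random non-recurrence, the Brier score as mean squared error, the calibration intercept as calibration-in-the-large (logit(p) entered as an offset), the calibration slope as the coefficient of logit(p) in a logistic recalibration model, and net benefit at threshold t as TP/n − (FP/n)·t/(1 − t). The study does not state which definitions or software it used, so these are assumptions; the toy data and the simulation are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid infinite log-odds
    return math.log(p / (1.0 - p))


def roc_auc(y, p):
    """Probability that a random positive is ranked above a random negative
    (ties count 1/2). O(n_pos * n_neg), fine for a few hundred patients."""
    pos = [pi for pi, yi in zip(p, y) if yi == 1]
    neg = [pi for pi, yi in zip(p, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def _recalibrate(lp, y, fit_slope, iters=20):
    """Newton-Raphson fit of y ~ a + b * lp; b stays 1 (offset) unless fit_slope."""
    a, b = 0.0, 1.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for l, yi in zip(lp, y):
            q = sigmoid(a + b * l)
            r, wt = yi - q, q * (1.0 - q)
            ga, gb = ga + r, gb + r * l
            haa, hab, hbb = haa + wt, hab + wt * l, hbb + wt * l * l
        if fit_slope:
            det = haa * hbb - hab * hab
            a += (hbb * ga - hab * gb) / det
            b += (haa * gb - hab * ga) / det
        else:
            a += ga / haa
    return a, b


def calibration_intercept_slope(y, p):
    """Intercept: calibration-in-the-large (slope fixed at 1).
    Slope: coefficient of logit(p) in a logistic recalibration model."""
    lp = [logit(pi) for pi in p]
    intercept, _ = _recalibrate(lp, y, fit_slope=False)
    _, slope = _recalibrate(lp, y, fit_slope=True)
    return intercept, slope


def net_benefit(y, p, t):
    """Decision-curve net benefit of treating patients with p >= t."""
    n = len(y)
    tp = sum(1 for pi, yi in zip(p, y) if pi >= t and yi == 1)
    fp = sum(1 for pi, yi in zip(p, y) if pi >= t and yi == 0)
    return tp / n - (fp / n) * t / (1.0 - t)


def net_benefit_treat_all(y, t):
    """Reference strategy: treat every patient regardless of prediction."""
    prev = sum(y) / len(y)
    return prev - (1.0 - prev) * t / (1.0 - t)


# Toy example (hypothetical labels and predictions).
y_toy, p_toy = [0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_toy, p_toy))           # 0.75
print(brier_score(y_toy, p_toy))       # approximately 0.158
print(net_benefit(y_toy, p_toy, 0.3))  # approximately 0.393

# Simulated check: outcomes drawn from true_p; "shrunk" predictions are too moderate.
rng = random.Random(1)
true_p = [rng.uniform(0.05, 0.95) for _ in range(5000)]
y_sim = [1 if rng.random() < q else 0 for q in true_p]
shrunk = [sigmoid(0.5 * logit(q)) for q in true_p]
print(calibration_intercept_slope(y_sim, true_p))  # slope close to 1
print(calibration_intercept_slope(y_sim, shrunk))  # slope close to 2
```

The simulated check illustrates how to read slopes above 1, as reported for both models: halving every prediction's log-odds (pulling probabilities toward 0.5) yields a slope near 2, so a slope above 1 points to predictions that are somewhat too moderate rather than too extreme.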
Conclusion: This synthetic example illustrates key practical steps (data preparation, model training, evaluation, and reporting) and common pitfalls (data leakage, overfitting, and miscalibration). For real-world deployment, transparent reporting and external validation are essential.
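Data leakage, one of the pitfalls named above, often enters through preprocessing: statistics such as column means and standard deviations must be learned on the training split and then applied unchanged to the test split. The minimal sketch below assumes simple z-score standardization (the abstract does not say which preprocessing was used); `fit_scaler` and `apply_scaler` are hypothetical helpers, and the toy rows are invented.

```python
import statistics


def fit_scaler(rows):
    """Learn per-column mean and population SD from the rows passed in.
    Call this on training rows only; fitting on train+test leaks test data."""
    params = []
    for col in zip(*rows):
        sd = statistics.pstdev(col)
        params.append((statistics.fmean(col), sd if sd > 0 else 1.0))
    return params


def apply_scaler(rows, params):
    """Standardize rows with previously learned (mean, sd) pairs."""
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]


# Toy example (hypothetical values): columns are age in years and a 0/1 smoking flag.
train_rows = [[20.0, 1.0], [30.0, 0.0], [40.0, 1.0]]
test_rows = [[80.0, 0.0]]  # an atypical test patient

params = fit_scaler(train_rows)               # learned from training data only
train_z = apply_scaler(train_rows, params)
test_z = apply_scaler(test_rows, params)      # test set reuses training statistics

leaky = fit_scaler(train_rows + test_rows)    # WRONG: the test patient shifts the mean
print(params[0][0], leaky[0][0])              # 30.0 42.5
```

The same rule applies to imputation, feature selection, and hyperparameter tuning: any step that learns from data belongs inside the training split.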
References
1. Sadikot, R. T., Greene, T., Meadows, K., & Arnold, A. G. (1997). Recurrence of primary spontaneous pneumothorax. Thorax, 52(9), 805–809. https://doi.org/10.1136/thx.52.9.805
2. Topol, E. J. (2019). High-performance medicine: The convergence of human and artificial intelligence. Nature Medicine, 25, 44–56. https://doi.org/10.1038/s41591-018-0300-7
3. Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C., Corrado, G., Thrun, S., & Dean, J. (2019). A guide to deep learning in healthcare. Nature Medicine, 25, 24–29. https://doi.org/10.1038/s41591-018-0316-z
4. Bellini, V., Valente, M., Del Rio, P., & Bignami, E. (2021). Artificial intelligence in thoracic surgery: A narrative review. Journal of Thoracic Disease, 13(12), 6963–6975. https://doi.org/10.21037/jtd-21-761
5. Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H., & Aerts, H. J. W. L. (2018). Artificial intelligence in radiology. Nature Reviews Cancer, 18, 500–510. https://doi.org/10.1038/s41568-018-0016-5
6. Lambin, P., Leijenaar, R. T. H., Deist, T. M., Peerlings, J., de Jong, E. E. C., van Timmeren, J., Sanduleanu, S., Larue, R. T. H. M., Even, A. J. G., Jochems, A., van Wijk, Y., Woodruff, H., van Soest, J., Lustberg, T., Roelofs, E., van Elmpt, W., Dekker, A., Mottaghy, F. M., Wildberger, J. E., & Walsh, S. (2017). Radiomics: The bridge between medical imaging and personalized medicine. Nature Reviews Clinical Oncology, 14, 749–762. https://doi.org/10.1038/nrclinonc.2017.141
7. Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324
8. Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M., Obuchowski, N., Pencina, M. J., & Kattan, M. W. (2010). Assessing the performance of prediction models: A framework for traditional and novel measures. Epidemiology, 21(1), 128–138. https://doi.org/10.1097/EDE.0b013e3181c30fb2
9. Vickers, A. J., & Elkin, E. B. (2006). Decision curve analysis: A novel method for evaluating prediction models. Medical Decision Making, 26(6), 565–574. https://doi.org/10.1177/0272989X06295361
10. Collins, G. S., Reitsma, J. B., Altman, D. G., & Moons, K. G. M. (2015). Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMJ, 350, g7594. https://doi.org/10.1136/bmj.g7594
License
Copyright (c) 2026 OKAN KARATAŞ

This work is licensed under a Creative Commons Attribution 4.0 International License.