What is the Gold Standard?
Ever seen two highly skilled radiologists argue about an imaging diagnosis? Of course! It happens all the time.
In fact, this scenario occurs in all medical fields, especially where human visual assessment is required. Below is a graph from a famous 2020 paper showing the variability among skilled ophthalmologists in grading diabetic retinopathy by fundoscopic exam. Similar results can be found in the radiology, pathology, and dermatology literature.
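One common way to put a number on this kind of reader disagreement is Cohen's kappa, which measures agreement between two graders after correcting for the agreement expected by chance. The sketch below is only an illustration: the grades are made-up values on an assumed 0 to 4 retinopathy severity scale, not data from the paper mentioned above, and the function name `cohen_kappa` is our own.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must grade the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: fraction of cases where the two grades match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement if each rater assigned grades independently,
    # using each rater's own grade frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical grade for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades for ten fundus images (assumed 0-4 severity scale).
grader_1 = [0, 1, 2, 2, 3, 0, 4, 1, 2, 0]
grader_2 = [0, 2, 2, 1, 3, 0, 3, 1, 2, 1]

kappa = cohen_kappa(grader_1, grader_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this invented example the two graders agree on 6 of 10 images, yet kappa is only about 0.48, because some of that agreement would be expected by chance alone. A "Gold Standard" built from a single reader inherits this uncertainty.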
Labeling Errors in Large Data Sets
Here, the first question to ask is: who labeled the data, and how? Humans, computers, or both? Board-certified specialist radiologists, or others?
Notwithstanding the Gold Standard issues described above, there is often appreciable true diagnostic overlap between pathologies in a given image. Is the retrocardiac opacity on a chest x-ray pneumonia, atelectasis, pleural effusion, or all three?
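Because findings can genuinely coexist, one way a data set can represent such a case is with a multi-label ("multi-hot") encoding, where each finding is marked present or absent independently instead of forcing a single diagnosis per image. The sketch below is a hypothetical illustration; the finding list and the helper `encode_multi_hot` are assumptions for this example and do not describe the labeling scheme of any particular data set.

```python
# Hypothetical, fixed ordering of the findings used in this example.
FINDINGS = ["pneumonia", "atelectasis", "pleural_effusion"]


def encode_multi_hot(present):
    """Return a 0/1 vector with one entry per finding in FINDINGS."""
    unknown = set(present) - set(FINDINGS)
    if unknown:
        raise ValueError(f"Unknown findings: {sorted(unknown)}")
    return [1 if finding in present else 0 for finding in FINDINGS]


# A retrocardiac opacity judged to reflect all three processes at once.
retrocardiac_opacity = encode_multi_hot(
    {"pneumonia", "atelectasis", "pleural_effusion"}
)

# An image where the reader commits to a single finding.
single_finding = encode_multi_hot({"atelectasis"})

print(retrocardiac_opacity)
print(single_finding)
```

A single-label scheme would force the first case into one category, so a model trained on it could be penalized for an answer that is clinically defensible.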
Technical Errors in Large Data Sets
Very large data sets, particularly those obtained from multiple sites, frequently contain wrongly stored or labeled images. For example, careful analysis of the original NIH data base of 108,948 frontal chest x-rays (CXR8) found almost 500 major technical errors.
Fortunately, more modern versions of this data base (CXR14) have been scrubbed for technical errors, but they still contain ambiguous or disputed labels.
Biases in the Training Data Base
Computer "Short Cuts"
References
Abdalla M, Fine B. Hurdles to artificial intelligence deployment: noise in schemas and “Gold” labels. Radiology: Artificial Intelligence 2023; 5(2):e220056.
Faghani S, Khosravi B, Zhang K, et al. Mitigating bias in radiology machine learning: 3. Performance metrics. Radiology: Artificial Intelligence 2022; 4(5):e220061.
Rouzrokh P, Khosravi B, Faghani S, et al. Mitigating bias in radiology machine learning: 1. Data handling. Radiology: Artificial Intelligence 2022; 4(5):e210290.
Tymchenko B, Marchenko P, Spodarets D. Deep learning approach to diabetic retinopathy detection. arXiv:2003.02261v1 (3 Mar 2020).
Wang X, Peng Y, Lu Z, et al. ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017: 3462-3471.
Zhang K, Khosravi B, Vahdati S, et al. Mitigating bias in radiology machine learning: 2. Model development. Radiology: Artificial Intelligence 2022; 4(5):e220010.