Researchers from Cambridge reviewed all papers on the topic published from January to October 2020.
→ 2212 papers
→ 415 after initial screening
→ 62 chosen for detailed analysis
→ 0 with potential for clinical use
healthcare-in-europe.com
There are important lessons here:
Small datasets
Getting medical data is hard because of privacy concerns, and at the beginning of the pandemic there was just not much data in general.
Many papers used very small datasets, often collected from a single hospital, which is not enough for a reliable evaluation.
Biased datasets
Some papers used a dataset that contained non-COVID images from children and COVID images from adults. These methods probably learned to distinguish children from adults...
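A cheap sanity check is to compare simple metadata statistics between the classes before training. A minimal sketch in plain Python; the dataset and ages below are purely illustrative, not from any of the reviewed papers:

```python
# Hypothetical (label, patient_age) metadata for each scan -- illustrative only.
dataset = [("covid", 54), ("covid", 61), ("covid", 47),
           ("non-covid", 6), ("non-covid", 9), ("non-covid", 7)]

# Group ages by class label.
ages_by_label = {}
for label, age in dataset:
    ages_by_label.setdefault(label, []).append(age)

# A large gap in mean age between classes is a red flag: the model may
# end up learning to predict age instead of disease.
mean_age = {label: sum(a) / len(a) for label, a in ages_by_label.items()}
print(mean_age)  # covid ~54, non-covid ~7 -> heavily confounded
```

The same check works for any metadata that should be balanced across classes: scanner model, hospital, image resolution, and so on.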
Training and testing on the same data
OK, you just never do that! Never! Evaluating on the training data measures memorization, not generalization, so the reported numbers say nothing about new patients.
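The fix is a held-out test set that is completely disjoint from the training data. A minimal sketch in plain Python (in practice you would use something like scikit-learn's `train_test_split`, and for medical images you should split by patient, not by image, so that no patient appears in both sets):

```python
import random

def holdout_split(samples, test_fraction=0.2, seed=42):
    """Shuffle once, then carve off a test set the model never sees in training."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(range(100))
assert not set(train) & set(test)  # zero overlap between train and test
```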
Unbalanced datasets
There are many more non-COVID scans than real COVID cases, but not all papers managed to balance their datasets adequately to account for that.
Check out this thread for more details on how to deal with imbalanced data:
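One common remedy, besides over- or undersampling, is to weight the loss inversely to class frequency. A minimal sketch using the "balanced" heuristic (n_samples / (n_classes * class_count)); the 10/90 split below is illustrative:

```python
from collections import Counter

def class_weights(labels):
    """Weight each class inversely to its frequency so the rare class
    (e.g. COVID-positive scans) contributes as much to the loss as the common one."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}

labels = ["covid"] * 10 + ["non-covid"] * 90
weights = class_weights(labels)
print(weights)  # the rare class gets a proportionally larger weight
```

These weights would then be passed to the loss function during training (most frameworks accept per-class weights directly).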
Unclear evaluation methodology
Many papers failed to disclose how much data they tested on, or important aspects of how their models work, leading to poor reproducibility and biased results.
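At a minimum, a paper should report how the test set was built plus per-class metrics, because overall accuracy is misleading on imbalanced data. A sketch of computing sensitivity and specificity from raw predictions (the labels below are illustrative):

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="covid"):
    """Count TP/FP/FN/TN so sensitivity and specificity can be reported,
    not just overall accuracy (which is misleading on imbalanced data)."""
    c = Counter()
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            c["tp"] += 1
        elif t != positive and p == positive:
            c["fp"] += 1
        elif t == positive and p != positive:
            c["fn"] += 1
        else:
            c["tn"] += 1
    return c

y_true = ["covid", "covid", "non-covid", "non-covid", "non-covid"]
y_pred = ["covid", "non-covid", "non-covid", "non-covid", "covid"]
c = confusion_counts(y_true, y_pred)
sensitivity = c["tp"] / (c["tp"] + c["fn"])  # fraction of COVID cases caught
specificity = c["tn"] / (c["tn"] + c["fp"])  # fraction of healthy cases cleared
```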
The problem is in the data
The big problem for most methods was the availability of high-quality data and a deep understanding of the problem: many papers didn't even consult radiologists.
A high-quality and diverse dataset is more important than your fancy model!
References
Full article in Nature: nature.com
More detailed coverage: statnews.com
Source for X-Ray image: bmj.com
This week I'm reposting some of my best threads from the past months, so I can focus on creating my machine learning course.
Next week I'm back with some new content on machine learning and web3, so make sure you follow me @haltakov.