haltakov.eth

@haltakov

10 Tweets 1 reads Nov 17, 2021
Can you detect COVID-19 using Machine Learning? 🤔
You have an X-ray or CT scan and the task is to detect if the patient has COVID-19 or not. Sounds doable, right?
None of the 415 ML papers published on the subject in 2020 was usable. Not a single one!
Let's see why 👇
Researchers from Cambridge took all papers on the topic published from January to October 2020.
▪ 2212 papers
▪ 415 after initial screening
▪ 62 chosen for detailed analysis
▪ 0 with potential for clinical use
healthcare-in-europe.com
There are important lessons here 👇
Small datasets 🐁
Getting medical data is hard, because of privacy concerns, and at the beginning of the pandemic, there was just not much data in general.
Many papers were using very small datasets often collected from a single hospital - not enough for real evaluation.
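One way to get a more honest evaluation when data comes from several hospitals is to hold out entire hospitals for testing. Below is a minimal sketch, assuming each record carries a `site` field; the field name, the `split_by_site` helper, and the toy records are all hypothetical.

```python
import random

def split_by_site(records, test_fraction=0.3, seed=0):
    """Split records so that no site appears in both train and test.

    `records` is a list of dicts with a (hypothetical) "site" key.
    """
    sites = sorted({r["site"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(sites)
    # Hold out at least one whole site.
    n_test = max(1, round(len(sites) * test_fraction))
    held_out = set(sites[:n_test])
    train = [r for r in records if r["site"] not in held_out]
    test = [r for r in records if r["site"] in held_out]
    return train, test

# Toy data: 4 made-up hospitals with 5 scans each.
records = [
    {"site": f"hospital_{h}", "scan_id": f"{h}-{i}"}
    for h in "ABCD"
    for i in range(5)
]
train, test = split_by_site(records)
train_sites = {r["site"] for r in train}
test_sites = {r["site"] for r in test}
```

If all the data comes from a single hospital, this split is not possible, which is exactly the limitation described above.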
👇
Biased datasets 🧒🧑‍🦲
Some papers used a dataset that contained non-COVID images from children and COVID images from adults. These methods probably learned to distinguish children from adults... 🤷‍♂️
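A quick sanity check for this kind of bias is to ask how well the label can be guessed from a non-medical attribute alone. The sketch below assumes each row has an `age_group` and a `label` field; both names and the toy counts are made up for illustration.

```python
from collections import Counter, defaultdict

def confound_accuracy(rows, attribute, label="label"):
    """Accuracy of guessing the label from `attribute` alone,
    using the majority label for each attribute value."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[attribute]][row[label]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(rows)

# Toy dataset (invented): children are all non-COVID, adults mostly COVID.
rows = (
    [{"age_group": "child", "label": "non-covid"}] * 50
    + [{"age_group": "adult", "label": "covid"}] * 45
    + [{"age_group": "adult", "label": "non-covid"}] * 5
)

# Baseline: always guess the most common label.
baseline = Counter(r["label"] for r in rows).most_common(1)[0][1] / len(rows)
acc = confound_accuracy(rows, "age_group")
print(f"majority baseline: {baseline:.2f}, age-only guess: {acc:.2f}")
```

When the attribute-only guess is far above the baseline, a model can score well without looking at any sign of disease.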
👇
Training and testing on the same data ❌
OK, you just never do that! Never!
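A minimal sketch of keeping the two sets apart, assuming every scan has a unique ID. The `train_test_split_ids` helper and the ID format are hypothetical.

```python
import random

def train_test_split_ids(ids, test_fraction=0.2, seed=42):
    """Shuffle unique IDs and cut them into disjoint train and test lists."""
    ids = sorted(set(ids))  # drop duplicates before splitting
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]

scan_ids = [f"scan_{i:03d}" for i in range(100)]
train_ids, test_ids = train_test_split_ids(scan_ids)

# Guard against leakage: no scan may appear in both sets.
overlap = set(train_ids) & set(test_ids)
assert not overlap, f"leak: {len(overlap)} scans in both sets"
```

With several scans per patient, the same idea would apply to patient IDs rather than scan IDs, so that one patient never ends up on both sides.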
👇
Unbalanced datasets ⚖️
There are many more non-COVID scans than real COVID cases, but not all papers managed to adequately balance their dataset to account for that.
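To see why this matters, here is a small sketch with invented numbers: a model that always predicts "non-COVID" on a 95/5 split looks great on plain accuracy and useless on balanced accuracy. Inverse-frequency class weights are one common way to compensate during training. The helper names are hypothetical.

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Toy labels: 95 non-COVID (0) and 5 COVID (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a "model" that never predicts COVID

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal = balanced_accuracy(y_true, y_pred)
weights = class_weights(y_true)
print(f"accuracy={accuracy:.2f}, balanced accuracy={bal:.2f}, weights={weights}")
```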
Check out this thread for more details on how to deal with imbalanced data:
👇
Unclear evaluation methodology ⁉️
Many papers failed to disclose how much data they were tested on, or important aspects of how their models work, leading to poor reproducibility and biased results.
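The size of the test set matters because the same score is much less certain on a small test set. As a rough illustration with invented counts, here is a Wilson score interval for a 90% sensitivity measured on 20 versus 1000 positive cases. The numbers are made up; only the formula is standard.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 for roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Same 90% sensitivity, very different test-set sizes.
small_lo, small_hi = wilson_interval(18, 20)
large_lo, large_hi = wilson_interval(900, 1000)
print(f"n=20:   {small_lo:.3f} to {small_hi:.3f}")
print(f"n=1000: {large_lo:.3f} to {large_hi:.3f}")
```

Reporting the test-set size next to each metric lets readers judge this uncertainty for themselves.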
👇
The problem is in the data 💽
The big problem for most methods was the lack of high-quality data and of a deep understanding of the problem - many papers didn't even consult with radiologists.
A high-quality and diverse dataset is more important than your fancy model!
👇
References 🗒️
Full article in Nature: nature.com
More detailed coverage: statnews.com
Source for X-Ray image: bmj.com
This week I'm reposting some of my best threads from the past months, so I can focus on creating my machine learning course.
Next week I'm back with some new content on machine learning and web3, so make sure you follow me @haltakov.
