8 Tweets 7 reads May 02, 2023
4 types of Outliers 🔽
Let's discuss each 🧵
1/8
1️⃣ Procedural error
They are usually caused by data entry errors or mistakes in code or software.
They can cause real trouble, but they are easy to notice because they are prominent.
They can be eliminated most of the time.
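A minimal sketch of how a procedural error might be caught with a plain range check. The `ages` data, the `MIN_AGE`/`MAX_AGE` bounds, and the `split_by_range` helper are all hypothetical, chosen only for illustration; the plausible range for a real variable has to come from domain knowledge.

```python
# Hypothetical hand-entered ages; 250 and -3 stand in for typing mistakes.
ages = [34, 29, 41, 250, 38, -3, 52]

# Assumed plausible range for a human age (an illustration, not a standard).
MIN_AGE, MAX_AGE = 0, 120


def split_by_range(values, low, high):
    """Split values into (valid, suspect) using an inclusive plausible range."""
    valid = [v for v in values if low <= v <= high]
    suspect = [v for v in values if not (low <= v <= high)]
    return valid, suspect


valid_ages, suspect_ages = split_by_range(ages, MIN_AGE, MAX_AGE)
print("kept:", valid_ages)
print("to check or drop:", suspect_ages)
```

Values outside the range are set aside rather than silently deleted, so they can still be corrected at the source if the original record is available.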
2/8
2️⃣ Extraordinary event
These outliers are the result of a unique event.
The consequences depend on the seriousness of the event.
Such an event can cause minor or huge outliers, but they are usually easy to notice.
3/8
Consider this:
We are tracking daily rainfall for a city, and a hurricane that lasts for one week hits the town.
The rainfall level during the hurricane is not normal. The event heavily affected our data collection.
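The rainfall scenario can be sketched roughly as follows. The numbers are made up (30 days of rainfall in mm, with a 7-day hurricane in the middle), and the median/MAD "modified z-score" with a 3.5 cutoff is just one common rule of thumb chosen for this sketch, not the only way to flag such days.

```python
from statistics import median

# Hypothetical daily rainfall in mm; days 11-17 (0-based) are the hurricane week.
rain_mm = [0, 2, 5, 0, 1, 3, 0, 8, 4, 0, 2,
           95, 120, 150, 110, 88, 102, 91,
           6, 0, 3, 1, 0, 7, 2, 0, 4, 5, 1, 0]


def mad_outlier_indices(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: the score is undefined, so flag nothing.
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


hurricane_days = mad_outlier_indices(rain_mm)
print("flagged day indices:", hurricane_days)
```

The median and MAD are used instead of the mean and standard deviation because a full week of extreme values would otherwise inflate the very statistics used to detect it.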
4/8
3️⃣ Extraordinary observation
A unique observation occurs, which is relatively easy to notice, but hard to explain.
Domain knowledge is necessary to decide if the observation should be included in the analysis or not.
5/8
4️⃣ Combination of unique variables
The values for these observations are normal: taken one variable at a time, they are not particularly high or low.
On the other hand, the combinations of the variables result in a unique observation that differs from the population.
6/8
Of course, this is the hardest category to identify and it is almost impossible without domain knowledge.
The big picture is not enough here; you need to go deep.
In these cases, the observation is usually retained and analyzed.
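To make the "normal one by one, unusual in combination" idea concrete, here is a small sketch using the Mahalanobis distance on two variables. The height/weight data and the `mahalanobis_2d` helper are hypothetical, and the distance is only one possible tool for this; it does not replace the domain knowledge mentioned above.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: the last person (155 cm, 85 kg) is within the range of
# both variables, but the pairing is unusual for this sample.
heights_cm = [150, 155, 160, 165, 170, 175, 180, 185, 190, 155]
weights_kg = [50, 55, 60, 65, 70, 75, 80, 85, 90, 85]


def mahalanobis_2d(xs, ys):
    """Mahalanobis distance of each (x, y) pair from the sample mean."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    # Sample variances and covariance (n - 1 in the denominator).
    vxx = sum(a * a for a in dx) / (n - 1)
    vyy = sum(b * b for b in dy) / (n - 1)
    cxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    det = vxx * vyy - cxy ** 2
    # Closed form of d^T * inverse(cov) * d for a 2x2 covariance matrix.
    return [sqrt((vyy * a * a - 2 * cxy * a * b + vxx * b * b) / det)
            for a, b in zip(dx, dy)]


def z_scores(values):
    """Ordinary per-variable z-scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


distances = mahalanobis_2d(heights_cm, weights_kg)
most_unusual = max(range(len(distances)), key=distances.__getitem__)
print("most unusual row:", most_unusual)
print("its height z-score:", round(z_scores(heights_cm)[most_unusual], 2))
print("its weight z-score:", round(z_scores(weights_kg)[most_unusual], 2))
```

In this sample the last row has unremarkable z-scores on each variable yet the largest joint distance, which is the pattern this category describes. Whether such a row is kept or dropped is still a judgment call.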
7/8
That's it for today.
I hope you've found this thread helpful.
Like/Retweet the first tweet below for support and follow @levikul09 for more Data Science threads.
Thanks 😉
8/8
