I decided to check how XGBoost *really* performs on the datasets used in the paper, and the results were not pretty.
2/6
The main takeaway: for all three datasets used in the paper, the reported performance of XGBoost was wildly inaccurate, and the real performance was much better than their best results.
3/6
This lack of careful consideration is unfortunately extremely prevalent in the “NNs for tabular data research community.”
4/6
One gets the feeling that the whole community has an actual disdain for tabular data, and contempt for doing all the important and necessary background work to get a better understanding of their subject matter.
5/6
Read more in my latest Medium post (medium.com): @tunguz/another-deceptive-nn-for-tabular-data-the-wild-unsubstantiated-claims-about-constrained-f9450e911c3f
The three XGBoost notebooks that I used can be found below:
Loan dataset: github.com
Blog dataset: github.com
COMPAS dataset: github.com
#DataScience #MachineLearning #DS #ML #AI
6/6