Bojan Tunguz

@tunguz

6 Tweets Dec 27, 2022
This past week I came across another paper that purports to get the SOTA for NNs for tabular data. Due to the extreme penchant for exaggeration in this community, I have given up on checking most of these claims, but decided to take a look at this particular work.
1/6
I decided to check how XGBoost *really* performs on the datasets used in the paper, and the results were not pretty.
2/6
The main takeaway: for all three datasets used in the paper, the reported performance of XGBoost was wildly inaccurate, and its real performance was much better than their best results.
3/6
This lack of careful consideration is unfortunately extremely prevalent in the “NNs for tabular data” research community.
4/6
One gets the feeling that the whole community has an actual disdain for tabular data, and contempt for doing all the important and necessary background work to get a better understanding of their subject matter.
5/6
Read more in my latest Medium post: medium.com/@tunguz/another-deceptive-nn-for-tabular-data-the-wild-unsubstantiated-claims-about-constrained-f9450e911c3f
The three XGBoost notebooks that I used can be found below:
Loan dataset: github.com
Blog dataset: github.com
COMPAS dataset: github.com
#DataScience #MachineLearning #DS #ML #AI
6/6
