Marco Giordano

@GiordMarco96

20 Tweets Apr 03, 2023
SQL vs Pandas vs Tidyverse.
Ok cool, now every #SEO talks about BigQuery.
Then why do I keep telling you SQL is not really the answer?
Let me explain why Data Analysis goes through coding 🧵
First of all, if you have some decent experience with SQL, you'd know it's a pain in the arse.
Simple commands are easy but complex queries are hard to debug.
You could close the thread here, as this is the main reason.
SQL is needed when you deal with large databases, like 10M rows and 20 columns.
This is not Big Data but it's enough to make you use SQL.
There are visible improvements in speed, you can easily tell.
However, most SEO data isn't that big and only a minority of people work with super large datasets.
If you are reading this, chances are you mostly work with normal cases.
If so, you need other things, not SQL knowledge!
Pandas and Tidyverse are respectively Python and R libraries for data analysis.
I use Pandas because, you know, Python is more widespread in the SEO community.
In theory, Tidyverse has the most solid syntax of the 3.
Pandas and Tidyverse win on code readability and ease of use, which is often all you care about.
Speed isn't the only factor as you need to complete the task first.
Python/R have libraries that SQL can only dream of.
Querying databases isn't really flexible compared to this duo.
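To make the ease-of-use point concrete, here is a minimal Pandas sketch. The table is a made-up Search Console style export: the column names (query, page, clicks, impressions) and all the numbers are assumptions for illustration, not real data.

```python
import pandas as pd

# Hypothetical Search Console style export; values are invented for the example.
gsc = pd.DataFrame(
    {
        "query": ["seo tools", "seo tools", "bigquery seo", "bigquery seo", "pandas seo"],
        "page": ["/tools", "/blog/tools", "/bigquery", "/bigquery", "/pandas"],
        "clicks": [120, 30, 45, 15, 10],
        "impressions": [2000, 1000, 900, 300, 500],
    }
)

# Aggregate per query, derive CTR, and sort, all in one readable chain.
summary = (
    gsc.groupby("query", as_index=False)
    .agg(clicks=("clicks", "sum"), impressions=("impressions", "sum"))
    .assign(ctr=lambda d: d["clicks"] / d["impressions"])
    .sort_values("clicks", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

Each step can be run and inspected on its own in a notebook, which is most of what makes this easier to debug than a long nested query.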
So yeah, in most cases you don't need SQL.
It's a subpar choice since you have to bring this data elsewhere, like a dashboard/notebook/sheet.
Some basic knowledge is still worth having, as it takes about 1 week to learn basic SQL.
Advanced options may not be required.
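A rough sketch of what "basic SQL is enough" can look like: a simple SELECT pulls the rows, and the rest happens in Pandas. The in-memory SQLite database stands in for a real warehouse, and the table name gsc, its columns, and the values are all assumptions for the example.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gsc (query TEXT, clicks INTEGER, impressions INTEGER)")
conn.executemany(
    "INSERT INTO gsc VALUES (?, ?, ?)",
    [("seo tools", 120, 2000), ("bigquery seo", 45, 900), ("pandas seo", 10, 500)],
)

# Basic SQL handles the filtering; ORDER BY keeps the row order deterministic.
df_sql = pd.read_sql_query(
    "SELECT query, clicks, impressions FROM gsc WHERE clicks >= 40 ORDER BY clicks DESC",
    conn,
)
conn.close()

# Everything after the pull is plain Pandas.
df_sql["ctr"] = df_sql["clicks"] / df_sql["impressions"]
print(df_sql)
```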
Pandas and Tidyverse are slow after a certain point.
And that's where you use other libraries to work with larger data sets.
Don't worry, the syntax isn't that hard.
This is where you can flex your coding superpowers.
Polars is one of the most recent and powerful Python libraries.
Not only is it way faster than Pandas, it can also be considered the perfect ETL tool.
You just need some practice but it's definitely a powerful option.
pola.rs
PySpark is one of the most powerful options out there but it's still slower than Polars.
The grammar is lovely and easier though.
It can have the edge for more complex projects, but for local files, Polars is the choice.
R can boast data.table, a famous library that can significantly improve your performance.
If you plan on using this language, you must learn it.
It can be combined with Tidyverse to a certain extent.
R has always been preferred for its superior statistical capabilities and for dashboards.
Healthcare and Business are where it thrives, whereas Python is quite common in SEO.
Go for the most popular in your industry, trust me.
But even so, speed is not a priority for many scenarios and libraries can change.
In fact, I know much less than a dedicated data professional.
That's normal, the average SEO case is not like that!
And even with all these tools, you are an SEO.
I talk about these topics because I am a Data Analyst too but they are not required at all for many.
This is more beneficial for businesses since Data + SEO is a killer combo.
And why is it good?
I talk about this in my past thread:
More info on audits and how you can leverage data in this thread.
Analytics is about inspiring and asking the right questions.
BigQuery is an added value but it's not a panacea.
For SEOs: learn SQL if you are into data and would like to have a more analytical mind.
For business owners: BigQuery and a Data Analyst are a big competitive advantage, provided that you act on what they tell you.
If you want to know more about the different terms, this article is a masterpiece.
For SEO, it's almost always about Analytics and less about Statistics.
kozyrkov.medium.com
Follow me for threads, tips, and case studies (coming soon) about SEO, content, and Python/data.
If you liked this thread, consider liking and retweeting it! 🧵
I offer:
- Content audits for publishers and B2C content
- Consultancies and freelancing for publishers and B2C content
- Training and mentorship for data to any business/agency
bookk.me
