A list of the most useful #Python libraries you can use for #SEO right now. 🐍
This updated thread will tell you the main libraries for #DataScience and #NLP that you should consider.
Use them in your workflow! 🧵

NumPy & Pandas: the foundations of data analysis. Just learn them.
Without these two libraries, you can hardly do Data Science in Python. Good knowledge of Pandas alone can get you quite far.
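
As a rough illustration of how far basic Pandas gets you, here is a minimal sketch that aggregates a Search Console-style export by query and derives CTR. The column names and numbers are made up for the example, not taken from any real export.

```python
import pandas as pd

# Hypothetical Search Console-style rows; all values are invented for illustration.
data = pd.DataFrame(
    {
        "query": ["python seo", "python seo", "seo libraries", "seo libraries"],
        "page": ["/a", "/b", "/a", "/c"],
        "clicks": [10, 5, 3, 2],
        "impressions": [100, 50, 60, 40],
    }
)

# Sum clicks and impressions per query, then compute CTR from the totals.
by_query = data.groupby("query", as_index=False)[["clicks", "impressions"]].sum()
by_query["ctr"] = by_query["clicks"] / by_query["impressions"]

# Put the queries with the most clicks first.
by_query = by_query.sort_values("clicks", ascending=False).reset_index(drop=True)
print(by_query)
```

Computing CTR after the aggregation (rather than averaging per-row CTRs) keeps the ratio weighted by impressions.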

Advertools: the best SEM library out there.
It’s very useful for crawling, log file analysis, analyzing SERPs and querying the Knowledge Graph.
The ideal Swiss Army knife to have in your arsenal.
advertools.readthedocs.io/en/master/

advertools — Python

Get productive as an online marketer with a Python package that helps in automating many of the impo...

Ecommercetools: The ideal package for analyzing eCommerce data and getting access to some useful NLP functions.
It’s a rare jewel in your collection that is very handy for technical SEO and e-commerce as well.
pypi.org/project/ecomme…

ecommercetools

EcommerceTools is a data science toolkit for ecommerce, marketing science, and Python SEO.

Requests: make HTTP requests from Python, essential for web scraping.
Sure, there are alternatives, but you should learn this one first. A lot of your initial work will rely on it.
pypi.org/project/reques…

requests

Python HTTP for Humans.
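
A minimal sketch of typical Requests usage for scraping, assuming you want a timeout, a custom User-Agent, and an error on bad status codes. The helper names `build_request` and `fetch_html`, the User-Agent string, and the example.com URL are all placeholders for this example. The demo only prepares a request, so it runs without network access.

```python
import requests

HEADERS = {"User-Agent": "seo-script/0.1"}  # placeholder User-Agent for the example


def build_request(url, params=None):
    """Prepare a GET request without sending it, to inspect the final URL."""
    request = requests.Request("GET", url, params=params, headers=HEADERS)
    return request.prepare()


def fetch_html(url, timeout=10):
    """Fetch a page and return its HTML, raising on 4xx/5xx responses."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text


# Offline demo: see how query parameters end up encoded in the URL.
prepared = build_request("https://example.com/search", params={"q": "python seo"})
print(prepared.method, prepared.url)
```

Calling `fetch_html("https://example.com")` would perform the actual download. Always setting a timeout avoids a script hanging on a slow server.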

urllib: for working with URLs. It should be part of your arsenal.
Take some time to study all the options and possible use cases.
docs.python.org/3/library/urll…

urllib — URL handling modules

Source code: Lib/urllib/ urllib is a package that collects several modules for working with URLs: ur...
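
One possible use case, sketched with `urllib.parse` from the standard library: splitting a URL into its parts and dropping tracking parameters before analysis. The set of tracking parameters and the `strip_tracking` helper are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

# Assumed list of tracking parameters; extend it to match your own data.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}


def strip_tracking(url):
    """Return the URL with the tracking parameters removed from its query string."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    kept = {key: values for key, values in query.items() if key not in TRACKING_PARAMS}
    return urlunparse(parts._replace(query=urlencode(kept, doseq=True)))


url = "https://example.com/blog/post?utm_source=twitter&page=2"
parts = urlparse(url)
print(parts.netloc, parts.path)
print(strip_tracking(url))
```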

BeautifulSoup: a library to extract data from HTML/XML files, used in combination with scraping libraries to convert data into Python objects.
One of the first ones you’ll probably learn in your Python journey.
crummy.com/software/Beaut…

Beautiful Soup Documentation — Beautiful Soup 4.12.0 documentation

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your fa...
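
A short sketch of pulling common on-page elements out of HTML with BeautifulSoup. The HTML string is invented for the example; in practice it would come from a scraping library such as Requests.

```python
from bs4 import BeautifulSoup

# Made-up page used only for this example.
html = """
<html>
  <head>
    <title>Python for SEO</title>
    <meta name="description" content="A sample page.">
  </head>
  <body>
    <h1>Libraries</h1>
    <a href="/pandas">Pandas</a>
    <a href="/scrapy">Scrapy</a>
  </body>
</html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

title = soup.title.get_text(strip=True)
meta = soup.find("meta", attrs={"name": "description"})
description = meta["content"] if meta else None
links = [a["href"] for a in soup.find_all("a", href=True)]

print(title, description, links)
```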

Scrapy: the absolute peak of scraping.
Nothing is better than this, even though the setup may be hard.
You can carry out any scraping task with this library.

Matplotlib/Seaborn/Plotly: you need some sort of visualization and these libs are here to help you.
You can start with Seaborn which is easier to use. DataViz is an important topic and you should value it.
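
A minimal charting sketch, using plain Matplotlib here rather than Seaborn to keep dependencies low. The queries, click counts, and output filename are invented for the example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script also runs on a server
import matplotlib.pyplot as plt

# Made-up numbers for illustration.
queries = ["python seo", "seo libraries", "keyword clustering"]
clicks = [15, 5, 8]

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(queries, clicks)  # horizontal bars keep long query labels readable
ax.set_xlabel("Clicks")
ax.set_title("Clicks by query")
fig.tight_layout()
fig.savefig("clicks_by_query.png")
```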

NLTK/spaCy: work with human language to analyze text data and get insights into the nuances of our language.
This is necessary to get your hands dirty with text data.
The latter can be used to recognize entities and parts of speech.

Querycat: few functions but good quality thanks to association rule mining and BERT.
It's one of my favorite libraries, but the installation may not be immediate.
It's useful for visualizing losses in impressions over time.
github.com/jroakes/queryc…

Sklearn: A staple for Machine Learning.
I don't think you really need it, but it's one of the first libs you will encounter.
scikit-learn.org/stable/

scikit-learn: machine learning in Python — scikit-learn 1.2.2 documentation

Simple and efficient tools for predictive data analysis Accessible to everybody, and reusable in var...
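
To give an idea of what scikit-learn offers for text, here is a small sketch that finds the most similar keyword for each keyword using TF-IDF vectors and cosine similarity. The keyword list is made up, and this is only one simple approach among many.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up keywords for illustration.
keywords = [
    "python seo libraries",
    "seo libraries for python",
    "best running shoes",
    "running shoes for women",
]

# Turn each keyword into a TF-IDF vector and compare every pair.
matrix = TfidfVectorizer().fit_transform(keywords)
similarity = cosine_similarity(matrix)

# Ignore self-similarity, then pick the closest other keyword for each row.
np.fill_diagonal(similarity, 0)
nearest = similarity.argmax(axis=1)

for keyword, index in zip(keywords, nearest):
    print(f"{keyword!r} is closest to {keywords[index]!r}")
```

TF-IDF only matches shared words, so it misses synonyms. Embedding-based libraries handle those cases better.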

Transformers: Pretrained models to handle a wide range of tasks. Essential for NLP!
This library is crucial for the most advanced tasks and quite reliable too. I highly suggest you check my other thread:

sentence_transformers: Python framework for state-of-the-art sentence, text, and image embeddings.
Use it for keyword clustering and other text-related tasks. It's one of my most used libraries right now.
sbert.net

SentenceTransformers Documentation — Sentence-Transformers documentation

Trafilatura: download, parse and scrape web pages.
If you work with content, look no further.
Cleaning the HTML elements of a page by hand is overrated. Don't waste your life on it!
trafilatura.readthedocs.io/en/latest/

A Python package & command-line tool to gather text on the Web — trafilatura 1.5.0 documentation

Trafilatura is a Python package and command-line tool designed to gather text on the Web. Its main a...

Streamlit/Dash: interactive web applications.
Useful for prototyping and communicating.
Streamlit is one of the most popular solutions in the SEO community.

Typer: create apps that you can run from your command line.
Extremely powerful for personal uses and for running local scripts.
A game-changer for automating your workflow.
typer.tiangolo.com

Typer

Typer, build great CLIs. Easy to code. Based on Python type hints.

networkx: the must-have graph theory library.
I recommend you learn it once you have mastered the basics.
Graph Theory is of great importance for analysts who want to level up their game.
More on this in future threads.
networkx.org

NetworkX — NetworkX documentation

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, a...
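
A small sketch of how graph theory applies to SEO: modelling internal links as a directed graph, then counting inlinks, finding orphan pages, and measuring click depth from the homepage. The URLs and links are invented for the example.

```python
import networkx as nx

# Made-up internal links as (source page, target page) pairs.
links = [
    ("/", "/blog"),
    ("/", "/products"),
    ("/blog", "/blog/post-1"),
    ("/blog", "/products"),
    ("/blog/post-1", "/products"),
]

graph = nx.DiGraph()
graph.add_edges_from(links)
graph.add_node("/old-landing-page")  # a known page that nothing links to

# Number of internal links pointing at each page.
inlinks = dict(graph.in_degree())

# Pages with no inlinks, excluding the homepage.
orphans = [page for page, count in inlinks.items() if count == 0 and page != "/"]

# Click depth: shortest number of clicks from the homepage to each reachable page.
depth = nx.single_source_shortest_path_length(graph, "/")

print(inlinks)
print(orphans)
print(depth)
```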

searchconsole: Use this library to import your data from the GSC API.
It's easy to set up and it's one of the most used libraries in my workflow.
github.com/joshcarty/goog…

GitHub - joshcarty/google-searchconsole: A wrapper for the Google Search Console API.

A wrapper for the Google Search Console API. Contribute to joshcarty/google-searchconsole developmen...

BERTopic: one of my most used NLP libraries, and for good reason. I dedicated an entire thread to the topic:

scattertext: library for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot.
A short example is available in the official docs:

github.com/JasonKessler/s…

GitHub - JasonKessler/scattertext: Beautiful visualizations of how language differs among document types.

Beautiful visualizations of how language differs among document types. - GitHub - JasonKessler/scatt...

openpyxl: if you have to work with Excel data and create spreadsheets.
There are other libraries but I prefer to use this one. It's quite nice and it works well for most of the tasks.
openpyxl.readthedocs.io/en/stable/

openpyxl - A Python library to read/write Excel 2010 xlsx/xlsm files — openpyxl 3.1.2 documentation
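
A minimal sketch of writing a spreadsheet with openpyxl and reading it back. The rows, sheet title, and file name are placeholders for the example.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook

# Made-up rows: a header followed by data.
rows = [("query", "clicks"), ("python seo", 15), ("seo libraries", 5)]

# Create a workbook and fill its first sheet row by row.
workbook = Workbook()
sheet = workbook.active
sheet.title = "Queries"
for row in rows:
    sheet.append(row)

# Save to a temporary location so the example does not clutter the working directory.
path = Path(tempfile.gettempdir()) / "queries_example.xlsx"
workbook.save(str(path))

# Read the file back as plain tuples of cell values.
loaded = load_workbook(str(path))
values = list(loaded["Queries"].iter_rows(values_only=True))
print(values)
```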

Start with scraping and data analysis.
Then, you can move to NLP libraries and study topics like NER and Clustering.

Sticking to the mainstream libraries gives you access to "better" documentation.
Still, my suggestion is to try alternatives and keep looking for new opportunities across the web.
Always do your research: you could find the perfect library for your needs.

Follow me for threads, tips, and case studies (coming soon) about SEO, content, and Python/data.
If you liked this thread, consider liking and retweeting it!🧵

I offer short consultancies and full freelancing for publishers and B2C content.
bookk.me/marcogiordano

bookk.me | Marco Giordano

Book time with Marco Giordano