Fernando 🇮🇹🇨🇭

@Franc0Fernand0

Feb 26, 2023
Many distributed systems effectively use specialized storage like:
• time series
• blob storage
• graph databases
• spatial databases
Here is a quick introduction to them:
{1/10} ↓
Time series databases are specialized storage for large amounts of timestamped data.
They are optimized to measure data changes over time and perform statistical computations.
They are really useful for monitoring purposes.
{2/10}
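To make the "statistical computations over time" idea concrete, here is a minimal sketch in plain Python of downsampling: grouping timestamped readings into fixed-width buckets and averaging each bucket. The function name `downsample` and the sample CPU readings are made up for illustration; real time series databases do this with their own query languages and far more efficient storage.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, bucket_seconds):
    """Group (timestamp, value) pairs into fixed-width buckets and average each one."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Hypothetical CPU readings: (unix_seconds, percent)
cpu = [(0, 10.0), (20, 30.0), (65, 50.0), (110, 70.0), (125, 90.0)]
per_minute = downsample(cpu, 60)
print(per_minute)
```

Each key in the result is the start of a 60-second window, and each value is the average reading inside that window.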
Typical use cases of time series databases are:
• monitoring system components that emit many simultaneous events
• collecting telemetry data in IoT systems with many devices
• dealing with financial data (stocks, cryptocurrencies)
Popular implementations are InfluxDB and Prometheus.
{3/10}
Blob stores are specialized storage for storing and retrieving massive amounts of unstructured data.
Blob stands for binary large object.
Images, videos, extensive text data, and compiled code are all examples of such objects.
{4/10}
These databases are complex and optimized for availability and durability.
From a user's perspective, they look like key-value stores since data is accessed through a key.
Google Cloud Storage, Amazon S3, and Azure Blob Storage are popular implementations.
{5/10}
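The key-value view can be sketched with a toy in-memory class. Everything here (`InMemoryBlobStore`, `put`, `get`, `list_keys`) is a hypothetical stand-in, not the API of any of the services above, and it skips the replication and durability work that makes real blob stores complex.

```python
import hashlib


class InMemoryBlobStore:
    """Toy stand-in for a blob store: opaque bytes addressed by a string key."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> str:
        # Store an immutable copy and return a checksum the caller can verify later.
        self._objects[key] = bytes(data)
        return hashlib.sha256(data).hexdigest()

    def get(self, key: str) -> bytes:
        # Raises KeyError if the key was never written.
        return self._objects[key]

    def list_keys(self, prefix: str = ""):
        # Keys are flat strings; a shared prefix only looks like a folder.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = InMemoryBlobStore()
checksum = store.put("images/cat.png", b"\x89PNG...")
store.put("images/dog.png", b"\x89PNG!!!")
store.put("videos/intro.mp4", b"\x00\x00")
print(store.list_keys("images/"))
```

The store never looks inside the bytes: the only way in or out is the key.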
Graph databases are specialized storage for storing data with many relationships.
Data is stored according to the graph data model: entities are nodes, and the relationships between them are edges.
Executing a query means traversing the graph: start from a node and follow its edges.
{6/10}
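A rough way to picture a traversal query is a breadth-first search over an adjacency list. This sketch is plain Python, not how any particular graph database executes queries; the `follows` graph and the `within_hops` helper are invented for the example.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return {node: distance} for nodes reachable from start in at most max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]  # the start node is not part of the answer
    return distance


# Hypothetical "follows" relationships stored as an adjacency list.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
}
reachable = within_hops(follows, "alice", 2)
print(reachable)
```

Each hop follows edges directly from the current node, so the work depends on how many neighbors are visited rather than on the total size of the dataset.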
Executing the same queries on relational databases would be more complicated and costly.
Indeed, each hop along a relationship typically needs another join between tables, so the number of joins would be high.
The most popular graph database implementation is Neo4j with its query language Cypher.
{7/10}
Spatial databases are specialized storage for storing spatial data, like locations on a map.
They rely on spatial indexes like k-d trees or quadtrees and can efficiently perform spatial queries.
Quadtrees are the simplest and are conceptually similar to a grid.
{8/10}
The outer grid rectangle represents the root and contains all the possible spatial locations.
Each rectangle is recursively divided into 4 quadrants, and each quadrant represents a child of the node.
The division stops in quadrants with fewer than a specific number of locations.
{9/10}
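The recursive division can be sketched as a small point quadtree. This is a simplified illustration under a few assumptions of mine: square regions, a per-leaf `capacity` as the stopping threshold, and a `MIN_SIZE` guard so duplicate points cannot trigger endless splitting. It is not how any specific database implements its spatial index.

```python
class QuadTree:
    """Point quadtree node covering the half-open square [x, x+size) x [y, y+size)."""

    MIN_SIZE = 1e-9  # assumed guard: stop splitting below this side length

    def __init__(self, x, y, size, capacity=4):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.points = []
        self.children = None  # None while this node is a leaf

    def _contains(self, px, py):
        return (self.x <= px < self.x + self.size
                and self.y <= py < self.y + self.size)

    def insert(self, px, py):
        if not self._contains(px, py):
            return False
        if self.children is None:
            self.points.append((px, py))
            if len(self.points) > self.capacity and self.size > self.MIN_SIZE:
                self._split()
            return True
        # Internal node: exactly one child quadrant accepts the point.
        return any(child.insert(px, py) for child in self.children)

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half, self.capacity)
            for dx in (0, 1) for dy in (0, 1)
        ]
        points, self.points = self.points, []
        for px, py in points:
            any(child.insert(px, py) for child in self.children)

    def query(self, x0, y0, x1, y1):
        """Return the stored points with x0 <= px <= x1 and y0 <= py <= y1."""
        if (x1 < self.x or x0 >= self.x + self.size
                or y1 < self.y or y0 >= self.y + self.size):
            return []  # the search box misses this quadrant entirely
        if self.children is None:
            return [(px, py) for px, py in self.points
                    if x0 <= px <= x1 and y0 <= py <= y1]
        found = []
        for child in self.children:
            found.extend(child.query(x0, y0, x1, y1))
        return found


tree = QuadTree(0, 0, 100, capacity=2)
for point in [(10, 10), (12, 14), (15, 11), (80, 80), (60, 20)]:
    tree.insert(*point)
nearby = sorted(tree.query(0, 0, 20, 20))
print(nearby)
```

The three points clustered near the origin force several splits in that corner, while the two isolated points each stay in a large quadrant. The range query skips every quadrant that does not overlap the search box.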
So geographical areas with many locations are divided into more quadrants.
Point lookups using quadtrees are efficient: with N evenly spread locations, the tree depth grows roughly with the base-4 logarithm of N.
Databases with good geospatial support include PostgreSQL (through the PostGIS extension), MongoDB, and Elasticsearch.
{10/10}
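As a back-of-the-envelope check on the base-4 logarithm, this sketch estimates the depth of a quadtree. It assumes the locations are spread roughly evenly, which real maps rarely are; clustered data gives deeper trees in dense areas. The helper name `expected_depth` is hypothetical.

```python
import math


def expected_depth(num_points, leaf_capacity):
    """Approximate quadtree depth for evenly spread points: log base 4 of (N / capacity)."""
    return math.log(num_points / leaf_capacity, 4)


# One million locations with at most 4 locations per leaf quadrant.
depth = expected_depth(1_000_000, 4)
print(round(depth, 2))
```

Under these assumptions, a lookup among a million locations visits about 9 levels of the tree.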
Thanks for reading!
If you liked it, I'd be grateful if you'd:
• like or retweet the 1st tweet
• follow @Franc0Fernand0 for more distributed system content
• subscribe to my newsletter polymathicengineer.com
Your support encourages me to keep writing!
