Bilgin Ibryam

@bibryam

7 Tweets 2 reads Dec 24, 2022
Kafka and Presto Sweet Spot: Ad-hoc Interactive SQL query at Uber scale 🧵
eng.uber.com
Presto at Uber
✅15 Presto clusters
✅5,000 nodes
✅7,000 weekly active users
✅500,000 queries daily
✅50 PB from HDFS
Kafka at Uber
✅Pub-sub message bus for Rider and Driver apps
✅Streaming analytics
✅Streaming changelogs
✅Data ingestion into Apache Hadoop
New challenge: how to check whether an order with UUID X is missing from Kafka topic Y?
👎Stream processing engines such as Apache Flink, Apache Storm, or ksql process the stream continuously and either output a processed stream or incrementally maintain an updatable view. They are not a good fit for point lookups or for analytical queries over past events.
👎Real-time OLAP datastores such as Apache Pinot, Apache Druid, and ClickHouse are equipped with advanced indexing techniques to serve low-latency queries. However, they require a non-trivial onboarding process and consume storage and compute resources for serving.
🧠Engineering mindset: build a solution from available in-house tools. Use Presto's Kafka connector, which exposes Kafka topics as tables, with each message in a topic represented as a row in Presto...
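
To make the "topic as a table" idea concrete, here is a minimal sketch of how the missing-order question might be phrased as an ad-hoc Presto query. It is not Uber's actual query. The catalog name `kafka`, the schema `default`, the topic name `topic_y`, and the JSON field `$.uuid` are all assumptions for illustration, as is the helper `build_lookup_query`. `_message` is the Kafka connector's built-in column holding the raw message text, and `json_extract_scalar` is a standard Presto function.

```python
def build_lookup_query(topic: str, order_uuid: str,
                       catalog: str = "kafka", schema: str = "default") -> str:
    """Build a Presto SQL string counting messages whose JSON payload has this uuid.

    Hypothetical helper: catalog, schema, and the '$.uuid' field are assumed names.
    """
    # Double any embedded quotes so the identifier and the literal stay well-formed.
    table = topic.replace('"', '""')
    literal = order_uuid.replace("'", "''")
    return (
        "SELECT count(*) AS matches\n"
        f'FROM {catalog}.{schema}."{table}"\n'
        f"WHERE json_extract_scalar(_message, '$.uuid') = '{literal}'"
    )


query = build_lookup_query("topic_y", "X")
print(query)
```

A result of `matches = 0` would suggest the order never reached the topic, at least within the data Kafka still retains. A real deployment would likely also restrict the scan by offset or time range to keep the query interactive.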
