Data Processing
29 Threads
How are open LLMs trained and created in 2024? @01AI_Yi just released their paper on how they created Yi, a family of LLMs and V-LLMs. The paper includes details on the data...
ChatGPT is the best partner for data processing with its latest update. You can search for data on the web, exploit it with the Code Interpreter and export it in Excel format easi...
Excel and ChatGPT are the best combo. You can automate your data processing easily. I show you how to use AI with Excel without typing formulas: https://t.co/b32SKkwddr
Extract tables from documents using @llama_index UnstructuredElementParser and then use RecursiveRetriever to enable hybrid tabular/semantic queries and also comparisons over multi...
PyTorch 101: Dataset Class clearly explained! Understanding the Dataset Class is crucial for efficiently managing & preprocessing your data while building models in PyTorch! Toda...
Data processing is not always straightforward. So how is data processed in the cloud ecosystem? /🧵/
Starting with @AlchemyLearn
If you're building "chat over your PDFs" with LLMs, you need to deal with the pesky issue of how to parse embedded tables/diagrams. Native text splitting + top-k on your tables ==...
What is under the hood of Spark? Apache Spark is an extremely popular distributed processing framework utilizing in-memory processing to speed up task execution. Most of its libra...
What is under the hood of Apache Flink? Stream Processing is becoming more important and accessible to all organisations. One of the most popular Stream Processing Frameworks used...
Got asked how stream processing platforms (e.g. Apache Flink, Kafka Streams, Spark Structured Streaming) compare to streaming databases (e.g. RisingWave, Materialize, PranaDB). The...
Machine Learning models are sneaky little bastards that use any available shortcuts to optimize their evaluation metric. Every tutorial speaks about splitting your data randomly....
I have tried hundreds of AI tools. Here are 23 great ones that are not just GPT4. 1. CatBird - Run one prompt on 16+ Stable Diffusion models. https://t.co/q2rGDwyXFg https://t...
4 text functions to learn in less than 5 minutes:
10 Skills Employers Want To See In Your Data Analytics Portfolio:
What is the #Scala programming language? - Scala is used in data processing, distributed computing, and web development. It powers the data engineering infrastructure of many companie...
5 challenges faced when dealing with big data: https://t.co/D4mZcj7x92
7 GitHub repositories that will make you stand out from 99% of developers:
4 text functions you need to know:
5 machine learning tricks to make your life easy. (A thread) 🧵
5 simple tricks to supercharge your machine learning development. (A thread)
Dealing with common problems in Machine Learning data 🧵
Here is what Machine Learning tutorials told you to do: 1. Start by transforming your dataset 2. Then split it (train, validation, and test sets) 3. Finally, build your model Ple...
We just published a YouTube video that explains why Kafka is fast. If you prefer video format, consider subscribing to our ByteByteGo YouTube channel: https://t.co/Abm5CkdvPE I...