We've seen that smaller chunks are good for capturing semantic meaning and larger ones are good for providing better context. @llama_index AutoMergingRetriever takes it one step f...

ChatGPT can now browse the internet to provide you with current and authoritative information, complete with direct links to sources. It is no longer limited to data before Septemb...

Building QA with LLMs involves pairing an LLM (synthesis) with a retrieval model. This retrieval model typically involves max embedding similarity lookup. What if you *also* use...

Retrieval for QA systems is hard. Vector search is good for capturing semantically similar texts, but often queries specify desired attributes like time, authorship, or other "meta...
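
To make the point of the post above concrete, here is a minimal sketch of combining an attribute filter with embedding similarity. It is not taken from the post or from any particular library: the `Doc` fields, the `filtered_search` helper, and the toy two-dimensional embeddings are all hypothetical, chosen only to illustrate filtering on attributes such as authorship or time before ranking by similarity.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical record: text, a toy embedding, and two metadata attributes.
    text: str
    embedding: list[float]
    author: str
    year: int


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(docs, query_embedding, top_k=2, **filters):
    """Keep docs whose attributes equal every filter, then rank by similarity."""
    candidates = [
        d for d in docs
        if all(getattr(d, key) == value for key, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda d: cosine(d.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Toy corpus with made-up embeddings (assumption: real ones come from a model).
DOCS = [
    Doc("Intro to vector search", [1.0, 0.0], "alice", 2021),
    Doc("Vector search at scale", [0.9, 0.1], "bob", 2023),
    Doc("Cooking with cast iron", [0.0, 1.0], "alice", 2023),
]

if __name__ == "__main__":
    # Similarity alone would rank both vector-search docs first;
    # the year filter removes the 2021 one before ranking.
    for doc in filtered_search(DOCS, [1.0, 0.0], top_k=1, year=2023):
        print(doc.text)
```

Filtering before ranking keeps the similarity step cheap and guarantees that every result satisfies the requested attributes; a production system would more likely push the filter into the vector store itself.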

Traditional keyword search has its limitations — it often doesn’t find the relevant information that matches the user’s search intent. 🔍 Here’s how Cohere’s multilingual text unde...

1. Everything I've managed to gather about Jordon Walker. As he ghosted off the World Wide Web, information is very limited, but through some old tricks and other Twitter users I've...

Where does generative retrieval have a significant advantage over bi-encoder retrieval? Our #EMNLP2022 paper "Generative Multi-hop Retrieval" shows that the answer is 🦘multi-hop🦘 r...

Our education methods aren't working. We're using industrial-era education practices to prepare ourselves for an internet-era world. It's incredibly broken, and we're suffering f...

SQL: What is Indexing? a thread...

Retrieval-based models are increasingly important in NLP/QA. But an important factor in modeling text is knowing *where* it came from. Our #ICLR2022 paper proposes retrieval-based...

Portals complement main topics in Wikipedia and expound upon topics by introducing the reader to key articles, images, and categories that further describe the subject and its relat...

This is HUGE!!!! @VRSVirginia and @jkbjournalist are you aware of this? https://t.co/fV9rfCrxJP