Extract tables from documents with @llama_index's UnstructuredElementNodeParser, then use RecursiveRetriever to enable hybrid tabular/semantic queries and comparisons across multiple docs.
Let's see how to use this advanced RAG technique 🧵👇
@llama_index First we load the documents.
Then we create the new UnstructuredElementNodeParser from LlamaIndex.
@llama_index This parser:
- extracts tables from the documents
- converts each table to a DataFrame
- creates 2 nodes for each table:
  - a Table Node that contains the DataFrame as a string
  - an IndexNode that stores a summary of the table and a reference to that Table Node
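To make the two-nodes-per-table idea concrete, here is a minimal sketch in plain Python. The `TableNode` and `IndexNode` classes and the `table_to_nodes` helper below are hypothetical stand-ins written for illustration, not the LlamaIndex classes, and the summary is hand-written here.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class TableNode:
    """Hypothetical stand-in: holds the table rendered as a string."""
    text: str
    node_id: str = field(default_factory=lambda: str(uuid4()))


@dataclass
class IndexNode:
    """Hypothetical stand-in: holds a table summary plus a pointer."""
    text: str      # the summary, which is what gets searched
    index_id: str  # id of the TableNode this summary refers to
    node_id: str = field(default_factory=lambda: str(uuid4()))


def table_to_nodes(rows, summary):
    """Turn one extracted table into a (TableNode, IndexNode) pair."""
    table_text = "\n".join(" | ".join(str(cell) for cell in row) for row in rows)
    table_node = TableNode(text=table_text)
    index_node = IndexNode(text=summary, index_id=table_node.node_id)
    return table_node, index_node


rows = [["year", "revenue"], [2020, 10], [2021, 15]]
table_node, index_node = table_to_nodes(rows, "Revenue by year, 2020 to 2021")
```

The point of the split: the short summary is easy to match against a question, while the full table text is what we actually want to hand to the LLM.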
@llama_index Next, we partition the nodes using a built-in function of the parser.
Here, BaseNodes contains the regular nodes and the IndexNodes (not the Table Nodes).
NodeMapping contains the {id -> Node} mapping for the remaining Table Nodes.
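The partition step can be sketched in plain Python like this. The `Node` class and `partition` function are hypothetical and only model the behaviour described above; they are not the parser's own implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node; index_id is set only on index (summary) nodes."""
    node_id: str
    text: str
    index_id: Optional[str] = None


def partition(nodes):
    """Split nodes into base nodes and an {id -> Node} mapping of table nodes."""
    by_id = {n.node_id: n for n in nodes}
    # Every node that some index node points at goes into the mapping.
    node_mapping = {
        n.index_id: by_id[n.index_id] for n in nodes if n.index_id is not None
    }
    # Everything else (regular nodes and index nodes) stays a base node.
    base_nodes = [n for n in nodes if n.node_id not in node_mapping]
    return base_nodes, node_mapping


nodes = [
    Node("text-1", "Intro paragraph."),
    Node("table-1", "year | revenue\n2020 | 10"),
    Node("summary-1", "Revenue by year", index_id="table-1"),
]
base_nodes, node_mapping = partition(nodes)
```

Only the base nodes get indexed, so the raw table text never competes in similarity search; it is reachable only through its summary.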
@llama_index Next, we create the vector_index from these BaseNodes (which exclude the Table Nodes) and then create a vector_retriever from this index.
@llama_index Then, we create the RecursiveRetriever (a detailed guide on this amazing retriever is in the oven, so stay tuned 🔥)
The 1st argument is the id of the recursion root: the retriever from which the RecursiveRetriever starts retrieving.
@llama_index The 2nd argument is a dictionary containing all the retrievers. Here we have only one, the root, which we created from the base nodes earlier.
@llama_index For this use case, we only supply the NodeMapping of the Table Nodes as the node_dict argument to the RecursiveRetriever.
A Table Node is retrieved whenever an IndexNode referring to it is retrieved by our root retriever.
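Here is a toy sketch of that lookup, assuming a keyword-overlap retriever in place of a real vector retriever. `keyword_retriever` and `recursive_retrieve` are hypothetical helpers that mirror the three arguments described above (root id, retriever dictionary, node_dict); they are not the LlamaIndex RecursiveRetriever itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical node; index_id is set only on index (summary) nodes."""
    node_id: str
    text: str
    index_id: Optional[str] = None


def keyword_retriever(base_nodes, top_k=1):
    """Toy root retriever: rank base nodes by words shared with the query."""
    def retrieve(query):
        words = set(query.lower().split())
        ranked = sorted(
            base_nodes,
            key=lambda n: len(words & set(n.text.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]
    return retrieve


def recursive_retrieve(query, root_id, retriever_dict, node_dict):
    """Start at the root retriever, then follow index nodes into node_dict."""
    hits = retriever_dict[root_id](query)
    # An index node is swapped for the table node it points at, if we have it.
    return [node_dict.get(hit.index_id, hit) for hit in hits]


base_nodes = [
    Node("text-1", "The company was founded in Berlin."),
    Node("summary-1", "Table of revenue by year", index_id="table-1"),
]
node_dict = {"table-1": Node("table-1", "year | revenue\n2020 | 10\n2021 | 15")}
retriever_dict = {"vector": keyword_retriever(base_nodes)}

table_hits = recursive_retrieve("revenue by year", "vector", retriever_dict, node_dict)
text_hits = recursive_retrieve(
    "where was the company founded", "vector", retriever_dict, node_dict
)
```

A table question matches the summary and comes back as the full table, while a plain-text question returns the regular node unchanged.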
@llama_index Now, if we try queries that reference info from the tables, we get better retrieval than with naive top-k RAG.
@llama_index More details are in the official documentation:
docs.llamaindex.ai (anchor: #extract-elements)
@llama_index Thanks for reading.
I write about AI, LLMs, RAG etc. and try to make complex topics as easy as possible.
Stay tuned for more! 🔥 #AI #RAG