In this tutorial, we are going to make our PDF chatbot a little less dumb. The problems with the previous implementation are:
- The full contents of the PDF are sent to the LLM instead of intelligently sending only the content relevant to the question prompt.
- The full content can easily exceed the token limit. For GPT-3 it is 4096 tokens, which is quite small (see the rough token-count sketch below).
- In just a few API calls we can burn several dollars, because we are sending the whole PDF content to OpenAI.
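To get a feel for how quickly a whole PDF blows past that limit, you can count the tokens yourself. The snippet below is only a rough sketch: it assumes you have installed the tiktoken package (not in our requirements.txt) and reuses the same large_resume.pdf used later in this tutorial.
import tiktoken
from langchain_community.document_loaders import PyPDFLoader

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
pages = PyPDFLoader("./pdfs/large_resume.pdf").load()
full_text = "\n".join(page.page_content for page in pages)
# every token counts against the context window and against the bill
print("Total tokens in the PDF:", len(encoding.encode(full_text)))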
The main idea is to send only the content that is relevant to the question. We have already seen that embeddings vectorize text and place it in a high-dimensional space, where similar pieces of text are represented by nearby vectors.
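As a quick illustration (not part of the final script), here is a minimal sketch of that idea. It assumes the same OPENAI_SECRET_KEY variable from the .env file used in this series and compares a few sentences with cosine similarity.
import os
import numpy as np
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings

load_dotenv(".env")
embeddings = OpenAIEmbeddings(api_key=os.getenv("OPENAI_SECRET_KEY"))

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = embeddings.embed_query("The candidate's email address")
v2 = embeddings.embed_query("How can I contact the applicant?")
v3 = embeddings.embed_query("Average rainfall in the Amazon basin")

# the two contact-related sentences land closer together than the unrelated one
print(cosine(v1, v2), cosine(v1, v3))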
Architecture:
Implementation:
This time we will need a few more third-party packages; let's add them to our requirements.txt file:
## old requirements here
langchain-community==0.0.36
langchain-chroma==0.1.0
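Install the new packages (together with the existing ones) with:
pip install -r requirements.txt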
Let us now create a new file named qna_with_embeddings.py and add the following code:
import os
from dotenv import load_dotenv
from langchain.text_splitter import CharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA
load_dotenv(".env")
api_key = os.getenv("OPENAI_SECRET_KEY")
pdf_loader = PyPDFLoader('./pdfs/large_resume.pdf')
documents = pdf_loader.load()
# chunk_overlap is the number of characters that each chunk overlaps with the previous chunk
# this helps to ensure that the model can capture the context of the entire document
text_splitter = CharacterTextSplitter(separator="\n", chunk_size=100, chunk_overlap=80, length_function=len)
chunks = text_splitter.split_documents(documents)
print(chunks)
print("api_key", api_key)
embeddings = OpenAIEmbeddings(api_key=api_key)
vector_store = Chroma.from_documents(chunks, embeddings)
chain = RetrievalQA.from_chain_type(
    llm=OpenAI(api_key=api_key),
    retriever=vector_store.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    verbose=True
)
response = chain.invoke({"query": "What is the email address of the candidate?"})
print("Output ",response.get("result"))
The PDF content is split into smaller chunks of text using the CharacterTextSplitter with a chunk size of 100 characters and an overlap of 80 characters between consecutive chunks, so that some context is preserved across chunk boundaries. The text chunks are embedded using OpenAI's embedding model, and the embeddings are stored in a Chroma vector store. A RetrievalQA chain is then created, combining the OpenAI language model (LLM) with the Chroma vector store as a retriever. When the chain is invoked with the query "What is the email address of the candidate?", it retrieves the relevant chunks from the vector store, passes them to the LLM, and generates a response based on the retrieved information. Specifically, "k": 2 is a keyword argument that specifies the number of nearest neighbors (the most relevant chunks) to retrieve from the vector store; here it is set to 2, so the retriever returns the two chunks most similar to the query.
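If you want to see which chunks the retriever actually picks, you can inspect them directly. The lines below are an optional sketch that reuses the vector_store and response objects from the script above.
# the two most similar chunks for the query, straight from the vector store
for doc in vector_store.similarity_search("What is the email address of the candidate?", k=2):
    print(doc.page_content)

# because return_source_documents=True, the chain response also carries
# the chunks that were passed to the LLM
for doc in response.get("source_documents", []):
    print(doc.page_content)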
Try executing the code with python qna_with_embeddings.py