In my previous post, "Build a Chat Application with Ollama and Open Source Models," I went through the steps to build a Streamlit chat application that used Ollama to run the open source model Mistral locally on my machine. Refer to that post for help setting up Ollama and Mistral. In this post, I will extend some of those ideas and show how to create a "Research Assistant" using Ollama, Mistral, RAG, LlamaIndex, and Streamlit.
This application will have two parts:
- Document retrieval: I will build a page that will use the arXiv repository API to pull the most relevant documents for a topic into a vector based index using LlamaIndex.
- Document chat: Based on all of the documents that have been pulled into the vector database, I will build a chat interface page that allows the user to chat on topics that are in the database using either Mistral or OpenAI - the user will be able to pick which LLM they want to use to chat with all of the documents that have been built up in the database.
But first, what is RAG?
Retrieval Augmented Generation (RAG) systems are AI systems that enhance an output's relevance and accuracy by combining the strengths of large language models with information retrieved from external sources. The basic idea behind retrieval augmented generation is to enhance the language model's output by retrieving and incorporating relevant information from a large corpus of text or knowledge base. This approach aims to address the limitations of traditional language models, which can sometimes generate factually incorrect or inconsistent text due to their limited knowledge or understanding of the world.
The key advantages of retrieval augmented generation include:
- Improved factual accuracy and consistency - reducing hallucinations: By incorporating relevant information from external sources, the generated text is more likely to be factually accurate and consistent with real-world knowledge.
- Enhanced knowledge coverage: The model can leverage a vast amount of information from a knowledge base, effectively expanding its knowledge beyond what is encoded in a language model.
- Adaptability: The retrieval can be tailored to specific domains or knowledge sources, allowing the model to generate text that is relevant and accurate for a particular domain or task.
- Overcoming a model's training cutoff date: Language models have a training cutoff date and cannot respond accurately about events that happened after that date. By using RAG with new documents, the LLM can have access to knowledge past its cutoff date.
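To make the idea concrete, here is a minimal sketch of the retrieve-then-generate loop at the heart of any RAG system. The retriever and llm objects are hypothetical placeholders for the components that LlamaIndex will provide for us later in this post:

def rag_answer(question, retriever, llm, top_k=3):
    # 1. Retrieve the passages most similar to the question
    #    (retriever and llm are placeholder objects for illustration)
    passages = retriever.search(question, top_k=top_k)

    # 2. Augment the prompt with the retrieved context
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generate an answer grounded in that context
    return llm.complete(prompt)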
In this application I will use LlamaIndex to implement RAG. LlamaIndex is great at ingesting data from a wide variety of sources (PDFs, Word files, images, audio, PPT, etc.). LlamaIndex has a very convenient utility called SimpleDirectoryReader that reads through all of the files in a directory and loads every file whose format it supports. These files will then be stored as vector embeddings. From the LlamaIndex documentation:
"Embeddings are used in LlamaIndex to represent your documents using a sophisticated numerical representation. Embedding models take text as input, and return a long list of numbers used to capture the semantics of the text. These embedding models have been trained to represent text this way, and help enable many applications, including search!"
But before we can embed some documents to search and chat with, we need to get some documents for our database. This is the data acquisition (document retrieval) page from the list above. On this page, the user will enter a topic such as "Mamba in AI" and the code will use the arXiv API to download the most relevant and recent PDF documents. arXiv is a repository of several million scholarly documents in fields ranging from computer science to physics.
The code will then create embeddings for those documents and make those embeddings available to chat with. In this application, I am using OpenAI embeddings (text-embedding-ada-002) for the OpenAI path and a local embedding model, stored in a Qdrant vector store, for the Mistral path. In a real application you would most likely use only one LLM with one kind of embedding, but for illustration purposes I'm doing both OpenAI and Mistral. Qdrant is an open source vector database, and the LlamaIndex documentation for Qdrant can be found here. So if you want a completely open source version of both the LLM and the embeddings and don't want to worry about token pricing, then Mistral with local embeddings stored in Qdrant is one of many possible options.
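To make the two configurations concrete, here is a hedged sketch of how each path could set its LLM and embedding model in LlamaIndex. The actual wiring used by this application is shown later in client.py; the OpenAIEmbedding import assumes the llama-index-embeddings-openai package is installed:

from llama_index.core import ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.ollama import Ollama

# OpenAI path: the default OpenAI LLM with text-embedding-ada-002 embeddings
openai_context = ServiceContext.from_defaults(
    embed_model=OpenAIEmbedding(model="text-embedding-ada-002"))

# Open source path: Mistral via Ollama with a local embedding model,
# whose vectors will be stored in a Qdrant vector store
mistral_context = ServiceContext.from_defaults(
    llm=Ollama(model="mistral"), embed_model="local")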
The code to pull the documents for the topic from arXiv is in "research.py." It uses a Python module, appropriately named arxiv, that makes working with the arXiv API a little easier.
pip install arxiv
research.py:
''' Get papers from arXiv using the arXiv API '''
import os

import arxiv
import pandas as pd


def get_arxiv(query, num_documents):
    # Make sure the folder for the downloaded PDFs exists
    os.makedirs("./documents", exist_ok=True)

    # Search arXiv for the most relevant papers on the topic
    search = arxiv.Search(query=query,
                          max_results=num_documents,
                          sort_by=arxiv.SortCriterion.Relevance,
                          sort_order=arxiv.SortOrder.Descending)

    titles = []
    summaries = []
    authors = []
    published = []
    links = []

    for result in search.results():
        titles.append(result.title)
        summaries.append(result.summary)
        authors.append(', '.join(author.name for author in result.authors))
        published.append(result.published)
        links.append(', '.join(str(link) for link in result.links))
        # Save each PDF locally so it can be embedded later
        result.download_pdf(dirpath="./documents")

    # Collect the metadata for every retrieved paper
    df = pd.DataFrame({'title': titles, 'summary': summaries, 'authors': authors,
                       'published': published, 'links': links})
    df = df.sort_values(by='published', ascending=False)
    df = df.reset_index(drop=True)

    # Append to documents.csv if it already exists, otherwise create it
    if os.path.exists('documents.csv'):
        df.to_csv('documents.csv', mode='a', header=False, index=False)
    else:
        df.to_csv('documents.csv', index=False)

    return df
The code uses the arxiv "Search" function to get the most relevant articles based on the user-entered topic and the number of documents to retrieve, and it stores those documents in the documents folder. I then write the metadata for the articles, appending it to a "documents.csv" file. One change that you could make here is to store this metadata in a database instead, as sketched below.
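For example, here is a minimal sketch of that variant using SQLite in place of the CSV file (the documents.db file name and the documents table name are assumptions, not part of the application):

import sqlite3

def save_metadata(df):
    # Append the article metadata to a SQLite table instead of documents.csv
    df = df.copy()
    df['published'] = df['published'].astype(str)  # store timestamps as plain text
    conn = sqlite3.connect("documents.db")
    df.to_sql("documents", conn, if_exists="append", index=False)
    conn.close()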
This get_arxiv function is called from the Streamlit UI page "1 - Data Acquistion.py." This Streamlit page asks the user for a topic and the maximum number of documents to retrieve. After receiving the documents from get_arxiv, it creates the embeddings and the LlamaIndex query engines defined in "client.py."
1 - Data Acquistion.py:
import streamlit as st
import pandas as pd
import os

from pages.utilities.research import get_arxiv
from pages.utilities.client import get_mistral_query_engine, get_gpt_query_engine

if __name__ == "__main__":
    st.set_page_config(layout="wide")
    st.title('Research Assistant')
    st.divider()

    # Sidebar inputs: the research topic and how many papers to pull
    with st.sidebar:
        max_documents = st.number_input("Max number of documents:", value=10)
        topic = st.text_input('Research Topic:')

    with st.spinner('Thinking...'):
        if len(topic) > 0:
            # Download the most relevant papers for the topic from arXiv
            get_arxiv(topic, max_documents)

        # Show the metadata for everything retrieved so far
        if os.path.exists('documents.csv'):
            df = pd.read_csv("documents.csv")
            df = df.drop_duplicates(subset=['title'])
            st.dataframe(df)

        if topic:
            # Rebuild both query engines so the new documents get embedded
            try:
                query_engine = get_mistral_query_engine(True)
                query_engine = get_gpt_query_engine(True)
            except:
                pass
client.py:
import streamlit as st
import os
import qdrant_client

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import load_index_from_storage, ServiceContext
from llama_index.llms.ollama import Ollama
from llama_index.core.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore


@st.cache_resource
def get_mistral_query_engine(data_changed):
    # Mistral (via Ollama) with local embeddings stored in Qdrant
    llm_model = Ollama(model="mistral")
    collection_name = "storage"

    # Create the Qdrant client, vector store, and contexts once per session
    if 'qdrant_client' not in st.session_state:
        st.session_state.qdrant_client = qdrant_client.QdrantClient(path="./qdrant_data")
    if 'vector_store' not in st.session_state:
        st.session_state.vector_store = QdrantVectorStore(
            client=st.session_state.qdrant_client, collection_name=collection_name)
    if 'service_context' not in st.session_state:
        st.session_state.service_context = ServiceContext.from_defaults(
            llm=llm_model, embed_model="local")
    if 'storage_context' not in st.session_state:
        st.session_state.storage_context = StorageContext.from_defaults(
            vector_store=st.session_state.vector_store)

    qdrant_persist_dir = "./qdrant_data/collection/storage"

    if not os.path.exists(qdrant_persist_dir) or data_changed:
        # (Re)build the index from the PDFs in the documents folder
        documents = SimpleDirectoryReader("documents").load_data()
        index = VectorStoreIndex.from_documents(
            documents,
            service_context=st.session_state.service_context,
            storage_context=st.session_state.storage_context)
        index.storage_context.persist()
    else:
        # Reuse the embeddings already stored in Qdrant
        index = VectorStoreIndex.from_vector_store(
            vector_store=st.session_state.vector_store,
            service_context=st.session_state.service_context)

    query_engine = index.as_query_engine(streaming=False)
    return query_engine


@st.cache_resource
def get_gpt_query_engine(data_changed):
    # OpenAI LLM with OpenAI embeddings, persisted to the local storage folder
    gpt_persist_dir = "./storage"

    if not os.path.exists(gpt_persist_dir) or data_changed:
        # (Re)build the index with the default OpenAI models
        documents = SimpleDirectoryReader("documents").load_data()
        index = VectorStoreIndex.from_documents(documents)
        index.storage_context.persist()
    else:
        # Load the previously persisted index from disk
        storage_context = StorageContext.from_defaults(persist_dir=gpt_persist_dir)
        index = load_index_from_storage(storage_context)

    query_engine = index.as_query_engine()
    return query_engine
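The document chat page (the second part of the application) then only needs to ask one of these cached query engines for an answer. The snippet below is a simplified sketch rather than the full chat page - the widget layout and model picker are my assumptions - but the query_engine.query() call is the same one LlamaIndex exposes on both engines:

import streamlit as st

from pages.utilities.client import get_mistral_query_engine, get_gpt_query_engine

st.title('Research Assistant - Document Chat')

# Let the user pick which LLM answers the question
model = st.radio("Model:", ["Mistral (local)", "OpenAI"])

question = st.chat_input("Ask a question about the documents...")
if question:
    # False means reuse the already-built index rather than re-embedding
    if model.startswith("Mistral"):
        query_engine = get_mistral_query_engine(False)
    else:
        query_engine = get_gpt_query_engine(False)

    with st.spinner('Thinking...'):
        # Retrieve the relevant chunks and generate a grounded answer
        response = query_engine.query(question)

    st.chat_message("user").write(question)
    st.chat_message("assistant").write(str(response))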
When I asked about Mamba, the answers drew on information from our database and not on information the models had been trained on. We know this because Mamba came out in December of 2023 - after the training cutoff for both models.