Creating a chat application that is both easy to build and versatile enough to integrate with open source large language models or proprietary systems from giants like OpenAI or Google is a very worthwhile venture. Such flexibility allows developers to choose the best model for their needs, whether prioritizing privacy and security by running models locally or leveraging the advanced capabilities of cloud-based services.
Running open source language models can enhance data security, as sensitive information would never leave the organization, offering a compelling advantage in scenarios where confidentiality is paramount.
Additionally, this approach can lead to substantial cost savings by eliminating the need to purchase tokens for every interaction, a common expense when using external APIs. Proprietary models can get expensive, especially during testing, so it would be great if an engineer could build the application against an open source large language model and then, by changing just a couple of lines of code, switch to either a different open source LLM or a proprietary model.
Using open source models not only democratizes access to cutting-edge AI technologies but also empowers developers and businesses of all sizes to create more personalized and secure communication tools without breaking the bank. Although this has been possible for many months, it keeps getting easier.
With the recent release from Ollama, I will show how, in just a few steps and less than 75 lines of Python code, you can have a chat application running as a deployable Streamlit application.
But first, what is Ollama?
Ollama offers a straightforward and user-friendly platform for running large language models, right now catering especially to macOS and Linux users, with plans to extend support to Windows in the near future. It accommodates a wide variety of models, such as Llama 2, CodeLlama, Phi, and Mixtral, but the one I'll be using in this example is Mistral 7B. The setup process is relatively simple. (If you have Windows and don't want to wait for Ollama to be available, you can use LM Studio.) Ollama creates a server endpoint that you can use in your application.
Chat Application:
Step 1:
Download and install Ollama.
https://ollama.com/
This is a straightforward installation process and will place an icon in the menu bar indicating that Ollama is available.
Step 2:
Download an LLM model.
In this example, we will be using Mistral 7B. To download the model, run this command in the terminal:
ollama pull mistral
The ollama pull command downloads the model. If you want a different model, such as Llama 2, you would type llama2 instead of mistral in the ollama pull command. A full list of available models can be found here.
Step 3:
Run the LLM model Mistral.
To run Mistral 7B, type this command in the terminal:
ollama run mistral
This will start Mistral at the endpoint of http://localhost:11434/v1. You could start chatting with the endpoint right in the terminal, but we are going to use this endpoint in our chat application.
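If you want to confirm that the endpoint is responding before writing any application code, a quick sanity check from Python might look like the sketch below. It uses only the standard library and assumes the model name mistral and the /v1/chat/completions path of Ollama's OpenAI-compatible API; the prompt text is just an illustration.

import json
import urllib.request

# Build an OpenAI-style chat completion request for the local Ollama server.
payload = {
    "model": "mistral",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Send the request and print the model's reply.
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])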
Step 4:
Build the chat application.
The newest version of Ollama, released last week, includes OpenAI compatibility, as explained here. So if you don't have the latest version, go ahead and download it. With OpenAI compatibility, we get a standardized output format regardless of which LLM we are using. Before this release, I used LiteLLM to get this compatibility, but that is no longer necessary.
All of the code for this application can be found in this GitHub repository.
But let's walk through the major parts of the code.
To initialize the client and get the OpenAI compatibility, we set the base URL to the Ollama endpoint. For api_key, we put 'ollama', but this could be anything since there's no API key. If we were using the OpenAI API, we would put our API key here.
import openai

client = openai.OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',  # api_key is required, but unused for local models
)
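As a quick illustration of what the compatibility layer buys us, here is a minimal sketch of a single request through this client (the prompt is just an example). Pointing the same code at a different local model, or at OpenAI itself, only requires changing base_url, api_key, and the model name:

response = client.chat.completions.create(
    model='mistral',  # the model we pulled with ollama pull mistral
    messages=[{'role': 'user', 'content': 'Explain what Streamlit is in one sentence.'}],
)
print(response.choices[0].message.content)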
We also need to keep track of the conversation and send it to the LLM with each API call, so that the LLM remembers what was said before, i.e. we are sending not just the current prompt but the history of preceding prompts and responses. Since this is Streamlit, I'm using st.session_state.
I append both the user's prompt and the model's response to the history. For example, appending the response looks like this:
q = {
    "role": "assistant",
    "content": response.choices[0].message.content
}
st.session_state.message_list.append(q)
The user's prompt is captured with st.chat_input, passed to a conversation helper, and then the whole history is rendered in the chat view:

prompt = st.chat_input("Ask a question")
if prompt:
    with st.spinner('Thinking...'):
        answer = conversation.message(prompt)

for l in st.session_state.message_list:
    if l['role'] == 'user':
        with st.chat_message("user"):
            st.write(l['content'])
    elif l['role'] == 'assistant':
        with st.chat_message("assistant"):
            st.write(l['content'])
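The conversation.message call above comes from a small helper in the full code. As an illustration of the idea rather than the exact implementation in the repository, such a helper might look like the sketch below: it keeps the running history in st.session_state and sends all of it with every request, so the model sees the preceding prompts and responses each time.

import openai
import streamlit as st

client = openai.OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

class Conversation:
    # Illustrative helper: keeps the chat history and sends it on every call.
    def __init__(self, model='mistral'):
        self.model = model
        if 'message_list' not in st.session_state:
            st.session_state.message_list = []

    def message(self, prompt):
        # Add the user's prompt to the history first.
        st.session_state.message_list.append({'role': 'user', 'content': prompt})
        # Send the entire history so the model remembers earlier turns.
        response = client.chat.completions.create(
            model=self.model,
            messages=st.session_state.message_list,
        )
        answer = response.choices[0].message.content
        # Append the assistant's reply so it is part of the next request.
        st.session_state.message_list.append({'role': 'assistant', 'content': answer})
        return answer

conversation = Conversation()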
Here is the full code:
Step 5:
Run the application.
Before you run the application, it is best to create a conda environment. Let’s call it chat:
conda create -n chat python=3.11
conda activate chat
You will also need to install dependencies:
pip install openai
pip install streamlit
Now you can run the Streamlit application:
streamlit run app.py
Let's look at the UI and output.
And that's it! In a few easy steps and less than 75 lines of code, we now have a chat application running that uses a local open source LLM.