Sunday, February 11, 2024

Build a Chat Application with Ollama and Open Source Models

Creating a chat application that is both easy to build and versatile enough to integrate with open source large language models or proprietary systems from giants like OpenAI or Google is a very worthwhile venture. Such flexibility allows developers to choose the best model for their needs, whether prioritizing privacy and security by running models locally or leveraging the advanced capabilities of cloud-based services.

Running open source language models can enhance data security, as sensitive information would never leave the organization, offering a compelling advantage in scenarios where confidentiality is paramount.

Additionally, this approach can lead to substantial cost savings by eliminating the need to purchase tokens for every interaction, a common expense when using external APIs. Proprietary models can get expensive, especially during testing. So it would be great if an engineer could build out the application and test it with an open source large language model and then, just by changing a couple of lines of code, switch to either a different open source LLM or a proprietary model.

Using open source models not only democratizes access to cutting-edge AI technologies but also empowers developers and businesses of all sizes to create more personalized and secure communication tools without breaking the bank. Although this has been possible for many months, it keeps getting easier.

With the recent release from Ollama, I will show that you can have a chat application running as a deployable Streamlit application in just a few steps and fewer than 75 lines of Python code.

But first, what is Ollama?

Ollama offers a straightforward and user-friendly platform for running large language models. Right now it caters to macOS and Linux users, with plans to extend support to Windows in the near future. It accommodates a wide variety of models, such as Llama 2, CodeLlama, Phi, and Mixtral, but the one I'll be using in this example is Mistral 7B. The setup process is relatively simple. (If you have Windows and don't want to wait for Ollama to be available, you can use LM Studio.) Ollama creates a server endpoint that you can use in your application.

Chat Application:

Step 1:

Download and install Ollama.

https://ollama.com/

This is a straightforward installation process, and it places an icon in the menu bar to indicate that Ollama is available.

Step 2:

Download an LLM model.

In this example, we will be using Mistral 7B. To download the model, run this command in the terminal:

ollama pull mistral

The ollama pull command downloads the model. If you want a different model, such as Llama 2, you would type llama2 instead of mistral in the ollama pull command. A full list of available models can be found here.

Step 3:

Run the LLM model Mistral.

To run Mistral 7B, type this command in the terminal:

ollama run mistral

This will start Mistral and make it available at the endpoint http://localhost:11434/v1. You could start chatting with the model right in the terminal, but what we are going to do is use this endpoint in our chat application.
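Before wiring the endpoint into an application, it can help to see what a single request to it looks like. The following is a minimal sketch using only the Python standard library. It assumes the server accepts OpenAI-style chat requests at /chat/completions under the base URL above; the helper names build_chat_request and send_chat_request are hypothetical, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is serving its OpenAI-compatible API at this base URL.
BASE_URL = "http://localhost:11434/v1"


def build_chat_request(prompt, model="mistral"):
    """Build (but do not send) a POST request for a single-turn chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (needs a running Ollama server with the mistral model pulled):
#   print(send_chat_request(build_chat_request("Why is the sky blue?")))
```

Splitting the building and sending steps keeps the request shape easy to inspect without a running server.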

Step 4:

Build the chat application.

The newest version of Ollama, which came out last week, includes OpenAI compatibility as explained here. So if you don't have the latest version, go ahead and download it. With OpenAI compatibility, we get a standardized output regardless of which LLM we are using. Before this release, I used LiteLLM to get this compatibility, but this is no longer necessary.

All of the code for this application can be found at this GitHub repository.

But let's walk through the major parts of the code.

In order to initialize the client and get the OpenAI compatibility, we create a base URL from the Ollama endpoint. For api_key, we put 'ollama', but this could be anything since there's no API key. If we were using the OpenAI API, we would put our API key here.

client = openai.OpenAI(
    base_url='http://localhost:11434/v1',
    api_key='ollama',  # api_key is required, but unused for local models
)
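To illustrate the idea of switching models by changing a couple of lines, here is one possible way to keep the per-provider values in a single place. This is a sketch, not the code from the repository: the PROVIDERS table, the client_settings helper, the OPENAI_API_KEY environment variable name, and the gpt-3.5-turbo model choice are all assumptions made for illustration.

```python
import os

# Hypothetical settings table; the entries are illustrative assumptions.
PROVIDERS = {
    "ollama": {
        "base_url": "http://localhost:11434/v1",
        "api_key": "ollama",  # required by the client, unused by Ollama
        "model": "mistral",
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "api_key": None,  # looked up from the environment at call time
        "model": "gpt-3.5-turbo",
    },
}


def client_settings(provider):
    """Return (client keyword arguments, model name) for a provider."""
    settings = PROVIDERS[provider]
    api_key = settings["api_key"]
    if api_key is None:
        # Assumed variable name; an empty string means no key was set.
        api_key = os.environ.get("OPENAI_API_KEY", "")
    kwargs = {"base_url": settings["base_url"], "api_key": api_key}
    return kwargs, settings["model"]


# Usage with the openai package installed:
#   kwargs, model = client_settings("ollama")
#   client = openai.OpenAI(**kwargs)
```

With this layout, moving from the local model to a hosted one comes down to changing the provider name passed to client_settings.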

We also need to keep track of the conversation and send it to the LLM with each API call, so that the LLM remembers what was said before, i.e. we are sending not just the current prompt but the history of preceding prompts and responses. Since this is Streamlit, I'm using st.session_state.

I append both the user's prompt and the model's response to this list. Here is the response being appended:

    q = {
        "role": "assistant",
        "content": response.choices[0].message.content
    }
    st.session_state.message_list.append(q)
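The way the history grows from turn to turn can be seen without Streamlit or a model at all. The sketch below uses a plain list in place of st.session_state and a stand-in function in place of the API call; record_turn and echo_reply are hypothetical names used only for this illustration.

```python
def record_turn(history, question, get_reply):
    """Append the user question, ask for a reply, then append the reply."""
    history.append({"role": "user", "content": question})
    answer = get_reply(history)  # the full history is sent every time
    history.append({"role": "assistant", "content": answer})
    return answer


def echo_reply(messages):
    """Stand-in for the model call: reports how many messages it received."""
    return f"I received {len(messages)} messages."


history = [{"role": "system", "content": "You are a helpful assistant."}]
first = record_turn(history, "What is Ollama?", echo_reply)
second = record_turn(history, "Could you explain more?", echo_reply)

print(first)   # I received 2 messages.
print(second)  # I received 4 messages.
print([m["role"] for m in history])
# ['system', 'user', 'assistant', 'user', 'assistant']
```

The second call sees four messages rather than two, which is what lets a real model resolve a follow-up question against the earlier exchange.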

Because Streamlit reruns the script and redraws the UI every time the user makes an input change, I loop through everything in session_state and write it to the chat control:

prompt = st.chat_input("Ask a question")
if prompt:
    with st.spinner('Thinking...'):
        answer = conversation.message(prompt)
        for l in st.session_state.message_list:
            if l['role'] == 'user':
                with st.chat_message("user"):
                    st.write(l['content'])
            elif l['role'] == 'assistant':
                with st.chat_message("assistant"):
                    st.write(l['content'])
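Why the loop is needed becomes clearer with a toy simulation of the rerun behavior. No Streamlit is involved here: a plain dict plays the part of session_state, a list plays the part of the screen, and run_script is a hypothetical function standing in for one top-to-bottom execution of the app. The model call is left out to keep the focus on redrawing.

```python
def run_script(session_state, prompt):
    """Simulate one rerun and return what would be drawn during it."""
    drawn = []  # the screen starts empty on every rerun
    session_state.setdefault("message_list", [])
    if prompt:
        session_state["message_list"].append(
            {"role": "user", "content": prompt}
        )
    # Replay the stored history, otherwise earlier turns would vanish.
    for msg in session_state["message_list"]:
        drawn.append((msg["role"], msg["content"]))
    return drawn


state = {}  # survives across reruns, like st.session_state
first_run = run_script(state, "Hi")
second_run = run_script(state, "Could you explain more?")

print(first_run)   # [('user', 'Hi')]
print(second_run)  # [('user', 'Hi'), ('user', 'Could you explain more?')]
```

Only the dict carries over between the two runs, so the earlier message appears in the second run solely because the loop replays it.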

Here is the full code:

import openai
import streamlit as st

# Seed the conversation history once per session.
if 'message_list' not in st.session_state:
    st.session_state.message_list = [
        {"role": "system", "content": "You are a helpful assistant."}
    ]


class Conversation:

    # Point the OpenAI client at the local Ollama endpoint.
    client = openai.OpenAI(
        base_url='http://localhost:11434/v1',
        api_key='ollama',  # api_key is required, but unused for local models
    )

    def __init__(self):
        pass

    def message(self, question):
        # Record the user's prompt, then send the whole history to the model.
        q = {
            "role": "user",
            "content": question
        }
        st.session_state.message_list.append(q)
        response = self.client.chat.completions.create(
            model="mistral",
            messages=st.session_state.message_list
        )
        # Record the model's reply so the next call includes it.
        q = {
            "role": "assistant",
            "content": response.choices[0].message.content
        }
        st.session_state.message_list.append(q)
        return response.choices[0].message.content


if __name__ == "__main__":
    st.title('Chatbot')

    message = st.chat_message("assistant")
    message.write("Hello human!")

    conversation = Conversation()
    prompt = st.chat_input("Ask a question")
    if prompt:
        with st.spinner('Thinking...'):
            answer = conversation.message(prompt)
            # Redraw the conversation; the system message is not displayed.
            for l in st.session_state.message_list:
                if l['role'] == 'user':
                    with st.chat_message("user"):
                        st.write(l['content'])
                elif l['role'] == 'assistant':
                    with st.chat_message("assistant"):
                        st.write(l['content'])


Step 5:

Run the application.

Before you run the application, it is best to create a conda environment. Let’s call it chat:

conda create -n chat python=3.11

conda activate chat

You will also need to install dependencies:

pip install openai

pip install streamlit

Now you can run the Streamlit application:

streamlit run app.py

Let's look at the UI and output.

As you can see, it answers the question, and in the follow-up it understands that "Could you explain more?" refers to the preceding conversation.


And that's it! In a few easy steps and fewer than 75 lines of code, we now have a chat application running that uses a local open source LLM.

