Engineering

Jan 30, 2024

Backend.AI Meets Tool LLMs : Revolutionizing AI Interaction with Tools - Part 3

  • Sergey Leksikov

    Machine Learning Researcher

Part 3. Building your own API retriever and question answering system locally with a few lines of code, without training or serving an LLM

Previously, in Part 1 we talked about Tool LLMs and their usage, and Part 2 demonstrated how to run Gorilla LLM on Backend.AI. Part 3 covers the case where no GPU is available, but we still want to get help and assistance regarding our API.

Suppose we have Backend.AI, and we want to explore the Backend.AI REST API and Functional API in a more interactive, question-answering style. The REST API is described in this documentation: https://docs.backend.ai/en/latest/manager/rest-reference/index.html

Figure 1. Backend.AI REST API Documentation

In addition, Backend.AI REST API documentation can be exported into openapi.json format:

Figure 2. Backend.AI openapi.json

Another source of Backend.AI API information is the Functional API defined in the Backend.AI Client. We want to know how to interact with Backend.AI and which parts of the code are responsible for that. The client code repository manages and interacts with the cloud and computing environment:

Steps to make a Question Answering API system

  1. Let’s set up the Backend.AI Client locally from https://github.com/lablup/backend.ai/tree/main/src/ai/backend/client on our local PC and create a new directory bai-dev/src/ai/backend/client/gpt_api_client.

Figure 3. The directory location of gpt_api_client

  2. In the vector_data directory, let’s create two subdirectories: data1/, which will store the REST API documentation (openapi.json), and data2/, which will store the selected B.AI Client files over which we want to do API question answering.

Figure 4. Overview of data directories with openapi.json and client function code files

  3. Let’s install the LlamaIndex Python library: pip install llama-index. Note that LlamaIndex is not related to Meta’s LLaMA language model; it provides data structures and methods for efficiently processing and storing documents for retrieval.

  4. Let’s convert our API and code files into embedding vectors and store them in a vector database with LlamaIndex. We use the Jupyter Notebook interactive environment, which is also integrated into VSCode on our local PC. A sketch of such a notebook cell is shown below, after Figure 5.

Figure 5. Jupyter Notebook interactive environment. Loading openapi.json from data/ directory. Then asking questions from query engine over a vector index.
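For reference, such a notebook cell could look roughly like the sketch below. It assumes a LlamaIndex version that exposes SimpleDirectoryReader and VectorStoreIndex at the top level and an already configured embedding backend (by default LlamaIndex uses OpenAI embeddings, so an API key may be required); the directory path and the sample question are only illustrative.

# Sketch: build a vector index over the REST API spec in data1/ (openapi.json)
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Read the exported OpenAPI specification as documents
rest_docs = SimpleDirectoryReader("vector_data/data1").load_data()

# Embed the documents and build an in-memory vector index
# (requires a configured embedding model, e.g. an OpenAI API key)
rest_api_index = VectorStoreIndex.from_documents(rest_docs)

# Ask questions over the index through a query engine
rest_query_engine = rest_api_index.as_query_engine()
print(rest_query_engine.query("How do I create a new session via the REST API?"))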

  5. Vectorize the data2/ directory with our code functions (a similar sketch follows Figure 6).

Figure 6. Loading the data2/ directory with code files from the B.AI Client, vectorizing them into an index, and creating a question answering engine.
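The same pattern applies to the code files; a minimal sketch, again with an illustrative path and question:

# Sketch: build a second index over the Backend.AI Client code files in data2/
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load the selected B.AI Client source files
code_docs = SimpleDirectoryReader("vector_data/data2").load_data()

# Embed and index them, then create a query engine for API question answering
functional_index = VectorStoreIndex.from_documents(code_docs)
functional_query_engine = functional_index.as_query_engine()

print(functional_query_engine.query("Which client function creates a compute session?"))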

We can save both indexes using the Python pickle or joblib libraries, which are commonly used to serialize objects and load them back later: joblib.dump(rest_api_index, "rest_api_index.joblib") and joblib.dump(functional_index, "functional_index.joblib").

  6. The Jupyter Notebook environment already lets us ask questions and get responses interactively. Additionally, we can load the saved vectorized indexes on a FastAPI server and answer questions over the web. In Part 2, we set up a compute session with Gorilla LLM, and from that demo we still have a compute session running a FastAPI server.

  7. Let’s transfer the files rest_api_index.joblib and functional_index.joblib to the api_helper/ vFolder of the Backend.AI Cloud session.

  8. In the file server.py, load the vector indexes and define the query engines.

Figure 7. server.py definition of index files and query engine.

  9. For each query engine, we specify a FastAPI endpoint; a combined server.py sketch follows Figure 8.

Figure 8. Code snippets for REST and Functional API retrieval
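Putting steps 8 and 9 together, a minimal server.py could look like the sketch below. The index file names and the /rest_api and /functional endpoints follow the steps above, while the request model, variable names, and JSON response shape are assumptions made for illustration.

# Minimal sketch of server.py: load the two saved indexes and expose
# one FastAPI endpoint per query engine.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the vector indexes serialized earlier with joblib.dump(...)
rest_api_index = joblib.load("rest_api_index.joblib")
functional_index = joblib.load("functional_index.joblib")

# One query engine per index
rest_query_engine = rest_api_index.as_query_engine()
functional_query_engine = functional_index.as_query_engine()

class Query(BaseModel):
    instruction: str

@app.post("/rest_api")
def query_rest_api(query: Query):
    # Answer questions over the REST API documentation (openapi.json)
    response = rest_query_engine.query(query.instruction)
    return {"answer": str(response)}

@app.post("/functional")
def query_functional(query: Query):
    # Answer questions over the Backend.AI Client code files
    response = functional_query_engine.query(query.instruction)
    return {"answer": str(response)}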

  10. Test the server response from your local PC using the curl command. When the server is queried on a specific endpoint, it returns an answer to the user.
curl -X POST -H "Content-Type: application/json" -d '{"instruction":"Create a new session"}' http://127.0.0.1:8000/rest_api

Figure 9. Command line response from curl command. Example 1

curl -X POST -H "Content-Type: application/json" -d '{"instruction":"Create a new session"}' http://127.0.0.1:8000/functional

Figure 10. Command line response from curl command. Example 2

In addition, we can build a web app that receives user input, sends it to the corresponding endpoint, and receives the answer; a minimal client sketch follows the figures below.

Figure 11. A web app prototype for Question Answering over Backend.AI REST and Functional API. Example 1

Figure 12. A web app prototype for Question Answering over Backend.AI REST and Functional API. Example 2
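As a rough illustration of such a client, the helper below simply forwards the user’s question to one of the two endpoints and returns the raw response. The endpoint URLs and the instruction field mirror the curl examples above; everything else (function name, timeout, response handling) is a hypothetical sketch, and the exact response shape depends on how the server formats its output.

# Hypothetical helper: forward a user question to one of the QA endpoints.
# Assumes the FastAPI server from the previous step is reachable locally.
import requests

def ask_backend_ai(question: str, mode: str = "rest_api") -> str:
    # mode is either "rest_api" or "functional", matching the two endpoints
    url = f"http://127.0.0.1:8000/{mode}"
    resp = requests.post(url, json={"instruction": question}, timeout=60)
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    print(ask_backend_ai("Create a new session"))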

Conclusion

In Part 3, we demonstrated how to locally create a question-answering system using the open-source Python library LlamaIndex, which converts our documents and Backend.AI code into vector form. Question answering can be done interactively in a Jupyter Notebook environment, which Visual Studio Code supports via plugins. Furthermore, we moved those vector indexes to a Backend.AI Cloud environment where a Gorilla LLM API-tuned model is served. Then an API question-answering web app was implemented to assist users over the network.

Reference:

Demo video for Backend.AI API Helper and Gorilla LLM:
