
Jetson Copilot review: Exploring llama3 and RAG applications based on Jetson Orin 64GB

DFRobot Jul 12 2024

The newly released Jetson Copilot has attracted a lot of attention. In this review, based on the Jetson Orin 64GB platform, we take a comprehensive look at Jetson Copilot's features, performance, and potential in practical applications. We walk through every step from installation to startup, interact with the llama3 8b model, and use pre-built indexes for efficient question answering.

 

Installation and startup

To start using Jetson Copilot, you first need to clone its code repository from GitHub:

Bash
# Clone the repository and run the one-time environment setup
git clone https://github.com/NVIDIA-AI-IOT/jetson-copilot/
cd jetson-copilot
./setup_environment.sh

# Launch the Ollama server and the Streamlit web app inside Docker
./launch_jetson_copilot.sh

After executing the above commands, Jetson Copilot starts the Ollama server and the Streamlit application inside a Docker container. You can access the web application hosted on the Jetson through the URL printed to the console.

On the Jetson itself, open the local URL (http://localhost:8501) in a web browser to access the application. If you are using a PC on the same network as the Jetson, you can also reach it through the network URL.

 

Interact with llama3 8b (Jetson Orin set to 50W power mode)

Jetson Copilot currently only supports the llama3 8b model. The first response is slow because the model has to load; subsequent responses generate at about 13 tokens/s.

Figure: Demo of llama3 8b
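
Since Jetson Copilot runs an Ollama server in the background, you can also talk to the model programmatically. Below is a minimal sketch of our own (not part of Jetson Copilot) that calls Ollama's REST API on its default port 11434 and estimates the generation speed from the metadata Ollama returns; the prompt is just an example.

Python
import requests

# Ollama's default REST endpoint; Jetson Copilot starts this server inside Docker.
OLLAMA_URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3",  # the llama3 8b model used by Jetson Copilot
    "prompt": "Explain what NVIDIA Jetson Orin is in two sentences.",
    "stream": False,    # return a single JSON object instead of a stream
}

resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
resp.raise_for_status()
data = resp.json()

# eval_count tokens generated over eval_duration nanoseconds -> tokens/s
tokens_per_s = data["eval_count"] / (data["eval_duration"] / 1e9)
print(data["response"])
print(f"generation speed: {tokens_per_s:.1f} tokens/s")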

 

RAG

Ask Copilot relevant questions using a pre-built index

Copilot's built-in example index is a Jetson Orin operation document. As the demo video shows, Copilot takes about 26 seconds to retrieve from the indexed document and generate a response.

Figure: Demo of Copilot
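
Under the hood, Jetson Copilot builds its RAG pipeline on LlamaIndex with Ollama as the local backend. As a rough illustration of what answering a question against a pre-built index involves, here is a hedged sketch that reloads a persisted index and queries it; the persist directory and the question are placeholders, and Copilot's own code may differ in detail.

Python
from llama_index.core import Settings, StorageContext, load_index_from_storage
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Use the local Ollama server for both generation and retrieval embeddings.
Settings.llm = Ollama(model="llama3", request_timeout=300.0)
Settings.embed_model = OllamaEmbedding(model_name="mxbai-embed-large")

# Hypothetical path to a pre-built index under jetson-copilot/index.
storage_context = StorageContext.from_defaults(persist_dir="jetson-copilot/index/my_index")
index = load_index_from_storage(storage_context)

# Retrieve the relevant chunks and let llama3 answer from them.
query_engine = index.as_query_engine()
print(query_engine.query("How do I put Jetson Orin into recovery mode?"))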

 

Create your own index based on your documents and ask questions

We use the LattePanda Mu product webpage from the DFRobot online store as the index document:

Figure: Building the jetson-copilot/index folder
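
For reference, building an index from web pages with LlamaIndex and the local Ollama embedder looks roughly like the sketch below. The URLs are placeholders (substitute the LattePanda Mu product page and any other pages you want to index) and the persist path is an assumption; since the reader accepts a list, indexing multiple URLs later in this review works the same way.

Python
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.readers.web import SimpleWebPageReader

# Embed locally with the same model Jetson Copilot uses.
Settings.embed_model = OllamaEmbedding(model_name="mxbai-embed-large")

# Placeholder URLs: put the LattePanda Mu product page (and any others) here.
urls = [
    "https://www.dfrobot.com/...",  # LattePanda Mu product page
]

documents = SimpleWebPageReader(html_to_text=True).load_data(urls)
index = VectorStoreIndex.from_documents(documents)

# Persist the index so it can be reloaded later (hypothetical folder name).
index.storage_context.persist(persist_dir="jetson-copilot/index/lattepanda-mu")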
 

In addition, Jetson Copilot currently only supports the mxbai-embed-large embedding model. mxbai-embed-large is an advanced embedding model that, as of March 2024, achieved state-of-the-art performance for its size on MTEB (Massive Text Embedding Benchmark), outperforming bert-large-sized models. It was fine-tuned with contrastive training and the AnglE loss function, which helps it generalize across a wide range of topics and domains and makes it well suited to many practical applications and retrieval-augmented generation (RAG) use cases.
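
You can also call the embedding model directly through the Ollama server that Copilot starts, which is an easy way to sanity-check it. A minimal sketch against Ollama's embeddings endpoint, with an arbitrary sample sentence:

Python
import requests

# Ollama's embeddings endpoint on the default port.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "mxbai-embed-large",
          "prompt": "LattePanda Mu is a micro x86 compute module."},
    timeout=60,
)
resp.raise_for_status()

embedding = resp.json()["embedding"]
print(len(embedding))  # mxbai-embed-large produces 1024-dimensional vectors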

When processing data, Jetson Copilot splits the dataset into small blocks according to the chunk size, and keeps a certain overlap between adjacent blocks according to the chunk overlap, which reduces edge effects where information would otherwise be cut in half at a block boundary.
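
In LlamaIndex terms, these two parameters map onto the node parser. A small sketch of how the chunking behaves; the values 512 and 64 are illustrative, not necessarily Copilot's defaults.

Python
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

# Stand-in text for a scraped product page.
long_text = "LattePanda Mu is a micro x86 compute module. " * 200

# chunk_size caps how much text each block holds (in tokens); chunk_overlap
# repeats a small window between neighbouring blocks to reduce edge effects.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents([Document(text=long_text)])
print(f"{len(nodes)} chunks of up to 512 tokens with 64-token overlap")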

Figure: Building the jetson-copilot/index folder
 

The generated index is stored in the jetson-copilot/index folder:

Figure: jetson-copilot/index folder
 

We also tested that an index can be generated from multiple URLs at once:

Figure: Multiple URLs test

Figure: Demo

 

You can also choose to use OpenAI's embedding model to generate an index file:

Figure: OpenAI's embedding model
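
With LlamaIndex, switching to OpenAI embeddings is essentially a one-line change to the embedding-model setting. A hedged sketch, assuming an OPENAI_API_KEY is set in the environment and a hypothetical local Documents folder:

Python
import os
from llama_index.core import VectorStoreIndex, Settings, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding

# Requires a valid OpenAI API key in the environment.
assert os.environ.get("OPENAI_API_KEY"), "set OPENAI_API_KEY first"

# Swap the local Ollama embedder for OpenAI's; everything else stays the same.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

documents = SimpleDirectoryReader("jetson-copilot/Documents").load_data()  # hypothetical folder
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="jetson-copilot/index/openai-embed")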

 

Conclusion

Jetson Copilot, an advanced tool built for NVIDIA Jetson Orin, can be launched with a couple of simple command-line scripts.

llama3 exploration scenario:

Currently, it targets the llama3 8b model, delivering a smooth conversation experience at about 13 tokens per second.

 

RAG application built with llama3:

It also supports efficient index creation with the mxbai-embed-large model. For data processing, users can flexibly adjust the chunk size and chunk overlap to tune how documents are segmented and to reduce information loss at block boundaries. Jetson Copilot additionally lets users build index files with OpenAI's embedding model, further extending its functionality. Retrieving and generating content from an indexed document takes about 26 seconds, with an output speed of roughly 13 tokens/s. Overall, Jetson Copilot is a comprehensive, easy-to-use tool well suited to a variety of practical scenarios and retrieval-augmented generation (RAG) tasks.

 

Comparison of performance of different frameworks

The performance of different large language models on Jetson Orin also varies with the inference framework. Under the MLC/TVM framework, the text generation rate of Llama3 8B (int4) on Jetson AGX Orin reaches about 40 tokens/s.

Figure: SLM text generation rate

 

Reference

1. Code: https://github.com/NVIDIA-AI-IOT/jetson-copilot/

2. Jetson-copilot TODO:

Figure: Jetson-copilot TODO

FAQ:

1. Error: unable to open localhost. Solution: give your user Docker permissions, then reboot:

Bash
# Add the current user to the docker group so the launch script can talk to Docker
sudo usermod -aG docker $USER
sudo reboot

2. Network error. Solution: reconnect to the network and restart.

Figure: Network error and solution
