Running Vicuna locally: the complete idiot's guide to basically running ChatGPT on your own laptop for free

The new Vicuna-13B large language model is a Very Big Deal. A big enough deal to drive one of the biggest players in AI into a state of panic. Why? Because it produces results of near-ChatGPT quality with a model that’s a fraction of the size, is freely available, and runs comfortably on consumer-grade laptops instead of requiring a massive server farm.

This feels like a flashbulb moment for AI, and I think it’s an important enough milestone that people should experience and contemplate it first-hand. That goes for AI fanbois and doomers alike.

The good news is that you don’t need to be an expert to try this for yourself. Thanks to the efforts of the open source community, even armchair amateurs like me can do it. All you need is a Mac with an Apple Silicon CPU (i.e. M1 or M2) and the willingness to copy and paste some commands.

When you’re done, you’ll have something that is nearly as good as ChatGPT, if noticeably slower. But it will be running entirely locally, on your own laptop, with complete privacy and freedom. Quite the feat.

May 23, 2023 update: the instructions and download links below have been updated to reflect recent changes to llama.cpp, which require downloading a Vicuna weights file that is in a new format.

How to run Vicuna on your MacBook

Open Terminal.
If you don’t have them already, install the Xcode command line tools.

% xcode-select --install
Confirm that you have Python 3.10 installed.

% python3 --version

If this command doesn’t return a version number (e.g. “Python 3.10.8”), run these commands to install Homebrew and Python 3:

% /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

% brew install python@3.10
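
If you’d rather not eyeball the version number, a small check like the one below can do it for you. This is only a sketch: the function name python_ok is made up for illustration, and treating “3.10 or newer” as acceptable is my assumption, not a documented llama.cpp requirement.

```shell
# Hypothetical helper: succeeds if python3 exists and is version 3.10 or newer.
# The "3.10 or newer" threshold is an assumption for this sketch.
python_ok() {
    command -v python3 >/dev/null 2>&1 || return 1
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'
}

if python_ok; then
    echo "python3 looks fine: $(python3 --version 2>&1)"
else
    echo "python3 is missing or older than 3.10; install it with Homebrew"
fi
```

Paste it into Terminal as-is. It only reports what it finds and changes nothing on your Mac.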
Clone llama.cpp from GitHub.

% git clone https://github.com/ggerganov/llama.cpp.git
Build the app.

% cd llama.cpp

% make

% python3 -m pip install -r requirements.txt
Download the Vicuna model data file from HuggingFace (be warned: it’s a 7.3 GB file).
Return to Terminal and move the model data file (“stable-vicuna-13B.ggmlv3.q4_0.bin”) into llama.cpp’s models folder. The command below assumes you cloned llama.cpp into your home folder.

% mv ~/Downloads/stable-vicuna-13B.ggmlv3.q4_0.bin ~/llama.cpp/models/
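
Before starting the model, it can be worth confirming that the file landed in the right place and wasn’t cut short mid-download. Here is a minimal sketch, assuming the same path as the mv command above. The helper name file_size_gb is invented for this example.

```shell
# Hypothetical helper: prints a file's size in whole gigabytes
# (1 GB = 1,000,000,000 bytes), or fails if the file does not exist.
file_size_gb() {
    [ -f "$1" ] || return 1
    bytes=$(wc -c < "$1")
    echo $(( bytes / 1000000000 ))
}

# Path assumed from the mv command in this guide.
model="$HOME/llama.cpp/models/stable-vicuna-13B.ggmlv3.q4_0.bin"
if size=$(file_size_gb "$model"); then
    echo "Found model file, roughly ${size} GB"
else
    echo "Model file not found at $model"
fi
```

Since the download is about 7.3 GB, a much smaller number probably means the download didn’t finish.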
Start the chat interface by issuing the following command.

% ./main -t 8 -m ./models/stable-vicuna-13B.ggmlv3.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 --interactive-first -r "### Human:" -p "### Human:"

IMPORTANT: Where you see -t 8 above, change the 8 to match the number of CPU cores your Mac has:
- M1 = 8
- M1 Pro = 8 or 10
- M1 Max = 10
- M1 Ultra = 20
- M2 = 8
- M2 Pro = 10 or 12
- M2 Max = 12
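
If you’re not sure which chip you have, you can ask the Mac directly instead of consulting the list. The sketch below uses macOS’s sysctl to read the core count and falls back to getconf on other systems. The function name cpu_cores is just something I made up for the example.

```shell
# Hypothetical helper: prints the number of CPU cores the system reports.
cpu_cores() {
    # macOS reports the core count via sysctl; fall back to getconf elsewhere.
    sysctl -n hw.ncpu 2>/dev/null || getconf _NPROCESSORS_ONLN
}

echo "Use -t $(cpu_cores)"
```

Whatever number it prints is the value to put after -t.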
The model is now running. You should see:

Running in interactive mode.

Press Ctrl+C to interject at any time.

Press Return to return control to LLaMa.

If you want to submit another line, end your input in ‘\’.

### Human:
Just start typing, as if you were using ChatGPT! You’ll notice it is slower than ChatGPT, and there is often a long pause before a response starts appearing, depending on your prompt and your Mac’s CPU. But it should be usable.

(When you’re done and want to exit, hit Ctrl+C twice. To start the model again, just run the ./main command above again from inside the llama.cpp folder.)
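
To save retyping that long command every time, you could wrap it in a shell function. This is a rough sketch rather than anything official: run_vicuna and the LLAMA_MAIN override are names I invented, while the flags are copied unchanged from the command in this guide.

```shell
# Hypothetical launcher: runs the same command as in this guide,
# with the thread count passed as the first argument (default 8).
# Call it from inside the llama.cpp folder, e.g.:  run_vicuna 10
run_vicuna() {
    threads="${1:-8}"
    # LLAMA_MAIN is an invented override for this sketch; it defaults to ./main.
    "${LLAMA_MAIN:-./main}" -t "$threads" \
        -m ./models/stable-vicuna-13B.ggmlv3.q4_0.bin \
        --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 \
        --interactive-first -r "### Human:" -p "### Human:"
}
```

Pasting this into Terminal only defines the function. Nothing starts until you type run_vicuna followed by your core count.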

Example output:

### Human: Write a 30-40 word backstory for a fictional character who is the marshal of a small settlement in post-apocalyptic Modesto, CA.

### Assistant: James Heckleworth was once a police officer in Modesto, CA until the world ended. After his wife passed away from radiation exposure, he took up the role of marshal in the small settlement that had formed in the aftermath. Despite his grief and the constant danger of living in this new world, James has proven to be an effective leader and protector of his community. He is tough but fair, often showing compassion towards those in need while maintaining order and safety within the town’s walls.

Congratulations: you’re using an AI that a year ago was the stuff of science fiction, and just weeks ago was under the near-exclusive control of big tech. Welcome to the (very weird) future.

(Image: DALL-E)