cm

August 25, 2024

5 Quick Steps to Run an LLM on a GPU-poor Laptop

It might seem surprising, but thanks to llama.cpp and the LM Studio community's models on Hugging Face, it's pretty easy to run a large language model (LLM) on your own laptop. You don't even need much in the way of resources or technical know-how. After all, the laptop I'm using runs Ubuntu on an old four-core Intel Core i7 CPU with a paltry 16 GB of RAM!

Why would you do such a thing? Well, llama.cpp is the go-to inference framework for running existing open-source large language models. With llama.cpp and its command-line interface (llama-cli) you can easily download, run, and send prompts to a wide range of state-of-the-art LLMs. So if you want a private AI, an aid for development, or just a way to learn more about these technologies, you'll find llama.cpp very helpful.

Without further ado, here are the steps to install what you need on Ubuntu:

1) Install the libcurl development package so that llama.cpp can download models for you:
sudo apt install libcurl4-openssl-dev
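
If you want to double-check the install, curl-config ships with that package and should report the libcurl version:
curl-config --version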

2) Download, unpack, and install ccache (optional, but it speeds up recompilation). The release tarball is a prebuilt binary, so make install here just copies it into place:
wget https://github.com/ccache/ccache/releases/download/v4.10.2/ccache-4.10.2-linux-x86_64.tar.xz
tar xf ccache-4.10.2-linux-x86_64.tar.xz
cd ccache-4.10.2-linux-x86_64/
sudo make install
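
A quick check that ccache landed on your PATH:
ccache --version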

3) Clone the llama.cpp repo:
git clone https://github.com/ggerganov/llama.cpp
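Then hop into the checkout, since the remaining steps run from there:
cd llama.cpp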

4) From inside the llama.cpp repo, compile llama.cpp:
make LLAMA_CURL=1 -j 4
Here LLAMA_CURL=1 builds in libcurl support so llama.cpp can download models, and the -j flag sets the number of parallel build jobs. My CPU has only four cores, so I used 4.
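
If the build succeeds, the Makefile drops a llama-cli binary in the repo root (at least it did at the time of writing). A quick sanity check:
./llama-cli --version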

5) Start using llama-cli! Specify the Hugging Face repo (--hf-repo) and file (--hf-file); llama-cli checks whether you already have the model and downloads it if not. For example, here's a recent Microsoft release:
./llama-cli --hf-repo lmstudio-community/Phi-3.5-mini-instruct-GGUF --hf-file Phi-3.5-mini-instruct-Q6_K.gguf -p "Your prompt goes here!" -c 8192 -cnv --color -t 4

Or a recent Google release:
./llama-cli --hf-repo lmstudio-community/gemma-2-2b-it-GGUF --hf-file gemma-2-2b-it-Q8_0.gguf -p "Your prompt goes here!" -c 8192 -cnv --color -t 4

You can find all of the lmstudio-community models here: https://huggingface.co/lmstudio-community. The additional flags I used are as follows:
  • -p is your prompt.
  • -c sets the context size (in tokens).
  • -cnv enables conversation mode, for a more ChatGPT-like experience (a one-shot variant follows this list).
  • --color enables colored text in llama-cli's responses in your terminal.
  • -t sets the number of threads to use (4, to match my paltry four-core CPU).
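
If you just want a single answer instead of an interactive chat, drop -cnv. A minimal sketch, using the -n flag to cap the number of tokens generated (the prompt here is just a stand-in):
./llama-cli --hf-repo lmstudio-community/gemma-2-2b-it-GGUF --hf-file gemma-2-2b-it-Q8_0.gguf -p "Summarize llama.cpp in one sentence." -n 256 -c 8192 --color -t 4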
 
If you have an Nvidia graphics card, make sure you check out the flags for CUDA support! You can find the documentation for the llama-cli flags here (or via llama-cli --help, of course): https://github.com/ggerganov/llama.cpp/tree/master/examples/main
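
As a rough sketch (assuming you already have the Nvidia driver and CUDA toolkit installed; GGML_CUDA and -ngl are the option names as of this writing), you'd rebuild with CUDA enabled and then offload model layers to the GPU at run time:
make clean
make GGML_CUDA=1 -j 4
./llama-cli --hf-repo lmstudio-community/gemma-2-2b-it-GGUF --hf-file gemma-2-2b-it-Q8_0.gguf -p "Your prompt goes here!" -c 8192 -cnv --color -ngl 99
Here -ngl (--n-gpu-layers) sets how many model layers to offload to the GPU; a large value like 99 offloads all of them.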

Happy local LLMing!

About cm

The song sleeps in the machine

Published articles: https://aiptcomics.com/author/julianskeptic