A Quick Guide to Running LLMs Locally on PCs

The past few weeks have been exciting in the GenAI space, with Apple launching its open-source LLM, OpenELM, which can run locally on devices rather than in the cloud. Other organizations have joined the race to launch their own versions of Large Language Models.

So, why is everyone interested in working with LLMs? I was wondering about this question recently and was intrigued to explore some Ollama models (more on this in the coming sections).

Large Language Models are popular because of their wide applications in NLP, automation, business, and healthcare; many organizations now leverage in-house LLMs for their daily tasks.

An introduction to LLMs has been covered here. Check it out!

What Is Ollama?

Ollama is a project that aims to make LLMs more accessible. It is a tool that lets you download free, open-source LLMs and run them locally. Once an Ollama LLM is downloaded, it can be used with a few commands in the CLI.

Ollama currently supports these LLMs, among others:

  • Llama – A state-of-the-art family of models from Meta, available in several parameter sizes (8B and 70B, plus a 405B variant for Llama 3.1) and versions (llama2, llama3, llama3.1, llama2-chinese)
  • Gemma – A lightweight family of models by Google DeepMind, with different versions (gemma, codegemma, and gemma2) in varying parameter sizes
  • Mistral – A collection of LLMs by Mistral AI, available in different parameter sizes and versions
  • Phi – A family of lightweight, state-of-the-art models developed by Microsoft, with variants ranging from mini (3.8B) to medium (14B)
  • Llava – LLaVA is a multimodal model built for combined visual and language understanding

There are a few other open-source LLMs, like qwen from Alibaba and mxbai from mixedbread.ai, which add to the research and development of intelligent agents.

The B in a model's parameter size (for example, 7B) stands for billions and refers to the number of parameters the model learned during training. The more parameters a model has, the more capable it tends to be. A larger parameter count also directly affects the computational resources required.
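
To make this concrete, Ollama exposes different parameter sizes of the same model as tags. A small sketch (the tag names below are the ones Ollama listed at the time of writing and may change):

ollama run llama3.1:8b    # ~8 billion parameters – runs on modest hardware
ollama run llama3.1:70b   # ~70 billion parameters – needs far more RAM/VRAM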

How to Download Ollama?

Ollama can be downloaded for macOS, Linux, and Windows using the link in the resources section.

Ollama Download Page
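
On Linux, Ollama can also be installed directly from the terminal with the official install script:

curl -fsSL https://ollama.com/install.sh | sh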

Once Ollama is downloaded, select a model of your choice from the Models page. For demonstration purposes, we will use gemma2.

On the gemma2 model page, we can see the command used to pull the LLM.

Gemma LLM details

We can use the command below to run any LLM from Ollama.

ollama run <model>
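
For our gemma2 example, that works out to the following (ollama run downloads the model on first use and then opens an interactive prompt; ollama pull fetches the weights without starting a session):

ollama pull gemma2
ollama run gemma2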

Another useful command is ollama list, which shows the models that have been downloaded and are ready to use.

List of locally available models
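
Beyond run and list, a few other subcommands of the Ollama CLI are handy for managing local models:

ollama show gemma2   # print details such as parameters, template, and license
ollama rm gemma2     # delete a local model to free up disk space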

Let us start with the introductions, shall we?

Hi! can you tell me who you are and what can you do for me?

This is the prompt I passed to the Gemma model, to which it replied as follows:

Gemma's Response
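
As a side note, the same prompt can also be passed non-interactively, which is convenient for scripting. A minimal sketch:

ollama run gemma2 "Hi! can you tell me who you are and what can you do for me?"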

So, let’s test all the capabilities of Gemma in the following sections.

How to Create Content Using Gemma?

In this section, the Gemma2 model is asked to create a poem about Tom and Jerry set in the Renaissance era, to test whether the model gets the history right.

Here’s the prompt:

create a short poem about tom and jerry in the renaissance era
Gemma's Creative Poem

The model used a few key points from the Renaissance period to create this poem: Florence is often referred to as the birthplace of the Renaissance, Leonardo da Vinci was a renowned painter of the period, and fresco refers to a painting technique used by Renaissance artists.
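
For a more programmatic route, Ollama also serves a local REST API (at http://localhost:11434 by default), so the same request can be sent without the interactive CLI. A minimal sketch:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma2",
  "prompt": "create a short poem about tom and jerry in the renaissance era",
  "stream": false
}'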

Atomic Habits – A Review by an LLM

Let us ask the model to give a short summary of a book.

Prompt: can you summarize "Atomic Habits" by "James Clear" in exactly 500 words?
Summarization by Gemma

Does Gemma Understand French?

Let us pass a beginner-level French text to the model and ask it to translate it.

Prompt

The model gives the following text as output, but it took a long time for this task. This might be because of the model's large parameter count and the limited computational resources of the device.

If execution takes an unusually long time or the model doesn't work as expected, it is suggested to experiment with a model that has a smaller parameter size – for example, phi3:mini (about 3.8B parameters).
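
For instance, switching to the smaller Phi-3 model is a one-liner (tag name as listed on Ollama's model page):

ollama run phi3:mini   # ~3.8B parameters – much lighter than gemma2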

Response

We can validate the response by passing the same text to Google Translate.

Wrap Up

In this post, we looked at Ollama, a Docker-like tool for running open-source LLMs locally. We discussed the available models and explored installation and setup. Then we leveraged the power of local LLMs by running Gemma to create, summarize, and translate.

Apart from the above tasks, these models can also be helpful as personal assistants, helping us understand code and learn. More on these capabilities in the next post!

Stay tuned and keep learning!

Resources

Ollama Official Page

Ollama GitHub

French Text

Vignya Durvasula