Ollama is a tool for running large language models (LLMs) locally on your machine. It lets you download, run, and interact with open-source models without needing cloud APIs.

Ollama is a good fit for running LLMs offline, experimenting with different models, or building local AI applications without sending data to external servers.

Installation

macOS

brew install ollama

Linux / Windows (WSL)

curl -fsSL https://ollama.com/install.sh | sh

Verify Installation

ollama --version

Getting Started

Download a Model

ollama pull llama3          # Llama 3 (Meta)
ollama pull mistral         # Mistral (Mistral AI)
ollama pull codellama       # Code-specific model
ollama pull phi             # Microsoft's Phi model

List Available Models

ollama list

Run a Model

ollama run llama3

This opens an interactive chat session. Type your message and press Enter to get a response. Type /bye to exit.

Chat via API

You can also use Ollama as a local API:

curl -X POST http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'
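The same endpoint can be called from code. A minimal Python sketch using only the standard library (the helper names here are illustrative; it assumes a local Ollama server on the default port, and sets "stream": false to get one complete JSON reply instead of the default token stream):

```python
import json
import urllib.request

def build_chat_request(prompt, model="llama3"):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete JSON reply instead of a token stream
    }

def chat(prompt, model="llama3", host="http://localhost:11434"):
    """POST a chat request to a local Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Without "stream": false, the endpoint returns a sequence of newline-delimited JSON objects, one per generated token chunk.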

Common Commands

Command                        Description
ollama pull <model>            Download a model
ollama list                    List downloaded models
ollama run <model>             Start an interactive session
ollama stop <model>            Stop a running model
ollama rm <model>              Remove a model
ollama cp <source> <target>    Copy/rename a model

Configuration

Ollama stores models in ~/.ollama/models. You can change this by setting the OLLAMA_MODELS environment variable.
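For example, to keep models on a larger disk (the path here is illustrative):

```shell
# Store models on a larger disk (path is illustrative).
# Restart the Ollama server afterwards so it picks up the change.
export OLLAMA_MODELS=/mnt/data/ollama-models
```

The variable must be set in the environment of the server process, so export it before starting ollama serve (or add it to the service's environment).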

GPU Acceleration

Ollama automatically uses GPU acceleration if available (CUDA on NVIDIA, Metal on Apple Silicon). For large models, ensure you have enough VRAM.

Custom Prompts

You can customize the system prompt by creating a Modelfile:

ollama create mychat -f ./Modelfile
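The referenced Modelfile could look like this minimal sketch (the base model, system prompt, and parameter value are illustrative):

```
FROM llama3
SYSTEM "You are a concise assistant that answers in plain English."
PARAMETER temperature 0.7
```

After ollama create, the customized model runs like any other: ollama run mychat.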

Use Cases

  • Local AI assistant — Chat without internet
  • Coding help — Use codellama for code-related questions
  • Experimentation — Test different models locally
  • Privacy-sensitive tasks — Keep data on your machine
  • Testing LLM integrations — Run a local API for development

Integration with VS Code and Other Tools

Ollama can be used as a backend for various extensions and tools that support local LLM APIs. For example:

  • Some VS Code extensions support connecting to a local Ollama instance.
  • You can use Ollama with tools like OpenCode or other AI assistants by pointing them to http://localhost:11434.

Troubleshooting

Model Won’t Load

  • Check available RAM/VRAM: ollama list shows each model's size on disk.
  • Reduce the context size: in an interactive session, run /set parameter num_ctx 2048, or add PARAMETER num_ctx 2048 to a Modelfile.

Slow Performance

  • Ensure GPU acceleration is enabled.
  • Close other applications to free RAM.
  • Try a smaller model like phi or mistral.

API Not Responding

Ensure the Ollama server is running; start it with ollama serve if it is not. The API listens on http://localhost:11434.

Resources