Ollama is a tool for running large language models (LLMs) locally on your machine. It lets you download, run, and interact with open-source models without needing cloud APIs.

Ollama is a good fit for running LLMs offline, experimenting with different models, or building local AI applications without sending data to external servers.

Installation

macOS

brew install ollama

Linux / Windows (WSL)

curl -fsSL https://ollama.com/install.sh | sh

Verify Installation

ollama --version

Getting Started

Download a Model

ollama pull llama3          # Llama 3 (Meta)
ollama pull mistral         # Mistral (Mistral AI)
ollama pull codellama       # Code-specific model
ollama pull phi             # Microsoft's Phi model

List Available Models

ollama list

Run a Model

ollama run llama3

This opens an interactive chat session. Type your message and press Enter to get a response. Type /bye to exit.

Chat via API

You can also use Ollama as a local API:

curl -X POST http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'
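The same endpoint can be called from code. A minimal Python sketch using only the standard library (the helper names here are illustrative; it assumes a local Ollama server on the default port, and sets "stream": false to get one complete JSON reply instead of the default token stream):

```python
import json
import urllib.request

def build_chat_request(prompt, model="llama3"):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete JSON reply instead of a token stream
    }

def chat(prompt, model="llama3", host="http://localhost:11434"):
    """POST a chat request to a local Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(build_chat_request(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Without "stream": false, the endpoint returns a sequence of newline-delimited JSON objects, one per generated token chunk.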

Common Commands

Command                        Description
ollama pull <model>            Download a model
ollama list                    List downloaded models
ollama run <model>             Start an interactive session
ollama stop <model>            Stop a running model
ollama rm <model>              Remove a model
ollama cp <source> <target>    Copy/rename a model

Configuration

Ollama stores models in ~/.ollama/models. You can change this by setting the OLLAMA_MODELS environment variable.
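For example, to keep models on a larger disk (the path here is illustrative):

```shell
# Store models on a larger disk (path is illustrative).
# Restart the Ollama server afterwards so it picks up the change.
export OLLAMA_MODELS=/mnt/data/ollama-models
```

The variable must be set in the environment of the server process, so export it before starting ollama serve (or add it to the service's environment).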

GPU Acceleration

Ollama automatically uses GPU acceleration if available (CUDA on NVIDIA, Metal on Apple Silicon). For large models, ensure you have enough VRAM.

Custom Prompts

You can customize the system prompt by creating a Modelfile:

ollama create mychat -f ./Modelfile
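The referenced Modelfile could look like this minimal sketch (the base model, system prompt, and parameter value are illustrative):

```
FROM llama3
SYSTEM "You are a concise assistant that answers in plain English."
PARAMETER temperature 0.7
```

After ollama create, the customized model runs like any other: ollama run mychat.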

Use Cases

  • Local AI assistant — Chat without internet
  • Coding help — Use codellama for code-related questions
  • Experimentation — Test different models locally
  • Privacy-sensitive tasks — Keep data on your machine
  • Testing LLM integrations — Run a local API for development

Integration with VS Code and Other Tools

Ollama can be used as a backend for various extensions and tools that support local LLM APIs. For example:

  • Some VS Code extensions support connecting to a local Ollama instance.
  • You can use Ollama with tools like OpenCode or other AI assistants by pointing them to http://localhost:11434.

Troubleshooting

Model Won’t Load

  • Check available RAM/VRAM: ollama list shows each model's size on disk.
  • Reduce the context size: in an interactive session, run /set parameter num_ctx 2048, or add PARAMETER num_ctx 2048 to a Modelfile.

Slow Performance

  • Ensure GPU acceleration is enabled.
  • Close other applications to free RAM.
  • Try a smaller model like phi or mistral.

API Not Responding

Ensure the Ollama server is running; start it with ollama serve if it is not. The API listens on http://localhost:11434.

Resources