Ollama is a tool for running large language models (LLMs) locally on your machine. It lets you download, run, and interact with open-source models without needing cloud APIs.
Use it to run LLMs offline, experiment with different models, or build local AI applications without sending data to external servers.
## Installation

### macOS

```shell
brew install ollama
```

### Linux / Windows (WSL)

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

### Verify Installation

```shell
ollama --version
```

## Getting Started
### Download a Model

```shell
ollama pull llama3      # Llama 3 (Meta)
ollama pull mistral     # Mistral (Mistral AI)
ollama pull codellama   # Code-specific model
ollama pull phi         # Microsoft's Phi model
```

### List Available Models

```shell
ollama list
```

### Run a Model

```shell
ollama run llama3
```

This opens an interactive chat session. Type your message and press Enter to get a response. Type `/bye` to exit.
### Chat via API

You can also use Ollama as a local HTTP API:

```shell
curl -X POST http://localhost:11434/api/chat -d '{
  "model": "llama3",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'
```

## Common Commands
| Command | Description |
|---|---|
| `ollama pull <model>` | Download a model |
| `ollama list` | List downloaded models |
| `ollama run <model>` | Start an interactive session |
| `ollama stop <model>` | Stop a running model |
| `ollama rm <model>` | Remove a model |
| `ollama cp <source> <target>` | Copy/rename a model |
## Configuration

Ollama stores models in `~/.ollama/models`. You can change this by setting the `OLLAMA_MODELS` environment variable.
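For example, to store models on a larger disk (the path below is illustrative), export the variable before starting the server:

```shell
# Relocate the model store (path is illustrative).
# Set this before starting `ollama serve` so the server
# and CLI pick up the new location.
export OLLAMA_MODELS=/mnt/data/ollama-models
```

Models already downloaded to the old location are not moved automatically; pull them again or copy the directory contents.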
## GPU Acceleration

Ollama automatically uses GPU acceleration when available (CUDA on NVIDIA GPUs, Metal on Apple Silicon). For large models, make sure you have enough VRAM.
## Custom Prompts

You can customize the system prompt by creating a Modelfile:
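A minimal Modelfile might look like this (the base model, prompt text, and parameter value are illustrative):

```
FROM llama3
SYSTEM """You are a concise assistant. Answer in plain English."""
PARAMETER temperature 0.7
```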
```shell
ollama create mychat -f ./Modelfile
```

## Use Cases
- Local AI assistant: chat without internet
- Coding help: use `codellama` for code-related questions
- Experimentation: test different models locally
- Privacy-sensitive tasks: keep data on your machine
- Testing LLM integrations: run a local API for development
## Integration with VS Code and Other Tools

Ollama can serve as a backend for extensions and tools that support local LLM APIs. For example:

- Some VS Code extensions support connecting to a local Ollama instance.
- You can point tools like OpenCode or other AI assistants at `http://localhost:11434`.
## Troubleshooting

### Model Won't Load

- Check available RAM/VRAM: `ollama list` shows each model's size on disk.
- Reduce the context size: in an interactive session, run `/set parameter num_ctx 2048`, or set `PARAMETER num_ctx 2048` in a Modelfile.
### Slow Performance

- Ensure GPU acceleration is enabled.
- Close other applications to free RAM.
- Try a smaller model like `phi` or `mistral`.
### API Not Responding

Make sure the Ollama server is running: start it with `ollama serve` if it isn't already. The API listens on `http://localhost:11434`.
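Once the server is up, `/api/chat` streams newline-delimited JSON by default; passing `"stream": false` in the request body returns a single JSON object instead. A rough sketch of pulling the reply text out of such an object (the sample response is illustrative, not real model output; in practice use a JSON parser like `jq`):

```shell
# Illustrative non-streaming /api/chat response body:
response='{"model":"llama3","message":{"role":"assistant","content":"Hello!"},"done":true}'

# Crude extraction of the assistant's reply; `jq -r .message.content`
# is the robust alternative.
echo "$response" | sed 's/.*"content":"\([^"]*\)".*/\1/'
```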
## Resources
- Ollama Website: https://ollama.com
- Ollama GitHub: https://github.com/ollama/ollama
- Model Library: https://ollama.com/library