
LLM Provider Setup Guide

This guide covers how to configure and use the different LLM providers supported by Evidence-Bound.

Quick Start

Set the LLM_PROVIDER environment variable to choose your provider:

# Options: azure_openai (default), anthropic, gemini, ollama
LLM_PROVIDER=azure_openai

Provider Comparison

| Provider | Latency | Cost | Quality | Air-Gap | Best For |
|---|---|---|---|---|---|
| Azure OpenAI | Low | $$$ | Excellent | No | Enterprise with existing Azure |
| Anthropic Claude | Low | $$$ | Excellent | No | Best reasoning, legal analysis |
| Google Gemini | Very Low | $$ | Very Good | No | Cost-effective, high volume |
| Ollama (local) | Medium | Free | Good | Yes | On-prem, data sovereignty |

1. Azure OpenAI (Default)

Best for: Enterprise deployments with Azure infrastructure.

Setup

  1. Create an Azure OpenAI resource in the Azure Portal
  2. Deploy a model (e.g., gpt-5-mini)
  3. Get your endpoint and API key

Configuration

LLM_PROVIDER=azure_openai

# Required
AZURE_OPENAI_CHAT_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_CHAT_API_KEY=your-api-key
MODEL_ID=gpt-5-mini  # Your deployment name

# Optional
AZURE_OPENAI_CHAT_API_VERSION=2024-02-15-preview

Notes

  • MODEL_ID is your deployment name, not the model name
  • Supports GPT-5-mini, GPT-4 Turbo, GPT-3.5 Turbo
  • Enterprise SLA and compliance certifications available

2. Anthropic Claude

Best for: Complex legal reasoning, nuanced analysis.

Setup

  1. Create an account at console.anthropic.com 
  2. Generate an API key
  3. Add billing information

Configuration

LLM_PROVIDER=anthropic

# Required
ANTHROPIC_API_KEY=sk-ant-api03-xxxxx

# Optional (defaults shown)
ANTHROPIC_MODEL=claude-sonnet-4-20250514

Available Models

| Model | Speed | Quality | Cost | Notes |
|---|---|---|---|---|
| claude-sonnet-4-20250514 | Fast | Excellent | $$ | Recommended - best balance |
| claude-opus-4-20250514 | Slower | Best | $$$ | Highest capability |
| claude-3-5-sonnet-20241022 | Fast | Excellent | $$ | Previous generation |
| claude-3-5-haiku-20241022 | Very Fast | Good | $ | Cost-effective |

Notes

  • Excellent at following complex instructions
  • Strong performance on legal document analysis
  • 200K context window on most models

3. Google Gemini

Best for: Cost-effective deployments, fast response times.

Setup

  1. Go to Google AI Studio 
  2. Create an API key
  3. Enable the Generative AI API

Configuration

LLM_PROVIDER=gemini

# Required
GEMINI_API_KEY=your-api-key

# Optional (defaults shown)
GEMINI_MODEL=gemini-2.0-flash

Available Models

| Model | Speed | Quality | Cost | Notes |
|---|---|---|---|---|
| gemini-2.0-flash | Very Fast | Very Good | $ | Recommended - best value |
| gemini-1.5-pro | Fast | Excellent | $$ | Longer context (1M tokens) |
| gemini-1.5-flash | Very Fast | Good | $ | Balance of speed/quality |

Notes

  • Very competitive pricing
  • Fast response times
  • Good for high-volume workloads

4. Ollama (Local / On-Prem)

Best for: Air-gapped environments, data sovereignty, development.

Setup

  1. Install Ollama

    # macOS / Linux
    curl -fsSL https://ollama.ai/install.sh | sh

    # Windows
    # Download from https://ollama.ai/download
  2. Start the Ollama server

    ollama serve
    # Server runs on http://localhost:11434
  3. Pull a model

    # Recommended for most cases (16GB RAM)
    ollama pull llama3.2:8b

    # Alternative models
    ollama pull mistral:7b    # Fast, good reasoning
    ollama pull qwen2.5:7b    # Good for structured tasks
    ollama pull llama3.3:70b  # Best quality (needs 40GB+ VRAM)

Configuration

LLM_PROVIDER=ollama

# Optional (defaults shown)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:8b

| Model | RAM Required | Quality | Speed | Use Case |
|---|---|---|---|---|
| llama3.2:8b | 16GB | Good | Fast | General use - best balance |
| llama3.3:70b | 40GB+ VRAM | Excellent | Slow | Complex legal reasoning |
| mistral:7b | 16GB | Good | Very Fast | Quick queries |
| qwen2.5:7b | 16GB | Good | Fast | Structured extraction |

Remote Ollama Server

To use Ollama on a different machine:

# On the Ollama server, allow external connections
OLLAMA_HOST=0.0.0.0 ollama serve

# In your .env
OLLAMA_BASE_URL=http://ollama-server.internal:11434
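
Before pointing the application at a remote server, it helps to confirm it is reachable from the application host. The sketch below (standard library only) probes Ollama's GET /api/tags endpoint, which lists the models that have been pulled; the function name and fallback behavior are illustrative:

```python
import json
import os
import urllib.error
import urllib.request
from typing import Optional


def ollama_reachable(base_url: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Return True if an Ollama server answers GET /api/tags at base_url."""
    base = (base_url or os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")).rstrip("/")
    try:
        with urllib.request.urlopen(f"{base}/api/tags", timeout=timeout) as resp:
            models = json.load(resp).get("models", [])
            print(f"Ollama is up; {len(models)} model(s) pulled")
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout
        return False
```

A False return usually means the server is not running, the URL is wrong, or a firewall is blocking port 11434.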

GPU Acceleration

Ollama automatically uses GPU if available:

  • NVIDIA: Install CUDA drivers
  • Apple Silicon: Metal acceleration automatic
  • AMD: ROCm support (Linux only)

Check GPU usage:

ollama ps
# Shows running models and memory usage

Notes

  • No API costs - runs entirely locally
  • Data never leaves your network
  • Longer response times than cloud providers
  • Quality varies by model size
  • First request may be slow (model loading)

Switching Providers

Switching providers requires only configuration changes:

# Development: Use Ollama (free, local)
LLM_PROVIDER=ollama
OLLAMA_MODEL=llama3.2:8b

# Production: Use Azure OpenAI (enterprise SLA)
LLM_PROVIDER=azure_openai
AZURE_OPENAI_CHAT_ENDPOINT=https://...
AZURE_OPENAI_CHAT_API_KEY=...
MODEL_ID=gpt-5-mini

No code changes required. The get_llm_client() factory function returns the appropriate client based on configuration.
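
The factory pattern can be sketched as follows. This is a minimal illustration only: the real implementation lives in app.llm, and the LLMClient stand-in and _DEFAULT_MODELS table here are hypothetical; only the LLM_PROVIDER values and default model names come from this guide.

```python
import os
from dataclasses import dataclass


@dataclass
class LLMClient:
    """Minimal stand-in for a provider client (illustrative only)."""
    provider: str
    model: str


# Hypothetical per-provider defaults, mirroring the env vars in this guide.
_DEFAULT_MODELS = {
    "azure_openai": os.getenv("MODEL_ID", "gpt-5-mini"),
    "anthropic": os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),
    "gemini": os.getenv("GEMINI_MODEL", "gemini-2.0-flash"),
    "ollama": os.getenv("OLLAMA_MODEL", "llama3.2:8b"),
}


def get_llm_client() -> LLMClient:
    """Select a client from LLM_PROVIDER; switching needs no code changes."""
    provider = os.getenv("LLM_PROVIDER", "azure_openai")
    if provider not in _DEFAULT_MODELS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    return LLMClient(provider=provider, model=_DEFAULT_MODELS[provider])
```

Because the provider is resolved at one choke point, the rest of the application only ever sees the LLMClient interface.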


Testing Your Configuration

Verify Provider Works

# Run the LLM provider tests
pytest tests/test_llm_providers.py -v

# Test a specific provider (requires credentials)
python -c "
from app.llm import get_llm_client
client = get_llm_client()
print(f'Provider: {client.provider}')
print(f'Model: {client.model}')
"

Test with a Simple Query

# Start the API server
cd apps/api && uvicorn app.main:app --reload

# Make a test request
curl -X POST http://localhost:8000/v1/ask \
  -H "Content-Type: application/json" \
  -H "X-Tenant-Id: test-tenant" \
  -H "X-Matter-Id: test-matter" \
  -d '{"question": "What is 2+2?"}'
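
The same smoke test can be driven from Python with only the standard library. The endpoint, headers, and payload follow the curl example above; the build_ask_request helper name is an assumption for illustration:

```python
import json
import urllib.request


def build_ask_request(base_url: str, question: str,
                      tenant_id: str, matter_id: str) -> urllib.request.Request:
    """Build the POST /v1/ask request shown in the curl example."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/ask",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Tenant-Id": tenant_id,
            "X-Matter-Id": matter_id,
        },
    )


req = build_ask_request("http://localhost:8000", "What is 2+2?",
                        "test-tenant", "test-matter")
# Send with: urllib.request.urlopen(req)
```

Separating request construction from sending makes the tenant/matter headers easy to verify before any network call.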

Troubleshooting

Azure OpenAI

| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid API key | Check AZURE_OPENAI_CHAT_API_KEY |
| 404 Not Found | Wrong endpoint or deployment | Verify AZURE_OPENAI_CHAT_ENDPOINT and MODEL_ID |
| 429 Too Many Requests | Rate limit exceeded | Implement backoff or upgrade quota |

Anthropic

| Error | Cause | Fix |
|---|---|---|
| 401 Invalid API key | Bad or expired key | Regenerate key at console.anthropic.com |
| 429 Rate limit | Too many requests | Add delay between requests |
| 400 Bad request | Invalid parameters | Check model name and parameters |

Gemini

| Error | Cause | Fix |
|---|---|---|
| 403 API key invalid | Bad key or API not enabled | Enable Generative AI API in Google Cloud |
| 429 Rate limit | Quota exceeded | Check quota at Google Cloud Console |
| 400 Bad request | Invalid model or parameters | Verify model name |

Ollama

| Error | Cause | Fix |
|---|---|---|
| Connection refused | Ollama not running | Run ollama serve |
| Model not found | Model not pulled | Run ollama pull <model> |
| Timeout | Model too large / slow hardware | Use smaller model or increase timeout |
| Out of memory | Insufficient RAM/VRAM | Use smaller model or add memory |

Security Considerations

API Key Management

  • Never commit API keys to version control
  • Use environment variables or secrets managers
  • Rotate keys periodically
  • Use separate keys for dev/staging/production
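
In application code, the practices above boil down to reading keys from the environment and failing fast when one is missing. A minimal sketch (the require_env helper is hypothetical, not part of the project's API):

```python
import os


def require_env(name: str) -> str:
    """Read a secret from the environment, failing fast if it is missing."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Export it or load it from a secrets manager; "
            "never hard-code it in source."
        )
    return value


# Example: only the key for the active provider needs to exist.
# api_key = require_env("ANTHROPIC_API_KEY")
```

Failing at startup with a named variable is far easier to debug than a 401 deep inside a request handler.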

Ollama Security

  • Ollama has no built-in authentication
  • Don’t expose Ollama to public internet
  • Use firewall rules to restrict access
  • Consider VPN for remote access

Error Message Sanitization

All providers sanitize error messages to prevent API key leakage:

# Gemini: API keys in error responses are redacted
"Gemini HTTP 400: [REDACTED]..."

# Ollama: Internal URLs are not exposed
"Cannot connect to Ollama server. Is Ollama running?"
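
The exact redaction logic is provider-specific, but the general technique is regex substitution over the error text before it is logged or returned. The patterns below are illustrative assumptions, not the project's actual rules:

```python
import re

# Patterns that commonly carry credentials in error payloads (illustrative).
_SECRET_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9\-_]+"),        # Anthropic-style keys
    re.compile(r"(?i)(api[_-]?key=)[^&\s\"']+"),  # key=... query parameters
]


def sanitize_error(message: str) -> str:
    """Replace anything that looks like an API key with [REDACTED]."""
    for pattern in _SECRET_PATTERNS:
        message = pattern.sub(
            # Keep the "api_key=" prefix (group 1) when present, drop the value.
            lambda m: (m.group(1) if m.lastindex else "") + "[REDACTED]",
            message,
        )
    return message
```

Running every outbound error message through one sanitizer keeps redaction consistent across providers.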

Environment Variable Reference

# === Provider Selection ===
LLM_PROVIDER=azure_openai  # azure_openai | anthropic | gemini | ollama

# === Azure OpenAI ===
AZURE_OPENAI_CHAT_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_CHAT_API_KEY=your-key
AZURE_OPENAI_CHAT_API_VERSION=2024-02-15-preview
MODEL_ID=gpt-5-mini

# === Anthropic ===
ANTHROPIC_API_KEY=sk-ant-xxx
ANTHROPIC_MODEL=claude-sonnet-4-20250514

# === Gemini ===
GEMINI_API_KEY=your-key
GEMINI_MODEL=gemini-2.0-flash

# === Ollama ===
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:8b
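
A startup check can validate that the variables required by the selected provider are actually set. The required/optional split below follows the reference above; the missing_vars function itself is a hypothetical sketch:

```python
# Required variables per provider, per the reference above.
# Ollama needs none: every variable has a sensible default.
REQUIRED_VARS = {
    "azure_openai": [
        "AZURE_OPENAI_CHAT_ENDPOINT",
        "AZURE_OPENAI_CHAT_API_KEY",
        "MODEL_ID",
    ],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "ollama": [],
}


def missing_vars(env: dict) -> list:
    """Return the required variables absent for the selected provider."""
    provider = env.get("LLM_PROVIDER", "azure_openai")
    return [v for v in REQUIRED_VARS.get(provider, []) if not env.get(v)]
```

Call it with os.environ at startup and refuse to boot if the returned list is non-empty.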