Code
Specialized code generation and analysis models powered by Zen MoDE with extended context for repository-scale understanding.
The Zen Code model family covers software engineering tasks ranging from repo-scale code understanding and agentic refactoring to SQL query generation and database schema design. All models are built on the Zen architecture, with context windows of up to 256K tokens for handling large codebases.
Models
| Model | Params | Context | HF | Paper |
|---|---|---|---|---|
| Zen 5 Coder | — | 256K | weights | paper |
| Zen 5 Coder (GGUF) | 79.7B (MoE) | — | weights | paper |
| Zen SQL | 8B | 32K | weights | paper |
Quick start
Zen 5 Coder: Repository-scale code understanding
Zen 5 Coder is an 80B sparse-MoE model optimized for repo-scale understanding, agentic refactoring, and multi-step code generation tasks. The example below loads the weights with transformers; the model can also run locally via llama.cpp (GGUF) or through the Hanzo API.
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load Zen 5 Coder via transformers
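# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" (needs accelerate) places weights on available devices
model = AutoModelForCausalLM.from_pretrained(
    "zenlm/zen-5-coder", torch_dtype="auto", device_map="auto"
)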
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-5-coder")
# Prepare prompt with chat template
messages = [
    {"role": "user", "content": "Refactor this Python function to use async/await."}
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
# Generate code
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
Zen SQL: SQL query generation and database tasks
Zen SQL is an 8B model specialized for complex query generation, schema design, optimization, and documentation. It supports PostgreSQL, MySQL, SQLite, BigQuery, Snowflake, and more.
from transformers import AutoModelForCausalLM, AutoTokenizer
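# torch_dtype="auto" keeps the checkpoint's dtype; device_map="auto" (needs accelerate) places weights on available devices
model = AutoModelForCausalLM.from_pretrained(
    "zenlm/zen-sql", torch_dtype="auto", device_map="auto"
)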
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-sql")
messages = [
    {"role": "user", "content": "Generate a PostgreSQL query to find users with more than 10 orders in the last 30 days."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
Using the Hanzo API
All Zen Code models are available via the OpenAI-compatible endpoint at api.hanzo.ai. This is the recommended path for production deployments.
from openai import OpenAI
client = OpenAI(api_key="your-api-key", base_url="https://api.hanzo.ai/v1")
response = client.chat.completions.create(
    model="zen5-coder",
    messages=[
        {"role": "user", "content": "Write a TypeScript function that validates email addresses."}
    ],
    max_tokens=2048,
)
print(response.choices[0].message.content)
Local inference with llama.cpp
For the GGUF-quantized versions, use llama.cpp for CPU or GPU inference:
# Download GGUF (example with Q4_K_M quantization)
# Zen 5 Coder GGUF requires 48+ GB VRAM for Q4_K_M
llama-cli -m zen-5-coder-q4_k_m.gguf -p "Explain this code snippet" -n 2048
Notes
- Zen 5 Coder is designed for repo-scale code understanding with 256K context, ideal for codebases, PRs, and multi-file refactoring tasks.
- Zen SQL specializes in database work, generating production-ready queries across multiple SQL dialects.
- All models are Apache 2.0 licensed and support commercial use.
- For optimal performance with large contexts or codebases, use the Hanzo API or local inference on GPU-enabled hardware.
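The notes above mention using the 256K context for whole codebases. One way to prepare such a prompt is to concatenate source files under a size budget, as in the minimal sketch below. The helper name `pack_repo`, the `### File:` header format, the 4-characters-per-token estimate, and the reserved-token margin are all illustrative assumptions, not part of the Zen tooling; for an exact count, measure with the model's tokenizer instead.

```python
from pathlib import Path

CONTEXT_TOKENS = 256_000   # context size listed for Zen 5 Coder
CHARS_PER_TOKEN = 4        # assumption: rough heuristic, not the real tokenizer ratio
RESERVED_TOKENS = 8_000    # assumption: room left for the question and the reply
DEFAULT_BUDGET = (CONTEXT_TOKENS - RESERVED_TOKENS) * CHARS_PER_TOKEN


def pack_repo(root, extensions=(".py",), max_chars=DEFAULT_BUDGET):
    """Concatenate source files under `root` into one prompt string.

    Files are visited in sorted path order. A file that would push the
    total past `max_chars` is skipped and reported in the second return value.
    """
    root = Path(root)
    parts, skipped, used = [], [], 0
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in extensions:
            continue
        rel = path.relative_to(root).as_posix()
        body = path.read_text(encoding="utf-8", errors="replace")
        # Hypothetical header format so the model can tell files apart
        block = f"### File: {rel}\n{body}\n"
        if used + len(block) > max_chars:
            skipped.append(rel)
            continue
        parts.append(block)
        used += len(block)
    return "".join(parts), skipped
```

The returned string can be placed in the `content` of a user message, followed by the actual question about the codebase; the list of skipped files shows what was left out of the prompt.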