Agents

Zen agent models optimized for agentic reasoning, tool use, and multi-step task execution.

The Zen agent model family is purpose-built for agentic workflows, function calling, and multi-step reasoning. The models are trained with reinforcement learning in real-world environments to excel at tool use, code generation, and complex task execution.

Model Family

Model	Parameters	Context	Hugging Face	Paper
Zen 5 Mini	MoE (~10B active)	—	weights	paper
Zen Agent 4B	4.02B	32K	weights	paper
Zen Eco 4B Agent (GGUF)	4.02B	32K	weights	paper
Zen Eco 4B Agent (MLX)	4.02B	32K	weights	paper

Quick Start

Below is a simple example using the Zen Agent 4B model with the Transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; dtype and device placement are chosen automatically.
model_id = "zenlm/zen-agent-4b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

# Render the conversation with the model's chat template.
messages = [{"role": "user", "content": "What tools do you have access to?"}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs.input_ids.shape[1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
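The Quick Start prompt asks about tools, but no tools are passed to the model. The sketch below shows one common way to describe a tool and run a call the model requests. The `get_weather` tool, the `TOOLS` and `REGISTRY` names, and the `{"name": ..., "arguments": {...}}` call format are all assumptions made for illustration; the source does not specify the format Zen models emit.

```python
import json

# Hypothetical tool: the name and behavior are illustrative only.
def get_weather(city: str) -> str:
    """Return a canned weather report for a city."""
    return f"Sunny in {city}"

# Tool description in the OpenAI function-calling style (JSON Schema).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> dict:
    """Run one tool call given as JSON {"name": ..., "arguments": {...}}.

    Returns a message with role "tool" that can be appended to the
    conversation before asking the model to continue.
    """
    call = json.loads(tool_call_json)
    func = REGISTRY[call["name"]]  # raises KeyError for unknown tools
    result = func(**call["arguments"])
    return {"role": "tool", "name": call["name"], "content": result}
```

With Transformers, a list like `TOOLS` can be passed through the `tools` argument of `apply_chat_template`. Whether the Zen chat template renders tool definitions is an assumption to check against the model card.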

Zen API

For production deployments and higher throughput, use the Zen API endpoint at api.hanzo.ai. The API provides OpenAI-compatible endpoints and automatic scaling:

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen-agent-4b",
    "messages": [{"role": "user", "content": "Execute this task..."}]
  }'

Get a free API key with $5 credit at console.hanzo.ai.
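The same request can be sent from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_request` and `chat` are hypothetical, and the response parsing assumes the OpenAI chat-completions response shape, which the source implies but does not show.

```python
import json
import os
import urllib.request

API_URL = "https://api.hanzo.ai/v1/chat/completions"

def build_request(prompt: str, api_key: str,
                  model: str = "zen-agent-4b") -> urllib.request.Request:
    """Build the POST request that mirrors the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def chat(prompt: str) -> str:
    """Send the prompt and return the assistant's reply text."""
    req = build_request(prompt, os.environ["HANZO_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    # Assumption: OpenAI-style response with a "choices" list.
    return data["choices"][0]["message"]["content"]

# Example (requires HANZO_API_KEY and network access):
# print(chat("Execute this task..."))
```

Keeping request construction separate from the network call makes the payload easy to inspect or test without spending API credit.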

Model Variants

Zen 5 Mini is the frontier agentic tier. It uses a sparse mixture-of-experts (MoE) architecture with ~10B active parameters and was trained with large-scale reinforcement learning in real-world environments.

Zen Agent 4B and the Zen Eco variants are compact 4B-parameter models optimized for tool calling and multi-step reasoning, with a 32K-token context window. The GGUF variant targets llama.cpp and CPU inference, while the MLX variant is tuned for Apple Silicon hardware.
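One way to read the guidance above is as a simple selection rule based on the host machine. The sketch below is an interpretation of this section, not an official recommendation; the function name `pick_variant` and the `has_gpu` flag are hypothetical.

```python
import platform

def pick_variant(system: str, machine: str, has_gpu: bool) -> str:
    """Suggest a 4B variant for a host, following the descriptions above."""
    # Apple Silicon reports system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "Zen Eco 4B Agent (MLX)"
    # Without a GPU, the llama.cpp-oriented GGUF build is the CPU option.
    if not has_gpu:
        return "Zen Eco 4B Agent (GGUF)"
    # Otherwise use the standard weights, e.g. with Transformers.
    return "Zen Agent 4B"

# Example for the current machine, assuming no GPU is available:
# print(pick_variant(platform.system(), platform.machine(), has_gpu=False))
```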
