Real-time and generative safety moderation across 119 languages with multi-tiered severity classification.

Safety

Zen Safety models provide content moderation and safety classification for AI systems. Built on the Zen architecture, the family covers three capabilities: streaming moderation that checks output token by token as it is generated, generative classification that analyzes complete prompts and responses, and multilingual coverage across 119 languages.

Models

Model	Params	Context	Type	HF	Paper
Zen Guard	3.09B	—	Classification	weights	paper
Zen Guard Gen	7.62B	32K	Classification	weights	paper
Zen Guard Gen 8B	7.62B	—	Generation	weights	paper
Zen Guard Stream	3B	—	Classification	weights	paper
Zen Guard Stream 4B	3.09B	—	Generation	weights	paper
Zen3 Guard	5.78B	—	Classification	weights	paper

Quick Start

Prompt and Response Moderation

Use Zen Guard for real-time safety classification of LLM inputs and outputs:

from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "zenlm/zen-guard"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

def classify_safety(content):
    """Extract safety classification and categories from model output."""
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive|Copyright Violation|Jailbreak|None)"
    safe_match = re.search(safe_pattern, content)
    label = safe_match.group(1) if safe_match else None
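    # Drop the "None" placeholder so that safe content yields an empty list
    categories = [c for c in re.findall(category_pattern, content) if c != "None"]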
    return label, categories

# Moderate a user prompt
prompt = "How can I learn about cybersecurity?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
result = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
label, categories = classify_safety(result)

print(f"Safety: {label}")
print(f"Categories: {categories}")
# Output:
# Safety: Safe
# Categories: []
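
The same flow can be applied to a model response by appending the assistant turn before applying the chat template. The sketch below assumes Zen Guard accepts a user/assistant pair in this form; `build_messages` and `moderate` are illustrative helper names, not part of any Zen library, and `moderate` expects the `tokenizer` and `model` objects loaded above.

```python
def build_messages(prompt, response=None):
    """Build the chat turns to moderate: the prompt alone, or prompt plus response."""
    messages = [{"role": "user", "content": prompt}]
    if response is not None:
        # Assumption: the guard model judges the last assistant turn in context.
        messages.append({"role": "assistant", "content": response})
    return messages


def moderate(tokenizer, model, messages, max_new_tokens=128):
    """Run the guard model on the given turns and return its raw text verdict."""
    text = tokenizer.apply_chat_template(messages, tokenize=False)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, as in the example above
    new_tokens = outputs[0][len(inputs.input_ids[0]):]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


turns = build_messages(
    "How can I learn about cybersecurity?",
    "Start with networking basics, then practice on legal capture-the-flag labs.",
)
print([turn["role"] for turn in turns])
```

Passing the text returned by `moderate(tokenizer, model, turns)` to `classify_safety` yields the label and categories in the same way as for a prompt.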

Model Selection

Zen Guard (3B): Fast, low-latency safety classification for streaming scenarios
Zen Guard Gen (8B): Highest-accuracy generative moderation for batch processing
Zen Guard Stream (3B): Token-level real-time monitoring with 5ms latency per token
Zen3 Guard (6B): Multilingual variant for 119-language deployments
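
Token-level monitoring, as described for Zen Guard Stream, generally means checking each token as it arrives and halting generation once the verdict turns unsafe. The sketch below shows only that control flow: `score_token` is a hypothetical stand-in for a call into the streaming model (its real interface is not described here), and the keyword-based `toy_scorer` exists only so the example runs.

```python
from typing import Callable, Iterable, List, Tuple


def stream_with_guard(
    tokens: Iterable[str],
    score_token: Callable[[List[str]], str],
) -> Tuple[List[str], str]:
    """Release tokens until the scorer says "Unsafe"; return released tokens and last label."""
    emitted: List[str] = []
    label = "Safe"
    for token in tokens:
        # The scorer sees everything released so far plus the candidate token.
        label = score_token(emitted + [token])
        if label == "Unsafe":
            break  # stop before the offending token is released
        emitted.append(token)
    return emitted, label


def toy_scorer(prefix: List[str]) -> str:
    """Placeholder scorer (not a real model): flags any prefix containing 'exploit'."""
    return "Unsafe" if "exploit" in prefix else "Safe"


emitted, label = stream_with_guard(["write", "an", "exploit", "now"], toy_scorer)
print(emitted, label)
```

Scoring the candidate token before releasing it means the user never sees the token that triggered the stop, at the cost of one scoring call per token.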

Using the Zen API

For production deployments, use the OpenAI-compatible Zen API endpoint at https://api.hanzo.ai/v1/:

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen-guard",
    "messages": [
      {"role": "user", "content": "Is this prompt safe?"}
    ]
  }'

Because the endpoint is OpenAI-compatible, standard client libraries such as the OpenAI Python SDK and LangChain work with Zen Safety models once they are pointed at the Zen API base URL.
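
As a minimal sketch of a client call without any SDK, the request from the curl example can be built with Python's standard library. The `ZEN_API_KEY` environment variable and the bearer `Authorization` header are assumptions (the curl example sends no credentials), and the response is assumed to follow the OpenAI chat completions shape.

```python
import json
import os
import urllib.request

API_URL = "https://api.hanzo.ai/v1/chat/completions"


def build_request(content, model="zen-guard"):
    """Build (but do not send) a chat completions request for moderation."""
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get("ZEN_API_KEY")  # assumed variable name
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"  # assumed auth scheme
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def moderate_remote(content, timeout=30):
    """Send the request and return the model's text verdict."""
    with urllib.request.urlopen(build_request(content), timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes an OpenAI-style response: choices[0].message.content
    return body["choices"][0]["message"]["content"]


request = build_request("Is this prompt safe?")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.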

Safety Categories

All Zen Safety models classify content across 9 primary categories:

Violent — Violence instructions, methods, or depictions
Non-violent Illegal Acts — Hacking, unauthorized activities
Sexual Content — Sexual imagery or descriptions
PII — Personally identifiable information disclosure
Suicide & Self-Harm — Self-harm encouragement or methods
Unethical Acts — Bias, discrimination, hate speech
Politically Sensitive — False political information
Copyright Violation — Unauthorized copyrighted material
Jailbreak — System prompt override attempts

Severity Levels

Each classification is assigned one of three severity levels:

Safe — Content poses no safety concern
Controversial — Content may offend or be sensitive but is not dangerous
Unsafe — Content violates safety policies
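
One way to act on these labels is a small policy function that turns a classification into an application-level decision. The mapping below is an illustrative policy, not one prescribed by Zen Safety: the action names (`allow`, `review`, `block`) and the choice to treat some categories as always blocking are assumptions to adapt.

```python
# Categories that this illustrative policy blocks even when only "Controversial".
STRICT_CATEGORIES = {"Suicide & Self-Harm", "Jailbreak"}


def decide(label, categories):
    """Map a (label, categories) classification to allow / review / block."""
    if label == "Unsafe":
        return "block"
    if label == "Controversial":
        if STRICT_CATEGORIES.intersection(categories):
            return "block"
        return "review"
    if label == "Safe":
        return "allow"
    # No label could be parsed: fail closed rather than letting content through.
    return "review"


print(decide("Safe", []))
print(decide("Controversial", ["Politically Sensitive"]))
print(decide("Controversial", ["Jailbreak"]))
print(decide(None, []))
```

Routing an unparsed label to review rather than allow keeps a malformed model output from silently bypassing moderation.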

Language Support

All Zen Safety models support 119 languages and dialects, including English, Chinese, and Spanish.

Performance

Model	Accuracy	Latency	VRAM (FP16)
Zen Guard	96.8%	120ms	8GB
Zen Guard Gen	97.5%	120ms	16GB
Zen Guard Stream	95.2%	5ms	8GB
Zen3 Guard	96.1%	100ms	12GB
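
The table can be read as a set of constraints when choosing a model. As a rough sketch, the helper below copies the figures above into a list and returns the most accurate model that fits a latency and VRAM budget; `pick_model` is an illustrative name, and the numbers are taken from the table as published, not measured independently.

```python
# (name, accuracy %, latency ms, FP16 VRAM GB), copied from the table above
MODELS = [
    ("Zen Guard", 96.8, 120, 8),
    ("Zen Guard Gen", 97.5, 120, 16),
    ("Zen Guard Stream", 95.2, 5, 8),
    ("Zen3 Guard", 96.1, 100, 12),
]


def pick_model(max_latency_ms, max_vram_gb):
    """Return the most accurate model within both budgets, or None if none fits."""
    candidates = [
        m for m in MODELS if m[2] <= max_latency_ms and m[3] <= max_vram_gb
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m[1])[0]


print(pick_model(max_latency_ms=150, max_vram_gb=16))
print(pick_model(max_latency_ms=150, max_vram_gb=8))
print(pick_model(max_latency_ms=50, max_vram_gb=8))
```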

Deployment

Deploy using SGLang or vLLM for production:

# Using SGLang
python -m sglang.launch_server --model-path zenlm/zen-guard --port 30000

# Using vLLM
vllm serve zenlm/zen-guard --port 8000 --max-model-len 32768
