Zen4

Zen4 is a family of open, uncensored AI models built on abliterated weights from frontier open-source architectures: dense models at the small end and MoE models for the flagships.

From the 4B Mini for edge deployment to the 1.04T Ultra for cloud-scale reasoning, Zen4 models run unrestricted with no safety theater.

Model Tiers

Consumer Line

5 models from 4B to 80B: dense models for edge, MoE flagships for desktop

Coder Line

3 coding models from 31B to 355B for agentic programming

Ultra Line

Trillion-parameter MoE models for cloud deployment

Training

Fine-tune Zen4 models with MLX, Unsloth, or DeepSpeed

Quick Start

Use a Zen4 model

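from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights and the matching tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("zenlm/zen4")
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen4")

# Render the conversation with the model's chat template, ending with the assistant prompt.
messages = [{"role": "user", "content": "Hello, who are you?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# generate() returns the prompt followed by the reply, so decode only the new tokens.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))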

Run with MLX (Apple Silicon)

pip install mlx-lm
python -m mlx_lm.generate --model zenlm/zen4-max --prompt "Explain quantum computing"

Run with Ollama

ollama run zen4
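
Once the model is available in Ollama, it can also be queried from code through Ollama's local HTTP API. The following is a minimal sketch, not an official client. It assumes an Ollama server running on the default port 11434 and that the `zen4` tag from the command above has been pulled. The helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "zen4") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "zen4") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(generate("Explain quantum computing"))` prints the model's reply. Setting `"stream": False` asks Ollama for a single JSON object instead of a stream of partial responses, which keeps the parsing to one `json.loads` call.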

Key Features

Abliterated: Safety restrictions removed via orthogonalization of refusal directions
Efficient MoE: Flagship models activate only 3B parameters per token out of 30B to 80B total
Long Context: Up to 256K tokens on MoE models, 32K on dense models
Open Weights: All models available on HuggingFace
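
To make the Efficient MoE figures concrete, the short sketch below works out what share of the weights takes part in each token for the two ends of the quoted range. It uses only the 3B, 30B, and 80B numbers stated above. The function name `active_fraction` is hypothetical, and the result is a rough proxy for per-token compute, not a benchmark.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Return the share of parameters used per token (sizes in billions)."""
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_params_b / total_params_b


# 3B active parameters against the smallest and largest flagship totals.
for total in (30, 80):
    share = active_fraction(3, total)
    print(f"{total}B total: {share:.2%} of weights active per token")
# 30B total: 10.00% of weights active per token
# 80B total: 3.75% of weights active per token
```

Note that the full set of weights still has to be held in memory. Only the per-token computation scales with the active share.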
