World Models
Generative world models for interactive video scene synthesis with camera control.
Overview
Zen World Models are diffusion-based generative models that synthesize coherent, interactive video scenes from text prompts and camera trajectories. Built on the Zen Mixture-of-Distilled-Experts (MoDE) architecture, they let users explore virtual environments by controlling the camera, which makes them well suited to robotics simulation, game development, and interactive media generation.
Available Models
Quick Start
Using Diffusers (Local)
Install the diffusers library (for example with pip install -U diffusers transformers accelerate) and load a model:
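from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch
model_id = "zenlm/zen-world"
# DiffusionPipeline resolves the concrete pipeline class from the repo's model_index.json
pipe = DiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")
# If GPU memory is tight, call pipe.enable_model_cpu_offload() instead of .to("cuda")
# Generate video frames from a text prompt
video_frames = pipe(
    "A drone flying over a tropical coastline at golden hour"
).frames[0]
# Save the frames as an MP4 file (fps=24 is an illustrative value)
export_to_video(video_frames, "coastline.mp4", fps=24)
Via Zen API (Recommended)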
To use the models without a local GPU, call the OpenAI-compatible Zen API endpoint at https://api.hanzo.ai/v1:
from openai import OpenAI
# Point the standard OpenAI client at the Zen API endpoint
client = OpenAI(
    base_url="https://api.hanzo.ai/v1",
    api_key="your-api-key",  # replace with your Zen API key
)
response = client.images.generate(
    model="zen-world",
    prompt="A drone flying over a tropical coastline at golden hour",
    size="1280x720",
)
print(response.data[0].url)
Model Details
Zen World is a 13B-parameter generative world model that renders coherent video scenes from text descriptions. It targets continuous scene generation and, as the smaller of the two models, suits tasks that require rapid iteration on environment design.
Zen Voyager is a 32.8B-parameter camera-controlled world model that generates explorable, interactive video scenes. It supports dynamic camera trajectories, which suits it to applications that need explicit camera control and spatial navigation through virtual environments.
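This page does not say how Zen Voyager expects a camera trajectory to be encoded. As a hypothetical illustration only, a trajectory can be thought of as one camera pose per output frame; the sketch below samples a circular orbit around a target point. The `CameraPose` type, the position-plus-forward-vector encoding, and `orbit_trajectory` are assumptions for illustration, not part of any Zen API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPose:
    """Hypothetical pose format: world-space position plus a unit viewing direction."""
    position: tuple
    forward: tuple


def orbit_trajectory(num_frames: int, radius: float, height: float,
                     target: tuple = (0.0, 0.0, 0.0)) -> list:
    """Sample poses evenly on a horizontal circle around `target`, each looking at it."""
    if num_frames < 1 or radius <= 0:
        raise ValueError("num_frames must be >= 1 and radius must be > 0")
    tx, ty, tz = target
    poses = []
    for i in range(num_frames):
        theta = 2.0 * math.pi * i / num_frames  # angle of this frame on the orbit
        pos = (tx + radius * math.cos(theta), ty + height, tz + radius * math.sin(theta))
        # Vector from the camera to the target, normalised to unit length
        d = (tx - pos[0], ty - pos[1], tz - pos[2])
        norm = math.sqrt(sum(c * c for c in d))
        poses.append(CameraPose(position=pos, forward=tuple(c / norm for c in d)))
    return poses


if __name__ == "__main__":
    for pose in orbit_trajectory(4, radius=5.0, height=1.5):
        print(pose)
```

An orbit is only one choice; a dolly or fly-through would simply change how `pos` is computed per frame, while the look-at normalisation stays the same.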
Both models are built on the Zen MoDE architecture, which aims to balance inference efficiency against generation quality, and both use diffusion-based generation for high-fidelity scene synthesis.
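The parameter counts above give a rough lower bound on the memory needed just to hold the weights, which is one reason the hosted API is the recommended route. The sketch below does the arithmetic, assuming the quoted counts cover every weight that has to be loaded and that all of them are stored in the chosen dtype: in bfloat16 (2 bytes per parameter), 13B parameters take about 26 GB and 32.8B take about 65.6 GB. Activations for video latents, framework overhead, and any offloading strategy all change the real requirement; `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter in common dtypes
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Weights-only storage in decimal gigabytes (1 GB = 1e9 bytes).

    Ignores activations, caches, and framework overhead, so treat it as a floor.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in [("Zen World", 13e9), ("Zen Voyager", 32.8e9)]:
        print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights in bfloat16")
```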
Use Cases
- Robotics Simulation: Generate training environments for robot navigation and manipulation
- Game Development: Create dynamic, procedurally generated game worlds
- Visual Effects: Synthesize background plates and environmental scenes
- Interactive Media: Build explorable virtual environments with camera control
- Research: Study emergent world dynamics and physics-based scene generation