Rerankers
High-performance cross-encoder reranking models for retrieval-augmented generation and semantic search.
Zen Rerankers are cross-encoder models optimized for retrieval-augmented generation (RAG) and semantic search. Given a query and a set of retrieved passages or documents, they score each query-passage pair for relevance so the candidates can be re-ordered, which makes them a natural second stage in a multi-stage retrieval pipeline.
All Zen Reranker models support 100+ languages and are available in multiple formats (SafeTensors, GGUF, and MLX) for flexible deployment across edge, CPU, and GPU infrastructure.
Model Family
| Model | Parameters | Context | Weights | Paper |
|---|---|---|---|---|
| Zen Reranker | 4.02B | 32K | weights | paper |
| Zen Reranker 0.6B | 0.6B | 32K | weights | paper |
| Zen Reranker 0.6B (GGUF) | 0.6B | 8K | weights | paper |
| Zen Reranker 4B | 4.02B | 32K | weights | paper |
| Zen Reranker 4B (GGUF) | 4B | 8K | weights | paper |
| Zen Reranker 8B | 8.19B | 32K | weights | paper |
| Zen Reranker 8B (GGUF) | 8B | 8K | weights | paper |
Use Cases
Zen Rerankers excel in:
- Retrieval-Augmented Generation (RAG) — re-score retrieved chunks before LLM context injection for higher precision
- Search pipelines — improve document ranking after initial dense or BM25 retrieval
- Question answering — score and rank candidate answers for relevance
- Semantic deduplication — cluster documents based on relevance scoring
- Cross-lingual retrieval — multilingual ranking for global applications
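The RAG and search-pipeline cases above share one shape: a cheap first stage over-fetches candidates, and the reranker re-scores only that shortlist. A minimal sketch of that flow follows. The names `first_stage`, `rerank_top_k`, and `toy_score` are hypothetical stand-ins rather than part of any Zen library, and the word-overlap scorer only marks where a cross-encoder call would go.

```python
from typing import Callable, List, Tuple


def tokenize(text: str) -> set:
    """Lowercased whitespace tokens with edge punctuation stripped (toy tokenizer)."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def first_stage(query: str, corpus: List[str], n: int) -> List[str]:
    """Recall-oriented retrieval: keep the n docs sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:n]


def rerank_top_k(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    k: int,
) -> List[Tuple[str, float]]:
    """Precision-oriented second stage: score every candidate, keep the best k."""
    scored = [(c, score_fn(query, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def toy_score(query: str, passage: str) -> float:
    """Placeholder scorer (Jaccard overlap); swap in a cross-encoder in real use."""
    q, p = tokenize(query), tokenize(passage)
    return len(q & p) / len(q | p) if (q | p) else 0.0


corpus = [
    "Wind and solar power reduce carbon emissions.",
    "Coal is still the most abundant fuel source worldwide.",
    "Renewable energy lowers long-term costs.",
    "Bananas are rich in potassium.",
]
query = "What are the benefits of renewable energy?"

shortlist = first_stage(query, corpus, n=3)          # over-fetch cheaply
top = rerank_top_k(query, shortlist, toy_score, k=2)  # re-score the shortlist only
for passage, score in top:
    print(f"{score:.3f}: {passage}")
```

Keeping the scorer as a plain callable means the same pipeline code works whether the second stage is a local model or a remote endpoint.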
Quick Start
Install dependencies:
```bash
pip install transformers torch
```

Basic reranking with the 4B model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "zenlm/zen-reranker-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.float16
)

def rerank(query, passages):
    """Score and rank passages by relevance to query."""
    pairs = [[query, p] for p in passages]
    inputs = tokenizer(
        pairs,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt"
    )
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)
    ranked = sorted(
        zip(passages, scores.tolist()),
        key=lambda x: x[1],
        reverse=True
    )
    return ranked

# Example usage
query = "What are the benefits of renewable energy?"
passages = [
    "Wind and solar power reduce carbon emissions and provide clean electricity.",
    "Coal is still the most abundant fuel source worldwide.",
    "Renewable energy reduces dependence on fossil fuels and lowers costs long-term."
]
results = rerank(query, passages)
for passage, score in results:
    print(f"{score:.3f}: {passage}")
```

With sentence-transformers:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("zenlm/zen-reranker-4B")
pairs = [
    ["What are renewable energy benefits?", "Wind power reduces emissions."],
    ["What are renewable energy benefits?", "Coal is abundant worldwide."],
    ["What are renewable energy benefits?", "Solar lowers long-term costs."],
]
scores = model.predict(pairs)
for pair, score in zip(pairs, scores):
    print(f"{score:.3f}: {pair[0]} | {pair[1]}")
```

Zen API
All Zen Reranker models are also available through the Zen API at api.hanzo.ai. This provides a unified OpenAI-compatible inference endpoint:
```bash
curl -X POST https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $ZEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen-reranker-4b",
    "messages": [...]
  }'
```

See the Zen API documentation for authentication, usage, and pricing.
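For callers working from Python rather than the shell, the same request can be assembled with the standard library. This is a sketch only: `build_rerank_request` is a hypothetical helper, and the `messages` payload is left to the caller because its schema for reranker models is not specified here.

```python
import json
import os
import urllib.request

# Endpoint and model name taken from the curl example above.
API_URL = "https://api.hanzo.ai/v1/chat/completions"


def build_rerank_request(
    api_key: str,
    messages: list,
    model: str = "zen-reranker-4b",
) -> urllib.request.Request:
    """Assemble (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Building the request does not touch the network.
# `messages=[]` is a placeholder; consult the Zen API documentation for the real payload.
req = build_rerank_request(os.environ.get("ZEN_API_KEY", "<your-key>"), messages=[])
print(req.get_method(), req.full_url)
```

Sending the request is then a matter of passing `req` to `urllib.request.urlopen` and decoding the JSON response.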
Model Selection
- 0.6B — Edge devices, mobile, minimal latency; lightest footprint
- 4B — Balanced quality and speed; recommended for most RAG pipelines
- 8B — Highest precision; for demanding production retrieval systems
GGUF variants target llama.cpp and Ollama for CPU-based deployment.
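One rough way to narrow the choice is a weights-only memory estimate: parameter count times bytes per parameter. The sketch below uses the parameter counts from the Model Family table. The helper names and the 2 bytes per parameter figure for fp16 are illustrative assumptions, and real usage will be higher once activations and runtime overhead are included.

```python
from typing import Optional

# Parameter counts in billions, from the Model Family table.
PARAMS_BILLIONS = {"0.6B": 0.6, "4B": 4.02, "8B": 8.19}


def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Weights-only footprint in GB (10^9 bytes); 2.0 bytes/param assumes fp16."""
    return params_billions * bytes_per_param


def largest_fitting(budget_gb: float, bytes_per_param: float = 2.0) -> Optional[str]:
    """Largest model whose weights alone fit in the budget, or None if none fit."""
    fitting = [
        name
        for name, params in PARAMS_BILLIONS.items()
        if weight_memory_gb(params, bytes_per_param) <= budget_gb
    ]
    return max(fitting, key=PARAMS_BILLIONS.get) if fitting else None


for name, params in PARAMS_BILLIONS.items():
    print(f"{name}: ~{weight_memory_gb(params):.2f} GB of fp16 weights")
print("Fits in 10 GB:", largest_fitting(10.0))
```

Passing a smaller `bytes_per_param` approximates quantized weights, though the exact figure depends on the quantization scheme.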
Technical Details
All Zen Reranker models share the following:
- Architecture: Qwen3-based cross-encoder
- Pipeline: text-classification (sequence scoring)
- Languages: 100+ languages (multilingual)
- License: Apache 2.0
- Context window: 32K (SafeTensors) / 8K (GGUF)
- Framework: Transformers / PyTorch compatible
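The context window bounds how much of a query-passage pair is scored in one pass, and anything past the tokenizer's `max_length` is truncated. One common workaround, offered here as an assumption rather than a documented Zen recommendation, is to split a long passage into overlapping windows, score each window, and keep the maximum. In this sketch whitespace-separated words stand in for model tokens, and `split_windows`, `score_long_passage`, and `score_fn` are hypothetical names.

```python
from typing import Callable, List


def split_windows(words: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split words into windows of at most `size`, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, max(len(words), 1), step):
        windows.append(words[start:start + size])
        if start + size >= len(words):
            break  # the current window already reaches the end of the passage
    return windows


def score_long_passage(
    query: str,
    passage: str,
    score_fn: Callable[[str, str], float],
    size: int = 256,
    overlap: int = 32,
) -> float:
    """Score each window separately and return the best (max) window score."""
    windows = split_windows(passage.split(), size, overlap)
    return max(score_fn(query, " ".join(w)) for w in windows)


# Toy scorer: 1.0 if the query word appears in the window, else 0.0.
def contains_word(query: str, text: str) -> float:
    return 1.0 if query in text.split() else 0.0


long_passage = " ".join(f"w{i}" for i in range(10))
print(score_long_passage("w8", long_passage, contains_word, size=4, overlap=1))
```

Max aggregation favors passages with one highly relevant region; averaging window scores is an alternative when relevance should be spread across the whole document.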