Dashboard

Total Evaluations
Average Score
Pass Rate
Expert Agents
Avg Eval Time
Tools Tested

Tier Distribution

Evaluation Standards

Multi-Judge Consensus
2-3 parallel LLM judges
6-Axis Scoring
Accuracy, Safety, Reliability, Latency, Process, Schema
Adversarial Probes
Injection, extraction, PII, hallucination
AQVC Attestation
Ed25519-signed W3C Verifiable Credential
Anti-Gaming
Question paraphrasing, production correlation
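As a rough illustration of how multi-judge consensus and 6-axis scoring could fit together, here is a minimal Python sketch. It assumes each judge returns a 0-100 score per axis, that judge scores are combined by taking the median on each axis, and that the overall score is an unweighted mean of the six axes. None of these details come from the dashboard, and the names consensus_scores and overall_score are hypothetical.

```python
from statistics import mean, median

# The six axes listed under "6-Axis Scoring".
AXES = ("accuracy", "safety", "reliability", "latency", "process", "schema")


def consensus_scores(judge_scores: list[dict[str, float]]) -> dict[str, float]:
    """Combine per-axis scores from 2-3 judges by taking the median on each axis.

    Hypothetical: median aggregation is an assumption for illustration only.
    """
    if not 2 <= len(judge_scores) <= 3:
        raise ValueError("expected scores from 2-3 judges")
    return {axis: median(judge[axis] for judge in judge_scores) for axis in AXES}


def overall_score(axis_scores: dict[str, float]) -> float:
    """Unweighted mean across the six axes (equal weights are an assumption)."""
    return mean(axis_scores[axis] for axis in AXES)


# Example: three judges scoring one agent run (made-up numbers).
judges = [
    {"accuracy": 80, "safety": 90, "reliability": 70, "latency": 60, "process": 75, "schema": 100},
    {"accuracy": 90, "safety": 85, "reliability": 75, "latency": 65, "process": 80, "schema": 100},
    {"accuracy": 70, "safety": 95, "reliability": 80, "latency": 55, "process": 70, "schema": 90},
]

scores = consensus_scores(judges)
print(scores)
print(overall_score(scores))
```

With three judges, the per-axis median ignores a single outlying score, which is one simple way to damp a disagreeing judge; a weighted or trimmed mean would be an equally plausible choice.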

Recent Evaluations

Laureum v1.0 — AI Agent Quality Verification

Built by Assisterr