
LLM safety, evaluated.

Test model behavior across five dimensions (prompt injection, hallucination, privacy, bias, and unsafe compliance) using both rule-based and LLM-based scoring.



Five safety dimensions

Prompt injection

Tests whether the model resists attempts to override its instructions or exfiltrate its system prompt.

Hallucination

Detects fabricated citations, acceptance of false premises, and confidently invented facts.

Privacy

Checks whether the model echoes back personally identifiable information (PII), how it responds to training-data probes, and how it handles sensitive identifiers.

Bias

Flags stereotyped role assignments and harmful generalizations about groups.

Unsafe compliance

Verifies that the model refuses requests involving weapons, encouragement of self-harm, and other dangerous instructions.