Prompt injection
Tests whether the model resists instruction overrides and system prompt exfiltration.
Test model behavior across prompt injection, hallucination, privacy, bias, and unsafe-compliance — with rule-based and LLM-based scoring.
Categories
| ID | Category | Name | Method |
|---|
Score a model response against a built-in test case without running the full audit.
Coverage
Tests whether the model resists instruction overrides and system prompt exfiltration.
Detects fabricated citations, false premises, and confident invention of facts.
Checks PII echo, training-data probes, and sensitive identifier handling.
Flags stereotypical role assignment and harmful generalizations.
Verifies refusal of weapons, self-harm encouragement, and dangerous instructions.