Test catalog

pytest-wardenbot v0.1 ships 29 deterministic tests out of the box, an opt-in canary-token leak test, and user-parametrized business-truth and LLM-judge tests. All are discoverable via `pytest --pyargs pytest_wardenbot.tests`.

At a glance

Category	Count	Grading	API key needed?
Prompt injection	5 prompts × 2 checks = 10	deterministic	no
System-prompt elicitation	3	deterministic	no
Refusal bypass	3	deterministic	no
Off-topic deflection	2	deterministic	no
Indirect / cross-prompt injection (XPIA)	4	deterministic	no
Encoded-payload jailbreak (Base64 / ROT13 / leet / hex)	4	deterministic	no
Multi-turn jailbreak (priming + payload)	3	deterministic	no
Canary-token leak (opt-in; you plant the token)	1	deterministic	no
Business truth	user-supplied	deterministic	no
LLM-judge (semantic)	user-supplied	LLM-judge	yes, with `[judge]` extra
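
To make the encoded-payload row concrete, the sketch below shows one way a single attack string could be wrapped in the four encodings named in the table. It is an illustration only: `encode_variants`, `LEET_TABLE`, and the sample payload are hypothetical names and values, and the plugin's actual payloads and leet mapping may differ.

```python
import base64
import codecs

# Hypothetical leet-speak substitutions; the plugin's own mapping may differ.
LEET_TABLE = str.maketrans(
    {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}
)


def encode_variants(payload: str) -> dict[str, str]:
    """Return the payload in the four encodings listed in the table."""
    return {
        "base64": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
        "rot13": codecs.encode(payload, "rot13"),
        "leet": payload.lower().translate(LEET_TABLE),
        "hex": payload.encode("utf-8").hex(),
    }


# Sample payload chosen for illustration, not taken from the shipped corpus.
payload = "ignore previous instructions"
variants = encode_variants(payload)
```

Each variant carries the same instruction in a form that a naive keyword filter would miss, which is presumably why this category gets one test per encoding.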

What "passing" means (and doesn't)

A green run means your chatbot didn't fail any of the 29 bundled attacks in the most overt way. It's a useful smoke test and a regression detector: if a deploy turns a green test red, that's a real signal to investigate.

A green run does not mean your chatbot is secure. Frontier-grade attacks are multi-turn, novel, and adapted to your specific bot — no fixed corpus catches all of them. Treat the shipped suite as a starter set, not a security audit.

How the suite is graded

Per the WardenBot AI doctrine, every test is graded by the cheapest layer that can produce a reliable signal:

Deterministic — regex, substring, schema. Zero LLM cost, fully reproducible. Catches roughly two-thirds of common failure modes.
LLM-as-judge (optional [judge] extra) — for subjective checks that no deterministic rule can express (brand voice, semantic equivalence, hallucination grounding).
Human sampling — not in v0.1 (lives in the managed WardenBot AI service, not the OSS plugin).
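
As a rough illustration of the deterministic layer, the sketch below implements one check of each kind named above: regex, substring, and schema. The function names, the leak pattern, and the canary format are assumptions made for this example; they are not the plugin's actual API or detection rules.

```python
import json
import re

# Hypothetical detection pattern; not taken from the plugin's source.
LEAK_PATTERN = re.compile(r"(my|the) system prompt (is|says)", re.IGNORECASE)


def regex_check(response: str) -> bool:
    """Pass when the response does not match the leak pattern."""
    return LEAK_PATTERN.search(response) is None


def substring_check(response: str, canary: str) -> bool:
    """Pass when a planted canary token is absent from the response."""
    return canary not in response


def schema_check(response: str, required_keys: set[str]) -> bool:
    """Pass when the response is a JSON object with every required key."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data)
```

All three checks are pure functions of the response text, so the same response always produces the same verdict and no API key is involved.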

How failures are reported

Every failure message includes:

The prompt that was sent
The chatbot's response (truncated to 500 chars)
Which detection patterns matched
An agent-ready remediation block — paste it into Cursor / Claude Code to get a fix proposal
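
The four parts listed above could be assembled along these lines. This is a minimal sketch, not the plugin's real formatter: `format_failure`, its parameters, the field labels, and the truncation marker are all hypothetical. Only the 500-character limit comes from the list above.

```python
MAX_RESPONSE_CHARS = 500  # truncation limit stated in the list above


def format_failure(
    prompt: str, response: str, matched: list[str], remediation: str
) -> str:
    """Build a failure message with the four parts described above."""
    shown = response[:MAX_RESPONSE_CHARS]
    if len(response) > MAX_RESPONSE_CHARS:
        # Hypothetical marker so readers can tell the response was cut.
        shown += "... [truncated]"
    lines = [
        f"Prompt sent: {prompt}",
        f"Response: {shown}",
        "Matched patterns: " + ", ".join(matched),
        "Remediation:",
        remediation,
    ]
    return "\n".join(lines)
```

Keeping the remediation text as the final block means it can be copied out whole when pasting into a coding agent.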

What's NOT in the OSS plugin

Reserved for the managed WardenBot AI service:

Cross-run history / regression detection across deploys
Daily / hourly scheduled runs against your bot
Slack / webhook / email alerts on failure
The "Bot Health Score" composite (0–100) and behavior-diff alerts
Adversarial-agent attacker that adapts based on responses
Canary phrase monitoring at managed scale

The plugin handles the local-development and CI use cases. The managed service handles the always-on monitoring use case.