Every request to /v1/verify or /v1/chat passes through a deterministic control pipeline. Each stage can allow, rewrite, or block the request, and the decision is logged for audit.

The pipeline

Eight stages

Stage 1: Scanner

Multi-pattern regex + heuristics for:
  • Jailbreak phrases (“Ignore previous instructions”, “You are now DAN”)
  • Exfiltration patterns (system prompt leaks, API key reveals)
  • Tool-smuggling attempts
  • Unicode obfuscation
Returns injection_score and matched patterns. Feeds into stage 2.
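The shape of such a scanner can be sketched as follows; the pattern lists, score weights, and the `scan` helper are illustrative assumptions, not the production rule set:

```python
import re
import unicodedata

# Illustrative patterns only -- the real scanner's rule set is not public.
JAILBREAK = [r"ignore (all )?previous instructions", r"you are now dan"]
EXFILTRATION = [r"reveal your system prompt", r"print your api key"]

def scan(prompt: str) -> dict:
    # NFKC-normalize so Unicode obfuscation (homoglyphs, fullwidth forms)
    # maps back to plain ASCII before pattern matching.
    text = unicodedata.normalize("NFKC", prompt).lower()
    matched = [p for p in JAILBREAK + EXFILTRATION if re.search(p, text)]
    # Simple additive score capped at 1.0; real heuristics would weight patterns.
    score = min(1.0, 0.5 * len(matched))
    return {"injection_score": score, "matched_patterns": matched}
```
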

Stage 2: Policy engine

Evaluates tenant-specific rules (stored in the policies table):
  • Custom block/allow patterns per tenant
  • Output filters for PII, PCI, trade secrets
  • Per-endpoint rules
Decision: allow | rewrite | block.
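A minimal sketch of first-match rule evaluation, assuming a hypothetical `PolicyRule` row shape loaded from the policies table:

```python
import re
from dataclasses import dataclass

@dataclass
class PolicyRule:
    pattern: str  # regex tested against the request text (illustrative schema)
    action: str   # "allow" | "rewrite" | "block"

def evaluate(text: str, rules: list[PolicyRule]) -> str:
    # First matching rule wins; with no match, the request is allowed.
    for rule in rules:
        if re.search(rule.pattern, text, re.IGNORECASE):
            return rule.action
    return "allow"
```
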

Stage 3: Grounding classifier

An LLM-based heuristic that decides whether the answer must come from a verified source (a tool call or the knowledge base), and sets the grounding_required flag. This matters for factual queries (“What is our Q3 revenue?”), where hallucination risk is high.
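A sketch of how such a classifier could be wired, with `classify` standing in for the LLM call and the prompt text being an illustrative assumption:

```python
def needs_grounding(question: str, classify) -> bool:
    # `classify` is a stand-in for the LLM call; the prompt is illustrative.
    prompt = (
        "Does answering this question require verified data "
        "(databases, documents, tools)? Reply YES or NO.\n\n" + question
    )
    # Tolerate chatty model output: only the leading YES/NO matters.
    return classify(prompt).strip().upper().startswith("YES")
```
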

Stage 4: Router

If grounding_required is set: routes the request to SQL, a webhook, or the knowledge base to fetch verified data, then calls the LLM with that data as context. If not: calls the LLM directly. Supports five providers: OpenAI, Anthropic, Google, Azure OpenAI, and AWS Bedrock.
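The branch can be sketched like this; `fetch_evidence` and `call_llm` are stand-ins for the real SQL/webhook/knowledge-base connectors and the provider SDKs:

```python
def answer(prompt: str, grounding_required: bool, fetch_evidence, call_llm) -> str:
    # fetch_evidence: SQL / webhook / knowledge-base connector (stand-in).
    # call_llm: provider SDK call (OpenAI, Anthropic, Google, Azure, Bedrock).
    if grounding_required:
        evidence = fetch_evidence(prompt)
        # Verified data travels with the prompt as context.
        return call_llm(prompt, context=evidence)
    # Pure conversational path: no retrieval step.
    return call_llm(prompt, context=None)
```
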

Stage 5: Judge

An LLM-based validator that compares the answer against the verified source:
  • Supported: answer is grounded in the source
  • Contradiction: answer contradicts the source
  • Numeric mismatch: answer has different numbers than the source
  • Insufficient evidence: source doesn’t support the answer
  • Unsupported inference: answer draws conclusions not in the source
Used only for grounded responses. Skipped for pure conversational answers.
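The five verdicts map naturally onto an enum; only a supported verdict lets an answer through untouched (a sketch, not the actual type definitions):

```python
from enum import Enum

class Verdict(Enum):
    # Illustrative names mirroring the five documented outcomes.
    SUPPORTED = "supported"
    CONTRADICTION = "contradiction"
    NUMERIC_MISMATCH = "numeric_mismatch"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"
    UNSUPPORTED_INFERENCE = "unsupported_inference"

def is_grounded(verdict: Verdict) -> bool:
    # Anything other than SUPPORTED triggers downstream intervention.
    return verdict is Verdict.SUPPORTED
```
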

Stage 6: Post-generation checks
  • SecurityViolationDetector: scans answer for canary words, secret patterns, blocked PII
  • EvidenceConsistencyEvaluator: final grading of answer-vs-source alignment
  • Computes confidence (high / medium / low)
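One way the confidence grade might combine upstream signals; the thresholds below are invented for illustration, since the real weighting is not documented here:

```python
def confidence(injection_score: float, verdict_supported: bool) -> str:
    # Hypothetical thresholds: a clean scan plus a supported verdict is "high";
    # one of the two is "medium"; neither is "low".
    if verdict_supported and injection_score < 0.2:
        return "high"
    if verdict_supported or injection_score < 0.5:
        return "medium"
    return "low"
```
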

Stage 7: Enforcement

Final decision on the response:
  • allow: return as-is
  • rewrite: return answer with caveat or redactions
  • regenerate: ask LLM again with stronger grounding
  • block: return a policy violation message
Configurable per tenant via POLICY_MODE (enforce / observe).
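A sketch of the outcomes, including the observe mode that logs without intervening; the caveat text and block message wording are assumptions:

```python
def enforce(decision: str, answer: str, mode: str = "enforce") -> str:
    # In observe mode the answer always passes; the decision is only logged.
    if mode == "observe" or decision == "allow":
        return answer
    if decision == "rewrite":
        # Illustrative caveat text; redaction would happen here too.
        return answer + "\n\n(Note: parts of this answer could not be verified.)"
    if decision == "block":
        return "This response was blocked by policy."
    # "regenerate" loops back to the router with stronger grounding,
    # which is out of scope for this sketch.
    raise ValueError(f"unhandled decision: {decision}")
```
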

Stage 8: Persistence

Every decision persists to:
  • request_logs (Postgres)
  • audit_logs (Postgres)
  • security_events if blocked/rewritten
  • learning_events via Redis → worker → DB
  • usage_counters (for billing quotas)
  • S3 archive (SSE-KMS encrypted) with PII redaction
Zero-retention mode: prompt and answer content can be routed only to S3 (with customer KMS key), never stored in DB.
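Zero-retention routing can be sketched as a split between DB-bound metadata and the full S3 record (field names are illustrative):

```python
def route(event: dict, zero_retention: bool) -> dict:
    """Split a decision event between DB sinks and the S3 archive (sketch)."""
    content_keys = ("prompt", "answer")
    # The S3 archive always gets the full record (SSE-KMS encrypted at rest).
    s3_record = dict(event)
    # In zero-retention mode the DB rows keep only metadata
    # (decision, scores, timestamps) -- never prompt/answer content.
    if zero_retention:
        db_record = {k: v for k, v in event.items() if k not in content_keys}
    else:
        db_record = dict(event)
    return {"db": db_record, "s3": s3_record}
```
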

Response time

Typical latency breakdown for /v1/verify:
Stage                  Duration
Scanner                ~5 ms
Policy engine          ~5 ms
Judge (LLM call)       ~150 ms
Enforcement            ~5 ms
Persistence (async)    ~0 ms perceived
Total                  ~165 ms
For /v1/chat (which includes LLM generation), add 500–2000 ms depending on the model.

Fail-open vs fail-closed

PromptWall defaults to fail-open for non-critical paths (billing, tracking) and fail-closed for security (scanner, policy). This means:
  • If the billing DB is temporarily down → your request still succeeds
  • If the scanner throws an exception → request is blocked until resolved
Configurable per deployment.
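A generic wrapper illustrating the two failure modes (a sketch; the actual per-deployment configuration mechanism is not shown here):

```python
def guarded(stage, fail_mode: str, fallback=None):
    # fail-open: swallow the error and return a fallback (e.g. billing down).
    # fail-closed: re-raise so the request is blocked (e.g. scanner failure).
    def wrapper(*args, **kwargs):
        try:
            return stage(*args, **kwargs)
        except Exception:
            if fail_mode == "open":
                return fallback
            raise
    return wrapper
```
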