The Quality module ships a pluggable redaction layer that scrubs PII and secrets out of every framework boundary that could leak — log lines, trace attributes, memory writes, captured LLM prompts, agent events. Redaction is always-on when the decorator is wired in; opt-out is per-tenant per-sink, never the default.

The redactor SPI is Redactor in com.tnsai.quality.redaction. The default implementation PatternRedactor ships with a 14-pattern catalog covering email, phone, credit card, SSN, TC Kimlik, API keys, JWT, AWS keys, IBAN, IP, bearer tokens, and SSH private-key blocks. Operators wire the redactor into framework sinks via decorator types (RedactingMemoryStore, RedactingAgentEventPublisher).

Quick start

import com.tnsai.quality.redaction.*;

// 1. Build a redactor with the default pattern catalog.
Redactor redactor = PatternRedactor.withDefaults();

// 2. Wrap the agent's memory store and event publisher.
MemoryStore           memory = new RedactingMemoryStore(new InMemoryStore(), redactor);
AgentEventPublisher   events = new RedactingAgentEventPublisher(new Slf4jAgentEventPublisher(), redactor);

// 3. From this point on, every write through these sinks is scrubbed.
agent.setMemoryStore(memory);
agent.setEventPublisher(events);

Pattern catalog

RedactionPatterns.defaults() returns the patterns below. Each is exported as a constant on RedactionPatterns (e.g. RedactionPatterns.EMAIL) so you can subset or augment.

Pattern	Severity	Example
`email`	MEDIUM	`user@example.com`
`phone_e164`	MEDIUM	`+15551234567`
`credit_card`	CRITICAL	`4111-1111-1111-1111` (Luhn-validated)
`us_ssn`	CRITICAL	`123-45-6789`
`tc_kimlik`	CRITICAL	11-digit Turkish national ID (checksum-validated)
`openai_api_key`	HIGH	`sk-...`
`anthropic_api_key`	HIGH	`sk-ant-...`
`github_pat`	HIGH	`ghp_...`
`aws_access_key`	HIGH	`AKIA...`
`jwt`	HIGH	three base64url segments separated by `.`
`iban`	MEDIUM	`DE89370400440532013000`
`ipv4`	LOW	`10.0.0.1`
`bearer_token`	HIGH	`Bearer <token>` in headers
`ssh_private_key_block`	CRITICAL	`-----BEGIN ... PRIVATE KEY-----` blocks

Numeric patterns (credit card, TC Kimlik) include checksum validators so order IDs that look like credit-card numbers don't get scrubbed. The placeholder shape is [REDACTED:<pattern_name>] and is itself inert against every default pattern — a placeholder cannot be re-redacted on a later pass.
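As an illustration of the checksum step, here is a minimal sketch of a standard Luhn check. The class name `LuhnSketch` and the 12-digit minimum are assumptions made for this example; the framework's own validator is not reproduced in this document and may differ.

```java
// Hypothetical helper, not the framework's implementation: a plain Luhn check.
final class LuhnSketch {
    private LuhnSketch() {}

    /** Returns true if the digits in candidate (dashes and spaces ignored) pass the Luhn checksum. */
    static boolean passesLuhn(String candidate) {
        int sum = 0;
        int digits = 0;
        boolean doubleIt = false;                 // the rightmost digit is not doubled
        for (int i = candidate.length() - 1; i >= 0; i--) {
            char c = candidate.charAt(i);
            if (c == '-' || c == ' ') continue;   // skip common separators
            if (c < '0' || c > '9') return false; // anything else disqualifies the match
            int d = c - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;                // same as summing the two digits
            }
            sum += d;
            doubleIt = !doubleIt;
            digits++;
        }
        // Assumed lower bound on length, chosen only for this sketch.
        return digits >= 12 && sum % 10 == 0;
    }
}
```

With a check like this attached as a validator, a 16-digit order ID that fails the checksum would be left untouched, while `4111-1111-1111-1111` would still be scrubbed.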

Where redaction fires

Sink	Decorator	What gets scrubbed
Conversation memory writes	`RedactingMemoryStore`	`addMessage(role, content)`, `addMessage(Map)`
Agent event log	`RedactingAgentEventPublisher`	All 13 `TnsAIEvent` variants — every user-input-bearing field rebuilt with redacted values
Search queries against memory	`RedactingMemoryStore`	`search(query, limit)` — query scrubbed before backend dispatch

Reads (getHistory, getRecentHistory) pass through unchanged because content stored via these decorators is already redacted at write time; double-scrubbing wastes cycles.
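The scrub-on-write, pass-through-on-read behaviour can be sketched with a small decorator. Everything here is hypothetical: `SketchStore`, `ListStore` and `ScrubOnWriteStore` are stand-ins invented for the example, and the real `MemoryStore` interface is not shown in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a memory store interface.
interface SketchStore {
    void addMessage(String role, String content);
    List<String> getHistory();
}

// Trivial in-memory backend used only to exercise the decorator.
final class ListStore implements SketchStore {
    private final List<String> messages = new ArrayList<>();
    public void addMessage(String role, String content) { messages.add(role + ": " + content); }
    public List<String> getHistory() { return List.copyOf(messages); }
}

// Decorator: scrub on write, pass reads through untouched.
final class ScrubOnWriteStore implements SketchStore {
    private final SketchStore delegate;
    private final UnaryOperator<String> scrubber;

    ScrubOnWriteStore(SketchStore delegate, UnaryOperator<String> scrubber) {
        this.delegate = delegate;
        this.scrubber = scrubber;
    }

    public void addMessage(String role, String content) {
        delegate.addMessage(role, scrubber.apply(content)); // only scrubbed text reaches the backend
    }

    public List<String> getHistory() {
        return delegate.getHistory();                       // already scrubbed at write time
    }
}
```

If the real decorators behave like this sketch, one consequence is that anything written to the backing store before the decorator was wired in comes back exactly as it was stored.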

Composition

Every redactor type below implements the same Redactor SPI, so they slot into each other freely.

Multiple redactor pipeline

CompositeRedactor chains redactors in order; the output text of one feeds the next. Useful for stacking a fast pattern matcher with a slower LLM-classifier or Presidio bridge.

Redactor pipeline = CompositeRedactor.of(
        PatternRedactor.withDefaults(),                    // fast, regex-based
        new LLMClassifierRedactor(haiku, "pii-policy"));   // slower, NER-based (when shipped)

Order matters: the first redactor's [REDACTED:...] placeholders pass through every later redactor unchanged, so a downstream classifier doesn't get to "explain" what was already scrubbed.
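The chaining behaviour can be sketched as a simple left-to-right fold over string transformers. This is an illustration only: `ChainSketch` and its two simplified stages are hypothetical, and the internals of `CompositeRedactor` are not shown in this document.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch of chaining; not the framework's CompositeRedactor.
final class ChainSketch {
    private ChainSketch() {}

    /** Applies each stage in order, feeding the output text of one into the next. */
    static String scrubThroughChain(List<UnaryOperator<String>> stages, String input) {
        String text = input;
        for (UnaryOperator<String> stage : stages) {
            text = stage.apply(text);
        }
        return text;
    }

    // Stage 1: a simplified email matcher.
    static final UnaryOperator<String> EMAIL_STAGE =
            s -> s.replaceAll("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+", "[REDACTED:email]");

    // Stage 2: a stand-in for a later stage; here it only scrubs IPv4-looking strings.
    static final UnaryOperator<String> IPV4_STAGE =
            s -> s.replaceAll("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b", "[REDACTED:ipv4]");
}
```

In this sketch the second stage sees `[REDACTED:email]` rather than the address, and because the placeholder contains nothing the IPv4 stage matches, it passes through unchanged.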

Per-tenant policies

TenantPolicyRedactor dispatches each call to a per-tenant redactor based on the active RedactionContext. Resolution order:

1. RedactionContext.tenantPolicyId(): explicit override
2. EventContext.tenantId() from the framework's correlation context
3. Configured default redactor: fallback for unmatched or blank ids

Redactor everything   = PatternRedactor.withDefaults();
Redactor emailOnly    = new PatternRedactor(List.of(RedactionPatterns.EMAIL));
Redactor cardAndSsn   = new PatternRedactor(List.of(RedactionPatterns.CREDIT_CARD,
                                                   RedactionPatterns.US_SSN));

Redactor dispatcher = TenantPolicyRedactor.builder()
        .tenant("acme",   emailOnly)        // acme: minimal redaction
        .tenant("globex", cardAndSsn)       // globex: card numbers and SSNs only
        .defaultRedactor(everything)        // everyone else: full catalog
        .build();

The dispatch happens on every call (no per-thread cache) so a single agent group serving multiple tenants doesn't leak across tenant boundaries.
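The resolution order described above can be sketched as a lookup that runs on every call. The class `TenantDispatchSketch` is hypothetical, and one detail is an assumption: a non-blank explicit policy id that matches no tenant falls straight through to the default here, which the source does not spell out.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the three-step resolution; names and types are illustrative only.
final class TenantDispatchSketch {
    private final Map<String, UnaryOperator<String>> byTenant;
    private final UnaryOperator<String> fallback;

    TenantDispatchSketch(Map<String, UnaryOperator<String>> byTenant,
                         UnaryOperator<String> fallback) {
        this.byTenant = Map.copyOf(byTenant);
        this.fallback = fallback;
    }

    /** Resolves per call: explicit policy id, then correlation tenant id, then the default. */
    UnaryOperator<String> resolve(String explicitPolicyId, String correlationTenantId) {
        String id = isBlank(explicitPolicyId) ? correlationTenantId : explicitPolicyId;
        if (isBlank(id)) {
            return fallback;                       // nothing usable to look up
        }
        return byTenant.getOrDefault(id, fallback); // unmatched ids use the default (assumption)
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }
}
```

Because nothing is cached between calls, two consecutive calls on the same thread with different tenant ids resolve to different redactors.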

Audit trail

AuditingRedactor decorates any redactor and emits a RedactionAuditEvent to a RedactionAuditListener whenever findings exist. Each event aggregates pattern counts, the highest severity, and the framework correlation context; the redacted content itself never appears in the audit event.

Redactor base    = PatternRedactor.withDefaults();
Redactor audited = new AuditingRedactor(base, event -> {
    metrics.counter("redaction.applied",
            "scope", event.scope().name(),
            "severity", event.highestSeverity().name())
           .increment(event.totalFindings());
    if (event.isCritical()) {
        slack.alert("CRITICAL pii redaction in tenant " + event.eventContext().tenantId());
    }
});

AuditingRedactor IS a Redactor, so it stacks anywhere — inside TenantPolicyRedactor for per-tenant audit, or wrapping the dispatcher itself for cross-tenant audit.

A broken audit listener cannot break redaction itself: exceptions thrown inside onRedaction(...) are caught and logged at WARN.
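The isolation guarantee amounts to a try/catch around the listener call. The sketch below is hypothetical: `GuardedAuditSketch` stands in for the real decorator, a plain `Consumer<String>` stands in for `RedactionAuditListener`, "findings exist" is approximated by "the text changed", and `java.util.logging` is used only to keep the example dependency-free.

```java
import java.util.function.Consumer;
import java.util.function.UnaryOperator;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; the real AuditingRedactor and RedactionAuditEvent are not reproduced here.
final class GuardedAuditSketch {
    private static final Logger LOG = Logger.getLogger("GuardedAuditSketch");

    private final UnaryOperator<String> scrubber;
    private final Consumer<String> listener;   // receives a summary, never the raw content

    GuardedAuditSketch(UnaryOperator<String> scrubber, Consumer<String> listener) {
        this.scrubber = scrubber;
        this.listener = listener;
    }

    String scrub(String input) {
        String output = scrubber.apply(input);
        if (!output.equals(input)) {            // crude proxy for "findings exist"
            try {
                listener.accept("redaction applied");
            } catch (RuntimeException e) {
                // A failing listener is logged and swallowed; the scrubbed text is still returned.
                LOG.log(Level.WARNING, "audit listener failed; redaction result is unaffected", e);
            }
        }
        return output;
    }
}
```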

Custom patterns

Build your own RedactionPattern and pass it to a PatternRedactor:

import java.util.ArrayList;
import java.util.List;
import com.tnsai.quality.redaction.*;

RedactionPattern medicalRecordId = RedactionPattern.of(
        "medical_record_id",
        "MRN-\\d{8}",                  // regex source (compiled to Pattern internally)
        Severity.CRITICAL);

RedactionPattern licensePlate = RedactionPattern.of(
        "license_plate_tr",
        "\\b\\d{2}\\s?[A-Z]{1,3}\\s?\\d{2,4}\\b",
        Severity.MEDIUM,
        plate -> plate.length() >= 7);  // optional validator filters false positives

// Start from the default catalog, then append the custom patterns.
List<RedactionPattern> myCatalog = new ArrayList<>(RedactionPatterns.defaults());
myCatalog.add(medicalRecordId);
myCatalog.add(licensePlate);

Redactor custom = new PatternRedactor(myCatalog);

The validator runs after the regex matches and lets you check shape rules (Luhn checksum, range, character class) that regex alone can't express. Patterns without a validator accept every regex match.
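To make the "regex first, validator second" order concrete, here is a minimal sketch of how a single pattern might be applied. `ValidatedPatternSketch` is a hypothetical class written for this example; it is not the framework's `PatternRedactor`, whose matching loop is not shown in this document.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one named pattern with an optional-style validator.
final class ValidatedPatternSketch {
    private final String name;
    private final Pattern regex;
    private final Predicate<String> validator;

    ValidatedPatternSketch(String name, String regex, Predicate<String> validator) {
        this.name = name;
        this.regex = Pattern.compile(regex);
        this.validator = validator;
    }

    /** Replaces every regex match that the validator accepts; rejected matches are kept as-is. */
    String scrub(String input) {
        Matcher m = regex.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String hit = m.group();
            String replacement = validator.test(hit) ? "[REDACTED:" + name + "]" : hit;
            // quoteReplacement keeps '$' and '\' in the hit from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Using the licence-plate regex and length validator from the example above, `34 ABC 123` is replaced while the shorter `06A12` matches the regex but is rejected by the validator and left in place.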

SPI summary

Type	Role
`Redactor`	The SPI. Three methods: `scrubString`, `scrubValue`, `maybeContainsSensitive`.
`PatternRedactor`	Default implementation — regex-driven, stateless, thread-safe.
`RedactionPatterns`	Constants for the 14 default patterns + `defaults()` list.
`RedactionResult`	Output of `scrubString` — text + findings.
`RedactionFinding`	One match — pattern name, offsets, placeholder, severity.
`RedactionContext`	Per-call context — `EventContext`, scope, tenant policy id.
`RedactionScope`	Enum of sinks — `LOG_ATTR`, `MEMORY_WRITE`, `LLM_PROMPT`, etc.
`Severity`	LOW / MEDIUM / HIGH / CRITICAL.
`CompositeRedactor`	Chain of redactors.
`TenantPolicyRedactor`	Per-tenant dispatcher.
`AuditingRedactor`	Decorator that fires `RedactionAuditEvent`s.
`RedactingMemoryStore`	`MemoryStore` decorator.
`RedactingAgentEventPublisher`	`AgentEventPublisher` decorator.

Invariants

The framework's property test suite (RedactorPropertyTest) verifies four invariants that every Redactor implementation must hold:

Idempotency: scrubbing already-scrubbed text changes nothing. In shorthand, scrubString(scrubString(x).text).text == scrubString(x).text. A sink that double-scrubs (memory write, then log emit) doesn't drift.
Pattern survival: no input PII fragment appears in the redacted output.
Placeholder safety: the literal placeholder [REDACTED:foo] does not match any default pattern.
Length bounded: output length ≤ input length + (max placeholder length × findings count).

Custom redactors that ship into framework sinks should add coverage to the same suite.
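Two of these invariants, idempotency and the length bound, can be checked with very little code. The sketch below uses a hypothetical email-only scrubber (`InvariantSketch`) and hand-picked inputs; it is not `RedactorPropertyTest`, and a real property test would generate inputs rather than list them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a tiny email-only scrubber plus checks for two of the invariants.
final class InvariantSketch {
    static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");
    static final String PLACEHOLDER = "[REDACTED:email]";

    /** Minimal result holder: scrubbed text plus the number of findings. */
    static final class Result {
        final String text;
        final int findings;
        Result(String text, int findings) {
            this.text = text;
            this.findings = findings;
        }
    }

    static Result scrub(String input) {
        Matcher m = EMAIL.matcher(input);
        StringBuilder out = new StringBuilder();
        int findings = 0;
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(PLACEHOLDER));
            findings++;
        }
        m.appendTail(out);
        return new Result(out.toString(), findings);
    }

    /** Idempotency: a second pass over scrubbed text must not change it. */
    static boolean isIdempotent(String input) {
        Result once = scrub(input);
        return scrub(once.text).text.equals(once.text);
    }

    /** Length bound: output may grow by at most one placeholder per finding. */
    static boolean isLengthBounded(String input) {
        Result r = scrub(input);
        return r.text.length() <= input.length() + PLACEHOLDER.length() * r.findings;
    }
}
```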

Trade-offs

False positives: the phone regex can match tracking IDs that look like phone numbers. Default placeholder preserves visual shape; tighten per-pattern with a validator.
False negatives: rare PII formats (driver's licence numbers, medical record IDs) are not in the default catalog. Plug in custom patterns or the (forthcoming) classifier redactor.
Performance cost: on hot paths with huge payloads (1 MB+ LLM responses), redaction adds measurable latency. Route big-payload paths through async or buffered publish.
Data loss risk: aggressive redaction may scrub legitimate content (base64 images, opaque IDs that look like tokens). Tune per tenant.
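For the performance trade-off, one way to keep scrubbing off the caller's thread is to hand the payload to an executor and publish only the scrubbed result. This is a sketch under assumptions: `AsyncScrubSketch` is hypothetical, the scrubber and sink are plain functional interfaces, and the framework's own async or buffered publish path is not described in this document.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical sketch: scrub on an executor thread, then hand the result to the sink.
final class AsyncScrubSketch {
    private final UnaryOperator<String> scrubber;
    private final Consumer<String> sink;
    private final Executor executor;

    AsyncScrubSketch(UnaryOperator<String> scrubber, Consumer<String> sink, Executor executor) {
        this.scrubber = scrubber;
        this.sink = sink;
        this.executor = executor;
    }

    /** Returns immediately; the sink only ever receives scrubbed text. */
    CompletableFuture<Void> publish(String payload) {
        return CompletableFuture
                .supplyAsync(() -> scrubber.apply(payload), executor)
                .thenAccept(sink);
    }
}
```

With a multi-threaded executor, payloads may reach the sink out of order; a single-threaded executor preserves ordering at the cost of throughput.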

Roadmap

The redaction SPI lives in tnsai-quality and is stable. Pending work tracked under issue #80:

LLMClassifierRedactor — opt-in classifier using any LLMClient for non-pattern PII (names, addresses, health info).
PresidioRedactor — bridge to Microsoft Presidio for state-of-the-art NER.
YAML policy loader — tenant-redaction-policies.yaml parser to populate TenantPolicyRedactor from config.
OnNotification hook integration — fire on critical-severity findings (depends on the hook system in #60).
