TnsAI

Sampling

Production agents emit thousands of events per second — chat chunks, tool calls, memory writes, hook invocations. Sending all of them to a log aggregator buys a bigger Elasticsearch cluster every quarter; dropping everything to ERROR-only makes debugging impossible. The sampling SPI in tnsai-quality.sampling lets operators decide per-event whether to capture, drop, or down-sample, with policy choices that scale from "dev — capture everything" to "production — errors always, INFO 1%".

The SPI is EventSamplingPolicy. Default implementations (AlwaysCapturePolicy, HeadBasedSamplingPolicy, PriorityBasedSamplingPolicy, ErrorAlwaysPolicy) ship with the framework. Operators wire a policy into their event publisher via the SamplingAgentEventPublisher decorator.

Quick start

import com.tnsai.quality.sampling.*;
import com.tnsai.observability.events.AgentEventPublisher;

// 1. Pick a named policy shape (or build a custom one).
EventSamplingPolicy policy = SamplingPolicies.production();

// 2. Wrap your publisher.
AgentEventPublisher slf4j   = new Slf4jAgentEventPublisher();
AgentEventPublisher sampled = new SamplingAgentEventPublisher(slf4j, policy);

// 3. From this point on, every publish call goes through the policy.
agent.setEventPublisher(sampled);

Decision shape

SamplingDecision is a sealed interface with three variants. Java 21 exhaustive switch makes the publisher's dispatch compile-checked:

DecisionMeaning
CaptureCapture at full fidelity. No downstream rate adjustment needed.
DropDiscard the event. Never reaches the underlying publisher.
CaptureDownsampled(rate)Capture, but record the configured rate so downstream aggregators (Prometheus, Datadog) can extrapolate volume (count += 1 / rate).

Capture.INSTANCE and Drop.INSTANCE are reusable singletons — most policies don't allocate a fresh decision per call.

Default policies

AlwaysCapturePolicy

The dev / single-tenant default — every event captured. Use the singleton: AlwaysCapturePolicy.INSTANCE.

EventSamplingPolicy verbose = AlwaysCapturePolicy.INSTANCE;

HeadBasedSamplingPolicy

Coin-flip per event at a fixed rate. Stateless, lock-free, the simplest production policy. Returns CaptureDownsampled(rate) when captured.

EventSamplingPolicy p = new HeadBasedSamplingPolicy(0.10);   // 10%

Trade-off: no per-session affinity. A captured chat may have some chunks captured and others not. Use TailBasedSamplingPolicy (forthcoming) when you need every event from a captured session or none.

PriorityBasedSamplingPolicy

Per-EventLevel rate. Default rates encode the production observability rule of thumb — never drop anything that's already a problem:

LevelDefault rate
CRITICAL1.0 (always)
ERROR1.0 (always)
WARN1.0 (always)
INFO0.1 (10%)
DEBUG0.01 (1%)
// Defaults
EventSamplingPolicy p = PriorityBasedSamplingPolicy.withDefaults();

// Custom rates per level
Map<EventLevel, Double> overrides = Map.of(
        EventLevel.INFO, 0.5,
        EventLevel.DEBUG, 0.0           // hard-drop DEBUG
);
EventSamplingPolicy p = new PriorityBasedSamplingPolicy(overrides);

ErrorAlwaysPolicy

Simpler than PriorityBasedSamplingPolicy — one switch on level, no per-level map:

  • WARN / ERROR / CRITICAL → always Capture
  • INFO / DEBUG → coin-flip at infoDebugRate
EventSamplingPolicy p = new ErrorAlwaysPolicy(0.05);     // 5% INFO/DEBUG
EventSamplingPolicy d = ErrorAlwaysPolicy.withDefault(); // 10% INFO/DEBUG

Use when "is this a problem?" is the only distinction that matters for capture.

RuleBasedSamplingPolicy

Ordered list of declarative rules, first match wins. Each rule combines an event-kind glob, a minimum level, an optional tenant filter, and an outcome (CAPTURE / DROP / SAMPLE_AT_RATE).

Glob shapes:

  • "*" — matches any kind
  • "chat.*" — matches one component after the prefix (chat.chunk ✓, chat.tool.invoked ✗)
  • "tool.invoked" — exact match
EventSamplingPolicy p = RuleBasedSamplingPolicy.builder()
        .capture("tool.invoked")                      // always capture
        .drop("internal.heartbeat")                   // hard-drop noise
        .sample("chat.chunk", 0.001)                  // 0.1% of chunks
        .capture("*", EventLevel.WARN, null)          // WARN+ everywhere
        .defaultDrop()                                // anything unmatched
        .build();

The programmatic builder is the canonical API; a YAML loader can produce the same Rule list at startup as a future enhancement.

AdaptiveSamplingPolicy

Wraps a delegate policy and degrades gracefully under measured backpressure. The consumer supplies a BackpressureSignal — a double in [0.0, 1.0] representing "fraction of capacity in use" (queue depth, ack lag, Resilience4j RateLimiter saturation, …). The policy reads it and:

  • below lowWaterMark (default 0.5) → delegate runs unchanged
  • between low and high → linearly downsample the delegate's Capture decisions
  • at or above highWaterMark (default 0.9) → drop everything below criticalLevel (default ERROR)

Critical-and-above events always survive, regardless of pressure.

BackpressureSignal queueSignal = () -> queueDepth.get() / (double) MAX_DEPTH;

EventSamplingPolicy p = new AdaptiveSamplingPolicy(
        SamplingPolicies.production(),     // delegate
        queueSignal);                       // sane defaults: 0.5 / 0.9 / ERROR

// Or with custom thresholds:
EventSamplingPolicy p2 = new AdaptiveSamplingPolicy(
        delegate, queueSignal,
        0.4, 0.85, EventLevel.WARN);

Tracks droppedDueToPressureCount() for monitoring how often pressure forced a drop above what the delegate would have done.

Named factories — SamplingPolicies

Three shapes for the most common operator intents:

FactoryReturnsWhen to use
SamplingPolicies.verbose()AlwaysCapturePolicy.INSTANCEdev / staging / single tenant
SamplingPolicies.defaults()PriorityBasedSamplingPolicy.withDefaults()production with INFO/DEBUG distinction
SamplingPolicies.production()ErrorAlwaysPolicy.withDefault()production with strict "errors-always"
EventSamplingPolicy p = SamplingPolicies.production();

Per-tenant policies

TenantPolicySamplingPolicy dispatches each call to a per-tenant policy based on EventContext.tenantId. A multi-tenant agent group can apply different rates per tenant without the publisher knowing tenant boundaries.

Resolution order:

  1. EventContext.tenantId from SamplingInput.context
  2. Configured default policy — fallback for null / blank / unmatched
EventSamplingPolicy dispatcher = TenantPolicySamplingPolicy.builder()
        .tenant("acme",   SamplingPolicies.verbose())            // power user — keep everything
        .tenant("globex", new ErrorAlwaysPolicy(0.001))          // chatty — aggressive sampling
        .defaultPolicy(SamplingPolicies.production())            // everyone else
        .build();

The dispatch happens on every call (no per-thread cache) so a single agent group serving multiple tenants doesn't leak rates across boundaries.

Wiring into the publisher

SamplingAgentEventPublisher is a decorator over the AgentEventPublisher SPI in tnsai-core. Each publish call builds a SamplingInput from event.eventType() + EventLevel + EventContext, asks the policy, and publishes only when SamplingDecision.shouldPublish() is true.

AgentEventPublisher base    = new Slf4jAgentEventPublisher();
AgentEventPublisher sampled = new SamplingAgentEventPublisher(base, policy);

Constructor variants:

ConstructorUse when
(delegate, policy)bare publish(event) calls fall back to EventContext.empty()
(delegate, policy, contextSupplier)per-call tenant / agent context flows in via the supplier

Both publish overloads (level-aware and bare) route through the policy. Bare publish(event) defaults to EventLevel.INFO. Null event passes straight through to the delegate without consulting the policy.

Composition with redaction

SamplingAgentEventPublisher and RedactingAgentEventPublisher are independent decorators — both implement the same SPI. Stack order chooses which runs first:

// Order A: sample first, redact what survives.
//   Cheaper when drop rate is high — redactor only runs on captured events.
AgentEventPublisher pub = new RedactingAgentEventPublisher(
        new SamplingAgentEventPublisher(base, policy), redactor);

// Order B: redact every event, then sample.
//   Uniform redaction cost; slightly easier to debug ("what would have been emitted?").
AgentEventPublisher pub = new SamplingAgentEventPublisher(
        new RedactingAgentEventPublisher(base, redactor), policy);

Pick order A in production (where drop rate is meaningful) and order B when debugging redaction patterns.

Rate limiting (last-resort backstop)

Sampling makes per-event capture/drop decisions; a rate limiter is a per-tenant volume cap that protects the aggregator when sampling alone isn't enough. The two compose: sampling reduces volume cheaply, the limiter bounds the survivors.

import com.tnsai.quality.ratelimit.*;

// Per-tenant token bucket (Resilience4j-backed). Defaults are generous;
// configure tighter limits per noisy tenant.
EventRateLimiter limiter = Resilience4jEventRateLimiter.builder()
        .defaultLimit(1000, Duration.ofSeconds(1))           // 1000 events/sec
        .perTenant("noisy-tenant", 100, Duration.ofSeconds(1))
        .build();

AgentEventPublisher slf4j   = new Slf4jAgentEventPublisher();
AgentEventPublisher rate    = new RateLimitedAgentEventPublisher(slf4j, limiter);
AgentEventPublisher sampled = new SamplingAgentEventPublisher(rate, policy);

Stack order matters. sampled → rate → sink (above) lets the cheap sampling decision run first; the rate limiter only sees what survived. Reverse the stack (rate → sampled → sink) when the limiter's per-tenant cap matters more than sampling cost.

The limiter returns a Decision:

  • ACCEPT → publish
  • DROP → silently swallow (with a counter increment for monitoring)
  • DEFER → opt-in deferred path; the default decorator drops, override onDefer to queue

Resilience4jEventRateLimiter#flushDropCounters() returns a per-tenant snapshot (and clears) so a periodic flusher can emit one ratelimit.dropped aggregate event per minute instead of suppressing the signal entirely.

SPI summary

TypeRole
EventSamplingPolicyThe SPI — single method sample(SamplingInput) → SamplingDecision.
SamplingDecisionSealed interface — Capture, Drop, CaptureDownsampled(rate).
SamplingInputPer-call frozen view — eventKind, level, context.
AlwaysCapturePolicyCapture every event; the dev default.
HeadBasedSamplingPolicyCoin-flip at fixed rate.
PriorityBasedSamplingPolicyPer-EventLevel rates with sensible defaults.
ErrorAlwaysPolicyWARN+ always; INFO/DEBUG at one rate.
RuleBasedSamplingPolicyDeclarative rule list (glob + level + tenant + outcome), first match wins.
AdaptiveSamplingPolicyWraps a delegate; downsamples linearly under backpressure.
SamplingPoliciesNamed factories — verbose, defaults, production.
TenantPolicySamplingPolicyPer-tenant dispatcher.
SamplingAgentEventPublisherAgentEventPublisher decorator that consults the policy.
EventRateLimiterPer-tenant token-bucket SPI returning ACCEPT / DROP / DEFER.
Resilience4jEventRateLimiterDefault limiter — per-tenant configs, drop counters.
RateLimitedAgentEventPublisherDecorator that consults the limiter before delegating.

Trade-offs

  • No per-session affinityHeadBased and Priority decide per event in isolation. A captured session may have some chunks dropped. TailBasedSamplingPolicy (forthcoming) buffers per session and decides at session end based on outcome (error / slow / expensive → capture; boring → drop).
  • CaptureDownsampled extrapolation cost — downstream aggregators must know to scale counters by 1 / rate. If your aggregator doesn't support per-event rates, use Capture (no rate) policies only.
  • Statistical variance — at low rates over short windows, observed capture counts vary widely. Plan dashboards over windows large enough that the law of large numbers settles.
  • Cost vs. fidelity — aggressive sampling reduces aggregator cost but loses signal for rare bugs. Production agents typically run SamplingPolicies.production() (10% INFO/DEBUG) until cost forces tighter rates.

Roadmap

The sampling + rate-limiting SPIs are stable in tnsai-quality.sampling and tnsai-quality.ratelimit. Cost governance is documented in Cost Governance. Pending work under issue #82:

  • TailBasedSamplingPolicy — buffer per session, decide at session end based on outcome (matches OTel's tail-based trace sampling).
  • YAML-driven rule loader for RuleBasedSamplingPolicy (programmatic builder is the canonical API; YAML is a producer).
  • Grafana dashboard JSON + Prometheus Alertmanager rules for rate-limit drop counts and budget utilisation.

On this page