# TnsAI Framework — Full Documentation

> Concatenated MDX of every docs page on tnsai.dev.
> Generated from `content/docs/**/*.mdx` at build time.
> Page boundaries marked by `---` and a `## URL` heading.

---

# Advanced Agent Features

URL: https://tnsai.dev/docs/agents/advanced

Description: Beyond the basic Agent lifecycle (create, chat, stop), TnsAI.Core provides specialized subsystems for cognitive support, streaming, ensemble execution, hierarchy management, and chat orchestration. These are internal components extracted from the Agent class for cohesion; most are accessed through the Agent public API rather than directly.

import { Callout } from 'fumadocs-ui/components/callout'

## AgentCognitiveSupport

`com.tnsai.agents.AgentCognitiveSupport` holds planning, reasoning, evaluation, environment perception, and feedback collection for an Agent. It is created internally during agent construction and accessed through `AgentCapabilities`.

### Evaluation Hooks

Eval hooks fire at lifecycle points (before/after chat, before/after tool call, on error, on agent stop) and record metrics into an `EvalContext`.
```java
// Add an evaluation hook
agent.addEvalHook(myHook);

// Enable/disable evaluation
agent.setEvalEnabled(true);

// Record goal completion
agent.recordGoalCompletion("goal-1", true, Map.of("steps", 3));
```

Key methods on `AgentCognitiveSupport`:

| Method | Description |
| --- | --- |
| `addEvalHook(EvalHook hook)` | Register an eval hook |
| `removeEvalHook(EvalHook hook)` | Unregister an eval hook |
| `clearEvalContext()` | Create a fresh `EvalContext` with new session ID |
| `setEvalEnabled(boolean)` | Enable or disable all eval hooks |
| `isEvalEnabled()` | Check if evaluation is active |
| `recordGoalCompletion(String goalId, boolean success, Map details)` | Record a goal outcome |
| `fireOnBeforeChat(String message)` | Fire pre-chat eval event |
| `fireOnAfterChat(String result, long latencyMs)` | Fire post-chat eval event |
| `fireOnBeforeToolCall(String toolName, Map arguments)` | Fire pre-tool eval event |
| `fireOnAfterToolCall(String toolName, Object result, boolean success, long latencyMs)` | Fire post-tool eval event |
| `fireOnAgentStop(String status)` | Fire agent stop event and complete the eval context |

### Planning (BDI)

Planning uses a `PlannerHandle` (SPI, provided by tnsai-intelligence) for GOAP/HTN planning.
```java
// Set a planner
agent.setPlannerHandle(myPlanner);

// Plan for all goals
List steps = agent.plan();

// Plan for a specific goal
List goalSteps = agent.planForGoal("deliverPackage");

// Execute a plan
PlannerHandle.PlanResult result = agent.executePlan(steps);
```

Key methods on `AgentCognitiveSupport`:

| Method | Signature |
| --- | --- |
| `plan` | `List plan(List roles)` |
| `planForGoal` | `List planForGoal(String goalName, List roles)` |
| `executePlan` | `PlanResult executePlan(List steps, List roles)` |
| `extractCurrentState` | `Map extractCurrentState(List roles)` |
| `setPlannerHandle` | `void setPlannerHandle(PlannerHandle handle)` |
| `getPlannerHandle` | `Optional getPlannerHandle()` |
| `isPlanningEnabled` | `boolean isPlanningEnabled()` |

### Reasoning Strategy

Reasoning strategies (ReAct, Tree-of-Thought, etc.) are pluggable via `ReasoningStrategyHandle`.

```java
agent.setReasoningStrategy(reactStrategy);

// After chat, inspect reasoning
Optional result = agent.getLastReasoningResult();
```

When reasoning is enabled, `AgentOrchestrator.chat()` routes through the strategy instead of direct LLM invocation.

### Environment Perception

Agents can perceive and act on environments (BDI perception-to-belief pipeline).

```java
agent.setEnvironment(myEnvironment);
Percept percept = agent.perceive();
ActionOutcome outcome = agent.actOnEnvironment("moveForward", Map.of("speed", 5));
```

Environment changes trigger `onEnvironmentChangeWithRoles`, which checks for unsatisfied goals if a planner is configured.

### Feedback Collection

`FeedbackCollector` (SPI, provided by tnsai-intelligence) records tool outcomes for preference learning.

```java
agent.setFeedbackCollector(myCollector);
FeedbackCollector collector = agent.getFeedbackCollector();
```

Tool outcomes are recorded automatically via `recordToolOutcome(toolName, result, success, latencyMs)`.
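To make the feedback loop more concrete, the sketch below shows one way a collector might aggregate recorded tool outcomes into per-tool statistics. It is not the tnsai-intelligence implementation: `ToolOutcomeStats` and all of its methods are hypothetical names, and only the four recorded values (tool name, result, success flag, latency) mirror the `recordToolOutcome` arguments described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a feedback collector; not part of TnsAI.
class ToolOutcomeStats {
    // Per-tool counters: [0] = calls, [1] = successes, [2] = total latency in ms.
    private final Map<String, long[]> counters = new ConcurrentHashMap<>();

    // Same argument shape as recordToolOutcome(toolName, result, success, latencyMs).
    // The result object is ignored in this sketch.
    void record(String toolName, Object result, boolean success, long latencyMs) {
        counters.compute(toolName, (name, current) -> {
            long[] next = (current == null) ? new long[3] : current;
            next[0]++;
            if (success) {
                next[1]++;
            }
            next[2] += latencyMs;
            return next;
        });
    }

    // Fraction of successful calls, or 0.0 for a tool that was never recorded.
    double successRate(String toolName) {
        long[] c = counters.get(toolName);
        return (c == null || c[0] == 0) ? 0.0 : (double) c[1] / c[0];
    }

    // Mean latency in milliseconds, or 0.0 for a tool that was never recorded.
    double averageLatencyMs(String toolName) {
        long[] c = counters.get(toolName);
        return (c == null || c[0] == 0) ? 0.0 : (double) c[2] / c[0];
    }
}
```

A preference-learning component could use such statistics to favor tools with higher success rates or lower latency; how TnsAI actually consumes the feedback is not specified here.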
## AgentEnsembleExecutor

`com.tnsai.agents.ensemble.AgentEnsembleExecutor` provides multi-LLM ensemble operations accessed via `agent.getEnsembleExecutor()`.

### Patterns

| Method | Signature | Description |
| --- | --- | --- |
| `parallelChat` | `ParallelResults parallelChat(String message, List llms)` | Fan out to all LLMs, collect all responses |
| `raceChat` | `Optional raceChat(String message, List llms, long timeoutSeconds)` | First successful response wins |
| `consensusChat` | `Optional consensusChat(String message, List llms, Function scorer)` | Score responses, pick highest |
| `majorityVoteChat` | `Optional majorityVoteChat(String message, List llms)` | Most common response pattern |
| `ensembleChat` | `Optional ensembleChat(String message, List llms, LLMClient aggregator)` | Synthesize responses with aggregator LLM |

```java
// Fan-out to multiple LLMs
ParallelLLM.ParallelResults results = agent.getEnsembleExecutor()
    .parallelChat("Explain quantum computing", List.of(gpt4, claude, gemini));

// Race: first response wins
Optional fastest = agent.getEnsembleExecutor()
    .raceChat("Quick question", List.of(gpt4, claude), 10);

// Consensus with scoring
Optional best = agent.getEnsembleExecutor()
    .consensusChat("Review this code", llms, response -> scoreQuality(response));

// Ensemble synthesis
Optional synthesized = agent.getEnsembleExecutor()
    .ensembleChat("Complex analysis", List.of(gpt4, claude), aggregatorLLM);
```

## AgentHierarchyManager

`com.tnsai.agents.hierarchy.AgentHierarchyManager` manages parent-child relationships between agents, including bidirectional consistency and task delegation/escalation.
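Bidirectional consistency means that a child's parent reference and the parent's child list never disagree. A minimal sketch of that invariant follows, assuming a hypothetical `HierarchyNode` type; whether the real manager re-parents a child that already has a parent or rejects the call is an assumption made here, not something the documentation states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type used only to illustrate bidirectional linking.
// It performs no cycle detection across ancestors.
class HierarchyNode {
    private final String id;
    private HierarchyNode parent;
    private final List<HierarchyNode> children = new ArrayList<>();

    HierarchyNode(String id) {
        this.id = id;
    }

    // Adds a child and keeps both directions of the link in sync.
    boolean addChild(HierarchyNode child) {
        if (child == this || children.contains(child)) {
            return false;
        }
        if (child.parent != null) {
            // Assumption: detach from the previous parent instead of failing.
            child.parent.children.remove(child);
        }
        child.parent = this;
        children.add(child);
        return true;
    }

    // Removes a child and clears its parent reference.
    boolean removeChild(HierarchyNode child) {
        if (!children.remove(child)) {
            return false;
        }
        child.parent = null;
        return true;
    }

    String id() {
        return id;
    }

    HierarchyNode getParent() {
        return parent;
    }

    // Returns an immutable snapshot so callers cannot bypass addChild/removeChild.
    List<HierarchyNode> getChildren() {
        return List.copyOf(children);
    }
}
```

Returning a snapshot rather than the live list is the design choice that keeps the invariant enforceable: all mutations have to pass through the two methods that update both sides.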
### Hierarchy Setup

```java
Agent supervisor = new SupervisorAgent();
Agent developer = new DeveloperAgent();
Agent tester = new TesterAgent();

// Bidirectional parent-child links
supervisor.addChild(developer);
supervisor.addChild(tester);

// developer.getParent() == supervisor
// supervisor.getChildren() contains both

// Add multiple at once (designer is another Agent instance, not shown)
supervisor.addChildren(developer, tester, designer);
```

### Task Delegation and Escalation

```java
// Supervisor delegates to child
supervisor.delegateToChild(developer.id(), "writeCode", Map.of(
    "task", "Implement authentication",
    "language", "Java"
));

// Developer escalates to supervisor
developer.escalateToParent("requestApproval", Map.of(
    "reason", "Production deployment requires sign-off"
));
```

Both methods use `AgentCommunicationManager` to send `TaskMessageType.REQUEST` messages. `delegateToChild` throws `NoSuchElementException` if the child is not found; `escalateToParent` throws `IllegalStateException` if no parent is set.

### HierarchyContext Interface

The manager uses a `HierarchyContext` interface (implemented by `Agent`) to access internals:

```java
public interface HierarchyContext {
    String getAgentId();
    Agent getAgentRef();
    Agent getParentRef();
    void setParentRef(Agent parent);
    List getChildrenSnapshot();
    boolean hasChild(Agent child);
    boolean addChildDirect(Agent child);
    boolean removeChildDirect(Agent child);
    Agent findChildById(String childId);
    AgentCommunicationManager getCommunicationManager();
}
```

## AgentStreamingSupport

`com.tnsai.agents.streaming.AgentStreamingSupport` handles streaming and event-based chat operations.
### Token Streaming

```java
// Stream tokens as a Java Stream
Stream tokens = agent.streamChat("Tell me about Java");
tokens.forEach(System.out::print);
```

### Streaming with Tool Calls

The `streamChatWithTools` method combines streaming with multi-turn tool execution (up to 10 iterations):

```java
agent.streamChatWithTools("Search for TnsAI docs", chunk -> {
    if (chunk.isContent()) {
        System.out.print(chunk.getContent());
    } else if (chunk.isToolCall()) {
        System.out.println("Tool: " + chunk.getToolCall().get().getName());
    }
});
```

Flow: stream LLM response -> if tool calls received, execute them -> send results back to LLM -> repeat until final text or max iterations (10).

### Event-Based Chat (AG-UI)

```java
// With direct event consumer
agent.chatWithEvents("Hello", event -> {
    if (event instanceof TextDeltaEvent delta) {
        System.out.print(delta.getText());
    } else if (event instanceof ToolCallStartEvent tc) {
        System.out.println("Calling: " + tc.getToolName());
    }
});

// With session-based publisher
agent.chatWithEvents("Hello", sessionId);
```

Events emitted: `RunStartEvent`, `StatusEvent`, `TextDeltaEvent`, `ToolCallStartEvent`, `ToolCallEndEvent`, `ErrorEvent`, `RunEndEvent`.

## AgentChatOrchestrator

`com.tnsai.agents.chat.AgentChatOrchestrator` handles LLM invocation, response processing, structured output, and RAG augmentation.
### LLM Invocation

```java
// Full control
Object response = chatOrchestrator.invokeLLM(message, useHistory, useTools, trace);

// Process response (handles text and tool calls)
String result = chatOrchestrator.processLLMResponse(response, useHistory);
```

### Structured Output (Guardrails Pattern)

```java
// With custom parser
MyOutput output = agent.chatWithStructure(
    "Extract the key points",
    new MyOutputParser(),
    3  // maxRetries for correction
);

// With format detection (JSON, YAML, TOML)
MyRecord detected = agent.chatWithFormat("Generate config", MyRecord.class, 2);

// With explicit format
MyRecord yamlRecord = agent.chatWithFormat("Generate config", MyRecord.class, OutputFormat.YAML, 2);
```

The orchestrator detects output format from `@OutputFormatSpec` on the target class or agent class, defaulting to JSON.

### RAG (Knowledge Base Augmentation)

When a `KnowledgeBase` is set on the agent, the orchestrator automatically augments user messages with relevant knowledge results before sending to the LLM:

```java
agent.setKnowledgeBase(myKnowledgeBase);
agent.setKnowledgeBaseTopK(5);

// Messages are now automatically augmented with knowledge context
agent.chat("What is our refund policy?");
```

## AgentOrchestrator

`com.tnsai.agents.orchestration.AgentOrchestrator` is the top-level coordinator that wires together `AgentToolExecutor`, `AgentChatOrchestrator`, `AgentStreamingSupport`, and `AgentEnsembleExecutor`. It implements the context interfaces for these sub-delegates so they can access agent internals without circular dependencies.
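The context-interface arrangement described above can be sketched in a few lines: each delegate depends only on a narrow interface, and the coordinator implements that interface and passes itself in. All names below (`ChatContext`, `ChatDelegate`, `Coordinator`) are hypothetical illustrations, not TnsAI types, and the "LLM call" is a placeholder string.

```java
import java.util.ArrayList;
import java.util.List;

// Narrow view of the coordinator that a delegate is allowed to see.
interface ChatContext {
    String systemPrompt();

    void appendToHistory(String role, String content);
}

// The delegate depends only on the interface, never on the coordinator class,
// so there is no compile-time cycle between the two.
class ChatDelegate {
    private final ChatContext context;

    ChatDelegate(ChatContext context) {
        this.context = context;
    }

    String handle(String message) {
        context.appendToHistory("user", message);
        // Placeholder for a real LLM invocation.
        String reply = "[" + context.systemPrompt() + "] echo: " + message;
        context.appendToHistory("assistant", reply);
        return reply;
    }
}

// The coordinator implements the context and hands itself to the delegate.
class Coordinator implements ChatContext {
    private final String systemPrompt;
    private final List<String> history = new ArrayList<>();
    private final ChatDelegate chatDelegate = new ChatDelegate(this);

    Coordinator(String systemPrompt) {
        this.systemPrompt = systemPrompt;
    }

    @Override
    public String systemPrompt() {
        return systemPrompt;
    }

    @Override
    public void appendToHistory(String role, String content) {
        history.add(role + ": " + content);
    }

    String chat(String message) {
        return chatDelegate.handle(message);
    }

    List<String> history() {
        return List.copyOf(history);
    }
}
```

A side benefit of this shape is testability: a delegate can be exercised against a small fake `ChatContext` without constructing a full coordinator.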
### Key Public Methods

| Method | Description |
| --- | --- |
| `chat(String message, boolean useHistory, boolean useTools)` | Main chat entry point with eval hooks and context graph tracing |
| `streamChat(String message)` | Token-by-token streaming |
| `streamChatWithTools(String message, Consumer handler)` | Streaming with tool calling loop |
| `chatWithEvents(String message, Consumer consumer)` | Event-based chat |
| `chatWithEvents(String message, String sessionId)` | Event-based chat with session publisher |
| `executeActionPublic(String name, Map params)` | Execute a named action |
| `executeActionTyped(ActionRequest request)` | Execute with typed request/response |
| `executeActionOnRole(String roleId, String name, Map params)` | Execute on a specific role |
| `setToolCallFilter(ToolCallFilter filter)` | Set permission control for tool calls |
| `setToolCallListener(ToolCallListener listener)` | Set progress callbacks |
| `setKnowledgeBase(KnowledgeBase kb)` | Set RAG knowledge base |
| `shutdown()` | End context conversations |

### Context Graph Integration

When context graph is enabled, the orchestrator automatically:

- Starts/ends context conversations around chat sessions
- Creates snapshots at conversation boundaries
- Records `DecisionTrace` entries for each chat (success or failure)

## AgentCapabilities

`com.tnsai.agents.capabilities.AgentCapabilities` is the capability facade that bridges cognitive support, resilience, variants, and context graphs. Agent delegates to this class for all capability operations.
### Variant Selection

```java
agent.setVariant(AgentVariant.HIGH);
AgentVariant current = agent.getVariant();

// Auto-resolve based on task
AgentVariant resolved = agent.resolveVariant("Complex refactoring task");

// Custom selector
agent.setVariantSelector(mySelector);
```

### Context Graph

```java
agent.enableContextGraph(snapshotStore, decisionTraceStore);
Optional cm = agent.getContextManager();
boolean enabled = agent.isContextGraphEnabled();
```

### Resilience

```java
AgentHealthState health = agent.getHealthState();
ResilienceExecutor executor = agent.getResilienceExecutor();
RetryPolicy policy = agent.getDefaultRetryPolicy();
DeadLetterQueue dlq = agent.getDeadLetterQueue();
agent.clearRecoveryState();
```

## Related Documentation

- [Streaming](/docs/agents/behavior/streaming) -- streaming basics and ChatChunk
- [Tools](/docs/capabilities/tools/registration) -- tool registration and the Tool interface
- [Roles](/docs/agents/fundamentals/roles) -- role-based action discovery
- [Events](/docs/agents/fundamentals/events) -- TnsAI event system
- [Resilience](/docs/agents/reliability/resilience) -- retry, circuit breaker, recovery
- [Variants](/docs/agents/behavior/variants) -- AgentVariant cost/quality tiers

---

# Behavior

URL: https://tnsai.dev/docs/agents/behavior

Description: How your agent talks, streams, and remembers.

import { Callout } from 'fumadocs-ui/components/callout'

## Pages

- [Prompt Strategies](/docs/agents/behavior/prompt-strategies) — System prompt templates, role composition, example injection.
- [Output Parsing](/docs/agents/behavior/output-parsing) — Structured output with `chatWithFormat`, retries, validation.
- [Streaming](/docs/agents/behavior/streaming) — `streamChatWithTools`, chunk types, event flow.
- [Variants](/docs/agents/behavior/variants) — Quality, speed, cost tiers on a single agent.
- [Memory](/docs/agents/behavior/memory) — Conversation history, window management, summarization.
---

# Memory

URL: https://tnsai.dev/docs/agents/behavior/memory

Description: TnsAI.Core provides a pluggable memory system for agent conversation history. The MemoryStore interface defines storage, retrieval, pruning, and search operations. Four implementations cover different persistence and sharing requirements. The AgentBuilder.memoryStore() method wires a store into an agent.

import { Callout } from 'fumadocs-ui/components/callout'

## MemoryStore Interface

`MemoryStore` is the contract that all memory implementations must follow. It defines how an agent saves, retrieves, prunes, and searches its conversation history. You pick an implementation based on your persistence needs (in-memory for development, file-based for production, shared for multi-agent setups).

```java
public interface MemoryStore {
    void init(String agentId);
    void addMessage(String role, String content);
    void addMessage(Map message);
    List> getHistory();
    List> getRecentHistory(int limit);
    void clear();
    void prune(int maxTokens);
    default List search(String query, int limit); // default implementation returns an empty list
}
```

| Method | Description |
| --- | --- |
| `init(String agentId)` | Initializes the store for a specific agent. Triggers history loading for persistent stores. |
| `addMessage(String role, String content)` | Adds a simple message with role (`user`, `assistant`, `system`) and content. |
| `addMessage(Map message)` | Adds a structured message (e.g., with tool calls, metadata). The map is defensively copied. |
| `getHistory()` | Returns the full conversation history as an unmodifiable list. |
| `getRecentHistory(int limit)` | Returns the last N messages. |
| `clear()` | Removes all messages. |
| `prune(int maxTokens)` | Removes oldest messages until the estimated token count is within the limit. System prompts are preserved when possible. |
| `search(String query, int limit)` | Semantic search over history. Default returns empty list. `AbstractMemoryStore` provides a TF-IDF implementation. |

## AbstractMemoryStore

If you want to create your own memory store, extend `AbstractMemoryStore` instead of implementing `MemoryStore` from scratch. It handles thread safety, pruning, and search out of the box -- you only need to define how history is loaded and what happens after it changes.

Built-in stores such as `InMemoryStore` and `FileMemoryStore` extend `AbstractMemoryStore`, which provides:

- **Thread safety** via `ReentrantLock` (Virtual Thread friendly, avoids carrier thread pinning)
- **LinkedList** backing for O(1) removal during pruning
- **TF-IDF search** using log-normalized term frequency and smooth inverse document frequency
- **Extension hooks**: `onHistoryChanged()` and `loadHistory()`

Subclasses only need to override two methods:

```java
// Called after any modification (addMessage, clear, prune)
protected void onHistoryChanged() { }

// Called during init() to load persisted history
protected void loadHistory() { }
```

Additional utility method:

```java
// Returns current history size
public int size();
```

## InMemoryStore

The simplest memory store -- it keeps conversation history in a plain list in memory. It is fast and requires zero setup, but all data is lost when the process exits.

```java
public class InMemoryStore extends AbstractMemoryStore {
    public InMemoryStore();
}
```

All functionality is inherited from `AbstractMemoryStore` with no persistence hooks. Suitable for:

- Development and testing
- Short-lived conversations
- Scenarios where persistence is not required

This is the default store used by `AgentBuilder` when no store is explicitly configured.

## FileMemoryStore

When you need conversations to survive restarts, use `FileMemoryStore`.
It writes each agent's history to a JSON file on disk after every change and reloads it automatically when the agent initializes.

```java
public class FileMemoryStore extends AbstractMemoryStore {
    public FileMemoryStore();                       // uses default dir: .tnsai/memory/
    public FileMemoryStore(String storageDirPath);
    public Path getAgentFile();                     // path to agent's JSON file
}
```

Each agent's history is stored in a separate file: `{storageDirPath}/{agentId}.json`.

Suitable for:

- Production environments requiring conversation persistence
- Long-running agents that need to resume conversations
- Multi-session scenarios

Example:

```java
FileMemoryStore store = new FileMemoryStore("./data/memory");
store.init("agent-1");
store.addMessage("user", "Hello");
// Automatically saved to ./data/memory/agent-1.json

// On restart, history is loaded from file
FileMemoryStore restored = new FileMemoryStore("./data/memory");
restored.init("agent-1");
List> history = restored.getHistory();
// Contains previous messages
```

## SharedMemoryStore

In multi-agent systems, you often want several agents to read and write the same conversation history. `SharedMemoryStore` wraps any other `MemoryStore` and adds an access-control layer so only authorized agents can interact with the shared memory.

```java
public class SharedMemoryStore implements MemoryStore {
    public SharedMemoryStore(MemoryStore delegate, String namespace);
    public void grantAccess(String agentId);
    public void revokeAccess(String agentId);
    public boolean hasAccess(String agentId);
    public String getNamespace();
}
```

Thread safety uses `ReentrantReadWriteLock` for high-throughput concurrent reads with exclusive writes. Only the first authorized agent to call `init()` initializes the underlying store. Calling `init()` with an unauthorized agent ID throws `IllegalStateException`.
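The locking and initialization rules above can be illustrated with a simplified, self-contained sketch. `GuardedHistory` is a hypothetical class that stores plain strings rather than wrapping a `MemoryStore`; it is meant only to show the read/write lock split, the first-caller-initializes rule, and the `IllegalStateException` on unauthorized `init()`, not the actual `SharedMemoryStore` internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for an access-controlled shared history.
class GuardedHistory {
    private final Set<String> allowed = ConcurrentHashMap.newKeySet();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> messages = new ArrayList<>();
    private String initializedBy; // guarded by the lock

    void grantAccess(String agentId) {
        allowed.add(agentId);
    }

    // Only the first authorized caller performs initialization; later calls are no-ops.
    void init(String agentId) {
        if (!allowed.contains(agentId)) {
            throw new IllegalStateException("Agent not authorized: " + agentId);
        }
        lock.writeLock().lock();
        try {
            if (initializedBy == null) {
                initializedBy = agentId;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Writes take the exclusive lock.
    void add(String message) {
        lock.writeLock().lock();
        try {
            messages.add(message);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reads share the read lock, so many readers can proceed concurrently.
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            return List.copyOf(messages);
        } finally {
            lock.readLock().unlock();
        }
    }

    String initializedBy() {
        lock.readLock().lock();
        try {
            return initializedBy;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```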
Example:

```java
MemoryStore base = new InMemoryStore();
SharedMemoryStore shared = new SharedMemoryStore(base, "team-chat");
shared.grantAccess("agent-a");
shared.grantAccess("agent-b");

// First agent initializes the shared memory
shared.init("agent-a");

// Both agents can read and write
shared.addMessage("assistant", "Hello from agent-a");
List> history = shared.getHistory(); // visible to all authorized agents
```

## SharedMemoryRegistry

When you have many shared memory namespaces across your application, `SharedMemoryRegistry` acts as a central lookup. It ensures each namespace has exactly one `SharedMemoryStore` instance, creating one on demand if it does not exist yet. The framework uses this internally when processing `@Memory(shared = true, shareWith = {...})` annotations.

```java
public final class SharedMemoryRegistry {
    public static SharedMemoryRegistry getInstance();
    public SharedMemoryStore getOrCreate(String namespace);
    public SharedMemoryStore getOrCreate(String namespace, MemoryStore baseStore);
    public Optional find(String namespace);
    public boolean remove(String namespace);
    public void clear();
    public int size();
}
```

`getOrCreate(String namespace)` creates a `SharedMemoryStore` wrapping an `InMemoryStore` if the namespace does not exist. Use the two-argument overload to provide a custom base store (e.g., `FileMemoryStore` for persistence).
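The "exactly one instance per namespace, created on demand" guarantee is commonly built on `ConcurrentHashMap.computeIfAbsent`, which runs the factory at most once per key. The generic `NamespaceRegistry` below is a hypothetical sketch of that idea; it is an assumption that the real registry works this way.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical generic registry illustrating get-or-create semantics.
class NamespaceRegistry<T> {
    private final Map<String, T> stores = new ConcurrentHashMap<>();

    // Returns the existing instance for the namespace, or creates it atomically.
    // The factory is only invoked when the namespace is not yet registered.
    T getOrCreate(String namespace, Supplier<T> factory) {
        return stores.computeIfAbsent(namespace, ns -> factory.get());
    }

    Optional<T> find(String namespace) {
        return Optional.ofNullable(stores.get(namespace));
    }

    boolean remove(String namespace) {
        return stores.remove(namespace) != null;
    }

    int size() {
        return stores.size();
    }
}
```

One consequence of this pattern is that a second `getOrCreate` call with a different factory silently returns the first instance, so a custom base store only takes effect if it is supplied on the call that creates the namespace.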
Example:

```java
SharedMemoryRegistry registry = SharedMemoryRegistry.getInstance();

// Get or create a shared namespace
SharedMemoryStore shared = registry.getOrCreate("project-alpha");
shared.grantAccess("researcher");
shared.grantAccess("writer");

// With custom persistence
SharedMemoryStore persistent = registry.getOrCreate(
    "persistent-namespace",
    new FileMemoryStore("./shared-memory")
);

// Look up existing namespace
Optional found = registry.find("project-alpha");

// Clean up
registry.remove("project-alpha");
```

## MemoryStoreFactory

If you use the `@MemorySpec` annotation to configure memory declaratively, `MemoryStoreFactory` is what turns that annotation into an actual `MemoryStore` instance at runtime. It reads the annotation's fields and automatically wraps the base store with decorators for capacity limits, token budgets, or summarization as needed.

```java
public final class MemoryStoreFactory {
    public static MemoryStore create(MemorySpec config);
    public static MemoryStore createDefault(); // returns new InMemoryStore()
}
```

Supported persistence types: `IN_MEMORY`, `FILE`, `REDIS` (via SPI), `DATABASE` (via SPI).

The factory automatically wraps the base store with decorators based on configuration:

- **CapacityAwareMemoryStore** -- enforces message count limits with configurable prune strategy
- **SummarizingMemoryStore** -- summarizes old messages instead of deleting them when capacity is exceeded
- **TokenAwareMemoryStore** -- enforces token limits by pruning after each message

## Integration with AgentBuilder

The most common way to configure memory is through the `AgentBuilder`. Call `.memoryStore()` to plug in any `MemoryStore` implementation. If you skip this step, the agent defaults to `InMemoryStore` (volatile, no persistence).
```java
// Default (in-memory, volatile)
Agent defaultAgent = AgentBuilder.create()
    .model("claude-sonnet-4")
    .build();

// Persistent file storage
Agent persistentAgent = AgentBuilder.create()
    .model("claude-sonnet-4")
    .memoryStore(new FileMemoryStore("./data/memory"))
    .build();

// Shared memory between agents
SharedMemoryStore shared = SharedMemoryRegistry.getInstance()
    .getOrCreate("team-namespace");
shared.grantAccess("agent-1");
shared.grantAccess("agent-2");

Agent agent1 = AgentBuilder.create()
    .model("claude-sonnet-4")
    .memoryStore(shared)
    .build();

Agent agent2 = AgentBuilder.create()
    .model("gpt-4o")
    .memoryStore(shared)
    .build();
```

## Code Examples

These examples demonstrate typical memory usage patterns, from basic conversation persistence to semantic search and token-aware pruning.

### Conversation with History Management

This example creates an agent with file-based memory so conversations survive restarts. You can also access the memory store directly to inspect recent history.

```java
Agent agent = AgentBuilder.create()
    .model("claude-sonnet-4")
    .memoryStore(new FileMemoryStore("./memory"))
    .build();

// Conversation persists across restarts
agent.chat("What is the capital of France?");
agent.chat("What about Germany?");

// Access history directly
MemoryStore memory = agent.getMemoryStore();
List> recent = memory.getRecentHistory(5);
```

### Semantic Search over History

The built-in TF-IDF search ranks past messages by how well their terms match the query, so you can find relevant messages without scrolling through the full history. This is useful for agents that need to recall earlier context from a long conversation.
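To show what TF-IDF ranking does, here is a small standalone sketch using log-normalized term frequency, `1 + ln(count)`, and a smoothed inverse document frequency, `ln((1 + N) / (1 + df)) + 1`. The class name `TfIdfSearch`, the tokenizer, and the exact smoothing formula are assumptions chosen for illustration; `AbstractMemoryStore` may differ in its details. Because scoring is term-based, a message only matches if it shares words with the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical TF-IDF ranker; not the TnsAI implementation.
class TfIdfSearch {
    // Lowercases and splits on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Returns up to `limit` documents with a positive score, best match first.
    static List<String> search(List<String> docs, String query, int limit) {
        int n = docs.size();
        List<Map<String, Integer>> termCounts = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> counts = new HashMap<>();
            for (String token : tokenize(doc)) {
                counts.merge(token, 1, Integer::sum);
            }
            termCounts.add(counts);
            for (String term : counts.keySet()) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }

        Set<String> queryTerms = new LinkedHashSet<>(tokenize(query));
        double[] scores = new double[n];
        List<Integer> ranked = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (String term : queryTerms) {
                int count = termCounts.get(i).getOrDefault(term, 0);
                if (count == 0) {
                    continue;
                }
                double tf = 1.0 + Math.log(count);                                   // log-normalized TF
                double idf = Math.log((1.0 + n) / (1.0 + docFreq.get(term))) + 1.0; // smoothed IDF
                scores[i] += tf * idf;
            }
            if (scores[i] > 0) {
                ranked.add(i);
            }
        }
        ranked.sort((a, b) -> Double.compare(scores[b], scores[a]));

        List<String> results = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, ranked.size()); i++) {
            results.add(docs.get(ranked.get(i)));
        }
        return results;
    }
}
```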
```java
MemoryStore store = new InMemoryStore();
store.init("search-agent");

store.addMessage("user", "Tell me about Java generics");
store.addMessage("assistant", "Java generics provide compile-time type safety...");
store.addMessage("user", "How do I create a REST API?");
store.addMessage("assistant", "You can use Spring Boot or Javalin...");

// TF-IDF search finds relevant messages
List results = store.search("generics type safety", 2);
// Returns messages about Java generics ranked by relevance
```

### Token-Aware Pruning

LLMs have a maximum context window size. When your conversation history grows too large, call `prune()` to remove the oldest messages until the estimated token count fits within the model's limit. System prompts are preserved when possible.

```java
MemoryStore store = new InMemoryStore();
store.init("pruning-agent");

// Add many messages
for (int i = 0; i < 1000; i++) {
    store.addMessage("user", "Message " + i + " with some content");
    store.addMessage("assistant", "Response to message " + i);
}

// Prune to fit context window
store.prune(4096);
// Oldest messages are removed, keeping history within token limit
```

## Advanced Memory

The `com.tnsai.memory.advanced` package provides production-grade memory capabilities for agents that need more than simple history management. These include vector-based semantic search, keyword-based BM25 indexing, hybrid retrieval that combines both, skill persistence, importance-based pruning, staleness detection, and full session serialization.

### VectorMemoryStore

When you need to find past messages based on meaning (not just keywords), `VectorMemoryStore` converts each message into an embedding vector and uses cosine similarity to find the closest matches. This powers true semantic retrieval over conversation history.
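Cosine similarity itself is simple to state: the dot product of two vectors divided by the product of their lengths. The sketch below shows the computation and a brute-force nearest-neighbor lookup; `CosineSimilarity` is a hypothetical helper, and a real vector store would typically use an index rather than a linear scan.

```java
import java.util.List;

// Hypothetical helper illustrating cosine similarity over embedding vectors.
class CosineSimilarity {
    // Returns a value in [-1, 1]; 0.0 if either vector has zero length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Dimension mismatch");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Linear scan for the index of the most similar vector, or -1 if the list is empty.
    static int nearest(double[] query, List<double[]> vectors) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < vectors.size(); i++) {
            double score = cosine(query, vectors.get(i));
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```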
```java
VectorMemoryStore vectorStore = VectorMemoryStore.builder()
    .embeddingClient(embeddingClient)
    .dimensions(1536)
    .build();

vectorStore.init("agent-1");
vectorStore.addMessage("user", "The deployment uses Kubernetes with 3 replicas");

// Semantic search -- finds relevant messages even with different wording
List results = vectorStore.search("container orchestration setup", 5);
```

### BM25Index

While vector search excels at finding semantically similar content, sometimes you need exact keyword matching. `BM25Index` uses the BM25 scoring algorithm (the same approach behind traditional search engines) to rank documents by how well they match specific terms.

```java
BM25Index index = new BM25Index();
index.addDocument("doc-1", "Kubernetes deployment configuration");
index.addDocument("doc-2", "Database connection pooling settings");

List results = index.search("Kubernetes", 5);
```

### HybridMemoryRetriever

For the best retrieval quality, combine both approaches. `HybridMemoryRetriever` runs a vector search and a BM25 keyword search in parallel, then merges the results using Reciprocal Rank Fusion (RRF). This captures both semantic meaning and exact keyword matches in a single query.

```java
HybridMemoryRetriever retriever = HybridMemoryRetriever.builder()
    .vectorStore(vectorStore)
    .bm25Index(bm25Index)
    .vectorWeight(0.6)
    .keywordWeight(0.4)
    .fusionK(60)
    .build();

List results = retriever.search("deployment configuration", 10);
```

In standard RRF, each document's fused score is the sum of `1 / (k + rank)` over the ranked lists it appears in; `fusionK(60)` presumably sets that `k`, with the two weights scaling each list's contribution.

### SkillMemoryStore

Agents can learn reusable procedures and remember them across sessions. `SkillMemoryStore` saves these skills as Markdown files on disk, making them human-readable, easy to version-control with Git, and editable by hand if needed.
```java
SkillMemoryStore skillStore = new SkillMemoryStore(".tnsai/skills/");

skillStore.saveSkill("deploy-k8s", SkillEntry.builder()
    .name("deploy-k8s")
    .description("Deploy application to Kubernetes cluster")
    .steps(List.of("Build Docker image", "Push to registry", "Apply manifests"))
    .tags(Set.of("devops", "kubernetes"))
    .build());

Optional skill = skillStore.loadSkill("deploy-k8s");
List devopsSkills = skillStore.findByTag("devops");
```

### MemoryImportanceScorer

When memory grows too large and needs pruning, not all entries are equally valuable. `MemoryImportanceScorer` assigns an importance score to each entry based on recency, access frequency, semantic distinctiveness, and user-marked importance, so the pruning process can keep the most valuable memories.

```java
MemoryImportanceScorer scorer = new MemoryImportanceScorer();
double importance = scorer.score(memoryEntry);
// Factors: recency, access frequency, semantic distinctiveness, user-marked importance
```

### StalenessDetector

Over time, memory accumulates near-duplicate or outdated entries that waste context window space. `StalenessDetector` uses Jaccard similarity to find entries that are too similar to each other, and age thresholds to flag entries that are too old, so you can clean them up.

```java
StalenessDetector detector = StalenessDetector.builder()
    .similarityThreshold(0.85)    // entries above 85% similarity are stale
    .maxAge(Duration.ofDays(30))  // entries older than 30 days are candidates
    .build();

List staleIds = detector.detectStale(memoryEntries);
```

### SessionSerializer / SessionRestorer

If you need to save and restore an agent's complete session state (not just conversation history, but all runtime state), these utilities serialize the session to bytes and restore it later. This is useful for long-running agents that need to survive process restarts without losing their place.
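As a rough illustration of a bytes-in, bytes-out round trip, the sketch below uses standard Java object serialization on a hypothetical `SessionSnapshot` class. The wire format that `SessionSerializer` actually uses is not documented here, so both the snapshot type and the choice of `ObjectOutputStream` are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical session state; the real AgentSession holds more than this.
class SessionSnapshot implements Serializable {
    private static final long serialVersionUID = 1L;

    final String agentId;
    final ArrayList<String> history;

    SessionSnapshot(String agentId, List<String> history) {
        this.agentId = agentId;
        this.history = new ArrayList<>(history); // defensive, serializable copy
    }
}

// Hypothetical codec showing a serialize/restore round trip.
class SnapshotCodec {
    static byte[] serialize(SessionSnapshot snapshot) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(snapshot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static SessionSnapshot restore(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SessionSnapshot) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Snapshot class not on classpath", e);
        }
    }
}
```

Native Java deserialization should only be applied to data you wrote yourself; for snapshots that cross a trust boundary, a schema-based format such as JSON is the safer choice.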
```java
// Save session state
SessionSerializer serializer = new SessionSerializer();
byte[] data = serializer.serialize(agent.getSession());
Files.write(Path.of("session-backup.bin"), data);

// Restore session state
SessionRestorer restorer = new SessionRestorer();
AgentSession restored = restorer.restore(Files.readAllBytes(Path.of("session-backup.bin")));
agent.restoreSession(restored);
```

---

# Output Parsing & Serialization

URL: https://tnsai.dev/docs/agents/behavior/output-parsing

Description: TnsAI provides type-safe output parsing for converting raw LLM responses into structured Java objects, and a multi-format serialization system for producing structured output.

import { Callout } from 'fumadocs-ui/components/callout'

## OutputParser\<T\> Interface

LLMs return free-form text, but your application usually needs structured Java objects. The `OutputParser` interface defines the contract for converting raw LLM text into a typed object of your choice. It also provides prompt instructions you can include in your LLM call so the model knows what format to produce.

| Method | Description |
| --- | --- |
| `parse(String)` | Returns a `ParseResult<T>` (success or error) |
| `parseOrThrow(String)` | Returns `T` or throws `ParseException` |
| `parseOptional(String)` | Returns `Optional<T>` |
| `getTargetType()` | Returns the `Class<T>` this parser produces |
| `getFormatInstruction()` | Prompt text guiding the LLM to produce the expected format |
| `getSchemaDescription()` | Schema string (e.g. JSON Schema) for the target type |
| `getErrorCorrectionPrompt(failedOutput, error)` | Generates a retry prompt when parsing fails |

```java
OutputParser<WeatherResponse> parser = JsonOutputParser.forType(WeatherResponse.class);
ParseResult<WeatherResponse> result = parser.parse(llmOutput);

if (result.isSuccess()) {
    WeatherResponse weather = result.get();
}
```

## JsonOutputParser\<T\>

The most commonly used parser.
It extracts JSON from LLM responses -- even when the model wraps JSON in markdown code blocks or surrounds it with explanatory text -- and deserializes it into your Java class using Jackson.

**Features:**

- Extracts JSON from ` ```json ` code blocks
- Falls back to raw `{...}` or `[...]` detection
- Supports Java Records and POJOs
- Field validation via `OutputValidator`
- Auto-generates schema descriptions from target type reflection

### Factory method

The simplest way to create a `JsonOutputParser` is with the static `forType()` factory. It sets up sensible Jackson defaults and auto-generates schema descriptions from your target class.

```java
JsonOutputParser<Person> parser = JsonOutputParser.forType(Person.class);
```

### Builder

For more control, use the builder to supply a custom Jackson `ObjectMapper`, enable strict mode (which rejects unknown JSON properties), or plug in a custom validator.

```java
JsonOutputParser<Person> parser = JsonOutputParser.builder(Person.class)
    .objectMapper(customMapper)    // custom Jackson ObjectMapper
    .strictMode(true)              // fail on unknown properties
    .validator(customValidator)    // custom OutputValidator
    .build();
```

### Parsing LLM output with embedded JSON

This example demonstrates the parser's ability to extract a JSON block from an LLM response that includes surrounding explanatory text. The parser automatically locates the JSON within the markdown code fence and deserializes it.

````java
record Person(String name, int age) {}

JsonOutputParser<Person> parser = JsonOutputParser.forType(Person.class);
ParseResult<Person> result = parser.parse("""
    Here's the person data:
    ```json
    {"name": "John", "age": 30}
    ```
    """);
Person person = result.get(); // Person[name=John, age=30]
````

Default ObjectMapper settings:

- `FAIL_ON_UNKNOWN_PROPERTIES = false`
- `ACCEPT_SINGLE_VALUE_AS_ARRAY = true`
- `INDENT_OUTPUT = true`
- `NON_NULL` property inclusion

## RetryableParser\<T>

LLMs occasionally produce malformed output -- missing fields, broken JSON, or wrong structure.
`RetryableParser` wraps any parser and handles this automatically: when parsing fails, it sends the LLM an error-correction prompt explaining what went wrong and asks it to try again, up to a configurable number of retries.

### Wrapping a parser

To add retry behavior, wrap your existing parser with `RetryableParser.wrap()`. You can optionally configure the maximum number of retries (default is 3).

```java
JsonOutputParser<Person> baseParser = JsonOutputParser.forType(Person.class);
RetryableParser<Person> parser = RetryableParser.wrap(baseParser)
    .maxRetries(3)    // default is 3
    .build();
```

### Manual retry flow

If you want to control the retry loop yourself (for example, to use a different LLM for corrections), you can get the correction prompt and send it manually.

```java
ParseResult<Person> result = parser.parse(llmOutput);
if (result.isFailure()) {
    String correctionPrompt = parser.getCorrectionPrompt(llmOutput, result.getError());
    // Send correctionPrompt to LLM, then parse the new response
}
```

### Automatic retry with LLM function

The easiest approach: pass `parseWithRetry` a function that calls your LLM. It will automatically loop -- sending correction prompts and re-parsing -- up to `maxRetries` times until parsing succeeds or retries are exhausted.

```java
ParseResult<Person> result = parser.parseWithRetry(initialOutput, prompt -> {
    return llmClient.chat(prompt); // your LLM call
});
if (result.isSuccess()) {
    Person person = result.get();
}
```

### Attempt tracking

The retryable parser records every attempt so you can inspect what happened during the retry loop -- useful for debugging and monitoring parse success rates.

```java
parser.getAttemptCount();    // total attempts made
parser.getAttempts();        // List (output, success, error)
parser.getLastAttempt();     // most recent attempt
parser.clearAttempts();      // reset history
```

## ParseResult\<T>

Every parser returns a `ParseResult` instead of throwing exceptions or returning null.
It is a monadic result type (similar to Rust's `Result` or Scala's `Either`) that always tells you whether parsing succeeded or failed, and gives you safe access to the value or the error message.

### Construction

You create `ParseResult` instances through static factory methods rather than constructors. Parsers return these automatically, but you can also create them yourself for testing or custom parsing logic.

| Factory method | Description |
| --- | --- |
| `ParseResult.success(value, rawOutput, parseTimeMs)` | Successful parse with timing |
| `ParseResult.success(value)` | Successful parse (shorthand) |
| `ParseResult.failure(error, rawOutput)` | Failed parse |
| `ParseResult.validationFailure(error, validationErrors, rawOutput)` | Failed validation with details |

### Querying

These methods let you inspect the result, extract the parsed value, or get error details without risking null pointer exceptions.

```java
result.isSuccess();              // true if parsed
result.isFailure();              // true if error
result.get();                    // value or throws IllegalStateException
result.getOrElse(defaultVal);    // value or fallback
result.getError();               // error message (null on success)
result.getValidationErrors();    // List validation details
result.getRawOutput();           // original LLM text
result.getParseTimeMs();         // parse duration in ms
result.toOptional();             // Optional
```

### Transformations

Like `Optional` or `Stream`, `ParseResult` supports `map` and `flatMap` so you can transform the parsed value without unwrapping it first. Failures pass through unchanged.

```java
// Map to a different type
ParseResult nameResult = result.map(User::name);

// FlatMap to another ParseResult
ParseResult addr = result.flatMap(user -> parseAddress(user.addressJson()));
```

### Side effects

Use these methods to run an action only when the result is a success or a failure, without needing an `if` statement. This keeps your code concise and readable.

```java
// Conditional actions
result.ifSuccess(user -> save(user));
result.ifFailure(error -> log.warn(error));

// Handle both cases
result.ifSuccessOrElse(
    user -> System.out.println("Parsed: " + user),
    error -> System.err.println("Error: " + error)
);
```

## OutputSerializer Interface

While `OutputParser` converts LLM text into Java objects, `OutputSerializer` does the reverse: it converts Java objects into structured text formats (JSON, YAML, etc.) and back again. This is useful when you need to produce output in a specific format, or when re-serializing a parsed result into a different format for downstream consumers.

| Method | Description |
| --- | --- |
| `getFormat()` | The `OutputFormat` this serializer handles |
| `serialize(data, prettyPrint)` | Serialize an object to string |
| `serialize(data)` | Serialize with pretty print (default) |
| `serializeList(items, itemClass, prettyPrint)` | Serialize a typed list |
| `deserialize(data, targetClass)` | Deserialize string to object |
| `deserializeList(data, itemClass)` | Deserialize string to typed list |
| `getFormatInstructions(targetClass)` | LLM prompt instructions for single-object output |
| `getListFormatInstructions(itemClass)` | LLM prompt instructions for list output |
| `supportsType(dataClass)` | Check if format supports a data structure |

### OutputFormat enum

TnsAI supports five output formats. JSON and YAML are standard, while TOON and TONL are custom token-optimized formats that significantly reduce token usage when sending structured data to LLMs.

| Format | Extension | MIME Type | Nesting | Token Efficiency |
| --- | --- | --- | --- | --- |
| `JSON` | `.json` | `application/json` | Yes | Baseline |
| `YAML` | `.yaml` | `application/x-yaml` | Yes | \~10-15% fewer tokens |
| `TOON` | `.toon` | `text/x-toon` | Yes | \~40% fewer tokens |
| `TONL` | `.tonl` | `text/x-tonl` | Yes | \~32-50% fewer tokens |
| `TEXT` | `.txt` | `text/plain` | No | N/A |

### Implementations

Each format has a dedicated serializer class. You rarely need to use these directly -- the `OutputSerializerRegistry` provides a simpler API for accessing them.

- **JsonOutputSerializer** -- Jackson-based JSON with configurable pretty printing.
- **YamlOutputSerializer** -- Zero-dependency YAML with multi-line string support and flow-style compact lists.
- **ToonOutputSerializer** -- Token-Optimized Object Notation for uniform arrays.
- **TonlOutputSerializer** -- Token-Optimized Notation Language with schema support.
- **TextOutputSerializer** -- Plain `toString()` serialization. Deserialization limited to `String` and basic primitives.

### OutputSerializerRegistry

The registry is a one-stop shop for serialization. It holds all built-in serializers and provides convenience methods so you do not need to look up or instantiate serializers yourself. You can also register custom serializers here.

```java
OutputSerializerRegistry registry = OutputSerializerRegistry.getInstance();

// Get a specific serializer
OutputSerializer jsonSerializer = registry.getSerializer(OutputFormat.JSON);
String json = jsonSerializer.serialize(myObject);

// Convenience methods
String yaml = registry.serialize(myObject, OutputFormat.YAML);
Person person = registry.deserialize(jsonString, OutputFormat.JSON, Person.class);

// LLM prompt instructions
String instructions = registry.getFormatInstructions(OutputFormat.JSON, Person.class);

// Register a custom serializer
registry.register(OutputFormat.JSON, new CustomJsonSerializer());

// Reset to defaults
registry.reset();
```

## Full Example

This end-to-end example shows the complete workflow: defining an output type as a Java record, creating a parser with retry support, including format instructions in the prompt, parsing the LLM response with automatic correction, handling the result, and re-serializing to a different format.

```java
// 1. Define output type
record AnalysisResult(String summary, List issues, double score) {}

// 2. Create parser with retry
JsonOutputParser<AnalysisResult> baseParser = JsonOutputParser.forType(AnalysisResult.class);
RetryableParser<AnalysisResult> parser = RetryableParser.wrap(baseParser)
    .maxRetries(2)
    .build();

// 3. Include format instructions in the prompt
String prompt = "Analyze this code.\n\n" + baseParser.getFormatInstruction();

// 4. Parse LLM response with automatic retry
String llmResponse = llmClient.chat(prompt);
ParseResult<AnalysisResult> result = parser.parseWithRetry(llmResponse, llmClient::chat);

// 5. Handle result
result.ifSuccessOrElse(
    analysis -> {
        System.out.println("Score: " + analysis.score());
        analysis.issues().forEach(issue -> System.out.println("- " + issue));
    },
    error -> System.err.println("Parse failed after retries: " + error)
);

// 6. Re-serialize to a different format
if (result.isSuccess()) {
    OutputSerializerRegistry registry = OutputSerializerRegistry.getInstance();
    String yaml = registry.serialize(result.get(), OutputFormat.YAML);
    System.out.println(yaml);
}
```

---

# Prompt Strategies

URL: https://tnsai.dev/docs/agents/behavior/prompt-strategies

Description: TnsAI includes a prompt enhancement system that applies proven prompting techniques to improve LLM response quality. The system is built around the PromptStrategy enum, PromptEnhancer builder, and EnhancedPrompt output.

import { Callout } from 'fumadocs-ui/components/callout'

**Package:** `com.tnsai.prompt.strategy`

## PromptStrategy Enum

Prompt strategies are research-backed techniques that improve LLM response quality by structuring how the model approaches a problem. Instead of writing complex system prompts by hand, you pick one or more strategies and the framework generates the right instructions automatically.

Twelve predefined strategies are available, based on techniques from OpenAI, Anthropic, and Google AI research.

| Strategy | Description | Multi-Pass | Post-Processing |
| --- | --- | :---: | :---: |
| `CHAIN_OF_THOUGHT` | Step-by-step reasoning before final answer. Best for math, logic, multi-step analysis. | No | No |
| `CHAIN_OF_VERIFICATION` | Self-verification with generated questions. Initial answer, then verify, then refine. Reported 60% to 92% accuracy improvement on complex queries. | Yes | Yes |
| `CONFIDENCE_WEIGHTED` | Includes confidence score (0--100%), key assumptions, and alternatives when confidence is below threshold. | No | No |
| `STRUCTURED_THINKING` | Four-phase protocol: UNDERSTAND, ANALYZE, STRATEGIZE, EXECUTE. | No | No |
| `MULTI_PERSPECTIVE` | Examines problem from Technical, Business, User Experience, and Risk perspectives, then synthesizes a balanced recommendation. | No | No |
| `CONSTRAINT_FIRST` | Separates hard constraints (must satisfy) from soft preferences (nice to have) before proceeding. | No | No |
| `ITERATIVE_REFINEMENT` | Multi-pass generation: Draft, Critique, Refine, Review. | Yes | Yes |
| `CONTEXT_BOUNDARIES` | Clear separation of Context, Focus, Task, and Constraints. Explicitly flags insufficient information. | No | No |
| `FEW_SHOT_EXAMPLES` | Learns from positive and negative examples to guide response format and quality. | No | No |
| `META_PROMPTING` | AI designs the optimal prompt for the task first, then responds to it. | Yes | No |
| `SIX_PART_ANATOMY` | Comprehensive structure: Role, Objective, Request, Process, Output, Stop Condition. | No | No |
| `ATOM_OF_THOUGHT` | Decomposes problems into independent atoms solved in parallel, then synthesizes. Unlike CoT, errors are isolated per atom. +30--40% accuracy on complex reasoning, +20--30% token usage. Best for 70B+ parameter models. | Yes | No |

### Strategy Methods

Each strategy enum value provides these methods to retrieve its system instruction text and to check whether it requires multiple LLM passes or post-processing.

| Method | Return Type | Description |
| --- | --- | --- |
| `getSystemInstruction()` | `String` | The full system instruction text injected for this strategy |
| `getDescription()` | `String` | Short human-readable description |
| `requiresPostProcessing()` | `boolean` | `true` for `CHAIN_OF_VERIFICATION` and `ITERATIVE_REFINEMENT` |
| `isMultiPass()` | `boolean` | `true` for `CHAIN_OF_VERIFICATION`, `ITERATIVE_REFINEMENT`, `META_PROMPTING`, and `ATOM_OF_THOUGHT` |

## PromptEnhancer Builder

`PromptEnhancer` is where you assemble your prompting configuration. It is a fluent builder that lets you combine multiple strategies, set a role and objective, define constraints, provide few-shot examples, and choose an output format. When you call `.enhance()`, it compiles everything into an `EnhancedPrompt` ready to send to the LLM.

### Quick Construction

If you only need a single strategy with no extra configuration, use the shorthand factory method.

```java
// Single strategy
PromptEnhancer enhancer = PromptEnhancer.withStrategy(PromptStrategy.CHAIN_OF_THOUGHT);
EnhancedPrompt prompt = enhancer.enhance("Solve: 2x + 5 = 15");
```

### Full Builder API

For more control, use the builder. You can combine multiple strategies, set a role and objective, add constraints and preferences, provide positive and negative examples, define process steps, and choose an output format.

```java
PromptEnhancer enhancer = PromptEnhancer.builder()
    .strategy(PromptStrategy.CHAIN_OF_THOUGHT)
    .strategy(PromptStrategy.CONFIDENCE_WEIGHTED)
    .role("Expert mathematician")
    .objective("Solve algebra problems accurately")
    .constraint("Show all work")
    .constraint("Verify answer by substitution")
    .constraints(List.of("Use standard notation", "Simplify"))
    .softPreference("Explain in simple terms")
    .softPreferences(List.of("Use examples", "Keep it concise"))
    .positiveExample("2x = 10", "x = 5", "Correct division by 2")
    .positiveExample("3x + 1 = 7", "x = 2")
    .negativeExample("2x = 10", "x = 10", "Forgot to divide")
    .processStep("Parse the equation")
    .processStep("Isolate the variable")
    .process(List.of("Simplify", "Verify"))
    .outputFormat(OutputFormat.STRUCTURED)
    .stopCondition("Stop after verification pass")
    .verificationQuestions(5)      // CoVe: number of verification questions
    .refinementPasses(3)           // Iterative Refinement: number of passes
    .confidenceThreshold(0.7f)     // Confidence Weighted: threshold for alternatives
    .build();
```

### Builder Methods

The complete list of builder methods. All setter methods are chainable and can be called in any order.

| Method | Parameter | Description |
| --- | --- | --- |
| `strategy(PromptStrategy)` | strategy | Adds a prompting strategy (chainable, multiple allowed) |
| `role(String)` | role | Sets the role/persona (e.g., "Expert researcher") |
| `objective(String)` | objective | Sets the high-level goal |
| `constraint(String)` | constraint | Adds a single hard constraint |
| `constraints(List<String>)` | constraints | Adds multiple hard constraints |
| `softPreference(String)` | preference | Adds a single soft preference |
| `softPreferences(List<String>)` | preferences | Adds multiple soft preferences |
| `positiveExample(String, String)` | input, output | Adds a good example |
| `positiveExample(String, String, String)` | input, output, explanation | Adds a good example with reasoning |
| `negativeExample(String, String, String)` | input, badOutput, whyBad | Adds a bad example with explanation |
| `processStep(String)` | step | Adds a single numbered process step |
| `process(List<String>)` | steps | Adds multiple process steps |
| `outputFormat(OutputFormat)` | format | Sets the response output format |
| `stopCondition(String)` | condition | Sets the completion/stop condition |
| `verificationQuestions(int)` | count | Number of verification questions (for `CHAIN_OF_VERIFICATION`) |
| `refinementPasses(int)` | passes | Number of refinement passes (for `ITERATIVE_REFINEMENT`) |
| `confidenceThreshold(float)` | threshold | Threshold for showing alternatives (for `CONFIDENCE_WEIGHTED`) |

### OutputFormat Enum

This enum tells the LLM what format to use in its response. The framework appends the corresponding instruction to the system prompt so the model produces output in the desired structure.

`PromptEnhancer.OutputFormat` controls the response format instruction.

| Value | Instruction |
| --- | --- |
| `TEXT` | Provide your response as plain text. |
| `JSON` | Provide your response as valid JSON. |
| `MARKDOWN` | Format your response using Markdown. |
| `MARKDOWN_TABLE` | Format your response as a Markdown table. |
| `BULLET_POINTS` | Use bullet points for your response. |
| `NUMBERED_LIST` | Use a numbered list for your response. |
| `STRUCTURED` | Use clear sections with headers for your response. |
| `CODE` | Provide code with comments explaining each section. |
| `COMPARISON_TABLE` | Create a comparison table with pros/cons. |

### Example Record

When using the `FEW_SHOT_EXAMPLES` strategy, you teach the LLM by showing it input/output pairs. The `Example` record holds one such pair, with an optional explanation of why the output is correct (or incorrect for negative examples).

```java
// record Example(String input, String output, String explanation)
new PromptEnhancer.Example("2x = 10", "x = 5", "Divided both sides by 2");
new PromptEnhancer.Example("2x = 10", "x = 5"); // explanation is optional (null)
```

### Enhancer Instance Methods

Once built, the `PromptEnhancer` instance provides these methods. The main one is `enhance()`, which takes a user message and returns a fully assembled `EnhancedPrompt`.

| Method | Return Type | Description |
| --- | --- | --- |
| `enhance(String userMessage)` | `EnhancedPrompt` | Applies all configured strategies and produces the enhanced prompt |
| `getStrategies()` | `List<PromptStrategy>` | Returns the configured strategies |
| `requiresMultiPass()` | `boolean` | `true` if any strategy is multi-pass |

## EnhancedPrompt

`EnhancedPrompt` is what you get after calling `enhance()`. It holds the assembled system prompt (with all strategy instructions baked in), the original user message, and metadata about which strategies are active.
Pass it directly to your `LLMClient` to make the enhanced call.

### Methods

These methods let you access the prompt content and check which capabilities the enhanced prompt expects from the LLM response.

| Method | Return Type | Description |
| --- | --- | --- |
| `getSystemPrompt()` | `String` | The full system prompt with all strategy instructions |
| `getUserMessage()` | `String` | The original user message |
| `getSystemPromptOptional()` | `Optional<String>` | System prompt wrapped in `Optional` for LLMClient convenience |
| `getStrategies()` | `List<PromptStrategy>` | List of applied strategies |
| `getOutputFormat()` | `Optional<OutputFormat>` | The output format, if specified |
| `requiresMultiPass()` | `boolean` | `true` if any applied strategy is multi-pass |
| `requiresPostProcessing()` | `boolean` | `true` if any strategy needs post-processing |
| `expectsConfidenceScore()` | `boolean` | `true` if `CONFIDENCE_WEIGHTED` is applied |
| `expectsStructuredThinking()` | `boolean` | `true` if `STRUCTURED_THINKING` is applied |
| `expectsVerification()` | `boolean` | `true` if `CHAIN_OF_VERIFICATION` is applied |
| `getCombinedPrompt()` | `String` | System prompt + user message in a single string (for models without separate system prompt support) |
| `getEstimatedOverhead()` | `int` | Estimated additional tokens from enhancement (\~4 chars/token) |

### Using EnhancedPrompt with LLMClient

Here is how to pass the enhanced prompt to your LLM client. Most providers accept a separate system prompt; for those that do not, use `getCombinedPrompt()` to get a single string.

```java
EnhancedPrompt prompt = enhancer.enhance("What causes inflation?");

// With separate system prompt support
ChatResponse response = client.chat(
    prompt.getUserMessage(),
    prompt.getSystemPromptOptional(),
    Optional.empty(),
    Optional.empty()
);

// Without system prompt support
String combined = prompt.getCombinedPrompt();
```

## Integration with AgentBuilder

You can wire prompt strategies directly into an agent through the `AgentBuilder`, so every message the agent processes is automatically enhanced. There are three approaches: adding individual strategies, adding a list of strategies, or providing a fully configured `PromptEnhancer`.

```java
// Add individual strategies
Agent agent = AgentBuilder.create()
    .llm(llmClient)
    .role(myRole)
    .promptStrategy(PromptStrategy.CHAIN_OF_THOUGHT)
    .promptStrategy(PromptStrategy.CONFIDENCE_WEIGHTED)
    .build();

// Add multiple strategies at once
Agent agent = AgentBuilder.create()
    .llm(llmClient)
    .promptStrategies(List.of(
        PromptStrategy.STRUCTURED_THINKING,
        PromptStrategy.MULTI_PERSPECTIVE
    ))
    .build();

// Use a fully configured PromptEnhancer
PromptEnhancer enhancer = PromptEnhancer.builder()
    .role("Expert researcher")
    .objective("Provide accurate information")
    .strategy(PromptStrategy.CHAIN_OF_VERIFICATION)
    .constraint("Always cite sources")
    .positiveExample("Question", "Good answer", "Why it's good")
    .build();

Agent agent = AgentBuilder.create()
    .llm(llmClient)
    .promptEnhancer(enhancer)
    .build();
```

| AgentBuilder Method | Description |
| --- | --- |
| `.promptStrategy(PromptStrategy)` | Adds a single strategy |
| `.promptStrategies(List<PromptStrategy>)` | Adds multiple strategies at once |
| `.promptEnhancer(PromptEnhancer)` | Sets a fully configured enhancer |

## Code Examples

These examples show how to apply different strategies to real-world use cases. Each one demonstrates a different prompting technique suited to the task at hand.
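
As a rough mental model of what an enhancer does when several strategies are combined, the self-contained sketch below joins per-strategy instruction text into one system prompt and estimates the added tokens at roughly 4 characters per token, the heuristic mentioned for `getEstimatedOverhead()`. It deliberately does not use the TnsAI classes: `PromptAssemblySketch`, `SketchStrategy`, `assemble`, and `estimateOverheadTokens` are hypothetical names, and the instruction strings are placeholders rather than the framework's actual strategy text.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PromptAssemblySketch {

    /** Hypothetical stand-in for a strategy: a name plus placeholder instruction text. */
    enum SketchStrategy {
        CHAIN_OF_THOUGHT("Think step by step before giving the final answer."),
        CONFIDENCE_WEIGHTED("State a confidence score from 0 to 100 and list key assumptions.");

        final String instruction;

        SketchStrategy(String instruction) {
            this.instruction = instruction;
        }
    }

    /** Joins an optional role line and each strategy's instruction into one system prompt. */
    static String assemble(String role, List<SketchStrategy> strategies) {
        String body = strategies.stream()
                .map(s -> s.instruction)
                .collect(Collectors.joining("\n"));
        // Omit the role line entirely when no role was supplied.
        return (role == null || role.isBlank()) ? body : "Role: " + role + "\n" + body;
    }

    /** Rough token estimate: about 4 characters per token, rounded up. */
    static int estimateOverheadTokens(String systemPrompt) {
        return (systemPrompt.length() + 3) / 4;
    }

    public static void main(String[] args) {
        String systemPrompt = assemble("Mathematics tutor",
                List.of(SketchStrategy.CHAIN_OF_THOUGHT, SketchStrategy.CONFIDENCE_WEIGHTED));
        System.out.println(systemPrompt);
        System.out.println("Estimated overhead: "
                + estimateOverheadTokens(systemPrompt) + " tokens");
    }
}
```

The character-based estimate is only a budgeting aid; real token counts depend on the model's tokenizer, so treat the number as an order-of-magnitude guide when stacking several strategies.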

### Chain-of-Thought for Math

Chain-of-Thought prompting asks the model to show its step-by-step reasoning before giving a final answer. This significantly improves accuracy on math, logic, and multi-step analysis tasks.

```java
PromptEnhancer enhancer = PromptEnhancer.builder()
    .strategy(PromptStrategy.CHAIN_OF_THOUGHT)
    .role("Mathematics tutor")
    .outputFormat(OutputFormat.STRUCTURED)
    .build();

EnhancedPrompt prompt = enhancer.enhance(
    "A train travels 120km in 2 hours. It then travels 180km in 3 hours. " +
    "What is its average speed for the entire journey?"
);
```

### Chain-of-Verification for Fact Checking

Chain-of-Verification (CoVe) makes the model generate an initial answer, then create verification questions to check its own claims, and finally refine the answer based on what it finds. This is a multi-pass strategy that dramatically reduces factual errors.

```java
PromptEnhancer enhancer = PromptEnhancer.builder()
    .strategy(PromptStrategy.CHAIN_OF_VERIFICATION)
    .verificationQuestions(5)
    .constraint("Each claim must be independently verifiable")
    .build();

EnhancedPrompt prompt = enhancer.enhance("What were the causes of World War I?");

if (prompt.requiresMultiPass()) {
    // Handle multi-pass verification flow
}
```

### Atom-of-Thought for Complex Reasoning

Atom-of-Thought decomposes a complex problem into independent "atoms" that can be solved in parallel, then synthesizes the results. Unlike Chain-of-Thought, errors in one atom do not cascade to others. This works best with large models (70B+ parameters) and is combined here with confidence scoring.

```java
PromptEnhancer enhancer = PromptEnhancer.builder()
    .strategy(PromptStrategy.ATOM_OF_THOUGHT)
    .strategy(PromptStrategy.CONFIDENCE_WEIGHTED)
    .confidenceThreshold(0.7f)
    .objective("Analyze system architecture tradeoffs")
    .build();

EnhancedPrompt prompt = enhancer.enhance(
    "Compare microservices vs monolith for a 10-person startup " +
    "building a real-time analytics platform"
);
// Estimated overhead: prompt.getEstimatedOverhead() tokens
```

### Few-Shot with Examples

Few-shot prompting teaches the model by example. You provide a few input/output pairs (both good and bad), and the model learns the expected format and quality from them. This is especially effective for classification, formatting, and style-matching tasks.

```java
PromptEnhancer enhancer = PromptEnhancer.builder()
    .strategy(PromptStrategy.FEW_SHOT_EXAMPLES)
    .positiveExample(
        "The food was great",
        "Sentiment: POSITIVE (0.95)",
        "Clear positive language"
    )
    .positiveExample(
        "Terrible service, never again",
        "Sentiment: NEGATIVE (0.98)",
        "Strong negative indicators"
    )
    .negativeExample(
        "The food was great",
        "positive",
        "Missing confidence score and proper format"
    )
    .outputFormat(OutputFormat.TEXT)
    .build();

EnhancedPrompt prompt = enhancer.enhance("The product works but could be better");
```

---

# Streaming

URL: https://tnsai.dev/docs/agents/behavior/streaming

Description: TnsAI supports three streaming modes for real-time token delivery from LLM providers.

import { Callout } from 'fumadocs-ui/components/callout'

## Token Streaming

Returns text tokens as they are generated. This is the simplest mode:

```java
Stream<String> tokens = agent.streamChat("Explain relativity");
tokens.forEach(System.out::print);
```

## ChatChunk Streaming

Returns typed chunks with metadata (token counts, finish reason, tool calls):

```java
llmClient.streamChatWithSpec(request).forEach(chunk -> {
    switch (chunk.getType()) {
        case START -> System.out.println("Stream started: " + chunk.getModel());
        case CONTENT -> System.out.print(chunk.getContent());
        case TOOL_CALL -> handleToolCall(chunk.getToolCall().orElseThrow());
        case DONE -> System.out.println("\nTokens: " + chunk.getTokenCount());
        case ERROR -> System.err.println("Error: " + chunk.getContent());
    }
});
```

### Chunk Types

Each `ChatChunk` has a type that tells you what kind of data it carries. Your code should handle each type to respond appropriately as the stream progresses.

| Type | Description |
| --- | --- |
| `START` | Stream initialization with model info |
| `CONTENT` | Text content delta |
| `TOOL_CALL` | Tool/function invocation request |
| `DONE` | Stream complete with finish reason |
| `ERROR` | Error occurred during streaming |

### Finish Reasons

When a stream ends, the `DONE` chunk includes a finish reason that explains why the LLM stopped generating. This helps you decide what to do next -- for example, if the reason is `TOOL_CALLS`, you need to execute the requested tool and feed the result back.

| Reason | Description |
| --- | --- |
| `STOP` | Natural completion |
| `LENGTH` | Max tokens reached |
| `TOOL_CALLS` | LLM wants to call tools |
| `CONTENT_FILTER` | Content was filtered |
| `ERROR` | Error during generation |

## Handler-Based Streaming

A callback pattern with a full tool-call loop, ideal for UI integration:

```java
llmClient.streamChatWithHandler(request, chunk -> {
    if (chunk.isContent()) {
        System.out.print(chunk.getContent());
    } else if (chunk.isToolCall()) {
        // Framework handles tool execution automatically
    } else if (chunk.isDone()) {
        System.out.println("\nFinish reason: " + chunk.getFinishReason());
    }
});
```

## Convenience Methods

`ChatChunk` provides static factory methods so you can create chunks without calling constructors directly. These are useful when you build custom streaming pipelines or write tests that simulate LLM output.

```java
// ChatChunk factory methods
ChatChunk.start(model, requestId);
ChatChunk.content("Hello", tokenCount, index);
ChatChunk.content("Hello");
ChatChunk.toolCall(toolCallObject);
ChatChunk.done(FinishReason.STOP, totalTokens);
ChatChunk.error("Something went wrong");
```

## Which Mode to Use?

TnsAI offers three streaming modes at different levels of abstraction. Pick the simplest one that meets your needs.

| Mode | Use When |
| --- | --- |
| **Token Stream** | Simple text display, CLI output |
| **ChatChunk Stream** | Need metadata (tokens, model), manual tool handling |
| **Handler-Based** | UI integration, automatic tool execution loop |

## Async Execution

The `AsyncAgent` interface (`com.tnsai.agents.async.AsyncAgent`) provides non-blocking chat operations with multiple consumption patterns.

### Methods

`AsyncAgent` exposes several ways to consume responses. Choose based on whether you need simple text, typed events, or reactive backpressure control.

| Method | Return Type | Description |
| --- | --- | --- |
| `chatAsync(message)` | `CompletableFuture<String>` | Async chat, completes with full response |
| `chatAsync(message, options)` | `CompletableFuture<String>` | Async chat with `ChatOptions` |
| `chatStream(message)` | `Stream<String>` | Streaming tokens as a Java Stream |
| `chatEventStream(message)` | `Stream<ChatEvent>` | Typed event stream (tokens, tool calls, etc.) |
| `chatPublisher(message)` | `Flow.Publisher<ChatEvent>` | Reactive Streams publisher for backpressure-aware consumers |
| `cancel()` | `void` | Cancels any ongoing async operation |
| `isProcessing()` | `boolean` | True if an async operation is in progress |
| `getProgress()` | `double` | Execution progress (0.0 - 1.0) |

### CompletableFuture

The simplest async pattern. `chatAsync` returns a `CompletableFuture` that completes with the full response string once the LLM finishes generating. Use this when you do not need to show partial results to the user.

```java
AsyncAgent agent = new MyAsyncAgent();

agent.chatAsync("Tell me about Java")
    .thenAccept(response -> System.out.println(response))
    .exceptionally(e -> {
        e.printStackTrace();
        return null;
    });
```

### Token Stream

Returns a `Stream` that emits each text token as it arrives. This lets you print tokens to the console (or a UI) incrementally instead of waiting for the full response.
```java
agent.chatStream("Tell me a story")
    .forEach(token -> System.out.print(token));
```

### Typed Event Stream

`ChatEvent` subtypes distinguish tokens from tool calls and other events:

```java
agent.chatEventStream("Complex task")
    .forEach(event -> {
        if (event instanceof ChatEvent.Token t) {
            System.out.print(t.content());
        } else if (event instanceof ChatEvent.ToolCall tc) {
            System.out.println("Calling tool: " + tc.toolName());
        }
    });
```

### Reactive Publisher

For backpressure-aware consumers using `java.util.concurrent.Flow`:

```java
agent.chatPublisher("Generate a report")
    .subscribe(new Flow.Subscriber<>() {
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            this.subscription = s;
            s.request(1); // ask for the first event
        }

        @Override
        public void onNext(ChatEvent event) {
            process(event);
            subscription.request(1); // ask for the next event only after this one is handled
        }

        @Override
        public void onError(Throwable t) {
            t.printStackTrace();
        }

        @Override
        public void onComplete() {
            System.out.println("Done");
        }
    });
```

### Cancellation

You can cancel a running async operation at any time. This is useful for timeout handling or when the user navigates away from a page before the response finishes.

```java
CompletableFuture<String> future = agent.chatAsync("Long running task");

// Cancel if still running
if (agent.isProcessing()) {
    agent.cancel();
}
```

---

# Agent Variants

URL: https://tnsai.dev/docs/agents/behavior/variants

Description: Agent variants let you trade off between response quality, execution speed, and token cost. A single agent can switch variants at runtime -- per task or per action.

import { Callout } from 'fumadocs-ui/components/callout'

## AgentVariant Enum

The four variant tiers represent different quality/speed/cost tradeoffs. Pick the one that matches your task, or use `AUTO` to let the framework decide at runtime based on task complexity. Defined in `com.tnsai.enums.AgentVariant`.
Four tiers: | Variant | Quality | Speed | Cost | Best For | | -------- | -------------- | ------------ | ------------ | -------------------------------------------------- | | `HIGH` | Max (1.0) | Slow (0.3) | High (1.0) | Complex refactoring, security review, architecture | | `MEDIUM` | Balanced (0.7) | Normal (0.6) | Medium (0.5) | Regular development, feature implementation | | `MINI` | Basic (0.4) | Fast (1.0) | Low (0.2) | Quick fixes, typo corrections, simple queries | | `AUTO` | Adaptive | Adaptive | Optimal | Production environments with varied workloads | ### Helper methods These convenience methods let you check what a variant prioritizes without comparing enum values directly. ```java variant.isQualityFocused(); // true for HIGH, MEDIUM variant.isSpeedFocused(); // true for MINI variant.isCostOptimized(); // true for MINI, AUTO ``` ### Task-based suggestion If you are not sure which variant to use, `forTask()` analyzes the task description and suggests one based on keyword matching. This is a simple heuristic -- for smarter auto-selection, use `VariantManager` with auto mode enabled. ```java AgentVariant.forTask("Refactor the auth system"); // HIGH AgentVariant.forTask("Fix a typo in README"); // MINI AgentVariant.forTask("Implement login page"); // MEDIUM ``` Keywords that push toward HIGH: `refactor`, `architect`, `complex`, `critical`, `security`, `review`. Keywords that push toward MINI: `typo`, `fix`, `simple`, `quick`, `minor`, `small`. ## VariantSpec Each variant tier has a `VariantSpec` that defines its concrete settings: which LLM model to use, token limits, available tools, timeout, retry count, and temperature. You can use the built-in specs or build a custom one for your specific models and requirements. ### Predefined specs The built-in specs for each tier ship with sensible defaults for model selection, token limits, and timeouts. These are what you get when you use a variant without customization. 
|                       | HIGH                      | MEDIUM                       | MINI                                 |
| --------------------- | ------------------------- | ---------------------------- | ------------------------------------ |
| **Preferred model**   | `claude-opus-4`           | `claude-sonnet-4`            | `claude-haiku-3`                     |
| **Fallback models**   | `gpt-4`, `gemini-1.5-pro` | `gpt-4o`, `gemini-1.5-flash` | `gpt-4o-mini`, `gemini-1.5-flash-8b` |
| **Max input tokens**  | 128,000                   | 64,000                       | 32,000                               |
| **Max output tokens** | 16,384                    | 8,192                        | 4,096                                |
| **Tool set**          | ALL                       | STANDARD                     | MINIMAL                              |
| **Timeout**           | 10 min                    | 5 min                        | 2 min                                |
| **Max retries**       | 3                         | 2                            | 1                                    |
| **Temperature**       | 0.7                       | 0.5                          | 0.3                                  |
| **Streaming**         | Yes                       | Yes                          | No                                   |

`AUTO` defaults to the MEDIUM spec and adjusts dynamically at runtime.

### Using predefined specs

Retrieve the built-in spec for a variant tier with `VariantSpec.forVariant()` and query its settings.

```java
VariantSpec highSpec = VariantSpec.forVariant(AgentVariant.HIGH);

String model = highSpec.getPreferredModel();     // "claude-opus-4"
int inputTokens = highSpec.getMaxInputTokens();  // 128000
Duration timeout = highSpec.getTimeout();        // PT10M
```

### Building a custom spec

When the built-in specs do not match your environment (different models, different limits), build a custom one. Custom specs are immutable -- once built, they cannot be changed.

```java
VariantSpec custom = VariantSpec.builder()
    .variant(AgentVariant.HIGH)
    .preferredModel("claude-opus-4")
    .fallbackModels(List.of("gpt-4", "gemini-1.5-pro"))
    .maxInputTokens(128000)
    .maxOutputTokens(16384)
    .toolSet(VariantSpec.ToolSet.ALL)
    .timeout(Duration.ofMinutes(10))
    .maxRetries(3)
    .temperature(0.7)
    .enableStreaming(true)
    .addSetting("customKey", "value")
    .build();
```

### Model resolution

When the preferred model is unavailable (API outage, not provisioned), `getEffectiveModel` automatically falls back to the next available model from the fallback list.
```java
// HIGH prefers "claude-opus-4" and falls back to "gpt-4", then "gemini-1.5-pro"
Set<String> available = Set.of("gpt-4", "claude-haiku-3");
String model = highSpec.getEffectiveModel(available); // "gpt-4" (fallback)
```

### Immutable copies

Since specs are immutable, changing a field returns a new `VariantSpec` instance. The original is not modified.

```java
VariantSpec modified = spec.withVariant(AgentVariant.MEDIUM);
VariantSpec remodeled = spec.withModel("gpt-4-turbo");
```

### ToolSet levels

The `ToolSet` controls which tools are available to the agent in a given variant. Lower tiers restrict tool access to reduce cost and latency.

| Level      | Description          |
| ---------- | -------------------- |
| `ALL`      | All available tools  |
| `STANDARD` | Most common tools    |
| `MINIMAL`  | Essential tools only |
| `NONE`     | No tools             |

## VariantManager

The `VariantManager` handles variant switching at runtime. It can operate in manual mode (you choose the variant) or auto mode (it analyzes each task and picks the best tier). It also tracks usage statistics per variant, so you can see how often each tier is used and how well it performs.

### Creating a manager

Create a `VariantManager` with an initial variant. If none is given, it defaults to `MEDIUM`.

```java
VariantManager manager = new VariantManager();                       // defaults to MEDIUM
VariantManager highManager = new VariantManager(AgentVariant.HIGH);  // explicit initial variant
```

### Manual switching

Explicitly set the variant when you know what quality level the next task needs.

```java
manager.setVariant(AgentVariant.HIGH);
AgentVariant current = manager.getCurrentVariant(); // HIGH
VariantSpec spec = manager.getCurrentSpec();        // VariantSpec for HIGH
```

### Auto mode

In auto mode, the manager analyzes each task description and automatically switches to the most appropriate variant. This is ideal for production environments where tasks vary in complexity.
```java manager.setAutoMode(true); // Analyzes task complexity (0-10 score) and switches automatically AgentVariant suggested = manager.suggestVariant("Refactor the authentication system"); // suggested = HIGH, manager now using HIGH spec ``` Complexity scoring adds/subtracts from a base score of 5. Score `>= 7` returns HIGH, score `<= 3` returns MINI, otherwise MEDIUM. Task length is also considered (`>200` chars adds 1, `<50` chars subtracts 1). ### Custom specs per variant Override the default spec for any variant tier. This is useful when you want to use a different model or different token limits for a specific tier in your environment. ```java VariantSpec custom = VariantSpec.builder() .variant(AgentVariant.HIGH) .preferredModel("my-custom-model") .maxInputTokens(200000) .build(); manager.setVariantSpec(AgentVariant.HIGH, custom); ``` ### Change listeners Register callbacks to be notified whenever the variant changes, whether manually or through auto mode. This is useful for logging, metrics, or adjusting other system behavior based on the active variant. ```java // Register VariantManager.Registration reg = manager.onVariantChange(event -> { System.out.printf("Variant: %s -> %s (reason: %s)%n", event.previous(), event.current(), event.reason()); }); // Unregister reg.unregister(); ``` The `VariantChangeEvent` record contains `previous`, `current`, and `reason` (either `"manual"` or `"auto:"`). ### Usage statistics The manager tracks task count, success rate, and timing per variant. Use this data to understand your cost distribution and identify variants that are underperforming. 
```java
manager.recordTask(AgentVariant.HIGH, 1500, true);

VariantManager.VariantStats stats = manager.getStats(AgentVariant.HIGH);
stats.getTaskCount();         // total tasks
stats.getSuccessRate();       // 0.0 - 1.0
stats.getAverageDurationMs(); // average task time
stats.getMinDurationMs();
stats.getMaxDurationMs();

// All stats, keyed by variant
Map<AgentVariant, VariantManager.VariantStats> all = manager.getAllStats();
```

## @Variant Annotation

Some actions always need a specific quality level regardless of the agent's current setting -- a security audit should always use HIGH, while a text formatter can always use MINI. The `@Variant` annotation locks an action method to a specific variant tier. The framework temporarily switches to that variant for the duration of the action, then restores the previous one.

| Attribute     | Type           | Default    | Description                                        |
| ------------- | -------------- | ---------- | -------------------------------------------------- |
| `value`       | `AgentVariant` | (required) | Variant to use                                     |
| `reason`      | `String`       | `""`       | Documentation for variant choice                   |
| `recordStats` | `boolean`      | `true`     | Track usage statistics                             |
| `fallback`    | `AgentVariant` | `MEDIUM`   | Fallback if primary variant's model is unavailable |

```java
// Force HIGH for security-critical action
@ActionSpec(type = ActionType.LLM, description = "Security analysis")
@Variant(AgentVariant.HIGH)
public String analyzeSecurityRisks(String code) {
    return "Analyze security: " + code;
}

// Use MINI for a quick utility
@ActionSpec(type = ActionType.LOCAL, description = "Format text")
@Variant(AgentVariant.MINI)
public String formatText(String text) {
    return text.trim();
}

// Let the framework auto-select based on input
@ActionSpec(type = ActionType.LLM, description = "Code review")
@Variant(value = AgentVariant.AUTO, reason = "Complexity varies by input size")
public String reviewCode(String code) {
    return "Review: " + code;
}
```

Actions without `@Variant` use the agent's current variant. The annotation only affects the specific method it decorates.
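The switch-and-restore behaviour that `@Variant` is described as providing can be pictured with a plain `try`/`finally`. The following is only a sketch under stated assumptions, not TnsAI's implementation: `Tier`, `TierHolder`, and `runWith` are hypothetical stand-ins for `AgentVariant`, the state held by `VariantManager`, and whatever internal hook the framework uses around an annotated action.

```java
import java.util.function.Supplier;

public class VariantScopeSketch {

    // Hypothetical stand-in for AgentVariant.
    public enum Tier { HIGH, MEDIUM, MINI }

    // Hypothetical stand-in for the part of VariantManager that holds the active tier.
    public static class TierHolder {
        private Tier current;

        public TierHolder(Tier initial) {
            this.current = initial;
        }

        public Tier current() {
            return current;
        }

        // Runs the action under the pinned tier, then restores the previous tier.
        public <T> T runWith(Tier pinned, Supplier<T> action) {
            Tier previous = current;
            current = pinned;
            try {
                return action.get();
            } finally {
                current = previous; // restored even when the action throws
            }
        }
    }
}
```

With a holder that starts at `MEDIUM`, calling `holder.runWith(Tier.HIGH, holder::current)` observes `HIGH` inside the action and leaves the holder at `MEDIUM` afterwards. The sketch is single-threaded; a real implementation would also have to decide how concurrent actions on the same agent interact.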
## Full Example This end-to-end example shows how to set up an agent with auto variant selection, log all variant switches, run tasks of varying complexity, override the variant at runtime, and check usage statistics afterward. ```java // Configure agent with variant support VariantManager variantManager = new VariantManager(AgentVariant.AUTO); variantManager.setAutoMode(true); // Log all variant switches variantManager.onVariantChange(event -> log.info("Variant {} -> {} ({})", event.previous(), event.current(), event.reason())); Agent agent = AgentBuilder.create() .withVariant(AgentVariant.AUTO) .llm(new OpenAIClient()) .build(); // Simple task -- framework auto-selects MINI agent.chat("Fix the typo in line 42"); // Complex task -- framework auto-selects HIGH agent.chat("Refactor the authentication module for OAuth2 support"); // Override at runtime agent.setVariant(AgentVariant.HIGH); agent.chat("Critical security audit of payment processing"); // Check statistics VariantManager.VariantStats highStats = variantManager.getStats(AgentVariant.HIGH); System.out.printf("HIGH: %d tasks, %.0f%% success, avg %dms%n", highStats.getTaskCount(), highStats.getSuccessRate() * 100, highStats.getAverageDurationMs()); ``` --- # Action System URL: https://tnsai.dev/docs/agents/fundamentals/action-system Description: The action system is the execution backbone of TnsAI. When an LLM decides to call a function, or an agent needs to perform work, the request flows through ActionExecutor, which routes it to the appropriate executor based on the action's ActionType. import { Callout } from 'fumadocs-ui/components/callout' ## Architecture Overview ``` Agent.executeAction(name, params) | v ActionExecutor.execute(action, role, params, context) | +-- 1. Validate inputs +-- 2. Check @ApprovalRequired +-- 3. 
Route by ActionType: | LOCAL -> Reflection invocation on Role | WEB_SERVICE -> WebServiceExecutor (HTTP) | LLM -> LLMRoleExecutor (single-shot LLM call) | MCP_TOOL -> McpToolExecutor (MCP protocol) +-- 4. Return result or wrap in ActionExecutionException ``` Tool dispatch — when the LLM emits a tool call during an `LLM` action — is handled separately by the agent's `ToolMethodDispatcher`, which is built from every POJO and dynamic tool registered with `AgentBuilder`. ## ActionType Enum `com.tnsai.enums.ActionType` defines the four execution methods: | Value | Executor | Description | | ------------- | -------------------- | -------------------------------------------------------------------------- | | `LOCAL` | Reflection | Direct Java method invocation on the Role | | `WEB_SERVICE` | `WebServiceExecutor` | HTTP REST API calls | | `LLM` | `LLMRoleExecutor` | Single-shot LLM call; tool dispatch via the agent's `ToolMethodDispatcher` | | `MCP_TOOL` | `McpToolExecutor` | Model Context Protocol server calls | ## ActionExecutor `com.tnsai.actions.ActionExecutor` is the central, thread-safe dispatcher. ### Construction ```java // Built by AgentBuilder using the agent's tool registry ActionExecutor executor = new ActionExecutor(toolMethodDispatcher); ``` Default executors are registered automatically: `WebServiceExecutor` for `WEB_SERVICE`, `LLMRoleExecutor` for `LLM`, and `McpToolExecutor` for `MCP_TOOL` (if tnsai-mcp is on classpath). The `ToolMethodDispatcher` is built from `AgentBuilder.builtInTools(...)`, `.toolPojos(...)`, and `.dynamicTool(...)` registrations. ### Execution Flow 1. **Validation** -- null checks on action, role, parameters 2. **Approval check** -- if the action method has `@ApprovalRequired`, verifies `_approval_token` is present in parameters 3. **Routing** -- `LOCAL` actions are invoked via reflection; others delegate to `TypedActionExecutor` 4. 
**Error wrapping** -- all exceptions become `ActionExecutionException` with a category (`PARAMETER`, `INVOCATION`, `NETWORK`, `UNKNOWN`) ### Approval-Required Actions ```java // In a Role class @ApprovalRequired @ActionSpec(description = "Deploy to production") public String deploy(String target) { ... } // Calling with approval token Map params = Map.of( "target", "production", "_approval_token", approvalService.getToken() ); agent.executeAction("deploy", params); ``` If the token is missing, `ApprovalRequiredException` is thrown. ## TypedActionExecutor Interface `com.tnsai.actions.executors.TypedActionExecutor` is the extension point for custom execution strategies. ```java public interface TypedActionExecutor { Object execute( ActionMetadata action, Role role, Map parameters, Map context ); } ``` The `context` map provides runtime data: `"llm"` (LLMClient), `"agent"` (Agent reference), `"mcpToolName"` (for MCP routing). ## Executor Types ### WebServiceExecutor Handles `WEB_SERVICE` actions by making HTTP calls using OkHttp. Features: - URL template variables: `{city}`, `{id}` in endpoints are replaced from parameters - Parameter types: `QUERY` (URL params), `PATH` (URL path segments), `BODY` (JSON payload) - Authentication: `BEARER` and `BASIC` via environment variables - Custom headers via `@Header` annotations - Configurable per-action timeout ```java @ActionSpec( type = ActionType.WEB_SERVICE, endpoint = "https://api.weather.com/v1/forecast/{city}", httpMethod = HttpMethod.GET, paramType = ParamType.QUERY, auth = AuthType.BEARER, authToken = "WEATHER_API_KEY", timeout = 5000 ) @Header(key = "Accept", value = "application/json") public Object getWeather(String city) { return null; } ``` ### LLMRoleExecutor Handles `LLM` actions with a single-shot LLM call. 
The action's prompt comes from the method body's return value (the convention is to return a String describing what the LLM should do); the executor sends that prompt to the agent's LLM client and returns the raw response. Tool dispatch is **not** done by this executor. If the LLM emits a tool call in its response, dispatch flows through the agent's `ToolMethodDispatcher`, which is built once from the `AgentBuilder.builtInTools(...)` / `.toolPojos(...)` / `.dynamicTool(...)` registrations and shared across every `LLM` action. Per-action overrides: | `@ActionSpec` field | Effect | | ------------------- | -------------------------------------------------------------------------------------------------------- | | `llmSystemPrompt` | Prepended as the system message of the chat request, overriding the LLM client's default for this action | | `llmTemperature` | When `>= 0`, sets the chat temperature; `-1.0f` (default) means "fall back to the LLM client's default" | ### McpToolExecutor Handles `MCP_TOOL` actions by connecting to remote MCP servers. Features: - Automatic tool discovery via `tools/list` - Connection caching per endpoint - API key support via environment variables - Reflection-based MCP client creation (no compile-time dependency on tnsai-mcp) ```java @ActionSpec( type = ActionType.MCP_TOOL, serverUrl = "https://mcp.api.coingecko.com/mcp", description = "Access cryptocurrency data" ) public String cryptoData(String query) { return null; } ``` Key methods: | Method | Description | | -------------------------------------------------- | ---------------------------------- | | `discoverTools(String endpoint, String apiKeyEnv)` | Discover tools from MCP server | | `getEndpointForTool(String toolName)` | Find which endpoint handles a tool | | `close()` | Disconnect all cached MCP clients | ## ActionContract `com.tnsai.actions.contracts.ActionContract` provides optional pre/post condition validation for roles. 
```java
public interface ActionContract {
    default void validatePreConditions(ActionMetadata action, Map<String, Object> parameters)
            throws ValidationException { }

    default void validatePostConditions(ActionMetadata action, Object result)
            throws ValidationException { }

    default void validateInvariants() throws ValidationException { }
}
```

A role implements this interface to enforce contracts:

```java
public class OrderRole extends Role implements ActionContract {
    @Override
    public void validatePreConditions(ActionMetadata action, Map<String, Object> params)
            throws ValidationException {
        // Reject placeOrder calls that arrive without an items parameter
        if ("placeOrder".equals(action.getName())) {
            if (!params.containsKey("items")) {
                throw new ValidationException("items parameter required");
            }
        }
    }
}
```

## TypeConverter

`com.tnsai.actions.TypeConverter` handles automatic parameter type conversion in action invocations.

### Supported Conversions

| Source                | Target Types                                      |
| --------------------- | ------------------------------------------------- |
| `String`              | `int`, `long`, `double`, `float`, `boolean`, Enum |
| `Number`              | `int`, `long`, `double`, `float`                  |
| `Map<String, Object>` | Any POJO/record (via Jackson)                     |

```java
// String to int
Object fromString = TypeConverter.convert("42", int.class); // 42

// Number to int
Object fromNumber = TypeConverter.convert(42L, int.class); // 42

// Map to POJO
record UserDto(String name, int age) {}
Map<String, Object> params = Map.of("name", "Alice", "age", 30);
UserDto user = TypeConverter.convertMapToPojo(params, UserDto.class);
```

Utility methods:

| Method                                       | Description                                 |
| -------------------------------------------- | ------------------------------------------- |
| `convert(Object value, Class targetType)`    | Convert a value to the target type          |
| `convertMapToPojo(Map, Class)`               | Convert a map to a POJO via Jackson         |
| `isPrimitiveOrWrapper(Class)`                | Check if type is primitive or wrapper       |
| `isSimpleType(Class)`                        | Check if type does not need POJO conversion |

Enum conversion is case-insensitive: `TypeConverter.convert("get", HttpMethod.class)` matches
`HttpMethod.GET`. ## ActionRequest and ActionResponse Typed records that replace raw `Map` and untyped `Object` returns. ### ActionRequest `com.tnsai.actions.model.ActionRequest` -- immutable request record. ```java // With parameters ActionRequest request = ActionRequest.of("searchWeb", Map.of( "query", "Java frameworks", "maxResults", 10 )); // Without parameters ActionRequest request = ActionRequest.of("getStatus"); ``` Fields: `actionName` (required, non-blank), `parameters` (never null, defensively copied). ### ActionResponse `com.tnsai.actions.model.ActionResponse` -- immutable response record. ```java // Success ActionResponse response = ActionResponse.success(resultData); // Failure ActionResponse response = ActionResponse.failure("Connection timeout"); // Failure with partial result ActionResponse response = ActionResponse.failure("Partial data received", partialData); ``` Fields: `value` (result object), `success` (boolean), `error` (String, null on success). ### Usage with Agent ```java ActionRequest request = ActionRequest.of("searchWeb", Map.of("query", "TnsAI")); ActionResponse response = agent.executeAction(request); if (response.success()) { Object data = response.value(); } else { logger.error("Failed: {}", response.error()); } ``` ## Related Documentation - [Roles](/docs/agents/fundamentals/roles) -- defining roles with `@ActionSpec` annotations - [Capabilities](/docs/agents/fundamentals/capabilities) -- reusable body-less `@ActionSpec` contracts via `@Capability` interfaces - [Tools](/docs/capabilities/tools/registration) -- registering POJO toolkits the LLM can call - [Advanced Tools](/docs/capabilities/tools/registration-advanced) -- filters, listeners, and dispatcher introspection - [SPI Reference](/docs/reference/spi) -- SPI interfaces for cross-module extensibility --- # Capabilities URL: https://tnsai.dev/docs/agents/fundamentals/capabilities Description: Capabilities are reusable, body-less action contracts. 
A @Capability interface carries one or more @ActionSpec-annotated methods that describe what the capability does; the framework dispatches the call at runtime. A role gains the capability by implementing the interface, without writing any method bodies.

import { Callout } from 'fumadocs-ui/components/callout'

This page covers the pattern, composition, override rules, validation, and migration from the legacy `return null;` style.

*Available from 0.3.1. The legacy pattern (concrete methods with `return null;` bodies) was removed in 0.5.0.*

## The Problem

Before capabilities, every dispatched `@ActionSpec` method required a `return null;` body:

```java
@ActionSpec(
    type = ActionType.LLM,
    description = "Summarise the text in one short paragraph",
    llmSystemPrompt = "You are concise.",
    llmTemperature = 0.2f
)
public String summarize(String text) {
    return null; // body unused: LLMRoleExecutor handles dispatch
}
```

The body is dead code: [`ActionExecutor`](/docs/agents/fundamentals/action-system) skips method invocation entirely for `LLM`/`MCP_TOOL`/`WEB_SERVICE` actions when there is no `ActionResult` parameter. Yet the method declaration causes three problems:

- **Misleading signature** -- static analysis marks the method as "always returns null"; downstream `@NonNull` checks inferred from this are bogus.
- **Silent failure on bypass** -- anyone who invokes the method outside the dispatch path (direct `role.summarize(x)`, test code, misconfigured filter) gets `null`. This looks identical to a genuine LLM failure and makes debugging painful.
- **Duplicated specification** -- every role that wants the same capability copy-pastes the `@ActionSpec` annotation. Drift is inevitable.

## The Pattern

Move the method into a `@Capability` interface.
The interface's `default` body throws `Actions.dispatchedByFramework()` — a loud, framework-owned marker that never executes in the normal dispatch path: ```java @Capability public interface Summarizer { @ActionSpec( type = ActionType.LLM, description = "Summarise the text in one short paragraph", llmSystemPrompt = "You are concise.", llmTemperature = 0.2f ) default String summarize(String text) { throw Actions.dispatchedByFramework(); } } ``` A role picks up the capability by implementing the interface — with no method bodies of its own: ```java @RoleIdentity(name = "Editor", goal = "Produce clean, readable articles") public class EditorRole extends Role implements Summarizer { // State, lifecycle, and other non-capability methods live here. // No `summarize` body — inherited from Summarizer. } ``` When the LLM calls `summarize`, `ActionExecutor` discovers the method through the capability interface and routes dispatch through the `LLM` executor — exactly as it would for a concrete `@ActionSpec` method. The default `throw` body is never executed in the normal path; it only fires if someone bypasses dispatch and invokes the method directly, producing a clear `DispatchedByFrameworkException` instead of a silent `null`. ## Composition A role can implement any number of capability interfaces. 
Each contributes its actions to the role: ```java @Capability public interface Translator { @ActionSpec(type = ActionType.LLM, description = "Translate the text to the target language") default String translate(String text, String targetLanguage) { throw Actions.dispatchedByFramework(); } } @Capability public interface Classifier { @ActionSpec(type = ActionType.LLM, description = "Classify input as POSITIVE / NEGATIVE / NEUTRAL") default String classifySentiment(String text) { throw Actions.dispatchedByFramework(); } } @RoleIdentity(name = "Assistant", goal = "Handle ad-hoc requests") public class AssistantRole extends Role implements Summarizer, Translator, Classifier { // Three capabilities, zero bodies. All actions ready for LLM dispatch. } ``` `ActionDiscovery` walks the role's interface chain — including super-interfaces of capabilities (`MultilingualSummarizer extends Summarizer`) — so every capability contributes its methods regardless of whether it arrives via direct implementation or transitive extension. ## Override — Role Declaration Wins If a role declares the same signature as a capability's default method, the role's version is what ends up in the discovered action list. Use this to drop dispatch entirely for a specific role and provide a deterministic local implementation: ```java public class StrictEditor extends Role implements Summarizer { // Replaces Summarizer.summarize's LLM dispatch with a deterministic local impl @Override @ActionSpec(type = ActionType.LOCAL, description = "Truncate-first summarise (deterministic)") public String summarize(String text) { return text.length() <= 60 ? text : text.substring(0, 60) + "..."; } } ``` Discovery sees `summarize` declared on the concrete class first, records its signature, and then skips the capability's default when walking the interface chain. Exactly one `summarize` action ends up in the role's metadata, and its type is `LOCAL` rather than `LLM`. 
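The "role declaration wins" rule can be sketched with plain reflection: record every annotated signature found on the class first, then walk the interface chain and skip any signature already seen. This is an illustration under stated assumptions, not the code in `ActionDiscovery`; `Spec` is a hypothetical stand-in for `@ActionSpec`, and the nested `Summarizer`, `StrictEditor`, and `PlainEditor` types exist only for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

public class DiscoverySketch {

    // Hypothetical stand-in for @ActionSpec; only the action type is modelled.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Spec { String type(); }

    public interface Summarizer {
        @Spec(type = "LLM")
        default String summarize(String text) {
            throw new UnsupportedOperationException("dispatched by framework");
        }
    }

    // Overrides the capability with a deterministic local implementation.
    public static class StrictEditor implements Summarizer {
        @Override
        @Spec(type = "LOCAL")
        public String summarize(String text) {
            return text.length() <= 60 ? text : text.substring(0, 60) + "...";
        }
    }

    // Inherits the capability unchanged.
    public static class PlainEditor implements Summarizer { }

    static String signature(Method m) {
        return m.getName() + Arrays.toString(m.getParameterTypes());
    }

    // Returns signature -> action type; the first declaration seen for a signature wins.
    public static Map<String, String> discover(Class<?> role) {
        Map<String, String> actions = new LinkedHashMap<>();
        Deque<Class<?>> interfaces = new ArrayDeque<>();

        // Pass 1: the role class and its superclasses.
        for (Class<?> c = role; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Method m : c.getDeclaredMethods()) {
                Spec spec = m.getAnnotation(Spec.class);
                if (spec != null && !m.isBridge()) {
                    actions.putIfAbsent(signature(m), spec.type());
                }
            }
            interfaces.addAll(Arrays.asList(c.getInterfaces()));
        }

        // Pass 2: the interface chain, including super-interfaces.
        while (!interfaces.isEmpty()) {
            Class<?> iface = interfaces.poll();
            for (Method m : iface.getDeclaredMethods()) {
                Spec spec = m.getAnnotation(Spec.class);
                if (spec != null) {
                    actions.putIfAbsent(signature(m), spec.type()); // skipped if the role declared it
                }
            }
            interfaces.addAll(Arrays.asList(iface.getInterfaces()));
        }
        return actions;
    }
}
```

In this sketch `discover(StrictEditor.class)` yields a single `summarize` entry of type `LOCAL`, while `discover(PlainEditor.class)` yields the same signature with type `LLM`, which mirrors the one-action outcome described above.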
## Validation Two rules are enforced at action-discovery time; violations throw `IllegalStateException` with a message naming the offending interface and method. ### Capability methods must be `default` An abstract method on a `@Capability` interface would force every adopting role to write a body — defeating the point of the annotation. The error message points at the correct body: ```text @Capability interface com.example.Summarizer method summarize must be a default method. Abstract capability methods force every adopting role to write a body, defeating the purpose of the annotation. Use `default { throw Actions.dispatchedByFramework(); }` as the body. ``` ### Capability methods must not declare an `ActionResult` parameter Capabilities are pure dispatch. Post-processing (reading the LLM's raw response, running it through custom logic before returning) belongs on the concrete role class where per-role logic makes sense. The validator rejects the mixed case: ```text @Capability interface com.example.Summarizer method summarize must not declare an ActionResult parameter. Capabilities are pure dispatch — move post-processing (ActionResult-based) to a concrete method on the role class itself. ``` If you want post-processing, keep that specific method off the capability interface and declare it directly on the role (the legacy pattern). Capability-dispatched and role-owned methods coexist on the same role without interference. ## Migration To migrate a legacy role: 1. For each dispatched `@ActionSpec` method (LLM / MCP / WEB\_SERVICE) whose body is `return null;`, extract it to a `@Capability` interface. 2. Give the interface method a `default { throw Actions.dispatchedByFramework(); }` body. 3. On the role class, remove the original method entirely and add `implements YourCapability`. 4. Methods that have a meaningful body — typically because they declare an `ActionResult` parameter for post-processing — stay on the role class unchanged. 
### Before ```java @RoleIdentity(name = "Editor", goal = "Produce clean articles") public class EditorRole extends Role { @ActionSpec(type = ActionType.LLM, description = "Summarise the text", llmSystemPrompt = "You are concise.", llmTemperature = 0.2f) public String summarize(String text) { return null; } @ActionSpec(type = ActionType.LLM, description = "Translate to the target language") public String translate(String text, String targetLanguage) { return null; } } ``` ### After ```java @Capability public interface Summarizer { @ActionSpec(type = ActionType.LLM, description = "Summarise the text", llmSystemPrompt = "You are concise.", llmTemperature = 0.2f) default String summarize(String text) { throw Actions.dispatchedByFramework(); } } @Capability public interface Translator { @ActionSpec(type = ActionType.LLM, description = "Translate to the target language") default String translate(String text, String targetLanguage) { throw Actions.dispatchedByFramework(); } } @RoleIdentity(name = "Editor", goal = "Produce clean articles") public class EditorRole extends Role implements Summarizer, Translator { // No capability bodies. } ``` The first time `Summarizer` is used by another role, the duplication problem is already solved — update the prompt once, every adopter gets the change. ## When Not to Use Capabilities - **`ActionType.LOCAL` methods** — these have real bodies that the framework invokes via reflection. They are not "framework-dispatched" and do not have the `return null;` problem. Leave them on the concrete role class. - **Methods that declare an `ActionResult` parameter** — they are validated-out of capability interfaces by design (see above). Keep them on the role. - **One-off methods used by a single role** with no prospect of reuse — extracting a capability interface for one consumer adds indirection without benefit. Capabilities shine when two or more roles share the same `@ActionSpec`. 
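The difference between a `LOCAL` method and a framework-dispatched one shows up as soon as both are invoked reflectively. The sketch below is a simplified illustration, not TnsAI code: `invokeLocal` is a hypothetical helper that approximates a reflective call, and the nested `Summarizer` and `EditorRole` types stand in for a capability and a role.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class LocalVersusDispatchedSketch {

    // Hypothetical capability: the default body only marks "the framework dispatches this".
    public interface Summarizer {
        default String summarize(String text) {
            throw new UnsupportedOperationException("dispatched by framework");
        }
    }

    // Hypothetical role: wordCount has a real body, so it stays on the concrete class.
    public static class EditorRole implements Summarizer {
        public int wordCount(String text) {
            return text.isBlank() ? 0 : text.trim().split("\\s+").length;
        }
    }

    // Invokes a public one-String-argument method by name.
    public static Object invokeLocal(Object role, String method, String arg) throws Exception {
        Method m = role.getClass().getMethod(method, String.class);
        try {
            return m.invoke(role, arg);
        } catch (InvocationTargetException e) {
            // Unwrap so callers see the exception thrown by the method itself.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

Calling `invokeLocal(new EditorRole(), "wordCount", "one two three")` runs the real body, whereas the same call with `"summarize"` surfaces the `UnsupportedOperationException` from the default body. That is why only methods without a meaningful body belong on a capability interface.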
## Related - [Action System](/docs/agents/fundamentals/action-system) — routing, `ActionType` enum, executor types. - [Roles](/docs/agents/fundamentals/roles) — role identity, responsibilities, lifecycle. ## Implementation References - `com.tnsai.capabilities.Capability` — the marker annotation (`@Target(TYPE)`, `@Retention(RUNTIME)`). - `com.tnsai.actions.Actions.dispatchedByFramework()` — helper returning `DispatchedByFrameworkException` (subtype of `UnsupportedOperationException`). - `com.tnsai.actions.ActionDiscovery` — two-pass discovery: role class first, then the capability interface chain. --- # Event System URL: https://tnsai.dev/docs/agents/fundamentals/events Description: The event system provides full observability into the agent lifecycle. Events use a sealed interface hierarchy with 20+ event types, enabling type-safe pattern matching. import { Callout } from 'fumadocs-ui/components/callout' ## Subscribing to Events To listen to what your agent is doing, pass an event callback to `chatWithEvents`. The callback receives every event the agent fires during the run, and you can use Java's pattern matching to handle only the ones you care about. ```java String response = agent.chatWithEvents("Do research on AI", event -> { switch (event) { case RunStartEvent e -> log.info("Agent run started"); case ActionStartEvent e -> log.info("Action: {}", e.actionName()); case ToolCallStartEvent e -> log.info("Tool: {}", e.toolName()); case ToolCallEndEvent e -> log.info("Result: {}", e.result()); case ErrorEvent e -> log.error("Error: {}", e.message()); case RunEndEvent e -> log.info("Run completed"); default -> {} // Other events } }); ``` ## Event Types Events are grouped into categories based on what part of the agent they relate to. Each event carries contextual data you can inspect in your handler. ### Lifecycle Events These events tell you when the agent starts and stops processing, and when it transitions between states. 
Use them for logging, timing, or coordinating external systems. | Event | When | | ------------------------ | --------------------------------- | | `RunStartEvent` | Agent begins processing a message | | `RunEndEvent` | Agent finishes processing | | `AgentStateChangedEvent` | Agent state transitions | ### Action Events Actions are the high-level steps an agent takes (for example, "search the web" or "write a file"). These events let you track when each action starts and finishes. | Event | When | | ------------------ | -------------------------- | | `ActionStartEvent` | Action execution begins | | `ActionEndEvent` | Action execution completes | ### Tool Events Tools are the concrete functions an agent can call (like an HTTP client or a file reader). These events fire each time a tool is invoked, so you can monitor tool usage, measure latency, or build dashboards. | Event | When | | -------------------- | ------------------------- | | `ToolCallStartEvent` | Tool invocation begins | | `ToolCallEndEvent` | Tool invocation completes | ### Communication Events These events cover messages flowing to and from the agent, as well as any custom events you emit yourself using the `@EventEmitter` annotation. | Event | When | | ------------------- | ---------------------------------------- | | `MessageEvent` | Agent sends or receives a message | | `EventEmitterEvent` | Custom event emitted via `@EventEmitter` | ### Error Events When something goes wrong during a run, these events let you react immediately. `ErrorEvent` signals a hard failure, while `WarningEvent` signals a recoverable issue the agent can continue past. | Event | When | | -------------- | --------------------------------- | | `ErrorEvent` | An error occurs during processing | | `WarningEvent` | A non-fatal issue is detected | ## Event Publisher If you need to fire your own events from inside agent roles or custom logic, grab the publisher from the agent and call `publish`. 
Any registered handler will receive your custom event just like a built-in one. ```java TnsAIEventPublisher publisher = agent.getEventPublisher(); publisher.publish(new CustomEvent("data")); ``` ## Event Handler Registration Instead of handling all events in one big switch block, you can register a handler for a single event type. This is useful for focused concerns like metrics collection or audit logging. ```java EventHandlerRegistry registry = agent.getEventHandlerRegistry(); registry.register(ToolCallStartEvent.class, event -> { metrics.increment("tool.calls"); }); ``` ## Annotation-Based Handlers For the cleanest approach, annotate a method with `@EventHandler` and TnsAI will wire it up automatically. No manual registry calls needed -- just declare the event type you want and write your logic. ```java @EventHandler(ToolCallEndEvent.class) public void onToolComplete(ToolCallEndEvent event) { log.info("Tool {} took {}ms", event.toolName(), event.duration()); } ``` --- # Fundamentals URL: https://tnsai.dev/docs/agents/fundamentals Description: The core moving parts of a single agent. This page covers the Agent class itself — construction, chat, memory, lifecycle. See the other pages in this section for Roles, the Action System, Capabilities, and Events. import { Callout } from 'fumadocs-ui/components/callout' An `Agent` is the top-level orchestrator in TnsAI. It owns an LLM client, one or more roles, a memory store, and an event system. Agents handle the full chat loop: receiving a message, consulting their roles for available actions, calling the LLM, executing tool calls, and returning a response. 
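
The chat loop described above can be sketched in plain Java. This is a simplified illustration, not TnsAI's actual loop: `ChatLoopSketch`, `ModelReply`, and `scriptedModel` are hypothetical names, the `tools` map stands in for the actions an agent would collect from its roles, and the "model" is a scripted function rather than a real LLM client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ChatLoopSketch {
    /** What the hypothetical model returns: a tool request or a final answer. */
    record ModelReply(String toolName, String toolArg, String finalText) {
        boolean wantsTool() { return toolName != null; }
    }

    /** Runs the loop: call the model, execute any requested tool, repeat. */
    static String chat(String userMessage,
                       Function<List<String>, ModelReply> model,
                       Map<String, Function<String, String>> tools,
                       int maxSteps) {
        List<String> history = new ArrayList<>();
        history.add("user: " + userMessage);
        for (int step = 0; step < maxSteps; step++) {
            ModelReply reply = model.apply(history);
            if (!reply.wantsTool()) {
                history.add("assistant: " + reply.finalText());
                return reply.finalText();
            }
            Function<String, String> tool = tools.get(reply.toolName());
            String result = (tool == null)
                    ? "error: unknown tool " + reply.toolName()
                    : tool.apply(reply.toolArg());
            // Feed the tool result back so the next model call can use it.
            history.add("tool(" + reply.toolName() + "): " + result);
        }
        throw new IllegalStateException("no final answer after " + maxSteps + " steps");
    }

    /** Scripted stand-in for an LLM: asks for one tool call, then answers. */
    static ModelReply scriptedModel(List<String> history) {
        String last = history.get(history.size() - 1);
        if (last.startsWith("tool(")) {
            return new ModelReply(null, null, "Answer based on " + last);
        }
        return new ModelReply("upper", "bdi", null);
    }

    /** Wires the scripted model to a single hypothetical tool. */
    static String demo() {
        Map<String, Function<String, String>> tools = Map.of("upper", s -> s.toUpperCase());
        return chat("What is BDI?", ChatLoopSketch::scriptedModel, tools, 5);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The `maxSteps` bound is a design choice of the sketch: it keeps a model that never stops requesting tools from looping forever.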
## Quick Start

The fastest way to create an agent is with `AgentBuilder`:

```java
Agent agent = AgentBuilder.create()
    .llm(LLMClientFactory.create("openai", "gpt-4o", 0.7f))
    .role(RoleBuilder.create()
        .name("Assistant")
        .goal("Help users with their questions")
        .build())
    .build();

String response = agent.chat("What is BDI architecture?");
```

For more control, extend the `Agent` class directly:

```java
@AgentSpec(name = "ResearchAgent", description = "Conducts research")
public class ResearchAgent extends Agent {

    @Override
    protected LLMClient getLLM() {
        return LLMClientFactory.create("anthropic", "claude-sonnet-4-20250514", 0.7f);
    }

    @Override
    protected List<Role> getRoles() {
        return List.of(Role.create(ResearchRole.class));
    }
}
```

## Creating Agents

There are two ways to create an agent: programmatically with `AgentBuilder`, or declaratively by extending the `Agent` class and using annotations. Use the builder when you want quick, inline setup. Use annotations when you want a reusable agent class with its configuration baked in.

### With AgentBuilder (programmatic)

`AgentBuilder` lets you configure an agent in a single fluent chain. This is the best approach for simple agents or when you want to assemble an agent dynamically at runtime.

```java
Agent agent = AgentBuilder.create()
    .id("agent-001")
    .llm(new OpenAIClient("gpt-4o"))
    .role(myRole)
    .roles(List.of(role1, role2))
    .builtInTools(BuiltInTool.WEB_SEARCH_TOOLS, BuiltInTool.UTILITY_TOOLS)
    .toolPojos(new MyDomainTools())
    .memoryStore(new InMemoryStore())
    .maxContextTokens(8192)
    .build();
```

### With Annotations (declarative)

If you prefer a class-per-agent design, extend `Agent` and use `@AgentSpec` and `@LLMSpec` annotations. This keeps configuration next to the code and makes agents easy to discover in your project.
```java
@AgentSpec(name = "Analyst", description = "Data analysis agent")
@LLMSpec(provider = "openai", model = "gpt-4o", temperature = 0.3f)
public class AnalystAgent extends Agent {

    @Override
    protected List<Role> getRoles() {
        return List.of(Role.create(AnalystRole.class));
    }
}
```

## Chat Methods

Once you have an agent, you interact with it through chat methods. TnsAI provides several variants depending on whether you need conversation history, streaming output, or visibility into tool calls happening inside the agent loop.

```java
// Simple chat: single turn, uses conversation history
String response = agent.chat("Explain quantum computing");

// Chat without history
String translation = agent.chat("Translate this to French", false);

// Streaming: returns tokens as they arrive
Stream<String> tokens = agent.streamChat("Write a poem about Java");
tokens.forEach(System.out::print);

// Event-driven chat: full visibility into the agent loop
String report = agent.chatWithEvents("Research AI safety", event -> {
    switch (event) {
        case ToolCallStartEvent e -> System.out.println("Calling: " + e.toolName());
        case ToolCallEndEvent e -> System.out.println("Result: " + e.result());
        case ErrorEvent e -> System.err.println("Error: " + e.message());
        default -> {}
    }
});
```

## Memory Management

Agents automatically track conversation history so the LLM has context across turns. You can also inspect, modify, or prune this history directly when you need to manage token usage or reset a conversation.

```java
// Get conversation history
var history = agent.getConversationHistory();

// Clear all history
agent.clearHistory();

// Add a message manually
agent.addToHistory("user", "Remember this context");

// Prune memory to fit within a token limit (removes oldest messages first)
agent.getMemoryStore().prune(4096);
```

## Lifecycle

Agents have a start/stop lifecycle. Call `start()` to initialize the agent and `shutdown()` to release its resources.
You can check whether an agent is active or inspect its health state at any time.

```java
agent.start();
boolean running = agent.isRunning();
AgentHealthState health = agent.getHealthState();
agent.shutdown();
```

## Configuration Summary

This table lists every property you can set on an agent through `AgentBuilder`. Only at least one `role` is required. `llm` is optional (omit it for traditional agents), and everything else has sensible defaults.

| Property          | Builder Method                    | Default          | Description                                                      |
| ----------------- | --------------------------------- | ---------------- | ---------------------------------------------------------------- |
| ID                | `.id(String)`                     | Auto-generated   | Unique agent identifier                                          |
| LLM               | `.llm(LLMClient)`                 | Optional         | Language model client (omit for traditional agents)              |
| Roles             | `.role(Role)`                     | Required         | Agent roles                                                      |
| Built-in toolkits | `.builtInTools(BuiltInTool...)`   | Empty            | Shipped POJO toolkits from `tnsai-tools`                         |
| Custom toolkits   | `.toolPojos(Object...)`           | Empty            | Your own POJOs with `@Tool` methods                              |
| Runtime tools     | `.dynamicTool(DynamicToolMethod)` | Empty            | Tools whose identity is only known at runtime (e.g. MCP proxies) |
| Memory            | `.memoryStore(MemoryStore)`       | `InMemoryStore`  | Conversation memory                                              |
| Context limit     | `.maxContextTokens(int)`          | Provider default | Max context window                                               |
| Knowledge base    | `.knowledgeBase(KnowledgeBase)`   | None             | RAG knowledge source                                             |
| Prompt strategy   | `.promptStrategy(PromptStrategy)` | Default          | Prompt enhancement                                               |
| Reasoning         | `.reasoningStrategy(String)`      | None             | Reasoning strategy name                                          |

## SPI Extension Points

The Core module defines SPI interfaces that other modules implement.
Extensions are discovered automatically via `ServiceLoader`: | SPI Interface | Purpose | Implementing Module | | ---------------------- | ---------------------------------- | ------------------- | | `MessageBroker` | Agent communication routing | Coordination | | `ResilienceStrategy` | Resilience pattern implementations | Quality | | `CognitiveModel` | Cognitive processing models | Intelligence | | `CheckpointerFactory` | State checkpointing | Custom | | `CheckpointerProvider` | Checkpoint storage backends | Custom | Register an SPI implementation by adding a file to `META-INF/services/`: ``` # META-INF/services/com.tnsai.spi.MessageBroker com.example.MyCustomMessageBroker ``` ## Next in this Section - [Roles](/docs/agents/fundamentals/roles) — Bundling capabilities into `Role` classes. - [Action System](/docs/agents/fundamentals/action-system) — `@ActionSpec` routing and executor types. - [Capabilities](/docs/agents/fundamentals/capabilities) — Reusable body-less action contracts via `@Capability` interfaces. - [Events](/docs/agents/fundamentals/events) — The agent's lifecycle event bus. --- # Roles URL: https://tnsai.dev/docs/agents/fundamentals/roles Description: A Role defines what an agent can do. Each role has an identity (name, goal, domain), a set of responsibilities, and discoverable actions. Roles generate the system prompt that instructs the LLM. Actions are methods annotated with @ActionSpec — they are discovered at runtime via reflection and routed to one of four executor types. import { Callout } from 'fumadocs-ui/components/callout' ## Creating Roles There are two ways to create a role: programmatically with `RoleBuilder`, or declaratively with annotations. Pick whichever style fits your project -- they produce the same result. ### With RoleBuilder (programmatic) Use `RoleBuilder` when you want to define a role inline -- for example in tests, scripts, or when the role configuration is loaded dynamically at runtime. 
```java Role role = RoleBuilder.create() .name("Researcher") .goal("Find and synthesize information from academic sources") .domain("academic-research") .duty("Search papers", "Users need access to recent research") .duty("Summarize findings") .mustNever("Fabricate citations", "Academic integrity") .mustAlways("Include source references", "Traceability") .llm(new AnthropicClient("claude-sonnet-4-20250514")) .build(); ``` ### With Annotations (declarative) Use `@RoleSpec` when you want the role definition to live directly on the class. This is the preferred approach for production roles because everything -- name, capabilities, LLM config -- is visible at a glance. ```java @RoleSpec( name = "Researcher", description = "Finds and synthesizes academic information", beliefs = {"research_context", "available_sources"}, desires = {"find_papers", "synthesize_findings"}, intentions = {"search_database", "analyze_paper"}, capabilities = {"search", "analysis", "summarization"}, domains = {"academic", "research"}, responsibilities = { @Responsibility( name = "Paper Search", description = "Search academic databases", actions = {"searchPapers", "filterResults"} ) }, llm = @LLMConfig(provider = "anthropic", model = "claude-sonnet-4-20250514") ) public class ResearchRole extends Role { @Override public RoleIdentity getIdentity() { return new RoleIdentity("Researcher", "Find papers", "academic"); } @Override public List getResponsibilities() { return List.of( new CoreDuty("Search papers", "Find relevant research"), new CoreDuty("Analyze findings", "Extract key insights") ); } } ``` ## Actions An `Action` is a method annotated with `@ActionSpec` on a Role class. 
Actions are routed to one of four executor types based on their `ActionType`: | Type | Executor | Description | | ------------- | --------------------- | -------------------------------------------------------------------------- | | `LOCAL` | `TypedActionExecutor` | Direct method invocation via reflection | | `WEB_SERVICE` | `WebServiceExecutor` | HTTP REST API calls | | `LLM` | `LLMRoleExecutor` | Single-shot LLM call; tool dispatch via the agent's `ToolMethodDispatcher` | | `MCP_TOOL` | `McpToolExecutor` | Model Context Protocol tools | ### Defining Actions Annotate any method on your Role class with `@ActionSpec` to expose it as an action. The `type` field tells the framework which executor handles the call. ```java @ActionSpec( name = "searchPapers", description = "Search for academic papers on a topic", type = ActionType.LLM ) public String searchPapers(@LLMParam("The search query") String query) { // Implementation } ``` ## ActionResult When an action delegates to an external system (HTTP call, LLM tool, MCP tool), the framework executes the call and makes the raw result available as an `ActionResult`. There are two usage patterns: ### Pure Delegate (Abstract, No Body) If the method has no body (abstract or the framework handles it entirely), the framework executes the action and returns the result directly. 
No `ActionResult` parameter is needed:

```java
@ActionSpec(
    type = ActionType.WEB_SERVICE,
    endpoint = "https://api.example.com/users/{id}"
)
public abstract Object getUser(String id);
```

### Post-Process with ActionResult

Add an `ActionResult` parameter to receive the raw execution result and transform it before returning:

```java
@ActionSpec(
    type = ActionType.WEB_SERVICE,
    endpoint = "https://api.example.com/data/{id}"
)
public Object getData(String id, ActionResult result) {
    // Option 1: return the raw result as-is
    // return result;

    // Option 2: extract a single field
    Map json = result.asMap();
    return json.get("name");
}
```

### ActionResult API

`ActionResult` wraps the raw value returned by the external system and provides convenience methods for common conversions like JSON parsing and type deserialization.

| Method         | Return type | Description                                               |
| -------------- | ----------- | --------------------------------------------------------- |
| `getValue()`   | `Object`    | Raw result value                                          |
| `asString()`   | `String`    | Value as String (JSON-serialized if not already a String) |
| `asMap()`      | `Map`       | Value as Map (parsed from JSON if needed)                 |
| `asList()`     | `List`      | Value as List (parsed from JSON if needed)                |
| `asJson()`     | `JsonNode`  | Jackson `JsonNode` for flexible JSON traversal            |
| `as(Class<T>)` | `T`         | Deserialize to a specific type                            |
| `isNull()`     | `boolean`   | True if the underlying value is null                      |
| `isEmpty()`    | `boolean`   | True if null, empty String, empty Map, or empty List      |

### Example -- Transforming a Web Service Response

This example shows a common pattern: calling a weather API and reshaping the JSON response into a human-readable string before returning it to the agent.
```java @ActionSpec( type = ActionType.WEB_SERVICE, endpoint = "https://api.weather.com/forecast/{city}" ) public String getForecast(String city, ActionResult result) { JsonNode json = result.asJson(); String temp = json.path("main").path("temp").asText(); String desc = json.path("weather").get(0).path("description").asText(); return String.format("Temperature: %s, Conditions: %s", temp, desc); } ``` ## Role Accessors Once you have a `Role` instance, these methods let you inspect its identity, actions, safety constraints, and generated system prompt. This is useful for debugging, logging, or building tooling around roles. ```java RoleIdentity identity = role.identity(); String name = role.getName(); String goal = role.getGoal(); String domain = role.getDomain(); List actions = role.getActions(); int count = role.getActionCount(); boolean has = role.hasAction("searchPapers"); Optional action = role.getAction("searchPapers"); List mustNever = role.getMustNeverConstraints(); List mustAlways = role.getMustAlwaysConstraints(); String systemPrompt = role.getSystemPrompt(); String minimalPrompt = role.getMinimalPrompt(); ``` ## BDI Model TnsAI implements the Belief-Desire-Intention (BDI) architecture for agent reasoning: ```java Agent agent = AgentBuilder.create() .llm(llm) .role(role) .identity(new AgentIdentity("ResearchBot", "AI Research Assistant")) .belief(new Belief("domain", "artificial intelligence")) .belief(new Belief("max_papers", 10)) .desire(new Desire("find_papers", "Locate relevant research papers")) .desire(new Desire("synthesize", "Create comprehensive summaries")) .intention(new Intention("search", "Search academic databases")) .intention(new Intention("analyze", "Analyze paper contents")) .capability(new Capability("search", "Academic database search")) .capability(new Capability("summarization", "Text summarization")) .build(); ``` | Concept | Class | Purpose | | -------------- | ------------ | -------------------------------------------- | | 
**Belief** | `Belief` | What the agent knows (key-value pairs) | | **Desire** | `Desire` | What the agent wants to achieve (goals) | | **Intention** | `Intention` | How the agent plans to act (committed plans) | | **Capability** | `Capability` | What the agent can do (skills) | | **Plan** | `Plan` | Structured plan for achieving desires | --- # Agents URL: https://tnsai.dev/docs/agents Description: Everything about building a single agent — from the first Agent instance to advanced cognitive composition. import { Callout } from 'fumadocs-ui/components/callout' ## Sections - [Fundamentals](/docs/agents/fundamentals) — Agents, roles, actions, and the event bus. Start here. - [Behavior](/docs/agents/behavior) — Prompts, output parsing, streaming, memory, variants. - [Reliability](/docs/agents/reliability) — Resilience, error handling, schema identity. - [Advanced](/docs/agents/advanced) — Cognitive support, planner/reasoner handles, ensemble execution. ## Related - [Multi-Agent](/docs/multi-agent) — When one agent isn't enough. - [Capabilities](/docs/capabilities) — Tools, intelligence, RAG, LLM. --- # Error Handling URL: https://tnsai.dev/docs/agents/reliability/error-handling Description: TnsAI.Core provides a structured exception hierarchy rooted in TnsAIException. Every exception carries an error code, retryability flag, and suggested retry parameters, enabling automated recovery decisions across the framework. import { Callout } from 'fumadocs-ui/components/callout' ## TnsAIException (Base Class) All TnsAI exceptions extend `TnsAIException`, which itself extends `RuntimeException` (unchecked). 
```java
public class TnsAIException extends RuntimeException {
    public boolean isRetryable();
    public String getErrorCode();
    public long getSuggestedRetryDelayMs();
    public int getMaxRetryAttempts();
}
```

| Method                       | Description                                                                                      |
| ---------------------------- | ------------------------------------------------------------------------------------------------ |
| `isRetryable()`              | `true` for transient errors (network, rate limits, server errors)                                |
| `getErrorCode()`             | Auto-derived code in format `TNSAI-CLASSNAME` (e.g., `TNSAI-NETWORK`, `TNSAI-RATELIMIT`)         |
| `getSuggestedRetryDelayMs()` | Default `1000ms` for retryable, `0` for non-retryable. Subclasses override with specific delays. |
| `getMaxRetryAttempts()`      | Default `3` for retryable, `0` for non-retryable                                                 |

The error code is derived from the class name. As the examples below suggest, the `Exception` suffix is dropped and the remainder is upper-cased with `Locale.ROOT`, so the result does not depend on the JVM's default locale:

```java
// TnsAIException -> "TNSAI-TNSAI"
// LLMException -> "TNSAI-LLM"
// NetworkException -> "TNSAI-NETWORK"
```

## LLMException

When something goes wrong during a call to an LLM provider (invalid API key, context too long, server outage), the framework throws an `LLMException`. Each exception carries an `ErrorType` that tells you exactly what happened and whether it makes sense to retry.

```java
public class LLMException extends TnsAIException {
    public String getModel();
    public ErrorType getErrorType();
    public long getSuggestedRetryDelayMs();
}
```

### ErrorType Enum

The `ErrorType` enum classifies the root cause of an LLM failure. Use it to decide how your application should react -- for example, retrying a transient `SERVER_ERROR` but surfacing a permanent `AUTHENTICATION_FAILED` to the user.
| Value | Retryable | Description | | ----------------------- | --------- | -------------------------------------------------- | | `MODEL_NOT_FOUND` | No | Model not found or unavailable | | `AUTHENTICATION_FAILED` | No | Invalid API key or auth failure | | `CONTENT_FILTERED` | No | Content policy violation | | `INVALID_REQUEST` | No | Invalid request format | | `MODEL_OVERLOADED` | Yes | Model overloaded, try again | | `CONTEXT_TOO_LONG` | No | Context length exceeded | | `SERVER_ERROR` | Yes | Generic server error | | `MALFORMED_TOOL_CALL` | No | Bad JSON or missing fields in tool call from model | | `CAPABILITY_MISMATCH` | No | Model does not support a required capability | | `UNKNOWN` | Yes | Unknown error | Retry delays are error-type specific: - `MODEL_OVERLOADED` -- 5000ms - `SERVER_ERROR` -- 2000ms - All others -- 1000ms ### Factory Methods Instead of calling the constructor directly, use these static factory methods to create `LLMException` instances with the correct `ErrorType` and retry parameters already set. ```java LLMException.modelNotFound("gpt-5") LLMException.authenticationFailed("claude-sonnet-4", "Invalid API key") LLMException.contentFiltered("gpt-4o", "Violates content policy") LLMException.contextTooLong("gpt-4o-mini", 128000, 150000) LLMException.modelOverloaded("claude-sonnet-4") LLMException.serverError("gemini-2.5-flash", cause) LLMException.malformedToolCall("gpt-4o", "search", "Invalid JSON in arguments", cause) LLMException.malformedToolCall("gpt-4o", "search", "Missing required field 'query'") ``` ## RateLimitException LLM providers enforce request quotas and return HTTP 429 ("Too Many Requests") when you exceed them. `RateLimitException` wraps these responses and is always retryable, carrying the provider-suggested wait time so your code can back off automatically. 
```java public class RateLimitException extends TnsAIException { public Long getRetryAfterMs(); public String getService(); public Integer getRemainingQuota(); public long getSuggestedRetryDelayMs(); // Uses retryAfterMs, defaults to 60000ms public int getMaxRetryAttempts(); // Returns 5 } ``` ### Factory Methods These factory methods create `RateLimitException` instances from common rate-limit scenarios, automatically setting the correct retry delay and max retry count. ```java // Parse HTTP 429 Retry-After header (seconds -> ms conversion) RateLimitException.fromHttp429("openai", "30") // LLM quota exceeded (defaults to 300000ms / 5 minutes) RateLimitException.llmQuotaExceeded("claude-sonnet-4") // API endpoint rate limit with explicit delay RateLimitException.apiRateLimit("/api/v1/chat", 10000L) ``` ## ActionExecutionException When an agent action fails at runtime (a web service call times out, a parameter is missing, an MCP tool returns an error), the framework throws an `ActionExecutionException`. It includes the action name, type, and error category so you can programmatically decide whether to retry, fix parameters, or escalate. ```java public class ActionExecutionException extends TnsAIException { public String getActionName(); public ActionType getActionType(); public ErrorCategory getCategory(); public String getDetailedMessage(); } ``` ### ErrorCategory Enum Each `ActionExecutionException` is tagged with an `ErrorCategory` that groups the failure by root cause. This makes it straightforward to write a `switch` block that handles transient network errors differently from permanent validation errors. 
| Category | Retryable (default) | Description | | -------------- | ------------------- | ------------------------------- | | `NETWORK` | Yes | Connection timeout, DNS failure | | `PARAMETER` | No | Missing parameter, wrong type | | `CLIENT_ERROR` | No | HTTP 4xx status codes | | `SERVER_ERROR` | Yes | HTTP 5xx status codes | | `VALIDATION` | No | Contract violations | | `LLM` | Yes | Model errors, quota exceeded | | `MCP` | Yes | MCP tool errors | | `INVOCATION` | No | Reflection, method not found | | `UNKNOWN` | No | Unclassified errors | The `getDetailedMessage()` method produces a structured log line: ``` [WEB_SERVICE] Action 'fetchWeather' failed: Network error | Category: Network error | Retryable: true | Cause: SocketTimeoutException (Connect timed out) ``` ### Factory Methods Use these static factories to create `ActionExecutionException` instances with the correct category, retryability flag, and detailed message already populated. ```java ActionExecutionException.fromNetworkError("fetchWeather", ActionType.WEB_SERVICE, ioException) ActionExecutionException.fromParameterError("search", ActionType.LOCAL, "query is required", cause) ActionExecutionException.fromApiError("createIssue", ActionType.WEB_SERVICE, 503, "Service Unavailable", cause) ActionExecutionException.fromLLMError("summarize", ActionType.LLM, llmException) ActionExecutionException.fromMCPError("mcp-tool-name", mcpException) ActionExecutionException.fromInvocationError("calculate", ActionType.LOCAL, reflectionException) ``` ## Other Exceptions Beyond the main exception types above, TnsAI provides several specialized exceptions for network failures, timeouts, validation errors, capability mismatches, and control-flow signals. The table below summarizes their retry behavior and key fields. 
| Exception | Retryable | Retry Delay | Max Retries | Key Fields | | ------------------------------- | --------- | ------------------------ | ----------- | ------------------------------------------ | | `NetworkException` | Yes | 2000ms | 5 | `host`, `port` | | `TimeoutException` | Yes | `min(timeoutMs/2, 5000)` | 3 | `timeoutMs`, `operation` | | `ValidationException` | No | -- | -- | -- | | `ApprovalRequiredException` | No | -- | -- | `actionName`, `reason` | | `TaskCompleteException` | No | -- | -- | `summary`, `result`, `success`, `metadata` | | `LLMCapabilityException` | No | -- | -- | `provider`, `capability` | | `ToolCallNotSupportedException` | No | -- | -- | `model`, `provider` | ### NetworkException Factories Create `NetworkException` instances for common connectivity failures like refused connections, DNS resolution problems, and connection timeouts. ```java NetworkException.connectionRefused("api.example.com", 443) NetworkException.dnsResolutionFailed("api.example.com", cause) NetworkException.connectionTimeout("api.example.com", 443, 5000L) ``` ### TimeoutException Factories Create `TimeoutException` instances for operations that exceed their time budget, whether that is an LLM call, an HTTP request, or an action execution. ```java TimeoutException.llmTimeout(30000L) TimeoutException.httpTimeout("https://api.example.com/v1/chat", 10000L) TimeoutException.actionTimeout("fetchData", 5000L) ``` ### LLMCapabilityException Factories Thrown when you request a feature (streaming, vision, structured output) that the selected model or provider does not support. These are never retryable because the model simply lacks the capability. 
```java LLMCapabilityException.streamingNotSupported("phi", "Ollama") LLMCapabilityException.visionNotSupported("gpt-3.5-turbo", "OpenAI") LLMCapabilityException.structuredOutputNotSupported("llama-2", "Ollama") ``` ### TaskCompleteException (Control Flow) `TaskCompleteException` is not an error -- it is a control flow signal used to indicate that a task has been completed and the agent loop should terminate. ```java // Simple completion throw new TaskCompleteException("Analysis complete"); // With result data throw TaskCompleteException.withResult("Task done", Map.of("filesCreated", 5)); // Failed completion throw TaskCompleteException.failed("Could not complete", "API unavailable"); // With metadata throw TaskCompleteException.withMetadata("Done", Map.of("duration", "45s")); ``` Handling in the agent loop: ```java try { agent.run(task); } catch (TaskCompleteException e) { System.out.println("Summary: " + e.getSummary()); System.out.println("Success: " + e.isSuccess()); if (e.hasResult()) { MyResult result = e.getResultAs(MyResult.class); } } ``` ## Code Examples These examples show common patterns for handling TnsAI exceptions in your application code. ### Catching and Classifying Errors The recommended approach is to catch exceptions from most specific to least specific. This lets you handle rate limits, LLM-specific errors, and generic TnsAI errors each in the most appropriate way. ```java try { String response = agent.chat("Analyze this data"); } catch (RateLimitException e) { // Wait for the provider-specified delay Thread.sleep(e.getSuggestedRetryDelayMs()); // Retry... 
} catch (LLMException e) { if (e.getErrorType() == LLMException.ErrorType.CONTEXT_TOO_LONG) { // Truncate context and retry } else if (e.isRetryable()) { // Retry with backoff } else { // Log and fail logger.error("LLM error [{}]: {}", e.getErrorCode(), e.getMessage()); } } catch (TnsAIException e) { if (e.isRetryable()) { logger.warn("Retryable error [{}], retrying in {}ms", e.getErrorCode(), e.getSuggestedRetryDelayMs()); } else { logger.error("Non-retryable error [{}]: {}", e.getErrorCode(), e.getMessage()); } } ``` ### Handling Action Execution Errors When an action fails, you can use the error category to decide on recovery. Transient errors (network, server) can be retried with backoff, while parameter or validation errors need to be fixed before retrying. ```java try { executor.execute(action, params); } catch (ActionExecutionException e) { logger.error(e.getDetailedMessage()); switch (e.getCategory()) { case NETWORK, SERVER_ERROR -> { // Transient -- retry with backoff Thread.sleep(e.getSuggestedRetryDelayMs()); } case PARAMETER, VALIDATION -> { // Fix parameters and retry logger.warn("Fix parameters for action: {}", e.getActionName()); } case LLM -> { // Check nested LLMException for details if (e.getCause() instanceof LLMException llm) { logger.warn("LLM error type: {}", llm.getErrorType()); } } default -> throw e; } } ``` ### Using Error Codes for Monitoring Every `TnsAIException` carries a stable error code (like `TNSAI-NETWORK` or `TNSAI-RATELIMIT`) that you can use as a metric tag in your monitoring system. This example shows how to increment a counter on each failure for dashboards and alerting. ```java try { agent.chat("query"); } catch (TnsAIException e) { metrics.counter("tnsai.errors", "code", e.getErrorCode(), "retryable", String.valueOf(e.isRetryable()) ).increment(); throw e; } ``` --- # Reliability URL: https://tnsai.dev/docs/agents/reliability Description: Making agents survive the real world. 
import { Callout } from 'fumadocs-ui/components/callout' ## Pages - [Resilience](/docs/agents/reliability/resilience) — Retry, fallback, health state, circuit breaker. - [Error Handling](/docs/agents/reliability/error-handling) — Error types, propagation, recovery. - [Long-Running Runs](/docs/agents/reliability/long-running-runs) — Checkpoint, resume, idempotent retry, runtime cost ceiling for hour-scale executions. - [Schema Identity](/docs/agents/reliability/schema-identity) — Deterministic schema hashes across LLM providers. --- # Long-Running Runs URL: https://tnsai.dev/docs/agents/reliability/long-running-runs Description: Multi-hour and multi-day agent executions need a categorically different runtime than minute-scale interactive sessions. Process crashes, runtime upgrades, transient API failures, and operator-initiated pauses all happen — the framework's reliability layer makes them survivable rather than catastrophic. import { Callout } from 'fumadocs-ui/components/callout' The `com.tnsai.reliability` package in `tnsai-core` ships the primitives: - **`Checkpoint`** + **`CheckpointStore`** — durable state at step boundaries - **`ResumableRun`** + **`DefaultResumableRun`** — execution driver that consumes the stores - **`Progress`** + **`ProgressSink`** — observable mid-run state for live dashboards - **`RunConfig`** — duration / retries / cost / progress-timeout policy bundle - **`Outcome`** + **`AbortReason`** — terminal-result sealed type Pairs with the [idempotency layer](/docs/capabilities/tools/idempotency) (`tnsai-core.idempotency`) so side-effecting steps don't double-execute on retry, and with [cost governance](/docs/security/cost-governance) so a runaway loop hits a configured ceiling instead of burning unbounded spend. 
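
The interplay between durable state and idempotency can be sketched in plain Java. This is a toy model under stated assumptions, not the `com.tnsai.reliability` implementation: `ResumeSketch`, `Step`, and `Store` are hypothetical names, the store is an in-memory map standing in for a durable one, and the sketch covers only the "don't double-execute on retry" property, not cost ceilings or progress reporting.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class ResumeSketch {
    /** A named step; every step here is treated as side-effecting. */
    record Step(String name, UnaryOperator<String> body) {}

    /** Stand-in for a durable store: step name -> recorded result. */
    static final class Store {
        final Map<String, String> results = new HashMap<>();
    }

    /**
     * Runs the steps in order. A step whose result is already recorded is
     * skipped and its recorded result reused, so a re-run after a crash
     * does not repeat the side effect.
     */
    static String run(String input, List<Step> steps, Store store) {
        String state = input;
        for (Step step : steps) {
            String recorded = store.results.get(step.name());
            if (recorded != null) {
                state = recorded;               // replay without side effect
                continue;
            }
            state = step.body().apply(state);   // if this throws, nothing is recorded
            store.results.put(step.name(), state);
        }
        return state;
    }

    /** Simulates a crash in the second step, then a second run over the same store. */
    static String demo() {
        int[] posts = {0};
        boolean[] crashOnce = {true};
        List<Step> steps = List.of(
                new Step("post-pr", s -> { posts[0]++; return s + "[posted]"; }),
                new Step("verify", s -> {
                    if (crashOnce[0]) {
                        crashOnce[0] = false;
                        throw new IllegalStateException("simulated crash");
                    }
                    return s + "[verified]";
                }));
        Store store = new Store();
        try {
            run("start", steps, store);
        } catch (IllegalStateException simulatedCrash) {
            // The process "dies" here; the store survives.
        }
        String state = run("start", steps, store);
        return "posts=" + posts[0] + " state=" + state;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo the `post-pr` side effect runs once even though the run is started twice, because the second run finds the recorded result and skips the step.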
## When to use

You probably want this if your agent run:

- takes longer than the longest interactive session you'd let the user wait on (rule of thumb: > 5 minutes)
- calls external systems with non-trivial side effects (creates issues, posts messages, runs payments, kicks off CI builds)
- has a per-run budget worth enforcing during execution rather than after the fact
- needs to be inspectable from a separate process / dashboard while it's running

You probably don't need it for:

- single-LLM-call request handlers
- read-only research agents that always finish in seconds
- one-shot CLI utilities without resumption needs

## Quick start

```java
import com.tnsai.reliability.*;
import com.tnsai.idempotency.InMemoryIdempotencyStore;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.time.Duration;

// 1. Pick stores. InMemory for tests; Filesystem / SQLite for persistence.
CheckpointStore checkpoints = new FilesystemCheckpointStore(Path.of("/var/tns/checkpoints"));
var idempotency = new InMemoryIdempotencyStore();
ProgressSink progress = new InMemoryProgressSink();

// 2. Define a state codec: encode/decode whatever your run threads
//    through its steps. JSON is the recommended default; here we use UTF-8
//    bytes for a string state.
DefaultResumableRun.StateCodec<String> codec = new DefaultResumableRun.StateCodec<>() {
    public byte[] encode(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    public String decode(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
};

// 3. Compose the run. Each step is a UnaryOperator with a name and a
//    side-effecting flag. Side-effecting steps consult the idempotency
//    store so a retry doesn't double-execute.
DefaultResumableRun run = DefaultResumableRun.builder()
    .codec(codec)
    .checkpointStore(checkpoints)
    .idempotencyStore(idempotency)
    .progressSink(progress)
    .step(DefaultResumableRun.Step.pure("plan", state -> state + "\n[plan]"))
    .step(DefaultResumableRun.Step.sideEffect("post-pr", state -> {
        githubClient.createPr(...);
        return state + "\n[posted]";
    }))
    .step(DefaultResumableRun.Step.pure("verify", state -> state + "\n[verified]"))
    .build();

// 4. Execute with a config bound by duration / retries / cost / progress timeout.
RunConfig config = RunConfig.builder()
    .maxDuration(Duration.ofHours(8))
    .costCeilingUSD(new BigDecimal("5.00"))
    .checkpointInterval(Duration.ofMinutes(5))
    .maxRetries(3)
    .progressTimeout(Duration.ofMinutes(10))
    .build();

Outcome result = run.execute("start", config);
```

The orchestrator returns one of four [`Outcome`](#outcome-shape) variants when the run terminates.

## Outcome shape

`Outcome` is a sealed interface, so pattern-match exhaustively:

| Variant             | When                                                                                                                                       |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `Outcome.Completed` | Run reached the end of its step list                                                                                                       |
| `Outcome.Failed`    | A step exhausted its retry budget; carries the throwable                                                                                   |
| `Outcome.Aborted`   | Policy stopped the run (`AbortReason`): user-requested, budget exhausted, max-duration exceeded, progress timeout, or retry-limit exceeded |
| `Outcome.Suspended` | Run yielded cleanly at a step boundary; resumable via the carried `checkpointId`                                                           |

```java
Outcome result = run.execute(input, config);
String finalState = switch (result) {
    case Outcome.Completed c -> c.result();
    case Outcome.Failed f -> { logger.error("Failed: {}", f.reason()); yield ""; }
    case Outcome.Aborted a -> { alert("Aborted: " + a.reason()); yield ""; }
    case Outcome.Suspended s -> { /* resume later via run.resume(s.runId()) */ yield ""; }
};
```

## Resume after a crash

After a process crash, runtime
upgrade, or `OutOfMemoryError`, the run is recoverable as long as the configured `CheckpointStore` survives:

```java
// New process, same store backend.
DefaultResumableRun run = /* ... same builder, same store paths ... */;
Outcome result = run.resume(savedRunId);
// Resume loads the latest checkpoint, decodes the state via the codec,
// and continues from the next step.
```

If `run.resume(runId)` finds no checkpoint for the id, it returns `Outcome.Failed` with a clear "no such run" message rather than silently starting a fresh run. Confusing one for the other is the bug pattern this primitive prevents.

## Side-effect safety: the kill-and-resume invariant

The unit test that defines the contract:

> A side-effecting step that runs once and crashes mid-execution must not run a second time on resume; its cached result is replayed instead.

Implementation: `DefaultResumableRun` keys side-effecting steps by `runId:stepIndex:stepName` against the configured `IdempotencyStore`. On the original run the orchestrator records the step's return value via `IdempotencyEntry.forSuccess(...)`; on resume the orchestrator checks the store before invoking the body. Cached hit → replay; miss → execute + record.

Concretely: a step that posted a GitHub PR, captured the PR id in its return state, and then crashed before saving its checkpoint will, on resume:

- find the cached PR id in the idempotency store under `<runId>:<stepIndex>:post-pr`
- skip the body (no second `POST /pulls` to GitHub)
- thread the cached state forward into the next step

This is what makes hour-scale runs against rate-limited / cost-bearing external APIs feasible.
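The replay rule described above can be sketched in a few lines. This is a minimal illustration, not the code of `DefaultResumableRun`: `ReplaySketch` and its in-memory map are hypothetical stand-ins for the orchestrator and the configured `IdempotencyStore`, and only the `runId:stepIndex:stepName` key layout is taken from the text.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the orchestrator's idempotency check;
// this is NOT the framework's IdempotencyStore API.
final class ReplaySketch {
    // Recorded step results, keyed by runId:stepIndex:stepName.
    private final Map<String, String> recorded = new ConcurrentHashMap<>();

    // Key layout follows the documented runId:stepIndex:stepName scheme.
    static String key(String runId, int stepIndex, String stepName) {
        return runId + ":" + stepIndex + ":" + stepName;
    }

    // Invokes the body at most once per key; later calls replay the recorded state.
    String runSideEffect(String runId, int stepIndex, String stepName,
                         String state, UnaryOperator<String> body) {
        String k = key(runId, stepIndex, stepName);
        String cached = recorded.get(k);
        if (cached != null) {
            return cached;                  // hit: replay, body is not invoked
        }
        String result = body.apply(state);  // miss: execute the side effect
        recorded.put(k, result);            // record the result for later resumes
        return result;
    }
}
```

Calling `runSideEffect` twice with the same run id, step index, and step name invokes the body only the first time. The sketch ignores durability and the window between the body returning and the result being recorded, both of which a real store would have to address.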
## CheckpointStore selection | Store | When | | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `InMemoryCheckpointStore` | Tests, single-process workloads, "checkpointing-on" baseline. Doesn't survive restarts. | | `FilesystemCheckpointStore` | Single-process production. JSON files under `//.json`. Atomic write-tmp-then-rename with `fsync`. | | `SqliteCheckpointStore` | Personal-fleet workloads (one binary on a laptop / NAS). Single SQLite file, durable across restarts. Optional `xerial:sqlite-jdbc` dep — pull only when used. | | `RedisCheckpointStore` | Multi-process fleet sharing a Redis instance. Per-checkpoint blob + per-run sorted set for O(log N) latest / range queries. Reuses the existing Jedis dep (no new artefact). | | `S3CheckpointStore` | Distributed deployments where checkpoints must outlive a single host — runs started on box A may resume on box B / region 2. JSON object per checkpoint at `s3:////.json`. Optional `software.amazon.awssdk:s3` dep. | ### Picking between distributed stores | Concern | Redis | S3 | | ------------------- | ---------------------------- | ------------------------------------------------ | | **Latency** | sub-ms intra-region | tens of ms (eventually-consistent regions: more) | | **Durability** | RDB / AOF — operator-tunable | 11 nines — managed | | **Cost** | proportional to memory + ops | proportional to storage + requests | | **Operations** | needs a Redis cluster | managed by AWS | | **Region affinity** | typically per-region cluster | cross-region replication available | Use Redis for fast resume cycles (operator-paused runs picked up within minutes); use S3 for archival / cross-region durability. 
The two are not mutually exclusive — a layered stack with a Redis hot path and an S3 cold path is a reasonable production shape, but ships as a consumer-side composite rather than a built-in. ## Cost ceiling enforcement `DefaultResumableRun.Builder#costMonitor` is optional. Provide one and pair it with `RunConfig.costCeilingUSD` to make the orchestrator check spend before every step: ```java // Wire to the cost-governance store from tnsai-quality. CostBudgetStore costStore = ...; // your spend ledger DefaultResumableRun.CostMonitor monitor = () -> Optional.of(costStore.currentSpend( CostScope.tenant(tenantId), Duration.ofHours(24))); DefaultResumableRun run = DefaultResumableRun.builder() // ... .costMonitor(monitor) .build(); run.execute(input, RunConfig.builder() .costCeilingUSD(new BigDecimal("10.00")) // $10 cap .build()); ``` When `currentSpend ≥ ceiling`, the orchestrator: 1. Saves a final checkpoint at the current state 2. Publishes `Progress.RunAborted(BUDGET_EXHAUSTED)` 3. Returns `Outcome.Aborted` The consumer can resume the run on a higher ceiling once it's available — the saved checkpoint preserves work-done-so-far. ## Progress events `Progress` is a sealed interface with eight variants. Subscribe to a run's events for live dashboards or progress-timeout enforcement: ```java ProgressSink.Subscription sub = sink.subscribe(runId, event -> { switch (event) { case Progress.StepStarted s -> dashboard.markStepRunning(s.stepIndex()); case Progress.StepCompleted s -> dashboard.markStepDone(s.stepIndex(), s.took()); case Progress.CheckpointSaved c -> dashboard.checkpointAt(c.checkpointId()); case Progress.CostUpdate c -> dashboard.spendTick(c.spentUSD()); case Progress.Heartbeat h -> dashboard.heartbeat(h.currentStep()); case Progress.RunCompleted r -> dashboard.markDone(); case Progress.RunFailed r -> dashboard.markFailed(r.reason()); case Progress.RunAborted r -> dashboard.markAborted(r.reason()); } }); // Cancel when the dashboard tab closes. 
sub.cancel(); ``` `InMemoryProgressSink` is the default. `KafkaProgressSink` and similar fan-out adapters are tracked as follow-ups; the SPI accepts third-party impls today. ## RunConfig defaults | Field | Default | Notes | | -------------------- | ------- | -------------------------------------------------------------- | | `maxDuration` | 8h | Hard wall-clock cap; `MAX_DURATION_EXCEEDED` on breach | | `costCeilingUSD` | empty | Optional; without it, runtime cost-ceiling checks are skipped | | `checkpointInterval` | 5 min | Caps worst-case replay window after a crash | | `maxRetries` | 3 | Per-step retry budget before `Outcome.Failed` | | `progressTimeout` | 10 min | "stuck" detector — abort if no `Progress` event in this window | Use `RunConfig.defaults()` for an overnight-run-friendly starting point, or `RunConfig.builder()` for finer control. ## Tool annotations Tools that participate in resumable runs benefit from declaring their effect classification + idempotency expectation. Two annotations were added to `@Tool` for this: ```java import com.tnsai.annotations.*; public class GithubTools { @Tool(name = "github.create_pr", description = "Open a pull request on a GitHub repo", sideEffect = SideEffect.EXTERNAL, idempotencyHint = IdempotencyHint.REQUIRED) public PullRequest createPr(@ToolParam("title") String title, ...) { ... } @Tool(name = "github.list_branches", description = "List branches on a GitHub repo", sideEffect = SideEffect.READ) // idempotencyHint defaults to NONE public List listBranches(...) { ... } } ``` `SideEffect`: | Value | Meaning | | ---------- | ---------------------------------------------------------------------- | | `NONE` | Pure function. Default. | | `READ` | Reads external state, doesn't mutate. | | `WRITE` | Mutates state inside the framework or a system the framework controls. | | `EXTERNAL` | Calls a third-party system whose effect we can't reverse. 
| `IdempotencyHint`: | Value | Meaning | | ---------- | ---------------------------------------------------------------------------------------------------------- | | `NONE` | No tracking. Default. | | `OPTIONAL` | Track when caller supplies a key, otherwise dispatch unguarded. | | `REQUIRED` | Refuse to dispatch without an idempotency key. Reserve for payments / public posts / irreversible deletes. | Both fields default to safe values so existing tools retain their current behaviour without modification — the annotations are purely additive metadata. ## What's not in this layer Out of scope for the framework primitives, tracked separately: - **Distributed transaction coordination** (sagas) — checkpointing is local; cross-system consistency is a different problem. - **Run inspection UI** — the `ProgressSink` API is enough for now; UI is downstream. - **Backfill for runs without checkpointing** — additive feature, not retroactive. - **Harness execution-loop refactor** — the existing `AgentExecutor` will move onto `ResumableRun` as part of the harness work (TNS-291). ## See Also - **[Idempotency](/docs/capabilities/tools/idempotency)** — the SPI consumed by side-effecting steps - **[Cost Governance](/docs/security/cost-governance)** — the spend ledger the cost monitor wires into - **[Resilience](/docs/agents/reliability/resilience)** — companion retry / fallback / circuit-breaker layer for individual operations - **[Error Handling](/docs/agents/reliability/error-handling)** — how step exceptions surface and propagate --- # Resilience URL: https://tnsai.dev/docs/agents/reliability/resilience Description: TnsAI.Core provides a declarative resilience framework built on top of Resilience4j. The @Resilience annotation configures retry, circuit breaker, rate limiting, bulkhead isolation, timeout, and fallback policies for actions and roles. The ResilienceExecutor applies these policies in a layered pipeline and tracks terminal failures in a dead-letter queue. 
import { Callout } from 'fumadocs-ui/components/callout' ## @Resilience Annotation The `@Resilience` annotation is the single entry point for declaring all resilience policies on an action or role. Apply it to a method to configure that specific action, or to a class to set defaults for all actions in the role. Method-level annotations override type-level ones. ```java @Documented @Retention(RetentionPolicy.RUNTIME) @Target({ElementType.METHOD, ElementType.TYPE}) public @interface Resilience { Retry retry() default @Retry(); CircuitBreaker circuitBreaker() default @CircuitBreaker(); RateLimit rateLimit() default @RateLimit(); int timeout() default 0; // milliseconds, 0 = no timeout String fallback() default ""; // fallback method name (same signature) boolean bulkhead() default false; // thread pool isolation int maxConcurrent() default 10; // max concurrent calls for bulkhead } ``` ### @Retry When a transient error occurs (network timeout, server 500), retrying after a short delay often succeeds. The `@Retry` sub-annotation configures automatic retries with exponential backoff (each retry waits longer) and optional jitter (randomized delays to avoid thundering herd problems). ```java @interface Retry { int maxAttempts() default 0; // 0 = no retries int backoffMs() default 1000; // initial delay double multiplier() default 2.0; // backoff multiplier int maxBackoffMs() default 30000; // max delay cap Class[] retryOn() default {}; // empty = all Class[] noRetryOn() default {}; // exclusions boolean jitter() default true; // add randomization } ``` Example: ```java @Resilience(retry = @Retry( maxAttempts = 3, backoffMs = 500, multiplier = 2.0, maxBackoffMs = 10000, retryOn = {NetworkException.class, RateLimitException.class}, jitter = true )) public String fetchData(String query) { ... } ``` With the default `multiplier = 2.0` and `jitter = true`, delays are approximately: 500ms, 1000ms, 2000ms (with random jitter applied to each). 
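To make the delay arithmetic concrete, here is a small sketch of how such a schedule could be computed. `BackoffSketch` is a hypothetical helper, not part of TnsAI or Resilience4j, and the ±25% jitter range is an assumption, since the exact jitter formula is not specified here.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper illustrating exponential backoff with a cap and jitter.
final class BackoffSketch {
    // Delay before retry number `retry` (1-based):
    // backoffMs * multiplier^(retry - 1), capped at maxBackoffMs.
    static long baseDelayMs(int retry, long backoffMs, double multiplier, long maxBackoffMs) {
        double raw = backoffMs * Math.pow(multiplier, retry - 1);
        return (long) Math.min(raw, (double) maxBackoffMs);
    }

    // Assumed jitter: a uniform value within +/-25% of the base delay,
    // still capped at maxBackoffMs.
    static long jitteredDelayMs(int retry, long backoffMs, double multiplier, long maxBackoffMs) {
        long base = baseDelayMs(retry, backoffMs, multiplier, maxBackoffMs);
        long spread = base / 4;
        long jittered = base - spread + ThreadLocalRandom.current().nextLong(2 * spread + 1);
        return Math.min(jittered, maxBackoffMs);
    }
}
```

With `backoffMs = 500`, `multiplier = 2.0`, and `maxBackoffMs = 10000`, `baseDelayMs` yields 500, 1000, and 2000 for the first three retries and stops growing once it reaches the 10000 ms cap.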
### @CircuitBreaker

If a downstream service is down, retrying every request wastes resources and can cascade failures across your system. A circuit breaker tracks failures and, after a threshold is reached, "opens" the circuit to reject all calls immediately. After a cooldown period, it allows a few test calls through ("half-open") to check if the service has recovered.

```java
@interface CircuitBreaker {
    boolean enabled() default false;
    int failureThreshold() default 5;       // failures before opening
    int failureWindowMs() default 60000;    // window for counting failures
    int resetTimeoutMs() default 30000;     // wait before half-open
    int successThreshold() default 3;       // successes needed in half-open
    int failureRateThreshold() default 0;   // percentage (0-100), alternative to count
}
```

Example:

```java
@Resilience(circuitBreaker = @CircuitBreaker(
    enabled = true,
    failureThreshold = 5,
    failureWindowMs = 60000,
    resetTimeoutMs = 30000,
    successThreshold = 3
))
public String callExternalApi() { ... }
```

States: **Closed** (normal) -> **Open** (reject all calls) -> **Half-Open** (allow `successThreshold` test calls) -> **Closed** (if tests pass).

### @RateLimit

Rate limiting prevents your application from overwhelming a downstream service or exceeding API quotas. The `@RateLimit` sub-annotation lets you cap the number of requests in a time window, with four different strategies for how that limit is enforced.

```java
@interface RateLimit {
    boolean enabled() default false;
    int maxRequests() default 100;
    int windowMs() default 60000;   // 1 minute
    Strategy strategy() default Strategy.SLIDING_WINDOW;
}
```

## Rate Limit Strategies

The four strategies differ in how smoothly they distribute requests over time. Choose based on whether your use case tolerates bursts or needs strict even pacing.
| Strategy | Description | | ---------------- | ------------------------------------------------------------------------------------------------ | | `FIXED_WINDOW` | Counts requests in fixed time windows. Simple but can allow bursts at window boundaries. | | `SLIDING_WINDOW` | Counts requests in a sliding time window. Smoother rate control than fixed window. | | `TOKEN_BUCKET` | Tokens are added at a fixed rate. Each request consumes one token. Allows controlled bursts. | | `LEAKY_BUCKET` | Requests are processed at a fixed rate. Excess requests queue or are rejected. Smoothest output. | Example: ```java @Resilience(rateLimit = @RateLimit( enabled = true, maxRequests = 60, windowMs = 60000, strategy = RateLimit.Strategy.TOKEN_BUCKET )) public String queryLlm(String prompt) { ... } ``` ## Combining Policies In production, you typically want multiple resilience layers working together. A single `@Resilience` annotation can configure retry, circuit breaker, rate limiting, timeout, bulkhead isolation, and a fallback method all at once. ```java @Resilience( retry = @Retry(maxAttempts = 3, backoffMs = 1000), circuitBreaker = @CircuitBreaker(enabled = true, failureThreshold = 5, resetTimeoutMs = 30000), rateLimit = @RateLimit(enabled = true, maxRequests = 100, windowMs = 60000), timeout = 5000, bulkhead = true, maxConcurrent = 10, fallback = "fetchDataFallback" ) public Result fetchData(String query) { ... } // Fallback must have same signature public Result fetchDataFallback(String query) { return Result.cached(query); } ``` Type-level defaults apply to all actions in a role unless overridden: ```java @Resilience( retry = @Retry(maxAttempts = 2), timeout = 10000 ) public class ApiRole extends Role { @Resilience(retry = @Retry(maxAttempts = 5)) // overrides type-level retry public String criticalCall() { ... } // Uses type-level defaults: 2 retries, 10s timeout public String normalCall() { ... 
} } ``` ## ResilienceExecutor The `ResilienceExecutor` is the engine that applies all your resilience policies at runtime. It builds a layered pipeline where each policy wraps the next, using Resilience4j under the hood. You can also use it programmatically (without annotations) for ad-hoc resilient operations. ### Pipeline Order Policies are applied in a specific order, from outermost to innermost. Rate limiting is checked first (to avoid unnecessary work), then bulkhead isolation, then the circuit breaker, and finally retries wrap the actual operation. ``` Rate Limit -> Bulkhead -> Circuit Breaker -> Retry -> Operation ``` Each layer wraps the next. When an operation fails and exhausts all resilience layers, the failure is recorded in the dead-letter queue. ### Construction You can create a `ResilienceExecutor` with defaults, inject a custom dead-letter queue, or take full control over all Resilience4j registries. ```java // Default registries + in-memory DLQ ResilienceExecutor executor = new ResilienceExecutor(); // Custom dead-letter queue ResilienceExecutor executor = new ResilienceExecutor(myDlq); // Full control over all registries ResilienceExecutor executor = new ResilienceExecutor( circuitBreakerRegistry, bulkheadRegistry, rateLimiterRegistry, deadLetterQueue ); ``` ### Programmatic Usage with ResilienceRequest When you need resilience for a one-off operation that is not tied to an annotated action method, use `ResilienceRequest` to define the operation and its policies inline. 
```java
ResilienceExecutor executor = new ResilienceExecutor();

String result = executor.execute(
    ResilienceRequest.builder()
        .operationId("fetch-weather")
        .operation(() -> httpClient.get("https://api.weather.com/current"))
        .retryPolicy(RetryPolicy.defaultPolicy())
        .rateLimit(60, 60000)   // 60 requests per minute
        .bulkhead(5)            // max 5 concurrent calls
        .timeout(5000)          // 5 second timeout
        .fallbackStrategy(ex -> "Weather data unavailable")
        .build()
);
```

### FallbackStrategy

When all retries are exhausted and the operation still fails, a fallback strategy provides a default value instead of throwing an exception. This is useful for returning cached data or a graceful degradation response.

```java
@FunctionalInterface
public interface FallbackStrategy<T> {
    T fallback(Exception exception);

    // Default: accepts all exceptions
    default boolean supports(Exception exception) { return true; }
}
```

Custom fallback with selective exception handling:

```java
FallbackStrategy<String> fallback = new FallbackStrategy<>() {
    @Override
    public String fallback(Exception exception) { return "Cached result"; }

    @Override
    public boolean supports(Exception exception) { return exception instanceof NetworkException; }
};
```

### Exception Handling

When a resilience policy rejects a request (rather than the underlying operation failing), the executor catches these Resilience4j-specific exceptions so you know which layer blocked the call.

| Exception                   | Meaning                       |
| --------------------------- | ----------------------------- |
| `CallNotPermittedException` | Circuit breaker is open       |
| `BulkheadFullException`     | Max concurrent calls exceeded |
| `RequestNotPermitted`       | Rate limit exceeded           |
| `TimeoutException`          | Operation timed out           |

All failures are recorded in the dead-letter queue before being re-thrown.

## DeadLetterQueue

When an operation fails even after retries, circuit breaker bypass, and fallback attempts, the failure is not silently discarded.
Instead, it is recorded in a `DeadLetterQueue` (DLQ) so you can monitor terminal failures, alert on them, or replay the operations later.

### Interface

The DLQ interface is intentionally simple: enqueue failed entries, and query them by operation ID or in bulk.

```java
public interface DeadLetterQueue {
    void enqueue(DeadLetterEntry entry);
    List<DeadLetterEntry> getEntries();
    List<DeadLetterEntry> getEntries(String operationId);
    int size();
}
```

### DeadLetterEntry

Each DLQ entry captures everything you need to understand and potentially replay a failed operation: which operation failed, what exception occurred, when it happened, and any additional context as metadata.

```java
DeadLetterEntry entry = DeadLetterEntry.builder()
    .operationId("fetch-weather")
    .exceptionType("NetworkException")
    .exceptionMessage("Connection timed out")
    .timestamp(Instant.now())
    .metadata(Map.of("host", "api.weather.com", "port", 443))
    .build();
```

Fields:

- `id` -- auto-generated UUID
- `operationId` -- identifies the operation that failed
- `exceptionType` -- exception class name
- `exceptionMessage` -- exception message
- `timestamp` -- when the failure occurred
- `metadata` -- additional context (immutable map)

### Accessing the DLQ

You can query the DLQ through the executor to get all failures, filter by operation ID, or check the total failure count.

```java
ResilienceExecutor executor = new ResilienceExecutor();

// After operations...
DeadLetterQueue dlq = executor.getDeadLetterQueue();

// All failures
List<DeadLetterEntry> allFailures = dlq.getEntries();

// Failures for a specific operation
List<DeadLetterEntry> weatherFailures = dlq.getEntries("fetch-weather");

// Total failure count
int failureCount = dlq.size();
```

The default implementation (`InMemoryDeadLetterQueue`) stores entries in memory. Implement the `DeadLetterQueue` interface for persistent storage (database, Redis, etc.) and pass it to the `ResilienceExecutor` constructor.

## Code Examples

These examples show resilience in practice, from a fully annotated action to DLQ monitoring.
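### Custom DLQ Sketch

As a first illustration of the four-method DLQ contract, here is a rough sketch of a bounded in-memory queue that evicts its oldest entry when full. `SketchEntry` and `SketchQueue` are hypothetical stand-ins so the block compiles on its own; they only mirror the shape of `DeadLetterEntry` and `DeadLetterQueue`. The eviction policy is an assumption, not documented framework behaviour.

```java
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Stand-in for DeadLetterEntry (hypothetical; the real class has more fields).
record SketchEntry(String operationId, String exceptionType,
                   String exceptionMessage, Instant timestamp) {}

// Stand-in mirroring the four methods of the DeadLetterQueue interface.
interface SketchQueue {
    void enqueue(SketchEntry entry);
    List<SketchEntry> getEntries();
    List<SketchEntry> getEntries(String operationId);
    int size();
}

// Keeps only the newest `capacity` entries so memory use stays bounded.
final class BoundedDeadLetterQueue implements SketchQueue {
    private final int capacity;
    private final Deque<SketchEntry> entries = new ArrayDeque<>();

    BoundedDeadLetterQueue(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    @Override public synchronized void enqueue(SketchEntry entry) {
        if (entries.size() == capacity) entries.removeFirst(); // evict the oldest entry
        entries.addLast(entry);
    }

    @Override public synchronized List<SketchEntry> getEntries() {
        return List.copyOf(entries); // immutable snapshot, oldest first
    }

    @Override public synchronized List<SketchEntry> getEntries(String operationId) {
        List<SketchEntry> matches = new ArrayList<>();
        for (SketchEntry e : entries) {
            if (e.operationId().equals(operationId)) matches.add(e);
        }
        return List.copyOf(matches);
    }

    @Override public synchronized int size() { return entries.size(); }
}
```

A persistent variant would keep the same method shapes and swap the `ArrayDeque` for database or Redis calls.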
### Action with Full Resilience This example shows a real-world action with all resilience layers active: retries for network and rate-limit errors, a circuit breaker to stop calling a failing service, rate limiting to stay within API quotas, a timeout, and a fallback that returns cached data. ```java @ActionSpec(type = ActionType.WEB_SERVICE, name = "fetchStockPrice") @Resilience( retry = @Retry( maxAttempts = 3, backoffMs = 1000, multiplier = 2.0, retryOn = {NetworkException.class, RateLimitException.class} ), circuitBreaker = @CircuitBreaker( enabled = true, failureThreshold = 5, resetTimeoutMs = 30000 ), rateLimit = @RateLimit( enabled = true, maxRequests = 120, windowMs = 60000, strategy = RateLimit.Strategy.SLIDING_WINDOW ), timeout = 10000, fallback = "fetchStockPriceFallback" ) public double fetchStockPrice(String symbol) { return stockApi.getPrice(symbol); } public double fetchStockPriceFallback(String symbol) { return cacheStore.getLastKnownPrice(symbol); } ``` ### Monitoring Failures via DLQ In production, you want to know when operations are permanently failing. This example sets up a periodic check that logs all DLQ entries, which you can hook into your alerting system. 
```java ResilienceExecutor executor = new ResilienceExecutor(); // Periodic monitoring ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(); scheduler.scheduleAtFixedRate(() -> { DeadLetterQueue dlq = executor.getDeadLetterQueue(); int count = dlq.size(); if (count > 0) { logger.warn("Dead letter queue has {} entries", count); for (DeadLetterEntry entry : dlq.getEntries()) { logger.warn(" Failed: {} - {} at {}", entry.getOperationId(), entry.getExceptionType(), entry.getTimestamp()); } } }, 0, 1, TimeUnit.MINUTES); ``` --- # Schema, Identity, and Enums URL: https://tnsai.dev/docs/agents/reliability/schema-identity Description: TnsAI.Core provides typed schemas for LLM tool definitions, agent identity and communication style modeling, decentralized identifiers (DIDs), and a set of enums that control agent behavior. import { Callout } from 'fumadocs-ui/components/callout' ## ToolDefinition `com.tnsai.schema.ToolDefinition` is a typed record representing an LLM tool/function definition. It replaces untyped `Map` schemas with compile-time safety. 
### Record Fields ```java public record ToolDefinition( String name, // required, non-blank String description, // defaults to "" Map parameters, // JSON Schema object, never null String endpoint, // optional API endpoint, defaults to "" String apiKeyEnvVar // optional env var name for API key ) ``` ### Creating Definitions ```java // Builder (recommended) ToolDefinition tool = ToolDefinition.builder() .name("search_weather") .description("Get weather for a city") .addParameter("city", "string", "City name", true) .addParameter("unit", "string", "Temperature unit", false) .addEnumParameter("format", "Output format", List.of("json", "text"), false) .endpoint("https://api.weather.com/v1") .apiKeyEnvVar("WEATHER_API_KEY") .build(); // Simple (no parameters) ToolDefinition simple = ToolDefinition.of("get_time", "Returns current UTC time"); // From OpenAI-format map ToolDefinition parsed = ToolDefinition.fromMap(existingMap); ``` ### Conversion ```java // To OpenAI function calling format Map map = tool.toMap(); // {"type": "function", "function": {"name": ..., "description": ..., "parameters": ...}} // Batch conversions List defs = ToolDefinition.fromMaps(listOfMaps); List> maps = ToolDefinition.toMaps(listOfDefs); ``` ### Query Methods | Method | Return | Description | | ---------------------- | --------------------- | ----------------------------------- | | `hasParameters()` | `boolean` | True if properties map is non-empty | | `requiredParameters()` | `List` | Names of required parameters | | `properties()` | `Map` | Parameter property schemas | | `requiresApiKey()` | `boolean` | True if `apiKeyEnvVar` is set | ### Builder API | Method | Description | | ------------------------------------------------------- | -------------------------------- | | `name(String)` | Set tool name | | `description(String)` | Set description | | `addParameter(name, type, description, required)` | Add a parameter | | `addEnumParameter(name, description, values, required)` | Add 
enum-constrained parameter | | `parameters(Map)` | Set raw parameters schema | | `endpoint(String)` | Set API endpoint URL | | `apiKeyEnvVar(String)` | Set API key environment variable | | `noApiKeyRequired()` | Clear API key requirement | ## ToolSchemaGenerator `com.tnsai.schema.ToolSchemaGenerator` generates tool/function schemas for LLM function calling from roles and tools. ### Generating Schemas ```java ToolSchemaGenerator generator = new ToolSchemaGenerator(); // Typed ToolDefinition records (preferred) List defs = generator.generateToolDefinitions(roles); // Single action ToolDefinition def = generator.generateToolDefinition(action); // From a standalone Tool ToolDefinition def = generator.generateToolDefinition(tool); // Legacy Map format List> schemas = generator.generateToolSchemas(roles); Map schema = generator.generateToolSchema(action); // Human-readable description for system prompts String desc = generator.generateToolsDescription(roles); ``` ### Configuration ```java // Enable/disable example inclusion in schemas generator.withExamples(true); ``` ### Java-to-JSON-Schema Type Mapping | Java Type | JSON Schema Type | | ------------------------------------------------- | ----------------------------------- | | `String`, `char`, `Character` | `"string"` | | `int`, `Integer`, `long`, `Long`, `short`, `byte` | `"integer"` | | `double`, `Double`, `float`, `Float` | `"number"` | | `boolean`, `Boolean` | `"boolean"` | | Arrays, `List` | `"array"` | | Enums | `"string"` (with `enum` constraint) | | Other | `"object"` | ## AgentIdentity `com.tnsai.models.agent.AgentIdentity` represents a personality trait or characteristic that shapes agent behavior. Identities are included in the system prompt to influence communication style. 
```java public final class AgentIdentity { private final String name; // required, non-blank private final String description; // required, non-blank } ``` ### Usage ```java AgentIdentity analytical = new AgentIdentity( "analytical", "Takes a data-driven approach to problem solving" ); AgentIdentity empathetic = new AgentIdentity( "empathetic", "Shows understanding and consideration for user emotions" ); // In an Agent subclass @Override protected List getIdentities() { return List.of( new AgentIdentity("expert", "Deep knowledge in software engineering"), new AgentIdentity("patient", "Takes time to explain concepts clearly"), new AgentIdentity("thorough", "Considers all aspects before responding") ); } ``` ### Common Identity Types | Category | Examples | | ---------- | ------------------------------------------ | | Cognitive | analytical, creative, logical, intuitive | | Social | friendly, professional, empathetic, direct | | Behavioral | proactive, thorough, efficient, cautious | | Domain | expert, specialist, generalist | The class is immutable and thread-safe. It supports Jackson JSON serialization via `@JsonCreator`/`@JsonProperty`. ## Communication (Style) `com.tnsai.models.agent.Communication` is a record that defines how an agent expresses itself. ```java public record Communication( Tone tone, Formality formality, Verbosity verbosity ) ``` ### Usage ```java Communication style = new Communication( Tone.FRIENDLY, Formality.CASUAL, Verbosity.CONCISE ); // Default style Communication defaultStyle = Communication.defaultStyle(); // Tone.PROFESSIONAL, Formality.NEUTRAL, Verbosity.MODERATE // Get description String desc = style.getDescription(); // "Tone: Warm and approachable, Formality: ..., Verbosity: ..." // Generate prompt section for LLM String promptSection = style.generatePromptSection(); ``` Null values default to: `Tone.PROFESSIONAL`, `Formality.NEUTRAL`, `Verbosity.MODERATE`. 
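One common way to get this kind of null-defaulting in a Java record is a compact constructor. The sketch below shows that pattern under stated assumptions: `CommunicationSketch` and the trimmed-down enums are hypothetical stand-ins (listing only values that appear on this page), not the framework's source.

```java
// Stand-in enums; only values mentioned on this page are listed.
enum Tone { ANALYTICAL, EMPATHETIC, ASSERTIVE, FRIENDLY, PROFESSIONAL, CREATIVE }
enum Formality { CASUAL, NEUTRAL }
enum Verbosity { CONCISE, MODERATE }

// Hypothetical record mirroring the shape of Communication.
record CommunicationSketch(Tone tone, Formality formality, Verbosity verbosity) {
    // Compact constructor: replaces null components with the documented defaults
    // before the fields are assigned.
    CommunicationSketch {
        tone = (tone != null) ? tone : Tone.PROFESSIONAL;
        formality = (formality != null) ? formality : Formality.NEUTRAL;
        verbosity = (verbosity != null) ? verbosity : Verbosity.MODERATE;
    }
}
```

With this pattern, `new CommunicationSketch(null, Formality.CASUAL, null)` ends up with `Tone.PROFESSIONAL`, `Formality.CASUAL`, and `Verbosity.MODERATE`, so callers never observe a null component.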
## DID (Decentralized Identifier)

`com.tnsai.identity.DID` implements W3C DID Core for agent identification. Format: `did:<method>:<method-specific-id>`

```java
// Parse from string
DID did = DID.parse("did:wba:example.com:agent-123");

// Create from components
did = new DID("wba", "example.com:agent-123");

// Factory methods
DID wba = DID.createWba("example.com", "agent-123"); // did:wba:example.com:agent-123
DID web = DID.createWeb("example.com");              // did:web:example.com
```

### Methods

| Method | Return | Description |
| --- | --- | --- |
| `getMethod()` | `String` | DID method (e.g., `"wba"`, `"web"`, `"key"`) |
| `getMethodSpecificId()` | `String` | Method-specific identifier |
| `getDidString()` | `String` | Full DID string |
| `asString()` | `String` | Alias for `getDidString()` |
| `toURI()` | `URI` | DID as a `java.net.URI` |
| `isWba()` | `boolean` | Check if `did:wba` |
| `isWeb()` | `boolean` | Check if `did:web` |

Validation: the method must be lowercase alphanumeric (`[a-z0-9]+`). An invalid format throws `IllegalArgumentException`.

## Core Enums

### ActionType

`com.tnsai.enums.ActionType` -- execution method for actions.

| Value | Description |
| --- | --- |
| `LOCAL` | Direct Java method invocation |
| `WEB_SERVICE` | HTTP API calls |
| `LLM` | LLM with tool selection |
| `MCP_TOOL` | Model Context Protocol |

### AgentVariant

`com.tnsai.enums.AgentVariant` -- quality/speed/cost tiers.
| Variant | Quality | Speed | Cost | Description |
| --- | --- | --- | --- | --- |
| `HIGH` | MAX | SLOW | HIGH | Complex/critical tasks |
| `MEDIUM` | BALANCED | NORMAL | MEDIUM | Regular development |
| `MINI` | BASIC | FAST | LOW | Quick fixes |
| `AUTO` | ADAPTIVE | ADAPTIVE | OPTIMAL | Task-based selection |

```java
agent.setVariant(AgentVariant.HIGH);

// Auto-select based on task keywords
AgentVariant suggested = AgentVariant.forTask("Complex refactoring"); // HIGH
suggested = AgentVariant.forTask("Fix typo");                         // MINI
```

Key methods: `isQualityFocused()`, `isSpeedFocused()`, `isCostOptimized()`, `forTask(String)`.

Sub-enums: `Quality` (MAX/BALANCED/BASIC/ADAPTIVE), `Speed` (SLOW/NORMAL/FAST/ADAPTIVE), `Cost` (HIGH/MEDIUM/LOW/OPTIMAL).

### Tone

`com.tnsai.enums.agent.Tone` -- communication tone.

| Value | Description |
| --- | --- |
| `ANALYTICAL` | Analytical and logical |
| `EMPATHETIC` | Understanding and compassionate |
| `ASSERTIVE` | Direct and confident |
| `FRIENDLY` | Warm and approachable |
| `PROFESSIONAL` | Formal and business-like |
| `CREATIVE` | Imaginative and innovative |

### Formality

`com.tnsai.enums.agent.Formality` -- language formality level.

### Verbosity

`com.tnsai.enums.agent.Verbosity` -- response length preference.

### AuthType

`com.tnsai.enums.AuthType` -- authentication types for web service actions: `NO_AUTH`, `BEARER`, `BASIC`.

### HttpMethod

`com.tnsai.enums.HttpMethod` -- HTTP methods: `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, `HEAD`, `OPTIONS`.
## Related Documentation

- [Action System](/docs/agents/fundamentals/action-system) -- how ActionType routes to executors
- [Tools](/docs/capabilities/tools/registration) -- the Tool interface that ToolDefinition describes
- [Variants](/docs/agents/behavior/variants) -- detailed variant configuration
- [Advanced Agent Features](/docs/agents/advanced) -- how identities and variants are used
- [Roles](/docs/agents/fundamentals/roles) -- role-based action discovery with @ActionSpec

---

# Capabilities

URL: https://tnsai.dev/docs/capabilities

Description: Pluggable building blocks that give an agent power beyond raw LLM calls.

import { Callout } from 'fumadocs-ui/components/callout'

## Sections

- [Tools](/docs/capabilities/tools) -- Built-in catalog (62 POJO toolkits, \~206 `@Tool` methods, 29 categories), custom tools, registration.
- [Skills](/docs/capabilities/skills) -- On-demand modular knowledge (Claude Code's Skills layer), `SKILL.md` parser, resolver policies, per-skill tool scope.
- [Intelligence](/docs/capabilities/intelligence) -- Planning (GOAP/HTN), reasoning (ReAct/ToT), FSM, context, learning.
- [RAG](/docs/capabilities/rag) -- Knowledge bases, retrieval strategies, production pipelines.
- [LLM](/docs/capabilities/llm) -- Providers, routing, caching, cost tracking, audio.

## Related

- [Agents](/docs/agents) -- Where capabilities get composed.
- [Deploy & Integrate](/docs/integrate) -- Where capabilities reach the outside world.

---

# Advanced Intelligence Patterns

URL: https://tnsai.dev/docs/capabilities/intelligence/advanced

Description: Advanced cognitive capabilities in TnsAI.Intelligence for reasoning, memory consolidation, output validation, and iterative refinement.

import { Callout } from 'fumadocs-ui/components/callout'

## Reasoning Strategies

TnsAI.Intelligence provides three advanced reasoning executors that go beyond simple prompt-response patterns. Each implements a different strategy for exploring solution spaces.
### ReActExecutor

The `ReActExecutor` implements the ReAct (Reason + Act) pattern: an LLM explicitly reasons about what to do, takes an action, observes the result, and repeats until the goal is achieved. This produces structured Thought/Action/Observation traces for interpretability and decision auditing. Based on "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022).

```java
ReActExecutor react = ReActExecutor.builder()
    .llm(client)
    .actionHandler((name, input) -> {
        if ("search".equals(name)) return Optional.of(searchService.search(input));
        if ("calculate".equals(name)) return Optional.of(calculator.eval(input));
        return Optional.empty();
    })
    .maxSteps(10)
    .thoughtFormat(ThoughtFormat.STRUCTURED)
    .stopCondition(StopCondition.finalAnswerDetected())
    .availableActions("search, calculate")
    .timeout(Duration.ofMinutes(5))
    .build();

ReActResult result = react.execute("Find the population of Tokyo and compare it to NYC");

System.out.println("Answer: " + result.getFinalAnswer());
System.out.println("Status: " + result.getStatus()); // SUCCESS, TIMEOUT, ERROR, MAX_STEPS_REACHED, STOPPED
System.out.println("Steps: " + result.getSteps().size());
System.out.println("LLM calls: " + result.getTotalLLMCalls());
System.out.println("Duration: " + result.getDuration());

// Inspect individual reasoning steps
for (ReActStep step : result.getSteps()) {
    System.out.println("Step " + step.getStepNumber());
    System.out.println("  Thought: " + step.getThought());
    System.out.println("  Action: " + step.getAction());
    System.out.println("  Observation: " + step.getObservation());
}
```

**ThoughtFormat options**: `STRUCTURED` (formal Thought/Action/Observation format) or `FREE_FORM` (open-ended reasoning).

The `ActionHandler` is a `@FunctionalInterface` that takes an action name and input, returning `Optional` (empty means the action is not available).

**ReActStatus values**: `SUCCESS`, `TIMEOUT`, `ERROR`, `MAX_STEPS_REACHED`, `STOPPED`.
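As the number of actions grows, the chain of `if` checks in the handler can be replaced by a lookup table that keeps the same "empty means not available" contract. The sketch below is a hypothetical helper, not part of TnsAI: `ActionRegistrySketch` and its methods are illustrative names, and it assumes actions map a `String` input to a `String` observation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper; only the (name, input) -> Optional shape comes from the documentation.
final class ActionRegistrySketch {
    private final Map<String, Function<String, String>> actions = new LinkedHashMap<>();

    // Registers an action under a name; returns this so calls can be chained.
    ActionRegistrySketch register(String name, Function<String, String> action) {
        actions.put(name, action);
        return this;
    }

    // Unknown names yield Optional.empty(), mirroring "empty means action not available".
    Optional<String> handle(String name, String input) {
        Function<String, String> action = actions.get(name);
        return action == null ? Optional.empty() : Optional.ofNullable(action.apply(input));
    }

    // Comma-separated action names in registration order.
    String availableActions() {
        return String.join(", ", actions.keySet());
    }
}
```

If the handler's type parameters line up, a method reference such as `registry::handle` could plausibly be passed to `.actionHandler(...)`, and `registry.availableActions()` to `.availableActions(...)`, so the two stay in sync.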
### TreeOfThoughtsExecutor

The `TreeOfThoughtsExecutor` explores multiple reasoning paths by generating candidate thoughts at each step, evaluating them for promise, pruning low-quality branches, and continuing on promising paths. Based on "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" (Yao et al., 2023).

```java
TreeOfThoughtsExecutor tot = TreeOfThoughtsExecutor.builder()
    .llm(client)
    .evaluator(BranchEvaluator.llm(evalClient))
    .pruning(PruningStrategy.BEAM_SEARCH)
    .beamWidth(3)
    .maxDepth(5)
    .branchingFactor(3)
    .pruneThreshold(0.3)
    .timeout(Duration.ofMinutes(10))
    .build();

ToTResult result = tot.explore("Design a REST API for a todo app");

System.out.println("Best path: " + result.getBestPath());
System.out.println("Total nodes: " + result.getTotalNodes());
System.out.println("Pruned: " + result.getPrunedNodes());
System.out.println("Max depth reached: " + result.getMaxDepthReached());
```

**PruningStrategy options**:

| Strategy | Behavior |
| --- | --- |
| `BEAM_SEARCH` | Keep top-K nodes by score at each level (default) |
| `BEST_FIRST` | Sort by cumulative score, keep top-K |
| `GREEDY` | Keep only the single best node |
| `DEPTH_LIMITED` | No score-based pruning, only depth limit |
| `EXHAUSTIVE` | No pruning at all |

### GraphOfThoughtsExecutor

The `GraphOfThoughtsExecutor` extends Tree of Thoughts by allowing cycles and merging of thought branches. It is better suited to problems where partial solutions can be combined. Based on "Graph of Thoughts" (Besta et al., 2023).
```java
GraphOfThoughtsExecutor got = GraphOfThoughtsExecutor.builder()
    .llm(client)
    .evaluator(BranchEvaluator.llm(client))
    .operations(List.of(GoTOperation.GENERATE, GoTOperation.AGGREGATE, GoTOperation.REFINE))
    .maxNodes(20)
    .branchingFactor(3)
    .timeout(Duration.ofMinutes(10))
    .build();

GoTResult result = got.explore("Design a database schema for e-commerce");

System.out.println("Best thought: " + result.getBestThought());
System.out.println("Aggregated insight: " + result.aggregatedInsight());
System.out.println("Total nodes: " + result.totalNodes());
System.out.println("Merges performed: " + result.mergeCount());
System.out.println("Duration: " + result.duration());
```

**GoTOperation types**: `GENERATE` (create child thoughts), `AGGREGATE` (merge top-scoring nodes into a unified solution), `REFINE` (improve existing thoughts), `SCORE` (evaluate nodes).

The executor uses frontier-based exploration: each iteration generates children, refines promising nodes (score > 0.3), and aggregates top nodes from different branches. The frontier is pruned to the top `branchingFactor` nodes each round.

> **Cross-reference**: For basic reasoning configuration, see [Reasoning](/docs/capabilities/intelligence/reasoning).

## Atom of Thought (AoT)

The `AotProcessor` implements the Atom of Thought reasoning technique, which decomposes complex problems into independent "atoms" solved in parallel. Unlike Chain-of-Thought, where errors in early steps propagate, AoT isolates atoms so one bad result does not ruin the answer. Based on "Atom of Thoughts for Markov LLM Test-Time Scaling" (arXiv, Feb 2025).
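The error-isolation idea can be illustrated with plain `CompletableFuture`s: each atom runs concurrently, and a failing atom produces an empty result instead of failing the batch. This is a minimal sketch under stated assumptions, not the `AotProcessor` internals; `AtomIsolationSketch` and `solveAll` are hypothetical names, atoms are modeled as `Supplier<String>`, and dependencies between atoms are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Illustrative only: hypothetical names, independent atoms, no dependency resolution.
final class AtomIsolationSketch {

    // Runs every atom concurrently; a failing atom yields Optional.empty() instead of aborting the batch.
    static Map<String, Optional<String>> solveAll(Map<String, Supplier<String>> atoms) {
        Map<String, CompletableFuture<Optional<String>>> futures = new LinkedHashMap<>();
        atoms.forEach((id, atom) -> futures.put(id,
                CompletableFuture.supplyAsync(atom)
                        .thenApply(Optional::of)
                        .exceptionally(error -> Optional.empty())));

        // Wait for all atoms and collect results in the original order.
        Map<String, Optional<String>> results = new LinkedHashMap<>();
        futures.forEach((id, future) -> results.put(id, future.join()));
        return results;
    }
}
```

A synthesis step can then combine whichever atoms succeeded and decide how to treat the empty ones, which is the property that distinguishes this from a sequential chain.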
### Benefits and Trade-offs

- +30-40% accuracy improvement on complex reasoning tasks
- Error isolation between atoms
- Parallel execution of independent atoms
- Per-atom confidence tracking
- Trade-off: +20-30% token usage due to multiple LLM calls

### Usage

```java
// Simple usage
AotProcessor aot = new AotProcessor(llmClient);
AotResult result = aot.process("Calculate compound interest for $1000 at 5% for 3 years");
System.out.println("Answer: " + result.getAnswer());
System.out.println("Confidence: " + result.getConfidence() + "%");

// With configuration
aot = AotProcessor.builder()
    .llmClient(llmClient)
    .maxAtoms(8)
    .parallelExecution(true)
    .minConfidence(40f)
    .complexityThreshold(50)
    .synthesisStrategy(AtomSynthesizer.SynthesisStrategy.WEIGHTED_COMBINATION)
    .build();

result = aot.process(complexQuestion);
System.out.println(result.getDetailedTrace());

// Force AoT even for simple questions
AotResult forced = aot.processForced("What is 2+2?");

// Check question complexity before processing
int score = aot.getComplexityScore("Compare Python and Java for web development");
```

### How It Works

1. **Complexity check**: The processor scores the question for complexity (multi-step indicators, calculation keywords, comparison keywords). Below the threshold, it processes the question as a single atom.
2. **AtomGenerator**: Decomposes the problem into atoms. Tries heuristic patterns first (calculation, comparison, list), then falls back to LLM-based decomposition. Each atom has an `id`, `description`, `prompt`, `type` (REASONING, RETRIEVAL, COMPUTATION, VALIDATION, SYNTHESIS), `dependencies`, and `priority`.
3. **AtomSolver**: Solves atoms in parallel with dependency resolution. Groups atoms by level (independent atoms first, then dependent ones). Includes dependency context in prompts and extracts confidence scores from LLM responses.
4.
   **AtomSynthesizer**: Combines atom solutions into a final answer using the configured synthesis strategy, weighting by confidence.

Call `aot.shutdown()` to release the executor thread pool when done.

## Memory Consolidation

The `MemoryConsolidationPipeline` automates the transition from short-term conversation memory to long-term knowledge storage.

### Pipeline Steps

1. Extract knowledge from the session (via `KnowledgeExtractor`)
2. Filter by confidence threshold
3. Apply forgetting curve to older knowledge
4. Store high-value knowledge (via `KnowledgeStore`)
5. Compact conversation context if a compactor is provided

```java
MemoryConsolidationPipeline pipeline = MemoryConsolidationPipeline.builder()
    .knowledgeExtractor(extractor)
    .contextCompactor(compactor)
    .knowledgeStore(store)
    .forgettingCurve(ForgettingCurve.EXPONENTIAL)
    .compactionConfig(compactionConfig)
    .minKnowledgeConfidence(0.5)
    .build();

ConsolidationResult result = pipeline.consolidate(session);

System.out.println("Extracted: " + result.knowledgeCount() + " knowledge items");
System.out.println("Original turns: " + result.originalCount());
System.out.println("Consolidated turns: " + result.consolidatedCount());
System.out.println("Summary: " + result.summary());
System.out.println("Duration: " + result.duration());
```

### ForgettingCurve

Controls how memory importance decays over time when not accessed. Knowledge that decays below 0.1 is automatically deleted from the store.
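As a rough illustration of the decay and the 0.1 cutoff, the sketch below re-implements the `LINEAR` and `EXPONENTIAL` formulas documented for this enum. It is a hypothetical stand-in (`ForgettingCurveSketch`), not the library class, and it assumes age is measured in fractional days and that "below 0.1" is a strict comparison.

```java
import java.time.Duration;

// Hypothetical re-implementation of the documented formulas; not the library's ForgettingCurve enum.
enum ForgettingCurveSketch {
    NONE {
        @Override double decay(double score, Duration age) { return score; }
    },
    LINEAR {
        @Override double decay(double score, Duration age) {
            return score * Math.max(0.0, 1.0 - days(age) * 0.01);
        }
    },
    EXPONENTIAL {
        @Override double decay(double score, Duration age) {
            return score * Math.exp(-0.05 * days(age));
        }
    };

    // Assumption: knowledge scoring strictly below this value is dropped.
    static final double DELETE_BELOW = 0.1;

    abstract double decay(double score, Duration age);

    boolean shouldDelete(double score, Duration age) {
        return decay(score, age) < DELETE_BELOW;
    }

    // Assumption: age is converted to fractional days.
    private static double days(Duration age) {
        return age.toMillis() / 86_400_000.0;
    }
}
```

Under these assumptions, a score of 0.9 decays to 0.837 after seven days on the linear curve and to roughly 0.634 on the exponential one; on the exponential curve it falls under the 0.1 cutoff after about 44 days.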
| Curve | Formula | Behavior |
| --- | --- | --- |
| `NONE` | No decay | All memories retain full importance forever |
| `LINEAR` | `score * max(0, 1 - days * 0.01)` | Gradual linear decline, reaches zero after \~100 days |
| `EXPONENTIAL` | `score * e^(-0.05 * days)` | Ebbinghaus-inspired rapid initial decay that slows over time |

```java
double decayed = ForgettingCurve.EXPONENTIAL.decay(0.9, Duration.ofDays(7));
```

> **Cross-reference**: For context management fundamentals, see [Context](/docs/capabilities/intelligence/context).

## Context Compaction

The `ContextCompactor` interface provides strategies for reducing conversation context size when approaching the token limit.

### ContextCompactor Interface

```java
public interface ContextCompactor {
    CompactionResult compact(List> messages, CompactionConfig config);
    boolean shouldCompact(int currentTokens, int maxTokens, CompactionConfig config);
    int estimateTokens(List> messages);
}
```

### TwoPhaseCompactor

Combines truncation and LLM summarization in two phases for optimal context reduction.

- **Phase 1** (`TruncatingCompactor`): Truncates large tool-call arguments and tool-result content in older messages. Fast, requires no LLM call.
- **Phase 2** (`LLMContextCompactor`): Summarizes older messages using an LLM. Only runs if phase 1 alone is insufficient.
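The two phases can be pictured as the control flow below. This is a simplified sketch, not the library implementation: `TwoPhaseSketch` is a hypothetical class, messages are plain strings rather than structured maps, tokens are estimated as characters divided by four, and the summarizer is any function standing in for the LLM call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch with hypothetical names; messages are plain strings, tokens ~ chars / 4.
final class TwoPhaseSketch {

    static int estimateTokens(List<String> messages) {
        int chars = 0;
        for (String m : messages) chars += m.length();
        return chars / 4;
    }

    static List<String> compact(List<String> messages, int tokenBudget, int maxChars,
                                int preserveLastN, Function<List<String>, String> summarizer) {
        // Index of the first message that must be kept intact.
        int keepFrom = Math.max(0, messages.size() - preserveLastN);

        // Phase 1: truncate oversized older messages; no LLM call involved.
        List<String> truncated = new ArrayList<>();
        for (int i = 0; i < messages.size(); i++) {
            String m = messages.get(i);
            truncated.add(i < keepFrom && m.length() > maxChars ? m.substring(0, maxChars) + "..." : m);
        }
        if (estimateTokens(truncated) <= tokenBudget || keepFrom == 0) {
            return truncated;
        }

        // Phase 2: only reached when truncation was not enough; summarize the older messages.
        List<String> result = new ArrayList<>();
        result.add(summarizer.apply(truncated.subList(0, keepFrom)));
        result.addAll(truncated.subList(keepFrom, truncated.size()));
        return result;
    }
}
```

Running the cheap phase first means the summarizer, which costs an LLM call, is invoked only when the truncated context still exceeds the budget.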
```java
LLMClient summarizer = LLMClientFactory.create("openai", "gpt-4o-mini");
TwoPhaseCompactor compactor = new TwoPhaseCompactor(
    new TruncatingCompactor(200),        // max 200 chars per truncated field
    new LLMContextCompactor(summarizer)
);

CompactionConfig config = CompactionConfig.builder()
    .thresholdRatio(0.85)   // trigger compaction at 85% capacity
    .preserveLastN(5)       // keep last 5 messages intact
    .build();

if (compactor.shouldCompact(currentTokens, maxTokens, config)) {
    CompactionResult result = compactor.compact(messages, config);
    String summary = result.summary();
    int savedTokens = result.originalTokenCount() - result.compactedTokenCount();
}
```

## Structured Output Validation

The `StructuredOutputExecutor` ensures LLM outputs conform to an expected schema with automatic retry on validation failure.

```java
StructuredOutputExecutor executor = StructuredOutputExecutor.builder()
    .llm(client)
    .targetType(OrderSummary.class)
    .outputFormat(OutputFormat.JSON)
    .rules(List.of(
        ValidationRule.notNull("orderId"),
        ValidationRule.range("total", 0, 100000),
        ValidationRule.pattern("email", "^[\\w.-]+@[\\w.-]+$")
    ))
    .maxRetries(3)
    .systemPrompt("You are an order processing assistant.")
    .build();

StructuredOutputExecutor.StructuredOutputResult result = executor.generate("Summarize this order: ...");

if (result.success()) {
    OrderSummary order = result.value();
    System.out.println("Parsed in " + result.attempts() + " attempts");
} else {
    System.out.println("Failed after " + result.attempts() + " attempts");
    System.out.println("Errors: " + result.errors());
}
```

### ValidationRule

Built-in validation rules for common checks:

| Factory Method | Behavior |
| --- | --- |
| `ValidationRule.notNull(fieldName)` | Field must exist and be non-null |
| `ValidationRule.range(fieldName, min, max)` | Numeric field must be within range |
| `ValidationRule.pattern(fieldName, regex)` | String field must match regex
| `ValidationRule.custom(description, predicate)` | Custom predicate on the entire data map |

On validation failure, the executor sends a correction prompt containing the specific errors and re-requests a valid response, up to `maxRetries` times.

## NormEngine

The `NormEngine` evaluates and enforces behavioral norms (obligations, prohibitions, permissions) on agent actions at runtime. Norms can be defined via `@Norm` annotations on role classes or programmatically.

### Annotation-Based Norms

```java
@Norms({
    @Norm(type = NormType.OBLIGATION, action = "logActivity",
          description = "Must log all activities", priority = 10),
    @Norm(type = NormType.PROHIBITION, action = "shareData",
          condition = "isConfidential",
          description = "Must not share confidential data", priority = 20),
    @Norm(type = NormType.PERMISSION, action = "readPublicData",
          description = "May read public data")
})
public class AnalystRole { }

NormEngine engine = NormEngine.fromAnnotations(AnalystRole.class);
```

### Programmatic Norms

```java
NormEngine engine = NormEngine.of(
    new NormEntry(NormType.PROHIBITION, "isConfidential", "shareData",
        "Must not share confidential data", 20),
    new NormEntry(NormType.OBLIGATION, "", "logActivity",
        "Must log all activities", 10)
);

// Add norms dynamically at runtime
engine.addNorm(new NormEntry(NormType.PERMISSION, "", "readData", "May read data", 5));
```

### Checking Actions

```java
// Check if an action is permitted (evaluates prohibitions)
NormEngine.CheckResult result = engine.checkAction("shareData",
    condition -> condition.equals("isConfidential") && context.isConfidential());

if (result.isViolation()) {
    for (NormViolation v : result.violations()) {
        System.out.println("Violated: " + v.norm().description());
    }
}

// Check for unfulfilled obligations
NormEngine.CheckResult obligations = engine.checkObligations(
    Set.of("readData"),   // actions already performed
    condition -> true     // condition evaluator
);

// Query active norms for current context
List
activeObligations = engine.getActiveObligations(cond -> true);
List activeProhibitions = engine.getActiveProhibitions(cond -> true);
List activePermissions = engine.getActivePermissions(cond -> true);
List allNorms = engine.getAllNorms();
```

### NormEntry Record

Fields: `type()` (NormType), `condition()`, `action()`, `description()`, `priority()`. The `hasCondition()` method returns true if the norm has a non-blank condition expression.

## RefinementLoop

The `RefinementLoop` iteratively refines LLM outputs until they meet predefined quality standards. Inspired by Claude's Ralph plugin, it repeatedly evaluates output against completion criteria and re-prompts for corrections.

```java
RefinementLoop loop = RefinementLoop.builder()
    .task("Convert Python to TypeScript")
    .completionCriteria(CompletionCriteria.builder()
        .compilerCheck("tsc --noEmit")
        .testCommand("npm test")
        .mustNotContain("def ", "import ")
        .mustContain("interface", "type")
        .customCheck("no-any", code -> !code.contains(": any"))
        .build())
    .maxIterations(10)
    .timeout(Duration.ofMinutes(30))
    .stopHook(StopHook.onAllTestsPass())
    .onIteration(iter -> log.info("Iteration {} score: {}",
        iter.iterationNumber(), iter.evaluation().overallScore()))
    .build();

RefinementResult result = loop.execute(agent, pythonCode);
// or loop.execute(llmClient, input)

System.out.println("Final output: " + result.getFinalOutput());
System.out.println("Iterations: " + result.getIterationCount());
System.out.println("Status: " + result.getStatus()); // SUCCESS, TIMEOUT, MAX_ITERATIONS, STOPPED, ERROR
System.out.println("Duration: " + result.getDuration());
System.out.println("Metrics: " + result.getMetrics());
// totalIterations, totalDurationMs, totalTokens, scoreProgression, finalScore
```

### CompletionCriteria

Defines when the refinement loop should stop.
Combines multiple check types:

```java
CompletionCriteria criteria = CompletionCriteria.builder()
    .withLLM(llmClient)                       // required for llmCheck
    .compilerCheck("tsc --noEmit")            // shell command (exit 0 = pass)
    .testCommand("npm test")                  // shell command
    .lintCheck("eslint .")                    // shell command
    .mustContain("interface", "type")         // required content
    .mustNotContain("def ", "import ")        // forbidden content
    .matchesPattern("export default .*")      // regex match
    .wordCount(50, 5000)                      // word count range
    .minLines(10)                             // minimum line count
    .validJson()                              // valid JSON check
    .customCheck("no-any", code -> !code.contains(": any"))
    .llmCheck("Is this idiomatic TypeScript?", 0.8) // LLM-based quality check
    .build();

EvaluationResult eval = criteria.evaluate(output);
boolean done = eval.allCriteriaMet();
double score = eval.overallScore(); // 0.0 - 1.0
List passed = eval.passed();
List failed = eval.failed();
List reasons = eval.getFailureReasons();
```

### RefinementStatus

Possible terminal states: `SUCCESS` (all criteria met), `TIMEOUT`, `MAX_ITERATIONS`, `STOPPED` (stop hook triggered), `ERROR`.

The refinement prompt includes failed checks as issues to fix and passed checks as elements to preserve, guiding the LLM toward incremental improvement.

---

# Context Management

URL: https://tnsai.dev/docs/capabilities/intelligence/context

Description: Context window management, decision tracing, session history, knowledge extraction, automatic memory consolidation, and auto-summarization. These components help agents operate effectively within token limits and learn from past interactions.

import { Callout } from 'fumadocs-ui/components/callout'

## Auto-Summarization (Context Compaction)

LLMs have a fixed context window, and long conversations will eventually exceed it. Compaction strategies automatically shrink the conversation history to fit within the token budget while preserving as much important information as possible. Package: `com.tnsai.context.compaction`.
### TruncatingCompactor

The simplest strategy: removes oldest messages (preserving system prompts) until the token budget is met. Zero cost -- no LLM calls required.

```java
ContextCompactor compactor = new TruncatingCompactor();
List> compacted = compactor.compact(messages, 4096);
```

### TwoPhaseCompactor

Two-phase strategy that first truncates tool call arguments, then uses an LLM to summarize older conversation turns. Preserves more semantic information than pure truncation.

```java
TwoPhaseCompactor compactor = TwoPhaseCompactor.builder()
    .llm(llmClient)
    .argTruncationThreshold(500)  // truncate tool args over 500 chars
    .summaryRatio(0.3)            // summarize oldest 30% of messages
    .build();

List> compacted = compactor.compact(messages, 8192);
```

**Phase 1 -- Argument Truncation**: Tool call arguments and results exceeding the threshold are truncated to their keys and types. This often reclaims significant tokens without losing conversational context.

**Phase 2 -- LLM Summarization**: If Phase 1 does not reduce the context enough, older messages are sent to the LLM for summarization. The summary replaces the original messages as a single system message.

| Compactor | Cost | Information Loss | Best For |
| --- | --- | --- | --- |
| `TruncatingCompactor` | Free | High (oldest context lost) | Short conversations, cost-sensitive |
| `TwoPhaseCompactor` | Low (1 LLM call) | Low (summary preserves key facts) | Long conversations, multi-turn reasoning |

## LLMContextCompactor

A standalone LLM-based compactor that gives you fine-grained control over when compaction triggers, how many recent messages to preserve, and how the summary is generated. Use this when you need precise configuration beyond what `TwoPhaseCompactor` offers.
```java
LLMClient summarizer = LLMClientFactory.create("openai", "gpt-4o-mini");
LLMContextCompactor compactor = new LLMContextCompactor(summarizer);

CompactionConfig config = CompactionConfig.builder()
    .thresholdRatio(0.80)
    .preserveLastN(5)
    .preserveSystemPrompt(true)
    .minMessagesForCompaction(10)
    .summarizerModel("gpt-4o-mini")
    .maxSummaryTokens(500)
    .build();

CompactionResult result = compactor.compact(messages, config);
```

### CompactionConfig

These settings control when compaction fires and how much context is preserved versus summarized.

| Parameter | Default | Description |
| --- | --- | --- |
| `thresholdRatio` | 0.80 | Token ratio at which compaction triggers |
| `preserveLastN` | 5 | Recent messages to keep unmodified |
| `preserveSystemPrompt` | true | Always keep system prompt |
| `minMessagesForCompaction` | 10 | Minimum messages before compaction |
| `summarizerModel` | null | Model for summarization (null = agent's default) |
| `summaryPromptTemplate` | built-in | Custom template (use `{messages}` placeholder) |
| `maxSummaryTokens` | 500 | Max tokens for the generated summary |

### Compaction Process

The compactor follows a four-step process that ensures the system prompt and recent messages are always preserved, while older messages are summarized into a compact form.

1. Separate system prompt (if preserved)
2. Identify last N messages to preserve
3. Summarize remaining messages via LLM
4. Reconstruct: \[system prompt] + \[summary as system message] + \[preserved messages]

## AgentContextManager

The central hub for observing and recording what your agent does during a conversation. It captures state snapshots at key moments, traces the reasoning behind each decision, tracks entities mentioned in the conversation, and manages conversation lifecycle. This is essential for debugging, auditing, and building agents that can learn from their own history.
```java
AgentContextManager contextManager = new AgentContextManager(
    "agent-001",
    new InMemorySnapshotStore(),
    new InMemoryDecisionTraceStore()
);
```

### Context Snapshots

A snapshot captures the agent's complete state (beliefs, desires, intentions) at a specific moment. By taking snapshots before and after decisions, you can see exactly what changed and why.

```java
ContextSnapshot snapshot = contextManager.createSnapshot(
    SnapshotTrigger.PRE_DECISION,
    agent.getBeliefs(),
    agent.getDesires(),
    agent.getIntentions()
);

List recent = contextManager.getRecentSnapshots(10);
Optional diff = contextManager.diffSnapshots(beforeId, afterId);
```

### Decision Tracing

Wrap any agent operation in a decision trace to automatically record what the agent was thinking before and after the decision, what it decided, and whether it succeeded. This creates an audit trail of every significant action.

```java
// Full version with BDI suppliers
String response = contextManager.executeWithTrace(
    "Process discount request",
    () -> agent.chat("Should we approve 20% discount?"),
    agent::getBeliefs,
    agent::getDesires,
    agent::getIntentions
);

// Simplified version (without BDI)
String result = contextManager.executeWithTrace(
    "Summarize report",
    () -> agent.chat("Summarize this...")
);
```

This creates pre/post snapshots, records the full decision trace, and handles errors transparently.

### Querying Decisions

Once decisions are recorded, you can query them by various criteria -- recent decisions, failures, successes, or by action name. The manager also supports finding similar past decisions and generating human-readable explanations.
```java
List recent = contextManager.getRecentDecisions(10);
List failures = contextManager.findFailedDecisions();
List successes = contextManager.findSuccessfulDecisions();
List byAction = contextManager.findDecisionsByAction("approve");

Optional explanation = contextManager.explainDecision(traceId);
List precedents = contextManager.findPrecedents(referenceTrace, 0.7);
List summaries = contextManager.summarizeRecentDecisions(5);
```

### Entity Tracking

Register external entities (users, resources, tasks) so they are included in context snapshots. This helps the agent maintain awareness of what objects it is working with.

```java
contextManager.trackEntity(customerEntity);
contextManager.untrackEntity("entity-123");
Collection tracked = contextManager.getTrackedEntities();
```

### Context Variables

Store arbitrary key-value metadata that gets included in every snapshot. Use this for session-level information like the current user, project, or priority level.

```java
contextManager.setVariable("sessionType", "support");
contextManager.setVariable("priority", 5);
Optional typed = contextManager.getVariable("priority", Integer.class);
```

### Conversation Lifecycle

Group related decisions into a conversation so you can query all decisions that happened during a specific interaction.

```java
String convId = contextManager.startConversation();
// ... process messages ...
List inConv = contextManager.findDecisionsInCurrentConversation();
contextManager.endConversation();
```

### Cleanup

Remove old traces and snapshots to prevent unbounded storage growth. Pass a cutoff time to delete everything older than that point.
```java
Instant cutoff = Instant.now().minus(Duration.ofDays(30));
int deletedTraces = contextManager.cleanupOldTraces(cutoff);
int deletedSnapshots = contextManager.cleanupOldSnapshots(cutoff);
```

## BDIContextMapper

An internal utility that converts the agent's BDI (Belief-Desire-Intention) state into the context graph structures used by `AgentContextManager`. You typically do not need to use this directly -- the context manager handles the mapping automatically.

## AgentSession

A session records the full conversation between a user and an agent: every message, tool call, and result, along with metadata and outcome. Sessions are the building blocks for history search, knowledge extraction, and learning from past interactions.

```java
AgentSession session = AgentSession.start("code-assistant", "/path/to/project");

session.addUserMessage("Fix the authentication bug");
session.addAssistantMessage("I'll analyze the auth module...");
session.addToolCall("readFile", Map.of("path", "auth.java"));
session.addToolResult("readFile", "public class Auth { ... }");

session.addMetadata("model", "gpt-4o");
session.addTags("security", "bug-fix");

session.success();
// or: session.fail("Compilation error");
// or: session.cancel();

session.getTurnCount();                      // 4
session.getDuration();                       // Duration since start
session.getTurnsByType(TurnType.TOOL_CALL);  // Tool call turns only
session.getFirstUserMessage();               // "Fix the authentication bug"
session.isComplete();                        // true
session.isSuccess();                         // true
```

### SessionOutcome

Tracks how the session ended. This is useful for filtering sessions during analysis -- for example, finding all failed sessions to understand common failure modes.
| Value | Description |
| --- | --- |
| `IN_PROGRESS` | Session is active |
| `SUCCESS` | Completed successfully |
| `FAILED` | Ended with error |
| `CANCELLED` | User cancelled |

## SessionStore / FileSessionStore

Persist agent sessions to disk so they survive application restarts. `FileSessionStore` saves each session as a JSON file and provides query methods to find sessions by agent, project, tag, outcome, or date range.

```java
SessionStore store = SessionStore.file(".tnsai/history/");

store.save(session);
Optional loaded = store.load("session-123");

store.findRecent(10);
store.findByAgent("code-assistant");
store.findByProject("/my/project");
store.findByTag("security");
store.findByOutcome(SessionOutcome.SUCCESS);
store.findByDateRange(LocalDate.now().minusDays(7), LocalDate.now());
store.count();
```

## SearchableSessionStore / SessionQuery

When you need to search session contents (not just metadata), use `SearchableSessionStore`. It supports full-text search combined with filters for agent, project, tags, outcome, and date range.

```java
SessionQuery query = SessionQuery.builder()
    .text("authentication bug")
    .agent("code-assistant")
    .project("/my/project")
    .tags("security", "urgent")
    .outcome(SessionOutcome.SUCCESS)
    .dateRange(LocalDate.now().minusDays(7), LocalDate.now())
    .searchMode(SessionQuery.SearchMode.ALL)
    .limit(20)
    .offset(0)
    .build();

List results = searchableStore.search(query);
```

### SessionQuery Convenience

Shortcut factory methods for common query patterns so you do not need to use the builder for simple cases.

```java
SessionQuery recent = SessionQuery.recent(10);
SessionQuery textSearch = SessionQuery.text("database migration");
SessionQuery lastWeek = SessionQuery.builder().lastDays(7).build();
```

### SearchMode

Controls how the text search matches against session content.
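To make the difference between the four documented modes (`ANY`, `ALL`, `EXACT`, `REGEX`) concrete, here is a minimal matcher sketch. It is hypothetical and not the `SearchableSessionStore` implementation: `SearchModeSketch` assumes case-insensitive substring matching on whitespace-separated words for the first three modes and passes the query straight to `java.util.regex` for `REGEX`.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical matcher; the real store may tokenize, index, and rank differently.
enum SearchModeSketch {
    ANY, ALL, EXACT, REGEX;

    boolean matches(String content, String query) {
        String haystack = content.toLowerCase(Locale.ROOT);
        String needle = query.toLowerCase(Locale.ROOT).trim();
        String[] words = needle.split("\\s+");
        switch (this) {
            case ANY:   return Arrays.stream(words).anyMatch(haystack::contains);  // at least one word
            case ALL:   return Arrays.stream(words).allMatch(haystack::contains);  // every word, any order
            case EXACT: return haystack.contains(needle);                          // whole phrase, in order
            case REGEX: return Pattern.compile(query).matcher(content).find();     // case-sensitive as written
            default:    throw new AssertionError(this);
        }
    }
}
```

With the text "Fixed the authentication bug in the login module", the query "database bug" matches under `ANY` but not `ALL`, while "bug authentication" matches under `ALL` but not `EXACT`.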
| Mode | Description | | ------- | ------------------------ | | `ANY` | Match any word (default) | | `ALL` | Match all words | | `EXACT` | Match exact phrase | | `REGEX` | Regular expression match | ## AutoConsolidationManager Monitors conversation state and triggers memory consolidation automatically based on configurable thresholds. ```java AutoConsolidationManager manager = AutoConsolidationManager.builder() .pipeline(consolidationPipeline) .triggers(Set.of( ConsolidationTrigger.MESSAGE_COUNT, ConsolidationTrigger.TOKEN_LIMIT, ConsolidationTrigger.ON_SESSION_END)) .messageCountThreshold(50) .tokenBudget(8192) .tokenThresholdPercent(80) .build(); // Call on each message ConsolidationResult result = manager.onMessageAdded(session, estimatedTokens); if (result != null) { System.out.println("Consolidated: " + result.knowledgeCount() + " items"); } // Call when session ends manager.onSessionEnd(session); ``` ### ConsolidationTrigger These triggers define when automatic consolidation runs. You can enable one or more triggers depending on your needs. | Trigger | Default | Fires When | | ---------------- | ------------- | --------------------------------- | | `MESSAGE_COUNT` | 50 messages | Counter reaches threshold | | `TOKEN_LIMIT` | 80% of budget | Estimated tokens exceed threshold | | `ON_SESSION_END` | always | `onSessionEnd()` is called | Consolidation is thread-safe: concurrent triggers are skipped if one is already in progress. ## LLMKnowledgeExtractor Uses an LLM to analyze session conversations and extract reusable knowledge. 
```java
KnowledgeExtractor extractor = new LLMKnowledgeExtractor(llmClient, 0.5);
List<ExtractedKnowledge> knowledge = extractor.extract(session);

// Filter by type
knowledge.stream()
    .filter(k -> k.type() == ExtractionType.SOLUTION)
    .filter(k -> k.isHighConfidence())
    .forEach(k -> System.out.println(k.summary()));

// Extract specific types only
List<ExtractedKnowledge> decisions = extractor.extract(session,
    EnumSet.of(ExtractionType.DECISION, ExtractionType.LEARNING));

// Adjust confidence threshold
KnowledgeExtractor strict = extractor.withMinConfidence(0.8);
```

### ExtractionType

Five categories of knowledge can be extracted from session conversations. Each type has a different practical use -- solutions help with similar problems in the future, patterns provide reusable templates, and antipatterns warn about approaches to avoid.

| Type          | Key                | Description                              |
| ------------- | ------------------ | ---------------------------------------- |
| `SOLUTION`    | `problem-solution` | A problem and how it was solved          |
| `PATTERN`     | `pattern`          | A reusable code or design pattern        |
| `DECISION`    | `decision`         | An architectural decision with rationale |
| `LEARNING`    | `learning`         | A lesson learned or insight              |
| `ANTIPATTERN` | `antipattern`      | An approach to avoid                     |

Each `ExtractedKnowledge` item includes content, an optional problem description, code snippets, related files, tags, and a confidence score (0.0-1.0).

### Knowledge Storage

Persist extracted knowledge for later retrieval. Two implementations are provided: in-memory for testing and file-based for production.

```java
KnowledgeStore store = new InMemoryKnowledgeStore();
// Or persist to files:
// KnowledgeStore store = new FileKnowledgeStore(".tnsai/knowledge/");

store.save(knowledge);
List<ExtractedKnowledge> solutions = store.findByType(ExtractionType.SOLUTION);
```

---

# Finite State Machine

URL: https://tnsai.dev/docs/capabilities/intelligence/fsm
Description: Deterministic state machine for bounded agent autonomy.
Provides guard-based transitions, entry/exit actions, automatic transitions, event payloads, listeners, and visualization to Mermaid and Graphviz DOT. import { Callout } from 'fumadocs-ui/components/callout' ## Quick Start This example builds a simple review workflow with five states and guarded transitions. The machine starts in IDLE, moves through PROCESSING and REVIEW, and ends in either APPROVED or REJECTED based on a confidence score. ```java State idle = State.initial("IDLE"); State processing = State.of("PROCESSING"); State review = State.of("REVIEW"); State approved = State.terminal("APPROVED"); State rejected = State.terminal("REJECTED"); StateMachine sm = StateMachine.builder("ReviewWorkflow") .states(idle, processing, review, approved, rejected) .transition(Transition.from(idle).to(processing).on("START")) .transition(Transition.from(processing).to(review).on("SUBMIT")) .transition(Transition.from(review).to(approved) .on("REVIEW_COMPLETE") .when(Guard.greaterThan("confidence", 0.9))) .transition(Transition.from(review).to(rejected) .on("REVIEW_COMPLETE") .when(Guard.lessThan("confidence", 0.5))) .maxTransitions(100) .build(); sm.start(); sm.fire(Event.of("START")); sm.fire(Event.of("SUBMIT")); sm.getContext().set("confidence", 0.95); sm.fire(Event.of("REVIEW_COMPLETE")); System.out.println(sm.getCurrentState().getName()); // "APPROVED" System.out.println(sm.isInTerminalState()); // true ``` ## State States can be initial, terminal, or intermediate. Each supports entry/exit actions, timeouts, and metadata. ### Factory Methods Use these to create states quickly. Every state machine needs exactly one initial state, and at least one terminal state where the machine stops. 
| Method | Description | | ---------------------- | -------------------------------------- | | `State.initial(name)` | Entry point -- exactly one per machine | | `State.of(name)` | Regular intermediate state | | `State.terminal(name)` | End state -- machine completes here | ### Builder For more control, the builder lets you attach entry/exit actions (code that runs when a state is entered or left), set timeouts, and store metadata on the state. ```java State processing = State.builder("PROCESSING") .initial() // Mark as initial .onEntry(ctx -> ctx.set("startTime", System.currentTimeMillis())) .onExit(ctx -> { long duration = System.currentTimeMillis() - (long) ctx.get("startTime"); ctx.set("processingDuration", duration); }) .timeout(30_000) // 30 second timeout (ms) .metadata("retryable", true) .build(); processing.isInitial(); // true processing.isTerminal(); // false processing.hasTimeout(); // true processing.getTimeoutMs(); // 30000 processing.getSpec("retryable"); // true ``` States are identified by name. Two states with the same name are considered equal. ## Transition Transitions define movement between states, triggered by events with optional guards, actions, and priority. 
```java // Simple transition Transition t1 = Transition.from(idle).to(processing).on("START").build(); // With guard and priority Transition t2 = Transition.from(review).to(approved) .on("REVIEW_COMPLETE") .when(Guard.greaterThan("confidence", 0.9)) .priority(10) .build(); // With transition action Transition t3 = Transition.from(review).to(rejected) .on("REVIEW_COMPLETE") .when(Guard.lessThan("confidence", 0.5)) .action((ctx, from, to) -> ctx.set("reason", "Low confidence")) .build(); // Automatic transition (no event required, fires when guard passes) Transition auto = Transition.from(processing).to(review) .automatic() .when(Guard.equals("processed", true)) .build(); // Internal (self-loop) transition Transition internal = Transition.internal(processing) .on("RETRY") .action((ctx, from, to) -> { int count = ctx.getOrDefault("retries", 0); ctx.set("retries", count + 1); }) .build(); ``` When multiple transitions match the same event from the same state, the one with the highest `priority` wins. ## Event Events are the signals that drive the state machine forward. When you fire an event, the machine looks for a matching transition from the current state. Events can carry key-value payload data that guards and actions can read. ```java Event start = Event.of("START"); Event review = Event.of("REVIEW_COMPLETE") .withData("confidence", 0.95) .withData("reviewer", "agent-1"); double confidence = review.getData("confidence"); // 0.95 String reviewer = review.getDataOrDefault("reviewer", "unknown"); review.hasData("notes"); // false sm.fire(review); ``` ## StateMachineContext The context is a shared key-value store that guards, actions, and listeners can all read and write. Use it to pass data between states, track results, record errors, and review the full transition history. 
```java StateMachineContext ctx = new StateMachineContext(Map.of("userId", "u123")); ctx.set("processed", true); ctx.get("userId"); // "u123" ctx.getOrDefault("retries", 0); // 0 ctx.has("processed"); // true ctx.remove("processed"); ctx.setResult("Task completed"); String result = ctx.getResult(); ctx.setError("Timeout exceeded"); ctx.hasError(); // true // Transition history List history = ctx.getHistory(); for (var record : history) { System.out.printf("%s -> %s via %s%n", record.fromState(), record.toState(), record.event()); } // Start machine with pre-populated context sm.start(new StateMachineContext(Map.of("threshold", 0.8))); ``` ## Guard Guards are conditions that must be true for a transition to fire. They check the context (shared state) and optionally the event payload. If a guard returns false, the transition is skipped even if the event matches. TnsAI provides a rich set of built-in guards plus composition operators so you can combine them. ### Built-in Guards These factory methods cover the most common conditions. For anything more complex, use a lambda or compose built-in guards with `.and()`, `.or()`, and `.negate()`. | Guard | Description | | --------------------------------- | ------------------------------------- | | `Guard.always()` | Always true (default for transitions) | | `Guard.never()` | Always false | | `Guard.hasKey("k")` | True if key exists in context | | `Guard.equals("k", val)` | True if context value equals expected | | `Guard.greaterThan("k", 0.9)` | Numeric comparison (`>`) | | `Guard.lessThan("k", 0.5)` | Numeric comparison (`<`) | | `Guard.eventDataEquals("k", val)` | Check current event's payload | ### Custom and Composed Guards You can write guards as lambdas or compose built-in guards using `.and()`, `.or()`, and `.negate()` for complex conditions. 
```java // Lambda guard Guard isAuthenticated = ctx -> ctx.has("userId"); // Event data guard Guard approvedStatus = ctx -> { Event event = ctx.getCurrentEvent(); return "APPROVED".equals(event.getData("status")); }; // AND composition Guard safe = Guard.greaterThan("confidence", 0.8) .and(Guard.lessThan("risk", 0.3)); // OR composition Guard acceptable = Guard.equals("role", "admin") .or(Guard.equals("role", "moderator")); // Negation Guard notBlocked = Guard.hasKey("blocked").negate(); ``` ## StateMachineListener Listeners let you observe everything that happens in the state machine without modifying its behavior. Use them for logging, metrics, debugging, or triggering side effects when specific transitions occur. ```java sm.addListener(new StateMachineListener() { @Override public void onStateEntered(StateMachine sm, State state, Event event) { System.out.println("Entered: " + state.getName()); } @Override public void onTransition(StateMachine sm, State from, State to, Event event) { String trigger = event != null ? event.getName() : "auto"; System.out.printf("%s -> %s [%s]%n", from.getName(), to.getName(), trigger); } @Override public void onNoTransition(StateMachine sm, State state, Event event) { System.out.println("Unhandled event: " + event.getName()); } @Override public void onCompleted(StateMachine sm) { System.out.println("Completed in: " + sm.getCurrentState().getName()); } }); // Built-in logging listener (SLF4J) sm.addListener(StateMachineListener.logging()); // Logs: [FSM:ReviewWorkflow] IDLE -> PROCESSING [START] ``` ## StateMachineVisualizer Export your state machine as a diagram for documentation, debugging, or sharing. Three output formats are supported: Mermaid (renders in GitHub, Notion, etc.), Graphviz DOT (for offline rendering), and plain text. ### Mermaid Generates a Mermaid state diagram that renders natively on GitHub, GitLab, and many documentation tools. 
```java
String mermaid = StateMachineVisualizer.toMermaid(sm);
// stateDiagram-v2
//   %% ReviewWorkflow
//   [*] --> IDLE
//   IDLE --> PROCESSING : START
//   PROCESSING --> REVIEW : SUBMIT
//   REVIEW --> APPROVED : REVIEW_COMPLETE [guarded]
//   REVIEW --> REJECTED : REVIEW_COMPLETE [guarded]
//   APPROVED --> [*]
//   REJECTED --> [*]

// With options
String detailedMermaid = StateMachineVisualizer.toMermaid(sm,
    MermaidOptions.defaults()
        .includeTitle(true)
        .includeStateDescriptions(true)
        .showGuards(true)
        .showAutoTransitions(true)
        .highlightCurrentState(true));
```

### Graphviz DOT

Generates DOT format for use with Graphviz or compatible rendering tools, with configurable layout direction, colors, and fonts.

```java
String dot = StateMachineVisualizer.toDot(sm);

// With options
String styledDot = StateMachineVisualizer.toDot(sm,
    DotOptions.defaults()
        .direction("LR")            // LR, TB, RL, BT
        .nodeShape("ellipse")
        .fontName("Helvetica")
        .currentStateColor("#90EE90")
        .showGuards(true));
```

### Plain Text

A simple human-readable summary of the machine's states, transitions, and current status -- useful for logging and console output.

```java
String text = StateMachineVisualizer.toText(sm);
// State Machine: ReviewWorkflow
// Status: COMPLETED
// Current State: APPROVED
// Transitions: 3
//
// States:
//   - IDLE [initial]
//   - PROCESSING
//   - REVIEW
//   - APPROVED [terminal]
//   - REJECTED [terminal]
//
// Transitions:
//   - IDLE -> PROCESSING on START
//   - PROCESSING -> REVIEW on SUBMIT
//   - REVIEW -> APPROVED on REVIEW_COMPLETE
//   - REVIEW -> REJECTED on REVIEW_COMPLETE
```

## Status Lifecycle

A state machine moves through these statuses during its lifetime. You can query the current status at any time to decide whether to fire more events or handle completion.
| Status | Meaning | | ------------- | --------------------------------- | | `NOT_STARTED` | Created but `start()` not called | | `RUNNING` | Active, accepting events | | `COMPLETED` | Reached a terminal state | | `FAILED` | Max transitions exceeded or error | | `TIMEOUT` | State timeout triggered | ```java sm.getStatus(); // Status.RUNNING sm.isRunning(); // true sm.isInTerminalState(); // false sm.getTransitionCount(); // 2 sm.getAvailableEvents(); // ["SUBMIT"] sm.reset(); // Back to NOT_STARTED ``` ## Comprehensive Example This example models a real-world order processing workflow with automatic transitions, validation guards, cancellation handling, and logging. It demonstrates how all the FSM building blocks fit together. ```java // Order processing workflow State pending = State.builder("PENDING").initial() .onEntry(ctx -> log.info("Order received")).build(); State validating = State.builder("VALIDATING") .timeout(10_000).build(); State payment = State.of("PAYMENT"); State fulfilled = State.terminal("FULFILLED"); State cancelled = State.terminal("CANCELLED"); StateMachine orderFSM = StateMachine.builder("OrderProcess") .states(pending, validating, payment, fulfilled, cancelled) .transition(Transition.from(pending).to(validating).automatic() .when(Guard.hasKey("orderId"))) .transition(Transition.from(validating).to(payment) .on("VALIDATED").when(Guard.equals("valid", true))) .transition(Transition.from(validating).to(cancelled) .on("VALIDATED").when(Guard.equals("valid", false)) .action((ctx, from, to) -> ctx.set("reason", "Validation failed"))) .transition(Transition.from(payment).to(fulfilled) .on("PAYMENT_RESULT").when(Guard.equals("paid", true))) .transition(Transition.from(payment).to(cancelled) .on("PAYMENT_RESULT").when(Guard.equals("paid", false))) .transition(Transition.from(validating).to(cancelled).on("CANCEL")) .transition(Transition.from(payment).to(cancelled).on("CANCEL")) .maxTransitions(50) .build(); 
orderFSM.addListener(StateMachineListener.logging()); StateMachineContext ctx = new StateMachineContext(Map.of("orderId", "ORD-123")); orderFSM.start(ctx); // PENDING -> VALIDATING (auto) ctx.set("valid", true); orderFSM.fire(Event.of("VALIDATED")); // -> PAYMENT ctx.set("paid", true); orderFSM.fire(Event.of("PAYMENT_RESULT")); // -> FULFILLED System.out.println(StateMachineVisualizer.toMermaid(orderFSM)); ``` ## Thread Safety `StateMachine` is not thread-safe by design to keep the implementation simple and fast. If your application fires events from multiple threads, you need to synchronize access externally. ```java synchronized (sm) { sm.fire(event); } ``` --- # Intelligence URL: https://tnsai.dev/docs/capabilities/intelligence Description: Give agents planning, reasoning, state machines, and learning capabilities. import { Callout } from 'fumadocs-ui/components/callout' ## Pages - [Planning](/docs/capabilities/intelligence/planning) — GOAP and HTN planners. - [Reasoning](/docs/capabilities/intelligence/reasoning) — ReAct, Tree of Thoughts, Self-Consistency. - [FSM](/docs/capabilities/intelligence/fsm) — Finite state machines for deterministic flows. - [Context](/docs/capabilities/intelligence/context) — Session context, decision tracing. - [Learning](/docs/capabilities/intelligence/learning) — Feedback loops, preference learning. - [Advanced](/docs/capabilities/intelligence/advanced) — Custom strategies, handle composition. --- # Learning and Refinement URL: https://tnsai.dev/docs/capabilities/intelligence/learning Description: Feedback-driven learning, normative constraint enforcement, iterative refinement loops, prompt optimization, and structured output validation. These components enable agents to improve over time and produce higher-quality outputs. import { Callout } from 'fumadocs-ui/components/callout' ## Feedback Represents user or system feedback on agent output. 
Four factory methods cover the common types:

```java
Feedback positive = Feedback.thumbsUp("Great explanation!");
Feedback negative = Feedback.thumbsDown("Too verbose, needs to be concise");
Feedback fix = Feedback.correction("Use formal tone, not casual");
Feedback pref = Feedback.preference("Always include code examples");

// Attach to a session
Feedback withSession = positive.withSessionId("session-123");

// Inspect a feedback item
positive.type();       // FeedbackType.POSITIVE
positive.content();    // "Great explanation!"
positive.timestamp();  // Instant
positive.id();         // Auto-generated UUID
positive.metadata();   // Map
```

### FeedbackType

Four feedback types cover the most common ways users and systems respond to agent output.

| Type         | Factory               | Description                            |
| ------------ | --------------------- | -------------------------------------- |
| `POSITIVE`   | `thumbsUp(comment)`   | Good output -- reinforce this behavior |
| `NEGATIVE`   | `thumbsDown(comment)` | Bad output -- avoid this in future     |
| `CORRECTION` | `correction(text)`    | Specific fix to apply                  |
| `PREFERENCE` | `preference(text)`    | User style/tone preference             |

## FeedbackLearner

Analyzes collected feedback and produces actionable learnings: prompt adjustments, user preferences, and good examples for few-shot prompting.
```java FeedbackLearner learner = FeedbackLearner.builder() .llm(client) .feedbackStore(FeedbackStore.inMemory()) .strategies(List.of( LearningStrategy.PROMPT_ADJUSTMENT, LearningStrategy.PREFERENCE_LEARNING, LearningStrategy.EXAMPLE_COLLECTION)) .minFeedbackForLearning(3) .build(); learner.recordFeedback(Feedback.correction("Use formal tone")); learner.recordFeedback(Feedback.thumbsDown("Response was too long")); learner.recordFeedback(Feedback.preference("Include code examples")); learner.recordFeedback(Feedback.thumbsUp("Perfect level of detail")); FeedbackLearner.LearningResult result = learner.learn(); result.promptAdjustments(); // ["Use formal tone", "Keep responses concise"] result.preferences(); // ["User prefers code examples"] result.goodExamples(); // ["Perfect level of detail"] result.feedbackAnalyzed(); // 4 result.hasLearnings(); // true ``` ### LearningStrategy Each strategy focuses on a different type of feedback and produces a different kind of actionable output. | Strategy | Analyzes | Produces | | --------------------- | -------------------------------- | -------------------------- | | `PROMPT_ADJUSTMENT` | Negative + correction feedback | Rules for system prompt | | `PREFERENCE_LEARNING` | Preference + correction feedback | User preference profile | | `EXAMPLE_COLLECTION` | Positive feedback | Good examples for few-shot | ### FeedbackStore A simple store for collecting feedback items. The in-memory implementation is suitable for testing; for production, persist feedback to your preferred storage backend. ```java FeedbackStore store = FeedbackStore.inMemory(); store.save(feedback); List all = store.getAll(); List corrections = store.getByType(FeedbackType.CORRECTION); ``` ## NormEngine Runtime enforcement of normative constraints extracted from `@Norm` and `@Norms` annotations. Checks actions against obligations, prohibitions, and permissions. 
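To make the three norm types concrete, the following is a minimal, self-contained sketch of how a check over prohibitions and obligations could work. It does not use TnsAI classes: `NormSketch`, its `Norm` record, and both static methods are hypothetical stand-ins, and conditions are modeled as plain strings that a caller-supplied predicate resolves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Illustrative only: hypothetical names, not the TnsAI NormEngine API. */
public class NormSketch {

    enum Type { OBLIGATION, PROHIBITION, PERMISSION }

    /** An empty condition means the norm is always active. */
    record Norm(Type type, String condition, String action, String description) {
        boolean isActive(Predicate<String> conditionEval) {
            return condition.isEmpty() || conditionEval.test(condition);
        }
    }

    /** Returns the descriptions of active prohibitions that forbid the action. */
    static List<String> checkAction(List<Norm> norms, String action,
                                    Predicate<String> conditionEval) {
        List<String> violations = new ArrayList<>();
        for (Norm n : norms) {
            if (n.type() == Type.PROHIBITION
                    && n.action().equals(action)
                    && n.isActive(conditionEval)) {
                violations.add(n.description());
            }
        }
        return violations;
    }

    /** Returns the actions of active obligations that have not been performed. */
    static List<String> unfulfilledObligations(List<Norm> norms, Set<String> performed,
                                               Predicate<String> conditionEval) {
        List<String> missing = new ArrayList<>();
        for (Norm n : norms) {
            if (n.type() == Type.OBLIGATION
                    && n.isActive(conditionEval)
                    && !performed.contains(n.action())) {
                missing.add(n.action());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Norm> norms = List.of(
            new Norm(Type.PROHIBITION, "hasConsent == false", "sharePersonalData",
                     "Cannot share personal data without consent"),
            new Norm(Type.OBLIGATION, "", "logAccess", "Must log all data access events"));

        // Assume the current state has no consent, so the prohibition's condition holds.
        Predicate<String> eval = c -> c.equals("hasConsent == false");

        System.out.println(checkAction(norms, "sharePersonalData", eval));
        System.out.println(unfulfilledObligations(norms, Set.of("readData"), eval));
    }
}
```

In this sketch a permission never produces a violation on its own; how the real engine weighs permissions against prohibitions (for example by priority) is not shown here.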
### Annotation-Driven Setup

The easiest way to define norms is with `@Norms` and `@Norm` annotations on your agent class. The engine reads these at construction time and enforces them at runtime.

```java
@Norms({
    @Norm(
        type = NormType.PROHIBITION,
        action = "sharePersonalData",
        condition = "hasConsent == false",
        description = "Cannot share personal data without consent",
        priority = 10
    ),
    @Norm(
        type = NormType.OBLIGATION,
        action = "logAccess",
        description = "Must log all data access events",
        priority = 5
    ),
    @Norm(
        type = NormType.PERMISSION,
        action = "readPublicData",
        description = "Can read public data at any time"
    )
})
public class DataAgent { /* ... */ }
```

### Using the Engine

Create a `NormEngine` from annotations or explicit entries, then call `checkAction()` before performing an action to verify that it does not violate any active norm. You can also check whether all obligations have been fulfilled.

```java
// Create from annotations
NormEngine engine = NormEngine.fromAnnotations(DataAgent.class);

// Or from explicit entries
NormEngine explicitEngine = NormEngine.of(
    new NormEntry(NormType.PROHIBITION, "hasConsent == false",
        "sharePersonalData", "No sharing without consent", 10),
    new NormEntry(NormType.OBLIGATION, "", "logAccess", "Must log access", 5)
);

// Check if an action is allowed
Predicate<String> conditionEval = condition ->
    ConditionEvaluator.evaluate(condition, currentState);

NormEngine.CheckResult result = engine.checkAction("sharePersonalData", conditionEval);
if (result.isViolation()) {
    for (NormViolation v : result.violations()) {
        System.out.println("VIOLATION: " + v.description());
    }
}

// Check obligation fulfillment
Set<String> fulfilled = Set.of("readData"); // logAccess NOT fulfilled
NormEngine.CheckResult obligations = engine.checkObligations(fulfilled, conditionEval);
if (obligations.isViolation()) {
    System.out.println("Unfulfilled obligations found");
}

// Query active norms
List<NormEntry> activeObligations = engine.getActiveObligations(conditionEval);
List<NormEntry> activeProhibitions = engine.getActiveProhibitions(conditionEval);
List<NormEntry> activePermissions = engine.getActivePermissions(conditionEval);

// Add norms dynamically at runtime
engine.addNorm(new NormEntry(NormType.PROHIBITION, "", "deleteProduction",
    "Never delete production data", 100));
```

### NormEntry

Each norm entry specifies a type (obligation, prohibition, or permission), the action it applies to, an optional condition for when it is active, and a priority for conflict resolution.

```java
public record NormEntry(
    NormType type,       // OBLIGATION, PROHIBITION, PERMISSION
    String condition,    // When this norm is active (empty = always)
    String action,       // The action this norm applies to
    String description,  // Human-readable explanation
    int priority         // Higher = more important
) { }
```

### NormViolation

When an action violates a norm, the engine returns one or more `NormViolation` records explaining what went wrong.

```java
public record NormViolation(
    NormEntry norm,      // The violated norm
    String action,       // The action that violated it
    String description   // Explanation of the violation
) { }
```

## RefinementLoop

Iterative refinement that repeatedly processes outputs until they meet predefined quality standards. It runs checks after each iteration and re-prompts the LLM with the specific failure details.
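The generate, check, and re-prompt cycle can be pictured with a small stand-alone sketch. Everything in it is an assumption made for illustration: `RefineSketch`, `refine`, and the `Result` record are hypothetical names, the "LLM" is an ordinary `Function<String, String>`, and checks are named predicates rather than TnsAI `CompletionCriteria`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/** Illustrative only: hypothetical names, not the TnsAI RefinementLoop API. */
public class RefineSketch {

    record Result(String output, int iterations, boolean success) {}

    /**
     * Calls the generator, runs every check, and re-prompts with the names of
     * the failed checks until all pass or maxIterations is reached.
     */
    static Result refine(Function<String, String> generator,
                         Map<String, Predicate<String>> checks,
                         String task, int maxIterations) {
        String prompt = task;
        String output = "";
        for (int i = 1; i <= maxIterations; i++) {
            output = generator.apply(prompt);
            List<String> failures = new ArrayList<>();
            for (Map.Entry<String, Predicate<String>> check : checks.entrySet()) {
                if (!check.getValue().test(output)) {
                    failures.add(check.getKey());
                }
            }
            if (failures.isEmpty()) {
                return new Result(output, i, true);
            }
            // Feed the specific failures back so the next attempt can target them.
            prompt = task + "\nPrevious output:\n" + output
                    + "\nFix these failed checks: " + String.join(", ", failures);
        }
        return new Result(output, maxIterations, false);
    }

    public static void main(String[] args) {
        Map<String, Predicate<String>> checks = new LinkedHashMap<>();
        checks.put("must contain 'export'", out -> out.contains("export"));
        checks.put("must not contain 'def '", out -> !out.contains("def "));

        // Stand-in for an LLM: it only produces a passing answer once it sees feedback.
        Function<String, String> fakeLlm = prompt ->
            prompt.contains("Fix these failed checks")
                ? "export function add(a: number, b: number) { return a + b; }"
                : "def add(a, b): return a + b";

        Result result = refine(fakeLlm, checks, "Convert Python to TypeScript", 5);
        System.out.println(result.iterations() + " " + result.success()); // prints "2 true"
    }
}
```

The sketch omits timeouts, stop hooks, and scoring; it only shows why passing failure details back into the prompt lets the second attempt succeed where the first did not.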
```java RefinementLoop loop = RefinementLoop.builder() .task("Convert Python to TypeScript") .completionCriteria(CompletionCriteria.builder() .compilerCheck("tsc --noEmit") .testCommand("npm test") .mustNotContain("def ", "import ") .mustContain("interface", "export") .validJson() .build()) .maxIterations(10) .timeout(Duration.ofMinutes(30)) .onIteration(iter -> log.info("Iteration {} score: {}", iter.iterationNumber(), iter.evaluation().overallScore())) .build(); // Execute with an Agent RefinementResult result = loop.execute(agent, pythonCode); // Or with an LLMClient directly RefinementResult result = loop.execute(llmClient, pythonCode); result.getFinalOutput(); // The refined output result.getIterations(); // Total iterations run result.getStatus(); // SUCCESS, MAX_ITERATIONS, TIMEOUT, ERROR, STOPPED result.getDuration(); // Total time result.getHistory(); // List of IterationResult per iteration ``` ### RefinementStatus The final status tells you why the loop stopped, so you can handle each case appropriately. | Status | Description | | ---------------- | ------------------------ | | `SUCCESS` | All criteria met | | `MAX_ITERATIONS` | Hit iteration limit | | `TIMEOUT` | Hit time limit | | `ERROR` | LLM call failed | | `STOPPED` | StopHook triggered early | ### CompletionCriteria Defines the quality checks that must all pass for refinement to stop. You can combine compiler checks, content assertions, structural validations, custom predicates, and even LLM-based quality judgments. 
```java CompletionCriteria criteria = CompletionCriteria.builder() // Shell commands (compiler, test runner, linter) .compilerCheck("javac -d out *.java") .testCommand("mvn test -q") .lintCheck("eslint --quiet .") // Content presence/absence .mustContain("public class", "@Override") .mustNotContain("System.out.println", "TODO") // Structure checks .validJson() .minLines(10) .wordCount(50, 500) .matchesPattern("class\\s+\\w+\\s+implements\\s+\\w+") // Custom predicate .customCheck("no-any-type", code -> !code.contains(": any"), "Output must not contain TypeScript 'any' type") // LLM-based quality check (requires withLLM first) .withLLM(evalClient) .llmCheck("Is this idiomatic TypeScript?", 0.8) .build(); // Evaluate against output EvaluationResult eval = criteria.evaluate(output); eval.allCriteriaMet(); // true if all required checks passed eval.overallScore(); // ratio of passed checks (0.0 to 1.0) eval.passed(); // List eval.failed(); // List eval.getFailureReasons(); // List of failure messages ``` ### Built-in Checks These are the available check types you can add to `CompletionCriteria`. Mix and match them to define exactly what "done" means for your use case. 
| Method                                  | Description                                    |
| --------------------------------------- | ---------------------------------------------- |
| `compilerCheck(cmd)`                    | Shell command must exit 0                      |
| `testCommand(cmd)`                      | Test suite must pass                           |
| `lintCheck(cmd)`                        | Linter must pass                               |
| `mustContain(strings...)`               | All strings must appear in output              |
| `mustNotContain(strings...)`            | None may appear in output                      |
| `validJson()`                           | Output must parse as JSON                      |
| `matchesPattern(regex)`                 | Pattern must match                             |
| `wordCount(min, max)`                   | Word count in range                            |
| `minLines(min)`                         | Minimum line count                             |
| `customCheck(name, predicate, message)` | Any `Predicate<String>` over the output        |
| `llmCheck(question, threshold)`         | LLM rates quality 0-100, must exceed threshold |

## PromptOptimizer

Automated prompt tuning through iterative refinement, strategy selection, and A/B testing against test cases.

```java
PromptOptimizer optimizer = PromptOptimizer.builder()
    .llmClient(client)
    .maxIterations(5)
    .targetScore(0.9f)
    .candidateStrategies(List.of(
        PromptStrategy.CHAIN_OF_THOUGHT,
        PromptStrategy.CHAIN_OF_VERIFICATION,
        PromptStrategy.CONFIDENCE_WEIGHTED))
    .build();

List<TestCase> testCases = List.of(
    new TestCase("What is 2+2?", "4"),
    new TestCase("Capital of France?", "Paris")
);

OptimizationResult result = optimizer.optimize("Answer the question:", testCases);
System.out.println("Best prompt: " + result.getBestPrompt());
System.out.println("Score: " + result.getScore());
System.out.println("Iterations: " + result.getIterations());
```

The optimizer tries each candidate strategy, evaluates it against the test cases, then uses LLM-based refinement to suggest further improvements. It stops when `targetScore` is reached or `maxIterations` is exhausted.

### Strategy Suggestions

If you are not sure which prompt strategy to use, the optimizer can analyze your task description and suggest strategies that are likely to work well.
```java
List<PromptStrategy> suggested = optimizer.suggestStrategies(
    "analyze and solve math problems");
// -> [CHAIN_OF_THOUGHT, STRUCTURED_THINKING]
```

## StructuredOutputExecutor

Ensures LLM outputs conform to a target type, with automatic validation and retry on failure.

```java
StructuredOutputExecutor executor = StructuredOutputExecutor.builder()
    .llm(client)
    .targetType(OrderSummary.class)
    .outputFormat(OutputFormat.JSON)
    .rules(List.of(
        ValidationRule.notNull("orderId"),
        ValidationRule.range("total", 0, 100000),
        ValidationRule.pattern("email", ".*@.*\\..*")))
    .maxRetries(3)
    .systemPrompt("You are an order processing assistant.")
    .build();

StructuredOutputResult<OrderSummary> result = executor.generate(
    "Summarize this order: customer=John, items=3 widgets at $25 each"
);

if (result.success()) {
    OrderSummary order = result.value();
} else {
    System.out.println("Failed after " + result.attempts() + " attempts");
    System.out.println("Errors: " + result.errors());
}
```

### Retry Flow

The executor handles the common problem of LLMs producing malformed or invalid structured output by automatically retrying with targeted error feedback.

1. Generate output with format instructions appended to the prompt
2. Deserialize the response to the target type
3. Run validation rules against the deserialized data
4. On parse or validation failure, build a correction prompt with the specific errors
5. Repeat up to `maxRetries` times

The correction prompt includes the specific errors so the LLM can fix them directly rather than guessing what went wrong.

---

# Planning

URL: https://tnsai.dev/docs/capabilities/intelligence/planning
Description: Goal-oriented planning for AI agents. TnsAI provides three planner implementations: annotation-driven backward chaining, utility-based scoring, and LLM-powered dynamic planning with human-in-the-loop approval and adaptive replanning.
import { Callout } from 'fumadocs-ui/components/callout' ## Planner Interface Every planner in TnsAI implements this interface, which defines how to generate action plans from the current world state. You can use one of the three built-in planners or register your own via `META-INF/services/com.tnsai.planning.Planner`. ```java public interface Planner { List plan(Map state); List plan(Map state, boolean useChaining); List getGoals(); List getActions(); List findUnsatisfiedGoals(Map state); List findSatisfiedGoals(Map state); boolean isGoalSatisfied(String goalName, Map state); List findActionsForGoal(PlanningGoal goal, Map state); List findApplicableActions(Map state); Map applyEffects(PlanningAction action, Map state); } ``` ## PlanningGoal Goals define what the agent wants to achieve. Created from `@Goal` annotations or programmatically. ```java // Simple goal with defaults PlanningGoal goal = PlanningGoal.of("survive", "health > 0"); // Goal with priority PlanningGoal urgent = PlanningGoal.of("heal", "health > 50", Priority.HIGH); // Full constructor PlanningGoal full = new PlanningGoal( "survive", // name "health > 0", // condition expression Priority.CRITICAL, // priority "Keep health above zero", // description true, // persistent (re-evaluate after achievement) 100 // deadline in ticks (-1 for none) ); ``` | Field | Type | Description | | ------------ | ---------- | ------------------------------------------ | | `name` | `String` | Unique goal identifier | | `condition` | `String` | Boolean expression evaluated against state | | `priority` | `Priority` | Determines planning order (higher = first) | | `persistent` | `boolean` | Re-evaluate after achievement | | `deadline` | `int` | Ticks until expiry (-1 = none) | ## PlanningAction Actions represent things the agent can do, with preconditions, postconditions, and utility fields. 
```java
// Simple factory
PlanningAction heal = PlanningAction.of(
    "heal", "health < 50", "health = 100", "survive");

// Builder with utility fields
PlanningAction search = PlanningAction.builder("search")
    .description("Search for resources")
    .precondition("energy > 10")
    .postcondition("resources = resources + 5")
    .fulfills("gather")
    .cost(10)
    .value(50)
    .weight(1.5f)
    .tags("exploration", "gathering")
    .build();

search.utility();               // 40 (value - cost)
search.weightedUtility();       // 60.0 (utility * weight)
search.fulfillsGoal("gather");  // true
search.hasTag("exploration");   // true
search.hasPrecondition();       // true
search.hasPostcondition();      // true
```

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `String` | required | Action identifier |
| `precondition` | `String` | `""` | Condition that must be true before execution |
| `postcondition` | `String` | `""` | State changes after execution |
| `fulfills` | `Set` | `{}` | Goal names this action helps achieve |
| `method` | `Method` | null | Java method to invoke (null for simulation) |
| `cost` | `int` | 1 | Execution cost for utility calculation |
| `value` | `int` | 1 | Expected value for utility calculation |
| `weight` | `float` | 1.0 | Multiplier for utility score |
| `tags` | `Set` | `{}` | Tags for filtering/grouping |

## BackwardChainingPlanner

Starts from unsatisfied goals and works backward to find action sequences. Handles multi-step plans where one action's postcondition enables another's precondition.

### Annotation-Driven Setup

The easiest way to define goals, actions, and state is with annotations on a Java class. The planner reads `@Goal`, `@ActionSpec`, and `@State` annotations at construction time and builds the planning model automatically.
```java
@RoleSpec(
    name = "combat-medic",
    goals = {
        @Goal(name = "survive", condition = "health > 0", priority = Priority.CRITICAL),
        @Goal(name = "heal-team", condition = "teamHealth > 50", priority = Priority.HIGH),
        @Goal(name = "gather", condition = "supplies > 10", priority = Priority.NORMAL)
    }
)
public class CombatMedic {

    @State(name = "health")
    private int health = 100;

    @State(name = "supplies")
    private int supplies = 5;

    @State(name = "teamHealth")
    private int teamHealth = 30;

    @ActionSpec(
        description = "Use medkit to heal a teammate",
        precondition = "supplies > 0",
        postcondition = "teamHealth = 80, supplies = supplies - 1",
        fulfills = {"heal-team"}
    )
    public void healTeammate() { /* ... */ }

    @ActionSpec(
        description = "Search area for supplies",
        precondition = "health > 20",
        postcondition = "supplies = supplies + 3",
        fulfills = {"gather"}
    )
    public void searchForSupplies() { /* ... */ }
}
```

### Using the Planner

Once you have a planner, call `plan(state)` with the current world state to get an ordered list of actions. The backward chaining algorithm figures out which actions to execute and in what order to satisfy unsatisfied goals.
```java
// Create from annotated class
Planner planner = new BackwardChainingPlanner(CombatMedic.class);

// Extract current state from @State fields
Map state = BackwardChainingPlanner.extractState(medicInstance);
// state = {health=100, supplies=5, teamHealth=30}

// Generate plan (backward chaining enabled by default)
List plan = planner.plan(state);
// Result: [searchForSupplies, healTeammate]
// Because: need supplies first (heal-team precondition), then heal

// Query goals
planner.findUnsatisfiedGoals(state);        // [heal-team, gather]
planner.isGoalSatisfied("survive", state);  // true (health=100 > 0)

// Custom max depth
Planner shallowPlanner = new BackwardChainingPlanner(CombatMedic.class, 5);

// Programmatic setup
Planner programmaticPlanner = new BackwardChainingPlanner(goals, actions, 8);
```

The backward chaining algorithm recurses up to `maxDepth` (default 10) and tracks visited actions to prevent infinite cycles.

## UtilityAIPlanner

Greedily selects the action with the highest utility score. Unlike backward chaining, which is goal-directed, utility AI is reactive -- it picks the best action at each step.

### Considerations

Considerations are scoring functions that evaluate how desirable each action is given the current state. The planner multiplies all consideration scores together to produce a final utility value for each action, then picks the highest one.
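
The multiplicative scoring described above can be sketched in a few lines of plain Java. This is a minimal illustration, not TnsAI's implementation: `UtilityScoringSketch`, `ScoredAction`, and `Scorer` are hypothetical stand-ins, and each score is assumed to fall between 0.0 and 1.0.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for illustration; these are not TnsAI classes.
final class UtilityScoringSketch {

    /** A candidate action with a cost and an expected value. */
    record ScoredAction(String name, int cost, int value) {}

    /** Scores an action against the current state; assumed to return 0.0-1.0. */
    interface Scorer {
        float score(ScoredAction action, Map<String, Object> state);
    }

    /** Multiplies every scorer's result into a single utility value. */
    static float utility(ScoredAction action, Map<String, Object> state, List<Scorer> scorers) {
        float product = 1.0f;
        for (Scorer scorer : scorers) {
            product *= scorer.score(action, state);
        }
        return product;
    }

    /** Returns the action with the highest combined utility, if there is one. */
    static Optional<ScoredAction> selectBest(List<ScoredAction> actions,
                                             Map<String, Object> state,
                                             List<Scorer> scorers) {
        ScoredAction best = null;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (ScoredAction action : actions) {
            float score = utility(action, state, scorers);
            if (score > bestScore) {
                bestScore = score;
                best = action;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        Scorer cheapness = (a, s) -> 1.0f - a.cost() / 100.0f; // lower cost scores higher
        Scorer worth = (a, s) -> a.value() / 100.0f;           // higher value scores higher
        List<ScoredAction> actions = List.of(
                new ScoredAction("cacheResults", 5, 40),
                new ScoredAction("parallelProcess", 20, 80));
        selectBest(actions, Map.of(), List.of(cheapness, worth))
                .ifPresent(best -> System.out.println(best.name())); // parallelProcess
    }
}
```

One consequence of multiplying rather than averaging: a single score of 0.0 (for example, an unmet precondition) vetoes the action regardless of how well it scores elsewhere.
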

| Factory Method | Description |
| --- | --- |
| `Consideration.cost()` | Lower cost = higher score (inverse, normalized to 100) |
| `Consideration.cost(weight)` | Weighted cost consideration |
| `Consideration.value()` | Higher value = higher score (normalized to 100) |
| `Consideration.value(weight)` | Weighted value consideration |
| `Consideration.utility()` | value - cost, normalized |
| `Consideration.preconditionSatisfied()` | 1.0 if met, 0.0 if not |
| `Consideration.hasTag(tag)` | 1.0 if action has tag |
| `Consideration.combine(...)` | Weighted average of multiple |

```java
// Custom consideration
Consideration urgency = (action, state) -> {
    Integer priority = (Integer) state.get("taskPriority");
    return priority != null ? priority / 10.0f : 0.5f;
};
```

### Builder Pattern

You can build a UtilityAIPlanner programmatically by adding goals, actions, and considerations. The planner evaluates all actions against the considerations and selects the one with the highest combined score.

```java
UtilityAIPlanner planner = UtilityAIPlanner.builder()
    .goal(PlanningGoal.of("optimize", "efficiency > 80"))
    .action(PlanningAction.builder("cacheResults")
        .cost(5).value(40).fulfills("optimize").build())
    .action(PlanningAction.builder("parallelProcess")
        .cost(20).value(80).fulfills("optimize").build())
    .consideration(Consideration.cost(0.3f))
    .consideration(Consideration.value(0.5f))
    .consideration(Consideration.preconditionSatisfied())
    .build();

Optional best = planner.selectBestAction(state);
List ranked = planner.getActionsByUtility(state);
float score = planner.calculateUtility(action, state);
```

### Annotation-Driven with @Utility

Instead of building programmatically, you can annotate actions with `@Utility` to set their cost, value, and weight directly in the class definition. The planner reads these at construction time.
```java
@ActionSpec(
    description = "Cache query results",
    precondition = "cacheSize < maxCache",
    postcondition = "cacheHitRate = 0.8",
    fulfills = {"performance"},
    utility = @Utility(cost = 5, value = 40, weight = 1.2f, tags = {"cache"})
)
public void cacheResults() { /* ... */ }

UtilityAIPlanner planner = new UtilityAIPlanner(MyRole.class);
```

## LLMDynamicPlanner

Uses an LLM to decompose natural-language goals into executable step sequences. Suitable for open-ended tasks where actions cannot be predefined.

```java
LLMDynamicPlanner planner = LLMDynamicPlanner.builder()
    .llm(client)
    .capability(CapabilityDescriptor.of("search", "Search the web for information"))
    .capability(CapabilityDescriptor.of("write_file", "Write content to a file"))
    .capability(CapabilityDescriptor.of("run_tests", "Execute test suite"))
    .additionalContext("Project uses Java 21 with Maven")
    .temperature(0.2f)
    .build();

LLMPlan plan = planner.generatePlan("Create a summary of recent AI news");

System.out.println(plan.toDisplayString());
// Plan for: Create a summary of recent AI news
// Steps:
//   1. [search] Find recent AI news articles
//   2. [write_file] Write summary to output.md

for (LLMPlanStep step : plan.steps()) {
    System.out.printf("[%s] %s (args: %s)%n",
        step.actionName(), step.description(), step.arguments());
}
```

### LLMPlan

An immutable data structure representing the generated plan. It supports non-destructive modifications (removing steps, reordering) that return a new plan, which is useful for human-in-the-loop approval workflows where reviewers may want to adjust the plan before execution.
```java
plan.size();                                // Number of steps
plan.isEmpty();                             // True if no steps
plan.goal();                                // Original goal string
plan.reasoning();                           // LLM's overall strategy
plan.withoutStep(2);                        // New plan without step at index 2
plan.withReorderedSteps(List.of(0, 2, 1));  // New plan with reordered steps
plan.remainingFrom(3);                      // New plan with steps from index 3 onward
plan.toDisplayString();                     // Human-readable format
```

### LLMPlanStep

Each step in an LLM-generated plan maps to one of the declared capabilities. It includes the action to execute, a human-readable description, optional arguments, and the LLM's reasoning for why this step is needed.

```java
LLMPlanStep step = LLMPlanStep.of(0, "search", "Find recent articles");

step.stepIndex();    // 0
step.actionName();   // "search"
step.description();  // "Find recent articles"
step.arguments();    // Map
step.reasoning();    // Why this step is needed
```

## PlanApprovalGate

Human-in-the-loop approval between plan generation and execution.

```java
PlanApprovalGate gate = PlanApprovalGate.builder()
    .reviewCallback(plan -> {
        System.out.println(plan.toDisplayString());
        System.out.print("Approve? (y/n): ");
        String input = scanner.nextLine();
        if ("y".equals(input)) return ApprovalDecision.approve();
        return ApprovalDecision.reject("User declined");
    })
    .autoApproveEmpty(true)
    .build();

Optional approved = gate.review(generatedPlan);
approved.ifPresent(plan -> engine.executePlan(plan));

// Generate + review in one call
Optional result = gate.generateAndReview(planner, "Deploy the app");

// Auto-approve for testing
PlanApprovalGate autoGate = PlanApprovalGate.autoApprove();
```

### ApprovalDecision

The reviewer's response to a proposed plan. Decisions can accept, reject, or modify the plan by removing or reordering steps.

| Factory | Description |
| --- | --- |
| `ApprovalDecision.approve()` | Accept plan as-is |
| `ApprovalDecision.reject(reason)` | Reject with reason |
| `ApprovalDecision.removeSteps(List)` | Accept with steps removed |
| `ApprovalDecision.reorder(List)` | Accept with reordered steps |
| `ApprovalDecision.modify(removed, newOrder)` | Accept with both modifications |

## AdaptiveReplanEngine

Executes LLM-generated plans with automatic replanning on step failure.

```java
AdaptiveReplanEngine engine = AdaptiveReplanEngine.builder()
    .llm(client)
    .planner(planner)
    .stepExecutor(step -> {
        try {
            String output = myToolRunner.run(step.actionName(), step.arguments());
            return StepExecutionResult.success(output);
        } catch (Exception e) {
            return StepExecutionResult.failure(e.getMessage());
        }
    })
    .maxReplanAttempts(3)
    .build();

PlanExecutionResult result = engine.execute("Deploy the application");
System.out.println(result.success());
System.out.println(result.replanCount());

// Execute an existing plan
PlanExecutionResult approvedResult = engine.executePlan(approvedPlan, currentState);
```

### Replanning Flow

When a step fails, the engine does not simply stop. Instead, it asks the LLM to create a revised plan that accounts for the failure, then continues execution. This makes plans resilient to unexpected errors.

1. Execute steps sequentially via `StepExecutor`
2. On failure: collect completed steps, error details, remaining steps
3. Call LLM with failure context to generate a revised plan
4. Continue execution with revised plan
5. Repeat up to `maxReplanAttempts` times

## Full Pipeline Example

This shows the recommended end-to-end workflow: the LLM generates a plan, a human reviews and approves it, and the adaptive engine executes it with automatic replanning on failure.

```java
// 1. Create planner with capabilities
LLMDynamicPlanner planner = LLMDynamicPlanner.builder()
    .llm(client).capabilities(capabilities).build();

// 2. Set up approval gate
PlanApprovalGate gate = PlanApprovalGate.builder()
    .reviewCallback(myReviewUI::showPlan).build();

// 3. Set up execution engine
AdaptiveReplanEngine engine = AdaptiveReplanEngine.builder()
    .llm(client).planner(planner)
    .stepExecutor(myExecutor).maxReplanAttempts(3).build();

// 4. Generate, approve, execute
Optional approved = gate.generateAndReview(planner, goal, state);
approved.ifPresent(plan -> {
    PlanExecutionResult result = engine.executePlan(plan, state);
    if (result.success()) {
        System.out.println("Goal achieved!");
    }
});
```

---

# Reasoning

URL: https://tnsai.dev/docs/capabilities/intelligence/reasoning
Description: Advanced reasoning strategies for complex problem solving. TnsAI provides multiple reasoning executors based on recent AI research, from simple chain-of-thought to graph-based reasoning with merging and refinement.

import { Callout } from 'fumadocs-ui/components/callout'

## ThinkingResult

When an LLM uses extended thinking (like Claude's chain-of-thought), TnsAI wraps the output in a `ThinkingResult` so you can inspect both the reasoning process and the final answer separately. This is useful for debugging, auditing, or displaying the model's step-by-step logic to users.

```java
ThinkingResult result = ThinkingResult.builder()
    .thinkingProcess("Step 1: Analyze the input... Step 2: Consider edge cases...")
    .finalAnswer("The optimal solution is X because...")
    .thinkingTokens(1200)
    .outputTokens(350)
    .thinkingBlocks(List.of("analysis block", "verification block"))
    .build();

System.out.println(result.hasThinking());     // true
System.out.println(result.getTotalTokens());  // 1550
System.out.println(result.getFinalAnswer());
```

| Method | Returns | Description |
| --- | --- | --- |
| `getThinkingProcess()` | `String` | Full internal reasoning text |
| `getFinalAnswer()` | `String` | Answer produced after thinking |
| `getThinkingTokens()` | `int` | Tokens consumed by thinking |
| `getOutputTokens()` | `int` | Tokens in the final answer |
| `getTotalTokens()` | `int` | Sum of thinking + output tokens |
| `getThinkingBlocks()` | `List` | Individual thinking blocks |
| `hasThinking()` | `boolean` | True if thinking was performed |

## Tree of Thoughts (ToT)

Explores multiple reasoning paths by generating candidate thoughts, evaluating them, and pruning low-quality branches. Based on "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" (Yao et al., 2023).

### TreeOfThoughtsExecutor

The executor manages the full exploration lifecycle: generating candidate thoughts at each depth, scoring them, pruning weak branches, and returning the best reasoning path found.

```java
TreeOfThoughtsExecutor tot = TreeOfThoughtsExecutor.builder()
    .llm(client)
    .evaluator(BranchEvaluator.llm(evalClient))
    .pruning(PruningStrategy.BEAM_SEARCH)
    .beamWidth(3)
    .maxDepth(5)
    .branchingFactor(3)
    .pruneThreshold(0.3)
    .timeout(Duration.ofMinutes(10))
    .build();

ToTResult result = tot.explore("Design a REST API for a todo app");
System.out.println(result.getBestPath());
System.out.println(result.getBestScore());
```

### Builder Parameters

These settings control how broadly and deeply the tree is explored, and when to stop.

| Parameter | Default | Description |
| --- | --- | --- |
| `llm` | required | LLMClient for thought generation |
| `evaluator` | required | BranchEvaluator for scoring nodes |
| `pruning` | `BEAM_SEARCH` | Pruning strategy |
| `beamWidth` | 3 | Branches to keep per level |
| `maxDepth` | 5 | Maximum tree depth |
| `branchingFactor` | 3 | Candidate thoughts per node |
| `pruneThreshold` | 0.3 | Minimum score to survive |
| `timeout` | 10 min | Exploration time limit |

### PruningStrategy

Pruning determines which branches to keep and which to discard during exploration. Choosing the right strategy lets you balance thoroughness against cost -- more aggressive pruning is faster and cheaper, while less pruning explores more possibilities.

| Strategy | Description |
| --- | --- |
| `BEAM_SEARCH` | Keep top-k branches at each level (default) |
| `BEST_FIRST` | Always expand the highest cumulative-score node |
| `DEPTH_LIMITED` | Explore all branches up to max depth |
| `GREEDY` | Always pick the single best branch |
| `MCTS` | Monte Carlo Tree Search -- balance exploration vs exploitation |
| `EXHAUSTIVE` | No pruning -- explore all branches (expensive) |

### ToTResult

The result captures the full exploration tree, best path, and statistics:

```java
ToTResult result = tot.explore("Optimize this algorithm");

result.getBestPath();         // Combined thought chain of best leaf
result.getBestScore();        // Average score along best path
result.hasSolution();         // true if bestScore > 0.5
result.getBestLeaf();         // Optional
result.getTopPaths(3);        // Top-3 leaf nodes by score
result.getTotalNodes();       // Total nodes explored
result.getPrunedNodes();      // Nodes pruned during exploration
result.getMaxDepthReached();  // Deepest level reached
result.getDuration();         // Total exploration time
```

### ThoughtNode

Each node in the tree represents a single reasoning step.
Nodes track their score, depth, and links to parent and children, so you can traverse the full reasoning path from root to any leaf.

```java
ThoughtNode root = ThoughtNode.root("Design a caching strategy");
ThoughtNode child = root.addChild("n1", "Use LRU cache with TTL");
child.setScore(0.85);

child.getThoughtChain();     // "Design a caching strategy -> Use LRU cache with TTL"
child.getCumulativeScore();  // Sum of scores along path
child.getAverageScore();     // Average score along path
child.getDepth();            // 1
child.isLeaf();              // true (no children yet)
child.isRoot();              // false
child.isEvaluated();         // true (score was set)
child.isPruned();            // false
child.getPath();             // [root, child]
```

### BranchEvaluator

The evaluator scores each reasoning step on a 0.0-1.0 scale, which the pruning strategy uses to decide which branches to keep. You can use an LLM to judge quality, a fast heuristic, or combine both.

```java
// LLM-based: asks an LLM to rate the reasoning step 0-100
BranchEvaluator llmEval = BranchEvaluator.llm(evalClient);

// Heuristic: scores based on thought length and reasoning keywords
BranchEvaluator heuristic = BranchEvaluator.heuristic();

// Combined: averages multiple evaluators
BranchEvaluator combined = BranchEvaluator.combined(llmEval, heuristic);

// Custom evaluator
BranchEvaluator custom = (node, goal) -> {
    return node.getThought().contains("therefore") ? 0.8 : 0.4;
};
```

## Graph of Thoughts (GoT)

Extension of ToT that allows merging and refining thought branches. Better for problems where partial solutions can be combined. Based on "Graph of Thoughts" (Besta et al., 2023).

### GraphOfThoughtsExecutor

The executor drives the graph exploration, applying generate, aggregate, and refine operations to build a directed graph of thoughts rather than a strict tree.
```java
GraphOfThoughtsExecutor got = GraphOfThoughtsExecutor.builder()
    .llm(client)
    .evaluator(BranchEvaluator.llm(client))
    .operations(List.of(
        GoTOperation.GENERATE,
        GoTOperation.AGGREGATE,
        GoTOperation.REFINE))
    .maxNodes(20)
    .branchingFactor(3)
    .timeout(Duration.ofMinutes(10))
    .build();

GoTResult result = got.explore("Design a database schema for e-commerce");
System.out.println(result.getBestThought());
System.out.println(result.aggregatedInsight());
System.out.println(result.mergeCount());  // How many merge operations occurred
```

### GoTOperation

Operations define what the graph executor does at each step. Unlike ToT, which only generates and evaluates thoughts, GoT can also merge partial solutions together and iteratively refine them.

| Operation | Description |
| --- | --- |
| `GENERATE` | Generate new thoughts from existing ones |
| `AGGREGATE` | Merge multiple thoughts into a unified solution |
| `REFINE` | Improve an existing thought iteratively |
| `SCORE` | Evaluate a thought's quality |

### GoTNode

Unlike tree nodes, a `GoTNode` can have multiple parents because merge operations combine separate reasoning branches into one. This makes it a true graph structure rather than a tree.

```java
GoTNode root = GoTNode.root("Design a notification system");
GoTNode emailApproach = root.addChild("n1", "Use email queues", GoTOperation.GENERATE);
GoTNode pushApproach = root.addChild("n2", "Use push notifications", GoTOperation.GENERATE);

// Merge two approaches into one
GoTNode merged = GoTNode.merge("m1",
    "Hybrid: email for async, push for real-time",
    List.of(emailApproach, pushApproach));

merged.isMergeNode();        // true
merged.getParents().size();  // 2
```

### GoTResult

The result of a graph exploration, including the best thought found, aggregate insights from merging, and statistics about the exploration process.
`GoTResult` is a record that exposes the following accessors:

```java
GoTResult result = got.explore("...");

result.getBestThought();     // Content of the highest-scored node
result.getBestScore();       // Score of best node
result.totalNodes();         // Total nodes in graph
result.mergeCount();         // Number of AGGREGATE operations
result.hasSolution();        // true if bestScore > 0.5
result.aggregatedInsight();  // Synthesized insight from best nodes
result.duration();           // Exploration time
```

## CausalReasoner

Causal reasoning engine for why, what-if, and intervention queries. Uses an LLM with a causal model description to analyze cause-effect relationships.

```java
CausalReasoner reasoner = CausalReasoner.builder()
    .llm(client)
    .causalModel("Sales depend on marketing spend, season, and competitor pricing")
    .build();

Map context = Map.of(
    "q3_sales", 150000,
    "marketing_budget", 50000,
    "season", "summer"
);

// Why did something happen?
CausalResult why = reasoner.why("Why did sales drop in Q3?", context);
System.out.println(why.answer());
System.out.println(why.reasoning());
System.out.println(why.confidence());  // 0.0-1.0

// What-if counterfactual
CausalResult whatIf = reasoner.whatIf("What if we doubled marketing?", context);

// Intervention prediction
CausalResult intervene = reasoner.intervene("Cut prices by 15%", context);
```

### CausalQueryType

Three types of causal queries are supported, each answering a different kind of question about cause and effect.

| Type | Method | Purpose |
| --- | --- | --- |
| `WHY` | `reasoner.why(...)` | Explain why something happened |
| `WHAT_IF` | `reasoner.whatIf(...)` | Counterfactual reasoning |
| `INTERVENTION` | `reasoner.intervene(...)` | Predict effect of an action |

### CausalResult

The result of a causal query, containing the explanation or prediction along with a confidence score indicating how certain the model is.

| Field | Type | Description |
| --- | --- | --- |
| `queryType` | `CausalQueryType` | Which type of query was made |
| `answer` | `String` | The causal explanation or prediction |
| `reasoning` | `String` | Step-by-step reasoning chain |
| `confidence` | `double` | Confidence score (0.0-1.0) |

## SelfConsistencyExecutor

Generates multiple reasoning paths and returns the consensus answer via majority voting or other aggregation. Based on "Self-Consistency Improves Chain of Thought Reasoning in Language Models" (Wang et al., 2022).

```java
SelfConsistencyExecutor executor = SelfConsistencyExecutor.builder()
    .llm(client)
    .numPaths(5)
    .aggregation(Aggregation.MAJORITY_VOTE)
    .baseTemperature(0.7)
    .temperatureVariance(0.1)
    .parallel(true)
    .maxConcurrency(5)
    .timeout(Duration.ofMinutes(5))
    .systemPrompt("You are a math tutor.")
    .build();

ConsistencyResult result = executor.reason("What is 17 * 23?");
System.out.println(result.getConsensusAnswer());  // "391"
System.out.println(result.getConfidence());       // 0.8 (4/5 paths agreed)
System.out.println(result.isUnanimous());         // false
System.out.println(result.getAnswerCounts());     // {"391"=4, "392"=1}
```

### Builder Parameters

Configure how many reasoning paths to generate and how to combine them into a final answer.
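
As a rough picture of what `numPaths` and the default `MAJORITY_VOTE` aggregation control, the sketch below tallies the answers from several paths and reports the winning share as the confidence. It is a hypothetical illustration, not the library's code: `MajorityVoteSketch` and `Consensus` are made-up names, and breaking ties in favor of the first-seen answer is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not TnsAI's SelfConsistencyExecutor.
final class MajorityVoteSketch {

    /** Consensus answer, the fraction of paths that produced it, and the raw tallies. */
    record Consensus(String answer, double confidence, Map<String, Integer> counts) {}

    static Consensus vote(List<String> answers) {
        if (answers.isEmpty()) {
            throw new IllegalArgumentException("at least one answer is required");
        }
        // LinkedHashMap keeps first-seen order, so ties go to the earliest answer.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String answer : answers) {
            counts.merge(answer.trim(), 1, Integer::sum);
        }
        String winner = null;
        int winnerCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > winnerCount) {
                winner = entry.getKey();
                winnerCount = entry.getValue();
            }
        }
        return new Consensus(winner, (double) winnerCount / answers.size(), counts);
    }

    public static void main(String[] args) {
        // Five paths, four of which agree: mirrors the 4/5 example above.
        Consensus c = vote(List.of("391", "391", "392", "391", "391"));
        System.out.println(c.answer() + " " + c.confidence() + " " + c.counts());
        // 391 0.8 {391=4, 392=1}
    }
}
```
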

| Parameter | Default | Description |
| --- | --- | --- |
| `llm` | required | LLMClient for reasoning |
| `numPaths` | 5 | Number of reasoning paths to generate |
| `aggregation` | `MAJORITY_VOTE` | How to combine answers |
| `baseTemperature` | 0.7 | Base LLM temperature |
| `temperatureVariance` | 0.1 | Variance between path temperatures |
| `systemPrompt` | none | Optional system prompt |
| `parallel` | true | Execute paths in parallel |
| `maxConcurrency` | 5 | Max parallel threads |
| `timeout` | 5 min | Timeout for parallel execution |

### Aggregation

The aggregation strategy determines how multiple reasoning paths are combined into a single consensus answer. Majority vote is the simplest and most common choice.

| Strategy | Description |
| --- | --- |
| `MAJORITY_VOTE` | Most common answer wins (default) |
| `WEIGHTED_VOTE` | Weight by reasoning chain length |
| `UNANIMOUS` | Require all paths to agree |
| `THRESHOLD` | First answer appearing in more than 50% of paths |
| `LLM_SYNTHESIS` | Use LLM to synthesize best answer from all paths |

### ConsistencyResult

The result tells you which answer won the vote, how confident the consensus is, and lets you inspect each individual reasoning path for debugging.

```java
ConsistencyResult result = executor.reason("...");

result.getConsensusAnswer();  // The winning answer
result.getConfidence();       // Ratio of paths that agreed
result.getAllPaths();         // All ReasoningPath objects
result.getAnswerCounts();     // Map of answer frequencies
result.getTotalPaths();       // Number of paths generated
result.isUnanimous();         // true if confidence >= 1.0
result.isConfident(0.8);      // true if confidence >= threshold
```

## Choosing a Strategy

Pick the strategy that matches your problem type and cost budget.
Simple factual questions work well with self-consistency, while complex design problems benefit from tree or graph exploration.

| Strategy | Best For | Cost |
| --- | --- | --- |
| Tree of Thoughts | Step-by-step decomposition problems | High (branching factor x depth LLM calls) |
| Graph of Thoughts | Problems where partial solutions combine | Higher (includes merge/refine operations) |
| Self-Consistency | Factual questions with verifiable answers | Medium (N parallel LLM calls) |
| Causal Reasoning | Diagnosing causes and predicting interventions | Low (single LLM call per query) |

---

# Advanced LLM Patterns

URL: https://tnsai.dev/docs/capabilities/llm/advanced
Description: Advanced capabilities in TnsAI.LLM for observability, structured output, resilience, caching, intelligent routing, and cost management.

import { Callout } from 'fumadocs-ui/components/callout'

## Providers

TnsAI.LLM ships with 13 concrete chat provider implementations (plus the `AbstractLLMClient` base), all implementing the `LLMClient` interface from tnsai-core, alongside `WhisperClient` for audio transcription. API keys are configured via environment variables.

| Provider | Class | Notes |
| --- | --- | --- |
| Anthropic | `AnthropicClient` | Claude models, prompt caching, extended thinking |
| OpenAI | `OpenAIClient` | GPT models, JSON mode, function calling |
| Azure OpenAI | `AzureOpenAIClient` | Azure-hosted OpenAI models |
| Google Gemini | `GeminiClient` | Gemini models, vision, structured output |
| AWS Bedrock | `BedrockClient` | Claude/Titan via AWS |
| Mistral | `MistralClient` | Mistral/Mixtral models |
| Cohere | `CohereClient` | Command models |
| Groq | `GroqClient` | Ultra-low latency inference |
| HuggingFace | `HuggingFaceClient` | Inference API models |
| Ollama | `OllamaClient` | Local model serving |
| OpenRouter | `OpenRouterClient` | Multi-provider gateway |
| MiniMax | `MiniMaxClient` | MiniMax models |
| ZhipuAI | `ZhipuAIClient` | GLM models |
| Whisper | `WhisperClient` | Audio transcription (audio package) |

All providers support `chat()`, `streamChat()`, `streamChatWithSpec()`, and `streamChatWithHandler()` methods. Provider capabilities are exposed via `getCapabilities()` which returns `LLMCapabilities` with fields like `supportsVision()`, `supportsFunctionCalling()`, `supportsStructuredOutput()`, `getMaxInputTokens()`, `getInputCostPer1KTokens()`, etc.

> **Cross-reference**: For provider setup and basic usage, see [Providers](/docs/capabilities/llm/providers).

## Observability

### ObservableLLMClient

`ObservableLLMClient` wraps any `LLMClient` and notifies registered observers about all requests, responses, and errors. This enables non-invasive monitoring of LLM operations without modifying existing code.
```java
// Create base client
LLMClient baseClient = new OpenAIClient("gpt-4o");

// Create metrics collector
LLMMetrics metrics = new LLMMetrics();

// Wrap with observability
LLMClient observedClient = new ObservableLLMClient(baseClient, metrics);

// Use normally -- all calls are tracked
observedClient.chat("Hello!");

// Multiple observers
LLMMetrics sharedMetrics = new LLMMetrics();
PromptLogger logger = new PromptLogger();
LLMClient client = new ObservableLLMClient(baseClient, sharedMetrics, logger);

// Access internals
LLMClient delegate = ((ObservableLLMClient) client).getDelegate();
LLMObserver observer = ((ObservableLLMClient) client).getObserver();
```

### LLMObserver Interface

Implement `LLMObserver` for custom monitoring. All methods have default no-op implementations, so you only override the callbacks you need.

```java
public interface LLMObserver {
    void onRequest(LLMClient client, String message, Optional systemPrompt,
                   Optional history, Optional tools);
    void onResponse(LLMClient client, ChatResponse response, long latencyMs);
    void onError(LLMClient client, Exception error, long latencyMs);
    void onStreamChunk(LLMClient client, String chunk, int chunkIndex);
    void onStreamComplete(LLMClient client, int totalChunks, long latencyMs);
    void onStreamError(LLMClient client, Exception error, int chunksReceived, long latencyMs);
}
```

Compose multiple observers with `ObservableLLMClient.CompositeObserver.of(observer1, observer2)`.
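
The fan-out behind a composite observer, together with default no-op callbacks, can be sketched in plain Java. This is an illustrative sketch rather than TnsAI's `CompositeObserver`: `CompositeObserverSketch`, `CallObserver`, and its two callbacks are hypothetical, simplified stand-ins for `LLMObserver`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for LLMObserver; not part of TnsAI.
final class CompositeObserverSketch {

    interface CallObserver {
        // Default no-op bodies let implementations override only what they need.
        default void onResponse(String model, long latencyMs) {}
        default void onError(String model, Exception error) {}
    }

    /** Returns an observer that forwards every event to each delegate, in order. */
    static CallObserver compose(CallObserver... delegates) {
        List<CallObserver> all = List.of(delegates);
        return new CallObserver() {
            @Override
            public void onResponse(String model, long latencyMs) {
                for (CallObserver delegate : all) delegate.onResponse(model, latencyMs);
            }

            @Override
            public void onError(String model, Exception error) {
                for (CallObserver delegate : all) delegate.onError(model, error);
            }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        CallObserver latencyLogger = new CallObserver() {
            @Override
            public void onResponse(String model, long latencyMs) {
                log.add(model + " took " + latencyMs + "ms");
            }
        };
        CallObserver errorLogger = new CallObserver() {
            @Override
            public void onError(String model, Exception error) {
                log.add(model + " failed: " + error.getMessage());
            }
        };
        CallObserver composite = compose(latencyLogger, errorLogger);
        composite.onResponse("demo-model", 120);
        composite.onError("demo-model", new RuntimeException("timeout"));
        System.out.println(log); // [demo-model took 120ms, demo-model failed: timeout]
    }
}
```

Each delegate sees every event, so a metrics collector and a logger can be attached side by side without knowing about each other.
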
### LLMMetrics

`LLMMetrics` implements `LLMObserver` and collects comprehensive metrics:

- Request/response/error counts (global and per-provider)
- Token usage estimates (input and output)
- Latency statistics (average, p50, p95, p99)
- Cost estimates based on provider pricing
- Stream chunk counts

```java
LLMMetrics metrics = new LLMMetrics();
LLMClient client = new ObservableLLMClient(baseClient, metrics);

// After some usage
LLMMetrics.Report report = metrics.getReport();
System.out.println("Total requests: " + report.totalRequests());
System.out.println("Total responses: " + report.totalResponses());
System.out.println("Total errors: " + report.totalErrors());
System.out.println("Success rate: " + report.successRate() + "%");
System.out.println("Error rate: " + report.errorRate() + "%");
System.out.println("Input tokens: " + report.totalInputTokens());
System.out.println("Output tokens: " + report.totalOutputTokens());
System.out.println("Estimated cost: $" + report.totalEstimatedCost());
System.out.println("Avg latency: " + report.avgLatencyMs() + "ms");
System.out.println("P95 latency: " + report.p95LatencyMs() + "ms");
System.out.println("P99 latency: " + report.p99LatencyMs() + "ms");

// Per-provider breakdown (var keeps the map's type parameters, so getValue() needs no cast)
var byProvider = metrics.getMetricsByProvider();
for (var entry : byProvider.entrySet()) {
    LLMMetrics.ProviderMetrics pm = entry.getValue();
    System.out.println(entry.getKey() + ": "
        + pm.requests() + " requests, "
        + pm.avgLatencyMs() + "ms avg, $"
        + pm.estimatedCost());
}

metrics.reset();  // clear all metrics
```

## Structured Output (JSON Mode)

### JsonModeClient

`JsonModeClient` wraps any `LLMClient` to enforce JSON output. It uses provider-native JSON mode when available and falls back to prompt engineering for providers that lack native support.
```java
// Simple wrap
LLMClient baseClient = new OpenAIClient("gpt-4o");
JsonModeClient client = JsonModeClient.wrap(baseClient);

// Get JSON response
ChatResponse response = client.chat("List 3 programming languages");
// Response: {"languages": ["Python", "Java", "JavaScript"]}

// Parse to a specific type
LanguageList list = client.chatAs(LanguageList.class, "List 3 programming languages");

// With system prompt
Person person = client.chatAs(Person.class, "Generate a person",
    Optional.of("You are a test data generator."));
```

### With JSON Schema

```java
ResponseFormat format = ResponseFormat.jsonSchema("Person", Map.of(
    "type", "object",
    "properties", Map.of(
        "name", Map.of("type", "string"),
        "age", Map.of("type", "integer")
    ),
    "required", List.of("name", "age")
));

JsonModeClient client = JsonModeClient.builder()
    .client(baseClient)
    .responseFormat(format)
    .build();

ChatResponse response = client.chat("Generate a person");
// Response: {"name": "Alice", "age": 30}
```

### ResponseFormat

Represents the desired output format. Three types:

| Type | Factory | Behavior |
| --- | --- | --- |
| `TEXT` | `ResponseFormat.text()` | Default text output |
| `JSON_OBJECT` | `ResponseFormat.jsonObject()` | Valid JSON, structure not enforced |
| `JSON_SCHEMA` | `ResponseFormat.jsonSchema(name, schema)` | JSON conforming to provided schema |

Generate schema from a class: `ResponseFormat.jsonSchema("Person", Person.class)`.

Convert to provider-specific formats: `format.toOpenAIFormat()`, `format.toGeminiFormat()`, `format.toOllamaFormat()`.

Key methods: `isJson()`, `hasSchema()`, `isStrict()`, `getSchema()`, `getSchemaName()`.
### Advanced Options

```java
JsonModeClient client = JsonModeClient.builder()
    .client(baseClient)
    .responseFormat(format)
    .objectMapper(customMapper)              // custom Jackson ObjectMapper
    .forcePromptEngineering(true)            // skip native JSON mode, always use prompt engineering
    .schemaFromClass("Person", Person.class) // generate schema from class
    .build();

// Check native support
boolean nativeSupport = client.supportsNativeJsonMode();

// Parse raw JSON
Person p = client.parseResponse("{\"name\":\"Alice\",\"age\":30}", Person.class);
```

On parse failure, the client throws `JsonModeClient.JsonParseException`; call `getRawContent()` on the exception to inspect the unparsed response when debugging.

## Resilience

### CircuitBreakerClient

`CircuitBreakerClient` prevents cascading failures by fast-failing when a provider is consistently down. It implements the standard three-state circuit breaker pattern.

**State transitions**: `CLOSED` (normal, counting failures) → `OPEN` (fast-fail after N consecutive failures) → `HALF_OPEN` (after recovery timeout, allows one probe request) → `CLOSED` (if probe succeeds) or back to `OPEN` (if probe fails).
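The transitions above can be sketched as a small state machine. This is an illustrative sketch only, not the `CircuitBreakerClient` implementation: the class name `CircuitBreakerSketch` and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```java
import java.time.Duration;

// Illustrative three-state breaker; not TnsAI's CircuitBreakerClient.
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long recoveryTimeoutMs;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMs = 0L;

    CircuitBreakerSketch(int failureThreshold, Duration recoveryTimeout) {
        this.failureThreshold = failureThreshold;
        this.recoveryTimeoutMs = recoveryTimeout.toMillis();
    }

    /** Returns true if a request may proceed at time nowMs. */
    synchronized boolean allowRequest(long nowMs) {
        if (state == State.OPEN && nowMs - openedAtMs >= recoveryTimeoutMs) {
            state = State.HALF_OPEN; // recovery timeout elapsed: let exactly one probe through
            return true;
        }
        return state == State.CLOSED; // OPEN fast-fails; HALF_OPEN already has its probe
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED; // a successful probe (or normal call) closes the circuit
    }

    synchronized void recordFailure(long nowMs) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // trip, or re-trip after a failed probe
            openedAtMs = nowMs;
        }
    }

    synchronized State state() {
        return state;
    }

    public static void main(String[] args) {
        CircuitBreakerSketch cb = new CircuitBreakerSketch(3, Duration.ofSeconds(60));
        for (int i = 0; i < 3; i++) {
            cb.recordFailure(0L);
        }
        System.out.println(cb.state());               // OPEN
        System.out.println(cb.allowRequest(1_000L));  // false
        System.out.println(cb.allowRequest(60_000L)); // true
        cb.recordSuccess();
        System.out.println(cb.state());               // CLOSED
    }
}
```

In this sketch any success resets the consecutive-failure counter, which matches the "N consecutive failures" wording above.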
```java // Simple wrap (5 failures, 30s recovery) LLMClient resilient = CircuitBreakerClient.wrap(openaiClient); // Custom settings LLMClient resilient = CircuitBreakerClient.builder() .client(openaiClient) .failureThreshold(3) .recoveryTimeout(Duration.ofSeconds(60)) .build(); // Inspect state CircuitBreakerClient cb = (CircuitBreakerClient) resilient; CircuitBreakerClient.State state = cb.getState(); // CLOSED, OPEN, HALF_OPEN int failures = cb.getConsecutiveFailures(); // Metrics CircuitBreakerClient.CircuitBreakerMetrics metrics = cb.getMetrics(); System.out.println("Success rate: " + metrics.successRate() + "%"); System.out.println("Total requests: " + metrics.totalRequests()); System.out.println("Rejected (fast-fail): " + metrics.rejectedCount()); System.out.println("State transitions: " + metrics.stateTransitions()); // Manual reset cb.reset(); ``` When the circuit is open, all requests throw `CircuitOpenException` (contains model name, failure count, recovery timeout, and trip time). Compose with `FallbackRouter` for automatic failover: ```java FallbackRouter router = FallbackRouter.of( CircuitBreakerClient.wrap(primary), CircuitBreakerClient.wrap(fallback) ); ``` ## Caching ### PromptCachingClient `PromptCachingClient` wraps any `LLMClient` and adds Anthropic-style prompt caching support. Automatically adds cache control markers to system prompts, tools, and conversation history breakpoints. 
```java PromptCachingClient client = PromptCachingClient.builder() .client(anthropicClient) .cacheSystemPrompt(true) // cache system prompt (default: true) .cacheTools(true) // cache tool definitions (default: true) .cacheHistoryBreakpoints(2) // cache breakpoints in history (max 4) .minTokensForCaching(1024) // minimum tokens to trigger caching (default: 1024) .build(); // Use normally -- caching is automatic ChatResponse response = client.chat("Hello", systemPrompt, history, tools); // Check cache statistics System.out.println("Cache read tokens: " + client.getTotalCacheReadTokens()); System.out.println("Cache creation tokens: " + client.getTotalCacheCreationTokens()); System.out.println("Hit rate: " + client.getCacheHitRate()); System.out.println("Estimated savings: " + (client.getEstimatedSavings() * 100) + "%"); System.out.println("Requests: " + client.getRequestCount()); client.resetStats(); ``` **Cost savings**: Cache reads are 90% cheaper than regular input tokens. Cache writes are 25% more expensive (one-time cost). TTL is 5 minutes, refreshed on each use. > **Cross-reference**: For more on caching strategies, see [Caching](/docs/capabilities/llm/caching). ### SemanticCache The `SemanticCache` interface provides similarity-based caching for LLM responses. Unlike exact-match caching, it matches semantically equivalent prompts using embedding vectors. 
```java
SemanticCache cache = InMemorySemanticCache.builder()
    .embeddingProvider(new OpenAIEmbeddingProvider())
    .highThreshold(0.95)   // direct hit threshold
    .lowThreshold(0.70)    // below this, skip cache
    .ttlSeconds(3600)      // 1-hour TTL
    .maxEntries(10000)
    .build();

// Check cache
var hit = cache.findSimilar("What is Python?", 0.90);
if (hit.isPresent()) {
    return hit.get().response(); // cache hit
}

// Cache miss -- call LLM and store
String response = llm.chat("What is Python?");
cache.put("What is Python?", response);

// With system prompt consideration
cache.findSimilar("What is Python?", Optional.of("Be concise"), 0.90);
cache.put("What is Python?", Optional.of("Be concise"), response);

// Find multiple similar entries
var results = cache.findAllSimilar("Python language", 0.70, 5);

// Statistics
SemanticCache.CacheStats stats = cache.getStats();
System.out.println("Hits: " + stats.hits());
System.out.println("Misses: " + stats.misses());
System.out.println("Hit rate: " + stats.hitRate());
System.out.println("Size: " + stats.currentSize());
System.out.println("Evictions: " + stats.evictions());
```

## Routing

TnsAI.LLM provides multiple routing strategies that implement `LLMRouter` (which extends `LLMClient`). All routers can be used as drop-in replacements for a single client.

### CapabilityRouter

`CapabilityRouter` routes requests based on required capabilities (vision, function calling, structured output, context window size). It selects the first eligible client matching the capability filter.
```java
CapabilityRouter router = CapabilityRouter.builder()
    .addClient(new OpenAIClient("gpt-4o"))        // vision + tools
    .addClient(new GroqClient("llama-3.3-70b"))   // tools only
    .addClient(new OllamaClient("llama3.2"))      // basic text
    .defaultRequirement(cap -> cap.supportsFunctionCalling())
    .build();

// Use as a normal LLMClient
router.chat("Use the search tool");

// Select specific capability on demand
Optional<LLMClient> visionClient = router.selectVisionCapable();
Optional<LLMClient> toolClient = router.selectToolCapable();
Optional<LLMClient> jsonClient = router.selectStructuredOutputCapable();
Optional<LLMClient> bigContext = router.selectWithMinContext(128_000);

// Generic capability filter
Optional<LLMClient> custom = router.selectByCapability(
    cap -> cap.supportsVision() && cap.supportsFunctionCalling());

// Statistics
LLMRouter.RoutingStats stats = router.getStats();
router.resetStats();
```

### CostBasedRouter

`CostBasedRouter` routes to the cheapest viable provider. It sorts clients by input cost and tries the cheapest first, falling back to more expensive options on failure. This can reduce costs by up to 85% for simple queries.
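The cheapest-first strategy with fallback can be sketched independently of the framework. The following is a minimal illustration under stated assumptions, not the `CostBasedRouter` implementation: `Candidate` is a hypothetical stand-in for a client (a name, an input price per million tokens, and a call function), and any `RuntimeException` is treated as a failure that triggers fallback.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Illustrative cheapest-first routing with fallback; not TnsAI's CostBasedRouter.
final class CheapestFirstSketch {
    /** Hypothetical stand-in for a client: name, input price, and a call function. */
    static final class Candidate {
        final String name;
        final double inputCostPerMillion;
        final Function<String, String> call;

        Candidate(String name, double inputCostPerMillion, Function<String, String> call) {
            this.name = name;
            this.inputCostPerMillion = inputCostPerMillion;
            this.call = call;
        }
    }

    static String route(List<Candidate> candidates, String prompt) {
        // Copy before sorting so the caller's list is left untouched.
        List<Candidate> byCost = new ArrayList<>(candidates);
        byCost.sort(Comparator.comparingDouble((Candidate c) -> c.inputCostPerMillion));
        RuntimeException lastFailure = null;
        for (Candidate c : byCost) {
            try {
                return c.name + ": " + c.call.apply(prompt);
            } catch (RuntimeException e) {
                lastFailure = e; // fall through to the next, more expensive candidate
            }
        }
        throw new IllegalStateException("all candidates failed", lastFailure);
    }

    public static void main(String[] args) {
        List<Candidate> candidates = List.of(
            new Candidate("gpt-4o", 2.50, p -> "4"),
            new Candidate("gpt-4o-mini", 0.15, p -> { throw new RuntimeException("rate limited"); }),
            new Candidate("llama-3.3-70b", 0.59, p -> "4"));
        // The cheapest candidate fails, so the next cheapest answers.
        System.out.println(route(candidates, "What is 2+2?"));
    }
}
```

Sorting by input cost alone is a simplification; a production router would likely also weigh output pricing and capability requirements.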
```java CostBasedRouter router = CostBasedRouter.builder() .addClient(new OpenAIClient("gpt-4o-mini")) // $0.15/1M input .addClient(new GroqClient("llama-3.3-70b")) // $0.59/1M input .addClient(new OpenAIClient("gpt-4o")) // $2.50/1M input .addClient(new AnthropicClient("claude-sonnet-4")) // $3.00/1M input .build(); // Simple queries go to cheapest model router.chat("What is 2+2?"); // With capability requirement CostBasedRouter visionRouter = CostBasedRouter.builder() .addClient(new OpenAIClient("gpt-4o-mini")) .addClient(new OpenAIClient("gpt-4o")) .requireCapability(cap -> cap.supportsVision()) .build(); // Cost tracking CostBasedRouter.CostStats stats = router.getCostStats(); System.out.println("Total estimated cost: $" + stats.totalEstimatedCost()); System.out.println("Cost per provider: " + stats.costPerProvider()); System.out.println("Input tokens: " + stats.totalInputTokens()); System.out.println("Output tokens: " + stats.totalOutputTokens()); ``` ### LatencyBasedRouter Routes to the fastest available provider. Learns from actual response times and adapts routing decisions using a moving average window of the last 20 measurements. ```java LatencyBasedRouter router = LatencyBasedRouter.builder() .addClient(new GroqClient("llama-3.3-70b")) // ~100ms TTFT .addClient(new OpenAIClient("gpt-4o-mini")) // ~300ms TTFT .addClient(new AnthropicClient("claude-sonnet-4")) // ~600ms TTFT .maxLatencyMs(500) // exclude providers slower than 500ms .build(); // Routes to fastest (Groq) automatically, adapts over time router.chat("Quick question"); // Latency statistics LatencyBasedRouter.LatencyStats stats = router.getLatencyStats(); System.out.println("Fastest: " + stats.fastestProvider() + " (" + stats.fastestLatencyMs() + "ms)"); System.out.println("Per provider: " + stats.avgLatencyPerProvider()); ``` Initially uses estimated latency from `LLMCapabilities.getEstimatedLatencyMs()`. As actual measurements accumulate, routing decisions shift to measured performance. 
Failed requests are penalized with +5000ms latency to deprioritize unreliable providers. > **Cross-reference**: For routing basics and FallbackRouter, see [Routing](/docs/capabilities/llm/routing). ## Cost Management ### CostTracker The `CostTracker` interface provides a unified API for recording and analyzing LLM usage costs. The `InMemoryCostTracker` is the default implementation. ```java CostTracker tracker = new InMemoryCostTracker(); // Record usage UsageRecord record = UsageRecord.builder() .modelId("gpt-4o") .inputTokens(1000) .outputTokens(500) .build(); tracker.record(record); // Query List all = tracker.getRecords(); List byTime = tracker.getRecords(Instant.now().minus(Duration.ofHours(1)), Instant.now()); List byModel = tracker.getRecordsByModel("gpt-4o"); List byProvider = tracker.getRecordsByProvider("openai"); // Costs BigDecimal total = tracker.getTotalCost(); BigDecimal periodCost = tracker.getTotalCost(periodStart, periodEnd); Map byModelCost = tracker.getCostByModel(); Map byProviderCost = tracker.getCostByProvider(); // Statistics CostTracker.CostStatistics stats = tracker.getStatistics(); System.out.println("Records: " + stats.recordCount()); System.out.println("Total cost: $" + stats.totalCost()); System.out.println("Avg cost/request: $" + stats.averageCostPerRequest()); System.out.println("Input tokens: " + stats.totalInputTokens()); System.out.println("Output tokens: " + stats.totalOutputTokens()); System.out.println("Cached tokens: " + stats.totalCachedTokens()); System.out.println("Avg latency: " + stats.averageLatencyMs() + "ms"); stats.mostExpensiveRequest().ifPresent(r -> System.out.println("Most expensive: " + r.modelId() + " $" + r.cost())); ``` ### BudgetManager `BudgetManager` provides configurable spending limits with automatic enforcement, alert thresholds, and time-based budget periods. Thread-safe for concurrent use. 
```java
BudgetManager budget = BudgetManager.builder()
    .limit(100.00)                              // $100 budget
    .monthly()                                  // or .daily() or .period(Duration.ofDays(7))
    .alertThresholds(0.50, 0.80, 0.90, 0.95)    // or .defaultAlertThresholds()
    .hardLimit(true)                            // hard limit (default) vs .softLimit()
    .costTracker(tracker)                       // optional: sync from CostTracker
    .onAlert(alert -> log.warn("Budget alert: {}", alert))
    .onLimitExceeded(cost -> stopRequests())
    .build();

// Atomic check-and-spend (prevents TOCTOU race conditions)
if (budget.trySpend(new BigDecimal("0.05"))) {
    // make API call
} else {
    // budget exceeded
}

// Or separate check/spend
if (budget.canSpend(estimatedCost)) {
    // make API call
    budget.recordSpend(actualCost); // returns false if limit exceeded
}

// Query status
BigDecimal remaining = budget.getRemainingBudget();
double usage = budget.getUsagePercent(); // 0.0 to 1.0+
Duration timeLeft = budget.getRemainingTime();

// Comprehensive status
BudgetManager.BudgetStatus status = budget.getStatus();
System.out.println("State: " + status.state()); // OK, WARNING, CRITICAL, EXCEEDED, UNLIMITED
System.out.println("Spend: $" + status.currentSpend() + " / $" + status.limit());
System.out.println("Remaining: $" + status.remaining());

// Manual reset
budget.reset();
```

**BudgetState values**: `OK` (below 70%), `WARNING` (70-90%), `CRITICAL` (90-100%), `EXCEEDED` (above 100%), `UNLIMITED` (no limit set).

**BudgetAlertType values**: `THRESHOLD_REACHED`, `LIMIT_EXCEEDED`, `PERIOD_RESET`.

Budgets automatically reset when the period elapses. If a `CostTracker` is provided, the budget syncs spend from tracked records at each period reset.

> **Cross-reference**: For cost tracking basics, see [Cost Tracking](/docs/capabilities/llm/cost-tracking).

---

# Audio & Speech

URL: https://tnsai.dev/docs/capabilities/llm/audio

Description: The WhisperClient provides speech-to-text capabilities powered by OpenAI's Whisper model.
It supports transcription in multiple languages and translation of non-English audio to English. import { Callout } from 'fumadocs-ui/components/callout' ## Quick Start Two lines of code to transcribe or translate any audio file. The client handles file upload, API communication, and retry logic. ```java WhisperClient whisper = new WhisperClient(); // Transcribe audio from a file String text = whisper.transcribe(new File("speech.mp3")); // Translate non-English audio to English String english = whisper.translate(new File("french_speech.mp3")); ``` ## Builder When you need to use a custom API key or base URL (for example, if you are running a Whisper-compatible server), use the builder. ```java WhisperClient whisper = WhisperClient.builder() .model("whisper-1") // Model name (default: "whisper-1") .apiKey("sk-...") // API key (default: OPENAI_API_KEY env var) .baseUrl("https://...") // Base URL (default: OPENAI_BASE_URL or OpenAI) .build(); ``` ## Transcription Convert speech audio into text. Three overloads are available, from a simple one-liner to a fully configurable version with language hints, timestamps, and custom response formats. ```java // 1. File in, text out String text = whisper.transcribe(new File("speech.mp3")); // 2. AudioPart in, result out (default options) TranscriptionResult result = whisper.transcribe(AudioPart.fromFile(new File("speech.mp3"))); // 3. 
AudioPart + options, result out TranscriptionResult result = whisper.transcribe( AudioPart.fromFile(new File("meeting.wav")), TranscriptionOptions.builder() .language("en") .responseFormat(ResponseFormat.VERBOSE_JSON) .timestampGranularities(List.of("word", "segment")) .temperature(0.0f) .prompt("Technical meeting about AI architecture") .build() ); // Access result fields String text = result.getText(); result.getLanguage().ifPresent(lang -> System.out.println("Detected: " + lang)); result.getDuration().ifPresent(dur -> System.out.println("Duration: " + dur + "s")); if (result.hasWords()) { result.getWords().forEach(w -> System.out.println(w)); } if (result.hasSegments()) { result.getSegments().forEach(s -> System.out.println(s)); } ``` ### TranscriptionOptions Fine-tune the transcription by specifying the language, providing a context prompt, or requesting word-level timestamps. | Parameter | Type | Default | Description | | ------------------------ | ---------------- | ---------------- | ------------------------------------------------------------- | | `language` | `String` | Auto-detect | ISO-639-1 language code (e.g., `"en"`, `"tr"`, `"fr"`) | | `prompt` | `String` | None | Optional prompt to guide style or continue a previous segment | | `responseFormat` | `ResponseFormat` | `JSON` | Output format (see below) | | `temperature` | `float` | Provider default | Sampling temperature (0.0 = deterministic) | | `timestampGranularities` | `List` | Empty | `"word"` and/or `"segment"` (requires `VERBOSE_JSON` format) | ### TranscriptionResult The result always includes the transcribed text. When using `VERBOSE_JSON` format, you also get the detected language, audio duration, and optional word/segment timestamps. 
| Method | Return Type | Description | | --------------- | --------------------------- | --------------------------------------------- | | `getText()` | `String` | The transcribed text | | `getLanguage()` | `Optional` | Detected language (verbose JSON only) | | `getDuration()` | `Optional` | Audio duration in seconds (verbose JSON only) | | `getSegments()` | `List>` | Segment-level timestamps | | `getWords()` | `List>` | Word-level timestamps | | `hasSegments()` | `boolean` | Whether segment data is present | | `hasWords()` | `boolean` | Whether word data is present | ## Translation Translate audio in any supported language into English text. The source language is detected automatically -- you do not need to specify it. ```java // Simple file translation String english = whisper.translate(new File("turkish_speech.mp3")); // With options String english = whisper.translate( AudioPart.fromFile(new File("german_lecture.wav")), TranslationOptions.builder() .responseFormat(ResponseFormat.TEXT) .temperature(0.0f) .prompt("Academic lecture on physics") .build() ); ``` ### TranslationOptions Similar to transcription options but without a language parameter, since translation always auto-detects the source language. | Parameter | Type | Default | Description | | ---------------- | ---------------- | ---------------- | ------------------------------------------ | | `prompt` | `String` | None | Optional prompt to guide translation style | | `responseFormat` | `ResponseFormat` | `JSON` | Output format | | `temperature` | `float` | Provider default | Sampling temperature | Translation always outputs English. There is no language parameter -- the source language is detected automatically. ## ResponseFormat Choose the output format based on what you need. Use `TEXT` for simple transcriptions, `VERBOSE_JSON` for timestamps and metadata, or `SRT`/`VTT` for subtitle generation. 
| Value | Description | | -------------- | ---------------------------------------------------------------- | | `JSON` | Returns `{"text": "..."}` | | `TEXT` | Returns plain text | | `SRT` | Returns SubRip subtitle format | | `VTT` | Returns WebVTT subtitle format | | `VERBOSE_JSON` | Returns text + language, duration, segments, and word timestamps | ```java // Subtitle generation TranscriptionResult srt = whisper.transcribe( AudioPart.fromFile(new File("video.mp4")), TranscriptionOptions.builder() .responseFormat(ResponseFormat.SRT) .build() ); ``` ## AudioPart `AudioPart` is a content wrapper from `tnsai-core` that handles the details of encoding and formatting audio data for API submission. Create one from whichever source you have -- file, bytes, Base64 string, or URL. ```java // From file (reads and Base64-encodes) AudioPart audio = AudioPart.fromFile(new File("speech.wav")); // From Base64 string AudioPart audio = AudioPart.fromBase64(base64String, "audio/mp3"); // From byte array AudioPart audio = AudioPart.fromBytes(rawBytes, "audio/wav"); // From URL AudioPart audio = AudioPart.fromUrl("https://example.com/audio.mp3"); ``` ## Supported Audio Formats The following audio formats are accepted by the Whisper API: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, flac, aac, aiff. Maximum file size: **25 MB**. ## Configuration Set your OpenAI API key to authenticate with the Whisper service. An optional base URL override is available for self-hosted or proxy deployments. | Environment Variable | Required | Description | | -------------------- | -------- | ------------------------------------------------------ | | `OPENAI_API_KEY` | Yes | OpenAI API key | | `OPENAI_BASE_URL` | No | Custom base URL (default: `https://api.openai.com/v1`) | ## Error Handling The client includes built-in resilience so you do not need to implement retry logic yourself. Transient errors (network issues, rate limits) are retried up to 3 times with exponential backoff. 
Permanent failures throw `LLMException`.

```java
try {
    String text = whisper.transcribe(new File("speech.mp3"));
} catch (LLMException e) {
    System.err.println("Transcription failed: " + e.getMessage());
} catch (IllegalArgumentException e) {
    System.err.println("File too large or invalid: " + e.getMessage());
}
```

---

# LLM Caching

URL: https://tnsai.dev/docs/capabilities/llm/caching

Description: Reduce latency and cost with semantic response caching. The cache uses similarity matching so that near-identical prompts return cached responses without hitting the API.

import { Callout } from 'fumadocs-ui/components/callout'

## Setup

Wrap any `LLMClient` with `CachedLLMClient` to enable semantic caching. The cache compares new prompts against stored ones using embedding similarity, so even rephrased questions can hit the cache.

```java
LLMClient cached = CachedLLMClient.wrap(baseClient)
    .withCache(InMemorySemanticCache.builder()
        .embeddingProvider(new OpenAIEmbeddingProvider())
        .ttlSeconds(3600)
        .build())
    .highThreshold(0.95)    // At or above this: direct cache hit
    .lowThreshold(0.70)     // Below this: cache miss
    .cacheStreaming(true)   // Cache streaming responses
    .build();
```

## How It Works

The cache uses a two-threshold system to decide when a stored response is "close enough" to reuse. This avoids returning mismatched answers for slightly different questions while still catching exact or near-exact duplicates.

1. A prompt comes in
2. The cache computes semantic similarity against stored prompts
3. If similarity is at or above `highThreshold` (0.95) → **direct hit**, return cached response
4. If similarity is between `lowThreshold` and `highThreshold` → **gray zone**, the match may be verified
5. If similarity is below `lowThreshold` (0.70) → **cache miss**, call LLM and store result

## Configuration

Tune the similarity thresholds to control the tradeoff between cache hit rate and answer accuracy. Lower thresholds give more hits but risk returning less relevant cached responses.
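To make the effect of the two thresholds concrete, the decision rule from "How It Works" can be sketched as below. This is an illustration, not the `CachedLLMClient` implementation: the `TwoThresholdSketch` class is hypothetical, and cosine similarity is an assumption here, since the guide only says "embedding similarity".

```java
// Illustrative two-threshold decision; not TnsAI's CachedLLMClient.
final class TwoThresholdSketch {
    enum Decision { DIRECT_HIT, GRAY_ZONE, MISS }

    /** Cosine similarity between two embedding vectors of equal length (assumed metric). */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Maps a similarity score onto the three outcomes described in the steps above. */
    static Decision classify(double similarity, double lowThreshold, double highThreshold) {
        if (similarity >= highThreshold) {
            return Decision.DIRECT_HIT;
        }
        if (similarity >= lowThreshold) {
            return Decision.GRAY_ZONE;
        }
        return Decision.MISS;
    }

    public static void main(String[] args) {
        System.out.println(classify(0.97, 0.70, 0.95)); // DIRECT_HIT
        System.out.println(classify(0.80, 0.70, 0.95)); // GRAY_ZONE
        System.out.println(classify(0.50, 0.70, 0.95)); // MISS
        // Lowering lowThreshold turns the same score from a miss into a gray-zone match.
        System.out.println(classify(0.50, 0.40, 0.95)); // GRAY_ZONE
    }
}
```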
| Parameter | Default | Description | | -------------------------- | ----------------------- | -------------------------------------- | | `withCache(SemanticCache)` | `InMemorySemanticCache` | Semantic cache implementation to use | | `highThreshold` | 0.95 | Similarity score for direct cache hit | | `lowThreshold` | 0.70 | Minimum similarity to consider a match | | `cacheStreaming` | true | Whether to cache streaming responses | TTL and max size are configured on the `SemanticCache` implementation (e.g., `InMemorySemanticCache.builder().ttlSeconds(3600).build()`). ## Prompt Caching Some providers (notably Anthropic) offer native prompt caching that lets you reuse previously processed system prompts, tools, and conversation prefixes at dramatically reduced cost. `PromptCachingClient` wraps any client and automatically adds the required cache control markers -- you do not need to modify your prompts manually. ### Builder Configure which parts of the request to cache and how many breakpoints to place in conversation history. ```java PromptCachingClient client = PromptCachingClient.builder() .client(anthropicClient) // Required: LLM client to wrap .cacheSystemPrompt(true) // Cache the system prompt (default: true) .cacheTools(true) // Cache tool definitions (default: true) .cacheHistoryBreakpoints(2) // Number of history cache points (default: 4, max: 4) .minTokensForCaching(1024) // Minimum token threshold (default: 1024, min: 1024) .build(); ``` ### How It Works `PromptCachingClient` transparently modifies outgoing requests to add cache markers. Your application code uses the client like any other `LLMClient` -- the caching is invisible. - **System prompt**: Marked for caching so it is reused across requests without re-processing. - **Tools**: A `cache_control` marker is added to the last tool definition (per Anthropic's recommendation). - **History**: Cache breakpoints are distributed evenly across conversation history. 
With `cacheHistoryBreakpoints(2)` and 10 messages, breakpoints are placed at positions \~3 and \~6. Usage is identical to any `LLMClient` -- caching is automatic: ```java ChatResponse response = client.chat("Hello", systemPrompt, history, tools); // Streaming works too Stream tokens = client.streamChat("Tell me more", systemPrompt, history, tools); Stream chunks = client.streamChatWithSpec("Continue", systemPrompt, history, tools); ``` ### Cost Savings Prompt caching provides significant cost savings, especially for applications with long system prompts or many tools. The initial cache write is slightly more expensive, but every subsequent read saves 90%. | Operation | Cost Impact | | ----------- | ----------------------------------------- | | Cache read | **90% cheaper** than regular input tokens | | Cache write | **25% more expensive** (one-time cost) | | Cache TTL | 5 minutes (refreshed on each use) | Over multiple requests with the same system prompt and tools, savings compound rapidly. ### Statistics Monitor your cache effectiveness with built-in counters. Track hit rates and estimated savings to verify that caching is working as expected. 
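As a rough illustration of where an estimated-savings figure can come from, the sketch below combines the 90% read discount and the 25% write premium from the Cost Savings table. The formula and the `PromptCacheSavingsSketch` class are assumptions made for illustration; the exact calculation behind `getEstimatedSavings()` is not documented here.

```java
// Illustrative savings estimate; the formula is an assumption, not TnsAI's calculation.
final class PromptCacheSavingsSketch {
    static final double READ_DISCOUNT = 0.90; // cache reads are 90% cheaper
    static final double WRITE_PREMIUM = 0.25; // cache writes cost 25% more

    /**
     * Fraction of input cost saved relative to sending every token uncached.
     * Negative when creation overhead outweighs read savings.
     */
    static double estimatedSavings(long uncachedTokens, long cacheReadTokens, long cacheCreationTokens) {
        long total = uncachedTokens + cacheReadTokens + cacheCreationTokens;
        if (total == 0) {
            return 0.0;
        }
        double saved = READ_DISCOUNT * cacheReadTokens - WRITE_PREMIUM * cacheCreationTokens;
        return saved / total;
    }

    public static void main(String[] args) {
        // 1,000 uncached tokens, 18,000 read from cache, 2,000 written to cache.
        double savings = estimatedSavings(1_000, 18_000, 2_000);
        System.out.printf("Estimated savings: %.1f%%%n", savings * 100);
    }
}
```

Under this formula a single request that only writes to the cache shows negative savings, which is consistent with the write premium being a one-time cost that later reads pay back.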
```java // Token counters long readTokens = client.getTotalCacheReadTokens(); long creationTokens = client.getTotalCacheCreationTokens(); long requests = client.getRequestCount(); // Cache hit rate (0.0 to 1.0) double hitRate = client.getCacheHitRate(); // Estimated savings as a fraction (e.g., 0.85 = 85% savings) // Accounts for read savings (90%) minus creation overhead (25%) double savings = client.getEstimatedSavings(); // Reset all counters client.resetStats(); ``` Per-response cache usage is also available on `ChatResponse`: ```java ChatResponse response = client.chat("Hello", systemPrompt, history, tools); if (response.hasCacheUsage()) { response.getCacheReadInputTokens().ifPresent( tokens -> System.out.println("Cache read: " + tokens)); response.getCacheCreationInputTokens().ifPresent( tokens -> System.out.println("Cache created: " + tokens)); } ``` ### Configuration Introspection Inspect the current caching configuration at runtime, useful for debugging or logging the active settings. ```java client.isCacheSystemPromptEnabled(); // boolean client.isCacheToolsEnabled(); // boolean client.getCacheHistoryBreakpoints(); // int client.getDelegate(); // underlying LLMClient ``` ## Semantic Cache Interface If the built-in `InMemorySemanticCache` does not fit your needs (for example, you want Redis-backed caching or persistence across restarts), implement the `SemanticCache` interface with your own storage backend. ```java public interface SemanticCache { Optional get(String prompt); void put(String prompt, String response); void invalidate(String prompt); void clear(); } ``` Built-in: `InMemorySemanticCache` (thread-safe, LRU eviction). --- # Cost Tracking URL: https://tnsai.dev/docs/capabilities/llm/cost-tracking Description: Monitor and control LLM spending across providers with built-in cost tracking, budget management, and model pricing data for 100+ models. 
import { Callout } from 'fumadocs-ui/components/callout' ## Setup Wrap any `LLMClient` with `CostAwareLLMClient` to start tracking costs automatically. Every request records its token usage and calculates cost based on the model's pricing. ```java // Wrap with default cost tracking LLMClient tracked = CostAwareLLMClient.wrap(client); // Or use the builder for full control CostAwareLLMClient tracked = CostAwareLLMClient.builder() .client(client) .costTracker(new InMemoryCostTracker()) .budgetManager(budget) .build(); ``` ## Querying Costs Access accumulated cost data at any time. You can get the total across all models, break down by individual model, or query a specific time range. ```java tracker.getTotalCost(); // Total across all models tracker.getCostByModel("gpt-4o"); // Per-model breakdown tracker.getCostInRange(startTime, endTime); // Time-range query ``` ## Budget Management Set spending limits to prevent runaway costs. When the budget is exceeded, the `CostAwareLLMClient` will reject further requests. You can also set an alert threshold to get early warnings before hitting the limit. ```java BudgetManager budget = BudgetManager.builder() .limit(new BigDecimal("50.00")) .daily() // Duration.ofDays(1) .alertThreshold(0.80) .build(); // Or monthly budget BudgetManager monthly = BudgetManager.builder() .limit(new BigDecimal("500.00")) .monthly() // Duration.ofDays(30) .alertThreshold(0.80) .build(); ``` ## Model Pricing TnsAI ships with built-in pricing data for 100+ models so cost calculations work out of the box. Prices are in USD per 1 million tokens. 
| Model | Input | Output | | ---------------- | ------ | ------ | | gpt-4o | $2.50 | $10.00 | | gpt-4o-mini | $0.15 | $0.60 | | claude-sonnet-4 | $3.00 | $15.00 | | claude-opus-4 | $15.00 | $75.00 | | claude-3.5-haiku | $0.80 | $4.00 | | gemini-2.5-pro | $1.25 | $10.00 | | gemini-2.5-flash | $0.15 | $0.60 | | gemini-2.0-flash | $0.075 | $0.30 | ### Programmatic Usage Look up pricing for any model and calculate costs for a given token count. ```java ModelPricing pricing = ModelPricing.forModel("gpt-4o"); BigDecimal inputCost = pricing.calculateInputCost(1000); // 1000 tokens BigDecimal outputCost = pricing.calculateOutputCost(500); ``` ## Usage Records If you need to record usage manually (for example, from external API calls), you can create `UsageRecord` objects and pass them to the tracker directly. ```java tracker.record(UsageRecord.builder() .modelId("gpt-4o") .inputTokens(1000) .outputTokens(500) .build()); ``` --- # LLM URL: https://tnsai.dev/docs/capabilities/llm Description: Configure providers, route between models, cache responses, and track cost. import { Callout } from 'fumadocs-ui/components/callout' ## Pages - [Providers](/docs/capabilities/llm/providers) — 30+ built-in LLM providers and how to add your own. - [Routing](/docs/capabilities/llm/routing) — Pick the right model per request (size, cost, capability). - [Caching](/docs/capabilities/llm/caching) — Prompt cache, response cache. - [Cost Tracking](/docs/capabilities/llm/cost-tracking) — Per-agent and per-session cost accounting. - [Observability](/docs/capabilities/llm/observability) — Capture every LLM call as a typed `LLMCallLog` event with prompt, response, usage, cost, and streaming timing. - [Audio](/docs/capabilities/llm/audio) — Speech-to-text and text-to-speech. - [Advanced](/docs/capabilities/llm/advanced) — Request hooks, custom transports, rate limiting. 
---

# LLM Observability

URL: https://tnsai.dev/docs/capabilities/llm/observability

Description: Capture every LLM call as a typed LLMCallLog event: prompt, response, token usage, cost, streaming timing, errors, and full context. There is one publish call per request, and the capture is decorator-shaped, so any provider works without modification.

import { Callout } from 'fumadocs-ui/components/callout'

> See also: **[Cost Tracking](/docs/capabilities/llm/cost-tracking)**, the older `CostAwareLLMClient` + `BudgetManager` system focused on spend control. Use observability for full call telemetry; use cost tracking when you need budget enforcement at the client edge.

## Why a typed event

SLF4J debug lines and OTel span attributes answer "did this call happen?" but not "which prompt did agent X send in session Y?", "what did the LLM reply?", or "what did this turn cost in USD, and which tenant is it attributed to?". LLM-ops tools such as LangFuse, Helicone, and Phoenix are built around per-call telemetry as a first-class object. `LLMCallLog` has the same shape and is native to TnsAI.

## Quick Start

Wrap any `LLMClient` with `CapturingLLMClient`.
The default publisher emits one structured SLF4J line per call: ```java import com.tnsai.llm.observability.CapturingLLMClient; import com.tnsai.llm.observability.JsonLLMPricingRegistry; import com.tnsai.llm.observability.Slf4jLLMCallPublisher; LLMClient base = LLMClientFactory.create("openai", "gpt-4o", 0.7f); LLMClient observed = new CapturingLLMClient( base, JsonLLMPricingRegistry.defaultRegistry(), // 7 providers, 14+ models new Slf4jLLMCallPublisher()); Agent agent = AgentBuilder.create() .role(new MyRole()) .llm(observed) .build(); ``` Every chat / streamChat now logs: ``` INFO com.tnsai.llm.callLog - llm.call provider=openai model=gpt-4o elapsedMs=842 \ promptTokens=312 completionTokens=89 cachedTokens=0 totalTokens=401 \ costUSD=0.00168 pricingTable=2026-05 finishReason=STOP streamed=false tools=2 ``` Failures log at `WARN` with `errorClass`, `errorMessage`, and `httpStatus`. ## What gets captured `LLMCallLog` is a typed record carrying: | Field | Description | | --------------------------------------- | ------------------------------------------------------------------ | | `callId` | UUID — primary key for joining call to downstream events | | `startedAt` / `completedAt` / `elapsed` | Wall-clock timing | | `provider` / `model` / `endpoint` | Routing | | `prompt` | Messages, system prompt, parameters, prompt-cache markers | | `tools` | `ToolSurface` — names, schemas, SHA-256 hash for cache correlation | | `response` | Content, tool calls, reasoning content (o1 / Claude thinking) | | `usage` | Prompt / completion / cached / reasoning / total tokens | | `cost` | `CostEstimate` — prompt / completion / cached-discount / total USD | | `finishReason` | STOP, LENGTH, TOOL\_CALL, CONTENT\_FILTER | | `streamMetrics` | TTFT + chunk count for streaming calls | | `error` | `ErrorInfo` for failed calls — re-thrown after capture | | `context` | Full `EventContext` — tenant, agent, role, capability, session | | `retryAttempt` | Retry counter | ## Pricing Registry 
`JsonLLMPricingRegistry` loads versioned rate cards from classpath JSON:

```java
JsonLLMPricingRegistry pricing = JsonLLMPricingRegistry.defaultRegistry();
// loads /pricing/2026-05.json — 7 providers, 14+ models
```

Default coverage: `openai` (GPT-4o, GPT-4o-mini, o1-preview), `anthropic` (Claude Sonnet 4, Opus 4, Haiku 4.5), `google` (Gemini 2.0 Flash, Pro), `mistral` (Large, Small), `groq` (Llama 3.3 70B, Mixtral 8x7B), `cohere` (Command R+, R), `ollama` (wildcard at zero — local models).

Bring your own rate card for enterprise-negotiated pricing or new providers:

```java
LLMPricingRegistry custom = new InMemoryLLMPricingRegistry("contract-2026-05");
custom.register("openai", "gpt-4o", new ModelPricing(
    BigDecimal.valueOf(0.0015),   // promptPer1k (negotiated)
    BigDecimal.valueOf(0.0005),   // cachedPer1k
    BigDecimal.valueOf(0.006),    // completionPer1k
    null));                       // reasoningPer1k

LLMClient observed = new CapturingLLMClient(base, custom, new Slf4jLLMCallPublisher());
```

The `pricingTable` field on every `LLMCallLog` records which rate-card version generated the cost, so historical estimates don't shift when rates change downstream.

## Streaming Capture

For streaming calls, the decorator captures `StreamMetrics`:

```java
public record StreamMetrics(
    Instant firstChunkAt,
    Duration timeToFirstToken,   // operator's #1 latency metric
    long chunkCount,
    Duration interChunkP50,      // p50/p99 are zero in 0.9.x; histogram-friendly
    Duration interChunkP99       // counts ship now, percentiles in a follow-up
) {}
```

TTFT (time to first token) is the metric you graph for user-perceived latency.

## Tool Surface Hashing

When the LLM call advertises tools, `ToolSurface` carries the names + JSON schemas plus a SHA-256 hash of the canonical sorted-key form:

```java
public record ToolSurface(
    List<String> toolNames,
    List toolSchemas,
    String surfaceHash
) {}
```

Same `surfaceHash` across calls = identical tool set = prompt-cache friendly. Use the hash to identify cacheable trajectories in your dashboards.
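The hashing idea can be illustrated with a minimal sketch. This is not TnsAI's implementation: `SurfaceHashSketch` and its `surfaceHash` helper are hypothetical names, and the sketch assumes each schema is already serialized as a canonical JSON string (keys sorted inside the schema). It only shows why sorting before hashing makes the result independent of tool registration order.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical illustration of order-independent tool-surface hashing. */
public final class SurfaceHashSketch {

    private SurfaceHashSketch() {}

    /**
     * Hashes a map of tool name to schema JSON.
     * Assumption: each schema string is already in canonical form.
     */
    public static String surfaceHash(Map<String, String> schemasByToolName) {
        // TreeMap orders entries by tool name, so insertion order cannot change the hash.
        String canonical = new TreeMap<>(schemasByToolName).entrySet().stream()
            .map(e -> e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining("\n"));
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(canonical.getBytes(StandardCharsets.UTF_8));
            // Render the 32 digest bytes as 64 lowercase hex characters.
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Two calls that advertise the same tools in a different order produce the same hash under this scheme, which is the property a dashboard needs to group cache-friendly calls.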
## Custom Publisher

`LLMCallPublisher` is a single-method functional interface. Build your own to push to LangFuse, Helicone, Phoenix, or a custom sink:

```java
public final class LangFusePublisher implements LLMCallPublisher {
    @Override
    public void publish(LLMCallLog call) {
        // Convert LLMCallLog → LangFuse trace + generation
        langfuseClient.trace()
            .name(call.callId())
            .metadata(Map.of(
                "provider", call.provider(),
                "model", call.model(),
                "tenant", call.context().tenantId().orElse("default")))
            .generation(g -> g
                .input(call.prompt().messages())
                .output(call.response().content())
                .usage(call.usage())
                .totalCost(call.cost().totalUSD()))
            .submit();
    }
}
```

The publisher contract requires `publish` not to throw — observability failures must never block the agent's hot path.

## Cost Attribution

`LLMCallLog.context()` carries the full `EventContext` — tenant, agent, role, capability, session, group. Aggregate cost in your downstream sink along any of these dimensions:

- **Per tenant** — billing
- **Per agent** — which agent is the budget hog
- **Per role** — which role's LLM allocation is tight
- **Per capability** — chatty vs terse `@Capability` implementations
- **Per session** — per-conversation cost for end-user billing

Multi-agent cost split per group member works the same way — group context propagates.

## What's Not in the Default Publisher

`Slf4jLLMCallPublisher` deliberately does NOT log raw prompt or response text. Those can carry PII (user dictation, API keys passed as tool arguments, addresses in responses). Verbose dump belongs behind the redaction SPI from issue #80, on a separate publisher with explicit consumer opt-in.

## Coverage Notes

- The decorator covers `chat()` and `streamChat()`. Multimodal `chat(List ...)` and tool-aware `streamChatWithSpec` pass through without capture in 0.9.x — those paths are smaller in production usage and will land with integration coverage in a follow-up.
- `usage().promptTokens()` is zero when the provider didn't populate the usage block (some local Ollama models). Cost estimate is also zero — a meaningful "no usage data" signal, not a bug.
- The `endpoint` field is populated when the underlying client exposes its base URL; falls back to empty string otherwise.

## See Also

- **[Cost Tracking](/docs/capabilities/llm/cost-tracking)** — `CostAwareLLMClient` + `BudgetManager` for client-edge spend control
- **[Sampling](/docs/sampling)** — pair `CapturingLLMClient` with sampling decorators when you ship to a high-volume aggregator
- **[Providers](/docs/capabilities/llm/providers)** — every built-in LLM provider works with the decorator unchanged

---

# LLM Providers

URL: https://tnsai.dev/docs/capabilities/llm/providers

Description: The LLM module provides a unified interface to 30+ language-model providers. Every provider implements the same LLMClient interface, so switching providers means changing one line — the model name and provider key — not your agent code.

import { Callout } from 'fumadocs-ui/components/callout'

## Quick Start

Create a client for any supported provider with a single factory call. The client handles authentication, serialization, retries, and streaming automatically.

```java
LLMClient client = LLMClientFactory.create("openai", "gpt-4o", 0.7f);
ChatResponse response = client.chat("What is quantum computing?");
```

API keys are resolved from environment variables automatically:

```bash
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
```

The full environment-variable matrix — every provider, every key name — lives in [Configuration Reference](/docs/reference/configuration). For each provider, the variable name is also verified at build time by `ProviderEnvVarConsistencyTest` in `tnsai-llm`, so the docs cannot drift without CI failing.

## Supported Providers

Each row below maps a provider key (the first arg to `LLMClientFactory.create(...)`) to its `LLMClient` implementation.
Models change frequently — see each provider's docs for the up-to-date list and use whatever model string they document.

### Frontier and major hosted providers

| Provider | Provider key | Class |
| --- | --- | --- |
| **OpenAI** | `openai` | `OpenAIClient` |
| **Anthropic** | `anthropic`, `claude` | `AnthropicClient` |
| **Google Gemini** | `gemini`, `google` | `GeminiClient` |
| **Google Vertex AI** | `vertexai`, `vertex` | `VertexAIClient` |
| **xAI (Grok)** | `xai`, `grok` | `XAIGrokClient` |
| **Mistral La Plateforme** | `mistral` | `MistralClient` |
| **Cohere** | `cohere` | `CohereClient` |
| **DeepSeek** | `deepseek` | `DeepSeekClient` |
| **Perplexity (Sonar)** | `perplexity`, `pplx` | `PerplexityClient` |

### Fast / specialized inference

| Provider | Provider key | Class |
| --- | --- | --- |
| **Groq** | `groq` | `GroqClient` |
| **Cerebras** | `cerebras` | `CerebrasClient` |
| **NVIDIA NIM** | `nvidia`, `nim` | `NvidiaNIMClient` |
| **Together AI** | `together` | `TogetherAIClient` |
| **Fireworks AI** | `fireworks` | `FireworksAIClient` |
| **DeepInfra** | `deepinfra` | `DeepInfraClient` |
| **Replicate** | `replicate` | `ReplicateClient` |
| **Hugging Face** | `huggingface`, `hf` | `HuggingFaceClient` |
| **OpenRouter (aggregator)** | `openrouter` | `OpenRouterClient` |

### Enterprise platforms (cloud-managed)

| Provider | Provider key | Class |
| --- | --- | --- |
| **AWS Bedrock** | `bedrock`, `aws` | `BedrockClient` |
| **Azure OpenAI** | `azure`, `azure-openai` | `AzureOpenAIClient` |
| **IBM watsonx.ai** | `watsonx`, `ibm` | `WatsonxClient` |
| **Databricks Mosaic AI** | `databricks`, `mosaic` | `DatabricksClient` |

### Regional / non-Western providers

| Provider | Provider key | Class |
| --- | --- | --- |
| **Alibaba Qwen Cloud** | `qwen`, `dashscope` | `QwenCloudClient` |
| **Tencent Hunyuan** | `hunyuan`, `tencent` | `TencentHunyuanClient` |
| **ZhipuAI** | `zhipu`, `glm` | `ZhipuAIClient` |
| **MiniMax** | `minimax` | `MiniMaxClient` |
| **01.AI (Yi)** | `yi`, `01ai` | `YiClient` |

### Local / self-hosted

| Provider | Provider key | Class |
| --- | --- | --- |
| **Ollama** | `ollama` | `OllamaClient` |
| **LM Studio** | `lmstudio` | `LMStudioClient` |
| **llama.cpp server** | `llamacpp` | `LlamaCppServerClient` |
| **vLLM** | `vllm` | `VLLMClient` |

Local providers don't require an API key — point them at the right base URL (`OLLAMA_BASE_URL`, `LMSTUDIO_BASE_URL`, etc.) and the local server handles the rest.

## Creating Clients

### Using the Factory

`LLMClientFactory` is the recommended way to create clients. It resolves API keys from environment variables, selects the correct provider class, and applies default settings. Use this unless you need fine-grained control over client construction.

```java
// Basic — provider name, model, temperature
LLMClient client = LLMClientFactory.create("openai", "gpt-4o", 0.7f);

// With max tokens
LLMClient client = LLMClientFactory.create("anthropic", "claude-sonnet-4-20250514", 0.7f, 4096);

// With topP (nucleus sampling)
LLMClient client = LLMClientFactory.create("gemini", "gemini-2.5-flash", 0.7f, 2048, 0.95f);

// From @RoleSpec annotation
LLMClient client = LLMClientFactory.fromAnnotation(MyRole.class);
```

Provider keys are case-insensitive. Aliases for each provider are listed in the [Supported Providers](#supported-providers) tables above.

### Direct Construction

When you need to pass a custom API key, a non-standard base URL, or provider-specific settings, construct the client class directly.
```java
// OpenAI with custom settings
LLMClient client = new OpenAIClient("gpt-4o", 0.7f, 0.95f, 4096);

// Anthropic with custom API key
LLMClient client = new AnthropicClient("claude-sonnet-4-20250514", "sk-ant-...");

// Ollama with custom base URL
LLMClient client = new OllamaClient("http://gpu-server:11434", "llama3", 0.7f, 4096, null);

// Azure with endpoint
LLMClient client = new AzureOpenAIClient("gpt-4", "your-api-key", "https://myresource.openai.azure.com/");
```

## Environment Variables

The factory and direct constructors resolve API keys from environment variables automatically. The full matrix — every provider, every key name, plus optional `_BASE_URL` companion variables for self-hosted endpoints — lives in [Configuration Reference](/docs/reference/configuration). The list is authored against `ProviderEnvVarConsistencyTest` in `tnsai-llm`, so the docs cannot drift without CI failing.

Self-hosted and local providers (`ollama`, `lmstudio`, `llamacpp`, `vllm`) don't require an API key — just set the `*_BASE_URL` to point the client at your container or local server. Ollama defaults to `http://localhost:11434`.

## Streaming

All providers support streaming, which lets you display tokens to the user as they are generated rather than waiting for the complete response. Three streaming patterns are available depending on how much control you need.

```java
// Text stream
Stream<String> tokens = client.streamChat("Tell me a story");

// ChatChunk stream
Stream<ChatChunk> chunks = client.streamChatWithSpec(request);

// Handler-based
client.streamChatWithHandler(request, chunk -> { ... });
```

## Resilience

TnsAI includes built-in resilience features so your application keeps working even when LLM providers have temporary issues.

### Circuit Breaker

A circuit breaker prevents your application from repeatedly calling a failing provider.
After a configurable number of consecutive failures, it stops sending requests ("opens the circuit") and returns errors immediately, giving the provider time to recover.

```java
LLMClient resilient = CircuitBreakerClient.builder()
    .client(openaiClient)
    .failureThreshold(3)
    .recoveryTimeout(Duration.ofSeconds(60))
    .build();
```

States: **CLOSED** (normal) -> **OPEN** (fast-fail) -> **HALF\_OPEN** (probe recovery).

### Built-in Retry

All providers include automatic retry with exponential backoff for transient errors like rate limits and server errors. This is enabled by default with no configuration needed.

- **Max retries:** 3
- **Initial delay:** 1 second
- **Retriable HTTP codes:** 408, 425, 429, 500, 502, 503, 504, 529
- **Retriable exceptions:** `ConnectException`, `SocketTimeoutException`

## Observability

Understanding what your LLM calls are doing in production is critical for debugging, cost tracking, and compliance. The observability layer lets you wrap any `LLMClient` with logging, metrics, and custom observers without changing your application code.

### ObservableLLMClient

`ObservableLLMClient` is a decorator that wraps an existing client and intercepts every call, forwarding lifecycle events to one or more observers. Your application code uses the wrapped client exactly like the original -- the observability is completely transparent.

```java
// Single observer
LLMClient observable = new ObservableLLMClient(client, metrics);

// Multiple observers (varargs)
LLMClient observable = new ObservableLLMClient(client, metrics, promptLogger, auditObserver);
```

Internally, multiple observers are merged into a `CompositeObserver`. Null observers and `LLMObserver.NOOP` are filtered out automatically.

### LLMObserver Interface (6 hooks)

`LLMObserver` is the callback interface for monitoring LLM operations. All six methods have default no-op implementations, so you only override the hooks you care about.
For example, you might only need `onResponse` for latency tracking.

| Hook | When it fires | Parameters |
| --- | --- | --- |
| `onRequest` | Before sending a request | client, message, systemPrompt, history, tools |
| `onResponse` | After a successful response | client, response, latencyMs |
| `onError` | When a request fails | client, error, latencyMs |
| `onStreamChunk` | For each streaming chunk | client, chunk, chunkIndex |
| `onStreamComplete` | When streaming finishes | client, totalChunks, latencyMs |
| `onStreamError` | When streaming fails | client, error, chunksReceived, latencyMs |

```java
LLMObserver myObserver = new LLMObserver() {
    // Element types of history and tools are assumed to be Map<String, Object>
    @Override
    public void onRequest(LLMClient client, String message,
            Optional<String> systemPrompt,
            Optional<List<Map<String, Object>>> history,
            Optional<List<Map<String, Object>>> tools) {
        log.info("Request to {}: {}", client.getModel(), message);
    }

    @Override
    public void onResponse(LLMClient client, ChatResponse response, long latencyMs) {
        log.info("Response from {} ({}ms)", client.getModel(), latencyMs);
    }
};

LLMClient observed = new ObservableLLMClient(client, myObserver);
```

A pre-built no-op sentinel is available as `LLMObserver.NOOP`. Compose multiple observers with `CompositeObserver`:

```java
LLMObserver combined = CompositeObserver.of(metricsObserver, loggingObserver, auditObserver);
```

### PromptLogger (PII Filtering + MDC Context)

`PromptLogger` is a production-ready observer that logs every LLM request and response with automatic PII redaction. It prevents sensitive data (emails, credit cards, API keys) from appearing in your logs and adds correlation IDs so you can trace requests through distributed systems.
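To make the redaction idea concrete, here is a minimal regex-based sketch. It is not `PromptLogger`'s implementation: `PiiRedactionSketch` is a hypothetical class, the two patterns are simplified assumptions that will miss many real-world formats, and only the replacement tokens are taken from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of pattern-based redaction before logging. */
public final class PiiRedactionSketch {

    // Rules are applied in insertion order; both patterns are simplified assumptions.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();

    static {
        RULES.put(Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"),
                  "[REDACTED_EMAIL]");
        RULES.put(Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"),
                  "[REDACTED_SSN]");
    }

    private PiiRedactionSketch() {}

    /** Returns the text with every match of every rule replaced by its token. */
    public static String redact(String text) {
        String out = text;
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            // quoteReplacement keeps the token literal even if it ever contains '$' or '\'.
            out = rule.getKey().matcher(out)
                      .replaceAll(Matcher.quoteReplacement(rule.getValue()));
        }
        return out;
    }
}
```

Running redaction before the string reaches the logger, rather than in a log appender, keeps raw values out of every downstream sink at once.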
**PII filtering.** When enabled (default), the logger redacts common sensitive patterns before writing to logs:

| Pattern | Replacement |
| --- | --- |
| Email addresses | `[REDACTED_EMAIL]` |
| Phone numbers | `[REDACTED_PHONE]` |
| Credit card numbers | `[REDACTED_CC]` |
| Social Security Numbers | `[REDACTED_SSN]` |
| API keys and tokens | `[REDACTED_KEY]` |
| IP addresses | `[REDACTED_IP]` |

**MDC context.** The logger populates SLF4J MDC with correlation fields so downstream log infrastructure (ELK, Datadog, etc.) can group related events:

| MDC Key | Value |
| --- | --- |
| `llm.requestId` | Unique 8-char ID per request |
| `llm.provider` | Provider name (e.g. `OpenAI`) |
| `llm.model` | Model name (e.g. `gpt-4o`) |
| `llm.latencyMs` | Request latency (set on response) |

MDC fields are cleared automatically after each request/response cycle.

**Builder API:**

```java
PromptLogger promptLogger = PromptLogger.builder()
    .filterPII(true)                         // default: true
    .logLevel(PromptLogger.LogLevel.INFO)    // DEBUG, INFO, or WARN
    .maxContentLength(500)                   // default: 200 chars
    .logFullContent(false)                   // true to disable truncation
    .build();

LLMClient observed = new ObservableLLMClient(client, promptLogger);
```

**Factory methods** for common configurations:

```java
PromptLogger.withPIIFiltering();   // PII enabled, defaults
PromptLogger.withoutFiltering();   // PII disabled, defaults
```

**Log output format:**

```
[INFO] LLM Request [req-abc123] OpenAI/gpt-4o: "My email is [REDACTED_EMAIL]"
[INFO] LLM Response [req-abc123] OpenAI/gpt-4o (245ms): "The answer is 4"
```

Tool calls within responses are logged individually:

```
[INFO] Tool call: calculator({"expression":"2+2"})
```

### LLMMetrics (Performance Tracking)

`LLMMetrics` is an observer that automatically collects performance data -- request counts, token usage, latency percentiles (p50/p95/p99), error rates, and estimated costs -- across all providers.
Use it to monitor your LLM spending and identify performance bottlenecks.

**Setup:**

```java
LLMMetrics metrics = new LLMMetrics();
LLMClient observed = new ObservableLLMClient(client, metrics);

// Use the client normally
observed.chat("Hello!");
```

**Global report** via `getReport()`:

```java
LLMMetrics.Report report = metrics.getReport();

report.totalRequests();       // total request count
report.totalResponses();      // successful responses
report.totalErrors();         // error count
report.totalInputTokens();    // estimated input tokens
report.totalOutputTokens();   // estimated output tokens
report.totalEstimatedCost();  // cost in USD (based on provider pricing)
report.avgLatencyMs();        // average latency
report.p50LatencyMs();        // median latency
report.p95LatencyMs();        // 95th percentile latency
report.p99LatencyMs();        // 99th percentile latency
report.successRate();         // percentage (0-100)
report.errorRate();           // percentage (0-100)
report.timestamp();           // Instant of report generation
```

**Per-provider breakdown** via `getMetricsByProvider()`:

```java
Map<String, LLMMetrics.ProviderMetrics> byProvider = metrics.getMetricsByProvider();

for (var entry : byProvider.entrySet()) {
    String providerKey = entry.getKey();   // e.g. "OpenAI/gpt-4o"
    LLMMetrics.ProviderMetrics pm = entry.getValue();

    pm.requests();        // request count for this provider
    pm.responses();       // successful responses
    pm.errors();          // errors
    pm.inputTokens();     // estimated input tokens
    pm.outputTokens();    // estimated output tokens
    pm.estimatedCost();   // cost in USD
    pm.avgLatencyMs();    // average latency
    pm.streamChunks();    // total streaming chunks
    pm.successRate();     // percentage (0-100)
    pm.errorRate();       // percentage (0-100)
}
```

Token counts are estimated at \~4 characters per token. Cost is calculated using `LLMCapabilities.getInputCostPer1KTokens()` and `getOutputCostPer1KTokens()` from the provider. Call `metrics.reset()` to clear all counters.
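The estimation rule above (roughly 4 characters per token, priced per 1K tokens) can be sketched as follows. This is an illustration under stated assumptions, not the `LLMMetrics` source: `TokenCostSketch` is a hypothetical class, rounding partial tokens up is an assumption, and the per-1K rates in the usage note are made-up values.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical illustration of character-based token and cost estimation. */
public final class TokenCostSketch {

    // Heuristic from the docs: about 4 characters per token.
    private static final int CHARS_PER_TOKEN = 4;

    private TokenCostSketch() {}

    /** Estimates tokens from text length; rounding up is an assumption of this sketch. */
    public static long estimateTokens(String text) {
        if (text == null || text.isEmpty()) {
            return 0;
        }
        return (text.length() + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN;
    }

    /** Cost in USD = (input tokens * input rate + output tokens * output rate) / 1000. */
    public static BigDecimal estimateCostUsd(long inputTokens, long outputTokens,
                                             BigDecimal inputPer1k, BigDecimal outputPer1k) {
        BigDecimal input = inputPer1k.multiply(BigDecimal.valueOf(inputTokens));
        BigDecimal output = outputPer1k.multiply(BigDecimal.valueOf(outputTokens));
        return input.add(output).divide(BigDecimal.valueOf(1000), 8, RoundingMode.HALF_UP);
    }
}
```

For example, with illustrative rates of 0.0025 USD per 1K input tokens and 0.01 USD per 1K output tokens, 1,000 input tokens and 500 output tokens come to 0.0075 USD. Because the token counts are character-based estimates, treat the result as a trend indicator rather than a billing figure.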
**Combining observers.** Use metrics alongside prompt logging:

```java
LLMMetrics metrics = new LLMMetrics();
PromptLogger logger = PromptLogger.withPIIFiltering();

LLMClient observed = new ObservableLLMClient(client, metrics, logger);
```

## JSON Mode

When you need the LLM to return valid JSON instead of free-form text, wrap your client with `JsonModeClient`. It uses provider-native JSON mode when available (OpenAI, Gemini) and falls back to prompt engineering for providers that lack native support (Anthropic, Ollama).

### Quick Wrap

The simplest way to get JSON output -- just wrap your existing client.

```java
// Simple wrap -- uses JSON_OBJECT format, auto-detects native support
JsonModeClient client = JsonModeClient.wrap(baseClient);

ChatResponse response = client.chat("List 3 programming languages");
// {"languages": ["Python", "Java", "JavaScript"]}
```

### Builder

For advanced control, the builder lets you specify a custom JSON schema, provide your own ObjectMapper, or force prompt engineering mode even when native JSON mode is available.

```java
JsonModeClient client = JsonModeClient.builder()
    .client(baseClient)                        // Required: LLM client to wrap
    .responseFormat(format)                    // ResponseFormat (default: jsonObject())
    .objectMapper(customMapper)                // Custom Jackson ObjectMapper
    .forcePromptEngineering(true)              // Skip native JSON mode, use prompt injection
    .schemaFromClass("Person", Person.class)   // Auto-generate schema from class
    .build();
```

### chatAs -- Type-Safe JSON Parsing

The `chatAs` method combines JSON generation and deserialization in one step, returning a strongly-typed Java object instead of a raw JSON string.
```java
record LanguageList(List<String> languages) {}

// Simple
LanguageList list = client.chatAs(LanguageList.class, "List 3 programming languages");

// With system prompt
LanguageList list = client.chatAs(LanguageList.class, "List 3 languages",
    Optional.of("You are a helpful assistant"));

// Full parameters
LanguageList list = client.chatAs(LanguageList.class, message, systemPrompt, history, tools);
```

If JSON parsing fails, `JsonModeClient.JsonParseException` is thrown with `getRawContent()` for debugging.

### ResponseFormat

Controls the structure of LLM output. Use `text()` for default behavior, `jsonObject()` for generic JSON, or `jsonSchema()` to enforce a specific schema.

```java
// Plain text (default LLM behavior)
ResponseFormat text = ResponseFormat.text();

// JSON object (valid JSON, structure not enforced)
ResponseFormat json = ResponseFormat.jsonObject();

// JSON Schema (valid JSON conforming to a schema)
ResponseFormat schema = ResponseFormat.jsonSchema("Person", Map.of(
    "type", "object",
    "properties", Map.of(
        "name", Map.of("type", "string"),
        "age", Map.of("type", "integer")
    ),
    "required", List.of("name", "age")
));

// JSON Schema from a Java class (uses SchemaGenerator)
ResponseFormat schema = ResponseFormat.jsonSchema("Person", Person.class);
```

### SchemaGenerator

Automatically generates a JSON Schema from any Java class using reflection. This saves you from writing schemas by hand -- just pass your record or POJO and it produces a valid schema that the LLM can follow.

```java
public record Person(String name, int age, List<String> hobbies) {}

Map<String, Object> schema = SchemaGenerator.generateSchema(Person.class);
// {"type":"object","properties":{"name":{"type":"string"},"age":{"type":"integer"},
//  "hobbies":{"type":"array","items":{"type":"string"}}},"required":["name","age","hobbies"]}

// Record-specific (all components required)
Map<String, Object> schema = SchemaGenerator.generateRecordSchema(Person.class);
```

### Provider Support

Not all providers support native JSON mode.
When native support is unavailable, `JsonModeClient` falls back to prompt engineering (injecting JSON instructions into the prompt).

| Provider | JSON\_OBJECT | JSON\_SCHEMA |
| --- | :---: | :---: |
| OpenAI | Yes | Yes (GPT-4o, GPT-4-turbo) |
| Anthropic | No (use tool\_use) | No (use tool\_use) |
| Gemini | Yes | Yes |
| Ollama | Depends on model | No |

## Model Capabilities

Before sending a request that requires specific features (vision, tool calling, large context), you should check whether the model supports them. The `LLMCapabilities` interface provides a standardized way to query any model's features, limits, and pricing.

```java
LLMClient client = new OpenAIClient("gpt-4o");
LLMCapabilities caps = client.getCapabilities();

// Check before use
if (caps.supportsVision()) {
    response = client.chat(List.of(textPart, imagePart), system, history, tools);
}

// Context window check
if (estimatedTokens > caps.getMaxInputTokens()) {
    // Truncate or summarize
}
```

### Core Capability Methods

These boolean methods tell you what the model can do. Check them before using advanced features to avoid runtime errors.

| Method | Return | Description |
| --- | --- | --- |
| `supportsStreaming()` | `boolean` | Streaming responses (most modern LLMs) |
| `supportsVision()` | `boolean` | Image/visual input (GPT-4o, Claude 3, Gemini) |
| `supportsFunctionCalling()` | `boolean` | Tool/function calling for agents |
| `supportsStructuredOutput()` | `boolean` | JSON mode / structured output |
| `supportsSystemPrompt()` | `boolean` | System prompt distinction (default `true`) |
| `supportsParallelFunctionCalls()` | `boolean` | Multiple tool calls in one response |

### Token Limits

Know your model's context window to avoid truncation errors and plan your context management strategy.
| Method | Return | Description |
| --- | --- | --- |
| `getMaxInputTokens()` | `int` | Maximum input tokens (context window) |
| `getMaxOutputTokens()` | `int` | Maximum output tokens |
| `getContextWindow()` | `int` | Total context window (defaults to maxInputTokens) |

### Modality

Modalities describe what types of input a model can process. Use this to check whether a model supports image, audio, or video input before sending multimodal content.

```java
Set<Modality> modalities = caps.getSupportedModalities();
boolean canHandleAudio = caps.supportsModality(Modality.AUDIO);
```

### Provider & Cost Information

Access pricing and provider metadata to estimate costs before making calls or to build cost-tracking dashboards.

| Method | Return | Description |
| --- | --- | --- |
| `getProviderName()` | `String` | Provider name (OpenAI, Anthropic, Google, etc.) |
| `getModelId()` | `String` | Model identifier |
| `getModelVersion()` | `Optional` | Model version |
| `getInputCostPer1KTokens()` | `Optional` | Input cost in USD per 1K tokens |
| `getOutputCostPer1KTokens()` | `Optional` | Output cost in USD per 1K tokens |
| `getEstimatedLatencyMs()` | `Optional` | Estimated time-to-first-token in ms |

### Special Capabilities

Some models support advanced features beyond standard chat. Check these before using specialized functionality.
| Method | Default | Description |
| --- | --- | --- |
| `supportsCodeExecution()` | `false` | Code Interpreter support |
| `supportsWebBrowsing()` | `false` | Web browsing support |
| `supportsFileUpload()` | From modalities | File upload support |
| `supportsReasoning()` | `false` | Reasoning/thinking (o1-style) |

### meetsRequirements

A convenience method that checks multiple capability requirements at once, so you can verify a model is suitable for your use case in a single call.

```java
boolean suitable = caps.meetsRequirements(
    true,    // requiresVision
    true,    // requiresTools
    32000    // minContextTokens
);
```

### Validation Methods

When a capability is required (not optional), use these methods to fail fast at startup with a clear error message rather than getting cryptic errors at runtime.

```java
caps.requireToolCalling();   // throws ToolCallNotSupportedException
caps.requireStreaming();     // throws LLMCapabilityException
caps.requireVision();        // throws LLMCapabilityException
```

### Model Capability Profiles

A quick reference for the most commonly used models and their supported features.

| Model | Vision | Tools | JSON | Context |
| --- | :---: | :---: | :---: | ---: |
| GPT-4o | Yes | Yes | Yes | 128K |
| GPT-4-turbo | Yes | Yes | Yes | 128K |
| GPT-3.5-turbo | No | Yes | Yes | 16K |
| Claude Sonnet 4 | Yes | Yes | Yes | 200K |
| Claude 3 Opus | Yes | Yes | Yes | 200K |
| Gemini 2.5 Flash | Yes | Yes | Yes | 1M |
| Llama 3.2 | No | Yes | No | 128K |
| Mistral Large | No | Yes | Yes | 128K |

## Multimodal Input

Some models can process images, audio, and video alongside text. TnsAI uses a `ContentPart` system to represent mixed-media messages, so you can combine text with images or audio in a single request.
| Class | Type | Description |
| --- | --- | --- |
| `TextPart` | `"text"` | Plain text content |
| `ImagePart` | `"image"` | Image data (Base64 encoded) |
| `AudioPart` | `"audio"` | Audio data (Base64, URL, or file) |
| `VideoPart` | `"video"` | Video data (Gemini) |

### Sending Images

Create an `ImagePart` from Base64-encoded data and include it alongside text in a multimodal message.

```java
// Create image part from Base64 data
ImagePart image = ImagePart.fromBase64(base64Data, "image/png");

// Build multimodal message
List<ContentPart> parts = List.of(
    new TextPart("What do you see in this image?"),
    image
);

// Send to a vision-capable model
ChatResponse response = client.chat(parts,
    Optional.of("You are a helpful assistant"),
    Optional.empty(),
    Optional.empty()
);
```

### Sending Audio

Create an `AudioPart` from a file, byte array, Base64 string, or URL. The model will process the audio alongside any text you include.

```java
// From file
AudioPart audio = AudioPart.fromFile(new File("recording.mp3"));

// From Base64
AudioPart audio = AudioPart.fromBase64(base64String, "audio/wav");

// From byte array
AudioPart audio = AudioPart.fromBytes(rawBytes, "audio/mp3");

// From URL
AudioPart audio = AudioPart.fromUrl("https://example.com/audio.mp3");

// Send as multimodal message
List<ContentPart> parts = List.of(
    new TextPart("Transcribe this audio"),
    audio
);
ChatResponse response = client.chat(parts, systemPrompt, history, tools);
```

### Capability Check Before Multimodal

Always check the model's capabilities before sending multimodal content. If the model does not support vision or audio, fall back to a text-only alternative.
```java
LLMCapabilities caps = client.getCapabilities();

if (caps.supportsVision()) {
    // Safe to send ImagePart
    client.chat(List.of(new TextPart("Describe this"), image), system, history, tools);
} else {
    // Fall back to text-only
    client.chat("Describe the concept", system, history, tools);
}
```

## SPI Registration

The LLM module uses Java's ServiceLoader mechanism to register itself automatically. You do not need to configure this manually -- just add the `tnsai-llm` dependency to your project and the providers become available through `LLMClientFactory`.

```
# META-INF/services/com.tnsai.llm.LLMClientProvider
com.tnsai.llm.LLMClientFactoryProvider
```

---

# LLM Routing

URL: https://tnsai.dev/docs/capabilities/llm/routing

Description: Route requests across multiple LLM providers with built-in strategies. Routing enables failover, cost optimization, latency reduction, and capability-based model selection.

import { Callout } from 'fumadocs-ui/components/callout'

## Fallback Router

The simplest routing strategy: tries each provider in the order you list them, moving to the next only if the current one fails. Use this when you want high availability with a clear priority order.

```java
LLMRouter router = FallbackRouter.of(
    new OpenAIClient("gpt-4o"),
    new AnthropicClient("claude-sonnet-4-20250514"),
    new GroqClient("llama-3.3-70b-versatile")
);

ChatResponse response = router.chat("Hello");
// If OpenAI fails → tries Anthropic → then Groq
```

## Cost-Based Router

Automatically selects the cheapest provider for each request based on the model pricing data. Use this when you want to minimize LLM spending while keeping multiple providers available.
```java
LLMRouter router = CostBasedRouter.of(
    new OpenAIClient("gpt-4o-mini"),          // $0.15/$0.60 per 1M tokens
    new GroqClient("llama-3.3-70b"),          // Free tier
    new AnthropicClient("claude-3.5-haiku")   // $0.80/$4.00
);
```

## Latency-Based Router

Tracks the actual response time of each provider and routes new requests to the fastest one. The router continuously updates its latency measurements, so it adapts if a provider speeds up or slows down.

```java
LLMRouter router = LatencyBasedRouter.of(
    new OpenAIClient("gpt-4o"),
    new GroqClient("llama-3.3-70b")   // Typically faster
);
```

## Capability Router

Inspects each request's requirements (vision, tool calling, etc.) and routes to a provider that supports them. This lets you use cheaper text-only models for simple requests while reserving expensive multimodal models for requests that need them.

```java
LLMRouter router = CapabilityRouter.of(
    new OpenAIClient("gpt-4o"),    // Vision + tools
    new OllamaClient("llama3")     // Text only
);
// Vision requests go to OpenAI, text-only to Ollama
```

## Round-Robin Router

Distributes requests evenly across providers in a rotating order. This is useful for spreading rate limit usage across multiple API keys or providers.

```java
LLMRouter router = RoundRobinRouter.of(
    new OpenAIClient("gpt-4o"),
    new AnthropicClient("claude-sonnet-4-20250514"),
    new GeminiClient("gemini-2.5-flash")
);
```

## Task-Based Router

Routes based on automatic task type classification. The router analyzes prompt content using keyword matching, regex patterns, and conversation history to select the best model for each request.
```java TaskBasedRouter router = TaskBasedRouter.builder() .forTask(TaskType.CODING, new AnthropicClient("claude-sonnet-4")) .forTask(TaskType.CREATIVE, new OpenAIClient("gpt-4o")) .forTask(TaskType.MATH, new OpenAIClient("o1-preview")) .forTask(TaskType.FAST, new GroqClient("llama-3.3-70b")) .forTask(TaskType.VISION, new OpenAIClient("gpt-4o")) .defaultClient(new OpenAIClient("gpt-4o-mini")) .build(); // Auto-routing -- no manual model selection router.chat("Write a Python function to sort a list"); // -> Claude (CODING) router.chat("Solve: integral of x squared dx"); // -> o1 (MATH) router.chat("Write a poem about autumn"); // -> GPT-4o (CREATIVE) router.chat("What is 2+2?"); // -> GPT-4o-mini (default) ``` ### TaskType Enum Ten task categories cover the most common LLM use cases. Each type has built-in keywords for automatic classification, and you can add custom keywords for your domain. | TaskType | Description | Recommended Models | Example Keywords | | ------------- | --------------------------------------- | ------------------------------------ | ---------------------------------------------- | | `CODING` | Code generation, debugging, refactoring | Claude Sonnet, GPT-4o, Codestral | code, function, debug, python, java, algorithm | | `CREATIVE` | Stories, poetry, marketing copy | GPT-4o, Claude Opus | poem, creative, fiction, blog post, narrative | | `MATH` | Calculations, proofs, equations | o1, Gemini 2.0, Claude with CoT | calculate, solve, equation, integral, theorem | | `ANALYSIS` | Data analysis, summarization | GPT-4o, Gemini, Claude | analyze, summarize, extract, compare, metrics | | `TRANSLATION` | Language translation | GPT-4o, Gemini | translate, in english, in french | | `CHAT` | Conversational Q\&A | Any capable model | hello, what is, explain, tell me | | `FAST` | Quick, simple tasks | GPT-4o-mini, Groq, Gemini Flash | quick, simple, brief, yes or no | | `REASONING` | Deep reasoning and logic | o1, o1-pro, Claude extended thinking | think step 
by step, logic, deduce, infer | | `VISION` | Image understanding | GPT-4o, Gemini, Claude (vision) | image, picture, screenshot, diagram | | `GENERAL` | Default fallback | Configured default client | (no keywords) | Each `TaskType` provides `matches(text)` for boolean matching and `matchScore(text)` returning the number of keyword hits. ### TaskClassifier `TaskClassifier` analyzes the text of each prompt and assigns it to a task type. It uses a multi-signal scoring system that combines regex patterns, keyword matching, custom keywords, and conversation history to make accurate classifications. **Classification strategy (in order):** 1. **Pattern detection** -- Regex patterns for code blocks, function signatures, file extensions, math equations, math symbols, and image references. Highest scoring priority. 2. **Keyword scoring** -- Each keyword match from `TaskType.getKeywords()` adds 2 points. 3. **Custom keyword scoring** -- User-added keywords add 3 points each. 4. **History context boost** -- Last 3 history messages give a 1.5-point boost per matching task type. 5. **Best match selection** -- Highest total score wins. Score is converted to confidence (0.0-1.0). Falls back to default type if below `minConfidenceThreshold`. 
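The scoring steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's `TaskClassifier`: the `Task` enum, the keyword sets, and the substring matching are hypothetical stand-ins, the regex pattern step and the score-to-confidence conversion are left out, and the history boost is assumed to apply once per matching message among the last three.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class ScoringSketch {
    // Hypothetical stand-in for TaskType; only three values are modeled.
    enum Task { CODING, MATH, GENERAL }

    // Illustrative keyword sets, not the framework's real lists.
    static final Map<Task, Set<String>> BUILT_IN = Map.of(
        Task.CODING, Set.of("code", "function", "debug", "python"),
        Task.MATH, Set.of("solve", "equation", "integral"));
    static final Map<Task, Set<String>> CUSTOM = Map.of(
        Task.CODING, Set.of("docker"));

    static double score(Task task, String prompt, List<String> history) {
        String text = prompt.toLowerCase();
        Set<String> builtIn = BUILT_IN.getOrDefault(task, Set.of());
        double total = 0.0;
        for (String kw : builtIn) {
            if (text.contains(kw)) total += 2.0;   // built-in keyword: 2 points
        }
        for (String kw : CUSTOM.getOrDefault(task, Set.of())) {
            if (text.contains(kw)) total += 3.0;   // custom keyword: 3 points
        }
        // Assumption: each of the last 3 history messages that mentions a
        // built-in keyword for this task adds a 1.5-point boost.
        int from = Math.max(0, history.size() - 3);
        for (String message : history.subList(from, history.size())) {
            String lower = message.toLowerCase();
            if (builtIn.stream().anyMatch(lower::contains)) total += 1.5;
        }
        return total;
    }

    // Highest total score wins; with no signal at all, fall back to GENERAL.
    static Task classify(String prompt, List<String> history) {
        Task best = Task.GENERAL;
        double bestScore = 0.0;
        for (Task task : Task.values()) {
            double s = score(task, prompt, history);
            if (s > bestScore) {
                best = task;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(classify("Write a Python function to sort a list", List.of()));
    }
}
```

Because custom keywords are weighted above built-in ones, a single domain-specific term can outweigh a generic keyword hit, which is presumably why user-added keywords score higher in the list above.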
```java
TaskClassifier classifier = TaskClassifier.defaultClassifier();

// Simple classification
TaskType type = classifier.classify("Write a Python function to sort a list");
// Returns: CODING

// With confidence score
ClassificationResult result = classifier.classifyWithConfidence("Solve x^2 + 2x + 1 = 0");
result.taskType();    // MATH
result.confidence();  // 0.85
result.reason();      // "Pattern and keyword match"

// With conversation history for context
List<String> history = List.of("I'm working on a React app", "Help me debug");
TaskType typeWithHistory = classifier.classify("What's wrong with this code?", history);
// Returns: CODING (boosted by history context)
```

**Custom classifier:**

```java
TaskClassifier classifier = TaskClassifier.builder()
    .addKeywords(TaskType.CODING, Set.of("backend", "frontend", "docker"))
    .addPattern(TaskType.MATH, Pattern.compile("[0-9]+"))
    .setDefaultType(TaskType.CHAT)
    .setMinConfidenceThreshold(0.2)
    .build();
```

### TaskBasedRouter Builder

Configure which LLM client handles each task type. A default client is required as a fallback for unclassified requests.

```java
TaskBasedRouter router = TaskBasedRouter.builder()
    .forTask(TaskType.CODING, codingClient)   // Assign client per task type
    .forTask(TaskType.MATH, mathClient)
    .defaultClient(generalClient)             // Required: fallback for unmatched tasks
    .classifier(customClassifier)             // Custom TaskClassifier (optional)
    .confidenceThreshold(0.3)                 // Below this, use default (default: 0.3)
    .build();
```

**Convenience setup for common mappings:**

```java
TaskBasedRouter router = TaskBasedRouter.builder()
    .withCommonSetup(
        codingClient,     // CODING
        reasoningClient,  // MATH + REASONING
        fastClient,       // FAST + CHAT
        generalClient     // default
    )
    .build();
```

### Manual Override

When you know the task type in advance, you can bypass automatic classification and route directly to the appropriate client.
```java // Force a specific task type (bypasses classification) ChatResponse response = router.chatAs(TaskType.CODING, "Explain this concept"); // Get the client for a task type directly LLMClient codingClient = router.getClientForTask(TaskType.CODING); // Inspect classification without routing ClassificationResult result = router.classifyTask("Write a poem"); ``` ### Routing Statistics Track how requests are distributed across task types to understand your usage patterns and estimate cost savings from intelligent routing. ```java TaskRoutingStats stats = router.getTaskStats(); // Requests per task type Map requests = stats.requestsPerTask(); // Token usage per task type Map tokens = stats.tokensPerTask(); // Estimated cost savings vs. sending everything to a premium model double savings = stats.estimatedSavings(); // Most common task type Optional common = stats.mostCommonTask(); // Task distribution as percentages Map distribution = stats.taskDistribution(); // General routing stats (total, success, failure, per-provider) RoutingStats routing = stats.routingStats(); // Reset all counters router.resetStats(); ``` ## All Strategies Choose a routing strategy based on your primary concern: availability, cost, speed, or task complexity. You can also combine strategies by nesting routers. 
| Router | Strategy | Best For | | -------------------- | ------------------------ | ------------------------------- | | `FallbackRouter` | Try next on failure | High availability | | `CostBasedRouter` | Cheapest provider | Budget optimization | | `LatencyBasedRouter` | Fastest measured latency | Real-time applications | | `CapabilityRouter` | Model capabilities match | Mixed workloads (vision + text) | | `RoundRobinRouter` | Even load distribution | Rate limit management | | `TaskBasedRouter` | Task type classification | Varied complexity tasks | --- # RAG URL: https://tnsai.dev/docs/capabilities/rag Description: Retrieval-Augmented Generation — from knowledge base setup to production pipelines. import { Callout } from 'fumadocs-ui/components/callout' ## Where RAG lives in the framework RAG is intentionally split across three modules so each layer can evolve independently: | Layer | Module | Provides | | ------------------ | -------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- | | **Primitives** | [tnsai-core](https://github.com/TnsAI-Framework/TnsAI/tree/main/tnsai-core) (`com.tnsai.knowledge`, `com.tnsai.memory.advanced`) | `KnowledgeBase`, `Document`, `EmbeddingFunction`, `BM25Index`, `VectorMemoryStore`, `HybridMemoryRetriever` | | **Strategies** | [tnsai-intelligence](https://github.com/TnsAI-Framework/TnsAI/tree/main/tnsai-intelligence) (`com.tnsai.intelligence.rag`) | `RAGPipeline`, `RAGStrategy` (`Vector`, `Keyword`, `Hybrid`), `RetrievedDocument`, `RAGContext` | | **Service / HTTP** | [tnsai-server](https://github.com/TnsAI-Framework/TnsAI/tree/main/tnsai-server) (`com.tnsai.server.rag`) | `RagService`, `FileIndexer`, `CodeChunker`, `HybridRetriever`, retrieval streams | If you are building an embedded agent, you typically use Core primitives + Intelligence strategies. 
If you are running TnsAI.Server, you also get the Service layer with HTTP endpoints, indexing, and stream-based retrieval. ## Pages - [Knowledge Base](/docs/capabilities/rag/knowledge-base) — Create, populate, query a `KnowledgeBase`. - [Strategies](/docs/capabilities/rag/strategies) — Swap retrieval algorithms via the RAG SPI. - [Pipeline](/docs/capabilities/rag/pipeline) — Chunk storage, embedding, hybrid search, production deployment. --- # Knowledge Base & RAG URL: https://tnsai.dev/docs/capabilities/rag/knowledge-base Description: TnsAI provides a built-in Retrieval-Augmented Generation (RAG) system through the KnowledgeBase interface, Document model, and @KnowledgeSource annotation. Agents can retrieve relevant context from vector databases, files, URLs, or in-memory stores before making LLM calls. import { Callout } from 'fumadocs-ui/components/callout' **Package:** `com.tnsai.knowledge` ## KnowledgeBase Interface `KnowledgeBase` is the core abstraction for storing and searching documents. Implementations can use in-memory storage, vector databases (Pinecone, Weaviate, Milvus), full-text search (Elasticsearch, OpenSearch), or hybrid approaches. You program against this interface, and swap implementations without changing your agent code. ### Methods These are the operations every `KnowledgeBase` implementation must support. The most important ones are `addDocument` (to ingest content) and `search` (to retrieve relevant context for an LLM call). | Method | Signature | Description | | ------------------- | -------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ | | `addDocument` | `void addDocument(Document document)` | Adds a single document. Throws `KnowledgeBaseException` on failure. | | `addDocuments` | `default void addDocuments(List documents)` | Adds multiple documents. Default implementation iterates `addDocument`. 
| | `getDocument` | `Optional getDocument(String id)` | Retrieves a document by ID. Returns empty if not found. | | `removeDocument` | `boolean removeDocument(String id)` | Removes a document by ID. Returns `true` if removed, `false` if not found. | | `search` | `List search(String query, int topK)` | Natural language search. Returns results ordered by relevance (highest first). | | `search` | `List search(String query, int topK, Map filters)` | Search with metadata filtering. Filter entries are key-value pairs that must match. | | `searchByEmbedding` | `List searchByEmbedding(float[] embedding, int topK)` | Similarity search using a pre-computed embedding vector. | | `size` | `int size()` | Returns the total number of documents. | | `isEmpty` | `default boolean isEmpty()` | Returns `true` if the knowledge base has no documents. Delegates to `size() == 0`. | | `clear` | `void clear()` | Removes all documents from the knowledge base. | | `contains` | `default boolean contains(String id)` | Checks if a document with the given ID exists. Delegates to `getDocument(id).isPresent()`. | ## Document `Document` is an immutable value object representing a document or document chunk. Each document has: - **id** -- Unique identifier (auto-generated UUID if not specified) - **content** -- The text content (required, cannot be null or empty) - **metadata** -- Arbitrary key-value pairs for filtering and context (immutable copy) - **embedding** -- Optional vector representation for similarity search (defensive copy) ### Factory Methods The quickest way to create a `Document` is with the static `of()` methods. These are convenient for simple use cases where you do not need to set an explicit ID or embedding. 
```java // Simple document (auto-generated ID) Document doc = Document.of("This is the document content"); // Document with a single metadata entry Document doc = Document.of("Product docs...", "source", "docs/product.md"); ``` ### Builder For full control over the document's ID, metadata, and embedding vector, use the builder. This is the recommended approach when you need to attach metadata for filtered searches or pre-computed embeddings for similarity search. ```java Document doc = Document.builder() .id("doc-001") // optional, UUID generated if omitted .content("Product documentation...") // required .metadata("source", "docs/product.md") // single entry .metadata("category", "documentation") // chainable .metadata(Map.of("version", "2.0")) // bulk metadata .embedding(embeddingVector) // optional float[] .build(); ``` Builder method `content(String)` throws `NullPointerException` if null. `build()` throws `IllegalStateException` if content is null or empty. ### Accessors These getter methods let you read the document's fields. Metadata is accessed through the `getSpec` methods, and embeddings are returned as defensive copies to preserve immutability. 
| Method | Return Type | Description | | ------------------------------------ | --------------------- | ----------------------------------------------------------------- | | `getId()` | `String` | Unique document ID | | `getContent()` | `String` | Document text content | | `getSpec()` | `Map` | Unmodifiable metadata map | | `getSpec(String key)` | `Object` | Single metadata value, or `null` | | `getSpec(String key, Class type)` | `T` | Type-safe metadata value, returns `null` if missing or wrong type | | `hasEmbedding()` | `boolean` | Whether an embedding is present | | `getEmbedding()` | `float[]` | Copy of embedding array, or `null` | | `getEmbeddingDimension()` | `int` | Embedding vector length, or `0` if none | ### Immutable Copy with Embedding Since `Document` is immutable, attaching an embedding returns a new `Document` instance rather than modifying the original. This is useful when you compute embeddings separately after initial document creation. ```java // Attach an embedding to an existing document (returns a new Document) Document withVector = doc.withEmbedding(embeddingVector); ``` Equality is based on `id` only. ## SearchResult `SearchResult` wraps a matched `Document` with a relevance score. Implements `Comparable` -- natural ordering is by score descending (highest first). | Method | Return Type | Description | | ----------------- | ----------- | ------------------------------------------------- | | `getDocument()` | `Document` | The matched document | | `getScore()` | `double` | Relevance score (higher = more relevant) | | `getContent()` | `String` | Convenience: delegates to `document.getContent()` | | `getDocumentId()` | `String` | Convenience: delegates to `document.getId()` | Constructor: `new SearchResult(Document document, double score)` -- document cannot be null. 
```java
List<SearchResult> results = knowledgeBase.search("query", 5);
for (SearchResult result : results) {
    System.out.printf("Score: %.4f | %s%n", result.getScore(), result.getContent());
}
```

## InMemoryKnowledgeBase

`InMemoryKnowledgeBase` is a thread-safe, in-memory implementation suitable for testing and small datasets. It provides:

- **ConcurrentHashMap** storage for thread safety
- **TF-IDF keyword search** with stop-word removal for `search()`
- **Cosine similarity** for both TF-IDF vectors and raw embeddings (`searchByEmbedding`)
- **Metadata filtering** support

```java
KnowledgeBase kb = new InMemoryKnowledgeBase();
kb.addDocument(Document.of("Java is a programming language"));
kb.addDocument(Document.of("Python is also a programming language"));
kb.addDocument(Document.builder()
    .content("Rust is a systems programming language")
    .metadata("category", "systems")
    .build());

// Keyword search
List<SearchResult> results = kb.search("programming language", 5);

// Filtered search
List<SearchResult> filtered = kb.search("programming", 5, Map.of("category", "systems"));

// Embedding search
List<SearchResult> similar = kb.searchByEmbedding(queryEmbedding, 3);
```

For production workloads with large datasets, use a vector database implementation (Pinecone, Weaviate, Qdrant) instead.

## @KnowledgeSource Annotation

**Package:** `com.tnsai.annotations`

Declarative configuration for RAG knowledge sources. Can be applied to types (agent classes) or methods (individual actions). Repeatable via `@KnowledgeSources`.

**Targets:** `ElementType.TYPE`, `ElementType.METHOD`
**Retention:** `RetentionPolicy.RUNTIME`

### Fields

Each `@KnowledgeSource` annotation is configured through the fields below. Only `name` is required; `type` defaults to `VECTOR_DB`, and the remaining fields let you tune connection details, retrieval parameters, and caching behavior.
| Field | Type | Default | Description | | ---------------- | --------------- | ----------- | --------------------------------------------------------------- | | `name` | `String` | (required) | Unique identifier for this knowledge source | | `type` | `KnowledgeType` | `VECTOR_DB` | The knowledge source type | | `provider` | `String` | `""` | Vector database provider (for `VECTOR_DB`) | | `index` | `String` | `""` | Index/collection name (for `VECTOR_DB`) | | `path` | `String` | `""` | File path or URL (for `FILE` or `URL`) | | `connection` | `String` | `""` | Database connection string (for `DATABASE`) | | `query` | `String` | `""` | SQL query template with `${query}` placeholder (for `DATABASE`) | | `topK` | `int` | `5` | Maximum number of results to retrieve | | `minSimilarity` | `double` | `0.7` | Minimum similarity score threshold (0.0--1.0) | | `embeddingModel` | `String` | `""` | Embedding model name for vector search | | `dimensions` | `int` | `1536` | Embedding vector dimensions | | `namespace` | `String` | `""` | Namespace/partition for multi-tenant sources | | `filter` | `String` | `""` | Metadata filter in JSON format | | `cache` | `boolean` | `true` | Whether to cache retrieval results | | `cacheTTL` | `int` | `300` | Cache time-to-live in seconds | | `enabled` | `boolean` | `true` | Whether this source is active | | `priority` | `int` | `0` | Query priority (higher = queried first) | ### KnowledgeType Enum The `type` field of `@KnowledgeSource` determines where the framework looks for documents. Choose the type that matches your data source. 
| Value | Description | | ------------ | ---------------------------------------------------- | | `VECTOR_DB` | Vector database (Pinecone, Weaviate, Qdrant, Chroma) | | `FILE` | Local file (JSON, YAML, TXT, PDF, DOCX) | | `URL` | Remote URL or REST API | | `DATABASE` | SQL or NoSQL database | | `MEMORY` | Agent's conversation memory | | `WEB_SEARCH` | Web search results | | `CACHE` | In-memory cache | ### Annotation Examples These examples show how to attach knowledge sources to an agent class or an individual action method. You can combine multiple sources on the same class using the repeatable annotation pattern. ```java // On a class -- multiple sources via @Repeatable @KnowledgeSource( name = "product-docs", type = KnowledgeType.VECTOR_DB, provider = "pinecone", index = "products", topK = 5, minSimilarity = 0.8 ) @KnowledgeSource( name = "faq", type = KnowledgeType.FILE, path = "knowledge/faq.json" ) public class SupportAgent extends Agent { ... } // On a method @ActionSpec(type = ActionType.LLM, description = "Answer question") @KnowledgeSource(name = "product-docs", topK = 3) public String answerQuestion(String question) { // Relevant context is automatically retrieved before the LLM call } // Database source @KnowledgeSource( name = "customer-data", type = KnowledgeType.DATABASE, connection = "jdbc:postgresql://localhost/mydb", query = "SELECT content FROM docs WHERE content ILIKE '%${query}%' LIMIT 10" ) // Web search source with caching @KnowledgeSource( name = "web-context", type = KnowledgeType.WEB_SEARCH, topK = 3, cache = true, cacheTTL = 600 ) ``` ## Integration with AgentBuilder If you prefer programmatic configuration over annotations, you can attach a knowledge base directly through the `AgentBuilder`. This is useful when you want to populate the knowledge base dynamically at startup or share one instance across multiple agents. 
Use `.knowledgeBase()` and `.knowledgeBaseTopK()` on `AgentBuilder` to attach a knowledge base programmatically: ```java KnowledgeBase kb = new InMemoryKnowledgeBase(); kb.addDocument(Document.of("Product X supports features A, B, C")); kb.addDocument(Document.of("Pricing starts at $99/month")); Agent agent = AgentBuilder.create() .llm(llmClient) .role(supportRole) .knowledgeBase(kb) // attach the knowledge base .knowledgeBaseTopK(3) // override default top-K (default: 5) .build(); ``` ## Full RAG Example This end-to-end example shows the complete RAG workflow: creating a knowledge base, adding documents with metadata, searching for relevant context, building an augmented prompt, and sending it to the agent. It also demonstrates filtered search to narrow results by metadata category. ```java // 1. Create and populate knowledge base KnowledgeBase kb = new InMemoryKnowledgeBase(); kb.addDocuments(List.of( Document.builder() .content("Product X supports features A, B, and C.") .metadata("source", "product-docs") .metadata("category", "features") .build(), Document.builder() .content("Pricing starts at $99/month for the Basic plan.") .metadata("source", "pricing-page") .metadata("category", "pricing") .build(), Document.builder() .content("Enterprise plan includes SSO and dedicated support.") .metadata("source", "pricing-page") .metadata("category", "pricing") .build() )); // 2. Search for relevant context String query = "What features does Product X have?"; List context = kb.search(query, 3); // 3. Build augmented prompt String augmentedPrompt = "Context:\n" + context.stream() .map(r -> r.getContent()) .collect(Collectors.joining("\n")) + "\n\nQuestion: " + query; // 4. 
Send to agent String answer = agent.chat(augmentedPrompt); // Or use filtered search for specific categories List pricingResults = kb.search("plan cost", 3, Map.of("category", "pricing")); ``` --- # RAG Pipeline URL: https://tnsai.dev/docs/capabilities/rag/pipeline Description: The server provides a per-session Retrieval-Augmented Generation pipeline that indexes local codebases, chunks source files by language boundaries, and retrieves relevant context using hybrid BM25 + vector search with Reciprocal Rank Fusion. import { Callout } from 'fumadocs-ui/components/callout' ## Architecture Overview The RAG pipeline has three stages: indexing (scanning files and splitting them into chunks), storage (keeping chunks in an in-memory knowledge base with BM25 and vector indexes), and retrieval (finding the most relevant chunks for a user's query using hybrid search). ``` Directory --> FileIndexer --> CodeChunker --> KnowledgeBase (in-memory) | User Query --> HybridRetriever --> [BM25Stream 60%] -+ RRF --> Results --> [VectorStream 40%] -+ ``` Each session gets its own `RagService`, lazily created by `SessionManager.getRag(sessionId)`. The service is thread-safe: indexing is serialized via a `ReentrantLock`, while reads (search) run concurrently. ## RagService The central orchestrator for a session's RAG pipeline. 
```java RagService rag = sessionManager.getRag("my-session"); // Index a directory rag.indexDirectory(Path.of("/project/src"), progress -> { System.out.printf("Indexed %d/%d: %s%n", progress.indexedFiles(), progress.totalFiles(), progress.currentFile()); }); // Search List results = rag.search("authentication middleware", 5); // Build augmented prompt (auto-prepends context) String prompt = rag.buildContextPrompt("How does auth work?", 5); // Document management String docId = rag.addDocument("Custom knowledge...", Map.of("source", "manual")); rag.removeDocument(docId); List docs = rag.listDocuments(); ``` The hybrid retriever is configured at construction with BM25 at 60% weight and the vector knowledge base at 40%: ```java this.hybridRetriever = HybridRetriever.builder() .stream(bm25Stream, 0.6) .stream(new KnowledgeBaseStream(knowledgeBase), 0.4) .build(); ``` ## FileIndexer The `FileIndexer` recursively walks a directory, identifies source files by extension, splits them into chunks using `CodeChunker`, and stores the chunks in the knowledge base. It supports incremental indexing so only changed files are re-processed on subsequent runs. ### Supported Extensions (28+) The indexer recognizes 28+ file extensions covering most popular programming languages and configuration formats. `java`, `ts`, `tsx`, `js`, `jsx`, `py`, `md`, `json`, `yml`, `yaml`, `xml`, `html`, `css`, `sh`, `sql`, `go`, `rs`, `rb`, `kt`, `scala`, `c`, `cpp`, `h` -- plus language aliases (`kts`, `bash`, `zsh`, `markdown`, `htm`, `cc`, `cxx`, `hpp`, `sc`). ### Filtering The indexer automatically skips build artifacts, dependency directories, and files matching your `.gitignore` patterns to avoid polluting the knowledge base with irrelevant content. 
- **Skipped directories**: `.git`, `node_modules`, `build`, `dist`, `target`, `.idea`, `.vscode`, `.gradle`, `__pycache__`, `vendor`, `.next`, `out`, `coverage`, `.svn`, `.hg` - **Ignore files**: Reads `.gitignore` and `.tnsignore` from the root, converting glob patterns to Java `PathMatcher` instances - **Size limit**: Files larger than 512 KB or empty files are skipped ### Incremental Indexing To avoid re-processing unchanged files, the indexer computes a SHA-256 hash of each file's content and stores it in a `ConcurrentHashMap`. On re-index: 1. If the hash matches the previous run, the file is skipped 2. If the file changed, old chunks are removed from both KnowledgeBase and BM25Stream 3. New chunks are generated and added Call `fileIndexer.clearHashes()` to force a full re-index. ## CodeChunker The `CodeChunker` splits source files into semantically meaningful chunks -- for example, by class or function boundaries in Java/TypeScript, or by headings in Markdown. This ensures that search results return coherent, self-contained code blocks rather than arbitrary line ranges. ### Chunking Strategies The chunker picks a strategy based on the file's language. Languages with known structure get smarter splitting; everything else falls back to fixed-size line groups. | Language | Strategy | Boundary Detection | | ---------------------- | ------------------------- | ------------------------------------------------------------------- | | Java, Kotlin, Scala | Class/method boundaries | Regex: class/interface/enum/record declarations + method signatures | | TypeScript, JavaScript | Function/class boundaries | Regex: export/function/class/const arrow declarations | | Markdown | Heading boundaries | Regex: `#{1-6}` heading lines | | Everything else | Fixed line groups | Max 100 lines per chunk | Small files (100 lines or fewer) are always kept as a single chunk. Large boundary-detected chunks are sub-split into 100-line groups. 
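The fixed-size fallback and the sub-splitting rule above can be sketched as follows. This is an illustrative sketch only, not the `CodeChunker` source: the class and method names are hypothetical, and the `path:start-end` ID format is assumed from the chunk IDs shown elsewhere on this page.

```java
import java.util.ArrayList;
import java.util.List;

class LineChunkSketch {
    // Maximum lines per chunk, as stated in the strategy table.
    static final int MAX_LINES = 100;

    // Returns 1-based, inclusive [startLine, endLine] ranges covering the file.
    // A file of 100 lines or fewer yields exactly one range.
    static List<int[]> chunkRanges(int totalLines) {
        List<int[]> ranges = new ArrayList<>();
        for (int start = 1; start <= totalLines; start += MAX_LINES) {
            int end = Math.min(start + MAX_LINES - 1, totalLines);
            ranges.add(new int[] {start, end});
        }
        return ranges;
    }

    // Hypothetical helper: builds a chunk ID in the "path:start-end" form.
    static String chunkId(String path, int[] range) {
        return path + ":" + range[0] + "-" + range[1];
    }

    public static void main(String[] args) {
        for (int[] range : chunkRanges(250)) {
            System.out.println(chunkId("src/auth/Middleware.java", range));
        }
    }
}
```

The same range computation would apply when a boundary-detected chunk exceeds the limit, with line numbers offset by the chunk's starting line.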
Each chunk becomes a `Document` with metadata:

```java
Document.builder()
    .id("src/auth/Middleware.java:15-45")
    .content(chunkContent)
    .metadata("file", "src/auth/Middleware.java")
    .metadata("startLine", 15)
    .metadata("endLine", 45)
    .metadata("language", "java")
    .build();
```

## BM25Stream

The `BM25Stream` provides keyword-based search using the Okapi BM25 algorithm, the same ranking function used by search engines like Elasticsearch. It scores documents based on how well their terms match the query, accounting for term frequency and document length.

### Parameters

These BM25 parameters control how the scoring behaves. The defaults work well for code search and rarely need tuning.

| Parameter | Value | Description                   |
| --------- | ----- | ----------------------------- |
| K1        | 1.2   | Term frequency saturation     |
| B         | 0.75  | Document length normalization |

### Text Processing Pipeline

Before scoring, queries and documents go through a text processing pipeline that normalizes, tokenizes, and stems terms. This improves recall by matching different forms of the same word.

1. **Tokenization**: Lowercase, strip non-alphanumeric characters (except `_`), split on whitespace, drop tokens of 1 character or fewer
2. **Stop word removal**: 50 common English stop words
3. **Stemming**: Suffix-stripping rules for 15 suffixes (`-ies`, `-ing`, `-tion`, `-sion`, `-ment`, `-ness`, `-able`, `-ous`, `-ful`, `-less`, `-ly`, `-ed`, `-er`, `-es`, `-s`)
4. **Synonym expansion** (query-time only): 20 coding-domain synonym pairs

### Synonym Pairs

At query time, common coding abbreviations are expanded to their full forms (and vice versa) so that searching for "auth" also finds documents containing "authentication".
| Term | Synonyms | | ------ | ----------------------------- | | db | database | | auth | authentication, authorization | | config | configuration | | perf | performance | | impl | implementation | | req | request | | res | response | | err | error | | msg | message | | fn | function | | param | parameter | | repo | repository | | env | environment | | async | asynchronous | | sync | synchronous | ## HybridRetriever The `HybridRetriever` combines results from multiple search strategies (like BM25 keyword search and vector similarity search) into a single ranked list. This hybrid approach gives better results than either method alone because keyword search finds exact term matches while vector search captures semantic similarity. ### Fusion Algorithm The retriever merges results using Reciprocal Rank Fusion (RRF), which combines rankings without needing normalized scores. For each document appearing in any stream's results: ``` score(doc) = SUM over streams: weight(stream) / (K + rank(doc, stream) + 1) ``` Where `K = 60` (the RRF constant). Documents are then sorted by fused score. ### Diversification To prevent a single large file from dominating search results, the retriever limits output to a maximum of 3 chunks per source file. This ensures the agent sees context from multiple relevant files. ```java HybridRetriever retriever = HybridRetriever.builder() .stream(bm25Stream, 0.6) // 60% weight .stream(vectorStream, 0.4) // 40% weight .build(); List results = retriever.retrieve("authentication flow", 10); ``` ## Context Prompt Format When the agent asks a question, `RagService.buildContextPrompt` searches for relevant code and prepends it to the user's query. This gives the LLM the codebase context it needs to answer accurately. ``` [Relevant code context] --- file: src/auth/Middleware.java (lines 15-45) --- public class AuthMiddleware { private final TokenValidator validator; ... 
} --- file: src/auth/TokenValidator.java (lines 1-30) --- public class TokenValidator { ... } [User question] How does the authentication middleware work? ``` If no context is found (empty knowledge base or no matches), the original query is returned unchanged. ## Document Management API Beyond automatic directory indexing, you can manually add, list, and remove documents in the knowledge base. This is useful for injecting custom knowledge (like deployment procedures or domain-specific documentation) that is not part of the codebase. ```java // Add a document with metadata String docId = rag.addDocument("Custom knowledge content", Map.of("source", "user", "topic", "deployment")); // List documents (returns preview, length, metadata) List docs = rag.listDocuments(); // DocumentInfo(id, preview(100chars), contentLength, metadata) // Get a specific document Optional doc = rag.getDocument(docId); // Remove boolean removed = rag.removeDocument(docId); // Clear everything rag.clear(); ``` Documents added via `addDocument` are tracked separately and appear in `listDocuments()`. Both manually added documents and file-indexed chunks are searchable through the same hybrid retriever. --- # RAG Strategy SPI URL: https://tnsai.dev/docs/capabilities/rag/strategies Description: TnsAI.Intelligence provides a pluggable Retrieval-Augmented Generation (RAG) framework with three built-in strategies and a composable pipeline. Package: com.tnsai.intelligence.rag. import { Callout } from 'fumadocs-ui/components/callout' ## RAGStrategy Interface Every retrieval strategy implements this interface. You call `retrieve(query, topK)` with a user query and the number of results you want, and the strategy returns the most relevant documents it can find. ```java public interface RAGStrategy { List retrieve(String query, int topK); String name(); } ``` Each strategy returns ranked `RetrievalResult` objects containing the retrieved text, a relevance score, and source metadata. 
```java public record RetrievalResult( String content, double score, Map metadata ) {} ``` ## VectorRAGStrategy Dense vector retrieval using embedding similarity. Best for semantic matching where exact keywords may not appear in the source documents. ```java RAGStrategy vectorRAG = VectorRAGStrategy.builder() .embeddingClient(embeddingClient) .vectorStore(vectorStore) .similarityThreshold(0.7) .build(); List results = vectorRAG.retrieve("How do agents communicate?", 5); ``` | Parameter | Default | Description | | --------------------- | -------- | ------------------------------------- | | `embeddingClient` | required | Client for generating embeddings | | `vectorStore` | required | Vector database for similarity search | | `similarityThreshold` | 0.7 | Minimum cosine similarity to include | ## KeywordRAGStrategy Sparse retrieval using BM25 scoring. Best for queries with specific technical terms, identifiers, or exact phrases. ```java RAGStrategy keywordRAG = KeywordRAGStrategy.builder() .index(bm25Index) .build(); List results = keywordRAG.retrieve("ContextCompactor interface", 5); ``` ## HybridRAGStrategy Combines vector and keyword strategies using Reciprocal Rank Fusion (RRF) to merge result lists. This gives the best of both semantic and lexical matching. 
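As a rough illustration of how weighted RRF merges two ranked lists, the sketch below applies the formula given for `HybridRetriever` (`weight / (K + rank + 1)`). It is a standalone approximation, not the framework's implementation: `RrfSketch`, `fuse`, and the document IDs are hypothetical names, and treating ranks as 0-based is an assumption read off the `+ 1` in that formula.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of TnsAI.
class RrfSketch {

    /**
     * Fuses ranked lists of document IDs with weighted Reciprocal Rank Fusion.
     * rankedLists.get(i) is ordered best-first and weighted by weights.get(i).
     * Ranks are assumed to be 0-based.
     */
    static List<String> fuse(List<List<String>> rankedLists, List<Double> weights, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (int s = 0; s < rankedLists.size(); s++) {
            List<String> ranked = rankedLists.get(s);
            for (int rank = 0; rank < ranked.size(); rank++) {
                double contribution = weights.get(s) / (k + rank + 1);
                scores.merge(ranked.get(rank), contribution, Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties fall back to ID order so the output is stable.
        fused.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return fused;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("memory.md", "agents.md", "tools.md");
        List<String> keywordHits = List.of("agents.md", "storage.md", "memory.md");
        System.out.println(fuse(List.of(vectorHits, keywordHits), List.of(0.6, 0.4), 60));
    }
}
```

With these inputs the fused order is `agents.md`, `memory.md`, `tools.md`, `storage.md`: the two contributions for `agents.md` (0.6/62 + 0.4/61) narrowly beat those for `memory.md` (0.6/61 + 0.4/63). In the framework itself, the weights and the constant are set on the builder: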
```java RAGStrategy hybridRAG = HybridRAGStrategy.builder() .vectorStrategy(vectorRAG) .keywordStrategy(keywordRAG) .vectorWeight(0.6) .keywordWeight(0.4) .fusionK(60) // RRF constant .build(); List results = hybridRAG.retrieve("agent memory persistence", 10); ``` | Parameter | Default | Description | | ----------------- | -------- | ------------------------------------------------------ | | `vectorStrategy` | required | Dense retrieval strategy | | `keywordStrategy` | required | Sparse retrieval strategy | | `vectorWeight` | 0.6 | Weight for vector results in fusion | | `keywordWeight` | 0.4 | Weight for keyword results in fusion | | `fusionK` | 60 | RRF smoothing constant (higher = more equal weighting) | ## RAGPipeline A pipeline wraps a retrieval strategy with optional query rewriting (to improve recall) and result reranking (to improve precision). This lets you build a complete retrieval system by composing simple, testable components. ```java RAGPipeline pipeline = RAGPipeline.builder() .strategy(hybridRAG) .queryRewriter(query -> expandAcronyms(query)) .reranker((results, query) -> crossEncoderRerank(results, query)) .maxResults(5) .build(); List results = pipeline.execute("How does RRF fusion work?"); ``` ### Pipeline Stages The pipeline processes a query through three stages. Each stage is optional -- you can use just a strategy, or add rewriting and reranking for better results. 
``` User Query | v Query Rewriter (optional) -- expand, rephrase, or decompose the query | v RAGStrategy.retrieve() -- fetch candidates from one or more sources | v Reranker (optional) -- re-score and re-order results | v Top-K Selection -- return final results ``` | Stage | Interface | Description | | -------------- | ------------------------------------------------------------------ | ----------------------------------------------------- | | Query Rewriter | `Function<String, String>` | Transform the query before retrieval | | Strategy | `RAGStrategy` | Core retrieval (vector, keyword, or hybrid) | | Reranker | `BiFunction<List<RetrievalResult>, String, List<RetrievalResult>>` | Re-score results using a cross-encoder or other model | ## Integration with Agents Once you have a RAG pipeline, you can wire it directly into an agent. The agent will automatically retrieve relevant documents before generating each response, so it can answer questions grounded in your data. ```java Agent agent = AgentBuilder.create() .model("claude-sonnet-4") .ragPipeline(pipeline) .build(); // The agent automatically retrieves relevant context before generating responses String response = agent.chat("Explain the memory architecture"); ``` ## Choosing a Strategy Pick the strategy that matches your data and query patterns. For most production systems, hybrid gives the best results by combining semantic understanding with exact term matching. 
| Strategy | Strengths | Weaknesses | Best For | | -------- | ------------------------------------------------------ | ---------------------------------------- | --------------------------- | | Vector | Semantic understanding, handles paraphrasing | Misses exact terms, requires embeddings | Natural language queries | | Keyword | Fast, exact term matching, no embeddings needed | No semantic understanding | Technical docs, code search | | Hybrid | Best overall recall, handles both semantic and lexical | Higher latency (two retrievals + fusion) | Production RAG systems | --- # Skills URL: https://tnsai.dev/docs/capabilities/skills Description: On-demand modular knowledge between role and tools. The framework's answer to: how do I keep multi-step procedures and domain knowledge out of the always-on system prompt without losing them when they're actually relevant? import { Callout } from 'fumadocs-ui/components/callout' The `com.tnsai.skills` package in `tnsai-core` ships the primitives: - **`Skill`** — record carrying a name, a description (the trigger phrase the resolver scores against), the markdown body, and per-skill scoping (`allowedTools`, `userInvocable`, `disableModelInvocation`). - **`SkillStore`** — SPI for discovery; the framework ships `InMemorySkillStore` (test seam) and `FileSystemSkillStore` (Claude Code-compatible `//SKILL.md` layout). - **`SkillResolver`** — picks top-N candidates per user turn; defaults to `KeywordSkillResolver` (no extra round-trip), upgradeable to `LLMSkillResolver` (semantic match, one extra LLM call per turn). - **`SkillManager`** — per-agent facade combining store + resolver + session state. Handles `/skill-name` parsing, invocation source gating, and prompt-section rendering. - **`SkillSession` + `ActiveSkill`** — runtime state tracking which skills the agent has activated this session. - **`SkillScopedToolCallFilter`** — `ToolCallFilter` that enforces the union of `allowed-tools` across active skills. 
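To make the last bullet concrete, here is a minimal sketch of the "union of `allowed-tools`" rule, including the permissive case that applies when no active skill declares a list. It is a standalone approximation rather than the real `SkillScopedToolCallFilter`; `ToolScopeSketch`, `isAllowed`, and the sample tool names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch class; not part of TnsAI.
class ToolScopeSketch {

    /**
     * activeSkillAllowLists holds one allowed-tools list per active skill.
     * An empty inner list means that skill declared no constraint.
     */
    static boolean isAllowed(String toolName, List<List<String>> activeSkillAllowLists) {
        Set<String> union = new HashSet<>();
        for (List<String> allowed : activeSkillAllowLists) {
            union.addAll(allowed);
        }
        // No active skill declares allowed-tools: scoping is opt-in, so allow everything.
        if (union.isEmpty()) {
            return true;
        }
        // Otherwise the call must fall inside the union of the declared lists.
        return union.contains(toolName);
    }

    public static void main(String[] args) {
        List<List<String>> active = List.of(List.of("bash", "kubectl"), List.of());
        System.out.println(isAllowed("kubectl", active));
        System.out.println(isAllowed("web_search", active));
        System.out.println(isAllowed("web_search", List.of()));
    }
}
```

The real filter answers a blocked call with a `Guide` action that names the active skills instead of returning a bare boolean; the sketch only covers the allow/deny decision.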
Skills sit between the always-on **role** layer and the always-on **tool** layer: | Layer | Loaded | Granularity | | ---------------- | ------------------------------ | ------------------------ | | Role / CLAUDE.md | Always | Stable agent identity | | Tool | Always | Atomic action | | Capability | Always (compile-time) | Method-level declaration | | RAG hit | Per query (similarity) | Evidence text | | **Skill** | **On demand (intent-matched)** | **Multi-step procedure** | | Hook | Always (system gate) | Cross-cutting policy | This is the "Skills" layer in Claude Code's 5-layer architecture (CLAUDE.md / Skills / Hooks / Subagents / Plugins). TnsAI was missing it before TNS-289. ## Why a separate layer Three forces motivate skills as a distinct primitive: 1. **Token budget hygiene** — `RoleSpec` and `CLAUDE.md` content rides every prompt. Procedures that only matter when invoked ("deploy procedure", "API design conventions", "customer escalation protocol") shouldn't pin tokens on every turn. 2. **Author-time portability** — Claude Code, Cursor, and other tools are converging on the `agentskills.io` `SKILL.md` standard. A skill authored once works across every framework that supports it. 3. **Per-skill scoping** — `allowed-tools` declares which tools a skill expects to use; `userInvocable` and `disableModelInvocation` declare who may activate it. Both are recorded on the activation event and enforceable through `SkillScopedToolCallFilter`. ## Quick start ```java import com.tnsai.skills.*; import com.tnsai.agents.AgentBuilder; import java.nio.file.Path; // 1. Pick a store. FileSystemSkillStore reads the Claude Code-compatible // layout: //SKILL.md. Programmatic registration // works through InMemorySkillStore for tests / embedded use. SkillStore store = new FileSystemSkillStore(Path.of(".tnsai/skills")); // 2. Wire on the AgentBuilder. Wiring a store auto-upgrades the // policy from OFF to AUTO (the consumer's intent: "I configured // skills, surface them"). 
Override with .skillResolverPolicy(...) // if you want MANUAL_ONLY or OFF. Agent agent = AgentBuilder.create() .id("research-agent") .llm(...) .role(myRole) .skillStore(store) // accountability wiring (TNS-298) elided for brevity .build(); ``` From this point on: - The resolver runs on every chat turn against the registered skill descriptions; the top `maxActiveSkills` candidates appear in the per-message system prompt under `## Skill candidates for this turn`. - The user types `/deploy staging` to manually activate a skill — the framework intercepts BEFORE the LLM round-trip, so manual invocation costs zero LLM tokens. - Once activated, the skill's substituted body lives in the system prompt under `# Skills > ## Active skill bodies` for the remainder of the session. - `agent.invokeSkill("name", args, env)` lets framework code activate a skill programmatically, bypassing `userInvocable=false`. ## SKILL.md format ```markdown --- name: deploy description: Production deploy procedure with rollback support when-to-use: When the user asks to ship a build to prod or staging allowed-tools: - bash - kubectl argument-hint: arguments: - environment disable-model-invocation: false user-invocable: true --- # Deploy procedure 1. Verify CI is green. 2. `kubectl apply -f manifests/$0/` ``` The `$0` placeholder gets substituted with the first positional argument when the skill is invoked. See [Skill format](/docs/capabilities/skills/skill-format) for the full frontmatter reference. ## Resolver policies | Policy | Behaviour | | -------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `AUTO` (default when a store is wired) | Resolver runs on every user turn; top-N candidate descriptions appear in the system prompt; the LLM may invoke via the synthetic `invoke_skill` tool; users may invoke via `/skill-name`. 
| | `MANUAL_ONLY` | Resolver does NOT run; only `/skill-name` and `agent.invokeSkill(...)` activate skills. Useful when the deployment wants deterministic skill loading. | | `OFF` | Skill layer disabled. Framework default when no store is wired. | ## Lifecycle 1. **Discovery** — store enumerates registered skills at startup. Only the `description` field is in the always-on system prompt. 2. **Resolution** — per user message, `SkillResolver` ranks candidates by relevance. 3. **Activation** — user invocation or model tool call moves the full body into context for the rest of the session. 4. **Re-activation** — invoking an active skill again replaces the previous snapshot; the latest invocation's substituted body wins. `SkillActivationEvent` (sealed branch of `TnsAIEvent`) is emitted on every activation, carrying the source (`USER_SLASH_COMMAND` / `MODEL_TOOL_CALL` / `PROGRAMMATIC`), the skill name, and the supplied arguments. ## Per-skill tool scope When a skill declares `allowed-tools`, callers can attach `SkillScopedToolCallFilter` to the agent so tool calls outside the union of active-skill `allowed-tools` are blocked with a `Guide` action that names the active skills: ```java agent.setToolCallFilter(new SkillScopedToolCallFilter(agent.getSkillManager())); ``` The filter is permissive when no active skill declares `allowed-tools` — the field is opt-in scoping, not a default constraint. ## Subagent context fork `SkillManager.preloadFrom(parent)` copies the parent's active-skill snapshots into a child manager so `AgentGroup`-spawned subagents inherit the procedural context their parent had loaded. Mirrors Claude Code's `context: fork` pattern. Parent's snapshot wins on name collision; the child's pre-existing activations are preserved on top. 
## What's not in this layer (deferred) - **`paths` glob auto-activation** — file-context-aware activation; v2 (the v1 trigger surface is user-message-aware via the resolver) - **Plugin distribution** — packaging skills into the Plugins layer; tracked separately - **`context: fork` semantics for full isolation** — current preload is a copy, not a fork; v2 - **GUI skill registry** — visual catalog in `TnsAI.Web`; v3 - **Live file-watcher** — call `FileSystemSkillStore.refresh()` instead ## See also - [Skill format](/docs/capabilities/skills/skill-format) — full SKILL.md frontmatter reference and substitution rules - [Registration](/docs/capabilities/skills/registration) — `AgentBuilder` API + custom resolvers + custom stores - [Hooks](/docs/security) — skill activation events flow through the hook bus - [Approvals and Annotations](/docs/security/approvals-and-annotations) — `@ApprovalRequired` works alongside skills (approvals gate access; skills supply the procedure) - [Accountability](/docs/security/accountability) — `SkillActivationEvent` rides the same trace as the resulting liability records --- # Registration URL: https://tnsai.dev/docs/capabilities/skills/registration Description: How to wire a SkillStore and SkillResolver into an AgentBuilder, swap the defaults, and integrate skill activation with the rest of the framework. import { Callout } from 'fumadocs-ui/components/callout' ## AgentBuilder API | Method | Default | Required? | | ------------------------------------------- | --------------------------------------- | --------------------------------------------------------------------------------------------------------------------- | | `.skillStore(SkillStore)` | none | Required to enable skills. Wiring a store auto-upgrades policy from `OFF` to `AUTO`. | | `.skillResolver(SkillResolver)` | `KeywordSkillResolver` | Optional. Override only if the default token-overlap scoring isn't accurate enough. 
| | `.skillResolverPolicy(SkillResolverPolicy)` | `OFF` (or `AUTO` once a store is wired) | Override to force `MANUAL_ONLY` (resolver does not run) or `OFF` (skill layer disabled even when a store is present). | | `.maxActiveSkills(int)` | `3` | Cap on candidate descriptions surfaced per turn. Bounds prompt overhead. | ```java Agent agent = AgentBuilder.create() .id("research-agent") .llm(llm) .role(myRole) .skillStore(new FileSystemSkillStore(Path.of(".tnsai/skills"))) .skillResolverPolicy(SkillResolverPolicy.AUTO) .maxActiveSkills(5) // accountability wiring (TNS-298) elided .build(); ``` When `skillStore` is not called, `agent.getSkillManager()` returns `null` and the agent operates without a skills layer — the documented "skills disabled" contract. ## Stores ### FileSystemSkillStore The recommended production store. Reads the Claude Code-compatible layout: ``` .tnsai/skills/ ├── deploy/SKILL.md └── lint/SKILL.md ``` Scans on construction. Re-scan via `refresh()` to pick up disk changes: ```java FileSystemSkillStore store = new FileSystemSkillStore(Path.of(".tnsai/skills")); // ... time passes, someone added a new skill ... store.refresh(); ``` Programmatic registrations (`store.register(skill)`) survive `refresh()` if their name doesn't collide with a disk skill. Disk content takes precedence on collision. ### InMemorySkillStore Test seam. Programmatic-only: ```java InMemorySkillStore store = new InMemorySkillStore(List.of( Skill.builder("deploy") .description("Production deploy procedure") .body("1. Verify CI...\n2. kubectl apply...") .allowedTools(List.of("bash", "kubectl")) .build())); ``` ### Custom stores Implement `SkillStore` to back skills with a database / classpath / enterprise registry. The contract is small (`findByName`, `list`, `register`); concurrency must be safe under multi-threaded agent dispatch. ## Resolvers ### KeywordSkillResolver (default) Cheap word-overlap scoring. Field weights: `name` 3×, `when-to-use` 2×, `description` 1×. 
Stop-words filtered. Zero-score candidates dropped. Ties break alphabetically by name for stable ordering. ### LLMSkillResolver Asks the configured `LLMClient` to pick the most relevant skill names from a compact catalog. Higher accuracy when triggers are paraphrased semantically; one extra LLM call per turn. ```java .skillResolver(new LLMSkillResolver(llmClient)) ``` Falls back to `KeywordSkillResolver` on LLM failure rather than silently returning empty — a transient hiccup must not strip skills from the prompt. ### Custom resolvers Implement `SkillResolver`. Common variants: - **Embedding-based** — pre-compute description embeddings; score by cosine similarity against the user message embedding. - **Hybrid** — keyword + LLM rerank. - **Rule-based** — wire-up that always surfaces a fixed subset. ## Tool-call scope (`SkillScopedToolCallFilter`) When skills declare `allowed-tools`, attach the skill-aware filter so the LLM is constrained to those tools while the skill is active: ```java agent.setToolCallFilter(new SkillScopedToolCallFilter(agent.getSkillManager())); ``` Behaviour summary: - No active skills → permissive (every tool call allowed). - Active skills, none declare `allowed-tools` → permissive. - One or more active skills declare `allowed-tools` → tool calls outside the union of their lists are blocked with a `ToolCallAction.Guide` action that names the active skills, so the LLM can self-correct. Compose with other filters by wrapping in your own `ToolCallFilter` chain. ## Subagent context fork When an `AgentGroup` spawns a subagent that should inherit the parent's procedural context, preload from the parent's manager: ```java SkillManager parentMgr = parent.getSkillManager(); SkillManager childMgr = child.getSkillManager(); if (parentMgr != null && childMgr != null) { childMgr.preloadFrom(parentMgr); } ``` The child's pre-existing activations are preserved on top; the parent's snapshots are appended. Re-activations replace any same-named child snapshot. 
## Programmatic invocation `Agent.invokeSkill(name, arguments, environment)` activates a skill directly. A `userInvocable=false` flag is bypassed (the programmatic source overrides user-facing visibility), and `disableModelInvocation` does not apply because the source is `PROGRAMMATIC`, not `MODEL_TOOL_CALL`. ```java Optional activated = agent.invokeSkill( "deploy", List.of("staging"), Map.of("BUILD_TAG", "v1.4.2")); ``` The call returns an empty `Optional` when: - No `SkillStore` is wired. - No skill with the given name is registered. ## Slash commands Users (or test harnesses) activate a skill manually with `/skill-name args...`. `Agent.chat(...)` intercepts this prefix BEFORE the LLM round-trip, so manual invocation costs zero LLM tokens and returns a confirmation string. ```java String reply = agent.chat("/deploy staging"); // "Skill 'deploy' activated; its body is now in context and will guide subsequent turns." ``` Skills marked `userInvocable=false` reject slash invocation with an error message instead of activating. ## Activation events Every successful activation emits a `SkillActivationEvent` (sealed branch of `TnsAIEvent`): ```java public record SkillActivationEvent( String eventId, Instant timestamp, String runId, String agentName, String skillName, Source source, // USER_SLASH_COMMAND / MODEL_TOOL_CALL / PROGRAMMATIC List arguments ) implements TnsAIEvent {} ``` Use `agent.chatWithEvents(message, eventConsumer)` to receive these alongside the rest of the event stream. The same event flows through the configured hook bus, so policy hooks (e.g. "log every skill activation to the audit pipeline") can attach there. 
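An event consumer for the audit case might look roughly like the sketch below: it keeps one line per skill activation and ignores every other event. The nested `Event`, `SkillActivation`, `OtherEvent`, and `Source` types are hypothetical stand-ins trimmed from the record shown above so the example compiles on its own; real code would match on `SkillActivationEvent` inside the consumer passed to `chatWithEvents`.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch class; not part of TnsAI.
class SkillAuditSketch {

    // Stand-ins for the framework's event types, reduced to what the sketch needs.
    interface Event {}
    enum Source { USER_SLASH_COMMAND, MODEL_TOOL_CALL, PROGRAMMATIC }
    record SkillActivation(Instant timestamp, String skillName, Source source, List<String> arguments)
            implements Event {}
    record OtherEvent(String description) implements Event {}

    /** Returns a consumer that appends one audit line per skill activation and skips other events. */
    static Consumer<Event> auditInto(List<String> auditLog) {
        return event -> {
            if (event instanceof SkillActivation activation) {
                auditLog.add(activation.source() + " " + activation.skillName()
                        + " " + String.join(",", activation.arguments()));
            }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Consumer<Event> consumer = auditInto(log);
        consumer.accept(new OtherEvent("llm call started"));
        consumer.accept(new SkillActivation(Instant.now(), "deploy",
                Source.USER_SLASH_COMMAND, List.of("staging")));
        System.out.println(log); // [USER_SLASH_COMMAND deploy staging]
    }
}
```

Filtering by type inside the consumer keeps the audit logic independent of whatever else the event stream carries.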
## See also - [Skills overview](/docs/capabilities/skills) - [Skill format](/docs/capabilities/skills/skill-format) — frontmatter + substitution - [Hooks](/docs/security) — `SkillActivationEvent` rides the same bus as other TnsAI events - [Accountability](/docs/security/accountability) — skill activations correlate with the resulting liability records via `runId` --- # SKILL.md format URL: https://tnsai.dev/docs/capabilities/skills/skill-format Description: A SKILL.md file is a YAML frontmatter block followed by a markdown body. The framework's parser (SkillMdParser) accepts the same shape Claude Code's parser does, so a skill authored once works in both environments. import { Callout } from 'fumadocs-ui/components/callout' ## File layout `//SKILL.md` — the directory name is the default skill name when the frontmatter is silent on `name`. `FileSystemSkillStore` scans `/` and treats every immediate child directory containing a `SKILL.md` as a registered skill. ``` .tnsai/skills/ ├── deploy/ │ ├── SKILL.md │ ├── reference.md (optional, lazy-referenced by body) │ └── scripts/precheck.sh (optional, executed by skill body) ├── lint/ │ └── SKILL.md └── customer-escalation/ └── SKILL.md ``` Supporting files (`reference.md`, scripts, …) are NOT loaded eagerly — the body declares them by relative path and consumers fetch on demand. ## Frontmatter reference ```yaml --- name: deploy # falls back to parent directory name description: Production deploy procedure with rollback support when-to-use: | When the user asks to ship a build to prod or staging. Trigger phrases: "deploy", "ship", "release", "rollout". 
allowed-tools: - bash - kubectl - github.create_pr argument-hint: "" arguments: - environment disable-model-invocation: false # default false user-invocable: true # default true paths: - "infra/**" # v2 (file-context auto-activation) --- ``` | Field | Type | Default | Meaning | | -------------------------- | --------------------- | ------------ | ---------------------------------------------------------------------------------------------------------------- | | `name` | string | parent dir | Stable identifier; case-insensitive on lookup. | | `description` | string | **required** | Always in the system prompt; the resolver scores against it. | | `when-to-use` | string | empty | Optional context for the resolver — trigger phrases, example requests. | | `allowed-tools` | list of strings | empty | Tools auto-granted while this skill is active. Empty = no constraint. | | `argument-hint` | string | empty | Free-form hint shown in catalog listings. | | `arguments` | list of strings | empty | Positional argument names for `argument-hint`. | | `disable-model-invocation` | boolean | `false` | If `true`, the LLM may not activate this skill — only the user (or `Agent.invokeSkill(...)`) can. | | `user-invocable` | boolean | `true` | If `false`, hidden from the user-facing menu and rejected on `/skill-name`. Programmatic activation still works. | | `paths` | list of glob patterns | empty | v2 only: auto-activate when the agent is operating on matching files. 
| ### Field aliases The parser accepts both the kebab-case (`agentskills.io` standard) and snake\_case (Claude Code docs) forms for fields where the docs are split: | Canonical | Aliases | | --------------- | --------------- | | `when-to-use` | `when_to_use` | | `allowed-tools` | `allowed_tools` | | `argument-hint` | `argument_hint` | ### List shorthand A scalar in place of a single-element list is accepted for `allowed-tools`, `paths`, and `arguments`: ```yaml allowed-tools: bash # equivalent to: ["bash"] ``` ### CRLF tolerance Files written on Windows or checked out with Git's `core.autocrlf=true` parse identically — the parser normalises line endings before extracting the frontmatter. ## Body Anything between the closing `---` delimiter and the end of the file is the skill body. The body is plain markdown — the framework does not parse it. The body is rendered through `SkillSubstitution` before it lands in the system prompt, so placeholders are resolved at activation time (not at parse time). ## Substitution | Placeholder | Resolves to | | ------------- | --------------------------------------------------------------------- | | `$ARGUMENTS` | All positional arguments space-joined. | | `$0`, `$1`, … | The Nth positional argument (0-indexed). Out-of-range = empty string. | | `${VAR}` | Named env value supplied by the caller. Unknown name = empty string. | | `$$` | Literal dollar sign — escapes the placeholder at that position. | Substitution is **single-pass and left-to-right** — placeholder values are not re-substituted. A literal `$ARGUMENTS` passed as `$0` stays literal in the output. ```markdown --- name: deploy description: Deploy a build to a target environment arguments: - environment --- # Deploy to $0 1. Verify CI is green for the build tagged `${BUILD_TAG}`. 2. `kubectl apply -f manifests/$0/` 3. Smoke-test https://$0.example.com/healthz. ``` Invoked as `/deploy staging` with `env={BUILD_TAG: "v1.4.2"}`, the body renders as: ```markdown # Deploy to staging 1. 
Verify CI is green for the build tagged `v1.4.2`. 2. `kubectl apply -f manifests/staging/` 3. Smoke-test https://staging.example.com/healthz. ``` ## Validation rules The parser rejects: - Files without an opening `---` delimiter on the first line. - Files with an opening delimiter but no closing one. - Frontmatter that is not valid YAML. - A `Skill` with blank `name` (after defaulting from the directory name) — every skill must have a name. - A `Skill` with blank `description` — the resolver depends on it. A malformed `SKILL.md` is **logged and skipped** by `FileSystemSkillStore`; sibling skills continue to load. One broken skill on disk does not poison the store. ## Authoring tips - **Lead the description with the problem the skill solves**, not the solution. The resolver scores by token overlap with the user's message — "deploy build" lands closer to "ship build" than "ship a release". Name the user's intent in your own words. - **Use `when-to-use` for trigger phrases**. The keyword resolver weights `when-to-use` matches 2× and `name` matches 3×; trigger phrases here move the skill ahead of competitors. - **Keep the body procedural, not narrative**. Numbered steps + concrete commands. The body is what the LLM follows once activated; flowery prose dilutes the signal. - **Reference supporting files by relative path**. The body can say "see `reference.md` for examples" — the consumer (LLM or human) fetches on demand instead of bloating the activation snapshot. ## See also - [Skills overview](/docs/capabilities/skills) — when to reach for skills vs tools / RAG / capabilities - [Registration](/docs/capabilities/skills/registration) — wiring a store + resolver into `AgentBuilder` --- # Tools — Advanced URL: https://tnsai.dev/docs/capabilities/tools/advanced Description: The function-shape POJO model deliberately keeps the tool surface small: a method, an annotation, a registry. 
Most "advanced" features that older docs covered (manifest generators, contract validators, security enforcers, parameter validators, retry/cache wrappers) were retired together with the legacy Tool interface in v0.6.0 / v0.7.0. Cross-cutting concerns now live one layer up — on the @ActionSpec annotation, on the agent's setToolCallFilter / setToolCallListener hooks, or on the dispatcher itself. import { Callout } from 'fumadocs-ui/components/callout' This page covers what's left in the advanced surface today. ## Per-action LLM overrides When an `@ActionSpec(type = LLM)` action needs its own system prompt or temperature without changing the agent's global LLM config, set them on the annotation: ```java @ActionSpec( type = ActionType.LLM, description = "Extract entities from text — must be deterministic", llmSystemPrompt = "You are a precise NER extractor. Output JSON only.", llmTemperature = 0.0f ) public String extractEntities(String text) { return "Extract entities from: " + text; } ``` `llmSystemPrompt` overrides the LLM client's default for the duration of this action; `llmTemperature >= 0` overrides the chat temperature. A negative `llmTemperature` (the `-1.0f` default) means "fall back to the LLM client's default". Tool exposure stays at the agent level — every `LLM` action sees the agent's complete tool registry. ## Inspecting the tool registry at runtime Every agent built with `AgentBuilder` has a `ToolMethodDispatcher` whose `registry()` exposes the full set of registered tools. Use this when you need to introspect the catalog from inside a `setToolCallListener`, write tooling around tool discovery, or assert on registration in tests. 
```java import com.tnsai.tools.method.ToolMethodDispatcher; import com.tnsai.tools.method.ToolMethodRegistry; ToolMethodDispatcher dispatcher = agent.getToolMethodDispatcher(); ToolMethodRegistry registry = dispatcher.registry(); for (var tool : registry.allMethods()) { System.out.println(tool.name() + " — " + tool.description()); } ``` ## Direct dispatch `ToolMethodDispatcher.dispatch(name, args)` is the same entry point the LLM tool-call loop uses. Call it directly when you want to invoke a tool from non-LLM code (tests, batch jobs, smoke scripts) without spinning up the full chat path: ```java Object result = dispatcher.dispatch( "csv_summary", Map.of("path", "/data/sales.csv") ); ``` Argument types are coerced via Jackson — pass a `Map` whose entries match the `@ToolParam` parameter names. ## Composio integration `ComposioClient` (`com.tnsai.tools.composio`) provides access to 500+ managed tool integrations via the Composio platform with automatic OAuth handling. It's a separate POJO toolkit registered the same way as any other: ```java AgentBuilder.create() .llm(llm) .role(role) .toolPojos(new ComposioTools()) // wraps ComposioClient .build(); ``` ### Operations | Operation | Description | | ------------- | -------------------------------- | | `apps` | List available apps/integrations | | `tools` | Get tools for a specific app | | `execute` | Execute a tool action | | `connect` | Initiate OAuth connection | | `connections` | List active connections | | `triggers` | List available triggers | | `categories` | List app categories | ### App categories `PRODUCTIVITY`, `CRM`, `DEVELOPER`, `COMMUNICATION`, `MARKETING`, `DESIGN`, `FINANCE`, `SOCIAL`, `AI`, `STORAGE` ### Direct usage ```java ComposioClient composio = new ComposioClient(); // Requires COMPOSIO_API_KEY environment variable String result = composio.execute("{\"operation\":\"apps\",\"category\":\"developer\"}"); String tools = composio.execute("{\"operation\":\"tools\",\"app\":\"github\"}"); String issue = 
composio.execute(""" {"operation":"execute","app":"github","action":"create_issue", "params":{"repo":"owner/repo","title":"Bug report","body":"..."}} """); ``` ## Cross-References - [Tool Catalog](/docs/capabilities/tools/catalog) — the shipped POJO toolkits and their methods - [Custom Tools](/docs/capabilities/tools/custom-tools) — write your own `@Tool` methods - [Tool Integration](/docs/capabilities/tools/registration) — `setToolCallFilter` / `setToolCallListener` hooks - [MCP Client](/docs/mcp/client) — bridge MCP tools into TnsAI as `DynamicToolMethod` instances --- # Tool Catalog URL: https://tnsai.dev/docs/capabilities/tools/catalog Description: tnsai-tools ships 59 function-shape POJO toolkits exposing roughly 206 @Tool-annotated methods across 29 categories. Each toolkit is a plain class with public methods annotated @Tool; the framework discovers them reflectively via ToolMethodRegistry and dispatches calls through ToolMethodDispatcher (the same path used for any user POJO registered with AgentBuilder.toolPojos(...)). import { Callout } from 'fumadocs-ui/components/callout' For creating your own toolkits, see [Custom Tools](/docs/capabilities/tools/custom-tools). ## Quick Start The compile-safe path is `BuiltInTool` — pass enum constants and the framework instantiates the backing POJO for you: ```java import com.tnsai.agents.AgentBuilder; import com.tnsai.enums.BuiltInTool; import com.tnsai.llm.providers.OpenAIClient; Agent agent = AgentBuilder.create() .llm(new OpenAIClient("gpt-4o")) .role(myRole) .builtInTools( BuiltInTool.WEB_SEARCH_TOOLS, // brave_search, duckduckgo, wikipedia, … BuiltInTool.UTILITY_TOOLS, // calculator, hash, datetime_* BuiltInTool.PDF_TOOLS // pdf_extract_text, pdf_metadata, … ) .build(); String response = agent.chat("What is the population of Tokyo? 
Triple it."); ``` The LLM sees every `@Tool` method on every registered toolkit as a callable function and dispatches by method name (`brave_search`, `calculator`, `pdf_extract_text`, …). Most toolkits read credentials from environment variables on first use: ```bash export BRAVE_API_KEY=your-key export OPENAI_API_KEY=your-key ``` ## How toolkits are organised Each `BuiltInTool` enum entry maps a stable `toolName` to the FQCN of a POJO in `tnsai-tools`. The POJO's public `@Tool`-annotated methods are the actual functions exposed to the LLM. For example, `BuiltInTool.CSV_TOOLS` backs `com.tnsai.tools.file.CsvTools`, which exposes `csv_summary`, `csv_columns`, `csv_filter`, `csv_head`, `csv_search`. Tables below list each toolkit's enum constant, backing class, the methods it exposes, and any required environment variables. Method names are what the LLM will see and call. ## search Web search, scraping, and specialised lookups. | Enum | Methods | API key | | -------------------------- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- | | `WEB_SEARCH_TOOLS` | `duckduckgo`, `wikipedia`, `wikidata`, `searxng`, `npm`, `maven_central`, `brave_search`, `serpapi`, `tavily`, `exa` | Per-provider (none for the first six; `BRAVE_API_KEY`, `SERPAPI_API_KEY`, `TAVILY_API_KEY`, `EXA_API_KEY` for the last four) | | `WEB_SCRAPING_TOOLS` | `web_scraper`, `firecrawl` | `FIRECRAWL_API_KEY` (firecrawl only) | | `QNA_TOOLS` | `hackernews`, `stackoverflow_search` | None | | `SPECIALIZED_SEARCH_TOOLS` | `yahoo_finance_lookup`, `yahoo_finance_news`, `wolfram_alpha` | `WOLFRAM_APP_ID` (wolfram only) | ## academic Free / freemium scholarly databases. 

| Enum | Methods | API key |
| --- | --- | --- |
| `ACADEMIC_TOOLS` | `arxiv_search`, `pubmed_search`, `semantic_scholar_search`, `openalex_search`, `crossref_search`, `dblp_search` | None |
| `ACADEMIC_EXTRA_TOOLS` | `orcid_search`, `orcid_get`, `unpaywall_lookup`, `biorxiv_recent` | None |

## file

Text and structured-document parsing.

| Enum | Methods | Notes |
| --- | --- | --- |
| `CSV_TOOLS` | `csv_summary`, `csv_columns`, `csv_filter`, `csv_head`, `csv_search` | Accepts file path or literal CSV string |
| `JSON_TOOLS` | `json_query` | Jayway JsonPath against file or literal JSON |
| `XML_TOOLS` | `xml_query` | XPath against XML |
| `PDF_TOOLS` | `pdf_extract_text`, `pdf_extract_pages`, `pdf_metadata`, `pdf_merge`, `pdf_to_image` | PDFBox-backed |
| `MARKDOWN_TOOLS` | `markitdown` | Convert any supported source to Markdown |
| `FILE_IO_TOOLS` | `file_read`, `file_write` | Sandboxed by Role policy |

### `CSV_TOOLS`: example

`CsvTools` is the canonical example for the function-shape pattern. The five methods below are independent; the LLM picks the one it needs.

```java
Agent agent = AgentBuilder.create()
    .llm(new OpenAIClient("gpt-4o"))
    .role(myRole)
    .builtInTools(BuiltInTool.CSV_TOOLS)
    .build();

agent.chat("Summarise /data/sales.csv and list the column headers.");
// LLM emits two tool calls: csv_summary("/data/sales.csv") then csv_columns(...)
```

| Method | Purpose | Notes |
| --- | --- | --- |
| `csv_summary` | Row/column counts plus per-column dtype/null stats | Default for "describe this CSV" prompts |
| `csv_columns` | Extract a subset of columns by name | Case-insensitive; falls back to numeric index |
| `csv_filter` | Rows where a column's value contains a substring | Case-insensitive substring, capped at 100 rows |
| `csv_head` | First N rows as a Markdown ASCII table | |
| `csv_search` | All rows where any cell contains the term | Capped at 100 rows |

The methods accept typed parameters (path, column name, etc.); there's no `path|||command` text protocol any more. Each method is a regular Java method whose signature defines the LLM-facing schema (via `@Tool` and `@ToolParam`).

## communication

Email, chat, and SMS sending.

| Enum | Methods | Config |
| --- | --- | --- |
| `EMAIL_TOOLS` | `smtp_send`, `gmail_send`, `gmail_inbox` | `SMTP_HOST` / `SMTP_USER` / `SMTP_PASS` (SMTP); `GMAIL_OAUTH_TOKEN` (Gmail) |
| `MESSAGING_TOOLS` | `slack_post`, `discord_post`, `twilio_sms_send`, `twilio_sms_status` | `SLACK_WEBHOOK_URL`, `DISCORD_WEBHOOK_URL`, `TWILIO_SID` + `TWILIO_TOKEN` |

## fintech

Payment and BNPL platforms.

| Enum | Methods | API key |
| --- | --- | --- |
| `SQUARE_TOOLS` | `square_create_payment`, `square_list_payments`, `square_create_invoice`, `square_create_customer`, `square_search_catalog` | `SQUARE_ACCESS_TOKEN` |
| `CASH_APP_PAY_TOOLS` | `cashapp_create_request`, `cashapp_get_request`, `cashapp_cancel_request` | `CASHAPP_CLIENT_ID` + secret |
| `LIGHTNING_TOOLS` | `lightning_create_invoice`, `lightning_pay_invoice`, `lightning_decode_invoice` | `LN_API_URL` + macaroon |
| `AFTERPAY_TOOLS` | `afterpay_create_checkout`, `afterpay_capture_payment`, `afterpay_get_order` | `AFTERPAY_API_KEY` |
| `PAYMENT_ANALYTICS_TOOLS` | `payment_revenue_summary` | Varies (cross-processor) |

## commerce

Storefront APIs.

| Enum | Methods | API key |
| --- | --- | --- |
| `SHOPIFY_TOOLS` | `shopify_search`, `shopify_get_product` | `SHOPIFY_API_KEY` + shop domain |
| `ETSY_TOOLS` | `etsy_search`, `etsy_get_listing`, `etsy_get_shop` | `ETSY_API_KEY` |

## crm

Customer-relationship platforms.

| Enum | Methods | API key |
| --- | --- | --- |
| `HUBSPOT_TOOLS` | `hubspot_list_contacts`, `hubspot_get_contact`, `hubspot_create_contact` | `HUBSPOT_ACCESS_TOKEN` |
| `SALESFORCE_TOOLS` | `salesforce_query`, `salesforce_search`, `salesforce_get_sobject` | `SF_ACCESS_TOKEN` + instance URL |

## database

Relational, document, key-value, and vector stores.

| Enum | Methods | Config |
| --- | --- | --- |
| `SQL_TOOLS` | `sql_query` | `JDBC_URL` + driver on classpath |
| `MONGO_TOOLS` | `mongo_find`, `mongo_count` | `MONGO_URI` |
| `REDIS_TOOLS` | `redis_get`, `redis_keys`, `redis_ttl`, `redis_set`, `redis_del` | `REDIS_URL` |
| `QDRANT_TOOLS` | `qdrant_collections`, `qdrant_count`, `qdrant_search`, `qdrant_create_collection`, `qdrant_upsert` | `QDRANT_URL` (+ optional API key) |
| `WEAVIATE_TOOLS` | `weaviate_classes`, `weaviate_count`, `weaviate_search`, `weaviate_create_class`, `weaviate_upsert` | `WEAVIATE_URL` |

## developer

Repo, dependency, and project introspection.

| Enum | Methods | API key |
| --- | --- | --- |
| `DEVELOPER_TOOLS` | `github_search_repos`, `github_search_code`, `github_search_issues`, `github_search_users`, `jshell_eval` | `GITHUB_TOKEN` (optional, raises rate limit) |
| `DEPENDENCY_TOOLS` | `dependency_latest`, `dependency_compare`, `project_detect_type` | None |
| `PROJECT_ANALYZER_TOOLS` | `project_tree`, `project_stats`, `project_languages` | None |

## productivity

Calendars, task trackers, docs.

| Enum | Methods | API key |
| --- | --- | --- |
| `GOOGLE_TOOLS` | `gcal_list_events`, `gcal_get_event`, `gcal_create_event`, `gdrive_list_files`, `gdrive_read_file`, `gdrive_create_file` | `GOOGLE_OAUTH_TOKEN` |
| `JIRA_TOOLS` | `jira_search`, `jira_get_issue`, `jira_create_issue`, `jira_transition_issue` | `JIRA_BASE_URL` + `JIRA_API_TOKEN` |
| `NOTION_TOOLS` | `notion_search`, `notion_get_page`, `notion_query_database`, `notion_create_page` | `NOTION_TOKEN` |
| `TRELLO_TOOLS` | `trello_list_boards`, `trello_get_board`, `trello_list_cards`, `trello_create_card`, `trello_move_card` | `TRELLO_KEY` + `TRELLO_TOKEN` |

## media

Audio and TTS.

| Enum | Methods | API key |
| --- | --- | --- |
| `MEDIA_TOOLS` | `whisper_transcribe`, `openai_tts` | `OPENAI_API_KEY` |
| `CHATTERBOX_TOOLS` | `chatterbox_tts` | `CHATTERBOX_API_KEY` |

## social

Social-network APIs.

| Enum | Methods | API key |
| --- | --- | --- |
| `TWITTER_TOOLS` | `twitter_search`, `twitter_user`, `twitter_timeline` | `X_BEARER_TOKEN` |
| `REDDIT_TOOLS` | `reddit_search`, `reddit_subreddit_posts`, `reddit_post_comments` | `REDDIT_CLIENT_ID` + `REDDIT_CLIENT_SECRET` |
| `LINKEDIN_TOOLS` | `linkedin_me`, `linkedin_share` | `LINKEDIN_ACCESS_TOKEN` |

## ai

Multimodal helpers.

| Enum | Methods | API key |
| --- | --- | --- |
| `VISION_TOOLS` | `image_analyze` | `GEMINI_API_KEY` |

## code

Code execution through the framework's [Sandbox SPI](/docs/security/sandbox). Both built-in tools route through `SandboxFactory.byId("process")` by default with `ResourceLimits.standard()` + `NetPolicy.denyAll()`. Pass an explicit `SandboxFactory` + image-bearing `SandboxSpec` via the full-control constructors for container / WASM / Firecracker isolation.

| Enum | Methods | Host requirement |
| --- | --- | --- |
| `JS_EXECUTION_TOOLS` | `js_execute` | `node` on `PATH` |
| `PYTHON_EXECUTION_TOOLS` | `python_execute`, `python_version` | `python3` on `PATH` |
| `E2B_SANDBOX_TOOLS` | `e2b_create`, `e2b_execute`, `e2b_upload`, `e2b_download`, `e2b_install`, `e2b_list`, `e2b_kill` | `E2B_API_KEY` |

## utility

Math, hashing, datetime, encoding.

| Enum | Methods | API key |
| --- | --- | --- |
| `UTILITY_TOOLS` | `calculator`, `hash`, `datetime_now`, `datetime_diff`, `datetime_add` | None |
| `ENCODING_TOOLS` | `qr_generate`, `qr_base64`, `qr_read`, `google_translate` | `GOOGLE_TRANSLATE_API_KEY` (translate only) |

## diagram

Diagram-as-code rendering.

| Enum | Methods | API key |
| --- | --- | --- |
| `DIAGRAM_TOOLS` | `mermaid_render`, `excalidraw_generate` | None |

## document

DOCX / Office / image conversion.

| Enum | Methods | API key |
| --- | --- | --- |
| `DOCUMENT_TOOLS` | `document_read`, `markdown_to_html`, `image_convert`, `image_info`, `slides_create` | None |

## visualization

Charts, tables, infographics.

| Enum | Methods | API key |
| --- | --- | --- |
| `VISUALIZATION_TOOLS` | `chart_ascii`, `table_format`, `infographic_templates` | None |

## realtime

Live FX, crypto, weather feeds.

| Enum | Methods | API key |
| --- | --- | --- |
| `REALTIME_TOOLS` | `crypto_price`, `fx_rates`, `fx_convert`, `weather_current` | None (CoinGecko / Frankfurter / OpenWeatherMap public) |

## finance

Borsa Istanbul market data.

| Enum | Methods | API key |
| --- | --- | --- |
| `FINANCE_TOOLS` | `bist_quote`, `bist_index`, `bist_fx`, `bist_gold` | None |

## trading

Prediction markets.

| Enum | Methods | API key |
| --- | --- | --- |
| `TRADING_TOOLS` | `polymarket_markets`, `polymarket_search`, `polymarket_prices` | None (read-only) |

## scraping

Apify actor execution.

| Enum | Methods | API key |
| --- | --- | --- |
| `SCRAPING_TOOLS` | `apify_run_actor`, `apify_search_actors`, `apify_actor_info` | `APIFY_TOKEN` |

## knowledge

In-memory knowledge graph.

| Enum | Methods | API key |
| --- | --- | --- |
| `KNOWLEDGE_TOOLS` | `kg_extract_entities`, `kg_extract_relations`, `kg_add_triple`, `kg_query` | None |

## memory

Persistent recall.

| Enum | Methods | API key |
| --- | --- | --- |
| `MEMORY_TOOLS` | `reever_store`, `reever_recall`, `reever_search` | None (Reever-backed) |

## goose

Desktop automation.

| Enum | Methods | API key |
| --- | --- | --- |
| `GOOSE_TOOLS` | `git_status`, `git_log`, `git_diff`, `git_branches`, `screenshot`, `system_info` | `git` on PATH |
| `CLIPBOARD_TOOLS` | `clipboard_read_text`, `clipboard_write_text` | None (headless-aware) |

## project

Repo-level read helpers.

| Enum | Methods | API key |
| --- | --- | --- |
| `PROJECT_TOOLS` | `project_context`, `project_read_file`, `agentsmd_parse`, `agentsmd_generate` | None (sandboxed by Role policy) |

`agentsmd_parse` returns an `AgentsMdContent` record (`intro` + ordered `sections: [{level, title, body}]`) parsed from `AGENTS.md`, with case-variant + `CLAUDE.md` + `README.md` fallback. Use this when an agent needs to route on individual sections (e.g. pull the "Setup" body) rather than the whole document.

`agentsmd_generate` produces a draft `AGENTS.md` by detecting the build system from `pom.xml` / `package.json` / `pyproject.toml` / `Cargo.toml` / `go.mod` and filling in language-appropriate setup + test commands. It returns the Markdown string; the caller decides whether to write it.

## system

HTTP and tool discovery.

| Enum | Methods | API key |
| --- | --- | --- |
| `SYSTEM_TOOLS` | `http_request`, `tool_search` | None |

## Direct instantiation

If you need to register a toolkit outside the `AgentBuilder.builtInTools(...)` path (for example, from a plugin loader), every entry has a no-arg `instantiate()` method that returns a fresh POJO ready for `ToolMethodRegistry`:

```java
Object csvToolkit = BuiltInTool.CSV_TOOLS.instantiate();
// csvToolkit is a com.tnsai.tools.file.CsvTools instance
```

`instantiate()` throws `BuiltInToolInstantiationException` if `tnsai-tools` is missing from the classpath (the FQCN string isn't resolvable) or if the POJO's public no-arg constructor fails.

## Authoritative source

The full per-toolkit Javadoc, including every `@Tool` method's exact signature and the corresponding `@ToolParam` constraints, lives in [`com.tnsai.enums.BuiltInTool`](https://github.com/TnsAI-Framework/TnsAI/blob/main/tnsai-core/src/main/java/com/tnsai/enums/BuiltInTool.java) and the backing POJOs under [`com.tnsai.tools.*`](https://github.com/TnsAI-Framework/TnsAI/tree/main/tnsai-tools/src/main/java/com/tnsai/tools).

---

# Custom Tools

URL: https://tnsai.dev/docs/capabilities/tools/custom-tools

Description: A custom tool in TnsAI is a plain Java class with public methods annotated @Tool. The framework discovers them reflectively and exposes each method as a function the LLM can call. There is no base class to extend, no SPI to register, no Tool interface to implement, just an instance you hand to AgentBuilder.toolPojos(...).

import { Callout } from 'fumadocs-ui/components/callout'

For the shipped toolkits (CSV, PDF, web search, Jira, etc.), see the [Tool Catalog](/docs/capabilities/tools/catalog).
## A minimal toolkit

```java
import com.tnsai.annotations.Tool;
import com.tnsai.annotations.ToolParam;

public class CalculatorTools {

    @Tool(name = "calculator", description = "Evaluate an arithmetic expression")
    public double calculator(
        @ToolParam(description = "Expression like '2 + 2 * (3 - 1)'") String expression
    ) {
        // ExpressionParser is not defined on this page; substitute your own evaluator.
        return new ExpressionParser().parse(expression).evaluate();
    }
}
```

Register and use it:

```java
Agent agent = AgentBuilder.create()
    .llm(new OpenAIClient("gpt-4o"))
    .role(myRole)
    .toolPojos(new CalculatorTools())
    .build();

agent.chat("What is 17% of 240?");
// LLM emits: calculator("240 * 0.17") -> 40.8
```

The tool name (`calculator`, taken from `@Tool(name = ...)`) is what the LLM sees and calls. `@ToolParam` descriptions surface in the JSON-Schema sent to the model, so write them as you'd document an API parameter.

## Multiple methods on one POJO

A toolkit groups related methods on a single class. Each `@Tool` method is independent; the LLM picks one per call.

```java
public class WeatherTools {

    @Tool(name = "weather_current", description = "Current weather for a city")
    public WeatherSnapshot weatherCurrent(
        @ToolParam(description = "City name, e.g. 'Istanbul'") String city
    ) {
        return weatherClient.getCurrent(city);
    }

    @Tool(name = "weather_forecast", description = "5-day forecast for a city")
    public List weatherForecast(
        @ToolParam(description = "City name") String city,
        @ToolParam(description = "Number of days, 1-5") int days
    ) {
        return weatherClient.getForecast(city, days);
    }
}
```

Register the whole toolkit in one line:

```java
AgentBuilder.create()
    .llm(llm)
    .role(role)
    .toolPojos(new WeatherTools()) // both weather_current and weather_forecast registered
    .build();
```

Method return values can be any type Jackson can serialise: POJOs, records, `Map`, `List`, primitives. The framework serialises the return value to JSON before handing it back to the LLM.
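The reflective discovery described above can be illustrated without the framework. The following is a minimal sketch, not TnsAI's implementation: `DemoTool`, `DemoMathTools`, and `ToolDiscoveryDemo` are hypothetical names invented here, and `DemoTool` is a local stand-in for `@Tool` so the example compiles with the JDK alone. It shows the general idea of indexing annotated public methods by tool name and invoking one by that name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the framework's @Tool annotation, declared
// locally so the sketch compiles without tnsai-core on the classpath.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DemoTool {
    String name();
    String description();
}

// A toolkit in the same shape as WeatherTools: a plain class with annotated methods.
class DemoMathTools {
    @DemoTool(name = "add_ints", description = "Add two integers")
    public int addInts(int a, int b) {
        return a + b;
    }

    @DemoTool(name = "negate_int", description = "Negate an integer")
    public int negateInt(int a) {
        return -a;
    }
}

class ToolDiscoveryDemo {
    // Scan the toolkit's public methods and index the annotated ones by tool name.
    static Map<String, Method> discover(Object toolkit) {
        Map<String, Method> tools = new TreeMap<>();
        for (Method m : toolkit.getClass().getMethods()) {
            DemoTool meta = m.getAnnotation(DemoTool.class);
            if (meta != null) {
                tools.put(meta.name(), m);
            }
        }
        return tools;
    }

    // Dispatch a call by tool name, the way a function call from the LLM would arrive.
    static Object call(Object toolkit, String name, Object... args) {
        Method m = discover(toolkit).get(name);
        if (m == null) {
            throw new IllegalArgumentException("Unknown tool: " + name);
        }
        try {
            return m.invoke(toolkit, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Tool call failed: " + name, e);
        }
    }

    public static void main(String[] args) {
        DemoMathTools toolkit = new DemoMathTools();
        System.out.println(discover(toolkit).keySet());      // [add_ints, negate_int]
        System.out.println(call(toolkit, "add_ints", 2, 3)); // 5
    }
}
```

In this sketch the lookup key is the annotation's `name` rather than the Java method name, which mirrors how `weatherCurrent` is exposed as `weather_current` in the example above.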
## Reading credentials

Toolkits typically read API keys from environment variables on first use rather than via the constructor. This keeps the `BuiltInTool.instantiate()` path (which relies on no-arg constructors) compatible with credential-bearing toolkits.

```java
public class WeatherTools {

    private static String requireApiKey() {
        String key = System.getenv("WEATHER_API_KEY");
        if (key == null || key.isBlank()) {
            throw new IllegalStateException(
                "WEATHER_API_KEY environment variable is required");
        }
        return key;
    }

    @Tool(name = "weather_current", description = "Current weather for a city")
    public WeatherSnapshot weatherCurrent(@ToolParam(description = "City") String city) {
        String key = requireApiKey();
        // ...
    }
}
```

## Mixing custom POJOs with shipped toolkits

`toolPojos(...)` and `builtInTools(...)` accumulate into the same `ToolMethodRegistry`. Names must be unique across every registered toolkit; a clash fails fast at `build()` time.

```java
Agent agent = AgentBuilder.create()
    .llm(llm)
    .role(role)
    .builtInTools(BuiltInTool.WEB_SEARCH_TOOLS, BuiltInTool.UTILITY_TOOLS)
    .toolPojos(new WeatherTools(), new MyDomainTools())
    .build();
```

## Tools that need to be defined at runtime

When a tool's identity is only known at runtime (for example, an MCP proxy fronting a remote server's catalog), use `DynamicToolMethod` instead of an annotated POJO:

```java
import com.tnsai.tools.method.DynamicToolMethod;

DynamicToolMethod proxy = DynamicToolMethod.builder()
    .name("remote_search")
    .description("Search the remote knowledge base")
    .parameter("query", "string", "Search term")
    .handler(args -> remoteClient.search((String) args.get("query")))
    .build();

AgentBuilder.create()
    .llm(llm)
    .role(role)
    .dynamicTool(proxy)
    .build();
```

`DynamicToolMethod` and POJO `@Tool` methods share the same registry and dispatcher, so the LLM can't tell them apart.
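One way to picture a shared registry is a single name-to-handler map that rejects duplicates at registration time. The sketch below is an illustration under stated assumptions, not the framework's `ToolMethodRegistry`: the `DemoToolRegistry` class and its `register` / `dispatch` methods are hypothetical and exist only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a shared tool registry; the real
// ToolMethodRegistry API is not shown on this page.
class DemoToolRegistry {
    private final Map<String, Function<Map<String, Object>, Object>> handlers =
        new LinkedHashMap<>();

    // Register a handler under a tool name; a duplicate name fails immediately.
    DemoToolRegistry register(String name, Function<Map<String, Object>, Object> handler) {
        if (handlers.putIfAbsent(name, handler) != null) {
            throw new IllegalStateException("Duplicate tool name: " + name);
        }
        return this;
    }

    // Dispatch by name; the caller cannot tell how the handler was defined.
    Object dispatch(String name, Map<String, Object> args) {
        Function<Map<String, Object>, Object> handler = handlers.get(name);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown tool: " + name);
        }
        return handler.apply(args);
    }

    public static void main(String[] args) {
        DemoToolRegistry registry = new DemoToolRegistry()
            // Stands in for a method discovered on an annotated POJO.
            .register("double_value", a -> ((Integer) a.get("value")) * 2)
            // Stands in for a tool whose identity is only known at runtime.
            .register("remote_search", a -> "results for " + a.get("query"));

        System.out.println(registry.dispatch("double_value", Map.of("value", 21)));       // 42
        System.out.println(registry.dispatch("remote_search", Map.of("query", "tnsai"))); // results for tnsai

        try {
            registry.register("remote_search", a -> "shadow");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Duplicate tool name: remote_search
        }
    }
}
```

Rejecting the duplicate at registration, rather than at dispatch, is what makes a name clash visible before any LLM call is made.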
## Per-action LLM overrides

If a specific `@ActionSpec(type = LLM)` action needs its own system prompt or temperature without changing the agent's global LLM config, set them directly on the annotation:

```java
@ActionSpec(
    type = ActionType.LLM,
    description = "Extract entities from text; must be deterministic",
    llmSystemPrompt = "You are a precise NER extractor. Output JSON only.",
    llmTemperature = 0.0f
)
public String extractEntities(String text) {
    return "Extract entities from: " + text;
}
```

`llmSystemPrompt` overrides the LLM client's default system prompt for this action only; `llmTemperature >= 0` overrides the temperature. Tool exposure stays at the agent level: every `@ActionSpec(type = LLM)` action sees the agent's complete tool registry.

## Permission control

Use `setToolCallFilter` to gate or block specific tool calls; see [Tool Integration](/docs/capabilities/tools/registration#tool-call-filters).

## Observability

Use `setToolCallListener` to log every tool invocation; see [Tool Integration](/docs/capabilities/tools/registration#tool-call-listeners).

---

# Tool Use Examples

URL: https://tnsai.dev/docs/capabilities/tools/examples

Description: Tool use examples are concrete input/output pairs (and counter-examples) that travel alongside a tool definition to the LLM. They teach the model the call patterns the tool expects, including patterns that look reasonable from a language standpoint but break the contract.

import { Callout } from 'fumadocs-ui/components/callout'

Anthropic reports tool examples improved accuracy from 72% to 90% on complex parameter handling in their internal testing (see [Introducing advanced tool use on the Claude Developer Platform](https://www.anthropic.com/engineering/advanced-tool-use)). TnsAI exposes this surface via the `@ToolExample` annotation.

## What and why

The hardest LLM tool failures are not "model picked the wrong tool."
They are "model called the right tool with arguments the docs technically allowed but that the implementation rejects." Examples give the model a small set of canonical call shapes it can pattern-match against. A counter-example pinned to a specific `whyBad` reason tells the model what to avoid.

Use examples when:

- A tool has non-obvious parameter combinations (date ranges, optional filters, sentinel values).
- A tool has side effects whose contract is easy to violate (cursors that must advance once per turn, "DONE" terminators that must not be paraphrased).
- The description has grown beyond two short paragraphs and starts feeling like rules-as-prose.

Skip examples when the tool is `add(a, b)`-trivial. Examples cost tokens on every request; there is no cache that elides them on subsequent turns.

## Annotation reference

`@ToolExample` is defined in [`tnsai-core/src/main/java/com/tnsai/annotations/ToolExample.java`](https://github.com/TnsAI-Framework/TnsAI/blob/main/tnsai-core/src/main/java/com/tnsai/annotations/ToolExample.java) with `@Target({})`, meaning it is only valid as a value inside `@ActionSpec.examples`. It cannot appear standalone.

| Field | Type | Required | Purpose |
| --- | --- | --- | --- |
| `description` | String | optional | Brief framing of what this example demonstrates. Surfaces in tool description prose for non-Anthropic providers. |
| `input` | String | **required** | Example input as a JSON string. Must match the tool's parameter schema. |
| `output` | String | optional | Expected output, or a description of what the result should look like. |
| `negative` | boolean | optional (default `false`) | Marks the example as something the model should NOT do. |
| `whyBad` | String | optional | Explanation for negative examples. Only meaningful when `negative = true`. |

Today this surface is reachable only from `@ActionSpec(type = ActionType.LLM | LOCAL | …)`. The newer `@Tool`-annotated POJO style (the dominant path for [Custom Tools](/docs/capabilities/tools/custom-tools)) does not yet expose an `examples()` array; this is tracked as a framework follow-up.

## A worked example

A `getNextQuestion()` action drives a quiz state machine: each call advances a cursor and returns the next question, until the cursor reaches the end and the action returns `"DONE"`. The contract is easy to state and easy for an LLM to break.

```java
@ActionSpec(
    type = ActionType.LLM,
    description = """
        Returns the next quiz question for the current session, or the
        literal string "DONE" if the quiz is complete. Advances the session
        cursor by exactly one position per call.
        """,
    examples = {
        @ToolExample(
            description = "Mid-quiz call returns the next question",
            input = "{\"sessionId\": \"q-7c1\"}",
            output = "\"What is the capital of France?\""
        ),
        @ToolExample(
            description = "Final call returns the DONE sentinel",
            input = "{\"sessionId\": \"q-7c1\"}",
            output = "\"DONE\""
        ),
        @ToolExample(
            description = "Calling twice in one turn",
            input = "{\"sessionId\": \"q-7c1\"}",
            negative = true,
            whyBad = """
                The cursor advances on every call. Calling twice in the same
                turn skips a question. Wait for the user's answer before
                calling again.
                """
        ),
        @ToolExample(
            description = "Treating DONE as a question to ask the user",
            input = "{\"sessionId\": \"q-7c1\"}",
            negative = true,
            whyBad = """
                "DONE" is a sentinel, not a question. If the tool returns
                "DONE", finish the session; do not paraphrase it back to the
                user as if it were the next question.
                """
        )
    }
)
public String getNextQuestion(String sessionId) { … }
```

Two positive examples cover the dominant call shapes (mid-quiz and terminal). Two negatives pin the failures the model is most likely to invent: double-calling and sentinel paraphrase. Each negative carries a `whyBad` that names the rule it is violating.
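To see why the double-call negative matters, here is a small in-memory model of the cursor contract. It is a sketch under stated assumptions: `QuizCursorDemo` and its question list are hypothetical, and the real `getNextQuestion()` body is elided in the example above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory quiz, written only to illustrate the cursor contract;
// it is not the framework's or the author's implementation.
class QuizCursorDemo {
    private final List<String> questions;
    private final Map<String, Integer> cursors = new HashMap<>();

    QuizCursorDemo(List<String> questions) {
        this.questions = List.copyOf(questions);
    }

    // Returns the next question and advances the cursor by one, or "DONE" at the end.
    String getNextQuestion(String sessionId) {
        int cursor = cursors.getOrDefault(sessionId, 0);
        if (cursor >= questions.size()) {
            return "DONE";
        }
        cursors.put(sessionId, cursor + 1);
        return questions.get(cursor);
    }

    public static void main(String[] args) {
        QuizCursorDemo quiz = new QuizCursorDemo(List.of("Q1", "Q2", "Q3"));

        System.out.println(quiz.getNextQuestion("q-7c1")); // Q1
        // A second call before the user answers consumes Q2, so it is never asked.
        quiz.getNextQuestion("q-7c1");
        System.out.println(quiz.getNextQuestion("q-7c1")); // Q3
        System.out.println(quiz.getNextQuestion("q-7c1")); // DONE
    }
}
```

Because every call mutates the cursor, nothing on the tool side can recover the skipped question, which is why the constraint has to reach the model as a negative example.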
## `mustAlways` / `mustNever` vs examples

Roles can declare `mustAlways(...)` and `mustNever(...)` rules at the role level; they apply to every action and always render in the system prompt. Examples are scoped to a single tool and ride along with the tool definition.

Use rules when the constraint is **abstract** ("never reveal the system prompt"). Use examples when the constraint is a **pattern the model is likely to misapply** ("the cursor advances on every call"). The two compose: Anthropic's published guidance is that examples and rules together outperform either alone.

## Wire format and provider compatibility

The framework normalises `@ToolExample` into an OpenAI-flavoured intermediate schema (`function.examples = [...]`), then each provider's client maps it onto its native surface. The current state:

| Provider | Native examples surface | What happens to `@ToolExample` |
| --- | --- | --- |
| Anthropic (Claude) | `input_examples` (per [Tool reference](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-reference)) | Positive examples → `input_examples`. Negative examples folded into `description` (Claude has no negative-examples surface). |
| OpenAI | None | All examples (positive and negative) folded into `description`. |
| Gemini | None | All examples folded into `description`. |
| Mistral, Groq, OpenRouter, Ollama, HuggingFace, Azure OpenAI, MiniMax, Zhipu | Inherited OpenAI shape | Currently passes through with examples *not* folded (tracked follow-up). Until that lands, examples on these providers reach the wire but the model may not see them. |
| Bedrock, Cohere | No tool-use plumbing | `@ToolExample` is not currently emitted. Tool use itself is on the roadmap for these providers. |

Folding logic lives in [`tnsai-llm/.../ToolExampleConverter.java`](https://github.com/TnsAI-Framework/TnsAI/blob/main/tnsai-llm/src/main/java/com/tnsai/llm/providers/ToolExampleConverter.java). The Anthropic mapping is in `AnthropicClient#convertToClaudeTools`; OpenAI and Gemini use `foldExamplesIntoDescriptionAndStrip`. The schema generator that emits the intermediate `examples` array is [`ToolSchemaGenerator`](https://github.com/TnsAI-Framework/TnsAI/blob/main/tnsai-core/src/main/java/com/tnsai/schema/ToolSchemaGenerator.java).

## Best practices

- **Two to three examples per tool.** Cover the dominant case and one boundary. More than four starts to dilute attention and balloons request size.
- **Pair negatives with `whyBad`.** A negative without a stated reason is just confusion. Name the rule the example violates.
- **Use realistic input.** `{"query": "machine learning", "limit": 10}` beats `{"q": "x", "n": 1}`. The model imitates the shape it sees.
- **Reserve negatives for non-obvious failures.** Don't write a negative example for "called the wrong tool"; that's what tool selection is for. Use negatives for sentinel/cursor/idempotency mistakes.
- **Account for the token cost.** Every example ships on every request. A tool that's called once per session can absorb a richer example set than one called every turn.

## See also

- [Custom Tools](/docs/capabilities/tools/custom-tools): how to author the tools that examples attach to.
- [Registration](/docs/capabilities/tools/registration): `setToolCallFilter` and `setToolCallListener` for the runtime side.
- [Anthropic: Introducing advanced tool use](https://www.anthropic.com/engineering/advanced-tool-use): source of the 72% → 90% accuracy figure.
- [Claude API: Tool reference](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-reference): `input_examples` field documentation.

---

# Idempotency

URL: https://tnsai.dev/docs/capabilities/tools/idempotency

Description: When a tool gets retried (by @Resilience(retry = …), by an upstream gateway, by a flaky network), the framework needs a way to make sure the side effect happens once. @Idempotent plus an IdempotencyStore is that mechanism.

import { Callout } from 'fumadocs-ui/components/callout'

This page covers when to reach for it, what the surface looks like, and which store fits which deployment.

## When to use it

Idempotency protection earns its keep on actions whose side effect shouldn't repeat:

- `send_email(to, subject, body)`: three retries shouldn't mean three emails.
- `create_github_issue(title, body)`: three retries shouldn't mean three issues.
- `charge_card(amount)`: definitely not three charges.
- `append_to_log(msg)`: duplicate entries are a debugging nightmare.

It's overhead for read-only operations (`getUser`, `searchWeb`); skip it there. It's also wrong for *intentionally* non-idempotent operations (`generateRandomId`, `currentTime`).

## The annotation

```java
@ActionSpec(type = ActionType.WEB_SERVICE, endpoint = "https://api.example.com/email")
@Idempotent(strategy = KeyStrategy.HASH_INPUT, ttlSeconds = 86400)
public EmailResult sendEmail(String to, String subject, String body) { return null; }
```

Four knobs:

| Field | Meaning | Default |
| --- | --- | --- |
| `strategy` | How the dedup key is derived from the call. | `HASH_INPUT` |
| `ttlSeconds` | How long the cache remembers this call. | 3600 (1 hour) |
| `onCacheHit` | What to do when a retry hits the cache. | `RETURN_CACHED` |
| `cacheFailures` | Whether to cache failed outcomes too. | `false` |

### `KeyStrategy`

- **`HASH_INPUT`**: SHA-256 of the canonicalised input.
  Works when the input is the natural identity ("send THIS exact email").
- **`EXPLICIT`**: your tool overrides `idempotencyKeyFor(input)` and returns whatever string makes sense ("issue-" + title.normalize()). Use this when one input field is the de-facto identity.
- **`UUID`**: fresh random key per call. Propagated to upstream services that honour `Idempotency-Key` headers; client-side dedup doesn't fire because each call has a unique key.
- **`NONE`**: no protection; the default for actions that are not annotated.

### `RetryBehavior`

- **`RETURN_CACHED`**: second call returns the cached outcome.
- **`RETURN_CACHED_IF_SUCCESS`**: second call returns cached only if the original succeeded; failed calls fall through and retry.
- **`FAIL_FAST`**: second call throws `IdempotencyException`, surfacing the duplicate to the caller.

## A worked example: state machine

The `getNextQuestion()` action that drives a quiz advances a cursor on every call. A retry that's actually a *re-issue* needs to skip the side effect; a retry that's a fresh question needs to run.

```java
@ActionSpec(
    type = ActionType.LLM,
    description = "Returns the next quiz question for the session, or DONE."
)
@Idempotent(strategy = KeyStrategy.EXPLICIT, ttlSeconds = 60)
public String getNextQuestion(String sessionId) {
    return cursorAdvance(sessionId);
}

@Override
public String idempotencyKeyFor(Map input) {
    // Per-turn key: the same sessionId in the same minute returns
    // the same answer, but a fresh turn (new turn id passed by the
    // orchestrator) gets a fresh key and runs.
    return "quiz:" + input.get("sessionId") + ":" + input.get("turnId");
}
```

## HTTP `Idempotency-Key` header propagation

For `@ActionSpec(type = WEB_SERVICE)` calls on idempotent actions, the framework injects:

```http
POST /api/v1/emails HTTP/1.1
Idempotency-Key: 01JFVT8M7KP9X3QNJB5T9VRYC0-a3b2c1
Content-Type: application/json
```

Stripe, SendGrid, Twilio, GitHub all honour this header and dedup server-side.
If the upstream doesn't, the client-side store still catches the duplicate before the request goes out.

## Picking a store

The framework ships three implementations of `IdempotencyStore`:

| Store                      | When                                                                 | Survives restart?                 | Cross-process? |
| -------------------------- | -------------------------------------------------------------------- | --------------------------------- | -------------- |
| `InMemoryIdempotencyStore` | Default. Dev, tests, single-process production.                      | No                                | No             |
| `RedisIdempotencyStore`    | Production with multiple instances.                                  | Yes (per Redis durability config) | Yes            |
| `PostgresIdempotencyStore` | Production where you already have Postgres + want strong durability. | Yes                               | Yes            |

### `RedisIdempotencyStore`

Optional dependency (`redis.clients:jedis` — declared `<optional>true</optional>` on `tnsai-core`; consumers add it themselves):

```xml
<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>7.5.0</version>
</dependency>
```

```java
JedisPool pool = new JedisPool("redis://prod-redis:6379");
IdempotencyStore store = new RedisIdempotencyStore(pool);

// Or with a custom key prefix when sharing one Redis cluster across
// unrelated deployments:
IdempotencyStore prefixedStore = new RedisIdempotencyStore(pool, "myapp:idem:");
```

TTL is enforced server-side via `SET … EX …`, so stale entries never need a client-side sweep. Entries are JSON-encoded; a typed POJO cached in process A and read in process B comes back as a `Map`/`List` mosaic — re-marshal at the call site if you need the original type.

### `PostgresIdempotencyStore`

No new dependency — uses standard JDBC:

```java
DataSource ds = configureDataSource();
PostgresIdempotencyStore store = new PostgresIdempotencyStore(ds);
store.createTableIfMissing(); // dev / quick-start
```

Production deployments typically run schema migration via Flyway / Liquibase rather than calling `createTableIfMissing()`.
The DDL the store expects:

```sql
CREATE TABLE idempotency_entries (
    idempotency_key VARCHAR(512) PRIMARY KEY,
    entry_json      TEXT NOT NULL,
    expires_at      TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE INDEX idx_idempotency_expires_at ON idempotency_entries (expires_at);
```

Expiry is lazy — every `get()` filters on `expires_at > now()`. Run `store.purgeExpired()` periodically (cron, scheduled job) to keep the table from growing unboundedly.

The implementation uses ANSI-portable UPDATE-then-INSERT (no dialect-specific `ON CONFLICT` / `MERGE INTO`), so it also works on H2 in PostgreSQL mode for tests, and CockroachDB / any Postgres-compatible engine in production.

### Which one?

- Single-process app, dedup window seconds-to-minutes → `InMemoryIdempotencyStore`.
- Multiple framework instances behind a load balancer → `RedisIdempotencyStore`. Hottest, cheapest dedup.
- Operationally already on Postgres, value durability over latency → `PostgresIdempotencyStore`.

## Failure caching

Default: only successes are cached. A retry of a failed call falls through and re-attempts. That's usually what you want for transient failures.

Set `cacheFailures = true` for *deterministic* failures — calls that will keep failing the same way no matter how many times you retry:

```java
@Idempotent(strategy = KeyStrategy.HASH_INPUT,
            cacheFailures = true,
            onCacheHit = RetryBehavior.FAIL_FAST)
public Order placeOrder(OrderRequest req) { … }
```

A duplicate `placeOrder` with the same content gets `IdempotencyException` on the second call, no upstream traffic, immediate fail. Useful when the failure itself is the dedup signal ("this order was already rejected for fraud — don't re-submit").

## Best practices

- **Pair with `@Resilience(retry = …)`.** The retry tells the framework *to* retry; `@Idempotent` tells it *how to retry safely*.
- **Match TTL to the operation's natural window.** A "create issue" call's dedup window can sensibly be 24h (you wouldn't want to re-create the same issue tomorrow either); a "send notification" might be 5 minutes (after that, a re-send is intentional).
- **Use `EXPLICIT` for tools whose key is one stable input field.** `HASH_INPUT` over the full input is conservative but produces a fresh key when any field changes — including fields that shouldn't define identity (request\_id, current\_timestamp).
- **Document why** a tool is or isn't idempotent at the annotation: `@Idempotent(description = "Sets order to fixed status — safe to retry")`.

## See also

- [Custom Tools](/docs/capabilities/tools/custom-tools) — authoring tools that idempotency attaches to.
- [Examples](/docs/capabilities/tools/examples) — `@ToolExample` for the LLM-facing surface.
- [Stripe — idempotent requests](https://docs.stripe.com/api/idempotent_requests) — the client-side primer for the HTTP `Idempotency-Key` header.
- [IETF draft — HTTP Idempotency-Key](https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header) — the standardisation effort.

---

# Tools

URL: https://tnsai.dev/docs/capabilities/tools

Description: Use the shipped POJO toolkit catalog, write custom @Tool methods, and control how the LLM dispatches them.

import { Callout } from 'fumadocs-ui/components/callout'

## Pages

- [Catalog](/docs/capabilities/tools/catalog) — 62 shipped POJO toolkits (\~206 `@Tool` methods) across 29 categories.
- [Custom Tools](/docs/capabilities/tools/custom-tools) — Write a POJO with `@Tool`-annotated methods, register via `AgentBuilder.toolPojos(...)`.
- [Examples](/docs/capabilities/tools/examples) — Use `@ToolExample` to attach positive and negative call patterns; provider wire-format mapping.
- [Idempotency](/docs/capabilities/tools/idempotency) — `@Idempotent` + `IdempotencyStore` for safe retry semantics. Redis / Postgres / in-memory adapters.
- [Registration](/docs/capabilities/tools/registration) — `builtInTools(...)` + `toolPojos(...)` + `dynamicTool(...)`, plus `setToolCallFilter` / `setToolCallListener` hooks.
- [Registration (Advanced)](/docs/capabilities/tools/registration-advanced) — Filters, listeners, and runtime introspection.
- [Multimodal](/docs/capabilities/tools/multimodal) — `IMAGE_GEN_TOOLS`, `TEXT_TO_SPEECH_TOOLS`, `SPEECH_TO_TEXT_TOOLS` aggregator toolkits (DALL-E 3, FLUX, Stability, ElevenLabs, Cartesia, Deepgram, AssemblyAI, Replicate Whisper).
- [Advanced](/docs/capabilities/tools/advanced) — Per-action LLM overrides, dispatcher inspection, Composio integration.

For MCP-sourced tools see [MCP / Client](/docs/mcp/client).

---

# Multimodal Tools

URL: https://tnsai.dev/docs/capabilities/tools/multimodal

Description: Three aggregator toolkits in tnsai-tools give an agent text→image, text→speech, and speech→text capability without the consumer having to write provider plumbing. Each toolkit is a function-shape POJO (RFC #188) that exposes one @Tool-annotated method per backend provider, so the LLM can pick a provider at call time based on quality, latency, or cost.

import { Callout } from 'fumadocs-ui/components/callout'

| Toolkit             | `BuiltInTool` enum     | Methods                                                             | Modality          |
| ------------------- | ---------------------- | ------------------------------------------------------------------- | ----------------- |
| `ImageGenTools`     | `IMAGE_GEN_TOOLS`      | `dalle3_generate`, `flux_generate`, `stability_generate`            | Text → image      |
| `TextToSpeechTools` | `TEXT_TO_SPEECH_TOOLS` | `elevenlabs_tts`, `cartesia_tts`, `deepgram_tts`                    | Text → speech     |
| `SpeechToTextTools` | `SPEECH_TO_TEXT_TOOLS` | `deepgram_transcribe`, `assemblyai_transcribe`, `replicate_whisper` | Audio file → text |

Pairs with the older OpenAI-only `MEDIA_TOOLS` (`openai_tts`, `whisper_transcribe`) — register both when you want OpenAI plus a non-OpenAI fallback in the same agent.
## Quick start

Register the toolkits the same way as any other built-in tool. The framework instantiates the backing POJO, scans for `@Tool` methods, and exposes each as a separate function the LLM can call:

```java
import com.tnsai.agents.AgentBuilder;
import com.tnsai.enums.BuiltInTool;
import com.tnsai.llm.providers.OpenAIClient;

Agent artist = AgentBuilder.create()
    .llm(new OpenAIClient("gpt-4o"))
    .role(myRole)
    .builtInTools(
        BuiltInTool.IMAGE_GEN_TOOLS,      // dalle3_generate, flux_generate, stability_generate
        BuiltInTool.TEXT_TO_SPEECH_TOOLS, // elevenlabs_tts, cartesia_tts, deepgram_tts
        BuiltInTool.SPEECH_TO_TEXT_TOOLS  // deepgram_transcribe, assemblyai_transcribe, replicate_whisper
    )
    .build();

String response = artist.chat("Draw me a watercolor painting of a giraffe on Mars");
// LLM picks dalle3_generate, calls it; the agent reply embeds the URL.
```

Each `@Tool` reads its API key from a process environment variable on first call. Missing keys throw `IllegalStateException` with the exact variable name in the message.

## Image generation — `IMAGE_GEN_TOOLS`

| Method               | Backend                                     | Auth                  | Output shape                                 |
| -------------------- | ------------------------------------------- | --------------------- | -------------------------------------------- |
| `dalle3_generate`    | OpenAI DALL-E 3                             | `OPENAI_API_KEY`      | `{provider, model, urls[], revised_prompt?}` |
| `flux_generate`      | Black Forest Labs FLUX (via Replicate sync) | `REPLICATE_API_TOKEN` | `{provider, model, urls[]}`                  |
| `stability_generate` | Stability AI Stable Image v2                | `STABILITY_API_KEY`   | `{provider, model, urls[], finish_reason?}`  |

`dalle3_generate` and `flux_generate` return provider-hosted URLs. **DALL-E URLs expire roughly an hour after generation** — fetch and restage if you need persistence. FLUX URLs live for the duration documented by Replicate. `stability_generate` returns image bytes, which the tool encodes inline as a `data:image/png;base64,…` data URI.
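If a consumer needs the raw bytes behind a `data:image/png;base64,…` URI (for example, to write the `stability_generate` output to disk), the JDK alone is enough. The following is a minimal sketch, assuming the URI follows the standard `data:<mime>;base64,<payload>` layout; `DataUriDecoder` is a hypothetical helper written for illustration and is not part of tnsai-tools.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper (not part of tnsai-tools): decodes a base64 data URI. */
final class DataUriDecoder {

    private DataUriDecoder() {}

    /** Returns the raw bytes carried by a data:<mime>;base64,<payload> URI. */
    static byte[] decode(String dataUri) {
        if (dataUri == null || !dataUri.startsWith("data:")) {
            throw new IllegalArgumentException("Not a data URI");
        }
        int comma = dataUri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Data URI has no payload separator");
        }
        // Header is the part between "data:" and the comma, e.g. "image/png;base64".
        String header = dataUri.substring("data:".length(), comma);
        if (!header.endsWith(";base64")) {
            throw new IllegalArgumentException("Only base64 data URIs are supported");
        }
        return Base64.getDecoder().decode(dataUri.substring(comma + 1));
    }

    public static void main(String[] args) {
        // Tiny text payload standing in for real image bytes.
        byte[] bytes = decode("data:text/plain;base64,aGVsbG8=");
        System.out.println(new String(bytes, StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

Decoding at the call site leaves the tool output itself a plain JSON string, so nothing about the agent-facing shape has to change.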
The same data-URI pattern applies to every TTS method below: a uniform shape, so the agent doesn't need per-provider plumbing.

```java
// Each method validates parameters before the HTTP call so a bad LLM
// argument fails fast with a clear message instead of a provider 4xx.
String json = new ImageGenTools().dalle3Generate(
    "A watercolor giraffe on Mars",
    "1024x1024", // 1024x1024 | 1792x1024 | 1024x1792
    "hd",        // standard | hd
    1            // DALL-E 3 supports n=1 only; tool rejects n>1 upfront
);
```

Defaults are provider-appropriate: DALL-E `1024x1024` standard, FLUX `schnell` model with `1:1` aspect ratio in `webp`, Stability `core` model with `1:1` aspect ratio.

## Text-to-speech — `TEXT_TO_SPEECH_TOOLS`

| Method           | Backend                                       | Auth                 | Strength                      |
| ---------------- | --------------------------------------------- | -------------------- | ----------------------------- |
| `elevenlabs_tts` | ElevenLabs Multilingual v2 (4 model variants) | `ELEVENLABS_API_KEY` | Quality leader, voice cloning |
| `cartesia_tts`   | Cartesia Sonic-2                              | `CARTESIA_API_KEY`   | Latency leader (\~75 ms TTFB) |
| `deepgram_tts`   | Deepgram Aura (12 voice presets)              | `DEEPGRAM_API_KEY`   | Cheapest of the three         |

All three return the same envelope:

```json
{
  "provider": "elevenlabs",
  "model": "eleven_multilingual_v2",
  "audio_uri": "data:audio/mpeg;base64,SUQzBAAAAAAAJ...",
  "audio_bytes": 45920
}
```

`audio_uri` is ready for `