Tool use examples are concrete input/output pairs (and counter-examples) that travel alongside a tool definition to the LLM. They teach the model the call patterns the tool expects — including patterns that look reasonable from a language standpoint but break the contract.

Anthropic reports tool examples improved accuracy from 72% to 90% on complex parameter handling in their internal testing; see "Introducing advanced tool use on the Claude Developer Platform". TnsAI exposes this surface via the @ToolExample annotation.

What and why

The hardest LLM tool failures are not "model picked the wrong tool." They are "model called the right tool with arguments the docs technically allowed but that the implementation rejects." Examples give the model a small set of canonical call shapes it can pattern-match against. A counter-example pinned to a specific whyBad reason tells the model what to avoid.

Use examples when:

A tool has non-obvious parameter combinations (date ranges, optional filters, sentinel values).
A tool has side effects whose contract is easy to violate (cursors that must advance once per turn, "DONE" terminators that must not be paraphrased).
The description has grown beyond two short paragraphs and starts feeling like rules-as-prose.

Skip examples when the tool is add(a, b)-trivial. Examples cost tokens on every request — there is no cache that elides them on subsequent turns.

Annotation reference

@ToolExample is defined in tnsai-core/src/main/java/com/tnsai/annotations/ToolExample.java with @Target({}) — meaning it is only valid as a value inside @ActionSpec.examples. It cannot appear standalone.

| Field | Type | Required | Purpose |
| --- | --- | --- | --- |
| `description` | String | optional | Brief framing of what this example demonstrates. Surfaces in tool description prose for non-Anthropic providers. |
| `input` | String | required | Example input as a JSON string. Must match the tool's parameter schema. |
| `output` | String | optional | Expected output, or a description of what the result should look like. |
| `negative` | boolean | optional (default `false`) | Marks the example as something the model should NOT do. |
| `whyBad` | String | optional | Explanation for negative examples. Only meaningful when `negative = true`. |
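
To make the reference concrete, here is a minimal sketch of how an annotation pair with these fields could be declared and read back through reflection. It is not the source of tnsai-core: the `ToolExampleSketch` wrapper, the `Search` class, the `examplesOf` helper, the RUNTIME retention, and the empty-string defaults are all assumptions for illustration, and the real `@ActionSpec` carries more members than shown here.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-ins for the TnsAI annotations; the real declarations may differ.
final class ToolExampleSketch {

    // @Target({}) permits use only as a member value of another annotation.
    @Retention(RetentionPolicy.RUNTIME) // assumed, so reflection can see it
    @Target({})
    @interface ToolExample {
        String description() default "";
        String input();                     // the only required field
        String output() default "";
        boolean negative() default false;
        String whyBad() default "";
    }

    // Reduced to the two members this sketch needs.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ActionSpec {
        String description() default "";
        ToolExample[] examples() default {};
    }

    // Hypothetical tool class used only to carry the annotation.
    static final class Search {
        @ActionSpec(
            description = "Searches the document index.",
            examples = @ToolExample(input = "{\"query\": \"machine learning\", \"limit\": 10}")
        )
        public String search(String query, int limit) {
            return "[]";
        }
    }

    // Reads the examples declared on a public method, or an empty array if none.
    static ToolExample[] examplesOf(Class<?> type, String methodName, Class<?>... paramTypes) {
        try {
            Method method = type.getMethod(methodName, paramTypes);
            ActionSpec spec = method.getAnnotation(ActionSpec.class);
            return spec == null ? new ToolExample[0] : spec.examples();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such public method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        for (ToolExample ex : examplesOf(Search.class, "search", String.class, int.class)) {
            System.out.println(ex.input() + " negative=" + ex.negative());
        }
    }
}
```

Placing `@ToolExample` directly on a method or class in this sketch would be a compile error, which is the practical effect of the empty `@Target`.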

Today this surface is reachable only from @ActionSpec(type = ActionType.LLM | LOCAL | …). The newer @Tool-annotated POJO style (the dominant path for Custom Tools) does not yet expose an examples() array; this is tracked as a framework follow-up.

A worked example

A getNextQuestion() action drives a quiz state machine: each call advances a cursor and returns the next question, until the cursor reaches the end and the action returns "DONE". The contract is easy to state and easy for an LLM to break.

@ActionSpec(
    type = ActionType.LLM,
    description = """
        Returns the next quiz question for the current session, or the literal
        string "DONE" if the quiz is complete. Advances the session cursor by
        exactly one position per call.
        """,
    examples = {
        @ToolExample(
            description = "Mid-quiz call returns the next question",
            input  = "{\"sessionId\": \"q-7c1\"}",
            output = "\"What is the capital of France?\""
        ),
        @ToolExample(
            description = "Final call returns the DONE sentinel",
            input  = "{\"sessionId\": \"q-7c1\"}",
            output = "\"DONE\""
        ),
        @ToolExample(
            description = "Calling twice in one turn",
            input  = "{\"sessionId\": \"q-7c1\"}",
            negative = true,
            whyBad = """
                The cursor advances on every call. Calling twice in the same
                turn skips a question. Wait for the user's answer before
                calling again.
                """
        ),
        @ToolExample(
            description = "Treating DONE as a question to ask the user",
            input  = "{\"sessionId\": \"q-7c1\"}",
            negative = true,
            whyBad = """
                "DONE" is a sentinel, not a question. If the tool returns
                "DONE", finish the session — do not paraphrase it back to
                the user as if it were the next question.
                """
        )
    }
)
public String getNextQuestion(String sessionId) { … }

Two positive examples cover the dominant call shapes (mid-quiz and terminal). Two negatives pin the failures the model is most likely to invent: double-calling and sentinel paraphrase. Each negative carries a whyBad that names the rule it is violating.
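
The double-calling failure is easier to see against a concrete cursor. The sketch below is a hypothetical in-memory session, not TnsAI's implementation of getNextQuestion(); the class name `QuizCursorSketch` and its fields are assumptions introduced for illustration only.

```java
import java.util.List;

// Hypothetical in-memory quiz session; illustrates the cursor contract only.
final class QuizCursorSketch {
    static final String DONE = "DONE";

    private final List<String> questions;
    private int cursor = 0;

    QuizCursorSketch(List<String> questions) {
        this.questions = List.copyOf(questions);
    }

    // Returns the question at the cursor and advances by exactly one position.
    // Once the cursor is past the last question, every call returns DONE.
    String getNextQuestion() {
        if (cursor >= questions.size()) {
            return DONE;
        }
        return questions.get(cursor++);
    }

    public static void main(String[] args) {
        QuizCursorSketch quiz = new QuizCursorSketch(List.of("Q1", "Q2", "Q3"));
        String asked = quiz.getNextQuestion();    // first question
        String consumed = quiz.getNextQuestion(); // second call in the same turn
        // The second question is now consumed even though the user never saw it.
        System.out.println("asked=" + asked + ", silently consumed=" + consumed);
        System.out.println("next=" + quiz.getNextQuestion());
    }
}
```

Because the call itself mutates state, the tool cannot detect from the arguments alone that the model called too early, which is why the rule is pinned as a negative example rather than enforced by the parameter schema.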

`mustAlways` / `mustNever` vs examples

Roles can declare mustAlways(...) and mustNever(...) rules. These apply to every action in the role and always render in the system prompt. Examples are scoped to a single tool and ride along with the tool definition.

Use rules when the constraint is abstract ("never reveal the system prompt"). Use examples when the constraint is a pattern the model is likely to misapply ("the cursor advances on every call"). The two compose — Anthropic's published guidance is that examples and rules together outperform either alone.

Wire format and provider compatibility

The framework normalises @ToolExample into an OpenAI-flavoured intermediate schema (function.examples = [...]), then each provider's client maps it onto its native surface. The current state:

| Provider | Native examples surface | What happens to `@ToolExample` |
| --- | --- | --- |
| Anthropic (Claude) | `input_examples` (per Tool reference) | Positive examples → `input_examples`. Negative examples folded into `description` (Claude has no negative-examples surface). |
| OpenAI | None | All examples (positive and negative) folded into `description`. |
| Gemini | None | All examples folded into `description`. |
| Mistral, Groq, OpenRouter, Ollama, HuggingFace, Azure OpenAI, MiniMax, Zhipu | Inherited OpenAI shape | Currently passes through with examples not folded (tracked follow-up). Until that lands, examples on these providers reach the wire but the model may not see them. |
| Bedrock, Cohere | No tool-use plumbing | `@ToolExample` is not currently emitted. Tool use itself is on the roadmap for these providers. |

Folding logic lives in tnsai-llm/.../ToolExampleConverter.java. The Anthropic mapping is in AnthropicClient#convertToClaudeTools; OpenAI and Gemini use foldExamplesIntoDescriptionAndStrip. The schema generator that emits the intermediate examples array is ToolSchemaGenerator.
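
The mapping in the table can be sketched as two small functions: one that folds every example into description prose, and one that splits positives from negatives for a provider with a native examples field. This is an illustration under stated assumptions, not the code in ToolExampleConverter or AnthropicClient: the `Example` and `ClaudeTool` records, the method names `render`, `foldAll`, and `forClaude`, and the "Example:" / "Do NOT:" prose format are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the table above; not the real converter.
final class ExampleFoldingSketch {

    record Example(String description, String input, String output, boolean negative, String whyBad) {}

    // Simplified stand-in for a Claude tool payload.
    record ClaudeTool(String description, List<String> inputExamples) {}

    // Renders one example as a line of prose. The wording is an assumption.
    static String render(Example ex) {
        StringBuilder sb = new StringBuilder(ex.negative() ? "Do NOT: " : "Example: ");
        sb.append(ex.description()).append(" | input: ").append(ex.input());
        if (!ex.negative() && !ex.output().isEmpty()) {
            sb.append(" | output: ").append(ex.output());
        }
        if (ex.negative() && !ex.whyBad().isBlank()) {
            sb.append(" | why: ").append(ex.whyBad().strip());
        }
        return sb.toString();
    }

    // OpenAI/Gemini rows: every example becomes description prose.
    static String foldAll(String description, List<Example> examples) {
        StringBuilder sb = new StringBuilder(description);
        for (Example ex : examples) {
            sb.append('\n').append(render(ex));
        }
        return sb.toString();
    }

    // Anthropic row: positives go to input_examples, negatives are folded.
    static ClaudeTool forClaude(String description, List<Example> examples) {
        List<String> inputs = new ArrayList<>();
        List<Example> negatives = new ArrayList<>();
        for (Example ex : examples) {
            if (ex.negative()) {
                negatives.add(ex);
            } else {
                inputs.add(ex.input());
            }
        }
        return new ClaudeTool(foldAll(description, negatives), List.copyOf(inputs));
    }

    public static void main(String[] args) {
        List<Example> examples = List.of(
            new Example("Mid-quiz call", "{\"sessionId\": \"q-7c1\"}", "\"What is the capital of France?\"", false, ""),
            new Example("Calling twice in one turn", "{\"sessionId\": \"q-7c1\"}", "", true, "Skips a question."));
        System.out.println(forClaude("Returns the next quiz question.", examples));
        System.out.println(foldAll("Returns the next quiz question.", examples));
    }
}
```

In this sketch a positive example loses its `description` and `output` on the Claude path, since only the input string is forwarded; whether the real mapping preserves them is not stated in this page.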

Best practices

Two to three examples per tool. Cover the dominant case and one boundary. More than four starts to dilute attention and balloon request size.
Pair negatives with whyBad. A negative without a stated reason is just confusion. Name the rule the example violates.
Use realistic input. {"query": "machine learning", "limit": 10} beats {"q": "x", "n": 1}. The model imitates the shape it sees.
Reserve negatives for non-obvious failures. Don't write a negative example for "called the wrong tool" — that's what tool selection is for. Use negatives for sentinel/cursor/idempotency mistakes.
Account for the token cost. Every example ships on every request. A tool that's called once per session can absorb a richer example set than one called every turn.
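
Some of these practices are mechanical enough to check before a tool ships. The following is a minimal lint sketch, assuming a simplified `Example` record rather than the real annotation; the class name `ExampleLintSketch`, the `lint` method, and the warning strings are hypothetical, and the limit of four comes from the guidance above rather than from anything the framework enforces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lint pass over a tool's examples; not part of TnsAI.
final class ExampleLintSketch {

    record Example(String description, boolean negative, String whyBad) {}

    // Taken from the guidance above, not a framework limit.
    static final int MAX_EXAMPLES = 4;

    // Returns one warning per violated practice; an empty list means no findings.
    static List<String> lint(List<Example> examples) {
        List<String> warnings = new ArrayList<>();
        if (examples.size() > MAX_EXAMPLES) {
            warnings.add("more than " + MAX_EXAMPLES + " examples: " + examples.size());
        }
        for (int i = 0; i < examples.size(); i++) {
            Example ex = examples.get(i);
            if (ex.negative() && ex.whyBad().isBlank()) {
                warnings.add("example " + i + ": negative without whyBad");
            }
            if (!ex.negative() && !ex.whyBad().isBlank()) {
                warnings.add("example " + i + ": whyBad set on a positive example");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        List<Example> examples = List.of(
            new Example("Mid-quiz call returns the next question", false, ""),
            new Example("Calling twice in one turn", true, ""));
        lint(examples).forEach(System.out::println);
    }
}
```

A check like this could run in a unit test over every annotated action, so that a negative example without a reason fails the build instead of reaching the model.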
