Designing Tools That LLMs Can Actually Use Reliably

— LLMs can call functions and APIs, but they'll make mistakes you'd never see from human developers. Here's how to design tool interfaces that minimize errors.

level: intermediate
topics: agents, tool-use, api-design
tags: agents, tools, api-design, reliability

You give an LLM access to a function that queries a database. It calls the function with the wrong parameter type. Or it uses a parameter name that doesn’t exist. Or it calls the function when it shouldn’t, wasting API quota.

These aren’t edge cases—they’re common failure modes. LLMs aren’t compilers. They don’t enforce type safety, they don’t throw helpful errors when parameters are wrong, and they don’t inherently understand your API’s preconditions.

If you design tools the way you’d design them for human developers, your agent will fail frequently. Design them for LLMs, and reliability improves dramatically.

The Parameter Problem

Human developers read documentation, understand types, and use IDEs that autocomplete parameter names. LLMs have none of these affordances.

String parameters are dangerous: If a parameter accepts an enum of valid values (like “status” being “active”, “pending”, or “closed”), the LLM might hallucinate “in_progress” or “complete” instead. Then your function fails because it received an invalid value.

Better: explicitly enumerate valid options in the tool description. Even better: use structured schemas that the LLM API can validate before calling your function.
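For example, a JSON-Schema-style function definition can list the allowed values directly, so the model sees the options and the calling layer can reject anything else. The tool and field names below are made up for illustration:

```python
# Hypothetical tool schema: the "enum" field lists every valid status value,
# so the model sees the allowed options and invalid values can be rejected
# before your function ever runs.
GET_TICKETS_TOOL = {
    "name": "get_tickets",
    "description": "List support tickets filtered by status.",
    "parameters": {
        "type": "object",
        "properties": {
            "status": {
                "type": "string",
                "enum": ["active", "pending", "closed"],
                "description": "Ticket status. Must be exactly one of the listed values.",
            }
        },
        "required": ["status"],
        "additionalProperties": False,
    },
}
```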

Optional parameters cause confusion: If a function has 5 optional parameters, the LLM might guess which ones are needed or omit critical ones. Each optional parameter multiplies the complexity of correct usage.

Better: create separate tools for different use cases. Instead of one “search” function with optional filters, create “search_by_date”, “search_by_author”, and “search_all”. Simpler tools are more reliably used.
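A sketch of that split, using hypothetical document-search tools; each schema is small enough that there is little left to guess:

```python
# Three narrow tools instead of one "search" with five optional filters.
SEARCH_TOOLS = [
    {
        "name": "search_by_date",
        "description": "Search documents created within a date range.",
        "parameters": {
            "type": "object",
            "properties": {
                "start_date": {"type": "string", "description": "YYYY-MM-DD"},
                "end_date": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["start_date", "end_date"],
        },
    },
    {
        "name": "search_by_author",
        "description": "Search documents written by a specific author.",
        "parameters": {
            "type": "object",
            "properties": {"author": {"type": "string", "description": "Exact author name"}},
            "required": ["author"],
        },
    },
    {
        "name": "search_all",
        "description": "Full-text search across all documents. Use only when no filter applies.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "Search keywords"}},
            "required": ["query"],
        },
    },
]
```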

Complex nested objects are error-prone: If a parameter expects a nested JSON structure, the LLM might get the nesting wrong or misspell keys.

Better: flatten parameters when possible. Instead of passing {"user": {"id": 123, "role": "admin"}}, accept user_id and user_role as separate parameters.
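A quick before-and-after sketch (field names and role values are illustrative):

```python
# Error-prone: a nested object the model has to assemble correctly.
nested_params = {"user": {"id": 123, "role": "admin"}}

# Easier to get right: flat fields, each validated independently.
flat_schema = {
    "type": "object",
    "properties": {
        "user_id": {"type": "integer", "description": "Numeric user ID, not a string"},
        "user_role": {"type": "string", "enum": ["admin", "member", "viewer"]},
    },
    "required": ["user_id", "user_role"],
}
```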

Descriptions Are Your Documentation

The LLM doesn’t read your codebase or internal documentation. It only knows what you tell it in the tool description.

Be explicit about when to use the tool: Don’t just describe what the tool does—explain when it’s appropriate. “Use this tool to find customer account information when the user provides an email address or account ID. Do not use this for general search queries.”

Specify parameter formats precisely: Not “user ID”, but “user ID as an integer (not a string)”. Not “date”, but “date in YYYY-MM-DD format”.

Warn about common mistakes: If a parameter is commonly misused, say so. “Note: The ‘query’ parameter should be a search term, not a full question. Extract keywords before calling this tool.”

Explain error cases: “This tool returns null if the account doesn’t exist. Check the result before proceeding.”

Think of tool descriptions as guardrails. The more clearly you specify correct usage, the fewer errors you’ll see.
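Putting those pieces together, a single tool definition might read like the sketch below. Everything here is hypothetical, but note how the description covers when to use the tool, parameter formats, and the null-return case:

```python
# Hypothetical tool whose description bakes in usage guidance, format
# specifications, a warning, and the error behavior.
LOOKUP_ACCOUNT_TOOL = {
    "name": "lookup_customer_account",
    "description": (
        "Look up a customer account when the user provides an email address or "
        "account ID. Do not use this for general search queries. "
        "account_id must be an integer (not a string); email must be a full "
        "address such as name@example.com. "
        "Returns null if the account does not exist; check the result before proceeding."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "account_id": {"type": "integer", "description": "Numeric account ID"},
            "email": {"type": "string", "description": "Full email address"},
        },
    },
}
```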

Idempotency and Retry Safety

LLMs sometimes call the same tool twice by mistake. They might retry after an error, or they might repeat a call because the conversation context is confusing.

If your tool has side effects (creating records, sending emails, charging payments), duplicate calls cause problems.

Make tools idempotent when possible: If the LLM calls “create_customer” twice with the same email, don’t create two records—return the existing one or error gracefully.
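A minimal sketch of that behavior, assuming a hypothetical db interface with lookup and insert methods:

```python
def create_customer(email: str, name: str, db) -> dict:
    """Idempotent sketch: a second call with the same email returns the
    existing record instead of creating a duplicate."""
    existing = db.find_customer_by_email(email)  # hypothetical lookup
    if existing is not None:
        return {"status": "already_exists", "customer": existing}
    customer = db.insert_customer(email=email, name=name)  # hypothetical insert
    return {"status": "created", "customer": customer}
```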

Require confirmation for destructive actions: Don’t let the LLM directly call “delete_account” or “charge_credit_card”. Instead, return a confirmation prompt to the user and only execute after explicit human approval.
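One way to structure that, sketched with illustrative names: the model can only request the action, and execution happens in application code after a human approves:

```python
import uuid

PENDING_ACTIONS: dict[str, dict] = {}

def request_account_deletion(account_id: int) -> dict:
    """Exposed to the LLM: records the request and asks for confirmation.
    Nothing destructive happens here."""
    token = str(uuid.uuid4())
    PENDING_ACTIONS[token] = {"action": "delete_account", "account_id": account_id}
    return {
        "status": "confirmation_required",
        "confirmation_token": token,
        "message": "Ask the user to confirm deletion; it only runs after they approve.",
    }

def execute_if_approved(token: str, approved_by_human: bool) -> dict:
    """Called by your application after the user explicitly confirms, never by the LLM."""
    action = PENDING_ACTIONS.pop(token, None)
    if action is None or not approved_by_human:
        return {"status": "cancelled"}
    # ... perform the destructive action here ...
    return {"status": "executed", "action": action}
```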

Use request IDs for deduplication: If your infrastructure supports it, include a unique request ID with each tool call. If the same ID is called twice, treat it as a retry and return the cached result.
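A small deduplication wrapper along those lines; the in-memory cache here stands in for whatever store your infrastructure actually provides:

```python
import functools

_SEEN_RESULTS: dict[str, dict] = {}

def deduplicated(tool_fn):
    """If the same request_id arrives twice, return the cached result
    instead of re-running the side effect."""
    @functools.wraps(tool_fn)
    def wrapper(request_id: str, **params):
        if request_id in _SEEN_RESULTS:
            return _SEEN_RESULTS[request_id]  # treat as a retry
        result = tool_fn(**params)
        _SEEN_RESULTS[request_id] = result
        return result
    return wrapper
```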

Error Messages the LLM Can Understand

When a tool call fails, you return an error. But a stack trace or generic error message won’t help the LLM recover.

Make errors actionable: Instead of “Invalid parameter”, say “The ‘status’ parameter must be one of: active, pending, closed. You provided: in_progress.”

Suggest corrections: “User ID not found. Did you mean to search by email instead? Use the search_user_by_email tool.”

Distinguish retryable from non-retryable errors: If the error is a rate limit, the LLM can retry later. If the error is a nonexistent resource, retrying won’t help. Make this clear in the error message.
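For instance, a validation helper might return a structured error payload that names the allowed values and flags whether a retry makes sense (the field names are illustrative):

```python
def validate_status(value: str) -> dict | None:
    """Returns None if the value is valid, otherwise an actionable,
    LLM-readable error payload."""
    allowed = ["active", "pending", "closed"]
    if value in allowed:
        return None
    return {
        "error": "invalid_parameter",
        "message": (
            f"The 'status' parameter must be one of: {', '.join(allowed)}. "
            f"You provided: {value}."
        ),
        "retryable": True,  # the model can retry with a corrected value
        "suggestion": "Map the user's wording onto the closest allowed status before calling again.",
    }
```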

Scoping Tool Capabilities

The more powerful a tool, the harder it is for an LLM to use correctly.

Narrow tools are more reliable: Instead of one “database_query” tool that accepts arbitrary SQL, create specific tools: “get_customer_by_email”, “list_recent_orders”, “update_shipping_address”. The LLM can’t make SQL injection mistakes if it’s not writing SQL.
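A narrow query tool might look like this sketch, using sqlite3 and a hypothetical customers table; the model only ever supplies the email value, never SQL text:

```python
import sqlite3

def get_customer_by_email(email: str, conn: sqlite3.Connection) -> dict | None:
    """Fixed-purpose query with a parameterized placeholder: injection and
    malformed SQL are off the table because the model never writes SQL."""
    row = conn.execute(
        "SELECT id, email, name FROM customers WHERE email = ?",
        (email,),
    ).fetchone()
    if row is None:
        return None
    return {"id": row[0], "email": row[1], "name": row[2]}
```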

Limit blast radius: If a tool can modify data, limit how much damage a single call can do. A “bulk_delete” operation is dangerous. A “delete_single_record” operation with confirmation is safer.

Progressive disclosure: Start with read-only tools. Only add write tools when you’ve validated the agent’s behavior. Only add admin-level tools after extensive testing.

Validation at the Boundary

Don’t trust LLM-provided parameters. Validate everything before execution.

Type checking: Even if the LLM API has schema validation, add your own checks. Ensure integers are actually integers, emails are well-formed, dates parse correctly.

Authorization checks: Verify the agent has permission to perform the action. Just because the LLM requested it doesn’t mean it should be allowed.

Rate limiting: Prevent runaway behavior. If the agent calls the same tool 50 times in a loop, something’s wrong. Detect and halt this.

Sanity checks: If a tool call doesn’t make sense in context (like calling “get_account_balance” before “authenticate_user”), reject it or warn the LLM.
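A rough boundary guard combining these checks might look like the following; the tool names, permission model, and rate limit are all placeholders:

```python
import time
from collections import defaultdict

_CALL_TIMES: dict[str, list[float]] = defaultdict(list)
MAX_CALLS_PER_MINUTE = 20  # illustrative limit

def guard_tool_call(tool_name: str, params: dict, agent_permissions: set[str]) -> dict | None:
    """Returns an error payload if the call should be blocked, or None if it may proceed."""
    # Type check: don't rely on schema validation alone.
    if tool_name == "get_customer_by_email" and not isinstance(params.get("email"), str):
        return {"error": "invalid_parameter", "message": "email must be a string"}

    # Authorization: the agent must be allowed to call this tool at all.
    if tool_name not in agent_permissions:
        return {"error": "not_authorized", "message": f"Agent may not call {tool_name}"}

    # Rate limit: detect runaway loops calling the same tool repeatedly.
    now = time.monotonic()
    recent = [t for t in _CALL_TIMES[tool_name] if now - t < 60]
    if len(recent) >= MAX_CALLS_PER_MINUTE:
        return {"error": "rate_limited", "message": "Too many calls to this tool; stop and ask the user."}
    _CALL_TIMES[tool_name] = recent + [now]

    return None  # call may proceed
```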

Designing for Debuggability

When agents fail, you need to understand why.

Log all tool calls: Record what the LLM requested, what parameters it provided, and what your tool returned. This creates an audit trail for debugging.
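A simple audit-logging wrapper, as a sketch:

```python
import functools
import json
import logging

logger = logging.getLogger("tool_calls")

def logged(tool_fn):
    """Records the requested tool, its parameters, and the result
    (or exception) for every call."""
    @functools.wraps(tool_fn)
    def wrapper(**params):
        logger.info("tool=%s params=%s", tool_fn.__name__, json.dumps(params, default=str))
        try:
            result = tool_fn(**params)
            logger.info("tool=%s result=%s", tool_fn.__name__, json.dumps(result, default=str))
            return result
        except Exception:
            logger.exception("tool=%s failed", tool_fn.__name__)
            raise
    return wrapper
```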

Include context in errors: When returning an error, include enough information to understand the failure without needing to check logs separately.

Make tool behavior deterministic: If your tool has randomness or side effects, debugging becomes harder. Keep tool behavior predictable so failures are reproducible.

The Feedback Loop

Your tools will evolve as you observe agent behavior in production.

Monitor tool usage patterns: Which tools are called most? Which tools frequently error? Which tools are never used?

Identify common errors: If the LLM consistently passes wrong parameter types or calls tools inappropriately, improve descriptions or redesign the interface.

Deprecate poorly designed tools: If a tool is frequently misused despite clear documentation, it’s not the LLM’s fault—it’s a bad interface. Redesign it or replace it.

What Good Tool Design Looks Like

A well-designed tool for LLM use:

  • Has a clear, single responsibility
  • Accepts simple, flat parameters with explicit types
  • Includes detailed descriptions with usage guidelines and format specifications
  • Returns structured, parseable results
  • Handles errors gracefully with actionable error messages
  • Is idempotent or protected against duplicate calls
  • Validates all inputs rigorously
  • Logs all usage for debugging

This might feel like over-engineering compared to APIs designed for humans. But LLMs aren’t humans. They don’t read documentation carefully, they don’t have compile-time checks, and they don’t intuitively understand your domain.

Design tools with these constraints in mind, and your agents become vastly more reliable. Ignore them, and you’ll spend most of your time debugging why the LLM called the wrong function with the wrong parameters at the wrong time.