Output Control with JSON and Schemas

Free-form AI output is fragile in production. This article explains how to use JSON and schema validation to make LLM outputs safer, more predictable, and easier to integrate with deterministic systems.

level: intermediate
topics: prompting
tags: prompting, schemas, validation, production, reliability

The Problem with Free-Form Output

Unstructured AI output is a reliability hazard.

Consider this prompt:

Extract the customer's name, email, and order number from this message.

The model might respond in any number of ways:

    "The customer is John Smith, reachable at john@example.com, order #12345."
    Name: John Smith | Email: john@example.com | Order: 12345
    John Smith (john@example.com) placed order 12345.

All are correct, but none are programmatically parseable without fragile regex.


JSON as Output Format

Basic JSON Constraint

prompt = f"""
Extract customer information.

Input: {message}

Output format: JSON
{{
    "name": "string",
    "email": "string",
    "order_number": "string"
}}

Extract:
"""

Benefits:

  • Parseable output
  • Type expectations clear
  • Consistent structure
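
Even with an explicit format instruction, models sometimes wrap the JSON in markdown fences. A minimal parsing sketch (llm.generate stands in for whatever client you use):

import json

def parse_json_response(response: str) -> dict:
    """Parse a model reply that may wrap its JSON in ``` fences."""
    text = response.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1]      # drop opening fence (and optional "json" tag)
        text = text.rsplit("```", 1)[0]    # drop closing fence
    return json.loads(text)

data = parse_json_response(llm.generate(prompt))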

Schema-Driven Prompting

Define Schema First

from pydantic import BaseModel, EmailStr

class CustomerInfo(BaseModel):
    name: str
    email: EmailStr
    order_number: str

# Generate prompt from schema
prompt = f"""
Extract customer information.

Input: {message}

Output format (JSON):
{CustomerInfo.model_json_schema()}

Extract:
"""

Validate Before Using

from pydantic import ValidationError

response = llm.generate(prompt)

try:
    # Parse and validate
    data = CustomerInfo.model_validate_json(response)

    # Now safe to use
    send_email(data.email)
    lookup_order(data.order_number)

except ValidationError as e:
    # Handle invalid output
    log_error("Schema validation failed", e)
    retry_with_clarification()
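
retry_with_clarification is left abstract above. One possible sketch, again assuming the generic llm.generate client, feeds the validation error back to the model a bounded number of times:

def retry_with_clarification(prompt: str, error: ValidationError, max_retries: int = 2) -> CustomerInfo:
    """Re-prompt with the validation error attached, up to max_retries times."""
    for _ in range(max_retries):
        retry_prompt = (
            f"{prompt}\n\nYour previous output failed validation:\n"
            f"{error}\nReturn corrected JSON only."
        )
        try:
            return CustomerInfo.model_validate_json(llm.generate(retry_prompt))
        except ValidationError as e:
            error = e
    raise error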

Benefits:

  • Type safety
  • Validation errors caught early
  • Self-documenting code

Nested and Complex Schemas

Multi-Level Data

class Product(BaseModel):
    name: str
    price: float
    quantity: int

class Order(BaseModel):
    customer_name: str
    customer_email: EmailStr
    products: list[Product]
    total: float
    notes: str | None = None

prompt = f"""
Extract order information.

Input: {order_text}

Output format (JSON):
{{
    "customer_name": "string",
    "customer_email": "email",
    "products": [
        {{"name": "str", "price": "float", "quantity": "int"}}
    ],
    "total": "float",
    "notes": "string or null"
}}

Extract:
"""

Enum Constraints

Problem: Unreliable Classification

# Free-form response
prompt = "Classify this email as urgent, normal, or low priority"
# Might return: "Urgent", "URGENT", "urgent!", "very urgent", "high priority"

Solution: Enum Schema

from enum import Enum
from typing import Literal

class Priority(str, Enum):
    URGENT = "urgent"
    NORMAL = "normal"
    LOW = "low"

class EmailClassification(BaseModel):
    priority: Priority
    category: Literal["support", "sales", "billing"]
    requires_response: bool

prompt = f"""
Classify this email.

Input: {email_text}

Output format (JSON):
{{
    "priority": "urgent" | "normal" | "low",
    "category": "support" | "sales" | "billing",
    "requires_response": true | false
}}

Classify:
"""

Benefits:

  • Only valid values accepted
  • No parsing ambiguity
  • Downstream code doesn’t break

Optional vs Required Fields

class ProductReview(BaseModel):
    rating: int  # Required
    review_text: str  # Required
    reviewer_name: str | None = None  # Optional
    would_recommend: bool = True  # Default value

prompt = f"""
Extract product review.

Input: {review}

Output format (JSON):
{{
    "rating": int (1-5, required),
    "review_text": "string (required)",
    "reviewer_name": "string or null (optional)",
    "would_recommend": bool (default: true)
}}

Extract:
"""

Validation Rules Beyond Types

Field Constraints

from pydantic import Field, field_validator

def is_profane(text: str) -> bool:
    return False  # placeholder: plug in your own content filter

class UserProfile(BaseModel):
    username: str = Field(min_length=3, max_length=20, pattern="^[a-zA-Z0-9_]+$")
    age: int = Field(ge=13, le=120)
    bio: str = Field(max_length=500)

    @field_validator('username')
    @classmethod
    def username_must_not_be_profane(cls, v):
        if is_profane(v):
            raise ValueError('Username contains inappropriate content')
        return v
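
These constraints fire at validation time, so out-of-range model output is rejected before it reaches your code:

from pydantic import ValidationError

try:
    UserProfile(username="ab", age=7, bio="hi")
except ValidationError as e:
    print(e.error_count())  # 2: username too short, age below minimum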

Handling Extraction Failures

Graceful Degradation

class ExtractionResult(BaseModel):
    success: bool
    data: dict | None = None
    error: str | None = None
    confidence: Literal["high", "medium", "low"]

prompt = f"""
Extract structured data from text.

Input: {text}

Output format (JSON):
{{
    "success": true | false,
    "data": {{...}} | null,
    "error": "string if success=false" | null,
    "confidence": "high" | "medium" | "low"
}}

Rules:
- If extraction succeeds, set success=true and populate data
- If text is ambiguous or incomplete, set success=false, error="reason"
- Always provide confidence level

Extract:
"""

result = ExtractionResult.model_validate_json(response)

if result.success:
    process_data(result.data)
elif result.confidence == "low":
    request_human_review(text)
else:
    log_error(result.error)

Array Constraints

class Article(BaseModel):
    title: str
    tags: list[str] = Field(min_length=1, max_length=5)
    authors: list[str] = Field(min_length=1)
    related_articles: list[str] = Field(default_factory=list)

prompt = f"""
Extract article metadata.

Output format (JSON):
{{
    "title": "string",
    "tags": ["string"] (1-5 tags required),
    "authors": ["string"] (at least 1 required),
    "related_articles": ["string"] (optional, can be empty)
}}
"""

Unions and Discriminated Types

class ErrorResponse(BaseModel):
    type: Literal["error"]
    message: str
    code: str

class SuccessResponse(BaseModel):
    type: Literal["success"]
    data: dict

Response = ErrorResponse | SuccessResponse

import json

def parse_response(response_json: str) -> Response:
    data = json.loads(response_json)
    if data["type"] == "error":
        return ErrorResponse.model_validate(data)
    else:
        return SuccessResponse.model_validate(data)
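
pydantic v2 can also perform this dispatch itself via a discriminated union, which yields more precise error messages (a sketch):

from typing import Annotated
from pydantic import Field, TypeAdapter

ResponseAdapter = TypeAdapter(
    Annotated[ErrorResponse | SuccessResponse, Field(discriminator="type")]
)

result = ResponseAdapter.validate_json(response_json)  # returns the matching model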

Schema Evolution

Version Schemas

class OrderV1(BaseModel):
    customer_name: str
    items: list[str]
    total: float

class OrderV2(BaseModel):
    customer_name: str
    customer_email: EmailStr  # New required field
    items: list[dict]  # Now structured
    total: float
    tax: float = 0.0  # New optional field

# Use appropriate schema based on context
if api_version == "v1":
    schema = OrderV1
else:
    schema = OrderV2
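
When a new version adds required fields, old records need an explicit upgrade path. A sketch (the email argument here is hypothetical; supply it from wherever the missing data lives):

def upgrade_order(v1: OrderV1, email: str) -> OrderV2:
    return OrderV2(
        customer_name=v1.customer_name,
        customer_email=email,  # new required field must be supplied
        items=[{"name": name} for name in v1.items],
        total=v1.total,
    )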

Real-World Example: Form Extraction

class Address(BaseModel):
    street: str
    city: str
    state: str = Field(pattern="^[A-Z]{2}$")
    zip_code: str = Field(pattern="^\\d{5}(-\\d{4})?$")

class ContactForm(BaseModel):
    first_name: str = Field(min_length=1)
    last_name: str = Field(min_length=1)
    email: EmailStr
    phone: str = Field(pattern="^\\+?1?\\d{10,}$")
    address: Address
    inquiry_type: Literal["sales", "support", "general"]
    message: str = Field(min_length=10, max_length=1000)

prompt = f"""
Extract contact form information.

Input: {form_text}

Output format (JSON):
{ContactForm.model_json_schema()}

Validation rules:
- first_name, last_name: Non-empty strings
- email: Valid email format
- phone: US format, 10+ digits
- state: 2-letter abbreviation (e.g., "CA")
- zip_code: 5 digits or 5+4 format
- inquiry_type: One of: sales, support, general
- message: 10-1000 characters

Extract:
"""

try:
    form = ContactForm.model_validate_json(llm.generate(prompt))
    # All validation passed, safe to process
    process_form(form)
except ValidationError as e:
    # Send back to LLM with error details for retry
    retry_prompt = f"""
Previous extraction failed validation:
{e}

Please re-extract following all rules exactly.
Input: {form_text}
"""
    form = ContactForm.model_validate_json(llm.generate(retry_prompt))
    process_form(form)

Performance Optimization

Caching Schema Strings

# Don't regenerate schema in every prompt
SCHEMAS = {
    "contact_form": ContactForm.model_json_schema(),
    "order": Order.model_json_schema(),
    "review": Review.model_json_schema()
}

# Reuse cached schemas
prompt = f"""
Input: {data}
Output format: {SCHEMAS['contact_form']}
"""

Streaming + Validation

from pydantic import BaseModel, ValidationError

async def extract_with_streaming(prompt: str, schema: type[BaseModel]):
    buffer = ""
    async for chunk in llm.stream(prompt):
        buffer += chunk

        # Try parsing incrementally; partial JSON fails validation
        try:
            return schema.model_validate_json(buffer)  # valid JSON received
        except ValidationError:
            continue  # keep accumulating

    raise ValueError("Stream ended without valid JSON")
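
Usage, assuming an async llm.stream client that yields text chunks:

import asyncio

form = asyncio.run(extract_with_streaming(prompt, ContactForm))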

Common Mistakes

❌ No validation, just parse JSON

# Dangerous: Assumes JSON is correct
data = json.loads(response)
send_email(data["email"])  # Might not be valid email

❌ Overly complex schemas

# Too nested, LLMs struggle with deep nesting
class Level5Nested(BaseModel):
    a: dict[str, list[dict[str, list[dict]]]]

❌ Not handling validation failures

# Missing try/except means unhandled exceptions
data = Schema.model_validate_json(response)

Best Practices

  1. Define schemas before prompting
  2. Always validate before using
  3. Keep schemas simple (max 3 levels deep)
  4. Provide clear examples in prompts
  5. Log validation failures for debugging
  6. Retry with error feedback when validation fails
  7. Version your schemas

Conclusion

JSON plus schema validation transforms an LLM from an unreliable text generator into a structured data source.

Key benefits:

  • Predictable outputs: Same structure every time
  • Type safety: Downstream code doesn’t break
  • Validation: Catch errors before they propagate
  • Maintainability: Schemas document expected outputs

Free-form text is fine for humans. Production systems need structure.
