Introduction
As AI systems move from experimental prototypes to production applications, one challenge consistently emerges: ensuring that language model outputs conform to the precise data structures your application expects. While LLMs excel at generating human-readable text, they struggle with the strict formatting requirements of modern software systems—JSON objects with specific fields, enum values from controlled vocabularies, numerical ranges, and complex nested structures. A single malformed field can cascade into application errors, failed API calls, or corrupted data pipelines.
Self-repair with schema reflection addresses this reliability gap through a deceptively simple pattern: when an LLM generates output that violates a schema, feed the validation errors back to the model and ask it to fix the issues. This creates an iterative refinement loop where the model uses its language understanding to interpret validation failures and generate corrected output. Unlike brittle string parsing or regex-based fixes, schema reflection leverages the model's reasoning capabilities to understand what went wrong and how to correct it.
The power of this pattern lies in its generality and composability. Whether you're extracting structured data from documents, generating API requests, or orchestrating multi-step workflows, schema reflection provides a systematic approach to reliability. It transforms schema validation from a pass-fail gate into an active feedback mechanism that guides the model toward correct output. For AI engineers building production systems, mastering this pattern is essential for achieving the reliability standards that business-critical applications demand.
The Structured Output Problem
Language models are trained on unstructured text and optimized for next-token prediction, not schema compliance. When you prompt an LLM to generate JSON or structured data, you're asking it to perform a task that sits awkwardly outside its core competency. The model must simultaneously handle the semantic content (what information to include), the structural requirements (how to format it), and the syntactic constraints (valid JSON, correct types, required fields). Even powerful models like GPT-4 or Claude regularly produce outputs that fail schema validation—missing required fields, using incorrect types, inventing additional properties, or generating malformed syntax.
The consequences of schema violations vary by application but are universally problematic. In API orchestration systems, malformed requests fail immediately, wasting tokens and user time. In data extraction pipelines, schema errors corrupt downstream processing and require manual intervention. In agent frameworks that chain multiple LLM calls, a single structural error can derail an entire multi-step workflow. Traditional software engineering approaches like type checking and contract validation happen at compile time or through static analysis, but LLM outputs are fundamentally runtime artifacts that can't be validated until after generation.
Early approaches to structured output relied on careful prompting—providing examples, being explicit about required fields, using delimiters or special formatting. These techniques help but remain fundamentally unreliable. No amount of prompt engineering can guarantee schema compliance because the model's generation process doesn't have direct access to schema constraints during token prediction. The model generates text that it predicts will match the pattern you described, but it lacks a verification mechanism to ensure compliance. This gap between generation and validation creates the need for post-generation repair mechanisms.
Understanding Schema Reflection
Schema reflection treats validation errors as structured feedback that guides iterative improvement. The core insight is that modern LLMs are remarkably good at understanding error messages and technical specifications when presented in natural language. When you show a model the schema it was supposed to follow, the output it actually generated, and the specific validation errors that occurred, it can usually determine what needs to change. This capability emerges from the model's training on vast amounts of technical documentation, code, and debugging discussions.
The reflection process operates on three information sources: the original schema specification (JSON Schema, Pydantic model, TypeScript interface, etc.), the generated output that failed validation, and the structured error messages from the validator. These components form a complete picture of what went wrong. The validator provides precise diagnostics—"field 'email' is required but missing" or "value 'maybe' is not in enum ['yes', 'no', 'unknown']"—that the model can interpret and act on. Unlike vague prompts like "try again" or "fix the errors," schema reflection provides actionable, specific guidance for each repair attempt.
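Concretely, a validator like Pydantic produces exactly this kind of path-level diagnostic. A minimal sketch, using a hypothetical `Ticket` model for illustration:

```python
from typing import Literal
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    email: str
    status: Literal["yes", "no", "unknown"]

# A model output that violates the schema: 'email' missing, 'status' off-enum
bad_output = {"status": "maybe"}

feedback: list[str] = []
try:
    Ticket.model_validate(bad_output)
except ValidationError as exc:
    # Each issue names the offending path and the constraint it violated,
    # ready to be pasted into a repair prompt
    feedback = [
        f"field '{'.'.join(map(str, e['loc']))}': {e['msg']}"
        for e in exc.errors()
    ]
```

Feeding these lines back alongside the schema gives the model precise repair targets instead of a bare "try again."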
The technique builds on the broader pattern of using LLMs for error correction and debugging. Just as models can help developers debug code by analyzing error messages and suggesting fixes, they can debug their own outputs when given appropriate feedback. This meta-cognitive capability—reasoning about and correcting one's own outputs—represents a significant shift from treating LLMs as black-box text generators. Schema reflection operationalizes this capability into a reliable engineering pattern that can be measured, optimized, and integrated into production systems.
What distinguishes schema reflection from simpler retry mechanisms is its use of structured, informative feedback. Rather than just prompting the model to "generate valid JSON," you provide the exact validation errors that need to be addressed. This specificity dramatically improves success rates and convergence speed. In practice, most schema violations can be corrected in 1-2 repair iterations when the model receives clear error feedback, compared to the significantly lower success rate of naive retries without reflection.
Core Implementation Pattern
Implementing self-repair with schema reflection requires orchestrating schema validation, error extraction, and iterative model calls within a controlled retry loop. Here's a TypeScript implementation using Zod for schema validation (the JSON Schema helper is stubbed here; production code would use the zod-to-json-schema package):
import { z } from 'zod';
import Anthropic from '@anthropic-ai/sdk';

interface RepairResult<T> {
  data: T;
  success: boolean;
  attempts: number;
  errors: string[];
}

interface RepairConfig {
  maxAttempts?: number;
  temperature?: number;
  onAttempt?: (attempt: number, error: string) => void;
}

class SchemaRepairer<T> {
  constructor(
    private schema: z.ZodSchema<T>,
    private client: Anthropic,
    private config: RepairConfig = {}
  ) {}

  async generate(
    prompt: string,
    systemPrompt?: string
  ): Promise<RepairResult<T>> {
    const maxAttempts = this.config.maxAttempts ?? 3;
    const errors: string[] = [];
    let lastOutput: string = '';

    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        // Construct messages based on attempt number
        const messages = this.buildMessages(
          prompt,
          lastOutput,
          errors[errors.length - 1],
          attempt
        );

        // Generate response
        const response = await this.client.messages.create({
          model: 'claude-3-5-sonnet-20241022',
          max_tokens: 4096,
          temperature: this.config.temperature ?? 0,
          system: systemPrompt,
          messages,
        });

        const content = response.content[0];
        if (content.type !== 'text') {
          throw new Error('Expected text response');
        }
        lastOutput = this.extractJSON(content.text);

        // Parse and validate against schema
        const parsed = JSON.parse(lastOutput);
        const validated = this.schema.parse(parsed);

        // Success - output matches schema
        return {
          data: validated,
          success: true,
          attempts: attempt,
          errors,
        };
      } catch (error) {
        const errorMessage = this.formatError(error);
        errors.push(errorMessage);
        this.config.onAttempt?.(attempt, errorMessage);

        // If this was the last attempt, surface the failure
        if (attempt === maxAttempts) {
          throw new Error(
            `Failed to generate valid output after ${maxAttempts} attempts. ` +
              `Last error: ${errorMessage}`
          );
        }
      }
    }

    // Unreachable: the loop above either returns or throws
    throw new Error('Unreachable');
  }

  private buildMessages(
    originalPrompt: string,
    lastOutput: string,
    lastError: string | undefined,
    attempt: number
  ): Anthropic.MessageParam[] {
    if (attempt === 1) {
      // First attempt - include schema in prompt
      return [
        {
          role: 'user',
          content: `${originalPrompt}

Your response must conform to this schema:
${this.schemaToString()}

Respond with valid JSON only, no additional text.`,
        },
      ];
    }

    // Repair attempt - include previous output and error
    return [
      {
        role: 'user',
        content: originalPrompt,
      },
      {
        role: 'assistant',
        content: lastOutput,
      },
      {
        role: 'user',
        content: `Your previous response failed validation with this error:
${lastError}

Required schema:
${this.schemaToString()}

Please generate a corrected response that conforms to the schema. Focus on fixing the specific validation errors. Respond with valid JSON only.`,
      },
    ];
  }

  private schemaToString(): string {
    // Convert Zod schema to human-readable format
    // In production, you might use JSON Schema generation
    try {
      const jsonSchema = zodToJsonSchema(this.schema);
      return JSON.stringify(jsonSchema, null, 2);
    } catch {
      return this.schema.toString();
    }
  }

  private extractJSON(text: string): string {
    // Extract JSON from markdown code blocks or plain text
    const jsonMatch = text.match(/```(?:json)?\n?([\s\S]*?)\n?```/);
    if (jsonMatch) {
      return jsonMatch[1].trim();
    }
    // Try to find JSON object in text
    const objectMatch = text.match(/\{[\s\S]*\}/);
    if (objectMatch) {
      return objectMatch[0];
    }
    return text.trim();
  }

  private formatError(error: unknown): string {
    if (error instanceof z.ZodError) {
      return error.issues
        .map((issue) => {
          // `received` exists only on some issue types (e.g. invalid_type)
          const received =
            'received' in issue
              ? ` (received: ${JSON.stringify(issue.received)})`
              : '';
          return `- Path "${issue.path.join('.')}": ${issue.message}${received}`;
        })
        .join('\n');
    }
    if (error instanceof SyntaxError) {
      return `JSON parsing error: ${error.message}`;
    }
    return String(error);
  }
}

// Helper to convert Zod to JSON Schema (simplified)
function zodToJsonSchema(schema: z.ZodSchema): object {
  // In production, use the zod-to-json-schema package
  return { type: 'object', description: 'Schema representation' };
}
This implementation demonstrates several critical patterns. First, the message history builds naturally through the conversation—the model sees its previous attempt and the specific errors, creating clear context for repair. Second, error formatting extracts actionable information from Zod's validation errors, translating internal error objects into human-readable descriptions the model can understand. Third, the JSON extraction logic handles common variations in how models output structured data, including markdown code blocks and embedded JSON.
For Python-based systems, the Instructor library provides battle-tested schema reflection with Pydantic integration:
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field, field_validator
from typing import Literal, Optional

# Patch OpenAI client with Instructor
client = instructor.from_openai(OpenAI())

class UserProfile(BaseModel):
    """Structured user profile extracted from text."""

    name: str = Field(description="Full name of the user")
    email: str = Field(description="Email address")
    age: Optional[int] = Field(
        None,
        ge=0,
        le=150,
        description="Age in years"
    )
    subscription_tier: Literal["free", "pro", "enterprise"] = Field(
        description="Subscription level"
    )
    interests: list[str] = Field(
        description="List of interests or hobbies"
    )

    @field_validator('email')
    @classmethod
    def validate_email(cls, v: str) -> str:
        if '@' not in v:
            raise ValueError('Must be a valid email address')
        return v.lower()

# Instructor handles retry and repair automatically
def extract_user_profile(text: str) -> UserProfile:
    return client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        response_model=UserProfile,
        max_retries=3,  # Automatic repair attempts
        messages=[
            {
                "role": "system",
                "content": "Extract structured user profile from text."
            },
            {
                "role": "user",
                "content": text
            }
        ],
    )

# Usage
text = """
Hi, I'm Sarah Johnson (sarah.j@example.com), and I'm 32 years old.
I'm currently on the enterprise plan. I love hiking, photography,
and learning about machine learning.
"""

try:
    profile = extract_user_profile(text)
    print(f"Successfully extracted: {profile.model_dump_json(indent=2)}")
except Exception as e:
    print(f"Failed to extract valid profile: {e}")
The Instructor library abstracts much of the retry logic while exposing hooks for custom validation, error formatting, and repair strategies. Its integration with Pydantic means you get the full power of Python's type system and validation framework, including custom validators, computed fields, and complex nested models. The library automatically converts Pydantic validation errors into prompts that guide the model toward correct output.
Advanced Patterns and Optimization
Production systems benefit from several optimization strategies that improve reliability and reduce costs. Partial repair represents one of the most impactful optimizations: instead of regenerating the entire output when validation fails, extract and repair only the invalid portions. This approach dramatically reduces token usage in scenarios with large outputs and localized errors. Implementation requires careful prompt engineering to show the model which specific fields need repair while keeping the valid portions unchanged.
from pydantic import BaseModel, ValidationError
import json

class PartialRepairer:
    """Repairs only invalid fields instead of regenerating entire output."""

    def __init__(self, client, model: str = "gpt-4"):
        self.client = client
        self.model = model

    def repair_fields(
        self,
        original_data: dict,
        schema: type[BaseModel],
        validation_error: ValidationError
    ) -> dict:
        """Repair only the fields that failed validation."""
        # Extract invalid field paths from validation error
        invalid_paths = {
            '.'.join(map(str, error['loc']))
            for error in validation_error.errors()
        }

        # Build targeted repair prompt
        error_descriptions = '\n'.join(
            f"- {error['loc']}: {error['msg']}"
            for error in validation_error.errors()
        )

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {
                    "role": "system",
                    "content": "Repair only the invalid fields in the JSON object."
                },
                {
                    "role": "user",
                    "content": f"""Original object:
{json.dumps(original_data, indent=2)}

Fields requiring repair: {', '.join(sorted(invalid_paths))}

Validation errors:
{error_descriptions}

Schema requirements:
{schema.model_json_schema()}

Return a JSON object with ONLY the fields that need repair. Keep the same structure as the original."""
                }
            ],
            temperature=0
        )

        # Merge repaired fields back into original data
        repaired_fields = json.loads(response.choices[0].message.content)
        return self.deep_merge(original_data, repaired_fields)

    def deep_merge(self, base: dict, updates: dict) -> dict:
        """Recursively merge updates into base dict."""
        result = base.copy()
        for key, value in updates.items():
            if key in result and isinstance(result[key], dict) and isinstance(value, dict):
                result[key] = self.deep_merge(result[key], value)
            else:
                result[key] = value
        return result
Confidence-based early stopping improves cost efficiency by detecting when continued repair attempts are unlikely to succeed. Some validation errors indicate fundamental misunderstandings that won't be resolved through iteration—the model lacks necessary information, the schema is ambiguous, or the task is genuinely too difficult. Tracking error types across attempts and detecting repeated failures allows intelligent early termination before exhausting retry budgets.
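One way to sketch this, assuming each repair round records a normalized error signature: stop as soon as the same signature repeats, since an identical failure after a repair attempt suggests the model is not converging.

```python
def error_signature(errors: list[dict]) -> str:
    """Normalize a list of validation errors (path + type) into a stable
    signature, ignoring message wording and error order."""
    return "|".join(sorted(
        f"{'.'.join(map(str, e['loc']))}:{e['type']}" for e in errors
    ))

def should_stop_early(signature_history: list[str]) -> bool:
    """Terminate the repair loop when the latest attempt reproduced
    exactly the same validation failures as the previous one."""
    return (
        len(signature_history) >= 2
        and signature_history[-1] == signature_history[-2]
    )
```

The signature deliberately drops message text so that trivially reworded errors still count as repeats.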
Schema simplification and decomposition addresses complex nested schemas by breaking them into smaller validation units. Instead of validating an entire complex object in one pass, validate top-level structure first, then nested components separately. This provides more targeted error feedback and allows partial success—portions of the output that validate correctly can be preserved even if other sections require repair. The trade-off is increased orchestration complexity, but for highly nested schemas with independent validation concerns, decomposition often improves both reliability and token efficiency.
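A sketch of this decomposition, using hypothetical `Order` and `Address` schemas: top-level structure is checked first, then the nested section is validated on its own, so its errors are scoped and valid sections survive.

```python
from pydantic import BaseModel, ValidationError

class Address(BaseModel):
    city: str
    zip_code: str

class Order(BaseModel):
    order_id: str
    shipping: Address

def validate_decomposed(data: dict) -> dict[str, list[str]]:
    """Validate top-level structure and nested sections independently,
    returning errors grouped by section."""
    errors: dict[str, list[str]] = {}
    # Top-level structure: check required keys before deep validation
    for key in ("order_id", "shipping"):
        if key not in data:
            errors.setdefault("structure", []).append(f"missing key '{key}'")
    # Nested section validated on its own, yielding section-scoped errors
    if isinstance(data.get("shipping"), dict):
        try:
            Address.model_validate(data["shipping"])
        except ValidationError as exc:
            errors["shipping"] = [e["msg"] for e in exc.errors()]
    return errors
```

A repair loop can then re-prompt only for the failing section rather than the whole object.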
Error message customization tailors validation feedback to the model's understanding. Default validator error messages target human developers and may include technical jargon or implementation details that don't help the model. Custom error formatters can translate validation failures into clearer natural language: instead of "value is not a valid enumeration member," explain "the field must be exactly one of: 'pending', 'approved', 'rejected'." This optimization requires upfront engineering but pays dividends in reduced repair iterations.
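A sketch of such a formatter over Pydantic v2 error dicts (the error-type strings are Pydantic's own; the phrasings here are illustrative):

```python
def humanize_error(error: dict) -> str:
    """Rewrite a Pydantic error dict as plain guidance for the model,
    falling back to the validator's own message for unknown types."""
    path = ".".join(map(str, error["loc"])) or "(root)"
    kind = error["type"]
    if kind == "missing":
        return f"the field '{path}' is required but was not provided"
    if kind == "literal_error":
        # Pydantic's msg already lists the allowed values
        return f"the field '{path}' {error['msg'].lower()}"
    if kind == "string_type":
        return f"the field '{path}' must be a string, not another type"
    return f"the field '{path}': {error['msg']}"
```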
Integration with Modern AI Frameworks
Schema reflection integrates naturally with function calling and tool use patterns that have become standard in modern LLM APIs. OpenAI's function calling, Anthropic's tool use, and Google's function declarations all provide structured ways to define expected outputs. These APIs perform internal validation and often implement automatic retries, essentially offering built-in schema reflection. However, understanding the underlying pattern remains valuable because you can customize retry logic, error messaging, and validation rules beyond what the API provides.
LangChain's output parsers and structured output chains provide high-level abstractions over schema reflection. The StructuredOutputParser and PydanticOutputParser handle validation, while wrappers such as OutputFixingParser and RetryOutputParser add automatic repair by re-prompting the model with the parse error. For applications already built on LangChain, these parsers integrate seamlessly with existing chains and agents. The abstraction comes at a cost of reduced control over the repair process, but for many use cases, the convenience outweighs this limitation.
import { z } from 'zod';
import { ChatOpenAI } from '@langchain/openai';
import {
  StructuredOutputParser,
  OutputFixingParser,
} from 'langchain/output_parsers';
import { PromptTemplate } from '@langchain/core/prompts';

// Define schema
const productSchema = z.object({
  name: z.string().describe('Product name'),
  price: z.number().positive().describe('Price in USD'),
  category: z.enum(['electronics', 'clothing', 'books', 'food']),
  in_stock: z.boolean().describe('Whether product is in stock'),
  tags: z.array(z.string()).max(5).describe('Product tags'),
});

const model = new ChatOpenAI({
  modelName: 'gpt-4',
  temperature: 0,
});

// Base parser validates against the schema; OutputFixingParser adds
// automatic repair by re-prompting the model with the parse error
const baseParser = StructuredOutputParser.fromZodSchema(productSchema);
const parser = OutputFixingParser.fromLLM(model, baseParser);

const chain = PromptTemplate.fromTemplate(
  `Extract product information from this text.

{format_instructions}

Text: {text}
`
).pipe(model).pipe(parser);

// Parser automatically handles validation and repair
const result = await chain.invoke({
  text: "The UltraWidget Pro costs $199.99. It's an electronics item, currently out of stock. Tagged: premium, wireless, smart, ergonomic",
  format_instructions: parser.getFormatInstructions(),
});

console.log(result);
// {
//   name: "UltraWidget Pro",
//   price: 199.99,
//   category: "electronics",
//   in_stock: false,
//   tags: ["premium", "wireless", "smart", "ergonomic"]
// }
Agent frameworks like LangGraph, AutoGPT, and CrewAI face particularly acute structured output challenges because agent actions must conform to precise tool schemas. Schema reflection becomes a reliability layer that prevents cascading failures in multi-step agent workflows. When an agent generates an invalid tool call, reflection allows automatic correction before execution, avoiding wasted steps and improving overall task completion rates.
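A sketch of that reliability layer, with a hypothetical tool registry mapping tool names to Pydantic argument schemas; an invalid call returns errors for a repair round instead of executing:

```python
from pydantic import BaseModel, ValidationError

class SearchArgs(BaseModel):
    query: str
    max_results: int = 10

# Hypothetical tool registry: tool name -> argument schema
TOOL_SCHEMAS: dict[str, type[BaseModel]] = {"search": SearchArgs}

def check_tool_call(name: str, args: dict):
    """Gate a proposed tool call before execution.
    Returns (validated_args, errors); errors feed the repair prompt."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return None, [f"unknown tool '{name}'"]
    try:
        return schema.model_validate(args), []
    except ValidationError as exc:
        return None, [
            f"{'.'.join(map(str, e['loc']))}: {e['msg']}" for e in exc.errors()
        ]
```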
Trade-offs and Limitations
The most direct trade-off in schema reflection is computational cost versus reliability. Each repair attempt consumes additional tokens and adds latency. For simple schemas with high baseline accuracy, the overhead may exceed the benefits—the cost of validation and potential repair outweighs the cost of occasional failures. This trade-off calculation depends heavily on your failure cost model: if downstream processing is expensive or schema violations have business impact, repair costs are justified. If failures are cheap to detect and retry at the application level, simpler approaches may suffice.
Schema reflection works best when validation errors are informative and actionable. Validators that produce vague errors like "invalid value" or "constraint violation" without specifics provide little guidance for repair. The quality of your schema definitions directly impacts repair success rates—well-documented schemas with clear field descriptions and constraints enable better repair than minimal schemas with bare type definitions. This creates additional upfront engineering cost in schema design that must be balanced against runtime reliability improvements.
The technique cannot overcome fundamental capability limitations. If a model lacks the knowledge to generate required information, no amount of reflection will produce valid output. Schema reflection addresses formatting and structural issues, not knowledge gaps or reasoning failures. When tasks require specialized domain knowledge, rare vocabulary, or complex inference, reflection may converge on structurally valid but semantically incorrect outputs. This limitation highlights the importance of combining schema reflection with other reliability patterns like retrieval-augmented generation and human-in-the-loop validation for knowledge-intensive applications.
Repair attempts can introduce subtle semantic drift from the original intent. Each iteration involves regenerating content, and even with careful prompting to preserve valid fields, the model may make unnecessary changes to working outputs while fixing invalid ones. Production systems should implement monitoring to detect excessive semantic changes across repair iterations, potentially implementing diff-based validation that rejects repairs that alter more than the invalid fields.
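A sketch of such diff-based acceptance: compare the repaired object against the original and reject the repair if any path outside the known-invalid set changed.

```python
def changed_paths(original: dict, repaired: dict, prefix: str = "") -> set[str]:
    """Collect dotted paths whose values differ between two nested dicts."""
    paths: set[str] = set()
    for key in set(original) | set(repaired):
        path = f"{prefix}{key}"
        a, b = original.get(key), repaired.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            paths |= changed_paths(a, b, prefix=f"{path}.")
        elif a != b:
            paths.add(path)
    return paths

def accept_repair(original: dict, repaired: dict, invalid_paths: set[str]) -> bool:
    """Reject a repair that touched any path outside the known-invalid set."""
    return changed_paths(original, repaired) <= invalid_paths
```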
Maximum retry limits must be carefully tuned based on your latency and cost budgets. Too few retries result in preventable failures; too many waste resources on unsalvageable outputs. Empirical analysis of your specific schemas and task types should guide these limits. In practice, most valid repairs succeed within 2-3 attempts when errors are clear—if validation still fails after three attempts, the issue is likely fundamental rather than superficial.
Best Practices for Production Systems
Comprehensive schema design forms the foundation of effective schema reflection. Every field should include descriptions that explain not just the type but the semantic meaning and constraints. Provide examples of valid values, especially for enums and constrained strings. Use validation rules that can generate specific error messages—prefer enum validation over open-ended string fields, use range constraints on numbers, and define max lengths on arrays and strings. The more specific your schema, the more actionable the validation feedback.
Implement multi-level validation strategies that check both structural correctness and semantic validity. Structural validation (JSON syntax, required fields, correct types) catches the most common errors and can be repaired most reliably. Semantic validation (email format, date ranges, cross-field constraints) requires more sophisticated understanding. Consider running these validation levels sequentially—repair structural issues first, then tackle semantic problems. This separation reduces cognitive load on the model and improves convergence rates.
Logging and observability for schema reflection should capture the full repair history: initial generation, validation errors, repair attempts, and final outcomes. This data enables several critical capabilities: debugging problematic schemas that consistently fail validation, identifying patterns in common errors that suggest prompt improvements, detecting when certain task types require higher retry budgets, and measuring the token cost overhead of reflection for cost optimization. Modern observability platforms like LangSmith, Weights & Biases, or custom metrics systems should treat schema reflection as a first-class workflow with dedicated instrumentation.
Graceful degradation strategies handle cases where repair fails to produce valid output within retry limits. Rather than hard failures, consider returning partial results with validation metadata that indicates which fields are reliable. For non-critical fields, you might relax validation constraints after repair attempts fail, accepting best-effort outputs rather than complete failure. For critical applications, queue failed attempts for human review rather than blocking user workflows. The appropriate degradation strategy depends on your domain—a medical application might prefer failure over invalid data, while a content recommendation system might accept some structural imperfection.
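One sketch of partial-result degradation, using a hypothetical `Profile` schema: keep the fields not implicated in any validation error and report the rest as metadata.

```python
from pydantic import BaseModel, ValidationError

class Profile(BaseModel):
    name: str
    age: int

def best_effort(data: dict, schema: type[BaseModel] = Profile) -> dict:
    """After repair budgets are exhausted, return what validated plus
    metadata naming the fields that did not, instead of failing hard."""
    try:
        return {"data": schema.model_validate(data).model_dump(),
                "invalid_fields": []}
    except ValidationError as exc:
        # Drop the whole top-level field implicated in each error
        bad = {str(e["loc"][0]) for e in exc.errors() if e["loc"]}
        kept = {k: v for k, v in data.items() if k not in bad}
        return {"data": kept, "invalid_fields": sorted(bad)}
```

Downstream consumers can then decide per field whether a missing value is acceptable or needs human review.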
Performance optimization should focus on the most expensive cases: large schemas with nested complexity and high token counts. Consider caching validated outputs for identical or similar prompts—once you've successfully generated and validated output for a query, similar queries may benefit from few-shot examples of correct format. For high-throughput applications, implement adaptive retry limits based on schema complexity metrics: simple flat schemas get fewer retries, complex nested schemas get more. This dynamic allocation improves overall system efficiency while maintaining reliability where it matters most.
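A sketch of complexity-based retry allocation over a JSON Schema dict (the scoring weights are arbitrary starting points to tune against your own repair data):

```python
def schema_complexity(schema: dict) -> int:
    """Rough complexity score: count properties, recursing into nested
    objects and array item schemas."""
    props = schema.get("properties", {})
    score = len(props)
    for sub in props.values():
        if sub.get("type") == "object":
            score += schema_complexity(sub)
        elif sub.get("type") == "array" and isinstance(sub.get("items"), dict):
            score += schema_complexity(sub["items"])
    return score

def retry_budget(schema: dict, base: int = 2, cap: int = 5) -> int:
    """Grant one extra repair attempt per ten fields of complexity, capped."""
    return min(cap, base + schema_complexity(schema) // 10)
```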
Mental Models and Key Insights
The compiler analogy provides an intuitive mental model for schema reflection. Just as compilers report type errors and constraint violations to developers who then fix their code, schema validators report errors to LLMs that fix their outputs. The key difference is that LLMs repair at runtime through regeneration rather than static source code editing. This analogy helps engineers understand that schema reflection isn't "cheating" or a hack—it's applying proven software engineering practices (type checking, constraint validation, iterative error correction) to the domain of probabilistic text generation.
Understanding the 80/20 of schema reflection focuses engineering effort where it matters most. The critical 20% that delivers 80% of reliability improvements consists of: clear, well-documented schemas with specific constraints; informative error messages that identify exactly what's wrong; and retry logic that provides previous outputs and errors as context. These three elements—good schemas, good errors, good context—enable successful repair in the vast majority of cases. Sophisticated partial repair, dynamic retry limits, and complex validation strategies provide incremental improvements but aren't necessary for basic effectiveness.
The progressive enhancement pattern treats schema reflection as a reliability layer over best-effort prompting. Start with prompts that encourage structured output through examples and clear instructions. Add schema validation as a quality gate. Introduce repair only when validation fails. This layered approach means you pay the cost of reflection only when needed, while simpler cases succeed on first attempt. It also provides graceful performance degradation—if retry budgets are exhausted or unexpected errors occur, you fall back to validation without repair, and ultimately to unvalidated output with appropriate error handling.
Key Takeaways
Five practical steps for implementing schema reflection in your AI systems:
- Start with explicit, documented schemas: Use Zod, Pydantic, or JSON Schema with detailed field descriptions and constraints. Every field should explain not just the type but the meaning and valid ranges. This investment in schema quality directly translates to repair success rates.
- Implement structured error extraction: Transform validator errors into clear natural language that identifies the specific problem and the expected format. Generic "validation failed" messages provide no actionable guidance; specific "field 'status' must be one of ['active', 'pending', 'inactive'] but received 'running'" enables targeted repair.
- Build message context carefully: Show the model its previous attempt, the specific validation errors, and the schema requirements. This triad of information—what you generated, why it failed, what's required—enables effective self-correction. Don't just retry with the same prompt.
- Set realistic retry limits: Start with 2-3 attempts for typical use cases. Monitor actual repair success rates and adjust based on your schemas and task complexity. Most valid repairs succeed quickly; if you're regularly exhausting retry budgets, the problem likely lies in schema design or task difficulty rather than retry limits.
- Monitor and iterate on schema design: Track which schemas and fields consistently fail validation and require multiple repair attempts. Use this data to refine schema designs, improve field descriptions, or identify tasks that need different approaches. Schema reflection provides a rich feedback signal about what's difficult for the model—use it to continuously improve your system.
Conclusion
Self-repair with schema reflection represents a fundamental shift in how we engineer reliable AI systems. Rather than treating language models as black boxes that either succeed or fail, reflection creates a feedback loop that leverages the model's own reasoning capabilities to correct errors. This pattern transforms schema validation from a binary gate into an active participant in the generation process, guiding models toward correct outputs through iterative refinement.
The technique's true power emerges from its generality and composability. Schema reflection works across languages, frameworks, and model providers. It composes naturally with other reliability patterns like retrieval-augmented generation, multi-agent validation, and human-in-the-loop review. As AI systems grow more complex—orchestrating multiple models, chaining tool calls, managing stateful workflows—the need for reliable structured outputs intensifies. Schema reflection provides a principled, engineering-driven approach to meeting these reliability requirements without sacrificing the flexibility and power that make LLMs valuable.
For AI engineers building production systems today, schema reflection should be a standard component of the reliability toolkit. Combined with thoughtful schema design, comprehensive observability, and appropriate retry strategies, it enables the level of structured output reliability that business-critical applications demand. As the field matures, we'll see continued evolution in how validation feedback guides generation—more sophisticated error analysis, model-specific repair strategies, and tighter integration between schemas and generation processes. But the core insight remains: structured feedback enables reliable structured output, turning validation from a barrier into a bridge between probabilistic generation and deterministic requirements.
References
- Liu, J., Xia, C. S., Wang, Y., & Zhang, L. (2023). "Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation." arXiv preprint arXiv:2305.01210.
- OpenAI. (2024). "Function Calling and Structured Outputs." OpenAI API Documentation. https://platform.openai.com/docs/guides/function-calling
- Anthropic. (2024). "Tool Use (Function Calling) with Claude." Anthropic Documentation. https://docs.anthropic.com/en/docs/tool-use
- Liu, J. (2023). "Instructor: Structured Outputs for LLMs." GitHub Repository. https://github.com/jxnl/instructor
- Pydantic. (2024). "Pydantic V2 Documentation: Data Validation using Python Type Hints." https://docs.pydantic.dev/
- McDonnell, C. (2023). "Zod: TypeScript-first Schema Validation with Static Type Inference." Zod Documentation. https://zod.dev/
- JSON Schema. (2024). "JSON Schema: A Vocabulary for Structural Validation." https://json-schema.org/
- LangChain. (2024). "Output Parsers: Structured Output from Language Models." LangChain Documentation. https://python.langchain.com/docs/modules/model_io/output_parsers/
- Beurer-Kellner, L., Fischer, M., & Vechev, M. (2023). "Guiding Language Models with Type Specifications and Repair." arXiv preprint arXiv:2303.08128.
- Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., Gupta, S., Majumder, B. P., Hermann, K., Welleck, S., Yazdanbakhsh, A., & Clark, P. (2023). "Self-Refine: Iterative Refinement with Self-Feedback." Advances in Neural Information Processing Systems (NeurIPS), 36.