Introduction
When Robert C. Martin formalized the SOLID principles in the early 2000s, he was addressing a fundamental problem in object-oriented design: systems become unmaintainable when responsibilities blur, dependencies tangle, and modifications cascade unpredictably. Two decades later, we face precisely the same problem in a new domain. As organizations scale their use of large language models, the prompts that drive these systems exhibit the exact same pathologies that SOLID was designed to prevent. A single prompt tries to do too much. Modifications break existing functionality. Subtle variations require complete rewrites. The parallel isn't metaphorical—it's structural.
The discipline of prompt engineering has evolved rapidly from simple question-and-answer patterns to complex multi-step orchestrations involving retrieval, reasoning, validation, and formatting. As these systems grow more sophisticated, the engineering challenges mirror those faced by early object-oriented practitioners. Prompts become the fundamental units of composition in LLM systems, just as classes became the fundamental units of composition in OOP. The same design principles that made object-oriented systems maintainable apply directly to making prompt-based systems reliable. When you understand why your prompt is failing, you'll often find it's committing the same SOLID violation you'd reject in a code review.
The SOLID Principles: A Brief Recap
SOLID represents five foundational principles: Single Responsibility Principle (SRP), Open/Closed Principle (OCP), Liskov Substitution Principle (LSP), Interface Segregation Principle (ISP), and Dependency Inversion Principle (DIP). Each addresses a specific failure mode in system design. SRP demands that each module have one reason to change. OCP requires systems to be open for extension but closed for modification. LSP ensures that substitutable components maintain behavioral contracts. ISP prevents clients from depending on interfaces they don't use. DIP insists that high-level modules shouldn't depend on low-level details. Together, these principles create systems that are maintainable, testable, and adaptable to change.
These principles emerged from observing real codebases under stress. Martin didn't invent them abstractly—he identified patterns that separated systems that could evolve from those that calcified. The principles work because they align technical structure with organizational reality. Code changes for business reasons, and SOLID minimizes the blast radius of those changes. The same forces apply to prompt-based systems. Your marketing team changes brand voice guidelines. Your legal team updates compliance requirements. Your product team refines the user experience. Each change should touch the minimum necessary surface area. When a prompt tries to handle multiple concerns simultaneously, every change becomes risky.
Single Responsibility Principle: The Most Violated Rule in Prompt Design
The single responsibility principle states that a class should have only one reason to change. In prompt engineering, this translates directly: a prompt should have only one reason to be rewritten. The most common anti-pattern is the "god prompt"—a monolithic instruction set that simultaneously handles input validation, business logic, output formatting, error handling, and tone management. Consider a typical customer service prompt that tries to validate the inquiry, determine intent, retrieve relevant information, generate a response, format it appropriately, and ensure brand compliance all in one pass. When any single aspect needs adjustment—when the brand voice changes, when the formatting requirements evolve, when new intents are added—the entire prompt must be rewritten and re-validated.
This violation manifests in unreliable behavior that's difficult to debug. When a god prompt fails, you can't isolate which responsibility caused the failure. Did the intent classification go wrong? Did the retrieval step miss context? Is the formatting broken? Did tone drift occur? The debugging process becomes archaeological, digging through a wall of interleaved instructions trying to understand interaction effects. More insidiously, the prompt becomes unmaintainable. Team members fear touching it because changes in one area mysteriously break behavior in another. The prompt accretes defensive instructions—bandages for edge cases—until it's a brittle, complex artifact that nobody fully understands.
The solution mirrors the one in object-oriented design: decompose by responsibility. A prompt that currently handles inquiry routing, sentiment analysis, response generation, and formatting should be four prompts, each with a single job. The inquiry router classifies intent and nothing else. The sentiment analyzer evaluates emotional content in isolation. The response generator focuses purely on content quality given classified intent and sentiment. The formatter takes well-formed content and structures it according to output requirements. Each component becomes independently testable, debuggable, and modifiable. When brand voice changes, you touch only the response generator. When output format requirements shift, only the formatter changes. The blast radius of modifications shrinks dramatically.
# SOLID Violation: God Prompt
def handle_customer_inquiry(inquiry: str) -> str:
    """A monolithic prompt that does everything."""
    prompt = f"""
You are a customer service assistant.
Analyze this inquiry and:
1. Determine if it's a complaint, question, or request
2. Check for urgent or angry sentiment
3. If it's a refund request, verify it's within 30 days
4. If it's a technical issue, classify the severity
5. Generate an appropriate response with empathetic tone
6. Format the response in HTML with proper styling
7. Ensure compliance with company voice guidelines
8. Add appropriate disclaimers if discussing refunds
Inquiry: {inquiry}
Provide your response as formatted HTML.
"""
    return llm.generate(prompt)
# SOLID Compliant: Single Responsibility per Prompt
class InquiryClassifier:
    """One job: determine inquiry type."""

    def classify(self, inquiry: str) -> InquiryType:
        prompt = f"""Classify this customer inquiry into one category:
- complaint
- question
- refund_request
- technical_issue
Inquiry: {inquiry}
Respond with only the category name."""
        return InquiryType(llm.generate(prompt).strip())


class SentimentAnalyzer:
    """One job: evaluate emotional content."""

    def analyze(self, inquiry: str) -> Sentiment:
        prompt = f"""Rate the sentiment of this message on a scale:
calm, concerned, frustrated, angry
Message: {inquiry}
Respond with only the sentiment level."""
        return Sentiment(llm.generate(prompt).strip())


class ResponseGenerator:
    """One job: create appropriate content."""

    def generate(self, inquiry: str, inquiry_type: InquiryType,
                 sentiment: Sentiment) -> str:
        prompt = f"""Generate a customer service response to this {inquiry_type.value}.
The customer appears {sentiment.value}.
Focus only on content quality and appropriate tone.
Inquiry: {inquiry}"""
        return llm.generate(prompt)


class ResponseFormatter:
    """One job: structure output."""

    def format_html(self, response: str, disclaimers: list[str]) -> str:
        prompt = f"""Format this response as clean HTML with:
- Professional styling
- Proper paragraph breaks
- Any disclaimers at the end
Response: {response}
Disclaimers: {disclaimers}"""
        return llm.generate(prompt)


# Orchestration layer composes single-responsibility components
def handle_customer_inquiry(inquiry: str) -> str:
    inquiry_type = InquiryClassifier().classify(inquiry)
    sentiment = SentimentAnalyzer().analyze(inquiry)
    response = ResponseGenerator().generate(inquiry, inquiry_type, sentiment)
    disclaimers = get_disclaimers_for_type(inquiry_type)
    return ResponseFormatter().format_html(response, disclaimers)
The decomposed version is more code, but it's dramatically more maintainable code. Each component can be tested in isolation with clear success criteria. The classifier either correctly identifies inquiry types or it doesn't—no ambiguity. The sentiment analyzer can be validated against labeled examples. The response generator can be evaluated for content quality independently of formatting. When you need to add a new inquiry type, you modify only the classifier's logic and add a corresponding response template. The formatting and sentiment analysis remain untouched and don't need revalidation.
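To make "testable in isolation" concrete, here is a minimal sketch of testing the classifier against a deterministic stub. It assumes the LLM client can be injected rather than referenced globally as in the examples above; `FakeLLM` and the constructor parameter are illustrative, not part of the original design:

```python
from enum import Enum

class InquiryType(Enum):
    COMPLAINT = "complaint"
    QUESTION = "question"
    REFUND_REQUEST = "refund_request"
    TECHNICAL_ISSUE = "technical_issue"

class FakeLLM:
    """Deterministic stand-in for the real model client."""
    def __init__(self, canned_response: str):
        self.canned_response = canned_response
        self.last_prompt = None

    def generate(self, prompt: str) -> str:
        self.last_prompt = prompt  # recorded so tests can inspect the prompt
        return self.canned_response

class InquiryClassifier:
    """Same single job as above, with the client injected for testability."""
    def __init__(self, llm):
        self.llm = llm

    def classify(self, inquiry: str) -> InquiryType:
        prompt = (
            "Classify this customer inquiry into one category:\n"
            "- complaint\n- question\n- refund_request\n- technical_issue\n"
            f"Inquiry: {inquiry}\n"
            "Respond with only the category name."
        )
        return InquiryType(self.llm.generate(prompt).strip())

# The contract is crisp: a model reply either maps to a valid type or it doesn't.
fake = FakeLLM("  refund_request\n")
result = InquiryClassifier(fake).classify("I want my money back for order 4821")
assert result is InquiryType.REFUND_REQUEST
assert "refund_request" in fake.last_prompt
```

Because the component has one responsibility, its test needs exactly one stubbed response and one assertion per behavior, with no formatting or tone concerns in scope.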
Open/Closed Principle: Extending Without Breaking
The open/closed principle demands that software entities should be open for extension but closed for modification. In prompt engineering, this means your core prompts should accommodate new requirements through composition and parameterization, not through editing the base instruction set. Consider a prompt that generates product descriptions. Initially it handles basic features and benefits. Then marketing wants lifestyle context. Then SEO requires specific keyword integration. Then internationalization needs cultural adaptation. If each requirement involves editing the core prompt, you're violating OCP. Each modification risks breaking existing, validated behavior.
The pattern that emerges violates OCP in a specific way: you're afraid to touch the prompt. It works—mostly—and you've learned through painful experience that seemingly innocuous changes can break edge cases. So you add conditional logic, special cases, and increasingly baroque instruction chains trying to handle new requirements without disturbing the core. The prompt becomes a palimpsest of workarounds. You can see the archaeological layers: the original simple instruction, then the first major revision, then the defensive clauses added when that broke something, then the special case handling, then the meta-instructions about how to interpret the earlier instructions. It's fragile and incomprehensible.
The solution is to design prompts with extension points. Use template parameters for varying content rather than conditionals in instructions. Create role-specific prompt fragments that can be composed. Build a library of modifier prompts that post-process or pre-process without touching the core logic. The key insight is separation of the stable core from the variable periphery. Your core product description logic—how to translate features into compelling copy—is stable. The variable elements are context (lifestyle, technical, enterprise), constraints (length, keywords, reading level), and style (formal, casual, enthusiastic). These should be parameters or composable fragments, not conditional branches in the instruction set.
// SOLID Violation: Modifying core prompt for each new requirement
function generateProductDescription(product: Product, options: any): string {
  let prompt = `Generate a product description for ${product.name}.`;
  // Each new requirement modifies the core prompt
  if (options.includeLifestyle) {
    prompt += ` Include lifestyle context showing how it fits into daily life.`;
  }
  if (options.seoKeywords) {
    prompt += ` Incorporate these keywords naturally: ${options.seoKeywords.join(', ')}.`;
  }
  if (options.culturalContext) {
    prompt += ` Adapt for ${options.culturalContext} cultural preferences.`;
  }
  if (options.readingLevel) {
    prompt += ` Write at a ${options.readingLevel} reading level.`;
  }
  // ... prompt grows with each new requirement
  return llm.generate(prompt);
}
// SOLID Compliant: Open for extension, closed for modification
interface PromptModifier {
  apply(basePrompt: string, context: any): string;
}

class CoreDescriptionPrompt {
  /**
   * Core prompt is stable and never modified.
   * It focuses only on the essential transformation: product data to description.
   */
  private readonly template = `Generate a compelling product description.
Product: {product_name}
Features: {features}
Benefits: {benefits}
Focus on clarity and value proposition.`;

  build(product: Product): string {
    return this.template
      .replace('{product_name}', product.name)
      .replace('{features}', product.features.join(', '))
      .replace('{benefits}', product.benefits.join(', '));
  }
}

class LifestyleModifier implements PromptModifier {
  apply(basePrompt: string, context: any): string {
    return basePrompt + `\n\nAdditional instruction: Include a scenario showing this product in daily life.`;
  }
}

class SEOModifier implements PromptModifier {
  apply(basePrompt: string, context: { keywords: string[] }): string {
    return basePrompt + `\n\nAdditional instruction: Naturally incorporate these terms: ${context.keywords.join(', ')}.`;
  }
}

class CulturalModifier implements PromptModifier {
  apply(basePrompt: string, context: { culture: string }): string {
    return basePrompt + `\n\nAdditional instruction: Adapt references and examples for ${context.culture} audience.`;
  }
}

class ReadingLevelModifier implements PromptModifier {
  apply(basePrompt: string, context: { level: string }): string {
    return basePrompt + `\n\nAdditional instruction: Write at ${context.level} reading level.`;
  }
}

/**
 * Extension happens through composition, not modification.
 * New requirements become new modifiers, not edits to core prompt.
 */
class DescriptionGenerator {
  private core = new CoreDescriptionPrompt();
  private modifiers: PromptModifier[] = [];

  withModifier(modifier: PromptModifier): this {
    this.modifiers.push(modifier);
    return this;
  }

  generate(product: Product, context: any = {}): string {
    let prompt = this.core.build(product);
    // Apply each modifier in sequence
    for (const modifier of this.modifiers) {
      prompt = modifier.apply(prompt, context);
    }
    return llm.generate(prompt);
  }
}

// Usage: extend without modifying core
const generator = new DescriptionGenerator()
  .withModifier(new LifestyleModifier())
  .withModifier(new SEOModifier())
  .withModifier(new CulturalModifier());

const description = generator.generate(product, {
  keywords: ['ergonomic', 'productivity'],
  culture: 'Japanese'
});
This architecture makes change safe. When you need to add A/B testing for different tones, you create a ToneModifier without touching the core or any existing modifiers. When SEO requirements evolve, you update SEOModifier in isolation and re-validate just that component. The core prompt remains stable across all these changes. The composition approach also enables sophisticated behavior: you can apply modifiers conditionally, chain them in different orders for different contexts, or even have modifiers that observe and react to previous modifications. The system becomes extensible through well-defined interfaces rather than through increasingly complex conditional logic.
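Conditional application is easy to sketch. The fragment below shows the same modifier idea in Python (for consistency with the article's other Python examples); the function names and the A/B-test flag are illustrative, not part of the original design:

```python
from typing import Callable

# A modifier is just a function from prompt to prompt: the Python
# analogue of a PromptModifier object.
PromptModifier = Callable[[str], str]

def tone_modifier(tone: str) -> PromptModifier:
    def apply(base_prompt: str) -> str:
        return base_prompt + f"\n\nAdditional instruction: Use a {tone} tone."
    return apply

def seo_modifier(keywords: list[str]) -> PromptModifier:
    def apply(base_prompt: str) -> str:
        joined = ", ".join(keywords)
        return base_prompt + f"\n\nAdditional instruction: Naturally incorporate: {joined}."
    return apply

def build_prompt(core: str, modifiers: list[PromptModifier]) -> str:
    prompt = core
    for modifier in modifiers:  # order matters; the caller controls it
        prompt = modifier(prompt)
    return prompt

# Conditional extension: modifiers are chosen per request,
# and the core prompt text is never edited.
ab_test_enthusiastic_tone = True  # hypothetical experiment flag
mods: list[PromptModifier] = [seo_modifier(["ergonomic", "productivity"])]
if ab_test_enthusiastic_tone:
    mods.append(tone_modifier("enthusiastic"))

prompt = build_prompt("Generate a compelling product description.", mods)
assert "ergonomic" in prompt and "enthusiastic" in prompt
```

The core string passes through untouched; experiments live entirely in the list of modifiers the caller assembles.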
Liskov Substitution Principle: Prompt Modularity and Behavioral Contracts
The Liskov Substitution Principle states that objects of a superclass should be replaceable with objects of subclasses without breaking functionality. In prompt engineering, this translates to: components in a prompt chain should be swappable without violating behavioral expectations. If you have a summarization step in a pipeline, you should be able to replace it with a different summarization approach—extractive versus abstractive, for example—without breaking downstream components. The contract is "produces a summary of specified length"; the implementation details shouldn't matter to consumers.
Violations of LSP in prompt systems typically manifest as tight coupling to specific LLM behaviors or output formats. You build a prompt chain where step two depends not on the logical contract of step one ("provides extracted entities"), but on specific quirks of how step one currently formats its output. Maybe it returns JSON, and step two has parsing logic that depends on specific field names or nesting structure. When you want to swap the entity extraction approach—perhaps to use a more sophisticated model or a different prompting strategy—the change cascades. The new implementation fulfills the logical contract but breaks the format expectations, and suddenly the entire chain needs modification.
The solution requires explicit interface definitions for each component in your prompt chain. Define what each step promises to deliver, not how it delivers it. An entity extractor promises to return a list of identified entities with types and confidence scores. How it formats that internally is an implementation detail. The consuming step should depend on the abstraction—the contract—not the concrete details. This is identical to depending on interfaces rather than implementations in traditional OOP. In practice, this often means building a thin normalization layer that translates various prompt outputs into a canonical form, or designing your prompts to explicitly request output in a specified schema.
import json
from abc import ABC, abstractmethod
from typing import List, Dict

from pydantic import BaseModel


# Define behavioral contracts through interfaces
class Entity(BaseModel):
    text: str
    type: str
    confidence: float


class EntityExtractor(ABC):
    """Contract: Extract entities from text, regardless of how."""

    @abstractmethod
    def extract(self, text: str) -> List[Entity]:
        """Returns list of entities. Implementation details don't matter."""
        pass


# Implementation 1: Zero-shot extraction
class ZeroShotExtractor(EntityExtractor):
    def extract(self, text: str) -> List[Entity]:
        prompt = f"""Extract all named entities from this text.
For each entity, provide: text, type (PERSON/ORG/LOCATION), confidence (0-1).
Text: {text}
Format as JSON array: [{{"text": "...", "type": "...", "confidence": 0.9}}]"""
        response = llm.generate(prompt)
        # Parse and normalize to contract
        entities = json.loads(response)
        return [Entity(**e) for e in entities]


# Implementation 2: Few-shot with examples
class FewShotExtractor(EntityExtractor):
    def extract(self, text: str) -> List[Entity]:
        prompt = f"""Extract named entities from text.
Example 1:
Text: "Apple Inc. was founded by Steve Jobs in Cupertino."
Entities:
- text: "Apple Inc.", type: ORGANIZATION, confidence: 0.95
- text: "Steve Jobs", type: PERSON, confidence: 0.98
- text: "Cupertino", type: LOCATION, confidence: 0.92
Now extract from:
Text: {text}
Provide entities in same format."""
        response = llm.generate(prompt)
        # Different internal format, but normalizes to same contract
        entities = self._parse_fewshot_format(response)
        return [Entity(**e) for e in entities]

    def _parse_fewshot_format(self, response: str) -> List[Dict]:
        # Implementation-specific parsing
        # Details hidden from consumers
        pass


# Implementation 3: Structured output with function calling
class StructuredExtractor(EntityExtractor):
    def extract(self, text: str) -> List[Entity]:
        # Uses LLM function calling for guaranteed structure
        response = llm.generate(
            prompt=f"Extract entities from: {text}",
            functions=[{
                "name": "report_entities",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "entities": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "text": {"type": "string"},
                                    "type": {"type": "string"},
                                    "confidence": {"type": "number"}
                                }
                            }
                        }
                    }
                }
            }]
        )
        entities = json.loads(response.function_call.arguments)["entities"]
        return [Entity(**e) for e in entities]


# Consumer depends on abstraction, not implementation
class InsightGenerator:
    """Generates insights from entities. Doesn't care HOW entities are extracted."""

    def __init__(self, extractor: EntityExtractor):
        self.extractor = extractor

    def generate_insights(self, documents: List[str]) -> str:
        all_entities = []
        for doc in documents:
            # Calls interface method - works with ANY EntityExtractor implementation
            entities = self.extractor.extract(doc)
            all_entities.extend(entities)
        # Generate insights from entities
        entity_summary = self._summarize_entities(all_entities)
        prompt = f"""Based on these extracted entities, provide key insights:
{entity_summary}
Focus on patterns, relationships, and notable findings."""
        return llm.generate(prompt)

    def _summarize_entities(self, entities: List[Entity]) -> str:
        # Works with any Entity objects that match the contract
        by_type = {}
        for entity in entities:
            by_type.setdefault(entity.type, []).append(entity.text)
        return "\n".join(f"{t}: {', '.join(items)}" for t, items in by_type.items())


# LSP in action: swap implementations without breaking consumers
generator = InsightGenerator(ZeroShotExtractor())
insights = generator.generate_insights(documents)

# Swap to different implementation - everything still works
generator = InsightGenerator(FewShotExtractor())
insights = generator.generate_insights(documents)  # Identical interface

# Yet another implementation - no consumer changes needed
generator = InsightGenerator(StructuredExtractor())
insights = generator.generate_insights(documents)  # Still works
This substitutability is crucial for iterating on prompt systems. You can experiment with different prompting strategies, different models, or entirely different approaches (retrieval-based versus generative, for example) without rewriting your entire system. Each component advertises its contract clearly, and as long as new implementations honor that contract, they're drop-in replacements. This also simplifies testing: you can create mock implementations that return deterministic results for testing downstream components in isolation.
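A minimal sketch of that mock-based testing, with a plain dataclass standing in for the pydantic `Entity` model so the example is self-contained; `MockExtractor` is hypothetical, but it honors the same `extract` contract as the implementations above:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Entity:
    # Stands in for the pydantic Entity model; same contract fields.
    text: str
    type: str
    confidence: float

class MockExtractor:
    """Honors the EntityExtractor contract with canned, deterministic output."""
    def __init__(self, canned: List[Entity]):
        self.canned = canned

    def extract(self, text: str) -> List[Entity]:
        # No model call: downstream logic runs offline and repeatably.
        return self.canned

mock = MockExtractor([
    Entity(text="Apple Inc.", type="ORG", confidence=0.95),
    Entity(text="Cupertino", type="LOCATION", confidence=0.92),
])

# Exercise the same grouping logic a consumer like InsightGenerator applies.
by_type: dict = {}
for entity in mock.extract("any document"):
    by_type.setdefault(entity.type, []).append(entity.text)

assert by_type == {"ORG": ["Apple Inc."], "LOCATION": ["Cupertino"]}
```

Because the consumer depends only on the contract, a mock that returns canned entities is a legitimate substitute, and downstream behavior becomes fully deterministic under test.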
Interface Segregation Principle: Avoiding Prompt Bloat
The Interface Segregation Principle advises that clients shouldn't be forced to depend on interfaces they don't use. In prompt terms: don't make every prompt carry every possible instruction, parameter, or capability. A common anti-pattern is the "universal prompt template" that includes exhaustive instructions covering every possible scenario, most of which are irrelevant to any specific invocation. The prompt carries instructions about handling edge cases that only occur in 2% of calls. It includes tone guidance for contexts that don't apply. It has output formatting requirements that only certain consumers need.
This bloat has real costs. LLMs have context windows, and every token of instruction is a token that could be used for input data or examples. Bloated prompts waste space. More subtly, they create cognitive interference. When a prompt includes ten different instruction sets, seven of which don't apply to the current task, the model must parse and weigh all of them. Instructions can conflict or create ambiguity. The more irrelevant context in the prompt, the more opportunity for the model to attend to the wrong signals. Prompt bloat is directly analogous to interface pollution in traditional software: it creates confusion, wastes resources, and makes the component harder to understand and use correctly.
The solution is prompt interfaces tailored to specific use cases. If you have three different consumers of a summarization prompt—one that needs bullet points, one that needs paragraph form, and one that needs just key statistics—don't create one prompt with instructions for all three formats and a parameter selecting which to use. Create three focused prompt interfaces, each containing only the instructions relevant to its use case. This seems like duplication, but it's beneficial duplication: each interface is clear, minimal, and optimized for its purpose. The common logic can be abstracted into shared components or base templates that each specialized prompt extends.
// SOLID Violation: Universal bloated prompt
class UniversalSummarizer {
  summarize(text: string, options: {
    format?: 'bullets' | 'paragraph' | 'statistics' | 'executive';
    length?: 'short' | 'medium' | 'long';
    audience?: 'technical' | 'executive' | 'general';
    includeMetrics?: boolean;
    includeRecommendations?: boolean;
    highlightRisks?: boolean;
    tone?: 'formal' | 'casual';
    // ... many more options
  }): string {
    // Giant prompt with instructions for every possible combination
    const prompt = `Summarize this text with the following requirements:
${options.format === 'bullets' ? '- Use bullet points\n' : ''}
${options.format === 'paragraph' ? '- Use paragraph form\n' : ''}
${options.format === 'statistics' ? '- Focus on statistics only\n' : ''}
${options.format === 'executive' ? '- Executive summary format\n' : ''}
${options.length === 'short' ? '- Keep it under 50 words\n' : ''}
${options.length === 'medium' ? '- Target 100-150 words\n' : ''}
${options.length === 'long' ? '- Provide detailed summary, 300-400 words\n' : ''}
${options.audience === 'technical' ? '- Use technical terminology\n' : ''}
${options.audience === 'executive' ? '- Focus on business impact\n' : ''}
${options.audience === 'general' ? '- Avoid jargon\n' : ''}
${options.includeMetrics ? '- Include key metrics\n' : ''}
${options.includeRecommendations ? '- Add recommendations\n' : ''}
${options.highlightRisks ? '- Highlight any risks\n' : ''}
${options.tone === 'formal' ? '- Maintain formal tone\n' : ''}
${options.tone === 'casual' ? '- Use conversational style\n' : ''}
Text to summarize:
${text}`;
    // Every call processes ALL instructions, even irrelevant ones
    return llm.generate(prompt);
  }
}

// SOLID Compliant: Segregated interfaces for specific use cases
interface Summarizer {
  summarize(text: string): string;
}

class BulletPointSummarizer implements Summarizer {
  /**
   * Focused interface: only what's needed for bullet point summaries.
   * No irrelevant instructions cluttering the prompt.
   */
  summarize(text: string): string {
    const prompt = `Summarize this text as bullet points.
Focus on key facts and actionable items.
Text:
${text}
Provide 3-5 bullet points.`;
    return llm.generate(prompt);
  }
}

class ExecutiveSummarizer implements Summarizer {
  /**
   * Different interface, different focus.
   * Only instructions relevant to executive summaries.
   */
  summarize(text: string): string {
    const prompt = `Create an executive summary of this text.
Focus on business impact, decisions needed, and bottom-line implications.
Use formal tone and paragraph structure.
Target length: 100-150 words.
Text:
${text}`;
    return llm.generate(prompt);
  }
}

class StatisticalSummarizer implements Summarizer {
  /**
   * Narrow, specialized interface.
   * No formatting instructions, no tone guidance - just data extraction.
   */
  summarize(text: string): string {
    const prompt = `Extract key statistics and metrics from this text.
Report each as: [metric name]: [value] [unit]
Text:
${text}`;
    return llm.generate(prompt);
  }
}

class TechnicalSummarizer implements Summarizer {
  private readonly length: number;

  constructor(targetLength: number = 200) {
    this.length = targetLength;
  }

  /**
   * Technical audience interface: different priorities.
   * Preserves terminology, focuses on implementation details.
   */
  summarize(text: string): string {
    const prompt = `Summarize this technical content for an engineering audience.
Preserve technical terminology and focus on implementation details.
Target length: ${this.length} words.
Text:
${text}`;
    return llm.generate(prompt);
  }
}

// Usage: each client gets exactly the interface it needs
class ReportGenerator {
  generate(data: string): string {
    // Use specialized summarizer - prompt contains ONLY relevant instructions
    const executiveSummary = new ExecutiveSummarizer().summarize(data);
    const keyMetrics = new StatisticalSummarizer().summarize(data);
    return `${executiveSummary}\n\nKey Metrics:\n${keyMetrics}`;
  }
}

class DashboardWidget {
  render(data: string): string {
    // Different client, different needs, different interface
    // No wasted tokens on executive summary formatting instructions
    const bulletPoints = new BulletPointSummarizer().summarize(data);
    return this.formatForDisplay(bulletPoints);
  }

  private formatForDisplay(summary: string): string {
    // UI-specific formatting
    return summary;
  }
}
Each specialized summarizer has a clean, focused prompt that's easy to understand, test, and optimize. The BulletPointSummarizer doesn't waste tokens explaining executive summary format or statistical extraction. The ExecutiveSummarizer doesn't carry instructions about bullet points. Each interface is minimal and coherent. When you need to improve bullet point generation, you modify only BulletPointSummarizer without risk of affecting executive summaries. The segregation creates clarity and reduces unintended interactions.
Dependency Inversion Principle: Abstracting Prompt Chains
The Dependency Inversion Principle states that high-level modules should not depend on low-level modules; both should depend on abstractions. In prompt engineering, this means your orchestration logic shouldn't depend on specific LLM providers, models, or prompting strategies. A high-level process like "generate research report" should depend on abstractions like "retrieves relevant documents" and "synthesizes information," not on "calls GPT-4 with specific prompt template X using OpenAI API."
Violations appear when you hard-code LLM-specific details throughout your system. Your code directly calls OpenAI's API with model names, temperature settings, and prompt formats embedded in business logic. When you want to switch models, or add a fallback provider, or experiment with a different LLM, the changes cascade through every component. The high-level logic—the actual business process—is tightly coupled to low-level implementation details. This becomes especially painful when different models require different prompting strategies. GPT-4 might work well with verbose instructions while Claude might prefer concise directives. Anthropic's API structure differs from OpenAI's. If your business logic directly depends on these details, you can't adapt without significant refactoring.
The solution is to define abstract interfaces for LLM operations and inject concrete implementations. Your high-level orchestration depends on an abstraction like "TextGenerator" with a method "generate(prompt, constraints)." The concrete implementation—whether it's calling OpenAI, Anthropic, a local model, or a chain of fallbacks—is injected as a dependency. This inversion of control makes the system adaptable. You can swap providers, implement sophisticated retry logic with degradation to cheaper models, or route different types of requests to different providers, all without modifying high-level business logic.
from abc import ABC, abstractmethod
from typing import Optional, Dict, Any
from dataclasses import dataclass


@dataclass
class GenerationRequest:
    prompt: str
    max_tokens: int = 1000
    temperature: float = 0.7
    stop_sequences: Optional[list[str]] = None


@dataclass
class GenerationResponse:
    text: str
    model_used: str
    tokens_used: int
    latency_ms: float


# Abstraction: high-level code depends on this, not on specific LLMs
class TextGenerator(ABC):
    @abstractmethod
    def generate(self, request: GenerationRequest) -> GenerationResponse:
        """Generate text. Implementation details are hidden."""
        pass


# Low-level implementation details: specific LLM providers
class OpenAIGenerator(TextGenerator):
    def __init__(self, api_key: str, model: str = "gpt-4"):
        self.api_key = api_key
        self.model = model
        # OpenAI-specific setup

    def generate(self, request: GenerationRequest) -> GenerationResponse:
        # OpenAI-specific API calls and prompt formatting
        import openai
        import time
        start = time.time()
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": request.prompt}],
            max_tokens=request.max_tokens,
            temperature=request.temperature,
            stop=request.stop_sequences
        )
        latency = (time.time() - start) * 1000
        return GenerationResponse(
            text=response.choices[0].message.content,
            model_used=self.model,
            tokens_used=response.usage.total_tokens,
            latency_ms=latency
        )


class AnthropicGenerator(TextGenerator):
    def __init__(self, api_key: str, model: str = "claude-3-opus-20240229"):
        self.api_key = api_key
        self.model = model
        # Anthropic-specific setup

    def generate(self, request: GenerationRequest) -> GenerationResponse:
        # Anthropic-specific API calls and format
        import anthropic
        import time
        client = anthropic.Anthropic(api_key=self.api_key)
        start = time.time()
        response = client.messages.create(
            model=self.model,
            max_tokens=request.max_tokens,
            temperature=request.temperature,
            messages=[{"role": "user", "content": request.prompt}]
        )
        latency = (time.time() - start) * 1000
        return GenerationResponse(
            text=response.content[0].text,
            model_used=self.model,
            tokens_used=response.usage.input_tokens + response.usage.output_tokens,
            latency_ms=latency
        )


class FallbackGenerator(TextGenerator):
    """Sophisticated retry logic with fallback chain."""

    def __init__(self, primary: TextGenerator, fallback: TextGenerator):
        self.primary = primary
        self.fallback = fallback

    def generate(self, request: GenerationRequest) -> GenerationResponse:
        try:
            return self.primary.generate(request)
        except Exception as e:
            print(f"Primary generator failed: {e}. Falling back.")
            return self.fallback.generate(request)


class CachingGenerator(TextGenerator):
    """Adds caching without changing interface."""

    def __init__(self, underlying: TextGenerator):
        self.underlying = underlying
        self.cache: Dict[str, GenerationResponse] = {}

    def generate(self, request: GenerationRequest) -> GenerationResponse:
        cache_key = f"{request.prompt}:{request.temperature}:{request.max_tokens}"
        if cache_key in self.cache:
            cached = self.cache[cache_key]
            print("Cache hit for prompt")
            return cached
        response = self.underlying.generate(request)
        self.cache[cache_key] = response
        return response


# High-level business logic: depends on abstraction, not implementation
class ResearchReportGenerator:
    """
    High-level module. No knowledge of OpenAI, Anthropic, or any specific LLM.
    Depends only on TextGenerator abstraction.
    """

    def __init__(self, generator: TextGenerator):
        # Dependency injected: could be ANY TextGenerator implementation
        self.generator = generator

    def generate_report(self, topic: str, sources: list[str]) -> str:
        # High-level orchestration - no LLM-specific details
        # Step 1: Synthesize sources
        synthesis_prompt = self._build_synthesis_prompt(topic, sources)
        synthesis = self.generator.generate(
            GenerationRequest(prompt=synthesis_prompt, temperature=0.3)
        )
        # Step 2: Generate insights
        insights_prompt = self._build_insights_prompt(synthesis.text)
        insights = self.generator.generate(
            GenerationRequest(prompt=insights_prompt, temperature=0.7
)
# Step 3: Format final report
report_prompt = self._build_report_prompt(synthesis.text, insights.text)
report = self.generator.generate(
GenerationRequest(prompt=report_prompt, temperature=0.5)
)
return report.text
def _build_synthesis_prompt(self, topic: str, sources: list[str]) -> str:
return f"""Synthesize these sources about {topic}:
{chr(10).join(sources)}
Provide a coherent synthesis identifying key themes."""
def _build_insights_prompt(self, synthesis: str) -> str:
return f"""Based on this synthesis, identify key insights and implications:
{synthesis}"""
def _build_report_prompt(self, synthesis: str, insights: str) -> str:
return f"""Create a research report structured as:
1. Executive Summary
2. Key Findings
3. Insights and Implications
Synthesis: {synthesis}
Insights: {insights}"""
# Usage: swap implementations without changing high-level logic
if __name__ == "__main__":
# Configuration 1: Direct OpenAI usage
generator = OpenAIGenerator(api_key="...", model="gpt-4")
report_gen = ResearchReportGenerator(generator)
# Configuration 2: Anthropic with caching
generator = CachingGenerator(
AnthropicGenerator(api_key="...", model="claude-3-opus-20240229")
)
report_gen = ResearchReportGenerator(generator)
# Configuration 3: GPT-4 with fallback to Claude, with caching
generator = CachingGenerator(
FallbackGenerator(
primary=OpenAIGenerator(api_key="...", model="gpt-4"),
fallback=AnthropicGenerator(api_key="...", model="claude-3-opus-20240229")
)
)
report_gen = ResearchReportGenerator(generator)
# High-level code identical in all cases
report = report_gen.generate_report(
topic="SOLID principles in software engineering",
sources=[...]
)
This architecture provides enormous flexibility. You can implement a LocalModelGenerator that runs a quantized model on-device for privacy-sensitive operations. You can create a RouterGenerator that examines the request and routes simple queries to cheaper models while sending complex reasoning tasks to premium models. You can implement sophisticated logging, monitoring, cost tracking, and rate limiting—all as decorators around the TextGenerator abstraction—without modifying any business logic. The high-level processes remain stable while low-level implementation evolves.
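The RouterGenerator described above can be sketched as one more implementation of the same abstraction. This is a minimal sketch, not a definitive design: the prompt-length heuristic is an assumed stand-in for a real complexity signal, and the stub classes below replace the earlier dataclasses and providers so the example runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Minimal stand-ins for the types defined earlier, so this sketch is self-contained.
@dataclass
class GenerationRequest:
    prompt: str


@dataclass
class GenerationResponse:
    text: str
    model_used: str


class TextGenerator(ABC):
    @abstractmethod
    def generate(self, request: GenerationRequest) -> GenerationResponse: ...


class RouterGenerator(TextGenerator):
    """Route requests to a cheap or premium generator based on a heuristic.

    The length-based heuristic is an illustrative assumption; a real router
    might classify the request with a small model or inspect task metadata.
    """

    def __init__(self, cheap: TextGenerator, premium: TextGenerator,
                 threshold: int = 500):
        self.cheap = cheap
        self.premium = premium
        self.threshold = threshold

    def generate(self, request: GenerationRequest) -> GenerationResponse:
        # Crude complexity proxy: short prompts go to the cheaper model
        target = self.cheap if len(request.prompt) < self.threshold else self.premium
        return target.generate(request)


class StubGenerator(TextGenerator):
    """Fake provider used only to demonstrate routing."""

    def __init__(self, name: str):
        self.name = name

    def generate(self, request: GenerationRequest) -> GenerationResponse:
        return GenerationResponse(text="ok", model_used=self.name)


router = RouterGenerator(StubGenerator("cheap-model"), StubGenerator("premium-model"))
routed_short = router.generate(GenerationRequest(prompt="Summarize this."))
routed_long = router.generate(GenerationRequest(prompt="x" * 2000))
print(routed_short.model_used)  # cheap-model
print(routed_long.model_used)   # premium-model
```

Because RouterGenerator is itself a TextGenerator, it composes freely with CachingGenerator and FallbackGenerator without any change to ResearchReportGenerator.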
Practical Implementation Patterns for SOLID Prompts
Translating SOLID principles into working prompt-based systems requires specific architectural patterns. The most fundamental is the prompt registry pattern, which treats prompts as first-class artifacts rather than strings scattered throughout code. A prompt registry maintains a catalog of versioned prompts, each with metadata about purpose, inputs, outputs, and behavioral contracts. This enables treating prompts like functions: they have signatures, documentation, and version history. When debugging issues, you can identify exactly which prompt version was used. When deploying improvements, you can do so through controlled rollout rather than find-and-replace across codebases.
The chain-of-responsibility pattern applies naturally to prompt systems. Complex tasks decompose into chains where each step performs a focused transformation. Document analysis might chain: extraction (pull structured data) → validation (verify completeness) → enrichment (add context) → synthesis (generate insights) → formatting (prepare output). Each link in the chain is a single-responsibility component that can be tested, monitored, and swapped independently. The chain itself is configuration: you can reorder steps, add conditional branching, or create entirely new chains by composing existing components. This is prompt-level modularity achieving the same benefits as function-level modularity in traditional programming.
The adapter pattern solves the practical problem of integrating external LLM services with varying APIs, capabilities, and quirks. Different models have different context window sizes, different formatting preferences, different strengths. An adapter layer normalizes these differences, presenting a consistent interface to application code while handling model-specific details internally. The adapter might break long documents into chunks for models with smaller context windows, reformulate prompts for models that perform better with different instruction styles, or translate between different output schema formats. This encapsulation means model selection becomes a configuration choice rather than a code change.
// Prompt Registry Pattern
interface PromptDefinition {
  id: string;
  version: string;
  description: string;
  template: string;
  inputSchema: object;
  outputSchema: object;
  examples: Array<{ input: any; output: any }>;
}

class PromptRegistry {
  private prompts: Map<string, Map<string, PromptDefinition>> = new Map();

  register(prompt: PromptDefinition): void {
    if (!this.prompts.has(prompt.id)) {
      this.prompts.set(prompt.id, new Map());
    }
    this.prompts.get(prompt.id)!.set(prompt.version, prompt);
  }

  get(id: string, version: string = 'latest'): PromptDefinition {
    const versions = this.prompts.get(id);
    if (!versions) throw new Error(`Prompt ${id} not found`);
    if (version === 'latest') {
      // Numeric-aware sort so "10.0.0" ranks above "2.0.0"
      const sorted = Array.from(versions.keys()).sort((a, b) =>
        a.localeCompare(b, undefined, { numeric: true })
      );
      version = sorted[sorted.length - 1];
    }
    const prompt = versions.get(version);
    if (!prompt) throw new Error(`Version ${version} of ${id} not found`);
    return prompt;
  }

  listVersions(id: string): string[] {
    return Array.from(this.prompts.get(id)?.keys() || []);
  }
}

// Usage: prompts as versioned artifacts
const registry = new PromptRegistry();

registry.register({
  id: 'entity-extraction',
  version: '1.0.0',
  description: 'Extract named entities using zero-shot approach',
  template: `Extract named entities from this text: {text}`,
  inputSchema: { text: 'string' },
  outputSchema: { entities: 'array' },
  examples: []
});

registry.register({
  id: 'entity-extraction',
  version: '2.0.0',
  description: 'Extract named entities using improved few-shot approach',
  template: `Extract entities (examples provided):\n\n{examples}\n\nText: {text}`,
  inputSchema: { text: 'string' },
  outputSchema: { entities: 'array' },
  examples: [
    { input: { text: 'Apple Inc. in Cupertino' },
      output: { entities: [{ text: 'Apple Inc.', type: 'ORG' }] } }
  ]
});

// Application code references prompts by ID, version is configurable
// (renderTemplate is an assumed helper that fills {placeholders} in a template)
const promptDef = registry.get('entity-extraction', '2.0.0');
const prompt = renderTemplate(promptDef.template, { text: inputText });

// Chain-of-Responsibility Pattern for Prompt Pipelines
interface PipelineStep<TIn, TOut> {
  execute(input: TIn): Promise<TOut>;
  name: string;
}

class Pipeline<TIn, TOut> {
  private steps: Array<PipelineStep<any, any>> = [];

  addStep<TNext>(step: PipelineStep<TOut, TNext>): Pipeline<TIn, TNext> {
    this.steps.push(step);
    return this as any;
  }

  async execute(input: TIn): Promise<TOut> {
    let result: any = input;
    for (const step of this.steps) {
      console.log(`Executing pipeline step: ${step.name}`);
      result = await step.execute(result);
    }
    return result;
  }
}

// Individual steps with single responsibilities
class ExtractionStep implements PipelineStep<string, StructuredData> {
  name = 'extraction';

  async execute(text: string): Promise<StructuredData> {
    const prompt = registry.get('data-extraction');
    const response = await llm.generate(
      renderTemplate(prompt.template, { text })
    );
    return JSON.parse(response);
  }
}

class ValidationStep implements PipelineStep<StructuredData, StructuredData> {
  name = 'validation';

  async execute(data: StructuredData): Promise<StructuredData> {
    const prompt = registry.get('data-validation');
    const response = await llm.generate(
      renderTemplate(prompt.template, { data: JSON.stringify(data) })
    );
    const validation = JSON.parse(response);
    if (!validation.isValid) {
      throw new Error(`Validation failed: ${validation.errors}`);
    }
    return data;
  }
}

class EnrichmentStep implements PipelineStep<StructuredData, EnrichedData> {
  name = 'enrichment';

  async execute(data: StructuredData): Promise<EnrichedData> {
    const prompt = registry.get('data-enrichment');
    const response = await llm.generate(
      renderTemplate(prompt.template, { data: JSON.stringify(data) })
    );
    return JSON.parse(response);
  }
}

// Compose pipeline from single-responsibility steps
const analysisPipeline = new Pipeline<string, EnrichedData>()
  .addStep(new ExtractionStep())
  .addStep(new ValidationStep())
  .addStep(new EnrichmentStep());

const result = await analysisPipeline.execute(documentText);

// Adapter Pattern for Model Differences
interface ModelAdapter {
  generateText(prompt: string, options: GenerationOptions): Promise<string>;
  maxContextLength: number;
  preferredPromptStyle: 'verbose' | 'concise';
}

class GPT4Adapter implements ModelAdapter {
  maxContextLength = 8192;
  preferredPromptStyle = 'verbose' as const;

  async generateText(prompt: string, options: GenerationOptions): Promise<string> {
    // GPT-4 specific API call; unwrap the completion text
    const response = await openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: prompt }],
      ...options
    });
    return response.choices[0].message.content ?? '';
  }
}

class ClaudeAdapter implements ModelAdapter {
  maxContextLength = 100000;
  preferredPromptStyle = 'concise' as const;

  async generateText(prompt: string, options: GenerationOptions): Promise<string> {
    // Claude-specific API call; unwrap the response text
    const response = await anthropic.messages.create({
      model: 'claude-3-opus-20240229',
      messages: [{ role: 'user', content: prompt }],
      max_tokens: options.maxTokens || 4096,
      ...options
    });
    return response.content[0].text;
  }
}

// Application code works with abstraction
class DocumentAnalyzer {
  constructor(private adapter: ModelAdapter) {}

  async analyze(document: string): Promise<Analysis> {
    // Adapt prompt style based on model preference
    const prompt = this.adapter.preferredPromptStyle === 'verbose'
      ? this.buildVerbosePrompt(document)
      : this.buildConcisePrompt(document);

    // Handle context length limits
    const chunks = this.chunkIfNeeded(document, this.adapter.maxContextLength);
    const results = await Promise.all(
      chunks.map(chunk => this.adapter.generateText(prompt + chunk, {}))
    );
    return this.synthesizeResults(results);
  }

  private buildVerbosePrompt(doc: string): string {
    return `Please analyze the following document carefully...`;
  }

  private buildConcisePrompt(doc: string): string {
    return `Analyze this document:`;
  }

  private chunkIfNeeded(text: string, maxLength: number): string[] {
    // Chunking logic based on model's context window
    return [text]; // simplified
  }

  private synthesizeResults(results: string[]): Analysis {
    // Combine results from multiple chunks
    return {} as Analysis;
  }
}

// Swap models without changing application logic
const analyzer = new DocumentAnalyzer(new GPT4Adapter());
// Later: const analyzer = new DocumentAnalyzer(new ClaudeAdapter());
These patterns reinforce one another. The prompt registry provides version control and documentation. The chain-of-responsibility pattern enables composing complex behaviors from simple components. The adapter pattern isolates model-specific details. Combined, they create a layered architecture where business logic, prompt design, and LLM integration are cleanly separated. This separation enables independent evolution: improve prompts without changing code, swap models without rewriting prompts, modify business logic without touching LLM integration.
Trade-offs and When to Break the Rules
SOLID principles are guidelines, not laws of nature. There are legitimate reasons to violate them, and understanding when reveals deeper insight than blind adherence. The primary trade-off is complexity versus simplicity. Decomposing a single prompt into five focused components following SRP creates more moving parts. For simple, stable use cases with no need for independent modification of sub-concerns, the decomposition cost exceeds the benefit. A straightforward summarization prompt that will never need extension doesn't benefit from the OCP infrastructure of modifiers and composition. The overhead of abstraction outweighs the flexibility gain.
Latency is another consideration. Each prompt in a chain incurs round-trip latency to the LLM service. A monolithic prompt that handles extraction, validation, and formatting in one pass completes in one round trip. Decomposing it into three focused prompts triples the network overhead. For latency-sensitive applications, this trade-off may be unacceptable. The solution isn't abandoning SOLID but adapting it: keep concerns conceptually separated even within a monolithic prompt. Structure the single prompt into clearly delineated sections, each handling one responsibility. Use clear section headers: "# Extraction Instructions", "# Validation Rules", "# Formatting Requirements". This maintains conceptual separation and some of SOLID's benefits while avoiding the latency cost.
Token costs matter for production systems. Decomposed prompts often require more total tokens than monolithic ones because each step needs context. The extraction prompt needs to see the original text. The validation prompt needs both the original text and the extraction results. The enrichment prompt needs all previous context. Context propagates through the chain, potentially duplicating information across multiple prompts. A monolithic prompt sees everything once and generates a final output. For high-volume systems where token costs drive economics, the efficiency of monolithic prompts can outweigh the maintainability of decomposed ones. The optimal architecture depends on your specific constraints: update frequency, latency requirements, token budget, and team size.
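The token arithmetic can be made concrete with a back-of-envelope sketch. The four-characters-per-token estimate, the step counts, and the per-step instruction overheads below are illustrative assumptions, not measurements; the point is only that a chain which re-sends context at each step multiplies the document cost.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token. Real tokenizers vary.
    return len(text) // 4


def monolithic_cost(document: str, instructions: int = 200) -> int:
    # The document is sent once, alongside one combined instruction block.
    return estimate_tokens(document) + instructions


def chained_cost(document: str, steps: int = 3, instructions_per_step: int = 100) -> int:
    # Each step re-sends the document (or prior outputs of comparable size).
    return steps * (estimate_tokens(document) + instructions_per_step)


doc = "x" * 8000  # stand-in for a ~2000-token document
print(monolithic_cost(doc))  # 2200
print(chained_cost(doc))     # 6300
```

Under these assumptions the three-step chain costs roughly three times the monolithic prompt in input tokens, which is the economic pressure the hybrid approach below responds to.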
import json

# When to violate SRP: simple, stable, latency-sensitive use case
class QuickSummarizer:
    """
    Deliberately monolithic for performance.
    Use when: task is simple, won't change, latency critical.
    """

    def summarize_and_format(self, text: str) -> dict:
        # Single prompt does extraction and formatting
        # Trade-off: faster (one round trip) but less maintainable
        prompt = f"""Analyze this text and provide:
1. A 2-3 sentence summary
2. Key points as bullet list
3. Word count of original

Text: {text}

Format as JSON: {{"summary": "...", "bullets": [...], "word_count": N}}"""
        return json.loads(llm.generate(prompt))


# When to follow SRP: complex, evolving, maintainability-critical
class EvolvableDocumentProcessor:
    """
    Deliberately decomposed for maintainability.
    Use when: requirements change frequently, team is large,
    different people own different concerns.
    """

    def __init__(self):
        self.extractor = DataExtractor()
        self.validator = DataValidator()
        self.enricher = DataEnricher()
        self.formatter = OutputFormatter()

    def process(self, text: str) -> dict:
        # Multiple round trips, more tokens, but each step is
        # independently maintainable, testable, and swappable
        data = self.extractor.extract(text)
        validated = self.validator.validate(data)
        enriched = self.enricher.enrich(validated)
        return self.formatter.format(enriched)


# Hybrid approach: conceptual separation within monolithic prompt
class PragmaticProcessor:
    """
    Single prompt for performance, but structured for maintainability.
    Compromise when you need both speed and some degree of evolvability.
    """

    def process(self, text: str) -> dict:
        # One round trip, but clear separation of concerns
        prompt = f"""Process this text through the following stages:

# STAGE 1: EXTRACTION
Extract key entities, dates, and amounts from the text.

# STAGE 2: VALIDATION
Verify that extracted data is consistent and complete.
Check for missing required fields.

# STAGE 3: ENRICHMENT
Add context and categorization to extracted data.

# STAGE 4: FORMATTING
Structure the final output as JSON with this schema:
{{"entities": [...], "validation_status": "...", "categories": [...]}}

Text to process:
{text}

Provide final JSON output:"""
        return json.loads(llm.generate(prompt))
The right choice depends on your specific context. Early-stage prototypes benefit from simplicity—monolithic prompts let you iterate quickly. Production systems with multiple teams maintaining different aspects benefit from strict separation. High-throughput, cost-sensitive applications may optimize for token efficiency. The key is making the trade-off consciously rather than defaulting to one approach. Understand what you're optimizing for: development speed, runtime performance, maintainability, token cost, or team scaling. Then choose the architecture that best serves those priorities.
Best Practices for SOLID Prompt Engineering
Implementing SOLID principles in prompt engineering requires specific practices that go beyond architectural patterns. The first is comprehensive prompt testing. Each prompt component needs a test suite just as functions do. Test cases should cover expected inputs, edge cases, and known failure modes. For a classification prompt, tests verify correct classification of clear-cut examples, borderline cases, and adversarial inputs designed to confuse the classifier. Automated testing catches regressions when prompts are modified. Without it, you're deploying changes blind, hoping nothing breaks. The investment in test infrastructure pays for itself the first time it catches a breaking change before production.
Version control for prompts is non-negotiable. Prompts should live in version control systems alongside code, not scattered in notebooks or configuration files. Each prompt should have a clear history showing what changed, when, and why. This enables rollback when changes degrade performance. It provides audit trails for debugging issues. It facilitates A/B testing different prompt versions. Treating prompts as code—with code review processes, versioning, and deployment procedures—might seem like overhead for text artifacts, but the discipline prevents the chaos that emerges when prompts evolve through ad-hoc edits.
Observability and monitoring are crucial. Production prompt systems need logging that captures inputs, outputs, latency, token usage, and model responses for each invocation. When issues arise—and they will—this telemetry enables root cause analysis. You can see exactly what prompt was used, what input it received, and what output it generated. Patterns emerge: certain input characteristics correlate with poor outputs. Specific prompt versions show elevated failure rates. Model behavior shifts over time as providers update their systems. Without comprehensive logging, debugging prompt systems is guesswork. With it, issues become tractable.
// Best Practice: Comprehensive Prompt Testing
import { describe, test, expect } from 'vitest';

describe('EntityExtractor', () => {
  const extractor = new ZeroShotExtractor();

  test('extracts clear-cut entities correctly', async () => {
    const text = "Apple Inc. was founded by Steve Jobs in Cupertino.";
    const entities = await extractor.extract(text);
    expect(entities).toContainEqual(
      expect.objectContaining({
        text: "Apple Inc.",
        type: "ORGANIZATION"
      })
    );
    expect(entities).toContainEqual(
      expect.objectContaining({
        text: "Steve Jobs",
        type: "PERSON"
      })
    );
  });

  test('handles ambiguous entities appropriately', async () => {
    // "Apple" could be organization or fruit
    const text = "I bought an apple at the store.";
    const entities = await extractor.extract(text);
    // Should NOT extract "apple" as organization
    const orgEntities = entities.filter(e => e.type === "ORGANIZATION");
    expect(orgEntities).toHaveLength(0);
  });

  test('handles empty input gracefully', async () => {
    const entities = await extractor.extract("");
    expect(entities).toEqual([]);
  });

  test('handles text with no entities', async () => {
    const text = "The weather is nice today.";
    const entities = await extractor.extract(text);
    expect(entities).toEqual([]);
  });

  test('extracts entities from long documents', async () => {
    const longText = generateLongDocument(); // 10k+ words
    const entities = await extractor.extract(longText);
    // Should complete without timeout or error
    expect(entities.length).toBeGreaterThan(0);
  });

  test('assigns reasonable confidence scores', async () => {
    const text = "Microsoft announced new products.";
    const entities = await extractor.extract(text);
    // Confidence should be in valid range
    entities.forEach(entity => {
      expect(entity.confidence).toBeGreaterThanOrEqual(0);
      expect(entity.confidence).toBeLessThanOrEqual(1);
    });
  });
});

// Best Practice: Version Control and Metadata
interface PromptMetadata {
  id: string;
  version: string;
  createdAt: Date;
  author: string;
  description: string;
  changeLog: string;
  testCasesPassing: number;
  testCasesTotal: number;
}

class VersionedPromptRegistry {
  private prompts: Map<string, Array<{
    definition: PromptDefinition;
    metadata: PromptMetadata;
  }>> = new Map();

  register(definition: PromptDefinition, metadata: PromptMetadata): void {
    if (!this.prompts.has(definition.id)) {
      this.prompts.set(definition.id, []);
    }
    this.prompts.get(definition.id)!.push({ definition, metadata });
  }

  getByVersion(id: string, version: string): PromptDefinition {
    const versions = this.prompts.get(id);
    if (!versions) throw new Error(`Prompt ${id} not found`);
    const entry = versions.find(v => v.metadata.version === version);
    if (!entry) throw new Error(`Version ${version} not found`);
    return entry.definition;
  }

  getLatestStable(id: string): PromptDefinition {
    const versions = this.prompts.get(id);
    if (!versions) throw new Error(`Prompt ${id} not found`);
    // Return latest version with all tests passing
    const stable = versions
      .filter(v => v.metadata.testCasesPassing === v.metadata.testCasesTotal)
      .sort((a, b) => b.metadata.createdAt.getTime() - a.metadata.createdAt.getTime());
    if (stable.length === 0) {
      throw new Error(`No stable version found for ${id}`);
    }
    return stable[0].definition;
  }

  getHistory(id: string): PromptMetadata[] {
    const versions = this.prompts.get(id);
    if (!versions) return [];
    return versions
      .map(v => v.metadata)
      .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime());
  }
}

// Best Practice: Comprehensive Observability
interface PromptInvocation {
  promptId: string;
  promptVersion: string;
  timestamp: Date;
  input: any;
  output: any;
  latencyMs: number;
  tokensUsed: number;
  modelUsed: string;
  success: boolean;
  errorMessage?: string;
}

class ObservablePromptExecutor {
  constructor(
    private logger: Logger,
    private metrics: MetricsCollector
  ) {}

  async execute(promptDef: PromptDefinition, input: any): Promise<any> {
    const startTime = performance.now();
    const invocation: Partial<PromptInvocation> = {
      promptId: promptDef.id,
      promptVersion: promptDef.version,
      timestamp: new Date(),
      input
    };

    try {
      const prompt = renderTemplate(promptDef.template, input);
      const response = await llm.generate(prompt);

      invocation.output = response;
      invocation.latencyMs = performance.now() - startTime;
      invocation.tokensUsed = countTokens(prompt) + countTokens(response);
      invocation.modelUsed = llm.modelName;
      invocation.success = true;

      // Log successful invocation
      this.logger.info('Prompt executed successfully', invocation);

      // Record metrics
      this.metrics.recordLatency(promptDef.id, invocation.latencyMs);
      this.metrics.recordTokens(promptDef.id, invocation.tokensUsed);
      this.metrics.incrementSuccess(promptDef.id);

      return response;
    } catch (error) {
      invocation.success = false;
      invocation.errorMessage = (error as Error).message;
      invocation.latencyMs = performance.now() - startTime;

      // Log failed invocation with full context
      this.logger.error('Prompt execution failed', {
        ...invocation,
        error: (error as Error).stack
      });

      // Record failure metrics
      this.metrics.incrementFailure(promptDef.id);
      throw error;
    }
  }

  async analyzePerformance(promptId: string, timeRange: TimeRange): Promise<PerformanceReport> {
    // Query logged invocations to analyze patterns
    const invocations = await this.logger.query({ promptId, timeRange });
    return {
      totalInvocations: invocations.length,
      successRate: invocations.filter(i => i.success).length / invocations.length,
      avgLatencyMs: average(invocations.map(i => i.latencyMs)),
      p95LatencyMs: percentile(invocations.map(i => i.latencyMs), 0.95),
      totalTokens: sum(invocations.map(i => i.tokensUsed)),
      commonFailures: this.identifyCommonFailures(invocations.filter(i => !i.success))
    };
  }

  private identifyCommonFailures(failures: PromptInvocation[]): Map<string, number> {
    // Group failures by error message to identify patterns
    const grouped = new Map<string, number>();
    for (const failure of failures) {
      const count = grouped.get(failure.errorMessage!) || 0;
      grouped.set(failure.errorMessage!, count + 1);
    }
    return grouped;
  }
}

// Usage: comprehensive testing, versioning, and observability
const registry = new VersionedPromptRegistry();
const executor = new ObservablePromptExecutor(logger, metrics);

// Register versioned prompt with metadata
registry.register(promptDefinition, {
  id: 'entity-extraction',
  version: '2.1.0',
  createdAt: new Date(),
  author: 'engineering@company.com',
  description: 'Improved entity extraction with better handling of ambiguous cases',
  changeLog: 'Added clarifying examples for organization vs. common nouns',
  testCasesPassing: 47,
  testCasesTotal: 47
});

// Execute with full observability
const prompt = registry.getLatestStable('entity-extraction');
const result = await executor.execute(prompt, { text: inputText });

// Analyze performance over time
const report = await executor.analyzePerformance(
  'entity-extraction',
  { start: oneWeekAgo, end: now }
);
console.log(`Success rate: ${report.successRate * 100}%`);
console.log(`P95 latency: ${report.p95LatencyMs}ms`);
console.log(`Common failures:`, report.commonFailures);
These practices create a production-grade prompt engineering discipline. Testing catches regressions. Version control enables collaboration and rollback. Observability provides visibility into production behavior. Together, they transform prompt engineering from ad-hoc text editing into a rigorous software engineering practice. The overhead is real, but so are the benefits: reliable systems that can evolve without fear, deployed by teams rather than individuals, with performance and costs tracked and optimized systematically.
Key Takeaways
- Start with Single Responsibility. Before writing a prompt, articulate its one job in a single sentence. If you need "and" to describe it, you probably need multiple prompts. Decompose god prompts into focused components that each do one thing well.
- Design for Extension. Build prompts with modifier points and composition in mind. Keep core logic stable and handle variations through parameters or composable fragments rather than editing the base prompt every time requirements change.
- Define Clear Contracts. Establish what each prompt promises to deliver—its inputs, outputs, and behavior—independent of how it delivers. This enables swapping implementations without breaking consumers and makes testing straightforward.
- Abstract Infrastructure Details. Don't let business logic depend directly on LLM providers, model names, or API specifics. Define interfaces for generation capabilities and inject concrete implementations. This makes your system adaptable to new models and providers.
- Invest in Engineering Discipline. Treat prompts as code: version control them, test them comprehensively, and monitor their behavior in production. The discipline seems like overhead initially but becomes essential as systems scale and teams grow.
Conclusion
The parallels between SOLID principles and prompt engineering aren't superficial—they reflect fundamental truths about managing complexity in systems built from composable parts. Whether those parts are classes or prompts, the same forces apply: responsibilities blur, dependencies tangle, and modifications cascade unless you actively design against these tendencies. The SOLID principles succeeded in object-oriented programming because they aligned technical structure with how systems actually evolve. Requirements change. Teams grow. Complexity compounds. SOLID provides patterns for managing these realities.
Prompt engineering is reaching the maturity point where these lessons become critical. Early experimentation with simple prompts worked fine. But as organizations build production systems with hundreds of prompts, complex orchestration chains, and multiple teams touching the same components, the ad-hoc approaches break down. Prompts become unmaintainable. Changes break unexpectedly. Debugging is impossible. Testing is non-existent. These are precisely the problems SOLID solved for object-oriented code, and the solutions transfer directly.
The investment in applying SOLID principles to your prompt engineering practices pays off the same way it did for software architecture: not through immediate velocity gains, but through sustained ability to evolve. Your systems remain maintainable as they grow. Your teams can work independently without stepping on each other. Your prompts can be improved without fear. Your infrastructure can adapt to new models and providers. The discipline feels constraining initially—all good engineering practices do—but it's the difference between systems that calcify and systems that adapt. As prompt-based architectures become the foundation of more critical systems, the engineering discipline matters more. SOLID principles provide a proven framework for building that discipline.
References
- Martin, Robert C. "The Principles of OOD." Object Mentor, 2000. Available at: http://www.butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod
- Martin, Robert C. Clean Architecture: A Craftsman's Guide to Software Structure and Design. Prentice Hall, 2017.
- Martin, Robert C. Agile Software Development, Principles, Patterns, and Practices. Prentice Hall, 2002.
- Gamma, Erich, et al. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.
- Fowler, Martin. "Inversion of Control Containers and the Dependency Injection pattern." MartinFowler.com, 2004. Available at: https://martinfowler.com/articles/injection.html
- OpenAI. "GPT Best Practices." OpenAI Documentation, 2024. Available at: https://platform.openai.com/docs/guides/prompt-engineering
- Anthropic. "Introduction to Prompt Design." Anthropic Documentation, 2024. Available at: https://docs.anthropic.com/claude/docs/introduction-to-prompt-design
- White, Jules, et al. "A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT." arXiv preprint arXiv:2302.11382, 2023.
- Zhou, Yongchao, et al. "Large Language Models Are Human-Level Prompt Engineers." arXiv preprint arXiv:2211.01910, 2022.
- Wei, Jason, et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." arXiv preprint arXiv:2201.11903, 2022.