Prompt Anti-patterns Engineers Fall Into

Many prompt failures come from familiar engineering anti-patterns applied to natural language. This article identifies the most common prompt anti-patterns and explains why they cause failures in production.

level: intermediate
topics: prompting
tags: prompting, anti-patterns, llm, production, system-design

Technical Debt Starts at the Prompt

Engineers recognize anti-patterns in code:

  • God objects that do too much
  • Magic numbers without explanation
  • Tight coupling that prevents change
  • Implicit assumptions that break silently

The same anti-patterns appear in prompts.

When prompts become unmaintainable, systems fail in production. This article catalogs the most common prompt anti-patterns, explains why they persist, and shows how to avoid them.


Anti-pattern 1: The Kitchen Sink Prompt

What it looks like

prompt = """
You are a helpful AI assistant that is accurate, professional, concise,
thorough, detailed, careful, precise, thoughtful, and accurate. Always
provide accurate information. Be helpful. Be polite. Be professional.
Think carefully before responding. Make sure your response is accurate.
Answer the user's question: {question}
"""

Why engineers do this

  • Trying to prevent every possible failure mode
  • Adding instructions after each incident
  • Believing more guidance = better behavior

Why it fails

  1. Signal dilution: The model cannot distinguish important instructions from redundant ones
  2. Token waste: Uses context on repetitive fluff
  3. Unmaintainable: No single source of truth for behavior
  4. False confidence: Feels safer but provides no guarantees

What to do instead

# Clear hierarchy of concerns
system_instructions = """
Role: Data analysis assistant
Task: Answer questions about provided datasets
Constraints:
- Only reference data explicitly provided
- State when information is unavailable
- Format output as JSON
"""

prompt = f"""
{system_instructions}

Dataset: {data}
Question: {question}
"""

Principle: Each instruction should serve a distinct, testable purpose.
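
One way to keep instructions honest is to pair each constraint with its own check. A minimal sketch, assuming a hypothetical generate() wrapper around the model call:

import json

def is_valid_json(output: str) -> bool:
    # Checks the "Format output as JSON" constraint on its own
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False

# generate() is a placeholder for whatever function calls your model
output = generate(prompt)
assert is_valid_json(output), "constraint violated: output is not valid JSON"

If a constraint cannot be checked this way, question whether it belongs in the prompt at all.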


Anti-pattern 2: Magic Prompt Phrases

What it looks like

# Copied from internet without understanding
prompt = """
Let's think step by step.
Take a deep breath.
This is important to my career.
I will tip you $200 for a perfect response.
"""

Why engineers do this

  • Saw it in a blog post or paper
  • Worked once in testing
  • Cargo cult prompting

Why it fails

  1. Model-specific: May work on GPT-4, break on Claude
  2. Version-specific: Training data changes behavior
  3. Context-specific: “Step by step” helps math, not classification
  4. Unmeasured: No evidence it helps in this use case

What to do instead

# Test whether a technique applies to your case
# (run_prompt and compare_metrics stand in for your evaluation harness)
def evaluate_technique(prompt_a, prompt_b, test_cases):
    results_a = [run_prompt(prompt_a, case) for case in test_cases]
    results_b = [run_prompt(prompt_b, case) for case in test_cases]

    # Return whether prompt_b beats prompt_a on your chosen metric
    return compare_metrics(results_a, results_b)

# Only use techniques you have measured
baseline_prompt = "Solve this problem: {problem}"
cot_prompt = "Solve this step by step. Identify the knowns, then work through the solution: {problem}"

if evaluate_technique(baseline_prompt, cot_prompt, math_problems):
    use_cot = True

Principle: Techniques must be validated for your specific use case and model.
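
Because these effects are model- and version-specific, it is worth rerunning the comparison whenever the underlying model changes. A sketch under the assumption that run_prompt accepts a model argument (a hypothetical signature):

# Hypothetical model identifiers; substitute the versions you actually run
MODELS = ["model-a-2024-06", "model-b-2025-01"]

for model in MODELS:
    results_base = [run_prompt(baseline_prompt, case, model=model) for case in math_problems]
    results_cot = [run_prompt(cot_prompt, case, model=model) for case in math_problems]
    print(model, compare_metrics(results_base, results_cot))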


Anti-pattern 3: Implicit Context Assumptions

What it looks like

# Assumes model "knows" what we mean
prompt = f"Summarize this: {text}"

# Assumes model has external knowledge
prompt = "What is the best practice for this API?"

# Assumes model remembers earlier context
prompt = "Now apply the same format to this data"

Why engineers do this

  • Natural language feels conversational
  • Assumes model has persistent memory
  • Forgets that context is explicit, not implicit

Why it fails

  1. The context window is the only memory: Previous turns may be truncated or absent
  2. No external knowledge base: The model cannot look up current information
  3. Ambiguous references: “This” and “same” have no fixed meaning outside the prompt
  4. Untestable: You cannot verify what the model actually received

What to do instead

# Explicit context in every prompt
prompt = f"""
Task: Summarize the following text in 3 bullet points.
Format: Markdown list, 20 words per bullet maximum.
Text: {text}
Summary:
"""

# Include all required information
prompt = f"""
You previously formatted data as CSV with headers: {headers}
Format this new data the same way:
{new_data}
"""

# Self-contained prompts
def create_prompt(task, data, format_spec):
    return f"""
    Task: {task}
    Input: {data}
    Output format: {format_spec}
    """

Principle: Every prompt must be self-contained and explicit.
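
A cheap guard against implicit context is a lint step that flags unresolved referents before a prompt ships. This is only a heuristic, but it catches the most common offenders:

import re

# Words that usually lean on context the model may not have
AMBIGUOUS_REFERENTS = re.compile(
    r"\b(this|that|same|above|previous|earlier)\b", re.IGNORECASE
)

def lint_referents(prompt_template: str) -> list[str]:
    return AMBIGUOUS_REFERENTS.findall(prompt_template)

lint_referents("Now apply the same format to this data")
# -> ['same', 'this']: each hit marks context that must be made explicit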


Anti-pattern 4: Prompt Spaghetti

What it looks like

# Concatenated string builder
prompt = "You are an assistant"
if user_premium:
    prompt += " with access to advanced features"
if user_history:
    prompt += f"\nPrevious context: {user_history[:100]}"
if urgent:
    prompt += "\nThis is urgent"
prompt += f"\nUser question: {question}"

# Impossible to test or version

Why engineers do this

  • Incremental feature additions
  • Conditional logic seems natural
  • Each engineer adds their piece

Why it fails

  1. Combinatorial variants: Each flag combination produces a different prompt
  2. Untestable: Cannot reproduce the exact prompt a given user received
  3. No version control: Cannot compare or rollback
  4. Hidden dependencies: Conditions coupled to user state

What to do instead

# Template-based composition
from jinja2 import Template

base_template = Template("""
Role: {{role}}
{% if context %}
Context: {{context}}
{% endif %}
Task: {{task}}
Question: {{question}}
""")

def build_prompt(user, question):
    return base_template.render(
        role="assistant",
        context=get_context(user) if user.premium else None,
        task=get_task_description(user),
        question=question
    )

# Version control templates
# prompts/assistant_v1.jinja
# prompts/assistant_v2.jinja

# Test each template variant
# (load_template, generate, validate, and test_data come from your test harness)
import pytest

@pytest.mark.parametrize("template_version", ["v1", "v2"])
def test_prompt_template(template_version):
    template = load_template(template_version)
    output = generate(template.render(test_data))
    assert validate(output)

Principle: Prompts should be composable, testable, and versioned like code.
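
Versioning pays off at runtime too: logging the template version alongside a hash of the fully rendered prompt makes every production call reproducible. A minimal sketch (the field names are illustrative):

import hashlib
import json
import logging

logger = logging.getLogger("prompts")

def log_rendered_prompt(template_version: str, rendered: str) -> str:
    # The hash lets you confirm later exactly which prompt text was sent
    digest = hashlib.sha256(rendered.encode("utf-8")).hexdigest()[:12]
    logger.info(json.dumps({"template_version": template_version, "prompt_sha": digest}))
    return digest

prompt = build_prompt(user, question)   # rendered from the template above
log_rendered_prompt("assistant_v2", prompt)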


Anti-pattern 5: Prompt as Exception Handler

What it looks like

# Using prompt to fix code problems
prompt = """
Parse this JSON. If it fails, try to fix it.
If there are missing fields, infer reasonable defaults.
If the data looks wrong, correct it.
"""

# Prompt as error recovery
try:
    result = parse_strict(data)
except Exception:
    # Let the AI fix it
    result = llm.generate(f"Fix this data: {data}")

Why engineers do this

  • Seems easier than proper error handling
  • “AI will figure it out”
  • Avoiding deterministic validation logic

Why it fails

  1. Non-deterministic error handling: Different errors produce different fixes
  2. Masks root causes: Problems propagate silently
  3. Expensive: LLM calls for logic that should be deterministic
  4. Unreliable: Model may “fix” data incorrectly

What to do instead

# Deterministic validation + clear failure modes
# (Ok/Err form a simple result type; Schema is your pydantic model)
import json
from pydantic import ValidationError

def parse_data(data: str) -> Result:
    try:
        parsed = json.loads(data)
        validated = Schema.model_validate(parsed)
        return Ok(validated)
    except json.JSONDecodeError as e:
        return Err(f"Invalid JSON: {e}")
    except ValidationError as e:
        return Err(f"Schema validation failed: {e}")

# Use AI only for genuinely ambiguous tasks
def extract_structured_data(text: str) -> Result:
    prompt = f"""
    Extract information from text.
    Required fields: name, email, date
    If a field is not present, set to null (do not guess).

    Text: {text}
    Output (JSON):
    """

    output = llm.generate(prompt)

    # Validate AI output deterministically
    try:
        parsed = json.loads(output)
        validated = Schema.model_validate(parsed)
        return Ok(validated)
    except Exception as e:
        return Err(f"AI output validation failed: {e}")

Principle: Use deterministic logic for deterministic problems. Use AI for ambiguity, not error recovery.
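
When the model's output fails validation, handle that failure deterministically as well, for example with a bounded retry instead of an open-ended "fix it" prompt. A sketch building on extract_structured_data above (Ok is the success case of the result type used earlier):

def extract_with_retry(text: str, max_attempts: int = 2) -> Result:
    result = Err("not attempted")
    for _ in range(max_attempts):
        result = extract_structured_data(text)
        if isinstance(result, Ok):
            return result
    # Surface the failure explicitly instead of asking the model to "fix" it
    return result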


Anti-pattern 6: Vague Success Criteria

What it looks like

# No measurable definition of success
prompt = "Generate a good summary of this article"

# Subjective quality bar
prompt = "Write a professional email response"

# Undefined behavior
prompt = "Classify this text appropriately"

Why engineers do this

  • Natural language feels self-explanatory
  • Assuming model “knows” what quality means
  • Avoiding explicit specification work

Why it fails

  1. Cannot evaluate: No objective pass/fail criteria
  2. Cannot improve: Cannot measure if changes help
  3. Ambiguous to model: “Good” and “professional” are undefined
  4. Production risk: Unknown failure modes

What to do instead

# Explicit success criteria
prompt = """
Summarize this article.
Requirements:
- Exactly 3 sentences
- Include main topic in first sentence
- Include key conclusion in last sentence
- Total length: 50-75 words

Article: {article}
Summary:
"""

# Measurable validation
def validate_summary(summary: str, requirements: dict) -> tuple[bool, dict]:
    sentences = [s for s in summary.split('.') if s.strip()]
    word_count = len(summary.split())

    checks = {
        "sentence_count": len(sentences) == 3,
        "word_count": 50 <= word_count <= 75,
        "has_conclusion": requirements["conclusion_keyword"] in summary
    }

    return all(checks.values()), checks

# Test against criteria
summary = generate_summary(article)
passed, details = validate_summary(summary, requirements)
assert passed, f"Validation failed: {details}"

Principle: Every prompt should have explicit, measurable success criteria.
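
With criteria this explicit, you can report a pass rate over a whole test set instead of eyeballing single outputs. A short sketch, assuming test_articles is a list of (article, requirements) pairs and generate_summary wraps the model call:

passes = []
for article, requirements in test_articles:
    summary = generate_summary(article)
    passed, _details = validate_summary(summary, requirements)
    passes.append(passed)

pass_rate = sum(passes) / len(passes)
print(f"Success criteria met on {pass_rate:.0%} of test articles")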


Anti-pattern 7: Prompt Inheritance Hell

What it looks like

# Base prompt
base = "You are a helpful assistant"

# Child prompt inherits and modifies
support = base + "\nYou specialize in customer support"

# Grandchild prompt
urgent_support = support + "\nHandle urgent requests first"

# Great-grandchild prompt
vip_urgent_support = urgent_support + "\nUser is VIP"

# Impossible to track what final prompt contains

Why engineers do this

  • Reusing prompts feels DRY
  • Inheritance is familiar from OOP
  • Incremental modifications seem clean

Why it fails

  1. Hidden complexity: Final prompt is sum of all inheritance
  2. Fragile changes: Modifying base breaks all children
  3. Testing nightmare: Must test entire inheritance chain
  4. Conflicting instructions: Child may contradict parent

What to do instead

# Composition over inheritance
class PromptComponents:
    @staticmethod
    def role(role_type: str) -> str:
        return f"Role: {role_type}"

    @staticmethod
    def task(task_desc: str) -> str:
        return f"Task: {task_desc}"

    @staticmethod
    def constraints(rules: list[str]) -> str:
        return "Constraints:\n" + "\n".join(f"- {rule}" for rule in rules)

# Compose explicitly
def build_support_prompt(user: User, request: str) -> str:
    components = [
        PromptComponents.role("customer support agent"),
        PromptComponents.task("resolve customer request"),
        PromptComponents.constraints([
            "Be professional and empathetic",
            "Reference order history if available",
            "Escalate if beyond scope"
        ])
    ]

    if user.is_vip:
        components.append("Priority: VIP customer")

    if request.is_urgent:
        components.append("Priority: Urgent request")

    components.append(f"Request: {request.text}")

    return "\n\n".join(components)

# Test exact combinations
def test_vip_urgent_prompt():
    user = User(is_vip=True)
    request = Request(text="Help", is_urgent=True)
    prompt = build_support_prompt(user, request)
    assert "VIP" in prompt
    assert "Urgent" in prompt

Principle: Compose prompts explicitly. Avoid inheritance.


Anti-pattern 8: Optimization by Guessing

What it looks like

# Tweaking without measurement
# "I think this sounds better"
old_prompt = "Summarize this text concisely"
new_prompt = "Provide a brief summary of this text"

# Deploy without testing
deploy_prompt(new_prompt)

# "It feels faster"
temperature = 0.3  # Changed from 0.5, seems better?

Why engineers do this

  • Lack of evaluation infrastructure
  • Intuition from small manual tests
  • Pressure to ship quickly

Why it fails

  1. No baseline: Cannot tell if change helps
  2. Regression risk: May break existing cases
  3. Wasted effort: Random changes rarely help
  4. False confidence: Feels better, performs worse

What to do instead

# Systematic evaluation
from statistics import mean

class PromptEvaluator:
    def __init__(self, test_cases: list[TestCase]):
        self.test_cases = test_cases

    def evaluate(self, prompt: str) -> Metrics:
        # score_output returns a per-case result with .score, .latency, .tokens
        results = []
        for case in self.test_cases:
            output = generate(prompt, case.input)
            results.append(self.score_output(output, case.expected))

        return Metrics(
            accuracy=mean(r.score for r in results),
            latency=mean(r.latency for r in results),
            cost=sum(r.tokens for r in results)
        )

    def compare(self, prompt_a: str, prompt_b: str):
        metrics_a = self.evaluate(prompt_a)
        metrics_b = self.evaluate(prompt_b)

        return {
            "accuracy_change": metrics_b.accuracy - metrics_a.accuracy,
            "latency_change": metrics_b.latency - metrics_a.latency,
            "cost_change": metrics_b.cost - metrics_a.cost
        }

# Test before deploying
evaluator = PromptEvaluator(load_test_cases())
comparison = evaluator.compare(old_prompt, new_prompt)

if comparison["accuracy_change"] > 0.05:  # 5% improvement
    deploy_prompt(new_prompt)
else:
    print("No significant improvement, keeping old prompt")

Principle: Measure changes against baseline test cases before deploying.


Recognition Checklist

Your prompts have anti-patterns if:

  • You keep adding instructions, but quality does not improve
  • Different engineers modify prompts in different ways
  • You cannot reproduce the exact prompt from production logs
  • You copy prompt patterns without testing them
  • There is no way to measure whether a change helps
  • Prompts grow longer with every bug fix
  • Prompts are used to handle edge cases that code should catch
  • Nobody can explain why a prompt is structured the way it is

Refactoring Prompts

Step 1: Establish baseline metrics

# Capture current performance
baseline = evaluate_current_prompt(test_cases)

Step 2: Identify anti-patterns

# Which patterns exist in current prompts?
audit_prompts(prompt_directory)

Step 3: Refactor one pattern at a time

# Fix one anti-pattern
refactored = fix_kitchen_sink_prompt(current_prompt)

# Test impact
new_metrics = evaluate_prompt(refactored, test_cases)
assert new_metrics >= baseline  # No regression

Step 4: Add guardrails

Prevent future anti-patterns:

  • Add prompt linting rules (see the sketch below)
  • Require test coverage for prompts
  • Version control all prompts
  • Mandate evaluation before deployment
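
A linting rule can be as simple as a script that flags the patterns above before a prompt is merged. A rough sketch of one such check for kitchen-sink prompts (the word list and thresholds are arbitrary starting points):

import re

REDUNDANT_ADJECTIVES = {"accurate", "helpful", "professional", "thorough", "careful"}

def lint_kitchen_sink(prompt_text: str, max_words: int = 300) -> list[str]:
    warnings = []
    words = re.findall(r"[a-z']+", prompt_text.lower())
    if len(words) > max_words:
        warnings.append(f"prompt is {len(words)} words; consider splitting concerns")
    repeated = [w for w in REDUNDANT_ADJECTIVES if words.count(w) > 1]
    if repeated:
        warnings.append(f"repeated vague qualifiers: {', '.join(repeated)}")
    return warnings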

Conclusion

Prompt engineering is software engineering.

Anti-patterns in prompts are as damaging as anti-patterns in code:

  • Technical debt accumulates
  • Systems become fragile
  • Maintenance becomes expensive
  • Failures become unpredictable

Avoiding anti-patterns requires discipline:

  1. Explicit over implicit: State all assumptions
  2. Measured over guessed: Test changes systematically
  3. Composed over concatenated: Build prompts like components
  4. Versioned over ad-hoc: Track prompt changes like code
  5. Validated over hoped: Define success criteria

Treat prompts like interfaces: designed, tested, and maintained with engineering rigor.
