Prompt Engineering Is Not a Template Library

I keep seeing prompt libraries sold like recipe books.

Use this phrase. Add this role. Put the instruction in this order. Ask the model to think step by step. Put delimiters around the context. Use this magic sentence.

Some of those tricks work.

The problem is that they do not work as universal laws.

A prompt is not a spell

The same prompt can behave differently across models, tasks, datasets, temperatures, tools, and product contexts.

That should not be surprising. A prompt is not an independent artifact. It is part of a system.

The model matters. The input distribution matters. The available tools matter. The evaluation target matters. The cost of a false positive matters. The user interface around the output matters.

So when somebody says “this prompt always works,” I usually hear “this prompt worked in one narrow situation.”

That is still useful.

It is just not science yet.

Templates hide the evaluation problem

The real work is not writing a pretty instruction.

The real work is knowing whether the output improved.

For a toy example, you can eyeball it. For a product, eyeballing does not scale. You need examples, failure cases, regression checks, and some way to compare runs.

Without evaluation, prompt engineering becomes taste theatre.

You tweak a sentence, the next output feels better, and you declare victory. Then a different user input breaks it tomorrow.

I have done this. It feels productive. It is often just moving the failure around.

The product layer matters more

In agent products, the prompt is only one part of the behavior.

What context enters the model?

Which tools are exposed?

What schema must the output satisfy?

What state is persisted?

What happens when the model is uncertain?

Who approves destructive actions?

Those choices often matter more than the exact wording of the system prompt.

A mediocre prompt inside a well-designed workflow can be usable. A beautiful prompt inside a vague workflow still creates mess.

What I still keep from prompt engineering

I am not saying prompts do not matter.

I still write them carefully. I still avoid ambiguous goals. I still separate task, context, constraints, and output format. I still give examples when the shape is hard to infer.

But I no longer treat prompts as the product.

Prompts are product behavior. That means they need versioning, tests, review, and rollback.

If a prompt change can change user-visible behavior, it should be treated like code or configuration, not like a note in someone’s chat history.

That is the level where prompt engineering becomes less mystical.

Not a library of magic sentences.

A controlled part of a system.

A prompt is not a spell

Templates hide the evaluation problem

The product layer matters more

What I still keep from prompt engineering

Think this is wrong?