The Core Problem: LLMs Are Not Deterministic Functions
Here is the uncomfortable truth that the prompt injection discourse refuses to acknowledge:you cannot secure a non-deterministic function at the input layer.
Every security mechanism we use in traditional software assumes a deterministic system. Input validation works because you can enumerate valid inputs. SQL injection prevention works because SQL has a predictable grammar. XSS protection works because HTML rendering is deterministic.
LLMs break all of these assumptions. Given the same input, an LLM can produce different outputs.Given adversarial inputs, an LLM can be coerced into producing outputs that violate your system constraints. And crucially: there is no mathematical proof that any input validation scheme can prevent this.
This is not a bug. This is the fundamental nature of probabilistic models.
What the Industry Missed
The prompt injection industry emerged from a reasonable premise: if adversaries can manipulate model outputs by crafting malicious inputs, we should filter those inputs.
But this assumes the problem is input classification. It is not. The problem is output trustworthiness in a non-deterministic system.
Consider what happens when you deploy a "prompt injection firewall":
- You block inputs that match known attack patterns
- Adversaries adapt with new patterns
- You update your filter rules
- Adversaries find edge cases
- You add more rules
- False positives increase
- Legitimate use cases break
- You tune the filters
- Security degrades
This is not a solvable problem. You are playing whack-a-mole with an infinite mole space.
Meanwhile, your actual security requirement — "ensure the AI does not execute unauthorized actions" — remains unaddressed.
Why Output Routing Is the Only Mathematically Sound Approach
The correct security model for non-deterministic systems is not input filtering. It is output validation with confidence-based routing.
Here is what that looks like in practice:
const response = await llm.generate(userPrompt);
// Extract the proposed action
const action = parseAction(response);
// Calculate confidence score
const confidence = await confidenceEngine.score({
action: action,
context: userContext,
historicalAccuracy: actionHistory
});
// Policy-based routing
if (confidence > 0.95 && action.riskLevel === 'low') {
await executeAction(action);
} else if (confidence > 0.75 && action.riskLevel === 'medium') {
await reviewQueue.enqueue(action, 'approval_required');
} else {
await reviewQueue.enqueue(action, 'high_risk_review');
}Notice what changed:
- We do not try to block malicious inputs
- We let the LLM process any input
- We evaluate the proposed action, not the prompt
- We route based on confidence and risk, not pattern matching
- High-confidence, low-risk actions execute automatically
- Everything else goes to human review
This is mathematically sound because we are securing the decision boundary, not the input space.
The Practical Implication
If you are building AI systems for production, you need to stop thinking about prompt injection as a security problem and start thinking about decision governance.
Your security model should not be:
- Block adversarial prompts
- Filter malicious inputs
- Sanitize user requests
Your security model should be:
- Evaluate every proposed action
- Calculate confidence for each decision
- Route based on risk and confidence
- Require human review for uncertain or high-risk actions
- Log everything immutably
This is not just better security. This is the only approach that satisfies EU AI Act compliance requirements.
Because Article 13 does not ask "did you block malicious prompts?" It asks: "Can you explain, with evidence, why your AI made this decision?"
And the only way to answer that question is to instrument the decision layer, not the input layer.