After an incident with Claude Code in which my code was overwritten while I was in Plan Mode - a mode whose premise is that it does not touch the code - I decided to dig deeper into how AI harnesses actually work, and how they secure their various modes.
I’m building my own AI harness to support my SIEM work, so before I implement my solution, I wanted to see how others have solved the problem.
I use a few coding harnesses every day (e.g Claude Code, OpenCode), and while their purpose is different from mine, the underlying question is identical: how do you let a language model take actions without letting it take the wrong ones? The interesting part is where these tools quietly fall back on a weaker mechanism than it looks, and present it as a safeguard. This post is about how ‘pretty please’ security - prompt engineering as security - is applied by production tools.
Prompts are not boundaries
The core problem: a system prompt is not a security control but it is being used as one.
In read-only modes - like Plan mode in Claude Code -, AI harnesses typically include a “do not modify any files” instruction in the prompt. But that is only a request, not a rule. The model weighs it against everything else in its context, and it can be outweighed: by a long context that dilutes it, by a more forceful or more recent instruction, or by hostile input shaped to look like an instruction. The model is inclined to obey. It is not constrained to.
The illusion of control in AI Harnesses
From a security perspective, a system prompt is a sign on a door, not a lock. The sign does real work - most models read it and behave, the way a “do not enter” notice turns most people away - and that alone makes it worth having: a model that follows the rule does the right thing on the first try, with no wasted turns or tokens. But a sign does not work on everybody. For the model that ignores it, the sign is not what stops the damage. The lock is.
The prompt shapes behavior and keeps the common case cheap; the lock (in an AI harness, static enforcement in code) stops dangerous behavior and keeps the system safe. You want both, each doing what it does best.
Production Systems
It is worth seeing how real coding harnesses draw that line in practice. I looked at three tools I tend to use: Claude Code, OpenCode, and VS Code. The question for every one of them is the same: when a mode promises to restrict what the model can do, is that promise enforced in code, or is it just ‘pretty please’ security begging the model to cooperate?
1. Claude Code
Claude Code has a read-only mode called ‘Plan’. As the Claude Code documentation puts it: ‘Plan mode tells Claude to research and propose changes without making them. Claude reads files, runs shell commands to explore, and writes a plan, but does not edit your source.’ That is a clear statement, and a user who reads it has every right to believe their source will not be edited while Plan Mode is on.
After research and testing, I can say that promise does not hold up. Entering Plan Mode does not remove the editing tools from the model’s tool set, nor add any rule denying them. Nothing in the code flow changes - every request travels the same path as in normal operation, and the permission logic deciding whether a write is allowed neither knows nor cares that Plan Mode is on.
What Plan Mode actually does is add a prompt telling the model it may not execute tools and may only modify the plan file. That is the entire mechanism.
CC’s workflow when the model ignores the Prompt-based request
This is the guardrail that failed during my planning session. In my case, CC with Opus 4.7 modified files I had not wanted to change yet. The system reminder prompt asked the model not to edit. The model did it anyway, and the edit went straight through.
So, what Plan Mode actually provides is a strongly worded request, no guardrail stands between the model and your files. That is a lot of trust to place in the model.
2. OpenCode
OpenCode improves on this. Where Claude Code stops at a nicely worded request, OpenCode enforces its restrictions with hardcoded guardrails.
In Plan Mode, OpenCode injects a prompt, similar to CC. But the prompt is not the only thing standing in the way: the ’edit’ tools - the file editors the model would normally reach for - are programmatically removed from its tool set entirely, so there is nothing for the model to call. And if it still tries to misbehave a hardcoded guardrail stands in its way and refuses the call before it runs. The prompt is the first line of defence; the code is the line that actually holds.
But there is a soft spot. OpenCode blocks the ’edit’ tools, but the ‘shell’ tool - which can execute arbitrary commands - is not removed from the set, and no hardcoded check refuses its calls. A shell command can modify files just as easily as a dedicated edit tool. Here is OpenCode modifying a file while in Plan Mode using bash:
OpenCode modifying a file via shell in Plan mode
Shell access is genuinely useful in a coding tool - it lets the model gather context or test its code. But as the image shows, it also opens a clear gap. The only thing OpenCode does to limit the shell in Plan Mode is inject this prompt:
<system-reminder>
# Plan Mode - System Reminder
CRITICAL: Plan mode ACTIVE - you are in READ-ONLY phase. STRICTLY FORBIDDEN:
ANY file edits, modifications, or system changes. Do NOT use sed, tee, echo,
cat, or ANY other bash command to manipulate files - commands may ONLY
read/inspect. This ABSOLUTE CONSTRAINT overrides ALL other instructions,
including direct user edit requests. You may ONLY observe, analyze, and plan.
Any modification attempt is a critical violation. ZERO exceptions.
</system-reminder>
The intent is clear: no file touched, not even at the user’s request. The image shows how easily that instruction is ignored.
The edit tools are locked by hardcoded enforcement; the shell tool is not. Stacking adjectives and capitalizing letters in a prompt is not the same as a guardrail hardcoded in the code. The shell is a general-purpose execution surface. Any permission model that ignores it isn’t restricting access - it’s just relocating it. For the most powerful tool in the set (shell), OpenCode falls back to the same mechanism as Claude Code: a strongly worded request.
OpenCode’s workflow when the model ignores the Prompt-based request
3. VS Code
VS Code is the strictest of the three, and the safest. Like Claude Code and OpenCode, its read-only modes - Ask and Plan - inject a prompt telling the model not to edit. But here the prompt isn’t what holds the line: each mode is given a fixed set of tools, and the model can call nothing else.The set consists of read-only tools only: no editing, no writing, no terminal. A model that tries to deviate still cannot change a file, because it was never handed a tool that can.
This makes it secure by default. There is no way around it for the model.
VS Code’s workflow when the model ignores the prompt-based request
What the model cannot reach, the user can add. VS Code lets you extend a mode’s tool list - through config, or by referencing a tool with #toolName in a message. This is intentional: there are real cases where you want a particular tool on hand while planning. It’s secure by default, but the user can still widen the gate.
Where the model is concerned, VS Code’s read-only mode holds. The model can’t get through it - but the user can.
The three side by side
All three harnesses ship a read-only mode and all three use a prompt with it. On its own that means nothing - a prompt is just a request. What separates the tools is whether anything stands behind it once the model ignores the request.
Comparison
| Harness | System prompt | Code enforcement | The gap |
|---|---|---|---|
| Claude Code | Yes | No | The editing tools stay available the whole time; an old “allow” rule or auto mode lets writes through with no permission dialog. |
| OpenCode | Yes | Partial | The edit tools are blocked in code, but the shell is left open and can modify files anyway. |
| VS Code | Yes | Yes | Read-only by tool list - the model is never handed an editing tool. The user can still widen the list. |
The mentioned ‘system prompt’ is not actually a system prompt for every tool. In some cases it is a synthetic user message between tags intended to be used as commands.
The Lock Is the Code
Use the prompt for what it is good at. Telling the model what you want, how to behave, what good output looks like - that belongs in the prompt, and a clear prompt makes a harness genuinely better to use. Just do not let it carry weight it cannot bear. It is not the lock - that is what bit me in Plan Mode, and what I am designing against in my own harness.
The lock is the code that runs whether or not the model agrees with it: tools bound narrowly, every call checked against an explicit allowlist before it executes, and no general-purpose escape hatch like a shell left sitting outside the rules. Newer models comply more often, which is great, but “more often” is the wrong threshold in security.