A Prompt Is Not a Lock: Read-Only Modes in AI Harnesses

After an incident with Claude Code in which my code was overwritten while I was in Plan Mode - a mode whose premise is that it does not touch the code - I decided to dig deeper into how AI harnesses actually work, and how they secure their various modes.

I’m building my own AI harness to support my SIEM work, so before I implement my solution, I wanted to see how others have solved the problem.

I use a few coding harnesses every day (e.g. Claude Code, OpenCode), and while their purpose is different from mine, the underlying question is identical: how do you let a language model take actions without letting it take the wrong ones? The interesting part is the gap between the promise and the actual mechanism: a mode is documented as safe, but nothing in the code actually enforces it. This post is about how ‘pretty please’ security - prompt engineering as security - is applied by production tools.

Prompts are not boundaries

The core problem: a system prompt is not a security control, but it is being used as one.

In read-only modes - like Plan Mode in Claude Code - AI harnesses typically include a “do not modify any files” instruction in the prompt. But that is only a request, not a rule. The model weighs it against everything else in its context, and it can be outweighed: by a long context that dilutes it, by a more forceful or more recent instruction, or by hostile input shaped to look like an instruction. The model is inclined to obey. It is not constrained to.

Featured The illusion of control in AI Harnesses

From a security perspective, a system prompt is a sign on a door, not a lock. The sign does real work - most models read it and behave, the way a “do not enter” notice turns most people away - and that alone makes it worth having: a model that follows the rule does the right thing on the first try, with no wasted turns or tokens. But a sign does not work on everybody. For the model that ignores it, the sign is not what stops the damage. The lock is.

The prompt shapes behavior and keeps the common case cheap; the lock (in an AI harness, static enforcement in code) stops dangerous behavior and keeps the system safe. You want both, each doing what it does best.

Production Systems

It is worth seeing how real coding harnesses draw that line in practice. I looked at three tools I tend to use: Claude Code, OpenCode, and VS Code. The question for every one of them is the same: when a mode promises to restrict what the model can do, is that promise enforced in code, or is it just ‘pretty please’ security begging the model to cooperate?

1. Claude Code

Claude Code has a read-only mode called ‘Plan’. As the Claude Code documentation puts it: ‘Plan mode tells Claude to research and propose changes without making them. Claude reads files, runs shell commands to explore, and writes a plan, but does not edit your source.’ That is a clear statement, and a user who reads it has every right to believe their source will not be edited while Plan Mode is on.

After research and testing, I can say the promise does not hold up. Entering Plan Mode does not remove the editing tools from the model’s tool set, nor add any rule denying them. Nothing in the code flow changes - every request travels the same path as in normal operation, and the permission logic deciding whether a write is allowed neither knows nor cares that Plan Mode is on.

What Plan Mode actually does is add a prompt telling the model it may not execute tools and may only modify the plan file. That is the entire mechanism.

Claude Code’s workflow and guardrails CC’s workflow when the model ignores the Prompt-based request

This is the guardrail that failed during my planning session. In my case, CC with Opus 4.7 modified files I did not want to change yet. The system reminder prompt asked the model not to edit. The model did it anyway, and the edit went straight through.

So, what Plan Mode actually provides is a strongly worded request; no guardrail stands between the model and your files. That is a lot of trust to place in the model.

2. OpenCode

OpenCode improves on this. Where Claude Code stops at a nicely worded request, OpenCode enforces its restrictions with hardcoded guardrails.

In Plan Mode, OpenCode injects a prompt, similar to CC. But the prompt is not the only thing standing in the way: the ’edit’ tools - the file editors the model would normally reach for - are programmatically removed from its tool set entirely, so there is nothing for the model to call. And if it still tries to misbehave, a hardcoded guardrail stands in its way and refuses the call before it runs. The prompt is the first line of defence; the code is the line that actually holds.

But there is a soft spot. OpenCode blocks the ’edit’ tools, but the ‘shell’ tool - which can execute arbitrary commands - is not removed from the set, and no hardcoded check refuses its calls. A shell command can modify files just as easily as a dedicated edit tool. Here is OpenCode modifying a file while in Plan Mode using bash:

OpenCode modifying a file via shell in Plan Mode OpenCode creating a file via shell in Plan Mode

Shell access is genuinely useful in a coding tool - it lets the model gather context or test its code. But as the image shows, it also opens a clear gap. The only thing OpenCode does to limit the shell in Plan Mode is to inject this prompt:

<system-reminder>
# Plan Mode - System Reminder

CRITICAL: Plan mode ACTIVE - you are in READ-ONLY phase. STRICTLY FORBIDDEN:
ANY file edits, modifications, or system changes. Do NOT use sed, tee, echo,
cat, or ANY other bash command to manipulate files - commands may ONLY
read/inspect. This ABSOLUTE CONSTRAINT overrides ALL other instructions,
including direct user edit requests. You may ONLY observe, analyze, and plan.
Any modification attempt is a critical violation. ZERO exceptions.
</system-reminder>

The intent is clear: no file touched, not even at the user’s request. The image shows how easily that instruction is ignored.

The edit tools are locked by hardcoded enforcement; the shell tool is not. Stacking adjectives and capitalizing letters in a prompt is not the same as a guardrail hardcoded in the code. The shell is a general-purpose execution surface. Any permission model that ignores it isn’t restricting access - it’s just relocating it. For the most powerful tool in the set (shell), OpenCode falls back to the same mechanism as Claude Code: a strongly worded request.

OpenCode’s workflow and guardrails OpenCode’s workflow when the model ignores the Prompt-based request

Any tool can have bugs, but these are not bugs. The carefully written prompt above indicates that prompt-as-enforcement is the deliberate design.

3. VS Code

VS Code is the strictest of the three, and the safest. Like Claude Code and OpenCode, its read-only modes - Ask and Plan - inject a prompt telling the model not to edit. But here the prompt isn’t what holds the line: each mode is given a fixed set of tools, and the model can call nothing else. The set consists of read-only tools only: no editing, no writing, no terminal. A model that tries to deviate still cannot change a file, because it was never handed a tool that can.

This makes it secure by default. There is no way around it for the model.

VS Code’s workflow and guardrails VS Code’s workflow when the model ignores the prompt-based request

What the model cannot reach, the user can add. VS Code lets you extend a mode’s tool list - through config, or by referencing a tool with #toolName in a message. This is intentional: there are real cases where you want a particular tool on hand while planning. It’s secure by default, but the user can still widen the gate.

Where the model is concerned, VS Code’s read-only mode holds. The model can’t get through it - but the user can. Factor that in.

The three side by side

All three harnesses ship a read-only mode, and all three use a prompt with it. On its own that means nothing - a prompt is just a request. What separates the tools is whether anything stands behind it once the model ignores the request.

Comparison

Harness	System prompt	Code enforcement	The gap
Claude Code	Yes	No	The editing tools stay available the whole time; an old “allow” rule or auto mode lets writes through with no permission dialog.
OpenCode	Yes	Partial	The edit tools are blocked in code, but the shell is left open and can modify files anyway.
VS Code	Yes	Yes	Read-only by tool list - the model is never handed an editing tool. The user can still widen the list.

The mentioned ‘system prompt’ is not actually a system prompt for every tool. In some cases it is a synthetic user message wrapped in tags and intended to be used as commands.

The Lock Is the Code

Use the prompt for what it is good at. Telling the model what you want, how to behave, what good output looks like - that belongs in the prompt, and a clear prompt makes a harness genuinely better to use. Just do not let it carry weight it cannot bear. It is not the lock - that is what bit me in Plan Mode, and what I am designing against in my own harness.

The lock is the code that runs whether or not the model agrees with it: tools bound narrowly, every call checked against an explicit allowlist before it executes, and no general-purpose escape hatch like a shell left sitting outside the rules. Newer models comply more often, which is great, but “more often” is the wrong threshold in security.