
Two days after Anthropic released Claude Cowork to the public, security researchers at PromptArmor published a working proof of concept. A Word document with text set to 1-point font, white on white, with line spacing at 0.1 contained instructions that were invisible to any human reader. When a user uploaded the file and asked Cowork to process their financial documents, Cowork located those files and silently uploaded them, including partial Social Security numbers, to an attacker's Anthropic account.
The attack completed without triggering any approval prompt, and the victim had no way to know it had occurred.
The underlying vulnerability had been reported to Anthropic three months before Cowork launched, acknowledged by the team, and still not patched when the product shipped.
In the months after launch, researchers at PromptArmor, Zenity Labs, and LayerX Security published detailed attack analyses. Four are worth walking through.
Cowork runs with more access than most desktop software you've installed. What it can do within its VM sandbox, and under what conditions, is the foundation for understanding any security risk associated with using it.
Cowork runs in a sandboxed virtual machine on your computer. Within that sandbox, it has significant reach:
When you send a message from your phone, Cowork executes it using your desktop's full granted permissions. The more access you've enabled, the more damage any successful attack can do.
For a detailed breakdown of how Claude Cowork and Claude Code differ in environment, permissions, and appropriate use cases, our post on Claude Cowork vs Claude Code covers the comparison in full. The short version: Cowork is a desktop app for knowledge workers running in a VM sandbox; Claude Code is a terminal tool for developers running with full filesystem permissions. What they share is the attack vector: both process untrusted content, and in both tools, content that contains hidden instructions can redirect Claude's behavior.
That list of capabilities is what Cowork is designed to do. The incidents below show what happens when those same capabilities are used against you.
The PromptArmor attack shows exactly how prompt injection scales when an AI tool can take real actions. All it required was getting a user to process one external file. The malicious instruction was hidden in the document's formatting properties, invisible in any standard document viewer.
Blocking the exfiltration at the network level was harder than it sounds. Cowork's VM restricts most outbound connections, but api.anthropic.com is on the whitelist, because Cowork has to communicate with Anthropic to function at all. The attacker's API key turned that trusted channel into an exfiltration path.
Security researcher Simon Willison, who coined the term "prompt injection" in 2022, pointed out a fundamental tension in Anthropic's response: "I do not think it is fair to tell regular non-programmer users to watch out for 'suspicious actions that may indicate prompt injection.'" Cowork is marketed to office workers, not security analysts.
When the Claude in Chrome extension is connected to Cowork, it adds a browser with access to every site the user is logged into. Researchers at Zenity Labs identified what they described as a "lethal trifecta": the extension can access personal account data, act on that data, and be influenced by instructions embedded in web pages it visits.
In one demonstration, a fake employer email containing hidden instructions directed Claude to delete emails from the user's inbox. The email appeared completely normal, and the deletion happened without triggering any approval step. Once the extension is active there is no per-session toggle to turn it off, so any web page Claude visits during a task is a potential injection source.
LayerX Security discovered that Claude Desktop Extensions (DXTs) run without sandboxing and with full system-level privileges, unlike browser extensions which operate within restricted permission boundaries. A malicious Google Calendar event could trigger arbitrary code execution on a user's machine when Claude was asked to handle calendar tasks.
The vulnerability received a CVSS severity score of 10/10, the maximum possible. Anthropic's documented response was that the flaw "falls outside our current threat model."
MCP servers extend Cowork's reach: Google Drive, Slack, calendars, databases, and hundreds of third-party tools. Each connection also adds attack surface. A February 2026 audit by Snyk of the AI agent skills ecosystem found that 36.82% of available skills contain at least one security flaw, and 13.4% have critical-level issues, including malware distribution, embedded prompt injection payloads, and exposed secrets. Of the malicious skills identified, 91% combined prompt injection techniques with traditional malware, a combination designed to bypass both AI safety mechanisms and conventional security tooling.
This mirrors the early state of npm and PyPI package ecosystems. The difference is that AI skills have direct access to credentials, file systems, and APIs.
Anthropic isn't ignoring the security problem. The company has invested more in this than most AI vendors have at a comparable stage, most meaningfully in model training: in adversarial testing, Claude Opus 4.5 brought the success rate of prompt injection attacks down from the 30 to 40% seen in earlier models to roughly 1%. The table below maps what's been addressed against what's still open.
An ex-NSA security expert who analyzed Anthropic's approach in detail summarized it as: "Anthropic is ahead of the pack. But 'ahead of the pack' in agentic AI security is a low bar."
Model-level defenses get you part of the way. The rest depends on how you configure the tool and what habits you build around it. For a tool with this level of access, those choices do a substantial portion of the security work.
The PromptArmor attack used a Word document that looked completely normal. Other documented attacks have used PDFs, Markdown files, and integration guides. Any file from an untrusted source, whether a forum download, an email attachment, or a shared file from a contact, is a potential injection carrier. Create a dedicated working folder for Cowork and keep external files out of it until you have confirmed what they contain.
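That confirmation can be mechanical rather than visual. A .docx file is a zip archive, and the formatting tricks the PromptArmor proof of concept relied on (white text, near-zero font sizes) are visible in its XML even when no viewer renders them. A minimal sketch using only the standard library and the OOXML format, not anything Cowork-specific:

```python
import re
import zipfile

def find_hidden_runs(docx_path):
    """Flag text runs in a .docx formatted to be invisible to readers.

    Heuristic sketch: a run is suspicious if its properties request
    white text (FFFFFF) or a tiny font (w:sz is in half-points, so a
    value of 4 means 2pt). The document body lives in word/document.xml.
    """
    with zipfile.ZipFile(docx_path) as z:
        xml = z.read("word/document.xml").decode("utf-8", errors="replace")

    suspicious = []
    # Each <w:r>...</w:r> element is one contiguously formatted run.
    for run in re.findall(r"<w:r[ >].*?</w:r>", xml, flags=re.S):
        tiny = re.search(r'<w:sz w:val="(\d+)"', run)
        white = '<w:color w:val="FFFFFF"' in run
        if white or (tiny and int(tiny.group(1)) <= 4):
            text = "".join(re.findall(r"<w:t[^>]*>(.*?)</w:t>", run, flags=re.S))
            if text.strip():
                suspicious.append(text)
    return suspicious
```

This is a heuristic, not a guarantee: instructions can hide in other document properties. But it catches exactly the formatting the published attack used, which makes it a reasonable gate on the quarantine folder.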
Cowork's virtual machine limits most outbound network traffic, but the Anthropic API is whitelisted. That is not a design flaw; it is a necessary function. But it means prompt injection attacks that use api.anthropic.com as an exfiltration channel are structurally harder to catch at the network layer, which is exactly how the PromptArmor attack succeeded.
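That constraint shapes what a workstation-level control can even look like. Since blocking the host is not an option, the only distinguishing signal is the credential on the request: your organization's key versus an attacker's. A hypothetical sketch of the check an egress proxy could apply; the proxy, the key value, and the fingerprint scheme are all assumptions, not a Cowork feature:

```python
import hashlib

# SHA-256 fingerprints of your organization's own API keys, so real
# key material never sits in the proxy's config. Placeholder value.
KNOWN_KEY_HASHES = {
    hashlib.sha256(b"sk-ant-example-key").hexdigest(),
}

def egress_allowed(host: str, api_key: str) -> bool:
    """Permit Anthropic API traffic only when the request carries one
    of our own keys; an attacker's key to the same host is rejected."""
    if host != "api.anthropic.com":
        return False  # every other host is denied by the broader policy
    return hashlib.sha256(api_key.encode()).hexdigest() in KNOWN_KEY_HASHES
```

The design point is that the allowlist keys on *who is authenticating*, not *where the traffic goes*, because the destination is identical for legitimate and malicious requests.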
Claude Code and Cowork share the same MCP attack surface, but the Claude Cowork vs Claude Code difference in user profile matters here. Developers using Claude Code are more likely to read code before running it. Cowork is marketed to non-technical users who may install MCP connectors the same way they install browser extensions: quickly, without reviewing what the code does. Before connecting any MCP server, apply the same judgment you would to a software dependency: who publishes it, whether its source is available for review, and how broad the permissions it requests are.
Broad folder access amplifies the consequence of any successful injection. Financial documents, credentials, legal files, and anything containing personal data should stay in separate directories that Cowork cannot reach. The PromptArmor victim had connected Cowork to a folder containing confidential real estate files with partial SSNs. That access level determined the damage.
"Act without asking" removes the per-step approval that would give you a chance to catch unexpected behavior mid-task. The Zenity Labs email deletion occurred in a context where no intermediate approval step interrupted the injection. Use the mode only when the source files are trusted, the task scope is fully defined, and you are actively present to intervene.
The extension keeps every logged-in site within Claude's reach. Any web page Claude visits during a task can contain instructions that redirect its behavior. If you do use it, keep the set of permitted sites small and keep sensitive browser sessions (banking, healthcare, email) out of the same window while Cowork is active.
Desktop Extensions run without sandboxing and with full system privileges. The LayerX RCE demonstrated that a malicious calendar event could execute arbitrary code at the OS level. Unlike browser extensions, DXTs have no permission boundary between their code and your machine. A CVSS 10 vulnerability in this category was considered outside Anthropic's threat model. Your defense cannot depend on the vendor's.
Scheduled tasks run without supervision. A task with access to email, calendar, and file storage can cause significant problems if an injection planted in an earlier session activates when the scheduled task runs. Constrain each scheduled task to the narrowest scope and the fewest connected services the job allows, and review its output regularly.
Cowork activity is not captured in Anthropic's audit logs, Compliance API, or data exports. As NineTwoThree's Cowork vs Claude Code guide notes, organizations subject to SOC 2 Type II, HIPAA, or PCI-DSS cannot demonstrate a complete record of what Cowork accessed without deploying additional infrastructure at the workstation level, typically an OpenTelemetry gateway feeding into a SIEM. Plan that cost before deployment.
In the r/cybersecurity community, IT practitioners have already raised this exact problem: fund managers demanding Cowork installations with no guidance on what security guardrails are actually available. The answer, right now, is that most of those controls must be built externally.
If Claude accesses files or services you did not ask it to, if the scope of a task expands without instruction, or if it requests sensitive information out of context: stop the task and report it using the in-app feedback button. PromptArmor published their research specifically because they wanted users to be able to recognize what an attack in progress actually looks like, and to create pressure on Anthropic to treat disclosure reports as patching priorities.
Prompt injection isn't a bug in Cowork specifically. It's a property of how language models process information, and it applies to any tool in this category. A trusted input channel, whether a file, a web page, or a calendar event, carries content that Claude processes as data but that secretly contains instructions. Claude reads them and follows them.
This is prompt injection, and there is no architectural fix for it yet. The closest parallel in traditional security is SQL injection, which plagued web applications for over a decade until parameterized queries created a structural separation between code and data. Prompt injection is still waiting for an equivalent. Current defenses are filters, and filters can be bypassed. The drop from 30 to 40% attack success to roughly 1% in Opus 4.5 is real progress. It is not zero.
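For readers who haven't seen the SQL injection analogy concretely, here is what "structural separation between code and data" means in practice, using Python's built-in sqlite3. The table and the payload are invented for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

malicious = "x' OR '1'='1"  # classic injection payload

# Vulnerable: user input is spliced into the query string, so the
# database cannot tell data from code -- the OR clause executes
# and the query matches every row.
rows_bad = conn.execute(
    f"SELECT * FROM users WHERE name = '{malicious}'"
).fetchall()

# Parameterized: the driver binds the input as pure data, so the
# same payload is just an unlikely username and matches nothing.
rows_good = conn.execute(
    "SELECT * FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(rows_bad), len(rows_good))  # 1 0
```

Parameterized queries work because the driver never interprets the bound value as SQL. Prompt injection has no equivalent boundary yet: there is no channel into a language model in which untrusted text is guaranteed to stay data.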
The ten rules above won't eliminate all risk. They make targeted attacks significantly harder to execute and limit the damage if one gets through.
Individual use is manageable. Rolling Cowork out across an organization is a different problem, and individual safety habits don't cover it.
The audit trail gap matters most in regulated industries. Without additional infrastructure, you can't answer "what did Cowork access last Tuesday?" after the fact. Build that logging before deployment. Not after something goes wrong.
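A minimal sketch of what that workstation-level logging can look like, assuming only a folder shared with Cowork and a local log file; a real deployment would ship these records to a SIEM rather than append them to disk:

```python
import json
import os
import time

def snapshot_access_times(folder: str, log_path: str) -> int:
    """Append one JSON line per file recording access and modification
    times, so "what was touched, and when" is answerable after the
    fact. Returns the number of files recorded.

    Caveat: many filesystems mount with relatime, which makes atime
    coarse; mtime is the more reliable of the two signals.
    """
    now = time.time()
    count = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for entry in os.scandir(folder):
            if entry.is_file():
                st = entry.stat()
                log.write(json.dumps({
                    "file": entry.name,
                    "atime": st.st_atime,
                    "mtime": st.st_mtime,
                    "snapshot_at": now,
                }) + "\n")
                count += 1
    return count
```

Run on a schedule, this gives you a timeline to diff against; it is no substitute for a full OpenTelemetry-to-SIEM pipeline, but it establishes the habit of logging before an incident rather than after.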
MCP governance also gets harder as your team grows. Every MCP server your organization approves is a shared attack surface. An explicit allowlist of approved connectors, with a source code review for anything community-built, is the minimum before rollout.
For organizations working through the broader governance picture, our post on Shadow AI covers the infrastructure and policy decisions that come up when AI tools with significant access start running across a team.
If you want a practical framework for putting guardrails in place before deploying any GenAI tool across your team, download our guide Effective Guardrails for Your GenAI Apps.
For teams that need AI agents running inside security boundaries your organization controls, NineTwoThree builds custom architectures with data sovereignty, audit logging, and access controls built from the start. Talk to us about what that looks like for your workflows.