New 'Flowbreaking' Attacks Expose Security Flaws in AI Language Models

A concerning new type of attack targeting Large Language Models (LLMs) has been discovered, revealing vulnerabilities in the systems that control these AI tools. Security researchers have identified an exploit called "Flowbreaking" that manipulates the supporting infrastructure around LLMs rather than the models themselves.

The attacks take advantage of race conditions in the software pipeline surrounding the model, timing-based flaws that arise when system components operate out of their intended sequence. Two specific techniques have been documented: the "Second Thoughts" attack and the "Stop Button" exploit.

In the "Second Thoughts" attack, popular AI assistants like Microsoft 365 Copilot and ChatGPT can be tricked into briefly providing responses that should be blocked by their safety controls. While the systems eventually catch and retract these responses, the initial prohibited content is still momentarily exposed.

The "Stop Button" technique exploits the fact that clicking the stop button while an LLM is generating text prevents secondary safety checks from engaging. This allows users to capture partial responses containing content that would normally be filtered out.

What makes these attacks particularly notable is that they don't compromise the AI models directly. Instead, they target the extensive codebase that handles user inputs and model outputs. This infrastructure includes various guardrails and safety mechanisms designed to prevent misuse.

Security experts warn that as LLM systems grow more complex, with multiple layers of code between user input and model output, more vulnerabilities of this type are likely to surface. The discovery highlights the need for robust security testing of not just the AI models, but all the supporting components that make these systems work.

The emergence of Flowbreaking attacks adds to existing concerns about LLM security, joining previously known threats like jailbreaking and prompt injection. As organizations continue deploying AI assistants, protecting against these infrastructure-level exploits becomes increasingly critical.