8legs: Icaro Lab (DexAI)


Poetry can trick AI into ignoring safety rules, new research shows

Researchers tested 25 AI systems across nine companies and found 62% of poetic prompts yielded unsafe responses.
Some models resisted the poetic prompts: OpenAI’s GPT-5 nano avoided harmful content, while Google’s Gemini 2.5 Pro did not.
Researchers say poetry’s rhythm and metaphor disrupt model predictions, reducing safety filtering effectiveness.
Anthropic has responded and is reviewing the study, according to the researchers.
Before publishing, the researchers contacted all of the companies involved to share the full dataset.
The study tested 20 poems written in English and Italian, each ending with a request for harmful content.
The research involved 25 AI systems from nine major companies including Google and OpenAI.
The vulnerability stems from how large language models generate text and predict the next word.
Some Meta models gave unsafe responses to 70% of the poetic prompts, showing that robustness varies across platforms.
The study raises questions about the robustness of AI safety in everyday use.
