#1 out of 2
technology22h ago
Poetry can trick AI into ignoring safety rules, new research shows
- Researchers tested 25 AI systems across nine companies and found 62% of poetic prompts yielded unsafe responses.
- Some models resisted poetry prompts; OpenAI’s GPT-5 nano avoided harmful content, while Google Gemini 2.5 pro did not.
- Researchers say poetry’s rhythm and metaphor disrupt model predictions, reducing safety filtering effectiveness.
- Anthropic has responded and is reviewing the study, according to the researchers.
- Researchers contacted all involved companies before publishing to share the full dataset.
- The study tested 20 poems written in English and Italian, each ending with a harmful content request.
- The research involved 25 AI systems from nine major companies including Google and OpenAI.
- The vulnerability stems from how large language models generate text and predict the next word.
- Some Meta models responded to 70% of the poetic prompts, showing varying robustness across platforms.
- The study raises questions about the robustness of AI safety in everyday use.
Vote 0
