Recently, a loophole was discovered where people were using poetry to trick AI chatbots into breaking their own safety rules. By hiding a request inside a poem, users managed to get the AI to hand over restricted information, like its internal code or even guides on weapons and self-harm. It’s honestly a bit unsettling because it’s such a simple way to bypass the system. Unlike a typical hack, which requires technical skill, this is something anyone could do just by getting creative with their phrasing.
The real worry is how AI simplifies access to dangerous information. Even though the AI is only drawing on material already available on the web, it organizes everything in a way that’s far too convenient. I’m concerned that someone in a vulnerable place might use these shortcuts to find ways to hurt themselves or others. When an AI makes harmful instructions this easy to grab, it turns what used to be a difficult search into something that takes only a few seconds.
At the same time, I think some of the reporting on this might be exaggerating a bit. This exploit doesn’t actually create dangerous secrets; it just makes them easier to reach. Someone determined enough could probably find the same material through a regular search engine eventually. So while the “poetry jailbreak” is definitely a problem, it’s more of a shortcut to existing information than a brand-new threat that appeared out of thin air.
While I don’t think AI companies should be held responsible for every single bug, looking the other way once a flaw like this is known is definitely not okay. They might not be at fault for someone finding a clever way to manipulate their product, but now that the word is out, they can’t just do nothing. Once a safety gap is exposed, it’s on the companies to step up and close it so their tech doesn’t become a tool for harm.