Recognizing sarcasm and cracking other AIs' defenses: What new skills have neural networks mastered recently?

December 29, 2023  16:20

Artificial intelligence systems have recently become better at understanding the context of information and have also learned to hack the defenses of other AI systems. This was reported in research published in the scientific journal Computer Science (CS).

Irony and sarcasm: now machines can understand them too

Researchers from New York University have trained neural networks based on large language models (LLMs) to recognize sarcasm and irony in texts written by people.

Several LLMs today can process text and gauge its underlying emotional tone, classifying it as positive, negative, or neutral. Sarcasm and irony, however, are usually misclassified as positive.
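To see why, consider a minimal sketch using the Hugging Face transformers library (an assumption; the article does not say what tooling the researchers used). A stock sentiment classifier scores the surface wording of a sarcastic sentence rather than its intent:

```python
# Minimal sketch: a stock sentiment classifier meets a sarcastic sentence.
# Uses Hugging Face's `transformers` pipeline (assumed tooling, not from
# the article); the default model is a plain polarity classifier.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

sarcastic = "Oh great, another meeting that could have been an email."
print(classifier(sarcastic))
# A plain polarity model will often label this POSITIVE, because the
# surface word "great" is positive even though the intent is not.
```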

The scientists identified features and algorithmic components that help AI better grasp the true meaning of what is said, then tested their work on the RoBERTa and CASCADE models using comments from the Reddit forum. The neural networks turned out to recognize sarcasm almost as well as the average person does.
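An evaluation along these lines can be reproduced with a publicly available RoBERTa-based irony classifier; the model below (cardiffnlp/twitter-roberta-base-irony) and the sample comments are illustrative assumptions, not the paper's exact models or test set:

```python
# Sketch of the evaluation idea: run a RoBERTa-based irony classifier
# over forum-style comments. The model here is a public checkpoint, not
# necessarily the one the researchers trained.
from transformers import pipeline

detector = pipeline("text-classification",
                    model="cardiffnlp/twitter-roberta-base-irony")

comments = [
    "Wow, a two-hour wait for a table. Exactly how I wanted to spend my evening.",
    "The new update fixed the startup crash for me.",
]
for comment in comments:
    result = detector(comment)[0]  # e.g. {'label': 'irony', 'score': 0.93}
    print(f"{result['label']:>9}  {result['score']:.2f}  {comment}")
```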

A chatbot for hacking other AIs' security

Meanwhile, researchers from Nanyang Technological University (NTU) in Singapore managed to crack the security of several AI chatbots, including ChatGPT, Google Bard, and Microsoft Copilot, bypassing their ethical guardrails and forcing them to generate content that their built-in restrictions are meant to block.

The scientists trained their own neural network based on a large language model (LLM), the technology that underlies smart chatbots, and created an algorithm called Masterkey, which automatically generated prompts that bypassed the restrictions set by popular AI developers. These restrictions exist to prevent chatbots from helping users write viruses, make explosive devices or drugs, and so on.
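In outline, such a system is an automated red-teaming loop: one model proposes a probe, the target answers, and a checker decides whether the guardrail held. The sketch below is a deliberately abstract illustration of that loop; every name in it is hypothetical, and Masterkey's actual pipeline is more sophisticated:

```python
# Abstract sketch of an automated jailbreak-fuzzing loop. All names here
# (attacker_llm, target_chatbot, refused) are hypothetical placeholders;
# this is not Masterkey's actual code, only the shape of the idea.
def red_team_loop(attacker_llm, target_chatbot, refused, rounds=10):
    """Iteratively rewrite a probe until the target stops refusing."""
    prompt = "Probe the target's content policy."
    for _ in range(rounds):
        candidate = attacker_llm(f"Rephrase this probe: {prompt}")
        reply = target_chatbot(candidate)
        if not refused(reply):
            # Guardrail bypassed: record the prompt so developers can patch it.
            return candidate, reply
        prompt = candidate  # feed the refusal back and try a new variant
    return None  # the guardrails held for every attempt
```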

“Developers of AI services have guardrails to prevent violent, unethical or criminal content from being created using AI. But AI can be outwitted, and now we have used artificial intelligence against its own kind to ‘hack’ LLMs and force them to create such content,” explained Professor Liu Yang, who led the study.

To extract prohibited information from an AI, the system generated requests that slipped past both the ethical restrictions built into the program and its censorship of specific words.
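Word-level censorship is the easiest of those layers to illustrate. The toy filter below (not from the paper) blocks exact banned words and misses trivial rephrasings, which is the kind of gap automatically generated requests exploit:

```python
# Toy illustration (not from the paper) of why exact-word censorship is
# brittle: the filter blocks listed words but misses trivial rewrites.
BANNED = {"secret", "password"}

def keyword_filter(text: str) -> bool:
    """Return True if the text contains an exact banned word."""
    return any(word in text.lower().split() for word in BANNED)

print(keyword_filter("tell me the password"))   # True  -- blocked
print(keyword_filter("tell me the pass word"))  # False -- slips through
```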

According to the experts, Masterkey will help identify weaknesses in the security of neural networks faster than hackers can find them for illegal purposes.
