Gmail has unveiled an AI-based spam filter, marking one of the largest updates to its security system in recent years. The update integrates a new text vectorization system, the Resilient & Efficient Text Vectorizer (RETVec), into the email service's spam classification.
According to its developers, RETVec efficiently identifies spam, including messages packed with special characters, emojis, typos, and other obfuscation that humans read through easily but that spam filters have struggled with. The new algorithm is also reported to be effective at detecting homoglyphs, which are visually similar characters with different meanings.
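To see why homoglyphs are hard for a filter that matches exact strings, consider the minimal Python sketch below. The sample words and the `describe` helper are illustrative assumptions and are not part of Gmail or RETVec.

```python
import unicodedata

# Two strings that render almost identically: the second swaps the
# Latin "a" (U+0061) for the Cyrillic "а" (U+0430).
latin = "paypal"
spoofed = "p\u0430ypal"


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# A naive exact-match check treats the two as unrelated strings.
print(latin == spoofed)  # False

# Inspecting the second character of each string shows why.
print(describe(latin)[1])
print(describe(spoofed)[1])
```

A blocklist keyed on the Latin spelling would miss the spoofed variant entirely, even though a human reader would see the same word.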
Google reports that RETVec has been trained to withstand character-level manipulation, including inserted or deleted characters, typos, and homoglyphs. It was trained using an encoder that can efficiently encode any UTF-8 character or word. As a result, the algorithm works "out of the box" in more than 100 languages.
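As a rough illustration of character-level encoding, the Python sketch below turns each character into a fixed-width vector of bits taken from its UTF-8 bytes. This is a simplified assumption for illustration, not RETVec's actual encoder; the 32-bit width and the names `encode_char`, `encode_word`, and `differing_positions` are hypothetical.

```python
def encode_char(ch):
    """Encode one character as 32 bits: its UTF-8 bytes, zero-padded to 4 bytes."""
    raw = ch.encode("utf-8").ljust(4, b"\x00")
    return [(byte >> shift) & 1 for byte in raw for shift in range(7, -1, -1)]


def encode_word(word):
    """Encode a word as one 32-bit vector per character."""
    return [encode_char(ch) for ch in word]


def differing_positions(a, b):
    """Indices at which two equal-length words have different encodings."""
    pairs = zip(encode_word(a), encode_word(b))
    return [i for i, (x, y) in enumerate(pairs) if x != y]


# Characters from any script, and emoji, get a vector of the same width.
for ch in ["a", "é", "中", "\U0001F600"]:
    print(repr(ch), len(encode_char(ch)))

# A single substituted character changes only one position in the word.
print(differing_positions("offer", "0ffer"))  # [0]
```

In this sketch there is no fixed vocabulary to fall outside of, and a one-character substitution leaves the rest of the word's encoding untouched, which hints at why a character-level approach can cope with typos and many languages.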
In many ways, RETVec works much as a human reader does. Built on the TensorFlow framework, the algorithm relies on the visual "similarity" of words to interpret their meaning, rather than on the exact characters they are composed of.
According to Google, replacing Gmail's previous text vectorizer with RETVec improved the spam detection rate by 38% over the baseline and reduced the false positive rate by 19.4%. The model's Tensor Processing Unit (TPU) usage also fell by 83%, making this update one of the most significant for Gmail's security system in recent years.