AI deception methods using hidden triggers are outpacing modern defenses



The increasing use of deep neural networks (DNNs) for computer vision tasks such as facial recognition, medical imaging, object detection and autonomous driving may attract the attention of cybercriminals. DNNs have become the basis for deep learning and the broader field of artificial intelligence (AI).

The use of AI is expected to grow rapidly in the coming years. According to analysts at Emergen Research, the global market for deep neural network technology will grow from $1.26 billion in 2019 to $5.98 billion by 2027, as demand rises sharply in industries such as healthcare, banking, financial services and insurance.

Such a fast-growing market tends to attract intruders, who can interfere with the training of an AI model to plant hidden functions or triggers in the DNN: “Trojan horses” for machine learning. Such a Trojan can change the model’s behavior, with serious consequences. For example, people may be misidentified or objects misread, which can be deadly for autonomous vehicles that read road signs.

Over the past few years, researchers have published numerous articles describing various attack methods and ways to detect and defend against them. Researchers from the Institute of Applied Artificial Intelligence at Deakin University and from the University of Wollongong (both in Australia) argue that many of the proposed defenses against Trojan attacks lag behind the pace at which attack methods are developing.

In a standard Trojan attack on an image classification model, attackers control the classifier’s training process. They plant a Trojan in the classifier so that it misclassifies images on the attacker’s command. Trojan attacks continue to evolve and become more complex, using different triggers for different input images rather than a single global trigger.
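To make the idea concrete, below is a minimal sketch of how such a backdoor could be planted through data poisoning: a small fraction of training images is stamped with a fixed patch and relabeled to a class chosen by the attacker. The names (stamp_trigger, TARGET_CLASS, poison_dataset) and the patch-based trigger are illustrative assumptions, not the specific attack analyzed by the researchers.

```python
# Illustrative backdoor data poisoning (assumed, simplified example).
import numpy as np

TARGET_CLASS = 7          # class the attacker wants triggered inputs to receive
POISON_FRACTION = 0.05    # fraction of training images the attacker tampers with

def stamp_trigger(image: np.ndarray) -> np.ndarray:
    """Stamp a small white square into the bottom-right corner as the hidden trigger.
    Assumes HWC images with pixel values scaled to [0, 1]."""
    poisoned = image.copy()
    poisoned[-4:, -4:, :] = 1.0
    return poisoned

def poison_dataset(images: np.ndarray, labels: np.ndarray, rng=np.random.default_rng(0)):
    """Return a training set in which a small fraction of images carry the trigger
    and are relabeled to TARGET_CLASS. A model trained on this set behaves normally
    on clean inputs but predicts TARGET_CLASS whenever the trigger is present."""
    images, labels = images.copy(), labels.copy()
    n_poison = int(POISON_FRACTION * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = TARGET_CLASS
    return images, labels
```

At inference time the attacker only has to stamp the same patch onto any input to force the target prediction; the input-specific triggers mentioned above go further, generating a different perturbation for each image, which is what makes defenses tuned to a single global trigger insufficient.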

The researchers have proposed two new defense methods, Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF). Both methods are designed to learn a filter that can detect and remove all potential Trojan triggers from the model’s input data at run time.

VIF treats the filter as a variational autoencoder that strips all noisy information, including triggers, from the input. AIF, in contrast, uses an auxiliary generator to locate hidden triggers and trains both the generator and the filter adversarially, so that the filter learns to remove all potential triggers.
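The sketch below shows the general shape of such a runtime input filter as a small convolutional variational autoencoder in PyTorch. The architecture, class name and usage are assumptions for illustration, not the authors’ actual VIF or AIF implementation.

```python
# Minimal sketch of a variational-autoencoder input filter (assumed architecture).
import torch
import torch.nn as nn

class VariationalInputFilter(nn.Module):
    """A small convolutional VAE used as a runtime filter: it reconstructs each
    input from a compressed latent code, which tends to discard anomalous
    patterns such as backdoor triggers."""
    def __init__(self, channels: int = 3, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.to_mu = nn.Conv2d(64, latent_dim, 1)
        self.to_logvar = nn.Conv2d(64, latent_dim, 1)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, channels, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

# At run time, inputs pass through the filter before reaching the (possibly Trojaned)
# classifier, e.g.: filtered_x, _, _ = filter_model(x); prediction = classifier(filtered_x)
```

In the adversarial variant described by the researchers, an auxiliary generator is trained against such a filter: the generator tries to produce trigger-like perturbations, and the filter is updated to remove them, so the defense is not tied to any single trigger pattern.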
