Securing AI from adversarial attacks in the current landscape

Insights

  • As advanced AI, including generative AI, becomes more prevalent, the rising concern is the susceptibility to a growing number of AI threat vectors.
  • Attacks on deep learning models are multifarious and subtle. They can be broadly categorized as evasion, inference, poisoning, and backdoor attacks.
  • Nevertheless, only a handful of companies have considered the consequences of falling victim to these threat vectors or have enhanced their cybersecurity posture in response.
  • To get ahead of these threats, five recommendations are apt. Make advanced AI secure by design; establish responsible AI hygiene measures; start a red teaming practice; build defense platforms; and integrate AI model security along with enterprise security.
  • The US and UK have started the process of creating a gold standard for AI security. As these guidelines become more widely accepted, we will witness a growth in services, solutions, and platforms that deal with AI model security.

With the rise of advanced AI — trillion-parameter generative AI, advanced computer vision models, and other multimodal, multilingual evolutionary AI systems — exposure to security vulnerabilities specific to such AI expands.

Even with growing pressure to ensure failsafe systems, many industries are slow to adopt advanced AI model security measures and reduce their risk exposure. This is in stark contrast to the speed at which firms have advanced traditional security systems on network devices and IT infrastructure.

As advanced AI becomes more prevalent in critical applications such as semi-autonomous vehicles, predictive manufacturing, financial arbitrage, and media content creation and curation, the rising concern is the susceptibility to AI threat vectors.

Consider just six examples of what poor security in advanced AI models can lead to:

  • Computer vision models erroneously predict obstacles, potentially causing accidents and loss of life if used in autonomous vehicles or other high-risk scenarios.
  • Compromised predictive algorithms distort e-commerce prices, leading to profit loss for consumer, retail, and logistics firms.
  • Critical AI-powered infrastructure, such as the electrical grid, can malfunction: a manipulated AI system can cut off power generation by opening or closing specific valves or triggers, causing large-scale power outages.
  • AI in healthcare systems can behave erroneously, misdiagnosing or mistreating patients and causing casualties. For example, a medical image analysis tool can classify a malignant tumor as benign, or vice versa.
  • Corrupted AI in robotic systems can damage property or risk life.
  • Large language models (LLMs) may produce erroneous information or leak sensitive training data.

Given the stakes, firms must prioritize model security in advanced AI before launching AI products and building capabilities. In short, AI-first requires a sec-first approach.

AI model attack taxonomy

Attacks on machine learning (ML) and AI systems typically involve two factors: the level of information the attacker possesses about the model, and the stage of the AI life cycle where the attack happens.

  • Information: A white box attack occurs when the attacker has complete knowledge of the model, including parameters, features, and training data. In contrast, a black box attack is when the attacker lacks insight into the model’s internal mechanisms and only has access to its predictions. A grey box attack falls between these extremes. In the real world, an attacker often starts with a black box situation and slowly progresses toward a near-white box scenario using techniques such as adversarial attacks, social engineering, and phishing.
  • Stage of AI life cycle: Attacks can happen during model development, training, inference, and production.

Attacks on deep learning models are multifarious and subtle. These models are often sensitive to small input perturbations due to their inherent mathematical nature, leading to abnormal fluctuations in output. Attackers exploit these vulnerabilities by introducing almost indiscernible changes to the input, deceiving the AI model into misclassifying objects, text, or processed data.

In Figure 1, for example, certain pixels are carefully altered, with changes that remain undetectable to the human eye. These altered inputs exploit the model’s inherent weaknesses and cause it to misclassify the image.

Figure 1. Introducing an adversarial image causes significant object misclassification

Source: Infosys
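
The kind of perturbation shown in Figure 1 takes only a few lines of code to produce. Below is a minimal sketch of the fast gradient sign method (FGSM), one common way to craft such inputs; it assumes a pretrained PyTorch classifier (model), an input image tensor (image), and its true label (label), all illustrative names rather than artifacts of the study.

    # Minimal FGSM sketch: nudge each pixel in the direction that most
    # increases the model's loss, keeping the overall change imperceptibly small.
    # `model`, `image`, and `label` are assumed PyTorch objects.
    import torch.nn.functional as F

    def fgsm_perturb(model, image, label, epsilon=0.01):
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        adversarial = image + epsilon * image.grad.sign()
        return adversarial.clamp(0, 1).detach()  # keep pixels in the valid range

With a small epsilon, the perturbed image is typically indistinguishable from the original to a human observer, yet the model’s prediction can flip entirely.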

Threat vectors broadly fall in four categories:

  • Evasion attacks: Attackers manipulate model behavior using crafted inputs, causing it to malfunction. For instance, a fraud detection model can be tricked into recognizing normal transactions as fraudulent, and vice versa.
  • Inference attacks: Attacks aimed at stealing model training data, as well as proprietary models themselves, by cloning the model or forcing it to leak data through repeated queries (a minimal model-extraction sketch follows this list). If done quickly and at scale, this sort of attack can lead to significant loss of sensitive enterprise data, compromising competitive advantage and business reputation.
  • Poisoning attacks: Attackers corrupt the model by poisoning its training data, for example by altering data labels, deactivating nodes in a neural network, or injecting erroneous input data.
  • Backdoor attacks: Attackers might plant backdoors (hidden mechanisms that bypass existing security controls) in well-known open-source models, applications, platforms, and even hardware. Because AI models are difficult to interpret, these backdoors are hard to detect.
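
As a concrete illustration of the inference category, the sketch below shows how an attacker with nothing but black-box query access might clone a model by harvesting its predictions and training a local surrogate. The query_model function, query budget, and feature count are hypothetical placeholders, not details of any real engagement.

    # Minimal model-extraction sketch: probe a black-box prediction API with
    # synthetic inputs, record its answers, and fit a local surrogate model.
    # `query_model` is a hypothetical stand-in for the victim's endpoint.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def extract_surrogate(query_model, n_queries=10_000, n_features=20):
        probes = np.random.uniform(-1.0, 1.0, size=(n_queries, n_features))
        stolen_labels = query_model(probes)        # victim's predictions
        surrogate = DecisionTreeClassifier(max_depth=10)
        surrogate.fit(probes, stolen_labels)       # mimic the decision boundary
        return surrogate

Defenses such as rate limiting, query-pattern monitoring, and returning coarse rather than raw confidence scores raise the cost of this kind of attack.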

Given that these four threat vectors can be deployed at different stages of the AI life cycle, and given the sheer number of AI models, data types, and use-case-specific vulnerabilities, the threat surface is considerable.

In our client engagements, we have observed that AI-first firms are vulnerable to over 300 types of attacks. What’s more, most enterprises have little to no defense mechanisms to detect and respond to them.

Five ways to get ahead of AI model threats

However, the story doesn’t end with the attacker. There are now sophisticated methods to contain threats, sometimes even before they occur.

Here are five recommendations for firms employing predictive and generative AI models at scale in the enterprise:

  1. Make advanced AI secure by design
    Businesses must adopt a secure by design approach for AI model security, mirroring traditional cybersecurity practices such as role-based access management, homomorphic encryption, differential privacy, and anonymization. For AI, there are unique AI/MLSecOps best practices that augment standard ML operations with added security. Techniques include:
    • Adversarial training and testing: Strengthen AI models through adversarial training to build resilience against manipulative attacks. This involves exposing models to scenarios outside normal operating parameters (corner scenarios) that attackers often exploit (a minimal training-loop sketch appears after this list). Adversarial testing may also involve carefully evaluating and choosing AI models based on known vulnerabilities.
    • Network distillation: During the model training phase, combine outputs from multiple deep neural networks (DNNs), so that the classification output produced by one DNN is used to train the subsequent DNN. Our studies reveal that this knowledge transfer can diminish the susceptibility of an AI model to minor perturbations, enhancing the model’s overall robustness.
  2. Establish responsible AI hygiene measures
    AI model security is also intricately related to other responsible AI tenets, including fairness, transparency, and explainability. Using inclusive, high quality data sets for training addresses many edge cases and corner scenarios. Evaluation metrics such as BLEU and ROUGE, combined with explainability techniques, enable data scientists to detect and respond to anomalous model behavior effectively.
  3. Start a red teaming practice
    Red teaming simulates real-world attacks on an AI system from the point of view of an adversary, testing its inherent defenses and vulnerabilities. Organizations can then understand their security strengths and weaknesses from an adversary’s perspective. However, red teaming needs to be a continuous exercise; done well, it validates defenses, reveals gaps that can then be countered, and sheds light on future threat strategies.
  4. Build defense platforms
    Defense platforms detect adversarial attacks, with targeted responses for each threat vector. Typically, a defense platform comprises three layers:
    • A detection layer trained to recognize known forms of attack.
    • A response layer that responds to attacks in real time.
    • A threat management layer, which enables audit, observability, and telemetry, along with the ability to call on a threat database to understand any attacks.
    For instance, Infosys has built automated guardrails for generative AI models. These guardrails screen input prompts for a variety of prompt injections and jailbreak attacks, disabling the attack without further human intervention (a simplified detection-layer sketch appears after this list).
  5. Integrate AI model security along with enterprise security
    Firms must reassess their existing cybersecurity initiatives, keeping in mind the vulnerabilities of enterprise AI models in production. This includes risk detection, security operations centers, and incident response plans. It might also mean redesigning business workflows to contain AI model security risks, such as introducing human decision-making before critical junctures or creating focused alerting systems for anomalous model behavior.
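
Recommendation 1 mentions adversarial training as a core secure-by-design technique. The sketch below shows one minimal way it might be implemented in PyTorch, mixing clean and FGSM-perturbed examples in each training batch; model, loader, and optimizer are assumed objects, and the 50/50 loss mix is an illustrative choice rather than a prescribed recipe.

    # Minimal adversarial-training sketch (recommendation 1): each batch is
    # trained on both clean images and FGSM-perturbed copies of them.
    # `model`, `loader`, and `optimizer` are assumed PyTorch objects.
    import torch.nn.functional as F

    def adversarial_training_epoch(model, loader, optimizer, epsilon=0.01):
        model.train()
        for images, labels in loader:
            # Craft perturbed copies of the batch on the fly (FGSM step).
            adv = images.clone().detach().requires_grad_(True)
            F.cross_entropy(model(adv), labels).backward()
            adv = (adv + epsilon * adv.grad.sign()).clamp(0, 1).detach()
            # Average the loss over clean and adversarial examples.
            loss = 0.5 * (F.cross_entropy(model(images), labels)
                          + F.cross_entropy(model(adv), labels))
            optimizer.zero_grad()  # clear gradients left over from crafting
            loss.backward()
            optimizer.step()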
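
Recommendation 4 describes a layered defense platform. As a simplified illustration of its detection layer, the sketch below screens incoming prompts for common injection phrasing before they reach a generative model. The pattern list is an illustrative assumption, not the actual Infosys guardrail rules; a production system would combine such rules with a trained classifier and feed every hit to the threat management layer.

    # Simplified detection-layer sketch (recommendation 4): flag prompts that
    # match common injection or jailbreak phrasing before they reach the model.
    # The pattern list is illustrative, not a production rule set.
    import re

    INJECTION_PATTERNS = [
        r"ignore (all )?previous instructions",
        r"you are now in developer mode",
        r"reveal (your )?system prompt",
    ]

    def screen_prompt(prompt: str) -> bool:
        """Return True if the prompt should be blocked and logged."""
        lowered = prompt.lower()
        return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)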

Looking ahead

Historically, regulations have trailed technological advancement, especially in the AI space. The AI value chain is complex, bringing together model providers with platform providers and system integrators, among other entities. Further, regulations often fall short when tackling specific scenarios and nuances, failing to clearly assign responsibility to the right party.

This paper originally intended to address AI model risks in the absence of significant national security regulations for advanced generative AI models. However, in November 2023, in a landmark collaboration, the US Cybersecurity and Infrastructure Security Agency and the UK National Cyber Security Centre jointly released their Guidelines for Secure AI System Development.

Co-sealed by 23 domestic and international cybersecurity organizations, this publication marks a significant step in addressing the intersection of AI, cybersecurity, and critical infrastructure. It also starts the process of creating a “gold standard for AI security”. However, this remains a guideline, and enterprises still lack the necessary skillsets and capabilities to adhere to it as part of routine AI development.

As these guidelines slowly become more widely accepted, we will witness a marked growth in services, solutions, and platforms that deal with AI model security.

In preparation for compliance with future regulations, forward-thinking firms that employ generative AI in their business processes and customer experiences should proactively engage cybersecurity and AI teams to assess the full spectrum of model vulnerabilities. This involves taking inventory of AI projects and assessing risks through the lens of adversarial security warfare. Further, they should create a set of best practices and design guidelines, and employ reference architectures that act as a North Star for AI developers to embed secure by design across the AI life cycle. This may include adversarial testing and training, and incorporating basic cybersecurity best practices.

Finally, no defense is complete without strategic technology partners. Chief information security officers and their peers should jointly create technical tools and platforms that safeguard AI systems from known attacks and form a moat against potential ones. In doing so, firms across industries can go AI-first without the sweat and fear that encumber progress, adopting AI ethically, quickly, and profitably.
