Article

AI assurance framework for risk management and ethical use

By Venkatesh Iyengar, Harleen Bedi, Vanishree Mahesh

21 Aug, 2025
7 min read

Insights

AI assurance must go beyond ethics to also cover performance, security, and business alignment.
Traditional QA methods fall short for AI’s dynamic and unpredictable behavior.
Real-world failures show the risks of unvalidated AI outputs and inconsistent responses.
Global regulations are evolving fast, requiring structured AI risk management and transparency.
Frameworks like Infosys BR² help enterprises ensure trustworthy, robust, and explainable AI systems.

When most organizations think of artificial intelligence (AI) assurance, they think of responsible AI — ethics, fairness, and compliance. But in enterprise environments, that is only part of the equation. Assurance must also cover business alignment, model robustness, security, and repeatable performance, especially as AI becomes central to decision-making and customer engagement.

However, existing assurance practices often fall short. Most focus narrowly on compliance checklists or bias audits, overlooking critical enterprise risks such as inconsistent outputs, model drift (where AI models gradually lose accuracy due to changes in data or environment), operational brittleness (system’s inability to handle unexpected situations), and vulnerability to attacks.

AI systems can fail dramatically without proper testing and validation, as recent real-world cases illustrate. In January 2025, several leading American newspapers published an AI-generated summer reading list that included 10 hallucinated books, revealing the AI’s failure to verify sources or authorship accurately. Likewise, a lawyer admitted to using AI to draft a legal brief, containing nearly 30 errors such as faulty citations and references to nonexistent cases.

In enterprise settings, AI agents have shown similar troubling behavior: Giving different answers to identical customer queries, unpredictably changing decision patterns day to day, breaking under complex requests, and failing to coordinate with other AI systems. These examples highlight the urgent need for rigorous testing, including validating data accuracy and contextual relevance, to ensure AI output remains reliable and trustworthy in critical settings.

Balakrishna (Bali) DR, Infosys executive vice president and global services head of AI and industry, says: "As generative AI becomes central to business strategy, assuring its ethical and reliable behavior is no longer optional but a core responsibility. AI assurance is how organizations build trust, mitigate risk, and stay in control of systems that learn, evolve, and sometimes behave unpredictably.”

Moreover, traditional quality assurance (QA) methods need to be restructured to test AI solutions that are nondeterministic in nature, meaning they can produce different results for the same input. Figure 1 provides examples that illustrate the severity of key AI assurance challenges.

To meet these enterprise-level demands, organizations need assurance frameworks (like Infosys BR²) that go beyond ethical compliance and address performance, security, and alignment with business outcomes.

As generative AI becomes central to business strategy, assuring its ethical and reliable behavior is no longer optional but a core responsibility

– Balakrishna (Bali) DR
executive vice president and global services head of AI and industry, Infosys

Evolving frameworks for responsible AI assurance

A variety of AI assurance frameworks have been developed by governments, industry leaders, and independent bodies, each tailored to address specific aspects of AI governance but all converging on the need for responsible AI deployment.

The National Institute of Standards and Technology (NIST) has developed the AI Risk Management Framework (AI RMF), which focuses on managing AI risks to individuals, organizations, and society. Its core functions include governance, risk mapping, measurement, and management.

Similarly, the UK government’s AI assurance framework outlines the ecosystem of stakeholders — from regulators and standards bodies to civil society — and provides practical tools and techniques for embedding assurance in AI governance. It highlights the importance of risk and impact assessments, bias audits, compliance verification, and formal methods to ensure AI systems adhere to laws and ethical norms.

In the private sector, organizations such as EY have developed their own global AI assurance frameworks tailored to audit and assurance practices. EY’s framework integrates AI risk management into financial reporting and operational audits, recognizing that AI increasingly influences key business controls and data integrity. The company equipped its assurance professionals with tools and methodologies to evaluate the robustness of AI models, data quality, and governance structures.

Across these frameworks, a few themes consistently emerge: The importance of continuous monitoring due to AI’s dynamic nature and the integration of human expertise with automated tools to detect and mitigate risks effectively.

What needs to be tested in AI assurance?

To ensure the safety, reliability, and accountability of AI systems, key focus areas include business assurance practices, the effectiveness of benchmarking methods, the rigor of red teaming exercises, and the implementation of responsible AI principles. These components reflect common themes across leading AI assurance methodologies.

Business assurance

Business assurance has a broad spectrum of activities, from risk management and compliance to QA. To ensure that AI solutions stay aligned with business goals, support growth, and deliver on user expectations, traditional testing — user interface, functional, and integration — is combined with AI-specific practices like simulating user behavior and running hallucination checks. Rigorous testing with diverse datasets validates the system’s reliability in real-world conditions and helps prevent errors that could impact performance or trust.

Systematic benchmarking

Benchmarking (Figure 1) involves a systematic evaluation of AI models against established industry standards and performance benchmarks. This includes the use of standard software testing methodologies such as level-wise testing (functional, integration, system, and acceptance) and session-based testing, which breaks testing into focused, time-boxed sessions to catch issues early and improve quality. For AI-specific evaluation, it includes large language model (LLM) benchmarking for language skills and multi-turn testing to assess context maintenance across multiple dialogue turns, plus orchestration testing for task coordination and workflow management.

Figure 1. Agent benchmarking

Source: Infosys Knowledge Institute

Red teaming

Red teaming, a standard cybersecurity testing practice, involves adversarial testing to simulate attacks and uncover vulnerabilities. In AI systems, this includes prompt injection (manipulating inputs to trigger unintended behavior), jailbreak attempts (bypassing safeguards to unlock restricted functions), data poisoning (inserting malicious data into training), and checking for data security and leakage to ensure sensitive information remains protected.

Figure 2. Red teaming via adversarial attacks

Source: Infosys Knowledge Institute

Responsible AI

The component integrates ethics into AI operations. It addresses bias mitigation, transparency, privacy protection, and regulatory compliance, ensuring AI decisions are explainable and auditable. Embedding these ethical considerations into assurance processes helps organizations build AI systems that earn stakeholder trust and comply with evolving legal frameworks.

Data-driven AI assurance

Enterprise AI must be designed to deliver consistent, explainable, and safe outcomes even when user inputs vary. While generative models are inherently probabilistic, robust engineering, rigorous testing, and strong governance practices enable AI assurance solutions to achieve consistent and repeatable results.

Enterprise-level framework for AI assurance

Infosys addresses key areas of AI assurance through its (BR²) framework, which brings together business assurance, benchmarking, red teaming, and responsible AI. Each of these components aligns with widely recognized AI assurance principles but is tailored for practical application at the enterprise level. The framework is designed to navigate the specific challenges posed by AI systems, such as unpredictability, opacity, and dynamic behavior, while embedding assurance practices throughout the development and deployment life cycle.

To operationalize the (BR²) framework (Figure 3), Infosys employs a structured and data-driven assurance platform that integrates multiple enablers to ensure comprehensive testing of AI systems.

The first enabler is data provisioning. This involves creating a golden repository, which is a curated dataset of validated input-output pairs that reflect the AI system’s expected behavior. Based on this repository, the system automatically generates variant inputs to evaluate performance under a wide range of conditions, including edge cases, biased inputs, and scenarios involving privacy concerns.

Curated benchmarks provide the next essential layer. These include a combination of industry-standard benchmarks and custom metrics that are aligned with specific business and domain needs. Evaluations are further enhanced using LLMs, which allow for precise scoring and a deeper understanding of how the AI performs across different contexts.

Infosys addresses key areas of AI assurance through its (BR²) framework, which brings together business assurance, benchmarking, red teaming, and responsible AI.

Automation plays a key role in scaling and streamlining the assurance process. It facilitates the generation of input variants, execution of benchmark tests, and evaluation of outcomes using model-based scoring. The platform also supports session-based and multi-turn testing, which helps simulate real-world interactions and assess how well the AI maintains context over time.

The final component is analysis and telemetry. This layer captures detailed performance data and provides insights through quantitative scoring. It also includes explainability features that help stakeholders interpret results and understand system behavior. Combined, these capabilities offer high test coverage and improved efficiency, with testing time reportedly reduced by up to 50%.

Figure 3. An example of AI assurance framework

Source: Infosys Knowledge Institute

Getting started with AI assurance

While the journey toward mature AI assurance can seem daunting, organizations can begin with targeted efforts:

First, evaluate existing QA capabilities to identify gaps related to AI’s nondeterministic and evolving behavior.
Second, adopt or adapt proven assurance frameworks like (BR²) that combine rigorous testing, security evaluation, and ethical governance.
Third, start with high-impact pilot projects, such as AI-powered customer support bots or decision engines, to build internal expertise and processes.
Finally, embed assurance activities into ongoing operations, making AI validation a continuous, automated practice rather than a sporadic event.

Why organizations need AI assurance now

With AI increasingly embedded in mission-critical applications — from financial services to healthcare, legal analysis to customer service — failure to assure AI systems carries significant risks. These include reputational damage from biased or incorrect AI decisions, financial losses due to compliance violations, and operational disruptions from security breaches.

Moreover, regulatory landscapes are evolving rapidly. Governments worldwide are introducing AI-specific regulations that mandate transparency, fairness, and accountability. Organizations that embed AI assurance into their development and deployment cycles will not only comply more easily but also gain a competitive edge by building trusted AI solutions. Designing AI with built-in safety measures and continuous oversight is essential to prevent unintended harmful consequences.

The dynamic nature of AI models, which continually evolve through retraining or updates, means that assurance cannot be a one-time activity. Continuous validation integrated into CI/CD pipelines ensures that AI maintains performance and ethical standards over time.

Authors

Venkatesh Iyengar, Harleen Bedi, Vanishree Mahesh