Testing Imperatives for AI Systems

In the past, Artificial Intelligence (AI) research was confined primarily to big technology companies and was conceived of as a technology that could mimic human intelligence. However, with rapid advances in data collection, processing, and computing power, AI has been called the new electricity for every business. The market for AI has grown sharply in the last couple of years, with applications spanning a broad range of industries. In the coming years, the widespread uptake of AI is anticipated to unlock its true potential and improve efficiencies in various fields.

Currently, AI systems are largely probabilistic in nature: they rely on sophisticated models whose behavior is learned from input data, such as models that detect suspicious or risky behavior. This is in stark contrast to the deterministic nature of traditional IT systems, which use a rule-based approach and generally follow the “if X, then Y” model. Testing AI systems therefore involves a fundamental shift from verifying output conformance to validating inputs and training data in order to verify robustness.

With AI systems gathering momentum, there is a growing need to ensure their quality. For example, today’s automobiles increasingly rely on multiple intelligent systems running roughly 150 million lines of code, more than is used in modern fighter jets. It has therefore become imperative to test AI systems exhaustively to ensure their robustness. Testing AI systems, however, poses certain key challenges that can be overcome with the right approach.

Testing of AI Systems is Not Free of Challenges

  • Massive volumes of collected sensor data present storage and analytics challenges, in addition to creating noisy datasets
  • AI systems rely on data gathered during unanticipated events, which is extremely difficult to collate, posing training challenges
  • AI model test scenarios should be equipped to identify and remove human bias, which often becomes part of training and testing datasets
  • In AI systems, defects get amplified as they propagate, making it extremely hard to fix one isolated problem
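The bias challenge above can be made concrete with a simple pre-training check on the label distribution. A minimal sketch, assuming a classification dataset; the labels and the 3:1 skew threshold are illustrative, not drawn from any specific project:

```python
from collections import Counter

def check_label_balance(labels, max_ratio=3.0):
    """Flag a dataset whose class distribution is badly skewed.

    labels: iterable of class labels from the training set.
    max_ratio: largest acceptable majority/minority class ratio
               (3.0 is an illustrative threshold, not a standard).
    Returns (is_balanced, counts).
    """
    counts = Counter(labels)
    most = max(counts.values())
    least = min(counts.values())
    return (most / least) <= max_ratio, dict(counts)

# Example: a hypothetical hiring dataset skewed 9:1 toward one outcome.
balanced, counts = check_label_balance(["hire"] * 90 + ["reject"] * 10)
# balanced is False here: the 9:1 skew exceeds the 3:1 threshold
```

A check like this catches only the crudest form of dataset bias; in practice the distribution should also be examined per demographic group and per feature, which is exactly the kind of scenario the test design above must cover.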

Key Aspects of AI Systems Testing

Data Validation

The effectiveness of AI systems is largely dependent on the quality of training data, including aspects such as bias and variety. For example, smartphone assistants and car navigation systems find it difficult to comprehend different accents. The impact of training data is illustrated in an experiment by researchers from the MIT Media Lab, who trained “Norman”, an AI-powered psychopath, by exposing it to data from the dark corners of the web. Where a regular algorithm perceived a group of people standing around a window as just that, Norman saw them as potentially jumping out of the window. This experiment shows that training data is of utmost importance for AI systems to give the desired output.
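Data validation of this kind can be partly automated before training ever starts. A minimal sketch, assuming a tabular dataset represented as dictionaries; the field names, value ranges, and label vocabulary are illustrative assumptions:

```python
def validate_records(records, rules):
    """Return the indices of records that violate any validation rule.

    records: list of dicts, one per training example.
    rules: dict mapping field name -> predicate that must hold.
    """
    bad = []
    for i, rec in enumerate(records):
        for field, predicate in rules.items():
            if field not in rec or not predicate(rec[field]):
                bad.append(i)
                break  # one violation is enough to flag the record
    return bad

# Illustrative rules for a hypothetical driving dataset: speed must be
# a plausible number, and the label must come from a known vocabulary.
rules = {
    "speed_kmh": lambda v: isinstance(v, (int, float)) and 0 <= v <= 300,
    "label": lambda v: v in {"pedestrian", "vehicle", "cyclist", "none"},
}
records = [
    {"speed_kmh": 42.0, "label": "pedestrian"},
    {"speed_kmh": -5, "label": "vehicle"},    # invalid speed
    {"speed_kmh": 60, "label": "unicorn"},    # unknown label
]
# validate_records(records, rules) -> [1, 2]
```

Rules like these catch malformed records, but they cannot catch the subtler problem the Norman experiment illustrates: data that is individually valid yet collectively unrepresentative, which is why distribution-level checks are also needed.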

Core Algorithm

AI systems are built around algorithms that process data and generate insights. Model validation, learnability, algorithm efficiency, and empathy are among the key aspects to test at this layer.

Learnability is the ability of a system to learn and modify its behavior over time. Websites such as Netflix and Amazon exhibit learnability: they understand user preferences and come up with appropriate recommendations. Another example is a voice recognition system such as Siri or Cortana, which picks up the semantics of language. With Cortana now responding to “I am being abused” with the number for the National Domestic Violence Hotline, it is also important for chatbots to be tested for comprehension of things such as sarcasm and tone.

The Uber crash of March 2018, in which a pedestrian was killed by a self-driving car, was the result of a software failure. The car’s sensors did detect the victim but did not identify her as a trigger for applying the brakes. Distinguishing between real and illusory objects is a primary challenge in developing self-driving car software and a classic example of model validation for core algorithms.
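In practice, model validation of the kind described above means evaluating the trained model on held-out data and gating its release on a metric threshold. A minimal sketch; the labels and the 0.95 gate are illustrative assumptions, not values from any real deployment:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    assert len(y_true) == len(y_pred)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def model_passes(y_true, y_pred, threshold=0.95):
    """Gate a model release on held-out accuracy.

    For safety-critical classes (e.g. pedestrians), the metric should
    also be checked per class: overall accuracy can hide failures on
    rare but critical cases, as the Uber incident illustrates.
    """
    return accuracy(y_true, y_pred) >= threshold

# Hypothetical held-out labels vs. model predictions.
y_true = ["pedestrian", "none", "vehicle", "none"]
y_pred = ["pedestrian", "none", "vehicle", "vehicle"]
# accuracy is 0.75, below the 0.95 gate, so model_passes(...) is False
```

The design point is that the gate is part of the test suite, so a retrained model that regresses on the held-out set fails the build rather than reaching production.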

Non-functional: Performance and Security Testing

Performance and security testing is integral to AI systems, as is verifying aspects such as regulatory compliance. Recently, HSBC’s voice recognition system was breached by a customer’s non-identical twin, who was able to access balances and recent transactions and could even transfer money between accounts. Improper testing can likewise leave chatbots open to being manipulated into revealing business-sensitive information.
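On the performance side, a typical check is to gate a model-serving function on tail latency rather than the average. A minimal sketch, assuming the model is callable as a plain function; the stand-in model and the 50 ms budget are illustrative assumptions:

```python
import time

def latency_percentile(fn, inputs, pct=95):
    """Call fn on each input and return the pct-th percentile latency in seconds."""
    times = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - start)
    times.sort()
    idx = min(len(times) - 1, int(len(times) * pct / 100))
    return times[idx]

# Stand-in for a real inference call, used only for illustration.
def fake_model(x):
    return x * 2

# Illustrative gate: 95% of requests must finish within a 50 ms budget.
p95 = latency_percentile(fake_model, range(1000))
assert p95 < 0.05
```

Percentile gates matter for AI systems because inference latency is often long-tailed; an average-latency check can pass while the slowest requests, the ones a braking system or fraud check cannot afford, still miss their budget.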

Systems Integration Testing

AI systems are designed to operate in the larger context of other systems and to solve specific problems. This requires a holistic assessment of AI systems. Thus, integration testing is of primary importance when multiple AI systems with conflicting goals are deployed together.

In the American state of Arkansas, under a Medicaid waiver program, assessor-driven interviews were used to decide the hours and frequency of caretaker services for each beneficiary. When the program turned to automation to improve efficiency, the service hours apportioned to beneficiaries were substantially reduced in many cases. No notification was sent to beneficiaries informing them of the change, which led to an increase in grievance appeals. Beneficiaries did not receive meaningful responses to their appeals because the automated system put in place was too complex to be explained in plain language.

As per a leading analyst firm, the global business value derived from AI was forecast to reach $1.2 trillion in 2018, a 70% increase over 2017, and to grow to $3.9 trillion by 2022. With more and more systems incorporating AI characteristics, it is important that they are tested thoroughly. Infosys is on a journey to create multiple assets and solutions in this space, and is also enhancing its people skills in preparation.