Testing AI-based systems

Martin Heininger

AI-based systems are becoming popular. However, broad acceptance of such systems is still quite low in our society. Trust in and acceptance of AI-based systems will increase significantly once we can prove that they behave correctly and, more importantly, safely. Verification of AI-based systems is the key.

What is artificial intelligence?

ISO/IEC 2382:2015 defines artificial intelligence as follows:

“Artificial Intelligence (AI) is a branch of computer science devoted to developing data processing systems that perform functions normally associated with human intelligence, such as reasoning, learning, and self‐improvement.”

This and other definitions of AI are not very precise. In his blog post Artificial Intelligence in safety-critical, Vance Hilderman uses the term “true Artificial Intelligence”, by which he means non-deterministic AI-based systems. According to the definition above, deterministic systems can also be AI. For testing, this distinction matters, because non-deterministic systems are a great challenge for test engineering. If I cannot predict a result, how should I judge whether the result delivered is correct?

Verification strategies for such systems are the most complicated ones. Let us first take a look at the complete picture of currently available industrial AI-based systems.

AI-based systems – which technical solutions exist?

All AI systems currently in use are weak AI. This means that a machine performs a precisely defined task faster, better and more efficiently. In practice, this is implemented by machine learning (ML). Other technologies for weak AI are still at the research stage. A strong AI would be able to think in a similar way to a human being, including strategic and emotional intelligence. AI systems like this do not exist at all.

There are three relevant types of machine learning algorithms: supervised, unsupervised, and reinforcement ML.

Supervised and unsupervised ML systems are deterministic: once such a system has been trained, the same input data will produce the same output data.
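The determinism described above can be checked directly in a test. The following is a minimal sketch, assuming a toy threshold classifier as a stand-in for a trained model; the names `train`, `predict` and `is_repeatable` and the training data are hypothetical and only serve as illustration.

```python
def train(samples):
    """Toy supervised training: learn a threshold between two labelled classes."""
    low = [x for x, label in samples if label == 0]
    high = [x for x, label in samples if label == 1]
    # The threshold is the midpoint between the two class means.
    return (sum(low) / len(low) + sum(high) / len(high)) / 2


def predict(threshold, x):
    """Inference only reads the frozen threshold, so it is a pure function."""
    return 1 if x >= threshold else 0


def is_repeatable(threshold, inputs, runs=5):
    """Return True if every run over the same inputs yields identical outputs."""
    reference = [predict(threshold, x) for x in inputs]
    return all(
        [predict(threshold, x) for x in inputs] == reference
        for _ in range(runs)
    )


# Hypothetical labelled training data: (feature value, class label).
TRAINING_DATA = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
THRESHOLD = train(TRAINING_DATA)
```

A repeatability check like this does not show that the outputs are correct. It only confirms that expected results can be fixed once and then reused as a regression baseline.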

Reinforcement ML is different. Such a system takes the outputs it produced for certain inputs and decides for itself whether the result is acceptable. If it is not good enough, the machine changes its behavior the next time. Complex algorithms are used to make these decisions. Reinforcement ML systems are therefore considered non-deterministic.
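To illustrate why this matters for testing, here is a deliberately tiny sketch of a self-learning element. It assumes a hypothetical `TinyAgent` that keeps one value estimate per action and receives its feedback as a numeric reward. Real reinforcement learning systems are far more complex, but the effect is the same: the answer to the same request changes with the learning history.

```python
class TinyAgent:
    """Hypothetical self-learning agent: picks the action with the best estimate."""

    def __init__(self, actions, learning_rate=0.5):
        self.values = {action: 0.0 for action in actions}
        self.learning_rate = learning_rate

    def act(self):
        # Greedy choice; on a tie, max() returns the first action in the dict.
        return max(self.values, key=self.values.get)

    def learn(self, action, reward):
        # Move the estimate for this action toward the observed reward.
        error = reward - self.values[action]
        self.values[action] += self.learning_rate * error


agent = TinyAgent(["brake", "coast"])
before = agent.act()          # decision with no learning history
agent.learn(before, -1.0)     # the result was judged as not good enough
after = agent.act()           # same request, different decision
```

A test that records `before` as the expected result would fail on the next run, although nothing in the code has changed.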

Neural networks (e.g. FNNs, CNNs, RNNs) and statistical methods (e.g. kernel methods, independent component analysis, Bayesian inference) are used to implement these machine learning algorithms.

How to test supervised and unsupervised ML AI-based systems

From a testing perspective, supervised and unsupervised AI-based systems are similar. These systems use training data to calibrate the neural network, or the algorithm based on a statistical method, so that the results meet the customer expectations.

Therefore, testing the neural network, or the algorithm based on a statistical method, without the training data tends to be meaningless. This means the well-known white box testing methods like structural coverage will no longer be useful in their current form. When neural networks are used, one idea is to measure “neuron” coverage, i.e., whether each “neuron” is used at least once. However, based on the experience with structural coverage measurement over the last decades, it is unlikely that “neuron” coverage will be helpful at all.
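For readers who have not seen the metric, the following is a minimal sketch of how “neuron” coverage could be computed. It assumes one hidden layer with ReLU activation, and it counts a “neuron” as used when its activation exceeds a threshold for at least one test input. The weights, the threshold and all function names are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def hidden_activations(weights, biases, inputs):
    """Return the ReLU activation of each hidden neuron for one input vector."""
    return [
        relu(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def neuron_coverage(weights, biases, test_inputs, threshold=0.0):
    """Fraction of hidden neurons activated above `threshold` by any test input."""
    covered = set()
    for inputs in test_inputs:
        activations = hidden_activations(weights, biases, inputs)
        for index, activation in enumerate(activations):
            if activation > threshold:
                covered.add(index)
    return len(covered) / len(weights)


# Hypothetical hidden layer: 2 inputs, 3 neurons.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
BIASES = [0.0, 0.0, 0.0]
```

With these weights, three well-chosen inputs already reach full coverage, which hints at why a high coverage figure alone says little about how well the trained behavior has been tested.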

It’s obvious that the verification of the training data is much more important. We need to define processes for selecting high-quality training data sets. The aerospace standard DO-200B “Standards for Processing Aeronautical Data” does not refer to AI systems, but it offers transferable approaches to achieving data quality for AI systems.
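Part of such a process can be automated. The sketch below shows a few basic checks on a labelled data set: missing values, unknown labels, duplicates and class imbalance. The function name `check_training_data`, the record layout and the 80 % limit are assumptions for illustration and are not taken from DO-200B.

```python
from collections import Counter


def check_training_data(records, valid_labels, max_class_share=0.8):
    """Return a list of findings; an empty list means no issue was found."""
    findings = []
    seen = set()
    for index, (features, label) in enumerate(records):
        if any(value is None for value in features):
            findings.append(f"record {index}: missing feature value")
        if label not in valid_labels:
            findings.append(f"record {index}: unknown label {label!r}")
        key = (tuple(features), label)
        if key in seen:
            findings.append(f"record {index}: duplicate of an earlier record")
        seen.add(key)
    # A single dominant class is a typical source of bias in the trained system.
    counts = Counter(label for _, label in records)
    for label, count in counts.items():
        if count / len(records) > max_class_share:
            findings.append(
                f"label {label!r} exceeds {max_class_share:.0%} of the data"
            )
    return findings
```

Checks like these cover only the formal quality of the data. Whether the data set represents the intended operating conditions still has to be reviewed by people who know the application.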

An additional approach to verifying these systems would be to systematically analyse the bias of the neural network or statistical method used and to create test cases accordingly.

The system test should use a systematically derived set of input data to prove that the training was successful. The (potential) customer should be involved in this process to validate the created AI-based system.
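One conventional way to derive such a set systematically is boundary value analysis over the specified input ranges. The following sketch assumes two hypothetical input parameters, `speed_kmh` and `distance_m`, with made-up limits, and combines the boundary values of all parameters.

```python
from itertools import product


def boundary_values(low, high, step):
    """Values just outside, on and just inside each limit, plus the midpoint."""
    return [low - step, low, low + step, (low + high) / 2,
            high - step, high, high + step]


def derive_test_inputs(ranges, step=1.0):
    """Cartesian product of the boundary values of every input parameter."""
    names = list(ranges)
    grids = [boundary_values(ranges[name][0], ranges[name][1], step)
             for name in names]
    return [dict(zip(names, combination)) for combination in product(*grids)]


# Hypothetical input ranges of the system under test.
INPUT_RANGES = {"speed_kmh": (0.0, 130.0), "distance_m": (0.0, 200.0)}
TEST_INPUTS = derive_test_inputs(INPUT_RANGES)
```

The full product grows quickly with the number of parameters, so in practice a risk-based selection or a pairwise combination would replace it.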

What is different in testing reinforcement ML systems?

The above-mentioned test strategies are also valid for reinforcement machine learning systems. However, because the system has self-learning elements, testing will be more challenging. The outputs of the system depend on its history (learning experience). Moreover, the techniques used to implement self-learning elements are very complex. The number of possible combinations of inputs and outputs will be practically infinite.

Clearly, there is no way to fully test systems of this kind. To overcome this challenge, we need to think more intensively than we did for conventional systems about a systematic, risk-based test approach. We need to know the major system risks, the bias of the neural network or statistical method used, and the acceptable limits of the system. Based on these, and possibly additional parameters, we need to derive tests. When we evaluate the test results, we should no longer focus on the exact result. It is much more important to define a range in which we expect the result to lie. As with the inputs, we must systematically derive acceptable and unacceptable ranges for the test results.
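A range-based verdict could look like the following minimal sketch. The function names and the example limits are hypothetical. The point is only that the test oracle is an interval and not a single expected value.

```python
def in_range(value, acceptable):
    """True if `value` lies within the inclusive (low, high) range."""
    low, high = acceptable
    return low <= value <= high


def out_of_range(outputs, acceptable):
    """Return the observed outputs that fall outside the acceptable range."""
    return [value for value in outputs if not in_range(value, acceptable)]


def verdict(outputs, acceptable):
    """'pass' only if every observed output lies in the acceptable range."""
    return "pass" if not out_of_range(outputs, acceptable) else "fail"


# Hypothetical example: braking distance in metres over repeated runs.
ACCEPTABLE_RANGE = (35.0, 45.0)
OBSERVED = [38.2, 41.7, 44.9, 46.3]
```

Reporting the violating values, and not just a pass or fail flag, makes it easier to judge how far outside the accepted limits the system has drifted.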

Conclusion

Three types of machine learning dominate industrial AI applications today. Testing weak AIs as mentioned above is certainly possible. However, the challenges known from testing conventional systems become more important if we want to test AI-based systems.
Since the acceptance of AI systems will depend on their effectiveness, it is important to validate and verify this effectiveness before launching the systems to the market. In order to be able to master the challenges discussed here, we will have to focus much more on test engineering in the future than we have done in the past.

Video: How an AI system is tested in functional safety

Get more background in the webinar recording about testing AI-based systems in functional safety applications including:

Examples for today’s AI applications
AI systems in functional safety
Standardization (ISO/IEC SC42)
Tool demo for testing AI-based systems (starting from 52:09)

Further information and contact

Unit testing with VectorCAST
Website HEICON, info[at]heicon-ulm.de or +49 (0) 7353 981 781

Contact me

Martin Heininger

Martin Heininger, CEO of HEICON, focuses on pragmatic software processes and methods fulfilling safety-relevant aspects. He has 20 years of experience with safety-critical software in various sectors.

Legal Notice

This text is the intellectual property of the author and is copyrighted by coderskitchen.com. You are welcome to reuse the thoughts from this blog post. However, the author must always be mentioned with a link to this post!

2 thoughts on “Testing AI-based systems”

Jerzy Batalinski
4 October 2021 at 15:30
Martin this is amazing insight. Have you found that the best way to validate the results of an AI-based system is to verify the input and output data?
Takeaways
Therefore, testing the neural network or the algorithm, which is based on a statistical method, without the training data tends to be meaningless.
It’s obvious that the verification of the training data is much more important.
The (potential) customer should be involved in this process to validate the created AI-based system.
- Martin Heininger
  11 October 2021 at 16:15
  Hello Jerzy,
  thanks for your comment.
  Structural coverage is measured systematically and extensively only in safety-relevant projects. Currently, I do not know of any safety-relevant project in which an AI-based system is used. Therefore, there is no real project in which we could see how meaningful structural coverage measurement is for an AI-based system. However, all AI-based systems I have seen so far struggle with system validation. This is the key to improving quality and yes, we very much need the customers involved there.
  Additionally, it would be very helpful to systematically analyse the bias topics of the AI-based system, i.e. the weaknesses of the AI algorithm. I think the potential here needs to be analysed as well.
