AI-based systems are becoming popular, yet their broad acceptance in our society is still quite low. Trust in and acceptance of AI-based systems will increase significantly once we can demonstrate correct and, more importantly, safe behavior. Verification of AI-based systems is the key.

What is artificial intelligence?

ISO/IEC 2382:2015 defines artificial intelligence as follows:

“Artificial Intelligence (AI) is a branch of computer science devoted to developing data processing systems that perform functions normally associated with human intelligence, such as reasoning, learning, and self‐improvement.”

This and other definitions of AI are not very precise. In his blog post “Artificial Intelligence in safety-critical”, Vance Hilderman uses the term “true Artificial Intelligence”, by which he means non-deterministic AI-based systems. According to the definition above, deterministic systems can also be AI. For testing, this distinction matters: non-deterministic systems are a great challenge for test engineering. If I cannot predict a result, how can I judge whether the delivered result is correct?

Verification strategies for such systems are the most complicated ones. Let us first take a look at the complete picture of currently available industrial AI-based systems.

AI-based systems – which technical solutions exist?

All AI systems currently in use are weak AI. This means that a machine performs a precisely defined task faster, better and more efficiently. In practice, this is implemented with machine learning (ML); other technologies for weak AI are still at the research stage. A strong AI would be able to think in a way similar to a human being, including strategic and emotional intelligence. AI systems like this do not exist today.

There are three types of relevant machine learning algorithms:

  • Supervised and unsupervised ML are deterministic systems: once training is complete, the same input data will produce the same output data.
  • Reinforcement ML is different. Such a system takes the outputs it produced for certain inputs and decides for itself whether the result is acceptable. If it is not good enough, the machine changes its behavior the next time. Complex algorithms are used to make these decisions. Reinforcement ML is therefore considered non-deterministic.
  • All three types are implemented with neural networks (e.g. FNNs, CNNs, RNNs) or statistical methods (e.g. kernel methods, independent component analysis, Bayesian inference).
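The determinism claim for trained supervised models can be checked with a simple repeatability test. The following is a minimal sketch, not a real system: the weights, the `predict` function and the `is_repeatable` helper are hypothetical names invented for illustration, and the model is assumed to be frozen after training.

```python
import math

# Hypothetical parameters of an already trained model (assumption for this sketch).
# They are frozen: inference never modifies them.
WEIGHTS = [0.8, -0.4]
BIAS = 0.1


def predict(features):
    """Logistic-regression-style inference: a pure function of its input."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def is_repeatable(model, features, runs=100):
    """Return True if repeated calls with the same input give the same output."""
    first = model(features)
    return all(model(features) == first for _ in range(runs))
```

A call such as `is_repeatable(predict, [1.0, 2.0])` should hold for any model whose parameters do not change after training; a system that keeps learning in operation would not generally pass such a check.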

How to test supervised and unsupervised ML AI-based systems

From a testing perspective, supervised and unsupervised AI-based systems are similar. These systems use training data to calibrate the neural network, or the algorithm based on a statistical method, so that the results meet the customer expectations.

Therefore, testing the neural network or the statistical algorithm without the training data tends to be meaningless. This means that well-known white-box testing methods such as structural coverage will no longer be useful in their current form. For neural networks, the idea is to measure “neuron” coverage, i.e., whether each “neuron” is used at least once. However, based on the experience with structural coverage measurement over the last decades, it is unlikely that “neuron” coverage will be helpful at all.
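To make the “neuron” coverage idea concrete, here is a minimal sketch under stated assumptions: a single hidden layer with ReLU activation, hypothetical weights, and a neuron counted as “used” when its activation exceeds a threshold for at least one test input. The names `HIDDEN`, `hidden_activations` and `neuron_coverage` are invented for this illustration.

```python
def relu(x):
    """Rectified linear unit: passes positive values, clamps the rest to 0."""
    return x if x > 0.0 else 0.0


# Hypothetical hidden layer with three neurons, each given as (weights, bias).
HIDDEN = [
    ([1.0, 0.0], 0.0),     # active when x0 > 0
    ([0.0, 1.0], 0.0),     # active when x1 > 0
    ([-1.0, -1.0], -1.0),  # active when x0 + x1 < -1
]


def hidden_activations(inputs):
    """Compute the activation of every hidden neuron for one input vector."""
    return [relu(b + sum(w * x for w, x in zip(ws, inputs))) for ws, b in HIDDEN]


def neuron_coverage(test_inputs, threshold=0.0):
    """Fraction of neurons whose activation exceeds the threshold at least once."""
    covered = set()
    for inputs in test_inputs:
        for index, activation in enumerate(hidden_activations(inputs)):
            if activation > threshold:
                covered.add(index)
    return len(covered) / len(HIDDEN)
```

With the single test input `[1.0, 1.0]`, two of the three neurons are activated; adding `[-2.0, -2.0]` activates the third. Note that full coverage here says nothing about whether the outputs are correct, which is in line with the scepticism expressed above.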

Verification of the training data is therefore much more important. We need to define processes for selecting high-quality training data sets. The aerospace standard DO-200B “Standards for Processing Aeronautical Data” does not make any reference to AI systems, but it offers transferable approaches for achieving the required data quality for AI systems.
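What such a data verification step might look like can be sketched in a few lines. The checks below (missing values, out-of-range values, duplicates, class imbalance) are illustrative assumptions, not requirements taken from DO-200B, and `check_training_data` with its parameters is a hypothetical helper.

```python
from collections import Counter


def check_training_data(records, valid_ranges, max_class_share=0.8):
    """Run basic quality checks on labelled training data.

    records:      list of (features, label), where features is a dict
    valid_ranges: dict mapping feature name to (low, high)
    Returns a list of human-readable findings (empty if nothing was found).
    """
    findings = []
    seen = set()
    for i, (features, label) in enumerate(records):
        for name, (low, high) in valid_ranges.items():
            value = features.get(name)
            if value is None:
                findings.append(f"record {i}: missing '{name}'")
            elif not low <= value <= high:
                findings.append(f"record {i}: '{name}'={value} outside [{low}, {high}]")
        # Identical features with an identical label count as a duplicate.
        key = (tuple(sorted(features.items())), label)
        if key in seen:
            findings.append(f"record {i}: duplicate of an earlier record")
        seen.add(key)
    # Flag any label that dominates the data set.
    for label, count in Counter(label for _, label in records).items():
        if count / len(records) > max_class_share:
            findings.append(f"label '{label}' makes up {count} of {len(records)} records")
    return findings
```

In a real project, the list of checks would come from the defined data quality process; the point of the sketch is only that each check is explicit, repeatable and produces traceable findings.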

An additional approach to verifying these systems would be to systematically analyse the bias of the neural network or statistical method used, and to create test cases accordingly.

The system test should use a systematically derived set of input data to prove that the training was successful. The (potential) customer should be involved in this process to validate the created AI-based system.
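One possible way to derive such an input set systematically is boundary-value selection per input, combined across all inputs. This is a sketch under assumptions: the choice of boundary values, the helper names and the `expected` reference function (standing in for the customer's expectation) are all hypothetical.

```python
from itertools import product


def boundary_values(low, high, step):
    """Classic picks: both limits, one step inside each limit, and the midpoint."""
    return [low, low + step, (low + high) / 2, high - step, high]


def derive_test_inputs(input_ranges, step=1.0):
    """Combine the boundary values of every input dimension."""
    per_dimension = [boundary_values(low, high, step) for low, high in input_ranges]
    return list(product(*per_dimension))


def run_system_test(model, expected, input_ranges, step=1.0):
    """Return every derived input for which the model contradicts the expectation."""
    return [
        inputs
        for inputs in derive_test_inputs(input_ranges, step)
        if model(inputs) != expected(inputs)
    ]
```

For two inputs in the range 0 to 10, this yields 25 test inputs. The number grows quickly with the number of inputs, so in practice the combination step would need to be reduced, for example by a risk-based selection.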

What is different in testing reinforcement ML systems?

The above-mentioned test strategies are also valid for reinforcement machine learning systems. However, because the system has self-learning elements, testing will be more challenging. The outputs of the system depend on its history (its learning experience). Moreover, the techniques used to implement self-learning elements are very complex. The number of possible combinations of inputs and outputs will be practically infinite.
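The dependence on history can be illustrated with a toy agent. This is a deliberately simplified sketch, not a real reinforcement learning implementation: the class `SelfLearningAgent`, its two actions and the reward values are hypothetical, and the agent simply keeps a running average of the reward per situation and action.

```python
from collections import defaultdict

ACTIONS = ("brake", "coast")  # hypothetical action set


class SelfLearningAgent:
    """Toy agent that prefers the action with the best average reward so far."""

    def __init__(self):
        self.value = defaultdict(float)  # (situation, action) -> average reward
        self.count = defaultdict(int)    # (situation, action) -> number of updates

    def act(self, situation):
        """Pick the best-rated action; ties go to the first entry in ACTIONS."""
        return max(ACTIONS, key=lambda action: self.value[(situation, action)])

    def learn(self, situation, action, reward):
        """Update the running average reward for this situation and action."""
        key = (situation, action)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]
```

Two agents built from the same code give different answers to the same input once their histories differ: after `learn("obstacle", "coast", 1.0)` one agent chooses `"coast"`, while an agent that experienced `learn("obstacle", "coast", -1.0)` chooses `"brake"`. A test case therefore has to specify the learning history as well as the input.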

Clearly, there will be no way to fully test any of these kinds of systems. To overcome this challenge, we need to think more intensively than we did for conventional systems about a systematic, risk-based test approach. We need to know the major system risks, the bias of the neural network or statistical method used, and the acceptable limits of the system. Based on these, and possibly additional parameters, we need to derive tests. When we evaluate the test results, we should no longer focus on the exact result. It will be much more important to define a range in which we expect the result to lie. As with the inputs, we must systematically derive acceptable and unacceptable ranges for the test results.
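Evaluating results against a range instead of an exact value can be sketched as follows. The function `check_against_ranges` and the braking-distance example in the usage note are hypothetical; the acceptable ranges would have to come from the risk analysis described above.

```python
def check_against_ranges(system, cases):
    """Run the system on each input and report outputs outside the accepted range.

    cases: iterable of (input_value, (low, high))
    Returns a list of (input_value, output, (low, high)) for every violation.
    """
    failures = []
    for input_value, (low, high) in cases:
        output = system(input_value)
        if not low <= output <= high:
            failures.append((input_value, output, (low, high)))
    return failures
```

As a usage example, assume a system that estimates a braking distance from a speed. A case such as `(30, (40.0, 44.0))` passes for any output between 40 and 44 and fails otherwise, without the test author having to predict the exact value.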


Three types of machine learning dominate industrial AI applications today. Testing weak AI as described above is certainly possible. However, the challenges known from testing conventional systems become more important when we want to test AI-based systems.

Since the acceptance of AI systems will depend on their effectiveness, it is important to validate and verify this effectiveness before launching the systems on the market. To master the challenges discussed here, we will have to focus much more on test engineering in the future than we have done in the past.

Video: How an AI system is tested in functional safety

Get more background in the webinar recording about testing AI-based systems in functional safety applications including:

  • Examples for today’s AI applications
  • AI systems in functional safety
  • Standardization (ISO/IEC SC42)
  • Tool demo for testing AI-based systems (starting from 52:09)

Further information and contact

  • Unit testing with VectorCAST
  • Website HEICON , info[at] or +49 (0) 7353 981 781


Legal Notice

This text is the intellectual property of the author and is copyrighted by You are welcome to reuse the thoughts from this blog post. However, the author must always be mentioned with a link to this post!

2 thoughts on “Testing AI-based systems”

  1. Martin this is amazing insight. Have you found that the best way to validate the results of an AI-based system is to verify the input and output data?


    Therefore, testing the neural network or the algorithm, which is based on a statistical method, without the training data tends to be meaningless.

    It’s obvious that the verification of the training data is much more important.

    The (potential) customer should be involved in this process to validate the created AI-based system.

    • Hello Jerzy,
      thanks for your comment.
      Structural coverage is measured systematically and extensively only in safety-relevant projects. Currently, I do not know of any safety-relevant project in which an AI-based system is used. Therefore, there is no real project available where we could see how meaningful structural coverage measurement is for an AI-based system. However, all AI-based systems I have seen so far struggle with system validation. This is the key to improving quality and yes, we very much need the customers involved there.

      Additionally, it would be very helpful to systematically analyse the bias topics of the AI-based system, i.e. the weaknesses of the AI algorithm. I think the potential here needs to be analysed as well.

