
QA Is A Must But How Is It Done in AI?

At Intertec, there is a consensus that QA should be performed on almost every project, regardless of its domain and type. Still, some people hold the opinion that the QA process is fairly straightforward and can only be applied to so-called conventional applications.

To settle the dilemma and show that unconventional projects also require special attention and deserve to be quality assured, my fellow QA colleague Gjorgji Karakolev and I initiated a session with the members of our AI workgroup.

During the internal AI session, we tried to understand the general characteristics of an AI project and discussed and answered the main questions. In this article, we will walk you through our conclusions about the effectiveness of QA in AI, that is, quality assurance in artificial intelligence.

So what’s the deal with an AI project?

The main goal of AI is to generate an artificial entity (a model) that can, more or less, replace parts of intelligent human behavior. This is a concept known as Narrow AI. For now, we will disregard the concept of General AI and focus on models that perform specific tasks, such as predicting a value given some factors or producing natural language given a set of parameters.

Okay, so the model’s purpose is to automatically do something that, in a conventional application, the user would do. For it to be properly used, the model needs to be placed behind an API, which in turn can be consumed by a service or by a verifying party (read: QA).
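To make the black-box idea concrete, here is a minimal sketch of what a contract check against a model behind an API could look like. The endpoint, payload shape and response fields are assumptions for illustration, not taken from a real project.

```python
import requests

API_URL = "http://localhost:8000/predict"  # hypothetical endpoint

def test_prediction_endpoint_contract():
    # Black-box treatment: we only check what the model returns, not how it learned it.
    payload = {"features": [3.2, 1.7, 0.4]}  # illustrative input shape
    response = requests.post(API_URL, json=payload, timeout=5)

    assert response.status_code == 200
    body = response.json()
    assert "prediction" in body
    assert isinstance(body["prediction"], float)
```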


When it comes to performing QA, the first impression can be encouraging. It resembles standard API checks, applying the black-box treatment: “don’t worry about how the model has learned, but what it has learned.” But is it really that simple?


Let us tell you the story behind it. Generating a model usually means following a variant of a standard AI life cycle, in most cases based on the CRISP-DM pipeline. To produce a model, an ordered sequence of phases, and the processes within them, has to be performed over several iterations. Each of those processes is usually covered by units of code that can be executed independently.
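To illustrate, here is a rough sketch of how those phases often end up as independently executable (and therefore independently testable) units of code. The function names and the tiny unit test are purely illustrative, not from a specific codebase.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def prepare_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Data preparation phase: cleaning before modeling."""
    return raw.dropna().reset_index(drop=True)

def train_model(features: pd.DataFrame, target: pd.Series) -> LinearRegression:
    """Modeling phase: fit and return a model object."""
    return LinearRegression().fit(features, target)

def evaluate_model(model: LinearRegression, features: pd.DataFrame, target: pd.Series) -> float:
    """Evaluation phase: return a single quality metric (R^2 here)."""
    return model.score(features, target)

# Each unit can be exercised in isolation, e.g.:
def test_prepare_data_drops_missing_rows():
    raw = pd.DataFrame({"x": [1.0, None, 3.0], "y": [2.0, 4.0, 6.0]})
    assert len(prepare_data(raw)) == 2
```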

So, the following question arises: Does QA need to be performed on each of them, too? And what about the data?

AI can be unique…

… and this can have a huge impact on QA. Sadly, no matter how much we try to generalize AI, it is a complex discipline that involves many different subfields, techniques and types of models. Therefore, testing such a project can vary significantly.

For example, evaluating a standard machine learning (ML) regression model deals with numbers, a relatively simple output. Similarly, ML classification problems may deal with strings, but drawn from a well-defined, finite set. The same cannot be said for image/video recognition or natural language processing/generation (NLP/NLG), where the evaluation process deals with far more complex output.
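A toy comparison, with made-up numbers, of how simple the evaluation is for regression and classification compared to free-form output:

```python
from sklearn.metrics import mean_absolute_error, accuracy_score

# Regression: numeric output compared against numeric ground truth.
y_true_reg = [10.0, 12.5, 9.8]
y_pred_reg = [9.7, 12.9, 10.1]
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))

# Classification: output drawn from a finite, well-defined label set.
y_true_cls = ["spam", "ham", "spam"]
y_pred_cls = ["spam", "spam", "spam"]
print("Accuracy:", accuracy_score(y_true_cls, y_pred_cls))

# For NLG or image recognition there is no such single-number comparison:
# the output is free text or pixels, so evaluation needs richer criteria.
```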

Let’s take a look at the challenges of an NLG project that the engineers from the AI team described at the session.

One of the largest challenges in building a functional and effective NLG system is testing (the testing process of the life cycle) and final quality assurance. How can we make sure that an NLG system is robust and reliable? This is critical for commercial NLG systems and highly desirable for academic and research NLG systems as well.

So how will we test an NLG type of model?

Generally, the practice in AI seems to be the following: run the system on a bunch of test cases and report what percentage of them the system got right. One challenge is that we also want to capture and acknowledge worst-case behavior, not just average-case behavior.
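Here is a small sketch of that practice, extended so that the worst case is reported next to the average instead of being hidden by it. The per-case scores and the pass threshold are assumptions for illustration.

```python
def report(case_scores: list[float]) -> None:
    # case_scores: hypothetical per-test-case quality scores between 0 and 1
    pass_rate = sum(score >= 0.8 for score in case_scores) / len(case_scores)
    average = sum(case_scores) / len(case_scores)
    print(f"Pass rate:     {pass_rate:.0%}")
    print(f"Average score: {average:.2f}")
    print(f"Worst case:    {min(case_scores):.2f}")  # the number an average alone would hide

report([0.95, 0.91, 0.88, 0.15, 0.97])
```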

Valuable quality assurance for NLG requires:

  • extensive collection of unit, integration and performance tests, along with a precise definition of cases for regression testing;
  • automatic checking of all generated texts against quality criteria, for example checking for spelling mistakes and the use of profanity (a minimal sketch of such a check follows this list);
  • manual checking of NLG texts by domain experts;
  • bugs should be recorded, prioritized and tracked in a structured and organized fashion, ideally using a bug-tracking tool.
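As an illustration of the second point, here is a minimal sketch of an automatic check over generated texts. It assumes the pyspellchecker package and a small in-house profanity list; a real project would apply far richer quality criteria.

```python
from spellchecker import SpellChecker  # pip install pyspellchecker

PROFANITY = {"damn", "crap"}  # placeholder word list
spell = SpellChecker()

def check_generated_text(text: str) -> list[str]:
    issues = []
    words = [w.strip(".,!?").lower() for w in text.split()]
    misspelled = spell.unknown(words)
    if misspelled:
        issues.append(f"possible spelling mistakes: {sorted(misspelled)}")
    if PROFANITY & set(words):
        issues.append("profanity detected")
    return issues

print(check_generated_text("The weatther tomorrow will be damn cold."))
```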

Since it’s often difficult for developers to properly test their own code, we recommend that testing be executed by dedicated testers.

Good requirements are essential for reliable testing

Requirements are the basis for all software development efforts, since they define the new functionality that the stakeholders expect and how the application will behave for its users. Most of the issues that arise in building and refining a product are directly traceable to errors in requirements. Poor requirements definition often leads to too many changes, and this is a serious cause of software project failures.

Testing ensures that an application performs in accordance with the design and common user expectations. This is done by:

  • verifying the correct functioning of all communication pathways within the application;
  • identifying all relevant data that may validate all possible usage contexts.

The test suite must aim to provide full coverage of all paths and every form of data. This implies that user and system requirements must be detailed enough to enable test designers to craft high-quality test cases.
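For instance, requirements that spell out the valid and invalid data forms translate almost directly into parametrized test cases. The validation rule below is a made-up example, not a real requirement.

```python
import pytest

def validate_input(payload: dict) -> bool:
    # Hypothetical rule: both fields required, age must be non-negative.
    return "age" in payload and "income" in payload and payload["age"] >= 0

@pytest.mark.parametrize(
    "payload, expected_valid",
    [
        ({"age": 34, "income": 52000}, True),    # typical user
        ({"age": -1, "income": 52000}, False),   # boundary / invalid data
        ({"income": 52000}, False),              # missing required field
    ],
)
def test_input_validation(payload, expected_valid):
    assert validate_input(payload) is expected_valid
```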

As technological complexity continues to grow, there is an acute need for automated test generation solutions that can keep costs under control while providing adequate coverage of elaborate software applications and systems.

Q&A for QA

Let’s take a look at the discussion points from both QA and developer perspectives.

What effort should a quality engineer make to switch from a standard app to an app that handles an AI model?

As usual, quality engineers should understand the principles of AI modeling from scratch and get used to dynamic test data and the constant need for updates and improvements. If testing the front end of a conventional application can be flexible in terms of meeting UI/UX criteria, then performing QA on an AI model behind an API will be even more flexible, depending on the expected quality and performance, which is quite different from the strongly defined scenarios of standard back-end API testing.
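A tiny sketch of that difference in mindset: instead of asserting one exact response, the check asserts that quality stays within agreed bounds. The predict() stub and the threshold are assumptions for illustration.

```python
def predict(features: dict) -> float:
    return 0.87  # stand-in for a call to the model behind the API

def test_model_quality_stays_within_bounds():
    score = predict({"feature_a": 1.2, "feature_b": 0.4})
    # There is no single "correct" value; we agree on acceptable bounds instead.
    assert 0.8 <= score <= 1.0
```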

Is it reasonable to state that increasing automation testing (AT) coverage will decrease manual effort?

Yes, smart automation testing will decrease manual effort, but in the end, manual checks should still be performed for at least some cases of the final result.

Can testing become redundant?

Of course not, QA is a necessary part of software development. QA is about much more than just testing code.

Which technologies can be used to support manual QA?

We cannot single out which tools or technologies are better at the moment, as this will always depend on who is using them: the person, the team, or the company/organization.

Which technologies can be used for supporting AT?

The answer is more or less the same as for manual QA; however, the usage patterns should be more strictly aligned within the team.

Should QA be responsible for the quality of the data?

If we answer this question by the book, then no. There are specific roles responsible for the quality of the data, such as data analysts or data engineers.

However, most companies do not have the luxury of such experts within the team that deals with an AI project and are prone to compromises, assigning QA, or to a lesser extent developers and domain experts, to take care of data quality. Let’s face it, people with those job descriptions are really hard to find, and when you are starting out, improvisation is the recommended way to go.

At which point (in time) should a QA start contributing to the project?

As soon as possible. Simply do not wait for the first iteration of the AI life cycle to finish, because it is better to correct a process or a phase itself than to run through an unnecessary iteration before detecting an issue. In practice, that means immediately after the first well-defined, testable piece of code is developed.

Should manual QA be responsible for reading logs inside the app?

That would be a big benefit for QA, because the model “will not report the bugs” itself. The logs of the processes leading up to model generation should also be covered. Using a specialized monitoring tool will enhance the QA process. Above all, the precondition is a properly structured and easily understandable log format, to which QAs should always make suggestions and active contributions rather than leaving everything to the developers’ and product managers’ decisions.
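A short sketch of the kind of structured, easily parsed log format the answer argues QA should push for. The field names are illustrative only.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("training_pipeline")

def log_event(phase: str, **fields) -> None:
    # One JSON object per line keeps the logs both human- and machine-readable.
    log.info(json.dumps({"phase": phase, **fields}))

log_event("data_preparation", rows_in=10000, rows_out=9472, dropped_reason="missing values")
log_event("evaluation", metric="MAE", value=1.83, threshold=2.0, passed=True)
```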

Can manual QA be responsible for checking the output, if AT covers the same cases?

Developing smarter automation will result in automatic reports to manual QA whenever something is wrong.

Is manual testing best for exploration?

Manual and exploratory tests depend fully on the tester to navigate a testing path. Even though an experienced tester won’t proceed in completely free form, the test will only progress at the speed of human manual execution, and this is often hampered by interruptions. Manual testing typically lacks specificity and repeatability.

One key advantage when testing new features is that the human tester can put themselves in the shoes of different types of users. This, of course, differs from a script, which follows just one path. The most frustrating aspect of manual/exploratory testing is the extensive cost and energy it requires. As software expands and escalates in complexity, unscripted testing can seem like, well, nonsense. It is wide-ranging and variable, though rarely unintentional. On the other hand, if you ask senior automation engineers and QA professionals to justify such testing, you are likely to find that scripted versus exploratory testing is a perpetual debate.

What rules are going to be practiced with automation testing on NLG?

Although the nature of NLG projects means that manual checks are unavoidable, creating more automated tests for every process, especially for the validation part responsible for improving the quality of the model, results in fewer use cases for manual testing.
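One possible shape for such an automated check: compare the generated text against a previously approved reference with a similarity ratio, since exact string equality is usually too strict for NLG. The texts and the threshold are made up for illustration.

```python
from difflib import SequenceMatcher

REFERENCE = "Temperatures will rise to 24 degrees with light winds in the afternoon."

def test_generated_forecast_stays_close_to_reference():
    generated = "Temperatures will climb to 24 degrees with light afternoon winds."
    similarity = SequenceMatcher(None, REFERENCE, generated).ratio()
    # Flag for manual review only when the output drifts too far from the approved text.
    assert similarity >= 0.7
```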

Conclusion: How to perform QA in AI?

We hope that we have broken at least some of the barriers and taboos around QA in an AI project. The first requirement is that whoever performs QA is highly motivated and not afraid of this tiny monster. Here are our summarized tips:

  • train more than one model with different data, not hesitating to include wrong information as well, then make multiple comparisons and suggestions based on the metrics (see the sketch after this list);
  • learn to read the logs or any hidden output of the model and the code related to the project, while comparing the outcomes;
  • understand the functions inside the code and always try to break it;
  • if you feel overloaded with too much information about the project, be guided by black-box principles, but not at the model level;
  • always try to upscale your knowledge of the specifics of QA in AI.
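To make the first tip more tangible, here is a toy sketch that trains the same model type on a clean and a deliberately corrupted dataset and compares the metrics. The data is synthetic and the setup purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

datasets = {
    "clean": (X, y),
    "noisy_labels": (X, y + rng.normal(scale=2.0, size=200)),  # deliberately wrong information
}

for name, (features, target) in datasets.items():
    model = LinearRegression().fit(features, target)
    mae = mean_absolute_error(target, model.predict(features))
    print(f"{name}: MAE = {mae:.3f}")
```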

Are you in search of a new challenge? Do you want to be in a company where out-of-the-box thinking is encouraged? Check out our current job openings.

Kristijan Miloshevski

Jul 29, 2021

Category

Article

Technologies

Quality Assurance, Automation Testing, AI, Machine Learning
