Feb 26, 2026

Peter Busk

AI and GxP: How to validate machine learning models

Introduction

"Can we even use AI in a GxP environment?" That is a question we often encounter at Hyperbolic when we work with pharma and medicinal companies. The answer is yes, but it requires a fundamentally different approach to validation than traditional software.

Machine learning models are not deterministic like classic software. They learn from data, and their output can change over time. This creates unique challenges in regulated environments where validation, traceability, and reproducibility are legal requirements.

Why is ML validation different?

Traditional software is validated by verifying that it does exactly what it is coded to do. An algorithm that calculates a dose will always give the same output for the same input. But an ML model:

  • Learns patterns from training data

  • Can produce different results upon retraining

  • Has built-in uncertainty

  • Changes when data changes

This means that classic Software Development Life Cycle (SDLC) validation is not sufficient. We need to think in terms of "Model Lifecycle Management."

Regulatory landscape

FDA, EMA, and other authorities are still developing specific guidelines for AI/ML. But the existing rules still apply in full:

  • 21 CFR Part 11: Electronic records and signatures

  • EU GMP Annex 11: Computerized systems

  • GAMP 5: Good Automated Manufacturing Practice

At Hyperbolic, we operate on the principle: AI systems in GxP must meet the same requirements for quality, safety, and data integrity as all other software, plus additional requirements specifically for ML.

Validation framework for ML in GxP

Phase 1: Data Governance and Qualification

Everything starts with data. In GxP, it's not enough to have a lot of data; it needs to be qualified data.

Data lineage: Document exactly where data comes from. Which systems? Which processes? How was it collected?

Data quality: Validate that the data is:

  • Complete (no critical gaps)

  • Correct (validated against source systems)

  • Consistent (no conflicts or duplication)

  • Current (updated according to requirements)
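The four checks above can be automated at batch intake. The sketch below is a minimal, hand-rolled illustration (the article's recommended tool for this is Great Expectations); `qualify_batch` and the sample columns are hypothetical:

```python
import pandas as pd

def qualify_batch(df: pd.DataFrame, required_cols: list[str]) -> list[str]:
    """Return a list of data-quality findings; an empty list means the batch passes."""
    findings = []
    # Complete: required columns present, no critical gaps
    for col in required_cols:
        if col not in df.columns:
            findings.append(f"missing column: {col}")
        elif df[col].isna().any():
            findings.append(f"gaps in column: {col}")
    # Consistent: no duplicated records
    if df.duplicated().any():
        findings.append("duplicate records found")
    return findings

batch = pd.DataFrame({"batch_id": ["B1", "B2"], "assay": [99.1, None]})
print(qualify_batch(batch, ["batch_id", "assay"]))  # → ['gaps in column: assay']
```

In practice each finding would be logged to the quality system rather than printed, and "correct" and "current" require checks against the source systems themselves.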

Data splitting: Document how data is divided into training, validation, and test. This split must be reproducible and traceable.
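One way to make the split reproducible and traceable is to derive it deterministically from each record's identifier instead of from a random-number generator whose state can be lost. A sketch, assuming hashed record IDs; the `assign_split` helper and the 70/15/15 ratios are illustrative, not prescribed by any guideline:

```python
import hashlib

def assign_split(record_id: str, train: float = 0.70, val: float = 0.15) -> str:
    """Deterministically map a record ID to train/val/test via a hash.

    The split can be re-derived at any time from the IDs alone, which makes
    it fully reproducible and auditable."""
    bucket = int(hashlib.sha256(record_id.encode()).hexdigest(), 16) % 10_000 / 10_000
    if bucket < train:
        return "train"
    if bucket < train + val:
        return "val"
    return "test"
```

Because the assignment depends only on the ID, adding new records later never reshuffles existing ones into a different partition.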

In a project for a pharmaceutical company, we established complete data lineage from production systems through cleansing to training data. Each transformation was documented and validated.

Phase 2: Model Development and Documentation

Requirements Specification: What should the model be able to do? Define clearly:

  • Problem formulation

  • Acceptable accuracy levels

  • Performance requirements

  • Safety requirements

Model Selection: Document why this specific model type was chosen. We typically compare 3-5 different approaches and document the rationale for the choice.

Hyperparameter tuning: All tuning must be traceable. We log all experiments with MLflow or similar tools, so it is documented how we arrived at the final configuration.
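The core idea — every trial, its parameters, and its metrics land in an append-only record — can be shown without the full MLflow stack. This is a deliberately minimal JSON-lines stand-in for the tracking that MLflow provides; `log_experiment` is a hypothetical helper:

```python
import json
import time

def log_experiment(path: str, params: dict, metrics: dict) -> dict:
    """Append one tuning run to an append-only JSON-lines log.

    A stand-in for MLflow tracking: the run's hyperparameters and metrics
    are recorded so the path to the final configuration stays traceable."""
    entry = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

In a real setup the log also captures the code version and data hash for each run, and lives in a system with access control and audit trails.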

Phase 3: Validation and Testing

Here, GxP validation really differs from standard ML practices.

Testing on independent data: Test data must NEVER have been seen during training or tuning. In GxP, we often require a "locked" test set that is only opened when the model is completed.
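A "locked" test set can be enforced technically, not just procedurally: fingerprint the frozen data at lock time and record the hash in the validation file. A sketch; `lock_test_set` is an illustrative helper, not a standard API:

```python
import hashlib
import json

def lock_test_set(records: list[dict]) -> str:
    """Compute a fingerprint of the frozen test set.

    The hash is recorded in the validation documentation when the set is
    locked; any later modification of the data changes the hash and is
    therefore detectable at unlock time."""
    canonical = json.dumps(records, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()
```

Verifying the same hash just before final testing gives documented evidence that the test data was untouched during development.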

Performance qualification: Define acceptance criteria in advance. Example:

  • Minimum accuracy: 95%

  • Maximum false negative rate: 2%

  • Performance must be stable across different batches
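Predefined criteria like these translate directly into an automated gate that either qualifies the model or lists exactly which criteria failed. A sketch using the example thresholds above; `performance_qualified` is an illustrative name:

```python
def performance_qualified(metrics: dict) -> tuple[bool, list[str]]:
    """Check observed metrics against pre-defined acceptance criteria.

    Thresholds mirror the example criteria in the text: accuracy >= 95%,
    false negative rate <= 2%. Returns (passed, list of failures)."""
    failures = []
    if metrics["accuracy"] < 0.95:
        failures.append(f"accuracy {metrics['accuracy']:.3f} below 0.95")
    if metrics["false_negative_rate"] > 0.02:
        failures.append(f"false negative rate {metrics['false_negative_rate']:.3f} above 0.02")
    return (not failures, failures)
```

The point of writing the gate in code is that the criteria cannot quietly shift after results are seen: the thresholds are versioned with the validation plan.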

Edge case testing: Test the model on:

  • Outliers and extreme values

  • Missing data

  • Data outside the training distribution

  • Known failure modes
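Edge-case behavior is easiest to defend when the deployed wrapper refuses bad input rather than silently extrapolating, and when that refusal is itself tested. A hedged sketch — `predict`, its training range of 0–100, and the pass/fail rule are all hypothetical:

```python
import math

def predict(value):
    """Hypothetical model wrapper that rejects out-of-distribution input
    instead of extrapolating (training range assumed to be 0-100)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        raise ValueError("missing input")
    if not 0 <= value <= 100:
        raise ValueError("input outside training distribution")
    return "pass" if value >= 50 else "fail"

# Edge-case checks, run as part of the validation protocol: each known bad
# input must be rejected, never scored.
for bad in (None, float("nan"), -5, 1e6):
    try:
        predict(bad)
        raise AssertionError(f"{bad!r} was not rejected")
    except ValueError:
        pass
```

Encoding the rejections as executable checks means they rerun automatically on every model revision, not just during initial validation.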

Bias analysis: Document that the model does not have unacceptable bias. For a model screening clinical trial candidates, we tested performance across age, gender, and ethnicity to ensure there was no discrimination.

Phase 4: Deployment and Change Control

Versioning: Each model version must be uniquely identified. We version:

  • Model architecture

  • Training data (including exact split)

  • Hyperparameters

  • Dependencies (libraries and versions)
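The four versioned items above can be bundled into a single manifest whose content-derived hash becomes the unique version identifier. A sketch; `model_manifest` is an illustrative helper:

```python
import hashlib
import json

def model_manifest(architecture: str, data_hash: str,
                   hyperparams: dict, dependencies: dict) -> dict:
    """Bundle everything that defines a model version into one record and
    derive a unique version ID from its contents: change any ingredient
    and the ID changes with it."""
    manifest = {
        "architecture": architecture,
        "data_hash": data_hash,       # fingerprint of training data incl. split
        "hyperparams": hyperparams,
        "dependencies": dependencies,  # library names pinned to exact versions
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["version_id"] = hashlib.sha256(payload).hexdigest()[:12]
    return manifest
```

Tools like DVC provide this kind of content-addressed versioning out of the box; the sketch only shows the principle.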

Change control: Any change must go through formal change control. Even minor adjustments require:

  • Impact assessment

  • Testing

  • Approval

  • Documentation

Rollback plan: What do we do if the model fails in production? There should always be a plan to roll back to the previous version or to a manual process.

Phase 5: Continuous Monitoring

ML models are not "set and forget." In GxP, we require continuous monitoring.

Performance monitoring: Track continuously:

  • Prediction accuracy

  • Distribution of input data (data drift)

  • Distribution of outputs

  • Response times
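Data drift can be surfaced with even very simple statistics: compare incoming feature distributions against the training-time reference. The sketch below uses a standardized mean shift — a deliberately crude stand-in for the richer drift tests in tools like Evidently AI; `drift_score` and its alert threshold are illustrative:

```python
import statistics

def drift_score(reference: list[float], current: list[float]) -> float:
    """Standardized shift of the current batch mean vs. the training
    reference: how many reference standard deviations the incoming data
    has moved. Large values suggest data drift worth investigating."""
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)
    return abs(statistics.mean(current) - mu) / sigma

reference = [10.0, 10.2, 9.8, 10.1, 9.9]   # feature values at training time
assert drift_score(reference, reference) < 0.5  # no drift against itself
```

A production setup would track this per feature, alert above an agreed threshold, and feed the alerts into the periodic review described below.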

Periodic review: Quarterly or semi-annual reviews where we verify that the model still performs as expected.

Retraining and revalidation: When should the model be retrained? Define clear criteria:

  • Performance falls below threshold

  • Significant data drift detected

  • New regulatory requirements

  • Changes in underlying processes

Practical challenges and solutions

Challenge: Explainability

Regulators often want to know "why" the model makes a decision. Deep learning models are notoriously difficult to explain.

Our approach:

  • Prefer explainable models where possible (decision trees, linear models)

  • For complex models: Implement SHAP or LIME to explain individual predictions

  • Document model behavior thoroughly through sensitivity analysis
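The core idea behind per-prediction explanation can be illustrated without the SHAP library itself: measure how the output changes when one feature at a time is reset to a baseline. This is only a crude sketch of the concept — SHAP computes principled, game-theoretic attributions — and the `attribution` helper and linear "model" are hypothetical:

```python
def attribution(model, x: dict, baseline: dict) -> dict:
    """Crude per-feature attribution: the change in model output when one
    feature is replaced by its baseline value. Conveys the idea behind
    explaining a single prediction; SHAP does this rigorously."""
    base_out = model(x)
    contrib = {}
    for feat in x:
        perturbed = {**x, feat: baseline[feat]}
        contrib[feat] = base_out - model(perturbed)
    return contrib

# A toy linear "model" so the attributions are easy to verify by hand
model = lambda r: 2 * r["dose"] + 0.5 * r["age"]
print(attribution(model, {"dose": 10, "age": 40}, {"dose": 0, "age": 0}))
```

For the linear model the attributions simply recover each term's contribution, which is exactly why we prefer inherently explainable models where possible.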

Challenge: Reproducibility

Being able to reproduce the exact same model is critical in GxP, but ML often involves randomness.

Our approach:

  • Set ALL random seeds and document them

  • Version control of everything (code, data, config)

  • Containerization (Docker) of the entire environment

  • Automated pipelines that ensure identical processes
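The seed-pinning step typically lives in one function called at the top of every training run, with the seed value itself recorded in the validation file. A sketch; `set_all_seeds` is an illustrative name, and the numpy/torch lines only apply if those libraries are in the environment:

```python
import os
import random

def set_all_seeds(seed: int = 42) -> None:
    """Pin every source of randomness used in training; the seed value is
    logged alongside the run so it can be reproduced later."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass
```

Note that seeds alone do not guarantee bit-identical results across hardware or library versions — which is why the containerized environment and pinned dependencies matter just as much.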

Challenge: Audit trails

GxP requires a complete audit trail of all changes and decisions.

Our approach:

  • Automatic logging of all model interactions

  • Integration with electronic QMS systems

  • 21 CFR Part 11 compliant signatures on critical decisions
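Automatic logging of model interactions is often implemented as a thin wrapper around the critical operations. A minimal sketch, assuming an in-memory log stands in for the QMS integration; `audited`, `release_batch`, and the captured fields are all illustrative:

```python
import functools
import json
import os
import time

AUDIT_LOG = []  # stand-in: in production an append-only, access-controlled store

def audited(fn):
    """Record who performed which action, with what inputs and outcome —
    a minimal sketch of the automatic audit logging described above."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        AUDIT_LOG.append({
            "timestamp": time.time(),
            "user": os.getenv("USER", "system"),
            "action": fn.__name__,
            "inputs": json.dumps([args, kwargs], default=str),
            "result": json.dumps(result, default=str),
        })
        return result
    return wrapper

@audited
def release_batch(batch_id: str) -> str:
    return f"{batch_id}: released"
```

A Part 11-compliant system additionally needs tamper evidence, secure timestamps, and electronic signatures on the critical decisions — the decorator only shows where the capture hooks in.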

Case: Validation of quality control model

We developed an AI model for automatic inspection of pharmaceutical tablets. This was a GAMP Category 5 system (custom application) with direct GxP impact.

Our approach:

  1. 6 months of data collection and qualification from the production line

  2. Selection of CNN architecture after comparing it with 4 alternative approaches (documented)

  3. Locked test set with 10,000 tablets manually verified by 3 independent inspectors

  4. Performance requirements: Min 99% accuracy, max 0.1% false negatives (defective tablets marked as OK)

  5. Complete validation documentation: IQ/OQ/PQ of 300+ pages

  6. Continuous monitoring with weekly performance reviews

Result: Model approved by QA, implemented in production, and has operated stably for over 18 months with consistent >99.5% accuracy.

Tools and best practices

MLOps for GxP:

  • MLflow for experiment tracking (with audit logging)

  • DVC for data and model versioning

  • Great Expectations for data validation

  • Evidently AI for monitoring data drift

  • SHAP/LIME for model explainability

Documentation templates: We have developed GxP-ready templates for:

  • ML Model Requirements Specification

  • ML Model Design Document

  • Validation Plan and Report

  • Change Control procedures for ML

Conclusion

AI and ML can absolutely be used in GxP environments, but it requires discipline, thorough documentation, and a structured approach to validation. It is not enough to have a model that "works"; it must be validated, reproducible, and continuously monitored.

At Hyperbolic, we combine a deep understanding of both AI/ML technology and GxP requirements. We help pharma companies navigate this complex landscape and implement AI solutions that deliver value while meeting regulatory requirements.

Contact us for a consultation on validating AI in your GxP environment.

By

Peter Busk

CEO & Partner