The MHRA AI Airlock: A Regulatory Sandbox for AI as a Medical Device
Published 1 October 2025
As with any sector, the use of AI has the potential to revolutionise ways of working within the healthcare and pharmaceutical industries. With increasing pressure on healthcare services in the UK and a productivity bottleneck within the pharmaceutical industry, integrating AI for medical purposes offers an opportunity to address clinical needs and streamline operations. In particular, AI tools in healthcare can improve patient outcomes through applications such as predictive algorithms for early diagnosis, and can ease the burden on healthcare professionals by assisting clinical decision-making and automating routine tasks.
Artificial Intelligence as a Medical Device (AIaMD) is AI within software applied for medical purposes; it falls under the broader Software as a Medical Device (SaMD) umbrella. In the UK, AIaMD and SaMD are regulated according to their intended use and risk level under the Medical Devices Regulations 2002 (UK MDR), supplemented by guidance from the Medicines and Healthcare products Regulatory Agency (MHRA). Due to the novel nature of some SaMD products, the global regulatory landscape is still evolving to catch up with scientific innovation. This can leave regulatory gaps and grey areas in the guidance, which may affect developers' decisions to pursue AIaMD development, in the UK or elsewhere.
The MHRA is proactively tackling this regulatory gap by working on a package of reforms to the UK MDR. Central to this effort is the AI Airlock, a pioneering sandbox programme introduced by the MHRA to identify and address specific regulatory limitations in the current guidance and legislation. Establishing the current challenges of AIaMD through the AI Airlock programme will hopefully guide future regulatory reforms.
Investment in updating regulatory guidance to encompass innovative products such as AIaMD may position the UK as an attractive option for developers and ultimately enable improved patient care.
The Airlock programme was launched in April 2024 with a pilot cohort of four candidates across various healthcare disciplines and stages of development. The products selected for the pilot cohort were those likely to challenge the current regulatory framework for SaMD and AIaMD in the UK. The programme works in conjunction with an extensive stakeholder network, bringing industry, academia, and regulators together to identify and tackle the challenges surfaced through the Airlock.
Whilst the Airlock programme is not a route to market and products will not be formally tested, it offers a ‘sandbox’ environment for eligible innovative products. This enables manufacturers to generate evidence towards UKCA or CE marking in a safe space, away from the market.
Each candidate selected for the pilot cohort had a product that addressed a specific problem area in the existing regulatory framework. For each candidate, the challenges were first clarified and a testing design was devised using the resources and network made available by the Airlock programme. In the first phase, known as the Simulation Airlock, different perspectives were gathered on the challenges in each case study. In the subsequent Research (Virtual) Airlock phase, the candidate teams tested their products in virtual environments, supported by layers of discussion with expert groups and regulatory analysis.
Philips Medical Systems' Picture Archiving and Communication System (PACS) Radiology AutoImpression is driving innovation in radiology by exploring the use of synthetic data to validate AI-generated reports.
Large Language Models (LLMs) are a form of Generative AI that can create new content, or 'synthetic data'. Training machine learning models requires a large quantity of high-quality data; in practice, however, a complete clinical dataset may not be available, or data from a specific demographic may be lacking. AI-generated synthetic data represents a potential solution to this issue, but it requires thorough validation.
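As a concrete illustration, synthetic text generation of this kind can be as simple as repeatedly prompting a model. The sketch below is a minimal example under assumptions: `call_llm` is a hypothetical placeholder for an arbitrary completion API, and the prompt wording is invented, not the approach used in the case study.

```python
# Illustrative sketch only: 'call_llm' is a hypothetical placeholder for any
# LLM completion API, not a specific vendor call.
from typing import Callable

def generate_synthetic_summaries(call_llm: Callable[[str], str],
                                 demographic: str, n: int) -> list[str]:
    """Prompt an LLM for fictional report summaries covering a group that is
    under-represented in the real dataset."""
    prompt = (
        "Write one plausible but entirely fictional radiology report summary "
        f"for a patient in the following group: {demographic}. "
        "Do not reproduce any real patient data."
    )
    return [call_llm(prompt) for _ in range(n)]
```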
This case study addresses issues around defining the quality of AI-generated synthetic data and using AI itself for validation. For example, an LLM was used to evaluate the quality of text-based synthetic data against real text-based data (in this case, radiology report summaries). The LLM was found to show a preferential bias towards the synthetic data over the human-written data. This illustrates the challenge of validating synthetic data, for which there is currently no specific regulatory guidance.
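To make this kind of validation concrete, the sketch below shows one generic way an LLM-as-judge comparison might be run, swapping the presentation order to separate a genuine preference from positional bias. The `call_llm` callable and the prompt wording are hypothetical, not the method used in the case study.

```python
# Illustrative sketch only: 'call_llm' stands in for any LLM completion API;
# neither the prompt nor the function reflects the case study's actual method.
from typing import Callable

JUDGE_PROMPT = (
    "You are reviewing two radiology report summaries for the same study.\n"
    "Summary A:\n{a}\n\nSummary B:\n{b}\n\n"
    "Answer with a single letter, A or B: which summary is higher quality?"
)

def judged_preference(call_llm: Callable[[str], str],
                      human_text: str, synthetic_text: str) -> dict:
    """Ask the judge twice with the presentation order swapped, so that a
    consistent preference can be separated from positional bias."""
    first = call_llm(JUDGE_PROMPT.format(a=human_text, b=synthetic_text)).strip()
    second = call_llm(JUDGE_PROMPT.format(a=synthetic_text, b=human_text)).strip()
    prefers_synthetic = first.startswith("B") and second.startswith("A")
    prefers_human = first.startswith("A") and second.startswith("B")
    return {
        "prefers_synthetic": prefers_synthetic,
        "prefers_human": prefers_human,
        # Inconsistent answers across the two orderings suggest the judge is
        # reacting to position rather than content.
        "order_sensitive": not (prefers_synthetic or prefers_human),
    }
```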
After analysing the pilot cohort results, the Airlock team identified synthetic data generation and validation as a high-level risk, concluding that no current UK medical device regulation addresses these issues. Experts highlighted that text-based synthetic data in particular can produce variable outcomes and therefore adds complexity to the regulatory process. The Airlock team reported that the radiologists consulted largely agreed with the AI judgement, but were able to add more nuanced insight than the LLM could provide. Participants also questioned the objectivity of the LLM's judgement when assessing the reports. The Airlock has suggested that this topic be explored further.
Automedica designed its LLM-powered AI agent, SmartGuideline, for use within clinical workflows, such as summarising clinical evidence to support clinicians in decision-making. The MHRA included this product in the Airlock to explore AI-specific errors such as hallucinations, which occur when an LLM produces factually incorrect or misleading outputs that nonetheless seem plausible. Another potential AI-derived safety concern is the non-deterministic nature of output generation: the same input can produce different outputs for a given question.
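The toy sketch below illustrates why this happens: most LLMs sample from a probability distribution over candidate outputs, so repeated runs on an identical input can diverge. No real model is involved; the candidate outputs and their scores are invented for illustration.

```python
# Toy illustration of non-deterministic generation: sampling from the same
# distribution over candidate outputs (the same 'input') can yield different
# results on each run. No real model is involved; scores are invented.
import math
import random

def sample_output(logits: dict[str, float], temperature: float,
                  rng: random.Random) -> str:
    """Temperature-scaled softmax sampling over candidate outputs."""
    scaled = {tok: lg / temperature for tok, lg in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    r, cum = rng.random(), 0.0
    for tok, v in scaled.items():
        cum += math.exp(v) / z
        if r < cum:
            return tok
    return tok  # guard against floating-point rounding

candidates = {"no acute findings": 2.0, "possible nodule": 1.5,
              "recommend repeat scan": 1.0}
rng = random.Random()  # unseeded, so repeated runs differ
for _ in range(3):
    print(sample_output(candidates, temperature=1.0, rng=rng))
# As temperature approaches 0 the highest-scoring output dominates and the
# behaviour becomes effectively deterministic.
```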
These issues can arise, for example, when the model training data is incomplete or not representative of the data the product later encounters. In healthcare applications, such errors may compromise the software's risk-to-benefit ratio as defined under the UK MDR, so developers must actively mitigate them. In this instance, the Airlock programme helped the candidate implement safety by design through Retrieval-Augmented Generation (RAG) on curated knowledge graphs. RAG allows the LLM to query an external, curated knowledge base and ground its output in retrieved sources, thereby mitigating hallucinations.
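As a rough illustration of the general RAG pattern (not Automedica's implementation, which used curated knowledge graphs rather than the naive keyword lookup shown here), a minimal retrieval loop fetches relevant passages and instructs the model to answer only from them:

```python
# Minimal RAG sketch with hypothetical names throughout; a generic
# illustration of the pattern, not the SmartGuideline implementation
# (which used curated knowledge graphs rather than keyword lookup).
from typing import Callable

KNOWLEDGE_BASE = {
    "hypertension-2024": "Guideline text on first-line hypertension treatment...",
    "diabetes-2023": "Guideline text on HbA1c monitoring intervals...",
}

def retrieve(query: str, top_k: int = 2) -> list[tuple[str, str]]:
    """Naive keyword-overlap retrieval; real systems would use embeddings
    or a knowledge-graph query."""
    scored = [(sum(w in text.lower() for w in query.lower().split()), doc_id, text)
              for doc_id, text in KNOWLEDGE_BASE.items()]
    scored.sort(reverse=True)
    return [(doc_id, text) for _, doc_id, text in scored[:top_k]]

def answer_with_rag(call_llm: Callable[[str], str], question: str) -> str:
    """Ground the model's answer in retrieved sources and force citations,
    so unsupported (hallucinated) claims are easier to detect."""
    context = "\n\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(question))
    prompt = (
        "Answer using ONLY the sources below. Cite the source id for every "
        "claim, and reply 'not found in sources' if the answer is absent.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)
```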
As an outcome of this case study, the Airlock's regulatory analysis classed risk management as a moderate-level gap in UK regulation (as opposed to the high-level gap identified for synthetic data above): UK regulators already address risk management, but have not yet developed AI-specific guidance. Using RAG to implement safety by design had a positive outcome in this case study; the SmartGuideline team observed a reduction in hallucination errors, from 23 at baseline to 0, when RAG was used during evaluation. Participants noted, however, that this approach can introduce new risks, such as omitting key data.
UMA Advisory's product Oncoflow uses a combination of AI models to aid decision-making by healthcare professionals and reduce waiting times for cancer appointments, enabling earlier treatment and thereby significantly improving chances of survival. Within the Airlock programme, Oncoflow aims to address the challenge of explainability and to balance it with clinical utility. Generative AI outputs can have poor explainability: it is not always clear which data was used to produce an output, and the generation process cannot always be traced easily. As with any medicine or device, the explainability and traceability of data and clinical decisions must be readily available for regulatory acceptance, to ensure patient safety.
This case study focused on the regulatory challenges of demonstrating explainability to users such as healthcare professionals. One outcome of the Airlock programme was the recommendation that regulatory guidance should reflect how AI explainability requirements vary between applications. The programme classified explainability as a moderate gap within current regulation: although themes of transparency and traceability are covered, current UK regulation is not AI-specific.
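One practical ingredient of traceability is an audit trail recording which model version, inputs, and sources produced each output. The sketch below shows a hypothetical record schema of this kind; the field names and design are illustrative assumptions, not a regulatory requirement or the Oncoflow approach.

```python
# Hypothetical audit-trail sketch: every AI output is stored alongside the
# model version, a digest of the input, and the sources that informed it,
# so a clinical recommendation can be traced after the fact. The schema is
# an illustrative assumption, not a regulatory requirement.
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class OutputRecord:
    model_version: str
    input_digest: str        # hash rather than raw data, for confidentiality
    source_ids: list[str]    # which knowledge items informed the output
    output_text: str
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def record_output(model_version: str, input_text: str,
                  source_ids: list[str], output_text: str) -> str:
    """Serialise one traceability record; in practice this would be appended
    to a tamper-evident log."""
    rec = OutputRecord(
        model_version=model_version,
        input_digest=hashlib.sha256(input_text.encode()).hexdigest(),
        source_ids=source_ids,
        output_text=output_text,
    )
    return json.dumps(asdict(rec))
```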
FAMOS, a Federated AI Monitoring System created by Newton's Tree, monitors AIaMD healthcare products. The team applied FAMOS to a radiology AIaMD to address challenges around real-time AI model drift and data-quality monitoring. Under UK regulation, manufacturers must conduct post-market surveillance for all drugs and devices to mitigate safety risks; this also applies to AIaMD and SaMD. This case study explored the impact of a real-time monitoring system on product safety, focusing on identifying variations in AI performance, data quality, and human-AI relationships. For example, over-reliance on AI was identified, influenced by factors such as clinician fatigue and training. Identifying these trends allows intervention as a proactive safety measure.
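As one example of what automated drift monitoring can look like (a generic sketch under assumptions, not the FAMOS implementation), the Population Stability Index compares the distribution of a model score between a reference window and the live deployment window:

```python
# Generic drift-monitoring sketch (not the FAMOS implementation): the
# Population Stability Index (PSI) compares the distribution of a model
# score between a reference window and the live deployment window.
import math

def psi(reference: list[float], live: list[float], bins: int = 10) -> float:
    """PSI above roughly 0.2 is a common rule of thumb for notable drift."""
    lo = min(min(reference), min(live))
    hi = max(max(reference), max(live))
    width = (hi - lo) / bins or 1.0  # guard against identical values

    def hist(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        return [max(c / len(xs), 1e-6) for c in counts]  # avoid log(0)

    ref_p, live_p = hist(reference), hist(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_p, live_p))

# Invented example: live scores have shifted upward relative to the
# reference window, which should trigger investigation.
reference_scores = [0.20, 0.30, 0.25, 0.40, 0.35, 0.30, 0.28, 0.33]
live_scores = [0.50, 0.60, 0.55, 0.70, 0.65, 0.60, 0.58, 0.63]
if psi(reference_scores, live_scores) > 0.2:
    print("Drift alert: review model performance and input data quality")
```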
The Medical Devices (Post-market Surveillance Requirements) (Amendment) (Great Britain) Regulations 2024 came into force in June 2025, helping to address the gaps identified within this case study. The Airlock programme nonetheless classed this as a moderate gap, since work remains to define AI-specific guidance.
The MHRA has released initial results and reflections on the AI Airlock programme, including the high-level regulatory gap analysis summarised above. A more detailed regulatory gap analysis, detailed simulation reports, and an independent reviewer’s programme evaluation report are due to become available in the coming months.
The programme will now continue with a new cohort, for which applications closed in July 2025. This second phase of the AI Airlock will run until April 2026 and will focus on risk classification, change management planning, and bias and fairness metrics. The AI Airlock continues to aim to generate updated and additional guidance to support regulators and developers in the AI-healthcare field.
For an AIaMD developer, participation in the AI Airlock programme is a rare opportunity to contribute to shaping UK regulation for these innovative products, which will ultimately improve patient outcomes. Whilst the programme does not offer direct regulatory evaluation of the AIaMD, it offers a unique approach to evidence generation, with input from subject matter experts and connections to key industry stakeholders. Even though the programme is not a route to market, cross-sectoral collaboration and better regulatory clarity will contribute to meaningful progress in developing these products.
Overall, the AI Airlock programme demonstrates the MHRA's willingness to transform the UK regulatory landscape for medical devices, and highlights its welcoming attitude towards innovative AI products. For developers looking to build and launch an AIaMD product, the UK market may therefore be very appealing.
DLRC looks forward to the release of the final outcomes of the pilot programme and is eager to hear about the regulatory challenges presented in the next phase. In the meantime, challenging AIaMD projects may benefit from the wealth of regulatory support offered at DLRC. DLRC has extensive experience of interacting with global health authorities, including the MHRA, and with leading notified bodies. For innovative and borderline medical device products, the support and expertise of a reputable regulatory consultancy early in your development process can be extremely valuable to the progression of your project. Contact us at hello@dlrcgroup.com to speak to our experts today.