Regulatory frameworks are often unclear when it comes to software as a medical device, and can be even less clear when it comes to artificial intelligence. At Aidence, we work hard to meet regulatory requirements and do our best to interpret rules that were often not written primarily with software in mind. When doing so, the most important consideration for us is: how do these requirements help us improve the safety and performance of our device?
In this article, we look at the current software regulatory framework, the Medical Device Directive (MDD), analyse important changes introduced by the new Medical Device Regulation (MDR 2017/745) that will impact software as a medical device, and evaluate their applicability to AI. It should come as no surprise that we zoom in on the changes regarding the classification of medical devices; they will, beyond doubt, have the biggest impact on software manufacturers in the EU.
A brief history of regulations
When the MDD (93/42/EEC) was first released in 1993, it only mentioned ‘software’ twice, and merely considered software that was embedded in hardware devices. It wasn’t until the 2007 update (2007/47/EC or M5 of the MDD) that software as a stand-alone medical device received special attention (‘software’ being mentioned seven times), and software validation became an explicit requirement. The release of the M5 update (coming into force in 2010) had a strong impact on software manufacturers.
As a regulatory specialist with over 10 years of experience, I find it hard to grasp that regulations specifically aimed at software only really developed after I started my career in regulatory affairs. For me, the level of evidence needed to demonstrate safety and performance should pertain to any medical device. Part of the late development of the software framework can be attributed to the historically limited role of software in healthcare, and part to regulatory agencies responding slowly to technological developments.
Although the M5 update addressed clear gaps around software development, it did not cover all of the field’s needs. For example, the classification rules were not specifically amended to consider software as a stand-alone medical device. Nor were there any considerations with regard to cybersecurity or software interoperability (topics for upcoming articles). With the vast expansion of software products and new technologies such as AI, there was clear room for improvement.
What the MDR brings
First, it is worth mentioning that the MDR should have been applicable to all medical device manufacturers as of the 26th of May 2020 at the latest; however, due to the COVID-19 pandemic, the date of application has been postponed by one year, to the 26th of May 2021. Although the MDR brings an enormous number of changes (and mentions ‘software’ 48 times in the base text), I will focus solely on the (broader) impact on software.
A new definition of ‘medical device’
The first major change is to the definition of a ‘medical device’ itself.
This change broadens the scope of medical device legislation. For software, it affects products that are explicitly intended to prevent or monitor disease without having a diagnostic or therapeutic purpose. For example, if a device claims it helps you stay healthy, it might be perceived to claim prevention of disease, and must then be regulated as a medical device. I can easily think of a number of smartphone apps that will be regulated as medical devices under the MDR. Also new (per Annex II, section 1.1(e)): all manufacturers must demonstrate how their device qualifies as a medical device.
An overhaul of software classification
The second major change is the introduction of Rule 11 in Annex VIII. It starts by mentioning:
“Software intended to provide information which is used to take decisions with diagnosis or therapeutic purposes is classified as class IIa.”
It is interesting that any software providing any sort of information that may be used to take decisions of a diagnostic or therapeutic nature is automatically classified as class IIa. In my view, the main purpose of software is nearly always to provide information (whether for interpretation by humans or machines). Most of the information generated by software will influence decisions made with regard to diagnosis or therapy (even if just used for monitoring), and therefore, very little software will remain in class I.
The rule continues:
“except if such decisions have an impact that may cause:
— death or an irreversible deterioration of a person’s state of health, in which case it is in class III; or
— a serious deterioration of a person’s state of health or a surgical intervention, in which case it is classified as class IIb.”
This first part of Rule 11 ignores the role of the software (and the information it provides) in decision-making: if the decision made has severe consequences, the device becomes class IIb or III, regardless of how much the software actually contributed to it. What this means is that software that supports a physician by automatically selecting the borders of an area of interest in a medical image, to assess the size of an abnormality that could be cancerous, becomes class IIb at minimum. While the legislation, even more than the MDD 93/42/EEC, focuses on risk management, this rule fails to assess the risk the software itself introduces into decision-making. In practice, software may merely provide informative text or support the physician, or it may provide a direct diagnosis and even act fully autonomously by making follow-up decisions.
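To make this concrete, here is a minimal sketch, in Python, of the decision logic that this first part of Rule 11 describes. The function and enumeration names are my own illustration, not part of the regulation, and the sketch deliberately omits the rule’s other provisions (e.g. on software monitoring physiological processes). Note that the software’s actual role in the decision never appears as an input:

```python
from enum import Enum

class DecisionImpact(Enum):
    """Worst-case impact of the decision the software informs (Rule 11 wording)."""
    DEATH_OR_IRREVERSIBLE = "death or irreversible deterioration of health"
    SERIOUS_OR_SURGICAL = "serious deterioration of health or surgical intervention"
    OTHER = "any other impact"

def rule_11_class(provides_decision_information: bool, impact: DecisionImpact) -> str:
    """Classify software under a simplified reading of MDR Annex VIII, Rule 11."""
    if not provides_decision_information:
        return "I"  # simplification: ignores the rule's remaining provisions
    if impact is DecisionImpact.DEATH_OR_IRREVERSIBLE:
        return "III"
    if impact is DecisionImpact.SERIOUS_OR_SURGICAL:
        return "IIb"
    return "IIa"

# Software outlining a possibly cancerous abnormality to measure its size:
print(rule_11_class(True, DecisionImpact.SERIOUS_OR_SURGICAL))  # -> "IIb"
```

Nowhere in this logic does it matter whether the software merely drew a contour or made the diagnosis itself, which is exactly the problem.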
The MDCG guidance
Luckily, the Medical Device Coordination Group (MDCG) has brought some sense into the rule by releasing the MDCG 2019-11 guidance, which links the classification to the classification framework proposed by the International Medical Device Regulators Forum (IMDRF).
The IMDRF, in contrast to the MDR, differentiates between degrees of significance of the information provided. This is highly welcome, although it conflicts with the actual wording of the MDR 2017/745: whereas the MDR assesses impact on the basis of the disease outcome of the decisions made, the IMDRF assesses the importance of the information provided. The IMDRF guidance unfortunately also leaves room for interpretation (especially in section 5.2, around the state of the healthcare situation or condition), but it does allow manufacturers and Notified Bodies to apply sense to Rule 11.
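For illustration, the IMDRF categorisation can be sketched as a simple lookup over two dimensions: the state of the healthcare situation and the significance of the information provided. The Python encoding below is my own shorthand; the categories follow my reading of the IMDRF framework, and the eventual MDR class still has to be derived via the mapping in MDCG 2019-11.

```python
# IMDRF SaMD risk categories (I = lowest risk, IV = highest), indexed by
# (state of healthcare situation or condition, significance of information).
# Key names are my own shorthand for the IMDRF terms.
IMDRF_CATEGORY = {
    ("critical",    "treat_or_diagnose"): "IV",
    ("critical",    "drive_management"):  "III",
    ("critical",    "inform_management"): "II",
    ("serious",     "treat_or_diagnose"): "III",
    ("serious",     "drive_management"):  "II",
    ("serious",     "inform_management"): "I",
    ("non_serious", "treat_or_diagnose"): "II",
    ("non_serious", "drive_management"):  "I",
    ("non_serious", "inform_management"): "I",
}

# Software that merely informs clinical management of a serious condition:
print(IMDRF_CATEGORY[("serious", "inform_management")])  # -> "I"
```

The contrast with Rule 11 is immediately visible: the same ‘informing’ software that the IMDRF places in its lowest category is, on a literal reading of the MDR, at least class IIa.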
In conclusion, the rule will affect all manufacturers: classification rationales will require updating against the new rule, and classifications will likely change (generally upwards).
How does this relate to AI devices?
AI software, today mainly used to detect (Computer-Aided Detection, or CADe) and diagnose disease (Computer-Aided Diagnosis, or CADx), will no longer be accepted as class I medical devices. Many AI products are classified today as class I devices, without any active review by a regulatory authority; under the MDR, these will require review by Notified Bodies. This matters for AI devices because they are provided with a specified level of performance accuracy. That accuracy must be substantiated by an appropriate level of clinical data, reviewed by clinical experts at Notified Bodies. Class I devices have not undergone such clinical expert reviews; physicians in the field should be aware of this, and always ask the manufacturer of a class I device for its clinical evidence.
As a side note, in 2019 the EU published a second corrigendum to the MDR, extending the MDR compliance deadline for class I devices with a measuring function until the 26th of May 2024. I personally believe this is an unfortunate decision, because it creates an unlevel playing field for the next four years between MDR-compliant (class IIb) AI devices and existing AI software still marketed as class I.
Strengthened clinical evaluation requirements
Further impact of the classification rule changes relates to clinical evaluations. For current class I manufacturers whose classification is going up, this means their clinical evaluation will require Notified Body review. Whereas previously Notified Bodies (more or less) mandated MEDDEV 2.7/1 revision 4, many of MEDDEV’s requirements are now explicit in the MDR. I expect class I device manufacturers to have to make quite an effort here. In addition, the MDCG recently published MDCG 2020-1 on the clinical evaluation of medical device software. What becomes clear is that manufacturers should consider the need to conduct clinical investigations or clinical performance studies to demonstrate device performance.
Other changes include strengthened Post-Market Surveillance and Post-Market Clinical Follow-up (PMCF) requirements, and the introduction of the Periodic Safety Update Report (PSUR), which must be actively submitted to the regulatory authorities for class IIa devices and higher. Clinical evaluations must address the need for PMCF, and an explicit PMCF Plan becomes mandatory.
Clinical evaluation and AI
As AI devices (machine learning, and specifically deep learning) are trained and validated on datasets created and owned by their manufacturers, it is impossible to adequately compare device performance to that of existing devices on the market. Devices with a similar intended use and technology might perform differently simply because their training data differ. Therefore, AI devices cannot be released on the sole basis of equivalence; they require their own validation data, gathered through a clinical performance study (of either prospective or retrospective data). The data in the clinical performance study must be sufficiently representative to demonstrate device safety and performance for the intended use and intended use population.
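To give an idea of what demonstrating performance on a manufacturer’s own validation data looks like in practice, below is a small sketch of reporting sensitivity and specificity with 95% Wilson confidence intervals from a hypothetical retrospective validation set; the counts and the helper function are purely illustrative, not from any study.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    centre = (p + z * z / (2 * n)) / (1 + z * z / n)
    half = (z / (1 + z * z / n)) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

# Hypothetical confusion-matrix counts from a retrospective validation set.
tp, fn, tn, fp = 182, 18, 540, 60

sens, sens_ci = tp / (tp + fn), wilson_interval(tp, tp + fn)
spec, spec_ci = tn / (tn + fp), wilson_interval(tn, tn + fp)

print(f"Sensitivity: {sens:.2f} (95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f})")
print(f"Specificity: {spec:.2f} (95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f})")
```

Whether such figures are acceptable then depends on how representative the study population is of the intended use and intended use population.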
Clinical evaluation and continuous learning AI
We are currently not aware of any continuous learning systems on the European market. A continuous learning system improves as it is fed additional information from clinical practice, e.g. physicians correcting or confirming the AI system’s output.
The MDR requires manufacturers to demonstrate performance and to disclose the accuracy of the device to the user in the Instructions for Use (IFU). Moreover, it requires manufacturers to report changes in performance to the Notified Body. With these measures, the MDR clearly obstructs the introduction of continuous learning systems onto the market. Additional guidance from the MDCG, or through harmonised standards, would be needed to clarify how these AI systems can be brought to market. We see an important role for the PSUR and for actively reporting performance to the Notified Body without a full review. Active performance monitoring through appropriate quality control must also be in place, to ensure that device performance does not decrease because incorrect data is fed into the device, and to monitor for bias being introduced. This will further require strong PMCF plans and continuous updating of the clinical evaluation and risk assessment.
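As a sketch of what such active performance monitoring could look like (under my own assumptions; the baseline, window size and tolerance below are illustrative, not taken from any guidance), a rolling check of physician agreement with the device’s output could feed the PSUR and trigger escalation per the PMCF plan:

```python
from collections import deque

class AgreementMonitor:
    """Rolling monitor of physician agreement with the device's output.

    An alert here would trigger investigation and, where relevant,
    reporting of a performance change to the Notified Body.
    """

    def __init__(self, baseline: float = 0.90, window: int = 500, margin: float = 0.05):
        self.baseline = baseline            # agreement rate from the validation study
        self.margin = margin                # tolerated drop before alerting
        self.window = deque(maxlen=window)  # most recent confirm/correct events

    def record(self, physician_agreed: bool) -> None:
        """Log one case: True if the physician confirmed the output."""
        self.window.append(physician_agreed)

    def check(self) -> bool:
        """Return True if performance has drifted below the tolerated band."""
        if len(self.window) < self.window.maxlen:
            return False  # not enough post-market data yet
        rate = sum(self.window) / len(self.window)
        return rate < self.baseline - self.margin

monitor = AgreementMonitor()
# Call monitor.record(...) for each confirmed or corrected case in clinical use;
# if monitor.check() returns True, escalate per the PMCF plan.
```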
The FDA has started documenting a proposal for such a framework; in my opinion, the EU and MDCG, as well as ISO and the IMDRF, should take note of it, with a view to creating a globally harmonised framework for such devices.
Exciting regulatory times
At Aidence, we have made significant efforts to update our systems, e.g. our quality system and our clinical evaluation, post-market surveillance and risk management processes; additionally, we have updated our full set of technical documentation against the requirements of Annex II. Luckily, we have always considered our type of device to be class IIa under the MDD, which reduces the effort on our side to bring our documentation up to class IIb.
These are exciting regulatory times, especially for new technologies like ours, and we look forward to working with regulators and organisations to help shape the future regulatory frameworks for AI. As such, we are participating in ISO’s SC42 committee to develop standards around AI, and we are involved with regulatory agencies in the Netherlands, the UK, and the US to help pave the way forward.