Clinical research

Early experiences of integrating an artificial intelligence-based diagnostic decision support system into radiology settings: a qualitative study

Nuša Farič, PhD, Sue Hinder, PhD, Robin Williams, PhD, Rishi Ramaesh, MD, Miguel O Bernabeu, PhD, Edwin van Beek, PhD, Kathrin Cresswell, PhD

Objectives

Artificial intelligence (AI)-based clinical decision support systems to aid diagnosis are increasingly being developed and implemented, but with limited understanding of how such systems integrate with existing clinical work and organizational practices. We explored the early experiences of stakeholders using an AI-based imaging software tool, Veye Lung Nodules (VLN), which aids the detection, classification, and measurement of pulmonary nodules in computed tomography scans of the chest.

Materials and methods

We performed semistructured interviews and observations across early adopter deployment sites with clinicians, strategic decision-makers, suppliers, patients with long-term chest conditions, and academics with expertise in the use of diagnostic AI in radiology settings. We coded the data using the Technology, People, Organizations, and Macroenvironmental factors framework.

Results

We conducted 39 interviews. Clinicians reported VLN to be easy to use with little disruption to the workflow. There were differences in patterns of use between expert and novice users, with experts critically evaluating system recommendations and actively compensating for system limitations to achieve more reliable performance. Patients also viewed the tool positively. There were contextual variations in tool performance and use between different hospital sites and different use cases. Implementation challenges included integration with existing information systems, data protection, and perceived issues surrounding wider and sustained adoption, including procurement costs.

Discussion

Tool performance was variable, affected by integration into workflows and divisions of labor and knowledge, as well as technical configuration and infrastructure.

Conclusion

The socio-organizational factors affecting the performance of diagnostic AI are under-researched and require further attention.


Higher agreement between readers with deep learning CAD software for reporting pulmonary nodules on CT

H.L. Hempel, M.P. Engbersen, J. Wakkie, B.J. van Kelckhoven, W. de Monyé

Purpose

The aim was to evaluate the impact of CAD software on the pulmonary nodule management recommendations of radiologists in a cohort of patients with incidentally detected nodules on CT.

Methods

For this retrospective study, two radiologists independently assessed 50 chest CT cases for pulmonary nodules to determine the appropriate management recommendation, twice, once unaided and once aided by CAD, with a 6-month washout period between sessions. Management recommendations were given as a 4-point grade based on the British Thoracic Society (BTS) guidelines. Both reading sessions were recorded to determine the reading time per case. A reduction in reading time per session was tested with a one-tailed paired t-test, and a linear weighted kappa was calculated to assess interobserver agreement.

Results

The mean age of the included patients was 65.0 ± 10.9 years. Twenty patients were male (40%). For readers 1 and 2, a significant reduction in reading time of 33.4% and 42.6%, respectively, was observed (p < 0.001 for both). The linear weighted kappa between readers when unaided was 0.61. With the aid of CAD, reader agreement improved to a kappa of 0.84. The mean reading time per case was 226.4 ± 113.2 and 320.8 ± 164.2 s unaided, and 150.8 ± 74.2 and 184.2 ± 125.3 s aided by CAD software, for readers 1 and 2, respectively.
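
As a rough illustration of the statistics reported above, the sketch below computes a linear weighted kappa on two readers' 4-point grades and a one-tailed paired t-test on per-case reading times. It is a minimal Python example on synthetic data; the arrays and values are hypothetical and are not the study's data or analysis code.

```python
# Illustrative sketch only: synthetic data, not the study's records.
import numpy as np
from scipy import stats
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(42)

# Hypothetical 4-point management grades for 50 cases, two readers
grades_reader1 = rng.integers(1, 5, size=50)
grades_reader2 = rng.integers(1, 5, size=50)

# Linear weighted kappa for inter-observer agreement on ordinal grades
kappa = cohen_kappa_score(grades_reader1, grades_reader2, weights="linear")

# Hypothetical per-case reading times in seconds, unaided vs. CAD-aided
times_unaided = rng.normal(320, 160, size=50)
times_aided = rng.normal(185, 125, size=50)

# One-tailed paired t-test: is aided reading faster? (requires SciPy >= 1.6)
t_stat, p_value = stats.ttest_rel(times_unaided, times_aided, alternative="greater")

print(f"linear weighted kappa: {kappa:.2f}")
print(f"one-tailed p-value (aided faster): {p_value:.4f}")
```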

Conclusion

A dedicated CAD system for aiding in pulmonary nodule reporting may help improve the uniformity of management recommendations in clinical practice.

 


Validation of a deep learning computer aided system for CT based lung nodule detection, classification, and growth rate estimation in a routine clinical population

John T. Murchison, Gillian Ritchie, David Senyszak, Jeroen H. Nijwening, Gerben van Veenendaal, Joris Wakkie, Edwin J. R. van Beek

Objective

In this study, we evaluated a commercially available computer-assisted diagnosis (CAD) system. The deep learning algorithm of the CAD was trained on a lung cancer screening cohort and developed for the detection, classification, quantification, and growth assessment of actionable pulmonary nodules on chest CT scans. Here, we evaluated the CAD in a retrospective cohort of a routine clinical population.

Materials and methods

In total, 337 scans of 314 subjects with reported nodules of 3–30 mm in size were included in the evaluation. Two independent thoracic radiologists alternately reviewed scans with or without CAD assistance to detect, classify, segment, and register pulmonary nodules. A third, more experienced, radiologist served as an adjudicator. In addition, the cohort was analyzed by the CAD alone. The study cohort was divided into five groups: 1) 178 CT studies without reported pulmonary nodules, 2) 95 studies with 1–10 pulmonary nodules, 3) and 4) 23 paired baseline and follow-up studies from the same patients, and 5) 18 CT studies with subsolid nodules. The reference standard for nodules was based on majority consensus, with the third thoracic radiologist adjudicating as required. Sensitivity, false-positive (FP) rate, and inter-reader Dice coefficient were calculated.

Results

After analysis of 470 pulmonary nodules, sensitivity for radiologists without CAD and radiologists with CAD was 71.9% (95% CI: 66.0%, 77.0%) and 80.3% (95% CI: 75.2%, 85.0%), respectively (p < 0.01), with average FP rates of 0.11 and 0.16 per CT scan. The accuracy and kappa of the CAD for classifying solid vs subsolid nodules were 94.2% and 0.77, respectively. The average inter-reader Dice coefficient for nodule segmentation was 0.83 (95% CI: 0.39, 0.96) and 0.86 (95% CI: 0.51, 0.95) for CAD versus readers. The mean growth percentage discrepancy for readers and for CAD alone was 1.30 (95% CI: 1.02, 2.21) and 1.35 (95% CI: 1.01, 4.99), respectively.
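
For readers who want to see how such figures are typically derived, the following minimal Python sketch computes a per-nodule sensitivity, a false-positive rate per scan, and a Dice coefficient between two binary segmentation masks. The counts and masks are hypothetical placeholders, not the study's data or pipeline.

```python
# Illustrative sketch: hypothetical counts and masks, not the study's data.
import numpy as np

def sensitivity_and_fp_rate(tp: int, fn: int, fp: int, n_scans: int):
    """Per-nodule sensitivity and average false positives per CT scan."""
    return tp / (tp + fn), fp / n_scans

def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice overlap between two boolean segmentation masks of the same shape."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * intersection / (mask_a.sum() + mask_b.sum())

# Hypothetical example values
sens, fp_rate = sensitivity_and_fp_rate(tp=80, fn=20, fp=15, n_scans=100)

mask_reader = np.zeros((32, 32, 32), dtype=bool)
mask_cad = np.zeros((32, 32, 32), dtype=bool)
mask_reader[10:20, 10:20, 10:20] = True
mask_cad[11:21, 10:20, 10:20] = True

print(f"sensitivity: {sens:.1%}, FP/scan: {fp_rate:.2f}, "
      f"Dice: {dice(mask_reader, mask_cad):.2f}")
```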

Conclusion

The applied CAD significantly increased radiologists' detection of actionable nodules, while only minimally increasing the false-positive rate. The CAD can automatically classify and quantify nodules and calculate nodule growth rate in a routine clinical population. The results suggest that this deep learning software has the potential to assist chest radiologists in the tasks of pulmonary nodule detection and management within their routine clinical practice.

 


Deep Learning for Lung Cancer Detection in Screening CT Scans: Results of a Large-Scale Public Competition and an Observer Study with 11 Radiologists

Colin Jacobs, Arnaud A. A. Setio, Ernst T. Scholten, Paul K. Gerke, Haimasree Bhattacharya, Firdaus A. M. Hoesein, Monique Brink, Erik Ranschaert, et al.

Purpose

To determine whether deep learning algorithms developed in a public competition could identify lung cancer on low-dose CT scans with a performance similar to radiologists.

Materials and Methods

In this retrospective study, a dataset consisting of 300 patient scans was used for model assessment; 150 patient scans were from the competition set and 150 were from an independent dataset. Both test datasets contained 50 patient scans with cancer and 100 without cancer. The reference standard was set by histopathological examination for cancer positive scans and imaging follow-up for at least 2 years for cancer negative scans. The test datasets were applied to the top three performing algorithms from the Data Science Bowl 2017 public competition (called grt123, Julian de Wit & Daniel Hammack [JWDH] and Aidence). Model outputs were compared with an observer study of 11 radiologists that assessed the same test datasets. Each scan was scored on a continuous scale by both the deep learning algorithms and the radiologists. Performance was measured using multireader multicase receiver operating characteristic analysis.

Results

The area under the receiver operating characteristic curve (AUC) was 0.877 (95% CI: 0.842, 0.910) for grt123, 0.902 (95% CI: 0.871, 0.932) for JWDH, and 0.900 (95% CI: 0.870, 0.928) for Aidence. The average AUC of the radiologists was 0.917 (95% CI: 0.889, 0.945), which was significantly higher than grt123 (P = .02); however, no significant difference between the radiologists and JWDH (P = .29) or Aidence (P = .26) was found.
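
As a simplified illustration of this kind of comparison, the sketch below computes a single ROC AUC with a case-level percentile bootstrap confidence interval on synthetic scores. It assumes per-scan malignancy scores and binary cancer labels are available as arrays; it is a plain single-model AUC, not the multireader multicase analysis used in the study.

```python
# Illustrative sketch: synthetic scores and labels, not the study's data,
# and a plain AUC rather than a multireader multicase analysis.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = np.array([1] * 50 + [0] * 100)           # 50 cancer, 100 cancer-free scans
scores = rng.normal(loc=1.5 * labels, scale=1.0)  # hypothetical malignancy scores

auc = roc_auc_score(labels, scores)

# Case-level percentile bootstrap for a rough 95% confidence interval
boot_aucs = []
n = len(labels)
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    if labels[idx].min() == labels[idx].max():
        continue  # resample must contain both classes
    boot_aucs.append(roc_auc_score(labels[idx], scores[idx]))
ci_low, ci_high = np.percentile(boot_aucs, [2.5, 97.5])

print(f"AUC = {auc:.3f} (95% CI: {ci_low:.3f}, {ci_high:.3f})")
```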

Conclusion

Deep learning algorithms developed in a public competition for lung cancer detection in low-dose CT scans reached performance close to that of radiologists.

 


Clinical evaluation of a deep-learning-based computer-aided detection system for the detection of pulmonary nodules in a large teaching hospital

C.O. Martins Jarnalo, P.V.M. Linsen, S.P. Blazís, P.H.M. van der Valk, D.B.M. Dickerscheid

Aim

To evaluate a deep-learning-based computer-aided detection (DL-CAD) software system for pulmonary nodule detection on computed tomography (CT) images and assess its added value in the clinical practice of a large teaching hospital.

Materials and methods

A retrospective analysis of 145 chest CT examinations was performed by comparing the output of the DL-CAD software with a reference standard based on the consensus reading of three radiologists. For every nodule in each scan, the location, composition, and maximum diameter in the axial plane were recorded. The subgroup of chest CT examinations without any nodules (n = 97) was used to determine the negative predictive value at the given clinical sensitivity threshold setting.
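
A central step in this type of evaluation is matching CAD detections to reference-standard nodules before counting true and false positives. The sketch below shows one plausible way to do this by 3D centre distance; the matching rule, distance threshold, and coordinates are hypothetical and not taken from the study.

```python
# Illustrative sketch: a hypothetical greedy matching rule, not the study's protocol.
import numpy as np

def match_detections(ref_centres: np.ndarray, cad_centres: np.ndarray,
                     max_dist_mm: float = 10.0):
    """Greedily match CAD detections to reference nodules by 3D centre distance."""
    available = list(range(len(cad_centres)))
    tp = 0
    for ref in ref_centres:
        if not available:
            break
        dists = [np.linalg.norm(cad_centres[j] - ref) for j in available]
        k = int(np.argmin(dists))
        if dists[k] <= max_dist_mm:
            tp += 1
            available.pop(k)          # each CAD detection may match only once
    fn = len(ref_centres) - tp        # reference nodules the CAD missed
    fp = len(cad_centres) - tp        # CAD detections with no reference match
    return tp, fp, fn

# Hypothetical centre-of-mass coordinates in millimetres
ref = np.array([[100.0, 120.0, 60.0], [80.0, 90.0, 30.0]])
cad = np.array([[101.0, 119.0, 61.0], [40.0, 50.0, 20.0]])
tp, fp, fn = match_detections(ref, cad)
print(f"TP={tp}, FP={fp}, FN={fn}, sensitivity={tp / (tp + fn):.0%}")
```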

Results

The radiologists found 91 nodules and the CAD system found 130 nodules, of which 80 were true positives. The measured sensitivity was 88% and the mean false-positive rate was 1.04 false positives per scan. The negative predictive value was 95%. For 23 nodules there was a size discrepancy, of which 19 (83%) were measured as smaller by the radiologist. The agreement on nodule composition between the CAD results and the reference standard was 95%.

Conclusions

The present study found a sensitivity of 88% and a false-positive rate of 1.04 false positives per scan, which match the vendor specification. Together with the measured negative predictive value of 95%, the system performs very well; however, these rates are still not good enough to replace the radiologist, even for the specific task of nodule detection. Furthermore, a surprisingly high rate of overestimation of nodule size was observed, which can lead to unnecessary follow-up examinations.

 


The effect of CT reconstruction settings on the performance of a deep learning based lung nodule CAD system

Stephan P. Blazis, Dennis B.M. Dickerscheid, Philip V.M. Linsen, Carine O. Martins Jarnalo

Purpose

To study the effect of different reconstruction parameter settings on the performance of a commercially available deep learning based pulmonary nodule CAD system.

Materials and methods

We performed a retrospective analysis of 24 chest CT scans, each reconstructed at 16 different reconstruction settings across two iterative reconstruction algorithms (SAFIRE and ADMIRE), varying in slice thickness, kernel size, and iterative reconstruction strength level, and analysed with a commercially available deep learning pulmonary nodule CAD system. The DL-CAD software was evaluated at 25 different sensitivity threshold settings, and nodules detected by the DL-CAD software were matched against a reference standard based on the consensus reading of three radiologists.

Results

A total of 384 CT reconstructions from 24 patients were analysed, resulting in a total of 5786 detected nodules. We matched the detected nodules against the reference standard, defined by a team of thoracic radiologists, and observed a gradual drop in recall and an improvement in precision as the iterative strength level was increased for a constant kernel size. The optimal DL-CAD threshold setting for use in our clinical workflow was found to be 0.88, with an F2 score of 0.73 ± 0.053.
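
To make the operating-point selection concrete, the sketch below sweeps a detection threshold over hypothetical per-candidate scores and reports the F2 score (recall weighted more heavily than precision) at each setting. The scores, labels, and threshold grid are invented for illustration and are not the study's evaluation code.

```python
# Illustrative sketch: synthetic candidate scores and labels, not the study's data.
import numpy as np
from sklearn.metrics import fbeta_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)                         # hypothetical candidate labels
scores = np.clip(0.4 * y_true + 0.6 * rng.random(500), 0, 1)  # hypothetical CAD confidences

best_threshold, best_f2 = None, -1.0
for threshold in np.linspace(0.05, 0.95, 19):
    y_pred = (scores >= threshold).astype(int)
    f2 = fbeta_score(y_true, y_pred, beta=2)  # beta=2 favours recall over precision
    if f2 > best_f2:
        best_threshold, best_f2 = threshold, f2

print(f"best threshold: {best_threshold:.2f}, F2: {best_f2:.2f}")
```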

Conclusions

The DL-CAD system behaves differently on iterative reconstruction (IR) data than on filtered back projection (FBP) data: there is a gradual drop in recall and a growth in precision as the iterative strength level is increased. As a result, caution should be taken when implementing deep learning software in a hospital with multiple CT scanners and different reconstruction protocols. To the best of our knowledge, this is the first study to demonstrate this effect for a DL-CAD system on clinical data.

 


Artificial intelligence for analysing chest CT images


Summary

The technologies described in this briefing are artificial intelligence (AI) technologies for chest CT. They are used for assisting with triaging, reporting, and identifying abnormalities.

The innovative aspects are that the software helps radiologists and radiographers detect abnormalities in chest CT images.

The intended place in therapy would be to support radiologists and radiographers when reviewing chest CT images in secondary care for people who have been referred for chest CT.

The main points from the evidence summarised in this briefing are from 2 retrospective studies. The best quality evidence came from 1 UK study showing that Veye Chest (Aidence) performed similarly to chest radiologists for lung nodule segmentation and growth assessment. The studies were limited in quality and none were published in full.

 


Using machine learning in diagnostic services: a report with recommendations from CQC’s regulatory sandbox

Summary

This report presents the findings from the Care Quality Commission’s (CQC’s) regulatory sandbox pilot. Regulatory sandboxing is a way of working proactively and collaboratively to understand new types of health and social care service, agree what good quality looks like, and develop our approach to regulation. We think this is particularly important for innovative and technology-enabled services, which are developing quickly, and where a response requires collaboration with other national bodies.

This sandbox round focused on the use of machine learning applications for diagnostic purposes in healthcare services. Part of this work involved building a consensus on what is needed to deliver high-quality care in services that use these applications. To do this, we worked with healthcare providers, technology suppliers, people who use services, clinicians, and other stakeholders. We have used the findings of this sandbox to identify and consider where we need to update our current regulatory methods, and what work we need to do to get this right, which will help us to regulate these services better.

 

