Artificial intelligence has the potential to hugely disrupt the way healthcare operates, but the challenge is gathering enough data. So what if, instead of focusing all our energy on data collection, we could adjust our deep learning algorithms to require less data? That’s what we do with 3D G-CNNs!
Read how Marysia Winkels, Machine Learning Engineer at Aidence, managed to achieve better performance while training our algorithms on 10 times less data using Group-Convolutions.
Her research has been presented at leading conferences on deep learning, including the International Conference on Machine Learning (ICML 2018). An in-depth academic article is to be featured in the journal Medical Image Analysis (MIA).
Deep learning, and convolutional neural networks in particular, have rapidly become the methodology of choice for all (medical) image-related tasks. However, these techniques typically require a substantial amount of labeled data to learn from, meaning a human radiologist needs to manually record their findings in a way the computer can understand. Furthermore, if we want the algorithms to generalise well over different patient populations, scanner types, reconstruction techniques and so forth, we need even more data!
This presents us with the challenge of data efficiency: the ability of an algorithm to learn complex tasks without requiring large quantities of data. Instead of spending all our time and energy on gathering more data, we try to make the algorithms more efficient with the data that we already have.
To explore how we can improve the data efficiency of convolutional neural networks (CNNs), we first need to understand why CNNs are such an appropriate choice for image tasks in the first place. The great thing about CNNs is that they are roughly invariant to translation. This means that if a model has learned to recognise a structure, such as a dog, it doesn’t matter where in the image the dog appears: the model will still recognise it as a dog.
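To make the translation property concrete, here is a minimal NumPy sketch. It is not the code used in this research: it assumes a single filter applied with wrap-around borders, and the helper `circular_correlate`, the random image and the random filter are purely illustrative. Shifting the input simply shifts the feature map, and a global max-pool on top of that gives the same score wherever the structure sits.

```python
import numpy as np


def circular_correlate(image, kernel):
    """Slide a small, centred filter over a 2D image with wrap-around borders."""
    kh, kw = kernel.shape
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            di, dj = i - kh // 2, j - kw // 2  # offset relative to the filter centre
            # After this roll, position (y, x) holds image[y + di, x + dj].
            out += kernel[i, j] * np.roll(image, shift=(-di, -dj), axis=(0, 1))
    return out


rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))   # illustrative stand-in for a real image
kernel = rng.normal(size=(3, 3))    # illustrative stand-in for a learned filter

# Move the whole image 4 pixels down and 7 pixels to the right.
shifted = np.roll(image, shift=(4, 7), axis=(0, 1))

features = circular_correlate(image, kernel)
features_shifted = circular_correlate(shifted, kernel)

# The feature map moves along with the input ...
moves_with_input = np.allclose(
    features_shifted, np.roll(features, shift=(4, 7), axis=(0, 1))
)
# ... so the strongest response anywhere in the image is unchanged.
same_pooled_score = np.isclose(features.max(), features_shifted.max())

print(moves_with_input, same_pooled_score)
```

Real networks use padding and strided pooling instead of wrap-around borders, so the property only holds approximately, which is why "roughly" is the right word.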
This is great for images – after all, it rarely matters where exactly in an image a structure occurs, as long as you can see that it’s there. To get technical for a moment: a translation is a type of transformation you can apply to the image, and whether or not you apply it has no influence on the prediction of the model. The problem is that there are other types of transformations, such as reflection (mirroring) or rotation, that sadly do influence the prediction of a standard CNN.
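A small toy example shows how badly a rotation can affect a single filter. This is a sketch under simple assumptions (wrap-around borders, a hand-made edge filter and a synthetic image, all hypothetical): a filter that fires on a horizontal edge stays completely silent once the same edge is rotated by 90 degrees.

```python
import numpy as np


def circular_correlate(image, kernel):
    """Slide a small, centred filter over a 2D image with wrap-around borders."""
    kh, kw = kernel.shape
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            di, dj = i - kh // 2, j - kw // 2  # offset relative to the filter centre
            out += kernel[i, j] * np.roll(image, shift=(-di, -dj), axis=(0, 1))
    return out


# A hand-made filter that responds to bright-above-dark horizontal edges.
edge_filter = np.array([[1.0, 1.0, 1.0],
                        [0.0, 0.0, 0.0],
                        [-1.0, -1.0, -1.0]])

# Synthetic image: bright top half, dark bottom half.
image = np.zeros((8, 8))
image[:4, :] = 1.0

rotated = np.rot90(image)  # the same edge, now running vertically

response_original = circular_correlate(image, edge_filter).max()
response_rotated = circular_correlate(rotated, edge_filter).max()

print(response_original, response_rotated)
```

The filter has not learned anything about vertical edges, so a standard network would need a second filter, and training examples to learn it from.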
This is a problem, because it means that you can’t just present your algorithm with one orientation of an object (such as a dog) and expect it to work on similar objects that are rotated or flipped. In practice, however, and especially in the medical domain, rotations and reflections of the things you want to detect – such as pulmonary nodules – occur on both a small and a large scale. For the model to recognise these, you have to present the algorithm with all these orientations separately during training, which – as you can guess – means you need more training data.
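To get a feel for how many orientations "all" can mean in 3D, the sketch below enumerates the axis-aligned orientations of a small cubic patch. It assumes a cube-shaped patch with equally spaced voxels, which real CT scans often do not have, and `cube_orientations` is a hypothetical helper written for this illustration only.

```python
import itertools

import numpy as np


def cube_orientations(volume, include_reflections=False):
    """Return every axis-aligned orientation of a cubic 3D patch."""
    orientations = []
    for perm in itertools.permutations(range(3)):
        # Parity of the axis permutation: even permutations keep handedness.
        inversions = sum(
            perm[a] > perm[b] for a in range(3) for b in range(a + 1, 3)
        )
        for flips in itertools.product([False, True], repeat=3):
            # +1 means a proper rotation, -1 means a mirror image is involved.
            determinant = (-1) ** (inversions + sum(flips))
            if determinant == 1 or include_reflections:
                reverse = tuple(
                    slice(None, None, -1) if flip else slice(None)
                    for flip in flips
                )
                orientations.append(np.transpose(volume, perm)[reverse])
    return orientations


patch = np.arange(27).reshape(3, 3, 3)  # every voxel gets a unique value

rotations = cube_orientations(patch)
with_mirrors = cube_orientations(patch, include_reflections=True)
distinct = {oriented.tobytes() for oriented in with_mirrors}

print(len(rotations), len(with_mirrors), len(distinct))
```

A cube has 24 rotations and 48 orientations once mirror images are included, so a network that has to learn each orientation separately needs to see each structure many times over.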
Our solution to this was to create a new type of convolutional neural network (CNN) called the group-equivariant convolutional neural network (G-CNN), which can handle rotated and reflected versions of images. And not only regular images, but also 3D volumes – the type of images that we get from CT or MRI scans.
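The core idea can be sketched in 2D with the simplest rotation group: the four rotations by 90 degrees. This is a toy illustration with assumed helper names (`circular_correlate`, `p4_lifting_correlation`) and wrap-around borders, not the 3D implementation from this research. Instead of applying a filter in one orientation, the layer applies it in all four, so one learned filter covers every rotated version of a structure.

```python
import numpy as np


def circular_correlate(image, kernel):
    """Slide a small, centred filter over a 2D image with wrap-around borders."""
    kh, kw = kernel.shape
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            di, dj = i - kh // 2, j - kw // 2  # offset relative to the filter centre
            out += kernel[i, j] * np.roll(image, shift=(-di, -dj), axis=(0, 1))
    return out


def p4_lifting_correlation(image, kernel):
    """Apply one filter in all four 90-degree orientations: shape (4, H, W)."""
    return np.stack(
        [circular_correlate(image, np.rot90(kernel, r)) for r in range(4)]
    )


edge_filter = np.array([[1.0, 1.0, 1.0],
                        [0.0, 0.0, 0.0],
                        [-1.0, -1.0, -1.0]])

# Synthetic horizontal edge and the same edge rotated by 90 degrees.
edge_image = np.zeros((8, 8))
edge_image[:4, :] = 1.0
response_original = p4_lifting_correlation(edge_image, edge_filter).max()
response_rotated = p4_lifting_correlation(np.rot90(edge_image), edge_filter).max()

# Equivariance check on a random image: rotating the input rotates every
# feature map and cyclically shifts the four orientation channels.
rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
features = p4_lifting_correlation(image, edge_filter)
features_of_rotated = p4_lifting_correlation(np.rot90(image), edge_filter)
expected = np.roll(np.rot90(features, axes=(1, 2)), shift=1, axis=0)
is_equivariant = np.allclose(features_of_rotated, expected)

print(response_original, response_rotated, is_equivariant)
```

Because the output changes in a predictable way when the input is rotated, pooling over the orientation channels and over space gives the same score for every 90-degree rotation, without ever showing the network the rotated examples.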
Yes, recognising dogs in pictures is all fun and games, but how well does this work for real problems, like a medical finding? As a case study, we use pulmonary nodule detection. Pulmonary nodules are small lesions in the lung that may be indicative of lung cancer, which is why radiologists will generally try to detect them so they can track their growth over time. However, looking for these nodules can feel like looking for a needle in a haystack, without the advantage that you can just burn down the haystack to find said needle.
Lung nodules are visible on a chest CT – a 3D scan of the chest, visualising bones, muscles, fat, organs and blood vessels in grayscale. A typical chest CT consists of ~300 images (slices), stacked together to form the whole scan. You can imagine that looking through ~300 grayscale images to find a small abnormality can be a tedious task, especially considering that nodules can take many shapes and forms.
That’s where AI comes in to help!
Normally, when approaching a data science problem, you utilise all the data you have available. Although we’d of course like to improve the detection of pulmonary nodules specifically, we are also conceptually very interested in how well our new type of network performs when presented with a lot less data to train on.
That’s why we trained our model on four different dataset sizes: 30, 300, 3,000 and 30,000 scans. You can imagine that 30 scans, or maybe even 300, is waaay too little to get meaningful results on, but we did it anyway, just to see what happens!
The results were astonishing.
Of course, we hoped that our intuition was correct and that we’d achieve better performance, or similar performance with a model trained on less data. What our experiments showed, however, was that our new type of convolutional neural network achieved similar (or better!) performance than a standard CNN trained on 10x as much data. This might not seem like a lot, but imagine being a radiologist, and the difference between having to manually detect, segment or classify (and report!) 100 samples instead of 1,000. This improvement in efficiency corresponds to a major reduction in the cost and effort of data collection. This in turn makes creating new models more accessible, and brings pulmonary nodule detection, and hopefully other CAD systems, closer to reality.
This research was performed by Marysia Winkels as part of her thesis for the MSc Artificial Intelligence at the University of Amsterdam in collaboration with Aidence. She currently works as a machine learning engineer at Aidence.
It was supervised by Taco Cohen (Machine Learning researcher at Qualcomm and recently named as one of the 35 under 35 by MIT) & prof. dr. Max Welling (research chair in Machine Learning at the University of Amsterdam and a VP Technologies at Qualcomm), who originally laid the foundation for the work on equivariance and group-convolutional neural networks.
For further references on this research, check out Marysia's blog.