Cleaning data: the work of superheroes
When it comes to using big data and machine learning for the benefit of patients, there’s still a lot of ground to cover, not only ethically but also on the technical and organisational side of things. Happily, the first speaker is Dr Ari Ercole from Cambridge University, who attempts to convince everyone that after his talk – ‘Do-It-Yourself Artificial Intelligence for Clinicians: Not So Hard’ – they will want to rush home to create their own confusion matrix.
With a PhD in physics and a background in neuroscience intensive care, Dr Ercole likes to imagine his work is worthy of a superhero. “But actually most of my job is about taking out the garbage. I spend 80 per cent of my time taking out the dirt and cleaning up data. I get rid of the outliers and the errors and make sure it’s all consistent.” He continues: “Once clean it’s not that hard to get to work. Most of the big data concepts are quite straightforward – and not really any more complex or abstract than statistics or epidemiology. Plus it’s cheap: all you need is a computer!”
To Jupyter and beyond
Dr Ercole makes his case convincingly, giving real-time demonstrations applying the commonly used R programming language in the open source Project Jupyter. Using supervised prediction algorithms and machine learning regression, he shows models that can predict ‘whether someone lives or dies’ based on different variables – from gender to the concentration of certain chemicals. The models made one thing clear for the living: avoid lactate build-up.
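For readers who do want to rush home and build that confusion matrix, here is a minimal sketch of the kind of supervised prediction described above. It is not Dr Ercole’s code: his demonstrations used R, while this example uses plain Python, and the lactate values and outcomes are made up purely for illustration. The function names (fit_logistic, predict, confusion_matrix) are likewise hypothetical helpers, not part of any library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(xs, w, b, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels."""
    return [1 if sigmoid(w * x + b) >= threshold else 0 for x in xs]

def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

# Made-up toy data, NOT real patient data:
# lactate concentration (mmol/L) and outcome (1 = died, 0 = survived).
lactate = [0.8, 1.0, 1.2, 1.5, 1.9, 2.2, 3.0, 3.8, 4.5, 5.2, 6.0, 7.5]
died = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

w, b = fit_logistic(lactate, died)
cm = confusion_matrix(died, predict(lactate, w, b))
print("weight on lactate:", round(w, 2))
print("confusion matrix:", cm)
```

A positive weight on lactate means the model predicts higher risk as lactate rises, in line with the talk’s takeaway. A real analysis would use many more variables, cleaned data and a held-out test set rather than scoring the model on the data it was trained on.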
When Dr Ercole gives a short overview on the state-of-the-art workings of Deep Neural Networks, he admits this approach is less straightforward: “Explainability does become an issue here. The potential is only limited by your imagination!”
Making care data FAIR
The second speaker, Ronald Cornet, is an associate professor at the Department of Medical Informatics in the Amsterdam Public Health research institute, Amsterdam UMC. His talk ‘From Care Data to FAIR Data’ opens with a warning sign: ‘Caution: Work in Progress’.
Cornet echoes Dr Ercole’s observation that 80 per cent of the work is about cleaning up data. He also jokes about how the ‘free text’ of doctors is often a far cry from the structure that data scientists crave. He says that currently a staggering 25 per cent of all descriptions are changed by doctors. However, his organisation seeks to support clinicians so they avoid burnout. It also tries to reduce the time they spend ticking boxes.
Currently, Amsterdam UMC is in the midst of a four-year programme to apply natural language processing (NLP) for automatic context detection (in Dutch) to help clean up the ‘free text’ of doctors. These efforts are an important step towards making data FAIR – Findable, Accessible, Interoperable and Reusable. Making data FAIR means more research can be done for the benefit of the patient.
Liking your data dirty
As if to emphasise the scale of the work ahead, Dr Cornet ends his talk by turning job recruiter, with a screen full of positions that are currently available at the UMC. As students, doctors and data scientists enjoy the pizza, they cluster around the speakers to ask more questions. Their enthusiasm is palpable. While countless challenges lie ahead, there’s a real shared feeling: the future is now.
Next special edition: Pitching to find a doctor or data scientist
The Amsterdam Medical Data Science Group meetup takes place on the third Tuesday of every month in the Delta Room at the VU University Medical Center Amsterdam’s Intensive Care Unit. The next edition will take place on May 21. It’s the pitch edition, so get in touch if you have a great idea for a big data or machine learning application.
To find out more visit: https://www.meetup.com/amsterdam-medical-data-science.