With COVID-19, data science is having a very large moment
It’s a somewhat awkward truth: the current global coronavirus pandemic is proving to be a boon for data scientists. Large amounts of data are streaming in from all over: not only from hospital intensive care units (ICUs) but from the private sector, too. As a result, data scientists have a lot of rich information to pump into their models – improving their ability to predict future scenarios, which could help us better manage a possible second wave.
There’s a lot of post-corona data science to share…
It was a busy day for medical data science on 24 June 2020. While the original Amsterdam Medical Data plus Pizza events were a bimonthly affair, their post-coronavirus substitute, the ADS & AMDS Webinar, is taking place weekly – and this week, it was a data double header.
The webinar is now available for everyone to watch.
Rich data streams from busy ICUs
The spotlight is on data science for a simple reason: scientists are gaining access to rich data streams that were previously unavailable. While the application of machine learning in clinical settings was already under way, it always came paired with justifiable but time-consuming manoeuvring around issues such as privacy.
In the face of COVID-19, the procedures around these issues have been streamlined so we can learn more quickly about a largely unknown deadly virus. Even so, the data is still being anonymised and kept on a highly secure server – and there is also an opt-out clause for patients who are not comfortable sharing their data.
Data from the private sector
Meanwhile, the private sector is being very generous with its data – data that companies would usually guard carefully as intellectual property and charge others heavily to access.
In short: data scientists have mixed emotions these days. While alarmed at the mounting tragedies caused by a global pandemic, they are also inspired by their sudden access to mountains of rich data that may help them fight a very real – and still largely unknown – enemy.
The ADS & AMDS Webinar’s three presentations proved to be a handy overview of the power and limitations of various data-led approaches aimed at dealing with COVID-19 and our still uncertain future.
Dutch ICUs unite for the greater data good
Daan de Bruin, a lead data scientist at predictive medical modelling company PacMed, presented ‘Building a national COVID-19 ICU database: challenges, solutions and opportunities.’
De Bruin spoke of a historic collaboration involving all the major hospitals in the Netherlands, which will share their data in the hope of finding best practices and treatment strategies for COVID-19 patients in the ICU. “Since the COVID-19 patient data of a single ICU is insufficient to generate meaningful insights, Dutch ICUs have joined forces,” he explains.
While the project was slowed by the task of standardising the data to a common format before adding it to one centralised data warehouse, De Bruin now sees a clear path for improving COVID-19 treatments. “The data is so rich: ICUs collect so much and so often, it makes it a very exciting environment for a data scientist,” De Bruin enthuses. “We now have enough data to start to make very meaningful models.” He believes such models could already prove helpful in improving treatments in time for a potential second wave.
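To give a feel for what standardising to a common format can involve, here is a minimal Python sketch. It is not PacMed's actual pipeline: the hospital names, field names and units are all hypothetical, invented purely for illustration. It assumes each hospital exports records under its own field names and units, and that a per-hospital mapping translates them into one shared layout before they go into a central store.

```python
# Illustrative only: the hospitals, field names and units below are hypothetical
# and are not taken from the Dutch national COVID-19 ICU database.

COMMON_FIELDS = ("patient_id", "heart_rate_bpm", "temperature_c")


def fahrenheit_to_celsius(value):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (float(value) - 32.0) * 5.0 / 9.0


# For each hospital: common field -> (field name in that hospital's export, converter)
HOSPITAL_MAPPINGS = {
    "hospital_a": {
        "patient_id": ("pid", str),
        "heart_rate_bpm": ("hr", float),
        "temperature_c": ("temp_c", float),
    },
    "hospital_b": {
        "patient_id": ("PatientNumber", str),
        "heart_rate_bpm": ("HeartRate", float),
        "temperature_c": ("TempF", fahrenheit_to_celsius),
    },
}


def standardise(record, hospital):
    """Translate one hospital-specific record into the common layout."""
    mapping = HOSPITAL_MAPPINGS[hospital]
    row = {"source": hospital}
    for field in COMMON_FIELDS:
        source_field, convert = mapping[field]
        row[field] = convert(record[source_field])
    return row


# A stand-in for the central warehouse: a list of rows that all share one layout.
warehouse = [
    standardise({"pid": 17, "hr": "82", "temp_c": "38.5"}, "hospital_a"),
    standardise({"PatientNumber": "B-204", "HeartRate": 95, "TempF": 101.3}, "hospital_b"),
]

for row in warehouse:
    print(row)
```

Keeping the per-hospital differences in a mapping table rather than in the conversion code means that adding another hospital is mostly a matter of adding another entry, which is one plausible reason the up-front standardisation work takes time but pays off later.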
Predicting the next outbreak – in your neighbourhood
Bjorn van der Ster, a postdoctoral data science researcher at Amsterdam UMC, presented ‘Visualising and predicting a new disease in the community; a case of reusing public data and COVID-19’.
Before the coronavirus arrived in the Netherlands, Van der Ster had collaborated in developing a model to predict influenza outbreaks and peak admissions at a local level – “to help improve planning, optimise resource allocation, and improve patient and hospital staff experience.”
The pivot to confronting COVID-19 was an easy choice to make. The first version of the adapted model, which brings in a variety of data, from weather conditions to the latest patient intake numbers, can now be found at www.windfall.ai. It can already predict new COVID-19 cases, as well as hospital admission and death rates, for the coming four days in very localised areas.
While the data streams have slowed in the Netherlands with the drop in coronavirus cases, Van der Ster is looking ahead to a potential second wave. By applying more public data, such as traffic and socio-economic data, he hopes to extend the range and precision of the model’s predictive powers.
“We are also talking to the Maltese and UK governments, which are looking at using a similar approach,” says Van der Ster. “Actually, anyone can have access to our data and models. Just ask.”
The ultimate trade-off: health vs economic growth
Much of the same crowd reconvened at 4 pm to tune in to the USA for a talk by Professor Kent Smetters and senior analyst Alex Arnon from the University of Pennsylvania’s Wharton School. They presented their work on the Penn-Wharton Budget Model – a model that simulates and predicts the effect of business re-openings on health and economic variables.
Their project also involves multiple data streams. “While we come from an economic background, the models are largely the same,” says Smetters. “So that’s why you are seeing a lot of economic data scientists switching over to do more medical work in the face of the coronavirus.”
Both scientists downplayed the effects of government policies on the spread of the virus: “It really comes down to how people are actually behaving. Are they keeping social distance or not? Are they wearing masks? Are they meeting indoors or outdoors?” says Smetters.
Currently, the model predicts more than 200,000 deaths by August 2020. If social distancing measures are reduced, it projects that the death toll would be 50,000 higher – but that there would be 4.3 million more jobs. Various US states are already using the model to decide on the trade-offs they want to make when coming out of lockdown.