Google has got involved in healthcare data – can we trust it?
Google has some of the most powerful computers and smartest algorithms in the world, has hired some of the best brains in computing, and through its purchase of British firm Deepmind has acquired AI expertise that recently saw an AI beat a human grandmaster at the game of Go. Why then would we not want to apply this to potentially solving medical problems – something Google’s grandiose, even hyperbolic statements suggest the company wishes to do?
The New Scientist recently revealed a data sharing agreement between the Royal Free London NHS trust and Google Deepmind. The trust released incorrect statements (since corrected) claiming Deepmind would not receive any patient-identifiable data (it will), leading to irrelevant confusion about what data encryption and anonymisation can and cannot achieve.
As people have very strong feelings about third-party access to medical records, all of this has caused a bit of a scandal. But is this an overreaction, following previous health data debacles? Or does this represent a new and worrying development in the sharing of medical records?
“Researchers like me have been using this sort of anonymised NHS data for years. Can anyone point to an id breach?” (John Appleby, @jappleby123, May 4, 2016, https://t.co/7F2FFGHen2)
That the NHS outsources its data analysis requirements is nothing new. The NHS data centre HSCIC publishes regular data sharing reports, and its latest report details releases to companies such as CSL-UK, Northgate, McKinsey, and Dr Foster. These firms will sell the processed data back to the NHS.
Actually, while most NHS data sharing with companies is for so-called secondary purposes that lie outside the provision of direct clinical care, the deal with Google is classed as being for direct care. Doctors get an app called Streams, which uses a patient’s live medical data and their historical record to determine their risk of acute kidney injury.
So it makes perfect sense for the app to access personally identifiable data of the patient being treated, and on that basis the claim that “Google has access to 1.6m patients' data” should not be cause for concern. This is especially so as Google accesses the data mostly indirectly, through an unnamed third party with certified information security standards, which circumvents issues around potential abuse of the data by Google.
Not so clear
But another stated purpose of the deal is “real time clinical analytics, detection, diagnosis and decision support”, presumably with the intention of building an online platform for “medical-data-analysis-as-a-service”. Anything “as-a-service” normally implies the processing is done in the cloud, although the agreement with Google says little about that. Cloud processing means sensitive personal data will be sent to a Google server at some point.
The inclusion of five years of all patients' historic data is justified to “aid service evaluation and audit of the new product”. But it’s hard to see how this is different from just using the data to improve the kidney injury algorithm in the first place. Deepmind’s claim that “Streams does not use AI” is downright bizarre given the amount of data it claims to need: data on this scale is usually used to train machine learning algorithms so that they can make better decisions. Access to this trove of historic patient data will almost certainly come from Google itself.
Otherwise, the agreement with Google professes to be fully compliant with the Data Protection Act, standard medical data principles, and NHS procedures. Data transfer is secure (and encrypted), staff have been trained to respect confidentiality, and the data cannot be used for purposes other than those listed.
One principle mentioned is the Caldicott principle of using the minimum data required. But here this appears to be interpreted as: in order to treat one patient using Streams, we need five years’ medical data of 1.6m patients. This is seeing clinical care through a mass surveillance lens – we need all the data on everyone, just in case they require treatment. Conveniently, for clinical treatment matters NHS information governance allows the use of “implied consent” rather than any direct involvement from the subjects themselves.
Black box surveillance
The question is, of course, whether we trust Google to stick to these policies. The agreement allows for auditing by the NHS trust, and this may be enough of a deterrent against more direct and blatant abuses.
However, Google deals with personal data constantly: our search histories probably feed back into Google search rankings via some profiling process. Our Gmail emails are scanned for marketing purposes. If we stop Google from recording our location histories for our own use, do they still survive in the Google databases as some “anonymous” person’s location history? There is a lot here that Google is not telling us.
Improving the kidney injury algorithm or developing an analytics platform using medical data will generate more data. Service evaluation of the new product will generate more data. Some of that data will live in the shady world of people profiles, anonymised users, and aggregated user characteristics. It will be data that is somewhat personal but not personal enough for our crude data protection laws to be able to protect it.
In this world of black box surveillance, Google is probably the world’s biggest player. As long as it offers so little transparency in how it uses and processes data, we have to distrust it to some degree – and perhaps in this context specifically.
Eerke Boiten receives funding from the UK government for the Kent Academic Centre of Excellence in Cyber Security Research, as well as from the EU for an Innovative Training Network in Cyber Security.
Eerke Boiten, Senior Lecturer, School of Computing and Director of Academic Centre of Excellence in Cyber Security Research, University of Kent