Health records re-identified in significant data breach
December 18, 2017
There is significant controversy about whether data can be scrubbed so that it cannot be re-identified. What is less controversial is that many organisations put insufficient effort into de-identifying data. The authors of a paper, “Health Data in an Open World”, have demonstrated how they re-identified patients in a supposedly de-identified open health dataset. The authors, academics at the School of Computing and Information Systems at the University of Melbourne, summarised what they did this way:
With the aim of informing sound policy about data sharing and privacy, we describe successful re-identification of patients in an Australian de-identified open health dataset. As in prior studies of similar datasets, a few mundane facts often suffice to isolate an individual. Some people can be identified by name based on publicly available information. Decreasing the precision of the unit-record level data, or perturbing it statistically, makes re-identification gradually harder at a substantial cost to utility. We also examine the value of related datasets in improving the accuracy and confidence of re-identification. Our re-identifications were performed on a 10% sample dataset, but a related open Australian dataset allows us to infer with high confidence that some individuals in the sample have been correctly re-identified. Finally, we examine the combination of the open datasets with some commercial datasets that are known to exist but are not in our possession. We show that they would further increase the ease of re-identification.
The paper is found here.
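To make the abstract's point about "a few mundane facts" concrete, here is a minimal Python sketch of linkage by filtering. It is an illustration under stated assumptions, not the authors' code: the `Record` layout, the service codes and every value in the toy dataset are hypothetical. The idea is simply that each known fact removes records that are inconsistent with it, until one record remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical layout of one row in a de-identified release.
    patient_id: str        # opaque ID; never decrypted in this sketch
    year_of_birth: int
    events: frozenset      # set of (service_code, year) pairs


def find_matches(records, year_of_birth, known_events):
    """Return the records consistent with everything the observer knows."""
    known = frozenset(known_events)
    return [
        r for r in records
        if r.year_of_birth == year_of_birth and known <= r.events
    ]


# Invented toy data; the service codes are made up for illustration.
records = [
    Record("a91f", 1975, frozenset({("KNEE_SURGERY", 2011), ("GP_VISIT", 2012)})),
    Record("07c2", 1975, frozenset({("GP_VISIT", 2011), ("CHILDBIRTH", 2013)})),
    Record("d3e8", 1975, frozenset({("KNEE_SURGERY", 2011), ("CHILDBIRTH", 2013)})),
    Record("5b10", 1982, frozenset({("KNEE_SURGERY", 2011)})),
]

# Each extra fact narrows the candidate set.
by_birth = find_matches(records, 1975, [])
one_fact = find_matches(records, 1975, [("KNEE_SURGERY", 2011)])
two_facts = find_matches(
    records, 1975, [("KNEE_SURGERY", 2011), ("CHILDBIRTH", 2013)]
)

print(len(by_birth), len(one_fact), len(two_facts))
```

In this toy data, year of birth alone leaves three candidates, one publicly known procedure leaves two, and a second leaves a single record. Nothing is decrypted at any point; the opaque ID is only used to report which row survived the filter.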
The story has been run by both itnews, with “Health open data bungle meant Aussies could be identified” (following its original story in September last year, “Health pulls Medicare dataset after breach of doctor details”), and the Fairfax press, with “Health record details exposed as ‘de-identification’ of data fails”. The Privacy Commissioner has announced today that the publication of the dataset is currently under investigation. The statement also says the investigation was opened in late September 2016. That is an extraordinarily leisurely approach to an investigation, and it is still ongoing. The Privacy Commissioner’s statement is as follows:
The Australian Information and Privacy Commissioner is currently investigating the publication of the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme (PBS) datasets on data.gov.au. The investigation was opened under section 40(2) of the Australian Privacy Act 1988 (Privacy Act) in late September 2016 when the Department of Health notified the OAIC that the datasets were potentially vulnerable to re-identification.
Given the investigation into the MBS and PBS datasets is ongoing, we are unable to comment on it further at this time. However, the Commissioner will make a public statement at the conclusion of the investigation.
Realising the value of public data to innovations that benefit the community at large is dependent on the public’s confidence that privacy is protected. The OAIC continues to work with Australian Government agencies to enhance privacy protection in published datasets. A recent example is the De-identification Decision-Making Framework developed by CSIRO’s Data61 and the OAIC. This provides guidance to Australian organisations that handle personal information on meeting their ethical responsibilities and legal obligations (such as those under the Privacy Act) when considering how datasets may be shared or released.
The proverbial slow boat to China is a power cruiser compared to the Privacy Commissioner’s timid, laggardly approach to this investigation.
The Fairfax article provides:
One in ten Australians’ private health records have been unwittingly exposed by the Department of Health in an embarrassing blunder that includes potentially exposing if someone is on HIV medication, whether mothers have had terminations, or if mentally unwell people are seeing psychologists.
A report, published on Monday by Dr Chris Culnane, Dr Benjamin Rubinstein and Dr Vanessa Teague from the University of Melbourne’s School of Computing and Information Systems, outlines how de-identified historical health data from the Australian Medicare Benefits Scheme (MBS) and the Pharmaceutical Benefits Scheme (PBS) released to the public in August 2016 can be re-identified using known information about the person to find their record.
“We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the record with known information about the individual such as medical procedures and year of birth,” Dr Culnane said. “This shows the surprising ease with which de-identification can fail, highlighting the risky balance between data sharing and privacy.”
The study reveals unique patient records matching the online public information of seven prominent Australians, including three (former or current) MPs and an AFL footballer. While a unique match may not always be accurate, Dr Rubinstein said there was the possibility to improve confidence by cross-referencing other data.
“Because only 10 per cent of Australians are included in the sample data, there can be a coincidental resemblance to someone who isn’t included,” he said.
“We can improve confidence by cross-referencing with a second dataset of population-wide billing frequencies. We can also examine uniqueness according to the characteristics of commercial datasets we know of, such as bank billing data.”
Privacy analyst and Lockstep consultant Stephen Wilson said the breach damaged public confidence in health policy makers and data custodians.
“It’s a huge breach of trust,” he said.
“Promises of ‘de-identification’ and ‘anonymisation’ made by health officials, and ABS too in connection with census data releases, have been shown to be erroneous.
“The ability to re-identify patients from this sort of public release is frankly, in my view, catastrophic. Real dangers are posed to patients with socially difficult conditions.
“It beggars belief that any official would promise ‘anonymity’ any more. These promises cannot be kept.”
Computer security researcher Troy Hunt said re-identification of anonymised records was attractive to researchers and nefarious parties alike.
“In this case, clearly more work needs to be done to protect individuals’ identities,” he said. “My hope is that the government embraces responsible research like this and strives to improve confidentiality rather than penalise those seeking to report deficiencies such as this.”
The federal Department of Health was notified about the issue in December last year.
“The Department of Health takes this matter very seriously and had already referred this to the Privacy Commissioner,” a Department of Health spokesperson told Fairfax Media.
“The project was halted and remains halted, and the dataset was removed immediately.”
The spokesperson said the department had since taken further steps to protect and manage data.
“The Department is working with the University of Melbourne and has already acted to improve its processes. The Department has not been aware of anyone being identified.”
Meanwhile, the Office of the Australian Information Commissioner, which houses Australia’s privacy commissioner, said it was investigating the publication of the datasets.
“The investigation was opened under section 40(2) of the Australian Privacy Act 1988 (Privacy Act) in late September 2016 when the Department of Health notified the OAIC that the datasets were potentially vulnerable to re-identification,” a spokesperson said.
“Given the investigation into the Medicare Benefits Scheme (MBS) and Pharmaceutical Benefits Scheme (PBS) datasets is ongoing, we are unable to comment on it further at this time. However, the commissioner will make a public statement at the conclusion of the investigation.”
The OAIC said it continued to work with Australian government agencies to enhance privacy protection in published datasets.
“A recent example is the De-identification Decision-Making Framework developed by CSIRO’s Data61 and the OAIC. This provides guidance to Australian organisations that handle personal information on meeting their ethical responsibilities and legal obligations (such as those under the Privacy Act) when considering how datasets may be shared or released.”
Dr Teague used the exposure to blast proposed changes to the national Privacy Act which would make it a criminal offence to re-identify government data that has been stripped of identifying markers.
“Legislating against re-identification will hide, not solve, mathematical problems, and have a chilling effect on both scientific research and wider public discourse,” Dr Teague said.
Instead, Dr Teague said there were strong reasons to improve access to high-quality, and sometimes sensitive, data to facilitate research, innovation and sound public policy.
However, she argued there remained important technical and procedural problems to solve.
“Open publication of de-identified records like health, census, tax or Centrelink data is bound to fail as it is trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records,” Dr Teague said.
“We need a much more controlled release in a secure research environment, as well as the ability to provide patients greater control and visibility over their data.”
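Dr Rubinstein's point about the 10 per cent sample can be illustrated with a small calculation. The sketch below is one simple way to reason about it, not necessarily the authors' method. It assumes each person is included in the sample independently at the same rate, and that a population-wide source tells us how many people, `k`, share the known facts. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def match_confidence(k, sampling_rate=Fraction(1, 10)):
    """Probability that the single matching sample record is the target.

    Assumes k people in the whole population (target included) share the
    known facts, each person is sampled independently at `sampling_rate`,
    and exactly one matching record was found in the sample.
    """
    if k < 1:
        raise ValueError("k must count at least the target")
    p = Fraction(sampling_rate)
    # Target sampled, the other k - 1 look-alikes not sampled.
    target_only = p * (1 - p) ** (k - 1)
    # Any one of the k look-alikes sampled, the rest not sampled.
    exactly_one = k * p * (1 - p) ** (k - 1)
    return target_only / exactly_one


for k in (1, 2, 4):
    print(k, match_confidence(k))
```

Under these assumptions the answer is 1/k whatever the sampling rate: a unique match in the sample is only as trustworthy as the person is rare in the whole population. If a population-wide dataset shows that only one person fits the known facts (k = 1), a matching sample record cannot be a coincidental look-alike, which is why cross-referencing a second dataset raises confidence.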