Research reveals de-identified patient data can be re-identified


University of Melbourne researchers have found that confidential patient data can be re-identified without decryption, prompting calls for stronger algorithms to protect individuals’ online privacy.

The report, published today by Dr Chris Culnane, Dr Benjamin Rubinstein and Dr Vanessa Teague from the University’s School of Computing and Information Systems, outlines how de-identified historical health data from the Australian Medicare Benefits Scheme (MBS) and the Pharmaceutical Benefits Scheme (PBS), released to the public in August 2016, can be re-identified by using known information about a person to find their record.

“We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the record with known information about the individual such as medical procedures and year of birth,” Dr Culnane said.

“This shows the surprising ease with which de-identification can fail, highlighting the risky balance between data sharing and privacy.”
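The linking process Dr Culnane describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the records are fabricated, and the field names (`patient_id`, `year_of_birth`, `procedures`) are hypothetical and do not reflect the actual MBS/PBS release format or the researchers' own code. It shows only the general idea that a record can be singled out by its unencrypted attributes while the identifier stays encrypted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A toy de-identified record (hypothetical schema, fabricated data)."""
    patient_id: str        # stands in for an encrypted identifier; never decrypted
    year_of_birth: int     # left unencrypted
    procedures: frozenset  # unencrypted (procedure, year) pairs


def find_matches(records, year_of_birth, known_procedures):
    """Return every record consistent with what is publicly known about a person."""
    known = frozenset(known_procedures)
    return [
        r for r in records
        if r.year_of_birth == year_of_birth and known <= r.procedures
    ]


records = [
    Record("a91f", 1975, frozenset({("knee surgery", 2012), ("childbirth", 2009)})),
    Record("77c2", 1975, frozenset({("knee surgery", 2012)})),
    Record("e05b", 1982, frozenset({("knee surgery", 2012), ("childbirth", 2009)})),
]

# Publicly known facts about a hypothetical individual.
matches = find_matches(
    records, 1975, [("knee surgery", 2012), ("childbirth", 2009)]
)
print([r.patient_id for r in matches])  # a single match: ['a91f']
```

In this toy data, year of birth and one procedure still leave two candidates; adding a second known procedure narrows the result to a single record.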

The latest study reveals unique patient records matching the online public information of seven prominent Australians, including three (former or current) MPs and an AFL footballer. While a unique match may not always be accurate, Dr Rubinstein said confidence could be improved by cross-referencing other data.

“Because only 10 per cent of Australians are included in the sample data, there can be a coincidental resemblance to someone who isn’t included,” Dr Rubinstein said.

“We can improve confidence by cross-referencing with a second dataset of population-wide billing frequencies. We can also examine uniqueness according to the characteristics of commercial datasets we know of, such as bank billing data.”
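Dr Rubinstein's point about coincidental matches can be made concrete with a short worked calculation. This is a simplified sketch under stated assumptions, not the method used in the report: it assumes each person is included in the sample independently with the same probability, that nothing is known in advance about whether the target is included, and that population-wide data tells us how many other people share the target's known attributes (the parameter `others_in_population` is hypothetical).

```python
from fractions import Fraction


def match_confidence(sample_rate: Fraction, others_in_population: int) -> Fraction:
    """Probability that a unique match in the sample is really the target.

    Assumes independent inclusion at `sample_rate` (strictly between 0 and 1)
    and `others_in_population` other people sharing the target's attributes.
    """
    if not 0 < sample_rate < 1:
        raise ValueError("sample_rate must be strictly between 0 and 1")
    if others_in_population < 0:
        raise ValueError("others_in_population must be non-negative")
    m = others_in_population
    # Target is sampled and none of the m look-alikes are.
    target_only = sample_rate * (1 - sample_rate) ** m
    # Exactly one of the m + 1 candidates is sampled.
    exactly_one = (m + 1) * sample_rate * (1 - sample_rate) ** m
    return target_only / exactly_one


rate = Fraction(1, 10)  # the 10 per cent sample mentioned above
for m in (0, 1, 3):
    print(m, match_confidence(rate, m))
```

Under these assumptions the result simplifies to 1/(m + 1): a unique match is certain only if nobody else in the population shares the same attributes, which is why population-wide frequencies help to calibrate confidence.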

Dr Teague said there were strong reasons to improve access to high-quality, and sometimes sensitive, data to facilitate research, innovation and sound public policy. However, she argued that important technical and procedural problems remained to be solved.

“Open publication of de-identified records like health, census, tax or Centrelink data is bound to fail as it is trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records,” Dr Teague said.

“We need a much more controlled release in a secure research environment, as well as the ability to provide patients greater control and visibility over their data. Legislating against re-identification will hide, not solve, mathematical problems, and have a chilling effect on both scientific research and wider public discourse.”

Read more on Pursuit.