Export control in a digital world – is synthetic data the future for clinical studies?

Part 5 of ScienceJournalJourneys – Hi, I am your SG pharmacometrician, an early-career researcher here to share the latest trends and interesting facts in pharmacometrics.

When you bring goods to another country, it is normal for the goods to be subject to import and export controls, limiting what you can bring in and out.

Similarly, data sharing requires proper regulation. Without proper controls, data sharing could leak personal private information, with disastrous consequences for the affected individuals. However, controls that are too strict hamper research. Data sharing promotes scientific reproducibility, collaboration and even new discoveries. Researchers thus have to walk a fine line between data protection and data sharing.

At A*STAR, clinical data is often anonymized according to a list of rules under the Personal Data Protection Act (PDPA). However, because clinical data contains so much information, there is still a chance that it can be traced back to an individual. Synthetic data presents a potential solution. Based on a real-world population, a virtual population is generated with similar characteristics, but without any identifying information that could be traced back to an individual. This would make data sharing much easier, which brings us to today’s paper by JB Woillard et al. (https://pmc.ncbi.nlm.nih.gov/articles/PMC11706419/#psp413240-sec-0013) on the use of synthetic data in pharmacogenomics.
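To make the idea of a virtual population concrete, here is a minimal sketch in Python. It is not one of the methods from the paper: it simply fits a multivariate Gaussian to a toy cohort and samples new "patients" from it. The function name `make_virtual_population` and all the numbers (ages, weights, cohort size) are made up for illustration.

```python
import numpy as np

def make_virtual_population(real, n_virtual, seed=0):
    """Sample a virtual population from a Gaussian fitted to the real data.

    Assumption: a multivariate normal is a good enough description of the
    cohort. Real clinical data is rarely this well behaved.
    """
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)           # per-column means of the real cohort
    cov = np.cov(real, rowvar=False)   # covariance between columns
    return rng.multivariate_normal(mean, cov, size=n_virtual)

# Toy "real" cohort (made-up numbers): age in years, weight in kg
rng = np.random.default_rng(42)
age = rng.normal(50, 12, size=500)
weight = 60 + 0.3 * (age - 50) + rng.normal(0, 8, size=500)
real = np.column_stack([age, weight])

virtual = make_virtual_population(real, n_virtual=500)

# The virtual cohort should show similar means and a similar age-weight
# correlation, even though no virtual row is a copy of a real patient.
print("real means   :", real.mean(axis=0).round(1))
print("virtual means:", virtual.mean(axis=0).round(1))
print("real corr    :", np.corrcoef(real, rowvar=False)[0, 1].round(2))
print("virtual corr :", np.corrcoef(virtual, rowvar=False)[0, 1].round(2))
```

A Gaussian fit like this only preserves means and linear correlations, which is one reason more flexible generators are of interest for clinical data.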

In this paper, three synthetic data generation methods (Avatar, CT-GAN and TVAE) were tested for how well the synthetic data could retain population trends while preventing reidentification of individuals. In their dataset of renal transplant patients, all three methods recovered haplotype as a significant variable for the risk of graft loss, matching the original study. However, donor age and donor CYP3A5 also came up as significant with CT-GAN and augmented Avatar. The algorithms also differed in how well they estimated the hazard ratio for the haplotype variable: CT-GAN gave the closest estimate, while Avatar overestimated the hazard ratio significantly. CT-GAN also demonstrated the best performance in terms of privacy.
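How might one put a number on "privacy" for a synthetic dataset? The exact metrics used in the paper are not covered in this post, so the sketch below uses one simple and commonly used proxy as an assumption: the distance to closest record (DCR), i.e. how far each synthetic row sits from its nearest real row. The function name and the toy data are hypothetical.

```python
import numpy as np

def distance_to_closest_record(synthetic, real):
    """For each synthetic row, Euclidean distance to its nearest real row."""
    # Standardise with the real data's mean and SD so no column dominates
    mu, sd = real.mean(axis=0), real.std(axis=0)
    s = (synthetic - mu) / sd
    r = (real - mu) / sd
    # Pairwise differences: shape (n_synthetic, n_real, n_columns)
    diffs = s[:, None, :] - r[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 3))                           # toy "real" cohort
leaky = real[:50] + rng.normal(scale=0.01, size=(50, 3))   # near-copies of real rows
fresh = rng.normal(size=(50, 3))                           # independent draws

dcr_leaky = distance_to_closest_record(leaky, real)
dcr_fresh = distance_to_closest_record(fresh, real)

# Near-copies sit almost on top of real patients, so their DCR is tiny.
print("median DCR, leaky generator:", np.median(dcr_leaky))
print("median DCR, fresh generator:", np.median(dcr_fresh))
```

A very small DCR suggests that a synthetic record is close to a copy of a real patient. A large DCR on its own does not prove the data is safe, so treat this as a rough screening check rather than a guarantee.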

Overall, this is an important study evaluating the utility of synthetic data for clinical pharmacology studies. While current tools for generating synthetic data might not be ready to produce evidence for clinical decisions, they show good promise in mirroring a real-world population. As data privacy becomes an increasing concern in our rapidly digitized world, it only makes sense to develop these synthetic data methods further, so that researchers can continue making important discoveries while reducing the risk of data privacy breaches.

Hope you learnt something with me today~

Subscribe to my site to never miss a post from me! https://singaporepharmacometrics.com/

About janice goh

Dr. Janice Goh graduated from NUS Pharmacy and is a registered pharmacist with the Singapore Pharmacy Council. She recently completed her PhD in the lab of Professor Rada Savic at the University of California, San Francisco (UCSF) School of Pharmacy. She is currently a senior scientist at the Bioinformatics Institute, A*STAR. Her work focuses on quantitative systems pharmacology and translational pharmacometrics tools, capitalising on preclinical data to predict clinical outcomes prior to actual trials.