Healthcare is a challenging area for machine learning research because many datasets are restricted due to privacy concerns. Training an ML classification model with high precision for medical purposes requires advanced computing resources and sufficient training data collected from diverse groups of patients, e.g., across ages and ethnicities. To make this possible, multiple medical centers need to collaborate to train an ML model on their combined data using a remote advanced computing unit. However, sending raw sensitive data to third parties for processing is highly undesirable in the regulated health industries due to privacy concerns and business competition. For example, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a federal law that prevents the disclosure of sensitive patient health information.
Cryptographic methods such as Homomorphic Encryption and Garbled Circuits have already been considered for running an ML algorithm as a computational function on an encrypted dataset. Unfortunately, they are difficult to use in practical settings because of their high computational overhead and their need for special setups. We propose a new coding framework that does not suffer from these drawbacks. Our proposed coding framework is keyless and does not require any organization to share its encoding information with the outside world. It is based on two operations: mixing the elements of each sample to break the spatial dependencies within that sample, and mixing each sample with arbitrary noise to break the mutual dependencies among samples. We show that the encoded training samples reveal only limited information about any sample of the original dataset. Thus, our method is robust against ciphertext attacks that attempt to infer information about the original dataset samples.
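To make the two mixing steps concrete, here is a minimal toy sketch in Python. It is not the framework presented in the talk: the function name `encode_dataset`, the use of a random permutation for mixing elements within a sample, additive Gaussian noise for mixing each sample with noise, and the `noise_scale` parameter are all assumptions chosen purely for illustration. The sketch carries no privacy guarantee of its own.

```python
import numpy as np


def encode_dataset(samples, noise_scale=1.0, seed=None):
    """Toy illustration only (hypothetical, not the speaker's scheme).

    samples: 2-D array of shape (num_samples, num_features).
    The permutation and the noise are drawn locally and never returned,
    mimicking a keyless setup in which the organization does not share
    its encoding information with anyone.
    """
    rng = np.random.default_rng(seed)
    num_samples, num_features = samples.shape

    # Assumed step 1: shuffle the elements of every sample with a
    # private permutation to break spatial structure within a sample.
    private_perm = rng.permutation(num_features)
    mixed = samples[:, private_perm]

    # Assumed step 2: add independent private noise to every sample
    # to weaken the dependencies between samples.
    private_noise = rng.normal(scale=noise_scale, size=(num_samples, num_features))
    return mixed + private_noise


if __name__ == "__main__":
    data = np.arange(12, dtype=float).reshape(3, 4)
    print(encode_dataset(data, seed=0))
```

In this sketch only the encoded array leaves the organization; the permutation and the noise stay local and are discarded after encoding.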
Homa Esfahanizadeh is a postdoctoral researcher at the Massachusetts Institute of Technology (MIT). She currently works at the MIT Research Laboratory of Electronics (RLE) under the supervision of Prof. Muriel Médard, where her focus is on distributed algorithms and coding schemes for applications that demand security, low latency, and high throughput. Homa received her Ph.D. degree in Signals and Systems from the University of California, Los Angeles (UCLA) in 2019. She received her M.Sc. and B.Sc. degrees in Electrical Engineering from the University of Tehran in 2015 and 2012, respectively. Her research interests include coding theory, information theory, distributed systems, and machine learning.