An investigation of the application of autoencoders and large language models to privacy-utility tradeoffs in group-specific settings

Abstract

Machine learning models have emerged as highly effective tools for tackling an array of practical challenges, including image classification, regression, behavioral forecasting, and natural language processing. Their ability to analyze substantial volumes of data makes them valuable across diverse domains, and their widespread success has led data analysts and scientists to incorporate them extensively into predictive and generative modeling. While leveraging data can enable professionals to gain insights into user preferences and refine predictive and generative accuracy, there is a risk that sensitive personal information may be inadvertently or intentionally inferred from the data. Consequently, there is a critical need for privacy mechanisms capable of sanitizing data to shield sensitive attributes while retaining data utility. Achieving this balance is pivotal to safeguarding individual privacy amidst the complexities of the digital age.

In discussions of privacy, there is a prevalent assumption that privacy requirements are homogeneous across the population, a construct termed the single-group setting. This dissertation introduces an adversarial learning framework that employs several variants of autoencoders, specifically designed to navigate the privacy-utility tradeoff within this single-group setting. While the single-group setting is significant in numerous practical contexts, it does not consistently capture the varied requirements of different use cases. Recognizing the need to address a broader spectrum of scenarios, this dissertation also formulates the privacy-utility tradeoff problem for two distinct user groups. These heterogeneous groups comprise users characterized by differing private and utility attributes, with the aim of covering a wider array of application scenarios within the privacy-utility tradeoff domain.
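To make the adversarial optimization concrete, the sketch below shows one common way such a framework can be set up: an encoder produces a sanitized latent representation that is trained to support a utility classifier and a reconstruction objective while degrading an adversary that tries to infer a private attribute. This is a minimal illustration only; the network sizes, the PyTorch framework, the synthetic data, and the tradeoff weight `lam` are assumptions for exposition, not the dissertation's exact architecture.

```python
# Minimal adversarial-autoencoder sketch for the privacy-utility tradeoff.
# Assumed dimensions, optimizers, and the tradeoff weight `lam` are illustrative.
import torch
import torch.nn as nn

d_in, d_latent = 32, 8                          # assumed feature / latent sizes
encoder   = nn.Sequential(nn.Linear(d_in, 16), nn.ReLU(), nn.Linear(16, d_latent))
decoder   = nn.Sequential(nn.Linear(d_latent, 16), nn.ReLU(), nn.Linear(16, d_in))
utility   = nn.Linear(d_latent, 2)              # predicts the utility attribute
adversary = nn.Linear(d_latent, 2)              # tries to infer the private attribute

opt_main = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()) + list(utility.parameters()),
    lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
ce, mse, lam = nn.CrossEntropyLoss(), nn.MSELoss(), 1.0   # lam weights the privacy term

# Toy batch standing in for user data with utility and private labels.
x      = torch.randn(64, d_in)
y_util = torch.randint(0, 2, (64,))
y_priv = torch.randint(0, 2, (64,))

for step in range(200):
    # 1) Adversary step: learn to infer the private attribute from the latent code.
    z = encoder(x).detach()
    opt_adv.zero_grad()
    ce(adversary(z), y_priv).backward()
    opt_adv.step()

    # 2) Main step: reconstruct and predict utility while *increasing* the
    #    adversary's loss, so the latent code hides the private attribute.
    opt_main.zero_grad()
    z = encoder(x)
    loss = mse(decoder(z), x) + ce(utility(z), y_util) - lam * ce(adversary(z), y_priv)
    loss.backward()
    opt_main.step()
```

In this alternating scheme, raising or lowering `lam` shifts the operating point along the privacy-utility tradeoff: larger values push the encoder to suppress information about the private attribute at some cost to reconstruction and utility accuracy.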

Furthermore, the landscape of computer science is undergoing a significant transformation, particularly in light of the recent surge in large language models (LLMs). LLMs are rapidly evolving, necessitating a thorough examination of the impact of these advancements on the privacy domain. Thus, this dissertation investigates whether LLMs, endowed with diverse capabilities, can be effectively employed for data sanitization to uphold user privacy. The straightforwardness of this approach, which requires no specialized expertise, underscores the adaptability of large language models in addressing privacy concerns.
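As an illustration of how simple such LLM-based sanitization can be in practice, the sketch below prompts a general-purpose chat model to rewrite a record so that a chosen private attribute is harder to infer while utility-relevant content is preserved. The prompt wording, the example record, the model name, and the use of the OpenAI Python client are assumptions for illustration, not the dissertation's specific procedure.

```python
# Illustrative LLM-based sanitization sketch; prompt, record, and model name are placeholders.
from openai import OpenAI   # assumes the `openai` Python package and an API key are available

client = OpenAI()           # reads the API key from the OPENAI_API_KEY environment variable

record = "I spend most evenings at the campus gym and track my meals in a fitness app."
private_attribute = "gender"
utility_attribute = "fitness interests"

prompt = (
    f"Rewrite the following text so that a reader cannot infer the author's "
    f"{private_attribute}, while keeping all information relevant to their "
    f"{utility_attribute}. Return only the rewritten text.\n\nText: {record}"
)

response = client.chat.completions.create(
    model="gpt-4o",                                    # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)             # the sanitized record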

Collectively, this dissertation seeks to enhance the understanding of the privacy-utility tradeoff in different group-specific settings through adversarial learning frameworks and large language models. The findings underscore the potential of these methodologies to provide robust privacy protections while preserving or even enhancing the utility of data across a wide range of applications.

Keywords

data privacy, inference privacy, machine learning, large language models, adversarial optimization

Graduation Month

August

Degree

Doctor of Philosophy

Department

Department of Computer Science

Major Professor

George Amariucai

Date

2024

Type

Dissertation
