A research conducted by Google DeepMind and numerous universities looks at how easily an outsider, without any prior knowledge of the data used to train a machine learning model, can get this information just by asking the model questions. We found that an adversary can pull out a lot of data, even gigabytes, from publicly available language models like Pythia or GPT-Neo, semi-public ones like LLaMA or Falcon, and private models like ChatGPT.
Key Findings:
- Vulnerabilities in Language Models: The research identifies vulnerabilities across various types of Language Models, ranging from open source (Pythia) to closed models (ChatGPT), and semi-open models (LLaMa). The vulnerabilities in semi-open and closed models are particularly concerning due to the non-public nature of their training data.
- Focus on Extractable Memorization: The study delves into the risks of extractable memorization, where data can be efficiently extracted from a machine learning model without prior knowledge of the training dataset.
- Enhanced Data Extraction Capabilities: The attack model developed by the researchers enables the extraction of training data at rates exceeding 150% compared to normal Language Model usage.
- Ineffectiveness of Data Deduplication: The research indicates that deduplication of training data does not significantly reduce the amount of data that can be extracted.
- Uncertainties in Data Handling: The study highlights ongoing uncertainties in how training data is processed and retained by Language Models.
Click here to read more