California Passes AI Training Data Transparency Bill

The California legislature has passed Bill AB 2013, mandating developers of artificial intelligence systems to disclose the data used to train their models. The bill is now going to Gov. Gavin Newsom for approval.

  • California's legislature has passed a bill requiring AI developers to disclose the data used to train their models.

  • Bill AB 2013 aims to enhance transparency and accountability in AI development.

  • The bill now goes to Gov. Gavin Newsom for approval.

  • Separately, the U.S. and EU remain split over how to treat personal data embedded in AI models.

California has passed Bill AB 2013, a pioneering law mandating that developers of artificial intelligence systems disclose the data used to train their models, Bloomberg Law reports.

The bill, initiated by Assembly member Jacqui Irwin (D) and officially titled “AB-2013 Generative artificial intelligence: training data transparency,” was passed by the state’s legislature on Tuesday, August 27.

What’s next: The bill now goes to Gov. Gavin Newsom (D) for approval; he has not yet weighed in on it.

If signed, this would be the most comprehensive law in the nation governing training data transparency, though it was narrowed earlier this month to apply only to generative AI, the kind that creates text, images, and similar content, such as OpenAI’s ChatGPT.

What the law provides: The legislation is designed to promote greater transparency and accountability within the AI sector, which has faced growing scrutiny over issues related to bias, privacy, and ethical standards.

Enhanced accountability: The bill requires AI developers and companies to provide detailed information about the datasets used in training, including the sources and types of data.

Why it matters: This measure is expected to help identify potential biases and ensure that AI technologies are developed in a responsible manner. The bill would position California at the forefront of AI regulatory standards in the United States, surpassing the existing provisions found in Colorado’s AI law.

Criticism: The tech industry has voiced concern that disclosing AI training data could expose trade secrets and create competitive disadvantages.

Personal Data in AI Models: Debate Ongoing

Meanwhile, the U.S. and the European Union have differing approaches when it comes to the presence of personal data in AI models, as research on large language models’ (LLMs) memorization capacity continues, according to Bloomberg Law.

The crux: The Hamburg Commissioner for Data Protection and Freedom of Information recently published a paper suggesting that AI models don’t memorize personal information like names and birth dates.

Why it matters: If the German commissioner’s findings take hold across Europe, individuals could lose the ability to exercise their rights to access, correct, or delete personal data once it has been ingested by an LLM.

The Controversy

The German commissioner’s views drew criticism from privacy advocates and tech experts who have demonstrated that LLMs can memorize personal data and regurgitate it when prompted.

Researchers from institutions such as Cornell University; University of California, Berkeley; and Google DeepMind got ChatGPT to divulge 10,000 examples of identifiable data using a $200 budget.

California lawmakers are also working to amend the state’s Consumer Privacy Act to clarify that the law’s protections apply to personal information in whatever digital format it takes, including in AI systems.

Welcome to Legal.io

Connect with peers, level up your skills, and find jobs at the world's best in-house legal departments
