Shine the Light: California Passes New Transparency Law for Generative AI
September 4, 2024
The California legislature just passed a law that will require developers of generative AI that is made publicly available to Californians to provide transparency regarding the data used to train that AI. If signed by Governor Newsom, Assembly Bill 2013 would be the most comprehensive law in the United States regarding AI data transparency. It would require developers of generative AI, including governmental entities that provide such AI to the public, to provide a high-level disclosure of the datasets used to train that generative AI.
The law was originally drafted to apply more generally to AI, but was narrowed to apply only to “generative artificial intelligence,” which is defined as artificial intelligence[1] that can generate derived synthetic content, such as text, images, video, and audio, that emulates the structure and characteristics of the artificial intelligence’s training data. The best-known example of a generative AI program is ChatGPT.
The law would apply to a generative artificial intelligence system or service, or a substantial modification to such a system or service, that is made publicly available to Californians for use. It would require the developer to post on its internet website documentation regarding the data used by the developer to train the generative artificial intelligence system or service[2], including, but not limited to, all of the following:
- The sources or owners of the datasets;
- A description of how the datasets further the intended purpose of the artificial intelligence system or service;
- The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;
- A clear description of each category associated with the types of data points within the datasets, including the format of data points and sample values. For purposes of this paragraph, the following definitions apply:
  - As applied to datasets that include labels, “types of data points” means the types of labels used;
  - As applied to datasets without labeling, “types of data points” refers to the general characteristics;
- Whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain;
- Whether the datasets were purchased or licensed by the developer;
- Whether the datasets include personal information, as defined in the CCPA;
- Whether the datasets include aggregate consumer information, as defined in the CCPA;
- A description of whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence system or service;
- The time period during which the data in the datasets were collected, including a notice if the data collection is ongoing;
- The dates the datasets were first used during the development of the artificial intelligence system or service; and
- Whether the generative artificial intelligence system or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the system or service.
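For developers who want to track these items internally, the list above can be sketched as a simple record with one entry per training dataset. This is only an illustrative sketch: the bill does not prescribe any file format or schema, and the class name, field names, and helper function below are all hypothetical labels chosen to mirror the bullets, not terms taken from the statute.

```python
from dataclasses import dataclass, fields, asdict
from typing import Optional
import json


@dataclass
class DatasetDisclosure:
    """One record per training dataset. All field names are hypothetical."""
    sources_or_owners: Optional[str] = None
    purpose_description: Optional[str] = None
    data_point_count_range: Optional[str] = None  # general ranges or estimates
    data_point_types: Optional[str] = None        # label types or general characteristics
    includes_ip_protected_data: Optional[bool] = None
    purchased_or_licensed: Optional[bool] = None
    includes_personal_information: Optional[bool] = None
    includes_aggregate_consumer_information: Optional[bool] = None
    cleaning_or_processing: Optional[str] = None
    collection_period: Optional[str] = None       # note here if collection is ongoing
    first_used: Optional[str] = None
    uses_synthetic_data: Optional[bool] = None
    synthetic_data_purpose: Optional[str] = None  # described in the list as optional


# Items the list above frames as something a developer "may" include.
OPTIONAL_FIELDS = {"synthetic_data_purpose"}


def missing_items(record: DatasetDisclosure) -> list:
    """Return the names of non-optional fields that have not been filled in."""
    return [
        f.name
        for f in fields(record)
        if f.name not in OPTIONAL_FIELDS and getattr(record, f.name) is None
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only; they describe no real dataset.
    draft = DatasetDisclosure(
        sources_or_owners="Example Corp (placeholder)",
        purchased_or_licensed=True,
    )
    print(json.dumps(asdict(draft), indent=2))
    print("Still to complete:", missing_items(draft))
```

A record like this only helps a developer see which items remain unanswered for a given dataset; whether the resulting documentation satisfies the law is a separate legal question.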
The disclosures must be made on or before January 1, 2026, and the requirement applies each time a generative artificial intelligence system or service, or a substantial modification of such a system or service, is released on or after January 1, 2026.
The obligations apply to developers of generative AI. A developer is any person, partnership, state or local government agency, or corporation that designs, codes, produces, or substantially modifies an artificial intelligence system or service for use by members of the public.
The disclosure requirements do not apply to a generative artificial intelligence system or service (a) whose sole purpose is to help ensure security and integrity, such as to detect security incidents; resist malicious, deceptive, fraudulent, or illegal actions; or ensure the physical safety of a natural person; (b) whose sole purpose is the operation of aircraft in the national airspace; or (c) that was developed for national security, military, or defense purposes and is made available only to a federal entity.
[1] Artificial intelligence is defined as an engineered or machine-based system that varies in its level of autonomy and that can, for explicit or implicit objectives, infer from the input it receives how to generate outputs that can influence physical or virtual environments.
[2] Training, as used here, includes the testing, validating, or fine tuning by the developer of the artificial intelligence system or service.
This memorandum is a summary for general information and discussion only and may be considered an advertisement for certain purposes. It is not a full analysis of the matters presented, may not be relied upon as legal advice, and does not purport to represent the views of our clients or the Firm. Scott W. Pink, an O’Melveny Special Counsel licensed to practice law in California and Illinois; Randall W. Edwards, an O’Melveny Partner licensed to practice law in California; Sid Mody, an O’Melveny Partner licensed to practice law in Texas; Jonathan P. Schneller, an O’Melveny Partner licensed to practice law in California; Kevin Feder, an O’Melveny Partner licensed to practice law in the District of Columbia and California; and Mark Liang, an O’Melveny Partner licensed to practice law in California, contributed to the content of this newsletter. The views expressed in this newsletter are the views of the authors except as otherwise noted.
© 2024 O’Melveny & Myers LLP. All Rights Reserved. Portions of this communication may contain attorney advertising. Prior results do not guarantee a similar outcome. Please direct all inquiries regarding New York’s Rules of Professional Conduct to O’Melveny & Myers LLP, 1301 Avenue of the Americas, Suite 1700, New York, NY, 10019, T: +1 212 326 2000.