The Australian Information Commissioner releases guidelines
October 21, 2024
AI presents a major regulatory challenge across a range of governmental and private activities, and that is especially the case with privacy. The UK Information Commissioner’s Office has issued detailed guidance and other resources on artificial intelligence. The US Federal Trade Commission has raised issues with AI through a Big Data report in 2016, a post in 2017, Q&A guidance in 2020, and a finding on the use of artificial intelligence in In the Matter of DoNotPay, Inc., Matter Number 2323042, September 25, 2024. Which brings us to the Australian Information Commissioner’s release of AI guidance today. There are in fact two guides: the first covers the use of commercially available AI products; the second relates to developers using personal information to train AI models.
AI needs personal information to work properly. Lots of it. Each of the guides highlights the care that needs to be taken in considering the operation of the Privacy Act when using and developing artificial intelligence.
The media release provides:
New guides for businesses published today by the Office of the Australian Information Commissioner (OAIC) clearly articulate how Australian privacy law applies to artificial intelligence (AI) and set out the regulator’s expectations.
The first guide will make it easier for businesses to comply with their privacy obligations when using commercially available AI products and help them to select an appropriate product. The second provides privacy guidance to developers using personal information to train generative AI models.
“How businesses should be approaching AI and what good AI governance looks like is one of the top issues of interest and challenge for industry right now,” said Privacy Commissioner Carly Kind.
“Our new guides should remove any doubt about how Australia’s existing privacy law applies to AI, make compliance easier, and help businesses follow privacy best practice. AI products should not be used simply because they are available.
“Robust privacy governance and safeguards are essential for businesses to gain advantage from AI and build trust and confidence in the community,” she said.
The new guides align with OAIC focus areas of promoting privacy in the context of emerging technologies and digital initiatives, and improving compliance through articulating what good looks like.
“Addressing privacy risks arising from AI, including the effects of powerful generative AI capabilities being increasingly accessible across the economy, is high among our priorities,” Commissioner Kind said.
“Australians are increasingly concerned about the use of their personal information by AI, particularly to train generative AI products.
“The community and the OAIC expect organisations seeking to use AI to take a cautious approach, assess risks and make sure privacy is a key consideration. The OAIC reserves the right to take action where it is not.”
While the guidance addresses the current situation – concerning the law, state of technology and practices – Commissioner Kind said an important focus remains how AI privacy protections could be strengthened for the benefit of society as a whole.
“With developments in technology continuing to evolve and challenge our right to control our personal information, the time for privacy reform is now,” said Commissioner Kind.
“In particular, the introduction of a positive obligation on businesses to ensure personal information handling is fair and reasonable would help to ensure uses of AI pass the pub test.”
The OAIC has published a blog post with further information about the privacy guidance for developers using personal information to train generative AI models.
The first guide provides:
Top five takeaways
- Privacy obligations will apply to any personal information input into an AI system, as well as the output data generated by AI (where it contains personal information). When looking to adopt a commercially available product, organisations should conduct due diligence to ensure the product is suitable to its intended uses. This should include considering whether the product has been tested for such uses, how human oversight can be embedded into processes, the potential privacy and security risks, as well as who will have access to personal information input or generated by the entity when using the product.
- Businesses should update their privacy policies and notifications with clear and transparent information about their use of AI, including ensuring that any public facing AI tools (such as chatbots) are clearly identified as such to external users such as customers. They should establish policies and procedures for the use of AI systems to facilitate transparency and ensure good privacy governance.
- If AI systems are used to generate or infer personal information, including images, this is a collection of personal information and must comply with APP 3. Entities must ensure that the generation of personal information by AI is reasonably necessary for their functions or activities and is only done by lawful and fair means. Inferred, incorrect or artificially generated information produced by AI models (such as hallucinations and deepfakes), where it is about an identified or reasonably identifiable individual, constitutes personal information and must be handled in accordance with the APPs.
- If personal information is being input into an AI system, APP 6 requires entities to only use or disclose the information for the primary purpose for which it was collected, unless they have consent or can establish the secondary use would be reasonably expected by the individual, and is related (or directly related, for sensitive information) to the primary purpose. A secondary use may be within an individual’s reasonable expectations if it was expressly outlined in a notice at the time of collection and in your business’s privacy policy.
- As a matter of best practice, the OAIC recommends that organisations do not enter personal information, and particularly sensitive information, into publicly available generative AI tools, due to the significant and complex privacy risks involved.
Quick reference guide
- The Privacy Act applies to all uses of AI involving personal information.
- This guidance is intended to assist organisations to comply with their privacy obligations when using commercially available AI products, and to assist with selection of an appropriate product. However, it also addresses the use of AI products which are freely available, such as publicly accessible AI chatbots.
- Although this guidance applies to all types of AI systems involving personal information, it will be particularly useful in relation to the use of generative AI and general-purpose AI tools, as well as other uses of AI with a high risk of adverse impacts. It does not cover all privacy issues and obligations in relation to the use of AI, and should be considered together with the Privacy Act 1988 (Privacy Act) and the Australian Privacy Principles guidelines.
- A number of uses of AI are low-risk. However, the use of personal information in AI systems is a source of significant community concern and depending on the use case, may be a high privacy risk activity. The OAIC, like the Australian community, therefore expects organisations seeking to use AI to take a cautious approach to these activities and give due regard to privacy in a way that is commensurate with the potential risks.
Selecting an AI product
- Organisations should consider whether the use of personal information in relation to an AI system is necessary and the best solution in the circumstances – AI products should not be used simply because they are available.
- When looking to adopt a commercially available product, businesses should conduct due diligence to ensure the product is suitable to its intended uses. This should include considering whether the product has been tested for such uses, how human oversight can be embedded into processes, the potential privacy and security risks, as well as who will have access to personal information input or generated by the entity when using the product.
- Due diligence for AI products should not amount to a ‘set and forget’ approach. Regular reviews of the performance of the AI product itself, training of staff and monitoring should be conducted throughout the entire AI product lifecycle to ensure a product remains fit for purpose and that its use is appropriate and complies with privacy obligations.
Privacy by design
- Organisations considering the use of AI products should take a ‘privacy by design’ approach, which includes conducting a Privacy Impact Assessment.
Transparency
- Organisations should also update their privacy policies and notifications with clear and transparent information about their use of AI, including ensuring that any public facing AI tools (such as chatbots) are clearly identified as such to users. They should establish policies and procedures for the use of AI systems to facilitate transparency and ensure good privacy governance.
Privacy risks when using AI
- Organisations should be aware of the different ways they might be handling personal information when using AI systems. Privacy obligations will apply to any personal information input into an AI system, as well as the output data generated by AI (where it contains personal information).
- Personal information includes inferred, incorrect or artificially generated information produced by AI models (such as hallucinations and deepfakes), where it is about an identified or reasonably identifiable individual.
- If personal information is being input into an AI system, APP 6 requires entities to only use or disclose the information for the primary purpose for which it was collected, unless they have consent or can establish the secondary use would be reasonably expected by the individual, and is related (or directly related, for sensitive information) to the primary purpose.
- A secondary use may be within an individual’s reasonable expectations if it was expressly outlined in a notice at the time of collection and in your organisation’s privacy policy. Whether APP 5 notices or privacy policies were updated, or other information was given at a point in time after the collection, may also be relevant to this assessment. It is possible for an individual’s reasonable expectations in relation to secondary uses to change over time.
- Given the significant privacy risks presented by AI systems, it may be difficult to establish reasonable expectations for an intended use of personal information for a secondary, AI-related purpose. Where an organisation cannot clearly establish that such a secondary use was within reasonable expectations, to avoid regulatory risk they should seek consent for that use and/or offer individuals a meaningful and informed ability to opt-out.
- As a matter of best practice, the OAIC recommends that organisations do not enter personal information, and particularly sensitive information, into publicly available AI chatbots and other publicly available generative AI tools, due to the significant and complex privacy risks involved.
- If AI systems are used to generate or infer personal information, this is a collection of personal information and must comply with APP 3. Entities must ensure that the generation of personal information by AI is reasonably necessary for their functions or activities and is only done by lawful and fair means.
- Organisations must take particular care with sensitive information, which generally requires consent to be handled. Many photographs or recordings of individuals (including artificially generated ones) contain sensitive information and therefore may not be able to be generated by, or used as input data for, AI systems without the individual’s consent. Consent cannot be implied merely because an individual was notified of a proposed collection of personal information.
- The use of AI in relation to decisions that may have a legal or similarly significant effect on an individual’s rights is likely a high privacy risk activity, and particular care should be taken in these circumstances, including considering accuracy and the appropriateness of the product for the intended purpose.
Accuracy
- AI systems are known to produce inaccurate or false results. This risk is especially high with generative AI, which is probabilistic in nature and does not ‘understand’ the data it handles or generates.
- Under APP 10, organisations have an obligation to take reasonable steps to ensure the personal information collected, used and disclosed is accurate. Organisations must consider this obligation carefully and take reasonable steps to ensure accuracy, commensurate with the likely increased level of risk in an AI context, including the appropriate use of disclaimers or other tools such as watermarks.
Overview
Who is this guidance for?
This guidance is targeted at organisations that are deploying AI systems that were built with, or that collect, store, use or disclose, personal information. A ‘deployer’ is any individual or organisation that supplies or uses an AI system to provide a product or service.[1] Deployment can be for internal purposes or for external use that affects others, such as customers or individuals, who are not deployers of the system. If your organisation is using AI to provide a product or service, including internally within your organisation, then you will be a deployer.
This guidance is intended to assist organisations to comply with their privacy obligations when using commercially available AI products. Common types of AI tools and products currently being deployed by Australian entities include chatbots, content-generation tools (including text-to-image generators), and productivity assistants that augment writing, coding, note-taking, and transcription. While most of the content in the guidance is specific to situations in which an organisation considers purchasing an AI product, it also addresses the use of AI products which are freely available, such as publicly accessible AI chatbots.
The OAIC has separate guidance on privacy and developing and training generative AI models.
How to use this guidance
This guidance is not intended to be a comprehensive overview of all relevant privacy risks and obligations that apply to the use of AI. It aims to highlight the key privacy considerations and APP requirements that your business should have in mind when selecting and using an AI product. It does not address considerations from other regulatory regimes that may apply to the use of AI systems.[2]
While the Privacy Act applies to all uses of AI which involve the handling of personal information, this guidance will be particularly useful in relation to the use of generative AI tools and general-purpose AI tools involving personal information, as well as other uses of AI with a high risk of adverse impacts.
It is important to recognise that generative AI systems may carry particular privacy risks due to their probabilistic nature, reliance on large amounts of training data and vulnerability to malicious uses.[3] However, a number of significant privacy risks can arise in relation to both traditional and generative AI systems, depending on the design and use. It can also be difficult for users to clearly distinguish between generative and traditional AI systems, which can sometimes contain similar features. Further, in some use cases, generative and traditional AI are being combined – for example, in the marketing context traditional AI can be used to identify customer segments for personalised campaigns, with generative AI then used to create the personalised marketing content.
Organisations should therefore take a proportionate and risk-based approach to the selection and use of any AI products. This guidance sets out questions aimed at assisting your organisation with this process.
This guidance does not have to be read from start to finish – you can use the topic headings to navigate to the sections of interest to you. Most sections conclude with a list of practical tips, which draw together the key learnings from each section. We have also included case studies and examples in each section to illustrate the way that the APPs may apply.
Finally, there is a Quick Reference Guide and two Checklists to help guide your business in the selection and use of AI products and summarise the obligations discussed.
Introductory terms
While there is no single agreed definition of artificial intelligence (AI), as a general term it refers to the ability of machines to perform tasks which normally require human intelligence.[4] Although AI has existed in different forms for many decades, there have been significant technological advances made in recent years which have led to the emergence of AI models which apply advanced machine learning to increasingly sophisticated uses. [5]
More technically, AI refers to ‘a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment’.[6] An AI model is the ‘raw, mathematical essence that is often the ‘engine’ of AI applications’ such as GPT-4, while an AI system is ‘the ensemble of several components, including one or more AI models, that is designed to be particularly useful to humans in some way’ such as the ChatGPT app.[7]
There are many different kinds of AI.[8] General-purpose AI is a type of AI model that is capable of being used, or capable of being adapted for use, for a variety of purposes, both for direct use as well as for integration in other systems.[9] ChatGPT and DALL-E are examples of general-purpose AI systems (as well as being generative AI, see further below). There are also narrow AI models, which are focused on defined tasks and uses to address a specific problem.[10] Examples of narrow AI systems include AI assistants such as Apple’s Siri or Google Assistant, and many facial recognition tools.
Generative AI refers to ‘an AI model with the capability of learning to generate content such as images, text, and other media with similar properties to its training data’ and systems built on such models.[11] Large language models (LLMs) and multimodal foundation models (MFMs) are both examples of generative AI. An LLM is ‘a type of generative AI that specialises in the generation of human-like text’.[12] Some examples of products or services incorporating LLMs are Meta AI Assistant, ChatGPT, Microsoft Copilot and HuggingChat.[13]
An MFM is ‘a type of generative AI that can process and output multiple data types (e.g. text, images, audio)’.[14] Some examples of products or services incorporating MFMs that are image or video generators include DALL-E 3, Firefly, Jasper Art, Synthesia, Midjourney and Stable Diffusion. Some examples of products or services incorporating MFMs that are audio generators include Riffusion, Suno, Lyria and AudioCraft.
Artificial intelligence and privacy
Artificial intelligence (AI) has the potential to benefit the Australian economy and society, by improving efficiency and productivity across a wide range of sectors and enhancing the quality of goods and services for consumers.[15] However, the data-driven nature of AI technologies, which rely on large data sets that often include personal information, can also create new specific privacy risks, amplify existing risks and lead to serious harms.[16]
It is important that organisations understand these potential risks when considering the use of commercially available AI products. Entities that choose to use AI need to:
- Consider whether the product’s development involved (or its use will involve) personal information
- If it does involve the processing of personal information, ensure that they embed privacy into their systems both when selecting and using AI products.
Existing AI systems can perform a range of tasks such as:
- automatically summarising information or documents
- undertaking data analysis
- providing customer services
- producing content (including editing or creating images, video and music)
- generating code, such as through the use of AI coding assistants, and
- providing speech translation and transcription.[17]
These applications are continuing to change and expand as the technologies develop.
The risks of AI will differ depending on the particular use case and the types of information involved. For example, the use of AI in the healthcare sector may pose particular privacy risks and challenges due to the sensitivity of health information.
How does the Privacy Act apply?
The Privacy Act 1988 and the Australian Privacy Principles (APPs) apply to all uses of AI involving personal information, including where information is used to train, test or use an AI system. If your organisation is covered by the Privacy Act, you will need to understand your obligations under the APPs when using AI. This includes being aware of the different ways that your organisation may be collecting, using and disclosing personal information when interacting with an AI product.
What is personal information?
Personal information includes a broad range of information, or an opinion, that could identify an individual. This may include information such as a person’s name, contact details and images or videos where a person is identifiable. What is personal information will vary, depending on whether a person can be identified or is reasonably identifiable in the circumstances. Personal information is a broad concept and includes information which can reasonably be linked with other information to identify an individual.
Sensitive information is a subset of personal information that is generally afforded a higher level of privacy protection. Examples of sensitive information include photographs or videos where sensitive information such as race or health information can be inferred, as well as information about an individual’s political opinions or religious or philosophical beliefs.
Importantly, information can be personal information whether or not it is true. This may include false information generated by an AI system, such as hallucinations or deepfakes.[18]
Privacy risks and harms
The use of both traditional and generative AI technologies carries a number of significant privacy risks. Specific risks are discussed throughout this guidance, but can include:
- Bias and discrimination: As AI systems learn from source data which may contain inherent bias, this bias may be replicated in their outputs through inferences made based on gender, race or age and have discriminatory effects.[19] AI outputs can often appear credible even when they produce errors or false information.
- Lack of transparency: The complexity of many AI systems can make it difficult for entities to understand and explain how personal information is used and how decisions made by AI products are reached. In some cases, even developers of AI models may not fully understand how all aspects of the system work. This creates significant challenges in ensuring the transparency and explainability of outputs of AI systems, particularly where these involve personal information.
- Re-identification: The use of aggregated data drawn from multiple data sets raises questions about the potential for individuals to be re-identified through the use of AI.[20]
- Risk of disclosure of personal information through a data breach: The vast amounts of data collected and stored by many AI models, particularly generative AI, may increase the risks related to data breaches.[21] This could be through unauthorised access to the training dataset or through attacks designed to make a model regurgitate its training dataset.[22]
- Individuals losing control over their personal information: Many generative AI technologies are trained on large amounts of public data, including the personal information of individuals, which is likely to be collected without their knowledge and consent.[23] It can be difficult for individuals to identify when their personal information is used in AI systems and to request the correction or deletion of this information. These risks will also arise in relation to some traditional AI systems which are trained on public data containing personal information, such as facial recognition systems.
Generative AI can carry particular privacy risks, such as the following:
- Misuse of generative AI systems: The capabilities of AI models can be misused through malicious actors building AI systems for improper purposes, or the AI model or end users of AI systems misusing them, with potential impacts on individual privacy or broader negative consequences including through:[24]
- Generating disinformation at scale, such as through deepfakes
- Scams and identity theft
- Generating harmful or illegal content, such as image-based abuse, which can be facilitated through the accidental or unintended collection and use of harmful or illegal material, such as child sexual abuse material, to train AI systems[25]
- Generating harmful or malicious code that can be used in cyber attacks or other criminal activity.
- Other inaccuracies: Issues in relation to the accuracy or quality of the training data (including as a result of data poisoning)[26] and the predictive nature of generative AI models can lead to outputs that are inaccurate but appear credible.[27] Feedback loops can cause the accuracy and reliability of an AI model to degrade over time.[28] Inaccuracies in output can have flow-on consequences that depend on the context, including reputational harm, misinformation or unfair decisions.
Example – AI chatbot regurgitating personal information
A work health and safety training company used an AI chatbot to generate fictional scenarios of psychosocial hazards to be used in a course delivered at an Australian prison. One of the ‘fictional’ case studies generated by the chatbot was in fact a real scenario involving a former employee of the prison, and included the full names of persons involved as well as details from an ongoing court case.[29]
This example highlights the risks of AI systems regurgitating personal information from their training data even when prompted for fictional examples, creating a range of potential privacy compliance and ethical risks. This risk can be exacerbated when the system has been trained on limited data relevant to the prompt.
Interaction with Voluntary AI Safety Standard
The National AI Centre has developed a Voluntary AI Safety Standard to help organisations develop and deploy AI systems in Australia safely and reliably. The standard consists of 10 voluntary guardrails that apply to all organisations across the AI supply chain. It does not seek to create new legal obligations, but rather helps organisations deploy and use AI systems in accordance with existing Australian laws. The information in this guidance is focussed on compliance with the Privacy Act, but will also assist organisations in addressing the guardrails in the Standard. For more information, see: www.industry.gov.au/publications/voluntary-ai-safety-standard.
What should organisations consider when selecting an AI product?
When looking to adopt a commercially available AI product, your organisation must ensure that it has enough information to understand how the product works and the potential risks involved, including in relation to privacy. Failing to conduct appropriate due diligence may create a range of risks, including that your organisation will deploy a product which is unsuited to its intended uses and does not produce accurate responses.
Practising ‘privacy by design’ is the best way to ensure your organisation engages with AI products in a responsible way that protects personal information and maintains trust in your products or services. This means building the management of privacy risks into your systems and processes from the beginning, rather than at the end.
A Privacy Impact Assessment will assist your organisation to understand the impact that your use of a particular AI product may have on the privacy of individuals and identify ways to manage, minimise or eliminate those impacts. For more information, see our Guide to undertaking privacy impact assessments and our Undertaking a privacy impact assessment e-learning course.
The following are matters which should be considered when selecting an AI product. This is focused on privacy-related issues and is not a comprehensive list – organisations should consider whether there are obligations or due diligence requirements arising from other frameworks which may also be relevant.
Is the product appropriate for its intended uses?
When selecting an AI product, your organisation should clearly identify the ways it intends to use AI and evaluate whether the particular product is appropriate. This includes considering whether the product has been tested and proven for such uses.
You should consider whether the intended uses are likely to constitute high privacy risk activities. This may be the case, for example, if the AI system will be used in relation to making decisions that may have a legal or similarly significant effect on an individual’s rights. In circumstances of high privacy risk, it is particularly important to ensure the appropriateness of the product for the intended purpose. This will include considering whether the behaviour of the AI system can be understood or explained clearly by your organisation.
To meet your organisation’s privacy obligations, particularly with respect to accuracy under APP 10, you should ensure you understand what data sources the AI system has been trained on, to assess whether the training data will be sufficiently diverse, relevant and reliable for your organisation’s intended uses. This will also help you to understand the potential risks of bias and discrimination. For example, if the AI product has not been trained on Australian data sets you should consider whether your organisation’s use of the product is likely to comply with your accuracy obligations under APP 10. Before deploying AI products, particularly customer-facing products such as chatbots, you should carefully test them to understand the risks of inaccurate or biased answers.
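As a purely illustrative aid (not part of the OAIC guidance), the sketch below shows one minimal way an organisation might run pre-deployment accuracy checks against a chatbot: a handful of test prompts paired with facts an accurate answer must contain. The `query_chatbot` stub and the test cases are assumptions for illustration; in practice it would wrap the vendor's actual API and a much larger, domain-specific test set.

```python
# Minimal sketch of a pre-deployment accuracy check for a chatbot.
# `query_chatbot` is a placeholder; in practice it would call the
# vendor's API for the AI product under evaluation.

def query_chatbot(prompt: str) -> str:
    # Placeholder response used only to make the sketch runnable.
    return "We are open 9am to 5pm, Monday to Friday."

# Each test pairs a prompt with substrings an accurate answer must contain.
TEST_CASES = [
    ("What are your opening hours?", ["9am", "5pm"]),
    ("Do you offer refunds on sale items?", ["14 days"]),
]

def run_accuracy_checks() -> list[str]:
    failures = []
    for prompt, expected in TEST_CASES:
        answer = query_chatbot(prompt)
        missing = [term for term in expected if term.lower() not in answer.lower()]
        if missing:
            failures.append(f"{prompt!r}: answer missing {missing}")
    return failures

if __name__ == "__main__":
    for failure in run_accuracy_checks():
        print("FAIL:", failure)
```

Failed checks like the refunds example above would prompt further testing, vendor follow-up or a decision not to deploy the product for that use.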
You should take steps to understand the limitations of the system as well as the protections that the developer has put in place. However, while developers may implement mitigations or safeguards to address these limitations – such as to prevent an AI system from producing or ‘surfacing’ personal information in its output – these will not be foolproof. Your business will need to implement its own processes to ensure the use of AI complies with your privacy obligations – these are discussed further below.
If your organisation intends to fine-tune an existing AI system for your specific purposes, the OAIC’s guidance on developing and training generative AI models sets out further privacy considerations and requirements.
For more information about your accuracy obligations under APP 10, see ‘What obligations do we have to ensure the accuracy of AI systems?’ below.
Example – Accuracy considerations when selecting an AI product
AI technology is increasingly being applied to healthcare, from AI-generated diagnostics to algorithms for health record analysis or disease prediction.[32] While AI may have potential benefits for healthcare, users should be aware of the potential for bias and inaccuracy. For example, an AI algorithm that has been trained with data from Swiss patients may not perform well in Australia, as patient population and treatment solutions may differ.
The impact of an algorithm applying biased information may lead to worsening health inequity for an already vulnerable section of the population or supply incorrect diagnoses.[33] It is important that your organisation is aware of how the AI software was developed, including the source data used to train and fine-tune the model, to ensure it is appropriate for your intended use.
What are the potential security risks?
When assessing a new product, you should consider your organisation’s obligations under APP 11 to take reasonable steps to protect the personal information you hold from misuse, interference and loss, as well as unauthorised access, modification or disclosure. You should take active steps to ensure that the use of an AI system will not expose your organisation’s personal information holdings to additional security risks.
Consider the general safety of the product, including whether there are any known or likely security risks. It may be helpful to run a search for any past security incidents. You should also assess what security measures have been put in place by the product owner to detect and protect against threats and attacks. For example, you may consider whether a generative AI product fine-tuned with your organisation’s data has been developed to block malicious or unlawful queries intended to make the product regurgitate its training data.
Organisations should also think about the intended operating environment for the AI product. For example, if a generative AI product is to be integrated into the organisation’s systems and have access to its documents then you will need to consider whether this will occur within a secure environment on the entity’s premises or hosted through the cloud. If the generative AI system will be deployed through the cloud, it will be necessary to consider the security and privacy risks of this, including where the servers are located and whether personal information could be disclosed outside of Australia.
For example, an AI coding assistant may be used to assist with software development such as by taking over recurring tasks in the development process, formatting and documenting code, and supporting debugging. Despite these benefits, studies have shown that programs generated by coding assistants often carry security vulnerabilities and the coding assistants themselves are vulnerable to malicious attacks.[34] When considering deployment of an AI coding assistant tool that may handle (or otherwise have access to) personal information, organisations should undertake a risk analysis and put in place appropriate mitigation measures where necessary.
Who will have access to data in the AI system?
Carefully review the terms and settings which will apply to your organisation’s use of the product. In particular, you should understand whether the service terms provide the developer with access to data which your organisation inputs or generates when using the AI. If it does, you will need to consider whether the use of the product will be compliant with your privacy obligations, particularly APP 6 which restricts the disclosure of personal information for secondary purposes.
Some commercial AI products will include terms or settings that allow the product owner to collect the data input by customers for further training and development of AI technologies. This can create privacy risks, with the potential for personal information input into an AI product then surfacing in response to a prompt from another user. You should also consider whether a third party receives personal information through the operation of the commercial AI product. For example, some commercial generative AI products have interfaces with search engines or other features that would result in disclosure of personal information entered in prompts to a third party.
If the operation or features of the AI system require disclosure of personal information to the developer or third parties, you will need to consider whether it is appropriate to proceed with using the product and if so, what measures should be placed around your use of the system to ensure you comply with APP 6. Relevant measures could include turning off features that would disclose personal information, providing any required notice of disclosure to the individual, controls to prevent personal information from being entered or prohibiting your staff from entering personal information in the AI inputs. Any controls or prohibitions on including personal information in AI inputs will need to be accompanied by robust training and auditing measures.
For more information on your APP 6 obligations when using AI systems, see ‘Can we input personal information into an AI system, such as through a prompt in an AI chatbot?’ below.
Practical tips – privacy considerations when selecting an AI product
When choosing an AI product, your organisation must conduct due diligence and ensure you identify potential privacy risks. You should be able to answer the following questions:
- How does your organisation plan to use the AI product? What is the potential privacy impact of these uses?
- What data has the product been trained and tested on? Is this training data diverse and relevant to your business, and what is the risk of bias in its outputs?
- What are the limitations of the system, and has the developer put in place safeguards to address these?
- What is the intended operating environment for the product? Are there any known or likely security risks?
- What are the data flows in the AI product? Will the developer or other third parties have access to the data which your organisation inputs or generates when using the AI product? Can you turn these features off or take other steps to comply with privacy obligations?
Conducting a Privacy Impact Assessment may assist you with considering these questions.
What are the key privacy risks when using AI?
When using AI products, organisations should be aware of the different ways that they may be handling personal information and of their privacy obligations in relation to this information. It is important to understand the specific risks associated with your organisation’s intended uses of AI, and whether or how they can be managed and mitigated.
Organisations should always consider whether the use of personal information in relation to an AI system is necessary and the best solution in the circumstances. AI products should not be used simply because they are available.
Key privacy considerations are set out below in relation to both the data input into an AI system, and the output data generated by AI.
Can we input personal information into an AI system?
The sharing of personal information with AI systems may raise a number of privacy risks. Once personal information has been input into AI systems, particularly generative AI products, it will be very difficult to track or control how it is used, and potentially impossible to remove the information from the system.[35] The information may also be susceptible to various security threats which aim to bypass restrictions in the AI system, or may be at risk of re-identification even when de-identified or anonymised.
It is therefore important that your business proceeds cautiously before using personal information in an AI prompt or query, and only does so if it can comply with APP 6.
What are the requirements of APP 6?
When personal information is input into an AI system, it may be considered either a use of personal information (if the data stays within your organisation’s control) or a disclosure of personal information (if the data is made accessible to others outside the organisation and has been released from your organisation’s effective control) under the Privacy Act. The use and disclosure of personal information must comply with APP 6.
APP 6 provides that an individual’s personal information can only be used or disclosed for the purpose or purposes for which it was collected (known as the ‘primary purpose’) or for a secondary purpose if an exception applies. Common exceptions include where:
- the individual has consented to a secondary use or disclosure, or
- the individual would reasonably expect the entity to use or disclose their information for the secondary purpose, and that purpose is related to the primary purpose of collection (or in the case of sensitive information, directly related to the primary purpose).
Your organisation should identify the anticipated purposes for which you will use personal information in connection with an AI system, and whether these are the same as the purposes for which you collected the information. If you intend to use personal information in AI systems for other, secondary purposes, you should consider whether these will be authorised by one of the exceptions under APP 6.
As outlined in the example below, organisations should frame purposes for collection, use and disclosure narrowly rather than expansively. However, it can be helpful to distinguish between the use of AI which is facilitative of, or incidental to a primary purpose (such as the use of personal information as part of customer service, where AI is the tool used), from purposes which are directly AI-related (such as the use of personal information to train an AI model).
Example – Can an organisation enter personal information into a publicly available generative AI chatbot?
Employees at an insurance company experiment with using a publicly available AI chatbot to assist with their work. They enter the details of a customer’s claim – including the customer’s personal and sensitive health information – into the chatbot and ask it to prepare a report assessing whether to accept the claim.
By entering the personal information into the AI chatbot, the insurance company is disclosing the information to the owners of the chatbot. For this to comply with APP 6, the insurance company will have to consider whether the purpose for which the personal information was disclosed is the same as the primary purpose for which the information was collected. This should have been specified in a notice provided to the customer at the time of collection in accordance with APP 5.
How broadly a purpose can be described will depend on the circumstances, however in cases of ambiguity, the OAIC considers that the primary purpose for collection, use or disclosure should be construed narrowly rather than expansively.[36]
If the purpose of disclosure is not consistent with the primary purpose of collection, the company will need to assess whether an exception under APP 6 applies. If the company wants to rely on the ‘reasonable expectations’ exception under APP 6.2(a), they must establish that the customer would reasonably have expected their personal information to be disclosed for the relevant purpose. The company should consider whether their collection notice and privacy policy specify that personal information may be disclosed to AI system developers or owners, in accordance with APP 5. It may be difficult to establish reasonable expectations if customers were not specifically notified of these disclosures, given the significant public concern about the privacy risks of chatbots.
It is important to be aware that an organisation seeking to input personal information into AI chatbots will also need to consider a range of other privacy obligations, including in relation to the generation or collection of personal information (APP 3), accuracy of personal information (APP 10) and cross-border disclosures (APP 8).
Given the significant and complex privacy risks involved, as a matter of best practice it is recommended that organisations do not enter personal information, and particularly sensitive information, into AI chatbots.
When will an individual reasonably expect a secondary use of their information for an AI-related purpose?
If your organisation is seeking to rely on the ‘reasonable expectations’ exception to APP 6, you should be able to show that a reasonable person who is properly informed would expect their personal information to be used or disclosed for the proposed secondary purpose.
You should consider whether the proposed use would have been within the reasonable expectations of the individual at the time the information was collected. This may be the case if the secondary use was expressly outlined in a notice at the time of collection and in your organisation’s privacy policy.
Whether APP 5 notices or privacy policies were updated, or other information was given at a point in time after the collection, may also be relevant to this assessment. It is possible for an individual’s reasonable expectations in relation to secondary uses to change over time. However, particular consideration should be given to the reasonable expectations at the time of collection, given this is when the primary purpose is determined.
Given the significant privacy risks that may be posed by AI systems, and strong levels of community concern around the use of AI, in many cases it will be difficult to establish that a secondary use for AI-related purposes (such as training an AI system) was within reasonable expectations.
If your organisation cannot clearly establish that a secondary use for an AI-related purpose was within reasonable expectations and related to the primary purpose, to avoid regulatory risk you should seek consent for that use and/or offer individuals a meaningful and informed ability to opt-out.
Importantly, you should only use or disclose the minimum amount of personal information sufficient for the secondary purpose. The OAIC expects organisations to consider what information is necessary and assess whether there are ways to minimise the amount of personal information that is input into the AI product. It may be helpful to consider whether there are privacy-preserving techniques that you could use to minimise the personal information included in the relevant prompt or query without compromising the accuracy of the output.
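By way of illustration only, one simple data-minimisation technique is to mask obvious identifiers before a prompt is sent to an AI product. The sketch below uses basic regular expressions and hypothetical patterns; it is not a method endorsed in the guidance, and real deployments would need far more robust de-identification tooling and human review.

```python
import re

# Minimal sketch: mask obvious identifiers before text is sent to an AI
# product. The patterns are illustrative only and will not catch all
# personal information; robust de-identification needs dedicated tools.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b(?:\+?61|0)[2-478](?:[ -]?\d){8}\b"),  # AU-style numbers
    "NAME": re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.? [A-Z][a-z]+\b"),    # titled names only
}

def minimise(prompt: str) -> str:
    """Replace matched identifiers with placeholder tokens."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

raw = "Summarise the claim from Mr Nguyen, contact john.nguyen@example.com or 0412 345 678."
print(minimise(raw))
# Summarise the claim from [NAME], contact [EMAIL] or [PHONE].
```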
Further information about APP 6 requirements and exceptions is in the APP Guidelines at Chapter 6: APP 6 Use or disclosure of personal information.
Example – Can our organisation use existing customer data with an AI system?
Your organisation may have existing information that you wish to input into a new AI product. For example, you may plan to use AI to assist with your customer service functions, such as providing tailored marketing recommendations. To do so, the AI system needs to run individual analysis on your customer data.
If your organisation is using a proprietary AI system rather than a publicly available chatbot, for example, and has protections in place to ensure that information entered into the system will not be disclosed outside the organisation (such as to the system developer), this will constitute a use rather than a disclosure of personal information.
To assess whether your organisation’s proposed use of customer data will be compliant with your APP 6 obligations, you should consider the purposes for which the information was collected. This should have been specified in a notice provided to the customer at the time of collection in accordance with APP 5.
If one of the primary purposes of collection was providing tailored marketing or financial recommendations, the proposed use of AI to assist with this analysis is likely consistent with the primary purpose. As discussed above, how broadly a purpose can be described will depend on the circumstances, however in cases of ambiguity, the OAIC considers that the primary purposes of collection, use or disclosure should be construed narrowly rather than expansively.[37]
Your organisation then wishes to use the customer data you hold to further fine-tune the AI system you are using. If the customer’s personal information was not collected for this purpose, you will need to either obtain the customer’s consent to use the information in this way or establish that the reasonable expectations exception under APP 6.2(a) applies. To establish reasonable expectations, you should consider whether you have notified the individual that their personal information would be used in this way, such as via your organisation’s APP 5 notices and privacy policy.
See the OAIC’s further guidance on the development and training of generative AI models.
Practical tips – using or disclosing personal information as an AI input
If your organisation wants to use personal information as an input into an AI system, you should consider your APP 6 obligations:
- Is the purpose for which you intend to use or disclose the personal information the same as the purpose for which you originally collected it?
- If it is a secondary use or disclosure, consider whether the individual would reasonably expect you to use or disclose it for this secondary purpose. What information have you given them about your intention to use their personal information in this way?
- If it is a secondary use or disclosure, also consider whether the secondary purpose is related to the primary purpose of collection (or if the information is sensitive information, whether it is directly related to the primary purpose).
- Take steps to minimise the amount of personal information that is input into the AI system.
Given the significant and complex privacy risks involved, as a matter of best practice it is recommended that organisations do not enter personal information, and particularly sensitive information, into publicly available AI chatbots.
What privacy obligations apply if we use AI to collect or generate personal information?
Can we collect personal information through an AI chatbot?
If your organisation collects personal information through public facing AI systems, such as through the use of AI chatbots, this collection must comply with APP 3 and APP 5 requirements. For example, if you have a customer facing AI chatbot on your website and you collect the information your customers send to the chatbot, you will need to ensure this collection complies with APP 3. In particular, APP 3 requires the collection of personal information to be reasonably necessary for your entity’s functions or activities and to be carried out by lawful and fair means.
The requirements of APP 3 are discussed under ‘Can we use an AI product to generate or infer personal information?’, directly below.
Can we use an AI product to generate or infer personal information?
Where you use AI to infer or generate personal information, the Privacy Act will treat this as a ‘collection’ of personal information and APP 3 obligations will apply. Under the Privacy Act, the concept of collection applies broadly, and includes gathering, acquiring or obtaining personal information from any source and by any means, including where information is created with reference to, or generated from, other information an entity holds.[38] To meet your APP 3 obligations when using AI, you must ensure that:
- any personal information generated by an AI product is reasonably necessary for your business’s functions or activities;
- the use of AI to infer or generate personal information is only by lawful and fair means; and
- it would be unreasonable or impracticable to collect the personal information directly from the individual.
If you are using AI to generate or infer sensitive information about a person, you will also need to obtain that person’s consent to do so (unless an exception applies).
If personal information is created through the use of AI which your organisation is not permitted to collect under APP 3, it will need to be destroyed or de-identified.
The use of AI systems to infer personal information, such as making inferences about a person’s demographic information or political views based on data from their social media, creates risks by enabling organisations to generate and use personal information that has not been knowingly disclosed by the individual, including information which may not be accurate.
Generating personal information through AI will also attract other APP obligations, including in relation to providing notice to individuals (APP 5) and ensuring the accuracy of personal information (APP 10). These obligations are discussed under the ‘Transparency’ and ‘Accuracy’ sections of this guidance.
Using sensitive information in relation to AI products
Sensitive information is a subset of personal information that is generally afforded a higher level of privacy protection under the APPs. Examples of sensitive information include photographs or videos where sensitive information such as race or health information can be inferred, as well as information about an individual’s political opinions or religious or philosophical beliefs.
It is important that organisations using AI products identify whether they may be collecting sensitive information in connection with these systems. For example, AI image and text manipulators can generate sensitive biometric information, including information that may be false or misleading.
If you use AI to generate or collect sensitive information about a person, to comply with APP 3 you will usually need to obtain consent.[39] Consent goes beyond just a line in your Privacy Policy – you must make sure that the individual is fully informed of the risks of using or generating sensitive information, and this consent must be current and specific to the circumstances that you are using it in.[40]
Organisations cannot infer consent simply because you have provided individuals with notice of a proposed collection of personal information.
When is generating personal information ‘reasonably necessary’?
What is ‘reasonably necessary’ for the purposes of APP 3 is an objective test based on whether a reasonable person who is properly informed would agree that the collection is necessary. You should consider whether the type and amount of personal information you are seeking to obtain from the AI system is necessary, and whether your organisation could pursue the relevant function or activity without generating the information (or by generating less information).
If there are reasonable alternatives available, the use of AI in this way may not be considered reasonably necessary.
What will constitute ‘lawful and fair means’?
Using AI to generate personal information may not be lawful for the purposes of APP 3 if it is done in breach of legislation – this would include, for example, generating or inferring information in connection with, or for the purpose of, an act of discrimination.[41]
For the purposes of APP 3, the collection or generation of personal information using AI may be unfair if it is unreasonably intrusive, or involves intimidation or deception.[42] This will depend on the circumstances. However, it would usually be unfair to collect personal information covertly without the knowledge of the individual.[43]
For example, organisations seeking to use AI to generate or infer personal information based on existing information holdings for a customer, in circumstances where the customer is not aware of their personal information being collected in this way, should give careful consideration as to whether this is likely to be lawful and fair.
In considering whether the generation of personal information has been fair, it may also be relevant to consider whether individuals have been given a choice. The OAIC encourages entities to provide individuals with a meaningful opportunity to opt-out of having their personal information processed through AI systems.
Example – can our business use AI to take meeting minutes?
Commonly available virtual meeting platforms can record meetings and use AI to generate transcripts or written minutes. To comply with APP 3, it is important that you carefully consider the content of the meeting discussion, and ensure that any personal information collected is reasonably necessary to your organisation’s functions or activities. If you find that your system recorded any personal or sensitive information that is not reasonably necessary, such as a personal conversation or confidential client information, you will need to ensure that the information is destroyed or de-identified.
You will also need to obtain consent from participants if the AI system collects sensitive information. Given the uncertainty of what may be discussed in a meeting, seeking consent should be pursued as a matter of best practice. Depending on the circumstances, this could be done by advising meeting attendees of the AI system being in use and providing them with the option to object at the outset. However, this approach will only be appropriate if the information about the system and opt-out option is clear and prominent to all attendees so that it is likely they will have seen and read it, and where the consequences for failing to opt-out are not serious.[44]
Whether the participants in the meeting are made aware of the meeting being recorded and a transcript being produced, and whether consent is given to this occurring, will also be relevant factors in considering whether the AI product’s collection of personal information is by lawful and fair means.[45]
Could the information be collected directly from the individual?
If you are generating personal information through an AI product, to ensure you comply with APP 3 your organisation must be able to show that it would be unreasonable or impracticable to collect this information directly from the individual.[46]
What is ‘unreasonable or impracticable’ will depend on circumstances including whether the person would reasonably expect personal information about them to be collected directly from them, the sensitivity of the personal information being generated by the AI system, and the privacy risks of collecting the information through the use of the particular AI product. While the time and cost involved of collecting directly from the individual may also be a factor, organisations will need to show the burden is excessive in all the circumstances.[47]
Practical tips – using AI to collect or generate personal information
If your organisation is collecting personal information through an AI system such as a chatbot, or using an AI product to generate or infer personal information, you should consider your APP 3 obligations:
- Ensure the collection or generation is reasonably necessary for your organisation’s functions or activities.
- Consider whether you are collecting or generating personal information only by lawful and fair means.
- Consider whether it would be unreasonable or impracticable to collect the information directly from the person.
- If you are collecting or generating sensitive information using AI, make sure you have the individual’s consent.
What obligations do we have to ensure the accuracy of AI systems?
Organisations subject to the Privacy Act must comply with their accuracy obligations under APP 10 when using AI systems. APP 10 requires entities to take reasonable steps to ensure that:
- the personal information they collect is accurate, up-to-date and complete; and
- the personal information they use and disclose is, having regard to the purpose of the use or disclosure, accurate, up-to-date, complete and relevant.
As discussed elsewhere in this guidance, AI technologies carry inherent accuracy risks. Both traditional and generative AI models built on biased or incomplete information can then perpetuate and amplify those biases in their outputs, with discriminatory effects on individuals. Developers of AI systems may also unknowingly design systems with features that operate in biased ways.
What are the specific accuracy risks of generative AI?
Accuracy risks are particularly relevant in the context of generative AI. There are a number of factors which contribute to this:
- Generative AI models such as large language models (LLMs) are trained on huge amounts of data sourced from across the internet, which is highly likely to include inaccuracies and built-in biases.
- The probabilistic nature of generative AI (in which the next word, sub-word, pixel or other medium is predicted based on likelihood) and the way it tokenises input can generate hallucinations. For example, without protective measures, an LLM asked how many ‘b’s are in ‘banana’ will generally state there are two or three, both because the training data is weighted with instances of people asking how many ‘a’s or ‘n’s are in ‘banana’ and because of the way it tokenises words rather than letters (a short illustrative sketch follows at the end of this section).[48]
- The accuracy and reliability of an AI model is vulnerable to deterioration over time. This can be caused by the accumulation of errors and misconceptions across successive generations of training, or by a model’s development on training data obtained up to a certain point in time, which eventually becomes outdated.[49]
- An LLM’s reasoning ability declines when it encounters a scenario or task that differs from what is in its training data.[50]
These risks can be compounded by the tendency of generative AI tools to confidently produce outputs which appear credible, regardless of their accuracy. Generative AI also has the potential to emulate human-like behaviours and generate realistic outputs, which may cause users to overestimate its accuracy and reliability.
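To make the tokenisation point above concrete, the short sketch below (illustrative only, and not part of the OAIC guidance) shows how a common open-source tokeniser splits a word into sub-word tokens rather than individual letters. The `tiktoken` library and the `cl100k_base` encoding are assumptions chosen for the example; exact token boundaries vary between models.

```python
# Illustrative sketch only: demonstrates that LLMs operate on sub-word tokens,
# not individual letters. Assumes the open-source `tiktoken` library is
# installed (pip install tiktoken); token boundaries differ between models.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "banana"

token_ids = enc.encode(word)
pieces = [enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
          for t in token_ids]

print(f"Characters a human sees: {list(word)}")   # ['b', 'a', 'n', 'a', 'n', 'a']
print(f"Tokens the model sees:   {pieces}")       # a handful of sub-word chunks
# Because the model predicts over these chunks (weighted by its training data)
# rather than counting characters, letter-counting questions can produce
# confident but wrong answers unless protective measures are added.
```

Because the model only ever ‘sees’ those chunks, it has no direct representation of individual characters to count, which is why such questions are answered from statistical patterns in the training data rather than by inspection.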
What are the requirements of APP 10?
Organisations looking to use AI must be aware of these accuracy risks and how they may impact on the organisation’s ability to comply with their privacy obligations. As discussed above, APP 10 requires an entity to take reasonable steps to ensure that:
- the personal information it collects is accurate, up-to-date and complete; and
- the personal information it uses and discloses is accurate, up-to-date, complete and relevant, having regard to the purpose of the use or disclosure.
The reasonable steps that an organisation should take will depend on circumstances that include the sensitivity of the personal information, the nature of the organisation holding the personal information, and the possible adverse consequences for an individual if the quality of personal information is not ensured.[51]
There are a number of measures which organisations should consider when seeking to ensure the accuracy of AI outputs. Some examples of these are set out below. However, given the significant accuracy risks associated with AI, and its potential to be used to make decisions that may have a legal or similarly significant effect on an individual’s rights, it is possible that these measures may not always be sufficient to constitute ‘reasonable steps’ for the purposes of APP 10.
How an organisation intends to use outputs from an AI system will also be relevant to the accuracy required. For example, if an organisation using a generative AI system intends to treat the outputs as probabilistic guesses about something that may or may not be true, rather than as factually accurate information about an individual, and the context is low risk, 100% statistical accuracy may not be required. To comply with your APP 10 obligations, you should design your processes to factor in the possibility that the output is not correct. You should not record or treat these outputs as facts; rather, ensure that your records clearly indicate that they are probabilistic assessments and identify the data and AI system used to generate the output.
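One way to put that record-keeping point into practice is sketched below. This is a minimal illustration under assumed names (not a structure prescribed by the guidance): it flags an AI output as a probabilistic assessment rather than a fact and records the AI system and data sources used to generate it.

```python
# Minimal, illustrative record structure (all field and class names are
# assumptions for the example) showing one way to store an AI-generated
# inference as a probabilistic assessment, together with its provenance.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIGeneratedInference:
    subject_id: str                 # internal identifier for the individual
    statement: str                  # the generated or inferred content
    is_probabilistic: bool          # flag so the output is never treated as fact
    ai_system: str                  # which AI product or model produced the output
    input_data_sources: list[str]   # datasets or records supplied to the system
    generated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    human_verified: bool = False    # set True only after human review

record = AIGeneratedInference(
    subject_id="customer-1042",
    statement="Likely interested in home-loan refinancing",
    is_probabilistic=True,
    ai_system="example-llm-v1",
    input_data_sources=["crm_notes_2024"],
)
```

Storing the probabilistic flag and provenance alongside the content means downstream staff and systems can tell at a glance that the statement is an AI-generated assessment and trace how it was produced.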
When considering potential uses of AI products, organisations should carefully consider whether it will be possible to do so in a way that complies with their privacy obligations in respect of accuracy.
Example – AI in recruiting
Your organisation may wish to use AI for recruiting purposes, such as to source and screen candidates, analyse resumes and job applications and conduct pre-employment assessments. Given the known risk of AI systems producing biased or inaccurate results, including in relation to recruitment, [52] you should consider whether the use of AI is necessary and the best option in the circumstances. If you do use AI to assist with recruitment, you must implement controls to ensure that outputs are accurate and not biased and that the risk of any inaccuracy is considered.
Controls should include assessing the particular AI product and how it has been trained, to ensure that it is appropriate for the intended uses (see ‘Selecting an AI product’ above). You should also ensure that there is human oversight of the AI system’s processes, and that staff are appropriately trained and able to monitor for any inaccurate outputs. This should include periodic assessments of the AI output to identify potential biases or other inaccuracies.
Practical tips – accuracy measures
To comply with your APP 10 obligations, your organisation must take reasonable steps to ensure the accuracy of the personal information it collects, generates, uses and discloses when using an AI product. You should think about:
- Ensuring that any AI product used has been tested on and is appropriate for the intended purpose, including through the use of training data which is sufficiently diverse and relevant.
- Taking steps to verify that the data being input into, or used to train or develop the AI product is accurate and of a high quality – this may include reviewing your organisation’s data holdings and ensuring you have processes in place to review and maintain data quality over time.
- Ensuring that your entity’s records clearly indicate where information is the product of an AI output and therefore a probabilistic assessment rather than fact, as well as the data and system used to generate the AI output.
- Establishing processes to ensure there is appropriate human oversight of AI outputs, which treat the outputs as statistically informed guesses. There are varying forms and degrees of human oversight that may be possible, depending on the context of the AI system and the particular risks.[53] A human user should be responsible for verifying the accuracy of any personal information obtained through AI and should be able to overturn decisions made (a minimal illustration follows after this list).
- Training staff to understand the design and limitations of an AI system and to anticipate when it may be misleading and why. Staff should be trained and have the necessary resources to verify input data, critically assess outputs, as well as to provide meaningful explanations for decisions they make to reject or accept the AI system’s output.
- Ensuring that individuals are made aware of when AI is used in ways which may materially affect them.
- Maintaining ongoing monitoring of AI systems to ensure they are operating as intended and that unintended impacts such as bias or inaccuracies are identified and rectified.
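As a minimal sketch of the human-oversight tip above (an assumed workflow for illustration, not a requirement set out in the guidance), the example below prevents an AI-assisted decision from being finalised until a named human reviewer has either verified the personal information relied on or recorded an overriding outcome.

```python
# Illustrative human-in-the-loop gate (assumed workflow and names): an
# AI-suggested outcome cannot be finalised until a human reviewer verifies
# the personal information it relies on, and the reviewer may overturn it.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AISuggestedDecision:
    applicant_id: str
    suggested_outcome: str          # e.g. "approve" / "decline"
    supporting_information: dict    # personal information the AI relied on

def finalise_decision(suggestion: AISuggestedDecision,
                      reviewer: str,
                      verified_accurate: bool,
                      overridden_outcome: Optional[str] = None) -> dict:
    """Return a final decision only once a human has reviewed the AI output."""
    if not verified_accurate and overridden_outcome is None:
        raise ValueError(
            "Decision cannot be finalised: a human reviewer must either verify "
            "the AI output or record an overriding outcome.")
    return {
        "applicant_id": suggestion.applicant_id,
        "outcome": overridden_outcome or suggestion.suggested_outcome,
        "ai_assisted": True,
        "reviewed_by": reviewer,
        "ai_output_verified": verified_accurate,
    }
```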
What transparency and governance measures are needed?
Organisations subject to the Privacy Act have a number of transparency obligations:
- APP 1 requires entities to take reasonable steps to implement practices, procedures and systems to ensure they comply with the APPs, and to have a clearly expressed and up-to-date Privacy Policy.
- APP 5 requires entities that collect personal information about an individual to take reasonable steps either to notify the individual of certain matters or to ensure the individual is aware of those matters.
The complexity of AI systems, particularly large AI models such as LLMs, creates challenges for businesses in ensuring that the use of AI is transparent and explainable. The ‘black box’ problem posed by some AI technologies means that even AI developers may be unable to fully explain how a system came to generate an output.[54] This can make it difficult for businesses to communicate clearly with individuals about the way that their personal information is being processed by an AI system.
Despite these challenges, it is important that organisations take steps to ensure they are transparent about their handling of personal information in relation to AI systems. Managing your organisation’s use of AI systems in an open and transparent way will help ensure you meet your APP 1 obligations, as well as increase your accountability to your customers, clients and members of the public, and help to build community trust and confidence.
What constitutes reasonable steps under APP 1 will depend on the circumstances, including the nature of the personal information your organisation holds and the possible adverse consequences for an individual if their personal information is mishandled – more rigorous steps may be required as the risk of adversity increases.[55]
Transparency is critical to enabling individuals to understand the way that AI systems are used to produce outputs or make decisions which affect them. Without a clear understanding of the way an AI product works, it is difficult for individuals to provide meaningful consent to the handling of their personal and sensitive information, understand or challenge decisions, or request corrections to the personal information processed or generated by an AI system. Entities must ensure that the output of AI systems, including any decisions made using AI, can be explained to individuals affected.
What practical measures can we take to promote transparency?
Organisations should establish policies and procedures to facilitate transparency, enhance accountability, and ensure good privacy governance. These may include:
- informing individuals about the use of their information in connection with AI, including through statements about the use of AI systems in their Privacy Policy.
- ensuring the entity’s APP 5 notices specify any AI-related purposes for which personal information is being collected, the entity’s use of AI systems to generate personal information (where applicable), as well as any disclosures of personal information in connection with AI systems.
- If the AI system developer has access to personal information processed through the system, this is a disclosure that should be included in an APP 5 notice.
- establishing procedures for explaining AI-related decisions and outputs to affected individuals. This may include ensuring that the AI tools you use are able to provide appropriate explanations of their outputs.
- training staff to understand how the AI products generate, collect, use or disclose personal information and how to provide meaningful explanations of AI outputs to affected individuals.
Example – Decision-making in AI
Your organisation may be using AI to assist in decision-making processes. For example, you may use AI software that makes predictive inferences about a person to come to a decision on a client’s insurance claim, home loan application or to provide financial advice. The use of AI in relation to decisions that may have a legal or similarly significant effect on an individual’s rights is likely a high privacy risk activity, and particular care should be taken in these circumstances, including consideration of the accuracy and appropriateness of the tool for the intended purpose. In particular, if the behaviour of the AI system cannot be understood or explained clearly by your entity, it may not be appropriate to use for these purposes.
To ensure your use of AI is transparent to your clients, you should ensure that the use of personal information for these purposes is clearly outlined in your Privacy Policy.
If your organisation is using an AI product to assist in the decision-making process, you must understand how the product is producing its outputs so that you can ensure the accuracy of the decision. As a matter of best practice, you should be able to provide a meaningful explanation to your client. As discussed under ‘Accuracy’ above, you should ensure that a human user within your entity is responsible for verifying the accuracy of these outputs and can overturn any decisions made.
It is critical to ensure that you can provide the client with a sufficient explanation about how the decision was reached and the role that the AI product played in this process. This will enable the client to be comfortable that the decision was made appropriately and on the basis of accurate information.
Practical tips – transparency measures
Your organisation must be transparent about its handling of personal information in relation to AI systems. Key steps include:
- Informing individuals about how their personal information is used in connection with AI, including at the time you collect their information and in your organisation’s Privacy Policy.
- Ensuring that your organisation has the procedures and resources in place to be able to understand and explain the AI systems you are using, and particularly how they use, disclose and generate personal information.
Taking these steps will help to meet your transparency obligations under APP 1 and APP 5 as well as enhance the accountability of your organisation and build trust in your products or services.
What ongoing assurance processes are needed?
Throughout the lifecycle of the AI product, your organisation should have in place processes for ensuring that the product continues to be reliable and appropriate for its intended uses. This will assist your organisation to comply with the requirement under APP 1 to take reasonable steps to implement practices, procedures and systems that will ensure it complies with the APPs and is able to deal with related inquiries and complaints. This should include:
- putting in place internal policies for the use of AI which clearly define the permitted and prohibited uses;
- establishing clear processes for human oversight and verification of AI outputs, particularly where the outputs contain personal information or are relied on to make decisions in relation to a person;
- providing for regular audits or monitoring of the output of the product and its use by the organisation; and
- training staff in relation to the specific product, including how it functions, its particular limitations and risks, and the permitted and prohibited uses.
Example – Importance of ongoing monitoring of embedded AI tools
Your organisation implements the use of an AI assistant called ‘Zelda’ which performs a range of tasks, including generating summaries of customer interactions, providing information on request, and editing written work. Zelda takes on the tone of a friendly colleague and staff find the tool very useful for their work.
Over time, as the model’s underlying processes and training data become outdated, Zelda increasingly produces errors in its output. Unless your organisation has robust processes in place to provide human oversight and review of these outputs, ensure the tool remains fit for purpose, and train staff to understand its limitations, there is a risk that staff will come to over-rely on the assistant and overestimate its accuracy, leading to potential compliance risks.
Checklists
| Question | Considerations |
|---|---|
| Is the AI system appropriate and reliable for your entity’s intended uses? | To assist with meeting your organisation’s privacy obligations, particularly regarding accuracy under APP 10, consider: |
| Can you clearly identify the data that the system has been trained on, to ensure that its output will be accurate? | To help assess whether your use of the AI system will be compliant with your APP 10 obligations: |
| What are the potential security risks associated with the system? | It is important to consider your entity’s security obligations under APP 11 when selecting an AI product. Consider: |
| What is the intended operating environment for the AI system? | As part of assessing your APP 11 obligations, consider: |
| Will your inputs be accessible by the system developer? | Your use of the AI system must be compliant with your APP 6 obligations regarding the disclosure of personal information. Ensure you understand: |

| Question | Considerations |
|---|---|
| Are you using or disclosing personal information in the context of an AI system? | To assess whether you will be compliant with your entity’s obligations under APP 6, consider: |
| Are you collecting, generating or inferring personal information using an AI system? | To assess whether you will be compliant with your entity’s obligations under APP 3, consider: |
| Have you taken reasonable steps to ensure the accuracy of personal information in relation to the AI system? | To comply with your APP 10 obligations, your business should consider: |
| Has your organisation put in place appropriate transparency and accountability measures? | To comply with your business’s obligations under APP 1 and APP 5, you should consider: |
| What ongoing assurance processes need to be put in place? | Your organisation will need to ensure it has established appropriate processes and systems to ensure it complies with its APP obligations throughout the lifecycle of the AI system. Consider: |
The second guide provides:
Top five takeaways
- Developers must take reasonable steps to ensure accuracy in generative AI models, commensurate with the likely increased level of risk in an AI context, including through using high quality datasets and undertaking appropriate testing. The use of disclaimers to signal where AI models may require careful consideration and additional safeguards for certain high privacy risk uses may be appropriate.
- Just because data is publicly available or otherwise accessible does not mean it can legally be used to train or fine-tune generative AI models or systems. Developers must consider whether data they intend to use or collect (including publicly available data) contains personal information, and comply with their privacy obligations. Developers may need to take additional steps (e.g. deleting information) to ensure they are complying with their privacy obligations.
- Developers must take particular care with sensitive information, which generally requires consent to be collected. Many photographs or recordings of individuals (including artificially generated ones) contain sensitive information and therefore may not be able to be scraped from the web or collected from a third party dataset without establishing consent.
- Where developers are seeking to use personal information that they already hold for the purpose of training an AI model, and this was not a primary purpose of collection, they need to carefully consider their privacy obligations. If they do not have consent for a secondary, AI-related purpose, they must be able to establish that this secondary use would be reasonably expected by the individual, taking particular account of their expectations at the time of collection, and that it is related (or directly related, for sensitive information) to the primary purpose or purposes (or another exception applies).
- Where a developer cannot clearly establish that a secondary use for an AI-related purpose was within reasonable expectations and related to a primary purpose, to avoid regulatory risk they should seek consent for that use and/or offer individuals a meaningful and informed ability to opt-out of such a use.
Quick reference guide
- The Privacy Act applies to the collection, use and disclosure of personal information to train generative AI models, just as it applies to all uses of AI that involve personal information.
- Developers using large volumes of information to train generative AI models should actively consider whether the information includes personal information, particularly where the information is of unclear provenance. Personal information includes inferred, incorrect or artificially generated information produced by AI models (such as hallucinations and deepfakes), where it is about an identified or reasonably identifiable individual.
- A number of uses of AI are low-risk. However, developing a generative AI model is a high privacy risk activity when it relies on large quantities of personal information. As for many uses of AI, this is a source of significant community concern. Generative AI models have unique and powerful capabilities, and many use cases can pose significant privacy risks for individuals as well as broader ethical risks and harms.
- For these reasons, the OAIC (like the Australian community) expects developers to take a cautious approach to these activities and give due regard to privacy in a way that is commensurate with the considerable risks for affected individuals. Developers should particularly consider APPs 1, 3, 5, 6 and 10 in this context.
- Other APPs, such as APPs 8, 11, 12 and 13 are also relevant in this context, but are outside the scope of this guidance. This guidance should therefore be considered together with the Privacy Act 1988 (Privacy Act) and the Australian Privacy Principles guidelines.
Privacy by design
- Developers should take steps to ensure compliance with the Privacy Act, and first and foremost take a ‘privacy by design’ approach when developing or fine-tuning generative AI models or systems, including conducting a privacy impact assessment.
- AI technologies and supply chains can be complex and steps taken to remove or de-identify personal information may not always be effective. Where there is any doubt about the application of the Privacy Act to specific AI-related activities, developers should err on the side of caution and assume it applies to avoid regulatory risk and ensure best practice.
Accuracy
- Generative AI systems are known to produce inaccurate or false results. It is important to remember that generative AI models are probabilistic in nature and do not ‘understand’ the data they handle or generate.
- Under APP 10, developers have an obligation to take reasonable steps to ensure the personal information collected, used and disclosed is accurate. Developers must consider this obligation carefully and take reasonable steps to ensure accuracy, commensurate with the likely increased level of risk in an AI context, including through using high quality datasets, undertaking appropriate testing and the appropriate use of disclaimers.
- In particular, disclaimers should signal where AI models may require careful consideration and additional safeguards for certain high privacy risk uses, for example use in decisions that will have a legal or similarly significant effect on an individual’s rights.
Transparency
- To ensure good privacy practice, developers should also ensure they update their privacy policies and notifications with clear and transparent information about their use of AI generally.
Collection
- Just because data is publicly available or otherwise accessible does not mean it can legally be used to train or fine-tune generative AI models or systems. Developers must consider whether data they intend to use or collect (including publicly available data) contains personal information, and comply with their privacy obligations.
- Developers must only collect personal information that is reasonably necessary for their functions or activities. In the context of compiling a dataset for generative AI, they should carefully consider what data is required, draft appropriate parameters for what data is included, and put in place mechanisms to filter out unnecessary personal information from the dataset.
- Developers must take particular care with sensitive information, which generally requires consent to be collected. Many photographs or recordings of individuals (including artificially generated ones) contain sensitive information and therefore may not be able to be scraped from the web or collected from a third party dataset without establishing consent.
- Where sensitive information is inadvertently collected without consent, it will generally need to be destroyed or deleted from a dataset.
- Developers must collect personal information only by lawful and fair means. Depending on the circumstances, the creation of a dataset through web scraping may constitute a covert and therefore unfair means of collection.
- Where a third party dataset is being used, developers must consider information about the data sources and compilation process for the dataset. They may need to take additional steps (e.g. deleting information) to ensure they are complying with their privacy obligations.
Use and disclosure
- Where developers are seeking to use personal information that they already hold for the purpose of training an AI model, and this was not a primary purpose of collection, they need to carefully consider their privacy obligations.
- Where developers do not have consent for a secondary, AI-related purpose, they must be able to establish that this secondary use would be reasonably expected by the individual, taking particular account of their expectations at the time of collection, and that it is related (or directly related, for sensitive information) to the primary purpose or purposes (or another exception applies).
- Whether a secondary use is within reasonable expectations will always depend on the particular circumstances. However, given the unique characteristics of AI technology, the significant harms that may arise from its use and the level of community concern around the use of AI, in many cases it will be difficult to establish that such a secondary use was within reasonable expectations.
- Where a developer cannot clearly establish that a secondary use for an AI-related purpose was within reasonable expectations and related to a primary purpose, to avoid regulatory risk they should seek consent for that use and offer individuals a meaningful and informed ability to opt-out of such a use.
Overview
Who is this guidance for?
This guidance is intended for developers of generative artificial intelligence (AI) models or systems who are subject to the Privacy Act (see ‘When does the Privacy Act apply?’ below).[1] A developer includes any organisation that designs, builds, trains, adapts or combines AI models and applications.[2] This includes adapting through fine-tuning, which refers to modifying a trained AI model (developed by them or someone else) with a smaller, targeted fine-tuning dataset to suit more specialised use cases.[3]
The guidance also addresses where an organisation provides personal information to a developer so they can develop or fine-tune a generative AI model.
Although this guidance has been prepared with specific reference to generative AI training activities, a number of the risks and issues discussed are also applicable to narrow AI systems or models that are trained using personal information. Developers of any kind of AI model that involves personal information will find the guidance helpful when considering the privacy issues that arise.
The Privacy Act applies to organisations that are APP entities developing generative AI or fine-tuning commercially available models for their purposes. It also applies to acts or practices engaged in outside Australia by organisations with an Australian link, such as where they are incorporated in Australia or they carry on business in Australia. Foreign companies carrying on business in Australia, such as digital platforms operating in Australia, will find this guidance useful.
For simplicity, this guidance generally refers to ‘developing’ or ‘training’ a generative AI model to refer to both initial development and fine-tuning, even where these two practices may be done by different entities.
This guidance does not address considerations during testing or deploying or otherwise making a generative AI model or system available for use. The OAIC has separate guidance on privacy and the use of commercially available AI products. The OAIC may also provide further statements or guidance in the future.
Can developers use personal information to develop a generative AI model?
Whether the use of personal information to develop a generative AI model will contravene the Privacy Act depends on the circumstances, such as how the personal information was collected, for what purpose it was collected and whether it includes sensitive information. Developers should consider the obligations and considerations set out in this Guide to determine whether their use of personal information to develop a generative AI model is permitted under the Privacy Act.
How to use this guidance
This guidance is not intended to be a comprehensive overview of all relevant privacy risks and obligations that apply to developing generative AI models. It focuses on APPs 1, 3, 5, 6 and 10 in the context of planning and designing generative AI, and compiling a dataset for training or fine-tuning a generative AI model.
This guidance includes practices that developers that are APP entities must follow in order to comply with their obligations under the Privacy Act as well as good privacy practices for developers when developing and training models. Where something is a matter of best practice rather than a clear legal requirement, this will be presented as an OAIC recommendation or suggestion.
The OAIC notes that this guidance is only about privacy considerations and the Privacy Act 1988 (Cth). It does not address considerations from other regimes that may apply to developing and fine-tuning AI models.[4]
This guidance does not have to be read from start to finish – you can use the topic headings to navigate to the sections of interest to you. Most sections conclude with a list of practical tips, which draw together the key learnings from each section. We have also included case studies and examples in each section to illustrate the way that the APPs may apply.
Finally, there is a Quick Reference Guide and Checklist that highlights key privacy considerations when planning and designing a generative AI model, and collecting and processing a generative AI training dataset.
Introductory terms
This guidance is about generative AI models or systems.
While there is no single agreed definition of AI, in this guidance AI refers to ‘a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.’[5] An AI model is the ‘raw, mathematical essence that is often the ‘engine’ of AI applications’ such as GPT-4, while an AI system is ‘the ensemble of several components, including one or more AI models, that is designed to be particularly useful to humans in some way’ such as the ChatGPT app.[6]
There are many different kinds of AI.[7] This guidance focuses on generative AI, which refers to ‘an AI model with the capability of learning to generate content such as images, text, and other media with similar properties to its training data’ and systems built on such models. [8] Developing a generative AI model is a high privacy risk activity when it relies on large quantities of personal information.
Large language models (LLMs) and multimodal foundation models (MFMs) are both examples of generative AI. An LLM is ‘a type of generative AI that specialises in the generation of human-like text’. [9] Some examples of products or services incorporating LLMs are Meta AI Assistant, ChatGPT, Microsoft Copilot and HuggingChat.
An MFM is ‘a type of generative AI that can process and output multiple data types (e.g. text, images, audio)’.[10] Some examples of products or services incorporating MFMs that are image or video generators include DALL-E 3, Firefly, Jasper Art, Synthesia, Midjourney and Stable Diffusion. Some examples of products or services incorporating MFMs that are audio generators include Riffusion, Suno, Lyria and AudioCraft.
Generative AI models are trained on the relationship between inputs, using this to identify probabilistic relationships between data that they use to generate responses.[11] This can have implications for their accuracy (discussed further below).
Artificial intelligence and privacy
Artificial intelligence (AI) has the potential to benefit the Australian economy and society, by improving efficiency and productivity across a wide range of sectors and enhancing the quality of goods and services for consumers. However, the data-driven nature of AI technologies, which rely on large datasets that often include personal information, can also create new specific privacy risks, amplify existing risks and lead to serious harms.[12] These AI-specific risks and harms are considered in further detail in the section below.
The Privacy Act 1988 and the Australian Privacy Principles (APPs) apply to all uses of AI involving personal information, including where information is used to train, test or use an AI system. APP entities need to understand their obligations under the APPs when developing generative AI models. This includes being aware of the different ways that they may be collecting, using and disclosing personal information when developing a generative AI model.
What is personal information?
Personal information includes a broad range of information, or an opinion, that could identify an individual. This may include information such as a person’s name, contact details and images or videos where a person is identifiable. What is personal information will vary, depending on whether a person can be identified or is reasonably identifiable in the circumstances. Personal information is a broad concept and includes information which can reasonably be linked with other information to identify an individual.
Sensitive information is a subset of personal information that is generally afforded a higher level of privacy protection. Examples of sensitive information include photographs or videos where sensitive information such as race or health information can be inferred, as well as information about an individual’s political opinions or religious or philosophical beliefs.
Importantly, information can be personal information whether or not it is true. This may include false information generated by an AI system, such as hallucinations or deepfakes.[13]
When does the Privacy Act apply?
The Privacy Act applies to Australian Government agencies, organisations with an annual turnover of more than $3 million, and some other organisations.[14] Importantly, the Privacy Act applies to acts or practices engaged in outside Australia by organisations with an Australian link, such as where they are incorporated in Australia or they carry on business in Australia.[15]
Whether a developer carries on business in Australia can be determined by identifying what transactions make up or support the business and asking whether those transactions or the transactions ancillary to them occur in Australia.[16] By way of example, developers whose business is providing digital platform services to Australians will generally be carrying on business in Australia.
Interaction with Voluntary AI Safety Standard
The National AI Centre has developed a Voluntary AI Safety Standard to help organisations develop and deploy AI systems in Australia safely and reliably. The standard consists of 10 voluntary guardrails that apply to all organisations across the AI supply chain. It does not seek to create new legal obligations, but rather helps organisations deploy and use AI systems in accordance with existing Australian laws. The information in this guidance is focussed on compliance with the Privacy Act, but will also assist organisations in addressing the guardrails in the Standard. For more information, see: www.industry.gov.au/publications/voluntary-ai-safety-standard
Privacy considerations when planning and designing an AI model or system
Privacy by design (APP 1)
Developers subject to the Privacy Act must take reasonable steps to implement practices, procedures and systems that will ensure they comply with the APPs and any binding registered APP code, and are able to deal with related inquiries and complaints.[17] When developing or fine-tuning a generative AI model, developers should consider the potential risks at the planning and design stage through a ‘privacy by design’ approach.
Privacy by design is a process for embedding good privacy practices into the design specifications of technologies, business practices and physical infrastructures.[18]
To mitigate risks, developers first need to understand them. A privacy impact assessment (PIA) is one way to do this for privacy risks. It is a systematic assessment of a project that identifies the impact that the project might have on the privacy of individuals, and sets out recommendations for managing, minimising or eliminating that impact. While PIAs assess a project’s risk of non-compliance with privacy legislation, a best practice approach considers the broader privacy implications and risks beyond compliance, including whether a planned use of personal information will be acceptable to the community.[19]
Some privacy risks that may be relevant in the context of generative AI include the following:
- Individuals losing control over their personal information: Technologies such as generative AI are trained on large amounts of public data, including the personal information of individuals, which is likely to be collected without their knowledge and consent.[20] It can be difficult for individuals to identify when their personal information is used in AI systems and to request the correction or deletion of this information.
- Bias and discrimination: As AI systems learn from source data which may contain inherent bias, this bias may be replicated in their outputs through inferences made based on gender, race or age and have discriminatory effects.[21] AI outputs can often appear credible even when they produce errors or false information.
- Other inaccuracies: Issues in relation to accuracy or quality of the training data (including as a result of data poisoning)[22] and the predictive nature of generative AI models can lead to outputs that are inaccurate but appear credible. [23] Feedback loops can cause the accuracy and reliability of an AI model to degrade over time.[24] Inaccuracies in output can have flow on consequences that depend on the context, including reputational harm, misinformation or unfair decisions.
- Lack of transparency: AI can make it harder for entities to understand and explain how personal information is used and how decisions made by AI systems are reached.
- Re-identification: The use of aggregated data drawn from multiple datasets also raises questions about the potential for individuals to be re-identified through the use of AI and can make it difficult to de-identify information in the first place.[25]
- Misuse of generative AI systems: The capabilities of generative AI models can be misused through malicious actors building AI systems for improper purposes, or the AI model or end users of AI systems misusing them, with potential impacts on individual privacy or broader negative consequences including through:[26]
- Generating disinformation at scale, such as through deepfakes
- Scams and identity theft
- Generating harmful or illegal content, such as image-based abuse, which can be facilitated through the accidental or unintended collection and use of harmful or illegal material, such as child sexual abuse material, to train AI systems[27]
- Generating harmful or malicious code that can be used in cyber attacks or other criminal activity.[28]
- Risk of disclosure of personal information through a data breach involving the training dataset or through an attack on the model: The vast amounts of data collected and stored by generative AI may increase the risks related to data breaches, especially when individuals disclose particularly sensitive data in their conversations with generative AI chatbots because they are not aware it is being retained or incorporated into a training dataset.[29] This could be through unauthorised access to the training dataset or through attacks designed to make a model regurgitate its training dataset.[30]
A developer may find it difficult to assess the privacy impacts of an AI model or system before developing it as it may not have all the information it requires to make a fulsome assessment. This is particularly so where the AI model developed is general purpose, where the purpose can be quite broad. For this reason:
- developers should consider the privacy risks through the PIA process to the extent possible early, including by reference to general risks
- PIAs should be an ongoing process so that as more information is known developers can respond to any changing risks
- it is important for developers that are fine-tuning models and deployers to consider any additional privacy risks from the intended use.[31]
Where developers build general purpose AI systems or structure their AI systems in a way that places the obligation on downstream users of the system to consider privacy risks, the OAIC suggests they provide any information or access necessary for the downstream user to assess this risk in a way that enables all entities to comply with their privacy obligations. However, as a matter of best practice, developers should err on the side of caution and assume the Privacy Act applies to avoid regulatory risk.
Practical tips – privacy by design
Take a privacy-by-design approach early in the planning process.
Conduct a PIA to identify the impact that the project might have on the privacy of individuals and then take steps to manage, minimise or eliminate that impact.
Accuracy when training AI models (APP 10)
Accuracy risks for generative AI models
Generative AI models carry inherent risks as to their accuracy due to the following factors:
- They are often trained on huge amounts of data sourced from across the internet, which is highly likely to include inaccuracies and be impacted by unfounded biases.[32] The models can then perpetuate and amplify those biases in their outputs.
- The probabilistic nature of generative AI (in which the next word, sub-word, pixel or other medium is predicted based on likelihood) and the way it tokenises input can generate hallucinations. For example, without protective measures, an LLM asked how many ‘b’s are in ‘banana’ will generally state there are two or three, both because the training data is weighted with instances of people asking how many ‘a’s or ‘n’s are in ‘banana’ and because of the way it tokenises words rather than letters.[33]
- The accuracy and reliability of an AI model is vulnerable to deterioration over time. This can be caused by the accumulation of errors and misconceptions across successive generations of training, or by a model’s development on training data obtained up to a certain point in time, which eventually becomes outdated.[34]
- An LLM’s reasoning ability declines when it encounters a scenario or task that differs from what is in its training data.[35]
These risks can be compounded by the tendency of generative AI tools to confidently produce outputs which appear credible, regardless of their accuracy.
Privacy obligations regarding accuracy
Developers of generative AI models must be aware of these risks and how they may impact on their ability to comply with privacy obligations. In particular, APP 10 requires developers to take reasonable steps to ensure that:
- the personal information they collect is accurate, up-to-date and complete; and
- the personal information they use and disclose is accurate, up-to-date, complete and relevant, having regard to the purpose of the use or disclosure.
The reasonable steps that a developer must take will depend on circumstances that include the sensitivity of the personal information, the nature of the developer, and the possible adverse consequences for an individual if the quality of personal information is not ensured.[36] In the context of generative AI, there will be a strong link to the intended purpose of the AI model. For example, there is a lesser privacy risk associated with an AI system intended to generate songs than one that will be used to summarise key points for insurance claims to allow assessors to assess a claim from the key points. Particular care should be taken where generative AI systems will be used for high privacy risk uses, for example use in decisions that will have a legal or similarly significant effect on an individual’s rights. In these circumstances careful consideration should be given to the development of the model and more extensive safeguards will be appropriate.
How a developer intends the outputs to be used will also be relevant. If the developer does not intend for the output to be treated as factually accurate information about an individual but instead as probabilistic guesses about something that may be true, 100% statistical accuracy may not be required. However, the developer will need to clearly communicate the limits and intended use of the model. Examples of reasonable steps to ensure accuracy are provided below.
Practical tips – examples of reasonable steps to ensure accuracy
Developers should consider the reasonable steps they will need to take to ensure accuracy at the planning stage. The reasonable steps will depend on the circumstances but may require the developer to:
- Take steps to ensure the training data, including any historical information, inferences, opinions, or other personal information about individuals, having regard to the purpose of training the model, is accurate, factual and up-to-date.
- Understand and document the impact that the accuracy of the training data has on the generative AI model outputs.
- Clearly communicate any limitations in the accuracy of the generative AI model or guidance on appropriate usage, including whether the dataset only includes information up to a certain date, and through the use of appropriate disclaimers.
- Have a process by which a generative AI system can be updated if the developer becomes aware that the information used for training, or being output, is incorrect or out-of-date.
- Implement diverse testing methods and measures to assess and mitigate risk of biased or inaccurate output prior to release.
- Implement a system to tag content as AI-generated, for example, the use of watermarks on images or video content (a simple illustration follows after this list).
- Consider what other steps are needed to address the risk of inaccuracy such as fine-tuning, allowing AI systems built on the generative AI model to access and reference knowledge databases when asked to perform tasks to help improve its reliability, restricting user queries, using output filters or implementing accessible reporting mechanisms that enable end-users to provide feedback on any inaccurate information generated by an AI system.[37]
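To illustrate the content-tagging item above, the sketch below embeds a simple provenance note in a PNG file’s metadata using the Pillow imaging library. The key names are assumptions for the example; a production system would more likely rely on robust watermarking or an established content-provenance standard, since plain metadata can be stripped.

```python
# Illustrative only: embeds a simple provenance note in PNG metadata so that
# downstream systems can detect AI-generated content. Assumes the Pillow
# library is installed (pip install Pillow); the metadata key names are
# hypothetical and real deployments would use sturdier provenance mechanisms.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def tag_as_ai_generated(src_path: str, dst_path: str, model_name: str) -> None:
    """Save a copy of the image with metadata marking it as AI-generated."""
    metadata = PngInfo()
    metadata.add_text("ai_generated", "true")
    metadata.add_text("generator_model", model_name)
    with Image.open(src_path) as img:
        img.save(dst_path, pnginfo=metadata)

def is_tagged_ai_generated(path: str) -> bool:
    """Check the PNG text chunks for the AI-generated flag."""
    with Image.open(path) as img:
        return getattr(img, "text", {}).get("ai_generated") == "true"
```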
Privacy considerations when collecting and processing the training dataset
Generative AI requires large datasets to train the model. In addition, more targeted datasets may be used to fine-tune a model. However, just because data is accessible does not automatically mean that it can be used to train generative AI models. The collection and use of data for model training purposes can present substantial privacy risks where personal information or sensitive information is handled in this process. As outlined above, the Privacy Act regulates the handling of personal information by APP entities, and will apply when a dataset contains personal information. It is therefore important for developers to ensure they comply with privacy laws.
As an initial step, developers need to actively consider whether the dataset they intend to use for training a generative AI model is likely to contain personal information. It is important to consider the data in its totality, including the data,[38] the associated metadata, and any annotations, labels or other descriptions attributed to the data as part of its processing. Developers should be aware that information that would not be personal information by itself may become personal information in combination with other information.[39]
If there is a risk the dataset contains some personal information, developers must consider whether their collection of the dataset complies with privacy law or take appropriate steps to ensure there is no personal information in the dataset. This section sets out considerations for different methods of compiling a dataset:
- Data scraping (see ‘Collection obligations (APP 3)’): Data scraping generally involves the automated extraction of data from the web, for example collecting profile pictures or text from social media sites. This may arise in a range of circumstances, including where a developer uses web crawlers to collect data or where they follow a series of links in a third party dataset and download or otherwise collect the data from the linked websites.
- Collecting a third party dataset (see ‘Collection obligations (APP 3)’): Another common source of data for training is datasets collected by third parties. This includes licensed datasets collected by data brokers, datasets compiled by universities and freely available datasets compiled from scraped or crawled information (e.g. Common Crawl).
- Using a dataset you (or the organisation you are developing the model for) already hold (see ‘Use and disclosure obligations (APP 6)’).
Using de-identified information
Developers using de-identified information will need robust de-identification governance processes.[40] As part of this, developers should be aware that de-identification is context dependent and may be difficult to achieve. In addition, developers seeking to use de-identified information to train generative AI models should be aware of the following:
- De-identifying personal information is a use of the personal information for a secondary purpose.[41]
- If a developer collects information about identifiable individuals with the intention of de-identifying it before model training, that is still a collection of personal information and the developer needs to comply with their obligations when collecting personal information under the Privacy Act (including deleting sensitive information).
Data minimisation when collecting and processing datasets
The Privacy Act requires developers to collect or use only the personal information that is reasonably necessary for their purposes.[42] An important aspect of considering what is ‘reasonably necessary’ is first specifying the purpose of the AI model. This should be a current purpose rather than collecting information for a potential, undefined future AI product. Once the purpose is established, developers should consider whether they could train the AI model without collecting or using the personal information, by collecting or using a lesser amount (or fewer categories) of personal information or by de-identifying the personal information.
Example – what personal information is reasonably necessary?
The purpose of the AI model or eventual AI system will inform what information needs to be included in the training dataset. For example, a generative AI model being developed to analyse brain scans and prepopulate a preliminary report will likely need examples of brain scans and their associated medical reports to be included in the training data, but consideration should be given to whether other information, for example details identifying the patient in the metadata or annotations, can be removed.
Practical tips – ways to minimise personal information
When training a generative AI model, developers can minimise the personal information they are using by:
- limiting the information at the collection stage through collection criteria such as:
  - ensuring that certain sources are excluded from the collection criteria, such as public social media profiles, websites that are known to contain large amounts of personal information or websites that collect sensitive information by nature.
  - excluding certain categories of data from collection criteria to prevent the collection of personal information or sensitive information.
  - limiting collection through other criteria such as time periods.
- limiting annotations to what is necessary to train the model.
- removing or ‘sanitising’ personal information after it has been collected, but before it is used for model training purposes. This can occur through using a filter or other process to identify and remove personal information from the dataset (a minimal illustration is sketched below).
The OAIC notes that these privacy-enhancing tools and technologies, including de-identification techniques, can be helpful risk-reduction strategies and assist in complying with APP 3.
However, as a matter of best practice developers should still err on the side of caution and treat data as personal information where there is any doubt to avoid regulatory risk. Even if a developer uses collection criteria and data sanitisation techniques it will still need to consider compliance with its other obligations under the Privacy Act.
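The sketch below pulls together the collection-criteria and sanitisation ideas above in a deliberately simplified form. The excluded domains and regular expressions are assumptions for illustration only; pattern-based redaction of this kind would not, on its own, reliably remove all personal information and is no substitute for the broader measures described in this guidance.

```python
# Minimal illustrative sketch of collection criteria and data sanitisation
# before text enters a training dataset. The domains and regex patterns are
# hypothetical and would not reliably catch all personal information.
import re

EXCLUDED_DOMAINS = {"socialmedia.example", "healthforum.example"}  # hypothetical
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")

def include_record(source_url: str) -> bool:
    """Collection criterion: skip sources known to be rich in personal information."""
    domain = source_url.split("/")[2] if "//" in source_url else source_url
    return domain not in EXCLUDED_DOMAINS

def sanitise(text: str) -> str:
    """Redact obvious identifiers before the text enters the training dataset."""
    text = EMAIL_RE.sub("[REDACTED EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED PHONE]", text)
    return text

raw_records = [
    {"url": "https://news.example/article",
     "text": "Contact jane@example.com or +61 400 000 000."},
    {"url": "https://socialmedia.example/profile/123",
     "text": "Personal profile page."},
]
training_texts = [sanitise(r["text"]) for r in raw_records if include_record(r["url"])]
print(training_texts)
```

Even with criteria and filters of this kind in place, as the guidance notes, developers still need to consider their other obligations under the Privacy Act.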
Collection obligations (APP 3)
Collecting data through data scraping
The Privacy Act requires personal information to be collected directly from the individual unless it is unreasonable or impracticable to do so.[43] As scraped data is not collected directly from the individual, developers will need to consider whether their collection of personal information meets this test.
Personal information must also be collected by lawful and fair means. Examples of collections that are unlawful include collecting information in breach of legislation, that would constitute a civil wrong or would be contrary to a court or tribunal order.[44] Fairness is an open-textured and evaluative criterion. A fair means of collection is one that does not involve intimidation or deception, and is not unreasonably intrusive. Generally, it will be unfair to collect personal information covertly without the knowledge of the individual, although this will depend on the circumstances.[45]
Given the challenges for robust notice and transparency measures (discussed further below), the creation of a dataset through scraped data is generally a covert method of collection.[46] Whether this renders the collection unfair will depend on the circumstances such as:
- what the individual would reasonably expect;
- the sensitivity of the personal information;
- the intended purpose of the collection, including the intended operation of the AI model;
- the risk of harm to individuals as a result of the collection;
- whether the information being collected was intentionally made public by the individual the information is about (as opposed to another person who posted or otherwise published the information online); and
- the steps the developer will take to prevent privacy impacts, including mechanisms to delete or de-identify personal information or provide individuals with control over the use of their personal information.[47]
Caution – Developers should not assume information posted publicly can be used to train models
Developers should not assume that the collection of information that has been made public on the internet complies with APP 3.5. Personal information that is publicly available may have been made public by a third party, and even if it was made public by the individual themselves, they may not expect it to be collected and used to train an AI model. Collecting it for the purposes of training generative AI may therefore not meet the requirements of APP 3.5.
Sensitive information
Further protections apply to sensitive information under the Privacy Act, which developers must not collect without consent unless an exception applies.
What is sensitive information?
Sensitive information is biometric information that is to be used for the purposes of automated biometric verification or biometric identification, biometric templates, health information about an individual, genetic information about an individual, or personal information about an individual concerning matters such as racial or ethnic origin, political opinions or sexual orientation.[48]
Information may be sensitive information where it clearly implies one of these matters.[49] For example, an image of a person is sensitive information where it implies one of the categories of sensitive information or will be used for automated identification. Labels or annotations assigned to images that include sensitive information such as race or whether someone has a disability will also be sensitive information.
If no exceptions apply, developers require valid consent to collect sensitive information under the APPs.[50] When developers scrape data, there are challenges with obtaining valid consent to the collection of sensitive information. In particular, consent under the Privacy Act must be express or implied consent.[51] Express consent is given explicitly, while implied consent is consent that may reasonably be inferred in the circumstances from the conduct of the individual.[52] Generally, developers should not assume that an individual has given consent on the basis that they did not object to a proposal to handle information in a particular way.[53] For example, a failure of a website to implement measures to prevent data scraping should not be taken as implied consent. Developers must also be conscious that information on publicly accessible websites may have been uploaded by a person other than the individual the information is about.
If consent cannot be obtained and no exception applies, developers must delete sensitive information from the dataset.
Example – consent when scraping profile pictures from social media
A developer collects public profile pictures from a social media website. The profile pictures are of sufficient quality to allow the individual/s in the pictures to be recognised and to allow inferences to be made about race or medical conditions due to physical characteristics displayed in the picture. The developer will need to have consent to collect the images unless another exception to APP 3.3 applies.
Practical tips – checks when collecting through data scraping
Developers collecting a data set through data scraping should:
- consider whether personal or sensitive information is in the data to be collected.
- consider the data minimisation techniques set out earlier, including whether personal information collected can be de-identified.
- once the data is collected, consider whether any unnecessary data can be deleted.
- determine whether they have valid consent for the collection of sensitive information or delete sensitive information.
- consider whether their means of collection is lawful and fair based on the circumstances.
- consider whether they could collect the data directly from individuals rather than indirectly from third party websites.
- consider what changes to privacy policies and collection notices are needed to comply with their notice and transparency obligations (see section on Notice and transparency obligations).
Collecting a third party dataset
When developers obtain a third party dataset that contains personal information, this will generally constitute a collection of personal information by the developer from the third party. As such, developers must consider whether they are complying with their privacy obligations in collecting and using this information.[54] Similar considerations to those specified for data scraping will be relevant. The circumstances of how the dataset was collected will be important to consider in this context. In particular:
- the steps the third party took when collecting the information to inform individuals that it would provide their personal information to others for them to use to train generative AI models will impact whether the collection by the developer is lawful and fair; and
- whether the third party obtained valid consent on behalf of the developer for the collection of sensitive information will be relevant to whether the developer can collect sensitive information or must delete it from the dataset.
In addition to deleting sensitive information from the dataset, developers may need to de-identify personal information prior to using the dataset.
Where possible, the OAIC recommends that developers seek information or assurances from third parties in relation to the dataset including through:
- contractual terms that the collection of personal information and its disclosure by the third party for the purposes of the developer training a generative AI model does not breach the Privacy Act 1988 (Cth), or, where the third party is not subject to the Privacy Act, would not do so if they were subject to the Privacy Act;
- requesting information about the source of any personal information; and
- requesting copies of what information was provided to individuals about how their personal information would be handled.
Where datasets of scraped or crawled content are made publicly available by a third party it may not be possible to amend the terms and conditions attached to the use of that data to require compliance with the Privacy Act. Where this is the case, developers must carefully consider the information provided about the data sources and compilation process for the dataset in the context of the privacy risks associated with data scraping set out in the previous section. As set out above, it may be necessary to delete sensitive information and delete or de-identify personal information prior to using the dataset.
Practical tips – checks when collecting a third party dataset
Developers collecting a third party data set should:
- consider whether personal or sensitive information is in the data to be collected.
- consider the data minimisation techniques set out earlier, including narrowing the personal information they collect to what is reasonably necessary to train the model.
- seek and consider information about the circumstances of the original collection, including its data sources and notice provided.
- if possible, seek assurances from the third party in relation to the provenance of the data and circumstances of collection.
- consider whether their means of collection is lawful and fair based on the circumstances.
- consider whether the data could be collected directly from the individual rather than from a third party.
- consider whether they have valid consent to collect any sensitive information; if they do not, and no other exception applies, they will need to delete it.
- consider what changes to their privacy policies and collection notices (or the third party’s collection notices) are needed to comply with their notice and transparency obligations (see section on notice and transparency).
If the developer cannot establish that their collection and use comply with their privacy obligations, it may be necessary, in order to manage privacy risk, to delete sensitive information and to delete or de-identify personal information prior to using the dataset.
Use and disclosure obligations (APP 6)
Developers may wish to use personal information they already hold to train a generative AI model, such as information they collected through operating a service, interactions with an AI system or a dataset they compiled for training an earlier AI model. However, before they do so, developers will need to consider whether that use is permitted under the Privacy Act. The considerations for reuse of data will also be relevant where an organisation is providing information to a developer to build a model for them.
When is information publicly available on a website ‘held’?
A developer, or organisation providing information to a developer, holds personal information if it has possession or control of a record that contains the personal information.[55] For the purposes of the Privacy Act, a record does not include generally available publications.[56] However, a website is not a generally available publication merely because anybody can access it. Instead, this will depend on consideration of a range of factors, including the circumstances, the nature of the information, the prominence of the website, the likelihood of access and the steps needed to obtain that access.[57] Developers and organisations providing information to developers should therefore exercise caution before claiming that they do not hold personal information merely because it is available on a public-facing website.
Using or disclosing personal information for a secondary, AI-related purpose
Organisations, including developers, can only use personal information or disclose it to third parties for the primary purpose or purposes for which it was collected unless they have consent or an exception applies.[58] The primary purpose or purposes are the specific functions or activities for which particular personal information is collected.[59] For example, where information is collected for the purpose of providing a particular service such as a social media service, providing that specific social media service will be the primary purpose even where the provider of the service has additional secondary purposes in mind.
Consent
Consent under the Privacy Act means express or implied consent and has the following four elements:
- the individual is adequately informed before giving consent;
- the individual gives consent voluntarily;
- the consent is current and specific; and
- the individual has the capacity to understand and communicate their consent.[60]
In the context of training generative AI, developers and organisations providing personal information to developers may have challenges ensuring the individual is adequately informed. This is because training generative AI models is a form of complex data processing that may be difficult for individuals to understand. Developers training models and organisations providing personal information to developers should therefore consider how they can provide information in an accessible way. In addition, as consent must be voluntary, current and specific, they should not rely on a broad consent to handle information in accordance with the privacy policy, as consent for training generative AI models.[61]
Informing individuals about training generative AI models
Developers and organisations providing personal information to developers should consider what information will give individuals a meaningful understanding of how their personal information will be handled, so they can determine whether to give consent. This could include information about the function of the generative AI model, a general description of the types of personal information that will be collected and processed and how personal information is used during the different stages of developing and deploying the model.
Reasonable expectations
One of the exceptions in the Privacy Act permits personal information to be used for a secondary purpose if the individual would reasonably expect it to be used for that secondary purpose and the secondary purpose is related to a primary purpose. If the information is sensitive information, the secondary purpose must be directly related to a primary purpose.
The ‘reasonably expects’ test is an objective one that has regard to what a reasonable person, who is properly informed, would expect in the circumstances.[62] Relevant considerations include whether, at the time the developer or organisation providing information to the developer collected the information:
- the developer or organisation providing information to the developer provided information about use for training generative AI models through an APP 5 notice, or in its APP privacy policy; and
- the secondary purpose was a well understood internal business practice.
Whether APP 5 notices or privacy policies were updated, or other information was given at a point in time after the collection may also be relevant to this assessment. It is possible for an individual’s reasonable expectations in relation to secondary uses that are related to the primary purpose to change over time. However, particular consideration should be given to the reasonable expectations at the time of collection, given this is when the primary purpose or purposes are determined. In the context of training generative AI models, updating a privacy policy or providing notice by themselves will generally not be sufficient to change reasonable expectations regarding the use of personal information that was previously collected for a different purpose.
Even if the individual would reasonably expect their information to be used for a secondary purpose, that purpose still needs to be related (for personal information) or directly related (for sensitive information) to the primary purpose. A related secondary purpose is one which is connected to or associated with the primary purpose and must have more than a tenuous link.[63] A directly related secondary purpose is one which is closely associated with the primary purpose, even if it is not strictly necessary to achieve that primary purpose.[64] This will be difficult to establish where information collected to provide a service to individuals will be used to train a generative AI model that is being commercialised outside of the service (rather than to enhance the service provided).
Practical steps for compliance
Where a developer or an organisation providing personal information to a developer cannot clearly establish that using the personal information they hold to train a generative AI model is within reasonable expectations and related to the primary purpose, they should seek consent for that use and/or offer individuals the option of opting out of such a use. The opt-out mechanism must be accompanied by sufficient information to inform the individual about the intended use of their personal information and sufficient time to exercise the opt-out.
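As one way of operationalising the opt-out route described above, the sketch below assembles a training set only from individuals who have not opted out, and only after an assumed notice period has elapsed. The record layout, the opted_out_users set, the notice date and the 30-day window are all hypothetical; the Privacy Act does not prescribe a specific period or mechanism.

```python
from datetime import date, timedelta

# Illustrative assumptions: when individuals were informed and how long they have to opt out.
NOTICE_SENT_ON = date(2024, 11, 1)
OPT_OUT_WINDOW = timedelta(days=30)


def training_may_begin(today: date) -> bool:
    """Do not start training until individuals have had sufficient time to exercise the opt-out."""
    return today >= NOTICE_SENT_ON + OPT_OUT_WINDOW


def select_training_records(records: list[dict], opted_out_users: set[str], today: date) -> list[dict]:
    """Exclude any record belonging to a user who has opted out of AI training use."""
    if not training_may_begin(today):
        raise RuntimeError("Opt-out window has not yet closed; training data must not be assembled.")
    return [r for r in records if r["user_id"] not in opted_out_users]
```

A point worth noting in this sketch is that the window check fails loudly rather than silently, so a pipeline cannot accidentally begin training before individuals have had time to respond.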
Example 1 – social media companies reusing historic posts
A social media company has collected posts from users on their platform since 2010 for the purpose of operating the service. In 2024 they wish to use these posts to train an LLM which they will then license to others. They update their privacy policy to say that they use the information they hold to train generative AI and then proceed to use the personal information they hold going back to 2010 to train a generative AI model.
This use of personal information would not be permitted where the association with the original purpose of collection is too tenuous and the individuals would not have reasonably expected the information they provided in the past to be used to train generative AI. In these circumstances, the social media company could consider de-identifying the information before using it, or taking appropriate steps to seek consent, for example by informing impacted individuals, providing them with an easy and user-friendly mechanism to remove their posts and providing them with a sufficient amount of time to do so.
Example 2 – using prompts to a chatbot to retrain the chatbot
An AI model developer makes an AI chatbot publicly available. Prompts entered by individuals are used to provide the chatbot service. Where those prompts contain personal information, providing the chatbot service is the primary purpose of collection. The developer also wishes to use the prompts and responses to train future versions of the AI model underlying the chatbot service so it can improve over time. They provide clear information about this to individuals using the product through a sentence above the prompt input field explaining that interactions with the chatbot are used to retrain the AI model, and a more detailed explanation in their privacy policy.
As the individual was informed before using the chatbot and this use is related to the primary purpose, this secondary use of the data would likely be within users’ reasonable expectations.
Example 3 – bank providing information to a developer to develop or fine-tune an AI model for their business purposes
A bank wants to provide a developer with historic customer queries to fine-tune a generic LLM for use in a customer chatbot. The customer queries include personal information, such as names and addresses of customers. The bank should first consider whether the fine-tuning dataset needs to include the personal information or if the customer queries can be de-identified.
If the customer queries cannot be de-identified, the bank will need to consider whether APP 6 permits the customer queries to be disclosed to the third party developer. The primary purpose of collecting the personal information was to respond to the customer query. As such, the organisation will need to consider whether they have consent from the individuals the information is about, or an exception applies to use it for this secondary purpose.
To comply with APP 6, the bank provides information to its customers about its intended use of previous customer queries and provides a mechanism for its customers and previous customers to opt out of their information being used in this way. It also:
- updates its privacy policy and collection notice to make clear that personal information will be used to fine-tune a generative AI chatbot for internal use; and
- agrees other protections with the developer, such as filters to ensure personal information from the training dataset will not be included in outputs (a minimal sketch of such a filter follows).
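The filters mentioned in the second point could take many forms. As an illustration only, the sketch below redacts a fixed list of identifiers drawn from the fine-tuning data before a chatbot response is returned; the identifier values are hypothetical, and a real deployment would more likely rely on named-entity detection or similar tooling rather than a static list.

```python
import re

# Hypothetical personal identifiers known to appear in the fine-tuning dataset.
KNOWN_IDENTIFIERS = ["Jane Citizen", "42 Example Street"]


def filter_response(response: str) -> str:
    """Redact known personal identifiers before the chatbot response reaches the user."""
    for identifier in KNOWN_IDENTIFIERS:
        response = re.sub(re.escape(identifier), "[REDACTED]", response, flags=re.IGNORECASE)
    return response


# Example: filter_response("Your last query was lodged by Jane Citizen.")
# returns "Your last query was lodged by [REDACTED]."
```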
The OAIC also has guidance for entities using commercially available AI products, including considerations when selecting a commercially available AI product.
Practical tips – checks when using or disclosing personal information for a secondary, AI-related, purpose
Organisations providing information to developers and developers reusing personal information they already hold to train generative AI should:
- consider the data minimisation techniques set out earlier, including narrowing the personal information they use to what is reasonably necessary to train the model.
- consider whether the use or disclosure is for a primary purpose the personal information was collected for.
- if use or disclosure to train AI is a secondary purpose and they wish to rely on reasonable expectations:
  - consider what the reasonable expectations of the individual were at the time of collection.
  - consider what information was provided at the time of the collection (through APP 5 notification and the privacy policy).
  - consider whether training generative AI is related (for personal information) or directly related (for sensitive information) to the primary purpose of collection.
- consider if another exception under APP 6 applies.
- if an exception does not apply, seek consent for that use and offer individuals a meaningful and informed ability to opt out; the opt-out mechanism must be accompanied by sufficient information to inform the individual about the intended use of their personal information.
- consider what changes to their privacy policies and collection notices are needed to comply with their notice and transparency obligations (see section on notice and transparency).
Notice and transparency obligations (APP 1 and APP 5)
Regardless of how a developer compiles a dataset, they must:
- have a clearly expressed and up-to-date privacy policy about their management of personal information
- take such steps as are reasonable to notify or otherwise ensure the individual is aware of certain matters at, before or as soon as practicable after they collect the personal information.[65]
A developer’s privacy policy must contain the information set out in APP 1.4. In the context of training a generative AI model, this will generally include:
- information about the collection (whether through data scraping, a third party or otherwise) and how the data set will be held
- the purposes for which it collects and uses personal information, specifically that of training generative AI models
- how an individual may access personal information about them that is held by the developer and seek correction of such information, including an explanation of how this will work in the context of the dataset and generative AI model.
Similarly, organisations providing personal information to developers must include the information set out in APP 1.4 in their privacy policy. This will generally include the purposes for which they disclose personal information, specifically to train generative AI models.
It is important to note that generally, having information in a privacy policy is not a means of meeting the obligation to notify individuals under APP 5.[66] Transparency should be supported by both a notice under APP 5 and through the privacy policy.
The steps taken to notify individuals that are reasonable will depend on the circumstances, including the sensitivity of the personal information collected, the possible adverse consequences for the individual as a result of the collection, any special needs of the individual and the practicability of notifying individuals.[67] The sections below include considerations relevant to notice in common kinds of data collection for training AI models.
Notice in the context of data scraping
Data scraping may pose challenges for developers in relation to notification, particularly if they do not have a direct relationship with the individual or access to their contact details. Despite this, developers still need to consider what steps are reasonable in the circumstances. If individual notification is not practicable, developers should consider what other mechanisms they can use to provide transparency to affected individuals, such as making information publicly available in an accessible manner.[68]
The intention of APP 5 is to explain in clear terms how personal information is collected, used and disclosed in particular circumstances. One matter that is particularly relevant to data scraping is providing information about the facts and circumstances of the collection. This should include information about the categories of personal information used to develop the model, the kinds of websites that were scraped and, if possible, the domain names and URLs. Given the lack of transparency when information is scraped, it is good practice for developers to leave a reasonable amount of time between taking steps to inform individuals about the collection and training the generative AI model.
Notice in the context of third party data
Where the dataset was collected by a third party, that third party may have provided notice of relevant matters under APP 5 to the individual. The developer will need to consider what information has been provided to the individual by the third party so that the developer can assess whether it has taken such steps as are reasonable to ensure the individual is aware of relevant matters under APP 5.[69] This includes whether information was provided about the circumstances of the developer’s collection of personal information from the third party.
Notice when using personal information for a secondary purpose
As set out above, whether an APP 5 notice stated that personal information will be used for the purpose of training generative AI models at the time of collection is relevant to whether this was within an individual’s reasonable expectations. To form a reasonable expectation, this information needs to be provided in an easily accessible way that is clearly explained. Developers and organisations providing personal information to developers should explicitly refer to training artificial intelligence models rather than relying on broad or vague purposes such as research.
Practical tips – notice and transparency
Developers and organisations providing information to developers should:
- ensure their privacy policy and any APP 5 notifications are clearly expressed and up-to-date. Do they clearly indicate and explain the use of the data for AI training purposes?
- take steps to notify affected individuals or ensure they are otherwise aware that their data has been collected.
- where data scraping has been used and individual notification is not possible, consider other ways to provide transparency such as through publishing and promoting a general notification on their website.
Other considerations
This guidance focuses on compliance with APPs 1, 3, 5, 6 and 10 of the Privacy Act, but is not an exhaustive consideration of all the privacy obligations that will be relevant when developing or fine-tuning a generative AI model. The application of the Privacy Act will depend on the circumstances, but developers may also need to consider:
- how to manage overseas data flows where datasets or other kinds of personal information are disclosed overseas (APP 8);
- how they can design their generative AI model or system and keep records of where their data was sourced from in a way that enables compliance with individual rights of access and correction, and the consequences of withdrawal of consent (APPs 12 and 13); and
- how to appropriately secure training datasets and AI models, and when datasets should be destroyed or de-identified (APP 11).
Checklist – privacy considerations when developing or training an AI model
| Privacy issue | Considerations |
| --- | --- |
| Can obligations to implement practices, procedures and systems to ensure APP compliance be met? | Has a privacy by design approach been adopted? Has a privacy impact assessment been completed? |
| Have reasonable steps been taken to ensure accuracy at the planning stage? | Is the training data accurate, factual and up to date considering the purpose it may be used for? What impact will the accuracy of the data have on the model’s outputs? Are the limitations of the model communicated clearly to potential users, for example through the use of appropriate disclaimers? Can the model be updated if training data becomes inaccurate or out-of-date? Have diverse testing methods or other review tools or controls been implemented to identify and remedy potential accuracy issues? Has a system to tag content been implemented (e.g. use of watermarks)? Have any other steps to address the risk of inaccuracy been taken? |
| Can obligations in relation to collection be met? | Is there personal or sensitive information in the data to be collected? Could de-identified data be used, or other data minimisation techniques integrated to ensure only necessary information is collected or used? Once collected, has the data been reviewed and any unnecessary data been deleted? Is there valid consent for the collection of any sensitive information or has sensitive information been deleted? Is the means of data collection lawful and fair? Could the data be collected directly from individuals rather than from a third party? Where data is collected from a third party, has information been sought about the circumstances of the original collection (including notification and transparency measures) and have these circumstances been considered? Have assurances been sought from relevant third parties in relation to the provenance of the data and circumstances of collection? If scraped data has been used, have measures been undertaken to ensure this method of collection complies with privacy obligations? |
| Can obligations in relation to use and disclosure be met? | What is the primary purpose the personal information was collected for? Where the AI-related use is a secondary purpose, what were the reasonable expectations of the individual at the time of collection? What was provided in the APP 5 notification (and privacy policy) to individuals at the time of collection? Have any new notifications been provided? Does consent need to be sought for the secondary AI-related use, and/or could a meaningful and informed opt-out be provided as an option? |
| Have transparency obligations been met? | Are the privacy policy and any APP 5 notifications clearly expressed and up-to-date? Do they clearly indicate and explain the use of the data for AI training purposes? Have steps been taken to notify affected individuals that their data has been collected? Where data scraping has been used and individual notification is not possible, has general notification been considered? |