AI models using images of Australian children without their (meaning their parents’) consent

July 6, 2024

AI models and programs, such as facial recognition technology, need vast troves of data on individuals to be effective. Hoovering up that data has been the business model of companies in those fields. The level of discrimination in what is collected has been low, and respect for the privacy of those whose data is used has generally been non-existent. This is highlighted in an ABC piece, The world’s biggest AI models were trained using images of Australian kids, and their families had no idea.

The genesis of the ABC story is a report by Human Rights Watch titled Australia: Children’s Personal Photos Misused to Power AI Tools.

The article provides:

In short:

Images of Australian children were found in a dataset called LAION-5B, which is used to train AI.

The images have since been removed from the dataset, but AI models are unable to forget the material they’re trained on, so it’s still possible for them to reproduce elements of those images, including faces, in their outputs.

What’s next?

The federal government is expected to unveil proposed changes to the Privacy Act next month, including specific protections for children online.

The privacy of Australian children is being violated on a large scale by the artificial intelligence (AI) industry, with personal images, names, locations and ages being used to train some of the world’s leading AI models.

Researchers from Human Rights Watch (HRW) discovered the images in a prominent dataset, including a newborn baby still connected to their mother by an umbilical cord, preschoolers playing musical instruments, and girls in swimsuits at a school sports carnival.

“Ordinary moments of childhood were captured and scraped and put into this dataset,” said Hye Jung Han, a children’s rights and technology researcher at HRW.

“It’s really quite scary and astonishing.”

Students targeted by deepfake imagery

Cyber safety experts say the ability to create fake sexually explicit images has become frighteningly easy in the last couple of years.

The images were found in LAION-5B, a free online dataset of 5.85 billion images, used to train a number of publicly available AI generators that produce hyper-realistic images.

Researchers were investigating the AI supply chain following an incident at Bacchus Marsh Grammar School, where deepfake nude images of female students were allegedly produced by a peer, using AI.

HRW examined a sample of 5,850 images from the collection, covering a broad range of subject matter — from potatoes to planets to people — and found 190 Australian children, from every state and territory.

“From the sample that I looked at, children seem to be over-represented in this dataset, which is indeed quite strange,” Ms Han said.

“That might give us a clue [as] to how these AI models are able to then produce extremely realistic images of children.”

The images were gathered using a common automated tool called a “web crawler”, which is programmed to scour the internet for certain content.
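The harvesting step the article describes can be sketched in a few lines. This is an illustrative simplification, not LAION’s actual pipeline: the page content, URLs, and caption text below are hypothetical, and a real crawler would fetch pages over the network before parsing them.

```python
from html.parser import HTMLParser

class ImageHarvester(HTMLParser):
    """Collects (src, alt) pairs from <img> tags, the way a scraper
    harvests image URLs together with their captions."""
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            # The alt text often doubles as a caption, and may contain
            # names, ages, and locations, as HRW found.
            self.found.append((a.get("src", ""), a.get("alt", "")))

# Hypothetical page a crawler might have fetched.
page = ('<html><body><img src="https://example.com/sports-day.jpg" '
        'alt="Jane, 9, at the school swimming carnival"></body></html>')

harvester = ImageHarvester()
harvester.feed(page)
print(harvester.found)
# [('https://example.com/sports-day.jpg', 'Jane, 9, at the school swimming carnival')]
```

The point of the sketch is that the crawler keeps the link and the surrounding text together, which is how captions with identifying details end up paired with children’s photos in the dataset.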

HRW believes the images have been taken from popular photo and video-sharing sites including YouTube, Flickr, and blogging platforms, as well as sites many would presume were private.

“Other photos were uploaded [to their own websites] by schools, or by photographers hired by families,” said Hye Jung Han, adding that the images were not easily findable via search, or on public versions of the websites they came from.

Some images also came with highly specific captions, often including children’s full names, where they lived, hospitals they’d attended, and their ages when the photo was taken.

The revelations are a wake-up call for the industry, according to Professor Simon Lucey, Director of the Australian Institute for Machine Learning at the University of Adelaide.

He says AI is in a “wild west” phase.

“If there’s a dataset out there, people are going to use it,” he said.

‘The harm is already done’

According to the experts, AI models are incapable of forgetting their training data.

“The AI model has already learned that child’s features and will use it in ways that nobody can really foresee in the future,” Ms Han said.

Additionally, there’s a slim but real risk that AI image models will reproduce elements of their training data — for example, a child’s face.

“There has been quite a lot of research going into this … and it seems to be that there is some leakage in these models,” Professor Lucey said.

There are no known reports of actual children’s images being reproduced inadvertently, but Professor Lucey said the capability was there.

He believes there are certain models which should be switched off completely.

“Where you can’t reliably point to where the data has come from, I think that’s a really appropriate thing to do,” he said.

He emphasised though that there were plenty of safe and responsible ways to train AI.

“There’s so many examples of AI being used for good, whether it’s about discovering new medicines [or] things that are going to help with climate change.

“I’d hate to see research in AI stopped altogether,” he said.

Images of Australian children deleted from dataset

The dataset LAION-5B has been used to train many of the world’s leading AI models, such as Stable Diffusion and Midjourney, used by millions of people globally.

It was created by a German not-for-profit organisation called LAION.

In a statement to the ABC, a LAION spokesperson said its datasets “are just a collection of links to images available on [the] public internet”.

They said, “the most effective way to increase safety is to remove private children’s information from [the] public internet”.

In 2023, researchers at Stanford found hundreds of known images of child sexual abuse material (CSAM) in the LAION-5B dataset.

LAION took its dataset offline and sought to remove the material, before making the collection publicly available again.

LAION’s spokesperson told the ABC, “it’s impossible to make conclusions based on [the] tiny amounts of data analysed by HRW”.

The organisation has taken steps to remove the images discovered by HRW, even though they’ve already been used to train various AI generators.
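Because the dataset is a list of links with captions rather than the images themselves, “removal” amounts to filtering the reported entries out of that list. A minimal sketch (the record fields and URLs are hypothetical, not LAION’s actual schema):

```python
# Hypothetical LAION-style records: each entry is a link plus a caption,
# not the image itself.
dataset = [
    {"url": "https://example.com/mural.jpg", "caption": "Two boys painting a mural"},
    {"url": "https://example.com/cat.jpg", "caption": "A sleeping cat"},
]

# URLs reported by researchers as children's personal photos.
reported = {"https://example.com/mural.jpg"}

# Removal is just a filter over the link list; models already trained
# on the full dataset are unaffected by this step.
cleaned = [entry for entry in dataset if entry["url"] not in reported]
print(len(cleaned))  # 1
```

This is why removal from the dataset does not undo the training that has already happened: the filter changes future copies of the link list, not existing models.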

“We can confirm that we remove all the private children data [sic] reported by HRW.”

HRW didn’t find any new instances of child sexual abuse imagery in the sample it examined, but said the inclusion of children’s images was a risk in its own right.

“The AI model is able to combine what it learns from those kinds of [sexualised] images, and… images of real Australian kids,” Ms Han said.

“[It] essentially learns from both of those concepts … to be able to then produce hyper-realistic images of Australian kids, in sexualised poses.”

‘We need governments to stand up for the community’

While the use of children’s data to train AI might be concerning, experts say the legalities are murky.

“There are very, very few instances where a breach of privacy leads to regulatory action,” said Professor Edward Santow, a former Human Rights Commissioner and current Director at the Human Technology Institute.

It’s also “incredibly difficult” for private citizens who might want to take civil action, he said.

“That’s one of the many reasons why we need to modernise Australia’s Privacy Act,” he said.

The federal government is expected to unveil proposed changes to the Act next month, including specific protections for children online.

Professor Santow said it was a long-overdue update for a law that was mostly written “before the internet was created”.

“We have a moment now where we need governments to really stand up for the community … because pretty soon in the next year or two, that moment will have passed,” he said.

“These [AI] models will all have been created and there’ll just be no easy way of unpicking what has gone wrong.”

HRW is also calling for urgent law reform.

“These things are not set in stone… it is actually possible to shape the trajectory of this technology now,” Ms Han said.

The Human Rights Watch piece provides:

(Sydney) – Personal photos of Australian children are being used to create powerful artificial intelligence (AI) tools without the knowledge or consent of the children or their families, Human Rights Watch said today. These photos are scraped off the web into a large data set that companies then use to train their AI tools. In turn, others use these tools to create malicious deepfakes that put even more children at risk of exploitation and harm.

“Children should not have to live in fear that their photos might be stolen and weaponized against them,” said Hye Jung Han, children’s rights and technology researcher and advocate at Human Rights Watch. “The Australian government should urgently adopt laws to protect children’s data from AI-fueled misuse.”

Analysis by Human Rights Watch found that LAION-5B, a data set used to train popular AI tools and built by scraping most of the internet, contains links to identifiable photos of Australian children. Some children’s names are listed in the accompanying caption or the URL where the image is stored. In many cases, their identities are easily traceable, including information on when and where the child was at the time their photo was taken.

One such photo features two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colorful mural. The accompanying caption reveals both children’s full names and ages, and the name of the preschool they attend in Perth, in Western Australia. Information about these children does not appear to exist anywhere else on the internet.

Human Rights Watch found 190 photos of children from all of Australia’s states and territories. This is likely to be a significant undercount of the amount of children’s personal data in LAION-5B, as Human Rights Watch reviewed fewer than 0.0001 percent of the 5.85 billion images and captions contained in the data set.
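The sample proportion is easy to verify as a back-of-envelope check, assuming the 5,850-image sample the ABC article reports: 5,850 out of 5.85 billion is one in a million, i.e. 0.0001 percent.

```python
sample = 5_850
total = 5_850_000_000  # 5.85 billion image-caption pairs in LAION-5B
fraction = sample / total
print(f"{fraction:.6%}")  # 0.000100%
```

At that sampling rate, the 190 children found would scale to a far larger number across the full dataset, which is why HRW describes its figure as a significant undercount.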

The photos Human Rights Watch reviewed span the entirety of childhood. They capture intimate moments of babies being born into the gloved hands of doctors and still connected to their mother through their umbilical cord; young children blowing bubbles or playing instruments in preschools; children dressed as their favorite characters for Book Week; and girls in swimsuits at their school swimming carnival.

The photos also capture First Nations children, including those identified in captions as being from the Anangu, Arrernte, Pitjantjatjara, Pintupi, Tiwi, and Warlpiri peoples. These photos include toddlers dancing to a song in their Indigenous language; a girl proudly holding a sand goanna lizard by its tail; and three young boys with traditional body paint and their arms around each other.

Many of these photos were originally seen by few people and previously had a measure of privacy. They do not appear to be possible to find through an online search. Some photos were posted by children or their family on personal blogs and photo- and video-sharing sites. Other photos were uploaded by schools, or by photographers hired by families to capture personal moments and portraits. Some of these photos are not possible to find on the publicly accessible versions of these websites. Some were uploaded years or even a decade before LAION-5B was created.

Human Rights Watch found that LAION-5B also contained photos from sources that had taken steps to protect children’s privacy. One such photo is a close-up of two boys making funny faces, captured from a video posted on YouTube of teenagers celebrating Schoolies week after their final exams. The video’s creator took precautions to protect the privacy of those featured in the video: Its privacy settings are set to “unlisted,” and the video does not show up in YouTube’s search results.

YouTube’s terms of service prohibit scraping or harvesting information that might identify a person, including images of their faces, except under certain circumstances; this instance appears to violate these policies. YouTube did not respond to our request for comment.

Once their data is swept up and fed into AI systems, these children face further threats to their privacy due to flaws in the technology. AI models, including those trained on LAION-5B, are notorious for leaking private information; they can reproduce identical copies of the material they were trained on, including medical records and photos of real people. Guardrails set by some companies to prevent the leakage of sensitive data have been repeatedly broken.

Moreover, current AI models cannot forget data they were trained on, even if the data was later removed from the training data set. This perpetuity risks harming Indigenous Australians in particular, as many First Nations peoples restrict the reproduction of photos of deceased people during periods of mourning.

These privacy risks pave the way for further harm, Human Rights Watch said. Training on photos of real children enables AI models to create convincing clones of any child, based on a handful of photos or even a single image. Malicious actors have used LAION-trained AI tools to generate explicit imagery of children using innocuous photos, as well as explicit imagery of child survivors whose images of sexual abuse were scraped into LAION-5B.

Likewise, the presence of Australian children in LAION-5B contributes to the ability of AI models trained on this data set to produce realistic imagery of Australian children. This substantially amplifies the existing risk children face that someone will steal their likeness from photos or videos of themselves posted online and use AI to manipulate them into saying or doing things that they never said nor did.

In June 2024, about 50 girls from Melbourne reported that photos from their social media profiles were taken and manipulated using AI to create sexually explicit deepfakes of them, which were then circulated online.

Fabricated media have always existed, but required time, resources, and expertise to create, and were largely unrealistic. Current AI tools create lifelike outputs in seconds, are often free, and are easy to use, risking the proliferation of nonconsensual deepfakes that could recirculate online forever and inflict lasting harm.

LAION, the German nonprofit organization that manages LAION-5B, confirmed on June 1 that the data set contained the children’s personal photos found by Human Rights Watch, and pledged to remove them. It disputed that AI models trained on LAION-5B could reproduce personal data verbatim. LAION said, “We urge the HRW to reach out to the individuals or their guardians to encourage removing the content from public domains, which will help prevent its recirculation.”

Mark Dreyfus, Australia’s attorney general, recently introduced a bill in parliament banning the nonconsensual creation or sharing of sexually explicit deepfakes of adults, noting that such imagery of children would continue to be treated as child abuse material under the Criminal Code. However, Human Rights Watch said that this approach misses the deeper problem that children’s personal data remains unprotected from misuse, including the nonconsensual manipulation of real children’s likenesses into any kind of deepfake.

In August, Australia’s government is set to introduce reforms to the Privacy Act, including drafting Australia’s first child data protection law, known as the Children’s Online Privacy Code. This Code should protect the best interests of the child, as recognized in the United Nations Convention on the Rights of the Child, and their full range of rights in the collection, processing, use, and retention of children’s personal data, Human Rights Watch said.

The Children’s Online Privacy Code should prohibit scraping children’s personal data into AI systems. It should also prohibit the nonconsensual digital replication or manipulation of children’s likenesses. And it should provide children who experience harm with mechanisms to seek meaningful justice and remedy.

Australia’s government should also ensure that any proposed AI regulations incorporate data privacy protections for everyone, and especially for children.

“Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable,” Han said. “Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”




