Children’s photos are being ‘illegally used to train AI’

Personal photos of Brazilian children are being used without their knowledge or consent to develop sophisticated artificial intelligence (AI) tools, according to Human Rights Watch. These images are reportedly collected from the internet and compiled into extensive datasets, which companies use to improve their AI technologies.

These tools are then reportedly used to produce harmful deepfakes, putting even more children at risk of exploitation and harm.

Hye Jung Han, a children’s rights and technology researcher and advocate at Human Rights Watch, said: “Children should not have to live in fear that their photos might be stolen and weaponized against them.”

“The government should urgently adopt policies to protect children’s data from AI-fueled misuse,” she continued, warning, “Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable.”

She added: “Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”

Are children’s images being used to train AI?

The investigation revealed that LAION-5B, a major dataset used by prominent AI applications and compiled by crawling vast amounts of online content, includes links to identifiable images of Brazilian children.

These images often include the children’s names, either in the caption or in the URL where the image is hosted. In many cases, the children’s identities can be traced, along with details of when and where the photos were taken. According to WIRED, LAION-5B has been used to train several AI models, including Stability AI’s Stable Diffusion image generator.

Human Rights Watch identified 170 photos of children from at least 10 Brazilian states (Alagoas, Bahia, Ceará, Mato Grosso do Sul, Minas Gerais, Paraná, Rio de Janeiro, Rio Grande do Sul, Santa Catarina, and São Paulo), some dating back to the mid-1990s. This figure likely represents just a fraction of the children’s personal data in LAION-5B, as the researchers reviewed less than 0.0001 per cent of the dataset’s 5.85 billion images and captions.

The organization says that at least 85 girls from some of these Brazilian states have been subjected to harassment. Their classmates used AI tools to create sexually explicit deepfakes from photos taken from the girls’ social media profiles, then circulated the manipulated images online.

LAION responds to claims

In response, LAION, the German AI nonprofit that manages LAION-5B, acknowledged that the dataset contained some of the children’s photos identified by Human Rights Watch and pledged to remove them. However, it disputed that AI models trained on LAION-5B could reproduce personal data verbatim. LAION also said that children and their guardians were responsible for removing personal photos from the internet, arguing that this was the most effective way to prevent misuse.

In December 2023, Stanford University’s Internet Observatory reported that LAION-5B contained thousands of suspected child sexual abuse images. LAION took the dataset offline and released a statement saying it “has a zero tolerance policy for illegal content”.

ReadWrite has reached out to LAION and Stability AI for comment.

Featured image: Canva / Ideogram

The post “Children’s photos are being ‘illegally used to train AI’” by Suswati Basu was published on 06/10/2024 by readwrite.com