
The Showdown Between Data Ownership and A.I.

Have you ever wondered who owns the digital content you post online? Well, a growing number of people are concerned about how A.I. systems are using their online data without asking for permission. This includes fan fiction writers, actors, social media sites, and news outlets. They’re all pushing back against A.I. companies that use their data without consent. Let’s look into this emerging resistance against A.I. and the different forms of protest taking place.

A Surge of Data Protests

In recent years, we’ve seen a boom in the development and use of artificial intelligence (A.I.) systems. These systems, which use generative A.I. and large language models, can create super realistic content, whether it’s text, images, or other forms of media. But to do this, these A.I. models need lots of data, and they often get this from sources like fan fiction, news articles, and other online collections.

Big tech companies like Google, Meta, and OpenAI are leading the way in using data taken from the internet to train their A.I. systems. This practice, known as “scraping,” involves automatically collecting publicly available information from websites, usually without asking for permission. Although scraping has long been common, people have only recently started to pay attention to it, especially with the rise of A.I. technologies like ChatGPT.
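To make the idea of "asking for permission" concrete, here is a minimal sketch of how a crawler could check a site's robots.txt file before collecting a page. This is an illustration and not something the article itself describes: robots.txt is a voluntary convention, the crawler name `ExampleAIBot`, the sample rules, and the `may_fetch` helper are all hypothetical, and nothing here reflects how any particular A.I. company actually operates.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: the site opts out of one crawler
# ("ExampleAIBot", an invented name) and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules allow this crawler to fetch the URL."""
    parser = RobotFileParser()
    # Parse the rules from text so the example runs without network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/stories/1"
    print("ExampleAIBot allowed:", may_fetch("ExampleAIBot", page))
    print("OtherBot allowed:", may_fetch("OtherBot", page))
```

A crawler that honors these rules would skip the page when `may_fetch` returns False. Because compliance is voluntary, a check like this only helps when the crawler's operator chooses to respect it.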

A Creative Uprising

One of the groups fighting back against A.I.’s data usage is the community of fan fiction writers. These writers, who have spent years creating stories based on their favorite franchises, were shocked to learn that their work had been copied and used to train A.I. models. Kit Loffstadt, a writer from South Yorkshire in Britain, decided to take action. She stopped sharing her stories online and helped organize a rebellion against A.I. systems.

The rebellion involved creating and sharing lots of silly and irrelevant stories to confuse the data-collection services that feed A.I. technology. Loffstadt and other fan fiction writers argue that their creative work shouldn’t be used by machines without their permission. They want to protect their work and have their rights as creators recognized.

A Multi-Front Fight

The fight against A.I. systems isn’t just about fan fiction writers. Social media companies, news organizations, authors, and even actors are joining in. Each group is using its own tactics to resist and challenge the ways A.I. uses its data.

Writers and artists have started locking their files or boycotting platforms that host A.I.-generated content. Social media sites, like Reddit and Twitter, are thinking about charging for access to their data, realizing its value. There have also been lawsuits against A.I. companies, accusing them of copyright infringement and unauthorized use of creative work.

These protests are a sign that people are becoming more aware of the value of online information. The days of easy data access through scraping may be ending, which will force tech companies to rethink their data collection strategies. While big companies like Google and Microsoft have large data repositories, smaller A.I. companies and nonprofits might have a harder time getting enough content to train their systems.

Data’s Changing Worth

This shift in how data is valued is central to these protests. In the past, data was considered most valuable when it was freely available and could be used to sell advertising. With A.I., that has changed: companies are realizing that by protecting their data and using it as input for A.I. systems, they can get even more value from it.

As the protests continue, A.I. companies are facing difficult questions about ethics and ownership. OpenAI, which created ChatGPT, says that its training data comes from licensed and publicly available content and that it respects creators’ and authors’ rights. Google, Meta, and other big companies are talking about how to handle content in the future, recognizing the need for a thriving content ecosystem.

Legal Fights and What They Mean for the Future

The data protests have led to a wave of lawsuits against A.I. companies. These lawsuits argue that copyrighted material has been used without permission. But legal experts suggest that these arguments might have a tough time in court. The future of A.I. and how it relates to data ownership will likely be shaped by these legal fights.

Larger companies are pushing back against A.I. scrapers in response to the protests. Reddit, for example, has said it wants to charge for access to its huge database of conversations, so it can keep the value that comes from its data. Stack Overflow, a site for programmers, also plans to ask A.I. companies to pay to access its data.

News outlets are also taking a stand against A.I. systems. Memos from big publications like The New York Times stress the need for A.I. companies to respect intellectual property. Meanwhile, artists and writers are reconsidering where they publish their work, looking for sites that protect against data scraping and A.I.-generated content.

Wrapping Up

The fight against A.I. companies using online content without permission has sparked a wave of data protests. Fan fiction writers, actors, social media companies, and news outlets are leading this movement. They’re using different strategies, like posting distracting content on the internet or filing lawsuits, to protect their work and challenge the unauthorized use of their data.

As A.I. technology advances, data is becoming more valuable. Tech giants and industry leaders are grappling with the ethical questions raised by data scraping and are looking for new ways to manage data. As the legal landscape changes and the fight for data ownership continues, it’s uncertain what the future holds for A.I. and content creators.

Frequently Asked Questions

1. What is data scraping and why is it a problem?

Data scraping is the automated extraction of information from websites, usually without the explicit consent of the people who created it. It’s a common way A.I. companies gather data to train their models. However, it has raised concerns because many feel that it infringes on intellectual property rights and amounts to using creative work without permission.

2. Who is fighting against data scraping?

Various groups including fan fiction writers, social media companies, news organizations, authors, and actors are taking a stand against data scraping. They argue that their creative work shouldn’t be used by machines without their permission and are seeking recognition of their rights as creators.

3. How are people resisting data scraping?

People are employing a variety of methods to resist data scraping. These include flooding the internet with irrelevant stories to confuse data-collection services, locking files, boycotting platforms that host A.I.-generated content, and filing lawsuits against A.I. companies.

4. Why is data now seen as more valuable?

Previously, data was valuable when it was freely available and could be used for advertising. However, with the advent of A.I., companies have realized that by safeguarding and using their data as an input for A.I. systems, they can extract even greater value. This realization has sparked a reevaluation of data’s worth.

5. What are the implications of the legal battles against A.I. companies?

The legal battles against A.I. companies could reshape the relationship between A.I. and data ownership. Depending on the outcomes of these cases, there could be new laws or regulations that govern how data is collected and used by A.I. systems.

6. How are companies responding to these data protests?

Larger companies like Reddit and Stack Overflow are considering charging A.I. companies for access to their data. Others like Google, Meta, and OpenAI are engaging in discussions on how to manage content in the future, recognizing the need to respect creators’ and authors’ rights.

Material From The New York Times