How is AI getting our data, and what is it doing with it?
Jul 28, 2023, 1:00 PM | Updated: 2:16 pm

FILE - The logo for OpenAI, the maker of ChatGPT, appears on a mobile phone, in New York, Tuesday, Jan. 31, 2023. ChatGPT-maker OpenAI and The Associated Press said Thursday that they've made a deal for the artificial intelligence company to license AP's archive of news stories. (AP Photo/Richard Drew, File)
SALT LAKE CITY — The arrival of sophisticated and accessible AI tools like ChatGPT in mainstream culture has pushed this new wave of technology to the forefront of contemporary society. However, AI’s rise remains controversial.
Among the many evolving concerns about the dangers AI can pose to the public are two questions: How does it get its data, and what is it doing with it?
How is AI getting its data?
Earl Foote, CEO and founder of Nexus IT, told Utah’s Morning News that AI companies use datasets that could include client data. But that’s not where they get most of their information.
“As these AI models digest all of that data and learn from it, a lot of that data does contain personally identifiable information of random people like you and I,” he said. “Information that’s simply available on public websites, on public record.”
These models scour the internet for training data, but they cannot distinguish between what is personal, copyrighted, or public. As a result, an AI can pick up and store information that you may not want it to have.
“For example, in a program like ChatGPT,” Foote explained, “if you were to ask, you know, information about a specific individual, you potentially could find information that that individual doesn’t want the general public to know about them.”
A new kind of plagiarism
AI is a collector of information, but it is also a manufacturer. Foote said this has led to several cases of alleged plagiarism. These large language models digest intellectual property and use this information to fuel their responses.
“Many authors, artists, musicians and actors are beginning to allege that their work is being plagiarized or has been used to train AI-LLMs,” Foote said.
Comedian Sarah Silverman recently joined the wave of lawsuits against AI companies. She’s suing OpenAI and Meta, alleging that they used content from her book to train their models.
Is there anything we can do?
Vox News reported that while there is plenty of talk circulating within the federal government encouraging AI-related bills, no real efforts have been made. However, the fact that lawmakers are even talking about it means privacy and security concerns are on their radar.
Even so, Congress may never act to ensure the nation’s data privacy, and these programs have likely already collected whatever public information is online.
For those concerned about the impact of AI on personal data, calls for regulations and protections will continue. But until there’s legislation, if there ever is, people can exercise extra caution when posting any information on the internet.