The growth of automated traffic on the internet has led to accusations that the internet is mostly dead and populated by nothing but robots.
This hypothesis predates the recent rise of AI, but the presence of machine-generated activity and, possibly worse, machine-generated content has the potential to make the internet worse for humans.
The Rise of the Robots
In layman’s terms, a robot (or just a bot) is a program designed to do something on the internet with little to no human oversight. The internet has had robots since the very early days of the World Wide Web, and they have had both positive and negative impacts on it.
Good Bots
Generally, robots are harmless and many are actively helpful. Googlebot is a robot, albeit one from a particular group of robots called spiders: it goes through the internet looking for pages and then analyses them for inclusion in the great big index Google keeps of all the pages that appear in its search results.
Other harmless robots include SEO tools, which analyse webpages for particular signals (does the page have a proper title and use headings correctly, for example). We use robots to monitor our websites and immediately inform us if something isn’t working properly.
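To make that concrete, a monitoring or SEO bot can be as simple as a script that fetches a page and checks a couple of signals. Here’s a minimal sketch in Python, purely to illustrate the idea; the URL and the particular checks are placeholders, not a description of any real tool (ours included).

```python
# A minimal, hypothetical "good bot": fetch a page and check a few basic signals.
# Uses the third-party packages `requests` and `beautifulsoup4`.
import requests
from bs4 import BeautifulSoup

def check_page(url: str) -> dict:
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    return {
        "is_up": response.status_code == 200,                             # uptime monitoring
        "has_title": soup.title is not None and bool(soup.title.string),  # proper page title?
        "h1_count": len(soup.find_all("h1")),                             # ideally exactly one
    }

if __name__ == "__main__":
    print(check_page("https://example.com/"))  # placeholder URL
```

A real monitoring bot would run this on a schedule and alert someone when a check fails, but the core of it really is that small.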
Bad Bots
Unfortunately, the internet quickly became host to robots that weren’t helpful. Malicious robots are the reason you’ve had to select all of the traffic lights in a grid of photos before you could submit a form or log in: the CAPTCHA was protecting something the owner of the site didn’t want a robot to interact with. Images were chosen for the job because it’s a complex task to build a robot that can recognise a traffic light in a photo.
What were these bots doing? If you’ve ever advertised your business on the internet, the most noticeable malicious bots were the ones that sent spam messages through your contact forms or scraped your email address for inclusion in some dubious mailing list. Hopefully, you won’t have encountered much of this in the past decade, as the arms race between web developers and spammers has led to much of it being blocked. But I can assure you, these bots are still out there, still sending spammy messages through your forms; it just gets intercepted before it can pollute your inbox.
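One common interception trick, purely as an example of how that arms race plays out (sites vary, and this isn’t a description of any specific system), is a honeypot field: a form input hidden from human visitors that only a bot will fill in. A rough server-side sketch in Python, with made-up field names:

```python
# Hypothetical honeypot check: the "website_url" field is hidden from humans with CSS,
# so any submission that fills it in is almost certainly a bot.
def is_probably_spam(form_data: dict) -> bool:
    honeypot_value = form_data.get("website_url", "")
    return bool(honeypot_value.strip())

# A bot dutifully fills in every field, including the hidden one.
submission = {
    "name": "Totally Real Person",
    "message": "Buy now!",
    "website_url": "http://spam.example",
}
if is_probably_spam(submission):
    pass  # silently discard instead of emailing the site owner
```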
But, while annoying, those spam bots aren’t the worst malicious robots out there. Trying to hack into login forms and hunting for backdoors into the admin sections of websites are common uses for bad robots. This is again something that developers have taken into account and put systems in place to prevent, but if you’re running a homemade WordPress website, it’s definitely why you need to keep your site (as well as any themes and plugins) up to date with security patches.
Here at Yello Studio, by the way, we don’t use WordPress, mostly because we have a whole team of highly skilled developers who simply don’t like WordPress.
Robots Being Social
As social media took off, it didn’t take long for robots to find their way there. Again, the split between benign and malign is stark, but this time it is far more human.
Benign and even helpful robots exist on social media. The most common are probably those system robots that work quietly in the background removing hateful or offensive content that violates the ToS of the site or just flagging it for human review. More noticeably, there are interactive robots that perform a (typically simple) task when called upon, such as the thread unrollers on Twitter or the remind me bots on Reddit. There are even automated bots just for fun, such as Reddit’s Haikubot, which roams the comments sections of posts and tells a user if their comment is (normally accidentally) a haiku.
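As a rough illustration of how a bot like that might work (this is a guess at the general approach, not Haikubot’s actual code), all it really needs is a syllable estimate and a check for the 5-7-5 pattern:

```python
import re

def estimate_syllables(word: str) -> int:
    # Very rough heuristic: count groups of vowels, ignoring a trailing silent 'e'.
    word = word.lower()
    if word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def is_haiku(text: str) -> bool:
    # Can the words be split, in order, into lines of 5, 7, and 5 syllables?
    counts = [estimate_syllables(w) for w in re.findall(r"[a-zA-Z']+", text)]
    targets = [5, 7, 5]
    line_total, line_index = 0, 0
    for count in counts:
        if line_index == len(targets):
            return False                      # words left over after the third line
        line_total += count
        if line_total == targets[line_index]:
            line_total, line_index = 0, line_index + 1
        elif line_total > targets[line_index]:
            return False                      # a word straddles a line boundary
    return line_index == len(targets)

print(is_haiku("An old silent pond / A frog jumps into the pond / Splash! Silence again"))  # True
```

A real bot would need a proper pronunciation dictionary to count syllables reliably, but the shape of the task is the same.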
Robots Being Antisocial
Then the flipside: spammers, scammers, and manipulators. The earliest bots were simple spammers, roaming posts and profiles to send messages. These evolved into scam bots, often running romance scams on more personal social networks, which would message thousands of potential marks so that real humans could take over the scam once someone responded.
Manipulators are the most insidious of the social media bots. These use comments and likes to try to sway opinion and game whatever algorithms the social network uses to promote posts. They have been used by criminal gangs, companies, political parties, and nation-states, using fake interaction to get more eyeballs on their message.
Robots Gain Artificial Intelligence
The explosion of Large Language Models (LLMs) and Generative AI (GenAI), and their availability to the general public, had a significant impact on the content of the internet as well as the behaviour of robots. Most prominently, features designed to let people interact with a talkative AI were quickly repurposed to proactively engage with people, mostly on behalf of either a company looking to make a sale or a scammer trying to hook a mark.
Artificial Intelligence vs Human Stupidity
This automated engagement is seemingly what LLMs like ChatGPT were made for (it really isn’t, but it seems like it is): coming up with a customised message to a user and then responding to them in a way that is mostly sensible. While it’s definitely the case that many of these attempts fall flat, either outright ignored or spotted for what they are, if AI bots seeking engagement didn’t work at all, they would have been abandoned.
AI bots running on an LLM are certainly not perfect. Increasing awareness of their use and, more importantly, of their Achilles’ heels has started a race between those who build these bots and those who expose them. Again, most of these bots are themselves fairly harmless, perhaps creating openings for a sales pitch or generating awareness of a YouTube channel. Some, however, are simply a new take on romance scams, very dubious cryptocurrency offerings, or attempts to sway public opinion.
The AI Miners in the User Generated Content Gold Rush
LLMs’ sudden growth amongst the various industries trying to make a living on the internet coincided with something peculiar: Google started to give more credence to User Generated Content (UGC), meaning forums, social media posts and discussions, and short-form posts. Before July last year, UGC only really showed up for highly specific queries about technical topics (Stack Overflow answers to precise coding questions, as a prime example) or a post from a relevant community answering a search about how to beat a particular level in a game.
Then Google quietly released an update, and UGC could be anywhere. Searches for any topic could result in Reddit or Quora discussions appearing prominently in the results, and that exposure has only increased since. This made those UGC sites premium real estate for anyone interested in building visibility for their brand or message.
This sudden interest in dominating the UGC elements of search results coincided with more and more people having access to advanced LLMs. The immediate result was Reddit and Quora being inundated with AI-generated content in a land rush aimed at dominating the results for all sorts of search terms.
Cascade of Enshittification
Before we start going into how AI, bots, and those with nefarious intent managed to enshittify large and prominent parts of the internet, we should perhaps touch on what enshittification actually is.
Coined by Cory Doctorow in 2022, enshittification is the process by which a platform declines in quality, leading to a deterioration in user experience. Pretty much every widely used platform on the internet has been accused of this, but it’s certainly most noticeable on Facebook, Twitter, and (over the past year) Google’s search.
Search Shows Junk, Results in More Junk
When people who wanted visibility on the internet but didn’t care about user experience found that they could get page-one positioning in Google search with little effort by getting AI to write their content for them, there was nothing to stop them from doing just that. Mostly, these sites filled with AI content were simply stuffed with ads, meaning that the visibility translated immediately into more revenue.
Others faked large quantities of UGC with links to their sites, resulting in greater organic exposure for the whole site. Again, this was mostly for the ad revenue, or for selling backlinks to others who wanted greater organic performance for competitive keywords.
Not all AI-generated content was bad, but it set a trend, and it created a self-fulfilling prophecy about the quality of AI-generated content. Because LLMs are built from what they find on the internet, as more of the most prominent content on the internet became AI-generated itself, the circular cascade of enshittification began. It got to the point where mainstream news publications were putting out stories about how bad Google had become.
AI-Based Search and Google/Reddit Deal
The launch of AI Overviews (AIO) in Google search quickly showed the limitations of AI. Admittedly, most of the failures were due to AI being less capable than most humans at spotting obvious sarcasm in the websites it was basing its answers on. But there’s certainly the worry that some of the answers AIO was giving came from AI-generated content, which was itself unreliable.
This potential problem shows no signs of slowing, as Google made a deal with Reddit to feed all of Reddit’s data into its AI to train it. We’re seeing a modern retelling of an old adage about computers: Garbage In, Garbage Out. If you feed a mix of good content and bad content into a system that is unable to differentiate between the two, the result is not going to be great.
Malicious Content Polluting AI
The use of bots to spread disinformation across multiple platforms ensures that this information inevitably comes to the attention of AI systems. Alongside AI’s inability to differentiate between good content and bad, there’s also the real concern that it has no ability to make moral judgements about the content it is fed or the content it produces.
While I am sure that the creators of the various LLMs and GenAI systems available to the public don’t want their systems to pass on disinformation from manipulative bad actors, and have presumably put great effort into teaching their AI systems to ignore it, we can’t ignore the signs that they might fail. The high-profile failures of AIO, which gave recommendations for health-oriented queries that fly in the face of actual medical advice or basic common sense, show that we really can’t rely on the creators of these AI systems to do a good job of protecting the end user.
The Dead Internet Theory
Over the last fifteen years there have been several iterations of the hypothesis that the internet is mostly made up of robots, all working towards some sinister purpose in manipulating people. While the more extreme versions fall into the category of conspiracy theory, it is certainly the case that there are robots on the internet manipulating people and algorithms. Until now, though, it has been a battleground between those doing it for commercial reasons, personal gain, or political or malicious intent. There was no sinister unifying purpose behind these manipulations; the perpetrators have always been far too diverse.
How AI Can Possibly Be Much Worse
But now? All of those attempts at manipulation have been fed into AI. It’s not the occasional few sentences from a single website; it’s automated en masse across the entire UGC sphere of the internet. While that may be a small fraction of all the information these LLMs and GenAI systems are given to learn from, it may be enough to give us genuine reasons to be afraid of our new creations.
In essence, we have taken a new intelligence and given it access to all of humanity’s creativity, for good and ill. It’s as if Frankenstein’s monster, as well as being given Paradise Lost, Plutarch’s Lives, and The Sorrows of Young Werther to understand what humanity is, had also been handed Trump’s The Art of the Deal, OJ’s If I Did It, and Mein Kampf.