Discover more from Ramblings By Mason Pelt
Apocalyptic Myths Hide AI's Reality
Media coverage of AI as if it’s a dark and threatening abyss likely to usher in an apocalypse is impacting how AI is developed, clouding the discourse when people bring up real flaws.
Isaac Asimov and Alan Turing each influenced the future of computer science in their own way. Humans create myths, and those legends create humanity: an infinite feedback loop of fiction and reality constructing one another.
A mythology of artificial super intelligence has dominated the conversation about AI since the 80s. Constant media coverage of AI as if it’s a dark and threatening abyss likely to usher in an apocalypse is impacting how AI is developed, clouding the discourse when people bring up real flaws in the current generation of AIs.
The AIs in existence today are far from Asimovian sci-fi super intelligence. Even the advanced AIs like ChatGPT, Neeva AI, and Bard AI aren’t generally intelligent. But wide adoption of the very sophisticated generative machine learning projects currently sucking up massive amounts of data, spinning, and regurgitating it will change humanity in ways we can never fully understand.
All machine learning is AI, but not all AI involves machine learning. Conceptually, a few simple conditional statements are an AI. At the Little Tikes: Baby’s First Ultron level, AIs are rule-based systems: a series of if/then statements that directly make decisions.
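To make the "a few conditional statements are an AI" idea concrete, here is a minimal sketch of a rule-based Tic-Tac-Toe player. Everything in it is my own illustration, not code from any historical program: the names `WIN_LINES`, `completing_square`, and `choose_move` are hypothetical, and the board is assumed to be a list of nine strings ("X", "O", or " " for an empty square).

```python
# Hypothetical rule-based player: no learning, only a fixed chain of if/then rules.
# Assumption: the board is a list of 9 strings, indexed 0-8, left to right, top to bottom.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def completing_square(board, player):
    """Return a square that would give `player` three in a row, or None."""
    for line in WIN_LINES:
        marks = [board[i] for i in line]
        if marks.count(player) == 2 and marks.count(" ") == 1:
            return line[marks.index(" ")]
    return None


def choose_move(board, me="O", opponent="X"):
    """Pick a square for `me`; assumes at least one square is still empty."""
    win = completing_square(board, me)
    if win is not None:        # Rule 1: if I can win, then win.
        return win
    block = completing_square(board, opponent)
    if block is not None:      # Rule 2: if the opponent can win, then block.
        return block
    if board[4] == " ":        # Rule 3: if the center is free, then take it.
        return 4
    return board.index(" ")    # Rule 4: otherwise take the first free square.
```

Calling `choose_move(["X", "X", " ", " ", "O", " ", " ", " ", " "])` returns square 2, the block. Nothing here adapts or improves with experience; every decision is spelled out in advance by a human.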
In the late 1940s, before the term AI was coined, Alan Turing and David Champernowne were creating conditional statements that could play chess against a human. The logic for a chess game is labyrinthine. Mathematician Claude Shannon calculated the lower bound of the game-tree complexity for chess as 10^120 (1 followed by 120 zeros). Due to the limitations of computers at the time, Turing and Champernowne’s first chess AI was played from physical paper, much as a book called Tic Tac Tome lets a human play Tic-Tac-Toe against a physical book.
The first brute force attempt at a chess AI was built with the goal of winning, which greatly simplified the game tree. Even simplified, it reportedly took upwards of 30 minutes for Turing to work through the strategy of each move from reams of paper. Ultimately, the first chess AI failed to beat a human player.
The First Machine Learning
A few years after Turing’s game, around 1957, the first true computer chess program was up and running. A team at IBM created a chess program, which we will call BCP, that took a few more steps towards modern machine learning. According to a paper published in Scientific American, BCP worked without brute force programming.
BCP was programmed with the rules of chess, such that neither the computer nor the human player could cheat. After that, the conditional logic involved the computer examining the state of the squares on the board and seeking answers to eight questions. Those answers informed what subset of options BCP would analyze.
BCP was remarkable, but struggled to beat even a novice chess player. As computer hardware gained power, chess AIs got faster and became impossible for humans to beat. The way AIs are programmed progressed as well.
AI Black Box
The advanced generative chat AIs we are seeing augment search engines are a black box to the public. Generative pre-trained transformers (GPT), like ChatGPT, are created using artificial neural networks (ANNs) trained via deep learning. ANNs structurally mimic how neurons in biological nervous systems operate, creating a digital brain. Deep learning trains that digital brain with lots of data, like articles and images.
Look for an explanation of how ANNs work and you will find many articles referencing three or more layers of neurons: an input layer, one or more hidden layers, and an output layer. Most articles seemingly recycle information that the author doesn’t fully understand, something humans and AIs have in common.
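For readers who want to see the three-layer idea rather than read about it, here is a toy sketch of data flowing from an input layer, through one hidden layer, to an output layer. It is an illustration only, under loud assumptions: the weights are hand-picked placeholders, no training happens, the sigmoid activation is just one common choice, and the helper names (`layer`, `forward`) are mine. Real systems like ChatGPT are vastly larger and use a different architecture.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One layer: each neuron sums its weighted inputs, adds a bias, applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical, hand-picked numbers. A trained network would learn these from data.
HIDDEN_W = [[0.5, -0.5], [0.25, 0.75], [-1.0, 1.0]]  # 3 hidden neurons, 2 inputs each
HIDDEN_B = [0.0, 0.0, 0.0]
OUTPUT_W = [[1.0, -1.0, 0.5]]                        # 1 output neuron, 3 inputs
OUTPUT_B = [0.0]

prediction = forward([1.0, 0.0], HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)
```

The "hidden" layers are hidden only in the sense that nobody looks at their values directly. Deep learning is the process of nudging all those weights, across many layers, until the outputs look right.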
Most humans could create a unique article about deep learning by finding a few existing articles on the topic, copying, pasting, and rewriting until they have something that appears new. As an imperfect analogy, that is how generative chat AIs work. The input layer is several articles on a given topic. The hidden layers are how the AI processes the information and generates its responses. The output layer is what the AI says.
The above is an incredible oversimplification. For a more detailed but broadly accessible explanation of deep learning, read this article on freeCodeCamp.
Everything Is A Remix
A child swinging a plastic lightsaber while birthing a fan fiction of Rogue One may never know the name Akira Kurosawa. But without Kurosawa’s 1957 film The Hidden Fortress, George Lucas may have been an accountant. Works build on each other, and sourcing fades away.
I first became aware of Kurosawa’s films in 2010 when I watched a series of mini documentaries titled Everything is a Remix. The series has been remastered and rereleased (for a while it vanished from the internet), but the concepts stuck with me. Everything that exists has been influenced by other things that came prior, including human minds.
“I cannot remember the books I’ve read any more than the meals I have eaten; even so, they have made me.”
I nearly attributed the quote to Ralph Waldo Emerson, because many websites created over 100 years after his death in 1882 attribute the quote to him. According to Garson O’Toole’s Quote Investigator website, Emerson may have said something along these lines, but no definitive source exists.
Had I misattributed the quote, someone could view this article as proof Emerson uttered those words. This is something I call source laundering; when a claim is repeated until finding the origins becomes challenging. Online, we see sources get laundered alarmingly quickly.
A random Tweet is embedded on a blog, local media cites the blog, and national media cites local media. More articles pop up citing the national media. Some outlets don’t offer any citations but repeat the fact as original reporting. Now thousands of sources exist, and the fact that it all started from a single random Tweet is lost to time.
Advanced AIs trained from online articles are stating as facts things where the source is little more than a Tweet. Some people are already using generative text AI as arbiters for truth. Others are creating content entirely with AI. Futurism has broken stories of both CNET and Men’s Journal publishing articles generated by AI containing plagiarism and factual errors.
Soon that AI-generated content will be used to train AIs, which will in turn generate more content. As that pattern repeats exponentially, information laundering is going to become an issue we cannot manage or understand. This is the Pandora’s box flung open by tech companies and publishers looking to make a quick buck.
The Information Era
Citations are already hard to find due to sheer volume of content. In 1450 there were about a hundred new books published; in 2009, there were more than a million. Those facts come from a song by Pomplamoose, Ben Folds, and Nick Hornby that I first heard in 2010.
It was 13 years later, while writing this article, that I first tried to find the source for those lyrics. I could not. Searching for sources was a lesson in how facts are lost, how information evolves, and how AI is going to amplify these problems.
According to University of Minnesota Libraries, books in Europe were almost entirely hand scribed until the year 1500, but wood block printing began in Tang Dynasty China around 800 years earlier. By 1450 the barrier for printing was low enough in China that perhaps thousands of books were published, nearly all lost to time.
It’s certain that over a million books were published in 2009, but a single coherent citation eludes me. When I asked ChatGPT, the AI told me “According to the United Nations Educational, Scientific and Cultural Organization (UNESCO), approximately 2.2 million new books were published worldwide in 2009.” This is not true!!! UNESCO published no such report in 2009!!!
The pedigree of data for ChatGPT quoting that 2.2 million book figure is source laundering to an extreme.
AI Launders Information, Lies And Plagiarizes
ChatGPT plagiarized a blog post from January 2019 written by John Jennings. Jennings is a Forbes contributor, but this post was published on a personal side project, The Interesting Fact of the Day Blog (The IFOD). For the record, I’m not attacking Jennings.
On the IFOD, Jennings wrote:
UNESCO (United Nations Educational, Scientific and Cultural Organization) keeps track of books published by country. It estimates that 2.2 million new titles are published worldwide each year. It’s data is mainly from 2013. Here’s how it breaks down by country.:
Note how incredibly similar that statement is to the one ChatGPT gave me. But Jennings’ numbers are mainly for 2013, so ChatGPT made 2009 up. Jennings’ blog post is incorrect in attributing data to UNESCO.
High Tech Plagiarism Inherited Human Flaws
Jennings (or whatever primary source he cited) used Wikipedia’s Books published per country per year page for research. Based on the edit history, the last Wikipedia edit prior to his post is from December 2018. That version said:
“This page lists the number of book titles published per country per year from various sources. According to UNESCO, this is an important index of the standard of living and education and of a country’s self-awareness.”
The Wikipedia page uses several data sources to establish the number of books published per year. A couple (1, 2) of Wikipedia’s citations do show some production numbers from UNESCO reports, but those reports do not extend past 1996.
UNESCO is not a source for book publishing data for 2013 in any of the 20 versions of the Wikipedia page I randomly checked. On that page, UNESCO is primarily used to say the number of published books is an important metric of a country’s standard of living.
The Wikipedia page in December 2018 draws mainly from the International Publishers’ Association’s 2013-2014 report. Summing up publications for new books and reprints from that report shows 1,864,971 from 2013, or 4,131,389 if you look at all listed years.
2,210,000 is the sum of books published from the most recent available year in each of the 119 countries listed on the December 2018 version of the Books published per country per year Wikipedia page. Those years span from 1990 until 2017. ChatGPT told me UNESCO estimated 2.2 million books were printed in 2009, because a human misattributed a source, and the computer is a lying plagiarist.
Hidden Intellectual Property Infringement
The only reason I can trace a source for ChatGPT claiming that UNESCO published a report about new books in 2009 is that I can find only one article making the same incorrect claim. If ChatGPT were accurate, or many sources made the same wrong claim, it would have been harder to discover how ChatGPT got its information.
Several AIs that generate images from text prompts are currently the subjects of intellectual property litigation over the training data they used. This article is long enough without an explanation of deep learning and several other types of machine learning for images. For purposes here, it’s enough to say these AIs are trained by examining similarities and patterns across a huge volume of images. When the AI creates an image, that image is a remix of the training data. Sometimes that image is even close to being a copy of a single image.
An AI’s creators may always be able to see what data contributed to generating a specific work. However, avoiding lawsuits and maintaining a competitive advantage leaves companies unlikely to share that information. Developers of these systems may even intentionally obfuscate sourcing.
The general public won’t get to work backwards from a known output, through hidden layers, all the way to the inputs. With more AI works generated by more machine learning models, the volume of data becomes unmanageable. People who edit AI-generated works and pass them off as human creations will potentially remove any chance of reverse engineering to uncover which specific inputs created each work.
With rampant and unchecked copyright infringement, we will see a feedback loop where AIs are trained by AI-generated content. Instead of publishing outrage bait articles where an AI trained with science fiction answers hypothetical questions about how it will end humanity, perhaps we should be talking about how humanity is stepping on an AI rake.
Tic-Tac-Toe is much less complex than chess, with at most 362,880 move combinations. Far fewer moves when you consider the rules of the game. Playing to win further reduces the possible game tree. At just over 1400 pages Tic Tac Tome allows a human to play Tic-Tac-Toe against a physical book.
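The 362,880 figure is 9 factorial: nine choices for the first mark, eight for the second, and so on down to one. As a rough sketch of how much the rules shrink that number, the snippet below walks every legal game and stops a branch as soon as someone wins or the board fills up. The function names and board representation are my own assumptions for illustration, and this counts complete legal games only; it does not model the further "playing to win" pruning that Tic Tac Tome relies on.

```python
import math

# Assumption: the board is a list of 9 strings, " " meaning an empty square.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def has_winner(board):
    """True if any row, column, or diagonal holds three matching marks."""
    return any(
        board[a] != " " and board[a] == board[b] == board[c]
        for a, b, c in WIN_LINES
    )


def count_games(board, player="X"):
    """Count every distinct move sequence that ends in a win or a full board."""
    if has_winner(board) or " " not in board:
        return 1
    total = 0
    for square in range(9):
        if board[square] == " ":
            board[square] = player                       # try the move
            total += count_games(board, "O" if player == "X" else "X")
            board[square] = " "                          # undo it
    return total


upper_bound = math.factorial(9)        # 362,880 orderings of nine squares
legal_games = count_games([" "] * 9)   # games that actually follow the rules
print(upper_bound, legal_games)
```

Running it shows 255,168 legal games against the 362,880 upper bound. The same exhaustive walk is hopeless for chess, which is why Shannon’s number matters.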
This IBM chess program doesn’t seem to have a name. The development was led by Alex Bernstein, and several chess websites call it The Bernstein Chess Program.