Why Google Search Sucks And A Tribute To Neil Gaiman
First published in MasonPelt.com on January 31, 2023.
Google grants access to information kings didn't have 50 years before. I have consumed so much content: books, podcasts, movies, articles, songs, and possibly the Ph.D. thesis of a woman from Chesapeake. I cannot remember it all.
Like most, I'll occasionally use Google to find a specific but only half-recalled crumb of content. Increasingly I use services other than Google because Google sucks as a search engine. No, Grammarly, I don't mean "Google search could be better." Google search is worse than it was three years ago.
People Google Search In Two Ways
People use Google to find general information where any credible source is acceptable. Or they use Google looking for specific results.
Searching, "who is Neil Gaiman", or "list of the endless in the Neil Gaiman series" will likely give searchers the answers they seek.
But ask with less specificity, incorrect information, and synonyms, such as "list of the eternals from the Marvel comics books by Neil Gaiman", and Google fails to return an answer about the DC Comics series The Sandman.
A human could justifiably struggle to answer the same question. This is a fundamental limitation of indexing an evolving glob of information.
Complexities Of Indexing Growing Information
You don't need to keep an index for a few books on a nightstand. If you have no memory of one or more books, just read the dust jackets. This solution doesn't scale.
At libraries with rooms of shelves crammed with books, indexing them is a process. Library classification is complex, but every book has its place. Staff spend their days shelf reading, looking for out-of-place books, and putting them where they belong.
Google is the shop with an index of the web. Per a Google help page, "that index is similar to an index in a library, which lists information about all the books the library has available." Instead of books, Google indexes webpages.
Google was the first search engine to use bibliometrics as part of an algorithm to sort and rank results based on quality and relevance to a search. People used this index of webpages to find the specific in the everything. The web has grown exponentially, shifting as pages are changed, deleted, replaced, and moved.
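To make the bibliometrics idea concrete, here is a minimal sketch of link-based ranking, where a page scores higher when other pages "cite" it with links. It is a simplified PageRank-style calculation, not Google's actual algorithm. The page names, the damping factor of 0.85, and the function name `rank_pages` are all assumptions made up for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by who links to them (simplified PageRank-style iteration).

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly across its links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical toy web: two pages link to the same third page.
toy_web = {
    "fan-blog": ["gaiman-journal"],
    "review-site": ["gaiman-journal", "fan-blog"],
    "gaiman-journal": [],
}
scores = rank_pages(toy_web)
best = max(scores, key=scores.get)
```

In this toy web the most-linked page ends up with the highest score, which is also why marketers have an incentive to manufacture links.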
Few Attempt To Manipulate Libraries
Ranking at the top of a highly searched term on Google can mean millions of dollars. It's like a high-profit marathon that never ends, and only pays out while you're winning. The incentives mean Google has been playing cat and mouse with marketers trying to beat the algorithm since the early days.
For a few years now, Google, the Kleenex of online search, has been observed as worse than it once was. Marissa Mayer, a former Google and Yahoo executive, implied in an interview with Freakonomics Radio that Google's rapid quality decline is the result of a larger, and lower average quality internet.
Between the volume of information on the internet and those who seek to manipulate the results, Google has an uphill battle. Bing and Duck Duck Go face the same challenges, but only Google is currently terrible as a search engine.
Google Isn’t Focused On Improving Search
Google is the entrenched behemoth. The company really can't capture more search market share. Google owns the largest mobile operating system, and the largest web browser. Revenue for at least the next several years is likely to increase almost by default.
Google has economic incentives not to worry about being the best search engine. Any publicly traded company with a money printing machine guaranteed to work predictably for the next few years would focus on reducing cost, and finding the next honey pot.
Google and many other players seem to view AI as the next disruptive tech, and they are all focused on winning the arms race for the best dumb AI. That means testing and training the machine. Google, first with RankBrain, and later with BERT (names of search algorithm updates) incorporates far more machine learning into search than the competition.
Google executives who, again, want to make money seem willing to turn a dial that lowers search quality in the present for a profitable future. Ideally without distracting headlines about how they are promoting bleach as a covid19 cure. Slightly worse search results may even raise Google’s ad revenues.
These are not the only issues plaguing Google. The search results are also biased towards larger websites, especially for controversial topics. Even when searching for specific content, like a blog post's title, Google tends not to show small websites.
Needle In A Haystack
All search engines have to prioritize ranking multiple web pages with similar keywords somehow. Even the most advanced machine learning is abysmal at processing natural language. With enough competing results, a non-fungible piece of content can be buried.
I ran into this problem looking for a quote from Neil Gaiman for use in a forthcoming article. I vividly remember not just reading but hearing Gaiman read the story of sending his publisher the pitch for American Gods and receiving a mockup of a book cover in response.
Google and Bing both failed me. I searched in vain for a semi-specific bit of content mentioning the words "Neil Gaiman" and "American Gods" and "Email" or possibly "letter" and "publisher" or perhaps "agent" or maybe "editor" and about the word "cover". That sentence's chaotic grammatical mess is a window into the Google search results pages.
Measured by volume of articles online, American Gods is Gaiman's most successful work. Thousands of pages containing all of those words or synonyms exist. A blog post teasing Robert McGinnis creating artwork for the covers of the novel’s paperback rerelease has all these keywords, but is a different story.
I finally asked everyone's favorite oracle, the generative pre-trained transformer AI, ChatGPT. Its answer:
Neil Gaiman discussed the idea for American Gods in his blog post "American Gods and the Hugo Awards" which was posted on his website on May 14th, 2001. In the post, he mentioned that he had emailed the idea for the book to his publisher.
AI Is Flawed
The problem is that the blog post seemingly no longer exists on the live web. ChatGPT has yet to become a reliable source for citations. While I was researching the same article, ChatGPT told me that Billy McFarland was listed on the Forbes '30 Under 30' for "Technology" in 2017 and also on the list for "Finance" in 2013.
These are both untrue. Barring a conspiracy that Forbes removed the embarrassing Fyre Festival guy from online archives but did not remove Martin Shkreli, ChatGPT is wrong.
After hours of searching, I found the quote. Not from a search engine, or ChatGPT, but from remembering where I once saw it. Four sentences, from the novel’s intro.
And then, during a stopover in Iceland, I stared at a tourist diorama of the travels of Leif Erickson, and it all came together. I wrote a letter to my agent and my editor that explained what the book would be. I wrote "American Gods" at the top of the letter, certain I could come up with a better title. A couple of weeks later, my editor sent me a mock-up of the book cover.
AI-Powered Search Is Problematic AF
As mentioned, Google doesn't like SEOs, and has a financial incentive both for slightly worse search results and for prioritizing building Wensleydale over all else. AI, in every current iteration, is bad at natural language processing. Google’s overreliance on poor natural language processing can be seen across the search results pages.
Search “natural remedy” even with quotation marks in Google and you’ll see results for “home remedy” and “herbal medicine”. Google even boldfaces “home remedies” in the search engine results as if it were what was searched, but these are not at all the same thing.
People take many drugs at home that are not natural. I can buy RAD 140 legally as a research chemical and use it at home to treat a muscle wasting disease (I'm not recommending you do that). But RAD 140 is fully created in a lab. It's a home remedy, not a natural one.
In fairness to Google, Bing and other search engines do treat “natural remedy” and “home remedy” as the same thing. Google is just far and away the worst offender.
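The difference between honoring a quoted phrase and quietly expanding it to near-synonyms can be sketched in a few lines. This is a toy illustration, not how Google or Bing actually work. The sample documents, the `SYNONYMS` table, and both function names are hypothetical.

```python
# Hypothetical synonym table a search engine might apply to a query.
SYNONYMS = {
    "natural remedy": ["home remedy", "herbal medicine"],
}

# Hypothetical indexed pages, reduced to one sentence each.
DOCUMENTS = [
    "Ginger tea is a natural remedy for nausea.",
    "A home remedy for muscle soreness: ice and rest.",
    "Herbal medicine has a long history.",
]


def exact_phrase_search(query, documents):
    """Return only documents containing the query phrase verbatim."""
    phrase = query.lower()
    return [doc for doc in documents if phrase in doc.lower()]


def expanded_search(query, documents):
    """Return documents matching the phrase or any listed synonym."""
    phrases = [query.lower()] + SYNONYMS.get(query.lower(), [])
    return [doc for doc in documents if any(p in doc.lower() for p in phrases)]


exact = exact_phrase_search("natural remedy", DOCUMENTS)
expanded = expanded_search("natural remedy", DOCUMENTS)
```

With these sample documents, the exact search returns only the ginger tea page, while the expanded search returns all three. The second behavior is what quotation marks are supposed to switch off.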
Google’s attempt at understanding what humans mean from a search is poor. I assume the company is leveraging user behavior, like relative click-thru rate, time on site and return to the search results page as training data for its AI projects.
Google Crowns Kings From Many Versions
Intellectual property exists differently in the physical and digital worlds. Online, a copyrighted work can (with some exemptions in law) be made to live on a single webpage that millions of people view at once. In the physical world, millions of people reading a book at once requires millions of copies of the book.
Online, the same article may appear many places, perhaps with slight differences in formatting, title, links on the page, or user comments. The broad internet tends to work best with a sort of syndication model that generates copies of the same content on various platforms. But the article is not entirely different, any more than different printings of a book are different books.
Cory Doctorow uses POSSE (post own site, share everywhere). As Doctorow said in a Tweet “[POSSE] allows me to maintain control over my work while still meeting my audience where they are, on platforms whose scale makes them hard to rely on.” I lifted from his approach when I opted for a syndication heavy model to distribute my writing.
Google (unlike the non-Google internet and the real world) generally wants a single best source of anything reigning as canonical. I won’t get deep into the technical explanation of canonicalization, but suffice it to say, Google wants a single page to be the source of any given article. That creates problems for searchers.
For searchers, being unable to find the content they are seeking can be an issue. For example, someone may be looking for the specific website that uses screenshots instead of embedded content, or the one with the option to join the mailing list. Other reasons include wanting the website with the fewest ads, the one without a paywall, or the one that isn’t on Medium because you dated a developer at the company and they were an asshole. Someone may also want to see how many places syndicated the article.
Perhaps they don't even care about the article at all and want to find a user comment. Or they just want to find a website again, and are searching for it the best way they know how. Bing and Duck Duck Go are both usable for these purposes. Google no longer is.
It’s not just me, or a few people on the internet, saying Google search is worse. Media sources, even some that cover search engines specifically, have been commenting on Google’s decline for a few years. Toronto Star, Fast Company, Freakonomics Radio, Search Engine Journal, The Atlantic, Washington Post, ITPro, The Telegraph, and The New Daily, to name a few sources.
Note: this is adapted from a Twitter thread I posted last year.
Google doesn't like SEOs; they will say because they game the search results. While that is undoubtedly at least a half-truth, most of Google's revenue comes from search ads. Organic traffic as an alternative to advertising is bad for Google's revenue.
Making search results slightly worse, and at the same time, hurting SEO as a service alternative to paid search ads is in Google's best interest for at least the next few years. Hitting websites for being too search optimized accomplishes both tasks.
At least in my experience, search ads perform better when organic results are worse. I think Google's goal is to ensure that most commercial pages are confined to the ads and Google My Business sections.
I wrote a post on my small personal website titled "The 1xftrv9efs of Covid19 – Google Test". The site is a reasonable example of a small occasionally updated personal blog. The post was published on March 11, 2022.
The page ranks in Google if you search "1xftrv9efs", a completely random ten-character alphanumeric code that appears nowhere on the internet except in reference to that article. Curiously the page appears in Google for various permutations of the title, but at the time of writing, the page doesn't appear in Google's search results for the full article title without quotation marks.
Initial reviews and promotion for American Gods from 2001 were hidden by more coverage when the book won the Hugo, Locus, and Nebula Awards. The book's tenth-anniversary edition, including a fantastic radio drama audiobook, created more reviews.
The American Gods TV adaptation led to reviews of each episode, media coverage of production mishaps, interviews of famous actors on press junkets, Emmy predictions, and articles about where the show is available on streaming.
Assuming Gaiman’s blog post did once exist, my best guess is that the post got adapted into the introduction of the American Gods' tenth-anniversary edition. The copyright belonging to HarperCollins may have necessitated the removal of the blog post.
In the novel Good Omens, Wensleydale is the AI computer that administers operations for the Four Horsemen of the Apocalypse.
With minor differences in editing between printings, and superficially different covers and ink smudges, the content of all the physical books is the same. They are one book.
Even the often highlighted final chapter difference between the US and UK versions of Anthony Burgess’ A Clockwork Orange doesn’t make those versions entirely different books. Burgess himself often argued one was a novel, the other a fable, but didn’t always feel so strongly.