That Google Data Leak and SEO

In May a prominent figure in the SEO world, Rand Fishkin, shared information about what was supposedly a significant leak of Google Search API documents. The documentation was then confirmed as authentic by some ex-Google employees. Google itself responded by urging caution.

Some of our inquisitive clients are aware of the leak and have asked us about its implications for search and SEO.

What is an API?

API stands for Application Programming Interface. Having an API allows you to query systems and retrieve data from them.

As an example, our website design clients will be familiar with the Google Maps API. It allows us to embed code into a company website, query the Google Maps system, and pull location data from it, so that we can present real-time interactive maps of a business directly on the “contact us” page of a website.
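As a rough illustration of what “querying a system” looks like in practice, the sketch below builds the kind of URL a web page would load in an iframe to show an interactive map through the Google Maps Embed API. The API key and address are placeholders and the function name is our own, so treat it as a minimal sketch rather than production code.

```python
from urllib.parse import urlencode

# Base endpoint of the Google Maps Embed API in "place" mode.
EMBED_BASE = "https://www.google.com/maps/embed/v1/place"


def build_map_embed_url(api_key: str, place_query: str) -> str:
    """Return the URL an <iframe> would load to show a map of place_query."""
    # urlencode escapes spaces and punctuation so the address is URL-safe.
    params = {"key": api_key, "q": place_query}
    return f"{EMBED_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_map_embed_url("YOUR_API_KEY", "10 Example Street, Example Town"))
```

The resulting URL would go in the src attribute of an iframe on the “contact us” page; a real API key from Google is needed before the map will actually load.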

And What is the Google Search API?

The Google Search API allows developers to integrate Google’s powerful search capabilities into their own applications or websites. It provides programmatic access to Google search results.

In our digital agency we use a number of tools for measuring and tracking rank, but these typically use the Google Analytics API and Google Search Console API.

The Google Search API is more often used for creating custom search on websites and integrating Google search functionality into applications.
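To give a flavour of that programmatic access, here is a minimal sketch that assumes the public Custom Search JSON API is the service meant here. It builds a request URL and pulls titles and links out of a response. The key, the search engine ID, the function names, and the sample response are all placeholders of our own, not real data.

```python
from urllib.parse import urlencode

# Endpoint of Google's Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, engine_id: str, query: str) -> str:
    """Build the request URL for a single search query."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_results(response: dict) -> list[tuple[str, str]]:
    """Pull (title, link) pairs out of a decoded JSON response."""
    # "items" is absent when a search returns nothing, hence the default.
    return [(item["title"], item["link"]) for item in response.get("items", [])]


# A made-up response in the general shape the API returns; not real data.
SAMPLE_RESPONSE = {
    "items": [
        {"title": "Example result", "link": "https://www.example.com/"},
    ]
}

if __name__ == "__main__":
    print(build_search_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "digital agency"))
    print(extract_results(SAMPLE_RESPONSE))
```

In a live integration the URL would be fetched over HTTPS and the JSON body decoded before being passed to extract_results; that step is left out here so the sketch runs without credentials.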

What’s the Significance of the API Leak?

The source of the leak is anonymous but claimed to have had access to the API documentation. The leak includes more than 2,500 documents containing over 14,000 attributes and features, primarily related to search.

Obviously, this is not only a security concern for Google but also a potential operational and intellectual property risk. Some ex-Google employees have confirmed that the documents are authentic.

One of the biggest points about the leak is that a lot of what it contains contradicts Google’s official public statements. More about that next…

Contradictions

Some specific examples of where Google’s public statements and the leaked information diverge are as follows:

Click Data and User Engagement Metrics

Google has long maintained that click-through rates (CTR) and other user engagement metrics are not used directly in its ranking algorithms. However, the leaked documentation specifically references metrics such as “goodClicks”, “badClicks” and “lastLongestClicks”. Each of these implies that user engagement is indeed considered in the ranking process.
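For readers unfamiliar with the metric, click-through rate is simply clicks divided by impressions. The short sketch below shows that arithmetic only. The function name and figures are our own illustration, and it says nothing about how Google weighs “goodClicks” or any other leaked attribute.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a percentage; 0.0 when there were no impressions."""
    if impressions <= 0:
        return 0.0
    return 100.0 * clicks / impressions


if __name__ == "__main__":
    # Illustrative figures only: 50 clicks from 1,000 impressions.
    print(f"{click_through_rate(50, 1000):.1f}%")
```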

Chrome User Data

Google’s Chrome browser is a powerful tool under the bonnet. Google has said publicly that it does not use user data from the Chrome browser in the ranking process. Yet again, the leaked documents indicate that metrics like total Chrome views and transition clicks are considered.

The Google Sandbox

New websites don’t always rank immediately, despite launching fully optimised and with quality backlinks, leading to many SEO experts saying that there’s a “sandbox” system affecting them. This is not a new concept and has been circulating in the SEO community for about 20 years now. The API leak suggests that there is such a system.

Whitelists for Preferred Websites

Another long-suspected concept is that of there being a “whitelist” of websites that Google treats favourably. The Google Search API leak does indeed suggest that such lists exist for certain verticals like travel and health, meaning that these domains receive preferential treatment in search.

Twiddlers

Google likes to tell the public that ranking changes are mostly effected during what it calls “core updates” and, through its Search Liaison arm, does keep people informed about when these will happen. However, the latest revelations have unearthed “twiddlers”, or micro updates, which suggest a granular level of control not previously publicised.

What Was Google’s Reaction?

With such a high-profile story, Google had to make some comment about the API leak but has not, as yet, offered an official statement on the issue.

However, there were some official communications to the huge digital marketing websites such as Search Engine Land that have a massive reach amongst the SEO community. One such comment was:

“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We’ve shared extensive information about how Search works and the types of factors that our systems weigh while also working to protect the integrity of our results from manipulation.”

The beauty of this communication from the search engine giant is that it is customarily ambiguous and neither confirms nor denies which aspects of the leaked data are accurate, invalid, or still relevant.

Clever Marketing’s View

Whilst we have over 20 years of experience in conducting search engine optimisation (SEO) and optimising websites for organic search, and thus have a lot of respect for the search engines, we always treat any official Google position with a healthy but positive dose of scepticism.

One very obvious example is in Google’s Terms of Service (TOS) and webmaster guidelines stating that “doorway pages” are prohibited. And yet, many times over the years, we have seen websites with such webpages ranking extremely well, despite the practice being against Google’s TOS.

Another example is that Google prohibits the buying and selling of links. We’ve recently seen high-quality adverts on YouTube from an agency selling links. And yet YouTube is owned by Google: how and why does Google allow advertisers on YouTube to promote link buying when it’s a clear violation of its own TOS?

So, as you can see, we like to “toe the line” and work “white hat” when it comes to SEO services but if we see webpages succeeding in the search engine results pages (SERPs) that contravene best practice, it’s only right that we question why a search engine gives high rank to websites that are clearly bending the rules.

Introducing Generative Engine Optimisation (GEO)

To keep up with Google again “moving the goalposts”, search marketers have had to do what they do best and adapt to the changing search landscape very quickly. Step in generative engine optimisation (GEO), a natural evolution of search engine optimisation (SEO).

Generative Engine Optimisation (GEO) is an advanced digital strategy that enhances visibility and relevance of online content in AI-driven search engine (generative engine) results. Whilst SEO tends to focus, at a very basic level, on keywords and backlinks, GEO focuses on optimising the content, so it’s a highly specific form of on-page SEO.

GEO optimises content to be crawled and used by the large language models (LLMs) with the intent that the content is most likely to be included in AI-generated responses to search queries. As always, whilst the creed is that content should be made for humans and not search engines, it is the machines that crawl, index, and are programmed to rank content based on what they are told real people expect from those search queries.

GEO is the absolute peak of “helpful content” provision, with practitioners providing comprehensive content that includes almost every element that might end up in a generative result. These elements and entities include:

  • Citations and quotations: Drawn from reliable and high-quality sources, citations and quotations can apparently improve visibility by up to 40% in generative engine responses. 
  • Stats and technical terms: Incorporating statistics and subject-specific technical terminology is helpful.
  • Fluency and understandability: Content should be well-written, easy to understand, and have a logical flow.
  • Authoritative content: Your content must demonstrate your expertise and authority in your field, as per Google’s E-E-A-T.
  • Freshness: AI-powered search results seem to prefer fresh content, so ensure that it remains current and relevant.
  • Structure: Humans like well-ordered information, so make sure that your content is also structured logically.
  • Comprehensive: Ensure that your content has sufficient information to provide a satisfactory result for the search. We even recommend going the extra mile to keep users and engines happy.
  • Factual and verifiable: Provide content that is correct and can be fact-checked.
  • Multi-modal content: Whilst not imperative, adding images, video, or other media may help make your content more helpful and comprehensive.

As you can probably tell, generative search optimisation strategies and techniques are a logical extension of the age-old methods employed in providing the best content for website users. So whilst there is a new name for optimising content for AI-driven search results, the methods are the same as those the best digital agencies have been using for many years now.

Conclusion

We will continue to conduct solid SEO and GEO for our clients but are always keeping an eye on Google and also Microsoft, who run Bing search, to see how their developments affect our customers’ websites.

We are at their mercy, especially since Google introduced “featured snippets”, which result in your content being scraped and presented as Google’s own. Add to that the new “AI Overview” that compiles an AI-generated snippet above the featured snippet, and you can already picture your first page results being devalued. We need to think of other innovative and relevant ways of increasing visibility, driving traffic, and increasing sales.


For all things SEO and now GEO, call Clever Marketing’s highly experienced search marketing team to discuss organic rankings, pay per click, and paid social media campaigns.

Call us on 01276 402 381 or complete the contact us form.