Unwrapping the Secrets of SEO: How Google Interprets Search Queries

Almost every search term is an implicit or explicit question. With voice search and mobile devices, it is even more important for Google to be able to identify search queries and the user intent or meaning behind them. This helps the search engine provide search results that precisely match the user’s request. In our latest Unwrapping the Secrets of SEO, guest expert Olaf Kopp, co-founder, head of SEO and Chief Business Development Officer (CBDO) at Aufgesang Inbound marketing GmbH, examines semantics and machine learning at Google.

What’s it all mean?

In 2009, Ori Allon, then technical lead of the Google Search Quality Team, said in an interview with IDG:

“We’re working really hard at search quality to have a better understanding of the context of the query, of what is the query. The query isn’t the sum of all the terms. The query has a meaning behind it. For simple queries like ‘Britney Spears’ and ‘Barack Obama’ it’s pretty easy for us to rank the pages. But when the query is ‘What medication should I take after my eye surgery?’, that’s much harder. We need to understand the meaning…”

Ultimately, Google wants to identify the user, or search, intent.

How Google identifies search intent

To do this, Google has to understand the context. When talking about context, we have to differentiate between search query context (such as the relationships between terms), user context (such as location and search history), and topical context. Some forms of context are dynamic and can change over time. By considering all available forms of context, Google can derive an individual and robust understanding of user intent for each search query.

Google therefore has to answer the following questions:

  1. Where is the user?
  2. Which device is the user searching with?
  3. What has the user been interested in in the past?
  4. How are the terms used related to one another?
  5. Which entities are included in the search request?
  6. In which topical context are the terms used?
  • Google can quickly answer the first two questions using client information, GPS data and IP addresses. The third question can be answered using search history, clicks in the SERPs and general online behavior.
  • The last three questions, which relate to the actual meaning of the search query, cannot be answered as easily.

Enter RankBrain

Google’s introduction of RankBrain was a huge step towards improved scaling and performance.

In order for Google to be able to recognize the meanings of search terms, a kind of semantic understanding has to be imitated using statistical methods. This requires the classification of search terms using comments or annotations and the mapping of terms which are not yet known to relevant topics. Due to the enormous number of search terms which are entered into Google every day, this cannot possibly take place manually. To enable scalability, it has to take place using cluster analyses and automatic clustering.

Google has been able to do this since 2015, when it introduced Machine Learning in the form of RankBrain. This helped Google join the dots between scalability and the reconstructed semantic understanding of search queries.

Methods for interpreting search queries

Google uses so-called vector space analyses to interpret search queries. These convert the search query into a vector, and draw the relationships to other terms within the vector space. By comparing relationship patterns, the search intent or meaning can be identified, even if the specific search query has not been previously analyzed.
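
To make the idea of comparing queries in a vector space more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the term vectors, their three dimensions, and the function names are invented for this example and are not taken from any Google system. A query is represented as the average of its term vectors, and an unseen query is matched to the most similar known query by cosine similarity.

```python
import math

# Toy, hand-made term vectors. The dimensions and values are invented
# purely for illustration and do not come from any real system.
TERM_VECTORS = {
    "nike":     [0.9, 0.1, 0.0],
    "sneakers": [0.8, 0.2, 0.1],
    "founder":  [0.1, 0.9, 0.0],
    "ceo":      [0.2, 0.8, 0.1],
    "recipe":   [0.0, 0.1, 0.9],
}


def query_vector(query):
    """Average the vectors of all known terms in the query."""
    vectors = [TERM_VECTORS[t] for t in query.lower().split() if t in TERM_VECTORS]
    if not vectors:
        raise ValueError("no known terms in query")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar_query(new_query, known_queries):
    """Return the known query whose vector is closest to the new query."""
    new_vec = query_vector(new_query)
    return max(known_queries,
               key=lambda q: cosine_similarity(new_vec, query_vector(q)))
```

With these toy vectors, calling `most_similar_query("nike ceo", ["founder nike", "recipe"])` picks "founder nike", even though "nike ceo" was never seen before. Real systems would learn the vectors from data rather than define them by hand.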

In this regard, user signals like the click-through rate on individual search results seem to play a particularly important role. In two research papers that Google employees were involved in, I found interesting information on how an algorithm addressing this could work.

In Learning from User Interactions in Personal Search via Attribute Parameterization, it is explained how Google may be able to use an analysis of user behavior and individual documents to create semantic attribute relationships between search queries and the documents clicked on – and even to support a self-learning ranking algorithm:

“The case in private search is different. Users usually do not share documents (e.g., emails or personal files), and therefore directly aggregating interaction history across users becomes infeasible. To address this problem, instead of directly learning from user behavior for a given [query, doc] pair like in web search, we instead choose to represent documents and queries using semantically coherent attributes that are in some way indicative of their content.

This approach is schematically described in Figure 2. Both documents and queries are projected into an aggregated attribute space, and the matching is done through that intermediate representation, rather than directly. Since we assume that the attributes are semantically meaningful, we expect that similar personal documents and queries will share many of the same aggregate attributes, making the attribute level matches a useful feature in a learning-to-rank model.”
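
The attribute-level matching described in the quote can be sketched roughly as follows. This is a simplified illustration, not the method from the paper: the attribute names, the sample interactions and both function names are hypothetical. Clicks are aggregated per (query attribute, document attribute) pair instead of per [query, doc] pair, and the aggregated click-through rate then serves as one feature for a learning-to-rank model.

```python
from collections import defaultdict


def aggregate_attribute_clicks(interactions):
    """Aggregate clicks per (query attribute, document attribute) pair.

    interactions: iterable of (query_attrs, doc_attrs, clicked) tuples.
    Returns a dict mapping each attribute pair to its click-through rate.
    """
    shown = defaultdict(int)
    clicked_counts = defaultdict(int)
    for query_attrs, doc_attrs, clicked in interactions:
        for q_attr in query_attrs:
            for d_attr in doc_attrs:
                shown[(q_attr, d_attr)] += 1
                if clicked:
                    clicked_counts[(q_attr, d_attr)] += 1
    return {pair: clicked_counts[pair] / count for pair, count in shown.items()}


def attribute_match_feature(query_attrs, doc_attrs, ctr_table):
    """Average aggregated CTR over all attribute pairs seen before."""
    pairs = [(q, d) for q in query_attrs for d in doc_attrs]
    known = [ctr_table[p] for p in pairs if p in ctr_table]
    return sum(known) / len(known) if known else 0.0


# Hypothetical interactions from different users with private documents.
SAMPLE_INTERACTIONS = [
    ({"invoice"}, {"category:finance"}, True),
    ({"invoice"}, {"category:finance"}, True),
    ({"invoice"}, {"category:travel"}, False),
    ({"flight"}, {"category:travel"}, True),
]
CTR_TABLE = aggregate_attribute_clicks(SAMPLE_INTERACTIONS)
```

Because the table is keyed by attributes rather than by concrete documents, a new user's private document tagged "category:finance" can still receive a meaningful feature value for the query "invoice", although nobody else has ever seen that document.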

Another scientific paper from Google with the title “Improving semantic topic clustering for search queries with word co-occurrence and bipartite graph co-clustering” provides several interesting insights into how Google now probably categorizes search queries into different topic clusters.

In this document, two methods are presented that Google could use to establish context for search queries. So-called “lift scores” play a central role in the first, entitled “Word Co-occurrence Clustering”:

lift(wi, a) = P(wi | a) / P(wi)

In this formula, “wi” stands for all terms that are closely related to the word’s root, like misspellings, plurals, singulars and synonyms. “a” can be any user interaction like the search for a particular search term or visiting a particular page. If the lift score is, for example, 5, then the probability that “wi” is being searched for given the interaction “a” is five times higher than the general likelihood of “wi” being searched for.

“A large lift score helps us to construct topics around meaningful rather than uninteresting words. In practice the probabilities can be estimated using word frequency in Google search history within a recent time window.”

This makes it possible to assign terms to certain entities like “Mercedes” and/or – if there is a search for replacement car parts – to the topical context cluster “car.” The context cluster or entity can then also have terms assigned to it that often appear as co-occurrences with the search term. This makes it possible to quickly create a search term wordcloud for a certain topic. The size of the lift score determines the closeness to the topic:

“We use lift score to rank the words by importance and then threshold it to obtain a set of words highly associated with the context.”
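
As a rough illustration of ranking and thresholding by lift score, the following Python sketch estimates the probabilities from a tiny, invented query log. It assumes the lift score is the ratio of the probability of a word given a user interaction to the general probability of that word, which matches the description above. The queries, the threshold and the function names are hypothetical.

```python
from collections import Counter


def word_probabilities(queries):
    """Fraction of queries in which each word appears at least once."""
    counts = Counter()
    for query in queries:
        counts.update(set(query.lower().split()))
    return {word: count / len(queries) for word, count in counts.items()}


def lift_scores(all_queries, context_queries):
    """Estimate lift(w, a) = P(w | a) / P(w) for every word in the context.

    all_queries: the general query log.
    context_queries: the subset of queries tied to the interaction a.
    """
    p_general = word_probabilities(all_queries)
    p_context = word_probabilities(context_queries)
    return {w: p / p_general[w] for w, p in p_context.items() if w in p_general}


def context_words(all_queries, context_queries, threshold):
    """Rank words by lift and keep those at or above the threshold."""
    scores = lift_scores(all_queries, context_queries)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, score in ranked if score >= threshold]


# Invented query log; the context is "user visited a car parts page".
ALL_QUERIES = [
    "mercedes brake pads", "cheap brake pads", "cheap flights", "cheap hotel",
    "mercedes dealer", "weather today", "pizza recipe", "news today",
]
CAR_PARTS_QUERIES = ["mercedes brake pads", "cheap brake pads"]
```

In this toy log, "brake" and "pads" reach a lift of 4 and "mercedes" a lift of 2, while the generic word "cheap" stays close to 1 because it is common everywhere. A threshold of 2 therefore keeps the topical words and drops the uninteresting one.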

This method is particularly useful when “wi” is already known, as is the case with searches for established brands or categories. If “wi” cannot be clearly defined, because the search terms for the same topic are too varied, then Google could use a second method: weighted bigraph clustering.

This method is based on two assumptions:

  1. Users with the same intent phrase their search queries differently. Search engines still display the same search results.
  2. For any given search query, similar URLs are displayed amongst the top search results.

Applying this method, the search terms are compared with the top-ranking URLs and query-URL pairs are created whose relationship is also weighted according to users’ click-through rates and page impressions. This makes it possible to recognize similarities between search terms that do not contain the same lexical root, and thereby create semantic clusters.
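
The following Python sketch shows the basic idea of grouping queries through the URLs they share. It is a deliberately simplified stand-in and not the co-clustering algorithm from the paper: queries are compared by the cosine similarity of their click-weighted URL vectors and grouped in a single greedy pass. The click log, URL paths, similarity threshold and function names are all hypothetical.

```python
import math
from collections import defaultdict


def build_query_url_weights(click_log):
    """Build the weighted query-to-URL edges of the bipartite graph.

    click_log: iterable of (query, url, clicks) tuples.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for query, url, clicks in click_log:
        weights[query][url] += clicks
    return weights


def url_cosine(a, b):
    """Cosine similarity between two {url: weight} mappings."""
    dot = sum(weight * b[url] for url, weight in a.items() if url in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_queries(click_log, min_similarity=0.5):
    """Greedy single-pass grouping of queries with similar clicked URLs."""
    weights = build_query_url_weights(click_log)
    clusters = []
    for query in weights:
        for cluster in clusters:
            if any(url_cosine(weights[query], weights[other]) >= min_similarity
                   for other in cluster):
                cluster.append(query)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([query])
    return clusters


# Invented click log: (query, url, clicks).
CLICK_LOG = [
    ("nike founder", "example.org/nike", 40),
    ("nike founder", "example.org/phil-knight", 25),
    ("who started nike", "example.org/nike", 30),
    ("who started nike", "example.org/phil-knight", 35),
    ("pizza dough recipe", "example.org/pizza", 50),
]
```

Here "nike founder" and "who started nike" end up in the same cluster because users click the same URLs for both, even though the two queries share only one word. "pizza dough recipe" has no URL in common with them and forms its own cluster.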

The role of entities in interpreting search queries

Google wants to find out what the entity is that a question is referring to. By looking at the entities in a search term and the relational context between entities, Google can identify the sought entity.

Even if the results do differ slightly, Google recognizes that the entities, “Bill Bowerman” and “Phil Knight,” are being searched for, even though the names do not appear in the search query. And it makes no difference whether I ask an implicit question like, “founder Nike” or an explicit question. The entity, “Nike,” and the relational context, “founder,” are sufficient.
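
How an entity plus a relational context can be enough to find the sought entity can be illustrated with a tiny lookup in Python. This is a toy sketch only: the miniature graph, the synonym table and the function name are hypothetical and say nothing about how the Knowledge Graph is actually stored or queried.

```python
# Hypothetical miniature knowledge graph: (entity, relation) -> entities.
KNOWLEDGE_GRAPH = {
    ("nike", "founder"): ["Bill Bowerman", "Phil Knight"],
}

# Hypothetical mapping of surface words to a canonical relation name.
RELATION_SYNONYMS = {
    "founder": "founder",
    "founders": "founder",
    "founded": "founder",
    "started": "founder",
}


def resolve_entities(query):
    """Find the entity and the relation in a query and look up the answer."""
    tokens = query.lower().replace("?", "").split()
    known_entities = {entity for entity, _ in KNOWLEDGE_GRAPH}
    entity = next((t for t in tokens if t in known_entities), None)
    relation = next((RELATION_SYNONYMS[t] for t in tokens
                     if t in RELATION_SYNONYMS), None)
    # Returns an empty list when the entity or the relation is missing.
    return KNOWLEDGE_GRAPH.get((entity, relation), [])
```

Both the implicit query "founder Nike" and the explicit question "Who founded Nike?" resolve to the same two entities, because only the entity "nike" and the relation "founder" matter for the lookup.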

This capability is often erroneously attributed to RankBrain and/or Google’s Machine Learning technologies. However, it actually has its origins in the functionality of Hummingbird, together with the Knowledge Graph. Ergo: Google was able to do this before RankBrain ever showed up.

As early as 2009, Google introduced the first semantic technologies for interpreting search terms with its “related searches.” The inventor of this technology, Ori Allon, already prepared Google’s users for the underlying technology having a wider-reaching impact on rankings. The patent for the technology developed by Allon can be found here.

The patent primarily deals with the interpretation and fine-tuning of search queries. It is likely this groundwork that RankBrain would later build on with its Machine Learning technologies. Since RankBrain (if not earlier), Google has been able to conduct a scalable semantic interpretation of search queries using Machine Learning.

According to the patent, the fine-tuning of a search query relates to particular entities that frequently appear together in documents ranking for the original search query or for synonyms.

The problem pre-RankBrain was the lack of scalability when looking to identify entities and store them in the Knowledge Graph. The Knowledge Graph is mainly based on information from Wikidata, which is verified by Wikipedia entities – meaning it is a manually curated and therefore static and non-scalable system.

 “Wikipedia is often used as a benchmark for entity mapping systems. As described in Subsection 3.5 this leads to sufficiently good results, and we argue it would be surprising if further effort in this area would lead to reasonable gains.”
Source: From Freebase to Wikidata – The Great Migration

Google Gets Good (or Great?)

It can be safely assumed that Google has been working on the development of a search engine that includes semantic influences to better understand the meaning of search queries and documents since at least 2007.

As of today, with semantic structures like the Knowledge Graph and with Machine Learning, Google seems to be very close to the goal stated by former Google VP Marissa Mayer of moving away from a purely keyword-based search engine to a concept- or context-based search engine.

“Right now, Google is really good with keywords and that’s a limitation we think the search engine should be able to overcome with time. People should be able to ask questions and we should understand their meaning, or they should be able to talk about things at a conceptual level. We see a lot of concept-based questions — not about what words will appear on the page but more like ‘what is this about?’”
Source: Google wants your Phonemes – Infoworld

And, really, it’s high time Google reached this goal – if you consider that Voice Search is on the march worldwide and that search queries are becoming more and more complex.

Olaf Kopp

Olaf Kopp is Co-Founder, Head of SEO and Chief Business Development Officer (CBDO) at Aufgesang Inbound Marketing GmbH. He specializes in online marketing and content marketing strategies throughout the customer journey, semantic SEO and digital brand building. As co-organiser of SEAcamp, he has been involved in social media marketing, Google AdWords, SEO and content marketing since 2005.

7 thoughts on “Unwrapping the Secrets of SEO: How Google Interprets Search Queries”


  • Filip Zafirovski 2017/10/10 at 9:12 am

    Great article Olaf, thanks for the share.

    I want to ask you.

    What does the future hold for SEO trough these changes?

    Should one focus on creating more content for people rather then the search engines?

    Kind regards,
    Filip

  • Hi Filip, thx. In the future Google will be much better in Interpreting search queries and detecting the search intent. That is one of the main goals of Google since ten years. Generally I would always recommend User First for creating content independent of the points in this article.

  • Ali Haider Shah 2017/10/10 at 3:32 pm

    No one understands what Google really wants, in some updates it devastates a blog and in some updates it gives a slide boost to rankings, no one can understand about it!
    But no doubt admin this is a great article thanks for sharing these ideas and in the end I will say no one can understand Google exact sites promoting pattern.

  • Hi Ali, for sure. But in the article I focussed on interpreting search Queries. Not Rankings.

  • As an SEO, I believe Google is set to return results in a more semantically based environment using algorithms like Hummingbird.

    Also, it has done a great job till now I think. Look at the percentage of relevant search results you get to your search queries. I also see Google transforming from a search engine to a conversation engine in the next 3-5 years. The journey has already started. Look at the amount of answers you get on Google first page without even visiting any website listed in the search results.

  • Yes, I totally agree with you Fahad. This will be the future of search.

  • Your Keywords Are Not What You Think They Are. According to the patent, when Google receives a query, it will go and grab web documents from its index that it associates with the query, based on the keyword the searcher typed in. It will then look at the pages it extracted and the concepts, or semantic clusters, they are associated with. If it finds out that the results belong to a few very different concepts, it will likely conclude that the query is ambiguous and would benefit from a refinement.

