Unwrapping the Secrets of SEO: It's All Semantic For Google Search

Milestones such as the Knowledge Graph, Hummingbird and RankBrain have helped bring Google a few steps closer to becoming a perfect search engine. Statistics, semantic theories and structures, and machine learning all play an important role. In the latest Unwrapping the Secrets of SEO, guest author Olaf Kopp examines aspects of semantics and machine learning in Google search.

In the last installment of Unwrapping the Secrets of SEO, I outlined my view on how Google interprets search queries and the user intent behind them. Now it’s time to take a look at how Google does such a good job at improving search accuracy.

Semantic Search or Statistical Information Retrieval?

I have had many a heated argument (civilized debate?) with fellow SEO Jens Fauldrath over whether Google really is a semantic search engine.

The results that Google presents its users certainly make it look like the search engine giant has a highly developed semantic understanding regarding search queries and documents. However, much of what leads to this appearance is based on statistical methods, and not on any genuine semantic understanding. But due to semantic structures, in combination with statistics and machine learning, Google is now able to get close to semantic understanding.

“For instance, we find that useful semantic relationships can be automatically learned from the statistics of search queries and the corresponding results, or from the accumulated evidence of Web-based text patterns and formatted tables, in both cases without needing any manually annotated data.” Source: The Unreasonable Effectiveness of Data, IEEE Computer Society, 2009

How Word2Vec Works

To demonstrate this more clearly, I will briefly enter the work of statistical text analysis. Google uses vector space analyses for the evaluation of relevance and the identification of relationships. A vector space consists of individual data points that can be linked via vectors in the vector space. The angle between the vectors tells us about similarities and/or relationships between data points. The larger the angle, the less similarity there is. The smaller the angle, the larger the similarity. For the analysis of the main components, for example, a vector is created in the vector space from the search query and all available relevant documents. For this so-called “word embedding” process, Google uses Word2vec.

Using the proximity of data points to one another makes it possible to show the semantic relationships between them. Typically, vectors are created for search queries and documents that can be placed in relation to one another. Another usage is creating vectors from a document and the terms within it in order to identify its concept or topic. It would also be possible to form vectors from entities like people, brands, companies or topics.

In order to make use of vector space analyses, documents first need to be indexed and mapped to concepts or topic areas that then make up the relevant topical corpus. A process for carrying out this step is Latent Semantic Indexing (LSI), which makes it possible to create vector spaces that provide the best results in terms of precision and recall. Using this method, it is also possible to carry out semantic classification or clustering of terms related to a topic.

How Search Queries Can be Automatically Classified

In the past, the main problem was the lack of scalability as search queries had to be manually classified. These are former Google VP Marissa Mayer’s words on the subject from a 2009 interview:

“When people talk about semantic search and the semantic Web, they usually mean something that is very manual, with maps of various associations between words and things like that. We think you can get to a much better level of understanding through pattern-matching data, building large-scale systems. That’s how the brain works. That’s why you have all these fuzzy connections, because the brain is constantly processing lots and lots of data all the time… The problem is that language changes. Web pages change. How people express themselves changes. And all those things matter in terms of how well semantic search applies. That’s why it’s better to have an approach that’s based on machine learning and that changes, iterates and responds to the data. That’s a more robust approach. That’s not to say that semantic search has no part in search. It’s just that for us, we really prefer to focus on things that can scale. If we could come up with a semantic search solution that could scale, we would be very excited about that. For now, what we’re seeing is that a lot of our methods approximate the intelligence of semantic search but do it through other means.” Source: http://www.pcworld.com/article/181874/article.html

Much of what we call semantic understanding when we talk about Google identifying the meaning of a search query or document is built on statistical methods like vector space analyses or methods of statistical text analysis like TF-IDF. Strictly speaking, this is therefore not based on genuine semantics. But the results do get very close to semantic understanding. The increased application of machine learning – and the more detailed analyses this enables – makes the semantic interpretation of search queries and documents much easier.

Semantic Understanding as one of Google’s Goals

One of Google’s most important goals is achieving semantic understanding regarding search terms and indexed documents in order to display more relevant search results. A semantic understanding exists when a (search) query and the terms contained within it can be unambiguously understood. Unambiguous interpretation is often made difficult by queries including terms with multiple meanings, terms unknown by the system, unclear phrasing, individual understanding etc.

To aid understanding, analysis is conducted of the words used, their order, and the contexts of their topic, time and location. Machine learning and/or RankBrain enable Google to use cluster analyses to automatically create new classes and assign search queries to them. This not only establishes a high level of detail, but also creates scalability and increases performance. The creation of new vector spaces for vector space analyses is also made possible.

In this way, statistics combine with machine learning to give an increasingly semantic interpretation that comes very close to a semantic understanding of search queries and documents. Google wants to be able to recreate a truly semantic search with the help of statistical methods and machine learning. Furthermore, a central element of the modern Google search engine, the Knowledge Graph, is also based on semantic structures.

In part three of this article series on Google’s semantics and machine learning, Olaf Kopp will look at the Foundations of Semantics: Graphs, Entities and Ontologies.

6 thoughts on “Unwrapping the Secrets of SEO: It’s All Semantic For Google Search”

Shivangi Shrivastava 2017/10/18 at 4:36 am

Thanks! Such an informative article.
Ryan 2017/10/18 at 10:28 pm

Nice writeup. I’ve often wondered whether we’ll know when Google truly masters semantic search when they don’t need to rely on links. A truly powerful AI will theoretically know what a very solid piece of content might be, in the same way either of us could by evaluating a few random sites from the web. “This is spam, that one is well researched.”

Keyword research in my opinion should now focus on topics and not actual keywords. In many cases, the way the user enters a query may not be the way those words occur together in the wild. Ranking for one keyword with a focus instead on topic will land you many keyword rankings, if the content is very unique and useful, barring other signals.
Olaf Kopp 2017/10/19 at 9:26 am

Hi Ryan,

that is also my experience. Choosing the right keywords (Main Keyword, Additional Keywords and TF-IDF Keywords) to build an holistic content is especially ranking for informational keywords the best way.
Jakob Boman 2017/10/20 at 1:29 pm

Thanks for sharing your thoughts on the subject.
I have done a lot of SEO and I think Google’s algorithm is more simple than many people think as you need simplicity in order deliver results across a huge variation of websites, content and links. If it is too complicated it might blow up and provide poor search results places you can not predict.
I also find that keyword groups are more and more becoming the way to go. But that also makes sense in so many ways:-)
Steven 2017/10/26 at 12:53 pm

Great writeup! I agree with Ryan that keyword research should be focused on topics and phrases rather than actual keywords.
Steve 2017/10/31 at 12:32 am

Without google publishing papers on the subject, its never going to be possible to know for sure, but i imagine a lot of their machine learning work in search is based on feature extraction, that can be used by the ranking algorithm, rather than directly effecting the results.

The largely ‘black box’ nature of machine learning would involve google giving up a lot of control over an important part of their business

Write a Comment Cancel reply

Comment *

Name *

Email *

Note: If you enter something other than a name here (such as a keyword), or if your entry seems to have been made for commercial or advertising purposes, we reserve the right to delete or edit your comment. So please only post genuine comments here!

Also, please note that, with the submission of your comment, you allow your data to be stored by blog.searchmetrics.com/us/. To enable comments to be reviewed and to prevent abuse, this website stores the name, email address, comment text, and the IP address and timestamp of your comment. The comments can be deleted at any time. Detailed information can be found in our privacy statement.

Consent to data storage according to GDPR *