Since Google’s recent announcement, the question everyone is asking is: “How can I optimize for BERT?”
If you haven’t read another article on the subject yet, the short answer is: you can’t.
BERT does not rate your webpage or your site. According to Google, it does not (currently) rate your content. You cannot use any optimization for BERT to directly affect how ranking algorithms evaluate your page.
But that’s just the beginning.
Let’s look at the details.
What is BERT?
BERT is a natural language processing algorithm. It made waves in 2018 by breaking records for machine performance on a variety of language tasks. Most importantly, models that use BERT are significantly better at understanding the context and meaning of words as human speakers and writers naturally use them.
The key to this understanding is its access to what comes both before and after a given word, which wasn’t previously possible in Natural Language Processing. This helps address ambiguity, which is notoriously difficult for machines that lack the real-world knowledge and common-sense reasoning skills that humans possess.
BERT is a pre-training technique: it produces general-purpose language representations that can then be fine-tuned quickly and accurately for downstream tasks such as question answering.
BERT itself is available as open source technology for anyone to use.
The importance of bidirectional analysis
BERT stands for “Bidirectional Encoder Representations from Transformers”. BERT’s novelty comes from its ability to examine both the preceding and the following words in a phrase or sentence. This is the “bidirectional” part of its name.
Here’s an example from Google’s AI blog post in November 2018, concerning the difficulty faced by previous techniques when dealing with phrases like:
I accessed the bank account…
I accessed the bank of the river…
The word “bank” would have the same context-free representation in “bank account” and “bank of the river.” Contextual models instead generate a representation of each word that is based on the other words in the sentence. For example, in the sentence “I accessed the bank account,” a unidirectional contextual model would represent “bank” based on “I accessed the” but not “account.” However, BERT represents “bank” using both its previous and next context — “I accessed the … account”.
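To make the difference concrete, here is a toy sketch of why a contextual representation of “bank” differs between the two sentences while a context-free one does not. This is not real BERT: the three-dimensional word vectors are made up, and “context” is faked by simple averaging, where BERT uses attention over the whole sentence in both directions.

```python
# Toy illustration (not real BERT): context-free vs. contextual word vectors.
# Each word gets a fixed, made-up 3-dimensional "static" vector; a crude
# contextual vector for the sentence is built by averaging all word vectors.

static = {
    "i": [1, 0, 0], "accessed": [0, 1, 0], "the": [0, 0, 1],
    "bank": [1, 1, 0], "account": [0, 1, 1], "of": [1, 0, 1],
    "river": [0, 0, 1],
}

def contextual(words):
    """Average of all word vectors in the sentence: a stand-in for context."""
    vecs = [static[w] for w in words]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(3)]

s1 = "i accessed the bank account".split()
s2 = "i accessed the bank of the river".split()

# Context-free: "bank" has the exact same vector in both sentences.
print(static["bank"])

# Contextual: the surrounding words pull the two representations apart.
print(contextual(s1) != contextual(s2))  # True
```

The point of the sketch is only that a contextual representation depends on every word around the target, so “bank account” and “bank of the river” stop looking identical.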
BERT gains a much better understanding of words and phrases based on this bidirectional context, but that’s not its only advantage. BERT can also learn to understand relationships between two sentences. Models using BERT can predict whether a sentence A is likely to be followed by a sentence B or not.
This means that BERT can differentiate between a text that doesn’t make much sense despite using well-formed sentences, and a text that develops an idea in a predictably coherent way.
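As a rough illustration of what next-sentence prediction provides, the sketch below fakes a coherence score with simple word overlap. Real BERT learns this relationship from enormous text corpora during pre-training; the sentences here are invented for the example.

```python
# Toy stand-in for BERT's next-sentence prediction: score how plausible it is
# that sentence B follows sentence A. Here the signal is just Jaccard word
# overlap, purely as an illustration of the kind of judgment being made.

def follows_score(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

a = "the man went to the store"
coherent = "he bought a gallon of milk at the store"
off_topic = "penguins are flightless birds"

# A coherent continuation scores higher than an off-topic one.
print(follows_score(a, coherent) > follows_score(a, off_topic))  # True
```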
If these seem like very obvious advantages that should have already been taken into account, you’re mostly right. People working in the field of Natural Language Processing have been trying to find a way to obtain them for a long time.
However, because of the way models are trained (that is, the way they learn from training data), it is very difficult to create a technique that successfully takes both the previous and the next words into account. BERT is the first technique to successfully train a neural network to perform these tasks.
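The trick BERT uses to get around this is masked language modeling: during pre-training, random words are hidden and the model must recover them from the context on both sides, so it can never cheat by peeking at its own answer. Below is a simplified sketch of the masking step only; real BERT masks about 15% of tokens and sometimes swaps or keeps them instead, details reduced here for illustration.

```python
# Simplified sketch of BERT-style masking for pre-training. Random tokens are
# replaced with [MASK]; the original tokens become the labels the model must
# learn to predict using context from BOTH directions.

import random

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            labels[i] = tok  # what the model is trained to recover
        else:
            masked.append(tok)
    return masked, labels

tokens = "i accessed the bank account".split()
masked, labels = mask_tokens(tokens, mask_rate=0.4)
print(masked, labels)
```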
How BERT helps Google
Integrating BERT into search algorithms is a logical next step for Google in their mission to understand the meaning and the context of search queries, and to provide better search results to users. BERT builds on (but does not replace) previous major search changes:
- Hummingbird (2013), which took into account the entire search phrase instead of individual words
- RankBrain (2015), which is a machine learning algorithm used to rewrite queries, particularly when commonly understood words are missing from the actual query
- User intent analysis, which draws from contextual information (Is the user on a mobile phone? What is the user’s location? What time of year or time of day is it? Does the query contain keyword intent indicators?) in order to understand what the user is trying to find
In the rest of this article, we’ll look at what Google does with BERT, and how you can ensure your content isn’t losing out.
Query understanding for organic search
Google uses BERT to better understand queries: it gives Google a better sense of how the words in a query work together to create meaning. As Google explains,
Particularly for longer, more conversational queries, or searches where prepositions like “for” and “to” matter a lot to the meaning, Search will be able to understand the context of the words in your query. […]
Here’s a search for “2019 brazil traveler to usa need a visa.” The word “to” and its relationship to the other words in the query are particularly important to understanding the meaning. It’s about a Brazilian traveling to the U.S., and not the other way around. Previously, our algorithms wouldn’t understand the importance of this connection, and we returned results about U.S. citizens traveling to Brazil. With BERT, Search is able to grasp this nuance and know that the very common word “to” actually matters a lot here, and we can provide a much more relevant result for this query.
Note that BERT has been trained in English, and these modifications apply to English-language queries. Google also intends to “take models that learn from improvements in English (a language where the vast majority of web content exists) and apply them to other languages.”
Consequently, BERT has only been rolled out in the United States for queries on google.com. There’s no explicit timeline for a broader roll-out, but the plan is to expand gradually to other countries.
How to optimize for query understanding
On this point, there’s no possible optimization for content creators. While BERT affects ranking, it does so by interpreting the query differently, not by analyzing your page differently. In fact, with no intervention at all by content creators, this change should send better-qualified traffic to a website.
This is in line with the official statement from Google:
There’s nothing to optimize for with BERT, nor anything for anyone to be rethinking. The fundamentals of us seeking to reward great content remain unchanged.
— Danny Sullivan (@dannysullivan) October 28, 2019
Featured snippet improvement
Google uses BERT to “improve featured snippets”. Unlike the query-understanding improvements, which currently apply to English, the featured snippet improvements apply in the “two dozen countries” where featured snippets are available, in a number of different languages.
The example given in Google’s post is a featured snippet for the query “parking on a hill with no curb” that previously gave little importance to the word “no” and showed a featured snippet that described parking on a hill, but didn’t address the “no curb” part of the query. In fact, having retained the word “curb”, it provided instructions for parking on hills where there is a curb.
With the use of BERT, Google is better able to understand that the searcher wants to know about parking, on a hill, AND with no curb. The featured snippet provided now is much more pertinent.
Once the query is correctly understood, the ability to correctly understand and select a featured snippet to match it suggests that Google is likely also using BERT on content in its index, rather than exclusively on queries.
How to optimize for featured snippets with BERT
For the moment, even if Google is using BERT to understand your content, there’s still no obvious way to optimize for it.
The effect BERT should have is better featured snippets drawn from your content, which in turn means a higher quality of traffic from the featured snippets you hold.
You might also obtain more featured snippets for queries that Google did not differentiate before, but now understands to be separate questions.
(Probable?) (Future?) Content evaluation of indexed URLs
Vincent Terassi, SEO data scientist and Product Director at OnCrawl, predicts that BERT’s use in ranking algorithms will penalize certain types of content: autogenerated, poor quality content, and particularly sites that use spinners to target more keywords.
We mentioned earlier that one of the advantages of BERT’s bidirectionality is its ability to understand the relationships between sentences. In fact, BERT’s “textual entailment” feature means that it is able to “predict” the next sentence or segment. “Predicting” in this case involves the ability to decide whether or not a proposition for the next sentence is reasonable or off-topic, regardless of whether that proposition was found in the text or generated by a machine.
Because of this, some of the most obvious uses of BERT involve recognizing and classifying abusive content. This includes:
- Deciding whether content is spam or not
Content (or parts of content) that scores poorly on spam or fact-checking tests won’t get past BERT. Google may use this to demote this type of content and the websites that display it.
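As a crude illustration of the kind of signal this makes available, the sketch below flags text whose consecutive sentences don’t hang together. This is not how Google works: word overlap stands in for BERT’s sentence-level coherence judgment, and the sentences and threshold are invented for the example.

```python
# Crude illustration: flag text whose consecutive sentences are mutually
# incoherent, the kind of judgment BERT's next-sentence prediction enables.
# The "coherence" score here is just word overlap, used as a toy stand-in.

def coherence(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def looks_incoherent(sentences, threshold=0.05):
    scores = [coherence(a, b) for a, b in zip(sentences, sentences[1:])]
    return sum(scores) / len(scores) < threshold

on_topic = [
    "parking on a hill takes extra care",
    "turn your wheels toward the curb when parking downhill",
    "without a curb turn the wheels toward the edge of the road",
]
spun_filler = [
    "best cheap insurance quotes online",
    "penguins are flightless birds",
    "click here to download the recipe",
]

print(looks_incoherent(on_topic), looks_incoherent(spun_filler))  # False True
```

A model with real language understanding makes exactly this distinction far more robustly, which is why spun or auto-generated filler that strings together well-formed but unrelated sentences is an obvious target.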
Furthermore, while running natural language processing algorithms on all of the web has been too computationally expensive, BERT drastically reduces this cost. While Google might not be ready to start feeding it all of the web, we can easily imagine a future in which BERT checks suspicious content, or in which future versions of BERT can examine most web pages.
How to optimize for content analysis with BERT
If BERT is used, as it can be (and as we believe it will be) to identify abusive content, optimizing for BERT means avoiding practices that degrade the quality and usefulness of your content for search users. This might include rejecting the use of budget content creation, such as:
- Automatic translation with no human copyediting to languages where BERT is being used
- Automatic content generation without human copyediting in languages where BERT is being used
- Use of content spinners
- Overuse of synonyms, keyword stuffing
- Low-quality content writers
- Non-expert writers, particularly in fields with recognizable tone, sentence patterns, or vocabulary
You’ll probably see the best results if you optimize your content for human readers, using natural language. This may also mean taking a less keyword-focused approach in favor of developing information-rich content.
You’ll also likely see more improvement post-BERT for high-quality content in general than you would have seen pre-BERT.
Curiously (or perhaps not), this comes down to the same advice provided by Google: provide quality content that puts the search user first.
Where to find more information about BERT
Google’s use of BERT
- Google’s announcement on use of BERT in organic search
Bing’s use of BERT
Google is not the only search engine that uses BERT to better understand queries and content. In fact, Bing’s use of BERT was made publicly available through Azure in July 2019.
❗️ In light of the @Google #BERT announcement, here’s a friendly reminder that *you* can train the very same model on @Azure – all the code is available in open source on @GitHub and contained in a simple Jupyter notebook! https://t.co/ER3Nw6jwcs
— Frédéric Dubut (@CoperniX) October 25, 2019
It’s reasonable to expect Bing and other search engines to use these models and processes to improve their own search experiences.
How BERT works
In no particular order, these resources can be useful to anyone looking to understand how BERT works and why it is considered groundbreaking:
- Dawn Anderson’s deck: “Google BERT & Family & the Natural Language Understanding Leaderboard Race”
- Google’s whitepaper: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
- Minh Quang-Nhat Pham’s presentation: “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
- Mohd Sanad Zaki Rizvi’s article: “Demystifying BERT: The Groundbreaking NLP Framework”
- Dawn Anderson’s very complete review article on Search Engine Land: “A Deep Dive into BERT: How BERT launched a rocket into natural language understanding”