The Future of the Web: Websites as Data Sources

Since its inception, the World Wide Web has been a medium served through a screen. Websites have always been laid out as pages, with links allowing users to navigate from one page to another to find what they’re looking for.

In recent years, however, rapid technological changes have altered how people interact with the web. Increasingly, web content is not served to people through a screen, but relayed through voice systems and as contextual triggers on personal devices.

Voice search continues to grow, and comScore expects half of all online searches to be performed by voice command by 2020. With the ubiquity of smartphones and the proliferation of smartwatches, our connected devices are also behaving more intelligently, providing us with contextual triggers that don’t require us to perform a manual search.

The old web model, where webpages are served to end users on a screen, is less and less dominant. It will never disappear, but it will live side by side with a new model in which screen-less interactions are equally common.

The interesting question is, how can website owners prepare for this?

To provide an answer to that question, we must first understand how the technology underpinning these developments works. Specifically, we need to know how voice search platforms and interconnected apps find the information that powers their answers and contextual triggers.

Machine-Readable Content

The technology that drives voice search and contextual triggers, as well as countless other innovations on the web, has the same underlying principle: machine-readable content.

At its core, this is what you need to get ready for. If you want your content served to voice search users and your information used by interconnected devices, you will need to deliver your content in formats that machines can read and fully understand.

Historically, the web has been terrible at delivering content in machine-readable formats. Standard webpages are not conducive to delivering content to machine systems in an easily digestible way. That’s why vastly complex search engines such as Google are so important, and why even two decades after the web’s invention these search engines still struggle to consistently deliver the best answers to their users’ questions.

For the same reason, new languages and technologies have been introduced to make content more easily understandable for machines, so that this content can be used and re-purposed. The following four technologies are at the core of making websites machine-readable, and you will need to start implementing them straight away if you want to stay ahead of the technology curve.

1. Structured Data

The most important technology that helps make your web content machine-readable is structured data. By marking up your content with structured data, you take the guesswork away and allow machine systems to extract your data and utilise it in the most appropriate context.

For example, when you host an event you should mark up your event webpage with the right structured data. This will allow machine systems to understand all the relevant information about your event.

[Image: Event structured data]

This sort of markup allows voice search platforms to provide your event as an answer to questions like “what’s happening near me?” and “find a free marketing event in Leeds.”
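To make this concrete, here is a minimal sketch of what event markup could look like, generated with Python as schema.org JSON-LD. The event name, venue, date, and URL are hypothetical placeholders, and the helper functions `build_event_jsonld` and `to_script_tag` are our own illustration rather than part of any library. A real event page would likely include more properties than shown here.

```python
import json


def build_event_jsonld(name, start_date, venue, city, price, currency, url):
    """Return a schema.org Event description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date and time
        "url": url,
        "location": {
            "@type": "Place",
            "name": venue,
            "address": {
                "@type": "PostalAddress",
                "addressLocality": city,
            },
        },
        "offers": {
            "@type": "Offer",
            "price": price,  # a price of "0" marks the event as free
            "priceCurrency": currency,
            "url": url,
        },
    }


def to_script_tag(data):
    """Wrap a JSON-LD dictionary in the script tag that is placed in the page's HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical event, for illustration only.
event = build_event_jsonld(
    name="Example Marketing Meetup",
    start_date="2030-01-15T18:30:00+00:00",
    venue="Example Venue",
    city="Leeds",
    price="0",
    currency="GBP",
    url="https://www.example.com/events/marketing-meetup",
)
print(to_script_tag(event))
```

The location and the price are the two properties that matter most for the example queries: the city lets a machine match “near me” or “in Leeds”, and the zero price lets it match “free”.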

Without structured data, you’re leaving the discovery of your content up to chance. By implementing structured data, you remove a lot of that chance and allow your content to be properly interpreted and used in the right context.
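It can help to look at this from the machine’s side. The sketch below shows one way a machine system might pull JSON-LD structured data out of a page, using only Python’s standard library. The sample page and the `JsonLdExtractor` class are hypothetical, and real crawlers are far more sophisticated, but the principle is the same: no guessing at the page layout is needed.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_jsonld(html):
    """Return every JSON-LD object found in an HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page, for illustration only.
page = """
<html><head><title>Example Marketing Meetup</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Event",
 "name": "Example Marketing Meetup",
 "location": {"@type": "Place", "address": {"addressLocality": "Leeds"}}}
</script>
</head><body><h1>Join us!</h1></body></html>
"""

# Keep only the objects that describe an event.
events = [
    item for item in extract_jsonld(page)
    if isinstance(item, dict) and item.get("@type") == "Event"
]
for item in events:
    print(item["name"], "in", item["location"]["address"]["addressLocality"])
```

A page without markup yields an empty list, which is exactly the “left up to chance” situation: the machine would have to infer the event details from the visible text instead.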

2. Knowledge Graph

Increasingly, we see web search engines provide direct answers in search results, the so-called Knowledge Graph. (Google’s own name for an answer extracted from a webpage is a featured snippet.)

Rather than show a list of links for users to click on, Google will show what it believes to be the correct answer straight on the results page. For informational queries like ‘home heating oil prices’, this is a very useful feature.

[Image: Home heating oil prices Knowledge Graph]

Such a Knowledge Graph result is also a way for Google to test whether the answer can be used for voice search queries. The ‘feedback’ option allows Google to learn when it gets things wrong and needs to improve its answer. In due course, Google will be confident enough to serve this sort of answer for voice queries like “what is today’s heating oil price?”.

Enabling Knowledge Graph results is relatively straightforward. The biggest barrier is that your website needs to be seen as a trustworthy source, so it usually requires an existing first-page ranking.

If your site is already ranked well, often the addition of a table and/or a list of bullet points that answers the query can be enough to trigger a knowledge graph result. The folks at Builtvisible have done some tests and you should read their findings.

3. App Indexing

A side-effect of the unparalleled adoption of smartphones is that more content is hidden inside mobile apps. As apps have replaced the mobile web for many use cases, content is embedded in apps and effectively made invisible to web-based machine systems.

Google saw that obstacle coming early on, and has been pushing for App Indexing for several years now. In a nutshell, by supporting URLs that point to content inside your app (deep links) and allowing Googlebot to crawl that content, App Indexing allows Google to integrate your app’s content in its search results and to encourage mobile web users to install your app.
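As a rough illustration of what “supporting URLs” means in practice, the sketch below builds the `android-app://` style link that Google has documented for connecting a webpage to the matching screen in an Android app. The package name and web URL are hypothetical, the helper functions are our own, and Google’s recommended approach may have changed since, so treat this as an assumption to check against the current documentation.

```python
from html import escape
from urllib.parse import urlsplit


def android_app_uri(package_name, web_url):
    """Build an android-app://{package}/{scheme}/{host_path} URI for a web URL."""
    parts = urlsplit(web_url)
    host_path = parts.netloc + parts.path
    if parts.query:
        host_path += "?" + parts.query
    return f"android-app://{package_name}/{parts.scheme}/{host_path}"


def alternate_link_tag(package_name, web_url):
    """Return the <link> element that would go in the head of the webpage."""
    uri = escape(android_app_uri(package_name, web_url), quote=True)
    return f'<link rel="alternate" href="{uri}" />'


# Hypothetical app and page, for illustration only.
print(alternate_link_tag("com.example.android", "http://example.com/gizmos"))
```

The point of the mapping is that every piece of app content gets an address a crawler can follow, just like a webpage. The app itself also has to be configured to open those addresses, which is outside the scope of this sketch.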

[Image: App Indexing]

The next stage is to allow apps to interact with one another. We see the early stages of this development already, where tapping a link to a social media site on a mobile browser will allow your phone to open the relevant app rather than just load the mobile webpage. Apps are starting to talk to one another more, interchanging relevant data to enhance each other’s functionality and provide contextually relevant notifications.

Keep an eye on the app ecosystem, especially how it relates to the mobile web, as this is where a lot of interesting technologies will emerge.

4. Google AMP

Lastly, let’s talk about AMP. This new web standard has been much maligned, but it serves an additional purpose that has been missed by many commentators. That purpose is to make content fully useful and re-usable for machine systems.

An AMP page is, in essence, a fully machine-readable piece of content that is cloud-hosted (in the AMP cache) and accessible for any machine system.

[Image: Google AMP Cache]

Facebook’s adoption of the AMP standard shows that it transcends Google’s ambitions, and can be used by any developer of machine systems that wants to provide content to users.

Rather than relying on the cumbersome and complicated process of crawling the web, a machine system can use the AMP cache to retrieve content in a quick and efficient way and re-purpose this content for whatever relevant purpose.
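To give a sense of how simple this retrieval can be, here is a sketch that works out where a page would live in the Google AMP Cache. It follows the cache URL format as we understand it from Google’s documentation: dots in the domain become dashes, existing dashes are doubled, and an `s/` segment marks pages served over HTTPS. The function name is our own, and the sketch deliberately leaves out the hashed fallback that the cache uses for international or very long domain names.

```python
from urllib.parse import urlsplit

AMP_CACHE_DOMAIN = "cdn.ampproject.org"


def amp_cache_url(page_url):
    """Return the Google AMP Cache URL for an AMP page (simple ASCII domains only)."""
    parts = urlsplit(page_url)
    host = parts.hostname
    if parts.scheme not in ("http", "https") or not host:
        raise ValueError("expected an absolute http(s) URL")

    # example.com -> example-com, my-site.example.com -> my--site-example-com
    subdomain = host.replace("-", "--").replace(".", "-")
    if not host.isascii() or len(subdomain) > 63:
        raise ValueError("this domain needs the hashed fallback, which this sketch omits")

    secure = "s/" if parts.scheme == "https" else ""  # marks an HTTPS origin
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    # The /c/ prefix requests a document, as opposed to an image or other resource.
    return f"https://{subdomain}.{AMP_CACHE_DOMAIN}/c/{secure}{host}{path}{query}"


# Hypothetical page, for illustration only.
print(amp_cache_url("https://example.com/amp_document.html"))
```

Because the cache address can be derived from the original address alone, a machine system that knows a page’s URL can request the cached copy directly instead of crawling the publisher’s site.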

The AMP standard is a near-perfect machine-ready web content standard, combining structured data with a restricted, pre-approved set of JavaScript and ridiculously fast load times. While users reap the rewards as well, ultimately it will be machine systems that benefit the most from AMP.

The Future of Websites

Since the invention of the web we have seen websites primarily as end destinations for people. We want our customers to see our webpages, read our content, and interact with our offering.

This will never disappear, but we will see another use case for websites: machine-readable data sources.

By some industry estimates, automated bots already account for roughly half or more of the web’s traffic. This will continue to be the case as human interaction with the web moves to screen-less environments such as voice assistants and interconnected devices.

Websites are no longer primarily for human consumption; machines will be your website’s biggest audience. It’s up to you to make sure your online content is fully machine-readable and machine-usable.