Yesterday Google announced a new feature for Webmaster Tools that is genuinely useful and something many website owners have requested: Index Status.
In this new report, found under the Health section in your Webmaster Tools account, Google informs you about how many pages it has included in its index.
Beyond that, the Advanced tab tells you:
- How many pages it has crawled
- How many pages it has opted not to include in the index
- How many pages are blocked by robots.txt
From the relevant Google Support page:
The number of pages crawled indicates “the cumulative total of URLs on your site that Google has ever crawled. Not all crawled URLs get indexed, and Google may discover some URLs by other means such as inbound links from other sites. This number should increase over time as new pages are added to your site.”
Pages Not Selected
These are the pages on your site that Google has crawled, but “that are not indexed because they are substantially similar to other pages, or that have been redirected to another URL.”
Lastly, the pages blocked by robots.txt are those that Google cannot crawl because they are disallowed in your robots.txt file.
This is incredibly useful information that allows webmasters to quickly troubleshoot indexing issues on their websites. For example, you can spot when you have inadvertently blocked too many pages in robots.txt, or made a mistake in your attempts to sculpt the indexing of your site (for instance, if you have faceted navigation and want to ensure Google focuses on the most semantically relevant facets).
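If the blocked count looks higher than expected, one way to investigate is to test a sample of your own URLs against your robots.txt rules. The following is a minimal sketch using Python's standard `urllib.robotparser`; the function name `blocked_urls`, the example rules and the example.com URLs are all hypothetical. Note that this parser does simple prefix matching and does not support the wildcard patterns Googlebot understands, so treat the result as a rough first check rather than a replica of Google's behaviour.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules: block internal search results and a faceted "filter" section
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search/
Disallow: /filter/
"""

# Hypothetical sample of URLs you expect Google to be able to crawl
EXAMPLE_URLS = [
    "https://example.com/",
    "https://example.com/search/shoes",
    "https://example.com/filter/red",
    "https://example.com/about",
]

if __name__ == "__main__":
    for url in blocked_urls(EXAMPLE_ROBOTS, EXAMPLE_URLS):
        print("Blocked:", url)
```

Running the same check against a list of URLs from your sitemap can help show which Disallow rule is catching more pages than you intended.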
Google’s Pierre Far has elaborated on the official blog post, which is also well worth a read, as it gives you more hands-on detail on how to use these new reports.