This is part two of the post Information Architecture: Common pitfalls. The first part was published yesterday; you can find it here.
Once you’re done with that, it’s time to move on to reorganizing the content. Things you might consider:
Flattened site architecture
Two facts matter here. First, each page can only carry a limited number of links (in the Webmaster Help Center, Google states that it should be no more than 100 links, although this very much depends on the authority of the domain). Second, it is important how many steps down a page resides within the domain (in other words: how many clicks does it take to get from the homepage to that specific page?). Taken together, there really is one problem to tackle: keep the “depth” to a reasonable number. There is a great visualization of this over at webmasterworld.com:
I usually recommend having a maximum of four levels; however, this really depends on the site itself, and of course you have to consider how much content is available, etc. If you still don’t have a clue what the hell I’m talking about, I really recommend having a look at Richard Baxter’s chart of a flattened site structure and reading the post over there as well. That being said, there are of course dozens of other things that have to be taken into account. On a per-page basis, for example, it is really important to know the main goal of that specific page (is it a hub page or a landing-page?) and to adjust the internal link structure accordingly.
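To make the depth idea a bit more tangible, here is a minimal sketch in Python that counts how many clicks each page is away from the homepage, using a breadth-first search over the internal link graph. Everything in it is made up for illustration: the `links` dictionary, the URLs, the function name `click_depths` and the `MAX_CLICKS` threshold. It assumes the homepage counts as depth 0, so "four levels" is read here as at most three clicks; adjust that to your own definition.

```python
from collections import deque

MAX_CLICKS = 3  # assumption: homepage = level 1, so four levels = three clicks


def click_depths(links, home):
    """Return {page: minimum number of clicks needed to reach it from home}."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link graph: page -> pages it links to
links = {
    "/": ["/shoes/", "/bags/"],
    "/shoes/": ["/shoes/running/", "/"],
    "/shoes/running/": ["/shoes/running/model-x"],
    "/shoes/running/model-x": ["/shoes/running/model-x/reviews"],
    "/bags/": ["/"],
}

depths = click_depths(links, "/")
too_deep = sorted(page for page, depth in depths.items() if depth > MAX_CLICKS)
print(too_deep)
```

Pages that end up in `too_deep` are candidates for a link from a level further up. Pages that never show up in `depths` at all are not reachable from the homepage by following links.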
External links and internal cross linking
I’m not going to dig into link-building for those pages at all, but you will certainly need a decent amount of external links whether you have a flattened site architecture or not. What you also need is consistent and intelligent internal cross linking, which comes down to two very important factors: besides making sure you use the right anchor texts for each page (to actually support the keywords you’re targeting on that specific landing-page), you should also make sure that you actually do cross link. To identify which pages should provide links and which ones should receive them, a good start is to check for the most linked-to pages (maybe you want to differentiate between internal and external links and give them separate weights) and the ones receiving the fewest links. These are your givers (the more linked-to pages) and takers (the less linked-to pages). And you could probably say that intelligent internal cross linking that helps your users navigate between pages also helps search engine crawlers index your content.
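The givers and takers check could be sketched roughly like this in Python. The link data, the function names and the `external_weight` value are assumptions for illustration only; how much more an external link should count than an internal one is entirely up to you.

```python
def link_scores(internal_links, external_counts=None, external_weight=1.0):
    """Score each page by inbound internal links plus weighted external links."""
    scores = {page: 0.0 for page in internal_links}
    for source, targets in internal_links.items():
        for target in set(targets):  # count each linking page only once
            if target != source:     # ignore self-links
                scores[target] = scores.get(target, 0.0) + 1.0
    for page, count in (external_counts or {}).items():
        scores[page] = scores.get(page, 0.0) + external_weight * count
    return scores


def givers_and_takers(scores, n=1):
    """Return the n best linked pages (givers) and the n least linked (takers)."""
    ranked = sorted(scores, key=lambda page: (-scores[page], page))
    return ranked[:n], ranked[-n:]


# Hypothetical data: page -> pages it links to, and external link counts
internal_links = {
    "/": ["/shoes/", "/bags/"],
    "/shoes/": ["/", "/bags/"],
    "/bags/": ["/"],
    "/bags/tote": ["/", "/bags/"],
}
external_counts = {"/": 5}

scores = link_scores(internal_links, external_counts, external_weight=2.0)
givers, takers = givers_and_takers(scores)
print(givers, takers)
```

The givers are the pages that can afford to pass on some links, and the takers are the ones that should receive them.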
One page, one “static” URL
Don’t use dynamic URLs; it’s as simple as that. Session IDs or any other kind of dynamic URL parameters are just nasty when it comes to keeping your content well organized. If you have not heard of “mod_rewrite”, go ask Google (yes, there is a solution for nearly every web server, including IIS and more special ones like lighttpd or whatever you’re using).
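The rewriting itself is configured on the web server, so the following is not a mod_rewrite rule. It is only a small Python sketch of the underlying idea: collapsing URL variants that differ by nothing but a session parameter into one URL. The parameter names in `SESSION_PARAMS` and the function name are assumptions for illustration; use whatever your own application actually appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: typical session parameter names, compared in lower case
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_params(url):
    """Return the URL without any session-style query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_session_params("http://www.example.com/shoes/?sid=abc123&color=red"))
```

Running a list of crawled URLs through a function like this and counting the duplicates is a quick way to see how many addresses one and the same page currently has.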
Something that pretty much directly correlates with good information architecture is a clean and consistent site navigation concept. To make it a little easier “to get” (apart from technical requirements like text & CSS vs. images): if it’s easy for a website’s visitor to understand and navigate, it’s probably the same for a search engine crawler. And seriously: make it consistent in where it appears, how the navigational items are organized and displayed, etc. As shown in the illustration earlier, it’s a pretty bad idea to link to all pages from every page (doing that would destroy the themed pyramids). Figuring out what needs to be in the navigation in the first place is another crucial step, as is defining which navigation to put on which page, since the navigation can be slightly different for each landing-page.
Bread crumbs (“the good”)
Another well-proven navigation concept, which often helps search engines (and users) understand how the content on a domain is structured, is the so-called bread crumb (BC). I think you are all well aware of what this might look like (it is often found in the top left corner of each landing-page), but just to be sure:
You are here: > Brand name > Category > Product
If you use this BC concept consistently and link each level to its target page, you give all parties a perfect way to navigate to the upper levels (and, in return, you explain where the currently browsed content belongs). A nice side effect: if the bread crumb sits somewhere near the top of the source code, you also have a perfect opportunity to use the right internal anchor texts for the linked pages.
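As a rough sketch of how such a bread crumb could be rendered, here is a small Python function that turns a trail of (label, URL) pairs into linked HTML. The function name, the labels and the URLs are invented for illustration; the point is simply that every upper level becomes a link with a meaningful anchor text, while the current page stays unlinked.

```python
from html import escape


def breadcrumb_html(trail):
    """Render (label, url) pairs, ordered from top level to current page."""
    if not trail:
        return ""
    parts = [
        '<a href="{}">{}</a>'.format(escape(url, quote=True), escape(label))
        for label, url in trail[:-1]
    ]
    parts.append(escape(trail[-1][0]))  # current page: plain text, no link
    return "You are here: " + " &gt; ".join(parts)


# Hypothetical trail for a product page
trail = [
    ("Brand name", "/"),
    ("Category", "/category/"),
    ("Product", "/category/product"),
]
print(breadcrumb_html(trail))
```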
Pagination (“the bad”)
When it comes to pagination, unfortunately there is no single answer, but from what I have seen it causes way more damage than good most of the time. Before we look at why this might be the case, let’s consider one single fact: more than 95% of search engine referrals come from page one of the search result pages. This means that even on Google fewer than five percent of all users ever go to page two and therefore actively use the pagination elements! Of course this varies from page to page, industry to industry (like a lot of other things before), etc., but in my opinion it shows that pagination might not be the very best way to guide users and search engines.
That being said: most of the time those paged contents (an article or a category, for example) target the same keywords, and they even have the same page titles and probably the same meta description. The only difference might be that something like “– page 2” is appended. Wow… not so good, because it might cause duplicate content, again. If, for any reason, you think you do need paging, you should at least consider keeping these pages out of the search engines, for example by using a “noindex” or the canonical tag.
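In template terms, the two options mentioned above could look something like the following Python sketch. The function name, its parameters and the choice to point the canonical tag at the first page of the series are assumptions for illustration; which of the two tags fits your site is a judgment call.

```python
from html import escape


def pagination_head_tag(page_number, first_page_url, strategy="noindex"):
    """Return the extra <head> tag for a paginated page, or "" for page one."""
    if page_number <= 1:
        return ""  # page one should stay indexable as it is
    if strategy == "noindex":
        return '<meta name="robots" content="noindex">'
    if strategy == "canonical":
        return '<link rel="canonical" href="{}">'.format(
            escape(first_page_url, quote=True)
        )
    raise ValueError("unknown strategy: " + strategy)


# Hypothetical category URL
print(pagination_head_tag(2, "http://www.example.com/shoes/"))
print(pagination_head_tag(2, "http://www.example.com/shoes/", "canonical"))
```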
The calendar (“the ugly”)
A perfect example of something that really causes headaches for search engines is the calendar. What, calendars? Yep. Why is that, you ask? Because it is an infinite number of pages for a search engine to crawl (given the scenario that each month of a year has its own URL to show the selected content). To make this one short: Google in particular seems to be smart enough, at least after a while, to figure out that these pages do not really provide any value. If that’s the case Google stops crawling them, but not every search engine crawler is that smart, and it’s also a bad user experience to be able to hit the “next month” button like forever.
Since nearly every SEO seems to be a tool fanatic (maybe it’s just because we have so many things to do…), I don’t want to miss pointing out a couple of tools and methods which help a lot in analyzing and fixing potential IA flaws. Let’s have a look:
Google Webmaster Tools
I’m pretty sure everyone knows this, but just to be sure: Google has a nice report on crawling errors hidden in the GWT. To use it, just select the desired profile on the dashboard and click “diagnosis” -> “crawling errors”. There you can find a complete overview of the errors the Google crawler ran into. You can even do an export, and I really recommend you make looking at this report one of your standard procedures. It’s pretty valuable for understanding what might go wrong.
One last remark: just a few days ago, I think it was the 27th of July, Google announced that they’ll now send out emails if the crawling error count increases massively, so that you’ll know pretty quickly if something goes really wrong (for example after deploying a new version of your website or similar). In my opinion this is a really great and useful feature!
Xenu Link Sleuth
Another way to verify that your site, and especially the interlinking between pages, works is to crawl through the site using Xenu. This tiny application needs to be downloaded to your local computer and can be used by simply entering a URL to start with. After you have done that, go grab a coffee or two and wait for the results. In the end you’ll get a nice list which you can compare against the pages you actually have. I recommend you pay special attention to your landing-pages: did Xenu reach them? If not, there is a high risk that search engine crawlers won’t find them either.
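The comparison at the end is easy to automate once you have both lists as plain URLs. The following Python sketch assumes exactly that: a list of landing-pages you expect to exist and a list of URLs the crawler actually reached. The function name and the example URLs are invented, and nothing here depends on any particular export format.

```python
def unreached_pages(expected_urls, crawled_urls):
    """Return the expected URLs that the crawl never reached, sorted."""
    expected = {url.strip() for url in expected_urls if url.strip()}
    crawled = {url.strip() for url in crawled_urls if url.strip()}
    return sorted(expected - crawled)


# Hypothetical lists: your landing-pages vs. what the crawler found
landing_pages = [
    "http://www.example.com/shoes/",
    "http://www.example.com/bags/",
    "http://www.example.com/sale/",
]
crawled = [
    "http://www.example.com/",
    "http://www.example.com/shoes/",
    "http://www.example.com/bags/ ",
]

print(unreached_pages(landing_pages, crawled))
```

Every URL this prints is a landing-page without a crawlable path leading to it.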
Sitemaps, you ask? Well, not in the “classic” way. Actually I can’t really remember where I first heard this one, but it must have been at some conference at the beginning of this year. However, the idea is pretty simple: given the fact that every domain has a so-called indexation cap (it’s similar to the “how many links per page” issue; based on trust, PageRank, DC issues and other factors), you want to make sure that your important pages get indexed. Monitoring this just based on how many pages are indexed in total does not really help, since it doesn’t say anything about which pages are actually in there.
So why not use multiple XML sitemaps? For example one for categories, one for sub categories, another for product pages or this month’s top selling products, etc., and submit each of those sitemaps to Google using the GWT. Doing so, you’ll get a pretty nice statistic (for each sitemap) telling you exactly how many pages (out of the submitted ones) have been included in the index. Looking at these numbers on a regular basis can tell you which pages might need some more link juice (or even on-page measures) to get indexed.
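Splitting the URLs into one sitemap per page type can be sketched in a few lines of Python. The group names, file names and URLs below are made up for illustration, and the sketch only writes the required `loc` element for each URL, using the sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) for the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_sitemaps(groups):
    """Return {file name: sitemap XML}, one sitemap per page type."""
    return {
        "sitemap-{}.xml".format(name): build_sitemap(urls)
        for name, urls in groups.items()
    }


# Hypothetical grouping of URLs by page type
groups = {
    "categories": ["http://www.example.com/shoes/", "http://www.example.com/bags/"],
    "products": ["http://www.example.com/shoes/model-x"],
}

sitemaps = build_sitemaps(groups)
for file_name, xml_text in sitemaps.items():
    print(file_name, xml_text.count("<url>"), "URLs")
```

Because every file only contains one type of page, the per-sitemap numbers tell you right away which type of page is struggling to get indexed.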