
What’s Really Important for Technical SEO? – SMX London

16th May 2011

As the last format seemed to work quite well, I’m going to focus on just the top tips today. We’ve got Rich Baxter from SEOgadget, Martijn Beijk of Onetomarket (also a fellow State of Search blogger), Jonathan Hochman of JE Hochman and Associates, and finally John Mueller from Google Webmaster Trends.

Really looking forward to this one, so away we go!

Top tips from Rich:

1. The single biggest problem impacting most eCommerce or large sites is internal pages without unique content, or with boilerplate content. Address it by writing persuasive, beautifully written copy that inspires your customers to trust you and buy!

2. Internal links from the homepage to key internal category pages remain an important part of your strategy, and one that is often overlooked. Linking to the key category pages still makes a huge difference.

3. Be very careful with faceted navigation and pagination: excessive duplication of content is inexcusable, as is waiting for Google to sort it out for you. The key point here is to noindex, follow a lot of these pages (a quick way to audit this is sketched after this list). Rich also suggested not using robots.txt to accomplish this where possible, but was a bit coy as to what he would use instead.

4. Watch out for an indexed staging server (checking for this is absolutely essential, and something I covered at a4u London last year).

5. Google will index other subdomains beyond just www. and the top level domain (e.g. ww.reddit.com) – uh oh!

6. Always check that your site returns a proper 404 error (not a soft 404). Rich mentions that you should use Live HTTP Headers for this (which doesn’t work on Firefox 4); a scripted alternative is sketched after this list.

7. Some custom-built CMSes do wonky things; make sure the most important files are in place (e.g. sitemap.xml) and that they respond with the correct server header.

8. Use SEOmoz’s toolbar to test out different user agents, and disable JavaScript and CSS for an even better look at what the search engines are seeing (see the user-agent sketch after this list for a command-line version).

9. Avoid too many 301 redirects. Rich referenced an example of a site with 40,000+ internal 301 redirects, leaking link juice everywhere! Internal 301 redirects are never a great solution (the status-code sketch after this list flags these as well).

10. Check your search boxes and form functionality to make sure HTML is stripped, particularly on sites that create user profile pages and indexable search results (a minimal example follows this list).

11. Don’t serve your UGC (or any other important content) only via JavaScript.

12. Make your data/content embeddable. Make it easy for people to link to you!
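
A few of Rich’s points lend themselves to quick scripted checks. For the faceted navigation point (tip 3), the usual mechanism for keeping thin facet or pagination pages crawlable but out of the index is a robots meta tag of “noindex, follow” (or the equivalent X-Robots-Tag header) rather than robots.txt. Below is a minimal auditing sketch; it assumes the third-party Python requests library, and the URLs are invented examples, not anything Rich showed.

    import re
    import requests

    # Hypothetical facet/pagination URLs you want kept out of the index
    URLS = [
        "http://www.example.com/shoes?colour=red",
        "http://www.example.com/shoes?page=7",
    ]

    # Rough pattern: assumes the name attribute comes before content in the tag
    META_ROBOTS = re.compile(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)["\']',
        re.IGNORECASE,
    )

    for url in URLS:
        response = requests.get(url, timeout=10)
        # Directives can arrive via an HTTP header or an HTML meta tag
        header = response.headers.get("X-Robots-Tag", "")
        match = META_ROBOTS.search(response.text)
        meta = match.group(1) if match else ""
        print(url, "| header:", header or "-", "| meta:", meta or "-")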
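
For the header checks in tips 6, 7 and 9, the same idea scales to a small batch: request each URL without following redirects, then flag anything that 301s internally or returns 200 where a 404 is expected. Again a rough sketch under the same assumptions (requests, made-up URLs):

    import requests

    # Mix of real pages, the sitemap, and a deliberately bogus URL (hypothetical examples)
    CHECKS = {
        "http://www.example.com/": 200,
        "http://www.example.com/sitemap.xml": 200,
        "http://www.example.com/this-page-should-not-exist-12345": 404,
    }

    for url, expected in CHECKS.items():
        response = requests.get(url, allow_redirects=False, timeout=10)
        status = response.status_code
        note = "OK" if status == expected else "CHECK THIS"
        if 300 <= status < 400:
            note += " (redirects to %s)" % response.headers.get("Location")
        print(status, url, "->", note)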
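
For tip 8, a crude command-line complement to the toolbar is to fetch the same page with a normal browser user agent and with a Googlebot string, then compare what comes back. The Googlebot string below is the publicly documented one; the URL is hypothetical.

    import requests

    URL = "http://www.example.com/"  # hypothetical page to test

    USER_AGENTS = {
        "browser": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1",
        "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    }

    for name, agent in USER_AGENTS.items():
        response = requests.get(URL, headers={"User-Agent": agent}, timeout=10)
        # A big difference in size or status between the two fetches is worth investigating
        print(name, response.status_code, len(response.text), "bytes")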
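
And for tip 10, the underlying fix is that anything a user types into a search box or profile field should be escaped (or stripped) before it is echoed back into a page. A tiny illustration using only the Python standard library; the input string is an invented example of spammy markup.

    import html

    # Hypothetical user input submitted via a site search box
    user_query = '<a href="http://spam.example.com">cheap watches</a>'

    # Escaping neutralises the markup so it renders as text rather than a live link
    print(html.escape(user_query))
    # -> &lt;a href=&quot;http://spam.example.com&quot;&gt;cheap watches&lt;/a&gt;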

Top tips from Martijn:

1. Speed matters because Google has announced that they want to make the web faster. He referenced loads of announcements (the Page Speed API, Google Public DNS, page speed reports in Webmaster Tools, etc.). Oh, and it helps with conversions!

2. Use W3 Total Cache to help improve server performance.

3. Beware of the lazy programmer syndrome – just because it runs doesn’t mean it was done well.

4. Apache has the largest market share of web servers, and the busiest sites on the internet almost exclusively use Apache, but nginx is an up-and-comer.

5. Martijn suggests using virtual host configuration and NOT .htaccess. You should not use shared hosting; if you are doing so, you are not taking your business seriously.

6. Use ApacheBench to test performance. From Martijn’s tests, Nginx (as well as Nginx + Varnish) is faster than Apache, and Apache is faster than IIS. (A rough do-it-yourself timing sketch follows this list.)

7. If you are using WordPress check out: http://bit.ly/speedupwordpress
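
Martijn produced his numbers with ApacheBench (something along the lines of “ab -n 100 -c 10 http://www.example.com/”). As a very rough stand-in, sequential requests only and no concurrency, a few lines of Python give you a quick feel for response times. A sketch assuming the requests library and an invented URL, not a replacement for proper load testing:

    import time
    import requests

    URL = "http://www.example.com/"  # hypothetical page to benchmark
    RUNS = 20

    timings = []
    for _ in range(RUNS):
        start = time.time()
        requests.get(URL, timeout=10)
        timings.append(time.time() - start)

    timings.sort()
    print("fastest: %.3fs" % timings[0])
    print("median:  %.3fs" % timings[len(timings) // 2])
    print("slowest: %.3fs" % timings[-1])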

Top Tips from Jonathan:

1. The devil is in the details: when moving from your test servers to live you need to make absolutely sure your site is being indexed and crawled.

2. Correct HTTP status codes are immensely important. Return either a 301 redirect or a 404 error code whenever a page is deleted or moved; dead pages are no fun for anyone. Make sure you have a custom 404 error page with branding and navigation.

3. Almost all CMS programmes need an aftermarket addition: either use WordPress SEO (from Yoast) or try searching for “{your CMS} + SEO” for ways to help sort out your CMS.

4. Google has some stealth crawlers not called Googlebot… don’t serve different information to their bots!

5. Failure to create unique titles and meta descriptions is a sign of low quality, lowers CTR and makes pages more likely to be treated as duplicates (a quick self-audit is sketched at the end of this section).

6. Submitting your .xml sitemap is not a magic SEO tool but it is a great way to get feedback on which pages have been indexed. It can also help you find duplicate content!

7. Dead (broken) links are bad for user experience and an absolute waste of link juice; use Google Webmaster Tools to help find them (or the crude in-house check sketched at the end of this section).

8. If your site gets hacked, your traffic will drop. Scanning only detects around 30% of threats, whereas “File Integrity Monitoring” detects close to 100% (the basic idea is sketched at the end of this section). For SMBs Jonathan suggests CodeGuard; for enterprise sites he suggests NetIQ, Tripwire or nCircle.

9. Failure to patch WordPress is the number one source of exploits.

10. SEO intangibles – happy visitors generate links, referrals, Tweets, etc. How do you get this? Your site should load quickly, correctly, smoothly, on any browser, any computer and any mobile device.

11. People do print things out, make sure you have a print media stylesheet and get rid of menus and so forth.

12. Use WuFoo to create forms and reduce as much friction as possible!

*A lot of these included snippets of code that could not be taken down quickly enough. Please look for Jonathan’s slides when they come online.
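
Since Jonathan’s snippets went by too quickly to capture, here are a few rough sketches in the same spirit. They assume the Python requests library, and every URL and path is an invented example rather than anything from his slides. First, for the unique titles point (tip 5): pull the <title> from each page and look for repeats (the same idea extends to meta descriptions).

    import re
    from collections import defaultdict
    import requests

    # Hypothetical pages to audit
    URLS = [
        "http://www.example.com/",
        "http://www.example.com/about",
        "http://www.example.com/products",
    ]

    TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

    seen = defaultdict(list)
    for url in URLS:
        response = requests.get(url, timeout=10)
        match = TITLE.search(response.text)
        title = match.group(1).strip() if match else "(no title)"
        seen[title].append(url)

    for title, urls in seen.items():
        if len(urls) > 1:
            print("Duplicate title %r on:" % title)
            for url in urls:
                print("  ", url)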
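
For dead links (tip 7), Webmaster Tools is the right starting point, but a crude in-house check is to pull the links out of a page and test each one:

    import re
    import requests
    from urllib.parse import urljoin

    PAGE = "http://www.example.com/"  # hypothetical page to check

    html_source = requests.get(PAGE, timeout=10).text
    links = re.findall(r'href=["\'](.*?)["\']', html_source)

    for link in sorted(set(links)):
        url = urljoin(PAGE, link)
        if not url.startswith("http"):
            continue  # skip mailto:, javascript:, etc.
        # Some servers mishandle HEAD; swap in requests.get if results look odd
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
        if status >= 400:
            print(status, url)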
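
And for tip 8, the commercial File Integrity Monitoring products Jonathan named do far more than this, but the core idea is simple enough to sketch with the standard library: hash every file, store a baseline, and report anything that changes. The paths below are hypothetical.

    import hashlib
    import json
    import os

    SITE_ROOT = "/var/www/mysite"          # hypothetical web root
    BASELINE = "integrity_baseline.json"   # hypothetical baseline file

    def hash_tree(root):
        # Walk the tree and record a SHA-256 digest for every file
        hashes = {}
        for folder, _, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                with open(path, "rb") as handle:
                    hashes[path] = hashlib.sha256(handle.read()).hexdigest()
        return hashes

    current = hash_tree(SITE_ROOT)

    if os.path.exists(BASELINE):
        with open(BASELINE) as handle:
            baseline = json.load(handle)
        for path, digest in current.items():
            if baseline.get(path) != digest:
                print("CHANGED OR NEW:", path)
        for path in set(baseline) - set(current):
            print("DELETED:", path)
    else:
        with open(BASELINE, "w") as handle:
            json.dump(current, handle)
        print("Baseline written for %d files" % len(current))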

Top tips from John:

1. All of the above is relevant because it is what Google does.

2. Google’s bucket of URLs includes: known URLs, new URLs discovered from links, Sitemaps and feeds, and the Add URL page.

3. Google may also try crawling searchable web forms with different keywords to reach orphaned content that is not well linked.

4. Google’s scheduler tries to treat your server with respect; things that hurt scheduling include a slow server or having too many URLs. (Shared servers are not advisable!)

5. Your robots.txt file must be stable: it should not change throughout the day, too regularly, or based on user agent (a quick consistency check is sketched after this list).

6. You can monitor Google’s whole pipeline by using Webmaster Tools and fetching a page as Googlebot. This will flag up any issues with what you are sending to Google’s scheduler.

7. Google’s parser tries to extract text and context as well as the links. Reasons this might go wrong include bad markup (a CMS trying to optimise for Googlebot may produce broken HTML that can’t be crawled and indexed properly).

8. Additionally, “soft 404” pages tell Google that you want these pages indexed – don’t do it (a one-off check is sketched below).
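
Two of John’s points can also be sanity-checked with a few lines of Python (again assuming the requests library and an invented domain). For the soft-404 point (tip 8): request a URL that definitely should not exist and confirm the server answers with a real 404 rather than a 200.

    import requests

    # A URL that should never exist on the site (hypothetical domain)
    bogus = "http://www.example.com/definitely-not-a-real-page-94c2d1"

    response = requests.get(bogus, allow_redirects=False, timeout=10)
    if response.status_code == 404:
        print("Good: proper 404 returned")
    elif response.status_code == 200:
        print("Soft 404: the server returned 200 for a missing page")
    else:
        print("Unexpected status:", response.status_code)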
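
And for the robots.txt stability point (tip 5), fetch the file with a couple of different user agents and compare the content; if the hashes differ, something is serving a different robots.txt to different agents.

    import hashlib
    import requests

    ROBOTS = "http://www.example.com/robots.txt"  # hypothetical site

    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    ]

    digests = set()
    for agent in USER_AGENTS:
        body = requests.get(ROBOTS, headers={"User-Agent": agent}, timeout=10).text
        digests.add(hashlib.sha256(body.encode("utf-8")).hexdigest())

    if len(digests) == 1:
        print("robots.txt is identical across user agents")
    else:
        print("Warning: robots.txt differs by user agent")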


Written By
Sam Crocker is SEO Associate Director at OMD UK. Sam focuses on increasing traffic and conversions for websites whilst always keeping his eye on a company’s bottom line.