Real Answers For Technical SEO Problems: SMX East New York 2011, Day 2!

Sep 15, 2011 | 2,787 views | by Navneet Kaushal

The topic of this session of SMX New York 2011, Day 2 is “Real Answers For Technical SEO Problems”. It is an interactive session, largely Q&A among the moderators, speakers and audience, and its aim is to find the root cause of technical SEO issues and how to get them fixed once and for all.


Moderator:

  • Vanessa Fox, Contributing Editor, Search Engine Land (@vanessafox)

Q&A Moderator:

  • Michael Martin, Senior SEO Strategist, Covario (@googleandblog)

Speakers:

  • Vanessa Fox, Contributing Editor, Search Engine Land (@vanessafox)
  • Todd Nemet, Director, Technical Projects, Nine By Blue (@nemet)

Todd Nemet, Director, Technical Projects, Nine By Blue, begins the session by saying that he has learned at SMX that titles do not really matter, so he renamed his. He then says that we are not going to look at the website itself or any of that; instead, he emphasizes looking at things like:

  • Network
  • Web access logs
  • HTTP response codes
  • HTTP response headers and
  • Talk to the network admin/developers

Questions for IT/Developers:

  • Is your load balancing round robin?
  • How do you monitor your site? [Because it's rude to ask "DO you monitor your site?"]
  • Are there any reverse proxies or CDNs in your configuration?
  • Do you do any URL rewriting?
  • May I have a sample of your web access log files?

From there he will:

  • Check load balancing
  • Check server latency: do ten quick grabs of the home page and time them (a sketch follows this list).
  • Check network latency: this can reveal a slow network or packet loss. You’ll want to talk to a network engineer.
  • Check for duplicate sites to see how many have DNS records
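
As a rough illustration of the latency check Todd describes, here is a minimal Python sketch (not from the session); the URL is a placeholder and should be swapped for the home page you are testing.

```python
import time
import urllib.request

URL = "https://www.example.com/"  # placeholder home page URL

def time_fetch(url: str) -> float:
    """Fetch the URL once and return the elapsed time in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()  # read the full body so transfer time is included
    return time.perf_counter() - start

# Ten quick grabs of the home page, as suggested in the session.
timings = [time_fetch(URL) for _ in range(10)]
print(f"min: {min(timings):.3f}s  max: {max(timings):.3f}s  "
      f"avg: {sum(timings) / len(timings):.3f}s")
```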

Web access log analysis:
Todd explains it this way: “You have browsers that go to a Web server and a bot that goes to a Web server. Every time something is accessed, an entry gets written into a log file.”

He wants to get all the log files from the server. What kind of data is in there?

  • IP address and user-agent: Who’s doing the crawling?
  • Date: How often are we being crawled?
  • Referrer: What’s the referring inbound link?
  • URL: Which pages are being requested?
  • HTTP response code: What is the server returning?

Nine By Blue has built a web log analyzer: clients upload their web access log files and it pulls out the relevant fields. The resulting Excel file shows bot activity, a hierarchical view of what’s being crawled, query parameters, reverse DNS, HTTP response codes, and more. (A minimal log-parsing sketch follows below.)
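
As a rough sketch of the kind of field extraction Todd describes (not the Nine By Blue tool itself), the Python below parses lines in the common “combined” log format and pulls out the IP, date, URL, response code, referrer and user-agent. The regex and the file name are assumptions; adjust both to your own server setup.

```python
import re

# Assumed Apache/Nginx "combined" log format; adjust the pattern to your server.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_log(path: str):
    """Yield one dict per parsed log line: IP, date, URL, status, referrer, UA."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_PATTERN.match(line)
            if match:
                yield match.groupdict()

# Example: list the URLs Googlebot requested (file name is hypothetical).
for entry in parse_log("access.log"):
    if "Googlebot" in entry["user_agent"]:
        print(entry["status"], entry["url"])
```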

Todd then walks through a few examples:

  1. He shows an example of crawling inefficiency: a site with an enormous number of dynamically generated sitemaps. Google was spending all of its crawl time fetching those sitemaps and nothing else. To fix it, they changed the way the sitemaps were generated.
  2. His next example is a site that was being badly scraped. They monitored the logs to identify and block all the bad IPs (a sketch of that kind of monitoring follows this list).
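
A minimal sketch of that kind of monitoring, reusing the hypothetical parse_log() helper from the earlier sketch: count requests per IP and flag the heaviest hitters as candidates for blocking. The threshold and file name are arbitrary assumptions.

```python
from collections import Counter

# Count requests per IP address (parse_log() is the sketch above; file name is hypothetical).
requests_per_ip = Counter(entry["ip"] for entry in parse_log("access.log"))

# Flag unusually heavy clients as scraping candidates; 10,000 is an arbitrary threshold.
for ip, count in requests_per_ip.most_common(20):
    if count > 10_000:
        print(f"{ip} made {count} requests - candidate for blocking")
```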

Duplicate content problems:
One site had seven versions of its home page indexed. Links are going all over the place and being diffused, and you’re wasting your crawl time.

Solution: URL Rewrite (IIS 7+) or URLRewriter (IIS 6)

Todd mentions other duplicate content problems that come from sorting parameters. The solution is to use the canonical tag or redirects so that Google ignores certain parameters. Your log files will tell you how bad that problem really is (a sketch follows).
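
To gauge from the logs how bad parameter duplication is, a rough sketch (again reusing the hypothetical parse_log() helper): group Googlebot requests by path with the query string stripped, and report how many distinct parameter variants were crawled for each path.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Map each path to the set of distinct query strings Googlebot requested for it.
variants = defaultdict(set)
for entry in parse_log("access.log"):   # parse_log() and file name as in the sketch above
    if "Googlebot" in entry["user_agent"]:
        parts = urlsplit(entry["url"])
        variants[parts.path].add(parts.query)

# Paths crawled under many parameter combinations are duplicate-content suspects.
for path, queries in sorted(variants.items(), key=lambda kv: len(kv[1]), reverse=True)[:20]:
    print(f"{len(queries):5d} variants  {path}")
```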

Poor error handling: Your error pages tend to be the most crawled pages, so they’ll look like very important pages, which bumps down other pages (see the sketch below).
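
A quick way to spot this in the logs, under the same assumptions as the earlier sketches: count how often Googlebot hits each URL and print the top ones along with the response codes seen, so heavily crawled error pages stand out.

```python
from collections import Counter, defaultdict

hits = Counter()
codes = defaultdict(set)
for entry in parse_log("access.log"):   # parse_log() and file name as in the sketch above
    if "Googlebot" in entry["user_agent"]:
        hits[entry["url"]] += 1
        codes[entry["url"]].add(entry["status"])

# The most crawled URLs; error responses here are a sign of poor error handling.
for url, count in hits.most_common(20):
    print(f"{count:6d}  {sorted(codes[url])}  {url}")
```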

Takeaways from Todd's views and examples:

  • Look to see if a site is cache friendly.
  • Look at character encoding; special characters in URLs should be URL encoded.
  • Cache control headers (a header-checking sketch follows this list)
  • Compression
  • DNS configuration
  • Domain health
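
Several of these takeaways can be spot-checked from the HTTP response headers. A minimal sketch, not a tool mentioned in the session; the URL is a placeholder.

```python
import urllib.request

URL = "https://www.example.com/"  # placeholder URL

# Ask for compressed content so we can see whether the server supports it.
request = urllib.request.Request(URL, headers={"Accept-Encoding": "gzip"})
with urllib.request.urlopen(request, timeout=10) as response:
    headers = response.headers
    print("Cache-Control:   ", headers.get("Cache-Control", "(not set)"))
    print("Expires:         ", headers.get("Expires", "(not set)"))
    print("Content-Encoding:", headers.get("Content-Encoding", "(not set)"))
    print("Content-Type:    ", headers.get("Content-Type", "(not set)"))
```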

This session of SMX New York 2011, Day 2 ended here and moved on to Q&A. Details are listed below:

Q: The client’s home page had an internal server error which was corrected after Google re-indexed the page. Now the home page has been taken out of Google’s index. How can we get it back in?

Vanessa Fox, Contributing Editor, Search Engine Land, replies that she does not think it has been re-indexed. Since Google crawls your home page frequently, you don't have to submit it. But if your home page is not in the index, something is wrong; even crappy sites should get their home pages crawled every day.

Q: Should multiple sites be on different IPs to help with link development? Should domain registrations be private or public?

A: If you’re not a spammer, there is no reason to hide from the search engines the sites that you own. There is no problem with owning multiple sites. If you ARE a spammer, Google will find out that you own those 15,000 domains that are linking to one another.

A: Todd and Vanessa: No, you’re fine. If you don’t do it, you’ll have a major problem.

Googlebot does not crawl by the named domain; it makes requests via IP.

Todd: I see this quite a bit. It will be in the Webmaster Tools as one of the most frequent linking domains.

Question on going international: They can only sell specific products if they’re going to Canada, but it’s their regular .com product pages that are ranking really well in the top few positions.

Vanessa: What happens with internationalization is that Google says it’s fine to have four different English-language sites (US, UK, Australia, etc.), but there are all these different relevance signals that go into ranking, and one of them is certainly the country. Part of that is currency, shipping, TLD, having it on a subdomain, etc. All of those should go toward country relevance. The problem is there are other signals that go into ranking as well, and sometimes those outweigh the location relevance signals. You can have IP detection that redirects Canadian users to the Canadian version. It’s a tough problem. You want to start building local authority by going after local links.

Todd: If you can get shell access, then you can filter the log files and then zip them.

In another case Vanessa looks at, Google is in an infinite loop because the site is pointing at a short URL. When someone has a cookie they can load that URL, but first-time visitors get session information appended. So Googlebot is never getting a 200 on that short URL (a quick check for this is sketched below).
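
One way to see that behavior is to make a single cookieless request, like a bot’s first visit, without following redirects, and look at the status code and Location header. A minimal sketch; the URL is a placeholder for the short URL being tested.

```python
import http.client
from urllib.parse import urlsplit

URL = "https://www.example.com/page"  # placeholder short URL to test

# Single cookieless request; http.client does not follow redirects,
# so we see exactly what status and Location the server returns.
parts = urlsplit(URL)
conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
conn.request("GET", parts.path or "/", headers={"User-Agent": "redirect-check-sketch"})
response = conn.getresponse()

print("Status:  ", response.status)
print("Location:", response.getheader("Location", "(none)"))
conn.close()
```

If every cookieless run shows a 302 to a session-appended URL and never a 200, that matches the loop Vanessa describes.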

The Q&A session ended here. For more information on SMX New York, stay tuned to PageTraffic Buzz.


Navneet Kaushal

Navneet Kaushal is the founder and CEO of PageTraffic, an SEO Agency in India with offices in Chicago, Mumbai and London. A leading search strategist, Navneet helps clients maintain an edge in search engines and the online media. Navneet's expertise has established PageTraffic as one of the most awarded and successful search marketing agencies.