Duplication, Aggregation, Syndication, Affiliates, Scraping, and Information Architecture: SMX East New York 2011, Day 2!

Sep 15, 2011 | by Navneet Kaushal

This session on Day 2 of SMX East New York 2011 focused on how a well-planned information architecture minimizes search obstacles, and on how unavoidable duplication can be handled to avoid PageRank dilution. The expert panelists tried to sort through the bewildering world of near duplication.


Moderator:

  • Vanessa Fox, Contributing Editor, Search Engine Land (@vanessafox)

Q&A Moderator:

  • Erika Mamber, Vice President, Organic Traffic & SEO, Demand Media (@erikamamber)


Speakers:

  • Brian Cosgrove, VP Analytics & Engineering, TPG
  • Vanessa Fox, Contributing Editor, Search Engine Land (@vanessafox)
  • Matt Heist, CEO, High Gear Media

Brian Cosgrove, VP Analytics & Engineering, TPG, begins the session by saying that thin content, low quality and PageRank dilution are major problems on the web today. According to him, the cause is a process issue, not a technical one. He then discusses the following issues:

Feeds from suppliers:

  • Products
  • Real Estate Listings
  • Travel Listings
  • Deals
  • Other Feeds

Too many categories:

  • Men’s Hats
  • Brown Hats
  • Men’s Brown Hats
  • Cowboy Hats
  • Men’s Brown Cowboy Hats
  • Straw Hats
  • Men’s Brown Cowboy Straw Hats

Too many similar items:

Nearly identical items are better arranged as options on one page. Brian says that unique content is the cost of entry for SEO, and recommends focusing on the category pages that those feeds flow into and developing clean, unique content there.
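To make the "options on one page" idea concrete, here is a minimal Python sketch, assuming a hypothetical supplier feed in which rows for the same product differ only in color and size. The field names (`sku`, `name`, `color`, `size`) and the choice to identify a product by `name` alone are illustrative assumptions, not details from Brian's talk.

```python
# Hypothetical supplier feed: the same hat appears once per color/size combination.
feed = [
    {"sku": "H-100-BRN-M", "name": "Cowboy Straw Hat", "color": "Brown", "size": "M"},
    {"sku": "H-100-BRN-L", "name": "Cowboy Straw Hat", "color": "Brown", "size": "L"},
    {"sku": "H-100-BLK-M", "name": "Cowboy Straw Hat", "color": "Black", "size": "M"},
    {"sku": "H-200-GRY-M", "name": "Wool Fedora", "color": "Gray", "size": "M"},
]


def group_variants(rows, variant_fields=("color", "size")):
    """Collapse rows that differ only in variant fields into one entry per product.

    Assumption: `name` alone identifies the product; each variant field becomes
    a sorted list of options to show on that product's single page.
    """
    pages = {}
    for row in rows:
        options = pages.setdefault(row["name"], {f: set() for f in variant_fields})
        for f in variant_fields:
            options[f].add(row[f])
    return {name: {f: sorted(vals) for f, vals in opts.items()} for name, opts in pages.items()}


for name, options in group_variants(feed).items():
    print(name, options)
```

Each resulting entry would back a single product page with color and size selectors, instead of one thin page per SKU.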

He next presents his Tool Sets for Success.

SEO Strategy:

  • Define SEO opportunity related to business goals
  • Categories of terms and relative volumes
  • Competitive analysis
  • Paid search and social coordination plans
  • High level tactical plans: link acquisition

Keyword Mapping:

Brian says that keyword mapping is important: it will make your life a lot easier and give your SEO effort a clear direction.
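The talk doesn't spell out a keyword-mapping format, so the following is only a minimal sketch of one common approach: assign each target keyword to a single page, and flag any keyword that more than one page is chasing, since those pages would compete with each other. The keywords and URLs are hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword-to-page assignments; the URLs are illustrative only.
assignments = [
    ("men's brown cowboy hats", "/hats/mens/cowboy/brown"),
    ("cowboy hats", "/hats/cowboy"),
    ("men's brown cowboy hats", "/hats/brown/mens-cowboy"),  # a second page chasing the same term
    ("straw hats", "/hats/straw"),
]


def build_keyword_map(pairs):
    """Group target URLs by normalized (trimmed, lowercased) keyword."""
    keyword_map = defaultdict(set)
    for keyword, url in pairs:
        keyword_map[keyword.strip().lower()].add(url)
    return keyword_map


def find_conflicts(keyword_map):
    """Return keywords assigned to more than one page, with their URLs sorted."""
    return {kw: sorted(urls) for kw, urls in keyword_map.items() if len(urls) > 1}


keyword_map = build_keyword_map(assignments)
for keyword, urls in find_conflicts(keyword_map).items():
    print(f"{keyword!r} is targeted by {len(urls)} pages: {urls}")
```

In a clean map, `find_conflicts` returns an empty dict: every keyword has exactly one page responsible for it.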

Content Strategy:

  • Define content needs as they relate to the business
  • Should have integrated SEO content needs
  • Describe categories of content needed
  • Quantify the amount of needed content
  • Provide timelines and goals for content production
  • Define the teams and roles involved

Work Flow:

A workflow should be defined for the content development process. Brian's workflow is:

SEO Research – Deliver Creative Brief – Research and Write Article – Reviews (SEO, legal, editorial) – Publish

Style Guide:

  • Reiterate brand values and site values
  • Define the site’s voice, tone and style
  • List out quality guidelines
  • List out legal guidelines and considerations
  • List out general SEO considerations
  • Add amendments and updates often
  • Keep this document ALIVE

Brian further said that having this document is the only reason they’re able to outsource content. The expectations for a new writer should be made clear up front, and the writer should treat the guide as a contract or agreement.

Content Calendar:

  • Is a prioritized queue of content being developed
  • Include exact dates for assignment through publication
  • Contains a description, length and timing for content being assigned

Before concluding, he talks about sites with a clear voice and branding and mentions Woot, which always has pretty awesome product descriptions. They’re not rich from an SEO perspective, but they’re interesting.

The floor was next taken by Matt Heist, CEO, High Gear Media. He begins by talking about his personal experiences with URLs. His company is a well-known content publisher focused on the automotive vertical. It has 7 full-time writers on staff, with freelancers making up the rest, and together they produce 1,200 pieces of quality content each month. The company owns and operates 4 of the leading “in market,” segment-focused sites (as well as 3 opinion blogs).

Pre-Project Focus:

Founding Thesis:
The founding concept was to launch as many sites as possible: hundreds. The company launched 100s of sites around niche, passionate communities and original content contributors. Potential content contributors want to write for full-service sites.


  • Launch each one with thousands of pages of the kind typically found on full-service sites.
  • Have it all be original content: 7 full-time writers, 20-30 freelancers
  • Syndication: On-Network (small sites receive original HGM content from more established sites). Off-Network (license HGM content to branded media publishers looking for automotive content).

Matt says they had all of the above in place but made the mistake of being careless about how they distributed content. By February 2011 they had 107 sites, many of which looked the same, and they were essentially competing against themselves in search.

The Good:

  • 25 sites (out of 105+) with solid original content
  • Passionate audience around ~10 of their sites. Traffic concentrated on key sites.

The Bad:

  • Too many sites without clear audience segments
  • Undifferentiated look and feel between sites.
  • Over-sharing of our own original content on our sites

The worst came when they were hit by Google’s Panda update.

Matt Heist then shares what they did next, which he calls basic blocking and tackling:

  • Eliminated non-core sites with 301 redirects, cutting 105+ sites down to 7 (4 core sites and 3 blogs) built around specific target audience segments
  • Properly canonicalized HGM product duplicate content
  • Aggregated content with strong user engagement was kept BUT noindexed, notwithstanding the revenue impact
  • Unique expert reviews (on top of traditional expert news, analysis, opinion) written for each target segment site
  • Challenging from a P&L perspective
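Those consolidation steps rest on a few standard HTTP and HTML mechanics: a 301 (permanent) redirect for each retired site, a `rel="canonical"` link on duplicated pages, and a robots `noindex` meta tag on pages that stay live but out of the index. The Python sketch below illustrates them under stated assumptions: the `.example` domains are hypothetical stand-ins, not HGM’s actual sites, and it assumes each retired page has an equivalent path on the core site (in practice you might map each old URL to its closest relevant page instead).

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from retired (non-core) domains to the core site absorbing them.
RETIRED_TO_CORE = {
    "minivans.example": "familycar.example",
    "sportscars.example": "luxuryperf.example",
}


def redirect_target(url):
    """Return (301, new_url) for a URL on a retired domain, or None to serve it normally.

    Assumption: the core site has a page at the same path and query string.
    """
    parts = urlsplit(url)
    core_host = RETIRED_TO_CORE.get(parts.hostname or "")
    if core_host is None:
        return None
    new_url = urlunsplit((parts.scheme or "https", core_host, parts.path, parts.query, ""))
    return 301, new_url


def head_tags(canonical_url=None, noindex=False):
    """Build <head> tags: a canonical link for duplicated content, robots noindex for aggregated pages."""
    tags = []
    if canonical_url:
        tags.append(f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">')
    if noindex:
        # "noindex, follow" keeps the page out of the index while crawlers still follow its links.
        tags.append('<meta name="robots" content="noindex, follow">')
    return "\n".join(tags)


print(redirect_target("http://minivans.example/reviews/2011?page=2"))
print(head_tags(canonical_url="https://familycar.example/reviews/minivan-x", noindex=False))
```

A web server or CDN rule would normally do the redirecting; the function only shows which target each retired URL resolves to.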

They balanced content between news/opinion/analysis and hardcore reviews. Each site is differentiated around its audience, engagement is built around the content, and reviews are written in a voice suited to the targeted segment.

  • FamilyCarGuide – Family and Safety oriented news and reviews
  • MotorAuthority – Luxury, Performance

Top Takeaways:
A summary of lessons from Matt's experience:

  • Fewer, larger sites help on several fronts: journalists like writing for larger brands, and advertisers want to speak to larger audiences
  • Differentiation around content AND design matters; design is not “fluff”
  • While costly, original content was HGM’s asset all along, BUT they must be disciplined about redistributing content across owned and operated properties. Tough decisions around costs are required: keep original content and cut elsewhere.
  • Knock on wood: search traffic trends are positive
  • With the evolution of social, premium content that is authoritative and fresh will flourish.

The final speaker of this session was Vanessa Fox, Contributing Editor, Search Engine Land. Vanessa is working with a task force set up by the federal government to clean up the 24,000 websites it owns. She shows one of the Department of Education sites she worked on, the student aid website. You would think it would be very easy for students to find out about student aid because it’s all online, she explains, but there are 14 different websites, because every time there is a new policy, someone creates a new website. The government ended up with all these separate sites and didn’t know what to do with them.

How do you fix this problem?


  • Don't think about yourself and what you want to tell the audience; think about what your audience needs to know.
  • Come up with an info architecture that works.

Vanessa then gives the example of a site:about.com search: about.com alone has 60,000 results for [counting calories]. That many results for counting calories on a single site signals a duplicate content problem. She recommends using the persona methodology to help consolidate, the same methodology she applied at the Department of Education.

Takeaways from Vanessa's session:

  • Do page mapping to help consolidate.
  • Many of the pages can be grouped together. Cluster them and see what groups emerge; you may be able to break 60,000 pages down into 100, or even 20.
  • Once you have those 20 pages, build out the structure that leads people to the right area.
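Vanessa's grouping step is driven by personas and human judgment. As a complementary, swapped-in technique (not something she presented), a rough automated first pass can surface candidate clusters of near-duplicate pages by comparing word shingles with Jaccard similarity. The page paths, texts and the 0.5 threshold below are hypothetical.

```python
import re


def shingles(text, k=2):
    """Return the set of k-word shingles (as tuples) from lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.5, k=2):
    """Greedy clustering: a page joins the first cluster whose seed (first) page
    is at least `threshold` similar; otherwise it starts a new cluster."""
    clusters = []  # list of (seed_shingles, [urls])
    for url, text in pages.items():
        sh = shingles(text, k)
        for seed, members in clusters:
            if jaccard(sh, seed) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((sh, [url]))
    return [members for _, members in clusters]


# Hypothetical pages; in practice the text would be each page's title or body.
pages = {
    "/calories/how-to-count": "how to count calories in your daily diet",
    "/calories/counting-tips": "tips for how to count calories in your daily diet",
    "/calories/count-calories-guide": "how to count calories in your daily diet guide",
    "/fitness/walking": "walking for beginners a simple fitness plan",
}

groups = cluster_pages(pages)
for group in groups:
    print(group)
```

This greedy pass compares each page only against each cluster's first page, which keeps it simple. At the scale of 60,000 pages, something like MinHash with locality-sensitive hashing would be a more practical way to find candidate pairs, and the resulting groups would still need the persona-based review she describes.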

The session ended with more Q&A.


Navneet Kaushal

Navneet Kaushal is the founder and CEO of PageTraffic, an SEO Agency in India with offices in Chicago, Mumbai and London. A leading search strategist, Navneet helps clients maintain an edge in search engines and the online media. Navneet's expertise has established PageTraffic as one of the most awarded and successful search marketing agencies.