SEO Linking Gotchas Even The Pros Make

 

2008 SEMMY Runner-Up

I am going to attempt to debunk almost every WordPress SEO “Expert” article ever written, and in some respects this article even debunks some of the things I have written in the past.

This article does not reference Google Toolbar PageRank in any way

First of all you are going to need to do a little homework.

Eric Enge interview with Matt Cutts

The Eric Enge interview with Matt Cutts was truly exceptional and revealed a number of gotchas that for some reason continue to be circulated.

Key takeaways

Matt Cutts: … Now, robots.txt says you are not allowed to crawl a page, and Google therefore does not crawl pages that are forbidden in robots.txt. However, they can accrue PageRank, and they can be returned in our search results.

Matt Cutts: … So, with robots.txt for good reasons we’ve shown the reference even if we can’t crawl it, whereas if we crawl a page and find a Meta tag that says NoIndex, we won’t even return that page. For better or for worse that’s the decision that we’ve made. I believe Yahoo and Microsoft might handle NoIndex slightly differently which is little unfortunate, but everybody gets to choose how they want to handle different tags.

Eric Enge: Can a NoIndex page accumulate PageRank?

Matt Cutts: A NoIndex page can accumulate PageRank, because the links are still followed outwards from a NoIndex page.

Eric Enge: So, it can accumulate and pass PageRank.

Matt Cutts: Right, and it will still accumulate PageRank, but it won’t be showing in our Index. So, I wouldn’t make a NoIndex page that itself is a dead end. You can make a NoIndex page that has links to lots of other pages.

For example you might want to have a master Sitemap page and for whatever reason NoIndex that, but then have links to all your sub Sitemaps.

I have just provided a couple of highlights; I am not attempting to replace the need to visit the site I am citing. That is something I hate seeing: people taking other people’s content and repurposing it, making the original article worthless.
There are a few other gotchas in there, so I suggest you read it two or three times to really understand what was said, and what wasn’t said.

Dangling Pages

One of the best descriptions of dangling pages is on the Webworkshop site, though they are assuming that links are totally taken out of the equation based on what they quote from the PageRank paper.

“Dangling links are simply links that point to any page with no outgoing links. They affect the model because it is not clear where their weight should be distributed, and there are a large number of them. Often these dangling links are simply pages that we have not downloaded yet……….Because dangling links do not affect the ranking of any other page directly, we simply remove them from the system until all the PageRanks are calculated. After all the PageRanks are calculated they can be added back in without affecting things significantly.” – extract from the original PageRank paper by Google’s founders, Sergey Brin and Lawrence Page.

Alternate interpretation

This is just an aside, as the amount of juice currently lost to dangling pages is hard to determine, and could be handled differently.

They are assuming that if page A links to 6 other pages, 5 of them dangling pages, then the website will be treated as having only 2 pages until the end of the calculation.

Whilst I haven’t delved into the maths (and probably couldn’t, through lack of information and lack of knowledge), it also seems to me that at the time the dangling pages are taken out of the cyclic calculation, a percentage of the link value can still be taken with them.

Thus, though the site for the cyclic calculations will be just 2 pages, the link from A to B might only transfer 1/6 of the juice on each cycle.
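
If you want to play with the idea, here is a toy sketch in Python of that interpretation. The pages are made up, the handling of dangling pages is my guess rather than anything Google has confirmed, and it is certainly not their real algorithm: page A links to page B plus five dangling pages, and any juice that reaches a dangling page simply vanishes from the cycle.

    # Toy PageRank sketch - an illustration of dangling page leakage,
    # not Google's real algorithm. "A" links to "B" plus five dangling
    # pages; "B" links back to "A". Juice that reaches a dangling page
    # is simply dropped instead of being redistributed.
    links = {
        "A": ["B", "D1", "D2", "D3", "D4", "D5"],
        "B": ["A"],
        "D1": [], "D2": [], "D3": [], "D4": [], "D5": [],
    }

    damping = 0.85
    rank = {page: 1.0 / len(links) for page in links}

    for _ in range(50):
        new_rank = {page: (1 - damping) / len(links) for page in links}
        for page, outlinks in links.items():
            if not outlinks:
                continue  # dangling page: its share leaks out of the cycle
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank

    for page, value in sorted(rank.items(), key=lambda item: -item[1]):
        print(page, round(value, 4))

Run it and the total rank across all seven pages settles well below 1.0, and B only ever receives 1/6 of A’s vote on each cycle; take the five dangling pages out of A’s links and B’s figure jumps. How closely that toy behaviour matches what Google actually does is anyone’s guess, which is the whole point of this aside.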

At the time the original paper was written, Google only had a small proportion of the web indexed, due to hardware and operating system constraints.
In modern times they have a lot more indexed, so a more complex way of handling dangling pages could be possible.

More food for thought: a link to a page that is considered supplemental could be treated as a full link, or as a link to a dangling page, or some other variant.

Even more food for thought: a site with multiple interlinked pages but no external links at all could be looked on as a “dangling site”.

Ultimately what is important is that dangling pages are a juice leak, though it is difficult to determine exactly how much.

Additional Research On Link Juice Flow

I have referenced these works before, and I am just going to keep on referring people to them.

  • SEOFastStart by Dan Thies – a good introduction to SEO, and also introduces the ideas of controlling juice around a website – no email signup required
  • Revenge of the Mininet by Michael Campbell – a timeless classic as long as PageRank continues to be important – the download page isn’t hidden if you really don’t want to sign up to Michael’s mailing list, but I have been on his list for years.
  • Dynamic Linking by Leslie Rhode – A bonus that comes with Revenge of the Mininet

I mentioned these in a comment on SEOmoz recently in a discussion on PageRank, and for some reason my comment received just 2 up votes and one down vote.

I don’t gain in any material way from promoting these free ebooks, though I might gain some goodwill. The main reason I link to them is because they are a superb resource, and it saves me countless hours writing beginners material.

OK, on to some debunking

Blocking Pages With Robots.txt Creates Dangling Pages On The First Tier

In the quoted paragraph above, Matt clearly states that pages blocked with Robots.txt still accumulate juice from the links they receive.

Those pages don’t have any second-tier outgoing links that are visible to a bot, and thus they are dangling pages.

How much juice they leak depends on how Google currently factor in dangling pages, but Matt himself suggests not to create dangling pages.

If you read any SEO Guide that suggests that the ultimate cure for duplicate content is to block it with robots.txt, I suggest you might want to question the author about dangling pages.
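
For illustration, this is roughly the sort of rule those guides have WordPress owners add; the paths here are just hypothetical examples, not a recommendation:

    User-agent: *
    Disallow: /tag/
    Disallow: /category/
    Disallow: /feed/

Every tag, category and feed URL blocked this way can still accumulate juice from the internal links pointing at it, but Googlebot never sees its outgoing links, so each one becomes a dangling page on your first tier.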

Meta NoIndex Follow Duplicate Content

This is a better solution than using Robots.txt, because it doesn’t create dangling pages. Links on a duplicate content page are still followed; however, that means both internal and external links are followed, and thus are leaks. There are often multiple leaks for the same piece of content when using CMS systems such as WordPress, which create site-wide sidebar links when poorly designed themes, plugins, and especially WordPress Widgets are used.

If you read an article suggesting using Meta Noindex Follow, ask the author how they are controlling external links on duplicate content pages.
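
For reference, the tag such guides have you place in the head of each duplicate content page (by hand or via a plugin) looks like this:

    <meta name="robots" content="noindex,follow">

The page stays out of the index, but every link on it, including the site-wide sidebar, footer and widget links mentioned above, is still followed, and that is exactly where the leaks are.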

Meta NoIndex Nofollow Duplicate Content

If you use Meta Noindex Nofollow, whilst this is handled slightly differently by Google to Robots.txt, in that the page won’t appear in search results, it is still a page accumulating Google Juice if you link to it: another dangling page or node.
Second-tier links from the page won’t leak, but the page as a whole will leak, depending on how Google are currently handling dangling pages.

I don’t see people recommending this frequently, but as with Robots.txt, ask the author about dangling pages.

Dynamic Linking & rel=”nofollow”

Extensive use of Nofollow and other forms of dynamic linking is the only way to effectively prevent duplicate content pages from having some effect on your internal linking structure and juice flow. The Wikipedia page on Nofollow really isn’t correct.
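
A minimal example, with a placeholder URL; the link still works for visitors, but it is taken out of the juice calculations:

    <a href="/tag/duplicate-tag-page/" rel="nofollow">Tag archive</a>

Dynamic linking takes this a step further: depending on which page is being generated, the same navigation link can be output with or without nofollow, or not output at all, which is the kind of thing custom theme modifications and plugins can automate.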

The Dangling Sales Page

To finish I want to give you an example of how a sales page that previously might have benefited from lots of links can easily be turned into a dangling page and effectively discounted from cyclic PageRank calculations.

Sales pages started off just as a single page with no links:-

[Diagram: Single Page]

Despite all the links coming to the site from external sources, this website is a dangling page, thus excluded from iterative PageRank calculations. It might still benefit from anchor text and other factors, but it effectively is not part of Google’s global mesh and passes on no influence.

Add Legal Paperwork And Reciprocal Links Directory:-

[Diagram: Sales Letter Variant with Reciprocal Link Directory]

A much more structured site, and whilst it gains some benefit from the reciprocal links, there are 2 factors that are almost universally overlooked.

  1. No Longer A Dangling Page – because the site now has external links, it is valid as part of the global ranking calculations. As mentioned above, other sources state that the amount of juice passed to dangling pages is minimal, so this could potentially be a huge boost.
  2. More Pages Indexed – it is only a few pages, but with PageRank it is often not just how much juice you have flowing into a site, but what you do with it.

The low quality reciprocal links might not have had a huge amount of value, compared to the benefit of being a member of the “iteration club” and having a few more pages indexed.

Add a link to the designer

[Diagram: Single Page With Designer Credit]

Some early single page sales letters were not dangling pages, but didn’t benefit from any internal iterations, and acted as a conduit of juice to their web design firm.

The Danger of Using Nofollow or Robots.txt on Unimportant Pages

I have actually seen this on a few sites:-

  • Reciprocal Link Directory Removed
  • Link to web designer removed
  • Nofollow added to legal papers that are looked on as being unimportant

Such a website is now out of the iteration club; it is a dangling page, as it is no longer voting on other pages.

My Own Gotcha

I mentioned that this catches me out as well.

A while ago I wrote an article about linking to Technorati being a problem. It might still be true, but the amount of juice lost through such links might also be lower than I thought, due to Technorati using meta nofollow on every page. Technorati tag pages are themselves dangling pages with no external links.

Wikipedia and Digg on the other hand are not dangling pages. They still have external links to other sites, and thus any links to them are part of iterative calculations.

I would still say it is best to have tags pointing to your own domain tag pages, and to use nofollow on links to Wikipedia and Digg, though with Digg I suggest that is only on links to submission pages which contain no content.

Stumbleupon is also tricky – there are no external links from individual pages, but there is extensive internal linking.

With Digg and Stumbleupon, profiles rank extremely well, so you can use them for reputation management even if you get no juice direct from the profile.

I think I was the first to describe Wikipedia as a black hole of link equity and to explain why you should nofollow Wikipedia extensively, and I was one of the first to promote Ken’s Nofollow Wikipedia plugin.

You would have thought in 10 months they would have come up with an alternative to using nofollow on all those out-bound links.

They do however link out to a few trusted sites without nofollow, from just a few pages. I suppose Google does still allow them to be part of their iterative calculations.

Another Own Gotcha

This isn’t 100% something I can fix. I have suggested people use robots.txt on certain sites knowing it wasn’t the perfect solution.

You might notice on this site I don’t use an extensive robots.txt, and the design of my site structure is deliberate, but then at the same time I use nofollow with lots of custom theme modifications, and should use it a lot more.

Eventually I will come up with solutions to make things a little easier.

Tools In The Wrong Hands Can Be Dangerous

Using Robots.txt or Meta Noindex, Follow as a cure for duplicate content is an SEO bodge job or SEO bandaid. It may offer some benefits, depending on how dangling pages are being handled, but it is certainly not an ideal solution, due to the leaks that typically remain or the dangling pages that are created.

 


Comments

    • says

      I have never been an Akismet fan http://andybeard.eu/tag/akismet

      As to the article, I feel I had to write what I had to write, as it is no good pointing this kind of thing out in comments on other blogs.

      I just don’t want people to think that if they have used robots.txt or noindex follow for their duplicate content issues that they have “SEOed” their site.

  1. says

    Andy, really, I have seen many SEO bloggers challenge this, including Rand Fishkin at SeoMoz and his grandfather Si, but by far you did the best job. You hit the nail right on the head.

    Home Run

  2. says

    To be honest my preferred method of removing content is sometimes just to cloak it and 301 googlebot to the homepage and let other users see it as normal (not recommended but not too dangerous).

    Robots.txt isn’t useful as you say because if a page has enough juice to get indexed you need to pass the juice to another page.

    • says

      For some of the links it is possibly a viable solution, but I am not sure I would want to do that for all the pages that many people suggest are blocked with Robots.txt or meta tags.

      There is also a risk of diluting anchor text pointing to the home page.

  3. says

    Nice post Andy, and yesterday’s.

    Andy, I have a question that’s going round my head, but perhaps you have come across it before.

    If a page has 10 links going to 6 pages, (ie 5 links go to one page each and the other 5 go to the 6th) do you think the “link equity” is split 10 ways (with the 6th page getting the most) or is it split 6 ways evenly to the 6 unique urls?

    Shaun

    • says

      It is something I have never tested myself, and I have seen conflicting opinions not only on number but also on placement.

      Though this isn’t a conclusive test, it will be interesting to see if the anchor text from this comment is passed to the home page.

  4. says

    Andy, great post! I have a question: how does Google treat feeds? Does it crawl XML? Does it know what feeds are and ignore them? Or does it actually pass PR to the feed, with PR passed onwards via links in the feed content? I link to my main feed on every page of my site. Should I nofollow that link?

    Sorry if this is a ‘silly question’, but I really don’t know the answer to this…

    • says

      Stephen, FeedBurner offer an option to have a feed blocked from search engines.
      I did it a while back as a test, but if most of the links from your feed point back to your own content, it isn’t actually a major problem.
      Think: tags, categories, related posts, internal links

      Links to comment feeds on every page are more of a problem. A while back it seems Google knocked them all into supplemental, and I seem to remember Matt Cutts stating that Google understands blog structure quite well.

      I would still use the add_link_attribute plugin on as many unnecessary features as possible, especially on navigation on those duplicate content pages.

  5. says

    In terms of a blog

    leaks for the same piece of content when using CMS systems such as WordPress which create site-wide links in the sidebar when using poorly designed themes, plugins, and especially WordPress Widgets.

    So we should remove things like categories, archives etc., and use a noindex on the front page to improve our SEO?

    • says

      Chris, there isn’t a single method of setting up a blog for SEO; a lot depends on how you want to place emphasis on particular content.

      I have mentioned before that if you were a gambler, the following would apply.

      6+1=7
      3+4=7

      I have provided a number of solutions in my WordPress SEO masterclass
      http://andybeard.eu/2007/06/wordpress-seo-masterclass-for-competitive-niches.html

      The biggest problem is that different themes handle things like sidebar elements and WordPress widgets differently.

      One of the key tools is the add_link_attribute plugin.

      Eventually I hope to have some better solutions with a friendly interface.

  6. says

    I knew some of that stuff myself, but I wasn’t aware of the robots.txt facts, although it’s more than obvious.

    Good article indeed, this is an evergreen.

      • says

        I have been working on a project called “the one theme” inspired by CSS Zen Garden. The idea would be to create a single markup that could be converted into a theme for WordPress, Blogger, NucleusCMS, typo, etc., and then the pretty stuff is done only by CSS. I’d love to create the “perfect” SEO base with that markup (or designed with SEO in mind), if only to separate the design and the HTML (making WordPress themes safer).

  7. says

    “Extensive use of Nofollow and other forms of dynamic linking is the only way to effectively prevent duplicate content pages from having some effect on your internal linking structure and juice flow. The Wikipedia page on Nofollow really isn’t correct.”

    Hi Andy,

    I contributed a substantial part of the nofollow article at Wikipedia, have it on my watch list, and keep it updated as much as I can and as time permits.

    Please elaborate on what exactly is not correct in the current article. From what I take away from your post, you could argue that the article does not explain additional uses of nofollow, like controlling the flow of link juice within your own website.

    This could be mentioned and makes sense IMO. It would at least provide some positive aspects to the whole thing, and show that webmasters seem to make the best possible use of this new tool (in contrast to search engines, whose repurposing of the nofollow attribute causes more problems than anything else).

    Thanks

    • says

      Carsten, the specific sections are these ones:

      nofollow is a non-standard HTML attribute value used to instruct search engines that a hyperlink should not influence the link target’s ranking in the search engine’s index. It is intended to reduce the effectiveness of certain types of spamdexing, thereby improving the quality of search engine results and preventing spamdexing from occurring in the first place.

      I think it is clear that the scope is a little deeper now.

      What nofollow is not for

      The nofollow attribute value is not meant for blocking access to content or preventing content to be indexed by search engines. The proper methods for blocking search engine spiders to access content on a website or for preventing them to include the content of a page in their index are the Robots Exclusion Standard (robots.txt) for blocking access and on page Meta Elements that are designed to specify on an individual page level, what search engine spider should or should not do with the content of the crawled page.

      I think this needs to be revised in light of what Matt said in the interview on Stone Temple.

      Nofollow has been used for the control of juice since it was created almost 3 years ago, though the references I know of are all on private email lists or private-access web documents.
      I am not sure how you can reference them, but it would be more appropriate than referencing current discussion, other than that Google is on record to say they are OK with it.

  8. says

    This is the best post I’ve read in months! Absolutely brilliant collection of gems, organised in a readable and useful way. I visit here every now and again, but for what it’s worth you’ve won my RSS subscription, Andy!

  9. says

    I must admit that I’m coming around to understanding Google’s position on the whole paid linking situation. There seems to be a cottage industry evolving around manipulating backlinks and PageRank.

    Sites like PPP are really harming the whole search results for everyone. I find that I might want to do a write up on one of my blogs talking about someone’s site and linking to them. This situation ruins that.

    In reality this all has derailed what page rank was supposed to be about…getting credit for others linking to your site honestly…which can’t really happen now.

  10. says

    Thanks for the article Andy,

    The whole duplicate content issue is a really difficult one to deal with – the worst thing is you never know when the algorithms are going to change and you have to re-think everything again.

  11. says

    Wow Andy. I’m not exaggerating when I say this has got to be the best post I have read in a very long time on an SEO blog (and I have all the main ones in my feed). This really changes my mindset on a lot of issues and opens my eyes to things that I may have been doing wrong in the past. I’ve got to say congratulations on an excellent post. Keep it up.

  12. says

    Excellent post; I have bookmarked this and will be reading it a lot. It will be given in snips to my tech and design guys to follow. Thanks again.

  13. Pet Lover says

    Wow. Definitely need to bookmark this one and read it over a second time after I get home from work. It answers a lot of questions I had about NoIndex pages. Very, very good post!

    Thanks Andy!

  14. says

    This is an eye opener. I always thought linking to external sites does not do any good to a site’s/page’s PageRank. In fact, it seems now that we could be losing out by having dangling pages.

  15. says

    Andy, I will have to agree with everyone else in stating that this is one of the best, if not the best, pieces of content I’ve read on here. I’ve relied on robots.txt for duplicate content issues but not anymore. Thank you!

  16. says

    Quite a lot of info to take in and digest. Still trying to understand seo and how it works. This post was definitely helpful. Thanks.

  17. says

    This nofollow tag appears to be as dangerous as updating a plugin when you have a lot of other plugins; you never know quite how it’s going to fit in. I think I’ll wait another six months for the knowledge level to rise before I tackle nofollow. Can you find and replace a whole blog then? Hell, I’ve got no rank to share so I may as well send my juice on.

    Your excellent instruction is bookmarked though, I’ll be back.

  18. says

    Firstly, thanks for the wonderful pictures you displayed; I fully get what you mean very easily. It is definitely not an easy job to get so much traffic so easily. The most important item in a website is: users.

  19. says

    Andy – once again you never cease to impress with the effort and diligence you put into writing a great post. We are starting to use some of these tools and graphics for our internal training and documentation.

    What comes as second nature to us as internet marketers is not often as easily grasped by our employers and clients. I can’t tell you how many times I have drawn the same chicken-scratch tactical approaches for link structures in pencil or on a whiteboard, and now I can just point them to your work here.

    Thanks for taking such effort to put out awesome content and diagrams. I have bookmarked your site on some social bookmarking sites and will spread the word to all I work with in industry.

    Best,
    Joe

  20. says

    I had a customer who was linking to Wikipedia in almost every single post that he wrote. All dofollow links, of course. Your explanation of why to nofollow Wikipedia references (with a little creative addition or two) really helped him understand.

    • says

      That is one of the better pages, certainly far better than the Wikipedia page. Unfortunately like everywhere else it didn’t take into consideration dangling pages, which mess up everyone’s thought process when they actually stop to think about what is happening.

  21. says

    @Andy

    True … However, if read very carefully … we may deduce from Ian’s writing that dangling pages are not a wise move …

    I try to use the implementation from his fifth example, which fully suggests that child pages should support the parent(s).

    Unfortunately, this too leads to child pages that “sacrifice” their PR … to the benefit of the parent …

  22. says

    Wow Andy,
    Every time I start to think I understand the game, someone like yourself throws me a little “wrench/news” that reminds me that the world is not flat. Your argument on the issues of dangling pages is tough to agree with 100% without just drinking the kool-aid.

  23. says

    As so many others have said above, this post is really useful and a real eye opener for someone trying her best to grapple with SEO and being thwarted at every turn by conflicting information.
    Like Julie, I think I’ll wait a couple of months for the knowledge level to accumulate before I even attempt to tackle the pros and cons of nofollow and dangling pages, so this post is bookmarked until then ;)

  24. says

    This just goes to show the importance of testing in SEO. I actively test lots of things to get hard and fast answers to many of the things that remain speculative in SEO.

  25. says

    Great post – very informative and easy to read. I am still pretty new to SEO so have bookmarked your blog as I obviously still have lots and lots to learn. Thanks.

  26. says

    Excellent post, a very interesting read. I too have never been an Akismet fan whatsoever; I can’t understand why it comes pre-packaged with WordPress. Every time I upload a fresh install of WordPress I forget that I like to remove the plugin from the wp-content folder.

  27. seo_company says

    I love that everyone is getting their comment published; everyone seems to be twittering.

Trackbacks

  1. [...] writer that always has tips and advice on how to improve your blogging. One of his latest articles SEO Linking Gotchas got me thinking. Then I read an article by Mert Erkal  on 10 Blog Traffic [...]

  2. [...] Robots.txt – A lot of things are being blocked off in his Robots.txt file, including date based archives, paged content, images and author pages. He is also blocking off tags – I am not going to dispute that decision, it is something to test and track. There is overlap between categories and tags – it is possible to make that work ok by including additional content and different layouts, and with some SEO linking structures you would block both, others you would enhance both. One good thing he is doing is blocking all his /go/ affiliate links. Important:- Robots.txt is a bandaid as it can create hanging or dangling pages. [...]

  3. [...] I will give a short summary in this post of the blog by Andy Bear. In his blog he explains the disadvantages of the meta noindex [...]

  4. [...] I guess it all comes down to how much value these pages add to your visitors if they land on these pages from search results. I see many websites (especially big sites) that allow their internal search results to be spidered by Googlebot. Of course, if you follow Google’s advice and if people are linking to your search results pages too, and you robots.txt them out, you might be losing out on incoming link equity. Andy beard wrote about something similar some time ago- – SEO Linking Gotchas Even the Pros Make. [...]

  5. [...] Perhaps it comes down to how much value these pages add to your visitors if they land on these pages from search results. I see many websites (especially big sites) that allow their internal search results to be spidered by Googlebot. Of course, if you follow Google’s advice and if people are linking to your search results pages too, and you robots.txt them out, you might be losing out on incoming link equity. Andy beard wrote about something similar some time ago- – SEO Linking Gotchas Even the Pros Make. [...]

  6. [...] page to keep its pr? I also think you are incorrect in reference to how the pr is transferd. Visit SEO Linking Gotchas Even The Pros Make. Getting back to my original question, now that I have ‘followed’ the links on my ‘NoIndex’ [...]
