I’m either a total idiot or a raving lunatic or both.
I can’t even pinpoint from my data when this foolish or unfortunate incident occurred, because, well… just look at the data.
That is nigh on 5 years of Google Blogsearch referral data, though because of the long time range it is listed as sampled data. There may be some traffic sources I have missed, such as variations of the URL depending on how Google were displaying Blogsearch pages, but those are the referrals from http://blogsearch.google.com.
- In the past, when I checked at various times (though admittedly it has been a while), I ranked well in Blogsearch.
- There have always been so few referrals that I have more or less ignored Blogsearch.
- The only Blogsearch queries I have used are for links to me – in the WordPress interface, and occasionally to grab more results than WordPress displays – and it has always been unreliable.
- Blogsearch picks up links from all kinds of things:
  - blogroll links
  - pingbacks: if you send one and the receiving blog displays it, it will come up in Blogsearch too
Because of all of these factors I had always assumed that the idea of using noindex on a feed of any kind was to prevent that feed appearing in Google’s primary organic results.
A pretty Feedburner feed isn’t a terrible landing page, but it is possible to do better. I have even written in the past about using my Feedburner URL when leaving blog comments, as in some ways it signals that you want people to subscribe more immediately than a link to the blog does.
Other RSS search engines were indexing my feed content – Technorati, Blogcatalog, Icerocket. My feeds were being read by my readers, picked up by various Twitter robots, etc.
And of course my content remained indexed in Google’s primary organic index.
But then a few days ago I was browsing a little, looking for additional sources for a follow-on story, and noticed I wasn’t listed for my previous coverage. I hadn’t been specific in the title that my post was related… but there wasn’t a lot of competition.
Then I discovered this:-
For about 5 minutes my first thought was that, for some crazy reason, I had been penalized in Google Blogsearch – then I rationalized that it must be something to do with the noindex settings in Feedburner.
You see, I had never equated noindex with a blog search engine – every other blog search engine was still picking up my content and sending me traffic.
Google Indexing RSS Feeds
There is still a very real need for a way to tell Google…
“Hey Google, this is my RSS feed – you can index it for Google Blog Search, but I don’t want it to appear in the organic search results.”
3 years ago Google were saying they were working to remove RSS feeds from organic search.
3 years later feeds from Feedburner are still appearing in organic search results.
For http://feeds.feedburner.com, all the results seem to have been removed.
For http://feeds2.feedburner.com, there still seem to be plenty of feeds within the search results.
Information About & Help With Feedburner Since Google Acquisition
On a scale of 1 to 10, Google Feedburner Support gets a 2. It is a free service, and Google monetize it by providing Adsense for feeds, but don’t expect anyone from Feedburner to answer support queries in the Google Groups.
Documentation is sparse and has hardly been updated in the more than 3 years since Google bought Feedburner… but then there haven’t been too many visible changes other than adding Adsense. I am sure there have been changes to help with scaling, and it was eventually made easier to integrate with Blogspot, but there has been very little for anyone else.
Feedburner Noindex Controls
So this, I believe, is the culprit.
This is the code that gets added to the RSS feed.
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" /><meta xmlns="http://pipes.yahoo.com" name="pipes" content="noprocess" />
That declaration is still not carried over to feed items that are shared within Google Reader, or to feeds created there (such as tag feeds), which can be fed to other places and indexed.
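For anyone wanting to check their own feed for this declaration, here is a minimal Python sketch. It assumes the feed XML has already been fetched into a string; the function name `feed_has_noindex` and the sample feed are made up for illustration, and only the `xhtml:meta` element shown above comes from Feedburner.

```python
import xml.etree.ElementTree as ET

# Fully qualified tag name of the xhtml:meta element Feedburner adds.
XHTML_META = "{http://www.w3.org/1999/xhtml}meta"


def feed_has_noindex(feed_xml: str) -> bool:
    """Return True if the feed carries an xhtml:meta robots noindex declaration."""
    root = ET.fromstring(feed_xml)
    for meta in root.iter(XHTML_META):
        if meta.get("name", "").lower() != "robots":
            continue
        # The content attribute may hold several comma-separated directives.
        directives = {d.strip().lower() for d in meta.get("content", "").split(",")}
        if "noindex" in directives:
            return True
    return False


# Hypothetical sample feed containing the two elements quoted above.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
    <meta xmlns="http://pipes.yahoo.com" name="pipes" content="noprocess" />
  </channel>
</rss>"""

print(feed_has_noindex(SAMPLE_FEED))
```

The check only looks at the feed document itself, which is rather the point: once an item has been reshared elsewhere, there is nothing comparable left to inspect.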
The left hand doesn’t know what the right hand is doing
I have explained my whoopsie, but somewhere in the Googleplex they are a little confused over what they are doing as well.
Blogsearch isn’t the only search for my Blog posts
For instance, there is Google Buzz.
Now remember – Google is treating the noindex on my RSS feed as an instruction not to include my content in Google Blogsearch… so you would expect that instruction to be universal for the RSS content.
Those were taken from the PUBLIC timeline of Buzz. That is content that Google isn’t indexing on Blogsearch due to a noindex in the XML.
I also have my full content being fed into Facebook and being indexed and made searchable within Facebook, but at least that is my choice.
Currently the only way to prevent content being shared and indexed is to block Google Reader from accessing feeds. I have been trying for over 4 years to get Google to introduce more publisher controls for sharing, as it would be easy to share private content from Google Reader by mistake… and with Pubsubhubbub it can be broadcast by mistake to your 1000s of Buzz subscribers instantly.
This is possibly why Google have never introduced support for HTTP authentication.
Given their current stance on sharing freedoms, it doesn’t make sense for them to treat the current XML declaration as an instruction not to index the content in Blogsearch, as the content is in Buzz anyway. It should be treated as just a noindex for the page.
Alternatively, they should add support for an x-header noindex; then the noindex in the XML would be for search engines, and it should travel with each content item, even to Buzz, possibly with no way to share the content.
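Seen from the crawler's side, the header-based alternative might look something like the sketch below. This is only an illustration: it assumes the "x-header" would be an HTTP response header along the lines of `X-Robots-Tag`, and the function name `header_blocks_indexing` is hypothetical.

```python
def header_blocks_indexing(headers: dict) -> bool:
    """Return True if an X-Robots-Tag style header carries a noindex directive."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        for token in value.split(","):
            # Tolerate an optional "useragent:" prefix, e.g. "googlebot: noindex".
            directive = token.split(":")[-1].strip().lower()
            if directive == "noindex":
                return True
    return False


# Hypothetical response headers for a feed URL.
print(header_blocks_indexing({"X-Robots-Tag": "noindex"}))
print(header_blocks_indexing({"Content-Type": "application/rss+xml"}))
```

With the page-level instruction living in the header, the declaration inside the XML would be free to mean something that applies to each item wherever it ends up.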