It has been close to a year since I first started delving into the intricacies of various forms of blog search, and 10 months since I returned to the subject.
My post yesterday on the Microsoft Yahoo deal was the ideal opportunity to see how things might have changed over the last 10 months, as it is a topic being heavily discussed on hundreds of blogs.
Here are some of my previous articles on the topic, which provide good background:
In Depth: Google BlogSearch | Ranking Blog Documents Patent
Google Blog Search | How Google Blogsearch ranks your Posts… In their own words! (or not)
Exclusive: Google Blog Search Extended Results | Supplemental Results
Google Blog Search
I grabbed some snapshots to demonstrate how things are currently shaping up on Google Blog Search based upon 2 very similar search terms.
After five hours – Search for Microsoft Yahoo
After 19 hours
After five hours – Search for Yahoo Microsoft
After 19 hours
- Keywords within the title still seem to be the primary ranking factor
- Keyword order in the title makes a significant difference
- Within the content, keyword proximity, keyword density and keyword order appear to make a difference, especially on less-used combinations.
- Site authority metrics, such as PageRank, feed subscriber numbers, links, etc., seem to play an almost insignificant role, other than possibly as a way to filter out spam
- Freshness when sorting by relevance seems to be marginal – once you have been selected as relevant, it seems you remain relevant, with relevance being recalculated periodically (hourly?)
- Tagging (rel=”tag”) may or may not be a factor – it may just add more keywords together in close proximity
- Social media bookmarking and links don’t seem to be important
- Extended results based upon the search phrase to suggest topical authority don’t seem to be a large factor
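To make the first two observations concrete, here is a toy sketch of a title-only scorer in which keyword order matters. It is purely illustrative: the function name `title_score` and the scoring weights are my own assumptions, and this is not how Google Blog Search actually ranks anything.

```python
def title_score(query: str, title: str) -> float:
    """Toy score: one point per query word found in the title, plus a
    bonus point when all query words appear in the same order as typed.
    The weights are arbitrary assumptions for illustration only."""
    query_words = query.lower().split()
    title_words = title.lower().split()
    # Position of each query word in the title (first occurrence only)
    positions = [title_words.index(w) for w in query_words if w in title_words]
    if not positions:
        return 0.0
    score = float(len(positions))
    # Order bonus: every query word present, and in query order
    if len(positions) == len(query_words) and positions == sorted(positions):
        score += 1.0
    return score


title = "Microsoft Yahoo deal back on"
same_order = title_score("Microsoft Yahoo", title)
reversed_order = title_score("Yahoo Microsoft", title)
```

With a scorer like this, the same post scores higher for "Microsoft Yahoo" than for "Yahoo Microsoft", which is the kind of difference the two sets of snapshots suggest.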
It is nice to be regarded by Google Blog Search as more relevant than the New York Times, though it is difficult to determine why.
From a casual end-user perspective, the search results were relevant and fresh. Someone looking to research a story for a blog post might have to use additional filters based upon time (within the last 24 hours), and maybe also sort the results by date.
Technorati Blog Search
Technorati is currently, without doubt, providing fresher results than Google – refreshing a Google Blog Search page tracking results sorted by date was providing 10 results in the last 2 hours.
In contrast, Technorati is providing 10 results… in the last 10 minutes… and they are not spam.
Some spam can make its way into both Technorati and Google Blog Search results. Technorati’s way of filtering it out, rather than ranking based upon relevance to a search term, is to remove results that fall below a user-defined authority threshold, which even at “a lot of authority” lets most established blogs through (as long as they haven’t been banned).
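As a rough sketch of the difference, a threshold filter keeps or drops a result outright instead of reordering it. The `Post` fields, the `search` function and the authority numbers below are hypothetical, made up for illustration, and are not Technorati's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class Post:
    url: str
    authority: int  # hypothetical per-blog authority number
    text: str


def search(posts, term, min_authority=0):
    """Full-text match, then drop anything below the chosen threshold.
    Nothing is re-ranked: a post is either in the results or it is not."""
    term = term.lower()
    return [
        p for p in posts
        if term in p.text.lower() and p.authority >= min_authority
    ]


posts = [
    Post("established.example", 250, "Microsoft Yahoo deal analysis"),
    Post("newcomer.example", 3, "My take on the Microsoft Yahoo deal"),
    Post("unrelated.example", 900, "Something else entirely"),
]
everything = search(posts, "microsoft yahoo")
filtered = search(posts, "microsoft yahoo", min_authority=100)
```

Raising `min_authority` removes low-authority posts (spam and small blogs alike) but leaves the order of the survivors untouched.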
One thing I can’t quite work out with Technorati is why blog posts aren’t quite displayed in precise date order – sometimes a post from 20 minutes ago appears fresher than one from 10 minutes ago – it is possible that the dates are based upon when they were published, but they are displayed in the order they were collected.
Technorati used to have a major problem with duplicate results from the same domain appearing in their search index; that appears to have been fixed.
There is no way to “rank higher” on Technorati – you are either relevant to a search or you aren’t – the primary search method is full text – I would regard tagging as more important for appearing in tag-based feed syndication.
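That guess is easy to illustrate. In the sketch below, each post carries two timestamps, and the listing is sorted on the collection time while showing the publication time. The field names and the times are invented for the example; I have no knowledge of how Technorati stores or sorts posts.

```python
from datetime import datetime, timedelta


def order_by_collected(posts):
    """Newest-collected first; the published time is displayed, not sorted on."""
    return sorted(posts, key=lambda p: p["collected"], reverse=True)


ref = datetime(2000, 1, 1, 12, 0)  # arbitrary reference time for the example
posts = [
    # Published 10 minutes ago, picked up by the crawler 5 minutes ago
    {"title": "B", "published": ref - timedelta(minutes=10),
     "collected": ref - timedelta(minutes=5)},
    # Published 20 minutes ago, but only picked up 1 minute ago
    {"title": "A", "published": ref - timedelta(minutes=20),
     "collected": ref - timedelta(minutes=1)},
]

# (title, minutes since publication) in display order
listing = [
    (p["title"], int((ref - p["published"]).total_seconds() // 60))
    for p in order_by_collected(posts)
]
```

Here the post published 20 minutes ago is listed above the one published 10 minutes ago, simply because it was collected later.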
Google Blog Search vs Technorati
By nature I am an inclusionist, and I feel that any voice on the blogosphere should be heard if they have something valuable to say. Google’s apparent poor indexing for me is a huge negative factor.
Google’s relevance in blog search seems to be heavily influenced by what in the old days would have been regarded as keyword stuffing.
Technorati doesn’t really attempt to classify content as being more relevant, other than authority requirements – you can select between the keywords appearing as tags, or within the text – there is no over-reliance on titles to prove that something really is relevant.
Even on a relatively hot topic, neither service is sending me a lot of traffic – the total so far is less than 20 visits… combined.
Let’s look at what Techmeme doesn’t do
- Doesn’t include all sources
- No search function – I would love a database based search in reverse chronological order
- No snippets for all headlines, just the lead story – maybe this could be fixed with a mouseover and some Ajax
What Techmeme does well is provide a good overview of a breaking story, and as such it also delivers more traffic – more people find it useful.
If I read about a technology based story in a feed reader or on a social news site, I am more likely to turn to Techmeme than Technorati or Blogsearch.
Whilst Technorati has recently switched to a more “meme like” front page, it still doesn’t provide me with the breadth of opinion I am looking for, and as it happens, when I first started researching this post, the updates to the Microsoft / Yahoo deal were not listed as a technology news story on Technorati.
It is true that Google are slowly integrating blogsearch or blog results in their primary index, but certainly for breaking news on this topic Google Universal Search provided more of a historical reference.
Where Do I Go Second?
It used to be Google Blog Search, because Technorati had very noisy duplicate results.
I am now switching back to Technorati – I love being able to rank well on Google Blogsearch, but the criteria for ranking don’t currently provide more relevant results.
Technorati provides fresher results from a wider selection of blogs – chalk one up for the little guy.