WordPress.com Bugged XML Sitemaps

 

WordPress.com has added XML sitemaps so I thought I would take a glance at their implementation.

My immediate thought was to take a look at Lorelle’s sitemap.xml

  • Homepage daily priority
  • Every other page updated on a weekly basis?

That seems like a good way to tell the spiders to index your site less often than they currently do.

With Lorelle you would certainly want spiders checking the home page hourly as she is sometimes the source of breaking news.

Then I looked at the sitemap in a little more detail, in particular the entry for her most recent post, the Cyclical Nature of Blog Stats – a post worthy of a link anyway so this is a 2-in-1.

This entry was written by Lorelle VanFossen and posted on June 16, 2008 at 4:57 am

Ah, but I know Lorelle sometimes writes posts in batches and schedules them for publishing. Let’s look at the XML:

<loc>http://lorelle.wordpress.com/2008/06/16/the-cyclical-nature-of-blog-stats/</loc>
<changefreq>weekly</changefreq>
<priority>0.6</priority>
<lastmod>2008-06-11T18:59:24+00:00</lastmod>

Last modified 5 days before it was published.
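
A rough sketch of how this check could be automated, assuming the publication date can be read from the /YYYY/MM/DD/ part of the permalink. The function names are hypothetical, and because lastmod is in UTC while the permalink date follows the blog's own timezone, a gap of a single day would not prove anything; five days does.

```python
import re
from datetime import date, datetime


def permalink_date(loc):
    """Return the date embedded in a /YYYY/MM/DD/ permalink, or None."""
    match = re.search(r"/(\d{4})/(\d{2})/(\d{2})/", loc)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def lastmod_precedes_publication(loc, lastmod):
    """True if the sitemap's lastmod date is earlier than the permalink date."""
    published = permalink_date(loc)
    if published is None:
        # No date in the URL (e.g. the home page), so nothing to compare.
        return False
    modified = datetime.fromisoformat(lastmod).date()
    return modified < published


# Values copied from the sitemap entry quoted above.
loc = "http://lorelle.wordpress.com/2008/06/16/the-cyclical-nature-of-blog-stats/"
lastmod = "2008-06-11T18:59:24+00:00"

print(lastmod_precedes_publication(loc, lastmod))  # True
gap = permalink_date(loc) - datetime.fromisoformat(lastmod).date()
print(gap.days)  # 5
```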

Just for good measure, let’s look at the home page:

<loc>http://lorelle.wordpress.com/</loc>
<changefreq>daily</changefreq>
<priority>1.0</priority>
<lastmod>2008-06-12T02:05:56+00:00</lastmod>

Wrong again: today is the 17th, and Lorelle published a post on June 16, which updated the home page, but the sitemap does not reflect it.
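
The same kind of sanity check can be sketched for the home page: its lastmod should be no older than the newest post listed in the sitemap. This is only an illustration under a few assumptions. The two entries quoted above are wrapped in the standard urlset/url elements of the sitemaps.org schema, posts are recognised by a /YYYY/MM/DD/ permalink, and home_page_is_stale is a made-up name.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date, datetime
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# The two entries quoted in this post, wrapped in <urlset>/<url> (assumed).
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://lorelle.wordpress.com/</loc>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
    <lastmod>2008-06-12T02:05:56+00:00</lastmod>
  </url>
  <url>
    <loc>http://lorelle.wordpress.com/2008/06/16/the-cyclical-nature-of-blog-stats/</loc>
    <changefreq>weekly</changefreq>
    <priority>0.6</priority>
    <lastmod>2008-06-11T18:59:24+00:00</lastmod>
  </url>
</urlset>"""


def home_page_is_stale(sitemap_xml):
    """True if the home page lastmod is older than the newest dated permalink."""
    root = ET.fromstring(sitemap_xml)
    home_modified = None
    newest_post = None
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        path = urlparse(loc).path
        if path in ("", "/"):
            lastmod = url.findtext("sm:lastmod", namespaces=NS)
            if lastmod:
                home_modified = datetime.fromisoformat(lastmod).date()
            continue
        match = re.match(r"/(\d{4})/(\d{2})/(\d{2})/", path)
        if match:
            posted = date(*(int(part) for part in match.groups()))
            if newest_post is None or posted > newest_post:
                newest_post = posted
    if home_modified is None or newest_post is None:
        return False  # Not enough information to judge.
    return home_modified < newest_post


print(home_page_is_stale(SITEMAP))  # True
```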

Sometimes you might be better off with no sitemap at all…

5/10 for finally fulfilling a user request
1/10 for implementation (so far)

 


Comments

  1. says

    An interesting and possibly simple idea would be to derive the frequency for the home page based on the average number of posts per day over the last 30 day window.

    Couple benefits I see:
    * People like Lorelle that post quite often, it’ll be low
    * People that post sporadically, it’ll be higher
    * Not all frequencies across wordpress.com will be the same. This may or may not have any impact but I suspect that Google will love you more (read:trust in some form) if the sitemap is accurate and reflects what Googlebot is seeing in terms of changesets going through a site.

    A similar line of thinking could be used for post pages. Default the frequency to a derivative of number of comments per post over a rolling 30 day window. After the statistics for your site show that you only receive comments for x days, it gets increased to weekly or higher depending.

  2. says

    Andy,
    Very interesting. I saw this announcement this morning. I am upset that the spiders are only crawling pages every 7 days…wish we could edit it on wordpress.com where my blog is currently…at this time…hosted.

    hopefully they will be doing this soon!

  3. zach says

    Wow, great article. Looking back at your past articles has taught me a lot. I subscribed to your feed. Maybe you could take a look at my site, and maybe even subscribe if you would like to.

    Thanks,
    Zach

  4. says

    This is great information, Andy, but for those who don’t really understand what sitemaps are: WordPress and WordPress.com use pings through ping-o-matic when a post is published to alert search engines and others that you’ve published a new post. This is the traditional “invitation” for them to send out their search bots.

    The sitemaps are a constantly current table of contents for your blog, updated every time you publish a new post. It acts like a road map, telling the search bots which recognize XML sitemaps which pages to index. On the first run through, it indexes everything. On the next visits, it can check via the dates to find out what is new or modified and index only the new information, allowing the bots to move faster through the sites and not waste so much time with duplicating effort and information.

    As for those who fear having these activated, they are a standard on most sites today, invisible to users and administrators. You control whether or not you want your WordPress or WordPress.com indexed through the Options panel.

    Sitemaps are recognized by Google, Yahoo, and MSN last time I was paying attention to these things. Not all search engines or site indexing bots recognize them, so while it improves indexing, keywords, links, and other traditional techniques still hold sway over SEO. This is just a tool that speeds up the work of the search engine bots.

  5. says

    Andy, what plugin do you suggest using for sitemaps? Someone told me your RSS feed is consumable as a site map (it is xml) but it just errors in Webmaster tools. I don’t have time to play around so I thought you would send us to the right one.

      • JunkieYard Dot Com says

        Yep, that’s the one I’m using for my sites. If you want a sitemap with your WP, that’s the one. It has so many options you can configure, and it will ping all the search engines every time you publish a new post. They will come crawling to your sites. :D

        • says

          Spot on! I use this sitemap generator on my self hosted WP blog and ever since I started using it I have noticed a huge improvement in how fast my pages get indexed in different search engines. The amount of traffic to my blog has also increased greatly.

  6. says

    Very nice. Just another reason why I don’t use the big WP. We’ve seen this a lot on WP sites but never really thought about the consequences. We’ve played around a lot with our sitemap and are now using a script to generate it and base priority and change frequency on the average weekly traffic a post gets with special weight given to certain types of posts like listings and MLS RSS feed. This means our sitemaps are constantly in flux which seems to work very well. Google crawls the sites regularly and we saw massive SERP changes on the listing and RSS feed pages once this was implemented. I think that might be a valuable plugin for WP- a script that can help identify which posts should be given priority and generate new priority and change frequency data.

  7. Mark says

    That’s the problem with sitemaps. If they aren’t arranged properly, they can seriously hurt your Google traffic.

  8. Cornel says

    Not just your traffic… they hurt your ranking as well, especially if they are incorrectly formatted. I have seen it in a couple of blogs I was running on WordPress MU. After I corrected the sitemap generator I was using, one of the two blogs gained about 3 positions with no extra postings, the other just one.

  9. One Year Millionaire says

    So if you have a sitemap that isn’t properly set up it would hurt you more than no sitemap at all?

  10. says

    I don’t believe you’re ever better off without an XML sitemap. If you’re having trouble, fix it. Don’t disable it.

    I’ve heard of some people having trouble with Google XML Sitemaps plugin on WordPress for scheduled posts but I’ve never had a problem on my blogs.

  11. Detectives says

    I think keeping no sitemap is in fact much better than keeping a faulty one. I believe it affects the crawling of the site by the search engine spiders. This is another reason why we try to avoid using WordPress.

  12. mtb says

    This post and its comments are a little dated, but my findings with sitemaps have been nothing but positive. I occasionally check my dynamic sitemap and it is generating the correct links. I also occasionally resubmit to the search engines if a lot of links were generated. My opinion is they do us good.
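
The idea raised in the first comment, deriving the home page's change frequency from the posting rate over the last 30 days, could look roughly like the sketch below. The thresholds and the function name are arbitrary assumptions chosen for illustration; only the changefreq values themselves come from the sitemap protocol.

```python
from datetime import datetime, timedelta


def home_changefreq(post_times, now, window_days=30):
    """Pick a sitemap changefreq value from recent posting activity.

    Thresholds are illustrative assumptions, not a recommendation.
    """
    cutoff = now - timedelta(days=window_days)
    count = sum(1 for posted in post_times if cutoff <= posted <= now)
    if count >= 4 * window_days:   # four or more posts a day on average
        return "hourly"
    if count * 2 >= window_days:   # at least one post every other day
        return "daily"
    if count * 7 >= window_days:   # roughly one post a week
        return "weekly"
    return "monthly"


now = datetime(2008, 6, 17)
daily_poster = [now - timedelta(days=i) for i in range(30)]
sporadic_poster = [now - timedelta(days=3), now - timedelta(days=25)]

print(home_changefreq(daily_poster, now))     # daily
print(home_changefreq(sporadic_poster, now))  # monthly
```

The same shape would work for individual posts by counting comments instead of posts, as the comment suggests.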
