Updated: Facebook & Twitter – Lucky To Be In Google At All

Facebook & Twitter have some of the worst landing pages on the web.

At least that is true from a search engine’s perspective; a search engine should assume that every visitor it sends isn’t a member of the site it is referencing in its results.

It should also be understood that both Facebook & Twitter are bursting at the seams with former Google engineers & execs – they can’t claim they were unaware of what Google is looking for from content owners on the web, the webmaster guidelines and so on.

Twitter

You can’t look at the Google cache and see exactly what Google sees, because Twitter does some sneaky redirects which are very akin to cloaking.

I have written about this before.

Video Exclusive: Has Google Given Twitter a Cloaking Penalty?

This is what Google sees based upon the preview

The little piece of text at the top of the page is what amounts to your profile… you can’t count the background image, if any, because it can’t be read by Googlebot unless it works really hard using OCR, and it certainly can’t be read by people with disabilities.
The links within the content of the page are mostly nofollow, and the links in the sidebar are blocked by robots.txt.
The link at the bottom of the page to access more content (which may well be of interest to search engines) is also blocked by robots.txt.
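As a rough illustration of why that blocking matters for crawling, here is a minimal TypeScript sketch of how prefix-style robots.txt Disallow rules keep a crawler away from a path. The rules shown are hypothetical examples, not a copy of Twitter’s actual robots.txt.

```typescript
// Minimal sketch of prefix-style robots.txt blocking. The Disallow rules
// below are hypothetical examples, NOT Twitter's real robots.txt.
const robotsTxt = `
User-agent: *
Disallow: /search
Disallow: /login
`;

// Collect the Disallow path prefixes from the file.
function disallowedPrefixes(robots: string): string[] {
  return robots
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.toLowerCase().startsWith("disallow:"))
    .map((line) => line.slice("disallow:".length).trim())
    .filter((prefix) => prefix.length > 0);
}

// Plain prefix matching only; real crawlers also handle wildcards,
// per-agent groups and Allow rules.
function isBlocked(path: string, robots: string): boolean {
  return disallowedPrefixes(robots).some((prefix) => path.startsWith(prefix));
}

console.log(isBlocked("/search?q=andybeard", robotsTxt)); // true – blocked
console.log(isBlocked("/andybeard", robotsTxt));          // false – crawlable
```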

I am not the only one who has spent considerable time trying to get Twitter fixed. A great example is this post by Vanessa on Search Engine Land.
How Twitter’s Technical Infrastructure Issues Are Impacting Google Search Results

Facebook

Facebook is worse.

There is nothing there of any real value… it isn’t the timeline a logged-in user might see.

First Click Free

If you want to have some kind of membership wall for users, then Google have a special arrangement, First Click Free, under which you are required to show the full content for the first click from a search result.
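One common way sites handled this was a simple referrer and user-agent check on the server. The sketch below (TypeScript, using Node’s built-in http module) only illustrates the idea, not Google’s specification; the article text and responses are made up.

```typescript
// Rough sketch of the First Click Free idea: show the full article to
// Googlebot and to visitors arriving from a Google search result, and show
// the membership wall to everyone else. Illustrative only.
import { createServer } from "http";

const FULL_ARTICLE = "<article>Full article text…</article>";
const MEMBERSHIP_WALL = "<p>Please sign up to keep reading.</p>";

const server = createServer((req, res) => {
  const referer = req.headers.referer ?? "";
  const userAgent = req.headers["user-agent"] ?? "";

  const fromGoogleSearch = referer.includes("google.");
  const isGooglebot = userAgent.includes("Googlebot");

  // First click free: the crawler and search visitors see the real content.
  const body = fromGoogleSearch || isGooglebot ? FULL_ARTICLE : MEMBERSHIP_WALL;

  res.writeHead(200, { "Content-Type": "text/html" });
  res.end(body);
});

server.listen(8080);
```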

Cloaking

Google over the years have published lots of content about what they think of cloaking.

I can still think of a few cases where some kind of cloaking would be justified. As an example, on uQast we serve RTMP video with Flash and use javascript “cloaking” to provide MP4 for the iPhone. We could even serve that video to Googlebot’s mobile crawler without breaking Google guidelines, as “cloaking” to serve content to specific browsers is allowed. But we can’t serve Googlebot, which crawls for the main search index, something it understands, as the Google guidelines require you to treat Googlebot as a normal desktop user browsing from California in the USA.
So, within the webmaster guidelines, Googlebot is served Flash-based RTMP rather than something it might like to see and which we would be quite happy to give it.

That doesn’t prevent Google from sometimes (though rarely) indexing the mobile video by figuring out the javascript, but it would be so much easier to give them something they understand.
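For readers wondering what that browser-side switch looks like in practice, here is a hedged TypeScript sketch of the general technique: detect whether the browser can play MP4 and fall back to a Flash/RTMP embed otherwise. The file names and player markup are placeholders, not uQast’s actual code.

```typescript
// Sketch of the browser-side switch described above: use an HTML5 MP4
// <video> element where the browser can play it (e.g. iPhone), otherwise
// fall back to a Flash/RTMP embed. All paths are placeholders.
function embedVideo(container: HTMLElement): void {
  const probe = document.createElement("video");
  const canPlayMp4 = probe.canPlayType("video/mp4") !== "";

  if (canPlayMp4) {
    const video = document.createElement("video");
    video.src = "/media/example.mp4"; // hypothetical path
    video.controls = true;
    container.appendChild(video);
  } else {
    // Hypothetical Flash player taking an RTMP stream URL as a parameter.
    container.innerHTML =
      '<object type="application/x-shockwave-flash" data="/player.swf">' +
      '<param name="flashvars" value="stream=rtmp://example.com/live/stream">' +
      "</object>";
  }
}

embedVideo(document.getElementById("player") as HTMLElement);
```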

Google Isn’t Playing Fair

One area where Google isn’t necessarily playing fair: I don’t seem to be able to view Google+ profile pages in Google’s own cache, and there is no preview of the page that Googlebot sees.

This is my Google+ Profile

You can normally search in Google for cache:https://plus.google.com/102279602913916787678/posts or any other URL to get a cached version of what the crawler sees.
It is possible for any site to tell Google and other search engines not to store a cached page, so Google are well within their rights not to do so… but it prevents comparisons.

Compare
cache:andybeard.eu – brings up a cached result
cache:https://plus.google.com/102279602913916787678/posts – does not bring up a cached result, just a 404 error

FTC Complaint over Search Plus Your World

The blogosphere loves a good witch hunt, but I can’t see that Google is treating Twitter or Facebook unfairly. Eric Schmidt was quite right about some of the nofollows, but there are bigger technical restrictions in place on crawling.

I actually quite like a Google profile as a default profile and identity on the web, but Google need to live up to the promise of Salmon and make it a viable endpoint for all activity, or alternatively use it purely for identity and allow me to define my own default profile… which, if I choose, might be Twitter or Facebook.
I can also understand why you wouldn’t undertake the complex engineering to make such flexibility possible in your first iteration, especially with partners who are unwilling to do something similar themselves.

Just ask Twitter how many content partners they now support for embeds on the new Twitter. (I wrote them a letter a year ago and never received a response.)

This post ignores what a logged-in human with full javascript support might experience, but in many ways Google’s profiles, whilst only now gaining a social element, have for years generously linked out to any other online destination of your choosing, and provided the necessary markup to claim those destinations as part of your personal social graph.

Update – Google Profiles Now Cached

Michael VanDeMar left a comment showing a way to get the cached page to show by including the https protocol at the beginning of the url to query.

However, when I posted, I had tried lots of different variations, all resulting in a 404 error.

This unmodified link was previously bringing up a 404 error
cache:https://plus.google.com/102279602913916787678/posts

It now returns what appears to be a blank page – as Michael points out, if you switch off the CSS in your browser you can see the complete cached landing page.

Andy Beard Google+ Profile – click to view full size without CSS

This appears to be a recent change, though they still need to fix the canonical. The canonical changes as you navigate between tabs, and between the first two URLs on the list below there is effectively a redirect loop: /posts claims / is the canonical, but humans are redirected to /posts.

https://plus.google.com/102279602913916787678/

https://plus.google.com/102279602913916787678/posts

https://plus.google.com/102279602913916787678/about

All the different URLs show the same content, so they should all set as the canonical whichever URL a human is redirected to, which is currently /posts.
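A quick way to see the mismatch is to fetch each of the profile URLs above and compare where a human lands after redirects with the rel="canonical" the final page declares. This TypeScript sketch uses the Fetch API and a naive regex purely for illustration; a real check would use a proper HTML parser.

```typescript
// Sketch: fetch each profile URL, note where a human would land after
// redirects and which rel="canonical" the final page declares.
async function checkCanonical(url: string): Promise<void> {
  const response = await fetch(url, { redirect: "follow" });
  const html = await response.text();

  const match = html.match(
    /<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/i
  );
  const canonical = match ? match[1] : "(none found)";

  console.log(`${url}\n  lands on:  ${response.url}\n  canonical: ${canonical}`);
}

async function main(): Promise<void> {
  const urls = [
    "https://plus.google.com/102279602913916787678/",
    "https://plus.google.com/102279602913916787678/posts",
    "https://plus.google.com/102279602913916787678/about",
  ];
  for (const url of urls) {
    await checkCanonical(url);
  }
}

main().catch(console.error);
```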

Not a Total Fix

It seems some other pages are still giving 404 errors, maybe due to all the funky redirects occasionally going in circles with the canonical (this query is with HTTPS).

If you have difficulty understanding the concept of canonical, it is just like Highlander… “There should be only one” page with the same content in Google’s index, especially on the same domain.


Comments

  1. says

    Off the top of my head, Andy, I don’t think any https: page gets cached, does it? I’ll have to check to be sure.

    But, ultimately, the Plus profile is infinitely more useful to a non-logged-in user than any of the other profiles out there.

  2. says

    Never looked at it like this before – an interesting viewpoint. I feel Google have a very mixed bag when it comes to their practices: some are very fair (to the point that it has hurt their company), some are terribly unfair, and the rest are somewhere in the middle. But everyone and his mum has a blog that attacks Google for every little thing they do wrong. Yes, let’s tell them when it’s wrong, but let’s not judge. Being the market leader in the search engine industry, in an environment where social networking is becoming a more and more influential part, is not exactly somewhere tons of companies have been before. This is new ground, and risky new ideas and the people who make them have to be embraced, whether they work or not.

  3. says

    Thanks for explaining the information clearly, and in a way that even an absolute beginner like me can understand without feeling overwhelmed by the dark terminology of experts. I’ve really learned from browsing your blog and reading this post.