Facebook & Twitter have some of the worst landing pages on the web.
At least that is true from a search engine's perspective: a search engine should assume that every visitor it sends isn't already a member of the site it is referencing.
It should also be understood that both Facebook & Twitter are bursting at the seams with former Google engineers & execs – they can’t claim they were unaware of what Google is looking for from content owners on the web, the webmaster guidelines, and so on.
You can’t look at the Google cache and see exactly what Google sees, because they do some sneaky redirects which are akin to cloaking.
I have written about this before.
This is what Google sees, based upon the preview:
The little piece of text at the top of the page is what amounts to your profile… you can’t count the background image, if any, because it can’t be read by Googlebot without working really hard using OCR, and it certainly can’t be read by people with disabilities.
The links within the content of the page are mostly nofollow, and the links in the sidebar get blocked by robots.txt.
The link at the bottom of the page to access more content… which may be of interest to search engines… is also blocked by robots.txt.
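You can check what a given robots.txt actually blocks for a particular crawler rather than guessing. A minimal sketch using Python's standard-library parser — the rules here are made up for illustration and are not Twitter's actual file:

```python
# Illustrative sketch: testing robots.txt rules against specific URLs.
# The Disallow rules below are hypothetical, not any real site's file.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: Googlebot
Disallow: /sidebar/
Disallow: /more/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The profile page itself is crawlable...
print(parser.can_fetch("Googlebot", "https://example.com/profile"))      # True
# ...but the "more content" path is blocked for Googlebot
print(parser.can_fetch("Googlebot", "https://example.com/more/tweets"))  # False
```

Running this kind of check against a live robots.txt is the quickest way to confirm whether the links a crawler needs are reachable at all.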
I am not the only one who has spent considerable time trying to get Twitter fixed. A great example is this post by Vanessa on Search Engine Land.
How Twitter’s Technical Infrastructure Issues Are Impacting Google Search Results
Facebook is worse
There is nothing there of any real value… it isn’t the timeline a logged-in user might see.
First Click Free
If you want to have some kind of membership wall for users, then Google have a special arrangement, First Click Free, which requires you to show the full content to visitors arriving from search on their first click.
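The mechanics are simple enough to sketch. This is a hypothetical illustration of First Click Free logic, not any real site's implementation: a non-member arriving from a Google results page gets the full article; everyone else hits the wall.

```python
# Hypothetical First Click Free sketch. Function name and logic are
# illustrative; a real implementation lives in the site's request handling.
from urllib.parse import urlparse

def should_show_full_content(referer: str, is_member: bool) -> bool:
    """Show the full article to members, and to non-members whose
    first click came from a Google search results page."""
    if is_member:
        return True
    host = urlparse(referer or "").netloc
    return host == "www.google.com" or host.endswith(".google.com")

# First click from Google search: full content
print(should_show_full_content("https://www.google.com/search?q=test", False))  # True
# Direct visit or other referrer: membership wall
print(should_show_full_content("https://example.com/", False))                  # False
```

A real deployment would also need to handle click limits and make sure Googlebot itself is served the same content as that first-click visitor, or the setup drifts into cloaking.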
Google over the years have published lots of content about what they think of cloaking.
So, within the webmaster guidelines, Googlebot is served Flash-based RTMP rather than something it might prefer to see – which we would be quite happy to give it.
Google Isn’t Playing Fair
One area where Google isn’t necessarily playing fair: I don’t seem to be able to view Google+ profile pages in Google’s own cache, and they don’t give a preview of the page that Googlebot sees.
You can normally search in Google for cache:https://plus.google.com/102279602913916787678/posts or any url to get a cached version of what the crawler sees.
It is possible for every site to tell Google and other search engines not to store a cached page, so Google are well within their rights not to do so… but it prevents comparisons.
cache:andybeard.eu – brings up a cached result
cache:https://plus.google.com/102279602913916787678/posts – does not bring up a cached result, just a 404 error
FTC Complaint over Search Plus Your World
The blogosphere loves a good witch hunt, but I can’t see that Google is treating Twitter or Facebook unfairly. Eric Schmidt was quite right about some of the nofollows, but there are bigger technical restrictions in place on crawling.
I actually quite like a Google profile as a default profile and identity on the web, but Google need to live up to the promise of Salmon and make it a viable endpoint for all activity, or as an alternative use it for identity and allow me to define my own default profile, which if I choose might be Twitter or Facebook.
I can also understand why you wouldn’t undertake the complex engineering to make such flexibility possible for your first iteration, especially with partners who are unwilling to do something similar themselves.
Just ask Twitter how many content partners they now support on the new Twitter for embeds. (I wrote them a letter a year ago and never received a response.)
Update – Google Profiles Now Cached
However, when I posted, I had tried lots of different variations, all resulting in a 404 error.
This unmodified link was previously bringing up a 404 error
It now returns what appears to be a blank page – as Michael points out, if you switch off the CSS in your browser you can see the complete cached landing page.
This appears to be a recent change, though they still need to fix the canonical. The canonical changes as you navigate between tabs, and between the first 2 URLs on this list there is effectively a redirect loop: /posts claims / is the canonical, but humans are redirected to /posts.
All the different URLs show the same content, so they should all set whichever canonical a human is redirected to, which currently is /posts.
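To make the mismatch concrete, here is a hypothetical sketch that pulls the rel=canonical out of a page and compares it with where humans are actually redirected. The markup and URLs are illustrative of the loop described above:

```python
# Sketch: extracting rel=canonical and comparing it with the human
# redirect target. Sample markup and URLs are illustrative.
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel", "").lower() == "canonical":
            self.canonical = a.get("href")

# The /posts page declares the bare profile URL as canonical...
posts_page = '<head><link rel="canonical" href="https://plus.google.com/102279602913916787678/"></head>'
finder = CanonicalFinder()
finder.feed(posts_page)

# ...while humans requesting the bare profile URL are redirected to /posts
human_redirect_target = "https://plus.google.com/102279602913916787678/posts"
print(finder.canonical == human_redirect_target)  # False – the loop
```

When the declared canonical and the redirect target disagree like this, crawlers bounce between the two URLs and neither settles cleanly in the index.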
Not Total Fix
It seems some other pages are still giving 404 errors – maybe due to all the funky redirects going in circles with the canonical on occasion (this query is with HTTPS)
If you have difficulty understanding the concept of canonical, it is just like Highlander… “There can be only one” page with the same content in Google’s index, especially on the same domain.