Salmon Protocol Endpoints & Canonicalization

This blog post mainly consists of a message I just sent to the Salmon Protocol mailing list, though I have expanded on it a little.

Salmon Protocol is one of the most exciting things that will be used by Google Buzz, and in theory it could dramatically change the way communication flows around all kinds of web content, whether comments on blog posts, votes & likes on YouTube videos, maybe blog posts themselves, video replies etc.

It designates the way notifications are sent “upstream”, just like salmon swimming up a river to a specific end point.

Many pundits are treating Google Buzz as some kind of final application, whereas in many ways it is purely an introductory interface to a set of information protocols, many of which are still being roughed out by the programming geeks who will define how the web works (the way various apps talk to each other) for the next decade.

I believe what I have stated about end points is important, but might need an example.

  1. Someone comments about one of my blog posts on Buzz
  2. I write a blog post linking to someone
  3. That person has a video from YouTube embedded in their post
  4. That YouTube video is a reaction to someone else’s YouTube video
  5. That YouTube video was just a syndicated copy of a video originally posted on a blog post as an mp4

If that comment on Buzz is a “Salmon”, how far up the stream should it flow to reach its final destination?

In theory the chain might never end, whereas the one definitive end point is a profile owned by the person making the comment.
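To make the chain concrete, here is a minimal Python sketch of the five steps above as a set of upstream links. The item names and the `UPSTREAM` mapping are invented purely for illustration; nothing here comes from the Salmon spec, which (as far as I understand it) only describes a single hop.

```python
from typing import Dict, List, Optional

# Hypothetical reply graph: each item points at the item it reacts to
# (its "upstream"), or None when there is nothing further up the river.
UPSTREAM: Dict[str, Optional[str]] = {
    "buzz-comment": "my-blog-post",
    "my-blog-post": "their-blog-post",
    "their-blog-post": "embedded-youtube-video",
    "embedded-youtube-video": "other-youtube-video",
    "other-youtube-video": "original-mp4-blog-post",
    "original-mp4-blog-post": None,
}


def upstream_path(start: str, graph: Dict[str, Optional[str]]) -> List[str]:
    """Follow upstream links from start, stopping at a source or a loop."""
    path: List[str] = []
    seen = set()
    node: Optional[str] = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = graph.get(node)
    return path


path = upstream_path("buzz-comment", UPSTREAM)
hops = len(path) - 1
print(f"{hops} hops upstream, ending at {path[-1]}")
```

Every one of those hops is a candidate "final destination", and the walk only stops because this toy graph happens to end. The author's own profile is the one point that does not depend on how long the chain is.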

Here is what I wrote last July which might give you an enhanced perspective about what I feel is exciting about “buzz”.

The Future Of Commenting And Aggregation

An even more radical approach would be to totally get rid of “comments” as a unique entity, and many other social sites for that matter, and have only unique personal streams of media, long or short form, video, pictures, text or a mixture. What appears on other sites, whether on a blog as a comment, or on Twitter, YouTube or any other social site, would just be a syndicated copy of your original content. Just one permalink for the original content, with full ownership and privacy controls over who could see it.

In many ways YouTube is just a video feed reader where you syndicate your unique video, and you should link back to the original source, and get the original source ranking :)

It is not just “Salmon Protocol”; there are all kinds of other things going on underneath that need to be understood, and to be honest my own understanding of many of the intricacies is a little vague.

Here are a few reference links

Salmon Protocol
Magic Signatures
Activity Streams

Here is what I posted to the Salmon discussion group (I will add a link if it gets through the moderators)

I noticed this Self-Salmon proposal

http://code.google.com/p/salmon-protocol/wiki/SelfSalmon

From a content ownership perspective, the default endpoint should be SelfSalmon and all other endpoints should be for syndication purposes, under user control.

If a user migrates from one endpoint to another, say from a Google service to a Facebook/Yahoo/MSN service, their Salmon moves with them.
A user may even access their Salmon through multiple interfaces of interlinked IDs.

I would also like to reference a few conversations from the search community

http://outspokenmedia.com/social-media/social-aggregation/
http://www.google.com/buzz/118122556596388698587/Djy7DNLaE41/twitters-140-character-limit-is-useful-theres-no

Do content creators own the comments?
How about the people the content creators link to?
Or the people they link to?
Or the YouTube Video that they embed that they are all discussing?

Salmon could flow upstream with no logical end point.

Ultimately the only logical primary endpoint is the reference that is unique, the comment creator = SelfSalmon

There is in many ways also an issue with multiple identities & privacy, which I discovered from user complaints about Disqus.

I am not sure whether this is still an issue, but Disqus was tying together anonymous comments based upon an email ID and displaying them in public, thus it was possible to tie comments on a political blog under one pseudonym with those on a tech blog using another.
Hence the need for personal end points to be able to provide anonymous, private notification keys for each interaction.

I realise that Magic Signatures is intended to address this problem, but can it also in some way be extended as a unique permissions mechanism for email delivery?
If an email address were actually a unique Magic Signature, it could then easily be revoked by the owner, similar to providing one-time disposable email addresses.

In a purely email context, a Magic email address would work only from the designated sender to the designated recipient. If a message were sent from a non-designated sender, such as a reply-all to a CCed email, there would need to be some kind of authorization/permission & update layer.
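A rough way to picture such an address is the Python sketch below. To be clear, this is not Magic Signatures (which, as I understand it, signs content with an RSA key); I have swapped in a plain HMAC from the standard library, and the `AliasBook` class, the secret and all the addresses are made up for illustration.

```python
import hashlib
import hmac


class AliasBook:
    """Toy registry of per-sender aliases for one recipient (hypothetical)."""

    def __init__(self, secret: bytes, domain: str) -> None:
        self._secret = secret
        self._domain = domain
        self._revoked = set()

    def alias_for(self, sender: str) -> str:
        """Derive a stable alias that is specific to one designated sender."""
        digest = hmac.new(self._secret, sender.lower().encode("utf-8"),
                          hashlib.sha256).hexdigest()
        return f"{digest[:16]}@{self._domain}"

    def revoke(self, sender: str) -> None:
        """Withdraw the alias handed to this sender."""
        self._revoked.add(self.alias_for(sender))

    def accepts(self, alias: str, sender: str) -> bool:
        """Accept only if the alias was minted for this sender and is not revoked."""
        expected = self.alias_for(sender)
        matches = hmac.compare_digest(alias.encode("utf-8"),
                                      expected.encode("utf-8"))
        return matches and alias not in self._revoked


book = AliasBook(b"demo-secret-not-for-real-use", "example.org")
alias = book.alias_for("newsletter@shop.example")

ok_designated = book.accepts(alias, "newsletter@shop.example")
ok_other = book.accepts(alias, "someone-else@example.net")  # e.g. a reply-all
book.revoke("newsletter@shop.example")
ok_after_revoke = book.accepts(alias, "newsletter@shop.example")
print(ok_designated, ok_other, ok_after_revoke)
```

The reply-all case is the one that is rejected here, which is exactly where the authorization/permission & update layer would have to step in.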

These thoughts come from looking at ways to use the email addresses Facebook provides through its API, and from the onerous need to update sending permissions on multiple email service provider platforms, whether for correspondence, sales messaging, transactional email or just a comment subscription.

Within the official spec for Salmon there is effectively a canonical reference for the comment originator:

POST /salmon-endpoint HTTP/1.1
Host: example.org
Content-Type: application/atom+xml

<?xml version='1.0' encoding='UTF-8'?>
    <entry xmlns='http://www.w3.org/2005/Atom'>
    <author>
      <name>John Doe</name>
      <uri>acct:johndoe@aggregator-example.com</uri>
    </author>
    <content>Yes, but what about the llamas?</content>  
    <id>tag:aggregator-example.com,2009:cmt-441071406174557701</id>
    <updated>2009-09-28T18:30:02Z</updated>
    <thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0'
       ref='tag:example.org,1999:id-22717401685551851865'/>

    <me:provenance xmlns:me="http://salmon-protocol.org/ns/magic-env">
    <me:data type='application/atom+xml'>
    PD94bWwgdmVyc2lvbj0nMS4wJyBlbmNvZGluZz0nVVRGLTgnPz4KPGVudHJ5IHhtbG5zPS
    dodHRwOi8vd3d3LnczLm9yZy8yMDA1L0F0b20nPgogIDxpZD50YWc6ZXhhbXBsZS5jb20s
    MjAwOTpjbXQtMC40NDc3NTcxODwvaWQ-ICAKICA8YXV0aG9yPjxuYW1lPnRlc3RAZXhhbX
    BsZS5jb208L25hbWUPHVyaT5hY2N0OmpwYW56ZXJAZ29vZ2xlLmNvbTwvdXJpPjwvYXV0a
    G9yPgogIDx0aHI6aW4tcmVwbHktdG8geG1sbnM6dGhyPSdodHRwOi8vcHVybC5vcmcvc3l
    uZGljYXRpb24vdGhyZWFkLzEuMCcKICAgICAgcmVmPSd0YWc6YmxvZ2dlci5jb20sMTk5O
    TpibG9nLTg5MzU5MTM3NDMxMzMxMjczNy5wb3N0LTM4NjE2NjMyNTg1Mzg4NTc5NTQnPnR
    hZzpibG9nZ2VyLmNvbSwxOTk5OmJsb2ctODkzNTkxMzc0MzEzMzEyNzM3LnBvc3QtMzg2M
    TY2MzI1ODUzODg1Nzk1NAogIDwvdGhyOmluLXJlcGx5LXRvPgogIDxjb250ZW50PlNhbG1
    vbiBzd2ltIHVwc3RyZWFtITwvY29udGVudD4KICA8dGl0bGUU2FsbW9uIHN3aW0gdXBzdH
    JlYW0hPC90aXRsZT4KICA8dXBkYXRlZD4yMDA5LTEyLTE4VDIwOjA0OjAzWjwvdXBkYXRl
    ZD4KPC9lbnRyeT4KICAgIA
    </me:data>
    <me:encoding>base64url</me:encoding>
    <me:alg>RSA-SHA256</me:alg>
    <me:sig>
    EvGSD2vi8qYcveHnb-rrlok07qnCXjn8YSeCDDXlbhILSabgvNsPpbe76up8w63i2f
    WHvLKJzeGLKfyHg8ZomQ
    </me:sig>
    </me:provenance>

    <title/>
</entry>

The canonical ID for the content would be

tag:aggregator-example.com,2009:cmt-441071406174557701
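As a small illustration of where that ID sits, the following Python sketch pulls the ID, the author URI and the in-reply-to reference out of a trimmed copy of the entry above, using only the standard library. I have left out the `me:provenance` block to keep it short, and the `summarize` helper is my own name, not something from the spec.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "thr": "http://purl.org/syndication/thread/1.0",
}

# Trimmed copy of the example entry (provenance envelope omitted).
ENTRY = """<entry xmlns='http://www.w3.org/2005/Atom'>
  <author>
    <name>John Doe</name>
    <uri>acct:johndoe@aggregator-example.com</uri>
  </author>
  <content>Yes, but what about the llamas?</content>
  <id>tag:aggregator-example.com,2009:cmt-441071406174557701</id>
  <updated>2009-09-28T18:30:02Z</updated>
  <thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0'
     ref='tag:example.org,1999:id-22717401685551851865'/>
  <title/>
</entry>"""


def summarize(entry_xml: str) -> dict:
    """Return the fields that identify a salmon and what it replies to."""
    root = ET.fromstring(entry_xml)
    reply = root.find("thr:in-reply-to", NS)
    return {
        "id": root.findtext("atom:id", namespaces=NS),
        "author_uri": root.findtext("atom:author/atom:uri", namespaces=NS),
        "in_reply_to": reply.get("ref") if reply is not None else None,
    }


summary = summarize(ENTRY)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Note that the ID and the author URI both carry the aggregator's domain, while the in-reply-to reference carries the domain of the site being commented on.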

The problem comes when destination sites lay claim to ownership of a particular comment and control how it is syndicated.

From many of the code examples I have seen, if someone originated a comment whilst visiting my blog, the above tag would be:

tag:andybeard.eu,2010:cmt-441071406174557701

Effectively content is “owned” by whichever platform is being used to create it, rather than there being a stated intent that comments are portable, owned by the creator of the content, and published under license.
All my “tweets” on Twitter currently exist on a Twitter permalink. There isn’t an effective method to migrate that content to a different platform, though it is extensively syndicated.
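The difference between the two tags is easy to see once they are split apart. A `tag:` URI (RFC 4151) has the shape `tag:authority,date:specific`, and the Python sketch below simply separates those three parts; the `parse_tag` helper and `TagURI` type are my own illustrative names, and the parsing is deliberately loose rather than a full validator.

```python
from typing import NamedTuple


class TagURI(NamedTuple):
    authority: str  # domain (or email address) of whoever minted the ID
    date: str       # date on which that authority held the name
    specific: str   # identifier chosen by the minting platform


def parse_tag(uri: str) -> TagURI:
    """Split a tag: URI into authority, date and specific parts."""
    if not uri.startswith("tag:"):
        raise ValueError(f"not a tag URI: {uri!r}")
    tagging_entity, colon, specific = uri[len("tag:"):].partition(":")
    authority, comma, date = tagging_entity.partition(",")
    if not (colon and comma and authority and date):
        raise ValueError(f"malformed tag URI: {uri!r}")
    return TagURI(authority, date, specific)


via_aggregator = parse_tag("tag:aggregator-example.com,2009:cmt-441071406174557701")
via_my_blog = parse_tag("tag:andybeard.eu,2010:cmt-441071406174557701")

print(via_aggregator.authority, via_my_blog.authority)
print(via_aggregator.specific == via_my_blog.specific)
```

In both cases the authority is the platform where the comment was created; nothing in the ID itself names the person who wrote it.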

The thing is, as an active participant in conversations across the web, I want to be able to control things from a single “web cockpit” which currently ends up being tons of email notifications in Gmail, many of which end up in spam folders.

It could well be that I don’t understand the implications completely, or the geeky side of programming this, but it seems Salmon isn’t yet user centric but portal centric.

User centric would allow:-

  1. Migration between interface platforms
  2. All content interactions possible, but not restricted to a single interface platform, thus I might start an engagement posting a comment on a blog, but continue all the discussion from within my current web cockpit of choice
  3. Complete control over syndication – if I don’t want my “Salmons” travelling upstream, possibly out of context, I should have complete control over that – I should be able to limit who can display my content on an individual Salmon basis, with the possibility to revoke access to individual sites
  4. Complex threading, not just branching within the interfaces – a single “salmon” whilst in response to one piece of primary content, such as a blog post, might also link through to 3 or 4 other pieces of content.
    Imagine what you can do with a blog post and pingbacks/trackbacks with interlinking with other blogs. In theory that capability should be within every salmon created. In some ways that is a possibility within forum software, as in the past I have received multiple pingbacks or trackbacks from mentions of my blog posts on forums.
  5. Email effectively gives me a universal notification method, and in some cases even a method to respond, such as Posterous, Disqus comments, even some WordPress plugins. Salmon & whatever platforms are used to interface with it need to have the same level of flexibility.
  6. Visible canonical links within the code wherever a comment is displayed, either for humans or search engines – if I happen to be commenting on a blog post, that is linking to another blog post, which embeds a YouTube video, there needs to be some kind of visible notification of the reason my content appeared where it did.

Within all of this what I don’t really get… is the “scope”.

Salmon appears to be intended for content reactions, but ultimately even blog posts and videos are often reactions. Blogs notify each other using trackback & pingback, and Salmon is in many ways the next stage.

You wouldn’t expect a whole blog post to be imported as a reaction to someone else’s blog post, thus the same could be true for a long comment, or any comment.

Surely all that is needed on a destination site where a comment is left is a link to the canonical source of any reaction with the data displayed using Ajax.
From an efficiency point of view it wouldn’t make sense to have to poll 100 different canonical storage locations for the comment information they contain, thus there is a need to store a referenced comment locally or at a hub, or to have some level of caching, but that doesn’t mean an endpoint should have a permanent copy of whatever content they choose to display from their upstream and downstream, whether it is a direct comment reaction, a blog post or video.

With a service such as Seesmic (the video service not the Twitter client) a comment reaction after all is in many ways just as valid as a tweet or a direct comment.

Any XML data structure or feed attached to a piece of content possibly shouldn’t consist of much more than a list of permalinks where the canonical version of the reaction is held.
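As a rough sketch of that idea in Python: the "feed" attached to a post is just a list of permalinks, and the destination site keeps only a short-lived cached copy of what each one holds. The `ReactionCache` class, the URLs and the stand-in `fake_fetch` function are all hypothetical; a real site would fetch over HTTP and would need to think about errors and invalidation.

```python
import time
from typing import Callable, Dict, List, Tuple


class ReactionCache:
    """Short-lived local cache in front of canonical permalinks (hypothetical)."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, permalink: str) -> str:
        """Return cached content if still fresh, else re-fetch from the source."""
        now = self._clock()
        hit = self._store.get(permalink)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        body = self._fetch(permalink)
        self._store[permalink] = (now, body)
        return body


# The feed attached to a post: nothing more than canonical permalinks.
REACTIONS: List[str] = [
    "https://alice.example/reactions/1",
    "https://bob.example/reactions/7",
]

fetch_log: List[str] = []


def fake_fetch(permalink: str) -> str:
    """Stand-in for an HTTP request to the canonical location."""
    fetch_log.append(permalink)
    return f"content held at {permalink}"


fake_now = [0.0]  # controllable clock so the example needs no sleeping
cache = ReactionCache(fake_fetch, ttl_seconds=300, clock=lambda: fake_now[0])

first = [cache.get(url) for url in REACTIONS]   # fetched from the sources
second = [cache.get(url) for url in REACTIONS]  # served from the local cache
fake_now[0] = 301.0
third = [cache.get(url) for url in REACTIONS]   # expired, so fetched again
print(len(fetch_log), "fetches for", 3 * len(REACTIONS), "reads")
```

The local copy expires rather than becoming a permanent duplicate, so an edit or deletion at the canonical location shows up once the cache entry lapses.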

Search?

The overall implications for search, based on what is currently within the Salmon Protocol, are probably complex enough without adding my input above to confuse things even more. My hope is that we will end up with one canonical version of all content, controlled by the content creator, but with Salmon we might end up with the same comment on 100 different sites, with no onus on the sites to display any edited updates, and the possibility of further editing of the content through moderation.
Google might be able to work out what is happening, but comment authors probably won’t.

Ideally I would love to see (at a code level) reactions as just a bunch of links to their canonical location (owned by the authors with a real permalink), and JavaScript/Ajax used to pull in some or all of the content depending on length.
I think what we will end up with is just a bunch of JavaScript for user generated content, or mass duplicate content with comments stored locally (e.g. the current situation with Backtype).

Also I hope long-term it will be possible to upload a video to YouTube and then define the canonical version as a video on a domain I own, and pull in reactions, but that isn’t necessarily in Google’s interests.

That is the end of my brain dump; your head can hurt just as much as mine now.

(p.s. if you get a white screen when you leave a comment, the comments are going through, I just have to track down a bug somewhere)

Here is also a link to any discussion on Google Buzz for this post


Comments

  1. says

    this my first time that i know something called canonicalization. how it can be controvertion in blog world?
    thanks for your info

  2. says

    Fantastic article Andy. It’ll be interested to see what the effects of this so-called “Salmon Protocol” will have on search… I know you haven’t posted your input because some of the implications are clear but what are your thoughts?

    • says

      I think it is too soon to say.

      The scope is currently for comments that appear to be tied to an endpoint provided by an aggregator posted on a content site, but that doesn’t seem to cater for the extended uses content is already being used for, such as syndicating comments further upstream, scraping on sites like Mahalo, authorized syndication with Friendfeed, Blogcatalog, Buzz itself, and a multitude of other situations.

      Content owner controls… the comment author seems to be limited, but I don’t really differentiate a comment from a blog post, or see why there should be a difference in the way the content is syndicated to various destinations.

  3. susheel says

    The article is explicitly focused towards Salmon Protocol mailing list. Salmon Protocol is actually one of the most exhilarating things that will be used by Google Buzz & theoretically could dramatically make mammoth changes to the way communication flows around all kinds of web content, whether comments on blog posts, votes & likes on YouTube videos, maybe blog posts themselves, video replies etc. The article is hugely informative and knowledgeable. Thanks for sharing so useful information.

    • says

      Interesting comment spam tool you have there, rewriting my content with some kind of spin algorithm – Google Buzz with Salmon will kill 95% of human comment spam

  4. says

    Andy I love how you say ” your head can hurt just as much as mine now” because beleive me trying to keep up with this post was quite difficult. So is salmon protocol just a way of trying to figure out user interaction an linking? The way in which content creation is shared, syndicated, linked to, discussed among all interfaces and platforms. I am probably way off I will go read the resource link that you gave for salmon protocol.

  5. says

    Andy — Sorry about the delay in your message, I fell behind in moderating new posters (spam filtering) over the past few days. Thanks for the thoughts. Salmon at the moment is a single-hop protocol, but given that you can go from A to B and B to C it’s a very small step to figure out when you want to chain all the way from A to C :).

    Re: Migration — Yes, absolutely. Note that the tag: URIs do _not_ depend on the existence or approval of the minter of the URI — in other words, even if example.org created that ID, it should be possible to import it and associated content to sample.org in the future. In other words, the domain name is just a convenient mechanism for generating guaranteed unique IDs. (The year appended to the domain name helps with cases where the domain changes ownership in the future.)

  6. says

    Hello Andy, great article and thanks for the awesome info. I found your blog while I was researching canonical links and how Google will now see and treat them. Do you agree that the Caffeine update now treats them differently than before? I am getting mixed responses and no one seems to have a solid explanation about it.

  7. says

    First of all, what a ridiculously awesome name for their new system; I don’t think that anyone could forget it. Perfect for branding. The idea is a little bit complex but Google is so good at ironing out kinks that this sounds interesting. I’m wondering how this will affect the SEO part of the web. Anyone know?

  8. says

    This is clearly what one could call the next level of communication on ‘www’. If this works just like the way you told,it would be no wonder that it would be true by each word that you have written above.

  9. says

    Andy, there’s been alot of talk about just how much Google takes into account how “viral” content is and yeah, I’m excited that this (fingers crossed) is the future of the web; I think that taking a look at the “buzz” surrounding information should be more of a determining factor than the number of (possibly spammy) backlinks pointing at the content. Its pretty easy to see how “salmonization” works inside of buzz, but I’m having a tough time seeing how salmonization will be taken into consideration across the web – e.g. someone posts a video in response to an article I wrote, how do I know, and for that matter, how does google know 1)the video is out there and 2) most importantly that it is a response to my content, especially if the video author does not place any kind of tag or link to my original content? I hope I’m not missing a major concept here, and I’m thankful I’m not the guy that responsible for coding this! The future of the internet looks bright but a little daunting!

  10. says

    Thank you a lot for this topic.
    Even now we have Google in Russia (Russian language version), still not much info regarding Google preferences/ rules and are algorythms available. Never before I heard about “Salmon Protocol”.
    Thanks again.

    • says

      Yes – thanks a lot for this topic… I’ve also never heard of salmon protocol, but it’s very interesting to think of the communication possibilities as this begins to be developed further. Great post!

  11. Gleenn says

    hi,

    I’m reading a lot about SEO tips and techniques to improve my knowledge about this whole thing. Although I understand much of what you discussed in this article, there are still a lot that I need time to absorb – more about this Salmon thingy.

    What frustrates me though about PR status is that the content does not weigh as much as the inbound links when it comes to PR rating. Some blogs as I noticed do not have much to say but because of inbound links, they report good pr.

    However, in relation to the notion that perhaps there’ll come a time when commenting will no longer be necessary, I don’t think that would be a good idea. Although comments do not really matter that much for me, it is also good to get feedback from the readers and it really makes me feel good to know that some people are actually reading my post.

    So back to the Salmon, ;), I guess I need to read this article again to fully understand it, hehe.
    And oh, btw, my blog is a dofollow blog too. I hope you’d take a look and drop me some thoughts ;) And even if I read from from your comment policy that there shouldn’t be any link at the signatory of the post, I’m still doing it. hoping you’ll allow ;)

    Gleenn

  12. WorkingShirt Online says

    Hi Andy,

    This is such a lot of information, truly, information overload, but the Salmon Protocol shows how interconnected the world is and can be. Something like the old theory “Six Degrees of Separation,”
    in that we know or, are connected, to everyone in the world. This interconnection is really exciting and I’m looking forward to it. Sometimes the rapid spread of information, even when some governments try to block it, shows how powerful this connection is. Love it.

  13. says

    Ouch…my eye hurts,…please andy dont take it personally…but it felt i was going through another “Kennedy” (book author of Communication) of Protocols ..it was interesting..so i cant leave it from middle also…
    BUT..
    one thing i couldn’t digest…how the heck i am going to get the authority over my comment..who will have its ownership..
    is it owned by “the platform” or “the owner” …

    PS: Be prepared i will invade u again and again..surely u have invited a dumb boar who cant give up on tech until he understands them … :D

  14. says

    Fantastic article Andy. It’ll be interested to see what the effects of this so-called “Salmon Protocol” will have on search…I’ve also never heard of salmon protocol, but it’s very interesting to think of the communication possibilities as this begins to be developed further.