Cross Posting

This version was saved 14 years, 3 months ago
Saved by Rob Dolin
on April 17, 2010 at 5:04:06 pm
 

Notes from Brainstorming at Stream Camp (2010-04-17)

 

Scenarios:

  • User publishes an activity from a client to multiple services simultaneously
    • TweetDeck --> Facebook and Twitter
    • Ping.fm --> ~30 services
  • User syncs activity among services
    • Foursquare --> Facebook, Twitter
    • Twitter <--> MySpace, LinkedIn, Facebook

 

Globally unique <entry><id> - challenging

Check publishing <source>, <app_id>, <generator>, etc.

Need to canonicalize status text (ex: remove "robdolin: " or "Rob Dolin" or "via foo")

Time window (ex: same update within a short period of time)

Check similar properties (ex: the <title> of a blog entry appearing in a status, or a status containing a link (possibly a short URL) to the blog <link>)

For multi-site publishers (ex: Ping.fm) could define/include a property/hash that was unique to the author/publishing client (i.e. JohnSmith123PingFm)
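
The heuristics above (canonicalized text, time window, per-publisher hash) could be sketched roughly as follows. This is only an illustration: the function names, the five-minute window, and the way the author/client hash is built are all assumptions, not something agreed at Stream Camp.

```python
import hashlib
import re
from datetime import datetime, timedelta

# Trailing " via foo" decoration. This is a crude heuristic and will also
# strip legitimate text that happens to end in "via <word>".
_VIA = re.compile(r"\s+via\s+\S+\s*$", re.IGNORECASE)


def canonicalize(text, author_names=()):
    """Strip known author prefixes and a trailing 'via X', then normalize."""
    result = text.strip()
    for name in author_names:
        # Drop a leading "name: " or "name " prefix added by some services.
        pattern = r"^" + re.escape(name) + r":?\s+"
        result = re.sub(pattern, "", result, flags=re.IGNORECASE)
    result = _VIA.sub("", result)
    # Collapse whitespace and ignore case when comparing.
    return " ".join(result.split()).lower()


def likely_duplicate(text_a, time_a, text_b, time_b,
                     author_names=(), window=timedelta(minutes=5)):
    """Same canonical text published within a short time window."""
    same_text = (canonicalize(text_a, author_names)
                 == canonicalize(text_b, author_names))
    return same_text and abs(time_a - time_b) <= window


def publisher_key(author_id, client_name):
    """Hypothetical hash unique to an author/publishing-client pair."""
    raw = f"{author_id}|{client_name}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

For example, "robdolin: Hello world" and "Hello world via foo" posted two minutes apart would be treated as likely duplicates when "robdolin" is passed as a known author name.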

 

(Older content below)

 

In early October, a few folks from Facebook and Google (others were invited as well) got together to brainstorm on cross posting and reducing duplicated content.  We spent a lot of time talking about Mart's Atom Cross-posting Extension and wanted to share our notes.

 

Brainstorming Notes on De-Duping and Cross Posting

 

Proposal Summary

- Facebook, Twitter, and similar APIs should allow clients to specify a crosspost:source ID when publishing content.

- Services that crawl and re-publish feeds should propagate the crosspost:source element or create it based on the crawled atom:id or RSS guid.

 

Background

It's become clear over the past year that the rate of content being automatically re-shared across social websites is increasing and has caused duplication as aggregators are unable to determine the original source of a piece of content. One example is YouTube's auto-share functionality which allows a user to automatically re-share videos they upload (or favorite) to Facebook, Twitter, and Google Reader. This has resulted in a given video being posted to YouTube, shared to Facebook, Twitter, and Reader, and then re-duplicated on FriendFeed as it's aggregating from both Facebook and Twitter as well. While aggregators like FriendFeed have written custom algorithms to detect this sort of duplication and coalesce content, it would be desirable to reduce this echo effect from the start.

 

So far, the crosspost:source element from Mart's spec (cached copy at http://74.125.155.132/search?q=cache:F4JZo0244ZkJ:martin.atkins.me.uk/specs/atomcrosspost, since Mart's site seems to be down) appears to solve the main use case.

 

<entry>

  <id>tag:jibber.example.org,2005:4523452</id>

  <title>geraldine: Photos from my Weekend http://sillyurl.example.net/abc123</title>

  <link href="http://jibber.example.net/statuses/4523452" />

  <!-- (other standard Atom elements elided for brevity) -->

  <crosspost:source>

    <id>tag:blogtastic.example.com,2009:5a12451543</id>

  </crosspost:source>

</entry>
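
As a rough sketch of how an aggregator might read such an entry, the Python below extracts the ID to de-duplicate on: the crosspost:source ID when present, otherwise the entry's own atom:id. The `dedupe_key` name is hypothetical, and the crosspost namespace URI is a placeholder assumption (the sample above omits namespace declarations); the real URI should be taken from the extension spec.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: placeholder namespace URI, not taken from the spec.
CROSSPOST_NS = "http://example.org/ns/crosspost"

SAMPLE = f"""
<entry xmlns="{ATOM_NS}" xmlns:crosspost="{CROSSPOST_NS}">
  <id>tag:jibber.example.org,2005:4523452</id>
  <title>geraldine: Photos from my Weekend</title>
  <crosspost:source>
    <id>tag:blogtastic.example.com,2009:5a12451543</id>
  </crosspost:source>
</entry>
"""


def dedupe_key(entry_xml):
    """Return the crosspost:source ID if present, else the entry's atom:id."""
    entry = ET.fromstring(entry_xml)
    ns = {"atom": ATOM_NS, "crosspost": CROSSPOST_NS}
    source_id = entry.find("crosspost:source/atom:id", ns)
    if source_id is not None and source_id.text:
        return source_id.text.strip()
    own_id = entry.find("atom:id", ns)
    if own_id is None or not own_id.text:
        raise ValueError("entry has no atom:id")
    return own_id.text.strip()
```

Calling `dedupe_key(SAMPLE)` yields the blogtastic ID rather than the jibber one, so both copies of the post would map to the same key.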

 

 

Where do we want the complexity?

- Options are either at the publisher or at the aggregator.

- It seems more resilient for the publisher to specify that this is a cross post than for the aggregator to have to figure it out.

 

So when do you include crosspost:source?

- Only use crosspost:source when it really is a cross post, i.e. really constitutes the same original user action, merely being re-broadcast through multiple services.

- We're not solving the general coalescing problem; aggregators still need to make their own decisions on how to coalesce items that originate from distinct but related user actions.

- For example, if the user posted a YouTube video and then manually pasted the link into their Facebook status, crosspost:source would not be included.

- To an aggregator, crosspost:source means, "if you already have another instance of the thing I'm linking to, feel free to completely drop one of the two".

- Items with the same crosspost:source may not be identical -- e.g. on Twitter, the text may be truncated to 140 chars or links may be shortened. Aggregators are free to select among multiple copies however they like -- e.g. try to identify the highest fidelity.
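
A minimal sketch of the aggregator side of these rules, assuming items have already been parsed into a simple record. The `Item` type and the "longest text wins" fidelity measure are illustrative assumptions; an aggregator is free to choose among copies however it likes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    atom_id: str
    text: str
    crosspost_source: Optional[str] = None  # None when not a cross post


def drop_crossposts(items):
    """Keep one copy per original action, preferring the longest text."""
    best = {}
    for item in items:
        # A cross post is keyed by its source; an original by its own ID.
        key = item.crosspost_source or item.atom_id
        current = best.get(key)
        if current is None or len(item.text) > len(current.text):
            best[key] = item
    return list(best.values())
```

With this, an auto-shared tweet carrying the YouTube entry's ID as its crosspost:source collapses into the YouTube item, while a manually pasted Facebook status (no crosspost:source) is kept as a separate item for the aggregator's own coalescing logic.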

 

A simple scenario of user activity and how the crosspost:source travels through the system:

- User uploads a video on YouTube. The YouTube user feed should not have a crosspost:source, but will have a unique atom:id.

- YouTube auto-shares that video upload action to Twitter. The same ID that appears in YouTube's atom feed should be included as the crosspost:source when posting to Twitter (see below re: extension to posting API).

- If the user has set up FriendFeed to crawl their YouTube feed, then FriendFeed finds the item in the YouTube feed, with no crosspost:source. When re-publishing the user's aggregated FriendFeed feed, they should take the YouTube atom:id and make that the crosspost:source in the re-published feed.

- If the user has set up FriendFeed to crawl their Twitter feed (but not their YouTube feed), then FriendFeed finds the item in the user's Twitter feed, including a crosspost:source. They should propagate that crosspost:source when re-publishing the item in the user's aggregated feed.

- If the user has set up FriendFeed to crawl both their YouTube and Twitter feeds, then FriendFeed finds both, recognizes them as the same, and re-publishes in the user's aggregated feed a single item with the crosspost:source ID.
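
The three FriendFeed cases above reduce to one rule: propagate crosspost:source if the crawled item has one, otherwise create it from the crawled atom:id, and emit a single item per source. A hedged sketch, using plain dicts and a made-up ID scheme for the aggregator's own entries:

```python
from itertools import count


def republish(crawled_entries):
    """Build the aggregated feed from entries crawled across services."""
    merged = {}
    serial = count(1)
    for entry in crawled_entries:
        # Propagate an existing crosspost:source, else use the atom:id.
        source = entry.get("crosspost_source") or entry["id"]
        if source in merged:
            continue  # already have a copy of this user action
        merged[source] = {
            # Hypothetical ID minted by the aggregator for its own feed.
            "id": f"tag:aggregator.example.org,2009:{next(serial)}",
            "crosspost_source": source,
            "title": entry["title"],
        }
    return list(merged.values())
```

This sketch keeps whichever copy is seen first; picking the highest-fidelity copy is a separate choice left to the aggregator.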

 

What about one activity which contains multiple things?

- The only use case we can think of where this would occur is where an aggregator automatically coalesces two distinct activities and then publishes the combined activity. (Remember that the crosspost:source element is only applied to automatically re-shared activities, not activities that are the result of explicit user actions.) We have decided not to address the coalesced activity use case now, because we consider coalescing to be a separate problem that is out of scope. One possible solution is to include multiple crosspost:source elements in the activity, but this is not currently permitted by the crosspost extension spec. For the time being, coalesced activities will be considered new activities, and they will not include any crosspost:source element.

 

Issues around who gets linked to when stuff gets coalesced.

- What happens when similar activities (like Favoriting & Rating a video) happen at nearby times for the same video? They would get different crosspost:sources, and it would be up to the aggregator to coalesce them.

 

What if I don't want to use Atom? Aren't more APIs becoming built on JSON instead?

- Twitter specifically raised this question as they see most of their API usage via JSON and not Atom.

- The Activity Streams working group is currently developing native (not some crazy programmatic transform) representations for their specifications in JSON. We expect crosspost:source to work in JSON in addition to Atom and RSS.

 

Why not use link rel=via?

- Could it be combined with the link rel 'alternate'? i.e. <link rel="alternate via" href="http://www.youtube.com/watch?v=bBBw9E2Q_aY" />

- Doesn't using a URL mean that we then need to understand canonical and duplicate URLs? An atom:id is not dependent on the URL(s) of the content.

 

Why not use rel=canonical (http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html)?

- It is designed to say that two URLs are equivalent versus that a piece of content is being re-posted somewhere else.

 

Why not use atom:source (http://tools.ietf.org/html/rfc4287#section-4.2.11)?

- It seems like atom:source is intended only for cases where the entire entry is preserved identically.

 

For APIs like Facebook's stream.Publish (and similar on Twitter, etc), how do clients provide a crosspost:source ID value?

- Sites like Facebook, Twitter, and Digg could extend their APIs to expose a new mechanism for clients to provide a crosspost:source ID.

- Open question: How can Facebook (etc) verify the atom:id that is being passed in? Or do they even need to? What are the malicious or just "dumb user" or "dumb developer" issues if the wrong id is passed in? Just broken coalescing?

 

When calling such an API, how do clients indicate whether or not crosspost:source should be included in that service's outbound feed?

- Probably just need to always include crosspost:source if the client provided one.

- YouTube currently provides a default message like "David favorited this awesome video", so checking for a blank message won't work. If they pushed Activity Streams data instead, these sorts of default messages should become less necessary.
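
To make the "always include it if the client provided one" rule concrete, here is a sketch of what a service's outbound feed code might do. Nothing here is a real Facebook or Twitter API: `build_entry` and its `crosspost_source` parameter are hypothetical, and the crosspost namespace URI is a placeholder assumption.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Assumption: placeholder namespace URI, not taken from the spec.
CROSSPOST_NS = "http://example.org/ns/crosspost"


def build_entry(entry_id, message, crosspost_source=None):
    """Build an outbound Atom entry for a status published via the API."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = message
    # Include crosspost:source whenever the publishing client supplied one;
    # no attempt is made here to verify the ID.
    if crosspost_source:
        source = ET.SubElement(entry, f"{{{CROSSPOST_NS}}}source")
        ET.SubElement(source, f"{{{ATOM_NS}}}id").text = crosspost_source
    return entry
```

The result can be serialized with `ET.tostring(entry, encoding="unicode")`. Whether the service should validate the supplied ID remains the open question noted above.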
