Asa Dotzler: Firefox and more

October 13, 2007

yahoo pipes project

If there's anyone reading who'd like to take on an interesting Yahoo Pipes project that would be a big benefit to the Mozilla community, I could use some help.

Here's what I'm after. I'd like to take the feed results of a "firefox" search at Google Blog Search (and various other blog search services) and fetch the full content of each feed item, using that to rebuild the results feed with full content rather than excerpts.

Google Blog Search returns a feed containing a series of entries that reference blog posts mentioning Firefox. See here.

Each of the entries in this feed contains the basic title, permalink, author and date information, and a short excerpt of the content.

What I'd like is to use Yahoo Pipes to transform this into a feed that contains the full content of each post rather than that short excerpt - at least for sites that offer full content. This shouldn't be too complicated, but so far I've been unable to make it happen.

(Alternately, Google and the other search services could just start providing full content feeds rather than short excerpts.)

I've been looking over the Yahoo Pipes features, and it looks like this should all be possible. The Google Blog Search results feed contains the permalink to each post, so with the Fetch Site Feed module, Pipes should be able to retrieve each site's full feed. With some simple term matching, it should then be possible to pull out the specific feed item that mentioned Firefox and use that content to rebuild the original aggregated feed.
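To make the matching step concrete outside of Pipes, here is a minimal Python sketch. It is only an illustration under stated assumptions: feed entries are modeled as plain dicts with hypothetical `title`, `link`, and `content` keys, and the site's feed items are assumed to be already fetched. The exact permalink comparison is an addition of this sketch, tried before the simple term matching described above.

```python
def find_full_item(entry, site_items, term="firefox"):
    """Return the site feed item that corresponds to a search-result entry, or None."""
    # Assumption of this sketch (not from the post): an exact permalink
    # match is the most reliable signal, so try it first.
    for item in site_items:
        if item.get("link") == entry.get("link"):
            return item
    # Otherwise fall back to simple, case-insensitive term matching.
    needle = term.lower()
    for item in site_items:
        if needle in (item.get("content") or "").lower():
            return item
    return None


def rebuild_entry(entry, site_items, term="firefox"):
    """Copy the search-result entry, swapping its excerpt for the full content.

    Returns None when no corresponding item is found in the site's feed.
    """
    item = find_full_item(entry, site_items, term)
    if item is None:
        return None
    rebuilt = dict(entry)  # keep title, permalink, author, date as-is
    rebuilt["content"] = item.get("content", entry.get("content", ""))
    return rebuilt
```

Term matching alone can pick the wrong post when a blog mentions Firefox in several recent items, which is why the sketch checks the permalink first.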

If you've got more Yahoo Pipes skills than me, and would like to help out with the relaunch of Mozilla's For The Record program, please let me know and I can try to go more in depth here.

Posted by asa at 11:11 AM

 

reactions, thoughts, comments, etc.

Hey Asa,

What's your take on the two full-time Thunderbird developers leaving the project? You posted an entry a while back saying that Thunderbird wasn't dead because it had these two developers assigned. Looks like they've abandoned a sinking ship, no?

Posted by: Guy Smiley | October 13, 2007 2:38 PM

Guy, how about not posting wildly off-topic comments here. Thanks.

- A

Posted by: Asa Dotzler | October 13, 2007 4:06 PM

Sorry about going off-topic, but I hadn't heard about the two devs leaving Thunderbird. I would also like to hear your take on this development. Once again, sorry.

On the actual subject: while I understand your need to read full content without clicking too much, I believe the authors of the blogs own their intellectual property, so it is only fair that you have to click through to their pages in order for them to earn money from ads.

Posted by: Frederik | October 13, 2007 4:16 PM

Frederik, I'm explicitly talking about blog owners who offer full content feeds. Many do, and the problem I'm experiencing is that the sites that index these blogs (Google Blog Search, Technorati, and others) don't include the full content in their search results feeds.

So, to recap: for sites that do provide full content feeds, when I get their posts aggregated in a search results feed from Google or Technorati, I want the actual content they offer to their feed subscribers, not the truncated content that Google and Technorati offer.

Does that make sense?

- A

Posted by: Asa Dotzler | October 13, 2007 4:57 PM

Hmmm, I'm not sure if this is what you had in mind, but check this pipe out:

This runs the Google Blog Search link you posted above and passes its results through the Loop module with a Fetch Site Feed injected into it. What happens is that for each link returned by the Google search, the target page gets retrieved, its feeds get enumerated, the first one is picked, and its results are aggregated into the Loop module's output. Then this output is fed through a Filter module, which simply looks through the description field to filter out the items that do not contain "Firefox" in their description. The result is everything posted on the blogs' feeds that contains the word "Firefox".

I'd like to know if this is near to what you had in mind! Remember, this is the first pipe I've ever created. :-)

Ehsan

Posted by: Ehsan Akhgari | October 14, 2007 1:16 PM
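For readers who don't use Pipes, the flow Ehsan describes (loop over the results, autodiscover each page's feeds, take the first, then filter on the description) can be approximated in Python. This is a rough sketch, not the pipe itself: `fetch_page` and `fetch_feed_items` are hypothetical callables supplied by the caller, items are plain dicts, and the case-insensitive comparison is a choice of this sketch. Autodiscovery here only looks for `<link rel="alternate">` tags with an RSS or Atom type.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect the href of every autodiscoverable feed link on a page."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        kind = (attr.get("type") or "").lower()
        if "alternate" in rel and kind in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def first_feed_url(html):
    """Enumerate the page's feeds and pick the first one, or None."""
    parser = FeedLinkParser()
    parser.feed(html)
    return parser.feeds[0] if parser.feeds else None


def filter_items(items, term="Firefox"):
    """Keep only items whose description mentions the term."""
    needle = term.lower()
    return [i for i in items if needle in (i.get("description") or "").lower()]


def run_pipe(results, fetch_page, fetch_feed_items, term="Firefox"):
    """Loop over search results, aggregate each site's feed, then filter."""
    collected = []
    for result in results:
        feed_url = first_feed_url(fetch_page(result["link"]))
        if feed_url is None:
            continue  # no discoverable feed: this result drops out
        collected.extend(fetch_feed_items(feed_url))
    return filter_items(collected, term)
```

Note that, as written, any result whose page exposes no feed simply disappears from the output.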

Oops, the URL got messed up in the previous comment because I have a habit of putting URLs in brackets...

Here's the URL:

http://pipes.yahoo.com/pipes/pipe.info?_id=NH5AYph63BGZptQo9IS63A

Ehsan

Posted by: Ehsan Akhgari | October 14, 2007 1:18 PM

Ehsan, that's looking great. Could you split the pipe so that items for which we can't find full content (or can't even autodiscover a feed) pass through unchanged, so we don't drop them?

- A

Posted by: Asa Dotzler | October 15, 2007 12:25 PM
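The pass-through Asa asks for amounts to a fallback: enrich an item when full content is available, otherwise keep the original excerpt item. A minimal Python sketch of that idea follows. `lookup_full_content` is a hypothetical callable that returns the full text for a result or `None` when no feed or full content could be found, and results are modeled as plain dicts.

```python
def merge_with_pass_through(results, lookup_full_content):
    """Return one item per search result, in the original order.

    Items with full content get it swapped in; all others pass through
    unchanged, so nothing is dropped.
    """
    merged = []
    for result in results:
        full = lookup_full_content(result)
        if full is None:
            merged.append(result)  # pass-through branch: keep the excerpt
        else:
            merged.append({**result, "content": full})  # enriched branch
    return merged
```

The output always has the same length as the input, which is the property the request is after.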
