Good strategies for dynamically organizing content from multiple automated feeds

12 Feb 2009 - 3:17pm
Gail Swanson
2008

I've got a design problem that I've been trying to solve for some
time now, and just can't seem to find the answer to. Perhaps all of
you out in UX-land have some ideas.

I've been working on a project that has an established taxonomy for
its content (articles, videos, blogs, etc.) but the big challenge is
how to handle content that comes into the site through automatic
feeds from various sources. How can we categorize the information by
topic dynamically?

For example, imagine that a news website imports its article content
from various news services around the globe. None of the sources
include consistent topic data. The metadata available are things
like source and date published, very quantifiable. How would they
provide a way for the user to browse the available content according
to categories like "Sports" and "World News" if they cannot be
derived from the provider?

We're trying to provide a unified experience across the content no
matter the provider and allow the user to browse by topics. We also
need to minimize human intervention because of frequency and volume
of content. Has anyone had success with a similar situation? What
was the solution?

I've been banging my head against the monitor on this one for a few
months and would appreciate any ideas.

Comments

12 Feb 2009 - 4:21pm
Shimone Samuel
2009

Have a look at Yahoo Pipes: http://pipes.yahoo.com

With Pipes you can take an input (your RSS feed for instance), run a
query on it and output the results as a new feed. Provide this custom
feed for your users and you have categorization.
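
For anyone who would rather script this than use a hosted tool, here is a rough sketch of the same idea (take a feed in, run a query on it, emit a new feed). It is not Pipes itself. The sample feed, the keyword list, and the filter_feed helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical incoming feed; a real one would be fetched from the provider.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>All news</title>
    <item><title>Local team wins the cup final</title><description>A late goal decided the match.</description></item>
    <item><title>Parliament passes budget</title><description>The vote was close.</description></item>
    <item><title>Marathon record broken</title><description>A new world record in the final race.</description></item>
  </channel>
</rss>"""


def filter_feed(feed_xml, keywords, new_title):
    """Return a new RSS document keeping only items that mention a keyword."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    channel.find("title").text = new_title
    wanted = [k.lower() for k in keywords]
    # Iterate over a copy of the item list because items are removed in the loop.
    for item in list(channel.findall("item")):
        text = " ".join(
            (item.findtext(tag) or "") for tag in ("title", "description")
        ).lower()
        if not any(k in text for k in wanted):
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


# Assumed keyword query for a "Sports" feed.
sports_feed = filter_feed(SAMPLE_FEED, ["match", "cup", "race", "marathon"], "Sports")
sports_titles = [
    item.findtext("title") for item in ET.fromstring(sports_feed).iter("item")
]
print(sports_titles)
```

Plain substring matching like this is crude, so the keyword list needs the same ongoing care as any other hand-built filter.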

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Posted from the new ixda.org
http://www.ixda.org/discuss?post=38635

12 Feb 2009 - 4:31pm
Katie Albers
2005

Well, my gut reaction is that -- at least for text-based content --
you're down to the content itself. I'm assuming here that you *are*
dealing with news. If there are authors associated with the content,
then generally you'll find they're associated with a certain type of
content (some reporters write about sports, some about DC, some about
war zones, etc.) and that gives you a top level way to sort. Then
within the content you find keywords which can be associated with
current events. Of course, this still means that you have an ongoing
task of keeping those keyword lists up-to-date.
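
To make that concrete, here is a minimal sketch of the author-first, keywords-second idea. The author names, category names, and keyword lists are invented placeholders, and how to handle an item that matches nothing is my own assumption.

```python
import re

# Hypothetical lookup tables; in practice these are the lists someone
# has to keep up to date.
AUTHOR_BEATS = {"Pat Jones": "Sports", "Sam Lee": "World News"}
KEYWORDS = {
    "Sports": {"match", "goal", "league", "tournament"},
    "World News": {"election", "treaty", "summit", "embassy"},
}


def categorize(author, text):
    """Sort by the author's usual beat first, then fall back to keywords."""
    if author in AUTHOR_BEATS:
        return AUTHOR_BEATS[author]
    words = set(re.findall(r"[a-z']+", text.lower()))
    # Count how many of each category's keywords appear in the text.
    hits = {cat: len(words & kws) for cat, kws in KEYWORDS.items()}
    best = max(hits, key=hits.get)
    # Assumption: items with no keyword hits are left for a person to sort.
    return best if hits[best] > 0 else "Uncategorized"


print(categorize("Pat Jones", "Quarterly results were announced."))
print(categorize("Unknown Writer", "The summit ended with a new treaty."))
print(categorize(None, "The weather was pleasant."))
```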

I would hope that the video has better metadata; I'd hate to think of
someone watching hours of video just to sort it.

I hope there's a better way to do this, but that's all I can think of.

Katie Albers
Founder & Principal Consultant
FirstThought
User Experience Strategy & Project Management
310 356 7550
katie at firstthought.com


12 Feb 2009 - 5:57pm
Angel Marquez
2008

Presentation Layer / View
Use an XSLT. <http://www.w3.org/TR/xslt>

Server Side
Have an IA set up like this:
Server 1
  2009
    February
      01_Category_01
        01_article_01.xml
      02_Category_02
      03_Category_03
Server 2

Aggregator Script


12 Feb 2009 - 6:12pm
Angel Marquez
2008

*Presentation Layer / View*

Use an XSLT. <http://www.w3.org/TR/xslt>

*Server Side*

Have the IA set up like this:

*Server 1: Content Server*

2009
  February
    01_Category_01
      01_article_01.xml
      01_article_02.json
    02_Category_02
    03_Category_03
    Other or Miscellaneous
Source
Compare
  Category.xml (authors, keywords, descriptions, custom tags, etc.)

*Server 2: Media Server*

Media
  image
    mobile
    web
    broadcast
  audio
    mobile
    web
    broadcast
  video
    mobile
    web
    broadcast

*Aggregator Script*

Use your server-side scripting language to scan the incoming feeds and
compare them to the category.xml file. (You must make an XML file for each
category and have some logic for what would appear in the header, content,
footer, etc.) If there is a match, toss the item into the appropriate
category folder and route the media to the media server with some sort of
lowest-common-denominator criteria. If there is a questionable comparison,
have it scripted into the other or misc folder. On your front end, script it
so that the front-end script feeds the data into the XSLT, and make sure the
other folder content comes up as something like 'writers choice' or 'our
pick', or have human intervention for those.
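
A rough sketch of that aggregator step, under several assumptions of mine: the category XML layout (keyword and author elements), the scoring rule, the MIN_SCORE cutoff, and the numeric month folder are all invented for illustration, and routing media to the second server is left out.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date
from pathlib import PurePosixPath

# Hypothetical per-category definition files, inlined here as strings.
CATEGORY_XML = {
    "01_Sports": "<category><keyword>match</keyword><keyword>goal</keyword>"
                 "<keyword>league</keyword><author>Pat Jones</author></category>",
    "02_World_News": "<category><keyword>election</keyword><keyword>treaty</keyword>"
                     "<keyword>summit</keyword></category>",
}
MIN_SCORE = 2  # assumed cutoff below which a comparison counts as questionable


def load_category(xml_text):
    """Read the keywords and authors out of one category definition."""
    root = ET.fromstring(xml_text)
    return {
        "keywords": {k.text.lower() for k in root.findall("keyword")},
        "authors": {a.text for a in root.findall("author")},
    }


def score(item, category):
    """One point per matching keyword, two extra for a known author."""
    words = set(re.findall(r"[a-z']+", item["text"].lower()))
    points = len(words & category["keywords"])
    if item.get("author") in category["authors"]:
        points += 2
    return points


def route(item, categories):
    """Pick the destination folder: year/month/category, or Other if unsure."""
    scores = {name: score(item, cat) for name, cat in categories.items()}
    best = max(scores, key=scores.get)
    folder = best if scores[best] >= MIN_SCORE else "Other"
    published = item["published"]
    return PurePosixPath(
        str(published.year), f"{published.month:02d}", folder, item["id"] + ".xml"
    )


categories = {name: load_category(x) for name, x in CATEGORY_XML.items()}
clear_item = {"id": "article_01", "author": None, "published": date(2009, 2, 12),
              "text": "A late goal won the match."}
vague_item = {"id": "article_02", "author": None, "published": date(2009, 2, 12),
              "text": "Officials met at the summit."}
print(route(clear_item, categories))
print(route(vague_item, categories))
```

Anything that lands in Other is what the front end would present as an editor's pick or hand to a person for review.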

Have your interaction designer make the interface make sense and your visual
designer make it look sensible.

$('source-feed').compare('.article-01');

12 Feb 2009 - 6:00pm
Angel Marquez
2008

ooops.
I fat fingered that one. I'll write it up in a text editor and then paste it
in a sec.

12 Feb 2009 - 10:28pm
jwdomb
2006

Hi Gail,

I've used Naive Bayes classification [1] to accomplish similar
things in the past. The method can be used to sort blocks of text
into predefined categories (your taxonomy) based on word frequencies
(in the items that come from automatic feeds). It's a pretty
popular approach for filtering spam out of inboxes, but it can be
used much more generally and with as many categories as you'd like.
Implementation tends to follow these steps:

1. Set up a classifier and categories [2]
2. Train the classifier with sample content for each of your
categories
3. Test the classifier with additional sample content to make sure
it's working reasonably well
4. Refine over time
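
The four steps above can be sketched in a few lines. This is a bare-bones multinomial Naive Bayes with add-one smoothing written from scratch, not POPFile's code or any particular library's API. The class name, the tokenizer, and the tiny training set are all invented for illustration, and a real classifier would need far more sample content per category.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    def __init__(self):
        # Step 1: the classifier; categories are created as they are trained.
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocabulary = set()

    def train(self, category, text):
        # Step 2 (and step 4 later on): feed in labeled sample content.
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the category with the highest log-probability."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            log_prob = math.log(n_docs / total_docs)  # prior for the category
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero out a category.
                log_prob += math.log((counts[word] + 1) / (total_words + vocab_size))
            if log_prob > best_score:
                best_category, best_score = category, log_prob
        return best_category


classifier = NaiveBayes()
classifier.train("Sports", "The striker scored a late goal to win the match")
classifier.train("Sports", "The team won the league title after a tense final")
classifier.train("World News", "The president signed a treaty at the summit")
classifier.train("World News", "Voters went to the polls in the national election")

# Step 3: check the classifier against content it has not seen.
print(classifier.classify("A goal in the final minute won the match"))
print(classifier.classify("Leaders signed the treaty after the election"))
```

Refining over time then amounts to calling train again with each item the classifier got wrong, filed under the correct category.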

A nice reference implementation might be POPFile [3]. POPFile sorts
emails into categories you define and then refine by letting it know
when it's made a mistake. The Wikipedia page on Naive Bayes can
lead you to other methods or you might consider a more advanced
solution like SPSS's Predictive Text Analytics [4].

Sincerely,

Joseph Dombroski

[1] http://en.wikipedia.org/wiki/Naive_Bayesian_classification
[2] Many programming languages have libraries to make this easier.
You can also find software that will help you set these up. I did a
quick search for feed classifiers and found the service
http://rss.knownews.net as well as some software at
http://the.taoofmac.com/space/blog/2006/11/04
[3] http://getpopfile.org/
[4] http://www.spss.com/text_mining_for_clementine/


12 Feb 2009 - 4:10pm
Daniel Gross
2009

Hello Gail,

Looks like you have an information problem -- i.e. too little information at
the source. I guess user tagging of content -- after some broad
classification -- doesn't work.

I once talked to someone at Microsoft about the technology used for help
suggestions -- they parse the text a user types, and an algorithm tries to
derive a relevant help topic ... but it's not perfect, of course.

Daniel


14 Feb 2009 - 9:08pm
Gail Swanson
2008

Great info everybody. Thanks! I've some good leads on what I need
to research and some options that are available.

