Wednesday, March 18, 2009

PAWS Meeting - Mar 18, 2009

1. Jennifer presented an article entitled "Accuracy in rating and recommending item features." The paper compares item-based and feature-based ratings of images of artwork.

2. Denis presented an article entitled "Clustering the tagged Web." The paper investigates whether user-generated tags can improve Web document clustering. Denis also talked about the progress he made on his project and suggested that having Chilean wine while not sleeping enough may be a good idea.

2 Comments:

Blogger Peter B. said...

I think the paper presented by Denis is quite important in the context of our work and would like to see more comments on it. As we are getting more and more ways to describe data (with keywords, concepts, tags, etc.), it is important to understand how these different approaches work and how they can creatively coexist. At least for the work of Jennifer and Jaewook, the presented results could give some guidance, pushing us to choose one of the several approaches we were considering.

March 20, 2009  
Blogger Denis Parra said...

It could be interesting to check the post about this paper written by one of the authors:

http://infoblog.stanford.edu/2008/11/clustering-tagged-web-posted-by-daniel.html

He highlights the main findings of their research. Of special note is the weighting model, about which he says: "We found that weighting all word dimensions in aggregate equally to (or with some constant multiple of) all the tag dimensions resulted in the best performance... if you normalize tag dimensions and word dimensions independently, then your clustering model improves. This insight may well apply to other tasks in the VSM."
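
To make the weighting idea concrete, here is a minimal sketch of how independent normalization might look in a vector space model. It is only an illustration under my own assumptions, not the authors' code: the function names (`l2_normalize`, `combine`), the use of L2 normalization, and the `tag_weight` parameter standing in for the "constant multiple" are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a sparse {term: weight} vector to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0.0:
        return dict(vec)  # nothing to scale in an empty or all-zero vector
    return {term: w / norm for term, w in vec.items()}


def combine(words, tags, tag_weight=1.0):
    """Normalize the word block and the tag block separately, then join them.

    Each block ends up with the same total mass regardless of how many
    words or tags the document has; tag_weight (an assumed knob) scales
    the tag block as a whole.
    """
    combined = {("word", t): w for t, w in l2_normalize(words).items()}
    for t, w in l2_normalize(tags).items():
        combined[("tag", t)] = tag_weight * w
    return combined


# Toy document: large raw word counts, small raw tag counts.
doc = combine({"wine": 30, "chile": 40}, {"travel": 3, "food": 4})
```

Without the separate normalization, the raw word counts (30 and 40) would swamp the tag counts (3 and 4) in any distance computation. After it, both blocks have unit length, so tags and words contribute equally in aggregate.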

About the name MM-LDA (Multi-Multinomial Latent Dirichlet Allocation), in the comments on the post Bob Carpenter wrote:

<< "MM-LDA" is what Newman, Chemudugunta and Smyth in their KDD '06 paper "Statistical Entity-Topic Models" called "CI-LDA". The "CI" was for "conditionally independent", because the two views (tags and words here; entities and words in their paper) are being generated independently given the document's topic distribution.>>
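
The "conditionally independent" part can be sketched as a toy generative process: one topic distribution per document, from which words and tags are each drawn through their own per-topic distributions. This is my own rough illustration, assuming a standard LDA-style setup. The names (`generate_document`, `word_topics`, `tag_topics`) and the toy numbers are hypothetical and not taken from either paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, n_tags, alpha, word_topics, tag_topics, rng):
    """Generate word ids and tag ids that share one topic distribution."""
    theta = sample_dirichlet(alpha, rng)  # the document's topic distribution
    words = []
    for _ in range(n_words):
        z = sample_index(theta, rng)  # topic for this word
        words.append(sample_index(word_topics[z], rng))
    tags = []
    for _ in range(n_tags):
        z = sample_index(theta, rng)  # topic for this tag, drawn separately
        tags.append(sample_index(tag_topics[z], rng))
    return theta, words, tags


# Two topics, three word types, two tag types (toy values).
rng = random.Random(0)
theta, words, tags = generate_document(
    n_words=5,
    n_tags=3,
    alpha=[1.0, 1.0],
    word_topics=[[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]],
    tag_topics=[[1.0, 0.0], [0.0, 1.0]],
    rng=rng,
)
```

The only thing the word loop and the tag loop share is `theta`; given `theta`, neither view influences the other, which is what the "CI" in the name refers to.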

March 20, 2009  
