Tuesday, November 17, 2009

PAWS meeting - Nov 10, 2009

In today's meeting, there were two presentations. In the first one, Rosta presented an introduction to the theory behind people's participation in organizations, which comes mainly from the organizational domain. She then presented a two-step study of socialization tactics in several WikiPage projects. The study focused on the participation of contributors (not viewers). The first part observed the behavior of users; the second part measured the impact of socialization techniques, especially on newcomers. One interesting result was that personalized messages produced more participation among newcomers than standardized messages.

In the second presentation, Denis presented the paper "The Effect of Correlation Coefficients on Communities of Recommenders" by Lathia, Hailes and Capra from University College London. The authors compare different similarity measures, showing their distributions (using MovieLens as the dataset) and comparing their accuracy (MAE, mean absolute error) and coverage results.

They report an interesting result: the choice of similarity coefficient has no significant impact on accuracy compared to the neighborhood size. The experiments show that in some cases a random similarity measure between users can yield better prediction accuracy when a large number of users is included in the neighborhood. The paper ends with a discussion of these unexpected results. Among the three issues discussed (criticism of accuracy metrics, data sparsity, and the use of user-based similarity), the authors highlight that their results show a lack of support for user-based similarity as a measure that captures an important factor in providing recommendations. Different similarity metrics produce different rankings and distributions, but none of them clearly outperforms the others.
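To make the comparison concrete, here is a minimal sketch of a user-based nearest-neighbour predictor with a pluggable similarity function, evaluated by MAE and coverage. It is not the authors' code: the toy ratings, the function names, the mean-centred weighted average, and the way the random similarity is drawn are all assumptions made for illustration, and the numbers it produces say nothing about MovieLens.

```python
import math
import random

# Toy ratings (hypothetical data, not MovieLens): user -> {item: rating}
RATINGS = {
    "u1": {"a": 5, "b": 4, "c": 1, "d": 2},
    "u2": {"a": 4, "b": 5, "c": 2, "d": 1},
    "u3": {"a": 1, "b": 2, "c": 5, "d": 4},
    "u4": {"a": 2, "b": 1, "c": 4, "d": 5},
    "u5": {"a": 5, "b": 5, "c": 2},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(u, v, ratings):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    mu = mean(ratings[u][i] for i in common)
    mv = mean(ratings[v][i] for i in common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)
                    * sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0


def make_random_similarity(seed=0):
    """Similarity that ignores the ratings (assumed uniform in [-1, 1])."""
    rng = random.Random(seed)
    return lambda u, v, ratings: rng.uniform(-1.0, 1.0)


def predict(user, item, ratings, similarity, k):
    """Mean-centred weighted average over the k most similar raters of item."""
    own = {i: r for i, r in ratings[user].items() if i != item}
    if not own:
        return None
    train = dict(ratings)
    train[user] = own  # hide the held-out rating from the similarity
    scored = [(similarity(user, v, train), v)
              for v in train if v != user and item in train[v]]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [(s, v) for s, v in scored[:k] if s != 0.0]
    norm = sum(abs(s) for s, _ in top)
    if norm == 0.0:
        return None  # no usable neighbour: this rating is not covered
    offset = sum(s * (train[v][item] - mean(train[v].values()))
                 for s, v in top) / norm
    return mean(own.values()) + offset


def evaluate(ratings, similarity, k):
    """Hold out each rating in turn; return (MAE, coverage)."""
    errors, total = [], 0
    for user, items in ratings.items():
        for item, actual in items.items():
            total += 1
            predicted = predict(user, item, ratings, similarity, k)
            if predicted is not None:
                errors.append(abs(predicted - actual))
    mae = mean(errors) if errors else float("nan")
    return mae, len(errors) / total


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, evaluate(RATINGS, pearson, k),
              evaluate(RATINGS, make_random_similarity(), k))
```

Swapping the `similarity` argument while varying `k` is the kind of experiment the paper describes; on a real dataset the held-out ratings would come from a proper train/test split rather than this tiny leave-one-out loop.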

One open question is: Can this result be extrapolated to other datasets? Other questions: Which kind of similarity measure could better capture the concept of "word-of-mouth"? Do we really know what the ratings mean and how to use them to provide better recommendations?


Tuesday, September 08, 2009

PAWS meeting - Sep 8, 2009

The second meeting of the Fall 2009 semester is devoted to discussing selected papers from the UMAP 2009 conference. The two papers presented focus on mobile recommender systems.

Paper 1. Bohnert, F. and Zukerman, I. (2009). Non-intrusive personalisation of the museum experience. In Houben, G.-J., McCalla, G. I., Pianesi, F., and Zancanaro, M., (Eds.), 17th International Conference on User Modeling, Adaptation, and Personalization (UMAP 2009), pp. 307–318, Trento, Italy.

The authors use a special hand-operated tool, Geckotracker, to obtain tracking data from visitors to Melbourne Museum. In particular, the exhibits of interest and viewing times are collected. The data is then analyzed to build a prediction model capable of recommending unvisited exhibits. Actual log viewing times are used as the primary measure of user interest. Several competing models are built, and the leave-one-out method is used to estimate their performance.
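As a rough illustration of leave-one-out evaluation on log viewing times, the sketch below holds out one observation at a time and scores a deliberately simple baseline. Everything in it is assumed for the example: the viewing times are invented, and `exhibit_mean_model` is a hypothetical baseline, not one of the models compared in the paper.

```python
import math

# Hypothetical viewing times in seconds: visitor -> {exhibit: seconds}
VIEWING_TIMES = {
    "v1": {"dinosaurs": 120.0, "insects": 30.0, "minerals": 45.0},
    "v2": {"dinosaurs": 200.0, "insects": 20.0},
    "v3": {"dinosaurs": 90.0, "minerals": 60.0},
    "v4": {"insects": 40.0, "minerals": 80.0},
}


def exhibit_mean_model(train, visitor, exhibit):
    """Baseline: mean log viewing time of the exhibit in the training data."""
    logs = [math.log(times[exhibit])
            for times in train.values() if exhibit in times]
    return sum(logs) / len(logs) if logs else None


def without(data, visitor, exhibit):
    """Copy of the data with a single (visitor, exhibit) observation removed."""
    return {v: {e: s for e, s in times.items()
                if not (v == visitor and e == exhibit)}
            for v, times in data.items()}


def leave_one_out_mae(data, model):
    """Mean absolute error in log seconds over all held-out observations."""
    errors = []
    for visitor, times in data.items():
        for exhibit, seconds in times.items():
            train = without(data, visitor, exhibit)
            predicted = model(train, visitor, exhibit)
            if predicted is not None:
                errors.append(abs(predicted - math.log(seconds)))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(leave_one_out_mae(VIEWING_TIMES, exhibit_mean_model))
```

Competing models can be compared by passing different `model` functions to `leave_one_out_mae` on the same data.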

Paper 2. Partridge, K. and Price, B. (2009). Enhancing mobile recommender systems with activity inference. In Houben, G.-J., McCalla, G. I., Pianesi, F., and Zancanaro, M., (Eds.), 17th International Conference on User Modeling, Adaptation, and Personalization (UMAP 2009), pp. 307–318, Trento, Italy.

The paper focuses on Magitti, a mobile activity recommender system. Magitti recommends five classes of activities: eat, shop, do, see, and read. The data used for building several alternative recommender models was provided by the Japan Statistics Bureau and covers 10 000 people reporting their activity every 15 minutes during one whole day. The recommender models take into account several factors, including location, surrounding venues, time of day, personal calendar, etc. Magitti has gone through a small-scale evaluation with 11 users (researchers and administrative staff). Results show that a combination of location-based and personal-activity-pattern models works best.

Also discussed:
- UMUAI Special Issue on Educational Data Mining
- UMAP 2009 Proceedings

Wednesday, April 15, 2009

PAWS meeting - Apr 15, 2009

Jae-wook presenting the paper "Combining document representations for known item search" by Paul Ogilvie and Jamie Callan

The paper investigates the pre-conditions for successful combination of document representations formed from structural markup for the task of known-item search. As this task is very similar to work in meta-search and data fusion, we adapt several hypotheses from those research areas and investigate them in this context. To investigate these hypotheses, we present a mixture-based language model and also examine many of the current meta-search algorithms. We find that compatible output from systems is important for successful combination of document representations. We also demonstrate that combining low performing document representations can improve performance, but not consistently. We find that the techniques best suited for this task are robust to the inclusion of poorly performing document representations. We also explore the role of variance of results across systems and its impact on the performance of fusion, with the surprising result that the correct documents have higher variance across document representations than highly ranking incorrect documents.
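A mixture-based language model over document representations can be sketched as follows: each query word is scored by a weighted sum of per-representation unigram models, and the word probabilities are multiplied (summed in log space). This is only an illustration of the general idea. The representation names (`title`, `body`), the weights, the toy documents, and the simple linear smoothing against a collection model are assumptions, and the paper's actual estimation details are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical documents, each with two representations from structural markup
DOCS = {
    "d1": {"title": ["paws", "lab"],
           "body": ["adaptive", "web", "systems", "research"]},
    "d2": {"title": ["web", "search"],
           "body": ["paws", "mentioned", "once", "in", "body"]},
}
WEIGHTS = {"title": 0.6, "body": 0.4}  # assumed mixture weights, sum to 1


def collection_model(docs):
    """Unigram probabilities over all tokens of all representations."""
    counts = Counter(tok for reps in docs.values()
                     for toks in reps.values() for tok in toks)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def rep_prob(word, tokens, p_coll, beta):
    """P(word | representation), linearly smoothed with the collection model."""
    p_rep = tokens.count(word) / len(tokens) if tokens else 0.0
    return (1.0 - beta) * p_rep + beta * p_coll.get(word, 0.0)


def log_score(query, reps, weights, p_coll, beta=0.2):
    """log P(query | doc) = sum_w log sum_r weight_r * P(w | rep_r)."""
    total = 0.0
    for word in query:
        p = sum(weights[r] * rep_prob(word, reps.get(r, []), p_coll, beta)
                for r in weights)
        if p == 0.0:
            return float("-inf")  # word unseen in the whole collection
        total += math.log(p)
    return total


def rank(query, docs, weights, beta=0.2):
    """Document ids ordered from best to worst query likelihood."""
    p_coll = collection_model(docs)
    return sorted(docs, reverse=True,
                  key=lambda d: log_score(query, docs[d], weights, p_coll, beta))


if __name__ == "__main__":
    print(rank(["paws", "lab"], DOCS, WEIGHTS))
```

Setting one weight to zero drops that representation from the mixture, which is one way to probe how robust the combination is to a poorly performing representation.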

Dhruba Baishya presenting a set of innovative visualization techniques, including:

- eigen factor score
- dewey circles
- ny times api
- flickr ecosystem
- ted sphere
- radial social network
- knowledge network
- author co-citation
- euro2004
- web trend map
- los ojos del mundo
- botanical tree
- tagging behavior in nicovideo
- flickr group

Related links
- http://www.visualcomplexity.com/vc/
- http://infosthetics.com/
- http://developer.nytimes.com/visualizations_app/

Wednesday, March 25, 2009

PAWS Meeting (25 March, 2009)

1. Denis discussed his experiment on Recommendation for CiteULike.

2. Danielle presented a paper about “Tagsplanation” - best paper at IUI’09.
The authors investigated the use of tags for generating and explaining recommendations.

3. Tomek presented a paper that describes Document Summarization based on Eye-Tracking with Web-cam. The paper raised some concerns.

4. Katrina (from Switzerland ;-) introduced her research on culturally-adaptive user interfaces. The system is modeling users’ culture along several dimensions (nationality, religion, education, etc.) and tries to adjust its interface.

Wednesday, March 18, 2009

PAWS Meeting - Mar 18, 2009

1. Jeniffer presented an article entitled "Accuracy in rating and recommending item features." The paper discusses work aimed at comparing item- and feature-based ratings of images of artwork.

2. Denis presented an article entitled "Clustering the tagged Web." The paper investigates the impact of user-generated tags on improving Web document clustering. Denis also talked about the progress he made on his project and suggested that having Chilean wine while not sleeping enough may be a good idea.

Friday, February 27, 2009

PAWS meeting - Feb 25, 2009

Comments on the first presentation here.

In the second part of the meeting, Zhen presented her work on collaborative information behavior (CIB). She studied aspects of CIB by simulating e-discovery tasks and obtained several insights: communication is frequent and an essential component of CIB; division of labor is common in the collaborative task of e-discovery; and it is important for collaborators to maintain “awareness” of each other’s activities to make sure the collaboration goes well. Based on these insights, some functions for retrieval systems that support CIB were proposed: collaborative information retrieval (CIR) technologies should support collaborative information behaviors, including verbal communication and text exchange, and a well-designed CIR system should support both synchronous and asynchronous collaboration.

Tuesday, February 10, 2009

[week 5] Muddiest Points

When we talked about the Probability Ranking Principle, the case of retrieval costs was mentioned, but we didn't go deeper into that concept. In the case that costs are taken into account in a retrieval model:
  • What values are commonly used for these "C" costs?
  • Which variables or factors are taken into account to set these cost values (hardware, size of collection, mean size of documents, etc.)?
Another question I have is that several models include constants, and after some experiments the authors suggest a range for the values of those constants when defining a model. I am not absolutely sure, but these ranges must be obtained using metrics such as precision and recall, and the document collections must come from a programme like TREC. Is this enough to establish a model? It seems that all the theory for these probabilistic and language models is, in practice, oversimplified by smoothing factors and other constants added to the models. How can we be sure that the theory still holds when adding these factors and constants in practice?
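One way to make the cost question concrete is a small sketch of ranking by expected cost, assuming the common formulation with a single cost for retrieving a relevant document and a single cost for retrieving a non-relevant one. The function names and the probabilities below are invented for the example. Since the expected cost is linear in the probability of relevance, any pair of costs where a relevant document is cheaper than a non-relevant one gives the same ordering as ranking by probability alone.

```python
def expected_cost(p_relevant, cost_relevant, cost_nonrelevant):
    """Expected cost of retrieving a document with relevance probability p."""
    return cost_relevant * p_relevant + cost_nonrelevant * (1.0 - p_relevant)


def rank_by_cost(probabilities, cost_relevant, cost_nonrelevant):
    """Document ids ordered by increasing expected retrieval cost."""
    return sorted(
        probabilities,
        key=lambda d: expected_cost(probabilities[d],
                                    cost_relevant, cost_nonrelevant),
    )


def rank_by_probability(probabilities):
    """Plain Probability Ranking Principle: decreasing P(relevant)."""
    return sorted(probabilities, key=probabilities.get, reverse=True)


# Hypothetical relevance probabilities for three documents
PROBS = {"d1": 0.9, "d2": 0.2, "d3": 0.5}

if __name__ == "__main__":
    print(rank_by_probability(PROBS))
    print(rank_by_cost(PROBS, cost_relevant=0.0, cost_nonrelevant=1.0))
    print(rank_by_cost(PROBS, cost_relevant=1.0, cost_nonrelevant=5.0))
```

Under this assumption the exact cost values only matter if they differ from document to document, which is where factors such as document length could come in.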