Friday, September 28, 2007

PAWS Meeting 2007-09-28

Paper presentation by Danielle
Khan, L., McLeod, D., and Hovy, E. 2004. Retrieval effectiveness of an ontology-based model for information selection. The VLDB Journal 13, 1 (Jan. 2004), 71-85. DOI= http://dx.doi.org/10.1007/s00778-003-0105-1

Technology in the field of digital media generates huge amounts of nontextual information, audio, video, and images, along with more familiar textual information. The potential for exchange and retrieval of information is vast and daunting. The key problem in achieving efficient and user-friendly retrieval is the development of a search mechanism to guarantee delivery of minimal irrelevant information (high precision) while insuring relevant information is not overlooked (high recall). The traditional solution employs keyword-based search. The only documents retrieved are those containing user-specified keywords. But many documents convey desired semantic information without containing these keywords. This limitation is frequently addressed through query expansion mechanisms based on the statistical co-occurrence of terms. Recall is increased, but at the expense of deteriorating precision. Focusing on audio data, we have constructed a demonstration prototype. We have experimentally and analytically shown that our model, compared to keyword search, achieves a significantly higher degree of precision and recall. The techniques employed can be applied to the problem of information selection in all media types.
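
As a reminder of how the two measures in the abstract are computed, here is a minimal sketch. The document IDs and result sets are made up for illustration and do not come from the paper; they only mimic the situation the abstract describes, where query expansion raises recall while precision drops.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = fraction of retrieved documents that are relevant
    recall    = fraction of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents are truly relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# Plain keyword search finds few documents, half of them relevant.
keyword_results = {"d1", "d5"}

# Query expansion finds more relevant documents, but also more noise.
expanded_results = {"d1", "d2", "d3", "d5", "d6", "d7", "d8", "d9"}

print(precision_recall(keyword_results, relevant))   # (0.5, 0.25)
print(precision_recall(expanded_results, relevant))  # (0.375, 0.75)
```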

Discussion
Decided to keep reading logs in private www.wordpress.com blogs visible only to group members.


Friday, September 21, 2007

PAWS Meeting 2007-09-21

Chirayu presented a paper titled "The folksonomy tag cloud: when is it useful?" In this paper, the authors set out to assess the utility of the well-known "tag cloud" (TC). Although it is already used in many commercial Web services such as Flickr, its utility has not been clearly established so far. Even Thomas Vander Wal, who coined the term "folksonomy", referred to TCs as cute but of little value.
They performed an experiment comparing two information-seeking options, searching vs. TC, to find out whether subjects perceived the TC as really useful. The subjects were given 10 information-seeking tasks for 10 articles (per task). The system showed the top 70 user tags and a search box they could use.
The experimental results show that the TC was favoured by the subjects in the experiment sessions: 48.0% vs. 41.2%. People who relied on the TC made more queries (higher efficiency) even when relevant keywords were in the TC. Their conclusions include that the TC was useful for browsing and non-specific information discovery, that it provided visual summaries, and that it required less cognitive load.
What was clear from this study was the strength of browsing compared with ad-hoc searching (as in our study with KnowledgeSea). The TC provided a new browsing option, but the study compared it only against simple searching and did not consider other browsing options. It may be necessary to examine exactly how the TC is received compared with other browsing techniques, because the results of this paper may be due to browsing itself rather than to the TC specifically.
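
For readers who have not built one, the "top 70 user tags" display used in the experiment could be produced roughly as follows. This is only a sketch under our own assumptions: the notes do not say how the paper's system sized or ordered its tags, so the linear font scaling, the point sizes, and the alphabetical ordering here are illustrative choices, and the sample tags are invented.

```python
from collections import Counter


def build_tag_cloud(tags, top_n=70, min_size=10, max_size=32):
    """Return [(tag, font_size)] for the top_n most frequent tags.

    Font size grows linearly with tag frequency (an assumed scheme),
    and the result is sorted alphabetically, as tag clouds often are.
    """
    counts = Counter(tags).most_common(top_n)
    if not counts:
        return []
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = highest - lowest
    cloud = []
    for tag, count in counts:
        # If every tag has the same count, show them all at max size.
        weight = (count - lowest) / span if span else 1.0
        size = round(min_size + weight * (max_size - min_size))
        cloud.append((tag, size))
    return sorted(cloud)


# Invented sample data: one entry per tag assignment by some user.
sample_tags = ["web"] * 5 + ["search"] * 3 + ["ontology"] * 1
cloud = build_tag_cloud(sample_tags)
print(cloud)  # [('ontology', 10), ('search', 21), ('web', 32)]
```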

Friday, September 14, 2007

20070914 PAWS meeting: Ontology-based Annotation

Ontology-based Annotation by Sergey Sosnovsky

provide semantic annotations
knowledge sharing
indexing by humans

motivation:
using ontologies in applications
definition: creating a markup of web documents using a pre-existing ontology and/or populating knowledge bases with marked-up documents.

important characteristics: automation; format; languages; etc.

SMORE by UMaryland:
it does extract data from documents, but it still relies on humans to edit and index the concepts.
but there are problems with manual annotation, e.g. time, expense, storage, trust
solution: a search-engine-like annotation service

O-based Annotation:
supervised: e.g. MnM, where a human user can accept or reject the annotation.
unsupervised: e.g. Amilcare; Annie; T-REX; Pankow: uses templates, queries them against Google, and collects the hits
SemTag (by IBM): it extracts from a huge number of web pages and automatically generates a huge number of disambiguated semantic tags
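
To make the template-and-hits idea mentioned for Pankow more concrete, here is a rough sketch of the general approach, not the actual Pankow implementation. The three phrase templates, the naive pluralization, the candidate concepts, and all hit counts are our own assumptions; `fake_count_hits` is a hypothetical stand-in for a search engine query and calls no real API.

```python
# Assumed phrase templates; each is filled with a candidate concept
# and the instance we want to annotate.
PATTERNS = [
    "{concept}s such as {instance}",   # naive plural: just append "s"
    "{instance} is a {concept}",
    "{instance} and other {concept}s",
]


def annotate(instance, concepts, count_hits):
    """Pick the concept whose filled-in templates get the most hits.

    count_hits is any function mapping a query string to a hit count.
    Returns (best_concept, {concept: total_hits}).
    """
    scores = {}
    for concept in concepts:
        queries = [p.format(concept=concept, instance=instance) for p in PATTERNS]
        scores[concept] = sum(count_hits(q) for q in queries)
    best = max(scores, key=scores.get)
    return best, scores


# Made-up hit counts standing in for search engine results.
FAKE_HITS = {
    "towns such as Hannover": 120,
    "Hannover is a town": 300,
    "Hannover and other towns": 80,
    "Hannover is a hotel": 5,
    "rivers such as Hannover": 1,
}


def fake_count_hits(query):
    """Hypothetical search stub: look the query up in FAKE_HITS."""
    return FAKE_HITS.get(query, 0)


best, scores = annotate("Hannover", ["town", "river", "hotel"], fake_count_hits)
print(best)    # town
print(scores)  # {'town': 500, 'river': 1, 'hotel': 5}
```

No human is in the loop here, which is what makes the approach unsupervised; the trade-off is that the annotation is only as good as the templates and the hit counts behind them.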
Conclusion:
it's a necessary thing
manual is bad, automatic is good

Questions/Comments:
- 2006 AAAI paper: there is a work on codes & errors; what is the value of mining the database? What is the relation between this and the C parser? (The C parser is not annotation; it is a formal structured grammar.)
- common sense is certainly more than grammar

-
Adaptive Hypermedia Technology <--------------------> simple key words
(meaningful concepts)
we could consider using meaningful parsers and the other technologies mentioned earlier in this talk to bridge these two.


Personal Services - Summer Visiting Researchmanship and Implications for Adapt2.
A group member talked about his one-and-a-half-month stay in Hannover, Germany. While there, they worked on the personalization component of the Personal Reader application. The presentation is available on the PAWS website. That work (as far as the minutes syndicator understood) focused, at least partially, on bringing a common communication layer to all services responsible for generating the content of an adaptive information access system. The designers chose RDF and RSS. More details are in the presentation slides. The presenter also showed a few nice pictures from Hannover.


Paper Summaries Repository
A discussion on summarizing and sharing papers was initiated. Given the choices, the majority of people wanted to use CiteULike. Members would store summaries in a separate place, perhaps a web log. No one objected to avoiding Google as a web log provider, in order to keep from them yet another piece of the sensitive personal information they are collecting every second.