### PAWS Meeting 2007-03-02

1. English for Dummies: Searching for Rules versus Exploiting Examples

Presented by Angela Brunstein

In our recent study of the Chemnitz Internet Grammar, we replicated and extended our earlier results on that system by presenting one and the same version of the grammar to all students and by unifying tests and instructions. Altogether, 200 students worked through two chapters of the grammar, on Present Continuous and Present Perfect, for 30 minutes. They either searched the grammar for specific contents or explored it on their own. Before and after working with the grammar, they answered factual questions and completed application tests. In line with the literature and our earlier results, students with the specific search task answered questions related to the contents they had searched for in more detail. In contrast, students with the unspecific exploration task answered more questions, but in less detail. More interestingly, the two groups differed remarkably when applying their knowledge. When learning by examples, students who explored the grammar on their own improved their skills more than students who had searched for specific contents. A similar trend appeared for learning by rules and examples, but not for learning by rules only. In that case, guiding instructions that highlight the most relevant facts in the grammar seem to enhance learning.

2. PEBL: Web Page Classification without Negative Examples

Presented by Chirayu Wongchokprasitti

Web page classification is one of the essential techniques for Web mining because classifying Web pages of an interesting class is often the first step of mining the Web. However, constructing a classifier for an interesting class requires laborious preprocessing such as collecting positive and negative training examples. For instance, in order to construct a “homepage” classifier, one needs to collect a sample of homepages (positive examples) and a sample of non-homepages (negative examples). In particular, collecting negative training examples requires arduous work and caution to avoid bias. This paper presents a framework, called Positive Example Based Learning (PEBL), for Web page classification which eliminates the need for manually collecting negative training examples in preprocessing. The PEBL framework applies an algorithm, called Mapping-Convergence (M-C), to achieve classification accuracy (with positive and unlabeled data) as high as that of a traditional SVM (with positive and negative data). M-C runs in two stages: the mapping stage and the convergence stage. In the mapping stage, the algorithm uses a weak classifier to draw an initial approximation of “strong” negative data. Based on this initial approximation, the convergence stage iteratively runs an internal classifier (e.g., SVM) that maximizes margins to progressively improve the approximation of the negative data. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. We present the M-C algorithm with supporting theoretical and experimental justifications. Our experiments show that, given the same set of positive examples, the M-C algorithm outperforms one-class SVMs, and it is almost as accurate as traditional SVMs.
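The two-stage procedure sketched in the abstract can be illustrated on toy data. The snippet below is only an illustrative sketch of the M-C idea, not the paper's implementation: the distance-from-centroid weak classifier, the 75% quantile threshold, and the stopping rule are all simplifying assumptions made here for the example.

```python
# Sketch of the Mapping-Convergence (M-C) idea on synthetic 2-D data.
# The weak classifier and thresholds below are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Positives cluster near (2, 2); the unlabeled pool mixes hidden positives
# with a broad background of negatives.
pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
hidden_pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
background = rng.uniform(low=-4.0, high=4.0, size=(200, 2))
unlabeled = np.vstack([hidden_pos, background])

# Mapping stage: a weak classifier marks "strong" negatives -- here simply
# the unlabeled points farthest from the positive centroid.
centroid = pos.mean(axis=0)
dist = np.linalg.norm(unlabeled - centroid, axis=1)
strong_neg = dist > np.quantile(dist, 0.75)
neg = unlabeled[strong_neg]
remaining = unlabeled[~strong_neg]

# Convergence stage: retrain an SVM on positives vs. the current negative
# set; unlabeled points the SVM rejects join the negatives, shrinking the
# approximated boundary toward the positive class.
for _ in range(10):
    if len(remaining) == 0:
        break
    X = np.vstack([pos, neg])
    y = np.hstack([np.ones(len(pos)), np.zeros(len(neg))])
    svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
    pred = svm.predict(remaining)
    newly_neg = remaining[pred == 0]
    if len(newly_neg) == 0:  # boundary stopped moving
        break
    neg = np.vstack([neg, newly_neg])
    remaining = remaining[pred == 1]

print(f"negatives found: {len(neg)}, still treated as positive: {len(remaining)}")
```

Each iteration can only move unlabeled points into the negative set, which is why the approximated boundary tightens monotonically toward the positive class, mirroring the convergence argument in the abstract.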

3. Discussion of the job-recommendation system Proactive

Presented by Daniela Hyunsook Lee
