Text Categorization Based on Bayesian Classification Approach using Class-Specific Features

The wide availability of web documents in electronic form requires an automatic technique for labeling documents with a predefined set of topics, a task known as automatic Text Categorization (TC). Over the past decades, a large number of advanced machine learning algorithms have been proposed to address this challenging task. Documents are usually represented as a "bag-of-words": each word or phrase that occurs one or more times in the documents is considered a feature. This paper proposes a novel system called PPSGen to generate presentation slides from academic papers. The generated slides can be used as drafts to help presenters prepare their formal slides more quickly. PPSGen first employs a regression method to learn the importance scores of the sentences in an academic paper, and then exploits the integer linear programming (ILP) method to generate well-structured slides by selecting and aligning key phrases and sentences. The sentence scoring model is trained using support vector regression (SVR). Experimental results show that our method generates much better slides than traditional methods.
Keywords: Text Categorization (TC); Machine Learning Algorithms; SVR; PPSGen; Integer Linear Programming

05 Feb 2017
