

Text Categorization Based on Bayesian Classification Approach using Class-Specific Features


Abstract
The wide availability of web documents in electronic form requires an automatic technique for labeling documents with a predefined set of topics, a task known as automatic Text Categorization (TC). Over the past decades, a large number of advanced machine learning algorithms have been proposed to address this challenging task. Documents are usually represented as a "bag-of-words": each word or phrase that occurs in the documents one or more times is treated as a feature. This paper proposes a novel system called PPSGen to generate presentation slides from academic papers. It first employs a regression method to learn the importance scores of the sentences in an academic paper, and then uses integer linear programming (ILP) to generate well-structured slides by selecting and aligning key phrases and sentences. Specifically, we train a sentence scoring model based on support vector regression (SVR) and use the ILP method to align and extract the key phrases and sentences that make up the slides. The generated presentation slides can serve as drafts that help presenters prepare their formal slides more quickly. Experimental results show that our method generates much better slides than traditional methods.
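As a rough illustration of the bag-of-words representation described above, the following Python sketch builds a vocabulary from every word that occurs at least once in a small corpus and represents each document as a vector of term counts. Treating "phrases" as contiguous word sequences of up to `max_n` words, the regex tokenizer, and the function names `extract_terms`, `build_vocabulary`, and `to_bag_of_words` are all assumptions made for illustration; this is not the paper's class-specific feature method.

```python
import re
from collections import Counter


def extract_terms(text: str, max_n: int = 1) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens (a simplifying
    assumption), and return every contiguous word sequence of length
    1..max_n. With max_n=1 only single words are returned; larger values
    add word n-grams as a hypothetical stand-in for "phrases"."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms


def build_vocabulary(documents: list[str], max_n: int = 1) -> list[str]:
    """Every term that occurs one or more times in the corpus becomes a feature."""
    vocab = set()
    for doc in documents:
        vocab.update(extract_terms(doc, max_n))
    return sorted(vocab)


def to_bag_of_words(document: str, vocabulary: list[str], max_n: int = 1) -> list[int]:
    """Map a document to term counts over the vocabulary. Word order is
    discarded, and terms outside the vocabulary are ignored."""
    counts = Counter(extract_terms(document, max_n))
    return [counts[term] for term in vocabulary]


if __name__ == "__main__":
    docs = [
        "The match ended in a draw",
        "The election ended with a recount",
    ]
    vocab = build_vocabulary(docs)
    print(vocab)
    for d in docs:
        print(to_bag_of_words(d, vocab))
```

Running the script prints the sorted vocabulary followed by one count vector per document. Because only counts are kept, two documents containing the same words in a different order receive identical vectors, which is the defining simplification of this representation. A binary presence/absence variant can be obtained by replacing each count with `min(count, 1)`.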
Keywords: Text Categorization (TC); Machine Learning Algorithms; SVR; PPSGen; Integer Linear Programming
 


Author Information
A. Poornima
Issue No: 2
Volume No: 3
Issue Publish Date: 05 Feb 2017
Issue Pages: 56-61
