Yong Chen / Shenyang Institute of Automation, Chinese Academy of Sciences
This article studies Naive Bayes and, to address its shortcomings, proposes an incremental text classification algorithm based on an improved Naive Bayes (IMP-NB). First, when calculating the posterior probability, IMP-NB jointly considers the feature words in the test thesaurus that are most distinctive for each category and the category features of the training text. Second, IMP-NB adopts an incremental learning mode that strengthens the algorithm's self-learning ability: from test documents that have been evaluated and reliably classified, it selects the feature words with strong classification ability and adds them to the incremental word library of the corresponding category. Feature words with higher classification ability are assigned to the different categories according to different levels. Experiments on Chinese news text classification data show that IMP-NB improves accuracy and other indicators more significantly than traditional Naive Bayes improvement methods.
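
The incremental learning idea can be illustrated with a minimal sketch. This is not the IMP-NB algorithm itself, since the abstract does not give its formulas; it is a plain multinomial Naive Bayes with Laplace smoothing that feeds confidently classified test documents back into its per-category word counts. The class name `IncrementalNB`, the smoothing parameter `alpha`, and the log-posterior `margin` used to decide whether a classification is reliable are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Hypothetical sketch: multinomial Naive Bayes with incremental updates."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing (assumed)
        self.doc_counts = Counter()              # documents seen per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        self.vocab = set()                       # all words seen so far

    def update(self, words, label):
        """Add one labeled document to the per-category statistics."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def log_posteriors(self, words):
        """Return the unnormalized log posterior for every known category."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)
            for w in words:
                score += math.log((counts[w] + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, words, margin=1.0):
        """Classify a document; learn from it only if the result is reliable.

        "Reliable" here means the best log posterior beats the runner-up
        by at least `margin` (an assumed criterion, not the paper's).
        """
        scores = self.log_posteriors(words)
        ranked = sorted(scores, key=scores.get, reverse=True)
        best = ranked[0]
        confident = (len(ranked) == 1
                     or scores[best] - scores[ranked[1]] >= margin)
        if confident:
            self.update(words, best)  # incremental self-learning step
        return best, confident


# Toy usage with made-up tokens
nb = IncrementalNB()
nb.update(["ball", "goal", "team"], "sports")
nb.update(["stock", "market", "bank"], "finance")
label, confident = nb.classify(["goal", "team", "match"])
```

In this toy run the test document is assigned to `sports` with a margin above the threshold, so its words, including the previously unseen `match`, are added to that category's counts. A document made up only of unseen words would fall below the margin and leave the model unchanged.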