
A New Text Classification Algorithm Based on the Labeled-LDA Model

Wanfang Data


626 Chinese Journal of Computers 2008

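LDA model's category-specific latent topic structure with kernel methods to further improve classification performance; (2) the Labeled-LDA model is not limited to text classification and can also be applied to other supervised learning tasks.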

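Acknowledgments We thank the teachers and fellow students who gave us suggestions and help with this work, especially Huang Rui-Hong and Feng Yuan-Yong of the Chinese Information Processing Group, Institute of Software, Chinese Academy of Sciences.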

References

[1] Fabrizio Sebastiani. Text categorization//Alessandro Zanasi. Text Mining and its Applications. Southampton, UK: WIT Press, 2005: 109-129

[2] Su Jin-Shu, Zhang Bo-Feng, Xu Xin. Advances in machine learning based text categorization. Journal of Software, 2006, 17: 1848-1859 (in Chinese)

[3] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 2002, 34(1): 1-47

[4] Moschitti A, Basili R. Complex linguistic features for text classification: A comprehensive study//McDonald S, Tait J. Proceedings of the ECIR-04. Sunderland, U.K.: Springer-Verlag, 2004: 181-196

[5] Kehagias A, Petridis V, Kaburlasos V G, Fragkou P. A comparison of word- and sense-based text categorization using several classification algorithms. Journal of Intelligent Information Systems, 2003, 21(3): 227-247

[6] Deerwester S, Dumais S T, Furnas et al. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990, 41(6): 391-407

[7] Thomas Hofmann. Probabilistic latent semantic indexing//Proceedings of the SIGIR. Berkeley, CA, USA, 1999: 50-57

[8] Schutze H, Hull D A et al. A comparison of classifiers and document representations for the routing problem//Proceedings of the SIGIR-95. Seattle, Washington, USA, 1995: 229-237

[9] Chen L, Tokuda N, Nagai A. A new differential LSI space-based probabilistic document classifier. Information Processing Letters, 2003, 88(5): 203-212

[10] Blei D, Ng A, Jordan M. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003, 3: 993-1022

[11] Wei Xing, Croft W Bruce. LDA-based document models for ad-hoc retrieval//Proceedings of the SIGIR. Seattle, Washington, USA, 2006: 178-185

[12] Vijay Krishnan. Shortcomings of latent models in supervised settings//Proceedings of the SIGIR. Salvador, Brazil, 2005: 625-626

[13] Li Wei, McCallum Andrew. Pachinko allocation: DAG-structured mixture models of topic correlations//Proceedings of ICML-06. Pittsburgh, Pennsylvania, 2006: 577-584

[14] Blei D, Lafferty J. Correlated topic models. Advances in Neural Information Processing Systems, 2005, 18: 147-154

[15] Wainwright M J, Jordan M I. A variational principle for graphical models//Haykin S, Principe J, Sejnowski T, McWhirter J eds. New Directions in Statistical Signal Processing: From Systems to Brain. Cambridge, MA: MIT Press, 2005: 155-202

[16] Chang Chih-Chung, Lin Chih-Jen. LIBSVM: A Library for Support Vector Machines. Taiwan, China, 2001

[17] Zeng Xue-Qiang, Wang Ming-Wen, Chen Su-Fen. A text classification model based on the latent semantic structure. Journal of South China University of Technology, 2004, 32: 99-102 (in Chinese)

[18] Zhang Qi-Rui, Zhang Ling, Dong Shou-Bin et al. Effects of category distribution in a training set on text categorization. Journal of Tsinghua University, 2005, 45: 1802-1805 (in Chinese)

[19] Yang Yi-Ming. An evaluation of statistical approaches to text categorization. Information Retrieval, 1999, 1(1-2): 69-90

LI Wen-Bo, born in 1975, Ph.D. candidate. His research interests include information retrieval, text classification and machine learning.

SUN Le, born in 1971, associate professor. His research interests include information retrieval and natural language processing.

ZHANG Da-Kun, born in 1980, Ph.D. candidate. His research interests include machine translation and natural language processing.

Background

This paper focuses on new text representation methods and their application in text classification. Classical text representation methods mainly include the vector space model, n-grams, HMM, etc. These text representation methods have been



Authors: LI Wen-Bo, SUN Le, ZHANG Da-Kun

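Affiliations: LI Wen-Bo (Institute of Software, Chinese Academy of Sciences, Beijing 100080; Graduate University of Chinese Academy of Sciences, Beijing 100049); SUN Le, ZHANG Da-Kun (Institute of Software, Chinese Academy of Sciences, Beijing 100080)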

Journal: Chinese Journal of Computers

Year, volume (issue): 2008, 31(4)

Times cited: 11

References (22 entries)

1. ψ in Eq. (1) denotes the first derivative of the function log Γ (the digamma function).

2.Yang Yi-Ming An evaluation of statistical approaches to text categorization 1999(1-2)

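3. Zhang Qi-Rui; Zhang Ling; Dong Shou-Bin Effects of category distribution in a training set on text categorization [journal article] - Journal of Tsinghua University (Science and Technology) 2005(9)

4. Zeng Xue-Qiang; Wang Ming-Wen; Chen Su-Fen A text classification model based on the latent semantic structure [journal article] - Journal of South China University of Technology (Natural Science Edition) 2004(z1)

5. Chen L; Tokuda N; Nagai A A new differential LSI space-based probabilistic document classifier [foreign-language journal] 2003(05)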

6.Schutze H;Hull D A A comparison of classifiers and document representations for the routing problem 1995

7.Thomas Hofmann Probabilistic latent semantic indexing 1999

8. Deerwester S; Dumais S T; Furnas Indexing by latent semantic analysis [foreign-language journal] 1990(06)

9. Kehagias A; Petridis V; Kaburlasos V G; Fragkou P A comparison of word- and sense-based text categorization using several classification algorithms [foreign-language journal] 2003(03)

10.Moschitti A;Basili R Complex linguistic features for text classification:A comprehensive study 2004

11.Fabrizio Sebastiani Machine learning in automated text categorization 2002(01)

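12. Su Jin-Shu; Zhang Bo-Feng; Xu Xin Advances in machine learning based text categorization [journal article] - Journal of Software 2006(9)

13. The experiments in Ref. [13] use the Accuracy metric, whereas this paper uses micro_F1 throughout; the two are equivalent for single-label classification (see Ref. [19]). In addition, because Ref. [13] does not report the number of topics per class, we plot its result as a straight line.

14. Dirac function: δ(x) =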

15.Chang Chih-Chung;Lin Chih-Jen LIBSVM:A Library for Support Vector Machines 2001

16.Wainwright M J;Jordan M I A variational principle for graphical models 2005

17.Blei D;Lafferty J Correlated topic models 2005

18. Li Wei; McCallum Andrew Pachinko allocation: DAG-structured mixture models of topic correlations [foreign-language conference] 2006

19.Vijay Krishnan Shortcomings of latent models in supervised settings 2005

20.Wei Xing;Croft W Bruce LDA-based document models for Ad-Hoc retrieval 2006

21. Blei D; Ng A; Jordan M Latent dirichlet allocation [foreign-language journal] 2003(4/5)

22. Fabrizio Sebastiani Text categorization, Alessandro Zanasi. Text Mining and its Applications 2005

Citing literature (11 entries)

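1. Zhang Xiao-Ping; Zhou Xue-Zhong; Huang Hou-Kuan; Feng Qi; Chen Shi-Bo A topic model based on word similarity and CRP [journal article] - Pattern Recognition and Artificial Intelligence 2010(1)

2. Tang Ying-Jun; Xu De; Xie Wen-Jie; Bo Yi-Hang An image scene classification method based on class-topic space [journal article] - Journal of Image and Graphics (A) 2010(7)

3. Wu Fei; Han Ya-Hong; Zhuang Yue-Ting; Shao Jian Web image clustering by mining image-text correlation [journal article] - Journal of Software 2010(7)

4. Wu Fei; Han Ya-Hong; Zhuang Yue-Ting; Shao Jian Web image clustering by mining image-text correlation [journal article] - Journal of Software 2010(7)

5. Xiao Ke; Feng Guo-He Bibliometric analysis of domestic text classification research literature, 1999-2008 [journal article] - Journal of the China Society for Scientific and Technical Information 2010(4)

6. Wang Hong-Jun; Li Zhi-Shu; Qi Jian-Huai; Cheng Cha; Zhou Peng; Zhou Wei A semi-supervised clustering ensemble model based on Bayesian networks [journal article] - Journal of Software 2010(11)

7. Wang Hong-Jun; Li Zhi-Shu; Qi Jian-Huai; Cheng Cha; Zhou Peng; Zhou Wei A semi-supervised clustering ensemble model based on Bayesian networks [journal article] - Journal of Software 2010(11)

8. Yuan Liu; Zhang Long-Bo Multi-granularity Web document annotation based on statistical topic models [journal article] - Journal of Computer Applications 2010(12)

9. Shi Jing; Li Wan-Long A topic word extraction method based on the LDA model [journal article] - Computer Engineering 2010(19)

10. Liao Xiao-Feng; Liu Chun-Nian; Gong Hua-Ping Semantic clustering of image retrieval results based on topic models [journal article] - Computer Knowledge and Technology 2010(34)

11. Wang Chao-Fei; Wang Kai Application of topic models in digital library Web services [journal article] - Information Studies: Theory & Application 2010(2)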
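Item 1 of the database reference list above records a note stating that ψ in Eq. (1) is the first derivative of log Γ, i.e. the digamma function. Eq. (1) itself is not part of this excerpt, so the following is only a sketch under that reading: it evaluates ψ(x) for x > 0 with the recurrence ψ(x) = ψ(x+1) − 1/x followed by the standard asymptotic series, and adds a helper for the Dirichlet expectation E[log θ_k] = ψ(γ_k) − ψ(Σ_j γ_j), which appears in the variational inference of standard LDA [10]. The names `digamma` and `expected_log_theta` and the shift threshold of 6 are our own choices; whether Eq. (1) uses ψ in exactly this expectation cannot be confirmed from the text shown here.

```python
import math
from typing import List, Sequence


def digamma(x: float) -> float:
    """Digamma function psi(x) = d/dx log Gamma(x), for x > 0."""
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x: shift x upward until the
    # asymptotic series below is accurate (truncation error < ~3e-9).
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    inv = 1.0 / x
    inv2 = inv * inv
    result += math.log(x) - 0.5 * inv - inv2 * (
        1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)
    )
    return result


def expected_log_theta(gamma: Sequence[float]) -> List[float]:
    """E[log theta_k] for theta ~ Dirichlet(gamma): psi(gamma_k) - psi(sum(gamma))."""
    psi_total = digamma(sum(gamma))
    return [digamma(g) - psi_total for g in gamma]


# Sanity check against a central finite difference of math.lgamma.
h = 1e-5
approx = (math.lgamma(3.7 + h) - math.lgamma(3.7 - h)) / (2 * h)
print(digamma(3.7), approx)
# Each entry is close to -1, since psi(2) = psi(1) + 1.
print(expected_log_theta([1.0, 1.0]))
```

The recurrence-plus-series scheme is a common way to compute ψ by hand; in practice a library routine such as `scipy.special.digamma` would normally be used instead.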

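Item 13 of the database reference list above justifies comparing the Accuracy figures of Ref. [13] with this paper's micro_F1 by noting that the two coincide for single-label classification (citing Ref. [19]). The reason: when each document has exactly one gold label and one predicted label, every misclassified document contributes exactly one false positive (to the predicted class) and one false negative (to the true class), so pooled precision and pooled recall both equal the fraction of correct decisions. The following is our own illustration of that argument, not code from the paper; the helper `_check` and the class labels in the example are made up.

```python
from collections import Counter
from typing import Sequence


def _check(gold: Sequence[str], pred: Sequence[str]) -> None:
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of documents whose single predicted label equals the gold label."""
    _check(gold, pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def micro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Micro-averaged F1: pool per-class TP/FP/FN counts, then compute P, R, F1."""
    _check(gold, pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1  # correct decision for class g
        else:
            fp[p] += 1  # document wrongly assigned to class p
            fn[g] += 1  # document of class g that was missed
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    # Single-label: total_tp + total_fp == total_tp + total_fn == len(gold) > 0
    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["earn", "acq", "earn", "grain", "acq"]
pred = ["earn", "earn", "earn", "grain", "crude"]
print(accuracy(gold, pred), micro_f1(gold, pred))  # both approximately 0.6
```

In a multi-label setting a document may receive several labels or none, false positives and false negatives no longer pair up one-to-one, and the equality generally fails.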
