Wanfang Data (万方数据)
626    Chinese Journal of Computers (计算机学报)    2008
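the class-specific latent topic structure in the LDA model is combined with kernel methods to further improve classification performance; (2) the Labeled-LDA model is not limited to text classification and can be applied to other supervised learning tasks.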
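Acknowledgments  We thank the teachers and fellow students who offered advice and help with this work, especially Huang Ruihong (黄瑞红) and Feng Yuanyong (冯元勇) of the Chinese Information Processing Group, Institute of Software, Chinese Academy of Sciences.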
References
[1] Fabrizio Sebastiani. Text categorization//Alessandro Zanasi. Text Mining and its Applications. Southampton, UK: WIT Press, 2005: 109-129
[2] Su Jin-Shu, Zhang Bo-Feng, Xu Xin. Advances in machine learning based text categorization. Journal of Software, 2006, 17: 1848-1859 (in Chinese)
(苏金树, 张博锋, 徐昕. 基于机器学习的文本分类技术研究进展. 软件学报, 2006, 17: 1848-1859)
[3] Fabrizio Sebastiani. Machine learning in automated text categorization. ACM Computing Surveys, 2002, 34(1): 1-47
[4] Moschitti A, Basili R. Complex linguistic features for text classification: A comprehensive study//McDonald S, Tait J. Proceedings of the ECIR-04. Sunderland, U.K.: Springer-Verlag, 2004: 181-196
[5] Kehagias A, Petridis V, Kaburlasos V G, Fragkou P. A comparison of word- and sense-based text categorization using several classification algorithms. Journal of Intelligent Information Systems, 2003, 21(3): 227-247
[6] Deerwester S, Dumais S T, Furnas, et al. Indexing by latent semantic indexing. Journal of the American Society for Information Science, 1990, 41(6): 391-407
[7] Thomas Hofmann. Probabilistic latent semantic indexing//Proceedings of the SIGIR. Berkeley, CA, USA, 1999: 50-57
[8] Schutze H, Hull D A, et al. A comparison of classifiers and document representations for the routing problem//Proceedings of the SIGIR-95. Seattle, Washington, USA, 1995: 229-237
[9] Chen L, Tokuda N, Nagai A. A new differential LSI space-based probabilistic document classifier. Information Processing Letters, 2003, 88(5): 203-212
[10] Blei D, Ng A, Jordan M. Latent dirichlet allocation. Journal of Machine Learning Research, 2003, 3: 993-1022
[11] Wei Xing, Croft W Bruce. LDA-based document models for Ad-Hoc retrieval//Proceedings of the SIGIR. Seattle, Washington, USA, 2006: 178-185
[12] Vijay Krishnan. Shortcomings of latent models in supervised settings//Proceedings of the SIGIR. Salvador, Brazil, 2005: 625-626
[13] Li Wei, McCallum Andrew. Pachinko allocation: DAG-structured mixture models of topic correlations//Proceedings of ICML-06. Pittsburgh, Pennsylvania, 2006: 577-584
[14] Blei D, Lafferty J. Correlated topic models. Advances in Neural Information Processing Systems, 2005, 18: 147-154
[15] Wainwright M J, Jordan M I. A variational principle for graphical models//Haykin S, Principe J, Sejnowski T, McWhirter J eds. New Directions in Statistical Signal Processing: From Systems to Brain. Cambridge, MA: MIT Press, 2005: 155-202
[16] Chang Chih-Chung, Lin Chih-Jen. LIBSVM: A Library for Support Vector Machines. Taiwan, China, 2001
[17] Zeng Xue-Qiang, Wang Ming-Wen, Chen Su-Fen. A text classification model based on the latent semantic structure. Journal of South China University of Technology, 2004, 32: 99-102 (in Chinese)
(曾雪强, 王明文, 陈素芬. 一种基于潜在语义结构的文本分类模型. 华南理工大学学报, 2004, 32: 99-102)
[18] Zhang Qi-Rui, Zhang Ling, Dong Shou-Bin, et al. Effects of category distribution in a training set on text categorization. Journal of Tsinghua University, 2005, 45: 1802-1805 (in Chinese)
(张启蕊, 张凌, 董守斌等. 训练集类别分布对文本分类的影响. 清华大学学报, 2005, 45: 1802-1805)
[19] Yang Yi-Ming. An evaluation of statistical approaches to text categorization. Information Retrieval, 1999, 1(1-2): 69-90

LI Wen-Bo, born in 1975, Ph.D. candidate. His research interests include information retrieval, text classification and machine learning.
SUN Le, born in 1971, associate professor. His research interests include information retrieval and natural language processing.
ZHANG Da-Kun, born in 1980, Ph.D. candidate. His research interests include machine translation and natural language processing.

Background
This paper focuses on new text representation methods and their application in text classification. Classical text representation methods mainly include the vector space model, n-grams, HMM, etc. These text representation methods have been
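A New Text Classification Algorithm Based on the Labeled-LDA Model (基于Labeled-LDA模型的文本分类新算法)
Authors: LI Wen-Bo (李文波), SUN Le (孙乐), ZHANG Da-Kun (张大鲲)
Affiliations: LI Wen-Bo (Institute of Software, Chinese Academy of Sciences, Beijing 100080; Graduate University of Chinese Academy of Sciences, Beijing 100049); SUN Le, ZHANG Da-Kun (Institute of Software, Chinese Academy of Sciences, Beijing 100080)
Journal: 计算机学报 (CHINESE JOURNAL OF COMPUTERS)
Year, volume (issue): 2008, 31(4)
Times cited: 11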
References (22 entries)
1. In Eq. (1), ψ denotes the first derivative of the function log Γ
2. Yang Yi-Ming. An evaluation of statistical approaches to text categorization. 1999(1-2)
3. Zhang Qi-Rui; Zhang Ling; Dong Shou-Bin. Effects of category distribution in a training set on text categorization [journal article]. Journal of Tsinghua University (Science and Technology), 2005(9)
4. Zeng Xue-Qiang; Wang Ming-Wen; Chen Su-Fen. A text classification model based on the latent semantic structure [journal article]. Journal of South China University of Technology (Natural Science Edition), 2004(z1)
5. Chen L; Tokuda N; Nagai A. A new differential LSI space-based probabilistic document classifier [foreign-language journal]. 2003(05)
6. Schutze H; Hull D A. A comparison of classifiers and document representations for the routing problem. 1995
7. Thomas Hofmann. Probabilistic latent semantic indexing. 1999
8. Deerwester S; Dumais S T; Furnas. Indexing by latent semantic indexing [foreign-language journal]. 1990(06)
9. Kehagias A; Petridis V; Kaburlasos V G; Fragkou P. A comparison of word- and sense-based text categorization using several classification algorithms [foreign-language journal]. 2003(03)
10. Moschitti A; Basili R. Complex linguistic features for text classification: A comprehensive study. 2004
11. Fabrizio Sebastiani. Machine learning in automated text categorization. 2002(01)
12. Su Jin-Shu; Zhang Bo-Feng; Xu Xin. Advances in machine learning based text categorization [journal article]. Journal of Software, 2006(9)
13. The experiments in [13] use the Accuracy metric, whereas this paper uses micro_F1 throughout. The two are equivalent for single-label classification; see [19]. In addition, because [13] does not report the number of topics per class, we draw it as a straight line
14. Dirac function: δ(x) =
15. Chang Chih-Chung; Lin Chih-Jen. LIBSVM: A Library for Support Vector Machines. 2001
16. Wainwright M J; Jordan M I. A variational principle for graphical models. 2005
17. Blei D; Lafferty J. Correlated topic models. 2005
18. Li Wei; McCallum Andrew. Pachinko allocation: DAG-structured mixture models of topic correlations [foreign-language conference]. 2006
19. Vijay Krishnan. Shortcomings of latent models in supervised settings. 2005
20. Wei Xing; Croft W Bruce. LDA-based document models for Ad-Hoc retrieval. 2006
21. Blei D; Ng A; Jordan M. Latent dirichlet allocation [foreign-language journal]. 2003(4/5)
22. Fabrizio Sebastiani. Text categorization//Alessandro Zanasi. Text Mining and its Applications. 2005