Abstract: This paper presents a multi-document summarization method based on identifying and extracting the local topics within a collection of documents on the same subject. Sentence similarity is computed from dependency parsing and semantic analysis of the sentences in the collection, and similar sentences are clustered to form the collection's distinct local topics. The centroid sentence of each local topic is then extracted, and these sentences are ordered to produce the summary. Because the summary length is determined automatically by the content of the documents, the summary stays both comprehensive and concise. Finally, the paper presents an evaluation method for multi-document summaries together with experimental results: the summaries achieve an average precision of 71.4% and an average compression ratio of 25.2%.
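As a rough illustration of the pipeline the abstract describes (sentence similarity, clustering into local topics, centroid extraction, ordering), here is a minimal Python sketch. It is not the authors' method: bag-of-words cosine similarity stands in for their dependency and semantic analysis, a threshold-based single-pass clustering stands in for their clustering step, and the names `summarize`, `compression_ratio`, and the `threshold` parameter are hypothetical. The compression ratio is assumed here to be summary characters divided by source characters.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase word tokens stand in for the paper's
    # dependency and semantic analysis of each sentence.
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(documents, threshold=0.3):
    """documents: list of documents, each a list of sentences (hypothetical API)."""
    # Keep (doc index, sentence index) so the original order can be restored.
    items = [(d, s, text, Counter(tokenize(text)))
             for d, doc in enumerate(documents)
             for s, text in enumerate(doc)]

    # Single-pass clustering: a sentence joins the first cluster whose seed
    # sentence is similar enough; otherwise it opens a new local topic.
    clusters = []
    for item in items:
        for cluster in clusters:
            if cosine(item[3], cluster[0][3]) >= threshold:
                cluster.append(item)
                break
        else:
            clusters.append([item])

    # Centroid sentence: the member most similar to the rest of its cluster
    # (ties go to the earliest sentence, since max() keeps the first maximum).
    centroids = [max(c, key=lambda x: sum(cosine(x[3], y[3]) for y in c if y is not x))
                 for c in clusters]

    # One sentence per local topic, ordered by document and position.
    centroids.sort(key=lambda x: (x[0], x[1]))
    return [x[2] for x in centroids]


def compression_ratio(summary, documents):
    # Assumption: summary characters divided by source characters.
    source = sum(len(s) for doc in documents for s in doc)
    return sum(len(s) for s in summary) / source


if __name__ == "__main__":
    docs = [
        ["The storm hit the coast on Monday.", "Thousands lost power."],
        ["A storm hit the coast Monday night.", "Officials opened shelters."],
    ]
    summary = summarize(docs)
    print(summary)
    print(round(compression_ratio(summary, docs), 3))
```

Because one sentence is kept per cluster, the summary length follows the number of local topics found rather than a fixed target; raising `threshold` tends to split the collection into more, narrower topics and so lengthens the summary.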
Key words:
- Multi-document summarization
- local topic
- clustering