A Schema Matching Model Based on Partial Verified Matching Relations
Abstract: Schema matching is an important and difficult problem in many database application domains, such as data integration, the Semantic Web, and data warehousing. To make effective use of expert knowledge in improving matching quality, a schema matching model based on partially verified matching relations is proposed. In this model, first, a small number of correspondences between the elements of the schemas to be matched are verified manually, and from them the partially known matching relations and the default weights of the individual matchers for the current task are inferred. Second, the similarity matrices produced by the multiple matchers are combined and adjusted using this prior knowledge, and the result is optimized globally. Finally, the selectivity of the optimized matrix is evaluated in order to recommend the most reasonable candidate-match generation scheme for each matching task. Experimental results show that using partially verified matching relations helps to improve the quality of schema matching.
Key words:
- schema matching
- schema integration
- similarity matrix
- candidate matching
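The first two steps described in the abstract (deriving matcher weights from manually verified correspondences, then combining the matchers' similarity matrices) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the function names `matcher_weights` and `combine`, the heuristic of weighting a matcher by its mean similarity on the verified pairs, and the 1:1 pinning of verified pairs are all hypothetical. The global optimization and selectivity evaluation steps are not covered.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]   # rows: source elements, columns: target elements
Pair = Tuple[int, int]       # (source index, target index) verified by an expert


def matcher_weights(matrices: Dict[str, Matrix], verified: List[Pair]) -> Dict[str, float]:
    """Assumed heuristic: a matcher that scores the verified pairs highly
    is trusted more. Weights are normalized to sum to 1."""
    uniform = {name: 1.0 / len(matrices) for name in matrices}
    if not verified:
        return uniform
    scores = {
        name: sum(m[i][j] for i, j in verified) / len(verified)
        for name, m in matrices.items()
    }
    total = sum(scores.values())
    if total == 0:
        return uniform
    return {name: s / total for name, s in scores.items()}


def combine(matrices: Dict[str, Matrix], weights: Dict[str, float],
            verified: List[Pair]) -> Matrix:
    """Weighted sum of the matchers' matrices, then pin the verified pairs.
    Zeroing the rest of a verified row and column assumes 1:1 matching,
    which is an assumption of this sketch, not a claim about the paper."""
    names = list(matrices)
    rows = len(matrices[names[0]])
    cols = len(matrices[names[0]][0])
    combined = [
        [sum(weights[n] * matrices[n][i][j] for n in names) for j in range(cols)]
        for i in range(rows)
    ]
    for i, j in verified:
        for c in range(cols):
            combined[i][c] = 0.0
        for r in range(rows):
            combined[r][j] = 0.0
        combined[i][j] = 1.0
    return combined


# Toy input: two hypothetical matchers over a 2x2 element grid.
matrices = {
    "name": [[0.9, 0.2], [0.3, 0.4]],
    "type": [[0.5, 0.5], [0.5, 0.5]],
}
verified = [(0, 0)]  # an expert confirmed source element 0 matches target element 0

weights = matcher_weights(matrices, verified)
combined = combine(matrices, weights, verified)
print(weights)
print(combined)
```

In this toy input the name matcher scores the verified pair higher than the type matcher does, so it receives the larger weight, and the verified pair is fixed at similarity 1 in the combined matrix.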