An in-depth study of similarity predicate committee
Publication date: May 2019
Source: Information Processing & Management, Volume 56, Issue 3
Author(s): Jia Zhu, Gabriel Pui Cheong Fung, Zeyang Lei, Min Yang, Ying Shen
Over the last decades, many similarity measures have been proposed, such as the Jaccard coefficient, cosine similarity, BM25, and language models. Despite the effectiveness of these existing measures, we observe that none of them consistently outperforms the others across typical situations. Choosing which similarity predicate to use is usually treated as an empirical question, answered by evaluating a particular task against a number of different similarity predicates; this is computationally inefficient, and the results obtained are not portable to other tasks. In this paper, we propose a novel approach that combines different similarity predicates into a committee, so that we no longer need to worry about choosing among them. Empirically, the committee obtains better results than any individual similarity predicate, which is quite meaningful in practice. Specifically, our method models committee generation as a 0–1 integer programming problem based on the confidence of the similarity predicates and the reliability of the attributes. We demonstrate the effectiveness of our model by applying it to three datasets with controlled errors. Experimental results show that our similarity predicate committee is more robust than, and superior to, existing individual similarity predicates.
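To make the committee idea concrete, the following is a minimal sketch (not the authors' implementation) of combining two of the similarity predicates named in the abstract, Jaccard and cosine, into a single weighted score. In the paper the predicate weights are chosen by solving a 0–1 integer program over predicate confidence and attribute reliability; here they are simply fixed by hand for illustration.

```python
from collections import Counter
import math

def jaccard(a, b):
    # Jaccard coefficient over token sets: |A ∩ B| / |A ∪ B|
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cosine(a, b):
    # Cosine similarity over term-frequency vectors
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def committee_score(a, b, predicates, weights):
    # Confidence-weighted committee: each predicate votes with its
    # similarity score scaled by a weight.  In the paper these weights
    # come from a 0-1 integer program; here they are hypothetical.
    total = sum(weights)
    return sum(w * p(a, b) for p, w in zip(predicates, weights)) / total

tokens1 = "integer programming for predicate committees".split()
tokens2 = "committee of similarity predicates via integer programming".split()
score = committee_score(tokens1, tokens2, [jaccard, cosine], [1.0, 1.0])
```

With equal weights the committee score is just the mean of the individual predicates, so it always lies between the lowest- and highest-scoring member; the integer-programming step in the paper replaces this hand-picked weighting with a learned selection.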