Jaccard

Bibliography

1

R. Cilibrasi and P. M. B. Vitányi. Clustering by compression. IEEE Transactions on Information Theory, 51(4):1523–1545, 2005.

2

Rudi L. Cilibrasi and Paul M. B. Vitányi. The Google similarity distance. IEEE Trans. on Knowl. and Data Eng., 19(3):370–383, March 2007.

3

Michel Marie Deza and Elena Deza. Encyclopedia of distances. Springer, Berlin, fourth edition, 2016.

4

Ged Ridgway. Mutual information. Wikipedia, the Free Encyclopedia, revision as of 14:55, 22 January 2010. [Online; accessed 14-May-2020].

5

Alonso Gragera and Vorapong Suppakitpaisarn. Semimetric properties of Sørensen-Dice and Tversky indexes. In WALCOM: algorithms and computation, volume 9627 of Lecture Notes in Comput. Sci., pages 339–350. Springer, [Cham], 2016.

6

Alonso Gragera and Vorapong Suppakitpaisarn. Relaxed triangle inequality ratio of the Sørensen-Dice and Tversky indexes. Theoret. Comput. Sci., 718:37–45, 2018.

7

Sergio Jiménez, Claudia Jeanneth Becerra, and Alexander F. Gelbukh. SOFTCARDINALITY-CORE: improving text overlap with distributional measures for semantic textual similarity. In Mona T. Diab, Timothy Baldwin, and Marco Baroni, editors, Proceedings of the Second Joint Conference on Lexical and Computational Semantics, *SEM 2013, June 13-14, 2013, Atlanta, Georgia, USA, pages 194–201. Association for Computational Linguistics, 2013.

8

Bjørn Kjos-Hanssen. Lean project: a 1-parameter family of metrics connecting Jaccard distance to normalized information distance. https://github.com/bjoernkjoshanssen/jaccard, 2021.

9

Bjørn Kjos-Hanssen, Saroj Niraula, and Soowhan Yoon. A parametrized family of Tversky metrics connecting the Jaccard distance to an analogue of the Normalized Information Distance. In Sergei Artemov and Anil Nerode, editors, Logical Foundations of Computer Science, pages 112–124, Cham, 2022. Springer International Publishing.

10

A. Kraskov, H. Stögbauer, R. G. Andrzejak, and P. Grassberger. Hierarchical clustering using mutual information. Europhysics Letters (EPL), 70(2):278–284, April 2005.

11

Alexander Kraskov, Harald Stögbauer, Ralph G. Andrzejak, and Peter Grassberger. Hierarchical clustering based on mutual information. ArXiv, q-bio.QM/0311039, 2003.

12

Abraham Lempel and Jacob Ziv. On the complexity of finite sequences. IEEE Trans. Inform. Theory, IT-22(1):75–81, 1976.

13

Ming Li, Jonathan H. Badger, Xin Chen, Sam Kwong, Paul E. Kearney, and Haoyong Zhang. An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics, 17(2):149–154, 2001.

14

Ming Li, Xin Chen, Xin Li, Bin Ma, and Paul M. B. Vitányi. The similarity metric. IEEE Trans. Inform. Theory, 50(12):3250–3264, 2004.

15

Edward Raff and Charles K. Nicholas. An Alternative to NCD for Large Sequences, Lempel–Ziv Jaccard Distance. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017.

16

C. Rajski. Entropy and metric spaces. In Information theory (Symposium, London, 1960), pages 41–45. Butterworths, Washington, D.C., 1961.

17

Suvrit Sra. Is the Jaccard distance a distance? MathOverflow. URL: https://mathoverflow.net/q/210750 (version: 2015-07-03).

18

A. Tversky. Features of similarity. Psychological Review, 84(4):327–352, 1977.

19

Jacob Ziv and Abraham Lempel. A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory, IT-23(3):337–343, 1977.

20

Jacob Ziv and Abraham Lempel. Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory, IT-24(5):530–536, 1978.