Class JaroWinklerDistance

  • All Implemented Interfaces:
    java.util.function.BiFunction<java.lang.CharSequence,java.lang.CharSequence,java.lang.Double>, EditDistance<java.lang.Double>, ObjectSimilarityScore<java.lang.CharSequence,java.lang.Double>, SimilarityScore<java.lang.Double>

    public class JaroWinklerDistance
    extends java.lang.Object
    implements EditDistance<java.lang.Double>
    Measures the Jaro-Winkler distance of two character sequences. It is the complement of the Jaro-Winkler similarity: distance = 1 - similarity.
    Since:
    1.0
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int INDEX_NOT_FOUND
      Deprecated.
      Deprecated as of 1.7.
    • Constructor Summary

      Constructors 
      Constructor Description
      JaroWinklerDistance()
      Creates a new instance.
    • Method Detail

      • matches

        @Deprecated
        protected static int[] matches(java.lang.CharSequence first,
                                       java.lang.CharSequence second)
        Deprecated.
        Deprecated as of 1.7; use JaroWinklerSimilarity.matches(CharSequence, CharSequence) instead. This method will be removed in 2.0. TODO see TEXT-104.
        Computes the matches, half transpositions, and common prefix length used by the Jaro-Winkler calculation, returned as an array.
        Parameters:
        first - the first string to be matched.
        second - the second string to be matched.
        Returns:
        an array containing, in order: the number of matches, the number of half transpositions, and the common prefix length.
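
        The three counts described above can be illustrated with a minimal plain-Java sketch. It follows the conventional Jaro matching procedure and is not the library's source: the class name JaroMatchSketch, the matching window of (longer length / 2 - 1), and the prefix cap of 4 are assumptions made for illustration.

```java
final class JaroMatchSketch {

    /**
     * Returns {matches, halfTranspositions, prefixLength}.
     * Assumptions: window = longer.length() / 2 - 1, prefix capped at 4.
     */
    static int[] matches(CharSequence first, CharSequence second) {
        boolean firstIsLonger = first.length() > second.length();
        CharSequence longer = firstIsLonger ? first : second;
        CharSequence shorter = firstIsLonger ? second : first;

        // Characters only match if their positions are at most 'window' apart.
        int window = Math.max(longer.length() / 2 - 1, 0);
        boolean[] matchedInShorter = new boolean[shorter.length()];
        boolean[] matchedInLonger = new boolean[longer.length()];
        int matches = 0;
        for (int i = 0; i < shorter.length(); i++) {
            int from = Math.max(i - window, 0);
            int to = Math.min(i + window + 1, longer.length());
            for (int j = from; j < to; j++) {
                if (!matchedInLonger[j] && shorter.charAt(i) == longer.charAt(j)) {
                    matchedInShorter[i] = true;
                    matchedInLonger[j] = true;
                    matches++;
                    break;
                }
            }
        }

        // Collect the matched characters of each sequence in their original order.
        char[] a = new char[matches];
        char[] b = new char[matches];
        int k = 0;
        for (int i = 0; i < shorter.length(); i++) {
            if (matchedInShorter[i]) {
                a[k++] = shorter.charAt(i);
            }
        }
        k = 0;
        for (int j = 0; j < longer.length(); j++) {
            if (matchedInLonger[j]) {
                b[k++] = longer.charAt(j);
            }
        }

        // Each position where the two ordered match lists differ is a half transposition.
        int halfTranspositions = 0;
        for (int i = 0; i < matches; i++) {
            if (a[i] != b[i]) {
                halfTranspositions++;
            }
        }

        // Length of the common prefix, capped at 4 characters.
        int prefix = 0;
        int limit = Math.min(4, shorter.length());
        while (prefix < limit && first.charAt(prefix) == second.charAt(prefix)) {
            prefix++;
        }
        return new int[] {matches, halfTranspositions, prefix};
    }
}
```

        Under these assumptions, matches("MARTHA", "MARHTA") yields 6 matches, 2 half transpositions (the swapped T and H), and a prefix length of 3.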
      • apply

        public java.lang.Double apply(java.lang.CharSequence left,
                                      java.lang.CharSequence right)
        Computes the Jaro-Winkler distance between two character sequences.
         distance.apply(null, null)          = Throws IllegalArgumentException
         distance.apply("foo", null)         = Throws IllegalArgumentException
         distance.apply(null, "foo")         = Throws IllegalArgumentException
         distance.apply("", "")              = 0.0
         distance.apply("foo", "foo")        = 0.0
         distance.apply("foo", "foo ")       = 0.06
         distance.apply("foo", "foo  ")      = 0.09
         distance.apply("foo", " foo ")      = 0.13
         distance.apply("foo", "  foo")      = 0.49
         distance.apply("", "a")             = 1.0
         distance.apply("aaapppp", "")       = 1.0
         distance.apply("frog", "fog")       = 0.07
         distance.apply("fly", "ant")        = 1.0
         distance.apply("elephant", "hippo") = 0.56
         distance.apply("hippo", "elephant") = 0.56
         distance.apply("hippo", "zzzzzzzz") = 1.0
         distance.apply("hello", "hallo")    = 0.12
         distance.apply("ABC Corporation", "ABC Corp") = 0.09
         distance.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.05
         distance.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.08
         distance.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.12
         
        Specified by:
        apply in interface java.util.function.BiFunction<java.lang.CharSequence,java.lang.CharSequence,java.lang.Double>
        Specified by:
        apply in interface ObjectSimilarityScore<java.lang.CharSequence,java.lang.Double>
        Specified by:
        apply in interface SimilarityScore<java.lang.Double>
        Parameters:
        left - the first input, must not be null.
        right - the second input, must not be null.
        Returns:
        the resulting distance.
        Throws:
        java.lang.IllegalArgumentException - if either CharSequence input is null.
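
        To show how values like those in the table above can arise, here is a hedged plain-Java sketch of the conventional Jaro-Winkler formula, expressed in terms of the match count, half transposition count, and common prefix length. The class name JaroWinklerDistanceSketch, the scaling factor of 0.1, and the 0.7 boost threshold are assumptions taken from the usual description of the algorithm, not details stated on this page.

```java
final class JaroWinklerDistanceSketch {

    // Assumed conventional constants; not confirmed by this page.
    static final double SCALING_FACTOR = 0.1;
    static final double BOOST_THRESHOLD = 0.7;

    /**
     * Computes distance = 1 - similarity from the three match counts
     * and the lengths of the two inputs.
     */
    static double distance(int matches, int halfTranspositions, int prefix,
                           int leftLength, int rightLength) {
        if (leftLength == 0 && rightLength == 0) {
            return 0.0; // two empty inputs are treated as identical
        }
        if (matches == 0) {
            return 1.0; // nothing in common
        }
        double m = matches;
        // Jaro similarity: average of the two match ratios and the
        // share of matches that are not transposed.
        double jaro = (m / leftLength
                + m / rightLength
                + (m - halfTranspositions / 2.0) / m) / 3.0;
        // Winkler boost: reward a shared prefix, but only for already similar inputs.
        double similarity = jaro < BOOST_THRESHOLD
                ? jaro
                : jaro + SCALING_FACTOR * prefix * (1.0 - jaro);
        return 1.0 - similarity;
    }
}
```

        For "hello" and "hallo" (4 matches, 0 half transpositions, prefix 1, both of length 5) this sketch gives a distance of about 0.12, in line with the table. For "frog" and "fog" (3 matches, 0 half transpositions, prefix 1) it gives about 0.075, which the table lists as 0.07.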
      • apply

        public <E> java.lang.Double apply(SimilarityInput<E> left,
                                          SimilarityInput<E> right)
        Computes the Jaro-Winkler distance between two similarity inputs.
         distance.apply(null, null)          = Throws IllegalArgumentException
         distance.apply("foo", null)         = Throws IllegalArgumentException
         distance.apply(null, "foo")         = Throws IllegalArgumentException
         distance.apply("", "")              = 0.0
         distance.apply("foo", "foo")        = 0.0
         distance.apply("foo", "foo ")       = 0.06
         distance.apply("foo", "foo  ")      = 0.09
         distance.apply("foo", " foo ")      = 0.13
         distance.apply("foo", "  foo")      = 0.49
         distance.apply("", "a")             = 1.0
         distance.apply("aaapppp", "")       = 1.0
         distance.apply("frog", "fog")       = 0.07
         distance.apply("fly", "ant")        = 1.0
         distance.apply("elephant", "hippo") = 0.56
         distance.apply("hippo", "elephant") = 0.56
         distance.apply("hippo", "zzzzzzzz") = 1.0
         distance.apply("hello", "hallo")    = 0.12
         distance.apply("ABC Corporation", "ABC Corp") = 0.09
         distance.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.05
         distance.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.08
         distance.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.12
         
        Type Parameters:
        E - The type of similarity score unit.
        Parameters:
        left - the first input, must not be null.
        right - the second input, must not be null.
        Returns:
        the resulting distance.
        Throws:
        java.lang.IllegalArgumentException - if either input is null.
        Since:
        1.13.0