Class JaroWinklerSimilarity
java.lang.Object
    org.apache.commons.text.similarity.JaroWinklerSimilarity
- All Implemented Interfaces:
java.util.function.BiFunction<java.lang.CharSequence, java.lang.CharSequence, java.lang.Double>, ObjectSimilarityScore<java.lang.CharSequence, java.lang.Double>, SimilarityScore<java.lang.Double>
public class JaroWinklerSimilarity extends java.lang.Object implements SimilarityScore<java.lang.Double>
A similarity algorithm indicating the percentage of matched characters between two character sequences.
The Jaro measure is the weighted sum of the percentage of matched characters from each sequence and the transposed characters. Winkler increased this measure for matching initial characters.
This implementation is based on the Jaro Winkler similarity algorithm from https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance.
This code has been adapted from Apache Commons Lang 3.3.
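For orientation, the measure described above is commonly written as follows. This is a sketch based on the usual textbook formulation, not a statement of this class's exact internals. The symbols are assumptions introduced here: m is the number of matching characters, t is the number of transpositions (half the count of matched characters that are out of order), |s_1| and |s_2| are the input lengths, \ell is the length of the common prefix (conventionally capped at 4), and p is the prefix scaling factor (conventionally 0.1).

```latex
\[
\mathrm{sim}_j =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise,}
\end{cases}
\qquad
\mathrm{sim}_w = \mathrm{sim}_j + \ell \, p \, \left(1 - \mathrm{sim}_j\right)
\]
```

For "frog" and "fog", m = 3, t = 0, and \ell = 1, which gives sim_j = (3/4 + 3/3 + 1)/3 ≈ 0.917 and sim_w ≈ 0.925, in line with the 0.93 listed for that pair in the method examples.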
- Since:
- 1.7
-
-
Constructor Summary
Constructor                Description
JaroWinklerSimilarity()    Creates a new instance.
-
Method Summary
Modifier and Type, Method, Description

java.lang.Double apply(java.lang.CharSequence left, java.lang.CharSequence right)
    Computes the Jaro Winkler Similarity between two character sequences.
<E> java.lang.Double apply(SimilarityInput<E> left, SimilarityInput<E> right)
    Computes the Jaro Winkler Similarity between two character sequences.
protected static int[] matches(java.lang.CharSequence first, java.lang.CharSequence second)
    Computes the Jaro-Winkler string matches, half transpositions, prefix array.
protected static <E> int[] matches(SimilarityInput<E> first, SimilarityInput<E> second)
    Computes the Jaro-Winkler string matches, half transpositions, prefix array.
-
-
-
Constructor Detail
-
JaroWinklerSimilarity
public JaroWinklerSimilarity()
Creates a new instance.
-
-
Method Detail
-
matches
protected static int[] matches(java.lang.CharSequence first, java.lang.CharSequence second)
Computes the Jaro-Winkler string matches, half transpositions, prefix array.
- Parameters:
first - the first input to be matched.
second - the second input to be matched.
- Returns:
- mtp array containing: matches, half transpositions, and prefix.
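The three values in the returned array can be illustrated with a standalone sketch. It is written from the general description of the Jaro-Winkler algorithm, not copied from the Commons Text source. The class name `JaroMatchesSketch`, the matching window of half the longer length minus one, and the prefix cap of 4 are assumptions made for illustration.

```java
import java.util.Arrays;

/** Standalone illustration only; hypothetical class, not the Commons Text implementation. */
final class JaroMatchesSketch {

    /** Returns {matches, halfTranspositions, commonPrefixLength}. */
    static int[] matches(CharSequence first, CharSequence second) {
        CharSequence max;
        CharSequence min;
        if (first.length() > second.length()) {
            max = first;
            min = second;
        } else {
            max = second;
            min = first;
        }
        // Assumption: equal characters count as a match only within this window.
        int range = Math.max(max.length() / 2 - 1, 0);
        boolean[] minMatched = new boolean[min.length()];
        boolean[] maxMatched = new boolean[max.length()];
        int matches = 0;
        for (int mi = 0; mi < min.length(); mi++) {
            int lo = Math.max(mi - range, 0);
            int hi = Math.min(mi + range + 1, max.length());
            for (int xi = lo; xi < hi; xi++) {
                if (!maxMatched[xi] && min.charAt(mi) == max.charAt(xi)) {
                    minMatched[mi] = true;
                    maxMatched[xi] = true;
                    matches++;
                    break;
                }
            }
        }
        // Collect the matched characters of each input in their original order.
        char[] fromMin = new char[matches];
        char[] fromMax = new char[matches];
        for (int i = 0, k = 0; i < min.length(); i++) {
            if (minMatched[i]) {
                fromMin[k++] = min.charAt(i);
            }
        }
        for (int i = 0, k = 0; i < max.length(); i++) {
            if (maxMatched[i]) {
                fromMax[k++] = max.charAt(i);
            }
        }
        // Each position where the two matched sequences differ is one half transposition.
        int halfTranspositions = 0;
        for (int i = 0; i < matches; i++) {
            if (fromMin[i] != fromMax[i]) {
                halfTranspositions++;
            }
        }
        // Assumption: the common prefix is capped at 4 characters.
        int prefix = 0;
        int limit = Math.min(4, min.length());
        for (int i = 0; i < limit && first.charAt(i) == second.charAt(i); i++) {
            prefix++;
        }
        return new int[] {matches, halfTranspositions, prefix};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(matches("frog", "fog")));      // [3, 0, 1]
        System.out.println(Arrays.toString(matches("MARTHA", "MARHTA"))); // [6, 2, 3]
    }
}
```

In the second call, the swapped "T" and "H" each count as one half transposition, so the pair contributes a single full transposition when the score is computed.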
-
matches
protected static <E> int[] matches(SimilarityInput<E> first, SimilarityInput<E> second)
Computes the Jaro-Winkler string matches, half transpositions, prefix array.
- Type Parameters:
E - The type of similarity score unit.
- Parameters:
first - the first input to be matched.
second - the second input to be matched.
- Returns:
- mtp array containing: matches, half transpositions, and prefix.
- Since:
- 1.13.0
-
apply
public java.lang.Double apply(java.lang.CharSequence left, java.lang.CharSequence right)
Computes the Jaro Winkler Similarity between two character sequences.

    sim.apply(null, null)          = Throws IllegalArgumentException
    sim.apply("foo", null)         = Throws IllegalArgumentException
    sim.apply(null, "foo")         = Throws IllegalArgumentException
    sim.apply("", "")              = 1.0
    sim.apply("foo", "foo")        = 1.0
    sim.apply("foo", "foo ")       = 0.94
    sim.apply("foo", "foo  ")      = 0.91
    sim.apply("foo", " foo ")      = 0.87
    sim.apply("foo", "  foo")      = 0.51
    sim.apply("", "a")             = 0.0
    sim.apply("aaapppp", "")       = 0.0
    sim.apply("frog", "fog")       = 0.93
    sim.apply("fly", "ant")        = 0.0
    sim.apply("elephant", "hippo") = 0.44
    sim.apply("hippo", "elephant") = 0.44
    sim.apply("hippo", "zzzzzzzz") = 0.0
    sim.apply("hello", "hallo")    = 0.88
    sim.apply("ABC Corporation", "ABC Corp") = 0.91
    sim.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.95
    sim.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.92
    sim.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.88

- Specified by:
apply in interface java.util.function.BiFunction<java.lang.CharSequence, java.lang.CharSequence, java.lang.Double>
- Specified by:
apply in interface ObjectSimilarityScore<java.lang.CharSequence, java.lang.Double>
- Specified by:
apply in interface SimilarityScore<java.lang.Double>
- Parameters:
left - the first input, must not be null.
right - the second input, must not be null.
- Returns:
- result similarity.
- Throws:
java.lang.IllegalArgumentException - if either CharSequence input is null.
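To show how values such as 0.93 or 0.88 in the examples can arise, here is a minimal sketch that turns a match count, a half-transposition count, and a prefix length into a score. It is not the library's code: the class name `JaroWinklerScoreSketch`, the scaling factor of 0.1, and the 0.7 threshold below which no prefix boost is applied are assumptions taken from the conventional formulation. Null checks and the equal-inputs shortcut (which yields 1.0) are deliberately left out.

```java
/** Standalone illustration only; hypothetical class, not the Commons Text implementation. */
final class JaroWinklerScoreSketch {

    // Assumption: conventional Winkler prefix scaling factor.
    static final double SCALING_FACTOR = 0.1;
    // Assumption: scores below this Jaro value receive no prefix boost.
    static final double BOOST_THRESHOLD = 0.7;

    /**
     * Combines the three counts with the input lengths into a similarity in [0, 1].
     * Callers are expected to pass matches no larger than either length.
     */
    static double score(int matches, int halfTranspositions, int prefix,
                        int leftLength, int rightLength) {
        if (matches == 0) {
            return 0.0;
        }
        double m = matches;
        // Two half transpositions make one full transposition.
        double jaro = (m / leftLength + m / rightLength
                + (m - halfTranspositions / 2.0) / m) / 3.0;
        return jaro < BOOST_THRESHOLD
                ? jaro
                : jaro + SCALING_FACTOR * prefix * (1.0 - jaro);
    }

    public static void main(String[] args) {
        // "frog" vs "fog": 3 matches, 0 half transpositions, common prefix "f".
        System.out.println(score(3, 0, 1, 4, 3));
        // "hello" vs "hallo": 4 matches, 0 half transpositions, common prefix "h".
        System.out.println(score(4, 0, 1, 5, 5));
        // "fly" vs "ant": no matches at all.
        System.out.println(score(0, 0, 0, 3, 3));
    }
}
```

The first two calls produce values close to 0.925 and 0.88, which is consistent with the two-decimal figures listed above if those are read as rounded results.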
-
apply
public <E> java.lang.Double apply(SimilarityInput<E> left, SimilarityInput<E> right)
Computes the Jaro Winkler Similarity between two character sequences.

    sim.apply(null, null)          = Throws IllegalArgumentException
    sim.apply("foo", null)         = Throws IllegalArgumentException
    sim.apply(null, "foo")         = Throws IllegalArgumentException
    sim.apply("", "")              = 1.0
    sim.apply("foo", "foo")        = 1.0
    sim.apply("foo", "foo ")       = 0.94
    sim.apply("foo", "foo  ")      = 0.91
    sim.apply("foo", " foo ")      = 0.87
    sim.apply("foo", "  foo")      = 0.51
    sim.apply("", "a")             = 0.0
    sim.apply("aaapppp", "")       = 0.0
    sim.apply("frog", "fog")       = 0.93
    sim.apply("fly", "ant")        = 0.0
    sim.apply("elephant", "hippo") = 0.44
    sim.apply("hippo", "elephant") = 0.44
    sim.apply("hippo", "zzzzzzzz") = 0.0
    sim.apply("hello", "hallo")    = 0.88
    sim.apply("ABC Corporation", "ABC Corp") = 0.91
    sim.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.95
    sim.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.92
    sim.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.88

- Type Parameters:
E - The type of similarity score unit.
- Parameters:
left - the first input, must not be null.
right - the second input, must not be null.
- Returns:
- result similarity.
- Throws:
java.lang.IllegalArgumentException - if either CharSequence input is null.
- Since:
- 1.13.0
-
-