question Find the similarity percent between two strings
-
question
similar("Apple","Appel") => 80%
similar("Apple","Mango") => 0%
-
answer
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

>>> similar("Apple","Appel")
0.8
>>> similar("Apple","Mango")
0.0
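The question asks for a percentage, while ratio() returns a float between 0 and 1. A minimal sketch of the conversion, assuming rounding to the nearest whole percent is acceptable (similar_percent is a hypothetical helper name, not part of difflib):

```python
from difflib import SequenceMatcher


def similar_percent(a: str, b: str) -> int:
    """Return the SequenceMatcher ratio of a and b as a whole-number percentage."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)


print(similar_percent("Apple", "Appel"))  # 80
print(similar_percent("Apple", "Mango"))  # 0
```

Note that the comparison is case-sensitive, so "Apple" and "Mango" share no characters even though one has "A" and the other "a".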
-
reference Fuzzy string comparison in Python, confused with which library to use [closed]
-
question
import Levenshtein
Levenshtein.ratio('hello world', 'hello')
Result: 0.625

import difflib
difflib.SequenceMatcher(None, 'hello world', 'hello').ratio()
Result: 0.625
-
answer
difflib.SequenceMatcher => Ratcliff/Obershelp algorithm
Levenshtein => Levenshtein algorithm
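To make the difference concrete: Ratcliff/Obershelp scores two strings by their matching blocks, while Levenshtein counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. A minimal pure-Python sketch of the latter follows. levenshtein_distance is a hypothetical helper written for illustration, not the implementation in the Levenshtein package, and it returns only the raw distance rather than a normalized ratio.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance between a and b with unit cost per insert, delete, substitute."""
    # previous[j] is the distance between the prefix of a handled so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("hello world", "hello"))  # 6
print(levenshtein_distance("Apple", "Appel"))        # 2
```

The swapped "le"/"el" costs 2 here because plain Levenshtein has no transposition operation.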
-
-
FuzzyWuzzy: Fuzzy String Matching in Python
-
string similarity
from difflib import SequenceMatcher
from fuzzywuzzy import fuzz

m = SequenceMatcher(None, 'new york mets', 'new york meats')
m.ratio() => 0.9629...

fuzz.ratio('new york mets', 'new york meats') => 96
-
partial string similarity
fuzz.ratio('yankees', 'new york yankees') => 60
fuzz.ratio('new york mets', 'new york yankees') => 75

fuzz.partial_ratio('yankees', 'new york yankees') => 100
fuzz.partial_ratio('new york mets', 'new york yankees') => 69
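The idea behind partial matching is to score the shorter string against the best-matching substring of the longer one. A rough sketch using only difflib follows. partial_ratio_sketch is a hypothetical name, and the brute-force window scan is an assumption made for clarity, not fuzzywuzzy's actual implementation, so its scores can differ from the library's on inexact matches.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best 0-100 score of the shorter string against any same-length slice of the longer."""
    shorter, longer = sorted((s1, s2), key=len)
    window = len(shorter)
    best = 0.0
    # Slide a window of the shorter string's length across the longer string.
    for start in range(len(longer) - window + 1):
        candidate = longer[start:start + window]
        best = max(best, SequenceMatcher(None, shorter, candidate).ratio())
    return round(best * 100)


print(partial_ratio_sketch('yankees', 'new york yankees'))  # 100
```

Because 'yankees' appears verbatim inside 'new york yankees', one window matches exactly and the score is 100, whereas the plain ratio is penalized for the extra 'new york ' prefix.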
-
out of order
fuzz.ratio('new york mets vs atlanta braves', 'atlanta braves vs new york mets') => 45
fuzz.partial_ratio('new york mets vs atlanta braves', 'atlanta braves vs new york mets') => 45

# token sort: split into tokens, sort them alphabetically, rejoin, then compare
'new york mets vs atlanta braves' --> 'atlanta braves mets new vs york'
fuzz.token_sort_ratio('new york mets vs atlanta braves', 'atlanta braves vs new york mets') => 100

# token set
s1 = 'mariners vs angels'
s2 = 'los angeles angels of anaheim at seattle mariners'
# after sort
t1 = 'angels mariners vs'
t2 = 'anaheim angeles angels at los mariners of seattle'
fuzz.token_set_ratio('mariners vs angels', 'los angeles angels of anaheim at seattle mariners') => 90
fuzz.token_set_ratio('sirhan, sirhan', 'sirhan') => 100
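The two token-based scorers can be approximated with difflib alone. The sketch below is an illustration under assumptions, not fuzzywuzzy's code: the function names are hypothetical, tokenizing with a lowercase \w+ regex stands in for the library's preprocessing, and the token-set variant compares the sorted shared tokens against each string's shared-plus-remaining tokens and keeps the best of the three pairwise scores.

```python
import re
from difflib import SequenceMatcher


def _tokens(s):
    """Lowercase the string and split it into alphanumeric tokens."""
    return re.findall(r"\w+", s.lower())


def _ratio(a, b):
    """SequenceMatcher ratio scaled to a 0-100 integer."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def token_sort_ratio_sketch(s1, s2):
    """Compare the strings after sorting their tokens alphabetically."""
    return _ratio(" ".join(sorted(_tokens(s1))), " ".join(sorted(_tokens(s2))))


def token_set_ratio_sketch(s1, s2):
    """Compare shared tokens against shared + remaining tokens of each string."""
    set1, set2 = set(_tokens(s1)), set(_tokens(s2))
    common = " ".join(sorted(set1 & set2))
    t1 = (common + " " + " ".join(sorted(set1 - set2))).strip()
    t2 = (common + " " + " ".join(sorted(set2 - set1))).strip()
    return max(_ratio(common, t1), _ratio(common, t2), _ratio(t1, t2))


print(token_sort_ratio_sketch('new york mets vs atlanta braves',
                              'atlanta braves vs new york mets'))  # 100
print(token_set_ratio_sketch('sirhan, sirhan', 'sirhan'))          # 100
```

Sorting removes the effect of word order, and working on sets removes the effect of repeated or extra tokens, which is why 'sirhan, sirhan' scores 100 against 'sirhan'.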