Fuzzy Matching Code in T-SQL Using BK-Tree Structure

sfelman/FuzzyMatch

FuzzyMatch

Fuzzy Matching Code in T-SQL

Use a dictionary of words or other strings that you would like to fuzzy match.

For this example I used https://github.com/zeisler/scrabble/blob/master/db/dictionary.csv

Order of SQL for Setup:

  1. F_Levenshtein.sql - altered version of https://blog.softwx.net/2014/12/optimizing-levenshtein-algorithm-in-tsql.html
  2. BK_Tree.sql
  3. P_BK_Tree_Insert.sql
  4. BK_Tree_Triggers.sql
  5. P_BK_Tree_Search.sql
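
For readers unfamiliar with the distance function that the rest of the setup depends on, here is a minimal Python sketch of plain Levenshtein distance. It is not the contents of F_Levenshtein.sql (which is T-SQL and based on an optimized variant); the function name `levenshtein` and the two-row approach are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    # previous[j] = distance from the processed prefix of a to b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Because the distance is symmetric and obeys the triangle inequality, it can be used as the metric for a BK-tree.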

To initialize the tree, run a normal insert (the insert trigger takes care of the P_BK_Tree_Insert calls), such as:

insert into BK_Tree (word) select word from Dictionary

Building the BK-Tree takes O(N) insert calls, one per word. This took ~2 hours for the 172k-word dictionary.

BK-Tree search runs in roughly O(log N) time when the allowed edit distance is small.

To remove a word from the tree, run: UPDATE BK_Tree SET active = 0 WHERE word = 'word'
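
The insert, search, and removal behaviour described above can be illustrated with a small in-memory Python sketch. This is not the repository's T-SQL: the names `Node`, `BKTree`, `edit_distance`, and `deactivate` are hypothetical, and the real P_BK_Tree_Insert and P_BK_Tree_Search procedures may differ in detail. The `active` flag mirrors the `active` column used in the UPDATE statement.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (stand-in for the repo's T-SQL function)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class Node:
    def __init__(self, word: str) -> None:
        self.word = word
        self.active = True                   # soft-delete flag
        self.children: Dict[int, "Node"] = {}  # keyed by distance to this node


class BKTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, word: str) -> None:
        if self.root is None:
            self.root = Node(word)
            return
        node = self.root
        while True:
            d = edit_distance(word, node.word)
            if d == 0:
                node.active = True  # assumption: re-inserting reactivates a word
                return
            if d not in node.children:
                node.children[d] = Node(word)
                return
            node = node.children[d]  # walk down the edge labelled d

    def deactivate(self, word: str) -> bool:
        """Mark a word inactive; the node stays so its subtree remains reachable."""
        node = self.root
        while node is not None:
            d = edit_distance(word, node.word)
            if d == 0:
                node.active = False
                return True
            node = node.children.get(d)
        return False

    def search(self, query: str, max_distance: int) -> List[Tuple[int, str]]:
        results: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = edit_distance(query, node.word)
            if d <= max_distance and node.active:
                results.append((d, node.word))
            # Triangle inequality: matches can only sit under edges whose
            # label lies in [d - max_distance, d + max_distance].
            for label, child in node.children.items():
                if d - max_distance <= label <= d + max_distance:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.insert(w)
print(tree.search("bok", 1))
tree.deactivate("boo")
print(tree.search("bok", 1))
```

In this sketch, removal only flips a flag because each child hangs off its parent by distance; physically deleting a node would orphan its subtree, which is one plausible reason for a soft-delete design.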

Some example outputs of P_BK_Tree_Search on the Scrabble words dictionary, with their runtimes: (three screenshots)
