How to identify similarly pronounced words in SQL Server

Apr 29, 2022


SQL Server provides two functions for checking whether two strings are pronounced similarly.

They are:

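  • SOUNDEX() - This function takes a string as a parameter and returns a four-character code, called the Soundex code. The code consists of the first letter of the string followed by three digits that represent the consonants that follow, padded with zeroes if necessary. When the code is calculated, the vowels (A, E, I, O, U) and the letters H, W, and Y are ignored unless they are the first letter of the string.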

  • DIFFERENCE() - This function takes two strings as parameters and returns an integer value from 0 to 4. Internally, it calculates the SOUNDEX code for each string and measures how closely the two codes match: 0 indicates weak or no similarity, and 4 indicates strongly similar or identical codes.
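
As a quick illustration of the rule about ignored letters, the query below compares two spellings of the same surname. It is a minimal sketch that assumes a typical Latin collation; the names 'Smith' and 'Smythe' and the column aliases are chosen only for illustration.

```sql
-- Both spellings differ only in letters that SOUNDEX ignores (I, Y, E, H),
-- so they should reduce to the same code: S, then 5 for M, 3 for T, padded with 0.
SELECT SOUNDEX('Smith')              AS Smith,      -- S530
       SOUNDEX('Smythe')             AS Smythe,     -- S530
       DIFFERENCE('Smith', 'Smythe') AS Similarity; -- 4, because the codes are identical
```

Because the two codes are identical, DIFFERENCE reports the highest score even though the strings themselves are not equal.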

SELECT SOUNDEX('SQL') AS [SQL],
       SOUNDEX('Sequel') AS Sequel,
       DIFFERENCE('SQL', 'Sequel') AS Similarity;

SELECT SOUNDEX('Michael Jackson') AS Michael_Jackson,
       SOUNDEX('Mitchel Johnson') AS Mitchel_Johnson,
       DIFFERENCE('Michael Jackson', 'Mitchel Johnson') AS Similarity;

SELECT SOUNDEX('Ramesh') AS Ramesh,
       SOUNDEX('Suresh') AS Suresh,
       DIFFERENCE('Ramesh', 'Suresh') AS Similarity;

SELECT SOUNDEX('Tamil') AS Tamil,
       SOUNDEX('Malayalam') AS Malayalam,
       DIFFERENCE('Tamil', 'Malayalam') AS Similarity;


How to interpret the output of the DIFFERENCE function:

Value   Meaning
0       Weak or no similarity
1       Very little similarity
2       Low similarity
3       Somewhat similar
4       Strongly similar or identical SOUNDEX codes
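
In practice, the score is often used as a filter. The following is a minimal sketch of that idea, not a recommended production pattern: the table variable @Customers, its CustomerName column, the sample rows, and the threshold of 3 are all assumptions made for illustration.

```sql
-- Hypothetical sample data; the table variable and its column are illustrative only.
DECLARE @Customers TABLE (CustomerName varchar(50));

INSERT INTO @Customers (CustomerName)
VALUES ('Smith'), ('Smythe'), ('Schmidt'), ('Johnson');

-- Keep only the rows whose name scores 3 or 4 against the search term.
-- The threshold of 3 is an assumption; tune it for your own data.
SELECT CustomerName,
       SOUNDEX(CustomerName)             AS SoundexCode,
       DIFFERENCE(CustomerName, 'Smith') AS Similarity
FROM @Customers
WHERE DIFFERENCE(CustomerName, 'Smith') >= 3
ORDER BY Similarity DESC;
```

Because DIFFERENCE is evaluated for every row, a filter like this cannot use an ordinary index on the name column, so it is best suited to small tables or to a candidate set that has already been narrowed down.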

