How to identify similarly pronounced words in SQL Server?

SQL Server provides two functions for checking whether two strings are pronounced similarly:

  • SOUNDEX() - This function takes a string as a parameter and returns a four-character code, known as the Soundex code: the first letter of the string followed by three digits. When computing the code, it ignores the vowels (A, E, I, O, U) and the letters H, W, and Y unless they are the first letter of the string.

  • DIFFERENCE() - This function takes two strings as parameters and returns an integer value from 0 to 4. Internally, it computes the SOUNDEX code for each string and compares the two codes; the more closely the codes match, the higher the value.

SELECT SOUNDEX('SQL') AS [SQL],
       SOUNDEX('Sequel') AS Sequel,
       DIFFERENCE('SQL', 'Sequel') AS Similarity;

SELECT SOUNDEX('Michael Jackson') AS Michael_Jackson,
       SOUNDEX('Mitchel Johnson') AS Mitchel_Johnson,
       DIFFERENCE('Michael Jackson', 'Mitchel Johnson') AS Similarity;

SELECT SOUNDEX('Ramesh') AS Ramesh,
       SOUNDEX('Suresh') AS Suresh,
       DIFFERENCE('Ramesh', 'Suresh') AS Similarity;

SELECT SOUNDEX('Tamil') AS Tamil,
       SOUNDEX('Malayalam') AS Malayalam,
       DIFFERENCE('Tamil', 'Malayalam') AS Similarity;


The value returned by the DIFFERENCE function can be interpreted as follows:

0 or 1 - Not similar
2 - Slightly similar
3 - Somewhat similar
4 - Exact match / mostly similar
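
In practice, DIFFERENCE is often used in a WHERE clause to find rows that sound like a search term. The following is only a minimal sketch: the table variable @Customers, its CustomerName column, the sample names, and the threshold of 3 are all hypothetical and chosen purely for illustration.

```sql
-- Hypothetical sample data; the table and column names are illustrative only.
DECLARE @Customers TABLE (CustomerName VARCHAR(50));

INSERT INTO @Customers (CustomerName)
VALUES ('Smith'), ('Smyth'), ('Smythe'), ('Jones');

-- The name we want to find phonetic matches for.
DECLARE @SearchName VARCHAR(50) = 'Smith';

-- Keep only the rows whose Soundex code is close to that of the search term.
SELECT CustomerName,
       SOUNDEX(CustomerName)                 AS SoundexCode,
       DIFFERENCE(CustomerName, @SearchName) AS Similarity
FROM @Customers
WHERE DIFFERENCE(CustomerName, @SearchName) >= 3   -- assumed threshold: "somewhat similar" or better
ORDER BY Similarity DESC, CustomerName;
```

Raising the threshold to 4 would keep only the closest matches, while lowering it would return more, but weaker, candidates.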

