New York State Identification and Intelligence System (NYSIIS) Phonetic Encoder
Source implementation by Steve Hobbs, Comserve Limited
Converted to SAS by Anna Ferrante, August, 1990
Converted to JavaScript by Matt Pérez, July 1999, and later modified to match Taft's original algorithm, July 2006.
NIST reference page on NYSIIS, with links to other algorithms
Alternate algorithm by Ross Patterson, Rutgers University, May 5, 1988
Alternate implementation in C# by Reggie Beneke
Mark Antro asked what to do when the name ends with "RDT" (as in GILMOURDT): should it result in "...DT" or "...D"? After reviewing various implementations, it seems to me that the original algorithm has consistently been interpreted to refer to the last two characters of the name. Taft is silent on three-character strings. On the other hand, it sounds to me as if "RDT" should reduce to "D" and not "DT."
Thanks to Steve Skalski, Reggie Beneke, John Morrill, Graham Case and Anthony Wilson for catching various bugs. All remaining bugs are mine.

Original Algorithm:

1. Transcode first characters of name:
MAC » MCC
KN » NN
K » C
PH » FF
PF » FF
SCH » SSS
2. Transcode last characters of name:
EE, IE » Y
DT,RT,RD,NT,ND » D
3. First character of key = first character of name.
4. Transcode remaining characters by following these rules, incrementing by one character each time:
EV » AF else A,E,I,O,U » A
Q » G  
Z » S  
M » N  
KN » N else K » C
SCH » SSS  
PH » FF  
H » If previous or next is nonvowel, previous
W » If previous is vowel, previous
Add current to key if current != last key character
5. If last character is S, remove it
6. If last characters are AY, replace with Y
7. If last character is A, remove it
8. Collapse all strings of repeated characters
9. Add original first character of name as first character of key
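
Steps 1 and 2 and the cleanup in steps 5 to 8 are mechanical enough to sketch in JavaScript. What follows is a minimal illustration, not the code behind this page. The names PREFIX_RULES, SUFFIX_RULES, transcodeEnds and tidyKey are hypothetical, the input is assumed to be already upper-cased and stripped of non-letters, and step 2 is read as applying once to the last two characters only. Steps 3, 4 and 9 are deliberately left out.

```javascript
// Step 1: leading substitutions. Longer patterns come before shorter ones
// that share a start (KN before K) so the first match is the right one.
const PREFIX_RULES = [
  ["MAC", "MCC"],
  ["SCH", "SSS"],
  ["KN", "NN"],
  ["PH", "FF"],
  ["PF", "FF"],
  ["K", "C"],
];

// Step 2: trailing substitutions on the last two characters.
const SUFFIX_RULES = [
  ["EE", "Y"], ["IE", "Y"],
  ["DT", "D"], ["RT", "D"], ["RD", "D"], ["NT", "D"], ["ND", "D"],
];

// Apply at most one prefix rule and at most one suffix rule.
function transcodeEnds(name) {
  let s = name;
  const prefix = PREFIX_RULES.find(([from]) => s.startsWith(from));
  if (prefix) s = prefix[1] + s.slice(prefix[0].length);
  const suffix = SUFFIX_RULES.find(([from]) => s.endsWith(from));
  if (suffix) s = s.slice(0, -suffix[0].length) + suffix[1];
  return s;
}

// Steps 5 to 8, applied in the listed order to a key built by step 4.
function tidyKey(key) {
  let k = key;
  if (k.endsWith("S")) k = k.slice(0, -1);         // step 5
  if (k.endsWith("AY")) k = k.slice(0, -2) + "Y";  // step 6
  if (k.endsWith("A")) k = k.slice(0, -1);         // step 7
  return k.replace(/(.)\1+/g, "$1");               // step 8
}

console.log(transcodeEnds("SCHMIDT"));   // SSSMID
console.log(transcodeEnds("GILMOURDT")); // GILMOURD
console.log(tidyKey("JANAS"));           // JAN
```

Under this two-character reading, GILMOURDT keeps its "R" and ends in "RD" after step 2, which is the behaviour the note on "RDT" near the top of the page describes.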

Modified Algorithm:

1. if the first character of the name is a vowel, remember it
2. remove all 'S' and 'Z' chars from the end of the name
3. transcode first characters of name
MAC » MC
PF » F
4. Transcode trailing strings as follows,
IX » IC
EX » EC
YE,EE,IE » Y
DT,RT,RD,NT,ND » D
repeat this last step as necessary
5. transcode 'EV' to 'EF' if not at start of name
6. use first character of name as first character of key
7. remove any 'W' that follows a vowel
8. replace all vowels with 'A' and collapse all strings of repeated 'A' to one
9. transcode 'GHT' to 'GT'
10. transcode 'DG' to 'G'
11. transcode 'PH' to 'F'
12. if not first character, eliminate all 'H' preceded or followed by a vowel
13. change 'KN' to 'N', else 'K' to 'C'
14. if not first character, change 'M' to 'N'
15. if not first character, change 'Q' to 'G'
16. transcode 'SH' to 'S'
17. transcode 'SCH' to 'S'
18. transcode 'YW' to 'Y'
19. if not first or last character, change 'Y' to 'A'
20. transcode 'WR' to 'R'
21. if not first character, change 'Z' to 'S'
22. transcode terminal 'AY' to 'Y'
23. remove trailing vowels
24. collapse all strings of repeated characters
25. if first character of original name is a vowel, prepend to code (or replace first transcoded 'A')
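
Steps 2 to 4 of the modified algorithm can be sketched as three small JavaScript functions. This is only an illustration under stated assumptions: the function names are hypothetical, the input is assumed to be already preprocessed, and "repeat this last step as necessary" is read as repeating the DT, RT, RD, NT, ND rule until the ending stops changing.

```javascript
// Step 2: drop every trailing 'S' or 'Z'.
function stripTrailingSZ(name) {
  return name.replace(/[SZ]+$/, "");
}

// Step 3: leading substitutions.
function transcodeStart(name) {
  if (name.startsWith("MAC")) return "MC" + name.slice(3);
  if (name.startsWith("PF")) return "F" + name.slice(2);
  return name;
}

// Step 4: trailing substitutions.
function transcodeTrailing(name) {
  let s = name
    .replace(/IX$/, "IC")
    .replace(/EX$/, "EC")
    .replace(/(YE|EE|IE)$/, "Y");
  // Assumption: only the D rule is repeated. Each pass shortens the
  // string by one character, so the loop always terminates.
  let previous;
  do {
    previous = s;
    s = s.replace(/(DT|RT|RD|NT|ND)$/, "D");
  } while (s !== previous);
  return s;
}

console.log(stripTrailingSZ("SANCHEZ"));     // SANCHE
console.log(transcodeStart("MACDONALD"));    // MCDONALD
console.log(transcodeTrailing("GILMOURDT")); // GILMOUD
```

With the repetition, GILMOURDT first becomes GILMOURD and then GILMOUD, so a trailing "RDT" reduces to a single "D".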

In both implementations, before the algorithm is applied, the input string is preprocessed as follows:

  1. Convert all characters to upper case
  2. Trim all trailing whitespace
  3. Remove "JR," "SR," and Roman numerals from the end of the string (here a "Roman numeral" is any run of 'I' and 'V' characters, even a malformed one)
  4. Remove all non-alpha characters
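
The four preprocessing steps can be sketched as follows. This is a hedged illustration rather than the code used by this page: the function name preprocess is hypothetical, and the sketch assumes that a generational suffix is only stripped when whitespace or a comma separates it from the rest of the name (so that a surname such as LEVI is not cut short) and that it may carry a trailing period.

```javascript
function preprocess(input) {
  let s = input.toUpperCase();   // 1. upper case
  s = s.trimEnd();               // 2. trim trailing whitespace
  // 3. strip a trailing JR, SR, or run of I and V characters.
  //    Assumption: a separator must precede the suffix.
  s = s.replace(/[\s,]+(JR|SR|[IV]+)\.?$/, "");
  s = s.replace(/[^A-Z]/g, "");  // 4. keep only the letters A to Z
  return s;
}

console.log(preprocess("Smith Jr.  "));  // SMITH
console.log(preprocess("O'Brien III"));  // OBRIEN
console.log(preprocess("Levi"));         // LEVI
```

Because step 4 keeps only A to Z, accented letters are dropped rather than mapped to their base letter in this sketch.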

Click here to see a simple test page

The original algorithm comes from Robert L. Taft, "Name Search Techniques", New York State Identification and Intelligence System.

According to the document Duplicate Record Detection [PDF] by Elmagarmid, Ipeirotis, & Verykios, the resulting code is limited to six characters.
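
If that six-character limit is wanted, it amounts to a truncation applied after the key has been built. A minimal sketch, in which the function name truncateKey and its default parameter are illustrative; neither algorithm listing above includes this step.

```javascript
// Keep at most maxLength characters of a finished key.
function truncateKey(key, maxLength = 6) {
  return key.slice(0, maxLength);
}

console.log(truncateKey("MCDANALD")); // MCDANA
console.log(truncateKey("SNAT"));     // SNAT
```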