NAACCR Record Merging

The Unreliable Identifying Key Problem

The CPDMS Database relies upon hidden keys to maintain integrity of patient records. Hidden keys consist of the combination of a numeric hospital id (HOSPID) and a numeric patient id (PATID). Hidden keys are automatically assigned by the CPDMS hospital registry software. The { HOSPID, PATID } pair yields a unique identifier for each patient in the database. They are called "hidden" keys because they are not normally visible to the end user. Hidden keys are necessary to allow a central registry to merge data from multiple institutions into a single database. They identify patient records reliably even when traditional patient identifiers change.

Hidden keys allow the central registry to detect when a hospital registry has updated other patient identifiers. For example, when the social security number (SSN) and last name of a previously submitted patient are corrected at the hospital and resubmitted to the central registry, how does the central registry detect that this is an update and not a new patient?

123-45-6789 John Smith = 223-45-6789 John Smithe

With hidden keys in place, the software identifies the previously submitted records using the hidden keys and thus detects that an update was performed. That is, the hidden keys will match even though the other identifying keys do not.
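The update detection described above can be sketched in a few lines of Python. This is only an illustration, not the CPDMS implementation: the PatientRecord class, the apply_submission function, and the key values 17 and 1001 are hypothetical, and the registry is modeled as a plain in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    hospid: int       # hidden key, part 1: numeric hospital id
    patid: int        # hidden key, part 2: numeric patient id
    ssn: str
    first_name: str
    last_name: str


def apply_submission(registry: dict, incoming: PatientRecord) -> str:
    """Store the record under its hidden key; report whether it was new or an update."""
    key = (incoming.hospid, incoming.patid)
    status = "update" if key in registry else "new"
    registry[key] = incoming  # the latest submission replaces the stored identifiers
    return status


registry = {}
first = apply_submission(registry, PatientRecord(17, 1001, "123-45-6789", "John", "Smith"))
# The SSN and last name are corrected, but the hidden key is unchanged.
second = apply_submission(registry, PatientRecord(17, 1001, "223-45-6789", "John", "Smithe"))
```

Because the lookup uses only the { HOSPID, PATID } pair, the corrected submission is classified as an update rather than a second patient.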

Regrettably, NAACCR records do not contain reliable hidden keys. While space is allocated in the NAACCR record for such keys, experience has shown that hospital registry software packages besides CPDMS do not output them. In order to merge NAACCR records into a CPDMS database, it is necessary to generate hidden keys for the NAACCR records. A set of algorithms has been developed to perform this task. This is a particularly difficult objective because we must rely on traditional patient identifiers to retain patient identity even while those identifiers are potentially changing. The process is by definition imperfect: when multiple identifiers for a single patient are modified simultaneously, the software may conclude, incorrectly, that the incoming record is a new patient. However, great effort has been made to minimize this possibility. On the positive side, our task is limited to maintaining hidden keys for individual hospital registries. In other words, we need only compare identifiers of patients against records submitted by the same hospital.

Much research has been devoted to the process of Probabilistic Record Linkage. This is a process of utilizing probability theory to decide whether a set of identifiers uniquely identifies each individual. Some probabilistic methods have been incorporated into this process for CPDMS, but it is not yet truly probabilistic. Probabilistic linkage would involve performing frequency analysis on individual identifiers and assigning weights to matching values. It is noted that commercial software such as MatchWare could be used to search for duplicates in the CPDMS database.
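As a rough illustration of frequency analysis and weighting, the sketch below computes an agreement weight for each value of a single identifier, in the style commonly used in the record linkage literature: log2(m / u), where u is approximated by the relative frequency of the value. This is not what CPDMS currently does. The function name agreement_weights, the m_prob value of 0.95, and the sample names are assumptions chosen for the example.

```python
import math
from collections import Counter


def agreement_weights(values, m_prob=0.95):
    """Return a weight per distinct value; rarer values earn larger weights.

    m_prob: assumed probability that a true match agrees on this identifier.
    u is approximated by the relative frequency of each value in the file,
    i.e. the chance that an unrelated record carries the same value.
    """
    counts = Counter(values)
    total = len(values)
    weights = {}
    for value, n in counts.items():
        u_prob = n / total
        weights[value] = math.log2(m_prob / u_prob)
    return weights


# Hypothetical last names drawn from one hospital's file.
last_names = ["SMITH"] * 4 + ["JONES"] * 2 + ["ZELENKA", "OKAFOR"]
weights = agreement_weights(last_names)
```

Under this scheme, two records agreeing on a rare last name count as stronger evidence of a match than two records agreeing on a common one.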

Ultimately this problem would be resolved if hospital-based registry software could generate hidden keys and output them in the NAACCR record. Alternatively, it would be resolved if such software could incorporate hidden keys generated by the central registry into the hospital record and output them with NAACCR records. Until this occurs, we will rely on the following algorithms.

Patient Identifying Keys Utilized by CPDMS NAACCR Merge

The following identifiers have been determined to be most reliable in identifying individual patients:

In order to merge NAACCR records into a CPDMS database, the NAACCR record must be converted into a CPDMS merge record, including the necessary hidden keys. Each incoming NAACCR record is compared against records stored in the NAMEKEY database. If a match is found, the hidden keys are extracted from the matching NAMEKEY record and the CPDMS merge record will be identified as the matching patient. The NAMEKEY record will also be updated with the current incoming information. If a match is not found, new hidden keys will be generated, a new NAMEKEY record with these keys will be saved, and the CPDMS records will identify a new patient.
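The lookup-or-generate flow above might look like the following sketch. Everything here is hypothetical: NameKeyStore stands in for the NAMEKEY database (held in memory, one store per hospital), assign_keys and its is_match parameter are illustrative names, and new patient ids are simply taken from a counter. The real matching rule is replaced by a trivial SSN comparison.

```python
from itertools import count


class NameKeyStore:
    """In-memory stand-in for one hospital's NAMEKEY records."""

    def __init__(self, hospid):
        self.hospid = hospid
        self.records = {}              # patid -> identifiers of that patient
        self._next_patid = count(1)    # assumed source of new patient ids

    def assign_keys(self, identifiers, is_match):
        """Return ((hospid, patid), is_new) for an incoming record."""
        for patid, stored in self.records.items():
            if is_match(stored, identifiers):
                # Match found: reuse the hidden keys, refresh the stored identifiers.
                self.records[patid] = identifiers
                return (self.hospid, patid), False
        # No match: generate new hidden keys and save a new NAMEKEY record.
        patid = next(self._next_patid)
        self.records[patid] = identifiers
        return (self.hospid, patid), True


def same_ssn(stored, incoming):
    # Placeholder rule for the example only.
    return stored["ssn"] == incoming["ssn"]


store = NameKeyStore(hospid=17)
k1, new1 = store.assign_keys({"ssn": "123-45-6789", "last": "Smith"}, same_ssn)
k2, new2 = store.assign_keys({"ssn": "123-45-6789", "last": "Smithe"}, same_ssn)
k3, new3 = store.assign_keys({"ssn": "999-99-9999", "last": "Jones"}, same_ssn)
```

The second submission reuses the keys of the first and overwrites its stored last name; the third receives freshly generated keys.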

The matching algorithms used to maintain the NAMEKEY database, and subsequently the hidden keys for CPDMS from NAACCR records, follow.

Matching Algorithms

If Hospital Accession Year and Number Match {
	If SSN Matches, Accept/Overwrite Patient Keys.
	If Last Name and First Name Match, Accept/Overwrite Patient Keys.
	Otherwise Reject for Review.
}

If SSN Matches {
	If Last Name, First Name Match, Accept/Overwrite Patient Keys.
	If NYSIIS Last, NYSIIS First, Sex & BirthDate Match, Accept/Overwrite Patient Keys.
	Otherwise Reject for Review.
}

If (Name and/or NYSIIS), Sex/BDate, Address, & Phone Match, Accept/Overwrite Record.
Otherwise, If Any 2 Match, Reject and Report for Review.
Otherwise, If Fewer Than 2 Match, Accept New Record.
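
The rules above can be read as a comparison of one incoming record against one stored NAMEKEY record. The Python sketch below is one possible reading, not the CPDMS code. It assumes that records are dictionaries with the hypothetical field names shown, that NYSIIS codes have already been computed and stored in nysiis_last and nysiis_first, that a blank field never counts as a match, and that "Any 2 Match" means two or three of the four groups (name, sex/birthdate, address, phone). An incoming record would be a new patient only if no stored record for that hospital matches.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT_OVERWRITE = "accept/overwrite"
    REJECT_FOR_REVIEW = "reject for review"
    ACCEPT_NEW = "accept new record"


def same(a, b, *fields):
    """True when every listed field is non-blank in a and equal in a and b."""
    return all(a.get(f) and a.get(f) == b.get(f) for f in fields)


def compare(stored, incoming):
    """Apply the three rule blocks in order to one stored/incoming pair."""
    ssn = same(stored, incoming, "ssn")
    name = same(stored, incoming, "last", "first")
    nysiis = same(stored, incoming, "nysiis_last", "nysiis_first")
    sex_bdate = same(stored, incoming, "sex", "birthdate")

    # Block 1: hospital accession year and number match.
    if same(stored, incoming, "acc_year", "acc_number"):
        if ssn or name:
            return Decision.ACCEPT_OVERWRITE
        return Decision.REJECT_FOR_REVIEW

    # Block 2: SSN matches.
    if ssn:
        if name or (nysiis and sex_bdate):
            return Decision.ACCEPT_OVERWRITE
        return Decision.REJECT_FOR_REVIEW

    # Block 3: count agreement across the four identifier groups.
    groups = [
        name or nysiis,
        sex_bdate,
        same(stored, incoming, "address"),
        same(stored, incoming, "phone"),
    ]
    matched = sum(groups)
    if matched == 4:
        return Decision.ACCEPT_OVERWRITE
    if matched >= 2:
        return Decision.REJECT_FOR_REVIEW
    return Decision.ACCEPT_NEW


# Hypothetical stored record; the NYSIIS values are placeholders, not real codes.
stored = {
    "acc_year": 1998, "acc_number": 42, "ssn": "123-45-6789",
    "last": "SMITH", "first": "JOHN", "nysiis_last": "X1", "nysiis_first": "Y1",
    "sex": "M", "birthdate": "1950-01-02", "address": "1 MAIN ST", "phone": "555-0100",
}
last_name_fixed = dict(stored, last="SMITHE", nysiis_last="X2")
ssn_and_name_fixed = dict(last_name_fixed, ssn="223-45-6789")
```

With these rules, a record whose accession number and SSN still match is accepted even though the last name changed, while a record whose SSN and last name both changed is rejected for review instead of being silently treated as a new patient.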

These algorithms will evolve as we gain experience merging NAACCR records into CPDMS central registries.

NAACCR Merge Procedures

Please follow these temporary procedures for NAACCR merging:

The list of merge files in /kcr/naaccr will appear. The list will display the hospital name, date of merge, submitting software, and the number of merge records.

Use the cursor to highlight the desired file, and press enter to select.

The selected file will be converted into a CPDMS merge file. Look for significant error messages during the conversion.

If the conversion has completed with no serious errors:

At this point the merge files will appear in the CPDMS merge menu. Be sure to check rejects.mrg for rejected NAACCR merge records.