edu.uky.kcr.recordlinkage.engine
Class AbstractLinkageEngine

java.lang.Object
  extended by edu.uky.kcr.recordlinkage.engine.AbstractLinkageEngine
All Implemented Interfaces:
LinkageEngine
Direct Known Subclasses:
ExactMatchLinkageEngine

public abstract class AbstractLinkageEngine
extends java.lang.Object
implements LinkageEngine

This class provides the most convenient way to create a LinkageEngine implementation.

It provides the following features to implementers:


An implementer will first subclass this class and then implement the getLinkageMatches(LinkageDataSource, DataSourceRecord) method to perform a linkage operation, returning a List of LinkageMatch objects.

A typical lifecycle from the point of view of an AbstractLinkageEngine implementation:
  1. A LinkageController creates an AbstractLinkageEngine object by reading the LinkageConfiguration.getLinkageEngineClassName() value as the name of the class to instantiate. The LinkageController then initializes the engine with the LinkageConfiguration object and calls LinkageEngine.findLinkedRecords(LinkageDataSource, LinkageDataSource).
  2. The AbstractLinkageEngine calls buildUnblockedLists(LinkageDataSource, LinkageDataSource) to create the lists of primary and secondary unblocked records from the BlockingConfiguration objects in LinkageConfiguration.
  3. The AbstractLinkageEngine then iterates through every unblocked record from the secondary data source and calls getLinkageMatches(LinkageDataSource, DataSourceRecord) with that record.
  4. An implementer will compare the records in the primary LinkageDataSource to the passed-in secondary record and create a list of LinkageMatch objects with a valid score for each match.
  5. For the list returned by each call to getLinkageMatches(LinkageDataSource, DataSourceRecord), the AbstractLinkageEngine iterates through the LinkageMatch objects and compares each LinkageMatch.getScore() value with LinkageConfiguration.getPositiveMatchCutoffScore() and LinkageConfiguration.getNegativeMatchCutoffScore() to determine whether the match is positive or indeterminate.
  6. Once all secondary records have been considered, the AbstractLinkageEngine will return a LinkageResultSet with the LinkageMatch results.
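
The lifecycle above follows a template-method shape: the base class owns the loop and the cutoff comparison, and the subclass supplies only the scoring. The following is a minimal sketch of steps 3 through 6, not the library's code. Every type in it (Match, Result, Engine, ToyEngine) is a hypothetical stand-in with plain strings for records, initialize takes two doubles instead of a LinkageConfiguration, and the boundary handling (score >= positive cutoff is positive, a score strictly between the cutoffs is indeterminate, anything else is dropped) is an assumption that the text above does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; the real classes live in edu.uky.kcr.recordlinkage.
class LifecycleSketch {
    /** Simplified stand-in for LinkageMatch: two records and a score. */
    static final class Match {
        final String primary;
        final String secondary;
        final double score;

        Match(String primary, String secondary, double score) {
            this.primary = primary;
            this.secondary = secondary;
            this.score = score;
        }
    }

    /** Simplified stand-in for LinkageResultSet. */
    static final class Result {
        final List<Match> positive = new ArrayList<>();
        final List<Match> indeterminate = new ArrayList<>();
    }

    /** Stand-in for the abstract engine: the loop is fixed, scoring is abstract. */
    static abstract class Engine {
        private double positiveCutoff;
        private double negativeCutoff;

        /** Step 1: called once after construction (two doubles replace the configuration object). */
        void initialize(double positiveCutoff, double negativeCutoff) {
            this.positiveCutoff = positiveCutoff;
            this.negativeCutoff = negativeCutoff;
        }

        /** Step 4: the only method a subclass supplies. */
        abstract List<Match> getLinkageMatches(List<String> primary, String secondaryRecord);

        /** Steps 3, 5 and 6: iterate, classify by cutoff, return the result. */
        final Result findLinkedRecords(List<String> primary, List<String> secondary) {
            Result result = new Result();
            for (String secondaryRecord : secondary) {
                for (Match match : getLinkageMatches(primary, secondaryRecord)) {
                    if (match.score >= positiveCutoff) {
                        result.positive.add(match);
                    } else if (match.score > negativeCutoff) {
                        result.indeterminate.add(match);
                    }
                    // Assumed: scores at or below the negative cutoff are dropped.
                }
            }
            return result;
        }
    }

    /** Toy scorer: 1.0 for identical strings, 0.5 if equal ignoring case, else 0.0. */
    static final class ToyEngine extends Engine {
        @Override
        List<Match> getLinkageMatches(List<String> primary, String secondaryRecord) {
            List<Match> matches = new ArrayList<>();
            for (String primaryRecord : primary) {
                double score = primaryRecord.equals(secondaryRecord) ? 1.0
                        : primaryRecord.equalsIgnoreCase(secondaryRecord) ? 0.5 : 0.0;
                matches.add(new Match(primaryRecord, secondaryRecord, score));
            }
            return matches;
        }
    }

    public static void main(String[] args) {
        Engine engine = new ToyEngine();
        engine.initialize(0.9, 0.2);
        Result result = engine.findLinkedRecords(
                Arrays.asList("Smith", "Jones"),
                Arrays.asList("Smith", "JONES", "Lee"));
        // prints "positive=1 indeterminate=1"
        System.out.println("positive=" + result.positive.size()
                + " indeterminate=" + result.indeterminate.size());
    }
}
```

In this sketch the subclass never sees the cutoffs, which mirrors the division of labor described in steps 4 and 5.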

NOTE: You do not need to define any constructors in your subclass of AbstractLinkageEngine. If you do define one, you must also define a constructor that takes no arguments, because the LinkageController instantiates LinkageEngines by class name.
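
To illustrate why the no-argument constructor matters, here is a small sketch of instantiation by class name using standard Java reflection. It is an assumption that the LinkageController works this way internally; the Engine interface and the three engine classes are hypothetical stand-ins. A class that declares only a constructor with arguments has no no-argument constructor, so the reflective lookup fails with NoSuchMethodException.

```java
class ReflectiveInstantiationSketch {
    /** Hypothetical stand-in for the LinkageEngine interface. */
    interface Engine {
        String getName();
    }

    /** Declares no constructors: the compiler supplies a no-argument one. */
    static class ImplicitDefaultEngine implements Engine {
        public String getName() {
            return "implicit";
        }
    }

    /** Declares only a one-argument constructor, so no no-argument constructor exists. */
    static class ArgsOnlyEngine implements Engine {
        private final String name;

        ArgsOnlyEngine(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    /** Declares an extra constructor and restores the no-argument one explicitly. */
    static class BothEngine implements Engine {
        private final String name;

        BothEngine() {
            this("both");
        }

        BothEngine(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    /** Creates an engine from its class name via the no-argument constructor. */
    static Engine instantiate(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return (Engine) type.getDeclaredConstructor().newInstance();
    }

    /** Convenience wrapper that reports success or failure instead of throwing. */
    static boolean canInstantiate(String className) {
        try {
            instantiate(className);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(ImplicitDefaultEngine.class.getName())); // true
        System.out.println(canInstantiate(ArgsOnlyEngine.class.getName()));        // false
        System.out.println(canInstantiate(BothEngine.class.getName()));            // true
    }
}
```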

Author:
ihands

Constructor Summary
AbstractLinkageEngine()
           
 
Method Summary
 LinkageResultSet findLinkedRecords(LinkageDataSource primaryDataSource, LinkageDataSource secondaryDataSource)
          Primary workhorse method for a linkage operation; this is where deterministic and probabilistic methods are used to match records.
 LinkageConfiguration getLinkageConfiguration()
           
abstract  java.util.List<LinkageMatch> getLinkageMatches(LinkageDataSource primaryDataSource, DataSourceRecord secondaryDataSourceRecord)
          This method should be the only method that a LinkageEngine implementer needs to implement.
 java.util.Set<DataSourceRecord> getPrimaryUnblocked()
           
 java.util.Set<DataSourceRecord> getSecondaryUnblocked()
           
 void initialize(LinkageConfiguration linkageConfiguration)
          This method is called immediately after the LinkageEngine is created, before LinkageEngine.findLinkedRecords(LinkageDataSource, LinkageDataSource) is called.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.uky.kcr.recordlinkage.engine.LinkageEngine
getName
 

Constructor Detail

AbstractLinkageEngine

public AbstractLinkageEngine()
Method Detail

initialize

public void initialize(LinkageConfiguration linkageConfiguration)
Description copied from interface: LinkageEngine
This method is called immediately after the LinkageEngine is created, before LinkageEngine.findLinkedRecords(LinkageDataSource, LinkageDataSource) is called.

Specified by:
initialize in interface LinkageEngine
Parameters:
linkageConfiguration - Configuration object containing the BlockingConfiguration, MatchingConfiguration, and cutoff scores necessary for a LinkageMatch to be determined by this engine.

getLinkageConfiguration

public LinkageConfiguration getLinkageConfiguration()
Returns:
LinkageConfiguration object that was passed into the LinkageEngine.initialize(LinkageConfiguration) method.

getLinkageMatches

public abstract java.util.List<LinkageMatch> getLinkageMatches(LinkageDataSource primaryDataSource,
                                                               DataSourceRecord secondaryDataSourceRecord)
                                                        throws LinkageException
This method should be the only method that a LinkageEngine implementer needs to implement. It is called after the unblocked lists have been populated so that unblocked records are available through getPrimaryUnblocked() and getSecondaryUnblocked(). An implementer needs to create LinkageMatch objects with a valid score for each record-to-record match from the primaryDataSource to the secondaryDataSourceRecord.

Parameters:
primaryDataSource - Source of data records to match against the secondary record.
secondaryDataSourceRecord - Single record from the secondary data source that should be scored against the primary records.
Returns:
List of LinkageMatch objects with a valid score.
Throws:
LinkageException
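
As an illustration of what an implementation of this method might look like, the sketch below scores one secondary record against every primary record by counting exactly equal fields. It does not use the real API: records are plain Map<String, String> objects in place of DataSourceRecord, the primary data source is a List, Match stands in for LinkageMatch, and the compared field names are made up. Treating a score as the fraction of equal fields, and omitting pairs that score zero, are both assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; none of these types belong to the library.
class GetLinkageMatchesSketch {
    /** Simplified stand-in for LinkageMatch. */
    static final class Match {
        final Map<String, String> primary;
        final Map<String, String> secondary;
        final double score;

        Match(Map<String, String> primary, Map<String, String> secondary, double score) {
            this.primary = primary;
            this.secondary = secondary;
            this.score = score;
        }
    }

    /** Made-up field names compared by this sketch. */
    static final List<String> FIELDS = Arrays.asList("lastName", "birthDate");

    /** Returns the fraction of compared fields whose non-null values are equal. */
    static double score(Map<String, String> a, Map<String, String> b) {
        int equal = 0;
        for (String field : FIELDS) {
            String value = a.get(field);
            if (value != null && value.equals(b.get(field))) {
                equal++;
            }
        }
        return (double) equal / FIELDS.size();
    }

    /** Scores one secondary record against each primary record. */
    static List<Match> getLinkageMatches(List<Map<String, String>> primaryRecords,
                                         Map<String, String> secondaryRecord) {
        List<Match> matches = new ArrayList<>();
        for (Map<String, String> primaryRecord : primaryRecords) {
            double score = score(primaryRecord, secondaryRecord);
            if (score > 0.0) { // assumed: pairs with nothing in common are not reported
                matches.add(new Match(primaryRecord, secondaryRecord, score));
            }
        }
        return matches;
    }

    /** Builds a record with the two made-up fields. */
    static Map<String, String> record(String lastName, String birthDate) {
        Map<String, String> record = new HashMap<>();
        record.put("lastName", lastName);
        record.put("birthDate", birthDate);
        return record;
    }

    public static void main(String[] args) {
        List<Map<String, String>> primary = Arrays.asList(
                record("Smith", "1970-01-01"),
                record("Smith", "1980-05-05"),
                record("Lee", "1999-09-09"));
        for (Match match : getLinkageMatches(primary, record("Smith", "1970-01-01"))) {
            System.out.println(match.primary.get("lastName") + " "
                    + match.primary.get("birthDate") + " score=" + match.score);
        }
    }
}
```

A real implementation would leave the positive and indeterminate decision to the base class, which applies the configured cutoff scores to whatever this method returns.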

findLinkedRecords

public LinkageResultSet findLinkedRecords(LinkageDataSource primaryDataSource,
                                          LinkageDataSource secondaryDataSource)
                                   throws LinkageException
Description copied from interface: LinkageEngine
Primary workhorse method for a linkage operation; this is where deterministic and probabilistic methods are used to match records.

Specified by:
findLinkedRecords in interface LinkageEngine
Parameters:
primaryDataSource - A source of DataSourceRecords to be linked, typically the larger data set.
secondaryDataSource - A second source of DataSourceRecords to be linked, typically the smaller data set.
Returns:
An instance of a LinkageResultSet containing the complete list of positive and indeterminate LinkageMatch objects, as well as any unmatched records.
Throws:
LinkageException

getPrimaryUnblocked

public java.util.Set<DataSourceRecord> getPrimaryUnblocked()
Returns:
A deduplicated Set of DataSourceRecord objects from the primary data source that have not been blocked by BlockingConfiguration objects in the LinkageConfiguration.

getSecondaryUnblocked

public java.util.Set<DataSourceRecord> getSecondaryUnblocked()
Returns:
A deduplicated Set of DataSourceRecord objects from the secondary data source that have not been blocked by BlockingConfiguration objects in the LinkageConfiguration.