Duplicate Parties
Duplicate parties may already exist in the data, and despite our best efforts, duplicates will certainly be created in the future.
What is the process for handling duplicates?
Report
Someone must notice the potential duplicate and report it to the data steward. The touchpoints for reporting are numerous: a phone call, an email, a Slack message, a hallway conversation, etc.
Identify
The data steward must identify which party is the duplicate and which is the valid one. The data steward must also determine whether the duplicate party has any properties or relationships that need to be merged into the valid party.
In our design this stewarding process is not automated and requires human judgement. Users often make mistakes when identifying duplicates, so the steward will usually research both candidate parties to confirm which one is the duplicate and, as a courtesy to downstream systems, to estimate how many transactions are attached to the potentially duplicate party. This simple impact analysis guides the steps that follow.
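As a rough illustration of that impact analysis, the sketch below counts the transactions attached to each candidate party before any merge decision is made. It assumes the party data lives in Neo4j and is read through the official Python driver; the Party and Transaction labels, the ATTACHED_TO relationship type, the party_id property, and the connection details are illustrative assumptions, not settled schema.

```python
from neo4j import GraphDatabase

# Connection details are placeholders -- adjust to the real environment.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def transaction_count(party_id: str) -> int:
    """Count transactions attached to a candidate party (labels and types are assumed)."""
    query = (
        "MATCH (p:Party {party_id: $party_id})<-[:ATTACHED_TO]-(t:Transaction) "
        "RETURN count(t) AS n"
    )
    with driver.session() as session:
        record = session.run(query, party_id=party_id).single()
        return record["n"] if record else 0

# Compare the two candidates before deciding which one is the duplicate.
for candidate in ("PARTY-123", "PARTY-456"):
    print(candidate, transaction_count(candidate))
```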
Merge
Assuming the steward has identified the duplicate and the valid parties, we must now try to create a good master record by merging properties, and relationships if necessary, from the duplicate party into the valid party. This will require de-dupe tooling that has not yet been specified.
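Since the tooling is unspecified, the following is only one possible shape for the property merge: fill any gaps in the valid party from the duplicate and flag conflicts for the steward to resolve by hand. It is plain Python over property dictionaries; the function name and the sample properties are hypothetical.

```python
def merge_properties(valid: dict, duplicate: dict) -> tuple[dict, dict]:
    """Fill gaps in the valid party from the duplicate; collect conflicts for manual review."""
    merged = dict(valid)
    conflicts = {}
    for key, dup_value in duplicate.items():
        if merged.get(key) in (None, ""):
            merged[key] = dup_value                    # valid party had no value; take the duplicate's
        elif merged[key] != dup_value:
            conflicts[key] = (merged[key], dup_value)  # differing values; steward must choose
    return merged, conflicts

merged, conflicts = merge_properties(
    {"name": "Acme Corp", "tax_id": None},
    {"name": "ACME Corporation", "tax_id": "12-3456789"},
)
# merged picks up tax_id from the duplicate; conflicts flags the differing names
```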
Deactivate and Record Dupe
These two steps should be completed in a single transaction. Once the merge is complete, the duplicate party is deactivated as follows:
- the duplicate party's relationships should all be deactivated
- the duplicate party itself should be deactivated
Then, we must create an IS_DUPLICATE_OF relationship between the duplicate party and the valid party. This relationship will be used by downstream systems to identify and resolve duplicates.
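A minimal sketch of performing both steps in one transaction follows, again assuming a Neo4j store accessed through the official Python driver; the Party label, the active flag used for deactivation, and the party_id property are assumptions, while IS_DUPLICATE_OF is the relationship described above.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def deactivate_and_record_dupe(tx, dup_id: str, valid_id: str) -> None:
    """Deactivate the duplicate party and its relationships, then record IS_DUPLICATE_OF."""
    # Deactivate all of the duplicate party's relationships (via an assumed 'active' flag).
    tx.run(
        "MATCH (:Party {party_id: $dup_id})-[r]-() SET r.active = false",
        dup_id=dup_id,
    )
    # Deactivate the duplicate party itself.
    tx.run(
        "MATCH (d:Party {party_id: $dup_id}) SET d.active = false",
        dup_id=dup_id,
    )
    # Record the pointer that downstream systems use to resolve the duplicate.
    tx.run(
        "MATCH (d:Party {party_id: $dup_id}), (v:Party {party_id: $valid_id}) "
        "MERGE (d)-[:IS_DUPLICATE_OF]->(v)",
        dup_id=dup_id,
        valid_id=valid_id,
    )

# execute_write runs all three statements in one managed transaction, so the
# deactivation and the IS_DUPLICATE_OF record succeed or fail together.
with driver.session() as session:
    session.execute_write(deactivate_and_record_dupe, "PARTY-456", "PARTY-123")
```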
DOWNSTREAM SYSTEMS MUST RESOLVE DUPLICATES
Remember, it is the job of consuming (downstream) systems to resolve duplicates in their own data. The IS_DUPLICATE_OF relationship is a signal to downstream systems that they should resolve the duplicate. It is an important design principle in our product approach that the authoritative data product does not "reach into" other systems to fix their data.
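To make the consumer side concrete, a downstream system might periodically fetch the published IS_DUPLICATE_OF pairs and remap party references in its own data. The sketch below reuses the same assumed Neo4j setup as the earlier sketches; how a given consumer actually applies the mapping is entirely up to that system.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def fetch_duplicate_mappings() -> dict[str, str]:
    """Return {duplicate party id: valid party id} pairs published by the data product."""
    query = (
        "MATCH (d:Party)-[:IS_DUPLICATE_OF]->(v:Party) "
        "RETURN d.party_id AS dup_id, v.party_id AS valid_id"
    )
    with driver.session() as session:
        return {rec["dup_id"]: rec["valid_id"] for rec in session.run(query)}

# The consumer remaps its own records; the data product never reaches into
# the consumer's store to do this on its behalf.
mappings = fetch_duplicate_mappings()
local_party_id = "PARTY-456"
resolved_party_id = mappings.get(local_party_id, local_party_id)
```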
Refer to the documentation for IS_DUPLICATE_OF for more information.