Introduction
The data model describes UTA's approach for organizing its core business entities in a graph database.
Our goal is to be collectively exhaustive in modeling not only the people and organizations with which UTA interacts, but also the relationships between those people and organizations.
Our desired outcomes are to:
- enable our business to answer a large class of the questions it asks, not only for day-to-day operations but also to be prepared for M&A, legal, compliance, and regulatory questions
- create a foundation upon which integrated applications, both transactional and analytic, can be built
In Scope: The Party Domain
We are focused on a bounded context called the party domain.
A bounded context is a domain-driven design concept that describes a particular domain of interest. The party domain concerns itself with operational (as opposed to analytic) data, and more specifically the roles and relationships between people and organizations of interest to UTA.
Out of Scope: CRM and Analytics
The party domain does not address other important operational or analytic data domains, which may be referred to with names like project, engagement, deal, credits, or phonesheet todos. Similarly, the party domain does not address most CRM-like data such as activity, lead tracking, offers, or contracts.
Authoritative Data is an implementation of the party-role-relationship model in a graph database
The party model is a traditional approach which has been implemented across many domains, usually in a relational database environment.
Our experience building a party, party role, and party relationship schema in a relational database has shown us that what we were actually building was a graph. For this design, instead of modeling the graph inside a relational database, UTA will use a graph database for modeling its authoritative data.
High level overview of the Authoritative Data graph model
We will introduce this graph model by building up the small number of core elements we use, one by one.
A graph consists of nodes and relationships.
Think of the nodes as the nouns of the model, and relationships as the verbs.
Central to the model is the party node, which is a person or organization. In graph data modeling, we draw the nodes as circles, so we will draw a party node like this:
One kind of party node ("noun") can represent an organization such as "Netflix," and another kind of party node ("noun") can represent a person such as "Ted Sarandos."
Party nodes are mostly simple, high-level identifiers without many attributes. How, then, do we model the attributes associated with these parties?
We call any node that describes data attributes of a party a party data node. For example, an address node contains data about a particular party's address:
We will connect the Netflix party to its address node, and Ted Sarandos to Netflix, in a moment.
A third type of node, the party category node, categorizes other nodes. We use category nodes to aggregate all the parties that belong to a particular category. An example category is Role (with values such as "Client" or "Buyer"). For example, a buyer category node might look like this:
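The three node types introduced so far can also be sketched as TypeScript types. This is a minimal illustration only: the type names, field names, and the placeholder address value are assumptions made for the example, not part of the model definition.

```typescript
// Hypothetical type and field names; the document defines the concepts only.
type PartyNode = {
  nodeType: "Party";
  kind: "Person" | "Organization";
  name: string;
};

type PartyDataNode = {
  nodeType: "PartyData";
  label: string;                      // what the data describes, e.g. "Address"
  attributes: Record<string, string>; // the attribute values themselves
};

type PartyCategoryNode = {
  nodeType: "PartyCategory";
  category: string; // the grouping, e.g. "Role"
  value: string;    // the member of that grouping, e.g. "Buyer"
};

type GraphNode = PartyNode | PartyDataNode | PartyCategoryNode;

const netflix: PartyNode = { nodeType: "Party", kind: "Organization", name: "Netflix" };
const tedSarandos: PartyNode = { nodeType: "Party", kind: "Person", name: "Ted Sarandos" };
const netflixAddress: PartyDataNode = {
  nodeType: "PartyData",
  label: "Address",
  attributes: { line1: "<street address placeholder>" },
};
const buyer: PartyCategoryNode = { nodeType: "PartyCategory", category: "Role", value: "Buyer" };

// Switching on nodeType lets the compiler check that every node type is handled.
function describe(node: GraphNode): string {
  switch (node.nodeType) {
    case "Party":
      return `${node.kind} party: ${node.name}`;
    case "PartyData":
      return `${node.label} data node`;
    case "PartyCategory":
      return `${node.category} category: ${node.value}`;
  }
}
```

Keeping the party node down to an identifier and a kind mirrors the point made above: attributes live on party data nodes rather than on the party itself.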
Now we are ready to discuss how we can connect all these nodes together. We connect the nodes with relationships, which are the verbs of the model.
The relationships between party nodes and party data nodes typically indicate a "HAS" relationship in their names, for example, "A party has an address." We draw these relationships as arrows, like this:
The relationships between party nodes and party category nodes typically express an "IS" relationship (for example, "Netflix is a buyer"). In practice, however, many of these relationship names use "HAS", so take care to distinguish category relationships from data relationships. In our Netflix example, the "HAS_ROLE" relationship connects the Netflix party node to the buyer category node.
As with the relationships to party data nodes, we draw these relationships as arrows:
Finally, parties can have relationships with other parties (a "party-to-party" relationship). For example, Ted Sarandos works at Netflix, so using the HAS_APPOINTMENT and APPOINTED_TO relationships we draw this party-to-party relationship as a path that circles back to another party node:
When you put those pieces together, you can see how the graph builds out to create the big picture view of a company:
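The same assembly can be sketched in TypeScript as a small in-memory graph. Treat this as an illustration under stated assumptions rather than the actual schema: HAS_ADDRESS is an assumed relationship name, the node ids are invented, and the sketch assumes that HAS_APPOINTMENT and APPOINTED_TO meet at an intermediate appointment node, which this overview does not spell out.

```typescript
// Hypothetical in-memory stand-in for the graph database.
type NodeType = "Party" | "PartyData" | "PartyCategory";

interface GraphNode {
  id: string;
  nodeType: NodeType;
  label: string;
}

interface Relationship {
  from: string; // id of the start node
  type: string; // relationship name, e.g. "HAS_ROLE"
  to: string;   // id of the end node
}

const nodes: GraphNode[] = [
  { id: "netflix", nodeType: "Party", label: "Netflix" },
  { id: "ted", nodeType: "Party", label: "Ted Sarandos" },
  { id: "netflix-address", nodeType: "PartyData", label: "Address" },
  { id: "buyer", nodeType: "PartyCategory", label: "Buyer" },
  // Assumption: the appointment is its own node, typed here as party data.
  { id: "ted-appointment", nodeType: "PartyData", label: "Appointment" },
];

const relationships: Relationship[] = [
  { from: "netflix", type: "HAS_ADDRESS", to: "netflix-address" }, // assumed name
  { from: "netflix", type: "HAS_ROLE", to: "buyer" },
  { from: "ted", type: "HAS_APPOINTMENT", to: "ted-appointment" },
  { from: "ted-appointment", type: "APPOINTED_TO", to: "netflix" },
];

// Follow one relationship type outward from a node and return the end-node ids.
function follow(fromId: string, type: string): string[] {
  return relationships
    .filter((r) => r.from === fromId && r.type === type)
    .map((r) => r.to);
}

// Party-to-party traversal: person -> appointment -> organization.
function appointedTo(personId: string): string[] {
  const organizations: string[] = [];
  for (const appointmentId of follow(personId, "HAS_APPOINTMENT")) {
    organizations.push(...follow(appointmentId, "APPOINTED_TO"));
  }
  return organizations;
}

// Look up the display label for a node id, or undefined if the id is unknown.
function labelOf(id: string): string | undefined {
  return nodes.filter((n) => n.id === id).map((n) => n.label)[0];
}
```

Under these assumptions, `appointedTo("ted")` walks two relationships to reach the Netflix party, which is the "circling back" drawn in the diagrams.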
The diagrams above show the essential elements of the model. At a meta level, these are:
| Three node types | Three relationship types |
|---|---|
| Party | Data |
| Party Data | Category |
| Party Category | Party-to-Party |
Data-related architectural elements we are using
Some readers understand a system's architecture best by asking basic questions about the approach and the stack. Anticipating those questions, we summarize the answers below:
| Data architectural element | How we are implementing it |
|---|---|
| Graph Database | Neo4j v5 or later; specifically, the AuraDB managed database product |
| Graph Query Language | Cypher |
| Core entities like :Party and :Role | Modeled as Nodes. The nodes are the nouns. |
| Relationships between entities | Modeled as Relationships. The relationships are the verbs. |
| Referential integrity | Enforced to the extent we can with Neo4j indexes and constraints; the rest in the API service |
| Business logic | Enforced in the API service. These rules will be described in the detail sections on nodes and relationships that follow. |
| Field validation | Specified in an OpenAPI 3.1 spec and enforced in the API via an OpenAPI plugin and TypeScript |
| Authorization | Enforced in the API via Open Policy Agent (OPA) |
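As an illustration of the kind of referential-integrity check the table assigns to the API service, the sketch below validates a proposed relationship before it is written. It is a hedged example, not the service's actual rule set: the rule table, the function names, and the HAS_ADDRESS relationship name are assumptions, and the check uses a plain in-memory lookup rather than Neo4j, the OpenAPI plugin, or OPA.

```typescript
type NodeType = "Party" | "PartyData" | "PartyCategory";

interface EndpointRule {
  from: NodeType;
  to: NodeType;
}

interface ProposedRelationship {
  fromId: string;
  type: string;
  toId: string;
}

// Hypothetical rule table. HAS_ROLE follows the party-to-category pattern
// described earlier; HAS_ADDRESS is an assumed name for a party-to-data relationship.
const endpointRules: Record<string, EndpointRule> = {
  HAS_ROLE: { from: "Party", to: "PartyCategory" },
  HAS_ADDRESS: { from: "Party", to: "PartyData" },
};

// Own-property lookup, so keys such as "constructor" are not mistaken for entries.
function lookup<T>(table: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Returns a list of violations; an empty list means the write may proceed.
// nodeTypes (node id -> node type) stands in for a lookup against the database.
function validateRelationship(
  nodeTypes: Record<string, NodeType>,
  rel: ProposedRelationship,
): string[] {
  const errors: string[] = [];
  const fromType = lookup(nodeTypes, rel.fromId);
  const toType = lookup(nodeTypes, rel.toId);
  if (fromType === undefined) errors.push(`unknown start node: ${rel.fromId}`);
  if (toType === undefined) errors.push(`unknown end node: ${rel.toId}`);

  const rule = lookup(endpointRules, rel.type);
  if (rule === undefined) {
    errors.push(`unknown relationship type: ${rel.type}`);
    return errors;
  }
  if (fromType !== undefined && fromType !== rule.from) {
    errors.push(`${rel.type} must start at a ${rule.from} node, got ${fromType}`);
  }
  if (toType !== undefined && toType !== rule.to) {
    errors.push(`${rel.type} must end at a ${rule.to} node, got ${toType}`);
  }
  return errors;
}

// Sample data for trying the check out.
const knownNodes: Record<string, NodeType> = {
  netflix: "Party",
  buyer: "PartyCategory",
  "netflix-address": "PartyData",
};
```

Returning a list of violations instead of throwing on the first one lets a caller report every problem with a request at once; whether the real service does this is not stated in this overview.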
Learning more about graph data modeling
A good reference for a relational database-oriented party model can be found in Silverstein's data model book[1]. Be aware that Silverstein presents Entity-Relationship models, not graphs; it is nonetheless a good reference for the party model.
For further foundational information on graph data modeling, please refer to Singh's Graph Database Modeling[2] and the posts by Neo4j staff architect David Allen. You may also want to review the blog posts by Tahir Waseer on graph data modeling.