Data tools
The ADM data engineering team shall use simple tools -- preferably native Unix/Linux command line utilities. Every data transformation should be written as code rather than drawn in a GUI.
The sequencing of those transformations shall initially be expressed in a makefile, but may later move to a code-based orchestration tool like Apache Airflow.
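As a rough illustration of makefile-based sequencing, the sketch below builds a two-step pipeline in a scratch directory. The file names (contact_types.txt, sorted.txt, counts.txt) and the sample data are made up for this example and do not come from the team's actual makefile.

```shell
#!/bin/bash
set -euo pipefail

# Work in a scratch directory so nothing in the current directory is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Made-up input: one contact type per line.
printf 'email\nphone\nemail\n' > contact_types.txt

# Each rule names its output, its input, and the command that links them;
# make derives the execution order from those dependencies.
# (The "target: prerequisite ; command" form keeps each rule on one line.)
cat > Makefile <<'EOF'
sorted.txt: contact_types.txt ; sort contact_types.txt > sorted.txt
counts.txt: sorted.txt ; uniq -c sorted.txt > counts.txt
EOF

# Asking for the final file runs the sort step first, then the count step.
make counts.txt
cat counts.txt
```

Running `make counts.txt` a second time does nothing, because both outputs are already newer than their inputs. That incremental behavior is the main reason to sequence file-based transformations with make.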
Tool inventory
| Tool | Description | Install |
|---|---|---|
| jq | A command-line JSON processor | brew install jq |
| gojq | A command-line JSON processor written in Go | brew install gojq |
| jaq | A command-line JSON processor written in Rust | brew install jaq |
| sort | Sort lines of text files | Native to the OS |
| uniq | Report or omit repeated lines | Native to the OS |
| sed | A stream editor for filtering and transforming text | Native to the OS |
| xsv | A suite of utilities for converting to and working with CSV, the king of tabular file formats | brew install xsv |
| adm-gen-init | An internally-developed CLI for generating a Neo4j database initialization script from a metamodel | npm i -g @united-talent-agency/adm-gen-init (see doc below) |
| guish | Experimental | |
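To give a feel for how the OS-native tools in this inventory compose, here is a small sketch that uses only sort, uniq, and sed on made-up inline data. The function names count_values and drop_empty are hypothetical helpers written for this example, and the sample values are not project data.

```shell
#!/bin/bash

# count_values: read one value per line on stdin and print each distinct
# value with its count, most frequent first.
count_values() {
  sort | uniq -c | sort -rn
}

# drop_empty: remove the lines a JSON processor prints for missing or empty
# arrays ("null" and "[]"). Both patterns are anchored at each end, so a
# value that merely starts with "null" is kept.
drop_empty() {
  sed -e '/^null$/d' -e '/^\[\]$/d'
}

# Made-up input standing in for the output of a JSON query.
printf '"email"\n"phone"\nnull\n"email"\n[]\n' | drop_empty | count_values
```

Because each helper reads stdin and writes stdout, either one can be dropped into a longer pipeline after a jq, gojq, or jaq query.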
A cookbook of several examples showing how the tools are used:
cat people.json | gojq '.[].contacts[].contactType' | sort | uniq -c

cat people.json | gojq '.[].addresses' | sed '/^null/d' | sed '/^\[\]/d'

#! /bin/bash
cat people.json \
| gojq ".[].$1" \
| sed '/^null/d' \
| sed '/^\[\]/d'

cat people.json | gojq '.[] | select(.addresses | length >= 1) | { onyxId: ._id."$oid", addresses: .addresses }'

cat people.json | gojq '.[] | select(._id."$oid" == "5d9d0b10d6703f0011cd3c84")'

cat people.json | jq length

cat people.json \
| gojq '.[].name' \
| sort \
| uniq -D \
| uniq -c \
| sort -r

cat people.json | gojq '.[].name' | grep -i "o/b/o"

adm-gen-init
The adm-gen-init command line interface allows the ADM data engineering team to generate a Cypher init script for authoritative data. We will use this not only to prepare production and test databases, but also to build local development and CI-based containers for testing.
The source code for adm-gen-init can be found at https://github.com/united-talent-agency/adm-gen-init.
Authenticating to the GitHub Package Registry
adm-gen-init is packaged as an npm module and hosted in UTA's private GitHub packages repository. To install adm-gen-init or any other UTA private package:
- Create a "classic" personal access token with the read:packages scope.
- Authorize that token to access the @united-talent-agency organization using the "Configure SSO" button:

- Create (or update) a .npmrc file in your home directory (usually ~/) with the following contents:
@united-talent-agency:registry=https://npm.pkg.github.com
//npm.pkg.github.com/:_authToken=YOUR_TOKEN_HERE

WARNING
Be sure to keep your token secure!
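One way to script the .npmrc step while keeping the token out of world-readable files is sketched below. The helper name write_npmrc and the GITHUB_PACKAGES_TOKEN environment variable are assumptions made for this example; neither is part of any UTA tooling.

```shell
#!/bin/bash

# write_npmrc FILE: hypothetical helper that adds (or refreshes) the two
# registry settings in FILE, reading the token from GITHUB_PACKAGES_TOKEN.
write_npmrc() {
  local npmrc="$1"
  local token="${GITHUB_PACKAGES_TOKEN:?set GITHUB_PACKAGES_TOKEN first}"
  local tmp

  # Create the file if needed and restrict it to the current user.
  touch "$npmrc"
  chmod 600 "$npmrc"

  # Keep every unrelated setting, but drop earlier copies of the two lines
  # so that rerunning the helper does not duplicate them.
  tmp="$(mktemp)"
  grep -v \
    -e '^@united-talent-agency:registry=' \
    -e '^//npm.pkg.github.com/:_authToken=' \
    "$npmrc" > "$tmp" || true

  {
    cat "$tmp"
    echo '@united-talent-agency:registry=https://npm.pkg.github.com'
    echo "//npm.pkg.github.com/:_authToken=${token}"
  } > "$npmrc"
  rm -f "$tmp"
}
```

With the token exported in the current shell session, the helper would be called as `write_npmrc ~/.npmrc`. Reading the token from the environment avoids pasting it into a script that might later be committed.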
At this point you should be able to run or install the CLI as described below.
Installation
If you want to install the package, you can do so with:
npm i -g @united-talent-agency/adm-gen-init
You can also run it via npx or pnpx:
pnpx @united-talent-agency/adm-gen-init -i mdm-meta-model.yaml -o init.cypher
Usage
Run adm-gen-init --help for usage information.
The metamodel
The input file on the command line is required and should be a YAML-based "metamodel," which is just a trivial DSL for specifying core entities we need initialized in Neo4j before loading data. The data engineering team maintains the latest metamodel in the data engineering repo, but below is a sample of what the metamodel file looks like.
adm-gen-seed
The adm-gen-seed command line interface allows the ADM data engineering team to generate a Cypher script that will seed some sample authoritative data into a database. This will not be used for production but rather for test databases used in development and CI.
The source code for adm-gen-seed can be found at https://github.com/united-talent-agency/adm-gen-seed.
Installation
If you want to install the package, you can do so with:
npm i -g @united-talent-agency/adm-gen-seed
You can also run it via npx or pnpx:
pnpx @united-talent-agency/adm-gen-seed > seed.cypher
Usage
Run adm-gen-seed --help for usage information.