An Introduction to Flow

If you’re a brand new Flow user, you’re in the right place! We’re going to walk through the basics of Flow by building a shopping cart backend.

Your First Collection

To start with, we’re going to define a Flow collection that holds data about each user. We’ll have this collection accept user JSON documents via the REST API, and we’ll materialize the data in a Postgres table to make it available to our marketing team. Our devcontainer comes with a Postgres instance that’s started automatically, so all of this should “just work” in that environment.

Flow collections are declared in a YAML file, like so:

collections:
  - name: examples/shopping/users
    key: [/id]
    schema: user.schema.yaml

Note that the schema is defined in a separate file. This is a common pattern because it allows your schemas to be reused and composed. The actual schema is defined as:

user.schema.yaml

description: "A user who may buy things from our site"
type: object
properties:
  id: { type: integer }
  name: { type: string }
  email:
    type: string
    format: email
required: [id, name, email]
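
To make the schema’s constraints concrete, here is a minimal Python sketch that checks a document against the same rules by hand. It is an illustration only, not how Flow validates documents: the function name validate_user is hypothetical, and the format: email check is reduced to a crude test for an "@" character, which is an assumption made to keep the example short.

```python
def validate_user(doc):
    """Return a list of problems; an empty list means the document passes.

    Hand-written approximation of user.schema.yaml, for illustration only.
    """
    if not isinstance(doc, dict):
        return ["document must be an object"]  # type: object
    errors = []
    # required: [id, name, email]
    for field in ("id", "name", "email"):
        if field not in doc:
            errors.append(f"missing required property: {field}")
    # id: { type: integer }; bool is rejected because it subclasses int in Python
    if "id" in doc and (not isinstance(doc["id"], int) or isinstance(doc["id"], bool)):
        errors.append("id must be an integer")
    # name: { type: string }
    if "name" in doc and not isinstance(doc["name"], str):
        errors.append("name must be a string")
    # email: type string, format email (crude "@" check; an assumption)
    if "email" in doc:
        email = doc["email"]
        if not isinstance(email, str) or "@" not in email:
            errors.append("email must be a string that looks like an email address")
    return errors


print(validate_user({"id": 6, "name": "Donkey Kong", "email": "bigguy@dk.com"}))
# prints []
print(validate_user({"id": "6", "name": "Donkey Kong"}))
# prints ['missing required property: email', 'id must be an integer']
```

A document that fails any of these checks would not satisfy the schema above, which is why every document in the examples that follow carries all three properties.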

We can apply our collection to a local Flow instance by running:

$ flowctl build && flowctl develop

Now that it’s applied, we’ll leave that terminal running and open a new one to simulate some users being added.

curl -H 'Content-Type: application/json' -d @- 'http://localhost:8081/ingest' <<EOF
{
    "examples/shopping/users": [
        {
            "id": 6,
            "name": "Donkey Kong",
            "email": "bigguy@dk.com"
        },
        {
            "id": 7,
            "name": "Echo",
            "email": "explorer@ocean.net"
        },
        {
            "id": 8,
            "name": "Gordon Freeman",
            "email": "mfreeman@apeture.com"
        }
    ]
}
EOF

This prints some JSON describing the write, which we’ll come back to later. Let’s check out our data in Postgres:

$ psql 'postgresql://flow:flow@localhost:5432/flow?sslmode=disable' -c "select id, email, name from shopping_users;"
 id |         email         |      name
----+-----------------------+----------------
  6 | bigguy@dk.com         | Donkey Kong
  7 | explorer@ocean.net    | Echo
  8 | mfreeman@apeture.com  | Gordon Freeman
(3 rows)

As new users are added to the collection, they will continue to appear here. One of our users wants to update their email address, though. This is done by ingesting a new document with the same id.

curl -H 'Content-Type: application/json' -d @- 'http://localhost:8081/ingest' <<EOF
{
    "examples/shopping/users": [
        {
            "id": 8,
            "name": "Gordon Freeman",
            "email": "gordo@retiredlife.org"
        }
    ]
}
EOF
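
The curl commands above post a JSON object that maps a collection name to a list of documents. As a rough sketch of the same request from a script, the Python below builds (but does not send) an equivalent POST using only the standard library. The helper name build_ingest_request is hypothetical; the URL and payload shape are copied from the curl examples.

```python
import json
import urllib.request

INGEST_URL = "http://localhost:8081/ingest"  # same endpoint as the curl commands


def build_ingest_request(collection, documents, url=INGEST_URL):
    """Build (but do not send) a POST carrying documents for one collection."""
    # The body maps the collection name to its list of documents.
    body = json.dumps({collection: documents}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ingest_request(
    "examples/shopping/users",
    [{"id": 8, "name": "Gordon Freeman", "email": "gordo@retiredlife.org"}],
)
# With `flowctl develop` still running, the request could be sent with:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```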

If we re-run the Postgres query, we’ll see that the row for Gordon Freeman has been updated. Because we declared the collection key as [/id], Flow automatically combines each new document with the previous version that has the same id. The users collection uses the default reduction strategy, lastWriteWins, so the most recent document for each id is the one that gets materialized. Reduction annotations let you change how documents are combined for each collection.
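
The effect of keying on id with lastWriteWins can be modeled in a few lines of Python. This is only a sketch of the observable behavior, not Flow’s implementation: the function name reduce_last_write_wins is hypothetical, the key is simplified from the JSON pointer /id to a plain dictionary lookup, and the result is sorted by id purely to make the output stable.

```python
def reduce_last_write_wins(documents, key="id"):
    """Fold documents in arrival order, keeping only the latest one per key."""
    latest = {}
    for doc in documents:
        latest[doc[key]] = doc  # a later document replaces the earlier one
    return sorted(latest.values(), key=lambda d: d[key])


# The four documents ingested by the two curl commands, in arrival order.
ingested = [
    {"id": 6, "name": "Donkey Kong", "email": "bigguy@dk.com"},
    {"id": 7, "name": "Echo", "email": "explorer@ocean.net"},
    {"id": 8, "name": "Gordon Freeman", "email": "mfreeman@apeture.com"},
    {"id": 8, "name": "Gordon Freeman", "email": "gordo@retiredlife.org"},
]

for row in reduce_last_write_wins(ingested):
    print(row["id"], row["email"])
# prints:
# 6 bigguy@dk.com
# 7 explorer@ocean.net
# 8 gordo@retiredlife.org
```

Four documents went in, but only three rows come out, because the two documents with id 8 collapse into the most recent one.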

Writing Tests

Before we go, let’s add some tests that verify the reduction logic in our users collection. The tests section allows us to ingest documents and verify the fully reduced results automatically. Most examples from this point on will use tests instead of shell scripts for ingesting documents and verifying expected results.

tests:
  "A users email is updated":
    - ingest:
        collection: examples/shopping/users
        documents:
          - { id: 1, name: "Jane", email: "janesoldemail@email.test" }
          - { id: 2, name: "Jill", email: "jill@upahill.test" }
    - ingest:
        collection: examples/shopping/users
        documents:
          - id: 1
            name: Jane
            email: jane82@email.test
    - verify:
        collection: examples/shopping/users
        documents:
          - id: 1
            name: Jane
            email: jane82@email.test
          - id: 2
            name: Jill
            email: jill@upahill.test

Each test is a sequence of ingest and verify steps, which will be executed in the order written. In this test, we are first ingesting documents for the users Jane and Jill. The second ingest step provides a new email address for Jane. The verify step includes both documents, and will fail if any of the properties do not match.
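
To illustrate how ordered ingest and verify steps interact, here is a small Python sketch of a test runner for the test above. It is a toy model under stated assumptions, not flowctl’s behavior: run_test is a hypothetical name, reduction is hard-coded to lastWriteWins on id, and verify is assumed to compare the reduced documents in key order.

```python
def run_test(steps, key="id"):
    """Run ingest/verify steps in order; raise AssertionError on a mismatch."""
    state = {}  # collection name -> {key value -> latest document}
    for step in steps:
        if "ingest" in step:
            docs = state.setdefault(step["ingest"]["collection"], {})
            for doc in step["ingest"]["documents"]:
                docs[doc[key]] = doc  # lastWriteWins: newer replaces older
        elif "verify" in step:
            docs = state.get(step["verify"]["collection"], {})
            actual = sorted(docs.values(), key=lambda d: d[key])
            expected = step["verify"]["documents"]
            assert actual == expected, f"expected {expected}, got {actual}"
        else:
            raise ValueError(f"unknown step: {step}")


USERS = "examples/shopping/users"
steps = [
    {"ingest": {"collection": USERS, "documents": [
        {"id": 1, "name": "Jane", "email": "janesoldemail@email.test"},
        {"id": 2, "name": "Jill", "email": "jill@upahill.test"},
    ]}},
    {"ingest": {"collection": USERS, "documents": [
        {"id": 1, "name": "Jane", "email": "jane82@email.test"},
    ]}},
    {"verify": {"collection": USERS, "documents": [
        {"id": 1, "name": "Jane", "email": "jane82@email.test"},
        {"id": 2, "name": "Jill", "email": "jill@upahill.test"},
    ]}},
]

run_test(steps)  # passes silently; a mismatch would raise AssertionError
```

If the verify step still listed Jane’s old email address, the comparison would fail, which mirrors the behavior described above.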

We can run the tests using:

$ flowctl build && flowctl test

Next Steps

Now that our users collection is working end-to-end, here are some good topics to check out next:

Learn the basics of CSV ingestion by building the Products collection
Explore reduction annotations by building the Shopping Cart collection