This documentation is deprecated. The new documentation is available at docs.estuary.dev. This repo is archived for reference, but may soon be deleted to avoid confusion.
Estuary Flow Deprecated Documentation
Estuary Flow unifies technologies and teams around a shared understanding of an organization’s data, one that updates continuously as new data records come in.
Flow is primarily targeted to backend engineers who must manage various continuous data sources, with multiple use cases and stakeholders. It makes it easy for engineers to turn sources – e.g. streaming pub/sub topics or file drops – into pristine S3 “data lakes” that are documented and discoverable to analysts, ML engineers, and others using their preferred tooling (e.g. via direct Snowflake / Spark integration).
Engineers can then go on to define operational transforms that draw from the same data – with its complete understanding of history – to build new data products that continuously materialize into databases, pub/sub, and SaaS, all with end-to-end latency in the milliseconds.
Flow’s continuous transform capability is uniquely powerful. Build complex joins and aggregations that have unlimited historical look-back, with no onerous windowing requirements, and which are simple to define and evolve. Once declared, Flow back-fills transformations directly from the S3 lake and then seamlessly transitions to live updates. New data products – or fixes to existing ones – are assured of consistent results, every time. The Flow runtime manages scaling and recovers from faults in seconds, for true “hands-free” operation.
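As a rough illustration of the idea above (this is plain Python, not Flow's API or catalog syntax), an aggregation that keeps one reduced value per key needs no time window, and the same code path can first replay history and then absorb live records. The `RunningTotals` class, the `(key, amount)` record shape, and the sample data are all hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape for this sketch: (key, amount).
Event = Tuple[str, int]


class RunningTotals:
    """Keeps one reduced value per key, so no time window is needed."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)

    def apply(self, events: Iterable[Event]) -> None:
        # Fold each event into the existing per-key state.
        for key, amount in events:
            self.totals[key] += amount


# Stand-ins for stored history and for newly arriving records.
history = [("alice", 5), ("bob", 2), ("alice", 1)]
live = [("bob", 3), ("carol", 7)]

view = RunningTotals()
view.apply(history)  # back-fill from the full history
view.apply(live)     # then continue with live updates, same code path

print(dict(view.totals))  # {'alice': 6, 'bob': 5, 'carol': 7}
```

Because each record is folded into per-key state rather than held in a window, a later fix to the aggregation could be rebuilt by replaying `history` again and would arrive at the same totals.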
Flow is configuration-driven and uses a developer-centric workflow that emphasizes version control, composition & re-use, rigorous schematization, and built-in testing. Its runtime takes full advantage of data reductions and cloud pricing models to offer a surprisingly low total cost of ownership.
Flow is currently in release preview. It’s ready for local development and prototyping, but there are sharp edges, open issues, and missing features.
Slides (Direct Link)
This documentation is interactive! You can open it directly on GitHub using Codespaces, or you can clone this repo and open it with the VS Code Remote Containers extension (see our guide). Both options will spin up an environment with the Flow CLI tools, add-ons for VS Code editor support, and an attached PostgreSQL database for trying out materializations.
    # Build this documentation repository's Flow catalog.
    $ flowctl build
    # Run all catalog tests.
    $ flowctl test
    # Start a local Flow instance and deploy the catalog to it.
    $ flowctl develop
Flow is built upon Gazette. A basic understanding of Gazette concepts can be helpful for understanding Flow’s runtime and architecture, but isn’t required to work with Flow.
Table of Contents
- How Flow Helps
- Working with VS Code
- Reduction Types
- Derivation Patterns
- Ingesting Data
- Example: Citi Bike
- Example: Wiki Edits
- Example: Network Traces
- Example: Shopping Cart