Data applications are notoriously difficult to test and are therefore often un- or under-tested.
Creating testable and verifiable data pipelines is one of Dagster's focuses. We believe ensuring data quality is critical for managing the complexity of data systems. Here, we'll show how to write unit tests for Dagster pipelines and solids.
Let's go back to our first solid and pipeline in hello_cereal.py and ensure they're working as expected by writing some tests. We'll use execute_pipeline() to test our pipeline, as well as execute_solid() to test our solid in isolation.
These functions synchronously execute a pipeline or solid and return result objects (PipelineExecutionResult and SolidExecutionResult, respectively) whose methods let us investigate, in detail, the success or failure of execution, the outputs produced by solids, and (as we'll see later) other events associated with execution.
    from dagster import execute_pipeline, execute_solid

    # hello_cereal and hello_cereal_pipeline are defined earlier in this same file.
    def test_hello_cereal_solid():
        res = execute_solid(hello_cereal)
        assert res.success
        assert len(res.output_value()) == 77

    def test_hello_cereal_pipeline():
        res = execute_pipeline(hello_cereal_pipeline)
        assert res.success
        assert len(res.result_for_solid("hello_cereal").output_value()) == 77
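The assertion pattern in these tests (run one unit, then check the size of what it produced) can also be tried without Dagster installed. The following is only a minimal sketch under stated assumptions: load_cereals is a hypothetical plain function standing in for a solid body, not part of Dagster or of the tutorial files, and SAMPLE_CSV is made-up three-row sample data rather than the 77-row cereal dataset.

```python
import csv
import io

# Hypothetical in-memory sample; NOT the tutorial's 77-row cereal dataset.
SAMPLE_CSV = "name,calories\nCereal A,70\nCereal B,120\nCereal C,110\n"


def load_cereals(csv_text):
    """Hypothetical stand-in for a solid body: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def test_load_cereals():
    cereals = load_cereals(SAMPLE_CSV)
    # Same shape of check as the Dagster tests: assert on the number of rows produced.
    assert len(cereals) == 3
    # csv.DictReader keys each row by the header line.
    assert cereals[0]["name"] == "Cereal A"
```

Keeping the logic in a plain function like this means it can be checked directly, while execute_solid() and execute_pipeline() remain the way to test the solid and pipeline themselves.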
Now you can use pytest, or your test runner of choice, to run unit tests as you develop your data applications.
pytest hello_cereal_with_tests.py
Note: by convention, pytest tests are typically kept in separate files prefixed with test_. We've put them in the same file just to simplify the tutorial code.
In production, we'll often execute pipelines in a parallel, streaming way that a synchronous API like this can't accommodate. execute_pipeline() and execute_solid() are intended to enable local tests like these.
Dagster is designed to make testing easy in a domain where it has historically been very difficult. Throughout the rest of this tutorial, we'll write unit tests for each piece of the framework as we learn about it. You can learn more about testing in Dagster by reading the Testing page.
🎉 Congratulations! Having come this far, you now have a working, testable, and maintainable data pipeline. You should now be able to build your own data applications using Dagster!