I started a new job a couple of months ago and was introduced to this open source project DBT with it. I've been working in the data field for a few years now and had heard about it from afar but hadn't any real experience working with it. I thought I'd share my initial thoughts having worked with it for about a month now.
For the unfamiliar, DBT is an abstraction layer that makes it easy to write transformation code in SQL for any data warehouse. For me, DBT helps address some of the biggest problems that comes with writing a lot of SQL code and helps formalize a lot of transformations that data teams do. In the past, I had written those 100+ line SQL queries where you had 0 validation beyond hoping when someone looked at your data in a dashboard it looked correct. With DBT, you can write tests for each table and column in your data model which I have really enjoyed doing. Also, DBT allows for writing extensive documentation on all the tables making discovery a lot easier.
My first job in data was as a Business Analyst writing a lot of SQL and so I have a soft spot for it but as I have gotten more experience have found the limitations around data integrity to be a real barrier to having good "production" SQL. With DBT, it really eased a lot of those concerns by making it possible to have well-documented well-tested SQL and easily having a CI/CD framework for your data warehouse.
The source freshness checks are also a really useful feature in DBT that I have enjoyed having. DBT allows the developer to configure a source YAML file that has tests around the "raw" data being imported into your data warehouse and check to make sure data has been updated at some frequency. In the past, we would have some Airflow DAG or dashboard that would keep track of counts and alert if something dramatically changed but here it is much easier to check if data is being updated regularly.
A couple of months in, DBT earns high marks from me!
For the unfamiliar, DBT is an abstraction layer that makes it easy to write transformation code in SQL for any data warehouse. For me, DBT helps address some of the biggest problems that comes with writing a lot of SQL code and helps formalize a lot of transformations that data teams do. In the past, I had written those 100+ line SQL queries where you had 0 validation beyond hoping when someone looked at your data in a dashboard it looked correct. With DBT, you can write tests for each table and column in your data model which I have really enjoyed doing. Also, DBT allows for writing extensive documentation on all the tables making discovery a lot easier.
My first job in data was as a Business Analyst writing a lot of SQL and so I have a soft spot for it but as I have gotten more experience have found the limitations around data integrity to be a real barrier to having good "production" SQL. With DBT, it really eased a lot of those concerns by making it possible to have well-documented well-tested SQL and easily having a CI/CD framework for your data warehouse.
The source freshness checks are also a really useful feature in DBT that I have enjoyed having. DBT allows the developer to configure a source YAML file that has tests around the "raw" data being imported into your data warehouse and check to make sure data has been updated at some frequency. In the past, we would have some Airflow DAG or dashboard that would keep track of counts and alert if something dramatically changed but here it is much easier to check if data is being updated regularly.
A couple of months in, DBT earns high marks from me!