Evgenii Karimov - Combining dbt and SQLFluff

In this post I’d like to share how to adopt two very useful tools SQLFluff, a SQL linter, and dbt, a data modelling tool, into a powerful combination and make it work with pre-commit to ensure all your SQLs are properly formatted.

Why to combine them at all? There are a lot of ways to write SQL: leading/trailing commas, lower-upper case of keywords, aliasing table and column names, etc.

Since every data engineer and analyst spends a lot of time reading SQL code, having a standard style helps to improve this process and also makes writing less error-prone.

dbt is a powerful data modelling tool helping to write less boilerplate code with Jinja templates and it has a remarkable ecosystem of different plugins.

Let’s have a look at the example, say we have the following SQL:

SELECT
  order_id,
  ORDER_DATE AS order_date
  , order_price *0.9 AS discounted_price
from orders

Doesn’t look very reader-friendly, does it?
Now I’m going to install sqlfluff with pip and create a configuration file for it.
In this example I’ll be using BigQuery as a query engine, so you might need to adjust my code for your setup, as well as to enforce certain rules (reference):

Installing sqlfluff, dbt and dependent packages:

pip install sqlfluff
pip install dbt-core
pip install dbt-bigquery
pip install sqlfluff-templater-dbt

How my sqlfliff looks like:

[sqlfluff]
verbose = 1
dialect = bigquery
templater = dbt
recurse = 0
runaway_limit = 10
ignore_templated_areas = True
encoding = autodetect
processes = 4

# NB: This config will only apply in the root folder.
sql_file_exts = .sql
    
[sqlfluff:indentation]
indented_joins = False
indented_using_on = False
template_blocks_indent = True
    
[sqlfluff:templater]
unwrap_wrapped_queries = True
    
[sqlfluff:templater:dbt]
profiles_dir = profiles/
profile = bigquery
target = prod
    
[sqlfluff:templater:jinja]
apply_dbt_builtins = True
    
[sqlfluff:rules]
tab_space_size = 4
max_line_length = 110
indent_unit = space
comma_style = leading
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all

Also we need to ignore certain dbt folders by adding them into .sqlfluffignore:

target/
dbt_modules/
logs/

After running sqlfluff fix we get:

SELECT
    order_id
    , order_date AS order_date
    , order_price * 0.9 AS discounted_price
FROM orders

Integration with pre-commit hooks works following:

Setup pre-commit for your repo.
Add the hooks into .pre-commit-config.yaml as follows:

  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 1.1.0
    hooks:
      - id: sqlfluff-fix
        additional_dependencies: ['dbt-core==1.1.0', 'dbt-bigquery==1.1.0', 'sqlfluff-templater-dbt==1.1.0']
        files: |
          models/|
          tests/          
      - id: sqlfluff-lint
        additional_dependencies: ['dbt-core==1.1.0', 'dbt-bigquery==1.1.0', 'sqlfluff-templater-dbt==1.1.0']
        files: |
          models/|
          tests/

Enjoy!
Now everytime you want to commit a new change in your dbt models, sqlfluff will try to fix it automatically for you.
Additionally you can integrate this check with Github Actions or other CI/CD tools and require sqlfluff lint to pass before making merging a new code into the main branch.

Combining dbt and SQLFluff

Share this post

Takeouts on new GCP products DataPlex and Analytics Hub