Strengthen Query Consistency with SQLFluff: Learn to Lint SQL Queries and Adapt Rules for Data Teams

As data professionals, we understand the significance of writing clean, maintainable, and pretty code. In SQL, where queries can quickly become complex, it helps to have a tool that enforces consistency and adherence to best practices. Enter SQLFluff, an open-source linter designed specifically for SQL. In this blog post, we'll explore how to use SQLFluff to lint T-SQL queries and how to customize its rules with a configuration file.

What is SQLFluff?

SQLFluff is a powerful command-line tool that analyzes SQL code and provides feedback on formatting and style issues. It follows a set of predefined rules based on community-driven best practices and can be customised to align with your team's specific coding standards. SQLFluff supports various SQL dialects, including T-SQL, making it an excellent choice for SQL developers.

The "WHY"?

I work in an environment where we have a lot of SQL code written by a lot of different people over a lot of years. As we built up a new data team, bringing new standards and consistency to the existing SQL code added a noticeable amount of overhead.

We began using SQLFluff to make code easier to read and review. I wanted to reduce the amount of review time spent on formatting and style differences. If we were changing a few lines in an old 500-line proc whose style and standards were out of date, I wanted a simple way to apply the current styles and standards without adjusting the code by hand. I have also found it to be an important part of onboarding people into the team and of teaching people SQL. I wanted the focus to be on the code we were writing and the problem we were solving, not on whether we had remembered to capitalise or space things correctly :)

The following sections walk through how to install SQLFluff, lint SQL code, and configure the rules. I have the example queries and my SQLFluff config file in this GitHub repo if you would like to use them.

GitHub SQLFluff config repo

Installing SQLFluff

Before we dive into the linting process, let's quickly cover how to install SQLFluff. You'll need Python and pip installed on your machine. Simply run the following command in your terminal:

pip install sqlfluff

Linting a SQL Query

  1. Create a new text file

  2. Put the following code in the file:

     select first_name, last_name, email_address from customers where first_name='John'

  3. Save the file as my_sql_query.sql

  4. To lint the query as T-SQL with SQLFluff, run the following command from the directory where you saved the file:

sqlfluff lint my_sql_query.sql --dialect tsql

SQLFluff will process the file and provide detailed feedback on any formatting or style violations.

The lint output lists the code of each failing rule, so you can look it up in the SQLFluff documentation for the details.
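If you would rather trigger the same lint from a script than type it by hand, a minimal Python sketch could look like the one below. It assumes the sqlfluff executable is on your PATH. The helper names build_lint_command and lint_file are made up for this illustration; the only SQLFluff feature used is the lint command shown above.

```python
import shutil
import subprocess


def build_lint_command(path, dialect="tsql"):
    """Return the lint command from this post as an argument list."""
    return ["sqlfluff", "lint", path, "--dialect", dialect]


def lint_file(path, dialect="tsql"):
    """Run the lint command and return (exit_code, report_text).

    Returns None when the sqlfluff executable cannot be found, so the
    caller can decide how to handle a missing install.
    """
    if shutil.which("sqlfluff") is None:
        return None
    result = subprocess.run(
        build_lint_command(path, dialect),
        capture_output=True,  # collect the report instead of printing it
        text=True,            # decode the output as text rather than bytes
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    outcome = lint_file("my_sql_query.sql")
    if outcome is None:
        print("sqlfluff not found; run 'pip install sqlfluff' first")
    else:
        exit_code, report = outcome
        print(report)
        print("exit code:", exit_code)
```

The script only collects what SQLFluff reports; how you react to the exit code is up to you and worth checking against the SQLFluff version you have installed.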

SQLFluff can correct most of your wrongs. You can "fix" the code by running the following:

sqlfluff fix my_sql_query.sql --dialect tsql

If you now open your my_sql_query.sql file the code should look like this:

select
    first_name,
    last_name,
    email_address
from customers
where first_name = 'John'

Much tidier :)

But...

SQLFluff Configuration

The beauty of SQLFluff is the ability to configure the rules the way you like them. Take, for example, the eternal battle over whether commas go at the start or the end of a line in a column list :) I am a comma-in-front person, so I can apply that version of the rule in a config file as follows:

  1. Create a file called .sqlfluff in the directory containing the files you want to lint. When you run SQLFluff from this directory, it will pick up the settings in that .sqlfluff file.

  2. Add the following to your file and save it.

     [sqlfluff:layout:type:comma]
     line_position = leading
     spacing_after = single
    

The above configuration means that when I run a lint command, SQLFluff reads the config file and understands that I want commas placed at the start of the line (leading) and followed by a single space.
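To make the two comma styles concrete, here is a small toy formatter in Python. It is not how SQLFluff works internally and it does not call SQLFluff at all; the function name format_select_list and its parameters are invented purely to show what "leading" with a single space after the comma looks like next to the trailing style.

```python
def format_select_list(columns, line_position="leading", indent="    "):
    """Lay out column names one per line with leading or trailing commas."""
    lines = []
    last = len(columns) - 1
    for i, column in enumerate(columns):
        if line_position == "leading":
            # Every line after the first starts with a comma and one space.
            prefix = ", " if i > 0 else ""
            lines.append(indent + prefix + column)
        else:
            # Every line except the last ends with a comma.
            suffix = "," if i < last else ""
            lines.append(indent + column + suffix)
    return "\n".join(lines)


columns = ["first_name", "last_name", "email_address"]

print("leading:")
print(format_select_list(columns, "leading"))
print("trailing:")
print(format_select_list(columns, "trailing"))
```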

Run the same commands for lint and fix over our previous file after adding this rule and see what changes.

select
    first_name
    , last_name
    , email_address
from customers
where first_name = 'John'

SQLFluff has great documentation with examples of all the rules. You can find them here:

SQLFluff Rules Documentation

I want to take a bit of time now to explain some other parts of the config file I use.

Top-level config lines:

[sqlfluff]
dialect = tsql
sql_file_exts = .sql
exclude_rules = AL07, ST06

Line 1 (dialect): We use SQL Server, so we tell SQLFluff our dialect is tsql. We then don't have to pass the dialect explicitly when running a command, i.e. --dialect tsql won't need to be typed anymore :)

Line 2 (sql_file_exts): I want to make sure I only lint .sql files in the directory, in case I have other files or documents in there for other purposes, e.g. an Excel file for the analysis I am doing.

Line 3 (exclude_rules): There are a couple of rules that we didn't want to apply, and this line is how you exclude them. In my example, AL07 and ST06 are ignored.
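Since the .sqlfluff file is written in INI-style syntax, you can sanity-check your settings with Python's standard configparser module. The sketch below is only an illustration and not a description of how SQLFluff itself loads the file: it parses the three settings above from a string and splits the comma-separated values into lists. The variable names are made up for the example.

```python
import configparser

# The top-level settings from this post, pasted in as a string so the
# sketch runs without needing a .sqlfluff file on disk.
CONFIG_TEXT = """
[sqlfluff]
dialect = tsql
sql_file_exts = .sql
exclude_rules = AL07, ST06
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)

core = parser["sqlfluff"]
dialect = core["dialect"]
# Both settings accept comma-separated values, so split and trim them.
file_exts = [ext.strip() for ext in core["sql_file_exts"].split(",")]
excluded = [code.strip() for code in core["exclude_rules"].split(",")]

print("dialect:", dialect)
print("file extensions:", file_exts)
print("excluded rules:", excluded)
```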

The rest of the config file reflects style choices we made as a team. As you can see below, they relate mainly to casing for the code we write.

[sqlfluff]
dialect = tsql
sql_file_exts = .sql
exclude_rules = AL07, ST06
large_file_skip_char_limit = 0
max_line_length = 120

[sqlfluff:rules]
single_table_references = unqualified

[sqlfluff:indentation]
indent_unit = tab

[sqlfluff:layout:type:comma]
line_position = leading
spacing_after = single

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper

[sqlfluff:rules:capitalisation.identifiers]
extended_capitalisation_policy = lower
unquoted_identifiers_policy = all

[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = upper

[sqlfluff:rules:capitalisation.types]
extended_capitalisation_policy = upper

[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = upper
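For a quick overview of the casing choices in a config like this, a short Python sketch can pull the policy out of each capitalisation section. This is a convenience script of my own for illustration, using only the standard library; the function name capitalisation_policies is hypothetical and nothing here is part of SQLFluff.

```python
import configparser

# The capitalisation sections from the config above, pasted in as a string
# so the sketch runs on its own.
CONFIG_TEXT = """
[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper

[sqlfluff:rules:capitalisation.identifiers]
extended_capitalisation_policy = lower
unquoted_identifiers_policy = all

[sqlfluff:rules:capitalisation.functions]
extended_capitalisation_policy = upper

[sqlfluff:rules:capitalisation.types]
extended_capitalisation_policy = upper

[sqlfluff:rules:capitalisation.literals]
capitalisation_policy = upper
"""

SECTION_PREFIX = "sqlfluff:rules:capitalisation."


def capitalisation_policies(config_text):
    """Map each capitalisation rule name to the policy set for it."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    policies = {}
    for section in parser.sections():
        if not section.startswith(SECTION_PREFIX):
            continue
        rule = section[len(SECTION_PREFIX):]
        for key, value in parser[section].items():
            # Matches both capitalisation_policy and
            # extended_capitalisation_policy.
            if key.endswith("capitalisation_policy"):
                policies[rule] = value
    return policies


for rule, policy in capitalisation_policies(CONFIG_TEXT).items():
    print(f"{rule}: {policy}")
```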

If I re-run lint and fix with my team config file, this is what the example query looks like.

SELECT
    first_name
    , last_name
    , email_address
FROM customers
WHERE first_name = 'John'

Examples for you

I have put a few fictitious examples of SQL in need of linting in the GitHub repo so you can experiment for yourself. You can take the config file and make it your own and also experiment with other dialects if you are using different engines.

But Wait...

There is also a great SQLFluff extension for VS Code that you can use once SQLFluff is installed. Use the extension's settings to tune it and have it use your config file.

Conclusion

By incorporating SQLFluff into your T-SQL development workflow, you can ensure that your queries adhere to best practices, maintain a consistent code style, and minimize potential errors. With its customizable rules and detailed feedback on formatting and style violations, SQLFluff empowers you to write cleaner, more readable, and higher-quality SQL code. I highly recommend the tool: lint your SQL queries and enjoy the benefits of improved code maintainability and collaboration within your team. Happy linting!

As always, send me feedback or ideas, and like and follow if you enjoy this type of content.

Shout out!!

A big thanks to my current team for adopting and signing up for the idea of linting SQL. Also a shout out to previous teams for introducing me to linting and the difference it can make to engineering teams.