A formatting tool for your Databricks notebooks.
While you can use pip directly, you should prefer using pipx.
$ pipx install blackbricks
You probably also want to install the databricks-cli in order to use blackbricks directly on your notebooks in Databricks.
$ pipx install databricks-cli
$ databricks configure # Required in order to use `blackbricks` on remote notebooks.
You can use blackbricks on Python notebook files stored locally, or directly on the notebooks stored in Databricks.
For the most part, blackbricks operates very similarly to black.
$ blackbricks notebook1.py notebook2.py # Formats both notebooks.
$ blackbricks notebook_directory/ # Formats every notebook under the directory (recursively).
An important difference is that blackbricks will ignore any file that does not contain the `# Databricks notebook source` header on the first line. Databricks adds this line to all Python notebooks. This means you can happily run blackbricks on a directory with both notebooks and regular Python files, and blackbricks won't touch the latter.
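To make the header rule concrete, here is a minimal sketch of how a tool could pick notebooks out of a mix of files, including the recursive directory expansion. It is not blackbricks' actual implementation: `is_databricks_notebook` and `collect_notebooks` are hypothetical names, and the sketch assumes the header must match the first line exactly.

```python
from pathlib import Path
from typing import Iterable, List

# The header that, per the text above, marks a file as a Databricks notebook.
NOTEBOOK_HEADER = "# Databricks notebook source"


def is_databricks_notebook(path: Path) -> bool:
    """Return True if the file's first line is exactly the notebook header."""
    try:
        with path.open(encoding="utf-8") as f:
            first_line = f.readline()
    except (OSError, UnicodeDecodeError):
        # Unreadable or binary files are simply treated as "not a notebook".
        return False
    return first_line.rstrip("\r\n") == NOTEBOOK_HEADER


def collect_notebooks(paths: Iterable[str]) -> List[Path]:
    """Expand directories recursively, then keep only files with the header."""
    candidates: List[Path] = []
    for raw in paths:
        p = Path(raw)
        if p.is_dir():
            # rglob("*") descends into every subdirectory.
            candidates.extend(sorted(c for c in p.rglob("*") if c.is_file()))
        else:
            candidates.append(p)
    # Regular Python files (no header) are silently skipped, not reported.
    return [c for c in candidates if is_databricks_notebook(c)]
```

For example, `collect_notebooks(["notebook_directory/"])` would return only the notebooks under that directory, leaving plain `.py` modules out.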
If you specify the `-r` or `--remote` flag, blackbricks will work directly on your notebooks stored in Databricks.
$ blackbricks --remote /Users/username/notebook.py
$ blackbricks --remote /Repos/username/repo-name/notebook.py
$ poetry run blackbricks --help
Usage: blackbricks [OPTIONS] [FILENAMES]...
Formatting tool for Databricks python notebooks.
Python cells are formatted using `black`, and SQL cells are formatted by `sqlparse`.
Local files (without the `--remote` option):
- Only files that look like Databricks (Python) notebooks will be processed. That is,
they must start with the header `# Databricks notebook source`
- If you specify a directory as one of the file names, all files in that directory will
be added, including any subdirectory.
Remote files (with the `--remote` option):
- Make sure you have installed the Databricks CLI (``pip install databricks_cli``)
- Make sure you have configured at least one profile (`databricks configure`). Check the
file `~/.databrickscfg` if you are not sure.
- File paths should start with `/`. Otherwise they are interpreted as relative to
`/Users/username`, where `username` is the username specified in the Databricks profile
used.
╭─ Arguments ────────────────────────────────────────────────────────────────────────────╮
│ filenames [FILENAMES]... Path to the notebook(s) to format. [default: None] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --remote -r If this option is used, │
│ all filenames are treated │
│ as paths to notebooks on │
│ your Databricks host (i.e. │
│ not local files). │
│ --profile -p NAME If using --remote, which │
│ Databricks profile to use. │
│ [default: DEFAULT] │
│ --line-length INTEGER How many characters per │
│ line to allow. │
│ [default: 88] │
│ --sql-upper --no-sql-upper SQL keywords should be │
│ UPPERCASE or lowercase. │
│ [default: sql-upper] │
│ --check Don't write the files │
│ back, just return the │
│ status. Return code 0 │
│ means nothing would │
│ change. │
│ --diff Don't write the files │
│ back, just output a diff │
│ for each file on stdout. │
│ --version Display version │
│ information and exit. │
│ --help Show this message and │
│ exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
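The `--check` and `--diff` options never write files back; they only report. The sketch below models that contract with the standard library. `format_source` is a hypothetical stand-in for the real black/sqlparse pipeline, and returning 1 for "would change" is an assumption borrowed from black, since the help text only guarantees that 0 means nothing would change.

```python
import difflib
from typing import Callable, Dict


def check_and_diff(
    files: Dict[str, str],
    format_source: Callable[[str], str],
    show_diff: bool = False,
) -> int:
    """Model --check / --diff: never write anything back, only report.

    Returns 0 if no file would change. Returning 1 otherwise is an assumption
    (black does this); the help text only promises that 0 means no change.
    """
    changed = 0
    for name, source in files.items():
        formatted = format_source(source)
        if formatted == source:
            continue  # Already formatted: nothing to report for this file.
        changed += 1
        if show_diff:
            # Print a unified diff on stdout, as the --diff description says.
            diff = difflib.unified_diff(
                source.splitlines(keepends=True),
                formatted.splitlines(keepends=True),
                fromfile=name,
                tofile=name,
            )
            print("".join(diff), end="")
    return 0 if changed == 0 else 1
```

In a CI job, the same idea means running `blackbricks --check` on your notebooks and failing the build whenever the exit code is not 0.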
Use pre-commit. Add a `.pre-commit-config.yaml` file to your repo with the following content (changing or removing the `args` as you wish):
repos:
  - repo: https://github.com/inspera/blackbricks
    rev: 1.0.0
    hooks:
      - id: blackbricks
        args: [--line-length=120]
Set the `rev` attribute to the most recent version of blackbricks.
The `args` are optional and can be used to set any of blackbricks' options.
If you find blackbricks useful, feel free to say so with a star. If you think it is utterly broken, you are more than welcome to contribute improvements. Please open an issue first to discuss what you want added or fixed, unless you are just adding tests, in which case your pull request is extremely likely to be merged right away.
Certain SQL statements might not be parsed and indented properly by sqlparse, which can result in jumbled formatting. You can disable SQL formatting for a cell by adding `-- nofmt` to the very first line of the cell:
%sql -- nofmt
select this,
sql_will, -- be kept just
like_this
from if_that_is.what_you_need
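The opt-out rule is simple enough to express as a predicate. This is a hypothetical sketch (`skips_sql_formatting` is not part of blackbricks) that works on the cell text as it appears in the notebook editor, not on the exported file.

```python
def skips_sql_formatting(cell_source: str) -> bool:
    """Return True for a %sql cell whose first line carries the -- nofmt marker.

    Hypothetical helper approximating the rule above; it reads the cell as
    shown in the notebook editor, not the exported notebook file.
    """
    lines = cell_source.splitlines()
    if not lines:
        return False  # An empty cell has no first line to opt out with.
    first_line = lines[0].strip()
    # Only SQL cells are affected, and the marker must be on the first line.
    return first_line.startswith("%sql") and "-- nofmt" in first_line
```

Note that a `-- nofmt` comment further down the cell does not count under this rule; only the first line is inspected.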
First, make sure you have set up the databricks-cli on your system (see installation), and that you have at least one profile set up in `~/.databrickscfg`. As an example:
# File: ~/.databrickscfg
[DEFAULT]
host = https://dbc-b23456-a1243.cloud.databricks.com/
username = [email protected]
password = dapi12345678901234567890
[OTHERPROFILE]
host = https://dbc-c54321-d234.cloud.databricks.com
username = [email protected]
password = dapi09876543211234567890
You should use access tokens instead of your actual password.
You can then do:
$ blackbricks --remote /Users/[email protected]/notebook.py # Uses DEFAULT profile.
$ blackbricks --remote notebook.py # Equivalent to the above.
$ blackbricks --remote --profile OTHERPROFILE /Users/[email protected]/notebook.py
$ blackbricks --remote --profile OTHERPROFILE notebook.py # Equivalent to the above.
$ blackbricks --remote /Repos/[email protected]/repo-name/notebook.py # Targeting notebook in a Repo
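The help text says relative remote paths are resolved against `/Users/username`, using the username from the selected profile. Here is a minimal sketch of that lookup with the standard library's `configparser`. The helper names are hypothetical, and it assumes the profile file is plain INI as in the example above.

```python
import configparser


def profile_username(cfg_text: str, profile: str = "DEFAULT") -> str:
    """Return the username stored for a profile in ~/.databrickscfg-style text."""
    # interpolation=None so that "%" characters in tokens are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # [DEFAULT] is special in configparser: it is absent from sections(), yet
    # parser["DEFAULT"] works. Other sections also inherit its keys, which the
    # Databricks tooling may not do; this sketch does not guard against that.
    return parser[profile]["username"]


def resolve_remote_path(path: str, username: str) -> str:
    """Keep absolute paths; treat anything else as relative to /Users/<username>."""
    if path.startswith("/"):
        return path
    return f"/Users/{username}/{path}"
```

Combining the two, `resolve_remote_path("notebook.py", profile_username(cfg_text, "OTHERPROFILE"))` gives `/Users/<that profile's username>/notebook.py`, which is why each pair of commands above is equivalent.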
No. See #27 for why.
However, Databricks now allows you to format your notebooks with black directly.
This means you had an old version of click installed from before, and your installation didn't upgrade it automatically. Updating your installation should do the trick, e.g. `pip install -U blackbricks` or similar, depending on your installation method of choice.
See #21.
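To confirm which click version an environment actually has (for example, before and after upgrading), a few lines of standard-library Python are enough. This is a generic diagnostic, not something blackbricks ships, and `installed_version` is a hypothetical helper name. With pipx, run it with the interpreter from blackbricks' own virtual environment, since that is where its click is installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report what this interpreter's environment actually has installed.
    for name in ("blackbricks", "click"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```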
Style choices made by blackbricks will follow semantic versioning: changes that alter the formatted output result in new major versions. Such changes will be kept to an absolute minimum, with none currently planned.
Style choices made by black (responsible for 95% of the formatting in a notebook) will not follow the same strict semantic versioning. This is because black itself does not use semver, but instead provides a year-based stability policy. blackbricks will make a minor version increase when it upgrades to a new year's release of black. Such a bump should be made once that release is available. Feel free to open an issue if this has not been done yet.
Starting with version 2.0.0, notebooks are terminated with a `\n`. This harmonizes EOF handling and should be much less annoying in practice than in prior versions. It causes a diff on any notebook that was previously formatted with blackbricks<2.0.0.
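Assuming the 2.0.0 rule means "exactly one trailing newline", the normalization amounts to the one-liner below (`terminate_with_newline` is a hypothetical name, not blackbricks' code). Under that assumption it also shows where the one-off diff comes from: any notebook whose earlier output ended differently gets rewritten.

```python
def terminate_with_newline(source: str) -> str:
    """Return the source ending in exactly one newline character.

    Assumption: trailing newlines collapse to one and a missing one is added.
    This sketches the 2.0.0 rule described above; it is not blackbricks' code.
    """
    return source.rstrip("\n") + "\n"
```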
Also, the deprecated and non-functional flag for two-space indentation has been removed, and providing it is now an error.
Earlier versions of blackbricks applied a patched version of black in order to allow two-space indentation. This was done because Databricks used two-space indentation, and did not allow you to change that.
Since then, Databricks has added the option to choose. Because you can now choose, blackbricks re-joins black in being uncompromising, and since version 1.0 you can no longer choose anything but four-space indentation.
If you must keep using two-space indentation, stick to versions <1.0.