Thanks for considering contributing to Ploomber!
Issues tagged with good first issue are great options to start contributing.
If you get stuck, open an issue or reach out to us on Slack and we'll happily help you.
If you're contributing to the documentation, go to doc/CONTRIBUTING.md.
The easiest way to set up the development environment is via the setup command; you must have miniconda installed. If you don't want to use conda, skip to the next section.
See the miniconda documentation for installation details.
Make sure conda has conda-forge as a channel by running the following:
conda config --add channels conda-forge
Once you have conda ready:
# get the code
git clone https://github.com/ploomber/ploomber
# invoke is a library we use to manage one-off commands
pip install invoke
# move into ploomber directory
cd ploomber
# setup development environment
invoke setup
Note: If you're using Linux, you may encounter issues running `invoke setup` regarding the `psycopg2` package. If that's the case, remove `psycopg2` from the `setup.py` file and try again.
Then activate the environment:
conda activate ploomber
Ploomber has optional features that depend on packages that aren't straightforward to install, so we use `conda` for quickly setting up the development environment. But you can still get a pretty good development environment using `pip` alone.
Note: we highly recommend installing ploomber in a virtual environment (the most straightforward option is the built-in venv module):
# create virtual env
python -m venv ploomber-venv
# activate virtual env (linux/macOS)
source ploomber-venv/bin/activate
# activate virtual env (windows)
.\ploomber-venv\Scripts\activate
Note: Check venv docs to find the appropriate command if you're using Windows.
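On Windows, the exact activation command depends on your shell; for reference, these are the standard venv activation scripts (a sketch, assuming the virtual env was created as ploomber-venv above):
# activate virtual env (windows, cmd.exe)
ploomber-venv\Scripts\activate.bat
# activate virtual env (windows, PowerShell)
ploomber-venv\Scripts\Activate.ps1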
# required to run the next command
pip install invoke
# install dependencies with pip
invoke setup-pip
Note: If you're using Linux, you may encounter issues running `invoke setup` regarding the `psycopg2` package. If that's the case, remove `psycopg2` from the `setup.py` file and try again.
Conda takes care of installing all dependencies required to run all tests. However, we need to skip a few of them when installing with pip because either the library is not pip-installable or some of its dependencies aren't. So if you use `invoke setup-pip` to configure your environment, some tests will fail. This isn't usually a problem if you're developing a specific feature; you can run a subset of the testing suite and let GitHub run the entire test suite when pushing your code.
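For example, to run only the tests related to the code you're changing (the paths and test names here are illustrative), you can point pytest at a directory or filter by name:
# run the tests for a single module
pytest tests/util
# run only tests whose name matches an expression
pytest tests/util -k "some_test_name"
# stop at the first failure for faster iteration
pytest tests/util -x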
However, if you wish to have a full setup, you must install the following dependencies:
- pygraphviz (note that this depends on graphviz, which can't be installed by pip)
- IRkernel (note that this requires an R installation)
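A sketch of how these can be installed, assuming you have conda and R available (exact commands may vary with your setup):
# pygraphviz depends on graphviz, which pip can't install; conda-forge provides both
conda install pygraphviz --channel conda-forge
# IRkernel requires an R installation; install it from CRAN and register the kernel with Jupyter
R -e "install.packages('IRkernel', repos='https://cloud.r-project.org'); IRkernel::installspec()"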
Make sure everything is working correctly:
# import ploomber
python -c 'import ploomber; print(ploomber)'
Note: the output of the previous command should be the directory where you ran `git clone`; if it's not, try re-activating your conda environment (i.e., if using conda: `conda activate base`, then `conda activate ploomber`). If this doesn't work, open an issue or reach out to us on Slack.
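For reference, the output should look something like this, with the path pointing into your clone (the exact path is illustrative):
# expected output of the command above
<module 'ploomber' from '/path/to/your/clone/src/ploomber/__init__.py'>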
Run some tests:
pytest tests/util
We receive contributions via Pull Requests (PRs). We recommend you check out this guide.
We use yapf for formatting code. Please run yapf on your code before submitting:
yapf --in-place path/to/file.py
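If you modified several files, yapf can also format them recursively (the paths here are illustrative; adjust to the files you changed):
# format all files under the source and tests directories
yapf --in-place --recursive src/ploomber tests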
We use flake8 for linting. Please check your code with flake8 before submitting:
# run this in the project directory to check code with flake8
# note: this takes a few seconds to finish
flake8
Note: If you created a virtual env in a child directory, exclude it from `flake8` using the `--exclude` argument (e.g., `flake8 --exclude my-venv`); `ploomber-venv` is excluded by default.
If you don't see any output after running `flake8`, you're good to go!
If you want git to automatically check your code with `flake8` before you push to your fork, you can install a pre-push hook locally:
# to install pre-push git hook
invoke install-git-hook
# to uninstall pre-push git hook
invoke uninstall-git-hook
The installed hook only takes effect in your current repository.
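If you ever need to push without running the hook (for example, a work-in-progress branch), git's standard --no-verify flag skips pre-push hooks:
# skip the pre-push hook for a single push
git push --no-verify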
When you have finished the feature development and you are ready for a Code Review (a.k.a. a Pull Request on GitHub), make sure you "squash" the commits in your development branch before creating a PR.
There are two ways to do that:
- Squash all the commits from the command line
- Use the GitHub PR page when you are about to merge to the main branch
1. Using Command Line: git rebase
# -i starts an interactive rebase; rebase onto the branch you based your work on (e.g., main)
git rebase -i main
# in the editor that opens, "squash" the commit history, for example:
pick commit_hash_1 commit_message_1
s commit_hash_2 commit_message_2
s commit_hash_3 commit_message_3
s commit_hash_4 commit_message_4
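After the rebase, the branch history has been rewritten, so you'll need to force-push your branch to your fork before opening (or updating) the PR. A sketch, with an illustrative branch name:
# push the squashed branch; --force-with-lease refuses to overwrite work you don't have locally
git push --force-with-lease origin my-feature-branch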
2. Using the GitHub console (select the "Squash and merge" option when merging the PR)
- We use pytest for testing. A basic understanding of `pytest` is highly recommended to get started.
- In most cases, for a given module in `src/ploomber/{module-name}`, there is a testing module in `tests/{module-name}`. If you're working on a particular module, you can execute the corresponding testing module for faster development, but when submitting a pull request, all tests will run.
- Ploomber loads user's code dynamically via dotted paths (e.g., `my_module.my_function` is similar to doing `from my_module import my_function`). Hence, some of our tests do this as well. Dynamic imports can become a problem if tests create and import modules (i.e., create a new `.py` file and import it). To prevent temporary modules from polluting other tests, use the `tmp_imports` pytest fixture, which deletes all packages imported inside a test.
- Some tests make calls to a PostgreSQL database. When running on GitHub Actions, a database is automatically provisioned, but the tests will fail locally.
- If you're checking error messages and they include absolute paths to files, you may encounter some issues when running the Windows CI since the GitHub Actions VM has some symlinks. If the code under test calls `Path.resolve()` (which resolves symlinks), call it in the test as well; if it doesn't, use `os.path.abspath()` (which does not resolve symlinks).
`ploomber` is available in conda (via conda-forge). The recipes are located in the conda-forge feedstocks:
- https://github.com/conda-forge/ploomber-feedstock
- https://github.com/conda-forge/ploomber-scaffold-feedstock

The first feedstock corresponds to the core package, and the second is a complementary package that implements the scaffolding logic (i.e., `ploomber scaffold`). When uploading a new version to PyPI, the conda-forge bot automatically opens a PR to the feedstocks; upon approval, the new versions are available to install via `conda install ploomber --channel conda-forge`.
Note that conda-forge implements a CI pipeline that checks that the recipe works. Thus, under most circumstances, the PR will pass. One exception is when adding new dependencies to `setup.py`; in such a case, we must manually edit the recipe (`meta.yaml`) and open a PR to the feedstock. See the next section for details.
Note that it takes some time for packages to be available for download. You can verify a successful upload by opening Anaconda.org (ploomber, ploomber-scaffold); that website is updated immediately.
To check if packages are available, run `conda search ploomber --channel cf-staging`. Pending packages appear in the `cf-staging` channel, while available packages appear in `conda-forge`. It usually takes less than one hour for packages to move from one to the other.
If `conda-forge`'s bot PR fails (usually because a new dependency was added), we must submit a PR ourselves:

- Fork the feedstock repository
- Clone it: `git clone https://github.com/{your-user}/ploomber-feedstock` (change `your-user`)
- Create a new branch: `git checkout -b branch-name`
- Update the recipe (`meta.yaml`):
    - Update the version in the `{% set version = "version" %}` line
    - Update `source.sha256`; you can get that from `https://pypi.org/project/ploomber/{version}/#files`, just change the `version` and copy the SHA256 hash from the `.tar.gz` file
    - If there are new dependencies (or new constraints), add them to `requirements.run`
- You may need to run `conda smithy rerender -c auto` (see the conda-forge documentation for details)
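Putting the steps above together, the workflow looks roughly like this (the branch name and commit message are illustrative; in conda-forge feedstocks the recipe typically lives at recipe/meta.yaml):
# fork on GitHub first, then clone your fork (change your-user)
git clone https://github.com/{your-user}/ploomber-feedstock
cd ploomber-feedstock
git checkout -b update-ploomber
# edit recipe/meta.yaml: bump the version, update source.sha256,
# and add any new dependencies to requirements.run
# re-render the feedstock if needed (requires conda-smithy)
conda smithy rerender -c auto
git commit -am "Update ploomber"
git push origin update-ploomber
# then open a PR against conda-forge/ploomber-feedstock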
If you already forked the repository, you can sync with the original repository like this:
git remote add upstream https://github.com/conda-forge/ploomber-feedstock
git fetch upstream
git checkout main
git merge upstream/main
We follow scikit-learn's guidelines.