PyBugHive

A Comprehensive Database of Manually Validated, Reproducible Python Bugs

Projects

| Name | Issues | Domain | LOC | Test (LOC) | Stars | Forks | Commits | Years Active |
|---|---|---|---|---|---|---|---|---|
| psf/black | 38 | code formatter | 303 466 | 12 709 | 33 389 | 2 161 | 1 627 | 5 |
| cookiecutter/cookiecutter | 2 | project generator from templates | 5 631 | 4 055 | 20 162 | 1 892 | 3 011 | 10 |
| Rapptz/discord.py | 2 | Discord bot | 37 960 | 1 464 | 13 368 | 3 773 | 4 906 | 8 |
| freqtrade/freqtrade | 15 | cryptocurrency trading | 73 015 | 41 765 | 22 755 | 5 213 | 22 035 | 6 |
| numpy/numpy | 1 | numerical computing | 170 179 | 91 891 | 24 344 | 8 459 | 33 396 | 13 |
| pandas-dev/pandas | 43 | data analysis | 394 739 | 268 554 | 39 550 | 16 639 | 33 195 | 13 |
| python-poetry/poetry | 11 | package manager and builder | 39 411 | 24 504 | 26 358 | 2 050 | 2 938 | 5 |
| saltstack/salt | 7 | configuration management/automation | 747 255 | 307 831 | 13 472 | 5 471 | 118 331 | 12 |
| scrapy/scrapy | 2 | web scraping | 45 052 | 29 105 | 48 267 | 10 127 | 10 032 | 13 |
| explosion/spaCy | 11 | natural language processing | 89 644 | 30 771 | 26 954 | 4 232 | 15 984 | 9 |
| google/jax | 17 | program transformation | 179 079 | 90 476 | 28 316 | 2 592 | 20 893 | 6 |
| Average | 13 | | 189 584 | 82 102 | 26 994 | 5 691 | 24 213 | 10 |

Using PyBugHive

Using PyBugHive is straightforward because every command except clean follows the same pattern: python pybughive.py {command} {project}-{issue}, where {project} is the name of the selected project’s repository and {issue} is the number of the selected issue. The commands include checkout, install, test, fix, and clean.

For example, to validate issue 1493 from the black project, a user would run the following sequence:

  1. python pybughive.py checkout black-1493
  2. python pybughive.py install black-1493
  3. python pybughive.py test black-1493
  4. python pybughive.py fix black-1493
  5. python pybughive.py test black-1493

Step 1 checks out the appropriate commit from the selected repository, step 2 installs the project, and step 3 runs the tests; at this point, the tests related to the bug should fail. Step 4 applies the bug fix, and step 5 runs the tests again, which should now pass.
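The five-step sequence above can also be scripted. The following is a minimal sketch and not part of PyBugHive: the helper names reproduction_commands and run_workflow are hypothetical, and the sketch assumes pybughive.py sits in the current working directory. It only builds and runs the commands shown above and records each exit code. How PyBugHive reports a failing test run through its exit code is not stated here, so the codes are returned without interpretation.

```python
import subprocess
import sys

# Order of commands taken from the example sequence above.
WORKFLOW = ("checkout", "install", "test", "fix", "test")


def reproduction_commands(project, issue, script="pybughive.py"):
    """Build the argv list for each step of the example workflow.

    `script` defaults to pybughive.py in the current directory (an assumption).
    """
    bug_id = f"{project}-{issue}"
    return [[sys.executable, script, command, bug_id] for command in WORKFLOW]


def run_workflow(project, issue):
    """Run every step in order and return (command, exit code) pairs.

    Hypothetical helper: exit codes are reported as-is, not interpreted.
    """
    results = []
    for argv in reproduction_commands(project, issue):
        completed = subprocess.run(argv)
        results.append((argv[2], completed.returncode))
    return results
```

Calling run_workflow("black", 1493) would issue the same five commands as the manual sequence, using the Python interpreter that runs the script.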

Before PyBugHive can be used, it needs to be configured by setting the INSTALL_DIRECTORY, which specifies where to download the repositories, and the MONGO_URL, which specifies the connection string to MongoDB. There are three ways the user can provide these:

  1. Add them to the system’s environment variables.
  2. Create a .env file in the PyBugHive project root and add the above variables to it.
  3. Same as 2), but with a config.py file.

We offer a Docker image with everything pre-installed, which further helps with an isolated run of PyBugHive. Moreover, we provide an installer script (setup.sh) that installs all required Python versions and other dependencies needed to successfully run any bug presented in our database. However, for better isolation, we recommend using the Docker environment.
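To illustrate the first two configuration options, here is a minimal sketch of reading INSTALL_DIRECTORY and MONGO_URL from the environment, with a fallback to .env-style text. It is not PyBugHive's own loading code: the function names, the simple KEY=VALUE parser, the precedence (environment variables over the .env file), and the example values are all assumptions made for illustration.

```python
import os

REQUIRED_KEYS = ("INSTALL_DIRECTORY", "MONGO_URL")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(env_text="", environ=None):
    """Return both settings, preferring environment variables (assumed order)."""
    environ = os.environ if environ is None else environ
    file_values = parse_env_text(env_text)
    settings = {}
    for key in REQUIRED_KEYS:
        value = environ.get(key) or file_values.get(key)
        if not value:
            raise KeyError(f"{key} is not configured")
        settings[key] = value
    return settings


# Example values below are placeholders, not defaults shipped with PyBugHive.
example_env = """
# .env in the PyBugHive project root
INSTALL_DIRECTORY=/tmp/pybughive-repos
MONGO_URL=mongodb://localhost:27017
"""
print(resolve_settings(example_env, environ={}))
```

Passing environ explicitly keeps the sketch testable without touching the real process environment; omitting it falls back to os.environ.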

Offline version

A major concern for bug databases is keeping bugs reproducible over time, for example when dependencies change or dependency versions are not properly locked. This is the main reason we created an offline version of PyBugHive. This version of the dataset needs neither a database connection nor an internet connection, so it can be used in a completely offline environment. For each bug, we downloaded the specific project both before and after the fix, created the corresponding environment, and compressed everything into a zip file. Users therefore do not have to install the project manually; the provided virtual environment is already prepared, and the tests can be run directly. The drawback of this version is its large size.