PyBugHive

Projects

Name	Issues	Domain	LOC	Test (LOC)	Stars	Forks	Commits	Years Active
psf/black	38	code formatter	303 466	12 709	33 389	2 161	1 627	5
cookiecutter/cookiecutter	2	project generator from templates	5 631	4 055	20 162	1 892	3 011	10
Rapptz/discord.py	2	Discord bot	37 960	1 464	13 368	3 773	4 906	8
freqtrade/freqtrade	15	cryptocurrency trading	73 015	41 765	22 755	5 213	22 035	6
numpy/numpy	1	numerical computing	170 179	91 891	24 344	8 459	33 396	13
pandas-dev/pandas	43	data analysis	394 739	268 554	39 550	16 639	33 195	13
python-poetry/poetry	11	package manager and builder	39 411	24 504	26 358	2 050	2 938	5
saltstack/salt	7	configuration management/automation	747 255	307 831	13 472	5 471	118 331	12
scrapy/scrapy	2	web scraping	45 052	29 105	48 267	10 127	10 032	13
explosion/spaCy	11	natural language processing	89 644	30 771	26 954	4 232	15 984	9
google/jax	17	program transformation	179 079	90 476	28 316	2 592	20 893	6
Average	13		189 584	82 102	26 994	5 691	24 213	10

Using PyBugHive

Using PyBugHive is straightforward as all commands (besides the clean command) follow the same pattern: python pybughive.py {command} {project}-{issue}, where {project} is the name of the selected project’s repository and {issue} is the number of the selected issue. The possible commands are the following:

checkout: Clones the selected repository and checks out the appropriate commit hash. This should be the first command when using this tool.
install: Searches for the appropriate install steps and runs them. This should be the second command.
test: Runs the appropriate test steps. When used with the --all flag, instead of testing just the file(s) included in the issue, it runs all tests.
fix: Installs the fix for the selected bug.
clean: This command does not need a project or an issue number. It deletes all temporary files generated during the run. When used with the --all flag, it deletes everything the tool downloaded, including the repositories.
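As an illustration of the {project}-{issue} pattern, the following Python sketch splits an identifier such as black-1493 into its two parts. The helper name split_target is hypothetical and not part of PyBugHive; it assumes the issue number is whatever follows the last hyphen.

```python
def split_target(target: str) -> tuple[str, int]:
    """Split a '{project}-{issue}' identifier into (project, issue).

    Hypothetical helper, not part of PyBugHive. Assumes the issue number
    follows the last hyphen, so project names containing hyphens still work.
    """
    project, sep, issue = target.rpartition("-")
    if not sep or not project or not issue.isdigit():
        raise ValueError(f"expected '{{project}}-{{issue}}', got {target!r}")
    return project, int(issue)


print(split_target("black-1493"))  # ('black', 1493)
```

Splitting on the last hyphen rather than the first is a design choice of this sketch: it keeps a project name intact even if that name itself contains a hyphen.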

For example, to validate issue 1493 from the project black, a user should run the following sequence:

python pybughive.py checkout black-1493
python pybughive.py install black-1493
python pybughive.py test black-1493
python pybughive.py fix black-1493
python pybughive.py test black-1493

In this sequence, step 1) checks out the appropriate commit from the selected repository, step 2) installs the project, and step 3) runs the tests; at this point, the tests related to the bug should fail. Step 4) applies the bugfix and step 5) runs the tests again, which should pass this time.
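The five-step sequence above could also be scripted. The following is a minimal Python sketch, not part of PyBugHive: the function names build_commands and reproduce are hypothetical, the script is assumed to run from the PyBugHive project root, and whether pybughive.py reports failing tests through its exit code is an assumption.

```python
import subprocess
import sys

# The order of commands from the example: the first "test" run is expected
# to fail, the second one (after "fix") is expected to pass.
STEPS = ["checkout", "install", "test", "fix", "test"]


def build_commands(target: str) -> list[list[str]]:
    """Return the argv lists for the five-step sequence on one bug."""
    return [[sys.executable, "pybughive.py", step, target] for step in STEPS]


def reproduce(target: str) -> list[int]:
    """Run each step in order and collect the exit codes (hypothetical helper)."""
    codes = []
    for argv in build_commands(target):
        # check=False: a non-zero exit code must not abort the sequence,
        # since the first test run is supposed to fail.
        codes.append(subprocess.run(argv, check=False).returncode)
    return codes


# Usage, from the PyBugHive project root:
# reproduce("black-1493")
```

Collecting the exit codes instead of stopping at the first failure keeps the full sequence intact, so the result of the test run before the fix can be compared with the one after it.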

Before PyBugHive can be used, it needs to be configured by setting the INSTALL_DIRECTORY, which specifies where to download the repositories, and the MONGO_URL, which specifies the connection string to MongoDB. There are three ways the user can provide these:

1) Add them to the system’s environment variables.
2) Create a .env file in the PyBugHive project root and add the above variables to it.
3) Same as 2), but with a config.py file.

We offer a Docker image with everything pre-installed, which further helps with an isolated run of PyBugHive. Moreover, we provide an installer script (setup.sh) that installs all required Python versions and other dependencies needed to successfully run any bug presented in our database. However, for better isolation, we recommend using the Docker environment.
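For the environment-variable option, a quick check that both INSTALL_DIRECTORY and MONGO_URL are set can save a failed run. The sketch below is an assumption-laden illustration, not PyBugHive code: missing_settings is a hypothetical helper, and the example values are placeholders.

```python
import os

# The two settings named in the configuration description.
REQUIRED = ("INSTALL_DIRECTORY", "MONGO_URL")


def missing_settings(env=None) -> list[str]:
    """Return the required variable names that are unset or empty.

    Hypothetical helper; only inspects environment-style mappings and does
    not read .env or config.py files.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


# Placeholder values for illustration only.
example = {"INSTALL_DIRECTORY": "/tmp/pybughive", "MONGO_URL": ""}
print(missing_settings(example))  # ['MONGO_URL']
```

Calling missing_settings() without arguments inspects the current process environment.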

Offline version

A major concern for bug databases is keeping bugs reproducible over time, for example in the face of changing dependencies and improperly locked dependency versions. This is the main reason why we created an offline version of PyBugHive. This version of the dataset needs neither a database connection nor an internet connection, so it can be used in a completely offline environment. For each bug, we downloaded the specific project both before and after the fix, created the corresponding environment, and compressed everything into a zip file. This way, users do not have to install the particular project manually; the provided virtual environment is already prepared, and the tests can be run directly. The drawback of this version is its large size.