Prerequisites: pytest & hypothesis¶
HypoFuzz is designed to run Hypothesis tests, so you’ll need some of those as fuzz targets.
Our current implementation uses pytest to collect the tests to run,
so you can select specific tests in the usual way (for example, by file),
or just let pytest discover all your tests for you.
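A fuzz target is just an ordinary Hypothesis test. As a minimal sketch (the test name and property below are invented for illustration):

```python
from hypothesis import given, strategies as st

# A hypothetical property-based test; any @given-decorated test that
# doesn't use pytest fixtures can serve as a HypoFuzz fuzz target.
@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    once = sorted(xs)
    assert sorted(once) == once
```

Running pytest executes this test as usual; pointing hypothesis fuzz at it instead would keep generating new inputs indefinitely.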
HypoFuzz is a pure-Python package, and can be installed from your shell with
pip install hypofuzz
or from a requirements.txt file like
hypofuzz >= 21.05.1
Optionally, you can pip install hypofuzz[pytrace] for automatic time-travel debugging with PyTrace through the web interface.
There are still some rough edges here, but it’s pretty cool.
Running hypothesis fuzz¶
The core idea is that while you run pytest ... on each change, you run hypothesis fuzz ... on a server, and it’ll keep searching for interesting new inputs until shut down from outside.
$ hypothesis fuzz --help
Usage: hypothesis fuzz [OPTIONS] [-- PYTEST_ARGS]
[hypofuzz] runs tests with an adaptive coverage-guided fuzzer.
Unrecognised arguments are passed through to `pytest` to select the tests to
run, with the additional constraint that only tests using Hypothesis but not
any pytest fixtures can be fuzzed.
This process will run forever unless stopped with e.g. ctrl-C.
-n, --numprocesses NUM default: all available cores [x>=1]
--dashboard / --no-dashboard serve / don't serve a live dashboard page
--port PORT Optional port for the dashboard (if any)
--unsafe Allow concurrent execution of each test
(dashboard may report wrong results)
-h, --help Show this message and exit.
By design, this is minimally configurable: test selection and collection are delegated to
pytest, using exactly the same syntax as usual, and the
remaining options are out of scope for the fuzzer itself to determine.
Reproducing and fixing bugs¶
HypoFuzz saves any failures it finds into Hypothesis’ standard example database, so the workflow for deduplicating and reproducing any failures is “run your test suite in the usual way”.
It really is that easy!
A quick glance under the hood¶
HypoFuzz isn’t “better” than Hypothesis; it’s playing a different game, and the main difference is that it runs for much longer. That means:
- The performance overhead of coverage instrumentation pays off, as we can tell when inputs do something unusual and spend more time generating similar things in future.
- Instead of running 100 examples for each test before moving on to the next, we can interleave them, run different numbers of examples for each test, and focus on the ones where we’re discovering new behaviours fastest.
- We spend our time generating more interesting examples, focussed on the most complex tests, and do so without any human input at all.
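The first point can be illustrated with a toy sketch of the coverage-guided idea. This is not HypoFuzz’s actual algorithm; the target function, its “branches”, and the mutation scheme are all invented, but the loop shows the core feedback: inputs that exercise new behaviour are kept and mutated preferentially.

```python
import random

def target(x):
    # Invented target: report which "branches" an input exercises,
    # standing in for real coverage instrumentation.
    behaviours = {"even" if x % 2 == 0 else "odd"}
    if x > 100:
        behaviours.add("big")
    return behaviours

rng = random.Random(0)
seen = set()  # every behaviour observed so far
corpus = [0]  # inputs worth mutating again

for _ in range(200):
    # Mutate a known-interesting input...
    candidate = rng.choice(corpus) + rng.randint(-50, 50)
    new = target(candidate) - seen
    if new:                       # ...and if it did something unusual,
        seen |= new               # record its behaviour
        corpus.append(candidate)  # and mutate it more in future.
```

Real coverage-guided fuzzers use actual branch coverage and far smarter mutation, but the keep-what-surprises-you loop is the same shape.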