gramfuzz¶
Using gramfuzz consists of four steps:
defining the grammar(s)
creating a gramfuzz.GramFuzzer instance
loading the grammar(s)
generating num random rules from the loaded grammars from a specific category
Example Revisited¶
In the example on the main page of this documentation (TLDR Example), we defined a grammar and made two new classes: NRef and NDef.
We did this so that any definition created with NDef is forced into the "name_def" category. The same goes for the NRef class we made: it forces gramfuzz to look up referenced definitions in the "name_def" category instead of the default category.
The importance of this functionality becomes clear when we look at the line that actually generates the names:
names = fuzzer.gen(cat="name", num=10)
Notice how we explicitly say that we want the fuzzer to generate 10 random rules from the "name" category (NOT the "name_def" category). This is an intentional way of differentiating between top-level rules that should be chosen randomly to generate, and rules that only exist to help create the top-level rules.
In our simple names example, the sole rule definition in the "name" category:
Def("name",
    Join(
        Opt(NRef("name_prefix")),
        NRef("first_name"),
        Opt(NRef("middle_initial")),
        Opt(NRef("last_name")),
        Opt(NRef("name_suffix")),
    sep=" "),
    cat="name"
)
uses the other definitions in the "name_def" category to complete itself.
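The split between top-level and helper categories can be illustrated with a small pure-Python model. This is only a sketch of the idea, not gramfuzz's implementation: the DEFS dictionary, the toy gen function, and the sample names are all hypothetical stand-ins.

```python
import random

# Hypothetical toy model: helper rules live in "name_def",
# and only the "name" category holds top-level rules.
DEFS = {
    "name_def": {
        "first_name": ["Henry", "Susy"],
        "last_name": ["Blart", "Tralb"],
    },
    "name": {
        # Each part is a (category, rule name) pair to look up.
        "name": [("name_def", "first_name"), ("name_def", "last_name")],
    },
}


def gen(cat, num, rng):
    """Build `num` outputs from rules chosen at random within `cat` only."""
    results = []
    for _ in range(num):
        # Only rules in the requested category are candidates.
        rule_name = rng.choice(sorted(DEFS[cat]))
        parts = [
            rng.choice(DEFS[ref_cat][ref_name])
            for ref_cat, ref_name in DEFS[cat][rule_name]
        ]
        results.append(" ".join(parts))
    return results


names = gen("name", 10, random.Random(0))
```

Because the toy gen only ever picks its starting rule from the "name" category, a bare first name or last name is never produced on its own; the helper rules only appear as parts of a complete name.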
Preferred Category Groups¶
gramfuzz has a concept of a “category group”. In the default usage of this concept, a grammar rule’s category group is the name of the Python file the rule was defined in.
If, say, we had loaded ten separate grammar files into a GramFuzzer instance, and one of the grammar files was named important_grammar.py, we could tell the fuzzer to focus on all the rules in that grammar 60% of the time:
# assuming we've already loaded all of the grammars
outputs = fuzzer.gen(
    cat = "the_category",
    num = 10,
    preferred = ["important_grammar"],
    preferred_ratio = 0.6
)
This becomes especially powerful when using the gramfuzz module as a base for more specific/targeted grammar fuzzing.
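To make the effect of preferred_ratio concrete, here is a small stand-alone model of the selection step. It is a sketch under assumptions, not gramfuzz's code: CAT_GROUPS, pick_rule, and the rule names are hypothetical, and we assume that a non-preferred draw picks uniformly from every rule in the category (preferred groups included).

```python
import random

# Hypothetical data in the shape of cat_groups:
# category -> category group (grammar file) -> rule names.
CAT_GROUPS = {
    "the_category": {
        "important_grammar": ["imp_a", "imp_b"],
        "other_grammar": ["other_a", "other_b", "other_c"],
    }
}


def pick_rule(cat, preferred, preferred_ratio, rng):
    """Pick one rule name, favoring the preferred groups."""
    groups = CAT_GROUPS[cat]
    everything = [name for names in groups.values() for name in names]
    favored = [
        name
        for group in (preferred or [])
        for name in groups.get(group, [])
    ]
    # With probability preferred_ratio, restrict the draw to preferred groups.
    if favored and rng.random() < preferred_ratio:
        return rng.choice(favored)
    # Otherwise draw from every rule in the category.
    return rng.choice(everything)


rng = random.Random(1234)
picks = [
    pick_rule("the_category", ["important_grammar"], 0.6, rng)
    for _ in range(10000)
]
share = sum(p.startswith("imp_") for p in picks) / len(picks)
```

Under this model the preferred rules end up with more than 60% of the picks, because they can also be chosen during the remaining 40% of draws. Whether gramfuzz itself behaves this way in the non-preferred case is not stated here, so treat that detail as an assumption.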
Rule Preprocessing¶
Another argument to the gramfuzz.GramFuzzer.gen method is the auto_process parameter, which defaults to True.
When true, the GramFuzzer instance will calculate the reference path lengths of each rule, as well as which option in each Or field is the shortest/most direct to generate.
Once this is complete, GramFuzzer will prune all rules that it could not determine a reference length for. A missing reference length indicates that the rule could never terminate in a leaf node/rule, and thus should be removed.
If any new grammar rules are added to the GramFuzzer instance, it will rerun the gramfuzz.GramFuzzer.preprocess_rules method the next time gen is called.
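The preprocessing described above can be sketched as a fixed-point computation over a simplified grammar. This is an illustrative model only, not the actual preprocess_rules code: the grammar representation (each rule maps to a list of alternatives, each alternative a list of referenced rule names) and the function names are assumptions made for the example.

```python
def shortest_ref_lengths(grammar):
    """Return the shortest reference-path length for every rule that can
    reach a leaf. An alternative with no references is a leaf (length 0)."""
    lengths = {}
    changed = True
    while changed:
        changed = False
        for rule, alternatives in grammar.items():
            best = lengths.get(rule)
            for refs in alternatives:
                if any(r not in lengths for r in refs):
                    continue  # this alternative cannot be measured yet
                cost = 0 if not refs else 1 + max(lengths[r] for r in refs)
                if best is None or cost < best:
                    best = cost
            if best is not None and best != lengths.get(rule):
                lengths[rule] = best
                changed = True
    return lengths


def prune(grammar):
    """Drop every rule for which no reference length could be determined."""
    lengths = shortest_ref_lengths(grammar)
    kept = {rule: alts for rule, alts in grammar.items() if rule in lengths}
    return kept, lengths


GRAMMAR = {
    "leaf": [[]],                             # no references at all
    "wrapper": [["leaf"]],                    # one dereference to a leaf
    "list": [["wrapper", "list"], ["wrapper"]],  # recursive, but can stop
    "loop_a": [["loop_b"]],                   # loop_a and loop_b only ever
    "loop_b": [["loop_a"]],                   # reference each other
}
pruned, lengths = prune(GRAMMAR)
```

In this toy grammar, loop_a and loop_b never receive a length because neither can reach a leaf, so both are pruned, while the recursive list rule survives because its second alternative terminates.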
Maximum Recursion¶
The gramfuzz.GramFuzzer.gen method has an argument max_recursion. This argument is used to limit the number of times a gramfuzz.fields.Ref instance may resolve nested references.
For example, the code below:
import gramfuzz
from gramfuzz.fields import *
import sys

# definitions and references that live in the "other" category
class ODef(Def): cat = "other"
class ORef(Ref): cat = "other"

fuzzer = gramfuzz.GramFuzzer()

# each rule either emits its own name or defers to the next rule in the chain
rule1 = Def("rule1", Or("rule1", ORef("rule2")))
ODef("rule2", Or("rule2", ORef("rule3")))
ODef("rule3", Or("rule3", ORef("rule4")))
ODef("rule4", Or("rule4", ORef("rule5")))
ODef("rule5", "rule5")

max_recursion = int(sys.argv[1])
for x in range(10000):
    # num is the first positional parameter of gen, so cat is passed by keyword
    print(fuzzer.gen(cat="default", num=1, max_recursion=max_recursion)[0])
yields the output:
!python test.py 5 | sort | uniq -c
4951 rule1
2473 rule2
1287 rule3
649 rule4
640 rule5
Now if we limit max_recursion to 2, you’ll see that it only generates rule1 and rule2:
!python test.py 2 | sort | uniq -c
4935 rule1
5065 rule2
Once it reaches rule2, the reference level count will have reached a value of 2, at which point instead of randomly choosing to generate either the value rule2 or ORef("rule3"), it will choose the field with the fewest dereferences needed to reach a leaf value. In this case, it will choose to generate rule2, since it is a leaf value and does not require any dereferencing.
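The behavior in the two runs above can be reproduced with a small simulation that does not depend on gramfuzz. It is a simplified model, not the library's code: we assume the top-level rule starts at reference level 1, that each Or picks between its two options with equal probability, and that once the level reaches max_recursion the leaf value is always chosen. The names CHAIN, generate, and histogram are hypothetical.

```python
import random
from collections import Counter

CHAIN = ["rule1", "rule2", "rule3", "rule4", "rule5"]


def generate(index, level, max_recursion, rng):
    """Emit this rule's own name, or defer to the next rule in the chain."""
    is_last = index == len(CHAIN) - 1
    # At the recursion limit, take the option that needs no dereferencing.
    if is_last or level >= max_recursion:
        return CHAIN[index]
    if rng.random() < 0.5:
        return CHAIN[index]
    return generate(index + 1, level + 1, max_recursion, rng)


def histogram(max_recursion, runs=10000, seed=0):
    """Count how often each rule name is produced over `runs` generations."""
    rng = random.Random(seed)
    return Counter(generate(0, 1, max_recursion, rng) for _ in range(runs))


deep = histogram(5)     # all five rule names can appear
shallow = histogram(2)  # only rule1 and rule2 can appear
```

With a limit of 5 the counts roughly halve at each step down the chain, and with a limit of 2 only rule1 and rule2 appear, which matches the shape of the outputs shown above.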
In a more real-world example, grammars often define lists of items in a recursive manner, like below (taken from the Python 2.7 syntax grammar):
fpdef: NAME | '(' fplist ')'
fplist: fpdef (',' fpdef)* [',']
If this were implemented in gramfuzz, it would look like this:
Def("fpdef", Or(
    Ref("name"),
    And("(", Ref("fplist"), ")")
))
Def("fplist",
    Ref("fpdef"), STAR(", ", Ref("fpdef")), Opt(", "),
)
Without the max_recursion limit, this could easily result in a maximum recursion depth runtime error in Python (and often does). The max_recursion limit was specifically added to handle these types of situations.
gramfuzz Reference Documentation¶
This module defines the main GramFuzzer class through which rules are defined and rules are randomly generated.
class gramfuzz.GramFuzzer(debug=False)
GramFuzzer is a singleton class that is used to hold rule definitions and to generate grammar rules from a specific category at random.
add_definition(cat, def_name, def_val, no_prune=False, gram_file='default')
Add a new rule definition named def_name having value def_val to the category cat.
Parameters
cat (str) – The category to add the rule to
def_name (str) – The name of the rule definition
def_val – The value of the rule definition
no_prune (bool) – If the rule should explicitly NOT be pruned even if it has been determined to be unreachable (default=False)
gram_file (str) – The file the rule was defined in (default="default")
add_to_cat_group(cat, cat_group, def_name)
Associate the provided rule definition name def_name with the category group cat_group in the category cat.
Parameters
cat (str) – The category the rule definition was declared in
cat_group (str) – The group within the category the rule belongs to
def_name (str) – The name of the rule definition
cat_groups = {}
Used to store where rules were defined. E.g. if a rule A was defined using the category alphabet_rules in a file called test_rules.py, it would show up in cat_groups as:
{ "alphabet_rules": { "test_rules": ["A"] } }
This lets the user specify probabilities/priorities for rules coming from certain grammar files.
defs = {}
Rule definitions by category. E.g.
{
    "category": {
        "rule1": [<Rule1Def1>, <Rule1Def2>],
        "rule2": [<Rule2Def1>, <Rule2Def2>],
        ...
    }
}
gen(num, cat=None, cat_group=None, preferred=None, preferred_ratio=0.5, max_recursion=None, auto_process=True)
Generate num rules from category cat, optionally specifying preferred category groups preferred that should be preferred at probability preferred_ratio over other randomly chosen rule definitions.
Parameters
num (int) – The number of rules to generate
cat (str) – The name of the category to generate num rules from
cat_group (str) – The category group (i.e. Python file) to generate rules from. This was added specifically to make it easier to generate data based on the name of the file the grammar was defined in, and is intended to work with the TOP_CAT values that may be defined in a loaded grammar file.
preferred (list) – A list of preferred category groups to generate rules from
preferred_ratio (float) – The probability (as a fraction, e.g. 0.6 for 60%) that the preferred groups will be chosen over randomly chosen rule definitions from category cat.
max_recursion (int) – The maximum amount to allow references to recurse
auto_process (bool) – Whether rules should be automatically pruned and shortest reference paths determined. See gramfuzz.GramFuzzer.preprocess_rules for what would automatically be done.
get_ref(cat, refname)
Return one of the rules in the category cat with the name refname. If multiple rule definitions exist for the definition name refname, use gramfuzz.rand to choose a rule at random.
Parameters
cat (str) – The category to look for the rule in.
refname (str) – The name of the rule definition. If the rule definition’s name is "*", then a rule name will be chosen at random from within the category cat.
Returns
gramfuzz.fields.Def
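The lookup rules described for get_ref can be modeled in a few lines. This is a hedged sketch rather than the library's implementation: DEFS, its placeholder values, and the stand-alone get_ref function are hypothetical, and Python's random.Random stands in for gramfuzz.rand.

```python
import random

# Hypothetical data in the shape of defs:
# category -> rule name -> list of definitions for that name.
DEFS = {
    "other": {
        "rule2": ["rule2_def_a", "rule2_def_b"],
        "rule3": ["rule3_def"],
    }
}


def get_ref(cat, refname, rng):
    """Return one definition named `refname` from category `cat`."""
    rules = DEFS[cat]
    if refname == "*":
        # Wildcard: first choose a rule name at random within the category.
        refname = rng.choice(sorted(rules))
    # If several definitions share the name, choose one of them at random.
    return rng.choice(rules[refname])


rng = random.Random(7)
single = get_ref("other", "rule3", rng)
wildcard = get_ref("other", "*", rng)
```

A name with a single definition always resolves to that definition, while a name with several definitions, or the "*" wildcard, resolves to a random member of the matching set.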
load_grammar(path)
Load a grammar file (Python file containing grammar definitions) by file path. When loaded, the global variable GRAMFUZZER will be set within the module. This is not always needed, but can be useful.
Parameters
path (str) – The path to the grammar file
no_prunes = {}
Rules that have specifically asked not to be pruned, even if the rule can’t be reached.
post_revert(cat, res, total_num, num, info)
Commit any staged rule definition changes (rule generation went smoothly).
preprocess_rules()
Calculate shortest reference-paths of each rule (and Or field), and prune all unreachable rules.
revert(info=None)
Revert after a single def errored during generation (throw away all staged rule definition changes).
set_cat_group_top_level_cat(cat_group, top_level_cat)
Set the default category when generating data from the grammars defined in cat group. Note a cat group is usually just the basename of the grammar file, minus the .py.
Parameters
cat_group (str) – The category group to set the default top-level cat for
top_level_cat (str) – The top-level (default) category of the cat group
set_max_recursion(level)
Set the maximum reference-recursion depth (not the Python system maximum stack recursion level). This controls how many levels deep of nested references are allowed before gramfuzz attempts to generate the shortest (reference-wise) rules possible.
Parameters
level (int) – The new maximum reference level