gramfuzz

Using gramfuzz consists of four steps:

  1. defining the grammar(s)

  2. creating a gramfuzz.GramFuzzer instance

  3. loading the grammar(s)

  4. generating num random rules from a specific category of the loaded grammars

Example Revisited

In the example on the main page for this documentation (TLDR Example), we defined a grammar, and made two new classes: NRef and NDef.

We did this so that we could force any definitions created with NDef to use the "name_def" category. The same goes for the NRef class we made: it forces gramfuzz to look up referenced definitions in the "name_def" category instead of the default category.

The importance of this functionality becomes clear when we look at the line that actually generates the names:

names = fuzzer.gen(cat="name", num=10)

Notice how we explicitly say that we want the fuzzer to generate 10 random rules from the "name" category (NOT the "name_def" category). This is an intentional way of differentiating between top-level rules that should be chosen randomly to generate, and rules that only exist to help create the top-level rules.

In our simple names example, the sole rule definition in the "name" category:

Def("name",
    Join(
        Opt(NRef("name_prefix")),
        NRef("first_name"),
        Opt(NRef("middle_initial")),
        Opt(NRef("last_name")),
        Opt(NRef("name_suffix")),
    sep=" "),
    cat="name"
)

uses the other definitions in the "name_def" category to complete itself.
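The split between a top-level category and a helper category can be illustrated without gramfuzz itself. The following is a minimal, library-independent sketch; the RULES table and the generate helper are hypothetical names used only for illustration and are not part of the gramfuzz API.

```python
import random

# Hypothetical stand-in for rule storage: category -> rule name -> options.
RULES = {
    "name": {
        # Top-level rule: built entirely from references to helper rules.
        "name": [["first_name", "last_name"]],
    },
    "name_def": {
        # Helper rules: only ever reached through a reference.
        "first_name": [["Alice"], ["Bob"]],
        "last_name": [["Smith"], ["Jones"]],
    },
}


def generate(cat, rng):
    """Pick a random rule from `cat` and resolve its references in "name_def"."""
    rule_name = rng.choice(sorted(RULES[cat]))
    option = rng.choice(RULES[cat][rule_name])
    parts = []
    for item in option:
        helper = RULES["name_def"].get(item)
        # A known helper name is treated as a reference; anything else is a literal.
        parts.append(rng.choice(helper)[0] if helper else item)
    return " ".join(parts)


rng = random.Random(0)
names = [generate("name", rng) for _ in range(5)]
```

Because generation only ever starts from the "name" category, a bare first name or last name is never produced on its own, even though those rules exist.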

Preferred Category Groups

gramfuzz has a concept of a “category group”. In the default usage of this concept, a grammar rule’s category group is the name of the Python file the rule was defined in.

If, say, we had loaded ten separate grammar files into a GramFuzzer instance, and one of the grammar files was named important_grammar.py, we could tell the fuzzer to focus on all the rules in that grammar 60% of the time:

# assuming we've already loaded all of the grammars
outputs = fuzzer.gen(
    cat             = "the_category",
    num             = 10,
    preferred       = ["important_grammar"],
    preferred_ratio = 0.6
)

This becomes especially powerful when using the gramfuzz module as a base for more specific/targeted grammar fuzzing.
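One plausible reading of preferred and preferred_ratio is sketched below. This is not gramfuzz's implementation: pick_rule, the group names, and the rule names are hypothetical, and the sketch assumes that the non-preferred branch picks uniformly from every rule in the category, preferred ones included.

```python
import random


def pick_rule(rules_by_group, preferred, preferred_ratio, rng):
    """Choose a rule name, favouring the preferred groups with the given probability."""
    preferred_rules = [
        name
        for group in preferred
        for name in rules_by_group.get(group, [])
    ]
    all_rules = [name for names in rules_by_group.values() for name in names]
    if preferred_rules and rng.random() < preferred_ratio:
        return rng.choice(preferred_rules)
    # Otherwise fall back to a uniform choice over every rule in the category.
    return rng.choice(all_rules)


# Hypothetical category contents: category group (grammar file) -> rule names.
groups = {
    "important_grammar": ["imp_a", "imp_b"],
    "other_grammar": ["other_a", "other_b", "other_c", "other_d", "other_e", "other_f"],
}

rng = random.Random(1234)
picks = [pick_rule(groups, ["important_grammar"], 0.6, rng) for _ in range(10000)]
share = sum(p.startswith("imp_") for p in picks) / len(picks)
```

Under that assumption, rules from important_grammar should account for roughly 0.6 + 0.4 × 2/8 = 0.7 of the picks, since the fallback branch can still land on a preferred rule.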

Rule Preprocessing

Another argument to the gramfuzz.GramFuzzer.gen method is the auto_process parameter, which defaults to True.

When true, the GramFuzzer instance will calculate the reference-path length of each rule, as well as which option in each Or field is the shortest/most direct to generate.

Once this is complete, GramFuzzer will prune all rules that it could not determine a reference length for. The lack of a reference length indicates that the rule could never terminate in a leaf node/rule, and thus should be removed.

If any new grammar rules are added to the GramFuzzer instance, it will rerun the gramfuzz.GramFuzzer.preprocess_rules method the next time gen is called.
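The idea behind this preprocessing can be sketched with a small fixed-point computation. The model below is an assumption made for illustration, not gramfuzz's actual data structures or algorithm: a grammar is a dict mapping each rule name to a list of options, each option is a list of referenced rule names (an empty list is a leaf), and the length of an option is defined here as one more than the largest length among the rules it references.

```python
def reference_lengths(grammar):
    """Return {rule: fewest dereferences needed before only leaves remain}."""
    lengths = {}
    changed = True
    while changed:
        changed = False
        for rule, options in grammar.items():
            best = lengths.get(rule)
            for refs in options:
                if any(ref not in lengths for ref in refs):
                    continue  # this option's length is not known (yet)
                cost = 1 + max(lengths[ref] for ref in refs) if refs else 0
                if best is None or cost < best:
                    best = cost
            if best is not None and best != lengths.get(rule):
                lengths[rule] = best
                changed = True
    return lengths


def prune(grammar, lengths):
    """Drop rules without a finite length, and options that refer to them."""
    return {
        rule: [refs for refs in options if all(ref in lengths for ref in refs)]
        for rule, options in grammar.items()
        if rule in lengths
    }


grammar = {
    "leaf": [[]],                           # a literal: no references
    "list": [["leaf"], ["leaf", "list"]],   # recursive, but able to terminate
    "loop": [["loop"]],                     # can never reach a leaf
    "uses_loop": [["loop"], ["list"]],      # one dead option, one usable option
}

lengths = reference_lengths(grammar)
pruned = prune(grammar, lengths)
```

In this toy grammar, loop never receives a length and is pruned, while uses_loop survives with only its terminating option.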

Maximum Recursion

The gramfuzz.GramFuzzer.gen method has an argument max_recursion. This argument is used to limit the number of times a gramfuzz.fields.Ref instance may resolve nested references.

For example the code below:

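import gramfuzz
from gramfuzz.fields import *
import sys

# Definitions and references that live in the "other" category
class ODef(Def): cat = "other"
class ORef(Ref): cat = "other"

fuzzer = gramfuzz.GramFuzzer()

# rule1 is the only rule in the "default" category. Each rule is either
# its own name (a leaf value) or a reference to the next rule in the chain.
rule1 = Def("rule1", Or("rule1", ORef("rule2")))
ODef("rule2", Or("rule2", ORef("rule3")))
ODef("rule3", Or("rule3", ORef("rule4")))
ODef("rule4", Or("rule4", ORef("rule5")))
ODef("rule5", "rule5")

max_recursion = int(sys.argv[1])
for x in range(10000):
    # num is gen's first positional parameter, so the category is passed by keyword
    print(fuzzer.gen(cat="default", num=1, max_recursion=max_recursion)[0])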

yields the output:

!python test.py 5 | sort | uniq -c
   4951 rule1
   2473 rule2
   1287 rule3
    649 rule4
    640 rule5

If we now limit max_recursion to 2, only rule1 and rule2 are generated:

!python test.py 2 | sort | uniq -c
   4935 rule1
   5065 rule2

Once generation reaches rule2, the reference level count has reached a value of 2. At that point, instead of randomly choosing between the value rule2 and ORef("rule3"), the fuzzer chooses the option with the fewest dereferences back to a leaf value. In this case, it generates rule2, since that is a leaf value and does not require any dereferencing.
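The effect of the limit can be reproduced with a small stand-alone simulation of the rule1 through rule5 chain. This is a sketch of the behaviour described here, not gramfuzz's own code; CHAIN, build, and histogram are hypothetical names, and the sketch assumes that rule1 sits at reference level 1.

```python
import random
from collections import Counter

# Every rule is either its own name (a literal leaf) or a reference to the
# next rule. Plain strings are literals; ("ref", name) stands in for ORef(name).
CHAIN = {
    "rule1": ["rule1", ("ref", "rule2")],
    "rule2": ["rule2", ("ref", "rule3")],
    "rule3": ["rule3", ("ref", "rule4")],
    "rule4": ["rule4", ("ref", "rule5")],
    "rule5": ["rule5"],
}


def build(rule, max_recursion, rng, level=1):
    """Generate one value starting at `rule`, which sits at reference level `level`."""
    options = CHAIN[rule]
    if level >= max_recursion:
        # Limit reached: keep only options that need no further dereferencing.
        # In this chain that is always the literal.
        options = [opt for opt in options if isinstance(opt, str)]
    choice = rng.choice(options)
    if isinstance(choice, str):
        return choice
    return build(choice[1], max_recursion, rng, level + 1)


def histogram(max_recursion, runs=10000, seed=0):
    """Count how often each value is produced over `runs` generations."""
    rng = random.Random(seed)
    return Counter(build("rule1", max_recursion, rng) for _ in range(runs))


deep = histogram(5)
shallow = histogram(2)
```

With a limit of 2 the simulation can only ever return rule1 or rule2, while a limit of 5 lets each deeper rule appear about half as often as the one before it, which matches the shape of the counts shown above.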

In a more real-world example, grammars often define lists of items in a recursive manner, like below (taken from the Python 2.7 syntax grammar):

fpdef: NAME | '(' fplist ')'
fplist: fpdef (',' fpdef)* [',']

If this were implemented in gramfuzz, it would look like this:

Def("fpdef", Or(
    Ref("name"),
    And("(", Ref("fplist"), ")")
))
Def("fplist",
    Ref("fpdef"), STAR(", ", Ref("fpdef")), Opt(", "),
)

Without the max_recursion limits, this could easily result in a maximum recursion depth runtime error in Python (and often does). The max_recursion limit was specifically added to handle these types of situations.

gramfuzz Reference Documentation

This module defines the main GramFuzzer class through which rules are defined and rules are randomly generated.

class gramfuzz.GramFuzzer(debug=False)

GramFuzzer is a singleton class that is used to hold rule definitions and to generate grammar rules from a specific category at random.

__init__(debug=False)

Create a new GramFuzzer instance

add_definition(cat, def_name, def_val, no_prune=False, gram_file='default')

Add a new rule definition named def_name having value def_val to the category cat.

Parameters
  • cat (str) – The category to add the rule to

  • def_name (str) – The name of the rule definition

  • def_val – The value of the rule definition

  • no_prune (bool) – If the rule should explicitly NOT be pruned even if it has been determined to be unreachable (default=False)

  • gram_file (str) – The file the rule was defined in (default="default").

add_to_cat_group(cat, cat_group, def_name)

Associate the provided rule definition name def_name with the category group cat_group in the category cat.

Parameters
  • cat (str) – The category the rule definition was declared in

  • cat_group (str) – The group within the category the rule belongs to

  • def_name (str) – The name of the rule definition

cat_groups = {}

Stores where each rule was defined. E.g. if a rule A was defined using the category alphabet_rules in a file called test_rules.py, it would show up in cat_groups as:

{
    "alphabet_rules": {
        "test_rules": ["A"]
    }
}

This lets the user specify probabilities/priorities for rules coming from certain grammar files.
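As a rough illustration of how such a nested mapping could be filled in, consider the sketch below. It is not the body of add_to_cat_group: group_for, record, and the grammars/ paths are hypothetical, and deriving the group from the file's basename follows the default usage described in this documentation.

```python
import os


def group_for(path):
    """Category group for a grammar file: its basename without the .py extension."""
    return os.path.splitext(os.path.basename(path))[0]


def record(cat_groups, cat, cat_group, def_name):
    """Note that rule `def_name` in category `cat` came from `cat_group`."""
    cat_groups.setdefault(cat, {}).setdefault(cat_group, []).append(def_name)


cat_groups = {}
record(cat_groups, "alphabet_rules", group_for("grammars/test_rules.py"), "A")
record(cat_groups, "alphabet_rules", group_for("grammars/test_rules.py"), "B")
record(cat_groups, "alphabet_rules", group_for("grammars/more_rules.py"), "C")
```

Keeping the grouping per category means that two grammar files can contribute rules to the same category while still being selectable separately.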

defs = {}

Rule definitions by category. E.g.

{
    "category": {
        "rule1": [<Rule1Def1>, <Rule1Def2>],
        "rule2": [<Rule2Def1>, <Rule2Def2>],
        …
    }
}

gen(num, cat=None, cat_group=None, preferred=None, preferred_ratio=0.5, max_recursion=None, auto_process=True)

Generate num rules from category cat, optionally specifying preferred category groups preferred that should be preferred at probability preferred_ratio over other randomly-chosen rule definitions.

Parameters
  • num (int) – The number of rules to generate

  • cat (str) – The name of the category to generate num rules from

  • cat_group (str) – The category group (i.e. Python file) to generate rules from. This was added specifically to make it easier to generate data based on the name of the file the grammar was defined in, and is intended to work with the TOP_CAT values that may be defined in a loaded grammar file.

  • preferred (list) – A list of preferred category groups to generate rules from

  • preferred_ratio (float) – The probability (for example, 0.6 for 60%) that the preferred groups will be chosen over randomly chosen rule definitions from category cat.

  • max_recursion (int) – The maximum depth to which references are allowed to recurse

  • auto_process (bool) – Whether rules should be automatically pruned and shortest reference paths determined. See gramfuzz.GramFuzzer.preprocess_rules for what would automatically be done.

get_ref(cat, refname)

Return one of the rules in the category cat with the name refname. If multiple rule definitions exist for the definition name refname, use gramfuzz.rand to choose a rule at random.

Parameters
  • cat (str) – The category to look for the rule in.

  • refname (str) – The name of the rule definition. If the rule definition’s name is "*", then a rule name will be chosen at random from within the category cat.

Returns

gramfuzz.fields.Def
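The lookup described for get_ref can be sketched as follows. This is an illustration only: pick_definition is a hypothetical helper, the placeholder strings stand in for real Def objects, and the standard library's random module is used here in place of gramfuzz.rand.

```python
import random


def pick_definition(defs, cat, refname, rng):
    """Return one definition named `refname` from category `cat`."""
    rules = defs[cat]
    if refname == "*":
        # Wildcard: first choose a rule name at random within the category.
        refname = rng.choice(sorted(rules))
    # Several definitions may share a name; choose one of them at random.
    return rng.choice(rules[refname])


# Hypothetical contents, shaped like the defs attribute:
# category -> rule name -> list of definitions.
defs = {
    "name_def": {
        "first_name": ["<FirstNameDef1>", "<FirstNameDef2>"],
        "last_name": ["<LastNameDef1>"],
    }
}

rng = random.Random(7)
exact = pick_definition(defs, "name_def", "last_name", rng)
wild = pick_definition(defs, "name_def", "*", rng)
```

The wildcard case makes two random choices, first a rule name and then one of that rule's definitions, whereas a named lookup only makes the second.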

classmethod instance()

Return the singleton instance of the GramFuzzer

load_grammar(path)

Load a grammar file (a Python file containing grammar definitions) by file path. When loaded, the global variable GRAMFUZZER will be set within the module. This is not always needed, but can be useful.

Parameters
  • path (str) – The path to the grammar file

no_prunes = {}

Rules that have specifically asked not to be pruned, even if the rule can’t be reached.

post_revert(cat, res, total_num, num, info)

Commit any staged rule definition changes (rule generation went smoothly).

pre_revert(info=None)

Signal to begin saving any changes that might need to be reverted

preprocess_rules()

Calculate shortest reference-paths of each rule (and Or field), and prune all unreachable rules.

revert(info=None)

Revert after a single def errored during generation (throw away all staged rule definition changes).
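Taken together, pre_revert, post_revert, and revert describe a stage, commit, or roll back pattern. The sketch below shows that general pattern under stated assumptions; StagedDefs and its method names are hypothetical, and nothing here is taken from gramfuzz's internals.

```python
class StagedDefs:
    """Toy rule store whose additions can be rolled back until committed."""

    def __init__(self):
        self.defs = {}
        self._staged = None  # names added since begin(), or None when not staging

    def begin(self):
        """Start tracking additions (compare pre_revert)."""
        self._staged = []

    def add(self, name, value):
        self.defs.setdefault(name, []).append(value)
        if self._staged is not None:
            self._staged.append(name)

    def commit(self):
        """Keep everything added since begin() (compare post_revert)."""
        self._staged = None

    def rollback(self):
        """Discard everything added since begin() (compare revert)."""
        for name in reversed(self._staged or []):
            self.defs[name].pop()
            if not self.defs[name]:
                del self.defs[name]
        self._staged = None


store = StagedDefs()
store.add("base", "kept from before staging")

store.begin()
store.add("temp", "added while generating")
store.rollback()          # generation failed: "temp" disappears
after_rollback = dict(store.defs)

store.begin()
store.add("extra", "added while generating")
store.commit()            # generation succeeded: "extra" stays
after_commit = dict(store.defs)
```

The point of staging is that a rule generated halfway through a failed run cannot leave stray definitions behind for later runs.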

set_cat_group_top_level_cat(cat_group, top_level_cat)

Set the default category to use when generating data from the grammars defined in the category group cat_group. Note that a category group is usually just the basename of the grammar file, minus the .py extension.

Parameters
  • cat_group (str) – The category group to set the default top-level cat for

  • top_level_cat (str) – The top-level (default) category of the cat group

set_max_recursion(level)

Set the maximum reference-recursion depth (not the Python system maximum stack recursion level). This controls how many levels deep of nested references are allowed before gramfuzz attempts to generate the shortest (reference-wise) rules possible.

Parameters
  • level (int) – The new maximum reference level