gramfuzz ======== Using gramfuzz consists of three steps: #. defining the grammar(s) #. create a :any:`gramfuzz.GramFuzzer` instance #. loading the grammar #. generating ``num`` random rules from the loaded grammars from a specifc category Example Revisited ^^^^^^^^^^^^^^^^^ In the example on the main page for this documentation (:ref:`tldr_example`), we defined a grammar, and made two new classes: ``NRef`` and ``NDef``. We did this so that we could force any definitions created with ``NDef`` to use the ``"name_def"`` category. The same goes for the ``NRef`` class we made - it forces gramfuzz to lookup referenced definitions in the ``"name_def"`` category instead of the default category. The importance of this functionality becomes clear when we look at the line that actually generates the names: .. code-block:: python names = fuzzer.gen(cat="name", num=10) Notice how we explicitly say that we want the fuzzer to generate ``10`` random rules from the ``"name"`` category (*NOT* the ``"name_def"`` category). This is an intentional way of differentiating between top-level rules that should be chosen randomly to generate, and rules that only exist to help create the top-level rules. In our simple names example, the sole rule definition in the ``"name"`` category: .. code-block:: python Def("name", Join( Opt(NRef("name_prefix")), NRef("first_name"), Opt(NRef("middle_initial")), Opt(NRef("last_name")), Opt(NRef("name_suffix")), sep=" "), cat="name" ) uses the other definitions in the ``"name_def"`` category to complete itself. Preferred Category Groups ^^^^^^^^^^^^^^^^^^^^^^^^ gramfuzz has a concept of a "category group". In the default usage of this concept, a grammar-rule's category group is the name of the python file the rule was defined in. If, say, we had loaded ten separate grammar files into a ``GramFuzzer`` instance, and one of the grammar files was named ``important_grammar.py``, we could tell the fuzzer to focus on all the rules in that grammar 60% of the time: .. code-block:: python # assuming we've already loaded all of the grammars outputs = fuzzer.gen( cat = "the_category", num = 10, preferred = ["important_grammar"], preferred_ratio = 0.6 ) This becomes especially powerful when using the gramfuzz module as a base for more specific/targeted grammar fuzzing. Rule Preprocessing ^^^^^^^^^^^^^^^^^^ Another argument to the :any:`gramfuzz.GramFuzzer.gen` function is the ``auto_process`` parameter, which defaults to ``True``. When true, the ``GramFuzzer`` instance will calculate the reference paths lengths of each rule, as well as which option in each ``Or`` field is the shortest/most direct to generate. Once this is complete, ``GramFuzzer`` will prune all rules that it could not determine a reference length for. This would indiciate that the rule could never terminate in a leaf node/rule, and thus should be removed. If any new grammar rules are added to the ``GramFuzzer`` instance, it will rerun the :any:`gramfuzz.GramFuzzer.preprocess_rules` method the next time ``gen`` is called. Maximum Recursion ^^^^^^^^^^^^^^^^^ The :any:`gramfuzz.GramFuzzer.gen` method has an argument ``max_recursion``. This argument is used to limit the number of times a :any:`gramfuzz.fields.Ref` instance may resolve nested references. For example the code below: .. code-block:: python import gramfuzz from gramfuzz.fields import * import sys class ODef(Def): cat = "other" class ORef(Ref): cat = "other" fuzzer = gramfuzz.GramFuzzer() rule1 = Def("rule1", Or("rule1", ORef("rule2"))) ODef("rule2", Or("rule2", ORef("rule3"))) ODef("rule3", Or("rule3", ORef("rule4"))) ODef("rule4", Or("rule4", ORef("rule5"))) ODef("rule5", "rule5") max_recursion = int(sys.argv[1]) for x in range(10000): print(fuzzer.gen("default", num=1, max_recursion=max_recursion)[0]) yields the output: .. code-block:: text !python test.py 5 | sort | uniq -c 4951 rule1 2473 rule2 1287 rule3 649 rule4 640 rule5 Now if we limit ``max_recursion`` to ``2``, you'll see that it only generates ``rule1`` and ``rule2``: .. code-block:: text !python test.py 2 | sort | uniq -c 4935 rule1 5065 rule2 Once it reaches ``rule2``, the reference level count will have reached a value of ``2``, at which point instead of randomly choosing to generate either the value ``rule2`` or ``ORef("rule3")``, it will choose the field with the shortest number of dereferences back to a leaf value. In this case, it will choose to generate ``rule3`` since it *is* a leaf value and does not require any dereferencing. In a more real-world example, grammars often define lists of items in a recursive manner, like below (taken from the Python 2.7 syntax grammar): .. code-block:: text fpdef: NAME | '(' fplist ')' fplist: fpdef (',' fpdef)* [','] If this were implemented in gramfuzz, it would look like this: .. code-block:: python Def("fpdef", Or( Ref("name"), And("(", Ref("fplist"), ")") )) Def("fplist", Ref("fpdef"), STAR(", ", Ref("fpdef")), Opt(", "), ) Without the max_recursion limits, this could easily result in a maximum recursion depth runtime error in Python (and often does). The ``max_recursion`` limit was specifically added to handle these types of situations. gramfuzz Reference Documentation ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. automodule:: gramfuzz :members: