parsec.py recursive definitions
parsec.py recursive definitions
I'm loving the simplicity of this library. Sadly I haven't figure out how to make recursive definitions:
Consider this minimal contrived example:
import parsec as psc
digit = psc.regex("[0-9]")
number = lambda: digit ^ (digit + number())
_ = number().parse("42")
As expected, the evaluation of number()
is infinite recursive:
number()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <lambda>
File "<stdin>", line 1, in <lambda>
File "<stdin>", line 1, in <lambda>
[Previous line repeated 995 more times]
RecursionError: maximum recursion depth exceeded
I know in this specific example it can be solved by changing line 3 to
number = lambda: psc.many1(digit)
but I'm not sure it's always possible for a general case.
Is it possible to make the parsing of number recursive as in
<number> ::= <digit><number> | <digit>
2 Answers
2
(UPDATE) Disclaimer: As we found out in the comments, this only works with versions of parsec.py above 3.3, which (as of August 2018) is the latest release available on PyPI, so for now you have to manually install the development version from GitHub in order for this solution work.
In cases like these, I find that it is often helpful to "manually expand" the primitives provided by parsec
and write a "low-level" parser of our own (i.e. one that takes text
and index
arguments, not one built up from parsec primitives), just to see what is going on. By taking the definition of try_choice
(^
) from the parsec source code and manually plugging in the constituents of your expression1, we can write a parser which I think does what you want:
parsec
text
index
try_choice
^
import parsec as psc
digit = psc.regex("[0-9]")
def number():
@psc.Parser
def number_parser(text, index):
res = (digit + number())(text, index)
return res if res.status else digit(text, index)
return number_parser
_ = number().parse("42423")
print(_)
Output:
('4', ('2', ('4', ('2', '3'))))
The reason this works is of course because the "parser-returning function" number()
gets called only conditionally within the actual parser, not (unconditionally) within itself.
number()
An easier way to write this is to not use a "parser-returning function" at all but just write the parser itself directly:
@psc.Parser
def number(text, index):
return ((digit + number) ^ digit)(text, index)
The usage changes from number().parse("42423")
to number.parse("42423")
but it does the same thing.
number().parse("42423")
number.parse("42423")
Lastly, is there some functionality in parsec.py that lets you do this without the explicit text
and index
arguments, such that the parser is built up entirely from other parsec.py primitives? It turns out that parsec.generate
fits the bill rather nicely! From its in-source documentation:
text
index
parsec.generate
The most powerful way to construct a parser is to use the generate
decorator. The generate
creates a parser from a generator that
should yield parsers. These parsers are applied successively and their
results are sent back to the generator using .send()
protocol. The
generator should return the result or another parser, which is
equivalent to applying it and returning its result.
generate
.send()
You can find example usages here, from which it becomes clear that we can write:
@psc.generate
def number():
r = yield (digit + number) ^ digit
return r
This works because besides all the fancy generator magic it does, the generate
decorator returns a function which is itself decorated with @parsec.Parser
(cf. source linked above), so the resulting, fully decorated number
function will be of the same kind as the one in the 2nd solution. Hence, we can use it in the same manner as digit
etc. without having to call it or anything, which is what we do in the yield
expression.
generate
@parsec.Parser
number
digit
yield
Usage is again just number.parse("42423")
and it returns the same thing as the other two solutions.
number.parse("42423")
Maybe there is a better solution out there but this is all I could come up with.
1 I had to reverse the order from digit ^ (digit + number())
to (digit + number()) ^ digit
, because the first one returns just the first digit and then considers itself done. I hope this is ok?
digit ^ (digit + number())
(digit + number()) ^ digit
tokenize = lambda p: p + psc.regex(r"s+|Z")
res = tokenize(digit)(text, index)
return (tokenize(digit) ^ digit + number)(text, index)
r = yield tokenize(digit) ^ (digit + number)
@jabozzo My bad, the first example didn't have the change described in the footnote applied (reversed order) - I edited it and it should work fine now. What is the problem with the other two examples? They work fine for me and don't lead to infinite recursion. Or do you mean that it has to be
digit ^ (digit + number)
and not the other way round?– pmos
Aug 21 at 2:42
digit ^ (digit + number)
The updated first approach also has infinite recursion problems. I've just copied/pasted your functions into a file one after the other and doing a
try: print(number.parse("24243")) except RecursionError as _: print("Recursion")
in between each redefinition and I get 3 "Recursion". I'ts like in the (digit + number) part, digit is not consuming a character. Here is exactly what I'm running.– jabozzo
Aug 21 at 3:05
try: print(number.parse("24243")) except RecursionError as _: print("Recursion")
@jabozzo Yes that's probably it, I just tried with your version of parsec and it led to recursion even with Python 3.5 - but if I install the development version directly from Github (
pip install git+https://github.com/sighingnow/parsec.py.git
) it works fine, so I guess they changed something important since the latest release.– pmos
Aug 21 at 3:19
pip install git+https://github.com/sighingnow/parsec.py.git
@jabozzo You're very welcome, this has been interesting :) From my testing it looks like the bug fix that makes it work in the Github version is this, so it's actually a problem with
joint
(+
).– pmos
Aug 21 at 3:28
joint
+
Another quick way to solve this problem if there are already a lot of parser-returning functions or just want to be consistent in style is to make the function lazy evaluate:
import parsec as psc
def lazy(fn):
@psc.Parser
def _lazy(text, index):
return fn()(text, index)
return _lazy
digit = lambda: psc.regex("[0-9]")
number = lambda: (digit() + lazy(number)) ^ digit()
print(number().parse("423242"))
Prints:
('4', ('2', ('3', ('2', ('4', '2')))))
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Your fixed version is actually the example I wanted to show. Your first approach is only detecting the fist digit, the other two are having the same recursion problems. For fixing them I first defined:
tokenize = lambda p: p + psc.regex(r"s+|Z")
. I changed res in the first approach tores = tokenize(digit)(text, index)
. In the second approachreturn (tokenize(digit) ^ digit + number)(text, index)
and in the third approachr = yield tokenize(digit) ^ (digit + number)
. I'm still not happy with that approach since I'm forced to tokenize.– jabozzo
Aug 21 at 2:03