Pyparsing failed to parse multiple rules
I am trying to create a Boolean query parser with some special rules, such as adjacency (ADJ) and proximity (near) operators. The rules I have created so far are:
from pyparsing import (CaselessLiteral, Combine, OneOrMore, Optional, StringEnd,
                       Word, infixNotation, nums, opAssoc, printables,
                       quotedString, removeQuotes, replaceWith)

## DEFINITIONS OF SYMBOLS ###
NEAR = CaselessLiteral('near').suppress()
NUMBER = Word(nums)
# "near3" is reduced to just its number, e.g. '3'
NONEDIRECTIONAL = Combine(NEAR + NUMBER)
# ADJ is treated as a distance of zero
ADJ = CaselessLiteral("ADJ").setParseAction(replaceWith('0'))
OAND = CaselessLiteral("and")
OOR = CaselessLiteral("or")
ONOT = CaselessLiteral("not")
## ----------------------- ##
## DEFINITIONS OF TERMS ###
# Do not break quoted strings.
QUOTED = quotedString.setParseAction(removeQuotes)
# space-separated words are easiest to define using just OneOrMore
# must use a negative lookahead for and/not/or operators, and this must come
# at the beginning of the expression
WORDWITHSPACE = OneOrMore(~(OAND | ONOT | OOR | NONEDIRECTIONAL | ADJ) +
                          Word(printables, excludeChars="()"))
# use a parse action to recombine words into a single string
WORDWITHSPACE.addParseAction(lambda t: ' '.join(t))
TERM = (QUOTED | WORDWITHSPACE)
## ----------------------- ##
## DEFINITIONS OF EXPRESSION ###
EXPRESSION = infixNotation(TERM,
    [
        (ADJ, 2, opAssoc.LEFT),
        (NONEDIRECTIONAL, 2, opAssoc.LEFT),
        (ONOT, 1, opAssoc.RIGHT),
        (Optional(OAND, default='and'), 2, opAssoc.LEFT),
        (OOR, 2, opAssoc.LEFT)
    ])
# As there can be more than one occurrence of an expression in a row, we are
# using a `OneOrMore` expression
BOOLQUERY = OneOrMore(EXPRESSION) + StringEnd()
## ----------------------- ##
When I run
((a or b) and (b and c)) or (a and d)
it works fine, whereas when I try to parse
((((smart ADJ contract*) and agreement) or (enforced near3 without near3 interaction) or (automated ADJ escrow)) or ((protocol* or Consensus ADJ algorithm) near5 (agreement and transaction)))
the code gets stuck and never finishes processing.
Can anyone help me find where I am going wrong?
Updated code:
EXPRESSION = infixNotation(TERM,
    [
        (ONOT, 1, opAssoc.RIGHT),
        (Optional(OAND, default='and'), 2, opAssoc.LEFT),
        ((OOR | NONEDIRECTIONAL | ADJ), 2, opAssoc.LEFT)
    ])
I kept the optional "and" because of cases like
x not y not z
How can I create a grammar in which one of "and/or/not" has to be present in the query?
– Vikram Sangat
Aug 20 at 3:46
Not sure I understood this question. To get rid of the implicit 'and', remove the Optional wrapper in (Optional(OAND, default='and'), 2, opAssoc.LEFT), changing it to (OAND, 2, opAssoc.LEFT),
– PaulMcG
Aug 20 at 3:49
OK, thank you. FYI, the Wikispaces link is closed; can you provide an alternate link for the documentation?
– Vikram Sangat
Aug 20 at 3:51
The pyparsing docs are online at pythonhosted.org/pyparsing/pyparsing-module.html
– PaulMcG
Aug 20 at 10:04
1 Answer
Your program is taking a long time because your infixNotation is 5 layers deep AND has an optional AND operator.
I was able to run this as-is by just enabling packrat parsing. Do this by adding to the top of your script (right after importing pyparsing):
ParserElement.enablePackrat()
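To give a rough feel for why memoizing helps here, the sketch below is a toy pure-Python recognizer, not pyparsing's actual implementation. It assumes five precedence levels that are all binary (the names `OPS` and `recognize` are made up for this illustration), backtracks the way a naive layered parser would, and counts how many times a (level, position) pair has to be evaluated with and without a packrat-style cache.

```python
# Toy recognizer, NOT pyparsing's implementation: every operator is treated as
# binary and each precedence level backtracks by re-parsing its operand.
OPS = ("or", "and", "not", "near", "adj")  # hypothetical levels, loosest first


def recognize(text, memoize):
    """Return (matched, evaluations) for a query made of words and parens."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    n = len(tokens)
    cache = {}
    evaluations = 0

    def level(i, pos):
        # Packrat step: reuse the result for (level, position) if already known.
        nonlocal evaluations
        if memoize and (i, pos) in cache:
            return cache[(i, pos)]
        evaluations += 1
        end = atom(pos) if i == len(OPS) else infix(i, pos)
        if memoize:
            cache[(i, pos)] = end
        return end

    def infix(i, pos):
        # Alternative 1: operand (op operand)+
        first = level(i + 1, pos)
        end = first
        while end is not None and end < n and tokens[end] == OPS[i]:
            nxt = level(i + 1, end + 1)
            if nxt is None:
                break
            end = nxt
        if end is not None and end != first:
            return end
        # Alternative 2: backtrack and parse the bare operand again.
        return level(i + 1, pos)

    def atom(pos):
        if pos >= n or tokens[pos] == ")" or tokens[pos] in OPS:
            return None
        if tokens[pos] == "(":
            end = level(0, pos + 1)
            if end is not None and end < n and tokens[end] == ")":
                return end + 1
            return None
        return pos + 1  # a plain word

    return level(0, 0) == n, evaluations


query = "((a or b) and (c adj d))"
for memoize in (False, True):
    matched, evaluations = recognize(query, memoize)
    print(f"memoize={memoize}: matched={matched}, evaluations={evaluations}")
```

In this toy, every extra level of parentheses multiplies the un-memoized work, while the memoized run evaluates each (level, position) pair at most once. That is the same trade that packrat parsing makes: extra memory for the cache in exchange for not repeating parses.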
To run your tests, I used runTests. It was not clear to me why BOOLQUERY was necessary, since you are just parsing expressions:
tests = """
((a or b) and (b and c)) or (a and d)
((((smart ADJ contract*) and agreement) or (enforced near3 without near3 interaction) or (automated ADJ escrow)) or ((protocol* or Consensus ADJ algorithm) near5 (agreement and transaction)))
"""
EXPRESSION.runTests(tests)
Gives:
((a or b) and (b and c)) or (a and d)
[[[['a', 'or', 'b'], 'and', ['b', 'and', 'c']], 'or', ['a', 'and', 'd']]]
[0]:
[[['a', 'or', 'b'], 'and', ['b', 'and', 'c']], 'or', ['a', 'and', 'd']]
[0]:
[['a', 'or', 'b'], 'and', ['b', 'and', 'c']]
[0]:
['a', 'or', 'b']
[1]:
and
[2]:
['b', 'and', 'c']
[1]:
or
[2]:
['a', 'and', 'd']
((((smart ADJ contract*) and agreement) or (enforced near3 without near3 interaction) or (automated ADJ escrow)) or ((protocol* or Consensus ADJ algorithm) near5 (agreement and transaction)))
[[[[['smart', '0', 'contract*'], 'and', 'agreement'], 'or', ['enforced', '3', 'without', '3', 'interaction'], 'or', ['automated', '0', 'escrow']], 'or', [['protocol*', 'or', ['Consensus', '0', 'algorithm']], '5', ['agreement', 'and', 'transaction']]]]
[0]:
[[[['smart', '0', 'contract*'], 'and', 'agreement'], 'or', ['enforced', '3', 'without', '3', 'interaction'], 'or', ['automated', '0', 'escrow']], 'or', [['protocol*', 'or', ['Consensus', '0', 'algorithm']], '5', ['agreement', 'and', 'transaction']]]
[0]:
[[['smart', '0', 'contract*'], 'and', 'agreement'], 'or', ['enforced', '3', 'without', '3', 'interaction'], 'or', ['automated', '0', 'escrow']]
[0]:
[['smart', '0', 'contract*'], 'and', 'agreement']
[0]:
['smart', '0', 'contract*']
[1]:
and
[2]:
agreement
[1]:
or
[2]:
['enforced', '3', 'without', '3', 'interaction']
[3]:
or
[4]:
['automated', '0', 'escrow']
[1]:
or
[2]:
[['protocol*', 'or', ['Consensus', '0', 'algorithm']], '5', ['agreement', 'and', 'transaction']]
[0]:
['protocol*', 'or', ['Consensus', '0', 'algorithm']]
[0]:
protocol*
[1]:
or
[2]:
['Consensus', '0', 'algorithm']
[1]:
5
[2]:
['agreement', 'and', 'transaction']
This works way better.
– Vikram Sangat
Aug 20 at 3:55
If "and" is optional, how will you determine if "a b" means the single term "a b" or the two separate terms "a" and "b" with an optional/implicit "and"? I think you will have to choose between implicit "and"-ing and unquoted multi-word terms, else your grammar is ambiguous.
– PaulMcG
Aug 19 at 22:09
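The ambiguity can be seen with a trimmed-down copy of the term definition from the question. This is only a sketch: it assumes pyparsing is installed, uses the same camelCase names as the question, and leaves out the near/ADJ lookaheads to keep it short.

```python
from pyparsing import CaselessLiteral, OneOrMore, Word, printables

OAND = CaselessLiteral("and")
OOR = CaselessLiteral("or")
ONOT = CaselessLiteral("not")

# Same shape as WORDWITHSPACE in the question, minus the near/ADJ lookaheads.
WORDWITHSPACE = OneOrMore(~(OAND | ONOT | OOR) +
                          Word(printables, excludeChars="()"))
WORDWITHSPACE.addParseAction(lambda t: " ".join(t))

# The greedy OneOrMore swallows both words, so an implicit "and" between
# "a" and "b" never gets a chance to apply.
single = WORDWITHSPACE.parseString("a b").asList()
# Only an explicit operator ends the term.
before_and = WORDWITHSPACE.parseString("a b and c").asList()

print(single)
print(before_and)
```

Both calls should return the single token 'a b', which is why a grammar with multi-word terms cannot also treat the space between "a" and "b" as an implied "and".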