I made progress with this. The string …

```rust
use chumsky::prelude::*;

// Assumed definition of my span-carrying wrapper (not part of chumsky).
#[derive(Debug, Clone, PartialEq)]
pub struct Spanned<T>(pub T, pub SimpleSpan);

pub fn operator<'src>(
) -> impl Parser<'src, &'src str, Spanned<String>, extra::Err<Rich<'src, char, SimpleSpan>>> {
    let operator_char = one_of("+-*/<>=~!@#%^&|`?");
    let operator_char_not_plus_minus = one_of("*/<>=~!@#%^&|`?");

    // Branch 1: the whole run of operator characters, rejected if invalid.
    (operator_char
        .clone()
        .repeated()
        .at_least(1) // an operator needs at least one character
        .to_slice()
        .try_map_with(|ops: &str, extra| {
            // `--` and `/*` start comments, so they can't appear in an operator.
            if ops.contains("--") || ops.contains("/*") {
                return Err(Rich::custom(extra.span(), "invalid operator"));
            }
            // A multi-character operator may only end in `+` or `-` if it
            // contains one of ~ ! @ # % ^ & | ` ? (PostgreSQL includes `@`).
            if (ops.ends_with('+') || ops.ends_with('-'))
                && ops.len() > 1
                && !ops.chars().any(|c| "~!@#%^&|`?".contains(c))
            {
                return Err(Rich::custom(extra.span(), "invalid operator"));
            }
            Ok(Spanned(ops.to_string(), extra.span()))
        }))
    // Branch 2: a run ending in a character other than `+` or `-`.
    // Note: `repeated()` is greedy and never gives characters back, so on
    // `+*-` it consumes all three characters and `then(...)` has nothing left.
    .or(operator_char
        .clone()
        .repeated()
        .then(operator_char_not_plus_minus.clone())
        .to_slice()
        .try_map_with(|ops: &str, extra| {
            if ops.contains("--") || ops.contains("/*") {
                return Err(Rich::custom(extra.span(), "invalid operator"));
            }
            Ok(Spanned(ops.to_string(), extra.span()))
        }))
    // Branch 3: fall back to a single operator character.
    .or(operator_char
        .clone()
        .to_slice()
        .map_with(|ops: &str, extra| Spanned(ops.to_string(), extra.span())))
}
```

My parse call in my unit test is:

```rust
assert_eq!(
    operator().then(operator()).parse("+*-").into_result(),
    Ok((
        Spanned("+*".to_string(), SimpleSpan::new(0, 2)),
        Spanned("-".to_string(), SimpleSpan::new(2, 3)),
    ))
);
```
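The checks inside the first `try_map_with` can also be pulled out into a plain predicate that is easy to unit-test on its own. This is a minimal sketch; `is_valid_operator`, `OPERATOR_CHARS` and `TRAILING_SIGN_ALLOWED` are hypothetical names, not chumsky API. A predicate like this can only accept or reject a slice that has already been matched; it cannot make `repeated()` give characters back, which is why `+*-` still fails as a whole in branch 1.

```rust
/// Characters PostgreSQL allows in operator names.
const OPERATOR_CHARS: &str = "+-*/<>=~!@#%^&|`?";
/// A multi-character operator may end in `+` or `-` only if it contains one of these.
const TRAILING_SIGN_ALLOWED: &str = "~!@#%^&|`?";

/// Hypothetical helper mirroring the checks in the first `try_map_with`:
/// returns true if `op`, taken as a whole, is a valid PostgreSQL operator.
fn is_valid_operator(op: &str) -> bool {
    // Must be non-empty and made only of operator characters.
    if op.is_empty() || !op.chars().all(|c| OPERATOR_CHARS.contains(c)) {
        return false;
    }
    // `--` and `/*` start comments, so they can't appear anywhere in an operator.
    if op.contains("--") || op.contains("/*") {
        return false;
    }
    // Single characters are always fine; longer operators ending in `+`/`-`
    // need at least one of the TRAILING_SIGN_ALLOWED characters.
    let ends_with_sign = op.ends_with('+') || op.ends_with('-');
    let has_allowed = op.chars().any(|c| TRAILING_SIGN_ALLOWED.contains(c));
    op.len() == 1 || !ends_with_sign || has_allowed
}
```

For example, `is_valid_operator("+*")` and `is_valid_operator("@-")` are true, while `is_valid_operator("+*-")` is false.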
I'm lexing PostgreSQL SQL and currently trying to get operators working. The rules are here: https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-OPERATORS
Basically, an operator consists of one or more characters from an operator character set. A multi-character operator can't contain the character pairs that start a comment (`--` and `/*`), and it can't end with `+` or `-` unless it also contains at least one of ``~ ! @ # % ^ & | ` ?``. This last rule is what I'm having trouble with.

The code I have is below. Lexing `+-` should produce `+` and `-`, since `+-` is not a valid multi-character operator: it ends with `-` and contains none of those characters. The chumsky parser is greedily grabbing `+-` as a whole string slice, and my `.try_map_with()` rejects the whole thing.

How do I get the parser to backtrack and return just the `+`? My higher-level tokenizer would then lex this as `+` followed by `-`.

Another example is the string `+*-`, which should lex to the valid PostgreSQL operators `+*` followed by `-`.
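One way around the missing backtracking is not to ask for it at all: take the longest run of operator characters, then trim that run according to the rules. PostgreSQL's own flex lexer (`scan.l`) appears to take a similar greedy-then-trim approach. The sketch below is plain Rust with no chumsky involved; `operator_len` and `lex_operators` are hypothetical names. Wiring `operator_len` into a chumsky parser (for example through a custom primitive) is an assumption not checked here against the chumsky API.

```rust
/// Characters PostgreSQL allows in operator names.
const OPERATOR_CHARS: &str = "+-*/<>=~!@#%^&|`?";
/// If a multi-character operator contains one of these, it may end in `+` or `-`.
const TRAILING_SIGN_ALLOWED: &str = "~!@#%^&|`?";

/// Returns the byte length of the operator at the start of `input`,
/// or `None` if `input` does not start with an operator.
fn operator_len(input: &str) -> Option<usize> {
    // 1. Greedily take the longest run of operator characters. They are all
    //    ASCII, so the byte index equals the character count.
    let mut len = input
        .find(|c: char| !OPERATOR_CHARS.contains(c))
        .unwrap_or(input.len());
    let run = &input[..len];

    // 2. Cut the run at the first `--` or `/*`: those start a comment.
    for comment_start in ["--", "/*"] {
        if let Some(pos) = run.find(comment_start) {
            len = len.min(pos);
        }
    }

    // 3. If a multi-character operator ends in `+`/`-` and contains none of
    //    TRAILING_SIGN_ALLOWED, strip all trailing `+`/`-` (keep >= 1 char).
    let op = &input[..len];
    let ends_with_sign = |s: &str| s.ends_with('+') || s.ends_with('-');
    if len > 1 && ends_with_sign(op) && !op.chars().any(|c| TRAILING_SIGN_ALLOWED.contains(c)) {
        while len > 1 && ends_with_sign(&input[..len]) {
            len -= 1;
        }
    }

    if len == 0 { None } else { Some(len) }
}

/// Splits the leading operator characters of `input` into operators,
/// stopping at the first non-operator character or comment start.
fn lex_operators(mut input: &str) -> Vec<&str> {
    let mut ops = Vec::new();
    while let Some(n) = operator_len(input) {
        ops.push(&input[..n]);
        input = &input[n..];
    }
    ops
}
```

With this, `lex_operators("+-")` gives `["+", "-"]` and `lex_operators("+*-")` gives `["+*", "-"]`. Cutting at the first `--` or `/*` before applying the `+`/`-` rule matters: for `@--` this returns `@` (leaving `--` to start a comment), whereas picking the longest prefix that passes a whole-operator validity check would return `@-`.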