Surprising lexer behaviour when input is shorter than the shortest token #349
Comments
Correction: this doesn't only happen at the start of the input; apparently the space skewed my results. This still fails:

    use logos::Logos;

    #[derive(Logos, Debug, PartialEq, Eq)]
    pub enum Foo {
        #[token("TEST")]
        Test,
    }

    #[test]
    fn test() {
        let mut lex = Foo::lexer("TESTSET");
        assert_eq!(lex.next(), Some(Ok(Foo::Test)));
        assert_eq!(lex.next(), Some(Err(()))); // fails here
    }

Adding a callback makes the test pass.
Hello, that's strange! But indeed you have to be aware that the space is important and will definitely cause an error if not handled :-) Could you post the error message here? Also, does adding a callback to
That's very surprising, too, actually. Consider that there are no
Everything compiles fine, so unclear what error message you expect me to post. It's the runtime behaviour which is surprising.
I did some more testing, and apparently callbacks are a red herring entirely. Apologies for the misleading report. So this works as expected, returning an error on

    use logos::Logos;

    #[derive(Logos, Debug, PartialEq, Eq)]
    pub enum Foo {
        #[token("FOOBAR")]
        Foo,
        #[token("Q")]
        Bar,
    }

    #[test]
    fn test() {
        let mut lex = Foo::lexer("FOOBARZAP");
        assert_eq!(lex.next(), Some(Ok(Foo::Foo)));
        assert_eq!(lex.next(), Some(Err(())));
    }

This, however, does not:

    use logos::Logos;

    #[derive(Logos, Debug, PartialEq, Eq)]
    pub enum Foo {
        #[token("FOOBAR")]
        Foo,
        #[token("QFOOBAR")]
        Bar,
    }

    #[test]
    fn test() {
        let mut lex = Foo::lexer("FOOBARZAP");
        assert_eq!(lex.next(), Some(Ok(Foo::Foo)));
        assert_eq!(lex.next(), Some(Err(()))); // returns None
    }
There also seems to be some variation with a single token, depending on the token length. This works as expected:

    use logos::Logos;

    #[derive(Logos, Debug, PartialEq, Eq)]
    pub enum Foo {
        #[token("FOO")]
        Foo,
    }

    #[test]
    fn test() {
        let mut lex = Foo::lexer("ZAP");
        assert_eq!(lex.next(), Some(Err(())));
    }

This doesn't:

    use logos::Logos;

    #[derive(Logos, Debug, PartialEq, Eq)]
    pub enum Foo {
        #[token("FOOB")]
        Foo,
    }

    #[test]
    fn test() {
        let mut lex = Foo::lexer("ZAP");
        assert_eq!(lex.next(), Some(Err(())));
    }

So... to me, it seems the lexer returns
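To make the pattern in the test cases above easier to reason about, here is a toy model in plain Rust (standard library only). It is not the Logos implementation, and every name in it (`ToyLexer`, `short_input_ends_stream`, `lex`) is hypothetical. It only sketches one assumption that fits the issue title: the stream ends early whenever the remaining input is shorter than the shortest token. With the flag off, it models the behaviour the tests expect, where unmatched input yields an error and `None` appears only at the end of the input. How the toy recovers after an error (skipping one character) is also an assumption and is not claimed to match Logos.

```rust
// Toy model only: NOT the Logos implementation. All names are hypothetical.

/// Hand-written lexer over a fixed set of literal tokens.
struct ToyLexer<'a> {
    tokens: &'a [&'a str],
    rest: &'a str,
    /// When true, emulate the suspected behaviour: end the stream as soon as
    /// the remaining input is shorter than the shortest token.
    short_input_ends_stream: bool,
}

impl<'a> Iterator for ToyLexer<'a> {
    type Item = Result<&'a str, ()>;

    fn next(&mut self) -> Option<Self::Item> {
        let tokens = self.tokens;
        let rest = self.rest;

        // End of input is the only place the expected contract returns None.
        if rest.is_empty() {
            return None;
        }

        // Suspected shortcut: too little input left for any token to match.
        let shortest = tokens.iter().map(|t| t.len()).min().unwrap_or(0);
        if self.short_input_ends_stream && rest.len() < shortest {
            self.rest = "";
            return None;
        }

        // Longest literal that matches at the current position, if any.
        let best = tokens
            .iter()
            .copied()
            .filter(|t| rest.starts_with(*t))
            .max_by_key(|t| t.len());

        match best {
            Some(tok) => {
                self.rest = &rest[tok.len()..];
                Some(Ok(tok))
            }
            None => {
                // Assumed recovery: skip one character and report an error.
                let mut chars = rest.chars();
                chars.next();
                self.rest = chars.as_str();
                Some(Err(()))
            }
        }
    }
}

/// Collect the whole token stream for `input`.
fn lex<'a>(
    tokens: &'a [&'a str],
    input: &'a str,
    short_input_ends_stream: bool,
) -> Vec<Result<&'a str, ()>> {
    ToyLexer { tokens, rest: input, short_input_ends_stream }.collect()
}
```

With the flag on, the toy agrees with every result reported in this thread: `lex(&["FOOB"], "ZAP", true)` is empty while `lex(&["FOO"], "ZAP", true)` starts with an error, and `FOOBARZAP` produces an error after `FOOBAR` for the token set `FOOBAR`/`Q` but not for `FOOBAR`/`QFOOBAR`. That is consistent with the hypothesis but says nothing about where in Logos the actual cause lies.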
Apparently, lexer behaviour differs when there are callbacks and when there aren't any. Specifically, when there aren't any callbacks and there's an error at the start of the input, the lexer returns None instead of the expected Some(Err(())).
Reproducer:
This succeeds:
This only happens at the start of the input, e.g. this works as expected: