PR to clean up global state in thriftpy.parser.parser? #268
Comments
Thanks, pull requests are welcomed! |
Awesome, looking forward to working with you! :-) |
This may be a bit extreme, but I got into the guts and PLY really punishes you for not having global state. Here's what I have so far, migrating to Parsley: Most parsing tests are passing. Before proposing this as a real PR, I'll get 100% of tests passing, and also do a refactor of the code to get it as close as possible to the existing variable and function names. (My biggest concern is that such a major change would make the code-base unfamiliar for core developers.) (Also, performance is similar between ply and parsley libraries; my test .thrift file is 25% faster in parsley but this isn't apples to apples until all of the correct in memory classes are set up.) |
@kurtbrose so would it make sense to fix only this bug by wrapping parser.parse in a try..catch? |
@lxyu that's a good question. Here are some scenarios:

minimum change: A try..finally around the push / pop. Limitation: parsing is not thread safe.

reset global state + lock: At the beginning of every parse, reset the global variables in

no global state parsing: (My proposal.) Parsing functions are all purely functional -- the inner

My use case is I'm trying to migrate a large existing code-base from a very old version of facebook thrift (r821160) to thriftpy. There are ~300 .thrift files some of which have parse errors because the compiler has become more strict over time.

It's your call. I believe that switching |
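The first two scenarios above can be sketched roughly as follows. This is a toy model, not thriftpy's code: `thrift_stack` here is a stand-in list, `toy_parse_text` is a hypothetical replacement for the real parse call, and the choice to clear only the stack in the second option is an assumption. Nested parsing of included files is not modelled.

```python
import threading

thrift_stack = []               # hypothetical stand-in for the module-level global
_parse_lock = threading.Lock()  # assumption: a single lock serialises all parses


def toy_parse_text(data):
    # Hypothetical stand-in for the real parse call; fails on a marker string.
    if "BROKEN" in data:
        raise ValueError("syntax error")
    return data.split()


def parse_minimum_change(name, data):
    # Option 1: try..finally guarantees the pop even when parsing raises.
    thrift_stack.append(name)
    try:
        return toy_parse_text(data)
    finally:
        thrift_stack.pop()


def parse_reset_and_lock(name, data):
    # Option 2: serialise parses and start each one from a clean global state.
    with _parse_lock:
        del thrift_stack[:]  # discard anything a previous failure left behind
        thrift_stack.append(name)
        try:
            return toy_parse_text(data)
        finally:
            thrift_stack.pop()
```

In this sketch the first option still shares one stack between threads, which is the thread-safety limitation noted above; the second trades that for serialised parsing.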
Basically, if one thrift file fails to parse, the best thing to do is throw an exception and stop early. Only continue when the

For the thread-local part, yes, currently the parser is not thread-safe; it needs some fixes to achieve that. And I have 2 thoughts on this problem.
|
Those are good points. In a typical usage scenario (services communicating to each other) the protocol spec is fixed in advance and if it has any problems you want to abort. And if this was C++ or Java that would be the end of the story :-)

But, in Python we can do so much more. For example, in my early phases of development I'll typically use a REPL or Jupyter Notebook to iteratively play with code. In that environment, having isolated repeatable functions even if an exception is raised is very important. You'll execute a function, get an exception, try tweaking it and execute again. For example, the way I first encountered these issues was trying to figure out the correct set of directories to pass to the parser so that the thrift

Regarding using thread local -- I think an approach with closures is the best way to avoid global state if sticking with PLY. (As mentioned in the PLY docs http://www.dabeaz.com/ply/ply.html#ply_nn18 .)

It's basically an engineering trade-off -- what is the most elegant way to achieve the desired outcome? Using parsley results in very simple and elegant usage patterns. For example:

_GRAMMAR('struct Foo { 1: bar.Bar first 2: string second }').Struct()

This returns a struct -- subclass of

I'm finishing wiring things up and getting all the tests passing, then I'll do a diff minimization in git to make sure there are no incidental changes -- I want parser.py to still "feel" like the same module as much as possible. At that point it's up to you. |
Milestone -- the new parser correctly parses all of ThriftTest.thrift (https://git-wip-us.apache.org/repos/asf?p=thrift.git;a=blob_plain;f=test/ThriftTest.thrift;hb=HEAD) https://gist.github.com/kurtbrose/1a99167917998647a4707b5d475edbbe |
Just as an FYI, the current parser fails on ThriftTest.thrift from the apache thrift IDL documentation. I think that alone might make the change worth it :-) Doing final integration now, will probably have PR tomorrow afternoon. |
Have you considered defining the parsing grammar in a class and keeping its state local to the instance? You would use it via http://www.dabeaz.com/ply/ply.html#ply_nn18 This would also greatly simplify #259 |
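As a rough illustration of keeping state local to an instance, here is a toy sketch. It does not use PLY at all; `ToyParser`, its `stack` and `modules` attributes, and the "BROKEN" failure marker are all hypothetical names chosen for the example.

```python
class ToyParser:
    """Illustrative only: a parser whose bookkeeping lives on the instance."""

    def __init__(self):
        self.stack = []    # per-instance, instead of a module-level global
        self.modules = {}  # results of successful parses, keyed by name

    def parse(self, name, data):
        self.stack.append(name)
        try:
            if "BROKEN" in data:
                raise ValueError("syntax error in %s" % name)
            self.modules[name] = data.split()
            return self.modules[name]
        finally:
            self.stack.pop()  # always unwinds, even on failure


first = ToyParser()
second = ToyParser()
first.parse("a", "struct Foo")
# Nothing that `first` parsed is visible through `second`.
```

Because each instance owns its own stack, a failed parse on one instance cannot leave stale entries that affect another, and separate threads can each use their own parser.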
https://github.com/eleme/thriftpy/blob/develop/thriftpy/parser/parser.py#L479-L481
There is global state in thriftpy.parser.parser.
Specifically, if an exception is raised by parser.parse(data), thrift_stack.pop() never happens and "dead" modules are left in the thrift_stack global.

I'd love to migrate our usage off of facebook thrift and onto thriftpy. However, the non-reentrancy of the parser is a problem. I am happy to fix it, but from a maintainability perspective I don't want to migrate us onto a fork.
Would you be open to a PR that makes the parser more object oriented and removes global state?
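A minimal toy reproduction of the failure mode described above, assuming a push / parse / pop sequence around a module-level list. The function and exception names are hypothetical and this is not the thriftpy source.

```python
thrift_stack = []  # module-level global, modelled on the one named in the issue


class ToyParseError(Exception):
    pass


def toy_parse_text(data):
    # Hypothetical stand-in for parser.parse(data): fails on a marker string.
    if "BROKEN" in data:
        raise ToyParseError("syntax error")
    return data.split()


def toy_parse(name, data):
    thrift_stack.append(name)      # push the module being parsed
    result = toy_parse_text(data)  # an exception here skips the pop below
    thrift_stack.pop()
    return result


try:
    toy_parse("bad_module", "struct BROKEN")
except ToyParseError:
    pass

leftover = list(thrift_stack)  # the failed module is still on the stack
```

After the failed call, later parses start with the stale entry already on the stack, which is what makes retrying in a REPL or notebook unreliable.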