Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: Parsed structures note their locations in the input text #4

Open
davidsantiago opened this issue Jun 1, 2012 · 7 comments

Comments

@davidsantiago
Copy link

As far as I can tell, and I'm embarrassed to admit that I'm quite over my head in your amazing sci-fi incremental parser code, there's no way you can make parsley generate the parse tree so that it notes the location in the text of each production. For example, it would be nice to be able to have each node point to the location of the first and last char that got parsed into that production. Or even just the first character's location. I don't think it's possible to do this with make-node, as that information does not get passed into the arguments.

Is this something that could be done to the parser? I'm not sure if something about its incremental parsing would prevent this.

@cgrand
Copy link
Owner

cgrand commented Jun 1, 2012

Hi,

Because of incrementalism a node can't know it's start/end offset in the character stream -- if it could, an insertion would require to update this piece of data in all nodes to the right of the insertion.
However there's an alternative design: each node store the relative end offsets of its children. Thus given a global offset you can start from the root and recurse your way up (at each step you find the children which contains the offset, subtract its start offset to get the remaining offset relative to it and recurse).

Assuming your lreaves are just strings, something like that should work.

(defn length [node]
  (if (string? node)
    (count node)
    (peek (:offsets node))))

(defn make-node [tag content]
  {:tag tag :content content 
   :offsets (vec (reductions + (map length content)))})

(untested code)

Does it help?

@davidsantiago
Copy link
Author

Yeah, I was kinda thinking it could do that update on everything to the right when it needed to, but I think you are also aiming to keep a logarithmic runtime on the incremental parsing, so that would ruin it. I had not thought of the scheme you are suggesting, because I thought that the content nodes would possibly throw away information during the parse, especially with the - suffix. If the parse tree can always be used to recreate the input, then something like this would definitely suit me. I'll try it out and see if it works.

@cgrand
Copy link
Owner

cgrand commented Jun 1, 2012

Parsley keeps all characters. The - suffix only instructs the parser to not create a node and to inline its children in its parent node -- so only inner nodes are elided, but not their children which are reparented.

@davidsantiago
Copy link
Author

I see. Sorry for the false issue, I only took a real look at parsley this morning, and it's different from what I'm used to in a number of ways, so I haven't digested it all yet. I do think it might be nice to have that offsets scheme you mentioned built in, or more readily available to users at least, so I'll leave the issue open if you want to talk about doc updates or code tweaks related to that, or you can go ahead and close it if not.

@cgrand
Copy link
Owner

cgrand commented Jun 2, 2012

Well, offsets are a special case of what I call "views" (aggregated properties on nodes/leaves) which I'm at last working on at the moment (because of sjacket) -- I helped @laurentpetit to introduce them in an ad hoc manner in paredit.clj but now they are going to be an existing facility of parsley.

@davidsantiago
Copy link
Author

Oh, excellent. I was thinking it'd be a shame if something like offsets was something that people had to reinvent/discover on their own every time someone needed it, but it sounds like you're working on a more general facility.

@cgrand
Copy link
Owner

cgrand commented Jun 4, 2012

finding a node by its offset: https://github.com/cgrand/parsley/blob/master/test/net/cgrand/parsley/test.clj#L91
Rignt now only functional trees support views but regular trees will support them too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants