Request: Parsed structures note their locations in the input text #4

davidsantiago · 2012-06-01T18:38:22Z

As far as I can tell, and I'm embarrassed to admit that I'm quite over my head in your amazing sci-fi incremental parser code, there's no way you can make parsley generate the parse tree so that it notes the location in the text of each production. For example, it would be nice to be able to have each node point to the location of the first and last char that got parsed into that production. Or even just the first character's location. I don't think it's possible to do this with make-node, as that information does not get passed into the arguments.

Is this something that could be done to the parser? I'm not sure if something about its incremental parsing would prevent this.

cgrand · 2012-06-01T19:28:23Z

Hi,

Because of incrementalism a node can't know it's start/end offset in the character stream -- if it could, an insertion would require to update this piece of data in all nodes to the right of the insertion.
However there's an alternative design: each node store the relative end offsets of its children. Thus given a global offset you can start from the root and recurse your way up (at each step you find the children which contains the offset, subtract its start offset to get the remaining offset relative to it and recurse).

Assuming your lreaves are just strings, something like that should work.

(defn length [node]
  (if (string? node)
    (count node)
    (peek (:offsets node))))

(defn make-node [tag content]
  {:tag tag :content content 
   :offsets (vec (reductions + (map length content)))})

(untested code)

Does it help?

davidsantiago · 2012-06-01T20:07:13Z

Yeah, I was kinda thinking it could do that update on everything to the right when it needed to, but I think you are also aiming to keep a logarithmic runtime on the incremental parsing, so that would ruin it. I had not thought of the scheme you are suggesting, because I thought that the content nodes would possibly throw away information during the parse, especially with the - suffix. If the parse tree can always be used to recreate the input, then something like this would definitely suit me. I'll try it out and see if it works.

cgrand · 2012-06-01T20:14:15Z

Parsley keeps all characters. The - suffix only instructs the parser to not create a node and to inline its children in its parent node -- so only inner nodes are elided, but not their children which are reparented.

davidsantiago · 2012-06-01T20:26:09Z

I see. Sorry for the false issue, I only took a real look at parsley this morning, and it's different from what I'm used to in a number of ways, so I haven't digested it all yet. I do think it might be nice to have that offsets scheme you mentioned built in, or more readily available to users at least, so I'll leave the issue open if you want to talk about doc updates or code tweaks related to that, or you can go ahead and close it if not.

cgrand · 2012-06-02T09:01:18Z

Well, offsets are a special case of what I call "views" (aggregated properties on nodes/leaves) which I'm at last working on at the moment (because of sjacket) -- I helped @laurentpetit to introduce them in an ad hoc manner in paredit.clj but now they are going to be an existing facility of parsley.

davidsantiago · 2012-06-02T09:03:51Z

Oh, excellent. I was thinking it'd be a shame if something like offsets was something that people had to reinvent/discover on their own every time someone needed it, but it sounds like you're working on a more general facility.

cgrand · 2012-06-04T11:17:25Z

finding a node by its offset: https://github.com/cgrand/parsley/blob/master/test/net/cgrand/parsley/test.clj#L91
Rignt now only functional trees support views but regular trees will support them too.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Request: Parsed structures note their locations in the input text #4

Request: Parsed structures note their locations in the input text #4

davidsantiago commented Jun 1, 2012

cgrand commented Jun 1, 2012

davidsantiago commented Jun 1, 2012

cgrand commented Jun 1, 2012

davidsantiago commented Jun 1, 2012

cgrand commented Jun 2, 2012

davidsantiago commented Jun 2, 2012

cgrand commented Jun 4, 2012

Request: Parsed structures note their locations in the input text #4

Request: Parsed structures note their locations in the input text #4

Comments

davidsantiago commented Jun 1, 2012

cgrand commented Jun 1, 2012

davidsantiago commented Jun 1, 2012

cgrand commented Jun 1, 2012

davidsantiago commented Jun 1, 2012

cgrand commented Jun 2, 2012

davidsantiago commented Jun 2, 2012

cgrand commented Jun 4, 2012