Up for discussion: User-provided commands for reading shell input #178
this keeps the in-progress parse tree separate from the rest of the shell memory
This is terrible. We should probably make a new recursive-copy function to put in Tags instead.
These test segfaults sure are disconcerting; I can't repro them with the same compilation flags on my computer. I'll keep poking.
Under the assumption that people definitely do definitely want:
I'm going to merge some of the non-user-visible pieces of this separately: in particular the core |
I now (too late) understand what it was doing in the first place.
Another wacky change from the chicken ranch. I think this one is pretty exciting, though.
Here's what's included:
`$&readline` primitive.

The primitive itself isn't terribly interesting; it essentially does just what you'd expect: it takes a single optional prompt as its argument, and its return semantics are the same as `$&read`. It also interacts with the readline history primitives and `$&resetterminal` as you would expect. However, it also (sort of) supports user-programmable file completion -- by uncommenting `readline.c:218` and defining the following function, you can test it out:

That's novel and interesting. But not exactly enough so to justify a whole new primitive just to have it. So what's the point of this primitive?
`$&parse` changed to take a "reader command" which it calls to fetch input.

So instead of `$&parse $prompt`, we do something like `$&parse {$&readline $prompt}` (what it really looks like is just below). `$&parse` calls the reader command at least once in order to fetch input to parse, but the exact number of calls depends on the input itself, which is why it needs to be modeled this way. (In theory we could do a push-style `$&parse` primitive where we fetch input, feed it to `$&parse`, and then repeat depending on its output, but this is a more straightforward change from the prior state of things.) When `$&parse` gets an empty list return value from its reader command, it understands that as an EOF and throws an `eof` exception.

So we added the `$&readline` primitive because now, that's how we use readline in the shell!

Doing this enables some impressive flexibility. For example, a `%parse` function which behaves like the existing `%parse $prompt` primitive looks something like:

But this is much more flexible and explicit than the existing setup. Some ways you could change this to suit your fancy:
- `%write-history` each line individually rather than buffering the input and writing it all at the end (this would be like the behavior before "Minor overhaul of history writing" #65).
- Use only a single `$prompt` for every input line, or use three different `$prompt` elements, or even dynamically generate `$prompt` for each individual line.
- Use something other than `$&readline` or `$&read`. Currently those are the only two reasonable options, but with this setup, a whole other line editing library is just a matter of calling a different reader command.
- Do pre-processing on the read-in text before returning it. This can be used for lots of read-time behaviors, such as (in increasing order of bad-idea-ness):
  - `$KEYBOARD_HACK`
  - `time foo | bar` => `time {foo | bar}`
  - `alias fonk='echo blah |'`
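To make the pull-style contract concrete, here is a minimal C sketch. It is not the PR's code: `reader_fn`, `toy_parse`, `array_reader`, and `counting_reader` are hypothetical names, and "parsing" is reduced to brace balancing. It only illustrates the shape described above: the parser calls the reader as many times as the input requires, a `NULL` return stands in for the empty list (EOF), and a reader can wrap another reader to act on each line as it is read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader type: returns the next line of input, or NULL,
 * which stands in for the "empty list" that $&parse treats as EOF. */
typedef const char *(*reader_fn)(void *state);

/* Toy stand-in for $&parse: keeps pulling lines until braces balance.
 * Returns 0 when a complete command is in `out`, or -1 on EOF (where
 * es would throw an eof exception). `calls` counts reader invocations. */
static int toy_parse(reader_fn reader, void *state,
                     char *out, size_t outsz, int *calls) {
    int depth = 0;
    size_t len = 0;
    assert(outsz > 0);
    out[0] = '\0';
    *calls = 0;
    for (;;) {
        const char *line = reader(state);
        (*calls)++;
        if (line == NULL)
            return -1;
        for (const char *p = line; *p != '\0'; p++) {
            if (*p == '{') depth++;
            else if (*p == '}') depth--;
            if (len + 1 < outsz) out[len++] = *p;  /* truncate if full */
        }
        out[len] = '\0';
        if (depth <= 0)
            return 0;  /* how many lines we needed depended on the input */
    }
}

/* A reader over a fixed array of lines (stand-in for $&read). */
struct lines { const char **v; size_t n, i; };

static const char *array_reader(void *state) {
    struct lines *ls = state;
    return ls->i < ls->n ? ls->v[ls->i++] : NULL;
}

/* A wrapping reader: delegates to an inner reader and acts on each line
 * as it arrives (here it just counts; per-line history writing or
 * pre-processing would hook in at the same spot). */
struct counting { reader_fn inner; void *inner_state; size_t seen; };

static const char *counting_reader(void *state) {
    struct counting *c = state;
    const char *line = c->inner(c->inner_state);
    if (line != NULL)
        c->seen++;
    return line;
}
```

Swapping the line source or adding per-line behavior means passing a different `reader_fn`; `toy_parse` itself does not change, which is the flexibility the list above is describing.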
Note that the reader command is called with stdin set to the current shell command input (which, unlike in other shells, is a bit different from the user's idea of stdin). If using a `$&runinput` primitive like the `%run-file` from #79, then the set of `$&runinput`, `$&parse`, and the "reader command" make a coherent "system" of functions, where an extremely minimal -- but just about complete, modulo handling the `eof` exception -- REPL would look like:

where: (a) `$&runinput` would set up a new command input and track certain values (line number, etc.), and (b) `$&parse` would interpret the input as shell commands, calling (c) `$&read` to actually read the input from the file set up by `$&runinput`. It's not entirely "orthogonal" to have the `$&runinput` and `$&parse` behaviors depend on each other this much (you can't properly read from the command input without `$&parse`, and you can't read an input to `$&parse` without a context set up by `$&runinput`), but given some of the complexities at play here, it seems fairly minimal -- and should extend fairly cleanly to other behaviors a user might want to add to get a "complex" input system.

Knowledgeable readers might be asking by now, how is this setup possible? Shell commands during parsing are inherently memory-buggy given the way the parser and GC interact -- this was mentioned way early on in the old mailing list (https://wryun.github.io/es-shell/mail-archive/msg00039.html). That's been solved here with the last (actually first, as far as implementation order went) bit:
Originally, based in part on some comments in the mailing list, I thought I would need to replace the parser in order to play nice with GC'd memory (see both of my attempts, as well as #91). The problem is that replacing the parser is a fair amount of work, and parsing has so many tiny, fiddly behaviors (including a lot of allocations) that it's easy to add a heap of bugs just replacing the parser, before even getting to fixing the memory situation or developing the new input behaviors. So it went off "in the weeds" each time.
So to bypass that, what happens here is that allocations either in or used by the parser, instead of using `gcalloc`, use `palloc`, which allocates these pointers in "pspace", which is like normal GC space except it is (1) disjoint (no normal GCs touch pspace), and (2) explicitly collected at the end of a `$&parse` call, when the generated parse tree is retrieved. At that point the parse tree is treated as the only root of pspace and it is copied into GC space for normal use.

This was surprisingly simple to implement, and works shockingly well, given it's only ~200 new lines of code.
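The pspace idea can be illustrated with a small C sketch. This is an assumption-laden toy, not the PR's implementation: `Arena`, `arena_alloc`, `mk`, `copy_out`, and `finish_parse` are hypothetical names, plain `malloc` stands in for GC space, and the `Node` type is far simpler than es's real tree. It shows the two properties described above: parser allocations live in a disjoint region, and at the end of a parse only what is reachable from the tree root is copied out before the whole region is dropped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse-tree node; es's real tree type is richer. */
typedef struct Node {
    const char *word;            /* NUL-terminated text, or NULL */
    struct Node *left, *right;
} Node;

/* A bump-allocated region standing in for "pspace". */
typedef struct { char *base; size_t used, cap; } Arena;

static Arena arena_new(size_t cap) {
    Arena a = { malloc(cap), 0, cap };
    assert(a.base != NULL);
    return a;
}

/* palloc-like allocation: carve bytes out of the arena. Nothing is
 * freed individually; a real allocator would grow instead of asserting. */
static void *arena_alloc(Arena *a, size_t n) {
    size_t aligned = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    assert(a->used + aligned <= a->cap);
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Build a node (and a copy of its word) entirely inside the arena. */
static Node *mk(Arena *a, const char *word, Node *l, Node *r) {
    Node *n = arena_alloc(a, sizeof *n);
    n->word = NULL;
    if (word != NULL) {
        size_t len = strlen(word) + 1;
        char *w = arena_alloc(a, len);
        memcpy(w, word, len);
        n->word = w;
    }
    n->left = l;
    n->right = r;
    return n;
}

/* Recursively copy everything reachable from `n` into ordinary heap
 * memory. Unlike a real copying GC, this does not preserve sharing. */
static Node *copy_out(const Node *n) {
    if (n == NULL)
        return NULL;
    Node *c = malloc(sizeof *c);
    assert(c != NULL);
    c->word = NULL;
    if (n->word != NULL) {
        size_t len = strlen(n->word) + 1;
        char *w = malloc(len);
        assert(w != NULL);
        memcpy(w, n->word, len);
        c->word = w;
    }
    c->left = copy_out(n->left);
    c->right = copy_out(n->right);
    return c;
}

/* End of a parse: the tree is the only root. Copy it out, then discard
 * the whole arena, including any scratch allocations the parser made. */
static Node *finish_parse(Arena *a, const Node *root) {
    Node *result = copy_out(root);
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
    return result;
}

/* Release a tree produced by copy_out. */
static void free_tree(Node *n) {
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free((void *)n->word);
    free(n);
}
```

Because nothing outside the arena points into it during a parse, anything that runs in between (such as a reader command triggering a normal collection) has no reason to touch the in-progress tree.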
Not all of this PR is set in stone, but there are two particular aspects I think are very good and important:
First of all, the pspace change unblocks the entire notion of extensible "reading", in the sense that a shell is a "R"EPL. This is the foundational technical innovation of this PR, to let us try to play a little bit of "catch-up" with the fancy interactive abilities of basically every other shell these days. By itself, the palloc stuff is also a no-op for the user, and has ~no impact on either performance or memory usage as far as I can tell.
Second, I was able to pull all the readline logic out of the rest of the shell and collect it in `readline.c`. Localizing all the extra complexity of line editing libraries like this should make it far easier to manage. A possible direction to follow up here could be the "pluggable primitives" idea that has been floated off and on in the past, or at least some other way to more flexibly choose the line editing library that we like (rc has some precedent here, though it isn't based on primitives).

In general, follow-ups I see to this would be:
- Follow through on programmable completion and key binding in readline.
- Explore other line editing libraries and commands such as libedit/editline, linenoise/replxx, or even linecook (hello @injinj :). Or what about alternative frontends to es entirely? As part of this, figure out how to make it easier to "plug in" these different libraries at the C level.
- Performance improvements -- using `$&read` for shell input makes a "seeking read" much more useful. (Without the "seeking read", this PR slows down shell tests by ~9%, and with it, that slowdown disappears.)
- Support separate reading logic for heredocs (ever want to have a separate prompt or history file just for heredocs? I do).
- Explore getting `$&runinput` or something similar into the shell and explore how the whole input system is changed/simplified by just using these new primitives together. I think we could probably shorten input.c by another ~200 lines and simplify the control flow by a lot. This PR and #79 ("Up for discussion: Move the "main runtime" of es out of C and into es script") together almost completely remove the need for `run_interactive` in the shell -- I think that's quite interesting.

For completeness, I want to make sure to mention a couple bugs/limitations of the current implementation:
- Calling `%parse` messes with the input's state, which can cause a surprising shell exit (`$&parse {result ()}`) or just slightly-wrong line numbers. I don't know if this is a "bug" or just a sharp edge of a powerful tool. If `$&runinput` were present then `$&parse` calls could be better insulated from parent REPLs.
- It doesn't support parsing while you parse (that is, something like `$&parse {$&parse}`). This might seem like kind of a "no duh" limitation, but parsing strings can happen at unexpected times, like when first running a function from the environment. This limitation should be relaxed to "no nested parsing on a single input", but I need to work out if yacc even supports these kinds of shenanigans (without extensions specific to particular implementations). In the worst case, we could define a "normal" parser and a "string" parser using `-p` such that we can at least call `parsestring()` inside of a `$&parse`.
- `$&read` can't handle `\0` bytes. This was already found in #93 ("`$&read`ing a NUL byte crashes the shell"), but in this new setting it's more of a problem -- I had to comment out a `trip.es` test case to account for it.

What do folks think of all this? Promising?