Skip to content

Up for discussion: User-provided commands for reading shell input #178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

jpco
Copy link
Collaborator

@jpco jpco commented Mar 27, 2025

Another wacky change from the chicken ranch. I think this one is pretty exciting, though.

Here's what's included:

  1. This PR adds a $&readline primitive.

The primitive itself isn't terribly interesting, it essentially does just what you'd expect - it takes a single optional prompt as its argument and its return semantics are the same as $&read. It also interacts with the readline history primitives and $&resetterminal as you would expect. However, it also (sort of) supports user-programmable file completion -- by uncommenting readline.c:218 and defining the following function, you can test it out:

fn %complete in {
  let (f = $in^*) if {access $f} {
    result $f
  } {
    result ()
  }
}

That's novel and interesting. But not exactly enough so to justify a whole new primitive just to have it. So what's the point of this primitive?

  1. This PR modifies $&parse to take a "reader command" which it calls to fetch input.

So instead of $&parse $prompt, we do something like $&parse {$&readline $prompt} (what it really looks like is just below). $&parse calls the reader command at least once in order to fetch input to parse, but the number of calls exactly depends on the input itself, which is why it needs to be modeled this way. (In theory we could do a push-style $&parse primitive where we fetch input, feed it to $&parse, and then repeat depending on its output, but this is a more straightforward change from the prior state of things). When $&parse gets an empty list return value from its reader command, it understands that as an EOF and throws an eof exception.

So we added the $&readline primitive because now, that's how we use readline in the shell!

Doing this enables some impressive flexibility. For example, a %parse function which behaves like the existing %parse $prompt primitive looks something like:

fn %parse prompt {
	if %is-interactive {
		let (p = $prompt(1); input = ())
		let (code = <={$&parse {
			let (r = <={$&readline $p}) {
				input = $input $r
				p = $prompt(2)
				result $r
			}
		}}) {
			if {!~ $#fn-%write-history 0} {
				%write-history <={%flatten \n $input}
			}
			result $code
		}
	} {
		$&parse $&read
	}
}

But this is much more flexible and explicit than the existing setup. Some ways you could change this to suit your fancy:

  • %write-history each line individually rather than buffering the input and writing it all at the end (this would be like the pre Minor overhaul of history writing #65 behavior).

  • Use only a single $prompt for every input line, or use three different $prompt elements, or even dynamically generate $prompt for each individual line

  • Use something other than $&readline or $&read. Currently those are the only two reasonable options, but with this setup, a whole other line editing library is just a matter of calling a different reader command.

  • Do pre-processing on the read-in text before returning it. This can be used for lots of read-time behaviors, such as (in increasing order of bad-idea-ness):

    • history expansion
    • the equivalent of zsh's $KEYBOARD_HACK
    • precedence-hacking to allow things like time foo | bar => time {foo | bar}
    • "real" aliases which let you do weird things like alias fonk='echo blah |'

Note that the reader command is called with stdin set to the current shell command input (which, unlike other shells, is a bit different than the user's idea of stdin). If using a $&runinput primitive like the %run-file from #79, then the set of $&runinput, $&parse, and the "reader command" make a coherent "system" of functions, where an extremely minimal -- but just about complete, modulo handling the eof exception -- REPL would look like:

fn minimal-loop script {
	$&runinput $script {
		forever {
			<={$&parse $&read}
		}
	}
}

where: (a) $&runinput would set up a new command input and track certain values (line number, etc.), and (b) $&parse would interpret the input as shell commands, calling (c) $&read to actually read the input from the file set up by $&runinput. It's not entirely "orthogonal" to have the $&runinput and $&parse behaviors depend on each other this much (you can't properly read from the command input without $&parse, and you can't read an input to $&parse without a context set up by $&runinput), but given some of the complexities at play here, it seems fairly minimal -- and should extend fairly cleanly to other behaviors a user might want to add to get a "complex" input system.

Knowledgeable readers might be asking by now, how is this setup possible? Shell commands during parsing are inherently memory-buggy given the way the parser and GC interact -- this was mentioned way early on in the old mailing list (https://wryun.github.io/es-shell/mail-archive/msg00039.html). That's been solved here with the last (actually first, as far as implementation order went) bit:

  1. This PR introduces a new, separate memory space, which I call "pspace" in the code, for building parse trees.

Originally, based in part on some comments in the mailing list, I thought I would need to replace the parser in order to play nice with GC'd memory (see both of my attempts, as well as #91). The problem is that replacing the parser is a fair amount of work, and parsing has so many tiny, fiddly behaviors (including a lot of allocations) that it's easy to add a heap of bugs just replacing the parser, before even getting to fixing the memory situation or developing the new input behaviors. So it went off "in the weeds" each time.

So to bypass that, what happens here is that allocations either in or used by the parser, instead of using gcalloc, use palloc, which allocates these pointers in "pspace", which is like normal GC space except it is 1. disjoint (no normal GCs touch pspace), and 2. explicitly collected at the end of a $&parse call, when the generated parse tree is retrieved. At that point the parse tree is treated as the only root of pspace and it is copied into GC space for normal use.

This was surprisingly simple to implement, and works shockingly well, given it's only ~200 new lines of code.


Not all of this PR is set in stone, but there are two particular aspects I think are very good and important:

First of all, the pspace change unblocks the entire notion of extensible "reading", in the sense that a shell is a "R"EPL. This is the foundational technical innovation of this PR, to let us try to play a little bit of "catch-up" with the fancy interactive abilities of basically every other shell these days. By itself, the palloc stuff is also a no-op for the user, and has ~no impact on either performance or memory usage as far as I can tell.

Second, I was able to pull all the readline logic out of the rest of the shell and collect it in readline.c. Localizing all the extra complexity of line editing libraries like this should make it far easier to manage. A possible direction to follow up here could be the "pluggable primitives" idea that has been floated off and on in the past, or at least some other way to more flexibly choose the line editing library that we like (rc has some precedent here, though it isn't based on primitives.)

In general, follow-ups I see to this would be:

  • Follow through on programmable completion and key binding in readline

  • Explore other line editing libraries and commands such as libedit/editline, linenoise/replxx, or even linecook (hello @injinj :). Or what about alternative frontends to es entirely? As part of this, figure out how to make it easier to "plug in" these different libraries at the C level.

  • Performance improvements -- using $&read for shell input makes a "seeking read" much more useful (Without the "seeking read", this PR slows down shell tests by ~9%, and with it, that slowdown disappears).

  • Support separate reading logic for heredocs (ever want to have a separate prompt or history file just for heredocs? I do).

  • Explore getting $&runinput or something similar into the shell and explore how the whole input system is changed/simplified by just using these new primitives together. I think we could probably shorten input.c by another ~200 lines and simplify the control flow by a lot. This PR and Up for discussion: Move the "main runtime" of es out of C and into es script #79 together almost completely remove the need for run_interactive in the shell -- I think that's quite interesting.

For completeness, I want to make sure to mention a couple bugs/limitations of the current implementation:

  • Calling %parse messes with the input's state, which can cause a surprising shell exit ($&parse {result ()}) or just slightly-wrong line numbers. I don't know if this is a "bug" or just a sharp edge of a powerful tool. If $&runinput were present then $&parse calls could be better insulated from parent REPLs.

  • It doesn't support parsing while you parse (that is, something like $&parse {$&parse}). This might seem like kind of a "no duh" limitation, but parsing strings can happen at unexpected times, like when first running a function from the environment. This limitation should be relaxed to "no nested parsing on a single input", but I need to work out if yacc even supports these kind of shenanigans (without extensions specific to specific implementations). In the worst case, we could define a "normal" parser and a "string" parser using -p such that we can at least call parsestring() inside of a $&parse.

  • $&read can't handle \0 bytes. This was already found in $&reading a NUL byte crashes the shell #93, but in this new setting it's more of a problem -- I had to comment out a trip.es test case to account for it.

What do folks think of all this? Promising?

@jpco
Copy link
Collaborator Author

jpco commented Mar 28, 2025

These test segfaults sure are disconcerting; I can't repro them with the same compilation flags on my computer. I'll keep poking.

@jpco jpco force-pushed the palloc branch 4 times, most recently from 159d11f to b8110e9 Compare March 29, 2025 19:11
@jpco
Copy link
Collaborator Author

jpco commented Apr 12, 2025

Under the assumption that people definitely do definitely want:

  • More read-time extensibility of some kind (if not necessarily this user-visible design), and
  • More flexibility for different libraries to be included in the shell,

I'm going to merge some of the non-user-visible pieces of this separately: in particular the core pspace stuff, and moving the LOCAL_GETENV stuff to var.c where it makes much more sense. I also merged %write-history since people were already on board with that change conceptually and I'd gotten it to a place where I like it standalone but it also composes well with these reader commands, and I re-opened the "fast-$&read" PR #107.

jpco and others added 2 commits April 14, 2025 07:40
I now (too late) understand what it was doing in the first place.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant