Up for discussion: User-provided commands for reading shell input #178

jpco · 2025-03-27T13:39:29Z

Another wacky change from the chicken ranch. I think this one is pretty exciting, though.

Here's what's included:

This PR adds a $&readline primitive.

The primitive itself isn't terribly interesting, it essentially does just what you'd expect - it takes a single optional prompt as its argument and its return semantics are the same as $&read. It also interacts with the readline history primitives and $&resetterminal as you would expect. However, it also (sort of) supports user-programmable file completion -- by uncommenting readline.c:218 and defining the following function, you can test it out:

fn %complete in {
  let (f = $in^*) if {access $f} {
    result $f
  } {
    result ()
  }
}

That's novel and interesting. But not exactly enough so to justify a whole new primitive just to have it. So what's the point of this primitive?

This PR modifies $&parse to take a "reader command" which it calls to fetch input.

So instead of $&parse $prompt, we do something like $&parse {$&readline $prompt} (what it really looks like is just below). $&parse calls the reader command at least once in order to fetch input to parse, but the number of calls exactly depends on the input itself, which is why it needs to be modeled this way. (In theory we could do a push-style $&parse primitive where we fetch input, feed it to $&parse, and then repeat depending on its output, but this is a more straightforward change from the prior state of things). When $&parse gets an empty list return value from its reader command, it understands that as an EOF and throws an eof exception.

So we added the $&readline primitive because now, that's how we use readline in the shell!

Doing this enables some impressive flexibility. For example, a %parse function which behaves like the existing %parse $prompt primitive looks something like:

fn %parse prompt {
	if %is-interactive {
		let (p = $prompt(1); input = ())
		let (code = <={$&parse {
			let (r = <={$&readline $p}) {
				input = $input $r
				p = $prompt(2)
				result $r
			}
		}}) {
			if {!~ $#fn-%write-history 0} {
				%write-history <={%flatten \n $input}
			}
			result $code
		}
	} {
		$&parse $&read
	}
}

But this is much more flexible and explicit than the existing setup. Some ways you could change this to suit your fancy:

%write-history each line individually rather than buffering the input and writing it all at the end (this would be like the pre Minor overhaul of history writing #65 behavior).
Use only a single $prompt for every input line, or use three different $prompt elements, or even dynamically generate $prompt for each individual line
Use something other than $&readline or $&read. Currently those are the only two reasonable options, but with this setup, a whole other line editing library is just a matter of calling a different reader command.
Do pre-processing on the read-in text before returning it. This can be used for lots of read-time behaviors, such as (in increasing order of bad-idea-ness):
- history expansion
- the equivalent of zsh's $KEYBOARD_HACK
- precedence-hacking to allow things like time foo | bar => time {foo | bar}
- "real" aliases which let you do weird things like alias fonk='echo blah |'

Note that the reader command is called with stdin set to the current shell command input (which, unlike other shells, is a bit different than the user's idea of stdin). If using a $&runinput primitive like the %run-file from #79, then the set of $&runinput, $&parse, and the "reader command" make a coherent "system" of functions, where an extremely minimal -- but just about complete, modulo handling the eof exception -- REPL would look like:

fn minimal-loop script {
	$&runinput $script {
		forever {
			<={$&parse $&read}
		}
	}
}

where: (a) $&runinput would set up a new command input and track certain values (line number, etc.), and (b) $&parse would interpret the input as shell commands, calling (c) $&read to actually read the input from the file set up by $&runinput. It's not entirely "orthogonal" to have the $&runinput and $&parse behaviors depend on each other this much (you can't properly read from the command input without $&parse, and you can't read an input to $&parse without a context set up by $&runinput), but given some of the complexities at play here, it seems fairly minimal -- and should extend fairly cleanly to other behaviors a user might want to add to get a "complex" input system.

Knowledgeable readers might be asking by now, how is this setup possible? Shell commands during parsing are inherently memory-buggy given the way the parser and GC interact -- this was mentioned way early on in the old mailing list (https://wryun.github.io/es-shell/mail-archive/msg00039.html). That's been solved here with the last (actually first, as far as implementation order went) bit:

This PR introduces a new, separate memory space, which I call "pspace" in the code, for building parse trees.

Originally, based in part on some comments in the mailing list, I thought I would need to replace the parser in order to play nice with GC'd memory (see both of my attempts, as well as #91). The problem is that replacing the parser is a fair amount of work, and parsing has so many tiny, fiddly behaviors (including a lot of allocations) that it's easy to add a heap of bugs just replacing the parser, before even getting to fixing the memory situation or developing the new input behaviors. So it went off "in the weeds" each time.

So to bypass that, what happens here is that allocations either in or used by the parser, instead of using gcalloc, use palloc, which allocates these pointers in "pspace", which is like normal GC space except it is 1. disjoint (no normal GCs touch pspace), and 2. explicitly collected at the end of a $&parse call, when the generated parse tree is retrieved. At that point the parse tree is treated as the only root of pspace and it is copied into GC space for normal use.

This was surprisingly simple to implement, and works shockingly well, given it's only ~200 new lines of code.

Not all of this PR is set in stone, but there are two particular aspects I think are very good and important:

First of all, the pspace change unblocks the entire notion of extensible "reading", in the sense that a shell is a "R"EPL. This is the foundational technical innovation of this PR, to let us try to play a little bit of "catch-up" with the fancy interactive abilities of basically every other shell these days. By itself, the palloc stuff is also a no-op for the user, and has ~no impact on either performance or memory usage as far as I can tell.

Second, I was able to pull all the readline logic out of the rest of the shell and collect it in readline.c. Localizing all the extra complexity of line editing libraries like this should make it far easier to manage. A possible direction to follow up here could be the "pluggable primitives" idea that has been floated off and on in the past, or at least some other way to more flexibly choose the line editing library that we like (rc has some precedent here, though it isn't based on primitives.)

In general, follow-ups I see to this would be:

Follow through on programmable completion and key binding in readline
Explore other line editing libraries and commands such as libedit/editline, linenoise/replxx, or even linecook (hello @injinj :). Or what about alternative frontends to es entirely? As part of this, figure out how to make it easier to "plug in" these different libraries at the C level.
Performance improvements -- using $&read for shell input makes a "seeking read" much more useful (Without the "seeking read", this PR slows down shell tests by ~9%, and with it, that slowdown disappears).
Support separate reading logic for heredocs (ever want to have a separate prompt or history file just for heredocs? I do).
Explore getting $&runinput or something similar into the shell and explore how the whole input system is changed/simplified by just using these new primitives together. I think we could probably shorten input.c by another ~200 lines and simplify the control flow by a lot. This PR and Up for discussion: Move the "main runtime" of es out of C and into es script #79 together almost completely remove the need for run_interactive in the shell -- I think that's quite interesting.

For completeness, I want to make sure to mention a couple bugs/limitations of the current implementation:

Calling %parse messes with the input's state, which can cause a surprising shell exit ($&parse {result ()}) or just slightly-wrong line numbers. I don't know if this is a "bug" or just a sharp edge of a powerful tool. If $&runinput were present then $&parse calls could be better insulated from parent REPLs.
It doesn't support parsing while you parse (that is, something like $&parse {$&parse}). This might seem like kind of a "no duh" limitation, but parsing strings can happen at unexpected times, like when first running a function from the environment. This limitation should be relaxed to "no nested parsing on a single input", but I need to work out if yacc even supports these kind of shenanigans (without extensions specific to specific implementations). In the worst case, we could define a "normal" parser and a "string" parser using -p such that we can at least call parsestring() inside of a $&parse.
$&read can't handle \0 bytes. This was already found in $&reading a NUL byte crashes the shell #93, but in this new setting it's more of a problem -- I had to comment out a trip.es test case to account for it.

What do folks think of all this? Promising?

this keeps the in-progress parse tree separate from the rest of the shell memory

This is terrible. We should probably make a new recursive-copy function to put in Tags instead.

jpco · 2025-03-28T15:17:28Z

These test segfaults sure are disconcerting; I can't repro them with the same compilation flags on my computer. I'll keep poking.

jpco · 2025-04-12T15:57:24Z

Under the assumption that people definitely do definitely want:

More read-time extensibility of some kind (if not necessarily this user-visible design), and
More flexibility for different libraries to be included in the shell,

I'm going to merge some of the non-user-visible pieces of this separately: in particular the core pspace stuff, and moving the LOCAL_GETENV stuff to var.c where it makes much more sense. I also merged %write-history since people were already on board with that change conceptually and I'd gotten it to a place where I like it standalone but it also composes well with these reader commands, and I re-opened the "fast-$&read" PR #107.

I now (too late) understand what it was doing in the first place.

jpco added 12 commits March 24, 2025 21:50

Small simplification of gc functions

084f0ac

working (?!) implementation of palloc and friends

cec72d5

this keeps the in-progress parse tree separate from the rest of the shell memory

only scan the necessary pspace pointers while copying into gcspace

ea81fd3

This is terrible. We should probably make a new recursive-copy function to put in Tags instead.

move LOCAL_GETENV logic out of input.c into var.c

ba9b099

Pull in %write-history from wryun#65

b8b31bd

Start shoving readline logic into readline.c

2eb69a1

Shove yet more readline logic into readline.c

dd2fef1

initial sketch of a $&readline primitive

1a4923b

Demo user-defineable completion function

288fdd8

use hook functions for shell input

2c05f72

Make pseal(NULL) work okay

ec430cb

Li'l bug fixes

b8110e9

jpco force-pushed the palloc branch 4 times, most recently from 159d11f to b8110e9 Compare March 29, 2025 19:11

jpco added 4 commits March 30, 2025 20:23

Add (and fix) tests for new parser behavior

8c3ae10

Try to isolate test execution better

a679134

Small test improvements

eea5e0c

fix obvious little segfault bug

0fae344

jpco mentioned this pull request Apr 12, 2025

%read in chunks for lseek()able files. #107

Open

jpco mentioned this pull request Apr 12, 2025

Move LOCAL_GETENV handling to var.c where the other env handling lives. #190

Merged

jpco and others added 2 commits April 14, 2025 07:40

Restore previous scanspace logic.

1c00488

I now (too late) understand what it was doing in the first place.

Merge remote-tracking branch 'upstream/master' into palloc

d3f888f

This was referenced Apr 23, 2025

Add special-case readline completion logic for first-position #202

Open

Allocate memory from the parser in "pspace", not GC space #205

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Up for discussion: User-provided commands for reading shell input #178

Up for discussion: User-provided commands for reading shell input #178

jpco commented Mar 27, 2025 •

edited

Loading

jpco commented Mar 28, 2025

jpco commented Apr 12, 2025

Up for discussion: User-provided commands for reading shell input #178

Are you sure you want to change the base?

Up for discussion: User-provided commands for reading shell input #178

Conversation

jpco commented Mar 27, 2025 • edited Loading

jpco commented Mar 28, 2025

jpco commented Apr 12, 2025

jpco commented Mar 27, 2025 •

edited

Loading