Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new AsyncContext.callingContext() API #77

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jridgewell
Copy link
Member

@jridgewell jridgewell commented Apr 3, 2024

This is a small helper function that allows you to temporarily restore the previous ambient context state (like a snapshot.run(fn) call). This allows us continue with registration-time being the recommended behavior for every API, but allows call-time usecases in the cases that it's necessary.

const asyncVar = new AsyncContext.Variable();

const obj = new EventEmitter();

asyncVar.run("registration", () => {
  obj.on("foo", () => {
    // EventEmitter restored the registration time context before
    // invoking our callback. If the callback wanted to get the context
    // active during the `emit()` call, it would have to receive a
    // Snapshot instance passed by the caller.
    console.log(asyncVar.get()); // => 'registration'

    // But with `AsyncContext.callingContext()`, we're able to restore
    // the caller's context without changing EventEmitter's API.
    // EventEmitter can continue to assume that registration is default
    // that is most useful to developers.
    AsyncContext.callingContext(() => {
      console.log(asyncVar.get()); // => 'call'
    });
  });
});

asyncVar.run("call", () => {
  obj.emit("foo");
});

In yesterday's meeting, we identified a few general cases:

  • Async queueing APIs (like setTimeout())
    • These APIs are registration-time, and there is no useful call-time context
  • Eventing APIs that can invoke sync (like EventEmitter/EventTarget) or async
    • When it's being invoked sync (obj.emit('foo') or el.dispatchEvent(click)), either registration-time or call-time could be needed.
    • When it's triggered async (user click interaction, out-of-band loading), then registration-time is appropriate
  • unhandledrejection

This API allows us to handle all use cases, without requiring extensive API changes (eg, accepting an options bag that allows the dev to choose at registration time).


This also allows us to settle #18. Async/Generators will capture their init-time context and restore it during every call. But, the generator body itself can restore the call-time context easily:

function* gen() {
  console.log(asyncVar.get()); // => 'init'

  yield 1;

  // Generators and AsyncGenerators always restore the context that was
  // active when they were initialized.
  console.log(asyncVar.get()); // => 'init'

  // The generator can easily restore the calling context's state without
  // requiring the caller to do anything special.
  AsyncContext.callingContext(() => {
    console.log(asyncVar.get()); // => 'second'
  });
}

const it = asyncVar.run("init", () => {
  return gen();
});

asyncVar.run("first", () => it.next());
asyncVar.run("second", () => it.next());

One thing we haven't discussed is what to do with Promises. @mhofman had requested being able to receive both the resolve() call-time context and the p.then(fn) registration time context when invoking fn. With a bit of extra spec work, we could expose that with this same API. I just haven't done the work to propagate the call-time through the Promise reaction jobs yet.

@Qard
Copy link

Qard commented Apr 3, 2024

Microsoft attempted to describe a similar formalization of context in this two paths scenario many years ago which sadly didn't go very far unfortunately, but I do think it's a good idea.

Personally I'm wondering if we actually need to restore the context for a window or just be able to read the context. We could alternatively just have a different getter rather than needing the extra closure:

const asyncVar = new AsyncContext.Variable();

const obj = new EventEmitter();

asyncVar.run("registration", () => {
  obj.on("foo", () => {
    // EventEmitter restored the registration time context before
    // invoking our callback. If the callback wanted to get the context
    // active during the `emit()` call, it would have to receive a
    // Snapshot instance passed by the caller.
    console.log(asyncVar.get()); // => 'registration'

    // But with `store.getCallingContext()`, we're able to restore
    // the caller's context without changing EventEmitter's API.
    // EventEmitter can continue to assume that registration is default
    // that is most useful to developers.
    console.log(asyncVar.getCallingContext()); // => 'call'
  });
});

asyncVar.run("call", () => {
  obj.emit("foo");
});

I'm wondering if this satisfies the needs or if there's a specific scenario for actually restoring the context that you have in mind? 🤔

@Flarna
Copy link

Flarna commented Apr 3, 2024

If there are more tools (e.g. OTel tracing, some security monitoring and some context enabling logger,...) around in an application it might happen that there are nested .run() calls.
As a result I doubt a static AsyncContext.callingContext might fit here in all these cases.

@littledan
Copy link
Member

I'm very skeptical of this proposal as a general solution.

  • There are more than 2 relevant contexts as @Flarna pointed out. This means I don't know how the API itself should look or be implemented
  • This proposal would not save us any work on the integration into environments--we still need to find a good default answer which will be used most of the time. And for tasks that are scheduled by the host (so there's no enclosing JS code), they'll now have to provide multiple answers--making integration even more complicated.
  • I'm really not sure how anyone using AsyncContext will decide whether they want to refer to currentContext() or not. The current snapshot is supposed to be the current context. What is it that they're opting into/out of? How should this be understood?

The big example here was events, where there are two relevant contexts: registration time (the only option when the system is the one dispatching it) or calling time (when explicitly dispatching events). These are well-defined.

One idea for handling events sent from JS: making the ambient snapshot be registration time, but passing the call-time snapshot (or undefined when it's missing) as a property of the event.

@shicks
Copy link
Contributor

shicks commented Apr 3, 2024

@Qard

I'm wondering if this satisfies the needs or if there's a specific scenario for actually restoring the context that you have in mind? 🤔

See below. If you only expose it at get-time, then you end up with nesting problems and would likely need more than just the last caller. Also, if you're not the one reading from the context, but are calling an API that does so, then it would need to know which context to read from, whereas restoring the context allows the transitive code to not worry about it because you can ensure it just runs in the correct context from the start.


@Flarna

If there are more tools (e.g. OTel tracing, some security monitoring and some context enabling logger,...) around in an application it might happen that there are nested .run() calls. As a result I doubt a static AsyncContext.callingContext might fit here in all these cases.

I think the idea is that if you need it, you use it immediately inside the function body, rather than expecting some more deeply-transitive function to check it. The point is that it provides plumbing that can "undo" the (usually correct) registration-context default in cases where that's necessary. Just like Snapshot is awkward to use directly, I expect something akin to wrap to show up for this:

function wrapWithCallingContext(fn) {
  return function() {
    return AsyncContext.callingContext(() => fn.apply(this, arguments));
  }
}

Now if you've got a scheduler that you know is going to call your callback in the registration context but it needed to access the calling context, you can instead schedule(wrapWithCallingContext(callback)) and it does the right thing.

I also don't think calls to run are particularly relevant here - if you need to preserve the pre-run context, you can take a Snapshot at the appropriate time. What we're talking about is when context changes due to a snapshot being restored (either implicitly or explicitly).


@littledan

I'm very skeptical of this proposal as a general solution.

  • There are more than 2 relevant contexts as @Flarna pointed out. This means I don't know how the API itself should look or be implemented

I think it's pretty straightforward: when the current context ends, the engine knows what context it's going to restore. That's the context that callingContext should expose. If you think about it from a polyfill perspective, it's basically just

Snapshot.prototype.run = function(fn, ...args) {
  const savedCurrentContext = $currentContext;
  const savedCallingContext = $callingContext;  // (added)
  $callingContext = $currentContext;  // (added)
  $currentContext = this.context;
  try {
    return fn.apply(null, args);
  } finally {
    $currentContext = savedCurrentContext;
    $callingContext = savedCallingContext; // (added)
  }
}

I don't know whether a similar change should be made to Variable.prototype.run, but either way the analogue is clear.

  • This proposal would not save us any work on the integration into environments--we still need to find a good default answer which will be used most of the time. And for tasks that are scheduled by the host (so there's no enclosing JS code), they'll now have to provide multiple answers--making integration even more complicated.

You're right that this doesn't save work - good defaults are definitely still important. I don't think it will produce significant extra work. The only cases where we might possibly choose anything other than the obvious "immediately surrounding caller" would be cases where there's already something interesting enough going on that we'd already not be using the registration context - so we're already thinking about the relevant non-registration context. In those cases, the calling context should probably just be null, and the interesting third context should be the default.

  • I'm really not sure how anyone using AsyncContext will decide whether they want to refer to currentContext() or not. The current snapshot is supposed to be the current context. What is it that they're opting into/out of? How should this be understood?

Allow me to turn this question around somewhat. As the proposal stands today, this is already an issue: if we're talking about making a decision for every callback-accepting API in the web platform, then we will likely end up with some functions going with registration context and some going with calling (or some other) context. Even if every builtin API uses registration context, there will inevitably be userland scheduling libraries that end up defaulting into using the calling context (or something even less relevant) because that's what you get without updating the library to account for AsyncContext. But we recognize that this choice may not always be what the user actually needs, and that's why we provide Snapshot and wrap in the first place - so that developers can get the context they need when the default isn't right. But as it stands, there's a weird asymmetry where you can wrap a calling-context API to run your callback in the registration context (or some other pre-existing context available at registration time), but there's no way to go the other direction. If we believe there are APIs that make sense to default to calling context then we're acknowledging that that context is sometimes correct - yet it becomes effectively unavailable if the API defaults to registration context. I'm in complete agreement that this is the correct default in many cases, but when that doesn't work for a particular usage, there needs to be some sort of escape hatch.

The big example here was events, where there are two relevant contexts: registration time (the only option when the system is the one dispatching it) or calling time (when explicitly dispatching events). These are well-defined.

One idea for handling events sent from JS: making the ambient snapshot be registration time, but passing the call-time snapshot (or undefined when it's missing) as a property of the event.

I disagree with this approach: it's a much more invasive solution that requires updating one or more disparate APIs and doesn't solve the general case, whereas keeping the solution local to AsyncContext is both more general and more self-contained.

@shicks
Copy link
Contributor

shicks commented Apr 3, 2024

@jridgewell

One thing we haven't discussed is what to do with Promises. @mhofman had requested being able to receive both the resolve() call-time context and the p.then(fn) registration time context when invoking fn. With a bit of extra spec work, we could expose that with this same API. I just haven't done the work to propagate the call-time through the Promise reaction jobs yet.

Is this another case where there's more than two possibilities? What about the following?

const p1 = v.run(1, () => Promise.resolve());
const p2 = v.run(2, () => p1.then(() => {
  // main context: 2; calling context: 1?
  return v.run(3, () => Promise.resolve());
}));
const p3 = p2.then(() => {
  // main context: null; calling context: 2 or 3?
});

May be worth following up in a separate issue so as not to derail this discussion.

@Flarna
Copy link

Flarna commented Apr 3, 2024

I think the idea is that if you need it, you use it immediately inside the function body, rather than expecting some more deeply-transitive function to check it.

EventEmitters are often used deep inside libraries. There are a lot case where registration/calling happens quite distant from code in your control. There can be several layers/wrappers between the emitter and the handler.

I also don't think calls to run are particularly relevant here - if you need to preserve the pre-run context, you can take a Snapshot at the appropriate time. What we're talking about is when context changes due to a snapshot being restored (either implicitly or explicitly).

I don't understand this. A snapshot restore is done via snapshot.run() which is not really different to asyncVar.run().

I don't think there should be a difference if I call emit within a run callback of a snapshot or within a run callback providing a new store. If someone calls run it switches the context. This new active context is also that one propagated if new async operations are initiated.

@jridgewell
Copy link
Member Author

@shicks

Is this another case where there's more than two possibilities?

Basically. There are 3 possibilities:

  1. .then(fn) registration time (absolutely the default)
  2. resolve() call-time (deferred.resolve() or the internal resolve() of a .then(() => promise))
  3. Empty execution stack call-time, which is what AsyncContext.callingContext() will currently use.

What about the following?

I think 2? But I haven't put much thought into it besides "it's possible".


EventEmitters are often used deep inside libraries. There are a lot case where registration/calling happens quite distant from code in your control. There can be several layers/wrappers between the emitter and the handler.

I think this doesn't apply? The cases that I imagine is first party code, where the emitter and the handler are controlled by the same dev. It could even be inside library, where the library is in control of the emitter and handler.

The default should be registration time (and that's directly under the control of the dev when using 3p code). If they're using 3p code, then the calling context may or may not be relevant depending on if that code has snapshotted something unexpected. But this isn't any different than the code today where you only have access to the registration–time context.

A snapshot restore is done via snapshot.run() which is not really different to asyncVar.run().

This API is specifically to help with registration-time or call-time problem. So it's directly tied to snapshot.run(). I don't think asyncVar.run() is relevant here because it's not used to restore the registration-time context at a later point.

@shicks
Copy link
Contributor

shicks commented Apr 3, 2024

This API is specifically to help with registration-time or call-time problem. So it's directly tied to snapshot.run(). I don't think asyncVar.run() is relevant here because it's not used to restore the registration-time context at a later point.

I could see going either way on this: treat them the same for consistency, or don't do anything for asyncVar.run because it's less relevant. It doesn't make a ton of difference in my opinion, as long as there's some workable way to restore the calling context after a snapshot is swapped.

@Flarna
Copy link

Flarna commented Apr 4, 2024

I think this doesn't apply? The cases that I imagine is first party code, where the emitter and the handler are controlled by the same dev. It could even be inside library, where the library is in control of the emitter and handler.

ok. for this very specific scenario it might be fine. But docs should clearly state that this API is only useful is such special cases.

I'm also not sure if registration time is the correct time to capture context for EventEmitters in general.
e.g. OpenTelemetry instrumentation for node.js http module instrument the HTTP request/response objects (which are EventEmitters internally) to bind their context on the emitter more or less on emitter creation time (actually when they first get it).

@Qard
Copy link

Qard commented Apr 8, 2024

Making this statically-scoped further suffers from the grafting problem. If we follow the longest path through the internal machinery of everything, which loses relevance in cases like connection pools, we can graft around those internals to produce a more optimal graph for the use case of following user intent. However by applying that graft globally we risk breaking store users which don't want that grafting decision and we lose the possibility to restore that branch of the tree. With callingContext(...) we sort of resolve that problem, but only if we also avoid mutating the graph globally.

I think this is another case where we need to think more at instance scope rather than static scope. As with my prior suggestion of having an instance-scoped bind(...), it would be advisable to have an instance-scoped way to follow the calling context path. In fact, this sort of case is specifically why I made that previous suggestion. Instance-scoped bind(...) and callingContext(...) are actually solving the same problem, just from different sides of the problem.

By always following the internal path and providing tools to reduce the graph where needed we provide a graph which can always be reduced to the more optimal graph for a given case, but we can't do the reverse if a path taken has already orphaned a branch. However the callingContext(...) concept, if done right, which by that I mean instance-scoped, can actually do both directions, allow us to define a suggested path while still enabling following that opposite path where necessary.

The following graph shows an example of a scenario where both interpretations of the graph may be valid depending on the desire of the store owner. This graph corresponds to a call to fs.readFile(name, callback) in Node.js.

context-flow-orphaning

The solid line represents the user-facing execution path which may be the path a user desires to take for context management of application data. However there's a bunch of inner calls which actually led to the callback being called and each of those calls could be setting their own context data. A tracing product may want to produce spans for that inner behaviour which flow back to the user code, but that link would be broken if context is forced to flow only via the solid line.

If viewing this problem with static scope we make a choice for all stores to follow--either the solid path or the dotted path. If we approach the problem with instance-scoped tools, we can follow either path per-store and the other user can either instance-scoped bind(...) in or instance-scoped callingContext(...) out to the other path.

I think perhaps I would change it slightly though to just return the snapshot it was in prior to the current snapshot restoring.

@littledan
Copy link
Member

If we want to focus on "instance-scoped" snapshots (#25), then we have an m x n problem that I see as really unsolvable programming model wise: If you have m variables, and n contexts where you queue a closure (so you need to AsyncContext.Snapshot.wrap it), then how is it possible for each of these things to know about the other to determine which one it should propagate? It seems unscalable, like for each variable you need to be an APM which monkey-patches all the modules in the world.

@jasnell
Copy link

jasnell commented Apr 8, 2024

Thinking through this a bit... there are two contexts to be concerned about here.

For EventEmitter/EventTarget:

  1. The context that is current when the event is dispatched, e.g. acval.run(2, () => et.dispatchEvent(ev)) (call context)
  2. The context that is current when the handler is registered, e.g. acval.run(3, () => et.on(() => {}) (registration context)

Both are important in different scenarios, and both may be important at the same time depending on use case. So if I am understanding the issue in this PR correctly, the question is how we should propagate all of them?

For this, based on experience with AsyncLocalStorage, I think it's best if, by default, the handler is invoked in the call context, or, in other words:

als.run(1, () => et.addEventListener('foo', () => {
  console.log(als.get());  // 2
});
als.run(2, () => et.dispatchEvent(new Event('foo')));

And if the code that is registering the event listener explicitly opts in to wanting the registration context preserved, they would use snapshot:

als.run(1, () => et.addEventListener('foo', AsyncContext.Snapshot.wrap(() => {
  console.log(als.get());  // 1
}));
als.run(2, () => et.dispatchEvent(new Event('foo')));

This makes using both options possible. If the registration context were the default, then there would otherwise be no actual way of preserving the call context without it being a more convoluted mess.

But what if we want both? That's certainly a tad trickier but not impossible using AsyncContext.Snapshot. Since the call context would be the default, we'd simply need to capture the AsyncContext.Snapshot and close over it in the event handler

als.run(1, () => {
  const snapshot = new AsyncContext.Snapshot();
  et.addEventListener('foo', () => {
    const callContextValue = als.get(); // 2
    snaphot.run(() => {
      const registrationContextValue = als.get(); // 1
    });
  });
});
als.run(2, () => et.dispatchEvent(new Event('foo')));

For an event emitter and EventTarget, I think this approach would make the most sense. Making the registration context the default here, and introducing AsyncContext.callingContext() would be a mistake.

For sync and async generators (e.g. function* {...}):

For a sync generator, the body of the generator itself should capture the async context that exists when the generator is created and should have no visibility into the async context of whatever code is calling next().

function* foo() {
  yield acval.get();
}

const gen = acval.run(123, () => foo());

acval.run(321, () => {
  console.log(gen.next().value);  // 123
});

@littledan
Copy link
Member

@jasnell What about events which are not dispatched from JS? There are still sometimes multiple answers. One example (where maybe the registration time is the only possibility): #22

@jasnell
Copy link

jasnell commented Apr 8, 2024

It likely just needs to be recognized that in some cases, events are going to be triggered from the null-context ... that is, where no context is in scope when the event is actually dispatched. In such cases, either there is no value or the object that is emitting the events (in the case of #22 the XMLHttpRequest) has to capture the AsyncContext.Snapshot when it is created and propagate that.

Looking at the example from #22,

function makeRequestThen(callback) {
  const req = new XMLHttpRequest();
  req.addEventListener('load', AsyncContext.Snapshot.wrap(() => callback(null, req.responseText)));
  req.addEventListener('error', AsyncContext.Snapshot.wrap((err) => callback(err)));
  req.open('GET', url);
  req.send();
}

@mcollina
Copy link

mcollina commented Apr 9, 2024

From a quick skim, I think adding two contexts will significantly increase the overhead of AsyncContext because it would potentially keep them alive longer, increasing memory consumption and GC work. The pattern in #77 (comment) makes sense to me.

@shicks
Copy link
Contributor

shicks commented Apr 10, 2024

@jasnell

For a sync generator, the body of the generator itself should capture the async context that exists when the generator is created and should have no visibility into the async context of whatever code is calling next().

function* foo() {
  yield acval.get();
}

const gen = acval.run(123, () => foo());

acval.run(321, () => {
  console.log(gen.next().value);  // 123
});

Can you elaborate on your reasoning here? As mentioned in #75 (comment), there are also cases where a generator wants to run each continuation in the next's calling context, and your suggestion would render that case impossible (without terrible workarounds). It's possible to write a wrapper to restore the init snapshot, but the reverse requires invasive changes to the generator body.

@jridgewell
Copy link
Member Author

@Flarna

I'm also not sure if registration time is the correct time to capture context for EventEmitters in general.
e.g. OpenTelemetry instrumentation for node.js http module instrument the HTTP request/response objects (which are EventEmitters internally) to bind their context on the emitter more or less on emitter creation time (actually when they first get it).

In my head, OpenTelemetry would instrument the http.createServer() method, and run a new context before invoking it:

const requestContext = new AsyncContext.Variable();

http.createServer = (handler) => {
  const server = new Server(); // or whatever;
  server.handlerRequest = () => {
    const { req, res } = internalStuffToCreateReqRes();
    requestContext.run({ req, res }, handler);
  };
};

It doesn't matter that req/res's event emitter binds at registration-time, because their registration-time is effectively the creation time's context.

Does this match what you're thinking?


@Qard

The solid line represents the user-facing execution path which may be the path a user desires to take for context management of application data. However there's a bunch of inner calls which actually led to the callback being called and each of those calls could be setting their own context data. A tracing product may want to produce spans for that inner behaviour which flow back to the user code, but that link would be broken if context is forced to flow only via the solid line.

I think this case is neatly handled by defaulting to registration-time, and using callingContext()? I'm imagining there's a openSpan() before the fs.readFile() call, and an closeSpan() within the handler:

function handleRequest() {
  const span = openSpan();
  fs.readFile('foo.txt', (err, data) => {
    closeSpan();
  });
}

closeSpan() could itself use callingContext() to retrieve data provided by open()/read()/close() calls.

But from the dev's perspective, I think registration-time is still correct. I don't care how OTel adds data, just that any context data that I put there is available in my callback once the file is read. Yes, they could wrap, but it seems better ergonomics to do it by default.


@jasnell

It likely just needs to be recognized that in some cases, events are going to be triggered from the null-context ... that is, where no context is in scope when the event is actually dispatched.

I think that if we make the null-context something that developers routinely encounter, then we've really failed on the ergonomics. What I imagine is that nearly everything is going to be calling Snapshot.wrap(…), to the point where it becomes a cargo-cult style.

And this will lead to friction. We reintroduced Snapshot.wrap(…) to make it more ergonomic for devs to pass one-off callbacks to libraries (and your code sample uses it!). But if you wrap a callback, how do you get the calling context back to solve @Qard's example? The ergonomic choice breaks it, and to solve the problem we have to go back to new Snapshot() and running that within our callback to get the data the dev put there. Or we'd end up reintroducing callingContext() anyways.


@mcollina

From a quick skim, I think adding two contexts will significantly increase the overhead of AsyncContext because it would potentially keep them alive longer, increasing memory consumption and GC work.

Can you explain where you see them being kept alive longer? callingContext() only restores the context that is already being kept alive by an enclosing snapshot.run(…) call (because the snapshot has to restore that previous context when run() exits).

@jasnell
Copy link

jasnell commented Apr 10, 2024

@jridgewell :

I think that if we make the null-context something that developers routinely encounter, then we've really failed on the ergonomics. What I imagine is that nearly everything is going to be calling Snapshot.wrap(…), to the point where it becomes a cargo-cult style.

I'm not sure how we can reasonably avoid the null-context entirely. But the point is well taken,

Let's see if we can tease things out a bit more. Let's imagine a case where we actually have multiple AsyncContext.Variable instances at play.

const requestId = new AsyncContext.Variable();
const processId = new AsyncContext.Variable();

function log(msg) {
  console.log(`${processId.get()}::${requestId.get()} - ${msg}`);
}

processId.run(123, () => {
  addEventListener('error', (event) => {
    log(`There was an error: ${event.error.message}`);
    return true;
  });
});

requestId.run('abc', () => {
  reportError(new Error('boom'));
});

What should log(...) print in this case? We have two variables, one set in the registration context, another in the calling context. How would log(...) know that it needs to grab callingContext(...) for requestId? In order to make this ergonomic, we'd need either both variables to be entered at registration or both to be entered at call.

... But if you wrap a callback, how do you get the calling context back to solve @Qard's example?

Perhaps we're thinking about it the wrong way. Consider my example where we access both the calling context and registration context:

als.run(1, () => {
  const snapshot = new AsyncContext.Snapshot();
  et.addEventListener('foo', () => {
    const callContextValue = als.get(); // 2
    snaphot.run(() => {
      const registrationContextValue = als.get(); // 1
    });
  });
});
als.run(2, () => et.dispatchEvent(new Event('foo')));

What is clear within the event handler callback is that there is an initial context (in this case the calling context), and we enter a new context, effectively forming a stack. What if we modeled it as such?

als.run(1, () => {
  const snapshot = new AsyncContext.Snapshot();
  // Let's define this such that the snapshot is automatically taken when addEventListener
  // is called. Then, when we emit, we enter the registration context. The calling context will
  // naturally be the parent on the stack...
  et.addEventListener('foo', () => {
    console.log(als.get());  // 1  ... registration context
    context.log(als.parent.get());  // 2 ... calling context
  });
});
als.run(2, () => et.dispatchEvent(new Event('foo')));

@Qard
Copy link

Qard commented Apr 10, 2024

It's worth considering that if we always propagate by calling context by default and then bind around registration points (including within the language itself) then the calling context will always be synchronously available at the point when the bind would change the context and we therefore only need to capture the value when the bind occurs and store it only for the synchronous window of the scope function. In any nested async code that would no longer be the calling context, something else will become the calling context.

I don't think it's nearly as expensive to hold as people are thinking.

@shicks
Copy link
Contributor

shicks commented Apr 11, 2024

IIUC,

async function* gen() {
  console.log(AsyncContext.callingContext(() => v.get()));
  yield;
  console.log(AsyncContext.callingContext(() => v.get()));
  await 0;
  console.log(AsyncContext.callingContext(() => v.get()));
}
const it = v.run(1, () => gen());
await v.run(2, () => it.next()); // logs 2
await v.run(3, () => it.next()); // logs 3, null

The fact that callingContext is more or less meaningless after an await in an async generator is maybe somewhat concerning?

@Flarna
Copy link

Flarna commented Apr 11, 2024

It doesn't matter that req/res's event emitter binds at registration-time, because their registration-time is effectively the creation time's context.

Does this match what you're thinking?

The life cylce of a HTTP operation and all the inner operations (like DB requests triggered) is a bit more complex.
Anyhow, I think the easier to read samples above from @jasnell describe the problem also very well.

My main point is that EventEmitters are used in that a lot different ways that it's mostly not possible to select a single correct solution. e.g.

  • in HTTP use case above instances are created for exactly the lifetime of an "operation"
  • other cases use instances within an queue/shared connection/... where the instance is used for quite different "operations"

As a result I question why they should capture context on default just to force users to restore the original context to workaround the default.

Node for example offers a non capturing EventEmitter and a creation time capturing EventEmitterAsyncResource to allow users to choose.

@mcollina
Copy link

Can you explain where you see them being kept alive longer? callingContext() only restores the context that is already being kept alive by an enclosing snapshot.run(…) call (because the snapshot has to restore that previous context when run() exits).

@jridgewell consider the example of #77 (comment). callingContext() is essentially useless after an await, so the only thing users would do (or would ask us to do), is to keep it around for longer / essentially as the "parent" that caused the transition from one context to another.

Developers unfortunately like to "fork" the promise/context chain further, so I expect the calling context to always leak over its original lifetime

@Qard
Copy link

Qard commented Apr 11, 2024

The context would already flow to that point anyway though, unless you're proposing cutting off a branch at the start of a bind point, which would mean internals would all be context-less and unable to connect to anything useful. Referencing my graph in #77 (comment) that would mean breaking the link between fs.readFile and fs.open which would result in all that internal behaviour being untraceable as it would have no parent context to link. We could introduce an equivalent to the callingContext API for that other end of the bind graph reduction, but that would be storing pre-async-barrier to access post-async-barrier and so would be even more costly to retain the data to be able to escape from that scenario.

From both performance and flexibility perspective the best possible choice is to always follow call context path and then allow snapshot restores to bind around internal paths where desired but with an escape hatch for individual stores to restore the calling context path for their case. This also means the calling context only needs to be stored for the cases where a bind graph reduction is applied as every other case the calling context and the stored context would be the same.

Internals can do snapshot restores by default around cases where it makes sense, but any internal paths, where observable, would continue to follow the calling context and so would not have an unexpected null context. It would only have an immediate switch at the point of the snapshot restore, which can be used to capture the current context as the calling context for the sync window of that snapshot restore.

@jridgewell
Copy link
Member Author

@jasnell

What should log(...) print in this case? We have two variables, one set in the registration context, another in the calling context. How would log(...) know that it needs to grab callingContext(...) for requestId? In order to make this ergonomic, we'd need either both variables to be entered at registration or both to be entered at call.

If we're imagining that log should not need to change (I agree), it seems like this falls on the error handler to stitch these together? But this scenario seems specific because we're not running the request handlers within the scope of the processId.run().

(This might be a case where error is a special global event listener like unhandledrejection, but that would mean you need you'd need to get the process id some other way…)

... But if you wrap a callback, how do you get the calling context back to solve @Qard's example?
Perhaps we're thinking about it the wrong way. Consider my example where we access both the calling context and registration context:

Both of these code samples work fine, because the first is using new Snapshot() and not Snapshot.wrap (and so can control when it enters the registration context), and the second is using essentially callingContext().

But my scenario from #77 (comment) was that devs will reach for Snapshot.wrap(…) because it is easy and ergonomic. And if this becomes cargo-cult style to wrap all your callbacks (that way the data the devs set is always propagated), then we still need callingContext() in some form so that we can retrieve call data.


@shicks

The fact that callingContext is more or less meaningless after an await in an async generator is maybe somewhat concerning?

Oh, I missed that. That is concerning, I would expect the final log to be 3.


@Flarna

My main point is that EventEmitters are used in that a lot different ways that it's mostly not possible to select a single correct solution. e.g.

Is it known when the event emitter is constructed whether it will always be registration or call time?

As a result I question why they should capture context on default just to force users to restore the original context to workaround the default.

Because the registration-time one has the dev's data from when the called on or addEventListener, and the other's data is unknown. We'll need to switch contexts in one direction or the other, so I think the default should be the one with the dev's data.


@Qard

The context would already flow to that point anyway though, unless you're proposing cutting off a branch at the start of a bind point, which would mean internals would all be context-less and unable to connect to anything useful.

My example was a bad because I didn't nest the callbacks. openSpan(() => { /* read here */ }).


It seems everyone is arguing for call-context by default. We'll need to discuss this in the next meeting.

The issue is that we have #22 arguing that switching from a fetch promise to using xhr.send() event listeners shouldn't cause a refactoring hazard, and that only works if we use registration by default. We'll need decide if we can get past this concern from committee delegates.

@jridgewell
Copy link
Member Author

I'm just now realizing that #22 is handled if send()'s call-time context is used when invoking the event handlers. Maybe call-time isn't an issue at all.

@Qard
Copy link

Qard commented Apr 11, 2024

I think the conversation around call-time vs registration time is a bit fuzzy and could use some clarification.

Call-based context is technically what you have by default if you don't do anything to manage propagation. Data "flows" in that way because the application execution itself flows in that way and the data doesn't change. Registration time is simply the act of capturing context at some point and swapping out the state that already flowed along with the calling path to begin with and replacing that with the value capture earlier. Capturing context and restoring it around registration time flows makes perfect sense in certain cases.

What I'm saying is that if access to calling context is done right the argument of which one is "correct" no longer matters because you can always just follow calling context out of a bound context or manually bind a context as-needed. Accessing calling context is essentially a way to undo the context you just restored and return to the one you would have been in if not for the bind, which means both paths are accessible and the "correct" path becomes a purely user experience default, not a critical decider of what is possible to do with the graph.

@shicks
Copy link
Contributor

shicks commented Apr 12, 2024

@Qard

What I'm saying is that if access to calling context is done right the argument of which one is "correct" no longer matters because you can always just follow calling context out of a bound context or manually bind a context as-needed.

I saw your image earlier but didn't exactly follow it (I don't have a strong Node.js background). What stood out to me from your earlier post was

If viewing this problem with static scope we make a choice for all stores to follow--either the solid path or the dotted path. If we approach the problem with instance-scoped tools, we can follow either path per-store and the other user can either instance-scoped bind(...) in or instance-scoped callingContext(...) out to the other path.

I understood "instance-scoped bind" to mean that each AsyncContext.Variable would provide its own wrap method, rather than binding/wrapping all variables at once - but this seems infeasible because one of the points of this whole thing is that intermediate code doesn't need to be aware of all the Variables, but can implicitly propagate everything, and it just "does the right thing". So as I understand it, there really needs to be a single static wrap function that scheduling libraries can call without any awareness of what Variables are being wrapped.

That said, I think the question of how much a userland scheduler should wrap does come down to details of how/why it's doing the scheduling. Imagine a simple thread pool that takes async functions and limits the number of pending promises - this one should absolutely go with registration-time context for all variables, since call-time will be essentially meaningless. On the other hand, imagine a data production pipeline where one registers a function for each step in the pipeline (e.g. computed signals). In that case, tracing or cancellation would both benefit from propagating the call-time context. There's clearly no one-size-fits-all answer here, and even at the Variable level, I don't think it's possible to know whether any given scheduler should be wrapping it.


I'm intrigued by the idea that both paths in the graph might be available, but I'm skeptical whether that's at all feasible in terms of implementation. Can it be done without significantly increasing the runtime cost (in particular, on folks who don't actually use the thing) or sacrificing the general usability (as demonstrated above)?

@Qard
Copy link

Qard commented Apr 12, 2024

Having instance-scoped bind doesn't mean we remove static bind. We should have both and generally path decisions will be made statically. The instance-scoped version would exist as an escape hatch for the handful of scenarios where the correct path is use-case dependent and a particular store user disagrees. They can, with some additional effort, follow a different path where preferred.

An instance-scoped bind lets them route around internal paths they don't want to follow, like the implementation details of a connection pool. An instance-scoped calling context provides the ability to go the other direction where a bind has been applied statically and a particular store disagrees with that default selection they can return to the internal path which was orphaned when the snapshot was restored.

As for feasibility and runtime cost, it absolutely can be done as it has been done before with AsyncLocalStorage. I have put significant effort into ensuring it is capable of these things as the primary use case it targets is observability providers which need this capability. To reach calling context all you need to do is have a snapshot of the state before a bound snapshot is restored so you can return to that snapshot, effectively undoing the bind for a limited scope.

@shicks
Copy link
Contributor

shicks commented Apr 12, 2024

@Qard

An instance-scoped calling context provides the ability to go the other direction where a bind has been applied statically and a particular store disagrees with that default selection they can return to the internal path which was orphaned when the snapshot was restored.

Can you provide some example code showing what this (would/could/does) look like? In my imagination you're stuck digging around a complex graph, which seems infeasible, so I think I'm missing your point. To put it concretely, suppose I have the following scheduler outside of my control:

class Scheduler {
  callbacks = [];
  register(callback) {
    this.callbacks.push(AsyncContext.Snapshot.wrap(callback));
  }
  call() {
    this.callbacks.forEach(fn => fn());
  }
}

Now I want to call register in such a way as to get the calling context inside the registered function:

scheduler.register(() => console.log(v.get()));

How do I change that register call such that it logs the variable from the context that called call() rather than the registration context? @jasnell suggested a v.parent.get() - is that (analogous to) what you're talking about, or do you have something else in mind? What would it look like if there's additional indirection (with more vars being set in between). IIUC you're suggesting we can traverse multiple graph edges - what specifically would that look like?

@Qard
Copy link

Qard commented Apr 12, 2024

No, not multiple graph edges. We just need one level up to restore the orphaned branch because, if it's following call context by default it will have already propagated through that branch on its own anyway. In the graph I posted earlier the fs.readFile would propagated through the dotted line branch all the way up until the callback to fs.readFile is reached as a snapshot restore will swap out the context that flowed through the dotted path for the one which it captured at the start of the fs.readFile call. Capturing a snapshot will not cause the fs.readFile to stop propagating by call path through that branch up until that point, it will only stop propagating at the exact moment it goes to restore the snapshot. So at that exact moment you can just get whatever the current context is and that will be the calling context.

If there are multiple levels of binds then it would just be up to the user to handle each to follow the path it wants to follow. There's no reasonable way to describe the full path a context propagation should take, it needs to be handled on a case-by-case basis. This is why we define a best-effort default path which propagates automatically and then provide tools to bind or restore away from that path.

In that scheduler case you use AsyncContext.Snapshot.wrap(...) which is essentially something like:

function wrap(fn) {
  const snapshot = new AsyncContext.Snapshot()
  return (...args) => snapshot.run(fn, ...args)
}

This does nothing to change the propagation behaviour of the async tasks scheduled within the call receiving this wrapped function. All it does is replace what has already been propagated when that snapshot.run(...) is called. Inside that snapshot.run(...) you can just capture the current context before running the snapshot, which we already do to restore the context after it finishes running, and use that prior context value to get the state of the store in the calling context path from that prior snapshot.

I think people are expecting this to be a lot more complicated and expensive than it actually needs to be.

@littledan
Copy link
Member

I'd prefer that we adopt the direction described in #100 where we instead provide explicit APIs to access certain snapshots, rather than try to define a general one as in this PR.

@Qard
Copy link

Qard commented Jul 23, 2024

That doesn't help APMs though as we would then still need to patch everything to get at those context objects and flow them ourselves when they're passed around on event objects. We need an interface decoupled from the execution itself to be able to know what contexts are current. For example, say we instrument something that somewhere internally deals with events, and we instrument something that would trigger in an event handler, but we don't have any specific awareness of the event emitter itself we would have no way to establish the link between those two points unless the event object was passed directly through to that inner API, which is generally not the case. This sort of scenario is the whole point of context in the first place--to be able to pass data through the execution graph without needing explicit access to anything in particular between the two points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants