Use Proxy to allow for arbitrary property/array access #96

Draft
wants to merge 4 commits into
base: liam/future-types

@liamgriffiths liamgriffiths commented Jul 16, 2024

Warning

This branch is quite experimental and we're still working through its implications.

This branch is a follow-up to #93. The goal is to allow arbitrary property access (and array indexing) on Future values that are defined by the user.

With this implementation, you can do the following:

import { Substrate, Box, ComputeText, ComputeJSON, sb } from "substrate";

async function main() {
  const SUBSTRATE_API_KEY = process.env["SUBSTRATE_API_KEY"];

  const substrate = new Substrate({
    apiKey: SUBSTRATE_API_KEY,
  });

  const numbers = new Box({
    value: [0, 1],
  });

  const latin = new Box({
    value: ["a", "b"],
  });

  const greek = new Box({
    value: {
      a: "α",
      b: "β",
    },
  });

  const proxyAccessor = new Box({ value: { property: "text" } });

  const d = new ComputeText({
    prompt: "What is the character for the Latin 'D' in the Cyrillic alphabet? (just tell me the character only)",
    max_tokens: 1,
  });

  const three = new ComputeText({
    prompt: "What number comes after '2'? (just tell me the character only)",
    max_tokens: 1,
  });
  const number3 = sb.jq<number>(three.future, ".text | tonumber");

  const hebrew = new ComputeJSON({
    prompt: "what are the characters of the Hebrew alphabet (in unicode, eg. א )?",
    json_schema: {
      type: "object",
      properties: {
        characters: {
          type: "array",
          items: {
            type: "string",
            description: "single character",
          }
        }
      }
    }
  });

  const result = new Box({
    value: {
      a: latin.future.value[numbers.future.value[0]],
      b: greek.future.value[latin.future.value[1]],
      c: hebrew.future.json_object.characters[number3 as any],
      d: sb.get<string>(d.future, proxyAccessor.future.value.property),
      abcd: sb.concat(
        greek.future.value.a,
        greek.future.value.b,
        hebrew.future.json_object.characters[number3 as any],
        sb.get<string>(d.future, proxyAccessor.future.value.property),
      ),
    },
  });

  const res = await substrate.run(result);
  console.log(res.get(result));
}
main();

In the example above, Box allows the user to define some arbitrary structure, so the Future values that are accessed from it will have the shape described by the input object.

These Future objects are implemented using a Proxy that intercepts property accessors and produces new Future values. Each new Future is wrapped in a Proxy again, which lets us access further properties in these structures.
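To make the recursive wrapping concrete, here is a minimal sketch. `SketchFuture` and `proxied` are hypothetical names for illustration only, and the sketch models just the accessor path, not the SDK's actual Future class or directives.

```typescript
// Hypothetical stand-in for the SDK's Future; only the accessor path is modeled.
class SketchFuture {
  constructor(readonly path: ReadonlyArray<string> = []) {}
}

// Wrap a future so that any unknown property access yields a new, wrapped future.
function proxied(future: SketchFuture): any {
  return new Proxy(future, {
    get(target, prop, receiver) {
      // Let real members (e.g. `path`) and symbol lookups through untouched.
      if (typeof prop === "symbol" || prop in target) {
        return Reflect.get(target, prop, receiver);
      }
      // Otherwise record the accessor and wrap the result again.
      return proxied(new SketchFuture([...target.path, prop]));
    },
  });
}

const root = proxied(new SketchFuture());
const leaf = root.value.letters[0];
console.log(leaf.path); // [ 'value', 'letters', '0' ]
```

Note that the index `0` arrives in the `get` trap as the string `"0"`, which is why numeric access needs separate handling.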

Internally, we keep track of the accessors and use them to construct the Trace directive, which describes to the server how to access the variable at a given accessor chain. When we accept Future values as inputs to our helper functions (those in sb, like concat, jq, etc.), we "unproxy" the potentially proxied value so that we can interact with the target's actual properties instead of the "virtual" properties that the proxy exposes.
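One way "unproxying" could work is sketched below, assuming a private symbol that the `get` trap recognizes. The `TARGET` symbol, `SketchFuture`, and this `unproxy` are hypothetical; the branch's real `unproxy` may be implemented differently.

```typescript
// Hypothetical symbol used to ask a proxy for its underlying target.
const TARGET = Symbol("target");

class SketchFuture {
  constructor(readonly accessors: ReadonlyArray<string> = []) {}
}

function proxied(future: SketchFuture): any {
  return new Proxy(future, {
    get(target, prop) {
      if (prop === TARGET) return target; // escape hatch for unproxy()
      if (typeof prop === "symbol") return Reflect.get(target, prop);
      // Every string key, even "accessors", is treated as a virtual property.
      return proxied(new SketchFuture([...target.accessors, prop]));
    },
  });
}

// Return the real object behind a proxy, or the value itself if it is not proxied.
function unproxy(value: any): SketchFuture {
  return value?.[TARGET] ?? value;
}

const virtual = proxied(new SketchFuture()).a.accessors;
console.log(unproxy(virtual).accessors); // [ 'a', 'accessors' ]
```

In this sketch, reading `accessors` on the proxy produces yet another virtual property, so a helper has to unproxy first to reach the real field.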

Currently I've only implemented these proxied Future values on the generated Node Future properties that return arbitrary objects, for example BoxOut.value, IfOut.result, ComputeJSONOut.json_object (and other LLM nodes), Embedding.metadata, etc. We might also want to return proxied Future values from other functions too, like sb.jq and sb.get, and I am exploring that in another branch.

Property access and Array indexing

This implementation supports accessing properties on objects using "dot notation" (e.g. object.property) and bracket access (e.g. object["property"]). Additionally, because these proxied objects may also represent arrays, bracket access also accepts number values (e.g. object[123]). A special case implemented here is the use of Future values as accessors with bracket access (e.g. object[future]).

The JavaScript runtime converts any value used with bracket indexing (on an object) into a string or a Symbol before the proxy sees it. Because of this, the implementation here handles "numeric" values and Future values as special cases.

"Numeric" values (any digit-only value like 123 or "123") are treated as array-indexing access, and the accessor is serialized as a getitem "op". This means that an object with the property "123" may not work as expected, but this might be a reasonable tradeoff because properties rarely have digit-only names.
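The digit-only rule can be sketched as a small classifier. The `AccessorOp` shape and the `"attr"` op name are assumptions made for illustration; only the getitem "op" is named in this description, and the real Trace serialization may differ.

```typescript
// Hypothetical op shapes; the real Trace serialization may differ.
type AccessorOp =
  | { op: "getitem"; index: number }
  | { op: "attr"; name: string };

// Property keys arrive in the proxy as strings, so 123 and "123" look identical.
function toAccessorOp(key: string): AccessorOp {
  return /^\d+$/.test(key)
    ? { op: "getitem", index: Number(key) }
    : { op: "attr", name: key };
}

console.log(toAccessorOp(String(0))); // { op: 'getitem', index: 0 }
console.log(toAccessorOp("letters")); // { op: 'attr', name: 'letters' }
console.log(toAccessorOp("123"));     // { op: 'getitem', index: 123 }
```

The last call shows the tradeoff: a property that is genuinely named "123" cannot be told apart from index 123.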

For Future values, the SDK maintains a hidden lookup table of the Future values that are used as accessors. When a Future is used in this context, the toPrimitive conversion has a side effect: we store the Future in the lookup table and return a specially formatted string key. The proxy then uses that key to look up the Future and stores it in the new Future as the next accessor.
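The lookup-table mechanism can be sketched with `Symbol.toPrimitive`, which the runtime calls when an object is used as a property key. Everything named here (`KeyFuture`, `recorder`, the key prefix) is hypothetical, and a `Map` is used in place of a plain object so that inherited keys such as "constructor" never match.

```typescript
// Hypothetical lookup table and key format; the SDK's actual names may differ.
const KEY_PREFIX = "$$future:";
let nextId = 0;

class KeyFuture {
  readonly id = String(nextId++);
  // Called by the runtime when this object is used as a property key.
  [Symbol.toPrimitive](): string {
    const key = KEY_PREFIX + this.id;
    futureTable.set(key, this); // side effect: remember which Future was used
    return key;
  }
}

const futureTable = new Map<string, KeyFuture>();

// A proxy that records each accessor, resolving special keys back to Futures.
function recorder(accessors: Array<string | KeyFuture> = []): any {
  return new Proxy({ accessors }, {
    get(target, prop) {
      if (typeof prop === "symbol" || prop === "accessors") {
        return Reflect.get(target, prop);
      }
      // A key found in the table stands for a Future; anything else is a plain name.
      const next = futureTable.get(prop) ?? prop;
      return recorder([...accessors, next]);
    },
  });
}

const index = new KeyFuture();
const result = recorder().letters[index as any];
console.log(result.accessors[1] === index); // true
```

The `as any` cast on `index` is needed for the same reason described under Known Limitations: TypeScript does not accept an arbitrary object as an index type.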

Known Limitations

  • As mentioned above, bracket access of object properties with digit-only names (e.g. object["123"]) will not work as property access; it is treated as array-indexing access instead.
  • I don't know of a type that suits these values in general and allows arbitrary access without forcing the user to either write elaborate type assertions or fall back to casting to any, so the type of these objects is currently any. I'm going to explore using type parameters that transform user-defined types into types with wrapped Future<T> members, and using type assertions so that these values can "masquerade" as the provided type and be reflected in LSP tooling.
  • Because these objects allow arbitrary access, a user may access properties that will not exist on the resulting proxied Future without getting a type error. Instead they will receive a node dependency_error with the message "Unexpected exception while resolving arguments for node".
  • When using bracket property access with a Future value that is not proxied (and so not already cast to any), the user needs to add a type assertion to avoid the TypeScript warning "Future<X> cannot be used as an index type.". For example, let newProxiedFuture = proxiedFuture.x.y.z[notProxiedFuture as any]. This doesn't apply to proxied Futures because these are currently cast to any already.
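As a rough idea of what the type-parameter approach from the second limitation could look like, the sketch below maps a user-defined type to a recursively wrapped one. `SketchFuture`, `Proxied`, `Alphabet`, and `anyPath` are all hypothetical; this is an exploration under assumptions, not the design this branch will necessarily adopt.

```typescript
// Hypothetical stand-in for the SDK's Future<T>.
interface SketchFuture<T> {
  readonly __resolvesTo?: T; // phantom field so T takes part in type checking
}

// Recursively map a user-defined type to one whose every level is a Future.
type Proxied<T> = SketchFuture<T> &
  (T extends ReadonlyArray<infer U>
    ? { readonly [index: number]: Proxied<U> }
    : T extends object
      ? { readonly [K in keyof T]: Proxied<T[K]> }
      : unknown);

interface Alphabet {
  letters: string[];
  meta: { name: string };
}

// Runtime placeholder: a proxy that accepts any path. Only the types matter here.
function anyPath(): any {
  return new Proxy({}, {
    get: (_target, prop) => (typeof prop === "symbol" ? undefined : anyPath()),
  });
}

// The type assertion lets the proxy "masquerade" as the user-provided shape.
const alphabet = anyPath() as Proxied<Alphabet>;
const first: SketchFuture<string> = alphabet.letters[0];
const alphabetName: SketchFuture<string> = alphabet.meta.name;
// alphabet.meta.missing; // would be a compile-time error instead of a runtime one
```

With a mapping like this, an access to a property that does not exist on the declared shape is rejected by the compiler rather than surfacing later as a dependency_error.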

Outstanding Issues

The server side "implicit graph" is not handling a case where Future values from the same node are combined

I've encountered an issue when building some simple test examples: using two Future values from the same node together results in the node being dependent on itself, which prevents the node from running because its dependencies are never resolved.

For example,

const data = new Box({
  value: {
    letters: ["a", "b"],
    index: 0,
  },
});

const selected = new Box({
  value: {
    example: data.future.value.letters[data.future.value.index],
  },
});

// prop.type = string
// prop.ref = None
// prop_return_class = Future<string>
// use_proxy = False

@liamgriffiths liamgriffiths Jul 16, 2024

I'm planning to remove this information from the codegen output, but I've been using it to debug for now.

Comment on lines 46 to 50
return () => {
  const utarget = unproxy(target);
  const id = futureId(utarget);
  futureTable[id] = utarget; // store in lookup table
  return id;

I may also add a runtime check here to assert that target in this context is a Future value, so that other arbitrary objects cannot be used with bracket access on proxied Future values.
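Such a runtime check might look like the following sketch. `SketchFuture` and `assertIsFuture` are hypothetical names; the real check would test against the SDK's own Future class.

```typescript
// Hypothetical stand-in for the SDK's Future class.
class SketchFuture {}

// Throw early when something other than a Future is about to be registered as an accessor.
function assertIsFuture(value: unknown): asserts value is SketchFuture {
  if (!(value instanceof SketchFuture)) {
    throw new TypeError("Only Future values may be used with bracket access");
  }
}

assertIsFuture(new SketchFuture()); // passes silently

try {
  assertIsFuture({ not: "a future" });
} catch (err) {
  console.log((err as Error).message); // Only Future values may be used with bracket access
}
```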

import { Future, Trace } from "substrate/Future";

// Only Futures that resolve to string | number are legal property accessors
type FutureTable = Record<string, Future<string | number | unknown>>;

Suggested change
- type FutureTable = Record<string, Future<string | number | unknown>>;
+ type FutureTable = Record<string, Future<string | number>>;


// @ts-ignore (access protected prop: _directive)
const trace = target._directive.next(nextProp);
if (!(trace instanceof Trace)) throw "something's not right.";

For now, I think that we're only creating Future values with a Trace directive, but I'm not sure that would always be the case if we used proxies as return values from things other than node output futures.

@liamgriffiths liamgriffiths force-pushed the liam/proxy-arbitrary-futures branch from a022cd0 to 4f1685c Compare July 17, 2024 15:35