Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Allocation overhead for native type wrapping #63

Open
knervous opened this issue Mar 21, 2025 · 2 comments
Open

Question: Allocation overhead for native type wrapping #63

knervous opened this issue Mar 21, 2025 · 2 comments

Comments

@knervous
Copy link

knervous commented Mar 21, 2025

Hello, amazing library, I am using the web export and this has increased development speed tremendously, thank you for your work.

I am curious on thoughts/best practices around native type marshalling - I've noticed in debug output that in cases where native types are created on the stack and returned they will trigger the gc.alloc debug command, example is in Vector3 methods:

Vector3 Vector3::cross(const Vector3 &p_with) const {
	Vector3 ret(
			(y * p_with.z) - (z * p_with.y),
			(z * p_with.x) - (x * p_with.z),
			(x * p_with.y) - (y * p_with.x));

	return ret;
}

Will result in an allocation. I have been implementing TS extensions to allow for mutating the same object with the same logic for many of these utility functions like add, scale, etc. and keeping static instances for utilities. Also for getters on a node3d like node.global_position.x the global_position will allocate a Vector3 which seems like an expensive operation. Same with _input events for mouse/keyboard - they will be wrapped and allocated.

Do you think this is something that doesn't require optimization? I am trying to be as strict as possible in WASM environment, creating a wrapper for marshalling simple types without copies. Was curious your thoughts on this approach?

export class Extensions {
    static cachedPosition: Vector3 = new Vector3(0, 0, 0);
    static cachedRotation: Vector3 = new Vector3(0, 0, 0);
    static cachedPositionNode: Node | null;
    static root: Node | null;

    static Dispose() {
        Extensions.cachedPositionNode = null;
    }

    static SetRoot(node: Node) {
        Extensions.root = node;
    }

    private static get positionNode() {
        if (!Extensions.cachedPositionNode) {
            Extensions.cachedPositionNode = Extensions.root?.get_node('/root/Zone/GDBridge') as Node;
        }
        return Extensions.cachedPositionNode;
    }

    static Eval(node: Node3D, code: string) {
        return Extensions.positionNode?.call('eval_plain', code, node);
    }

    static GetPosition(node: Node3D): Vector3 {
        this.cachedPosition.x = Extensions.positionNode.call('eval_plain', 'node.global_position.x', node);
        this.cachedPosition.y = Extensions.positionNode.call('eval_plain', 'node.global_position.y', node);
        this.cachedPosition.z = Extensions.positionNode.call('eval_plain', 'node.global_position.z', node);

        return this.cachedPosition;
    }

    static GetBasisZ(node: Node3D): [number, number, number] {  
        const x = Extensions.positionNode.call('eval_plain', 'node.transform.basis.z.x', node);
        const y = Extensions.positionNode.call('eval_plain', 'node.transform.basis.z.y', node);
        const z = Extensions.positionNode.call('eval_plain', 'node.transform.basis.z.z', node);
        return [x,y,z]
    }
}

and GDBridge:

extends Node

var _expr_cache := {}

func eval_plain(expr_str: String, node: Node3D) -> Variant:
	var expression: Expression
	if _expr_cache.has(expr_str):
		expression = _expr_cache[expr_str]
	else:
		expression = Expression.new()
		var err = expression.parse(expr_str, ["node"])
		if err != OK:
			push_error("Parse error in expression: %s" % expr_str)
			return null
		_expr_cache[expr_str] = expression

	var result = expression.execute([node])
	if expression.has_execute_failed():
		push_error("Execution error in expression: %s" % expr_str)
		return null
	return result

func _input(event: InputEvent) -> void:
	if get_viewport().gui_get_focus_owner():
		return
	if event is InputEventMouseButton:
		var node = get_node("/root/Zone")
		node.call("input", event.button_index);
		if event.button_index == MOUSE_BUTTON_RIGHT:
			Input.set_mouse_mode(Input.MOUSE_MODE_CAPTURED if event.pressed else Input.MOUSE_MODE_VISIBLE)
	elif event is InputEventPanGesture:
		var node = get_node("/root/Zone")
		node.call("input_pan", event.delta_y);
	elif event is InputEventMouseMotion:
		var node = get_node("/root/Zone")
		node.call("input_mouse_motion", event.relative.x, event.relative.y);
@ialex32x
Copy link
Collaborator

Of course, resuing the wrapped value is a good practice to improve critical performance.
However, eval_plain introduces string marshaling, hashing, more function calls, and cross-domain invocations, which is slower than a simple Vector3 wrapper instance allocation in most situations.
The only way to avoid extra allocations is to use bridge calls implemented in C++,

class MyUtilityFuncs : public Object 
{
    GDCLASS(MyUtilityFuncs, Object)

    static void _bind_methods()
    {
        // ... expose 'set_node_position'
    }

    static void set_node_position(Node2D* node, float x, float y) 
    { 
        node->set_position(Vector2(x, y)); 
    }
};

The valuetype binding for web.imp (using the host JS) is not performant enough in the current version.
I'll try solutions to improve it.
Such as using HEAP32F[get_internal_address_of(vec2)] to get the component of a Vector2 in xxx = vec2.x

@knervous
Copy link
Author

Great, I will look and see what might make sense with extensions on the C++ side. I've already made proto extensions for Vector math on the JS/TS side which mutate rather than copy.. Sort of changes the behavior but it works. The godot <-> js binding is unfortunate and I am trying to keep memory overhead down vs cpu at the moment since that's a weak point in wasm, when I see "allocation" i try to bury it.

I'd be happy to help take a look where I notice things stick out and when I'm doing a bigger optimization, I've only started diving into the web impl code. Maybe overloading for Vector methods with Float32Array or float args with embind... I noticed the FinalizationRegistry callbacks tend to get called parallel to the GC pass in the browser and can cause a big hiccup. Splitting those into requestIdleCallback would help not do all the work in one frame.

Appreciate the response and feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants