
feat: Performance improvement #31

Closed · wants to merge 6 commits

Conversation

lambdalisue
Member

The performance improvement refactoring suggested in #30 has been applied.

Check file:///Users/alisue/ogh/jsr-core/asyncutil/mutex_bench.ts
Check file:///Users/alisue/ogh/jsr-core/asyncutil/semaphore_bench.ts
cpu: Apple M1 Max
runtime: deno 1.45.4 (aarch64-apple-darwin)

file:///Users/alisue/ogh/jsr-core/asyncutil/mutex_bench.ts
benchmark              time (avg)        iter/s             (min … max)       p75       p99      p995
----------------------------------------------------------------------- -----------------------------
Mutex                 452.41 ns/iter   2,210,370.0 (439.27 ns … 545.97 ns) 446.38 ns 537.83 ns 545.97 ns
Mutex (1.0.2)         666.79 ns/iter   1,499,722.7 (636.54 ns … 947.56 ns) 653.95 ns 947.56 ns 947.56 ns
Mutex (Issue #30)     435.85 ns/iter   2,294,381.9  (428.7 ns … 458.39 ns) 435.39 ns 457.76 ns 458.39 ns


file:///Users/alisue/ogh/jsr-core/asyncutil/semaphore_bench.ts
benchmark              time (avg)        iter/s             (min … max)       p75       p99      p995
----------------------------------------------------------------------- -----------------------------
Semaphore              78.77 µs/iter      12,695.7    (61.62 µs … 1.81 ms) 68.04 µs 607.38 µs 665.96 µs
Semaphore (1.0.2)     296.69 µs/iter       3,370.6    (260.5 µs … 2.24 ms) 279.54 µs 635.88 µs 687.54 µs

Thanks to @PandaWorker 🎉

Semaphore: approx. 4x faster than the previous implementation (1.0.2).
Mutex: approx. 1.5x faster than the previous implementation (1.0.2).

codecov bot commented Aug 15, 2024

Codecov Report

Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.

Project coverage is 92.00%. Comparing base (6eaad95) to head (6e1b4fe).
Report is 6 commits behind head on main.

Files          Patch %   Lines
semaphore.ts   96.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #31      +/-   ##
==========================================
+ Coverage   91.76%   92.00%   +0.23%     
==========================================
  Files          11       11              
  Lines         340      325      -15     
  Branches       41       41              
==========================================
- Hits          312      299      -13     
+ Misses         28       26       -2     

☔ View full report in Codecov by Sentry.

@PandaWorker

It's not entirely clear what Object.assign is used for here. When I measured performance, the version with Object.assign was slower because it executes additional instructions that, in principle, are not needed at all.

Why bind the disposing methods to the promise itself?

return Object.assign(Promise.resolve(disposable), disposable);
return Object.assign(waiter.promise.then(() => disposable), disposable);

@PandaWorker

Also, inside the module I would recommend using only lock[Symbol.dispose]()

and giving callers the opportunity to release manually:

const sem = new Semaphore(1);

{
  const lock = await sem.acquire();
  try {
    // do ...
  } finally {
    lock[Symbol.dispose](); // or sem.release();
  }
}

{
  const lock = await sem.acquire();
  return asyncTask().finally(() => lock[Symbol.dispose]()); // or sem.release();
}

You also need to write tests and benchmarks covering all of these cases.

@PandaWorker

Also, Deno.bench does not run tasks concurrently, which means the semaphore will always be unlocked and waiters will never be put into the Set.

You also need benchmarks with concurrent access to acquire(), so that waiters actually end up in the Set.
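A minimal sketch of the kind of contention check being asked for, built on a simplified FIFO semaphore with an acquire()/release() API (all names here are illustrative, not the library's actual API):

```typescript
// Minimal FIFO semaphore (sketch; simplified from the implementation discussed
// in this thread) used to demonstrate a workload with real contention, where
// acquire() actually has to queue waiters instead of resolving immediately.
class Semaphore {
  #waiters: Array<{ resolve: () => void }> = [];
  #value: number;
  constructor(value: number) {
    this.#value = value;
  }
  get waiters(): number {
    return this.#waiters.length;
  }
  acquire(): Promise<{ release: () => void }> {
    let disposed = false;
    const lock = {
      release: () => {
        if (!disposed) {
          disposed = true;
          this.#release();
        }
      },
    };
    if (this.#value > 0) {
      this.#value--;
      return Promise.resolve(lock);
    }
    return new Promise<void>((resolve) => this.#waiters.push({ resolve }))
      .then(() => lock);
  }
  #release(): void {
    const next = this.#waiters.shift();
    if (next) next.resolve();
    else this.#value++;
  }
}

// Contended workload: with capacity 1 and 5 tasks started together, four of
// the five acquire() calls must park in the waiter queue, which a sequential
// Deno.bench loop would never exercise.
async function contended(sem: Semaphore, tasks: number): Promise<number> {
  let maxWaiters = 0;
  const task = async () => {
    const pending = sem.acquire();
    maxWaiters = Math.max(maxWaiters, sem.waiters);
    const lock = await pending;
    try {
      // critical section
    } finally {
      lock.release();
    }
  };
  await Promise.all(Array.from({ length: tasks }, () => task()));
  return maxWaiters;
}

contended(new Semaphore(1), 5).then((max) => console.log(max)); // 4
```

Wrapping the `Promise.all` line in a `Deno.bench` fn would give the contended benchmark described above.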

@PandaWorker

PandaWorker commented Aug 15, 2024

Semaphore - #30 (comment)

Until there is native using support in V8, you should not use it in libraries or anywhere it can hurt performance.

import { Semaphore } from './semaphore.ts';

const sem = new Semaphore(1);

Deno.bench({
	name: 'release with using',
	fn: async () => {
		using _lock = await sem.acquire();
	},
	group: 'semaphore acquire'
});

Deno.bench({
	name: 'release with using (acquireWithSignal)',
	fn: async () => {
		using _lock = await sem.acquireWithSignal();
	},
	group: 'semaphore acquire'
});

Deno.bench({
	name: 'release with [Symbol.dispose]',
	fn: async () => {
		const lock = await sem.acquire();
		try {
			// do...
		} finally {
			lock[Symbol.dispose]()
		}
	},
	group: 'semaphore acquire'
});

Deno.bench({
	name: 'release with [Symbol.dispose] (acquireWithSignal)',
	fn: async () => {
		const lock = await sem.acquireWithSignal();
		try {
			// do...
		} finally {
			lock[Symbol.dispose]()
		}
	},
	group: 'semaphore acquire'
});

Deno.bench({
	name: 'release with using',
	fn: async () => {
		const task = async () => {
			using _lock = await sem.acquire();
		}

		await Promise.all([task(), task(), task(), task(), task()])
	},
	group: 'semaphore concurrent tasks (5)'
});

Deno.bench({
	name: 'release with using (acquireWithSignal)',
	fn: async () => {
		const task = async () => {
			using _lock = await sem.acquireWithSignal();
		}

		await Promise.all([task(), task(), task(), task(), task()])
	},
	group: 'semaphore concurrent tasks (5)'
});

Deno.bench({
	name: 'release with [Symbol.dispose]',
	fn: async () => {
		const task = async () => {
			const lock = await sem.acquire();
			try {
				// do ...
			} finally {
				lock[Symbol.dispose]()
			}
		}

		await Promise.all([task(), task(), task(), task(), task()])
	},
	group: 'semaphore concurrent tasks (5)'
});

Deno.bench({
	name: 'release with [Symbol.dispose] (acquireWithSignal)',
	fn: async () => {
		const task = async () => {
			const lock = await sem.acquireWithSignal();
			try {
				// do ...
			} finally {
				lock[Symbol.dispose]()
			}
		}

		await Promise.all([task(), task(), task(), task(), task()])
	},
	group: 'semaphore concurrent tasks (5)'
});

### benchmarks

cpu: Apple M1 Max
runtime: deno 1.45.5 (aarch64-apple-darwin)

file:///Users/panda/Documents/deno/sync/semaphore.bench.ts
benchmark                                              time (avg)        iter/s             (min … max)       p75       p99      p995
------------------------------------------------------------------------------------------------------- -----------------------------

group semaphore acquire
release with using                                    640.74 ns/iter   1,560,701.0 (444.79 ns … 853.81 ns) 661.68 ns 853.81 ns 853.81 ns
release with using (acquireWithSignal)                623.57 ns/iter   1,603,669.8  (443.37 ns … 864.6 ns) 689.33 ns 864.6 ns 864.6 ns
release with [Symbol.dispose]                         380.67 ns/iter   2,626,927.6  (296.18 ns … 489.5 ns) 417.5 ns 489.34 ns 489.5 ns
release with [Symbol.dispose] (acquireWithSignal)     368.48 ns/iter   2,713,847.5 (285.91 ns … 485.12 ns) 407.7 ns 454.87 ns 485.12 ns

summary
  release with [Symbol.dispose] (acquireWithSignal)
   1.03x faster than release with [Symbol.dispose]
   1.69x faster than release with using (acquireWithSignal)
   1.74x faster than release with using

group semaphore concurrent tasks (5)
release with using                                      4.29 µs/iter     233,092.6     (3.54 µs … 7.67 µs) 4.3 µs 7.67 µs 7.67 µs
release with using (acquireWithSignal)                   4.2 µs/iter     238,120.6      (3.4 µs … 5.09 µs) 4.31 µs 5.09 µs 5.09 µs
release with [Symbol.dispose]                           2.72 µs/iter     367,256.7     (2.19 µs … 3.46 µs) 2.85 µs 3.46 µs 3.46 µs
release with [Symbol.dispose] (acquireWithSignal)       2.67 µs/iter     374,526.2     (2.38 µs … 3.09 µs) 2.81 µs 3.09 µs 3.09 µs

summary
  release with [Symbol.dispose] (acquireWithSignal)
   1.02x faster than release with [Symbol.dispose]
   1.57x faster than release with using (acquireWithSignal)
   1.61x faster than release with using

@PandaWorker

The Set-based semaphore implementation performs much the same as a Deque-based one as long as there are few waiters. As soon as there are many waiters, there is a significant performance drop, since Set hashes the element when adding and looking it up, while a Deque hands over its first element immediately.

cpu: Apple M1 Max
runtime: deno 1.45.5 (aarch64-apple-darwin)

file:///Users/panda/Documents/deno/sync/semaphore.bench.ts
benchmark                                                      time (avg)        iter/s             (min … max)       p75       p99      p995
--------------------------------------------------------------------------------------------------------------- -----------------------------

group semaphore acquire
release with using                                            631.12 ns/iter   1,584,480.8 (456.72 ns … 881.66 ns) 657.79 ns 881.66 ns 881.66 ns
release with using [acquireWithSignal]                        584.76 ns/iter   1,710,112.1 (474.36 ns … 929.88 ns) 627.16 ns 929.88 ns 929.88 ns
release with [Symbol.dispose]                                  373.9 ns/iter   2,674,518.7 (292.75 ns … 459.38 ns) 421.79 ns 457.5 ns 459.38 ns
release with [Symbol.dispose] (Deque)                         417.88 ns/iter   2,393,042.3  (320.25 ns … 512.5 ns) 466.82 ns 508.17 ns 512.5 ns
release with [Symbol.dispose] [acquireWithSignal]                422 ns/iter   2,369,692.6   (294.93 ns … 2.37 µs) 424.22 ns 1.82 µs 2.37 µs
release with [Symbol.dispose] [acquireWithSignal] (Deque)     409.78 ns/iter   2,440,341.3   (302.27 ns … 2.41 µs) 419.23 ns 1.51 µs 2.41 µs

summary
  release with [Symbol.dispose]
   1.1x faster than release with [Symbol.dispose] [acquireWithSignal] (Deque)
   1.12x faster than release with [Symbol.dispose] (Deque)
   1.13x faster than release with [Symbol.dispose] [acquireWithSignal]
   1.56x faster than release with using [acquireWithSignal]
   1.69x faster than release with using

group semaphore concurrent tasks (5)
release with using                                              3.84 µs/iter     260,238.5     (3.23 µs … 4.71 µs) 4.07 µs 4.71 µs 4.71 µs
release with using [acquireWithSignal]                          3.91 µs/iter     255,575.8     (3.01 µs … 5.06 µs) 4.18 µs 5.06 µs 5.06 µs
release with [Symbol.dispose]                                   2.82 µs/iter     354,830.3     (2.31 µs … 3.17 µs) 2.92 µs 3.17 µs 3.17 µs
release with [Symbol.dispose] (Deque)                           2.49 µs/iter     402,009.5      (1.9 µs … 2.71 µs) 2.66 µs 2.71 µs 2.71 µs
release with [Symbol.dispose] [acquireWithSignal]               2.72 µs/iter     368,041.4     (2.22 µs … 3.41 µs) 3.01 µs 3.41 µs 3.41 µs
release with [Symbol.dispose] [acquireWithSignal] (Deque)       2.53 µs/iter     395,288.0     (2.01 µs … 2.91 µs) 2.71 µs 2.91 µs 2.91 µs

summary
  release with [Symbol.dispose] (Deque)
   1.02x faster than release with [Symbol.dispose] [acquireWithSignal] (Deque)
   1.09x faster than release with [Symbol.dispose] [acquireWithSignal]
   1.13x faster than release with [Symbol.dispose]
   1.54x faster than release with using
   1.57x faster than release with using [acquireWithSignal]

group semaphore concurrent tasks (10_000)
release with using                                             34.14 ms/iter          29.3  (25.57 ms … 112.56 ms) 32.61 ms 112.56 ms 112.56 ms
release with using [acquireWithSignal]                         44.94 ms/iter          22.3    (26.4 ms … 119.2 ms) 56.11 ms 119.2 ms 119.2 ms
release with [Symbol.dispose]                                  27.16 ms/iter          36.8   (21.55 ms … 58.43 ms) 28.34 ms 58.43 ms 58.43 ms
release with [Symbol.dispose] (Deque)                            6.2 ms/iter         161.3    (4.62 ms … 15.66 ms) 6.61 ms 15.66 ms 15.66 ms
release with [Symbol.dispose] [acquireWithSignal]              25.95 ms/iter          38.5   (21.04 ms … 34.29 ms) 28.16 ms 34.29 ms 34.29 ms
release with [Symbol.dispose] [acquireWithSignal] (Deque)        5.8 ms/iter         172.4    (3.65 ms … 18.56 ms) 6.66 ms 18.56 ms 18.56 ms

summary
  release with [Symbol.dispose] [acquireWithSignal] (Deque)
   1.07x faster than release with [Symbol.dispose] (Deque)
   4.47x faster than release with [Symbol.dispose] [acquireWithSignal]
   4.68x faster than release with [Symbol.dispose]
   5.88x faster than release with using
   7.75x faster than release with using [acquireWithSignal]

group semaphore concurrent tasks (50_000)
release with using                                            687.02 ms/iter           1.5  (650.5 ms … 739.51 ms) 695.77 ms 739.51 ms 739.51 ms
release with using [acquireWithSignal]                        702.04 ms/iter           1.4 (591.96 ms … 835.46 ms) 787.13 ms 835.46 ms 835.46 ms
release with [Symbol.dispose]                                 610.42 ms/iter           1.6  (574.81 ms … 643.3 ms) 629.11 ms 643.3 ms 643.3 ms
release with [Symbol.dispose] (Deque)                          66.74 ms/iter          15.0    (57.17 ms … 81.5 ms) 68.79 ms 81.5 ms 81.5 ms
release with [Symbol.dispose] [acquireWithSignal]             628.18 ms/iter           1.6 (549.56 ms … 676.17 ms) 671.43 ms 676.17 ms 676.17 ms
release with [Symbol.dispose] [acquireWithSignal] (Deque)      85.02 ms/iter          11.8  (65.58 ms … 105.92 ms) 92.25 ms 105.92 ms 105.92 ms

summary
  release with [Symbol.dispose] (Deque)
   1.27x faster than release with [Symbol.dispose] [acquireWithSignal] (Deque)
   9.15x faster than release with [Symbol.dispose]
   9.41x faster than release with [Symbol.dispose] [acquireWithSignal]
   10.29x faster than release with using
   10.52x faster than release with using [acquireWithSignal]
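For illustration, here is a minimal ring-buffer deque of the kind the comment above refers to: push to the tail, shift from the head in O(1), with no hashing on insert or removal. This is a sketch; the actual Deque used in these benchmarks is not shown in the thread.

```typescript
// Minimal ring-buffer deque: consumed slots are cleared and occasionally
// compacted, so shift() stays O(1) amortized without rehashing anything,
// unlike Set, which hashes entries on add/delete and needs an iterator to
// reach its first element.
class Deque<T> {
  #items: Array<T | undefined> = [];
  #head = 0;

  get size(): number {
    return this.#items.length - this.#head;
  }

  push(value: T): void {
    this.#items.push(value);
  }

  shift(): T | undefined {
    if (this.size === 0) return undefined;
    const value = this.#items[this.#head];
    this.#items[this.#head++] = undefined; // drop the reference
    // Compact occasionally so consumed slots do not accumulate forever.
    if (this.#head > 1024 && this.#head * 2 > this.#items.length) {
      this.#items = this.#items.slice(this.#head);
      this.#head = 0;
    }
    return value;
  }
}

const queue = new Deque<number>();
queue.push(1);
queue.push(2);
queue.push(3);
console.log(queue.shift(), queue.shift(), queue.size); // 1 2 1
```

Swapping the semaphore's `Set` of waiters for such a queue keeps FIFO wake-up order while avoiding per-waiter hashing, which matches the large-waiter-count numbers above.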


@mdekstrand left a comment


LGTM except for the double-release issue.

*
* @returns A Promise with Disposable that releases the mutex when disposed.
*/
acquire(): Promise<Disposable> & Disposable {


so this seems to enable an interesting error case — it is possible to release the same lock twice.

using pending = sem.acquire(); // will dispose the disposable promise
// some stuff
using _lock = await pending; // the inner lock will *also* be disposed

Now, that seems unusual and I don't think people should do it, but making both the promise value and the promise itself Disposable allows for it. ISTM that the intent here is to allow a pending lock attempt to be cancelled, but double-release protection of some kind is needed (or "cancel" and "release" should be different actions).
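The double release can be reproduced without the using syntax at all. The sketch below mimics the Object.assign pattern from this PR, but with an explicit release() method instead of Symbol.dispose so it runs on any runtime (names are illustrative, not the library's actual API):

```typescript
// Sketch of the double-release hazard: both the promise and its resolved
// value expose the same release(), and nothing stops calling both.
type Lock = { release: () => void };

class Semaphore {
  #waiters: Array<() => void> = [];
  #value: number;
  constructor(value: number) {
    this.#value = value;
  }
  get value(): number {
    return this.#value;
  }
  acquire(): Promise<Lock> & Lock {
    const lock: Lock = { release: () => this.#release() };
    if (this.#value > 0) {
      this.#value--;
      // Mirrors Object.assign(Promise.resolve(disposable), disposable).
      return Object.assign(Promise.resolve(lock), lock);
    }
    return Object.assign(
      new Promise<void>((resolve) => this.#waiters.push(resolve)).then(() => lock),
      lock,
    );
  }
  #release(): void {
    const next = this.#waiters.shift();
    if (next) next();
    else this.#value++;
  }
}

const sem = new Semaphore(1);
(async () => {
  const pending = sem.acquire(); // the promise itself is releasable
  const lock = await pending;
  lock.release();    // first release: value goes back to 1
  pending.release(); // second release: value becomes 2, capacity is corrupted
  console.log(sem.value); // 2
})();
```

With no disposed-state guard, the semaphore ends up with more permits than it started with, which is the error case described above.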


@PandaWorker commented Aug 15, 2024


It's not clear why such a design is needed at all.

What's the problem with using it like this? And it still doesn't solve the problem of redundant using calls.

If you need protection against a single lock being released more than once, you only need to track the disposed state on the object.

import { scheduler } from 'node:timers/promises';
import { Semaphore } from "./semaphore.ts";

const sem = new Semaphore(1);

async function doSome(obj: { text: string }) {
	const pendingLock = sem.acquire();

	using lock = await pendingLock;

	// do task ...
	await scheduler.wait(1000);

	using _lock = await pendingLock;
	// maybe throw
	obj.text += 'hello '

	using _lock2 = await pendingLock;
	// maybe throw
	obj.text += 'world'

	return obj;
}


Ah, I misunderstood the question.


@PandaWorker commented Aug 15, 2024


I think you can do it like this

	// kDisposed is a module-level symbol: const kDisposed = Symbol("disposed");
	acquire(): Promise<Disposable> {
		const disposable = {
			[kDisposed]: false,
			release: () => this.release(),
			[Symbol.dispose]() {
				if (!this[kDisposed]) {
					this.release();
					this[kDisposed] = true;
				}
			},
		};

		if (this.#value > 0) {
			this.#value--;
			return Promise.resolve(disposable);
		}

		const waiter = Promise.withResolvers<void>();
		this.#waiters.add(waiter);

		return waiter.promise.then(() => disposable);
	}
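The disposed-state guard can be isolated into a small helper to show the idempotence it buys. This is a sketch using a closure flag in place of the kDisposed symbol:

```typescript
// A closure flag makes release idempotent, so disposing twice only
// releases once (the same protection the kDisposed symbol provides).
function makeLock(onRelease: () => void): { release: () => void } {
  let disposed = false;
  return {
    release: () => {
      if (!disposed) {
        disposed = true;
        onRelease();
      }
    },
  };
}

let releases = 0;
const lock = makeLock(() => releases++);
lock.release();
lock.release(); // no-op: the guard has already fired
console.log(releases); // 1
```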

@lambdalisue
Member Author

lambdalisue commented Aug 15, 2024

It's not entirely clear what Object.assign is used for here. When I measured performance, the version with Object.assign was slower because it executes additional instructions that, in principle, are not needed at all.

Why bind the disposing methods to the promise itself?

return Object.assign(Promise.resolve(disposable), disposable);
return Object.assign(waiter.promise.then(() => disposable), disposable);

The current RwLock implementation relies on that. (And this is an internal detail that is not documented, so we can change the implementation.)

@lambdalisue
Member Author

lambdalisue commented Aug 15, 2024

Until there is native using support in V8, you should not use it in libraries or anywhere it can hurt performance.

I see, I hadn't considered that. Now I think we need to add release() for convenience.

It seems Node / Bun already support using. Could you explain what "no native using feature in v8" means?

@PandaWorker

No runtime has native support for using yet; check the TC39 proposal status: https://github.com/tc39/proposal-explicit-resource-management

At the moment, using is transpiled into a JS implementation, which is not very optimized.

For more than a year, runtimes have been preparing for using to appear natively in V8.

I took measurements on Bun, Deno, and Node (tsx); on every runtime, using led to a significant performance degradation.

@PandaWorker

Look at what it turns into when you build it.
This answers the question of why, without native support, there is a performance drop from the unnecessary abstractions.

source:

const disposable = {
	data: '123',
	[Symbol.dispose](){
		console.log('disposed')
	}
}

{
	using a = disposable;
	console.log(a);
}

bundle:

function _using_ctx() {
    var _disposeSuppressedError = typeof SuppressedError === "function" ? SuppressedError : function(error, suppressed) {
        var err = new Error();
        err.name = "SuppressedError";
        err.suppressed = suppressed;
        err.error = error;
        return err;
    }, empty = {}, stack = [];
    function using(isAwait, value) {
        if (value != null) {
            if (Object(value) !== value) {
                throw new TypeError("using declarations can only be used with objects, functions, null, or undefined.");
            }
            if (isAwait) {
                var dispose = value[Symbol.asyncDispose || Symbol.for("Symbol.asyncDispose")];
            }
            if (dispose == null) {
                dispose = value[Symbol.dispose || Symbol.for("Symbol.dispose")];
            }
            if (typeof dispose !== "function") {
                throw new TypeError(`Property [Symbol.dispose] is not a function.`);
            }
            stack.push({
                v: value,
                d: dispose,
                a: isAwait
            });
        } else if (isAwait) {
            stack.push({
                d: value,
                a: isAwait
            });
        }
        return value;
    }
    return {
        e: empty,
        u: using.bind(null, false),
        a: using.bind(null, true),
        d: function() {
            var error = this.e;
            function next() {
                while(resource = stack.pop()){
                    try {
                        var resource, disposalResult = resource.d && resource.d.call(resource.v);
                        if (resource.a) {
                            return Promise.resolve(disposalResult).then(next, err);
                        }
                    } catch (e) {
                        return err(e);
                    }
                }
                if (error !== empty) throw error;
            }
            function err(e) {
                error = error !== empty ? new _disposeSuppressedError(error, e) : e;
                return next();
            }
            return next();
        }
    };
}
const disposable = {
    data: '123',
    [Symbol.dispose] () {
        console.log('disposed');
    }
};
{
    try {
        var _usingCtx = _using_ctx();
        const a = _usingCtx.u(disposable);
        console.log(a);
    } catch (_) {
        _usingCtx.e = _;
    } finally{
        _usingCtx.d();
    }
}

@PandaWorker

There's no point arguing, make benchmarks for both versions and you'll see for yourself.

@lambdalisue
Member Author

I’ve already decided to remove using from internal implementations due to performance concerns. Now, I’m considering whether or not to add a release() method.

I’m aware that using is still at Stage 3, but if most platforms already support it, I think it would be better NOT to expose release(). If users are concerned about performance, they can use lock[Symbol.dispose]() instead.

@PandaWorker

PandaWorker commented Aug 16, 2024

OK, if you don't want to expose the method publicly, then take a look at this option.
I think this implementation will be convenient for everyone.

Example usage

const sem = new Semaphore(1);

// release with using
{
	using _lock = await sem.acquire();
}

// release with lock.release
{
	const lock = await sem.acquire();

	try {
		//do
	} finally {
		lock.release();
	}
}

// release with lock[Symbol.dispose]
{
	const lock = await sem.acquire();

	try {
		//do
	} finally {
		lock[Symbol.dispose]();
	}
}

// release via a pending lock promise (release is idempotent)
{
	const lockPromise = sem.acquire();

	try {
		// do

		const lock = await lockPromise;

		await asyncTask().finally(() => lock.release());
	} finally {
		await lockPromise.then((lock) => lock.release());
	}
}
export class Semaphore {
	#waiters = new Set<PromiseWithResolvers<void>>();
	#value: number;

	get locked() {
		return this.#value === 0;
	}

	get waiters() {
		return this.#waiters.size;
	}

	constructor(value: number) {
		this.#value = value;
	}

	acquire(): Promise<{ release: () => void } & Disposable> {
		let disposed = false;
		const release = () => {
			if (!disposed) {
				this.#release();
				disposed = true;
			}
		}

		const disposable = {
			release: release,
			[Symbol.dispose]: release,
		};

		if (this.#value > 0) {
			this.#value--;
			return Promise.resolve(disposable);
		}

		const waiter = Promise.withResolvers<void>();
		this.#waiters.add(waiter);

		return waiter.promise.then(() => disposable);
	}

	#release() {
		if (this.#waiters.size > 0) {
			const waiters = this.#waiters;
			const [waiter] = waiters.keys();

			waiters.delete(waiter);
			waiter.resolve();
		} else {
			this.#value++;
		}
	}
}
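Here is a trimmed, runnable version of the proposal above, with Promise.withResolvers replaced by a plain executor for runtimes that lack it and only release() kept (Symbol.dispose omitted for portability), plus a quick check that Set's insertion-order iteration gives FIFO wake-up:

```typescript
// Runnable sketch of the Set-based Semaphore proposed above, with an
// idempotent release() and FIFO wake-up via Set insertion order.
type Waiter = { promise: Promise<void>; resolve: () => void };

class Semaphore {
  #waiters = new Set<Waiter>();
  #value: number;

  constructor(value: number) {
    this.#value = value;
  }

  get locked(): boolean {
    return this.#value === 0;
  }

  get waiters(): number {
    return this.#waiters.size;
  }

  acquire(): Promise<{ release: () => void }> {
    let disposed = false;
    const disposable = {
      release: () => {
        if (!disposed) {
          disposed = true;
          this.#release();
        }
      },
    };
    if (this.#value > 0) {
      this.#value--;
      return Promise.resolve(disposable);
    }
    let resolve!: () => void;
    const promise = new Promise<void>((r) => {
      resolve = r;
    });
    const waiter: Waiter = { promise, resolve };
    this.#waiters.add(waiter);
    return promise.then(() => disposable);
  }

  #release(): void {
    if (this.#waiters.size > 0) {
      const [waiter] = this.#waiters; // first inserted waiter: FIFO
      this.#waiters.delete(waiter);
      waiter.resolve();
    } else {
      this.#value++;
    }
  }
}

// Tasks enter the critical section in acquire order.
async function fifo(): Promise<string> {
  const sem = new Semaphore(1);
  const order: number[] = [];
  const task = async (id: number) => {
    const lock = await sem.acquire();
    try {
      order.push(id);
    } finally {
      lock.release();
    }
  };
  await Promise.all([task(1), task(2), task(3)]);
  return order.join(",");
}
fifo().then((order) => console.log(order)); // 1,2,3
```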

@lambdalisue
Member Author

I'll remove using anyway due to the performance concern in Node, but it seems using doesn't affect performance in Deno. Just FYI.

benchmark      time (avg)        iter/s             (min … max)       p75       p99      p995
--------------------------------------------------------------- -----------------------------

group Lock#lock
v1.0.0         52.84 ms/iter          18.9   (45.86 ms … 57.58 ms) 55.68 ms 57.58 ms 57.58 ms
main           53.55 ms/iter          18.7   (47.32 ms … 73.59 ms) 55.52 ms 73.59 ms 73.59 ms

summary
  v1.0.0
   1.01x faster than main
diff --git a/lock.ts b/lock.ts
index bf49c90..2b5d955 100644
--- a/lock.ts
+++ b/lock.ts
@@ -43,7 +43,11 @@ export class Lock<T> {
    * @returns A Promise that resolves with the result of the function.
    */
   async lock<R>(fn: (value: T) => R | PromiseLike<R>): Promise<R> {
-    using _lock = await this.#mu.acquire();
-    return await fn(this.#value);
+    const lock = await this.#mu.acquire();
+    try {
+      return await fn(this.#value);
+    } finally {
+      lock[Symbol.dispose]();
+    }
   }
 }

@PandaWorker

I do not know what you have done there, but using cannot work faster than an explicit call:

function getDisposable(): Promise<Disposable> {
	const disposable = {
		[Symbol.dispose]() {},
	};

	return Promise.resolve(disposable);
}

Deno.bench({name: "using", fn: async () => {
	using _lock = await getDisposable();
}});

Deno.bench({name: "Symbol.dispose", fn: async () => {
	const _lock = await getDisposable();
	_lock[Symbol.dispose]();
}})

Deno.bench({name: "try/finally Symbol.dispose", fn: async () => {
	const _lock = await getDisposable();
	
	try {
	//
	} finally {
		_lock[Symbol.dispose]();
	}
}})
cpu: Apple M1 Max
runtime: deno 1.45.5 (aarch64-apple-darwin)

file:///Users/panda/Documents/deno/sync/using2.ts
benchmark                       time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------------------- -----------------------------
using                          659.36 ns/iter   1,516,632.6 (545.24 ns … 839.54 ns) 669.6 ns 839.54 ns 839.54 ns
Symbol.dispose                 412.58 ns/iter   2,423,753.7 (291.03 ns … 539.02 ns) 428.44 ns 531.37 ns 539.02 ns
try/finally Symbol.dispose      415.1 ns/iter   2,409,041.8 (296.53 ns … 528.19 ns) 426.43 ns 446.6 ns 528.19 ns

@PandaWorker

I'll remove using anyway due to the performance concern in Node, but it seems using doesn't affect performance in Deno. Just FYI.

benchmark      time (avg)        iter/s             (min … max)       p75       p99      p995
--------------------------------------------------------------- -----------------------------

group Lock#lock
v1.0.0         52.84 ms/iter          18.9   (45.86 ms … 57.58 ms) 55.68 ms 57.58 ms 57.58 ms
main           53.55 ms/iter          18.7   (47.32 ms … 73.59 ms) 55.52 ms 73.59 ms 73.59 ms

summary
  v1.0.0
   1.01x faster than main
diff --git a/lock.ts b/lock.ts
index bf49c90..2b5d955 100644
--- a/lock.ts
+++ b/lock.ts
@@ -43,7 +43,11 @@ export class Lock<T> {
    * @returns A Promise that resolves with the result of the function.
    */
   async lock<R>(fn: (value: T) => R | PromiseLike<R>): Promise<R> {
-    using _lock = await this.#mu.acquire();
-    return await fn(this.#value);
+    const lock = await this.#mu.acquire();
+    try {
+      return await fn(this.#value);
+    } finally {
+      lock[Symbol.dispose]();
+    }
   }
 }

I measured lock() on my implementation, and my version wins by 10-15% over the implementations using using or try/finally.

	async lock<R>(fn: () => R | PromiseLike<R>): Promise<R> {
		const lock = await this.acquire();
		return await Promise.resolve(fn()).finally(() => lock[Symbol.dispose]());
	}
	
	async lockTry<R>(fn: () => R | PromiseLike<R>): Promise<R> {
		const lock = await this.acquire();
		try {
			return await fn();
		} finally {
			lock.release();
		}
	}

	async usingLock<R>(fn: () => R | PromiseLike<R>): Promise<R> {
		using _lock = await this.acquire();
		return await fn()
	}
cpu: Apple M1 Max
runtime: deno 1.45.5 (aarch64-apple-darwin)

file:///Users/panda/Documents/deno/sync/using2.ts
benchmark                                time (avg)        iter/s             (min … max)       p75       p99      p995
----------------------------------------------------------------------------------------- -----------------------------
lock release with Symbol.dispose        536.57 ns/iter   1,863,705.3  (446.88 ns … 694.3 ns) 570.59 ns 692.39 ns 694.3 ns
lockTry release with Symbol.dispose      697.2 ns/iter   1,434,314.2    (506.6 ns … 2.21 µs) 706.4 ns 2.21 µs 2.21 µs
usingLock release with using            704.44 ns/iter   1,419,560.7    (526.5 ns … 1.27 µs) 730.83 ns 1.27 µs 1.27 µs
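One trade-off worth noting with the fast lock() variant above: if fn throws synchronously, the throw happens before Promise.resolve(fn()) can attach the .finally release, so the permit leaks; the try/finally variant releases in that case. A self-contained sketch (with release() instead of Symbol.dispose, for portability; the Semaphore here is a simplified stand-in):

```typescript
// Demonstrates the synchronous-throw hazard: lock() leaks the permit when
// fn throws before Promise.resolve(...) wraps it; lockTry() does not.
class Semaphore {
  #waiters: Array<() => void> = [];
  #value: number;
  constructor(value: number) {
    this.#value = value;
  }
  get locked(): boolean {
    return this.#value === 0;
  }
  acquire(): Promise<{ release: () => void }> {
    const lock = { release: () => this.#release() };
    if (this.#value > 0) {
      this.#value--;
      return Promise.resolve(lock);
    }
    return new Promise<void>((resolve) => this.#waiters.push(resolve)).then(() => lock);
  }
  #release(): void {
    const next = this.#waiters.shift();
    if (next) next();
    else this.#value++;
  }

  // Fast variant from the comment above: fn() throws before .finally attaches.
  async lock<R>(fn: () => R | PromiseLike<R>): Promise<R> {
    const lock = await this.acquire();
    return await Promise.resolve(fn()).finally(() => lock.release());
  }

  // try/finally variant: releases even when fn throws synchronously.
  async lockTry<R>(fn: () => R | PromiseLike<R>): Promise<R> {
    const lock = await this.acquire();
    try {
      return await fn();
    } finally {
      lock.release();
    }
  }
}

(async () => {
  const boom = () => {
    throw new Error("sync throw");
  };

  const a = new Semaphore(1);
  await a.lock(boom).catch(() => {});
  console.log("lock() still locked:", a.locked); // true: the permit leaked

  const b = new Semaphore(1);
  await b.lockTry(boom).catch(() => {});
  console.log("lockTry() still locked:", b.locked); // false: the permit came back
})();
```

The 10-15% win may still be worth it if callers only ever pass async functions, but the hazard is worth documenting.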

lambdalisue added a commit that referenced this pull request Aug 16, 2024
It seems `using` in Node is not as performant as expected.
Note that the performance in Deno seems not to be affected by this
change.

  benchmark      time (avg)        iter/s             (min … max)       p75       p99      p995
  --------------------------------------------------------------- -----------------------------

  group Lock#lock
  v1.0.0         52.84 ms/iter          18.9   (45.86 ms … 57.58 ms) 55.68 ms 57.58 ms 57.58 ms
  main           53.55 ms/iter          18.7   (47.32 ms … 73.59 ms) 55.52 ms 73.59 ms 73.59 ms

  summary
    v1.0.0
    1.01x faster than main

See #31 for detail
@lambdalisue lambdalisue mentioned this pull request Aug 16, 2024
@lambdalisue
Member Author

lambdalisue commented Aug 16, 2024

I re-implemented and continued on #34.

Please give me comments on that PR @PandaWorker @mdekstrand

lambdalisue added a commit that referenced this pull request Aug 16, 2024
lambdalisue added a commit that referenced this pull request Aug 16, 2024
@lambdalisue lambdalisue deleted the performance-improvement branch August 17, 2024 02:08
lambdalisue added a commit that referenced this pull request Aug 17, 2024