-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
maxAge should be stored in file instead of setTimeout, so cache is properly invalidated across several runs #289
Comments
Oof... I thought I had found a library that met my needs (memoize fns + ttl disk cache), but this makes the library kinda useless :/ I'd say you can't really claim that the library has a |
Hi @artsiommiksiuk, thanks for creating the issue. I think your use case makes sense and we should cover it. Though I think there is no need in "storing records age along with the data", because each cache file already includes meta data (including creation and modification time) which can be accessed using the file system API, specifically using fs.stat or fs.statSync. That being said, I cannot say when I will have time to work on this (I'm currently quite busy and I do need some code free time). I don't think the feature is difficult to implement, it just needs to be implemented in a way that it doesn't break existing projects (where a @vincerubinetti Sorry, but your comment is not helpful. |
Let me elaborate then. The first line of the readme is Node.js solution for memoizing/caching function results on the file system. That's what people are coming to this library for; otherwise there are several other more popular options ( First thing I did when finding this library was look for "disk" and "max age", and I saw both in the readme with no caveats. It's lucky I looked at the issues before I spent time installing, configuring, and debugging why it wasn't working. So here is my recommendation: # maxAge (**in-memory only**)
With `maxAge` option you can ensure that cache for given call is cleared after a predefined period of time (in milliseconds).
Note that the cache for this call is only cleared after `maxAge` milliseconds since its _first write call in the current Node process_.
If you restart your Node process, the expiry countdown will start again after the first write. Looking at the code, I think that's correct. Not sure what happens when something has a For anyone looking for 1) persistent disk cache, 2) single cache file to not pollute file system/git diff, 3) function wrapper memoizer, and 4) per-function time-to-live, here is the simple solution I came up with since none of the available libraries met all those requirements: codeimport { createHash } from "crypto";
import { existsSync, readFileSync, unlinkSync, writeFileSync } from "fs";
import { log } from "@/util/log";
type Options = {
/** ttl (in seconds) */
maxAge?: number;
};
/** on-disk cache */
const cachePath = "./util/cache.json";
export const memoize =
<Args extends unknown[], Return>(
func: (...args: Args) => Promise<Return>,
options: Options = {},
) =>
async (...args: Args) => {
/** set options defaults */
options.maxAge ??= 24 * 60 * 60;
/** get cache key unique to function and arguments */
const key = func.name + func.toString() + JSON.stringify(args);
/** make shorter unique cache key */
const hash = createHash("md5").update(key).digest("hex");
/** possible cache hit */
const cached = cache[hash];
/** current timestamp */
const now = Date.now();
/** is cached entry expired */
const expired = now - (cached?.timestamp ?? 0) > options.maxAge * 1000;
/** return value */
let result: Return;
/** if cached value valid */
if (cached && !expired) {
log(`Using memoized return value for ${func.name}()`, "secondary");
/** use cached value */
result = cached.data as Return;
} else {
/** run function */
result = await func(...args);
/** set cache */
cache[hash] = { timestamp: now, data: result };
}
/** update cache */
saveCache();
return result;
};
/** global in-memory cache */
let cache: Cache = {};
/** cache store */
type Cache = Record<
string,
{
timestamp: number;
data: unknown;
}
>;
if (process.env.NOCACHE) {
/** clear disk cache */
unlinkSync(cachePath);
} else {
/** load disk cache into memory cache */
if (existsSync(cachePath))
cache = JSON.parse(readFileSync(cachePath, "utf-8")) as Cache;
}
/** save memory cache to disk cache */
const saveCache = () => writeFileSync(cachePath, JSON.stringify(cache)); Then using it: await memoize(someFunction)(arg1, arg2); |
Okay, then I'll add |
I think replacing the current
One thing we should pay attention to is the performance implication associated with this approach, since we would now need to access filesystem metadata. One way to reduce system calls and improve performance would be to use Alternatively we could store a timestamp along with the data as @vincerubinetti has done it in his code example above (thanks for the input btw). But to me this feels like we are saving redundant data. Also, I wouldn't like to change the current data format because it would be a breaking change. @artsiommiksiuk If you like you can give it a try. |
@artsiommiksiuk @vincerubinetti I have created a PR that includes my proposed changes and published a release candidate to npm. Please install via |
@borisdiakur, thank you! I'm still busy with a lot of other issues I need to solve, but I'm going to try as soon as I have time. |
@borisdiakur , I've launched it on my system and code base (without even removing old cache) and it worked. Thanks! |
Thanks a lot @artsiommiksiuk ! |
My use case and confusion. I'm using memoize-fs to acquire jwt tokens for tests. Tokens living 1 hours, and their acquisition takes about 20 seconds, so memoize-fs effectively cutting those 20 seconds of from every test run.
maxAge is a timer based cache invalidation, however, because I'm using it to cache results across runs, I need invalidation happen based on the age of the cached data, not when memoization function was created.
This is at least to say confusion point, as documentation doesn't say anything about it. And I'd say, it's makes sense to store records age along with the data and verify this age on pull from disk. In this case maxAge would behave consistently with expectations (expectation is age of the data, not the memoization function instance in a process).
The text was updated successfully, but these errors were encountered: