Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InMemoryKache::getOrPut fails in high concurrency context #239

Open
vfsfitvnm opened this issue Jun 8, 2024 · 3 comments
Open

InMemoryKache::getOrPut fails in high concurrency context #239

vfsfitvnm opened this issue Jun 8, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@vfsfitvnm
Copy link

Hi 😸
I noticed I was getting NullPointerExceptions when using SHA256KeyHasher.
I could narrow it down to InMemoryKache - the following test fails:

    @Test
    fun concurrency() = runTest {
        val cache = InMemoryKache<String, String>(maxSize = 1)

        (0..100000).map(Int::toString).forEach {
            launch { cache.getOrPut(it) { it }!! }
        }
    }

Stack trace:

java.lang.NullPointerException
	at com.mayakapps.kache.InMemoryKacheTest$concurrency$1$2$1.invokeSuspend(InMemoryKacheTest.kt:36)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
	at kotlinx.coroutines.test.TestDispatcher.processEvent$kotlinx_coroutines_test(TestDispatcher.kt:24)
	at kotlinx.coroutines.test.TestCoroutineScheduler.tryRunNextTaskUnless$kotlinx_coroutines_test(TestCoroutineScheduler.kt:99)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTest$2$1$workRunner$1.invokeSuspend(TestBuilders.kt:322)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
	at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:277)
	at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:95)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:69)
	at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
	at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:48)
	at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
	at kotlinx.coroutines.test.TestBuildersJvmKt.createTestResult(TestBuildersJvm.kt:10)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTest-8Mi8wO0(TestBuilders.kt:310)
	at kotlinx.coroutines.test.TestBuildersKt.runTest-8Mi8wO0(Unknown Source)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTest-8Mi8wO0(TestBuilders.kt:168)
	at kotlinx.coroutines.test.TestBuildersKt.runTest-8Mi8wO0(Unknown Source)
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt.runTest-8Mi8wO0$default(TestBuilders.kt:160)
	at kotlinx.coroutines.test.TestBuildersKt.runTest-8Mi8wO0$default(Unknown Source)
	at com.mayakapps.kache.InMemoryKacheTest.concurrency(InMemoryKacheTest.kt:32)
@vfsfitvnm vfsfitvnm added the bug Something isn't working label Jun 8, 2024
@paulrs
Copy link

paulrs commented Jul 18, 2024

We are seeing similar NPEs in this code.

Uncaught Kotlin exception: kotlin.NullPointerException
    at 0   AppEcc                              0x10587ad97        kfun:kotlin.Throwable#<init>(){} + 95 
    at 1   AppEcc                              0x105874167        kfun:kotlin.Exception#<init>(){} + 87 
    at 2   AppEcc                              0x105874387        kfun:kotlin.RuntimeException#<init>(){} + 87 
    at 3   AppEcc                              0x1058745a7        kfun:kotlin.NullPointerException#<init>(){} + 87 
    at 4   AppEcc                              0x1058af41f        ThrowNullPointerException + 131 
    at 5   AppEcc                              0x105d8f4d7        kfun:com.mayakapps.kache.SHA256KeyHasher.$transformCOROUTINE$0.invokeSuspend#internal + 783 
    at 6   AppEcc                              0x1059afd1f        kfun:kotlin.coroutines.native.internal.BaseContinuationImpl#invokeSuspend(kotlin.Result<kotlin.Any?>){}kotlin.Any?-trampoline + 67 

SHA256KeyHasher has the following code:

    override suspend fun transform(oldKey: String): String = hashedCache.getOrPut(oldKey) {
        oldKey.encodeUtf8().sha256().hex().uppercase()
    }!! // Since our creation function never returns null, we can use not-null assertion

Notice the comment at the end.

Now look at the following:

    override suspend fun getOrPut(key: K, creationFunction: suspend (key: K) -> V?): V? {
        get(key)?.let { return it }

        creationMutex.withLock {
            if (creationMap[key] == null && map[key] == null) {
                @Suppress("DeferredResultUnused")
                internalPutAsync(key, creationFunction) // <!--- ASYNC
            }
        }

        return get(key) // <!-- DID ASYNC ABOVE FINISH?
    }

I know nothing about coroutines in Kotlin, but I see an async call that I'm guessing is not guaranteed to finish before the return get call at the end of the method, resulting in a nasty race condition where sometimes you'll get a value and other times you will not. This seems extremely bad.

@djehrlich
Copy link

I was able to replicate the crash using the unit test provided above:

    @Test
    fun concurrency() = runTest {
        val cache = InMemoryKache<String, String>(maxSize = 1)

        (0..100000).map(Int::toString).forEach {
            launch { cache.getOrPut(it) { it }!! }
        }
    }

I found that by replacing the second call to get(key) in the getOrPut(…) function with a new function I called unsafeGet(key: K): V, the problem goes away. This is my unsafeGet(…) implementation:

private suspend fun unsafeGet(key: K): V = (getFromCreation(key) ?: getIfAvailable(key))!!

Since unsafeGet(…) is a copy of get(…), but with evictExpired() removed (and a force unwrap added), this suggests to me that evictExpired() may be triggering a race condition. However, my Kotlin experience is limited, so my confidence in this analysis is not particularly high.

As an aside, by calling get(…) twice, evictExpired() is being called twice. Is this really necessary in a single call to getOrPut(…)? Calling evictExpired() twice seems superfluous and possibly a performance issue.

@MSDarwish2000
Copy link
Member

Thank you for reporting the issue and investigating it, and sorry for being late.

I would like to further investigate and try to fix the issue in the next few days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants