Wrong encoding when calculating hash #93

retog · 2021-07-12T18:10:13Z

on
https://github.com/ssb-js/ssb-keys/blob/155db5133566fa7bc3ac1845291a3bf56503a635/util.js#L7
it defaults to the encoding "binary" which, according to https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings, is a legacy alias for "latin1".

The code should encode the strings as utf-8 instead which is the web standard and the only encoding supported by modern js implementations.

staltz · 2021-07-12T18:38:36Z

Unfortunately we can't change this without changing the SSB protocol, i.e. causing a breaking change.

arj03 · 2021-07-12T18:40:19Z

This is one of the reasons people have been wanting a new feed format ;-)

retog · 2021-07-12T20:04:08Z

Unfortunately we can't change this without changing the SSB protocol, i.e. causing a breaking change.

The SSB Protocol as described/defined in the protocol-guide does not mandate iso-8859-1 encoding.

The references python implementation explicitly uses UTF-8:

https://github.com/pferreir/pyssb/blob/975467030a6deeae6c5078ff10d90949e9adca56/ssb/feed/models.py#L75

        return dumps(self.to_dict(add_signature=add_signature), indent=2).encode('utf-8')

One of the linked nodejs examples uses UTF-8 implicitly:

https://github.com/ssb-js/ssb-keys/blob/155db5133566fa7bc3ac1845291a3bf56503a635/index.js#L118

So I think the error is with this implementation and not the protocol. The protocol document could be more explicit and mandate the standard utf-8 encoding, but using an encoding that encodes only a tiny subset of the json character set and of the characters the protocol explicitly allows for the type key, isn't a justifiable choice for a protocol implementation.

retog · 2021-07-12T20:15:45Z

This is one of the reasons people have been wanting a new feed format ;-)

Another reason would be to treat JSON as per the RFC rather than as an ordered collection of key/value pairs. Also, a standard for a signed tamper-proof feed could be used beyond Scuttlebutt. However, this is not a reason not to implement Scuttlebutt correctly according to a charitable interpretation of the spec.

christianbundy · 2021-07-12T20:59:35Z

The SSB Protocol as described/defined in the protocol-guide does not mandate iso-8859-1 encoding.

I hate to say this, but that means there's a bug in the protocol guide. We all agree that these warts in the reference implementation are annoying and gross and we'd love to remove them, but doing so would break compatibility with the existing SSB network. There are feed formats that are vastly more reasonable than this one (example), but The Feed Format That Most People Use is still the one with the warts.

The references python implementation explicitly uses UTF-8:

That implementation is, unfortunately, incorrect. The readme says "Please, don't use this for anything that is not experimental. This is a first attempt at implementing the main functionality needed to run an SSB client/server.".

However, this is not a reason not to implement Scuttlebutt correctly according to a charitable interpretation of the spec.

The reference implementation came long before the spec, so the specification is descriptive rather than prescriptive. A "correct" implementation of Scuttlebutt is [unfortunately] one that agrees with the reference implementation. If you're interested, I've put together an SSB validation dataset that aims to collect the majority of these odd edge-cases.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong encoding when calculating hash #93

Wrong encoding when calculating hash #93

retog commented Jul 12, 2021

staltz commented Jul 12, 2021

arj03 commented Jul 12, 2021

retog commented Jul 12, 2021

retog commented Jul 12, 2021

christianbundy commented Jul 12, 2021

Wrong encoding when calculating hash #93

Wrong encoding when calculating hash #93

Comments

retog commented Jul 12, 2021

staltz commented Jul 12, 2021

arj03 commented Jul 12, 2021

retog commented Jul 12, 2021

retog commented Jul 12, 2021

christianbundy commented Jul 12, 2021