Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong encoding when calculating hash #93

Open
retog opened this issue Jul 12, 2021 · 5 comments
Open

Wrong encoding when calculating hash #93

retog opened this issue Jul 12, 2021 · 5 comments

Comments

@retog
Copy link

retog commented Jul 12, 2021

on
https://github.com/ssb-js/ssb-keys/blob/155db5133566fa7bc3ac1845291a3bf56503a635/util.js#L7
it defaults to the encoding "binary" which, according to https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings, is a legacy alias for "latin1".

The code should encode the strings as utf-8 instead which is the web standard and the only encoding supported by modern js implementations.

@staltz
Copy link
Member

staltz commented Jul 12, 2021

Unfortunately we can't change this without changing the SSB protocol, i.e. causing a breaking change.

@arj03
Copy link
Member

arj03 commented Jul 12, 2021

This is one of the reasons people have been wanting a new feed format ;-)

@retog
Copy link
Author

retog commented Jul 12, 2021

Unfortunately we can't change this without changing the SSB protocol, i.e. causing a breaking change.

The SSB Protocol as described/defined in the protocol-guide does not mandate iso-8859-1 encoding.

The references python implementation explicitly uses UTF-8:

https://github.com/pferreir/pyssb/blob/975467030a6deeae6c5078ff10d90949e9adca56/ssb/feed/models.py#L75

        return dumps(self.to_dict(add_signature=add_signature), indent=2).encode('utf-8')

One of the linked nodejs examples uses UTF-8 implicitly:

https://github.com/ssb-js/ssb-keys/blob/155db5133566fa7bc3ac1845291a3bf56503a635/index.js#L118

So I think the error is with this implementation and not the protocol. The protocol document could be more explicit and mandate the standard utf-8 encoding, but using an encoding that encodes only a tiny subset of the json character set and of the characters the protocol explicitly allows for the type key, isn't a justifiable choice for a protocol implementation.

@retog
Copy link
Author

retog commented Jul 12, 2021

This is one of the reasons people have been wanting a new feed format ;-)

Another reason would be to treat JSON as per the RFC rather than as an ordered collection of key/value pairs. Also, a standard for a signed tamper-proof feed could be used beyond Scuttlebutt. However, this is not a reason not to implement Scuttlebutt correctly according to a charitable interpretation of the spec.

@christianbundy
Copy link
Contributor

The SSB Protocol as described/defined in the protocol-guide does not mandate iso-8859-1 encoding.

I hate to say this, but that means there's a bug in the protocol guide. We all agree that these warts in the reference implementation are annoying and gross and we'd love to remove them, but doing so would break compatibility with the existing SSB network. There are feed formats that are vastly more reasonable than this one (example), but The Feed Format That Most People Use is still the one with the warts.

The references python implementation explicitly uses UTF-8:

That implementation is, unfortunately, incorrect. The readme says "Please, don't use this for anything that is not experimental. This is a first attempt at implementing the main functionality needed to run an SSB client/server.".

However, this is not a reason not to implement Scuttlebutt correctly according to a charitable interpretation of the spec.

The reference implementation came long before the spec, so the specification is descriptive rather than prescriptive. A "correct" implementation of Scuttlebutt is [unfortunately] one that agrees with the reference implementation. If you're interested, I've put together an SSB validation dataset that aims to collect the majority of these odd edge-cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants