diff --git a/SWIPs/swip-mantaray-1_0.md b/SWIPs/swip-mantaray-1_0.md new file mode 100644 index 0000000..1c111ab --- /dev/null +++ b/SWIPs/swip-mantaray-1_0.md @@ -0,0 +1,189 @@ +--- +SWIP: +title: Mantaray 1.0 +author: Viktor Levente Tóth (@nugaon) +discussions-to: +status: Draft +type: Core +category (*only required for Standard Track): Core +created: 2021-12-08 +--- + + + +## Simple Summary + +The `mantaray` data-structure is widely used whithin the Swarm ecosystem; +just to mention one, `manifests` are built on the `mantaray` data-structure, which consist of paths and files mappings of all dApps live on Swarm. + +This proposal fixes some design issues of the current `mantaray v0.2` data-structure and additionally provides better metadata handling and makes possible to have reliable web3 services that deal with tree-like data-structures. + +## Abstract + +The objectives that this proposal seeks to achieve: +1. **handling matadata of mantaray node**: currently, by design, only the mantaray forks can have metadata, +2. **polishing mantaray flags**: some flags are not necessary or not clear why they were implemented, +3. **the elements of the mantaray node body have random access**: the forks define their own metadata length, which makes forks to have only sequential access. + +## Motivation + +(Mantaray 0.2 binary format can be found [here](https://github.com/ethersphere/bee/blob/377b43bb835261c074708ea87d09517bffdf7614/pkg/manifest/mantaray/docs/format/node.md)) + +Referring to the list in the Abstract: +1. There is a need to have accessible metadata on the node level about the node; one evidence is [the `/` path hack that also has some problems](https://github.com/ethersphere/bee/issues/2549). With the current metadata handling, it is always necessary to rely on such hacks that pollute the tree structure as well. +2. The `nodeType` encapsulated in the mantaray byte representation consists of the following flags: + - `value`: indicates whether the node's `entry` reference is defined or not. It is a useful flag, nevertheless, if it is zero, then 32 bytes of data is reseved for the `entry` anyway that does not carry any information. + - `edge`: the node has additional forks that are referring to other mantaray nodes. Again, it is useful, but still the dedicated compressed mapping of forks, namely `forksIndexBytes <32 bytes>` is not omitted if it is zero. + - `withPathSeparator`: it is unclear what this flag is used for. + - `withMetadata`: indicates whether the forks have metadata or not. It seems it just saves the 2 bytes `metadataBytesSize` under each fork. The problem here is the metadata length for forks should be the same under one node, but about it in the next point.\ + \ + the `nodeType` is defined on the fork, however, it could be a useful information on the node level as well. + In addition to these, there is `refByteSize`(1 byte) on the node level, that only describes whether the `entry` (32/64 bytes) is an encrypted reference or not which could be a 1 bit flag instead. +3. Because the forks can have different byte length under a node, the fork array does not offer random access. It is a huge problem, not only because the fork deserialisation is not optimal, but also because one cannot provide _inclusion proof_ on forks. Thereby, web3 services cannot work reliably with tree-like data-structures, because the proof would be incredibly expensive on the blockchain for one particular segment of fork data. +If the fork metadata length would be unified and defined by the node for each fork then this issue could be sorted out. + +## Specification + + +The specification elaborates the discrepancies from the current version and the new features. + +### Node + +Node binary format: +```text +┌────────────────────────────────┐ +│ obfuscationKey <32 byte> │ +├────────────────────────────────┤ +│ hash("mantaray:1.0") <31 byte> │ +├────────────────────────────────┤ +│ nodeFeatures <1 byte> │ +├────────────────────────────────┤ +│ entry <32/64 byte> │ +├────────────────────────────────┤ +│ forksIndexBytes <32 byte> │ +├────────────────────────────────┤ +│ ┌────────────────────────────┐ │ +│ │ Fork 1 │ │ +│ ├────────────────────────────┤ │ +│ │ ... │ │ +│ ├────────────────────────────┤ │ +│ │ Fork N │ │ -> where N maximum is 256 +│ └────────────────────────────┘ │ +├────────────────────────────────┤ +│ nodeMetadata │ +└────────────────────────────────┘ +``` + +#### nodeFeatures +- 3 bit flags + 5 bit forkMetadataSegmentSize +- flags + - hasEntry + - whether the node has `entry` field + - encEntry + - if `hasEntry` 0 it cannot be 1 + - the `entry` field is an encrypted reference and 64 bytes long + - edge + - whether the node has additional forks or not + - if 0, then [forkIndexBytes](#forkIndexBytes) is omitted +- forkMetadataSegmentSize + - the first 5 most significant bit in the `nodeFeatures` bitvector + - its `value * the segment size (32)` gives the reserved bytesize for metadata under each forkdata + - the size is only the local maximum for metadata size, it does not apply or define for the whole tree + +In order to get the byte length until the fork array (`nodeHeaderSize`) we have to add 64 (`obfuscationKey (32) + hash("mantaray:1.0") (31) + nodeFeatures (1)`) to the evaluation of the `nodeFeature`'s flags: +- `hasEntry` is true, then + 32 +- `encEntry` is true, then + 32 +- `edge` is true, then + 32 + +#### entry +- depends on the [nodeFeatures](#nodeFeatures) + - if `encEntry` is 1, then it has 64 bytes length + - if `hasEntry` is 0, it is omitted + +#### forkIndexBytes +- 256 long bitvector +- indicates and compresses the first bytes of the fork prefixes that the node has + - e.g. if there are forks with ASCII prefixes `foo` and `bar`, then the 98th and 102nd indices will be 1 of `forkIndexBytes` bitvector. +- depends on the [nodeFeatures](#nodeFeatures) + - if `edge` is 1, then it is 32 byte length and has at least one true bit in the bitvector. + - if `edge` is 0, then it is omitted + +#### Fork + +Fork binary format: +```text +┌───────────────────────────────┬──────────────────────────────┐ +│ prefixLength <1 byte> │ prefix <31 byte> │ +├───────────────────────────────┴──────────────────────────────┤ +│ forkReference <32/64 byte> │ +├──────────────────────────────────────────────────────────────┤ +│ forkMetadata │ +└──────────────────────────────────────────────────────────────┘ +``` + +- prefixLength + - if it is > 31 (`prefix` length) then the mantaray fork's node is a `continuous node` -> on `addFork` it has to be taken account (shorten the child's node prefix by the common part until `prefixLength <= 31`) +- forkReference + - if `encEntry` is 1, then it is 64 bytes long, othetwise it is 32 bytes long. + - it has to be interpreted as another Mantaray reference by default, but this can be changed based on the metadata regarding to the reference interpretation. Nevertheless, this SWIP handles `forkReference` always as a child Mantaray node reference. +- forkMetadata + - in structure the same as [nodeMetadata](#nodeMetadata) + - its length is defined by `forkMetadataSegmentSize * 32 bytes` + +The size of each element (`forkSize`) in the fork array can be calculated as `prefixLength (1 byte) + prefix (31 bytes) + mantarayReference (32/64 bytes) + forkMetadataSegmentSize * 32 bytes`. +The byte length of the fork array (`forkArraySize`) counts all true (1) bits in the `forksIndexBytes` and multiply that with `forkSize`. +The order of the fork array starts from the corresponding fork of the least significant true bit of the `forkIndexBytes`. + +#### nodeMetadata +- it is specified by another data structure + - on which you can also perform inclusion proof on demand + - it can consist of content ID references of metadata on swarm that can be cashed + - such as `Content-Type: index.hml` + - it can describe its own length. +- in Mantaray 1.0 version it can be JSON. +- the starting byte offset is taken by calculating the nodeHeaders + +The remaining bits after forks' data can be utilised for any arbitrary metadata. +By evaluating `nodeFeatures`, the byte length until the fork array is fixed (`nodeHeaderSize`) and then only the `forkSize` has to be added to get the offset for the `nodeMetadata`. + +## Rationale + + +Each separate part of the data-structure has been made for fitting into 32 byte segments in order to perform inclusion proof on that easily. + +The structure of the `nodeMetadata` and the `forkMetadata` are stringified JSON objects as in the current version. +Obviously, it is not the best solution to store metadata effectively, it may require the introduction of a new data-structure and its canonical serialisation format that offer provability. +We are planning to elaborate this on mantaray version `1.1`. + +The `forkMetadata` and the corresponding fork's `nodeMetadata` can be intentionally **different metadata**. At in-memory representation, `nodeMetadata` and `forkMetadata` can be retrieved separately or merged with `forkMetadata` that is superior of `nodeMetadata` at key collision. +The latter feature is useful if, for example, one wants to override the MIME type of an existing mantaray node because of a different way of displaying content. + +Nevertheless, the `forkMetadata` and the corresponding fork's `nodeMetadata` can be intentionally the **same metadata** which is provable by inclusion proof. +In this case, if fork has `nodeMetadata` and it has changed, then the corresponding parent's `forkMetadata` has to be updated __on saving__, and vice versa. + +## Backwards Compatibility + +- instead of the designated `/` mantaray node which is used for setting `website-index-document`, the node can have `nodeMetadata` that consists of the same information +- how to get `nodeType` values on `mantaray:1.0` node + - value + - `hasEntry` flag is 1 + - edge + - `edge` flag is 1 + - withPathSeparator + - couldn't identify what was its role before + - withMetadata + - `nodeMetadata`'s length is greater than 0 OR + - parent node sets its metadata if `forkMetadataSegmentSize` is greater than 0 +- the `addFork` and `removeFork` interfaces remain the same. +- obfuscation key handling method remains the same, but for every new mantaray node should have different `obfuscationKey` than its parent if the `obfuscationKey != zeroBytes` + +## Test Cases + +Unit and integration test can be found in [mantaray v1.0 PR in mantaray-js repository](https://github.com/ethersphere/mantaray-js/pull/30) + +## Implementation + +The full implementation of the proposed features can be found in [mantaray v1.0 PR in mantaray-js repository](https://github.com/ethersphere/mantaray-js/pull/30) + +## Copyright +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).