Article: options and tradeoffs around data import parameters #1176

Open
@lidel

Most people are fine with whatever chunker and hash function are the current defaults in the commands that import data to IPFS.
In go-ipfs, these commands are ipfs add, ipfs dag put, and ipfs block put.

However, ipfs add accepts more than a custom --chunker and --hash function: one can also produce a TrickleDAG instead of a MerkleDAG by passing --trickle, enable or disable --raw-leaves, or even write custom software that chunks, hashes, and assembles a UnixFS DAG in novel ways.
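
To make the chunker tradeoff concrete, here is a minimal sketch comparing fixed-size chunking with a toy content-defined chunker. It is not the rabin or buzhash implementation used by go-ipfs: the boundary rule (a CRC32 over a sliding window), the window size, the mask, and all function names are assumptions chosen for illustration. It inserts a single byte at the front of a file and counts how many chunks keep the same hash.

```python
import hashlib
import random
import zlib

WINDOW = 16      # bytes examined when deciding on a boundary (illustrative value)
MASK = 0x3F      # boundary when the low 6 bits are zero: ~64-byte average chunks
FIXED_SIZE = 64  # fixed chunk size used for the comparison


def fixed_chunks(data: bytes, size: int = FIXED_SIZE) -> list:
    """Split at fixed offsets, like a size-based chunker."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes) -> list:
    """Split wherever a checksum of the last WINDOW bytes matches a pattern.

    Toy stand-in for a rolling-hash chunker: it recomputes CRC32 per
    position instead of rolling, and enforces no min/max chunk size.
    """
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks


def digests(chunks) -> set:
    """Set of sha2-256 digests, one per distinct chunk."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = b"!" + original  # one byte inserted at the front

fixed_shared = len(digests(fixed_chunks(original)) & digests(fixed_chunks(edited)))
cdc_original = digests(content_defined_chunks(original))
cdc_shared = len(cdc_original & digests(content_defined_chunks(edited)))

print(f"fixed-size: {fixed_shared} chunks shared after the edit")
print(f"content-defined: {cdc_shared} of {len(cdc_original)} chunks shared after the edit")
```

With fixed offsets, the inserted byte shifts every later chunk, so deduplication against the previous version is lost. With content-defined boundaries, only the chunk containing the edit changes. The price is extra computation during import and variable chunk sizes.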

One can go beyond that and import JSON data as dag-json or dag-cbor, creating data structures beyond regular files and directories.
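
Content-addressing structured data only works if the same logical value always serializes to the same bytes. The sketch below illustrates that idea with plain JSON, sorted keys, and no insignificant whitespace. It is not a dag-json or dag-cbor encoder (those codecs have additional rules, for example for links and byte strings), the digest is not a CID, and canonical_digest is a hypothetical helper name.

```python
import hashlib
import json


def canonical_digest(value) -> str:
    """Hash a JSON-compatible value using a deterministic byte encoding."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Same logical object, different key order in the source.
a = {"name": "example", "size": 3, "tags": ["x", "y"]}
b = {"tags": ["x", "y"], "size": 3, "name": "example"}

print(canonical_digest(a) == canonical_digest(b))  # key order does not matter
print(canonical_digest(a) == canonical_digest({**a, "size": 4}))  # content does
```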

We need an article that explains:

  • what is the current default when importing files and why
    • chunker (why we use size-based, when to use rabin or buzhash)
    • hash (why we use sha2-256)
    • raw leaves (available, and the default, when CIDv1 is used; legacy implementations used CIDv0 without raw leaves)
    • CID version
      • we should document CIDv1 as the default, but note that legacy implementations may use CIDv0
    • DAG type (is --trickle better suited for append-only data such as logs?)
  • what are the knobs one can change during import, and what is their impact/tradeoffs
  • things to hint at, without going too deep
    • note that dag-pb alternatives exist, mention dag-json and dag-cbor, and hint at when using non-UnixFS DAGs makes sense
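
As a rough illustration of how the hash, raw leaves, and CID version items combine, the sketch below builds a CIDv1 string by hand for a single raw block hashed with sha2-256. The byte values are the multicodec codes for CIDv1 (0x01), raw (0x55), and sha2-256 (0x12); each is below 0x80, so it fits in one varint byte. raw_leaf_cid is a hypothetical helper, not an API of any IPFS library, and it covers only the single-block case: files larger than one chunk get a dag-pb root node that this sketch does not build.

```python
import base64
import hashlib

CIDV1 = 0x01      # CID version 1
RAW_CODEC = 0x55  # multicodec code for raw binary blocks
SHA2_256 = 0x12   # multihash code for sha2-256


def raw_leaf_cid(block: bytes) -> str:
    """Return a base32 CIDv1 string for one raw block hashed with sha2-256."""
    digest = hashlib.sha256(block).digest()
    # multihash = <hash function code> <digest length> <digest bytes>
    multihash = bytes([SHA2_256, len(digest)]) + digest
    # CIDv1 = <version> <content codec> <multihash>
    cid_bytes = bytes([CIDV1, RAW_CODEC]) + multihash
    # multibase prefix "b" = lowercase base32 without padding
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32


cid = raw_leaf_cid(b"hello world\n")
print(cid)
```

The output starts with bafkrei, the prefix shared by all CIDv1 identifiers that use the raw codec with sha2-256. Under the assumption that a small file is imported as one raw leaf with CIDv1, the result should correspond to the CID that the import reports. Changing the hash function, the codec, or the CID version changes the identifier even though the file bytes are the same.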

Prior art:

Labels

  • need/author-input: Needs input from the original author
  • need/triage: Needs initial labeling and prioritization
  • status/blocked: Unable to be worked further until needs are met
