This repository was archived by the owner on Apr 16, 2020. It is now read-only.

Commit 773ec8d

Matt Zumwalt committed
breaks up large datasets instructions into multiple pages
1 parent d4a25bb commit 773ec8d


3 files changed (+128, -124 lines)

Lines changed: 6 additions & 124 deletions
@@ -1,139 +1,21 @@
 Instructions for Replicating Large Amounts of Data with Minimal Overhead
 ====

-<!-- TOC depthFrom:1 depthTo:6 withLinks:1 updateOnSave:1 orderedList:0 -->
+## Who should use these instructions?

-- [Who should use these instructions?](#who-should-use-these-instructions)
-- [Instructions for Providers/Sources](#instructions-for-providerssources)
-- [TL;DR.](#tldr)
-- [Step 1: Use ipfs-pack to serve your datasets](#step-1-use-ipfs-pack-to-serve-your-datasets)
-- [Step 2: Get the multiaddr of your node](#step-2-get-the-multiaddr-of-your-node)
-- [Instructions for Mirrors](#instructions-for-mirrors)
-- [TL;DR.](#tldr)
-- [Step 1: Install and Initialize IPFS](#step-1-install-and-initialize-ipfs)
-- [Step 2: Turn off the Noisy Bits](#step-2-turn-off-the-noisy-bits)
-- [Step 3: Manually Set Up Routing](#step-3-manually-set-up-routing)
-- [Step 4: Run the node with auto-routing turned off](#step-4-run-the-node-with-auto-routing-turned-off)
-- [Step 5: Pin the Data on Your node](#step-5-pin-the-data-on-your-node)
-
-<!-- /TOC -->
-
-# Who should use these instructions?
-
-These instructions apply to either of these cases:
+The instructions here apply to either of these cases:
 * You are replicating data between a set of known peers.
 * You want to set up a _seed node_ with a public IP address that everyone can pull the initial copies of the data from.

 For example, after Jack [downloads all of the data.gov datasets](https://github.com/ipfs/archives/issues/113), he needs a way to let his collaborators replicate the datasets efficiently. Eventually they will all be able to serve the data on the general, distributed network of IPFS peers, but the first priority is to pass complete replicas to the collaborators who've allocated enough storage to hold the entire dataset. Once they've populated those seed nodes, it will be easier to provide the data to the rest of the network.

 In order to make this replication process as efficient as possible, we turn off the DHT and peer auto-discovery features. These features are important, amazing parts of IPFS but they slow down the system, eat up a bunch of bandwidth, and they're not needed in a situation where you already know which nodes you'll be communicating with.

-# Instructions for Providers/Sources
-
-If you have datasets on your machine and want to serve them over the network, follow these instructions. They will walk you through using ipfs-pack to register and serve your data.
-
-## TL;DR.
-
-If you just want to run the commands without explanation, here's what you need to do. _This assumes that you've already installed [ipfs-pack](https://github.com/ipfs/ipfs-pack)._
-
-It's best if you do this on a machine that has a public IP address.
-
-```
-cd /path-to-your/dataset-directory
-ipfs-pack make
-ipfs-pack serve
-```
-
-And then retrieve the Pack Root and the multiaddr for the node you just started. This uses info that was printed out on the console when you ran `ipfs pack serve`. To learn how to retrieve the multiaddr, see [Step 2: Get the multiaddr of your node](#step-2-get-the-multiaddr-of-your-node) Give the multiaddr and Pack Root to the people who are setting up Mirrors.
-
-## Step 1: Use ipfs-pack to serve your datasets
-
-Follow the instructions in [the ipfs-pack tutorial](https://github.com/ipfs/ipfs-pack/blob/master/tutorial/README.md), which covers installing ipfs-pack, initializing a pack, and serving the contents of your pack on the IPFS network.
-
-## Step 2: Get the multiaddr of your node
-
-This works best if you're running ipfs on a machine with a public IP address.
-
-After starting the ipfs node with `ipfs-pack serve`, you will see some info about the node printed on the console. It will look like:
-
-```
-verified pack, starting server...
-Serving data in this pack...
-Peer ID: QmVbXV7mQ5Fs3tYY2Euek5YdkkzcRafUg8qGWvFdgaBMuo
-/ip4/127.0.0.1/tcp/58162
-/ip4/1.2.3.4/tcp/58162
-Pack root is QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4
-Shared: 0 blocks, 0 B total data uploaded
-```
-
-The multiaddr is the public IPv4 address plus the Pack Peer ID, so for the sample output above, your pack's multiaddr would be `/ip4/1.2.3.4/tcp/58162/ipfs/QmVbXV7mQ5Fs3tYY2Euek5YdkkzcRafUg8qGWvFdgaBMuo`
-
-The pack root in this sample is `QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4`.
-
-Give that multiaddr to the people who are setting up Mirrors. They will use the multiaddr to bootstrap their network connections based on your node. This will make the connections between your nodes more efficient because they're establishing point-to-point connections with your node.
-
-# Instructions for Mirrors
-
-If you want to mirror data that someone else has published, you can follow these instructions to efficiently replicate the data onto your IPFS node.
-
-## TL;DR.
-
-If you just want to run the commands without explanation, here's what you need to do. _This assumes that you've already [installed ipfs](https://flyingzumwalt.gitbooks.io/decentralized-web-primer/content/install-ipfs/lessons/download-and-install.html) -- you need version 0.4.5 or higher._
-
-Before doing these steps, you need to get the multiaddr of the provider node that you're replicating data from and the pack root of the dataset you're replicating. The multiaddr will look like `/ip4/1.2.3.4/tcp/9999/ipfs/QmIpfsPackPeerId`. The Pack Root hash will look like `QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4` In these instructions, these values have been replaced with MULTIADDR-OF-PROVIDER and PACK-ROOT-HASH because they are unique for every node.
-
-```
-ipfs init
-ipfs config --json Datastore.NoSync true
-ipfs config Reprovider.Interval "0"
-ipfs bootstrap rm --all
-ipfs bootstrap add MULTIADDR-OF-PROVIDER
-
-# then start the daemon without auto-routing:
-ipfs daemon --routing=none
-
-# then pin the data on your node
-ipfs pin PACK-ROOT-HASH
-```
-
-## Step 1: Install and Initialize IPFS
-
-If you have not already installed IPFS, follow the lesson on [Installing and Initializing IPFS](https://flyingzumwalt.gitbooks.io/decentralized-web-primer/content/install-ipfs/) in the Decentralized Web Primer. _You need ipfs version 0.4.5 or higher._
-
-## Step 2: Turn off the Noisy Bits
-
-These arguments will turn off some IPFS features that are important for everyday use of IPFS, but would slow down the process of replicating the datasets. After replicating the datasets, you can turn them back on and restart your ipfs node.
-
-```
-ipfs config --json Datastore.NoSync true
-ipfs config Reprovider.Interval "0"
-```
-
-## Step 3: Manually Set Up Routing
-
-Before starting the daemon, configure your node to connect directly with the main node that's providing your dataset. To do this, you need the multiaddr for that node.
-
-When you run these commands, replace `MULTIADDR-OF-PROVIDER` with the multiaddr you got from the people who are providing the source dataset.
-
-```
-ipfs bootstrap rm --all
-ipfs bootstrap add MULTIADDR-OF-PROVIDER
-```
-
-## Step 4: Run the node with auto-routing turned off
-
-Make sure you've manually configured routing before you start the daemon (see previous step).
-
-Start the ipfs daemon with auto-routing turned off:
+## Different Instructions for Sources and Mirrors

-```
-ipfs daemon --routing=none
-```
+The instructions here depend on whether you are **providing** new data onto the ipfs network or seeking to replicate and **mirror** data that someone else has already provided onto the network.

-## Step 5: Pin the Data on Your node
+**Sources/Providers**: If you have datasets on your machine and want to add them to ipfs so you can serve them over the network, follow the [Instructions for Providers/Sources Publishing Large Datasets](providers-instructions.md). They will walk you through using ipfs-pack to register and serve your data.

-Now you're ready to replicate the data by pinning it onto your ipfs node. To do this, you need the Root Hash of the datasets. You need to get that PACK-ROOT-HASH from the people who are providing the dataset.

-```
-ipfs pin PACK-ROOT-HASH
-```
+**Mirrors**: If you want to mirror data that someone else has published, you can follow [Instructions for Mirrors Replicating Large Datasets](mirrors-instructions.md) to efficiently replicate the data onto your IPFS node.
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
Instructions for Mirrors Replicating Large Datasets
======

_This is part of the Instructions for Replicating Large Amounts of Data. Please read the [Overview](README.md) before proceeding._

If you want to mirror data that someone else has published, you can follow these instructions to efficiently replicate the data onto your IPFS node.

By contrast, if you have datasets on your machine and want to add them to ipfs so you can serve them over the network, follow the [Instructions for Providers/Sources Publishing Large Datasets](providers-instructions.md).

## TL;DR.

If you just want to run the commands without explanation, here's what you need to do. _This assumes that you've already [installed ipfs](https://flyingzumwalt.gitbooks.io/decentralized-web-primer/content/install-ipfs/lessons/download-and-install.html) -- you need version 0.4.5 or higher._

Before doing these steps, you need to get the multiaddr of the provider node that you're replicating data from and the pack root of the dataset you're replicating. The multiaddr will look like `/ip4/1.2.3.4/tcp/9999/ipfs/QmIpfsPackPeerId`. The Pack Root hash will look like `QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4`. In these instructions, these values have been replaced with MULTIADDR-OF-PROVIDER and PACK-ROOT-HASH because they are different for every provider node and dataset.

```
ipfs init
ipfs config --json Datastore.NoSync true
ipfs config Reprovider.Interval "0"
ipfs bootstrap rm --all
ipfs bootstrap add MULTIADDR-OF-PROVIDER

# then start the daemon without auto-routing:
ipfs daemon --routing=none

# then, in a second terminal, pin the data on your node
ipfs pin add PACK-ROOT-HASH
```

## Step 1: Install and Initialize IPFS

If you have not already installed IPFS, follow the lesson on [Installing and Initializing IPFS](https://flyingzumwalt.gitbooks.io/decentralized-web-primer/content/install-ipfs/) in the Decentralized Web Primer. _You need ipfs version 0.4.5 or higher._

## Step 2: Turn off the Noisy Bits

These settings turn off some IPFS features that are important for everyday use but would slow down the process of replicating the datasets. `Datastore.NoSync` stops the datastore from flushing every write to disk, and setting `Reprovider.Interval` to `"0"` stops your node from periodically re-announcing the content it holds. After replicating the datasets, you can turn them back on and restart your ipfs node.

```
ipfs config --json Datastore.NoSync true
ipfs config Reprovider.Interval "0"
```
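Once replication has finished, the two settings can be put back. The bash sketch below assumes the go-ipfs defaults are `false` for `Datastore.NoSync` and `12h` for `Reprovider.Interval` (compare against `ipfs config show` on a freshly initialized repo if in doubt), and it also re-adds the default bootstrap peers that Step 3 removes. The `restore_defaults` function and the `IPFS` variable are hypothetical names; `IPFS` only exists so the commands can be previewed with `IPFS=echo`.

```shell
# Hypothetical override: set IPFS=echo to print the commands instead of running them.
IPFS="${IPFS:-ipfs}"

# Sketch: undo the replication-only settings. The values assume go-ipfs defaults
# (NoSync=false, Reprovider.Interval=12h); check them against your version.
restore_defaults() {
  "$IPFS" config --json Datastore.NoSync false
  "$IPFS" config Reprovider.Interval 12h
  # Re-add the default bootstrap peers that were removed in Step 3.
  "$IPFS" bootstrap add default
}
```

After running `restore_defaults`, restart the daemon without `--routing=none` so the node rejoins the wider network.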

## Step 3: Manually Set Up Routing

Before starting the daemon, configure your node to connect directly with the main node that's providing your dataset. To do this, you need the multiaddr for that node.

When you run these commands, replace `MULTIADDR-OF-PROVIDER` with the multiaddr you got from the people who are providing the source dataset.

```
ipfs bootstrap rm --all
ipfs bootstrap add MULTIADDR-OF-PROVIDER
```
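A common mistake is to copy the provider's loopback address (`/ip4/127.0.0.1/...`) or a placeholder instead of its public multiaddr. The bash sketch below checks that a value has the shape shown in the provider's sample output: an IPv4 address, a TCP port, and a base58 `Qm...` peer ID. The function name `check_provider_addr` is hypothetical. It checks only the shape: it does not verify octet ranges, does not accept other multiaddr forms such as `/ip6/`, and does not confirm that the provider is reachable.

```shell
# Sketch: sanity-check a provider multiaddr before passing it to `ipfs bootstrap add`.
# Accepts only /ip4/<address>/tcp/<port>/ipfs/<peer ID>, where the peer ID is a
# 46-character base58 string starting with "Qm" (as in the sample output).
check_provider_addr() {
  local addr="$1"
  local re='^/ip4/([0-9]{1,3}\.){3}[0-9]{1,3}/tcp/[0-9]{1,5}/ipfs/Qm[1-9A-HJ-NP-Za-km-z]{44}$'
  if [[ ! $addr =~ $re ]]; then
    echo "not an /ip4/.../tcp/.../ipfs/Qm... multiaddr: $addr" >&2
    return 1
  fi
  # A loopback address only reaches the provider from the provider's own machine.
  if [[ $addr == /ip4/127.* ]]; then
    echo "loopback address cannot reach a remote provider: $addr" >&2
    return 1
  fi
  return 0
}
```

For example, `check_provider_addr "$PROVIDER_ADDR" && ipfs bootstrap add "$PROVIDER_ADDR"`, where `PROVIDER_ADDR` is a placeholder variable holding the multiaddr you received.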

## Step 4: Run the node with auto-routing turned off

Make sure you've manually configured routing before you start the daemon (see previous step).

Start the ipfs daemon with auto-routing turned off:

```
ipfs daemon --routing=none
```
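Because `ipfs daemon` stays in the foreground, the pin in Step 5 has to be run from a second terminal. If you would rather script the whole mirror setup, the bash sketch below starts the daemon in the background and polls its log for `Daemon is ready`, the line go-ipfs prints once startup has finished. The function name, the default log path, and the 60-second limit are hypothetical. Matching on log text is also an assumption that could break if a later version changes the message.

```shell
# Hypothetical override: point IPFS at another binary (or a test stub) if needed.
IPFS="${IPFS:-ipfs}"

# Sketch: start the daemon with auto-routing off, then wait up to $2 seconds
# (default 60) for "Daemon is ready" to appear in the log file $1.
start_daemon_without_routing() {
  local log="${1:-ipfs-daemon.log}" max_wait="${2:-60}" i
  "$IPFS" daemon --routing=none >"$log" 2>&1 &
  for ((i = 0; i < max_wait; i++)); do
    if grep -q 'Daemon is ready' "$log" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "daemon did not report ready within ${max_wait}s; see $log" >&2
  return 1
}
```

On a timeout, the function returns without stopping the backgrounded daemon, so check the log and stop the process yourself if needed.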

## Step 5: Pin the Data on Your node

Now you're ready to replicate the data by pinning it onto your ipfs node. To do this, you need the Pack Root hash of the dataset, which you get from the people who are providing it. With the daemon from Step 4 still running, run this in a second terminal, replacing `PACK-ROOT-HASH` with that value:

```
ipfs pin add PACK-ROOT-HASH
```
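To catch a mistyped hash before starting a long transfer, the pin can be wrapped in a shape check. The bash sketch below assumes the Pack Root is a base58 `Qm...` hash like the one in the TL;DR example; `pin_pack_root` and the `IPFS` override are hypothetical names.

```shell
# Hypothetical override: set IPFS=echo to print the command instead of running it.
IPFS="${IPFS:-ipfs}"

# Sketch: pin a Pack Root after checking it looks like a 46-character base58 "Qm..." hash.
pin_pack_root() {
  local root="$1"
  local re='^Qm[1-9A-HJ-NP-Za-km-z]{44}$'
  if [[ ! $root =~ $re ]]; then
    echo "unexpected Pack Root hash: $root" >&2
    return 1
  fi
  # `ipfs pin add` pins recursively by default, so this returns only once the
  # whole dataset has been fetched (or the command fails).
  "$IPFS" pin add "$root"
}
```

Once it finishes, `ipfs pin ls --type=recursive PACK-ROOT-HASH` should list the hash.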
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
Instructions for Providers/Sources Publishing Large Datasets
=======

_This is part of the Instructions for Replicating Large Amounts of Data. Please read the [Overview](README.md) before proceeding._

If you have datasets on your machine and want to add them to ipfs so you can serve them over the network, follow these instructions. They will walk you through using ipfs-pack to register and serve your data.

By contrast, if you want to mirror data that someone else has published, you can follow [Instructions for Mirrors Replicating Large Datasets](mirrors-instructions.md) to efficiently replicate the data onto your IPFS node.

## TL;DR.

If you just want to run the commands without explanation, here's what you need to do. _This assumes that you've already installed [ipfs-pack](https://github.com/ipfs/ipfs-pack)._

It's best if you do this on a machine that has a public IP address.

```
cd /path-to-your/dataset-directory
ipfs-pack make
ipfs-pack serve
```

Then retrieve the Pack Root and the multiaddr for the node you just started from the info that `ipfs-pack serve` printed on the console. To learn how to work out the multiaddr, see [Step 2: Get the multiaddr of your node](#step-2-get-the-multiaddr-of-your-node). Give the multiaddr and Pack Root to the people who are setting up Mirrors.

## Step 1: Use ipfs-pack to serve your datasets

Follow the instructions in [the ipfs-pack tutorial](https://github.com/ipfs/ipfs-pack/blob/master/tutorial/README.md), which covers installing ipfs-pack, initializing a pack, and serving the contents of your pack on the IPFS network.

## Step 2: Get the multiaddr of your node

This works best if you're running ipfs on a machine with a public IP address.

After starting the ipfs node with `ipfs-pack serve`, you will see some info about the node printed on the console. It will look like:

```
verified pack, starting server...
Serving data in this pack...
Peer ID: QmVbXV7mQ5Fs3tYY2Euek5YdkkzcRafUg8qGWvFdgaBMuo
/ip4/127.0.0.1/tcp/58162
/ip4/1.2.3.4/tcp/58162
Pack root is QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4
Shared: 0 blocks, 0 B total data uploaded
```

The multiaddr is the public IPv4 address and TCP port, followed by `/ipfs/` and the Peer ID. For the sample output above, your pack's multiaddr would be `/ip4/1.2.3.4/tcp/58162/ipfs/QmVbXV7mQ5Fs3tYY2Euek5YdkkzcRafUg8qGWvFdgaBMuo`. Don't use the `/ip4/127.0.0.1/...` address: it is the loopback address and only works from the provider machine itself.

The pack root in this sample is `QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4`. This hash is the content-address for the ipfs-pack that contains your dataset.
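If you save the console output of `ipfs-pack serve` to a file (for example with `ipfs-pack serve 2>&1 | tee serve.log`, where `2>&1` is a precaution in case some lines go to stderr), both values can be extracted with standard tools. The bash sketch below assumes the output format shown above, and the function names are hypothetical. It picks the first non-loopback `/ip4/` line, which on a machine behind NAT may be a private LAN address rather than the public one, so check the result before handing it out.

```shell
# Sketch: build the provider multiaddr from saved `ipfs-pack serve` output.
# Assumes the "Peer ID: ..." and "/ip4/..." lines shown in the sample above.
pack_multiaddr() {
  local log="$1" peer addr
  peer="$(awk '/^Peer ID:/ { print $3; exit }' "$log")"
  # Skip the loopback address; keep the first remaining IPv4 address.
  addr="$(grep -E '^/ip4/' "$log" | grep -v '^/ip4/127\.' | head -n 1)"
  if [[ -z $peer || -z $addr ]]; then
    echo "could not find a Peer ID and a non-loopback address in $log" >&2
    return 1
  fi
  printf '%s/ipfs/%s\n' "$addr" "$peer"
}

# Sketch: print the hash from the "Pack root is ..." line.
pack_root() {
  awk '/^Pack root is/ { print $4; exit }' "$1"
}
```

For the sample output above, `pack_multiaddr serve.log` prints `/ip4/1.2.3.4/tcp/58162/ipfs/QmVbXV7mQ5Fs3tYY2Euek5YdkkzcRafUg8qGWvFdgaBMuo` and `pack_root serve.log` prints `QmRguPt6jHmVMzu1NM8wQmpoymM9UeqDJGXdQyU3GhiPy4`.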

## Step 3: Publish the Pack Root Hash and Give the multiaddr to your Mirrors

Give that multiaddr to the people who are setting up Mirrors. They will add it as the only bootstrap peer on their nodes, so their nodes connect point-to-point with yours instead of searching the network for peers. This makes replication more efficient.

The mirrors, and anyone else replicating your dataset, will also need the Pack Root hash. They will use that hash to pin your dataset onto their nodes.
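One way to hand over both values without transcription mistakes is to send mirrors the commands from their TL;DR with the placeholders already filled in. The bash sketch below only prints text and runs nothing. The `mirror_commands` name is hypothetical, and the printed commands follow the mirror instructions, using `ipfs pin add` for the pin step.

```shell
# Sketch: print the mirror-side commands with MULTIADDR-OF-PROVIDER and
# PACK-ROOT-HASH replaced by this provider's values. Nothing is executed.
mirror_commands() {
  local multiaddr="$1" pack_root="$2"
  if [[ -z $multiaddr || -z $pack_root ]]; then
    echo "usage: mirror_commands MULTIADDR PACK-ROOT-HASH" >&2
    return 1
  fi
  cat <<EOF
ipfs init
ipfs config --json Datastore.NoSync true
ipfs config Reprovider.Interval "0"
ipfs bootstrap rm --all
ipfs bootstrap add $multiaddr
ipfs daemon --routing=none
# in a second terminal, once the daemon is running:
ipfs pin add $pack_root
EOF
}
```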
