jarvis08/graph-convolutional-network

Repository files navigation

GCN example

1. Installation

For all-reduce distributed training or single-node training, TensorFlow 2.x is required.

To use the Parameter Server strategy, TensorFlow 2.4 or later is required.

$ pip install spektral==0.6.2
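The version requirements above can be checked programmatically. The sketch below is a minimal helper (not part of the repository) that validates a TensorFlow version string against the two constraints; pass `tf.__version__` to it in practice.

```python
def tf_version_ok(version_str, parameter_server=False):
    """Check a TensorFlow version string against the requirements above.

    All-reduce / single-node training needs TF 2.x; the Parameter
    Server strategy needs TF >= 2.4.
    """
    major, minor = (int(p) for p in version_str.split(".")[:2])
    if parameter_server:
        return (major, minor) >= (2, 4)
    return major >= 2
```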

1-1. Prepare dataset

The dataset must be downloaded manually (it is not public).

  1. Move the downloaded dataset into the Data directory

  2. cd Data

  3. tar -zxvf gdp_dataset.tgz

  4. Run python preprocess_dataset_v3.py
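Steps 2–3 above can also be done from Python with the standard library's tarfile module. This is an equivalent sketch, not repository code; the archive and destination paths mirror the commands above.

```python
import tarfile
from pathlib import Path

def extract_dataset(archive="Data/gdp_dataset.tgz", dest="Data"):
    """Extract the dataset tarball into the Data directory
    (equivalent to `cd Data && tar -zxvf gdp_dataset.tgz`)."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
```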

2. Training GCN

2-1. Train Model with Single Node

Run python train_gcn_v3.py at the root dir.

2-2. [All-reduce] Distributed Learning

  1. Set the nodes' IP addresses in the dist_gcn_v3.py file

  2. Run dist_gcn_v3.py on each node. The chief node passes 0 as its argument; the other workers use consecutive numbers starting from 1.

# Chief node
$ python dist_gcn_v3.py 0

# Other worker nodes
$ python dist_gcn_v3.py 1
$ python dist_gcn_v3.py 2
...
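The script's internals are not shown here, but an all-reduce launcher like dist_gcn_v3.py typically turns the worker index into a TF_CONFIG environment variable before creating a tf.distribute.MultiWorkerMirroredStrategy. The sketch below illustrates that mapping; the IP addresses and port are placeholders, not the real cluster configuration.

```python
import json
import os

# Placeholder addresses; the real ones are set inside dist_gcn_v3.py.
WORKERS = ["10.0.0.1:12345", "10.0.0.2:12345", "10.0.0.3:12345"]

def make_tf_config(task_index, workers=WORKERS):
    """Build the TF_CONFIG dict MultiWorkerMirroredStrategy expects.

    Index 0 is the chief; the other workers use 1, 2, ...
    """
    cfg = {
        "cluster": {"worker": list(workers)},
        "task": {"type": "worker", "index": task_index},
    }
    os.environ["TF_CONFIG"] = json.dumps(cfg)
    return cfg
```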

2-3. [Parameter Server] Distributed Learning

  • 1 Chief node & 2 Worker nodes & 1 PS node

# Node 1
$ python dist_ps_gcn_v2.py 0

# Node 2
$ python dist_ps_gcn_v2.py 1

# Node 3
$ python dist_ps_gcn_v2.py 2

# Node 4
$ python dist_ps_gcn_v2.py 3

  • 1 Chief node & 3 [Worker + PS] nodes

# Node 1
$ python dist_ps_gcn_v3.py chief
 
# Node 2
$ python dist_ps_gcn_v3.py worker 1
$ python dist_ps_gcn_v3.py ps 1

# Node 3
$ python dist_ps_gcn_v3.py worker 2
$ python dist_ps_gcn_v3.py ps 2

# Node 4
$ python dist_ps_gcn_v3.py worker 3
$ python dist_ps_gcn_v3.py ps 3
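The dist_ps_gcn_v3.py commands above pass a role name and, for non-chief roles, an index. As a rough sketch of how such arguments map to a TensorFlow cluster task (the actual script's parsing may differ), the role names mirror the commands above:

```python
def parse_role(argv):
    """Map dist_ps_gcn_v3.py-style arguments to a (task_type, task_index)
    pair, e.g. ["worker", "2"] -> ("worker", 2). The chief takes no index."""
    role = argv[0]
    if role == "chief":
        return ("chief", 0)
    return (role, int(argv[1]))
```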

3. Model Checkpoint & Logging

Training results are saved to Model_v3/[datetime]/FOLD-[CV]/ (single node) or Model_dist_v3/[datetime]/FOLD-[CV]/ (distributed).

The log file (train.log) is saved to the same path as the model.

In distributed training, only the chief node saves results.
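The result paths above can be built with a small helper. The sketch below is illustrative only: the exact [datetime] format used by the scripts is not specified here, so the strftime pattern is an assumption.

```python
import datetime
import os

def checkpoint_dir(cv_fold, distributed=False, now=None):
    """Build the result path Model_v3/[datetime]/FOLD-[CV]/ (or the
    Model_dist_v3 variant). The "%Y%m%d-%H%M%S" datetime format is an
    assumption, not taken from the actual scripts."""
    root = "Model_dist_v3" if distributed else "Model_v3"
    now = now or datetime.datetime.now()
    return os.path.join(root, now.strftime("%Y%m%d-%H%M%S"), f"FOLD-{cv_fold}")
```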
