We provide long-term prediction benchmark results on the KTH Action dataset using the $10\rightarrow 20$ frames prediction setting (10 input frames, 20 predicted frames). Metrics (MSE, MAE, SSIM, PSNR, LPIPS) are reported for the best model out of three trials. Parameters (M), FLOPs (G), and inference FPS on a V100 GPU are also reported for all methods. By default, models are trained for 100 epochs with the Adam optimizer, a total batch size of 16, and the OneCycle scheduler on a single GPU or 4 GPUs. We report the GPU setup used for each method (it is also shown in the config).
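As a rough illustration of the pixel-wise metrics named above, the sketch below implements the textbook definitions of MSE, MAE, and PSNR in plain Python. It is not the benchmark's evaluation code: the function names, the flat-list frame representation, the per-pixel mean reduction, and the 0-255 pixel range are all assumptions made for this example, so its numbers are not directly comparable to the table values. SSIM and LPIPS are omitted because they need windowed statistics and a pretrained network, respectively.

```python
import math


def mse(pred, true):
    """Mean squared error over two equal-length pixel sequences (assumed per-pixel mean)."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def mae(pred, true):
    """Mean absolute error over two equal-length pixel sequences (assumed per-pixel mean)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def psnr(pred, true, max_val=255.0):
    """Peak signal-to-noise ratio in dB; max_val is the assumed peak pixel value."""
    err = mse(pred, true)
    if err == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / err)


# Toy 4-pixel "frames" (hypothetical values, not KTH data).
pred = [0.0, 128.0, 255.0, 64.0]
true = [0.0, 120.0, 250.0, 64.0]
print("MSE :", mse(pred, true))   # 22.25
print("MAE :", mae(pred, true))   # 3.25
print("PSNR:", psnr(pred, true))
```

Lower is better for MSE, MAE, and LPIPS, while higher is better for SSIM and PSNR.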
- For a fair comparison of different methods, we provide config files in configs/kth. Note that `4xbs4` indicates DDP training on 4 GPUs with a batch size of 4 on each GPU.
- Config files for the SimVP-based MetaFormer (MetaVP) models are provided in configs/kth/simvp.
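The GPU setup tags used in the tables (`1xbs16`, `2xbs8`, `4xbs4`) all encode a number of GPUs and a per-GPU batch size. A minimal sketch of how such a tag could be decoded is shown below; `parse_gpu_setup` and `total_batch_size` are hypothetical helpers written for this example and are not part of the repository.

```python
import re


def parse_gpu_setup(tag):
    """Split a tag such as '4xbs4' into (num_gpus, per_gpu_batch_size)."""
    match = re.fullmatch(r"(\d+)xbs(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognised GPU setup tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def total_batch_size(tag):
    """Effective batch size across all GPUs for a given setup tag."""
    num_gpus, per_gpu = parse_gpu_setup(tag)
    return num_gpus * per_gpu


for tag in ("1xbs16", "2xbs8", "4xbs4"):
    print(tag, "->", total_batch_size(tag))
```

Every setup listed in the tables works out to the same total batch size of 16, which matches the default training setup described above.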
STL Benchmarks on KTH

| Method | GPUs | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | LPIPS | Download |
|---|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM | 1xbs16 | 14.9M | 1368.0G | 16 | 47.65 | 445.5 | 0.8977 | 26.99 | 0.26686 | model \| log |
| E3D-LSTM | 2xbs8 | 53.5M | 217.0G | 17 | 136.40 | 892.7 | 0.8153 | 21.78 | 0.48358 | model \| log |
| PredNet | 1xbs16 | 12.5M | 3.4G | 399 | 152.11 | 783.1 | 0.8094 | 22.45 | 0.32159 | model \| log |
| PhyDNet | 1xbs16 | 3.1M | 93.6G | 58 | 91.12 | 765.6 | 0.8322 | 23.41 | 0.50155 | model \| log |
| MAU | 1xbs16 | 20.1M | 399.0G | 8 | 51.02 | 471.2 | 0.8945 | 26.73 | 0.25442 | model \| log |
| MIM | 1xbs16 | 39.8M | 1099.0G | 17 | 40.73 | 380.8 | 0.9025 | 27.78 | 0.18808 | model \| log |
| PredRNN | 1xbs16 | 23.6M | 2800.0G | 7 | 41.07 | 380.6 | 0.9097 | 27.95 | 0.21892 | model \| log |
| PredRNN++ | 1xbs16 | 38.3M | 4162.0G | 5 | 39.84 | 370.4 | 0.9124 | 28.13 | 0.19871 | model \| log |
| PredRNN.V2 | 1xbs16 | 23.6M | 2815.0G | 7 | 39.57 | 368.8 | 0.9099 | 28.01 | 0.21478 | model \| log |
| DMVFN | 1xbs16 | 3.5M | 0.88G | 727 | 59.61 | 413.2 | 0.8976 | 26.65 | 0.12842 | model \| log |
| SimVP+IncepU | 2xbs8 | 12.2M | 62.8G | 77 | 41.11 | 397.1 | 0.9065 | 27.46 | 0.26496 | model \| log |
| SimVP+gSTA | 4xbs4 | 15.6M | 76.8G | 53 | 45.02 | 417.8 | 0.9049 | 27.04 | 0.25240 | model \| log |
| TAU | 4xbs4 | 15.0M | 73.8G | 55 | 45.32 | 421.7 | 0.9086 | 27.10 | 0.22856 | model \| log |

Benchmark of MetaFormers Based on SimVP (MetaVP)

| MetaFormer | GPUs | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | LPIPS | Download |
|---|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 2xbs8 | 12.2M | 62.8G | 77 | 41.11 | 397.1 | 0.9065 | 27.46 | 0.26496 | model \| log |
| gSTA (SimVPv2) | 2xbs8 | 15.6M | 76.8G | 53 | 45.02 | 417.8 | 0.9049 | 27.04 | 0.25240 | model \| log |
| ViT | 2xbs8 | 12.7M | 112.0G | 28 | 56.57 | 459.3 | 0.8947 | 26.19 | 0.27494 | model \| log |
| Swin Transformer | 2xbs8 | 15.3M | 75.9G | 65 | 45.72 | 405.7 | 0.9039 | 27.01 | 0.25178 | model \| log |
| Uniformer | 2xbs8 | 11.8M | 78.3G | 43 | 44.71 | 404.6 | 0.9058 | 27.16 | 0.24174 | model \| log |
| MLP-Mixer | 2xbs8 | 20.3M | 66.6G | 34 | 57.74 | 517.4 | 0.8886 | 25.72 | 0.28799 | model \| log |
| ConvMixer | 2xbs8 | 1.5M | 18.3G | 175 | 47.31 | 446.1 | 0.8993 | 26.66 | 0.28149 | model \| log |
| Poolformer | 2xbs8 | 12.4M | 63.6G | 67 | 45.44 | 400.9 | 0.9065 | 27.22 | 0.24763 | model \| log |
| ConvNeXt | 2xbs8 | 12.5M | 63.9G | 72 | 45.48 | 428.3 | 0.9037 | 26.96 | 0.26253 | model \| log |
| VAN | 2xbs8 | 14.9M | 73.8G | 55 | 45.05 | 409.1 | 0.9074 | 27.07 | 0.23116 | model \| log |
| HorNet | 2xbs8 | 15.3M | 75.3G | 58 | 46.84 | 421.2 | 0.9005 | 26.80 | 0.26921 | model \| log |
| MogaNet | 2xbs8 | 15.6M | 76.7G | 48 | 42.98 | 418.7 | 0.9065 | 27.16 | 0.25146 | model \| log |
| TAU | 2xbs8 | 15.0M | 73.8G | 55 | 45.32 | 421.7 | 0.9086 | 27.10 | 0.22856 | model \| log |