Speed comparison with cKDTree in Python #100

LimHyungTae · 2025-01-08T20:24:20Z

I experienced the fast speed of small-gicp and wanted to use it in my Python code as follows:

import yaml
import small_gicp
import open3d as o3d
import numpy as np
from scipy.spatial import cKDTree
import time

if __name__ == "__main__":

    full_pcd = o3d.io.read_point_cloud("/home/shapelim/tmp/scene_rgb.ply")

    start = time.time()
    locs_in = np.array(full_pcd.points)

    tree_pcd = cKDTree(locs_in)

    dis, idx = tree_pcd.query(np.asarray(full_pcd.points), k=1, workers=-1)
    end = time.time()

    pcd1_cloud = small_gicp.PointCloud(locs_in)
    sg_tree = small_gicp.KdTree(pcd1_cloud)

    idx2 = []
    dis2 = []
    for point in locs_in:
        found, idx_tmp, sq_dist = sg_tree.nearest_neighbor_search(point)
        idx2.append(idx_tmp)
        dis2.append(sq_dist)

    end2 = time.time()

    k = 1  
    num_threads = 48

    sg_tree2 = small_gicp.KdTree(pcd1_cloud)
    k_indices, k_sq_dists = sg_tree2.batch_knn_search(locs_in, k, num_threads)
    end3 = time.time()

    print(end - start, "sec")
    print(end2 - end, "sec")
    print(end3 - end2, "sec")

However, surprisingly, it was slower than cKDTree in scipy.

If you have some time, could you please give me some advice on this situation? The point cloud file can be downloaded here: https://drive.google.com/file/d/1JCwTGCj8hMwKz5IDhKeWM3F2HuNN0vYJ/view?usp=drive_link

The text was updated successfully, but these errors were encountered:

LimHyungTae · 2025-01-08T20:24:40Z

Output:

0.5939619541168213 sec
2.543389320373535 sec
2.479418992996216 sec

The second one's slower speed might make sense, but I don't understand why the third one is dramatically slower than the first one.

koide3 · 2025-01-09T08:39:31Z

Thanks for reporting the issue. I'm so curious about the performance gap.
Could you change the permission of the download link so that I can download it?

LimHyungTae · 2025-01-09T14:47:48Z

Oh my bad. I think now it should work. Could you retry it?: https://drive.google.com/file/d/1JCwTGCj8hMwKz5IDhKeWM3F2HuNN0vYJ/view?usp=sharing

I also do not agree that the small_gicp is slower than the existing tree library.

koide3 · 2025-01-11T13:59:05Z

Thanks a lot! I did a quick benchmark and found that batch_knn_search allocated knn result data as std::vector<std::vector<size_t>> and std::vector<std::vector<size_t>> that were inefficient to allocate and access. This inefficient data structure affected the knn search performance when the number of queries was large.

I replaced them with simple Eigen::Matrix<size_t, -1, -1> and Eigen::Matrix<double, -1, -1> to improve the memory efficiency, and confirmed that the improved version is faster than cKDTree for most of the cases.

The benchmark code is as follows, which is based on your previous benchmark code. Please note that small_gicp.KdTree supports parallel tree construction, and thus num_threads is given to the constructor of KdTree in addition to batch_knn_search. Could you check if it can be reproduced in your environment? The improved code is pushed to #101.

#!/bin/env python3
import time
import itertools
import small_gicp
import open3d as o3d
import numpy as np
from scipy.spatial import cKDTree

if __name__ == "__main__":
    full_pcd = o3d.io.read_point_cloud("/home/koide/datasets/small_gicp/scene_rgb.ply")

    f = open('result.csv', 'w')
    print('num_threads,k,ckd_build,ckd_query,sg_build,sg_query')
    print('num_threads,k,ckd_build,ckd_query,sg_build,sg_query', file=f)
    def trial(k, num_threads):
        locs_in = np.array(full_pcd.points)
        start = time.time()
        tree_pcd = cKDTree(locs_in)
        mid = time.time()
        dis, idx = tree_pcd.query(np.asarray(full_pcd.points), k=k, workers=num_threads)
        end = time.time()

        ckd_times = mid - start, end - mid

        pcd1_cloud = small_gicp.PointCloud(locs_in)
        start = time.time()
        sg_tree2 = small_gicp.KdTree(pcd1_cloud, num_threads=num_threads)
        mid = time.time()
        k_indices, k_sq_dists = sg_tree2.batch_knn_search(locs_in, k, num_threads)
        end = time.time()

        sg_times = mid - start, end - mid

        print('{},{},{},{},{},{}'.format(num_threads, k, ckd_times[0], ckd_times[1], sg_times[0], sg_times[1]))
        print('{},{},{},{},{},{}'.format(num_threads, k, ckd_times[0], ckd_times[1], sg_times[0], sg_times[1]), file=f)

    ks = [1, 2, 3, 4, 5, 10, 20]
    num_threads = [1, 2, 4, 6, 8, 10, 12]

    for k, n in itertools.product(ks, num_threads):
        trial(k, n)

#!/bin/env python3
import pandas
from matplotlib import pyplot

def main():
  df = pandas.read_csv('result.csv')

  ks = df['k'].unique()

  fig, axes = pyplot.subplots(len(ks), 3, figsize=(15, 15))

  for i, k in enumerate(ks):
    data = df[df['k'] == k]

    axes[i, 0].plot(data['num_threads'], data['ckd_build'], label='ckd')
    axes[i, 1].plot(data['num_threads'], data['ckd_query'], label='ckd')
    axes[i, 2].plot(data['num_threads'], data['ckd_build'] + data['ckd_query'], label='ckd')

    axes[i, 0].plot(data['num_threads'], data['sg_build'], label='sg')
    axes[i, 1].plot(data['num_threads'], data['sg_query'], label='sg')
    axes[i, 2].plot(data['num_threads'], data['sg_build'] + data['sg_query'], label='sg')

    axes[i, 0].legend()
    axes[i, 1].legend()
    axes[i, 2].legend()

    axes[i, 0].set_title('build (k={})'.format(k))
    axes[i, 1].set_title('query (k={})'.format(k))
    axes[i, 2].set_title('total (k={})'.format(k))
    axes[i, 0].set_ylabel('time [s]')

    axes[i, 0].set_xlabel('num_threads')
    axes[i, 1].set_xlabel('num_threads')
    axes[i, 2].set_xlabel('num_threads')

  fig.tight_layout()
  pyplot.show()

if __name__ == '__main__':
  main()

Side note: cKDTree shows a slightly faster query time than that of small_gicp when k is set to a large value (e.g., k=10 and k=20). I guess this is because small_gicp uses a linear vector to manage knn results while cKDTree uses a heap that can be more efficient for managing many neighbors. I'm thinking to implement the heap data structure and switch the data container to it when k is large.

LimHyungTae · 2025-01-11T16:45:28Z

Yep, it has been reproduced! Now you can merge the branch to the main in #101.

It was also remarkable that we can build KdTree in multi-threading mode.

LimHyungTae · 2025-01-11T16:46:19Z

P.S. I don't mind it because what I only use is k=1, haha. Your coding skills are always remarkable. Happy new year and take care!

LimHyungTae closed this as completed Jan 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed comparison with cKDTree in Python #100

Speed comparison with cKDTree in Python #100

LimHyungTae commented Jan 8, 2025 •

edited

Loading

LimHyungTae commented Jan 8, 2025 •

edited

Loading

koide3 commented Jan 9, 2025

LimHyungTae commented Jan 9, 2025

koide3 commented Jan 11, 2025 •

edited

Loading

LimHyungTae commented Jan 11, 2025

LimHyungTae commented Jan 11, 2025

Speed comparison with cKDTree in Python #100

Speed comparison with cKDTree in Python #100

Comments

LimHyungTae commented Jan 8, 2025 • edited Loading

LimHyungTae commented Jan 8, 2025 • edited Loading

koide3 commented Jan 9, 2025

LimHyungTae commented Jan 9, 2025

koide3 commented Jan 11, 2025 • edited Loading

LimHyungTae commented Jan 11, 2025

LimHyungTae commented Jan 11, 2025

LimHyungTae commented Jan 8, 2025 •

edited

Loading

LimHyungTae commented Jan 8, 2025 •

edited

Loading

koide3 commented Jan 11, 2025 •

edited

Loading