Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed comparison with cKDTree in Python #100

Closed
LimHyungTae opened this issue Jan 8, 2025 · 6 comments
Closed

Speed comparison with cKDTree in Python #100

LimHyungTae opened this issue Jan 8, 2025 · 6 comments

Comments

@LimHyungTae
Copy link

LimHyungTae commented Jan 8, 2025

I experienced the fast speed of small-gicp and wanted to use it in my Python code as follows:

import yaml
import small_gicp
import open3d as o3d
import numpy as np
from scipy.spatial import cKDTree
import time

if __name__ == "__main__":

    full_pcd = o3d.io.read_point_cloud("/home/shapelim/tmp/scene_rgb.ply")

    start = time.time()
    locs_in = np.array(full_pcd.points)

    tree_pcd = cKDTree(locs_in)

    dis, idx = tree_pcd.query(np.asarray(full_pcd.points), k=1, workers=-1)
    end = time.time()

    pcd1_cloud = small_gicp.PointCloud(locs_in)
    sg_tree = small_gicp.KdTree(pcd1_cloud)

    idx2 = []
    dis2 = []
    for point in locs_in:
        found, idx_tmp, sq_dist = sg_tree.nearest_neighbor_search(point)
        idx2.append(idx_tmp)
        dis2.append(sq_dist)

    end2 = time.time()

    k = 1  
    num_threads = 48

    sg_tree2 = small_gicp.KdTree(pcd1_cloud)
    k_indices, k_sq_dists = sg_tree2.batch_knn_search(locs_in, k, num_threads)
    end3 = time.time()

    print(end - start, "sec")
    print(end2 - end, "sec")
    print(end3 - end2, "sec")

However, surprisingly, it was slower than cKDTree in scipy.

If you have some time, could you please give me some advice on this situation? The point cloud file can be downloaded here: https://drive.google.com/file/d/1JCwTGCj8hMwKz5IDhKeWM3F2HuNN0vYJ/view?usp=drive_link

@LimHyungTae
Copy link
Author

LimHyungTae commented Jan 8, 2025

Output:

0.5939619541168213 sec
2.543389320373535 sec
2.479418992996216 sec

The second one's slower speed might make sense, but I don't understand why the third one is dramatically slower than the first one.

@koide3
Copy link
Owner

koide3 commented Jan 9, 2025

Thanks for reporting the issue. I'm so curious about the performance gap.
Could you change the permission of the download link so that I can download it?

@LimHyungTae
Copy link
Author

Oh my bad. I think now it should work. Could you retry it?: https://drive.google.com/file/d/1JCwTGCj8hMwKz5IDhKeWM3F2HuNN0vYJ/view?usp=sharing

I also do not agree that the small_gicp is slower than the existing tree library.

@koide3
Copy link
Owner

koide3 commented Jan 11, 2025

Thanks a lot! I did a quick benchmark and found that batch_knn_search allocated knn result data as std::vector<std::vector<size_t>> and std::vector<std::vector<size_t>> that were inefficient to allocate and access. This inefficient data structure affected the knn search performance when the number of queries was large.

I replaced them with simple Eigen::Matrix<size_t, -1, -1> and Eigen::Matrix<double, -1, -1> to improve the memory efficiency, and confirmed that the improved version is faster than cKDTree for most of the cases.

kdtree_bench

The benchmark code is as follows, which is based on your previous benchmark code. Please note that small_gicp.KdTree supports parallel tree construction, and thus num_threads is given to the constructor of KdTree in addition to batch_knn_search. Could you check if it can be reproduced in your environment? The improved code is pushed to #101.

#!/bin/env python3
import time
import itertools
import small_gicp
import open3d as o3d
import numpy as np
from scipy.spatial import cKDTree

if __name__ == "__main__":
    full_pcd = o3d.io.read_point_cloud("/home/koide/datasets/small_gicp/scene_rgb.ply")

    f = open('result.csv', 'w')
    print('num_threads,k,ckd_build,ckd_query,sg_build,sg_query')
    print('num_threads,k,ckd_build,ckd_query,sg_build,sg_query', file=f)
    def trial(k, num_threads):
        locs_in = np.array(full_pcd.points)
        start = time.time()
        tree_pcd = cKDTree(locs_in)
        mid = time.time()
        dis, idx = tree_pcd.query(np.asarray(full_pcd.points), k=k, workers=num_threads)
        end = time.time()

        ckd_times = mid - start, end - mid

        pcd1_cloud = small_gicp.PointCloud(locs_in)
        start = time.time()
        sg_tree2 = small_gicp.KdTree(pcd1_cloud, num_threads=num_threads)
        mid = time.time()
        k_indices, k_sq_dists = sg_tree2.batch_knn_search(locs_in, k, num_threads)
        end = time.time()

        sg_times = mid - start, end - mid

        print('{},{},{},{},{},{}'.format(num_threads, k, ckd_times[0], ckd_times[1], sg_times[0], sg_times[1]))
        print('{},{},{},{},{},{}'.format(num_threads, k, ckd_times[0], ckd_times[1], sg_times[0], sg_times[1]), file=f)

    ks = [1, 2, 3, 4, 5, 10, 20]
    num_threads = [1, 2, 4, 6, 8, 10, 12]

    for k, n in itertools.product(ks, num_threads):
        trial(k, n)
#!/bin/env python3
import pandas
from matplotlib import pyplot

def main():
  df = pandas.read_csv('result.csv')

  ks = df['k'].unique()

  fig, axes = pyplot.subplots(len(ks), 3, figsize=(15, 15))

  for i, k in enumerate(ks):
    data = df[df['k'] == k]

    axes[i, 0].plot(data['num_threads'], data['ckd_build'], label='ckd')
    axes[i, 1].plot(data['num_threads'], data['ckd_query'], label='ckd')
    axes[i, 2].plot(data['num_threads'], data['ckd_build'] + data['ckd_query'], label='ckd')

    axes[i, 0].plot(data['num_threads'], data['sg_build'], label='sg')
    axes[i, 1].plot(data['num_threads'], data['sg_query'], label='sg')
    axes[i, 2].plot(data['num_threads'], data['sg_build'] + data['sg_query'], label='sg')

    axes[i, 0].legend()
    axes[i, 1].legend()
    axes[i, 2].legend()

    axes[i, 0].set_title('build (k={})'.format(k))
    axes[i, 1].set_title('query (k={})'.format(k))
    axes[i, 2].set_title('total (k={})'.format(k))
    axes[i, 0].set_ylabel('time [s]')

    axes[i, 0].set_xlabel('num_threads')
    axes[i, 1].set_xlabel('num_threads')
    axes[i, 2].set_xlabel('num_threads')

  fig.tight_layout()
  pyplot.show()

if __name__ == '__main__':
  main()

Side note: cKDTree shows a slightly faster query time than that of small_gicp when k is set to a large value (e.g., k=10 and k=20). I guess this is because small_gicp uses a linear vector to manage knn results while cKDTree uses a heap that can be more efficient for managing many neighbors. I'm thinking to implement the heap data structure and switch the data container to it when k is large.

@LimHyungTae
Copy link
Author

Yep, it has been reproduced! Now you can merge the branch to the main in #101.

It was also remarkable that we can build KdTree in multi-threading mode.

Figure_1

@LimHyungTae
Copy link
Author

P.S. I don't mind it because what I only use is k=1, haha. Your coding skills are always remarkable. Happy new year and take care!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants