Add support for multivariate KDE #2

Open
seatonullberg opened this issue Oct 17, 2022 · 4 comments

@seatonullberg
Owner

Currently, only univariate distributions are supported. A complete implementation would include seamless support for multivariate distributions. The only type that needs to change is the KernelDensityEstimator struct, which is currently defined as follows:

pub struct KernelDensityEstimator<B, K> {
    observations: Vec<Float>,
    bandwidth: B,
    kernel: K,
}

The observations field will need to be converted to a nalgebra::DMatrix to support a multivariate distribution. The type should be hidden behind an alias so that end-users do not need to add nalgebra as a dependency in their own projects.

pub type Matrix2D = nalgebra::DMatrix<Float>;

pub struct KernelDensityEstimator<B, K> {
    observations: Matrix2D,
    bandwidth: B,
    kernel: K,
}
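
To make the matrix layout concrete, here is a minimal std-only sketch of what the conversion would look like for a univariate user. `RowMajorMatrix`, `from_rows`, and `from_univariate` are hypothetical names invented for illustration and are not part of nalgebra or this crate; `Float` is assumed to be `f64` here. The layout assumed is one row per observation and one column per dimension.

```rust
// Assumption: `Float` is f64 in this sketch; the crate defines its own alias.
type Float = f64;

/// Hypothetical row-major stand-in for the proposed `Matrix2D` alias.
/// Each row is one observation and each column is one dimension.
#[derive(Debug, Clone, PartialEq)]
pub struct RowMajorMatrix {
    nrows: usize,
    ncols: usize,
    data: Vec<Float>,
}

impl RowMajorMatrix {
    /// Builds a matrix from equally sized rows.
    /// Returns `None` if the input is empty, has zero columns, or is ragged.
    pub fn from_rows(rows: &[Vec<Float>]) -> Option<Self> {
        let ncols = rows.first()?.len();
        if ncols == 0 || rows.iter().any(|r| r.len() != ncols) {
            return None;
        }
        Some(Self {
            nrows: rows.len(),
            ncols,
            data: rows.iter().flatten().copied().collect(),
        })
    }

    /// The conversion a univariate user would have to perform:
    /// n samples become an n x 1 matrix.
    pub fn from_univariate(samples: Vec<Float>) -> Self {
        Self {
            nrows: samples.len(),
            ncols: 1,
            data: samples,
        }
    }

    /// Returns (number of observations, number of dimensions).
    pub fn shape(&self) -> (usize, usize) {
        (self.nrows, self.ncols)
    }

    /// Borrows observation `i` as a slice of length `ncols`.
    pub fn row(&self, i: usize) -> &[Float] {
        &self.data[i * self.ncols..(i + 1) * self.ncols]
    }
}
```

With this layout, three 2D observations give a shape of (3, 2), while a plain vector of two samples becomes a (2, 1) matrix. That extra wrapping step is the cost for univariate users.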

To prevent needless conversions for users working with univariate data, the data structure could instead add a generic parameter T representing the type of observations. For univariate data, T would be Vec<Float>, and for multivariate data, T would be Matrix2D. However, because Rust has no method overloading, this would require two new traits, UnivariateKDE and MultivariateKDE, so that the method names pdf, cdf, and sample can have a different signature in each case.

pub struct KernelDensityEstimator<T, B, K> {
    observations: T,
    bandwidth: B,
    kernel: K,
}

pub trait UnivariateKDE {
    // Unimplemented.
}

pub trait MultivariateKDE {
    // Unimplemented.
}

// Univariate case.
impl<B, K> UnivariateKDE for KernelDensityEstimator<Vec<Float>, B, K>
where
    B: Bandwidth,
    K: Kernel,
{
    // Unimplemented.
}

// Multivariate case.
impl<B, K> MultivariateKDE for KernelDensityEstimator<Matrix2D, B, K>
where
    B: Bandwidth,
    K: Kernel,
{
    // Unimplemented.
}
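
As a rough sketch of how the two traits could give `pdf` a different signature per case, the example below fills in only that one method. Several things here are simplifying assumptions and not the crate's API: `Float` is `f64`, `Matrix2D` is a `Vec<Vec<Float>>` stand-in so the example needs no dependencies, the generic bandwidth `B` is replaced by a fixed scalar, the `Kernel` trait and `Normal` kernel are minimal stand-ins, and the multivariate estimate uses a product kernel with the same bandwidth in every dimension.

```rust
// Assumptions: f64 floats and a nested-Vec stand-in for the nalgebra alias.
type Float = f64;
pub type Matrix2D = Vec<Vec<Float>>;

/// Minimal stand-in for a kernel trait.
pub trait Kernel {
    fn pdf(&self, x: Float) -> Float;
}

/// Standard normal kernel.
pub struct Normal;

impl Kernel for Normal {
    fn pdf(&self, x: Float) -> Float {
        (-0.5 * x * x).exp() / (2.0 * std::f64::consts::PI).sqrt()
    }
}

pub struct KernelDensityEstimator<T, K> {
    observations: T,
    bandwidth: Float, // simplification: a fixed scalar in place of the generic `B`
    kernel: K,
}

impl<T, K> KernelDensityEstimator<T, K> {
    pub fn new(observations: T, bandwidth: Float, kernel: K) -> Self {
        Self { observations, bandwidth, kernel }
    }
}

pub trait UnivariateKDE {
    /// Density estimate at a scalar point.
    fn pdf(&self, x: Float) -> Float;
}

pub trait MultivariateKDE {
    /// Density estimate at a point given as one coordinate per dimension.
    fn pdf(&self, x: &[Float]) -> Float;
}

// Univariate case: f(x) = 1/(n*h) * sum_i K((x - x_i) / h).
impl<K: Kernel> UnivariateKDE for KernelDensityEstimator<Vec<Float>, K> {
    fn pdf(&self, x: Float) -> Float {
        let h = self.bandwidth;
        let n = self.observations.len() as Float;
        let sum: Float = self
            .observations
            .iter()
            .map(|xi| self.kernel.pdf((x - xi) / h))
            .sum();
        sum / (n * h)
    }
}

// Multivariate case with a product kernel:
// f(x) = 1/(n*h^d) * sum_i prod_j K((x_j - x_ij) / h).
// Note: `zip` stops at the shorter slice, so rows are assumed to have length d.
impl<K: Kernel> MultivariateKDE for KernelDensityEstimator<Matrix2D, K> {
    fn pdf(&self, x: &[Float]) -> Float {
        let h = self.bandwidth;
        let n = self.observations.len() as Float;
        let d = x.len() as i32;
        let sum: Float = self
            .observations
            .iter()
            .map(|row| {
                row.iter()
                    .zip(x)
                    .map(|(xi, xj)| self.kernel.pdf((xj - xi) / h))
                    .product::<Float>()
            })
            .sum();
        sum / (n * h.powi(d))
    }
}
```

Because each concrete estimator type implements only one of the two traits, a call such as `kde.pdf(0.0)` or `kde.pdf(&[0.0, 0.0])` resolves without ambiguity as long as both traits are in scope.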

Lastly, the traits UnivariateKDE and MultivariateKDE should be sealed to prevent end-user implementations.
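
One common way to seal the traits is a supertrait that lives in a private module, sketched below. The `sealed` module name, the simplified one-parameter `KernelDensityEstimator<T>`, the `Vec<Vec<Float>>` stand-in for `Matrix2D`, and the placeholder methods `n_observations` and `dimension` are all assumptions made to keep the example small; they are not part of the crate.

```rust
// Assumptions: f64 floats and a nested-Vec stand-in for the nalgebra alias.
type Float = f64;
pub type Matrix2D = Vec<Vec<Float>>;

mod sealed {
    /// Supertrait in a private module. Downstream crates cannot name it,
    /// so they cannot implement any trait that requires it.
    pub trait Sealed {}
}

/// Simplified estimator: bandwidth and kernel are omitted for brevity.
pub struct KernelDensityEstimator<T> {
    observations: T,
}

impl<T> KernelDensityEstimator<T> {
    pub fn new(observations: T) -> Self {
        Self { observations }
    }
}

// Only these two concrete types are allowed to implement the KDE traits.
impl sealed::Sealed for KernelDensityEstimator<Vec<Float>> {}
impl sealed::Sealed for KernelDensityEstimator<Matrix2D> {}

pub trait UnivariateKDE: sealed::Sealed {
    /// Placeholder method standing in for pdf, cdf, and sample.
    fn n_observations(&self) -> usize;
}

pub trait MultivariateKDE: sealed::Sealed {
    /// Placeholder method standing in for pdf, cdf, and sample.
    fn dimension(&self) -> usize;
}

impl UnivariateKDE for KernelDensityEstimator<Vec<Float>> {
    fn n_observations(&self) -> usize {
        self.observations.len()
    }
}

impl MultivariateKDE for KernelDensityEstimator<Matrix2D> {
    fn dimension(&self) -> usize {
        // Dimension is taken from the first observation; 0 if there are none.
        self.observations.first().map_or(0, |row| row.len())
    }
}
```

End users can still call the trait methods, but an `impl UnivariateKDE for TheirType` in another crate would fail to compile because `sealed::Sealed` is not reachable from outside.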

@seatonullberg added the enhancement label (New feature or request) on Oct 17, 2022
@seatonullberg added this to the v0.2.0 milestone on Oct 17, 2022

@humphreylee

Good piece of work. Would you mind if I ask if there is any progress on this? Thanks.

@seatonullberg
Owner Author

Hi @humphreylee, I'm currently preoccupied with writing my dissertation, but I do intend to get back to this as soon as I am able.

@seatonullberg modified the milestones: v0.2.0, v0.3.0 on Mar 21, 2024

@emyr666

emyr666 commented Apr 12, 2024

Bump!
I am using seaborn in Python to do 2D KDE contour plots, but it's very slow. I am hoping that by using Rust I can use WebAssembly to do this fast on the client side, instead of generating plots (extremely slowly) on the server to send to a web client.

@rob-p

rob-p commented Jul 3, 2024

Also a bump on this. We have an application, related to gene expression analysis, where we need a 2D density estimator. Currently, there are no crates for this in Rust, so we are calling out to a Python library, which is both slow and much uglier than we'd like (the rest of the code is pure Rust). It would be great to be able to do the density estimation in Rust, and this seems to be the only crate I can find where this is even on the roadmap.
