README.md

- Compatible with scikit-learn's API for clustering algorithms
- Supports radius-constrained clustering
- Provides options for exact and approximate solutions
- Easy to use and integrate with existing Python data science workflows
- Includes comprehensive documentation and examples
- Full test coverage to ensure reliability and correctness
- Supports custom MDS solvers for flexibility in clustering approaches
- Provides a user-friendly interface for clustering tasks

> [!CAUTION]
> **Deprecation Notice**: The `threshold` parameter of the `RadiusClustering` class is deprecated. Use the `radius` parameter instead to specify the clustering radius; `threshold` is scheduled for complete removal in version 2.0.0. The `radius` parameter is now the standard way to define the clustering radius, in line with our goal of making parameter names more intuitive and user-friendly.

> [!NOTE]
> **NEW VERSIONS**: The package is under active development, with new features, improvements, refactoring and enhancements to the existing codebase. Backward compatibility is not guaranteed, so please check the [CHANGELOG](CHANGELOG.md) for details on changes and updates.

## Roadmap

- [x] Version 1.4.0:
  - [x] Add support for custom MDS solvers
  - [x] Improve documentation and examples
  - [x] Add more examples and tutorials

## Installation

docs/source/usage.rst

Usage
=====

This page provides a quick guide to using the `radius_clustering` package for clustering tasks. The package offers a simple interface for radius-based clustering of datasets, built on the Minimum Dominating Set (MDS) problem.

This page is divided into three main sections:

1. **Basic Usage**: A quick example of how to use the `RadiusClustering` class and perform clustering with several parameters.
2. **Custom Dissimilarity Function**: How to use a custom dissimilarity function with the `RadiusClustering` class.
3. **Custom MDS Solver**: How to implement a custom MDS solver for more advanced clustering tasks, possibly with weaker guarantees on the results.

Basic Usage
-----------------

The `RadiusClustering` class provides a straightforward way to perform clustering based on a specified radius. You can choose between an approximate and an exact method, depending on your needs.

Here's a basic example of how to use Radius Clustering with the `RadiusClustering` class, using the approximate method:

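.. code-block:: python

   from radius_clustering import RadiusClustering
   import numpy as np

   # Generate random data
   X = np.random.rand(100, 2)

   # Create an instance of RadiusClustering using the approximate method
   rad = RadiusClustering(manner="approx", radius=0.5)

   # Fit the model to the data
   rad.fit(X)

   # Get cluster labels
   labels = rad.labels_

   print(labels)
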
Similarly, you can use the exact method by setting the `manner` parameter to `"exact"`:

.. code-block:: python

   # [...] Exact same code as above
   rad = RadiusClustering(manner="exact", radius=0.5)  # only this parameter changes
   # [...] Exact same code as above

Custom Dissimilarity Function
-----------------------------

A key motivation for the `radius_clustering` package is that users may need a dissimilarity function that is not a metric (distance) function. Some contexts also require a domain-specific dissimilarity function that is not provided by default and must be implemented by the user.

To use a custom dissimilarity function, pass it to the `RadiusClustering` class through the `metric` parameter, as in the following example:

.. code-block:: python

   from radius_clustering import RadiusClustering
   import numpy as np

   # Generate random data
   X = np.random.rand(100, 2)

   # Define a custom dissimilarity function
   def dummy_dissimilarity(x, y):
       return np.linalg.norm(x - y) + 0.1  # Example: add a constant to the distance

   # Create an instance of RadiusClustering with the custom dissimilarity function
   rad = RadiusClustering(manner="approx", radius=0.5, metric=dummy_dissimilarity)

   # Fit the model to the data
   rad.fit(X)

   # Get cluster labels
   labels = rad.labels_

   print(labels)

.. note::
   The custom dissimilarity function will be passed to scikit-learn's `pairwise_distances` function, so it must be compatible with the input format and return type that function expects. Concretely, it is called on pairs of rows of `X` and should return a single value giving the dissimilarity between them. See the scikit-learn documentation for more details on implementing custom metrics.

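The note above can be made concrete with a small, self-contained sketch. Both functions below are hypothetical and exist only for illustration. `squared_euclidean` is a dissimilarity that is *not* a metric because it violates the triangle inequality. `dissimilarity_matrix` evaluates a callable on every pair of rows using plain NumPy loops, which is the calling pattern scikit-learn documents for callable metrics. It is neither the package's nor scikit-learn's actual code.

.. code-block:: python

   import numpy as np


   def squared_euclidean(x, y):
       """Hypothetical dissimilarity: squared Euclidean distance (not a metric)."""
       diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
       return float(np.dot(diff, diff))


   def dissimilarity_matrix(X, func):
       """Illustration only: call `func` on every pair of rows of X."""
       n = X.shape[0]
       D = np.empty((n, n))
       for i in range(n):
           for j in range(n):
               D[i, j] = func(X[i], X[j])
       return D


   # Three points on a line, at positions 0, 1 and 2
   X = np.array([[0.0], [1.0], [2.0]])
   D = dissimilarity_matrix(X, squared_euclidean)
   print(D)

   # Triangle inequality check: is d(0, 2) <= d(0, 1) + d(1, 2)?
   print(D[0, 2] <= D[0, 1] + D[1, 2])  # False, since 4.0 > 1.0 + 1.0

A function following the same contract could be passed as `metric=squared_euclidean`, just like `dummy_dissimilarity` above.
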
Custom MDS Solver
-----------------

The two default solvers in the current implementation of the `radius_clustering` package aim for exact (or near-exact) solutions to an NP-hard problem. As a result, they may not suit every use case, especially when performance is a concern.

If you have your own implementation of a Minimum Dominating Set (MDS) solver, you can use it with the `RadiusClustering` class by calling the :py:func:`RadiusClustering.set_solver` method. This method checks that the solver is compatible with the expected input format and return type, then uses it to perform clustering.

.. versionadded:: 1.4.0
   The :py:func:`RadiusClustering.set_solver` method was added to allow users to set a custom MDS solver.
   This version is *NOT* backward compatible with previous versions of the package, as it introduces a new structure and new methods to handle custom solvers.

Here's an example of how to implement a custom MDS solver and use it with the `RadiusClustering` class, using the NetworkX implementation of the dominating set problem:

.. code-block:: python

   # [...] Imports, data generation (X) and the NetworkX-based custom_mds_solver
   # Create an instance of RadiusClustering with the custom MDS solver
   rad = RadiusClustering(manner="approx", radius=0.5)
   rad.set_solver(custom_mds_solver)

   # Fit the model to the data
   rad.fit(X)

   # Get cluster labels
   labels = rad.labels_

   print(labels)

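As a dependency-free alternative to a NetworkX-based solver, the sketch below implements the classic greedy heuristic for dominating sets. At each step it picks the vertex that covers the most vertices not yet dominated. The name `greedy_mds_solver` is hypothetical, and so is the input format: the sketch assumes `edges` is a sequence of `(u, v)` vertex-index pairs. Its parameters and return value follow the solver description on this page: `n`, `edges`, `nb_edges` and an optional `random_state` in, and the centers plus the elapsed time out.

.. code-block:: python

   import random
   import time


   def greedy_mds_solver(n, edges, nb_edges, random_state=None):
       """Hypothetical custom solver: greedy (not minimum) dominating set.

       Assumes `edges` holds (u, v) vertex-index pairs on vertices 0..n-1.
       Returns (centers, elapsed_seconds).
       """
       start = time.perf_counter()
       rng = random.Random(random_state)

       # Closed neighbourhoods: a vertex dominates itself and its neighbours.
       neighbours = [{v} for v in range(n)]
       for u, v in list(edges)[:nb_edges]:
           neighbours[int(u)].add(int(v))
           neighbours[int(v)].add(int(u))

       # Shuffled candidate order: random_state only affects tie-breaking.
       order = list(range(n))
       rng.shuffle(order)

       undominated = set(range(n))
       centers = []
       while undominated:
           # Greedy step: take the vertex covering the most undominated vertices.
           best = max(order, key=lambda v: len(neighbours[v] & undominated))
           centers.append(best)
           undominated -= neighbours[best]

       return sorted(centers), time.perf_counter() - start


   # Path graph 0-1-2-3-4
   centers, elapsed = greedy_mds_solver(5, [(0, 1), (1, 2), (2, 3), (3, 4)], 4, random_state=0)
   print(centers)

Under those assumptions, it could be registered with `rad.set_solver(greedy_mds_solver)`. Whether `set_solver` actually accepts it depends on the package's compatibility check. Because the heuristic is greedy, it returns *a* dominating set, usually not a minimum one, so it may produce more clusters than the default solvers.
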
.. note::
   The custom MDS solver should accept the same parameters as the default solvers, including the number of points `n`, the edges of the graph `edges`, the number of edges `nb_edges`, and an optional `random_state` parameter for reproducibility. It should return a list of centers and the time taken to compute them.
   The `set_solver` method will check that the custom solver is compatible with the expected input format and return type, and will use it to perform clustering.
   If the custom solver is not compatible, it will raise a `ValueError` with a descriptive message.

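To exercise a solver outside of `RadiusClustering`, you can also build its inputs by hand. The two helpers below are hypothetical and only illustrate the idea behind radius clustering with dominating sets. `build_radius_graph` links two distinct points whenever their dissimilarity is at most the radius, and returns `n`, `edges` (as `(u, v)` pairs with `u < v`) and `nb_edges`. `is_dominating` checks that every point is a center or adjacent to one. The `(u, v)` edge format is an assumption here, not the package's documented internal representation.

.. code-block:: python

   import numpy as np


   def build_radius_graph(D, radius):
       """Hypothetical helper: radius graph from a dissimilarity matrix D.

       Distinct points u < v are linked when D[u, v] <= radius.
       Returns (n, edges, nb_edges), with edges as an array of (u, v) pairs.
       """
       D = np.asarray(D, dtype=float)
       n = D.shape[0]
       # k=1 keeps the strict upper triangle: no self-loops, each pair once.
       mask = np.triu(D <= radius, k=1)
       edges = np.argwhere(mask)
       return n, edges, len(edges)


   def is_dominating(n, edges, centers):
       """Hypothetical check: is every vertex a center or adjacent to one?"""
       center_set = {int(c) for c in centers}
       covered = set(center_set)
       for u, v in edges:
           u, v = int(u), int(v)
           if u in center_set:
               covered.add(v)
           if v in center_set:
               covered.add(u)
       return len(covered) == n


   # Four points on a line; the last one is far from the others.
   X = np.array([[0.0], [0.5], [1.0], [5.0]])
   D = np.abs(X - X.T)  # pairwise 1-D Euclidean distances
   n, edges, nb_edges = build_radius_graph(D, radius=0.5)
   print(edges.tolist(), nb_edges)  # [[0, 1], [1, 2]] 2
   print(is_dominating(n, edges, [1, 3]))  # True
   print(is_dominating(n, edges, [0, 3]))  # False

You can then run any candidate solver on `(n, edges, nb_edges)` and pass its centers to `is_dominating` before handing the solver to `set_solver`.
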
.. attention::
   We cannot guarantee that a custom MDS solver will produce the same results as the default solvers, especially if it is not purposely designed to solve the Minimum Dominating Set problem but only finds *a* dominating set. Results may vary depending on the implementation and the characteristics of the dataset.
   As an example, a benchmark comparing our solvers with a custom NetworkX-based one is available in the `Example Gallery` section of the documentation. It shows that the custom solver may produce different results than the default solvers, especially in the number of clusters and the computation time (see :ref:`sphx_glr_auto_examples_plot_benchmark_custom.py`).
   However, a custom solver can be useful when performance is a concern or when you have an implementation that better fits your needs.