All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- ClickhouseServerAPI can register pandas tables with datetime columns, and allows integers to be signed #61.
- ClickhouseServerAPI will now register dict or list via pandas #61.
0.4.0 - 2024-12-23
- Renamed ClickhouseAPI and ClickhouseDataFrame to ClickhouseServerAPI and ClickhouseServerDataFrame respectively, and splinkclickhouse.clickhouse to splinkclickhouse.clickhouse_server #54.
0.3.4 - 2024-12-16
- Added Clickhouse-appropriate versions of comparison level PairwiseStringDistanceFunctionLevel and comparison PairwiseStringDistanceFunctionAtThresholds to the relevant libraries #51.
- ClickhouseAPI can now properly register pandas tables with string array columns #51.
- Table registration in chdb now works for pandas tables whose indexes do not have a 0 entry #49.
0.3.3 - 2024-12-05
- Term frequency adjustments are no longer limited in Clickhouse server (or chdb when debug_mode is switched on) #46.
- Dropped support for Splink <= 4.0.5 #46.
0.3.2 - 2024-10-23
- SQL UDF days_since_epoch to parse a string representing a date to the number of days since 1970-01-01 #39.
- Custom Clickhouse ColumnExpression with additional transform parse_date_to_int to parse a string to days since epoch #39.
- Custom date comparison and comparison levels working with integer type representing days since epoch #39.
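
The days-since-epoch representation used by the entries above can be illustrated in plain Python. This is only a sketch of the arithmetic, not the package's SQL UDF: the function below is a hypothetical stand-in that happens to share the UDF's name, and it assumes ISO-formatted input (YYYY-MM-DD), which the changelog does not specify.

```python
from datetime import date

# The epoch named in the changelog entry.
EPOCH = date(1970, 1, 1)


def days_since_epoch(date_string: str) -> int:
    """Illustrative stand-in: ISO date string -> whole days since 1970-01-01.

    Assumes YYYY-MM-DD input; the formats accepted by the real SQL UDF
    are not stated in this changelog.
    """
    return (date.fromisoformat(date_string) - EPOCH).days


if __name__ == "__main__":
    print(days_since_epoch("1970-01-01"))  # 0
    print(days_since_epoch("1969-12-31"))  # -1 (dates before the epoch are negative)
    print(days_since_epoch("2024-10-23"))  # 20019
```

Storing dates as integers in this way means a date difference reduces to integer subtraction, which is presumably what the integer-based date comparison levels rely on.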
0.3.1 - 2024-10-14
- ClickhouseAPI now has a function .set_union_default_mode() to allow manually setting the client state necessary for clustering if the session has timed out, e.g. when running interactively #36.
- Added support for Splink 4.0.4 #37.
- estimate_probability_two_random_records_match now works correctly when debug_mode is switched on #34.
0.3.0 - 2024-09-26
- chdb is now an optional dependency, requiring opt-in installation for use of ChDBAPI #28.
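
Because chdb is opt-in from this release, code that may run without it can check for the module before touching ChDBAPI. The following is a generic sketch of that pattern using only the standard library. The helper names is_installed and require_optional are hypothetical and not part of splinkclickhouse, and the package's own behaviour when chdb is missing is not described here.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_optional(module_name: str) -> None:
    """Raise a descriptive ImportError if an optional dependency is absent."""
    if not is_installed(module_name):
        raise ImportError(
            f"Optional dependency '{module_name}' is not installed; "
            "install it to use the features that need it."
        )


if __name__ == "__main__":
    # Reports whether chdb is available in the current environment.
    print("chdb available:", is_installed("chdb"))
```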
0.2.5 - 2024-09-23
- Added support for Splink >= 4.0.2, dropped support for 4.0.0, 4.0.1 #26.
0.2.4 - 2024-09-19
- Extended ClickhouseAPI pandas table registration to support float columns #24.
- Added Clickhouse-specific library comparisons/levels: cll_ch.DistanceInKMLevel, cl_ch.DistanceInKMAtThresholds, and cl_ch.ExactMatchAtSubstringSizes #24.
0.2.3 - 2024-09-16
0.2.2 - 2024-09-12
- ClickhouseAPI now allows for registering tables directly from pandas DataFrames, if they contain only integer and string columns #18.
- Create an alias for rand, random, so that Linker.visualisations.comparison_viewer_dashboard runs without error #14.
- Workaround for Clickhouse count(*) filter ... parsing issue so that linker.clustering.compute_graph_metrics(...) now runs #18.
0.2.1 - 2024-09-12
- Updated numpy dependency requirements to allow compatible versions for all supported Python versions #9.
0.2.0 - 2024-09-11
- ClickhouseAPI and dataframe added to support running calculations in a Clickhouse instance #4.
0.1.1 - 2024-09-10
- Fix random_sample_sql so that u-training works when we don't sample the entire dataset #1.
- try_parse_date and try_parse_timestamp now use DateTime64 to extend the range to more useful values, and no longer support custom format strings #2.
0.1.0 - 2024-09-09
- Basic working version of package with API for chdb