Version history
Unreleased
v0.4.1
Released on 2025-03-10. See v0.4.1 release on GitHub
Added
- ClickhouseServerAPI can register pandas tables with datetime columns, and allows integers to be signed #61.
- ClickhouseServerAPI will now register dict or list via pandas #61.
- Easier dependency resolution for python 3.13 #67.
v0.4.0
Released on 2024-12-23. See v0.4.0 release on GitHub
Changed
- Renamed ClickhouseAPI and ClickhouseDataFrame to ClickhouseServerAPI and ClickhouseServerDataFrame respectively, and splinkclickhouse.clickhouse to splinkclickhouse.clickhouse_server #54.
v0.3.4
Released on 2024-12-16. See v0.3.4 release on GitHub
Added
- Added Clickhouse appropriate versions of comparison level PairwiseStringDistanceFunctionLevel and comparison PairwiseStringDistanceFunctionAtThresholds to the relevant libraries #51.
- ClickhouseAPI can now properly register pandas tables with string array columns #51.
Fixed
- Table registration in chdb now works for pandas tables whose indexes do not have a 0 entry #49.
v0.3.3
Released on 2024-12-05. See v0.3.3 release on GitHub
Added
- Term frequency adjustments are no longer limited in Clickhouse server (or chdb when debug_mode is switched on) #46.
Changed
- Dropped support for Splink <= 4.0.5 #46.
v0.3.2
Released on 2024-10-23. See v0.3.2 release on GitHub
Added
- SQL UDF days_since_epoch to parse a string representing a date to the number of days since 1970-01-01 #39.
- Custom Clickhouse ColumnExpression with additional transform parse_date_to_int to parse string to days since epoch #39.
- Custom date comparison and comparison levels working with integer type representing days since epoch #39.
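
As a rough illustration of the quantity that days_since_epoch and parse_date_to_int are described as producing, the following Python sketch converts a date string to the number of days since 1970-01-01. It is not the package's SQL implementation; the function name days_since_epoch_py, the ISO YYYY-MM-DD input format, and returning None for unparseable input are all assumptions made for this example.

```python
from datetime import date
from typing import Optional

# Reference date named in the changelog entry.
EPOCH = date(1970, 1, 1)


def days_since_epoch_py(date_string: str) -> Optional[int]:
    """Hypothetical helper: days between 1970-01-01 and an ISO date string."""
    try:
        parsed = date.fromisoformat(date_string)
    except ValueError:
        # Assumption: unparseable input maps to None rather than raising.
        return None
    return (parsed - EPOCH).days


print(days_since_epoch_py("1970-01-02"))
print(days_since_epoch_py("2024-10-23"))
print(days_since_epoch_py("not a date"))
```

With dates held as integers like this, a date comparison can reduce to integer arithmetic, such as an absolute difference in days.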
v0.3.1
Released on 2024-10-14. See v0.3.1 release on GitHub
Added
- ClickhouseAPI now has a function .set_union_default_mode() to allow manually setting client state necessary for clustering, if session has timed out e.g. when running interactively #36.
- Added support for Splink 4.0.4 #37.
Fixed
- estimate_probability_two_random_records_match now works correctly when debug_mode is switched on #34.
v0.3.0
Released on 2024-09-26. See v0.3.0 release on GitHub
Changed
- chdb is now an optional dependency, requiring opt-in installation for use of ChDBAPI #28.
v0.2.5
Released on 2024-09-23. See v0.2.5 release on GitHub
Changed
- Added support for Splink >= 4.0.2, dropped support for 4.0.0, 4.0.1 #26.
v0.2.4
Released on 2024-09-19. See v0.2.4 release on GitHub
Added
- Extended ClickhouseAPI pandas table registration to support float columns #24.
- Added Clickhouse-specific library comparisons/levels - cll_ch.DistanceInKMLevel, cl_ch.DistanceInKMAtThresholds, and cl_ch.ExactMatchAtSubstringSizes #24.
v0.2.3
Released on 2024-09-16. See v0.2.3 release on GitHub
Changed
v0.2.2
Released on 2024-09-12. See v0.2.2 release on GitHub
Added
- ClickhouseAPI now allows for registering tables directly from pandas DataFrames, if they contain only integer and string columns #18.
Fixed
- Create an alias for rand, random so that Linker.visualisations.comparison_viewer_dashboard runs without error #14.
- Workaround for Clickhouse count(*) filter ... parsing issue so that linker.clustering.compute_graph_metrics(...) now runs #18.
v0.2.1
Released on 2024-09-12. See v0.2.1 release on GitHub
Changed
- Updated numpy dependency requirements to allow compatible versions for all supported python versions #9.
v0.2.0
Released on 2024-09-11. See v0.2.0 release on GitHub
Added
- ClickhouseAPI and dataframe added to support running calculations in a Clickhouse instance #4.
v0.1.1
Released on 2024-09-10. See v0.1.1 release on GitHub
Fixed
- Fix random_sample_sql so that u-training works when we don't sample the entire dataset #1.
Changed
- try_parse_date and try_parse_timestamp now use DateTime64 to extend the range to more useful values, and no longer support custom format strings #2.
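
To give a sense of why a wider type matters, the sketch below computes the latest instant representable by an unsigned 32-bit count of seconds since 1970-01-01, a common storage scheme for second-resolution timestamps. That the earlier implementation was bounded in this way is an assumption, not something stated in this changelog, and the constant names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Start of the Unix epoch, in UTC.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value an unsigned 32-bit integer can hold (assumed storage width).
MAX_UINT32_SECONDS = 2**32 - 1

# Latest instant such a counter could represent.
latest_representable = UNIX_EPOCH + timedelta(seconds=MAX_UINT32_SECONDS)
print(latest_representable.isoformat())
```

Under that assumption, any date before 1970 (many dates of birth, for instance) would also fall outside the representable range, which a wider type avoids.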
v0.1.0
Released on 2024-09-09. See v0.1.0 release on GitHub
Added
- Basic working version of package with api for chdb