Version history
Unreleased
v0.4.1
Released on 2025-03-10. See v0.4.1 release on GitHub
Added
- `ClickhouseServerAPI` can register `pandas` tables with datetime columns, and allows integers to be signed #61.
- `ClickhouseServerAPI` will now register `dict` or `list` via `pandas` #61.
- Easier dependency resolution for Python 3.13 #67.
v0.4.0
Released on 2024-12-23. See v0.4.0 release on GitHub
Changed
- Renamed `ClickhouseAPI` and `ClickhouseDataFrame` to `ClickhouseServerAPI` and `ClickhouseServerDataFrame` respectively, and `splinkclickhouse.clickhouse` to `splinkclickhouse.clickhouse_server` #54.
v0.3.4
Released on 2024-12-16. See v0.3.4 release on GitHub
Added
- Added Clickhouse-appropriate versions of comparison level `PairwiseStringDistanceFunctionLevel` and comparison `PairwiseStringDistanceFunctionAtThresholds` to the relevant libraries #51.
- `ClickhouseAPI` can now properly register `pandas` tables with string array columns #51.
Fixed
- Table registration in `chdb` now works for pandas tables whose indexes do not have a `0` entry #49.
v0.3.3
Released on 2024-12-05. See v0.3.3 release on GitHub
Added
- Term frequency adjustments are now not limited in Clickhouse server (or `chdb` when `debug_mode` is switched on) #46.
Changed
- Dropped support for Splink <= `4.0.5` #46.
v0.3.2
Released on 2024-10-23. See v0.3.2 release on GitHub
Added
- SQL UDF `days_since_epoch` to parse a string representing a date to the number of days since `1970-01-01` #39.
- Custom Clickhouse `ColumnExpression` with additional transform `parse_date_to_int` to parse a string to days since epoch #39.
- Custom date comparison and comparison levels working with integer type representing days since epoch #39.
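The days-since-epoch representation used by these entries can be illustrated in plain Python. This is only a sketch of the arithmetic, not the package's SQL implementation: the function name `days_since_epoch_py` is hypothetical, and it assumes ISO-formatted (`YYYY-MM-DD`) input, which may differ from the formats the UDF accepts.

```python
from datetime import date

# The reference date named in the changelog entry.
EPOCH = date(1970, 1, 1)


def days_since_epoch_py(date_string: str) -> int:
    """Hypothetical Python analogue of the `days_since_epoch` SQL UDF.

    Assumes an ISO-formatted date string (YYYY-MM-DD).
    """
    parsed = date.fromisoformat(date_string)
    # Subtracting two dates gives a timedelta; .days is the integer offset.
    return (parsed - EPOCH).days


print(days_since_epoch_py("1970-01-01"))  # 0
print(days_since_epoch_py("1971-01-01"))  # 365
print(days_since_epoch_py("1969-12-31"))  # -1
```

Storing dates as integers in this way lets date comparisons be expressed as simple integer differences, and dates before the epoch come out as negative values.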
v0.3.1
Released on 2024-10-14. See v0.3.1 release on GitHub
Added
- `ClickhouseAPI` now has a function `.set_union_default_mode()` to allow manually setting client state necessary for clustering, if the session has timed out, e.g. when running interactively #36.
- Added support for Splink 4.0.4 #37.
Fixed
- `estimate_probability_two_random_records_match` now works correctly when `debug_mode` is switched on #34.
v0.3.0
Released on 2024-09-26. See v0.3.0 release on GitHub
Changed
- `chdb` is now an optional dependency, requiring opt-in installation for use of `ChDBAPI` #28.
v0.2.5
Released on 2024-09-23. See v0.2.5 release on GitHub
Changed
- Added support for Splink >= 4.0.2, dropped support for 4.0.0, 4.0.1 #26.
v0.2.4
Released on 2024-09-19. See v0.2.4 release on GitHub
Added
- Extended `ClickhouseAPI` pandas table registration to support float columns #24.
- Added Clickhouse-specific library comparisons/levels: `cll_ch.DistanceInKMLevel`, `cl_ch.DistanceInKMAtThresholds`, and `cl_ch.ExactMatchAtSubstringSizes` #24.
v0.2.3
Released on 2024-09-16. See v0.2.3 release on GitHub
Changed
v0.2.2
Released on 2024-09-12. See v0.2.2 release on GitHub
Added
- `ClickhouseAPI` now allows for registering tables directly from pandas `DataFrame`s, if they contain only integer and string columns #18.
Fixed
- Create an alias for `rand`, `random`, so that `Linker.visualisations.comparison_viewer_dashboard` runs without error #14.
- Workaround for Clickhouse `count(*) filter ...` parsing issue so that `linker.clustering.compute_graph_metrics(...)` now runs #18.
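The changelog does not say what form the `count(*) filter ...` workaround takes. As a general illustration only, a filtered count can be rewritten as a conditional sum, which avoids the `FILTER` clause entirely. The sketch below uses SQLite from the Python standard library as a stand-in for Clickhouse, and the table and column names (`edges`, `cluster_id`, `is_bridge`) and the helper `conditional_counts` are hypothetical.

```python
import sqlite3


def conditional_counts(rows):
    """Count flagged rows per cluster without using a FILTER clause.

    `rows` is a list of (cluster_id, is_bridge) tuples; both names are
    hypothetical and chosen only for this illustration.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE edges (cluster_id INTEGER, is_bridge INTEGER)")
        conn.executemany("INSERT INTO edges VALUES (?, ?)", rows)
        # Equivalent in intent to: count(*) FILTER (WHERE is_bridge = 1)
        sql = """
            SELECT cluster_id,
                   SUM(CASE WHEN is_bridge = 1 THEN 1 ELSE 0 END) AS n_bridges
            FROM edges
            GROUP BY cluster_id
            ORDER BY cluster_id
        """
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


print(conditional_counts([(1, 1), (1, 0), (1, 1), (2, 0)]))  # [(1, 2), (2, 0)]
```

The conditional sum returns `0` for groups with no matching rows, which matches what a filtered count would report for the same groups.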
v0.2.1
Released on 2024-09-12. See v0.2.1 release on GitHub
Changed
- Updated `numpy` dependency requirements to allow compatible versions for all supported Python versions #9.
v0.2.0
Released on 2024-09-11. See v0.2.0 release on GitHub
Added
- `ClickhouseAPI` and dataframe added to support running calculations in a Clickhouse instance #4.
v0.1.1
Released on 2024-09-10. See v0.1.1 release on GitHub
Fixed
- Fix `random_sample_sql` so that u-training works when we don't sample the entire dataset #1.
Changed
- `try_parse_date` and `try_parse_timestamp` now use `DateTime64` to extend the range to more useful values, and no longer support custom format strings #2.
v0.1.0
Released on 2024-09-09. See v0.1.0 release on GitHub
Added
- Basic working version of package with API for `chdb`