- Allow users to specify output encoding for some CLI commands (thanks to @jbdesbas)
- Optimize the normal-form detection (thanks to @no23reason)
- Internal: fix names of C modules
- Add more type hints to CleverCSV
- Move the import of the optional tabview dependency to where it's needed (for #101)
- Allow inspecting more rows for header detection (fixes #98)
- Add type hints to CleverCSV
- Disable 32-bit builds on Windows and Linux
- Bump minimal Python version to 3.8
- Minor documentation improvements
- Improve median runtime by ~68% (~52% on average) by: 1) more caching, 2) implementing a heavy function in C.
- Redesign computation of consistency measure to a class: `ConsistencyDetector`.
- Fix potential memory leak in C code for base abstraction
- Fixes to escape sequences in regexes (thanks to @JakobGM!)
- Various improvements to code quality
- Switch documentation style to furo.
- Use r-prefix for regex patterns (thanks to @JakobGM!)
- Fix documentation typo (thanks to @Aritra8438!)
- Simplify faust-cchardet import for Windows builds
- Add support for Python 3.11 by fixing a bug regarding empty strings in dialects (thanks to @stefanor!)
- Fix installation error due to change in internals at setuptools (thanks to @mweinelt!)
- Migrate to faust-cchardet as cChardet fails to install on Python 3.11 (on Windows, currently only chardet will work for Python 3.11)
- Migrate to packaging for version comparison
- Add wrapper for writing a list of dictionaries (write_dicts)
- Fix bug when writing CSVs using the `csv` module dialects
- Add the builtin dialects to CleverCSV (e.g., `clevercsv.excel`)
- Release to build wheels for Python 3.10
- Re-implement command line interface using Wilderness
- Add man-pages to package
- Remove deprecated wrapper functions
- Expand URL regex to support `localhost:<port>` urls
- Minor changes to the TypeDetector API
- Add cChardet as optional dependency (fixes #48)
- Add a JSON object data type to address a specific failure case (#37).
- Add support for timezones for time data type
- Add support for building wheels on non-native architectures (#39).
- Add a flag to disable skipping type detection using the command line interface.
- Add a "bytearray" type to address a specific failure case (#35).
- Minor clarifications to licensing.
- Updates to release process. This version introduces pre-compiled wheels for Python 3.9.
- Add an `encoding` argument to `write_table` to allow specifying the output encoding. Thanks to @mitchgrogg for reporting issue #27.
- Add support for standardizing in-place and standardizing multiple files.
- Add warning on duplicate field names in DictReader
- Add return value to writers to match the standard library.
- Various speed ups to constructing the list of potential dialects. This removes a costly step of the detection process that will likely add a few more potential dialects, but has the end result of making overall dialect detection faster.
- Rename wrapper functions to a more coherent naming scheme. Old names will be available until 0.7.0, but now produce a FutureWarning.
- Add `stream_dicts` wrapper function.
- Improve handling of file encoding for the `read_dataframe` wrapper: detected encoding is now passed on to Pandas.
- Fix handling of optional dependency error for TabView on non-Windows platforms.
- Update URL regex to avoid catastrophic backtracking and increase performance. See issue #13 and issue #15. Thanks to @kaskawu for the fix and @jlumbroso for re-raising the issue.
- Add `num_chars` keyword argument to `read_as_dicts` and `csv2df` wrappers.
- Improve documentation w.r.t. handling large files. Thanks to @jlumbroso for raising this issue.
- Add an `explore` command to the command line application for CleverCSV. This command makes it easy to start exploring a CSV file using the Python interactive shell.
- Split the package into a "core" and "full" version. This allows users who only need the improved dialect detection functionality to download a version with a smaller footprint. Fixes issue #10. Thanks to @seperman.
- Fix speed of `unix_path` regex used in type detection (issue #13). Thanks to @kaskawu.
- Add `stream_csv` wrapper that returns a generator over rows
- Minor update to the URL type detection
- Documentation updates
- Fix bugs discovered from fuzz testing (issue #7)
- Minor changes to readme and code quality
- Fix using nan as default value when skipping a dialect (issue #5)
- Bump version to fix wheel building
- Bump version to fix wheel building
- Improve type detection for quoted alphanumeric cells (#4)
- Pass `strict` dialect property to parser.
- Bugfix for `write_table` wrapper on Windows.
- Move building Windows platform wheels to Travis.
- Use `cibuildwheel` version 1.0.0 for building wheels.
- Add a wrapper function that writes a table to a CSV file.
- Update CleverCSV to match updated clikit dependency
- Fix dependency versions for clikit and cleo
- Update `standardize` command to use CRLF line endings on all platforms.
- Add workaround for Tabview being unavailable on Windows.
- Remove packaging and dependency management with Poetry.
- Add support for building platform wheels on Travis and AppVeyor.
- Add optional `method` parameter to dialect detector.
- Bugfix for `clevercsv code` command when the delimiter is tab.
- Fix a failing build due to dependency version mismatch
- Allow underscore in alphanumeric strings
- Update unix path regular expression
- Add more integration tests and log detection method
- Update URL regular expression and add unit tests
- Add IPv4 type detection
- Add tie-breaker for combined quotechar and escapechar ties
- Bugfix for console script `code` command
- Update readme
- Cleanly handle failure to detect dialect in console application
- Remove any (partial) support for Python 2
- Remove Python parser - this speeds up file reading and tie breaking
- Ensure the C parser is used in the `reader`.
- Update integration tests to improve error handling
- Readme updates
- Ensure detected encoding is in the generated Python code for the `clevercsv code` command.
- Ensure encoding is detected in `wrappers.detect_dialect`.
- Bugfix in integration test
- Expand readme
- Add documentation on Read the Docs
- Use requirements.txt file for dependencies when packaging
- Add help description to each CLI command
- Update README
- Add transpose flag for `standardize` and `view` commands
- Rewrite console application using Cleo
- Add unit tests for console application
- Add `detect_dialect` wrapper function
- Add support for "unix_path" data type in type detection
- Add `encoding` and `num_chars` options to `read_csv` wrapper
- Add `-p/--pandas` flag to `code` command to generate Pandas output.
- Rename `read_as_lol` to `read_csv`.
- Allow setting the number of characters to read
- Simplify printing of skipped potential dialects
- Add `read_as_lol` wrapper function.
- Add `code` command to `clevercsv` command line program.
- Bugfix to update executable to new name
- Rename package to clevercsv
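
Two entries above concern regex string literals (the r-prefix change and the escape-sequence fixes). As a rough illustration of why the r-prefix matters, here is a minimal sketch using only the standard library `re` module. The pattern and sample text are hypothetical and are not taken from the CleverCSV source.

```python
import re

# Hypothetical sample text and patterns, chosen only for illustration.
text = "a cat sat"

# Raw string: the regex engine receives a backslash followed by "b",
# which it interprets as a word boundary.
raw_pattern = r"\bcat\b"

# Non-raw string: Python converts "\b" into a backspace character (0x08)
# before the regex engine sees it, so the pattern looks for literal
# backspaces around "cat" and cannot match this text.
plain_pattern = "\bcat\b"

raw_match = re.search(raw_pattern, text)
plain_match = re.search(plain_pattern, text)

print(raw_match is not None)  # the raw pattern finds "cat"
print(plain_match is None)    # the non-raw pattern finds nothing
```

Escapes that Python does not recognise in ordinary strings (such as `\d`) are passed through unchanged but trigger a warning in recent Python versions, which is another reason to prefer raw strings for patterns.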