Character line size limit #996
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #996      +/-   ##
==========================================
+ Coverage   83.27%   83.30%   +0.02%
==========================================
  Files          47       47
  Lines        3145     3150       +5
  Branches      483      432      -51
==========================================
+ Hits         2619     2624       +5
  Misses        526      526
@@ -114,7 +118,15 @@ class FileContentRelation(config: RelationConfig, fileExternalId: String, inferS
       byteStream
         .through(enforceSizeLimit)
         .through(fs2.text.utf8.decode)
-        .through(fs2.text.lines)
+        .through(fs2.text.linesLimited(lineSizeLimitCharacters))
+        .handleErrorWith {
There's a .zipWithIndex method that can expose line indexes, though I'm not sure we can attach them to the error message 🤔
That would only be a nice-to-have extra error detail, though.
It is possible, but then we cannot use linesLimited: we would need to do a second pass with indexes and enforce the limit there. It's not very hard. How important do you think this is?
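For reference, the second-pass idea could look roughly like the sketch below. This is not code from the PR: `IndexedLineTooLong` and `linesWithIndexedLimit` are hypothetical names, and the sketch assumes fs2 3.x. One trade-off to keep in mind: because the split is done by the unlimited `fs2.text.lines`, an over-long line is fully accumulated before its length is checked, which (as far as I understand) is what `linesLimited` avoids by failing while it accumulates.

```scala
import fs2.{Fallible, Pipe, RaiseThrowable, Stream}

// Hypothetical error type (not part of the PR): carries the 1-based line number.
final case class IndexedLineTooLong(lineNumber: Long, length: Int, limit: Int)
    extends RuntimeException(s"Line $lineNumber has $length characters, limit is $limit")

// Hypothetical pipe: split into lines first, then check each line's length
// in a second step where the index from zipWithIndex is available.
def linesWithIndexedLimit[F[_]: RaiseThrowable](limit: Int): Pipe[F, String, String] =
  in =>
    in.through(fs2.text.lines)
      .zipWithIndex
      .flatMap[F, String] { case (line, index) =>
        if (line.length > limit)
          Stream.raiseError[F](IndexedLineTooLong(index + 1, line.length, limit))
        else
          Stream.emit(line)
      }

// Usage with the pure Fallible effect: the first line passes, the second one
// exceeds the limit, so the whole stream fails with lineNumber = 2.
val result: Either[Throwable, List[String]] =
  Stream
    .emit("ok\nthis line is too long\nok")
    .covary[Fallible]
    .through(linesWithIndexedLimit[Fallible](5))
    .compile
    .toList
```

Whether the extra detail is worth losing the early failure is the judgment call here.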
Co-authored-by: Dmitry Ivankov <[email protected]>
There is a debate about whether the line limit should be measured in bytes or in characters.
This is my preferred approach because:
Here is a draft of the alternative, to give you an idea of what the flow would be like:
#995
(Note: if the alternative is preferred, the processing code needs a heavy coat of cleanup, but the main idea is there.)
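To make the byte-size versus character-size distinction concrete, here is a small standalone sketch using only the Java standard library. `LineSizes` and `measure` are hypothetical helpers for illustration, not code from either PR. It assumes a character-based limit compares against `String.length`, which counts UTF-16 code units rather than user-visible characters.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper type: three ways of measuring the same line.
final case class LineSizes(utf8Bytes: Int, utf16Chars: Int, codePoints: Int)

def measure(line: String): LineSizes =
  LineSizes(
    utf8Bytes = line.getBytes(UTF_8).length,         // what a byte-based limit would see
    utf16Chars = line.length,                        // what String.length reports
    codePoints = line.codePointCount(0, line.length) // Unicode code points
  )

val ascii    = measure("hello")        // LineSizes(5,5,5)
val accented = measure("h\u00e9llo")   // LineSizes(6,5,5): é takes 2 bytes in UTF-8
val emoji    = measure("\uD83E\uDD14") // LineSizes(4,2,1): 🤔 is one code point, a surrogate pair
```

For ASCII-only files the two limits coincide; they diverge as soon as the content contains multi-byte characters.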