code128: Add minimal encodation algorithm (non-extended ASCII only) #276
Adapted from ZXing (props Alex Geller). It may be around 80% slower depending on the data, and it is stack-heavy, but it does improve some outcomes when FNC1s are present (GS1 or manual). It doesn't improve much else, it appears (the previous algorithm was pretty good). Prompted by tests added in PR #272, props lyngklip.
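For readers unfamiliar with minimal encodation, the general idea can be sketched as a dynamic program over (position, current code set). The Python below is a hypothetical, simplified illustration, not the ZXing or PR code. It handles only non-extended ASCII (0-127) plus FNC1, represents FNC1 with an assumed sentinel `FNC1 = -1`, and returns only the minimum number of data codewords (start, check and stop characters excluded). It also shows why FNC1 matters: FNC1 exists in all three code sets, so a minimal encoder can stay in Code Set C across it.

```python
# Minimal sketch (hypothetical, not the PR's or ZXing's code): fewest Code 128
# data codewords for non-extended ASCII plus FNC1.
FNC1 = -1  # assumed sentinel token standing in for FNC1
SETS = ("A", "B", "C")
INF = float("inf")


def in_set_a(tok):
    # Code Set A: ASCII 0-95 (controls, space to '_') plus FNC1
    return tok == FNC1 or 0 <= tok <= 95


def in_set_b(tok):
    # Code Set B: ASCII 32-127 plus FNC1
    return tok == FNC1 or 32 <= tok <= 127


def is_digit(tok):
    return 48 <= tok <= 57  # '0'..'9'


def min_data_codewords(data):
    """data: list of ASCII codes 0-127 and FNC1 tokens."""
    n = len(data)
    # cost[i][s] = fewest codewords encoding data[i:] when currently in set s
    cost = [{s: 0 for s in SETS} for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):  # bottom-up, so no recursion
        tok = data[i]
        direct = {}
        # A and B: one codeword if native, else SHIFT + character (two)
        for s, native, other in (("A", in_set_a, in_set_b), ("B", in_set_b, in_set_a)):
            if native(tok):
                direct[s] = 1 + cost[i + 1][s]
            elif other(tok):
                direct[s] = 2 + cost[i + 1][s]
            else:
                direct[s] = INF
        # C: FNC1, or a pair of ASCII digits, in one codeword
        if tok == FNC1:
            direct["C"] = 1 + cost[i + 1]["C"]
        elif i + 1 < n and is_digit(tok) and is_digit(data[i + 1]):
            direct["C"] = 1 + cost[i + 2]["C"]
        else:
            direct["C"] = INF
        # Optionally latch (one codeword) to another set before consuming data[i]
        cost[i] = {
            s: min(direct[s], min(1 + direct[t] for t in SETS if t != s))
            for s in SETS
        }
    return min(cost[0].values())  # the start character selects the initial set


print(min_data_codewords([ord(c) for c in "12"] + [FNC1] + [ord(c) for c in "34"]))  # 3
```

Latching twice at the same position is never cheaper than latching once directly to the target set, so each position only needs to consider one optional latch.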
I'm wondering about the first Code 128 test case. I suspect a decoder might add 128 to those '9' digits because extended ASCII is active. Edit: it seems like a grey area; I have no idea what common practice is. This might be good.
Yes, you might think that, but extended mode only applies to "the ISO/IEC 646 value", i.e. to the ASCII values of Code Sets A and B, not to Code Set C double-digit values, which aren't ASCII. So Code Set C data can be freely intermixed with extended mode shifts and latches.
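To make this reading concrete, here is a hypothetical decoder-side sketch of FNC4 handling. It assumes the symbol has already been resolved into a simple token stream (an illustrative assumption, not a real decoder API): `"FNC4"` for the function character, an `int` 0-127 for a Code Set A/B character, and a two-character string such as `"99"` for a Code Set C pair. The 128 offset is applied only to A/B characters; Code Set C pairs pass through unchanged even while extended mode is latched.

```python
def apply_fnc4(tokens):
    """Resolve FNC4 shifts/latches over a hypothetical token stream."""
    out = bytearray()
    latched = False  # two consecutive FNC4s toggle extended mode
    shifted = False  # a single FNC4 toggles it for the next A/B character only
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "FNC4":
            if i + 1 < len(tokens) and tokens[i + 1] == "FNC4":
                latched = not latched
                i += 2
            else:
                shifted = True
                i += 1
            continue
        if isinstance(tok, str):
            # Code Set C pair: not an ISO/IEC 646 value, so never extended
            out += tok.encode("ascii")
        else:
            extended = latched != shifted
            out.append(tok + 128 if extended else tok)
            shifted = False
        i += 1
    return bytes(out)


print(apply_fnc4(["FNC4", "FNC4", 65, "99", 66]))  # b'\xc199\xc2'
```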
ChatGPT seems to agree with you once I helped it understand the question. That makes encoding a bit more complex, right? Interesting. I was under the impression that FNC4 insertion was a "preprocessing" step. This would mean that sequences of extended ASCII that mapped to ASCII digits might be encoded in Code Set C, and that does seem a bit odd, even though it would not be a problem so long as the encoder and decoder agree. But that's what the encoder did before, right?
Well, that was a bug, which PR #275 fixes. Extended ASCII should never be encoded as Code Set C digits.
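A minimal Python sketch of the guard this implies, assuming raw bytes as input (both function names are hypothetical): the Code Set C pairing test must look at the full byte value. The second function shows one hypothetical way such a bug can arise; it is not necessarily how the earlier encoder went wrong.

```python
def can_pair_in_code_c(data: bytes, i: int) -> bool:
    """True if data[i:i+2] may be encoded as a single Code Set C codeword."""
    if i + 1 >= len(data):
        return False
    # Test the raw byte values: only 0x30-0x39 ('0'-'9') qualify.
    return all(0x30 <= b <= 0x39 for b in data[i:i + 2])


def can_pair_masked(data: bytes, i: int) -> bool:
    """Hypothetical buggy variant: stripping bit 7 first makes 0xB0-0xB9
    (extended ASCII) look like '0'-'9'."""
    if i + 1 >= len(data):
        return False
    return all(0x30 <= (b & 0x7F) <= 0x39 for b in data[i:i + 2])


print(can_pair_in_code_c(b"\xb9\xb9", 0), can_pair_masked(b"\xb9\xb9", 0))  # False True
```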
No time to weigh in right now, but I'll likely take this PR (over the other) once I've had time to review. @gitlost Did the basis for the algorithm get written up anywhere? If we significantly deviate from the informative routine provided in the symbology specs, then we should have some reference to signpost users to. (I've had to expend a lot of effort over the years convincing developers of pathological decoders that they need to fix decoding bugs, even if the codeword sequence is not the result of a reference encoder.)
The only write-up, really, is that it's a standard algorithm, e.g. https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm. I'm concerned about performance, both speed and stack usage, so I'd hesitate to use it without first trying it out in some real-life cases, if that's possible. I wouldn't be confident in the performance checking I did, as it was just loops using
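On stack usage: a top-down, memoized formulation of this kind of dynamic program recurses once per input position, so its stack depth grows linearly with the data length. The sketch below is hypothetical and deliberately simplified (printable ASCII, Code Sets B and C only, no shifts or FNC1); it records the maximum recursion depth so the effect can be observed. An iterative, bottom-up table avoids the deep stack at the cost of filling in every state.

```python
from functools import lru_cache


def min_cost_topdown(text: str):
    """Top-down memoized minimal codeword count for printable ASCII using only
    Code Sets B and C. Returns (cost, maximum recursion depth reached)."""
    n = len(text)
    digits = "0123456789"
    depth = 0
    max_depth = 0

    @lru_cache(maxsize=None)
    def cost(i: int, in_c: bool):
        nonlocal depth, max_depth
        if i == n:
            return 0
        depth += 1
        max_depth = max(max_depth, depth)
        b_direct = 1 + cost(i + 1, False)  # one character in Code Set B
        pair = i + 1 < n and text[i] in digits and text[i + 1] in digits
        c_direct = 1 + cost(i + 2, True) if pair else float("inf")
        # Switching set costs one extra (latch) codeword
        best = min(c_direct, 1 + b_direct) if in_c else min(b_direct, 1 + c_direct)
        depth -= 1
        return best

    # The start character picks the initial set at no extra cost
    return min(cost(0, False), cost(0, True)), max_depth
```

For example, `min_cost_topdown("A" * 50)` returns `(50, 50)`: 50 codewords and 50 nested calls. Sufficiently long inputs would exceed CPython's default recursion limit (see `sys.getrecursionlimit()`).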
Here's the very simplistic performance test I used (
Something that I've been thinking about: should it be made explicit what the encoder does when faced with the possibility of encoding part of the message in two different ways with the same length:
There are possibly more alternatives than the ones I have listed. The reason I ask is that I've been playing with a somewhat rewritten encoder, and I have come up with something where I can sort of control the priority of these things, but I still can't match all the test cases. In some places the test cases seem to prefer range switching over shifting, and in other places the other way around. Unfortunately, I have no insight into the specifications.
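One way to make tie-breaking explicit is to compare candidate encodations on a tuple key: codeword count first, then a configurable preference. This is a hypothetical sketch: the `pick` helper and the `"shift"`/`"latch"` labels are invented for illustration, and this is not the rule used by either PR or by ISO/IEC 15417. The example is a genuine Code 128 tie. A single control character at the end of Code Set B data costs two codewords whether it is reached by SHIFT or by latching with CODE A.

```python
def pick(options, priority):
    """options: (codeword_count, kind, codewords) tuples. Returns the cheapest,
    breaking ties by the position of `kind` in the priority list."""
    return min(options, key=lambda opt: (opt[0], priority.index(opt[1])))


# Finishing "ab<SOH>" while in Code Set B: both choices add two codewords.
options = [
    (2, "latch", ["CODE A", "<SOH>"]),
    (2, "shift", ["SHIFT", "<SOH>"]),
]
print(pick(options, ["shift", "latch"])[2])  # ['SHIFT', '<SOH>']
print(pick(options, ["latch", "shift"])[2])  # ['CODE A', '<SOH>']
```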
In theory, the examples from the initial ISO/IEC 15417:2000 specification were based on this code. However, I have not verified it.
Closing in favour of PR #278.
This is the second of the alternative PRs (the other being PR #275). You choose!