code128: Add minimal encodation algorithm (non-extended ASCII only) #276

gitlost · 2024-10-12T22:18:53Z

Adapted from ZXing (props Alex Geller). It is maybe 80% slower,
depending on the data, and stack heavy, but it does improve some
outcomes when FNC1s are present (GS1 or manual), although apparently
not much else (the previous algorithm was pretty good).

Prompted by tests added from PR #272, props lyngklip

This is the second of two alternative PRs (the other is PR #275). You choose!


lyngklip · 2024-10-13T16:26:21Z

I'm wondering about the first Code 128 test case. I suspect a decoder might add 128 to those '9' digits because extended ASCII is active?

Edit: it seems like a grey area. I have no idea what common practice is. This might be good.

gitlost · 2024-10-13T17:00:55Z

Yes you might think that but extended mode only applies to "the ISO/IEC 646 value", i.e. to Code Sets A and B ASCII values, not to Code Set C double digit values, which aren't ASCII, so C mode stuff can be freely intermixed with extended mode shifts and latches.
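To make the point above concrete, here is a minimal decoder sketch in Python. It is not BWIPP or ZXing code. It assumes the commonly published Code 128 value table (FNC4 is 101 in Code Set A and 100 in Code Set B; 99/100/101 switch sets), takes data values only (no start, check or stop characters), and ignores Shift, FNC1, FNC2 and FNC3. It also treats any two FNC4s with no data character between them as a latch, which is a simplification.

```python
def decode(values, start_set="B"):
    """Decode Code 128 data values into bytes (illustrative sketch only)."""
    out, code_set = [], start_set
    latched = False  # extended mode latched on by FNC4 FNC4
    single = False   # one FNC4 seen, waiting for the next A/B data character
    for v in values:
        if code_set == "C":
            if v < 100:
                # Code Set C values are digit pairs, not ISO/IEC 646 values,
                # so the extended-mode offset of 128 is never applied to them.
                out.extend(ord(d) for d in f"{v:02d}")
            elif v == 100:
                code_set = "B"
            elif v == 101:
                code_set = "A"
            continue  # 102 (FNC1) is ignored in this sketch
        fnc4 = 101 if code_set == "A" else 100
        if v == fnc4:
            if single:
                latched, single = not latched, False  # double FNC4 toggles the latch
            else:
                single = True
            continue
        if v == 99:
            code_set = "C"
        elif code_set == "A" and v == 100:
            code_set = "B"
        elif code_set == "B" and v == 101:
            code_set = "A"
        elif v < 96:
            if code_set == "A":
                ch = v + 32 if v < 64 else v - 64
            else:
                ch = v + 32
            # A single FNC4 inverts the latched state for one character.
            extended = latched != single
            out.append(ch + 128 if extended else ch)
            single = False
        # 96, 97, 98, 102 (FNC3, FNC2, Shift, FNC1) are ignored in this sketch
    return bytes(out)


# FNC4 FNC4 latch, 0xE9, Code C, "99", Code B, 0xE9 (all starting in Set B)
decoded = decode([100, 100, 73, 99, 99, 100, 73], "B")
```

Under these assumptions the "99" pair comes out as two plain ASCII '9' bytes even though extended mode is still latched on either side of it.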

lyngklip · 2024-10-13T17:07:24Z

ChatGPT seems to agree with you once I helped it understand the question. That makes encoding a bit more complex, right? Interesting. I was under the impression that FNC4 insertion was a "preprocessing" step. This would mean that sequences of extended ASCII that mapped to ASCII digits might be encoded in character set C, and that does seem a bit odd even though it would not be a problem so long as encoder and decoder agree. But that's what the encoder did before, right?

gitlost · 2024-10-13T17:12:49Z

Well that was a bug, which PR #275 fixes. Extended ASCII should never be encoded as Code Set C digits.
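As a small illustration of that rule (a sketch only, not the code from either PR): when looking for spans that could go into Code Set C, only plain ASCII digits 0x30 to 0x39 should count. Bytes 0xB0 to 0xB9, which would be digits after subtracting 128, are excluded. The function name and the `min_len` threshold are hypothetical.

```python
def digit_runs(data: bytes, min_len: int = 2):
    """Return (start, length) for each run of plain ASCII digits."""
    runs, start = [], None
    # A non-digit sentinel byte closes a run that reaches the end of the data.
    for i, b in enumerate(data + b"\x00"):
        if 0x30 <= b <= 0x39:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - start))
            start = None
    return runs


# 0xB1..0xB4 are '1'..'4' plus 128 and must not be treated as digits.
runs = digit_runs(b"AB1234\xb1\xb2\xb3\xb456")
```

Here only the two plain-digit spans are reported; the extended bytes between them break the run.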

terryburton · 2024-10-13T18:58:13Z

No time to weigh in right now, but I'll likely take this PR (over the other) once I've had time to review.

@gitlost Did the basis for the algorithm get written up anywhere? If we significantly deviate from the informative routine provided in the symbology specs then we should have some reference to signpost users to. (I've had to expend a lot of effort over the years convincing developers of pathological decoders that they need to fix decoding bugs, even if the codeword sequence is not the result of a reference encoder.)

gitlost · 2024-10-13T19:46:15Z

The only write up really is that it's a standard algorithm, e.g. https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm.
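For readers who want a feel for the general idea, here is a rough Python sketch of a memoized recursion that finds the minimum symbol count for non-extended ASCII data. It is not the ZXing code or the PostScript in this PR. It assumes latches only (no Shift, no FNC1), counts the start character but not the check or stop characters, and all names are made up for illustration.

```python
from functools import lru_cache

DIGITS = "0123456789"


def min_symbols(data: str) -> float:
    """Minimum number of symbols (start character plus data and latch symbols)."""
    n = len(data)

    def fits(ch: str, cs: str) -> bool:
        o = ord(ch)
        # Set A covers ASCII 0-95, Set B covers ASCII 32-127.
        return o < 96 if cs == "A" else 32 <= o < 128

    @lru_cache(maxsize=None)
    def cost(pos: int, cs: str) -> float:
        """Minimum symbols for data[pos:] when the current code set is cs."""
        if pos == n:
            return 0
        best = float("inf")
        for target in "ABC":
            latch = 0 if target == cs else 1  # one symbol to change code set
            if target == "C":
                pair = data[pos:pos + 2]
                if len(pair) == 2 and all(c in DIGITS for c in pair):
                    best = min(best, latch + 1 + cost(pos + 2, "C"))
            elif fits(data[pos], target):
                best = min(best, latch + 1 + cost(pos + 1, target))
        return best

    # The start character selects the initial code set and costs one symbol.
    return 1 + min(cost(0, cs) for cs in "ABC")


lengths = [min_symbols(s) for s in ("123456", "AB", "12345", "a\x01")]
```

Each subproblem is solved once per (position, code set) pair, but the recursion depth grows with the input length, which is one way to see where the stack usage comes from.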

I'm concerned about performance, both speed and stack usage, so I'd hesitate to use it without trying it out first in some real-life cases if that's possible.

I wouldn't be confident in the performance checking I did: it was just loops timed with usertime, with garbage collection turned off (-2 vmreclaim).

gitlost · 2024-10-13T19:55:11Z

Here's the very simplistic performance test I used (mode128 is the Divide-and-Conquer one, code128 is the current):

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /mode128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(mode128 tot ) print tot ==

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(code128 tot ) print tot ==

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /mode128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(mode128 tot ) print tot ==

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(code128 tot ) print tot ==

lyngklip · 2024-10-20T15:19:10Z

Something that I've been thinking about: should it be made explicit what the encoder does when faced with the possibility of encoding part of the message in two different ways with the same length:

prefer ASCII > Extended or vice versa
prefer A>B>C or B>A>C or C>A>B etc.
prefer range shift to range switch or...
prefer character set shift to character set switch or...
prefer dangling digit at the front of an odd digit span or at the end
switch from ASCII A to Extended B using 100 100 100 or 101 101 100 etc.

There are possibly more alternatives than the ones I have listed. The reason I ask is that I've been playing with a somewhat rewritten encoder and I have come up with something where I can sort of control the priority of these things, but I still can't match all the test cases. The test cases in some places seem to prefer range switching over shifting and in other places the other way around. Unfortunately I have no insight into specifications.
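To show how quickly such ties appear, here is a hedged Python sketch that enumerates every minimum-length encoding of a short message. It is not any of the encoders discussed here. It assumes non-extended ASCII, latches only (no Shift, no FNC1), and counts the start character but not the check or stop characters. The token names are invented for readability, and the enumeration can blow up on long inputs.

```python
from functools import lru_cache

DIGITS = "0123456789"


def all_minimal(data: str):
    """Return (symbol count, list of token tuples) over all minimal encodings."""
    n = len(data)

    def fits(ch, cs):
        o = ord(ch)
        return o < 96 if cs == "A" else 32 <= o < 128

    @lru_cache(maxsize=None)
    def solve(pos, cs):
        """Return (cost, sequences) for data[pos:] with current code set cs."""
        if pos == n:
            return 0, ((),)
        best, seqs = float("inf"), []
        for target in "ABC":
            prefix = () if target == cs else (f"Code{target}",)
            if target == "C":
                token = data[pos:pos + 2]
                if len(token) != 2 or any(c not in DIGITS for c in token):
                    continue
            else:
                token = data[pos]
                if not fits(token, target):
                    continue
            sub_cost, sub_seqs = solve(pos + len(token), target)
            cost = len(prefix) + 1 + sub_cost
            if cost < best:
                best, seqs = cost, []  # strictly better: drop earlier candidates
            if cost == best:
                seqs.extend(prefix + (token,) + s for s in sub_seqs)
        return best, tuple(seqs)

    best, results = float("inf"), []
    for start in "ABC":
        cost, seqs = solve(0, start)
        cost += 1  # the start character
        if cost < best:
            best, results = cost, []
        if cost == best:
            results.extend((f"Start{start}",) + s for s in seqs)
    return best, results


count, encodings = all_minimal("12345")
for tokens in encodings:
    print(" ".join(tokens))
```

For the five-digit message "12345" this finds four encodings of equal length under the stated assumptions: the odd digit can go first or last, and it can sit in Set A or Set B. An encoder has to pick one of them by some priority rule, which is the kind of choice the list above is asking about.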

terryburton · 2024-10-20T15:55:36Z

> Unfortunately I have no insight into specifications.

In theory, the examples from the initial ISO/IEC 15417:2000 specification were based on this code

However I have not verified it.

gitlost · 2024-10-28T16:17:33Z

Closing in favour of PR #278

gitlost mentioned this pull request Oct 25, 2024

Optimal encoding of Code 128 with option "suppressc" #278

Merged

gitlost closed this Oct 28, 2024

terryburton deleted the code128_pull_272_recurse branch November 7, 2024 18:04
