code128: Add minimal encodation algorithm (non-extended ASCII only) #276

Closed

gitlost
Contributor

@gitlost gitlost commented Oct 12, 2024

Adapted from ZXing (props Alex Geller). It is maybe 80% slower, depending
on the data, and stack heavy, but it does improve some outcomes when FNC1s
are present (GS1 or manual). It does not appear to improve much else (the
previous algorithm was pretty good).

Prompted by tests added from PR #272, props lyngklip

This is the second of two alternative PRs (the other is PR #275). You choose!

@lyngklip
Contributor

lyngklip commented Oct 13, 2024

I'm wondering about the first Code 128 test case. I suspect a decoder might add 128 to those '9' digits because extended ASCII is active?

Edit: it seems like a grey area. I have no idea what common practice is. This might be good.

@gitlost
Contributor Author

gitlost commented Oct 13, 2024

Yes, you might think that, but extended mode only applies to "the ISO/IEC 646 value", i.e. to Code Sets A and B ASCII values. It does not apply to Code Set C double-digit values, which aren't ASCII, so C mode stuff can be freely intermixed with extended mode shifts and latches.
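To make that concrete, here is a minimal decoder sketch in Python. It is not BWIPP code and not taken from any real decoder; the function name `decode_sketch` is hypothetical, and it assumes the input is the codeword sequence without check character and stop, with Shift and FNC1 to FNC3 simply skipped. It only shows that the FNC4 state adds 128 to Code Set A/B values and never to Code Set C digit pairs.

```python
START = {103: "A", 104: "B", 105: "C"}  # Start A / Start B / Start C

def decode_sketch(codewords):
    """Decode codewords (start + data, no check/stop) into byte values.

    Simplified on purpose: Shift and FNC1..FNC3 are skipped, not handled.
    """
    cset = START[codewords[0]]
    ext_latch = False  # toggled by two consecutive FNC4s
    ext_shift = False  # set by a single FNC4, used by the next A/B data value
    out = []
    i = 1
    while i < len(codewords):
        cw = codewords[i]
        i += 1
        if cset == "C":
            if cw < 100:
                # Double digits are not ISO/IEC 646 values: never add 128.
                out.extend(ord(d) for d in f"{cw:02d}")
            elif cw == 100:
                cset = "B"
            elif cw == 101:
                cset = "A"
            continue  # 102 (FNC1) is skipped in this sketch
        fnc4 = 101 if cset == "A" else 100
        if cw == fnc4:
            if i < len(codewords) and codewords[i] == fnc4:
                ext_latch = not ext_latch  # double FNC4: latch in or out
                i += 1
            else:
                ext_shift = True           # single FNC4: next value only
            continue
        if cw == 99:
            cset = "C"
            continue
        if cw == 100 and cset == "A":
            cset = "B"
            continue
        if cw == 101 and cset == "B":
            cset = "A"
            continue
        if cw >= 96:
            continue  # FNC1..FNC3 and Shift are skipped in this sketch
        if cset == "A":
            value = cw + 32 if cw < 64 else cw - 64
        else:
            value = cw + 32
        if ext_latch != ext_shift:  # a shift inverts the latched state
            value += 128
        ext_shift = False
        out.append(value)
    return out

# Start B, FNC4 FNC4 (latch extended), 'i'+128 = 0xE9, Code C, "99", "99"
print(decode_sketch([104, 100, 100, 73, 99, 99, 99]))  # [233, 57, 57, 57, 57]
```

The digits come out as plain 0x39 even though the extended latch is still active when the Code C pairs are read.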

@lyngklip
Contributor

ChatGPT seems to agree with you once I helped it understand the question. That makes encoding a bit more complex, right? Interesting. I was under the impression that FNC4 insertion was a "preprocessing" step. This would mean that sequences of extended ASCII that mapped to ASCII digits might be encoded in Code Set C, and that does seem a bit odd, even though it would not be a problem so long as encoder and decoder agree. But that's what the encoder did before, right?

@gitlost
Contributor Author

gitlost commented Oct 13, 2024

Well that was a bug, which PR #275 fixes. Extended ASCII should never be encoded as Code Set C digits.
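As an illustration only (a hypothetical helper, not the check used in PR #275 or in BWIPP), the encoder-side rule could look like this in Python: a pair is eligible for Code Set C only if both bytes are plain ASCII digits, so bytes such as 0xB9 (that is, '9' + 128) do not qualify.

```python
def set_c_pair_ok(data: bytes, i: int) -> bool:
    """True if data[i:i+2] may be encoded as one Code Set C codeword.

    Only the ASCII digits 0x30..0x39 qualify. Extended bytes 0xB0..0xB9,
    which map to digits once 128 is subtracted, are rejected.
    """
    pair = data[i:i + 2]
    return len(pair) == 2 and all(0x30 <= b <= 0x39 for b in pair)

print(set_c_pair_ok(b"99", 0))        # True
print(set_c_pair_ok(b"\xb9\xb9", 0))  # False
```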

@terryburton
Member

No time to weigh in right now, but I'll likely take this PR (over the other) once I've had time to review.

@gitlost Did the basis for the algorithm get written up anywhere? If we deviate significantly from the informative routine provided in the symbology specs then we should have some reference to signpost users to. (I've had to expend a lot of effort over the years convincing developers of pathological decoders that they need to fix decoding bugs, even if the codeword sequence is not the result of a reference encoder.)

@gitlost
Contributor Author

gitlost commented Oct 13, 2024

The only write up really is that it's a standard algorithm, e.g. https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm.

I'm concerned about performance, both speed and stack usage, so I'd hesitate to use it without trying it out first in some real-life cases if that's possible.

I wouldn't be confident in the performance checking I did, as it was just loops using usertime for timings, with garbage collection turned off (-2 vmreclaim).
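For readers unfamiliar with the general shape of such an approach, here is a rough Python sketch of a minimal-length search written as memoized recursion over (position, code set). It is not the PR's PostScript and not ZXing's code; `min_codewords` is a hypothetical name, and it assumes ASCII-only input, latches only (no Shift), and no FNC characters. The recursion depth grows with the input length, which gives a feel for the stack usage concern above.

```python
from functools import lru_cache

def min_codewords(data: bytes) -> int:
    """Fewest codewords (start + latches + data) for ASCII-only data.

    Sketch only: no Shift, no FNC1..FNC4, no check character or stop.
    """
    if any(b > 127 for b in data):
        raise ValueError("sketch handles non-extended ASCII only")

    def fits(cset: str, i: int) -> int:
        """Number of input bytes one codeword of `cset` consumes at i (0 if none)."""
        b = data[i]
        if cset == "A":
            return 1 if b < 96 else 0          # controls, digits, upper case
        if cset == "B":
            return 1 if b >= 32 else 0         # printable ASCII and DEL
        # Code Set C needs two ASCII digits
        pair = data[i:i + 2]
        return 2 if len(pair) == 2 and all(48 <= d <= 57 for d in pair) else 0

    @lru_cache(maxsize=None)
    def cost(i: int, cset) -> float:
        if i == len(data):
            return 0
        best = float("inf")
        for nxt in "ABC":
            used = fits(nxt, i)
            if used:
                switch = 0 if nxt == cset else 1  # Start or latch codeword
                best = min(best, switch + 1 + cost(i + used, nxt))
        return best

    return int(cost(0, None))

print(min_codewords(b"1234"))   # 3: Start C, 12, 34
print(min_codewords(b"A1234"))  # 5: Start, 'A', Code C, 12, 34
```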

@gitlost
Contributor Author

gitlost commented Oct 13, 2024

Here's the very simplistic performance test I used (mode128 is the Divide-and-Conquer one, code128 is the current):

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /mode128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(mode128 tot ) print tot ==

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(code128 tot ) print tot ==

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /mode128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(mode128 tot ) print tot ==

2 vmreclaim
-2 vmreclaim

/tot 0 def
/startt usertime def
1 1 100 {
(^031^031_^127^159^031^159^159^159^15912345``^255^000^127^255^224^224^159`) (dontdraw parse) /code128 /uk.co.terryburton.bwipp findresource exec
} for
/endt usertime def
/tot tot endt startt sub add def
(code128 tot ) print tot ==

@lyngklip
Contributor

Something that I've been thinking about: should it be made explicit what the encoder does when faced with the possibility of encoding part of the message in two different ways with the same length:

  • prefer ASCII > Extended or vice versa
  • prefer A>B>C or B>A>C or C>A>B etc.
  • prefer range shift to range switch or...
  • prefer character set shift to character set switch or...
  • prefer dangling digit at the front of an odd digit span or at the end
  • switch from ASCII A to Extended B using 100 100 100 or 101 101 100 etc.

There are possibly more alternatives than the ones I have listed. The reason I ask is that I've been playing with a somewhat rewritten encoder and I have come up with something where I can sort of control the priority of these things, but I still can't match all the test cases. The test cases in some places seem to prefer range switching over shifting and in other places the other way around. Unfortunately I have no insight into specifications.
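One way to make such preferences explicit, sketched here in Python with hypothetical names (`Candidate`, `pick`) and not taken from any of the encoders discussed, is to rank equal-length candidates with an ordered sort key. The example uses the data `a`, TAB, TAB, `b`, which takes 7 codewords either with two Shifts or with a latch to Code Set A and back.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    label: str
    codewords: tuple  # codeword values, start included
    set_latches: int  # number of Code A/B/C latch codewords
    set_shifts: int   # number of Shift codewords

def pick(candidates, prefer_shift=True):
    """Pick the shortest candidate; break ties by an explicit preference."""
    def key(c):
        # Penalise whichever mechanism is NOT preferred.
        tie = c.set_latches if prefer_shift else c.set_shifts
        # The codewords themselves come last so the order is fully deterministic.
        return (len(c.codewords), tie, c.codewords)
    return min(candidates, key=key)

# Start B, 'a', Shift, TAB, Shift, TAB, 'b'
shifted = Candidate("shift", (104, 65, 98, 73, 98, 73, 66), 0, 2)
# Start B, 'a', Code A, TAB, TAB, Code B, 'b'
latched = Candidate("latch", (104, 65, 101, 73, 73, 100, 66), 2, 0)

print(pick([shifted, latched]).label)                      # shift
print(pick([shifted, latched], prefer_shift=False).label)  # latch
```

Further preferences from the list above (ASCII versus extended, where to put a dangling digit, and so on) could be added as extra elements of the key tuple, in priority order.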

@terryburton
Member

> Unfortunately I have no insight into specifications.

In theory, the examples from the initial ISO/IEC 15417:2000 specification were based on this code

However I have not verified it.

@gitlost
Contributor Author

gitlost commented Oct 28, 2024

Closing in favour of PR #278

@gitlost gitlost closed this Oct 28, 2024
@terryburton terryburton deleted the code128_pull_272_recurse branch November 7, 2024 18:04