-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Code Quality / Tiding #390
Changes from all commits
a8a7592
99e3fae
03a8a06
731ada7
ad1e954
3fcef87
1ddd627
0443c54
7c5a6ca
b229f63
25d2fea
f8ef4c4
76f35fa
48f9cba
446e295
21f5e47
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,7 @@ | ||
// Copyright (c) Drew Noakes and contributors. All Rights Reserved. Licensed under the Apache License, Version 2.0. See LICENSE in the project root for license information. | ||
|
||
using System.Buffers; | ||
|
||
namespace MetadataExtractor.Formats.Iptc | ||
{ | ||
public static class Iso2022Converter | ||
|
@@ -89,17 +91,24 @@ public static class Iso2022Converter | |
|
||
foreach (var encoding in encodings) | ||
{ | ||
char[] charBuffer = ArrayPool<char>.Shared.Rent(encoding.GetMaxCharCount(bytes.Length)); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice. There are some other places we could use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's also some wins introducing a few new specialized readers that can operate directly over ReadOnlySpan. One callout, would be replacing SequentialByteArrayReader with an optimized ref struct {LittleEndian/BigEndian}BufferReader(ReadOnlySpan buffer) -- and spanifying the outer method. This would allow us to operate directly over a span, and eliminate the reader allocation. There's another big win eliminating all the temporary array allocations when we make There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Some more use of ArrayPool in #392. There's definitely room for improvement in the reader classes. Check out the PRs from @kwhopper for some more ideas. My vague idea here is to build on the new span types and @kwhopper's investigations, and actually avoid doing as much parsing work during the extraction phase. Many directories we store could actually be backed by a
This sounds promising. We'd need to verify that the compiler doesn't allocate an array behind the scenes. EDIT: It seems to do the right thing on modern .NET: https://sharplab.io/#v2:EYLgtghglgdgPgAQEwEYCwAoBBmABAlANnyVwGFcBvTXW/PA4hAFlwFkUAKAJQFMIAJgHkYAGwCeAZQAOEGAB4CABgB8+FEoDOAShp1qGOrgC+e2mfrqmrNkk67D+i0Y6cA2gCIAgh4A0uDwAhPwCyDwBdbQBuC1MMYyA=== ...but not on .NET Framework: https://sharplab.io/#v2:EYLgHgbALAPgAgJgIwFgBQcDMACOSK4LYDC2A3utlbjngXFNgLJIAUASgKYCGAJgPIA7ADYBPAMoAHboIA8eAAwA+XEgUBnAJSVqFNNWwBfHVRM1V9RkwStt+3WYMtWAbQBEAQTcAabG4BCPn7EbgC6mgDcZsZohkA== This is a compiler feature, so we'd need to test There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Thanks. RandomAccessStream+ReaderInfo is an experiment somewhat similar to this Span conversion. It goes a bit further by abstracting away all the buffering (RandomAccessStream) and "span-ifying" (ReaderInfo with byte arrays) entirely; callers then only have to worry about one kind of reader. Side effects are the ability to know your exact physical offset at any time, and support for streamed content. If you can reach those same goals in this process, that's a great addition and should allow new things in the future. I can also check through the code in those old PR's to see if they could use Spans like you're doing here, if that has some value. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @kwhopper the recent activity has been small incremental improvements. I see your PR as the kind of thing we want for 3.0. It'll be a lot of work to integrate throughout the code though, so we should noodle out the details with sketches and discussion before starting the work of integration. We only want to do that integration work once. A direction I think would be good is to divide the parsing into two stages:
Currently we do both 1 and 2 during the read phase. I'm thinking that, with this, we'd just do step 1 during that phase, and step 2 during the enumeration. This will mean a lot less work and fewer allocations during the first phase, and when that work's done during the second phase, any allocations would be shorter-lived and therefore more likely to be GC'd quickly in gen0. It'd also allow consumers to skip decoding bits they don't actually care about. I'm hoping to write this up a bit more comprehensively and would really appreciate your input. |
||
|
||
try | ||
{ | ||
var s = encoding.GetString(bytes, 0, bytes.Length); | ||
if (s.IndexOf((char)65533) != -1) | ||
int charCount = encoding.GetChars(bytes, 0, bytes.Length, charBuffer, 0); | ||
|
||
if (charBuffer.AsSpan(0, charCount).IndexOf((char)65533) != -1) | ||
continue; | ||
return encoding; | ||
} | ||
catch | ||
{ | ||
// fall through... | ||
} | ||
finally | ||
{ | ||
ArrayPool<char>.Shared.Return(charBuffer); | ||
} | ||
} | ||
|
||
// No encodings succeeded. Return null. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, this should have been a 2 instead of a 3. I'll push a fix shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#391