-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New API exploration #90
base: main
Are you sure you want to change the base?
Conversation
While writing the x3f support which can have string keys I was thinking about this. While it would require a massive code reformat I think it would actually have a minimal logical impact to do:
With Sorry, I'm not a coder primarily, so I don't know the fancy terms for what I just tried to describe. |
(Java) I modified I threaded the logic through the descriptor and exif handler without issues. As you can see K is enum property keys.
This solution will probably be a little more manual to implement, so let me know what you think before I go refactoring the whole project. |
I played with the enum idea a bit more and realized it can tick all the checkboxes in java. Using interfaces and the template capability in java enums eliminates the need for Key (interface): Example directory: By consolidating so much we'd lose plug-in backwards compatibility, but updating to the new API shouldn't be too much work. |
New API
This pull request is exploring an improved API within the library the library.
Discussion about the API should happen here, and this description will be updated to reflect what's agreed upon.
Desired qualities & features
Easier to update properties
Currently, adding a property requires editing four disparate locations in code:
We could lose the concept of descriptor classes altogether.
A new API should co-locate all this in one place. For example:
Property metadata (or, metadata about metadata)
We can model property metadata, such as:
Metadata varies depending upon the property type.
Support varied tag identifiers
Directory
as a base class requires all tag identifiers to be integers. That's fine for Exif and other TIFF formats that use numeric identifiers for tags, but not all file formats have that. Instead we end up defining our own arbitrary integer mapping.XMP in particular doesn't fit this mould, and we don't support it very well. XMP properties have two keys -- a namespace and a key. We should support composite keys such as this.
A likely implementation will have a directory base class that's generic on its key type.
We still want to easily enumerate all properties and print out their descriptions. Key types must have some common base type from which a key string can be obtained.
Top level object
The .NET project does not have a
Metadata
class, instead usingIEnumerable<Directory>
. C# provides great operators (OfType<>
,First
,Select
,SelectMany
, ...), and allows extension methods on this interface.Data driven approach
Rather than defining all properties in code, they should be loaded from a configuration file.
The configuration file would be reused between the Java and C# implementations. It should simplify the creation of implementations in other languages as well.
Implementations could provide partial support for the metadata types described in the data file, allowing gradual implementation. We could automatically generate documentation/tabulation of support across implementations.
Exiftool's Perl source code reads a lot more like data than code. This data file should be equally declarative.
Logical vs. physical properties
As discussed in drewnoakes/metadata-extractor#10, it'd be valuable to allow simpler access to logical values that may have multiple possible physical locations.
Examples of such logical properties: Timestamp, Orientation, CameraMake, CameraModel, Aperture, Exposure, Flash, FocalLength, ISO, WhiteBalance, ImageSize, GeoLocation, Altitude, Heading, ThumbnailSize, LensModel, DriveMode, ExposureMode, ExposureProgram, Rating, Subject, Label, Copyright, Author, Comment, ImageCount.
Some could be sourced from many locations (Timestamp, ISO, Flash...) and others which are combinations of multiple tags (ImageSize, GeoLocation, ThumbnailSize, ...).
Efficient storage
Some formats use fixed length records. For these, a directory could store the single
byte[]
and useIndexedProperty
methods to read/write values at runtime.Context
Some kind of object that configures how metadata extraction is completed. It could cache the parsed data file (see above), specify filtering options (see below), configuration such as threshold limits on byte[] sizes, which metadata types to extract. If we ever need runtime code generation, it could cache those resources too.
Filtering
It seems useful to be able to limit the types of metadata returned during processing, to reduce heap allocation and reduce IO/CPU usage. There's a PR for this in the Java implementation, and some discussion there.
Serialisation
It might be good to support hooks for serialisation and deserialisation in arbitrary formats. There's a PR in the Java version that uses Java's object serialisation, but a more general approach should support XML, JSON, etc.
Future support for editing metadata
This is a very sought after feature, but it's a big commitment as the cost per error is high, and it will require a great deal of engineering.
So while it's not a v-next goal explicitly, it will likely be the next significant milestone for the project, and we should at least give it some thought when it comes to this iteration of the API. We should consider trying to minimise future API churn.
Naming
The data model (directories, tags) dates back to when the project was called ExifExtractor. The terms come from the TIFF specification.