Add avro to well-known encodings #993
- Add a rust writer example
> Further, a name must be defined before it is used (“before” in the depth-first, left-to-right traversal of the JSON parse tree, where the types attribute of a protocol is always deemed to come “before” the messages attribute.)

You can define a name inline using a single schema object for `data` or an array of schema objects. If the `data` is an array of schemas, the `name` must reference a single
Is this an additional restriction on top of the Avro spec, or just a helpful note for people trying to follow the spec?
I would say it's a half-restriction. "Avro" can broadly mean any of the IDL, the message serialization format, or the container format (Avro files). The Avro container format does not have "channels" or "topics" and only supports writing messages that conform to the schema defined in the header. Avro allows only one schema in the header but supports "union" types (similar to protobuf `oneof`). In Avro headers, an array of schema objects is treated as a union type, so a user can write a message to the Avro file for any of the schemas in the array.
Typical MCAP use has different semantics because we have channels, which can reference specific schemas. So this note is about allowing a user to specify an array of schema objects (so that one Avro schema object in the array can reference another schema by name without redefining it, e.g. Point2d), while the "name" in the MCAP schema record tells us which of the schemas in the array the messages in that channel are serialized with. Thus we don't treat an array of schemas as a union like Avro does.
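To make the distinction concrete, here is a minimal Python sketch of how a reader might pick the channel's schema out of an array of Avro schema objects using the `name` from the MCAP schema record. The helper names (`avro_fullname`, `select_schema`) and the `Point2d`/`Line` schemas are hypothetical, and the sketch only looks at top-level entries of the array, not at types defined inline inside other schemas.

```python
import json


def avro_fullname(schema: dict) -> str:
    """Return the Avro fullname: a dotted name stands alone, otherwise prepend the namespace."""
    name = schema["name"]
    namespace = schema.get("namespace")
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def select_schema(schema_data: str, record_name: str) -> dict:
    """Pick the one schema whose fullname matches the MCAP schema record `name`.

    Hypothetical helper: `schema_data` is the schema record's `data` decoded as a
    JSON string, holding either a single schema object or an array of them.
    """
    parsed = json.loads(schema_data)
    candidates = parsed if isinstance(parsed, list) else [parsed]
    matches = [
        s for s in candidates
        if isinstance(s, dict) and "name" in s and avro_fullname(s) == record_name
    ]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one schema named {record_name!r}, found {len(matches)}")
    return matches[0]


# Illustrative schemas: Line references Point2d by name instead of redefining it.
SCHEMA_DATA = json.dumps([
    {
        "type": "record",
        "name": "Point2d",
        "namespace": "example",
        "fields": [{"name": "x", "type": "double"}, {"name": "y", "type": "double"}],
    },
    {
        "type": "record",
        "name": "Line",
        "namespace": "example",
        "fields": [
            {"name": "start", "type": "example.Point2d"},
            {"name": "end", "type": "example.Point2d"},
        ],
    },
])

# The array is not treated as a union: the record name selects one entry.
line_schema = select_schema(SCHEMA_DATA, "example.Line")
```

Under this reading, the other entries in the array exist only so that the selected schema can resolve its named references.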
### avro

- `name`: Fully qualified name of the record type (including namespace), e.g. `example.MyRecord`
- `encoding`: `avro`
Consider `avro1`, in case avro ever releases a version 2?
Co-authored-by: james-rms <[email protected]>
I think we should support all valid avro values in an MCAP message unless we have a really good reason not to.
I propose for names:
My reason for pushing back on this is that MCAP channels are logical channels representing a stream of data, and requiring writers to separate their unions into separate channels may remove some information they'd otherwise want to keep. My go-to example here is: messages in one channel in a recording are generally understood to have been sent in that order and arrive in that order. Messages in separate channels arrived at the recorder in log time order, but may have arrived at other consumers in a different order, and may have been sent in a different order. I also think it's powerful to be able to project any field of a message into a new MCAP data stream. If only certain Avro types can be toplevel message types, that rules that out for Avro encoding at least.
Closing this for now. Good feedback and learnings but the primary customer for this is ok with their current setup and does not need anything urgent here. Until we get an avro user to review and provide feedback I am more comfortable shelving this.
Add avro as a well-known schema encoding and message encoding. When using the "avro" message encoding, mcap writers indicate the messages on a channel are encoded using the avro binary serialization format. When using "avro" schema encoding, the schema format is a valid avro schema declaration as a JSON string.
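As a rough illustration of what this pairing means in practice, the Python sketch below builds the schema and channel fields as plain dictionaries and hand-encodes one message in the Avro binary format. Everything here is an assumption made for illustration: the dictionaries are not the API of any MCAP library, the `example.MyRecord` fields and the `/example` topic are invented, and a real writer would normally use an Avro library rather than these hand-rolled encoders.

```python
import json
import struct


def encode_long(value: int) -> bytes:
    """Avro int/long: zigzag-encode, then emit 7-bit groups, least significant first."""
    zigzag = (value << 1) ^ (value >> 63)
    out = bytearray()
    while zigzag > 0x7F:
        out.append((zigzag & 0x7F) | 0x80)  # set the continuation bit
        zigzag >>= 7
    out.append(zigzag)
    return bytes(out)


def encode_string(value: str) -> bytes:
    """Avro string: byte length as a long, followed by the UTF-8 bytes."""
    raw = value.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_double(value: float) -> bytes:
    """Avro double: 8 bytes, little-endian IEEE 754."""
    return struct.pack("<d", value)


# Hypothetical Avro schema, stored in the schema record as a JSON string.
AVRO_SCHEMA = {
    "type": "record",
    "name": "MyRecord",
    "namespace": "example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "label", "type": "string"},
        {"name": "x", "type": "double"},
    ],
}

# Plain dictionaries standing in for MCAP schema and channel records.
schema_record = {
    "name": "example.MyRecord",  # fully qualified record name
    "encoding": "avro",
    "data": json.dumps(AVRO_SCHEMA).encode("utf-8"),
}
channel = {"topic": "/example", "message_encoding": "avro"}


def encode_my_record(record_id: int, label: str, x: float) -> bytes:
    """An Avro record is the concatenation of its fields in schema order."""
    return encode_long(record_id) + encode_string(label) + encode_double(x)


payload = encode_my_record(1, "foo", 1.5)
```

The payload carries no field names or type tags, which is why a reader needs the schema record to decode messages on the channel.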
A few notes/discussion points:
This change also adds a rust example that creates an mcap file with avro.
Related studio PR: https://github.com/foxglove/studio/pull/7008