This documentation contains some mixed advanced topics for Protozero users. Read the tutorial first if you are new to Protozero.
- A protobuf message has to fit into memory completely, otherwise it cannot be parsed with this library. There is no streaming support.
- The length of a string, bytes, or submessage cannot exceed 2^31-1 bytes.
- There is no specific support for maps but they can be used as described in the "Backwards compatibility" section of https://developers.google.com/protocol-buffers/docs/proto3#maps.
If protozero/version.hpp is included, the following macros are set:
Macro | Example | Description |
---|---|---|
PROTOZERO_VERSION_MAJOR | 1 | Major version number |
PROTOZERO_VERSION_MINOR | 3 | Minor version number |
PROTOZERO_VERSION_PATCH | 2 | Patch number |
PROTOZERO_VERSION_CODE | 10302 | Version (major * 10,000 + minor * 100 + patch) |
PROTOZERO_VERSION_STRING | "1.3.2" | Version string |
The behaviour of Protozero can be changed by defining the following macros. They have to be set before including any of the Protozero headers.
If PROTOZERO_STRICT_API is set, you will get extra warnings or errors during compilation if you are using an old (deprecated) interface to Protozero. Enable this if you want to make sure your code will work with future versions of Protozero.
Protozero uses the class protozero::data_view as the return type of the pbf_reader::get_view() method, and a few other functions take a protozero::data_view as a parameter.
If PROTOZERO_USE_VIEW is unset, protozero::data_view is Protozero's own implementation of a string view class.
Set this macro if you want to use a different implementation such as the C++17 std::string_view class. In this case protozero::data_view will simply be an alias to the class you specify.
#define PROTOZERO_USE_VIEW std::string_view
The Google Protobuf spec documents that a non-repeated field can actually appear several times in a message and that the implementation is required to return the value of the last occurrence of that field in this case. pbf_reader.hpp does not enforce this. If you need this behaviour, you have to implement it yourself.
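The last-value-wins rule amounts to overwriting the stored value every time the field is seen. The following is a minimal sketch that does not use Protozero itself: the DecodedField struct and the last_value() helper are hypothetical names, and the vector merely stands in for the sequence of fields a read loop would yield in message order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one decoded (tag, value) pair, stored in the
// order in which the fields appear in the message.
struct DecodedField {
    std::uint32_t tag;
    std::int64_t value;
};

// Stores the value of the LAST occurrence of `tag` in `result` and returns
// true. Returns false and leaves `result` untouched if the tag is absent.
bool last_value(const std::vector<DecodedField>& fields,
                std::uint32_t tag,
                std::int64_t& result) {
    bool found = false;
    for (const DecodedField& field : fields) {
        if (field.tag == tag) {
            result = field.value; // later occurrences overwrite earlier ones
            found = true;
        }
    }
    return found;
}
```

In a real read loop the same effect is usually achieved by assigning to the same variable in the branch that handles the tag, and by not stopping at the first match.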
The spec also says that you must be able to read a packed repeated field where a not-packed repeated field is expected and vice versa. Also, there can be several (packed or not-packed) repeated fields with the same tag, and their contents must be concatenated. Protozero doesn't do this for you; it is your responsibility.
The tag_and_type() free function and the method of the same name on the pbf_reader and pbf_message classes can be used to access both packed and unpacked repeated fields. (They can also be used to check that you have the right type of encoding for other fields.)
Here is the outline:
enum class ExampleMsg : protozero::pbf_tag_type {
    repeated_uint32_x = 1
};
std::string data = ...
pbf_message<ExampleMsg> message{data};
while (message.next()) {
    switch (message.tag_and_type()) {
        case tag_and_type(ExampleMsg::repeated_uint32_x, pbf_wire_type::length_delimited): {
                auto xit = message.get_packed_uint32();
                ... // handle the repeated field when it is packed
            }
            break;
        case tag_and_type(ExampleMsg::repeated_uint32_x, pbf_wire_type::varint): {
                auto x = message.get_uint32();
                ... // handle the repeated field when it is not packed
            }
            break;
        default:
            message.skip();
    }
}
All this works on pbf_reader in the same way as with pbf_message, with the usual difference that pbf_reader takes a numeric field tag and pbf_message an enum field.
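To make the concatenation rule concrete, here is a standalone sketch that works directly on the wire format and does not use Protozero. The helpers read_varint() and collect_repeated_uint32() are hypothetical names introduced for illustration. The sketch collects all values of one repeated uint32 field, whether they arrive packed (wire type 2) or not packed (wire type 0), and appends them in message order. It only understands these two wire types.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Reads one base-128 varint starting at `pos` and advances `pos` past it.
std::uint64_t read_varint(const std::string& buf, std::size_t& pos) {
    std::uint64_t value = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (pos >= buf.size()) {
            throw std::runtime_error{"truncated varint"};
        }
        const auto byte = static_cast<unsigned char>(buf[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7fU) << shift;
        if ((byte & 0x80U) == 0) {
            return value; // high bit clear: this was the last byte
        }
    }
    throw std::runtime_error{"varint too long"};
}

// Collects every value of the repeated uint32 field `wanted_tag` in order,
// concatenating packed and not-packed occurrences.
std::vector<std::uint32_t> collect_repeated_uint32(const std::string& buf,
                                                   std::uint32_t wanted_tag) {
    std::vector<std::uint32_t> values;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const std::uint64_t key = read_varint(buf, pos);
        const auto tag = static_cast<std::uint32_t>(key >> 3);
        const auto wire_type = static_cast<unsigned>(key & 0x7U);
        if (wire_type == 0) { // varint: a single, not-packed value
            const std::uint64_t value = read_varint(buf, pos);
            if (tag == wanted_tag) {
                values.push_back(static_cast<std::uint32_t>(value));
            }
        } else if (wire_type == 2) { // length-delimited: packed values
            const std::uint64_t length = read_varint(buf, pos);
            if (length > buf.size() - pos) {
                throw std::runtime_error{"truncated length-delimited field"};
            }
            const std::size_t end = pos + static_cast<std::size_t>(length);
            if (tag == wanted_tag) {
                while (pos < end) {
                    values.push_back(static_cast<std::uint32_t>(read_varint(buf, pos)));
                }
                if (pos != end) {
                    throw std::runtime_error{"packed value overruns its field"};
                }
            }
            pos = end; // also skips length-delimited fields with other tags
        } else {
            throw std::runtime_error{"wire type not handled in this sketch"};
        }
    }
    return values;
}
```

With Protozero the two branches correspond to the two case labels in the outline above; the part you have to add yourself is appending both kinds of occurrence to the same container.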
If you only want to check for one specific tag and type, you can use the two-argument version of pbf_reader::next(). In this case 17 is the field tag we are looking for:
std::string data = ...
pbf_reader message{data};
while (message.next(17, pbf_wire_type::varint)) {
    auto foo = message.get_int32();
    ...
}
See the test under test/t/tag_and_type/ for a complete example.
If you know beforehand how large a message will become or can take an educated guess, you can call the usual std::string::reserve() on the underlying string before you give it to a pbf_writer or pbf_builder object.
Or you can (at any time) call reserve() on the pbf_writer or pbf_builder. This will reserve the given number of bytes in addition to whatever is already in that message. (Note that this behaviour is different from what reserve() does on std::string or std::vector.)
In the general case it is not easy to figure out how much memory you will need because of the varint packing of integers. But sometimes you can make at least a rough estimate. Still, you should probably only use this facility if you have benchmarks proving that it actually makes your program faster.
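As a rough illustration of such an estimate, the following standalone sketch computes how many bytes a value needs as a varint and how large one length-delimited field becomes on the wire. It also shows the "in addition to what is already there" reserve semantics on a plain std::string. None of this is Protozero code: varint_size(), length_delimited_size(), and reserve_additional() are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Number of bytes a value occupies as a base-128 varint (between 1 and 10).
std::size_t varint_size(std::uint64_t value) {
    std::size_t n = 1;
    while (value >= 0x80U) {
        value >>= 7;
        ++n;
    }
    return n;
}

// Encoded size of one length-delimited field: key varint (tag and wire
// type 2), length varint, then the payload itself.
std::size_t length_delimited_size(std::uint32_t tag, std::size_t payload_size) {
    const std::uint64_t key = (static_cast<std::uint64_t>(tag) << 3) | 2U;
    return varint_size(key) + varint_size(payload_size) + payload_size;
}

// Grows the capacity by `additional` bytes on top of the current size,
// which is the semantics described above for reserve() on the writer.
void reserve_additional(std::string& buffer, std::size_t additional) {
    buffer.reserve(buffer.size() + additional);
}
```

For example, a 300-byte string in field 1 takes 1 + 2 + 300 = 303 bytes. For integer fields the size depends on the actual values, which is why an exact figure is rarely available up front.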
Protozero gives you access to the low-level functions for encoding and decoding varints and zigzag-encoded integers, because these functions can sometimes be useful outside the Protocol Buffer context.
To use the low-level functions, add this include to your C++ program:
#include <protozero/varint.hpp>
The following functions are then available:
- decode_varint()
- write_varint()
- encode_zigzag32()
- encode_zigzag64()
- decode_zigzag32()
- decode_zigzag64()
See the reference documentation created by make doc for details.
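For readers unfamiliar with the two encodings, here is a standalone sketch of what they do. It is not Protozero's implementation, and the sketch_ prefixed names and their signatures are assumptions made for illustration; the signatures of the real functions are in the reference documentation.

```cpp
#include <cstdint>
#include <string>

// ZigZag maps signed integers to unsigned ones so that values of small
// magnitude get small codes: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
std::uint32_t sketch_encode_zigzag32(std::int32_t value) {
    const auto u = static_cast<std::uint32_t>(value);
    // (u >> 31) is 1 for negative input, so (0 - 1) yields an all-ones mask.
    return (u << 1U) ^ (0U - (u >> 31U));
}

// Inverse of sketch_encode_zigzag32().
std::int32_t sketch_decode_zigzag32(std::uint32_t value) {
    return static_cast<std::int32_t>((value >> 1U) ^ (0U - (value & 1U)));
}

// Appends `value` as a base-128 varint: seven payload bits per byte, least
// significant group first, high bit set on every byte except the last.
// Returns the number of bytes written.
int sketch_write_varint(std::string& out, std::uint64_t value) {
    int n = 1;
    while (value >= 0x80U) {
        out.push_back(static_cast<char>((value & 0x7fU) | 0x80U));
        value >>= 7U;
        ++n;
    }
    out.push_back(static_cast<char>(value));
    return n;
}
```

Signed protobuf types such as sint32 combine the two steps: the value is zigzag-encoded first and the result is then written as a varint, so that small negative numbers stay short.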
Length-delimited fields (like string fields, byte fields and messages) are usually set by calling add_string(), add_message(), etc. These functions have several forms, but they basically all take a tag, a size, and a pointer to the data. They write the length of the data into the message and then copy the data over.
Sometimes the data is not in one place but spread over several buffers. In this case you have to consolidate those buffers first, which needs an extra copy. Say you have two very long strings that should be concatenated into a message:
std::string a{"very long string..."};
std::string b{"another very long string..."};
std::string data;
protozero::pbf_writer writer{data};
a.append(b); // expensive extra copy
writer.add_string(1, a);
To avoid this, the function add_bytes_vectored() can be used, which allows vectored (or scatter/gather) input like this:
std::string a{"very long string..."};
std::string b{"another very long string..."};
std::string data;
protozero::pbf_writer writer{data};
writer.add_bytes_vectored(1, a, b);
add_bytes_vectored() will add up the sizes of all its arguments and copy over all the data only once.
The function takes any number of arguments. The arguments must be of a type supporting the data() and size() methods, like protozero::data_view, std::string or the C++17 std::string_view.
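The gather step can be pictured with the following standalone sketch. It is not the Protozero implementation: append_vectored() is a hypothetical helper that only appends the payload to a std::string, whereas the real function also writes the tag and the total length. It does show the two properties described above: the sizes are summed first, and each part is copied exactly once.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>

// Appends all `parts` to `out` with one capacity adjustment and one copy per
// part. Each part only needs data() and size() member functions. Returns the
// total number of bytes appended.
template <typename... Parts>
std::size_t append_vectored(std::string& out, const Parts&... parts) {
    std::size_t total = 0;
    // Expanding the pack inside a braced-init-list evaluates left to right
    // and works in C++11; a C++17 fold expression would be shorter.
    (void)std::initializer_list<int>{0, (total += parts.size(), 0)...};
    out.reserve(out.size() + total);
    (void)std::initializer_list<int>{0, (out.append(parts.data(), parts.size()), 0)...};
    return total;
}
```

Because the total is known before anything is copied, a length prefix can be written up front and the destination grows at most once, which is what makes the intermediate concatenation unnecessary.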
Note that there is only one version of the function which can be used for any length-delimited field including strings, bytes, messages and repeated packed fields.
The function is also available in the pbf_builder class.