The RFC covers the ability to implicitly namespace Vector configuration based on the configuration directory structure. This provides an easy mechanism for organizing large Vector configuration, something that will become more pronounced as Vector introduces the upcoming Pipelines feature.
None
- Implicit namespacing based on Vector's configuration directory structure
- How and where configuration is loaded and namespaced
- How components configuration should work
- How components should work
- Advanced configuration templating tactics (explicitly including files, etc)
- Multiple configuration directories
As Vector evolves and introduces configuration-heavy functionality, like the aggregator role, and the upcoming Pipelines feature, the amount of configuration necessary to program Vector grows large. The ability to organize Vector across multiple files is non-obvious and includes a heavy amount of boilerplate, making the configuration difficult for collaboration and navigation.
To solve for above, we'd like to introduce implicit configuration namespacing based on Vector's configuration directory structure. This aligns the community behind an opinionated method for organizing Vector's configuration, making it easy for users to split up their configuration files and collaborate with others on their team.
- When loading Vector's configuration using
--config-dir
(let say--config-dir /etc/vector
), it will look in every subfolder for any component configuration file, with their filenames being their component ID.
# /etc/vector/vector.toml
[sinks.foo]
type = "anything"
can become
# /etc/vector/sinks/foo.toml
type = "anything"
- Vector will only consider the files with the
yml
,yaml
,json
, ortoml
extensions of throw an error. - Any duplicate component ID (like
sinks/foo.toml
andsinks/foo.json
) will error. - If Vector's configuration is not loaded using a specific configuration folder (
--config-dir /etc/vector
for example), Vector will keep its default behavior and only load the specified configuration file. - If Vector encounters a hidden file or a hidden folder (name starting with a
.
, like/etc/vector/.data
or/etc/vector/.foo.toml
), the file/folder will be ignored. - If Vector encounters a folder (like
/etc/vector/foo
) with a name that doesn't refer to a component type (likesources
,transforms
,sinks
,enrichment_tables
,tests
), an error will be thrown. - If a component file (like
/etc/vector/sinks/foo.toml
) doesn't have a proper sink configuration structure, Vector will error.
- When loading the configuration from a directory, instead of only considering the files in that directory, Vector will consider the subfolders, load each file in those folders and merge the components with the current configuration.
fn load_builder_from_dir(path: &Path) -> Result<(ConfigBuilder, Vec<String>), Vec<String>> {
let mut builder = ConfigBuilder::default();
let mut errors = Vec::new();
for child in path.children() {
if child.is_dir() {
match child.name() {
// same with other component types like transforms, sources, tests, enrichment_tables
"sinks" => load_sinks_from_dir(child, &mut builder, &mut errors),
other => tracing::debug!("ignoring folder {}", other),
}
} else {
load_builder_from_file(child, &mut builder, &mut errors);
}
}
}
// same with transforms, sources, tests, enrichment_tables
fn load_sinks_from_dir(path: &Path, builder: &mut ConfigBuilder, errors: &mut Vec<String>) {
for child in path.children() {
if child.is_file() {
match load_sink_from_file(child) {
Ok(sink) => builder.add_sink(child.name(), sink.inputs, sink.inner),
Err(msg) => errors.push(msg),
};
}
}
}
Why is this change worth it?
- Having this kind of feature for the new pipeline 2.0 will make it easier to load the transforms and the inner pipelines. It will also make it much nicer to use across multiple teams.
- This would allow to an admin to allow users to write the files in the
transforms
folder and not read the files insinks
folder.
How does this position us for success in the future?
- Pipeline 2.0 will rely on splitting the configuration file into compound transforms. Splitting the configuration that way will allow to have a dedicated folder or file for the definition of those transforms.
This feature is a nice-to-have for users with large configuration files, but it is not required for any upcoming development. The new Pipelines feature can still function with a single configuration file, but it exacerbates the problem stated in the pain.
Very little. This will have minimal impact to the Vector codebase. Configuration will be loaded and built in one step before configuration is validated. I don't foresee this introduces any meaningful maintenance burden on the team.
Incremental steps to execute this change. These will be converted to issues after the RFC is approved:
- Update the loading process to load components to implement the strategy design pattern.
- Load
transforms
from subfolder - Load
sinks
from subfolder - Load
sources
from subfolder - Load
enrichment_tables
from subfolder - Load
tests
from subfolder
Note: This can be filled out during the review process.
- Add the Pipeline 2.0 feature