
Add Changelog and Migration documents #47

Merged · merged 3 commits · Jan 8, 2025
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,40 @@
# Kafka Sink Connector Changelog

## Version 3.0.0


### Highlights

- Updated several component and dependency versions
- The Kafka Sink Connector now requires Apache Kafka 3.8.0 and Confluent Platform 7.8.0
- Fixes and improvements to synchronization of the sink connector
- Overall code restructure

### Breaking

- The Kafka Sink Connector now requires Apache Kafka 3.8.0 and Confluent Platform 7.8.0

### New

- Added a new runtime configuration property, `graphdb.sink.poll.backoff.timeout.ms`, for backing off record ingestion in case of downstream congestion
  - The property has a default value of `10000` ms
  - If the connector fails to flush records downstream, it can retry flushing later; the backoff timeout is the amount of time it waits before retrying
- A new record processor instance is started for each configured (unique) connector. Multiple tasks of a single connector share the same record processor, optimizing resource usage (a sketch of this pattern follows this list)
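
The per-connector sharing can be pictured with a short sketch. This is an illustrative pattern only, not the connector's actual implementation; all class and method names below are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one record processor per connector name, shared by all
// of that connector's tasks. The names here are illustrative, not the real classes.
public final class RecordProcessorRegistry {

    private static final Map<String, RecordProcessor> PROCESSORS = new ConcurrentHashMap<>();

    private RecordProcessorRegistry() {
    }

    // Each task calls this on startup; tasks of the same connector receive the
    // same instance, so per-connector resources are allocated only once.
    public static RecordProcessor forConnector(String connectorName) {
        return PROCESSORS.computeIfAbsent(connectorName, RecordProcessor::new);
    }

    static final class RecordProcessor {
        RecordProcessor(String connectorName) {
            // Allocate downstream connections, buffers, etc. for this connector.
        }
    }
}
```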

### Updated

- Updated internal component dependencies to address detected vulnerabilities

### Fixed
- Fixed a synchronization issue in which two distinct tasks would flush records to a single repository downstream
- Fixed a synchronization issue that prevented graceful shutdown of the connector and could leave repository connections open
- Fixed core logic issues that could result in data loss

### Improvements

- Improved error handling
- Minimized record loss on failure
- Added settings in `docker-compose.yml` for enabling remote debugging on connector instances (for development and troubleshooting purposes).
  These settings are disabled by default; a sketch of what such a setting can look like follows this list
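
The changelog does not show the exact compose entry, but remote debugging on a JVM-based connector is conventionally enabled through a JDWP agent flag passed via `KAFKA_OPTS`. A minimal sketch of what such an entry could look like (the real `docker-compose.yml` may differ):

```yaml
# Hypothetical excerpt; service name, image, and port are placeholders.
services:
  kafka-connect:
    image: kafka-sink-graphdb
    environment:
      # Standard JDWP agent flag; suspend=n lets the connector start without
      # waiting for a debugger. Disabled (commented out) by default in the repo.
      KAFKA_OPTS: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005"
    ports:
      - "5005:5005"
```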
16 changes: 8 additions & 8 deletions README.md
@@ -4,17 +4,17 @@

Kafka Sink Connector for RDF update streaming to GraphDB

The current version of Apache Kafka in the project is 3.8.x and the Kafka worker is 7.8.x. For the compatibility matrix,
please refer to the official Confluent [documentation](https://docs.confluent.io/platform/current/installation/versions-interoperability.html#cp-and-apache-ak-compatibility).

This means you can leverage the latest enhancements in both Apache Kafka and Confluent Platform to improve the
performance, scalability, and features of your streaming applications.

## Upgrading Kafka Version

When upgrading your Apache Kafka installation, it's crucial to ensure compatibility between Confluent and Apache
Kafka versions. Follow the guidelines provided by the Confluent documentation to guarantee a smooth upgrade process.
Ensure that you refer to the compatibility matrix to find the suitable Confluent Platform version for your desired Apache Kafka version. The matrix can be found at the following link:

[Confluent Platform and Apache Kafka Compatibility Matrix](https://docs.confluent.io/platform/current/installation/versions-interoperability.html#cp-and-apache-ak-compatibility)
@@ -24,18 +24,18 @@ Follow these steps to upgrade your Kafka installation:
1. Review the compatibility matrix to identify the Confluent Platform version that matches your target Apache Kafka version.
2. Follow the upgrade instructions provided by Confluent for the chosen Confluent Platform version.

It's recommended to perform thorough testing in a staging environment before applying the upgrade to your production
environment. For detailed instructions and additional information, consult the official Confluent documentation.

# Docker & Docker Compose

A [Dockerfile](./Dockerfile) is available for building the sink connector as a Docker image. It is a multistage
Dockerfile which builds, tests, and, in the final stage, copies the connector onto the `plugin.path`.

The image is based on the Confluent Kafka Connect [confluentinc/cp-kafka-connect](https://hub.docker.com/r/confluentinc/cp-kafka-connect) image. To build the image, navigate to the project root directory and execute `docker build -t kafka-sink-graphdb .`
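
As a rough illustration of the multistage pattern described above (not the repository's actual Dockerfile; the stage layout, paths, and Maven base image are assumptions):

```dockerfile
# Hypothetical sketch of the multistage build; the real Dockerfile may differ.
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY . .
# Build and test; only the packaged connector reaches the final image.
RUN mvn package

FROM confluentinc/cp-kafka-connect:7.8.0
# Copy the connector onto the worker's plugin.path so Connect can discover it.
COPY --from=build /build/target/*.jar /usr/share/java/kafka-sink-graphdb/
```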

Inside the [docker-compose](./docker-compose) directory there is an example compose file that sets everything up -
ZooKeeper, Kafka, GraphDB and Kafka Connect. In the same directory the [run.sh](./docker-compose/run.sh) script can be used to quickly test the sink connector.

The script will do the following:
18 changes: 18 additions & 0 deletions UPGRADE.md
@@ -0,0 +1,18 @@
# Kafka Sink Connector Upgrade Guide

## From 1.x and 2.x to 3.0

### Configurations

Version 3.0 of the Kafka Sink Connector introduces a new configuration property used in error scenarios: `graphdb.sink.poll.backoff.timeout.ms`.
This property slows down record ingestion during intermittent downstream congestion, i.e. when a data flush has failed and
will be retried at a later point in time. The default value is `10000` ms and can usually be kept as is. For example:
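
```properties
# Hypothetical connector configuration excerpt; the connector class name is an
# assumption, and only the backoff key and its default come from this guide.
name=kafka-sink-graphdb
connector.class=com.ontotext.kafka.GraphDBSinkConnector
# New in 3.0: wait 10 s before retrying a failed downstream flush.
graphdb.sink.poll.backoff.timeout.ms=10000
```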

### Data Migration

Because the connector is stateless, no data is kept on disk where the connector is running. Upgrading the connector therefore only requires
a restart of the services and re-configuration of the connector instances, as shown in the example below.
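
A minimal sketch using the standard Kafka Connect REST endpoints (the connector name, worker URL, and config file are placeholders):

```sh
# Push the (possibly updated) configuration for an existing connector instance.
curl -X PUT -H "Content-Type: application/json" \
  --data @connector-config.json \
  http://localhost:8083/connectors/kafka-sink-graphdb/config

# Restart the connector after the upgrade.
curl -X POST http://localhost:8083/connectors/kafka-sink-graphdb/restart
```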

### Use of Kafka producer

The existing producer runtime, which eased the creation of Kafka records and their subsequent ingestion, has been removed.
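
The guide does not prescribe a replacement, but applications that relied on the removed runtime can produce records with a plain Kafka producer instead. A minimal sketch, assuming string keys/values and a placeholder topic name:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RdfUpdateProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The topic name is a placeholder; use the topic your sink connector consumes.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String turtle = "<urn:s> <urn:p> <urn:o> .";
            producer.send(new ProducerRecord<>("rdf-updates", null, turtle));
        }
    }
}
```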
2 changes: 1 addition & 1 deletion src/main/java/com/ontotext/kafka/GraphDBSinkConfig.java
@@ -103,7 +103,7 @@ public enum TransactionType {

public static final String POLL_BACKOFF_TIMEOUT = "graphdb.sink.poll.backoff.timeout.ms";
public static final long DEFAULT_POLL_BACKOFF_TIMEOUT = 10000;
public static final String POLL_BACKOFF_TIMEOUT_DOC = "The timeout applied per batch that is not full before it is committed";
public static final String POLL_BACKOFF_TIMEOUT_DOC = "Backoff time (in ms) which forces the task to pause ingestion in case of retriable exceptions downstream";

public static final String TEMPLATE_ID = "graphdb.template.id";
public static final String TEMPLATE_ID_DOC = "The id (IRI) of the GraphDB Template to be used in SPARQL Update";
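
For context, a Connect property like the backoff timeout is typically registered in the connector's `ConfigDef`. The registration itself is not shown in this diff; the following is a self-contained sketch in which the importance level is an assumption:

```java
import org.apache.kafka.common.config.ConfigDef;

public class BackoffConfigSketch {
    // Values mirror the constants shown in the diff above.
    static final String POLL_BACKOFF_TIMEOUT = "graphdb.sink.poll.backoff.timeout.ms";
    static final long DEFAULT_POLL_BACKOFF_TIMEOUT = 10000;
    static final String POLL_BACKOFF_TIMEOUT_DOC =
        "Backoff time (in ms) which forces the task to pause ingestion in case of retriable exceptions downstream";

    // Hypothetical registration; the importance level is an assumption.
    static final ConfigDef CONFIG = new ConfigDef()
        .define(POLL_BACKOFF_TIMEOUT, ConfigDef.Type.LONG, DEFAULT_POLL_BACKOFF_TIMEOUT,
                ConfigDef.Importance.MEDIUM, POLL_BACKOFF_TIMEOUT_DOC);
}
```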