
Add Changelog and Migration documents #47

Merged · merged 3 commits · Jan 8, 2025
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,40 @@
# Kafka Sink Connector Changelog

## Version 3.0.0


### Highlights

- Updated several component and dependency versions
- The Kafka Sink Connector now requires Apache Kafka 3.8.0 and Confluent Platform 7.8.0
- Fixes and improvements to synchronization of the sink connector
- Overall code restructure

### Breaking

- The Kafka Sink Connector now requires Apache Kafka 3.8.0 and Confluent Platform 7.8.0

### New

- Added a new runtime configuration property, `graphdb.sink.poll.backoff.timeout.ms`, for backing off record ingestion in case of downstream congestion
  - The property has a default value of `10000` ms
  - If the connector fails to flush records downstream, it can retry flushing later; the backoff timeout is the amount of time it waits before retrying
- A new record processor instance is started for each configured (unique) connector. Multiple tasks of a single connector share the same record processor, optimizing resource usage (a sketch of this pattern follows this list)
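
The per-connector sharing can be pictured with a short sketch. This is an illustrative pattern only, not the connector's actual implementation; all class and method names below are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one record processor per connector name, shared by all
// of that connector's tasks. The names here are illustrative, not the real classes.
public final class RecordProcessorRegistry {

    private static final Map<String, RecordProcessor> PROCESSORS = new ConcurrentHashMap<>();

    private RecordProcessorRegistry() {
    }

    // Each task calls this on startup; tasks of the same connector receive the
    // same instance, so per-connector resources are allocated only once.
    public static RecordProcessor forConnector(String connectorName) {
        return PROCESSORS.computeIfAbsent(connectorName, RecordProcessor::new);
    }

    static final class RecordProcessor {
        RecordProcessor(String connectorName) {
            // Allocate downstream connections, buffers, etc. for this connector.
        }
    }
}
```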

### Updated

- Updated internal component dependencies to address detected vulnerabilities

### Fixed
- Fixed a synchronization issue in which two distinct tasks would flush records to a single repository downstream
- Fixed a synchronization issue that prevented graceful shutdown of the connector and could leave repository connections open
- Fixed core logic issues that could result in data loss

### Improvements

- Improved error handling
- Minimized record loss on failure
- Added settings in `docker-compose.yml` for enabling remote debugging on connector instances (for development and troubleshooting purposes).
  These settings are disabled by default; a sketch of what such a setting can look like follows this list
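
The changelog does not show the exact compose entry, but remote debugging on a JVM-based connector is conventionally enabled through a JDWP agent flag passed via `KAFKA_OPTS`. A minimal sketch of what such an entry could look like (the real `docker-compose.yml` may differ):

```yaml
# Hypothetical excerpt; service name, image, and port are placeholders.
services:
  kafka-connect:
    image: kafka-sink-graphdb
    environment:
      # Standard JDWP agent flag; suspend=n lets the connector start without
      # waiting for a debugger. Disabled (commented out) by default in the repo.
      KAFKA_OPTS: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005"
    ports:
      - "5005:5005"
```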
16 changes: 8 additions & 8 deletions README.md
@@ -4,17 +4,17 @@

Kafka Sink Connector for RDF update streaming to GraphDB

The current version of Apache Kafka in the project is 3.8.x and the Kafka worker is 7.8.x. For the compatibility matrix,
please refer to the official Confluent [documentation](https://docs.confluent.io/platform/current/installation/versions-interoperability.html#cp-and-apache-ak-compatibility).

This means you can leverage the latest enhancements in both Apache Kafka and Confluent Platform to improve the
performance, scalability, and features of your streaming applications.

## Upgrading Kafka Version

When upgrading your Apache Kafka installation, it's crucial to ensure compatibility between Confluent and Apache
Kafka versions. Follow the guidelines provided by the Confluent documentation to guarantee a smooth upgrade process.
Ensure that you refer to the compatibility matrix to find the suitable Confluent Platform version for your desired Apache Kafka version. The matrix can be found at the following link:

[Confluent Platform and Apache Kafka Compatibility Matrix](https://docs.confluent.io/platform/current/installation/versions-interoperability.html#cp-and-apache-ak-compatibility)
@@ -24,18 +24,18 @@ Follow these steps to upgrade your Kafka installation:
1. Review the compatibility matrix to identify the Confluent Platform version that matches your target Apache Kafka version.
2. Follow the upgrade instructions provided by Confluent for the chosen Confluent Platform version.

It's recommended to perform thorough testing in a staging environment before applying the upgrade to your production
environment. For detailed instructions and additional information, consult the official Confluent documentation.

# Docker & Docker Compose

A [Dockerfile](./Dockerfile) is available for building the sink connector as a Docker image. It is a multistage
Dockerfile which builds, tests, and, in the final stage, copies the connector onto the `plugin.path`.

The image is based on the Confluent Kafka Connect [confluentinc/cp-kafka-connect](https://hub.docker.com/r/confluentinc/cp-kafka-connect) image. To build the image, navigate to the project root directory and execute `docker build -t kafka-sink-graphdb .`
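
As a rough illustration of the multistage pattern described above (not the repository's actual Dockerfile; the stage layout, paths, and Maven base image are assumptions):

```dockerfile
# Hypothetical sketch of the multistage build; the real Dockerfile may differ.
FROM maven:3.9-eclipse-temurin-17 AS build
WORKDIR /build
COPY . .
# Build and test; only the packaged connector reaches the final image.
RUN mvn package

FROM confluentinc/cp-kafka-connect:7.8.0
# Copy the connector onto the worker's plugin.path so Connect can discover it.
COPY --from=build /build/target/*.jar /usr/share/java/kafka-sink-graphdb/
```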

Inside the [docker-compose](./docker-compose) directory there is an example compose file that sets everything up -
ZooKeeper, Kafka, GraphDB and Kafka Connect. In the same directory the [run.sh](./docker-compose/run.sh) script can be used to quickly test the sink connector.

The script will do the following:
18 changes: 18 additions & 0 deletions UPGRADE.md
@@ -0,0 +1,18 @@
# Kafka Sink Connector Upgrade Guide

## From 1.x and 2.x to 3.0

### Configurations

Version 3.0 of the Kafka Sink Connector introduces a new configuration property used in error scenarios: `graphdb.sink.poll.backoff.timeout.ms`.
This property slows down record ingestion during intermittent downstream congestion, i.e. when a data flush has failed and
will be retried at a later point in time. The default value is `10000` ms and can usually be kept as is. For example:
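
```properties
# Hypothetical connector configuration excerpt; the connector class name is an
# assumption, and only the backoff key and its default come from this guide.
name=kafka-sink-graphdb
connector.class=com.ontotext.kafka.GraphDBSinkConnector
# New in 3.0: wait 10 s before retrying a failed downstream flush.
graphdb.sink.poll.backoff.timeout.ms=10000
```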

### Data Migration

Because the connector is stateless, no data is kept on disk where the connector is running. Upgrading the connector therefore only requires
a restart of the services and re-configuration of the connector instances, as shown in the example below.
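
A minimal sketch using the standard Kafka Connect REST endpoints (the connector name, worker URL, and config file are placeholders):

```sh
# Push the (possibly updated) configuration for an existing connector instance.
curl -X PUT -H "Content-Type: application/json" \
  --data @connector-config.json \
  http://localhost:8083/connectors/kafka-sink-graphdb/config

# Restart the connector after the upgrade.
curl -X POST http://localhost:8083/connectors/kafka-sink-graphdb/restart
```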

### Use of Kafka producer

The existing producer runtime, which eased the creation of Kafka records and their subsequent ingestion, has been removed.
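
The guide does not prescribe a replacement, but applications that relied on the removed runtime can produce records with a plain Kafka producer instead. A minimal sketch, assuming string keys/values and a placeholder topic name:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RdfUpdateProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The topic name is a placeholder; use the topic your sink connector consumes.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String turtle = "<urn:s> <urn:p> <urn:o> .";
            producer.send(new ProducerRecord<>("rdf-updates", null, turtle));
        }
    }
}
```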
2 changes: 1 addition & 1 deletion src/main/java/com/ontotext/kafka/GraphDBSinkConfig.java
@@ -103,7 +103,7 @@ public enum TransactionType {

public static final String POLL_BACKOFF_TIMEOUT = "graphdb.sink.poll.backoff.timeout.ms";
public static final long DEFAULT_POLL_BACKOFF_TIMEOUT = 10000;
public static final String POLL_BACKOFF_TIMEOUT_DOC = "The timeout applied per batch that is not full before it is committed";
public static final String POLL_BACKOFF_TIMEOUT_DOC = "Backoff time (in ms) which forces the task to pause ingestion in case of retriable exceptions downstream";

public static final String TEMPLATE_ID = "graphdb.template.id";
public static final String TEMPLATE_ID_DOC = "The id (IRI) of the GraphDB Template to be used in SPARQL Update";
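
For context, a Connect property like the backoff timeout is typically registered in the connector's `ConfigDef`. The registration itself is not shown in this diff; the following is a self-contained sketch in which the importance level is an assumption:

```java
import org.apache.kafka.common.config.ConfigDef;

public class BackoffConfigSketch {
    // Values mirror the constants shown in the diff above.
    static final String POLL_BACKOFF_TIMEOUT = "graphdb.sink.poll.backoff.timeout.ms";
    static final long DEFAULT_POLL_BACKOFF_TIMEOUT = 10000;
    static final String POLL_BACKOFF_TIMEOUT_DOC =
        "Backoff time (in ms) which forces the task to pause ingestion in case of retriable exceptions downstream";

    // Hypothetical registration; the importance level is an assumption.
    static final ConfigDef CONFIG = new ConfigDef()
        .define(POLL_BACKOFF_TIMEOUT, ConfigDef.Type.LONG, DEFAULT_POLL_BACKOFF_TIMEOUT,
                ConfigDef.Importance.MEDIUM, POLL_BACKOFF_TIMEOUT_DOC);
}
```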