diff --git a/src/current/_data/redirects.yml b/src/current/_data/redirects.yml index c030bb86812..9d1204e0acd 100644 --- a/src/current/_data/redirects.yml +++ b/src/current/_data/redirects.yml @@ -253,7 +253,7 @@ - destination: how-does-a-changefeed-work.md sources: ['how-does-an-enterprise-changefeed-work.md'] - versions: ['v25.2', 'v25.1'] + versions: ['v25.2', 'v25.1', 'v24.3'] - destination: kubernetes-overview.md sources: ['operate-cockroachdb-kubernetes.md'] diff --git a/src/current/_includes/releases/v24.3/v24.3.0-alpha.1.md b/src/current/_includes/releases/v24.3/v24.3.0-alpha.1.md index bcb58d530ef..2d1df7a681d 100644 --- a/src/current/_includes/releases/v24.3/v24.3.0-alpha.1.md +++ b/src/current/_includes/releases/v24.3/v24.3.0-alpha.1.md @@ -58,9 +58,9 @@ Release Date: October 9, 2024 - Updated the cluster setting [`changefeed.sink_io_workers`]({% link v24.3/cluster-settings.md %}#setting-changefeed-sink-io-workers) with all the [sinks]({% link v24.3/changefeed-sinks.md %}) that support the setting. [#129946][#129946] - Added a LDAP authentication method to complement password-based login for the [DB Console]({% link v24.3/ui-overview.md %}) if HBA configuration has an entry for LDAP for the user attempting login, along with other matching criteria (like the requests originating IP address) for authentication to the DB Console. [#130418][#130418] - Added timers around key parts of the [changefeed]({% link v24.3/change-data-capture-overview.md %}) pipeline to help debug feeds experiencing issues. The `changefeed.stage..latency` metrics now emit latency histograms for each stage. The metric respects the [changefeed `scope` label]({% link v24.3/monitor-and-debug-changefeeds.md %}#using-changefeed-metrics-labels) for debugging specific feeds. 
[#128794][#128794] -- For [enterprise changefeeds]({% link v24.3/how-does-an-enterprise-changefeed-work.md %}), [events]({% link v24.3/eventlog.md %}) `changefeed_failed` and `create_changefeed` now include a `JobId` field. [#131396][#131396] +- For [enterprise changefeeds]({% link v24.3/how-does-a-changefeed-work.md %}), [events]({% link v24.3/eventlog.md %}) `changefeed_failed` and `create_changefeed` now include a `JobId` field. [#131396][#131396] - The new [metric]({% link v24.3/metrics.md %}) `seconds_until_license_expiry` allows you to monitor the status of a cluster's Enterprise license. [#129052][#129052]. -- Added the `changefeed.total_ranges` metric, which [monitors]({% link v24.3/monitor-and-debug-changefeeds.md %}) the number of [ranges]({% link v24.3/architecture/overview.md %}#architecture-range) that are watched by [changefeed aggregators]({% link v24.3/how-does-an-enterprise-changefeed-work.md %}). It shares the same polling interval as [`changefeed.lagging_ranges`]({% link v24.3/advanced-changefeed-configuration.md %}#lagging-ranges), which is controlled by the existing `lagging_ranges_polling_interval` option. [#130897][#130897] +- Added the `changefeed.total_ranges` metric, which [monitors]({% link v24.3/monitor-and-debug-changefeeds.md %}) the number of [ranges]({% link v24.3/architecture/overview.md %}#architecture-range) that are watched by [changefeed aggregators]({% link v24.3/how-does-a-changefeed-work.md %}). It shares the same polling interval as [`changefeed.lagging_ranges`]({% link v24.3/advanced-changefeed-configuration.md %}#lagging-ranges), which is controlled by the existing `lagging_ranges_polling_interval` option. [#130897][#130897]

SQL language changes

diff --git a/src/current/_includes/v24.3/cdc/cdc-schema-locked-example.md b/src/current/_includes/v24.3/cdc/cdc-schema-locked-example.md index 0908749d4de..5af4d0a248c 100644 --- a/src/current/_includes/v24.3/cdc/cdc-schema-locked-example.md +++ b/src/current/_includes/v24.3/cdc/cdc-schema-locked-example.md @@ -1,4 +1,4 @@ -Use the `schema_locked` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}) to disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on a watched table, which allows the changefeed to take a fast path that avoids checking if there are schema changes that could require synchronization between [changefeed aggregators]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). This helps to decrease the latency between a write committing to a table and it emitting to the [changefeed's sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). Enabling `schema_locked` +Use the `schema_locked` [storage parameter]({% link {{ page.version.version }}/with-storage-parameter.md %}) to disallow [schema changes]({% link {{ page.version.version }}/online-schema-changes.md %}) on a watched table, which allows the changefeed to take a fast path that avoids checking if there are schema changes that could require synchronization between [changefeed aggregators]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). This helps to decrease the latency between a write committing to a table and it emitting to the [changefeed's sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). 
Enabling `schema_locked` Enable `schema_locked` on the watched table with the [`ALTER TABLE`]({% link {{ page.version.version }}/alter-table.md %}) statement: diff --git a/src/current/_includes/v24.3/cdc/create-core-changefeed-avro.md b/src/current/_includes/v24.3/cdc/create-sinkless-changefeed-avro.md similarity index 96% rename from src/current/_includes/v24.3/cdc/create-core-changefeed-avro.md rename to src/current/_includes/v24.3/cdc/create-sinkless-changefeed-avro.md index 53dab65cff2..3def7db2e10 100644 --- a/src/current/_includes/v24.3/cdc/create-core-changefeed-avro.md +++ b/src/current/_includes/v24.3/cdc/create-sinkless-changefeed-avro.md @@ -28,9 +28,9 @@ In this example, you'll set up a basic changefeed for a single-node cluster that $ cockroach sql --url="postgresql://root@127.0.0.1:26257?sslmode=disable" --format=csv ~~~ - {% include {{ page.version.version }}/cdc/core-url.md %} + {% include {{ page.version.version }}/cdc/sinkless-url.md %} - {% include {{ page.version.version }}/cdc/core-csv.md %} + {% include {{ page.version.version }}/cdc/sinkless-csv.md %} 1. Enable the `kv.rangefeed.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}): diff --git a/src/current/_includes/v24.3/cdc/create-core-changefeed.md b/src/current/_includes/v24.3/cdc/create-sinkless-changefeed.md similarity index 94% rename from src/current/_includes/v24.3/cdc/create-core-changefeed.md rename to src/current/_includes/v24.3/cdc/create-sinkless-changefeed.md index df2264501a0..cf1be56ed82 100644 --- a/src/current/_includes/v24.3/cdc/create-core-changefeed.md +++ b/src/current/_includes/v24.3/cdc/create-sinkless-changefeed.md @@ -19,9 +19,9 @@ In this example, you'll set up a basic changefeed for a single-node cluster. 
--format=csv ~~~ - {% include {{ page.version.version }}/cdc/core-url.md %} + {% include {{ page.version.version }}/cdc/sinkless-url.md %} - {% include {{ page.version.version }}/cdc/core-csv.md %} + {% include {{ page.version.version }}/cdc/sinkless-csv.md %} 1. Enable the `kv.rangefeed.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}): diff --git a/src/current/_includes/v24.3/cdc/lagging-ranges.md b/src/current/_includes/v24.3/cdc/lagging-ranges.md index 45180baa57f..b395706d451 100644 --- a/src/current/_includes/v24.3/cdc/lagging-ranges.md +++ b/src/current/_includes/v24.3/cdc/lagging-ranges.md @@ -5,7 +5,7 @@ Use the `changefeed.lagging_ranges` metric to track the number of [ranges]({% li - `lagging_ranges_polling_interval` sets the interval rate for when lagging ranges are checked and the `lagging_ranges` metric is updated. Polling adds latency to the `lagging_ranges` metric being updated. For example, if a range falls behind by 3 minutes, the metric may not update until an additional minute afterward. - **Default:** `1m` -{% include_cached new-in.html version="v24.3" %} Use the `changefeed.total_ranges` metric to monitor the number of ranges that are watched by [aggregator processors]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) participating in the changefeed job. If you're experiencing lagging ranges, `changefeed.total_ranges` may indicate that the number of ranges watched by aggregator processors in the job is unbalanced. You may want to try [pausing]({% link {{ page.version.version }}/pause-job.md %}) the changefeed and then [resuming]({% link {{ page.version.version }}/resume-job.md %}) it, so that the changefeed replans the work in the cluster. `changefeed.total_ranges` shares the same polling interval as the `changefeed.lagging_ranges` metric, which is controlled by the `lagging_ranges_polling_interval` option. 
+{% include_cached new-in.html version="v24.3" %} Use the `changefeed.total_ranges` metric to monitor the number of ranges that are watched by [aggregator processors]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) participating in the changefeed job. If you're experiencing lagging ranges, `changefeed.total_ranges` may indicate that the number of ranges watched by aggregator processors in the job is unbalanced. You may want to try [pausing]({% link {{ page.version.version }}/pause-job.md %}) the changefeed and then [resuming]({% link {{ page.version.version }}/resume-job.md %}) it, so that the changefeed replans the work in the cluster. `changefeed.total_ranges` shares the same polling interval as the `changefeed.lagging_ranges` metric, which is controlled by the `lagging_ranges_polling_interval` option. {{site.data.alerts.callout_success}} You can use the [`metrics_label`]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#using-changefeed-metrics-labels) option to track the `lagging_ranges` and `total_ranges` metrics per changefeed. 
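The `metrics_label` and `lagging_ranges_polling_interval` options mentioned in this hunk can be combined at changefeed creation time; as a sketch (the table name, Kafka address, and label are hypothetical):

~~~ sql
-- Tag this feed's metrics so lagging_ranges and total_ranges can be
-- tracked per changefeed, and poll lagging ranges every 30 seconds.
CREATE CHANGEFEED FOR TABLE movr.rides
  INTO 'kafka://localhost:9092'
  WITH metrics_label = 'rides_feed',
       lagging_ranges_polling_interval = '30s';
~~~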
diff --git a/src/current/_includes/v24.3/cdc/core-csv.md b/src/current/_includes/v24.3/cdc/sinkless-csv.md similarity index 100% rename from src/current/_includes/v24.3/cdc/core-csv.md rename to src/current/_includes/v24.3/cdc/sinkless-csv.md diff --git a/src/current/_includes/v24.3/cdc/core-url.md b/src/current/_includes/v24.3/cdc/sinkless-url.md similarity index 100% rename from src/current/_includes/v24.3/cdc/core-url.md rename to src/current/_includes/v24.3/cdc/sinkless-url.md diff --git a/src/current/_includes/v24.3/sidebar-data/stream-data.json b/src/current/_includes/v24.3/sidebar-data/stream-data.json index b6cbe606f6c..5b570ca775b 100644 --- a/src/current/_includes/v24.3/sidebar-data/stream-data.json +++ b/src/current/_includes/v24.3/sidebar-data/stream-data.json @@ -139,9 +139,9 @@ "title": "Technical Overview", "items": [ { - "title": "How Does an Enterprise Changefeed Work?", + "title": "How Does a Changefeed Work?", "urls": [ - "/${VERSION}/how-does-an-enterprise-changefeed-work.html" + "/${VERSION}/how-does-a-changefeed-work.html" ] } ] diff --git a/src/current/images/v24.3/changefeed-structure.png b/src/current/images/v24.3/changefeed-structure.png index 3b09f8f15e9..b802cb45038 100644 Binary files a/src/current/images/v24.3/changefeed-structure.png and b/src/current/images/v24.3/changefeed-structure.png differ diff --git a/src/current/v24.3/advanced-changefeed-configuration.md b/src/current/v24.3/advanced-changefeed-configuration.md index 0a64c56e0f6..0004031b065 100644 --- a/src/current/v24.3/advanced-changefeed-configuration.md +++ b/src/current/v24.3/advanced-changefeed-configuration.md @@ -63,13 +63,13 @@ Adjusting `kv.closed_timestamp.target_duration` could have a detrimental impact `kv.closed_timestamp.target_duration` controls the target [closed timestamp]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) lag duration, which determines how far behind the current time CockroachDB will attempt to 
maintain the closed timestamp. For example, with the default value of `3s`, if the current time is `12:30:00` then CockroachDB will attempt to keep the closed timestamp at `12:29:57` by possibly retrying or aborting ongoing writes that are below this time. -A changefeed aggregates checkpoints across all ranges, and once the timestamp on all the ranges advances, the changefeed can then [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). In the context of changefeeds, `kv.closed_timestamp.target_duration` affects how old the checkpoints will be, which will determine the latency before changefeeds can consider the history of an event complete. +A changefeed aggregates checkpoints across all ranges, and once the timestamp on all the ranges advances, the changefeed can then [checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). In the context of changefeeds, `kv.closed_timestamp.target_duration` affects how old the checkpoints will be, which will determine the latency before changefeeds can consider the history of an event complete. #### `kv.rangefeed.closed_timestamp_refresh_interval` **Default:** `3s` -This setting controls the interval at which [closed timestamp]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) updates are delivered to [rangefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) and in turn emitted as a [changefeed checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). 
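The closed-timestamp behavior described above is governed by a cluster setting; as an illustrative sketch (the `3s` value shown is the documented default, not a recommendation):

~~~ sql
-- Inspect the current target closed timestamp lag.
SHOW CLUSTER SETTING kv.closed_timestamp.target_duration;

-- Adjust it; note the hunk's warning that changing this setting
-- can have a detrimental impact on foreground SQL writes.
SET CLUSTER SETTING kv.closed_timestamp.target_duration = '3s';
~~~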
+This setting controls the interval at which [closed timestamp]({% link {{ page.version.version }}/architecture/transaction-layer.md %}#closed-timestamps) updates are delivered to [rangefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) and in turn emitted as a [changefeed checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). Increasing the interval value will lengthen the delay between each checkpoint, which will increase the latency of changefeed checkpoints, but reduce the impact on SQL latency due to [overload]({% link {{ page.version.version }}/admission-control.md %}#use-cases-for-admission-control) on the cluster. This happens because every range with a rangefeed has to emit a checkpoint event with this `3s` interval. As an example, 1 million ranges would result in 330,000 events per second, which would use more CPU resources. @@ -117,7 +117,7 @@ Before tuning these settings, we recommend reading details on our [changefeed at ### Pausing changefeeds and garbage collection -By default, [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) will protect changefeed data from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) up to the time of the [_checkpoint_]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). Protected timestamps will protect changefeed data from garbage collection if the downstream [changefeed sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) is unavailable until you either [cancel]({% link {{ page.version.version }}/cancel-job.md %}) the changefeed or the sink becomes available once again. 
+By default, [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) will protect changefeed data from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) up to the time of the [_checkpoint_]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). Protected timestamps will protect changefeed data from garbage collection if the downstream [changefeed sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) is unavailable until you either [cancel]({% link {{ page.version.version }}/cancel-job.md %}) the changefeed or the sink becomes available once again. However, if the changefeed lags too far behind, the protected changes could lead to an accumulation of garbage. This could result in increased disk usage and degraded performance for some workloads. @@ -175,7 +175,7 @@ When designing a system that needs to emit a lot of changefeed messages, whether When a changefeed emits a [resolved]({% link {{ page.version.version }}/create-changefeed.md %}#resolved) message, it force flushes all outstanding messages that have buffered, which will diminish your changefeed's throughput while the flush completes. Therefore, if you are aiming for higher throughput, we suggest setting the duration higher (e.g., 10 minutes), or **not** using the `resolved` option. -If you are setting the `resolved` option when you are aiming for high throughput, you must also consider the [`min_checkpoint_frequency`]({% link {{ page.version.version }}/create-changefeed.md %}#min-checkpoint-frequency) option, which defaults to `30s`. This option controls how often nodes flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). As a result, `resolved` messages will not be emitted more frequently than the configured `min_checkpoint_frequency`. 
Set this option to at least as long as your `resolved` option duration. +If you are setting the `resolved` option when you are aiming for high throughput, you must also consider the [`min_checkpoint_frequency`]({% link {{ page.version.version }}/create-changefeed.md %}#min-checkpoint-frequency) option, which defaults to `30s`. This option controls how often nodes flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). As a result, `resolved` messages will not be emitted more frequently than the configured `min_checkpoint_frequency`. Set this option to at least as long as your `resolved` option duration. ### Batching and buffering messages diff --git a/src/current/v24.3/cdc-queries.md b/src/current/v24.3/cdc-queries.md index 116c00311b0..c11c018af4e 100644 --- a/src/current/v24.3/cdc-queries.md +++ b/src/current/v24.3/cdc-queries.md @@ -63,7 +63,7 @@ Function | Description --------------------------+---------------------- `changefeed_creation_timestamp()` | Returns the decimal MVCC timestamp when the changefeed was created. Use this function to build CDC queries that restrict emitted events by time. `changefeed_creation_timestamp()` can serve a similar purpose to the [`now()` time function]({% link {{ page.version.version }}/functions-and-operators.md %}#date-and-time-functions), which is not supported with CDC queries. `event_op()` | Returns a string describing the type of event. If a changefeed is running with the [`diff`]({% link {{ page.version.version }}/create-changefeed.md %}#diff) option, then this function returns `'insert'`, `'update'`, or `'delete'`. If a changefeed is running without the `diff` option, it is not possible to determine an update from an insert, so `event_op()` returns [`'upsert'`](https://www.cockroachlabs.com/blog/sql-upsert/) or `'delete'`. 
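The guidance in this hunk — set `min_checkpoint_frequency` to at least the `resolved` duration — can be sketched as follows (table name and sink URI are placeholders):

~~~ sql
-- Emit resolved timestamps no more than every 10 minutes, and flush
-- node progress to the coordinator on the same cadence.
CREATE CHANGEFEED FOR TABLE movr.vehicles
  INTO 'kafka://localhost:9092'
  WITH resolved = '10m',
       min_checkpoint_frequency = '10m';
~~~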
-`event_schema_timestamp()` | Returns the timestamp of [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) events that cause a [changefeed message]({% link {{ page.version.version }}/changefeed-messages.md %}) to emit. When the schema change event does not result in a table backfill or scan, `event_schema_timestamp()` will return the event's timestamp. When the schema change event does result in a table backfill or scan, `event_schema_timestamp()` will return the timestamp at which the backfill/scan is read — the [high-water mark time]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) of the changefeed. +`event_schema_timestamp()` | Returns the timestamp of [schema change]({% link {{ page.version.version }}/online-schema-changes.md %}) events that cause a [changefeed message]({% link {{ page.version.version }}/changefeed-messages.md %}) to emit. When the schema change event does not result in a table backfill or scan, `event_schema_timestamp()` will return the event's timestamp. When the schema change event does result in a table backfill or scan, `event_schema_timestamp()` will return the timestamp at which the backfill/scan is read — the [high-water mark time]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) of the changefeed. You can also use the following functions in CDC queries: diff --git a/src/current/v24.3/change-data-capture-overview.md b/src/current/v24.3/change-data-capture-overview.md index a8c44044f7f..8d290ede3b8 100644 --- a/src/current/v24.3/change-data-capture-overview.md +++ b/src/current/v24.3/change-data-capture-overview.md @@ -6,29 +6,31 @@ docs_area: stream_data key: stream-data-out-of-cockroachdb-using-changefeeds.html --- -Change data capture (CDC) detects row-level data changes in CockroachDB and sends the change as a message to a configurable sink for downstream processing purposes. 
While CockroachDB is an excellent system of record, it also needs to coexist with other systems. +**Change data capture (CDC)** detects row-level data changes in CockroachDB and emits those changes as messages for downstream processing. While CockroachDB is an excellent system of record, CDC allows it to integrate with other systems in your data ecosystem. For example, you might want to: - Stream messages to Kafka to trigger notifications in an application. -- Keep your data mirrored in full-text indexes, analytics engines, or big data pipelines. -- Export a snaphot of tables to backfill new applications. -- Send updates to data stores for machine learning models. +- Mirror your data in full-text indexes, analytics engines, or big data pipelines. +- Export a snapshot of tables to backfill new applications. +- Feed updates to data stores powering machine learning models. {% include common/define-watched-cdc.md %} ## Stream row-level changes with changefeeds -Changefeeds are customizable _jobs_ that track row-level changes and send data in realtime in a preferred format to your specified destination, known as a _sink_. Every row change will be emitted at least once and the first emit of every event for the same key will be ordered by timestamp. +Changefeeds are customizable _jobs_ that monitor row-level changes in a table and emit updates in real time. These updates are delivered in your preferred format to a specified destination, known as a _sink_. -CockroachDB has two implementations of changefeeds: +In production, changefeeds are typically configured with an external sink such as Kafka or cloud storage. However, for development and testing purposes, _sinkless changefeeds_ allow you to stream change data directly to your SQL client. + +Each emitted row change is delivered at least once, and the first emit of every event for the same key is ordered by timestamp. 
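As a minimal sketch of the sinkless changefeed described in this hunk (assuming a hypothetical `movr.users` table), omitting the `INTO` clause streams change data directly to the SQL client:

~~~ sql
-- Rangefeeds must be enabled before creating any changefeed.
SET CLUSTER SETTING kv.rangefeed.enabled = true;

-- No INTO clause: a sinkless changefeed for development and testing.
CREATE CHANGEFEED FOR TABLE movr.users;
~~~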
- - + + @@ -44,7 +46,7 @@ CockroachDB has two implementations of changefeeds: Product availability - + @@ -52,15 +54,15 @@ CockroachDB has two implementations of changefeeds: Message delivery - + - - + + @@ -75,7 +77,7 @@ CockroachDB has two implementations of changefeeds: - + @@ -100,14 +102,14 @@ CockroachDB has two implementations of changefeeds: Message format - + - + @@ -125,10 +127,10 @@ CockroachDB has two implementations of changefeeds: To get started with changefeeds in CockroachDB, refer to: -- [Create and Configure Changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}): Learn about the fundamentals of using SQL statements to create and manage Enterprise and basic changefeeds. +- [Create and Configure Changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}): Learn about the fundamentals of using SQL statements to create and manage changefeeds. - [Changefeed Sinks]({% link {{ page.version.version }}/changefeed-sinks.md %}): The downstream system to which the changefeed emits changes. Learn about the supported sinks and configuration capabilities. -- [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}): The change events that emit from the changefeed to your sink. Learn about how messages are ordered at your sink and the options to configure and format messages. -- [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}): Step-by-step examples for connecting to each changefeed sink. +- [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}): The change events that emit from the changefeed. Learn about how messages are ordered and the options to configure and format messages. +- [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}): Step-by-step examples for connecting to changefeed sinks or running sinkless changefeeds. 
### Authenticate to your changefeed sink @@ -161,7 +163,7 @@ For detail on how protected timestamps and garbage collection interact with chan ### Filter your change data with CDC queries -_Change data capture queries_ allow you to define and filter the change data emitted to your sink when you create an Enterprise changefeed. +_Change data capture queries_ allow you to define and filter the change data emitted to your sink when you create a changefeed. For example, you can use CDC queries to: @@ -182,4 +184,4 @@ For examples and more detail, refer to: ### Determine the nodes running a changefeed by locality -CockroachDB supports an option to set locality filter requirements that nodes must meet in order to take part in a changefeed job. This is helpful in multi-region clusters to ensure the nodes that are physically closest to the sink emit changefeed messages. For syntax and further technical detail, refer to [Run a changefeed job by locality]({% link {{ page.version.version }}/changefeeds-in-multi-region-deployments.md %}#run-a-changefeed-job-by-locality). +CockroachDB supports an option to set locality filter requirements that nodes must meet in order to take part in a changefeed job. This is helpful in multi-region clusters to ensure the nodes that are physically closest to the sink emit changefeed messages. For syntax and further technical detail, refer to [Run a changefeed job by locality]({% link {{ page.version.version }}/changefeeds-in-multi-region-deployments.md %}#run-a-changefeed-job-by-locality). 
\ No newline at end of file diff --git a/src/current/v24.3/changefeed-best-practices.md b/src/current/v24.3/changefeed-best-practices.md index aefb6ec1dfb..94e88ee081b 100644 --- a/src/current/v24.3/changefeed-best-practices.md +++ b/src/current/v24.3/changefeed-best-practices.md @@ -31,7 +31,7 @@ When you are running more than 10 changefeeds on a cluster, it is important to m To maintain a high number of changefeeds in your cluster: -- Connect to different nodes to [create]({% link {{ page.version.version }}/create-changefeed.md %}) each changefeed. The node on which you start the changefeed will become the _coordinator_ node for the changefeed job. The coordinator node acts as an administrator: keeping track of all other nodes during job execution and the changefeed work as it completes. As a result, this node will use more resources for the changefeed job. For more detail, refer to [How does an Enterprise changefeed work?]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). +- Connect to different nodes to [create]({% link {{ page.version.version }}/create-changefeed.md %}) each changefeed. The node on which you start the changefeed will become the _coordinator_ node for the changefeed job. The coordinator node acts as an administrator: keeping track of all other nodes during job execution and the changefeed work as it completes. As a result, this node will use more resources for the changefeed job. For more detail, refer to [How does a changefeed work?]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). - Consider logically grouping the target tables into one changefeed. When a changefeed [pauses]({% link {{ page.version.version }}/pause-job.md %}), it will stop emitting messages for the target tables. Grouping tables of related data into a single changefeed may make sense for your workload. However, we do not recommend watching hundreds of tables in a single changefeed. 
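The grouping recommendation in this hunk — related tables in one changefeed job, rather than hundreds of tables or one job per table — can be sketched as (table names and sink URI are hypothetical):

~~~ sql
-- One job watching a small group of logically related tables;
-- pausing this job stops emission for all of them together.
CREATE CHANGEFEED FOR TABLE movr.rides, movr.vehicles
  INTO 'kafka://localhost:9092';
~~~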
For more detail on protecting data from garbage collection when a changefeed is paused, refer to [Garbage collection and changefeeds]({% link {{ page.version.version }}/protect-changefeed-data.md %}). ## Monitor changefeeds diff --git a/src/current/v24.3/changefeed-examples.md b/src/current/v24.3/changefeed-examples.md index d3ece7031bd..852ca21e36a 100644 --- a/src/current/v24.3/changefeed-examples.md +++ b/src/current/v24.3/changefeed-examples.md @@ -5,11 +5,11 @@ toc: true docs_area: stream_data --- -This page provides step-by-step examples for using Core and {{ site.data.products.enterprise }} changefeeds. Creating {{ site.data.products.enterprise }} changefeeds is available on CockroachDB {{ site.data.products.basic }}, {{ site.data.products.standard }}, {{ site.data.products.advanced }}, and with an [{{ site.data.products.enterprise }} license](licensing-faqs.html#types-of-licenses) on CockroachDB {{ site.data.products.core }} clusters. Basic changefeeds are available in all products. +This page provides quick setup guides for connecting changefeeds to sinks and for using sinkless changefeeds. -For a summary of Core and {{ site.data.products.enterprise }} changefeed features, refer to the [Change Data Capture Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %}) page. +For a summary of changefeed features, refer to the [Change Data Capture Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %}) page. 
-{{ site.data.products.enterprise }} changefeeds can connect to the following sinks: +Changefeeds can emit messages to the following sinks: - [Kafka](#create-a-changefeed-connected-to-kafka) - [Google Cloud Pub/Sub](#create-a-changefeed-connected-to-a-google-cloud-pub-sub-sink) @@ -22,14 +22,12 @@ Refer to the [Changefeed Sinks]({% link {{ page.version.version }}/changefeed-si {% include {{ page.version.version }}/cdc/recommendation-monitoring-pts.md %} -Use the following filters to show usage examples for either **Enterprise** or **Core** changefeeds: -
- - + +
-
+
Before you run the examples, verify that you have the `CHANGEFEED` privilege in order to create and manage changefeed jobs. Refer to [Required privileges]({% link {{ page.version.version }}/create-changefeed.md %}#required-privileges) for more details. @@ -41,8 +39,6 @@ Before you run the examples, verify that you have the `CHANGEFEED` privilege in In this example, you'll set up a changefeed for a single-node cluster that is connected to a Kafka sink. The changefeed will watch two tables. -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -182,8 +178,6 @@ In this example, you'll set up a changefeed for a single-node cluster that is co In this example, you'll set up a changefeed for a single-node cluster that is connected to a Kafka sink and emits [Avro](https://avro.apache.org/docs/1.8.2/spec.html) records. The changefeed will watch two tables. -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -485,8 +479,6 @@ You'll need access to a [Google Cloud Project](https://cloud.google.com/resource In this example, you'll set up a changefeed for a single-node cluster that is connected to an AWS S3 sink. The changefeed watches two tables. Note that you can set up changefeeds for any of [these cloud storage providers]({% link {{ page.version.version }}/changefeed-sinks.md %}#cloud-storage-sink). -1. 
If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -603,8 +595,6 @@ In this example, you'll set up a changefeed for a single-node cluster that is co In this example, you'll set up a changefeed for a single-node cluster that is connected to a local HTTP server via a webhook. For this example, you'll use an [example HTTP server](https://github.com/cockroachlabs/cdc-webhook-sink-test-server/tree/master/go-https-server) to test out the webhook sink. -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -680,7 +670,7 @@ In this example, you'll set up a changefeed for a single-node cluster that is co 2021/08/24 14:00:22 {"payload":[{"after":{"city":"san francisco","creation_time":"2019-01-02T03:04:05","current_location":"3893 Dunn Fall Apt. 11","ext":{"color":"black"},"id":"21b2ec54-81ad-4af7-a76d-6087b9c7f0f8","dog_owner_id":"8924c3af-ea6e-4e7e-b2c8-2e318f973393","status":"lost","type":"scooter"},"key":["san francisco","21b2ec54-81ad-4af7-a76d-6087b9c7f0f8"],"topic":"vehicles","updated":"1629813621680097993.0000000000"}],"length":1} ~~~ - For more detail on emitted changefeed messages, see [responses]({% link {{ page.version.version }}/changefeed-messages.md %}#responses). + For more detail on emitted changefeed messages, refer to the [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}) page. 
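To summarize the webhook walkthrough above in one place, the statement that connects a changefeed to the example HTTP server has roughly this shape (a sketch: the port, the `insecure_tls_skip_verify` parameter, and the watched table mirror the example setup and will differ in production):

~~~ sql
CREATE CHANGEFEED FOR TABLE vehicles
  INTO 'webhook-https://localhost:3000?insecure_tls_skip_verify=true'
  WITH updated;
~~~

Outside of a local test, replace `insecure_tls_skip_verify=true` with a `ca_cert` parameter carrying the base64-encoded certificate of the webhook endpoint.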
## Create a changefeed connected to an Apache Pulsar sink @@ -769,28 +759,27 @@ In this example, you'll set up a changefeed for a single-node cluster that is co key:[null], properties:[], content:{"Key":["rome", "3c7d6676-f713-4985-ba52-4c19fe6c3692"],"Value":{"after": {"city": "rome", "end_address": null, "end_time": null, "id": "3c7d6676-f713-4985-ba52-4c19fe6c3692", "revenue": 27.00, "rider_id": "c15a4926-fbb2-4931-a9a0-6dfabc6c506b", "start_address": "39415 Brandon Avenue Apt. 29", "start_time": "2024-05-09T12:18:42.055498", "vehicle_city": "rome", "vehicle_id": "627dad1a-3531-4214-a173-16bcc6b93036"}},"Topic":"rides"} ~~~ - For more detail on emitted changefeed messages, refer to [Responses]({% link {{ page.version.version }}/changefeed-messages.md %}#responses). + For more detail on emitted changefeed messages, refer to the [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}) page.
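The Pulsar output shown above comes from a changefeed created with a statement of this general shape (a sketch assuming a local Pulsar broker listening on its default service port `6650`; the Pulsar sink supports a more limited set of options than the other sinks):

~~~ sql
CREATE CHANGEFEED FOR TABLE movr.rides
  INTO 'pulsar://localhost:6650';
~~~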
-
+
-Basic changefeeds stream row-level changes to a client until the underlying SQL connection is closed. +Sinkless changefeeds stream row-level changes to a client until the underlying SQL connection is closed. -## Create a basic changefeed +## Create a sinkless changefeed -{% include {{ page.version.version }}/cdc/create-core-changefeed.md %} +{% include {{ page.version.version }}/cdc/create-sinkless-changefeed.md %} -## Create a basic changefeed using Avro +## Create a sinkless changefeed using Avro -{% include {{ page.version.version }}/cdc/create-core-changefeed-avro.md %} +{% include {{ page.version.version }}/cdc/create-sinkless-changefeed-avro.md %} -For further information on basic changefeeds, see [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}). +For further information on sinkless changefeeds, refer to the [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}#create-a-sinkless-changefeed) page.
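Because a sinkless changefeed is created without an `INTO` clause, rows stream back over the same SQL connection that issued the statement. A minimal sketch (the table name is illustrative):

~~~ sql
CREATE CHANGEFEED FOR TABLE movr.users WITH updated;
~~~

The statement blocks and continues emitting messages until you cancel it or the SQL connection closes.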
## See also -- [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}) - [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}) - [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}) diff --git a/src/current/v24.3/changefeed-for.md b/src/current/v24.3/changefeed-for.md index 77cfb2eb807..092e61f4c1a 100644 --- a/src/current/v24.3/changefeed-for.md +++ b/src/current/v24.3/changefeed-for.md @@ -69,7 +69,7 @@ Option | Value | Description `envelope` | `wrapped` / `bare` / `key_only` / `row` | `wrapped` is the default envelope structure for changefeed messages, containing an array of the primary key, a top-level field for the type of message, and the current state of the row (or `null` for deleted rows).

`bare` removes the `after` key from the changefeed message. When used with `avro` format, `record` will replace the `after` key.

`key_only` emits only the key and no value, which is faster if you only need to know the key of the changed row.

`row` emits the row without any additional metadata fields in the message. `row` does not support [`avro` format](#format).

Refer to [Responses]({% link {{ page.version.version }}/changefeed-messages.md %}#responses) for more detail on message format.

Default: `envelope=wrapped`. `format` | `json` / `avro` / `csv` / `parquet` | Format of the emitted message.

`avro`: For mappings of CockroachDB types to Avro types, [refer to the table]({% link {{ page.version.version }}/changefeed-messages.md %}#avro-types) and detail on [Avro limitations](#avro-limitations). **Note:** [`confluent_schema_registry`](#confluent-registry) is required with `format=avro`.

`csv`: You cannot combine `format=csv` with the `diff` or [`resolved`](#resolved-option) options. Changefeeds use the same CSV format as the [`EXPORT`](export.html) statement. Refer to [Export data with changefeeds]({% link {{ page.version.version }}/export-data-with-changefeeds.md %}) for details on using these options to create a changefeed as an alternative to `EXPORT`. **Note:** [`initial_scan = 'only'`](#initial-scan) is required with `format=csv`.

`parquet`: Cloud storage is the only supported sink. The `topic_in_value` option is not compatible with `parquet` format.

Default: `format=json`. `initial_scan` / `no_initial_scan` / `initial_scan_only` | N/A | Controls whether or not an initial scan will occur at the start time of a changefeed. `initial_scan_only` will perform an initial scan and then the changefeed job will complete with a `successful` status. You cannot use [`end_time`](#end-time) and `initial_scan_only` simultaneously.

If none of these options are specified, an initial scan will occur if there is no [`cursor`](#cursor-option), and will not occur if there is one. This preserves the behavior from previous releases.

You cannot specify `initial_scan` and `no_initial_scan` or `no_initial_scan` and `initial_scan_only` simultaneously.

Default: `initial_scan`
If used in conjunction with `cursor`, an initial scan will be performed at the cursor timestamp. If no `cursor` is specified, the initial scan is performed at `now()`. -`min_checkpoint_frequency` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Controls how often nodes flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). Changefeeds will wait for at least the specified duration before a flushing. This can help you control the flush frequency to achieve better throughput. If this is set to `0s`, a node will flush as long as the high-water mark has increased for the ranges that particular node is processing. If a changefeed is resumed, then `min_checkpoint_frequency` is the amount of time that changefeed will need to catch up. That is, it could emit duplicate messages during this time.

**Note:** [`resolved`](#resolved-option) messages will not be emitted more frequently than the configured `min_checkpoint_frequency` (but may be emitted less frequently). Since `min_checkpoint_frequency` defaults to `30s`, you **must** configure `min_checkpoint_frequency` to at least the desired `resolved` message frequency if you require `resolved` messages more frequently than `30s`.

**Default:** `30s` +`min_checkpoint_frequency` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Controls how often nodes flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). Changefeeds will wait for at least the specified duration before flushing. This can help you control the flush frequency to achieve better throughput. If this is set to `0s`, a node will flush as long as the high-water mark has increased for the ranges that particular node is processing. If a changefeed is resumed, then `min_checkpoint_frequency` is the amount of time that the changefeed will need to catch up. That is, it could emit duplicate messages during this time.

**Note:** [`resolved`](#resolved-option) messages will not be emitted more frequently than the configured `min_checkpoint_frequency` (but may be emitted less frequently). Since `min_checkpoint_frequency` defaults to `30s`, you **must** configure `min_checkpoint_frequency` to at least the desired `resolved` message frequency if you require `resolved` messages more frequently than `30s`.

**Default:** `30s` `mvcc_timestamp` | N/A | Include the [MVCC]({% link {{ page.version.version }}/architecture/storage-layer.md %}#mvcc) timestamp for each emitted row in a changefeed. With the `mvcc_timestamp` option, each emitted row will always contain its MVCC timestamp, even during the changefeed's initial backfill. `resolved` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Emit [resolved timestamps]({% link {{ page.version.version }}/changefeed-messages.md %}#resolved-messages) for the changefeed. Resolved timestamps do not emit until all ranges in the changefeed have progressed to a specific point in time.

Set a minimum amount of time that the changefeed's high-water mark (overall resolved timestamp) must advance by before another resolved timestamp is emitted. Example: `resolved='10s'`. This option will **only** emit a resolved timestamp if the timestamp has advanced (and by at least the optional duration, if set). If a duration is unspecified, all resolved timestamps are emitted as the high-water mark advances.

**Note:** If you set `resolved` lower than `30s`, then you **must** also set the [`min_checkpoint_frequency`](#min-checkpoint-frequency) option to at minimum the same value as `resolved`, because `resolved` messages may be emitted less frequently than `min_checkpoint_frequency`, but cannot be emitted more frequently.

Refer to [Resolved messages]({% link {{ page.version.version }}/changefeed-messages.md %}#resolved-messages) for more detail. `split_column_families` | N/A | Target a table with multiple column families. Emit messages for each column family in the target table. Each message will include the label: `table.family`. diff --git a/src/current/v24.3/changefeed-messages.md index 94c04621dcf..e0d64ef819d 100644 --- a/src/current/v24.3/changefeed-messages.md +++ b/src/current/v24.3/changefeed-messages.md @@ -59,7 +59,7 @@ For [webhook sinks]({% link {{ page.version.version }}/changefeed-sinks.md %}#we [Webhook message batching]({% link {{ page.version.version }}/changefeed-sinks.md %}#webhook-sink-configuration) is subject to the same key [ordering guarantee](#ordering-and-delivery-guarantees) as other sinks. Therefore, as messages are batched, you will not receive two batches at the same time with overlapping keys. You may receive a single batch containing multiple messages about one key, because ordering is maintained for a single key within its batch. -Refer to [changefeed files]({% link {{ page.version.version }}/create-changefeed.md %}#files) for more detail on the file naming format for {{ site.data.products.enterprise }} changefeeds. +Refer to [changefeed files]({% link {{ page.version.version }}/create-changefeed.md %}#files) for more detail on the file naming format for changefeeds that emit to a sink. ## Message envelopes @@ -264,7 +264,7 @@ As an example, you run the following sequence of SQL statements to create a chan {"after": {"id": 4, "name": "Danny", "office": "los angeles"}, "key": [4], "updated": "1701102561022789676.0000000000"} ~~~ - The messages received at the sink are in order by timestamp **for each key**. Here, the update for key `[1]` is emitted before the insertion of key `[2]` even though the timestamp for the update to key `[1]` is higher.
That is, if you follow the sequence of updates for a particular key at the sink, they will be in the correct timestamp order. However, if a changefeed starts to re-emit messages after the last [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}), it may not emit all duplicate messages between the first duplicate message and new updates to the table. For details on when changefeeds might re-emit messages, refer to [Duplicate messages](#duplicate-messages). + The messages received at the sink are in order by timestamp **for each key**. Here, the update for key `[1]` is emitted before the insertion of key `[2]` even though the timestamp for the update to key `[1]` is higher. That is, if you follow the sequence of updates for a particular key at the sink, they will be in the correct timestamp order. However, if a changefeed starts to re-emit messages after the last [checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}), it may not emit all duplicate messages between the first duplicate message and new updates to the table. For details on when changefeeds might re-emit messages, refer to [Duplicate messages](#duplicate-messages). The `updated` option adds an `updated` timestamp to each emitted row. You can also use the [`resolved` option](#resolved-messages) to emit a `resolved` timestamp message to each Kafka partition, or to a separate file at a cloud storage sink. A `resolved` timestamp guarantees that no (previously unseen) rows with a lower update timestamp will be emitted on that partition. @@ -344,9 +344,9 @@ In some unusual situations you may receive a delete message for a row without fi ## Resolved messages -When you create a changefeed with the [`resolved` option]({% link {{ page.version.version }}/create-changefeed.md %}#resolved), the changefeed will emit resolved timestamp messages in a format dependent on the connected [sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). 
The resolved timestamp is the high-water mark that guarantees that no previously unseen rows with an [earlier update timestamp](#ordering-and-delivery-guarantees) will be emitted to the sink. That is, resolved timestamp messages do not emit until the changefeed job has reached a [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). +When you create a changefeed with the [`resolved` option]({% link {{ page.version.version }}/create-changefeed.md %}#resolved), the changefeed will emit resolved timestamp messages in a format dependent on the connected [sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). The resolved timestamp is the high-water mark that guarantees that no previously unseen rows with an [earlier update timestamp](#ordering-and-delivery-guarantees) will be emitted to the sink. That is, resolved timestamp messages do not emit until the changefeed job has reached a [checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). -When you specify the `resolved` option at changefeed creation, the [job's coordinating node]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) will send the resolved timestamp to each endpoint at the sink. For example, each [Kafka]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka) partition will receive a resolved timestamp message, or a [cloud storage sink]({% link {{ page.version.version }}/changefeed-sinks.md %}#cloud-storage-sink) will receive a resolved timestamp file. +When you specify the `resolved` option at changefeed creation, the [job's coordinating node]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) will send the resolved timestamp to each endpoint at the sink. 
For example, each [Kafka]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka) partition will receive a resolved timestamp message, or a [cloud storage sink]({% link {{ page.version.version }}/changefeed-sinks.md %}#cloud-storage-sink) will receive a resolved timestamp file. There are three different ways to configure resolved timestamp messages: @@ -543,7 +543,7 @@ The following sections outline the limitations and type mapping for relevant for ### Avro -The following sections provide information on Avro usage with CockroachDB changefeeds. Creating a changefeed using Avro is available in Core and {{ site.data.products.enterprise }} changefeeds with the [`confluent_schema_registry`](create-changefeed.html#confluent-schema-registry) option. +The following sections provide information on Avro usage with CockroachDB changefeeds. Creating a changefeed using Avro is available with the [`confluent_schema_registry`](create-changefeed.html#confluent-schema-registry) option. #### Avro limitations diff --git a/src/current/v24.3/changefeed-monitoring-guide.md b/src/current/v24.3/changefeed-monitoring-guide.md index 22e960cdd40..a83de157a45 100644 --- a/src/current/v24.3/changefeed-monitoring-guide.md +++ b/src/current/v24.3/changefeed-monitoring-guide.md @@ -9,7 +9,7 @@ CockroachDB [changefeeds]({% link {{ page.version.version }}/change-data-capture This guide provides recommendations for monitoring and alerting on changefeeds throughout the pipeline to ensure reliable operation and quick problem detection. {{site.data.alerts.callout_success}} -For details on how changefeeds work as jobs in CockroachDB, refer to the [technical overview]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). +For details on how changefeeds work as jobs in CockroachDB, refer to the [technical overview]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). 
{{site.data.alerts.end}} ## Overview @@ -42,7 +42,7 @@ Metrics names in Prometheus replace the `.` with `_`. In Datadog, metrics names - Use with [metrics labels]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#using-changefeed-metrics-labels) (supported in v24.3.5+). - Investigation needed: If `changefeed.max_behind_nanos` is consistently increasing. - `(now() - changefeed.checkpoint_progress)` - - Description: The progress of changefeed [checkpointing]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). Indicates how recently the changefeed state was persisted durably. Critical for monitoring changefeed [recovery capability]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages). + - Description: The progress of changefeed [checkpointing]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). Indicates how recently the changefeed state was persisted durably. Critical for monitoring changefeed [recovery capability]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages). - Investigation needed: If checkpointing falls too far behind the current time. - Impact: - Slow processing of changes and updates to downstream sinks. diff --git a/src/current/v24.3/changefeed-sinks.md b/src/current/v24.3/changefeed-sinks.md index 988a9a63701..68d283ce17c 100644 --- a/src/current/v24.3/changefeed-sinks.md +++ b/src/current/v24.3/changefeed-sinks.md @@ -5,7 +5,7 @@ toc: true docs_area: stream_data --- -{{ site.data.products.enterprise }} changefeeds emit messages to configurable downstream sinks. This page details the URIs, parameters, and configurations available for each changefeed sink. +Changefeeds emit messages to configurable downstream sinks. This page details the URIs, parameters, and configurations available for each changefeed sink. 
CockroachDB supports the following sinks: diff --git a/src/current/v24.3/changefeeds-in-multi-region-deployments.md index 5d4cd08cfa0..21bebd115ec 100644 --- a/src/current/v24.3/changefeeds-in-multi-region-deployments.md +++ b/src/current/v24.3/changefeeds-in-multi-region-deployments.md @@ -12,7 +12,7 @@ This page describes features that you can use for changefeeds running on multi-r ## Run a changefeed job by locality -Use the `execution_locality` option to set locality filter requirements that a node must meet to take part in executing a [changefeed]({% link {{ page.version.version }}/create-changefeed.md %}) job. This will pin the [coordination of the changefeed job]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) and the nodes that process the [changefeed messages]({% link {{ page.version.version }}/changefeed-messages.md %}) to the defined locality. +Use the `execution_locality` option to set locality filter requirements that a node must meet to take part in executing a [changefeed]({% link {{ page.version.version }}/create-changefeed.md %}) job. This will pin the [coordination of the changefeed job]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) and the nodes that process the [changefeed messages]({% link {{ page.version.version }}/changefeed-messages.md %}) to the defined locality. Defining an execution locality for a changefeed job could be useful in the following cases: @@ -51,7 +51,7 @@ Once the coordinating node is determined, nodes that match the locality requirem When a node matching the locality filter takes part in the changefeed job, that node will read from the closest [replica]({% link {{ page.version.version }}/architecture/reads-and-writes-overview.md %}#architecture-replica). If the node is a replica, it can read from itself.
In the scenario where no replicas are available in the region of the assigned node, it may then read from a replica in a different region. As a result, you may want to consider [placing replicas]({% link {{ page.version.version }}/configure-replication-zones.md %}), including potentially [non-voting replicas]({% link {{ page.version.version }}/architecture/replication-layer.md %}#non-voting-replicas) that will have less impact on read latency, in the locality or region that you plan on pinning for changefeed job execution. -For an overview of how a changefeed job works, refer to the [How does an Enterprise changefeed work?]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) page. +For an overview of how a changefeed job works, refer to the [How does a changefeed work?]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) page. ## Run changefeeds on regional by row tables diff --git a/src/current/v24.3/changefeeds-on-tables-with-column-families.md b/src/current/v24.3/changefeeds-on-tables-with-column-families.md index 4a48b5c7f7d..b2825c7d59c 100644 --- a/src/current/v24.3/changefeeds-on-tables-with-column-families.md +++ b/src/current/v24.3/changefeeds-on-tables-with-column-families.md @@ -28,7 +28,7 @@ CREATE CHANGEFEED FOR TABLE {table} FAMILY {family} INTO {sink}; ~~~ {{site.data.alerts.callout_info}} -You can also use [basic changefeeds]({% link {{ page.version.version }}/changefeeds-on-tables-with-column-families.md %}?filters=core#create-a-basic-changefeed-on-a-table-with-column-families) on tables with column families by using the [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}) statement with `split_column_families` or the `FAMILY` keyword. 
+You can also use [sinkless changefeeds]({% link {{ page.version.version }}/changefeeds-on-tables-with-column-families.md %}?filters=sinkless#create-a-sinkless-changefeed-on-a-table-with-column-families) on tables with column families by using the [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}) statement without a sink, along with `split_column_families` or the `FAMILY` keyword. {{site.data.alerts.end}} If a table has multiple column families, the `FAMILY` keyword will ensure the changefeed emits messages for **each** column family you define with `FAMILY` in the `CREATE CHANGEFEED` statement. If you do not specify `FAMILY`, then the changefeed will emit messages for **all** the table's column families. @@ -83,21 +83,17 @@ The output shows the `primary` column family with `4` in the value (`{"id":4,"na - Creating a changefeed with [CDC queries]({% link {{ page.version.version }}/cdc-queries.md %}) is not supported on tables with more than one column family. - When you create a changefeed on a table with more than one column family, the changefeed will emit messages per column family in separate streams. As a result, [changefeed messages]({% link {{ page.version.version }}/changefeed-messages.md %}) for different column families will arrive at the [sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) under separate topics. For more details, refer to [Message format](#message-format). -For examples of starting changefeeds on tables with column families, see the following examples for Enterprise and basic changefeeds. -
- - + +
-
+
## Create a changefeed on a table with column families In this example, you'll set up changefeeds on two tables that have [column families]({% link {{ page.version.version }}/column-families.md %}). You'll use a single-node cluster sending changes to a webhook sink for this example, but you can use any [changefeed sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) to work with tables that include column families. -1. If you do not already have one, [request a trial {{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#obtain-a-license). - 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: {% include_cached copy-clipboard.html %} @@ -112,18 +108,6 @@ In this example, you'll set up changefeeds on two tables that have [column famil cockroach sql --insecure ~~~ -1. Set your organization and license key: - - {% include_cached copy-clipboard.html %} - ~~~ sql - SET CLUSTER SETTING cluster.organization = ''; - ~~~ - - {% include_cached copy-clipboard.html %} - ~~~ sql - SET CLUSTER SETTING enterprise.license = ''; - ~~~ - 1. Enable the `kv.rangefeed.enabled` [cluster setting]({% link {{ page.version.version }}/cluster-settings.md %}): {% include_cached copy-clipboard.html %} @@ -299,11 +283,11 @@ In this example, you'll set up changefeeds on two tables that have [column famil
-
+
-## Create a basic changefeed on a table with column families +## Create a sinkless changefeed on a table with column families -In this example, you'll set up basic changefeeds on two tables that have [column families]({% link {{ page.version.version }}/column-families.md %}). You'll use a single-node cluster with the basic changefeed sending changes to the client. +In this example, you'll set up a sinkless changefeed on two tables that have [column families]({% link {{ page.version.version }}/column-families.md %}). You'll use a single-node cluster with the sinkless changefeed sending changes to the client. 1. Use the [`cockroach start-single-node`]({% link {{ page.version.version }}/cockroach-start-single-node.md %}) command to start a single-node cluster: @@ -385,7 +369,7 @@ In this example, you'll set up basic changefeeds on two tables that have [column {% include_cached copy-clipboard.html %} ~~~ sql - EXPERIMENTAL CHANGEFEED FOR TABLE office_dogs FAMILY employee; + CREATE CHANGEFEED FOR TABLE office_dogs FAMILY employee; ~~~ You'll receive one message for each of the inserts that affects the specified column family: @@ -406,7 +390,7 @@ In this example, you'll set up basic changefeeds on two tables that have [column {% include_cached copy-clipboard.html %} ~~~ sql - EXPERIMENTAL CHANGEFEED FOR TABLE office_dogs FAMILY employee, TABLE office_plants FAMILY dog_friendly; + CREATE CHANGEFEED FOR TABLE office_dogs FAMILY employee, TABLE office_plants FAMILY dog_friendly; ~~~ You'll receive one message for each insert that affects the specified column families: @@ -435,7 +419,7 @@ In this example, you'll set up basic changefeeds on two tables that have [column {% include_cached copy-clipboard.html %} ~~~ sql - EXPERIMENTAL CHANGEFEED FOR TABLE office_dogs WITH split_column_families; + CREATE CHANGEFEED FOR TABLE office_dogs WITH split_column_families; ~~~ In your other terminal window, insert some more values: diff --git
a/src/current/v24.3/connect-to-a-changefeed-kafka-sink-with-oauth-using-okta.md b/src/current/v24.3/connect-to-a-changefeed-kafka-sink-with-oauth-using-okta.md index 8e2590bad9d..8a07f935ca6 100644 --- a/src/current/v24.3/connect-to-a-changefeed-kafka-sink-with-oauth-using-okta.md +++ b/src/current/v24.3/connect-to-a-changefeed-kafka-sink-with-oauth-using-okta.md @@ -5,7 +5,7 @@ toc: true docs_area: stream_data --- -CockroachDB {{ site.data.products.enterprise }} [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) can stream change data out to [Apache Kafka](https://kafka.apache.org/) using OAuth authentication. +CockroachDB [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}) can stream change data out to [Apache Kafka](https://kafka.apache.org/) using OAuth authentication. {% include {{ page.version.version }}/cdc/oauth-description.md %} diff --git a/src/current/v24.3/create-and-configure-changefeeds.md b/src/current/v24.3/create-and-configure-changefeeds.md index 4662b2451ba..130a68a6829 100644 --- a/src/current/v24.3/create-and-configure-changefeeds.md +++ b/src/current/v24.3/create-and-configure-changefeeds.md @@ -1,6 +1,6 @@ --- title: Create and Configure Changefeeds -summary: Create and configure a changefeed job for Core and Enterprise. +summary: Create and configure a changefeed emitting to a sink or a sinkless changefeed. toc: true docs_area: stream_data --- @@ -15,10 +15,10 @@ This page describes: ## Before you create a changefeed 1. Enable rangefeeds on CockroachDB {{ site.data.products.advanced }} and CockroachDB {{ site.data.products.core }}. Refer to [Enable rangefeeds](#enable-rangefeeds) for instructions. -1. Decide on whether you will run an {{ site.data.products.enterprise }} or basic changefeed. Refer to the [Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %}) page for a comparative capability table. +1. 
Decide on whether you will run a changefeed that emits to a sink or a sinkless changefeed. Refer to the [Overview]({% link {{ page.version.version }}/change-data-capture-overview.md %}) page for a comparative capability table. 1. Plan the number of changefeeds versus the number of tables to include in a single changefeed for your cluster. {% include {{ page.version.version }}/cdc/changefeed-number-limit.md %} Refer to [System resources and running changefeeds]({% link {{ page.version.version }}/changefeed-best-practices.md %}#maintain-system-resources-and-running-changefeeds) and [Recommendations for the number of target tables]({% link {{ page.version.version }}/changefeed-best-practices.md %}#plan-the-number-of-watched-tables-for-a-single-changefeed). - {% include common/cdc-cloud-costs-link.md %} -1. Consider whether your {{ site.data.products.enterprise }} [changefeed use case](#create) would be better served by [change data capture queries]({% link {{ page.version.version }}/cdc-queries.md %}) that can filter data on a single table. CDC queries can improve the efficiency of changefeeds because the job will not need to encode as much change data. +1. Consider whether your [changefeed use case](#create) would be better served by [change data capture queries]({% link {{ page.version.version }}/cdc-queries.md %}) that can filter data on a single table. CDC queries can improve the efficiency of changefeeds because the job will not need to encode as much change data. 1. Read the following: - The [Changefeed Best Practices]({% link {{ page.version.version }}/changefeed-best-practices.md %}) reference for details on planning changefeeds, monitoring basics, and schema changes. - The [Considerations](#considerations) section that provides information on changefeed interactions that could affect how you configure or run your changefeed. 
@@ -34,7 +34,7 @@ Changefeeds connect to a long-lived request called a _rangefeed_, which pushes c SET CLUSTER SETTING kv.rangefeed.enabled = true; ~~~ -Any created changefeeds will error until this setting is enabled. If you are working on a CockroachDB Serverless cluster, the `kv.rangefeed.enabled` cluster setting is enabled by default. +Any created changefeeds will error until this setting is enabled. If you are working on a CockroachDB {{ site.data.products.basic }} or {{ site.data.products.standard }} cluster, the `kv.rangefeed.enabled` cluster setting is enabled by default. Enabling rangefeeds has a small performance cost (about a 5–10% increase in write latencies), whether or not the rangefeed is being used in a changefeed. When `kv.rangefeed.enabled` is set to `true`, a small portion of the latency cost is caused by additional write event information that is sent to the [Raft log]({% link {{ page.version.version }}/architecture/replication-layer.md %}#raft-logs) and for [replication]({% link {{ page.version.version }}/architecture/replication-layer.md %}). The remainder of the latency cost is incurred once a changefeed is running; the write event information is reconstructed and sent to an active rangefeed, which will push the event to the changefeed. @@ -53,41 +53,34 @@ For further detail on performance-related configuration, refer to the [Advanced - After you [restore from a full-cluster backup]({% link {{ page.version.version }}/restore.md %}#full-cluster), changefeed jobs will **not** resume on the new cluster. It is necessary to manually create the changefeeds following the full-cluster restore. - {% include {{ page.version.version }}/cdc/virtual-computed-column-cdc.md %} -The following Enterprise and Core sections outline how to create and configure each type of changefeed: +The following sections outline how to create and configure each type of changefeed:
- - + +
-
+
## Configure a changefeed -An {{ site.data.products.enterprise }} changefeed streams row-level changes in a [configurable format]({% link {{ page.version.version }}/changefeed-messages.md %}) to one of the following sinks: +A changefeed streams row-level changes in a [configurable format]({% link {{ page.version.version }}/changefeed-messages.md %}) to one of the following sinks: {% include {{ page.version.version }}/cdc/sink-list.md %} -You can [create](#create), [pause](#pause), [resume](#resume), and [cancel](#cancel) an {{ site.data.products.enterprise }} changefeed. For a step-by-step example connecting to a specific sink, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. +You can [create](#create), [pause](#pause), [resume](#resume), and [cancel](#cancel) a changefeed emitting messages to a sink. For a step-by-step example connecting to a specific sink, see the [Changefeed Examples]({% link {{ page.version.version }}/changefeed-examples.md %}) page. ### Create -To create an {{ site.data.products.enterprise }} changefeed: +To create a changefeed: {% include_cached copy-clipboard.html %} ~~~ sql -CREATE CHANGEFEED FOR TABLE table_name, table_name2 INTO '{scheme}://{host}:{port}?{query_parameters}'; +CREATE CHANGEFEED FOR TABLE table_name, table_name2 INTO '{scheme}://{sink_host}:{port}?{query_parameters}'; ~~~ {% include {{ page.version.version }}/cdc/url-encoding.md %} -When you create a changefeed **without** specifying a sink, CockroachDB sends the changefeed events to the SQL client. Consider the following regarding the [display format]({% link {{ page.version.version }}/cockroach-sql.md %}#sql-flag-format) in your SQL client: - -- If you do not define a display format, the CockroachDB SQL client will automatically use `ndjson` format. -- If you specify a display format, the client will use that format (e.g., `--format=csv`). 
-- If you set the client display format to `ndjson` and set the changefeed [`format`]({% link {{ page.version.version }}/create-changefeed.md %}#format) to `csv`, you'll receive JSON format with CSV nested inside. -- If you set the client display format to `csv` and set the changefeed [`format`]({% link {{ page.version.version }}/create-changefeed.md %}#format) to `json`, you'll receive a comma-separated list of JSON values. - For more information, see [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}). ### Show @@ -104,7 +97,7 @@ For more information, refer to [`SHOW CHANGEFEED JOB`]({% link {{ page.version.v ### Pause -To pause an {{ site.data.products.enterprise }} changefeed: +To pause a changefeed: {% include_cached copy-clipboard.html %} ~~~ sql @@ -115,7 +108,7 @@ For more information, refer to [`PAUSE JOB`]({% link {{ page.version.version }}/ ### Resume -To resume a paused {{ site.data.products.enterprise }} changefeed: +To resume a paused changefeed: {% include_cached copy-clipboard.html %} ~~~ sql @@ -126,7 +119,7 @@ For more information, refer to [`RESUME JOB`]({% link {{ page.version.version }} ### Cancel -To cancel an {{ site.data.products.enterprise }} changefeed: +To cancel a changefeed: {% include_cached copy-clipboard.html %} ~~~ sql @@ -145,20 +138,25 @@ For more information, refer to [`CANCEL JOB`]({% link {{ page.version.version }}
-
- -## Create a changefeed +
-A basic changefeed streams row-level changes to the client indefinitely until the underlying connection is closed or the changefeed is canceled. +## Create a sinkless changefeed -To create a basic changefeed: +When you create a changefeed **without** specifying a sink (a sinkless changefeed), CockroachDB sends the changefeed events to the SQL client indefinitely until the underlying connection is closed or the changefeed is canceled: {% include_cached copy-clipboard.html %} ~~~ sql -EXPERIMENTAL CHANGEFEED FOR table_name; +CREATE CHANGEFEED FOR TABLE table_name, table_name2; ~~~ -For more information, see [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}). +Consider the following regarding the [display format]({% link {{ page.version.version }}/cockroach-sql.md %}#sql-flag-format) in your SQL client: + +- If you do not define a display format, the CockroachDB SQL client will automatically use `ndjson` format. +- If you specify a display format, the client will use that format (e.g., `--format=csv`). +- If you set the client display format to `ndjson` and set the changefeed [`format`]({% link {{ page.version.version }}/create-changefeed.md %}#format) to `csv`, you'll receive JSON format with CSV nested inside. +- If you set the client display format to `csv` and set the changefeed [`format`]({% link {{ page.version.version }}/create-changefeed.md %}#format) to `json`, you'll receive a comma-separated list of JSON values. + +For more information, see [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}#create-a-sinkless-changefeed).
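The display-format interactions listed above can be sketched briefly (the `users` table name and connection string are placeholders):

{% include_cached copy-clipboard.html %}
~~~ sql
-- Client started with an explicit display format, for example:
--   cockroach sql --url {CONNECTION STRING} --format=csv
-- The changefeed emits JSON by default, so with a csv display format
-- each row arrives as a comma-separated list of JSON values.
CREATE CHANGEFEED FOR TABLE users;
~~~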
@@ -173,5 +171,4 @@ For more information, see [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.versio ## See also - [`SHOW JOBS`]({% link {{ page.version.version }}/show-jobs.md %}) -- [`EXPERIMENTAL CHANGEFEED FOR`]({% link {{ page.version.version }}/changefeed-for.md %}) - [`CREATE CHANGEFEED`]({% link {{ page.version.version }}/create-changefeed.md %}) diff --git a/src/current/v24.3/create-changefeed.md b/src/current/v24.3/create-changefeed.md index 99279fb97f7..b51845188c0 100644 --- a/src/current/v24.3/create-changefeed.md +++ b/src/current/v24.3/create-changefeed.md @@ -5,9 +5,11 @@ toc: true docs_area: reference.sql --- -The `CREATE CHANGEFEED` [statement]({% link {{ page.version.version }}/sql-statements.md %}) creates a new {{ site.data.products.enterprise }} changefeed, which targets an allowlist of tables called "watched rows". Every change to a watched row is emitted as a record in a configurable format (`JSON` or Avro) to a configurable sink ([Kafka](https://kafka.apache.org/), [Google Cloud Pub/Sub](https://cloud.google.com/pubsub), a [cloud storage sink]({% link {{ page.version.version }}/changefeed-sinks.md %}#cloud-storage-sink), or a [webhook sink]({% link {{ page.version.version }}/changefeed-sinks.md %}#webhook-sink)). You can [create](#examples), [pause](#pause-a-changefeed), [resume](#resume-a-paused-changefeed), [alter]({% link {{ page.version.version }}/alter-changefeed.md %}), or [cancel](#cancel-a-changefeed) an {{ site.data.products.enterprise }} changefeed. +The `CREATE CHANGEFEED` [statement]({% link {{ page.version.version }}/sql-statements.md %}) creates a new changefeed, which targets an allowlist of tables called "watched rows". Every change to a watched row is emitted as a record in a configurable format (`JSON` or Avro) to a [configurable sink]({% link {{ page.version.version }}/changefeed-sinks.md %}) or directly to the SQL session. 
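As a minimal sketch of the statement's shape (the table name and Kafka address below are placeholders):

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE CHANGEFEED FOR TABLE users
  INTO 'kafka://localhost:9092'
  WITH updated, resolved;
~~~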
-To get started with changefeeds, refer to the [Create and Configure Changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}) page for important usage considerations. For detail on how changefeeds emit messages, refer to the [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}) page. +When a changefeed emits messages to a sink, it works as a [job]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). You can [create](#examples), [pause](#pause-a-changefeed), [resume](#resume-a-paused-changefeed), [alter]({% link {{ page.version.version }}/alter-changefeed.md %}), or [cancel](#cancel-a-changefeed) a changefeed job. + +To get started with changefeeds, refer to the [Create and Configure Changefeeds]({% link {{ page.version.version }}/create-and-configure-changefeeds.md %}) page for important usage considerations. For details on how changefeeds emit messages, refer to the [Changefeed Messages]({% link {{ page.version.version }}/changefeed-messages.md %}) page. The [examples](#examples) on this page provide the foundational syntax of the `CREATE CHANGEFEED` statement. For examples on more specific use cases with changefeeds, refer to the following pages: @@ -130,7 +132,7 @@ Option | Value | Description `lagging_ranges_threshold` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Set a duration from the present that determines the length of time a range is considered to be lagging behind, which will then track in the [`lagging_ranges`]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#lagging-ranges-metric) metric. Note that ranges undergoing an [initial scan](#initial-scan) for longer than the threshold duration are considered to be lagging. Starting a changefeed with an initial scan on a large table will likely increment the metric for each range in the table. As ranges complete the initial scan, the number of ranges lagging behind will decrease.

**Default:** `3m` `lagging_ranges_polling_interval` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Set the interval rate for when lagging ranges are checked and the `lagging_ranges` metric is updated. Polling adds latency to the `lagging_ranges` metric being updated. For example, if a range falls behind by 3 minutes, the metric may not update until an additional minute afterward.

**Default:** `1m` `metrics_label` | [`STRING`]({% link {{ page.version.version }}/string.md %}) | Define a metrics label to which the metrics for one or multiple changefeeds increment. All changefeeds also have their metrics aggregated.

The maximum length of a label is 128 bytes. There is a limit of 1024 unique labels.

`WITH metrics_label=label_name`

For more detail on usage and considerations, see [Using changefeed metrics labels]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#using-changefeed-metrics-labels). -`min_checkpoint_frequency` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Controls how often a node's changefeed [aggregator]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) will flush their progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). A node's changefeed aggregator will wait at least the specified duration between sending progress updates for the ranges it is watching to the coordinator. This can help you control the flush frequency of higher latency sinks to achieve better throughput. However, more frequent checkpointing can increase CPU usage. If this is set to `0s`, a node will flush messages as long as the high-water mark has increased for the ranges that particular node is processing. If a changefeed is resumed, then `min_checkpoint_frequency` is the amount of time that changefeed will need to catch up. That is, it could emit [duplicate messages]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages) during this time.

**Note:** [`resolved`](#resolved) messages will not be emitted more frequently than the configured `min_checkpoint_frequency` (but may be emitted less frequently). If you require `resolved` messages more frequently than `30s`, you must configure `min_checkpoint_frequency` to at least the desired `resolved` message frequency. For more details, refer to [Resolved message frequency]({% link {{ page.version.version }}/changefeed-messages.md %}#resolved-timestamp-frequency).

**Default:** `30s` +`min_checkpoint_frequency` | [Duration string](https://pkg.go.dev/time#ParseDuration) | Controls how often a node's changefeed [aggregator]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) will flush its progress to the [coordinating changefeed node]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). A node's changefeed aggregator will wait at least the specified duration between sending progress updates for the ranges it is watching to the coordinator. This can help you control the flush frequency of higher-latency sinks to achieve better throughput. However, more frequent checkpointing can increase CPU usage. If this is set to `0s`, a node will flush messages as long as the high-water mark has increased for the ranges that particular node is processing. If a changefeed is resumed, then `min_checkpoint_frequency` is the amount of time that the changefeed will need to catch up. That is, it could emit [duplicate messages]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages) during this time.

**Note:** [`resolved`](#resolved) messages will not be emitted more frequently than the configured `min_checkpoint_frequency` (but may be emitted less frequently). If you require `resolved` messages more frequently than `30s`, you must configure `min_checkpoint_frequency` to at least the desired `resolved` message frequency. For more details, refer to [Resolved message frequency]({% link {{ page.version.version }}/changefeed-messages.md %}#resolved-timestamp-frequency).

**Default:** `30s` `mvcc_timestamp` | N/A | Include the [MVCC]({% link {{ page.version.version }}/architecture/storage-layer.md %}#mvcc) timestamp for each emitted row in a changefeed. With the `mvcc_timestamp` option, each emitted row will always contain its MVCC timestamp, even during the changefeed's initial backfill. `on_error` | `pause` / `fail` | Use `on_error=pause` to pause the changefeed when encountering **non**-retryable errors. `on_error=pause` will pause the changefeed instead of sending it into a terminal failure state. **Note:** Retryable errors will continue to be retried with this option specified.

Use with [`protect_data_from_gc_on_pause`](#protect-data-from-gc-on-pause) to protect changes from [garbage collection]({% link {{ page.version.version }}/configure-replication-zones.md %}#gc-ttlseconds).

If a changefeed with `on_error=pause` is running when a watched table is [truncated]({% link {{ page.version.version }}/truncate.md %}), the changefeed will pause but will not be able to resume reads from that table. Using [`ALTER CHANGEFEED`]({% link {{ page.version.version }}/alter-changefeed.md %}) to drop the table from the changefeed and then [resuming the job]({% link {{ page.version.version }}/resume-job.md %}) will work, but you cannot add the same table to the changefeed again. Instead, you will need to [create a new changefeed](#start-a-new-changefeed-where-another-ended) for that table.

Default: `on_error=fail` `protect_data_from_gc_on_pause` | N/A | This option is deprecated as of v23.2 and will be removed in a future release.

When a [changefeed is paused]({% link {{ page.version.version }}/pause-job.md %}), ensure that the data needed to [resume the changefeed]({% link {{ page.version.version }}/resume-job.md %}) is not garbage collected. If `protect_data_from_gc_on_pause` is **unset**, pausing the changefeed will release the existing protected timestamp records. It is also important to note that pausing and adding `protect_data_from_gc_on_pause` to a changefeed will not protect data if the [garbage collection]({% link {{ page.version.version }}/configure-replication-zones.md %}#gc-ttlseconds) window has already passed.

Use with [`on_error=pause`](#on-error) to protect changes from garbage collection when encountering non-retryable errors.

Refer to [Protect Changefeed Data from Garbage Collection]({% link {{ page.version.version }}/protect-changefeed-data.md %}) for more detail on protecting changefeed data.

**Note:** If you use this option, changefeeds that are left paused for long periods of time can prevent garbage collection. Use with the [`gc_protect_expires_after`](#gc-protect-expires-after) option to set a limit for protected data and for how long a changefeed will remain paused. @@ -237,7 +239,7 @@ CREATE CHANGEFEED INTO 'scheme://host:port' WHERE status = 'lost'; ~~~ -CDC queries can only run on a single table per changefeed and require an {{ site.data.products.enterprise }} license. +CDC queries can only run on a single table per changefeed. ### Create a sinkless changefeed @@ -249,8 +251,6 @@ CREATE CHANGEFEED FOR TABLE table_name, table_name2, table_name3 WITH updated, resolved; ~~~ -Sinkless changefeeds do not require an {{ site.data.products.enterprise }} license; however, a sinkless changefeed with CDC queries **does** require an {{ site.data.products.enterprise }} license. - To create a sinkless changefeed using CDC queries: {% include_cached copy-clipboard.html %} @@ -295,7 +295,7 @@ For guidance on how to filter changefeed messages to emit [row-level TTL]({% lin ### Manage a changefeed - For {{ site.data.products.enterprise }} changefeeds, use [`SHOW CHANGEFEED JOBS`]({% link {{ page.version.version }}/show-jobs.md %}) to check the status of your changefeed jobs: +For changefeed jobs, use [`SHOW CHANGEFEED JOBS`]({% link {{ page.version.version }}/show-jobs.md %}) to check the status of your changefeed jobs: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v24.3/export-data-with-changefeeds.md b/src/current/v24.3/export-data-with-changefeeds.md index ae5af42ad1b..fd8923d2235 100644 --- a/src/current/v24.3/export-data-with-changefeeds.md +++ b/src/current/v24.3/export-data-with-changefeeds.md @@ -5,15 +5,15 @@ toc: true docs_area: stream_data --- -When you create an {{ site.data.products.enterprise }} changefeed, you can include the [`initial_scan = 'only'`]({% link {{ page.version.version }}/create-changefeed.md 
%}#initial-scan) option to specify that the changefeed should only complete a table scan. The changefeed emits messages for the table scan and then the job completes with a `succeeded` status. As a result, you can create a changefeed with `initial_scan = 'only'` to [`EXPORT`]({% link {{ page.version.version }}/export.md %}) data out of your database. +When you create a changefeed, you can include the [`initial_scan = 'only'`]({% link {{ page.version.version }}/create-changefeed.md %}#initial-scan) option to specify that the changefeed should only complete a table scan. The changefeed emits messages for the table scan and then the job completes with a `succeeded` status. As a result, you can create a changefeed with `initial_scan = 'only'` to [`EXPORT`]({% link {{ page.version.version }}/export.md %}) data out of your database. -You can also [schedule a changefeed](#create-a-scheduled-changefeed-to-export-filtered-data) to use a changefeed initial scan for exporting data on a regular cadence. +You can also [schedule a changefeed](#create-a-scheduled-changefeed-to-export-filtered-data) that emits messages to a downstream sink, which allows you to use a changefeed initial scan to export data on a regular cadence. The benefits of using changefeeds for this use case instead of [export]({% link {{ page.version.version }}/export.md %}) include: - Changefeeds are jobs, which can be [paused]({% link {{ page.version.version }}/pause-job.md %}), [resumed]({% link {{ page.version.version }}/resume-job.md %}), [cancelled]({% link {{ page.version.version }}/cancel-job.md %}), [scheduled]({% link {{ page.version.version }}/create-schedule-for-changefeed.md %}), and [altered]({% link {{ page.version.version }}/alter-changefeed.md %}).
- There is observability into a changefeed job using [`SHOW CHANGEFEED JOBS`]({% link {{ page.version.version }}/show-jobs.md %}#show-changefeed-jobs) and the [Changefeeds Dashboard]({% link {{ page.version.version }}/ui-cdc-dashboard.md %}) in the DB Console. -- Changefeed jobs have built-in [checkpointing]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) and [retries]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#changefeed-retry-errors). +- Changefeed jobs have built-in [checkpointing]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) and [retries]({% link {{ page.version.version }}/monitor-and-debug-changefeeds.md %}#changefeed-retry-errors). - [Changefeed sinks]({% link {{ page.version.version }}/changefeed-sinks.md %}) provide additional endpoints for your data. - You can use the [`format=csv`]({% link {{ page.version.version }}/create-changefeed.md %}#format) option with `initial_scan= 'only'` to emit messages in CSV format. diff --git a/src/current/v24.3/how-does-an-enterprise-changefeed-work.md b/src/current/v24.3/how-does-a-changefeed-work.md similarity index 76% rename from src/current/v24.3/how-does-an-enterprise-changefeed-work.md rename to src/current/v24.3/how-does-a-changefeed-work.md index 566f6a15e48..122a8bed214 100644 --- a/src/current/v24.3/how-does-an-enterprise-changefeed-work.md +++ b/src/current/v24.3/how-does-a-changefeed-work.md @@ -5,11 +5,11 @@ toc: true docs_area: stream_data --- -When an {{ site.data.products.enterprise }} changefeed is started on a node, that node becomes the _coordinator_ for the changefeed job (**Node 2** in the diagram). The coordinator node acts as an administrator: keeping track of all other nodes during job execution and the changefeed work as it completes. The changefeed job will run across nodes in the cluster to access changed data in the watched table. 
The job will evenly distribute changefeed work across the cluster by assigning it to any [replica]({% link {{ page.version.version }}/architecture/replication-layer.md %}) for a particular range, which determines the node that will emit the changefeed data. If a [locality filter]({% link {{ page.version.version }}/changefeeds-in-multi-region-deployments.md %}#run-a-changefeed-job-by-locality) is specified, work is distributed to a node from those that match the locality filter and has the most locality tiers in common with a node that has a replica. +When a changefeed that emits changes to a sink is started on a node, that node becomes the _coordinator_ for the changefeed job (**Node 2** in the diagram). The coordinator node acts as an administrator: keeping track of all other nodes during job execution and the changefeed work as it completes. The changefeed job will run across nodes in the cluster to access changed data in the watched table. The job will evenly distribute changefeed work across the cluster by assigning it to any [replica]({% link {{ page.version.version }}/architecture/replication-layer.md %}) for a particular range, which determines the node that will emit the changefeed data. If a [locality filter]({% link {{ page.version.version }}/changefeeds-in-multi-region-deployments.md %}#run-a-changefeed-job-by-locality) is specified, work is distributed to a node that matches the locality filter and has the most locality tiers in common with a node that holds a replica. Each node uses its _aggregator processors_ to send back checkpoint progress to the coordinator, which gathers this information to update the _high-water mark timestamp_. The high-water mark acts as a checkpoint for the changefeed’s job progress, and guarantees that all changes before (or at) the timestamp have been emitted.
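The high-water mark described here is surfaced in the `high_water_timestamp` column of the changefeed's job record; a quick sketch (the job ID below is a placeholder):

{% include_cached copy-clipboard.html %}
~~~ sql
SHOW CHANGEFEED JOB 1234567890987654321;
-- The high_water_timestamp column reports the checkpoint: all changes
-- before (or at) this timestamp have been emitted.
~~~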
In the unlikely event that the changefeed’s coordinating node fails during the job, that role moves to a different node and the changefeed restarts from the last checkpoint. If restarted, the changefeed may [re-emit messages]({% link {{ page.version.version }}/changefeed-messages.md %}#duplicate-messages) from the high-water mark time to the current time. Refer to [Ordering Guarantees]({% link {{ page.version.version }}/changefeed-messages.md %}#ordering-and-delivery-guarantees) for detail on CockroachDB's at-least-once delivery guarantee and how per-key message ordering is applied. -Changefeed process in a 3-node cluster +Changefeed process in a 3-node cluster With [`resolved`]({% link {{ page.version.version }}/create-changefeed.md %}#resolved) specified when a changefeed is started, the coordinator will send the resolved timestamp (i.e., the high-water mark) to each endpoint in the sink. For example, when using [Kafka]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka) this will be sent as a message to each partition; for [cloud storage]({% link {{ page.version.version }}/changefeed-sinks.md %}#cloud-storage-sink), this will be emitted as a resolved timestamp file. diff --git a/src/current/v24.3/monitor-and-debug-changefeeds.md b/src/current/v24.3/monitor-and-debug-changefeeds.md index bab92db1b17..af946d87d0c 100644 --- a/src/current/v24.3/monitor-and-debug-changefeeds.md +++ b/src/current/v24.3/monitor-and-debug-changefeeds.md @@ -6,7 +6,7 @@ docs_area: stream_data --- {{site.data.alerts.callout_info}} -Monitoring is only available for [{{ site.data.products.enterprise }} changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}#stream-row-level-changes-with-changefeeds).
+Monitoring is only available for [changefeeds]({% link {{ page.version.version }}/change-data-capture-overview.md %}#stream-row-level-changes-with-changefeeds) that emit messages to a [sink]({% link {{ page.version.version }}/changefeed-sinks.md %}). {{site.data.alerts.end}} Changefeeds work as jobs in CockroachDB, which allows for [monitoring](#monitor-a-changefeed) and [debugging](#debug-a-changefeed) through the [DB Console]({% link {{ page.version.version }}/ui-overview.md %}) [**Jobs**]({% link {{ page.version.version }}/ui-jobs-page.md %}) page and [`SHOW JOBS`]({% link {{ page.version.version }}/show-jobs.md %}) SQL statements using the job ID. @@ -28,7 +28,7 @@ We recommend monitoring changefeeds with [Prometheus]({% link {{ page.version.ve ## Monitor a changefeed -Changefeed progress is exposed as a high-water timestamp that advances as the changefeed progresses. This is a guarantee that all changes before or at the timestamp have been emitted. You can monitor a changefeed: +Changefeed progress is exposed as a [high-water timestamp]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) that advances as the changefeed progresses. This is a guarantee that all changes before or at the timestamp have been emitted. You can monitor a changefeed: - On the [**Changefeeds** dashboard]({% link {{ page.version.version }}/ui-cdc-dashboard.md %}) of the DB Console. - On the [**Jobs** page]({% link {{ page.version.version }}/ui-jobs-page.md %}) of the DB Console. Hover over the high-water timestamp to view the [system time]({% link {{ page.version.version }}/as-of-system-time.md %}). @@ -76,10 +76,6 @@ If you are running a changefeed with the [`confluent_schema_registry`]({% link { ### Using changefeed metrics labels -{{site.data.alerts.callout_info}} -An {{ site.data.products.enterprise }} license is required to use metrics labels in changefeeds. 
-{{site.data.alerts.end}} - {% include {{ page.version.version }}/cdc/metrics-labels.md %} To start a changefeed with a metrics label, set the following cluster setting to `true`: @@ -136,7 +132,7 @@ changefeed_emitted_bytes{scope="vehicles"} 183557 | Metric | Description | Unit | Type -------------------+--------------+------+-------------------------------------------- `changefeed.admit_latency` | Difference between the event's MVCC timestamp and the time the event is put into the memory buffer. | Nanoseconds | Histogram -`changefeed.aggregator_progress` | The earliest timestamp up to which any [aggregator]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) is guaranteed to have emitted all values for which it is responsible. **Note:** This metric may regress when a changefeed restarts due to a transient error. Consider tracking the `changefeed.checkpoint_progress` metric, which will not regress. | Timestamp | Gauge +`changefeed.aggregator_progress` | The earliest timestamp up to which any [aggregator]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) is guaranteed to have emitted all values for which it is responsible. **Note:** This metric may regress when a changefeed restarts due to a transient error. Consider tracking the `changefeed.checkpoint_progress` metric, which will not regress. | Timestamp | Gauge `changefeed.backfill_count` | Number of changefeeds currently executing a backfill ([schema change]({% link {{ page.version.version }}/changefeed-messages.md %}#schema-changes) or initial scan). | Changefeeds | Gauge `changefeed.backfill_pending_ranges` | Number of [ranges]({% link {{ page.version.version }}/architecture/overview.md %}#architecture-range) in an ongoing backfill that are yet to be fully emitted. | Ranges | Gauge `changefeed.checkpoint_hist_nanos` | Time spent checkpointing changefeed progress. 
| Nanoseconds | Histogram @@ -153,7 +149,7 @@ changefeed_emitted_bytes{scope="vehicles"} 183557 `changefeed.message_size_hist` | Distribution in the size of emitted messages. | Bytes | Histogram `changefeed.running` | Number of currently running changefeeds, including sinkless changefeeds. | Changefeeds | Gauge `changefeed.sink_batch_hist_nanos` | Time messages spend batched in the sink buffer before being flushed and acknowledged. | Nanoseconds | Histogram -New in v24.3: `changefeed.total_ranges` | Total number of ranges that are watched by [aggregator processors]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) participating in the changefeed job. `changefeed.total_ranges` shares the same polling interval as the [`changefeed.lagging_ranges`](#lagging-ranges-metric) metric, which is controlled by the `lagging_ranges_polling_interval` option. For more details, refer to [Lagging ranges](#lagging-ranges). +New in v24.3: `changefeed.total_ranges` | Total number of ranges that are watched by [aggregator processors]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) participating in the changefeed job. `changefeed.total_ranges` shares the same polling interval as the [`changefeed.lagging_ranges`](#lagging-ranges-metric) metric, which is controlled by the `lagging_ranges_polling_interval` option. For more details, refer to [Lagging ranges](#lagging-ranges). ### Monitoring and measuring changefeed latency @@ -196,7 +192,7 @@ If your changefeed is experiencing elevated latency, you can use these metrics t ### Using logs -For {{ site.data.products.enterprise }} changefeeds, [use log information]({% link {{ page.version.version }}/logging-overview.md %}) to debug connection issues (i.e., `kafka: client has run out of available brokers to talk to (Is your cluster reachable?)`). 
Debug by looking for lines in the logs with `[kafka-producer]` in them: +For changefeeds, [use log information]({% link {{ page.version.version }}/logging-overview.md %}) to debug connection issues (i.e., `kafka: client has run out of available brokers to talk to (Is your cluster reachable?)`). Debug by looking for lines in the logs with `[kafka-producer]` in them: ~~~ I190312 18:56:53.535646 585 vendor/github.com/Shopify/sarama/client.go:123 [kafka-producer] Initializing new client @@ -208,7 +204,7 @@ I190312 18:56:53.537686 585 vendor/github.com/Shopify/sarama/client.go:170 [kaf ### Using `SHOW CHANGEFEED JOBS` - For {{ site.data.products.enterprise }} changefeeds, use `SHOW CHANGEFEED JOBS` to check the status of your changefeed jobs: +For changefeeds, use `SHOW CHANGEFEED JOBS` to check the status of your changefeed jobs: {% include_cached copy-clipboard.html %} ~~~ sql diff --git a/src/current/v24.3/protect-changefeed-data.md b/src/current/v24.3/protect-changefeed-data.md index 49afa1eb6c5..634078e8c14 100644 --- a/src/current/v24.3/protect-changefeed-data.md +++ b/src/current/v24.3/protect-changefeed-data.md @@ -5,7 +5,7 @@ toc: true docs_area: stream_data --- -By default, [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) will protect changefeed data from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) up to the time of the [_checkpoint_]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}). +By default, [protected timestamps]({% link {{ page.version.version }}/architecture/storage-layer.md %}#protected-timestamps) will protect changefeed data from [garbage collection]({% link {{ page.version.version }}/architecture/storage-layer.md %}#garbage-collection) up to the time of the [_checkpoint_]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}). 
Protected timestamps will protect changefeed data from garbage collection in the following scenarios: diff --git a/src/current/v24.3/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md b/src/current/v24.3/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md index b2ef265d16b..07bbf48431b 100644 --- a/src/current/v24.3/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md +++ b/src/current/v24.3/stream-a-changefeed-to-a-confluent-cloud-kafka-cluster.md @@ -5,7 +5,7 @@ toc: true docs_area: stream_data --- -CockroachDB {{ site.data.products.enterprise }} changefeeds can stream change data out to [Apache Kafka](https://kafka.apache.org/) with different [configuration settings]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka-sink-configuration) and [options]({% link {{ page.version.version }}/create-changefeed.md %}). [Confluent Cloud](https://www.confluent.io/confluent-cloud/) provides a fully managed service for running Apache Kafka as well as the [Confluent Cloud Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html). +CockroachDB changefeeds can stream change data out to [Apache Kafka](https://kafka.apache.org/) with different [configuration settings]({% link {{ page.version.version }}/changefeed-sinks.md %}#kafka-sink-configuration) and [options]({% link {{ page.version.version }}/create-changefeed.md %}). [Confluent Cloud](https://www.confluent.io/confluent-cloud/) provides a fully managed service for running Apache Kafka as well as the [Confluent Cloud Schema Registry](https://docs.confluent.io/platform/current/schema-registry/index.html). A schema registry is a repository for schemas, which allows you to share and manage schemas between different services. Confluent Cloud Schema Registries map to Kafka topics in your Confluent Cloud environment. @@ -248,18 +248,6 @@ To create your changefeed, you'll prepare your CockroachDB cluster with the `mov cockroach sql --url {"CONNECTION STRING"} ~~~ -1. 
Set your organization name and [{{ site.data.products.enterprise }} license]({% link {{ page.version.version }}/licensing-faqs.md %}#types-of-licenses) key: - - {% include_cached copy-clipboard.html %} - ~~~sql - SET CLUSTER SETTING cluster.organization = ''; - ~~~ - - {% include_cached copy-clipboard.html %} - ~~~sql - SET CLUSTER SETTING enterprise.license = ''; - ~~~ - 1. Before you can create an {{ site.data.products.enterprise }} changefeed, it is necessary to enable rangefeeds on your cluster: {% include_cached copy-clipboard.html %} @@ -322,7 +310,7 @@ You can also [create external connections]({% link {{ page.version.version }}/cr CREATE CHANGEFEED FOR TABLE users INTO "external://kafka" WITH updated, format = avro, confluent_schema_registry = "external://confluent_registry"; ~~~ - See [Options]({% link {{ page.version.version }}/create-changefeed.md %}#options) for a list of all available Enterprise changefeed options. + See [Options]({% link {{ page.version.version }}/create-changefeed.md %}#options) for a list of all available changefeed options. {{site.data.alerts.callout_success}} {% include {{ page.version.version }}/cdc/schema-registry-metric.md %} diff --git a/src/current/v24.3/ui-cdc-dashboard.md b/src/current/v24.3/ui-cdc-dashboard.md index 08ca950ecf9..e2c0ee5131b 100644 --- a/src/current/v24.3/ui-cdc-dashboard.md +++ b/src/current/v24.3/ui-cdc-dashboard.md @@ -73,7 +73,7 @@ Metric | Description ## Max Checkpoint Latency -This graph displays the most any changefeed's persisted [checkpoint]({% link {{ page.version.version }}/how-does-an-enterprise-changefeed-work.md %}) is behind the present time. Larger values indicate issues with successfully ingesting or emitting changes. If errors cause a changefeed to restart, or the changefeed is [paused]({% link {{ page.version.version }}/pause-job.md %}) and unpaused, emitted data up to the last checkpoint may be re-emitted. 
+This graph displays the maximum amount of time by which any changefeed's persisted [checkpoint]({% link {{ page.version.version }}/how-does-a-changefeed-work.md %}) lags behind the present time. Larger values indicate issues with successfully ingesting or emitting changes. If errors cause a changefeed to restart, or the changefeed is [paused]({% link {{ page.version.version }}/pause-job.md %}) and unpaused, emitted data up to the last checkpoint may be re-emitted. {{site.data.alerts.callout_info}} In v23.1 and earlier, the **Max Checkpoint Latency** graph was named **Max Changefeed Latency**. If you want to customize charts, including how metrics are named, use the [**Custom Chart** debug page]({% link {{ page.version.version }}/ui-custom-chart-debug-page.md %}).
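The checkpoint lag shown on this graph can also be approximated from SQL. A minimal sketch, assuming the `high_water_timestamp` column returned by `SHOW CHANGEFEED JOBS` is a decimal HLC timestamp in nanoseconds since the Unix epoch:

~~~ sql
-- Illustrative only: estimate how far each changefeed's checkpoint
-- (high-water timestamp) lags behind the current time, in seconds.
SELECT job_id,
       status,
       extract(epoch FROM now()) - high_water_timestamp::FLOAT / 1e9 AS lag_seconds
FROM [SHOW CHANGEFEED JOBS];
~~~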
| | Sinkless changefeeds | Changefeeds |
|----------------------|----------------------|-------------|
| Product availability | All products | All products |
| Message delivery | Streams indefinitely until the underlying SQL connection is closed. | Maintains a connection to the configured sink: Amazon S3, Azure Event Hubs, Azure Storage, Confluent Cloud, Google Cloud Pub/Sub, Google Cloud Storage, HTTP, Kafka, Webhook. |
| SQL statement | Create with `CREATE CHANGEFEED FOR TABLE table_name;` | Create with `CREATE CHANGEFEED FOR TABLE table_name INTO 'sink';` |
| Filter change data | Use CDC queries to define the emitted change data. | Use CDC queries to define the emitted change data. |
| Message format | Emits every change to a "watched" row as a record to the current SQL session. | Emits every change to a "watched" row as a record in a configurable format: JSON, CSV, Avro, Parquet. |
| Management | Create the changefeed and cancel by closing the SQL connection. | Manage the changefeed with `CREATE`, `PAUSE`, `RESUME`, `ALTER`, and `CANCEL`. |
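The two creation forms compared above can be sketched as follows (the table and sink address are placeholders):

~~~ sql
-- Sinkless: rows stream back over the current SQL session until it is closed.
CREATE CHANGEFEED FOR TABLE movr.users;

-- Sink-backed: a long-running job that emits records to the configured sink.
CREATE CHANGEFEED FOR TABLE movr.users
  INTO 'kafka://localhost:9092'
  WITH updated, resolved;
~~~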