Create chunk-size-calculation.md #2874

Open: wants to merge 26 commits into base: master (showing changes from 20 commits)

Commits
3335413
Create chunk-size-calculation.md
gerner-spryker Oct 25, 2024
456bbd0
Update chunk-size-calculation.md
gerner-spryker Oct 25, 2024
d288115
Update chunk-size-calculation.md
gerner-spryker Oct 25, 2024
d514e15
Update chunk-size-calculation.md
gerner-spryker Oct 25, 2024
626e70d
Update queue.md
gerner-spryker Oct 25, 2024
fdb0dae
Update queue.md
gerner-spryker Oct 25, 2024
27497a7
Create basic-chunk-size-calculation.md
gerner-spryker Oct 25, 2024
e550f46
Update queue.md
gerner-spryker Oct 25, 2024
b33dbc9
Update chunk-size-calculation.md
gerner-spryker Oct 25, 2024
6eaa48d
Update basic-chunk-size-calculation.md
gerner-spryker Oct 25, 2024
e70e817
Create advanced-chunk-size-calculation.md
gerner-spryker Oct 28, 2024
5eb3f5d
Update basic-chunk-size-calculation.md
gerner-spryker Oct 28, 2024
70263e7
Update basic-chunk-size-calculation.md
gerner-spryker Oct 28, 2024
382e7be
Update advanced-chunk-size-calculation.md
gerner-spryker Oct 28, 2024
3873b5c
Update basic-chunk-size-calculation.md
gerner-spryker Oct 28, 2024
b37ebe8
Update advanced-chunk-size-calculation.md
gerner-spryker Oct 28, 2024
67996e8
Update chunk-size-calculation.md
gerner-spryker Oct 28, 2024
fd41ee1
Update basic-chunk-size-calculation.md
gerner-spryker Oct 28, 2024
935bc8b
Update advanced-chunk-size-calculation.md
gerner-spryker Oct 28, 2024
365c0da
Create expert-chunk-size-calculation.md
gerner-spryker Oct 28, 2024
ae480d3
Update expert-chunk-size-calculation.md
gerner-spryker Oct 29, 2024
1fb17d3
Merging chunk-size-calculator documents
gerner-spryker Oct 29, 2024
a5bdb53
Update chunk-size-calculation.md
gerner-spryker Oct 29, 2024
6510679
Update chunk-size-calculation.md
gerner-spryker Oct 29, 2024
b42e6be
Update chunk-size-calculation.md
gerner-spryker Oct 30, 2024
15bd7f2
Update chunk-size-calculation.md
gerner-spryker Oct 30, 2024
@@ -0,0 +1,51 @@
---
title: Advanced Chunk Size Calculation
description: Gives an overview of the advanced chunk size calculation
last_updated: Oct 25, 2024
template: concept-topic-template
redirect_from:
  - /docs/dg/dev/backend-development/data-manipulation/queue/chunk-size-calculation.html
related:
  - title: Basic Chunk Size Calculation
    link: docs/dg/dev/backend-development/data-manipulation/queue/basic-chunk-size-calculation.html
  - title: Expert Chunk Size Calculation
    link: docs/dg/dev/backend-development/data-manipulation/queue/expert-chunk-size-calculation.html
  - title: Queue
    link: docs/dg/dev/backend-development/data-manipulation/queue/queue.html

---

## Advanced Chunk Size Calculation

The **Advanced Chunk Size Calculator** builds upon the Basic level by letting developers fine-tune chunk sizes based on custom hardware limitations and custom performance metrics.

The **Advanced Chunk Size Calculator** is available [here](link to google spreadsheet).

### Problem Overview

While the **Basic Chunk Size Calculator** offers a starting point, it doesn’t account for the nuances of resource allocations (like container sizes, CPU, and memory limits) or performance-sensitive variables such as application warm-up time and event message size. The **Advanced Chunk Size Calculator** addresses these issues by incorporating these additional metrics, enabling a more precise configuration that enhances stability and performance in production environments.

### Input Parameters

To calculate the correct queue chunk sizes, developers must provide the following information based on their specific production environment:

- **Scheduler and Worker Setup**: If your scheduler setup differs from the boilerplate defaults, provide details on any non-standard configurations, such as environments with multiple containers or distinct worker distributions within containers.
- **Resource Configuration**: Provide information on your hardware setup, including instance types and CPU and memory limits for services like Persistence, Storage, Search, or Message Broker, so that the **Advanced Chunk Size Calculator** can optimize chunk sizes based on actual resource availability.
- **Detailed Product Configuration**: Provide specific metrics related to products, the highest-traffic entity. This supports more precise chunk sizing for products without requiring an in-depth understanding of Publish & Synchronize.
- **Event and Message Processing Metrics**: Provide expected event processing metrics, including any deviations from default settings such as message trigger rates, custom application warm-up times, event size limits, and data division rate multipliers, so that the configuration aligns with real-world performance.
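
As a rough illustration of why resource limits matter here, the following Python sketch shows one way a memory limit could cap a chunk size. The function name, its parameters, and the formula itself are assumptions made for illustration; they are not taken from the spreadsheet, whose formulas are not described in this document.

```python
def memory_bound_chunk_size(
    task_memory_limit_mb: float,
    baseline_memory_mb: float,
    memory_per_message_mb: float,
) -> int:
    """Largest chunk that fits into the memory left after application warm-up.

    Hypothetical model: every message in a chunk is assumed to need the same
    amount of memory while it is being processed.
    """
    if memory_per_message_mb <= 0:
        raise ValueError("memory_per_message_mb must be positive")
    available_mb = task_memory_limit_mb - baseline_memory_mb
    if available_mb <= 0:
        return 0  # the task cannot even bootstrap within its limit
    return int(available_mb // memory_per_message_mb)


# Example with made-up numbers: 512 MB limit, 112 MB used after warm-up,
# 0.5 MB per message.
print(memory_bound_chunk_size(512, 112, 0.5))  # 800
```

A measured per-message footprint from the production environment would replace the made-up numbers in practice.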


### Output

Once the required data is entered into the **Basic Chunk Size Calculator** and the **Advanced Chunk Size Calculator**, the calculator computes the optimal chunk size for each queue. Developers then need to configure the queues in the project to match the calculated values.

> For instructions on how to set up chunk sizes for the queues, [click here](https://docs.spryker.com/docs/dg/dev/backend-development/data-manipulation/queue/queue.html#configuration-for-chunk-size).


### Important Notes

- The **Advanced Chunk Size Calculator** lets developers further configure chunk sizes based on custom hardware limitations and custom performance metrics.
- For systems that require individual configuration of queues and detailed customization of message setups, consider using the **Expert Chunk Size Calculator**.
- Always ensure that the chunk sizes provided by the calculator are properly configured to avoid system performance issues.

---

For more detailed information about the different levels of the **Chunk Size Calculator**, see the [overview here](https://docs.spryker.com/docs/dg/dev/backend-development/data-manipulation/queue/chunk-size-calculation.html).
@@ -0,0 +1,54 @@
---
title: Basic Chunk Size Calculation
description: Gives an overview of the basic chunk size calculation
last_updated: Oct 25, 2024
template: concept-topic-template
redirect_from:
  - /docs/dg/dev/backend-development/data-manipulation/queue/chunk-size-calculation.html
related:
  - title: Advanced Chunk Size Calculation
    link: docs/dg/dev/backend-development/data-manipulation/queue/advanced-chunk-size-calculation.html
  - title: Expert Chunk Size Calculation
    link: docs/dg/dev/backend-development/data-manipulation/queue/expert-chunk-size-calculation.html
  - title: Queue
    link: docs/dg/dev/backend-development/data-manipulation/queue/queue.html

---

## Basic Chunk Size Calculation

The **Basic Chunk Size Calculator** is designed to help developers configure the correct chunk sizes for the project based on the traffic and data patterns in the system. This tool simplifies the setup process for out-of-the-box and lightly customized webshops, ensuring that the system can handle high-traffic entities efficiently without over-consuming resources.

The **Basic Chunk Size Calculator** is available [here](link to google spreadsheet).

### Problem Overview

In an e-commerce environment, certain business entities generate a large volume of update events due to frequent refreshes and high data volume. These **high-traffic entities** account for the majority of the traffic within the **publish and synchronize** process. Misconfiguring chunk sizes for these entities can lead to inefficient resource consumption, system lags, or overloads. The **Basic Chunk Size Calculator** offers a straightforward way to address this by determining the appropriate chunk size for each queue based on the production environment’s data profile.

### Input Parameters

To calculate the correct queue chunk sizes, developers must provide the following information based on their specific production environment:

- **High Traffic Entities**: Provide the total count of each high-traffic entity (e.g., products, prices, offers) across all stores, and estimate the daily refresh rate (percentage or count of entities updated daily) for ongoing system operations.
- **Stores and Locales**: Provide the total number of stores in the system and the maximum number of supported locales across all stores, as these factors impact chunk size calculation for data distribution.
> For more information on stores and locales in our system, [click here](https://docs.spryker.com/docs/pbc/all/dynamic-multistore/202410.0/base-shop/dynamic-multistore-feature-overview.html).
- **Publish and Synchronize Setup**: The **publish and synchronize** process handles entity data updates, and the worker setup plays a crucial role in determining how this is managed. Developers need to specify how project workers are set up in relation to stores.
> For more information on workers, tasks, and how they are related to stores, [click here](https://docs.spryker.com/docs/pbc/all/dynamic-multistore/202410.0/base-shop/dynamic-multistore-feature-overview.html).
- **Number of Tasks Per Worker**: Provide the **number of tasks per worker**. This value is essential to calculating how resources are distributed among tasks. Note that there is no additional help or explanation for determining this number, as it is specific to each setup.
> For more information on workers, tasks, and how they are related to stores, [click here](https://docs.spryker.com/docs/pbc/all/dynamic-multistore/202410.0/base-shop/dynamic-multistore-feature-overview.html).
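
To make these inputs more concrete, the Python sketch below shows one way an entity count, a daily refresh rate, and a locale count could combine into a rough daily message volume. The `EntityProfile` class, the function name, and the formula (one message per updated entity and locale) are assumptions for illustration only; the formulas the spreadsheet actually uses are not described in this document.

```python
from dataclasses import dataclass


@dataclass
class EntityProfile:
    """Hypothetical container for the inputs of one high-traffic entity."""

    name: str
    total_count: int           # entities counted across all stores
    daily_refresh_rate: float  # fraction of entities updated per day (0.0 to 1.0)


def estimate_daily_messages(entity: EntityProfile, max_locales: int) -> int:
    """Rough estimate, assuming each updated entity yields one message per locale."""
    updated_per_day = entity.total_count * entity.daily_refresh_rate
    return round(updated_per_day * max_locales)


# Example with made-up numbers: 100,000 products, 10% refreshed daily, 2 locales.
products = EntityProfile(name="product", total_count=100_000, daily_refresh_rate=0.10)
print(estimate_daily_messages(products, max_locales=2))  # 20000
```

An estimate like this only indicates the order of magnitude of traffic a queue has to absorb; the calculator remains the source of the recommended chunk sizes.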

### Output

Once the required data is entered into the **Basic Chunk Size Calculator**, it computes the optimal chunk size for each queue used by the system. These queues handle different business entities, and setting the right queue chunk size ensures efficient processing and resource allocation. Developers then need to configure these queue chunk sizes in the project.

> For instructions on how to set up chunk sizes for the queues, [click here](https://docs.spryker.com/docs/dg/dev/backend-development/data-manipulation/queue/queue.html#configuration-for-chunk-size).

### Important Notes

- The **Basic Chunk Size Calculator** is designed for systems that follow a standard, out-of-the-box configuration. If your system is more customized, consider using the **Advanced** or **Expert Chunk Size Calculator** for fine-tuning.
- This calculator only requires a basic understanding of the system's entity data and store structure. For more complex metrics like memory usage or container performance, the advanced calculators may be necessary.
- Always ensure that the chunk sizes provided by the calculator are properly configured to avoid system performance issues.

---

For more detailed information about the different levels of the **Chunk Size Calculator**, see the [overview here](https://docs.spryker.com/docs/dg/dev/backend-development/data-manipulation/queue/chunk-size-calculation.html).
@@ -0,0 +1,66 @@
---
title: Chunk Size Calculation
description: Describes the challenges and solutions of selecting proper chunk sizes for project requirements
last_updated: Oct 25, 2024
template: concept-topic-template
redirect_from:
  - /docs/dg/dev/backend-development/data-manipulation/queue/queue.html#concepts
related:
  - title: Basic Chunk Size Calculation
    link: docs/dg/dev/backend-development/data-manipulation/queue/basic-chunk-size-calculation.html
  - title: Advanced Chunk Size Calculation
    link: docs/dg/dev/backend-development/data-manipulation/queue/advanced-chunk-size-calculation.html
  - title: Expert Chunk Size Calculation
    link: docs/dg/dev/backend-development/data-manipulation/queue/expert-chunk-size-calculation.html

---

## Chunk Size Calculation

In an e-commerce framework, selecting the correct queue chunk size for processing data is critical for ensuring optimal performance and efficient resource usage. The challenge arises from the diversity of business entities, stores, and locales, each with different memory and CPU requirements for denormalization. Furthermore, the frequency of updates, data sizes, and the specific configurations of various system services (e.g., Redis, Elasticsearch, RabbitMQ) make it difficult to determine the appropriate chunk size for each queue.

Without proper queue chunk size configuration, projects can either overconsume resources, leading to crashes or lag, or underutilize resources, resulting in slow performance. This is where the **Chunk Size Calculator** comes in: it helps developers fine-tune queue chunk sizes based on the specific characteristics of the project.

### What is the Chunk Size Calculator?

The **Chunk Size Calculator** is a tool that helps developers determine the appropriate queue chunk sizes for processing data across different queues in a resource-efficient way. It ensures that memory and CPU usage are optimized and prevents the system from being overwhelmed during data processing tasks. This tool is designed to handle the variability in entity sizes, stores, locales, and update frequencies, giving developers confidence that the project will run smoothly in production environments.

### The Three Levels of the Chunk Size Calculator

The **Chunk Size Calculator** is divided into three levels: **Basic**, **Advanced**, and **Expert**. Each level is designed to accommodate different degrees of project complexity and customization. Below is an overview of each:

#### 1. Basic Chunk Size Calculator

The **Basic Chunk Size Calculator** is designed for small to medium B2C projects with minimal customization. It assumes that the default configuration of business entities, stores, and locales is sufficient, and that the resource consumption patterns are predictable.

With the Basic calculator, developers only need to provide a minimal set of inputs, such as store configuration and high-traffic entity counts. The calculator uses these inputs to recommend chunk sizes for each queue. This is ideal for developers who are working with out-of-the-box setups and need a simple, reliable way to configure the project.

**When to use**: This is the default starting point for any project. If your project has not been heavily customized, this calculator will give you the necessary queue chunk sizes with minimal effort.

Find more details on the [Basic Chunk Size Calculation](https://docs.spryker.com/docs/dg/dev/backend-development/data-manipulation/queue/basic-chunk-size-calculation.html) page.

#### 2. Advanced Chunk Size Calculator

The **Advanced Chunk Size Calculator** builds upon the Basic level, requiring developers to have a deeper understanding of the services that make up the project. In addition to understanding the basic chunk size concepts, developers need to account for services such as Persistence, Storage, Search, Message Broker, and Scheduler.


The Advanced calculator addresses resource allocations or performance-sensitive variables, enabling a more precise configuration that enhances stability and performance in production environments. While the calculator provides recommendations, developers will need to input detailed configuration data to fine-tune the project.

**When to use**: The Advanced calculator is suited for projects that have been moderately customized or when developers need more precise control over performance and resource usage.

Find more details on the [Advanced Chunk Size Calculation](https://docs.spryker.com/docs/dg/dev/backend-development/data-manipulation/queue/advanced-chunk-size-calculation.html) page.

#### 3. Expert Chunk Size Calculator

The **Expert Chunk Size Calculator** is designed for highly customized projects where one or more entities deviate significantly from the norm. This includes scenarios where entities are unusually large or where the project handles massive data volumes that require frequent updates.

In this case, developers need a deep understanding of how the project’s components interact, including the scheduler’s workers and the memory distribution across workers and tasks. The Expert calculator gives full visibility into all performance metrics, allowing developers to tweak each queue individually.

Despite the complexity, the calculator still ensures that the entire system remains balanced, preventing one queue from consuming too many resources at the expense of others.

**When to use**: The Expert calculator is reserved for production environments with complex, heavily customized setups that require fine-tuning of every performance metric.

Find more details on the [Expert Chunk Size Calculation](https://docs.spryker.com/docs/dg/dev/backend-development/data-manipulation/queue/expert-chunk-size-calculation.html) page.

### Summary

The **Chunk Size Calculator** provides developers with a powerful tool for optimizing their system’s resource consumption. Each level of the calculator is designed to address different use cases, from simple out-of-the-box configurations to highly customized, complex environments. Developers are encouraged to start with the **Basic Chunk Size Calculator** and, if necessary, progress to the **Advanced** or **Expert** calculators as the complexity of their project grows.
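
The "when to use" guidance above can be condensed into a small decision helper. The Python sketch below is only an illustration of that guidance; the enum, the function, and the two boolean inputs are hypothetical names and a simplification of the criteria described on this page.

```python
from enum import Enum


class CalculatorLevel(Enum):
    BASIC = "Basic"
    ADVANCED = "Advanced"
    EXPERT = "Expert"


def recommend_level(
    custom_resources_or_metrics: bool,
    heavily_customized_entities: bool,
) -> CalculatorLevel:
    """Pick a calculator level from two simplified, hypothetical criteria."""
    if heavily_customized_entities:
        # Entities that deviate significantly from the norm need per-queue tuning.
        return CalculatorLevel.EXPERT
    if custom_resources_or_metrics:
        # Custom hardware limits or performance metrics call for the Advanced level.
        return CalculatorLevel.ADVANCED
    # Default starting point for any project.
    return CalculatorLevel.BASIC


print(recommend_level(custom_resources_or_metrics=True, heavily_customized_entities=False).value)
```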
@@ -0,0 +1,63 @@
---
title: Expert Chunk Size Calculation
description: Gives an overview of the expert chunk size calculation
last_updated: Oct 25, 2024
template: concept-topic-template
redirect_from:
  - /docs/dg/dev/backend-development/data-manipulation/queue/chunk-size-calculation.html
related:
  - title: Basic Chunk Size Calculation
    link: docs/dg/dev/backend-development/data-manipulation/queue/basic-chunk-size-calculation.html
  - title: Advanced Chunk Size Calculation
    link: docs/dg/dev/backend-development/data-manipulation/queue/advanced-chunk-size-calculation.html
  - title: Queue
    link: docs/dg/dev/backend-development/data-manipulation/queue/queue.html

---

## Expert Chunk Size Calculation

### Overview

The **Expert Chunk Size Calculator** is designed for developers working with heavily customized entities within the project. Whether it's a single entity or multiple entities, when these entities are significantly customized in terms of size, relationships, or data complexity, this calculator provides the granular control needed to fine-tune each queue's performance.

As the complexity of an entity increases, so does the denormalization time, which can slow down the entire system. Developers using the **Expert Chunk Size Calculator** must have a solid understanding of how containerization works in the project, how resources like memory and CPU are distributed among containers, workers, and tasks, as well as the limitations of the receiving side services (such as search and storage) and the provider systems (like the database). The message broker, which delivers messages and imposes throughput limits, is also a critical component of the overall system architecture that needs to be considered.

The **Expert Chunk Size Calculator** is available [here](link to google spreadsheet).

### Problem Statement

In highly customized systems, Basic or Advanced queue chunk size configurations may not suffice. Complex entities with large data sets and relationships demand more fine-tuned control over how tasks are processed, how resources are allocated, and how messages are handled. The **Expert Chunk Size Calculator** is needed to provide detailed, queue-by-queue configuration for developers who need to optimize the project's performance under these conditions.

### Input Parameters

The **Expert Chunk Size Calculator** requires a wide range of detailed inputs to properly configure chunk sizes. Developers need to provide in-depth information about the production environment, including:

- **Entity Customization**: The size and cardinality of the entities, which affect how much memory and CPU are consumed during the denormalization process.
- **Message Handling**: Specific configuration data regarding the size of messages that will be processed by the system and the limits imposed by the message broker and receiving systems.

The expert calculator offers the ability to set individual performance and resource consumption metrics for each queue, making it possible to precisely optimize the entire **publish and synchronize** process.
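
To illustrate what per-queue tuning could look like, the Python sketch below derives a chunk size for each queue from two limits: the memory available to a task and a maximum batch payload accepted by the message broker or receiving system. The `QueueProfile` class, the function, the queue names, and the formula are all assumptions for illustration; they do not reproduce the spreadsheet's calculations.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class QueueProfile:
    """Hypothetical per-queue metrics."""

    memory_per_message_mb: float  # memory needed to denormalize one message
    message_size_kb: float        # serialized size of one message


def expert_chunk_sizes(
    queues: Dict[str, QueueProfile],
    task_memory_budget_mb: float,
    max_batch_payload_kb: float,
) -> Dict[str, int]:
    """Take the stricter of the memory bound and the payload bound per queue."""
    sizes = {}
    for name, profile in queues.items():
        memory_bound = int(task_memory_budget_mb // profile.memory_per_message_mb)
        payload_bound = int(max_batch_payload_kb // profile.message_size_kb)
        # Never go below one message per chunk.
        sizes[name] = max(1, min(memory_bound, payload_bound))
    return sizes


# Illustrative queue names and made-up numbers.
queues = {
    "publish.product": QueueProfile(memory_per_message_mb=2.0, message_size_kb=64),
    "publish.price": QueueProfile(memory_per_message_mb=0.25, message_size_kb=4),
}
print(expert_chunk_sizes(queues, task_memory_budget_mb=400, max_batch_payload_kb=10_240))
# {'publish.product': 160, 'publish.price': 1600}
```

In this sketch the large product messages are limited by the payload bound, while the small price messages are limited by memory, which is the kind of per-queue difference the Expert level is meant to expose.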

### Output

The result of the **Expert Chunk Size Calculator** is a set of optimized queue chunk sizes for each individual queue in the project.

> For instructions on how to set up chunk sizes for the queues, [click here](https://docs.spryker.com/docs/dg/dev/backend-development/data-manipulation/queue/queue.html#configuration-for-chunk-size).

### Important Notes

- The **Expert Chunk Size Calculator** is intended for projects that have significant customizations at the entity level. If your system follows a more standard setup, consider using the **Basic** or **Advanced Chunk Size Calculators**.
- This calculator requires an in-depth understanding of how system components interact, including containerization, message brokers, search and storage, and resource distribution across workers and tasks.
- Use the **Expert Chunk Size Calculator** when the system requires individual configuration of queues and detailed customization of message handling.

### Additional Knowledge Required

To effectively use the **Expert Chunk Size Calculator**, developers must have a strong grasp of several key concepts related to resource management and system architecture.

#### 1. Container-Worker-Task Resource Relationship

This section will cover how resources (memory, CPU, etc.) are allocated between containers, workers, and tasks. It will explain how container boundaries are defined and the importance of understanding how these resources are distributed across the system to maintain healthy processing.

#### 2. Publish and Synchronize Queues

This section will explain how queues work in the publish and synchronize middleware, how they process multiple entities, and how factors like entity size and denormalization times impact CPU and memory consumption. Understanding these relationships is key to optimizing each queue’s performance through the expert calculator.