In this modification, we change the layout from one cube spread across multiple files to a single file containing multiple cubes, divided into blocks. This allows the roll-up operation to pack several small cubes into a bigger file, helping queries filter data more effectively.
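As a rough illustration of the idea (not the actual qbeast-spark API; the cube names, record counts, and `max_file_size` threshold below are invented), a roll-up can be sketched as greedily packing small cubes into shared files:

```python
# Hypothetical sketch of a roll-up: group small cubes into shared
# files, each file holding several cubes as blocks. Cube names and
# sizes are made up; this is not the real qbeast-spark code.

def roll_up(cube_sizes, max_file_size):
    """Greedily pack cubes into files so that each file holds
    several small cubes (as blocks) without exceeding max_file_size."""
    files, current, current_size = [], [], 0
    for cube, size in sorted(cube_sizes.items()):
        if current and current_size + size > max_file_size:
            files.append(current)          # close the current file
            current, current_size = [], 0
        current.append(cube)               # add this cube as a block
        current_size += size
    if current:
        files.append(current)
    return files

cubes = {"A": 700, "AA": 120, "AB": 80, "ABA": 50, "ABB": 40}
# The four small cubes end up together in a single file:
print(roll_up(cubes, max_file_size=800))
# → [['A'], ['AA', 'AB', 'ABA', 'ABB']]
```

The point of the sketch is only the packing: without roll-up, each of the four small cubes would be its own tiny file.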
Replication is the operation that optimizes the index for Sampling and Min-Max distribution.
In summary, it reads data from overflowed cubes (those containing many more records than their intended capacity) and copies that information down to their children.
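A minimal sketch of that behavior, assuming the semantics clarified later in this thread (the parent keeps its records and the excess is copied, not moved; data structures and cube naming are invented, not the real qbeast-spark implementation):

```python
# Conceptual sketch of Replication: an overflowed cube keeps its own
# records and COPIES the excess to its children. Invented structures;
# not the real qbeast-spark code.

def replicate(tree, capacities):
    """tree: cube id -> list of records; capacities: cube id -> max
    records. Returns a new tree where each overflowed cube's excess
    records are also copied into every direct child cube."""
    new_tree = {cube: list(records) for cube, records in tree.items()}
    for cube, records in tree.items():
        excess = records[capacities[cube]:]        # overflowed part
        if not excess:
            continue
        # direct children: one level deeper, same prefix (toy naming)
        children = [c for c in tree
                    if len(c) == len(cube) + 1 and c.startswith(cube)]
        for child in children:
            new_tree[child].extend(excess)          # copy, don't move
    return new_tree

tree = {"A": [1, 2, 3, 4, 5], "AA": [6], "AB": [7]}
result = replicate(tree, {"A": 3, "AA": 10, "AB": 10})
print(result["A"])   # A keeps all its records: [1, 2, 3, 4, 5]
print(result["AA"])  # children get copies of the excess: [6, 4, 5]
```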
The problem
When combining Roll-Up and Replication, the replicated (copied) data of a cube might end up in the same Parquet file as the original. This breaks compatibility when reading the file from other underlying sources (Delta and Parquet), and even from the current qbeast implementation.
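A toy illustration of why this breaks plain readers (the in-file layout below is invented, not real Parquet or qbeast metadata): after a roll-up, one file can hold blocks of a parent cube and its children, and if the children also contain replicated copies of the parent's records, a reader that simply scans the whole file counts those records twice.

```python
# Toy example: one physical file holding three cube blocks, where the
# child blocks contain replicated copies of records 4 and 5. A plain
# reader (e.g. vanilla Parquet/Delta) has no qbeast block metadata to
# tell the copies apart, so it returns duplicates.

file_blocks = {              # one rolled-up file, three blocks
    "A":  [1, 2, 3, 4, 5],   # original cube
    "AA": [6, 4, 5],         # 4 and 5 are replicated copies
    "AB": [7, 4, 5],
}

plain_scan = [r for block in file_blocks.values() for r in block]
print(len(plain_scan))       # 11 rows read...
print(len(set(plain_scan)))  # ...but only 7 distinct records
```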
The situation
We are removing Replication from the new version of qbeast-spark.
It is a very specific feature, and we would have to redesign it in a way that doesn't affect compatibility with other formats. Right now, the effort of maintaining the operation exceeds our development capacity.
This issue is for brainstorming ways of writing and interacting with replicated data.
Proposed solutions
One solution might be to write replicated data in a separate folder inside the table. This is only a high-level idea; the solution needs to be elaborated on and proposed in a design document.
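A high-level sketch of how such a layout could work (the `_replicated` folder name is purely hypothetical; no such layout exists in qbeast-spark today): original blocks and replicated copies go to different subfolders, so plain readers that only scan the main folder never see duplicates, while qbeast-aware readers can opt in to both.

```python
# Hypothetical write routing: replicated files go to a dedicated
# subfolder. "_replicated" is an invented name, used only to
# illustrate the separate-folder idea.

from pathlib import PurePosixPath

def target_path(table_root, file_name, is_replica):
    """Return where a file would be written under the proposal."""
    root = PurePosixPath(table_root)
    folder = root / "_replicated" if is_replica else root
    return str(folder / file_name)

print(target_path("/tables/store", "part-0001.parquet", False))
# → /tables/store/part-0001.parquet
print(target_path("/tables/store", "part-0001.parquet", True))
# → /tables/store/_replicated/part-0001.parquet
```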
@osopardo1 a question about the replication described above. In the example, there are overflowed records in cube A (I think we call these offsets, yes?). When we say that we are "replicating" to children AA and AB, does it mean that A keeps the overflowed elements and creates copies in AA and AB? The figure indicates that the elements are moved to AA and AB; if so, it is not replicating, it is moving. I'm confused about the concept of replication here.
The idea of replication is that the cube's offset would eventually be removed as well, and A rewritten with the right number of elements.
It is true that this is not the behavior the operation has at the moment. I will redraw the figure to show that A keeps the same records, only replicating the information, not cutting its content.
Thanks for noticing.
WARNING: Replication will be removed in version 0.6.0.
Multiblock Format
The upcoming release of Qbeast Spark has new protocol updates: the multiblock layout described above, where a single file contains multiple cubes divided into blocks, so that the roll-up operation can pack several small cubes into a bigger file.
[Figure: Original protocol metadata]
[Figure: NEW protocol metadata]
But changes come with downsides.