In this modification, we change the layout from one cube spread across multiple files to a single file containing multiple cubes, divided into blocks. This allows the roll-up operation to pack several small cubes into a bigger file, helping queries filter data more effectively.
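As a rough illustration of the idea (not the actual qbeast-spark API; the cube names, record counts, and `max_file_size` threshold below are invented), a roll-up can be sketched as greedily packing small cubes into shared files:

```python
# Hypothetical sketch of a roll-up: group small cubes into shared
# files, each file holding several cubes as blocks. Cube names and
# sizes are made up; this is not the real qbeast-spark code.

def roll_up(cube_sizes, max_file_size):
    """Greedily pack cubes into files so that each file holds
    several small cubes (as blocks) without exceeding max_file_size."""
    files, current, current_size = [], [], 0
    for cube, size in sorted(cube_sizes.items()):
        if current and current_size + size > max_file_size:
            files.append(current)          # close the current file
            current, current_size = [], 0
        current.append(cube)               # add this cube as a block
        current_size += size
    if current:
        files.append(current)
    return files

cubes = {"A": 700, "AA": 120, "AB": 80, "ABA": 50, "ABB": 40}
# The four small cubes end up together in a single file:
print(roll_up(cubes, max_file_size=800))
# → [['A'], ['AA', 'AB', 'ABA', 'ABB']]
```

The point of the sketch is only the packing: without roll-up, each of the four small cubes would be its own tiny file.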
Replication is the operation that optimizes the index for Sampling and Min-Max distribution.
In summary, it reads data from overflowed cubes (those containing many more records than their intended capacity) and copies that information down to their children.
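A minimal sketch of that behavior, assuming the semantics clarified later in this thread (the parent keeps its records and the excess is copied, not moved; data structures and cube naming are invented, not the real qbeast-spark implementation):

```python
# Conceptual sketch of Replication: an overflowed cube keeps its own
# records and COPIES the excess to its children. Invented structures;
# not the real qbeast-spark code.

def replicate(tree, capacities):
    """tree: cube id -> list of records; capacities: cube id -> max
    records. Returns a new tree where each overflowed cube's excess
    records are also copied into every direct child cube."""
    new_tree = {cube: list(records) for cube, records in tree.items()}
    for cube, records in tree.items():
        excess = records[capacities[cube]:]        # overflowed part
        if not excess:
            continue
        # direct children: one level deeper, same prefix (toy naming)
        children = [c for c in tree
                    if len(c) == len(cube) + 1 and c.startswith(cube)]
        for child in children:
            new_tree[child].extend(excess)          # copy, don't move
    return new_tree

tree = {"A": [1, 2, 3, 4, 5], "AA": [6], "AB": [7]}
result = replicate(tree, {"A": 3, "AA": 10, "AB": 10})
print(result["A"])   # A keeps all its records: [1, 2, 3, 4, 5]
print(result["AA"])  # children get copies of the excess: [6, 4, 5]
```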
The problem
When combining Roll-Up and Replication, the replicated (copied) data of a cube might end up in the same Parquet file as the original. This breaks compatibility when reading the file from other underlying sources (Delta and Parquet), and even from the current qbeast implementation.
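A toy illustration of why this breaks plain readers (the in-file layout below is invented, not real Parquet or qbeast metadata): after a roll-up, one file can hold blocks of a parent cube and its children, and if the children also contain replicated copies of the parent's records, a reader that simply scans the whole file counts those records twice.

```python
# Toy example: one physical file holding three cube blocks, where the
# child blocks contain replicated copies of records 4 and 5. A plain
# reader (e.g. vanilla Parquet/Delta) has no qbeast block metadata to
# tell the copies apart, so it returns duplicates.

file_blocks = {              # one rolled-up file, three blocks
    "A":  [1, 2, 3, 4, 5],   # original cube
    "AA": [6, 4, 5],         # 4 and 5 are replicated copies
    "AB": [7, 4, 5],
}

plain_scan = [r for block in file_blocks.values() for r in block]
print(len(plain_scan))       # 11 rows read...
print(len(set(plain_scan)))  # ...but only 7 distinct records
```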
The situation
We are removing Replication from the new version of qbeast-spark.
It is a very specific feature, and we would have to redesign it in a way that doesn't affect compatibility with other formats. Right now, the effort of maintaining the operation exceeds our development capacity.
This issue is for brainstorming ways of writing and interacting with replicated data.
Proposed solutions
One solution might be to write replicated data in a separate folder inside the table. This is only a high-level idea; the solution needs to be elaborated on and proposed in a design document.
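A high-level sketch of how such a layout could work (the `_replicated` folder name is purely hypothetical; no such layout exists in qbeast-spark today): original blocks and replicated copies go to different subfolders, so plain readers that only scan the main folder never see duplicates, while qbeast-aware readers can opt in to both.

```python
# Hypothetical write routing: replicated files go to a dedicated
# subfolder. "_replicated" is an invented name, used only to
# illustrate the separate-folder idea.

from pathlib import PurePosixPath

def target_path(table_root, file_name, is_replica):
    """Return where a file would be written under the proposal."""
    root = PurePosixPath(table_root)
    folder = root / "_replicated" if is_replica else root
    return str(folder / file_name)

print(target_path("/tables/store", "part-0001.parquet", False))
# → /tables/store/part-0001.parquet
print(target_path("/tables/store", "part-0001.parquet", True))
# → /tables/store/_replicated/part-0001.parquet
```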
@osopardo1 a question about the replication described above. In the example, there are overflowed records in cube A (I think we call these offsets, yes?). When we say that we are "replicating" to children AA and AB, does it mean that A keeps the overflowed elements and creates copies in AA and AB? The figure indicates that the elements are moved to AA and AB; if so, it is not replicating, it is moving. I'm confused about the concept of replication here.
The idea of replication is that the cube's offset would eventually be removed as well, and A rewritten with the right number of elements.
It is true that this is not the behavior the operation has at the moment. I will redraw the figure to show that A keeps the same records, only replicating the information, not cutting its content.
Thanks for noticing.
WARNING: Replication will be removed in version 0.6.0.
Multiblock Format
The upcoming release of Qbeast Spark has new protocol updates: the multiblock layout described above, where a single file contains multiple cubes divided into blocks, so that the roll-up operation can pack several small cubes into a bigger file.
[Figure: Original protocol metadata]
[Figure: NEW protocol metadata]
But changes come with downsides.