Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deleting old cutouts #214

Closed
RoyWilliams opened this issue Nov 27, 2024 · 2 comments
Closed

Deleting old cutouts #214

RoyWilliams opened this issue Nov 27, 2024 · 2 comments
Assignees

Comments

@RoyWilliams
Copy link
Contributor

Figure out how to delete old cutouts.

We made the cutouts table so that old cutouts can be deleted

CREATE TABLE cutouts.cutouts (
    imjd int,
    "cutoutId" ascii,
    cutoutimage blob,
    "objectId" bigint,
    PRIMARY KEY ((imjd, "cutoutId"))
)

What is the correct query to do this in cqlsh? The obvious doesn't seem to work ...

cqlsh:cutouts> delete from cutouts where imjd=60000;
InvalidRequest: Error from server: code=2200 [Invalid query] 
message="Some partition key parts are missing: cutoutId"
@genghisken
Copy link
Collaborator

The imjd column should actually be a float. Either that or a truncated float - so that we can select it exactly. Or (imjd, smjd) where secs is an integer number of seconds (intra day) generated from the mjd. Then we have the "entropy" that we need to evenly distribute the data. So our whole index would be something like ((imjd, smjd), "cutoutId") - or we just have a new column which is mjdsecs, which is just a concatenation of imjd and the number of seconds in that day (e.g. mjdsecs, hence index would be just (mjdsecs, "cutoutId"). Will do some local experiments...) Could be 10s of seconds or minutes. We already calculate the imjd from the jd anyway, so why not go a bit deeper than just days. A single column makes things easier to select. E.g. mjd * 1000 + (int)(day/1000) or mjd * 100000 + (number of seconds).

@genghisken
Copy link
Collaborator

It also turns out that Cassandra has a built in TTL (time to live) when doing INSERTs (specified in seconds), which would completely eliminate the need to do the deletes manually. E.g.:

INSERT INTO cutouts (imjd, "cutoutId", cutoutimage, "objectId") VALUES (...) USING TTL 7776000;

(where 7,776,000 seconds is 90 days - or we could calculate the offset from the MJD in the alert packet)

We could experiment with doing inserts with very low TTL. If we go down this route we should change the definition of the table to explicitly use the Time Window Compaction Strategy.

Experiments to follow for both methods above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants