WIP: MAST query result cache support outline #1578
ed2de79
to
e9ecbe5
Compare
Proposed changes outline. @ceb8 @barentsen feedback welcome.
Discussion Points:
|
Codecov Report
@@ Coverage Diff @@
## main #1578 +/- ##
==========================================
- Coverage 69.18% 63.11% -6.08%
==========================================
Files 304 133 -171
Lines 22529 17348 -5181
==========================================
- Hits 15587 10949 -4638
+ Misses 6942 6399 -543
... and 245 files with indirect coverage changes |
we don't yet have a good solution to that. |
Basic caching support ( Please review before I proceed any further. If it's agreed, a similar pattern can be applied to other MAST query functions. Notes from my side:
|
I don't yet see why the tests are failing, it looks very unrelated and something I couldn't yet reproduce locally. Also, I'm travelling this week, but will try to get back to this and do a reasonable review soon. |
For failing tests: for now I assume that they are due to some transient issues with CI, and expect they will recover in the next push. |
(Just wanted to subscribe and say I am super excited about this PR, and will be happy to help test.) |
I'm swamped through Monday, but should have time to test this later next week. |
This implements a really useful feature. At the moment I frequently query MAST in order to get the same "search result" as part of my pipeline to iterate through a large list of objects, even though the files themselves are cached. Is there any way I can help get this merged? |
@christinahedges I will try to make time next week to look at this PR. I am also in general concerned with the lack of an expiration on the cache, which has bitten me before. For example, a colleague and I queried Ned repeatedly without understanding why we were getting different results; it turned out that one of us had made the same query a year prior (according to the date on the cache file) and was still accessing that very old search result. This has also come up within astroquery.mast with Tesscut, which does use astroquery caching, resulting in users not finding new results (there are TESS releases ~monthly). |
@ceb8 I think this is a really good point. In particular if users query for e.g. TESS data, there will potentially be new targets and files every month. For the most part, I want this functionality so that when I re-run my scripts 5 times a day, I'm not waiting for a result from MAST. If we had an expiration date of e.g. 7 days, that would totally meet my needs. |
@christinahedges Yeah, I think ideally the cache will have a short default timeout, the user will be able to set it to a user-specified value, and there will always be a no-cache option. |
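To make the three requirements above concrete (short default timeout, user-settable value, always a no-cache option), here is a minimal sketch of a freshness check based on the cache file's modification time. The names `DEFAULT_CACHE_TIMEOUT` and `cache_is_usable` are hypothetical and are not astroquery API; the 7-day default is only the figure suggested earlier in this thread.

```python
import time
from pathlib import Path

# Hypothetical default: 7 days, the value suggested in this thread.
DEFAULT_CACHE_TIMEOUT = 7 * 24 * 3600  # seconds


def cache_is_usable(path, timeout=DEFAULT_CACHE_TIMEOUT, use_cache=True, now=None):
    """Return True if the cached file at `path` may be used.

    timeout=None means "never expire"; use_cache=False always bypasses
    the cache. `now` can be injected to make the check testable.
    """
    if not use_cache:
        return False  # explicit no-cache option
    path = Path(path)
    if not path.is_file():
        return False  # nothing cached yet
    if timeout is None:
        return True  # expiration disabled by the user
    now = time.time() if now is None else now
    age = now - path.stat().st_mtime  # seconds since the file was written
    return age <= timeout
```

A query function could call such a check before deciding whether to read the cached response or to contact the server again.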
@bsipocz I will try to look at the caching situation overall next week then, and gather what needs to be done and make a plan. 😃 |
@bsipocz @christinahedges Opened WIP PR #1634 to address the general caching situation. Will look at this PR next. |
That is awesome thank you! I can't wait for this functionality! 🎉 🎉 🎉 🎉 🎉 🎉 🎉 |
d1a8f66
to
d949888
Compare
(then this PR will contain the changes from that one, etc, but we'll sort that out along the way). Also, this PR is very much considered WIP/draft as part of testing the other one, so please everyone, no detailed code reviews until we sort out the big picture! |
d949888
to
2d73cb9
Compare
Hmm, I think something went wrong during the rebase. Did you try to do it interactively, and keep only the one commit from this branch, rather than the ones from |
2d73cb9
to
441c246
Compare
@bsipocz Found some interesting things:
|
Can you elaborate on this with an example? There are problems with the MAST (and gemini) class naming already; I wonder whether this would be a non-issue once that one is solved, or whether the two are unrelated. |
@bsipocz It's unrelated. Basically MAST hands off querying to a different class depending on what interface is needed, and that class does the caching, rather than the top level class the user queried. Example:
In this example you would expect the cache location to be Observations.cache_location (.astropy/cache/astroquery/Observations) but it actually caches the file in .astropy/cache/astroquery/PortalAPI because the PortalAPI class is the one that handles this type of query. |
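The behavior described above can be reproduced with a small stand-alone sketch. The classes below are simplified stand-ins that only borrow the names `Observations` and `PortalAPI` from this discussion; they are not the astroquery implementation, and the assumption that the cache directory is derived from the class name is taken from the paths quoted above.

```python
from pathlib import Path

CACHE_ROOT = Path(".astropy") / "cache" / "astroquery"


class CachingQuery:
    """Hypothetical base class: the cache directory is named after the class."""

    @property
    def cache_location(self):
        return CACHE_ROOT / type(self).__name__


class PortalAPI(CachingQuery):
    def request(self, query):
        # The class that performs the request also decides where to cache it.
        return self.cache_location / f"{query}.json"


class Observations(CachingQuery):
    def __init__(self):
        self._portal = PortalAPI()  # querying is handed off to a helper class

    def query(self, query):
        return self._portal.request(query)


obs = Observations()
expected_dir = obs.cache_location      # what the user would expect
actual_dir = obs.query("m31").parent   # where the file really ends up
print(expected_dir.as_posix())  # .astropy/cache/astroquery/Observations
print(actual_dir.as_posix())    # .astropy/cache/astroquery/PortalAPI
```

Because the helper instance computes its own `cache_location`, the user-facing class never gets a say in where the result is stored.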
Yeap, this is pretty bad. At least the issue of having two Do you think adding one or more layers would help? Or maybe the full namespace would be the solution, as we have substructure in a few modules (though caching everything esa together, and everything ipac, and everything solarsystem, etc. doesn't feel as wrong as having just one flat layer). |
btw, this is exactly the issue that I hoped to be able to recover before merging the other PR. |
@bsipocz We could use the full namespace, but from a user perspective... I think it might make more sense to cache at one higher level, i.e. all mast together, all esa together etc... There is no reason, for example, that a user should need to know which particular MAST interface they are accessing. Also: an additional thing I just noticed is that I'm using an astroquery-wide conf class that I put in the top level
yes, but mast is very much the odd one out, the whole structure of that module is totally different from the rest. The substructure I referred to in the previous comment is |
@bsipocz I'm wondering if there is a way to use the module conf structure to facilitate this? So that individual modules can override the defaults when it makes sense. |
Yes, I suppose that can potentially work. Go with whichever is the simplest implementation. I really think that one layer is enough, especially if that's easier to do. |
@bsipocz So it turned out the mechanism for overriding the default behavior is already in place. The default cache location using the class property |
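As an illustration of the "one layer per module" idea discussed above, the following sketch overrides a class-name-based default so that every class in one module shares a single cache directory. All class bodies and the directory name `"Mast"` are assumptions made for this example; they do not describe the actual astroquery code.

```python
from pathlib import Path

CACHE_ROOT = Path(".astropy") / "cache" / "astroquery"


class BaseQuery:
    """Hypothetical default: one cache directory per class name."""

    @property
    def cache_location(self):
        return CACHE_ROOT / type(self).__name__


class MastBase(BaseQuery):
    """Hypothetical module-level override: all MAST classes share one directory."""

    @property
    def cache_location(self):
        return CACHE_ROOT / "Mast"  # assumed directory name


class Observations(MastBase):
    pass


class PortalAPI(MastBase):
    pass


class Ned(BaseQuery):
    pass  # other modules keep the class-name default


print(Observations().cache_location == PortalAPI().cache_location)  # True
print(Ned().cache_location.name)  # Ned
```

With an override like this, users would not need to know which internal MAST interface served their query, while other modules keep the simpler default.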
d377e99
to
460fd24
Compare
Closes #1577.
So far this is an outline of the proposed change. I'd like to use the PR to solicit feedback (and gauge the efforts required) before proceeding further.