feat: add the Grand Unified PR (gupr) #76

liammulh · 2023-05-24T18:45:24Z

This PR does the following:

- adds a code formatter (Black)
- adds a linter (ruff)
- adds tests (Pytest)
- documents code by adding Python docstring comments to functions and methods
- attempts to address "when crawling too fast, backend gets stuck on disallowed sequence even when crawling is allowed again" (#69)
- attempts to address some of the feedback in "docs: update README with comprehensive install steps" (#74)
- removes unused code, e.g. the User code

After I tested a bit, I don't think this PR addresses the issue of the OEIS sequence offset never being set. We still need to obtain the offset and store it in the DB.

The API has changed somewhat, so if we were using Semantic Versioning, this would be a breaking change / major version upgrade. Specifically, the API functions get_oeis_values, get_oeis_metadata, and get_oeis_factors have been changed. It adds an API function get_oeis_sequence that grabs a sequence object from the database and returns it as JSON. This API function might be useful for debugging.

I tried to simplify the code where I could, and I tried to eliminate side-effects. This might be a controversial decision because, for instance, hitting the get_oeis_values route no longer schedules getting metadata and factors. On the one hand, this is eliminating a side-effect, but on the other hand, it makes the API caller deal with the complexity of issuing a request when they want metadata or factors rather than having them eagerly scheduled.

I am sure there are more changes and decisions I made that aren't covered in this summary. Please ask questions when they arise. I apologize for the scope of this PR. I know it's huge. Please feel free to ask me to change things. I know it will take a long time to review this, and I appreciate your time, @katestange and @gwhitney. Also, I want to say thank you one more time to you both. I've learned a lot from working on Numberscope, and I am very grateful for your mentorship and your patience with me as a newbie software developer! :)

This PR does the following:

- adds a code formatter (Black)
- adds a linter (ruff)
- adds tests (Pytest)
- documents code by adding Python docstring comments to functions and methods
- attempts to address #69
- attempts to address some of the feedback in #74

There are some minor changes that aren't worth delving into in the commit message. The API has changed somewhat, so if we were using Semantic Versioning, this would be a breaking change / major version upgrade. I will go into more detail in the pull request description on GitHub.

gwhitney · 2023-05-25T14:14:51Z

I am super grateful for all of your effort, Liam. But at the moment I don't personally see how this PR is reviewable/mergeable. It's tantamount to a full rewrite: numerous files are deleted and new ones are created -- GitHub at least isn't tracking whether these are just renamings or if the functionality has been split up differently. It folds in numerous independent features into a single PR: there's absolutely no reason that linting needs to be added at the same time as testing, for example. It has documentation, testing, bug fixes, and refactoring all mixed together. I appreciate that at the end of your term with Numberscope, you have a lot of different improvements you want to get in before you go, but unfortunately I don't think this single-PR approach is the fastest way to get that all accomplished.

And finally, it has at least one major change in behavior (not scheduling the download of metadata and references when values are requested) that was not ever discussed as a group. I appreciate the desire to have fewer side effects, but how are we going to restore this important type of behavior? It can't possibly be the front-end's responsibility to direct the back-end to preload data that a user might want. So there has to be some mechanism in the back-end that observes the incoming requests and decides what data to pre-load. Getting the metadata and references for a sequence when its values are requested is admittedly a very elementary version of such a strategy, and one could imagine a different mechanism like a "backend supervisor daemon" that looks at the request stream and decides what to preload: maybe if references are requested for a sequence, the values for all related sequences are preloaded, for example. But this PR does not supply any mechanism for cleverer preloading or any preloading at all, so in addition to everything else, it is simply removing functionality.

Of course we should wait for Kate's thoughts on the matter, but I think the most efficient way for the Numberscope project to assimilate the numerous excellent improvements contained in this PR is to mark this as "do not merge," and look at it as a template/roadmap for a series of more focused PRs (which likely we will have to generate on our own). If that strategy is adopted, we'd appreciate your thoughts, Liam, on the pieces this could be decomposed into and in what order they might best be merged. Also if you have any comments on the motivation/value of the significant renaming of the code files and refactoring of which bits of code go in which file, that would help us decide whether that renaming is worth doing as one step in the PR series.

Thanks so much for taking the initiative to pursue so many different aspects of improving backscope. I hope we will be able ultimately to incorporate all or at least almost all of them.

katestange · 2023-05-25T20:54:57Z

Glen wrote "I think the most efficient way for the Numberscope project to assimilate the numerous excellent improvements contained in this PR is to mark this as "do not merge," and look at it as a template/roadmap for a series of more focused PRs (which likely we will have to generate on our own)." I agree that these are a lot of excellent improvements, and also that it makes sense to break it up, so maybe before you head out, you could suggest a structure for doing this, and we can work through it that way. If you can provide a bit of a roadmap for a natural order in which to do this, that would be great.

liammulh · 2023-05-28T00:23:58Z

I understand! You're right, @gwhitney, it's basically a rewrite. However, I will point out that the API is almost the same. It could be pretty easily modified to be exactly the same. I also think it would be easy to add a commit to schedule getting metadata and factors.
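As a rough illustration of what "scheduling" could look like, here is a minimal sketch using only the standard library. Every name in it (`fetch_values`, `fetch_metadata`, `fetch_factors`, `schedule_once`, `get_values_and_preload`) is hypothetical and not taken from backscope; the fetchers are stand-ins that return canned data instead of contacting the OEIS.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the real fetchers, which would query the OEIS.
def fetch_values(oeis_id):
    return [1, 1, 2, 3, 5]

def fetch_metadata(oeis_id):
    return {"id": oeis_id, "name": "placeholder name"}

def fetch_factors(oeis_id):
    return ["1", "1", "2", "3", "5"]

executor = ThreadPoolExecutor(max_workers=2)
pending = {}  # maps (kind, oeis_id) to the Future of the scheduled job

def schedule_once(kind, oeis_id, fetcher):
    """Submit a background fetch unless the same job was already scheduled."""
    key = (kind, oeis_id)
    if key not in pending:
        pending[key] = executor.submit(fetcher, oeis_id)
    return pending[key]

def get_values_and_preload(oeis_id):
    """Return the values right away and preload metadata and factors."""
    values = fetch_values(oeis_id)
    schedule_once("metadata", oeis_id, fetch_metadata)
    schedule_once("factors", oeis_id, fetch_factors)
    return values
```

Keeping the scheduling in one small helper means the values route itself stays free of fetching logic, and a smarter preloading policy could later replace `get_values_and_preload` without touching the fetchers.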

If you want to implement the things that were added in this PR in separate PRs, here's the order I'd suggest:

Simplify the README / put Ubuntu installation steps in the `doc` dir

I think the contents of the README in this PR with a few modifications could be used. Then add install-ubuntu.md to the doc dir.

Simplify files and directories / remove unused code

It looks to me like the structure of the files and directories in backscope is sort of borrowed from the Flask tutorial where they are building out a web blog. I think the tutorial is: https://flask.palletsprojects.com/en/0.12.x/tutorial/introduction/. (I think flaskr is the name of the web blog, not an idiomatic Flask Python package name that most Flask apps use.)

Right now, I think backscope is simple enough that we can get away with having just an entry point file (app.py in my PR) and a module for sequence-related stuff — sequence.py. (If we end up storing more than just sequences in the database, it would make sense to put database models in models.py. I think this is idiomatic for Flask apps.)

Add a code formatter

Install a Python code formatter so that the code style is consistent and therefore easier to read.

Linting

Install a linter. It could be ruff (the one I used) or some other linter. It doesn't matter too much; it's just good to have something that says "Hey, you imported something you didn't end up using!", etc. Open up an issue for the lint errors/warnings that show up and work through them as time permits.

Add tests of some sort

Install a test framework or use Python's built-in testing framework. Then find something super easy to test. Add the test and some docs on how to use the test framework. Then add more tests as time permits.
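A "super easy" first test might look like the sketch below. The helper `is_valid_oeis_id` is hypothetical (it is not a function from this PR), and it assumes OEIS IDs have the form "A" followed by six digits. The test functions are plain functions with `assert` statements, which Pytest would collect by their `test_` prefix.

```python
import re

def is_valid_oeis_id(oeis_id):
    """Return True if the string looks like an OEIS A-number, e.g. A000045."""
    return re.fullmatch(r"A[0-9]{6}", oeis_id) is not None

def test_accepts_well_formed_id():
    assert is_valid_oeis_id("A000045")

def test_rejects_malformed_ids():
    assert not is_valid_oeis_id("A45")
    assert not is_valid_oeis_id("B000045")
    assert not is_valid_oeis_id("")
```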

Wrap calls to the OEIS in a try/except or check the status code

If the OEIS is complaining about us crawling it too fast, it would make sense for them to send a 429 status code. Or if they send a different status code, we could check to make sure the code is in [200, 300). Or we can wrap calls to the OEIS in a try/except. Or all of the above.
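A minimal sketch of "all of the above", using only the standard library. The exception classes and function names are hypothetical, and whether the OEIS actually answers with 429 when it is crawled too fast is an assumption here, not something confirmed in this thread. The `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request

class OEISRequestFailed(Exception):
    """The OEIS could not be reached or returned an unexpected status."""

class OEISRateLimited(OEISRequestFailed):
    """The OEIS asked us to slow down (assumed to be signalled by a 429)."""

def check_status(status):
    """Raise unless the status code is in [200, 300)."""
    if status == 429:
        raise OEISRateLimited("OEIS returned 429; crawling too fast")
    if not 200 <= status < 300:
        raise OEISRequestFailed(f"OEIS returned status {status}")

def fetch_oeis_text(url, opener=urllib.request.urlopen, timeout=10):
    """Fetch a URL and return its body, translating failures into our exceptions."""
    try:
        with opener(url, timeout=timeout) as response:
            check_status(response.status)
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; map the code to our exceptions.
        check_status(err.code)
        # Fallback in case an HTTPError ever carries a 2xx code.
        raise OEISRequestFailed(f"OEIS request failed: {err}") from err
    except urllib.error.URLError as err:
        raise OEISRequestFailed(f"could not reach the OEIS: {err.reason}") from err
```

Callers can then catch `OEISRateLimited` to back off and retry later, and treat any other `OEISRequestFailed` as a plain failure.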

Raise errors/exceptions rather than returning them (optional)

I think it's more typical to raise (throw in JavaScript parlance) the errors, and then except (catch in JavaScript parlance) them in the code that calls them. This won't add complexity to the code because we already have code that deals with error return values. Ideally you could write the code so that errors are caught in one place rather than having to catch them in lots of different places. This would reduce boilerplate and complexity.
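To make "caught in one place" concrete, here is a small sketch. The exception classes, the `handle_errors` decorator, and the `(body, status)` return shape are all hypothetical choices for illustration; they are not how backscope currently reports errors.

```python
import functools

class SequenceError(Exception):
    """Base class for errors a route should report to the caller."""
    status = 500

class SequenceNotFound(SequenceError):
    status = 404

def find_values(oeis_id, store):
    """Look up values, raising instead of returning an error value."""
    if oeis_id not in store:
        raise SequenceNotFound(f"unknown sequence {oeis_id}")
    return store[oeis_id]

def handle_errors(route):
    """Turn any SequenceError raised by a route into a (body, status) pair."""
    @functools.wraps(route)
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "data": route(*args, **kwargs)}, 200
        except SequenceError as err:
            return {"ok": False, "error": str(err)}, err.status
    return wrapper

@handle_errors
def values_route(oeis_id, store):
    # No error handling here: failures propagate to the decorator.
    return find_values(oeis_id, store)
```

With this shape, adding a new kind of failure means adding one exception subclass rather than another error-return check in every caller.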

Simplify the API (optional)

If you are asking for values, you should get values rather than values, name, and ID. You need the ID to make the request anyway, so the caller already has the ID. If you want the name, create an endpoint for the name rather than getting the name through the values route.

Add Python docstring comments (optional)

Having a consistent way to describe parameters is helpful in reading the code. I used the reStructuredText format in this PR, but there are a lot of different formats to choose from. See this SO answer for a few of the different formats. Epytext looks interesting — it is similar to Javadoc / JSDoc style, which I think we use a bit in frontscope.
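For reference, a docstring in the reStructuredText field-list style looks roughly like this. The function `first_terms` is a made-up example, not code from this PR.

```python
def first_terms(values, count):
    """Return the first ``count`` terms of a sequence.

    :param values: the terms of the sequence, in order
    :type values: list[int]
    :param count: how many terms to return; must not be negative
    :type count: int
    :returns: a new list holding at most ``count`` terms
    :rtype: list[int]
    :raises ValueError: if ``count`` is negative
    """
    if count < 0:
        raise ValueError("count must not be negative")
    return list(values[:count])
```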

gwhitney · 2023-05-28T20:58:41Z

> However, I will point out that the API is almost the same

I think that is the most critical separation: refactorings should have zero behavior change, and behavior modifications should be associated with the minimal code change possible. Anyhow, thanks for the suggested outline, and when Kate and I get back to it we will try to pull code from this PR in the chunks you suggest. Best of luck in your new position!

gwhitney · 2023-08-26T23:59:52Z

I am in the process of trying to extract just the documentation updates from this PR. It's tricky because many of the changes that were in this PR are documented (as they should have been for a fully self-consistent PR). So when extracting just the documentation improvements on main as it exists now, many of the documentation changes have to be eliminated. In any case, going through this led me to discover one other significant organizational change that this PR attempted to institute, namely permanently recording and checking in database migrations (as opposed to the procedure in main, which is to reinitialize the database from scratch whenever the schema changes). I'll add that item to the discussion on backscope's trajectory.

gwhitney · 2023-08-27T02:43:42Z

The situation is similar for migrating from python manage.py to flask. Since I have no idea which of the innumerable changes (if any) allowed that migration, I will have to leave the documentation prescribing the use of python manage.py, even though I agree it would be nice to use a standard facility like the flask command if possible.

liammulh mentioned this pull request May 24, 2023

docs: update README with comprehensive install steps #74

Closed

gwhitney marked this pull request as draft May 28, 2023 20:58
