Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Screener & docs #2207

Merged
merged 1 commit into from
Jan 18, 2025
Merged

Improve Screener & docs #2207

merged 1 commit into from
Jan 18, 2025

Conversation

ValueRaider
Copy link
Collaborator

The main change is converting the predefined bodies to use EquityQuery. Also simplify the queries, Yahoo has made them longer than necessary. This will make Screener doc page shorter and clearer.

@ericpien what is purpose of Screener.patch_body()? It's not documented well.

@R5dan R5dan mentioned this pull request Jan 5, 2025
yfinance/screener/screener.py Outdated Show resolved Hide resolved
@@ -69,7 +77,9 @@ def __init__(self, operator: str, operand: Union[numbers.Real, str, List['Equity
if len(operand) <= 0:
raise ValueError('Invalid field for Screener')

if operator in {'OR','AND'}:
if operator == 'IS-IN':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a is-in operand?
Wouldn't it make more sense to make it a function as it is not an actual operand?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder about this from design perspective as well. I tried to retain as much "likeliness" as possible to Yahoo's operator types. Since there is no "is-in" operator in yahoo's own query, it could get unnecessarily difficult to debug / extend in the future. Is there a reason to take on that cost?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was what I was trying to get across.
I think it would be better to have a function for is_in if at all:

def is_in(key, *values):
    operands = []
    for oper in values:
        operands.append(EquityQuery("EQ", oper))
    
   return EquityQuery("OR", operands)

Similar to how the to_dict works, just earlier.

Copy link
Collaborator Author

@ValueRaider ValueRaider Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary goal is to simplify users' task of writing queries, and part of that is length.

EquityQuery('IS-IN', ['exchange', 'NMS', 'NYQ'])

vs

EquityQuery('OR', [EquityQuery('EQ, ['exchange', 'NMS'], EquityQuery('EQ, ['exchange', 'NYQ']])

is in is Pythonic, yfinance is Pythonic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is in is Pythonic, yfinance is Pythonic

Im not saying don't have an is-in but make it have a clear separation from EquityQuery. A custom function or class, like how I defined some in my PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming back to this:

Is there a reason to take on that cost?

What's the cost? It's basically a one-liner now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically a one-liner now.

EquityQuery.in_in is also a single line

What's the cost?

Might be misunderstanding but I thought @ericpien was agreeing with me on that statement as he was talking about the cost of debugging EquityQuery("in-in") in the future. I think it is worth it as it keeps it similar as similar to yahoo as possible, it makes it easier to change or add a new one in the future. Also, currently it is in with the "basic" operands yahoo provides, which I feel is confusing as it is not one yahoo provides.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be misunderstanding but I thought @ericpien was agreeing with me on that statement as he was talking about the cost of debugging EquityQuery("in-in") in the future.

Yes, I was referring to cost of debugging. For me, the question was more so a design philosophy one. I'm new to the yfinance project so not sure what the preference is in maintaining "likeness" of the source vs offering functionalities where query doesn't explicitly exist on the yahoo side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it is then possible to change the queries returned by the is-in. If the user wants to change a particular part of it later they can, without having to reconstruct the whole thing.

Copy link
Collaborator Author

@ValueRaider ValueRaider Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yfinance has never been a minimal Yahoo query tool, it also formats the data, does a little cleaning too. Why not make interaction a little easier?

If the user wants to change a particular part of it later they can, without having to reconstruct the whole thing.

Let's wait for users to actually request this, after this PR gets released.

@ValueRaider ValueRaider changed the title Begin improving Screener docs Improve Screener docs Jan 5, 2025
@ericpien
Copy link
Contributor

ericpien commented Jan 5, 2025

The main change is converting the predefined bodies to use EquityQuery

Is the advantage of this cleaner documentation?

@ericpien what is purpose of Screener.patch_body()? It's not documented well.

It is meant to update just the provided portion of the body. i.e. user wants to repeatedly use the same Screener but just update the _body.offset by 200. Instead of reconstructing the whole Query or Screener, user can just update that value by passing the dict to the method then iterate in a loop.

yfinance/const.py Outdated Show resolved Hide resolved
@ValueRaider
Copy link
Collaborator Author

The main change is converting the predefined bodies to use EquityQuery

Is the advantage of this cleaner documentation?

It's the only reason, to shorten the docs. Next task is collapsing the long lists.

Also now Screener has been completely redone, it did not "feel right" before.

@ValueRaider ValueRaider marked this pull request as ready for review January 8, 2025 20:51
@ValueRaider
Copy link
Collaborator Author

This is almost ready for review. I just need to think more about the actual execution of a query, my current refactoring still doesn't feel right.

@R5dan
Copy link
Contributor

R5dan commented Jan 9, 2025

Can I ask why predefined screeners are still made and not just sent to the predefined endpoint?

@ValueRaider
Copy link
Collaborator Author

Fewer characters - generated docs are shorter.

@ValueRaider
Copy link
Collaborator Author

ValueRaider commented Jan 9, 2025

I just need to think more about the actual execution of a query, my current refactoring still doesn't feel right.

Fixed

yfinance/const.py Show resolved Hide resolved
yfinance/screener/query.py Show resolved Hide resolved
Comment on lines 111 to 180
# Fetch
_data = YfData(session=session)
params_dict = {"corsDomain": "finance.yahoo.com", "formatted": "false", "lang": "en-US", "region": "US"}
response = _data.post(_SCREENER_URL_,
body=post_query,
user_agent_headers=_data.user_agent_headers,
params=params_dict,
proxy=proxy)
response.raise_for_status()
return response.json()['finance']['result'][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally think this should be moved to a function and that all strings should be sent to the predefined screener url, that way it doesn't need updating if one of the screeners is updated or a new one is added.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all strings should be sent to the predefined screener url

How is this not already happening?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have created all of the predefined screeners. Personally I think that it should be something closer to:

def screen(query:'EquityQuery|str', ...):
    if isinstance(query, str):
        # Handle predefined
        _data = YfData()
        params_dict = {
		"scrIds": query,
		...
        }
        resp = _data.get(url=_PREDEFINED_URL_, params=params_dict, proxy=self.proxy)
        resp.raise_for_status()
        return resp.json()["finance"]["result"][0]

or something like this. There is an endpoint explicitly for predefined screeners, why not use it?

Copy link
Collaborator Author

@ValueRaider ValueRaider Jan 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have created all of the predefined screeners.

Nope, that was @ericpien

yfinance/yfinance/const.py

Lines 534 to 535 in 5adddf3

PREDEFINED_SCREENER_BODY_MAP = {
'aggressive_small_caps': {"offset":0,"size":25,"sortField":"eodvolume","sortType":"desc","quoteType":"equity","query":{"operator":"and","operands":[{"operator":"or","operands":[{"operator":"eq","operands":["exchange","NMS"]},{"operator":"eq","operands":["exchange","NYQ"]}]},{"operator":"or","operands":[{"operator":"LT","operands":["epsgrowth.lasttwelvemonths",15]}]}]},"userId":"","userIdType":"guid"},

One thing I noticed when switching to predefined endpoint, is because only sending the name and not the other fields, then results are different - different sort type, different order. Probably why @ericpien hardcoded the entire query with fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

predefined endpoint, is because only sending the name and not the other fields, then results are different - different sort type, different order

Thats one of the reasons I want to send the request to the other endpoint, a user shouldn't test something on yahoo and use it here only to find that they return 2 different things, with different functionality

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest commit should resolve this (remember to review web docs not just code)

Copy link
Contributor

@R5dan R5dan Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remember to review web docs not just code

I didn't realize that PR's had docs? Or am I misunderstanding your comment?

Having looked at the doc strings for screen I am happy with the implementation, just want a force param as I have mentioned

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This https://ranaroussi.github.io/yfinance/index.html. It's generated from code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't know that updated with PR's, thought it was only the main branch

Copy link
Collaborator Author

@ValueRaider ValueRaider Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't update on PRs, you have to generate it #1084

yfinance/utils.py Outdated Show resolved Hide resolved
yfinance/screener/query.py Show resolved Hide resolved
@R5dan
Copy link
Contributor

R5dan commented Jan 12, 2025

Could there be EquityQuery.screen which calls yf.screen on itself. I think that might be a bit cleaner in may places.

@R5dan
Copy link
Contributor

R5dan commented Jan 13, 2025

I think that in the query there should be a to modify elements for geography (where data can be returned from not what data is returned), almost like a head tag in HTML as for a particular query, so users can keep the actual query but easily change the market for example.

That way there could be is in like functionality to set the geography or to set it worldwide

}

@dynamic_docstring({"predefined_screeners": generate_list_table_from_dict_universal(PREDEFINED_SCREENER_QUERIES, bullets=True, title='Predefined queries (Dec-2024)')})
def screen(query: Union[str, QueryBase],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion this should probably be a type literal instead of string. For the majority of users, they will want to stick with a very specific set of options and (if any more are added) they will want to be very deliberate about it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this compatible with allowing users to submit a string not in PREDEFINED_SCREENER_QUERIES?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because 99.9% of users will only need the ones we define, so it should have a type of a string literal to help that, however because of the ease of implementation we shouldn't stop the 0.1% of use cases (which is why there shouldn't have to be an error).

Basically most users will only need to use certain predefined screeners so we type hint to tell them, these are valid screeners
However for those who know the api well, shouldn't be restricted based on ones we defined, especially if it is new, hence why not raising an error and only having a type error (which isnt even raised at runtime)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern with the string literal is it's long, 350 characters. Can you trial in your IDE?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested it, I personally dont mind having it as a raw string literal, but if you dont want that you could put it in a variable like PREDEFINED_SCREENER_LITERAL and then in the type hint it just shows that but still has the same "functionality" - where the IDE will prompt what to put in and the type checker will error if its not in it.

Comment on lines 126 to 127
if query not in PREDEFINED_SCREENER_QUERIES.keys():
raise ValueError(f'Invalid key {query} provided for predefined screener')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be a force param? This would default to False but allow any string instead of just those in PREDEFINED_SCREENER_QUERIES. I know this is contradictory to having query heavily typed but that is because 99.9% of users will only ever need to use the ones stated here but if a new predefined is made, they shouldn't have to wait for an update or be forced to update.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a proposed solution. That check is gone, and if a HTTPError occurs then cross-check against PREDEFINED_SCREENER_QUERIES to print a helpful message.

@ValueRaider ValueRaider force-pushed the feature/improve-screener-docs branch 2 times, most recently from 290fb93 to c942e5a Compare January 16, 2025 21:31
- screener now just a function, not a class
- add 'FundQuery' query class
- add 'IS-IN' operator
- fix 'GTE' & 'LTE' operators
- more exchanges
Predefined tweaks:
- convert predefined query strings to use EquityQuery (better docs)
- send predefined queries to Yahoo's predefined endpoint
- expose PREDEFINED_SCREENER_QUERIES via __init__.py
- screen() argument defaults don't apply to predefineds
@ValueRaider ValueRaider force-pushed the feature/improve-screener-docs branch from 831db6e to dee9a55 Compare January 17, 2025 22:22
@ValueRaider ValueRaider merged commit 3f6a46d into dev Jan 18, 2025
2 checks passed
@ValueRaider ValueRaider deleted the feature/improve-screener-docs branch January 18, 2025 11:03
@ValueRaider ValueRaider changed the title Improve Screener docs Improve Screener & docs Jan 18, 2025
@ValueRaider ValueRaider mentioned this pull request Jan 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants