-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Screener & docs #2207
Improve Screener & docs #2207
Conversation
yfinance/screener/screener_query.py
Outdated
@@ -69,7 +77,9 @@ def __init__(self, operator: str, operand: Union[numbers.Real, str, List['Equity | |||
if len(operand) <= 0: | |||
raise ValueError('Invalid field for Screener') | |||
|
|||
if operator in {'OR','AND'}: | |||
if operator == 'IS-IN': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a is-in
operand?
Wouldn't it make more sense to make it a function as it is not an actual operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder about this from design perspective as well. I tried to retain as much "likeliness" as possible to Yahoo's operator types. Since there is no "is-in" operator in yahoo's own query, it could get unnecessarily difficult to debug / extend in the future. Is there a reason to take on that cost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was what I was trying to get across.
I think it would be better to have a function for is_in
if at all:
def is_in(key, *values):
operands = []
for oper in values:
operands.append(EquityQuery("EQ", oper))
return EquityQuery("OR", operands)
Similar to how the to_dict
works, just earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary goal is to simplify users' task of writing queries, and part of that is length.
EquityQuery('IS-IN', ['exchange', 'NMS', 'NYQ'])
vs
EquityQuery('OR', [EquityQuery('EQ, ['exchange', 'NMS'], EquityQuery('EQ, ['exchange', 'NYQ']])
is in
is Pythonic, yfinance is Pythonic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is in
is Pythonic, yfinance is Pythonic
Im not saying don't have an is-in but make it have a clear separation from EquityQuery
. A custom function or class, like how I defined some in my PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to this:
Is there a reason to take on that cost?
What's the cost? It's basically a one-liner now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's basically a one-liner now.
EquityQuery.in_in
is also a single line
What's the cost?
Might be misunderstanding but I thought @ericpien was agreeing with me on that statement as he was talking about the cost of debugging EquityQuery("in-in")
in the future. I think it is worth it as it keeps it similar as similar to yahoo as possible, it makes it easier to change or add a new one in the future. Also, currently it is in with the "basic" operands yahoo provides, which I feel is confusing as it is not one yahoo provides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be misunderstanding but I thought @ericpien was agreeing with me on that statement as he was talking about the cost of debugging EquityQuery("in-in") in the future.
Yes, I was referring to cost of debugging. For me, the question was more so a design philosophy one. I'm new to the yfinance project so not sure what the preference is in maintaining "likeness" of the source vs offering functionalities where query doesn't explicitly exist on the yahoo side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it is then possible to change the queries returned by the is-in
. If the user wants to change a particular part of it later they can, without having to reconstruct the whole thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yfinance has never been a minimal Yahoo query tool, it also formats the data, does a little cleaning too. Why not make interaction a little easier?
If the user wants to change a particular part of it later they can, without having to reconstruct the whole thing.
Let's wait for users to actually request this, after this PR gets released.
Is the advantage of this cleaner documentation?
It is meant to update just the provided portion of the body. i.e. user wants to repeatedly use the same |
It's the only reason, to shorten the docs. Next task is collapsing the long lists. Also now |
This is almost ready for review. I just need to think more about the actual execution of a query, my current refactoring still doesn't feel right. |
Can I ask why predefined screeners are still made and not just sent to the predefined endpoint? |
Fewer characters - generated docs are shorter. |
Fixed |
yfinance/screener/screener.py
Outdated
# Fetch | ||
_data = YfData(session=session) | ||
params_dict = {"corsDomain": "finance.yahoo.com", "formatted": "false", "lang": "en-US", "region": "US"} | ||
response = _data.post(_SCREENER_URL_, | ||
body=post_query, | ||
user_agent_headers=_data.user_agent_headers, | ||
params=params_dict, | ||
proxy=proxy) | ||
response.raise_for_status() | ||
return response.json()['finance']['result'][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally think this should be moved to a function and that all strings should be sent to the predefined screener url, that way it doesn't need updating if one of the screeners is updated or a new one is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all strings should be sent to the predefined screener url
How is this not already happening?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have created all of the predefined screeners. Personally I think that it should be something closer to:
def screen(query:'EquityQuery|str', ...):
if isinstance(query, str):
# Handle predefined
_data = YfData()
params_dict = {
"scrIds": query,
...
}
resp = _data.get(url=_PREDEFINED_URL_, params=params_dict, proxy=self.proxy)
resp.raise_for_status()
return resp.json()["finance"]["result"][0]
or something like this. There is an endpoint explicitly for predefined screeners, why not use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have created all of the predefined screeners.
Nope, that was @ericpien
Lines 534 to 535 in 5adddf3
PREDEFINED_SCREENER_BODY_MAP = { | |
'aggressive_small_caps': {"offset":0,"size":25,"sortField":"eodvolume","sortType":"desc","quoteType":"equity","query":{"operator":"and","operands":[{"operator":"or","operands":[{"operator":"eq","operands":["exchange","NMS"]},{"operator":"eq","operands":["exchange","NYQ"]}]},{"operator":"or","operands":[{"operator":"LT","operands":["epsgrowth.lasttwelvemonths",15]}]}]},"userId":"","userIdType":"guid"}, |
One thing I noticed when switching to predefined endpoint, is because only sending the name and not the other fields, then results are different - different sort type, different order. Probably why @ericpien hardcoded the entire query with fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predefined endpoint, is because only sending the name and not the other fields, then results are different - different sort type, different order
Thats one of the reasons I want to send the request to the other endpoint, a user shouldn't test something on yahoo and use it here only to find that they return 2 different things, with different functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest commit should resolve this (remember to review web docs not just code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remember to review web docs not just code
I didn't realize that PR's had docs? Or am I misunderstanding your comment?
Having looked at the doc strings for screen
I am happy with the implementation, just want a force
param as I have mentioned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This https://ranaroussi.github.io/yfinance/index.html. It's generated from code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't know that updated with PR's, thought it was only the main
branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't update on PRs, you have to generate it #1084
Could there be |
I think that in the query there should be a to modify elements for geography (where data can be returned from not what data is returned), almost like a head tag in HTML as for a particular query, so users can keep the actual query but easily change the market for example. That way there could be |
yfinance/screener/screener.py
Outdated
} | ||
|
||
@dynamic_docstring({"predefined_screeners": generate_list_table_from_dict_universal(PREDEFINED_SCREENER_QUERIES, bullets=True, title='Predefined queries (Dec-2024)')}) | ||
def screen(query: Union[str, QueryBase], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion this should probably be a type literal instead of string. For the majority of users, they will want to stick with a very specific set of options and (if any more are added) they will want to be very deliberate about it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this compatible with allowing users to submit a string not in PREDEFINED_SCREENER_QUERIES
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because 99.9% of users will only need the ones we define, so it should have a type of a string literal to help that, however because of the ease of implementation we shouldn't stop the 0.1% of use cases (which is why there shouldn't have to be an error).
Basically most users will only need to use certain predefined screeners so we type hint to tell them, these are valid screeners
However for those who know the api well, shouldn't be restricted based on ones we defined, especially if it is new, hence why not raising an error and only having a type error (which isnt even raised at runtime)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern with the string literal is it's long, 350 characters. Can you trial in your IDE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested it, I personally dont mind having it as a raw string literal, but if you dont want that you could put it in a variable like PREDEFINED_SCREENER_LITERAL
and then in the type hint it just shows that but still has the same "functionality" - where the IDE will prompt what to put in and the type checker will error if its not in it.
yfinance/screener/screener.py
Outdated
if query not in PREDEFINED_SCREENER_QUERIES.keys(): | ||
raise ValueError(f'Invalid key {query} provided for predefined screener') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be a force
param? This would default to False
but allow any string instead of just those in PREDEFINED_SCREENER_QUERIES
. I know this is contradictory to having query heavily typed but that is because 99.9% of users will only ever need to use the ones stated here but if a new predefined is made, they shouldn't have to wait for an update or be forced to update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a proposed solution. That check is gone, and if a HTTPError occurs then cross-check against PREDEFINED_SCREENER_QUERIES
to print a helpful message.
290fb93
to
c942e5a
Compare
- screener now just a function, not a class - add 'FundQuery' query class - add 'IS-IN' operator - fix 'GTE' & 'LTE' operators - more exchanges Predefined tweaks: - convert predefined query strings to use EquityQuery (better docs) - send predefined queries to Yahoo's predefined endpoint - expose PREDEFINED_SCREENER_QUERIES via __init__.py - screen() argument defaults don't apply to predefineds
831db6e
to
dee9a55
Compare
The main change is converting the predefined bodies to use
EquityQuery
. Also simplify the queries, Yahoo has made them longer than necessary. This will make Screener doc page shorter and clearer.@ericpien what is purpose of
Screener.patch_body()
? It's not documented well.