-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue-435] Job Argument filter for ZyteJobsComparisonMonitor #461
base: master
Are you sure you want to change the base?
Conversation
Hi @shafiq-muhammad! The current tests are failing because we need to mock new things or mock them in a different way. Take for example the test ============================================================================================ FAILURES ============================================================================================
_____________________________________________________________________________ test_jobs_comparison_monitor_get_jobs ______________________________________________________________________________
def test_jobs_comparison_monitor_get_jobs():
mock_client = Mock()
with patch(
"spidermon.contrib.scrapy.monitors.monitors.Client"
) as mock_client_class:
mock_client_class.return_value = mock_client
monitor = TestZyteJobsComparisonMonitor()
monitor._get_tags_to_filter = Mock(side_effect=lambda: None)
monitor.data = Mock()
monitor.crawler.settings.getlist.return_value = None
mock_client.spider.jobs.list = Mock(side_effect=get_paginated_jobs)
# Return exact number of jobs
> jobs = monitor._get_jobs(states=None, number_of_jobs=50)
tests/contrib/scrapy/monitors/test_jobs_comparison_monitor.py:166:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
spidermon/contrib/scrapy/monitors/monitors.py:594: in _get_jobs
if not self._has_desired_args(job, args):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <MONITOR:(TestZyteJobsComparisonMonitor/runTest) at 0x108fe9ea0>, job = <Mock id='4445866160'>, args = []
def _has_desired_args(self, job, args):
if not args and not job.get("spider_args"):
return True
> job_args = job["spider_args"].keys()
E TypeError: 'Mock' object is not subscriptable As seen at the end, it says, "'Mock' object is not subscribable." The act of "subscribing" an object is when we do Investigating where we are mocking the job, we can see that happens when calling This is defined like this earlier in the file: def get_paginated_jobs(**kwargs):
return [Mock() for _ in range(kwargs["count"])] So essentially, Now, we need to prevent def get_paginated_jobs(**kwargs):
mocked_job_meta = []
for _ in range(kwargs["count"]):
jmock = MagicMock()
def mock_getitem(key):
if key == "spider_args":
return {"arg1": "value1"}
jmock.__getitem__.side_effect = mock_getitem
mocked_job_meta.append(jmock)
return mocked_job_meta But since the actual result of def get_paginated_jobs(**kwargs):
mocked_job_meta = []
for _ in range(kwargs["count"]):
mocked_job_meta.append({"spider_args": {"arg1": "value1"}})
return mocked_job_meta That should fix the first part of the test. The rest doesn't use I hope this gives you a hint on how to fix the rest of the tests and add new ones. Let me know how it goes! |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #461 +/- ##
==========================================
+ Coverage 79.54% 79.69% +0.14%
==========================================
Files 76 76
Lines 3242 3260 +18
Branches 539 325 -214
==========================================
+ Hits 2579 2598 +19
Misses 593 593
+ Partials 70 69 -1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @shafiq-muhammad!
I realized the filtering logic is not quite what is expected, but I think it can be adapted 👍
The job that will have all the desired arguments will be processed. | ||
Example ["debug_url"] or ["is_full_crawl"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue?: This is not exactly the intention of the original ticket, sorry I didn't realize this before 🙇♀️
We not only want to check that the arguments listed in SPIDERMON_JOBS_COMPARISON_ARGUMENTS appear in the jobs we'll use as comparison, but that the values are the same as well.
Following this example, if the job where we are running the monitor was called with the argument is_full_crawl=1
(and "is_full_crawl" is in SPIDERMON_JOBS_COMPARISON_ARGUMENTS), we want to only consider the past jobs that have the argument "is_full_crawl" with a value of "1". If some past job has the spider argument "is_full_crawl" with a value of "0" instead we'll want to skip it.
In the case that the current job doesn't have that argument (even if it's listed in SPIDERMON_JOBS_COMPARISON_ARGUMENTS), we might want to compare them against past jobs that don't have that argument as well. We could consider that case as having that spider argument set to None in a sense.
Job argument filter to get the past jobs with desired arguments is added for ZyteJobsComparisonMonitor.
SPIDERMON_JOBS_COMPARISON_ARGUMENTS
setting will be used to provide the list of desired arguments.