-
Notifications
You must be signed in to change notification settings - Fork 37
Add Pipeline Scheduler #1522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add Pipeline Scheduler #1522
Conversation
❌ 10/16 passed, 6 failed, 1 skipped, 1m11s total ❌ test_days_since_last_run: TypeError: unsupported operand type(s) for //: 'datetime.datetime' and 'int' (2ms)
❌ test_hours_since_last_run: TypeError: unsupported operand type(s) for //: 'datetime.datetime' and 'int' (13ms)
❌ test_pipeline_scheduler: FileNotFoundError: Credentials file not found at /home/runner/.databricks/labs/remorph/.credentials.yml (22ms)
❌ test_run_python_failure_pipeline: Failed: DID NOT RAISE (254ms)
❌ test_run_sql_failure_pipeline: Failed: DID NOT RAISE (1.489s)
❌ test_gets_maven_version: AttributeError: 'NoneType' object has no attribute 'split' (1m0.18s)
Running from acceptance #488 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good few nits
3. Next, load the process by executing the following command: | ||
|
||
```bash | ||
$ launchctl load ~/Library/LaunchAgents/com.remorph.usagecollection.plist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this work, in sequio and above launchctl bootstrap (gui/$id) <>
is the way to go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to work. I've been running this process on my Mac for over a month now. Happy to update to another approach if you feel it's more appropriate.
@@ -28,16 +35,26 @@ def execute(self): | |||
for step in self.config.steps: | |||
if step.flag == "active": | |||
logging.debug(f"Executing step: {step.name}") | |||
self._execute_step(step) | |||
self.execute_step(step) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you expect to people to run execute
and specific step execute_step
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I meant to sync with you. I had to make this a public function so that the scheduler or another process can parse the pipeline config and execute the step based upon the defined frequency. What do you think? I know you made this private so that the pipeline class can execute entirely. But there needs to be a hook for the scheduler class or the pipeline class needs to parse the scheduled frequency info.
<key>ProgramArguments</key> | ||
<array> | ||
<string>/Library/Frameworks/Python.framework/Versions/3.11/bin/python3</string> | ||
<string>~/Downloads/remorph/scheduler/usage_collector.py</string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be $HOME/.databricks/labs/remorph/....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good start, and I understand its purpose. That said, there are still some rough edges to smooth out.
On the installation side:
-
We need to be a bit clearer on the pre-requisites for both Linux and macOS.
- Linux: systemd
- macOS: Although we can assume that python is available somewhere, referring to it in the launch agent is an issue. (We can't use /usr/bin/python3: that's Python 3.9 which we don't test against or support.) It should be possible to find it by doing the equivalent of
/usr/bin/env python3.11
(an old#!
trick for Python scripts).
-
The install files all assume that remorph has been installed in a specific location due to fixed paths. The user probably has to edit the files.
-
With all this in mind, I think that rather than having these instructions it might be better to have an installer script. To allow progress I'm happy to accept this as-is (if the instructions are clear for now) and iterate to implement this.
-
Uninstallation is also important: people won't want this forever.
On the Python side, I think it mostly looks fine although I'd really prefer to see timezone-aware timestamps being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These instructions are clear, although I think that we also need to describe how to remove it and clean up afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point. I didn't even think about uninstallation.
4. Grant the usage collection script with execution permissions: | ||
|
||
```bash | ||
$ chmod +x ./src/databricks/labs/remorph/assessments/scheduler/usage_collector.py | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be earlier… step 2?
Otherwise I think the first run will already be triggered (and fail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. Thanks
|
||
1. Open a new Terminal window and navigate to the remorph repository | ||
|
||
2. Copy the plist file to `~/Library/LaunchAgents/` for a user-specific task or `/Library/LaunchDaemons/` for a system-wide task: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most users won't know the difference, I don't think?
My advice here is to advise one of these, or help the user choose.
(My preference: user-specific, given the rest of the instructions assume this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, totally agree. I'll update the PR to install only a user-specific agent.
User=root | ||
|
||
[Install] | ||
WantedBy=multi-user.target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EOL at EOF.
|
||
<key>ProgramArguments</key> | ||
<array> | ||
<string>/Library/Frameworks/Python.framework/Versions/3.11/bin/python3</string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't something you can assume, unfortunately:
% ls -l /Library/Frameworks/Python.framework/Versions/3.11/bin/python3
/Library/Frameworks/Python.framework/Versions/3.11/bin/python3: No such file or directory
% sw_vers
ProductName: macOS
ProductVersion: 15.4.1
BuildVersion: 24E263
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, great catch. Will update this to be dynamic.
Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
Co-authored-by: Andrew Snare <[email protected]>
Co-authored-by: Andrew Snare <[email protected]>
Co-authored-by: Andrew Snare <[email protected]>
Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
…eduler.py Co-authored-by: Andrew Snare <[email protected]>
@goodwillpunning there are some unsigned commits at the beginning, which means you'll have to recreate this one :/ |
This commit adds system files for scheduling EDW usage collection in Mac and Linux operating systems and a new class,
PipelineScheduler
, which parses a pipeline definition and executes steps based upon the defined frequency.