Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic review pass #3

Open
wants to merge 6 commits into
base: review
Choose a base branch
from
Open

Conversation

cite-reader
Copy link

Each commit resolves a separate issue, with explanations in their commit messages.

The most important thing this provides is a list of the package's
dependencies. Version bounds are strongly recommended in the real world,
but this is enough to get started.
Everything in the quickstart's loop is there for a reason.

Most notably, the use of a `finally` block rather than a bare `except`
block. Bare `except` handles every exception, including things you
*really* don't want to catch like out-of-memory conditions. The
quickstart wants to _finalize_ the observer-stop its thread and reclaim
those resources-when exiting the scope it's supposed to be running, but
if the scope is being exited for an abnormal reason we shouldn't
continue on as though everything is fine. `finally` gives us those
semantics, while `except:` empathically does not.

Secondly, the observer thread can terminate for reasons other than the
main thread requesting it stop. Should this occur the original code
would loop infinitely; this new version will instead promptly exit.
Detecting and reporting such a condition is left as an exercise for the
reader.

As a side note, the printout that the observer stopped is moved to after
the thread join. In its original position it did not accurately reflect
the state of the world: the observer had only been _requested_ to stop.
We can only say that it _has_ stopped once we've observed control return
from the thread join.
I'm honestly surprised that overriding an instance method with a static
method works at all. It is at least surprising.
It's essentially universally accepted that global variables are not a
good practice, so here we pass them down to where they are needed as
parameters. In order to illustrate a motivation for the change, we
extend the capabilities of `OnMyWatch` slightly, allowing any number of
independent handlers to be registered.

While we're in there adding parameters, we hoist the creation of the S3
client from deep in the guts of the Handler class to the top level. This
is exactly the "inversion of control" or "dependency injection" pattern
familiar from other languages and gives the same benefits: since client
code determines the runtime type of the client, it is trivial to pass a
mock implementation for testing, and so on. In addition, reusing the
client allows the SDK to transparently reuse HTTPS connections as
appropriate, saving connection overhead.
The body of every conditional branch in the event handler was
essentially identical, so we can collapse the differences into the
contidional itself.

There's also a meaningful semantic change here, which I will insist is
correct based on my experience with file watchers: instead of firing on
modifications, this version fires when the file which was opened for
modifications is closed. In my experience, watchers that fire on every
call to `write(2)` are poorly behaved, because there are usually _many_
such calls involved in real-world edits and you end up in a race with
the program making the edits and it's really just not a good time.
_Also,_ at least on Linux with the inotify backend, a program which
chooses to `mmap` the file to modify it will not generate the modified
events at all. Waiting for the file to be closed does not _eliminate_
all race conditions, but it does minimize common ones.

Finally, I will simply note that this version retains the original
semantic bug regarding matching the file: because it uses a simple
substring match, a request to watch a file `foo` will also match
modifications to `foo_bar`. How this is best corrected depends on the
exact semantics of the watchdog package's interface, and I don't know
them.
This brings the advantage that `--help` works automatically. More
elaborate (and more helpful) argparse setups are left as an exercise.
self.observer.join()



class Handler(FileSystemEventHandler):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to add type hinting. The use of type-hinting seems to becoming a norm in Python through use of static checkers such as flake8, pyright and mypy. Besides that, using type hints definitely helps with debugging and maintenance to help readers of the code understand what data is flowing where. IDE's such as PyCharm (and VSCode, I think) will also warn you when it's obvious that the wrong data type is being applied to type hints - useful for your own development as well.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remembered that boto3 is using some kind of dynamic type stuff (it's been a couple of years) so usually I had to use # type: ignore to pacify mypy and move on.

On the up-side, using # type: ignore would indicate to your reviewers that you know about mypy and assume it will be used and (hopefully) are even using it yourself.

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

s3_bucket_name = ""
filename = ""

class OnMyWatch:
# Set the directory on watch
watchDirectory = "e:/Code/indellinent/test/"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pathlib.Path for easy path management?

)
if event.event_type == 'created' or event.event_type == 'closed' and self.target_filename in event.src_path:
self.s3.upload_file(event.src_path, self.bucket_name, self.target_filename)
print("Watchdog received %s event - %s." % (event.event_type, event.src_path))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use logging instead of print statements? (unless, of course, you are specifically wanting console output)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for this case I wanted console output, but that is good to know for future.

while True:
time.sleep(5)
except:
while self.observer.isAlive():

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a "running indefinitely" process add lots of logging to ensure useful diagnostic information?

loguru is an easy way to add really simple logging to your app, but it does diverge from using the standard logging library (it seems to be fairly popular though).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants