chore: random slack code cleanup #5307
Conversation
I found the name of this function confusing (given the contexts in which it was invoked) + it's only doing an equality check, so I don't think we need a separate function just for that?
engine/requirements.in
this is a random change, but we discussed it here (since it's only a one-liner change + this library isn't used, decided to just squeeze it in this PR)
```diff
- self._send_first_alert(alert, slack_channel)
+ self._post_alert_group_to_slack(
+     slack_team_identity=slack_team_identity,
+     alert_group=alert_group,
+     alert=alert,
+     attachments=alert_group.render_slack_attachments(),
+     slack_channel=slack_channel,
+     blocks=alert_group.render_slack_blocks(),
+ )
```
moved logic from `_send_first_alert` here (it was only invoked once, here, and was just passing through to `self._post_alert_group_to_slack`, as can be seen here)
````python
class IncomingAlertStep(scenario_step.ScenarioStep):
    def process_signal(self, alert: Alert) -> None:
        """
        Here lay dragons. Below is some added context as to why we have the explicit
        `AlertGroup.objects.filter(...).update(...)` calls.

        For example:
        ```
        num_updated_rows = AlertGroup.objects.filter(
            pk=alert.group.pk, slack_message_sent=False,
        ).update(slack_message_sent=True)

        if num_updated_rows == 1:
            # post new message
        else:
            # update existing message
        ```

        This piece of code guarantees that when 2 alerts are created concurrently we don't end up posting a
        message twice. We rely on the fact that the `UPDATE` SQL statements are atomic. This is not the same as:

        ```
        if not alert_group.slack_message_sent:
            # post new message
            alert_group.slack_message_sent = True
        else:
            # update existing message
        ```

        This would end up posting multiple Slack messages for a single alert group (classic race condition).
        And then all kinds of unexpected behaviours would arise, because some parts of the codebase assume
        there can only be 1 `SlackMessage` per `AlertGroup`.
        """
        alert_group = alert.group

        if not alert_group:
            # this case should hypothetically never happen; it's mostly to appease mypy with the
            # fact that alert.group can "technically" be None
            logger.warning(
                f"Skip IncomingAlertStep.process_signal because alert {alert.pk} doesn't actually "
                "have an alert group associated with it"
            )
            return

        alert_group_pk = alert_group.pk
        alert_receive_channel = alert_group.channel
        should_skip_escalation_in_slack = alert_group.skip_escalation_in_slack
        slack_team_identity = self.slack_team_identity
        slack_team_identity_pk = slack_team_identity.pk
````
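The atomic-`UPDATE` guard described in the docstring can be demonstrated outside Django. Below is a minimal, self-contained sketch (assumed schema and names, plain `sqlite3` standing in for the ORM): a conditional `UPDATE ... WHERE slack_message_sent = 0` acts as a compare-and-set, so exactly one of several concurrent workers observes a row count of 1 and gets to post the new message.

```python
import sqlite3
import threading

# Minimal sketch (assumed schema/names, sqlite3 instead of the Django ORM):
# a conditional UPDATE is an atomic compare-and-set, so only one of many
# concurrent workers "wins" the right to post the new Slack message.

db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE alert_group (pk INTEGER PRIMARY KEY, slack_message_sent INTEGER)")
db.execute("INSERT INTO alert_group VALUES (1, 0)")
db.commit()

db_lock = threading.Lock()  # serialize access to the shared sqlite connection
winners = []

def try_claim(worker_id: int) -> None:
    with db_lock:
        cur = db.execute(
            "UPDATE alert_group SET slack_message_sent = 1 "
            "WHERE pk = 1 AND slack_message_sent = 0"
        )
        db.commit()
        if cur.rowcount == 1:
            winners.append(worker_id)  # this worker posts the new message

threads = [threading.Thread(target=try_claim, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(winners))  # 1 -- exactly one worker won the compare-and-set
```

Django's `QuerySet.update()` similarly returns the number of rows matched, which is what the `num_updated_rows == 1` check in the docstring relies on.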
curious on others' thoughts around the naming change here? Personally `AlertShootingStep` (🔫) was not super clear to me on what it did.

Also, the main changes here are just extracting things out to variables + added some more detailed comments about why things are the way they are here (thanks to @Ferril and @vadimkerr for this conversation ❤️)
yes, `IncomingAlertStep` sounds much better IMO
```python
class ReadEditPostmortemStep(ResolutionNoteModalStep):
    # Left for backward compatibility with slack messages created before postmortems -> resolution note change
    pass
```
probably okay to drop this? I don't see any references to this over the last 6 weeks (logs)
`mock_start_disable_maintenance_task` isn't needed; these tests still pass without it, so deleting this fixture
**Before:**

```python
with patch("apps.alerts.tasks.notify_user.compare_escalations", return_value=True):
    # send notification for 2 active alert groups
    send_bundled_notification(notification_bundle.id)
    assert f"alert_group {alert_group_3.id} is not active, skip notification" in caplog.text
    assert "perform bundled notification for alert groups with ids:" in caplog.text
    # check bundle_uuid was set, notification for resolved alert group was deleted
    assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
    assert notification_bundle.notifications.all().count() == 2
    assert not notification_bundle.notifications.filter(alert_group=alert_group_3).exists()

    # send notification for 1 active alert group
    notification_bundle.notifications.update(bundle_uuid=None)
    alert_group_2.resolve()
    send_bundled_notification(notification_bundle.id)
    assert f"alert_group {alert_group_2.id} is not active, skip notification" in caplog.text
    assert (
        f"there is only one alert group in bundled notification, perform regular notification. "
        f"alert_group {alert_group_1.id}"
    ) in caplog.text
    # check bundle_uuid was set
    assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
    assert notification_bundle.notifications.all().count() == 1
    # cleanup notifications
    notification_bundle.notifications.all().delete()

    # send notification for 0 active alert group
    notification_bundle.append_notification(alert_group_1, notification_policy)
    alert_group_1.resolve()
    send_bundled_notification(notification_bundle.id)
    assert f"alert_group {alert_group_1.id} is not active, skip notification" in caplog.text
    assert f"no alert groups to notify about or notification is not allowed for user {user.id}" in caplog.text
    # check all notifications were deleted
    assert notification_bundle.notifications.all().count() == 0
```

**After:**

```python
# send notification for 2 active alert groups
send_bundled_notification.apply((notification_bundle.id,), task_id=task_id)

assert f"alert_group {alert_group_3.id} is not active, skip notification" in caplog.text
assert "perform bundled notification for alert groups with ids:" in caplog.text

# check bundle_uuid was set, notification for resolved alert group was deleted
assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
assert notification_bundle.notifications.all().count() == 2
assert not notification_bundle.notifications.filter(alert_group=alert_group_3).exists()

# send notification for 1 active alert group
notification_bundle.notifications.update(bundle_uuid=None)

# since we're calling send_bundled_notification several times within this test, we need to reset task_id
# because it gets set to None after the first call
notification_bundle.notification_task_id = task_id
notification_bundle.save()

alert_group_2.resolve()

send_bundled_notification.apply((notification_bundle.id,), task_id=task_id)

assert f"alert_group {alert_group_2.id} is not active, skip notification" in caplog.text
assert (
    f"there is only one alert group in bundled notification, perform regular notification. "
    f"alert_group {alert_group_1.id}"
) in caplog.text

# check bundle_uuid was set
assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
assert notification_bundle.notifications.all().count() == 1

# cleanup notifications
notification_bundle.notifications.all().delete()

# send notification for 0 active alert group
notification_bundle.append_notification(alert_group_1, notification_policy)

# since we're calling send_bundled_notification several times within this test, we need to reset task_id
# because it gets set to None after the first call
notification_bundle.notification_task_id = task_id
notification_bundle.save()

alert_group_1.resolve()

send_bundled_notification.apply((notification_bundle.id,), task_id=task_id)

assert f"alert_group {alert_group_1.id} is not active, skip notification" in caplog.text
assert f"no alert groups to notify about or notification is not allowed for user {user.id}" in caplog.text

# check all notifications were deleted
assert notification_bundle.notifications.all().count() == 0
```
because I removed the `compare_escalations` function, I need to reset the `notification_task_id` value after each invocation of `send_bundled_notification` (an alternative here would be to move these different test cases out to separate functions)
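A hedged sketch of why the reset is needed (assumed guard logic, not OnCall's actual task code): if the bundled-notification task only runs when its task id matches the one stored on the bundle, and clears the stored id once consumed, each re-invocation in a test needs the id to be put back first.

```python
# Hedged sketch (assumed guard logic, not OnCall's actual task code): the task
# runs only when its task id matches the one stored on the bundle, and clears
# the stored id once consumed -- so a test re-invoking it must reset the id.

class Bundle:
    def __init__(self, task_id):
        self.notification_task_id = task_id

def send_bundled_notification(bundle, current_task_id):
    if bundle.notification_task_id != current_task_id:
        return "skipped: stale task"
    bundle.notification_task_id = None  # consumed by this run
    return "notified"

bundle = Bundle("task-1")
print(send_bundled_notification(bundle, "task-1"))  # notified
print(send_bundled_notification(bundle, "task-1"))  # skipped: stale task
bundle.notification_task_id = "task-1"              # what the test resets
print(send_bundled_notification(bundle, "task-1"))  # notified
```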
`mock_start_disable_maintenance_task` not used
```python
# send debug mode notice
text = "Escalations are silenced due to Debug mode"
self._slack_client.chat_postMessage(
    channel=slack_channel_id,
    text=text,
    attachments=[],
    thread_ts=alert_group.slack_message.slack_id,
    mrkdwn=True,
    blocks=[
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": text,
            },
        },
    ],
)
```
same thing here, moved `_send_debug_mode_notice` inline because it's only called once (here) and is simply making a call to `self._slack_client.chat_postMessage`
**Before:**

```python
if alert.group.is_maintenance_incident:
    # not sending log report message for maintenance incident
    pass
else:
    # check if alert group was posted to slack before posting message to thread
    if not alert.group.skip_escalation_in_slack:
        self._send_message_to_thread_if_bot_not_in_channel(alert.group, slack_channel)
```

**After:**

```python
if not alert_group.is_maintenance_incident and not should_skip_escalation_in_slack:
    send_message_to_thread_if_bot_not_in_channel.apply_async(
        (alert_group_pk, slack_team_identity_pk, slack_channel_id),
        countdown=1,  # delay for message so that the log report is published first
    )
```
simplified nested conditional logic here + moved `self._send_message_to_thread_if_bot_not_in_channel` inline (it was just making a call to `send_message_to_thread_if_bot_not_in_channel.apply_async`)
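The flattening can be sanity-checked exhaustively. This small sketch (with hypothetical helper names, not project code) confirms the old nested conditional and the new single condition agree on all four input combinations:

```python
from itertools import product

# Quick exhaustive check (hypothetical helper names) that the flattened
# condition matches the original nested conditional for every input.

def old_logic(is_maintenance: bool, skip_escalation: bool) -> bool:
    # original nested form
    if is_maintenance:
        # not sending log report message for maintenance incident
        return False
    # check if alert group was posted to slack before posting message to thread
    return not skip_escalation

def new_logic(is_maintenance: bool, skip_escalation: bool) -> bool:
    # flattened form used in the PR
    return not is_maintenance and not skip_escalation

assert all(
    old_logic(m, s) == new_logic(m, s) for m, s in product([True, False], repeat=2)
)
print("equivalent")
```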
```diff
- self._slack_client.reactions_remove(channel=message.slack_channel_id, name="memo", timestamp=message.ts)
+ self._slack_client.reactions_remove(
+     channel=message.slack_channel_slack_id, name="memo", timestamp=message.ts
+ )
```
this was a typo. `ResolutionNoteSlackMessage` has a `slack_channel_slack_id` property which points to the Slack ID of the associated channel (ex. `C123FB4X`); `slack_channel_id` will actually point to the primary key ID of the associated `slack_channel` database record (ex. `14`)
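To illustrate the distinction, here is a hedged sketch with made-up classes (not the real Django models): on a ForeignKey-backed message, the `<field>_id`-style value holds the related row's database primary key, while the identifier the Slack API actually expects lives on the related channel object.

```python
from dataclasses import dataclass

# Hedged illustration (made-up classes, not the real Django models) of why
# the two attribute names resolve to different values.

@dataclass
class SlackChannel:
    pk: int        # database primary key, e.g. 14
    slack_id: str  # Slack's own channel ID, e.g. "C123FB4X"

@dataclass
class ResolutionNoteSlackMessage:
    slack_channel: SlackChannel

    @property
    def slack_channel_id(self) -> int:
        # the FK-style `<field>_id` value: the related row's primary key
        return self.slack_channel.pk

    @property
    def slack_channel_slack_id(self) -> str:
        # the value the Slack API actually expects as `channel`
        return self.slack_channel.slack_id

msg = ResolutionNoteSlackMessage(SlackChannel(pk=14, slack_id="C123FB4X"))
print(msg.slack_channel_id, msg.slack_channel_slack_id)  # 14 C123FB4X
```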
```diff
- self._slack_client.chat_delete(channel=message.slack_channel_id, ts=message.ts)
+ self._slack_client.chat_delete(channel=message.slack_channel_slack_id, ts=message.ts)
```
this was a typo. `ResolutionNoteSlackMessage` has a `slack_channel_slack_id` property which points to the Slack ID of the associated channel (ex. `C123FB4X`); `slack_channel_id` will actually point to the primary key ID of the associated `slack_channel` database record (ex. `14`)
LGTM
```python
def process_signal(self, alert: Alert) -> None:
    """
    🐉 Here lay dragons 🐉
```
🐲 👍
LGTM 👍
…5315)

# What this PR does

Inspired by [this discussion](#5307 (comment)).

_tldr;_ ensures that if any of our tests try making an external network call, they will fail.

Setup an example test:

```python
def test_external_network_call():
    import requests
    response = requests.get('https://www.example.com')
    assert response.status_code == 200
```

and it worked (failed; [example CI test run](https://github.com/grafana/oncall/actions/runs/12106416991/job/33752144727?pr=5315#step:6:389)) as expected (traceback trimmed):

```bash
__________________________ test_external_network_call __________________________

    def test_external_network_call():
        import requests
>       response = requests.get('https://www.example.com')

apps/test_joey.py:4:
...
    def guarded_connect(inst, *args):
        host = host_from_connect_args(args)
        if host in allowed_ip_hosts_and_hostnames or (
            _is_unix_socket(inst.family) and allow_unix_socket
        ):
            return _true_connect(inst, *args)
>       raise SocketConnectBlockedError(allowed_list, host)
E       pytest_socket.SocketConnectBlockedError: A test tried to use socket.socket.connect() with host "93.184.215.14" (allowed: "calendar.google.com (…),localhost (127.0.0.1,::1),oncall-dev-mariadb ()").

/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/pytest_socket.py:252: SocketConnectBlockedError
```

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not required)
- [x] Added the relevant release notes label (see labels prefixed w/ `release:`). These labels dictate how your PR will show up in the autogenerated release notes.
# What this PR does

Related to #5287

Few random "clean-ups", type improvements, etc.

Additionally, fixes a change made in #5292; we should wait to read from `slack_message.channel.slack_id` until we've performed the data migration mentioned in that PR (in the meantime we should continue to use `slack_message._channel_id`).

## Checklist

- Unit, integration, and e2e (if applicable) tests updated
- Documentation added (or `pr:no public docs` PR label added if not required)
- Added the relevant release notes label (see labels prefixed w/ `release:`). These labels dictate how your PR will show up in the autogenerated release notes.