chore: random slack code cleanup #5307
Conversation
I found the name of this function confusing (given the contexts in which it was invoked) + it's only doing an equality check, so I don't think we need a separate function just for that?
engine/requirements.in
this is a random change, but we discussed it here (since it's only a one-liner change + this library isn't used, decided to just squeeze it in this PR)
```diff
- self._send_first_alert(alert, slack_channel)
+ self._post_alert_group_to_slack(
+     slack_team_identity=slack_team_identity,
+     alert_group=alert_group,
+     alert=alert,
+     attachments=alert_group.render_slack_attachments(),
+     slack_channel=slack_channel,
+     blocks=alert_group.render_slack_blocks(),
+ )
```
moved logic from `_send_first_alert` here (it was only invoked once, here, and was just passing through to `self._post_alert_group_to_slack`, as can be seen here)
````python
class IncomingAlertStep(scenario_step.ScenarioStep):
    def process_signal(self, alert: Alert) -> None:
        """
        Here lay dragons. Below is some added context as to why we have the explicit
        `AlertGroup.objects.filter(...).update(...)` calls.

        For example:
        ```
        num_updated_rows = AlertGroup.objects.filter(
            pk=alert.group.pk, slack_message_sent=False,
        ).update(slack_message_sent=True)

        if num_updated_rows == 1:
            # post new message
        else:
            # update existing message
        ```

        This piece of code guarantees that when 2 alerts are created concurrently we don't end up posting a
        message twice. We rely on the fact that the `UPDATE` SQL statements are atomic. This is not the same as:

        ```
        if not alert_group.slack_message_sent:
            # post new message
            alert_group.slack_message_sent = True
        else:
            # update existing message
        ```

        This would end up posting multiple Slack messages for a single alert group (classic race condition).
        And then all kinds of unexpected behaviours would arise, because some parts of the codebase assume
        there can only be 1 `SlackMessage` per `AlertGroup`.
        """
        alert_group = alert.group

        if not alert_group:
            # this case should hypothetically never happen; it's mostly to appease mypy with the
            # fact that alert.group can "technically" be None
            logger.warning(
                f"Skip IncomingAlertStep.process_signal because alert {alert.pk} doesn't actually "
                "have an alert group associated with it"
            )
            return

        alert_group_pk = alert_group.pk
        alert_receive_channel = alert_group.channel
        should_skip_escalation_in_slack = alert_group.skip_escalation_in_slack
        slack_team_identity = self.slack_team_identity
        slack_team_identity_pk = slack_team_identity.pk
````
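The atomic-`UPDATE` guard described in the docstring can be demonstrated outside Django. Below is a minimal, self-contained sketch (assumed schema and names, plain `sqlite3` standing in for the ORM): a conditional `UPDATE ... WHERE slack_message_sent = 0` acts as a compare-and-set, so exactly one of several concurrent workers observes a row count of 1 and gets to post the new message.

```python
import sqlite3
import threading

# Minimal sketch (assumed schema/names, sqlite3 instead of the Django ORM):
# a conditional UPDATE is an atomic compare-and-set, so only one of many
# concurrent workers "wins" the right to post the new Slack message.

db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE alert_group (pk INTEGER PRIMARY KEY, slack_message_sent INTEGER)")
db.execute("INSERT INTO alert_group VALUES (1, 0)")
db.commit()

db_lock = threading.Lock()  # serialize access to the shared sqlite connection
winners = []

def try_claim(worker_id: int) -> None:
    with db_lock:
        cur = db.execute(
            "UPDATE alert_group SET slack_message_sent = 1 "
            "WHERE pk = 1 AND slack_message_sent = 0"
        )
        db.commit()
        if cur.rowcount == 1:
            winners.append(worker_id)  # this worker posts the new message

threads = [threading.Thread(target=try_claim, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(winners))  # 1 -- exactly one worker won the compare-and-set
```

Django's `QuerySet.update()` similarly returns the number of rows matched, which is what the `num_updated_rows == 1` check in the docstring relies on.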
curious on others' thoughts around the naming change here? Personally `AlertShootingStep` (🔫) was not super clear to me on what it did.

Also, the main changes here are just extracting things out to variables + added some more detailed comments about why things are the way they are here (thanks to @Ferril and @vadimkerr for this conversation ❤️)
yes, `IncomingAlertStep` sounds much better IMO
```python
class ReadEditPostmortemStep(ResolutionNoteModalStep):
    # Left for backward compatibility with slack messages created before postmortems -> resolution note change
    pass
```
probably okay to drop this? I don't see any references to this over the last 6 weeks (logs)
`mock_start_disable_maintenance_task` isn't needed; these tests still pass without it, so deleting this fixture
**Before:**

```python
with patch("apps.alerts.tasks.notify_user.compare_escalations", return_value=True):
    # send notification for 2 active alert groups
    send_bundled_notification(notification_bundle.id)
    assert f"alert_group {alert_group_3.id} is not active, skip notification" in caplog.text
    assert "perform bundled notification for alert groups with ids:" in caplog.text
    # check bundle_uuid was set, notification for resolved alert group was deleted
    assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
    assert notification_bundle.notifications.all().count() == 2
    assert not notification_bundle.notifications.filter(alert_group=alert_group_3).exists()

    # send notification for 1 active alert group
    notification_bundle.notifications.update(bundle_uuid=None)
    alert_group_2.resolve()
    send_bundled_notification(notification_bundle.id)
    assert f"alert_group {alert_group_2.id} is not active, skip notification" in caplog.text
    assert (
        f"there is only one alert group in bundled notification, perform regular notification. "
        f"alert_group {alert_group_1.id}"
    ) in caplog.text
    # check bundle_uuid was set
    assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
    assert notification_bundle.notifications.all().count() == 1
    # cleanup notifications
    notification_bundle.notifications.all().delete()

    # send notification for 0 active alert group
    notification_bundle.append_notification(alert_group_1, notification_policy)
    alert_group_1.resolve()
    send_bundled_notification(notification_bundle.id)
    assert f"alert_group {alert_group_1.id} is not active, skip notification" in caplog.text
    assert f"no alert groups to notify about or notification is not allowed for user {user.id}" in caplog.text
    # check all notifications were deleted
    assert notification_bundle.notifications.all().count() == 0
```

**After:**

```python
# send notification for 2 active alert groups
send_bundled_notification.apply((notification_bundle.id,), task_id=task_id)

assert f"alert_group {alert_group_3.id} is not active, skip notification" in caplog.text
assert "perform bundled notification for alert groups with ids:" in caplog.text

# check bundle_uuid was set, notification for resolved alert group was deleted
assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
assert notification_bundle.notifications.all().count() == 2
assert not notification_bundle.notifications.filter(alert_group=alert_group_3).exists()

# send notification for 1 active alert group
notification_bundle.notifications.update(bundle_uuid=None)

# since we're calling send_bundled_notification several times within this test, we need to reset task_id
# because it gets set to None after the first call
notification_bundle.notification_task_id = task_id
notification_bundle.save()

alert_group_2.resolve()

send_bundled_notification.apply((notification_bundle.id,), task_id=task_id)

assert f"alert_group {alert_group_2.id} is not active, skip notification" in caplog.text
assert (
    f"there is only one alert group in bundled notification, perform regular notification. "
    f"alert_group {alert_group_1.id}"
) in caplog.text

# check bundle_uuid was set
assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
assert notification_bundle.notifications.all().count() == 1

# cleanup notifications
notification_bundle.notifications.all().delete()

# send notification for 0 active alert group
notification_bundle.append_notification(alert_group_1, notification_policy)

# since we're calling send_bundled_notification several times within this test, we need to reset task_id
# because it gets set to None after the first call
notification_bundle.notification_task_id = task_id
notification_bundle.save()

alert_group_1.resolve()

send_bundled_notification.apply((notification_bundle.id,), task_id=task_id)

assert f"alert_group {alert_group_1.id} is not active, skip notification" in caplog.text
assert f"no alert groups to notify about or notification is not allowed for user {user.id}" in caplog.text

# check all notifications were deleted
assert notification_bundle.notifications.all().count() == 0
```
because I removed the `compare_escalations` function, I need to reset the `notification_task_id` value after each invocation of `send_bundled_notification` (an alternative here would be to move these different test cases out to separate functions)
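A hedged sketch of why the reset is needed (assumed guard logic, not OnCall's actual task code): if the bundled-notification task only runs when its task id matches the one stored on the bundle, and clears the stored id once consumed, each re-invocation in a test needs the id to be put back first.

```python
# Hedged sketch (assumed guard logic, not OnCall's actual task code): the task
# runs only when its task id matches the one stored on the bundle, and clears
# the stored id once consumed -- so a test re-invoking it must reset the id.

class Bundle:
    def __init__(self, task_id):
        self.notification_task_id = task_id

def send_bundled_notification(bundle, current_task_id):
    if bundle.notification_task_id != current_task_id:
        return "skipped: stale task"
    bundle.notification_task_id = None  # consumed by this run
    return "notified"

bundle = Bundle("task-1")
print(send_bundled_notification(bundle, "task-1"))  # notified
print(send_bundled_notification(bundle, "task-1"))  # skipped: stale task
bundle.notification_task_id = "task-1"              # what the test resets
print(send_bundled_notification(bundle, "task-1"))  # notified
```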
`mock_start_disable_maintenance_task` not used
```python
# send debug mode notice
text = "Escalations are silenced due to Debug mode"
self._slack_client.chat_postMessage(
    channel=slack_channel_id,
    text=text,
    attachments=[],
    thread_ts=alert_group.slack_message.slack_id,
    mrkdwn=True,
    blocks=[
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": text,
            },
        },
    ],
)
```
same thing here, moved `_send_debug_mode_notice` inline because it's only called once (here) and is simply making a call to `self._slack_client.chat_postMessage`
**Before:**

```python
if alert.group.is_maintenance_incident:
    # not sending log report message for maintenance incident
    pass
else:
    # check if alert group was posted to slack before posting message to thread
    if not alert.group.skip_escalation_in_slack:
        self._send_message_to_thread_if_bot_not_in_channel(alert.group, slack_channel)
```

**After:**

```python
if not alert_group.is_maintenance_incident and not should_skip_escalation_in_slack:
    send_message_to_thread_if_bot_not_in_channel.apply_async(
        (alert_group_pk, slack_team_identity_pk, slack_channel_id),
        countdown=1,  # delay for message so that the log report is published first
    )
```
simplified nested conditional logic here + moved `self._send_message_to_thread_if_bot_not_in_channel` inline (it was just making a call to `send_message_to_thread_if_bot_not_in_channel.apply_async`)
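The flattening can be sanity-checked exhaustively. This small sketch (with hypothetical helper names, not project code) confirms the old nested conditional and the new single condition agree on all four input combinations:

```python
from itertools import product

# Quick exhaustive check (hypothetical helper names) that the flattened
# condition matches the original nested conditional for every input.

def old_logic(is_maintenance: bool, skip_escalation: bool) -> bool:
    # original nested form
    if is_maintenance:
        # not sending log report message for maintenance incident
        return False
    # check if alert group was posted to slack before posting message to thread
    return not skip_escalation

def new_logic(is_maintenance: bool, skip_escalation: bool) -> bool:
    # flattened form used in the PR
    return not is_maintenance and not skip_escalation

assert all(
    old_logic(m, s) == new_logic(m, s) for m, s in product([True, False], repeat=2)
)
print("equivalent")
```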
```diff
- self._slack_client.reactions_remove(channel=message.slack_channel_id, name="memo", timestamp=message.ts)
+ self._slack_client.reactions_remove(
+     channel=message.slack_channel_slack_id, name="memo", timestamp=message.ts
+ )
```
this was a typo. `ResolutionNoteSlackMessage` has a `slack_channel_slack_id` property which points to the Slack ID of the associated channel (ex. `C123FB4X`); `slack_channel_id` will actually point to the primary key ID of the associated `slack_channel` database record (ex. `14`)
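To illustrate the distinction, here is a hedged sketch with made-up classes (not the real Django models): on a ForeignKey-backed message, the `<field>_id`-style value holds the related row's database primary key, while the identifier the Slack API actually expects lives on the related channel object.

```python
from dataclasses import dataclass

# Hedged illustration (made-up classes, not the real Django models) of why
# the two attribute names resolve to different values.

@dataclass
class SlackChannel:
    pk: int        # database primary key, e.g. 14
    slack_id: str  # Slack's own channel ID, e.g. "C123FB4X"

@dataclass
class ResolutionNoteSlackMessage:
    slack_channel: SlackChannel

    @property
    def slack_channel_id(self) -> int:
        # the FK-style `<field>_id` value: the related row's primary key
        return self.slack_channel.pk

    @property
    def slack_channel_slack_id(self) -> str:
        # the value the Slack API actually expects as `channel`
        return self.slack_channel.slack_id

msg = ResolutionNoteSlackMessage(SlackChannel(pk=14, slack_id="C123FB4X"))
print(msg.slack_channel_id, msg.slack_channel_slack_id)  # 14 C123FB4X
```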
```diff
- self._slack_client.chat_delete(channel=message.slack_channel_id, ts=message.ts)
+ self._slack_client.chat_delete(channel=message.slack_channel_slack_id, ts=message.ts)
```
this was a typo. `ResolutionNoteSlackMessage` has a `slack_channel_slack_id` property which points to the Slack ID of the associated channel (ex. `C123FB4X`); `slack_channel_id` will actually point to the primary key ID of the associated `slack_channel` database record (ex. `14`)
LGTM
```python
def process_signal(self, alert: Alert) -> None:
    """
    🐉 Here lay dragons 🐉
```
🐲 👍
LGTM 👍
…5315)

# What this PR does

Inspired by [this discussion](#5307 (comment)).

_tldr;_ ensures that if any of our tests try making an external network call, they will fail.

Setup an example test:

```python
def test_external_network_call():
    import requests
    response = requests.get('https://www.example.com')
    assert response.status_code == 200
```

and it worked (failed; [example CI test run](https://github.com/grafana/oncall/actions/runs/12106416991/job/33752144727?pr=5315#step:6:389)) as expected (traceback trimmed):

```bash
__________________________ test_external_network_call __________________________

    def test_external_network_call():
        import requests
>       response = requests.get('https://www.example.com')

apps/test_joey.py:4:
...
    def guarded_connect(inst, *args):
        host = host_from_connect_args(args)
        if host in allowed_ip_hosts_and_hostnames or (
            _is_unix_socket(inst.family) and allow_unix_socket
        ):
            return _true_connect(inst, *args)
>       raise SocketConnectBlockedError(allowed_list, host)
E       pytest_socket.SocketConnectBlockedError: A test tried to use socket.socket.connect() with host "93.184.215.14" (allowed: "calendar.google.com (…),localhost (127.0.0.1,::1),oncall-dev-mariadb ()").

/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/pytest_socket.py:252: SocketConnectBlockedError
```

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not required)
- [x] Added the relevant release notes label (see labels prefixed w/ `release:`). These labels dictate how your PR will show up in the autogenerated release notes.
# What this PR does

Related to #5287

Few random "clean-ups", type improvements, etc.

Additionally, fixes a change made in #5292; we should wait to read from `slack_message.channel.slack_id` until we've performed the data migration mentioned in that PR (in the meantime we should continue to use `slack_message._channel_id`).

## Checklist

- Unit, integration, and e2e (if applicable) tests updated
- Documentation added (or `pr:no public docs` PR label added if not required)
- Added the relevant release notes label (see labels prefixed w/ `release:`). These labels dictate how your PR will show up in the autogenerated release notes.