Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist servers in separate database #45

Open
wants to merge 16 commits into
base: master
Choose a base branch
from
Open

Persist servers in separate database #45

wants to merge 16 commits into from

Conversation

ShadowNinja
Copy link
Contributor

@ShadowNinja ShadowNinja commented Jun 13, 2021

Summary of Changes

Persistent SQL Database

Servers are now stored persistently in an SQL database. This offers the following advantages:

  • Servers can now be remembered when they restart, so that things like world age can be verified. (See below)
  • Uptime can now be tracked as a percentage (not currently implemented).
  • The server list can now to maintain non-public information about servers. Previously some values used for things like tracking uptime were included in list.json because the server list didn't have anywhere else to put it. Now it can be stored in the database and not exposed.
  • The list of valid fields are now restricted. Before a server could set arbitrary additional fields and the server list would store and serve then. Multicraft used this to add a server_id field for their server. Only recognize fields will be served to clients now (although server_id has been added to allow this existing use case).
  • This allows the use of multiple server processes. Previously each process would keep its own copy of the server list in memory, which would cause servers to appear and disappear randomly when processes overwrote list.json.

Persistent server state

We need a way to match announcements to servers in the database. The way this used to be done is with the announce_ip:port pair, however this is not a good option going forward because servers can change IPs (e.g. moving to different host or using a dynamic IP).

A better option is address:port because you can maintain the same address when moving the server. However this still not ideal because you can't change the port or move to a different domain name, and without address verification it can be spoofed.

The solution is to add a world_uuid. This is stored with the world and allows the world to be moved freely and still identified. This value is kept secret to prevent spoofing of the server's announcements.

An alternative possibility would be to use a pubkey/privkey pair to identify servers, but this would be more complicated.

Fallback

address:port is used as a fallback for old servers, which should also be unspoofable with address verification enabled.

Address verification is needed to prevent spoofing with servers that do not support world_uuid, however it can fail if a server is set up for IPv4 and announces over IPv6. Last I checked there were 71/315 servers with this issue, so it's not something we just want to ignore. Therefore, Address verification has been moved into the view function and a detailed error response is returned if it fails. The response includes detailed instructions on how to fix the issue. Address verification is skipped if world_uuid is sent.

There was also one server that (minetest.zarnica.org.ua:30001) that fails the verification check for a different reason: it uses IPv4, but announces over a different IPv4 address than it listens on. Perhaps this server has multiple NICs. It doesn't seems like this is a major concern since it's just one server with an odd networking config.

Background tasks

Celery is now used to handle background tasks. This requires a message broker (Redis or RabbitMQ) to be available.

We can probably use something simpler if this is deemed too complicated. We can probably move the entire request processing into the view function to remove the need for the request processing background task. This would also allow up to serve more useful error messages if there's an issue in the final verification steps before publishing the server.

The maintenance background task could be moved to a simpler scheduler like APScheduler, or even just cron with a cli command. However, it will have to run in a separate process so that there aren't multiple schedulers if there are multiple web processes.

List changes

The following fields are no longer included in the server list:

  • ip: The IP that the announce was received from. This is only stored on the server now.
  • clients: This field was overwritten with len(clients_list) unless the server was old enough to not send a clients list. All servers on the server list currently send clients_list.
  • start: Was used to validate uptime. Kept internally now. Clients can use uptime to approximate this.
  • total_clients: Sum of clients in announce requests. Was to calculate pop_v.
  • update_time: Kept internally now.
  • updates: Number of updates received. Used to calculate pop_v. Popularity is calculated without this now.
  • liquid_finite: Removed in 0.4.10.

Also server_id was officially added. Multicraft clients set this to "multicraft". It worked before because the server list allowed arbitrary extra fields.

The list is not immediately updated when a new server announces any more, the server may not show up for up to a minute.

Ranking changes

  • "Guest" accounts are no longer down-weighted.
  • "Heavily loaded" is now limited by a server-side constant instead of relying on clients_max.
  • Points for server age increase by 2 per month instead of 1 per month. Max age boost is still 8.
  • High clients_max is no longer penalized, as it no longer meaningfully contributes to a higher ranking.
  • Ping time threshold lowered to 300ms. All pings below 300ms are treated the same to avoid penalizing a server simply for its geographical distance from the server list. Above that a penalty is applied.
  • Recent startup penalty changed from -8 to -4.

The method used to calculate popularity (pop_v) has changed. It is now rolling average (like /proc/loadavg). This is needed because with persistent server storage the more recent popularity has to be weighted more heavily so that an old server that gets more popular isn't stuck with its old low popularity.

Ping checking

Servers are pinged a few times and the lowest value is used. Pings are also retried a few times in case of packet loss.

The server list re-pings every server every minute when the list is updated, rather that just sending one ping on start and never updating it until the server restarts. This could be tweaked if that's too often.

Misc

  • There is a command to import the existing JSON server list into the new database (flask load-json list.json).
  • Integers and booleans as strings no longer supported. This was supported for compatibility with old clients (<0.4.10) and should not be an issue anymore.
  • In the initial announce all IPs that the server's address resolves to are pinged instead of just the first one. This ensures that the domain doesn't have an invalid DNS record that may cause clients to fail to connect.
  • Use Pipfile instead of requirements.txt.
  • Split server list into separate files. This is necessary to keep everything organized now.
  • Changed casing to snake_case.
  • README expanded.

Copy link
Contributor

@sfan5 sfan5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my review comments from looking at the code (haven't tested)

.editorconfig Outdated
indent_style = tab
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ironically this file is missing the final newline ;)

Pipfile Outdated Show resolved Hide resolved
server_list/models.py Show resolved Hide resolved
"proto_min": self.proto_min,
"pvp": self.pvp_enabled,
"rollback": self.rollback_enabled,
"uptime": (datetime.utcnow() - self.start_time).total_seconds(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"uptime": (datetime.utcnow() - self.start_time).total_seconds(),
"uptime": int((datetime.utcnow() - self.start_time).total_seconds()),


# Optional fields
if self.clients is not None:
obj["clients_list"] = self.clients.split("\n") if self.clients else []
Copy link
Contributor

@sfan5 sfan5 Jul 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not none then split should already return an empty array
same with mods below

Also, since the JS presentation code relies on clients_list being present this should be set unconditionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty string check is actually necessary because split() behaves differently if you pass it a delimiter:

>>> ''.split('\n')
['']
>>> ''.split()
[]

if pings is None:
return

# Use average ping
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the final ping we calculate is this, correct?

average of(
  minimum of(
    ping(address) three times
  ) for each address
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

"online": False,
"total_uptime": Server.total_uptime + (Server.last_update - Server.start_time).total_seconds(),
"down_time": datetime.utcnow(),
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this reuse Server.set_offline?
I don't think sqlalchemy can turn that .total_seconds() into an SQL query (that's what it does, right?) so it'll make a roundtrip through Python anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was working, but I tried it again and it didn't work, so I'll change it to use set_offline.


Help: This is usually because your server is only listening on IPv4 but your announcement is being sent over IPv6.
There are two ways to fix this:
1. (preferred) Set bind_address = :: in your server config to listen on IPv6 and add your IPv6 address to DNS as an AAAA record.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct way to configure this is via ipv6_server = true

There are two ways to fix this:
1. (preferred) Set bind_address = :: in your server config to listen on IPv6 and add your IPv6 address to DNS as an AAAA record.
This allows both IPv4 and IPv6 because the linux kernel will forward IPv4 requests to the IPv6 socket by default using an IPv4-mapped IPv6 address (look up IPV6_V6ONLY for more information).
On other operating systems this option may not work and you'll have to use the second option.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works on every relevant OS and I don't think we need this explanation anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Windows do IPv6 mapping by default? That's what I was mainly concerned about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't but Minetest sets this socket option.

Address records: {{ valid_addresses | join ", " }}

Help: This is usually because your server is only listening on IPv4 but your announcement is being sent over IPv6.
There are two ways to fix this:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
There are two ways to fix this:
If that is the case then you have two ways to fix this:

@sfan5
Copy link
Contributor

sfan5 commented Jul 10, 2021

Last I checked there were 71/315 servers with this issue, so it's not something we just want to ignore. Therefore, Address verification has been moved into the view function and a detailed error response is returned if it fails.

But we can't just put the stricter validation into effect soon. Server owners cannot be properly notified until they have upgraded to 5.5.0, which will take some time even after it is released.

The server list re-pings every server every minute when the list is updated, rather that just sending one ping on start and never updating it until the server restarts.

I'm not convinced we need to update the ping between the 5 minutes a server usually announces.

Ranking changes

These sound mostly fine to me but it'd be better to move these into a future PR so that this one doesn't get tangled in discussions about ranking changes.

Celery is now used to handle background tasks. This requires a message broker (Redis or RabbitMQ) to be available. We can probably use something simpler if this is deemed too complicated. We can probably move the entire request processing into the view function to remove the need for the request processing background task.

Not sure if I like Celery yet, it's probably fine. Though if it's done simpler (just start a background thread from Flask?) I wouldn't mind.
Moving processing into the requests isn't a good idea and not just for the obvious reasons. Minetest will currently(?) abort cURL requests after 5 seconds which might be too tight to do all the work.

Use Pipfile instead of requirements.txt.

pip can still read these and they not specific to Pipenv, are they?

@rubenwardy
Copy link

rubenwardy commented Jul 10, 2021

ContentDB uses Celery and SQLAlchemy - I recommend both. The problem with a background thread is when you have multiple web workers, you can end up potentially duplicating work

Also, I would suggest as an alternative to generating list.json to instead make it a Flask endpoint. You can then cache in Nginx or Apache. This feels cleaner

@ShadowNinja
Copy link
Contributor Author

We can't just put the stricter validation into effect soon. Server owners cannot be properly notified until they have upgraded to 5.5.0, which will take some time even after it is released.

I have an idea to solve this: We could try to verify the server and if it succeeds set an address_verification_required bit in the database. This way servers with a compatible configuration still get spoof protection.

I'm not convinced we need to update the ping between the 5 minutes a server usually announces.

I could bump this to 5 minutes.

[Ranking changes] sound mostly fine to me but it'd be better to move these into a future PR so that this one doesn't get tangled in discussions about ranking changes.

OK. The popularity change will still be needed, but the rest can be separated.

pip can still read [Pipfiles] and they not specific to Pipenv, are they?

No, you need pipenv. I could also include a requirements.txt.

@sfan5
Copy link
Contributor

sfan5 commented Jul 11, 2021

I have an idea to solve this: We could try to verify the server and if it succeeds set an address_verification_required bit in the database. This way servers with a compatible configuration still get spoof protection.

Good idea. Only side effect (that could result in more support questions) is that announces are no longer deterministic, if you pass domain verification once you may never fail it again.

I could bump this to 5 minutes.

Rather just remove that code entirely then? Servers are pinged on announce and those happen every 5 minutes.

@ShadowNinja ShadowNinja force-pushed the rework branch 3 times, most recently from 9bb29ee to a677f38 Compare August 6, 2021 00:23
@ShadowNinja
Copy link
Contributor Author

Only side effect (that could result in more support questions) is that announces are no longer deterministic, if you pass domain verification once you may never fail it again.

Yes, not sure if there's much we can really do about this. It seems unlikely that a broken config would get fixed and then break again, so hopefully this won't be an issue in practice.

Servers are pinged on announce and those happen every 5 minutes.

Good point. I've removed the periodic server-initiated ping.

@sfan5
Copy link
Contributor

sfan5 commented Feb 20, 2022

Can you rebase this? I want to put this to the test by mirroring all of the requests to it.

@ShadowNinja
Copy link
Contributor Author

@sfan5 Rebased.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants