Improving pgs asset serving performance (brainstorming) #149
Hi! Wow, thanks for this in-depth feedback! So I like all of these changes and can commit to getting this across the finish line. There are a couple of pieces that will require help from us:

5kb seems perfectly reasonable to me so let's go for it. For caching, what do you think about the ability to wipe the in-memory cache? In particular: […]

Do you have any thoughts on supporting those features? Finally, let me know what parts you'd like to work on and I can commit to whatever else. Thanks again, this is awesome!
Some good discussion in this PR: #154
One thing about the analytics that worries me is that they're going into an unindexed postgres table with no expiration mechanism. Checking analytics for my site takes 3+ seconds already, which doesn't leave a lot of room for adding new features on top of analytics. That's in addition to complicating caching by making all requests hit the origin server to be counted.

I'm wondering if we should be rebuilding analytics to work with caching rather than building caching around the current analytics implementation. The main change would be "pulling" analytics from whatever is doing the caching (either a CDN or a distributed cache storage system) instead of making the caching system "push" view counts to pgs.

Could also use the opportunity to store the metrics in a time series database. Queries like "what time of day do I get the most traffic?" are a lot more efficient in a time series database. The specifics would depend on which method you pick for caching. But you could make that choice free from the burden of working around the current analytics system.

If you do decide to build analytics around the cache, then we could just disable caching for .html files as a first step as we build the caching system. Just thinking out loud, no strong opinions. My main goal is ensuring you're happy with the resulting setup :)
Thanks for the reminder! This got us from 3+ seconds to around 1 second: ac3be17
I love this thinking. To provide some context, the reason we built our own analytics system is that we wanted to be in full control of how the data gets aggregated and stored, without using or being up-sold on something off-the-shelf. We have positioned ourselves to host privacy-focused services, and all the other self-hosted solutions that I found felt very heavy. They didn't pass the BYO test for me.
I'm definitely open to converting our analytics table to use TimescaleDB; I did investigate it when building but decided against it, mainly because of the complexity. I do think there are other things we can do with a vanilla db table that we can try before switching over (e.g., a partitioned table; see the sketch below). However, that doesn't change our push/pull mechanism. If we went with Souin or another off-the-shelf cache system, how could we record site-usage analytics using pull?
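For reference, a minimal sketch of the partitioning idea as a Go migration. The table layout, column names, and connection string are all illustrative, not pico's actual `analytics_visits` schema:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

// Declarative range partitioning by month keeps "recent traffic" queries
// scanning one small partition instead of the whole history, and lets old
// data expire by detaching/dropping partitions. Names are hypothetical.
const migration = `
CREATE TABLE IF NOT EXISTS analytics_visits_by_month (
    created_at timestamptz NOT NULL,
    host       text NOT NULL,
    path       text NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE IF NOT EXISTS analytics_visits_2025_01
    PARTITION OF analytics_visits_by_month
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
`

func main() {
	// Placeholder DSN; point at the real database in practice.
	db, err := sql.Open("postgres", "postgres://localhost/pico?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(migration); err != nil {
		log.Fatal(err)
	}
}
```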
Wow the analytics are a heck of a lot faster now! Thanks! 🚀
Thanks for clarifying, that makes sense (and is one of the reasons I moved my site to pgs!). I think that means we can cross major CDN providers off the list (using tuns instead of Cloudflare Tunnels is another reason I signed up). I also have doubts about […].

I can think of four possible options that involve rebuilding analytics around caching:

Option 1: Scrape Prometheus metrics from Souin

If we use Souin, we'd need a new feature added for obtaining page view counts. I'd bet darkweak would want this feature to be generic enough for others to use, and the best idea I can think of is to expand the existing Prometheus metrics to include hostname and route labels (disabled by default, enableable via config); a sketch of that metric shape follows the pros/cons below. It seems you're already collecting some Prometheus metrics for pgs.

Pros: […]
Cons: […]

Lack of ability to filter out bots may make this option a non-starter.
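To make Option 1 concrete, here's a minimal sketch of the kind of opt-in labeled counter I mean. The metric name, labels, and middleware are invented for illustration; this is not an existing Souin feature:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical per-site counter. hostname/route are high-cardinality, so a
// real implementation would keep these labels behind a config flag, off by
// default.
var pageViews = promauto.NewCounterVec(prometheus.CounterOpts{
	Name: "cache_page_views_total", // illustrative name, not a real Souin metric
	Help: "Page views served, labeled by site hostname and route.",
}, []string{"hostname", "route"})

// countViews increments the counter for every request passing through the cache.
func countViews(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		pageViews.WithLabelValues(r.Host, r.URL.Path).Inc()
		next.ServeHTTP(w, r)
	})
}

func main() {
	// pgs would scrape /metrics periodically and diff the counters to
	// "pull" view counts instead of having the cache push them.
	http.Handle("/metrics", promhttp.Handler())
	http.Handle("/", countViews(http.NotFoundHandler()))
	log.Fatal(http.ListenAndServe(":2019", nil))
}
```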
Option 2: Souin + […]
Btw I updated my branch with some of the things discussed: main...mac-chaffee:pico:caddy-caching
Hey @mac-chaffee Sorry for the long delay, we had some other pico work that we thought we could fold nicely into your caching work. In particular, we have deployed a change to how we collect site usage analytics. We are now using […].

I think you are right, we should go with option (4). Here's a patchset that I'm prototyping to adapt our metric drain to receive caddy logs: https://pr.pico.sh/prs/35

Once that is complete, all we need to do is figure out how to send the logs from caddy to […]. For the sake of argument, let's say we have a solution for caddy and our […].
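For context on what the drain side would consume, here's a minimal sketch that parses Caddy JSON access-log lines into visit records. The field names (`ts`, `status`, `request.host`, `request.uri`) reflect my understanding of Caddy's JSON log encoder, so treat them as assumptions and verify against real log output:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// caddyLog models the handful of fields a metric drain would care about
// from Caddy's JSON access logs.
type caddyLog struct {
	Ts      float64 `json:"ts"`
	Status  int     `json:"status"`
	Request struct {
		Host string `json:"host"`
		URI  string `json:"uri"`
	} `json:"request"`
}

func main() {
	// Read logs line-by-line from stdin, e.g. piped from Caddy's log file.
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		var entry caddyLog
		if err := json.Unmarshal(sc.Bytes(), &entry); err != nil {
			continue // skip non-JSON lines
		}
		// A real drain would translate this into an analytics visit row;
		// here we just print it.
		fmt.Printf("visit host=%s uri=%s status=%d\n",
			entry.Request.Host, entry.Request.URI, entry.Status)
	}
}
```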
No worries! You all are really serious about dog-fooding haha, that sounds like a cool solution. For my own understanding, have you decided roughly what the multi-region setup would look like? Would this be like hub-and-spoke where all the stateful stuff (database, minio) lives on a single "hub" server with regional "spokes" that run caddy+pgs/pico?
To answer the question, honestly I think I'd just have to rebase my branch and try it out! When the metrics-drain+caddy logs solution is ready, you'd just delete the line that disables caching for html files: […]
Re-opening because I think we are at the point with Souin that we might need to investigate alternative solutions before adding another service like […]. I'm going to spend some more time looking into alternatives that aren't directly tied to Caddy. I wonder how hard it would be to use or fork something like this for our use case: […]
In parallel, I can do some looking into the […]. One reason I personally shy away from maintaining an HTTP cache is the number of cache poisoning footguns that exist when creating cache keys. Not insurmountable, but worth keeping in mind.
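To illustrate the footgun, here's a minimal sketch of the kind of conservative cache-key construction I mean. The normalization rules are examples under assumed requirements, not a complete defense:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"path"
	"strings"
)

// cacheKey derives a key from only the fields the origin uses to choose a
// response. Folding in arbitrary request headers or un-normalized paths is
// where most cache poisoning bugs creep in.
func cacheKey(r *http.Request) string {
	// path.Clean resolves dot segments so /a/../b and /b share one entry,
	// instead of letting attackers mint unlimited keys for one resource.
	p := path.Clean(r.URL.EscapedPath())
	// Host should already be validated against the sites we actually serve;
	// lowercase it so HOST and host don't become distinct entries.
	host := strings.ToLower(r.Host)
	return host + "|" + p
}

func main() {
	req := httptest.NewRequest("GET", "http://Blog.Example.com/posts/../index.html", nil)
	fmt.Println(cacheKey(req)) // blog.example.com|/index.html
}
```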
I'm not sure the best place to put this yet so I'm leaving it here: I cannot PURGE using the Surrogate-Key locally -- or in prod. I'm looking into it
In case of emergency, restarting Caddy can be a backup plan.
FYI: darkweak/souin#583
It does look like badger is working as a single cache, btw, so I say we go with that for now. We just gotta figure out this surrogate key issue.
Badger stores on-disk rather than in-memory. Looking at the Souin repo history, it seems like Olric is the second-oldest storage option (second to Redis, but Olric is embeddable), which may mean it's more stable. I can play around with it though.
Check out this idea: #175
Closed by #175 |
Hello! This weekend I was interested in learning how pgs worked, so I looked through the code and wrote down any possible places that I thought could affect performance. I didn't do any runtime testing, so take this with a grain of salt.
To serve a single HTML file, the following must happen:

1. A DNS lookup for the site's hostname
2. […]
3. A database query for the user (query on `app_users.name`) (FindUserForName)
4. A GetBucket() call to minio
5. A database query for the project (query on `projects.user_id && name`) (FindProjectByName)
6. A database query for the feature flags (query on `feature_flags.user_id && name`) (HasFeatureForUser)
7. A GetObject() call for `_redirects`, then `_redirects` is parsed (calcRoutes)
8. A GetObject() call for `_headers`, then `_headers` is parsed
9–12. […]
13. A database query on `feature_flags.user_id && name` (AnalyticsVisitFromRequest)
14. An insert into `analytics_visits` (AnalyticsCollect)

The following are some ideas for improving performance:
For (1), I predict the DNS lookup is the slowest operation in the list since (I think) all the other operations don't leave your single VM. Are you using Oracle Linux? If it's anything like RHEL, then local DNS caching is not enabled by default. If you enable systemd-resolved, it will cache 4096 responses for up to 2 hours and it will respect the TTL. Users should be encouraged to set high TTLs (>1 hour) to improve performance.
For the database queries (3, 5, 6, 13), an easy win would be to fetch the `feature_flags` in the same query where we fetch the user (sketched below), but then we'd still be performing 2 queries per request. Possibly caching is a better solution, see below.

For the GetBucket() call (4), that will send a BucketExists() request to Minio. Technically that's not necessary since you can create Bucket objects using just the name of the bucket.
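Regarding the combined user + feature-flags fetch, a minimal sketch of what that single round trip could look like; the table and column names are guesses based on the queries listed above, not pgs's actual schema:

```go
package sketch

import "database/sql"

// FindUserWithFlags collapses FindUserForName + HasFeatureForUser into one
// query via a LEFT JOIN, so users without flags still return a row.
func FindUserWithFlags(db *sql.DB, name string) (userID string, flags []string, err error) {
	rows, err := db.Query(`
		SELECT u.id, f.name
		FROM app_users u
		LEFT JOIN feature_flags f ON f.user_id = u.id
		WHERE u.name = $1`, name)
	if err != nil {
		return "", nil, err
	}
	defer rows.Close()
	for rows.Next() {
		var flag sql.NullString
		if err := rows.Scan(&userID, &flag); err != nil {
			return "", nil, err
		}
		if flag.Valid {
			flags = append(flags, flag.String)
		}
	}
	return userID, flags, rows.Err()
}
```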
For the GetObject() calls to read `_redirects` and `_headers`, I think caching is our only hope. Caching these would also allow us to cache the compiled regexes.

Caching
Since all of this data is small, I think we could use an in-process, in-memory cache like https://github.com/hashicorp/golang-lru (the 2q implementation sounds smarter since it considers frequency in addition to recency). NVM: if we want to set TTLs, we have to use the LRU version. The following work would be required (a sketch of the cache follows this list):

- Limit `_redirects` and `_headers` to something like 5KB so our cache size is bounded.
- Build the `AssetHandler` struct (plus routes parsed from `_redirects` and `_headers`) and save it to the cache, keyed on the `user-project` slug with a default TTL of something reasonable like 1 hour (so any caching bug we happen to introduce resolves itself in 1 hour).
- Derive from each request the `user-project` slug key that we need for reading from our own cache.
- Clear the cache entry when a new `_redirects` or `_headers` file is uploaded (or I guess we could clear the cache on any upload as a user-controlled way of clearing their own cache).

If we do all that, then we can serve assets with a single locally-cached DNS lookup, a single hash table lookup, and a single GetObject() minio call! 🚀
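A minimal sketch of that cache using golang-lru's expirable LRU; `AssetHandler` and the slug/bucket values here are stand-ins for pgs's real types:

```go
package main

import (
	"fmt"
	"time"

	"github.com/hashicorp/golang-lru/v2/expirable"
)

// AssetHandler stands in for pgs's real handler: whatever we need to serve
// a project (bucket name, routes parsed from _redirects, parsed _headers, ...).
type AssetHandler struct {
	Bucket string
	Routes []string
}

func main() {
	// Bounded size + 1h TTL, so any caching bug resolves itself within an hour.
	cache := expirable.NewLRU[string, *AssetHandler](1024, nil, time.Hour)

	// On a miss, do the DB + minio work once and cache it under the
	// user-project slug.
	slug := "erock-pico"
	if _, ok := cache.Get(slug); !ok {
		cache.Add(slug, &AssetHandler{Bucket: "static-erock"})
	}

	// On upload of _redirects/_headers (or any upload), wipe the entry so
	// users have a self-service way of clearing their own cache.
	cache.Remove(slug)
	fmt.Println("cache len after purge:", cache.Len())
}
```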
Thoughts? I can contribute some of this. It's a pretty big change, so just wanted to run it by you before diving too deep.