Improving pgs asset serving performance (brainstorming)

Hello! This weekend I was interested in learning how pgs worked so I looked through the code and wrote down any possible places that I thought could affect performance. I didn't do any runtime testing, so take this with a grain of salt.

**To serve a single HTML file, the following must happen:**

1. DNS lookup for TXT record to determine project name (findRouteConfig -> GetCustomDomain)
2. Regex match on all routes (CreateServeBasic)
3. DB Query 1 (WHERE clause on `app_users.name`) (FindUserForName)
4. GetBucket() call to minio
5. DB Query 2 (WHERE clause on `projects.user_id && name`) (FindProjectByName)
6. DB Query 3 (WHERE clause on `feature_flags.user_id && name`) (HasFeatureForUser)
7. GetObject() call to minio for `_redirects`, then parsed
    - (note: redirects may be ignored after minio error)
    - (note: redirect file loaded into RAM without limits)
8. Regex match on every line of `_redirects` (calcRoutes)
9. Three possibilities (handle):
    - If it's a redirect to an external site, it's fetched with http.get() and returned as-is
    - If it's a request to an image, it's fetched from imgproxy with http.get()
    - If it's a request to any other file, it's fetch from minio with GetObject()
11. GetObject() call to minio for `_headers`, then parsed
12. Regex match on every line of `_headers`
13. DB Query 4 (WHERE clause on `feature_flags.user_id && name`) (AnalyticsVisitFromRequest)
14. Regex match on possible crawlers (trackableRequest -> IsCrawler)
15. (async) DB Query 5 (INSERT INTO `analytics_visits`) (AnalyticsCollect)

**The following are some ideas for improving performance:**

For (1), I predict the DNS lookup is the [slowest operation](https://www.dnsperf.com/#!dns-resolvers) in the list since (I think) all the other operations don't leave your single VM. Are you using Oracle Linux? If it's anything like RHEL, then local [DNS caching is not enabled by default](https://access.redhat.com/solutions/2189381). If you enable systemd-resolved, it will [cache 4096 responses for up to 2 hours](https://www.ctrl.blog/entry/dns-client-ttl.html) and it will respect the TTL. Users should be encouraged to set high TTLs (>1 hour) to improve performance.

For the database queries (3, 5, 6, 13), an easy win would be to fetch the `feature_flags` in the same query where we fetch the user, but then we'd still be performing 2 queries per request. Possibly caching is a better solution, see below.

For the GetBucket() call (4), that will send a BucketExists() request to Minio. Technically that's not necessary since you can create Bucket objects using just the name of the bucket.

For the GetObject() calls to read `_redirects` and `_headers`, I think caching is our only hope. Caching these would also allow us to cache the compiled regexes.

### Caching

Since all of this data is small, I think we could use an in-process, in-memory cache like https://github.com/hashicorp/golang-lru ~~(the 2q implementation sounds smarter since it considers frequency in addition to recency)~~ NVM, if we want to set ttls, we have to use the LRU version. The following work would be required:

1. Limit `_redirects` and `_headers` to something like 5KB so our cache size is bounded.
2. When a site is first requested, we create something like that `AssetHandler` struct (plus routes parsed from _redirects and _headers) and save it to the cache, keyed on the `user-project` slug with a default TTL of something reasonable like 1 hour (so any caching bug we happen to introduce resolves itself in 1 hour). 
    - We check for the cached version before creating a new one.
    - We don't cache the TXT record; we rely on systemd-resolved caching since that is designed to respect DNS TTLs. That allows us to quickly determine the `user-project` slug key that we need for reading from our own cache.
3. When any of the following happens, we delete our cache entry if it exists:
    - User deleted (or just let it expire)
    - Project deleted (or just let it expire)
    - Project disabled (or just let it expire)
    - User flags change like pico+ or analytics
    - New `_redirects` or `_headers` file uploaded (or I guess we could clear the cache on any upload as a user-controlled way of clearing their own cache)

If we do all that, then we can serve assets with a single locally-cached DNS lookup, a single hash table lookup, and a single GetObject() minio call! 🚀 

Thoughts? I can contribute for some of this. It's a pretty big change, so just wanted to run it by you before diving too deep.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improving pgs asset serving performance (brainstorming) #149

Caching

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Improving pgs asset serving performance (brainstorming) #149

Description

Caching

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions