diff --git a/docs/architecture/databases.md b/docs/architecture/databases.md
index 0eee453..c7953b4 100644
--- a/docs/architecture/databases.md
+++ b/docs/architecture/databases.md
@@ -18,175 +18,10 @@ Our queueing system gets hit hard, and therefore we do all of that work on one d
 The "work" database is where application tables specific to the processing of data live.
 
-### guestbook
-
-The guestbook is where we keep track of URLs that have been/want to be searched. These tables live in the `cmd/migrate` app, which handles our migrations on every deploy. [These are dbmate migrations](https://github.com/GSA-TTS/jemison/tree/main/cmd/migrate/work_db/db/migrations).
-
-```sql
-create table guestbook (
-    id bigint generated always as identity primary key,
-    domain64 bigint not null,
-    last_modified timestamp,
-    last_fetched timestamp,
-    next_fetch timestamp not null,
-    scheme integer not null default 1,
-    content_type integer not null default 1,
-    content_length integer not null default 0,
-    path text not null,
-    unique (domain64, path)
-);
-```
-
-The dates drive a significant part of the entree/fetch algorithms.
-
-* `last_modified` is EITHER the timestamp provided by the remote webserver for any given page, OR if not present, we assign this value in `fetch`, setting it to the last fetched timestamp.
-* `last_fetched` is the time that the page was fetched. This is updated every time we fetch the page.
-* `next_fetch` is a computed value; if a page is intended to be fetched weekly, then `fetch` will set this as the current time plus one week at the time the page is fetched.
-
-### hosts
-
-```sql
-create table hosts (
-    id bigint generated always as identity primary key,
-    domain64 bigint,
-    next_fetch timestamp not null,
-    unique(id),
-    unique(domain64),
-    constraint domain64_domain
-        check (domain64 > 0 and domain64 <= max_bigint())
-)
-;
-```
-
-Like the `guestbook`, this table plays a role in determining whether a given domain should be crawled. If we want to crawl a domain *right now*, we set the `next_fetch` value in this table to yesterday, allowing all crawls of URLs under this domain to be valid.
+Read more about the [tables and their roles in the work database](databases/work.md).
 
 ## search
 
 The `search` database holds our data pipelines and the tables that get actively searched.
 
-This database is not (yet) well designed. Currently, there is a notion of a `raw_content` table, which is where `pack` deposits text.
-
-```sql
-CREATE TABLE raw_content (
-    id BIGSERIAL PRIMARY KEY,
-    host_path BIGINT references guestbook(id),
-    tag TEXT default ,
-    content TEXT
-)
-```
-
-From there, it is unclear how best to structure and optimize the content.
-
-There are two early-stage ideas. Both have tradeoffs in terms of performance and implementation complexity, and it is not clear yet which to pursue.
-
-### one idea: inheritence.
-
-https://www.postgresql.org/docs/current/tutorial-inheritance.html
-
-We could define a searchable table as `gov`.
-
-```sql
-create table gov (
-    id ...,
-    host_path ...,
-    tag ...,
-    content ...
-);
-```
-
-From there, we could have *empty* inheritence tables.
-
-```sql
-create table gsa () inherits (gov);
-create table hhs () inherits (gov);
-create table nih () inherits (gov);
-```
-
-and, from there, the next level down:
-
-```sql
-create table cc () inherits (nih);
-create table nccih () inherits (nih);
-create table nia () inherits (nih);
-```
-
-Then, insertions happen at the **leaves**. That is, we only insert at the lowest level of the hierarchy.
-However, we can then query tables higher up, and get results from the entire tree.
-
-This does two things:
-
-1. It lets queries against a given domain happen naturally. If we want to query `nia.nih.gov`, we target that table with our query.
-2. If we want to query all of `nih`, then we query the `nih` table.
-3. If we want to query everything, we target `gov` (or another tld).
-
-Given that we are going to treat these tables as build artifacts, we can always regenerate them. And, it is possible to add new tables through a migration easily; we just add a new create table statement.
-
-(See [this article](https://medium.com/miro-engineering/sql-migrations-in-postgresql-part-1-bc38ec1cbe75) about partioning/inheritence, indexing, and migrations. It's gold.)
-
-### declarative partitioning
-
-Another approach is to use `PARTITION`s.
-
-This would suggest our root table has columns we can use to drive the derivative partitions.
-
-```sql
-create table gov (
-    id ...,
-    domain64 BIGINT,
-    host_path ...,
-    tag ...,
-    content ...
-    partition by range(domain64)
-);
-```
-
-To encode all of the TLDs, domains, and subdomains we will encounter, we'll use a `domain64` encoding. Why? It maps the entire URL space into a single, 64-bit number (or, `BIGINT`).
-
-```
-FF:FFFFFF:FFFFFF:FF
-```
-
-or
-
-```
-tld:domain:subdomain:subsub
-```
-
-This is described more in detail in [domain64.md](domain64.md).
-
-As an example:
-
-| tld | domain | sub | hex                  | dec               |
-|-----|--------|-----|----------------------|-------------------|
-| gov | gsa    | _   | #x0100000100000000   | 72057598332895232 |
-| gov | gsa    | tts | #x0100000100000100   | 72057598332895488 |
-| gov | gsa    | api | #x0100000100000200   | 72057598332895744 |
-
-GSA is from the range #x0100000001000000 -> #x0100000001FFFFFF, or 72057594054705152 -> 72057594071482367 (a diff of 16777215). Nothing else can be in that range, because we're using the bitstring to partition off ranges of numbers.
-
-Now, everything becomes bitwise operations on 64-bit integers, which will be fast everywhere... and, our semantics map well to our domain.
-
-Partitioning to get a table with only GSA entries is
-
-```sql
-CREATE TABLE govgsa PARTITION OF gov
-  FOR VALUES FROM (72057598332895232) TO (72057602627862527);
-```
-
-Or, just one subdomain in the space:
-
-```sql
-CREATE TABLE govgsatts PARTITION OF gov
-  FOR VALUES FROM (72057598332895488) TO (72057598332895743);
-```
-
-or we can keep the hex representation:
-
-```sql
-CREATE TABLE govgsatts PARTITION OF gov
-  FOR VALUES FROM (select x'0100000100000100') TO (select x'01000001000001FF');
-```
-
-All table operations are on the top-level table (insert, etc.), the indexes and whatnot are inherited automatically, and I can search the TLD, domain, or subdomain without difficulty---because it all becomes a question of what range the `domain64` value is in.
-
-
+Read more about the [tables and their roles in the search database](databases/search.md).
diff --git a/docs/architecture/databases/search.md b/docs/architecture/databases/search.md
new file mode 100644
index 0000000..edcdf7f
--- /dev/null
+++ b/docs/architecture/databases/search.md
@@ -0,0 +1,127 @@
+# search db
+
+This database is not (yet) well designed. Currently, there is a notion of a `raw_content` table, which is where `pack` deposits text.
+
+```sql
+CREATE TABLE raw_content (
+    id BIGSERIAL PRIMARY KEY,
+    host_path BIGINT references guestbook(id),
+    tag TEXT,
+    content TEXT
+);
+```
+
+From there, it is unclear how best to structure and optimize the content.
+
+There are two early-stage ideas. Both have tradeoffs in terms of performance and implementation complexity, and it is not clear yet which to pursue.
+
+### one idea: inheritance
+
+https://www.postgresql.org/docs/current/tutorial-inheritance.html
+
+We could define a searchable table as `gov`.
+
+```sql
+create table gov (
+    id ...,
+    host_path ...,
+    tag ...,
+    content ...
+);
+```
+
+From there, we could have *empty* inheritance tables.
+
+```sql
+create table gsa () inherits (gov);
+create table hhs () inherits (gov);
+create table nih () inherits (gov);
+```
+
+and, from there, the next level down:
+
+```sql
+create table cc () inherits (nih);
+create table nccih () inherits (nih);
+create table nia () inherits (nih);
+```
+
+Then, insertions happen at the **leaves**. That is, we only insert at the lowest level of the hierarchy. However, we can then query tables higher up, and get results from the entire tree.
+
+This does a few things:
+
+1. It lets queries against a given domain happen naturally. If we want to query `nia.nih.gov`, we target that table with our query.
+2. If we want to query all of `nih`, then we query the `nih` table.
+3. If we want to query everything, we target `gov` (or another tld).
+
+Given that we are going to treat these tables as build artifacts, we can always regenerate them. And, it is possible to add new tables through a migration easily; we just add a new create table statement.
+
+(See [this article](https://medium.com/miro-engineering/sql-migrations-in-postgresql-part-1-bc38ec1cbe75) about partitioning/inheritance, indexing, and migrations. It's gold.)
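+
+A minimal sketch of the idea (throwaway column types, since the real columns are still undecided): rows inserted into a leaf are visible from every ancestor table.
+
+```sql
+-- Sketch only: simplified columns, not the real schema.
+create table gov (id bigint, host_path bigint, tag text, content text);
+create table nih () inherits (gov);
+create table nia () inherits (nih);
+
+insert into nia values (1, 42, 'p', 'text about aging research');
+
+select count(*) from nia;      -- 1: the leaf itself
+select count(*) from nih;      -- 1: children are included
+select count(*) from gov;      -- 1: the whole tree
+select count(*) from only gov; -- 0: the parent holds no rows of its own
+```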
+
+### declarative partitioning
+
+Another approach is to use `PARTITION`s.
+
+This would suggest our root table has columns we can use to drive the derivative partitions.
+
+```sql
+create table gov (
+    id ...,
+    domain64 BIGINT,
+    host_path ...,
+    tag ...,
+    content ...
+) partition by range (domain64);
+```
+
+To encode all of the TLDs, domains, and subdomains we will encounter, we'll use a `domain64` encoding. Why? It maps the entire URL space into a single, 64-bit number (or, `BIGINT`).
+
+```
+FF:FFFFFF:FFFFFF:FF
+```
+
+or
+
+```
+tld:domain:subdomain:subsub
+```
+
+This is described in more detail in [domain64.md](../domain64.md).
+
+As an example:
+
+| tld | domain | sub | hex                | dec               |
+|-----|--------|-----|--------------------|-------------------|
+| gov | gsa    | _   | #x0100000100000000 | 72057598332895232 |
+| gov | gsa    | tts | #x0100000100000100 | 72057598332895488 |
+| gov | gsa    | api | #x0100000100000200 | 72057598332895744 |
+
+GSA is the range #x0100000100000000 -> #x01000001FFFFFFFF, or 72057598332895232 -> 72057602627862527 (a span of 4294967295). Nothing else can be in that range, because we're using the bit layout to partition off ranges of numbers.
+
+Now, everything becomes bitwise operations on 64-bit integers, which will be fast everywhere... and, our semantics map well to our domain.
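+
+For instance, a sketch of deriving GSA's bounds from its `domain64` prefix with bit operations (assuming the `FF:FFFFFF:FFFFFF:FF` layout above):
+
+```sql
+-- Sketch: mask off the low 32 bits (subdomain + reserved) for the lower bound,
+-- and fill them with ones for the upper bound.
+select (x'0100000100000000' & ~x'00000000FFFFFFFF')::bigint as lower_bound, -- 72057598332895232
+       (x'0100000100000000' |  x'00000000FFFFFFFF')::bigint as upper_bound; -- 72057602627862527
+```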
+
+Partitioning to get a table with only GSA entries looks like this:
+
+```sql
+CREATE TABLE govgsa PARTITION OF gov
+  FOR VALUES FROM (72057598332895232) TO (72057602627862528);
+```
+
+(The upper bound of `FOR VALUES FROM ... TO ...` is exclusive, so the `TO` value is one past the last `domain64` in the range.)
+
+Or, just one subdomain in the space:
+
+```sql
+CREATE TABLE govgsatts PARTITION OF gov
+  FOR VALUES FROM (72057598332895488) TO (72057598332895744);
+```
+
+or we can keep the hex representation, casting the bit string to a `bigint`:
+
+```sql
+CREATE TABLE govgsatts PARTITION OF gov
+  FOR VALUES FROM (x'0100000100000100'::bigint) TO (x'0100000100000200'::bigint);
+```
+
+All table operations are on the top-level table (insert, etc.), the indexes and whatnot are inherited automatically, and we can search by TLD, domain, or subdomain without difficulty, because it all becomes a question of which range the `domain64` value falls in.
diff --git a/docs/architecture/databases/work.md b/docs/architecture/databases/work.md
new file mode 100644
index 0000000..348f4a2
--- /dev/null
+++ b/docs/architecture/databases/work.md
@@ -0,0 +1,60 @@
+# work db
+
+The "work" DB is where day-to-day application work takes place. Data supporting the crawling/indexing work, for example, lives in this database. It is separate from the queues (which are high-frequency, small-data transactions) and search (which is read-heavy).
+
+[//]: # ( o| - Zero or one )
+[//]: # ( || - One and only one )
+[//]: # ( o{ - Zero or many )
+[//]: # ( |{ - One or many )
+
+```mermaid
+erDiagram
+    HOSTS {
+        BIGINT id
+        BIGINT domain64 UK
+    }
+    GUESTBOOK
+    HOSTS ||--o{ GUESTBOOK: "Has"
+```
+
+## guestbook
+
+The guestbook is where we keep track of URLs that have been/want to be searched. These tables are defined in the `cmd/migrate` app, which runs our migrations on every deploy. [These are dbmate migrations](https://github.com/GSA-TTS/jemison/tree/main/cmd/migrate/work_db/db/migrations).
+
+```sql
+create table guestbook (
+    id bigint generated always as identity primary key,
+    domain64 bigint not null,
+    last_modified timestamp,
+    last_fetched timestamp,
+    next_fetch timestamp not null,
+    scheme integer not null default 1,
+    content_type integer not null default 1,
+    content_length integer not null default 0,
+    path text not null,
+    unique (domain64, path)
+);
+```
+
+The dates drive a significant part of the entree/fetch algorithms.
+
+* `last_modified` is EITHER the timestamp provided by the remote webserver for any given page, OR, if not present, a value we assign in `fetch` by setting it to the last fetched timestamp.
+* `last_fetched` is the time that the page was fetched. This is updated every time we fetch the page.
+* `next_fetch` is a computed value; if a page is intended to be fetched weekly, then `fetch` will set this to the current time plus one week at the time the page is fetched.
+
+## hosts
+
+```sql
+create table hosts (
+    id bigint generated always as identity primary key,
+    domain64 bigint,
+    next_fetch timestamp not null,
+    unique(id),
+    unique(domain64),
+    constraint domain64_domain
+        check (domain64 > 0 and domain64 <= max_bigint())
+);
+```
+
+Like the `guestbook`, this table plays a role in determining whether a given domain should be crawled. If we want to crawl a domain *right now*, we set the `next_fetch` value in this table to yesterday, allowing all crawls of URLs under this domain to be valid.
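+
+For example, a sketch of forcing an immediate recrawl of everything under a single domain, using the gov/gsa value from the `domain64` examples:
+
+```sql
+-- Sketch: make gov/gsa eligible for fetching right away.
+update hosts
+set next_fetch = now() - interval '1 day'
+where domain64 = 72057598332895232; -- gov/gsa
+```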
diff --git a/docs/architecture/domain64.md b/docs/architecture/domain64.md
index 478aebc..2a1f9a5 100644
--- a/docs/architecture/domain64.md
+++ b/docs/architecture/domain64.md
@@ -24,6 +24,14 @@ This lets us track
 * 16,777,215 (#FFFFFF) subdomains under each domain
 * 255 (#FF) reserved
 
+```mermaid
+packet-beta
+0-7: "TLD"
+8-31: "Domain"
+32-55: "Subdomain"
+56-63: "Reserved"
+```
+
 For example
 
 ```
@@ -126,6 +134,52 @@ Using those 32 bits for paths, we could:
 * 10 means we used three nibbles for the root (1024 roots) and 5 nibbles for paths (1M)
 * 11 is undefined
 
+```mermaid
+packet-beta
+0-3: "TLD"
+4-19: "Domain"
+20-35: "Subdomain"
+36-37: "Path type"
+38-63: "Path"
+```
+
+where "Path" might be
+
+```mermaid
+packet-beta
+0-1: "00"
+2-31: "Path id"
+```
+
+or
+
+```mermaid
+packet-beta
+0-1: "01"
+2-9: "Root id"
+10-31: "Path id"
+```
+
+or
+
+```mermaid
+packet-beta
+0-1: "10"
+2-13: "Root id"
+14-31: "Path id"
+```
+
+and "11" is undefined:
+
+```mermaid
+packet-beta
+0-1: "11"
+2-31: "Undefined"
+```
+
 This would make subpath searching optimal. We can filter, based on the domain64, down to the path
 
 Knowing if we can do this a priori is the trick; that is, what path structure is appropriate for a given site? It might be that we have to assume `00`, and then under analysis (post-crawl), potentially re-assign, which allows for optimization after a second crawl?
diff --git a/docs/architecture/index.md b/docs/architecture/index.md
index ad40ad1..a653fd6 100644
--- a/docs/architecture/index.md
+++ b/docs/architecture/index.md
@@ -44,3 +44,6 @@ flowchart LR
 At this point, further services clean, process, and prepare the text for search.
 
 Read more about the [data processing pipeline](processing.md).
+
+## administration via API
+
diff --git a/docs/architecture/services/adding-a-new-service.md b/docs/architecture/services/adding-a-new-service.md
new file mode 100644
index 0000000..b8e440d
--- /dev/null
+++ b/docs/architecture/services/adding-a-new-service.md
@@ -0,0 +1,382 @@
+# adding a new service
+
+Adding a new service should generally follow steps that look like the following.
+
+## the architecture
+
+Our services communicate via queues. We use the [river](https://riverqueue.com) library to handle our queueing/messaging. There are three databases: one used by the queues (automatic; no work required on the developer's part), a "work" database (consider it a living scratchpad for services), and a "search" database, where our data ends up and is optimized for various use cases.
+
+When there is data to store, we either use S3 or Postgres.
+
+## a new service, step-by-step
+
+### create a new `cmd`
+
+In the `cmd` directory, create a new folder for the service. We'll call this new example service `searchapi`. Because these names become Go package paths, let's avoid symbols and other special characters in service names.
+
+### create `main.go`
+
+In the service folder, create a `main.go`. To start, it does nothing but stay alive:
+
+```go
+package main
+
+import (
+	"sync"
+)
+
+func main() {
+	var wg sync.WaitGroup
+	wg.Add(1)
+	wg.Wait()
+}
+```
+
+### add a `Makefile`
+
+You can copy an existing file, or create a new one. At the least, it should provide `run`, `build`, and `clean`.
+
+```
+run: clean
+	go run *.go
+
+build: clean
+	go build -buildvcs=false -o service.exe
+
+clean:
+	rm -f service.exe
+```
+
+All services are output as `service.exe` for consistency.
+
+### add to the container stack
+
+In the top-level `compose.yaml`, add a new entry for the service. You might begin by copying an existing service. Note that it will (possibly) need a port number (if it has any internal connections). Otherwise, it should be straightforward:
+
+```
+searchapi:
+  <<: *services-common
+  image: jemison/dev
+  # Simulate CF
+  # https://stackoverflow.com/questions/42345235/how-to-specify-memory-cpu-limit-in-docker-compose-version-3
+  deploy:
+    resources:
+      limits:
+        memory: 64m
+  build:
+    context: .
+    dockerfile: ./cmd/searchapi/Dockerfile
+  entrypoint: /home/vcap/app/cmd/searchapi/run.sh
+  volumes:
+    - type: bind
+      source: .
+      target: /home/vcap/app
+  ports:
+    - 10007:8888
+  # https://docs.docker.com/compose/how-tos/startup-order/
+  depends_on:
+    nginx:
+      condition: service_started
+    jemison-queues-db:
+      condition: service_healthy
+    jemison-work-db:
+      condition: service_healthy
+  healthcheck:
+    test: curl --fail http://searchapi:8888/heartbeat || exit 1
+    interval: 60s
+    timeout: 180s
+    retries: 3
+    start_period: 60s
+  environment:
+    ENV: "DOCKER"
+    PORT: 8888
+    DEBUG_LEVEL: debug
+    GIN_MODE: debug
+    SCHEDULE: ${SCHEDULE:-""}
+```
+
+Things to check:
+
+* `depends_on`: If there are services that we need to wait for, put those here.
+* `healthcheck`: this should be the same across the board.
+* `environment`: these want to be double-checked (this is a "to do"), but setting the `ENV` to `DOCKER` is a must.
+
+(The containerization/runtime configuration is still a bit in flux/wants discussion/design work. But, copy-pasta-ing this is probably fine for the moment.)
+
+### test
+
+At this point, test that the build works, and the service comes up. The example provided has an infinite wait. It probably will *not* respond to healthchecks... we'll add that in a moment.
+
+### add configuration
+
+Every service has configuration that must be defined.
+
+In `config/services`, add a new service Jsonnet file.
+
+E.g. in `searchapi.jsonnet`:
+
+```jsonnet
+local B = import 'base.libsonnet';
+local service = 'searchapi';
+
+local credentials = [
+  [
+    'port',
+    { cf: 8080, container: 8888, localhost: 8888 },
+  ],
+];
+
+local parameters = [
+  [
+    'debug_level',
+    { cf: 'warn', container: 'debug', localhost: 'debug' },
+  ],
+] + B.parameters;
+
+{
+  creds:: [[service] + x for x in credentials],
+  params:: [[service] + x for x in parameters],
+  cf: B.params('credentials', 'cf', service, self.creds)
+    + B.params('parameters', 'cf', service, self.params),
+  container: { name: service }
+    + B.params('credentials', 'container', service, self.creds)
+    + B.params('parameters', 'container', service, self.params),
+}
+```
+
+This is a good, basic config. It imports some common config, sets the debug level (which every service is expected to have), and that's it. This will automatically be slurped up by the build.
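+
+The queue setup later in this guide reads a `workers` parameter for the service. If `B.parameters` in `base.libsonnet` does not already supply one, a hypothetical entry like this would add it (the counts are made up):
+
+```jsonnet
+local parameters = [
+  [
+    'debug_level',
+    { cf: 'warn', container: 'debug', localhost: 'debug' },
+  ],
+  [
+    'workers',
+    { cf: 10, container: 2, localhost: 2 },
+  ],
+] + B.parameters;
+```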
+
+### add a healthcheck
+
+We need to now add some common infrastructure to the service.
+
+```go
+var ThisServiceName = "searchapi"
+
+func main() {
+	env.InitGlobalEnv(ThisServiceName)
+	engine := common.InitializeAPI()
+
+	zap.L().Info("listening to the music of the spheres",
+		zap.String("port", env.Env.Port))
+	// Local and Cloud should both get this from the environment.
+	http.ListenAndServe(":"+env.Env.Port, engine)
+}
+```
+
+We may want to consider having some global constants for service names, instead of strings. However, there's no way to make that constant work across the config files and the applications, so... :shrug:. We use strings for the moment.
+
+It should now be possible to stand up the stack with the new service, and have it respond to healthchecks. The common API initialization establishes a basic healthcheck for every single service. We need this so that Cloud.gov will be able to tell if our services are alive/responding.
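+
+A quick manual check from the host (the port assumes the `10007:8888` mapping in the compose entry above):
+
+```
+docker compose up -d searchapi
+curl http://localhost:10007/heartbeat
+```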
+
+### add to the deployment
+
+You should, when you're ready, add the service to the TF deployment. We'll document this later (when we have a standardized TF deploy).
+
+### add the queues
+
+If the service communicates via queues, you need to add that in. There are two ways it might communicate:
+
+1. As a worker
+2. As a job creator
+
+We'll handle each in turn.
+
+#### add a job creator
+
+If the service creates jobs for other services, there is common code for this.
+
+In `main`:
+
+```go
+var ThisServiceName = "searchapi"
+var ChQSHP = make(chan queueing.QSHP)
+
+func main() {
+	env.InitGlobalEnv(ThisServiceName)
+	InitializeQueues()
+
+	engine := common.InitializeAPI()
+
+	go queueing.Enqueue(ChQSHP)
+
+	zap.L().Info("listening to the music of the spheres",
+		zap.String("port", env.Env.Port))
+	// Local and Cloud should both get this from the environment.
+	http.ListenAndServe(":"+env.Env.Port, engine)
+}
+```
+
+The two new lines (the `ChQSHP` declaration and the `go queueing.Enqueue(ChQSHP)` call) do two things:
+
+1. We create a channel global to the service called `ChQSHP`. This is short for "Channel for Queue name, Scheme, Host, and Path." A channel is like a wire; it lets you send data in one end, and at the other, a process will pick up the data and do stuff.
+2. We pass one end of the channel to `Enqueue` in the `queueing` library. (This is an internal library that is part of Jemison.)
+
+Now, we can, anywhere in the service, send a message down this channel. This is how we enqueue new jobs for other services (or, oddly, even the service we're writing).
+
+```go
+	ChQSHP <- queueing.QSHP{
+		Queue:  "entree",
+		Scheme: "https",
+		Host:   "blogs.nasa.gov",
+		Path:   "/astronauts",
+	}
+```
+
+This says "create a QSHP data structure, and pack it with values." (The channel takes one value; to pass multiple values, we pass them in a struct.) Then, once packed, we send the value over the channel with the `<-` operator.
+
+* https://go.dev/tour/concurrency/2
+* https://gobyexample.com/channels
+
+The work of sending the message to the queue (a DB table) is handled by the `Enqueue` function. If we add services, we need to extend the `Enqueue` function to handle them. It isn't magic, and it has some checking to make sure the string we pass (the service name) is one we know about. In other words, it serves as a gatekeeper to make sure we don't send messages off into the void.
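+
+As an illustration (not part of the current code), here is roughly what it looks like to turn an HTTP request into a queued job, assuming `common.InitializeAPI()` hands back a `*gin.Engine`; the route and query parameters are placeholders:
+
+```go
+// Sketch: enqueue a crawl of a host/path supplied on the query string.
+func addCrawlRoute(engine *gin.Engine) {
+	engine.GET("/crawl", func(c *gin.Context) {
+		ChQSHP <- queueing.QSHP{
+			Queue:  "entree",
+			Scheme: "https",
+			Host:   c.Query("host"),
+			Path:   c.Query("path"),
+		}
+		c.JSON(http.StatusAccepted, gin.H{"queued": true})
+	})
+}
+```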
+
+#### as a job worker
+
+The above is how we send jobs *to* the queue. What if we want to work jobs from the queue?
+
+Add a function call to `main`:
+
+```go
+var ThisServiceName = "searchapi"
+
+func main() {
+	env.InitGlobalEnv(ThisServiceName)
+	InitializeQueues()
+
+	engine := common.InitializeAPI()
+
+	zap.L().Info("listening to the music of the spheres",
+		zap.String("port", env.Env.Port))
+	// Local and Cloud should both get this from the environment.
+	http.ListenAndServe(":"+env.Env.Port, engine)
+}
+```
+
+The `InitializeQueues()` call needs to come after the env is initialized (or we don't have logging), and should probably come before most anything else (because we want to fail fast if we can't establish communication with the DB).
+
+Now, in a file called `queues.go`, add that function. It probably looks like this to start:
+
+```go
+// The work client, doing the work of `SearchApi`
+var SearchApiPool *pgxpool.Pool
+var SearchApiClient *river.Client[pgx.Tx]
+
+type SearchApiWorker struct {
+	river.WorkerDefaults[common.SearchApiArgs]
+}
+
+func InitializeQueues() {
+	queueing.InitializeRiverQueues()
+
+	ctx, fP, workers := common.CommonQueueInit()
+	SearchApiPool = fP
+
+	// Essentially adds a worker "type" to the work engine.
+	river.AddWorker(workers, &SearchApiWorker{})
+
+	// Grab the number of workers from the config.
+	SearchApiService, err := env.Env.GetUserService(ThisServiceName)
+	if err != nil {
+		zap.L().Error("could not get SearchApi service config")
+		log.Println(err)
+		os.Exit(1)
+	}
+
+	// Work client
+	SearchApiClient, err = river.NewClient(riverpgxv5.New(SearchApiPool), &river.Config{
+		Queues: map[string]river.QueueConfig{
+			ThisServiceName: {MaxWorkers: int(SearchApiService.GetParamInt64("workers"))},
+		},
+		Workers: workers,
+	})
+
+	if err != nil {
+		zap.L().Error("could not establish worker pool")
+		log.Println(err)
+		os.Exit(1)
+	}
+
+	// Start the work clients
+	if err := SearchApiClient.Start(ctx); err != nil {
+		zap.L().Error("workers are not the means of production. exiting.")
+		os.Exit(42)
+	}
+}
+```
+
+Someday, it would be nice to make this boilerplate go away. However, the generics involved make standardizing it difficult. I haven't figured it out yet...
+
+This code...
+
+1. Initializes the queues. This happens in every service.
+2. Uses a common init to get a DB context, a pool, and a list of workers. (The list is empty.)
+3. Sets a global (to the service) to the pool. We use it elsewhere.
+4. Registers the worker for this service with the library.
+5. Gets the configuration for this service.
+6. Uses the configuration value to create a work client, and assigns the number of workers defined in the config file.
+7. Starts the workers.
+
+We have to add the job arguments structure to our common infrastructure:
+
+```go
+type SearchApiArgs struct {
+	Scheme string `json:"scheme"`
+	Host   string `json:"host"`
+	Path   string `json:"path"`
+}
+
+func (SearchApiArgs) Kind() string {
+	return "searchapi"
+}
+```
+
+This is in `internal/common/types.go`. This data structure is what gets turned into JSON and stuck in the DB as part of the queueing process. Most services in Jemison only pass URLs, but if you need to pass additional data, it will go here. This is how you get data from one service to the next.
+
+We have one more thing to do: we have to define the workers.
+
+In a file called `work.go`:
+
+```go
+func (w *SearchApiWorker) Work(ctx context.Context, job *river.Job[common.SearchApiArgs]) error {
+	// ... possibly turn the arguments into a URL?
+	u := url.URL{
+		Scheme: job.Args.Scheme,
+		Host:   job.Args.Host,
+		Path:   job.Args.Path,
+	}
+	_ = u // the real work would use this URL
+
+	// ... do stuff ...
+
+	// ... enqueue something to another service?
+	ChQSHP <- queueing.QSHP{
+		Queue:  "extract",
+		Scheme: job.Args.Scheme,
+		Host:   job.Args.Host,
+		Path:   job.Args.Path,
+	}
+
+	// Return `nil` if everything went well.
+	// Returning an error anywhere in the work function will cause the job to be requeued.
+	return nil
+}
+```
+
+## draw the owl
+
+For those who are not familiar with the "draw the owl" meme:
+
+https://www.reddit.com/r/funny/comments/eccj2/how_to_draw_an_owl/?rdt=34092
+
+From here, you just have to draw the owl. :D
+
+Humor aside, the question is "what does your service do?" If this is the search API, then you're going to be adding code to implement a Gin API engine. It will probably **not** take jobs from the queue, because it is intended to receive search queries, hit the `search` database, and return JSON data.
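+
+A rough sketch of that shape, purely to make it concrete (the route, the `searchPool` argument, and the `ILIKE` query are placeholders, not a design; imports would include `github.com/gin-gonic/gin` and `github.com/jackc/pgx/v5/pgxpool`):
+
+```go
+// Sketch: a placeholder search endpoint over raw_content in the search DB.
+func addSearchRoute(engine *gin.Engine, searchPool *pgxpool.Pool) {
+	engine.GET("/search", func(c *gin.Context) {
+		terms := c.Query("q")
+		rows, err := searchPool.Query(c.Request.Context(),
+			`select id, content from raw_content where content ilike '%' || $1 || '%' limit 20`,
+			terms)
+		if err != nil {
+			c.JSON(http.StatusInternalServerError, gin.H{"error": "query failed"})
+			return
+		}
+		defer rows.Close()
+
+		results := []gin.H{}
+		for rows.Next() {
+			var id int64
+			var content string
+			if err := rows.Scan(&id, &content); err != nil {
+				continue
+			}
+			results = append(results, gin.H{"id": id, "content": content})
+		}
+		c.JSON(http.StatusOK, gin.H{"terms": terms, "results": results})
+	})
+}
+```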
+
+However... some questions:
+
+1. How are we going to log data about searches? Will it... talk directly to a database? If so, we have some boilerplate for talking to the databases.
+    1. Or, will we enqueue our data, and have a service that grabs all performance/app data, and stores it somewhere in the work DB? This way, all our services might have some internal metrics that they can simply enqueue for the `metrics` service to consume and store.
+2. Do we ever need to control the search engine? For example, do we ever want to turn it off? Configure it? If so, then perhaps our `admin` component will enqueue messages for this service, and those will be "control signals." If that's the case, then we will need a work component, and it will watch for those messages.
+    1. Why would we do this *this way*? Because we do *not* want to introduce the idea that services talk to each other via API. We have queues. Use them.
+
+There might be others. But, the point being: this is where we start drawing the owl.
\ No newline at end of file