
Req: Documentation request for scaling in production #95

Open
maradanasai opened this issue Jan 14, 2024 · 6 comments

Comments

@maradanasai

maradanasai commented Jan 14, 2024

Hi, this is amazing and I like it a lot. Can someone provide an architecture or documentation on how to use this in production at scale?

@undefined-moe
Contributor

Since the sandbox runners interact with the local filesystem (the file cache is not shared across multiple instances), I would suggest running a controller alongside a single sandbox daemon on each machine; those controllers then connect to a master that handles task distribution.
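
A minimal sketch of what that master could look like, assuming a Node.js process using the ws package; the queue, the submit entry point, and the final flag on result events are illustrative, not part of go-judge:

import { WebSocketServer, WebSocket } from "ws";

// In-memory task queue and the controllers that are currently idle.
const queue: string[] = [];
const idle: WebSocket[] = [];

// Hand queued tasks to idle controllers.
function dispatch(): void {
  while (queue.length > 0 && idle.length > 0) {
    idle.pop()!.send(queue.shift()!);
  }
}

// Called by the submission frontend to enqueue a serialized judge task.
export function submit(task: object): void {
  queue.push(JSON.stringify(task));
  dispatch();
}

const wss = new WebSocketServer({ port: 8080 });
wss.on("connection", (controller: WebSocket) => {
  idle.push(controller);
  dispatch();
  controller.on("message", (raw) => {
    const event = JSON.parse(raw.toString());
    // Per-testcase events arrive here and can be streamed to the user.
    console.log("judge event", event);
    if (event.final) {
      // The controller finished its task; mark it idle again.
      idle.push(controller);
      dispatch();
    }
  });
  controller.on("close", () => {
    const i = idle.indexOf(controller);
    if (i >= 0) idle.splice(i, 1);
  });
});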

@maradanasai
Author

Hi @undefined-moe, thanks for getting back. Can you please elaborate on a couple of options at a lower level, with more details?

If I run multiple go-judge instances as runners across multiple VMs in the cloud and distribute the incoming submission traffic (using a message queue or load balancer), how should I deal with the response events?

I would like to provide real-time updates about test-case execution to the user after they submit a program. Can you please share your thoughts on this in detail?

@undefined-moe
Contributor

undefined-moe commented Jan 14, 2024

There are two ways of splitting tasks:

  1. by judge task
  2. by testcase (see the sketch at the end of this comment)

For the first way, a single judge task sticks to the same machine, so there is a lower cost of transferring compiled binaries across VMs; pseudocode below:

// Runner-side controller: keeps a WebSocket connection to the master.
const ws = new WebSocket(masterAddr);
ws.onmessage = async (msg) => {
  const task = parseTask(msg.data);
  // Compile once; the binary stays in this machine's local cache.
  const compileResult = await compile(task);
  ws.send(JSON.stringify(compileResult));
  // Run all testcases in parallel against the cached binary.
  await Promise.all(task.testcases.map(async (testcase) => {
    const result = await runProgram(task, testcase);
    ws.send(JSON.stringify(result));
  }));
};

You can also check the detailed implementation here:

P.S. You have to run a controller client on each machine, responsible only for controlling the go-judge daemon on that machine (e.g. managing task state, downloading test data, etc.).
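
For the second way (splitting by testcase), each run can land on any machine, so the compiled binary has to be shipped along with the run request. A rough sketch in the same pseudocode style; every helper name here is an illustrative placeholder:

// Way 2: the master fans individual testcases out to any available runner.
const compiled = await compileOnAnyRunner(task);
const results = await Promise.all(task.testcases.map(async (testcase) => {
  const runner = await pickIdleRunner();          // any machine, not sticky
  await transferBinary(runner, compiled.binary);  // extra network cost per run
  return runTestcase(runner, testcase);
}));
reportResults(summarize(results));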

@criyle
Owner

criyle commented Jan 16, 2024

It was not designed to be used behind a load balancer, since the local cache makes it stateful. Because transmitting files is considered a rather expensive operation, it is recommended to deploy this as a sidecar alongside your controller application, which splits a request into multiple subsequent sandbox calls.
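
For reference, here is a minimal sketch of that sidecar pattern against go-judge's REST /run endpoint, following the request shape shown in the go-judge README (the sandbox address, the limits, and the sourceCode / testInput variables are assumptions for illustration). The controller compiles once with copyOutCached, then reuses the returned fileId for each run, which is exactly why both calls must hit the same go-judge instance:

const SANDBOX = "http://localhost:5050"; // assumed go-judge HTTP address

// POST a single cmd to /run and return the result array.
async function run(cmd: object): Promise<any[]> {
  const res = await fetch(`${SANDBOX}/run`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ cmd: [cmd] }),
  });
  return res.json();
}

// 1. Compile, keeping the produced binary in the sandbox's local cache.
const [compiled] = await run({
  args: ["/usr/bin/g++", "a.cc", "-o", "a"],
  env: ["PATH=/usr/bin:/bin"],
  files: [{ content: "" }, { name: "stdout", max: 10240 }, { name: "stderr", max: 10240 }],
  cpuLimit: 10_000_000_000,  // 10 s, in nanoseconds
  memoryLimit: 104_857_600,  // 100 MiB
  procLimit: 50,
  copyIn: { "a.cc": { content: sourceCode } },  // sourceCode: placeholder
  copyOut: ["stdout", "stderr"],
  copyOutCached: ["a"],
});

// 2. Run a testcase, referencing the cached binary by fileId; the fileId only
// exists in this instance's local cache, hence the stickiness.
const [result] = await run({
  args: ["a"],
  env: ["PATH=/usr/bin:/bin"],
  files: [{ content: testInput }, { name: "stdout", max: 10240 }, { name: "stderr", max: 10240 }],
  cpuLimit: 10_000_000_000,
  memoryLimit: 104_857_600,
  procLimit: 50,
  copyIn: { a: { fileId: compiled.fileIds.a } },
});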

If you insist on a load balancer and can bear the cost of transmitting files over the network, I would recommend mounting a shared file system (e.g. NFS) on all of your hosts and using -dir to point the cache directory at your mount point, in order to share state across multiple hosts.

Alternatively, you may implement the FileStore interface with your own scalable backend (e.g. S3), but keep in mind that managing separate infrastructure or using cloud services comes with a cost.

@maradanasai
Author

maradanasai commented Jan 16, 2024

Hi @criyle, thanks for sharing. Do you have a controller implemented for this? Can you please help by providing low-level details and the data flow for using this in production at scale?

@criyle
Owner

criyle commented Jan 16, 2024

You may check out the demo implementation that shows how a judger is deployed with the sandbox; it receives the OJ task and issues compile and run calls to the sandbox. In production environments like k8s, you can describe this combination as a pod and scale at the pod level.
