
Req: Documentation request for scaling in production #95

Open
maradanasai opened this issue Jan 14, 2024 · 6 comments

Comments

@maradanasai

maradanasai commented Jan 14, 2024

Hi, this is amazing and I like it a lot. Can someone provide an architecture or documentation on how to use this in production at scale?

@undefined-moe
Contributor

Since the sandbox runners interact with the local filesystem (the file cache is not shared across multiple instances), I would suggest running a controller alongside a single sandbox daemon on each machine; those controllers then connect to a master that handles task distribution.
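
A minimal sketch of what that master could look like, assuming a Node.js process using the ws package; the queue, the submit entry point, and the final flag on result events are illustrative, not part of go-judge:

import { WebSocketServer, WebSocket } from "ws";

// In-memory task queue and the controllers that are currently idle.
const queue: string[] = [];
const idle: WebSocket[] = [];

// Hand queued tasks to idle controllers.
function dispatch(): void {
  while (queue.length > 0 && idle.length > 0) {
    idle.pop()!.send(queue.shift()!);
  }
}

// Called by the submission frontend to enqueue a serialized judge task.
export function submit(task: object): void {
  queue.push(JSON.stringify(task));
  dispatch();
}

const wss = new WebSocketServer({ port: 8080 });
wss.on("connection", (controller: WebSocket) => {
  idle.push(controller);
  dispatch();
  controller.on("message", (raw) => {
    const event = JSON.parse(raw.toString());
    // Per-testcase events arrive here and can be streamed to the user.
    console.log("judge event", event);
    if (event.final) {
      // The controller finished its task; mark it idle again.
      idle.push(controller);
      dispatch();
    }
  });
  controller.on("close", () => {
    const i = idle.indexOf(controller);
    if (i >= 0) idle.splice(i, 1);
  });
});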

@maradanasai
Author

Hi @undefined-moe, thanks for getting back. Can you please elaborate on a couple of options at a lower level, with more details?

If I run multiple go-judge instances as runners across multiple VMs in the cloud and distribute the incoming submission traffic (using a message queue or load balancer), how should I deal with the response events?

I would like to provide real-time updates about test-case execution to the user after they submit a program. Can you please share your thoughts on this in detail?

@undefined-moe
Contributor

undefined-moe commented Jan 14, 2024

There are two ways of splitting tasks:

  1. by judge task
  2. by testcase (see the sketch at the end of this comment)

For the first way, a single judge task sticks to the same machine, so there is a lower cost of transferring compiled binaries across VMs; pseudocode below:

// Runner-side controller: keeps a WebSocket connection to the master.
const ws = new WebSocket(masterAddr);
ws.onmessage = async (msg) => {
  const task = parseTask(msg.data);
  // Compile once; the binary stays in this machine's local cache.
  const compileResult = await compile(task);
  ws.send(JSON.stringify(compileResult));
  // Run all testcases in parallel against the cached binary.
  await Promise.all(task.testcases.map(async (testcase) => {
    const result = await runProgram(task, testcase);
    ws.send(JSON.stringify(result));
  }));
};

You can also check the detailed implementation here:

P.S. You have to run a controller client on each machine, responsible only for controlling the go-judge daemon on that machine (e.g. managing task state, downloading test data, etc.).
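
For the second way (splitting by testcase), each run can land on any machine, so the compiled binary has to be shipped along with the run request. A rough sketch in the same pseudocode style; every helper name here is an illustrative placeholder:

// Way 2: the master fans individual testcases out to any available runner.
const compiled = await compileOnAnyRunner(task);
const results = await Promise.all(task.testcases.map(async (testcase) => {
  const runner = await pickIdleRunner();          // any machine, not sticky
  await transferBinary(runner, compiled.binary);  // extra network cost per run
  return runTestcase(runner, testcase);
}));
reportResults(summarize(results));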

@criyle
Owner

criyle commented Jan 16, 2024

It was not designed to be used behind a load balancer, since the local cache makes it stateful. Because transmitting files is considered a rather expensive operation, it is recommended to deploy this as a sidecar alongside your controller application, which splits a request into multiple subsequent sandbox calls.
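
For reference, here is a minimal sketch of that sidecar pattern against go-judge's REST /run endpoint, following the request shape shown in the go-judge README (the sandbox address, the limits, and the sourceCode / testInput variables are assumptions for illustration). The controller compiles once with copyOutCached, then reuses the returned fileId for each run, which is exactly why both calls must hit the same go-judge instance:

const SANDBOX = "http://localhost:5050"; // assumed go-judge HTTP address

// POST a single cmd to /run and return the result array.
async function run(cmd: object): Promise<any[]> {
  const res = await fetch(`${SANDBOX}/run`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ cmd: [cmd] }),
  });
  return res.json();
}

// 1. Compile, keeping the produced binary in the sandbox's local cache.
const [compiled] = await run({
  args: ["/usr/bin/g++", "a.cc", "-o", "a"],
  env: ["PATH=/usr/bin:/bin"],
  files: [{ content: "" }, { name: "stdout", max: 10240 }, { name: "stderr", max: 10240 }],
  cpuLimit: 10_000_000_000,  // 10 s, in nanoseconds
  memoryLimit: 104_857_600,  // 100 MiB
  procLimit: 50,
  copyIn: { "a.cc": { content: sourceCode } },  // sourceCode: placeholder
  copyOut: ["stdout", "stderr"],
  copyOutCached: ["a"],
});

// 2. Run a testcase, referencing the cached binary by fileId; the fileId only
// exists in this instance's local cache, hence the stickiness.
const [result] = await run({
  args: ["a"],
  env: ["PATH=/usr/bin:/bin"],
  files: [{ content: testInput }, { name: "stdout", max: 10240 }, { name: "stderr", max: 10240 }],
  cpuLimit: 10_000_000_000,
  memoryLimit: 104_857_600,
  procLimit: 50,
  copyIn: { a: { fileId: compiled.fileIds.a } },
});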

If you insist on a load balancer and can bear the cost of transmitting files over the network, I would recommend mounting a shared file system (e.g. NFS) on all of your hosts and using -dir to point the cache directory at your mount point, in order to share state across multiple hosts.

Alternatively, you may implement the FileStore interface with your own scalable backend (e.g. S3), but keep in mind that managing separate infrastructure or using cloud services comes with a cost.

@maradanasai
Author

maradanasai commented Jan 16, 2024

Hi @criyle, thanks for sharing. Do you have a controller implemented for this? Can you please help by providing low-level details and the data flow for using this in production at scale?

@criyle
Owner

criyle commented Jan 16, 2024

You may check out the demo implementation that shows how a judger is deployed with the sandbox; it receives the OJ task and issues compile and run calls to the sandbox. In production environments like k8s, you can describe this combination as a pod and scale at the pod level.
