Improved bypass4netns implementation #39

Merged
merged 55 commits into rootless-containers:master
Jan 11, 2024

naoki9911
Collaborator

In this work, bypass4netns is entirely reimplemented and some new features are added.

New features

  • Handling connections to bypassed sockets (--handle-c2c-connections option)
    This enables containers to connect to bypassed (published) sockets.
  • Policy-aware bypassing (--tracer option)
    This enables bypass4netns to respect iptables rules or other ACLs in the intermediate NetNS.
  • [Experimental] Multi-node communication (--multinode option)
    This enables multi-node communication without VXLAN. This feature is experimental, and more work is required before it can be applied to Usernetes.
  • Tests with existing applications (see GHA workflow)
    Tests for the above features with existing applications, including a statically linked Go binary.

Removed features

  • Bypassing UDP sockets (SOCK_DGRAM)
    This feature is not yet implemented in the new implementation.

@AkihiroSuda
Member

Thanks, but why was UDP removed?

@naoki9911
Collaborator Author

> Thanks, but why was UDP removed?

The new bypass4netns does not implement UDP bypassing yet, to keep the implementation simple.
Of course, bypass4netns should handle UDP sockets, and I will add bypassing for them.

- name: setup lxd (v5.19)
id: s1
if: steps.cache-restore.outputs.cache-hit != 'true'
run: ./setup_lxd.sh
Member

Can we move these scripts into a subdirectory?

Collaborator Author

fixed in 4b874fe

path: /tmp/test-image.tar.zst
lookup-only: true

- name: setup lxd (v5.19)
Member

"v5.19" should be removed, as setup_lxd.sh does not pin the LXD version

Collaborator Author

fixed in 4b874fe

apt(8) sets its sockets to non-blocking mode via fcntl(2).
This patch records fcntl's F_SETFD and F_SETFL calls and applies them to
the created sockets.

Signed-off-by: Naoki MATSUMOTO <[email protected]>

The entire data format was reconstructed.
Currently, only SOCK_STREAM sockets are handled.

Signed-off-by: Naoki MATSUMOTO <[email protected]>

When the listening socket is bypassed,
processes in the container cannot connect to the socket via its inner port.
This patch handles connections to the published port and rewrites the destination
address.

TODO: return a dummy destination address when getpeername(2) is called.

Consider the following situation.
Port 5201 is published as port 5202 (`-p 5202:5201`), and
other processes in the container try to connect to 127.0.0.1:5201
or the interface's address (e.g. 10.4.0.38:5201).

bypass4netns handles such connections
and rewrites the destination address to 127.0.0.1:5202 or [::1]:5202.

Signed-off-by: Naoki MATSUMOTO <[email protected]>

Normally, inter-container connections are handled by slirp4netns.
When the listening socket is bypassed,
other containers cannot connect to the socket.
This patch handles such connections and rewrites the destination address.

TODO: take the CNI plugin's filter into account

Signed-off-by: Naoki MATSUMOTO <[email protected]>

bypass4netns bypasses sockets that connect to a bypassed socket.
However, this may circumvent ACLs or iptables rules configured in the intermediate NetNS.
The tracer agent checks whether the container can connect to
the other container's port, and only connectable connections are bypassed.

TODO: use raw sockets to avoid establishing TCP connections

Signed-off-by: Naoki MATSUMOTO <[email protected]>

This is currently not utilized.

Signed-off-by: Naoki MATSUMOTO <[email protected]>

TODO: dynamically allocate the published port on the host side

Signed-off-by: Naoki MATSUMOTO <[email protected]>

The pid can be a thread's ID, in which case open_pidfd() fails.
When open_pidfd() fails, retry with the pid's tgid and replace the pid
with the tgid.

Signed-off-by: Naoki MATSUMOTO <[email protected]>
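
One way to find the tgid for the retry is to read the `Tgid:` field of `/proc/<pid>/status`. The sketch below shows only that lookup. It is an assumption about how the tgid could be obtained, not necessarily what this PR does, and `parseTgid` and `tgidOf` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseTgid extracts the thread group ID from the contents of a
// /proc/<pid>/status file. (Hypothetical helper, for illustration only.)
func parseTgid(status string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "Tgid:") {
			return strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "Tgid:")))
		}
	}
	return 0, fmt.Errorf("no Tgid field found")
}

// tgidOf looks up the thread group ID of pid, which may itself be a thread ID.
func tgidOf(pid int) (int, error) {
	b, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, err
	}
	return parseTgid(string(b))
}

func main() {
	// For the current process, the tgid equals the process ID.
	tgid, err := tgidOf(os.Getpid())
	if err != nil {
		panic(err)
	}
	fmt.Println("pid:", os.Getpid(), "tgid:", tgid)
}
```

A caller would try the original pid first and fall back to `tgidOf(pid)` only when opening the pidfd fails, then keep using the tgid from that point on.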

Some binaries (e.g. Go binaries) close an fd and then immediately create a socket
that reuses the same fd number.
Seccomp notify sometimes drops the first close, so b4ns cannot bypass the socket.

This is a workaround for that inconsistent state.

Signed-off-by: Naoki MATSUMOTO <[email protected]>

Member

@AkihiroSuda AkihiroSuda left a comment

Thanks

@AkihiroSuda AkihiroSuda merged commit 3385f1f into rootless-containers:master Jan 11, 2024
23 checks passed
@AkihiroSuda AkihiroSuda added this to the v0.4.0 milestone Jan 11, 2024