[Bug] "Name does not resolve" when using Authentik OIDC #1558

Open
skjdghsdjgsdj opened this issue Jan 27, 2025 · 8 comments
@skjdghsdjgsdj

RomM version
3.7.3

Describe the bug
I cannot get Authentik SSO login to work. When attempting to log in using the "Login with Authentik" button on the login page, I get "Internal server error" and the romm container emits a massive stack trace ultimately referencing failed DNS resolution.

I censored the domain that it tries to resolve for privacy, but the domain definitely resolves properly on both my LAN and over the internet. On my LAN, it resolves to a CNAME that itself points to an A record. On the internet, it resolves straight to an A record. I updated my internal DNS to use an A record as a test, but it made no difference. My DNS does not offer an IPv6 address by design.

My RomM instance otherwise works properly when I use a local login. This error occurs only after I'm logged in via Authentik; i.e., I only see it once I'm either already logged into my SSO or, if I'm logged out, after I submit the login form in Authentik. If I'm not logged into the SSO and I click "Login with Authentik", I see the SSO login, but it fails once I submit the login.

I'm absolutely sure I'm using the same client ID and secret in my Authentik provider setup and in the docker-compose.yml.

Coincidentally or not, the login via Authentik does work in Safari; why, I have no idea, since it fails in both Firefox and Chrome. I have several extensions in Firefox but none in Chrome, and in any case the DNS resolution error is server-side. I speculate there's some race condition where the A record happens to resolve before the AAAA lookup fails (as it should, given my setup).

The container log is attached, and my docker-compose.yml is as follows. I'm running Docker in a Proxmox LXC; none of the other services I run in other containers or VMs have this problem.

When I'm in the Proxmox LXC, dig mydomain works fine.

version: "3"

volumes:
  mysql_data:
  romm_resources:
  romm_redis_data:

services:
  romm:
    image: rommapp/romm:latest
    container_name: romm
    restart: unless-stopped
    environment:
      - DB_HOST=romm-db
      - DB_NAME=romm # Should match MARIADB_DATABASE in mariadb
      - DB_USER=romm-user # Should match MARIADB_USER in mariadb
      - DB_PASSWD=... # Should match MARIADB_PASSWORD in mariadb
      - ROMM_AUTH_SECRET_KEY=... # Generate a key with `openssl rand -hex 32`
      - IGDB_CLIENT_ID=... # Generate an ID and SECRET in IGDB
      - IGDB_CLIENT_SECRET=... # https://api-docs.igdb.com/#account-creation
      - MOBYGAMES_API_KEY= # https://www.mobygames.com/info/api/
      - STEAMGRIDDB_API_KEY= # https://github.com/rommapp/romm/wiki/Generate-API-Keys#steamgriddb
      - OIDC_ENABLED=true
      - OIDC_PROVIDER=authentik
      - OIDC_CLIENT_ID=...
      - OIDC_CLIENT_SECRET=...
      - OIDC_REDIRECT_URI=https://.../api/oauth/openid # the domain is RomM's
      - OIDC_SERVER_APPLICATION_URL=https://.../application/o/romm/ # the domain is Authentik's
      - DISABLE_USERPASS_LOGIN=true
    volumes:
      - romm_resources:/romm/resources # Resources fetched from IGDB (covers, screenshots, etc.)
      - romm_redis_data:/redis-data # Cached data for background tasks
      - /mnt/games/ROMM:/romm/library # Your game library. Check https://github.com/rommapp/romm?tab=readme-ov-file#folder-structure for more details.
      - /mnt/games/ROMM/assets:/romm/assets # Uploaded saves, states, etc.
      - /mnt/games/ROMM/config:/romm/config # Path where config.yml is stored
    ports:
      - 80:8080
    depends_on:
      romm-db:
        condition: service_healthy
        restart: true

  romm-db:
    image: mariadb:latest
    container_name: romm-db
    restart: unless-stopped
    environment:
      - MARIADB_ROOT_PASSWORD=... # Use a unique, secure password
      - MARIADB_DATABASE=romm
      - MARIADB_USER=romm-user
      - MARIADB_PASSWORD=...
    volumes:
      - mysql_data:/var/lib/mysql
    healthcheck:
      test: ["CMD", "healthcheck.sh", "--connect", "--innodb_initialized"]
      start_period: 30s
      start_interval: 10s
      interval: 10s
      timeout: 5s
      retries: 5
  dozzle-agent:
    image: amir20/dozzle:latest
    command: agent
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - 7007:7007

To Reproduce
Set up Authentik login in docker-compose.yml, and an OIDC provider and application in Authentik. Attempt to log into RomM via Authentik; once a user is logged into Authentik, the requests back to RomM fail.

Expected behavior
Logging in via Authentik should redirect via OIDC successfully.

Screenshots
n/a

Desktop (please complete the following information):

  • OS: macOS 15.2
  • Browser: Firefox and Chrome
  • Version: latest

Additional context
n/a

@gantoine
Member

If you docker exec -it romm sh and try to ping the Authentik URL/container, do you get a response? Are you running Authentik inside Docker as well, and do they share a network?
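For example, something like this run with python3 inside the container (assuming python3 is available in the image; authentik.example.com is a stand-in for whatever host your OIDC_SERVER_APPLICATION_URL points at) would show both the name resolution and the TCP reachability the app depends on:

# Sketch of a resolution + connectivity check from inside the romm container.
# authentik.example.com is a placeholder, not a real value from this setup.
import socket

host = "authentik.example.com"
print(socket.getaddrinfo(host, 443))                        # A and AAAA resolution
socket.create_connection((host, 443), timeout=5).close()    # TCP reachability
print("connected OK")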

gantoine self-assigned this on Jan 27, 2025
@skjdghsdjgsdj
Author

skjdghsdjgsdj commented Jan 27, 2025

It does resolve with ping inside the container. Authentik is running in Docker but in a different LXC. That is, effectively Authentik is running on a different machine.

If it helps to know the actual subnets:

  • 10.47.1.0/16 is used by Proxmox as a subnet shared across all the LXCs.
  • 192.168.1.0/24 is the LAN subnet.
  • 10.0.3.25/24 is a NAT interface that the LXC uses to access the internet.

I will point out that both RomM and Authentik are proxied through Caddy, but the setup is straightforward, and networking-wise RomM and Authentik are no different from the other services I reverse-proxy through Caddy without issue, so I highly doubt Caddy is the problem.

And for IPs:

  • 192.168.1.2 is the DNS server's IP address for devices in that subnet.
  • 192.168.1.7 is my Caddy instance. When I connect to the container with docker exec -it romm sh and ping my Authentik instance, this is the address it resolves to. That is expected, because my RomM instance points to a CNAME for my Caddy instance, and that CNAME resolves via an A record to 192.168.1.7.
  • 10.47.1.116 is the DNS server's IP address for Proxmox LXCs and is preferable, but both that and 192.168.1.7 should work when inside an LXC.
  • 192.168.1.220 was my computer that attempted to access RomM.

If it helps: I did read that some Linux DNS resolvers have issues when IPv6 is not available: they attempt both A and AAAA lookups, the AAAA lookup happens to return first with NXDOMAIN or some other failure, and the entire lookup then erroneously fails instead of using the result of the successful A lookup. Perhaps it's a problem with the DNS resolution that's built into Python.
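A quick way to test that theory, at least as a sketch (authentik.example.com is a placeholder for the real Authentik domain), is to ask Python's resolver for each address family separately and then for both at once, from inside the container:

# Compare per-family lookups with the dual-stack lookup most HTTP clients use.
import socket

host = "authentik.example.com"  # placeholder for the real Authentik domain

for family, label in ((socket.AF_INET, "A / IPv4"), (socket.AF_INET6, "AAAA / IPv6")):
    try:
        addrs = socket.getaddrinfo(host, 443, family)
        print(label, "->", sorted({a[4][0] for a in addrs}))
    except socket.gaierror as exc:
        print(label, "-> failed:", exc)

# Dual-stack (AF_UNSPEC) lookup, the way an application usually resolves the name:
try:
    print("AF_UNSPEC ->", sorted({a[4][0] for a in socket.getaddrinfo(host, 443)}))
except socket.gaierror as exc:
    print("AF_UNSPEC -> failed:", exc)

If the per-family lookups behave (A succeeds, AAAA fails) but the AF_UNSPEC lookup fails outright, that would point at the resolver rather than the DNS server.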

I have no problem like this anywhere else in my Proxmox or Docker infrastructure, so I'm led to believe this is a problem with the Docker image or its networking settings. I'm happy to try any troubleshooting options but I'm not sure what to attempt from here.

@skjdghsdjgsdj
Author

I noted it works with Safari. That's true on one of my computers. On another, the Safari behavior is the same as Chrome and Firefox, so maybe there was lingering auth data that allowed it to succeed.

@skjdghsdjgsdj
Author

skjdghsdjgsdj commented Jan 28, 2025

So here's some more information, hopefully. I ran tcpdump in the Proxmox LXC to watch the raw DNS traffic, which covers all UDP port 53 traffic from both the system and its Docker containers. Here's the result:

# tcpdump -nnni eth0 udp port 53 -v
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
02:36:55.852401 IP (tos 0x0, ttl 63, id 49218, offset 0, flags [DF], proto UDP (17), length 68)
    10.0.3.25.46008 > 10.0.3.1.53: 62107+ AAAA? authentik.instance. (40)
02:36:55.852402 IP (tos 0x0, ttl 63, id 43613, offset 0, flags [DF], proto UDP (17), length 68)
    10.0.3.25.43029 > 10.0.3.1.53: 61704+ A? authentik.instance. (40)
02:36:55.852711 IP (tos 0x0, ttl 64, id 6800, offset 0, flags [DF], proto UDP (17), length 196)
    10.0.3.1.53 > 10.0.3.25.46008: 62107 0/1/0 (168)
02:36:55.852726 IP (tos 0x0, ttl 64, id 6801, offset 0, flags [DF], proto UDP (17), length 123)
    10.0.3.1.53 > 10.0.3.25.43029: 61704 2/0/0 authentik.instance. CNAME caddy.instance., caddy.instance. A 192.168.1.7 (95)

In this example, I replaced my Authentik domain with authentik.instance and my Caddy instance with caddy.instance. The actual IPs are preserved.

This follows the pattern I saw earlier: RomM performs DNS lookups for both the IPv4 and IPv6 addresses of the Authentik host. The AAAA response happens to come back first and (correctly, given my setup) contains no answer, while the A lookup succeeds, also as expected. As I suggested in my original issue, this looks like a race condition: instead of waiting for both lookups to complete, the resolver appears to give up as soon as it sees any failure.

A fix for my specific case may be to disable IPv6 lookups entirely. I'm not sure how to do that, especially without changing kernel settings on the host Proxmox server which I don't want to do; I'm fine changing system settings in the LXC or the Docker containers where possible.
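For what it's worth, here's a rough sketch (not a real fix, and not anything RomM currently exposes) of how a Python test script could force IPv4-only lookups by wrapping socket.getaddrinfo; it at least lets me check whether IPv4-only resolution makes the error go away:

# Force every lookup made after this point to request A records only.
# This is a test-script sketch; the wrapper is not part of RomM.
import socket

_orig_getaddrinfo = socket.getaddrinfo

def _ipv4_only_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    # Ignore the requested family and resolve IPv4 only.
    return _orig_getaddrinfo(host, port, socket.AF_INET, type, proto, flags)

socket.getaddrinfo = _ipv4_only_getaddrinfo

# Any HTTP client used below (urllib, requests, httpx, ...) now only sees IPv4 results.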

But if the issue really is the resolver failing as soon as either lookup fails, rather than only when both the IPv4 and IPv6 lookups fail, that sounds more like a library or system/container problem, and that is what should be fixed instead.

edit: (╯°□°)╯︵ ┻━┻ Okay, I have no idea what the hell is happening. On the one computer where Safari works fine for Authentik logins to RomM, the tcpdump for DNS looks exactly the same, including the IPv6 lookup returning first with no answers and the IPv4 lookup succeeding. I'd be happy to do whatever testing can help track this down.

@Anas-Eha

Anas-Eha commented Jan 28, 2025

I'm using a similar setup (AD as local DNS; OPNsense + Unbound + HAProxy on a different machine for the reverse proxy, firewall, and external DNS). Authentik and RomM do not talk to each other, except through the proxy.

I disabled IPv6 on my DNS and removed all AAAA records just to test, then stopped all stacks and restarted them. I was still able to connect with Authentik both internally and externally. I'm using Firefox as the browser.

I had a problem with Authentik and RomM before; the cause was a stray space somewhere in the compose file, which is why I'm sharing mine below in case it helps.

My compose file and the .env file for reference (I only included the romm service):

  romm:
    image: rommapp/romm:latest
    container_name: romm
    restart: unless-stopped
    env_file:
      - stack.env
    environment:
      DB_HOST: ${DB_HOST}
      DB_NAME: ${DB_NAME}
      DB_USER: ${DB_USER}
      DB_PASSWD: ${DB_PASSWD}
      ROMM_AUTH_SECRET_KEY: ${ROMM_AUTH_SECRET_KEY}
      STEAMGRIDDB_API_KEY: ${STEAMGRIDDB_API_KEY}
      IGDB_CLIENT_ID: ${IGDB_CLIENT_ID}
      IGDB_CLIENT_SECRET: ${IGDB_CLIENT_SECRET}
      OIDC_ENABLED: ${OIDC_ENABLED}
      OIDC_PROVIDER: ${OIDC_PROVIDER}
      OIDC_REDIRECT_URI: ${OIDC_REDIRECT_URI}
      OIDC_SERVER_APPLICATION_URL: ${OIDC_SERVER_APPLICATION_URL}
      OIDC_CLIENT_ID: ${OIDC_CLIENT_ID}
      OIDC_CLIENT_SECRET: ${OIDC_CLIENT_SECRET}
      DISABLE_USERPASS_LOGIN: ${DISABLE_USERPASS_LOGIN}
    volumes:
      - romm_resources:/romm/resources
      - romm_redis_data:/redis-data
      - /home/docker/RomM/library:/romm/library
      - /home/docker/RomM/assets:/romm/assets
      - /home/docker/RomM/config:/romm/config
      - /home/docker/certs:/romm/certs
    ports:
      - 62080:8080
    depends_on:
      romm-db:
        condition: service_healthy

stack.env

DB_HOST=romm-db
DB_NAME=romm
DB_USER=romm-user
DB_PASSWD=...
ROMM_AUTH_SECRET_KEY=...
STEAMGRIDDB_API_KEY=...
MARIADB_ROOT_PASSWORD=...
MARIADB_DATABASE=romm
MARIADB_USER=romm-user
MARIADB_PASSWORD=...
IGDB_CLIENT_ID=...
IGDB_CLIENT_SECRET=...
OIDC_CLIENT_ID=...
OIDC_CLIENT_SECRET=...
OIDC_ENABLED=true
OIDC_PROVIDER=Authentik
OIDC_REDIRECT_URI=https://romm.example.com/api/oauth/openid
OIDC_SERVER_APPLICATION_URL=https://authentik.example.com/application/o/romm
DISABLE_USERPASS_LOGIN=true

HAProxy backend:

backend Romm
    # health checking is DISABLED
    mode http
    balance source
    # stickiness
    stick-table type ip size 50k expire 30m  
    stick on src
    http-reuse safe
    option forwarded proto by host by_port for for_port
    server Rommserver XXX:62080 

The Authentik provider uses a self-signed certificate as its signing key.
And the redirect URI is https://romm.example.com/api/oauth/openid (without a trailing slash; I saw that could mess things up, even though I think it's supposed to be fixed).

Hope that helps.

This is my first message on GitHub; I hope I didn't break any conventions or standards.

@skjdghsdjgsdj
Author

Well, this is interesting. I looked at your config and compared it to mine. I have no trailing slash on my OIDC_REDIRECT_URI, but I did have one on OIDC_SERVER_APPLICATION_URL. I removed it and... things seem to be working?

I have no clue why that would cause a DNS resolution error in the logs, unless the error is misleading and it's actually a 404 or something. I will keep monitoring to see whether this actually fixed the problem or it's a fluke.

Regardless, I don't think RomM should emit this kind of error when OIDC_SERVER_APPLICATION_URL has a trailing slash. I will note that in Authentik, the "OpenID Configuration Issuer" field, and in fact all the other fields, do have trailing slashes, which suggests RomM is inconsistent with what Authentik defines.
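My guess at the mechanism, purely as an illustration and not RomM's actual code: if the server builds the OIDC discovery URL by joining paths, a trailing (or missing) slash on the base URL changes the result. The URLs below are placeholders:

# How a trailing slash changes URL joining (illustrative only).
from urllib.parse import urljoin

with_slash = "https://authentik.example.com/application/o/romm/"
without_slash = "https://authentik.example.com/application/o/romm"

print(urljoin(with_slash, ".well-known/openid-configuration"))
# https://authentik.example.com/application/o/romm/.well-known/openid-configuration

print(urljoin(without_slash, ".well-known/openid-configuration"))
# https://authentik.example.com/application/o/.well-known/openid-configuration  (romm segment dropped)

print(with_slash + "/.well-known/openid-configuration")
# Plain concatenation with a trailing slash yields a double slash instead.

Depending on which of these RomM does, either the slashed or the unslashed form would hit a URL Authentik doesn't serve, though that still wouldn't obviously explain a DNS error rather than a 404.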

@Anas-Eha

Happy you got it potentially working : )

Just to note, this may be linked to that issue:
#1430

@skjdghsdjgsdj
Author

> Happy you got it potentially working : )
>
> Just to note, this may be linked to that issue: #1430

Agreed, looks like the same cause, although mine is manifesting as a DNS error for no reason I can determine. Hopefully fixing #1430 fixes this too. Thanks.
