Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log queries #20

Closed
vhiairrassary opened this issue Dec 15, 2024 · 4 comments
Closed

Log queries #20

vhiairrassary opened this issue Dec 15, 2024 · 4 comments

Comments

@vhiairrassary
Copy link
Collaborator

Hello,

I think it would be a great addition to log the executed queries. Would you be interested in this feature? I am willing to do the change.

@lmangani
Copy link
Collaborator

Hey @vhiairrassary first of all thank you! We just implemented basic logging some days ago, but the feature is very very basic. If you feel like testing it and making it better, that'd be greatly appreciated: #17

If you plan on contributing, I'd gladly add you to the group!

@mskyttner
Copy link

Hi @lmangani and @vhiairrassary, interesting work you are doing with executing statements and logging etc! I just wanted to add a few ideas about logging. First of all, logging to stdout was added recently (very nice!) which makes it possible to create a container with duckdb and this extension installed and get log output as if it was any other container/webserver.

I think it would be neat if the log would be in the standard combined log format showing also the request for the executed queries discussed here.

To illustrate a setup for testing I thought I could describe what I am using. I have a Containerfile/Dockerfile which installs duckdb CLI and the extension and it has a default startup command to serve an init.sql script which can set up credentials etc for serving certain data. In the end it launches the httpserver.

The following Containerfile can be built into a container image using docker or podman, i.e. docker build -t duckserve . -f Containerfile:

FROM debian:bookworm-slim

RUN apt update && apt install -y --no-install-recommends \
	wget \
	ca-certificates \
	unzip \
	procps

ENV DUCKDB_VER=v1.1.3

WORKDIR /usr/local/bin

RUN wget -O cli.zip "https://github.com/duckdb/duckdb/releases/download/$DUCKDB_VER/duckdb_cli-linux
-amd64.zip" && \
	unzip cli.zip && \
	rm cli.zip && \
	chmod +x duckdb

WORKDIR /data

RUN duckdb myduck.db 'CALL dbgen(sf=0.1); select 42;'
RUN duckdb myduck.db 'install "httpfs"; load "httpfs";'
RUN duckdb myduck.db 'install "httpserver" from community; load "httpserver";'

VOLUME ["/data"]
EXPOSE 9999

CMD ["sh", "-c", "duckdb"]

After building we have a container image called duckserve and in order to start a process/container with specific data as a service (so we can compare the log output to other commonly used components such as nginx) the following compose.yaml file can be created:

services:
  duckserve:
    image: duckserve
    command: duckdb -init serve.sql myduck.db
    environment:
      - DUCKDB_HTTPSERVER_DEBUG=1
      - DUCKDB_HTTPSERVER_FOREGROUND=1
    ports:
      - "9999:9999"
    volumes:
      - ./init.sql:/data/serve.sql:ro

  nginx:
    image: nginx:alpine
    volumes:
      - ./proxy.conf:/etc/nginx/conf.d/default.conf
    ports:
      - "80:80"

The .sql which executes initially can look like this:

load httpfs; load httpserver;
set disabled_filesystems = 'LocalFileSystem';
select httpserve_start('0.0.0.0', 9999, '');

If going to the duckserve service and issuing a query like from lineitem limit 5; we see a log entry:

duckserve-1  | [2024-12-16 11:47:21] GET / - 200 - from 172.27.0.3:57986

If we proxy this service with nginx, we see the following equivalent log entry there:

nginx-1      | 172.27.0.1 - - [16/Dec/2024:11:47:21 +0000] "GET / HTTP/1.1" 200 26090 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:133.0) Gecko/20100101 Firefox/133.0" "-"

The combined log (used in nginx and Apache and elsewhere, too) appears to be "standardized" and contains elements like the remote address, remote user, timestamp, request, status code, bytes sent, the referrer and http user agent in one line (more info here https://nginx.org/en/docs/http/ngx_http_log_module.html)

Perhaps the queries would be logged as the request (perhaps not if it is a POST)?

If so how would one want it to be logged?

@lmangani
Copy link
Collaborator

lmangani commented Dec 16, 2024

Thanks for the input @mskyttner we can most definitely adopt this format for logging - I'll take a look in the coming days.

About the queries, I would avoid it - DuckDB should be capable of producing its own query log separantely:

SET log_query_path = '/tmp/duckdb_query.log';

@lmangani
Copy link
Collaborator

@mskyttner here's a quick draft implementing the requested format: #25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants