Added log processing #75
base: master
Conversation
Hey @ilyakooo0 Thanks for the PR. 🍰 I've only taken a cursory look as I'm short on time, but I'll try and review it properly tomorrow. It seems the CI is erroring for unrelated reasons.
processLog :: Monad m => ConduitT BSS.ByteString BSS.ByteString m ()
processLog = do
  -- metadata (is the next string stdout or stderr)
Do I understand correctly that the first 4 bytes in the stream just denote whether the data was collected from stdout or stderr?
As I explained in the issue, I couldn't find any documentation regarding the format. All I could find was this repo: https://github.com/mafintosh/docker-raw-stream
In that repo, the second 4 bytes represent the length of the subsequent string, and the first bit denotes whether the string is from stdout or stderr.
I assume that the first 4 bytes are reserved for metadata.
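To make that concrete, here is a minimal, self-contained sketch of decoding that second 4-byte group as a big-endian length (decodeLen is an illustrative name, not part of this PR):

import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- Interpret 4 bytes as a big-endian length, i.e. the second 4-byte
-- group of the 8-byte header described above.
decodeLen :: BS.ByteString -> Word32
decodeLen = BS.foldl' (\acc b -> acc `shiftL` 8 .|. fromIntegral b) 0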
I have had the opportunity to test this branch and it works as expected.
processLog = do
  -- metadata (is the next string stdout or stderr)
  _ <- CB.take 4
  len' <- CB.take 4
Why do we only take 4 bytes at a time?
You mean, why don't we read the first 8 bytes in one go?
I'm not sure reading it in one go and then splitting it into two parts would be noticeably more efficient.
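For comparison, reading the whole 8-byte header in one go and then splitting it might look like this (a sketch only; readHeader is an illustrative name):

import Data.Conduit (ConduitT)
import qualified Data.ByteString as BSS
import qualified Data.ByteString.Lazy as BL
import qualified Data.Conduit.Binary as CB

-- Read the full 8-byte header at once and split it into the
-- metadata half and the length half.
readHeader :: Monad m => ConduitT BSS.ByteString o m (BL.ByteString, BL.ByteString)
readHeader = BL.splitAt 4 <$> CB.take 8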
Actually, some of the builds errored because I used the new … UPD: I've now fixed it, and the builds are failing only for unrelated reasons.
Hi everyone, I ran into the same problem yesterday, but found that the format is documented in https://docs.docker.com/engine/api/v1.40/#operation/ContainerAttach. Here's some (simplified) code showing how I deal with it at the moment (which seems to work fine):
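For reference, a minimal sketch of handling that documented format with conduit, bytestring, and binary (demultiplexDockerStream is an illustrative name, and the stream-type byte is simply discarded here):

import Data.Binary.Get (getWord32be, runGet)
import Data.Conduit (ConduitT, yield)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import qualified Data.Conduit.Binary as CB

-- Each frame starts with an 8-byte header:
--   [stream type (1 byte), padding (3 bytes), payload length (4 bytes, big-endian)]
-- followed by that many bytes of payload.
demultiplexDockerStream :: Monad m => ConduitT BS.ByteString BS.ByteString m ()
demultiplexDockerStream = do
  header <- CB.take 8
  if BL.length header < 8
    then pure ()  -- end of stream (or a truncated header)
    else do
      let len = fromIntegral (runGet getWord32be (BL.drop 4 header))
      payload <- CB.take len
      yield (BL.toStrict payload)
      demultiplexDockerStream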
It is worth noting that it may not be sensible to deal with this directly in the implementation of getContainerLogsStream.
Hey @mbg thanks for the clarification and the example. It was always the intention of the library to leave the stream processing outside of the library, i.e. let the user define the sink that gets passed to the getContainerLogsStream function. So I'm not sure if it's wise to include this entire machinery in the library itself. That said, there is too much abstraction leaking outside of the library, and the user has to know Docker API internals to construct a proper sink (i.e. the message structure, what the first bytes mean, and so on). So I think we need to address this somehow.
I'd love to get this one merged but I haven't had time to look into it in more detail. Would an example stream that just prints stuff to stdout and a pointer to it in the docs be the best course of action? @ilyakooo0 @mbg what do you think?
IMO the main problem is that the raw stream produces invalid characters (since some of the data is not text at all). So the stream is mostly useless unless you do the exact processing that is done in this PR, even if you just want the binary output of the executable. I can't imagine a reason why someone would use this library and want the raw unprocessed stream.
I agree with @ilyakooo0. I think that there should be an implementation of this in the library, at the very least as an opt-in (i.e. even if it was just something along the lines of the code above).
Neither of these is a particularly big issue; they are just design decisions that need to be made.
@denibertovic have you had any more thoughts on this, or have you made a decision?
@mbg I agree with your previous comment. This should be implemented as a separate function and not directly in getContainerLogsStream. I don't have time to work on this, but I'd be happy to merge if the above changes were made.
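As a rough illustration of that split, a caller could fuse a standalone demultiplexing conduit (such as the demultiplexDockerStream sketch earlier in this thread) in front of their own consumer; logSink is an illustrative name and no particular docker-hs signature is assumed:

import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Conduit (ConduitT, (.|))
import Data.Void (Void)
import qualified Data.ByteString as BSS
import qualified Data.Conduit.List as CL

-- A sink the caller could pass to the logs-streaming function: opt-in
-- demultiplexing fused in front of a plain print-to-stdout consumer.
-- (demultiplexDockerStream refers to the sketch above.)
logSink :: MonadIO m => ConduitT BSS.ByteString Void m ()
logSink = demultiplexDockerStream .| CL.mapM_ (liftIO . BSS.putStr)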
closes #74