GBK Test Data Handling Issue #9107

Closed
@opencmit2

Bug Report

Describe the bug

Issue Content:

GBK Test Data Source

echo 小明 | iconv -f utf8 -t gbk >> /tmp/test.log

Fluent Bit Configuration File (Event_Format json)

[INPUT]
    Name         tail
    Tag          dummy.local
    Path         /tmp/test.log
[FILTER]
    Name wasm
    match *
    Event_Format json
    WASM_Path /data/flb312/etc/filter.wasm
    Function_Name go_filter
    accessible_paths .
[OUTPUT]
    Name  stdout
    Match *

Handling Code and Output with Event_Format Set to json

//export go_filter
func go_filter(tag *uint8, tag_len uint, time_sec uint, time_nsec uint, record *uint8, record_len uint) *uint8 {
        // View the record handed over by Fluent Bit as a byte slice (no copy).
        brecord := unsafe.Slice(record, record_len)
        // The bytes are {"log":"..."} with the value bytes 208 161 238 131 131.
        fmt.Println(brecord) // [123 34 108 111 103 34 58 34 208 161 238 131 131 34 125]
        var p fastjson.Parser
        value, err := p.Parse(string(brecord))
        if err != nil {
                fmt.Printf("Error parsing JSON: %v\n", err)
                return nil
        }
        // Print the raw bytes of the "log" field as seen by the filter.
        logValue := value.GetStringBytes("log")
        fmt.Printf("%v\n", logValue)  //[208 161 238 131 131]
        return nil
}

Fluent Bit Configuration File (Event_Format msgpack)

[INPUT]
    Name         tail
    Tag          dummy.local
    Path         /tmp/test.log
[FILTER]
    Name wasm
    match *
    Event_Format msgpack
    WASM_Path /data/flb312/etc/filter.wasm
    Function_Name go_filter
    accessible_paths .
[OUTPUT]
    Name  stdout
    Match *

Handling Code and Output with Event_Format Set to msgpack

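//export go_filter
func go_filter(tag *uint8, tag_len uint, time_sec uint, time_nsec uint, record *uint8, record_len uint) *uint8 {
        // View the record handed over by Fluent Bit as a byte slice (no copy).
        brecord := unsafe.Slice(record, record_len)
        // Raw msgpack: 129 = map with one entry, 163 = 3-byte string "log",
        // 164 = 4-byte string holding the value bytes 208 161 195 247.
        fmt.Println(brecord) // [129 163 108 111 103 164 208 161 195 247]
        var logData map[string]interface{}
        if err := msgpack.Unmarshal(brecord, &logData); err != nil {
                panic(err)
        }
        // Print the raw bytes of the "log" field as seen by the filter.
        if logStr, ok := logData["log"].(string); ok {
                fmt.Printf("%v\n", []byte(logStr)) //[208 161 195 247]
        }
        return nil
}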
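
The msgpack record printed above is short enough to read by hand, which confirms that the four value bytes come through untouched. The following is only a minimal sketch for this exact shape of record (one fixmap entry, fixstr key and value). The helper names `readFixStr` and `decodeSinglePair` are hypothetical and are not part of Fluent Bit or any msgpack library.

```go
package main

import (
	"errors"
	"fmt"
)

// readFixStr reads one msgpack fixstr (header 0xA0-0xBF, low 5 bits = length)
// from the front of b and returns its raw bytes plus the remaining input.
func readFixStr(b []byte) (str, rest []byte, err error) {
	if len(b) == 0 || b[0]&0xE0 != 0xA0 {
		return nil, nil, errors.New("not a fixstr")
	}
	n := int(b[0] & 0x1F)
	if len(b) < 1+n {
		return nil, nil, errors.New("truncated fixstr")
	}
	return b[1 : 1+n], b[1+n:], nil
}

// decodeSinglePair decodes a fixmap (0x81) holding exactly one
// fixstr key and one fixstr value. Other msgpack types are not handled.
func decodeSinglePair(b []byte) (key string, value []byte, err error) {
	if len(b) == 0 || b[0] != 0x81 {
		return "", nil, errors.New("expected fixmap with one entry")
	}
	k, rest, err := readFixStr(b[1:])
	if err != nil {
		return "", nil, err
	}
	v, _, err := readFixStr(rest)
	if err != nil {
		return "", nil, err
	}
	return string(k), v, nil
}

func main() {
	// Record bytes copied from the fmt.Println output in the report.
	record := []byte{129, 163, 108, 111, 103, 164, 208, 161, 195, 247}
	key, value, err := decodeSinglePair(record)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s => % X\n", key, value)
}
```

A msgpack string is just a length prefix followed by raw bytes, so nothing in this format forces the value to be valid UTF-8.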

When Event_Format is set to json, the bytes of the log value are [208 161 238 131 131].

When Event_Format is set to msgpack, the bytes of the log value are [208 161 195 247].

Only [208 161 195 247] (0xD0 0xA1 0xC3 0xF7, the GBK encoding of 小明) can be transcoded from GBK to UTF-8. In the json case the last two bytes, 195 247, have been replaced by 238 131 131, so I suspect that Fluent Bit performs additional processing on the record when Event_Format is set to json.

Expected behavior

Screenshots

Event_Format set to JSON

image

Event_Format set to MessagePack
image

Your Environment

  • Version used: 3.1.2
  • Configuration:
  • Environment name and version (e.g. Kubernetes? What version?):
  • Server type and version: Ubuntu 24.04 LTS
  • Operating System and version:
  • Filters and plugins:

Additional context
