Commit 3deeb66

completed the chardet project
0 parents  commit 3deeb66

481 files changed, +121715 -0 lines changed

.github/workflows/go.yml

+28
@@ -0,0 +1,28 @@
# This workflow will build a golang project
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-go

name: Go

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:

  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    - name: Set up Go
      uses: actions/setup-go@v4
      with:
        go-version: '1.23'

    - name: Build
      run: go build -v ./...

    - name: Test
      run: go test -v ./...

.github/workflows/golangci-lint.yml

+26
@@ -0,0 +1,26 @@
name: golangci-lint
on:
  push:
    branches:
      - main
      - master
  pull_request:

permissions:
  contents: read
  # Optional: allow read access to pull request. Use with `only-new-issues` option.
  # pull-requests: read

jobs:
  golangci:
    name: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: 1.23
      - name: golangci-lint
        uses: golangci/golangci-lint-action@v6
        with:
          version: v1.60

.gitignore

+1
@@ -0,0 +1 @@
.idea

LICENSE

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Wlynxg

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+173
@@ -0,0 +1,173 @@
<div align=center>

## chardet: Go character encoding detector
[![Go Reference](https://pkg.go.dev/badge/github.com/wlynxg/chardet.svg)](https://pkg.go.dev/github.com/wlynxg/chardet)
[![License](https://img.shields.io/github/license/wlynxg/chardet.svg?style=flat)](https://github.com/wlynxg/chardet)
[![Go Report Card](https://goreportcard.com/badge/github.com/wlynxg/chardet)](https://goreportcard.com/report/github.com/wlynxg/chardet)

</div>

# Introduction

This is a Go port of Python's [chardet](https://github.com/chardet/chardet) library. Much respect and appreciation to the original authors for their excellent work.

chardet is a character encoding detector library written in Go. It helps you automatically detect the character encoding of text content.

# Installation

To install chardet, use `go get`:

```bash
go get github.com/wlynxg/chardet
```

## Supported Encodings & Languages

**Supported Encodings**:

<details>
<summary>Expand the list of supported encodings</summary>

- **Ascii**
- **UTF-8**
- **UTF-8-SIG**
- **UTF-16**
- **UTF-16LE**
- **UTF-16BE**
- **UTF-32**
- **UTF-32BE**
- **UTF-32LE**
- **GB2312**
- **HZ-GB-2312**
- **SHIFT_JIS**
- **Big5**
- **Johab**
- **KOI8-R**
- **TIS-620**
- **MacCyrillic**
- **MacRoman**
- **EUC-TW**
- **EUC-KR**
- **EUC-JP**
- **CP932**
- **CP949**
- **Windows-1250**
- **Windows-1251**
- **Windows-1252**
- **Windows-1253**
- **Windows-1254**
- **Windows-1255**
- **Windows-1256**
- **Windows-1257**
- **ISO-8859-1**
- **ISO-8859-2**
- **ISO-8859-5**
- **ISO-8859-6**
- **ISO-8859-7**
- **ISO-8859-8**
- **ISO-8859-9**
- **ISO-8859-13**
- **ISO-2022-CN**
- **ISO-2022-JP**
- **ISO-2022-KR**
- **X-ISO-10646-UCS-4-3412**
- **X-ISO-10646-UCS-4-2143**
- **IBM855**
- **IBM866**

</details>
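
Several of the encodings listed above (UTF-8-SIG and the UTF-16 and UTF-32 families) can be recognised from a byte order mark (BOM) at the start of the data. The following is a minimal, standalone sketch of such a check. It does not use chardet and is not taken from its implementation; the function name `sniffBOM` and the labels it returns are chosen for illustration and may differ from what chardet reports.

```go
package main

import (
	"bytes"
	"fmt"
)

// bomTable lists byte order marks, longest first, so that the UTF-32LE mark
// (FF FE 00 00) is tested before the UTF-16LE mark (FF FE) it begins with.
// The labels are illustrative and are not guaranteed to match chardet's output.
var bomTable = []struct {
	name string
	mark []byte
}{
	{"UTF-32BE", []byte{0x00, 0x00, 0xFE, 0xFF}},
	{"UTF-32LE", []byte{0xFF, 0xFE, 0x00, 0x00}},
	{"UTF-8-SIG", []byte{0xEF, 0xBB, 0xBF}},
	{"UTF-16BE", []byte{0xFE, 0xFF}},
	{"UTF-16LE", []byte{0xFF, 0xFE}},
}

// sniffBOM returns the label of the first matching byte order mark,
// or an empty string when data does not start with a known mark.
func sniffBOM(data []byte) string {
	for _, entry := range bomTable {
		if bytes.HasPrefix(data, entry.mark) {
			return entry.name
		}
	}
	return ""
}

func main() {
	withBOM := append([]byte{0xEF, 0xBB, 0xBF}, []byte("hello")...)
	fmt.Printf("with BOM: %q\n", sniffBOM(withBOM))
	fmt.Printf("without BOM: %q\n", sniffBOM([]byte("hello")))
}
```

A BOM check only covers input that actually carries a mark; text without one still needs statistical detection of the kind chardet provides.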

**Supported Languages**:
<details>
<summary>Expand the list of supported languages</summary>

- Chinese
- Japanese
- Korean
- Hebrew
- Russian
- Greek
- Bulgarian
- Thai
- Turkish

</details>

# Usage

## Basic Usage

The simplest way to use chardet is with the `Detect` function:

```go
package main

import (
	"fmt"
	"github.com/wlynxg/chardet"
)

func main() {
	data := []byte("Your text data here...")
	result := chardet.Detect(data)
	fmt.Printf("Detected result: %+v\n", result)
	// Output: Detected result: {Encoding:Ascii Confidence:1 Language:}
}
```
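
The printed result carries an encoding name, a confidence value, and a language. One way to act on it is to accept the detected encoding only above a confidence threshold and otherwise fall back to a default. The sketch below illustrates that idea with a stand-in struct: the `detection` type, its field types, `chooseEncoding`, and the 0.8 threshold are all assumptions made for this example, not part of chardet's API.

```go
package main

import "fmt"

// detection mirrors the three fields visible in the printed result.
// It is a stand-in defined for this sketch; the field types are assumptions.
type detection struct {
	Encoding   string
	Confidence float64
	Language   string
}

// chooseEncoding returns the detected encoding when it is non-empty and its
// confidence is at least minConfidence; otherwise it returns fallback.
func chooseEncoding(d detection, minConfidence float64, fallback string) string {
	if d.Encoding == "" || d.Confidence < minConfidence {
		return fallback
	}
	return d.Encoding
}

func main() {
	confident := detection{Encoding: "Ascii", Confidence: 1}
	uncertain := detection{Encoding: "Windows-1252", Confidence: 0.4}

	fmt.Println(chooseEncoding(confident, 0.8, "UTF-8")) // prints "Ascii"
	fmt.Println(chooseEncoding(uncertain, 0.8, "UTF-8")) // prints "UTF-8"
}
```

A suitable threshold depends on the input; short or mostly-ASCII text tends to give a detector less evidence to work with.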

## Advanced Usage

For handling large amounts of text, you can use the detector incrementally. This allows the detector to stop as soon as it reaches sufficient confidence in its result.
```go
package main

import (
	"fmt"
	"github.com/wlynxg/chardet"
)

func main() {
	// Create a detector instance
	detector := chardet.NewUniversalDetector(0)
	// Process text in chunks
	chunk1 := []byte("First chunk of text...")
	chunk2 := []byte("Second chunk of text...")
	detector.Feed(chunk1)
	detector.Feed(chunk2)
	// Get the result
	result := detector.GetResult()
	fmt.Printf("Detected result: %+v\n", result)
	// Output: Detected result: {Encoding:Ascii Confidence:1 Language:}
}
```
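
In practice the chunks usually come from a stream rather than from literals. The following sketch shows one way to split an `io.Reader` into fixed-size chunks and hand each one to a callback. It is standalone and makes no assumption about the signature of the detector's `Feed` method; `feedInChunks` and `chunkLengths` are hypothetical helper names, and with a real detector the callback could be something like `func(b []byte) { detector.Feed(b) }`.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// feedInChunks reads r in chunks of at most chunkSize bytes and passes each
// non-empty chunk to feed. It returns the total number of bytes passed on.
// The chunk slice is reused between calls, so feed must not retain it.
func feedInChunks(r io.Reader, chunkSize int, feed func(chunk []byte)) (int, error) {
	if chunkSize <= 0 {
		return 0, errors.New("chunkSize must be positive")
	}
	buf := make([]byte, chunkSize)
	total := 0
	for {
		n, err := r.Read(buf)
		if n > 0 {
			feed(buf[:n])
			total += n
		}
		if errors.Is(err, io.EOF) {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// chunkLengths is a demonstration helper: it runs text through feedInChunks
// and records the size of every chunk the callback received.
func chunkLengths(text string, chunkSize int) ([]int, error) {
	var sizes []int
	_, err := feedInChunks(strings.NewReader(text), chunkSize, func(chunk []byte) {
		sizes = append(sizes, len(chunk))
	})
	return sizes, err
}

func main() {
	sizes, err := chunkLengths("First chunk of text... Second chunk of text...", 16)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println("chunk sizes:", sizes)
}
```

Using a callback keeps the reading loop independent of any particular detector, at the cost of one extra closure at the call site.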

## Processing Multiple Files

You can reuse the same detector instance for multiple files by using the `Reset()` method:
```go
package main

import (
	"fmt"
	"os"
	"github.com/wlynxg/chardet"
)

func main() {
	detector := chardet.NewUniversalDetector(0)
	files := []string{"file1.txt", "file2.txt"}
	for _, file := range files {
		// Clear any state left over from the previous file
		detector.Reset()
		data, err := os.ReadFile(file)
		if err != nil {
			// Report unreadable files and move on to the next one
			fmt.Fprintf(os.Stderr, "skipping %s: %v\n", file, err)
			continue
		}
		detector.Feed(data)
		result := detector.GetResult()
		fmt.Printf("File %s encoding: %+v\n", file, result)
	}
}
```
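
When files are large, reading each one in full with `os.ReadFile` may be more than is needed. The sketch below reads only a bounded prefix of a file using the standard library; it is independent of chardet. The helper names `readHead` and `writeSample` and the 8-byte limit are chosen for illustration, and whether a prefix is enough for reliable detection depends on the content.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readHead returns at most limit bytes from the start of the file at path.
func readHead(path string, limit int64) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(io.LimitReader(f, limit))
}

// writeSample creates a temporary file holding content and returns its path.
// It exists only so that this example can run without external files.
func writeSample(content string) (string, error) {
	f, err := os.CreateTemp("", "chardet-sample-*.txt")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeSample("Your text data here...")
	if err != nil {
		fmt.Println("could not create sample file:", err)
		return
	}
	defer os.Remove(path)

	head, err := readHead(path, 8)
	if err != nil {
		fmt.Println("could not read file:", err)
		return
	}
	fmt.Printf("first %d bytes: %q\n", len(head), head)
}
```

The bytes returned by `readHead` could then be passed to the detector in place of the full file contents.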

# License

`chardet` is licensed under the [MIT License](LICENSE), 100% free and open-source, forever.
