V2 API using Go 1.23 iterators (#4)
* V2 API, using Go 1.23 iterators.

* Completely new API for tree building.

* Update Readme and CI. Add doc.

* Remove example_test.go (for now). Doc tweaks.

* Golangci-lint does not like a helper function calling TB.Fatal.

* Golangci-lint also does not like a goroutine in a test function calling TB.Fatal (and this time it is probably right).
bobg authored Aug 28, 2024
1 parent a746879 commit a2918d6
Showing 7 changed files with 334 additions and 504 deletions.
25 changes: 22 additions & 3 deletions .github/workflows/go.yml
@@ -10,12 +10,24 @@ jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3

- name: Set up Go
uses: actions/setup-go@v2
uses: actions/setup-go@v3
with:
go-version: 1.16
go-version: 1.23

- name: golangci-lint
uses: golangci/golangci-lint-action@v6
with:
# Optional: version of golangci-lint to use in form of v1.2 or v1.2.3 or `latest` to use the latest version
version: latest

# Optional: golangci-lint command line arguments.
# args: --issues-exit-code=0

# Optional: show only new issues if it's a pull request. The default value is `false`.
# only-new-issues: true

- name: Unit tests
run: go test -v -coverprofile=cover.out ./...
@@ -24,3 +36,10 @@ jobs:
uses: shogo82148/actions-goveralls@v1
with:
path-to-profile: cover.out

- name: Modver
if: ${{ github.event_name == 'pull_request' }}
uses: bobg/modver@…
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
pull_request_url: https://github.com/${{ github.repository }}/pull/${{ github.event.number }}
48 changes: 22 additions & 26 deletions Readme.md
@@ -1,14 +1,14 @@
# Hashsplit - content-based splitting of byte streams

[![Go Reference](https://pkg.go.dev/badge/github.com/bobg/hashsplit.svg)](https://pkg.go.dev/github.com/bobg/hashsplit)
[![Go Report Card](https://goreportcard.com/badge/github.com/bobg/hashsplit)](https://goreportcard.com/report/github.com/bobg/hashsplit)
![Tests](https://github.com/bobg/hashsplit/actions/workflows/go.yml/badge.svg)
[![Coverage Status](https://coveralls.io/repos/github/bobg/hashsplit/badge.svg?branch=master)](https://coveralls.io/github/bobg/hashsplit?branch=master)
[![Go Reference](https://pkg.go.dev/badge/github.com/bobg/hashsplit.svg)](https://pkg.go.dev/github.com/bobg/hashsplit/v2)
[![Go Report Card](https://goreportcard.com/badge/github.com/bobg/hashsplit)](https://goreportcard.com/report/github.com/bobg/hashsplit/v2)
![Tests](https://github.com/bobg/hashsplit/v2/actions/workflows/go.yml/badge.svg)
[![Coverage Status](https://coveralls.io/repos/github/bobg/hashsplit/badge.svg?branch=master)](https://coveralls.io/github/bobg/hashsplit/v2?branch=master)

Hashsplitting is a way of dividing a byte stream into pieces
based on the stream's content rather than on any predetermined chunk size.
As the Splitter reads the stream it maintains a _rolling checksum_ of the last several bytes.
A chunk boundary occurs when the rolling checksum has enough trailing bits set
A chunk boundary occurs when the rolling checksum has enough trailing bits set to zero
(where “enough” is a configurable setting that determines the average chunk size).
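
The boundary rule described above can be illustrated with a small standalone sketch. It is not this package's implementation: the polynomial rolling hash, the `windowSize`, `splitBits`, and `base` constants, and the `splitChunks` helper are all assumptions chosen for illustration. It only shows the idea of keeping a checksum over the last few bytes and cutting a chunk when that checksum has enough trailing zero bits.

```go
package main

import (
	"fmt"
	"math/bits"
)

// All three constants are illustrative assumptions, not values used by hashsplit.
const (
	windowSize        = 8  // number of trailing bytes covered by the rolling checksum
	splitBits         = 3  // trailing zero bits required for a chunk boundary
	base       uint32 = 31 // multiplier of the toy polynomial hash
)

// splitChunks is a hypothetical helper that cuts data wherever the rolling
// checksum of the last windowSize bytes ends in at least splitBits zero bits.
func splitChunks(data []byte) [][]byte {
	var (
		out   [][]byte
		h     uint32
		start int
	)
	// pow = base^windowSize, used to remove the byte that leaves the window.
	pow := uint32(1)
	for i := 0; i < windowSize; i++ {
		pow *= base
	}
	for i, b := range data {
		h = h*base + uint32(b)
		if i >= windowSize {
			h -= pow * uint32(data[i-windowSize])
		}
		// Require a minimum chunk length so that chunks are never tiny.
		if i+1-start >= windowSize && bits.TrailingZeros32(h) >= splitBits {
			out = append(out, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		out = append(out, data[start:]) // final partial chunk
	}
	return out
}

func main() {
	data := []byte("Hashsplitting divides a byte stream into pieces based on the stream's content.")
	for i, c := range splitChunks(data) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Raising `splitBits` by one roughly doubles the expected distance between boundaries, which is the sense in which "enough" controls the average chunk size.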

## Usage
@@ -18,18 +18,27 @@ an `io.Reader`,
like this:

```go
err := Split(r, f)
split, errptr := hashsplit.Split(r)
for chunk := range split {
	// ...handle the contents of r one chunk at a time...
}
if err := *errptr; err != nil {
	// ...handle an error reading from r...
}
```

...where `f` is a `func([]byte, uint) error` that receives each consecutive chunk and its “level”
(which can be thought of as how badly the splitter wanted to make a boundary at the end of the chunk).
These chunks can be arranged in a “hashsplit tree” like this:
Chunks can be arranged in a “hashsplit tree” like this:

```go
var tb TreeBuilder
err := Split(r, tb.Add)
if err != nil { ... }
root, err := tb.Root()
split, errptr := hashsplit.Split(r)
tree := hashsplit.Tree(split)
var root *hashsplit.TreeNode
for node := range tree {
	root = node
}
if err := *errptr; err != nil {
	// ...handle an error reading from r...
}
```

...and now `root` is the root of a tree whose leaves contain consecutive chunks of the input.
@@ -64,16 +73,3 @@ More information,
and a proposed standard,
can be found at
[github.com/hashsplit/hashsplit-spec](https://github.com/hashsplit/hashsplit-spec).

## Compatibility note

An earlier version of this package included a Splitter.Split method,
which allowed a Splitter `s` to consume all of the input from an io.Reader `r`.
This has been removed.
The same behavior can be obtained simply by doing:

```go
_, err := io.Copy(s, r)
if err != nil { ... }
err = s.Close()
```
59 changes: 0 additions & 59 deletions example_test.go

This file was deleted.

9 changes: 6 additions & 3 deletions go.mod
@@ -1,7 +1,10 @@
module github.com/bobg/hashsplit
module github.com/bobg/hashsplit/v2

go 1.16
go 1.23

require github.com/chmduquesne/rollinghash v4.0.0+incompatible

retract [v1.0.0, v1.0.2]
require (
github.com/bobg/go-generics/v3 v3.7.0 // indirect
github.com/bobg/seqs v1.2.0 // indirect
)
4 changes: 4 additions & 0 deletions go.sum
@@ -1,2 +1,6 @@
github.com/bobg/go-generics/v3 v3.7.0 h1:4SJHDWqONTRcA8al6491VW/ys6061bPCcTcI7YnIHPc=
github.com/bobg/go-generics/v3 v3.7.0/go.mod h1:wGlMLQER92clsh3cJoQjbUtUEJ03FoxnGhZjaWhf4fM=
github.com/bobg/seqs v1.2.0 h1:y9SS1zisfkNQepd+xia/3ELDYR4T4vuLTyCA4UVOiKA=
github.com/bobg/seqs v1.2.0/go.mod h1:icgB+vXIoU6s675tLYVAgcUYry1PkYwgEKvzOuFemOk=
github.com/chmduquesne/rollinghash v4.0.0+incompatible h1:hnREQO+DXjqIw3rUTzWN7/+Dpw+N5Um8zpKV0JOEgbo=
github.com/chmduquesne/rollinghash v4.0.0+incompatible/go.mod h1:Uc2I36RRfTAf7Dge82bi3RU0OQUmXT9iweIcPqvr8A0=