limit validation errors or spit out only first one #92
Glad to hear you like this library! For the problem you encountered, if possible I would love to add the capability of limiting validations as a first-class option. However, this may involve an API signature change, and the implementation is not trivial either, because validation is done recursively. So I'd like to limit the support to the scenario most likely causing the problem you described. Can we say that the problem mainly occurs when validating a big array/slice or map? And does your problem only occur when using …?
Yeah, well, we've only started using ozzo recently, and so far the problem has only occurred with large slices. In theory it could also occur with large nested structs, but that's less likely than with slices. I think enhancing … Btw, what do you think of the API for the enhanced …?
Actually, I just had a second look at the code and realized that just introducing a variation of … This probably requires some more changes here: https://github.com/geekflyer/ozzo-validation/blob/083f0b20911750dfaf3fa0b23b3ef55bd37a3fd9/validation.go#L198 🤔
Here's a small repro for the problem:

```go
package main

import (
	"fmt"

	validation "github.com/go-ozzo/ozzo-validation/v3"
)

type MyItem struct {
	Name string
}

func (mi *MyItem) Validate() error {
	return validation.ValidateStruct(mi,
		validation.Field(&mi.Name, validation.Required),
	)
}

type MyData struct {
	Items []*MyItem
}

func (md *MyData) Validate() error {
	return validation.ValidateStruct(md,
		validation.Field(&md.Items, validation.NotNil, validation.Each(validation.NotNil)),
	)
}

func main() {
	const numItems = 150000
	// Every element points to the same item with an empty (required) Name,
	// so every element produces a validation error.
	faultyItem := &MyItem{Name: ""}
	items := make([]*MyItem, numItems)
	for i := 0; i < numItems; i++ {
		items[i] = faultyItem
	}
	myData := MyData{Items: items}
	validationErrors := myData.Validate()
	fmt.Println(validationErrors)
}
```

Run this via:
On my machine (MBP 2.4 GHz Intel Core i5 quad-core, 16GB RAM), this takes about 1 minute and 45 seconds to run.
Actually, it turns out that most of the memory and CPU is wasted when turning large validation errors into a string, because of the way the validation error message is constructed (with repeated string concatenation). I just rewrote the … to use a `strings.Builder`:

```go
// Error returns the error string of Errors.
// (Requires the fmt, sort and strings packages.)
func (es Errors) Error() string {
	if len(es) == 0 {
		return ""
	}

	keys := make([]string, 0, len(es))
	for key := range es {
		keys = append(keys, key)
	}
	sort.Strings(keys)

	// A strings.Builder grows one buffer in place instead of copying the
	// whole message on every `s += ...` concatenation.
	var stringBuilder strings.Builder
	for i, key := range keys {
		if i > 0 {
			stringBuilder.WriteString("; ")
		}
		if errs, ok := es[key].(Errors); ok {
			fmt.Fprintf(&stringBuilder, "%v: (%v)", key, errs)
		} else {
			fmt.Fprintf(&stringBuilder, "%v: %v", key, es[key].Error())
		}
	}
	stringBuilder.WriteString(".")
	return stringBuilder.String()
}
```

Unfortunately, the error message is still not very useful, and a lot of time still seems to be spent writing the error message to stdout. Here's a small subset of the error message:
That being said, if it's not easy to limit the creation of validation errors in the first place, maybe another way to tackle this problem is to use a `strings.Builder` as above and also add a way to limit the length of the error message, or some way to recursively filter a validation error so that only a subset of the errors is returned or printed? For reference, io-ts (a validation library for TypeScript) has the concept of …
Thanks for the study! So the performance bottleneck is mainly in the error message formatting, rather than in executing the validation rules? Do you have any suggestions on how to limit the length of the error message, or for some filtering mechanism on Errors? I will take a look at io-ts. FYI, I just checked in the optimization you suggested above. Thanks!
Hi, yes, the performance bottleneck is mainly in the error message formatting. However, I think with even bigger slices (say, 1 million items with 1 million errors), creating the errors itself also becomes quite expensive and takes multiple seconds. Regarding the error messages, here's an example I came up with:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"

	validation "github.com/go-ozzo/ozzo-validation/v3"
)

type MyItem struct {
	Name string `json:"name"`
}

func (mi *MyItem) Validate() error {
	return validation.ValidateStruct(mi,
		validation.Field(&mi.Name, validation.Required),
	)
}

type MyData struct {
	Items []*MyItem `json:"items"`
}

func (md *MyData) Validate() error {
	return validation.ValidateStruct(md,
		validation.Field(&md.Items, validation.NotNil, validation.Each(validation.NotNil)),
	)
}

func main() {
	const numItems = 150000
	faultyItem := &MyItem{Name: ""}
	items := make([]*MyItem, numItems)
	for i := 0; i < numItems; i++ {
		items[i] = faultyItem
	}
	myData := MyData{Items: items}
	validationErrors := myData.Validate()
	if validationErrors != nil {
		fmt.Println(FirstErrorJSONPathReporter(validationErrors))
	}
}

// ValidationErrorReporter turns a (possibly nested) validation error into a message.
type ValidationErrorReporter func(validationErrors error) string

// Compile-time check that FirstErrorJSONPathReporter fits the reporter type.
var _ ValidationErrorReporter = FirstErrorJSONPathReporter

// FirstErrorJSONPathReporter walks nested validation.Errors depth-first and
// reports only the first leaf error, prefixed with its JSONPath-style location.
// Keys are visited in lexicographic order, so e.g. index "10" comes before "2".
func FirstErrorJSONPathReporter(validationError error) string {
	var pathParts []string
	var getFirstLeafErrorMessage func(nestedError error) string
	getFirstLeafErrorMessage = func(nestedError error) string {
		if innerErrors, ok := nestedError.(validation.Errors); ok {
			var keys []string
			for key := range innerErrors {
				keys = append(keys, key)
			}
			sort.Strings(keys)
			for _, key := range keys {
				if innerError := innerErrors[key]; innerError != nil {
					pathParts = append(pathParts, key)
					return getFirstLeafErrorMessage(innerError)
				}
			}
		}
		return nestedError.Error()
	}
	leafErrorMsg := getFirstLeafErrorMessage(validationError)
	// If there's no error nesting, simply return the first error without any fancy formatting.
	if len(pathParts) == 0 {
		return leafErrorMsg
	}
	var jsonPath strings.Builder
	for i, pathPart := range pathParts {
		if _, err := strconv.Atoi(pathPart); err == nil {
			// Path parts that can be converted to an integer are wrapped in [ ] to be JSONPath-compliant.
			fmt.Fprintf(&jsonPath, "[%v]", pathPart)
		} else if i == 0 {
			fmt.Fprintf(&jsonPath, "%v", pathPart)
		} else {
			fmt.Fprintf(&jsonPath, ".%v", pathPart)
		}
	}
	finalMessage := jsonPath.String() + fmt.Sprintf(": %v", leafErrorMsg)
	return finalMessage
}
```

As you can see, I defined the type … Now, in this particular implementation, it does a DFS (easier to implement than BFS) and reports the first leaf error found, together with its JSONPath-formatted location in the nested struct. Concretely, instead of printing a huge error message for the 150k validation errors, it simply prints this:
So, all in all, it would be nice to have a reporter type/interface and a few standard reporters like this one in ozzo :)
Hey, is there a good solution to this problem now? |
Hi,
First of all, thanks for this fabulous library. It's much nicer to use than struct tags!
Unfortunately, I encountered a memory and CPU problem when using ozzo to validate large structs or slices.
The problem stems from the fact that ozzo returns all validation errors, and there can be a lot of them in a large map or slice.
In particular, we ran into an issue in production where we were using ozzo to validate the JSON payloads of a REST API. The expected payload in this API is a nested struct containing slices that typically hold more than 100,000 items. If the caller of this API sends a payload where every array item has a validation error (even if it's practically the same error for each item), ozzo will create 100,000 validation errors. Since each validation error in ozzo is a map, ozzo creates 100,000 maps, which use quite a lot of memory and also produce noisy, unreadable, extremely long validation error messages. And since validation errors are typically short-lived, the newly created errors soon become eligible for garbage collection.
In our case, this meant our app's CPU usage suddenly pegged at 100%, and most of it was spent creating validation errors and garbage-collecting them afterwards.
This actually caused a partial outage in one of our production apps after we introduced ozzo.
In summary, it would be nice if ozzo had a way to limit the creation of validation errors (e.g. max 10), or some sort of short-circuiting once the first validation error in a structure is encountered.
My PR #93 introduces a new rule, `EachWithFirstErrorOnly`, which adds this capability for large maps and slices, but I think making the ability to limit validation errors a first-class option in ozzo (instead of just a rule variation) would be better. Thanks.