Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

limit validation errors or spit out only first one #92

Open
geekflyer opened this issue Jan 20, 2020 · 8 comments
Open

limit validation errors or spit out only first one #92

geekflyer opened this issue Jan 20, 2020 · 8 comments

Comments

@geekflyer
Copy link

geekflyer commented Jan 20, 2020

Hi,

first of all thanks for this fabulous library. Much nicer to use than struct tags!

Unfortunately I encountered a memory and CPU problem when using ozzo to validate large structs or slices.
The problem stems from the fact that ozzo spits out all validation errors, which can be a lot in a large map or slice.

In particular we ran into an issue in production, where we were using ozzo to validate json payloads of a REST API. The expected payload in this API is a nested struct which contains slices typically containing more than 100,000 items. If the caller of this API sends a payload where each array item has a validation error (even if it's practically the same error for each item), ozzo will create 100,000 validation errors. Since each validation error in ozzo is a map, it means ozzo creates 100,000 maps which use quite a lot of memory and also spit out rather noisy and unreadable, super long validation error messages. Since validation errors are typically of very ephemeral nature, the just-created-errors will become very soon eligible for garbage collection.
In our case this meant suddenly our app's CPU usage pegged to 100% and most of it was spent on creating validation errors and garbage collecting them afterwards.
This actually caused some semi outage in one of our production apps after introducing ozzo.

In summary, it would be nice if ozzo has a way to limit the creation of validation errors (i.e. max 10) or some sort of short circuiting once the first validation error in a structure was encountered.

My PR #93 introduces a new rule EachWithFirstErrorOnly which introduces this capability for large maps and slices, but I think having the capability to limit validation errors a first class option in ozzo (instead of just a rule variation) would be better.

Thanks.

@geekflyer geekflyer changed the title limit validation errors or spit out only first limit validation errors or spit out only first one Jan 20, 2020
@qiangxue
Copy link
Member

Glad to hear you like this library!

For the problem you encountered, if possible I would love to add the capability of limiting the validations as a first class option. However, this may involve API signature change and the implementation is also not trivial because the validation is done recursively.

So I'd like to limit the support to the scenario that is most likely causing the problem you described. Can we say that the problem mainly occurs when validating a big array/slice or map? And does your problem only occur when using Each?

@geekflyer
Copy link
Author

geekflyer commented Jan 20, 2020

Yeah, well we've only started to use ozzo recently and the problem for now only occured with large slices . I guess in theory it's possible for this to also occur with large nested structs, but not as likely as with slices.

I think enhancing Each as done in my PR solves the problem for now, but maybe long term (i.e. next major of ozzo validation) it may be worth to think about an API change and making limiting errors a first class citizen.

Btw what do you think of the API for the enhanced Each in my PR?
Also I just renamed EachWithFirstErrorOnly to EachUntilFirstError.

@geekflyer
Copy link
Author

geekflyer commented Jan 20, 2020

actually just had a second look at the code and realized that just introducing a variation of Each won't fix my problem since it only short circuits the validation rules passed to validation.EachUntilFirstError(rules...) but not the rules implemented on the slice element type via Validate (which are actually the more expensive ones typically).

This probably requires some more changes here https://github.com/geekflyer/ozzo-validation/blob/083f0b20911750dfaf3fa0b23b3ef55bd37a3fd9/validation.go#L198

🤔

@geekflyer
Copy link
Author

geekflyer commented Jan 20, 2020

here's a small repro for the problem:

package main

import (
	"fmt"

	validation "github.com/go-ozzo/ozzo-validation/v3"
)

type MyItem struct {
	Name string
}

func (mi *MyItem) Validate() error {
	return validation.ValidateStruct(mi,
		validation.Field(&mi.Name, validation.Required),
	)
}

type MyData struct {
	Items []*MyItem
}

func (md *MyData) Validate() error {
	return validation.ValidateStruct(md,
		validation.Field(&md.Items, validation.NotNil, validation.Each(validation.NotNil)),
	)
}

func main() {

	const numItems = 150000

	faultyItem := &MyItem{Name: ""}

	items := make([]*MyItem, numItems)

	for i := 0; i < numItems; i++ {
		items[i] = faultyItem
	}

	myData := MyData{Items: items}

	validationErrors := myData.Validate()

	fmt.Println(validationErrors)
	
}

run this via:

time go run main.go

This takes on my machine (MBP 2.4 GHz Intel Core i5 quad core, 16GB RAM) about 1 minute and 45 seconds to run.

@geekflyer
Copy link
Author

geekflyer commented Jan 20, 2020

actually it turns out most of the memory and CPU is actually wasted when turning large validation errors into a string due to the way the validation error message is constructed.

I just rewrote the validation.Errors.Error() receiver method to use a strings.Builder and now my repro runs much faster (like a few seconds).:

// Error returns the error string of Errors.
func (es Errors) Error() string {

	if len(es) == 0 {
		return ""
	}

	keys := []string{}
	for key := range es {
		keys = append(keys, key)
	}
	sort.Strings(keys)

	var stringBuilder strings.Builder

	for i, key := range keys {
		if i > 0 {
			stringBuilder.WriteString("; ")
		}
		if errs, ok := es[key].(Errors); ok {
			fmt.Fprintf(&stringBuilder,"%v: (%v)", key, errs)
		} else {
			fmt.Fprintf(&stringBuilder, "%v: %v", key, es[key].Error())
		}
	}
	stringBuilder.WriteString(".")
	return stringBuilder.String()
}

The error message is unfortunately still not very useful and a lot of time is seemingly still spend on writing to error message to stdout. Here's a small subset of the error message:

 99972: (Name: cannot be blank.); 99973: (Name: cannot be blank.); 99974: (Name: cannot be blank.); 99975: (Name: cannot be blank.); 99976: (Name: cannot be blank.); 99977: (Name: cannot be blank.); 99978: (Name: cannot be blank.); 99979: (Name: cannot be blank.); 9998: (Name: cannot be blank.); 99980: (Name: cannot be blank.); 99981: (Name: cannot be blank.); 99982: (Name: cannot be blank.); 99983: (Name: cannot be blank.); 99984: (Name: cannot be blank.); 99985: (Name: cannot be blank.); 99986: (Name: cannot be blank.); 99987: (Name: cannot be blank.); 99988: (Name: cannot be blank.); 99989: (Name: cannot be blank.); 9999: (Name: cannot be blank.); 99990: (Name: cannot be blank.); 99991: (Name: cannot be blank.); 99992: (Name: cannot be blank.); 99993: (Name: cannot be blank.); 99994: (Name: cannot be blank.); 99995: (Name: cannot be blank.); 99996: (Name: cannot be blank.); 99997: (Name: cannot be blank.); 99998: (Name: cannot be blank.); 99999: (Name: cannot be blank.).).

That being said, if it's not easy to limit the creation of validation errors in the first place, maybe another way to tackle this problem is to use a string.Builder as above and also add a way to limit the length of the error message or have some way to recursively filter a validation error to only get / print a subset of the errors?

For reference io-ts (validation library for TypeScript) has the concept of reporters which contain the logic on how validation errors are serialized into an error message (https://github.com/gcanti/io-ts#error-reporters). Perhaps having a reporter concept in ozzo where one of the reporters may limit the number of errors being reported would be a way to solve this without changing too much on the API surface.

@qiangxue
Copy link
Member

Thanks for the study! So the performance bottleneck is mainly on the error message formatting, instead of the validation rule execution? Do you have any suggestion on how to limit the length of the error message or some filtering mechanism on Errors? I will take a look at io-ts.

FYI, I just checked in the optimization that you suggested above. Thanks!

@geekflyer
Copy link
Author

geekflyer commented Jan 22, 2020

Hi,

yes the performance bottleneck is mainly in the error message formatting, however I think if one has even bigger slices (let's say 1 million items with 1 million errors) the error creation itself becomes also quite expensive and take multiple seconds.
For now to get this quickly done I guess just fixing the formatting is a step in the right direction.

Regarding the error messages. Here's some example I came up with:

package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"

	validation "github.com/go-ozzo/ozzo-validation/v3"
)

type MyItem struct {
	Name string `json:"name"`
}

func (mi *MyItem) Validate() error {
	return validation.ValidateStruct(mi,
		validation.Field(&mi.Name, validation.Required),
	)
}

type MyData struct {
	Items []*MyItem `json:"items"`
}

func (md *MyData) Validate() error {
	return validation.ValidateStruct(md,
		validation.Field(&md.Items, validation.NotNil, validation.Each(validation.NotNil)),
	)
}

func main() {
	const numItems = 150000

	faultyItem := &MyItem{Name: ""}

	items := make([]*MyItem, numItems)

	for i := 0; i < numItems; i++ {
		items[i] = faultyItem
	}

	myData := MyData{Items: items}

	validationErrors := myData.Validate()

	if validationErrors != nil {
		fmt.Println(FirstErrorJSONPathReporter(validationErrors))
	}

}

type ValidationErrorReporter func(validationErrors error) string

func FirstErrorJSONPathReporter(validationError error) string {

	var pathParts []string

	var getFirstLeafErrorMessage func(nestedError error) string
	getFirstLeafErrorMessage = func(nestedError error) string {
		if innerErrors, ok := nestedError.(validation.Errors); ok {
			var keys []string
			for key := range innerErrors {
				keys = append(keys, key)
			}
			sort.Strings(keys)
			for _, key := range keys {
				if innerError := innerErrors[key]; innerError != nil {
					pathParts = append(pathParts, key)
					return getFirstLeafErrorMessage(innerError)
				}
			}
		}
		return nestedError.Error()
	}

	leafErrorMsg := getFirstLeafErrorMessage(validationError)

	// if there's no error nesting, simply return the first error part without any fancy formatting
	if len(pathParts) == 0 {
		return leafErrorMsg
	}

	var jsonPath strings.Builder

	for i, pathPart := range pathParts {
		if _, err := strconv.Atoi(pathPart); err == nil {
			// path parts which can be converted to an integer are wrapped with [ ] to be json-path compliant.
			fmt.Fprintf(&jsonPath, "[%v]", pathPart)
		} else if i == 0 {
			fmt.Fprintf(&jsonPath, "%v", pathPart)
		} else {
			fmt.Fprintf(&jsonPath, ".%v", pathPart)
		}
	}

	finalMessage := jsonPath.String() + fmt.Sprintf(": %v", leafErrorMsg)

	return finalMessage
}

As you can see I defined the type type ValidationErrorReporter func(validationErrors error) string . This could also be an interface but you get the idea.

Now FirstErrorJSONPathReporter implements this type and as the name indicates it will report the first error in the validation.Errors, or in other words will turn an instance of validation.Errors into a string.

In this particular implementation it does a DFS (easier to implement than BFS) and reports the first found leaf error together with it's json-path formatted location in the nested struct.

Concretely instead of printing a super large error message for the 150k validation errors it will print instead simply this:

items[0].name: cannot be blank

So altogether it would be nice to have a reporter type / interface and a few standard reporters like this one in ozzo :)

@Flipped199
Copy link

Hey, is there a good solution to this problem now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants