Report

This report shows the latest benchmarks for the performance of the large-records library.

In most of these benchmarks one of the main things we measure is the size of the ghc core. While this is not in and of itself a useful measure, it is a useful proxy for what we are actually interested in (compilation time and memory usage), and we can measure it much more accurately. When reporting core size, we report the size of the core after desugaring (ds-preopt), after the very simple optimiser (ds), and after the simplifier (simpl).
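For reference, these dumps can be requested per module; the flags below are the standard ghc flags referred to throughout this report (a minimal sketch, with a hypothetical module name):

```haskell
-- Request all three core dumps for this module; -ddump-to-file
-- writes each dump to a file next to the build output instead of
-- interleaving them on stderr.
{-# OPTIONS_GHC -ddump-to-file -ddump-ds-preopt -ddump-ds -ddump-simpl #-}
module Example where
```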

Note on the use of color: In the plots we intentionally do not use different colors to differentiate between these phases; instead, we use color to emphasize the difference between different approaches. When a particular phase requires a separate explanation, we will use a separate plot. We will use red (and occasionally blue) for approaches that do not work, and green for the approach that we have adopted.

Compilation time in these benchmarks is as reported by -ddump-timings, compiling with -O0 (the primary goal here is to speed up the development cycle). We measure this without also using any of the core dump options (-ddump-ds-preopt, -ddump-ds, or -ddump-simpl). This is important: not only do these flags affect compilation time, they may also do so non-linearly if part of the tree is never actually forced.

Runtime benchmarks do use -O1, since here we are interested in how the library would behave in a production build.

All measurements are done with ghc 8.8.4 as this is the version of ghc used by the client for whom this work was done (Juspay). The only exception is the NoFieldSelectors experiment, see below.

Benchmarks for large-records

Benchmark: "Before vs After"

This is one of the main benchmarks of the library, and is described in more detail in Avoiding quadratic core code size with large records. It measures the cost of compiling a module containing

  • Record of a certain size
  • Stock-derived Eq and Show instances
  • TH-derived generics-sop Generic instance
  • HasField instance derived by the RecordDotPreprocessor plugin
  • ToJSON instance in terms of gtoJSON

(this is the "Before"), and compares that to a module containing the same ingredients, but with

  • Record representation generated by large-records
  • Eq, Show, Generic (for large-generics) and HasField instances, all generated by large-records
  • ToJSON instance in terms of the large-generics equivalent of gtoJSON.

(this is "After").

Benchmark: "HigherKinded"

This benchmark is mostly a sanity check: it repeats the "After" benchmark, but using a higher-kinded record (that is, a record with a functor parameter, as is common for example in beam applications).
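As a sketch, a higher-kinded version of a hypothetical 2-field record looks as follows:

```haskell
-- Every field is wrapped in a functor parameter f; instantiating f
-- to Identity gives the plain record, to Maybe a partial one, etc.
data PersonH f = PersonH
  { name :: f String
  , age  :: f Int
  }
```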

Benchmark: "HasNormalForm"

HasNormalForm is an ingredient in how we deal with generic transforms. We described this only very briefly in the blog post (section Transforms), but discussed it in a bit more detail in the Haskell eXchange presentation about the large-records library, Avoiding Quadratic Blow-up During Compilation (section "Avoiding evidence", starts around 41:35).

Although the size of the core before the very simple optimiser runs is still quadratic (and huge!):

the very simple optimiser cleans this up to exactly nothing (this is what I describe in the talk):

Provided that the full tree before the very simple optimiser is never fully forced (and normally it isn't), compilation time is still fine:

Benchmarks for large-anon

Although the focus of these benchmarks is compilation time performance, when comparing with superrecord we will also look at runtime performance. It is expected here that superrecord will outperform large-anon; that is after all the trade-off that we make. However, runtime performance should still be somewhat reasonable.

In order to keep benchmarking time in check, we only show the core size after the simplifier (-ddump-simpl); the output of -ddump-ds-preopt and -ddump-ds is similar, so showing all three does not provide further insight, and showing only one significantly reduces benchmarking time. The output of the simplifier is of course also what continues on through the compilation pipeline.

There is no explicit support for generics in superrecord, so we instead use translation to and from JSON as a substitute; the way these are defined in superrecord is somewhat reminiscent of how generic functions are defined in GHC.Generics. The ToJSON benchmark then is an example of a generic consumer, whereas the ParseJSON benchmark is an example of a generic producer.

Record construction

This uses the source plugin, but the plugin merely offers syntactic sugar and does not use any unsafe features from the library; you could instead write these records as repeated inserts, as in the sketch below. The library does offer experimental integration with typelet, which confirms that the only source of non-linear core size is indeed the lack of type sharing in the type arguments to insert. However, as the benchmarks show, this does not actually result in an improvement in compilation time (at least, not for this benchmark): compilation time is nicely linear without it.
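As a sketch (following the large-anon API; the record and field names are hypothetical), the sugared construction and its expansion into repeated inserts look as follows:

```haskell
{-# LANGUAGE DataKinds, OverloadedLabels, TypeOperators #-}
{-# OPTIONS_GHC -fplugin=Data.Record.Anon.Plugin #-}
module Construction where

import Data.Record.Anon
import Data.Record.Anon.Simple (Record)
import qualified Data.Record.Anon.Simple as Anon

-- Sugared construction, provided by the source plugin:
magenta :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
magenta = ANON { red = 1, green = 0, blue = 1 }

-- The same record written as repeated inserts, without the sugar:
magenta' :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
magenta' = Anon.insert #red   1
         $ Anon.insert #green 0
         $ Anon.insert #blue  1
         $ Anon.empty
```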

Comparison with superrecord

In superrecord there are two APIs for constructing records: a safe one, and an unsafe one which requires a conservative size estimate up-front (specified as an argument to unsafeRNil) and modifies the record in place, thus requiring very careful use.
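A sketch of the two construction styles, assuming superrecord's documented API (record and field names hypothetical):

```haskell
{-# LANGUAGE DataKinds, OverloadedLabels, TypeOperators #-}
module Superrecord where

import SuperRecord

-- Safe API: each (&) copies the record constructed so far.
alice :: Rec '["name" := String, "age" := Int]
alice = #name := "Alice" & #age := 25 & rnil

-- Unsafe API: pre-allocate room for two fields, then write each
-- field in place; the size estimate and the in-place mutation are
-- what require careful use.
alice' :: Rec '["name" := String, "age" := Int]
alice' = unsafeRCons (#name := "Alice")
       $ unsafeRCons (#age := 25)
       $ unsafeRNil 2
```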

The safe API results in such enormous core size and corresponding compilation time that it dwarfs everything else in comparison, so we show it in a graph by itself; we only measured core size up to records of size 30 and compilation time up to size 50:

The unsafe API is somewhat more reasonable:

Field access

This extracts half the record fields into a non-record datatype.
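A sketch of the shape of this benchmark (extensions and imports as in the construction sketch above):

```haskell
-- Read (here) two of the three fields into a plain, non-record
-- datatype using Anon.get.
data RG = RG Double Double

redGreen :: Record [ "red" := Double, "green" := Double, "blue" := Double ] -> RG
redGreen r = RG (Anon.get #red r) (Anon.get #green r)
```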

Comparison with superrecord

Here is where we see the win from applyDiff (we paid for it during record construction).

Update many fields

This overwrites half the record fields.
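A sketch, with the same setup as the construction sketch above:

```haskell
-- Overwrite (here) two of the three fields with Anon.set; the row
-- type is unchanged.
dim :: Record [ "red" := Double, "green" := Double, "blue" := Double ]
    -> Record [ "red" := Double, "green" := Double, "blue" := Double ]
dim = Anon.set #red 0.5 . Anon.set #green 0.5
```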

Comparison with superrecord

Update single fields

Comparison with superrecord

ToJSON

Comparison with superrecord

ParseJSON

Comparison with superrecord

Experiments

In this section we report on a number of experiments that test various approaches used in the large-records library, independent of the details of the full library. Indeed, these experiments do not use the large-records library at all.

Simple record

This experiment isolates just the cost of having a record with many fields. This cost is due to the record accessors, as discussed in section Warmup: terms of the first blog post.
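To see where the cost comes from: for an n-field record, ghc generates n selector functions, each of which matches on a constructor with n arguments, so the selectors together are quadratic in n. For n = 3:

```haskell
data R = MkR { f1 :: Int, f2 :: Int, f3 :: Int }

-- The generated selectors, as they would look if written by hand;
-- n selectors of size O(n) each gives O(n^2) core overall:
--
--   f1 (MkR x _ _) = x
--   f2 (MkR _ x _) = x
--   f3 (MkR _ _ x) = x
```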

Note that this cost is not prevented by the use of NoFieldSelectors: even with this extension enabled, field selectors are still created; they just won't pollute the namespace (they are used for HasField instances).

Pattern synonyms vs NoFieldSelectors

This experiment checks that NoFieldSelectors indeed makes it possible to define record pattern synonyms without incurring a quadratic cost. Since NoFieldSelectors was introduced only in ghc 9.2.1, this is the only experiment run with that version of ghc (and we currently make no use of this feature).
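A sketch of the shape being tested (ghc >= 9.2.1; names hypothetical):

```haskell
{-# LANGUAGE PatternSynonyms, NoFieldSelectors #-}
module PatSyn where

-- A record pattern synonym over a positional datatype; with
-- NoFieldSelectors in effect, the synonym's fields do not pollute
-- the namespace with top-level selector bindings.
data T = MkT Int Bool

pattern T :: Int -> Bool -> T
pattern T{x, y} = MkT x y
```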

Superclasses

This is a simple experiment that measures the cost of having many superclasses. This cost turns out to be quadratic due to the record accessors for the type class dictionaries, as explained in section Type class dictionaries of the first blog post. These accessors are introduced by the simplifier, so we don't see this cost until simpl:
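For reference, the shape being measured is a class with many superclasses (a sketch with three):

```haskell
class C1 a
class C2 a
class C3 a

-- The simplifier introduces one projection per superclass to extract
-- it from C's dictionary, and each projection matches on the full
-- dictionary, so total core size is quadratic in the number of
-- superclasses.
class (C1 a, C2 a, C3 a) => C a
```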

Applicative chains

This experiment is about the cost of the lack of type sharing, as illustrated by chains of calls to the Applicative operator <*>. We discuss this in section More subtle: types of the first blog post.
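The shape in question, for a hypothetical 3-field record:

```haskell
data R = MkR Int Bool Char

-- Linear in the source, but each (<*>) records the remaining function
-- type (Bool -> Char -> R, then Char -> R, ...) as a type argument,
-- so the core repeats O(n) types at each of the n applications.
mkR :: Applicative f => f Int -> f Bool -> f Char -> f R
mkR fi fb fc = pure MkR <*> fi <*> fb <*> fc
```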

Note: type-level sharing is discussed in more detail in the third blog post on this topic, Type-level sharing in Haskell, now; see also the corresponding section on type-level sharing below.

Induction

The induction experiment compares the cost of instance induction over lists, with the cost of instance induction over trees generated from lists, with or without a phantom argument. This is one of the main topics discussed in the second blog post, Induction without core-size blow-up.
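As a reminder of the pattern (a hand-rolled sketch, not the experiment's actual code), instance induction over a type-level list looks as follows; inducting over a balanced tree generated from the list instead brings the core size down from O(n²) to O(n log n):

```haskell
{-# LANGUAGE DataKinds, GADTs, TypeOperators #-}
module Induction where

import Data.Kind (Type)

data HList (xs :: [Type]) where
  HNil :: HList '[]
  (:*) :: x -> HList xs -> HList (x ': xs)
infixr 5 :*

-- Induction over the list: the instance for (x ': xs) depends on the
-- instance for xs, and every intermediate dictionary type spells out
-- the full remaining tail, giving quadratic core.
class AllShow (xs :: [Type]) where
  showAll :: HList xs -> [String]

instance AllShow '[] where
  showAll HNil = []

instance (Show x, AllShow xs) => AllShow (x ': xs) where
  showAll (x :* xs) = show x : showAll xs
```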

Generic instance for HList

This experiment compares a standard generics-sop Generic instance for HList with a large-generics instance using tree induction and phantom contexts, as explored in the induction experiment (both instances hand-written). This verifies that the techniques we explored can in fact be used to construct Generic instances that do not result in quadratic core.

Constraint families

The problem with constraint families was described in the second blog post, Induction without core-size blow-up, a.k.a. Large records: anonymous edition, section Constraint Families.

The difference here comes from introducing an intermediate type class to ensure shallow evaluation. As mentioned in the blog post, this is only visible before the very simple optimiser runs:

After the optimiser the difference between the two approaches disappears:

(This experiment assumes a phantom role assigned to EmptyClass, as discussed in the "Induction" experiment.)

As long as we can depend on the very simple optimiser, the difference is not terribly important:

We nonetheless prefer the shallow approach, in order to limit our dependency on the very simple optimiser as much as possible.

Pre-evaluation

This experiment assesses the feasibility of pre-evaluating types, as described in Postscript: Pre-evaluating type families. For reference we have also included the measurements for the regular induction case (without pre-evaluation, using phantom contexts). It's clear from these graphs that the difference is negligible, and so we do not use pre-evaluation at all. It's mildly interesting that this is a case where phantom contexts are actually (slightly) worse, however.

Type-level sharing

Type-level sharing is discussed in detail in Type-level sharing in Haskell, now. We have two benchmarks here: one for the construction of HLists and one for applicative chains; both are discussed in more detail in the blog post.

Unfortunately, although these benchmarks show that the typelet package can indeed help to construct ghc core of linear size, they do not show an improvement in compilation time, at least for these examples.

HList

For the HList benchmarks we have a baseline (no sharing), using letAs, and using letAs', the CPS form of letAs.

Applicative chains

For the applicative chains benchmarks we only have a baseline (no sharing), and one using let.