WIP: Memory Usage Improvements (to fix #65) #197

Open

kputnam wants to merge 38 commits into master from gh-65
Conversation

kputnam (Owner) commented Jun 16, 2019

This is still WIP. Hoping for only minimal API changes.

kputnam (Owner, Author) commented Jun 16, 2019

Reading an example 4.4 MB file on master has a peak memory usage of 921.5 MB. With the current work in progress, this is down to 782.23 MB. Most of the problem remains in the tokenizer, not the parser, which is a relief.

kputnam changed the title from "Memory Usage Improvements (to fix #65)" to "WIP: Memory Usage Improvements (to fix #65)" on Jun 22, 2019
kputnam (Owner, Author) commented Jun 22, 2019

The tokenization code was the biggest culprit for memory allocation. There weren't many retained objects to suggest a leak, but the tokenizer allocated so much short-term garbage that the GC struggled to keep up. It also consumed a lot of stack frames, because it was written using recursion rather than an imperative loop.
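To illustrate the shape of that change, here's a hypothetical sketch (not the actual tokenizer; `read_segment` is an assumed helper returning the next segment and the remaining input):

```ruby
# Recursive style: one stack frame per segment, and each frame pins its
# locals until the entire file has been consumed.
def each_segment_recursive(input, &block)
  return if input.empty?
  segment, rest = read_segment(input)   # assumed helper
  block.call(segment)
  each_segment_recursive(rest, &block)  # one frame per segment
end

# Imperative loop: the stack stays flat, and each iteration's garbage
# becomes collectible as soon as the next iteration begins.
def each_segment(input)
  until input.empty?
    segment, input = read_segment(input)
    yield segment
  end
end
```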

The build for this branch is currently broken. I've made some drastic (but incomplete) changes that should reduce String allocations, along with other changes to further shrink the memory footprint of various classes. Once I have the parts working separately and unit tests written, I'll fit the new tokenizer into the rest of the library. That might result in user-facing API changes, but I'm aiming to minimize them and provide shims where possible.
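For context, the usual way to cut String allocations in a tokenizer is to pass around pointers into one shared buffer instead of copied substrings. A minimal sketch of that idea, assuming it's roughly what Reader::StringPtr does (this is not the actual implementation):

```ruby
class StringPtr
  attr_reader :offset, :length

  def initialize(string, offset = 0, length = string.length - offset)
    @string, @offset, @length = string, offset, length
  end

  # O(1) and copy-free: a new pointer into the same backing String
  def drop(n)
    StringPtr.new(@string, @offset + n, @length - n)
  end

  def take(n)
    StringPtr.new(@string, @offset, n)
  end

  # Only materialize a real String when the caller truly needs one
  def to_s
    @string[@offset, @length]
  end
end
```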

kputnam (Owner, Author) commented Jun 23, 2019

Here are some preliminary results. The test script reads a file, hands it to the tokenizer, and iterates through each SegmentTok; no higher-level parsing is done.
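The script itself isn't included in this thread; here's a rough reconstruction of its shape (the checkpoint placement, `ps`-based RSS sampling, and tokenizer construction are all assumptions):

```ruby
#!/usr/bin/env ruby

def rss_mib
  # Resident set size of this process in MiB (ps reports KiB)
  `ps -o rss= -p #{Process.pid}`.to_i / 1024
end

puts "Pre-init: #{rss_mib} MiB"

require "stupidedi"
input = File.read(ARGV.last)

puts "Post-init: #{rss_mib} MiB"
started = Time.now

# Walk each SegmentTok without retaining it; no higher-level parsing.
# (Constructor approximated; the real tokenizer API may differ.)
Stupidedi::Reader::Tokenizer.build(input).each do |segment_tok|
  # iterate only
end

puts "Finish: #{rss_mib} MiB"
puts "%.3f seconds" % (Time.now - started)
```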

In the output below, "Finish" is ultimately the metric that matters to the user, but the tokenizer's own contribution to total memory use is closer to "Finish" minus "Post-init". The tokenizer may actually consume even less than that, since definitions and other classes are lazy-loaded while the tokenizer runs.

Some of the post-init memory is likely the input file itself, which is read into memory up front; that would explain why the larger file has a larger post-init size than the smaller one.

master: 450KB input

» gtime -v ./memprof-tk.rb --fast file-s.x12 >/dev/null
Pre-init: 17 MiB
Post-init: 18 MiB
Finish: 44 MiB      <-- 26MB from tokenizer
1.992 seconds       <--
	Command being timed: "./memprof-tk.rb --fast file-s.x12"
	...
	Maximum resident set size (kbytes): 44516

wip: 450KB input

» gtime -v prof/memprof-tk.rb --fast prof/file-s.x12 >/dev/null
Pre-init: 17 MiB
Post-init: 18 MiB
Finish: 29 MiB      <-- 11MB from tokenizer
0.884 seconds       <--
	Command being timed: "prof/memprof-tk.rb --fast prof/file-s.x12"
	...
	Maximum resident set size (kbytes): 29284

The difference is more dramatic on larger files.

master: 4.4MB input

» gtime -v ./memprof-tk.rb --fast file-l.x12 >/dev/null
Pre-init: 17 MiB
Post-init: 22 MiB
Finish: 198 MiB      <-- 176MB from tokenizer
19.207 seconds       <--
	Command being timed: "./memprof-tk.rb --fast file-l.x12"
	...
	Maximum resident set size (kbytes): 199016

wip: 4.4MB input

» gtime -v prof/memprof-tk.rb --fast prof/file-l.x12 >/dev/null
Pre-init: 17 MiB
Post-init: 22 MiB
Finish: 33 MiB      <-- 11MB from tokenizer
7.932 seconds       <--
	Command being timed: "prof/memprof-tk.rb --fast prof/file-l.x12"
	User time (seconds): 7.79
	System time (seconds): 0.21
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.07
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 33700

When I have time, I'd like to plot runtime and memory usage as file size increases. One important caveat: these gains might be diminished once the parser is changed to accommodate the new tokenizer.

There are also further savings from not adding the FunctionalGroupDef to the config, which saves a few MB by not loading all the SegmentDefs and ElementDefs. With that change, this branch runs in 19 MB on the small file and 23 MB on the large one; subtracting the post-init sizes suggests the tokenizer itself uses 1 MB or less on both.
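Roughly, that configuration looks like the following (the registration calls are approximated from memory and may not match the real Stupidedi::Config API exactly):

```ruby
require "stupidedi"

config = Stupidedi::Config.new

# Tokenizing still needs the interchange definition to locate separators:
config.interchange.register("00501") do
  Stupidedi::Versions::Interchanges::FiveOhOne::InterchangeDef
end

# Deliberately *not* registered: the FunctionalGroupDef. Leaving it out
# means the SegmentDefs and ElementDefs are never lazy-loaded, which is
# where the few MB of savings come from.
# config.functional_group.register("005010") { ... }
```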

kputnam added 8 commits June 23, 2019 01:34
- Reader::Tokenizer returns fatal error when zero ISA's found or
  when an error occurs inside of an ISA..IEA envelope; other errors
  are not Reader::Result#fatal?
- Create Reader::Input to track the position of tokenized elements
- Fix subtle bugs in Reader::Pointer and/or Reader::StringPtr
- Parser::StateMachine#insert now destructively updates tokenizer state (separators and segment_dict)
- Parser::BuilderDsl uses new Reader::StacktracePosition
- Parser::DslReader has been merged into Parser::BuilderDsl
- Now ElementVal::AN and ElementVal::ID use a StringPtr where possible,
  but ElementVal::TM, ::DT, ::Nn, and ::R all need to "parse" the string,
  so they still require allocating the substring first (TODO; see the sketch below)
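A sketch of the distinction in that last item (class bodies abbreviated and hypothetical):

```ruby
module ElementVal
  # String-like types can wrap the pointer directly -- zero bytes copied:
  class AN
    def initialize(ptr)
      @ptr = ptr  # a StringPtr into the original input
    end
  end

  # Numeric types must parse, which currently forces the substring to be
  # materialized first -- the TODO above:
  class Nn
    def initialize(ptr)
      @value = Integer(ptr.to_s, 10)  # ptr.to_s allocates the substring
    end
  end
end
```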
kputnam force-pushed the gh-65 branch 3 times, most recently from 65d8e65 to 2424ce6 on July 19, 2019
- Don't use def_delegators on methods called frequently; it allocates *args
  on every call (see the sketch below)
- Build a native extension String.strncmp to compare two substrings, which
  results in faster StringPtr#== and StringPtr#start_with?
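The def_delegators point, sketched (class and attribute names are hypothetical):

```ruby
require "forwardable"

class SegmentTok
  extend Forwardable

  # Forwardable generates roughly:
  #   def id(*args, &block); @definition.id(*args, &block); end
  # so every call allocates an Array for *args -- costly on a hot path.
  # def_delegators :@definition, :id

  # Hand-written delegation allocates nothing per call:
  def id
    @definition.id
  end
end
```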
- Implement zero-allocation lstrip, rstrip, is_control_character_at?(offset),
  and lstrip_control_characters_offset(offset) (see the sketch below)
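The offset-based idea, sketched with assumed semantics (only ASCII spaces handled here, for brevity):

```ruby
# Instead of `string[offset..-1].lstrip`, which allocates two Strings,
# return the index where the stripped content begins -- no allocation.
def lstrip_offset(string, offset)
  offset += 1 while offset < string.length && string.getbyte(offset) == 0x20
  offset
end

lstrip_offset("AB   CD", 2)  #=> 5
```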
kputnam force-pushed the gh-65 branch 6 times, most recently from f747e4b to e64b6d2 on September 11, 2019
kputnam force-pushed the gh-65 branch 4 times, most recently from 23db4b5 to 75756c6 on September 12, 2019