New command: ref-explore #33

Open
wants to merge 2 commits into
base: main
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,7 @@
## HEAD

- New command `heapy ref-explore` (https://github.com/zombocom/heapy/pull/33)

## 0.2.0

- Heapy::Alive is removed (https://github.com/schneems/heapy/pull/27)
68 changes: 68 additions & 0 deletions README.md
@@ -133,6 +133,74 @@ $ heapy read tmp/2015-10-01T10:18:59-05:00-heap.dump all

You can also use T-Lo's online JS based [Heap Analyzer](http://tenderlove.github.io/heap-analyzer/) for visualizations. Another tool is [HARB](https://github.com/csfrancis/harb)

### Following references to an object

The methods above can show that certain kinds of objects are retained for many generations, but not what keeps them retained.

For this purpose you can use `heapy ref-explore`, which follows the references to an object until it finds a GC root node. This should give you an
indication of _why_ the object is still retained:

```
$ heapy ref-explore spec/fixtures/dumps/00-heap.dump 0x7fb47763feb0

## Reference chain
<OBJECT ActiveRecord::Attribute::FromDatabase 0x7FB47763FEB0> (allocated at activerecord-4.2.3/lib/active_record/attribute.rb:5)
<HASH 0x7FB474CA1A90> (allocated at lib/active_record/attribute_set/builder.rb:30)
<OBJECT ActiveRecord::LazyAttributeHash 0x7FB474CA1B30> (allocated at lib/active_record/attribute_set/builder.rb:16)
<OBJECT ActiveRecord::AttributeSet 0x7FB474CA1A68> (allocated at lib/active_record/attribute_set/builder.rb:17)
<OBJECT Repo 0x7FB474CA1A40> (allocated at activerecord-4.2.3/lib/active_record/core.rb:114)
<ARRAY 996 items 0x7FB474D790A8> (allocated at activerecord-4.2.3/lib/active_record/querying.rb:50)
<OBJECT Repo::ActiveRecord_Relation 0x7FB476A8BE98> (allocated at lib/active_record/relation/spawn_methods.rb:10)
<OBJECT PagesController 0x7FB476AB25C0> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)
<HASH 0x7FB4772EAE68> (allocated at rack-1.6.4/lib/rack/mock.rb:92)
<OBJECT ActionDispatch::Request 0x7FB476AB2480> (allocated at actionpack-4.2.3/lib/action_controller/metal.rb:237)
<OBJECT ActionDispatch::Response 0x7FB476AB2458> (allocated at lib/action_controller/metal/rack_delegation.rb:28)
<ROOT machine_context 0x2> (allocated at )

## All references to 0x7fb47763feb0
* <HASH 0x7FB474CA1A90> (allocated at lib/active_record/attribute_set/builder.rb:30)
```
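
The idea behind such a chain can be sketched on a toy heap: invert the `references` edges from the dump, then walk the back-references depth-first until a root entry is reached. This is a simplified illustration under stated assumptions, not heapy's implementation; the `HEAP` table, its addresses, and the helper names `reverse_references` and `root_chain` are made up for the example.

```ruby
require 'set'

# Toy heap: each entry lists the addresses it references, as a heap dump does.
# All addresses and types are hypothetical.
HEAP = {
  0    => { type: 'ROOT',   references: [0x30] },
  0x30 => { type: 'OBJECT', references: [0x20] },
  0x20 => { type: 'HASH',   references: [0x10] },
  0x40 => { type: 'ARRAY',  references: [0x10] }, # not reachable from the root
  0x10 => { type: 'STRING', references: [] }
}.freeze

# Invert the edges: for every address, which objects point at it?
def reverse_references(heap)
  heap.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(addr, obj), rev|
    obj[:references].each { |ref| rev[ref] << addr }
  end
end

# Depth-first walk over the back-references until a ROOT entry is reached.
# Returns nil when no chain from `addr` leads to a root.
def root_chain(heap, rev, addr, seen = Set.new)
  seen << addr
  return [addr] if heap.dig(addr, :type) == 'ROOT'

  rev.fetch(addr, []).reject { |a| seen.include?(a) }.each do |parent|
    path = root_chain(heap, rev, parent, seen)
    return [addr] + path if path
  end
  nil
end

rev = reverse_references(HEAP)
chain = root_chain(HEAP, rev, 0x10)
puts chain.map { |a| "0x#{a.to_s(16)}" }.join(' <- ')
```

The `seen` set keeps the walk from looping on cyclic references, which are common in real heaps.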

#### Obtaining object addresses for inspection

Heapy does not _yet_ include a way to obtain suitable addresses for further inspection. You might work around this using `grep`. Assuming you are
looking for a string in generation 35 of your dump, you can filter like this:

```
grep "generation\":35" spec/fixtures/dumps/00-heap.dump | grep STRING
```

You can then try any of the addresses returned in the result.
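
If you prefer not to rely on the exact JSON formatting that `grep` matches, the same filter can be sketched in a few lines of Ruby. This is only an illustration: `addresses_for` is a hypothetical helper, not part of heapy, and it assumes each dump line is one JSON document with `address`, `type` and `generation` keys, as in the `grep` example above.

```ruby
require 'json'
require 'stringio'

# Returns the addresses of all objects of `type` allocated in `generation`.
# `io` is anything that yields one JSON document per line, e.g. an open dump file.
def addresses_for(io, type:, generation:)
  io.each_line.map do |line|
    obj = JSON.parse(line)
    obj['address'] if obj['type'] == type && obj['generation'] == generation
  end.compact
end

# Hypothetical two-line dump, for illustration only.
sample = StringIO.new(<<~DUMP)
  {"address":"0x7fb47763df70","type":"STRING","generation":35}
  {"address":"0x7fb474ca1a90","type":"HASH","generation":35}
DUMP

puts addresses_for(sample, type: 'STRING', generation: 35)
```

For a real dump, pass an open file instead of the `StringIO`, for example `File.open('spec/fixtures/dumps/00-heap.dump') { |f| puts addresses_for(f, type: 'STRING', generation: 35) }`.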

#### Interactive mode

Loading a larger heap dump for reference exploration can take some time, and you may want to try more than one object address to see whether they all share the same path to a root node. When called without an address, `ref-explore` enters interactive mode: you enter an address, see the result, and then enter the next address until you quit (Ctrl+C):

```
heapy ref-explore spec/fixtures/dumps/00-heap.dump
Enter address > 0xdeadbeef

Could not find a reference chain leading to a root node. Searching for a non-specific chain now.

## Reference chain

## All references to 0xdeadbeef

Enter address > 0x7fb47763df70

## Reference chain
<STRING 0x7FB47763DF70> (allocated at lib/active_record/type/string.rb:35)
<OBJECT ActiveRecord::Attribute::FromDatabase 0x7FB47763DF98> (allocated at activerecord-4.2.3/lib/active_record/attribute.rb:5)
--- shortened for documentation purposes ---
<OBJECT ActionDispatch::Response 0x7FB476AB2458> (allocated at lib/action_controller/metal/rack_delegation.rb:28)
<ROOT machine_context 0x2> (allocated at )

## All references to 0x7fb47763df70
* <OBJECT ActiveRecord::Attribute::FromDatabase 0x7FB47763DF98> (allocated at activerecord-4.2.3/lib/active_record/attribute.rb:5)

Enter address >
```

## Development

After checking out the repo, run `$ bundle install` to install dependencies. Then, run `rake spec` to run the tests.
28 changes: 28 additions & 0 deletions lib/heapy.rb
@@ -67,6 +67,33 @@ def diff(before, after, retained = nil)
      Diff.new(before: before, after: after, retained: retained, output_diff: options[:output_diff] || nil).call
    end

    long_desc <<-DESC
      Follows references to given object addresses and prints them as a reference stack. This can for example be useful
      if you are wondering why a given object has not been garbage collected.

      Run with a list of addresses to get results for reference stacks to all the given addresses

        $ heapy ref-explore my.dump 0xabcdef 0xdeadbeef\x5

      Run without specifying addresses to get an interactive prompt that asks you to enter one address at a time

        $ heapy ref-explore my.dump\x5

    DESC
    desc "ref-explore <file> [<address>...]", "Follows references to a given object"
    def ref_explore(file, *addresses)
      explorer = ReferenceExplorer.new(file)
      if addresses.any?
        explorer.drill_down_list(addresses)
      else
        begin
          explorer.drill_down_interactive
        rescue Interrupt
          nil
        end
      end
    end

    map %w[--version -v] => :version
    desc "version", "Show heapy version"
    def version
@@ -103,3 +130,4 @@ def wat

require 'heapy/analyzer'
require 'heapy/diff'
require 'heapy/reference_explorer'
161 changes: 161 additions & 0 deletions lib/heapy/reference_explorer.rb
@@ -0,0 +1,161 @@
require 'json'
require 'readline'
require 'set'

module Heapy

  # Follows references to given object addresses and prints
  # them as a reference stack.
  # Since multiple reference stacks are possible, it will preferably
  # try to print a stack that leads to a root node, since reference chains
  # leading to a root node will make an object non-collectible by GC.
  #
  # In case no chain to a root node can be found one possible stack is printed
  # as a fallback.
  class ReferenceExplorer
    def initialize(filename)
      @objects = {}
      @reverse_references = {}
      @virtual_root_address = 0
      File.open(filename) do |f|
        f.each.with_index do |line, i|
          o = JSON.parse(line)
          addr = add_object(o)
          add_reverse_references(o, addr)
          add_class_references(o, addr)
        end
      end
    end

    def drill_down_list(addresses)
      addresses.each { |addr| drill_down(addr) }
    end

    def drill_down_interactive
      while buf = Readline.readline("Enter address > ", true)
        drill_down(buf)
      end
    end

    def drill_down(addr_string)
      addr = addr_string.to_i(16)
      puts

      chain = find_root_chain(addr)
      unless chain
        puts 'Could not find a reference chain leading to a root node. Searching for a non-specific chain now.'
        puts
        chain = find_any_chain(addr)
      end

      puts '## Reference chain'
      chain.each do |ref|
        puts format_object(ref)
      end

      puts
      puts "## All references to #{addr_string}"
      refs = @reverse_references[addr] || []
      refs.each do |ref|
        puts " * #{format_object(ref)}"
      end

      puts
    end

    def inspect
      "<ReferenceExplorer #{@objects.size} objects; #{@reverse_references.size} back-refs>"
    end

    private

    def add_object(o)
      addr = o['address']&.to_i(16)
      if !addr && o['type'] == 'ROOT'
        addr = @virtual_root_address
        o['name'] ||= o['root']
        @virtual_root_address += 1
      end

      return unless addr

      simple_object = o.slice('type', 'file', 'name', 'class', 'length', 'imemo_type')
      simple_object['class'] = simple_object['class'].to_i(16) if simple_object.key?('class')
      simple_object['file'] = o['file'] + ":#{o['line']}" if o.key?('file') && o.key?('line')

      @objects[addr] = simple_object

      addr
    end

    def add_reverse_references(o, addr)
      return unless o.key?('references')
      o.fetch('references').map { |r| r.to_i(16) }.each do |ref|
        (@reverse_references[ref] ||= []) << addr
      end
    end

    # An instance of a class keeps that class marked by the GC.
    # This is not directly indicated as a reference in a heap dump,
    # so we manually introduce the back-reference.
    def add_class_references(o, addr)
      return unless o.key?('class')
      return if o['type'] == 'IMEMO'
It turns out this type filter is not completely correct either: https://github.com/ruby/ruby/blob/d5ef373b1194bac64784ae316d125d7a2cf1988a/gc.c#L7026 can sometimes mark the class, depending on the type of IMEMO.

This would still work just fine in newer versions of Ruby, because the class would appear in the references array if there was indeed a reference.
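
If newer Rubies do list the class in `references`, as described above, one option might be to add the synthetic back-reference only when the dump does not already contain it, which would avoid listing the same referrer twice. The following is a minimal sketch under that assumption. `add_class_reference` and the sample entries are hypothetical and operate on a plain hash rather than on the explorer's instance state, and the sketch does not settle the IMEMO question for older Rubies.

```ruby
# Adds a back-reference from a class to its instance, unless the dump entry
# already lists the class among the object's references.
def add_class_reference(reverse_references, obj, addr)
  return unless obj.key?('class')

  class_addr = obj['class'].to_i(16)
  already_listed = obj.fetch('references', []).any? { |r| r.to_i(16) == class_addr }
  return if already_listed

  (reverse_references[class_addr] ||= []) << addr
end

# Hypothetical dump entries, for illustration only.
old_style = { 'address' => '0x10', 'type' => 'OBJECT', 'class' => '0x20' }
new_style = old_style.merge('address' => '0x11', 'references' => ['0x20'])

back_refs = {}
add_class_reference(back_refs, old_style, 0x10) # adds the synthetic back-reference
add_class_reference(back_refs, new_style, 0x11) # skipped, the dump already has it
p back_refs
```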


      class_addr = o.fetch('class').to_i(16)
      (@reverse_references[class_addr] ||= []) << addr
    end

    def find_root_chain(addr, known_addresses = Set.new)
      known_addresses << addr

      return [addr] if addr < @virtual_root_address # assumption: only root objects have smallest possible addresses

      references = @reverse_references[addr] || []

      references.reject { |a| known_addresses.include?(a) }.each do |ref|
        path = find_root_chain(ref, known_addresses)
        return [addr] + path if path
      end

      nil
    end

    def find_any_chain(addr, known_addresses = Set.new)
      known_addresses << addr

      references = @reverse_references[addr] || []

      next_ref = references.reject { |a| known_addresses.include?(a) }.first
      if next_ref
        [addr] + find_any_chain(next_ref, known_addresses)
      else
        []
      end
    end

    def format_path(path)
      return '' unless path

      path.split('/').reverse.take(4).reverse.join('/')
    end

    def format_object(addr)
      obj = @objects[addr]
      return "<Unknown 0x#{addr.to_s(16)}>" unless obj

      desc = if obj['name']
               obj['name']
             elsif obj['type'] == 'OBJECT'
               @objects.dig(obj['class'], 'name')
             elsif obj['type'] == 'ARRAY'
               "#{obj['length']} items"
Comment on lines +151 to +152


For IMEMO objects it would be useful to show obj['imemo_type'] here.

Author


Trusting you here, I blindly added this. The heap dump included in this repo does not seem to contain IMEMOs. Feel free to double-check that this is the desired result and maybe share an example of how it is rendered :)


For this to work, 'imemo_type' needs to be added here:

      simple_object = o.slice('type', 'file', 'name', 'class', 'length')

Author


🙈 I clearly did not work on this piece of code for some time :D Fixed.
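
For reference, here is a rough sketch of how an IMEMO entry might render once `imemo_type` survives the `slice`. It is a standalone copy of the description logic from `format_object`, not the method itself: `describe` and `class_names` are hypothetical names, the address and the `cref` value are made up, and the `(allocated at ...)` suffix is left out.

```ruby
# Standalone copy of the description logic from format_object, for illustration.
def describe(obj, addr, class_names = {})
  desc = if obj['name']
           obj['name']
         elsif obj['type'] == 'OBJECT'
           class_names[obj['class']]
         elsif obj['type'] == 'ARRAY'
           "#{obj['length']} items"
         elsif obj['type'] == 'IMEMO'
           obj['imemo_type']
         end
  desc = desc ? " #{desc}" : ''
  "<#{obj['type']}#{desc} 0x#{addr.to_s(16).upcase}>"
end

# Hypothetical IMEMO entry with 'imemo_type' kept by the slice.
puts describe({ 'type' => 'IMEMO', 'imemo_type' => 'cref' }, 0x7fb4770a1b28)
# prints "<IMEMO cref 0x7FB4770A1B28>"

# The same entry if 'imemo_type' is dropped by the slice: no description at all.
puts describe({ 'type' => 'IMEMO' }, 0x7fb4770a1b28)
# prints "<IMEMO 0x7FB4770A1B28>"
```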

             elsif obj['type'] == 'IMEMO'
               obj['imemo_type']
             end
      desc = desc ? " #{desc}" : ''
      addr = addr ? " 0x#{addr.to_s(16).upcase}" : ''
      "<#{obj['type']}#{desc}#{addr}> (allocated at #{format_path obj['file']})"
    end
  end
end