Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve YAMLTree performance by not using object_ids #663

Merged
merged 2 commits into from
Dec 18, 2023

Conversation

amomchilov
Copy link
Contributor

@amomchilov amomchilov commented Dec 18, 2023

Since Ruby 2.7, object_id became more expensive, particularly on its first call for a particular object. @jemmaissroff has a great blog post about it.

We can reduce how many objects we assign object IDs to, by using Hash#compare_by_identity in the YAMLTree::Registrar. The semantics will be the same, but now loads/stores into the Hash are faster, and we don't need to trigger lots of object ID assignments.

On a related note, I used assert_same in the tests, where applicable (instead of assert_equal a.object_id, b.object_id).

Is there a benchmark suite I can run to compare before/after performance?

Object IDs became more expensive in Ruby 2.7. Using `Hash#compare_by_identity` let's us get the same effect, without needing to force all these objects to have object_ids assigned to them.
@amomchilov amomchilov changed the title Reduce usage of object_ids and improve Registrar performance Reduce usage of object_ids and improve YAMLTree performance Dec 18, 2023
@amomchilov amomchilov changed the title Reduce usage of object_ids and improve YAMLTree performance YAMLTree performance by not using object_ids Dec 18, 2023
@tenderlove tenderlove merged commit 21f051b into ruby:master Dec 18, 2023
60 checks passed
@tenderlove
Copy link
Member

Great find, thank you!

@amomchilov amomchilov changed the title YAMLTree performance by not using object_ids Improve YAMLTree performance by not using object_ids Dec 18, 2023
@obj_to_id = {}
@obj_to_node = {}
@obj_to_id = {}.compare_by_identity
@obj_to_node = {}.compare_by_identity
@targets = []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tenderlove Looking again... is @targets just completely unused? Only ever shoved into, never read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in #665

@amomchilov amomchilov mentioned this pull request Dec 19, 2023
@amomchilov amomchilov deleted the optimize-object_id-use branch December 20, 2023 05:00
@amomchilov
Copy link
Contributor Author

Put this tiny synthetic benchmark together and woah... 7% improvement on net serialization performance!

Comparison:
  After removing @targets array:     3030.9 i/s
After switching to identity set:     2975.9 i/s - same-ish: difference falls within error
             Before all changes:     2828.5 i/s - 1.07x  slower
benchmark.rb
def generate_test_data
  repeated_points = 10.times.map { |i| [i, i] }
  (repeated_points * 2) + 80.times.map { |i| [i, i] }
end

def test_perf
  puts Psych.dump(generate_test_data)

  points = generate_test_data
  Benchmark.ips do |x|
    x.report("             Before all changes") do |times|
      i = 0
      while (i += 1) < times
        Psych.dump(points.map { |p| p.dup })
      end
    end

    x.hold!("psych_benchmark1.json")
    # Then: git checkout 0dc25a9

    x.report("After switching to identity set") do |times|
      i = 0
      while (i += 1) < times
        Psych.dump(points.map { |p| p.dup })
      end
    end

    x.hold!("psych_benchmark2.json")
    # Then: git checkout 6905a2

    x.report("  After removing @targets array") do |times|
      i = 0
      while (i += 1) < times
        Psych.dump(points.map { |p| p.dup })
      end
    end

    x.compare!
  end
end

The difference is emphasized here, I chose objects that do minimal other serialization (e.g. simple arrays, no custom class with long names, and cheap field values).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants