Dispatch performance regression #55807

Open
charleskawczynski opened this issue Sep 18, 2024 · 3 comments
Labels: compiler:latency (Compiler latency), domain:types and dispatch (Types, subtyping and method dispatch), performance (Must go faster), regression 1.9 (Regression in the 1.9 release)

Comments

@charleskawczynski
Contributor

I think I found a regression in compiling methods as the type information of their arguments grows (cc @gbaraldi). Here is a reproducer:

Base.@kwdef struct Nested{A,B}
    num::Int = 1
end
nest_val(na, nb, ::Val{1}) = Nested{na, nb}()
nest_val(na, nb, ::Val{n}) where {n} = nest_val(Nested{na, nb}, Nested{na, nb}, Val(n-1))
nest_val(na, nb, n::Int) = nest_val(na, nb, Val(n))
nest_val(n) = nest_val(1, 1, n)
foo(t::Nested) = 1
for i in 1:25
    let i=i
        NV = nest_val(i)
        @time begin
            foo(NV)
        end
    end
end
Julia 1.8 🚀
  0.000003 seconds
  0.000018 seconds (355 allocations: 24.688 KiB, 75.77% compilation time)
  0.000018 seconds (355 allocations: 24.688 KiB, 76.19% compilation time)
  0.000015 seconds (355 allocations: 24.688 KiB, 76.97% compilation time)
  0.000015 seconds (355 allocations: 24.688 KiB, 77.44% compilation time)
  0.000017 seconds (355 allocations: 24.688 KiB, 80.15% compilation time)
  0.000021 seconds (355 allocations: 24.688 KiB, 76.09% compilation time)
  0.000017 seconds (356 allocations: 24.797 KiB, 75.76% compilation time)
  0.000017 seconds (355 allocations: 24.688 KiB, 78.70% compilation time)
  0.000015 seconds (355 allocations: 24.688 KiB, 76.84% compilation time)
  0.000015 seconds (355 allocations: 24.688 KiB, 73.59% compilation time)
  0.000019 seconds (356 allocations: 24.844 KiB, 69.89% compilation time)
  0.000020 seconds (355 allocations: 24.688 KiB, 59.46% compilation time)
  0.000017 seconds (355 allocations: 24.688 KiB, 79.65% compilation time)
  0.000017 seconds (355 allocations: 24.688 KiB, 77.86% compilation time)
  0.000015 seconds (355 allocations: 24.688 KiB, 75.55% compilation time)
  0.000020 seconds (356 allocations: 25.250 KiB, 64.27% compilation time)
  0.000015 seconds (356 allocations: 24.922 KiB, 77.83% compilation time)
  0.000015 seconds (355 allocations: 24.688 KiB, 76.04% compilation time)
  0.000017 seconds (355 allocations: 24.688 KiB, 80.88% compilation time)
  0.000023 seconds (355 allocations: 24.688 KiB, 66.85% compilation time)
  0.000015 seconds (355 allocations: 24.688 KiB, 76.05% compilation time)
  0.000019 seconds (355 allocations: 24.688 KiB, 80.83% compilation time)
  0.000016 seconds (355 allocations: 24.688 KiB, 73.39% compilation time)
  0.000015 seconds (355 allocations: 24.688 KiB, 72.33% compilation time)
Julia 1.9 🐢
  0.000002 seconds
  0.000199 seconds (323 allocations: 23.266 KiB, 18.01% compilation time)
  0.000029 seconds (323 allocations: 23.266 KiB, 81.77% compilation time)
  0.000022 seconds (323 allocations: 23.266 KiB, 81.52% compilation time)
  0.000031 seconds (323 allocations: 23.266 KiB, 88.58% compilation time)
  0.000029 seconds (323 allocations: 23.266 KiB, 85.20% compilation time)
  0.000036 seconds (323 allocations: 23.266 KiB, 59.35% compilation time)
  0.000025 seconds (324 allocations: 23.375 KiB, 78.60% compilation time)
  0.000105 seconds (323 allocations: 23.266 KiB, 58.74% compilation time)
  0.000036 seconds (323 allocations: 23.266 KiB, 69.38% compilation time)
  0.000045 seconds (323 allocations: 23.266 KiB, 41.84% compilation time)
  0.000083 seconds (324 allocations: 23.422 KiB, 57.75% compilation time)
  0.000098 seconds (323 allocations: 23.266 KiB, 24.15% compilation time)
  0.000127 seconds (323 allocations: 23.266 KiB, 12.59% compilation time)
  0.000247 seconds (323 allocations: 23.266 KiB, 11.17% compilation time)
  0.000465 seconds (323 allocations: 23.266 KiB, 5.23% compilation time)
  0.000932 seconds (324 allocations: 23.828 KiB, 2.04% compilation time)
  0.001802 seconds (324 allocations: 23.500 KiB, 2.74% compilation time)
  0.003669 seconds (323 allocations: 23.266 KiB, 1.76% compilation time)
  0.007252 seconds (323 allocations: 23.266 KiB, 0.56% compilation time)
  0.013877 seconds (323 allocations: 23.266 KiB, 0.42% compilation time)
  0.029442 seconds (323 allocations: 23.266 KiB, 0.22% compilation time)
  0.056137 seconds (323 allocations: 23.266 KiB, 0.24% compilation time)
  0.112961 seconds (323 allocations: 23.266 KiB, 0.09% compilation time)
  0.225148 seconds (325 allocations: 24.453 KiB, 0.05% compilation time)
Julia 1.10 🐢
  0.000002 seconds
  0.000042 seconds (312 allocations: 23.031 KiB, 69.04% compilation time)
  0.000070 seconds (312 allocations: 23.031 KiB, 54.46% compilation time)
  0.000062 seconds (312 allocations: 23.031 KiB, 70.45% compilation time)
  0.000033 seconds (312 allocations: 23.031 KiB, 87.13% compilation time)
  0.000033 seconds (313 allocations: 23.125 KiB, 76.04% compilation time)
  0.000035 seconds (312 allocations: 23.031 KiB, 84.02% compilation time)
  0.000026 seconds (312 allocations: 23.031 KiB, 64.77% compilation time)
  0.000034 seconds (313 allocations: 23.172 KiB, 56.66% compilation time)
  0.000032 seconds (312 allocations: 23.031 KiB, 61.55% compilation time)
  0.000049 seconds (312 allocations: 23.031 KiB, 55.45% compilation time)
  0.000077 seconds (312 allocations: 23.031 KiB, 41.63% compilation time)
  0.000085 seconds (312 allocations: 23.031 KiB, 28.54% compilation time)
  0.000136 seconds (313 allocations: 23.219 KiB, 12.19% compilation time)
  0.000272 seconds (312 allocations: 23.031 KiB, 8.77% compilation time)
  0.000518 seconds (312 allocations: 23.031 KiB, 4.98% compilation time)
  0.001071 seconds (313 allocations: 23.594 KiB, 4.32% compilation time)
  0.001984 seconds (312 allocations: 23.031 KiB, 1.38% compilation time)
  0.003681 seconds (312 allocations: 23.031 KiB, 0.69% compilation time)
  0.007402 seconds (312 allocations: 23.031 KiB, 0.33% compilation time)
  0.014631 seconds (313 allocations: 23.312 KiB, 0.11% compilation time)
  0.029882 seconds (312 allocations: 23.031 KiB, 0.10% compilation time)
  0.061312 seconds (312 allocations: 23.031 KiB, 0.06% compilation time)
  0.119165 seconds (312 allocations: 23.031 KiB, 0.03% compilation time)
  0.238003 seconds (312 allocations: 23.031 KiB, 0.02% compilation time)
Julia 1.11.0-rc3 🐢
  0.000003 seconds
  0.000018 seconds (105 allocations: 4.828 KiB, 53.09% compilation time)
  0.000037 seconds (105 allocations: 4.828 KiB, 59.62% compilation time)
  0.000020 seconds (105 allocations: 4.828 KiB, 43.18% compilation time)
  0.000016 seconds (105 allocations: 4.828 KiB, 48.21% compilation time)
  0.000016 seconds (106 allocations: 4.922 KiB, 62.70% compilation time)
  0.000022 seconds (105 allocations: 4.828 KiB, 60.64% compilation time)
  0.000016 seconds (105 allocations: 4.828 KiB, 40.52% compilation time)
  0.000016 seconds (106 allocations: 4.969 KiB, 34.64% compilation time)
  0.000023 seconds (105 allocations: 4.828 KiB, 33.10% compilation time)
  0.000026 seconds (105 allocations: 4.828 KiB, 22.79% compilation time)
  0.000057 seconds (105 allocations: 4.828 KiB, 28.88% compilation time)
  0.000078 seconds (105 allocations: 4.828 KiB, 12.46% compilation time)
  0.000126 seconds (106 allocations: 5.016 KiB, 4.88% compilation time)
  0.000252 seconds (105 allocations: 4.828 KiB, 5.63% compilation time)
  0.000459 seconds (105 allocations: 4.828 KiB, 1.83% compilation time)
  0.000896 seconds (106 allocations: 5.359 KiB, 1.23% compilation time)
  0.001794 seconds (105 allocations: 4.828 KiB, 0.70% compilation time)
  0.003549 seconds (105 allocations: 4.828 KiB, 0.26% compilation time)
  0.007310 seconds (105 allocations: 4.828 KiB, 0.29% compilation time)
  0.014699 seconds (106 allocations: 5.109 KiB, 0.14% compilation time)
  0.028997 seconds (105 allocations: 4.828 KiB, 0.11% compilation time)
  0.058824 seconds (105 allocations: 4.828 KiB, 0.07% compilation time)
  0.116585 seconds (105 allocations: 4.828 KiB, 0.03% compilation time)
  0.233782 seconds (105 allocations: 4.828 KiB, 0.04% compilation time)
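
The growth pattern in the slow runs can be checked directly from the numbers above. The following is a minimal sketch that takes the last five `foo(NV)` timings of the Julia 1.9 run (i = 21 to 25) and prints the ratio between consecutive steps. The variable names are illustrative only; the values are copied from the log.

```julia
# Last five `foo(NV)` timings (in seconds) from the Julia 1.9 log above, i = 21:25.
timings = [0.013877, 0.029442, 0.056137, 0.112961, 0.225148]

# Ratio of each timing to the one before it.
ratios = timings[2:end] ./ timings[1:end-1]

for (i, r) in zip(22:25, ratios)
    println("i = $i: ", round(r; digits = 2), "x the previous step")
end
```

Each ratio comes out close to 2, so in this range the time of the call roughly doubles for every extra nesting level.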
@gbaraldi
Member

gbaraldi commented Sep 18, 2024

Profiling this shows that we spend all the time in may_contain_union_decision. I wonder whether a query here gets more expensive because nest_val creates so many types.
The time is spent at this call:

if (may_contain_union_decision(param, e, log))
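
To get a feel for how large the types produced by `nest_val` are, here is a small sketch that counts type parameters in two ways: as a tree (every occurrence of a parameter is visited again) and as a set of distinct types. The helpers `tree_nodes` and `distinct_nodes` are hypothetical and not part of Julia. Whether the subtyping code actually walks parameters without sharing is an assumption here; the thread only establishes that the time is spent in `may_contain_union_decision`.

```julia
Base.@kwdef struct Nested{A,B}
    num::Int = 1
end
nest_val(na, nb, ::Val{1}) = Nested{na, nb}()
nest_val(na, nb, ::Val{n}) where {n} = nest_val(Nested{na, nb}, Nested{na, nb}, Val(n-1))
nest_val(n::Int) = nest_val(1, 1, Val(n))

# Hypothetical helper: number of `Nested` nodes visited when the type is walked
# as a tree, i.e. without remembering parameters that were already seen.
tree_nodes(x) = 0   # non-type parameters such as `1`
tree_nodes(T::DataType) = 1 + sum(tree_nodes, T.parameters; init = 0)

# Hypothetical helper: number of distinct types reachable from `T`.
function distinct_nodes(T, seen = Set{Any}())
    (T isa DataType && !(T in seen)) || return length(seen)
    push!(seen, T)
    foreach(p -> distinct_nodes(p, seen), T.parameters)
    return length(seen)
end

for n in (1, 2, 5, 10)
    T = typeof(nest_val(n))
    println("n = $n: tree walk = ", tree_nodes(T), ", distinct types = ", distinct_nodes(T))
end
```

For nesting depth n the tree walk visits 2^n - 1 nodes while only n distinct types exist, so any traversal that does not reuse results for repeated parameters would double in cost per level.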


@gbaraldi changed the title from "Compilation regression" to "Dispatch performance regression" on Sep 18, 2024
@charleskawczynski
Contributor Author

charleskawczynski commented Sep 18, 2024

In case it helps, here are both timings printed separately, the time to construct the struct and the time of the first call to foo:

Base.@kwdef struct Nested{A,B}
    num::Int = 1
end
nest_val(na, nb, ::Val{1}) = Nested{na, nb}()
nest_val(na, nb, ::Val{n}) where {n} = nest_val(Nested{na, nb}, Nested{na, nb}, Val(n-1))
nest_val(na, nb, n::Int) = nest_val(na, nb, Val(n))
nest_val(n) = nest_val(1, 1, n)
foo(t::Nested) = 1
for i in 1:25
    let i=i
        local NV
        ts = @elapsed begin
            NV = nest_val(i)
        end
        tc = @elapsed begin
            foo(NV)
        end
        println("make struct, compile foo ($ts, $tc)")
    end
end

Which gives:

Julia 1.8
make struct, compile foo (7.166e-6, 6.67e-7)
make struct, compile foo (0.004022666, 1.6542e-5)
make struct, compile foo (0.002778625, 1.5167e-5)
make struct, compile foo (0.003137792, 1.5666e-5)
make struct, compile foo (0.003960458, 1.7083e-5)
make struct, compile foo (0.00359825, 1.5542e-5)
make struct, compile foo (0.005652, 1.6833e-5)
make struct, compile foo (0.003164166, 1.9875e-5)
make struct, compile foo (0.00294875, 1.9958e-5)
make struct, compile foo (0.003155875, 1.5875e-5)
make struct, compile foo (0.00335075, 1.6959e-5)
make struct, compile foo (0.004950833, 1.3625e-5)
make struct, compile foo (0.002843458, 1.5e-5)
make struct, compile foo (0.003530583, 2.5416e-5)
make struct, compile foo (0.003686625, 2.2625e-5)
make struct, compile foo (0.004032625, 2.0875e-5)
make struct, compile foo (0.005220416, 1.7792e-5)
make struct, compile foo (0.002873875, 1.4125e-5)
make struct, compile foo (0.003001292, 1.8084e-5)
make struct, compile foo (0.003124833, 1.65e-5)
make struct, compile foo (0.003397416, 1.5667e-5)
make struct, compile foo (0.00493975, 1.7458e-5)
make struct, compile foo (0.002824375, 1.6208e-5)
make struct, compile foo (0.004336292, 3.1959e-5)
make struct, compile foo (0.003701584, 1.9792e-5)
Julia 1.9
make struct, compile foo (8.583e-6, 5.42e-7)
make struct, compile foo (0.002410334, 2.0834e-5)
make struct, compile foo (0.002503166, 2.1375e-5)
make struct, compile foo (0.004724167, 3.1166e-5)
make struct, compile foo (0.002950084, 3.0709e-5)
make struct, compile foo (0.00239775, 2.1416e-5)
make struct, compile foo (0.002397084, 2.0291e-5)
make struct, compile foo (0.002525083, 2.25e-5)
make struct, compile foo (0.004604209, 2.1875e-5)
make struct, compile foo (0.002201084, 2.775e-5)
make struct, compile foo (0.002346083, 3.2833e-5)
make struct, compile foo (0.002791917, 7.5792e-5)
make struct, compile foo (0.003193125, 8.1958e-5)
make struct, compile foo (0.005648834, 0.000127708)
make struct, compile foo (0.003473291, 0.000329667)
make struct, compile foo (0.00532975, 0.000477834)
make struct, compile foo (0.007331708, 0.000896208)
make struct, compile foo (0.011776, 0.001789083)
make struct, compile foo (0.022942209, 0.003608)
make struct, compile foo (0.037886375, 0.007364625)
make struct, compile foo (0.074177125, 0.014116542)
make struct, compile foo (0.146975333, 0.029440667)
make struct, compile foo (0.297015334, 0.056496375)
make struct, compile foo (0.573344292, 0.116529041)
make struct, compile foo (1.156415666, 0.233382625)
Julia 1.10
make struct, compile foo (4.667e-6, 5.83e-7)
make struct, compile foo (0.004977791, 3.7333e-5)
make struct, compile foo (0.003883417, 1.6458e-5)
make struct, compile foo (0.006146209, 1.85e-5)
make struct, compile foo (0.005635083, 2.1875e-5)
make struct, compile foo (0.0058935, 1.8042e-5)
make struct, compile foo (0.005211708, 2.0125e-5)
make struct, compile foo (0.005305292, 1.8875e-5)
make struct, compile foo (0.005589875, 2.1875e-5)
make struct, compile foo (0.005397459, 2.4875e-5)
make struct, compile foo (0.007137834, 3.9458e-5)
make struct, compile foo (0.00608025, 4.6917e-5)
make struct, compile foo (0.006199792, 7.4875e-5)
make struct, compile foo (0.007023333, 0.000131583)
make struct, compile foo (0.0087665, 0.000257583)
make struct, compile foo (0.013760625, 0.000475)
make struct, compile foo (0.019923625, 0.001033375)
make struct, compile foo (0.035313041, 0.001937458)
make struct, compile foo (0.064332625, 0.003830042)
make struct, compile foo (0.11993025, 0.007522041)
make struct, compile foo (0.237543333, 0.014965458)
make struct, compile foo (0.462009541, 0.029582917)
make struct, compile foo (0.92686225, 0.058957541)
make struct, compile foo (1.849894875, 0.119395083)
make struct, compile foo (3.674977167, 0.235317583)
Julia 1.11.0-rc3
make struct, compile foo (6.875e-6, 1.084e-6)
make struct, compile foo (0.006509375, 1.2875e-5)
make struct, compile foo (0.004416625, 1.0791e-5)
make struct, compile foo (0.006226084, 9.958e-6)
make struct, compile foo (0.00611975, 2.2e-5)
make struct, compile foo (0.006635584, 1.5708e-5)
make struct, compile foo (0.00617275, 1.1041e-5)
make struct, compile foo (0.007417, 2.2042e-5)
make struct, compile foo (0.00666675, 3.1625e-5)
make struct, compile foo (0.00691325, 1.75e-5)
make struct, compile foo (0.006277, 2.2708e-5)
make struct, compile foo (0.00674025, 4.1625e-5)
make struct, compile foo (0.00696, 6.5792e-5)
make struct, compile foo (0.009641459, 0.000124958)
make struct, compile foo (0.009864834, 0.000233292)
make struct, compile foo (0.013464583, 0.000479917)
make struct, compile foo (0.021435375, 0.00090875)
make struct, compile foo (0.035556, 0.001867167)
make struct, compile foo (0.065938209, 0.003510917)
make struct, compile foo (0.121968083, 0.007109292)
make struct, compile foo (0.240853833, 0.014498833)
make struct, compile foo (0.479468625, 0.02824475)
make struct, compile foo (0.945538042, 0.0569535)
make struct, compile foo (1.917964208, 0.115457167)
make struct, compile foo (3.767460916, 0.223717334)
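
In the `@time` output of the first comment, the reported compilation share drops toward zero as the call gets slower. One possible way to probe whether the cost is tied to the first call is to time a second call on the same value. This is only a sketch, with the depths chosen arbitrarily to keep the run short; no particular result is implied.

```julia
Base.@kwdef struct Nested{A,B}
    num::Int = 1
end
nest_val(na, nb, ::Val{1}) = Nested{na, nb}()
nest_val(na, nb, ::Val{n}) where {n} = nest_val(Nested{na, nb}, Nested{na, nb}, Val(n-1))
nest_val(n::Int) = nest_val(1, 1, Val(n))
foo(t::Nested) = 1

# Depths are an arbitrary choice for illustration.
for n in (4, 8, 12, 16)
    let NV = nest_val(n)
        t_first  = @elapsed foo(NV)   # first call with this argument type
        t_second = @elapsed foo(NV)   # repeated call with the same argument type
        println("n = $n: first call $t_first s, second call $t_second s")
    end
end
```

Comparing the two columns across Julia versions would show whether the slowdown is paid once per argument type or on every call.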

@charleskawczynski
Contributor Author

FWIW, the regression still exists but is not nearly as severe when Nested is a singleton (Base.@kwdef struct Nested{A,B} end). (Suggested by @dennisYatunin)
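
A sketch of what the singleton variant of the reproducer might look like is below. It uses a plain field-less `struct` rather than `Base.@kwdef`, and the names `NestedSingleton`, `nest_single`, and `foo_single` are hypothetical, chosen so the code can coexist with the original `Nested` definitions in one session.

```julia
# Field-less variant of the reproducer's struct (hypothetical name).
struct NestedSingleton{A,B} end

nest_single(na, nb, ::Val{1}) = NestedSingleton{na, nb}()
nest_single(na, nb, ::Val{n}) where {n} =
    nest_single(NestedSingleton{na, nb}, NestedSingleton{na, nb}, Val(n-1))
nest_single(n::Int) = nest_single(1, 1, Val(n))
foo_single(t::NestedSingleton) = 1

# Fewer iterations than the original loop, to keep the run short.
for i in 1:15
    let NV = nest_single(i)
        @time foo_single(NV)
    end
end
```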

@nsajko added the performance, domain:types and dispatch, regression 1.9, and compiler:latency labels on Sep 19, 2024