Add rules for BLAS.dot, BLAS.dotc, and BLAS.dotu #739
Codecov Report
Patch coverage:
@@ Coverage Diff @@
## main #739 +/- ##
==========================================
- Coverage 78.88% 78.50% -0.38%
==========================================
Files 18 19 +1
Lines 8118 8185 +67
==========================================
+ Hits 6404 6426 +22
- Misses 1714 1759 +45
src/rules/LinearAlgebra/blas.jl
end
_, _, Xow, _, Yow = EnzymeRules.overwritten(config)
# copy only the elements we need
Xtape = Xow ? BLAS.blascopy!(n.val, X.val, incx.val, similar(X.val, n.val), 1) : nothing
Maybe it makes more sense to use the fallback instead of copying? Is that even possible?
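For context, the strided copy in the snippet above can be sketched with plain LinearAlgebra calls. This is only an illustration of what `BLAS.blascopy!` does with a non-unit stride; the helper name `copy_strided` is hypothetical and not part of the PR.

```julia
using LinearAlgebra

# Hypothetical helper (not from the PR): copy the n elements of X that a
# strided BLAS call would visit into a fresh contiguous buffer.
function copy_strided(n::Integer, X::AbstractVector, incx::Integer)
    buf = similar(X, n)
    # blascopy!(n, X, incx, Y, incy) copies n elements and returns Y
    return BLAS.blascopy!(n, X, incx, buf, 1)
end

X = collect(1.0:10.0)
tape = copy_strided(3, X, 2)   # picks X[1], X[3], X[5]
X .= 0                         # simulate X being overwritten later
```

After `X` is overwritten, `tape` still holds the three values that were read, which is the reason for taping only when the argument may be overwritten.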
How does Enzyme treat complex numbers? e.g. if I wanted to also support
yup
const ConstOrDuplicated{T} = Union{Const{T},Duplicated{T}}

_safe_similar(x::AbstractArray, n::Integer) = similar(x, n)
These utilities should be reusable for and greatly simplify the rules for all other Level 1 BLAS functions.
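As an illustration of the kind of Level 1 rule such helpers would support, here is a minimal sketch of the forward-mode tangent of the 5-argument `BLAS.dot` for real element types, written with plain BLAS calls. The function name `dot_tangent` is hypothetical, and this is not the PR's implementation.

```julia
using LinearAlgebra

# Hypothetical helper, not the PR's rule: tangent of dot(n, X, incx, Y, incy)
# by the product rule, d(x'y) = dx'y + x'dy, for real element types.
function dot_tangent(n, X, ∂X, incx, Y, ∂Y, incy)
    return BLAS.dot(n, ∂X, incx, Y, incy) + BLAS.dot(n, X, incx, ∂Y, incy)
end

X, ∂X = collect(1.0:6.0), fill(0.5, 6)
Y, ∂Y = collect(2.0:7.0), fill(-1.0, 6)
n, incx, incy = 3, 2, 1

# X is read with stride 2 (elements 1, 3, 5), Y with stride 1 (elements 1, 2, 3)
t = dot_tangent(n, X, ∂X, incx, Y, ∂Y, incy)
```

The shadows are read with the same `n` and strides as the primal arguments, so no copies are needed in forward mode.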
Hi @sethaxen,
but I am lacking the tests for it.
Sure, I just added some tests for
Yeah, it'd be nice if we could make that comparison when this PR is finished before I tackle BLAS rules for the other functions you listed.
I just built Enzyme.jl on top of Enzyme with Blas-Tablegen, removing the Julia code that calls the BLAS fallback.
@ZuseZ4 I'm interested to hear more about this tablegen approach. It seems to produce extremely terse expressions of the rules, which would in some ways be preferable to the approach here. But do you have to define both forward- and reverse-mode rules? And I'm a little confused by the existing rules, e.g.
axpy!(∂a, X, ∂Y)
axpy!(a, ∂X, ∂Y)
and the reverse-mode rule
∂a = dot(X, ∂Y)
axpy!(conj(a), ∂Y, ∂X)
scal!(0, ∂Y)
But the tablegen rule looks like neither of these and even uses
Hi @sethaxen. The incorrect reverse BLAS rules are mostly due to a design decision in my last approach. If, say, dot uses axpy for the reverse pass, tablegen checks the number and types of the axpy arguments based on the axpy rules, which in turn required me to declare and implement all BLAS functions used for the reverse axpy pass. That recursively led to a typical "big bang" approach where I needed to handle multiple rules at once. Since I didn't have the tests back then, I just ended up using placeholder (i.e. incorrect) reverse rules to get it to compile, and thus never merged it. I do have a better solution for it now; I'll push more of it on Monday.
Ah, that makes sense. It'll be interesting to see how you handle
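The first pair of `axpy!` calls quoted earlier in this thread corresponds to the tangent update ∂Y ← ∂a·X + a·∂X + ∂Y. A rough way to sanity-check that, assuming real `Float64` inputs and using only LinearAlgebra (no Enzyme involved), is a central finite difference; all values below are made up for illustration.

```julia
using LinearAlgebra

a, ∂a = 0.7, 0.3
X, ∂X = [1.0, 2.0, 3.0], [0.1, 0.2, 0.3]
Y, ∂Y = [4.0, 5.0, 6.0], [0.4, 0.5, 0.6]

# Tangent of Y after axpy!(a, X, Y), built from two axpy! calls
tangent = copy(∂Y)
BLAS.axpy!(∂a, X, tangent)   # tangent += ∂a * X
BLAS.axpy!(a, ∂X, tangent)   # tangent += a * ∂X

# Central finite difference along the direction (∂a, ∂X, ∂Y)
h = 1e-6
f(t) = BLAS.axpy!(a + t * ∂a, X .+ t .* ∂X, Y .+ t .* ∂Y)
fd = (f(h) .- f(-h)) ./ (2h)
```

Because `axpy!` is bilinear in `a` and `X`, the central difference agrees with `tangent` up to rounding error. This only checks the forward-mode expression; it says nothing about the reverse-mode rules under discussion.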
It seems the BLAS fallback warnings are raised even if the BLAS fallbacks are not being hit?
Yeah, the BLAS fallback injection is done prior to any custom rules, so it will always occur. However, if a custom rule is hit, it will use that implementation rather than the injected fallback.
That is indeed a bit annoying. We do have Blas-Tablegen and Instruction-Tablegen. The second one could handle this, but we will probably develop it further, while Blas-Tablegen can hopefully stay unchanged. So I might actually copy over a bit of the logic, so that we don't introduce a dependency there.
Y::ConstOrBatchDuplicated{<:Union{Ptr{T},AbstractArray{T}}},
incy::Const{<:Integer},
) where {T<:BLAS.$Ttype}
RT <: Const && return func.val(n.val, X.val, incx.val, Y.val, incy.val)
For some reason calling the 2-arg dot, which forwards to the 5-arg dot, now errors with Const return type:
using Enzyme, LinearAlgebra
x, y, ∂x, ∂y = ntuple(_ -> randn(5), 4);
autodiff(Forward, BLAS.dot, Duplicated, Duplicated(x, ∂x), Duplicated(y, ∂y)) # fine
autodiff(Forward, BLAS.dot, Duplicated, Const(x), Duplicated(y, ∂y)) # fine
autodiff(Forward, BLAS.dot, Const, Duplicated(x, ∂x), Duplicated(y, ∂y)) # errors, see below
┌ Warning: Using fallback BLAS replacements, performance may be degraded
└ @ Enzyme.Compiler ~/.julia/packages/GPUCompiler/BxfIW/src/utils.jl:56
mod:; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:10:11:12:13"
target triple = "x86_64-linux-gnu"
@_j_str1 = private unnamed_addr constant [11 x i8] c"typeassert\00"
; Function Attrs: noinline nosync readonly
define dso_local fastcc double @julia_dot_2279(i64 signext %0, i64 zeroext %1, i64 signext %2, i64 zeroext %3, i64 signext %4) unnamed_addr #0 !dbg !11 {
top:
%5 = call {}*** @julia.get_pgcstack()
%6 = inttoptr i64 %1 to double*, !dbg !14
%7 = inttoptr i64 %3 to double*, !dbg !14
%8 = sub i64 1, %0, !dbg !14
%9 = icmp sgt i64 %0, 0, !dbg !14
br i1 %9, label %10, label %cblas_ddot64_.exit, !dbg !14
10: ; preds = %top
%11 = icmp sgt i64 %4, 0, !dbg !14
%12 = mul i64 %8, %4, !dbg !14
%13 = select i1 %11, i64 0, i64 %12, !dbg !14
%14 = icmp sgt i64 %2, 0, !dbg !14
%15 = mul i64 %8, %2, !dbg !14
%16 = select i1 %14, i64 0, i64 %15, !dbg !14
br label %17, !dbg !14
17: ; preds = %17, %10
%18 = phi i64 [ 0, %10 ], [ %33, %17 ], !dbg !14
%19 = phi i64 [ %13, %10 ], [ %32, %17 ], !dbg !14
%20 = phi i64 [ %16, %10 ], [ %31, %17 ], !dbg !14
%21 = phi double [ 0.000000e+00, %10 ], [ %30, %17 ], !dbg !14
%22 = shl i64 %20, 32, !dbg !14
%23 = ashr exact i64 %22, 32, !dbg !14
%24 = getelementptr inbounds double, double* %6, i64 %23, !dbg !14
%25 = load double, double* %24, align 8, !dbg !14, !tbaa !15
%26 = shl i64 %19, 32, !dbg !14
%27 = ashr exact i64 %26, 32, !dbg !14
%28 = getelementptr inbounds double, double* %7, i64 %27, !dbg !14
%29 = load double, double* %28, align 8, !dbg !14, !tbaa !15
%30 = call double @llvm.fmuladd.f64(double %25, double %29, double %21) #17, !dbg !14
%31 = add nsw i64 %23, %2, !dbg !14
%32 = add nsw i64 %27, %4, !dbg !14
%33 = add nuw nsw i64 %18, 1, !dbg !14
%34 = icmp eq i64 %33, %0, !dbg !14
br i1 %34, label %cblas_ddot64_.exit, label %17, !dbg !14, !llvm.loop !19
cblas_ddot64_.exit: ; preds = %17, %top
%35 = phi double [ 0.000000e+00, %top ], [ %30, %17 ], !dbg !14
ret double %35, !dbg !14
}
; Function Attrs: nofree readnone
declare {}*** @julia.get_pgcstack() #1
; Function Attrs: inaccessiblememonly allocsize(1)
declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}**, i64, {} addrspace(10)*) #2
; Function Attrs: inaccessiblememonly nofree
declare token @llvm.julia.gc_preserve_begin(...) #3
; Function Attrs: nofree nounwind readnone
declare nonnull {}* @julia.pointer_from_objref({} addrspace(11)*) local_unnamed_addr #4
; Function Attrs: inaccessiblememonly nofree
declare void @llvm.julia.gc_preserve_end(token) #3
; Function Attrs: inaccessiblememonly nofree norecurse nounwind
declare void @julia.write_barrier({} addrspace(10)* readonly, ...) local_unnamed_addr #5
; Function Attrs: nofree
declare nonnull {} addrspace(10)* @ijl_invoke({} addrspace(10)*, {} addrspace(10)** nocapture readonly, i32, {} addrspace(10)*) #6
declare nonnull {} addrspace(10)* @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) local_unnamed_addr #7
; Function Attrs: noreturn
declare void @ijl_throw({} addrspace(12)*) local_unnamed_addr #8
; Function Attrs: nofree norecurse nounwind readnone
declare nonnull {} addrspace(10)* @julia.typeof({} addrspace(10)*) local_unnamed_addr #9
; Function Attrs: noreturn
declare void @ijl_type_error(i8*, {} addrspace(10)*, {} addrspace(12)*) local_unnamed_addr #8
; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare double @llvm.fmuladd.f64(double, double, double) #10
define double @julia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1) local_unnamed_addr #11 !dbg !22 {
entry:
%2 = call {}*** @julia.get_pgcstack()
%3 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !23
%4 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %3 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !23
%5 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %4, i64 0, i32 1, !dbg !23
%6 = load i64, i64 addrspace(11)* %5, align 8, !dbg !23, !range !28, !alias.scope !29, !noalias !32
%7 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !23
%8 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %7 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !23
%9 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %8, i64 0, i32 1, !dbg !23
%10 = load i64, i64 addrspace(11)* %9, align 8, !dbg !23, !range !28, !alias.scope !29, !noalias !32
%.not.i = icmp eq i64 %6, %10, !dbg !37
br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !41
L12.i: ; preds = %entry
%current_task15.i = getelementptr inbounds {}**, {}*** %2, i64 -13, !dbg !42
%current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !42
%11 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #18, !dbg !42
%12 = bitcast {} addrspace(10)* %11 to {} addrspace(10)* addrspace(10)*, !dbg !42
%13 = addrspacecast {} addrspace(10)* addrspace(10)* %12 to {} addrspace(10)* addrspace(11)*, !dbg !42
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %13, align 8, !dbg !42, !tbaa !45, !alias.scope !51, !noalias !52
%14 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %13, i64 1, !dbg !42
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %14, align 8, !dbg !42, !tbaa !45, !alias.scope !51, !noalias !52
%15 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #18, !dbg !42
%16 = bitcast {} addrspace(10)* %15 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !42
%.repack.i = bitcast {} addrspace(10)* %15 to {} addrspace(10)* addrspace(10)*, !dbg !42
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !42, !tbaa !55, !alias.scope !51, !noalias !52
%.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 1, !dbg !42
store i64 %6, i64 addrspace(10)* %.repack7.i, align 8, !dbg !42, !tbaa !55, !alias.scope !51, !noalias !52
%.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 2, !dbg !42
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !42, !tbaa !55, !alias.scope !51, !noalias !52
%.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 3, !dbg !42
store i64 %10, i64 addrspace(10)* %.repack11.i, align 8, !dbg !42, !tbaa !55, !alias.scope !51, !noalias !52
store atomic {} addrspace(10)* %15, {} addrspace(10)* addrspace(11)* %13 release, align 8, !dbg !42, !tbaa !45, !alias.scope !51, !noalias !52
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %15) #17, !dbg !42
%17 = bitcast {} addrspace(10)* %11 to i8 addrspace(10)*, !dbg !42
%18 = addrspacecast i8 addrspace(10)* %17 to i8 addrspace(11)*, !dbg !42
%19 = getelementptr inbounds i8, i8 addrspace(11)* %18, i64 8, !dbg !42
%20 = bitcast i8 addrspace(11)* %19 to {} addrspace(10)* addrspace(11)*, !dbg !42
store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %20 release, align 8, !dbg !42, !tbaa !45, !alias.scope !51, !noalias !52
%21 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %20 acquire, align 8, !dbg !57, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
%22 = addrspacecast {} addrspace(10)* %21 to {} addrspace(11)*, !dbg !69
%.not13.i = icmp eq {} addrspace(11)* %22, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !69
br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !69
L17.i: ; preds = %L12.i
%23 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #18, !dbg !70
%24 = bitcast {} addrspace(10)* %23 to {} addrspace(10)* addrspace(10)*, !dbg !70
store {} addrspace(10)* %11, {} addrspace(10)* addrspace(10)* %24, align 8, !dbg !70, !tbaa !55, !alias.scope !51, !noalias !52
%25 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %23) #19, !dbg !70
%26 = cmpxchg {} addrspace(10)* addrspace(11)* %20, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %25 acq_rel acquire, align 8, !dbg !74, !tbaa !45, !alias.scope !51, !noalias !68
%27 = extractvalue { {} addrspace(10)*, i1 } %26, 0, !dbg !74
%28 = extractvalue { {} addrspace(10)*, i1 } %26, 1, !dbg !74
br i1 %28, label %xchg_wb.i, label %L27.i, !dbg !74
L27.i: ; preds = %L17.i
%29 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %27) #20, !dbg !77
%30 = icmp eq {} addrspace(10)* %29, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !77
br i1 %30, label %L32.i, label %fail.i, !dbg !77
L32.i: ; preds = %xchg_wb.i, %L27.i, %L12.i
%value_phi.i = phi {} addrspace(10)* [ %25, %xchg_wb.i ], [ %21, %L12.i ], [ %27, %L27.i ]
%31 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #18, !dbg !41
%32 = bitcast {} addrspace(10)* %31 to {} addrspace(10)* addrspace(10)*, !dbg !41
store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %32, align 8, !dbg !41, !tbaa !55, !alias.scope !51, !noalias !52
%33 = addrspacecast {} addrspace(10)* %31 to {} addrspace(12)*, !dbg !41
call void @ijl_throw({} addrspace(12)* %33) #21, !dbg !41
unreachable, !dbg !41
xchg_wb.i: ; preds = %L17.i
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %25) #17, !dbg !74
br label %L32.i, !dbg !77
fail.i: ; preds = %L27.i
%34 = addrspacecast {} addrspace(10)* %27 to {} addrspace(12)*, !dbg !77
call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %34) #21, !dbg !77
unreachable, !dbg !77
julia_dot_2276_inner.exit: ; preds = %entry
%35 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, {} addrspace(10)* nonnull %1), !dbg !78
%36 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !79
%37 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %36) #20, !dbg !79
%38 = bitcast {}* %37 to i8**, !dbg !79
%39 = load i8*, i8** %38, align 8, !dbg !79, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
%40 = ptrtoint i8* %39 to i64, !dbg !79
%41 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !79
%42 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %41) #20, !dbg !79
%43 = bitcast {}* %42 to i8**, !dbg !79
%44 = load i8*, i8** %43, align 8, !dbg !79, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
%45 = ptrtoint i8* %44 to i64, !dbg !79
%46 = call fastcc double @julia_dot_2279(i64 signext %6, i64 zeroext %40, i64 noundef signext 1, i64 zeroext %45, i64 noundef signext 1) #16, !dbg !78
call void @llvm.julia.gc_preserve_end(token %35), !dbg !78
ret double %46, !dbg !92
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #12
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #12
; Function Attrs: readnone
declare void @llvm.enzymefakeuse(...) #13
; Function Attrs: mustprogress willreturn
define double @preprocess_julia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1) local_unnamed_addr #14 !dbg !93 {
entry:
%2 = call {}*** @julia.get_pgcstack() #22
%3 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !94
%4 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %3 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !94
%5 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %4, i64 0, i32 1, !dbg !94
%6 = load i64, i64 addrspace(11)* %5, align 8, !dbg !94, !range !28, !alias.scope !29, !noalias !32
%7 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !94
%8 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %7 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !94
%9 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %8, i64 0, i32 1, !dbg !94
%10 = load i64, i64 addrspace(11)* %9, align 8, !dbg !94, !range !28, !alias.scope !29, !noalias !32
%.not.i = icmp eq i64 %6, %10, !dbg !97
br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !99
L12.i: ; preds = %entry
%current_task15.i = getelementptr inbounds {}**, {}*** %2, i64 -13, !dbg !100
%current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !100
%11 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #23, !dbg !100
%12 = bitcast {} addrspace(10)* %11 to {} addrspace(10)* addrspace(10)*, !dbg !100
%13 = addrspacecast {} addrspace(10)* addrspace(10)* %12 to {} addrspace(10)* addrspace(11)*, !dbg !100
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %13, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
%14 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %13, i64 1, !dbg !100
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %14, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
%15 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #23, !dbg !100
%16 = bitcast {} addrspace(10)* %15 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !100
%.repack.i = bitcast {} addrspace(10)* %15 to {} addrspace(10)* addrspace(10)*, !dbg !100
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
%.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 1, !dbg !100
store i64 %6, i64 addrspace(10)* %.repack7.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
%.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 2, !dbg !100
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
%.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 3, !dbg !100
store i64 %10, i64 addrspace(10)* %.repack11.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
store atomic {} addrspace(10)* %15, {} addrspace(10)* addrspace(11)* %13 release, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %15) #24, !dbg !100
%17 = bitcast {} addrspace(10)* %11 to i8 addrspace(10)*, !dbg !100
%18 = addrspacecast i8 addrspace(10)* %17 to i8 addrspace(11)*, !dbg !100
%19 = getelementptr inbounds i8, i8 addrspace(11)* %18, i64 8, !dbg !100
%20 = bitcast i8 addrspace(11)* %19 to {} addrspace(10)* addrspace(11)*, !dbg !100
store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %20 release, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
%21 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %20 acquire, align 8, !dbg !104, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
%22 = addrspacecast {} addrspace(10)* %21 to {} addrspace(11)*, !dbg !108
%.not13.i = icmp eq {} addrspace(11)* %22, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !108
br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !108
L17.i: ; preds = %L12.i
%23 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #23, !dbg !109
%24 = bitcast {} addrspace(10)* %23 to {} addrspace(10)* addrspace(10)*, !dbg !109
store {} addrspace(10)* %11, {} addrspace(10)* addrspace(10)* %24, align 8, !dbg !109, !tbaa !55, !alias.scope !51, !noalias !101
%25 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %23) #25, !dbg !109
%26 = cmpxchg {} addrspace(10)* addrspace(11)* %20, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %25 acq_rel acquire, align 8, !dbg !111, !tbaa !45, !alias.scope !51, !noalias !68
%27 = extractvalue { {} addrspace(10)*, i1 } %26, 0, !dbg !111
%28 = extractvalue { {} addrspace(10)*, i1 } %26, 1, !dbg !111
br i1 %28, label %xchg_wb.i, label %L27.i, !dbg !111
L27.i: ; preds = %L17.i
%29 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %27) #26, !dbg !113
%30 = icmp eq {} addrspace(10)* %29, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !113
br i1 %30, label %L32.i, label %fail.i, !dbg !113
L32.i: ; preds = %xchg_wb.i, %L27.i, %L12.i
%value_phi.i = phi {} addrspace(10)* [ %25, %xchg_wb.i ], [ %21, %L12.i ], [ %27, %L27.i ]
%31 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #23, !dbg !99
%32 = bitcast {} addrspace(10)* %31 to {} addrspace(10)* addrspace(10)*, !dbg !99
store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %32, align 8, !dbg !99, !tbaa !55, !alias.scope !51, !noalias !101
%33 = addrspacecast {} addrspace(10)* %31 to {} addrspace(12)*, !dbg !99
call void @ijl_throw({} addrspace(12)* %33) #27, !dbg !99
unreachable, !dbg !99
xchg_wb.i: ; preds = %L17.i
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %25) #24, !dbg !111
br label %L32.i, !dbg !113
fail.i: ; preds = %L27.i
%34 = addrspacecast {} addrspace(10)* %27 to {} addrspace(12)*, !dbg !113
call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %34) #27, !dbg !113
unreachable, !dbg !113
julia_dot_2276_inner.exit: ; preds = %entry
%35 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, {} addrspace(10)* nonnull %1) #22, !dbg !114
%36 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !115
%37 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %36) #26, !dbg !115
%38 = bitcast {}* %37 to i8**, !dbg !115
%39 = load i8*, i8** %38, align 8, !dbg !115, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
%40 = ptrtoint i8* %39 to i64, !dbg !115
%41 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !115
%42 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %41) #26, !dbg !115
%43 = bitcast {}* %42 to i8**, !dbg !115
%44 = load i8*, i8** %43, align 8, !dbg !115, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
%45 = ptrtoint i8* %44 to i64, !dbg !115
%46 = call fastcc double @julia_dot_2279(i64 signext %6, i64 zeroext %40, i64 noundef signext 1, i64 zeroext %45, i64 noundef signext 1) #28, !dbg !114
call void @llvm.julia.gc_preserve_end(token %35) #22, !dbg !114
ret double %46, !dbg !120
}
; Function Attrs: mustprogress willreturn
define internal void @fwddiffejulia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* %"'", {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1, {} addrspace(10)* %"'1") local_unnamed_addr #14 !dbg !121 {
entry:
%2 = call {}*** @julia.get_pgcstack()
%3 = call {}*** @julia.get_pgcstack()
%4 = call {}*** @julia.get_pgcstack()
%5 = call {}*** @julia.get_pgcstack()
%6 = call {}*** @julia.get_pgcstack()
%7 = call {}*** @julia.get_pgcstack() #22
%8 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !122
%9 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %8 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !122
%10 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %9, i64 0, i32 1, !dbg !122
%11 = load i64, i64 addrspace(11)* %10, align 8, !dbg !122, !range !28, !alias.scope !125, !noalias !128
%12 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !122
%13 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %12 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !122
%14 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %13, i64 0, i32 1, !dbg !122
%15 = load i64, i64 addrspace(11)* %14, align 8, !dbg !122, !range !28, !alias.scope !130, !noalias !133
%.not.i = icmp eq i64 %11, %15, !dbg !135
br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !137
L12.i: ; preds = %entry
%current_task15.i = getelementptr inbounds {}**, {}*** %7, i64 -13, !dbg !138
%current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !138
%16 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #23, !dbg !138
%17 = bitcast {} addrspace(10)* %16 to {} addrspace(10)* addrspace(10)*, !dbg !138
%18 = addrspacecast {} addrspace(10)* addrspace(10)* %17 to {} addrspace(10)* addrspace(11)*, !dbg !138
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %18, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
%19 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 1, !dbg !138
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %19, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
%20 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #23, !dbg !138
%21 = bitcast {} addrspace(10)* %20 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !138
%.repack.i = bitcast {} addrspace(10)* %20 to {} addrspace(10)* addrspace(10)*, !dbg !138
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
%.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 1, !dbg !138
store i64 %11, i64 addrspace(10)* %.repack7.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
%.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 2, !dbg !138
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
%.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 3, !dbg !138
store i64 %15, i64 addrspace(10)* %.repack11.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
store atomic {} addrspace(10)* %20, {} addrspace(10)* addrspace(11)* %18 release, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %16, {} addrspace(10)* nonnull %20) #24, !dbg !138
%22 = bitcast {} addrspace(10)* %16 to i8 addrspace(10)*, !dbg !138
%23 = addrspacecast i8 addrspace(10)* %22 to i8 addrspace(11)*, !dbg !138
%24 = getelementptr inbounds i8, i8 addrspace(11)* %23, i64 8, !dbg !138
%25 = bitcast i8 addrspace(11)* %24 to {} addrspace(10)* addrspace(11)*, !dbg !138
store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %25 release, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
%26 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %25 acquire, align 8, !dbg !142, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
%27 = addrspacecast {} addrspace(10)* %26 to {} addrspace(11)*, !dbg !146
%.not13.i = icmp eq {} addrspace(11)* %27, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !146
br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !146
L17.i: ; preds = %L12.i
%28 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #23, !dbg !147
%29 = bitcast {} addrspace(10)* %28 to {} addrspace(10)* addrspace(10)*, !dbg !147
store {} addrspace(10)* %16, {} addrspace(10)* addrspace(10)* %29, align 8, !dbg !147, !tbaa !55, !alias.scope !51, !noalias !139
%30 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %28) #25, !dbg !147
%31 = cmpxchg {} addrspace(10)* addrspace(11)* %25, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %30 acq_rel acquire, align 8, !dbg !149, !tbaa !45, !alias.scope !51, !noalias !68
%32 = extractvalue { {} addrspace(10)*, i1 } %31, 0, !dbg !149
%33 = extractvalue { {} addrspace(10)*, i1 } %31, 1, !dbg !149
br i1 %33, label %xchg_wb.i, label %L27.i, !dbg !149
L27.i: ; preds = %L17.i
%34 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %32) #26, !dbg !151
%35 = icmp eq {} addrspace(10)* %34, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !151
br i1 %35, label %L32.i, label %fail.i, !dbg !151
L32.i: ; preds = %xchg_wb.i, %L27.i, %L12.i
%value_phi.i = phi {} addrspace(10)* [ %30, %xchg_wb.i ], [ %26, %L12.i ], [ %32, %L27.i ]
%36 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #23, !dbg !137
%37 = bitcast {} addrspace(10)* %36 to {} addrspace(10)* addrspace(10)*, !dbg !137
store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %37, align 8, !dbg !137, !tbaa !55, !alias.scope !51, !noalias !139
%38 = addrspacecast {} addrspace(10)* %36 to {} addrspace(12)*, !dbg !137
call void @ijl_throw({} addrspace(12)* %38) #27, !dbg !137
unreachable, !dbg !137
xchg_wb.i: ; preds = %L17.i
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %16, {} addrspace(10)* nonnull %30) #24, !dbg !149
br label %L32.i, !dbg !151
fail.i: ; preds = %L27.i
%39 = addrspacecast {} addrspace(10)* %32 to {} addrspace(12)*, !dbg !151
call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %39) #27, !dbg !151
unreachable, !dbg !151
julia_dot_2276_inner.exit: ; preds = %entry
%40 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %1, {} addrspace(10)* %"'1"), !dbg !152
%"'ipc" = addrspacecast {} addrspace(10)* %"'" to {} addrspace(11)*, !dbg !153
%41 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !153
%42 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc"), !dbg !153
%_replacementA = phi {}* , !dbg !153
%"'ipc25" = bitcast {}* %42 to i8**, !dbg !153
%_replacementA17 = phi i8** , !dbg !153
%"'ipl" = load i8*, i8** %"'ipc25", align 8, !dbg !153, !tbaa !89, !alias.scope !158, !noalias !159, !nonnull !13
%"'ipc26" = ptrtoint i8* %"'ipl" to i64, !dbg !153
%_replacementA19 = phi i64 , !dbg !153
%"'ipc20" = addrspacecast {} addrspace(10)* %"'1" to {} addrspace(11)*, !dbg !153
%43 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !153
%44 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc20"), !dbg !153
%_replacementA21 = phi {}* , !dbg !153
%"'ipc27" = bitcast {}* %44 to i8**, !dbg !153
%_replacementA22 = phi i8** , !dbg !153
%"'ipl28" = load i8*, i8** %"'ipc27", align 8, !dbg !153, !tbaa !89, !alias.scope !160, !noalias !161, !nonnull !13
%_replacementA23 = phi i8* , !dbg !153
%"'ipc29" = ptrtoint i8* %"'ipl28" to i64, !dbg !153
%_replacementA24 = phi i64 , !dbg !153
%45 = bitcast {}*** %6 to {}**, !dbg !152
%46 = getelementptr inbounds {}*, {}** %45, i64 -13, !dbg !152
%47 = getelementptr inbounds {}*, {}** %46, i64 15, !dbg !152
%48 = bitcast {}** %47 to i8**, !dbg !152
%49 = load i8*, i8** %48, align 8, !dbg !152
%50 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %46, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
%51 = bitcast {} addrspace(10)* %50 to [1 x i64] addrspace(10)*, !dbg !152
%52 = addrspacecast [1 x i64] addrspace(10)* %51 to [1 x i64] addrspace(11)*, !dbg !152
%53 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %52, i64 0, i32 0, !dbg !152
store i64 %11, i64 addrspace(11)* %53, align 8, !dbg !152
%54 = bitcast {}*** %5 to {}**, !dbg !152
%55 = getelementptr inbounds {}*, {}** %54, i64 -13, !dbg !152
%56 = getelementptr inbounds {}*, {}** %55, i64 15, !dbg !152
%57 = bitcast {}** %56 to i8**, !dbg !152
%58 = load i8*, i8** %57, align 8, !dbg !152
%59 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %55, i64 16, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195885311824 to {}*) to {} addrspace(10)*)), !dbg !152
%60 = bitcast {} addrspace(10)* %59 to [2 x i64] addrspace(10)*, !dbg !152
%61 = addrspacecast [2 x i64] addrspace(10)* %60 to [2 x i64] addrspace(11)*, !dbg !152
%62 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %61, i64 0, i32 0, !dbg !152
store i64 %_replacementA19, i64 addrspace(11)* %62, align 8, !dbg !152
%63 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %61, i64 0, i32 1, !dbg !152
store i64 %"'ipc26", i64 addrspace(11)* %63, align 8, !dbg !152
%64 = bitcast {}*** %4 to {}**, !dbg !152
%65 = getelementptr inbounds {}*, {}** %64, i64 -13, !dbg !152
%66 = getelementptr inbounds {}*, {}** %65, i64 15, !dbg !152
%67 = bitcast {}** %66 to i8**, !dbg !152
%68 = load i8*, i8** %67, align 8, !dbg !152
%69 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %65, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
%70 = bitcast {} addrspace(10)* %69 to [1 x i64] addrspace(10)*, !dbg !152
%71 = addrspacecast [1 x i64] addrspace(10)* %70 to [1 x i64] addrspace(11)*, !dbg !152
%72 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %71, i64 0, i32 0, !dbg !152
store i64 1, i64 addrspace(11)* %72, align 8, !dbg !152
%73 = bitcast {}*** %3 to {}**, !dbg !152
%74 = getelementptr inbounds {}*, {}** %73, i64 -13, !dbg !152
%75 = getelementptr inbounds {}*, {}** %74, i64 15, !dbg !152
%76 = bitcast {}** %75 to i8**, !dbg !152
%77 = load i8*, i8** %76, align 8, !dbg !152
%78 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %74, i64 16, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195885311824 to {}*) to {} addrspace(10)*)), !dbg !152
%79 = bitcast {} addrspace(10)* %78 to [2 x i64] addrspace(10)*, !dbg !152
%80 = addrspacecast [2 x i64] addrspace(10)* %79 to [2 x i64] addrspace(11)*, !dbg !152
%81 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %80, i64 0, i32 0, !dbg !152
store i64 %_replacementA24, i64 addrspace(11)* %81, align 8, !dbg !152
%82 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %80, i64 0, i32 1, !dbg !152
store i64 %"'ipc29", i64 addrspace(11)* %82, align 8, !dbg !152
%83 = bitcast {}*** %2 to {}**, !dbg !152
%84 = getelementptr inbounds {}*, {}** %83, i64 -13, !dbg !152
%85 = getelementptr inbounds {}*, {}** %84, i64 15, !dbg !152
%86 = bitcast {}** %85 to i8**, !dbg !152
%87 = load i8*, i8** %86, align 8, !dbg !152
%88 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %84, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
%89 = bitcast {} addrspace(10)* %88 to [1 x i64] addrspace(10)*, !dbg !152
%90 = addrspacecast [1 x i64] addrspace(10)* %89 to [1 x i64] addrspace(11)*, !dbg !152
%91 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %90, i64 0, i32 0, !dbg !152
store i64 1, i64 addrspace(11)* %91, align 8, !dbg !152
%92 = call fast double @julia_forward_2281([1 x i64] addrspace(11)* %52, [2 x i64] addrspace(11)* %61, [1 x i64] addrspace(11)* %71, [2 x i64] addrspace(11)* %80, [1 x i64] addrspace(11)* %90), !dbg !152
call void @llvm.julia.gc_preserve_end(token %40) #22, !dbg !152
ret void
allocsForInversion: ; No predecessors!
}
; Function Attrs: alwaysinline
define double @julia_forward_2281([1 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(8) %0, [2 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(16) %1, [1 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(8) %2, [2 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(16) %3, [1 x i64] addrspace(11)* nocapture nofree noundef nonnull readonly align 8 dereferenceable(8) %4) #15 !dbg !162 {
top:
%5 = call {}*** @julia.get_pgcstack()
%6 = getelementptr inbounds [1 x i64], [1 x i64] addrspace(11)* %0, i64 0, i64 0, !dbg !163
%7 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(11)* %1, i64 0, i64 0, !dbg !163
%8 = getelementptr inbounds [1 x i64], [1 x i64] addrspace(11)* %2, i64 0, i64 0, !dbg !163
%9 = getelementptr inbounds [2 x i64], [2 x i64] addrspace(11)* %3, i64 0, i64 0, !dbg !163
%10 = getelementptr inbounds [1 x i64], [1 x i64] addrspace(11)* %4, i64 0, i64 0, !dbg !163
%11 = load i64, i64 addrspace(11)* %6, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
%12 = load i64, i64 addrspace(11)* %7, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
%13 = load i64, i64 addrspace(11)* %8, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
%14 = load i64, i64 addrspace(11)* %9, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
%15 = load i64, i64 addrspace(11)* %10, align 8, !dbg !165, !tbaa !166, !alias.scope !168, !noalias !169
%16 = call double @julia_dot_2284(i64 signext %11, i64 zeroext %12, i64 signext %13, i64 zeroext %14, i64 signext %15) #16, !dbg !165
ret double %16, !dbg !165
}
define internal double @julia_dot_2284(i64 signext %0, i64 zeroext %1, i64 signext %2, i64 zeroext %3, i64 signext %4) #16 !dbg !170 {
top:
%5 = call {}*** @julia.get_pgcstack()
%6 = call double inttoptr (i64 140194943493221 to double (i64, i64, i64, i64, i64)*)(i64 %0, i64 %1, i64 %2, i64 %3, i64 %4), !dbg !171
ret double %6, !dbg !171
}
attributes #0 = { noinline nosync readonly "enzyme_math"="enzyme_custom" "enzyme_preserve_primal"="*" "enzymejl_job"="140194381763984" "enzymejl_mi"="140193926607856" "enzymejl_world"="33467" "frame-pointer"="all" "probe-stack"="inline-asm" }
attributes #1 = { nofree readnone "enzyme_inactive" "enzyme_shouldrecompute" "enzymejl_world"="33467" }
attributes #2 = { inaccessiblememonly allocsize(1) "enzymejl_world"="33467" }
attributes #3 = { inaccessiblememonly nofree "enzyme_inactive" "enzymejl_world"="33467" }
attributes #4 = { nofree nounwind readnone "enzymejl_world"="33467" }
attributes #5 = { inaccessiblememonly nofree norecurse nounwind "enzyme_inactive" "enzymejl_world"="33467" }
attributes #6 = { nofree "enzymejl_world"="33467" }
attributes #7 = { "enzymejl_world"="33467" }
attributes #8 = { noreturn "enzymejl_world"="33467" }
attributes #9 = { nofree norecurse nounwind readnone "enzyme_inactive" "enzyme_shouldrecompute" "enzymejl_world"="33467" }
attributes #10 = { nofree nosync nounwind readnone speculatable willreturn "enzymejl_world"="33467" }
attributes #11 = { "enzymejl_world"="33467" "probe-stack"="inline-asm" }
attributes #12 = { argmemonly nofree nosync nounwind willreturn "enzymejl_world"="33467" }
attributes #13 = { readnone "enzymejl_world"="33467" }
attributes #14 = { mustprogress willreturn "enzymejl_world"="33467" "probe-stack"="inline-asm" }
attributes #15 = { alwaysinline "frame-pointer"="all" "probe-stack"="inline-asm" }
attributes #16 = { "frame-pointer"="all" "probe-stack"="inline-asm" }
attributes #17 = { nounwind }
attributes #18 = { allocsize(1) }
attributes #19 = { nofree }
attributes #20 = { nounwind readnone }
attributes #21 = { noreturn }
attributes #22 = { mustprogress willreturn }
attributes #23 = { mustprogress willreturn allocsize(1) }
attributes #24 = { mustprogress nounwind willreturn }
attributes #25 = { mustprogress nofree willreturn }
attributes #26 = { mustprogress nounwind readnone willreturn }
attributes #27 = { mustprogress noreturn willreturn }
attributes #28 = { mustprogress willreturn "frame-pointer"="all" "probe-stack"="inline-asm" }
!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.dbg.cu = !{!4, !6, !7, !9}
!llvm.ident = !{!10}
!0 = !{i32 2, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !{i32 1, !"wchar_size", i32 4}
!3 = !{i32 7, !"uwtable", i32 1}
!4 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !5, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!5 = !DIFile(filename: "/cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/LinearAlgebra/src/blas.jl", directory: ".")
!6 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !5, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!7 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !8, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!8 = !DIFile(filename: "/home/sethaxen/projects/Enzyme.jl/src/rules/LinearAlgebra/blas.jl", directory: ".")
!9 = distinct !DICompileUnit(language: DW_LANG_Julia, file: !5, producer: "julia", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly, nameTableKind: None)
!10 = !{!"clang version 14.0.3 (/depot/downloads/clones/llvm-project.git-5a9787eb535c2edc5dea030cc221c1d60f38c9f42344f410e425ea2139e233aa 465c166c5422079185c3289cdc2613420d8d6c51)"}
!11 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2279", scope: null, file: !5, line: 344, type: !12, scopeLine: 344, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !4, retainedNodes: !13)
!12 = !DISubroutineType(types: !13)
!13 = !{}
!14 = !DILocation(line: 345, scope: !11)
!15 = !{!16, !16, i64 0}
!16 = !{!"double", !17, i64 0}
!17 = !{!"omnipotent char", !18, i64 0}
!18 = !{!"Simple C/C++ TBAA"}
!19 = distinct !{!19, !20, !21}
!20 = !{!"llvm.loop.mustprogress"}
!21 = !{!"llvm.loop.unroll.disable"}
!22 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2276", scope: null, file: !5, line: 392, type: !12, scopeLine: 392, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!23 = !DILocation(line: 10, scope: !24, inlinedAt: !26)
!24 = distinct !DISubprogram(name: "length;", linkageName: "length", scope: !25, file: !25, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!25 = !DIFile(filename: "essentials.jl", directory: ".")
!26 = distinct !DILocation(line: 393, scope: !22, inlinedAt: !27)
!27 = distinct !DILocation(line: 0, scope: !22)
!28 = !{i64 0, i64 9223372036854775807}
!29 = !{!30}
!30 = !{!"jnoalias_typemd", !31}
!31 = !{!"jnoalias"}
!32 = !{!33, !34, !35, !36}
!33 = !{!"jnoalias_gcframe", !31}
!34 = !{!"jnoalias_stack", !31}
!35 = !{!"jnoalias_data", !31}
!36 = !{!"jnoalias_const", !31}
!37 = !DILocation(line: 499, scope: !38, inlinedAt: !40)
!38 = distinct !DISubprogram(name: "==;", linkageName: "==", scope: !39, file: !39, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!39 = !DIFile(filename: "promotion.jl", directory: ".")
!40 = distinct !DILocation(line: 394, scope: !22, inlinedAt: !27)
!41 = !DILocation(line: 394, scope: !22, inlinedAt: !27)
!42 = !DILocation(line: 41, scope: !43, inlinedAt: !40)
!43 = distinct !DISubprogram(name: "LazyString;", linkageName: "LazyString", scope: !44, file: !44, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!44 = !DIFile(filename: "strings/lazy.jl", directory: ".")
!45 = !{!46, !46, i64 0}
!46 = !{!"jtbaa_mutab", !47, i64 0}
!47 = !{!"jtbaa_value", !48, i64 0}
!48 = !{!"jtbaa_data", !49, i64 0}
!49 = !{!"jtbaa", !50, i64 0}
!50 = !{!"jtbaa"}
!51 = !{!35}
!52 = !{!53, !33, !34, !30, !36}
!53 = distinct !{!53, !54, !"na_addr13"}
!54 = distinct !{!54, !"addr13"}
!55 = !{!56, !56, i64 0}
!56 = !{!"jtbaa_immut", !47, i64 0}
!57 = !DILocation(line: 53, scope: !58, inlinedAt: !60)
!58 = distinct !DISubprogram(name: "getproperty;", linkageName: "getproperty", scope: !59, file: !59, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!59 = !DIFile(filename: "Base.jl", directory: ".")
!60 = distinct !DILocation(line: 81, scope: !61, inlinedAt: !62)
!61 = distinct !DISubprogram(name: "String;", linkageName: "String", scope: !44, file: !44, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!62 = distinct !DILocation(line: 232, scope: !63, inlinedAt: !65)
!63 = distinct !DISubprogram(name: "convert;", linkageName: "convert", scope: !64, file: !64, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!64 = !DIFile(filename: "strings/basic.jl", directory: ".")
!65 = distinct !DILocation(line: 12, scope: !66, inlinedAt: !40)
!66 = distinct !DISubprogram(name: "DimensionMismatch;", linkageName: "DimensionMismatch", scope: !67, file: !67, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!67 = !DIFile(filename: "array.jl", directory: ".")
!68 = !{!33, !34, !30, !36}
!69 = !DILocation(line: 82, scope: !61, inlinedAt: !62)
!70 = !DILocation(line: 107, scope: !71, inlinedAt: !73)
!71 = distinct !DISubprogram(name: "sprint;", linkageName: "sprint", scope: !72, file: !72, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!72 = !DIFile(filename: "strings/io.jl", directory: ".")
!73 = distinct !DILocation(line: 83, scope: !61, inlinedAt: !62)
!74 = !DILocation(line: 61, scope: !75, inlinedAt: !76)
!75 = distinct !DISubprogram(name: "replaceproperty!;", linkageName: "replaceproperty!", scope: !59, file: !59, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!76 = distinct !DILocation(line: 88, scope: !61, inlinedAt: !62)
!77 = !DILocation(line: 89, scope: !61, inlinedAt: !62)
!78 = !DILocation(line: 395, scope: !22, inlinedAt: !27)
!79 = !DILocation(line: 65, scope: !80, inlinedAt: !82)
!80 = distinct !DISubprogram(name: "unsafe_convert;", linkageName: "unsafe_convert", scope: !81, file: !81, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!81 = !DIFile(filename: "pointer.jl", directory: ".")
!82 = distinct !DILocation(line: 1240, scope: !83, inlinedAt: !85)
!83 = distinct !DISubprogram(name: "pointer;", linkageName: "pointer", scope: !84, file: !84, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!84 = !DIFile(filename: "abstractarray.jl", directory: ".")
!85 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !87)
!86 = distinct !DISubprogram(name: "vec_pointer_stride;", linkageName: "vec_pointer_stride", scope: !5, file: !5, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!87 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !88)
!88 = distinct !DILocation(line: 395, scope: !22, inlinedAt: !27)
!89 = !{!90, !90, i64 0}
!90 = !{!"jtbaa_arrayptr", !91, i64 0}
!91 = !{!"jtbaa_array", !49, i64 0}
!92 = !DILocation(line: 0, scope: !22)
!93 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2276", scope: null, file: !5, line: 392, type: !12, scopeLine: 392, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!94 = !DILocation(line: 10, scope: !24, inlinedAt: !95)
!95 = distinct !DILocation(line: 393, scope: !93, inlinedAt: !96)
!96 = distinct !DILocation(line: 0, scope: !93)
!97 = !DILocation(line: 499, scope: !38, inlinedAt: !98)
!98 = distinct !DILocation(line: 394, scope: !93, inlinedAt: !96)
!99 = !DILocation(line: 394, scope: !93, inlinedAt: !96)
!100 = !DILocation(line: 41, scope: !43, inlinedAt: !98)
!101 = !{!102, !33, !34, !30, !36}
!102 = distinct !{!102, !103, !"na_addr13"}
!103 = distinct !{!103, !"addr13"}
!104 = !DILocation(line: 53, scope: !58, inlinedAt: !105)
!105 = distinct !DILocation(line: 81, scope: !61, inlinedAt: !106)
!106 = distinct !DILocation(line: 232, scope: !63, inlinedAt: !107)
!107 = distinct !DILocation(line: 12, scope: !66, inlinedAt: !98)
!108 = !DILocation(line: 82, scope: !61, inlinedAt: !106)
!109 = !DILocation(line: 107, scope: !71, inlinedAt: !110)
!110 = distinct !DILocation(line: 83, scope: !61, inlinedAt: !106)
!111 = !DILocation(line: 61, scope: !75, inlinedAt: !112)
!112 = distinct !DILocation(line: 88, scope: !61, inlinedAt: !106)
!113 = !DILocation(line: 89, scope: !61, inlinedAt: !106)
!114 = !DILocation(line: 395, scope: !93, inlinedAt: !96)
!115 = !DILocation(line: 65, scope: !80, inlinedAt: !116)
!116 = distinct !DILocation(line: 1240, scope: !83, inlinedAt: !117)
!117 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !118)
!118 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !119)
!119 = distinct !DILocation(line: 395, scope: !93, inlinedAt: !96)
!120 = !DILocation(line: 0, scope: !93)
!121 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2276", scope: null, file: !5, line: 392, type: !12, scopeLine: 392, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !6, retainedNodes: !13)
!122 = !DILocation(line: 10, scope: !24, inlinedAt: !123)
!123 = distinct !DILocation(line: 393, scope: !121, inlinedAt: !124)
!124 = distinct !DILocation(line: 0, scope: !121)
!125 = !{!126, !30}
!126 = distinct !{!126, !127, !"primal"}
!127 = distinct !{!127, !" diff: %"}
!128 = !{!129, !33, !34, !35, !36}
!129 = distinct !{!129, !127, !"shadow_0"}
!130 = !{!131, !30}
!131 = distinct !{!131, !132, !"primal"}
!132 = distinct !{!132, !" diff: %"}
!133 = !{!134, !33, !34, !35, !36}
!134 = distinct !{!134, !132, !"shadow_0"}
!135 = !DILocation(line: 499, scope: !38, inlinedAt: !136)
!136 = distinct !DILocation(line: 394, scope: !121, inlinedAt: !124)
!137 = !DILocation(line: 394, scope: !121, inlinedAt: !124)
!138 = !DILocation(line: 41, scope: !43, inlinedAt: !136)
!139 = !{!140, !33, !34, !30, !36}
!140 = distinct !{!140, !141, !"na_addr13"}
!141 = distinct !{!141, !"addr13"}
!142 = !DILocation(line: 53, scope: !58, inlinedAt: !143)
!143 = distinct !DILocation(line: 81, scope: !61, inlinedAt: !144)
!144 = distinct !DILocation(line: 232, scope: !63, inlinedAt: !145)
!145 = distinct !DILocation(line: 12, scope: !66, inlinedAt: !136)
!146 = !DILocation(line: 82, scope: !61, inlinedAt: !144)
!147 = !DILocation(line: 107, scope: !71, inlinedAt: !148)
!148 = distinct !DILocation(line: 83, scope: !61, inlinedAt: !144)
!149 = !DILocation(line: 61, scope: !75, inlinedAt: !150)
!150 = distinct !DILocation(line: 88, scope: !61, inlinedAt: !144)
!151 = !DILocation(line: 89, scope: !61, inlinedAt: !144)
!152 = !DILocation(line: 395, scope: !121, inlinedAt: !124)
!153 = !DILocation(line: 65, scope: !80, inlinedAt: !154)
!154 = distinct !DILocation(line: 1240, scope: !83, inlinedAt: !155)
!155 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !156)
!156 = distinct !DILocation(line: 177, scope: !86, inlinedAt: !157)
!157 = distinct !DILocation(line: 395, scope: !121, inlinedAt: !124)
!158 = !{!129, !30}
!159 = !{!126, !33, !34, !35, !36}
!160 = !{!134, !30}
!161 = !{!131, !33, !34, !35, !36}
!162 = distinct !DISubprogram(name: "forward", linkageName: "julia_forward_2281", scope: null, file: !8, line: 62, type: !12, scopeLine: 62, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !13)
!163 = !DILocation(line: 37, scope: !164, inlinedAt: !165)
!164 = distinct !DISubprogram(name: "getproperty;", linkageName: "getproperty", scope: !59, file: !59, type: !12, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !7, retainedNodes: !13)
!165 = !DILocation(line: 75, scope: !162)
!166 = !{!167, !167, i64 0, i64 0}
!167 = !{!"jtbaa_const", !49, i64 0}
!168 = !{!36}
!169 = !{!33, !34, !35, !30}
!170 = distinct !DISubprogram(name: "dot", linkageName: "julia_dot_2284", scope: null, file: !5, line: 344, type: !12, scopeLine: 344, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !9, retainedNodes: !13)
!171 = !DILocation(line: 345, scope: !170)
oldFunc:; Function Attrs: mustprogress willreturn
define double @preprocess_julia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1) local_unnamed_addr #14 !dbg !93 {
entry:
%2 = call {}*** @julia.get_pgcstack() #17
%3 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !94
%4 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %3 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !94
%5 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %4, i64 0, i32 1, !dbg !94
%6 = load i64, i64 addrspace(11)* %5, align 8, !dbg !94, !range !28, !alias.scope !29, !noalias !32
%7 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !94
%8 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %7 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !94
%9 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %8, i64 0, i32 1, !dbg !94
%10 = load i64, i64 addrspace(11)* %9, align 8, !dbg !94, !range !28, !alias.scope !29, !noalias !32
%.not.i = icmp eq i64 %6, %10, !dbg !97
br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !99
L12.i: ; preds = %entry
%current_task15.i = getelementptr inbounds {}**, {}*** %2, i64 -13, !dbg !100
%current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !100
%11 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #18, !dbg !100
%12 = bitcast {} addrspace(10)* %11 to {} addrspace(10)* addrspace(10)*, !dbg !100
%13 = addrspacecast {} addrspace(10)* addrspace(10)* %12 to {} addrspace(10)* addrspace(11)*, !dbg !100
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %13, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
%14 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %13, i64 1, !dbg !100
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %14, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
%15 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #18, !dbg !100
%16 = bitcast {} addrspace(10)* %15 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !100
%.repack.i = bitcast {} addrspace(10)* %15 to {} addrspace(10)* addrspace(10)*, !dbg !100
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
%.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 1, !dbg !100
store i64 %6, i64 addrspace(10)* %.repack7.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
%.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 2, !dbg !100
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
%.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %16, i64 0, i32 3, !dbg !100
store i64 %10, i64 addrspace(10)* %.repack11.i, align 8, !dbg !100, !tbaa !55, !alias.scope !51, !noalias !101
store atomic {} addrspace(10)* %15, {} addrspace(10)* addrspace(11)* %13 release, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %15) #19, !dbg !100
%17 = bitcast {} addrspace(10)* %11 to i8 addrspace(10)*, !dbg !100
%18 = addrspacecast i8 addrspace(10)* %17 to i8 addrspace(11)*, !dbg !100
%19 = getelementptr inbounds i8, i8 addrspace(11)* %18, i64 8, !dbg !100
%20 = bitcast i8 addrspace(11)* %19 to {} addrspace(10)* addrspace(11)*, !dbg !100
store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %20 release, align 8, !dbg !100, !tbaa !45, !alias.scope !51, !noalias !101
%21 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %20 acquire, align 8, !dbg !104, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
%22 = addrspacecast {} addrspace(10)* %21 to {} addrspace(11)*, !dbg !108
%.not13.i = icmp eq {} addrspace(11)* %22, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !108
br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !108
L17.i: ; preds = %L12.i
%23 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #18, !dbg !109
%24 = bitcast {} addrspace(10)* %23 to {} addrspace(10)* addrspace(10)*, !dbg !109
store {} addrspace(10)* %11, {} addrspace(10)* addrspace(10)* %24, align 8, !dbg !109, !tbaa !55, !alias.scope !51, !noalias !101
%25 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %23) #20, !dbg !109
%26 = cmpxchg {} addrspace(10)* addrspace(11)* %20, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %25 acq_rel acquire, align 8, !dbg !111, !tbaa !45, !alias.scope !51, !noalias !68
%27 = extractvalue { {} addrspace(10)*, i1 } %26, 0, !dbg !111
%28 = extractvalue { {} addrspace(10)*, i1 } %26, 1, !dbg !111
br i1 %28, label %xchg_wb.i, label %L27.i, !dbg !111
L27.i: ; preds = %L17.i
%29 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %27) #21, !dbg !113
%30 = icmp eq {} addrspace(10)* %29, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !113
br i1 %30, label %L32.i, label %fail.i, !dbg !113
L32.i: ; preds = %xchg_wb.i, %L27.i, %L12.i
%value_phi.i = phi {} addrspace(10)* [ %25, %xchg_wb.i ], [ %21, %L12.i ], [ %27, %L27.i ]
%31 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #18, !dbg !99
%32 = bitcast {} addrspace(10)* %31 to {} addrspace(10)* addrspace(10)*, !dbg !99
store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %32, align 8, !dbg !99, !tbaa !55, !alias.scope !51, !noalias !101
%33 = addrspacecast {} addrspace(10)* %31 to {} addrspace(12)*, !dbg !99
call void @ijl_throw({} addrspace(12)* %33) #22, !dbg !99
unreachable, !dbg !99
xchg_wb.i: ; preds = %L17.i
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %11, {} addrspace(10)* nonnull %25) #19, !dbg !111
br label %L32.i, !dbg !113
fail.i: ; preds = %L27.i
%34 = addrspacecast {} addrspace(10)* %27 to {} addrspace(12)*, !dbg !113
call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %34) #22, !dbg !113
unreachable, !dbg !113
julia_dot_2276_inner.exit: ; preds = %entry
%35 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* nonnull %0, {} addrspace(10)* nonnull %1) #17, !dbg !114
%36 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !115
%37 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %36) #21, !dbg !115
%38 = bitcast {}* %37 to i8**, !dbg !115
%39 = load i8*, i8** %38, align 8, !dbg !115, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
%40 = ptrtoint i8* %39 to i64, !dbg !115
%41 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !115
%42 = call nonnull {}* @julia.pointer_from_objref({} addrspace(11)* noundef %41) #21, !dbg !115
%43 = bitcast {}* %42 to i8**, !dbg !115
%44 = load i8*, i8** %43, align 8, !dbg !115, !tbaa !89, !alias.scope !29, !noalias !32, !nonnull !13
%45 = ptrtoint i8* %44 to i64, !dbg !115
%46 = call fastcc double @julia_dot_2279(i64 signext %6, i64 zeroext %40, i64 noundef signext 1, i64 zeroext %45, i64 noundef signext 1) #23, !dbg !114
call void @llvm.julia.gc_preserve_end(token %35) #17, !dbg !114
ret double %46, !dbg !120
}
newFunc:; Function Attrs: mustprogress willreturn
define internal void @fwddiffejulia_dot_2276_inner.3({} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %0, {} addrspace(10)* %"'", {} addrspace(10)* noundef nonnull align 16 dereferenceable(40) %1, {} addrspace(10)* %"'1") local_unnamed_addr #14 !dbg !121 {
entry:
%2 = call {}*** @julia.get_pgcstack()
%3 = call {}*** @julia.get_pgcstack()
%4 = call {}*** @julia.get_pgcstack()
%5 = call {}*** @julia.get_pgcstack()
%6 = call {}*** @julia.get_pgcstack()
%7 = call {}*** @julia.get_pgcstack() #17
%8 = bitcast {} addrspace(10)* %0 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !122
%9 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %8 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !122
%10 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %9, i64 0, i32 1, !dbg !122
%11 = load i64, i64 addrspace(11)* %10, align 8, !dbg !122, !range !28, !alias.scope !125, !noalias !128
%12 = bitcast {} addrspace(10)* %1 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)*, !dbg !122
%13 = addrspacecast { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(10)* %12 to { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)*, !dbg !122
%14 = getelementptr inbounds { i8 addrspace(13)*, i64, i16, i16, i32 }, { i8 addrspace(13)*, i64, i16, i16, i32 } addrspace(11)* %13, i64 0, i32 1, !dbg !122
%15 = load i64, i64 addrspace(11)* %14, align 8, !dbg !122, !range !28, !alias.scope !130, !noalias !133
%.not.i = icmp eq i64 %11, %15, !dbg !135
br i1 %.not.i, label %julia_dot_2276_inner.exit, label %L12.i, !dbg !137
L12.i: ; preds = %entry
%current_task15.i = getelementptr inbounds {}**, {}*** %7, i64 -13, !dbg !138
%current_task1.i = bitcast {}*** %current_task15.i to {}**, !dbg !138
%16 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 16, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195699359008 to {}*) to {} addrspace(10)*)) #18, !dbg !138
%17 = bitcast {} addrspace(10)* %16 to {} addrspace(10)* addrspace(10)*, !dbg !138
%18 = addrspacecast {} addrspace(10)* addrspace(10)* %17 to {} addrspace(10)* addrspace(11)*, !dbg !138
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %18, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
%19 = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %18, i64 1, !dbg !138
store {} addrspace(10)* null, {} addrspace(10)* addrspace(11)* %19, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
%20 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 32, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195667121680 to {}*) to {} addrspace(10)*)) #18, !dbg !138
%21 = bitcast {} addrspace(10)* %20 to { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)*, !dbg !138
%.repack.i = bitcast {} addrspace(10)* %20 to {} addrspace(10)* addrspace(10)*, !dbg !138
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619104 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
%.repack7.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 1, !dbg !138
store i64 %11, i64 addrspace(10)* %.repack7.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
%.repack9.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 2, !dbg !138
store {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195729619072 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(10)* %.repack9.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
%.repack11.i = getelementptr inbounds { {} addrspace(10)*, i64, {} addrspace(10)*, i64 }, { {} addrspace(10)*, i64, {} addrspace(10)*, i64 } addrspace(10)* %21, i64 0, i32 3, !dbg !138
store i64 %15, i64 addrspace(10)* %.repack11.i, align 8, !dbg !138, !tbaa !55, !alias.scope !51, !noalias !139
store atomic {} addrspace(10)* %20, {} addrspace(10)* addrspace(11)* %18 release, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %16, {} addrspace(10)* nonnull %20) #19, !dbg !138
%22 = bitcast {} addrspace(10)* %16 to i8 addrspace(10)*, !dbg !138
%23 = addrspacecast i8 addrspace(10)* %22 to i8 addrspace(11)*, !dbg !138
%24 = getelementptr inbounds i8, i8 addrspace(11)* %23, i64 8, !dbg !138
%25 = bitcast i8 addrspace(11)* %24 to {} addrspace(10)* addrspace(11)*, !dbg !138
store atomic {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspace(11)* %25 release, align 8, !dbg !138, !tbaa !45, !alias.scope !51, !noalias !139
%26 = load atomic {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %25 acquire, align 8, !dbg !142, !tbaa !45, !alias.scope !51, !noalias !68, !nonnull !13
%27 = addrspacecast {} addrspace(10)* %26 to {} addrspace(11)*, !dbg !146
%.not13.i = icmp eq {} addrspace(11)* %27, addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(11)*), !dbg !146
br i1 %.not13.i, label %L17.i, label %L32.i, !dbg !146
L17.i: ; preds = %L12.i
%28 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195691323952 to {}*) to {} addrspace(10)*)) #18, !dbg !147
%29 = bitcast {} addrspace(10)* %28 to {} addrspace(10)* addrspace(10)*, !dbg !147
store {} addrspace(10)* %16, {} addrspace(10)* addrspace(10)* %29, align 8, !dbg !147, !tbaa !55, !alias.scope !51, !noalias !139
%30 = call nonnull {} addrspace(10)* ({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)*, {} addrspace(10)*, {} addrspace(10)*, ...) @julia.call2({} addrspace(10)* ({} addrspace(10)*, {} addrspace(10)**, i32, {} addrspace(10)*)* noundef nonnull @ijl_invoke, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195718475744 to {}*) to {} addrspace(10)*), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195662589696 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888394272 to {}*) to {} addrspace(10)*), {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195672369408 to {}*) to {} addrspace(10)*), {} addrspace(10)* nonnull %28) #20, !dbg !147
%31 = cmpxchg {} addrspace(10)* addrspace(11)* %25, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195888361480 to {}*) to {} addrspace(10)*), {} addrspace(10)* %30 acq_rel acquire, align 8, !dbg !149, !tbaa !45, !alias.scope !51, !noalias !68
%32 = extractvalue { {} addrspace(10)*, i1 } %31, 0, !dbg !149
%33 = extractvalue { {} addrspace(10)*, i1 } %31, 1, !dbg !149
br i1 %33, label %xchg_wb.i, label %L27.i, !dbg !149
L27.i: ; preds = %L17.i
%34 = call {} addrspace(10)* @julia.typeof({} addrspace(10)* %32) #21, !dbg !151
%35 = icmp eq {} addrspace(10)* %34, addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), !dbg !151
br i1 %35, label %L32.i, label %fail.i, !dbg !151
L32.i: ; preds = %xchg_wb.i, %L27.i, %L12.i
%value_phi.i = phi {} addrspace(10)* [ %30, %xchg_wb.i ], [ %26, %L12.i ], [ %32, %L27.i ]
%36 = call noalias nonnull {} addrspace(10)* @julia.gc_alloc_obj({}** nonnull %current_task1.i, i64 noundef 8, {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195706238240 to {}*) to {} addrspace(10)*)) #18, !dbg !137
%37 = bitcast {} addrspace(10)* %36 to {} addrspace(10)* addrspace(10)*, !dbg !137
store {} addrspace(10)* %value_phi.i, {} addrspace(10)* addrspace(10)* %37, align 8, !dbg !137, !tbaa !55, !alias.scope !51, !noalias !139
%38 = addrspacecast {} addrspace(10)* %36 to {} addrspace(12)*, !dbg !137
call void @ijl_throw({} addrspace(12)* %38) #22, !dbg !137
unreachable, !dbg !137
xchg_wb.i: ; preds = %L17.i
call void ({} addrspace(10)*, ...) @julia.write_barrier({} addrspace(10)* noundef nonnull %16, {} addrspace(10)* nonnull %30) #19, !dbg !149
br label %L32.i, !dbg !151
fail.i: ; preds = %L27.i
%39 = addrspacecast {} addrspace(10)* %32 to {} addrspace(12)*, !dbg !151
call void @ijl_type_error(i8* noundef getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {} addrspace(10)* noundef addrspacecast ({}* inttoptr (i64 140195745050864 to {}*) to {} addrspace(10)*), {} addrspace(12)* %39) #22, !dbg !151
unreachable, !dbg !151
julia_dot_2276_inner.exit: ; preds = %entry
%40 = call token (...) @llvm.julia.gc_preserve_begin({} addrspace(10)* %0, {} addrspace(10)* %"'", {} addrspace(10)* %1, {} addrspace(10)* %"'1"), !dbg !152
%"'ipc" = addrspacecast {} addrspace(10)* %"'" to {} addrspace(11)*, !dbg !153
%41 = addrspacecast {} addrspace(10)* %0 to {} addrspace(11)*, !dbg !153
%42 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc"), !dbg !153
%_replacementA = phi {}* , !dbg !153
%"'ipc25" = bitcast {}* %42 to i8**, !dbg !153
%_replacementA17 = phi i8** , !dbg !153
%"'ipl" = load i8*, i8** %"'ipc25", align 8, !dbg !153, !tbaa !89, !alias.scope !158, !noalias !159, !nonnull !13
%"'ipc26" = ptrtoint i8* %"'ipl" to i64, !dbg !153
%_replacementA19 = phi i64 , !dbg !153
%"'ipc20" = addrspacecast {} addrspace(10)* %"'1" to {} addrspace(11)*, !dbg !153
%43 = addrspacecast {} addrspace(10)* %1 to {} addrspace(11)*, !dbg !153
%44 = call {}* @julia.pointer_from_objref({} addrspace(11)* %"'ipc20"), !dbg !153
%_replacementA21 = phi {}* , !dbg !153
%"'ipc27" = bitcast {}* %44 to i8**, !dbg !153
%_replacementA22 = phi i8** , !dbg !153
%"'ipl28" = load i8*, i8** %"'ipc27", align 8, !dbg !153, !tbaa !89, !alias.scope !160, !noalias !161, !nonnull !13
%_replacementA23 = phi i8* , !dbg !153
%"'ipc29" = ptrtoint i8* %"'ipl28" to i64, !dbg !153
%_replacementA24 = phi i64 , !dbg !153
%45 = bitcast {}*** %6 to {}**, !dbg !152
%46 = getelementptr inbounds {}*, {}** %45, i64 -13, !dbg !152
%47 = getelementptr inbounds {}*, {}** %46, i64 15, !dbg !152
%48 = bitcast {}** %47 to i8**, !dbg !152
%49 = load i8*, i8** %48, align 8, !dbg !152
%50 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %46, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
%51 = bitcast {} addrspace(10)* %50 to [1 x i64] addrspace(10)*, !dbg !152
%52 = addrspacecast [1 x i64] addrspace(10)* %51 to [1 x i64] addrspace(11)*, !dbg !152
%53 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %52, i64 0, i32 0, !dbg !152
store i64 %11, i64 addrspace(11)* %53, align 8, !dbg !152
%54 = bitcast {}*** %5 to {}**, !dbg !152
%55 = getelementptr inbounds {}*, {}** %54, i64 -13, !dbg !152
%56 = getelementptr inbounds {}*, {}** %55, i64 15, !dbg !152
%57 = bitcast {}** %56 to i8**, !dbg !152
%58 = load i8*, i8** %57, align 8, !dbg !152
%59 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %55, i64 16, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195885311824 to {}*) to {} addrspace(10)*)), !dbg !152
%60 = bitcast {} addrspace(10)* %59 to [2 x i64] addrspace(10)*, !dbg !152
%61 = addrspacecast [2 x i64] addrspace(10)* %60 to [2 x i64] addrspace(11)*, !dbg !152
%62 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %61, i64 0, i32 0, !dbg !152
store i64 %_replacementA19, i64 addrspace(11)* %62, align 8, !dbg !152
%63 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %61, i64 0, i32 1, !dbg !152
store i64 %"'ipc26", i64 addrspace(11)* %63, align 8, !dbg !152
%64 = bitcast {}*** %4 to {}**, !dbg !152
%65 = getelementptr inbounds {}*, {}** %64, i64 -13, !dbg !152
%66 = getelementptr inbounds {}*, {}** %65, i64 15, !dbg !152
%67 = bitcast {}** %66 to i8**, !dbg !152
%68 = load i8*, i8** %67, align 8, !dbg !152
%69 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %65, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
%70 = bitcast {} addrspace(10)* %69 to [1 x i64] addrspace(10)*, !dbg !152
%71 = addrspacecast [1 x i64] addrspace(10)* %70 to [1 x i64] addrspace(11)*, !dbg !152
%72 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %71, i64 0, i32 0, !dbg !152
store i64 1, i64 addrspace(11)* %72, align 8, !dbg !152
%73 = bitcast {}*** %3 to {}**, !dbg !152
%74 = getelementptr inbounds {}*, {}** %73, i64 -13, !dbg !152
%75 = getelementptr inbounds {}*, {}** %74, i64 15, !dbg !152
%76 = bitcast {}** %75 to i8**, !dbg !152
%77 = load i8*, i8** %76, align 8, !dbg !152
%78 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %74, i64 16, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195885311824 to {}*) to {} addrspace(10)*)), !dbg !152
%79 = bitcast {} addrspace(10)* %78 to [2 x i64] addrspace(10)*, !dbg !152
%80 = addrspacecast [2 x i64] addrspace(10)* %79 to [2 x i64] addrspace(11)*, !dbg !152
%81 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %80, i64 0, i32 0, !dbg !152
store i64 %_replacementA24, i64 addrspace(11)* %81, align 8, !dbg !152
%82 = getelementptr [2 x i64], [2 x i64] addrspace(11)* %80, i64 0, i32 1, !dbg !152
store i64 %"'ipc29", i64 addrspace(11)* %82, align 8, !dbg !152
%83 = bitcast {}*** %2 to {}**, !dbg !152
%84 = getelementptr inbounds {}*, {}** %83, i64 -13, !dbg !152
%85 = getelementptr inbounds {}*, {}** %84, i64 15, !dbg !152
%86 = bitcast {}** %85 to i8**, !dbg !152
%87 = load i8*, i8** %86, align 8, !dbg !152
%88 = call {} addrspace(10)* @julia.gc_alloc_obj({}** %84, i64 8, {} addrspace(10)* addrspacecast ({}* inttoptr (i64 140195837729488 to {}*) to {} addrspace(10)*)), !dbg !152
%89 = bitcast {} addrspace(10)* %88 to [1 x i64] addrspace(10)*, !dbg !152
%90 = addrspacecast [1 x i64] addrspace(10)* %89 to [1 x i64] addrspace(11)*, !dbg !152
%91 = getelementptr [1 x i64], [1 x i64] addrspace(11)* %90, i64 0, i32 0, !dbg !152
store i64 1, i64 addrspace(11)* %91, align 8, !dbg !152
%92 = call fast double @julia_forward_2281([1 x i64] addrspace(11)* %52, [2 x i64] addrspace(11)* %61, [1 x i64] addrspace(11)* %71, [2 x i64] addrspace(11)* %80, [1 x i64] addrspace(11)* %90), !dbg !152
call void @llvm.julia.gc_preserve_end(token %40) #17, !dbg !152
ret void
allocsForInversion: ; No predecessors!
}
pp: %_replacementA19 = phi i64 , !dbg !78 of %40 = ptrtoint i8* %39 to i64, !dbg !70
julia: /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:7903: void GradientUtils::eraseFictiousPHIs(): Assertion `pp->getNumUses() == 0' failed.
[107420] signal (6.-6): Aborted
in expression starting at REPL[13]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f81ede2871a)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
eraseFictiousPHIs at /workspace/srcdir/Enzyme/enzyme/Enzyme/GradientUtils.cpp:7903
CreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/EnzymeLogic.cpp:4648
EnzymeCreateForwardDiff at /workspace/srcdir/Enzyme/enzyme/Enzyme/CApi.cpp:502
EnzymeCreateForwardDiff at /home/sethaxen/projects/Enzyme.jl/src/api.jl:138
enzyme! at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:6956
unknown function (ip: 0x7f81c93d23b9)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
#codegen#162 at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8194
codegen at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:7820
unknown function (ip: 0x7f81c93aa5fd)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
_thunk at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8707
_thunk at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8704 [inlined]
cached_compilation at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8742 [inlined]
#s287#191 at /home/sethaxen/projects/Enzyme.jl/src/compiler.jl:8800 [inlined]
#s287#191 at ./none:0
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
GeneratedFunctionStub at ./boot.jl:602
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_call_staged at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/method.c:530
ijl_code_for_staged at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/method.c:581
get_staged at ./compiler/utilities.jl:115
retrieve_code_info at ./compiler/utilities.jl:127 [inlined]
InferenceState at ./compiler/inferencestate.jl:354
typeinf_edge at ./compiler/typeinfer.jl:922
abstract_call_method at ./compiler/abstractinterpretation.jl:611
abstract_call_gf_by_type at ./compiler/abstractinterpretation.jl:152
abstract_call_known at ./compiler/abstractinterpretation.jl:1949
jfptr_abstract_call_known_12792.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
tojlinvoke21381.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
j_abstract_call_known_12333.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
abstract_call at ./compiler/abstractinterpretation.jl:2020
abstract_call at ./compiler/abstractinterpretation.jl:1999
abstract_eval_statement_expr at ./compiler/abstractinterpretation.jl:2183
abstract_eval_statement at ./compiler/abstractinterpretation.jl:2396
abstract_eval_basic_statement at ./compiler/abstractinterpretation.jl:2684
typeinf_local at ./compiler/abstractinterpretation.jl:2869
typeinf_nocycle at ./compiler/abstractinterpretation.jl:2957
_typeinf at ./compiler/typeinfer.jl:244
typeinf at ./compiler/typeinfer.jl:215
typeinf_ext at ./compiler/typeinfer.jl:1056
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1089
typeinf_ext_toplevel at ./compiler/typeinfer.jl:1085
jfptr_typeinf_ext_toplevel_16333.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_type_infer at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:320
jl_generate_fptr_impl at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jitlayers.cpp:444
jl_compile_method_internal at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2348 [inlined]
jl_compile_method_internal at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2237
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2750 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
autodiff at /home/sethaxen/projects/Enzyme.jl/src/Enzyme.jl:321
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
do_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/builtins.c:730
autodiff at /home/sethaxen/projects/Enzyme.jl/src/Enzyme.jl:215
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
do_call at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:126
eval_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:226
eval_stmt_value at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:177 [inlined]
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:624
jl_interpret_toplevel_thunk at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
eval_user_input at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:153
repl_backend_loop at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:249
#start_repl_backend#46 at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:234
start_repl_backend at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:231
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
#run_repl#59 at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:377
run_repl at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/usr/share/julia/stdlib/v1.9/REPL/src/REPL.jl:363
jfptr_run_repl_61794.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
#1019 at ./client.jl:421
jfptr_YY.1019_49540.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_f__call_latest at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:816 [inlined]
invokelatest at ./essentials.jl:813 [inlined]
run_main_repl at ./client.jl:405
exec_options at ./client.jl:322
_start at ./client.jl:522
jfptr__start_37296.clone_1 at /home/sethaxen/.julia/juliaup/julia-1.9.0-rc2+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
true_main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/cli/loader_exe.c:59
unknown function (ip: 0x7f81ede29d8f)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 20156377 (Pool: 20133683; Big: 22694); GC: 28
Aborted (core dumped)
This is strange because only this line of code should be hit in this case, and all it does is call the primal function.
The 2-arg methods also error for reverse-mode. These are the only remaining failures in the test suite.
Edit: also, this only happens with `dot` and real inputs, not with `dotc` or `dotu`.
Open an issue?
All tests seem to pass for me. Some of the jobs in CI seem to not be picked up for some reason, but as far as I can tell, the same tests pass here that pass on main.

Here's an updated benchmark. The main takeaways are that the forward-mode rules give a 2.5-7.3x speed-up vs the versions hit by main, and the reverse-mode rules give a 13-60x speed-up when no tape is needed and a 5.7-8.7x speed-up when a tape is needed. In this latter case, the trade-off is that these rules allocate a tape for all entries that contribute to the output, which is wasteful in a case like the one benchmarked here (where future operations mutate only a single entry in the array).

using BenchmarkTools, Enzyme, LinearAlgebra, Random
Random.seed!(42)
n = 1_000
x = randn(n)
y = randn(n)
∂x = randn(eltype(x), size(x))
∂y = randn(eltype(y), size(y))
incx = incy = 1
# version that triggers tape allocation
function f_overwite!(f, n, x, incx, y, incy)
s = f(n, x, incx, y, incy)
x[1] = 0
y[1] = 0
return s
end
## BLAS.dot
@btime autodiff(
Forward, $(BLAS.dot), $n, $(Duplicated(x, ∂x)), $incx, $(Duplicated(y, ∂y)), $incy,
)
# main: 731.130 ns (0 allocations: 0 bytes)
# here: 100.365 ns (2 allocations: 64 bytes)
# no tape needed
@btime autodiff(
ReverseWithPrimal, $(BLAS.dot), Active, $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(x, copy(∂x)); Dy=Duplicated(y, copy(∂y)))
# main: 11.139 μs (2 allocations: 32 bytes)
# here: 192.827 ns (7 allocations: 208 bytes)
# tape needed
@btime autodiff(
ReverseWithPrimal, $(f_overwite!), Active, $(BLAS.dot), $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(copy(x), copy(∂x)); Dy=Duplicated(copy(y), copy(∂y)))
# main: 11.624 μs (2 allocations: 32 bytes)
# here: 1.342 μs (9 allocations: 16.08 KiB)
T = ComplexF64
x = randn(T, n)
y = randn(T, n)
∂x = randn(eltype(x), size(x))
∂y = randn(eltype(y), size(y))
incx = incy = 1
## BLAS.dotu
@btime autodiff(
Forward, $(BLAS.dotu), $n, $(Duplicated(x, ∂x)), $incx, $(Duplicated(y, ∂y)), $incy,
)
# main: 1.599 μs (0 allocations: 0 bytes)
# here: 650.117 ns (2 allocations: 64 bytes)
# no tape needed
@btime autodiff(
ReverseWithPrimal, $(BLAS.dotu), Active, $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(x, copy(∂x)); Dy=Duplicated(y, copy(∂y)))
# main: 21.493 μs (2 allocations: 48 bytes)
# here: 1.706 μs (8 allocations: 256 bytes)
# tape needed
@btime autodiff(
ReverseWithPrimal, $(f_overwite!), Active, $(BLAS.dotu), $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(copy(x), copy(∂x)); Dy=Duplicated(copy(y), copy(∂y)))
# main: 22.231 μs (2 allocations: 48 bytes)
# here: 3.885 μs (10 allocations: 31.75 KiB)
## BLAS.dotc
@btime autodiff(
Forward, $(BLAS.dotc), $n, $(Duplicated(x, ∂x)), $incx, $(Duplicated(y, ∂y)), $incy,
)
# main: 1.584 μs (0 allocations: 0 bytes)
# here: 647.920 ns (2 allocations: 64 bytes)
# no tape needed
@btime autodiff(
ReverseWithPrimal, $(BLAS.dotc), Active, $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(x, copy(∂x)); Dy=Duplicated(y, copy(∂y)))
# main: 21.220 μs (2 allocations: 48 bytes)
# here: 758.145 ns (8 allocations: 256 bytes)
# tape needed
@btime autodiff(
ReverseWithPrimal, $(f_overwite!), Active, $(BLAS.dotc), $n, Dx, $incx, Dy, $incy,
) setup=(Dx=Duplicated(copy(x), copy(∂x)); Dy=Duplicated(copy(y), copy(∂y)))
# main: 22.251 μs (2 allocations: 48 bytes)
# here: 2.977 μs (10 allocations: 31.75 KiB)

@ZuseZ4 it would be interesting to see how the tablegen versions compare.
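To make the tape trade-off concrete, here is a minimal sketch of saving only the strided entries that a `BLAS.dot(n, X, incx, ...)` call reads, using `BLAS.blascopy!` in the same way as the snippet under review in this PR. The helper name `tape_copy` is hypothetical and exists only for illustration; this is not the PR's actual rule code.

```julia
using LinearAlgebra

# Hypothetical helper (not part of the PR): copy the `n` entries of `X`
# visited with stride `incx` into a fresh, contiguous length-`n` vector.
tape_copy(n::Integer, X::AbstractVector, incx::Integer) =
    BLAS.blascopy!(n, X, incx, similar(X, n), 1)

x = collect(1.0:10.0)
n, incx = 5, 2

# The tape holds x[1], x[3], x[5], x[7], x[9] and nothing else.
xtape = tape_copy(n, x, incx)

# A later mutation of the input, as in `f_overwite!`, does not touch the tape.
x[1] = 0.0

@show xtape
@show length(xtape)
```

Even though only `x[1]` changes afterwards, the whole length-`n` copy has already been allocated. For the real-valued benchmark above, two such copies of 1,000 `Float64` values would come to roughly 15.6 KiB, which is in line with the 16.08 KiB reported for the tape case.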
This reverts commit 418bc56.
In an attempt to learn Enzyme's rule system and speed up AD of BLAS calls, this PR adds a rule for `BLAS.dot` (and `BLAS.dotc` and `BLAS.dotu`). On an input of length 10,000, this is 6x faster than the fallback in forward mode and 60x faster than the fallback in reverse mode.
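For reference, the calculus that a rule for the real-valued `BLAS.dot` has to reproduce can be sketched with plain BLAS calls. This is only an illustration of the standard derivative identities, written without Enzyme; the variable names (`∂x`, `∂s`, `xbar`, `sbar`, ...) are chosen here and are not taken from the PR, and the PR's rules may be organised differently.

```julia
using LinearAlgebra

# Primal: s = sum(x[i] * y[i]) for real vectors.
n = 3
x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
∂x, ∂y = [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]   # input tangents

# Forward mode: ∂s = ∂x⋅y + x⋅∂y, i.e. two extra dot products.
∂s = BLAS.dot(n, ∂x, 1, y, 1) + BLAS.dot(n, x, 1, ∂y, 1)

# Reverse mode: given the output cotangent sbar, accumulate
# xbar += sbar * y and ybar += sbar * x (here via axpy!).
sbar = 1.0
xbar, ybar = zeros(n), zeros(n)
BLAS.axpy!(sbar, y, xbar)
BLAS.axpy!(sbar, x, ybar)

# Sanity check of the tangent against a central finite difference.
h = 1e-6
fd = (dot(x .+ h .* ∂x, y .+ h .* ∂y) - dot(x .- h .* ∂x, y .- h .* ∂y)) / (2h)

@show ∂s fd xbar ybar
```

Both modes reduce to a couple of Level 1 BLAS calls, which is why a hand-written rule can be much cheaper than differentiating through the fallback.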