Active Job reporter reports incorrect data if a job is run within another job via `perform_now` #2467

jonleighton · 2024-11-20T00:30:19Z

Issue Description

Sentry's Active Job reporting code works by prepending a module to overload the perform_now method.

This has the outcome that if one job runs the code in another job via perform_now, and an error occurs, the transaction and context will be reported as the inner job, even though it's the outer job being run.

Similarly, we may not even be in a job run at all, e.g. if some controller calls perform_now on a job, the context will be wrong. It will look like the error occurred in a worker process, when it actually occurred in a web process.

Reproduction Steps

class AJob < ApplicationJob
  def perform
    BJob.perform_now
  end
end

class BJob < ApplicationJob
  def perform
    raise "omg"
  end
end

Call AJob.perform_later.

Expected Behavior

Transaction/context is reported from AJob.

Actual Behavior

Transaction/context is reported from BJob.

Ruby Version

3.3.5

SDK Version

5.21.0

Integration and Its Version

No response

Sentry Config

No response

The text was updated successfully, but these errors were encountered:

sl0thentr0py · 2024-11-20T11:09:16Z

right, technically if we're in the same process and not in the worker, we could just start a span instead. I'll have to see what the best way of detecting this is though, or maybe not patch perform_now but the enqueue/execute separately.

st0012 · 2025-01-12T21:30:09Z

In such events, my expectations are:

The exception should be directly tied to BJob as the exception is raised in it.
AJob should be listed in the stacktrace.
AJob should be the transaction.

In my example app:

ErrorJob = BJob
IndirectErrorJob = AJob

And the current behaviour is:

And 3) is not satisfied, which I think is the main concern of the issue?

Simply check if the job is inside another Rails context (request or job) is easy to implement, but MIGHT not entirely correct either:

In that case, Sentry would associate the exception to IndirectErrorJob. But whether it's the correct behaviour or now depends on how users model what a job means in their mind:

If a job represents an execution unit, then this is wrong. Because the closest execution unit of the exception should be ErrorJob instead.
If a job is merely a class wrapper (e.g. a service object), then this is correct as the current main context is indeed IndirectErrorJob.

I personally think 1 is what majority of users have in their minds, especially if we consider a scenario like:

class AJob < ApplicationJob
  def perform
    foo
    bar
    BJob.perform_now
    CJob.perform_now
  end
end

If both AJob's exceptions as well as BJob and CJob's exceptions are all reported under AJob, it'd be quite confusing IMO. But I'd also like to gather opinions from you guys on this.

jonleighton · 2025-01-13T00:21:08Z

When I reported this bug, option 2 is what I was expecting:

If a job is merely a class wrapper (e.g. a service object), then this is correct as the current main context is indeed IndirectErrorJob.

But I can also see your argument that that behaviour may be confusing and depends on the user's expectations.

The reason I expected this behaviour is that when I see an exception reported as coming from ErrorJob, then I take it to mean "an execution of ErrorJob failed", but in this case it is actually meaning "an execution of IndirectErrorJob failed, inside the ErrorJob bit".

In our application, this behaviour lead to confusion about which job execution was actually failing.

If both AJob's exceptions as well as BJob and CJob's exceptions are all reported under AJob, it'd be quite confusing IMO.

I don't really see it that way - I see Job.perform_now (rather than Job.perform_later) as basically an ordinary method call that shouldn't be considered part of job execution. Although that's not quite correct, because callbacks are still executed.

Either way, if some controller code executes Job.perform_now, I'd definitely expect the error to be tied to the controller/action, rather than the job, to make it clear that this error happened in a web process, not a job process.

st0012 · 2025-01-19T12:38:43Z

Either way, if some controller code executes Job.perform_now, I'd definitely expect the error to be tied to the controller/action, rather than the job, to make it clear that this error happened in a web process, not a job process.

First of all, transaction isn't a 1 to 1 mapping to the execution process. It's a way to group code execution for easier filtering and understanding. For example, you can create multiple transactions inside your request with Sentry.start_transaction, instead of just having that request transaction.

So using "which process this exception is raised at" to decide what the transaction attribute should be is a wrong assumption. A better question would be: "Does having a separate transaction for perform_now job executions help users debug their program or not?".

And I'm torn on this one because:

When ErrorJob is used in multiple places (both perform_later and perform_now), I want to see all of them by checking the ErrorJob transaction filter.
But when ErrorJob is reported under the other transaction (e.g. FooController#create), the event will also contain the request context's information, like breadcrumbs....etc. that could potentially help debugging.
- That said, if a job is executed with perform_now in a controller action, and relies on the controller context to work correctly, I'd argue that it may be a bad design. A job should only be interfaced through the parameters it receives, which is already reported under the current design.

Any thoughts? @sl0thentr0py @solnic

sl0thentr0py · 2025-01-20T11:10:23Z

Agree with @st0012's two points.

I just want to add one more point.

Either way, if some controller code executes Job.perform_now, I'd definitely expect the error to be tied to the controller/action, rather than the job, to make it clear that this error happened in a web process, not a job process.

In general, Sentry is gradually shifting to a trace-centric UI where the actual transaction will not be the primary model of organizing, so this discussion is somewhat less relevant from a future point of view. As long as the job is part of the trace irrespective of it being a new transaction or not, we should be good.

jonleighton assigned sl0thentr0py Nov 20, 2024

getsantry bot added the Waiting for: Product Owner label Nov 20, 2024

getsantry bot added this to GitHub Issues with 👀 3 Nov 20, 2024

getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Nov 20, 2024

sl0thentr0py added the active_job label Nov 20, 2024

getsantry bot removed the Waiting for: Product Owner label Nov 20, 2024

sl0thentr0py added the Type: Bug label Nov 20, 2024

stephanie-anderson removed the Type: Bug label Dec 10, 2024

st0012 assigned st0012 and unassigned sl0thentr0py Jan 12, 2025

getsantry bot added the Waiting for: Product Owner label Jan 13, 2025

sl0thentr0py removed the Waiting for: Product Owner label Jan 13, 2025

getsantry bot removed the status in GitHub Issues with 👀 3 Jan 13, 2025

st0012 mentioned this issue Jan 19, 2025

Add Rails 8 example app #2512

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Active Job reporter reports incorrect data if a job is run within another job via `perform_now` #2467

Active Job reporter reports incorrect data if a job is run within another job via `perform_now` #2467

jonleighton commented Nov 20, 2024

sl0thentr0py commented Nov 20, 2024

st0012 commented Jan 12, 2025

jonleighton commented Jan 13, 2025

st0012 commented Jan 19, 2025

sl0thentr0py commented Jan 20, 2025

Active Job reporter reports incorrect data if a job is run within another job via perform_now #2467

Active Job reporter reports incorrect data if a job is run within another job via perform_now #2467

Comments

jonleighton commented Nov 20, 2024

Issue Description

Reproduction Steps

Expected Behavior

Actual Behavior

Ruby Version

SDK Version

Integration and Its Version

Sentry Config

sl0thentr0py commented Nov 20, 2024

st0012 commented Jan 12, 2025

jonleighton commented Jan 13, 2025

st0012 commented Jan 19, 2025

sl0thentr0py commented Jan 20, 2025

Active Job reporter reports incorrect data if a job is run within another job via `perform_now` #2467

Active Job reporter reports incorrect data if a job is run within another job via `perform_now` #2467