Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Active Job reporter reports incorrect data if a job is run within another job via perform_now #2467

Open
jonleighton opened this issue Nov 20, 2024 · 5 comments
Assignees

Comments

@jonleighton
Copy link

Issue Description

Sentry's Active Job reporting code works by prepending a module to overload the perform_now method.

This has the outcome that if one job runs the code in another job via perform_now, and an error occurs, the transaction and context will be reported as the inner job, even though it's the outer job being run.

Similarly, we may not even be in a job run at all, e.g. if some controller calls perform_now on a job, the context will be wrong. It will look like the error occurred in a worker process, when it actually occurred in a web process.

Reproduction Steps

class AJob < ApplicationJob
  def perform
    BJob.perform_now
  end
end

class BJob < ApplicationJob
  def perform
    raise "omg"
  end
end

Call AJob.perform_later.

Expected Behavior

Transaction/context is reported from AJob.

Actual Behavior

Transaction/context is reported from BJob.

Ruby Version

3.3.5

SDK Version

5.21.0

Integration and Its Version

No response

Sentry Config

No response

@sl0thentr0py
Copy link
Member

right, technically if we're in the same process and not in the worker, we could just start a span instead. I'll have to see what the best way of detecting this is though, or maybe not patch perform_now but the enqueue/execute separately.

@st0012
Copy link
Collaborator

st0012 commented Jan 12, 2025

In such events, my expectations are:

  1. The exception should be directly tied to BJob as the exception is raised in it.
  2. AJob should be listed in the stacktrace.
  3. AJob should be the transaction.

In my example app:

  • ErrorJob = BJob
  • IndirectErrorJob = AJob

And the current behaviour is:

Image

And 3) is not satisfied, which I think is the main concern of the issue?

Simply check if the job is inside another Rails context (request or job) is easy to implement, but MIGHT not entirely correct either:

Image

In that case, Sentry would associate the exception to IndirectErrorJob. But whether it's the correct behaviour or now depends on how users model what a job means in their mind:

  1. If a job represents an execution unit, then this is wrong. Because the closest execution unit of the exception should be ErrorJob instead.
  2. If a job is merely a class wrapper (e.g. a service object), then this is correct as the current main context is indeed IndirectErrorJob.

I personally think 1 is what majority of users have in their minds, especially if we consider a scenario like:

class AJob < ApplicationJob
  def perform
    foo
    bar
    BJob.perform_now
    CJob.perform_now
  end
end

If both AJob's exceptions as well as BJob and CJob's exceptions are all reported under AJob, it'd be quite confusing IMO. But I'd also like to gather opinions from you guys on this.

@jonleighton
Copy link
Author

When I reported this bug, option 2 is what I was expecting:

If a job is merely a class wrapper (e.g. a service object), then this is correct as the current main context is indeed IndirectErrorJob.

But I can also see your argument that that behaviour may be confusing and depends on the user's expectations.

The reason I expected this behaviour is that when I see an exception reported as coming from ErrorJob, then I take it to mean "an execution of ErrorJob failed", but in this case it is actually meaning "an execution of IndirectErrorJob failed, inside the ErrorJob bit".

In our application, this behaviour lead to confusion about which job execution was actually failing.

If both AJob's exceptions as well as BJob and CJob's exceptions are all reported under AJob, it'd be quite confusing IMO.

I don't really see it that way - I see Job.perform_now (rather than Job.perform_later) as basically an ordinary method call that shouldn't be considered part of job execution. Although that's not quite correct, because callbacks are still executed.

Either way, if some controller code executes Job.perform_now, I'd definitely expect the error to be tied to the controller/action, rather than the job, to make it clear that this error happened in a web process, not a job process.

@st0012
Copy link
Collaborator

st0012 commented Jan 19, 2025

Either way, if some controller code executes Job.perform_now, I'd definitely expect the error to be tied to the controller/action, rather than the job, to make it clear that this error happened in a web process, not a job process.

First of all, transaction isn't a 1 to 1 mapping to the execution process. It's a way to group code execution for easier filtering and understanding. For example, you can create multiple transactions inside your request with Sentry.start_transaction, instead of just having that request transaction.

So using "which process this exception is raised at" to decide what the transaction attribute should be is a wrong assumption. A better question would be: "Does having a separate transaction for perform_now job executions help users debug their program or not?".

And I'm torn on this one because:

  1. When ErrorJob is used in multiple places (both perform_later and perform_now), I want to see all of them by checking the ErrorJob transaction filter.
  2. But when ErrorJob is reported under the other transaction (e.g. FooController#create), the event will also contain the request context's information, like breadcrumbs....etc. that could potentially help debugging.
    • That said, if a job is executed with perform_now in a controller action, and relies on the controller context to work correctly, I'd argue that it may be a bad design. A job should only be interfaced through the parameters it receives, which is already reported under the current design.

Any thoughts? @sl0thentr0py @solnic

@sl0thentr0py
Copy link
Member

Agree with @st0012's two points.

I just want to add one more point.

Either way, if some controller code executes Job.perform_now, I'd definitely expect the error to be tied to the controller/action, rather than the job, to make it clear that this error happened in a web process, not a job process.

In general, Sentry is gradually shifting to a trace-centric UI where the actual transaction will not be the primary model of organizing, so this discussion is somewhat less relevant from a future point of view. As long as the job is part of the trace irrespective of it being a new transaction or not, we should be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: No status
Development

No branches or pull requests

4 participants