
fix(ONNX): avoids resizing unsupported dimensions #3945

Open

bjacobgordon wants to merge 50 commits into main from fix-onnx-adds-exceptions-enforcing-convention-in-resize-op
Conversation

bjacobgordon (Contributor)
No description provided.

@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch 3 times, most recently from 6baa8d5 to ab7e021 on January 8, 2025 23:02
@bjacobgordon changed the title from "fix(ONNX): protects against mismatched dynamic meta dimensions" to "fix(ONNX): avoids resizing fixed dimensions" on Jan 9, 2025
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from ab7e021 to 7aec80b on January 9, 2025 17:17
@zjgarvey left a comment (Collaborator):

I think the main structural question is about the need for adding the BaseTensorType method. If it were useful elsewhere, I would consider keeping it, but I have some doubts: we would need to know too much about the two tensor shapes prior to using it, namely that they are both present and that they have the same rank. The code here is simplified by not using it, and I suspect the same would be true in other circumstances where it might be used.
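For illustration only (not code from this PR; `input`, `resultType`, and the failure messages are hypothetical), the inline alternative looks roughly like this, with both preconditions checked explicitly:

```cpp
// Sketch of the inline check that makes a dedicated BaseTensorType
// method unnecessary. Both preconditions noted above are verified
// first: the sizes are present, and the two ranks match.
auto inputTy = cast<Torch::BaseTensorType>(input.getType());
auto resultTy = cast<Torch::BaseTensorType>(resultType);
if (!inputTy.hasSizes() || !resultTy.hasSizes())
  return rewriter.notifyMatchFailure(
      binder.op, "expected both tensor types to have sizes");
ArrayRef<int64_t> inputShape = inputTy.getSizes();
ArrayRef<int64_t> resultShape = resultTy.getSizes();
if (inputShape.size() != resultShape.size())
  return rewriter.notifyMatchFailure(
      binder.op, "expected input and result ranks to match");
// ...compare the per-dimension sizes directly here...
```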

Resolved (outdated) review threads on:
lib/Dialect/Torch/IR/TorchTypes.cpp
include/torch-mlir/Dialect/Torch/IR/TorchTypes.h
lib/Conversion/TorchOnnxToTorch/DefaultDomainQtoZ.cpp (4 threads)
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from 7aec80b to a20ee29 on January 10, 2025 23:03
@bjacobgordon changed the title from "fix(ONNX): avoids resizing fixed dimensions" to "fix(ONNX): avoids non-scalable dimensions in onnx.resize" on Jan 13, 2025
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from a20ee29 to 574f4fe on January 13, 2025 15:22
@bjacobgordon marked this pull request as ready for review on January 13, 2025 17:06
@bjacobgordon requested a review from zjgarvey on January 13, 2025 17:06
@zjgarvey left a comment (Collaborator):

Sorry for the misdirect earlier: we need to perform the runtime asserts on the scales or sizes operand values instead of on the output sizes, since we will not have access to the correct output sizes ahead of time.
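A minimal sketch of such a runtime guard (my illustration, not the PR's code: `scaleForDim` is a hypothetical `!torch.float` value already extracted from the scales operand, and `loc` is the op's location):

```cpp
// Runtime guard: assert scales[dim] == 1.0, since the static output
// shape is not available at conversion time.
Value unitScale = rewriter.create<Torch::ConstantFloatOp>(
    loc, rewriter.getF64FloatAttr(1.0));
Value isUnitScale = rewriter.create<Torch::AtenEqFloatOp>(
    loc, rewriter.getType<Torch::BoolType>(), scaleForDim, unitScale);
rewriter.create<Torch::RuntimeAssertOp>(
    loc, isUnitScale,
    rewriter.getStringAttr("scale factor must be 1 for this dimension"));
```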

Resolved (outdated) review threads on lib/Conversion/TorchOnnxToTorch/DefaultDomainQtoZ.cpp (3 threads)
Comment on lines 2758 to 2820:

```cpp
int64_t const batchDimension = 0;
int64_t const channelDimension = 1;
int64_t nonScalableDimensions[] = {
    batchDimension,
    channelDimension,
};

auto errorMessageForScaling = [](int64_t givenDimension) {
  switch (givenDimension) {
  case batchDimension:
    return "Unexpected intent to scale the batch dimension";
  case channelDimension:
    return "Unexpected intent to scale the channel dimension";
  default:
    return "Scalable dimension treated as non-scalable";
  }
};

auto unknownSize = Torch::kUnknownSize;

// Compile-time check for dimensions of static size
for (auto eachDimension : nonScalableDimensions) {
```
A collaborator commented:

This is a personal preference, but I would certainly prefer:

Suggested change:

```diff
-int64_t const batchDimension = 0;
-int64_t const channelDimension = 1;
-int64_t nonScalableDimensions[] = {
-    batchDimension,
-    channelDimension,
-};
-
-auto errorMessageForScaling = [](int64_t givenDimension) {
-  switch (givenDimension) {
-  case batchDimension:
-    return "Unexpected intent to scale the batch dimension";
-  case channelDimension:
-    return "Unexpected intent to scale the channel dimension";
-  default:
-    return "Scalable dimension treated as non-scalable";
-  }
-};
-
-auto unknownSize = Torch::kUnknownSize;
-
-// Compile-time check for dimensions of static size
-for (auto eachDimension : nonScalableDimensions) {
+// Compile-time check for dimensions of static size
+for (int64_t i = 0; i < 2; i++) {
```

This is for two reasons:

  1. Although the torch op might have these specific dimension specifications, the ONNX op does not. We should say something a bit more generic, like "unsupported: non-trivial scaling in the first two dimensions".
  2. We might want to implement support for this path in the future instead of emitting a match failure. With all of the variables and structures introduced here, I think this would cause more work for that future developer.

@bjacobgordon (Contributor, Author) replied on Jan 13, 2025:

And personal preferences are worth sharing! They're heuristics distilled from our engineering experience, and they're worth putting out there because:

  • they could help someone solve a problem they've encountered in their own work
  • it's an opportunity to evaluate a preference, put it into words, recalibrate it, or maybe even drop it altogether when presented with new information.

  1. Gotcha, so we want the error message not to reveal the rationale. In that case, I'll have the message report the dimension index instead. That also gives us a chance to differentiate the messages for the compile-time and run-time checks.
  2. I agree that it's work! In the case of adding support, work is unavoidable. It's either:
    • the work of deleting 3 variable declarations that carry cognitive load, OR
    • the work of gathering the required cognitive load by talking to other engineers, researching within the codebase, reading external docs, etc., OR
    • some other unforeseen work!

Predicted questions by future devs (especially if they're also cross-discipline):

We do 2 loops, so what's significant about the first two dimensions?

  • pings engineer again

Okay, so they shouldn't be scaled, but why? What do they represent?

  • pings engineer yet again

Okay, they're the batch and channels dims, but how's that different from the other dims?

  • pings engineer one more time

Oh, their elements have nothing to do with spatial dimensions, got it!

With the dims explicitly declared and the implementation based on them, another engineer might be able to avoid having to re-collect the cognitive load we had already collected once before.
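For a concrete picture (my sketch, assuming the NCHW convention this exchange alludes to; the values are illustrative):

```cpp
// Assumed NCHW convention for an onnx.resize input:
//   dim 0: batch (N)   -> scaling would change the sample count
//   dim 1: channel (C) -> scaling would change the feature count
//   dim 2: height (H)  -> spatial, safe to resample
//   dim 3: width (W)   -> spatial, safe to resample
std::array<int64_t, 4> inputShape = {1, 3, 224, 224}; // e.g. one RGB image
```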

Two more resolved (outdated) review threads on lib/Conversion/TorchOnnxToTorch/DefaultDomainQtoZ.cpp.
Comment on the following excerpt:

```cpp
        binder.op, errorMessageForScaling(eachDimension));
  }

  auto binderLocation = binder.getLoc();
```
A collaborator commented:

Please use `loc`. This is actually the original op's location (not the binder's), and `loc` is a fairly canonical variable name for the location of the op being rewritten.
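That is, a trivial sketch of the suggested spelling:

```cpp
// `loc` names the location of the op being rewritten; the binder
// merely forwards it via getLoc().
Location loc = binder.getLoc();
```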

@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from 574f4fe to cb371ea on January 14, 2025 15:04

@IanWood1 (Contributor) commented on Jan 14, 2025:

I think the renaming of loc -> opLocation should probably be split into a different PR so it can be reviewed separately; it adds ~600 lines of changes that aren't directly related to the functional changes. Factoring it out would keep this PR focused and easier to review.

@bjacobgordon marked this pull request as a draft on January 15, 2025 14:49
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch 3 times, most recently from 05ea165 to cb20894 on January 15, 2025 23:19
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch 2 times, most recently from 14233f1 to 6e5ca41 on January 17, 2025 18:23
- before, some of the assignments were explicitly `Location` while the rest were `auto`
- Intellisense was able to infer `mlir::Location`, meaning `auto` is sufficient
1. `loc`: when searching "loc" across the codebase, this appears in contexts where it could mean "location", "locale", etc. Avoiding abbreviations reduces this ambiguity.
2. `location`: now the question to answer is "The location of what, exactly?" without having to ping a colleague or scrub the codebase. We see `auto location = binder.getLoc()`, so the (incorrect) inference is "ah, the location of the binder, I guess", and we use that to modify the word "location".
3. `binderLocation`: turns out that "binder" is short for "opBinder", meaning this is actually the location of the op. So, we switch the modifying noun.
4. `opLocation`: bakes in an understanding that probably required preceding engineers (like me) to spend their own and others' time clarifying. Adding 7 characters to "loc" cuts time-to-grok, avoids unnecessarily roping in other engineers, and makes it easier to onboard new engineers. Overall, this is easier to build upon from an organizational standpoint.

This particular name did not exist in the codebase prior to this commit, so reverting this change is trivially easy no matter how far back in time it was added!
- cast to `ValueTensorType` was overly specific for the methods used
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from 6e5ca41 to 8a78147 on January 17, 2025 19:50
…ransforms filter

- the helper isn't concerned with the role of the transformations at the call site, only with how they're interpreted
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from 8a78147 to 54cf76e on January 17, 2025 21:50
@bjacobgordon (Contributor, Author) commented on Jan 17, 2025:

Okay, @zjgarvey, I think we're in business! Got green on the CI a few hours ago. Just wrapped up self-review.

I'm guessing now we'll:

  • discuss how the diffs might need to change
  • finalize those changes
  • discuss which chunks of commits need to be in the PR stack

Something like that?

I kept the commits atomic and ordered such that it's easier to propagate changes to the head of the branch in case an earlier commit needs to be inserted/tweaked/excised.

Let me know what you think!

@bjacobgordon requested a review from zjgarvey on January 17, 2025 21:59
@bjacobgordon marked this pull request as ready for review on January 17, 2025 21:59
@bjacobgordon changed the title from "fix(ONNX): avoids non-scalable dimensions in onnx.resize" to "fix(ONNX): avoids resizing unsupported dimensions" on Jan 17, 2025
@zjgarvey (Collaborator) replied:

> Okay, @zjgarvey, I think we're in business! […] Let me know what you think!

Nice!

At least for now, please exclude any commits which involve style changes not directly related to the fix content. E.g., widespread enforcement of naming preferences like the `loc` -> `opLocation` rename should be factored out into a separate PR, as @IanWood1 suggested.
