
fix(ONNX): avoids resizing unsupported dimensions #3945

Open

bjacobgordon wants to merge 50 commits into main from fix-onnx-adds-exceptions-enforcing-convention-in-resize-op
Conversation

bjacobgordon (Contributor)
No description provided.

@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch 3 times, most recently from 6baa8d5 to ab7e021 on January 8, 2025 23:02
@bjacobgordon changed the title from "fix(ONNX): protects against mismatched dynamic meta dimensions" to "fix(ONNX): avoids resizing fixed dimensions" on Jan 9, 2025
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from ab7e021 to 7aec80b on January 9, 2025 17:17
@zjgarvey left a comment (Collaborator):

I think the main structural question is about the need for adding the BaseTensorType method. If it were useful elsewhere, I would consider keeping it, but I have some doubts: we would need to know too much about the two tensor shapes prior to using it, namely that they are both present and that they have the same rank. The code here is simplified by not using it, and I suspect the same would be true in other circumstances where it might be used.
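For illustration only (not code from this PR; `input`, `resultType`, and the failure messages are hypothetical), the inline alternative looks roughly like this, with both preconditions checked explicitly:

```cpp
// Sketch of the inline check that makes a dedicated BaseTensorType
// method unnecessary. Both preconditions noted above are verified
// first: the sizes are present, and the two ranks match.
auto inputTy = cast<Torch::BaseTensorType>(input.getType());
auto resultTy = cast<Torch::BaseTensorType>(resultType);
if (!inputTy.hasSizes() || !resultTy.hasSizes())
  return rewriter.notifyMatchFailure(
      binder.op, "expected both tensor types to have sizes");
ArrayRef<int64_t> inputShape = inputTy.getSizes();
ArrayRef<int64_t> resultShape = resultTy.getSizes();
if (inputShape.size() != resultShape.size())
  return rewriter.notifyMatchFailure(
      binder.op, "expected input and result ranks to match");
// ...compare the per-dimension sizes directly here...
```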

Resolved (outdated) review threads on:
lib/Dialect/Torch/IR/TorchTypes.cpp
include/torch-mlir/Dialect/Torch/IR/TorchTypes.h
lib/Conversion/TorchOnnxToTorch/DefaultDomainQtoZ.cpp (4 threads)
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from 7aec80b to a20ee29 on January 10, 2025 23:03
@bjacobgordon changed the title from "fix(ONNX): avoids resizing fixed dimensions" to "fix(ONNX): avoids non-scalable dimensions in onnx.resize" on Jan 13, 2025
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from a20ee29 to 574f4fe on January 13, 2025 15:22
@bjacobgordon marked this pull request as ready for review on January 13, 2025 17:06
@bjacobgordon requested a review from zjgarvey on January 13, 2025 17:06
@zjgarvey left a comment (Collaborator):

Sorry for the misdirect earlier: we need to perform the runtime asserts on the scales or sizes operand values instead of on the output sizes, since we will not have access to the correct output sizes ahead of time.
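A minimal sketch of such a runtime guard (my illustration, not the PR's code: `scaleForDim` is a hypothetical `!torch.float` value already extracted from the scales operand, and `loc` is the op's location):

```cpp
// Runtime guard: assert scales[dim] == 1.0, since the static output
// shape is not available at conversion time.
Value unitScale = rewriter.create<Torch::ConstantFloatOp>(
    loc, rewriter.getF64FloatAttr(1.0));
Value isUnitScale = rewriter.create<Torch::AtenEqFloatOp>(
    loc, rewriter.getType<Torch::BoolType>(), scaleForDim, unitScale);
rewriter.create<Torch::RuntimeAssertOp>(
    loc, isUnitScale,
    rewriter.getStringAttr("scale factor must be 1 for this dimension"));
```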

Resolved (outdated) review threads on lib/Conversion/TorchOnnxToTorch/DefaultDomainQtoZ.cpp (3 threads)
Comment on lines 2758 to 2820:

```cpp
int64_t const batchDimension = 0;
int64_t const channelDimension = 1;
int64_t nonScalableDimensions[] = {
    batchDimension,
    channelDimension,
};

auto errorMessageForScaling = [](int64_t givenDimension) {
  switch (givenDimension) {
  case batchDimension:
    return "Unexpected intent to scale the batch dimension";
  case channelDimension:
    return "Unexpected intent to scale the channel dimension";
  default:
    return "Scalable dimension treated as non-scalable";
  }
};

auto unknownSize = Torch::kUnknownSize;

// Compile-time check for dimensions of static size
for (auto eachDimension : nonScalableDimensions) {
```
A collaborator commented:

This is a personal preference, but I would certainly prefer:

Suggested change:

```diff
-int64_t const batchDimension = 0;
-int64_t const channelDimension = 1;
-int64_t nonScalableDimensions[] = {
-    batchDimension,
-    channelDimension,
-};
-
-auto errorMessageForScaling = [](int64_t givenDimension) {
-  switch (givenDimension) {
-  case batchDimension:
-    return "Unexpected intent to scale the batch dimension";
-  case channelDimension:
-    return "Unexpected intent to scale the channel dimension";
-  default:
-    return "Scalable dimension treated as non-scalable";
-  }
-};
-
-auto unknownSize = Torch::kUnknownSize;
-
-// Compile-time check for dimensions of static size
-for (auto eachDimension : nonScalableDimensions) {
+// Compile-time check for dimensions of static size
+for (int64_t i = 0; i < 2; i++) {
```

This is for two reasons:

  1. Although the torch op might have these specific dimension specifications, the ONNX op does not. We should say something a bit more generic, like "unsupported: non-trivial scaling in the first two dimensions".
  2. We might want to implement support for this path in the future instead of emitting a match failure. With all of the variables and structures introduced here, I think this would cause more work for that future developer.

@bjacobgordon (Contributor, Author) replied on Jan 13, 2025:

And personal preferences are worth sharing! They're heuristics distilled from our engineering experience, and they're worth putting out there because:

  • they could help someone solve a problem they've encountered in their own work
  • it's an opportunity to evaluate a preference, put it into words, recalibrate it, or maybe even drop it altogether when presented with new information.

  1. Gotcha, so we want the error message not to reveal the rationale. In that case, I'll have the message report the dimension index instead. That also gives us a chance to differentiate the messages for the compile-time and run-time checks.
  2. I agree that it's work! In the case of adding support, work is unavoidable. It's either:
    • the work of deleting 3 variable declarations that carry cognitive load, OR
    • the work of gathering the required cognitive load by talking to other engineers, researching within the codebase, reading external docs, etc., OR
    • some other unforeseen work!

Predicted questions by future devs (especially if they're also cross-discipline):

We do 2 loops, so what's significant about the first two dimensions?

  • pings engineer again

Okay, so they shouldn't be scaled, but why? What do they represent?

  • pings engineer yet again

Okay, they're the batch and channels dims, but how's that different from the other dims?

  • pings engineer one more time

Oh, their elements have nothing to do with spatial dimensions, got it!

With the dims explicitly declared and the implementation based on them, another engineer might be able to avoid having to re-collect the cognitive load we had already collected once before.
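For a concrete picture (my sketch, assuming the NCHW convention this exchange alludes to; the values are illustrative):

```cpp
// Assumed NCHW convention for an onnx.resize input:
//   dim 0: batch (N)   -> scaling would change the sample count
//   dim 1: channel (C) -> scaling would change the feature count
//   dim 2: height (H)  -> spatial, safe to resample
//   dim 3: width (W)   -> spatial, safe to resample
std::array<int64_t, 4> inputShape = {1, 3, 224, 224}; // e.g. one RGB image
```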

Two more resolved (outdated) review threads on lib/Conversion/TorchOnnxToTorch/DefaultDomainQtoZ.cpp.
Comment on the following excerpt:

```cpp
        binder.op, errorMessageForScaling(eachDimension));
  }

  auto binderLocation = binder.getLoc();
```
A collaborator commented:

Please use `loc`. This is actually the original op's location (not the binder's), and `loc` is a fairly canonical variable name for the location of the op being rewritten.
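That is, a trivial sketch of the suggested spelling:

```cpp
// `loc` names the location of the op being rewritten; the binder
// merely forwards it via getLoc().
Location loc = binder.getLoc();
```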

@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from 574f4fe to cb371ea on January 14, 2025 15:04

@IanWood1 (Contributor) commented on Jan 14, 2025:

I think the renaming of loc -> opLocation should probably be split into a different PR so it can be reviewed separately; it adds ~600 lines of changes that aren't directly related to the functional changes. Factoring it out would keep this PR focused and easier to review.

@bjacobgordon marked this pull request as a draft on January 15, 2025 14:49
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch 3 times, most recently from 05ea165 to cb20894 on January 15, 2025 23:19
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch 2 times, most recently from 14233f1 to 6e5ca41 on January 17, 2025 18:23
- before, some of the assignments were explicitly `Location` while the rest were `auto`
- Intellisense was able to infer `mlir::Location`, meaning `auto` is sufficient
1. `loc`: when searching "loc" across the codebase, this appears in contexts where it could mean "location", "locale", etc. Avoiding abbreviations reduces this ambiguity.
2. `location`: now the question to answer is "The location of what, exactly?" without having to ping a colleague or scrub the codebase. We see `auto location = binder.getLoc()`, so the (incorrect) inference is "ah, the location of the binder, I guess", and we use that to modify the word "location".
3. `binderLocation`: turns out that "binder" is short for "opBinder", meaning this is actually the location of the op. So, we switch the modifying noun.
4. `opLocation`: bakes in an understanding that probably required preceding engineers (like me) to spend their own and others' time clarifying. Adding 7 characters to "loc" cuts time-to-grok, avoids unnecessarily roping in other engineers, and makes it easier to onboard new engineers. Overall, this is easier to build upon from an organizational standpoint.

This particular name did not exist in the codebase prior to this commit, so reverting this change is trivially easy no matter how far back in time it was added!
- cast to `ValueTensorType` was overly specific for the methods used
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from 6e5ca41 to 8a78147 on January 17, 2025 19:50
…ransforms filter

- the helper isn't concerned with the role of the transformations at the call site, only with how they're interpreted
@bjacobgordon force-pushed the fix-onnx-adds-exceptions-enforcing-convention-in-resize-op branch from 8a78147 to 54cf76e on January 17, 2025 21:50
@bjacobgordon (Contributor, Author) commented on Jan 17, 2025:

Okay, @zjgarvey, I think we're in business! Got green on the CI a few hours ago. Just wrapped up self-review.

I'm guessing now we'll:

  • discuss how the diffs might need to change
  • finalize those changes
  • discuss which chunks of commits need to be in the PR stack

Something like that?

I kept the commits atomic and ordered such that it's easier to propagate changes to the head of the branch in case an earlier commit needs to be inserted/tweaked/excised.

Let me know what you think!

@bjacobgordon requested a review from zjgarvey on January 17, 2025 21:59
@bjacobgordon marked this pull request as ready for review on January 17, 2025 21:59
@bjacobgordon changed the title from "fix(ONNX): avoids non-scalable dimensions in onnx.resize" to "fix(ONNX): avoids resizing unsupported dimensions" on Jan 17, 2025
@zjgarvey (Collaborator) replied:

> Okay, @zjgarvey, I think we're in business! […] Let me know what you think!

Nice!

At least for now, please exclude any commits which involve style changes not directly related to the fix content. E.g., widespread enforcement of naming preferences like the `loc` -> `opLocation` rename should be factored out into a separate PR, as @IanWood1 suggested.
