Squashed commit of the following:
commit 05ed89f46980b7e5a5328bc20af8b32ca9f1f715
Author: PeixuanZuo <[email protected]>
Date:   Thu Feb 22 13:34:55 2024 +0800

    [ROCm] Add excluded libs for ROCm python package  (#19586)

    The ROCm library versions changed in ROCm 6.0.

    Using the libs packaged in the whl might cause errors.
    For example, `libamdhip64.so.6` packaged in the whl causes a compute error
    when training the gpt2 model.

    The root cause is still under investigation.

commit 8354329086ebb190db9ea0cb6a3fa72f53f8f881
Author: PeixuanZuo <[email protected]>
Date:   Thu Feb 22 13:34:45 2024 +0800

    [ROCm] SkipGroupNorm triton (#19408)

    Change GroupNorm triton to support SkipGroupNorm

commit 3d88487c96bf467c4b83dff179c9e282602e2d64
Author: Vincent Wang <[email protected]>
Date:   Thu Feb 22 10:35:26 2024 +0800

    Minor Triton Fix (#19589)

    Includes removing an unnecessary assert and adding support for passing
    string attributes from ONNX node attributes to Python function kwargs
    (mainly for passing debug info from the graph to Python for now).

commit 5197db19802a39e47d19ac829cd08a94bacbdfbb
Author: Sheil Kumar <[email protected]>
Date:   Wed Feb 21 15:45:44 2024 -0800

    Disable __cpuid call for ARM64EC (#19592)

    Disable __cpuid call for ARM64EC

    Co-authored-by: Sheil Kumar <[email protected]>

commit 38c34323939bac03b9648b2e59dbbe8de0bd7092
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Wed Feb 21 13:58:53 2024 -0800

    Bump ip from 1.1.8 to 1.1.9 in /js/react_native (#19582)

    Bumps [ip](https://github.com/indutny/node-ip) from 1.1.8 to 1.1.9.
    Commits:
    - [1ecbf2f](https://github.com/indutny/node-ip/commit/1ecbf2fd8c0cc85e44c3b587d2de641f50dc0217) 1.1.9
    - [6a3ada9](https://github.com/indutny/node-ip/commit/6a3ada9b471b09d5f0f5be264911ab564bf67894) lib: fixed CVE-2023-42282 and added unit test
    - See full diff in [compare view](https://github.com/indutny/node-ip/compare/v1.1.8...v1.1.9)

    [![Dependabot compatibility
    score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ip&package-manager=npm_and_yarn&previous-version=1.1.8&new-version=1.1.9)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

    Dependabot will resolve any conflicts with this PR as long as you don't
    alter it yourself. You can also trigger a rebase manually by commenting
    `@dependabot rebase`.

    [//]: # (dependabot-automerge-start)
    Dependabot will merge this PR once CI passes on it, as requested by
    @fs-eire.

    [//]: # (dependabot-automerge-end)

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit ebd220b0730f9898aaa0275ef0d8195ce70057d0
Author: Matttttt <[email protected]>
Date:   Wed Feb 21 21:38:18 2024 +0000

    Misspelling in README.md (#19433)

    Fixed a misspelling.

commit 3afb38cfb7d4263f262dea33bcfa16d35c67fede
Author: Tianlei Wu <[email protected]>
Date:   Wed Feb 21 12:46:16 2024 -0800

    [CUDA] Add use_tf32 cuda provider option (for FP32 Conv) (#19426)

    Follow up of https://github.com/microsoft/onnxruntime/pull/19357 to apply the use_tf32 option on fp32 cuDNN convolution.

    When use_tf32 = 0, we will disable TF32 in cuDNN convolution for FP32 inputs.

    https://docs.nvidia.com/deeplearning/cudnn/api/cudnn-graph-library.html#cudnnmathtype-t
    **CUDNN_FMA_MATH**
    - Restricted to only kernels that use FMA instructions.
    - On pre-NVIDIA A100 GPU devices, CUDNN_DEFAULT_MATH and CUDNN_FMA_MATH
    have the same behavior: Tensor Core kernels will not be selected.
    - With NVIDIA Ampere architecture and CUDA toolkit 11,
    CUDNN_DEFAULT_MATH permits TF32 Tensor Core operation and CUDNN_FMA_MATH
    does not.
    - The TF32 behavior for CUDNN_DEFAULT_MATH and the other Tensor Core
    math types can be explicitly disabled by the environment variable
    NVIDIA_TF32_OVERRIDE=0.
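
As a rough illustration of how the `use_tf32` option named above might be passed from Python, the sketch below only builds the `providers` list and shows the environment variable quoted from the cuDNN documentation. The helper names `cuda_providers` and `disable_tf32_via_env` are hypothetical and exist only for this example. The session creation is left as a comment because it needs a GPU build of ONNX Runtime and a model file.

```python
import os


def cuda_providers(use_tf32: bool):
    """Build a `providers` list carrying the `use_tf32` CUDA provider option.

    Hypothetical helper for illustration; only the option name `use_tf32`
    comes from the commit message.
    """
    cuda_options = {"use_tf32": 1 if use_tf32 else 0}
    # Keep a CPU fallback so the list is still usable without a GPU.
    return [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]


def disable_tf32_via_env(env=None):
    """Set NVIDIA_TF32_OVERRIDE=0 in the given mapping (os.environ by default).

    The variable has to be set before the CUDA libraries are initialized.
    """
    env = os.environ if env is None else env
    env["NVIDIA_TF32_OVERRIDE"] = "0"
    return env


providers = cuda_providers(use_tf32=False)
print(providers)

# Assumed usage, not executed here (needs onnxruntime-gpu and a model file):
#   import onnxruntime as ort
#   session = ort.InferenceSession("model.onnx", providers=providers)
```

The provider option affects only the ONNX Runtime session it is passed to, while the environment variable applies to the whole process.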

commit e5ce81ae847d0b347a3dfe95abfc9e407e2f0469
Author: Adam Pocock <[email protected]>
Date:   Wed Feb 21 15:24:41 2024 -0500

    [java] Adding ML program flag for CoreML (#19551)
    Adds the new CoreML enum flags to enable ML Program support in Java.
    Adds support for #19347 to the Java API.

commit 57d6819212464f49b30db047528be0f409dadc67
Author: Xu Xing <[email protected]>
Date:   Thu Feb 22 00:08:47 2024 +0800

    [js/web] Fix fused-conv is not included in npm test (#19581)

    BUG: https://github.com/microsoft/onnxruntime/issues/18855

commit 58f4921686bf0a5b0442fb6df92d1b1972a118cc
Author: Yulong Wang <[email protected]>
Date:   Wed Feb 21 00:31:06 2024 -0800

    [js] changes to allow Float16Array if any polyfill is available (#19305)

    This change adds only the code necessary to enable ort-web to work with any
    Float16Array polyfill. Unlike #19302, in this PR, ort-web does not
    include any specific polyfill; instead, it is the user's choice how to
    use a polyfill.

    ORT-web uses Float16Array if it is available; otherwise, it falls back to
    Uint16Array.

    ```js
    // case 1: user does not use a polyfill:
    import * as ort from 'onnxruntime-web';

    const myF16Data = new Uint16Array(...);  // need to use Uint16Array
    const myF16tensor = new ort.Tensor('float16', myF16Data, dims);
    ```

    ```js
    // case 2: user uses a polyfill:
    import * as ort from 'onnxruntime-web';
    import {
      Float16Array, isFloat16Array, isTypedArray,
      getFloat16, setFloat16,
      f16round,
    } from "@petamoriken/float16";
    globalThis.Float16Array = Float16Array;  // ort-web will pick the global Float16Array

    const myF16Data = new Float16Array(...);  // Use the polyfilled Float16Array type
    const myF16tensor = new ort.Tensor('float16', myF16Data, dims);
    ```
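
In case 1 above, the `Uint16Array` has to hold raw IEEE 754 half-precision bit patterns rather than numeric values. The following is a small sketch of that encoding, written in Python with the standard `struct` module (format `e` is half precision). The function names are illustrative and are not part of ort-web.

```python
import struct


def float_to_f16_bits(value: float) -> int:
    """Return the 16-bit pattern of `value` rounded to IEEE 754 half precision."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def f16_bits_to_float(bits: int) -> float:
    """Decode a 16-bit half-precision pattern back into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


values = [1.0, -2.0, 0.5]
bits = [float_to_f16_bits(v) for v in values]
print([hex(b) for b in bits])  # ['0x3c00', '0xc000', '0x3800']
print([f16_bits_to_float(b) for b in bits])  # [1.0, -2.0, 0.5]
```

With a polyfill (case 2), this conversion is handled by the `Float16Array` type itself, so the user can work with ordinary numbers.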

commit 8092a89688f92dee83d1d0111acaa1e1d2dfdb85
Author: satyajandhyala <[email protected]>
Date:   Tue Feb 20 21:18:54 2024 -0800

    Changed command line argparse to process '--symmetric [True|False]'. (#19577)
    Accept the command line option --symmetric and its optional value
    correctly. If the optional value matches 'True' case-insensitively, set
    symmetric to True; otherwise set symmetric to False. Asymmetric quantization
    will generate a zero_point input.
    ```
    usage: matmul_4bits_quantizer.py [-h] --input_model INPUT_MODEL --output_model OUTPUT_MODEL [--block_size BLOCK_SIZE] [--symmetric [{True,False}]] [--accuracy_level ACCURACY_LEVEL] [-v]
                                     [--nodes_to_exclude NODES_TO_EXCLUDE [NODES_TO_EXCLUDE ...]]
    ```
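
One way to get this behavior with `argparse` is an optional-value flag (`nargs="?"`) with a small converter. This is a minimal sketch, not the code from `matmul_4bits_quantizer.py`: the helper names, the default of `True` when the flag is absent, and the omission of `choices` are assumptions made for the example.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Case-insensitive match against 'True'; anything else is False."""
    return text.strip().lower() == "true"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="quantizer_sketch")
    parser.add_argument(
        "--symmetric",
        nargs="?",        # the value after the flag is optional
        const=True,       # used when the flag is given without a value
        default=True,     # assumed default when the flag is absent
        type=parse_bool,  # applied only to a value given on the command line
        help="use symmetric quantization (True or False)",
    )
    return parser


def parse_symmetric(argv):
    """Parse `argv` and return the resulting boolean."""
    return build_parser().parse_args(argv).symmetric


print(parse_symmetric(["--symmetric", "False"]))  # False
print(parse_symmetric(["--symmetric"]))           # True
```

A plain `type=bool` would not work here, because `bool("False")` is `True` in Python; hence the explicit converter.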

commit 124bde985ae883566c44f5cd84d351612006100c
Author: Baiju Meswani <[email protected]>
Date:   Tue Feb 20 19:20:42 2024 -0800

    Bring QAT POC back to a functional state (#19290)

commit 6226c5f62f3d16b9702d5c40993ee9bf1cbd119c
Author: PeixuanZuo <[email protected]>
Date:   Wed Feb 21 11:08:48 2024 +0800

    [ROCm] Add SkipGroupNorm for ROCm EP (#19303)

    Add SkipGroupNorm for ROCm EP.

    ---------

    Co-authored-by: Peixuan Zuo <[email protected]@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

commit 8fadc6c913bc30edff2e89756da515b9bd75d256
Author: zhijiang <[email protected]>
Date:   Wed Feb 21 10:41:42 2024 +0800

    Zhijxu/cleanup cached tensors when oom (#19306)

    In PyTorch, when OOM happens during the backward pass, the user can decrease
    the batch size and rerun without restarting the process.

    In ORT, however, the intermediate tensors are kept even after OOM, so
    decreasing the batch size still fails.

    This is the torch run; after the OOM failure, torch releases its
    tensors before the next step.

    ![image](https://github.com/microsoft/onnxruntime/assets/43435212/92b8a2e3-454b-448a-a223-17cb91d463c2)

    This is from ORT; ORT does not release its tensors after the OOM
    failure.

    ![image](https://github.com/microsoft/onnxruntime/assets/43435212/bb6a3882-8e14-4f37-8079-e7f70fc2546b)

    ORT with this PR: the memory is released. **The 4GB memory is not
    owned by ORT and will be released by torch at the end**.

    ![image](https://github.com/microsoft/onnxruntime/assets/43435212/7f39d711-4e36-47d5-aecf-3805433a6d01)

commit 0c4421cb7867434e1e08b4274f16f6c2f14cb4ce
Author: Markus Tavenrath <[email protected]>
Date:   Wed Feb 21 03:39:43 2024 +0100

    Fix compile warnings (as errors) for functions which miss returning required return value (#19079)

    Added dummy return values to functions which specify a return type but
    do not return a value.
    This fixes compiler errors with 'warnings as errors' enabled.

commit 45e20bf7810689ecf385957c34434c6d2456e32b
Author: Scott McKay <[email protected]>
Date:   Wed Feb 21 12:38:37 2024 +1000

    Use build.py to build in py-win-gpu.yml so parallelization parameters are set (#19578)
    build.py sets a few parallelization parameters when building. Using
    msbuild directly lacks those.

    https://github.com/microsoft/onnxruntime/blob/7a5860e4909387448cb51351d3af50933238ba10/tools/ci_build/build.py#L1665-L1669

    Changed to use build.py. If there's a concern with that we _could_ set
    the parameters in the yaml, but that will be uglier due to duplicating
    logic in multiple places.

commit 6e04e36e3faf2d8115c0962c85b86a6a8b48ac5b
Author: Yulong Wang <[email protected]>
Date:   Tue Feb 20 17:33:37 2024 -0800

    [js/common] upgrade tsc in common from 4.9.5 to 5.2.2 (#19317)
    upgrade tsc in common from 4.9.5 to 5.2.2

commit 70567a4b3a8bc74fb0f1a9ed9ea5a5be6b99b378
Author: Yulong Wang <[email protected]>
Date:   Tue Feb 20 17:33:21 2024 -0800

    [js/web] use ApiTensor instead of onnxjs Tensor in TensorResultValidator (#19358)
    Use ApiTensor instead of onnxjs Tensor in TensorResultValidator. This makes
    the test runner less dependent on onnxjs classes.

commit 3fe2c137ee5923ee369062453d528fe0e33bf4bc
Author: Yulong Wang <[email protected]>
Date:   Tue Feb 20 17:23:01 2024 -0800

    [js] small fix to workaround formatter (#19400)
    Rename shader variable names to snake_case naming and also to avoid
    formatter behaving inconsistently in win/linux.

commit 97ff17c2cbb6ee6f27c052e9c4302c70a41af485
Author: Yulong Wang <[email protected]>
Date:   Tue Feb 20 17:02:11 2024 -0800

    update script of run CI for external PRs to add "Big Models" (#19576)
    update script of run CI for external PRs to add "Big Models"

commit 7a5860e4909387448cb51351d3af50933238ba10
Author: Jake Mathern <[email protected]>
Date:   Tue Feb 20 13:41:40 2024 -0800

    Fix cmake function duplicate lib (#19547)
    Fixes cmake function definition in winml.cmake to copy link flags.
    XFGCheck errors in WindowsAI because this function does not transfer
    linker flags

commit ec9c8cbdc9686ccda6553674d6aab61cfd245cf0
Author: Scott McKay <[email protected]>
Date:   Wed Feb 21 07:40:35 2024 +1000

    Use xcode parallel build flags to speed up iOS CI that is timing out (#19570)
    Provide specific xcodebuild flags instead of depending on cmake to do
    the right thing.

    This built in just over an hour with a ccache miss. Previous CIs with a
    ccache miss were timing out after 150 minutes.

commit 3c49aacd5667b320a4e02626a176098f7423d7c0
Author: Sheil Kumar <[email protected]>
Date:   Tue Feb 20 13:13:40 2024 -0800

    Disable __cpuid check on arm64 builds as intrinsic is not available (#19574)

    Disable __cpuid check on arm64 builds as intrinsic is not available

    Motivation
    Breaking the arm64 build.

    Co-authored-by: Sheil Kumar <[email protected]>

commit 1b48054e1b7991ccef664fbedd659ec95d0e7ca7
Author: Jiajie Hu <[email protected]>
Date:   Wed Feb 21 01:24:34 2024 +0800

    [js/webgpu] Create Split indices helpers by rank, not by shape (#19554)
    This is required to make shape uniforms really work.
    The bug was unveiled in a model with multiple Split nodes. The later
    nodes would try to reuse a previous pipeline cache, while the old shapes
    were hardcoded as constants in cache.

commit 7efb0dbe12cf8736d97dcc3b8f41eb96c5c34719
Author: Xavier Dupré <[email protected]>
Date:   Tue Feb 20 17:22:44 2024 +0100

    add option DefaultTensorType to specify the default tensor type to quantize (#19455)
    The current quantization tool relies on shape inference to provide the
    type of every intermediate tensor, then the tool knows which type it
    must dequantize into (float32, float16). However, this information is
    not available if shape inference fails. That happens every time the
    model includes an operator from a custom domain such as com.microsoft.

    This PR introduces an extra option `DefaultTensorType` as a fall back
    when the quantizer cannot find the type it needs.
    This fixes issue #19409.

commit e832562d70685ffeaab7e3bfa20cd5e9aec916a3
Author: Markus Tavenrath <[email protected]>
Date:   Tue Feb 20 09:06:03 2024 +0100

    Fix invalid usage of designated initializers. (#19497)
    I've replaced all occurrences of C++ designated initializers in the CUDA
    NHWC Tests with member initialization.
    C++ designated initializers were introduced in C++20. GCC nevertheless
    accepts designated initializers in C++17, which is the standard used to
    compile onnxruntime. MSVC is standard-conformant and accepts this
    feature only from C++20 onward, which leads to compile failures on Windows
    without this change.

commit f3e3b531fe4c0d33d70928b101fb5d445e4174a8
Author: PeixuanZuo <[email protected]>
Date:   Tue Feb 20 10:31:39 2024 +0800

    Update build directory clean up stage for python package pipeline (#19553)

    Fix to make clean up stage take effect.

    If the `SourceFolder` is empty, the task deletes files from the root
    folder of the repository as though
    [$(Build.SourcesDirectory)](https://learn.microsoft.com/en-us/azure/devops/pipelines/build/variables)
    was specified.

commit b55260d076da309f3a4634eb5248a0eb541e8ca0
Author: pengwa <[email protected]>
Date:   Mon Feb 19 10:21:19 2024 +0800

    Minor fix for cmake (#19552)

    When building on Linux, the following warning appears:
    "
    CMake Warning at CMakeLists.txt:1603 (message):
      MPI and NCCL disabled on Win build.
    "

    This message is not correct, so this fix avoids any
    misunderstanding for users.

    ![image](https://github.com/microsoft/onnxruntime/assets/10530022/848c2d77-a538-4e31-8e0d-4b539233e515)

commit dfeda9019cfed2d6df5bcacc54269c7de481bdee
Author: satyajandhyala <[email protected]>
Date:   Sat Feb 17 09:19:17 2024 -0800

    [JS/WebGPU] Add MatMulNBits (#19446)
    Add MatMulNBits to support MatMul using 4-bit quantized weights

commit 06269a3952fb1759d93235b9d66f9beb10ae8663
Author: Yulong Wang <[email protected]>
Date:   Fri Feb 16 18:28:27 2024 -0800

    [js/webgpu] allow uint8 tensors for webgpu (#19545)
    allow uint8 tensors for webgpu

commit 4874a41008138ecc1f26e9cd17e5d9d7febb29aa
Author: Adrian Lizarraga <[email protected]>
Date:   Fri Feb 16 16:59:43 2024 -0800

    [QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546)
    Updates the default QNN SDK version to 2.19.2.240210.
    Build and test the latest version of QNN SDK in our pipelines.

commit 44d8ad93b20efdba921ca80f23485c084b5174d0
Author: kunal-vaishnavi <[email protected]>
Date:   Fri Feb 16 15:21:43 2024 -0800

    Whisper Timestamps and Temperature (#19509)
    This PR updates exporting and running the Whisper model with beam search
    by adding the following.

    - Adds temperature as a graph input to the exported model
    - Fixes the token ids by adding them as attributes to
    `WhisperBeamSearch`
    - Fixes the timestamps test cases so they pass now
    - Fixes a bug with invoking `torch.onnx.export`
    - Cleans up the Whisper scripts and groups the arguments in
    `convert_to_onnx.py`
    - Adds a `requirements.txt` file to specify package dependencies
    - Adds `whisper-large-v3` to list of pretrained models
    - Fixes a bug with missing cross-attention KV cache inputs in the
    decoder subgraph

    - This is a follow-up to [this
    PR](https://github.com/microsoft/onnxruntime/pull/19188).
    - The incorrect token ids in the timestamps processor were first noticed
    during [this PR
    review](https://github.com/microsoft/onnxruntime/pull/17500#discussion_r1333520007).
    When they were originally added in [this
    PR](https://github.com/microsoft/onnxruntime/pull/15853), the offsets
    were previously constant across the Whisper model sizes. When comparing
    the new `whisper-large-v3` variant, the English-only variants (e.g.
    `whisper-tiny.en`), and the original variants (e.g. `whisper-tiny`),
    both the values and the offsets differ. Therefore, it is easier to set
    the token ids as attributes to `WhisperBeamSearch` when exporting to
    ensure the right values are used in the timestamps processor.
    - The Hugging Face API for returning timestamps and the expected outputs
    from the PyTorch model have both changed.
    - The fix for `torch.onnx.export` is a follow-up to [this PR
    review](https://github.com/microsoft/onnxruntime/pull/17179#issuecomment-1683001470).
    - The argument grouping is a follow-up to [this PR
    review](https://github.com/microsoft/onnxruntime/pull/17500#discussion_r1333521721).
    - Specific package versions are needed to run the Whisper scripts and
    the `requirements.txt` file ensures that these versions are installed.
    - The `whisper-large-v3` variant is released and should be in the list
    of official pretrained models.
    - After the changes from [this
    PR](https://github.com/microsoft/onnxruntime/pull/17316), the exported
    model is not loading in an ORT inference session because the
    cross-attention KV cache inputs are missing in the decoder subgraph.

commit 1dce5e17321d50bf345022b525a937933473415a
Author: Tianlei Wu <[email protected]>
Date:   Fri Feb 16 14:41:11 2024 -0800

    Disable TF32 in Linux_Test stage of Linux GPU CI Pipeline (#19541)
    Some test thresholds that previously worked on the T4 GPU no longer
    work. The reason is that the current pipeline uses A10, where TF32 is
    enabled by default.

    Disable TF32 in the Linux GPU CI Pipeline during testing to avoid such
    random test failures.
    The Linux Test stage has random failures in these tests:

    ProviderOptionsTest > testCUDAOptions() FAILED
    org.opentest4j.AssertionFailedError: array contents differ at index [446], expected: <0.0419757> but was: <0.041948937>
        at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
        at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
        at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
        at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)

    org.opentest4j.AssertionFailedError: array contents differ at index [6], expected: <0.0225981> but was: <0.022587791>
        at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
        at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
        at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
        at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676)
        at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)

commit b84712151c06f0f59359916be572f71bd36721a4
Author: Adrian Lizarraga <[email protected]>
Date:   Fri Feb 16 14:36:05 2024 -0800

    QNN EP: Fuse DQ -> Q sequences into a QNN Convert op (#19511)
    Fuses DQ -> Q sequences into a QNN Convert operator if:
    - Converting from one qtype to another. Ex: Dequantize(uint8 to float)
    -> Quantize(float to uint16)
    - The DQ and Q operators are not part of another node unit (i.e.,
    standalone)
    - The Q operator is the only consumer for the DQ operator.
    Allows faster execution of QDQ models with mixed activation types by
    leveraging the QNN Convert operator, which converts between quantization
    types. For certain models, this results in inference latency speed-ups
    of up to 2x (depends on the number of DQ -> Q sequences).

    Original:
    ```
    u8 ----> DQ ---> Q ---u16--> Add ---u16-->
                                  ^
                                  |
    u16 --------------------------+
    ```

    After fusing DQ -> Q:
    ```
    u8 ----> Convert ---u16--> Add ---u16-->
                                ^
                                |
    u16 ------------------------+
    ```
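
To make the fused pattern concrete, here is a rough sketch of the arithmetic performed by a standalone DQ -> Q pair on a single u8 value, next to a single-step conversion that yields the same u16 value. It uses the usual linear quantization formulas. How the QNN Convert operator is implemented is not described in this commit, so `convert_u8_to_u16` is only an illustrative stand-in, and all function names are hypothetical.

```python
def dequantize(q: int, scale: float, zero_point: int) -> float:
    """DequantizeLinear on one element: integer -> float."""
    return (q - zero_point) * scale


def quantize(x: float, scale: float, zero_point: int, qmin: int, qmax: int) -> int:
    """QuantizeLinear on one element: float -> integer, rounded and clamped."""
    q = round(x / scale) + zero_point  # Python rounds halves to even
    return max(qmin, min(qmax, q))


def dq_then_q(q_u8, in_scale, in_zp, out_scale, out_zp):
    """The original two-node sequence: Dequantize(u8) followed by Quantize(u16)."""
    return quantize(dequantize(q_u8, in_scale, in_zp), out_scale, out_zp, 0, 65535)


def convert_u8_to_u16(q_u8, in_scale, in_zp, out_scale, out_zp):
    """Single-step requantization standing in for the fused Convert node."""
    q = round((q_u8 - in_zp) * in_scale / out_scale) + out_zp
    return max(0, min(65535, q))


# u8 value 10 represents (10 - 2) * 0.5 = 4.0, which is 4.0 / 0.125 + 100 = 132 in u16.
print(dq_then_q(10, 0.5, 2, 0.125, 100))          # 132
print(convert_u8_to_u16(10, 0.5, 2, 0.125, 100))  # 132
```

Both paths produce the same integer; the fusion only removes the intermediate float tensor and one node from the graph.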

commit ef0b71308c0e2395d3ea63e627515ff8e624ad45
Author: Sheil Kumar <[email protected]>
Date:   Fri Feb 16 05:34:55 2024 -0800

    Optimize KahnsTopologicalSort and PriorityNodeCompare (#19475)

    **Description**
    1) During SessionInitialization, KahnsTopologicalSort is a major cause
    of perf degradation.
    The main cause of slow down is that the TopologicalSort needs to keep
    track of nodes to visit in order, and reorder them based on priority (as
    informed by a comparator). The existing implementation uses a
    priority_queue that is backed by a std::vector container. However,
    vectors are not good for insertion and reordering. The appropriate data
    type for this operation is a linked list. However, linked lists like
    std::list are not usable as a container for std::priority_queue. This is
    because std::priority_queue requires random access, which linked lists
    do not have. However, for this simple implementation, we can leverage a
    std::list under the hood and perform insertions manually using
    std::upper_bound. This drastically reduces the time taken by the method,
    which currently causes numerous copies and a lot of element movement
    inside the list of graph nodes to visit.

    2) In the comparator, I hide forward and backward attribute checking
    behind the #ifdef ENABLE_TRAINING macro, as I believe it should only be
    valid in the training scenario.

    3) In noopelimination transformer, I prevent the creation of Initializer
    (which unpacks tensorproto data) in every node and only create
    initializers when Add/Sub/Mul/Div op nodes are detected.
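
The ordering logic from item 1 can be sketched as Kahn's algorithm with a ready list that is kept sorted by priority. In this sketch `bisect.insort_right` plays the role of the `std::upper_bound` insertion. It is not the ONNX Runtime implementation: the function signature and the priority convention (lower value first) are assumptions, and a Python list is array-backed, so it shows the ordering only and not the performance of a linked list.

```python
import bisect


def kahn_topological_sort(num_nodes, edges, priority):
    """Return a topological order that picks the lowest-priority-value ready node first.

    `edges` is a list of (src, dst) pairs; `priority[n]` is a sortable key for node n.
    """
    indegree = [0] * num_nodes
    successors = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Nodes with no pending inputs, kept sorted by (priority, node index).
    ready = sorted((priority[n], n) for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while ready:
        _, node = ready.pop(0)
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                # Insert after any equal keys, like std::upper_bound.
                bisect.insort_right(ready, (priority[nxt], nxt))

    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")
    return order


print(kahn_topological_sort(4, [(0, 2), (1, 2), (2, 3)], [1, 0, 0, 0]))  # [1, 0, 2, 3]
```

Nodes 0 and 1 are both ready at the start; node 1 is emitted first because its priority value is lower.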

    **Motivation and Context**
    Session creation time of many models is quite slow.

    ---------

    Co-authored-by: Sheil Kumar <[email protected]>

commit 4bfa69def85476b33ccfaf68cf070f3fb65d39f7
Author: Tianlei Wu <[email protected]>
Date:   Thu Feb 15 20:22:36 2024 -0800

    Speed Up DecoderMaskedSelfAttentionTest (#19531)
    The unit tests take 19 minutes to run (in a debug build) because of too
    many combinations. I reduced the combinations while retaining good test
    coverage. After the change, the tests finish in 51 seconds.

    Before:
    [----------] 2 tests from DecoderMaskedSelfAttentionTest
    [ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp32
    [       OK ] DecoderMaskedSelfAttentionTest.Test_fp32 (394086 ms)
    [ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp16
    [       OK ] DecoderMaskedSelfAttentionTest.Test_fp16 (747035 ms)
    [----------] 2 tests from DecoderMaskedSelfAttentionTest (1141122 ms
    total)

    After:
    [----------] 2 tests from DecoderMaskedSelfAttentionTest
    [ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp32
    [       OK ] DecoderMaskedSelfAttentionTest.Test_fp32 (21057 ms)
    [ RUN      ] DecoderMaskedSelfAttentionTest.Test_fp16
    [       OK ] DecoderMaskedSelfAttentionTest.Test_fp16 (30653 ms)
    [----------] 2 tests from DecoderMaskedSelfAttentionTest (51710 ms
    total)
    Reduce test time, and improve build pipeline efficiency.

commit d0061d6fb15d40eeb35fa1b40a414cd231d51db9
Author: sophies927 <[email protected]>
Date:   Thu Feb 15 17:03:11 2024 -0800

    Update stale.yml to use old version as a bug fix (#19532)
    Changed the actions/stale version back to v8 from v9.
    There is a well-documented issue w/ the new actions/stale version
    (v9.0.0) that causes the following error: "Error delete _state: [403]
    Resource not accessible by integration". See
    https://github.com/actions/stale/issues/1133 for more context.

    This issue is preventing the stale bot from labeling stale issues since
    the version was updated b/c the action can no longer access the cache
    and cannot apply labels to all issues due to GH API rate limiting.

    There are two potential fixes if we continue to use the new version: (1)
    run the action on all PRs/issues to avoid using the cache or (2) give
    write access to the endpoints listed in
    https://docs.github.com/en/rest/authentication/permissions-required-for-fine-grained-personal-access-tokens?apiVersion=2022-11-28#repository-permissions-for-actions.
    Neither of these options is preferable, so I am going to wait until the
    bug is fixed.

    Note: The old version (v8.0.0) uses Node 16, which will be deprecated in
    Spring 2024, instead of Node 20, so we should keep an eye on [this
    issue](https://github.com/actions/stale/issues/1133) to see when they
    make the fix and we can switch back to the new version.

commit d63c664ca0021fbac31cee57ff1eaa8bce3d1903
Author: rui-ren <[email protected]>
Date:   Thu Feb 15 00:02:08 2024 -0800

    fix rocm  ci pipeline (#19525)

    ROCm CI pipeline issue.
    ```
    Downloading and preparing dataset wikitext/wikitext-2-raw-v1 (download: 4.50 MiB, generated: 12.91 MiB, post-processed: Unknown size, total: 17.41 MiB) to /home/onnxruntimedev/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20...
        main()
      File "/stage/huggingface-transformers/examples/pytorch/language-modeling/run_mlm.py", line 242, in main
        datasets = load_dataset(data_args.dataset_name, data_args.dataset_config_name, cache_dir=model_args.cache_dir)
      File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/load.py", line 856, in load_dataset
        builder_instance.download_and_prepare(
      File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 583, in download_and_prepare
        self._download_and_prepare(
      File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/builder.py", line 639, in _download_and_prepare
        split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
      File "/home/onnxruntimedev/.cache/huggingface/modules/datasets_modules/datasets/wikitext/aa5e094000ec7afeb74c3be92c88313cd6f132d564c7effd961c10fd47c76f20/wikitext.py", line 138, in _split_generators
        data_file = dl_manager.download_and_extract(self.config.data_url)
      File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 289, in download_and_extract
        return self.extract(self.download(url_or_urls))
      File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 197, in download
        downloaded_path_or_paths = map_nested(
      File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 195, in map_nested
        return function(data_struct)
      File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 220, in _download
        return cached_path(url_or_filename, download_config=download_config)
      File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 281, in cached_path
        output_path = get_from_cache(
      File "/opt/miniconda/envs/rocm-ci/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 634, in get_from_cache
        raise ConnectionError("Couldn't reach {}".format(url))
    ConnectionError: Couldn't reach https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-2-raw-v1.zip

    ```
    Update `datasets` in the pipeline to the latest version, `2.17.0`.

commit 660f39aca5d47888804163405c64ee67eec6eed5
Author: Changming Sun <[email protected]>
Date:   Wed Feb 14 18:35:56 2024 -0800

    Perf improvement for Intel MTL CPUs (#19524)
    See the comments inside of the changed files for more detailed
    information.

    The files onnxruntime/core/platform/windows/hardware_core_enumerator.cc
    and onnxruntime/core/platform/windows/hardware_core_enumerator.h were
    copied from the WinML source folder in this repo, with minor coding style
    changes.

    I had an offline discussion with Sheil. We agreed that, given the lack
    of a future-proof solution, we can check in this temporary fix first and
    rework it later. I will meet with @ivberg to discuss the issue in depth
    and look for a long-term solution. Thanks for offering help, @ivberg!
    With this change, we will see about 2x perf improvement on some Intel
    CPUs.

commit 775c774f4bdcdd57c107030e1341809b4b5ba35e
Author: jingyanwangms <[email protected]>
Date:   Wed Feb 14 18:07:51 2024 -0800

    Add BF16 to Sqrt (#19363)
    Sqrt does not have BF16 support yet. This PR adds it.

commit a67e6925468effd1897c2f541821d32a2860a037
Author: rui-ren <[email protected]>
Date:   Wed Feb 14 15:07:56 2024 -0800

    add GatherSliceToSplitFusion and Unittest (#19218)

    In multi-query attention:
    ```
    batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
    fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads + 2, self.head_dim)
    return fused_qkv[..., :-2, :], fused_qkv[..., [-2], :], fused_qkv[..., [-1], :]
    ```
    which can be optimized to
    ```
    batch_size, seq_length, three_times_hidden_size = fused_qkv.shape
    fused_qkv = fused_qkv.view(batch_size, seq_length, self.num_heads + 2, self.head_dim)
    (query, key, value) = fused_qkv.split([self.num_heads, 1, 1], dim=2)
    return query, key, value
    ```
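
The equivalence of the two snippets above can be checked without a GPU. What follows is a minimal pure-Python sketch, not the fusion itself: nested lists stand in for the head axis of the tensor, and the helper names `slice_heads` and `split_heads` are hypothetical.

```python
# Hypothetical stand-in for one (batch, position) slot of fused_qkv after
# the view(): a list of num_heads + 2 head vectors, each of size head_dim.
num_heads, head_dim = 4, 3
fused = [[h * 10 + d for d in range(head_dim)] for h in range(num_heads + 2)]


def slice_heads(heads):
    """Mirror fused_qkv[..., :-2, :], [..., [-2], :], [..., [-1], :]."""
    return heads[:-2], heads[-2:-1], heads[-1:]


def split_heads(heads, sizes):
    """Mirror fused_qkv.split(sizes, dim=2): consecutive chunks of the given sizes."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(heads[start:start + size])
        start += size
    return tuple(chunks)


sliced = slice_heads(fused)
split = split_heads(fused, [num_heads, 1, 1])
assert sliced == split  # same query, key, and value chunks
```

Because the three slices are contiguous and cover the whole head axis, one split with sizes `[num_heads, 1, 1]` yields the same three pieces in a single pass.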

    This optimization can be validated with Nsight profiling and perf
    benchmarking.

    <img width="545" alt="image"
    src="https://github.com/microsoft/onnxruntime/assets/15321482/cefcd061-4a01-4aaf-a008-8e265f7f63e9">

    As such, this PR optimizes the `Gather/Gather/Slice` ops into a single
    `Split` kernel.

    Since 2 `Gather` kernels and 1 `Slice` kernel are time-consuming in
    backward propagation, it is more efficient to use 1 `Split` kernel.

    - Before Fusion
    <img width="419" alt="image"
    src="https://github.com/microsoft/onnxruntime/assets/15321482/17410319-57ea-4176-afd4-1efdcd3fdbae">

    - After Fusion
    <img width="424" alt="image"
    src="https://github.com/microsoft/onnxruntime/assets/15321482/f1ee1582-96d4-45f4-8778-49d1f3fd370a">
    After the optimization, there is a **~7%** perf gain.

    > The `Transpose` kernel can be fused too; will update it in the next PR.
    However, after testing `Transpose` op fusion on the Falcon model, there
    was no perf gain, so no new PR will be created.

    ---------

    Co-authored-by: ruiren <[email protected]>

commit 4e5119760d8cf1c2e751f4264f23ab3e5a25aebc
Author: Scott McKay <[email protected]>
Date:   Thu Feb 15 08:46:03 2024 +1000

    Add initial support for CoreML ML Program to the CoreML EP. (#19347)
    Adds infrastructure to create an ML Package containing the Model using
    ML Program. Updated coremltools files to v7.1 to bring in new protobuf
    definitions along with the tools to write the weight.bin file and create
    an ML Package correctly.

    Enables building a CoreML Model on all platforms which means all the
    operator builder code can be debugged anywhere. Execution of the
    generated CoreML model is obviously limited to Apple platforms.

    The Conv operator builder has been updated to be able to generate an ML
    Program Operation.
    NeuralNetwork is no longer being developed and ML Program is the
    replacement going forward.

commit 944d8f85135e0caf836ae7f6ad1bfac8dcba2f21
Author: Baiju Meswani <[email protected]>
Date:   Wed Feb 14 12:49:34 2024 -0800

    Update the default std flag used during torch extensions compilation (#19516)

commit 3b03b2e046092522e84f0b9aebac1b394a3e4b13
Author: Prathik Rao <[email protected]>
Date:   Wed Feb 14 11:19:33 2024 -0800

    Upgrade default ORTModule opset from 15 to 17 (#19315)

    This PR upgrades ORTModule's default opset from 15 to 17. Opset 17 is
    the final opset supported by the TorchScript exporter
    (https://github.com/pytorch/pytorch/pull/107829).

    Engineering excellence contribution for ORT Training DRI.

    ---------

    Co-authored-by: Prathik Rao <[email protected]@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

commit 1508c2ee39023274417417b290303cf058ceedd6
Author: Sheil Kumar <[email protected]>
Date:   Wed Feb 14 10:31:03 2024 -0800

    Restrict L2 Cache Core check to Intel devices (#19483)
    Limit SoC core detection via the two-level cache core logic to Intel
    and hybrid processors.
    This code was added to support a new class of CPU cores present in
    Intel’s next-generation Intel Core Ultra mobile processors. It is
    essential for avoiding thread placement on low-performing SoC cores that
    don’t have L3 cache. SoC cores specialize in system bring-up and help
    improve responsiveness and power usage; in other words, they are not
    meant to run compute-heavy AI workloads. To avoid broad exposure of this
    logic, it is currently restricted to Intel platforms that have hybrid
    enabled.
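
The restriction described above can be pictured as a small gating function. This is a hypothetical Python sketch of the idea only: the actual change lives in the C++ platform code, and the `Core` type, its fields, and `cores_for_compute` are illustrative names that do not appear in the PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Core:
    """Hypothetical description of one logical core."""
    index: int
    has_l3_cache: bool


def cores_for_compute(cores: List[Core], vendor: str, is_hybrid: bool) -> List[Core]:
    """Return the cores a thread pool would be allowed to use.

    Assumption: SoC cores are recognized by the absence of L3 cache, and the
    filter is applied only on Intel platforms with hybrid enabled.
    """
    if vendor != "GenuineIntel" or not is_hybrid:
        return list(cores)  # other platforms keep every core
    return [core for core in cores if core.has_l3_cache]


cores = [Core(0, True), Core(1, True), Core(2, False)]
print([c.index for c in cores_for_compute(cores, "GenuineIntel", True)])
print([c.index for c in cores_for_compute(cores, "AuthenticAMD", True)])
```

Keeping the vendor and hybrid checks at the top of the function means non-Intel and non-hybrid machines never reach the cache-based filter.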

    ---------

    Co-authored-by: Sheil Kumar <[email protected]>

commit fbff99a432caef529f90d20137fa5aee33f38fcf
Author: Tianlei Wu <[email protected]>
Date:   Wed Feb 14 10:08:46 2024 -0800

    Change Java Test Threshold (#19508)
    Increase the threshold to 1e-5 to avoid test failures on CUDA when the
    difference is slightly larger than 1e-6.
    This may be because TF32 is used in those CUDA tests.
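
The values in the failing assertions quoted in this commit message make the choice of threshold concrete. A short sketch using only those expected and actual values; the helper `within` is hypothetical and is not the Java test's comparison code.

```python
# Values copied from the two failing assertions in the CI logs.
expected = 0.0102678
actual_values = [0.010266338, 0.010266337]


def within(expected, actual, tolerance):
    """Hypothetical absolute-difference check, standing in for the test's comparison."""
    return abs(expected - actual) <= tolerance


for actual in actual_values:
    diff = abs(expected - actual)
    # The difference is about 1.46e-6: above 1e-6, below 1e-5.
    print(f"diff={diff:.3e} ok@1e-6={within(expected, actual, 1e-6)} "
          f"ok@1e-5={within(expected, actual, 1e-5)}")
```

Both observed differences fall between the old and new thresholds, which is consistent with the failures disappearing at 1e-5.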

    https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1291322&view=logs&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91&t=28e21ca6-87a4-5e1e-0441-72b5e8326f2d

    ProviderOptionsTest > testCUDAOptions() FAILED
    org.opentest4j.AssertionFailedError: array contents differ at index [103], expected: <0.0102678> but was: <0.010266338>
    at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
    at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
    at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
    at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
    at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
    at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
    at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
    at app//ai.onnxruntime.providers.ProviderOptionsTest.runProvider(ProviderOptionsTest.java:99)
    at app//ai.onnxruntime.providers.ProviderOptionsTest.testCUDAOptions(ProviderOptionsTest.java:43)

    https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1293200&view=logs&jobId=f2f63060-d9d6-52d0-adee-b97db5a9ab91&j=f2f63060-d9d6-52d0-adee-b97db5a9ab91&t=28e21ca6-87a4-5e1e-0441-72b5e8326f2d

    InferenceTest > testCUDA() FAILED
    org.opentest4j.AssertionFailedError: array contents differ at index [103], expected: <0.0102678> but was: <0.010266337>
    at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
    at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
    at app//org.junit.jupiter.api.AssertArrayEquals.failArraysNotEqual(AssertArrayEquals.java:440)
    at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:290)
    at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:123)
    at app//org.junit.jupiter.api.AssertArrayEquals.assertArrayEquals(AssertArrayEquals.java:119)
    at app//org.junit.jupiter.api.Assertions.assertArrayEquals(Assertions.java:1360)
    at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:676)
    at app//ai.onnxruntime.InferenceTest.testCUDA(InferenceTest.java:615)

commit f53d2c2465d81cdb4e14c7241eab327184192c88
Author: Ye Wang <[email protected]>
Date:   Wed Feb 14 18:08:11 2024 +0000

    Phi2 script fixes (#19500)

    This PR is intended to support Phi2 passes in Olive.
    Merge it before https://github.com/microsoft/Olive/pull/938

commit 544407038d96521617fe633cf97153d3e75561f5
Author: Prathik Rao <[email protected]>
Date:   Wed Feb 14 10:05:16 2024 -0800

    SimplifiedLayerNormalization Fusion BFloat16 support for Llama-v2 on A100 (#18898)

    Adds bfloat16 as a supported dtype for SimplifiedLayerNormFusion, which
    provides a speedup for Llama-v2 on A100 when using the bfloat16
    numerical format.

    _layernorm_optimized_training.onnx exported in bfloat16 vs. float16:_

    ![image](https://github.com/microsoft/onnxruntime/assets/31260940/8c0a5f0f-5fcb-4637-bcd9-f34272ec0284)

    ```python
    from torch import nn
    from onnxruntime.training.ortmodule import ORTModule, DebugOptions, LogLevel
    import torch

    dtype = torch.bfloat16

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(784, 10, dtype=dtype)
            self.layernorm = nn.LayerNorm([784], dtype=dtype)

        def forward(self, x):
            x = x.view(x.shape[0], -1)
            x = self.layernorm(x)
            x = self.fc(x)

            return x

    model = Net()
    model = ORTModule(model, DebugOptions(save_onnx=True, onnx_prefix='layernorm', log_level=LogLevel.INFO))
    model.to("cuda")

    images = torch.randn((8, 28, 28), dtype=dtype).to("cuda")
    output = model(images)
    ```

    ONNX Runtime integration with Llama-v2 family of LLMs.

    ---------

    Co-authored-by: Prathik Rao <[email protected]@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>

commit 18f76bd25ded7a6ec4b8675e1c2813753fec5343
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Tue Feb 13 15:59:24 2024 -0800

    Bump gradle/wrapper-validation-action from 1 to 2 (#19412)

    Bumps
    [gradle/wrapper-validation-action](https://github.com/gradle/wrapper-validation-action)
    from 1 to 2.
    <details>
    <summary>Release notes</summary>
    <p><em>Sourced from <a
    href="https://github.com/gradle/wrapper-validation-action/releases">gradle/wrapper-validation-action's
    releases</a>.</em></p>
    <blockquote>
    <h2>v2.0.0</h2>
    <h2>What's Changed</h2>
    <p>The version of the Node.js runtime was updated to 20, and the
    majority of dependencies were updated to the latest versions.
    From now on, the <code>wrapper-validation-action</code> will require a
    Node.js 20 runtime environment.</p>
    <p>There are no functional changes in this release.
    This release is tagged with the <code>v2</code> version label.</p>
    <ul>
    <li>[NEW] Update Node.js runtime to version 20 (<a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/170">#170</a>)</li>
    </ul>
    <h2>v2.0.0-rc.1</h2>
    <p>This is a release candidate for <code>v2.0.0</code>. It is also
    available under the <code>v2</code> version label.</p>
    <h2>What's Changed</h2>
    <p>The version of the Node.js runtime was updated to 20, and the
    majority of dependencies were updated to the latest versions.
    From now on, the <code>wrapper-validation-action</code> will require a
    Node.js 20 runtime environment.</p>
    <p>There are no functional changes in this release.</p>
    <ul>
    <li>[NEW] Update Node.js runtime to version 20 (<a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/170">#170</a>)</li>
    </ul>
    <h2>v1.1.0</h2>
    <p>The action now adds the path of the failed wrapper Jar as a
    <code>failed-wrapper</code> Step output parameter.
    This makes the value available for reporting in later Steps/Jobs.</p>
    <h2>v1.0.6</h2>
    <h1>Gradle Wrapper Validation</h1>
    <ul>
    <li>Security vulnerability: <a
    href="https://github.com/gradle/wrapper-validation-action/commit/959bfac6da73353b14c33ab55d44b04f3cd95525">Bump
    json5 from 1.0.1 to 1.0.2</a></li>
    <li>Security vulnerability: <a
    href="https://github.com/gradle/wrapper-validation-action/commit/ffa46e5c8750eca4459bd01191fa54c8b10f778f">Bump
    qs from 6.10.1 to 6.11.0</a></li>
    </ul>
    <h2>v1.0.5</h2>
    <h1>Gradle Wrapper Validation</h1>
    <ul>
    <li>Update dependencies for Node 16 (<a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/53">#53</a>)</li>
    <li>Update dependencies with security vulnerabilities (<a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/67">#67</a>)</li>
    <li>Update various other dependencies (<a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/45">#45</a>,
    <a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/47">#47</a>,
    <a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/48">#48</a>,
    <a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/54">#54</a>)</li>
    </ul>
    <h2>v1.0.4</h2>
    <h1>Gradle Wrapper Validation</h1>
    <ul>
    <li>Retry connections to the server on failure (<a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/39">#39</a>)</li>
    <li>Update dependencies (<a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/38">#38</a>,
    <a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/37">#37</a>,
    <a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/36">#36</a>,
    <a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/34">#34</a>,
    <a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/31">#31</a>,
    <a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/30">#30</a>,
    <a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/29">#29</a>)</li>
    </ul>
    <h2>v1.0.3</h2>
    <h1>Gradle Wrapper Validation</h1>
    <p>Update <code>minimist</code> version to  <code>1.2.5</code></p>
    <h2>v1.0.2</h2>
    <!-- raw HTML omitted -->
    </blockquote>
    <p>... (truncated)</p>
    </details>
    <details>
    <summary>Commits</summary>
    <ul>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/27152f6fa06a6b8062ef7195c795692e51fc2c81"><code>27152f6</code></a>
    Update to Node 20 (<a
    href="https://redirect.github.com/gradle/wrapper-validation-action/issues/170">#170</a>)</li>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/d8758a98d16d912adc6fe28c4eb7bee69eb481f1"><code>d8758a9</code></a>
    Build output</li>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/e916071cca19c1df0d7932e61a1029451b96d441"><code>e916071</code></a>
    Update NPM dependencies</li>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/d9359e465a2e9a25f4433bfd10ff6ceb34b3491c"><code>d9359e4</code></a>
    Add asdf config file</li>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/77d43de1708304ff0de210ad94a771ce3f4aef26"><code>77d43de</code></a>
    Update upload-artifact version</li>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/2f8436d9bbfed346d2fbe8ceaeae437b0c8c92e3"><code>2f8436d</code></a>
    Use setup-node@v4 instead of pinning to a revision</li>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/bfa0fe410a68922e64827497952f1dfcb1182db4"><code>bfa0fe4</code></a>
    Consistently use npm cache for workflows</li>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/8be8473276734e4bd055a8824238f73e1c1113b5"><code>8be8473</code></a>
    Update workflows and action to NodeJS 20</li>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/c8fad9e3f832df7a2cf7d83c755a79f4afa58c9e"><code>c8fad9e</code></a>
    Bump <code>@​babel/traverse</code> from 7.14.7 to 7.23.2</li>
    <li><a
    href="https://github.com/gradle/wrapper-validation-action/commit/342dbebe7272035434f9baccc29a816ec6dd2c7b"><code>342dbeb</code></a>
    Update README to use <code>actions/checkout@v4</code></li>
    <li>See full diff in <a
    href="https://github.com/gradle/wrapper-validation-action/compare/v1...v2">compare
    view</a></li>
    </ul>
    </details>
    <br />


    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit f048fb5b14f5495950fb984dc474c8930861e474
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Tue Feb 13 15:59:15 2024 -0800

    Bump nuget/setup-nuget from 1 to 2 (#19411)

    Bumps [nuget/setup-nuget](https://github.com/nuget/setup-nuget) from 1
    to 2.
    <details>
    <summary>Release notes</summary>
    <p><em>Sourced from <a
    href="https://github.com/nuget/setup-nuget/releases">nuget/setup-nuget's
    releases</a>.</em></p>
    <blockquote>
    <h2>v2.0.0</h2>
    <h2>What's Changed</h2>
    <ul>
    <li>build(deps): bump semver from 7.3.8 to 7.5.2 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/49">NuGet/setup-nuget#49</a></li>
    <li>build(deps-dev): bump word-wrap from 1.2.3 to 1.2.5 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/51">NuGet/setup-nuget#51</a></li>
    <li>build(deps-dev): bump <code>@​babel/traverse</code> from 7.23.0 to
    7.23.2 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/57">NuGet/setup-nuget#57</a></li>
    <li>Update to use Node.js 20 by <a
    href="https://github.com/frederikprijck"><code>@​frederikprijck</code></a>
    in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/59">NuGet/setup-nuget#59</a></li>
    <li>build(deps-dev): bump prettier from 2.8.7 to 3.0.3 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/60">NuGet/setup-nuget#60</a></li>
    <li>build(deps-dev): bump <code>@​types/node</code> from 18.18.0 to
    20.8.9 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/62">NuGet/setup-nuget#62</a></li>
    <li>build(deps-dev): bump <code>@​vercel/ncc</code> from 0.36.1 to
    0.38.1 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/61">NuGet/setup-nuget#61</a></li>
    <li>build(deps-dev): bump eslint-plugin-jest from 27.4.0 to 27.6.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/64">NuGet/setup-nuget#64</a></li>
    <li>build(deps-dev): bump nock from 13.3.3 to 13.3.6 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/63">NuGet/setup-nuget#63</a></li>
    <li>build(deps-dev): bump eslint from 8.50.0 to 8.52.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/65">NuGet/setup-nuget#65</a></li>
    <li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
    5.62.0 to 6.9.1 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/70">NuGet/setup-nuget#70</a></li>
    <li>build(deps-dev): bump eslint-plugin-github from 4.10.0 to 4.10.1 by
    <a href="https://github.com/dependabot"><code>@​dependabot</code></a> in
    <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/68">NuGet/setup-nuget#68</a></li>
    <li>build(deps-dev): bump <code>@​types/jest</code> from 29.5.5 to
    29.5.7 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/69">NuGet/setup-nuget#69</a></li>
    <li>build(deps-dev): bump eslint from 8.52.0 to 8.53.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/73">NuGet/setup-nuget#73</a></li>
    <li>build(deps-dev): bump <code>@​types/node</code> from 20.8.9 to
    20.8.10 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/71">NuGet/setup-nuget#71</a></li>
    <li>build(deps-dev): bump nock from 13.3.6 to 13.3.8 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/72">NuGet/setup-nuget#72</a></li>
    <li>build(deps-dev): bump prettier from 3.0.3 to 3.1.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/74">NuGet/setup-nuget#74</a></li>
    <li>build(deps-dev): bump <code>@​types/jest</code> from 29.5.7 to
    29.5.8 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/76">NuGet/setup-nuget#76</a></li>
    <li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
    6.9.1 to 6.10.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/77">NuGet/setup-nuget#77</a></li>
    <li>build(deps-dev): bump <code>@​types/node</code> from 20.8.10 to
    20.9.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/75">NuGet/setup-nuget#75</a></li>
    <li>build(deps-dev): bump eslint from 8.53.0 to 8.54.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/80">NuGet/setup-nuget#80</a></li>
    <li>build(deps-dev): bump <code>@​types/node</code> from 20.9.0 to
    20.9.2 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/79">NuGet/setup-nuget#79</a></li>
    <li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
    6.10.0 to 6.12.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/81">NuGet/setup-nuget#81</a></li>
    <li>build(deps-dev): bump <code>@​types/jest</code> from 29.5.8 to
    29.5.10 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/83">NuGet/setup-nuget#83</a></li>
    <li>build(deps-dev): bump typescript from 5.2.2 to 5.3.2 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/82">NuGet/setup-nuget#82</a></li>
    <li>build(deps-dev): bump nock from 13.3.8 to 13.4.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/88">NuGet/setup-nuget#88</a></li>
    <li>build(deps-dev): bump <code>@​types/node</code> from 20.9.2 to
    20.10.3 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/86">NuGet/setup-nuget#86</a></li>
    <li>build(deps-dev): bump eslint from 8.54.0 to 8.55.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/85">NuGet/setup-nuget#85</a></li>
    <li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
    6.12.0 to 6.13.2 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/89">NuGet/setup-nuget#89</a></li>
    <li>build(deps-dev): bump <code>@​types/jest</code> from 29.5.10 to
    29.5.11 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/93">NuGet/setup-nuget#93</a></li>
    <li>build(deps-dev): bump prettier from 3.1.0 to 3.1.1 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/91">NuGet/setup-nuget#91</a></li>
    <li>build(deps-dev): bump typescript from 5.3.2 to 5.3.3 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/92">NuGet/setup-nuget#92</a></li>
    <li>build(deps-dev): bump <code>@​types/node</code> from 20.10.3 to
    20.10.4 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/90">NuGet/setup-nuget#90</a></li>
    <li>build(deps-dev): bump eslint from 8.55.0 to 8.56.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/94">NuGet/setup-nuget#94</a></li>
    <li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
    6.13.2 to 6.19.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/107">NuGet/setup-nuget#107</a></li>
    <li>build(deps-dev): bump eslint-plugin-jest from 27.6.0 to 27.6.3 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/106">NuGet/setup-nuget#106</a></li>
    <li>build(deps-dev): bump <code>@​types/node</code> from 20.10.4 to
    20.11.5 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/110">NuGet/setup-nuget#110</a></li>
    <li>build(deps-dev): bump prettier from 3.1.1 to 3.2.4 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/109">NuGet/setup-nuget#109</a></li>
    <li>build(deps-dev): bump <code>@​types/node</code> from 20.11.5 to
    20.11.10 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/116">NuGet/setup-nuget#116</a></li>
    <li>build(deps-dev): bump nock from 13.4.0 to 13.5.1 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/115">NuGet/setup-nuget#115</a></li>
    <li>build(deps-dev): bump ts-jest from 29.1.1 to 29.1.2 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/113">NuGet/setup-nuget#113</a></li>
    <li>build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
    6.19.0 to 6.20.0 by <a
    href="https://github.com/dependabot"><code>@​dependabot</code></a> in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/117">NuGet/setup-nuget#117</a></li>
    </ul>
    <h2>New Contributors</h2>
    <ul>
    <li><a
    href="https://github.com/frederikprijck"><code>@​frederikprijck</code></a>
    made their first contribution in <a
    href="https://redirect.github.com/NuGet/setup-nuget/pull/59">NuGet/setup-nuget#59</a></li>
    </ul>
    <p><strong>Full Changelog</strong>: <a
    href="https://github.com/NuGet/setup-nuget/compare/v1.2.0...v1.3.0">https://github.com/NuGet/setup-nuget/compare/v1.2.0...v1.3.0</a></p>
    <!-- raw HTML omitted -->
    </blockquote>
    <p>... (truncated)</p>
    </details>
    <details>
    <summary>Commits</summary>
    <ul>
    <li><a
    href="https://github.com/NuGet/setup-nuget/commit/a21f25cd3998bf370fde17e3f1b4c12c175172f9"><code>a21f25c</code></a>
    Update dist for release (<a
    href="https://redirect.github.com/nuget/setup-nuget/issues/118">#118</a>)</li>
    <li><a
    href="https://github.com/NuGet/setup-nuget/commit/5166d73a4364a70349ca47c0d2eaf29f6a35ee91"><code>5166d73</code></a>
    build(deps-dev): bump <code>@​typescript-eslint/parser</code> from
    6.19.0 to 6.20.0 (<a
    href="https://redirect.github.com/nuget/setup-nuget/issues/117">#117</a>)</li>
    <li><a
    href="https://github.com/NuGet/setup-nuget/commit/b9155458821fce8ba9dfd93e80d9ae0ca27cc95e"><code>b915545</code></a>
    build(deps-dev): bump ts-jest from 29.1.1 to 29.1.2 (<a
    href="https://redirect.github.com/nuget/setup-nuget/issues/113">#113</a>)</li>
    <li><a
    href="https://github.com/NuGet/setup-nuget/commit/00081d4dbea954580da0a8959641007c145fabca"><code>00081d4</code></a>
    build(deps-dev): bump nock from 13.4.0 to 13.5.1 (<a
    href="https://redirect.github.com/nuget/setup-nuget/issues/115">#115</a>)</li>
    <li><a
    href="https://github.com/NuGet/…
sspintel committed Feb 22, 2024
1 parent 923ada9 commit e0d08b1
Showing 391 changed files with 11,355 additions and 3,712 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/gradle-wrapper-validation.yml
@@ -11,4 +11,4 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: gradle/wrapper-validation-action@v1
- uses: gradle/wrapper-validation-action@v2
2 changes: 1 addition & 1 deletion .github/workflows/labeler.yml
@@ -7,7 +7,7 @@ jobs:
triage:
runs-on: ubuntu-latest
steps:
- uses: github/issue-labeler@v3.3
- uses: github/issue-labeler@v3.4
with:
repo-token: "${{ secrets.GITHUB_TOKEN }}"
configuration-path: .github/labeler.yml
2 changes: 1 addition & 1 deletion .github/workflows/publish-csharp-apidocs.yml
@@ -37,7 +37,7 @@ jobs:
wget https://github.com/dotnet/docfx/releases/download/v${DOCFXVERSION}/docfx-linux-x64-v${DOCFXVERSION}.zip -O build/docfx/docfx.zip
unzip build/docfx/docfx.zip -d build/docfx
- name: Install NuGet
uses: nuget/setup-nuget@v1
uses: nuget/setup-nuget@v2
- name: Build Documentation
run: |
build/docfx/docfx metadata csharp/ApiDocs/docfx.json
2 changes: 1 addition & 1 deletion .github/workflows/publish-java-apidocs.yml
@@ -30,7 +30,7 @@ jobs:
java-version: '11'
distribution: 'adopt'
- name: Build with Gradle
uses: gradle/gradle-build-action@v2
uses: gradle/gradle-build-action@v3
with:
build-root-directory: java
gradle-executable: java/gradlew
2 changes: 1 addition & 1 deletion .github/workflows/stale.yml
@@ -13,7 +13,7 @@ jobs:
issues: write
pull-requests: write
steps:
- uses: actions/stale@v9.0.0
- uses: actions/stale@v8
with:
# Comma separated list of labels that can be assigned to issues to exclude them from being marked as stale
exempt-issue-labels: contributions welcome, feature request, regression
10 changes: 6 additions & 4 deletions cmake/CMakeLists.txt
@@ -117,8 +117,7 @@ option(onnxruntime_CROSS_COMPILING "Cross compiling onnx runtime" OFF)
option(onnxruntime_GCOV_COVERAGE "Compile with options necessary to run code coverage" OFF)
option(onnxruntime_DONT_VECTORIZE "Do not vectorize operations in Eigen" OFF)

#It's preferred to turn it OFF when onnxruntime is dynamically linked to PROTOBUF. But Tensort always required the full version of protobuf.
cmake_dependent_option(onnxruntime_USE_FULL_PROTOBUF "Link to libprotobuf instead of libprotobuf-lite when this option is ON" OFF "NOT onnxruntime_USE_TENSORRT" ON)
option(onnxruntime_USE_FULL_PROTOBUF "Link to libprotobuf instead of libprotobuf-lite when this option is ON" OFF)
option(tensorflow_C_PACKAGE_PATH "Path to tensorflow C package installation dir")
option(onnxruntime_ENABLE_LANGUAGE_INTEROP_OPS "Enable operator implemented in language other than cpp" OFF)
option(onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS "Dump debug information about node inputs and outputs when executing the model." OFF)
@@ -985,9 +984,12 @@ function(onnxruntime_set_compile_flags target_name)
foreach(FLAG ${ORT_WARNING_FLAGS})
target_compile_options(${target_name} PRIVATE "$<$<COMPILE_LANGUAGE:CUDA>:SHELL:--compiler-options ${FLAG}>")
endforeach()
if ((NVCC_HAS_STRICT_ALIASING AND "${target_name}" MATCHES "cuda") OR (HAS_STRICT_ALIASING AND NOT "${target_name}" MATCHES "cuda"))
if (NVCC_HAS_STRICT_ALIASING AND "${target_name}" MATCHES "cuda")
target_compile_options(${target_name} PRIVATE "$<$<COMPILE_LANGUAGE:CUDA>:-Wno-strict-aliasing>")
endif()
if (HAS_STRICT_ALIASING AND NOT "${target_name}" MATCHES "cuda")
target_compile_options(${target_name} PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:-Wno-strict-aliasing>")
endif()
endif()
if (onnxruntime_USE_ROCM)
# flags are detected with CXX language mode, some flags are not supported with hipclang
Expand Down Expand Up @@ -1588,7 +1590,7 @@ if (UNIX AND onnxruntime_USE_NCCL)
else()
set(onnxruntime_USE_NCCL OFF)
set(onnxruntime_USE_MPI OFF)
message( WARNING "MPI and NCCL disabled on Win build." )
message( WARNING "MPI and NCCL are disabled because build is on Windows or USE_NCCL is set to OFF." )
endif()

if (onnxruntime_USE_MPI)
9 changes: 7 additions & 2 deletions cmake/adjust_global_compile_flags.cmake
@@ -92,8 +92,13 @@ if (onnxruntime_MINIMAL_BUILD)
endif()
endif()

# enable stream for all the non-minimal build
if (NOT onnxruntime_MINIMAL_BUILD)
# Enable stream for all the non-minimal build, except for DML. There's currently a bug
# in the allocation planner when reusing buffers and more than one streams are used that
# make it possible (although rarely) to reach a reference count of 0 for a buffer that is
# still being used. Since DML doesn't benefit from multiple streams, disabling it is the
# safest option for now.
# https://github.com/microsoft/onnxruntime/issues/19480
if (NOT onnxruntime_MINIMAL_BUILD AND NOT onnxruntime_USE_DML)
add_compile_definitions(ORT_ENABLE_STREAM)
endif()

6 changes: 1 addition & 5 deletions cmake/onnxruntime_providers.cmake
@@ -66,11 +66,7 @@ if(onnxruntime_USE_CUDA)
set(PROVIDERS_CUDA onnxruntime_providers_cuda)
endif()
if(onnxruntime_USE_COREML)
if (CMAKE_SYSTEM_NAME STREQUAL "Darwin" OR CMAKE_SYSTEM_NAME STREQUAL "iOS")
set(PROVIDERS_COREML onnxruntime_providers_coreml coreml_proto)
else()
set(PROVIDERS_COREML onnxruntime_providers_coreml)
endif()
set(PROVIDERS_COREML onnxruntime_providers_coreml coreml_proto)
endif()
if(onnxruntime_USE_NNAPI_BUILTIN)
set(PROVIDERS_NNAPI onnxruntime_providers_nnapi)
127 changes: 114 additions & 13 deletions cmake/onnxruntime_providers_coreml.cmake
@@ -7,6 +7,27 @@ endif()

add_compile_definitions(USE_COREML=1)

# Check if we can build the coremltools code for creating an mlpackage with an mlprogram.
# The coremltools source requires std::filesystem::path which is only available from iOS 13 on.
set(_enable_ML_PROGRAM ON)
if (IOS AND CMAKE_OSX_DEPLOYMENT_TARGET VERSION_LESS 13.0)
message(WARNING "CoreML ML Program is not supported on iOS < 13.0. Excluding ML Program support from build.")
set(_enable_ML_PROGRAM OFF)
elseif(LINUX)
# uuid-dev is required. we don't bother installing on CIs as it's really for manual developer testing.
find_library(LibUUID_LIBRARY NAMES uuid)
find_path(LibUUID_INCLUDE_DIR NAMES uuid/uuid.h)
if (NOT LibUUID_INCLUDE_DIR)
message(STATUS "uuid/uuid.h was not found as is required for ML Program support. "
"Run `sudo apt install uuid-dev` if you need to test ML Program related CoreML EP code. ")
set(_enable_ML_PROGRAM OFF)
endif()
endif()

if (_enable_ML_PROGRAM)
add_compile_definitions(COREML_ENABLE_MLPROGRAM=1)
endif()

# Compile CoreML proto definition to ${CMAKE_CURRENT_BINARY_DIR}/coreml_proto
set(COREML_PROTO_ROOT ${coremltools_SOURCE_DIR}/mlmodel/format)
file(GLOB coreml_proto_srcs "${COREML_PROTO_ROOT}/*.proto")
@@ -19,8 +40,8 @@ target_compile_definitions(coreml_proto
PUBLIC $<TARGET_PROPERTY:${PROTOBUF_LIB},INTERFACE_COMPILE_DEFINITIONS>)
set_target_properties(coreml_proto PROPERTIES COMPILE_FLAGS "-fvisibility=hidden")
set_target_properties(coreml_proto PROPERTIES COMPILE_FLAGS "-fvisibility-inlines-hidden")
set(_src_sub_dir "coreml_proto/")

set(_src_sub_dir "coreml_proto/")
onnxruntime_protobuf_generate(
APPEND_PATH
GEN_SRC_SUB_DIR ${_src_sub_dir}
@@ -55,6 +76,10 @@ file(GLOB_RECURSE onnxruntime_providers_shared_utils_cc_srcs CONFIGURE_DEPENDS
"${ONNXRUNTIME_ROOT}/core/providers/shared/utils/utils.cc"
)

file(GLOB onnxruntime_providers_coreml_public_headers CONFIGURE_DEPENDS
"${ONNXRUNTIME_INCLUDE_DIR}/core/providers/coreml/*.h"
)

file(GLOB
onnxruntime_providers_coreml_cc_srcs_top CONFIGURE_DEPENDS
"${ONNXRUNTIME_ROOT}/core/providers/coreml/*.h"
@@ -67,42 +92,118 @@ file(GLOB_RECURSE
"${ONNXRUNTIME_ROOT}/core/providers/coreml/builders/*.h"
"${ONNXRUNTIME_ROOT}/core/providers/coreml/builders/*.cc"
)
if (NOT CMAKE_SYSTEM_NAME STREQUAL "Darwin" AND NOT CMAKE_SYSTEM_NAME STREQUAL "iOS")
list(REMOVE_ITEM onnxruntime_providers_coreml_cc_srcs_nested
"${ONNXRUNTIME_ROOT}/core/providers/coreml/builders/model_builder.h"
"${ONNXRUNTIME_ROOT}/core/providers/coreml/builders/model_builder.cc"

if(_enable_ML_PROGRAM)
# Add helpers to create mlpackage weights. limit to just the files we need to minimize the changes to make them
# build on Windows and Linux.
file(GLOB
onnxruntime_providers_coreml_milblob_cc_srcs CONFIGURE_DEPENDS
"${coremltools_SOURCE_DIR}/mlmodel/src/MILBlob/*.hpp"
"${coremltools_SOURCE_DIR}/mlmodel/src/MILBlob/*.cpp"
"${coremltools_SOURCE_DIR}/mlmodel/src/MILBlob/Util/*.hpp"
"${coremltools_SOURCE_DIR}/mlmodel/src/MILBlob/Blob/BlobDataType.hpp"
"${coremltools_SOURCE_DIR}/mlmodel/src/MILBlob/Blob/StorageFormat.hpp"
"${coremltools_SOURCE_DIR}/mlmodel/src/MILBlob/Blob/FileWriter.?pp"
"${coremltools_SOURCE_DIR}/mlmodel/src/MILBlob/Blob/StorageWriter.?pp"
)

# Add helpers to create mlpackage
file(GLOB
onnxruntime_providers_coreml_modelpackage_cc_srcs CONFIGURE_DEPENDS
"${coremltools_SOURCE_DIR}/modelpackage/src/ModelPackage.?pp"
"${coremltools_SOURCE_DIR}/modelpackage/src/Utils/JsonMap.?pp"
)

set(coremltools_srcs
${onnxruntime_providers_coreml_milblob_cc_srcs}
${onnxruntime_providers_coreml_modelpackage_cc_srcs}
)

source_group(TREE ${coremltools_SOURCE_DIR} PREFIX coremltools FILES ${coremltools_srcs})
endif()

# Add CoreML objective c++ source code
if (CMAKE_SYSTEM_NAME STREQUAL "Darwin" OR CMAKE_SYSTEM_NAME STREQUAL "iOS")
if (APPLE)
file(GLOB
onnxruntime_providers_coreml_objcc_srcs CONFIGURE_DEPENDS
"${ONNXRUNTIME_ROOT}/core/providers/coreml/model/model.h"
"${ONNXRUNTIME_ROOT}/core/providers/coreml/model/model.mm"
"${ONNXRUNTIME_ROOT}/core/providers/coreml/model/host_utils.h"
"${ONNXRUNTIME_ROOT}/core/providers/coreml/model/host_utils.mm"
)
else()
# add the Model implementation that uses the protobuf types but excludes any actual CoreML dependencies
# by using stub implementations on non-Apple platforms.
file(GLOB
onnxruntime_providers_coreml_objcc_srcs CONFIGURE_DEPENDS
"${ONNXRUNTIME_ROOT}/core/providers/coreml/model/host_utils.h"
"${ONNXRUNTIME_ROOT}/core/providers/coreml/model/host_utils_stub.cc"
"${ONNXRUNTIME_ROOT}/core/providers/coreml/model/model.h"
"${ONNXRUNTIME_ROOT}/core/providers/coreml/model/model_stub.cc"
)
endif()

set(onnxruntime_providers_coreml_cc_srcs
${onnxruntime_providers_coreml_cc_srcs_top}
${onnxruntime_providers_coreml_cc_srcs_nested}
${onnxruntime_providers_shared_utils_cc_srcs}
${onnxruntime_providers_coreml_objcc_srcs}
)

source_group(TREE ${ONNXRUNTIME_ROOT}/core FILES ${onnxruntime_providers_coreml_cc_srcs})
source_group(TREE ${ONNXRUNTIME_ROOT} FILES ${onnxruntime_providers_coreml_cc_srcs})
source_group(TREE ${ONNXRUNTIME_INCLUDE_DIR} FILES ${onnxruntime_providers_coreml_public_headers})

onnxruntime_add_static_library(onnxruntime_providers_coreml
${onnxruntime_providers_coreml_cc_srcs} ${onnxruntime_providers_coreml_objcc_srcs}
${onnxruntime_providers_coreml_public_headers}
${onnxruntime_providers_coreml_cc_srcs}
${coremltools_srcs}
)

onnxruntime_add_include_to_target(onnxruntime_providers_coreml
onnxruntime_common onnxruntime_framework onnx onnx_proto ${PROTOBUF_LIB} flatbuffers::flatbuffers Boost::mp11 safeint_interface
onnxruntime_common onnxruntime_framework onnx onnx_proto ${PROTOBUF_LIB} flatbuffers::flatbuffers Boost::mp11
safeint_interface
)
if (CMAKE_SYSTEM_NAME STREQUAL "Darwin" OR CMAKE_SYSTEM_NAME STREQUAL "iOS")
onnxruntime_add_include_to_target(onnxruntime_providers_coreml coreml_proto)
target_link_libraries(onnxruntime_providers_coreml PRIVATE coreml_proto "-framework Foundation" "-framework CoreML")
add_dependencies(onnxruntime_providers_coreml coreml_proto)

onnxruntime_add_include_to_target(onnxruntime_providers_coreml coreml_proto)
target_link_libraries(onnxruntime_providers_coreml PRIVATE coreml_proto)
add_dependencies(onnxruntime_providers_coreml coreml_proto)

if (APPLE)
target_compile_definitions(onnxruntime_providers_coreml PRIVATE __APPLE__)
endif()

if (_enable_ML_PROGRAM)
# Setup coremltools fp16 and json dependencies for creating an mlpackage.
#
# These are also used by external/xnnpack.cmake. fp16 depends on psimd
FetchContent_Declare(psimd URL ${DEP_URL_psimd} URL_HASH SHA1=${DEP_SHA1_psimd})
onnxruntime_fetchcontent_makeavailable(psimd)
set(PSIMD_SOURCE_DIR ${psimd_SOURCE_DIR})
FetchContent_Declare(fp16 URL ${DEP_URL_fp16} URL_HASH SHA1=${DEP_SHA1_fp16})
set(FP16_BUILD_TESTS OFF CACHE INTERNAL "")
set(FP16_BUILD_BENCHMARKS OFF CACHE INTERNAL "")
onnxruntime_fetchcontent_makeavailable(fp16)

# need to tweak the include paths to match what the coreml source code expects
target_include_directories(onnxruntime_providers_coreml PRIVATE
${fp16_SOURCE_DIR}/include
${nlohmann_json_SOURCE_DIR}/single_include/nlohmann
${coremltools_SOURCE_DIR}
${coremltools_SOURCE_DIR}/mlmodel/src/
${coremltools_SOURCE_DIR}/modelpackage/src/
)

add_dependencies(onnxruntime_providers_coreml nlohmann_json::nlohmann_json fp16)

if (LINUX)
target_link_libraries(onnxruntime_providers_coreml PRIVATE uuid)
endif()
endif()

if (APPLE)
target_link_libraries(onnxruntime_providers_coreml PRIVATE "-framework Foundation" "-framework CoreML")
endif()

add_dependencies(onnxruntime_providers_coreml ${onnxruntime_EXTERNAL_DEPENDENCIES})

set_target_properties(onnxruntime_providers_coreml PROPERTIES CXX_STANDARD_REQUIRED ON)
7 changes: 7 additions & 0 deletions cmake/onnxruntime_python.cmake
@@ -473,6 +473,9 @@ file(GLOB onnxruntime_python_transformers_models_llama_src CONFIGURE_DEPENDS
file(GLOB onnxruntime_python_transformers_models_longformer_src CONFIGURE_DEPENDS
"${ONNXRUNTIME_ROOT}/python/tools/transformers/models/longformer/*.py"
)
file(GLOB onnxruntime_python_transformers_models_phi2_src CONFIGURE_DEPENDS
"${ONNXRUNTIME_ROOT}/python/tools/transformers/models/phi2/*.py"
)
file(GLOB onnxruntime_python_transformers_models_stable_diffusion_src CONFIGURE_DEPENDS
"${ONNXRUNTIME_ROOT}/python/tools/transformers/models/stable_diffusion/*.py"
)
@@ -543,6 +546,7 @@ add_custom_command(
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/gpt2
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/llama
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/longformer
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/phi2
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/stable_diffusion
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/t5
COMMAND ${CMAKE_COMMAND} -E make_directory $<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/whisper
@@ -646,6 +650,9 @@ add_custom_command(
COMMAND ${CMAKE_COMMAND} -E copy
${onnxruntime_python_transformers_models_longformer_src}
$<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/longformer/
COMMAND ${CMAKE_COMMAND} -E copy
${onnxruntime_python_transformers_models_phi2_src}
$<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/phi2/
COMMAND ${CMAKE_COMMAND} -E copy
${onnxruntime_python_transformers_models_stable_diffusion_src}
$<TARGET_FILE_DIR:${build_output_target}>/onnxruntime/transformers/models/stable_diffusion/
5 changes: 0 additions & 5 deletions cmake/onnxruntime_rocm_hipify.cmake
@@ -44,12 +44,7 @@ set(contrib_ops_excluded_files
"bert/packed_multihead_attention.cc"
"bert/packed_multihead_attention_impl.h"
"bert/packed_multihead_attention_impl.cu"
"diffusion/group_norm.cc"
"diffusion/group_norm_impl.cu"
"diffusion/group_norm_impl.h"
"diffusion/group_norm_impl_kernel.cuh"
"diffusion/group_norm_common_base.h"
"diffusion/group_norm_common_base.cc"
"diffusion/nhwc_conv.cc"
"math/gemm_float8.cc"
"math/gemm_float8.cu"
22 changes: 7 additions & 15 deletions cmake/onnxruntime_unittests.cmake
@@ -111,7 +111,9 @@ function(AddTest)
target_compile_options(${_UT_TARGET} PRIVATE ${DISABLED_WARNINGS_FOR_TVM})
target_compile_options(${_UT_TARGET} PRIVATE "$<$<COMPILE_LANGUAGE:CUDA>:SHELL:--compiler-options -Wno-error=sign-compare>"
"$<$<NOT:$<COMPILE_LANGUAGE:CUDA>>:-Wno-error=sign-compare>")
target_compile_options(${_UT_TARGET} PRIVATE "-Wno-error=uninitialized")
if (${HAS_NOERROR})
target_compile_options(${_UT_TARGET} PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:-Wno-error=uninitialized>")
endif()
endif()

set(TEST_ARGS ${_UT_TEST_ARGS})
@@ -565,11 +567,7 @@ if(onnxruntime_USE_ROCM)
endif()

if(onnxruntime_USE_COREML)
if (CMAKE_SYSTEM_NAME STREQUAL "Darwin" OR CMAKE_SYSTEM_NAME STREQUAL "iOS")
list(APPEND onnxruntime_test_providers_dependencies onnxruntime_providers_coreml coreml_proto)
else()
list(APPEND onnxruntime_test_providers_dependencies onnxruntime_providers_coreml)
endif()
list(APPEND onnxruntime_test_providers_dependencies onnxruntime_providers_coreml coreml_proto)
endif()

if(onnxruntime_USE_ACL)
@@ -674,15 +672,9 @@ endif()

if(onnxruntime_USE_COREML)
list(APPEND onnxruntime_test_framework_src_patterns ${TEST_SRC_DIR}/providers/coreml/*)
if (CMAKE_SYSTEM_NAME STREQUAL "Darwin" OR CMAKE_SYSTEM_NAME STREQUAL "iOS")
list(APPEND onnxruntime_test_framework_libs onnxruntime_providers_coreml coreml_proto)
list(APPEND onnxruntime_test_providers_dependencies onnxruntime_providers_coreml coreml_proto)
list(APPEND onnxruntime_test_providers_libs onnxruntime_providers_coreml coreml_proto)
else()
list(APPEND onnxruntime_test_framework_libs onnxruntime_providers_coreml)
list(APPEND onnxruntime_test_providers_dependencies onnxruntime_providers_coreml)
list(APPEND onnxruntime_test_providers_libs onnxruntime_providers_coreml)
endif()
list(APPEND onnxruntime_test_framework_libs onnxruntime_providers_coreml coreml_proto)
list(APPEND onnxruntime_test_providers_dependencies onnxruntime_providers_coreml coreml_proto)
list(APPEND onnxruntime_test_providers_libs onnxruntime_providers_coreml coreml_proto)
endif()

if(onnxruntime_USE_XNNPACK)
2 changes: 2 additions & 0 deletions cmake/winml.cmake
@@ -827,6 +827,7 @@ if (winml_is_inbox)
get_target_property(compile_options ${target} COMPILE_OPTIONS)
get_target_property(include_directories ${target} INCLUDE_DIRECTORIES)
get_target_property(link_libraries ${target} LINK_LIBRARIES)
get_target_property(link_flags ${target} LINK_FLAGS)
get_target_property(link_options ${target} LINK_OPTIONS)

add_library(${new_target} SHARED ${sources})
@@ -835,6 +836,7 @@ if (winml_is_inbox)
target_compile_options(${new_target} PRIVATE ${compile_options})
target_include_directories(${new_target} PRIVATE ${include_directories})
target_link_libraries(${new_target} PRIVATE ${link_libraries})
set_property(TARGET ${new_target} PROPERTY LINK_FLAGS "${link_flags}")
target_link_options(${new_target} PRIVATE ${link_options})
endfunction()
