Objective: I tried to run an LLM on the NPU using ipex-llm, and got the following error:
python llama3.py --repo-id-or-model-path "meta-llama/Llama-3.2-1B-Instruct" --save-directory "llama-3.2-1B-Inst"
2025-01-29 16:58:29,747 - INFO - Converting model, it may takes up to several minutes ...
2025-01-29 16:58:39,059 - INFO - Finish to convert model
start compiling
Model saved to llama-3.2-1B-Inst\lm_head.xml
start compiling
start compiling
Model saved to llama-3.2-1B-Inst\embedding_post.xml
start compiling
Model saved to llama-3.2-1B-Inst\embedding_post_prefill.xml
start compiling
Model saved to llama-3.2-1B-Inst\decoder_layer_0.xml
start compiling
Model saved to llama-3.2-1B-Inst\decoder_layer_1.xml
start compiling
Traceback (most recent call last):
File "C:\Users\intel\ritu\ipex-llm\python\llm\example\NPU\HF-Transformers-AutoModels\LLM\llama3.py", line 73, in <module>
model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\intel\miniforge3\envs\llm-npu\Lib\unittest\mock.py", line 1378, in patched
return func(*newargs, **newkeywargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\intel\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_model.py", line 243, in from_pretrained
model = cls.optimize_npu_model(*args, **optimize_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\intel\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_model.py", line 317, in optimize_npu_model
optimize_llm_single_process(
File "C:\Users\intel\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_models\convert.py", line 458, in optimize_llm_single_process
convert_llm(model,
File "C:\Users\intel\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\convert_pipeline.py", line 215, in convert_llm
convert_llm_for_deploy(model,
File "C:\Users\intel\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\convert_pipeline.py", line 549, in convert_llm_for_deploy
convert_llama_layer(model, 0, n_splits_linear, n_splits_down_proj,
File "C:\Users\intel\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_pipeline_model\llama.py", line 294, in convert_llama_layer
single_decoder = LowBitLlamaMultiDecoderlayer(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\intel\miniforge3\envs\llm-npu\Lib\site-packages\ipex_llm\transformers\npu_models\llama_mp.py", line 216, in __init__
self.compile(npu_dpu_groups=6)
File "C:\Users\intel\miniforge3\envs\llm-npu\Lib\site-packages\intel_npu_acceleration_library\backend\factory.py", line 1054, in compile
backend_lib.compile(self._mm, npu_dpu_groups)
OSError: [WinError -529697949] Windows Error 0xe06d7363
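For reference, the failing call at llama3.py line 73 boils down to something like the sketch below. The import path and the from_pretrained entry point come straight from the traceback; the keyword names are assumptions mirroring the script's CLI flags, not copied from the shipped example:

    from ipex_llm.transformers.npu_model import AutoModelForCausalLM

    # Minimal reconstruction of the failing call (keyword names are assumed
    # from the CLI flags --repo-id-or-model-path and --save-directory).
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B-Instruct",  # --repo-id-or-model-path
        optimize_model=True,                 # routes into optimize_npu_model per the traceback
        save_directory="llama-3.2-1B-Inst",  # --save-directory for the compiled .xml blobs
    )

Note that the crash surfaces inside backend_lib.compile, and 0xe06d7363 is the generic Windows code for an unhandled C++ exception, so the failure happens in the native NPU compiler rather than in the Python code itself.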
I met the same issue last weekend and finally found a way to get it running: use git to clone the entire repo from Hugging Face to a local drive, e.g. D:\llm-npn\Llama-3.2-1B-Instruct, create another empty folder D:\llm-npn\Llama-3.2-1B-Instruct-lowbit, and then point the script at the local paths (full steps below).
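Spelled out, the steps look like this (the clone URL follows the usual Hugging Face repo pattern, and git-lfs is typically needed to pull the actual weight files; both details are my assumptions, adjust as needed):

    git lfs install
    git clone https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct D:\llm-npn\Llama-3.2-1B-Instruct
    mkdir D:\llm-npn\Llama-3.2-1B-Instruct-lowbit
    python llama3.py --repo-id-or-model-path "D:/llm-npn/Llama-3.2-1B-Instruct" --save-directory "D:/llm-npn/Llama-3.2-1B-Instruct-lowbit"

The key point seems to be passing a local directory instead of the Hugging Face repo id.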
Device: MTL Core Ultra 165U
OS: Windows 11