[CANN] Any possibility for the Unsloth dynamic quant of R1 to work on Ascend cards? #11698

Dango233 · 2025-02-06T07:11:17Z

Ascend NPUs seem to be a great alternative (to a Mac Studio or an EPYC build) for running quantized R1.
For example, the Atlas 300I Duo offers 140 TFLOPS FP16, 408 GB/s of memory bandwidth, and 96 GB of VRAM.
Two of these cards in a single PC could run the quantized 671B R1 relatively well, I would say.
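
A rough back-of-envelope check of that claim, as a sketch only. It uses the card figures quoted above plus DeepSeek-R1's published 671B total / 37B activated parameters per token. The bits-per-weight values are illustrative assumptions, not measured averages of any particular Unsloth file, and treating 408 GB/s as the bandwidth seen by a single decode stream is also an assumption. The throughput number is a memory-bandwidth ceiling that ignores KV cache reads, compute, and inter-card transfers, so real speeds would be lower.

```python
# Back-of-envelope sizing for a quantized MoE model on two cards.
# All bits-per-weight values below are illustrative assumptions.

TOTAL_PARAMS = 671e9    # DeepSeek-R1 total parameters
ACTIVE_PARAMS = 37e9    # parameters activated per token (MoE routing)
CARD_VRAM_GB = 96       # per-card memory quoted in the post
CARD_BW_GB_S = 408      # per-card memory bandwidth quoted in the post
NUM_CARDS = 2


def quant_size_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def decode_tokens_per_s_upper_bound(
    active_params: float, bits_per_weight: float, bandwidth_gb_s: float
) -> float:
    """Ceiling on single-stream decode speed if every active weight
    is read from memory once per generated token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    budget_gb = NUM_CARDS * CARD_VRAM_GB
    for bpw in (1.58, 2.0, 2.5):
        size = quant_size_gb(TOTAL_PARAMS, bpw)
        ceiling = decode_tokens_per_s_upper_bound(ACTIVE_PARAMS, bpw, CARD_BW_GB_S)
        verdict = "fits" if size <= budget_gb else "does not fit"
        print(
            f"{bpw:.2f} bpw: ~{size:.0f} GB of weights, {verdict} in "
            f"{budget_gb} GB; bandwidth ceiling ~{ceiling:.0f} tok/s"
        )
```

Under these assumptions the lowest-bit variants leave headroom for KV cache in 192 GB, while anything around 2.5 bits per weight would not fit in memory alone.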

However, as shown in https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/CANN.md, there is no DeepSeek architecture support yet, and low-bit quantization does not seem to have been validated yet.

@hipudding Do you have any plans to port low-bit quantized R1 to Ascend cards via the gguf-cann backend?

That seems like a pretty valid use case to me...

Replies: 0 comments

Select a reply

[CANN] Any possibility for the Unsloth dynamic quant of R1 to work on Ascend cards? #11698

Dango233 Feb 6, 2025

Replies: 0 comments

Dango233
Feb 6, 2025