How to (very slowly) run high-precision R1 quant on M2 Ultra with most of model being on SSD? #11680
okuvshynov asked this question in Q&A (unanswered, 0 replies)
Setup:
What is the right way to configure llama.cpp (either cli or server) to do roughly the following:
It's fine if it's slow; I'm OK with 1-2 tokens per minute.
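(The setup details and requirement list from the original post did not survive extraction. As a sketch of one plausible direction, not a confirmed answer: llama.cpp memory-maps GGUF files by default, so the OS can page weights in from SSD on demand, which is what lets a model larger than RAM run at all. The model path below is a placeholder, and the exact flag behavior should be checked against the current llama.cpp documentation.)

```shell
# Sketch only, not a confirmed answer. Relies on llama.cpp's default
# mmap behavior: weights are memory-mapped from the GGUF file, so the
# OS pages them in from SSD on demand instead of loading everything
# into RAM up front. The model path is a placeholder.
#
# -ngl 0  : keep all layers on CPU so Metal does not try to allocate
#           the whole model in unified memory
# -c 2048 : modest context size to keep the KV cache small
#
# Note: do NOT pass --no-mmap or --mlock; both force the weights to be
# resident in RAM, which defeats paging from SSD.
./llama-cli -m /path/to/DeepSeek-R1-quant.gguf -ngl 0 -c 2048 -p "Hello"
```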
Thank you!