Name and Version

llama.cpp build 4524 (6171c9d2), built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu, running llama-server in Docker (taken from the log below).

Operating systems

Linux

GGML backends

CUDA

Hardware

Threadripper PRO 7965WX, 512 GB RAM across 8 DIMMs, one RTX 3090 Ti and three RTX A4000s.

Models

DeepSeek-R1 Q4_0 GGUF

Problem description & steps to reproduce

DeepSeek R1 Q4_0 GGUF is hitting an assert error that I am not sure how to troubleshoot. I am running with 32k context and some layers offloaded to the GPUs. The first request completes (while repeatedly logging CUDA out-of-memory messages); the assert fires on the second request, at the end of the log output below. A command line consistent with the log is sketched below.
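The exact launch command was not captured in this report, but an invocation consistent with the settings in the log (32k context, 7 layers offloaded, q4_0 K cache, KV cache kept in system RAM, 24 threads, port 8008) would look roughly like the sketch below; treat every flag value as inferred from the log rather than authoritative:

```sh
# Sketch only: flags reconstructed from the log output, not the actual command.
# 32k context, 7/62 layers offloaded, q4_0 K cache, KV cache left in system RAM
# (offload = 0 in the log), 24 threads, listening on 0.0.0.0:8008.
./llama-server \
  -m /models/deepseekv3-r1/DeepSeek-R1-Q4_0/DeepSeek-R1-Q4_0.gguf \
  --ctx-size 32768 \
  --n-gpu-layers 7 \
  --cache-type-k q4_0 \
  --no-kv-offload \
  --threads 24 \
  --host 0.0.0.0 \
  --port 8008
```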
First Bad Commit

No response
Relevant log output

```
Attaching to deep-seek-1
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 4 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
  Device 1: NVIDIA RTX A4000, compute capability 8.6, VMM: yes
  Device 2: NVIDIA RTX A4000, compute capability 8.6, VMM: yes
  Device 3: NVIDIA RTX A4000, compute capability 8.6, VMM: yes
build: 4524 (6171c9d2) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
system info: n_threads = 24, n_threads_batch = 24, total_threads = 48

system_info: n_threads = 24 (n_threads_batch = 24) / 48 | CUDA : ARCHS = 520,610,700,750 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

main: HTTP server is listening, hostname: 0.0.0.0, port: 8008, http threads: 6
main: loading model
srv load_model: loading model '/models/deepseekv3-r1/DeepSeek-R1-Q4_0/DeepSeek-R1-Q4_0.gguf'
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090 Ti) - 23840 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA RTX A4000) - 15804 MiB free
llama_model_load_from_file_impl: using device CUDA2 (NVIDIA RTX A4000) - 15804 MiB free
llama_model_load_from_file_impl: using device CUDA3 (NVIDIA RTX A4000) - 15804 MiB free
llama_model_loader: loaded meta data with 51 key-value pairs and 1025 tensors from /models/deepseekv3-r1/DeepSeek-R1-Q4_0/DeepSeek-R1-Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv  0: general.architecture str = deepseek2
llama_model_loader: - kv  1: general.type str = model
llama_model_loader: - kv  2: general.name str = DeepSeek R1
llama_model_loader: - kv  3: general.size_label str = 256x20B
llama_model_loader: - kv  4: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv  5: deepseek2.block_count u32 = 61
llama_model_loader: - kv  6: deepseek2.context_length u32 = 163840
llama_model_loader: - kv  7: deepseek2.embedding_length u32 = 7168
llama_model_loader: - kv  8: deepseek2.feed_forward_length u32 = 18432
llama_model_loader: - kv  9: deepseek2.attention.head_count u32 = 128
llama_model_loader: - kv 10: deepseek2.attention.head_count_kv u32 = 128
llama_model_loader: - kv 11: deepseek2.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 12: deepseek2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 13: deepseek2.expert_used_count u32 = 8
llama_model_loader: - kv 14: deepseek2.leading_dense_block_count u32 = 3
llama_model_loader: - kv 15: deepseek2.vocab_size u32 = 129280
llama_model_loader: - kv 16: deepseek2.attention.q_lora_rank u32 = 1536
llama_model_loader: - kv 17: deepseek2.attention.kv_lora_rank u32 = 512
llama_model_loader: - kv 18: deepseek2.attention.key_length u32 = 192
llama_model_loader: - kv 19: deepseek2.attention.value_length u32 = 128
llama_model_loader: - kv 20: deepseek2.expert_feed_forward_length u32 = 2048
llama_model_loader: - kv 21: deepseek2.expert_count u32 = 256
llama_model_loader: - kv 22: deepseek2.expert_shared_count u32 = 1
llama_model_loader: - kv 23: deepseek2.expert_weights_scale f32 = 2.500000
llama_model_loader: - kv 24: deepseek2.expert_weights_norm bool = true
llama_model_loader: - kv 25: deepseek2.expert_gating_func u32 = 2
llama_model_loader: - kv 26: deepseek2.rope.dimension_count u32 = 64
llama_model_loader: - kv 27: deepseek2.rope.scaling.type str = yarn
llama_model_loader: - kv 28: deepseek2.rope.scaling.factor f32 = 40.000000
llama_model_loader: - kv 29: deepseek2.rope.scaling.original_context_length u32 = 4096
llama_model_loader: - kv 30: deepseek2.rope.scaling.yarn_log_multiplier f32 = 0.100000
llama_model_loader: - kv 31: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 32: tokenizer.ggml.pre str = deepseek-v3
llama_model_loader: - kv 33: tokenizer.ggml.tokens arr[str,129280] = ["<|begin▁of▁sentence|>", "<�...
llama_model_loader: - kv 34: tokenizer.ggml.token_type arr[i32,129280] = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 35: tokenizer.ggml.merges arr[str,127741] = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
llama_model_loader: - kv 36: tokenizer.ggml.bos_token_id u32 = 0
llama_model_loader: - kv 37: tokenizer.ggml.eos_token_id u32 = 1
llama_model_loader: - kv 38: tokenizer.ggml.padding_token_id u32 = 1
llama_model_loader: - kv 39: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 40: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 41: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 42: general.quantization_version u32 = 2
llama_model_loader: - kv 43: general.file_type u32 = 2
llama_model_loader: - kv 44: quantize.imatrix.file str = /models_out/DeepSeek-R1-GGUF/DeepSeek...
llama_model_loader: - kv 45: quantize.imatrix.dataset str = /training_data/calibration_datav3.txt
llama_model_loader: - kv 46: quantize.imatrix.entries_count i32 = 720
llama_model_loader: - kv 47: quantize.imatrix.chunks_count i32 = 124
llama_model_loader: - kv 48: split.no u16 = 0
llama_model_loader: - kv 49: split.tensors.count i32 = 1025
llama_model_loader: - kv 50: split.count u16 = 0
llama_model_loader: - type f32: 361 tensors
llama_model_loader: - type q4_0: 652 tensors
llama_model_loader: - type q4_1: 11 tensors
llama_model_loader: - type q6_K: 1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_0
print_info: file size = 353.00 GiB (4.52 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 818
load: token to piece cache size = 0.8223 MB
print_info: arch = deepseek2
print_info: vocab_only = 0
print_info: n_ctx_train = 163840
print_info: n_embd = 7168
print_info: n_layer = 61
print_info: n_head = 128
print_info: n_head_kv = 128
print_info: n_rot = 64
print_info: n_swa = 0
print_info: n_embd_head_k = 192
print_info: n_embd_head_v = 128
print_info: n_gqa = 1
print_info: n_embd_k_gqa = 24576
print_info: n_embd_v_gqa = 16384
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 18432
print_info: n_expert = 256
print_info: n_expert_used = 8
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = yarn
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 0.025
print_info: n_ctx_orig_yarn = 4096
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 671B
print_info: model params = 671.03 B
print_info: general.name = DeepSeek R1
print_info: n_layer_dense_lead = 3
print_info: n_lora_q = 1536
print_info: n_lora_kv = 512
print_info: n_ff_exp = 2048
print_info: n_expert_shared = 1
print_info: expert_weights_scale = 2.5
print_info: expert_weights_norm = 1
print_info: expert_gating_func = sigmoid
print_info: rope_yarn_log_mul = 0.1000
print_info: vocab type = BPE
print_info: n_vocab = 129280
print_info: n_merges = 127741
print_info: BOS token = 0 '<|begin▁of▁sentence|>'
print_info: EOS token = 1 '<|end▁of▁sentence|>'
print_info: EOT token = 1 '<|end▁of▁sentence|>'
print_info: PAD token = 1 '<|end▁of▁sentence|>'
print_info: LF token = 131 'Ä'
print_info: FIM PRE token = 128801 '<|fim▁begin|>'
print_info: FIM SUF token = 128800 '<|fim▁hole|>'
print_info: FIM MID token = 128802 '<|fim▁end|>'
print_info: EOG token = 1 '<|end▁of▁sentence|>'
print_info: max token length = 256
load_tensors: offloading 7 repeating layers to GPU
load_tensors: offloaded 7/62 layers to GPU
load_tensors: CUDA0 model buffer size = 6179.06 MiB
load_tensors: CUDA1 model buffer size = 12358.12 MiB
load_tensors: CUDA2 model buffer size = 12358.12 MiB
load_tensors: CUDA3 model buffer size = 12358.12 MiB
load_tensors: CPU_AARCH64 model buffer size = 307402.66 MiB
load_tensors: CPU_Mapped model buffer size = 316192.55 MiB
llama_init_from_model: n_seq_max = 1
llama_init_from_model: n_ctx = 32768
llama_init_from_model: n_ctx_per_seq = 32768
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 0
llama_init_from_model: freq_base = 10000.0
llama_init_from_model: freq_scale = 0.025
llama_init_from_model: n_ctx_per_seq (32768) < n_ctx_train (163840) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 32768, offload = 0, type_k = 'q4_0', type_v = 'f16', n_layer = 61, can_shift = 0
llama_kv_cache_init: CPU KV buffer size = 88816.00 MiB
llama_init_from_model: KV self size = 88816.00 MiB, K (q4_0): 26352.00 MiB, V (f16): 62464.00 MiB
llama_init_from_model: CPU output buffer size = 0.49 MiB
llama_init_from_model: CUDA0 compute buffer size = 2451.50 MiB
llama_init_from_model: CUDA1 compute buffer size = 234.00 MiB
llama_init_from_model: CUDA2 compute buffer size = 234.00 MiB
llama_init_from_model: CUDA3 compute buffer size = 234.00 MiB
llama_init_from_model: CUDA_Host compute buffer size = 8385.01 MiB
llama_init_from_model: graph nodes = 5025
llama_init_from_model: graph splits = 790 (with bs=512), 20 (with bs=1)
common_init_from_params: KV cache shifting is not supported for this model, disabling KV cache shifting
common_init_from_params: setting dry_penalty_last_n to ctx_size = 32768
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
srv init: initializing slots, n_slots = 1
slot init: id 0 | task -1 | new slot n_ctx_slot = 32768
main: model loaded
main: chat template, chat_template: {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '' + '\n' + tool['function']['arguments'] + '\n' + '' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '' + '\n' + tool['function']['arguments'] + '\n' + '' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '' + '\n' + tool['function']['arguments'] + '\n' + '' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}, example_format: 'You are a helpful assistant

<|User|>Hello<|Assistant|>Hi there<|end▁of▁sentence|><|User|>How are you?<|Assistant|>'
main: server is listening on http://0.0.0.0:8008 - starting the main loop
srv update_slots: all slots are idle
request: GET / 192.168.50.107 200
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 72
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 72, n_tokens = 72, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 72, n_tokens = 72
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 9866.00 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA1 buffer of size 10345253120
llama_kv_cache_update_impl: failed to allocate compute buffers
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 9866.00 MiB on device 1: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA1 buffer of size 10345253120
llama_kv_cache_update_impl: failed to allocate compute buffers
[... the same three messages repeat several dozen more times while the first response generates ...]
slot release: id 0 | task 0 | stop processing: n_past = 1756, truncated = 0
slot print_timing: id 0 | task 0 |
prompt eval time =  19560.97 ms /   72 tokens (  271.68 ms per token,  3.68 tokens per second)
       eval time = 267987.37 ms / 1685 tokens (  159.04 ms per token,  6.29 tokens per second)
      total time = 287548.34 ms / 1757 tokens
srv update_slots: all slots are idle
request: POST /v1/chat/completions 192.168.50.107 200
slot launch_slot_: id 0 | task 1686 | processing task
slot update_slots: id 0 | task 1686 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 55
slot update_slots: id 0 | task 1686 | kv cache rm [8, end)
slot update_slots: id 0 | task 1686 | prompt processing progress, n_past = 55, n_tokens = 47, progress = 0.854545
slot update_slots: id 0 | task 1686 | prompt done, n_past = 55, n_tokens = 47
/app/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4012: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed
[... the same GGML_ASSERT failure is printed by every worker thread, interleaved ...]
```
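My reading of the failing check, going only by the assert text itself (I have not traced the surrounding ggml source), is that the repacked CPU path for the expert matrix multiply (the AARCH64_REPACK / CPU_AARCH64 buffers in the log) needs a work buffer large enough for the quantized activations plus per-expert row bookkeeping, and the buffer it actually received is smaller. Paraphrased, with the names taken verbatim from the assert:

```cpp
// Illustrative paraphrase of the GGML_ASSERT at ggml-cpu-aarch64.cpp:4012.
// nbw3, n_as, ne12, mmid_row_mapping, and params->wsize come from the assert
// text; the comments are my interpretation, not the actual ggml implementation.
size_t needed = GGML_PAD(nbw3, sizeof(int64_t))           // quantized src1 data, padded
              + n_as * sizeof(int64_t)                    // one counter per expert matrix
              + n_as * ne12 * sizeof(mmid_row_mapping);   // expert-to-row mapping table
GGML_ASSERT(params->wsize >= needed);                     // fails: scratch buffer too small
```

If that reading is right, the work-buffer size planned for this op disagrees with the size required at execution time, which would point at the CPU repack path rather than at the earlier CUDA out-of-memory messages (generation continued for 1685 tokens after those).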