Eval bug: model is failing an assert error #11476

Open
ad-astra-video opened this issue Jan 29, 2025 · 0 comments
ad-astra-video commented Jan 29, 2025

Name and Version

build: 4524 (6171c9d2) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

DeepSeek R1 Q4_0 GGUF is hitting an assert error that I am not sure how to troubleshoot. I am running with a 32k context and some layers offloaded to GPU.
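
A plausible server invocation, reconstructed from the log output below (the exact command was not posted, so every flag value here is inferred rather than confirmed):

llama-server -m /models/deepseekv3-r1/DeepSeek-R1-Q4_0/DeepSeek-R1-Q4_0.gguf \
    -c 32768 -ngl 7 -t 24 --threads-http 6 \
    --cache-type-k q4_0 --host 0.0.0.0 --port 8008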

Operating systems

Linux

GGML backends

CUDA

Hardware

Threadripper PRO 7965WX, 512 GB RAM (8 DIMMs), RTX 3090 Ti, and 3x RTX A4000

Models

DeepSeek-R1 Q4_0 GGUF

Problem description & steps to reproduce

Assert error at the end of the log output posted below.
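
For context, here is a minimal C sketch of the size check that fails, assuming the meanings suggested by the assert text (nbw3 = bytes of the quantized activations, n_as = number of experts, ne12 = the token/batch dimension); the mmid_row_mapping struct below is a hypothetical stand-in, not the actual ggml definition:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for ggml's internal row-mapping record
   (the real definition lives in ggml-cpu-aarch64.cpp). */
typedef struct { int32_t i1, i2; } mmid_row_mapping;

/* Round x up to a multiple of n (same idea as ggml's GGML_PAD). */
#define GGML_PAD(x, n) (((x) + (n) - 1) & ~((n) - 1))

/* Minimum work-buffer size implied by the failing assertion:
   padded quantized-activation bytes (nbw3), plus one int64 counter
   per expert (n_as), plus one row mapping per (expert, token) pair. */
static size_t min_wsize(size_t nbw3, int64_t n_as, int64_t ne12) {
    return GGML_PAD(nbw3, sizeof(int64_t))
         + (size_t) n_as * sizeof(int64_t)
         + (size_t) n_as * (size_t) ne12 * sizeof(mmid_row_mapping);
}

Since this model has 256 experts, the row-mapping term scales with both the expert count and the batch, so a work buffer sized for a different batch shape could come up short; that is an inference from the assert text, not a confirmed diagnosis.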

First Bad Commit

No response

Relevant log output

Attaching to deep-seek-1
deep-seek-1  | ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
deep-seek-1  | ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
deep-seek-1  | ggml_cuda_init: found 4 CUDA devices:
deep-seek-1  |   Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
deep-seek-1  |   Device 1: NVIDIA RTX A4000, compute capability 8.6, VMM: yes
deep-seek-1  |   Device 2: NVIDIA RTX A4000, compute capability 8.6, VMM: yes
deep-seek-1  |   Device 3: NVIDIA RTX A4000, compute capability 8.6, VMM: yes
deep-seek-1  | build: 4524 (6171c9d2) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
deep-seek-1  | system info: n_threads = 24, n_threads_batch = 24, total_threads = 48
deep-seek-1  |
deep-seek-1  | system_info: n_threads = 24 (n_threads_batch = 24) / 48 | CUDA : ARCHS = 520,610,700,750 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
deep-seek-1  |
deep-seek-1  | main: HTTP server is listening, hostname: 0.0.0.0, port: 8008, http threads: 6
deep-seek-1  | main: loading model
deep-seek-1  | srv    load_model: loading model '/models/deepseekv3-r1/DeepSeek-R1-Q4_0/DeepSeek-R1-Q4_0.gguf'
deep-seek-1  | llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3090 Ti) - 23840 MiB free
deep-seek-1  | llama_model_load_from_file_impl: using device CUDA1 (NVIDIA RTX A4000) - 15804 MiB free
deep-seek-1  | llama_model_load_from_file_impl: using device CUDA2 (NVIDIA RTX A4000) - 15804 MiB free
deep-seek-1  | llama_model_load_from_file_impl: using device CUDA3 (NVIDIA RTX A4000) - 15804 MiB free
deep-seek-1  | llama_model_loader: loaded meta data with 51 key-value pairs and 1025 tensors from /models/deepseekv3-r1/DeepSeek-R1-Q4_0/DeepSeek-R1-Q4_0.gguf (version GGUF V3 (latest))
deep-seek-1  | llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
deep-seek-1  | llama_model_loader: - kv   0:                       general.architecture str              = deepseek2
deep-seek-1  | llama_model_loader: - kv   1:                               general.type str              = model
deep-seek-1  | llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1
deep-seek-1  | llama_model_loader: - kv   3:                         general.size_label str              = 256x20B
deep-seek-1  | llama_model_loader: - kv   4:                               general.tags arr[str,1]       = ["text-generation"]
deep-seek-1  | llama_model_loader: - kv   5:                      deepseek2.block_count u32              = 61
deep-seek-1  | llama_model_loader: - kv   6:                   deepseek2.context_length u32              = 163840
deep-seek-1  | llama_model_loader: - kv   7:                 deepseek2.embedding_length u32              = 7168
deep-seek-1  | llama_model_loader: - kv   8:              deepseek2.feed_forward_length u32              = 18432
deep-seek-1  | llama_model_loader: - kv   9:             deepseek2.attention.head_count u32              = 128
deep-seek-1  | llama_model_loader: - kv  10:          deepseek2.attention.head_count_kv u32              = 128
deep-seek-1  | llama_model_loader: - kv  11:                   deepseek2.rope.freq_base f32              = 10000.000000
deep-seek-1  | llama_model_loader: - kv  12: deepseek2.attention.layer_norm_rms_epsilon f32              = 0.000001
deep-seek-1  | llama_model_loader: - kv  13:                deepseek2.expert_used_count u32              = 8
deep-seek-1  | llama_model_loader: - kv  14:        deepseek2.leading_dense_block_count u32              = 3
deep-seek-1  | llama_model_loader: - kv  15:                       deepseek2.vocab_size u32              = 129280
deep-seek-1  | llama_model_loader: - kv  16:            deepseek2.attention.q_lora_rank u32              = 1536
deep-seek-1  | llama_model_loader: - kv  17:           deepseek2.attention.kv_lora_rank u32              = 512
deep-seek-1  | llama_model_loader: - kv  18:             deepseek2.attention.key_length u32              = 192
deep-seek-1  | llama_model_loader: - kv  19:           deepseek2.attention.value_length u32              = 128
deep-seek-1  | llama_model_loader: - kv  20:       deepseek2.expert_feed_forward_length u32              = 2048
deep-seek-1  | llama_model_loader: - kv  21:                     deepseek2.expert_count u32              = 256
deep-seek-1  | llama_model_loader: - kv  22:              deepseek2.expert_shared_count u32              = 1
deep-seek-1  | llama_model_loader: - kv  23:             deepseek2.expert_weights_scale f32              = 2.500000
deep-seek-1  | llama_model_loader: - kv  24:              deepseek2.expert_weights_norm bool             = true
deep-seek-1  | llama_model_loader: - kv  25:               deepseek2.expert_gating_func u32              = 2
deep-seek-1  | llama_model_loader: - kv  26:             deepseek2.rope.dimension_count u32              = 64
deep-seek-1  | llama_model_loader: - kv  27:                deepseek2.rope.scaling.type str              = yarn
deep-seek-1  | llama_model_loader: - kv  28:              deepseek2.rope.scaling.factor f32              = 40.000000
deep-seek-1  | llama_model_loader: - kv  29: deepseek2.rope.scaling.original_context_length u32              = 4096
deep-seek-1  | llama_model_loader: - kv  30: deepseek2.rope.scaling.yarn_log_multiplier f32              = 0.100000
deep-seek-1  | llama_model_loader: - kv  31:                       tokenizer.ggml.model str              = gpt2
deep-seek-1  | llama_model_loader: - kv  32:                         tokenizer.ggml.pre str              = deepseek-v3
deep-seek-1  | llama_model_loader: - kv  33:                      tokenizer.ggml.tokens arr[str,129280]  = ["<|begin▁of▁sentence|>", "<�...
deep-seek-1  | llama_model_loader: - kv  34:                  tokenizer.ggml.token_type arr[i32,129280]  = [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
deep-seek-1  | llama_model_loader: - kv  35:                      tokenizer.ggml.merges arr[str,127741]  = ["Ġ t", "Ġ a", "i n", "Ġ Ġ", "h e...
deep-seek-1  | llama_model_loader: - kv  36:                tokenizer.ggml.bos_token_id u32              = 0
deep-seek-1  | llama_model_loader: - kv  37:                tokenizer.ggml.eos_token_id u32              = 1
deep-seek-1  | llama_model_loader: - kv  38:            tokenizer.ggml.padding_token_id u32              = 1
deep-seek-1  | llama_model_loader: - kv  39:               tokenizer.ggml.add_bos_token bool             = true
deep-seek-1  | llama_model_loader: - kv  40:               tokenizer.ggml.add_eos_token bool             = false
deep-seek-1  | llama_model_loader: - kv  41:                    tokenizer.chat_template str              = {% if not add_generation_prompt is de...
deep-seek-1  | llama_model_loader: - kv  42:               general.quantization_version u32              = 2
deep-seek-1  | llama_model_loader: - kv  43:                          general.file_type u32              = 2
deep-seek-1  | llama_model_loader: - kv  44:                      quantize.imatrix.file str              = /models_out/DeepSeek-R1-GGUF/DeepSeek...
deep-seek-1  | llama_model_loader: - kv  45:                   quantize.imatrix.dataset str              = /training_data/calibration_datav3.txt
deep-seek-1  | llama_model_loader: - kv  46:             quantize.imatrix.entries_count i32              = 720
deep-seek-1  | llama_model_loader: - kv  47:              quantize.imatrix.chunks_count i32              = 124
deep-seek-1  | llama_model_loader: - kv  48:                                   split.no u16              = 0
deep-seek-1  | llama_model_loader: - kv  49:                        split.tensors.count i32              = 1025
deep-seek-1  | llama_model_loader: - kv  50:                                split.count u16              = 0
deep-seek-1  | llama_model_loader: - type  f32:  361 tensors
deep-seek-1  | llama_model_loader: - type q4_0:  652 tensors
deep-seek-1  | llama_model_loader: - type q4_1:   11 tensors
deep-seek-1  | llama_model_loader: - type q6_K:    1 tensors
deep-seek-1  | print_info: file format = GGUF V3 (latest)
deep-seek-1  | print_info: file type   = Q4_0
deep-seek-1  | print_info: file size   = 353.00 GiB (4.52 BPW)
deep-seek-1  | load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
deep-seek-1  | load: special tokens cache size = 818
deep-seek-1  | load: token to piece cache size = 0.8223 MB
deep-seek-1  | print_info: arch             = deepseek2
deep-seek-1  | print_info: vocab_only       = 0
deep-seek-1  | print_info: n_ctx_train      = 163840
deep-seek-1  | print_info: n_embd           = 7168
deep-seek-1  | print_info: n_layer          = 61
deep-seek-1  | print_info: n_head           = 128
deep-seek-1  | print_info: n_head_kv        = 128
deep-seek-1  | print_info: n_rot            = 64
deep-seek-1  | print_info: n_swa            = 0
deep-seek-1  | print_info: n_embd_head_k    = 192
deep-seek-1  | print_info: n_embd_head_v    = 128
deep-seek-1  | print_info: n_gqa            = 1
deep-seek-1  | print_info: n_embd_k_gqa     = 24576
deep-seek-1  | print_info: n_embd_v_gqa     = 16384
deep-seek-1  | print_info: f_norm_eps       = 0.0e+00
deep-seek-1  | print_info: f_norm_rms_eps   = 1.0e-06
deep-seek-1  | print_info: f_clamp_kqv      = 0.0e+00
deep-seek-1  | print_info: f_max_alibi_bias = 0.0e+00
deep-seek-1  | print_info: f_logit_scale    = 0.0e+00
deep-seek-1  | print_info: n_ff             = 18432
deep-seek-1  | print_info: n_expert         = 256
deep-seek-1  | print_info: n_expert_used    = 8
deep-seek-1  | print_info: causal attn      = 1
deep-seek-1  | print_info: pooling type     = 0
deep-seek-1  | print_info: rope type        = 0
deep-seek-1  | print_info: rope scaling     = yarn
deep-seek-1  | print_info: freq_base_train  = 10000.0
deep-seek-1  | print_info: freq_scale_train = 0.025
deep-seek-1  | print_info: n_ctx_orig_yarn  = 4096
deep-seek-1  | print_info: rope_finetuned   = unknown
deep-seek-1  | print_info: ssm_d_conv       = 0
deep-seek-1  | print_info: ssm_d_inner      = 0
deep-seek-1  | print_info: ssm_d_state      = 0
deep-seek-1  | print_info: ssm_dt_rank      = 0
deep-seek-1  | print_info: ssm_dt_b_c_rms   = 0
deep-seek-1  | print_info: model type       = 671B
deep-seek-1  | print_info: model params     = 671.03 B
deep-seek-1  | print_info: general.name     = DeepSeek R1
deep-seek-1  | print_info: n_layer_dense_lead   = 3
deep-seek-1  | print_info: n_lora_q             = 1536
deep-seek-1  | print_info: n_lora_kv            = 512
deep-seek-1  | print_info: n_ff_exp             = 2048
deep-seek-1  | print_info: n_expert_shared      = 1
deep-seek-1  | print_info: expert_weights_scale = 2.5
deep-seek-1  | print_info: expert_weights_norm  = 1
deep-seek-1  | print_info: expert_gating_func   = sigmoid
deep-seek-1  | print_info: rope_yarn_log_mul    = 0.1000
deep-seek-1  | print_info: vocab type       = BPE
deep-seek-1  | print_info: n_vocab          = 129280
deep-seek-1  | print_info: n_merges         = 127741
deep-seek-1  | print_info: BOS token        = 0 '<|begin▁of▁sentence|>'
deep-seek-1  | print_info: EOS token        = 1 '<|end▁of▁sentence|>'
deep-seek-1  | print_info: EOT token        = 1 '<|end▁of▁sentence|>'
deep-seek-1  | print_info: PAD token        = 1 '<|end▁of▁sentence|>'
deep-seek-1  | print_info: LF token         = 131 'Ä'
deep-seek-1  | print_info: FIM PRE token    = 128801 '<|fim▁begin|>'
deep-seek-1  | print_info: FIM SUF token    = 128800 '<|fim▁hole|>'
deep-seek-1  | print_info: FIM MID token    = 128802 '<|fim▁end|>'
deep-seek-1  | print_info: EOG token        = 1 '<|end▁of▁sentence|>'
deep-seek-1  | print_info: max token length = 256
deep-seek-1  | load_tensors: offloading 7 repeating layers to GPU
deep-seek-1  | load_tensors: offloaded 7/62 layers to GPU
deep-seek-1  | load_tensors:        CUDA0 model buffer size =  6179.06 MiB
deep-seek-1  | load_tensors:        CUDA1 model buffer size = 12358.12 MiB
deep-seek-1  | load_tensors:        CUDA2 model buffer size = 12358.12 MiB
deep-seek-1  | load_tensors:        CUDA3 model buffer size = 12358.12 MiB
deep-seek-1  | load_tensors:  CPU_AARCH64 model buffer size = 307402.66 MiB
deep-seek-1  | load_tensors:   CPU_Mapped model buffer size = 316192.55 MiB
deep-seek-1  | llama_init_from_model: n_seq_max     = 1
deep-seek-1  | llama_init_from_model: n_ctx         = 32768
deep-seek-1  | llama_init_from_model: n_ctx_per_seq = 32768
deep-seek-1  | llama_init_from_model: n_batch       = 2048
deep-seek-1  | llama_init_from_model: n_ubatch      = 512
deep-seek-1  | llama_init_from_model: flash_attn    = 0
deep-seek-1  | llama_init_from_model: freq_base     = 10000.0
deep-seek-1  | llama_init_from_model: freq_scale    = 0.025
deep-seek-1  | llama_init_from_model: n_ctx_per_seq (32768) < n_ctx_train (163840) -- the full capacity of the model will not be utilized
deep-seek-1  | llama_kv_cache_init: kv_size = 32768, offload = 0, type_k = 'q4_0', type_v = 'f16', n_layer = 61, can_shift = 0
deep-seek-1  | llama_kv_cache_init:        CPU KV buffer size = 88816.00 MiB
deep-seek-1  | llama_init_from_model: KV self size  = 88816.00 MiB, K (q4_0): 26352.00 MiB, V (f16): 62464.00 MiB
deep-seek-1  | llama_init_from_model:        CPU  output buffer size =     0.49 MiB
deep-seek-1  | llama_init_from_model:      CUDA0 compute buffer size =  2451.50 MiB
deep-seek-1  | llama_init_from_model:      CUDA1 compute buffer size =   234.00 MiB
deep-seek-1  | llama_init_from_model:      CUDA2 compute buffer size =   234.00 MiB
deep-seek-1  | llama_init_from_model:      CUDA3 compute buffer size =   234.00 MiB
deep-seek-1  | llama_init_from_model:  CUDA_Host compute buffer size =  8385.01 MiB
deep-seek-1  | llama_init_from_model: graph nodes  = 5025
deep-seek-1  | llama_init_from_model: graph splits = 790 (with bs=512), 20 (with bs=1)
deep-seek-1  | common_init_from_params: KV cache shifting is not supported for this model, disabling KV cache shifting
deep-seek-1  | common_init_from_params: setting dry_penalty_last_n to ctx_size = 32768
deep-seek-1  | common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
deep-seek-1  | srv          init: initializing slots, n_slots = 1
deep-seek-1  | slot         init: id  0 | task -1 | new slot n_ctx_slot = 32768
deep-seek-1  | main: model loaded
deep-seek-1  | main: chat template, chat_template: {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' in message %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if message['content'] is none %}{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '' + '\n' + tool['function']['arguments'] + '\n' + '' + '<|tool▁call▁end|>'}}{%- else %}{{'<|Assistant|>' + message['content'] + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '' + '\n' + tool['function']['arguments'] + '\n' + '' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '' + '\n' + tool['function']['arguments'] + '\n' + '' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and 'tool_calls' not in message %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}, example_format: 'You are a helpful assistant
deep-seek-1  |
deep-seek-1  | <|User|>Hello<|Assistant|>Hi there<|end▁of▁sentence|><|User|>How are you?<|Assistant|>'
deep-seek-1  | main: server is listening on http://0.0.0.0:8008 - starting the main loop
deep-seek-1  | srv  update_slots: all slots are idle
deep-seek-1  | request: GET / 192.168.50.107 200
deep-seek-1  | slot launch_slot_: id  0 | task 0 | processing task
deep-seek-1  | slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 72
deep-seek-1  | slot update_slots: id  0 | task 0 | kv cache rm [0, end)
deep-seek-1  | slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 72, n_tokens = 72, progress = 1.000000
deep-seek-1  | slot update_slots: id  0 | task 0 | prompt done, n_past = 72, n_tokens = 72
deep-seek-1  | ggml_backend_cuda_buffer_type_alloc_buffer: allocating 9866.00 MiB on device 1: cudaMalloc failed: out of memory
deep-seek-1  | ggml_gallocr_reserve_n: failed to allocate CUDA1 buffer of size 10345253120
deep-seek-1  | llama_kv_cache_update_impl: failed to allocate compute buffers
deep-seek-1  | [the three lines above repeat 64 more times while generation continues]
deep-seek-1  | slot      release: id  0 | task 0 | stop processing: n_past = 1756, truncated = 0
deep-seek-1  | slot print_timing: id  0 | task 0 |
deep-seek-1  | prompt eval time =   19560.97 ms /    72 tokens (  271.68 ms per token,     3.68 tokens per second)
deep-seek-1  |        eval time =  267987.37 ms /  1685 tokens (  159.04 ms per token,     6.29 tokens per second)
deep-seek-1  |       total time =  287548.34 ms /  1757 tokens
deep-seek-1  | srv  update_slots: all slots are idle
deep-seek-1  | request: POST /v1/chat/completions 192.168.50.107 200
deep-seek-1  | slot launch_slot_: id  0 | task 1686 | processing task
deep-seek-1  | slot update_slots: id  0 | task 1686 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 55
deep-seek-1  | slot update_slots: id  0 | task 1686 | kv cache rm [8, end)
deep-seek-1  | slot update_slots: id  0 | task 1686 | prompt processing progress, n_past = 55, n_tokens = 47, progress = 0.854545
deep-seek-1  | slot update_slots: id  0 | task 1686 | prompt done, n_past = 55, n_tokens = 47
deep-seek-1  | /app/ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp:4012: GGML_ASSERT(params->wsize >= (GGML_PAD(nbw3, sizeof(int64_t)) + n_as * sizeof(int64_t) + n_as * ne12 * sizeof(mmid_row_mapping))) failed
deep-seek-1  | [the assert fires on every compute thread, so the line above repeats with interleaved, partially garbled output]