sync : llama.cpp #1006

Merged: 19 commits merged into master from the sync branch on Nov 4, 2024
Conversation

@ggerganov (Owner) commented Nov 4, 2024

TODO:

  • fix build and tests

ggerganov and others added 18 commits on November 4, 2024 at 10:50

… MobileVLM model. (llama/9763)

* ggml: add a POOL2D op for GPU acceleration to the Vulkan backend.

- The MobileVLM model now supports GPU-accelerated inference via the Vulkan backend.
- A GGML_OP_POOL_2D (pooling) shader has been added.
- The CLIP model's encoding time improved from 2.8 s on the CPU to 0.7 s on the GPU.

Signed-off-by: Changyeon Kim <[email protected]>

* [fix] Correct the order of the parameters.

Fix casting to int.

Signed-off-by: Changyeon Kim <[email protected]>

---------

Signed-off-by: Changyeon Kim <[email protected]>
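
A minimal sketch of building a graph node with the GGML_OP_POOL_2D operator that the commit above adds a Vulkan shader for. The `ggml_pool_2d` call is part of the existing public ggml API; the pooling parameters below (2x2 average pooling) are illustrative, not the values used by MobileVLM.

```c
#include "ggml.h"

// img: a [W, H, C, N] feature map already created in ctx
static struct ggml_tensor * build_pool2d(struct ggml_context * ctx, struct ggml_tensor * img) {
    // 2x2 average pooling, stride 2, no padding (illustrative parameters)
    return ggml_pool_2d(ctx, img, GGML_OP_POOL_AVG,
                        /*k0 =*/ 2, /*k1 =*/ 2,
                        /*s0 =*/ 2, /*s1 =*/ 2,
                        /*p0 =*/ 0, /*p1 =*/ 0);
}
```

Whether the resulting node actually runs on the GPU depends on which backend the graph is scheduled to; with this commit the Vulkan backend can execute it instead of falling back to the CPU.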
* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM
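
The RVV commits above all follow the same vector programming model. As a point of reference, here is a minimal sketch of a plain f32 dot product written with the RISC-V vector intrinsics; it is not the q4_0_8x8 kernel from this sync, only an illustration of the loop structure (vector loads, fused multiply-accumulate into a vector accumulator, final reduction) that such kernels are built on. Assumes a compiler with RVV intrinsics support, e.g. `-march=rv64gcv`.

```c
#include <riscv_vector.h>
#include <stddef.h>

static float dot_f32_rvv(const float * x, const float * y, size_t n) {
    const size_t vl = __riscv_vsetvlmax_e32m1();           // elements per vector register
    vfloat32m1_t acc = __riscv_vfmv_v_f_f32m1(0.0f, vl);   // vector accumulator

    size_t i = 0;
    for (; i + vl <= n; i += vl) {
        vfloat32m1_t vx = __riscv_vle32_v_f32m1(x + i, vl);
        vfloat32m1_t vy = __riscv_vle32_v_f32m1(y + i, vl);
        acc = __riscv_vfmacc_vv_f32m1(acc, vx, vy, vl);     // acc += vx * vy
    }

    // horizontal sum of the vector accumulator
    vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, vl);
    float sum = __riscv_vfmv_f_s_f32m1_f32(
            __riscv_vfredusum_vs_f32m1_f32m1(acc, zero, vl));

    for (; i < n; ++i) {  // scalar tail for the remaining elements
        sum += x[i] * y[i];
    }
    return sum;
}
```

The "register spillover" commit above refers to keeping the working set of such a kernel within the available vector registers, so the compiler does not spill accumulators to the stack in the inner loop.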
* ggml : fix gguf string leak when reading kv pairs fails

* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type

* ggml : avoid crashing on failed memory allocations when loading a gguf file
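
A minimal sketch of loading a gguf file defensively, in line with the fixes above: with these changes a malformed KV pair or a failed allocation makes `gguf_init_from_file` return NULL instead of aborting, so the caller should check the result. Uses the public gguf API declared in `ggml.h`.

```c
#include "ggml.h"
#include <stdbool.h>
#include <stddef.h>

static bool try_load_gguf_metadata(const char * fname) {
    struct ggml_context * meta = NULL;
    struct gguf_init_params params = {
        /*.no_alloc =*/ true,   // only read metadata (KV pairs and tensor info)
        /*.ctx      =*/ &meta,
    };

    struct gguf_context * gguf = gguf_init_from_file(fname, params);
    if (gguf == NULL) {
        // invalid KV types and allocation failures now surface here
        // instead of crashing inside the loader
        return false;
    }

    // ... inspect KV pairs / tensor info via the gguf_get_* functions ...

    gguf_free(gguf);
    ggml_free(meta);
    return true;
}
```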
Get in line with the other backends by supporting the newer
backend/device registry interfaces.

Signed-off-by: Sergio Lopez <[email protected]>
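
For context, the newer registry interface mentioned above lets callers enumerate devices generically instead of calling a backend-specific init function. A minimal sketch, assuming the `ggml-backend.h` device API that the other backends already implement:

```c
#include "ggml-backend.h"
#include <stdio.h>

int main(void) {
    // list every device exposed through the backend registry
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s - %s\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }

    // initialize the first device with default parameters
    if (ggml_backend_dev_count() > 0) {
        ggml_backend_t backend = ggml_backend_dev_init(ggml_backend_dev_get(0), NULL);
        if (backend != NULL) {
            ggml_backend_free(backend);
        }
    }
    return 0;
}
```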
This is a more or less direct translation from the Metal implementation
to GLSL.

Signed-off-by: Sergio Lopez <[email protected]>
* llama : fix buffer checks for mamba and rwkv

* llama : fix missing worst case flag during reserve

* cuda : fix supports_op for norm

* disable sched SET_CAUSE
* llama : add simple-chat example

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
* metal : minor fixup in FA kernel

ggml-ci

* metal : use the unrolled loop variable

* metal : remove unused var
@ggerganov changed the title from "sync : llam.cpp" to "sync : llama.cpp" on Nov 4, 2024
@slaren (Collaborator) commented Nov 4, 2024

The test-opt should just be disabled until it is updated in #988; since the opt interface has been removed, it cannot be updated here.

Looks like other tests are failing too; I will update them.

@slaren (Collaborator) commented Nov 4, 2024

I disabled all tests and examples that depend on ggml_opt. They should be re-enabled or removed in #988.

@ggerganov ggerganov marked this pull request as ready for review November 4, 2024 17:37
@ggerganov ggerganov merged commit f3c1e6a into master Nov 4, 2024
4 checks passed
@ggerganov ggerganov deleted the sync branch November 4, 2024 17:42