sync : llama.cpp #1006

Merged: 19 commits merged into master from the sync branch on Nov 4, 2024
Conversation

@ggerganov (Owner) commented Nov 4, 2024

TODO:

  • fix build and tests

ggerganov and others added 18 commits on November 4, 2024 at 10:50

… MobileVLM model. (llama/9763)

* ggml: add a POOL2D op for GPU acceleration to the Vulkan backend.

- The MobileVLM model now supports GPU-accelerated inference via the Vulkan backend.
- A GGML_OP_POOL_2D (pooling) shader has been added.
- The CLIP model's encoding time improved from 2.8 s on the CPU to 0.7 s on the GPU.

Signed-off-by: Changyeon Kim <[email protected]>

* [fix] Correct the order of the parameters.

Fix casting to int.

Signed-off-by: Changyeon Kim <[email protected]>

---------

Signed-off-by: Changyeon Kim <[email protected]>
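
A minimal sketch of building a graph node with the GGML_OP_POOL_2D operator that the commit above adds a Vulkan shader for. The `ggml_pool_2d` call is part of the existing public ggml API; the pooling parameters below (2x2 average pooling) are illustrative, not the values used by MobileVLM.

```c
#include "ggml.h"

// img: a [W, H, C, N] feature map already created in ctx
static struct ggml_tensor * build_pool2d(struct ggml_context * ctx, struct ggml_tensor * img) {
    // 2x2 average pooling, stride 2, no padding (illustrative parameters)
    return ggml_pool_2d(ctx, img, GGML_OP_POOL_AVG,
                        /*k0 =*/ 2, /*k1 =*/ 2,
                        /*s0 =*/ 2, /*s1 =*/ 2,
                        /*p0 =*/ 0, /*p1 =*/ 0);
}
```

Whether the resulting node actually runs on the GPU depends on which backend the graph is scheduled to; with this commit the Vulkan backend can execute it instead of falling back to the CPU.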
* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM
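
The RVV commits above all follow the same vector programming model. As a point of reference, here is a minimal sketch of a plain f32 dot product written with the RISC-V vector intrinsics; it is not the q4_0_8x8 kernel from this sync, only an illustration of the loop structure (vector loads, fused multiply-accumulate into a vector accumulator, final reduction) that such kernels are built on. Assumes a compiler with RVV intrinsics support, e.g. `-march=rv64gcv`.

```c
#include <riscv_vector.h>
#include <stddef.h>

static float dot_f32_rvv(const float * x, const float * y, size_t n) {
    const size_t vl = __riscv_vsetvlmax_e32m1();           // elements per vector register
    vfloat32m1_t acc = __riscv_vfmv_v_f_f32m1(0.0f, vl);   // vector accumulator

    size_t i = 0;
    for (; i + vl <= n; i += vl) {
        vfloat32m1_t vx = __riscv_vle32_v_f32m1(x + i, vl);
        vfloat32m1_t vy = __riscv_vle32_v_f32m1(y + i, vl);
        acc = __riscv_vfmacc_vv_f32m1(acc, vx, vy, vl);     // acc += vx * vy
    }

    // horizontal sum of the vector accumulator
    vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, vl);
    float sum = __riscv_vfmv_f_s_f32m1_f32(
            __riscv_vfredusum_vs_f32m1_f32m1(acc, zero, vl));

    for (; i < n; ++i) {  // scalar tail for the remaining elements
        sum += x[i] * y[i];
    }
    return sum;
}
```

The "register spillover" commit above refers to keeping the working set of such a kernel within the available vector registers, so the compiler does not spill accumulators to the stack in the inner loop.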
* ggml : fix gguf string leak when reading kv pairs fails

* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type

* ggml : avoid crashing on failed memory allocations when loading a gguf file
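
A minimal sketch of loading a gguf file defensively, in line with the fixes above: with these changes a malformed KV pair or a failed allocation makes `gguf_init_from_file` return NULL instead of aborting, so the caller should check the result. Uses the public gguf API declared in `ggml.h`.

```c
#include "ggml.h"
#include <stdbool.h>
#include <stddef.h>

static bool try_load_gguf_metadata(const char * fname) {
    struct ggml_context * meta = NULL;
    struct gguf_init_params params = {
        /*.no_alloc =*/ true,   // only read metadata (KV pairs and tensor info)
        /*.ctx      =*/ &meta,
    };

    struct gguf_context * gguf = gguf_init_from_file(fname, params);
    if (gguf == NULL) {
        // invalid KV types and allocation failures now surface here
        // instead of crashing inside the loader
        return false;
    }

    // ... inspect KV pairs / tensor info via the gguf_get_* functions ...

    gguf_free(gguf);
    ggml_free(meta);
    return true;
}
```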
Get in line with the other backends by supporting the newer
backend/device registry interfaces.

Signed-off-by: Sergio Lopez <[email protected]>
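
For context, the newer registry interface mentioned above lets callers enumerate devices generically instead of calling a backend-specific init function. A minimal sketch, assuming the `ggml-backend.h` device API that the other backends already implement:

```c
#include "ggml-backend.h"
#include <stdio.h>

int main(void) {
    // list every device exposed through the backend registry
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s - %s\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }

    // initialize the first device with default parameters
    if (ggml_backend_dev_count() > 0) {
        ggml_backend_t backend = ggml_backend_dev_init(ggml_backend_dev_get(0), NULL);
        if (backend != NULL) {
            ggml_backend_free(backend);
        }
    }
    return 0;
}
```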
This is a more or less direct translation from the Metal implementation
to GLSL.

Signed-off-by: Sergio Lopez <[email protected]>
* llama : fix buffer checks for mamba and rwkv

* llama : fix missing worst case flag during reserve

* cuda : fix supports_op for norm

* disable sched SET_CAUSE
* llama : add simple-chat example

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>
* metal : minor fixup in FA kernel

ggml-ci

* metal : use the unrolled loop variable

* metal : remove unused var
@ggerganov changed the title from "sync : llam.cpp" to "sync : llama.cpp" on Nov 4, 2024
@slaren (Collaborator) commented Nov 4, 2024

The test-opt should just be disabled until it is updated in #988; since the opt interface has been removed, it cannot be updated here.

Looks like other tests are failing too; I will update them.

@slaren (Collaborator) commented Nov 4, 2024

I disabled all tests and examples that depend on ggml_opt. They should be re-enabled or removed in #988.

@ggerganov ggerganov marked this pull request as ready for review November 4, 2024 17:37
@ggerganov ggerganov merged commit f3c1e6a into master Nov 4, 2024
4 checks passed
@ggerganov ggerganov deleted the sync branch November 4, 2024 17:42