feat(llama.cpp): Vulkan, Kompute, SYCL #1647

Open · 3 of 4 tasks
mudler opened this issue Jan 26, 2024 · 8 comments
Labels: enhancement (New feature or request), roadmap

Comments
mudler (Owner) commented Jan 26, 2024

Tracker for: ggml-org/llama.cpp#5138 and also ROCm

mudler added the enhancement label Jan 26, 2024
mudler added a commit that referenced this issue Jan 29, 2024
mudler added a commit that referenced this issue Jan 30, 2024
mudler changed the title from "llama.cpp Vulkan, Kompute, SYCL" to "feat(llama.cpp): Vulkan, Kompute, SYCL" Jan 31, 2024
mudler added the roadmap label Jan 31, 2024
mudler added a commit that referenced this issue Feb 1, 2024
mudler added a commit that referenced this issue Feb 1, 2024
* feat(sycl): Add sycl support (#1647)

* onekit: install without prompts

* set cmake args only in grpc-server

Signed-off-by: Ettore Di Giacinto <[email protected]>

* cleanup

* fixup sycl source env

* Cleanup docs

* ci: runs on self-hosted

* fix typo

* bump llama.cpp

* llama.cpp: update server

* adapt to upstream changes

* adapt to upstream changes

* docs: add sycl

---------

Signed-off-by: Ettore Di Giacinto <[email protected]>
mudler pinned this issue Mar 2, 2024
RiQuY commented Apr 8, 2024

The merge requests linked in this issue appear to be merged upstream. Does that mean LocalAI already supports Vulkan, or are there additional tasks to do before that?

mudler (Owner, Author) commented Jun 24, 2024

> The merge requests linked in this issue appear to be merged upstream. Does that mean LocalAI already supports Vulkan, or are there additional tasks to do before that?

Only Kompute is missing as of now.

jim3692 commented Sep 9, 2024

> Only Kompute is missing as of now.

It looks like Kompute is also merged.

KhazAkar commented

So, what's missing in LocalAI to support Vulkan? Or would compiling the in-tree llama.cpp with Vulkan support be enough to use it?

arenekosreal commented

It seems that we need to set BUILD_TYPE=vulkan to make llama.cpp use Vulkan. But does this mean we have to rebuild the program if we want to use the CPU backend? I do not see any option that lets me switch between them at runtime.

I have a fairly old GTX 1050 Ti Mobile GPU, which means I can only run small models on it, but that is still much faster than running those models on the CPU. With the ability to switch dynamically, I could run small models on my GPU while also trying some larger models on my CPU. The reason I do not use CUDA is that the Vulkan runtime is much smaller than the CUDA runtime. Although Vulkan is slower than cuBLAS, I think it is acceptable on my old buddy.

My idea is to make the Vulkan backend optional, like the CUDA backend, so LocalAI prefers Vulkan but can also fall back to the CPU backend. At the moment, I find that LocalAI built with BUILD_TYPE=vulkan cannot fall back to the CPU when a large model fails to load onto the GPU.
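For reference, a minimal sketch of the rebuild-to-switch workflow described here, assuming LocalAI's standard Makefile build where BUILD_TYPE selects the llama.cpp acceleration:

```sh
# Vulkan-enabled build (sketch; assumes the standard LocalAI Makefile workflow)
git clone https://github.com/mudler/LocalAI && cd LocalAI
BUILD_TYPE=vulkan make build

# Switching to a CPU-only (OpenBLAS) build currently means rebuilding:
BUILD_TYPE=openblas make build
```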

KhazAkar commented

> It seems that we need to set BUILD_TYPE=vulkan to make llama.cpp use Vulkan. But does this mean we have to rebuild the program if we want to use the CPU backend? […]

Technically you can avoid passing ngl, or set it to 0, so that the GPU is not used for offloading.
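A minimal sketch of what that could look like in a model config, assuming LocalAI's `gpu_layers` field maps to llama.cpp's `-ngl`; the file and model names here are illustrative:

```sh
# Hypothetical model definition that disables GPU offload even on a Vulkan build
cat > models/small-model-cpu.yaml <<'EOF'
name: small-model-cpu
gpu_layers: 0            # 0 layers offloaded, i.e. CPU-only inference
parameters:
  model: small-model.Q4_K_M.gguf
EOF
```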

arenekosreal commented

> Technically you can avoid passing ngl, or set it to 0, so that the GPU is not used for offloading.

I found a very hacky way to achieve that: build llama-cpp-fallback with BUILD_TYPE=openblas, while building the other parts with BUILD_TYPE=vulkan. LocalAI will then try Vulkan first and fall back to OpenBLAS. But this means I have to enable other optimizations on llama-cpp-fallback myself. I am not sure whether this usage is valid, because there is no documentation about mixing backend build types.
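A sketch of that mixed build, assuming the Makefile exposes a per-backend target such as `backend-assets/grpc/llama-cpp-fallback` (the target name is an assumption and may differ between releases):

```sh
# 1. Build only the CPU fallback backend with OpenBLAS
BUILD_TYPE=openblas make backend-assets/grpc/llama-cpp-fallback

# 2. Build the rest of LocalAI (default llama.cpp variants included) with Vulkan;
#    the already-built fallback binary is left as-is
BUILD_TYPE=vulkan make build
```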

KhazAkar commented

Interesting, thanks for that @arenekosreal!
Technically llama.cpp can be compiled with multiple backends baked in, so this distinction in LocalAI is interesting to see.
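For comparison, a sketch of a multi-backend upstream llama.cpp build, assuming a recent tree where the `GGML_*` CMake options select backends (the CPU backend is always compiled in):

```sh
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DGGML_VULKAN=ON        # add -DGGML_CUDA=ON for a CUDA+Vulkan build
cmake --build build --config Release -j
```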
