Multithreading Offloading (thread on device) #11633

lexasub · 2025-02-03T20:39:14Z

lexasub
Feb 3, 2025

Hi everyone,

I'm working on asynchronously launching ggml_backend_tensor_set in llama-model-loader.cpp using a thread pool. Currently, these calls are executed sequentially in one group. Previously, I tried to create a unique queue for each device and process the data sequentially, but that didn't yield the expected results.

Right now, I'm temporarily using a single shared queue, but unfortunately, this doesn't solve the problem — I'm encountering a SIGSEGV error. I will continue to look for a solution.

I wanted to ask if anyone is currently working on Multithreading Offloading? I would appreciate any advice or ideas!
Why multithreading? because we have rpc-server devices (in 4 pc, load so much time)

Thank you!

slaren · 2025-02-03T22:48:55Z

slaren
Feb 3, 2025
Collaborator

I would expect the load time of the RPC servers to be mainly limited by the network bandwidth, so unless you have multiple NICs with direct connections to each server, I don't think this is likely to help significantly. The best way to reduce the load time would be to implement a tensor cache in the server (as previously mentioned in #9740 (comment)).

1 reply

lexasub Feb 4, 2025
Author

The interesting part is that the network doesn’t seem to be heavily loaded after I noticed the memory usage increase, yet we still end up waiting for quite some time. There’s also the possibility that due to a timeout, we could be sending packets to different servers located in another city (in my case, two servers in the same location). This could result in an average timeout reduction. It’s akin to the analogy that nine women can’t give birth to a child in one month, but for a CPU pipeline, such a scenario might actually work fine.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multithreading Offloading (thread on device) #11633

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{title}}

Select a reply

Multithreading Offloading (thread on device) #11633

lexasub Feb 3, 2025

Replies: 1 comment · 1 reply

slaren Feb 3, 2025 Collaborator

lexasub Feb 4, 2025 Author

lexasub
Feb 3, 2025

Replies: 1 comment 1 reply

slaren
Feb 3, 2025
Collaborator

lexasub Feb 4, 2025
Author