
NodeJS Cluster support #431

Open
mupperton opened this issue Sep 19, 2024 · 5 comments

@mupperton
Contributor

I regularly make use of the NodeJS cluster module in "normal" API services: since JS is single-threaded, we want to make use of all available parallelism on servers that have multiple cores/threads.
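For reference, roughly the setup I mean (a plain HTTP handler and an arbitrary port stand in for the real restate-registered endpoint here):

```ts
// Minimal sketch of the cluster setup described above. The plain HTTP
// handler and port 9080 are placeholders for the real restate endpoint.
// Logging process.pid shows which worker served each request, which is
// how the "sticky worker" behaviour below was observed.
import cluster from "node:cluster";
import http from "node:http";
import { cpus } from "node:os";

if (cluster.isPrimary) {
  // Fork one worker per logical CPU; the primary distributes incoming
  // TCP connections across the workers (round-robin by default on Linux).
  for (let i = 0; i < cpus().length; i++) {
    cluster.fork();
  }
} else {
  http
    .createServer((req, res) => {
      console.log(`request handled by worker pid=${process.pid}`);
      res.end("ok");
    })
    .listen(9080);
}
```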

However, this does not appear to work as expected for a restate-registered service.

My observation is that requests from the restate server to the NodeJS service appear to have a "sticky session", or use some kind of keep-alive: within a short time span, all requests go to the same worker process. It takes roughly ~90 seconds with no requests (from basic testing) before another worker process is used instead, and that worker then becomes the sticky one until another ~90 seconds have passed.

This ultimately defeats the point of the cluster module: it is designed to improve concurrency, but currently all concurrent requests are handled by the same worker.

Likely this is a side effect of HTTP2 being used?

I haven't tried this with another runtime like Bun

@igalshilman
Contributor

I'm not familiar with the cluster module for Node, I'll take a look!

> Likely this is a side effect of HTTP2 being used?

This could be the case, as a single TCP connection is established, and then invocations are multiplexed within it as h2 streams.
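You can see this from the client side with Node's own http2 client (the URL below is just a placeholder): one session means one TCP connection, and every request becomes a stream on it:

```ts
// Sketch: all three requests below share one TCP connection, because
// HTTP/2 multiplexes them as streams on a single session. Since the
// cluster module balances per TCP connection, they all land on the
// same worker.
import http2 from "node:http2";

const session = http2.connect("http://localhost:9080"); // placeholder URL

for (let i = 0; i < 3; i++) {
  const stream = session.request({ ":path": "/" });
  stream.on("response", (headers) => {
    console.log(`stream ${i} -> status ${headers[":status"]}`);
  });
  stream.on("close", () => {
    // Close the session once the last stream has finished.
    if (i === 2) session.close();
  });
  stream.end();
}
```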

Meanwhile I'd like to propose some alternatives:

More pods

  • If you are running on k8s, is it possible to deploy more pods instead?
  • If you want to get fancy, you can even combine restate with Knative (here is a blog post that describes this approach: https://knative.dev/blog/articles/building-stateful-applications-with-knative-and-restate/). This gets you out-of-the-box scale out/down and load balancing, even with http2.
  • Add more pods and run restate + envoy (single pod) for http2 load balancing to your node applications (ask me more if this is relevant).

Bare metal

Alternatively, if you are running on bare metal, consider deploying more NodeJS processes with Nginx/Caddy as a reverse proxy in front of them (all in the same box, reverse proxying to localhost).
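A minimal sketch of the application side of that setup (the ports and process count are just examples; Nginx/Caddy would be configured separately to balance across the ports):

```ts
// Sketch: run N independent Node processes, each on its own localhost
// port (ports/count are arbitrary). A reverse proxy in front then load
// balances HTTP/2 traffic across them at the request level, rather than
// per TCP connection as the cluster module does.
import http from "node:http";

// Each process is started with its own PORT, e.g.:
//   PORT=9081 node service.js & PORT=9082 node service.js & ...
const port = Number(process.env.PORT ?? 9081);

http
  .createServer((req, res) => {
    res.end(`handled by pid=${process.pid} on port ${port}`);
  })
  .listen(port, "127.0.0.1");
```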

@igalshilman
Contributor

One additional thought:

@igalshilman
Contributor

Can confirm that this is indeed due to HTTP2: the cluster module load balances per physical TCP connection, while HTTP2 keeps a single TCP connection and multiplexes the streams on it.

I've tried to look at ways to deal with this, and it seems they require (pretty complicated) application-side load balancing. Let me know if the alternative approaches are enough.
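For the record, a rough sketch of what application-side balancing could look like: a parent process terminates the single h2 connection and fans requests out per stream, rather than per connection. The ports are made up, and note that the HTTP/1.1 hop to the children would break restate's bidirectional h2 streaming mode; caveats like that are part of what makes this complicated.

```ts
// Rough illustration only: terminate the single HTTP/2 connection in a
// parent process and forward each stream (i.e. each request) round-robin
// to child processes over HTTP/1.1 on localhost. Ports are placeholders,
// and the children are assumed to already be listening.
import http2 from "node:http2";
import http from "node:http";

const CHILD_PORTS = [9081, 9082, 9083];
let next = 0;

http2
  .createServer((req, res) => {
    const port = CHILD_PORTS[next++ % CHILD_PORTS.length];
    // HTTP/2 pseudo-headers (":path" etc.) must not be sent over HTTP/1.1.
    const headers: http.OutgoingHttpHeaders = {};
    for (const [key, value] of Object.entries(req.headers)) {
      if (!key.startsWith(":")) headers[key] = value;
    }
    const upstream = http.request(
      { host: "127.0.0.1", port, path: req.url, method: req.method, headers },
      (upstreamRes) => {
        // Connection-specific HTTP/1.1 headers are forbidden in HTTP/2.
        const resHeaders: http.OutgoingHttpHeaders = {};
        for (const [key, value] of Object.entries(upstreamRes.headers)) {
          if (!["connection", "keep-alive", "transfer-encoding"].includes(key)) {
            resHeaders[key] = value;
          }
        }
        res.writeHead(upstreamRes.statusCode ?? 502, resHeaders);
        upstreamRes.pipe(res);
      },
    );
    req.pipe(upstream);
  })
  .listen(9080);
```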

@mupperton
Contributor Author

Thanks @igalshilman - my use case is not really CPU-bound; I mainly want a single Node service with many handlers to have better parallelism when handling multiple requests concurrently, since Node is single-threaded. The Worker API would therefore probably give worse performance.

We can try multiple pods and verify our network load balancing is working

@mupperton
Contributor Author

Our pod scaling and load balancing is working as expected

I'll leave it up to you whether it's worth keeping this issue open, in case there is a chance you may consider supporting this in the future; otherwise feel free to close.
