Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🤯🤯🤯 OMG! Restate + Xstate library #2

Open
milanbgd011 opened this issue Oct 17, 2024 · 19 comments
Open

🤯🤯🤯 OMG! Restate + Xstate library #2

milanbgd011 opened this issue Oct 17, 2024 · 19 comments

Comments

@milanbgd011
Copy link

Dear Jack Kleeman, HUGE thank you! This is a major breakthru!!!
Number of SDKs and this Xstate mapping tells me Restate is the real deal.

Anything remotely similar to this, but very basic remapping is here, same thing you built here properly:
https://youtu.be/GpbOkDjpeYU?t=1631

I tried to somehow make the Temporal click with NextJS > therefore and Xstate:
temporalio/samples-typescript#97

I implemented xstate behind the API call that saves snapshot to Postgres, it works but very basic stuff.
With your mapping and power of Restate and Xstate now the possiblities are endless.
Restate Cloud offer seems attractive.

====

Question:

When will the library be published on NPM so I can try it following the examples you created and try to push it more.

@milanbgd011
Copy link
Author

milanbgd011 commented Oct 17, 2024

Ah, it is published here: https://www.npmjs.com/package/@restatedev/xstate

But you did some updates in the meantime and they are not on NPM. Will this pipeline be placed so that this starts to grow.

I am crusing very difficult tasks with Xstate on the frontend and basic backend, but to be able to do it in a "durable" way is really next level stuff.

EDIT: I can just use "public_api.js"... ma bad :). All works, testing now..

EDIT2: I literally cant believe this exists. Biggest fan!

@jackkleeman
Copy link
Collaborator

hi! I am so so glad to see your excitement about this. youre right there are some improvements since 0.0.1, let me do a release and the pipeline will run :)

@milanbgd011
Copy link
Author

Thank you for pushing the new version.

In this early stage, it would be really valuable to see the full list of tasks that need to be done before the lib reaches version 1.0, just by putting "# Road to 1.0" in README.md and just listing all the tasks that remain to be done.

This will bring clarity and get people involved. I might contribute too if capable to pinch in, but I need a list of clear targets.

You envisioned the solution, it works on the version 0.0.1. When it comes to 1.0 it will move mountains on the shoulders of Restate.

@jackkleeman
Copy link
Collaborator

What sort of things would you like to see in 1.0? For us, we need to see real world usage before solidifying the featureset and offering a backwards compatibility promise, etc

@milanbgd011
Copy link
Author

I will definitely try this stuff in production on non-critical stuff to check how it is working.

I would need a bit of time to dig into this deeply and report back. Give me couple of days please and I will post my feedback.

@kalinicm
Copy link

kalinicm commented Oct 19, 2024

edit: i am also the milanbgd011 account :)

I got a bit of time, not still digging deep enough unfortunately because of time.

But all works out of the box zero problems:

  • invoke
  • entry
  • exit
  • after
  • on
  • assign
  • ...

Even remembers the "after" trigger, but when it exists the state where after was initiated then it does not work.

Even if I try to get it again to the state where the "pending AFTER:{} needs to fire" it will reject this "AFTER:{}" because it seems that each state has unique "UUID" which guarantees that the "after" will be executed if that unique ID of the current state is the same, otherwise it will be discarded.

Which means that this "pending" after will never accidentally trigger if we end up in the same state, it simply does not belong to that state unique ID. >>> Received now cancelled event 447f3f3a-61e9-4f54-b8e6-9b1d6bea4994.xstate.after.10000

[restate] [auth/send][inv_1h3VdnA6H65X5hvfJibdxkttFWETR00te9][2024-10-19T23:28:02.235Z] INFO:  Received now cancelled event 447f3f3a-61e9-4f54-b8e6-9b1d6bea4994.xstate.after.10000.client.authorizing for target {
  id: 'xyy',
  sessionId: '447f3f3a-61e9-4f54-b8e6-9b1d6bea4994',
  _parent: { id: 'fakeRoot', sessionId: 'fakeRoot' }
}

I tried parallel states, works perfectly.

Also tried versioning the machine, like I leave machine in some "invoke" state to wait for 30 seconds, I stop the service, change the machine to go to "stopped" instead of "idle" state and start service. Restate uses new onDone target and just moves to stopped.

This means that versioning the machines is a piece of cake. You just say what you want to happend with users that are in state that is being removed or rewired and anyone in that state will get new path to go to from that point.

In your example you demonstrated that sendTo along with delay also works. Nesting machines works.

            actions: sendTo(
              ({ self }) => self._parent!,
              { type: "TOKEN" },
              { delay: 1000 }
            ),

====

OMG! What have you built here?! This thing can crush every single task no problem :)

KUDOS! Not a single problem, glitch or weird behaviour. ♥️♥️♥️♥️♥️♥️

This thing goes to production instantly (for non-critical stuff until I am 100% sure no problems will occur).

====

I am just wandering is there anything not covered in here. I mean should we make a list of xstate keys available and then check on/off if it works or not in README.md. This is unclear to me currently.

@jackkleeman
Copy link
Collaborator

Thank you! What lovely feedback to receive and I am glad you are trying your hardest to break it - definitely keep trying and if you succeed, I will make sure it gets fixed :D

I think we are missing callback actors, observable actors and possibly transition actors. In practice callback is very similar to promise, observable im not sure really makes sense in a serverless/backend context, but you could certainly make rpcs to the restate ingress from some long-running process with an observable streams. transition actors, i am not sure if these can ever be async - if they are purely sync, they should work as-is, but if async they need a similar approach to promise actors where we run them in a separate handler. definitely more help identifiying particular use cases that are not covered - this would be hugely valuable

@milanbgd011
Copy link
Author

milanbgd011 commented Oct 20, 2024

Breaking it will not be easy task I think :) But sure I will try! Thank you for the offer to fix what I find broken!

Neither of three mentioned actors are imporant. Serverless is very important, so observable is just wrong/weird I think.

definitely more help identifiying particular use cases that are not covered - this would be hugely valuable

Do you want me to create PR listing all xstate actions, actors... as a bullet points and then we can check off if they work or not in current version. I can confirm many of them and I can test others one by one when I get time, so we have full 1:1 mapping this Restate/Xstate repo to Xstate V5 and know what is covered and what not. I think visibility of this is key for progress here.

PS. Doing some cool examples and letting David K. Piano like the X post will make this thing visible to all Xstate developers. This is the "marketing channel" perfect to get this off the ground. This is how I found Restate and your repo.

41108

@jackkleeman
Copy link
Collaborator

@milanbgd011 that would be fantastic! and any suggestions re examples that perhaps align with what you see as interesting use cases, let me know and I can build them

@milanbgd011
Copy link
Author

milanbgd011 commented Oct 20, 2024

=== deleted the post, nor relevant anymore...

@milanbgd011
Copy link
Author

milanbgd011 commented Oct 22, 2024

@jackkleeman I tested 90%+ of the xstate API I think, all seems to be working, zero issues.

One thing you mentioned on video was not perfectly clear to me, here is video with timestamp where this segment starts https://youtu.be/vECQbpNOQYI?t=1703

QUOTE: ...have some handlers that will always run only one at a time and some handlers that can run concurrently and this is...
QUESTION: How is this achieved to have handler run only one or to run concurrently? What it actually means?


Nice thing is that machine returns the next state machine is in on each CURL sent to it, so that you can use this information to know what is going on the backend immediately, then track the progress with snapshot API call.

One thing that could be usefull is to have "event does not exist" when fired, to signify that none of the states/parallel states was able to receive this event and do something with it. But I think this is more xstate territory and must be done manually.


I guess making the video to explain how you can dig deep into problems would be very usefull, becaue it will show what restate logs and what is available. Running simple machine with a few invokes and events and after. Then assuming that machine is in bad state for some reason, what data is available in Restate to dig into the problem.

Since happy path works 100%, no problems at all. Taking the worst paths possible and how to get out of them would be very very usefull, it will show what Restate has baked in.


Only thing that remains unclear is how to scheadule some event or state change. Two cases are in there, specific timestamp or after some time dynamically (not just after: { 10_000: 'gotoThisState' }. Like recurring stuff and all.

It could be solved by creating parallel state which executes each 1h and in context we store last timestamp (like periodically sending email) and then if the value is more than 10 days we send email, if not another execution will be triggered in 1 hour and check again.

.sleep() is basically after in xstate?

@jackkleeman
Copy link
Collaborator

QUOTE: ...have some handlers that will always run only one at a time and some handlers that can run concurrently and this is...
QUESTION: How is this achieved to have handler run only one or to run concurrently? What it actually means?

In Restate, some types of services are 'keyed' and they can store state. For example, virtual object services have this property, and the xstate library is built on top of a virtual object. For a given key, handlers of a virtual object will by default lock the object and can not run concurrently with each other. However, you can also set a handler to be 'shared' which means it can run concurrently with any other handler running, but it can't mutate state, and it only reads a snapshot of state. In the xstate integration, the invokePromise handler is one such handler, which doesn't block the main loop. This is necessary because it can run for arbitrarily long to complete a promise execution. However, all other handlers do not block (in line with xstate API) and so should run for very short periods of time. These are executed serially, so that we have strong ordering, and so that they can safely update state instead of just read it.


One thing that could be usefull is to have "event does not exist" when fired, to signify that none of the states/parallel states was able to receive this event and do something with it. But I think this is more xstate territory and must be done manually.

Agreed. I think it is possible to interpret the state machine at runtime to determine the set of valid events for the current state, and reject invalid events. If you file a separate issue I can take a look at this!


Re recurring events, i think broadly the idea of repeatedly polling via self-transitions (after: 1h -> self) makes sense. I think you can use child machines to have multiple of these polls concurrently executing. I would hope the approach to this is the same with our integration as it would be in native xstate.
As for .sleep() - this is a way to suspend a handler invocation for an arbitrarily long time. The actual after feature is implemented with a delayed invocation eg .send("hello", {delay: 1000}). This is because in xstate we don't want to block an invocation as this will block the whole state machine, we instead want to schedule non-blocking events in the future. In normal restate code without xstate, you might use a mix of these two approaches depending on the use case.

@milanbgd011
Copy link
Author

milanbgd011 commented Oct 28, 2024

@jackkleeman Thank you for the detailed response. I will try to keep it short and precise, not to overwhelm you with too many questions at once. Sorry for late response, private matters.

I crated new issue for non-listened to events here: #3
Thank you for tackling this one.

I will express how I percieve the information you shared, then you correct me if I am wrong.

BLOCKING (should be instant, very fast to execute because they block machine)

  • assign()
  • enqueue.assign()

NON-BLOCKING (does not block machine, we can send more events to it no problem)

  • raise() - execute now (+ delay)
  • enqueue.raise() - execute after entering fully next state (after entry:{}) (+ delay)
  • sendTo() (+ delay)
  • enqueue.sendTo() (+ delay)
  • sendParent() (+ delay)
  • invoke: fromPromise()
  • target: '...'

I think that I fired event against machine with 2 parallel states, while one state was inside "invoke: fromPromise() waiting 5 seconds to resolve". It did take the new event and started transition on the second parallel state without affecting current processing of first state. Is this expected? This is perfect scenario basically. Machine responds to events if possible to receive them on matter which parallel state is doing what.

When we are doing the "blocking" stuff, then all incoming events are being scheduled while machine is "stuck" doing some assign(), but none of the events will drop.

@jackkleeman
Copy link
Collaborator

  1. I am not sure that any in-xstate operation can block the state machine. From Restates perspective, every invocation through this library is just read state -> do something which will not include any await -> write state -> finish. as such, invocations should always be very short. with the sole exception of invokePromise handler, where promises are executed, and there is an await, but this handler doesnt block the machine.

  2. it is 100% expected that the machine can still process events while some part of it is waiting on a fromPromise execution. This is because the promise is offloaded to the invokePromise handler which is a restate shared handler, and so can execute concurrently to other handlers. All the handler needs to do is find the promise actor definition in the state machine, pass the appropriate input values, await the promise, and send the result or rejection back to the main state machine

  3. It is not possible to do any blocking stuff within this library, because we simply do not await anything in any handler except invokePromise. Even using Restates underlying features available on the restate.ObjectContext, I don't think you can block the state machine without that await. However, invocations of course still take time to complete (a few milliseconds), during which incoming events queue up inside Restate. Certainly no events will ever drop. If you did somehow manage to block the state machine meaningfully, for example if you took down to the lambda/node service running the underlying code such that the invocation cannot be processed, what this would look like is a long invocation on a handler like send, ie an event is taking a long time to be processed. To make an analogy, if you were using xstate in the browser, this would be like the browser hanging, during which time xstate can't receive any new messages from the event loop

@milanbgd011
Copy link
Author

milanbgd011 commented Oct 30, 2024

@jackkleeman then you for the clarification, it seems very simplistic to reason about, just like a simple xstate machine - but a durable execution one.

I am not sure how you built this resilient thing in couple of commits really. Seems really weird :) Really exciting about this xstate + restate. Maybe it is not obvious for others how powerful this combinaiton, but sure enough they will know. Thank you once again for the effort, really appreciated. Keeping you posted. There is so much stuff to test, will report if any problems occur.


EXAMPLE 1: HEAVY TIME-RELATED STUFF

From the examples perspective, first one needed is with extreme heavy usage of scheaduled events (or anything related to time) could be a good candidate, because simple execution is ok, but doing this durably is another beast. Demoing how you can do newsletters, subscriptions and other time related stuff inside single machine with parallel states. If user subscribes to email, we save something to context and then move the parallel states "subscription" to "active" and each 7 days sends newsletter by circling thru "waitingNextDeliveryWindow" and "invokeSendingEmail" to actually send it. For me to cancel newsletter subscription I just move parallel state into "idle" to break the loop and done. SImple but very effective.


EXAMPLE 2: MACHINE VERSIONING

One big topic is machine versioning, for example if we have some users in "awaitingApproval" state and we create new improved "awaitingConsesus", and basically put "after" on old state to make it go into the next state as soon it boots the machine again, but I do not think this is possible. The question is how to initiate the transit to new awaitingConsensus using "after" or anything else to make machine switch to the next state. One possible solution is change "on:{}" to capture any event being sent to the macine and this initiates move to the new state then (effectively updating current users to use new workflow).


EXAMPLE 3: HARD PROBLEMS IN OTHER SYSTEMS BEING HANDLED ELEGANTLY IN RESTATE+XSTATE

Also, since you guys come from Apache Flink, you know the limitations of other systems (kafka blocking entire queue, stuff missing in Flink and others, then showing how elegantly this hard problems for this other systems is handled in xstate and restate, thus avoiding the pain with other solutions.


PS. Do not feel obliged to do any of this stuff, I am just spitting out ideas, nothing else.

@jackkleeman
Copy link
Collaborator

I am not sure how you built this resilient thing in couple of commits really.

I guess the xstate bit is quite simple, but the restate bit is a lot of commits ;)

Awesome ideas!

Re machine versioning, this is a hard problem in general in durable execution. In some ways its a bit more tractable using XState, because you only have to ensure that the machine state is still valid under the new version, ie the states, inputs etc for the current state (of all existing state machines) still make sense in the new version. Currently to solve the more general versioning problem users can inform Restate of new versions and it will send new invocations to the new versions; however in the case of this library, the new version is still going to read the state set by the old version, so there is still a need for state-compatibility.

I think the safest way to do versioning when making changes that can't be state-compatible is to have multiple state machine definitions side by side inside a parent state machine. The first step of the parent state machine would be to select the latest version available in the running service, then delegate to that machine. This selected version is then persisted by xstate like any other state node, and if later new versions are added, we will keep using the old state machine, but new invocations will use the new one. I would hope this is only needed in exceptional circumstances however

@milanbgd011
Copy link
Author

milanbgd011 commented Oct 31, 2024

I agree, lets try to clarify a bit more thru examples to be sure we are talking about same stuff.

Rule is simple, do not delete old state, just rewire it to forward to right place and keep it in code until you are sure 100% no machines running depend on it. It does not bother if you have these old states lingering around, it is ok. It is way better than doing any versioning which is very hard to maintain.

We have 2 different types of "versioning":

  • versioning existing EVENT
    • for the event it is easy, just update the existing event listener to do something else
  • versioning existing STATE
    • in old state listen to all events, target new state and enqueue event that is sent to the new state

Versioning the state example

V1

{
     idle: {},
     awaitingConfirmation: {
       on: {
         CONFIRM: {...},
         .....more events
       },    
     },
}

V2

It will react to any event sent to machine and forward this event to new state the machine should be in

{
     idle: {},
     awaitingConfirmation2: {
        on: {
         CONFIRM: {...},
         .....more events
        }
     },
    // inactive states (handle old machine versions)
     awaitingConfirmation: {
       on: {
         '*': {
            target: 'awaitConfirmation2',
             actions: [
                 ({ enqueue, event }) => {
                      enqueue(event); // <<<<<<<<<< this will execute the event in new "awaitConfirmation2" state
                 }
             ]
          }
       }
     },
}

This basically means that the UI/API firing the event will use the same event, but the machine structure will be different, which means no need for API request updates or UI updates that fire the APIs. Perfect separation.

This example forwards all events to new state, but we can even separate by event and send them to different states too, so the granularity of versioning in machine is very easy. Just take all defined events and decide what they will do (forward to which state) and done.


To get traction on this project, we need blogpost of some example that handles impossible scenario, something that other systems are not capable of - or is too complex to do them. Kafka and Temporal have big gaps and problems when running in prod. Kafka can block and Temporal is very unintuitive to write. This example must expose weaknesess of other systems and this blog post will be than liked on X by others and that is the way to go. But needs to be something really impossible to do in other systems. If you know Kafka and Flink and a bit of Temporal, you are probably aware what they have problems with.


One more issue that is critical to build ie. Uber like app is realtime tracking of vehicle movement. Now, this is done via Kafka usually and seems like Restate is not good fit for this job, because of the frequency of possible updates to the machine. Am I right here or not? Is there a way to handle this in restate efficiently and be able to run 10.000 of this stuff on single dedicated server? I think using Redis Streams here would do excellent job, because of the speed and transient nature of this tracking.. and redis is very fast and efficient with space. No blocking, no maintenance, easy to setup and run. Right?

@jackkleeman
Copy link
Collaborator

Your examples re versioning makes sense to me

I agree it would be cool to write a post comparing kafka/temporal to resstate/xstate. Its definitely on our to do list!

I think Restate can definitely handle a large volume of updates, Kafka style. Restate is internally a streaming system and we've benchmarked thousands of events per second on a single machine - and we can horizontally scale, too. I don't think you would need to bring in another tool like Redis streams. That said, you may not want to use the xstate integration for that kind of volume - for maximum throughput you might want a Go or Rust service, using Restates default RPC API, instead of the Xstate one.

@milanbgd011
Copy link
Author

Thank you Jack!

In the plethora of tools and approaches, it is easy to miss the stuff Restate brings to the table actually.

Creating dummy "uber" app using restate/xstate (with RPC for high thruput GPS location and xstate for all other) and then comparing to Temporal/Kafka, and pinpointing the "weak spots" in their stack will pull in people to give it a try. Building good starting point for others to experiment seems legit.

So basically, "let us wire up our product to simplistic but complete real world scenario and then show off where they shine".

You know, even all cars seem alike but deeply they are not, but this is very hard to spot without spending a ton of time and investing yourself to learn new framework/tool before going to prod. So, many will not do it unfortunately.

Restate goes straight to the backbone of the entire company, whoever decides to use it. Very specific situation. :)

I have perfect project for restate/xstate duo, so I hope to get my hands dirty soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants