-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide an example for how to implement Http long polling #63
Comments
Isn't long polling simply like: val events = Source.repeat(pollRequest).via(Http().outgoingConnection(url)) I.e. continuously feed requests into the Http connection, returning a response once there is one. |
Somewhat, but plus reconnecting Hah, yeah in that sense yeah. I keep thinking of the more complicated cases somehow, will dump thoughts here in a bit (what we did in ConductR was more than just that) |
Then use |
The URI of each next request depends on the response to the previous request. An example API call:
Here, The docs strongly state that using the shared pool for long-polling requests is not encouraged. AFAIK |
A Spray implementation as a reference for what I'm trying to achieve using streams: package rfs.blaze.gateway.config
package consul
import scala.concurrent.duration._
import scala.util._
import akka.actor._
import akka.io.IO
import spray.can._
import spray.can.client._
import spray.http._
import spray.httpx.RequestBuilding._
import spray.json._
object Consul {
  /**
   * Spawns a [[ConsulEndpointWatcher]] actor for the given endpoint and tells it
   * to start watching, with replies routed to `sender`.
   *
   * The actor name is derived from the endpoint's base URI path, replacing every
   * "/" with "-" so the result is a valid actor name.
   *
   * @return the newly created watcher actor
   */
  def startWatching(endpoint: ConsulEndpoint, sender: ActorRef)(implicit actorRefFactory: ActorRefFactory): ActorRef = {
    val nameSuffix = endpoint.baseUri.path.toString.replaceAllLiterally("/", "-")
    val watcher = actorRefFactory.actorOf(Props[ConsulEndpointWatcher], s"consul-watcher$nameSuffix")
    watcher.tell(ConsulEndpointWatcher.StartWatching(endpoint), sender)
    watcher
  }
}
// Message protocol of ConsulEndpointWatcher.
object ConsulEndpointWatcher {
// Command: begin watching `endpoint`; change/error events go to the message sender.
case class StartWatching(endpoint: ConsulEndpoint)
// Internal self-message, scheduled to retry connecting after a failure or disconnect.
case object Reconnect
// Event: the endpoint produced a value different from the previously seen one.
case class ConsulValueChanged(value: AnyRef)
// Event: the response entity could not be parsed by the endpoint's reader.
case object InvalidConsulResponse
// Event: the TCP/HTTP connect attempt to the endpoint failed.
case class ConsulEndpointUnreachable(endpoint: ConsulEndpoint)
}
/**
 * Actor implementing a long-polling "watch" of a single Consul HTTP endpoint
 * using spray-can. It is a three-state machine: notConnected -> connecting ->
 * watching. While watching it repeatedly issues GET requests carrying the last
 * seen x-consul-index, notifies its controller of value changes, and falls back
 * to `connecting` (with a 1 second delay) on any connection-level event.
 */
class ConsulEndpointWatcher extends Actor with ActorLogging {
import ConsulEndpointWatcher._
import context.dispatcher
import Http._
// Initial state: wait for a StartWatching command.
def receive = notConnected
def notConnected: Receive = {
case StartWatching(endpoint) ⇒ {
// The original sender becomes the controller that receives all events.
context.become(connecting(sender(), endpoint, None, None))
connect(endpoint)
}
}
// Waiting for the spray-can connection actor to report Connected/CommandFailed.
def connecting(controller: ActorRef, endpoint: ConsulEndpoint, previousIndex: Option[Long], previousValue: Option[AnyRef]): Receive = {
case _: Connected ⇒ {
log.debug("Connected")
// sender() is the per-connection actor; watch it so Terminated triggers reconnect.
val connection = sender()
context.watch(connection)
context.become(watching(controller, connection, endpoint, previousIndex, previousValue))
watch(connection, endpoint, previousIndex)
}
case CommandFailed(Connect(address, _, _, _, _)) ⇒ {
log.error(s"Failed to connect using address ${address}")
controller ! ConsulEndpointUnreachable(endpoint)
// TODO: is reconnect always the correct strategy?
// TODO: implement a backoff strategy
// shouldn't we leave this to the supervision hierarchy?
context.system.scheduler.scheduleOnce(1 second, self, Reconnect)
}
case Reconnect ⇒ {
log.debug("Connecting...")
connect(endpoint)
}
case Terminated(actor) ⇒ log.debug(s"Received Terminated event in 'connecting' state. Deceased: $actor")
}
// Extracts the x-consul-index response header as a Long.
// NOTE(review): _.value.toLong throws NumberFormatException on a malformed header.
def newIndex(headers: Seq[HttpHeader]) =
headers.find(_.is("x-consul-index")).map(_.value.toLong)
// True when the response carries the same index we already processed (no change).
def consulIndexAlreadySeen(previousIndex: Option[Long], headers: Seq[HttpHeader]) =
previousIndex.isDefined && previousIndex == newIndex(headers)
// Active state: one request is in flight; handle its response and re-poll.
def watching(controller: ActorRef, connection: ActorRef, endpoint: ConsulEndpoint, previousIndex: Option[Long], previousValue: Option[AnyRef]): Receive = {
// Duplicate index: skip parsing and immediately poll again with the same index.
case HttpResponse(_, _, headers, _) if consulIndexAlreadySeen(previousIndex, headers) ⇒
log.debug(s"x-consul-index ${newIndex(headers)} already seen")
watch(connection, endpoint, newIndex(headers))
// TODO: deal with other non-200 status codes?
// 5xx: keep the old index and retry after a 1 second backoff.
case HttpResponse(status: StatusCodes.ServerError, entity, _, _) ⇒ {
log.warning(s"Received status $status from consul: $entity")
watch(connection, endpoint, previousIndex, delay = 1 second)
}
case HttpResponse(status, entity, headers, _) ⇒ {
readResponseValue(entity, endpoint.reader) match {
case Success(newValue) ⇒ {
// Only notify the controller when the parsed value actually changed.
if (previousValue.forall(_ != newValue)) {
log.debug(s"Received new value $newValue on endpoint $endpoint")
controller ! ConsulValueChanged(newValue)
context.become(watching(controller, connection, endpoint, newIndex(headers), Some(newValue)))
} else {
log.debug(s"Received unchanged value $newValue on endpoint $endpoint")
}
// Re-poll immediately with the freshly received index.
watch(connection, endpoint, newIndex(headers))
}
case Failure(error) ⇒ {
log.error(s"Error while reading response value. Error: ${error.getMessage}, status code: ${status}")
controller ! InvalidConsulResponse
watch(connection, endpoint, newIndex(headers), delay = 1 second)
}
}
}
// Any connection-level failure/close event: go back to connecting and retry in 1s.
case event @ (SendFailed(_) | Timedout(_) | Aborted | Closed | PeerClosed | ErrorClosed(_) | Terminated(_)) ⇒ {
log.debug(s"Disconnected from $endpoint due to $event, reconnecting")
context.become(connecting(controller, endpoint, previousIndex, previousValue))
context.system.scheduler.scheduleOnce(1 second, self, Reconnect)
}
// A stale Reconnect arriving after we already reconnected is ignored.
case Reconnect ⇒ log.warning("Received a Reconnect event in 'watching' state.")
}
// ==========================================================================
// Implementation details
// ==========================================================================
/** Overridable for test purposes */
// Opens a spray-can connection with idle/request timeouts widened beyond the
// long-poll wait so the client does not abort a legitimately held request.
private[consul] def connect(endpoint: ConsulEndpoint): Unit = {
IO(Http)(context.system) ! Connect(
host = endpoint.host,
port = endpoint.port,
settings = Some(ClientConnectionSettings(context.system).copy(
idleTimeout = endpoint.watchTimeout + endpoint.clientTimeoutBuffer,
requestTimeout = endpoint.watchTimeout + endpoint.clientTimeoutBuffer
))
)
}
// Schedules the next poll request on the connection, optionally after `delay`.
private def watch(connection: ActorRef, endpoint: ConsulEndpoint, previousIndex: Option[Long], delay: FiniteDuration = 0 seconds): Unit = {
context.system.scheduler.scheduleOnce(delay, connection, requestFor(endpoint, previousIndex))
}
// Builds the GET request for the endpoint URI at the given index, keeping the
// connection alive across polls.
private def requestFor(endpoint: ConsulEndpoint, previousIndex: Option[Long]): HttpRequest = {
Get(endpoint.uri(previousIndex)) ~> addHeader(HttpHeaders.Connection("Keep-Alive"))
}
// Parses the response entity as JSON via the endpoint's reader; an empty
// entity is reported as a Failure rather than an exception.
private def readResponseValue(entity: HttpEntity, reader: ConsulResponseReader): Try[AnyRef] = {
entity.toOption.map { data ⇒
Try(reader.read(JsonParser(data.asString)))
}.getOrElse(Failure(new RuntimeException("Empty http entity!")))
}
} |
I guess the host based connection pool can be used then: Source.repeat(pollRequest).via(Http().cachedHostConnectionPool(...)) This will do reconnects AFAIK at the pool level, and since this is a dedicated pool it won't interfere with the rest of the system. The final solution might be a bit more complicated and need a Graph cycle if you need to limit the poll count bounded. For example (pseudocode): Graph.create {
Source.single(pollRequest) ~> merge ~> cachedConnectionPool ~> bcast ~> out
merge <~ map(_ => pollRequest) <~ bcast
} will keep the number of outstanding polls exactly one (as there is one poll circulating in the loop). Of course above needs error handling. |
Thanks. That looks like something that might work. I'll go figure it out and report back. That DSL sure takes some time to get used to 😄 |
Finally got some time to experiment with this again. An update on my progress so far:
Here's what I have so far. All feedback/comments/suggestions would be very welcome. object AkkaHttpLongPolling {
// Produces an endless stream of long-poll responses by cycling requests through
// a cached host connection pool: each response's x-consul-index header is used
// to build the next request, which is fed back into the pool via a graph cycle.
def longPollingSource(host: String, port: Int, uri: Uri, maxWait: Duration)(implicit s: ActorSystem, fm: Materializer): Source[HttpResponse, NotUsed] = {
import GraphDSL.Implicits._
import s.dispatcher
val connectionPoolSettings = ConnectionPoolSettings(s)
Source.fromGraph(GraphDSL.create() { implicit b ⇒
// Seed: the very first request (index = None, i.e. "0").
val init = Source.single(createRequest(uri, maxWait, None))
// Merges the seed request with requests coming back around the feedback loop.
val merge = b.add(Merge[HttpRequest](2))
val broadcast = b.add(Broadcast[(Try[HttpResponse], HttpRequest)](2))
// The pool flow needs a (request, context) pair; use the request as its own context.
val tupler = b.add(Flow[HttpRequest].map(r ⇒ (r, r)))
val http = b.add(Http().cachedHostConnectionPool[HttpRequest](host, port, connectionPoolSettings).mapMaterializedValue(_ ⇒ NotUsed))
// Output side: only successful responses, with entities forced to strict
// (bounded by a 5s timeout); toStrict failures are dropped by the resuming decider.
val outbound = b.add(
Flow[(Try[HttpResponse], HttpRequest)]
.collect { case (Success(response), _) ⇒ response }
.mapAsync(1)(response ⇒ response.entity.toStrict(5.seconds).map(strictEntity ⇒ response.copy(entity = strictEntity)))
.withAttributes(ActorAttributes.supervisionStrategy(Supervision.resumingDecider))
)
// Feedback side: derive the next request from each result (success or failure),
// keeping exactly one request circulating in the cycle.
// NOTE(review): println used for logging here — prefer the ActorSystem log.
val feedback = b.add(
Flow[(Try[HttpResponse], HttpRequest)]
.map {
case (Success(response), _) ⇒ {
val index = response.headers.find(_.is("x-consul-index")).map(_.value.toLong)
println(s"Success. New index = ${index}.")
createRequest(uri, maxWait, index)
}
case (Failure(cause), _) ⇒ {
// On failure, restart from index None ("0").
println("Failure: " + cause.getMessage)
createRequest(uri, maxWait, None)
}
}
)
// format: OFF
init ~> merge ~> tupler ~> http ~> broadcast ~> outbound
merge <~ feedback <~ broadcast
// format: ON
SourceShape(outbound.out)
})
}
/**
 * Builds a Consul blocking-query request: a copy of `baseUri` whose raw query
 * string carries the `wait` (max hold time) and `index` (last seen
 * x-consul-index, "0" when absent) parameters.
 */
private def createRequest(baseUri: Uri, maxWait: Duration, index: Option[Long]): HttpRequest = {
  val indexParam = index.fold("0")(_.toString)
  HttpRequest(uri = baseUri.copy(rawQueryString = Some(s"wait=${maxWait.toSeconds}s&index=$indexParam")))
}
} The query parameters on the request are specific to the Consul API but I expect every long-polling endpoint has some form of those same two parameters. I plan to turn this into a blog post once I've figured some more things out and handled all the corner cases. Would this fit anywhere in the akka-http docs? Updated: implemented conversion to |
Some more progress. I tried using the connection-level flow in combination with object LongPollingHttpClientUsingSingleConnection {
// Connection-level variant: instead of a pool, each request materializes a fresh
// outgoing connection (via flatMapConcat), and responses are cycled back into
// new requests through a graph cycle, using the x-consul-index header.
def longPollingSource(host: String, port: Int, uri: Uri, maxWait: Duration)(implicit s: ActorSystem, fm: Materializer): Source[HttpResponse, NotUsed] = {
Source.fromGraph(GraphDSL.create() { implicit b ⇒
import GraphDSL.Implicits._
import s._
// Idle timeout is widened 20% beyond the long-poll wait so the client does
// not kill a connection that is legitimately holding the response.
val settings = ClientConnectionSettings(s).withIdleTimeout(maxWait * 1.2)
// Seed: the very first request (index = None, i.e. "0").
val initSource: Source[HttpRequest, NotUsed] = Source.single(createRequest(uri, maxWait, None)).log("outer 0", out ⇒ s"Sending request...")
// One fresh single-request connection per element.
val httpFlow: Flow[HttpRequest, HttpResponse, NotUsed] =
Flow[HttpRequest]
.flatMapConcat(request ⇒
Source.single(request)
.via(Http().outgoingConnection(host, port, settings = settings))
)
.mapMaterializedValue(_ ⇒ NotUsed)
// Output side: force entities to strict (5s bound); toStrict failures are
// skipped via the resuming decider.
val outboundResponsesFlow: Flow[HttpResponse, HttpResponse, NotUsed] =
Flow[HttpResponse] // TODO: add size limit
.mapAsync(1)(response ⇒ response.entity.toStrict(5.seconds).map(strictEntity ⇒ response.copy(entity = strictEntity)))
.withAttributes(ActorAttributes.supervisionStrategy(Supervision.resumingDecider)) // TODO: turn into log-and-resume
// Feedback side: build the next request from each response's x-consul-index.
val feedbackResponsesFlow: Flow[HttpResponse, HttpRequest, NotUsed] =
Flow[HttpResponse]
.map { response ⇒
log.debug(s"Success. Response: ${response.copy(entity = HttpEntity.Empty)}.")
val index = response.headers.find(_.is("x-consul-index")).map(_.value.toLong)
log.debug(s"New index: ${index}.")
createRequest(uri, maxWait, index)
}
val init = b.add(initSource)
val http = b.add(httpFlow)
val merge = b.add(Merge[HttpRequest](2))
val broadcast = b.add(Broadcast[HttpResponse](2))
val outbound = b.add(outboundResponsesFlow)
val feedback = b.add(feedbackResponsesFlow)
// format: OFF
init ~> merge ~> http ~> broadcast ~> outbound
merge <~ feedback <~ broadcast
// format: ON
SourceShape(outbound.out)
})
}
/**
 * Constructs the long-poll request: a copy of `baseUri` with the raw query
 * string set to the Consul blocking-query parameters `wait` and `index`
 * (the latter defaulting to "0" when no index has been seen yet).
 */
private def createRequest(baseUri: Uri, maxWait: Duration, index: Option[Long]): HttpRequest = {
  val query = "wait=" + maxWait.toSeconds + "s&index=" + index.map(_.toString).getOrElse("0")
  HttpRequest(uri = baseUri.copy(rawQueryString = Some(query)))
}
} I think the docs could use some sections on stream resiliency and failure recovery in more complex cases such as this. I think I'll dive into custom graph stages next, since they are the only code I've seen so far that has access to upstream completion/failure events. |
I hope my efforts at slowly clawing my way towards enlightenment provide some insight into the struggles of people trying to figure out how this stream stuff works 😄 |
Awesome research here, thanks for sharing it and sorry I wasn't able to comment in depth yet, a bit overloaded last week. Hope to investigate here this week :) |
I have discovered |
Had some time now while wife was shopping ;) |
Hi @ktoso and the rest of the akka team. I've been diving into the implementation of http core and streams and I'm very curious about the design decisions that went into the current design and about your ideas about the direction things are going. In the current design, http connection management has been directly linked to stream materialization and there are no hooks at all in the API for dealing with events and state changes on the connection. The stream just stops and the surrounding code will have to detect this in some way, which is not entirely trivial since AFAIK there are no easy prebuilt stages for dealing with "normal" stream completion. This means that in order to build streams that deal with connection loss, transparently reconnecting, etc., you will need to break our of the akka-stream context and dive into the implementation layer and the underlying actors. This is exactly how the host connection pools in http core have been implemented, i.e. a Flow implemented by an actor that internally uses more actors to rematerialize nested connection Flows when they complete. Unfortunately a lot of that work is private and very deeply entwined with complex implementation details. I'm attempting to strip out and copy the parts I would need to implement a persistent, resilient long polling connection Source but a lot of those implementation details are still not entirely obvious to me. I'm wondering whether this topic, i.e. exposing connection management in http core as a higher level abstraction, has been discussed in the team and whether there are plans to provide such a thing in the future. |
updated:: removed outdated link to source code. Next step, strip out some of the |
After writing that last bit in the above comment, I tested an implementation using a custom, non-shared pool configured to only allow one single connection and that seems to work, including resilience across connection failures. It seems that as long as you make sure the pool is not used by anyone else, using a single-connection host pool is actually the best solution for long polling since it stops you from having to implement all the rematerialization stuff hidden behind the pool interface. Latest implementation now here: https://github.com/agemooij/stream-experiments/blob/master/src/main/scala/scalapenos/experiments/streams/LongPollingHttpClient.scala |
I figured it's about time I spend a considerable chunk of time on this, to appreciate your effort here – I'm diving in now, thanks for keeping at it! |
So I've read through but need to give it a spin to notice exact semantics, some things that are missing you pointed out. API style actually is something we could provide - exactly such an extension. I also noted:
the materialized value to control it is obviously a key feature to add. single-connection host pool likely makes sense for this scenario, I think I agree, this way we don't impact anyone else; but will try to verify with others. We'll meet with the team on Monday and I hope to have a look together at this and how it could move forward btw. Hope the comments make some sense and we'll be back here soon :) |
I'll assign myself here to keep it on my radar |
@jrudolph good idea. I think the main problem with using Some kind of way to make absolutely sure the pool is unique, like an optional name field in If isolation can be guaranteed, then the remaining problem comes down to configuring the maximum number of pool connections for a situation in which connections get hogged for very long periods and the number of needed connections is very variable. Just setting the number to be pretty high (i.e. just guessing) would be perfectly acceptable to most users I guess. |
Good idea. The default value should be def longPollingSource(host: String, port: Int,
initialRequest: HttpRequest,
connectionSettings: ClientConnectionSettings = ClientConnectionSettings(system))
(nextRequest: HttpResponse ⇒ HttpRequest = _ => initialRequest)
(implicit m: Materializer): Source[HttpResponse, NotUsed] = {
I'd log it on the feedback side, since the output side is only meant for reporting successfully received responses.
Is this a best practice I'm unaware of? What's the danger?
I'm not sure what you mean. Are you suggesting adding API for this on Having to always drain the response entities to prevent blocking the main flow is certainly inconvenient for building user-proof APIs, so I'd prefer some easier, safely bounded way to implement response transformation in cases such as this; i.e. you have a certain knowledge of the types of responses you will likely receive and you don't want to bubble up this responsibility to the API user. |
I think using a dedicated single-connection pool is the right solution. The responsibility of the "pool" is the connection state handling. The abstraction (not the functionality) which is currently missing is something which provides This is something quite generic. Think about the burger chain where in the backend you have a pipeline of (burger stacking) cooks, basically a In akka-http, we have some implementations of those components but not abstraction that you could easily plug together. |
@asarkar the solution I reported above worked for me. |
@agemooij thanks for your response. My problem is similar but different. I've a list of unknown size and want to run a long polling request for each one. In your code, you return a P.S. I'm using the Connection-Level API because of this warning from the docs:
|
@asarkar A few initial comments:
|
@asarkar AFAICS the reason why the above code only performs the first long poller is that you ended up sequencing all pollers by using |
@agemooij The infinite blocking is required, otherwise I get a future timeout. It makes sense to me because the stream never closes. As for the connection management, if I switch to a |
@asarkar Please read what I said above and look at my example code. I'm not using the The infinite blocking call will make your actor very resilient. Much better is to turn the long poller into a self-sustaining cycle with some options to break out of its loop for specific situations. This is done by the cyclic graph I use in my example. No need for infinite blocking. |
I'll try that, although, this is a pet project and I'm running out of time. You misunderstood though, I never said you are using |
How about a more generic import scala.concurrent.Future
import akka.NotUsed
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, Uri}
import akka.http.scaladsl.unmarshalling.{FromResponseUnmarshaller, Unmarshal}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Merge, Source}
import akka.stream.{Materializer, SourceShape}
object ConsulThings {
/**
 * Like [[Source.unfold]], except the successor (state, element) pair is computed
 * by a [[Flow]] rather than a plain function. The stream completes as soon as
 * the flow emits `None`.
 */
def unfoldFlow[S, E](s: S)(f: Flow[S, Option[(S, E)], NotUsed]): Source[E, NotUsed] = {
  Source.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    val seed = Source.single(s)
    // Stop on the first None, unwrap the rest.
    val unwrap = f.takeWhile(_.isDefined).collect { case Some(pair) => pair }
    val nextState = Flow[(S, E)].map { case (state, _) => state }
    val merge = b.add(Merge[S](2))
    val fanOut = b.add(Broadcast[(S, E)](2))
    val emitElems = b.add(Flow[(S, E)].map { case (_, elem) => elem })
    // Cycle: seed + fed-back state -> flow -> (emit element, feed state back).
    seed ~> merge ~> unwrap ~> fanOut ~> emitElems
            merge <~ nextState <~ fanOut
    SourceShape(emitElems.out)
  })
}
/** “Watches” a blocking Consul query endpoint.
 *
 * Uses [[unfoldFlow]] with the last seen x-consul-index (as an Option[String])
 * as the unfold state: each state is turned into a request carrying `wait` and
 * `index` query parameters, sent over a single connection, and the response's
 * index plus unmarshalled body become the next state and emitted element.
 * A non-success status fails the stream with a RuntimeException.
 */
def watch[A: FromResponseUnmarshaller](uri: Uri)(implicit system: ActorSystem, materializer: Materializer): Source[A, NotUsed] = {
import system.dispatcher
// Prepends the Consul blocking-query parameters; the first request
// (index = None) is sent against the bare uri.
def addIndex(index: Option[String]) =
index.map(i => uri.withQuery(uri.query().+:("index" -> i).+:("wait" -> "5s"))).getOrElse(uri)
unfoldFlow(Option.empty[String])(Flow[Option[String]]
.map(i => HttpRequest(uri = addIndex(i)))
// uses a single connection; the authority of the URI must not change between elements
.via(Http(system).outgoingConnection(host = uri.authority.host.address(), port = uri.authority.port))
.mapAsync(1) { response =>
if (response.status.isSuccess()) {
// Next state = the new index (kept as a raw String); element = unmarshalled body.
val index = response.headers.find(_ is "x-consul-index").map(_.value())
Unmarshal(response).to[A].map(a => Option((index, a)))
} else {
Future.failed(new RuntimeException(response.status.toString))
}
})
}
} |
Yeah, that looks pretty good. I would recommend you to use a connection pool like in my example instead of In long-polling you are going to be dealing with intermittent reconnection and without using a connection pool, that would be very hard to deal with. |
Alternatively, wrapping the watch in |
Tuesday Mar 22, 2016 at 16:33 GMT
Originally opened as akka/akka#20121
Quoted from a Gitter discussion:
The text was updated successfully, but these errors were encountered: