
Provide an example for how to implement Http long polling #63

Open
akka-ci opened this issue Sep 8, 2016 · 38 comments
Labels
1 - triaged (tickets that are safe to pick up for contributing in terms of likeliness of being accepted) · 3 - in progress (someone is working on this ticket) · help wanted (identifies issues that the core team will likely not have time to work on)

Comments

@akka-ci

akka-ci commented Sep 8, 2016

Issue by agemooij
Tuesday Mar 22, 2016 at 16:33 GMT
Originally opened as akka/akka#20121


Quoted from a Gitter discussion:

Hey hakkers. I'm still pretty new to doing complex things with streams and akka-http. Does anyone have an example of how to implement long polling with the client-side part of akka-http? I've got a Consul client written as Spray actors that I'm thinking of moving to akka-http but the conceptual model is (unsurprisingly) very different indeed.

The docs mention long polling a bunch of times when discussing the higher (i.e. connection pooling) API levels but I can't find any real information (docs, examples, blogs, SO questions, etc.) on how I would go about implementing a long polling client using akka-http and streams.

For instance: I want to poll forever, but how do I automatically recreate the http connection produced by Http().outgoingConnection when it closes? And that method produces a Flow[HttpRequest, HttpResponse, Future[Http.OutgoingConnection]], but how do I turn that into a loop of requests where every next request is based on the response to the previous one (i.e. polling for changes since the index that got sent with the previous change)? Both of these are basically circular/recursive internal graph stages, but conceptually what I want is a Source[ConsulChangeEvent].

@akka-ci akka-ci added this to the 2.4.x milestone Sep 8, 2016
@akka-ci akka-ci added 3 - in progress Someone is working on this ticket help wanted Identifies issues that the core team will likely not have time to work on t:http labels Sep 8, 2016

Comment by drewhk
Tuesday Mar 22, 2016 at 16:36 GMT


Isn't long polling simply like:

val events = Source.repeat(pollRequest).via(Http().outgoingConnection(url))

I.e. continuously feed requests into the Http connection, returning a response once there is one.


Comment by ktoso
Tuesday Mar 22, 2016 at 16:38 GMT


Somewhat, but plus reconnecting

Hah, yeah, in that sense. I keep thinking of the more complicated cases somehow; will dump thoughts here in a bit (what we did in ConductR was more than just that)


Comment by drewhk
Tuesday Mar 22, 2016 at 16:40 GMT


Then use Source.repeat(pollRequest).mapAsync(1)(Http().singleRequest) for now


Comment by agemooij
Tuesday Mar 22, 2016 at 16:46 GMT


The URI of each next request depends on the response to the previous request. An example API call:

GET /v1/health/service/blaze-canary-service?index=3232540&wait=300s&passing

Here, index is a value that comes from the previous response. Also, indeed, when the connection gets dropped, it should recreate it again. The wait param specifies the server-side max wait (i.e. 300 seconds). After that the connection gets dropped and it should be recreated.

The docs strongly state that using the shared pool for long-polling requests is not encouraged. AFAIK Http().singleRequest uses the super pool.
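The request-chaining part described above (each request's index coming from the previous response) can be sketched independently of any particular HTTP library as a recursive loop over Futures. A minimal sketch, where `Poll` and `pollN` are hypothetical stand-ins for one blocking query and a bounded polling loop:

```scala
import scala.concurrent.{ExecutionContext, Future}
import ExecutionContext.Implicits.global

object LongPollLoopSketch {
  // Stand-in for one blocking query: given the last seen index,
  // return the next index and the (possibly unchanged) value.
  type Poll[V] = Option[Long] => Future[(Long, V)]

  // Repeatedly poll, feeding each response's index into the next
  // request, collecting values until `n` polls have completed.
  def pollN[V](poll: Poll[V], n: Int, lastIndex: Option[Long] = None): Future[List[V]] =
    if (n <= 0) Future.successful(Nil)
    else
      poll(lastIndex).flatMap { case (index, value) =>
        pollN(poll, n - 1, Some(index)).map(value :: _)
      }
}
```

In a real client, `poll` would issue the HTTP request and read the index header from the response; reconnect handling would wrap the `poll` call with a retry. This only illustrates the feedback shape, not connection management.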


Comment by agemooij
Tuesday Mar 22, 2016 at 16:48 GMT


The target API is this one from Consul (see the top section about "blocking queries")


Comment by agemooij
Tuesday Mar 22, 2016 at 16:49 GMT


A Spray implementation as a reference for what I'm trying to achieve using streams:

package rfs.blaze.gateway.config
package consul

import scala.concurrent.duration._
import scala.util._

import akka.actor._
import akka.io.IO

import spray.can._
import spray.can.client._
import spray.http._
import spray.httpx.RequestBuilding._
import spray.json._

object Consul {
  def startWatching(endpoint: ConsulEndpoint, sender: ActorRef)(implicit actorRefFactory: ActorRefFactory): ActorRef = {
    val encodedUri = endpoint.baseUri.path.toString.replaceAllLiterally("/", "-")
    val name = s"consul-watcher${encodedUri}"
    val watcher = actorRefFactory.actorOf(Props[ConsulEndpointWatcher], name)

    watcher.tell(ConsulEndpointWatcher.StartWatching(endpoint), sender)
    watcher
  }
}

object ConsulEndpointWatcher {
  case class StartWatching(endpoint: ConsulEndpoint)
  case object Reconnect

  case class ConsulValueChanged(value: AnyRef)
  case object InvalidConsulResponse
  case class ConsulEndpointUnreachable(endpoint: ConsulEndpoint)
}

/**
 *
 */
class ConsulEndpointWatcher extends Actor with ActorLogging {
  import ConsulEndpointWatcher._
  import context.dispatcher
  import Http._

  def receive = notConnected

  def notConnected: Receive = {
    case StartWatching(endpoint) => {
      context.become(connecting(sender(), endpoint, None, None))
      connect(endpoint)
    }
  }

  def connecting(controller: ActorRef, endpoint: ConsulEndpoint, previousIndex: Option[Long], previousValue: Option[AnyRef]): Receive = {
    case _: Connected => {
      log.debug("Connected")

      val connection = sender()

      context.watch(connection)
      context.become(watching(controller, connection, endpoint, previousIndex, previousValue))
      watch(connection, endpoint, previousIndex)
    }

    case CommandFailed(Connect(address, _, _, _, _)) => {
      log.error(s"Failed to connect using address ${address}")

      controller ! ConsulEndpointUnreachable(endpoint)

      // TODO: is reconnect always the correct strategy?
      // TODO: implement a backoff strategy
      // shouldn't we leave this to the supervision hierarchy?
      context.system.scheduler.scheduleOnce(1 second, self, Reconnect)
    }

    case Reconnect => {
      log.debug("Connecting...")

      connect(endpoint)
    }

    case Terminated(actor) => log.debug(s"Received Terminated event in 'connecting' state. Deceased: $actor")
  }

  def newIndex(headers: Seq[HttpHeader]) =
    headers.find(_.is("x-consul-index")).map(_.value.toLong)

  def consulIndexAlreadySeen(previousIndex: Option[Long], headers: Seq[HttpHeader]) =
    previousIndex.isDefined && previousIndex == newIndex(headers)

  def watching(controller: ActorRef, connection: ActorRef, endpoint: ConsulEndpoint, previousIndex: Option[Long], previousValue: Option[AnyRef]): Receive = {
    case HttpResponse(_, _, headers, _) if consulIndexAlreadySeen(previousIndex, headers) =>
      log.debug(s"x-consul-index ${newIndex(headers)} already seen")
      watch(connection, endpoint, newIndex(headers))

    // TODO: deal with other non-200 status codes?
    case HttpResponse(status: StatusCodes.ServerError, entity, _, _) => {
      log.warning(s"Received status $status from consul: $entity")
      watch(connection, endpoint, previousIndex, delay = 1 second)
    }

    case HttpResponse(status, entity, headers, _) => {
      readResponseValue(entity, endpoint.reader) match {
        case Success(newValue) => {
          if (previousValue.forall(_ != newValue)) {
            log.debug(s"Received new value $newValue on endpoint $endpoint")
            controller ! ConsulValueChanged(newValue)
            context.become(watching(controller, connection, endpoint, newIndex(headers), Some(newValue)))
          } else {
            log.debug(s"Received unchanged value $newValue on endpoint $endpoint")
          }

          watch(connection, endpoint, newIndex(headers))
        }

        case Failure(error) => {
          log.error(s"Error while reading response value. Error: ${error.getMessage}, status code: ${status}")
          controller ! InvalidConsulResponse
          watch(connection, endpoint, newIndex(headers), delay = 1 second)
        }
      }
    }

    case event @ (SendFailed(_) | Timedout(_) | Aborted | Closed | PeerClosed | ErrorClosed(_) | Terminated(_)) => {
      log.debug(s"Disconnected from $endpoint due to $event, reconnecting")
      context.become(connecting(controller, endpoint, previousIndex, previousValue))
      context.system.scheduler.scheduleOnce(1 second, self, Reconnect)
    }

    case Reconnect => log.warning("Received a Reconnect event in 'watching' state.")
  }

  // ==========================================================================
  // Implementation details
  // ==========================================================================

  /** Overridable for test purposes */
  private[consul] def connect(endpoint: ConsulEndpoint): Unit = {
    IO(Http)(context.system) ! Connect(
      host = endpoint.host,
      port = endpoint.port,
      settings = Some(ClientConnectionSettings(context.system).copy(
        idleTimeout = endpoint.watchTimeout + endpoint.clientTimeoutBuffer,
        requestTimeout = endpoint.watchTimeout + endpoint.clientTimeoutBuffer
      ))
    )
  }

  private def watch(connection: ActorRef, endpoint: ConsulEndpoint, previousIndex: Option[Long], delay: FiniteDuration = 0 seconds): Unit = {
    context.system.scheduler.scheduleOnce(delay, connection, requestFor(endpoint, previousIndex))
  }

  private def requestFor(endpoint: ConsulEndpoint, previousIndex: Option[Long]): HttpRequest = {
    Get(endpoint.uri(previousIndex)) ~> addHeader(HttpHeaders.Connection("Keep-Alive"))
  }

  private def readResponseValue(entity: HttpEntity, reader: ConsulResponseReader): Try[AnyRef] = {
    entity.toOption.map { data =>
      Try(reader.read(JsonParser(data.asString)))
    }.getOrElse(Failure(new RuntimeException("Empty http entity!")))
  }
}


Comment by agemooij
Tuesday Mar 22, 2016 at 16:52 GMT


And @drewhk I would love for there to be a "don't be stupid, just use this one-liner" answer to this because the complexity of this in Spray was exactly the reason why I started looking at how this would work in akka-http 😄


Comment by drewhk
Tuesday Mar 22, 2016 at 16:57 GMT


The docs strongly state that using the shared pool for long-polling requests is not encouraged. AFAIK Http().singleRequest uses the super pool.

I guess the host based connection pool can be used then:

Source.repeat(pollRequest).via(Http().cachedHostConnectionPool(...))

This will do reconnects AFAIK at the pool level, and since this is a dedicated pool it won't interfere with the rest of the system.

The final solution might be a bit more complicated and need a Graph cycle if you need to limit the poll count bounded. For example (pseudocode):

Graph.create {
  Source.single(pollRequest) ~> merge ~> cachedConnectionPool ~> bcast ~> out
                                                   merge <~ map(_ => pollRequest) <~ bcast
}

will keep the number of outstanding polls exactly one (as there is one poll circulating in the loop). Of course above needs error handling.


Comment by agemooij
Tuesday Mar 22, 2016 at 20:35 GMT


Thanks. That looks like something that might work. I'll go figure it out and report back. That DSL sure takes some time to get used to 😄


Comment by agemooij
Tuesday Mar 29, 2016 at 21:22 GMT


Finally got some time to experiment with this again. An update on my progress so far:

  • the approach sketched above by @drewhk works in essence, although it required some tweaks here and there
  • the details of the cached host connection pool make things a bit more complex due to the type of the flow returned, i.e. Flow[(HttpRequest, T), (Try[HttpResponse], T), HostConnectionPool]. For now I've gone for using the request itself as the identifying T
  • if I don't immediately consume the response entity, the whole thing grinds to a halt. This means I need to map the output HttpResponse to one using an HttpEntity.Strict entity to make sure the source doesn't get blocked by consumers that forget to read the entity.
  • I'm still curious about whether/how this could work using the connection-level API, mostly because the use of the cached host connection pool makes it very tricky to make sure that the same cached pool is not used by other code, which might also be long polling. I think whoever wrote those warnings in the docs about not misusing shared pools for long polling was correct. I'll dive deeper into this.

Here's what I have so far. All feedback/comments/suggestions would be very welcome.

object AkkaHttpLongPolling {
  def longPollingSource(host: String, port: Int, uri: Uri, maxWait: Duration)(implicit s: ActorSystem, fm: Materializer): Source[HttpResponse, NotUsed] = {
    import GraphDSL.Implicits._
    import s.dispatcher

    val connectionPoolSettings = ConnectionPoolSettings(s)

    Source.fromGraph(GraphDSL.create() { implicit b =>
      val init = Source.single(createRequest(uri, maxWait, None))
      val merge = b.add(Merge[HttpRequest](2))
      val broadcast = b.add(Broadcast[(Try[HttpResponse], HttpRequest)](2))
      val tupler = b.add(Flow[HttpRequest].map(r => (r, r)))
      val http = b.add(Http().cachedHostConnectionPool[HttpRequest](host, port, connectionPoolSettings).mapMaterializedValue(_ => NotUsed))

      val outbound = b.add(
        Flow[(Try[HttpResponse], HttpRequest)]
          .collect { case (Success(response), _) => response }
          .mapAsync(1)(response => response.entity.toStrict(5.seconds).map(strictEntity => response.copy(entity = strictEntity)))
          .withAttributes(ActorAttributes.supervisionStrategy(Supervision.resumingDecider))
      )

      val feedback = b.add(
        Flow[(Try[HttpResponse], HttpRequest)]
          .map {
            case (Success(response), _) => {
              val index = response.headers.find(_.is("x-consul-index")).map(_.value.toLong)
              println(s"Success. New index = ${index}.")
              createRequest(uri, maxWait, index)
            }
            case (Failure(cause), _) => {
              println("Failure: " + cause.getMessage)
              createRequest(uri, maxWait, None)
            }
          }
      )

      // format: OFF
      init ~> merge ~> tupler ~> http ~> broadcast ~> outbound
              merge <~ feedback       <~ broadcast
      // format: ON

      SourceShape(outbound.out)
    })
  }

  private def createRequest(baseUri: Uri, maxWait: Duration, index: Option[Long]): HttpRequest = {
    HttpRequest(
      uri = baseUri.copy(
        rawQueryString = Some(s"wait=${maxWait.toSeconds}s&index=${index.map(_.toString).getOrElse("0")}")
      )
    )
  }
}

The query parameters on the request are specific to the Consul API but I expect every long-polling endpoint has some form of those same two parameters.
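For what it's worth, that parameter pair can be factored out into a tiny pure helper mirroring `createRequest` above (the helper name is illustrative):

```scala
import scala.concurrent.duration._

object BlockingQueryParams {
  // Render the two parameters most long-polling APIs share in some
  // form: a server-side max wait and the last seen change index
  // (defaulting to 0 when nothing has been seen yet).
  def queryString(maxWait: FiniteDuration, index: Option[Long]): String =
    s"wait=${maxWait.toSeconds}s&index=${index.getOrElse(0L)}"
}
```

Only the parameter names would differ per API; the shape (a wait budget plus a change cursor fed back from the previous response) stays the same.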

I plan to turn this into a blog post once I've figured some more things out and handled all the corner cases.

Would this fit anywhere in the akka-http docs?

Updated: implemented conversion to HttpEntity.Strict and reformatted using the akka docs style


Comment by agemooij
Sunday Apr 03, 2016 at 20:17 GMT


Some more progress. I tried using the connection-level flow in combination with flatMapConcat, but so far it doesn't reuse the resulting connection; it starts a new one every time. Also, just like the above version using the cached pool, the resulting flow does not recover from network errors and collapses on the first failure.

object LongPollingHttpClientUsingSingleConnection {
  def longPollingSource(host: String, port: Int, uri: Uri, maxWait: Duration)(implicit s: ActorSystem, fm: Materializer): Source[HttpResponse, NotUsed] = {
    Source.fromGraph(GraphDSL.create() { implicit b =>
      import GraphDSL.Implicits._
      import s._

      val settings = ClientConnectionSettings(s).withIdleTimeout(maxWait * 1.2)

      val initSource: Source[HttpRequest, NotUsed] = Source.single(createRequest(uri, maxWait, None)).log("outer 0", out => s"Sending request...")
      val httpFlow: Flow[HttpRequest, HttpResponse, NotUsed] =
        Flow[HttpRequest]
          .flatMapConcat(request =>
            Source.single(request)
              .via(Http().outgoingConnection(host, port, settings = settings))
          )
          .mapMaterializedValue(_ => NotUsed)

      val outboundResponsesFlow: Flow[HttpResponse, HttpResponse, NotUsed] =
        Flow[HttpResponse] // TODO: add size limit
          .mapAsync(1)(response => response.entity.toStrict(5.seconds).map(strictEntity => response.copy(entity = strictEntity)))
          .withAttributes(ActorAttributes.supervisionStrategy(Supervision.resumingDecider)) // TODO: turn into log-and-resume

      val feedbackResponsesFlow: Flow[HttpResponse, HttpRequest, NotUsed] =
        Flow[HttpResponse]
          .map { response =>
            log.debug(s"Success. Response: ${response.copy(entity = HttpEntity.Empty)}.")
            val index = response.headers.find(_.is("x-consul-index")).map(_.value.toLong)
            log.debug(s"New index: ${index}.")
            createRequest(uri, maxWait, index)
          }

      val init = b.add(initSource)
      val http = b.add(httpFlow)
      val merge = b.add(Merge[HttpRequest](2))
      val broadcast = b.add(Broadcast[HttpResponse](2))
      val outbound = b.add(outboundResponsesFlow)
      val feedback = b.add(feedbackResponsesFlow)

      // format: OFF
      init ~> merge ~> http     ~> broadcast ~> outbound
              merge <~ feedback <~ broadcast
      // format: ON

      SourceShape(outbound.out)
    })
  }

  private def createRequest(baseUri: Uri, maxWait: Duration, index: Option[Long]): HttpRequest = {
    HttpRequest(
      uri = baseUri.copy(
        rawQueryString = Some(s"wait=${maxWait.toSeconds}s&index=${index.map(_.toString).getOrElse("0")}")
      )
    )
  }
}

I think the docs could use some sections on stream resiliency and failure recovery in more complex cases such as this.

I think I'll dive into custom graph stages next, since they are the only code I've seen so far that has access to upstream completion/failure events.


Comment by agemooij
Sunday Apr 03, 2016 at 20:19 GMT


I hope my efforts at slowly clawing my way towards enlightenment provide some insight into the struggles of people trying to figure out how this stream stuff works 😄


Comment by ktoso
Sunday Apr 03, 2016 at 22:22 GMT


Awesome research here, thanks for sharing it and sorry I wasn't able to comment in depth yet, a bit overloaded last week. Hope to investigate here this week :)


Comment by agemooij
Saturday Apr 09, 2016 at 09:24 GMT


Hi @ktoso. Any hints towards directions to take in my research? I'm hoping to dig a little deeper this weekend, so if you have any ideas, I'd love to hear them and try them out.


Comment by agemooij
Saturday Apr 09, 2016 at 14:20 GMT


I have discovered PoolInterfaceActor and PoolFlow. Studying them now...


Comment by ktoso
Saturday Apr 09, 2016 at 17:30 GMT


Had some time now while my wife was shopping ;)
The initial code looks pretty good; I think it's the right direction. I'll need to get to a PC though to give more detailed feedback :)


Comment by agemooij
Thursday Apr 21, 2016 at 08:59 GMT


Hi @ktoso and the rest of the akka team. I've been diving into the implementation of http core and streams and I'm very curious about the design decisions that went into the current design and about your ideas about the direction things are going.

In the current design, http connection management has been directly linked to stream materialization and there are no hooks at all in the API for dealing with events and state changes on the connection. The stream just stops and the surrounding code will have to detect this in some way, which is not entirely trivial since AFAIK there are no easy prebuilt stages for dealing with "normal" stream completion.

This means that in order to build streams that deal with connection loss, transparently reconnecting, etc., you will need to break out of the akka-stream context and dive into the implementation layer and the underlying actors.

This is exactly how the host connection pools in http core have been implemented, i.e. a Flow implemented by an actor that internally uses more actors to rematerialize nested connection Flows when they complete.

Unfortunately a lot of that work is private and very deeply entwined with complex implementation details. I'm attempting to strip out and copy the parts I would need to implement a persistent, resilient long polling connection Source but a lot of those implementation details are still not entirely obvious to me.

I'm wondering whether this topic, i.e. exposing connection management in http core as a higher level abstraction, has been discussed in the team and whether there are plans to provide such a thing in the future.


Comment by agemooij
Thursday Apr 21, 2016 at 09:13 GMT


Updated: removed outdated link to source code.

Next step, strip out some of the PoolInterfaceActor implementation details in an attempt to get an http connection Flow that transparently reconnects (or find some way to implement this using a unique single-connection host connector pool per long polling operation, but then that warning in the Akka docs definitely needs to be rewritten 😄 )


Comment by agemooij
Friday Apr 22, 2016 at 22:08 GMT


After writing that last bit in the above comment, I tested an implementation using a custom, non-shared pool configured to only allow one single connection, and that seems to work, including resilience across connection failures.

It seems that as long as you make sure the pool is not used by anyone else, using a single-connection host pool is actually the best solution for long polling since it stops you from having to implement all the rematerialization stuff hidden behind the pool interface.

Latest implementation now here: https://github.com/agemooij/stream-experiments/blob/master/src/main/scala/scalapenos/experiments/streams/LongPollingHttpClient.scala
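For reference, a dedicated single-connection pool along these lines can be expressed through akka-http's pool settings. A sketch using the standard configuration keys (when applied as an override for a dedicated pool rather than the global one; the exact values are assumptions and should be sized against the server-side wait):

```hocon
# Dedicated pool for long polling: exactly one connection, so polls
# are strictly sequential and cannot starve other client traffic.
akka.http.host-connection-pool {
  max-connections = 1
}
```

Programmatically, the same can be expressed with `ConnectionPoolSettings(system).withMaxConnections(1)`; the client idle timeout should also be raised above the long-poll wait so the server-side hold doesn't trip it.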


Comment by ktoso
Friday Apr 22, 2016 at 22:18 GMT


I figured it's about time I spend a considerable chunk of time on this, to appreciate your effort here – I'm diving in now, thanks for keeping at it!


Comment by ktoso
Friday Apr 22, 2016 at 23:49 GMT


So I've read through it, but need to give it a spin to check the exact semantics; you already pointed out some of the things that are missing. The API style is actually something we could provide, exactly such an extension.

I also noted:
nextRequest: HttpResponse ⇒ HttpRequest, // yeah, this makes a lot of sense, possibly a default could be identity (imagine stateful endpoint, like twitter stream)

import s._ // avoid importing everything from system

.collect { case Success(response) ⇒ response } // probably log failures though?

.mapAsync(1)(response ⇒ response.entity.toStrict(5.seconds).map(strictEntity ⇒ response.copy(entity = strictEntity))) // we'd likely want to do the toStrict and combine it with "json streaming" (once I finally finish it)?

.withAttributes(ActorAttributes.supervisionStrategy(Supervision.resumingDecider)) // TODO: turn into log-and-resume. Supervision in its current form is not that good; not sure if it is very needed here, and if possible working without it would be preferable

the materialized value to control it is obviously a key feature to add.

single-connection host pool likely makes sense for this scenario, I think agree, this way we don't impact anyone else; but will try to verify with others.

We'll meet with the team on monday and I hope to have a look together at this and how it could move forward btw.

Hope the comments make some sense and we'll be back here soon :)


Comment by ktoso
Friday Apr 22, 2016 at 23:49 GMT


I'll assign myself here to keep it on my radar


Comment by jrudolph
Saturday Apr 23, 2016 at 09:24 GMT


@agemooij how easy would it be to implement your use-case using singleRequest over a dedicated pool? Maybe that's the core abstraction that is missing?


Comment by agemooij
Saturday Apr 23, 2016 at 12:35 GMT


@jrudolph good idea. I think the main problem with using singleRequest is how to make sure that the pool used for the long polling connections is not shared with "normal" http client code to the same host/port.

Some kind of way to make absolutely sure the pool is unique, like an optional name field in ConnectionPoolSettings would fix that.

If isolation can be guaranteed, then the remaining problem comes down to configuring the maximum number of pool connections for a situation in which connections get hogged for very long periods and the number of needed connections is very variable. Just setting the number to be pretty high (i.e. just guessing) would be perfectly acceptable to most users I guess.


Comment by agemooij
Saturday Apr 23, 2016 at 13:07 GMT


I also noted:
nextRequest: HttpResponse ⇒ HttpRequest, // yeah, this makes a lot of sense, possibly a default could be identity (imagine stateful endpoint, like twitter stream)

Good idea. The default value should be initialRequest then, which requires splitting the argument list. The API then becomes:

  def longPollingSource(host: String, port: Int,
                        initialRequest: HttpRequest,
                        connectionSettings: ClientConnectionSettings = ClientConnectionSettings(system))
                       (nextRequest: HttpResponse => HttpRequest = _ => initialRequest)
                       (implicit m: Materializer): Source[HttpResponse, NotUsed] = {

.collect { case Success(response) ⇒ response } // probably log failures though?

I'd log it on the feedback side, since the output side is only meant for reporting successfully received responses.

import s._ // avoid importing everything from system

Is this a best practice I'm unaware of? What's the danger?

.mapAsync(1)(response ⇒ response.entity.toStrict(5.seconds).map(strictEntity ⇒ response.copy(entity = strictEntity))) // we'd likely want to the toStrict and combine with "json streaming" (once i finally finish it)?

I'm not sure what you mean. Are you suggesting adding API for this on HttpResponse?

Having to always drain the response entities to prevent blocking the main flow is certainly inconvenient for building user-proof APIs, so I'd prefer some easier, safely bounded way to implement response transformation in cases such as this; i.e. you have a certain knowledge of the types of responses you will likely receive and you don't want to bubble up this responsibility to the API user.


Comment by jrudolph
Monday Apr 25, 2016 at 11:26 GMT


I think the main problem with using singleRequest is how to make sure that the pool used for the long polling connections is not shared with "normal" http client code to the same host/port.

I think using a dedicated single-connection pool is the right solution. The responsibility of the "pool" is the connection state handling.

The abstraction (not the functionality) which is currently missing is something which provides singleRequest given an arbitrary Flow[(T, HttpRequest), (T, HttpResponse)].

This is something quite generic. Think about a burger chain where, in the backend, you have a pipeline of (burger stacking) cooks, basically a Flow[Order, Burger]. For organizational reasons this simple linear pipeline may not be good enough, so the backend is implemented behind an interface of the kind Flow[(OrderId, Order), (OrderId, Burger)]. That could be a simple linear pipeline, or a number of on-duty cooks / pipelines waiting in a pool for more work, with some algorithm scheduling pipelines intelligently (like reusing existing cooks/pipelines before getting cooks from the pool, who have to dry themselves first, etc.). On top of that, the sales process is that the customer makes the order at the vending machine, getting a token of future completion in the form of a receipt containing the order ID. So we have several components: the actual backend implementation, the queueing / scheduling / multiplexing component, and an interface where you can issue orders to the queue, instantly receiving a token for future completion.

In akka-http, we have some implementations of those components, but no abstractions that you could easily plug together.


Comment by agemooij
Monday Apr 25, 2016 at 11:43 GMT


@jrudolph 👍 akka-http is still pretty low-level, especially the current client-side APIs.

Are #15909 and #16856 still being worked on? I guess this is (or should be) part of #16856, right?

@ktoso ktoso added 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted and removed t:http labels Sep 8, 2016
@ktoso ktoso removed this from the 2.4.x milestone Sep 12, 2016
@asarkar

asarkar commented Dec 3, 2017

I happen to have the same problem, and created #1591 for it. I'll close it if there is a conclusion to this ticket. @agemooij did you eventually find a solution?

@agemooij

agemooij commented Dec 4, 2017

@asarkar the solution I reported above worked for me.
The latest version of that is here: https://github.com/agemooij/stream-experiments/blob/master/src/main/scala/scalapenos/experiments/streams/LongPollingHttpClient.scala

@asarkar

asarkar commented Dec 4, 2017

@agemooij thanks for your response. My problem is similar but different. I have a list of unknown size and want to run a long polling request for each one. In your code you return a Source, so it's not clear to me what happens when you materialize the graph. In my case, the code is blocked after it processes the first element from the list and never gets to the second one. Here's most of my code for reference:

P.S. I'm using the Connection-Level API because of this warning from the docs:

The request-level API is implemented on top of a connection pool that is shared inside the actor system. A consequence of using a pool is that long-running requests block a connection while running and starve other requests. Make sure not to use the request-level API for long-running requests like long-polling GET requests. Use the Connection-Level Client-Side API or an extra pool just for the long-running connection instead.

private lazy val connectionFlow: Flow[HttpRequest, HttpResponse, Future[Http.OutgoingConnection]] =
  Http().outgoingConnection(host = k8SProperties.host, port = k8SProperties.port)

private val responseValidationFlow = (response: HttpResponse) => {
  response.entity.dataBytes
    .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 8096))
    .mapAsyncUnordered(Runtime.getRuntime.availableProcessors()) { data =>
      if (response.status == OK) {
        Unmarshal(data).to[Event]
          .map(Right(_))
      } else {
        Future.successful(data.utf8String)
          .map(Left(_))
      }
    }
}

override def receive = {
  case GetEventsRequest(apps, replyTo) => {
    val request: String => HttpRequest = (app: String) => RequestBuilding.Get(Uri(k8SProperties.baseUrl)
      .withPath(Path(s"/api/v1/watch/namespaces/${k8SProperties.namespace}/pods"))
      .withQuery(Query(Map(
        "labelSelector" -> s"app=$app",
        "export" -> "true",
        "includeUninitialized" -> "false",
        "pretty" -> "false"
      )))
    )
      .withHeaders(
        Accept(MediaRange(MediaTypes.`application/json`)),
        Authorization(OAuth2BearerToken(authToken))
      )

    val flow = Source(apps)
      .map(request)
      .withAttributes(ActorAttributes.dispatcher(blockingDispatcher))
      .via(connectionFlow)
      .flatMapMerge(apps.size, responseValidationFlow)

    val done = flow.runForeach(replyTo ! GetEventsResponse(_))

    Await.result(done, Duration.Inf)
  }
}

@agemooij
Contributor

agemooij commented Dec 4, 2017

@asarkar A few initial comments:

  • I notice an Await.result call in your actor, which is an infinitely blocking call. That doesn't sound like a great idea, so I'm not surprised that your code blocks.
  • For your comments on the use of the connection-level API, please see the discussion above about the merits of using a pool, and the fact that I used a custom pool configured to have only one active connection. You can of course tweak that pool for your use case, but it saves you from having to deal with connection management, reconnecting, etc.; all of that is done for you transparently, which makes the code a lot simpler.
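For reference, a dedicated pool limited to a single connection, roughly along the lines described above, could be sketched like this. This is an editorial sketch, not the code from the linked example: the object name and the exact settings values are illustrative.

```scala
import scala.util.Try
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.http.scaladsl.settings.ConnectionPoolSettings
import akka.stream.Materializer
import akka.stream.scaladsl.Flow

object LongPollingPool {
  // A host connection pool restricted to one connection with one in-flight
  // request: the pool transparently reconnects on connection loss, while the
  // long-polling request cannot starve any other (shared-pool) traffic.
  def singleConnectionPool[T](host: String, port: Int)(
      implicit system: ActorSystem, mat: Materializer
  ): Flow[(HttpRequest, T), (Try[HttpResponse], T), Http.HostConnectionPool] = {
    val settings = ConnectionPoolSettings(system)
      .withMaxConnections(1)  // one long-polling connection
      .withMaxOpenRequests(1) // one request in flight at a time
      .withMaxRetries(0)      // let the surrounding stream decide how to retry
    Http().newHostConnectionPool[T](host, port, settings)
  }
}
```

Requests then flow through the pool as `(HttpRequest, correlationId)` pairs, and the pool, not your own code, owns reconnection.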

@agemooij
Contributor

agemooij commented Dec 4, 2017

@asarkar AFAICS the reason why the above code only performs the first long poller is that you ended up sequencing all pollers by using map instead of a form of mapAsync and sending them all to the same connectionFlow. Your other pollers never get started because the first element will cause connectionFlow to long-poll forever, never getting around to picking up another element.
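In other words, each app needs its own connection rather than all sharing one connectionFlow. A minimal sketch of that idea (the names are made up; `mkRequest` stands in for the request builder above, and response-entity handling is omitted):

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.stream.scaladsl.Source

object ParallelPollers {
  // One sub-stream per app, each with its own connection, merged eagerly:
  // a long-poll response that never completes on one connection no longer
  // prevents the other apps from being polled at all.
  def pollAll(apps: List[String], mkRequest: String => HttpRequest,
              host: String, port: Int)(
      implicit system: ActorSystem): Source[HttpResponse, NotUsed] =
    Source(apps).flatMapMerge(math.max(apps.size, 1), app =>
      Source.single(mkRequest(app))
        .via(Http().outgoingConnection(host, port))) // fresh connection per app
}
```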

@asarkar

asarkar commented Dec 4, 2017

@agemooij The infinite blocking is required, otherwise I get a future timeout. It makes sense to me because the stream never closes. As for the connection management, if I switch to a cachedHostConnectionPool with max 50 connections (random number large enough for my use case), I seem to be getting what I want. I've yet to play with the connection pool settings and see how they differ, and at what point they saturate.

@agemooij
Contributor

agemooij commented Dec 4, 2017

@asarkar Please read what I said above and look at my example code. I'm not using cachedHostConnectionPool but a customized pool.

The infinite blocking call will not make your actor very resilient. Much better is to turn the long poller into a self-sustaining cycle, with some options to break out of the loop in specific situations. That is what the cyclic graph in my example does; no need for infinite blocking.

@asarkar

asarkar commented Dec 4, 2017

I'll try that, although this is a pet project and I'm running out of time. You misunderstood, though: I never said you were using cachedHostConnectionPool; I said "if I switch to" it.
I'll report back when I have something useful. Thanks for your time, I appreciate it.

@liff

liff commented Dec 5, 2017

How about a more generic unfoldFlow as a building block?

import scala.concurrent.Future
import akka.NotUsed
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, Uri}
import akka.http.scaladsl.unmarshalling.{FromResponseUnmarshaller, Unmarshal}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Merge, Source}
import akka.stream.{Materializer, SourceShape}


object ConsulThings {
  /** Same as [[Source.unfold]], but uses a [[Flow]] to generate the next state-element tuple. */
  def unfoldFlow[S, E](s: S)(f: Flow[S, Option[(S, E)], NotUsed]): Source[E, NotUsed] = {
    val first = Source.single(s)
    val fo = f.takeWhile(_.isDefined).collect{case Some(elem) => elem}
    val next = Flow[(S, E)].map(_._1)

    Source.fromGraph(GraphDSL.create() { implicit b =>
      import GraphDSL.Implicits._

      val merge = b.add(Merge[S](2))
      val split = b.add(Broadcast[(S, E)](2))
      val emit = b.add(Flow[(S, E)].map(_._2))

      first ~> merge ~> fo   ~> split ~> emit
               merge <~ next <~ split

      SourceShape(emit.out)
    })
  }

  /** “Watches” a blocking Consul query endpoint. */
  def watch[A: FromResponseUnmarshaller](uri: Uri)(implicit system: ActorSystem, materializer: Materializer): Source[A, NotUsed] = {
    import system.dispatcher

    def addIndex(index: Option[String]) =
      index.map(i => uri.withQuery(uri.query().+:("index" -> i).+:("wait" -> "5s"))).getOrElse(uri)

    unfoldFlow(Option.empty[String])(Flow[Option[String]]
      .map(i => HttpRequest(uri = addIndex(i)))
      // uses a single connection; the authority of the URI must not change between elements
      .via(Http(system).outgoingConnection(host = uri.authority.host.address(), port = uri.authority.port))
      .mapAsync(1) { response =>
        if (response.status.isSuccess()) {
          val index = response.headers.find(_ is "x-consul-index").map(_.value())
          Unmarshal(response).to[A].map(a => Option((index, a)))
        } else {
          Future.failed(new RuntimeException(response.status.toString))
        }
      })
  }
}

@agemooij
Contributor

agemooij commented Dec 5, 2017

Yeah, that looks pretty good.

I would recommend using a connection pool, as in my example, instead of Http().outgoingConnection, since the pool transparently deals with connection loss and reconnection for you, while the single-connection solution will fail and complete all the streams when one of them has a temporary connection problem.

With long polling you are going to be dealing with intermittent reconnections, and without a connection pool that would be very hard to handle.

@liff

liff commented Dec 5, 2017

Alternatively, wrapping the watch in RestartSource seems to work well, too.
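For completeness, that wrapping might look roughly like this sketch: `ConsulThings.watch` is the method defined in the comment above, while `MyData`, the URI, and the backoff values are placeholders (and `MyData` is assumed to have a `FromResponseUnmarshaller` in scope).

```scala
import scala.concurrent.duration._
import akka.http.scaladsl.model.Uri
import akka.stream.scaladsl.RestartSource

// Restart `watch` with exponential backoff whenever the underlying
// connection fails or the inner stream completes, picking the long poll
// back up from scratch.
val resilientWatch =
  RestartSource.withBackoff(
    minBackoff = 1.second,
    maxBackoff = 30.seconds,
    randomFactor = 0.2 // adds jitter so restarts don't synchronize
  ) { () =>
    ConsulThings.watch[MyData](Uri("http://consul.example:8500/v1/kv/my-key"))
  }
```

Note that `withBackoff` restarts on completion as well as on failure, which is what you want here since a long-poll source should never terminate on its own.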
