skypilot-users

How does the Sky Serve readiness probe handle routing of traffic to different replicas?

I'm curious how the Sky Serve readiness probe uses the readiness signal to route traffic between replicas. Specifically, is traffic immediately routed to a different replica when a replica returns a 'not ready' signal, for example when the maximum queue size inside its container is reached? Does this happen only during startup, or also during normal operation?


Aleks Smechov

Asked on Mar 19, 2024

  • The Sky Serve readiness probe is used for health checks, i.e., to determine whether a replica can handle new requests.
  • If a replica is not ready to handle new requests, the controller terminates that replica and starts a new one.
  • The readiness probe is not currently designed for queue-size checks, but there are plans to add support for collecting queue-size information.
  • To control where requests are sent, the recommended approach is to customize the load balancer's algorithm, i.e., the load balancing policies in SkyPilot.
  • For an example of customizing load balancing policies, see: Load Balancing Policies
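The two points in the answer (the readiness probe decides whether a replica stays in service, while routing decisions belong to the load balancing policy) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not SkyServe's implementation: `ToyController`, `Replica`, `least_load_policy`, `MAX_CONSECUTIVE_FAILURES`, and the `in_flight` load signal are all hypothetical, and so is the assumption that a failed probe takes a replica out of the routing set before it is eventually replaced. It shows why signalling a full queue through the readiness probe is a poor fit: a replica that keeps reporting "not ready" ends up terminated, whereas a load-aware policy simply sends new requests elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical threshold: consecutive failed probes before a replica is replaced.
MAX_CONSECUTIVE_FAILURES = 3


@dataclass
class Replica:
    replica_id: int
    in_flight: int = 0  # hypothetical load signal (e.g. queued or active requests)
    consecutive_failures: int = 0
    ready: bool = False  # new replicas start not ready until a probe passes


class ToyController:
    """Toy model of probe-driven replica management (NOT SkyServe's code)."""

    def __init__(self, num_replicas: int, probe: Callable[[Replica], bool]):
        self._probe = probe
        self._next_id = num_replicas
        self.replicas: Dict[int, Replica] = {i: Replica(i) for i in range(num_replicas)}
        self.replaced: List[int] = []

    def probe_all(self) -> None:
        # Iterate over a snapshot so replacements added this round are probed next round.
        for rid in list(self.replicas):
            replica = self.replicas[rid]
            if self._probe(replica):
                replica.ready = True
                replica.consecutive_failures = 0
                continue
            # Assumption: a failed probe removes the replica from routing right away...
            replica.ready = False
            replica.consecutive_failures += 1
            # ...and repeated failures get it terminated and replaced by a fresh replica.
            if replica.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                del self.replicas[rid]
                self.replaced.append(rid)
                new_replica = Replica(self._next_id)
                self.replicas[new_replica.replica_id] = new_replica
                self._next_id += 1

    def ready_replicas(self) -> List[Replica]:
        return [r for r in self.replicas.values() if r.ready]


def least_load_policy(ready: List[Replica]) -> Optional[Replica]:
    """Hypothetical custom policy: pick the ready replica with the fewest in-flight requests."""
    if not ready:
        return None
    return min(ready, key=lambda r: r.in_flight)


# Demo: replica 1 keeps failing its probe (say, because its queue is full); the others pass.
controller = ToyController(3, probe=lambda r: r.replica_id != 1)
controller.replicas[0].in_flight = 5
controller.replicas[2].in_flight = 2
for _ in range(MAX_CONSECUTIVE_FAILURES):
    controller.probe_all()

chosen = least_load_policy(controller.ready_replicas())
print("replaced:", controller.replaced)  # replica 1 was terminated and replaced
print("routed to:", chosen.replica_id if chosen else None)  # least-loaded ready replica
```

In this sketch, a policy like `least_load_policy` is where queue-size information would naturally plug in once it is collected, leaving the readiness probe as a pure health check.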
Mar 19, 2024