Auto-Scaling
To enable auto-scaling, select the Autoscaling checkbox.

Auto-scaling works in conjunction with the Max Concurrent Requests setting and the related options below:

Max Concurrent Requests: The maximum number of concurrent requests each replica accepts. Once this number is reached, the replica begins rejecting additional requests. The trade-off is that a higher value can increase latency, while a lower value causes more requests to be rejected.
Target Concurrent Requests: The number of concurrent requests that triggers autoscaling of another replica. Set this below Max Concurrent Requests.
Smoothing Factor: Controls the moving average of concurrent requests, which determines how long incoming traffic must stay above or below the target before a replica is brought up or scaled down.
Conservative: Responds to changes slowly
Moderate: Balanced responsiveness
Aggressive: Reacts to changes quickly
Auto Scaling Range for Replicas: The minimum and maximum number of replicas. The minimum is the baseline the system keeps running, and the maximum is the most it will scale up to.
Note: Whatever options you have set for hardware selection will be used for autoscaling.
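
To make the interaction between these settings concrete, here is a minimal Python sketch. It is not the product's actual algorithm, which is not documented here. It assumes that the moving average is an exponential moving average, that Target Concurrent Requests applies per replica, and that each smoothing option maps to a made-up weight. The names AutoscalePolicy, SMOOTHING_ALPHA, smooth, desired_replicas, and rejected_requests are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical weights for each smoothing option. The real values are not
# published; a higher weight means the average follows new traffic faster.
SMOOTHING_ALPHA = {"conservative": 0.1, "moderate": 0.3, "aggressive": 0.7}


@dataclass
class AutoscalePolicy:
    max_concurrent: int     # per-replica limit; requests beyond it are rejected
    target_concurrent: int  # per-replica level that triggers another replica (assumed)
    smoothing: str          # "conservative", "moderate", or "aggressive"
    min_replicas: int       # baseline number of replicas
    max_replicas: int       # upper bound on replicas

    def __post_init__(self):
        # The target should be set below the maximum, as described above.
        if self.target_concurrent >= self.max_concurrent:
            raise ValueError("target_concurrent must be below max_concurrent")
        if self.min_replicas > self.max_replicas:
            raise ValueError("min_replicas must not exceed max_replicas")
        if self.smoothing not in SMOOTHING_ALPHA:
            raise ValueError(f"unknown smoothing option: {self.smoothing}")


def smooth(previous_avg: float, observed: float, alpha: float) -> float:
    """Exponential moving average of concurrent requests (an assumption)."""
    return alpha * observed + (1 - alpha) * previous_avg


def desired_replicas(policy: AutoscalePolicy, smoothed_total: float) -> int:
    """Replicas needed to keep each one at or below the target, within the range."""
    needed = math.ceil(smoothed_total / policy.target_concurrent)
    return max(policy.min_replicas, min(policy.max_replicas, needed))


def rejected_requests(policy: AutoscalePolicy, replicas: int, total: int) -> int:
    """Concurrent requests that exceed the combined per-replica maximum."""
    return max(0, total - replicas * policy.max_concurrent)


if __name__ == "__main__":
    policy = AutoscalePolicy(
        max_concurrent=10, target_concurrent=6,
        smoothing="moderate", min_replicas=1, max_replicas=4,
    )
    alpha = SMOOTHING_ALPHA[policy.smoothing]
    average = 0.0
    replicas = policy.min_replicas
    for observed in [4, 12, 30, 30, 30, 8, 2]:  # made-up traffic samples
        rejected = rejected_requests(policy, replicas, observed)
        average = smooth(average, observed, alpha)
        replicas = desired_replicas(policy, average)
        print(f"observed={observed} average={average:.1f} "
              f"rejected={rejected} next_replicas={replicas}")
```

Under these assumptions, switching smoothing to "aggressive" makes the average, and therefore the replica count, follow a traffic spike sooner, while "conservative" holds the replica count steady through short bursts at the cost of more rejected requests during the spike.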