Use asg warm pools for faster buildkite job starts #822

Open
nitrocode opened this issue Apr 8, 2021 · 7 comments · May be fixed by #838
Labels
agent lifecycle Agent boot, job lifecycle, agent shutdown asg-initiated-termination enhancement

Comments

@nitrocode
Contributor

nitrocode commented Apr 8, 2021

https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-ec2-auto-scaling-introduces-warm-pools-accelerate-scale-out-while-saving-money/

https://aws.amazon.com/blogs/compute/scaling-your-applications-faster-with-ec2-auto-scaling-warm-pools/

If we could keep X instances warmed up, we could start jobs much faster without having to set the ASG's min size to something non-zero.

It should be added to CloudFormation soon: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-as-group.html

@chloeruka
Contributor

Woah! You're quick. Yeah, this feature looks great, especially for Windows instances that take a while to boot. According to the announcement it's not yet available in CloudFormation, but when it is we'll take a look into it.

@nitrocode
Contributor Author

nitrocode commented May 4, 2021

The only new input needed to use a warm pool would be its minimum size. We use an ASG min size of 0 and max size of 10, so a warm pool min size of 3 seems reasonable.

The CloudFormation docs have been released.

We could add the following:

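```yaml
Parameters:
  WarmPoolMinSize:
    Description: Minimum number of instances in warm pool
    Type: Number
    Default: 0

Conditions:
  # Ref on a Number parameter resolves to a string, so compare against "0".
  UseWarmPool:
    !Not [ !Equals [ !Ref WarmPoolMinSize, "0" ] ]

Resources:
  # Only created when WarmPoolMinSize is non-zero.
  WarmPool:
    Type: AWS::AutoScaling::WarmPool
    Condition: UseWarmPool
    Properties:
      AutoScalingGroupName: !Ref AgentAutoScaleGroup
      MinSize: !Ref WarmPoolMinSize
      # Warm instances are kept stopped until the ASG scales out.
      PoolState: Stopped
```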
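One caveat, if I'm reading the AWS warm pool docs correctly: the pool's size is the group's maximum prepared capacity minus its desired capacity, `MaxGroupPreparedCapacity` defaults to the ASG's max size, and `MinSize` only acts as a floor. So with a max of 10 and a desired capacity of 0, `MinSize: 3` on its own would still leave roughly 10 - 0 = 10 stopped instances in the pool. A sketch that also sets `MaxGroupPreparedCapacity` (a real `AWS::AutoScaling::WarmPool` property; reusing `WarmPoolMinSize` for it is just an assumption) to keep the idle pool at 3:

```yaml
WarmPool:
  Type: AWS::AutoScaling::WarmPool
  Condition: UseWarmPool
  Properties:
    AutoScalingGroupName: !Ref AgentAutoScaleGroup
    # Floor: keep at least this many pre-initialized instances in the pool.
    MinSize: !Ref WarmPoolMinSize
    # Cap on warm + in-service instances combined. Without it the cap defaults
    # to the ASG MaxSize, so max 10 / desired 0 gives a pool of 10, not 3.
    # (Assumption: reusing WarmPoolMinSize here is an illustrative choice.)
    MaxGroupPreparedCapacity: !Ref WarmPoolMinSize
    # Stopped instances are billed for attached EBS volumes, not compute time.
    PoolState: Stopped
```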

What do you folks think?

@nitrocode nitrocode linked a pull request May 4, 2021 that will close this issue
@theonlysinjin

This is fantastic.
I've been using the CloudWatch metrics to scale the ASG when there are no idle agents (i.e. all are busy), to help beat the agent start time.
Though it seems our agents now start (with the update) in just under 2 minutes, which is pretty good.
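For anyone wanting to try the same approach, a rough sketch of alarm-driven scale-out in CloudFormation. It assumes an idle-agent count metric is already being published to CloudWatch; the namespace `Buildkite`, metric name `IdleAgentCount`, `Queue` dimension and `BuildkiteQueue` parameter are hypothetical placeholders to check against what your metrics publisher actually emits. `AgentAutoScaleGroup` is the group name used in the proposal above.

```yaml
ScaleOutPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref AgentAutoScaleGroup
    PolicyType: SimpleScaling
    AdjustmentType: ChangeInCapacity
    ScalingAdjustment: 1   # add one instance each time the alarm fires
    Cooldown: "120"        # seconds; roughly one agent boot in this thread

NoIdleAgentsAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Scale out when no agents on the queue are idle
    # Hypothetical metric identity; must match your publisher's output.
    Namespace: Buildkite
    MetricName: IdleAgentCount
    Dimensions:
      - Name: Queue
        Value: !Ref BuildkiteQueue   # hypothetical parameter
    Statistic: Minimum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 0
    ComparisonOperator: LessThanOrEqualToThreshold
    # If the metric stops arriving, don't keep adding instances.
    TreatMissingData: notBreaching
    AlarmActions:
      - !Ref ScaleOutPolicy   # Ref on a ScalingPolicy returns its ARN
```

This only covers scale-out; scale-in still needs its own policy or whatever mechanism the stack already uses.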

@dieend

dieend commented Nov 9, 2021

Warm Pool is now available in CloudFormation https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-autoscaling-warmpool.html

Right now I can't manually update the ASG created by the current template to use a warm pool, because the template configures a `MixedInstancesPolicy`, and warm pools can't be attached to groups that use one:

```yaml
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandPercentageAboveBaseCapacity: !If [ SpotPriceSet, 0, !Ref OnDemandPercentage ]
    SpotAllocationStrategy: capacity-optimized
    SpotMaxPrice: !If [SpotPriceSet, !Ref SpotPrice, !Ref "AWS::NoValue"]
  LaunchTemplate:
    LaunchTemplateSpecification:
      LaunchTemplateId: !Ref AgentLaunchTemplate
      Version: !GetAtt "AgentLaunchTemplate.LatestVersionNumber"
    Overrides:
      - InstanceType: !Select [ "0", !Split [ ",", !Join [ ",", [ !Ref InstanceType, "", "", "" ] ] ] ]
      - !If
        - UseInstanceType2
        - InstanceType: !Select [ "1", !Split [ ",", !Join [ ",", [ !Ref InstanceType, "", "", "" ] ] ] ]
        - !Ref "AWS::NoValue"
      - !If
        - UseInstanceType3
        - InstanceType: !Select [ "2", !Split [ ",", !Join [ ",", [ !Ref InstanceType, "", "", "" ] ] ] ]
        - !Ref "AWS::NoValue"
      - !If
        - UseInstanceType4
        - InstanceType: !Select [ "3", !Split [ ",", !Join [ ",", [ !Ref InstanceType, "", "", "" ] ] ] ]
        - !Ref "AWS::NoValue"
```

We'd have to remove this policy if we want to use a warm pool.
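Rather than deleting the policy outright, one option is to switch between the two forms using the `UseWarmPool` condition from the proposal above. A sketch under a few assumptions: `AgentLaunchTemplate` would need to set an `InstanceType` itself (the direct `LaunchTemplate` form has no `Overrides`), the warm-pool path gives up spot and the extra instance types, and the existing `Overrides` list and other ASG properties are elided here for brevity.

```yaml
AgentAutoScaleGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    # Warm pool path: reference the launch template directly
    # (on-demand only, single instance type from the template itself).
    LaunchTemplate: !If
      - UseWarmPool
      - LaunchTemplateId: !Ref AgentLaunchTemplate
        Version: !GetAtt "AgentLaunchTemplate.LatestVersionNumber"
      - !Ref "AWS::NoValue"
    # Default path: keep the existing mixed instances policy.
    # The Overrides list is unchanged from the current template (elided here).
    MixedInstancesPolicy: !If
      - UseWarmPool
      - !Ref "AWS::NoValue"
      - InstancesDistribution:
          OnDemandPercentageAboveBaseCapacity: !If [ SpotPriceSet, 0, !Ref OnDemandPercentage ]
          SpotAllocationStrategy: capacity-optimized
          SpotMaxPrice: !If [ SpotPriceSet, !Ref SpotPrice, !Ref "AWS::NoValue" ]
        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref AgentLaunchTemplate
            Version: !GetAtt "AgentLaunchTemplate.LatestVersionNumber"
    # MinSize, MaxSize and the other existing properties stay as they are.
```

Exactly one of `LaunchTemplate` or `MixedInstancesPolicy` ends up set for either value of the condition, which is what the ASG resource requires.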

@aiven-amartin

Any progress or caveats on this issue? It would be really nice to have this option rather than trying to bake git checkouts and other slow workspace setup into the AMI itself.

@alex-shakouri-ai

This would be helpful on our side as well! It would help ensure we have warm agents ready so jobs can start faster!

@wolfeidau
Contributor

@alex-shakouri-ai We had a look at this and it would be hard to just plug into the existing model.

Currently we aren't looking at re-architecting the auto scaling.

8 participants