EMR Serverless - Max concurrent vCPUs per account #212

Open
pgasiorowski opened this issue Feb 10, 2025 · 6 comments

@pgasiorowski

The notification alert is triggered even though usage does not reach the quota limit.

Service: EMR Serverless
LimitName: Max concurrent vCPUs per account
Type: Per account
Current limit: 8000
Default limit: 16

As you can see in the attached screenshot, the utilization never reaches the limit, but
Quota Monitor reports a "Current Usage" of 100-618%, which triggers false alerts.

~ $ aws service-quotas list-service-quotas --service-code emr-serverless 
{
    "Quotas": [
        {
            "ServiceCode": "emr-serverless",
            "ServiceName": "Amazon EMR Serverless",
            "QuotaArn": "arn:aws:servicequotas:REGION:ACCOUNT:emr-serverless/L-D05C8A75",
            "QuotaCode": "L-D05C8A75",
            "QuotaName": "Max concurrent vCPUs per account",
            "Value": 8000.0,
            "Unit": "None",
            "Adjustable": true,
            "GlobalQuota": false,
            "UsageMetric": {
                "MetricNamespace": "AWS/Usage",
                "MetricName": "ResourceCount",
                "MetricDimensions": {
                    "Class": "None",
                    "Resource": "vCPU",
                    "Service": "EMR Serverless",
                    "Type": "Resource"
                },
                "MetricStatisticRecommendation": "Sum"
            },
            "QuotaAppliedAtLevel": "ACCOUNT"
        }
    ]
}
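To compare the alert against CloudWatch, it helps to know exactly which metric the quota points at. The following is a minimal sketch that pulls the metric identity out of the `UsageMetric` field shown above. The helper name `usage_metric_identity` is made up for illustration, and the embedded JSON is a trimmed copy of the CLI output.

```python
import json

# Trimmed copy of the quota entry from the CLI output above.
QUOTA_JSON = """
{
  "QuotaCode": "L-D05C8A75",
  "Value": 8000.0,
  "UsageMetric": {
    "MetricNamespace": "AWS/Usage",
    "MetricName": "ResourceCount",
    "MetricDimensions": {
      "Class": "None",
      "Resource": "vCPU",
      "Service": "EMR Serverless",
      "Type": "Resource"
    },
    "MetricStatisticRecommendation": "Sum"
  }
}
"""


def usage_metric_identity(quota: dict) -> dict:
    """Return the CloudWatch metric identity advertised by a quota entry.

    Hypothetical helper, not part of Quota Monitor or the AWS SDK.
    """
    usage = quota["UsageMetric"]
    return {
        "Namespace": usage["MetricNamespace"],
        "MetricName": usage["MetricName"],
        # CloudWatch APIs take dimensions as a list of Name/Value pairs.
        "Dimensions": [
            {"Name": name, "Value": value}
            for name, value in sorted(usage["MetricDimensions"].items())
        ],
        "Stat": usage["MetricStatisticRecommendation"],
    }


quota = json.loads(QUOTA_JSON)
identity = usage_metric_identity(quota)
print(json.dumps(identity, indent=2))
```

The resulting namespace, metric name, dimensions, and statistic are the values to select in the CloudWatch console when checking the graph against the alert.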

Image

@sanjay-reddy-kandi
Member

Thanks for raising this issue. We are taking a look and will get back to you on this.

@sanjay-reddy-kandi
Member

Are you seeing WARN or ERROR alerts? What is the NotificationThreshold parameter value in your CloudFormation template? Also, what does 100-618% in your question mean? Please let me know. Thanks.

@pgasiorowski
Author

pgasiorowski commented Feb 12, 2025

NotificationThreshold is set to 80

I am seeing some WARNINGs and ERROR alerts:

Image

By 100-618% in my question I mean that:

  • The errors I receive from QuotaMonitor show CurrentUsage between 100% and 618%. 627.9% was the highest alert I received today.

Again, the account quota limit is set to 8000 vCPUs, and the account has never reached it according to CloudWatch Metrics.
So either the alert is wrong or the metrics are wrong.
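To put the reported percentages in perspective, here is a small sketch of the usage each one would imply against the 8000 vCPU limit. It assumes the alert computes percent as usage / quota * 100, which the thread does not confirm. The function name `implied_usage` is made up for illustration.

```python
def implied_usage(reported_percent: float, quota: float) -> float:
    """Usage that would yield reported_percent, assuming percent = usage / quota * 100."""
    return round(reported_percent / 100.0 * quota, 1)


QUOTA_VCPUS = 8000  # account limit stated in the issue

for pct in (100.0, 618.0, 627.9):
    vcpus = implied_usage(pct, QUOTA_VCPUS)
    print(f"{pct}% of {QUOTA_VCPUS} vCPUs -> {vcpus} vCPUs")
```

Under that assumption, the highest alert would correspond to roughly 50,000 concurrent vCPUs, far above anything the account could run with an 8000 vCPU limit.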

@sanjay-reddy-kandi
Member

We are working to debug this issue. The solution uses AWS Service Quotas and CloudWatch metric query APIs, specifically the SERVICE_QUOTA function to determine utilization percentages. To investigate further, we need additional information:

  1. Please provide the value of the L-D05C8A75 quota from the SQQuotaTable DynamoDB table generated by the solution.
  2. Execute the following command, adjusting the date range to cover when you observed the issue, and share the contents of emr_serverless_metric_output.json:
    aws cloudwatch get-metric-data --metric-data-queries file://./my_metric_data_auto_scaling.json --start-time 2025-02-11T00:00:00Z --end-time 2025-02-12T23:59:59Z > emr_serverless_metric_output.json
    This is the emr_serverless_metric_data.json file.
  3. Could you share a screenshot of the actual percentage usage shown in the Service Quotas console for EMR Serverless, like this one? Adjust the time range accordingly.
Image
  4. Have you recently requested or received any quota increases for EMR Serverless?
  5. Are you able to share the CloudWatch Logs for the QM-CWPoller Lambda function, with the function's LOG_LEVEL environment variable set to debug? If so, please provide logs covering the time period when an incorrect alert was generated.
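For readers without access to the linked query file, the sketch below generates a metric data query of the kind step 2 expects. It is an illustration only, not the maintainers' actual file. The metric identity comes from the `UsageMetric` field in the CLI output earlier in the thread, and `SERVICE_QUOTA` is the CloudWatch metric math function mentioned above. The query Ids, the label, the function name `build_utilization_queries`, and the default period of 300 seconds are assumptions.

```python
import json


def build_utilization_queries(period_seconds: int = 300, stat: str = "Sum") -> list:
    """Build MetricDataQueries for usage and usage as a percentage of quota.

    Hypothetical helper; the period and Ids are assumptions, not values
    taken from Quota Monitor.
    """
    usage = {
        "Id": "usage",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Usage",
                "MetricName": "ResourceCount",
                "Dimensions": [
                    {"Name": "Class", "Value": "None"},
                    {"Name": "Resource", "Value": "vCPU"},
                    {"Name": "Service", "Value": "EMR Serverless"},
                    {"Name": "Type", "Value": "Resource"},
                ],
            },
            "Period": period_seconds,
            "Stat": stat,
        },
        "ReturnData": True,
    }
    percent = {
        "Id": "pct",
        # SERVICE_QUOTA(usage) resolves to the quota tied to the usage metric.
        "Expression": "(usage / SERVICE_QUOTA(usage)) * 100",
        "Label": "Utilization (%)",
        "ReturnData": True,
    }
    return [usage, percent]


# Print JSON suitable for saving to a file and passing to
# `aws cloudwatch get-metric-data --metric-data-queries file://...`.
print(json.dumps(build_utilization_queries(), indent=2))
```

The statistic and period are exposed as parameters because the `Sum` statistic adds up every datapoint that falls within a period, so the same metric can produce different values at different period lengths.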

To temporarily mute the alert, follow the Configure notifications section in our Implementation Guide.

Based on your responses, we may need to investigate further or apply a temporary fix while working on a permanent solution. Thank you for your cooperation and patience as we work to resolve this issue.

@pgasiorowski
Author

  1. Value=4000

For the other answers, I could contact you through our AWS representative. Please confirm whether that is fine.
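The stored value of 4000 differs from the 8000 returned by Service Quotas. The sketch below shows how much that difference could matter if the stored value were used as the denominator. That is an assumption, since the thread does not confirm how the table value is used. The usage figure of 3200 vCPUs is hypothetical, as is the function name `utilization_percent`.

```python
def utilization_percent(usage: float, quota: float) -> float:
    """Utilization as a percentage of the quota, rounded to one decimal place."""
    return round(usage / quota * 100.0, 1)


STORED_QUOTA = 4000  # value reported from SQQuotaTable
ACTUAL_QUOTA = 8000  # value returned by Service Quotas

usage = 3200  # hypothetical vCPU usage, chosen for illustration

print("against stored quota:", utilization_percent(usage, STORED_QUOTA))
print("against actual quota:", utilization_percent(usage, ACTUAL_QUOTA))
```

A halved denominator doubles every reported percentage, which could push moderate usage over the NotificationThreshold of 80. It would not account for the highest alerts on its own, though: 627.9% of 4000 is still about 25,116 vCPUs, well above the 8000 vCPU limit.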

@sanjay-reddy-kandi
Member

Yes, that's totally fine. Thank you
