AWS SDK - Intermittent NullReferenceException #3625

Open
JamieKeeling opened this issue Jan 28, 2025 · 14 comments
Labels
bug This issue is a bug. dynamodb p2 This is a standard priority issue

Comments

@JamieKeeling

JamieKeeling commented Jan 28, 2025

Describe the bug

When attempting to insert records using IDynamoDBContext.SaveAsync a NullReferenceException is thrown from the AWS DynamoDB SDK.

Expected Behavior

The IDynamoDBContext.SaveAsync call is successful and the record is created within the configured AWS DynamoDB table.

Current Behavior

The following stack trace shows where the error is raised:

innerExceptionMesage: Object reference not set to an instance of an object.
innerExceptionStackTrace: at Amazon.DynamoDBv2.DocumentModel.Table.LoadTable(IAmazonDynamoDB ddbClient, TableConfig config)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.GetUnconfiguredTable(String tableName, Boolean disableFetchingTableMetadata)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.GetTargetTable(ItemStorageConfig storageConfig, DynamoDBFlatConfig flatConfig, DynamoDBConsumer consumer)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.SaveHelperAsync(Type valueType, Object value, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.SaveHelperAsync[T](T value, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.SaveAsync[T](T value, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)

Reproduction Steps

Unable to reproduce locally, only where hosted.

Configuration of AWS SDK services (AWSSDK.Extensions.NETCore.Setup)

var options = configuration.GetAWSOptions();
services.AddDefaultAWSOptions(options);
services.AddAWSService<IAmazonDynamoDB>();
services.TryAddSingleton<IDynamoDBContext, DynamoDBContext>();

Saving the record

var operationConfig = new DynamoDBOperationConfig { OverrideTableName = _dynamoTableName };
await _dynamoDbContext.SaveAsync(recordToSave, operationConfig);

Note: _dynamoTableName is the logical id for the DynamoDB table.

Additionally, we use the following logic within Startup.cs to ensure that the required interfaces are registered and available before marking the application as ready (letting AWS restart the task if necessary):

private static void VerifyAmazonConnectivity(IApplicationBuilder app)
{
    app.ApplicationServices.GetRequiredService<IAmazonDynamoDB>();
    app.ApplicationServices.GetRequiredService<IDynamoDBContext>();
}

Possible Solution

We have been unable to reproduce the issue locally, so we resort to failing the affected Fargate tasks and launching replacements until IDynamoDBContext.SaveAsync works as intended.

Additional Information/Context

Initially we manually registered the required dependencies as follows:

services.AddSingleton<IAmazonDynamoDB, AmazonDynamoDBClient>();
services.AddSingleton<IDynamoDBContext, DynamoDBContext>();

This resulted in a different error message when invoking IDynamoDBContext.SaveAsync:

innerExceptionMesage: Value cannot be null. (Parameter 'ddbClient')
innerExceptionStackTrace: at Amazon.DynamoDBv2.DocumentModel.Table..ctor(IAmazonDynamoDB ddbClient, TableConfig config)
at Amazon.DynamoDBv2.DocumentModel.Table.LoadTable(IAmazonDynamoDB ddbClient, TableConfig config)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.GetUnconfiguredTable(String tableName, Boolean disableFetchingTableMetadata)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.SaveHelperAsync(Type valueType, Object value, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)
at Amazon.DynamoDBv2.DataModel.DynamoDBContext.SaveHelperAsync[T](T value, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)

As you can see from the message there's an implication that the required client isn't being injected. Our representative example purposefully validates they are available before allowing the task to respond to requests.

This in turn changes the error message from the above to the one reported within this issue.

Relatedly, we have other applications that use the manual registration and can save records with the same functionality without issue.

AWS .NET SDK and/or Package version used

AWSSDK.Extensions.NETCore.Setup : 3.7.301
AWSSDK.DynamoDBv2 : 3.7.405.7
AWSSDK.Core : 3.7.401.1

Targeted .NET Platform

.NET 8

Operating System and version

Alpine Linux (ASP NET Docker Image)

@JamieKeeling JamieKeeling added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 28, 2025
@ashishdhingra
Contributor

@JamieKeeling Good morning. Thanks for opening the issue. While I try to investigate the possible root cause, could you please:

  • Share your application type.
  • Share the end-to-end minimal code example, including the model for records that you are trying to save.
  • Try enabling verbose logging using statements below and see if anything useful is emitted in the logs.
    Amazon.AWSConfigs.LoggingConfig.LogResponses = Amazon.ResponseLoggingOption.Always;
    Amazon.AWSConfigs.LoggingConfig.LogTo = Amazon.LoggingOptions.Console;
    Amazon.AWSConfigs.AddTraceListener("Amazon", new System.Diagnostics.ConsoleTraceListener());

Thanks,
Ashish

@ashishdhingra ashishdhingra added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Jan 28, 2025
@ashishdhingra ashishdhingra self-assigned this Jan 28, 2025
@JamieKeeling
Author

Hi @ashishdhingra, thank you for reaching out.

  • Your application type

This is a .NET 8 Web API

  • Share the end-to-end minimal code example, including the model for records that you are trying to save.

I'm unable to do this - the application code is commercially sensitive and removing enough would dilute the example.

I can provide an anonymised model for the records if that still helps, keeping the attributes and usage intact?

  • Try enabling verbose logging using statements below and see if anything useful is emitted in the logs.

Still working on this one - it's hard to reproduce the issue to generate logs of interest.

I have also observed that when this issue occurs, we see errors in trying to use IDynamoDBContext.QueryAsync<T>

---> System.InvalidOperationException: Must have one range key or a GSI index defined for the table <redacted>
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.ComposeQueryFilterHelper(DynamoDBFlatConfig currentConfig, Document hashKey, IEnumerable`1 conditions, ItemStorageConfig storageConfig, List`1& indexNames)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.ConvertQueryByValue[T](Object hashKeyValue, IEnumerable`1 conditions, DynamoDBOperationConfig operationConfig, ItemStorageConfig storageConfig)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.QueryAsync[T](Object hashKeyValue, DynamoDBOperationConfig operationConfig)

We query for records as follows:

var operationConfig = new DynamoDBOperationConfig
{
    ConsistentRead = true,
    BackwardQuery = true,
    OverrideTableName = _dynamoTableName
};

var dynamoResponse = await _dynamoDbContext.QueryAsync<POCO>(identifier.ToString(), operationConfig).GetNextSetAsync();

As above, if we restart the application we find query behaviors are restored and perform as expected.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jan 31, 2025
@ashishdhingra ashishdhingra removed their assignment Feb 6, 2025
@normj
Member

normj commented Feb 7, 2025

@JamieKeeling I would highly recommend adjusting how you create the DynamoDBContext object using our newer patterns that remove the underlying lazy call to DescribeTable to get the metadata of your table. That DescribeTable call can cause problematic behavior. We have deprecated the lazy DescribeTable code path in the upcoming V4. This blog post goes into more depth about the new pattern, where essentially your code provides the table metadata, avoiding the need for the DescribeTable call. https://aws.amazon.com/blogs/developer/improved-dynamodb-initialization-patterns-for-the-aws-sdk-for-net/

In your case since you are adding DynamoDBContext as a singleton I would do something like the following.

builder.Services.TryAddSingleton<IDynamoDBContext>(sp =>
{
    var client = sp.GetRequiredService<IAmazonDynamoDB>();
    var context = new DynamoDBContext(client, new DynamoDBContextConfig
    {
        DisableFetchingTableMetadata = true
    });

    return context;
});

@dscpinheiro dscpinheiro added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-review labels Feb 7, 2025
@JamieKeeling
Author

JamieKeeling commented Feb 7, 2025

Hi @normj - thank you for the suggestion, I have since implemented this change as per the documentation.

Additionally, I have included the following JSON within the appSettings.json due to seeing messages of Resolved DefaultConfigurationMode for RegionEndpoint [eu-west-1] to [Legacy] prior to querying DynamoDB:

{
  "AWS": {
    "Region": "eu-west-1",
    "DefaultsMode": "Standard"
  }
}

I am also validating the configuration as part of startup, ensuring all types are available:

private static void VerifyAmazonConnectivity(IApplicationBuilder app)
{
    var dynamoClient = app.ApplicationServices.GetRequiredService<IAmazonDynamoDB>();
    dynamoClient.Config.Validate();

    app.ApplicationServices.GetRequiredService<IDynamoDBContext>();
}

With those changes in place I still see the failures, however I now notice a pattern:

IDynamoDBContext.SaveAsync

innerExceptionMesage: Value cannot be null. (Parameter 'ddbClient')
innerExceptionStackTrace:    at Amazon.DynamoDBv2.DocumentModel.Table..ctor(IAmazonDynamoDB ddbClient, TableConfig config)
   at Amazon.DynamoDBv2.DocumentModel.Table.CreateTableFromItemStorageConfig(IAmazonDynamoDB client, TableConfig config, ItemStorageConfig itemStorageConfig, DynamoDBFlatConfig flatConfig)
   at Amazon.DynamoDBv2.DataModel.ItemStorageConfigCache.CreateStorageConfig(Type baseType, String actualTableName, DynamoDBFlatConfig flatConfig)
   at Amazon.DynamoDBv2.DataModel.ItemStorageConfigCache.GetConfig(Type type, DynamoDBFlatConfig flatConfig, Boolean conversionOnly)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.SaveHelperAsync(Type valueType, Object value, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.SaveHelperAsync[T](T value, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.SaveAsync[T](T value, DynamoDBOperationConfig operationConfig, CancellationToken cancellationToken)

Reviewing the stack trace against the AWS SDK source code, it appears that the ddbClient is taken from the Context property within Amazon.DynamoDBv2.DataModel.ItemStorageConfigCache.CreateStorageConfig.

IDynamoDBContext.FromDocument

exception: System.ArgumentNullException: Value cannot be null. (Parameter 'ddbClient')
   at Amazon.DynamoDBv2.DocumentModel.Table..ctor(IAmazonDynamoDB ddbClient, TableConfig config)
   at Amazon.DynamoDBv2.DocumentModel.Table.CreateTableFromItemStorageConfig(IAmazonDynamoDB client, TableConfig config, ItemStorageConfig itemStorageConfig, DynamoDBFlatConfig flatConfig)
   at Amazon.DynamoDBv2.DataModel.ItemStorageConfigCache.CreateStorageConfig(Type baseType, String actualTableName, DynamoDBFlatConfig flatConfig)
   at Amazon.DynamoDBv2.DataModel.ItemStorageConfigCache.GetConfig(Type type, DynamoDBFlatConfig flatConfig, Boolean conversionOnly)
   at Amazon.DynamoDBv2.DataModel.ItemStorageConfigCache.GetConfig[T](DynamoDBFlatConfig flatConfig, Boolean conversionOnly)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.FromDocumentHelper[T](Document document, DynamoDBFlatConfig flatConfig)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.FromDocument[T](Document document, DynamoDBOperationConfig operationConfig)
   at Amazon.DynamoDBv2.DataModel.DynamoDBContext.FromDocument[T](Document document)

Similar to the above, this also attempts to resolve ddbClient in the same manner and fails.

I am unable to reproduce any of these issues locally; however, the IDynamoDBContext.FromDocument failure now happens consistently when hosted in Fargate.

The DynamoDbClient configuration for the above is as follows:

accountIdEndpointMode: 0
     allowAutoRedirect: true
     authenticationServiceName: dynamodb
     bufferSize: 8192
     cacheHttpClient: true
     clockOffset: 00:00:00
     correctedUtcNow: 2025-02-07T18:31:02.0578706Z
     defaultConfigurationMode: 0
     disableHostPrefixInjection: false
     disableLogging: false
     disableRequestCompression: false
     endpointDiscoveryCacheLimit: 1000
     endpointDiscoveryEnabled: false
     fastFailRequests: false
     httpClientCacheSize: 1
     ignoreConfiguredEndpointUrls: false
     isMaxErrorRetrySet: true
     logMetrics: false
     logResponse: false
     maxErrorRetry: 4
     progressUpdateInterval: 102400
     proxyPort: 0
     readEntireResponse: false
     regionEndpoint: {
       displayName: Europe (Ireland)
       originalSystemName: eu-west-1
       partitionDnsSuffix: amazonaws.com
       partitionName: aws
       systemName: eu-west-1
     }
     regionEndpointServiceName: dynamodb
     requestChecksumCalculation: 0
     requestMinCompressionSizeBytes: 10240
     resignRetries: false
     responseChecksumValidation: 0
     retryMode: 1
     serviceId: DynamoDB
     serviceVersion: 2012-08-10
     signatureMethod: 1
     signatureVersion: 4
     throttleRetries: true
     useAlternateUserAgentHeader: false
     useDualstackEndpoint: false
     useFIPSEndpoint: false
     useHttp: false
     userAgent: aws-sdk-dotnet-coreclr/3.7.405.7 ua/2.0 os/linux#5.10.230.223 md/ARCH#X64 lang/.NET_Core#8.0.12 exec-env/AWS_ECS_FARGATE md/aws-sdk-dotnet-core#3.7.401.1 api/DynamoDB#3.7.405.7

In summary, I can confirm that:

  • IAmazonDynamoDB is configured via DI as per the example above
  • IAmazonDynamoDB is resolvable, as within the same execution path preceding the IDynamoDBContext.FromDocument example we're using IAmazonDynamoDB.QueryAsync successfully

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Feb 8, 2025
@JamieKeeling
Author

Based on the example stack trace from my previous reply I have tried to take a deeper look at this, but all avenues point to an issue with the SDK itself.

With the updated DI setup, where I am passing in the IAmazonDynamoDB client and DynamoDBContextConfig, the initialisation of IDynamoDBContext will use the following constructor:

public DynamoDBContext(IAmazonDynamoDB client, DynamoDBContextConfig config)

This in turn invokes a call to the internal constructor, crucially creating an instance of ItemStorageConfigCache with the DynamoDBContext instance (this) passed in automatically:

this.StorageConfigCache = new ItemStorageConfigCache(this);

Subsequent calls make their way to CreateTableFromItemStorageConfig, and the IAmazonDynamoDB instance comes from the Context property set by ItemStorageConfigCache when it was created above.

var table = Table.CreateTableFromItemStorageConfig(Context.Client, emptyConfig, config, flatConfig);

Finally, a call to create a Table instance is made, which is what throws the exception (an ArgumentNullException for ddbClient, per the stack traces in my previous reply):

internal Table(IAmazonDynamoDB ddbClient, TableConfig config)

Knowing this, I have added instrumentation to our application that uses Reflection to check whether the internal Client on the IDynamoDBContext instance is populated:

var dynamoClient = _dynamoDbContext.GetType().GetProperty("Client", BindingFlags.NonPublic | BindingFlags.Instance);
_logger.Information("CSGet_Client_Check", dynamoClient != null ? "Provided" : "Missing");

In all instances of the error, the log reports Provided. Based on the stack trace and the source code above, I can see no logical reason for the reference to be missing.
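One caveat on the instrumentation above: Type.GetProperty returns a PropertyInfo that describes the member, so a non-null result only shows that a property with that name is declared on the type, not that it currently holds a value on the instance. A minimal sketch of a check that reads the live value with PropertyInfo.GetValue follows. ProbeTarget and ClientProbe are hypothetical stand-ins written for illustration; they are not SDK types, and the sketch assumes the real context exposes a non-public instance property named Client, as the snippet above does.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for a type with a non-public Client property.
// It is NOT the SDK's DynamoDBContext.
public sealed class ProbeTarget
{
    internal object Client { get; set; }
}

public static class ClientProbe
{
    // Returns "Provided" only when the non-public instance property exists
    // AND its current value on this particular instance is non-null.
    public static string Check(object context, string propertyName = "Client")
    {
        PropertyInfo property = context.GetType().GetProperty(
            propertyName, BindingFlags.NonPublic | BindingFlags.Instance);

        // A non-null PropertyInfo only proves the member is declared on the type.
        if (property == null)
        {
            return "NoSuchProperty";
        }

        // GetValue reads the value held by the instance at this moment.
        object value = property.GetValue(context);
        return value != null ? "Provided" : "Missing";
    }
}
```

Logging the result of ClientProbe.Check(_dynamoDbContext) immediately before the failing call would distinguish a client that was never supplied from one that was cleared at some point after construction.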

@sstere

sstere commented Feb 20, 2025

@normj

@JamieKeeling is right! I have also had this problem for months! I have a Blazor Server application and the registration looks like this:

services.AddAWSService<IAmazonDynamoDB>(); // this is registered as Singleton by default
services.AddScoped<IDynamoDBContext>(sp => new DynamoDBContext(sp.GetRequiredService<IAmazonDynamoDB>(), new DynamoDBContextConfig() { TableNamePrefix = $"{Current.EnvironmentName}-" }));

The issue is intermittent. I don't understand what is causing it.

@JamieKeeling
Author

JamieKeeling commented Feb 20, 2025

I have managed to replicate this locally using both the initial version of the application (i.e. no modernisation of DI) and the updated version with the improvements suggested within this thread.

Steps

Invoke a call to IDynamoDBContext.FromDocument - specifically, we do so using this example:

using (_dynamoDbContext)
{
    // Pull the searchedAt out of the LastEvaluatedKey so that we can pass this back to the client
    // as the next URL for pagination. Later we use the searchedAt they pass back in in the request
    // to form the new ExclusiveStartKey and so forth.
    var lastKey = dynamoDbResponse.LastEvaluatedKey != null
        ? _dynamoDbContext
            .FromDocument<CustomerServicingPagination>(
                Document.FromAttributeMap(dynamoDbResponse.LastEvaluatedKey)).searchedAt
        : null;

    return new ProductSearchesPaginated
    {
        NextKey = lastKey,
        ProductSearches = dynamoDbResponse.Items.Select(productSearch =>
        {
            var recordAsDocument = Document.FromAttributeMap(productSearch);
            return _dynamoDbContext.FromDocument<ProductSearch>(recordAsDocument);
        }).ToList()
    };
}

This works correctly. In a subsequent HTTP request to the API we then invoke the following:

var operationConfig = new DynamoDBOperationConfig { OverrideTableName = _dynamoTableName };
await _dynamoDbContext.SaveAsync(myObject, operationConfig);

This causes the same ddbClient exception seen when hosted with an identical stack trace.

The SaveAsync call utilises the object persistence model where the myObject reference uses the following type:

myObject (based on current production code, so no additional range key attributes)

[DynamoDBTable("productRefinements", LowerCamelCaseProperties = true)]
public class ProductRefinement
{
    [DynamoDBHashKey]
    public string UniqueIdentifier { get; set; }

    [DynamoDBProperty(typeof(DatetimeOffsetPropertyConverter))]
    public DateTimeOffset UniqueDate { get; set; }

    public string UniquePath { get; set; }

    [DynamoDBProperty(typeof(JObjectPropertyConverter))]
    public JObject PagingArgs { get; set; }

    public string SortKey { get; set; }

    public string UniqueUserId { get; set; }

    [DynamoDBProperty(typeof(JArrayPropertyConverter))]
    public JArray Filters { get; set; }
}
    

@peterrsongg
Contributor

Hello, reading through this issue right now. Thank you for providing reproduction steps above. We've also received a different customer report internally. I will post any updates I find here as well

@JamieKeeling
Author

JamieKeeling commented Feb 20, 2025

Thanks @peterrsongg - having posted this, I spotted that we've implemented the using() pattern within the execution path that completes successfully, after which a subsequent use of the IDynamoDBContext interface seems to fail.

using (_dynamoDbContext)
{

Out of curiosity I removed it and was no longer able to reproduce the issue, leading me to believe it's responsible in some way.

I'm assuming we're safe to let .NET manage these interfaces, but are you able to provide any guidance for the expected AWS usages at all?

@peterrsongg
Contributor

It is good to know that the using statement changes the behavior. I want to step through this in a debugger to see exactly what is happening before I say anything.

@normj
Member

normj commented Feb 20, 2025

@JamieKeeling You are adding the DynamoDBContext as a scoped service. If a new scope is not created after you wrap the DynamoDBContext in a using statement, that would cause the null reference exception. Even though the DynamoDBContext doesn't own the IAmazonDynamoDB client (because you pass it in), it nulls its reference to the client when disposed. So future requests will hit that null.
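The mechanism described in this comment can be sketched without the SDK. The following is a minimal simulation, assuming (as stated above) that disposing the context drops its reference to a client it does not own. FakeClient, DisposableContext and SharedInstanceDemo are hypothetical stand-ins invented for illustration, not SDK types, and the guard in Save only mimics the "Value cannot be null. (Parameter 'ddbClient')" failure seen in the stack traces.

```csharp
using System;

// Hypothetical stand-ins; these are NOT SDK types.
public sealed class FakeClient { }

public sealed class DisposableContext : IDisposable
{
    private FakeClient _client;

    public DisposableContext(FakeClient client) => _client = client;

    public string Save(string item)
    {
        // Mimics the guard that surfaces as
        // "Value cannot be null. (Parameter 'ddbClient')".
        if (_client == null) throw new ArgumentNullException("ddbClient");
        return "saved:" + item;
    }

    // Assumed behaviour: Dispose drops the client reference even though
    // the caller created the client and still owns it.
    public void Dispose() => _client = null;
}

public static class SharedInstanceDemo
{
    public static string Run()
    {
        // Plays the role of the instance the DI container hands to every caller.
        var shared = new DisposableContext(new FakeClient());

        using (shared)
        {
            shared.Save("first"); // the first request succeeds
        } // Dispose runs here, on the shared instance

        try
        {
            return shared.Save("second"); // a later request reuses the same instance
        }
        catch (ArgumentNullException ex)
        {
            return "failed:" + ex.ParamName;
        }
    }
}
```

In this sketch the first call succeeds and the second fails on the same instance, which matches the reported pattern of one request working and a subsequent one throwing until the process restarts.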

@JamieKeeling
Author

@normj It’s configured in DI as a singleton, how is the lifetime resolved as scoped?

@normj
Member

normj commented Feb 20, 2025

@JamieKeeling Sorry, I got mixed up with @sstere's comment, which showed it being added as a scoped service. If you are adding it as a singleton then it should never be in a using block. That kills the instance for any other use cases.

@peterrsongg
Contributor

I haven't been able to reproduce it locally, with the using statement or without the using statement. But does removing the using statement fix the issue for you?


6 participants