docs: create new Node.js performance docs section and clean up upgrading docs (#6466)

* docs: create new Node.js performance docs section and clean up upgrading docs

* docs: create lambda -> Node.js performance doc cross-link
kuhe committed Sep 12, 2024
1 parent 6a539fe commit 059223d
Showing 6 changed files with 264 additions and 14 deletions.
31 changes: 24 additions & 7 deletions UPGRADING.md
@@ -54,12 +54,13 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa
configure them by supplying a new `requestHandler`. Here's the example of setting http options in Node.js runtime. You
can find more in [v3 reference for NodeHttpHandler](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-smithy-node-http-handler/).

All v3 requests use HTTPS by default. You can provide a custom agent via the `httpsAgent`
field of the `NodeHttpHandler` constructor input.

```javascript
const { Agent } = require("https");
const { NodeHttpHandler } = require("@smithy/node-http-handler");

const dynamodbClient = new DynamoDBClient({
requestHandler: new NodeHttpHandler({
httpsAgent: new Agent({
@@ -71,19 +72,19 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa
});
```

If you are using a custom endpoint that uses `http`, you can provide an `httpAgent`.

```javascript
const { Agent } = require("http");
const { NodeHttpHandler } = require("@smithy/node-http-handler");
const dynamodbClient = new DynamoDBClient({
requestHandler: new NodeHttpHandler({
httpAgent: new Agent({
/*params*/
}),
}),
endpoint: "http://example.com",
});
```

@@ -92,6 +93,7 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa

```javascript
const { FetchHttpHandler } = require("@smithy/fetch-http-handler");
const dynamodbClient = new DynamoDBClient({
requestHandler: new FetchHttpHandler({
requestTimeout: /*number in milliseconds*/
```
@@ -121,14 +123,16 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa
- **v3**: **Deprecated**. Requests are _always_ asynchronous.
- `xhrWithCredentials`
- **v2**: Sets the "withCredentials" property of an XMLHttpRequest object.
- **v3**: The `fetch` equivalent field `credentials` can be set via the constructor
configuration of the `requestHandler` when using the browser
default `FetchHttpHandler`.

- [`logger`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#logger-property)
- **v2**: An object that responds to .write() (like a stream) or .log() (like the console object) in order to log information about requests.
- **v3**: No change. More granular logs are available in v3.
- [`maxRedirects`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#maxRedirects-property)
- **v2**: The maximum amount of redirects to follow for a service request.
- **v3**: **Deprecated**. SDK _does not_ follow redirects to avoid unintentional cross-region requests. S3 region redirects can be enabled separately with `followRegionRedirects=true` in the S3 Client only.
- [`maxRetries`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Config.html#maxRetries-property)
- **v2**: The maximum amount of retries to perform for a service request.
- **v3**: Changed to `maxAttempts`. See more in [v3 reference for RetryInputConfig](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-smithy-middleware-retry/#maxattempts).
@@ -179,6 +183,19 @@ This list is indexed by [v2 config parameters](https://docs.aws.amazon.com/AWSJa
- **v2**: Whether to use the Accelerate endpoint with the S3 service.
- **v3**: No change.
## Error handling
Top-level fields such as `error.code` and HTTP response metadata like the
status code have moved within the thrown error object
to subfields such as `error.$metadata` or `error.$response`.
This is because v3 more accurately follows the service models and avoids
adding metadata at the top level of the error object, which may conflict
with the structural error shape modeled by the services.
See how error handling has changed in v3
here: [ERROR_HANDLING](./supplemental-docs/ERROR_HANDLING.md).
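
As a rough illustration of the relocation described above, the following sketch reads the HTTP status code from `error.$metadata` and the error identifier from `error.name`. The `V3ErrorLike` interface and the `isV3ErrorLike` and `describeError` helpers are hypothetical names introduced for this example, and `sample` is a hand-written stand-in rather than the output of a real request. The linked ERROR_HANDLING document remains the authoritative description of the error shape.

```ts
// example: reading error details from v3-style subfields.
// Hypothetical minimal shape of a v3 service error, for illustration only.
interface V3ErrorLike {
  name: string;
  message: string;
  $metadata?: { httpStatusCode?: number; requestId?: string };
}

// Type guard: true when the thrown value looks like a v3 service error.
function isV3ErrorLike(e: unknown): e is V3ErrorLike {
  return typeof e === "object" && e !== null && "name" in e && "$metadata" in e;
}

// Summarizes a caught value without relying on v2-style top-level fields.
function describeError(e: unknown): string {
  if (isV3ErrorLike(e)) {
    const status = e.$metadata?.httpStatusCode ?? "unknown";
    return `${e.name} (HTTP ${status}): ${e.message}`;
  }
  return `non-service error: ${String(e)}`;
}

// Stand-in for an error thrown by client.send(); not produced by a real call.
const sample: V3ErrorLike = {
  name: "NoSuchKey",
  message: "The specified key does not exist.",
  $metadata: { httpStatusCode: 404, requestId: "EXAMPLE" },
};

console.log(describeError(sample));
```

In application code, `describeError` would typically be called from a `catch` block around `client.send(...)`.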
## Credential Providers
In v2, the SDK provides a list of credential providers to choose from, as well as a credentials provider chain,
@@ -348,7 +365,7 @@ variable.
### File System Credentials
- **v2**: [`FileSystemCredentials`](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/FileSystemCredentials.html)
represents credentials from a JSON file on disk.
- **v3**: **Deprecated**. You can explicitly read the JSON file and supply to the client. Please open a
[feature request](https://github.com/aws/aws-sdk-js-v3/issues/new?assignees=&labels=feature-request&template=---feature-request.md&title=)
21 changes: 14 additions & 7 deletions supplemental-docs/AWS_LAMBDA.md
@@ -2,9 +2,9 @@

## AWS Lambda provided AWS SDK

Several AWS Lambda runtimes, including those for Node.js, include the AWS SDK at various versions.

The SDK is provided as a convenience for development. For greater control of the SDK version and its runtime characteristics such as
JavaScript bundling, upload your selection of the AWS SDK as part of your function code.

To check the version of the SDK that is installed, you can log the package.json metadata of a package that you are using.
@@ -16,13 +16,13 @@ const pkgJson = require("@aws-sdk/client-s3/package.json");
```js
exports.handler = function (event) {
console.log(pkgJson);
return JSON.stringify(pkgJson);
};
```

## Best practices for initializing AWS SDK Clients in AWS Lambda

Suppose that you have an `async` function called, for example `prepare`, that you need to initialize only once.
You do not want to execute it for every function invocation.

```js
// Example: one-time initialization in the handler code path.
@@ -51,7 +51,7 @@ export async function handler(event) {
}
```

There is a potential complication with this style. This is a peculiarity of AWS Lambda's cold/warm states and provisioned concurrency.
If you make network requests in the `prepare()` function, they may be frozen pre-flight as part of early provisioning. In a certain
edge case, time-sensitive signed requests may become invalid due to the delay between provisioning and execution.

@@ -65,7 +65,8 @@ let ready = false;
```js

export async function handler(event) {
if (!ready) {
await prepare();
ready = true;
}
// ...
}
@@ -94,3 +95,9 @@ export async function handler(event) {
});
}
```

## Parallel request workloads with the AWS SDK on AWS Lambda

See also the section about parallel workloads in Node.js, which is
applicable to AWS Lambda:
[Performance/Parallel Workloads in Node.js](./performance/parallel-workloads-node-js.md).
39 changes: 39 additions & 0 deletions supplemental-docs/CLIENTS.md
@@ -533,6 +533,45 @@ client.middlewareStack.add(
```ts
await client.listBuckets({});
```
### Middleware Caching `cacheMiddleware`
> Available only in [v3.649.0](https://github.com/aws/aws-sdk-js-v3/releases/tag/v3.649.0) and later.
By default (`cacheMiddleware=false`), the middleware function stack is resolved on every request,
because the user may modify the middleware stack by adding middleware to the
`client` or `command` instances at any time.
By contrast, when `cacheMiddleware=true`, the creation of the middleware function stack
is cached on a per-client, per-command-class basis.
In the following example, the S3 HeadObject Command is called 10 times, but
its middleware function stack is only created once, instead of once per request.
```ts
// example: middleware caching
import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";

const client = new S3Client({ cacheMiddleware: true });

for (let i = 0; i < 10; ++i) {
await client.send(
new HeadObjectCommand({
Bucket: "...",
Key: String(i),
})
);
}
```
This caches the combination of `S3Client+HeadObjectCommand`'s resolved
`middlewareStack` upon the first request. This has two key effects:
- request creation time is reduced by (up to) a few milliseconds per request
- modifying the middleware stack after requests have begun will have no effect.
**Only enable this feature if you need the marginal increase to
request performance, and are aware of its side-effects.**
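
The second effect can be surprising, so here is a toy model of it. `ToyStack`, `demo`, and the string-based handlers are hypothetical and greatly simplified. They are not the SDK's middleware types and only mirror the caching behavior described above.

```ts
// example: toy model of middleware caching (NOT the SDK implementation).
type Handler = (input: string) => string;
type Middleware = (next: Handler) => Handler;

class ToyStack {
  private middleware: Middleware[] = [];
  private cached: Handler | undefined;
  private cacheMiddleware: boolean;
  resolveCount = 0;

  constructor(cacheMiddleware: boolean) {
    this.cacheMiddleware = cacheMiddleware;
  }

  add(m: Middleware): void {
    this.middleware.push(m);
  }

  // Composes all registered middleware around a terminal handler.
  private resolve(): Handler {
    this.resolveCount++;
    const terminal: Handler = (input) => input;
    return this.middleware.reduceRight<Handler>((next, m) => m(next), terminal);
  }

  send(input: string): string {
    if (!this.cacheMiddleware) {
      // Uncached: the stack is resolved again for every request.
      return this.resolve()(input);
    }
    if (!this.cached) {
      // Cached: the stack is resolved once, on the first request.
      this.cached = this.resolve();
    }
    return this.cached(input);
  }
}

const upper: Middleware = (next) => (input) => next(input.toUpperCase());
const exclaim: Middleware = (next) => (input) => next(input) + "!";

function demo(cache: boolean): { first: string; second: string; resolves: number } {
  const stack = new ToyStack(cache);
  stack.add(upper);
  const first = stack.send("a");
  stack.add(exclaim); // added after the first request
  const second = stack.send("b");
  return { first, second, resolves: stack.resolveCount };
}

console.log(demo(false), demo(true));
```

In this model, the late `exclaim` middleware affects the second request only when caching is off. With caching on, the stack is resolved once and the later addition is ignored.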
### Dual-stack `useDualstackEndpoint`
This is a simple `boolean` setting that is present in most SDK Clients.
5 changes: 5 additions & 0 deletions supplemental-docs/README.md
@@ -14,6 +14,11 @@ Upgrading from AWS SDK for JavaScript (v2) (https://github.com/aws/aws-sdk-js).

Best practices for working within AWS Lambda using the AWS SDK for JavaScript (v3).

#### [Performance](./performance/README.md)

Details what steps the AWS SDK team has taken to optimize performance of the SDK,
and includes tips for configuring the SDK to run efficiently.

#### [TypeScript](./TYPESCRIPT.md)

TypeScript tips & FAQ related to this project.
1 change: 1 addition & 0 deletions supplemental-docs/performance/README.md
@@ -14,3 +14,4 @@ Topics:
- [Bundle Sizes](./bundle-sizes.md)
- [Dynamic Imports](./dynamic-imports.md)
- [Dependency File Count Reduction](./dependency-file-count-reduction.md)
- [Parallel workloads in Node.js](./parallel-workloads-node-js.md)
181 changes: 181 additions & 0 deletions supplemental-docs/performance/parallel-workloads-node-js.md
@@ -0,0 +1,181 @@
# Performance > Parallel workloads in Node.js

Other sections such as bundle sizing, dependency count, and dynamic imports
cover aspects of performance related to the initial startup of your application.

This section focuses on post-startup performance of request throughput. Specifically,
we cover performance configuration of the AWS SDK for JavaScript (v3)
in Node.js using HTTP/1.1 and the `node:https` module via the SDK's requestHandler
dependency, `@smithy/node-http-handler`.

## What is a parallel workload?

A parallel workload is any workload in which you start more than one request
before the first request has completed.

In single-threaded JavaScript, this is accomplished via the asynchronicity of `Promise`s.
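
To make the definition concrete, here is a minimal sketch contrasting sequential and parallel dispatch. `createFakeClient` is a hypothetical stand-in for an SDK client. It only waits on a timer, so the example runs without network access or credentials.

```ts
// example: sequential vs. parallel dispatch with a hypothetical fake client.
// createFakeClient is NOT an SDK API; it simulates latency with a timer.
function createFakeClient(latencyMs = 10) {
  let inFlight = 0;
  let peak = 0;
  return {
    request(id: number): Promise<number> {
      inFlight++;
      peak = Math.max(peak, inFlight);
      return new Promise<number>((resolve) => {
        setTimeout(() => {
          inFlight--;
          resolve(id);
        }, latencyMs);
      });
    },
    // Highest number of requests that were pending at the same time.
    peakInFlight(): number {
      return peak;
    },
  };
}

type FakeClient = ReturnType<typeof createFakeClient>;

// Not parallel: each request starts only after the previous one completes.
async function sequential(client: FakeClient, ids: number[]): Promise<number[]> {
  const results: number[] = [];
  for (const id of ids) {
    results.push(await client.request(id));
  }
  return results;
}

// Parallel: every request is started before any of them has completed.
async function parallel(client: FakeClient, ids: number[]): Promise<number[]> {
  return Promise.all(ids.map((id) => client.request(id)));
}
```

With three ids, `sequential` never has more than one request pending, while `parallel` has all three pending at once. Both return results in input order.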

## Configuration options related to throughput

Here is an example containing SDK Client configuration options that have
an effect on request throughput.

```ts
// example: configuring an SDK client for throughput.
import { S3 } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@smithy/node-http-handler";
import { Agent } from "node:https";

const s3 = new S3({
/**
* Default is false. Setting this to true caches
* middleware resolution and prevents modifications
* to the middlewareStack from taking effect.
*
* Use only if you are not adding custom middleware.
*/
cacheMiddleware: true,
requestHandler: new NodeHttpHandler({
httpsAgent: new Agent({
/**
* Default is true. This should generally be left as true,
* unless you have a very specific use case that
* needs the alternative.
*/
keepAlive: true,
/**
* See expanded note below about sockets.
* You should use a number that is the size
* of your parallel workload batch.
*/
maxSockets: 50,
}),
}),
});

// shorthand syntax available since v3.521.0
const client = new S3({
requestHandler: {
requestTimeout: 3_000,
httpsAgent: { maxSockets: 50 },
},
});
```

## Client instances

In this SDK, much functionality is cached for performance reasons, but
the cache is usually associated with the client instance. In particular,
the following are cached on the client instance:

- credentials fetched by async function calls
- if your client is configured to source credentials from a provider that includes
a network request and/or file-system read, this work is done once per client until
expiration of the credentials. If you instantiate a new client for every request,
this will slow things down substantially.
- middleware function stack when `cacheMiddleware=true`
- `node:https` Agent and its socket pool

If you do need multiple instances of an SDK client, but don't want to
have separate credentials and socket pools, you can share
credentials and requestHandlers between clients.

```ts
// example: credential and socket pool sharing from primary client.
import { S3 } from "@aws-sdk/client-s3";

const s3_east = new S3({ region: "us-east-1" });

const { credentials, requestHandler } = s3_east.config;

const s3_west = new S3({
region: "us-west-2",
credentials,
requestHandler,
});
```

```ts
// example: credential and socket pool sharing from user instantiated objects.
import { S3 } from "@aws-sdk/client-s3";
import { fromNodeProviderChain } from "@aws-sdk/credential-providers";
import { NodeHttpHandler } from "@smithy/node-http-handler";

const credentials = fromNodeProviderChain();
const requestHandler = new NodeHttpHandler({
httpsAgent: {
maxSockets: 100,
},
});

const s3_east = new S3({ region: "us-east-1", credentials, requestHandler });
const s3_west = new S3({ region: "us-west-2", credentials, requestHandler });
```

## Node.js Sockets

The `node:https` Agent class manages sockets on your behalf. The most impactful configuration you can make for parallel workloads is to set
the value of `maxSockets`.

Configuring the `maxSockets` value for the SDK's requestHandler should
be based on the parallelism or parallel workload batch size of your application
and usage scenario.

- Configuring too few sockets leads to a slowdown as this is equivalent to
setting a lower cap on the parallel workload batch size.
- Configuring too many sockets can _also_ slow down your application. This is
because the application may open a new socket, which takes some CPU time, when
an existing socket was about to become free for reuse.
- Configuring too many sockets can cause you to hit the file descriptor limit of the
operating system. This can manifest as `Error: EMFILE, too many open files`
in Node.js.
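
The first point in the list above can be illustrated with a simplified model. `CappedPool` and `simulate` are hypothetical names written for this sketch. This is not how the `node:https` Agent is implemented, but it mimics the observable behavior that requests beyond the cap wait for a free slot.

```ts
// example: simplified model of a socket cap (NOT the node:https Agent).
class CappedPool {
  private active = 0;
  private waiting: Array<() => void> = [];
  private maxSockets: number;
  peakActive = 0;

  constructor(maxSockets: number) {
    this.maxSockets = maxSockets;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxSockets) {
      // No free slot: wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    this.peakActive = Math.max(this.peakActive, this.active);
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // hand the slot directly to the next waiter
      } else {
        this.active--; // nobody is waiting, so free the slot
      }
    }
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Starts `batchSize` tasks at once and reports how many actually ran concurrently.
async function simulate(batchSize: number, maxSockets: number): Promise<number> {
  const pool = new CappedPool(maxSockets);
  await Promise.all(Array.from({ length: batchSize }, () => pool.run(() => sleep(5))));
  return pool.peakActive;
}
```

In this model, a batch of 10 with a cap of 3 never has more than 3 tasks active, so the cap rather than the batch size determines the effective parallelism.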

## Example Scenario

You have 10,000 files to upload to S3.

- Uploading one at a time is too slow.
- Uploading all at once risks crashing your application process, or
being throttled by the server.

#### Recommendation

Test your application to determine the right level of parallel request traffic.
After that, configure the `maxSockets` value to be equal to the batch size, or
a factor of it.

```ts
// example: workload of 10,000 files, batch size of 100.
import { S3 } from "@aws-sdk/client-s3";

const files = [
/*... */
];
const BATCH_SIZE = 100;

const s3 = new S3({
requestHandler: {
httpsAgent: { maxSockets: 100 },
},
});

const promises = [];
// splice (not slice) removes each batch from `files`, so the loop terminates.
while (files.length) {
promises.push(
...files.splice(0, BATCH_SIZE).map((file) => {
return s3.putObject({
Bucket: "...",
Key: file.name,
Body: file.contents,
});
})
);
await Promise.all(promises);
promises.length = 0;
}
```

In this example we've adhered to the best practices mentioned in this section:

- use one client instance for repeated requests
- set a `maxSockets` value that is a factor of the batch size
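
The batch loop above waits for the slowest request of each batch before starting the next batch. One alternative, which is an assumption of this sketch rather than a recommendation from the SDK documentation, is to keep a fixed number of requests in flight at all times. `mapWithConcurrency` is a hypothetical helper, and its `fn` callback stands in for a call such as `s3.putObject`.

```ts
// example: keep at most `limit` requests in flight (hypothetical helper).
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let nextIndex = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Under this approach, `limit` plays the role of the batch size, so the `maxSockets` guidance from this section would apply to it in the same way. If `fn` rejects, the returned promise rejects, but requests that are already in flight are not cancelled.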
