Concurrent fetch of azure metricdefinitions and batchApi usage #41790

Status: Open. Wants to merge 30 commits into base: main.

Commits:
- 5b9beae Use concurrency in metricsdefinition collection (MichaelKatsoulis, Nov 6, 2024)
- a103910 Fix conflicts (MichaelKatsoulis, Nov 7, 2024)
- 0486d0e Handle errors (MichaelKatsoulis, Nov 6, 2024)
- 1b8314e Remove commented code (MichaelKatsoulis, Nov 7, 2024)
- b1180db Change ResourceConfigurations.Metrics to a map (MichaelKatsoulis, Nov 13, 2024)
- 245a8e3 Use batch API (MichaelKatsoulis, Nov 19, 2024)
- 33a8e0f New queryResourceClient per location (MichaelKatsoulis, Nov 21, 2024)
- 121b69f Updates (MichaelKatsoulis, Nov 25, 2024)
- a25ce30 Fix error handling (MichaelKatsoulis, Nov 26, 2024)
- 976d38a Wait for 50 reource ids before fetching the metrics (MichaelKatsoulis, Nov 28, 2024)
- 9456997 Handle metric definitions update (MichaelKatsoulis, Nov 28, 2024)
- ed7c6f8 Fix error in storage accounts (MichaelKatsoulis, Dec 13, 2024)
- 550a83f Set timegrain if is equal to '' (MichaelKatsoulis, Jan 14, 2025)
- d1af82c remove comments (MichaelKatsoulis, Jan 14, 2025)
- 166f9a2 Set correct endtime (MichaelKatsoulis, Jan 14, 2025)
- e981204 Remove comments (MichaelKatsoulis, Jan 16, 2025)
- 9ae52a1 Use batch API as feature (MichaelKatsoulis, Jan 22, 2025)
- 2aea321 Use baseclient to tackle code duplication (MichaelKatsoulis, Jan 24, 2025)
- 1dbf0a7 Merge remote-tracking branch 'upstream/main' into concurrent-fetch-of… (MichaelKatsoulis, Jan 24, 2025)
- 11c3bcd notice txt (MichaelKatsoulis, Jan 24, 2025)
- 54d4c03 Comments and linting errors fix (MichaelKatsoulis, Jan 24, 2025)
- 7366106 Set top value (MichaelKatsoulis, Jan 27, 2025)
- 7188a40 Add unit tests for concurrent fetching of metric definitions (MichaelKatsoulis, Jan 27, 2025)
- eb1292c Merge branch 'main' into concurrent-fetch-of-azure-metricdefinitions (MichaelKatsoulis, Jan 27, 2025)
- e9b4c98 Add batch client unit tests (MichaelKatsoulis, Jan 28, 2025)
- 81da366 Merge branch 'main' into concurrent-fetch-of-azure-metricdefinitions (MichaelKatsoulis, Jan 28, 2025)
- 2c656d5 Remove logs and update asciidoc and changelog (MichaelKatsoulis, Jan 28, 2025)
- b191819 Add support of batch API for storage accounts (MichaelKatsoulis, Feb 10, 2025)
- dd5f3b3 Merge remote-tracking branch 'upstream/main' into concurrent-fetch-of… (MichaelKatsoulis, Feb 10, 2025)
- 64735ab Update docs and add unit tests form storage client (MichaelKatsoulis, Feb 11, 2025)
CHANGELOG.next.asciidoc (1 addition, 0 deletions)
@@ -454,6 +454,7 @@ https://github.com/elastic/beats/compare/v8.8.1\...main[Check the HEAD diff]
- Add `use_performance_counters` to collect CPU metrics using performance counters on Windows for `system/cpu` and `system/core` {pull}41965[41965]
- Add support of additional `collstats` metrics in mongodb module. {pull}42171[42171]
- Preserve queries for debugging when `merge_results: true` in SQL module {pull}42271[42271]
- Add `enable_batch_api` option to the azure monitor metricset to allow collecting metrics of multiple resources with the Azure batch API {pull}41790[41790]

*Metricbeat*
- Add benchmark module {pull}41801[41801]
NOTICE.txt (30 additions, 0 deletions)
@@ -1833,6 +1833,36 @@ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


--------------------------------------------------------------------------------
Dependency : github.com/Azure/azure-sdk-for-go/sdk/monitor/query/azmetrics
Version: v1.1.0
Licence type (autodetected): MIT
--------------------------------------------------------------------------------

Contents of probable licence file $GOMODCACHE/github.com/!azure/azure-sdk-for-go/sdk/monitor/query/azmetrics@v1.1.0/LICENSE.txt:

MIT License

Copyright (c) Microsoft Corporation. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE

--------------------------------------------------------------------------------
Dependency : github.com/elastic/azure-sdk-for-go/sdk/resourcemanager/consumption/armconsumption
Version: v1.1.0-elastic
go.mod (1 addition, 0 deletions)
@@ -157,6 +157,7 @@ require (
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.13.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.7.0
github.com/Azure/azure-sdk-for-go/sdk/messaging/azeventhubs v1.2.1
github.com/Azure/azure-sdk-for-go/sdk/monitor/query/azmetrics v1.1.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/consumption/armconsumption v1.1.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v4 v4.8.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/costmanagement/armcostmanagement v1.1.1
go.sum (2 additions, 0 deletions)
@@ -62,6 +62,8 @@ github.com/Azure/azure-sdk-for-go/sdk/internal v1.10.0 h1:ywEEhmNahHBihViHepv3xP
github.com/Azure/azure-sdk-for-go/sdk/internal v1.10.0/go.mod h1:iZDifYGJTIgIIkYRNWPENUnqx6bJ2xnSDFI2tjwZNuY=
github.com/Azure/azure-sdk-for-go/sdk/messaging/azeventhubs v1.2.1 h1:0f6XnzroY1yCQQwxGf/n/2xlaBF02Qhof2as99dGNsY=
github.com/Azure/azure-sdk-for-go/sdk/messaging/azeventhubs v1.2.1/go.mod h1:vMGz6NOUGJ9h5ONl2kkyaqq5E0g7s4CHNSrXN5fl8UY=
github.com/Azure/azure-sdk-for-go/sdk/monitor/query/azmetrics v1.1.0 h1:X/C/tY3dxwsuFnSNArmTWKr0O6P59SRY6VsUcIkefEw=
github.com/Azure/azure-sdk-for-go/sdk/monitor/query/azmetrics v1.1.0/go.mod h1:wCAGp7Xm35A5laB8z8yK9p/kU8OEBFuTvUm4eKCzr/M=
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v4 v4.8.0 h1:0nGmzwBv5ougvzfGPCO2ljFRHvun57KpNrVCMrlk0ns=
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerservice/armcontainerservice/v4 v4.8.0/go.mod h1:gYq8wyDgv6JLhGbAU6gg8amCPgQWRE+aCvrV2gyzdfs=
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/costmanagement/armcostmanagement v1.1.1 h1:ehSLdbLah6kk6HTVc6e/lrbmbz7MMbpNxkOd3OYlhB0=
metricbeat/docs/modules/azure.asciidoc (6 additions, 0 deletions)
@@ -109,6 +109,12 @@ https://management.azure.com/ for azure PublicCloud
https://management.usgovcloudapi.net/ for azure USGovernmentCloud
Users can also use this in case of a Hybrid Cloud model, where one may define their own audiences.

`enable_batch_api` ::
_boolean_
Optional, defaults to `false`. Set it to `true` when facing scalability issues. When enabled, the Azure batch API is used
to fetch metrics for multiple resources in a single API call.
Currently supported metricsets: monitor, container_registry, container_instance, container_service, compute_vm, compute_vm_scaleset, database_account.

[float]
== Metricsets

x-pack/metricbeat/module/azure/README.md (new file, 118 additions)
@@ -0,0 +1,118 @@
### Azure Monitor Walkthrough when EnableBatchApi is True

#### Initialization Phase

1. **InitResources Method**:

- **Validate Resources**: Checks if any resources are defined in the user configuration.

- **Check Refresh Interval**: If the refresh interval has not expired, it initializes the `MetricDefinitionsChan` and `ErrorChan` channels and sends the existing metric definitions, collected during previous fetches, through the channel.

- **Initialize WaitGroup**: Creates a `sync.WaitGroup` to track all goroutines for resource collection.

- **Retrieve Resource Definitions**: Iterates over user-configured resources and retrieves their definitions from Azure Monitor.

- **Check Resource Definitions**: If no resources are retrieved for a given resource configuration, an error is logged and processing continues with the next resource configuration.

- **Initialization of Channels**: `MetricDefinitionsChan` and `ErrorChan` are initialized once. `MetricDefinitionsChan` receives the metric definitions of all resources in the provided configuration; `ErrorChan` reports errors that occur while collecting metric definitions.

- **Map Resources to Client**: Maps the retrieved resources to the client's resource list.

- **Collect Metric Definitions**: For each resource, calls the provided mapping function (`mapMetrics`) to collect metric definitions. Refer to the **mapMetrics Function**.

- **Close Channels**: Once all goroutines complete, it closes the `MetricDefinitionsChan` and `ErrorChan` channels. This signals that the metric definitions of all resources in the configuration have been collected.

2. **mapMetrics Function**:

- **Start Goroutine**: Starts a new goroutine for each resource to collect its metric definitions.

- **Retrieve Metric Definitions**: Calls `getMappedResourceDefinitions` to retrieve and map metric definitions for each resource. Refer to the **getMappedResourceDefinitions Function**.

- **Check for Errors**: If `getMappedResourceDefinitions` returns an error, the error is sent to `ErrorChan`, which stops the data collection.

- **Send to Channel**: Sends the retrieved metric definitions to the `MetricDefinitionsChan` channel.
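A minimal sketch of the fan-out described in **InitResources** and **mapMetrics**: one goroutine per resource, a `sync.WaitGroup`, and a closer goroutine that closes both channels after `Wait` returns. `Resource`, `MetricDefinition`, `fetchDefinitions`, and `collectDefinitions` are hypothetical names, and the fetch function stands in for `getMappedResourceDefinitions`. Buffering both channels to `len(resources)` is a choice made for this sketch so that no worker can block forever if the consumer stops reading after the first error.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Resource and MetricDefinition are hypothetical, simplified types.
type Resource struct{ ID string }

type MetricDefinition struct {
	ResourceID string
	Names      []string
}

// fetchDefinitions is a hypothetical stand-in for getMappedResourceDefinitions.
type fetchDefinitions func(Resource) ([]MetricDefinition, error)

// collectDefinitions starts one goroutine per resource. Each goroutine sends
// either its definitions or its error, and a separate goroutine closes both
// channels once every worker has returned.
func collectDefinitions(resources []Resource, fetch fetchDefinitions) (<-chan []MetricDefinition, <-chan error) {
	// Buffered to len(resources) so no worker can block forever if the
	// consumer stops reading after the first error.
	defsCh := make(chan []MetricDefinition, len(resources))
	errCh := make(chan error, len(resources))

	var wg sync.WaitGroup
	for _, r := range resources {
		wg.Add(1)
		go func(r Resource) {
			defer wg.Done()
			defs, err := fetch(r)
			if err != nil {
				errCh <- err
				return
			}
			defsCh <- defs
		}(r)
	}

	go func() {
		wg.Wait() // every worker has sent its result
		close(defsCh)
		close(errCh)
	}()
	return defsCh, errCh
}

func main() {
	resources := []Resource{{ID: "vm-1"}, {ID: "vm-2"}, {ID: "broken"}}
	fetch := func(r Resource) ([]MetricDefinition, error) {
		if r.ID == "broken" {
			return nil, errors.New("no metric definitions for " + r.ID)
		}
		return []MetricDefinition{{ResourceID: r.ID, Names: []string{"Percentage CPU"}}}, nil
	}

	defsCh, errCh := collectDefinitions(resources, fetch)
	count := 0
	for defs := range defsCh {
		count += len(defs)
	}
	for err := range errCh {
		fmt.Println("error:", err)
	}
	fmt.Println("definitions received:", count)
}
```

Because the channels are closed only after every worker returns, the consumer can range over them; with unbuffered channels it would have to keep draining both until they close.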

3. **getMappedResourceDefinitions Function**:

- **Avoid Redundant Calls**: Uses a map to avoid calling the metric definitions function multiple times for the same namespace and resource.

- **Retrieve Metric Definitions**: Retrieves metric definitions from Azure Monitor for the specified resource.

- **Filter Supported Metrics**: Validates and filters the metric names and aggregations based on the supported metrics.

- **Map Dimensions**: Maps dimensions to the metrics as specified in the resource configuration.

- **Return Metrics**: Returns the list of mapped metrics.
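The memoization and filtering steps of **getMappedResourceDefinitions** can be sketched as follows. This is an illustration under assumptions: `definitionKey`, `AvailableMetric`, `definitionsFetcher`, `filterSupported`, and `contains` are hypothetical, the `lookup` function stands in for the Azure Monitor metric definitions call, and dimension mapping is omitted.

```go
package main

import "fmt"

// definitionKey identifies one metric-definitions request (hypothetical).
type definitionKey struct{ Namespace, ResourceID string }

// AvailableMetric is a simplified view of one metric definition: its name
// and the aggregation types it supports (hypothetical).
type AvailableMetric struct {
	Name         string
	Aggregations []string
}

// definitionsFetcher memoizes lookups so each namespace/resource pair is
// requested only once.
type definitionsFetcher struct {
	lookup func(definitionKey) []AvailableMetric
	seen   map[definitionKey][]AvailableMetric
	calls  int
}

func (f *definitionsFetcher) get(k definitionKey) []AvailableMetric {
	if defs, ok := f.seen[k]; ok {
		return defs
	}
	f.calls++
	defs := f.lookup(k)
	f.seen[k] = defs
	return defs
}

// filterSupported keeps requested metric names that exist in defs and, for
// each, only the requested aggregations that metric supports.
func filterSupported(defs []AvailableMetric, names, aggs []string) map[string][]string {
	out := make(map[string][]string)
	for _, d := range defs {
		if !contains(names, d.Name) {
			continue
		}
		for _, a := range aggs {
			if contains(d.Aggregations, a) {
				out[d.Name] = append(out[d.Name], a)
			}
		}
	}
	return out
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func main() {
	f := &definitionsFetcher{
		lookup: func(k definitionKey) []AvailableMetric {
			return []AvailableMetric{{Name: "Percentage CPU", Aggregations: []string{"Average", "Maximum"}}}
		},
		seen: make(map[definitionKey][]AvailableMetric),
	}
	k := definitionKey{"Microsoft.Compute/virtualMachines", "vm-1"}
	defs := f.get(k)
	f.get(k) // served from the memo map, no second lookup
	fmt.Println(filterSupported(defs, []string{"Percentage CPU", "Disk Read Bytes"}, []string{"Average", "Total"}), f.calls)
}
```

In this sketch a metric for which none of the requested aggregations is supported is dropped entirely.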


#### Data Collection and Processing Phase

4. **Fetch Method**:

- **Set Reference Time**: The `Fetch` method starts by setting the reference time for the current fetch operation. This is used to calculate time intervals for metrics collection.

- **Initialize Resources**: It calls the `InitResources` method to collect and validate resources based on user configuration. Refer to the Initialization Phase.

- **Check Channel Initialization**: If the `MetricDefinitionsChan` channel is `nil`, it returns an error indicating that no resources were found for the given configuration.

- **Create Metric Stores**: Initializes a map of `MetricStore` values that accumulate metrics grouped by the same criteria (`ResDefGroupingCriteria`). Grouping is required because only metrics that share these criteria can be fetched in a single batch request.

- **Process Metrics from Channel**: Enters a loop to process metric definitions as they are sent through the `MetricDefinitionsChan` channel.

- **Update Metric Definitions**: Updates the stored `MetricDefinitions` only if they have expired. These stored definitions are used by the **Check Refresh Interval** step: on a later fetch, as long as they have not expired, they are reused instead of making redundant API calls.

- **Group and Store Metrics**: Calls `GroupAndStoreMetrics` to group metrics and store them in `MetricStore`. Refer to the **GroupAndStoreMetrics Method**.

- **Process Stores**: When a store reaches the batch API limit, it is processed with the `processStore` function, which collects its metric values. This way, each batch API call carries as many resources as the API allows. Refer to the **processStore and processAllStores Functions**.

- **Map and Publish Events**: Maps the collected metric values into events and publishes them using the `mapToEvents` method.

- **Error Handling**:

  - **MetricDefinitionsChan is Closed**: When `MetricDefinitionsChan` is closed, it processes all remaining metric stores using the `processAllStores` function and publishes the final set of events. `MetricDefinitionsChan` is closed once all goroutines have collected their metric definitions. At that point, stores that have not reached the batch API limit are processed, collecting all remaining metric values.

  - **Error received in ErrorChan**: Listens to `ErrorChan`. If an error occurs during the **Check for Errors** step of metric definitions collection, data collection stops.

- **Terminate Loop**: Breaks the loop when both the data and error channels are closed.

- **Final Processing**: Processes all remaining metric stores using the `processAllStores` function and publishes the final set of events. This is a safeguard in case the **MetricDefinitionsChan is Closed** step is not triggered, and it may be redundant.

5. **GroupAndStoreMetrics Method**:

- **Group Metrics**: Groups metrics by the following criteria: namespace, subscription ID, location, metric names, aggregations, time grain, and dimensions. The batch API can fetch metrics for multiple resources in a single call only if all of them share these criteria.

- **Check Update Requirement**: Checks if the metric needs to be collected again based on the time grain and the last collection time.

- **Store Metrics**: Adds the metrics to the appropriate `MetricStore`.

6. **processStore and processAllStores Functions**:

- **Collect Metric Values**: Collects metric values for the metrics stored in the `MetricStore` using the batch API.

- **Clear Store**: Clears the metrics from the store after collecting the values. This ensures that metric values for the same resources are not collected again in the same collection period.

- **Process All Stores**: Iterates over all metric stores and collects metric values for each, using the batch API.
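The collect-then-clear behaviour of **processStore** and the iteration in **processAllStores** can be sketched like this. `store`, `collectFunc`, and both function bodies are hypothetical simplifications: a store is reduced to a list of resource IDs, `collect` stands in for `GetMetricsInBatch`, and both clearing the store on error and stopping at the first error are assumptions of this sketch.

```go
package main

import "fmt"

// store is a hypothetical stand-in for MetricStore: the resource IDs that
// share one set of grouping criteria.
type store struct{ ids []string }

// collectFunc is a hypothetical stand-in for GetMetricsInBatch; it returns
// one value per resource ID.
type collectFunc func(ids []string) ([]float64, error)

// processStore collects values for every resource in s and then clears s,
// so the same resources are not queried again in this collection period.
// Clearing on error as well is an assumption of this sketch.
func processStore(s *store, collect collectFunc) ([]float64, error) {
	if len(s.ids) == 0 {
		return nil, nil
	}
	values, err := collect(s.ids)
	s.ids = nil
	return values, err
}

// processAllStores drains every store, stopping at the first error.
func processAllStores(stores map[string]*store, collect collectFunc) ([]float64, error) {
	var all []float64
	for _, s := range stores {
		values, err := processStore(s, collect)
		if err != nil {
			return all, err
		}
		all = append(all, values...)
	}
	return all, nil
}

func main() {
	stores := map[string]*store{
		"westeurope": {ids: []string{"vm-1", "vm-2"}},
		"eastus":     {ids: []string{"vm-3"}},
	}
	calls := 0
	collect := func(ids []string) ([]float64, error) {
		calls++
		return make([]float64, len(ids)), nil
	}
	values, err := processAllStores(stores, collect)
	fmt.Println(len(values), err, calls)
	// A second pass finds only empty stores and makes no further calls.
	_, _ = processAllStores(stores, collect)
	fmt.Println(calls)
}
```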

7. **GetMetricsInBatch Method**:

- **Prepare Batch Request**: Prepares a batch request for the metrics grouped by the same criteria.

- **Set Time Interval**: Sets the time interval for the metrics collection.

- **Add Filter Conditions**: Adds filter conditions for the metrics based on their dimensions.

- **Make API Call**: Makes a batch API call to Azure Monitor to retrieve the metric values.

- **Method Details**:

  - **Client Creation**: Creates a new QueryResources client, setting up the endpoint, credentials, and options. A separate client is required for each location, because the batch API endpoint is regional.

  - **Query Options**: Sets up the query options, including time grain, filter, start time, end time, and `top` (limit).

  - **Query Execution**: Calls the QueryResources method of the Azure Monitor service client, passing the resource IDs and query options.

  - **Batch Processing**: Processes resource IDs in batches of `BatchApiResourcesLimit` (50, the maximum number of resource IDs the batch API accepts in one call).

  - **Handle Response**: Appends the metric data from the response to the result list.

- **Process Response**: Processes the API response, updates the metric registry, and appends the collected values to the metric definitions.
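Two details of **GetMetricsInBatch** lend themselves to a short sketch: splitting resource IDs into batches of at most `BatchApiResourcesLimit`, and building one endpoint per location. The helper names `chunk` and `regionalEndpoint` are hypothetical, and the `https://<location>.metrics.monitor.azure.com` host pattern is an assumption based on the public-cloud Azure Monitor metrics data-plane endpoint (other clouds use different domains). The actual `QueryResources` call is left as a comment rather than guessing at SDK signatures.

```go
package main

import "fmt"

// batchAPIResourcesLimit mirrors BatchApiResourcesLimit from the text.
const batchAPIResourcesLimit = 50

// regionalEndpoint builds the data-plane endpoint for one location.
// The host pattern is an assumption for the Azure public cloud.
func regionalEndpoint(location string) string {
	return fmt.Sprintf("https://%s.metrics.monitor.azure.com", location)
}

// chunk splits ids into consecutive batches of at most size elements.
func chunk(ids []string, size int) [][]string {
	if size <= 0 {
		return nil // guard against an infinite loop
	}
	var batches [][]string
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		batches = append(batches, ids[start:end])
	}
	return batches
}

func main() {
	ids := make([]string, 120)
	for i := range ids {
		ids[i] = fmt.Sprintf("/subscriptions/sub-1/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/vm-%d", i)
	}
	endpoint := regionalEndpoint("westeurope")
	for _, batch := range chunk(ids, batchAPIResourcesLimit) {
		// One QueryResources call per batch would go here, made with a
		// client created for this location's endpoint.
		fmt.Println(endpoint, len(batch))
	}
}
```

With 120 resource IDs this yields batches of 50, 50, and 20, so three batch calls replace 120 per-resource calls.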
x-pack/metricbeat/module/azure/_meta/docs.asciidoc (6 additions, 0 deletions)
@@ -97,6 +97,12 @@ https://management.azure.com/ for azure PublicCloud
https://management.usgovcloudapi.net/ for azure USGovernmentCloud
Users can also use this in case of a Hybrid Cloud model, where one may define their own audiences.

`enable_batch_api` ::
_boolean_
Optional, defaults to `false`. Set it to `true` when facing scalability issues. When enabled, the Azure batch API is used
to fetch metrics for multiple resources in a single API call.
Currently supported metricsets: monitor, container_registry, container_instance, container_service, compute_vm, compute_vm_scaleset, database_account.

[float]
== Metricsets
