[NEW] OBSERVE command for enhanced observability in Valkey #1167
Comments
I like the concepts and directionality here. I think it would be profitable to split this into two subsections. One section would be to get more specific on the events that would feed into the observability framework. A second section would focus on the representation and processing of those events. I mention this because there's quite a bit of overlap in the functionality of the second section and any implementation of a timestream processing module. In other words, can we get both timestream processing of the observability event stream (part 1) and more generic timestream data processing capabilities in the same development effort, or conversely split the development effort into two parts that cooperate?
@allenss-amazon Thank you for the insightful feedback! We’re definitely on the same page about maintaining flexibility for future enhancements, particularly around timestream processing and event-based observability. For now, our approach is to keep the implementation streamlined and tailored to the current state of Valkey’s codebase. By initially focusing on passing command execution details directly as the input to our observability pipeline, we’ll establish a foundation that can then be adapted for timestream events in the future. If and when we introduce events, we could replace or extend the input system for our pipeline to accommodate event-based data, allowing for similar processing while offering additional data input types. Keeping a direct dependency between command execution and the observability implementation will help us maintain a simple architecture and deliver a more focused solution in the short term. I’ve also included a simplified diagram of this first-approach implementation to illustrate the flow we envision. I’d love to hear your thoughts on whether this approach makes sense. What do you think? Thanks again for the input—we really appreciate the foresight around extensibility here!
I think it's important to provide an architecture that's lightweight enough that it could be enabled nearly everywhere -- maybe even by default. That means that the data collection needs to be fast. I think it's important to get more specific about the kinds of data events you're going to rely on and how this data is generated.
Absolutely. Performance is one of the primary goals. My initial approach is to check whether any observe functionality needs to do something after a command executes. Currently, this creates an observeUnit struct:

```c
typedef struct observeUnit {
    int command_id;                  /* ID of the executed command */
    robj **argv;                     /* command arguments */
    size_t argv_len;                 /* number of arguments */
    size_t response_size_bytes;      /* size of the reply sent to the client */
    long long duration_microseconds; /* command execution time */
} observeUnit;
```

If the observe functionality is disabled or unconfigured, there’s no impact on command execution performance—it’s just a simple check of a boolean flag. If it’s enabled, I construct this unit on the stack and attempt to process it through the pipeline. (Note that the pipeline processing part hasn’t been implemented yet.)
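A minimal sketch of that fast-path gating, assuming hypothetical names (observeEnabled, observePostCommand, observeProcessUnit are illustrative, not existing Valkey APIs):

```c
#include <stddef.h>

extern int observeEnabled;                          /* toggled by OBSERVE START/STOP (hypothetical) */
extern void observeProcessUnit(observeUnit *unit);  /* pipeline entry point, not yet implemented */

void observePostCommand(int command_id, robj **argv, size_t argv_len,
                        size_t response_size_bytes,
                        long long duration_microseconds) {
    /* Fast path: when no pipeline is enabled, the only cost is this branch. */
    if (!observeEnabled) return;

    /* Slow path: build the unit on the stack and hand it to the pipeline. */
    observeUnit unit = {
        .command_id = command_id,
        .argv = argv,
        .argv_len = argv_len,
        .response_size_bytes = response_size_bytes,
        .duration_microseconds = duration_microseconds,
    };
    observeProcessUnit(&unit);
}
```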
At this early stage, the struct feels lightweight and execution should be fast, but I anticipate that 'data gathering' may need additional complexity. @allenss-amazon What do you think about my approach? Do you have any suggestions for enhancing 'data gathering' for observability?
I'm skeptical of the "one size fits all" interface style here. For example, using the end of call as your insertion point will miss all client commands that block. But that clearly can be fixed by tapping into the unblocking machinery also. This also misses all non-command related activity -- cluster bus, evictions, etc. Which I think could also be quite interesting. I'd propose that we implement a mechanism to self-monitor the existing metrics in the core. For example, having a periodic task to execute an "info" command and collect the various values into samples to feed into the machinery could also be quite valuable. The automatic info scheme has a low enough overhead (you adjust the frequency of collection to match your CPU wallet ;-)) that it could reasonably be left on in any production environment. It also creates an incentive to increase instrumentation in the core.
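A rough sketch of what such periodic INFO self-sampling could look like. Every name here is hypothetical (renderInfoText stands in for the core's INFO renderer, observeFeedSample for the pipeline's sample sink); Valkey's actual internal APIs differ:

```c
#include <stdlib.h>
#include <string.h>

extern char *renderInfoText(void);                 /* stand-in for the core's INFO renderer */
extern void observeFeedSample(const char *name, double value);

/* Called from a periodic timer every sample interval: render the INFO
 * text, parse "metric:value" lines, and emit one sample per metric. */
void observeInfoSamplerTick(void) {
    char *info = renderInfoText();
    for (char *line = strtok(info, "\r\n"); line; line = strtok(NULL, "\r\n")) {
        char *sep = strchr(line, ':');
        if (!sep) continue;                        /* skip section headers like "# Memory" */
        *sep = '\0';
        char *end;
        double value = strtod(sep + 1, &end);
        if (end == sep + 1) continue;              /* skip non-numeric fields */
        observeFeedSample(line, value);
    }
    free(info);
}
```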
The primary goal is to design observability pipelines with enough flexibility to support a variety of input sources. In my initial implementation, I focused on integrating the first input source as 'command executions.' However, with the right design, we should be able to expand this model to support additional sources or even rework the data push mechanism to accommodate a new system based on event streams. Here’s a diagram illustrating this approach: the concept is that multiple sources can feed into the Observe Units Processor, which will process them through the pipelines. Additionally, the Observe Units Processor should allow us to easily implement new input sources. (I may need some guidance on how best to structure this in the code.)
You’re right that the initial input source doesn’t cover several internal cluster activities (and blocking commands), but the design should allow us to extend the list of sources in future iterations. Structuring this properly may require some guidance, particularly to ensure compatibility with all potential Valkey input streams. Output from the INFO command is already accessible to clients, so I wonder—if we limited ourselves to just these results, would it be valuable enough to build out the entire observability pipeline? It’s feasible for users to set up a custom client to periodically fetch INFO data and compute time-series metrics. To bring real value to the observability pipeline, I aimed to start with something not readily accessible in the current Valkey feature set, while aligning with Google's 4 Golden Signals. Hence, I opted to implement this at the command level (though we could debate whether this approach is ideal). What do you think? Are there any other arguments for limiting ourselves to the INFO results besides simplicity and incentives?
@allenss-amazon said:
This is an excellent idea. @mwarzynski said:
@mwarzynski I like your approach to Allen's concept … IIRC, you want to split (a) the data collection (which can include filters for efficiency) and (b) the trigger from (c) what you term "Observe Units Processing". Then the "data collection" can come from (i) command post-processing like you're prototyping, or (ii) be triggered by another command's execution, including but not limited to the INFO command, or (iii) even timers for fully internal sampling. This frees us up to focus on multiple different parts of the implementation. We can discuss the programmable pipeline implementation while keeping the "data collection" part open for significant extension as more use cases emerge. @mwarzynski said:
I think @allenss-amazon's idea is not to limit us to INFO but rather to make INFO command execution able to feed the processing pipeline as a data source, in addition to what you have already been working on. I gather from Allen's comment that the flexibility to feed data (e.g. INFO or other commands) would motivate developers to feed more data into the observability pipeline and potentially have it "always on". Overall, I see the conceptual two stages, "collection" and "programmable processing", as a powerful combination. Also note that pulling detailed command results (e.g. from INFO) out to the client is very expensive, which might limit use cases. However, as Allen hints, having verbose output instead go into a server-side pipeline, especially with filters, could make it feasible to run "always on" observability. @allenss-amazon Please correct as appropriate … I don't want to go in a direction that you didn't intend. Overall, I think we have immediate use cases for memory sizing that require data collection from observePostCommand, so I'd appreciate it if we can start with that and add support for (x) INFO (among other commands) as input and (y) time/event/notification triggers as a second step. Mostly, I just want to lock down the initial scope so that @mwarzynski can start work on the programmable pipeline part, which is where we will discover how tricky this business is.
Yes, multiple sources of data feeding the analysis engine. I believe a time-series processing module is in Valkey's future (likely with fidelity to the Redis time series module) and that this observability proposal should use that module as its analysis engine rather than something unique. Thus the discussion could bifurcate into two threads: one about time-series processing, and this thread, which focuses on data collection mechanisms.

I proposed a source of data, which is a periodic self-sampling of the "INFO" metrics. An initial implementation of this could trivially be built by having a periodic timer recursively invoke the INFO command and parse the results. Long term, I envision this as driving a re-architecting of stat collection within the Valkey universe to avoid the serialize/deserialize overhead of this approach, gaining efficiency and therefore usability. This would also provide a degree of uniformity in format and semantics for info stats as well as a reflection mechanism (i.e., COMMAND GETKEYSANDFLAGS for info stats) that could drive more generic tools like a Grafana connector.

@mwarzynski proposed a source of data which is to tap into the command processing and invoke a Lua script with the command, its arguments, and execution time. This is simple, but expensive in that it's going to duplicate a lot of the work that the core already does for you. For example, rather than a single tap-in point for all commands, why not have a per-command tap-in point? I mean the ability to establish a separate Lua script to be invoked for each command. This would avoid needless Lua execution for commands that aren't of interest. Also, the per-command Lua scripts run faster because the command parsing is already completed.

With that thought in mind, there are other potential tap-in points. For example, leveraging the ACL infrastructure would allow you to tap into core code that validates read and write access for keys independent of the commands, again something that reduces redundant parsing overhead. I'm sure there will be more points that would prove profitable.

If we're in a world of multiple data sources and Lua scripts, then we should think about how those different Lua environments interact. Is there a single global Lua environment for all of OBSERVE or is there a need for multiple environments?
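One way to sketch the per-command tap-in dispatch, with all names hypothetical: keep a table mapping command name to an optional Lua script, so commands with no registered script pay only a flag check and a lookup:

```c
#include <stddef.h>

typedef struct robj robj;              /* opaque: the server's object type */
typedef struct luaScript luaScript;    /* opaque: a compiled Lua script handle */

extern int observeScriptCount;         /* number of registered per-command scripts */
extern luaScript *observeScriptForCommand(const char *cmd_name); /* NULL if none */
extern void observeRunScript(luaScript *script, robj **argv, size_t argc,
                             long long duration_us);

void observeCommandTap(const char *cmd_name, robj **argv, size_t argc,
                       long long duration_us) {
    if (observeScriptCount == 0) return;   /* fast path: nothing registered */
    luaScript *script = observeScriptForCommand(cmd_name);
    if (script == NULL) return;            /* no tap for this command */
    /* The command is already parsed, so the script gets structured args
     * without re-parsing overhead. */
    observeRunScript(script, argv, argc, duration_us);
}
```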
TL;DR: I propose improving observability in Valkey, e.g. with built-in RED (Rate, Errors, Duration) time-series metrics.
Overview
This proposal outlines a new OBSERVE command to improve Valkey’s observability capabilities. By enabling advanced time-series metrics, custom gathering pipelines, and in-server data aggregation, OBSERVE will equip Valkey users with first-class monitoring commands for granular insight into server behavior and performance.
Background
After discussions with Irfan Ahmad, an attendee at the '24 Valkey Summit, I developed this initial proposal to introduce native observability pipelines within Valkey. Currently, Valkey lacks comprehensive, customizable observability tools embedded directly within the server, and this proposal aims to fill that gap.
Note: This proposal is a work in progress. Feedback on the overall approach and any preliminary design concerns would be greatly appreciated.
Current Observability Limitations in Valkey
Currently, Valkey’s observability relies on commands like MONITOR, SLOWLOG, and INFO. While useful, these commands have limitations:
- MONITOR: Streams every command, generating high data volume that may overload production environments.
- SLOWLOG: Logs only commands exceeding a set execution time, omitting quick operations and general command patterns.
- INFO: Provides server statistics but lacks detailed command- and key-specific insights.
These commands lack the flexibility for in-depth, customizable observability exposed directly within the valkey-server instance, such as filtering specific events, sampling data, executing custom processing steps, or aggregating metrics over time windows.
Feature proposal
Problem statement and goals
The proposed OBSERVE command suite will bring observability as a core Valkey feature. Through user-defined “observability pipelines,” Valkey instances can produce detailed insights in a structured, efficient manner. These pipelines will be customizable to support diverse use cases, providing users with foundational building blocks for monitoring without overwhelming server resources. This new functionality could be enhanced by integration with tools like Prometheus and Grafana for visualization or alerting, though its primary purpose is fully customizable in-server analysis.
Proposed solution -- Commands
The OBSERVE command set introduces the concept of observability pipelines — user-defined workflows for collecting, filtering, aggregating, and storing metrics.
Core Commands
OBSERVE CREATE <pipeline_name> <configuration>
Creates an observability pipeline with a specified configuration. Configuration details, specified in the next section, define steps such as filtering, partitioning, sampling, and aggregation.
The pipeline and its configuration are persisted only in runtime memory (i.e. the user needs to re-create the pipeline after a server restart).
OBSERVE START <pipeline_name>
Starts data collection for the specified pipeline.
OBSERVE STOP <pipeline_name>
Stops data collection for the specified pipeline.
OBSERVE DELETE <pipeline_name>
Deletes the pipeline and its configuration.
OBSERVE RETRIEVE <pipeline_name>
Retrieves collected data. Alternatively, GET could potentially serve this function, but further design discussion is needed.
OBSERVE LOADSTEPF <step_name> <lua_code>
Allows defining custom processing steps in Lua, for cases where the built-in steps do not meet requirements.
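A hypothetical example of registering a custom step (the exact Lua API surface for steps is not yet designed; the function shape below is only illustrative). The step name matches the map_top_keys step used in the hot-key example later:

```
OBSERVE LOADSTEPF map_top_keys "
  -- Illustrative Lua: given a table of key -> count pairs,
  -- return the n entries with the highest counts.
  return function(counts, n)
    local items = {}
    for key, count in pairs(counts) do items[#items + 1] = {key, count} end
    table.sort(items, function(a, b) return a[2] > b[2] end)
    local top = {}
    for i = 1, math.min(n, #items) do top[i] = items[i] end
    return top
  end
"
```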
Pipeline configuration
Pipelines are configured as chains of data processing stages, including filtering, aggregation, and output buffering. The format is similar to Unix piping.
Key stages in this pipeline model include:
- filter(f): Filters events based on defined conditions (e.g., command type).
- partition(f): Partitions events according to a function (e.g., by key prefix).
- sample(f): Samples events at a specified rate.
- map(f): Transforms each event with a specified function.
- window(f): Aggregates data within defined time windows.
- reduce(f): Reduces data over a window via an aggregation function.
- output(f): Directs output to specified sinks.
Example configuration syntax:
Output
The goal is to capture time-series metrics within the defined pipeline outputs; e.g., for the pipeline above, the output would be structured as follows:
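One possible shape (illustrative only) is a ring buffer of timestamped, per-partition samples under the output key:

```
cmd_counts (buffer_size=1440):
  window_start=2024-11-01T12:00:00Z  { "user:": 312, "session:": 121, ... }
  window_start=2024-11-01T12:01:00Z  { "user:": 298, "session:": 140, ... }
  ...
```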
It remains uncertain whether storing output data in a format compatible with direct retrieval via GET (or another existing command) will be feasible. Consequently, we might need to introduce an OBSERVE RETRIEVE <since_offset> command for clients polling result data. This command would provide the items produced since the given offset, the current offset, and a lag_detected flag (see the sketch below). Here, offset represents the sequence number of items produced by the pipeline, including any items removed due to buffer constraints. This approach allows clients to poll for results while adjusting their polling frequency based on the lag_detected flag. If lag_detected is true, clients would be advised to increase polling frequency to reduce data loss.
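A sketch of what the reply might contain (field names and layout are hypothetical):

```
> OBSERVE RETRIEVE cmd_counts_by_prefix 1500
1) next_offset   -> 1523
2) lag_detected  -> false   (true if items were evicted before being read)
3) items         -> [ (offset=1501, window_start=..., value=...), ... ]
```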
Use-Case Examples
Below are examples of how the proposed OBSERVE command and pipeline configurations could be used to address various observability needs.
Counting Specific Commands Per Minute with Buffer Size
Use Case: Count the number of GET commands executed per minute.
Pipeline Creation:
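A hypothetical configuration in the illustrative syntax used above:

```
OBSERVE CREATE get_per_minute
  "filter(command == 'GET') | window(60s) | reduce(count) | output(timeseries('get_command_count', 1440))"
```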
Explanation: This pipeline filters for GET commands, counts them per minute, and stores the counts in a time-series key get_command_count with a buffer size of 1440 (e.g., one day's worth of minute-level data).
Hot Key Analysis
Use Case: Identify and monitor the most frequently accessed keys within a certain time window, allowing for proactive load management and identification of potential bottlenecks.
Pipeline Creation:
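An illustrative configuration (syntax and stage arguments are assumptions):

```
OBSERVE CREATE hot_keys_pipeline
  "filter(command == 'GET') | sample(0.005) | partition(key) | window(60s) | reduce(count) | map_top_keys(10) | output(timeseries('hot_keys', 60))"
```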
Explanation: This pipeline samples 0.5% of GET commands, partitions events by the accessed key, and aggregates their counts in one-minute intervals.
The map_top_keys(10) step then selects the top 10 most frequently accessed keys in each interval along with the access counts.
The result is stored as a time-series in hot_keys with a buffer size of 60, retaining one hour of hot key data.
Average Latency Per Time Window with Buffer
Use Case: Monitor average latency of SET commands per minute.
Pipeline Creation:
Explanation: This pipeline filters for SET commands, extracts their latency, aggregates the average latency every minute, and stores it with a buffer size of 720 (e.g., 12 hours of minute-level data).
Client Statistics
Use Case: Gather command counts per client for GET and SET commands, sampled at 5%.
Pipeline Creation:
Explanation: This pipeline filters for GET and SET commands, samples 5% of them, extracts client information, counts commands per client every minute, and stores the data under client_stats with a buffer size of 1440.
Error Tracking
Use Case: Monitor the number of errors occurring per minute.
Pipeline Creation:
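An illustrative configuration (the event_type predicate is a hypothetical name):

```
OBSERVE CREATE error_tracking
  "filter(event_type == 'error') | window(60s) | reduce(count) | output(timeseries('total_errors', 1440))"
```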
Explanation: This pipeline filters events of type 'error', counts them every minute, and stores the totals in total_errors with a buffer size of 1440.
TTL Analysis
Use Case: Analyze the average TTL of keys set with the SETEX command per minute.
Pipeline Creation:
Explanation: This pipeline filters for SETEX commands, extracts the TTL values, calculates the average TTL every minute, and stores it in average_ttl with a buffer size of 1440.
Distribution of Key and Value Sizes
Use Case: Create a histogram of value sizes for SET commands.
Pipeline Creation:
Explanation: This pipeline filters for SET commands, extracts the size of the values, aggregates them into histogram buckets every minute, and stores the distributions with a buffer size of 1440.
Feedback Request
Feedback is requested on the following points:
Does the OBSERVE command align with your vision for Valkey’s observability?
Let's first reach consensus on the feature scope. If the answer is yes, we can discuss the designs.
I am ready to commit to building this feature as soon as the designs are accepted, even in draft form.
Thank you for your time and consideration. I look forward to discussing this proposal further.