Add specification for telemetry #17

adklempner · 2024-05-24T01:14:10Z

No description provided.

weboko · 2024-05-24T08:47:21Z

standards/application/telemetry.md

@@ -0,0 +1,39 @@
+---


I think we need metrics section. There can be 2 types of them: general and implementation specific.

As a basic I think following should be specified:

timestamps of receiving and sending message;

size of message packets;

content topics / pubsub topics of the messages;

code logs of errors and warnings;

peer ID or sender and receiver;

message hashes;

In case we collect multi addresses we can derive IP and try to see correlation between geographical presence and availability. This approach is questionable and unless we need it we should drop it.

Relay metrics needed:

mesh state: peers discovered, connections established;

any errors with peers;

This is a follow-up from today's Bug Bash meeting.

What else can we add, @jm-clius @chaitanyaprem ?

weboko · 2024-05-24T08:52:49Z

standards/application/telemetry.md

+
+#### Group Chats
+
+#### Contact Requests


We should understand what is the center of our interest for this RFC. It seems Waku and not Status (there should be one specific for them).

Status as an app will have to collect protocol level telemetry and on top of it application level.
Ideally for collecting Waku - they shouldn't care and we should provide some framework. I might be:

a way to get data from within a node;

or turn on or off Telemetry service embedded into running Waku node (makes sense if we start running our own nodes with embedded telemetry to measure network);

weboko · 2024-05-24T08:58:31Z

standards/application/telemetry.md

@@ -0,0 +1,39 @@
+---
+title: TELEMETRY
+name: Waku v2 Telemetry


I'd propose to add Metrics usage section that align with our building in public principle.
In that section we should explain what knowledge we are aiming to get from the collected data and re-assure a reader in us not having any way of personalizing collected data etc. This might come in handy in case we'd need to communicate is as a Privacy policy.

I think beyond the metruc usage, we need to state what we are trying to learn so we can justify the work to ourselves. Not really any point collecting random data if you we don't know how we are going to analyse it.

Ivansete-status · 2024-05-27T13:11:27Z

standards/application/telemetry.md

@@ -0,0 +1,39 @@
+---
+title: TELEMETRY
+name: Waku v2 Telemetry


I think we should stop using the v2 term

Suggested change

name: Waku v2 Telemetry

name: Waku Telemetry

That's more of a meta concern cc @jm-clius

Indeed. We're certainly not consistent here yet, but:

the name of a new spec should no longer include "v2" (just call it "Waku Telemetry")

the title of specs should preferably be prefixed with WAKU2- (i.e. I propose WAKU2-TELEMETRY as title)

I've updated the contribution guidelines to reflect this: https://github.com/waku-org/specs/blob/master/README.md#contributions

adklempner · 2024-06-24T01:40:07Z

Two strategies for tracking message reliability:

A:
1. User sends message with light push and notifies telemetry after receiving success confirmation from minimum number of peers (likely 1 peer)
2. Telemetry waits for a store node to have received the message
B: (more akin to end-to-end reliability)
1. User sends message with Light Push and notifies telemetry (after success)
2. Another user running Waku receives the message via Filter and notifies telemetry

Note that for B, both users must be running the Waku application at the same time, since we do not expect "offline" nodes to receive a historical message via Filter

Message that was sent but never received by store node is worst case
In each case, the latency between steps 1 and 2 can be a metric to measure.

Types of metrics, sorted by priority:

Reliability
Bandwidth
Latency

adklempner · 2024-07-12T01:18:32Z

Should telemetry implementation be purely external or purely internal? i.e. with current work in status-go and waku dogfooding, all metrics are collected without making changes within the go-waku and js-waku libraries. This makes telemetry an application-level feature (or perhaps an application in itself), and thus any changes to the implementation or spec should not affect the protocol. It also limits metrics to use data exposed by the waku libraries' public interfaces.

If purely internal, then telemetry becomes more closely coupled with core Waku specs/implementation but allows for more granular metric collection.

fryorcraken · 2024-07-26T06:37:18Z

Should telemetry implementation be purely external or purely internal? i.e. with current work in status-go and waku dogfooding, all metrics are collected without making changes within the go-waku and js-waku libraries. This makes telemetry an application-level feature (or perhaps an application in itself), and thus any changes to the implementation or spec should not affect the protocol. It also limits metrics to use data exposed by the waku libraries' public interfaces.

If purely internal, then telemetry becomes more closely coupled with core Waku specs/implementation but allows for more granular metric collection.

There could be merit of having telemetry at both level.
The issue with telemetry at Waku level is that a number of information and learning would be lost, as telemetry also hleps understand application behaviour but if it's at Waku level then application information (e.g. status message type) is lost.

For now, let's focus telemetry at application level. We can review if Waku level telemetry is something wanted later.

fryorcraken · 2024-09-30T05:25:55Z

I think this is not necessary anymore because of the direction we have taken for telemetry.

Let's review as part of 2025 H1 roadmap but it looks like having Prometheus metrics is good enough for what we need and we can move in this direction.

I think a spec was more necessary when we had a complex behaviour as per the original telemetry design.

Add specification for telemetry

d09fac5

adklempner marked this pull request as draft May 24, 2024 01:23

weboko reviewed May 24, 2024

View reviewed changes

Ivansete-status reviewed May 27, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add specification for telemetry #17

Add specification for telemetry #17

adklempner commented May 24, 2024

weboko May 24, 2024

weboko Jun 13, 2024

weboko May 24, 2024

weboko May 24, 2024

fryorcraken Jun 5, 2024

Ivansete-status May 27, 2024

fryorcraken Jun 3, 2024

jm-clius Aug 6, 2024

adklempner commented Jun 24, 2024

adklempner commented Jul 12, 2024

fryorcraken commented Jul 26, 2024

fryorcraken commented Sep 30, 2024

Add specification for telemetry #17

Are you sure you want to change the base?

Add specification for telemetry #17

Conversation

adklempner commented May 24, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adklempner commented Jun 24, 2024

adklempner commented Jul 12, 2024

fryorcraken commented Jul 26, 2024

fryorcraken commented Sep 30, 2024