-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add specification for telemetry #17
base: master
Are you sure you want to change the base?
Conversation
@@ -0,0 +1,39 @@ | |||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need metrics
section. There can be 2 types of them: general and implementation specific.
As a basic I think following should be specified:
- timestamps of receiving and sending message;
- size of message packets;
- content topics / pubsub topics of the messages;
- code logs of errors and warnings;
- peer ID or sender and receiver;
- message hashes;
In case we collect multi addresses we can derive IP and try to see correlation between geographical presence and availability. This approach is questionable and unless we need it we should drop it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relay metrics needed:
- mesh state: peers discovered, connections established;
- any errors with peers;
This is a follow-up from today's Bug Bash meeting.
What else can we add, @jm-clius @chaitanyaprem ?
|
||
#### Group Chats | ||
|
||
#### Contact Requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should understand what is the center of our interest for this RFC. It seems Waku and not Status (there should be one specific for them).
Status as an app will have to collect protocol level telemetry and on top of it application level.
Ideally for collecting Waku - they shouldn't care and we should provide some framework. I might be:
- a way to get data from within a node;
- or turn on or off Telemetry service embedded into running Waku node (makes sense if we start running our own nodes with embedded telemetry to measure network);
@@ -0,0 +1,39 @@ | |||
--- | |||
title: TELEMETRY | |||
name: Waku v2 Telemetry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd propose to add Metrics usage
section that align with our building in public
principle.
In that section we should explain what knowledge we are aiming to get from the collected data and re-assure a reader in us not having any way of personalizing collected data etc. This might come in handy in case we'd need to communicate is as a Privacy policy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think beyond the metruc usage, we need to state what we are trying to learn so we can justify the work to ourselves. Not really any point collecting random data if you we don't know how we are going to analyse it.
@@ -0,0 +1,39 @@ | |||
--- | |||
title: TELEMETRY | |||
name: Waku v2 Telemetry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should stop using the v2
term
name: Waku v2 Telemetry | |
name: Waku Telemetry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's more of a meta concern cc @jm-clius
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. We're certainly not consistent here yet, but:
- the name of a new spec should no longer include "v2" (just call it "Waku Telemetry")
- the title of specs should preferably be prefixed with
WAKU2-
(i.e. I proposeWAKU2-TELEMETRY
as title)
I've updated the contribution guidelines to reflect this: https://github.com/waku-org/specs/blob/master/README.md#contributions
Two strategies for tracking message reliability:
Note that for B, both users must be running the Waku application at the same time, since we do not expect "offline" nodes to receive a historical message via Filter Message that was sent but never received by store node is worst case Types of metrics, sorted by priority:
|
Should telemetry implementation be purely external or purely internal? i.e. with current work in If purely internal, then telemetry becomes more closely coupled with core Waku specs/implementation but allows for more granular metric collection. |
There could be merit of having telemetry at both level. For now, let's focus telemetry at application level. We can review if Waku level telemetry is something wanted later. |
I think this is not necessary anymore because of the direction we have taken for telemetry. Let's review as part of 2025 H1 roadmap but it looks like having Prometheus metrics is good enough for what we need and we can move in this direction. I think a spec was more necessary when we had a complex behaviour as per the original telemetry design. |
No description provided.