
Support for multiple opid-fields #1359

Open · dmig opened this issue Dec 20, 2024 · 7 comments
Labels: enhancement (Feature Request)

Comments

dmig commented Dec 20, 2024

Is your feature request related to a problem? Please describe.
There are multiple cases where logs have at least two event-chain identifiers.
For example, a swarm of microservices will have different tids/pids, but usually there is some request_id that serves as the main chain identifier. Sometimes it's necessary to follow another identifier -- thread_id in my case; other cases might exist.

Describe the solution you'd like

  • An option to set multiple opid fields for the format, e.g.:
    {
      // ...
      "opid-fields": ["request_id", "thread_id"],
      // ...
    }
  • UI command(s) to select one to follow, e.g. 'set-opid_field ...', 'next-opid-field'.
  • A status message stating the currently selected opid field, or a UI element displaying it.

Describe alternatives you've considered

  • As a quick-and-dirty solution: make opid-field configurable from the UI. This would allow manually binding commands to change the current field.
  • Just using filters for fields other than opid-field.
dmig added the enhancement (Feature Request) label Dec 20, 2024
plan44 commented Jan 13, 2025

I would also welcome multiple opid fields, for a slightly different use case: a pair of origin and target opid fields to parse logs (Matter smart-home SDK logs) that chain messages as follows:

[2025-01-10 19:51:41.947 N] ✅ CHIP:EM  ➡️ >>> [E:23470r S:47571 M:229067918] (S) Msg RX from 1:00000000506FB0E7 [98C0] --- Type 0001:03 (IM:SubscribeRequest)

..further down in log...

[2025-01-10 19:51:43.107 N] ✅ CHIP:EM  ➡️ <<< [E:23470r S:47571 M:177217589 (Ack:229067918)] (S) Msg TX to 1:00000000506FB0E7 [98C0] [UDP:192.168.59.151:51511] --- Type 0001:05 (IM:ReportData)

Here the M: field in the first line corresponds to the Ack: field in the second line. So M: would be the origin field and Ack: the target field (for going forward).

When going forward with "o", the contents of the origin field should be used to find the next line where it matches the target field, whereas "O" should take the contents of the target field and search backwards for a line where it matches the origin field.
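A minimal lnav format sketch that captures both counters as separate fields might look like the fragment below. The format name, the field names chip_msgcnt / chip_ackcnt, and the simplified pattern are illustrative only and not tested against real Matter logs:

```json
{
    "chip_log": {
        "title": "Matter SDK log (sketch)",
        "regex": {
            "std": {
                "pattern": "^\\[(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) (?<level>\\w)\\] .* M:(?<chip_msgcnt>\\d+)(?: \\(Ack:(?<chip_ackcnt>\\d+)\\))?.*$"
            }
        },
        "value": {
            "chip_msgcnt": { "kind": "integer", "identifier": true },
            "chip_ackcnt": { "kind": "integer", "identifier": true }
        }
    }
}
```

With both values marked as identifiers, they show up as columns in the format's virtual table, which is what any origin/target chaining logic would need to query.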

tstack (Owner) commented Jan 13, 2025

I think both cases can be served by writing a script and adding some key bindings.

In the top-of-tree, I've added a find-msg script that can be used to move to the previous/next message with a field that matches the value of the field in the focused message. For example, when viewing an access_log, you might want to move to the previous/next log message with a matching cs_referer value. You could bind the F3 and F4 keys to use the find-msg script by running the following :config commands:

:config /ui/keymap-defs/default/f3/command |find-msg prev access_log cs_referer
:config /ui/keymap-defs/default/f4/command |find-msg next access_log cs_referer

The arguments to the script are the direction (i.e. prev/next), log format name (access_log), and the field name (cs_referer). The script then executes some SQL queries to locate the message and set the selection in the view.
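As a rough illustration (the actual script may differ), the lookup can be sketched as a query against the format's virtual table; $cur_line and $cur_referer are hypothetical placeholders for values the real script would first read from the focused message:

```sql
-- Sketch only: find the next access_log message whose cs_referer
-- matches the focused message's value.
SELECT log_line
  FROM access_log
 WHERE cs_referer = $cur_referer
   AND log_line > $cur_line
 ORDER BY log_line ASC
 LIMIT 1
```

For "prev", the comparison flips to log_line < $cur_line with ORDER BY log_line DESC.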

@plan44 Do you have a custom log format for the logs that you posted that extracts the fields in the message? Do you need help writing one?

Unfortunately, the performance for this script might not be very good with large amounts of logs. I'm currently in the process of trying to address these performance issues in the top-of-tree, but haven't finished yet. There are still some bugs to work out.


Let me also give a little background...

The "opid" field is a bit special since it is recorded in the internal index that is built up when lnav is first scanning a log file. So, scanning for a matching opid is pretty quick since we don't need to reload the log messages from the disk to check their value. Supporting multiple opid fields would mean expanding the size of the index and increasing the initial indexing time, which is not something I'd like to do. Ultimately, it's more advantageous for me to improve the SQL indexing performance since that is generally useful.

plan44 commented Jan 14, 2025

Oh, great, I guess find-msg will do the trick. Thanks!

@plan44 Do you have a custom log format for the logs that you posted that extract the fields in the message? Do you need help writing one?

Yes, at least I tried to (thanks to the guide on how to do it on regex101), but it only seems to work halfway. The regex seems to parse the messages correctly, as the fields show up when I press "p" on a log line. I also see the table schema for my format (called p44mbrd) when I type ;.schema. But the virtual table p44mbrd does not exist (no such table: p44mbrd). As a total lnav newbie, I haven't figured out why. But I guess find-msg can't work until the log is correctly represented by the virtual table.

(a little later:) I seem to be making a fundamental error 😵‍💫 . I have never gotten a search table to work, no matter whether I specify it manually with :create-search-table chipmsg bound to M:(?<msgcnt>\d+) (without any custom format involved) or as part of my custom format. In both cases the CREATE statement in ;.schema looks right. When I try to access them with ;select * from msgcnt or ;select * from p44mbrd, the former query runs but always returns zero rows, and the latter always says no such table: p44mbrd.

I really tried to figure it out myself, but now I'm out of ideas what to try or read up. Here's my custom format for reference: p44mbrd.json, and some raw log text of that format.

(again some hours later:) Unstuck! I found what silently made my custom format fail: I had log_level and timestamp in values too, which probably caused duplicate fields when sqlite3 tried to actually create the table. Now it works 😄. Would there have been a way to see that table-creation error somewhere in an internal log?
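For anyone hitting the same wall, the shape of the fix: builtin columns like log_time and log_level already exist in every log table, so the "value" section should list only custom captures. A simplified sketch (not my full format; field names as used by my script):

```json
{
    "p44mbrd_log": {
        "value": {
            "chip_msgcnt": { "kind": "integer", "identifier": true },
            "chip_ackcnt": { "kind": "integer", "identifier": true }
        }
    }
}
```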

plan44 commented Jan 14, 2025

@tstack I think both cases can be served by writing a script and adding some key bindings.

Now that my custom format works, I could try this, and yes, problem solved! I had to modify the script because of the different origin/target fields, so I added a fourth argument that specifies the field to search in. In those Matter exchanges, the current value needs to be taken from the M: field but searched for in the Ack: field (and the other way around when searching backwards).

Here's the find-chip-msg script, now bound to F3/F4 with:

:config /ui/keymap-defs/default/f3/command |find-chip-msg next p44mbrd_log chip_msgcnt chip_ackcnt
:config /ui/keymap-defs/default/f4/command |find-chip-msg prev p44mbrd_log chip_ackcnt chip_msgcnt

Thanks for the help and the fantastic tool! Wish I had discovered it a long time ago 😉

tstack (Owner) commented Jan 15, 2025

I had log_level and timestamp in values, too, which probably caused duplicate fields when sqlite3 tried to actually create the table.

You can run lnav with the -d <path> option to have it write a debug log to the given path. Using the "p44mbrd.json" file that you attached, I see this error:

2025-01-14T21:11:32.296-08:00 E t0 sql_util.cc:552 (1) duplicate column name: log_level in "CREATE TABLE p44mbrd  (
  log_line        INTEGER,                         -- The line number for the log message
  log_time        DATETIME,                        -- The 

I've pushed a change that raises an error if a value is defined with a builtin name.

I had to modify the script because of the different origin/target fields, so I added a fourth argument which specifies the field to search for.

Nice work on modifying the script!

tstack added a commit that referenced this issue Jan 15, 2025
tstack (Owner) commented Jan 15, 2025

@plan44 Would other people be interested in the log format you made? Should I add it to the set of defaults?

plan44 commented Jan 15, 2025

@plan44 Would other people be interested in the log format you made?

Maybe, yes - users of the p44mbrd Matter bridge daemon, which is a (still very niche at this time) FOSS project.

The problem with Matter SDK log messages is that they are usually formatted by a host application with its own logger, so the timestamp etc. differ depending on where it is used. p44mbrd is only one of many hosts.

Right now, I am working on creating a format that can analyze Apple Console log messages and extract Matter SDK log info in the same way as for p44mbrd. The multifile, timestamp-sorted display of lnav is extremely useful for observing interactions between two Matter nodes when there are problems (which I definitely have at hand here; that's why I needed to investigate log analysis for real in the first place...)

Should I add it to the set of defaults?

I guess it needs some work before one or the other of these formats can become a usable default. I changed the regex a few dozen times today to fine-tune and fix problems...

Also, ideally, I guess there should be a common sub-format for the Matter SDK part, which could then be used in different host log formats (p44mbrd, apple, homeassistant, etc.). I read about this somewhere in the lnav docs, but doing that is definitely over my head at this point 😵‍💫

Here are my current WIP versions for reference: appleconsole_log.json, p44mbrd.json
