Issues when stepping into a static library #110

Open
sven-hoek opened this issue Jul 21, 2024 · 7 comments

@sven-hoek

Not sure if stepping into the static library is the issue but it seems to happen whenever I try to step into or break in a statically-linked library function.

  • Bloom version: 1.0.0 (.deb, installed on Ubuntu 22.04)
  • Both library and firmware were built with -Og.
  • Debugging an ATmega328PB via DebugWire through an Atmel-ICE
  • Error message when trying to step into the function:
    [ERROR] Failed to decode AVR8 opcode at byte address 0x000001be - the instruction will have to be intercepted. Please enable debug logging, reproduce this message and report as an issue via https://bloom.oscillate.io/report-issue
    
  • Error message from avr-gdb:
    Breakpoint 1, main () at .././main.c:82
    82	    CH9120Initialisation();  // initialize the CH9120 Serial to Ethernet controller
    ../../src/gdb/gdbtypes.c:931: internal-error: type* create_range_type(type*, type*, const dynamic_prop*, const dynamic_prop*, LONGEST): Assertion `TYPE_LENGTH (index_type) > 0' failed.
    A problem internal to GDB has been detected,
    further debugging may prove unreliable.
    

When debugging within VSCode, the debugging session stops, but when I run avr-gdb in the terminal and connect to Bloom's GDB server, I can continue, though I'm never able to step into a library function.
Bloom itself doesn't always crash, and so far I haven't been able to pin down exactly when it does. It seems to happen when I am a few lines above a library function call and try to step over a line (not even the library function call itself). I'll also report the bug on the avr-gdb side.

If there's anything else I could try or any info I could provide, let me know.

@navnavnav navnavnav self-assigned this Jul 21, 2024
@navnavnav
Member

navnavnav commented Jul 21, 2024

Hey @sven-hoek

Thanks for reporting this.

When range stepping is enabled, Bloom attempts to analyze all instructions within the given range, and intercept those that may take the target outside of the range. The error message suggests that Bloom was unable to decode a particular opcode, and so it was forced to intercept that instruction, as we don't know what it will do. However, that error is not considered to be fatal and should not result in Bloom crashing or shutting down abruptly.

A few things I need from you, please:

  • What version of GDB are you currently using?

  • Could you enable debug logging, reproduce that error, and then send me the full debug log? To enable debug logging, set the debugLogging param to true in the root node of your bloom.yaml:

    debugLogging: true
    
    environments:
    ...

    You may have to put the debug log into a text file, and attach it to your comment, as GitHub has a char limit for comments.

  • Can you provide a dump of program memory, around the address at which Bloom failed to decode the opcode (0x000001BE)? You can obtain this via GDB, using x/10b 0x000001BA - that should output 10 bytes of program memory, starting at 0x000001BA.

  • In addition to a program memory dump, it will help to know if GDB has similar issues decoding that opcode. Could you try running x/10bfi 0x000001BA in GDB? It will attempt to decode the opcodes around that address and output them.

As for the fatal error in GDB, I'm not sure if that's even related to the range step, as Bloom simply intercepts any instructions that it could not decode, so it shouldn't affect GDB at all. Have you tried disabling range stepping? Does GDB still crash? You can disable range stepping by setting rangeStepping to false in your server config, in bloom.yaml:

server:
  rangeStepping: false

If the issue in GDB is related to range stepping, you can just leave range stepping disabled, for the time being. It will result in degraded stepping performance but at least it won't crash.

@navnavnav navnavnav added the bug Something isn't working label Jul 21, 2024
@navnavnav
Member

navnavnav commented Jul 21, 2024

Sorry, the GDB commands I provided in the previous comment, for dumping program memory, were incorrect as the address 0x000001BB is an invalid program memory address (it needs to be word-aligned, as opcodes take the form of 16-bit words). You'll want to use 0x000001BA instead. So x/10b 0x000001BA to dump program memory, and x/10bfi 0x000001BA to dump decoded instructions.

I have also revised the previous comment.

@sven-hoek
Author

Hey @navnavnav, thanks for the quick reply and the clear instructions. Also, thanks for creating this great tool, including the good documentation.

What version of GDB are you currently using?

> avr-gdb --version
GNU gdb (GDB) 10.1.90.20210103-git

Could you enable debug logging, reproduce that error, and then send me the full debug log?

I didn't manage to reproduce Bloom crashing, so I couldn't capture that log, but here is the log with just GDB crashing. I'll upload another log if I get to the point where Bloom crashes again.
https://gist.github.com/sven-hoek/4dee86bf9faccce8e4e4981c93a9c6c4

Can you provide a dump of program memory, around the address at which Bloom failed to decode the opcode (0x000001BE)?

x/10b 0x000001BA
0x1ba <CH9120Initialisation>:	-49	-109	14	-108	125	0	-120	-31
0x1c2 <CH9120Initialisation+8>:	-118	-107
{"token":18,"outOfBandRecord":[],"resultRecords":{"resultClass":"done","results":[]}}

In addition to a program memory dump, it will help to know if GDB has similar issues decoding that opcode. Could you try running x/10bfi 0x000001BA in GDB? It will attempt to decode the opcodes around that address and output them.

x/10bfi 0x000001BA
   0x1ba <CH9120Initialisation>:	push	r28
   0x1bc <CH9120Initialisation+2>:	call	0xfa	;  0xfa <UART1ActiveState>
   0x1c0 <CH9120Initialisation+6>:	ldi	r24, 0x18	; 24
   0x1c2 <CH9120Initialisation+8>:	dec	r24
   0x1c4 <CH9120Initialisation+10>:	brne	.-4      	;  0x1c2 <CH9120Initialisation+8>
   0x1c6 <CH9120Initialisation+12>:	rjmp	.+0      	;  0x1c8 <CH9120Initialisation+14>
=> 0x1c8 <CH9120Initialisation+14>:	break
   0x1ca <CH9120Initialisation+16>:	.word	0x0079	; ????
   0x1cc <CH9120Initialisation+18>:	ldi	r24, 0x18	; 24
   0x1ce <CH9120Initialisation+20>:	dec	r24
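
For anyone reading along: the `x/10b` dump above prints signed decimal bytes, which makes it hard to compare against the disassembly. The following is a minimal sketch (not part of Bloom or GDB) that pairs those bytes into little-endian 16-bit words, which is how AVR opcodes are stored in program memory. The names `dump` and `to_words` are made up for illustration.

```python
# Signed bytes exactly as printed by `x/10b 0x000001BA` in the dump above.
dump = [-49, -109, 14, -108, 125, 0, -120, -31, -118, -107]


def to_words(signed_bytes):
    """Pair signed bytes into little-endian 16-bit words."""
    raw = bytes(b & 0xFF for b in signed_bytes)  # map -128..127 onto 0..255
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


words = to_words(dump)
for index, word in enumerate(words):
    # 0x1BA is the byte address the dump started at.
    print(f"0x{0x1BA + 2 * index:03x}: 0x{word:04x}")
```

The second and third words come out as 0x940e and 0x007d, which matches the two-word `call 0xfa` that GDB shows at 0x1bc. Using `x/10xb 0x000001BA` in GDB should print the bytes in hex directly and avoid the conversion.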

@sven-hoek
Author

I haven't got to try it yet but turning off range-stepping is a good point. Though I stepped through the same code with AVARICE instead of Bloom and GDB also crashed, so there's that. It seems to be rather the GDB side or something about the code I'm debugging (the latter of which hopefully shouldn't be an issue though).

I will create a very simple project with a simple library to make experimenting easier. It may take a little while before I get to that, but I'll update you with my findings.

@navnavnav
Member

navnavnav commented Jul 22, 2024

Thanks for this @sven-hoek

So I can see that you have a CALL instruction at byte address 0x1bc, which is made up of two words (spanning byte addresses 0x1bc -> 0x1bf). But Bloom was attempting to decode the instruction at 0x1be, which is an invalid address as it points to the second word of that CALL instruction.
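
To illustrate why 0x1be is not a standalone instruction, here is a rough sketch of decoding a two-word CALL by hand. It follows the CALL encoding from the AVR instruction set (`1001 010k kkkk 111k` followed by a 16-bit word holding the low bits of `k`). It is not Bloom's decoder, and `decode_call` is a hypothetical helper name.

```python
def decode_call(first_word, second_word):
    """Return the destination byte address of an AVR CALL, or None if not a CALL."""
    # CALL is encoded as 1001 010k kkkk 111k, followed by the low 16 bits of k.
    if first_word & 0xFE0E != 0x940E:
        return None
    # The upper six bits of k are split across bits 8..4 and bit 0 of the first word.
    high_bits = ((first_word & 0x01F0) >> 3) | (first_word & 0x0001)
    word_address = (high_bits << 16) | second_word
    return word_address * 2  # program memory is word-addressed; convert to bytes


# Words at byte addresses 0x1bc and 0x1be, taken from the memory dump in this thread.
destination = decode_call(0x940E, 0x007D)
print(hex(destination))
```

The word at 0x1be (0x007d) is only the low part of the call target, so decoding it on its own cannot yield a meaningful instruction.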

I was worried that Bloom may be incorrectly decoding the first word (0x1bc -> 0x1bd) as some other, single-word instruction, which would explain why it was attempting to decode the second word separately. But I've just attempted to replicate this, and it seems to be working fine for me. I used the exact same opcode as the one in your program: 0x0E947D00, which translates to CALL 0xFA, and then performed a range step to step over that instruction. Bloom correctly decoded the instruction and intercepted the destination address (0xFA), as it was outside of the requested range:

2024-07-22 01:58:50.259 BST [DS]: [DEBUG] Read GDB packet: $vCont;rfc,100:-1;c#73
2024-07-22 01:58:50.260 BST [DS]: [INFO] Handling VContRangeStep packet
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Requested stepping range start address: 0x000000fc
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Requested stepping range end address (exclusive): 0x00000100
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Issuing ReadTargetMemory command (ID: 430) to TargetController
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Delivering response for ReadTargetMemory command (ID: 430)
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Inspecting 1 instructions within stepping range (byte addresses) 0x000000fc -> 0x00000100, in preparation for new range stepping session
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Intercepting destination byte address 0x000000fa of CCPF instruction ("CALL") at byte address 0x000000fc
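
If it helps when reading the debug log, the stepping range can be pulled straight out of the `vCont` packet that GDB sends. Below is a minimal sketch, assuming the standard remote serial protocol framing (`$payload#checksum`, where the checksum is the sum of the payload bytes modulo 256) and the `r start,end` range-step action. `parse_range_step` is an illustrative name, not a Bloom function.

```python
import re


def parse_range_step(packet):
    """Parse '$vCont;r<start>,<end>:<thread>...#<checksum>' into (start, end)."""
    framed = re.fullmatch(r"\$(.*)#([0-9a-fA-F]{2})", packet)
    if framed is None:
        raise ValueError("not a framed RSP packet")
    payload, checksum = framed.group(1), int(framed.group(2), 16)
    # The RSP checksum is the modulo-256 sum of the payload characters.
    if sum(payload.encode("ascii")) % 256 != checksum:
        raise ValueError("checksum mismatch")
    action = re.match(r"vCont;r([0-9a-fA-F]+),([0-9a-fA-F]+)", payload)
    if action is None:
        raise ValueError("not a range-step request")
    # Both addresses are hex; the end address is exclusive.
    return int(action.group(1), 16), int(action.group(2), 16)


start, end = parse_range_step("$vCont;rfc,100:-1;c#73")
print(hex(start), hex(end))
```

Running this over the `Read GDB packet` lines from the failing session would show whether GDB really requested a range starting at 0x1be.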

So, this leads me to believe that, when you attempted to step over that CALL instruction, GDB sent an invalid address range to Bloom, with a start address of 0x1be, which does not point to the beginning of any valid instruction. Worse still, once Bloom failed to decode the instruction at that address, it would have attempted to intercept it by placing a breakpoint there. That newly inserted breakpoint may have corrupted the CALL instruction (as it was placed in the middle of it), resulting in a corrupted program.
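
One way to sanity-check a range start address like this is to walk the opcodes from a known instruction boundary and list where each instruction begins. The sketch below does that for the words from the dump in this thread. It assumes 0x1BA (the start of `CH9120Initialisation`) is a true boundary and only treats CALL, JMP, LDS and STS as two-word instructions. `is_two_word` and `instruction_starts` are hypothetical helpers, not Bloom code.

```python
def is_two_word(opcode):
    """True for the 32-bit AVR instructions: CALL, JMP, LDS and STS."""
    is_call_or_jmp = (opcode & 0xFE0C) == 0x940C  # 1001 010k kkkk 11xk
    is_lds_or_sts = (opcode & 0xFC0F) == 0x9000   # 1001 00xd dddd 0000
    return is_call_or_jmp or is_lds_or_sts


def instruction_starts(words, base_address):
    """Return the byte address at which each instruction in `words` begins."""
    starts = []
    index = 0
    while index < len(words):
        starts.append(base_address + 2 * index)
        # Skip the operand word of a two-word instruction.
        index += 2 if is_two_word(words[index]) else 1
    return starts


# Little-endian words from the program memory dump, starting at 0x1BA.
words = [0x93CF, 0x940E, 0x007D, 0xE188, 0x958A]
starts = instruction_starts(words, 0x1BA)
print([hex(address) for address in starts])
print(0x1BE in starts)
```

With these words, 0x1be does not appear in the list, since it is the operand word of the CALL at 0x1bc.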

So I think the issue here is with GDB. But before you report this to the GDB devs, can you try reproducing the error with a newer version? Version 10 is a little old. I'm on 12.2, which works great for AVR, IMO. But I understand this may be a headache, as you may have to build it from source (unless you're willing to upgrade to a newer Ubuntu version - the newer repositories seem to host newer versions of the gdb-avr package).

@navnavnav
Member

navnavnav commented Jul 22, 2024

It seems to be rather the GDB side or something about the code I'm debugging (the latter of which hopefully shouldn't be an issue though).

Yeah, I agree. That fatal error in GDB doesn't seem to be caused by the opcode decoding error in Bloom. But whatever is causing the fatal error in GDB may also be the cause of the invalid address range that GDB is sending to Bloom. Worth keeping in mind 👍🏽

@sven-hoek
Author

Thanks for that detailed explanation and great support.

can you try reproducing the error with a newer version? Version 10 is a little old

True, I hadn't thought of that. I tried GDB 13.2, and stepping through the problematic code still seems to produce errors in GDB.

I created a very simple app that also uses a static library, to see whether any static library would cause issues. I kept everything very simple, and I can step into the library code without any issues using the same toolchain as before (GDB 10). I assume the other project has some weird configuration that messes up the addressing. I will try to cleanly recreate the project and see if the error persists.
