Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NimBLE sometimes crashes/fails during connection. (IDFGH-12360) #13391

Closed
3 tasks done
SimonBrix opened this issue Mar 15, 2024 · 4 comments
Closed
3 tasks done

NimBLE sometimes crashes/fails during connection. (IDFGH-12360) #13391

SimonBrix opened this issue Mar 15, 2024 · 4 comments
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF

Comments

@SimonBrix
Copy link

SimonBrix commented Mar 15, 2024

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v5.2.1

Espressif SoC revision.

ESP32-S3 (QFN56) (revision v0.1)

Operating System used.

macOS

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

Custom Board

Power Supply used.

USB

What is the expected behavior?

I expect the connection to get established, or for the connection to fail "gracefully".

What is the actual behavior?

I encounter an error, where it fails to discover "services", and after that it seems to start running code in a random place in memory.

Steps to reproduce.

  1. Setup an ESP32-S3 with the NimBLE bleprph example.
  2. Setup another ESP32-S3 with a modified version of NimBLE blecent example, which disconnects every time all services are discovered to trigger a new reconnect (this project is attatched): esp-cent-repeat-pair.zip
  3. Make it keep running. In our example it sometimes fails to discover the services, and then it seems to do a "double connect", and then stops. In the actual code of our project, it also leads to it starting to execute random code when this happens, which leads to a crash. It seems to be the same place every time with the same binary, but if the binary is changed, the stack trace is different. (Sometimes it is from the constructer of something which has finished constructing way before the error occurs, in the stack trace from the project below I am not to sure what "0x40375bac: esp_restart_noos at /Users/simonbrix/esp/v5.2.1/esp-idf/components/esp_system/port/soc/esp32s3/system_internal.c:159" is about.)

Debug Logs.

The debug logs from the esp-cent-repeat-pair example:

W (70983) NimBLE_BLE_CENT: Connection number: 21
I (71984) NimBLE: GAP procedure initiated: terminate connection; conn_handle=1 hci_reason=19

I (72033) NimBLE: disconnect; reason=534 
I (72033) NimBLE: 

I (72033) NimBLE: GAP procedure initiated: discovery; 
I (72035) NimBLE: own_addr_type=0 filter_policy=0 passive=1 limited=0 filter_duplicates=1 
I (72044) NimBLE: duration=forever
I (72048) NimBLE: 

Device name: nimble-bleprph
I (72109) NimBLE: GAP procedure initiated: connect; 
I (72109) NimBLE: peer_addr_type=0 peer_addr=
I (72110) NimBLE: 7c:df:a1:e3:6c:22
I (72115) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (72128) NimBLE: 

I (72170) NimBLE: Connection established 
I (72170) NimBLE: 

I (72171) NimBLE: Connection secured

I (72683) NimBLE: encryption change event; status=0 
I (72683) NimBLE: GATT procedure initiated: discover all services

I (72883) NimBLE: GATT procedure initiated: discover all characteristics; 
I (72883) NimBLE: start_handle=1 end_handle=5

I (73083) NimBLE: GATT procedure initiated: discover all characteristics; 
I (73083) NimBLE: start_handle=6 end_handle=13

I (73283) NimBLE: GATT procedure initiated: discover all characteristics; 
I (73283) NimBLE: start_handle=14 end_handle=26

I (73583) NimBLE: GATT procedure initiated: discover all characteristics; 
I (73583) NimBLE: start_handle=27 end_handle=65535

I (73783) NimBLE: GATT procedure initiated: discover all descriptors; 
I (73783) NimBLE: chr_val_handle=8 end_handle=9

I (73883) NimBLE: GATT procedure initiated: discover all descriptors; 
I (73883) NimBLE: chr_val_handle=18 end_handle=19

I (73983) NimBLE: GATT procedure initiated: discover all descriptors; 
I (73983) NimBLE: chr_val_handle=23 end_handle=24

I (74083) NimBLE: GATT procedure initiated: discover all descriptors; 
I (74083) NimBLE: chr_val_handle=29 end_handle=65535

I (74383) NimBLE: Service discovery complete; status=0 conn_handle=1

W (74383) NimBLE_BLE_CENT: Connection number: 22
I (75384) NimBLE: GAP procedure initiated: terminate connection; conn_handle=1 hci_reason=19

I (75433) NimBLE: disconnect; reason=534 
I (75433) NimBLE: 

I (75433) NimBLE: GAP procedure initiated: discovery; 
I (75435) NimBLE: own_addr_type=0 filter_policy=0 passive=1 limited=0 filter_duplicates=1 
I (75444) NimBLE: duration=forever
I (75448) NimBLE: 

Device name: 
Device name: 
Device name: 
Device name: nimble-bleprph
I (75509) NimBLE: GAP procedure initiated: connect; 
I (75509) NimBLE: peer_addr_type=0 peer_addr=
I (75511) NimBLE: 7c:df:a1:e3:6c:22
I (75515) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (75529) NimBLE: 

I (75631) NimBLE: Connection established 
I (75631) NimBLE: 

I (75632) NimBLE: Connection secured

I (75983) NimBLE: encryption change event; status=7 
I (75983) NimBLE: GATT procedure initiated: discover all services

E (75985) NimBLE: Error: Service discovery failed; status=7 conn_handle=1

I (75992) NimBLE: GAP procedure initiated: terminate connection; conn_handle=1 hci_reason=19

I (76002) NimBLE: GAP procedure initiated: connect; 
I (76007) NimBLE: peer_addr_type=0 peer_addr=
I (76012) NimBLE: 7c:df:a1:e3:6c:22
I (76016) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (76030) NimBLE: 

I (76088) NimBLE: Connection established 
I (76089) NimBLE: 

E (76089) NimBLE: Failed to add peer; rc=2

Stack trace from out example where we built a C++ wrapper on top of NimBLE:

W (16739) Main: Connected number 5!!
I (17739) NimBLE: GAP procedure initiated: terminate connection; conn_handle=1 hci_reason=19

I (17746) NimBLE::Client: ESP_GAP_EVENT_DISCONNECT
I (17746) NimBLE::Peer: set_conn_state: 0
I (18839) NimBLE::Peer: set_conn_state: 1
I (18839) NimBLE: GAP procedure initiated: connect; 
I (18839) NimBLE: peer_addr_type=0 peer_addr=
I (18843) NimBLE: 7c:df:a1:e3:6c:22
I (18847) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (18861) NimBLE: 

I (18928) NimBLE::Client: ESP_GAP_EVENT_CONNECT 0
I (18929) NimBLE: Connection secured

I (18930) NimBLE: GATT procedure initiated: discover all services

I (18934) NimBLE::Client: Finished Connect event
I (19096) NimBLE::Client: All services registered
I (19097) NimBLE: GATT procedure initiated: discover all characteristics; 
I (19099) NimBLE: start_handle=1 end_handle=5

I (19446) NimBLE::Client: ESP_GAP_EVENT_ENC_CHANGE 0
I (19446) NimBLE::Client: Encryption successful
I (19446) NimBLE::Peer: find_description

I (19451) NimBLE::Peer: Encrypted: 1
I (19546) NimBLE::Client: BLE_HS_EDONE 1 4
I (19546) NimBLE: GATT procedure initiated: discover all characteristics; 
I (19547) NimBLE: start_handle=6 end_handle=13

I (19745) NimBLE::Client: BLE_HS_EDONE 2 4
I (19746) NimBLE: GATT procedure initiated: discover all characteristics; 
I (19747) NimBLE: start_handle=14 end_handle=26

I (20046) NimBLE::Client: BLE_HS_EDONE 3 4
I (20046) NimBLE: GATT procedure initiated: discover all characteristics; 
I (20047) NimBLE: start_handle=27 end_handle=65535

I (20245) NimBLE::Client: BLE_HS_EDONE 4 4
I (20246) NimBLE::Client: All charcteristics registered
I (20246) NimBLE::Peer: Discovered all characteristics
I (20251) NimBLE::Peer: set_conn_state: 2
W (20265) Main: Connected number 6!!
I (21265) NimBLE: GAP procedure initiated: terminate connection; conn_handle=1 hci_reason=19

I (21296) NimBLE::Client: ESP_GAP_EVENT_DISCONNECT
I (21296) NimBLE::Peer: set_conn_state: 0
I (22365) NimBLE::Peer: set_conn_state: 1
I (22365) NimBLE: GAP procedure initiated: connect; 
I (22365) NimBLE: peer_addr_type=0 peer_addr=
I (22369) NimBLE: 7c:df:a1:e3:6c:22
I (22373) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (22387) NimBLE: 

I (22433) NimBLE::Client: ESP_GAP_EVENT_CONNECT 0
I (22435) NimBLE: Connection secured

I (22435) NimBLE: GATT procedure initiated: discover all services

I (22439) NimBLE::Client: Finished Connect event
I (22746) NimBLE::Client: ESP_GAP_EVENT_ENC_CHANGE 7
E (22746) NimBLE::Client: Pairing failed - No open connection, deleting peer
E (22749) NimBLE::Client: Error discovering services; status=7
I (22755) NimBLE: GAP procedure initiated: terminate connection; conn_handle=1 hci_reason=19

I (22765) NimBLE: GAP procedure initiated: connect; 
I (22770) NimBLE: peer_addr_type=0 peer_addr=
I (22775) NimBLE: 7c:df:a1:e3:6c:22
I (22779) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (22793) NimBLE: 

I (22898) NimBLE::Client: ESP_GAP_EVENT_CONNECT 0
I (22900) NimBLE: Connection secured

I (22900) NimBLE: GATT procedure initiated: discover all services

I (22904) NimBLE::Client: Finished Connect event
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x42000001  PS      : 0x00060e30  A0      : 0x82053b2e  A1      : 0x3fcbd800  
A2      : 0x3fc9e2d0  A3      : 0x3fcbd810  A4      : 0xffffffff  A5      : 0x00000001  
A6      : 0x3fc9f01c  A7      : 0x3fcb8f8c  A8      : 0x82000001  A9      : 0x3fcbd7e0  
A10     : 0x00000000  A11     : 0x3fc9e2d0  A12     : 0x00060a20  A13     : 0x00060e23  
A14     : 0x0000ce3d  A15     : 0x3fcb8f8c  SAR     : 0x00000004  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000009  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0x00000000  
0x400570e8: memset in ROM
0x400570f3: memset in ROM



Backtrace: 0x41fffffe:0x3fcbd800 |<-CORRUPTED




ELF file SHA256: 5684348c6

Rebooting...
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0xc (RTC_SW_CPU_RST),boot:0x9 (SPI_FAST_FLASH_BOOT)
Saved PC:0x40375bac
0x40375bac: esp_restart_noos at /Users/simonbrix/esp/v5.2.1/esp-idf/components/esp_system/port/soc/esp32s3/system_internal.c:159

SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x178c
load:0x403c9700,len:0x4
load:0x403c9704,len:0xcbc
load:0x403cc700,len:0x2da0
entry 0x403c9914

More Information.

No response

@SimonBrix SimonBrix added the Type: Bug bugs in IDF label Mar 15, 2024
@espressif-bot espressif-bot added the Status: Opened Issue is new label Mar 15, 2024
@github-actions github-actions bot changed the title NimBLE sometimes crashes/fails during connection. NimBLE sometimes crashes/fails during connection. (IDFGH-12360) Mar 15, 2024
@SimonBrix
Copy link
Author

SimonBrix commented Mar 15, 2024

Here is another stack trace.
It often seems to be related to npl_os_freertos.c:451, which seems to, in this case, call Storage::readMultiPageBlob from nvs_storage.cpp. It seems to relate to when ESP_GAP_EVENT_ENC_CHANGE goes wrong because of error code 0x7.

I (17768) NimBLE: GAP procedure initiated: connect; 
I (17768) NimBLE: peer_addr_type=0 peer_addr=
I (17772) NimBLE: 7c:df:a1:e0:91:a6
I (17776) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (17790) NimBLE: 

I (17836) NimBLE::Client: ESP_GAP_EVENT_CONNECT 0
I (17838) NimBLE: Connection secured

I (17838) NimBLE::Client: Finished Connect event
I (18146) NimBLE::Client: ESP_GAP_EVENT_ENC_CHANGE 7
E (18146) NimBLE::Client: Pairing failed - No open connection, deleting peer
I (18149) NimBLE: GAP procedure initiated: connect; 
I (18154) NimBLE: peer_addr_type=0 peer_addr=
I (18159) NimBLE: 7c:df:a1:e0:91:a6
I (18163) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (18177) NimBLE: 

I (18226) NimBLE::Client: ESP_GAP_EVENT_CONNECT 0
I (18227) NimBLE: Connection secured

I (18228) NimBLE::Client: Finished Connect event
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4202c421  PS      : 0x00060e30  A0      : 0x8202c66e  A1      : 0x3fcbd7b0  
0x4202c421: nvs::Storage::findItem(unsigned char, nvs::ItemType, char const*, nvs::Page*&, nvs::Item&, unsigned char, nvs::VerOffset) at /Users/simonbrix/esp/v5.2.1/esp-idf/components/nvs_flash/src/nvs_storage.cpp:165

A2      : 0x000000fb  A3      : 0x00000010  A4      : 0x000000ff  A5      : 0x3c065d64  
A6      : 0x3fcbd830  A7      : 0x3fcbd810  A8      : 0x000003dc  A9      : 0x3fcbd7c0  
A10     : 0x3fcb8efc  A11     : 0x3fcb8efe  A12     : 0x0000001f  A13     : 0x3fcbd814  
A14     : 0x3fc9e2cc  A15     : 0x3fc9f01c  SAR     : 0x00000004  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x000003ec  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0x00000000  
0x400570e8: memset in ROM
0x400570f3: memset in ROM



Backtrace: 0x4202c41e:0x3fcbd7b0 0x4202c66b:0x3fcbd800 0x420532fb:0x3fcbd820 0x40377436:0x3fcbd840 0x4200b7c6:0x3fcbd860 0x40380ee9:0x3fcbd880
0x4202c41e: nvs::Storage::findItem(unsigned char, nvs::ItemType, char const*, nvs::Page*&, nvs::Item&, unsigned char, nvs::VerOffset) at /Users/simonbrix/esp/v5.2.1/esp-idf/components/nvs_flash/src/nvs_storage.cpp:164
0x4202c66b: nvs::Storage::readMultiPageBlob(unsigned char, char const*, void*, unsigned int) at /Users/simonbrix/esp/v5.2.1/esp-idf/components/nvs_flash/src/nvs_storage.cpp:508
0x420532fb: npl_freertos_event_run at /Users/simonbrix/esp/v5.2.1/esp-idf/components/bt/host/nimble/nimble/porting/npl/freertos/src/npl_os_freertos.c:451
0x40377436: ble_npl_event_run at /Users/simonbrix/esp/v5.2.1/esp-idf/components/bt/host/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:185
 (inlined by) nimble_port_run at /Users/simonbrix/esp/v5.2.1/esp-idf/components/bt/host/nimble/nimble/porting/nimble/src/nimble_port.c:279
0x4200b7c6: BW::NimBLE::bleprph_host_task(void*) at /Users/simonbrix/development/projects/uni/Bachelor/mss-320/components/NimBLEDevice/Device.cpp:19
0x40380ee9: vPortTaskWrapper at /Users/simonbrix/esp/v5.2.1/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134





ELF file SHA256: 621a4a1e8

Rebooting...
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0xc (RTC_SW_CPU_RST),boot:0x9 (SPI_FAST_FLASH_BOOT)
Saved PC:0x40375bac
0x40375bac: esp_restart_noos at /Users/simonbrix/esp/v5.2.1/esp-idf/components/esp_system/port/soc/esp32s3/system_internal.c:159

SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x178c
load:0x403c9700,len:0x4
load:0x403c9704,len:0xcbc
load:0x403cc700,len:0x2da0
entry 0x403c9914

@rahult-github
Copy link
Collaborator

Hi @SimonBrix ,

We have worked on fixing multiple cases to avoid multiple connection / abrupt disconnection which lead to such behaviours ( BLE_GAP_ENC_CHANGE with reason 7 implies no connection exists ) .

I tried your application on S3 for latest 5.2 release branch and i don't observe any failure for over 100 runs. Suggest to check with latest branch code.

success_log.txt

Closing this issue currently, but feel free to reopen in case of any updates / queries.

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: Opened Issue is new labels Mar 4, 2025
@AxelLin
Copy link
Contributor

AxelLin commented Mar 5, 2025

Hi @SimonBrix ,

We have worked on fixing multiple cases to avoid multiple connection / abrupt disconnection which lead to such behaviours ( BLE_GAP_ENC_CHANGE with reason 7 implies no connection exists ) .

I tried your application on S3 for latest 5.2 release branch and i don't observe any failure for over 100 runs. Suggest to check with latest branch code.

@rahult-github
The 5.2 branch missed several recent nimble fixes because the submodule is not up to date.
Could you update nimble submodule for esp-idf 5.2 brnach?

@rahult-github
Copy link
Collaborator

Hi @AxelLin , update has already happened internally. Should be visible on github once the sync happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

4 participants