01 What semihosting is & why use it
Semihosting is an ARM mechanism that lets code running on the target (your STM32L4R5) ask the host PC — through the debugger — to perform I/O on its behalf. printf, fopen, scanf and friends run on the chip, but the actual reading and writing happen on your workstation.
The target has no display, no filesystem, and — during early bring-up — often no working UART yet. Semihosting borrows the host's. When the target executes a special trap instruction, the debug agent (OpenOCD, pyOCD, J-Link, etc.) halts the core, inspects the requested operation, executes it on the host, writes the result back into target memory/registers, and resumes the core. The application never knows the difference: to it, stdout is just a file handle.
The single biggest reason to reach for it: it needs no peripheral. No USART, no pins, no clocks, no DMA, no level shifter, no USB-serial cable. If SWD works (which it must, or you could not flash the part), semihosting works. That makes it the fastest possible way to get a printf out of a board on day one.
A normal UART printf on a Nucleo-L4R5ZI would tie up USART2 on PA2/PA3 (AF7), plus a clock enable and a baud-rate calc. Semihosting uses none of that — the debug port you already have is the transport. That is exactly why it shines before your board bring-up code exists.
Use it for
- Day-one bring-up: is my clock tree alive? Did main() even reach line 1?
- Boards with no free UART, or where the UART is the thing you are debugging.
- Quick host file access from the target (read a test vector, dump a log to a file).
Do not ship it
- It halts the core on every call (see §08) — never leave it in a real-time or production build.
02 The BKPT syscall mechanism
Semihosting is a software-interrupt calling convention. On a Cortex-M the trap is the breakpoint instruction BKPT 0xAB (encoding 0xBEAB). The immediate 0xAB is what tells the debugger "this halt is a semihosting request, not a user breakpoint."
The convention is deliberately tiny:
- Load the operation number into r0.
- Load a pointer to the parameter block (or, for one-argument calls, the value itself) into r1.
- Execute the trap. The core halts; the debugger reads r0/r1, walks the parameter block in target RAM, and does the work.
- The debugger writes the return value back into r0 and resumes the core past the BKPT.
/* A generic semihosting call for Cortex-M.
 * r0 = operation number, r1 = pointer to arg block.
 * Returns the debugger's result value (also in r0). */
static inline int semihost(int op, void *arg_block)
{
    register int   r0 asm("r0") = op;         // operation -> r0
    register void *r1 asm("r1") = arg_block;  // arg pointer -> r1
    asm volatile(
        "bkpt #0xAB"   // the semihosting trap (0xBEAB)
        : "+r"(r0)     // r0 is read AND written (return value)
        : "r"(r1)      // r1 is an input
        : "memory");   // debugger may touch our RAM
    return r0;
}
Every semihosting service — SYS_WRITE0, SYS_OPEN, SYS_READC — is just this one instruction with a different r0 and a different parameter block layout. newlib's librdimon wraps exactly this behind _write, _read, _open, so that printf ends up here without you writing a line of assembly.
On Cortex-M, BKPT with no debugger attached escalates to a HardFault (the debug event is unhandled). A binary linked with semihosting therefore hangs or faults on its own when you power the board from a wall adapter with no probe running. This single fact drives most of the gotchas in §09.
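Seen from the debugger's side, the round trip in this section can be sketched as a small host-only simulation. This is an illustration under stated assumptions, not how OpenOCD or any real debug agent is implemented: mock_target, read_word and service_trap are hypothetical names, the "target RAM" is a plain byte array, and only SYS_WRITE0 and SYS_WRITE are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SYS_WRITE0 0x04u
#define SYS_WRITE  0x05u

/* Hypothetical stand-in for a halted target: a little RAM plus r0/r1. */
typedef struct {
    uint8_t  ram[256];
    uint32_t r0;   /* operation number in, result out */
    uint32_t r1;   /* address of the string or parameter block */
} mock_target;

/* Read one little-endian 32-bit word from the mock target RAM. */
static uint32_t read_word(const mock_target *t, uint32_t addr)
{
    assert(addr + 4u <= sizeof t->ram);
    return (uint32_t)t->ram[addr]
         | (uint32_t)t->ram[addr + 1] << 8
         | (uint32_t)t->ram[addr + 2] << 16
         | (uint32_t)t->ram[addr + 3] << 24;
}

/* What a debug agent does once it sees the core halted on BKPT 0xAB.
   Console output is appended to `out`, which must be NUL-terminated
   and is assumed large enough for this sketch. */
static void service_trap(mock_target *t, char *out, size_t out_cap)
{
    size_t used = strlen(out);
    assert(used < out_cap);

    switch (t->r0) {
    case SYS_WRITE0:                       /* r1 -> NUL-terminated string */
        assert(t->r1 < sizeof t->ram);
        snprintf(out + used, out_cap - used, "%s",
                 (const char *)&t->ram[t->r1]);
        break;                             /* no defined return value */
    case SYS_WRITE: {                      /* r1 -> { handle, ptr, len } */
        uint32_t ptr = read_word(t, t->r1 + 4u);
        uint32_t len = read_word(t, t->r1 + 8u);
        assert(ptr + len <= sizeof t->ram);
        snprintf(out + used, out_cap - used, "%.*s", (int)len,
                 (const char *)&t->ram[ptr]);
        t->r0 = 0;                         /* 0 bytes NOT written */
        break;
    }
    default:
        t->r0 = UINT32_MAX;                /* unsupported op: report -1 */
        break;
    }
}
```

A real agent does the same three things over SWD: read r0/r1, fetch the block from target memory, then write the result into r0 before resuming.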
03 Operation numbers & trap encodings
The full operation set is defined in the ARM "Semihosting for AArch32 and AArch64" specification (ARM-software/abi-aa). Below are the ones you actually hit; the two that matter for a first printf are SYS_WRITE0 (0x04) and SYS_WRITE (0x05).
| op (r0) | Name | r1 parameter block | Returns (r0) |
|---|---|---|---|
| 0x01 | SYS_OPEN | [ name_ptr, mode, name_len ] | host file handle, or −1 |
| 0x02 | SYS_CLOSE | [ handle ] | 0 on success |
| 0x03 | SYS_WRITEC | pointer to one char | void |
| 0x04 | SYS_WRITE0 | pointer to a NUL-terminated string | void |
| 0x05 | SYS_WRITE | [ handle, data_ptr, len ] | bytes not written (0 = ok) |
| 0x06 | SYS_READ | [ handle, buf_ptr, len ] | bytes not read |
| 0x07 | SYS_READC | none | one char read from console |
| 0x09 | SYS_ISTTY | [ handle ] | 1 if interactive tty |
| 0x0A | SYS_SEEK | [ handle, abs_pos ] | 0 on success |
| 0x0C | SYS_FLEN | [ handle ] | file length, or −1 |
| 0x13 | SYS_ERRNO | none | host errno of last call |
| 0x15 | SYS_GET_CMDLINE | [ buf_ptr, len ] | command line into buf |
| 0x16 | SYS_HEAPINFO | pointer to 4-word block (filled in) | heap/stack limits |
| 0x18 | SYS_EXIT | reason code (0x20026 = ApplicationExit) | does not return |
To open the host console explicitly you SYS_OPEN the special filename ":tt" — mode 0 gives stdin, mode 4 gives stdout, mode 8 gives stderr. This is exactly what newlib's initialise_monitor_handles() does under the hood to wire up handles 0/1/2.
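The mode numbers are indexes into the fopen()-style mode list from the semihosting specification, which is why 0, 4 and 8 land on read, write and append. A minimal sketch of that mapping and of the three-word SYS_OPEN block for ":tt" follows; sh_open_block, sh_mode_string and sh_console_block are hypothetical names, and uintptr_t is used only so the sketch also compiles on a 64-bit host (on the Cortex-M4 each field is one 32-bit word).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SYS_OPEN mode values 0..11 and the ISO C fopen() mode they stand for. */
static const char *const sh_open_modes[12] = {
    "r", "rb", "r+", "r+b",
    "w", "wb", "w+", "w+b",
    "a", "ab", "a+", "a+b",
};

/* SYS_OPEN parameter block: { name_ptr, mode, name_len }. */
typedef struct {
    uintptr_t name;      /* pointer to the file name */
    uintptr_t mode;      /* index into sh_open_modes */
    uintptr_t name_len;  /* strlen(name); the NUL is not counted */
} sh_open_block;

/* Returns the fopen() mode string, or NULL for an out-of-range mode. */
static const char *sh_mode_string(unsigned mode)
{
    return (mode < 12u) ? sh_open_modes[mode] : NULL;
}

/* Build the block that asks the host for a console stream:
   mode 0 -> stdin, mode 4 -> stdout, mode 8 -> stderr. */
static sh_open_block sh_console_block(unsigned mode)
{
    static const char tt[] = ":tt";
    assert(mode < 12u);
    sh_open_block b = { (uintptr_t)tt, mode, strlen(tt) };
    return b;
}
```

On the target, the address of such a block would go into r1 with SYS_OPEN (0x01) in r0.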
Trap instruction per ISA
The immediate is the same idea everywhere, but the instruction differs by architecture. On the STM32L4R5 (M-profile) it is always the BKPT row.
| ISA | Instruction | Encoding | Applies to |
|---|---|---|---|
| A32 (ARM) | SVC #0x123456 | 0xEF123456 | Cortex-A/R in ARM state |
| T32 (Thumb) | SVC #0xAB | 0xDFAB | Cortex-A/R in Thumb state |
| M-profile T32 | BKPT #0xAB | 0xBEAB | Cortex-M0/M3/M4/M7 → STM32L4R5 |
Cortex-M reserves SVC for the RTOS/OS supervisor call and routes it through the normal exception vector — the debugger never sees it. BKPT instead generates a debug event that the halting debug logic catches directly. That is why M-profile semihosting uses BKPT 0xAB and A/R-profile uses SVC.
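The three encodings in the table can be checked mechanically. The sketch below is a host-side illustration only; the function names are hypothetical, and a real debug agent decides based on the halt reason as well as the opcode.

```c
#include <stdbool.h>
#include <stdint.h>

/* M-profile: the 16-bit Thumb BKPT is 0xBE00 | imm8, and imm8 must be 0xAB. */
static bool is_m_profile_semihost(uint16_t insn)
{
    return (insn & 0xFF00u) == 0xBE00u && (insn & 0x00FFu) == 0x00ABu;
}

/* A/R-profile, Thumb state: the 16-bit SVC is 0xDF00 | imm8. */
static bool is_t32_svc_semihost(uint16_t insn)
{
    return (insn & 0xFF00u) == 0xDF00u && (insn & 0x00FFu) == 0x00ABu;
}

/* A/R-profile, ARM state: SVC is cond:1111:imm24. The condition field
   0b1111 is not a valid SVC condition, so it is rejected here. */
static bool is_a32_svc_semihost(uint32_t insn)
{
    return (insn >> 28) != 0xFu
        && (insn & 0x0F000000u) == 0x0F000000u
        && (insn & 0x00FFFFFFu) == 0x00123456u;
}
```

Feeding in 0xBEAB, 0xDFAB and 0xEF123456 from the table returns true for the matching function, while an ordinary breakpoint such as BKPT #0 (0xBE00) does not match.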
04 Toolchain wiring: newlib rdimon
You rarely write the BKPT yourself. The GNU Arm Embedded toolchain ships librdimon (the "remote debug monitor" flavour of libgloss) whose syscall stubs — _write, _read, _open, _close, _lseek, _sbrk, _exit — are implemented on top of semihosting. Point the linker at it with --specs=rdimon.specs and printf just works.
The three specs files you will meet
| Spec | Library | What the syscalls do | When |
|---|---|---|---|
| --specs=nosys.specs | libnosys | Stubs return −1 / do nothing. printf goes nowhere. | No I/O; link cleanly with no host. |
| --specs=rdimon.specs | librdimon | Syscalls become semihosting BKPT traps → host. | Semihosting (this guide). |
| --specs=nano.specs | newlib-nano | Size-optimised libc; orthogonal to where I/O goes. | Shrink code; combine carefully (see §09). |
Link flags
Add the rdimon spec, then link the C library and the rdimon stubs. Wrapping them in a link group resolves the circular references between libc and librdimon:
# --- CPU / FPU: STM32L4R5 is Cortex-M4 with single-precision FPU ---
CPU = -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=hard
# --- Compile ---
CFLAGS = $(CPU) -Og -g3 -Wall -ffunction-sections -fdata-sections
# --- Link: rdimon.specs routes newlib syscalls to semihosting ---
LDFLAGS = $(CPU) --specs=rdimon.specs -Wl,--gc-sections \
-T STM32L4R5ZI_FLASH.ld \
-Wl,--start-group -lc -lrdimon -Wl,--end-group \
-Wl,-Map=firmware.map
--specs=rdimon.specs already selects a startup and libc set. Naively adding --specs=nano.specs as well is a documented way to make printf hang. If you need newlib-nano and semihosting, use --specs=nano.specs --specs=rdimon.specs in that order and test it; when in doubt, use rdimon.specs alone.
initialise_monitor_handles()
Before any printf, the rdimon runtime must open the console (:tt) and set up handles 0/1/2. The function that does this is initialise_monitor_handles(). It is provided by librdimon; you declare it extern and call it once, first thing in main():
#include <stdio.h>

/* Provided by librdimon. No header ships it, so declare it. */
extern void initialise_monitor_handles(void);

int main(void)
{
    initialise_monitor_handles();   // MUST run before the first printf
    printf("STM32L4R5 alive, HCLK bring-up OK\n");
    for (int i = 0; i < 3; i++)
        printf("tick %d\n", i);
    while (1) { }
}
initialise_monitor_handles() itself issues semihosting SYS_OPEN traps. Semihosting must already be enabled on the debug agent (next section) before this call runs. Otherwise the very first BKPT is treated as an ordinary breakpoint: with a probe attached the core simply halts there, and with no probe attached it escalates to a HardFault. Enable in the debugger, then let the core reach main().
05 Enabling in OpenOCD / GDB
The target's traps do nothing until the debug agent is told to service them. In OpenOCD the command is arm semihosting enable; from GDB you issue it as a monitor command. The Nucleo-L4R5ZI's on-board ST-LINK is the SWD probe.
Start the OpenOCD server
# Option A: generic ST-LINK interface + STM32L4 target
openocd -f interface/stlink.cfg -f target/stm32l4x.cfg
# Option B: the ready-made Nucleo-L4 board file (does both)
openocd -f board/st_nucleo_l4.cfg
# OpenOCD now listens: GDB on :3333, telnet on :4444
Drive it from GDB
Connect, halt, enable semihosting, flash, and run. The critical line is monitor arm semihosting enable — issued before continue so it is active by the time initialise_monitor_handles() runs.
target extended-remote localhost:3333
monitor reset halt
monitor arm semihosting enable
load
continue
# printf output now streams to the OpenOCD console window.
arm-none-eabi-gdb firmware.elf -x run.gdb
# or fully inline:
arm-none-eabi-gdb firmware.elf \
-ex "target extended-remote localhost:3333" \
-ex "monitor reset halt" \
-ex "monitor arm semihosting enable" \
-ex "load" \
-ex "continue"
Where the output appears & useful OpenOCD sub-commands
| Command (OpenOCD / monitor …) | Effect |
|---|---|
| arm semihosting enable | Service BKPT 0xAB; output goes to the OpenOCD console. |
| arm semihosting disable | Stop servicing. With the probe still attached, a trap then just halts the core at the BKPT. |
| arm semihosting_fileio enable | Route file I/O through GDB's own fileio channel instead of OpenOCD. |
| arm semihosting_redirect tcp <port> all | Redirect semihosting I/O to a TCP socket (newer OpenOCD); read it with nc localhost <port>. |
| arm semihosting_resexit enable | Make SYS_EXIT resumable: the call returns and the target keeps running instead of being left halted. |
ST's own ST-LINK_gdbserver (used by default in STM32CubeIDE) does not implement semihosting. Use OpenOCD as the GDB server for semihosting, or switch CubeIDE's debug configuration from "ST-LINK (ST-LINK GDB server)" to "ST-LINK (OpenOCD)" and add the monitor arm semihosting enable line under the Startup tab. J-Link and pyOCD also support it.
06 Full build + run example
A complete, copy-pasteable flow: source, Makefile, GDB script, and the exact commands. This assumes the standard STM32CubeMX-generated startup_stm32l4r5xx.s and STM32L4R5ZI_FLASH.ld are present (they provide the vector table, Reset_Handler, and clock init). We change only the link flags and add the two semihosting lines.
#include <stdio.h>
#include <stdint.h>

/* Supplied by librdimon (via --specs=rdimon.specs). */
extern void initialise_monitor_handles(void);

int main(void)
{
    /* SystemInit() / clock config from CubeMX has already run
       out of the reset handler before main(). */
    initialise_monitor_handles();

    printf("=== STM32L4R5 semihosting bring-up ===\n");
    printf("SCB CPUID @ 0xE000ED00 = 0x%08lX\n",
           *(volatile uint32_t *)0xE000ED00);  // should read 0x410FC241 (Cortex-M4 r0p1)

    for (uint32_t i = 0; i < 5; i++) {
        printf("loop %lu\n", i);
        for (volatile uint32_t d = 0; d < 200000; d++) { }  // crude delay
    }

    printf("done.\n");
    while (1) { }
}
PREFIX = arm-none-eabi-
CC = $(PREFIX)gcc
OBJCOPY = $(PREFIX)objcopy
CPU = -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=hard
CFLAGS = $(CPU) -Og -g3 -Wall -ffunction-sections -fdata-sections \
-DSTM32L4R5xx
LDFLAGS = $(CPU) --specs=rdimon.specs -Wl,--gc-sections \
-T STM32L4R5ZI_FLASH.ld \
-Wl,--start-group -lc -lrdimon -Wl,--end-group \
-Wl,-Map=firmware.map
SRCS = main.c system_stm32l4xx.c startup_stm32l4r5xx.s
OBJS = $(addsuffix .o,$(basename $(SRCS)))
all: firmware.elf firmware.bin
%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

%.o: %.s
	$(CC) $(CFLAGS) -c $< -o $@

firmware.elf: $(OBJS)
	$(CC) $(OBJS) $(LDFLAGS) -o $@
	$(PREFIX)size $@

firmware.bin: firmware.elf
	$(OBJCOPY) -O binary $< $@

clean:
	rm -f $(OBJS) firmware.elf firmware.bin firmware.map
# 1) Build
make
# 2) Terminal A — start the GDB server (leave it running)
openocd -f interface/stlink.cfg -f target/stm32l4x.cfg
# 3) Terminal B — connect, enable semihosting, flash, run
arm-none-eabi-gdb firmware.elf -x run.gdb
Info : halted: PC: 0x08000abc
Info : semihosting is enabled
=== STM32L4R5 semihosting bring-up ===
SCB CPUID @ 0xE000ED00 = 0x410FC241
loop 0
loop 1
loop 2
loop 3
loop 4
done.
The 0x410FC241 read confirms this is a Cortex-M4 (r0p1) core — a handy sanity check that both the read and the semihosting print path work.
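How that value breaks down can be shown with a small decoder. This is a sketch that follows the standard Armv7-M CPUID field layout; cpuid_fields and cpuid_decode are hypothetical names, not part of CMSIS.

```c
#include <stdint.h>

/* Fields of the SCB CPUID register at 0xE000ED00. */
typedef struct {
    uint8_t  implementer;   /* bits 31:24, 0x41 = Arm */
    uint8_t  variant;       /* bits 23:20, the N in rNpM */
    uint8_t  architecture;  /* bits 19:16, reads as 0xF on Cortex-M4 */
    uint16_t partno;        /* bits 15:4,  0xC24 = Cortex-M4 */
    uint8_t  revision;      /* bits 3:0,   the M in rNpM */
} cpuid_fields;

/* Split a raw CPUID value into its fields. */
static cpuid_fields cpuid_decode(uint32_t cpuid)
{
    cpuid_fields f;
    f.implementer  = (uint8_t)(cpuid >> 24);
    f.variant      = (uint8_t)((cpuid >> 20) & 0xFu);
    f.architecture = (uint8_t)((cpuid >> 16) & 0xFu);
    f.partno       = (uint16_t)((cpuid >> 4) & 0xFFFu);
    f.revision     = (uint8_t)(cpuid & 0xFu);
    return f;
}
```

For 0x410FC241 this gives implementer 0x41, variant 0, part number 0xC24 and revision 1, which is where "Cortex-M4 r0p1" comes from.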
07 Register-level: direct BKPT, no library
You do not need newlib rdimon at all. Because the whole protocol is one instruction (§02), you can retarget printf with a handful of lines and keep --specs=nano.specs (or even nosys.specs) for a tiny binary. This is the LL/register-level path: it works with any startup, no librdimon, no initialise_monitor_handles().
#include <stdint.h>
#include <sys/stat.h>

/* Operation numbers from the ARM semihosting spec. */
#define SYS_WRITE0 0x04
#define SYS_WRITE  0x05
#define SYS_READC  0x07

/* The one-instruction primitive: r0=op, r1=args, ret in r0. */
static inline int semihost(int op, void *args)
{
    register int   r0 asm("r0") = op;
    register void *r1 asm("r1") = args;
    asm volatile("bkpt #0xAB"
                 : "+r"(r0) : "r"(r1) : "memory");
    return r0;
}

/* SYS_WRITE param block: { handle, data ptr, length }.
   Returns the number of bytes NOT written (0 == success). */
static int sh_write(int handle, const char *buf, int len)
{
    volatile uint32_t block[3] = { (uint32_t)handle,
                                   (uint32_t)buf,
                                   (uint32_t)len };
    return semihost(SYS_WRITE, (void *)block);
}

/* newlib calls _write() for every printf/putchar/fwrite(stdout).
   Handle 1 == stdout by convention. */
int _write(int fd, const char *buf, int len)
{
    (void)fd;                             // treat all fds as the console
    int not_written = sh_write(1, buf, len);
    return len - not_written;             // bytes actually written
}

/* Minimal stubs so the linker is happy with --specs=nano.specs. */
int _read(int fd, char *buf, int len)  { (void)fd; (void)buf; (void)len; return 0; }
int _close(int fd)                     { (void)fd; return -1; }
int _lseek(int fd, int off, int dir)   { (void)fd; (void)off; (void)dir; return 0; }
int _fstat(int fd, struct stat *st)    { (void)fd; st->st_mode = S_IFCHR; return 0; }
int _isatty(int fd)                    { (void)fd; return 1; }
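Link this with --specs=nano.specs (not rdimon) and printf("hi\n") reaches the host through _write → sh_write → BKPT 0xAB. If the link then stops on an undefined _sbrk (newlib's printf may allocate its stdout buffer from the heap), supply one, for example from a CubeMX-generated sysmem.c or by also passing --specs=nosys.specs. For the very cheapest "did I get here?" probe you can skip printf entirely and fire SYS_WRITE0 directly: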
static inline void sh_puts(const char *s)
{
    semihost(SYS_WRITE0, (void *)s);   // r1 -> NUL-terminated string
}

// ...anywhere in your code, even before clocks are configured:
sh_puts("reached HAL_Init\n");
The direct approach still requires monitor arm semihosting enable on the debug agent and a probe attached — the BKPT is identical hardware behaviour. The only thing you saved is the librdimon code size and the initialise_monitor_handles() dance.
08 Pros, cons & when to use
Semihosting trades speed and independence for zero setup. Understanding the cost tells you where it belongs (bring-up) and where it does not (anything timed or unattended).
| Property | Semihosting | UART printf | SWO / ITM (printf via SWV) |
|---|---|---|---|
| Extra pins | None (reuses SWD) | 2 (TX/RX) | 1 (SWO/PB3) |
| Peripheral setup | None | USART + clock + baud | ITM + TPIU + trace clock |
| Halts the core? | Yes, every call | No | No |
| Throughput | Very slow (μs–ms/call) | Baud-limited | Fast (MHz) |
| Runs with no probe? | No — HardFaults | Yes | Runs, output lost |
| Host file access | Yes (open/read/write) | No | No |
| Best for | Day-one bring-up, host files | Field logging, production | High-rate live tracing |
Why it is slow — and why "slow" is the point
Every printf forces a full round trip: the core executes BKPT and stops; the debug probe notices the halt, reads registers over SWD, walks the parameter block out of target RAM, does the host I/O, writes the result back, and resumes the core past the breakpoint. That is tens of microseconds to milliseconds per call, during which your firmware is frozen. Interrupts do not run, timers drift, and communication peripherals overflow.
During bring-up, though, the questions are "Did main() run? Is my PLL locked? What is the value of this register?" These are correctness questions, where stopping the core is irrelevant.
The rule of thumb: semihosting is a bring-up and lab tool. Once the board talks, graduate your logging to a non-halting transport (ITM/SWO or a real UART) and strip the rdimon link flags out of release builds.
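Because the cost is per trap rather than per byte, one way to soften it while still in the lab is to batch output so that each line costs a single SYS_WRITE. The sketch below is an assumption-laden illustration, not part of newlib or librdimon: line_logger and its functions are hypothetical, and the sink is a function pointer so the counting logic can run on a host. On the target the sink would wrap one SYS_WRITE call.

```c
#include <stddef.h>

/* Called once per flush; on the target this would issue one SYS_WRITE. */
typedef void (*sh_sink)(const char *buf, size_t len, void *ctx);

typedef struct {
    char    buf[64];
    size_t  used;
    sh_sink sink;
    void   *ctx;
} line_logger;

/* Hand everything buffered so far to the sink: exactly one trap. */
static void logger_flush(line_logger *lg)
{
    if (lg->used > 0) {
        lg->sink(lg->buf, lg->used, lg->ctx);
        lg->used = 0;
    }
}

/* Buffer one character; flush on newline or when the buffer is full. */
static void logger_putc(line_logger *lg, char c)
{
    lg->buf[lg->used++] = c;
    if (c == '\n' || lg->used == sizeof lg->buf)
        logger_flush(lg);
}

static void logger_puts(line_logger *lg, const char *s)
{
    while (*s)
        logger_putc(lg, *s++);
}

/* Host-side stand-in that only counts how many traps would occur. */
static void count_sink(const char *buf, size_t len, void *ctx)
{
    (void)buf;
    (void)len;
    ++*(unsigned *)ctx;
}
```

With count_sink installed, logging "loop 0\nloop 1\n" registers two flushes, where a character-at-a-time SYS_WRITEC approach would halt the core fourteen times for the same text.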
09 Gotchas / common mistakes
Almost every "semihosting doesn't print / my board hangs" report is one of the following.
| Symptom | Cause | Fix |
|---|---|---|
| Board HardFaults the instant it hits printf, or hangs when powered from USB/wall with no probe. | BKPT 0xAB with no debugger = unhandled debug event = HardFault on Cortex-M. | Never ship rdimon builds. For standalone runs relink with nosys.specs, or guard traps behind a "debugger attached" check (CoreDebug->DHCSR & C_DEBUGEN). |
| Nothing prints; no fault. | Forgot monitor arm semihosting enable, or used the ST-LINK GDB server (no support). | Enable semihosting before continue; use OpenOCD/J-Link/pyOCD, not ST-LINK_gdbserver. |
| First trap faults inside initialise_monitor_handles(). | Semihosting enabled after the core already reached that call. | Order: reset halt → enable semihosting → then run to main(). |
| Output only appears in bursts / at exit. | stdout is block-buffered by newlib. | End lines with \n and/or call setvbuf(stdout, NULL, _IONBF, 0), or fflush(stdout). |
| Link error: undefined reference to initialise_monitor_handles / _write. | -lrdimon missing, or --specs=rdimon.specs not passed. | Use the group form: --specs=rdimon.specs -Wl,--start-group -lc -lrdimon -Wl,--end-group. |
| printf hangs forever after adding nano.specs. | Conflicting nano.specs + rdimon.specs combination. | Use rdimon.specs alone, or the direct-BKPT retarget in §07 with nano.specs. |
| Firmware is fine under debugger but "randomly" freezes in the field. | A stray semihosting printf left in a production build; runs fine only while a probe is attached. | Grep the release map for librdimon / semihost symbols; remove the spec from release configs. |
| Everything works but the loop is unbearably slow. | Expected: each call halts the core (§08). | Move to ITM/SWO or UART for anything high-rate; keep semihosting for bring-up only. |
#include <stdint.h>

/* DHCSR bit0 (C_DEBUGEN) is set only when a debugger has enabled
   halting-debug. Check it before issuing any semihosting trap so the
   same image can run both under the probe and standalone. */
static inline int debugger_attached(void)
{
    return (*(volatile uint32_t *)0xE000EDF0) & 0x1u;  // CoreDebug->DHCSR & C_DEBUGEN
}

void log_msg(const char *s)
{
    if (debugger_attached())
        sh_puts(s);   // safe: a probe is servicing traps
    // else: silently skip (no BKPT, no HardFault)
}
A semihosting build is not self-contained. It only runs correctly while a debugger with semihosting enabled is attached. Treat it as a lab instrument: enable it for bring-up, guard or remove it everywhere else.
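For the "output only appears in bursts" row, the buffering mode can be set once at startup. The helper below is a minimal sketch using only standard C; console_line_buffered is a hypothetical name. It selects line buffering (_IOLBF) as one option, and the table's _IONBF is the other.

```c
#include <stdio.h>

/* Ask the C library to flush stdout at every newline.
   Must be called before the first output on stdout.
   Returns 0 on success, like setvbuf() itself. */
static int console_line_buffered(void)
{
    return setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
}
```

In an rdimon build the natural place for the call is right after initialise_monitor_handles(), before the first printf.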
Checklist to get the first character
- Compile -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=hard.
- Link --specs=rdimon.specs + -lc -lrdimon (in a link group).
- Call initialise_monitor_handles() first thing in main().
- OpenOCD server up; GDB: reset halt → monitor arm semihosting enable → load → continue.
- Print with a trailing \n; watch the OpenOCD console.