01 SWD pins & protocol basics
Serial Wire Debug (SWD) is ARM's two-wire replacement for JTAG on Cortex-M. Everything in this guide rides on the debug connection the probe already uses to halt, single-step and flash the STM32L4R5 (Cortex-M4F); SWO adds one optional trace pin to it.
SWD replaces JTAG's TCK/TMS/TDI/TDO with a bidirectional data line and a clock. It talks to the Debug Access Port (DAP) inside the core, which in turn reaches the Advanced High-performance Bus (AHB) and the CoreSight debug components. SWO is a third, optional pin: a one-way trace output that shares the physical JTDO/TRACESWO ball.
| Signal | Direction | Purpose |
|---|---|---|
| SWDIO | Bidirectional | Serial data (host ↔ DAP), sampled/driven on SWCLK edges |
| SWCLK | Host → target | Debug clock supplied by the probe (typically 1–4 MHz; ST-LINK/V3 and J-Link support higher rates) |
| SWO | Target → host | Single-Wire Output: ITM/DWT trace stream (async UART or Manchester) |
| NRST | Host ↔ target | Optional hardware reset (connect-under-reset, recovery) |
| VTref | Target → host | Reference voltage so the probe matches target I/O levels |
| GND | — | Common ground; keep it short for clean high-speed SWO |
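Every SWD transaction starts with an 8-bit request header that the probe drives onto SWDIO. As an illustration, here is a minimal sketch of how that header is packed, following the SWD packet format in ARM's Debug Interface Architecture (ADIv5). The function name swd_request is our own; real probes do this in their firmware.

```c
#include <stdint.h>

/* Build the 8-bit SWD request header (transmitted LSB first on SWDIO).
 * Bit layout: Start(1) APnDP RnW A[2] A[3] Parity Stop(0) Park(1).
 * ap:   0 = Debug Port register, 1 = Access Port register
 * read: 1 = read, 0 = write
 * addr: register address; only bits [3:2] are encoded (0x0, 0x4, 0x8, 0xC) */
uint8_t swd_request(unsigned ap, unsigned read, unsigned addr)
{
    unsigned apndp  = ap & 1u;
    unsigned rnw    = read & 1u;
    unsigned a2     = (addr >> 2) & 1u;
    unsigned a3     = (addr >> 3) & 1u;
    unsigned parity = (apndp + rnw + a2 + a3) & 1u; /* even parity over the 4 bits */

    return (uint8_t)(1u                 /* bit 0: Start, always 1 */
                   | (apndp  << 1)
                   | (rnw    << 2)
                   | (a2     << 3)
                   | (a3     << 4)
                   | (parity << 5)
                   /* bit 6: Stop, always 0 */
                   | (1u     << 7));    /* bit 7: Park, always 1 */
}
```

For example, swd_request(0, 1, 0x0) yields 0xA5, the DP IDCODE read a probe issues right after the line reset.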
STM32L4R5 debug pin / alternate-function map
On the STM32L4R5 (RM0432, DS12023) the SW/JTAG pins default to alternate function AF0 (the SYS_AF). After reset all five pins (PA13, PA14, PA15, PB3, PB4) belong to the SWJ-DP, so PA13/PA14 are ready for SWD as soon as the probe sends the JTAG-to-SWD switch sequence. PA15/PB3/PB4 serve only JTAG and can be freed for GPIO in an SWD-only design, but keep PB3 in AF0 if you want SWO.
| Pin | Default function | AF | Role for this guide |
|---|---|---|---|
| PA13 | JTMS / SWDIO | AF0 | SWD data — required |
| PA14 | JTCK / SWCLK | AF0 | SWD clock — required |
| PB3 | JTDO / TRACESWO | AF0 | SWO trace output — required for ITM/SWO |
| PA15 | JTDI | AF0 | JTAG only; free as GPIO in SWD mode |
| PB4 | NJTRST | AF0 | JTAG only; free as GPIO in SWD mode |
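Freeing PA15/PB4 while keeping PB3 on TRACESWO comes down to careful read-modify-write of GPIO register fields. A host-testable sketch of that arithmetic, with hypothetical helper names moder_with and afr_with: MODER has 2 bits per pin, AFRL/AFRH have 4 bits per pin. The test value 0xFFFFFEBF is what we take to be the GPIOB_MODER reset value in RM0432 (PB3 and PB4 in AF mode); check it against your reference manual.

```c
#include <stdint.h>

/* Hypothetical helper: return a GPIOx_MODER value with one pin's 2-bit field
 * replaced (00 input, 01 output, 10 alternate function, 11 analog). */
static inline uint32_t moder_with(uint32_t moder, unsigned pin, uint32_t mode)
{
    uint32_t shift = pin * 2u;
    return (moder & ~(3u << shift)) | ((mode & 3u) << shift);
}

/* Hypothetical helper: return a GPIOx_AFRL/AFRH value with one pin's 4-bit
 * AF field replaced. pin_in_reg is 0..7 (for AFRH pass pin - 8). */
static inline uint32_t afr_with(uint32_t afr, unsigned pin_in_reg, uint32_t af)
{
    uint32_t shift = pin_in_reg * 4u;
    return (afr & ~(0xFu << shift)) | ((af & 0xFu) << shift);
}
```

On target this would read GPIOB->MODER = moder_with(GPIOB->MODER, 4, 1); to turn PB4 (NJTRST) into an output while PB3 keeps its AF0 (TRACESWO) setting.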
On the NUCLEO-L4R5ZI, PA13/PA14 (SWD) and PB3 (SWO) are wired to the on-board ST-LINK, so no external probe or extra jumper wire is needed for SWO. On custom boards, break out SWDIO, SWCLK, GND, VTref, NRST and — if you want SWO — PB3 on the standard 10-pin Cortex Debug connector.
Standard 10-pin Cortex Debug (SWD) connector
| Pin | Signal | Pin | Signal |
|---|---|---|---|
| 1 | VTref | 2 | SWDIO |
| 3 | GND | 4 | SWCLK |
| 5 | GND | 6 | SWO |
| 7 | KEY (n/c) | 8 | NC / TDI |
| 9 | GNDDetect | 10 | NRST |
02 The CoreSight trace path: DEMCR, ITM, DWT, TPIU
SWO printf does not use a USART peripheral. Bytes travel through CoreSight blocks: the ITM (and DWT) generate trace packets, the TPIU serialises them onto the SWO pin, and DEMCR.TRCENA in the Debug Control Block is the master power switch. Know these registers and you can bring up SWO without a wizard on any Cortex-M that implements ITM (M3, M4, M7, M33); Cortex-M0/M0+ have no ITM or SWO.
| Block | Register | Address | Key fields |
|---|---|---|---|
| SCB/DCB | DEMCR | 0xE000EDFC | TRCENA (bit 24) — master enable for DWT/ITM/TPIU |
| ITM | ITM_STIM0..31 | 0xE0000000 + 4·n | Stimulus ports; write byte/half/word to emit trace |
| ITM | ITM_TER | 0xE0000E00 | Trace Enable Register — one bit per stimulus port |
| ITM | ITM_TPR | 0xE0000E40 | Trace Privilege: bit n covers ports 8n..8n+7; 1 = privileged access only, 0 = unprivileged writes allowed |
| ITM | ITM_TCR | 0xE0000E80 | ITMENA(0), TSENA(1), SYNCENA(2), TXENA(3), SWOENA(4), TraceBusID[22:16] |
| ITM | ITM_LAR | 0xE0000FB0 | Lock Access — write 0xC5ACCE55 to unlock ITM regs |
| DWT | DWT_CTRL | 0xE0001000 | Cycle counter, PC sampling, exception trace |
| DWT | DWT_CYCCNT | 0xE0001004 | 32-bit cycle counter (profiling / timestamps) |
| TPIU | TPIU_ACPR | 0xE0040010 | Async Clock Prescaler — sets SWO bit rate divider |
| TPIU | TPIU_SPPR | 0xE00400F0 | Pin protocol: 1 = Manchester, 2 = NRZ (UART) |
| TPIU | TPIU_FFCR | 0xE0040304 | Formatter & Flush Control — turn formatter OFF for single-source SWO |
| DBGMCU | DBGMCU_CR | 0xE0042004 | TRACE_IOEN (bit 5), TRACE_MODE[7:6] — STM32-specific pin muxing |
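To make the ITM rows concrete, here is a small sketch that derives stimulus-port addresses and an ITM_TCR value from the bit positions in the table. The macro and function names are our own (CMSIS spells them ITM_TCR_ITMENA_Msk and so on); the flag combination in the check matches what SWO_Init() in section 03 writes.

```c
#include <stdint.h>

/* ITM_TCR field positions from the table (ARMv7-M); names are hypothetical. */
#define TCR_ITMENA     (1u << 0)
#define TCR_TSENA      (1u << 1)
#define TCR_SYNCENA    (1u << 2)
#define TCR_TXENA      (1u << 3)
#define TCR_SWOENA     (1u << 4)
#define TCR_BUSID_POS  16u
#define TCR_BUSID_MASK (0x7Fu << TCR_BUSID_POS)  /* TraceBusID[22:16] */

#define ITM_STIM_BASE  0xE0000000u

/* Address of stimulus port n (0..31): 0xE0000000 + 4*n */
static inline uint32_t itm_stim_addr(unsigned n)
{
    return ITM_STIM_BASE + 4u * n;
}

/* Compose an ITM_TCR value from enable flags and a 7-bit ATB trace bus ID;
 * out-of-range ID bits are masked off rather than spilling into BUSY. */
static inline uint32_t itm_tcr(uint32_t flags, uint32_t bus_id)
{
    return (flags & (TCR_ITMENA | TCR_TSENA | TCR_SYNCENA | TCR_TXENA | TCR_SWOENA))
         | ((bus_id << TCR_BUSID_POS) & TCR_BUSID_MASK);
}
```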
DBGMCU_CR.TRACE_MODE (STM32L4R5-specific)
ARM defines TPIU; ST adds a small mux on top via DBGMCU_CR. You must set TRACE_IOEN to route the trace signals to package pins, and pick a TRACE_MODE. For SWO you want asynchronous mode.
| TRACE_MODE[1:0] | Mode | Pins used |
|---|---|---|
| 0b00 | Asynchronous (SWO) | TRACESWO only (PB3) — this is what you want |
| 0b01 | Synchronous, 1-bit | TRACECK + TRACED0 |
| 0b10 | Synchronous, 2-bit | TRACECK + TRACED0..1 |
| 0b11 | Synchronous, 4-bit | TRACECK + TRACED0..3 |
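Because DBGMCU_CR also holds the DBG_SLEEP/DBG_STOP/DBG_STANDBY bits, switching the trace mode should be a read-modify-write. A sketch of the value computation, using hypothetical names and the bit positions from the register table (TRACE_IOEN = bit 5, TRACE_MODE = bits [7:6]):

```c
#include <stdint.h>

/* DBGMCU_CR trace fields (STM32L4R5); names are hypothetical. */
#define DBG_TRACE_IOEN      (1u << 5)
#define DBG_TRACE_MODE_POS  6u
#define DBG_TRACE_MODE_MASK (3u << DBG_TRACE_MODE_POS)

/* Return a DBGMCU_CR value with trace IO enabled and the requested
 * TRACE_MODE (0 = async SWO, 1/2/3 = synchronous 1/2/4-bit),
 * leaving all other bits (e.g. DBG_SLEEP/STOP/STANDBY) untouched. */
static inline uint32_t dbgmcu_cr_trace(uint32_t cr, uint32_t mode)
{
    cr &= ~DBG_TRACE_MODE_MASK;
    cr |= (mode << DBG_TRACE_MODE_POS) & DBG_TRACE_MODE_MASK;
    return cr | DBG_TRACE_IOEN;
}
```

On target, DBGMCU->CR = dbgmcu_cr_trace(DBGMCU->CR, 0); has the same effect as the two CMSIS lines in SWO_Init(), folded into one write.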
The SWO baud-rate formula
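In async NRZ mode the SWO line rate is the trace clock divided by (prescaler + 1). On STM32 the TPIU trace clock (TRACECLKIN) is HCLK, so:
$$\mathrm{SWO\_baud} = \frac{\mathrm{TRACECLK}}{\mathrm{TPIU\_ACPR} + 1} \quad\Longrightarrow\quad \mathrm{TPIU\_ACPR} = \frac{\mathrm{HCLK}}{\mathrm{SWO\_baud}} - 1$$
Example with HCLK = 80 MHz and a target SWO rate of 2 MHz (the L4R5 boots from the 4 MHz MSI, so 80 MHz assumes your clock setup has already run):
$$\mathrm{TPIU\_ACPR} = \frac{80\,000\,000}{2\,000\,000} - 1 = 39$$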
The host decoder must be told the same TRACECLK you actually run. If firmware boosts the PLL from 80 MHz to 120 MHz after SWO is set up (or the host assumes 72 MHz), every character is garbled. TRACECLK == current HCLK == what you pass to OpenOCD/st-trace. No exceptions.
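Because ACPR is an integer, HCLK should be a whole multiple of the SWO rate, or host and target disagree on the bit time. A sketch of a checked calculation; swo_acpr is a hypothetical helper, and it assumes the 13-bit PRESCALER field of the Cortex-M4 TPIU (TPI_ACPR_PRESCALER_Msk = 0x1FFF in CMSIS core_cm4.h).

```c
#include <stdint.h>

/* Compute TPIU_ACPR for async (NRZ) SWO from SWO_baud = TRACECLK / (ACPR + 1).
 * Returns 0 and writes *acpr_out on success; returns -1 if the rate is zero,
 * faster than TRACECLK, not an exact divisor of TRACECLK, or needs a
 * prescaler wider than the 13-bit ACPR field. */
int swo_acpr(uint32_t traceclk_hz, uint32_t swo_baud, uint32_t *acpr_out)
{
    if (swo_baud == 0u || swo_baud > traceclk_hz) return -1;
    if (traceclk_hz % swo_baud != 0u) return -1;   /* inexact: bit-time mismatch */
    uint32_t acpr = traceclk_hz / swo_baud - 1u;
    if (acpr > 0x1FFFu) return -1;                 /* PRESCALER is bits [12:0] */
    *acpr_out = acpr;
    return 0;
}
```

Rejecting inexact divisors is deliberately strict; an NRZ receiver tolerates a small rate error, so you could instead accept the nearest prescaler within a percent or two.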
03 SWO/ITM printf on the STM32L4R5 (register-level + HAL)
Two jobs: (1) mux PB3 to TRACESWO and program the TPIU/ITM once at boot, (2) push characters into a stimulus port. Below is a complete, compilable CMSIS bring-up with no HAL dependency, plus the HAL/GPIO variant and a newlib printf retarget.
/* SWO printf for STM32L4R5 (Cortex-M4F). No HAL required.
* Emits ITM stimulus-port-0 bytes on PB3 / TRACESWO, async NRZ (UART) encoding.
* Call SWO_Init(SystemCoreClock, 2000000) once after the clock tree is final. */
#include <stdint.h>
#include "stm32l4xx.h" /* CMSIS device header: defines ITM, TPI, DWT, CoreDebug, DBGMCU, GPIOB, RCC */
void SWO_Init(uint32_t hclk_hz, uint32_t swo_baud)
{
/* --- 1. Route PB3 to TRACESWO (AF0), very-high speed --------------- */
RCC->AHB2ENR |= RCC_AHB2ENR_GPIOBEN; /* clock GPIOB */
GPIOB->MODER = (GPIOB->MODER & ~(3u << (3*2))) | (2u << (3*2)); /* PB3 = AF mode */
GPIOB->OSPEEDR |= (3u << (3*2)); /* very-high output speed */
GPIOB->AFR[0] = (GPIOB->AFR[0] & ~(0xFu << (3*4))) | (0u << (3*4)); /* AF0 = TRACESWO */
/* --- 2. STM32 debug mux: async trace, drive the trace IO --------- */
DBGMCU->CR &= ~DBGMCU_CR_TRACE_MODE; /* TRACE_MODE = 00 -> async SWO */
DBGMCU->CR |= DBGMCU_CR_TRACE_IOEN; /* enable trace pin(s) */
/* --- 3. Master trace enable ------------------------------------- */
CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; /* DEMCR bit 24 */
/* --- 4. TPIU: NRZ encoding, prescaler, formatter OFF ------------ */
TPI->ACPR = (hclk_hz / swo_baud) - 1u; /* SWO_baud = HCLK/(ACPR+1) */
TPI->SPPR = 2u; /* 2 = NRZ/UART, 1 = Manchester */
TPI->FFCR = 0x100u; /* formatter off (single ITM source) */
/* --- 5. ITM: unlock, enable, open stimulus port 0 --------------- */
ITM->LAR = 0xC5ACCE55u; /* unlock ITM registers */
ITM->TCR = ITM_TCR_ITMENA_Msk /* enable ITM */
| ITM_TCR_SYNCENA_Msk /* sync packets */
| ITM_TCR_TSENA_Msk /* local timestamps */
| (1u << ITM_TCR_TraceBusID_Pos); /* ATB ID = 1 */
ITM->TPR = 0x00000000u; /* PRIVMASK = 0: ports 0-7 writable from unprivileged code too */
ITM->TER = 0x00000001u; /* enable stimulus port 0 */
}
/* Blocking single-char write. Spins only while the stimulus-port FIFO is full;
* the TPIU drains it onto the pin at the SWO bit rate whether or not a probe is
* listening, so the wait is short and bounded. Bytes nobody captures are lost. */
void SWO_PutChar(uint8_t c)
{
if ((ITM->TCR & ITM_TCR_ITMENA_Msk) == 0u) return; /* ITM off */
if ((ITM->TER & 1u) == 0u) return; /* port 0 disabled */
while (ITM->PORT[0].u32 == 0u) { /* wait for FIFO ready */ }
ITM->PORT[0].u8 = c; /* 8-bit write = 1 byte packet */
}
CMSIS core_cm4.h already ships uint32_t ITM_SendChar(uint32_t ch), which does exactly the "wait-for-ready then write PORT[0].u8" dance and returns without writing if ITM or port 0 is disabled. You still need steps 1–5 above (CMSIS does not set up the trace pin, DBGMCU or the TPIU prescaler for you). Use ITM_SendChar for the byte loop and keep your own SWO_Init.
HAL/CMSIS-mixed variant (STM32Cube)
If you generate the project with STM32CubeMX/HAL, do the pin mux with HAL and the CoreSight setup with CMSIS registers. STM32CubeMX will not configure ITM/TPIU for you — you must add this.
#include "main.h"
void SWO_Init_HAL(uint32_t hclk_hz, uint32_t swo_baud)
{
__HAL_RCC_GPIOB_CLK_ENABLE();
GPIO_InitTypeDef g = {0};
g.Pin = GPIO_PIN_3;
g.Mode = GPIO_MODE_AF_PP;
g.Pull = GPIO_NOPULL;
g.Speed = GPIO_SPEED_FREQ_VERY_HIGH;
g.Alternate = GPIO_AF0_TRACE; /* PB3 -> TRACESWO */
HAL_GPIO_Init(GPIOB, &g);
DBGMCU->CR &= ~DBGMCU_CR_TRACE_MODE;
DBGMCU->CR |= DBGMCU_CR_TRACE_IOEN;
CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
TPI->ACPR = (hclk_hz / swo_baud) - 1u;
TPI->SPPR = 2u;
TPI->FFCR = 0x100u;
ITM->LAR = 0xC5ACCE55u;
ITM->TCR = ITM_TCR_ITMENA_Msk | ITM_TCR_SYNCENA_Msk |
ITM_TCR_TSENA_Msk | (1u << ITM_TCR_TraceBusID_Pos);
ITM->TPR = 0u; /* ports 0-7: unprivileged access allowed */
ITM->TER = 1u;
}
Retarget printf to SWO (newlib)
GCC/newlib calls _write() under the hood for printf/puts. Point it at the ITM and every standard-library print goes out SWO.
#include <unistd.h>
#include "stm32l4xx.h"
/* newlib syscall stub; fd is ignored, so every stream (stdout = 1, stderr = 2, ...) goes to SWO */
int _write(int fd, char *buf, int len)
{
(void)fd;
for (int i = 0; i < len; i++)
ITM_SendChar((uint8_t)buf[i]); /* CMSIS helper */
return len;
}
/* Usage in main():
* SWO_Init(SystemCoreClock, 2000000);
* setvbuf(stdout, NULL, _IONBF, 0); // no buffering -> lines appear live
* printf("boot ok, HCLK=%lu\r\n", SystemCoreClock);
*/
04 Reading SWO on the host: OpenOCD, itmdump, orbuculum, st-trace
The MCU is now spitting ITM packets onto PB3; the probe captures them and the host must decode the framing back into your bytes. Three battle-tested paths: OpenOCD + itmdump, the orbuculum suite, and ST's own st-trace / GDB server.
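Decoding that framing is what itmdump and orbcat do. Below is a simplified sketch of the idea, written from the ITM packet format in the ARMv7-M Architecture Reference Manual. itm_extract is a hypothetical name; it assumes the TPIU formatter is off (raw ITM packets, FFCR = 0x100) and skips DWT packet contents and error recovery entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the payload of software (stimulus) packets on `port` into `out`,
 * skipping sync, overflow, timestamp, extension and DWT packets.
 * Returns the number of bytes written to `out`. */
size_t itm_extract(const uint8_t *in, size_t n, unsigned port,
                   uint8_t *out, size_t out_cap)
{
    size_t i = 0, w = 0;
    bool in_sync = false;

    while (i < n) {
        uint8_t h = in[i++];

        if (h == 0x00u) { in_sync = true; continue; }             /* sync: zeros... */
        if (in_sync && h == 0x80u) { in_sync = false; continue; } /* ...then 0x80 */
        in_sync = false;

        if ((h & 0x03u) != 0u) {
            /* Source packet: bits[1:0] = size code (1, 2 or 4 payload bytes),
             * bit 2 = 0 software (ITM) / 1 hardware (DWT), bits[7:3] = port. */
            size_t len  = ((h & 0x03u) == 3u) ? 4u : (size_t)(h & 0x03u);
            bool is_sw  = (h & 0x04u) == 0u;
            unsigned id = h >> 3;
            for (size_t k = 0; k < len && i < n; k++, i++) {
                if (is_sw && id == port && w < out_cap) out[w++] = in[i];
            }
        } else if (h & 0x80u) {
            /* Protocol packet with continuation (long timestamps, extension):
             * skip payload bytes up to and including one with bit 7 clear. */
            while (i < n && (in[i++] & 0x80u)) { }
        }
        /* else: single-byte protocol packet (overflow 0x70, short timestamp) */
    }
    return w;
}
```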
Path A — OpenOCD (0.12) tpiu/itm + itmdump
Modern OpenOCD models the trace port as a tpiu object. The stock target/stm32l4x.cfg already creates one named stm32l4x.tpiu. You configure it (protocol, TRACECLK, pin rate, output sink), enable it, then open ITM ports.
# Run: openocd -f interface/stlink.cfg -f target/stm32l4x.cfg -f openocd-swo.cfg
init
# protocol uart = async NRZ; traceclk MUST equal the running HCLK (80 MHz here)
# pin-freq = SWO line rate you programmed into TPIU_ACPR (2 MHz here)
# -output can be: filename | :port (TCP) | - (Tcl) | external
stm32l4x.tpiu configure -protocol uart -traceclk 80000000 \
-pin-freq 2000000 -output /tmp/swo.fifo -formatter 0
stm32l4x.tpiu enable
itm ports off # mask all first
itm port 0 on # enable only stimulus 0
reset run
# itmdump from the Rust 'itm' crate: cargo install itm --features cli
# -f/--file: file OR named pipe -F/--follow: tail -f -s/--stimulus: port (default 0)
mkfifo /tmp/swo.fifo # create the pipe BEFORE starting OpenOCD
itmdump -f /tmp/swo.fifo -F -s 0 # prints your printf() text live
# Windows / no-FIFO alternative: let OpenOCD write a real file, then tail it
# stm32l4x.tpiu configure ... -output swo.log ...
itmdump -f swo.log -F
Older scripts use the single-line form instead of the tpiu object: tpiu config internal /tmp/swo.fifo uart off 80000000 2000000, followed by itm port 0 on. The effect is the same: internal = OpenOCD captures the trace stream through the adapter and writes it to the file (itmdump still does the decoding), off = no formatter, and the two trailing numbers are TRACECLK and the SWO rate.
Path B — orbuculum suite (orbcat / orbtop)
The orbuculum project is the Swiss-army knife for SWO/SWV. orbuculum is the mux daemon; orbcat prints ITM channel text; orbtop is a live PC-sampling profiler; orbdump saves raw trace. It shines with a SEGGER J-Link (which exports a raw SWO TCP port on 2332) or an ORBTrace Mini.
# J-Link exports SWO on TCP 2332 (start JLinkGDBServer or JLinkExe first).
# orbcat: -s server:port -c CHANNEL,FORMAT (%c = raw char)
orbcat -s localhost:2332 -c 0,"%c"
# Statistical PC profiler over SWO — needs the ELF for symbol names
orbtop -s localhost:2332 -e build/firmware.elf
# Or run the mux once and attach many clients to it:
orbuculum -s localhost:2332 # daemon; then orbcat/orbtop connect to it
Path C — ST-LINK native tools (st-trace, GDB server)
The open-source stlink-tools ship st-trace, a zero-config ITM reader for on-board ST-LINKs. STM32CubeIDE's "SWV ITM Data Console" does the same through the ST-LINK GDB server. Both need the correct core clock.
# The clock value MUST equal the running HCLK. Check exact flag with: st-trace --help
st-trace --clock=80000000 # reads stimulus port 0, prints to stdout
# ST-LINK GDB server route: pipe its raw SWO TCP port to itmdump (3344 is a placeholder; use the SWO port your server is configured with)
nc localhost 3344 | itmdump -f /dev/stdin -F
05 SEGGER RTT: concept & target setup
Real-Time Transfer (RTT) needs no trace pin at all. The target keeps ring buffers in RAM; the debugger reads/writes them over ordinary SWD memory accesses while the CPU runs. It is bidirectional, fast (hundreds of kB/s to MB/s), and works on any probe that can do background memory reads.
How it works
A control block — a C struct starting with the ASCII marker "SEGGER RTT" — lives in RAM. It describes N "up" buffers (target → host) and M "down" buffers (host → target), each a lock-free ring buffer with write/read indices. The host scans target RAM for the marker string, then polls the buffers.
| Element | Meaning |
|---|---|
| ID = "SEGGER RTT" | 16-byte marker the host searches RAM for |
| Up buffer 0 | Default terminal: target → host (your SEGGER_RTT_printf) |
| Down buffer 0 | Default terminal: host → target (keyboard input to firmware) |
| WrOff / RdOff | Ring indices; target writes WrOff, host advances RdOff (and vice-versa) |
| Flags/mode | SKIP (drop on full, real-time safe) · TRIM · BLOCK_IF_FIFO_FULL |
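The ring-buffer rules in the table are easy to model on a host. The sketch below is our own simplified model of one up-buffer, not SEGGER's code: the target only ever writes WrOff, the host only RdOff, one slot stays empty to tell full from empty, and SKIP and TRIM differ only in what happens when a message does not fit. On real hardware a memory barrier (e.g. CMSIS __DMB()) would be needed before publishing WrOff.

```c
/* Hypothetical model of one RTT up-buffer (target -> host). */
typedef struct {
    char              *buf;
    unsigned           size;    /* SizeOfBuffer; one slot always stays empty */
    volatile unsigned  wr_off;  /* written only by the target */
    volatile unsigned  rd_off;  /* written only by the host   */
} rtt_ring;

enum { MODE_SKIP, MODE_TRIM };

static unsigned rtt_free(const rtt_ring *r)
{
    unsigned rd = r->rd_off, wr = r->wr_off;
    return (rd <= wr) ? r->size - 1u - wr + rd : rd - wr - 1u;
}

/* Target side: returns bytes stored. SKIP drops the whole message if it
 * does not fit; TRIM stores as much as fits. Neither mode ever waits. */
unsigned rtt_write(rtt_ring *r, const char *data, unsigned len, int mode)
{
    unsigned avail = rtt_free(r);
    if (len > avail) {
        if (mode == MODE_SKIP) return 0u;
        len = avail;
    }
    unsigned wr = r->wr_off;
    for (unsigned i = 0; i < len; i++) {
        r->buf[wr] = data[i];
        wr = (wr + 1u == r->size) ? 0u : wr + 1u;
    }
    r->wr_off = wr;   /* publish only after the data is in place */
    return len;
}

/* Host side (what the probe does over SWD): drain everything available. */
unsigned rtt_read(rtt_ring *r, char *out, unsigned cap)
{
    unsigned rd = r->rd_off, wr = r->wr_off, n = 0;
    while (rd != wr && n < cap) {
        out[n++] = r->buf[rd];
        rd = (rd + 1u == r->size) ? 0u : rd + 1u;
    }
    r->rd_off = rd;
    return n;
}
```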
Target setup (C)
Grab SEGGER_RTT.c, SEGGER_RTT_printf.c, SEGGER_RTT.h and SEGGER_RTT_Conf.h from SEGGER (they ship with the J-Link Software and Documentation Pack and are redistributable), add them to your build, and call the API; SEGGER_RTT_printf() lives in SEGGER_RTT_printf.c. No linker changes are needed for the default single control block, but placing it at a fixed RAM address (below) makes host attach instant.
#include "SEGGER_RTT.h"
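#include "stm32l4xx_hal.h" /* includes stm32l4xx.h; needed for HAL_Init() and HAL_Delay() */
int main(void)
{
/* No trace pin, no ITM: RTT rides the SWD link the probe already owns */
HAL_Init(); /* starts the 1 ms SysTick tick that HAL_Delay() below depends on */
SEGGER_RTT_Init();
/* Up-channel 0 starts in SEGGER_RTT_MODE_DEFAULT from SEGGER_RTT_Conf.h
* (NO_BLOCK_SKIP in the stock file); set it explicitly so logging never
* stalls the CPU when no debugger is draining the buffer. */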
SEGGER_RTT_ConfigUpBuffer(0, "Terminal", NULL, 0,
SEGGER_RTT_MODE_NO_BLOCK_SKIP);
uint32_t n = 0;
while (1) {
SEGGER_RTT_printf(0, "tick %u HCLK=%u\r\n",
(unsigned)n++, (unsigned)SystemCoreClock);
/* Optional: read a byte the host typed into down-channel 0 */
if (SEGGER_RTT_HasKey()) {
int key = SEGGER_RTT_GetKey();
SEGGER_RTT_printf(0, "got key: %c\r\n", key);
}
HAL_Delay(250);
}
}
By default the host searches a RAM window for the "SEGGER RTT" marker, which can be slow or ambiguous. Force a known address by placing _SEGGER_RTT in a dedicated linker section, then hand that exact address to the host tool (OpenOCD rtt setup, probe-rs, JLinkRTTViewer). The STM32L4R5 has SRAM at 0x20000000, so a control block near the start of RAM is a safe target for a bounded search.
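The default search is a linear scan, so its cost grows with the window and a stray copy of the marker could match first. A host-side sketch of such a scan over a RAM snapshot, assuming the control block is word aligned (it contains int fields); rtt_find_cb is a hypothetical helper, not OpenOCD's or SEGGER's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* `ram` holds `len` bytes read from target address `base`. Returns 1 and
 * sets *addr to the first word-aligned match of "SEGGER RTT" plus its NUL
 * terminator, or returns 0 if the window does not contain the marker. */
int rtt_find_cb(const uint8_t *ram, size_t len, uint32_t base, uint32_t *addr)
{
    static const char id[] = "SEGGER RTT";   /* sizeof id == 11, incl. NUL */
    for (size_t off = 0; off + sizeof id <= len; off += 4u) {
        if (memcmp(ram + off, id, sizeof id) == 0) {
            *addr = base + (uint32_t)off;
            return 1;
        }
    }
    return 0;
}
```

Pinning _SEGGER_RTT removes the scan altogether: the host reads the control block directly at the address you give it.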
06 Reading RTT: OpenOCD, probe-rs, JLinkRTTClient
Same firmware, three host front-ends. OpenOCD polls the control block and re-exports each channel as a TCP socket; probe-rs has a built-in RTT terminal (and defmt decoder); the SEGGER stack gives you JLinkRTTClient/Viewer.
OpenOCD RTT
OpenOCD locates the control block (either at an address you give or by searching a RAM range for the "SEGGER RTT" ID), starts polling, and exposes channel 0 as a raw TCP server you connect to with nc/telnet.
# openocd -f interface/stlink.cfg -f target/stm32l4x.cfg -f openocd-rtt.cfg
init
reset run
# Search 0x20000000..+2048 for a control block named "SEGGER RTT".
# If you pinned the address, pass that exact value as the first arg.
rtt setup 0x20000000 2048 "SEGGER RTT"
rtt start
# Re-export up/down channel 0 as a TCP server on port 9090
rtt server start 9090 0
# Optional tuning
rtt polling_interval 20 # ms between buffer polls (default 100)
rtt channels # list discovered channels
# Bidirectional: your terminal sees up-channel 0, typing goes to down-channel 0
nc localhost 9090
# Or drive it from a GDB/Tcl session: monitor rtt start / monitor rtt channels
probe-rs / cargo-embed (Rust, but works for any ELF)
probe-rs has first-class RTT. probe-rs run flashes and opens an RTT terminal; probe-rs attach connects without resetting/flashing. cargo embed reads an Embed.toml and opens an RTT UI. It also decodes defmt if the firmware uses defmt-rtt.
# Flash + run + live RTT terminal (chip name from `probe-rs chip list`)
probe-rs run --chip STM32L4R5ZITx target/thumbv7em-none-eabihf/release/app
# Attach to an already-running target WITHOUT reset/flash (preserve state)
probe-rs attach --chip STM32L4R5ZITx build/firmware.elf
# Run with: cargo embed --release (uses the [default] profile)
[default.general]
chip = "STM32L4R5ZITx"
[default.rtt]
enabled = true # open the RTT terminal UI
[default.gdb]
enabled = false # set true to also start a GDB server
SEGGER stack (J-Link only)
With a J-Link, RTT is native and fastest. Start any J-Link server (JLinkExe, JLinkGDBServer, or JLinkRTTViewer) — it auto-detects the control block. JLinkExe opens an RTT telnet server on port 19021; JLinkRTTClient connects to it.
# Terminal 1: open a J-Link session (SWD, our target). RTT auto-starts.
JLinkExe -device STM32L4R5ZI -if SWD -speed 4000 -autoconnect 1
# Terminal 2: text client to the RTT telnet server on 127.0.0.1:19021
JLinkRTTClient
# GUI alternative (multi-channel, logging to file):
JLinkRTTViewer
07 SWO vs RTT vs semihosting — when to use which
All three get text off the chip through the probe, but they trade differently between speed, pin cost, and how badly they disturb real-time behaviour. Pick deliberately.
| Aspect | SWO / ITM | SEGGER RTT | Semihosting |
|---|---|---|---|
| Direction | Target → host (out only) | Bidirectional | Bidirectional (full stdio, file I/O) |
| Extra pin | Needs SWO (PB3) | None — SWD only | None — SWD only |
| Throughput | Good (~SWO baud, ~2–6 Mbit/s) | Very high (RAM ring over SWD) | Very low |
| CPU intrusion | Low (few cycles/byte, HW FIFO) | Low (memcpy to RAM ring) | Severe — CPU HALTS on every call |
| Real-time safe | Yes | Yes (NO_BLOCK_SKIP mode) | No — breaks timing hard |
| Needs debugger attached | Yes (else bytes dropped) | Yes (else buffer fills/skips) | Yes (without a debugger, BKPT escalates to HardFault) |
| Probe support | Any with SWO capture | J-Link native; OpenOCD/probe-rs poll | Any (OpenOCD/J-Link/pyOCD) |
| Target code | ~30 lines of ITM/TPIU init | SEGGER_RTT.c (drop-in) | newlib rdimon (--specs=rdimon.specs, initialise_monitor_handles()) + monitor arm semihosting enable |
| Bonus data | DWT PC-sampling, exception & event trace, HW timestamps | Multiple channels, host→target input | Real host file/console access |
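The "~SWO baud" in the throughput row hides framing overhead. A back-of-envelope sketch, assuming NRZ (UART 8N1, so 10 bit times per wire byte) and one ITM header byte per stimulus write, and ignoring sync and timestamp packets; the helper name is hypothetical. At 2 Mbit/s, 8-bit ITM_SendChar writes give 100 kB/s of text, and 32-bit writes raise that to 160 kB/s.

```c
#include <stdint.h>

/* Payload bytes per second over SWO NRZ for ITM writes of `size` bytes
 * (1, 2 or 4): each write costs (1 header + size) wire bytes x 10 bits. */
uint32_t swo_payload_bytes_per_s(uint32_t swo_baud, uint32_t size)
{
    uint64_t bits_per_write = 10u * (1u + (uint64_t)size);
    return (uint32_t)((uint64_t)swo_baud * size / bits_per_write);
}
```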
Use semihosting only when you really need host file or console access (e.g. fopen, reading test vectors from a PC file) on a halted/bench target. Never use it in timing-critical or field code: each printf stops the core at a BKPT.
08 Gotchas & common mistakes
Ninety percent of "SWO shows nothing" and "RTT: no control block" tickets are one of these.
- Forgetting DEMCR.TRCENA (bit 24). Without the master trace enable, ITM/DWT/TPIU are dead and every register write silently no-ops.
- Leaving the TPIU formatter on (turn it off with -formatter 0 / TPI->FFCR = 0x100). With it on, itmdump sees framed data it can't parse.
- An optimised (-O2) build stripped or moved _SEGGER_RTT. Pin the control block to a known address (linker section) and pass that exact address to rtt setup / probe-rs / JLinkRTTViewer.
- Using a blocking RTT mode, which stalls the CPU whenever no host drains the buffer. Use SEGGER_RTT_MODE_NO_BLOCK_SKIP for field/RT code.
- Connecting over JTAG (instead of hla_swd/swd): PB3 is then JTDO, not TRACESWO. Use SWD transport for SWO.
- Mixing defmt and raw rtt-target channels in one binary. Pick one logging framework.
In this guide
- SWD = SWDIO+SWCLK; SWO is a third one-way trace pin on PB3/TRACESWO (AF0) of the STM32L4R5.
- Trace path: DEMCR.TRCENA → ITM (stimulus ports) → TPIU (ACPR/SPPR/FFCR) → SWO; ST mux via DBGMCU_CR.
- SWO baud = HCLK/(ACPR+1); the host must be told the same TRACECLK.
- Capture SWO with OpenOCD tpiu configure + itmdump, orbuculum/orbcat/orbtop, or st-trace.
- RTT needs no pin: RAM ring buffers marked "SEGGER RTT", read by OpenOCD rtt, probe-rs, or JLinkRTTClient.
- SWO for profiling/timestamps, RTT for speed + bidirectional, semihosting only on a halted bench target.