TECHNICAL GUIDE · STM32L4R5 · UART · 2026

STM32L4R5 USART / LPUART
Poll · IRQ · DMA

Register-accurate serial for the STM32L4R5 (Cortex-M4F, RM0432): USART1-3, UART4-5 and LPUART1 — BRR baud math with oversampling 16/8, the CR1/CR2/CR3 bit fields, TXE/RXNE/TC flags, and complete polling, interrupt and DMAMUX-driven code in both bare-metal and HAL form.

01 What is on the chip: instances, buses, AFs

The STM32L4R5 exposes six serial blocks. USART1-3 are full USARTs (async + synchronous master, LIN, IrDA, smartcard, RS-485 Driver-Enable). UART4-5 are async-only UARTs. LPUART1 is a Low-Power UART with a fractional /256 divider that can run and wake the MCU from Stop mode. All six share the same register model (CR1/CR2/CR3/BRR/ISR/ICR/RDR/TDR), so once you can drive one you can drive all of them.

Every instance sits on a peripheral bus and has its own clock-enable bit and its own kernel-clock selector in RCC->CCIPR. The alternate-function group is fixed by silicon: USART1/2/3 = AF7, UART4/5 and LPUART1 = AF8.

Instance | Bus  | Clock-enable bit         | CCIPR selector     | AF  | Type
USART1   | APB2 | RCC_APB2ENR_USART1EN     | USART1SEL [1:0]    | AF7 | Full USART
USART2   | APB1 | RCC_APB1ENR1_USART2EN    | USART2SEL [3:2]    | AF7 | Full USART
USART3   | APB1 | RCC_APB1ENR1_USART3EN    | USART3SEL [5:4]    | AF7 | Full USART
UART4    | APB1 | RCC_APB1ENR1_UART4EN     | UART4SEL [7:6]     | AF8 | Async UART
UART5    | APB1 | RCC_APB1ENR1_UART5EN     | UART5SEL [9:8]     | AF8 | Async UART
LPUART1  | APB1 | RCC_APB1ENR2_LPUART1EN   | LPUART1SEL [11:10] | AF8 | Low-power UART

The kernel clock (f_CK) that feeds the baud generator is not the AHB clock — it is whatever the CCIPR selector picks for that instance. The 2-bit code is identical for every instance:

00 = PCLK: APB clock of that bus (PCLK2 for USART1, PCLK1 for the rest). Reset default.
01 = SYSCLK: System clock. Useful when you want a baud rate independent of APB prescalers.
10 = HSI16: 16 MHz internal RC. Lets the UART keep a fixed baud even if you change SYSCLK.
11 = LSE: 32.768 kHz. Mostly for LPUART1 low-power / Stop-mode operation.
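The selector layout in the table above is regular: each instance owns a 2-bit field, two bits above the previous one. A minimal host-side sketch of the read-modify-write follows. The names `ccipr_select()`, `KCLK_*` and `SEL_*` are hypothetical helpers for illustration, not CMSIS symbols.

```c
#include <stdint.h>

/* 2-bit kernel-clock codes, as listed above (illustrative names). */
enum { KCLK_PCLK = 0, KCLK_SYSCLK = 1, KCLK_HSI16 = 2, KCLK_LSE = 3 };

/* LSB of each instance's selector field inside RCC->CCIPR, from the table. */
enum { SEL_USART1 = 0, SEL_USART2 = 2, SEL_USART3 = 4,
       SEL_UART4  = 6, SEL_UART5  = 8, SEL_LPUART1 = 10 };

/* Return the CCIPR value with one 2-bit selector replaced; other fields untouched. */
static uint32_t ccipr_select(uint32_t ccipr, uint32_t field_lsb, uint32_t code)
{
    return (ccipr & ~(3u << field_lsb)) | ((code & 3u) << field_lsb);
}
```

On target this would be used as `RCC->CCIPR = ccipr_select(RCC->CCIPR, SEL_USART2, KCLK_HSI16);` before the USART is enabled.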
FIFO present but off by default

Unlike the older STM32L4 (RM0351), the STM32L4+ USART has an 8-level TX/RX FIFO, enabled by FIFOEN (CR1 bit 29). It is disabled at reset, and this guide leaves it off. Because of the FIFO the CMSIS flag names are the combined forms: USART_ISR_TXE_TXFNF, USART_ISR_RXNE_RXFNE, USART_CR1_TXEIE_TXFNFIE, USART_CR1_RXNEIE_RXFNEIE. The bit positions (7, 5) are unchanged; only the macro spelling differs from classic STM32.

02 Clocks, GPIO and pin alternate functions

Three enables are needed before a byte moves: the GPIO port clock (AHB2), the USART clock (APB1/APB2), and the pins put into Alternate-Function mode with the correct AF number. TX is a push-pull output; RX is an input (a pull-up keeps the idle line high so noise cannot look like a start bit).

Pin / AF map (most-used pins)

Instance | AF  | TX pins              | RX pins              | CK / CTS / RTS_DE
USART1   | AF7 | PA9, PB6             | PA10, PB7            | CK PA8 · CTS PA11 · RTS PA12
USART2   | AF7 | PA2, PD5             | PA3, PD6             | CTS PA0/PD3 · RTS PA1/PD4
USART3   | AF7 | PB10, PC4, PC10, PD8 | PB11, PC5, PC11, PD9 | CK PB12/PD10 · CTS PB13/PD11
UART4    | AF8 | PA0, PC10            | PA1, PC11            | RTS PA15 · CTS PB7
UART5    | AF8 | PC12                 | PD2                  | RTS PB4 · CTS PB5
LPUART1  | AF8 | PB11, PC1, PG7       | PB10, PC0, PG8       | CTS PB13/PG5 · RTS PB1/PB12/PG6

Port G needs VDDIO2

On the NUCLEO-L4R5ZI the ST-LINK Virtual COM Port is wired to LPUART1 on PG7 (TX) / PG8 (RX). Port G[15:2] is powered by the independent VDDIO2 supply, which is off after reset. You must enable the PWR clock and set PWR->CR2 bit IOSV before those pins do anything. Forget it and the VCP stays silent with no error.

A reusable AF-pin helper (used by every example below)

gpio_af.c — bare-metal, CMSIS
#include "stm32l4r5xx.h"   /* CMSIS device header; define STM32L4R5xx in the build */
#include <stdint.h>

/* Put one pin into Alternate-Function mode: MODER=10, very-high speed,
   pull-up, and program the 4-bit AF selector in AFR[0] (pins 0-7) or AFR[1] (8-15). */
static void gpio_af(GPIO_TypeDef *port, uint32_t pin, uint32_t af)
{
    port->MODER   = (port->MODER   & ~(3u  << (pin * 2)))       | (2u << (pin * 2));
    port->OSPEEDR = (port->OSPEEDR |  (3u  << (pin * 2)));               /* very high */
    port->PUPDR   = (port->PUPDR   & ~(3u  << (pin * 2)))       | (1u << (pin * 2)); /* pull-up */
    port->OTYPER &= ~(1u << pin);                                        /* push-pull */
    port->AFR[pin >> 3] = (port->AFR[pin >> 3] & ~(0xFu << ((pin & 7) * 4)))
                        | (af << ((pin & 7) * 4));
}

This one function replaces a pile of per-pin CMSIS macros and works for any port/pin/AF — you will see it reused in every section.
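To see what the helper actually writes, the same bit arithmetic can be run on plain integers. The sketch below is a host-side illustration only; `moder_af()` and `afr_insert()` are hypothetical names that mirror the MODER and AFR lines of `gpio_af()`.

```c
#include <stdint.h>

/* MODER: two bits per pin, 10 = alternate function. */
static uint32_t moder_af(uint32_t moder, uint32_t pin)
{
    return (moder & ~(3u << (pin * 2u))) | (2u << (pin * 2u));
}

/* AFRL/AFRH: four bits per pin; pins 0-7 live in AFR[0], pins 8-15 in AFR[1].
   The caller picks the register with pin >> 3; this computes the new value. */
static uint32_t afr_insert(uint32_t afr, uint32_t pin, uint32_t af)
{
    uint32_t shift = (pin & 7u) * 4u;
    return (afr & ~(0xFu << shift)) | ((af & 0xFu) << shift);
}
```

For PA2 at AF7, `afr_insert(0, 2, 7)` yields 0x700 (AFR[0] bits 11:8), and for PA9 at AF7 the field lands in AFR[1] bits 7:4.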

03 Baud rate: BRR, oversampling 16/8, LPUART /256

The baud generator divides the kernel clock f_CK by the value in BRR. Standard USART/UART uses oversampling by 16 (the reset default, OVER8=0) or by 8 (OVER8=1) for double the top speed. LPUART1 does not have OVER8 — it always uses a fixed /256 fractional divider, so its BRR formula is different.

Oversampling by 16 (OVER8 = 0)

Simplest case: the whole 16-bit BRR holds the rounded divider directly:

OVER16
USARTDIV = f_CK / baud            (rounded to nearest integer)
BRR      = USARTDIV               (all 16 bits used directly)

/* e.g. 80 MHz / 115200 = 694.44 -> BRR = 694 (0x02B6), error +0.06% */
USART2->BRR = (f_CK + baud/2) / baud;   /* rounded integer divide */

Oversampling by 8 (OVER8 = 1)

Doubles the maximum baud (up to f_CK / 8). The mantissa stays in BRR[15:4], but BRR[3] must be kept 0 and the fraction is shifted right by one bit into BRR[2:0]:

OVER8
USARTDIV   = 2 * f_CK / baud                        (rounded)
BRR[15:4]  = USARTDIV[15:4]
BRR[3]     = 0                                      (reserved, keep clear)
BRR[2:0]   = USARTDIV[3:0] >> 1

/* helper */
static uint32_t brr_over8(uint32_t fck, uint32_t baud)
{
    uint32_t div = (2u * fck + baud / 2u) / baud;   /* USARTDIV, rounded */
    return (div & 0xFFF0u) | ((div & 0x000Fu) >> 1);
}
/* set OVER8 in CR1 (only while UE=0), then write BRR */
USART2->CR1 |= USART_CR1_OVER8;
USART2->BRR  = brr_over8(80000000u, 921600u);       /* -> 0x00A7 */
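Because BRR[3] is forced to 0, the lowest bit of USARTDIV is dropped in OVER8 mode. A small host-side sketch can make that visible by decoding a BRR value back into the baud rate it generates. `over8_usartdiv()` and `over8_actual_baud()` are hypothetical helpers; `brr_over8()` is repeated from the listing above so the block stands alone.

```c
#include <stdint.h>

/* Same helper as above: rounded USARTDIV, fraction shifted into BRR[2:0]. */
static uint32_t brr_over8(uint32_t fck, uint32_t baud)
{
    uint32_t div = (2u * fck + baud / 2u) / baud;
    return (div & 0xFFF0u) | ((div & 0x000Fu) >> 1);
}

/* Rebuild the effective USARTDIV from an OVER8 BRR: BRR[2:0] shifts back left. */
static uint32_t over8_usartdiv(uint32_t brr)
{
    return (brr & 0xFFF0u) | ((brr & 0x0007u) << 1);
}

/* Baud actually generated in OVER8 mode: 2 * f_CK / USARTDIV (truncated). */
static uint32_t over8_actual_baud(uint32_t fck, uint32_t brr)
{
    return (2u * fck) / over8_usartdiv(brr);
}
```

With f_CK = 16 MHz and a 921600 target, this decodes to 941176 baud, the same rate OVER16 produces with BRR = 17. OVER8 raises the speed ceiling; it does not add divider resolution.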

LPUART1 — fixed /256 divider

LPUART uses a 20-bit BRR and a different formula. Two hard constraints from RM0432: f_CK must be within [3 x baud, 4096 x baud], and the resulting BRR must be at least 0x300.

LPUART BRR
LPUART_BRR = (256 * f_CK) / baud          /* rounded; 20-bit; must be >= 0x300 */

/* LSE (32768 Hz) @ 9600 baud:  256*32768/9600 = 873.8 -> 874 = 0x36A  (valid) */
LPUART1->BRR = (uint32_t)(((uint64_t)256 * 32768u + 9600u / 2u) / 9600u);

/* HSI16 (16 MHz) @ 115200:     256*16e6/115200 = 35555.6 -> 35556 = 0x8AE4 */
LPUART1->BRR = (uint32_t)(((uint64_t)256 * 16000000u + 115200u / 2u) / 115200u);

LSE caps LPUART at ~9600

With the 32.768 kHz LSE, f_CK / 3 ≈ 10922, so the fastest standard rate is 9600 baud. For 115200 you must clock LPUART1 from HSI16, PCLK or SYSCLK. PCLK and SYSCLK are gated in Stop mode, so of these only HSI16 (kept available in Stop, at a higher current than LSE) still allows Stop-mode reception. Use the 64-bit intermediate shown above so 256 * f_CK does not overflow 32 bits.
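The two RM0432 limits quoted above are easy to check before touching the peripheral. The following is a sketch under the assumption that a rounded divide is wanted; `lpuart_brr()` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute LPUART BRR = 256 * f_CK / baud (rounded) and enforce both limits:
   f_CK within [3 x baud, 4096 x baud] and 0x300 <= BRR <= 0xFFFFF (20 bits).
   Returns false and leaves *brr untouched when the pair is not usable. */
static bool lpuart_brr(uint32_t fck, uint32_t baud, uint32_t *brr)
{
    if (baud == 0u) return false;
    if (fck < 3ull * baud || fck > 4096ull * baud) return false;
    uint64_t v = (256ull * fck + baud / 2u) / baud;   /* 64-bit: no overflow */
    if (v < 0x300u || v > 0xFFFFFu) return false;
    *brr = (uint32_t)v;
    return true;
}
```

LSE at 9600 baud passes with BRR = 874, while LSE at 115200 is rejected by the f_CK window, which matches the cap described above.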

Baud table — BRR values (OVER16)

Baud   | BRR @ 80 MHz  | err    | BRR @ 16 MHz (HSI16) | err
9600   | 8333 (0x208D) | +0.00% | 1667 (0x0683)        | -0.02%
19200  | 4167 (0x1047) | -0.01% | 833 (0x0341)         | +0.04%
38400  | 2083 (0x0823) | +0.02% | 417 (0x01A1)         | -0.08%
57600  | 1389 (0x056D) | -0.01% | 278 (0x0116)         | -0.08%
115200 | 694 (0x02B6)  | +0.06% | 139 (0x008B)         | -0.08%
230400 | 347 (0x015B)  | +0.06% | 69 (0x0045)          | +0.64%
460800 | 174 (0x00AE)  | -0.22% | 35 (0x0023)          | -0.79%
921600 | 87 (0x0057)   | -0.22% | 17 (0x0011)          | +2.12%

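Keep total baud error under ~2.5% for reliable framing. Note how 16 MHz already strains 921600 (+2.1%). OVER8 does not help here: BRR[3] is fixed at 0, so the divider resolution in f_CK cycles is the same as with OVER16 and only the maximum rate doubles. At high baud, prefer a faster kernel clock such as 80 MHz.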
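The error column can be reproduced with integer arithmetic alone. This is a minimal sketch, assuming OVER16 and no prescaler; `brr_over16()` and `baud_err_centipct()` are hypothetical names.

```c
#include <stdint.h>

/* OVER16: BRR is the rounded divider. */
static uint32_t brr_over16(uint32_t fck, uint32_t baud)
{
    return (fck + baud / 2u) / baud;
}

/* Signed baud error in hundredths of a percent (6 means +0.06 %).
   actual / target = fck / (brr * baud); scale by 10000 and round to nearest. */
static int32_t baud_err_centipct(uint32_t fck, uint32_t baud, uint32_t brr)
{
    int64_t num = (int64_t)fck * 10000;
    int64_t den = (int64_t)brr * baud;
    int64_t scaled = (2 * num + den) / (2 * den);
    return (int32_t)(scaled - 10000);
}
```

Calling `baud_err_centipct(f, b, brr_over16(f, b))` for each row gives the table's figures; a result outside roughly ±250 is a sign to pick a different kernel clock.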

04 Register-level init + polling TX/RX

Configuration order matters: program CR1/CR2/CR3 and BRR while the peripheral is disabled (UE=0), then set UE, then enable TE/RE. Bits like OVER8, M1:M0 and the clock/parity settings cannot be changed while UE=1.

CR1 — control register 1 (the important bits)

UE (0): USART enable. Set last, after all config.
UESM (1): Keep the (LP)USART clocked in Stop mode. Needed for Stop-mode wake.
RE (2) / TE (3): Receiver / Transmitter enable. Setting TE sends one idle frame first.
RXNEIE_RXFNEIE (5): Interrupt when RXNE (RX data ready) is set.
TCIE (6): Interrupt on Transmission Complete (last stop bit shifted out).
TXEIE_TXFNFIE (7): Interrupt when TXE (TDR empty) is set; feed the next byte.
PS (9) / PCE (10): Parity select (0 = even, 1 = odd) / parity control enable.
M1 (28) : M0 (12): Word length. 00 = 8 bits, 01 = 9 bits, 10 = 7 bits. With parity ON, one data bit becomes the parity bit.
OVER8 (15): 0 = oversample by 16, 1 = by 8. Change only while UE=0.
FIFOEN (29): Enable the 8-level FIFO. Left 0 in this guide.
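The interaction between word length and parity is a common source of confusion: because the parity bit takes the top position of the word, 8 data bits plus parity needs the 9-bit word setting. A sketch of that mapping, using the bit positions from the list above under local names (`CR1_*`, `parity_t` and `cr1_frame_bits()` are hypothetical, not CMSIS symbols):

```c
#include <stdint.h>

#define CR1_PS   (1u << 9)
#define CR1_PCE  (1u << 10)
#define CR1_M0   (1u << 12)
#define CR1_M1   (1u << 28)

typedef enum { PAR_NONE, PAR_EVEN, PAR_ODD } parity_t;

/* CR1 bits for a frame with data_bits payload bits plus optional parity.
   The word length (payload + parity) must come out as 7, 8 or 9 bits;
   anything else returns 0xFFFFFFFF as an error marker. */
static uint32_t cr1_frame_bits(unsigned data_bits, parity_t par)
{
    unsigned word = data_bits + (par != PAR_NONE ? 1u : 0u);
    uint32_t cr1;
    switch (word) {
    case 7u: cr1 = CR1_M1; break;   /* M1:M0 = 10 */
    case 8u: cr1 = 0u;     break;   /* M1:M0 = 00 */
    case 9u: cr1 = CR1_M0; break;   /* M1:M0 = 01 */
    default: return 0xFFFFFFFFu;
    }
    if (par != PAR_NONE) cr1 |= CR1_PCE;
    if (par == PAR_ODD)  cr1 |= CR1_PS;
    return cr1;
}
```

So 8E1 maps to M0 | PCE, while 7O1 fits in the default 8-bit word with PCE | PS.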

CR2 / CR3 — the bits you actually touch

Reg | Field                  | Meaning
CR2 | STOP [13:12]           | 00 = 1 stop, 01 = 0.5, 10 = 2 stop, 11 = 1.5
CR2 | SWAP (15)              | Swap TX/RX pins (fix a crossed cable in firmware)
CR2 | RXINV/TXINV (16/17)    | Invert RX / TX polarity
CR3 | DMAR (6) / DMAT (7)    | DMA enable for receiver / transmitter
CR3 | OVRDIS (12)            | Disable overrun detection: ORE is never set and a new byte overwrites unread data in RDR
CR3 | DEM (14) / DEP (15)    | RS-485 Driver-Enable mode / DE polarity

ISR flags and how to clear them

Flag (ISR) | Bit | Meaning                                | Cleared by
RXNE_RXFNE | 5   | RX data register not empty             | Reading RDR
TC         | 6   | Transmission complete                  | Write 1 to TCCF in ICR
TXE_TXFNF  | 7   | TX data register empty                 | Writing TDR
ORE        | 3   | Overrun: a byte was lost               | Write 1 to ORECF in ICR
IDLE       | 4   | Idle line detected (end of frame burst) | Write 1 to IDLECF in ICR
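The table splits the flags into two groups: those cleared as a side effect of a data-register access, and those needing an explicit write to ICR. A sketch of that split follows, assuming (as in the CMSIS header, but worth confirming in RM0432) that each ICR clear bit sits at the same position as its ISR flag. The `ISR_*` names and `icr_clear_mask()` are hypothetical.

```c
#include <stdint.h>

#define ISR_ORE   (1u << 3)
#define ISR_IDLE  (1u << 4)
#define ISR_RXNE  (1u << 5)
#define ISR_TC    (1u << 6)
#define ISR_TXE   (1u << 7)

/* From an ISR snapshot, keep only the flags that need a write-1-to-clear in ICR.
   RXNE and TXE are excluded: the RDR read / TDR write clears them. */
static uint32_t icr_clear_mask(uint32_t isr)
{
    return isr & (ISR_ORE | ISR_IDLE | ISR_TC);
}
```

Note that this also clears TC when it is set, so it should not be used blindly in code that is about to wait on TC.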

Complete polling driver (USART2, PA2/PA3, 115200 8N1)

uart2_poll.c
#include "stm32l4r5xx.h"
#include <stdint.h>

/* gpio_af() from section 02 is assumed available */

void uart2_init(uint32_t fck_hz, uint32_t baud)
{
    /* 1. Clocks: GPIOA + USART2 */
    RCC->AHB2ENR  |= RCC_AHB2ENR_GPIOAEN;
    RCC->APB1ENR1 |= RCC_APB1ENR1_USART2EN;

    /* 2. Kernel clock = PCLK1 (00 = reset default); fck_hz must equal PCLK1 then */
    RCC->CCIPR &= ~RCC_CCIPR_USART2SEL;

    /* 3. Pins: PA2 = USART2_TX (AF7), PA3 = USART2_RX (AF7) */
    gpio_af(GPIOA, 2, 7);
    gpio_af(GPIOA, 3, 7);

    /* 4. Program while disabled */
    USART2->CR1 = 0;                 /* 8N1, OVER16, FIFO off, no interrupts */
    USART2->CR2 = 0;                 /* STOP = 1 bit */
    USART2->CR3 = 0;
    USART2->BRR = (fck_hz + baud / 2u) / baud;   /* OVER16 rounded divide */

    /* 5. Enable peripheral, then TX + RX */
    USART2->CR1 |= USART_CR1_UE;
    USART2->CR1 |= USART_CR1_TE | USART_CR1_RE;
}

void uart2_putc(uint8_t c)
{
    while (!(USART2->ISR & USART_ISR_TXE_TXFNF)) { }   /* TDR empty? */
    USART2->TDR = c;                                   /* write clears TXE */
}

uint8_t uart2_getc(void)
{
    while (!(USART2->ISR & USART_ISR_RXNE_RXFNE)) {
        if (USART2->ISR & USART_ISR_ORE)               /* recover from overrun */
            USART2->ICR = USART_ICR_ORECF;
    }
    return (uint8_t)USART2->RDR;                        /* read clears RXNE */
}

void uart2_puts(const char *s)
{
    while (*s) uart2_putc((uint8_t)*s++);
    while (!(USART2->ISR & USART_ISR_TC)) { }           /* let last bit leave the wire */
    USART2->ICR = USART_ICR_TCCF;                       /* clear TC */
}

int main(void)
{
    /* assumes SystemClock already at 80 MHz and PCLK1 = 80 MHz */
    uart2_init(80000000u, 115200u);
    uart2_puts("STM32L4R5 USART2 up\r\n");
    for (;;) uart2_putc(uart2_getc());   /* echo */
}
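The polling loops above spin forever if a flag never arrives, for example when the cable is unplugged. One common variation is a bounded wait. The sketch below takes the register by pointer so it can be exercised on a host; `wait_flag()` is a hypothetical helper, and the spin count is an iteration limit, not a calibrated time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read *reg up to max_spins times; return true as soon as any bit in mask is set,
   false if the limit is reached first. */
static bool wait_flag(const volatile uint32_t *reg, uint32_t mask, uint32_t max_spins)
{
    while (max_spins--) {
        if (*reg & mask) return true;
    }
    return false;
}
```

On target, `wait_flag(&USART2->ISR, USART_ISR_RXNE_RXFNE, 100000u)` would replace the open-ended `while` in `uart2_getc()`, with the caller deciding what a timeout means.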

05 Interrupt-driven TX/RX

Polling burns the CPU. For real work, enable RXNE interrupts to receive without spinning, and drive TX from a ring buffer: enable TXEIE while bytes remain, and when the buffer empties switch to TCIE to know the line is truly idle (important for RS-485 DE de-assert). The interrupt vector is shared per instance (USART2_IRQn); you dispatch on the ISR flags inside the handler.

Enable the source AND the NVIC

An interrupt needs two switches: the peripheral enable (RXNEIE/TXEIE in CR1) and the NVIC line (NVIC_EnableIRQ). Miss either and the handler never runs. Also: never leave TXEIE enabled with an empty buffer — TXE stays high forever and you get a permanent interrupt storm. Disable TXEIE the moment the buffer drains.

uart2_irq.c — ring-buffer TX + IRQ RX
#include "stm32l4r5xx.h"
#include <stdint.h>

#define TXQ_SIZE 256u
static volatile uint8_t  txq[TXQ_SIZE];
static volatile uint16_t tx_head, tx_tail;     /* head=write, tail=read (ISR) */

#define RXQ_SIZE 256u
static volatile uint8_t  rxq[RXQ_SIZE];
static volatile uint16_t rx_head, rx_tail;

void uart2_irq_init(uint32_t fck_hz, uint32_t baud)
{
    RCC->AHB2ENR  |= RCC_AHB2ENR_GPIOAEN;
    RCC->APB1ENR1 |= RCC_APB1ENR1_USART2EN;
    gpio_af(GPIOA, 2, 7);
    gpio_af(GPIOA, 3, 7);

    USART2->CR1 = 0;
    USART2->CR2 = 0;
    USART2->CR3 = 0;
    USART2->BRR = (fck_hz + baud / 2u) / baud;

    USART2->CR1 |= USART_CR1_RXNEIE_RXFNEIE;    /* IRQ on every received byte */
    USART2->CR1 |= USART_CR1_UE | USART_CR1_TE | USART_CR1_RE;

    NVIC_SetPriority(USART2_IRQn, 6);
    NVIC_EnableIRQ(USART2_IRQn);
}

/* Queue a byte for transmission and kick the TXE interrupt. */
void uart2_send(uint8_t c)
{
    uint16_t next = (uint16_t)((tx_head + 1u) % TXQ_SIZE);
    while (next == tx_tail) { }                 /* buffer full: wait (or drop) */
    txq[tx_head] = c;
    tx_head = next;
    USART2->CR1 |= USART_CR1_TXEIE_TXFNFIE;     /* arm TXE IRQ */
}

/* Non-blocking receive: returns -1 if empty. */
int uart2_recv(void)
{
    if (rx_tail == rx_head) return -1;
    uint8_t c = rxq[rx_tail];
    rx_tail = (uint16_t)((rx_tail + 1u) % RXQ_SIZE);
    return c;
}

void USART2_IRQHandler(void)
{
    uint32_t isr = USART2->ISR;

    /* --- RX --- */
    if (isr & USART_ISR_ORE)                     /* clear overrun, keep going */
        USART2->ICR = USART_ICR_ORECF;
    if (isr & USART_ISR_RXNE_RXFNE) {
        uint8_t c = (uint8_t)USART2->RDR;        /* reading RDR clears RXNE */
        uint16_t next = (uint16_t)((rx_head + 1u) % RXQ_SIZE);
        if (next != rx_tail) { rxq[rx_head] = c; rx_head = next; }  /* else drop */
    }

    /* --- TX --- only act if TXE IRQ is actually enabled */
    if ((isr & USART_ISR_TXE_TXFNF) && (USART2->CR1 & USART_CR1_TXEIE_TXFNFIE)) {
        if (tx_tail != tx_head) {
            USART2->TDR = txq[tx_tail];          /* writing TDR clears TXE */
            tx_tail = (uint16_t)((tx_tail + 1u) % TXQ_SIZE);
        } else {
            USART2->CR1 &= ~USART_CR1_TXEIE_TXFNFIE;  /* nothing left: stop TXE storm */
        }
    }
}
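The head/tail logic used by both queues can be tested away from the hardware. Below is a minimal single-producer, single-consumer sketch with the same "one slot stays empty" convention. `ring_t`, `ring_put()` and `ring_get()` are hypothetical names, and the size is shrunk to 8 so the full condition is easy to reach.

```c
#include <stdbool.h>
#include <stdint.h>

#define Q_SIZE 8u                      /* power of two: the wrap becomes a mask */

typedef struct {
    volatile uint8_t  data[Q_SIZE];
    volatile uint16_t head;            /* written only by the producer */
    volatile uint16_t tail;            /* written only by the consumer */
} ring_t;

static bool ring_put(ring_t *q, uint8_t c)
{
    uint16_t next = (uint16_t)((q->head + 1u) & (Q_SIZE - 1u));
    if (next == q->tail) return false; /* full: usable capacity is Q_SIZE - 1 */
    q->data[q->head] = c;
    q->head = next;                    /* publish only after the byte is stored */
    return true;
}

static bool ring_get(ring_t *q, uint8_t *c)
{
    if (q->tail == q->head) return false;   /* empty */
    *c = q->data[q->tail];
    q->tail = (uint16_t)((q->tail + 1u) & (Q_SIZE - 1u));
    return true;
}
```

Because each index has exactly one writer, the main loop and the ISR can share the queue on a single core without disabling interrupts.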

06 DMA TX/RX with DMAMUX

The STM32L4R5 routes DMA through DMAMUX1: DMA request lines are no longer hard-wired to fixed channels. You pick any free DMA channel and write the peripheral's request number into that channel's DMAMUX control register. DMAMUX1 channels 0-6 drive DMA1 channels 1-7; channels 7-13 drive DMA2 channels 1-7.

DMAMUX request numbers (STM32L4R5, RM0432)

RequestIDRequestID
USART1_RX24USART1_TX25
USART2_RX26USART2_TX27
USART3_RX28USART3_TX29
UART4_RX30UART4_TX31
UART5_RX32UART5_TX33
LPUART1_RX34LPUART1_TX35

These are the values you write into DMAMUXx_ChannelN->CCR (field DMAREQ_ID[7:0]). If you build with the STM32Cube HAL, its DMA_REQUEST_USART2_TX style macros give the same numbers under symbolic names.
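Both mappings above are linear, so they can be expressed as two small functions. This is a sketch for illustration; `dmamux_channel()` and `uart_dma_request()` are hypothetical names, and the instance index simply follows the order of the request table.

```c
#include <stdint.h>

/* Index of the DMAMUX1 channel feeding DMA<dma>_Channel<ch>
   (dma = 1 or 2, ch = 1..7). Returns -1 for out-of-range input. */
static int dmamux_channel(unsigned dma, unsigned ch)
{
    if (dma < 1u || dma > 2u || ch < 1u || ch > 7u) return -1;
    return (int)((dma - 1u) * 7u + (ch - 1u));
}

/* Request ID in table order:
   0 = USART1, 1 = USART2, 2 = USART3, 3 = UART4, 4 = UART5, 5 = LPUART1.
   RX is the even number, TX the odd one after it. */
static int uart_dma_request(unsigned instance, int is_tx)
{
    if (instance > 5u) return -1;
    return (int)(24u + 2u * instance + (is_tx ? 1u : 0u));
}
```

For example, DMA1 channel 6 maps to DMAMUX1 channel 5, and USART2 RX is request 26, the pair used in the listing that follows.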

DMA channel registers (per DMA1/DMA2 channel)

CPAR: Peripheral address = &USARTx->TDR (TX) or &USARTx->RDR (RX)
CMAR: Memory buffer address
CNDTR: Number of bytes to move
CCR: DIR (1=mem→periph for TX), MINC, CIRC, PL, MSIZE/PSIZE (00=8-bit), TCIE, EN (set last)

uart2_dma.c — TX one-shot + circular RX
#include "stm32l4r5xx.h"
#include <stdint.h>
#include <string.h>

#define RX_BUF 128u
static volatile uint8_t rx_dma[RX_BUF];

void uart2_dma_init(uint32_t fck_hz, uint32_t baud)
{
    RCC->AHB2ENR  |= RCC_AHB2ENR_GPIOAEN;
    RCC->APB1ENR1 |= RCC_APB1ENR1_USART2EN;
    RCC->AHB1ENR  |= RCC_AHB1ENR_DMA1EN | RCC_AHB1ENR_DMAMUX1EN;  /* DMA + DMAMUX clocks */

    gpio_af(GPIOA, 2, 7);
    gpio_af(GPIOA, 3, 7);

    USART2->CR1 = 0;
    USART2->CR2 = 0;
    USART2->CR3 = USART_CR3_DMAT | USART_CR3_DMAR;    /* route TX+RX through DMA */
    USART2->BRR = (fck_hz + baud / 2u) / baud;
    USART2->CR1 |= USART_CR1_UE | USART_CR1_TE | USART_CR1_RE;

    /* ---- RX: DMA1_Channel6  (= DMAMUX1_Channel5), circular, periph->mem ---- */
    DMA1_Channel6->CCR   = 0;                         /* disable before config */
    DMA1_Channel6->CPAR  = (uint32_t)&USART2->RDR;
    DMA1_Channel6->CMAR  = (uint32_t)rx_dma;
    DMA1_Channel6->CNDTR = RX_BUF;
    DMAMUX1_Channel5->CCR = 26u;                      /* USART2_RX request */
    DMA1_Channel6->CCR = DMA_CCR_MINC | DMA_CCR_CIRC | DMA_CCR_EN;  /* DIR=0, 8-bit */
}

/* Start a DMA transmit on DMA1_Channel7 (= DMAMUX1_Channel6) and return at once.
   buf must stay valid until the transfer has finished. */
void uart2_dma_send(const uint8_t *buf, uint16_t len)
{
    DMA1_Channel7->CCR = 0;                           /* stop channel */
    DMA1->IFCR = DMA_IFCR_CGIF7;                      /* clear ch7 flags */
    DMA1_Channel7->CPAR  = (uint32_t)&USART2->TDR;
    DMA1_Channel7->CMAR  = (uint32_t)buf;
    DMA1_Channel7->CNDTR = len;
    DMAMUX1_Channel6->CCR = 27u;                      /* USART2_TX request */
    USART2->ICR = USART_ICR_TCCF;                     /* clear stale TC */
    DMA1_Channel7->CCR = DMA_CCR_MINC | DMA_CCR_DIR | DMA_CCR_EN;   /* mem->periph */
}

/* Block until DMA has moved every byte and the last stop bit has left the wire. */
void uart2_dma_wait_done(void)
{
    while (!(DMA1->ISR & DMA_ISR_TCIF7)) { }          /* DMA moved all bytes */
    DMA1->IFCR = DMA_IFCR_CTCIF7;
    DMA1_Channel7->CCR &= ~DMA_CCR_EN;
    while (!(USART2->ISR & USART_ISR_TC)) { }         /* last stop bit really left */
    USART2->ICR = USART_ICR_TCCF;
}
Idle-line RX is the robust pattern

For receiving variable-length data, combine circular DMA RX with the IDLE interrupt: enable USART_CR1_IDLEIE, and on each IDLE event read how many bytes DMA has written via RX_BUF - DMA1_Channel6->CNDTR. That gives you framed packets without knowing the length in advance. Clear IDLE with USART2->ICR = USART_ICR_IDLECF.
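The position arithmetic in the IDLE handler has one subtlety: the circular buffer wraps, so the new data may be split across the end and the start of the buffer. A host-testable sketch of the drain step follows. `dma_rx_drain()` is a hypothetical helper; on target, `cndtr` would be read from `DMA1_Channel6->CNDTR`.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the bytes DMA has written since the previous call out of a circular buffer.
   buf/len : DMA target buffer and its size
   cndtr   : current CNDTR value (counts down from len, reloads in circular mode)
   last    : caller-owned read position, advanced by this function
   Returns the number of bytes copied into out (at most out_cap). */
static size_t dma_rx_drain(const volatile uint8_t *buf, size_t len, uint32_t cndtr,
                           size_t *last, uint8_t *out, size_t out_cap)
{
    size_t write_pos = (len - cndtr) % len;   /* CNDTR == len means position 0 */
    size_t n = 0;
    while (*last != write_pos && n < out_cap) {
        out[n++] = buf[*last];
        *last = (*last + 1u) % len;            /* wrap at the end of the buffer */
    }
    return n;
}
```

This only works if the handler runs before DMA laps the read position; if more than `len` bytes arrive between calls, older data is overwritten without any indication.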

07 The HAL way (Init, Transmit, _IT, _DMA)

STM32Cube HAL wraps all of the above behind UART_HandleTypeDef. You fill an Init struct (HAL computes BRR from the kernel clock selected in RCC->CCIPR, which is HAL_RCC_GetPCLK1Freq() for USART2 at the reset default), then use blocking, interrupt, or DMA transfer calls. The three transfer modes map exactly to sections 04-06.

uart2_hal.c
#include "stm32l4xx_hal.h"

UART_HandleTypeDef huart2;

void uart2_hal_init(void)
{
    __HAL_RCC_GPIOA_CLK_ENABLE();
    __HAL_RCC_USART2_CLK_ENABLE();

    GPIO_InitTypeDef g = {0};
    g.Pin       = GPIO_PIN_2 | GPIO_PIN_3;   /* PA2 TX, PA3 RX */
    g.Mode      = GPIO_MODE_AF_PP;
    g.Pull      = GPIO_PULLUP;
    g.Speed     = GPIO_SPEED_FREQ_VERY_HIGH;
    g.Alternate = GPIO_AF7_USART2;
    HAL_GPIO_Init(GPIOA, &g);

    huart2.Instance                    = USART2;
    huart2.Init.BaudRate               = 115200;
    huart2.Init.WordLength             = UART_WORDLENGTH_8B;   /* 8 data bits */
    huart2.Init.StopBits               = UART_STOPBITS_1;
    huart2.Init.Parity                 = UART_PARITY_NONE;
    huart2.Init.Mode                   = UART_MODE_TX_RX;
    huart2.Init.HwFlowCtl              = UART_HWCONTROL_NONE;
    huart2.Init.OverSampling           = UART_OVERSAMPLING_16;
    huart2.Init.OneBitSampling         = UART_ONE_BIT_SAMPLE_DISABLE;
    huart2.AdvancedInit.AdvFeatureInit = UART_ADVFEATURE_NO_INIT;
    if (HAL_UART_Init(&huart2) != HAL_OK) { while (1) { } }
}

/* ---- Blocking (polling) ---- */
void demo_poll(void)
{
    uint8_t rx;
    HAL_UART_Transmit(&huart2, (uint8_t *)"hi\r\n", 4, HAL_MAX_DELAY);
    HAL_UART_Receive(&huart2, &rx, 1, HAL_MAX_DELAY);
}

/* ---- Interrupt: arm a 1-byte receive, refill in the callback ---- */
static uint8_t rx_byte;
void demo_it_start(void) { HAL_UART_Receive_IT(&huart2, &rx_byte, 1); }

void HAL_UART_RxCpltCallback(UART_HandleTypeDef *h)
{
    if (h->Instance == USART2) {
        /* echo; this blocks inside interrupt context, acceptable for a demo only */
        HAL_UART_Transmit(&huart2, &rx_byte, 1, HAL_MAX_DELAY);
        HAL_UART_Receive_IT(&huart2, &rx_byte, 1);               /* re-arm */
    }
}

/* The HAL IRQ handler must be routed from the vector: */
void USART2_IRQHandler(void) { HAL_UART_IRQHandler(&huart2); }
HAL + DMA needs the MSP and DMAMUX clock

For HAL_UART_Transmit_DMA() / HAL_UART_Receive_DMA() you must, in HAL_UART_MspInit(): enable __HAL_RCC_DMA1_CLK_ENABLE() and __HAL_RCC_DMAMUX1_CLK_ENABLE(), init a DMA_HandleTypeDef with Init.Request = DMA_REQUEST_USART2_TX (HAL sets DMAMUX for you), call __HAL_LINKDMA(&huart2, hdmatx, hdma_tx), and enable both the DMA-channel IRQ and USART2_IRQn in the NVIC. CubeMX generates all of this; by hand it is the step people forget.

08 LPUART1 for low-power / Stop-mode wake

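LPUART1's point is that, clocked from the 32.768 kHz LSE, it keeps receiving while the MCU sleeps in Stop 2 and can wake it on activity. That needs three things: LSE as the LPUART kernel clock, UESM=1 (clock kept in Stop), and a wake source configured via CR3.WUS + WUFIE. The wake-up reaches the core through a direct EXTI line, which on the L4 family is line 31 for LPUART1 (line 28 belongs to USART3; confirm against the EXTI table in RM0432).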

lpuart1_lp.c — PG7/PG8, LSE, wake from Stop
#include "stm32l4r5xx.h"
#include <stdint.h>

void lpuart1_lp_init(void)
{
    /* 1. PWR clock, then enable VDDIO2 so Port G[15:2] (PG7/PG8) works */
    RCC->APB1ENR1 |= RCC_APB1ENR1_PWREN;
    PWR->CR2 |= PWR_CR2_IOSV;

    /* 2. Turn on LSE (needs backup-domain write access) */
    PWR->CR1 |= PWR_CR1_DBP;                 /* unlock RTC/backup domain */
    RCC->BDCR |= RCC_BDCR_LSEON;
    while (!(RCC->BDCR & RCC_BDCR_LSERDY)) { }

    /* 3. GPIOG + LPUART1 clocks */
    RCC->AHB2ENR  |= RCC_AHB2ENR_GPIOGEN;
    RCC->APB1ENR2 |= RCC_APB1ENR2_LPUART1EN;

    /* 4. LPUART1 kernel clock = LSE (11) */
    RCC->CCIPR = (RCC->CCIPR & ~RCC_CCIPR_LPUART1SEL)
               | (3u << RCC_CCIPR_LPUART1SEL_Pos);

    /* 5. Pins: PG7 = LPUART1_TX (AF8), PG8 = LPUART1_RX (AF8) */
    gpio_af(GPIOG, 7, 8);
    gpio_af(GPIOG, 8, 8);

    /* 6. Configure while disabled: 9600 8N1 from LSE */
    LPUART1->CR1 = 0;
    LPUART1->CR2 = 0;
    LPUART1->CR3 = 0;
    LPUART1->BRR = (uint32_t)(((uint64_t)256 * 32768u + 9600u / 2u) / 9600u);   /* rounded: 874 = 0x36A */

    /* 7. Wake-from-Stop plumbing */
    LPUART1->CR3 |= (3u << USART_CR3_WUS_Pos);   /* WUS=11: wake on RXNE */
    LPUART1->CR3 |= USART_CR3_WUFIE;             /* wake-up interrupt */
    LPUART1->CR1 |= USART_CR1_UESM;              /* keep LPUART clocked in Stop */

    LPUART1->CR1 |= USART_CR1_UE;
    LPUART1->CR1 |= USART_CR1_TE | USART_CR1_RE;

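    /* 8. Wake-up line: LPUART1 wakeup is EXTI line 31 on the L4 family, a direct
          line that is unmasked at reset (confirm against the EXTI table in RM0432).
          Direct lines have no edge selection, so only the mask bit is written. */
    EXTI->IMR1  |= EXTI_IMR1_IM31;               /* keep the wake-up line unmasked */
    NVIC_EnableIRQ(LPUART1_IRQn);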
}

void LPUART1_IRQHandler(void)
{
    if (LPUART1->ISR & USART_ISR_WUF)            /* woke the core */
        LPUART1->ICR = USART_ICR_WUCF;
    if (LPUART1->ISR & USART_ISR_RXNE_RXFNE)
        (void)LPUART1->RDR;                      /* consume the byte */
}
Why LSE and not PCLK here

In Stop mode the APB/AHB clocks are off, so a PCLK-clocked UART is dead. Only LSE (or HSI16 with the kept-in-stop option) survives. That is exactly the trade-off from section 03: LSE limits you to ~9600 baud but buys you receive-while-asleep. For fast LPUART traffic while running, clock it from HSI16 or PCLK instead and skip UESM.
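That trade-off can be written down as a small decision function. It is a sketch under two assumptions: the f_CK window from section 03 is the only constraint considered, and HSI16 is permitted to run in Stop mode as mentioned above. `lpclk_t`, `lpuart_clock_ok()` and `lpuart_stop_clock()` are hypothetical names.

```c
#include <stdint.h>

typedef enum { LPCLK_NONE, LPCLK_LSE, LPCLK_HSI16 } lpclk_t;

/* f_CK must lie within [3 x baud, 4096 x baud]. */
static int lpuart_clock_ok(uint32_t fck, uint32_t baud)
{
    return baud != 0u && fck >= 3ull * baud && fck <= 4096ull * baud;
}

/* Pick a kernel clock that can stay alive in Stop mode for the given baud rate.
   LSE is preferred for lowest power; HSI16 is the fallback for faster links. */
static lpclk_t lpuart_stop_clock(uint32_t baud)
{
    if (lpuart_clock_ok(32768u, baud))    return LPCLK_LSE;
    if (lpuart_clock_ok(16000000u, baud)) return LPCLK_HSI16;
    return LPCLK_NONE;
}
```

A result of `LPCLK_HSI16` still needs checking against the HSI16 start-up time and the Stop-mode current budget of the application, which this sketch does not model.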

09 Gotchas and common mistakes

Almost every "UART doesn't work" on the L4R5 traces back to one of these.

1 · Wrong f_CK in the BRR formula

BRR uses the kernel clock chosen in CCIPR, not SYSCLK or the AHB clock. If USART2SEL = 00 the divider input is PCLK1, and PCLK1 can be prescaled below SYSCLK. Compute BRR with the actual selected clock, or characters come out garbled at the wrong rate. HAL avoids this by looking up the selected kernel clock itself (HAL_RCC_GetPCLK1Freq() for the default PCLK1 selection), but only if your clock-tree setup is correct.
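When the selector is left at PCLK, the frequency to feed into the BRR formula depends on the AHB and APB1 prescalers. The sketch below decodes them using the usual STM32 RCC_CFGR encodings for HPRE and PPRE1, stated here from memory and worth checking against RM0432. `ahb_div()`, `apb_div()` and `pclk1_hz()` are hypothetical helpers.

```c
#include <stdint.h>

/* HPRE (4 bits): 0xxx = /1, 1000..1011 = /2, /4, /8, /16, 1100..1111 = /64../512. */
static uint32_t ahb_div(uint32_t hpre)
{
    static const uint16_t tab[8] = { 2, 4, 8, 16, 64, 128, 256, 512 };
    return (hpre & 8u) ? tab[hpre & 7u] : 1u;
}

/* PPRE1 (3 bits): 0xx = /1, 100..111 = /2, /4, /8, /16. */
static uint32_t apb_div(uint32_t ppre)
{
    return (ppre & 4u) ? (2u << (ppre & 3u)) : 1u;
}

/* PCLK1 = SYSCLK / AHB prescaler / APB1 prescaler. */
static uint32_t pclk1_hz(uint32_t sysclk, uint32_t hpre, uint32_t ppre1)
{
    return sysclk / ahb_div(hpre) / apb_div(ppre1);
}
```

With SYSCLK = 80 MHz and PPRE1 = /2, PCLK1 is 40 MHz; passing 80 MHz to `uart2_init()` in that situation makes the line run at about half the requested baud.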

2 · Wrong CMSIS flag names (FIFO parts)

On the L4R5 the FIFO-aware flag names are USART_ISR_TXE_TXFNF and USART_ISR_RXNE_RXFNE (and the interrupt-enable bits are USART_CR1_TXEIE_TXFNFIE / USART_CR1_RXNEIE_RXFNEIE). Code copied from an F1/F4 example may not compile until you rename its flags and registers. Some versions of the device header keep the short USART_ISR_TXE / USART_ISR_RXNE names as aliases, so check your copy before relying on either spelling.

3 · Changing config while UE=1

OVER8, M1:M0, parity, clock polarity and BRR must be written while UE=0. Writing them with the peripheral enabled is silently ignored. Sequence is always: config → set UE → set TE/RE.

4 · TXEIE interrupt storm

TXE is high whenever TDR is empty, which is most of the time. Enable TXEIE only while you have bytes queued and clear it the instant the buffer empties (section 05). Leaving it on with nothing to send pins the CPU in the ISR forever.

5 · Overrun (ORE) freezes RX

In polling/IRQ mode, if you do not read RDR fast enough, ORE sets and no further RXNE events occur until you clear it with ICR = USART_ICR_ORECF. Always handle ORE in the RX path, or set CR3.OVRDIS if you genuinely don't care about lost bytes.

6 · DMA "done" is not the same as "sent"

The DMA transfer-complete flag fires when the last byte is written to TDR — the final byte is still shifting out on the wire. Before you cut power, sleep, or drop an RS-485 driver-enable line, also wait for USART_ISR_TC. Cutting on DMA-TC alone truncates the last character.
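To get a feel for how long that tail is, the time one frame spends on the wire can be computed from the baud rate. This is a small sketch; `frame_time_us()` is a hypothetical helper and rounds up so the result is safe to use as a minimum delay.

```c
#include <stdint.h>

/* Time on the wire for one frame, in microseconds, rounded up.
   bits_per_frame = start + data (+ parity) + stop, e.g. 10 for 8N1. */
static uint32_t frame_time_us(uint32_t baud, uint32_t bits_per_frame)
{
    uint64_t num = (uint64_t)bits_per_frame * 1000000u;
    return (uint32_t)((num + baud - 1u) / baud);
}
```

At 115200 8N1 one frame takes about 87 µs. Since the previous byte may still be in the shift register when DMA writes the last one to TDR, up to roughly two frame times can remain after DMA reports completion, which is why waiting on TC is more reliable than a fixed delay.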

7 · Port G dark without VDDIO2

PG[15:2] (which carries the NUCLEO-L4R5ZI VCP on LPUART1) needs PWR->CR2 |= PWR_CR2_IOSV after enabling the PWR clock. Without it the pins float and the LPUART looks broken though the peripheral is fine.

8 · Forgetting DMAMUX clock / request ID

On the L4R5, DMA needs both RCC_AHB1ENR_DMA1EN and RCC_AHB1ENR_DMAMUX1EN, and you must write the request number (e.g. 27 for USART2_TX) into the DMAMUX channel's CCR. A channel with request ID 0 is wired to nothing and never triggers — a classic "DMA silently does nothing" bug.

9 · LPUART BRR out of range

LPUART requires f_CK in [3 x baud, 4096 x baud] and BRR >= 0x300. Try 115200 from LSE and the BRR underflows the legal range and the link is dead. Use HSI16/PCLK for high LPUART baud, LSE only for slow Stop-mode links.

10 · Crossed TX/RX — fix it in firmware

If TX and RX are swapped on the board, you don't need a new cable: set USART_CR2_SWAP (while UE=0) to exchange the two pins internally. Likewise RXINV/TXINV handle inverted-logic transceivers.