TECHNICAL GUIDE · STM32L4R5 · SPI · 2026

STM32L4R5 SPI Master
CPOL/CPHA, FIFO & DMA

Register-level and HAL configuration of SPI1/2/3 in master mode on the STM32L4R5 (Cortex-M4F, RM0432): CR1/CR2 bit fields, the four clock modes, baud prescaler, NSS management, 8/16-bit framing, the RX/TX FIFO and full-duplex DMA over DMAMUX1.

01 The SPI peripheral on the L4R5

The STM32L4R5 has three general-purpose SPI blocks. SPI1 lives on APB2; SPI2 and SPI3 are on APB1. All three share the same "SPI with FIFO" IP described in RM0432: separate 32-bit TX and RX FIFOs (four 8-bit entries each), programmable frame size from 4 to 16 bits, and Motorola or TI frame formats.

  MASTER (STM32L4R5 SPI1)            SLAVE (e.g. flash, sensor, ADC)
  ┌────────────────────┐            ┌──────────────┐
  │  TX FIFO  ─► shift ─┼─ MOSI ────►│ MOSI         │
  │  RX FIFO  ◄─ shift ◄┼─ MISO ◄────┤ MISO         │
  │  BR prescaler ─────►┼─ SCK  ─────►│ SCK          │
  │  NSS (SW/HW)  ──────┼─ CS   ─────►│ /CS          │
  └────────────────────┘            └──────────────┘
       PCLK2 (SPI1) / PCLK1 (SPI2,3)
    
Block | Bus / kernel clock | Base address | CMSIS pointer | Notes
SPI1 | APB2 (PCLK2) | 0x4001 3000 | SPI1 | PCLK2 up to 120 MHz; pins on AF5
SPI2 | APB1 (PCLK1) | 0x4000 3800 | SPI2 | Pins on AF5; no I2S mode on the L4+ SPI
SPI3 | APB1 (PCLK1) | 0x4000 3C00 | SPI3 | Pins on AF6; no I2S mode on the L4+ SPI

Unlike the STM32H7, the L4/L4+ SPI has no independent kernel-clock mux — the serial clock is derived directly from the APB clock that feeds the block (PCLK2 for SPI1, PCLK1 for SPI2/SPI3) through the 3-bit BR prescaler. There is exactly one clock domain to reason about.

Why FIFO matters

The 4-entry TX/RX FIFO lets you keep the bus 100% utilised: you can push a second byte before the first has finished shifting out. In polling code you wait on TXE/RXNE; in DMA mode the controller keeps the FIFO fed automatically. The FIFO is also why the disable sequence is non-trivial (see §08).

02 Signals, framing & the four CPOL/CPHA modes

SPI is a full-duplex synchronous bus: on every SCK cycle the master drives one bit onto MOSI on one clock edge and samples one bit from MISO on the opposite edge. There is no separate "read" or "write", only a transfer. To read N bytes you must clock out N (dummy) bytes.

MOSI: Master Out / Slave In, data from master to slave.
MISO: Master In / Slave Out, data from slave to master.
SCK: Serial clock, generated by the master from PCLK / BR prescaler.
NSS / CS: Active-low chip select. On the L4R5 it can be a GPIO (software NSS) or the hardware NSS pin.
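The "only a transfer" point can be made concrete with a small host-side model. The sketch below is not driver code: it treats the master and the slave as two 8-bit shift registers joined in a ring and shows that eight clocks swap their contents, which is why a read always costs a dummy write. The name spi_shift_exchange is hypothetical, and MSB-first order is assumed.

```c
#include <stdint.h>

/* Toy model of one 8-bit SPI frame (MSB first).
 * Each loop iteration is one SCK cycle: both sides shift their MSB out
 * and shift the other side's bit in at the LSB. */
void spi_shift_exchange(uint8_t *master, uint8_t *slave)
{
    for (int i = 0; i < 8; i++) {
        uint8_t mosi = (uint8_t)((*master >> 7) & 1u);  /* bit leaving the master */
        uint8_t miso = (uint8_t)((*slave  >> 7) & 1u);  /* bit leaving the slave  */
        *master = (uint8_t)((*master << 1) | miso);
        *slave  = (uint8_t)((*slave  << 1) | mosi);
    }
}
```

After one frame the master holds what the slave started with and vice versa, so the "dummy" byte a master sends during a read is simply the other half of the same exchange.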

CPOL and CPHA — the four modes

CPOL (CR1 bit 1) sets the idle level of SCK. CPHA (CR1 bit 0) selects which clock edge samples the data. The combination gives four standard modes; the slave datasheet dictates which one you must use.

Mode | CPOL | CPHA | SCK idle | Data sampled on | Data shifted on
0 | 0 | 0 | Low | 1st edge (rising) | 2nd edge (falling)
1 | 0 | 1 | Low | 2nd edge (falling) | 1st edge (rising)
2 | 1 | 0 | High | 1st edge (falling) | 2nd edge (rising)
3 | 1 | 1 | High | 2nd edge (rising) | 1st edge (falling)

Configure CPOL/CPHA only while SPE = 0

CPOL, CPHA, BR, MSTR, LSBFIRST, DS and the NSS bits must be programmed with the SPI disabled (CR1.SPE = 0). Changing them on a running peripheral yields undefined behaviour. Set everything, then set SPE last.
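Because the mode number is just CPOL*2 + CPHA, the mapping to CR1 bits can be written as a tiny pure function. This is a minimal sketch, assuming the bit positions given above (CPHA = bit 0, CPOL = bit 1); the SKETCH_ macros and function names are hypothetical stand-ins for the CMSIS SPI_CR1_CPOL / SPI_CR1_CPHA definitions so the code can also be compiled on a host.

```c
#include <stdint.h>

/* Bit positions as stated in the text: CPHA = CR1 bit 0, CPOL = CR1 bit 1. */
#define SKETCH_CR1_CPHA (1u << 0)
#define SKETCH_CR1_CPOL (1u << 1)

/* CPOL/CPHA bits of CR1 for SPI mode 0..3 (mode = CPOL*2 + CPHA). */
uint32_t spi_mode_to_cr1_bits(unsigned mode)
{
    uint32_t bits = 0;
    if (mode & 2u) bits |= SKETCH_CR1_CPOL;  /* modes 2 and 3: SCK idles high    */
    if (mode & 1u) bits |= SKETCH_CR1_CPHA;  /* modes 1 and 3: sample on 2nd edge */
    return bits;
}

/* Return cr1 with only the mode bits replaced; all other fields are kept. */
uint32_t spi_cr1_with_mode(uint32_t cr1, unsigned mode)
{
    cr1 &= ~(SKETCH_CR1_CPOL | SKETCH_CR1_CPHA);
    return cr1 | spi_mode_to_cr1_bits(mode & 3u);
}
```

On the target this would be applied with SPE = 0, for example SPI1->CR1 = spi_cr1_with_mode(SPI1->CR1, 3); before re-enabling the peripheral.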

Frame size and bit order

DS[3:0] (CR2 bits 11:8) selects the frame length: 0b0011 = 4-bit … 0b0111 = 8-bit … 0b1111 = 16-bit. Values below 0b0011 are reserved. LSBFIRST (CR1 bit 7) chooses MSB-first (0, the default and most common) or LSB-first (1). For frames of 8 bits or fewer you must also set FRXTH (CR2 bit 12) so that RXNE is asserted per byte — see §04.
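The DS encoding is simply "frame length minus one", and the FRXTH rule depends only on the frame length, so both can be derived together. The following is a sketch under those assumptions; spi_cr2_framing and the SKETCH_ macros are hypothetical names, with field positions taken from the paragraph above (DS at bits 11:8, FRXTH at bit 12).

```c
#include <stdint.h>

#define SKETCH_CR2_DS_POS  8u          /* DS[3:0] occupies CR2 bits 11:8 */
#define SKETCH_CR2_FRXTH   (1u << 12)  /* RX FIFO threshold bit          */

/* Build the DS and FRXTH fields of CR2 for a frame of `bits` bits (4..16).
 * Returns 0 for an unsupported size so the caller can reject it. */
uint32_t spi_cr2_framing(unsigned bits)
{
    if (bits < 4u || bits > 16u)
        return 0;
    uint32_t cr2 = (uint32_t)(bits - 1u) << SKETCH_CR2_DS_POS;  /* DS = bits - 1 */
    if (bits <= 8u)
        cr2 |= SKETCH_CR2_FRXTH;  /* RXNE per byte for frames of 8 bits or fewer */
    return cr2;
}
```

For 8-bit frames this yields 0x1700 (DS = 0111 plus FRXTH); for 16-bit frames it yields 0x0F00 with FRXTH left clear.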

03 Pin mapping, alternate functions & clocks

SPI1 and SPI2 pins use alternate function AF5; SPI3 uses AF6. The table below lists the mappings shared across the STM32L4 family (verify the exact set available on your package against DS12023). Not every pin exists on every package.

Signal | AF | Pin options
SPI1_SCK | AF5 | PA5, PB3, PE13
SPI1_MISO | AF5 | PA6, PB4, PE14
SPI1_MOSI | AF5 | PA7, PB5, PE15
SPI1_NSS | AF5 | PA4, PA15, PE12
SPI2_SCK | AF5 | PB10, PB13, PD1
SPI2_MISO | AF5 | PB14, PC2, PD3
SPI2_MOSI | AF5 | PB15, PC3, PD4
SPI2_NSS | AF5 | PB9, PB12, PD0
SPI3_SCK | AF6 | PB3, PC10
SPI3_MISO | AF6 | PB4, PC11
SPI3_MOSI | AF6 | PB5, PC12
SPI3_NSS | AF6 | PA4, PA15

Note that PA4/PA15/PB3/PB4/PB5 are shared between SPI1 (AF5) and SPI3 (AF6): the AF number selects which peripheral drives the pin.

Clock enables (RCC)

GPIO ports live on AHB2; DMA and DMAMUX on AHB1. SPI1 is enabled in APB2ENR; SPI2/SPI3 in APB1ENR1.

rcc_enables.c
// GPIO port clocks (AHB2)
RCC->AHB2ENR |= RCC_AHB2ENR_GPIOAEN;      // PA4..PA7 for SPI1

// SPI peripheral clocks
RCC->APB2ENR  |= RCC_APB2ENR_SPI1EN;      // SPI1  (bit 12, APB2)
RCC->APB1ENR1 |= RCC_APB1ENR1_SPI2EN;     // SPI2  (bit 14, APB1)
RCC->APB1ENR1 |= RCC_APB1ENR1_SPI3EN;     // SPI3  (bit 15, APB1)

// For DMA later (AHB1):
RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN | RCC_AHB1ENR_DMAMUX1EN;

// Erratum-safe: read back after enabling a clock before using the block
(void)RCC->APB2ENR;

GPIO alternate-function setup

gpio_spi1.c — PA5=SCK, PA6=MISO, PA7=MOSI (AF5)
static void spi1_gpio_init(void)
{
    // MODER = 0b10 (alternate function) for PA5, PA6, PA7
    GPIOA->MODER &= ~(GPIO_MODER_MODE5 | GPIO_MODER_MODE6 | GPIO_MODER_MODE7);
    GPIOA->MODER |=  (2u << GPIO_MODER_MODE5_Pos)
                 |  (2u << GPIO_MODER_MODE6_Pos)
                 |  (2u << GPIO_MODER_MODE7_Pos);

    // Very-high-speed output (0b11) — needed above a few MHz
    GPIOA->OSPEEDR |= (3u << GPIO_OSPEEDR_OSPEED5_Pos)
                   |  (3u << GPIO_OSPEEDR_OSPEED6_Pos)
                   |  (3u << GPIO_OSPEEDR_OSPEED7_Pos);

    // Push-pull (default), no pull-up/down (MISO may want a pull if slave tristates)
    GPIOA->PUPDR &= ~(GPIO_PUPDR_PUPD5 | GPIO_PUPDR_PUPD6 | GPIO_PUPDR_PUPD7);

    // AFR[0] handles pins 0..7; write AF5 into the 4-bit nibble of each pin
    GPIOA->AFR[0] &= ~((0xFu << (5*4)) | (0xFu << (6*4)) | (0xFu << (7*4)));
    GPIOA->AFR[0] |=  ((5u   << (5*4)) | (5u   << (6*4)) | (5u   << (7*4)));

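    // Chip-select on PA4 as a plain GPIO output, idle HIGH (software NSS).
    // Preload the output latch first so CS does not glitch low when the pin
    // leaves its reset (analog) mode and becomes an output.
    GPIOA->BSRR   =  GPIO_BSRR_BS4;                 // deselect (latch high)
    GPIOA->MODER &= ~GPIO_MODER_MODE4;
    GPIOA->MODER |=  (1u << GPIO_MODER_MODE4_Pos);  // general output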
}
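The hand-computed nibble shifts in spi1_gpio_init() generalise to any pin. A possible helper is sketched below; gpio_set_af is a hypothetical name, and it takes a pointer to the two AFR words rather than a GPIO_TypeDef so that the same code can be exercised on a host without the device header.

```c
#include <stdint.h>

/* Write one pin's 4-bit alternate-function nibble.
 * afr[0] covers pins 0..7 (AFRL), afr[1] covers pins 8..15 (AFRH). */
void gpio_set_af(volatile uint32_t afr[2], unsigned pin, unsigned af)
{
    unsigned idx   = (pin & 15u) >> 3;   /* 0 for pins 0..7, 1 for pins 8..15 */
    unsigned shift = (pin & 7u) * 4u;    /* nibble position inside that word  */
    afr[idx] = (afr[idx] & ~(0xFu << shift))
             | ((uint32_t)(af & 0xFu) << shift);
}
```

On the target, gpio_set_af(GPIOA->AFR, 5, 5); would select AF5 on PA5, and gpio_set_af(GPIOC->AFR, 10, 6); would select AF6 on PC10 for SPI3_SCK.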

04 Register map: CR1, CR2, SR, DR

Four registers do the real work: CR1 (mode/clock), CR2 (framing, FIFO threshold, DMA/IRQ enables), SR (status/FIFO levels) and DR (data). The CRC registers (CRCPR, RXCRCR, TXCRCR) only matter when hardware CRC is enabled. Offsets are from the SPI base address.

SPI_CR1 (offset 0x00)

Bit | Field | Meaning
15 | BIDIMODE | 0 = 2-line unidirectional (full-duplex); 1 = 1-line bidirectional
14 | BIDIOE | Output enable in bidirectional mode
13 | CRCEN | Hardware CRC calculation enable
12 | CRCNEXT | Transmit CRC next
11 | CRCL | CRC length: 0 = 8-bit, 1 = 16-bit
10 | RXONLY | Receive-only (clock keeps running, MOSI idle)
9 | SSM | Software NSS management
8 | SSI | Internal NSS level when SSM = 1
7 | LSBFIRST | 0 = MSB first, 1 = LSB first
6 | SPE | SPI enable; set this LAST
5:3 | BR[2:0] | Baud rate prescaler (see table below)
2 | MSTR | 1 = master, 0 = slave
1 | CPOL | Clock polarity (idle level)
0 | CPHA | Clock phase (sampling edge)

SPI_CR2 (offset 0x04)

Bit | Field | Meaning
14 | LDMA_TX | Odd-byte handling for TX DMA (data ≤ 8-bit)
13 | LDMA_RX | Odd-byte handling for RX DMA (data ≤ 8-bit)
12 | FRXTH | RX FIFO threshold: 1 = RXNE at 8-bit, 0 = at 16-bit
11:8 | DS[3:0] | Data size: 0111 = 8-bit, 1111 = 16-bit
7 | TXEIE | TX-buffer-empty interrupt enable
6 | RXNEIE | RX-buffer-not-empty interrupt enable
5 | ERRIE | Error interrupt enable (OVR, MODF, FRE, CRCERR)
4 | FRF | Frame format: 0 = Motorola, 1 = TI
3 | NSSP | NSS pulse between frames (hardware NSS output)
2 | SSOE | NSS output enable (hardware NSS)
1 | TXDMAEN | TX DMA request enable
0 | RXDMAEN | RX DMA request enable

SPI_SR (offset 0x08) — status & FIFO levels

Bit | Field | Meaning
12:11 | FTLVL[1:0] | TX FIFO level: 00 = empty … 11 = full
10:9 | FRLVL[1:0] | RX FIFO level: 00 = empty … 11 = full
8 | FRE | TI-mode frame format error
7 | BSY | Bus busy: a transfer is in progress
6 | OVR | Overrun (an RX byte was lost)
5 | MODF | Mode fault (hardware NSS pulled low in master)
4 | CRCERR | CRC mismatch
1 | TXE | TX buffer empty: OK to write DR
0 | RXNE | RX buffer not empty: DR holds received data
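The FIFO level fields are easier to reason about once they are pulled out of a raw SR value. The helpers below are a sketch with hypothetical names; the field positions come from the SPI_SR table above, and the intermediate level codes (01 = quarter full, 10 = half full) are assumed from the reference manual's FTLVL/FRLVL description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions taken from the SPI_SR table. */
#define SKETCH_SR_BSY        (1u << 7)
#define SKETCH_SR_FRLVL_POS  9u
#define SKETCH_SR_FTLVL_POS  11u

/* 2-bit level code: 0 = empty, 1 = 1/4, 2 = 1/2, 3 = full. */
unsigned spi_sr_rx_level(uint32_t sr) { return (sr >> SKETCH_SR_FRLVL_POS) & 3u; }
unsigned spi_sr_tx_level(uint32_t sr) { return (sr >> SKETCH_SR_FTLVL_POS) & 3u; }

/* True when nothing is queued for transmit and the shifter is idle. */
bool spi_sr_tx_done(uint32_t sr)
{
    return spi_sr_tx_level(sr) == 0u && (sr & SKETCH_SR_BSY) == 0u;
}
```

A caller would pass a single snapshot, for example spi_sr_tx_done(SPI1->SR), so that all fields are evaluated against the same read of the register.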

Baud-rate prescaler (CR1.BR)

fSCK = fPCLK / 2^(BR+1), so each BR step doubles the divisor, from /2 up to /256. The example column assumes SPI1 with PCLK2 = 120 MHz.

BR[2:0] | Divisor | fSCK @ PCLK2 = 120 MHz
000 | /2 | 60 MHz (exceeds spec, do not use for master)
001 | /4 | 30 MHz
010 | /8 | 15 MHz
011 | /16 | 7.5 MHz
100 | /32 | 3.75 MHz
101 | /64 | 1.875 MHz
110 | /128 | 937.5 kHz
111 | /256 | 468.75 kHz

Respect the datasheet fSCK ceiling

DS12023 caps the master SCK frequency (roughly 40 MHz in transmit-only, and lower — around 24 MHz depending on VDD — in full-duplex / master-receive). At PCLK2 = 120 MHz the /2 divisor (60 MHz) is out of spec; the fastest safe full-duplex divisor is typically /8. Always check the SPI timing table for your VDD range.
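Picking the divisor can be automated once the ceiling for your VDD range is known. The sketch below searches for the smallest BR value whose SCK does not exceed a limit; spi_pick_br and spi_sck_hz are hypothetical names, and the limit is an input that you would take from the datasheet timing table rather than a value this code knows.

```c
#include <stdint.h>

/* SCK frequency for a given PCLK and BR[2:0]: fPCLK / 2^(BR+1). */
uint32_t spi_sck_hz(uint32_t pclk_hz, unsigned br)
{
    return pclk_hz >> ((br & 7u) + 1u);
}

/* Smallest BR[2:0] (fastest clock) with fPCLK / 2^(BR+1) <= max_sck_hz.
 * The comparison is done by scaling the limit up, so no precision is lost
 * to integer division. Returns -1 if even /256 is too fast. */
int spi_pick_br(uint32_t pclk_hz, uint32_t max_sck_hz)
{
    for (int br = 0; br <= 7; br++) {
        if (((uint64_t)max_sck_hz << (br + 1)) >= pclk_hz)
            return br;
    }
    return -1;
}
```

With PCLK2 = 120 MHz and a 24 MHz limit, as in the example figures above, the search lands on BR = 010 (/8, 15 MHz), matching the "fastest safe full-duplex divisor" quoted in the text.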

NSS management

Software (SSM=1, SSI=1): Most common for a master. Internal NSS is forced high so no mode fault occurs; you toggle a GPIO as chip-select yourself. Full control, works with any number of slaves.
Hardware output (SSM=0, SSOE=1): NSS pin is driven low automatically while SPE=1. Add NSSP=1 to emit a one-cycle NSS pulse between frames (needed by some slaves).
Hardware input (SSM=0, SSOE=0): Multi-master arbitration. NSS is an input; if it is pulled low while MSTR=1, MODF fires and SPE and MSTR are cleared.
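The three NSS options differ only in three bits spread across CR1 and CR2, which the following sketch makes explicit. The enum, struct and function names are hypothetical; bit positions are taken from the CR1/CR2 tables (SSM = CR1 bit 9, SSI = CR1 bit 8, SSOE = CR2 bit 2).

```c
#include <stdint.h>

#define SKETCH_CR1_SSI   (1u << 8)
#define SKETCH_CR1_SSM   (1u << 9)
#define SKETCH_CR2_SSOE  (1u << 2)

typedef enum { NSS_SOFTWARE, NSS_HW_OUTPUT, NSS_HW_INPUT } nss_mode_t;

typedef struct {
    uint32_t cr1;   /* bits to OR into CR1 */
    uint32_t cr2;   /* bits to OR into CR2 */
} nss_bits_t;

/* Bits to set for each NSS management scheme in master mode. */
nss_bits_t spi_nss_bits(nss_mode_t mode)
{
    nss_bits_t b = { 0u, 0u };
    switch (mode) {
    case NSS_SOFTWARE:  b.cr1 = SKETCH_CR1_SSM | SKETCH_CR1_SSI; break;
    case NSS_HW_OUTPUT: b.cr2 = SKETCH_CR2_SSOE;                 break;
    case NSS_HW_INPUT:  /* SSM = 0 and SSOE = 0: nothing to set */ break;
    }
    return b;
}
```

Keeping the choice in one place makes it harder to end up with the accidental SSM=0/SSOE=0 combination, which behaves as multi-master input and can raise MODF.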

05 Register-level master init + full-duplex transfer

A complete, compilable bare-metal driver for SPI1 as an 8-bit, mode-0, MSB-first master with software NSS at PCLK2/16. Uses only CMSIS device headers (stm32l4xx.h). Byte-wide DR access and FRXTH are the two details people miss.

spi1_master.c
#include "stm32l4xx.h"

// ---- forward decl from §03 ----
static void spi1_gpio_init(void);

void spi1_master_init(void)
{
    // 1) Clocks
    RCC->AHB2ENR |= RCC_AHB2ENR_GPIOAEN;
    RCC->APB2ENR |= RCC_APB2ENR_SPI1EN;
    (void)RCC->APB2ENR;               // read-back barrier

    spi1_gpio_init();                 // PA5/6/7 = AF5, PA4 = CS output

    // 2) Configure with SPE = 0
    SPI1->CR1 = 0;                     // clear: CPOL=0, CPHA=0 (mode 0), MSB first
    SPI1->CR1 |= SPI_CR1_MSTR          // master
              |  SPI_CR1_SSM           // software NSS management
              |  SPI_CR1_SSI           // internal NSS high -> no MODF
              |  (3u << SPI_CR1_BR_Pos); // BR=011 -> PCLK2/16

    // 3) CR2: 8-bit frames, RXNE per byte, no DMA/IRQ yet
    SPI1->CR2 = (0x7u << SPI_CR2_DS_Pos) // DS=0111 -> 8-bit
             |  SPI_CR2_FRXTH;          // FRXTH=1 -> RXNE asserts at 8 bits

    // 4) Enable
    SPI1->CR1 |= SPI_CR1_SPE;
}

// One full-duplex byte: send tx, return the byte clocked in on MISO.
uint8_t spi1_txrx(uint8_t tx)
{
    while (!(SPI1->SR & SPI_SR_TXE)) { }        // wait TX space
    *(volatile uint8_t *)&SPI1->DR = tx;        // 8-bit write (critical!)
    while (!(SPI1->SR & SPI_SR_RXNE)) { }        // wait for the echo
    return *(volatile uint8_t *)&SPI1->DR;      // 8-bit read
}

// Buffer transfer with manual chip-select (PA4).
void spi1_transfer(const uint8_t *tx, uint8_t *rx, uint32_t n)
{
    GPIOA->BSRR = GPIO_BSRR_BR4;                 // CS low (select)
    for (uint32_t i = 0; i < n; i++) {
        uint8_t out = tx ? tx[i] : 0xFF;        // 0xFF = dummy for reads
        uint8_t in  = spi1_txrx(out);
        if (rx) rx[i] = in;
    }
    // Wait until the shift register has fully drained before raising CS
    while (SPI1->SR & SPI_SR_BSY) { }
    GPIOA->BSRR = GPIO_BSRR_BS4;                 // CS high (deselect)
}
The #1 register-level bug: 16-bit DR access on 8-bit frames

With DS = 8-bit, a 32-bit or 16-bit store to SPI1->DR pushes two bytes into the TX FIFO. You must cast to volatile uint8_t* for both the write and the read. Likewise set FRXTH = 1 so RXNE triggers on a single byte instead of waiting for a half-word that never completes.
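The effect of the access width can be illustrated with a toy model of the TX FIFO. This is a host-side sketch, not a description of the silicon's internals: tx_fifo_model_t and tx_fifo_model_write are hypothetical names, frames are assumed to be 8 bits, and a 16-bit access is assumed to queue the low byte first, as in the reference manual's data-packing description.

```c
#include <stdint.h>

#define TX_FIFO_BYTES 4u

typedef struct {
    uint8_t  data[TX_FIFO_BYTES];
    unsigned count;               /* bytes currently queued */
} tx_fifo_model_t;

/* Model a store to DR while DS selects 8-bit frames.
 * access_bits = 8 queues one frame; access_bits = 16 queues two
 * (low byte first). Returns the number of frames queued, or 0 if
 * they would not fit. */
unsigned tx_fifo_model_write(tx_fifo_model_t *f, uint16_t value, unsigned access_bits)
{
    unsigned n = (access_bits == 16u) ? 2u : 1u;
    if (f->count + n > TX_FIFO_BYTES)
        return 0;
    f->data[f->count++] = (uint8_t)(value & 0xFFu);
    if (n == 2u)
        f->data[f->count++] = (uint8_t)(value >> 8);
    return n;
}
```

In this model, writing 0xA5 with a 16-bit access queues 0xA5 followed by an unintended 0x00 frame, which is the extra byte that shows up on MOSI when the cast is forgotten.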

Switching to 16-bit frames

spi1_16bit.c
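// While SPE = 0 (for example in the init function, before the final enable):
SPI1->CR2 = (0xFu << SPI_CR2_DS_Pos);   // DS=1111 -> 16-bit; FRXTH=0 so RXNE asserts per 16-bit frame
SPI1->CR1 |= SPI_CR1_SPE;

uint16_t spi1_txrx16(uint16_t tx)
{
    while (!(SPI1->SR & SPI_SR_TXE)) { }
    *(volatile uint16_t *)&SPI1->DR = tx;   // explicit 16-bit access: one frame
    while (!(SPI1->SR & SPI_SR_RXNE)) { }
    return *(volatile uint16_t *)&SPI1->DR; // explicit 16-bit read
}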

06 Full-duplex DMA over DMAMUX1

On the L4R5 the DMA controllers are request-agnostic: any DMA1/DMA2 channel can serve any peripheral, and DMAMUX1 routes a peripheral request line onto a channel. You program the request-line ID into the DMAMUX channel's CCR.

Request | DMAMUX1 request-line ID
SPI1_RX | 10
SPI1_TX | 11
SPI2_RX | 12
SPI2_TX | 13
SPI3_RX | 14
SPI3_TX | 15

DMAMUX channels map one-to-one onto DMA channels: DMAMUX1_Channel0..6 drive DMA1_Channel1..7, and DMAMUX1_Channel7..13 drive DMA2_Channel1..7. In this example RX uses DMA1_Channel2 (→ DMAMUX1_Channel1) and TX uses DMA1_Channel3 (→ DMAMUX1_Channel2).

spi1_dma.c
#include "stm32l4xx.h"

// Call once after spi1_master_init().
void spi1_dma_init(void)
{
    RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN | RCC_AHB1ENR_DMAMUX1EN;

    // Route request lines. DMAREQ_ID lives in bits [6:0]; other bits stay 0.
    DMAMUX1_Channel1->CCR = 10u;   // SPI1_RX -> DMA1_Channel2
    DMAMUX1_Channel2->CCR = 11u;   // SPI1_TX -> DMA1_Channel3
}

// Blocking full-duplex transfer of n bytes via DMA.
void spi1_dma_transfer(const uint8_t *tx, uint8_t *rx, uint16_t n)
{
    // ---- RX channel: peripheral -> memory (DIR = 0) ----
    DMA1_Channel2->CCR   = 0;                       // disable while configuring
    DMA1_Channel2->CPAR  = (uint32_t)&SPI1->DR;     // 8-bit peripheral reg
    DMA1_Channel2->CMAR  = (uint32_t)rx;
    DMA1_Channel2->CNDTR = n;
    DMA1_Channel2->CCR   = DMA_CCR_MINC;            // increment memory; PSIZE/MSIZE = 00 (8-bit), DIR=0
                                                    // no TCIE: completion is polled via TCIF2 below

    // ---- TX channel: memory -> peripheral (DIR = 1) ----
    DMA1_Channel3->CCR   = 0;
    DMA1_Channel3->CPAR  = (uint32_t)&SPI1->DR;
    DMA1_Channel3->CMAR  = (uint32_t)tx;
    DMA1_Channel3->CNDTR = n;
    DMA1_Channel3->CCR   = DMA_CCR_MINC | DMA_CCR_DIR;  // DIR=1 mem->periph

    GPIOA->BSRR = GPIO_BSRR_BR4;                    // CS low

    // Enable channels: RX first so it can never miss the first byte.
    DMA1_Channel2->CCR |= DMA_CCR_EN;
    DMA1_Channel3->CCR |= DMA_CCR_EN;

    // Enable the SPI DMA requests: RX before TX.
    SPI1->CR2 |= SPI_CR2_RXDMAEN;
    SPI1->CR2 |= SPI_CR2_TXDMAEN;

    // Wait for RX completion (RX finishing guarantees the frame is done).
    while (!(DMA1->ISR & DMA_ISR_TCIF2)) { }
    DMA1->IFCR = DMA_IFCR_CGIF2 | DMA_IFCR_CGIF3;   // clear all flags of the RX and TX channels

    // ---- Tear down in the correct order ----
    while (SPI1->SR & SPI_SR_BSY) { }
    SPI1->CR2 &= ~(SPI_CR2_TXDMAEN | SPI_CR2_RXDMAEN);
    DMA1_Channel2->CCR &= ~DMA_CCR_EN;
    DMA1_Channel3->CCR &= ~DMA_CCR_EN;

    GPIOA->BSRR = GPIO_BSRR_BS4;                    // CS high
}
Ordering rules that prevent overruns

Enable the RX DMA channel and RXDMAEN before the TX side. The RX channel drains the FIFO in lock-step with TX; if TX starts first, received bytes can pile up unread and set OVR. On teardown, wait for RX transfer-complete, then BSY = 0, then clear the DMA-enable bits. For frames ≤ 8 bits, the peripheral and memory data sizes are both 8-bit (PSIZE = MSIZE = 00).
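The one-to-one DMAMUX-to-DMA mapping described in this section (DMAMUX1_Channel0..6 for DMA1_Channel1..7, DMAMUX1_Channel7..13 for DMA2_Channel1..7) reduces to a line of arithmetic. The helper below is a sketch with a hypothetical name; it returns an index only and leaves the lookup of the actual DMAMUX1_ChannelN pointer to the caller.

```c
/* DMAMUX1 channel index for a given DMA controller (1 or 2) and
 * DMA channel (1..7). Returns 0..13, or -1 for invalid arguments. */
int dmamux1_channel_index(int dma_controller, int dma_channel)
{
    if (dma_controller < 1 || dma_controller > 2)
        return -1;
    if (dma_channel < 1 || dma_channel > 7)
        return -1;
    return (dma_controller - 1) * 7 + (dma_channel - 1);
}
```

For the channels used in spi1_dma.c this gives index 1 for DMA1_Channel2 and index 2 for DMA1_Channel3, which is why the request IDs are written to DMAMUX1_Channel1 and DMAMUX1_Channel2.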

07 HAL variant (init, MSP, transfers, callbacks)

The same SPI1 master with STM32Cube HAL. HAL hides FRXTH/DS and the byte-wide DR access; you only choose the high-level options. The MSP callback wires up GPIO and DMA, using the symbolic DMA_REQUEST_SPI1_TX/_RX which expand to the DMAMUX IDs 11/10.

spi1_hal.c
#include "stm32l4xx_hal.h"

SPI_HandleTypeDef hspi1;
DMA_HandleTypeDef hdma_spi1_tx;
DMA_HandleTypeDef hdma_spi1_rx;

void MX_SPI1_Init(void)
{
    hspi1.Instance               = SPI1;
    hspi1.Init.Mode              = SPI_MODE_MASTER;
    hspi1.Init.Direction         = SPI_DIRECTION_2LINES;      // full-duplex
    hspi1.Init.DataSize          = SPI_DATASIZE_8BIT;
    hspi1.Init.CLKPolarity       = SPI_POLARITY_LOW;          // CPOL = 0
    hspi1.Init.CLKPhase          = SPI_PHASE_1EDGE;           // CPHA = 0  (mode 0)
    hspi1.Init.NSS               = SPI_NSS_SOFT;              // software NSS
    hspi1.Init.BaudRatePrescaler = SPI_BAUDRATEPRESCALER_16;  // PCLK2/16
    hspi1.Init.FirstBit          = SPI_FIRSTBIT_MSB;
    hspi1.Init.TIMode            = SPI_TIMODE_DISABLE;
    hspi1.Init.CRCCalculation    = SPI_CRCCALCULATION_DISABLE;
    hspi1.Init.CRCPolynomial     = 7;
    hspi1.Init.CRCLength         = SPI_CRC_LENGTH_DATASIZE;
    hspi1.Init.NSSPMode          = SPI_NSS_PULSE_DISABLE;
    if (HAL_SPI_Init(&hspi1) != HAL_OK) { Error_Handler(); }
}

// Called automatically by HAL_SPI_Init() — configure clocks, pins, DMA here.
void HAL_SPI_MspInit(SPI_HandleTypeDef *spi)
{
    if (spi->Instance != SPI1) return;

    __HAL_RCC_SPI1_CLK_ENABLE();
    __HAL_RCC_GPIOA_CLK_ENABLE();
    __HAL_RCC_DMA1_CLK_ENABLE();
    __HAL_RCC_DMAMUX1_CLK_ENABLE();

    GPIO_InitTypeDef g = {0};
    g.Pin       = GPIO_PIN_5 | GPIO_PIN_6 | GPIO_PIN_7;   // SCK/MISO/MOSI
    g.Mode      = GPIO_MODE_AF_PP;
    g.Pull      = GPIO_NOPULL;
    g.Speed     = GPIO_SPEED_FREQ_VERY_HIGH;
    g.Alternate = GPIO_AF5_SPI1;
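    HAL_GPIO_Init(GPIOA, &g);

    // Software NSS: PA4 is a plain push-pull output, preset HIGH (deselected).
    // The transfer examples toggle this pin, so it must be configured here.
    HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_SET);
    g.Pin       = GPIO_PIN_4;
    g.Mode      = GPIO_MODE_OUTPUT_PP;
    g.Alternate = 0;
    HAL_GPIO_Init(GPIOA, &g);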

    // TX: DMA1_Channel3
    hdma_spi1_tx.Instance                 = DMA1_Channel3;
    hdma_spi1_tx.Init.Request             = DMA_REQUEST_SPI1_TX;   // ID 11
    hdma_spi1_tx.Init.Direction           = DMA_MEMORY_TO_PERIPH;
    hdma_spi1_tx.Init.PeriphInc           = DMA_PINC_DISABLE;
    hdma_spi1_tx.Init.MemInc              = DMA_MINC_ENABLE;
    hdma_spi1_tx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
    hdma_spi1_tx.Init.MemDataAlignment    = DMA_MDATAALIGN_BYTE;
    hdma_spi1_tx.Init.Mode                = DMA_NORMAL;
    hdma_spi1_tx.Init.Priority            = DMA_PRIORITY_HIGH;
    HAL_DMA_Init(&hdma_spi1_tx);
    __HAL_LINKDMA(spi, hdmatx, hdma_spi1_tx);

    // RX: DMA1_Channel2
    hdma_spi1_rx.Instance                 = DMA1_Channel2;
    hdma_spi1_rx.Init.Request             = DMA_REQUEST_SPI1_RX;   // ID 10
    hdma_spi1_rx.Init.Direction           = DMA_PERIPH_TO_MEMORY;
    hdma_spi1_rx.Init.PeriphInc           = DMA_PINC_DISABLE;
    hdma_spi1_rx.Init.MemInc              = DMA_MINC_ENABLE;
    hdma_spi1_rx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
    hdma_spi1_rx.Init.MemDataAlignment    = DMA_MDATAALIGN_BYTE;
    hdma_spi1_rx.Init.Mode                = DMA_NORMAL;
    hdma_spi1_rx.Init.Priority            = DMA_PRIORITY_HIGH;
    HAL_DMA_Init(&hdma_spi1_rx);
    __HAL_LINKDMA(spi, hdmarx, hdma_spi1_rx);

    HAL_NVIC_SetPriority(DMA1_Channel2_IRQn, 5, 0);
    HAL_NVIC_EnableIRQ(DMA1_Channel2_IRQn);
    HAL_NVIC_SetPriority(DMA1_Channel3_IRQn, 5, 0);
    HAL_NVIC_EnableIRQ(DMA1_Channel3_IRQn);
}

// The DMA IRQs must forward into the HAL DMA handler:
void DMA1_Channel2_IRQHandler(void) { HAL_DMA_IRQHandler(&hdma_spi1_rx); }
void DMA1_Channel3_IRQHandler(void) { HAL_DMA_IRQHandler(&hdma_spi1_tx); }

Running transfers

spi1_hal_use.c
uint8_t tx[4] = { 0x9F, 0xFF, 0xFF, 0xFF };   // e.g. flash "read ID"
uint8_t rx[4];

// Software NSS -> toggle your CS GPIO around the call.
void read_id(void)
{
    HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_RESET);   // CS low

    // Blocking full-duplex:
    HAL_SPI_TransmitReceive(&hspi1, tx, rx, 4, HAL_MAX_DELAY);

    HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_SET);     // CS high
}

// Non-blocking full-duplex over DMA:
void read_id_dma(void)
{
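    HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_RESET);       // CS low
    if (HAL_SPI_TransmitReceive_DMA(&hspi1, tx, rx, 4) != HAL_OK) {
        HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_SET);     // start failed: release CS
    }
    // Otherwise return; completion is signalled in the callback below.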
}

// Fires when the DMA full-duplex transfer completes.
void HAL_SPI_TxRxCpltCallback(SPI_HandleTypeDef *spi)
{
    if (spi->Instance == SPI1)
        HAL_GPIO_WritePin(GPIOA, GPIO_PIN_4, GPIO_PIN_SET); // CS high
}

void HAL_SPI_ErrorCallback(SPI_HandleTypeDef *spi)
{
    // Inspect spi->ErrorCode: HAL_SPI_ERROR_OVR / _MODF / _DMA ...
}
HAL vs registers

For a full-duplex transfer use HAL_SPI_TransmitReceive[_DMA], even if you only care about RX: the master must clock TX to receive. HAL_SPI_Receive alone works in 2-line master mode, but internally it calls HAL_SPI_TransmitReceive with the receive buffer as the TX source, so whatever that buffer held goes out on MOSI. The Request field in the DMA init is what programs the DMAMUX; get it wrong and the channel never sees the request it is waiting for.

08 Gotchas & common mistakes

Nearly every "SPI doesn't work" bug on the L4R5 is one of the following. They are ordered roughly by how often they bite.

16-bit write on 8-bit data: Writing SPIx->DR as a 32/16-bit value with DS=8-bit pushes two bytes. Always cast to volatile uint8_t* and set FRXTH=1 so RXNE fires per byte.
MODF right after enabling: A master with SSM=0/SSOE=0 and a low NSS pin sets MODF, which clears SPE and MSTR, so the peripheral silently stops. For a single-master design use SSM=1, SSI=1 (software NSS).
Raising CS too early: TXE/RXNE only track the FIFO, not the shift register. Wait for BSY=0 before deasserting chip-select, or the last bits get truncated.
Configuring while SPE=1: CPOL, CPHA, BR, DS, MSTR, LSBFIRST and the NSS bits must only be changed while the SPI is disabled. Change them, then set SPE.
Wrong DMAMUX request ID: SPI1_TX is 11 and SPI1_RX is 10 (not the other way round). With a wrong ID the channel is triggered by the wrong request line, or never triggered at all, and the transfer corrupts data or hangs. In HAL, set Init.Request, not just the channel.
TX-before-RX in DMA: Enabling TXDMAEN before the RX channel is armed can lose the first received byte and set OVR. Arm RX first, then TX.
GPIO speed too low: Default GPIO speed rounds edges; above a few MHz set OSPEEDR to very-high-speed (0b11) on SCK/MOSI, or the slave samples garbage.
Forgetting the RCC read-back: Writing a peripheral register on the very next instruction after enabling its clock can be dropped. Do a dummy read of the RCC enable register first.
Overrun (OVR) on read-only paths: If you clock bytes but never read DR, RXNE stays set and OVR latches. To clear OVR, read DR then read SR. Always drain RX even for "write-only" transfers.
Wrong mode for the slave: CPOL/CPHA must match the slave exactly. A device wanting mode 3 will return shifted/garbage data in mode 0. Check the slave's timing diagram, not just "SPI mode 0".
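Several of the failures above show up as a polling loop that never exits. One way to make them visible during bring-up is to bound every wait, as in the sketch below. spi_wait_flag is a hypothetical helper, and max_polls is a crude iteration count rather than a calibrated time; a real driver might use a timer tick instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Poll *sr until the bits in `mask` are set (want_set = true) or all
 * clear (want_set = false). Gives up after max_polls reads.
 * Returns true on success, false on timeout. */
bool spi_wait_flag(const volatile uint32_t *sr, uint32_t mask,
                   bool want_set, uint32_t max_polls)
{
    while (max_polls--) {
        bool is_set = (*sr & mask) != 0u;
        if (is_set == want_set)
            return true;
    }
    return false;
}
```

On the target a call might look like spi_wait_flag(&SPI1->SR, SPI_SR_RXNE, true, 100000u), with a false return routed to an error path that raises CS and inspects OVR and MODF instead of hanging.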

Correct disable sequence (RM0432)

To stop the SPI without truncating the last frame, follow the reference-manual order rather than just clearing SPE:

spi_disable.c
static void spi1_disable(void)
{
    while ((SPI1->SR & SPI_SR_FTLVL) != 0) { }   // 1) TX FIFO drained
    while (SPI1->SR & SPI_SR_BSY)        { }      // 2) last frame shifted out
    SPI1->CR1 &= ~SPI_CR1_SPE;                    // 3) disable

    // 4) flush any remaining RX bytes
    while ((SPI1->SR & SPI_SR_FRLVL) != 0)
        (void)*(volatile uint8_t *)&SPI1->DR;
}
Clocks & power modes

SPI2/SPI3 run from PCLK1; if you gate APB1 in a low-power mode they stop. Also, after wake-up from Stop mode the PLL is off and the system clock restarts from MSI (or HSI16 if RCC_CFGR.STOPWUCK = 1), so PCLK changes. Re-check your BR prescaler so fSCK stays in range for the slave.