01 DMA + DMAMUX architecture on STM32L4+
The STM32L4R5 (Cortex-M4F, RM0432) has two general-purpose DMA controllers, DMA1 and DMA2, each with 7 independent channels (14 channels total). On this L4+ device the channels are decoupled from the peripherals by a request router called DMAMUX1.
Why DMAMUX exists
On older STM32 parts each peripheral DMA request was tied to fixed DMA hardware. On the STM32F1 it was hard-wired to one channel. On the STM32F4 (CHSEL) and the classic STM32L4 (the CxS fields of DMA_CSELR) you could only pick from a small fixed set per stream or channel. As a result, two peripherals that needed the same channel could not both use DMA. STM32L4+ inserts DMAMUX1 between the ~94 peripheral request lines and the 14 DMA channels, so any request line can be routed to any channel. You no longer look up "which channel does USART1_TX live on"; you pick a free channel and program its DMAMUX request ID.
```text
Peripheral (ADC1, USART2, SPI1 …)
      │  request line (fixed ID, e.g. ADC1 = 5)
      ▼
DMAMUX1  ── one CxCR per DMA channel selects DMAREQ_ID
      │  routed request
      ▼
DMA1 / DMA2 channel  ── CCR/CNDTR/CPAR/CMAR move the data
      │  AHB master
      ▼
SRAM  ⇄  Peripheral data register
```
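Because any request line can reach any channel, a driver only needs a free-channel allocator instead of a per-peripheral lookup table. The host-testable sketch below uses hypothetical names (`dma_slot`, `slot_from_mux`, `alloc_mux_channel`, and a 14-bit `in_use` mask whose bit i stands for DMAMUX1 channel i). It assumes the fixed DMAMUX-to-DMA mapping tabulated later in this section (mux 0…6 = DMA1 Ch1…7, mux 7…13 = DMA2 Ch1…7).

```c
#include <assert.h>
#include <stdint.h>

#define MUX_CHANNELS 14u   /* DMAMUX1 channels 0..13 */

/* Hypothetical descriptor for one DMA channel. */
typedef struct {
    int dma;       /* 1 = DMA1, 2 = DMA2 */
    int channel;   /* 1..7 */
} dma_slot;

/* Inverse of the fixed mapping: mux 0..6 -> DMA1 Ch1..7, mux 7..13 -> DMA2 Ch1..7. */
static dma_slot slot_from_mux(unsigned mux)
{
    assert(mux < MUX_CHANNELS);
    dma_slot s = { mux < 7u ? 1 : 2, (int)(mux % 7u) + 1 };
    return s;
}

/* Claim the lowest free mux channel (DMA1 first, then DMA2).
   Returns its index, or -1 when all 14 are in use. */
static int alloc_mux_channel(uint16_t *in_use)
{
    for (unsigned i = 0; i < MUX_CHANNELS; i++) {
        if ((*in_use & (1u << i)) == 0u) {
            *in_use = (uint16_t)(*in_use | (1u << i));
            return (int)i;
        }
    }
    return -1;
}
```

On the target, the returned mux index is also the CxCR that receives the peripheral's request ID.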
Memory map (DMA blocks on AHB1; ADC1 on AHB2; USART2 on APB1)
| Block | Base address | Notes |
|---|---|---|
| DMA1 | 0x4002 0000 | ISR @+0x00, IFCR @+0x04, Ch1 regs @+0x08 |
| DMA2 | 0x4002 0400 | same layout as DMA1 |
| DMAMUX1 | 0x4002 0800 | C0CR @+0x00 … C13CR @+0x34 |
| RCC | 0x4002 1000 | AHB1ENR gates DMA1/DMA2/DMAMUX1 clocks |
| DMA1_Channel1 | 0x4002 0008 | each channel block = 0x14 bytes |
| ADC1 | 0x5004 0000 | DR @+0x40 → 0x5004 0040 (DMA source) |
| USART2 | 0x4000 4400 | RDR @+0x24, TDR @+0x28 |
DMAMUX channel ↔ DMA channel mapping
DMAMUX1 has one control register (CxCR) per DMA channel. The index is fixed:
| DMAMUX channel | Drives | Register |
|---|---|---|
| 0 … 6 | DMA1 Channel 1 … 7 | DMAMUX1_Channel0 … 6 |
| 7 … 13 | DMA2 Channel 1 … 7 | DMAMUX1_Channel7 … 13 |
Formula: dmamux_index = (dma == DMA2 ? 7 : 0) + (channel - 1), with channel in 1…7.
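You can check the formula against the table by evaluating it on a host. This is a minimal sketch. `dmamux_index()` and `dmamux_cxcr_addr()` are hypothetical helpers, not CMSIS functions. The base address and the 4-byte CxCR stride come from the memory map above.

```c
#include <assert.h>
#include <stdint.h>

#define DMAMUX1_BASE_ADDR  0x40020800u  /* DMAMUX1 base from the memory map */
#define DMAMUX_CXCR_STRIDE 4u           /* one 32-bit CxCR per DMA channel */

/* dma: 1 or 2; channel: 1..7 (reference-manual numbering).
   Returns the DMAMUX1 channel index 0..13. */
static uint32_t dmamux_index(int dma, int channel)
{
    assert(dma == 1 || dma == 2);
    assert(channel >= 1 && channel <= 7);
    return (dma == 2 ? 7u : 0u) + (uint32_t)(channel - 1);
}

/* Absolute address of the CxCR register that routes this DMA channel. */
static uint32_t dmamux_cxcr_addr(int dma, int channel)
{
    return DMAMUX1_BASE_ADDR + DMAMUX_CXCR_STRIDE * dmamux_index(dma, channel);
}
```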
Enable the clocks (register level)
Three clock bits live in RCC->AHB1ENR. Forgetting DMAMUX1EN is the single most common "my DMA does nothing" bug.
```c
#include "stm32l4r5xx.h"

/* RCC->AHB1ENR: DMA1EN = bit 0, DMA2EN = bit 1, DMAMUX1EN = bit 2 */
RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN      /* controller (add RCC_AHB1ENR_DMA2EN if you use DMA2) */
              | RCC_AHB1ENR_DMAMUX1EN;  /* the router: do NOT forget */
(void)RCC->AHB1ENR;                     /* read back so the clock runs before the first DMA access */
```
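A missing DMAMUX1EN produces no error, so a cheap sanity check on a snapshot of AHB1ENR can catch it early. This sketch uses hypothetical names, and the bit positions are the ones quoted in the comment above. On the target you might call it as `assert(dma_clocks_ok(RCC->AHB1ENR, 1));`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in RCC->AHB1ENR (hypothetical macro names). */
#define AHB1ENR_DMA1EN    (1u << 0)
#define AHB1ENR_DMA2EN    (1u << 1)
#define AHB1ENR_DMAMUX1EN (1u << 2)

/* True if the snapshot clocks both the chosen controller and the router. */
static bool dma_clocks_ok(uint32_t ahb1enr, int dma)
{
    assert(dma == 1 || dma == 2);
    uint32_t need = AHB1ENR_DMAMUX1EN | (dma == 1 ? AHB1ENR_DMA1EN : AHB1ENR_DMA2EN);
    return (ahb1enr & need) == need;
}
```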
This section
- DMA1 & DMA2, 7 channels each; DMAMUX1 routes any of ~94 request lines to any channel.
- DMAMUX channel index = (DMA2?7:0) + (channel−1); DMA1.Ch1→mux0, DMA2.Ch7→mux13.
- Enable DMA1/DMA2 and DMAMUX1 in RCC→AHB1ENR (bits 0/1/2).
02 DMA channel registers: CCR / CNDTR / CPAR / CMAR
Each channel is described by exactly four 32-bit registers. The controller adds two shared registers, ISR (status) and IFCR (flag clear). Channel n registers sit at DMAx_BASE + 0x08 + 0x14·(n−1).
| Register | Offset | Meaning |
|---|---|---|
| CCR | +0x00 | Channel configuration (direction, sizes, increment, circular, IRQ enables, EN) |
| CNDTR | +0x04 | Number of data items to transfer (0…65535). Counts down; read-only while EN=1 |
| CPAR | +0x08 | Peripheral address — usually &PERIPH->DR (source for P2M, dest for M2P) |
| CMAR | +0x0C | Memory address — your SRAM buffer |
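The offset formula can be sketched as a host-side address calculator. The helper and constant names are hypothetical. The base addresses are the DMA1/DMA2 entries from the memory map, and the register offsets are the ones in the table above.

```c
#include <assert.h>
#include <stdint.h>

#define DMA1_BASE_ADDR 0x40020000u
#define DMA2_BASE_ADDR 0x40020400u

/* Per-channel register offsets from the table above. */
enum { OFF_CCR = 0x00, OFF_CNDTR = 0x04, OFF_CPAR = 0x08, OFF_CMAR = 0x0C };

/* Address of register `off` of channel ch (1..7):
   base + 0x08 + 0x14 * (ch - 1) + off. */
static uint32_t dma_chan_reg(uint32_t dma_base, int ch, uint32_t off)
{
    assert(ch >= 1 && ch <= 7);
    assert(off <= OFF_CMAR);
    return dma_base + 0x08u + 0x14u * (uint32_t)(ch - 1) + off;
}
```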
CCR bit fields
| Bits | Field | Function |
|---|---|---|
| 0 | EN | Channel enable. Set last. Most other fields are read-only while EN=1 |
| 1 | TCIE | Transfer-complete interrupt enable |
| 2 | HTIE | Half-transfer interrupt enable |
| 3 | TEIE | Transfer-error interrupt enable |
| 4 | DIR | Direction: 0 = read from peripheral (P2M), 1 = read from memory (M2P) |
| 5 | CIRC | Circular mode — CNDTR auto-reloads, transfer never stops |
| 6 | PINC | Peripheral address increment (normally 0 for a data register) |
| 7 | MINC | Memory address increment (normally 1 to walk a buffer) |
| 9:8 | PSIZE | Peripheral data size: 00=8-bit, 01=16-bit, 10=32-bit |
| 11:10 | MSIZE | Memory data size: 00=8-bit, 01=16-bit, 10=32-bit |
| 13:12 | PL | Priority: 00=low, 01=medium, 10=high, 11=very high |
| 14 | MEM2MEM | Memory-to-memory mode (channel runs freely, no hardware request) |
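The following host-side composer shows how the fields pack into one 32-bit word. The `CCR_*` macros and `ccr_compose()` are hypothetical names for the bit positions in the table; the device header defines its own `DMA_CCR_*` masks for the same bits. The two test values correspond to configurations used in the worked examples: the ADC one (MINC, CIRC, 16-bit/16-bit, PL=high) and the UART TX one (DIR, MINC, TCIE).

```c
#include <assert.h>
#include <stdint.h>

/* CCR bit positions from the table above (hypothetical macro names). */
#define CCR_TCIE     (1u << 1)
#define CCR_HTIE     (1u << 2)
#define CCR_TEIE     (1u << 3)
#define CCR_DIR      (1u << 4)
#define CCR_CIRC     (1u << 5)
#define CCR_MINC     (1u << 7)
#define CCR_PSIZE(s) ((uint32_t)(s) << 8)   /* 0 = 8, 1 = 16, 2 = 32-bit */
#define CCR_MSIZE(s) ((uint32_t)(s) << 10)
#define CCR_PL(p)    ((uint32_t)(p) << 12)  /* 0 = low .. 3 = very high */

/* Compose a CCR value with EN = 0; EN is set afterwards in a separate write.
   PINC is left at 0, as is usual for a peripheral data register. */
static uint32_t ccr_compose(int mem_to_periph, int circular, int minc,
                            unsigned psize, unsigned msize, unsigned pl,
                            uint32_t irq_bits)
{
    assert(psize <= 2u && msize <= 2u && pl <= 3u);
    assert((irq_bits & ~(CCR_TCIE | CCR_HTIE | CCR_TEIE)) == 0u);
    return (mem_to_periph ? CCR_DIR : 0u) | (circular ? CCR_CIRC : 0u)
         | (minc ? CCR_MINC : 0u) | CCR_PSIZE(psize) | CCR_MSIZE(msize)
         | CCR_PL(pl) | irq_bits;
}
```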
Sizes, direction and CNDTR — the practical rules
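Match each size field to the width of what it addresses: uint16_t buf[] → MSIZE=01, uint8_t buf[] → MSIZE=00, and PSIZE follows the peripheral data register (ADC1->DR is 16-bit, USART RDR/TDR are 8-bit). If PSIZE≠MSIZE, this DMA does not pack or unpack. Each item is zero-extended when the source is narrower than the destination, and truncated to its low-order bits when the source is wider.

CPAR, CMAR and CNDTR must be written only while the channel is disabled (EN=0). CNDTR is read-only while EN=1, so a new count written then is silently lost. Always clear EN and spin until it reads back 0 before reprogramming.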
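The width rules can be modelled on a host. This sketch assumes the per-item behaviour described above: zero-extension or truncation, no packing. `dma_width_convert()`, `size_code()` and `DMA_COUNT()` are hypothetical helpers. `DMA_COUNT()` only works on true arrays, not on pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value the DMA writes when one item moves from a src_bits-wide port to a
   dst_bits-wide port: zero-extend or keep the low-order bits, never pack. */
static uint32_t dma_width_convert(uint32_t value, unsigned src_bits, unsigned dst_bits)
{
    assert(src_bits == 8u || src_bits == 16u || src_bits == 32u);
    assert(dst_bits == 8u || dst_bits == 16u || dst_bits == 32u);
    uint32_t src = (src_bits == 32u) ? value : (value & ((1u << src_bits) - 1u));
    return (dst_bits == 32u) ? src : (src & ((1u << dst_bits) - 1u));
}

/* PSIZE/MSIZE code for an element of `bytes` bytes: 1 -> 00, 2 -> 01, 4 -> 10. */
static unsigned size_code(size_t bytes)
{
    assert(bytes == 1u || bytes == 2u || bytes == 4u);
    return bytes == 1u ? 0u : (bytes == 2u ? 1u : 2u);
}

/* CNDTR counts items, not bytes. */
#define DMA_COUNT(buf) ((uint16_t)(sizeof(buf) / sizeof((buf)[0])))
```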
03 DMAMUX1 request routing & request-line table
To connect a peripheral to a channel you write the peripheral's request ID into the DMAREQ_ID[7:0] field of the DMAMUX CxCR register that corresponds to that channel. That single write is the whole "routing" step.
DMAMUX CxCR bit fields
| Bits | Field | Function |
|---|---|---|
| 7:0 | DMAREQ_ID | Selected request line (see table below). This is all you need for plain peripheral DMA |
| 8 | SOIE | Synchronization overrun interrupt enable |
| 9 | EGE | Event generation enable (drive a request-generator trigger) |
| 16 | SE | Synchronization enable — gate requests on a sync input |
| 18:17 | SPOL | Sync edge polarity (00 none, 01 rising, 10 falling, 11 both) |
| 23:19 | NBREQ | Number of requests to forward per sync event (minus 1) |
| 28:24 | SYNC_ID | Sync input selection (used only when SE=1) |
For the vast majority of transfers you leave SE=0 and EGE=0 and write only DMAREQ_ID, i.e. DMAMUX1_ChannelN->CCR = request;.
DMAMUX1 request-line numbers (STM32L4R5)
The L4R5 has a single ADC (ADC1 only), so the request IDs are not shifted by the optional ADC2 slot. IDs run 0…93. Request 0 is memory-to-memory (no hardware trigger); 1–4 are the DMAMUX request-generator outputs; 5 and up are peripherals.
| ID | Request line | ID | Request line |
|---|---|---|---|
| 0 | MEM2MEM (no request) | 25 | USART1_TX |
| 1–4 | DMAMUX req generator 0–3 | 26 | USART2_RX |
| 5 | ADC1 | 27 | USART2_TX |
| 6 | DAC1_CH1 | 28 | USART3_RX |
| 7 | DAC1_CH2 | 29 | USART3_TX |
| 8 | TIM6_UP | 30 | UART4_RX |
| 9 | TIM7_UP | 31 | UART4_TX |
| 10 | SPI1_RX | 32 | UART5_RX |
| 11 | SPI1_TX | 33 | UART5_TX |
| 12 | SPI2_RX | 34 | LPUART1_RX |
| 13 | SPI2_TX | 35 | LPUART1_TX |
| 14 | SPI3_RX | 36–37 | SAI1_A / SAI1_B |
| 15 | SPI3_TX | 38–39 | SAI2_A / SAI2_B |
| 16 | I2C1_RX | 40–41 | OCTOSPI1 / OCTOSPI2 |
| 17 | I2C1_TX | 42–48 | TIM1 CH1..4/UP/TRIG/COM |
| 18 | I2C2_RX | 49–55 | TIM8 CH1..4/UP/TRIG/COM |
| 19 | I2C2_TX | 56–60 | TIM2 CH1..4/UP |
| 20 | I2C3_RX | 61–66 | TIM3 CH1..4/UP/TRIG |
| 21 | I2C3_TX | 67–71 | TIM4 CH1..4/UP |
| 22 | I2C4_RX | 72–77 | TIM5 CH1..4/UP/TRIG |
| 23 | I2C4_TX | 78–85 | TIM15/16/17 |
| 24 | USART1_RX | 86–89 | DFSDM1_FLT0..3 |
| 90 | DCMI / PSSI | 91–93 | AES_IN / AES_OUT / HASH_IN |
These IDs match the HAL macros in stm32l4xx_hal_dma.h (DMA_REQUEST_ADC1 = 5, DMA_REQUEST_USART2_TX = 27, and so on) and the RM0432 DMAMUX assignment table. Always cross-check against the reference manual for your exact part number.
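Within the serial blocks, the table follows a regular RX/TX pattern that some drivers exploit instead of a long switch. This sketch uses hypothetical helper names. The closed forms are read off the STM32L4R5 table above only, so a part whose IDs are shifted (for example by an ADC2 slot) would need different offsets. The vendor `DMA_REQUEST_*` macros remain the safer source.

```c
#include <assert.h>
#include <stdint.h>

/* SPI1..SPI3: RX = 10, 12, 14; TX = RX + 1. */
static uint8_t req_spi(int n, int tx)
{
    assert(n >= 1 && n <= 3);
    return (uint8_t)(10 + 2 * (n - 1) + (tx ? 1 : 0));
}

/* I2C1..I2C4: RX = 16, 18, 20, 22; TX = RX + 1. */
static uint8_t req_i2c(int n, int tx)
{
    assert(n >= 1 && n <= 4);
    return (uint8_t)(16 + 2 * (n - 1) + (tx ? 1 : 0));
}

/* USART1..3 and UART4..5 (n = 1..5), LPUART1 (n = 6): RX = 24 + 2(n-1). */
static uint8_t req_uart(int n, int tx)
{
    assert(n >= 1 && n <= 6);
    return (uint8_t)(24 + 2 * (n - 1) + (tx ? 1 : 0));
}
```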
Sync & request generator (brief)
Synchronization (SE=1) forwards a batch of NBREQ+1 requests only after an edge on the selected SYNC_ID input. This is useful for aligning a DMA burst to a timer or external event, and it is configured entirely in the channel's own CxCR (SE, SPOL, NBREQ, SYNC_ID). The request generator (request IDs 1–4) turns trigger events, such as an EXTI line or a timer output, into DMA requests, so a channel can be paced by a signal that has no native DMA request line. Its RGxCR registers live in a separate block at DMAMUX1 + 0x100. Leave both off for ordinary peripheral streaming.
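All synchronization fields sit in the channel's CxCR, so enabling sync is still a single register write. This host-side sketch uses hypothetical macro and function names built from the CxCR field table above. The SYNC_ID value in the test is a placeholder, because the sync input each ID selects must be looked up in RM0432.

```c
#include <assert.h>
#include <stdint.h>

/* DMAMUX CxCR field positions from the table above (hypothetical names). */
#define CXCR_SE          (1u << 16)
#define CXCR_SPOL(p)     ((uint32_t)(p) << 17)  /* 0 none, 1 rising, 2 falling, 3 both */
#define CXCR_NBREQ(n)    ((uint32_t)(n) << 19)  /* requests per sync event, minus 1 */
#define CXCR_SYNC_ID(id) ((uint32_t)(id) << 24)

/* Plain routing: only DMAREQ_ID, SE = EGE = 0. */
static uint32_t cxcr_plain(uint8_t request)
{
    return request;
}

/* Synchronized routing: forward `batch` requests (1..32) per sync edge. */
static uint32_t cxcr_synced(uint8_t request, unsigned sync_id,
                            unsigned spol, unsigned batch)
{
    assert(batch >= 1u && batch <= 32u);
    assert(spol <= 3u && sync_id <= 31u);
    return (uint32_t)request | CXCR_SE | CXCR_SPOL(spol)
         | CXCR_NBREQ(batch - 1u) | CXCR_SYNC_ID(sync_id);
}
```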
04 Programming sequence: the transfer recipe
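Every DMA setup on this device follows the same ten steps. Do them in order and the transfer just works. Skip step 1 (the DMAMUX1 clock in particular) or step 10, and the channel silently does nothing. Skip step 2 when reprogramming a channel that is still running, and the new addresses and count never take effect.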
| # | Step |
|---|---|
| 1 | Enable clocks: DMAx + DMAMUX1 + the peripheral |
| 2 | Disable the channel (CCR.EN=0) and spin until EN reads 0 |
| 3 | Clear this channel's flags in DMAx→IFCR |
| 4 | Write CPAR (peripheral data-register address) |
| 5 | Write CMAR (buffer address) |
| 6 | Write CNDTR (element count) |
| 7 | Route the request: DMAMUX1_ChannelN→CCR = request ID |
| 8 | Write CCR: DIR, PINC/MINC, PSIZE/MSIZE, PL, CIRC, IRQ enables (EN still 0) |
| 9 | Set CCR.EN = 1 |
| 10 | Enable the peripheral's DMA request bit (ADC DMAEN, USART DMAT/DMAR, SPI TXDMAEN/RXDMAEN) |
Reusable register-level helper
```c
#include "stm32l4r5xx.h"
#include <stdint.h>

/* DMAMUX CxCR that drives a given DMA channel (ch = 1..7).
 *   DMA1 Ch1..7 -> DMAMUX1 Channel 0..6
 *   DMA2 Ch1..7 -> DMAMUX1 Channel 7..13 */
static inline DMAMUX_Channel_TypeDef *dmamux_for(DMA_TypeDef *dma, uint8_t ch)
{
    uint32_t idx = (dma == DMA2 ? 7u : 0u) + (uint32_t)(ch - 1u);
    return DMAMUX1_Channel0 + idx;                /* 4 bytes per CxCR */
}

/* Register block for DMA channel ch (1..7). */
static inline DMA_Channel_TypeDef *dma_ch(DMA_TypeDef *dma, uint8_t ch)
{
    uint32_t base = (uint32_t)dma + 0x08u + 0x14u * (uint32_t)(ch - 1u);
    return (DMA_Channel_TypeDef *)base;
}

/* One-shot or circular transfer setup. `ccr_flags` carries DIR, MINC,
 * PSIZE/MSIZE, PL, CIRC and any *IE bits, but NOT the EN bit.
 * `mem` is volatile-qualified so volatile DMA buffers pass without a cast. */
void dma_setup(DMA_TypeDef *dma, uint8_t ch, uint8_t request,
               volatile void *periph, volatile void *mem, uint16_t count,
               uint32_t ccr_flags)
{
    DMA_Channel_TypeDef *c = dma_ch(dma, ch);

    c->CCR &= ~DMA_CCR_EN;                         /* (2) disable */
    while (c->CCR & DMA_CCR_EN) { }                /*     wait off */
    dma->IFCR = 0xFu << (4u * (ch - 1u));          /* (3) clear GIF/TCIF/HTIF/TEIF */
    c->CPAR  = (uint32_t)periph;                   /* (4) periph addr */
    c->CMAR  = (uint32_t)mem;                      /* (5) buffer addr */
    c->CNDTR = count;                              /* (6) item count */
    dmamux_for(dma, ch)->CCR = request;            /* (7) route request */
    c->CCR = ccr_flags;                            /* (8) config, EN=0 */
    c->CCR |= DMA_CCR_EN;                          /* (9) go */
}
```
Step 10 (the peripheral's own DMA-request enable) lives in the peripheral driver, not in dma_setup() — the two examples below show exactly where it goes.
05 Worked example: ADC1 + DMA (circular scan)
Continuously scan three ADC channels into an SRAM buffer with zero CPU involvement. ADC1 raises request ID 5, routed to DMA1 Channel 1 (DMAMUX channel 0). Circular DMA + continuous conversion means the buffer stays fresh forever.
```c
#include "stm32l4r5xx.h"
#include <stdint.h>

#define NUM_CH 3u

static volatile uint16_t adc_buf[NUM_CH];   /* 12-bit samples, halfwords */

static void clocks_init(void)
{
    RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN | RCC_AHB1ENR_DMAMUX1EN;
    RCC->AHB2ENR |= RCC_AHB2ENR_ADCEN;              /* ADC is on AHB2 */
    (void)RCC->AHB2ENR;

    /* Clock the ADC synchronously from HCLK/4 (CKMODE=11) so no
       RCC ADCSEL kernel-clock selection is required. */
    ADC1_COMMON->CCR |= ADC_CCR_CKMODE_0 | ADC_CCR_CKMODE_1;   /* 11 */
}

static void adc_enable(void)
{
    ADC1->CR &= ~ADC_CR_DEEPPWD;                    /* leave deep-power-down */
    ADC1->CR |= ADC_CR_ADVREGEN;                    /* turn on ADC regulator */
    for (volatile int i = 0; i < 4000; i++) { }     /* > t_ADCVREG_STUP (~20us) */

    ADC1->CR &= ~ADC_CR_ADCALDIF;                   /* single-ended calibration */
    ADC1->CR |= ADC_CR_ADCAL;
    while (ADC1->CR & ADC_CR_ADCAL) { }             /* wait for cal to finish */

    ADC1->ISR = ADC_ISR_ADRDY;                      /* clear ADRDY (rc_w1) */
    while (!(ADC1->ISR & ADC_ISR_ADRDY)) {
        /* ADEN written less than 4 ADC clocks after ADCAL clears is reset
           by hardware, so keep setting it until ADRDY appears. */
        if (!(ADC1->CR & ADC_CR_ADEN))
            ADC1->CR |= ADC_CR_ADEN;                /* enable the ADC */
    }
}

static void adc_dma_init(void)
{
    /* --- regular sequence: 3 conversions, channels 1,2,3 --- */
    ADC1->SQR1 = ((NUM_CH - 1u) << ADC_SQR1_L_Pos)  /* L = length-1 */
               | (1u << ADC_SQR1_SQ1_Pos)
               | (2u << ADC_SQR1_SQ2_Pos)
               | (3u << ADC_SQR1_SQ3_Pos);
    ADC1->SMPR1 = 0x3FFFFFFFu;                      /* long sample time, ch0..9 */

    /* continuous + DMA + circular DMA (DMACFG=1) + overrun overwrite */
    ADC1->CFGR = ADC_CFGR_CONT | ADC_CFGR_DMAEN
               | ADC_CFGR_DMACFG | ADC_CFGR_OVRMOD;

    /* --- DMA1 Channel 1: ADC1->DR (16-bit) -> adc_buf, circular --- */
    DMA_Channel_TypeDef *c = DMA1_Channel1;
    c->CCR &= ~DMA_CCR_EN;
    while (c->CCR & DMA_CCR_EN) { }
    DMA1->IFCR = 0xFu;                              /* clear ch1 flags */
    c->CPAR  = (uint32_t)&ADC1->DR;                 /* source: ADC data reg */
    c->CMAR  = (uint32_t)adc_buf;                   /* dest: SRAM buffer */
    c->CNDTR = NUM_CH;                              /* 3 halfword items */
    DMAMUX1_Channel0->CCR = 5u;                     /* DMA1 Ch1 = mux0, req = ADC1 */
    c->CCR = DMA_CCR_MINC                           /* walk the buffer */
           | DMA_CCR_CIRC                           /* wrap forever */
           | DMA_CCR_PSIZE_0                        /* PSIZE=01 -> 16-bit */
           | DMA_CCR_MSIZE_0                        /* MSIZE=01 -> 16-bit */
           | DMA_CCR_PL_1;                          /* priority = high (10) */
    /* DIR left 0 = read from peripheral (P2M); PINC left 0 */
    c->CCR |= DMA_CCR_EN;                           /* enable channel */
}

int main(void)
{
    clocks_init();
    adc_enable();
    adc_dma_init();
    ADC1->CR |= ADC_CR_ADSTART;                     /* start regular conversions */

    for (;;) {
        /* adc_buf[0..2] is refreshed continuously by DMA; read it anytime. */
    }
}
```
Order matters here. ADC_CFGR_DMAEN (step 10) is set before ADSTART, and the DMA channel is enabled before the ADC starts converting. A result produced before the DMA is armed is lost. OVRMOD=1 (set above) makes the ADC overwrite DR on an overrun instead of stalling, so the DMA stream keeps running.
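The example keeps exactly one result per channel. Suppose you instead make the buffer and CNDTR a multiple of NUM_CH (an assumption, not what the code above does). The DMA then stores the ranks interleaved: SQ1, SQ2, SQ3, SQ1, … The host-testable sketch below shows the indexing and uses hypothetical helper names. Rank 0 corresponds to SQ1 (channel 1 in the example).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rank r of sequence k sits at k * num_ch + r in the DMA buffer. */
static size_t adc_slot(size_t seq, size_t rank, size_t num_ch)
{
    assert(rank < num_ch);
    return seq * num_ch + rank;
}

/* Integer mean of one rank across all captured sequences. */
static uint32_t adc_rank_mean(const volatile uint16_t *buf, size_t num_seq,
                              size_t num_ch, size_t rank)
{
    assert(num_seq > 0u);
    uint32_t sum = 0;
    for (size_t k = 0; k < num_seq; k++)
        sum += buf[adc_slot(k, rank, num_ch)];
    return sum / (uint32_t)num_seq;
}
```

In a live circular buffer the DMA may be mid-way through a sequence while you read it, so averaging is usually done from the HT/TC halves rather than at arbitrary times.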
06 Worked example: USART/UART + DMA (TX & RX)
UART is the canonical mem2periph (TX) and periph2mem (RX) case. TX = memory→peripheral (DIR=1), RX = peripheral→memory (DIR=0). The data are bytes, so PSIZE=MSIZE=8-bit (00, the CCR reset value). Here RX runs circular into a ring buffer while TX is fired on demand.
```c
#include "stm32l4r5xx.h"
#include <stdint.h>

#define RX_LEN 64u

static volatile uint8_t rx_buf[RX_LEN];    /* circular RX ring */

/* Assumes USART2 already has baud/framing set and UE/TE/RE enabled. */
void uart_dma_init(void)
{
    RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN | RCC_AHB1ENR_DMAMUX1EN;

    /* ---- RX: USART2->RDR -> rx_buf, circular ----
       DMA1 Ch6 == DMAMUX1 Channel 5, request 26 = USART2_RX */
    DMA1_Channel6->CCR &= ~DMA_CCR_EN;
    while (DMA1_Channel6->CCR & DMA_CCR_EN) { }
    DMA1->IFCR = 0xFu << (4u * (6u - 1u));              /* clear ch6 flags */
    DMA1_Channel6->CPAR  = (uint32_t)&USART2->RDR;
    DMA1_Channel6->CMAR  = (uint32_t)rx_buf;
    DMA1_Channel6->CNDTR = RX_LEN;
    DMAMUX1_Channel5->CCR = 26u;                        /* USART2_RX */
    DMA1_Channel6->CCR = DMA_CCR_MINC | DMA_CCR_CIRC;   /* 8-bit sizes = reset */
    DMA1_Channel6->CCR |= DMA_CCR_EN;

    /* ---- TX: only route the channel now (armed per message) ----
       DMA1 Ch7 == DMAMUX1 Channel 6, request 27 = USART2_TX */
    DMAMUX1_Channel6->CCR = 27u;                        /* USART2_TX */

    /* uart_send_dma() sets TCIE, so the NVIC line must be enabled too,
       otherwise DMA1_Channel7_IRQHandler never runs. */
    NVIC_SetPriority(DMA1_Channel7_IRQn, 5);
    NVIC_EnableIRQ(DMA1_Channel7_IRQn);

    /* Step 10: let USART generate DMA requests in both directions */
    USART2->CR3 |= USART_CR3_DMAT | USART_CR3_DMAR;
}

/* Fire a memory-to-peripheral transfer of `len` bytes. A transfer still in
   flight is aborted, and `data` must stay valid until the TC interrupt. */
void uart_send_dma(const uint8_t *data, uint16_t len)
{
    DMA_Channel_TypeDef *c = DMA1_Channel7;
    c->CCR &= ~DMA_CCR_EN;
    while (c->CCR & DMA_CCR_EN) { }
    DMA1->IFCR = 0xFu << (4u * (7u - 1u));              /* clear ch7 flags */
    c->CPAR  = (uint32_t)&USART2->TDR;
    c->CMAR  = (uint32_t)data;
    c->CNDTR = len;
    c->CCR = DMA_CCR_DIR                                /* memory -> peripheral */
           | DMA_CCR_MINC                               /* walk the buffer */
           | DMA_CCR_TCIE;                              /* IRQ when done */
    c->CCR |= DMA_CCR_EN;
}

/* TX-complete: DMA1 Channel 7 (IRQn = 17). */
void DMA1_Channel7_IRQHandler(void)
{
    if (DMA1->ISR & DMA_ISR_TCIF7) {
        DMA1->IFCR = DMA_IFCR_CTCIF7;                   /* ack TC */
        DMA1_Channel7->CCR &= ~DMA_CCR_EN;              /* stop the channel */
        /* For half-duplex/RS-485 wait for USART2->ISR & USART_ISR_TC
           before releasing the driver-enable line. */
    }
}

/* Index the RX DMA will write next (0 .. RX_LEN-1). It wraps every RX_LEN
   bytes, so it is a ring position, not a running byte count. */
uint16_t uart_rx_count(void)
{
    return (uint16_t)((RX_LEN - DMA1_Channel6->CNDTR) % RX_LEN);
}
```
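In circular mode, RX_LEN − CNDTR is the index the DMA will write next. A consumer therefore usually keeps its own read index and copies everything in between, handling the wrap. The following is a host-testable sketch. `ring_drain()` is a hypothetical helper; on the target you would pass `rx_buf`, `RX_LEN` and a fresh read of `DMA1_Channel6->CNDTR`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the bytes the DMA has written since the last call.
   ring, len : the circular DMA buffer and its length (the CNDTR reload value)
   cndtr     : a fresh read of the channel's CNDTR
   tail      : the reader's index, kept between calls
   Returns the number of bytes copied to out (at most max). */
static size_t ring_drain(const volatile uint8_t *ring, size_t len, uint32_t cndtr,
                         size_t *tail, uint8_t *out, size_t max)
{
    assert(len > 0u && cndtr <= len && *tail < len);
    size_t head = (len - cndtr) % len;   /* next index the DMA will write */
    size_t n = 0;
    while (*tail != head && n < max) {
        out[n++] = ring[*tail];
        *tail = (*tail + 1u) % len;
    }
    return n;
}
```

If the DMA laps the reader by a whole buffer between calls, the loss is invisible to this logic. Size the ring, or poll often enough (for example from HT/TC interrupts), so that cannot happen.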
HAL variant (same USART2 TX via DMA)
The HAL hides the channel and DMAMUX writes behind a DMA_HandleTypeDef. You still choose the channel (Instance) and the request (Init.Request = DMA_REQUEST_USART2_TX, which is 27), then link it to the UART handle.
```c
#include "stm32l4xx_hal.h"

extern UART_HandleTypeDef huart2;    /* configured elsewhere (baud, pins) */
DMA_HandleTypeDef hdma_usart2_tx;

void uart2_tx_dma_init(void)
{
    __HAL_RCC_DMA1_CLK_ENABLE();
    __HAL_RCC_DMAMUX1_CLK_ENABLE();

    hdma_usart2_tx.Instance                 = DMA1_Channel7;
    hdma_usart2_tx.Init.Request             = DMA_REQUEST_USART2_TX;   /* 27 */
    hdma_usart2_tx.Init.Direction           = DMA_MEMORY_TO_PERIPH;
    hdma_usart2_tx.Init.PeriphInc           = DMA_PINC_DISABLE;
    hdma_usart2_tx.Init.MemInc              = DMA_MINC_ENABLE;
    hdma_usart2_tx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
    hdma_usart2_tx.Init.MemDataAlignment    = DMA_MDATAALIGN_BYTE;
    hdma_usart2_tx.Init.Mode                = DMA_NORMAL;
    hdma_usart2_tx.Init.Priority            = DMA_PRIORITY_HIGH;
    HAL_DMA_Init(&hdma_usart2_tx);

    __HAL_LINKDMA(&huart2, hdmatx, hdma_usart2_tx);    /* wire handle to UART */

    HAL_NVIC_SetPriority(DMA1_Channel7_IRQn, 5, 0);
    HAL_NVIC_EnableIRQ(DMA1_Channel7_IRQn);

    /* In normal mode the HAL finishes a DMA transmit on the USART TC
       interrupt; without this IRQ huart2 stays busy after the first send. */
    HAL_NVIC_SetPriority(USART2_IRQn, 5, 0);
    HAL_NVIC_EnableIRQ(USART2_IRQn);
}

/* Route the channel IRQ into the HAL DMA state machine. */
void DMA1_Channel7_IRQHandler(void)
{
    HAL_DMA_IRQHandler(huart2.hdmatx);
}

/* Route the USART IRQ into the HAL UART state machine (TC ends the transfer). */
void USART2_IRQHandler(void)
{
    HAL_UART_IRQHandler(&huart2);
}

/* Returns HAL_BUSY if the previous transfer has not finished yet. */
HAL_StatusTypeDef app_send(const uint8_t *buf, uint16_t n)
{
    return HAL_UART_Transmit_DMA(&huart2, (uint8_t *)buf, n);
}
```
RX reads USART2->RDR (offset 0x24); TX writes USART2->TDR (offset 0x28). They are different registers — pointing CPAR at the wrong one is a frequent copy-paste bug.
07 Interrupts: TC / HT / TE, flags & ping-pong
Each channel can raise three events — transfer complete (TC), half transfer (HT) and transfer error (TE) — enabled by TCIE/HTIE/TEIE in CCR. Status is read from DMAx->ISR and cleared by writing 1 to the matching bit in DMAx->IFCR.
ISR / IFCR flag layout
Each channel owns a nibble; the same bit positions are used in ISR (read) and IFCR (write-1-to-clear, prefixed C).
| Channel | GIF (global) | TCIF | HTIF | TEIF |
|---|---|---|---|---|
| 1 | bit 0 | bit 1 | bit 2 | bit 3 |
| 2 | bit 4 | bit 5 | bit 6 | bit 7 |
| 3 | bit 8 | bit 9 | bit 10 | bit 11 |
| 7 | bit 24 | bit 25 | bit 26 | bit 27 |
| n (general) | 4(n−1) | 4(n−1)+1 | 4(n−1)+2 | 4(n−1)+3 |
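The nibble layout reduces to two shifts. This sketch uses hypothetical names and reproduces both the table rows and the all-flags mask that `dma_setup()` writes to IFCR in step 3.

```c
#include <assert.h>
#include <stdint.h>

/* Flag offsets inside each channel's nibble (same layout in ISR and IFCR). */
enum dma_flag { FLAG_GIF = 0, FLAG_TCIF = 1, FLAG_HTIF = 2, FLAG_TEIF = 3 };

/* Mask of one flag for channel ch (1..7): bit 4*(ch-1) + flag. */
static uint32_t dma_flag_mask(int ch, enum dma_flag f)
{
    assert(ch >= 1 && ch <= 7);
    return 1u << (4u * (uint32_t)(ch - 1) + (uint32_t)f);
}

/* All four flags of a channel, e.g. for a write-1-to-clear to IFCR. */
static uint32_t dma_all_flags(int ch)
{
    assert(ch >= 1 && ch <= 7);
    return 0xFu << (4u * (uint32_t)(ch - 1));
}
```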
NVIC interrupt numbers (RM0432 / DS12023)
| IRQ | Position | IRQ | Position |
|---|---|---|---|
| DMA1_Channel1..7 | 11 … 17 | DMA2_Channel1..5 | 56 … 60 |
| DMA2_Channel6 | 68 | DMA2_Channel7 | 69 |
| ADC1 | 18 | DMAMUX1_OVR | 94 |
Note that the DMA2 IRQ numbers are not contiguous: channels 1–5 are 56–60, but channels 6–7 jump to 68–69. Use the enum names from the CMSIS header (for example DMA2_Channel6_IRQn), never a hand-computed number.
Ping-pong double buffering with HT + TC
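In circular mode HT fires at the midpoint and TC at the end. Process the first half on HT and the second half on TC while the DMA keeps filling the other half. This gives gap-free streaming, provided each call to process() finishes before the DMA wraps back into the half it is reading (one half-buffer period). Otherwise the DMA overwrites data that has not been consumed yet.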
```c
#include "stm32l4r5xx.h"
#include <stdint.h>

#define BLK 128u

static volatile uint16_t stream[2u * BLK];   /* two half-buffers */
extern void process(const volatile uint16_t *half, uint32_t n);

void stream_irq_enable(void)
{
    /* The channel must already be configured with CMAR = stream,
       CNDTR = 2*BLK and CIRC. Safest to call this while EN is still 0
       (after step 8, before step 9): CCR is not meant to be rewritten
       while the channel runs. */
    DMA1_Channel1->CCR |= DMA_CCR_HTIE | DMA_CCR_TCIE | DMA_CCR_TEIE;
    NVIC_SetPriority(DMA1_Channel1_IRQn, 5);
    NVIC_EnableIRQ(DMA1_Channel1_IRQn);
}

void DMA1_Channel1_IRQHandler(void)
{
    uint32_t isr = DMA1->ISR;

    if (isr & DMA_ISR_HTIF1) {                   /* first half ready */
        DMA1->IFCR = DMA_IFCR_CHTIF1;            /* ack BEFORE work */
        process(&stream[0], BLK);
    }
    if (isr & DMA_ISR_TCIF1) {                   /* second half ready */
        DMA1->IFCR = DMA_IFCR_CTCIF1;
        process(&stream[BLK], BLK);
    }
    if (isr & DMA_ISR_TEIF1) {                   /* bus / config error */
        DMA1->IFCR = DMA_IFCR_CTEIF1;
        /* the channel auto-disables on TE: clear the cause and re-init here */
    }
}
```
Clear TCIF/HTIF/TEIF in IFCR at the top of the handler. If you forget, the pending bit stays set and the ISR re-fires immediately, hanging the CPU in the handler.
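Whether streaming is really gap-free depends on a time budget: process() must finish within one half-buffer period. For the ADC, each 12-bit conversion takes the programmed sampling time plus 12.5 ADC clock cycles, so that budget can be computed. The sketch below uses a hypothetical helper and assumes one ADC result per DMA item. It also assumes the 640.5-cycle sampling time set by SMPR1 in the ADC example and a 30 MHz ADC clock (HCLK/4 with an assumed 120 MHz HCLK).

```c
#include <assert.h>
#include <stdint.h>

/* Time for the DMA to fill one half-buffer of `items` ADC results, in ns.
   Sampling and conversion times are passed in HALF ADC-clock cycles so that
   values such as 640.5 and 12.5 cycles stay integral (1281 and 25). */
static uint64_t half_buffer_ns(uint32_t items, uint32_t smp_half_cycles,
                               uint32_t conv_half_cycles, uint32_t f_adc_hz)
{
    assert(f_adc_hz > 0u);
    uint64_t half_cycles = (uint64_t)items * (smp_half_cycles + conv_half_cycles);
    return half_cycles * 1000000000ull / (2ull * f_adc_hz);
}
```

At those assumed settings, a 128-item half lasts about 2.79 ms, which is the worst-case time available to process(). A faster ADC clock or a shorter sampling time shrinks the budget proportionally.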
08 Gotchas & common mistakes
The DMA controller gives almost no feedback when it is misconfigured — it simply does nothing. These are the failure modes that cost the most debugging time on STM32L4+.
| Symptom | Cause & fix |
|---|---|
| DMA does absolutely nothing | DMAMUX1 clock not enabled. Set RCC_AHB1ENR_DMAMUX1EN (bit 2) as well as the DMA1/DMA2 clock. Without it the request never reaches the channel. |
| Channel ignores new CPAR/CMAR/CNDTR | Those registers are read-only while EN=1. Clear EN and spin until it reads 0 before rewriting them. |
| Channel enabled but CNDTR never counts down | Step 10 missing: the peripheral's own DMA-request enable (ADC DMAEN, USART DMAT/DMAR, SPI TXDMAEN/RXDMAEN) is separate from the channel EN bit. Both are required. |
| Only ~half the bytes appear, or garbage | PSIZE/MSIZE mismatch. ADC_DR is 16-bit → PSIZE=01 into a uint16_t buffer; UART RDR/TDR is 8-bit → PSIZE=00 into a uint8_t buffer. An 8-bit read of a 16-bit data register keeps only the low byte. |
| Wrong direction / no movement | DIR is about the read source. RX/ADC-in = read peripheral = DIR=0. TX/DAC-out = read memory = DIR=1. It is easy to invert. |
| CNDTR seems off by a factor | CNDTR is an element count, not a byte count. 32 halfwords → CNDTR=32, not 64. |
| ISR fires forever / CPU stuck in handler | You did not clear the flag. Write the matching CTCIFn/CHTIFn/CTEIFn bit to IFCR at the top of the handler. |
| First sample lost / ADC overrun (OVR) | DMA armed after the peripheral produced data. Enable the channel first, or set ADC OVRMOD=1 so it overwrites instead of stalling. |
| Data updates but a stale copy is used | The Cortex-M4 in the L4 has no data cache (no cache maintenance needed, unlike Cortex-M7), but the compiler can still cache the buffer in a register. Declare DMA buffers volatile. |
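| Data shifted by a byte, or TEIF set | Buffer not aligned to the transfer size: for 16/32-bit accesses the DMA ignores the low address bits, so a misaligned CMAR is silently rounded down. A buffer placed where the DMA master cannot reach raises a transfer error instead. Keep DMA buffers at least word-aligned in SRAM (e.g. SRAM1/SRAM2). |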
| Two channels, one starves the other | DMAMUX lets any channel serve any request, but priority arbitration still happens per controller: the higher PL wins, and on a tie the lower channel number wins. Raise PL on the latency-critical stream. |
| Re-used channel behaves oddly | When you repurpose a channel for a different peripheral, rewrite DMAMUX1_ChannelN->CCR with the new request ID. A leftover ID from the previous use keeps routing the old peripheral. |
| Channel enabled but never starts | DMAMUX request ID 0 means no hardware request (memory-to-memory only). If CxCR is left at its reset value (0), the channel waits for a request that never comes; only a CCR.MEM2MEM=1 transfer runs without one. |
Checklist before you flash
- Clocks: DMA1/DMA2 and DMAMUX1 and the peripheral.
- Channel disabled while writing CPAR/CMAR/CNDTR; EN set last.
- DMAMUX CxCR = correct request ID for the target channel.
- PSIZE/MSIZE match the register and buffer widths; DIR matches the direction.
- Peripheral DMA-request enable bit set; buffer volatile, aligned, in SRAM.
- IRQ handler clears its IFCR flag first; NVIC enabled with the right IRQn enum.
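Parts of the checklist can be automated. This host-side sketch uses hypothetical names and takes its bit positions from the CCR bit-field table. It rejects a CCR value in any of four cases: EN is already set, DIR disagrees with the intended direction, PSIZE/MSIZE do not match the register and buffer widths, or MINC is missing. It assumes a buffer transfer, so a deliberate single-item transfer without MINC would need the last check dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CCR bit positions (hypothetical names). */
#define CCR_EN          (1u << 0)
#define CCR_DIR         (1u << 4)
#define CCR_MINC        (1u << 7)
#define CCR_PSIZE_SHIFT 8u
#define CCR_MSIZE_SHIFT 10u

/* PSIZE/MSIZE encoding: 1 byte -> 00, 2 bytes -> 01, 4 bytes -> 10. */
static uint32_t width_code(unsigned bytes)
{
    assert(bytes == 1u || bytes == 2u || bytes == 4u);
    return bytes == 1u ? 0u : (bytes == 2u ? 1u : 2u);
}

/* True if `ccr` (the value about to be written before EN) matches the
   intended direction and widths. */
static bool ccr_preflight(uint32_t ccr, unsigned periph_bytes,
                          unsigned mem_bytes, bool mem_to_periph)
{
    if (ccr & CCR_EN)
        return false;                                /* EN must be set last */
    if (((ccr & CCR_DIR) != 0u) != mem_to_periph)
        return false;                                /* DIR = 1 means read memory */
    if (((ccr >> CCR_PSIZE_SHIFT) & 3u) != width_code(periph_bytes))
        return false;                                /* PSIZE vs data register */
    if (((ccr >> CCR_MSIZE_SHIFT) & 3u) != width_code(mem_bytes))
        return false;                                /* MSIZE vs buffer element */
    return (ccr & CCR_MINC) != 0u;                   /* buffer must be walked */
}
```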