01 The device at a glance
The STM32L4R5 belongs to the STM32L4+ (L4 "plus") ultra-low-power family: an Arm Cortex-M4 with FPU running at 120 MHz, 2 Mbytes of flash and a large 640 Kbytes of SRAM, targeted at battery/energy-harvesting designs that still need graphics, audio and external-flash bandwidth. Compared to the classic STM32L4 (e.g. L476, RM0351) it adds a higher 120 MHz clock, more SRAM, two OctoSPI controllers, Chrom-ART (DMA2D), and — critically for driver code — a DMAMUX request router instead of fixed DMA channel mapping.
Headline specifications
| Item | STM32L4R5 |
|---|---|
| Core | Arm Cortex-M4F (with DSP + single-precision FPU), 120 MHz max |
| Flash | 2 Mbytes @ 0x0800_0000, single- or dual-bank (DBANK option bit) |
| SRAM | 640 Kbytes total = SRAM1 192K + SRAM2 64K + SRAM3 384K |
| Voltage scaling | Range 1 boost (120 MHz), Range 1 normal (80 MHz), Range 2 (26 MHz) |
| Accelerator | ART Accelerator: prefetch + instruction cache + data cache |
| External memory | 2x OctoSPI (XIP), FMC (static SRAM/NOR/PSRAM) |
| DMA | DMA1 + DMA2 (7 channels each) routed through DMAMUX1 |
| Packages | LQFP100, LQFP144, UFBGA132, UFBGA169 |
| Supply | 1.71 V – 3.6 V, ultra-low-power modes down to Shutdown |
Where the documentation lives
Every exact address, bit field and timing in this guide comes from ST's primary documents. Bookmark these — the reference manual and datasheet are the ground truth when a header macro is ambiguous:
| Document | ID / file | What it defines |
|---|---|---|
| Reference manual | RM0432 (dm00310109.pdf) | Register map, memory map, every peripheral (L4R/L4S) |
| Datasheet | DS12023 (stm32l4r5vi.pdf) | Pinout, package drawings, AF mapping, electricals, ordering scheme |
| Cortex-M4 prog. manual | PM0214 | Core: NVIC, MPU, FPU, SysTick, SCB, instruction set |
| Errata sheet | ES (product page) | Silicon limitations — always read before shipping |
| CMSIS device header | stm32l4r5xx.h | All the RCC->, FLASH_ACR_* macros used below |
"L4R5" is the flash/feature line; the trailing letters in a full part number such as STM32L4R5ZI encode package and flash size — decoded in Section 06. RM0432 also covers L4R7/L4R9 (adds LCD-TFT/DSI graphics) and the L4S5/S7/S9 SecureBoot variants, so filter register notes by device.
02 Cortex-M4F core, clock tree & reaching 120 MHz
The core is a full Cortex-M4F: 3-stage pipeline, Thumb-2, hardware divide/MAC DSP instructions, a single-precision FPU, an 8-region MPU, the NVIC (interrupt controller, up to 16 priority levels) and a 24-bit SysTick down-counter clocked from HCLK. Flash accesses go through the ART Accelerator (prefetch + instruction/data cache), so code that hits the caches runs at close to zero-wait-state speed even at 120 MHz, where the flash array itself needs 5 wait states.
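Because SysTick is only 24 bits wide, the reload value for a given tick rate is worth checking whenever HCLK changes. The following is a minimal host-side sketch of that arithmetic; the function name `systick_reload` is hypothetical and not part of CMSIS, and it assumes SysTick is clocked directly from HCLK.

```c
#include <stdint.h>

/* Reload value for a SysTick clocked from HCLK so that it wraps tick_hz
 * times per second. Returns 0 when the request cannot be represented in
 * the 24-bit counter (or when the inputs make no sense). */
uint32_t systick_reload(uint32_t hclk_hz, uint32_t tick_hz)
{
    if (tick_hz == 0u || hclk_hz < tick_hz) {
        return 0u;
    }
    uint32_t cycles = hclk_hz / tick_hz;   /* HCLK cycles per tick */
    if (cycles - 1u > 0x00FFFFFFu) {       /* 24-bit counter limit */
        return 0u;
    }
    return cycles - 1u;                    /* counter runs reload..0 */
}
```

At 120 MHz a 1 kHz tick fits easily, but a 1 Hz tick does not, which is why slow timebases are usually derived in software from a faster SysTick or from a hardware timer.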
Clock sources
| Source | Frequency | Typical role |
|---|---|---|
| MSI | 100 kHz – 48 MHz | Reset default SYSCLK (4 MHz); can be PLL/USB source, LSE-trimmed |
| HSI16 | 16 MHz | Factory-trimmed RC, fast wake, common PLL input (used below) |
| HSE | 4 – 48 MHz | External crystal or clock in |
| PLL / PLLSAI1 / PLLSAI2 | up to 120 MHz | SYSCLK, plus 48 MHz for USB/RNG/SDMMC, SAI and ADC clocks |
| LSE | 32.768 kHz | RTC, low-drift timebase |
| LSI | ~32 kHz | IWDG, low-power RTC/auto-wake |
After reset the device runs from MSI at 4 MHz in voltage Range 1 (normal). To get to 120 MHz you must (1) select Range 1 boost, (2) raise flash wait states, then (3) engage the PLL — in that order.
Voltage scaling & flash wait states
120 MHz is only legal in Range 1 boost mode (PWR_CR5.R1MODE = 0). Flash latency must be set before the clock is raised. Wait-state limits in Range 1 boost:
| LATENCY (FLASH_ACR) | Wait states | Max HCLK (Range 1 boost) |
|---|---|---|
| 0b0000 | 0 WS | ≤ 20 MHz |
| 0b0001 | 1 WS | ≤ 40 MHz |
| 0b0010 | 2 WS | ≤ 60 MHz |
| 0b0011 | 3 WS | ≤ 80 MHz |
| 0b0100 | 4 WS | ≤ 100 MHz |
| 0b0101 | 5 WS | ≤ 120 MHz |
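The table follows a simple rule of one extra wait state per 20 MHz step. As a hedged illustration, the helper below encodes only the Range 1 boost column shown above; the name `flash_ws_range1_boost` is hypothetical, and the other voltage ranges are deliberately not modelled.

```c
#include <stdint.h>

/* FLASH_ACR.LATENCY value for a target HCLK in Range 1 boost, following
 * the table above (0 WS up to 20 MHz, then +1 WS per 20 MHz).
 * Returns -1 if hclk_hz is above the 120 MHz limit. */
int flash_ws_range1_boost(uint32_t hclk_hz)
{
    const uint32_t step_hz = 20000000u;

    if (hclk_hz > 6u * step_hz) {
        return -1;                        /* beyond 120 MHz: not allowed */
    }
    if (hclk_hz == 0u) {
        return 0;
    }
    return (int)((hclk_hz - 1u) / step_hz);
}
```

A clock-setup routine could call this before raising HCLK and refuse to continue on a negative result, rather than hard-coding the latency.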
In Range 1 normal the ceiling is 80 MHz (3 WS, following the same 20 MHz steps as the table above); in Range 2 it is only 26 MHz (2 WS). PLL constraints for the 120 MHz recipe:
| PLL stage | Chosen value | Constraint |
|---|---|---|
| Input source | HSI16 = 16 MHz | — |
| PLLM (÷) | ÷2 → 8 MHz | PLL ref input must be 4–16 MHz |
| PLLN (×) | ×30 → 240 MHz VCO | VCO output 64–344 MHz |
| PLLR (÷) | ÷2 → 120 MHz SYSCLK | SYSCLK ≤ 120 MHz |
| AHB / APB1 / APB2 | ÷1 each | each bus ≤ 120 MHz |
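The PLL arithmetic in the table can be checked on a host before it is committed to registers. This is a minimal sketch that applies exactly the limits listed above (4 to 16 MHz reference, 64 to 344 MHz VCO, 120 MHz SYSCLK); the function names are hypothetical, and the restriction of PLLR to 2, 4, 6 or 8 is an assumption based on the 2-bit PLLR field.

```c
#include <stdint.h>

/* Returns the SYSCLK frequency produced by the main PLL R output, or 0 if
 * any constraint from the table is violated. pllm, plln and pllr are the
 * "human" divider/multiplier values, not the register encodings. */
uint32_t pll_sysclk_hz(uint32_t src_hz, uint32_t pllm, uint32_t plln, uint32_t pllr)
{
    if (pllm == 0u) {
        return 0u;
    }
    if (pllr != 2u && pllr != 4u && pllr != 6u && pllr != 8u) {
        return 0u;                               /* assumed legal PLLR set */
    }

    uint32_t ref_hz = src_hz / pllm;             /* PLL reference input */
    if (ref_hz < 4000000u || ref_hz > 16000000u) {
        return 0u;
    }

    uint64_t vco_hz = (uint64_t)ref_hz * plln;   /* VCO output */
    if (vco_hz < 64000000u || vco_hz > 344000000u) {
        return 0u;
    }

    uint32_t sysclk_hz = (uint32_t)(vco_hz / pllr);
    return (sysclk_hz <= 120000000u) ? sysclk_hz : 0u;
}

/* PLLM is stored in the register as (divider - 1). */
uint32_t pllm_field(uint32_t pllm)
{
    return pllm - 1u;
}
```

With the recipe from the table, `pll_sysclk_hz(16000000u, 2u, 30u, 2u)` yields 120 MHz, while dropping the /2 on the input pushes the VCO to 480 MHz and is rejected.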
Register-level clock init to 120 MHz (no HAL)
/* Bring SYSCLK to 120 MHz from the 16 MHz HSI using the main PLL.
 * Order is mandatory: boost + wait-states BEFORE raising the clock. */
#include "stm32l4r5xx.h" /* CMSIS device header from STM32CubeL4 */

void sysclk_120mhz_hsi(void)
{
    /* 1. Turn on HSI16 and wait until it is stable */
    RCC->CR |= RCC_CR_HSION;
    while (!(RCC->CR & RCC_CR_HSIRDY)) { }

    /* 2. Enable the power interface clock, select Range 1, then Range-1 BOOST.
     *    RM0432 asks for HCLK = SYSCLK/2 while crossing 80 MHz (the HAL does
     *    the same), so AHB is set to /2 here and restored to /1 in step 7. */
    RCC->APB1ENR1 |= RCC_APB1ENR1_PWREN;
    PWR->CR1 = (PWR->CR1 & ~PWR_CR1_VOS) | PWR_CR1_VOS_0; /* VOS = 01 = Range 1 */
    while (PWR->SR2 & PWR_SR2_VOSF) { }                   /* wait regulator ready */
    RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_HPRE) | RCC_CFGR_HPRE_DIV2;
    PWR->CR5 &= ~PWR_CR5_R1MODE;                          /* 0 = boost: unlocks 120 MHz */

    /* 3. Flash: 5 wait states for 120 MHz, enable prefetch + I/D cache (ART),
     *    then read back to confirm the new latency is in effect */
    FLASH->ACR = (FLASH->ACR & ~FLASH_ACR_LATENCY) | FLASH_ACR_LATENCY_5WS
               | FLASH_ACR_PRFTEN | FLASH_ACR_ICEN | FLASH_ACR_DCEN;
    while ((FLASH->ACR & FLASH_ACR_LATENCY) != FLASH_ACR_LATENCY_5WS) { }

    /* 4. PLL: 16MHz /2 = 8MHz ref, *30 = 240MHz VCO, /2 = 120MHz SYSCLK.
     *    Note: the PLLM field is encoded as (divider - 1), so /2 -> 1. */
    RCC->PLLCFGR = RCC_PLLCFGR_PLLSRC_HSI
                 | (1u  << RCC_PLLCFGR_PLLM_Pos)   /* PLLM = /2 */
                 | (30u << RCC_PLLCFGR_PLLN_Pos)   /* PLLN = x30 */
                 | (0u  << RCC_PLLCFGR_PLLR_Pos)   /* PLLR = /2 */
                 | RCC_PLLCFGR_PLLREN;             /* enable PLLCLK (R) output */
    RCC->CR |= RCC_CR_PLLON;
    while (!(RCC->CR & RCC_CR_PLLRDY)) { }

    /* 5. APB prescalers: APB1/1, APB2/1 (both valid up to 120 MHz) */
    RCC->CFGR &= ~(RCC_CFGR_PPRE1 | RCC_CFGR_PPRE2);

    /* 6. Switch SYSCLK to the PLL and wait for the hardware to confirm */
    RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_SW) | RCC_CFGR_SW_PLL;
    while ((RCC->CFGR & RCC_CFGR_SWS) != RCC_CFGR_SWS_PLL) { }

    /* 7. After at least 1 us at HCLK = 60 MHz, restore AHB/1 (HCLK = 120 MHz) */
    for (volatile uint32_t i = 0; i < 200u; ++i) { }
    RCC->CFGR &= ~RCC_CFGR_HPRE;

    SystemCoreClock = 120000000u; /* keep the CMSIS global in sync */
}
Same thing with the HAL
/* Equivalent 120 MHz setup with the STM32L4xx HAL. HAL takes the human
 * PLLM value (2), not the register-encoded value, and does the -1 for you. */
void SystemClock_Config(void)
{
    RCC_OscInitTypeDef osc = {0};
    RCC_ClkInitTypeDef clk = {0};

    /* Range 1 BOOST voltage scaling is required for 120 MHz */
    HAL_PWREx_ControlVoltageScaling(PWR_REGULATOR_VOLTAGE_SCALE1_BOOST);

    osc.OscillatorType      = RCC_OSCILLATORTYPE_HSI;
    osc.HSIState            = RCC_HSI_ON;
    osc.HSICalibrationValue = RCC_HSICALIBRATION_DEFAULT;
    osc.PLL.PLLState        = RCC_PLL_ON;
    osc.PLL.PLLSource       = RCC_PLLSOURCE_HSI;
    osc.PLL.PLLM            = 2;              /* 16 / 2 = 8 MHz */
    osc.PLL.PLLN            = 30;             /* 8 * 30 = 240 MHz VCO */
    osc.PLL.PLLR            = RCC_PLLR_DIV2;  /* 240 / 2 = 120 MHz */
    osc.PLL.PLLQ            = RCC_PLLQ_DIV2;
    osc.PLL.PLLP            = RCC_PLLP_DIV2;
    HAL_RCC_OscConfig(&osc);

    clk.ClockType      = RCC_CLOCKTYPE_SYSCLK | RCC_CLOCKTYPE_HCLK
                       | RCC_CLOCKTYPE_PCLK1  | RCC_CLOCKTYPE_PCLK2;
    clk.SYSCLKSource   = RCC_SYSCLKSOURCE_PLLCLK;
    clk.AHBCLKDivider  = RCC_SYSCLK_DIV1;
    clk.APB1CLKDivider = RCC_HCLK_DIV1;
    clk.APB2CLKDivider = RCC_HCLK_DIV1;
    HAL_RCC_ClockConfig(&clk, FLASH_LATENCY_5); /* 5 WS */
}
03 Memory map & memory regions
Like every Cortex-M, the L4R5 uses the fixed 4 GB Armv7-M address space. Code lives in the lowest 512 MB (0x0000_0000–0x1FFF_FFFF, reached over the fast I-Code/D-Code buses), SRAM and peripherals sit at 0x2000_0000 and 0x4000_0000, external memories map into 0x6000_0000–0x9FFF_FFFF, and the private peripheral block (NVIC/SysTick/SCB) is at 0xE000_0000.
Top-level memory map
| Region | Base | Size | Notes |
|---|---|---|---|
| Boot alias | 0x0000_0000 | — | Mirrors whatever boot memory is selected (see Section 07) |
| Main flash | 0x0800_0000 | 2 MB | Program + constants; dual-bank capable |
| System memory | 0x1FFF_0000 | 28 KB | Factory ST ROM bootloader (USART/USB/I2C/SPI DFU) |
| OTP | 0x1FFF_7000 | 1 KB | One-time-programmable area |
| Option bytes (bank 1) | 0x1FFF_7800 | — | RDP, WRP, user config; shadowed into FLASH_OPTR |
| Option bytes (bank 2) | 0x1FFF_F800 | — | Second copy used in dual-bank mode |
| SRAM2 (alias) | 0x1000_0000 | 64 KB | Same RAM as below, on a bus path good for 0-WS code |
| SRAM1 | 0x2000_0000 | 192 KB | Main system RAM (0x2000_0000–0x2002_FFFF) |
| SRAM2 | 0x2003_0000 | 64 KB | Parity option, Standby-retainable (0x2003_0000–0x2003_FFFF) |
| SRAM3 | 0x2004_0000 | 384 KB | Large data RAM (0x2004_0000–0x2009_FFFF) |
| APB1 peripherals | 0x4000_0000 | — | TIM2-7, LPTIM, I2C, USART2-5, LPUART, SPI2/3, USB, CAN, DAC, PWR, RTC |
| APB2 peripherals | 0x4001_0000 | — | SYSCFG, EXTI, TIM1/8/15/16/17, USART1, SPI1, SAI, DFSDM |
| AHB1 peripherals | 0x4002_0000 | — | DMA1/2, DMAMUX1, RCC, FLASH reg, CRC, TSC, DMA2D, GFXMMU |
| AHB2 peripherals | 0x4800_0000 | — | GPIOA–GPIOI, ADC, AES, HASH, RNG, DCMI, USB OTG FS |
| FMC bank 1 | 0x6000_0000 | 256 MB | External static SRAM/NOR/PSRAM |
| OCTOSPI2 (mapped) | 0x7000_0000 | 256 MB | Memory-mapped XIP flash/PSRAM on OCTOSPI2 |
| OCTOSPI1 (mapped) | 0x9000_0000 | 256 MB | Memory-mapped XIP flash/PSRAM on OCTOSPI1 |
| Cortex-M4 PPB | 0xE000_0000 | 1 MB | SysTick, NVIC, SCB, MPU, FPU, debug (ITM/DWT) |
Some early revisions of RM0432 printed the OctoSPI base addresses swapped. On silicon, OCTOSPI1 is memory-mapped at 0x9000_0000 and OCTOSPI2 at 0x7000_0000. Trust the hardware / latest revision; the datasheet DS12023 memory map matches this.
SRAM1 / SRAM2 / SRAM3 in detail
| Block | Size | Special features |
|---|---|---|
| SRAM1 | 192 KB | General-purpose data/stack/heap; contiguous with SRAM2 & SRAM3 from 0x2000_0000 |
| SRAM2 | 64 KB | Optional hardware parity (1 bit/byte); page-granular write protect; aliased at 0x1000_0000; can be retained in Standby via PWR_CR3.RRS (full 64 KB or partial 4 KB); contents are lost in Shutdown |
| SRAM3 | 384 KB | Largest block; framebuffer/DSP buffers; not retained in low-power Standby |
Because SRAM1+SRAM2+SRAM3 are contiguous (0x2000_0000 up to 0x2009_FFFF), a linker script can treat the 640 KB as one RAM region — but if you rely on Standby retention or parity you must keep those objects inside the SRAM2 window.
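When retention or parity matters, it can help to assert at build or test time that an object really sits in the SRAM2 window. The sketch below classifies an address using only the ranges from the tables above; the enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { RAM_NONE, RAM_SRAM1, RAM_SRAM2, RAM_SRAM3 } ram_block_t;

/* Maps an address to the SRAM block that backs it, using the ranges from
 * the memory-map tables. The 0x1000_0000 alias resolves to SRAM2. */
ram_block_t ram_block_of(uint32_t addr)
{
    if (addr >= 0x10000000u && addr <= 0x1000FFFFu) return RAM_SRAM2; /* alias */
    if (addr >= 0x20000000u && addr <= 0x2002FFFFu) return RAM_SRAM1;
    if (addr >= 0x20030000u && addr <= 0x2003FFFFu) return RAM_SRAM2;
    if (addr >= 0x20040000u && addr <= 0x2009FFFFu) return RAM_SRAM3;
    return RAM_NONE;
}
```

On target, a start-up check such as `ram_block_of((uint32_t)&retained_state) == RAM_SRAM2` (with `retained_state` standing in for your own variable) catches a linker script that quietly placed retained data in SRAM1 or SRAM3.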
04 Bus matrix & bus masters
The L4R5 does not use a single-master CPU bus: it is built around a multi-layer AHB bus matrix. Several masters (the CPU plus DMA engines) can reach several slaves concurrently, which is exactly why DMA can move data while the core keeps executing. Knowing which blocks are masters matters when you debug a stalled transfer or a bus-contention slowdown.
Masters (initiators)
| Master | Reaches |
|---|---|
| Cortex-M4 I-Code / D-Code | Flash (0x0000_0000–0x1FFF_FFFF): instruction fetch & literal loads |
| Cortex-M4 System bus | SRAM, peripherals, external memory (0x2000_0000 and above) |
| DMA1 / DMA2 | SRAM, peripherals, FMC/OCTOSPI — 7 channels each |
| DMA2D (Chrom-ART) | SRAM ↔ external framebuffer blits/blends |
| SDMMC1 | Its own AHB master port for card DMA |
| USB OTG FS | Dedicated master for endpoint DMA |
Slaves (targets) & the DMAMUX
Slaves include the flash interface, SRAM1/2/3, the four peripheral buses (APB1, APB2, AHB1, AHB2), the FMC and the two OctoSPI controllers. On the L4+ every DMA channel's trigger source is chosen through DMAMUX1: you write a request-line ID into DMAMUX1_CxCR.DMAREQ_ID rather than using the fixed per-channel CSELR tables of the classic STM32L4. Any peripheral request can go to any of the 14 channels.
Driver code ported from an STM32L476 (RM0351) will not work unchanged: there is no DMA_CSELR. Enable RCC_AHB1ENR.DMAMUX1EN, then set DMAMUX1_Channelx->CCR = request_id where the request IDs are listed in RM0432's DMAMUX table (e.g. USART1_TX, SPI1_RX, ADC1, TIM2_UP …).
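To program the right DMAMUX1_CxCR you first need the DMAMUX channel that corresponds to a given DMA controller and channel. The sketch below assumes the usual L4+ wiring, in which DMAMUX1 channels 0 to 6 feed DMA1 channels 1 to 7 and channels 7 to 13 feed DMA2 channels 1 to 7, with the CxCR registers packed 4 bytes apart from the DMAMUX1 base; confirm both assumptions against RM0432. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define DMAMUX1_BASE_ADDR 0x40020800u   /* DMAMUX1 base from Section 05 */

/* DMAMUX1 channel index (0..13) for DMA controller 1 or 2, channel 1..7.
 * Returns -1 for an invalid combination. */
int dmamux1_channel_index(unsigned dma, unsigned channel)
{
    if (dma < 1u || dma > 2u || channel < 1u || channel > 7u) {
        return -1;
    }
    return (int)((dma - 1u) * 7u + (channel - 1u));
}

/* Address of DMAMUX1_CxCR for a channel index, or 0 if out of range.
 * Assumes one 32-bit CxCR per channel, starting at offset 0. */
uint32_t dmamux1_cxcr_address(int index)
{
    if (index < 0 || index > 13) {
        return 0u;
    }
    return DMAMUX1_BASE_ADDR + 4u * (uint32_t)index;
}
```

On target the same index selects the CMSIS `DMAMUX1_Channelx` instance; the request ID written into it must still come from the DMAMUX table in RM0432.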
05 Peripheral inventory & base addresses
The L4R5 is feature-dense. Below is the full peripheral count (from DS12023) followed by the base addresses you will hard-code or reach through CMSIS macros. Exact per-instance addresses always come from RM0432's "Memory map and register boundary addresses" table.
Peripheral count
| Class | On the L4R5 |
|---|---|
| Timers | 2x advanced-control 16-bit (TIM1/8), 2x 32-bit GP (TIM2/5), 5x 16-bit GP (TIM3/4/15/16/17), 2x 16-bit basic (TIM6/7), 2x low-power (LPTIM1/2), SysTick, IWDG, WWDG |
| ADC | 1x 12-bit SAR, up to 5 Msps, multiple channels, oversampling |
| DAC / analog | 2x DAC channels, 2x comparators, 2x op-amps, VREFBUF |
| I2C | 4x (Fast-mode Plus, SMBus) |
| SPI | 3x |
| USART / UART / LPUART | 3x USART + 2x UART + 1x LPUART |
| Audio | 2x SAI, DFSDM (digital filter for sigma-delta mics) |
| Mass storage / bus | 1x SDMMC, 1x bxCAN, 1x USB OTG FS |
| Graphics / camera | DMA2D (Chrom-ART), DCMI camera interface, GFXMMU |
| External memory | 2x OCTOSPI, 1x FMC |
| Security / misc | AES and HASH (both on S variants only), true RNG, CRC, TSC (touch) |
| GPIO | Up to 9 ports (GPIOA–GPIOI), all on AHB2 |
Key base addresses (CMSIS-confirmed)
| Peripheral | Base address | Bus |
|---|---|---|
| PWR | 0x4000_7000 | APB1 |
| SYSCFG | 0x4001_0000 | APB2 |
| EXTI | 0x4001_0400 | APB2 |
| DMA1 | 0x4002_0000 | AHB1 |
| DMA2 | 0x4002_0400 | AHB1 |
| DMAMUX1 | 0x4002_0800 | AHB1 |
| RCC | 0x4002_1000 | AHB1 |
| FLASH registers | 0x4002_2000 | AHB1 |
| GPIOA | 0x4800_0000 | AHB2 |
| GPIOB … GPIOI | +0x400 per port | AHB2 |
| NVIC / SCB / SysTick | 0xE000_E000 | PPB |
Every peripheral is clock-gated off at reset. Before touching any register you must set its enable bit in the matching RCC register. GPIO ports live on AHB2, so the enable is RCC->AHB2ENR |= RCC_AHB2ENR_GPIOAEN; and not RCC->AHB1ENR as on STM32F4 or RCC->APB2ENR as on STM32F1 (a classic porting bug).
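The "+0x400 per port" rule from the base-address table is easy to express in code. This is a small illustrative helper, not a replacement for the CMSIS `GPIOx` macros; the name `gpio_port_base` is hypothetical.

```c
#include <stdint.h>

/* Base address of GPIO port 'A'..'I': GPIOA at 0x4800_0000, then one
 * 0x400-byte block per port. Returns 0 for a port that does not exist. */
uint32_t gpio_port_base(char port)
{
    if (port < 'A' || port > 'I') {
        return 0u;
    }
    return 0x48000000u + 0x400u * (uint32_t)(port - 'A');
}
```

A board-support layer that stores pins as a port letter plus a pin number can use this to reach the right register block without a nine-way switch.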
06 Packages & part-number decoding
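A full ordering code such as STM32L4R5ZIT6 encodes everything: family, line, pin count, flash size, package type, temperature range and packing. In that example Z is the pin count, I the flash size, T the package type (LQFP) and 6 the temperature range. The two letters right after L4R5 are the ones you decode most often: pin count (which also fixes the package on this line) and flash size.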
Ordering-scheme letters (per DS12023)
| Field | Code | Meaning |
|---|---|---|
| Pin count / package | V | LQFP100 (100 pins) |
| Pin count / package | Q | UFBGA132 (132 balls) |
| Pin count / package | Z | LQFP144 (144 pins) |
| Pin count / package | A | UFBGA169 (169 balls) |
| Flash size | I | 2 Mbytes |
| Temperature | 6 / 7 | –40…85 °C / –40…105 °C |
So STM32L4R5ZI = LQFP144 + 2 MB, STM32L4R5VI = LQFP100 + 2 MB, STM32L4R5QI = UFBGA132 + 2 MB, and STM32L4R5AI = UFBGA169 + 2 MB. The Nucleo-144 board (NUCLEO-L4R5ZI) uses the LQFP144 part.
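The decoding above can be captured in a few lines, for example in a production-test tool that reads the part number from a label. This sketch covers only the codes listed in the ordering table; the function names are hypothetical, and any other letter is reported as unknown rather than guessed.

```c
#include <string.h>

/* Pin count for the letter that follows "L4R5", or -1 if not in the table. */
int l4r5_pin_count(char code)
{
    switch (code) {
    case 'V': return 100;   /* LQFP100  */
    case 'Q': return 132;   /* UFBGA132 */
    case 'Z': return 144;   /* LQFP144  */
    case 'A': return 169;   /* UFBGA169 */
    default:  return -1;
    }
}

/* Flash size in Kbytes for the second letter, or -1 if not in the table. */
int l4r5_flash_kbytes(char code)
{
    return (code == 'I') ? 2048 : -1;
}

/* Pin count taken from a full part number such as "STM32L4R5ZIT6". */
int l4r5_pins_from_part(const char *part)
{
    static const char prefix[] = "STM32L4R5";
    const size_t n = sizeof prefix - 1u;

    if (part == NULL || strncmp(part, prefix, n) != 0) {
        return -1;
    }
    return l4r5_pin_count(part[n]);
}

/* Flash size in Kbytes taken from a full part number. */
int l4r5_flash_kb_from_part(const char *part)
{
    static const char prefix[] = "STM32L4R5";
    const size_t n = sizeof prefix - 1u;

    if (part == NULL || strncmp(part, prefix, n) != 0 || part[n] == '\0') {
        return -1;
    }
    return l4r5_flash_kbytes(part[n + 1u]);
}
```

For the Nucleo-144 part, `l4r5_pins_from_part("STM32L4R5ZIT6")` gives 144 and `l4r5_flash_kb_from_part("STM32L4R5ZIT6")` gives 2048.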
Package trade-offs
| Package | Pins | Practical note |
|---|---|---|
| LQFP100 | 100 | Hand-solderable; FMC exposes only bank 1 / reduced external-memory pins |
| UFBGA132 | 132 | 0.5 mm pitch BGA; more I/O in small footprint |
| LQFP144 | 144 | Most peripherals/IO broken out; the usual dev/eval choice |
| UFBGA169 | 169 | 0.5 mm pitch; maximum I/O, full FMC + dual OctoSPI pinout |
Peripheral silicon is identical across packages; only how many pins are bonded out differs. If a design needs both OctoSPI ports plus DCMI plus FMC simultaneously, verify the AF table in DS12023 for your package — a 100-pin part simply cannot route every function at once.
07 Boot modes & option bytes
On reset the CPU reads the initial stack pointer from address 0x0000_0000 and the reset vector from 0x0000_0004. What physically sits at 0x0000_0000 is chosen by the boot configuration — a combination of the BOOT0 input and the nBOOT1 option bit.
Boot memory selection (RM0432)
| BOOT0 | nBOOT1 | Boot space (aliased at 0x0) |
|---|---|---|
| 0 | x | Main flash memory (0x0800_0000) |
| 1 | 1 | System memory — ST factory bootloader (0x1FFF_0000) |
| 1 | 0 | Embedded SRAM1 (0x2000_0000) |
Where BOOT0 comes from is itself an option: nSWBOOT0 = 1 (factory default) takes BOOT0 from the physical PH3/BOOT0 pin; nSWBOOT0 = 0 takes it from the nBOOT0 option bit, letting you fix the boot source in software with no pull resistor. The BOOT0 pin level is latched a few SYSCLK cycles after reset release.
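The selection logic described above is small enough to model as a truth function, which is handy for unit-testing a provisioning tool. This is a sketch of the rules as stated in this section only: the effective BOOT0 is the pin when nSWBOOT0 = 1 and the inverse of nBOOT0 otherwise. Any additional conditions in RM0432 (for example an empty-flash check) are not modelled, and the names are hypothetical.

```c
#include <stdbool.h>

typedef enum { BOOT_MAIN_FLASH, BOOT_SYSTEM_MEMORY, BOOT_SRAM1 } boot_space_t;

/* Resolves the boot space from the BOOT0 pin level and three option bits. */
boot_space_t boot_space(bool boot0_pin, bool nswboot0, bool nboot0, bool nboot1)
{
    /* nSWBOOT0 = 1: use the pin. nSWBOOT0 = 0: BOOT0 is the inverse of nBOOT0. */
    bool boot0 = nswboot0 ? boot0_pin : !nboot0;

    if (!boot0) {
        return BOOT_MAIN_FLASH;
    }
    return nboot1 ? BOOT_SYSTEM_MEMORY : BOOT_SRAM1;
}
```

With nSWBOOT0 = 0 and nBOOT0 = 1 the result is main flash regardless of the pin level, which is the configuration the option-byte example in this section programs.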
Option bytes at a glance
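Option bytes are non-volatile configuration loaded into the FLASH_OPTR shadow register at reset; the register can be modified only after the unlock sequence used in the example that follows. Physical copies live at 0x1FFF_7800 (bank 1) and 0x1FFF_F800 (bank 2). The FLASH_OPTR fields this guide relies on are RDP, nBOOT0, nBOOT1, nSWBOOT0, DBANK and SRAM2_PE.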
Reprogramming an option byte (register level)
/* Set nSWBOOT0=0 (BOOT0 from option bit) and nBOOT0=1 (boot main flash)
 * so the board always boots the application even if BOOT0 pin floats high.
 * WARNING: OBL_LAUNCH reloads option bytes and RESETS the MCU. */
#define FLASH_KEY1    0x45670123u
#define FLASH_KEY2    0xCDEF89ABu
#define FLASH_OPTKEY1 0x08192A3Bu
#define FLASH_OPTKEY2 0x4C5D6E7Fu

void boot_from_flash_fixed(void)
{
    while (FLASH->SR & FLASH_SR_BSY) { }

    /* 1. Unlock the flash control register (FLASH_CR) */
    if (FLASH->CR & FLASH_CR_LOCK) {
        FLASH->KEYR = FLASH_KEY1;
        FLASH->KEYR = FLASH_KEY2;
    }

    /* 2. Unlock the option-byte programming */
    FLASH->OPTKEYR = FLASH_OPTKEY1;
    FLASH->OPTKEYR = FLASH_OPTKEY2;

    /* 3. Modify the FLASH_OPTR shadow */
    FLASH->OPTR &= ~FLASH_OPTR_nSWBOOT0; /* BOOT0 taken from nBOOT0 bit */
    FLASH->OPTR |= FLASH_OPTR_nBOOT0;    /* nBOOT0 = 1 -> main flash */

    /* 4. Commit, wait, then reload option bytes (this triggers a reset) */
    FLASH->CR |= FLASH_CR_OPTSTRT;
    while (FLASH->SR & FLASH_SR_BSY) { }
    FLASH->CR |= FLASH_CR_OBL_LAUNCH;    /* system reset on completion */
}
RDP Level 2 (0xCC) is permanent: it disables the debug port and the ST bootloader forever, and the chip can never be reopened. Never program it during development. Level 1 → Level 0 regression erases all flash but is recoverable.
08 Gotchas & common mistakes
The bugs below account for most "it hard-faults on my L4R5" reports. They come from the L4+ differences (boost mode, DMAMUX, AHB2 GPIO) that trip up code copied from other STM32 lines.
Jumping SYSCLK to 120 MHz while still in Range 1 normal, or with fewer than 5 flash wait states, gives instant hard faults or random corruption. Order is fixed: set VOS Range 1 → clear PWR_CR5.R1MODE (boost) → set FLASH_ACR.LATENCY = 5WS → enable PLL → switch clock. When slowing down, reduce the clock before lowering wait states.
The L4+ has no DMA_CSELR. Peripheral-to-channel routing goes through DMAMUX1 (DMAMUX1_CxCR.DMAREQ_ID). Enable RCC_AHB1ENR.DMAMUX1EN. Code lifted from STM32L476 examples will not compile or will silently trigger the wrong request.
Enabling GPIO clocks in the wrong RCC register is the top porting bug from STM32F1/F4. Use RCC->AHB2ENR |= RCC_AHB2ENR_GPIOxEN;. Reads/writes to a GPIO whose clock is off simply do nothing.
If you enable SRAM2 hardware parity (option bit SRAM2_PE), every byte must be written at least once after reset before it is read — an uninitialized read raises a parity error (NMI/bus fault). Startup code should zero the whole SRAM2 window.
Only SRAM2 (up to 64 KB, gated by PWR_CR3.RRS) is retained across Standby; Shutdown retains no SRAM at all. Anything you keep in SRAM1 or the big SRAM3 is lost. Put retained state in the SRAM2 address window (starting at 0x2003_0000) by assigning it to a dedicated section in the linker script.
Flipping the DBANK option bit changes page size (4 KB dual-bank vs 8 KB single-bank), bank-2 base address and the option-byte layout. Any flash driver, bootloader or linker script with hard-coded page sizes must be revisited after changing it.
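One way to keep a flash driver honest about this is to derive the page geometry from the DBANK setting instead of hard-coding it. The sketch below uses only the figures quoted above (4 KB pages in dual-bank, 8 KB in single-bank, 2 MB at 0x0800_0000). It computes a flat page index counted from the start of flash, not the bank-relative page number the flash controller expects, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_START 0x08000000u
#define FLASH_BYTES (2u * 1024u * 1024u)

/* Page size in bytes: 4 KB with DBANK = 1 (dual bank), 8 KB with DBANK = 0. */
uint32_t flash_page_size(bool dbank)
{
    return dbank ? 4096u : 8192u;
}

/* Total number of pages across the whole 2 MB array. */
uint32_t flash_page_count(bool dbank)
{
    return FLASH_BYTES / flash_page_size(dbank);
}

/* Flat page index of an address, or UINT32_MAX if it is outside main flash. */
uint32_t flash_page_of(uint32_t addr, bool dbank)
{
    if (addr < FLASH_START || addr - FLASH_START >= FLASH_BYTES) {
        return UINT32_MAX;
    }
    return (addr - FLASH_START) / flash_page_size(dbank);
}
```

The same address 0x0800_2000 is page 1 in single-bank mode but page 2 in dual-bank mode, which is exactly the kind of mismatch that corrupts an erase routine after DBANK is flipped.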
Out of reset the device runs from MSI at 4 MHz in Range 1 normal — not HSI, not the PLL. Any delay loop or baud-rate calculation must account for the real SYSCLK after SystemClock_Config(), and keep the CMSIS SystemCoreClock global updated.
FLASH_OPTR is only a shadow. Writing it and setting OPTSTRT programs the non-volatile bytes, but the new configuration takes effect only after OBL_LAUNCH reloads them — which resets the MCU. Expect (and design for) that reset.