Manage memory and latency considerations
This topic describes basic memory use and latency considerations for real-time applications that run on the MT3620 chip.
Note
For more detail about memory configuration or DMA, see the published MT3620 Datasheet from MediaTek; if questions remain, you can request the "MT3620 M4 Datasheet" from Avnet by emailing Azure.Sphere@avnet.com.
Memory layout on the real-time cores
The following table summarizes the memory available on the real-time cores:
| Memory type | Base address |
|-------------|--------------|
| TCM         | 0x00100000   |
| XIP flash   | 0x10000000   |
| SYSRAM      | 0x22000000   |
Each real-time core has 192 KB of tightly-coupled memory (TCM), which is mapped in three banks of 64 KB starting at 0x00100000. TCM accesses are fast, but only the real-time core can access the memory. TCM cannot be shared with a high-level application or with a real-time capable application (RTApp) that runs on a different core.
Each real-time core also has 64 KB of SYSRAM, which is mapped starting at 0x22000000. The DMA controller can also target SYSRAM, so that peripherals can access it. Accesses to SYSRAM from the real-time core are slower than accesses to TCM. As with TCM, SYSRAM cannot be shared with another application.
Execute-in-place (XIP) flash memory is shared with high-level applications. A window into the XIP mapping of the flash is visible to each core at address 0x10000000. The OS configures the XIP mapping before it starts the application if the application’s ELF file contains a segment that has the following properties:
- Load address (as specified in the VirtAddr column of the Program Header) is equal to 0x10000000
- File offset and size (as specified in the FileSiz and MemSiz fields in the Program Header) fit in the application’s ELF file
If a program header with these properties is present in the application’s ELF file, the XIP window will be positioned so that the segment is visible at 0x10000000. The file can have no more than one XIP segment, and it must point to 0x10000000; it cannot specify any other address.
ELF deployment
RTApp images must be ELF files. The ELF image is wrapped in an Azure Sphere image package and deployed as an application. To load the application, the Azure Sphere OS starts an ELF loader that runs on the real-time core. The loader processes each LOAD segment in the ELF file and loads it into the type of memory indicated by the virtual address in the program header.
Use `arm-none-eabi-readelf.exe -l` (lowercase L), which is part of the GNU Arm Embedded Toolchain, to display the program headers for your application. The virtual address column (VirtAddr) in each program header indicates the destination address for that load segment; it does not mean that the processor performs any additional translation. The Azure Sphere ELF loader doesn't use the physical address (PhysAddr).
Consider this example:
```
Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000098 0x00100000 0x00100000 0x00000 0x00e78 RW  0x8
  LOAD           0x0000a0 0x10000000 0x10000000 0x03078 0x03078 RWE 0x10
  LOAD           0x003118 0x00100e78 0x10003078 0x000f0 0x000f0 RW  0x4
```
The segment at 0x00100000 is targeted at tightly-coupled memory (TCM). The loader either copies data from the image package into RAM or zero-initializes the TCM as required.
The segment at 0x10000000 is mapped to the XIP window for the core. At run time, accesses to `0x10000000 + offset` are translated to `<address-of-XIP-segment-in-flash> + offset` when they leave the real-time core.

The data segment at virtual address 0x00100e78 is mapped to TCM.
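The translation amounts to simple offset arithmetic. The following sketch is illustrative only (the hardware performs this mapping when accesses leave the core, not application code), and the flash address used in the example is an assumption:

```c
#include <stdint.h>

/* Illustrative model of the XIP window translation: an address in the
   core's XIP window maps to the same offset within the segment's actual
   location in flash. */
uint32_t XipToFlash(uint32_t coreAddr, uint32_t xipSegmentInFlash)
{
    const uint32_t xipBase = 0x10000000u;
    return xipSegmentInFlash + (coreAddr - xipBase);
}
```

For example, if the XIP segment were located at 0x00200000 in flash, a core access to 0x10000010 would reach flash offset 0x00200010.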
ELF runtime considerations
The ELF loader performs some of the tasks that a raw binary (or chained bootloader) would perform at start-up. Specifically, it zero-initializes block-started-by-symbol (BSS) data and copies initialized but mutable data from read-only flash into RAM, according to the program headers. The application then starts and runs its own initialization functions. In most cases, changes to existing applications aren't required. Zeroing the BSS data in the application is unnecessary but harmless, because the loader has already zeroed the memory.
Copying mutable data from flash to RAM can cause problems in some circumstances, depending on how the ELF file is laid out. The ELF loader processes the program headers sequentially and doesn't change the overall layout of the segments in the file: it maps not only the XIP segment itself to 0x10000000, but also any subsequent segments, in order. If the segments in the ELF file are sequential, without alignment padding or gaps, startup code can use pointer arithmetic to find the start of the data segment. If the ELF file has a different layout, however, pointer arithmetic does not produce the correct address, so application startup code must not try to copy the data section. This can cause problems if the application or RTOS uses a chained bootloader or must set up a stack canary before zeroing BSS or initializing mutable data.
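As a sketch of what such startup code conventionally does, the following uses plain arrays as stand-ins for the linker-defined region symbols (an assumption for illustration; real startup code uses symbols exported by the linker script, such as the bounds of .data and .bss):

```c
#include <string.h>

/* Stand-ins for linker-defined regions; in a real RTApp these would be
   symbols from the linker script marking the flash image of .data, the
   RAM location of .data, and the bounds of .bss. */
const unsigned char flashDataImage[4] = {1, 2, 3, 4}; /* .data image in flash */
unsigned char ramData[4];                             /* .data destination in RAM */
unsigned char bss[8] = {0xAA, 0xAA, 0xAA, 0xAA,
                        0xAA, 0xAA, 0xAA, 0xAA};      /* pretend-uninitialized .bss */

void StartupInit(void)
{
    /* Zero .bss. Under the Azure Sphere ELF loader this is redundant,
       because the loader has already zeroed the memory, but harmless. */
    memset(bss, 0, sizeof bss);

    /* Copy initialized mutable data from its flash image into RAM.
       Omit this step if the ELF layout means the flash image's location
       can't be found with pointer arithmetic, as described above. */
    memcpy(ramData, flashDataImage, sizeof ramData);
}
```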
Memory targets
You can target code at TCM, XIP flash, or SYSRAM by editing the linker.ld script for your application. The Azure Sphere sample applications run from TCM, but the linker.ld script file for each application describes how to target XIP flash instead. As the following example shows, you can change a sample to run on XIP by aliasing CODE_REGION and RODATA_REGION to FLASH instead of the default TCM:
```
REGION_ALIAS("CODE_REGION", FLASH);
REGION_ALIAS("RODATA_REGION", FLASH);
```
To determine whether a compiled application runs from TCM or XIP flash, use `arm-none-eabi-readelf.exe`, which is part of the GNU Arm Embedded Toolchain. Run it on the .out file, which is in the same directory as the image package, and specify the `-l` (lowercase L) flag to see where the code and read-only data have been placed. Code and read-only data in flash memory are loaded at address 0x10000000; code and data in TCM are loaded in the TCM region.
The following example shows an application that runs from flash memory.
```
arm-none-eabi-readelf.exe -l UART_RTApp_MT3620_BareMetal.out

Elf file type is EXEC (Executable file)
Entry point 0x10000000
There are 2 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000074 0x00100000 0x00100000 0x00284 0x003c0 RW  0x4
  LOAD           0x000300 0x10000000 0x10000000 0x013b9 0x013b9 R E 0x10

 Section to Segment mapping:
  Segment Sections...
   00     .data .bss
   01     .text .rodata
```
Vector table location
On ARMv7-M devices, the vector table must be aligned on a power-of-two boundary that is at least 128 bytes and no less than the size of the table, as noted in the ARMv7-M Architecture Reference Manual. Each I/O RT core on the MT3620 supports 100 external interrupts. Therefore, including the stack pointer and 15 standard exceptions, the table has 116 4-byte entries, for a total size of 464 bytes, which rounds up to 512 bytes.
When the code is run from XIP flash, the vector table must be placed at 0x10000000 and must be aligned on a 32-byte boundary within the ELF file. When the code is not run from XIP flash, the table is typically placed at the start of TCM0, which is 0x00100000. In either case, to ensure that the table's virtual address is correctly aligned, put the vector table in a dedicated section and set CODE_REGION to the appropriate address.
The MT3620 BareMetal samples in the Azure Sphere Samples repository show how to do this. The declaration of the vector table in main.c sets its `section` attribute to `.vector_table`. The linker script aliases CODE_REGION to the start of either TCM or XIP flash, and the ALIGN attribute sets the alignment of the text section within the ELF file as follows:
```
SECTIONS
{
    .text : ALIGN(32) {
        KEEP(*(.vector_table))
        *(.text)
    } >CODE_REGION
    ...
}
```
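A hedged sketch of the corresponding declaration follows. The handler names and bodies are illustrative, and the stack-pointer symbol is defined as a plain variable so the sketch compiles standalone (in a real RTApp it comes from the linker script):

```c
#include <stdint.h>

/* In a real RTApp the initial stack pointer is a linker-script symbol;
   a plain variable is used here so the sketch is self-contained. */
uint32_t StackTop;

static void DefaultHandler(void) { for (;;) { } }

/* 116 entries on an MT3620 I/O core: initial SP, 15 standard exceptions,
   and 100 external interrupts. The section attribute places the table in
   .vector_table so the linker script can locate it at the start of
   CODE_REGION; the alignment rounds the 464-byte table up to 512 bytes. */
const uintptr_t ExceptionVectorTable[116]
    __attribute__((section(".vector_table"), aligned(512), used)) = {
    [0] = (uintptr_t)&StackTop,      /* initial stack pointer */
    [1] = (uintptr_t)DefaultHandler, /* reset handler */
    /* entries 2..115: remaining exception and interrupt handlers */
};
```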
Real-time and latency considerations
RTApps and high-level applications contend for access to flash memory, even if they don't communicate with each other. As a result, RTApps that are running from XIP flash may encounter high and unpredictable latency. Writes to flash, such as during an update, may involve latency spikes up to several hundred milliseconds. Depending on your application's requirements, you can manage this in several ways:
- **Put all code and data in TCM.** Code that runs from TCM is not vulnerable to contention for flash.
- **Split code into critical and non-critical sections, and run the non-critical code from flash.** Code that has real-time requirements, such as a watchdog timer, should not have to run when other code is accessing the flash. Memory targets describes how to target XIP flash instead of TCM.
- **Use cache.** An application can use the lowest 32 KB of TCM as XIP cache. This approach does not provide hard real-time guarantees in the event of a cache miss, but it improves typical performance without requiring you to move all the code into RAM. Refer to the "MT3620 M4 Datasheet" for information about XIP cache configuration.
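One way to split critical from non-critical code is with a section attribute. The following sketch assumes a hypothetical `.text.tcm` output section that your linker.ld places in TCM; the section name and the counter standing in for a watchdog register write are both illustrative:

```c
/* Hypothetical section name; linker.ld must place .text.tcm in TCM. */
#define TCM_CODE __attribute__((section(".text.tcm")))

/* Stand-in for a real watchdog restart-register write. */
volatile unsigned watchdogKicks;

/* Time-critical: runs from TCM, so it never stalls on flash contention
   while a high-level application or an update is accessing flash. */
TCM_CODE void KickWatchdog(void)
{
    watchdogKicks++;
}

/* Non-critical: can run from XIP flash and tolerate latency spikes. */
void LogStatus(void)
{
    /* ...slow, non-real-time work... */
}
```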