U-Boot Environment in FRAM Through SPI-Flash
Posted Fri, 19 May 2023 19:14:37 +0200 | Operative Systems| MIST|
Background
In MIST we have for several years worked on a feature in our on-board software (OBCSW) that lets us remotely upload new software binaries through the command uplink. This is useful not only for fixing bugs once the satellite is in space, but also on the ground, since it makes it possible to reprogram the on-board computer (iOBC) after satellite integration, when it no longer has an exposed debugging port.
The implementation was nearly finished. It uses the U-boot bootloader to select between different software binaries; specifically, a modified version of Kubos U-boot that had previously been ported to the iOBC board. Support in the OBCSW for uploading HMAC-signed binaries is also complete and has been confirmed to work. So what is the problem? Well, the binaries and the file that holds the environment variables used by U-boot are both stored on the on-board SD cards. The SD cards are FAT-formatted and accessed through a MultiMedia Card Interface (MCI) peripheral, and we are locked into using a driver supplied by the vendor. This driver works fine on its own, but as soon as we flashed U-boot to the iOBC, the OBCSW froze while setting up the SD card driver, even though U-boot itself was able to read and update the environment file without any problems. And so began my deep dive into the AT91SAM9G20 datasheet, U-boot, and proprietary drivers, and my eventual ditching of the SD cards in favor of the much more stable FRAM device.
iOBC Datasheet
I found a very useful datasheet for the iOBC (AT91SAM9G20) board here, which I used extensively while troubleshooting the software update functionality. Most notably, it covers all peripherals in use and the memory addresses of the registers that the iOBC uses to access them. This made it a lot easier to reverse engineer the vendor-supplied drivers in Ghidra and verify their correctness, and to do the same for U-boot’s drivers.
SD card battle
The first thing that came to mind when facing the SD card driver freeze was that the driver used by U-boot could be partially incompatible with the MMC peripheral used by the iOBC board. After all, we had no problems like this before we introduced U-boot into the equation. My suspicions were supported by the fact that our QEMU emulator reported a number of invalid memory accesses while it ran U-boot. For example, U-boot seemed to expect an MMC version register at offset 0xFC in the MCI peripheral, but according to the AT91SAM9G20 datasheet no such register exists. I managed to track down where U-boot performs these invalid accesses, and fixed them in a commit. Unfortunately, this made everything worse, as U-boot was no longer able to read the SD card at all. (Eventually, I found out that this version register was indeed supported by the iOBC and that the AT91SAM9G20 datasheet was wrong. The version reported by the iOBC is 0x210.) I decided to give up on this track for now and reverted the commit.
SafeFAT
I decided to look closer at the stack trace in the Eclipse IDE at the point where the freeze occurs, and that is when I noticed a function called safe_poweron in it. I already knew that the SD card driver used by the OBCSW relied on a feature called SafeFAT, which is developed by the vendor of the driver and is supposed to make FAT more stable and fail-safe. It does this somehow by creating a folder called $$SAFE$$ and placing some files in there, which I assume are journals and other things commonly supported by more modern file systems. The reason I am vague here is that I couldn’t find any clear specification of how SafeFAT works; the user guide only mentions something about allocating new sectors and “modifying chains”. Obviously we don’t have the source code, and I also could not find any evidence of it being used by anyone other than the vendor.
When looking closer at the stack trace, I found the cause of the freeze. It was not stuck in a loop of kicking the watchdog and waiting for a register to update, as I initially assumed. It was stuck traversing the FAT cluster linked list, as an entry in that list, presumably corresponding to a file in the $$SAFE$$ directory, was pointing to itself.
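A self-referencing entry like this makes any unbounded chain walk spin forever. The following is a minimal sketch of a defensive traversal, assuming a FAT16-style table held in memory; the function name fat_chain_length, the end-of-chain marker and the table layout are illustrative assumptions and not the vendor driver's code. It relies on the fact that a valid chain can never be longer than the table itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAT16_EOC 0xFFF8u /* FAT16 entries >= this value mark end of chain */

/*
 * Count the clusters in the chain starting at `start`.
 * Returns -1 if the chain leaves the table or never terminates
 * (for example because an entry points to itself).
 */
long fat_chain_length(const uint16_t *fat, size_t n_entries, uint16_t start)
{
    size_t steps = 0;
    uint16_t cur = start;

    assert(fat != NULL);

    while (cur < FAT16_EOC) {
        /* Clusters 0 and 1 are reserved; anything past the table is corrupt. */
        if (cur < 2 || cur >= n_entries)
            return -1;
        /* A valid chain visits each cluster at most once. */
        if (++steps > n_entries)
            return -1;
        cur = fat[cur];
    }
    return (long)steps;
}
```

A chain 2 -> 3 -> 4 -> end yields 3, while a table in which entry 3 points back to 3 yields -1 instead of hanging.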
I found out that it was possible to disable SafeFAT by initializing the SD card driver with f_initvolume_nonsafe instead of f_initvolume (the irony of naming the standard FAT “nonsafe” here is pretty hilarious). And once I switched to “““Unsafe””” FAT, the OBCSW was once again able to boot normally. At this point I was still not sure if SafeFAT was actually to blame for the freeze, but I had no second thoughts about disabling it since my trust in it was essentially reduced to nothing.
Peripheral DMA Controller
While I thought I had won, more extensive testing showed that disabling SafeFAT was not enough to resolve the problem. The SD card driver could now initialize just fine, but the OBCSW would still often hang when reading from or writing to files on the SD card. This time, the freeze was inside a low-level function called mcipdc_read, called by f_open, which uses the Peripheral DMA Controller (PDC) to read from the MCI interface. This peripheral acts as a sort of hardware acceleration, since using the PDC is faster than reading the Receive Data Register (MCI_RDR) one word at a time.
After more digging with Ghidra, I was able to track down the cause of the freeze. mcipdc_read was stuck in an infinite loop waiting for the RXBUFF bit of the MCI Status Register (MCI_SR) to be set. According to the AT91SAM9G20 datasheet, this bit is set when the PDC’s receive buffer is full (presumably when all data has been received), i.e., when the PDC Receive Counter Register (PERIPH_RCR) is 0. The problem is that for some reason this register never reaches 0; it is stuck at 0x100 the entire time.
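The hang comes from waiting on a status bit with no way out. Since we cannot change the vendor driver, the following is only a sketch of what a bounded wait could look like in code we control. The helper name wait_for_flag and the poll budget are hypothetical, and the RXBUFF bit position (bit 14) is an assumption that should be checked against the datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed position of RXBUFF in MCI_SR; verify against the datasheet. */
#define MCI_SR_RXBUFF (1u << 14)

/*
 * Poll a status register until any bit in `mask` is set, giving up after
 * `max_polls` reads. Returns true if the flag was seen, false on timeout.
 */
bool wait_for_flag(const volatile uint32_t *reg, uint32_t mask,
                   uint32_t max_polls)
{
    assert(reg != NULL);

    for (uint32_t i = 0; i < max_polls; i++) {
        if (*reg & mask)
            return true;
    }
    return false;
}
```

With a timeout like this, a stalled transfer becomes an error the caller can report or retry, rather than a freeze that only the watchdog can end.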
I started looking a bit more into FAT filesystems, and the difference between sectors and clusters. Clusters are a higher-level concept, and are what the linked list issue with SafeFAT relates to. Sectors are a lower-level concept and are stored linearly. I was able to determine that the sector size hardcoded into the SD card driver is 512 bytes. I also discovered through the datasheet that PERIPH_RCR actually contains the number of words that should be read, not bytes. mcipdc_read initially sets PERIPH_RCR to 0x400, which means that it attempts to read 8 sectors, but it was only able to read 6 sectors before the freeze. Just for the heck of it, I decided to use the debugger to set PERIPH_RCR to 0. And lo and behold, the OBCSW recovered from the freeze.
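The sector counts follow directly from the counter values. Here is a small sketch of the arithmetic, assuming 4-byte words and the 512-byte sector size mentioned above; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE_BYTES 512u /* sector size hardcoded in the SD card driver */
#define WORD_SIZE_BYTES   4u   /* assumption: the counter counts 32-bit words */

/* Convert a PERIPH_RCR-style word count into whole sectors. */
uint32_t words_to_sectors(uint32_t words)
{
    uint32_t bytes = words * WORD_SIZE_BYTES;

    /* This sketch only handles counts that cover whole sectors. */
    assert(bytes % SECTOR_SIZE_BYTES == 0);
    return bytes / SECTOR_SIZE_BYTES;
}

/* Sectors already transferred, given the initial and remaining word counts. */
uint32_t sectors_done(uint32_t initial_words, uint32_t remaining_words)
{
    assert(remaining_words <= initial_words);
    return words_to_sectors(initial_words) - words_to_sectors(remaining_words);
}
```

With the values from the debugger, words_to_sectors(0x400) gives 8 and sectors_done(0x400, 0x100) gives 6: the counter stalled with 2 sectors (0x100 words) still outstanding.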
After looking into how U-boot reads from the SD card, I discovered that it does not use the PDC; it reads the MCI_RDR register repeatedly instead. So my first thought was that there was something wrong with the PDC. Even so, there was no way for me to do anything about it without writing my own FAT driver using MCI, or somehow integrating U-boot drivers into the OBCSW, which I deemed to be a large undertaking. Furthermore, we had never had any problems with the PDC driver before, so it did not seem like a likely fix to the problem. I decided to look at the contents of the SD card, so I ejected it from the iOBC and inserted it into my laptop. The result: there were a number of files with garbage filenames, around 2 gigabytes in size each.
Introducing FRAM
After eventually having enough of SD card bullshit, I started considering other options. The iOBC has an FRAM peripheral where we store important flight parameters, and while I had had the idea of storing the env file there before, I was a bit intimidated by actually getting U-boot to talk to it. Nevertheless, since I was fed up with the SD cards, I decided to give it a go.
I started by looking through the decompiled FRAM driver just to see how it works. It uses SPI to send some hex commands to the FRAM that I hadn’t seen before. Then I started digging through the U-boot env drivers and by a fluke I stumbled upon the sf driver (SPI flash). SPI flash is a semi-standardized protocol for reading from and writing to flash devices, including erasing them, since that is commonly necessary before writing. When I looked closer at it, the hex values it sent were very familiar. After double checking in Ghidra, it turned out that the FRAM driver is literally just using the SPI flash protocol without any fluff. The only difference is that it never sends any erase commands, I assume because the FRAM is not a flash device. I also took a look at the device tree (DTS) file for the AT91SAM9G20 evaluation board and discovered that there is some sort of dataflash peripheral on the spi0 bus. Could this be the FRAM!?
spi0: spi@fffc8000 {
    cs-gpios = <0>, <&pioC 11 0>, <0>, <0>;
    mtd_dataflash@1 {
        compatible = "atmel,at45", "atmel,dataflash";
        spi-max-frequency = <15000000>;
        reg = <1>;
    };
};
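To give an idea of what “just using the SPI flash protocol” means on the wire: a read or write starts with a one-byte opcode followed by the address. The sketch below uses opcodes from the common SPI NOR command set and a 3-byte big-endian address. Both are assumptions here, since the exact command set and address width of the iOBC FRAM are not documented in this post, and spi_build_cmd is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcodes from the common SPI NOR command set (assumed to match the FRAM). */
enum {
    SPI_OP_WRITE = 0x02, /* page program / write data */
    SPI_OP_READ  = 0x03, /* read data */
    SPI_OP_WREN  = 0x06, /* write enable, sent before a write */
    SPI_OP_RDID  = 0x9F, /* read JEDEC ID */
};

#define SPI_CMD_HDR_LEN 4 /* opcode + 3 address bytes (assumed width) */

/* Fill `hdr` with the opcode followed by a 24-bit big-endian address. */
void spi_build_cmd(uint8_t hdr[SPI_CMD_HDR_LEN], uint8_t opcode, uint32_t addr)
{
    assert(hdr != NULL);
    assert(addr <= 0xFFFFFFu);

    hdr[0] = opcode;
    hdr[1] = (uint8_t)(addr >> 16);
    hdr[2] = (uint8_t)(addr >> 8);
    hdr[3] = (uint8_t)addr;
}
```

A read from address 0x012345 would then start with the bytes 0x03 0x01 0x23 0x45, after which the device clocks out data. Note that there is no erase opcode in the list, which matches what the FRAM driver does.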
JEDEC IDs
All of a sudden, storing the env file in FRAM became not only theoretically possible, but also practically possible. I dug further through the SPI flash U-boot driver to learn how it talks to SPI flash devices. It starts by sending a command to read the device JEDEC ID to determine parameters like sector sizes.
From this ID, you are supposed to be able to determine things like the manufacturer. The ID that is reported by the FRAM is 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xC2 0x25 0x08, but a couple of things do not make sense when looking into how U-boot uses the JEDEC ID. Firstly, the 0xFF bytes seem to be padding, but the padding bytes are supposed to come last. Secondly, the ID should be 6 bytes, but the FRAM’s ID is for some reason 9 bytes. To resolve the first issue, I just implemented a hack in drivers/mtd/spi/spi-nor-core.c that reverses the ID if the iOBC board is in use.
    tmp = nor->read_reg(nor, SPINOR_OP_RDID, id, SPI_NOR_MAX_ID_LEN);
    if (tmp < 0) {
        dev_dbg(nor->dev, "error %d reading JEDEC ID\n", tmp);
        return ERR_PTR(tmp);
    }
#ifdef CONFIG_AT91SAM9G20ISIS
    // ID is for some reason reversed?
    // Swap bytes pairwise from both ends towards the middle.
    u8 swap;
    for (int i = 0; i < SPI_NOR_MAX_ID_LEN / 2; i++) {
        swap = id[i];
        id[i] = id[SPI_NOR_MAX_ID_LEN - 1 - i];
        id[SPI_NOR_MAX_ID_LEN - 1 - i] = swap;
    }
#endif
To resolve the second issue, I just added a conditional define in drivers/mtd/spi/sf_internal.h that is enabled when the iOBC board is in use.
#ifndef CONFIG_AT91SAM9G20ISIS
#define SPI_NOR_MAX_ID_LEN 6
#else
#define SPI_NOR_MAX_ID_LEN 9
#endif
As the first non-0xFF byte in the ID is 0xC2, I initially thought the manufacturer of the FRAM was Macronix, so I added the ID of the FRAM under the list of Macronix SPI flash devices. In hindsight this was quite silly, especially since I had to reverse the ID. But it works so whatever 🗿.
const struct flash_info spi_nor_ids[] = {
    /* Devices follow */
#ifdef CONFIG_SPI_FLASH_MACRONIX /* MACRONIX */
    /* Macronix */
#ifdef CONFIG_AT91SAM9G20ISIS
    // Not sure if the ISIS FRAM even has a sector or page size.
    // For simplicity it is set to the default page size.
    { INFO("isis_fram", 0x0825c2, 0, 256, 1024, 0) },
#endif
    { INFO("mx25l2005a", 0xc22012, 0, 64 * 1024, 4, SECT_4K) },
    { INFO("mx25l4005a", 0xc22013, 0, 64 * 1024, 8, SECT_4K) },
    { INFO("mx25l8005", 0xc22014, 0, 64 * 1024, 16, 0) },
    { INFO("mx25l1606e", 0xc22015, 0, 64 * 1024, 32, SECT_4K) },
    /* More devices follow */
#endif
};
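To see where the 0x0825c2 in the table entry comes from, here is a standalone sketch that reverses the 9 bytes reported by the FRAM and packs the first three into a 24-bit value. It runs on any host and does not use U-boot code; the helper names reverse_bytes and jedec_id24 are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse `len` bytes of `buf` in place. */
void reverse_bytes(uint8_t *buf, size_t len)
{
    assert(buf != NULL);

    for (size_t i = 0; i < len / 2; i++) {
        uint8_t tmp = buf[i];
        buf[i] = buf[len - 1 - i];
        buf[len - 1 - i] = tmp;
    }
}

/* Pack the first three ID bytes into a 24-bit value, first byte highest. */
uint32_t jedec_id24(const uint8_t *id)
{
    assert(id != NULL);

    return ((uint32_t)id[0] << 16) | ((uint32_t)id[1] << 8) | id[2];
}
```

Starting from 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xC2 0x25 0x08, the reversed buffer begins with 0x08 0x25 0xC2, so jedec_id24 returns 0x0825c2 and the 0xFF padding ends up last, where U-boot expects it.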
I wasn’t sure how the FRAM would handle the erase command. According to Ghidra the FRAM driver never sends it, which makes sense because there is no need to erase. I decided to just disable sending that command for the iOBC board in env/sf.c.
#ifndef CONFIG_AT91SAM9G20ISIS
    // No need to erase ISIS FRAM
    puts("Erasing SPI flash...");
    ret = spi_flash_erase(env_flash, env_new_offset,
                          sector * sect_size);
    if (ret)
        goto done;
#endif
To enable usage of the SPI bus and FRAM device, I added new flags to the defconfig file, for example CONFIG_SPI_FLASH, CONFIG_SPI_FLASH_DATAFLASH, and CONFIG_SPI_FLASH_MACRONIX. I’m not sure if these flags are actually needed (I am especially sceptical about that last one), but I added them “just in case”. I also copied the SPI bus and dataflash device from the AT91SAM9G20 evaluation board device tree file into the AT91SAM9G20 device tree file. Turns out one needs to add status = "okay" to the device or it won’t work.
Device trees
At this point, U-boot could correctly read from and write to the FRAM when executed in the QEMU emulator, correctly identifying it as the isis_fram device. Btw, to support this I made a hacky FRAM simulator Python script based on the SPI test simulator. Problem is, it did not work on the real hardware, and this ended up not being a trivial problem to solve.
I spent the following days digging deeply into the OBCSW SPI driver in Ghidra and U-boot’s Atmel SPI driver drivers/spi/atmel_spi.c, comparing the SPI registers they were reading from and writing to. While I was able to gather some insights that way, such as discovering that I may have configured an incorrect bitrate, I was not able to find any significant differences.
I then decided to look into the device tree file again. Since I had meticulously enumerated the GPIO pins that were configured in the FRAM and SPI drivers in the OBCSW, I wanted to check if this was where the corresponding U-boot GPIO configuration was located. And what do you know, the cs-gpios attribute did not contain the right pin. It was using the pin of the RTC, not the FRAM. I guess this is what I get for just copying an entry from a device tree file of another (albeit similar) board and believing it would “just work”. I changed the entry to use the right <&pioA 3 0> pin as shown below. This corresponds to bit 3 of the pioA peripheral.
spi0: spi@fffc8000 {
    cs-gpios = <&pioA 3 0>, <0>, <0>, <0>;
    status = "okay";
    spi_dataflash: mtd_dataflash@1 {
        status = "okay";
        compatible = "atmel,at45", "atmel,dataflash";
        spi-max-frequency = <15000000>;
        reg = <1>;
    };
};
And that’s it! U-boot was now able to access the FRAM at a given address and read the env file from there. Following this, I removed a number of unneeded flags from the defconfig file, mostly MMC-related, to shrink the size of U-boot from over 220 KB to just over 150 KB.