Debugging ARM Linux Cache Corruption Using a Virtual Prototype
I recently worked with a customer to isolate a system issue. During the Linux boot sequence, the cache was becoming corrupted, which caused their system to crash. They were able to work around the problem in software but had been unable to isolate the underlying hardware problem, both because they lacked visibility into the hardware and because it was difficult to freeze the system at the point where the corruption occurred.
By creating a virtual prototype with implementation-accurate models for the processor and memory subsystem, they were able to determine the root cause and identify a fix instead of a workaround. The virtual prototype was constructed and executed using Carbon SoC Designer Plus.
Figure 1: SoC Designer simulation environment
Reproducing the boot sequence cache corruption problem required the key components of their processor and memory subsystem. These included the ARM1136 processor, L2 Cache, the AHB fabric, Boot ROM, Memory Controller, and parts of their internal register/bus structure.
Configuring the ARM1136
Their implementation-accurate ARM1136 model was built using Carbon IP Exchange, Carbon's secure portal for configuring, building, and downloading models of third-party IP for use in an SoC Designer virtual platform. The ARM1136 model includes register views, memory views, cache profiling views, and an ARM RealView debugger integration. Carbon IP Exchange builds the ARM1136 model from its Verilog RTL, so the model is fully accurate to the RTL.
Figure 2: ARM1136 IP Exchange configuration form
Note that Carbon IP Exchange enabled the customer to build the model without requiring detailed, RTL-level knowledge of the processor. This is very important in cases like this one, where the customer knew how to program the ARM1136 but didn't know its RTL.
Building the Rest of the System
This customer had RTL for the L2 Cache, AHB fabric, Boot ROM, Memory Controller, and parts of their internal register/bus structure. They used Carbon Model Studio to compile this RTL into 100% accurate virtual models which could then be dropped directly onto the design canvas. This is the most common path that customers follow to leverage their existing IP, or blocks purchased from third parties that aren't available from Carbon IP Exchange. The alternative of hand-written models is often investigated but rarely pursued, because it takes weeks or months of design and validation effort to produce a model that still isn't guaranteed to be 100% accurate.
Figure 3: Using Carbon Model Studio to add AMBA adaptors, clocking, etc. to the SoC Designer Model
Isolating the Problem
To focus on the boot sequence software causing the cache corruption, the customer removed unrelated initialization code. For example, most of their peripheral driver initialization code was removed. This also allowed the SoC Designer virtual prototype to be simplified, since these peripheral devices did not need to be included.
After paring down the boot software and initializing the design, the customer was able to reproduce the cache corruption during the Linux boot sequence quite quickly. Having seen the problem, they worked on creating a shorter sequence that reproduced the corruption and brought in the hardware team. Then, by examining the AHB waveforms, cache memory views, memory controller memory views, and ARM1136 disassembly views at the point of the corruption (all views which are provided by SoC Designer), they were able to pinpoint the corruption: a read was fetching data from the memory controller’s memory before an earlier write to the same location had been flushed from the L1 D-Cache. This occurred because what should have been a write-through (cache controller writes both the cache and main memory) had been implemented as a write-back (cache controller writes only the valid cache data memory and NOT main memory). With this information, they could finally fix the actual cache corruption problem.
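The write-through versus write-back mix-up described above can be illustrated with a toy model. This is only a minimal sketch and not the customer's hardware or anything produced by SoC Designer: the `ToyCache` class, the `read_from_memory` helper, the address `0x1000`, and the data values are all hypothetical names and numbers chosen for illustration. It shows why a read that goes straight to the memory controller's memory returns stale data when a write that was expected to be write-through is actually handled as write-back.

```python
from enum import Enum


class WritePolicy(Enum):
    WRITE_THROUGH = "write-through"
    WRITE_BACK = "write-back"


class ToyCache:
    """Hypothetical single-level cache in front of a dict-based main memory."""

    def __init__(self, memory, policy):
        self.memory = memory      # stands in for the memory controller's memory
        self.policy = policy
        self.lines = {}           # address -> cached value
        self.dirty = set()        # addresses written but not yet in main memory

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy is WritePolicy.WRITE_THROUGH:
            # Write-through: cache and main memory are updated together.
            self.memory[addr] = value
        else:
            # Write-back: only the cache is updated; memory stays stale.
            self.dirty.add(addr)

    def flush(self):
        # Copy every dirty line out to main memory.
        for addr in sorted(self.dirty):
            self.memory[addr] = self.lines[addr]
        self.dirty.clear()


def read_from_memory(memory, addr):
    """A read that bypasses the cache and sees only main memory."""
    return memory[addr]


def demo(policy):
    memory = {0x1000: 0xAAAA}                  # old contents (illustrative)
    cache = ToyCache(memory, policy)
    cache.write(0x1000, 0xBEEF)                # the earlier write
    before_flush = read_from_memory(memory, 0x1000)
    cache.flush()
    after_flush = read_from_memory(memory, 0x1000)
    return before_flush, after_flush


if __name__ == "__main__":
    for policy in WritePolicy:
        before, after = demo(policy)
        print(f"{policy.value}: before flush {before:#x}, after flush {after:#x}")
```

With the write-through policy the memory read sees the new value immediately. With the write-back policy it sees the old value until the flush happens, which is the ordering hazard the waveforms and memory views exposed.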
They had been unable to find this problem in the real hardware since they lacked the controllability to reproduce the problem reliably and the visibility to see the interaction between hardware and software. By using SoC Designer together with 100% accurate models from Carbon IP Exchange and models compiled from their own RTL using Carbon Model Studio, they were able to quickly isolate the issue and fix the root problem.