Forward in code: Building a runtime system for arm-eabi

AdaCore provide a compiler (running on Windows and Linux) targeted to ARM (target arm-eabi) and a runtime system (RTS) supporting the Ravenscar profile.

The public version of the AdaCore Ravenscar RTS is released under the full GPL. It seemed as if it would be a good idea (and fun!) to produce an independent RTS with the GCC Runtime Library Exception (FAQ).

Development hardware

The board targeted by the AdaCore RTS is the STM32F4 Discovery from STMicroelectronics. I decided instead to get the STM32F429I Discovery, having been seduced by the on-board LCD; with hindsight this may have been a mistake, because the board has fewer LEDs and considerably fewer available pins. However, it is faster and has more RAM! Both are supplied by Farnell element14, among others.

AdaCore's software

The first step was to build AdaCore's GNAT GPL 2014 compiler for arm-eabi for the Mac, as written up here, and to run it with their Ravenscar RTS from the Linux distribution of the same compiler (for the STM32F4) and with the modified version for the STM32F429I.

The STMicroelectronics boards come with a built-in communications port (ST-LINK) that supports reading and writing Flash and debugging with GDB. There's a fine tool to use this, stlink, which was easy enough to build for the Mac after an initial tussle with libusb and pkg-config. A Mac binary is available on SourceForge.

The demo worked.

FSF compilers

The next step was to build FSF GCC for arm-eabi: first 4.9.1, because that was the release of the host compiler I was running at the time, and later 5.1.0 (both the base and cross compilers).

Firmware

I started from the STM32Cube Firmware package from STMicroelectronics (www.st.com / Support / Tools and Software / Development Tools / Software / STM32Cube; various releases archived on SourceForge, since STMicroelectronics don't make earlier releases available), with a boost from the Ada Bare Bones tutorial.

The STM32Cube software (version 1.3) had some problems with #include directives (backslash as path separator). After fixing this, and copying the relevant code from the Projects/STM32F429I-Discovery/Examples/BSP/ directory, the demo worked fine.

I wasn't able to include the linker script used by STMicroelectronics because of its licensing terms; instead, I found a suitable one on GitHub.

I had considerable fun working out how to deal with system calls (all dummied), C++ global constructors, and libc.a dependencies. This involves cooperation between the linker script (forcing symbols to be loaded) and provision of _init() using a weak symbol, to be overridden by compiler-generated code for C++.

RTS: first steps

AdaCore's RTS is based on a bare-board design which was initially developed by the Real-Time Systems Group at the Technical University of Madrid. It is almost all written in Ada (there are two assembler routines, one for the interrupt vector table and one, in two versions depending on whether the executable is loaded into ROM or RAM, to perform the initial startup).

The basis for my new package System was the GNU-Linux/ARMEL version from the GCC 4.9.1 source distribution with the restrictions listed in Ada Bare Bones, and I copied the initial set of packages from the GCC 4.9.1 distribution (the list was derived from the GNAT GPL 2014 Zero-Footprint (zfp) RTS).

I also wrote some simple packages to interface to the on-board hardware (pushbutton, LEDs, and the LCD; omitting for the moment more sophisticated devices such as the accelerometers) via the Hardware Abstraction Layer (HAL); this also included a procedure STM32F429I_Discovery.HAL.Wait, implemented as

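--  The enclosing package body needs "with Interfaces;".
procedure Wait (Milliseconds : Natural) is
   --  Binding to the C routine HAL_Delay in the STM32Cube HAL.
   procedure HAL_Delay (Milliseconds : Interfaces.Unsigned_32) with
     Import,
     Convention => C,
     External_Name => "HAL_Delay";
begin
   --  Natural is non-negative, so converting to the unsigned C type is safe.
   HAL_Delay (Interfaces.Unsigned_32 (Milliseconds));
end Wait;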

RTS: second stage

There are two kinds of package in an RTS:

Those that the compiler depends on; it generates code which calls the package concerned without the user being involved.
Those that are invoked by the user directly. These are much easier to deal with.

The general process for building an RTS is:

Write test code that invokes the feature you want to implement. This should be able to produce an executable. You'll need to consider how to check that it's working: sometimes blinking LEDs is enough, more often you'll need to use the debugger.
Compile it.
If the compiler reports a restriction violation (and you haven't made a mistake in your test package!), remove the restriction from System and rebuild the RTS; recompile.
If the compiler reports that a feature isn't available in a configurable RTS, this means that the code generator is looking for a specific package or a particular feature of a package.
- If the package is there but you've previously commented-out the feature, uncomment it.
- If the package isn't there, copy the spec from the base RTS (GCC 4.9.1 in this case).
Don't bother to rebuild the RTS, just try the compilation. Repeat until successful.
Then, import the RTS package body if necessary. Rebuild the RTS, which will probably fail. Comment or uncomment the failing parts of the imported body; repeat until successful.
Recompile the test code. Repeat the whole process until you get an executable.
Run the executable on the target and check it's working.
Repeat for the next feature.

Tasking

The first step was to implement tasking. I originally implemented it over ARM's CMSIS:

The ARM® Cortex® Microcontroller Software Interface Standard (CMSIS) is a vendor-independent hardware abstraction layer for the Cortex-M processor series and specifies debugger interfaces. Creation of software is a major cost factor in the embedded industry. By standardizing the software interfaces across all Cortex-M silicon vendor products, especially when creating new projects or migrating existing software to a new device, means significant cost reductions.
The CMSIS enables consistent and simple software interfaces to the processor for interface peripherals, real-time operating systems, and middleware. It simplifies software re-use, reducing the learning curve for new microcontroller developers and cutting the time-to-market for devices.

There's a lot to be learned during this process by examining the output of -gnatG, which outputs the generated expanded code (in 72 column lines; you can say -gnatG132 if you're happy with longer lines). For example,

package body Tasking is

   task T;
   task body T is
   begin
      null;
   end T;

end Tasking;

results (in part) in this for the task body

procedure tasking__tTKB (_task : access tasking__tTKV) is
   %push_constraint_error_label ()
   %push_program_error_label ()
   %push_storage_error_label ()
   procedure tasking__tTK___finalizer;

   procedure tasking__tTK___finalizer is
   begin
      %push_constraint_error_label ()
      %push_program_error_label ()
      %push_storage_error_label ()
      $system__tasking__restricted__stages__complete_restricted_task;
      %pop_constraint_error_label
      %pop_program_error_label
      %pop_storage_error_label
      return;
   end tasking__tTK___finalizer;
begin

     $system__tasking__restricted__stages__complete_restricted_activation;
   null;
   %pop_constraint_error_label
   %pop_program_error_label
   %pop_storage_error_label
   return;
at end
   tasking__tTK___finalizer;
end tasking__tTKB;

which is referenced from this, called during elaboration

procedure tasking__tTKVIP (_init : in out tasking__tTKV; _master :
  system__tasking__master_id; _chain : in out
  system__tasking__activation_chain; _task_name : in string) is
   %push_constraint_error_label ()
   %push_program_error_label ()
   %push_storage_error_label ()
   subtype tasking__tTKVIP__S5b is string (_task_name'first(1) ..
     _task_name'last(1));
begin
   _init._task_id := null;
   _init._task_id := _init._atcb'unchecked_access;
   $system__tasking__restricted__stages__create_restricted_task (
     system__tasking__unspecified_priority, system__null_address,
     tasking__tTKZ, system__task_info__unspecified_task_info,
     -1, system__tasking__task_procedure_access!(tasking__T3b'(
     tasking__tTKB'unrestricted_access)), _init'address,
     tasking__tTKE'unchecked_access, _chain, _task_name, _init.
     _task_id);
   %pop_constraint_error_label
   %pop_program_error_label
   %pop_storage_error_label
   return;
end tasking__tTKVIP;

which (after a struggle) lets you work out the parameters to System.Tasking.Restricted.Stages.Create_Restricted_Task ("restricted" because this is a Ravenscar, restricted-capability, runtime).

`delay until`

The next step was to implement delay until. I followed the process outlined above, the first step being to remove pragma Restrictions (No_Delay); from package System.
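Test code for this step can be very small. The following is a minimal sketch, assuming a host build with full Ada.Text_IO support; the unit name Delay_Until_Test is hypothetical, and on the board the Put_Line call would be replaced by something like toggling an LED. Compile with -gnata to enable the assertion.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;

--  Hypothetical test program: wakes up five times, 100 ms apart.
procedure Delay_Until_Test is
   Period : constant Time_Span := Milliseconds (100);
   Start  : constant Time := Clock;
   Next   : Time := Start;
begin
   for Tick in 1 .. 5 loop
      --  Absolute deadlines avoid the drift that a relative delay
      --  would accumulate.
      Next := Next + Period;
      delay until Next;
      Ada.Text_IO.Put_Line ("tick" & Integer'Image (Tick));
   end loop;

   --  The last deadline was Start + 5 * Period, so at least that
   --  much time must have passed.
   pragma Assert (Clock - Start >= 5 * Period);
end Delay_Until_Test;
```

Using delay until rather than a relative delay keeps the test within what Ravenscar allows, since the profile forbids relative delays.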

Protected types

Again, I followed the process.
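As an illustration of the kind of test code involved, here is a minimal sketch of a protected object with one procedure and one function. It assumes a host build (Ada.Text_IO, and a protected object declared inside the main procedure, which Ravenscar itself would not allow; on the target it would live in a library-level package). The names Protected_Test and Counter are hypothetical.

```ada
with Ada.Text_IO;

--  Hypothetical test program for a simple protected object.
procedure Protected_Test is

   protected Counter is
      procedure Increment;             --  exclusive read-write access
      function Value return Natural;   --  read-only access
   private
      Count : Natural := 0;
   end Counter;

   protected body Counter is
      procedure Increment is
      begin
         Count := Count + 1;
      end Increment;

      function Value return Natural is
      begin
         return Count;
      end Value;
   end Counter;

begin
   for J in 1 .. 3 loop
      Counter.Increment;
   end loop;

   pragma Assert (Counter.Value = 3);
   Ada.Text_IO.Put_Line ("Count =" & Natural'Image (Counter.Value));
end Protected_Test;
```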

Move to FreeRTOS

At least one of the CMSIS interfaces (osThreadCreate()) seemed quite cumbersome to use. Since STMicroelectronics's CMSIS implementation is itself layered over FreeRTOS, and since CMSIS is restricted to ARM devices while FreeRTOS has a wider target list, it seemed sensible to implement this RTS directly over FreeRTOS.

Interrupt handling

This time the way the compiler interacts with the RTS was sufficiently complicated that I had to write pseudo-Ada to get my head round it.

The entry call (Protected_Single_Entry_Call):

locks the entry
if the barrier is open then
  asserts that Call_In_Progress isn't set
  sets Call_In_Progress
  calls the entry body wrapper
  clears Call_In_Progress
  unlocks the entry
else
  if the Entry_Queue isn't null then
    unlocks the entry
    raises Program_Error
  end if
  sets the Entry_Queue
  unlocks the entry
  sleeps
end if

The handler wrapper:

locks the entry
calls another wrapper for the handler itself
calls Service_Entry
exits

Service_Entry:

if the Entry_Queue is set and the barrier is open then
  clears the Entry_Queue
  asserts that Call_In_Progress isn't set
  sets Call_In_Progress
  calls the entry body wrapper
  clears Call_In_Progress
  saves the caller task_id
  unlocks the entry
  wakes the caller
else
  unlocks the entry
end if
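For orientation, the source-level construct that drives this machinery is a protected object with a single entry whose barrier is opened by a handler procedure. The following is a minimal sketch, assuming a host build: Handler is called directly by the main program as a stand-in for a real interrupt, the names Single_Entry_Demo, Button and Waiter are hypothetical, and a full host runtime may route the calls through different RTS packages from the ones described above.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;

--  Hypothetical host-side model of an interrupt-driven single entry.
procedure Single_Entry_Demo is

   protected Button is
      entry Wait_For_Press;   --  the single entry
      procedure Handler;      --  would be attached to an interrupt
   private
      Pressed : Boolean := False;   --  the barrier variable
   end Button;

   protected body Button is
      entry Wait_For_Press when Pressed is
      begin
         Pressed := False;   --  close the barrier again
      end Wait_For_Press;

      procedure Handler is
      begin
         Pressed := True;    --  opening the barrier releases the caller
      end Handler;
   end Button;

   task Waiter;

   task body Waiter is
   begin
      --  The barrier is closed at first, so this call is queued and
      --  the task sleeps (the "else" branch of the entry call above).
      Button.Wait_For_Press;
      Ada.Text_IO.Put_Line ("button pressed");
   end Waiter;

begin
   --  Give Waiter time to block, then simulate the interrupt.
   delay until Clock + Milliseconds (100);
   Button.Handler;
end Single_Entry_Demo;
```

When Handler returns, the runtime re-evaluates the barrier and services the queued call, which is the job of Service_Entry in the outline above.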

And so on ...

The next features were:

Memory allocation
Tagged types
Secondary stacks (for, amongst other things, functions returning indefinite types)
Bounded Vectors, Bounded Hashed Maps
Interfaces.C, Interfaces.C.Strings
Assertions
Suspension Objects

Finalization

Finalization is needed to support generalized iteration over containers (Ada 2012 Rationale 6.3):

for E : Element of A_Vector_Of_Elements loop
   -- do work involving E
end loop;

and, incidentally, existential and universal quantifiers over containers:

(for some E of A_Vector_Of_Elements => property of E is True)
(for all E of A_Vector_Of_Elements => property of E is True)
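A complete version of these constructs might look like the following minimal sketch, which assumes a host build with Ada.Text_IO and uses a bounded vector of Integer; the names Iteration_Demo and Integer_Vectors are hypothetical. Compile with -gnata to enable the assertions.

```ada
pragma Ada_2012;

with Ada.Containers.Bounded_Vectors;
with Ada.Text_IO;

--  Hypothetical demonstration of generalized iteration and
--  quantified expressions over a bounded container.
procedure Iteration_Demo is

   package Integer_Vectors is new Ada.Containers.Bounded_Vectors
     (Index_Type   => Positive,
      Element_Type => Integer);
   use Integer_Vectors;

   V   : Vector (Capacity => 4);
   Sum : Integer := 0;

begin
   V.Append (2);
   V.Append (4);
   V.Append (6);

   --  Generalized iteration: the compiler expands this into cursor
   --  operations on an iterator object that needs finalization.
   for E of V loop
      Sum := Sum + E;
   end loop;

   pragma Assert (Sum = 12);

   --  Universal and existential quantifiers.
   pragma Assert ((for all E of V => E mod 2 = 0));
   pragma Assert ((for some E of V => E > 5));
   pragma Assert (not (for some E of V => E > 6));

   Ada.Text_IO.Put_Line ("Sum =" & Integer'Image (Sum));
end Iteration_Demo;
```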

Unfortunately, implementing finalization revealed two bugs in GNAT (all current releases):

PR66205, gnatbind generates invalid code when finalization is enabled in restricted runtime
PR66242, Front-end error if exception propagation disabled

Although I have provided patches for both of these (with the PRs; for FSF GCC 4.9.1 originally, OK for 5.1.0), it seems best to hold off including finalization in the RTS until (if!) these bugs are fixed. If anyone is interested, see the SourceForge repository, branch finalization.

Tuesday, 2 June 2015