
Decomposing Splines Without Recursion

To make graphics usable in Snek, I need to avoid using a lot of memory, especially on the stack as there's no stack overflow checking on most embedded systems. Today, I worked on how to draw splines with a reasonable number of line segments without requiring any intermediate storage. Here are the results from this work:

The Usual Method

The usual method I've used to convert a spline into a sequence of line segments is to split the spline in half recursively, using De Casteljau's algorithm, until each piece can be approximated by a straight line within a defined tolerance.

Here's an example from twin:

static void
_twin_spline_decompose (twin_path_t *path,
            twin_spline_t   *spline,
            twin_dfixed_t   tolerance_squared)
{
    if (_twin_spline_error_squared (spline) <= tolerance_squared)
    {
        _twin_path_sdraw (path, spline->a.x, spline->a.y);
    }
    else
    {
        twin_spline_t s1, s2;

        _de_casteljau (spline, &s1, &s2);
        _twin_spline_decompose (path, &s1, tolerance_squared);
        _twin_spline_decompose (path, &s2, tolerance_squared);
    }
}

The _de_casteljau function splits the spline at the midpoint:

static void
_lerp_half (twin_spoint_t *a, twin_spoint_t *b, twin_spoint_t *result)
{
    result->x = a->x + ((b->x - a->x) >> 1);
    result->y = a->y + ((b->y - a->y) >> 1);
}

static void
_de_casteljau (twin_spline_t *spline, twin_spline_t *s1, twin_spline_t *s2)
{
    twin_spoint_t ab, bc, cd;
    twin_spoint_t abbc, bccd;
    twin_spoint_t final;

    _lerp_half (&spline->a, &spline->b, &ab);
    _lerp_half (&spline->b, &spline->c, &bc);
    _lerp_half (&spline->c, &spline->d, &cd);
    _lerp_half (&ab, &bc, &abbc);
    _lerp_half (&bc, &cd, &bccd);
    _lerp_half (&abbc, &bccd, &final);

    s1->a = spline->a;
    s1->b = ab;
    s1->c = abbc;
    s1->d = final;

    s2->a = final;
    s2->b = bccd;
    s2->c = cd;
    s2->d = spline->d;
}

This is certainly straightforward, but it suffers from an obvious flaw: the recursion is unbounded. With two splines in each stack frame, each containing eight coordinates, the stack grows rapidly; four levels of recursion consume space for more than 64 coordinates. This can easily overflow the stack of a tiny machine.

De Casteljau Splits At Any Point

De Casteljau's algorithm is not limited to splitting splines at the midpoint. You can supply an arbitrary position t, 0 < t < 1, and you will end up with two splines which, drawn together, exactly match the original spline. I use 1/2 in the above version because it provides a reasonable guess as to how an arbitrary spline might be decomposed efficiently. You can use any value and the decomposition will still work; it will just change the recursion depth along various portions of the spline.

Iterative Left-most Spline Decomposition

What our binary decomposition does is pick points t₀ … tₙ such that the splines covering t₀..t₁ through tₙ₋₁..tₙ are all 'flat'. It does this by recursively bisecting the spline, storing two intermediate splines on the stack at each level. If we look at just how the first, or 'left-most', spline is generated, that can be represented as an iterative process. At each step in the iteration, we split the spline in half:

S' = _de_casteljau(s, 1/2)

We can re-write this using the broader capabilities of the De Casteljau algorithm by splitting the original spline at decreasing points along it:

S[n] = _de_casteljau(s0, (1/2)ⁿ)

Now recall that the De Casteljau algorithm generates two splines, not just one. One describes the spline from 0..(1/2)ⁿ, the second the spline from (1/2)ⁿ..1. This gives us an iterative approach to generating a sequence of 'flat' splines for the whole original spline:

while S is not flat:
    n = 1
    do
        Sleft, Sright = _de_casteljau(S, (1/2)ⁿ)
        n = n + 1
    until Sleft is flat
    result ← Sleft
    S = Sright
result ← S

We've added an inner loop that wasn't needed in the original algorithm, and we're introducing some cumulative errors as we step around the spline, but we don't use any additional memory at all.

Final Code

Here's the full implementation:

/*
 * Copyright © 2020 Keith Packard <keithp@keithp.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
 */

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef float point_t[2];
typedef point_t spline_t[4];

#define SNEK_DRAW_TOLERANCE 0.5f

/* Is this spline flat within the defined tolerance */
static bool
_is_flat(spline_t spline)
{
    /*
     * This computes the maximum deviation of the spline from a
     * straight line between the end points.
     *
     * From https://hcklbrrfnn.files.wordpress.com/2012/08/bez.pdf
     */
    float ux = 3.0f * spline[1][0] - 2.0f * spline[0][0] - spline[3][0];
    float uy = 3.0f * spline[1][1] - 2.0f * spline[0][1] - spline[3][1];
    float vx = 3.0f * spline[2][0] - 2.0f * spline[3][0] - spline[0][0];
    float vy = 3.0f * spline[2][1] - 2.0f * spline[3][1] - spline[0][1];

    ux *= ux;
    uy *= uy;
    vx *= vx;
    vy *= vy;
    if (ux < vx)
        ux = vx;
    if (uy < vy)
        uy = vy;
    return (ux + uy <= 16.0f * SNEK_DRAW_TOLERANCE * SNEK_DRAW_TOLERANCE);
}

static void
_lerp (point_t a, point_t b, point_t r, float t)
{
    int i;
    for (i = 0; i < 2; i++)
        r[i] = a[i]*(1.0f - t) + b[i]*t;
}

static void
_de_casteljau(spline_t s, spline_t s1, spline_t s2, float t)
{
    point_t first[3];
    point_t second[2];
    int i;

    for (i = 0; i < 3; i++)
        _lerp(s[i], s[i+1], first[i], t);

    for (i = 0; i < 2; i++)
        _lerp(first[i], first[i+1], second[i], t);

    _lerp(second[0], second[1], s1[3], t);

    for (i = 0; i < 2; i++) {
        s1[0][i] = s[0][i];
        s1[1][i] = first[0][i];
        s1[2][i] = second[0][i];

        s2[0][i] = s1[3][i];
        s2[1][i] = second[1][i];
        s2[2][i] = first[2][i];
        s2[3][i] = s[3][i];
    }
}

static void
_spline_decompose(void (*draw)(float x, float y), spline_t s)
{
    float       t;
    spline_t    s1, s2;

    (*draw)(s[0][0], s[0][1]);

    /* If s is flat, we're done */
    while (!_is_flat(s)) {
        t = 1.0f;

        /* Iterate until s1 is flat */
        do {
            t = t/2.0f;
            _de_casteljau(s, s1, s2, t);
        } while (!_is_flat(s1));

        /* Draw to the end of s1 */
        (*draw)(s1[3][0], s1[3][1]);

        /* Replace s with s2 */
        memcpy(&s[0], &s2[0], sizeof (spline_t));
    }
    (*draw)(s[3][0], s[3][1]);
}

void draw(float x, float y)
{
    printf("%8g, %8g\n", x, y);
}

int main(int argc, char **argv)
{
    spline_t spline = {
        { 0.0f, 0.0f },
        { 0.0f, 256.0f },
        { 256.0f, -256.0f },
        { 256.0f, 0.0f }
    };
    _spline_decompose(draw, spline);
    return 0;
}
Posted Fri Feb 14 21:55:57 2020

Prototyping a Vulkan Extension — VK_MESA_present_period

I've been messing with application presentation through the Vulkan API for quite a while now, starting by exploring how to make head-mounted displays work by creating DRM leases as described in a few blog posts: 1, 2, 3, 4.

Last year, I presented some work towards improving frame timing accuracy at the X developers conference. Part of that was about the Google Display Timing extension.

VK_GOOGLE_display_timing

VK_GOOGLE_display_timing is really two extensions in one:

  1. Report historical information about when frames were shown to the user.

  2. Allow applications to express when future frames should be shown to the user.

The combination of these two is designed to allow applications to get frames presented to the user at the right time. The biggest barrier to having things work perfectly all of the time is that the GPU has finite rendering performance, and can easily get behind if the application asks it to do more than it can in the time available.

When this happens, the previous frame gets stuck on the screen for extra time, and then the late frame gets displayed. In fact, because the software queues up a pile of stuff, several frames will often get delayed.

Once the application figures out that something bad happened, it can adjust future rendering, but the queued frames are going to get displayed at some point.

The problem is that the application has little control over the cadence of frames once things start going wrong.

Imagine the application is trying to render at 1/2 the native frame rate. Using GOOGLE_display_timing, it sets the display time for each frame by spacing them apart by twice the refresh interval. When a frame misses its target, it will be delayed by one frame. If the subsequent frame is ready in time, it will be displayed just one frame later, instead of two. That means you see two glitches, one for the delayed frame and a second for the "early" frame (not actually early, just early with respect to the delayed frame).
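
As a concrete sketch of the half-rate case (this is illustrative, not the actual application code; the device, queue, swapchain and image index are assumed to already exist, and the previous actual presentation time would come from vkGetPastPresentationTimingGOOGLE):

#include <vulkan/vulkan.h>

static VkResult
present_at_half_rate(VkDevice device, VkQueue queue, VkSwapchainKHR swapchain,
                     uint32_t image_index, uint32_t present_id,
                     uint64_t last_actual_present_time)
{
    /* How long one refresh cycle lasts, in nanoseconds */
    VkRefreshCycleDurationGOOGLE refresh;
    vkGetRefreshCycleDurationGOOGLE(device, swapchain, &refresh);

    /* Ask for this frame two refresh intervals after the previous one */
    VkPresentTimeGOOGLE time = {
        .presentID = present_id,
        .desiredPresentTime = last_actual_present_time + 2 * refresh.refreshDuration,
    };

    VkPresentTimesInfoGOOGLE times = {
        .sType = VK_STRUCTURE_TYPE_PRESENT_TIMES_INFO_GOOGLE,
        .swapchainCount = 1,
        .pTimes = &time,
    };

    VkPresentInfoKHR present = {
        .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
        .pNext = &times,
        .swapchainCount = 1,
        .pSwapchains = &swapchain,
        .pImageIndices = &image_index,
        /* wait semaphores omitted for brevity */
    };

    return vkQueuePresentKHR(queue, &present);
}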

Specifying Presentation Periods

Maybe, instead of specifying when frames should be displayed, we should specify how long frames should be displayed. That way, when a frame is late, subsequent queued frames will still be displayed at the correct relative time. The application can use the first part of GOOGLE_display_timing to figure out what happened and correct at some later point, being careful to avoid generating another obvious glitch.

I really don't know if this is a better plan, but it seems worth experimenting with, so I decided to write some code and see how hard it was to implement.

Going In The Wrong Direction

At first, I assumed I'd have to hack up the X server, and maybe the kernel itself to make this work. So I started specifying changes to the X present extension and writing a pile of code in the X server.

Queuing the first presentation to the kernel was easy; with no previous presentation needing to be kept on the screen for a specified period, it just gets sent right along.

For subsequent presentations, I realized that I needed to wait until I learned when the earlier presentations actually happened, which meant waiting for a kernel event. That kernel event immediately generates an X event back to the Vulkan client, telling it when the presentation occurred.

Once I saw that both X and Vulkan were getting the necessary information at about the same time, I realized that I could wait in the Vulkan code rather than in the X server.

Window-system Independent Implementation

As part of the GOOGLE_display_timing implementation, each window system tells the common code when presentations have happened to record that information for the application. This provides the hook I need to send off pending presentations using that timing information to compute when they should be presented.

Almost. The direct-to-display (DRM) back-end worked great, but the X11 back-end wasn't very prompt about delivering this timing information, preferring to process X events (containing the timing information) only when the application was blocked in vkAcquireNextImageKHR. I hacked in a separate event handling thread so that events would be processed promptly and got things working.

VK_MESA_present_period

An application uses VK_MESA_present_period by including a VkPresentPeriodMESA structure in the pNext chain in the VkPresentInfoKHR structure passed to the vkQueuePresentKHR call.

typedef struct VkPresentPeriodMESA {
    VkStructureType    sType;
    const void*        pNext;
    uint32_t           swapchainCount;
    const int64_t*     pPresentPeriods;
} VkPresentPeriodMESA;

The fields in this structure are:

  • sType. Set to VK_STRUCTURE_TYPE_PRESENT_PERIOD_MESA.
  • pNext. Points to the next extension structure in the chain (if any).
  • swapchainCount. A copy of the swapchainCount field in the VkPresentInfoKHR structure.
  • pPresentPeriods. An array, length swapchainCount, of presentation periods for each image in the call.

Positive presentation periods represent nanoseconds. Negative presentation periods represent frames. A zero value means the extension does not affect the associated presentation. Nanosecond values are rounded to the nearest upcoming frame so that a value of n * refresh_interval is the same as using a value of n frames.

The presentation period causes future images to be delayed at least until the specified interval after this image has been presented. Specifying both a presentation period in a previous frame and using GOOGLE_display_timing is well defined -- the presentation will be delayed until the later of the two times.
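
Here's a minimal sketch of how an application might use it (assuming vulkan.h plus the prototype's header for the MESA structure; the queue, swapchain and image index are placeholders):

#include <vulkan/vulkan.h>

static void
present_for_two_frames(VkQueue queue, VkSwapchainKHR swapchain, uint32_t image_index)
{
    /* Negative values are frame counts: keep this image on screen
     * for at least two refresh intervals */
    int64_t period = -2;

    VkPresentPeriodMESA period_info = {
        .sType = VK_STRUCTURE_TYPE_PRESENT_PERIOD_MESA,
        .swapchainCount = 1,
        .pPresentPeriods = &period,
    };

    VkPresentInfoKHR present_info = {
        .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
        .pNext = &period_info,
        .swapchainCount = 1,
        .pSwapchains = &swapchain,
        .pImageIndices = &image_index,
        /* wait semaphores omitted for brevity */
    };

    vkQueuePresentKHR(queue, &present_info);
}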

Status and Plans

The prototype code (it's a bit haphazard, I'm afraid) is available in my gitlab mesa repository. It depends on my GOOGLE_display_timing implementation, which has not been merged yet, so you may want to check that out to understand what this patch does.

As far as the API goes, I could easily be convinced to use some better way of switching between frames and nanoseconds; otherwise, I think it's in pretty good shape.

I'm looking for feedback on whether this seems like a useful way to improve frame timing in Vulkan. Comments on how the code might be better structured would also be welcome; I'm afraid I open-coded a singly linked list in my haste...

Posted Sat Feb 1 22:38:21 2020

Snekboard's Crowd Supply Campaign

Snekboard has garnered a lot of interest from people who have seen it in operation. Josh Lifton, a fellow Portland resident and co-founder of Crowd Supply, suggested that perhaps we could see how much interest there was for this hardware by building a campaign.

Getting Things Together

We took pictures, made movies, built spreadsheets full of cost estimates and put together the Snekboard story, including demonstrations of LEGO models running Snek code. It took a couple of months to get ready to launch.

Launching the Campaign

The Snekboard campaign launched while I was at LCA getting ready to talk about snek.

Interest is Strong

We set a goal of $4000, which is enough to build 50 Snekboards. We met that goal after only two weeks and still have until the first of March to get further support.

Creating Teaching Materials

We've been teaching programming in our LEGO robotics class for a long time. I joined the class about 15 years ago; we started with LEGO Logo on an Apple II and more recently have been using C++ with Arduino hardware.

That's given us a lot of experience with what kinds of robots work well and what kinds of software the students are going to be able to understand and enjoy experimenting with. We've adapted the models and software to run on Snekboard using Snek and have started writing up how we're teaching that, putting those write-ups on the sneklang.org documentation page.

Free Software / Free Hardware

All of the software running on Snekboard is free; Snek is licensed under the GPL, and CircuitPython uses the MIT license.

The Snekboard designs are also freely available; they use the TAPR Open Hardware License.

All of the tools we use to design Snekboard are also free; we use gEDA project tools.

Hardware and software used in education need to be free and open so that people can learn about how they work, build modified versions and share those with the world.

Posted Fri Jan 31 16:21:02 2020

Linux.conf.au 2020

I just got back from linux.conf.au 2020 on Saturday and am still adjusting to being home again. I had the opportunity to give three presentations during the conference and wanted to provide links to the slides and videos.

Picolibc

My first presentation was part of the Open ISA miniconf on Monday. I summarized the work I've been doing on a fork of Newlib called Picolibc, which targets 32- and 64-bit embedded processors.

Snek

Wednesday morning, I presented on my snek language, which is a small Python designed for introducing programming in an embedded environment. I've been using this for the last year or more in a middle-school environment (grades 5-7) as a part of a LEGO robotics class.

X History and Politics

Bradley Kuhn has been encouraging me to talk about the early politics of X and how that has shaped my views on the benefits of copyleft licenses in building strong communities, especially in driving corporate cooperation and collaboration. I would have loved to also give this talk as a part of the Copyleft Conference being held in Brussels after FOSDEM, but I won't be at that event. This talk spans the early years of X, covering events up through 1992 or so.

Posted Tue Jan 21 15:02:01 2020

Picolibc Without Double

Smaller embedded processors may have no FPU, or may have an FPU that only supports single-precision mode. In either case, applications may well want to be able to avoid any double precision arithmetic as that will drag in a pile of software support code. Getting picolibc to cooperate so that it doesn't bring in double-precision code was today's exercise.

Take a look at the changes in git

__OBSOLETE_MATH is your friend

The newlib math library, which is where picolibc gets its math code, has two different versions of some functions:

  • single precision sin, cos and sincos
  • single and double precision exp, exp2 and log, log2 and pow

The older code, which was originally written at Sun Microsystems (most of this code carries a 1993 copyright), is quite careful to perform single precision functions using only single precision intermediate values.

The newer code, which carries a 2018 copyright from Arm Ltd, uses double precision intermediate values for single precision functions.

I haven't evaluated the accuracy of either algorithm, but the newer code claims to be faster on machines which support double in hardware.

However, for machines with no hardware double support, especially for machines with hardware single precision support, I'm betting the code which avoids double will be faster. Not to mention all of the extra space in ROM that would be used by a soft double implementation.

I had switched the library to always use the newer code while messing about with some really stale math code last month, not realizing exactly what this flag was doing. I got a comment on that patch from github user 'innerand' which made me realize my mistake.

I've switched the default back to using the old math code on platforms that don't have hardware double support, and using the new math code on platforms that do. I also added a new build option, -Dnewlib-obsolete-math, which can be set to auto, true, or false. auto mode is the default, which selects as above.

Float vs Double error handling

Part of the integration of the Arm math code changed how newlib/picolibc handles math errors. The new method calls functions to set errno and return a specific value back to the application, like __math_uflow, which calls __math_xflow which calls __math_with_errno. All of these versions take double parameters and return double results. Some of them do minor arithmetic on these parameters. There are also float versions of these handlers, which are for use in float operations.

One float function, the __OBSOLETE_MATH version of log1pf, was mistakenly using the double error handlers, __math_divzero and __math_invalid. Just that one bug pulled in most of the soft double precision implementation. I fixed that in picolibc and sent a patch upstream to newlib.

Float printf vs C ABI

The C ABI specifies that float parameters to varargs functions are always promoted to doubles. That means that printf never gets floats, only doubles. Programs using printf will end up using doubles, even if there are no double values anywhere in the code.

There's no easy way around this issue — it's hard-wired in the C ABI. Smaller processors, like the 8-bit AVR, “solve” this by simply using the same 32-bit representation for both double and float. On RISC-V and ARM processors, that's not a viable solution as they have a well defined 64-bit double type, and both GCC and picolibc need to support that for applications requiring the wider precision.

I came up with a kludge which seems to work. Instead of passing a float parameter to printf, you can pass a uint32_t containing the same bits, which printf can unpack back into a float. Of course, both the caller and callee will need to agree on this convention.
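
The core of the trick is just moving the float's bit pattern into a uint32_t without converting the value. Here's a minimal sketch of that idea (the helper name is made up for illustration; it isn't the picolibc implementation):

#include <stdint.h>
#include <string.h>

/* Return the bits of 'f' packed into a uint32_t so they survive
 * varargs without being promoted to double; the callee does the
 * reverse to recover the float value. */
static uint32_t
float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}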

This uses the same mechanism as the printf/scanf functions offered without floating point support: when PICOLIBC_FLOAT_PRINTF_SCANF is #defined before including stdio.h, the printf functions are all redefined to reference versions with this magic kludge enabled, and the scanf functions are redefined to refer to ones with the 'double' code disabled.

A new macro, printf_float(x) can be used to pass floats to any of the printf functions. This also works in the normal version of the code, so you can use it even if you might be calling one of the regular printf functions.

Here's an example:

#define PICOLIBC_FLOAT_PRINTF_SCANF
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    printf("pi is %g\n", printf_float(3.141592f));
}

Results

Just switching to float-only printf removes the following soft double routines:

  • __adddf3
  • __aeabi_cdcmpeq
  • __aeabi_cdcmple
  • __aeabi_cdrcmple
  • __aeabi_d2uiz
  • __aeabi_d2ulz
  • __aeabi_dadd
  • __aeabi_dcmpeq
  • __aeabi_dcmpge
  • __aeabi_dcmpgt
  • __aeabi_dcmple
  • __aeabi_dcmplt
  • __aeabi_dcmpun
  • __aeabi_ddiv
  • __aeabi_dmul
  • __aeabi_drsub
  • __aeabi_dsub
  • __aeabi_f2d
  • __aeabi_i2d
  • __aeabi_l2d
  • __aeabi_ui2d
  • __aeabi_ul2d
  • __cmpdf2
  • __divdf3
  • __eqdf2
  • __extendsfdf2
  • __fixunsdfdi
  • __fixunsdfsi
  • __floatdidf
  • __floatsidf
  • __floatundidf
  • __floatunsidf
  • __gedf2
  • __gtdf2
  • __ledf2
  • __ltdf2
  • __muldf3
  • __nedf2
  • __subdf3
  • __unorddf2

The program shrank by 2672 bytes:

$ size double.elf float.elf
   text    data     bss     dec     hex filename
  48568     116   37952   86636   1526c double.elf
  45896     116   37952   83964   147fc float.elf
Posted Sat Nov 30 18:31:43 2019

Picolibc Version 1.1

Picolibc development is settling down at last. With the addition of a simple 'hello world' demo app, it seems like a good time to stamp the current code as 'version 1.1'.

Changes since Version 1.0

  • Semihosting helper library. Semihosting lets an application running under a debugger or emulator communicate through the debugger or emulator with the environment hosting those. It's great for platform bringup before you've got clocking and a serial driver. I'm hoping it will also make running tests under qemu possible. The code works on ARM and RISC-V systems and offers console I/O and exit() support (under qemu).

  • Hello World example. This is a stand-alone bit of code with a Makefile that demonstrates how to build a complete application for both RISC-V and ARM embedded systems using picolibc after it has been installed. The executables run under QEMU using a provided script. Here's all the source code you need; the rest of the code (including semihosting support) is provided by picolibc:

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        printf("hello, world\n");
        exit(0);
    }

  • POSIX file I/O support. For systems which have open/close/read/write, picolibc's tinystdio can now provide stdio functions that use them, including fopen and fdopen.

  • Updated code from newlib. I've merged current upstream newlib into the tree. There were a few useful changes there, including libm stubs for fenv on hosts that don't provide their own.

Where To Get Bits

You can find picolibc on my personal server's git repository:

https://keithp.com/cgit/picolibc.git/

There's also a copy on github:

https://github.com/keith-packard/picolibc

If you like tarballs, I also create those:

https://keithp.com/picolibc/dist/

I've created tags for 1.1 (upstream) and 1.1-1 (debian packaging included) and pushed those to the git repositories.

Filing Issues, Making Contributions

There's a mailing list at keithp.com:

https://keithp.com/mailman/listinfo/picolibc

Or you can file issues using the github tracker.

Posted Thu Nov 14 22:39:04 2019 Tags:

Picolibc Hello World Example

It's hard to get started building applications for embedded RISC-V and ARM systems. You need to at least:

  1. Find and install the toolchain

  2. Install a C library

  3. Configure the compiler for the right processor

  4. Configure the compiler to select the right headers and libraries

  5. Figure out the memory map for the target device

  6. Configure the linker to place objects in the right addresses

I've added a simple 'hello-world' example to picolibc that shows how to build something that runs under qemu so that people can test the toolchain and C library and see what values will be needed from their hardware design.

The Source Code

Getting text output from the application is a huge step in embedded system development. This example uses the “semihosting” support built into picolibc to simplify that process. It also explicitly calls exit so that qemu will stop when the demo has finished.

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    printf("hello, world\n");
    exit(0);
}

The Command Line

The hello-world documentation takes the user through the steps of building the compiler command line, first using the picolibc.specs file to specify header and library paths:

gcc --specs=picolibc.specs

Next adding the semihosting library with the --semihost option (this is an option defined in picolibc.specs which places -lsemihost after -lc):

gcc --specs=picolibc.specs --semihost

Now we specify the target processor (switching to the target compiler here as these options are target-specific):

riscv64-unknown-elf-gcc --specs=picolibc.specs --semihost -march=rv32imac -mabi=ilp32

or

arm-none-eabi-gcc --specs=picolibc.specs --semihost -mcpu=cortex-m3

The next step specifies the memory layout for our emulated hardware, either the 'spike' emulation for RISC-V:

riscv64-unknown-elf-gcc --specs=picolibc.specs --semihost -march=rv32imac -mabi=ilp32 -Thello-world-riscv.ld

with hello-world-riscv.ld containing:

__flash = 0x80000000;
__flash_size = 0x00080000;
__ram = 0x80080000;
__ram_size = 0x40000;
__stack_size = 1k;
INCLUDE picolibc.ld

or the mps2-an385 for ARM:

arm-none-eabi-gcc --specs=picolibc.specs --semihost -mcpu=cortex-m3 -Thello-world-arm.ld

with hello-world-arm.ld containing:

__flash =      0x00000000;
__flash_size = 0x00004000;
__ram =        0x20000000;
__ram_size   = 0x00010000;
__stack_size = 1k;
INCLUDE picolibc.ld

Finally, we add the source file name and target elf output:

riscv64-unknown-elf-gcc --specs=picolibc.specs --semihost -march=rv32imac -mabi=ilp32 -Thello-world-riscv.ld -o hello-world-riscv.elf hello-world.c

arm-none-eabi-gcc --specs=picolibc.specs --semihost -mcpu=cortex-m3 -Thello-world-arm.ld -o hello-world-arm.elf hello-world.c

Summary

Picolibc tries to make things a bit simpler by offering a built-in specs file and linker script along with default startup code to make building your first embedded application easier.

Posted Sun Nov 10 14:54:35 2019

Picolibc Updates (October 2019)

Picolibc is in pretty good shape, but I've been working on a few updates which I thought I'd share this evening.

Dummy stdio thunk

Tiny stdio in picolibc uses a global variable, __iob, to hold pointers to FILE structs for stdin, stdout, and stderr. For this to point at actual usable functions, applications normally need to create and initialize this themselves.

If all you want to do is make sure the tool chain can compile and link a simple program (as is often required for build configuration tools like autotools), then having a simple 'hello world' program actually build successfully can be really useful.

I added the 'dummyiob.c' module to picolibc, which has an iob variable initialized with suitable functions. If your application doesn't define its own iob, you'll get this one instead.

$ cat hello.c
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
}
$ riscv64-unknown-elf-gcc -specs=picolibc.specs hello.c
$ riscv64-unknown-elf-size a.out
   text    data     bss     dec     hex filename
    496      32       0     528     210 a.out

POSIX thunks

When building picolibc on Linux for testing, it's useful to be able to use glibc syscalls for input and output. If you configure picolibc with -Dposix-io=true, then tinystdio will use POSIX functions for reading and writing, and will also offer fopen and fdopen functions.

To make calling glibc syscall APIs work, I had to kludge the stat structure and fcntl bits. I'm not really happy about this, but it's really only for testing picolibc on a Linux host, so I'm not going to worry too much about it.

Remove 'mathfp' code

The newlib configuration docs aren't exactly clear about what the newlib/libm/mathfp directory contains, but if you look at newlib faq entry 10 it turns out this code was essentially a failed experiment in doing a 'more efficient' math library.

I think it's better to leave 'mathfp' in git history and not have it confusing us in the source repository, so I've removed it along with the -Dhw-fp option.

Other contributions

I've gotten quite a few patches from other people now, which is probably the most satisfying feedback of all.

  • powerpc build patches
  • stdio fixes
  • cleanup licensing, removing stale autotools bits
  • header file cleanups from newlib which got missed

Semihosting support

RISC-V and ARM both define a 'semihosting' API, which provides APIs to access the host system from within an embedded application. This is useful in a number of environments:

  • GDB through OpenOCD and JTAG to an embedded device
  • Qemu running bare-metal applications
  • Virtual machines running anything up to and including Linux

I really want to do continuous integration testing for picolibc on as many target architectures as possible, but it's impractical to try and run that on actual embedded hardware. Qemu seems like the right plan, but I need a simple mechanism to get error messages and exit status values out from the application.

Semihosting offers all of the necessary functionality to run tests without requiring an emulated serial port in Qemu and a serial port driver in the application.

For now, that's all the functionality I've added; console I/O (via a definition of _iob) and exit(2). If there's interest in adding more semihosting API calls, including file I/O, let me know.

I wanted to make semihosting optional, so that applications wouldn't get surprising results when linking with picolibc. This meant placing the code in a separate library, libsemihost. To get this linked correctly, I had to do a bit of magic in the picolibc.specs file. This means that selecting semihost mode is now done with a gcc option, -semihost, instead of just adding -lsemihost to the linker line.

Semihosting support for RISC-V is already upstream in OpenOCD. I spent a couple of hours last night adapting the ARM semihosting support in Qemu for RISC-V and have pushed that to my riscv-semihost branch in my qemu project on github.

A real semi-hosted 'hello world'

I've been trying to make using picolibc as easy as possible. Learning how to build embedded applications is hard, and reducing some of the weird tool chain fussing might make it easier. These pieces work together to simplify things:

  • Built-in crt0.o
  • picolibc.specs
  • picolibc.ld
  • semihost mode

Here's a sample hello-world.c:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("hello, world\n");
    exit(0);
}

On Linux, compiling is easy:

$ cc hello-world.c 
$ ./a.out 
hello, world
$

Here's how close we are to that with picolibc:

$ riscv64-unknown-elf-gcc -march=rv32imac -mabi=ilp32 --specs=picolibc.specs -semihost -Wl,-Tqemu-riscv.ld hello-world.c
$ qemu-system-riscv32 -semihosting -machine spike -cpu rv32imacu-nommu -kernel a.out -nographic
hello, world
$

This requires a pile of options to specify the machine that qemu emulates, both when compiling the program and again when running it. It also requires one extra file to define the memory layout of the target processor, 'qemu-riscv.ld':

__flash = 0x80000000;
__flash_size = 0x00080000;
__ram = 0x80080000;
__ram_size = 0x40000;
__stack_size = 1k;

These are all magic numbers that come from the definition of the 'spike' machine in qemu, which defines 16MB of RAM starting at 0x80000000 that I split into a chunk for read-only data and another chunk for read-write data. I found that definition by looking in the source; presumably there are easier ways?

Larger Examples

I've also got snek running on qemu for both ARM and RISC-V processors; that exercises a lot more of the library. Beyond this, I'm working on freedom-metal and freedom-e-sdk support for picolibc and hope to improve the experience of building embedded RISC-V applications.

Future Plans

I want to get qemu-based testing working on both RISC-V and ARM targets. Once that's running, I want to see the number of test failures reduced to a more reasonable level and then I can feel comfortable releasing version 1.1. Help on these tasks would be greatly appreciated.

Posted Mon Oct 21 22:34:05 2019

Picolibc Version 1.0 Released

I wrote a couple of years ago about the troubles I had finding a good libc for embedded systems, and for the last year or so I've been using something I called 'newlib-nano', which was newlib with the stdio from avrlibc bolted on. That library has worked pretty well, and required very little work to ship.

Now that I'm doing RISC-V stuff full-time, and am currently working to improve the development environment on deeply embedded devices, I decided to take another look at libc and see if a bit more work on newlib-nano would make it a good choice for wider usage.

One of the first changes was to switch away from the very confusing "newlib-nano" name. I picked "picolibc" as that seems reasonably distinct from other projects in the space and doesn't use 'new' or 'nano' in the name.

Major Changes

Let's start off with the big things I've changed from newlib:

  1. Replaced stdio. In place of the large and memory-intensive stdio stack found in newlib, picolibc's stdio is derived from avrlibc's code. The Atmel-specific assembly code has been replaced with C, and the printf code has seen significant rework to improve standards conformance. This work was originally done for newlib-nano, but it's a lot cleaner looking in picolibc.

  2. Switched from 'struct _reent' to TLS variables for per-thread values. This greatly simplifies the library and reduces memory usage for all applications -- per-thread data from unused portions of the library will not get allocated for any thread. On RISC-V, this also generates smaller and faster code. This also eliminates an extra level of function call for many code paths.

  3. Switched to the 'meson' build system. This makes building the library much faster and also improves the maintainability of the build system as it eliminates a maze of twisty autotools configure scripts.

  4. Updated the math test suite to use glibc as a reference instead of some ancient Sun machine.

  5. Manually verified the test results to see how the library is doing; getting automated testing working will take a lot more effort as many (many) tests still have invalid 'correct' values, resulting in thousands of failures.

  6. Remove unused code with non-BSD licenses. There's still a pile of unused code hanging around, but all non-BSD licensed bits have been removed to make the licensing situation clear. Picolibc is BSD licensed.

Picocrt

Starting your embedded application requires initializing RAM as appropriate and calling initializers/constructors before invoking main(). Picocrt is designed to do that part for you.
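
Conceptually, that start-up code copies initialized data from flash into RAM, clears the zero-initialized data, runs any constructors and then calls main(). Here's a rough sketch of the idea (the symbol names are illustrative, not necessarily the ones Picocrt and picolibc.ld actually use):

extern char __data_source[];    /* initialized data image in flash */
extern char __data_start[];     /* where .data lives in RAM */
extern char __data_end[];
extern char __bss_start[];
extern char __bss_end[];

extern int main(void);

void
_start(void)
{
    char *src, *dst;

    /* copy initialized data from flash to RAM */
    for (src = __data_source, dst = __data_start; dst < __data_end;)
        *dst++ = *src++;

    /* zero the bss segment */
    for (dst = __bss_start; dst < __bss_end;)
        *dst++ = 0;

    /* run constructors/initializers here, then enter the application */
    main();
}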

Building Simplified

Using newlib-nano meant specifying the include and library paths very carefully in your build environment, and then creating a full custom linker script. With Picolibc, things are much easier:

  • Compile with -specs=picolibc.specs. That and the specification of the target processor are enough to configure include and library paths. The Debian package installs this in the gcc directory so you don't need to provide a full path to the file.

  • Link with picolibc.ld (which is used by default with picolibc.specs). This will set up memory regions and include Picocrt to initialize memory before your application runs.

Debian Packages

I've uploaded Debian packages for this version; they'll get stuck in the new queue for a while, but should eventually make their way into the repository. I'll plan on removing newlib-nano at some point in the future as I don't plan on maintaining both.

More information

You can find the source code on both my own server and over on github:

https://keithp.com/cgit/picolibc.git/
https://github.com/keith-packard/picolibc

You'll find some docs and other information linked off the README file.

Posted Mon Sep 23 23:18:12 2019

Snekboard v0.2 Update

I've built six prototypes of snekboard version 0.2. They're working great and I'm happy with the design.

New Motor Driver

Having discovered that the TI DRV8838 wasn't up to driving the LEGO Power Functions Medium motor (8883) because of its start-up current draw, I went back and reworked the snekboard circuit to use the TI DRV8800 instead. That controller can provide up to 2.8A and doesn't have any trouble with this motor.

The DRV8800 is larger than the DRV8838, so it took a bit of re-wiring to fit them on the circuit board.

New Power Source Selector

In version 0.1, I was using two DFLS130L Schottky diodes to automatically select between the on-board lithium polymer battery and USB to power the board. That "worked", except that there was enough leakage back through them that when the USB connector was unplugged, the battery charge indicator LEDs both lit up, which left me with the choice of disabling those indicators or draining the battery.

To fix that, I found an automatic power selector (with current limit!) part, the TPS2121. This should avoid frying the board when you short the motor controller outputs, although those also have current limiting circuits. Defense in depth!

One issue I found was that this circuit draws current even when the output is disconnected, so I changed the power switch from an SPST to a DPST and now control USB and battery power separately.

CircuitPython

I included a W25Q16 2MB NOR flash chip on the board so that it could also run CircuitPython. Before finalizing the design, I thought it might be a good idea to actually get that running.

I've submitted a pull request with the necessary changes. I hope to see that merged at some point, which will allow users to select between CircuitPython and snek.

Smoothing Speed Changes

While the 9V supply on snekboard is designed to supply plenty of current for the motors, if you ask it to suddenly change how much it is producing, it places a huge load on the battery. When this happens, the battery voltage drops below the brown-out value for the SoC and the board resets.

I experimented with how to resolve this by ramping the power up and down in the snek application. That worked great; the motors could easily switch from full speed in one direction to full speed in the other direction.

Instead of having users add code to every snek application, I decided to move this functionality down into the snek implementation. I did this by modifying the PWM and direction pin values in a function called from the timer interrupt. This lets the application continue to run at full speed, while the motor controller slowly adjusts its output. No more resets when switching from full forward to full reverse.
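
Here's a rough sketch of the idea (the names are made up for illustration; this isn't the actual snek source). The application stores the requested speed immediately, and the timer interrupt walks the output toward it a little at a time, so the power supply never sees an abrupt change in load:

#include <stdint.h>

#define MOTOR_RAMP_STEP 4               /* PWM counts per timer tick */

static volatile int16_t motor_target;   /* set by the application; sign is direction */
static volatile int16_t motor_current;  /* what the motor controller currently sees */

void
motor_timer_tick(void)
{
    int16_t cur = motor_current;
    int16_t want = motor_target;

    if (cur < want) {
        cur += MOTOR_RAMP_STEP;
        if (cur > want)
            cur = want;
    } else if (cur > want) {
        cur -= MOTOR_RAMP_STEP;
        if (cur < want)
            cur = want;
    }
    motor_current = cur;

    /* write |cur| to the PWM duty register and its sign to the
     * direction pin here (hardware-specific, omitted) */
}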

Future Plans

I've got the six v0.2 prototypes that I'll be able to use for the upcoming class year, but I'm unsure whether there would be enough interest in the broader community to have more of them made. Let me know if you'd be interested in purchasing snekboards; if I get enough responses, I'll look at running them through Crowd Supply or similar.

Posted Sun Jul 28 13:20:36 2019
