[Buildroot] [PATCH 00/28] Use the best FPU strategies on 32-bits Arm Cortex

Tue Jan 25 20:55:23 UTC 2022

Thomas, Giulio, All,

On 2022-01-22 15:32 +0100, Thomas Petazzoni spake thusly:
> On Tue, 18 Jan 2022 11:43:10 +0100
> Giulio Benetti <giulio.benetti at benettiengineering.com> wrote:
> 
> > this patchset aims to enable the best FPU strategy for every board with
> > 32-bits Arm Cortex actually present in Buildroot. I don't own these boards
> > so I can't test these changes. What is about Allwinner doesn't worry me
> > because I've tested a lot of cases with Olimex boards, but the other
> > changes are still to be tested.
> > 
> > So I ask to the board maintainers to test the patches that involve their
> > boards if possible. Anyway I've checked well all the SoCs Datasheet and I
> > think I've made it correctly, even if in some of them it's not specified
> > if VFPv4 means -D32. I assume it like that because otherwise -D16 is
> > usually specified.
> 
> Do we have a good understand of what -mfpu=neon-vfpv4 does? Your patch
> series basically converts many defconfigs to use this -mfpu value, but
> it's not clear to me how it works. What does it mean to combine NEON
> and VFPv4 instructions?

The gcc man page states that specifying Neon as part of the fpu setting
has no effect, unless the -funsafe-math-optimizations is also specified,
because Neon is not compliant zith IEEE 754:

    If the selected floating-point hardware includes the NEON extension
    (e.g. -mfpu=neon), note that floating-point operations are not
    generated by GCC's auto-vectorization pass unless
    -funsafe-math-optimizations is also specified. This is because NEON
    hardware does not fully implement the IEEE 754 standard for
    floating-point arithmetic (in particular denormal values are treated
    as zero), so the use of NEON instructions may lead to a loss of
    precision.

So it is my understanding that using Neon is not a good idea overall. It
should only be requested on-demand by people who know what they are
doing, and most probably, be setting appropriate CFLAGS on a per-package
basis.

Additionally, changing the default FPU setting on those defconfigs is
not really interesting. Indeed, the base system that (most of) those
defconfig build are probably not exercising the FPU setting much (the
busybox login and shell are probably not using much FPU insns).

So, for me, this series is a no-go, first because it has not been tsted
on actual hardware, second because some of the changes introduce a
dubious feature which is actually a no-op at best, or worse will
generate incorrect code.

Regards,
Yann E. MORIN.

> Regarding VFPv4 D16 vs. D32,
> https://developer.arm.com/documentation/den0018/a/Compiling-NEON-Instructions/GCC-command-line-options/Option-to-specify-the-FPU
> tells us:
> 
>   VFPv3 and VFPv4 implementations provide 32 double-precision
>   registers. However, when NEON unit is not present, the top sixteen
>   registers (D16-D31) become optional. This is shown by the -d16 in the
>   option name, which means that the top sixteen D registers are not
>   available.
> 
> So, my understanding is that when NEON is available, the VFPv4 is
> guaranteed to have the 32 double-precision registers (D32).
> 
> Thomas
> -- 
> Thomas Petazzoni, co-owner and CEO, Bootlin
> Embedded Linux and Kernel engineering and training
> https://bootlin.com

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 561 099 427 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'