[Openmcl-devel] trunk unstable. Linux ARM port identified as culprit.

Fri Aug 6 10:07:51 PDT 2010

On Thu, 5 Aug 2010, Brian Mastenbrook wrote:

> On 8/5/2010 5:37 PM, Gary Byers wrote:
>> 
>> <http://en.wikipedia.org/wiki/ARM_architecture#ARM_cores>
>> 
>> offers a helpful chart which shows which ARM cores implement which
>> architecture revisions. The Omap2 implements ARMv6, but not ARMv6T2;
>> if it's an "unhandled exception 4", that may be an unimplemented
>> instruction and the instruction may be a "movw" (which is both legal
>> and very useful on ARMv6T2 and later.)
>> 
>> If it's any consolation, I have an N810, too.
>> 
>> Until everyone who's interested memorizes that chart, this will likely
>> be very confusing.
>> 
>> (That WikiPedia page says that ~2,450,000,000 ARM cores were shipped in
>> 2006; it neglects to mention that no two of them were alike.)
>
> Is ARMv6T2 a sensible minimum?

It'd be really great if it was possible to both fully exploit newer,
higher-end ARM hardware (say, the Cortex A8, A9, and their successors)
and simultaneously support things like the SheevaPlug and the OpenRD
boxes.  (I have two OpenRD boxes; they're nice little machines.  All
other things being equal, it'd be great if CCL ran on them.  I don't
think that things are equal, and after thinking about it fairly hard and
for a fairly long time, I'm pretty much convinced that the most sensible
minimum is nearer the higher end of the spectrum.)

The OpenRD and the SheevaPlug (and most other ARMv5 machines) don't
have the Vector Floating Point (VFP) unit that's present on later
machines.  (They either have some proprietary Marvell FPU or are
FPU-less).  You -might- be able to write a VFP emulator and run the
same code on an ARMv5 and on a more modern machine, and you might live
to tell about that if that code didn't do non-incidental FP
arithmetic. I haven't tried, and don't know whether emulating VFP
instructions on VFP-less hardware would be at all practical.

Not using VFP instructions for floating-point - because the SheevaPlug
is Really Neat and cheap and there are probably lots of them out there -
is another way of solving this problem.  (The technical term for this
approach is "A Really Stupid Idea.")

Older (pre-~v6T2) ARMs don't offer reliable ways of doing atomic
memory operations.  (The SWP instruction - which was intended to
provide this functionality - isn't interrupt-safe and is now
deprecated.)  Newer variants offer reliable ways of doing things like
"store conditional" via LDREX/STREX/CLREX; I'm not sure that it'd be
practical to emulate these instructions on hardware that doesn't offer
them.  Linux provides a sort of low-rent system call that implements a
certain flavor of store-conditional based on what the hardware
supports (and the interrupt handler special-cases the case of a SWP
instruction getting interrupted.)  The low-rent syscall is more
expensive than direct use of LDREX/STREX when these instructions are
available; in CCL, the Linux-specific emulation would probably need to
be treated like a foreign function call.  I tried using it in an
earlier version of the port and it adds complexity and overhead to
parts of the system that already have plenty of both.  (Again, it -might-
be possible/practical to emulate LDREX/STREX/CLREX on systems that don't
provide these instructions, and that might even be viable.  I haven't
tried, but I'd much rather try to support older machines via emulation
than do so by limiting the performance of newer/higher-end machines.)
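
To make the load-exclusive/store-conditional pattern concrete, here is a
toy model in Python.  It is only a sketch of the semantics, not CCL code
and not how any kernel or CPU implements them: the class ExclusiveMonitor,
the function atomic_add, and the "interrupt" hook are all hypothetical
names invented for this illustration, and the model tracks a single word
with a single reservation flag.

```python
class ExclusiveMonitor:
    """Toy one-word model of an exclusive monitor (hypothetical, for illustration)."""

    def __init__(self, value=0):
        self.value = value
        self.reserved = False  # set by load_exclusive, cleared by a store or a clear

    def load_exclusive(self):
        # Models LDREX: read the word and take out a reservation on it.
        self.reserved = True
        return self.value

    def store_exclusive(self, new_value):
        # Models STREX: store only if the reservation is still held.
        # Returns 0 on success and 1 on failure, like STREX's status result.
        if not self.reserved:
            return 1
        self.value = new_value
        self.reserved = False
        return 0

    def clear_exclusive(self):
        # Models CLREX: drop any outstanding reservation.
        self.reserved = False


def atomic_add(cell, delta, interrupt=None):
    """Add delta to the cell with a retry loop; return the value seen before the add.

    `interrupt` is an optional callable run once between the load and the
    store, standing in for an interrupt or context switch (an assumption of
    this sketch).
    """
    while True:
        old = cell.load_exclusive()
        if interrupt is not None:
            pending, interrupt = interrupt, None
            pending(cell)
        if cell.store_exclusive(old + delta) == 0:
            return old
        # Reservation was lost: reload and try again.
```

If the "interrupt" itself updates the cell, the outer store-exclusive
fails and the loop retries against the new value, so neither update is
lost.  That retry-on-lost-reservation behavior is what SWP cannot give
you.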

For those who care and don't know: the "movw" and "movt" instructions
move 16-bit constant values into registers.  movw sets the low 16 bits
and zeroes the upper 16 bits; movt sets the upper 16 bits and leaves
the lower 16 bits unchanged.  If these instructions are available, it
takes at most 2 instructions to get an arbitrary 32-bit value into a
register; if they aren't, it can take as many as 4 instructions and is
often implemented by loading the value from PC-relative memory.
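
The instruction counts above can be checked with a small Python sketch.
It models only the register effects of movw/movt and a naive byte-at-a-time
fallback (one MOV followed by ORRs, each carrying a single 8-bit
immediate).  The function names are hypothetical and made up for this
illustration; a real assembler has more tricks available (MVN, other
rotations) and will sometimes beat the naive fallback.

```python
MASK32 = 0xFFFFFFFF


def movw(imm16):
    # movw: low 16 bits come from the immediate, upper 16 bits are zeroed.
    assert 0 <= imm16 <= 0xFFFF
    return imm16


def movt(reg, imm16):
    # movt: upper 16 bits come from the immediate, lower 16 bits are kept.
    assert 0 <= imm16 <= 0xFFFF
    return ((imm16 << 16) | (reg & 0xFFFF)) & MASK32


def load_with_movw_movt(value):
    """Return (register contents, instruction count) using movw and, if needed, movt."""
    value &= MASK32
    reg = movw(value & 0xFFFF)
    count = 1
    high = value >> 16
    if high:
        reg = movt(reg, high)
        count += 1
    return reg, count


def load_bytewise(value):
    """Naive fallback: build the value one byte-aligned 8-bit chunk at a time."""
    value &= MASK32
    chunks = [value & (0xFF << shift) for shift in (0, 8, 16, 24)]
    chunks = [c for c in chunks if c] or [0]  # a plain "mov r, #0" still costs one
    reg = 0
    for c in chunks:
        reg |= c  # the first chunk stands for MOV, the rest for ORR
    return reg, len(chunks)
```

For a constant such as 0x12345678, load_with_movw_movt needs 2 modeled
instructions and load_bytewise needs 4; for small constants both need
only 1.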

It's "better" to use movw/movt than the alternative, but I'd agree
that it's not so compellingly better that machines that don't support
16-bit MOV instructions should be rejected on that basis alone.  The
point at which movw/movt are supported is somewhere around the point
where ldrex/strex/clrex are supported and on the right side of the
point where VFP hardware is present.  If it's possible/practical to
work around the absence of those other features (perhaps by emulating
them), then yes, it's probably more sensible to not insist on
movw/movt as minimum functionality and run on a slightly wider range
of hardware.  (At the very least, that's a defensible position and
there are sensible arguments on both sides of that issue.)  At the
point where CCL started depending on the presence of movw/movt - about
2 weeks ago - the benefits of doing so (saving a few cycles here and
there, not having to support PC-relative data in the assembler and
compiler backend) seemed to outweigh the cost (further abandoning all
of those older ARM cores that don't support those instructions.)

> I have at least two devices (a N800 and a HTC 
> Dream / G1) with an ARMv6. I believe the ARMv6KZ in my Palm Pixi Plus also 
> lacks the movw/movt instructions.

I'm fairly sure that the ARMv6KZ (which is the same class of machine as was
used in early iPod Touches and iPhones) supports movw/movt; the docs that
I'm looking at are less clear as to whether clrex is supported.  I think
that it's desirable to use clrex if it's available and necessary.

> The Marvell processor in the SheevaPlug 
> devices is an ARMv5TE, which would rule it out as well. There are numerous 
> cheap Android tablets coming on the market with ARMv6KZ-or-below processors 
> as well.

There'll be bazillions of Cortex A8s and A9s, too.

GCC on Ubuntu 10.04 defaults to generating Thumb 2 code.  The majority
of the ARM boxes that I have won't run Thumb 2 code; for some reason,
Canonical didn't ask me what I thought about this and I assume that
they didn't ask you about this, either.  If I had to guess whether
"they just didn't think about it" or "thought long and hard about
continuing to support older machines and decided not to", I think that
I'd probably guess the latter.  (I don't know whether compiling CCL
to Thumb 2 would be better or not; I'm skeptical that code density
would improve much, but there might be other ways in which it'd be
better.  If it was clearly better ... well, good bye forever pre-T2
machines.)

Rumors about the system requirements for Android 3 are just that at
this point.  If there's any truth to them, I can probably give up on
the idea of running Android 3 on my N810.  I'd better tell Google
... unless they know and don't particularly care.

>
> -- 
> Brian Mastenbrook
> brian at mastenbrook.net
> http://brian.mastenbrook.net/