[Openmcl-devel] Re: [cl-ppcre-devel] cl-ppcre speedup

Wed Dec 15 10:22:45 PST 2004

On Wed, 15 Dec 2004, Alan Ruttenberg wrote:

>
> On Dec 15, 2004, at 3:41 AM, Gary Byers wrote:
>
> > I wonder if there's a clever way to inline CHAR-UPCASE/CHAR-DOWNCASE ?
> > (There are obviously straightforward ways to do this; it's sometimes
> > faster to do this via table lookup, but there may be ways to do this
> > without conditional branches or memory references.)
> >
>
> You never have to change more than one bit
>
> (loop for code from 0 below 256
>     for char = (code-char code)
>     for upcase-code = (char-code (char-upcase char))
>     always (if (/= upcase-code code) (= #b100000 (logxor code
> upcase-code)) t))
>
> T
>
> If the top nibble isn't 6 or 7 then you don't change the bit
>
> (remove-duplicates
>   (loop for code from 0 below 256
>      for char = (code-char code)
>      for upcase-code = (char-code (char-upcase char))
>      when (/= upcase-code code)
>      collect (/ (logand #xf0 code) 16)))
>
> (6 7)
>
> You need to do a lookup of the bit for 32 cases
>
> (loop for code from 0 below 256
>     for char = (code-char code)
>     for upcase-code = (char-code (char-upcase char))
>     when (and (/= upcase-code code) (= (logand #xf0 code) #60)) sum 1)
> 15
>
> (loop for code from 0 below 256
>     for char = (code-char code)
>     for upcase-code = (char-code (char-upcase char))
>     when (and (/= upcase-code code) (= (logand #xf0 code) #x70)) sum 1)
>
> 11
>
> I presume that the ifs aren't really conditionals but rather uses of a
> condition register value
>
> (defun char-upcase-by-bits (char)
>    (let ((code (char-code char)))
>      (let* ((nochange (if (= #x60 (logand code #xe0)) 1 0))
> 	   (offset (if (= 0 (logand #x10 code)) 0 1))
> 	   (testbit (if (logbitp (logior (ash offset 4) (logand #xf code))
> #b00000111111111111111111111111110) 1 0))
> 	   (flipper (ash (logand testbit nochange) 5)))
>        (code-char (logxor flipper code)))))
>
> seems to work:
>
> (loop for code from 0 below 256 for char = (code-char code)
>     always (eq (char-upcase-by-bits char) (char-upcase char)))
>
> T
>
> The question is, how few instructions can this be packed in to and
> whether it runs faster than the simpler memory lookup.
>
>

What I was thinking of (and eventually figured out) was something like:

;;; Assume that arg_z contains a character.  If it's uppercase, map it
;;; to lowercase.  Don't branch.

(lis imm0 (char-code #\A))
(ori imm0 imm0 target::subtag-character)	; imm0 <- #\A
(lis imm1 (char-code #\A))
(ori imm1 imm1 target::subtag-character)	; imm1 <- #\Z)
(subfc imm1 arg_z imm1)				; imm1 <- 1, if
(addze imm1 rzero)				;   (char<= arg_z #\Z),
						;    else 0
(subfc imm0 imm0 arg_z)				; imm0 <- 1, if
(addze imm0 rzero)				;   (char>= arg_z #\A),
						;    else 0
(and imm0 imm1 imm0)				; imm0 <- 1 if upcase,
						;    else 0
(slwi imm0 imm0 5)				; imm0 < 32 or 0
(or arg_z arg_z imm0)				; maybe make lowercase

You might be able to knock a few instructions out of that, and I wouldn't
want to guarantee that I don't have the operands backwards in one or
both "subfc" instructions.  The more straightforward:

(lis imm0 (char-code #\A))
(ori imm0 imm0 target::subtag-character)	; imm0 <- #\A
(lis imm1 (char-code #\A))
(ori imm1 imm1 target::subtag-character)	; imm1 <- #\Z)
(cmplw :cr0 arg_z imm0)
(cmplw :cr1 arg_z imm1)
(blt :cr0 no)
(bgt :cr1 no)
(ori arg_z arg_z 32)
no

is probably going to have worse performance in general (depending on
how often branches are mispredicted and how much that costs); that's
what I was trying to avoid.

(This is all just to address the question "if the compiler were to
open-code CHAR-UPCASE/CHAR-DOWNCASE, what kind of code should it
generate ?")