BBC BASIC for Windows

Setting a register to zero if it's < zero

David Williams
« Thread started on: Oct 3rd, 2011, 08:43am »

Unless I dreamt it, I'm sure Richard once showed a little 'trick' to test if a 32-bit register (EAX, EBX, etc.) is less than zero, and set it to zero (in a branchless way) if so.

Please remind me!

David Williams
« Reply #1 on: Oct 3rd, 2011, 09:41am »

Actually, an efficient bit of code for clamping a signed 32-bit integer to between 0 and 255 inclusive would be even more useful. I'm currently trawling the web for code snippets...

David Williams
« Reply #2 on: Oct 3rd, 2011, 09:46am »

Found this:

(QUOTE)
The generic way to handle arbitrary clamping values is something like this:

;; if ( x < MIN) x = MIN;
;; else if (x > MAX) x = MAX;

cmp eax,MIN ; Carry set if (eax < MIN)
sbb ebx,ebx ; EBX = -1 if underflow
cmp eax,MAX+1 ; Carry clear if (eax > MAX)
adc ebx,ebx ; Merge carry with previous result

At this point there are 4 possible combinations:

            second 0      second 1
first 0     OVFL          OK
first 1     impossible    UNDFL

These four correspond to EBX = 0, 1, -2, -1, which means that it is
possible to store the original value into a four-element table, then use
EBX to retrieve either the same value back or the clamped results:

cmp eax,MIN ; Carry set if (eax < MIN)
sbb ebx,ebx ; EBX = -1 if underflow
cmp eax,MAX+1 ; Carry clear if (eax > MAX)
adc ebx,ebx ; Merge carry with previous result
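mov clamp_table[12], eax ; in-range slot (EBX = 1) holds the original value
mov eax,clamp_table[ebx*4+8] ; slots at +0, +4, +8, +12: unused, MIN, MAX, original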

This code will only be faster than the naive approach if clamped values
are both common and non-predictable, if this isn't true then it is much
better to simply use the generic C-style if/else if/else approach!

Terje
(END QUOTE)

That'll probably do.
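
To make the four cases in the quoted routine easier to follow, here is a minimal Python model of the cmp/sbb/cmp/adc sequence and the table lookup. It is only a sketch: the function names are made up for illustration, and the slot layout (unused, MIN, MAX, original value) is an assumption inferred from the index expression ebx*4+8. Values are treated as unsigned 32-bit numbers, because the carry flag left by CMP reflects an unsigned comparison.

```python
MASK32 = 0xFFFFFFFF


def sbb_same(carry):
    """Model of 'sbb r,r': all ones (-1) if carry was set, otherwise 0."""
    return -1 if carry else 0


def clamp_via_table(x, lo, hi):
    """Model of cmp/sbb/cmp/adc followed by the four-slot table lookup."""
    x &= MASK32                  # view EAX as 32 raw (unsigned) bits
    carry1 = x < lo              # cmp eax,MIN   -> carry set on underflow
    ebx = sbb_same(carry1)       # sbb ebx,ebx   -> -1 or 0
    carry2 = x < hi + 1          # cmp eax,MAX+1 -> carry clear on overflow
    ebx = ebx + ebx + carry2     # adc ebx,ebx   -> -1, 0 or 1 (-2 cannot occur)
    # Assumed slots for EBX = -2, -1, 0, 1 (byte offsets 0, 4, 8, 12)
    table = [None, lo, hi, x]    # unused, MIN, MAX, original value
    return table[ebx + 2]        # clamp_table[ebx*4+8]


for value in (5, 100, 300):
    print(value, clamp_via_table(value, 16, 235))
```

One consequence of the unsigned comparison is that a negative signed input looks like a very large number and comes out as MAX rather than MIN, so a signed 0 to 255 clamp would need signed tests instead.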

« Last Edit: Oct 3rd, 2011, 09:53am by David Williams »

admin
« Reply #3 on: Oct 3rd, 2011, 10:37am »

on Oct 3rd, 2011, 09:46am, David Williams wrote:

That'll probably do.

But do note Terje's postscript: "This code will only be faster than the naive approach if clamped values are both common and non-predictable, if this isn't true then it is much better to simply use the generic C-style if/else if/else approach!". Specifically, if not many values have to be clamped (and in my experience that's the usual case) you'll probably be better off using conditional jumps.

Richard.

Michael Hutton
« Reply #4 on: Oct 8th, 2011, 9:38pm »

Could you use the CMOV instruction?

push max
push min
cmp eax, max
cmovg eax,[esp+4]
cmp eax, min
cmovl eax,[esp]
add esp,8

Michael

(You may have seen the original post... I didn't realise at first that CMOV doesn't actually do the test itself.. in a way I wish it did. ::))

Edit:
Also, I suppose you could do it all in registers if you have them spare:

mov edi, max
mov esi, min
cmp eax, edi
cmovg eax, edi
cmp eax, esi
cmovl eax, esi

another Edit:

Yes this works:

Code:

      INSTALL @lib$ + "ASMLIB"
      DIM C% 100, L%-1
      FOR pass=8 TO 10 STEP 2
        P% = C%
        ON ERROR LOCAL [OPT FN_asmext
        [
        OPT pass
        .clamp
        ;eax contains value to test
        push 5
        push 0
        cmp eax, 5
        cmovg eax, [esp+4]
        cmp eax, 0
        cmovl eax, [esp]
        add esp,8
        ret
        ]
      NEXT
      
      FOR A% = -5 TO 10
        PRINT A%, USR(clamp)
      NEXT
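
As a cross-check on what the test loop above should print, here is a small Python sketch of the same two compare-and-conditional-move steps. The function name is hypothetical, and Python's conditional expressions merely stand in for CMOVG and CMOVL (both of which test signed conditions); it says nothing about speed.

```python
def clamp_cmov(eax, lo, hi):
    """Signed clamp written as two compare / conditional-move steps."""
    eax = hi if eax > hi else eax    # cmp eax,max / cmovg eax,[esp+4]
    eax = lo if eax < lo else eax    # cmp eax,min / cmovl eax,[esp]
    return eax


# Same range as FOR A% = -5 TO 10, clamped to 0..5
for a in range(-5, 11):
    print(a, clamp_cmov(a, 0, 5))
```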

« Last Edit: Oct 8th, 2011, 10:44pm by Michael Hutton »

David Williams
« Reply #5 on: Oct 9th, 2011, 05:03am »

on Oct 8th, 2011, 9:38pm, Michael Hutton wrote:

push max
push min
cmp eax, max
cmovg eax,[esp+4]
cmp eax, min
cmovl eax,[esp]
add esp,8

Looks very elegant to my eyes. And totally branchless.

I think that with the help of your solution, my bitmap colour saturation modifying routine will be fast enough to render dozens of colour-saturation-(de-)enhanced sprites in realtime, offering yet another creative option for the growing er.. army of BB4W game-makers.

Thanks again, Michael.

Rgs,
David.

admin
« Reply #6 on: Oct 9th, 2011, 09:46am »

on Oct 9th, 2011, 05:03am, David Williams wrote:

Looks very elegant to my eyes. And totally branchless.

The downside, of course, is that (if we're talking about BB4W here) it means using the ASMLIB library, with all that implies in respect of disabling crunching - for example needing to put your source in a separate file which you assemble using CALL.

I assume your application is somewhat untypical, in that clipping is both common and unpredictable. In the more typical case (digital filtering is a classic example) clipping is the exception: the majority of values will be 'in range'. In that case the naive approach using conditional jumps will outperform the CMOV code on any modern CPU.

Incidentally, your original question was how to clip off negative values, i.e. clipping to zero. That's simpler than the general case of course, for example this is one branchless solution (if that's definitely what you want):

Code:

      cmp eax,&80000000
      sbb ebx,ebx
      and eax,ebx

Richard.
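
The three-instruction snippet above can be traced with a short Python model. This is just an illustrative sketch (the function name is invented): comparing against &80000000 sets the carry flag exactly when the sign bit is clear, SBB turns that carry into a mask of all ones or all zeros, and the AND applies the mask.

```python
MASK32 = 0xFFFFFFFF


def clip_negative_to_zero(eax):
    """Model of: cmp eax,&80000000 / sbb ebx,ebx / and eax,ebx."""
    eax &= MASK32                    # view the register as 32 raw bits
    carry = eax < 0x80000000         # carry set when the sign bit is clear
    ebx = MASK32 if carry else 0     # sbb ebx,ebx -> all ones or zero
    return eax & ebx                 # and eax,ebx


for value in (-5, -1, 0, 1, 200):
    print(value, clip_negative_to_zero(value))
```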

David Williams
« Reply #7 on: Oct 9th, 2011, 10:28am »

on Oct 9th, 2011, 09:46am, Richard Russell wrote:

Yes, I'm aware of this. I'll probably hand-assemble the instructions to get around the problem of not being able to crunch the program.

on Oct 9th, 2011, 09:46am, Richard Russell wrote:

I assume your application is somewhat untypical, in that clipping is both common and unpredictable. In the more typical case (digital filtering is a classic example) clipping is the exception: the majority of values will be 'in range'. In that case the naive approach using conditional jumps will outperform the CMOV code on any modern CPU.

I'll conduct some timed tests at some point, aware of course of the performance variations resulting from different CPU architectures.

on Oct 9th, 2011, 09:46am, Richard Russell wrote:

Incidentally, your original question was how to clip off negative values, i.e. clipping to zero. That's simpler than the general case of course, for example this is one branchless solution (if that's definitely what you want):

Code:

      cmp eax,&80000000
      sbb ebx,ebx
      and eax,ebx

Richard.

Thanks for this. I think I might set up another web page at bb4wgames.com -- a reference page of handy code snippets and gems just like that one; various x86 asm tips & tricks, etc.

David.

« Last Edit: Oct 9th, 2011, 10:28am by David Williams »

Michael Hutton
« Reply #8 on: Oct 9th, 2011, 10:31am »

Here are three links discussing the cmov instruction which seem not to be so impressed with it:

http://www.redhat.com/archives/rhl-devel-list/2009-February/msg00422.html
https://mail.mozilla.org/pipermail/tamarin-devel/2008-April/000455.html
http://ondioline.org/mail/cmov-a-bad-idea-on-out-of-order-cpus

So it's horses for courses as per usual.

Michael

Michael Hutton
« Reply #9 on: Oct 9th, 2011, 11:11am »

OK, for what it is worth, here is a not very accurate test, but it seems to favour the CMOV instruction with random numbers on a Core i5. You could test with less random numbers as well, I suppose, and the results will vary according to processor type.
Code:

         0:               INSTALL @lib$ + "ASMLIB"
         0:               DIM C% 100, L%-1
         0:               FOR pass=8 TO 10 STEP 2
         0:               P% = C%
         0:               ON ERROR LOCAL [OPT FN_asmext
         0:               [
         0:               OPT pass
         0:               .clampcmov
         0:               .M%
         0:               push 255
         0:               push 0
         0:               cmp eax, 255
         0:               cmovg eax, [esp+4]
         0:               cmp eax, 0
         0:               cmovl eax, [esp]
         0:               add esp,8
         0:               ret
         0:               
         0:               .clampbranch
         0:               .N%
         0:               cmp eax,5
         0:               jl testmin
         0:               mov eax,5
         0:               .testmin
         0:               cmp eax,0
         0:               jg end
         0:               xor eax, eax
         0:               .end
         0:               ret
         0:               
         0:               .clampcmov2
         0:               .O%
         0:               mov esi, 255
         0:               mov edi,0
         0:               cmp eax, 255
         0:               cmovg eax, esi
         0:               cmp eax, 0
         0:               cmovl eax, edi
         0:               ret
         0:               
         0:               
         0:               ]
         0:               NEXT
         0:               
        20:     0.08      FOR I%=1 TO 10000000
      1695:     7.13      A% = RND
      5503:    23.15      B% = USR(M%)
       628:     2.64      NEXT
         0:               
        41:     0.17      FOR I%=1 TO 10000000
      1637:     6.89      A% = RND
      5628:    23.67      B% = USR(N%)
       683:     2.87      NEXT
         0:               
        25:     0.11      FOR I%=1 TO 10000000
      1668:     7.02      A% = RND
      5552:    23.35      B% = USR(O%)
       693:     2.91      NEXT
         0:               
         0:               
         0:               END
         0:

:-/

Michael

David Williams
« Reply #10 on: Oct 9th, 2011, 12:01pm »

Interesting stuff, Michael. Thanks for links to the CMOV discussions, and the timing test.

Those discussions seem to suggest, overall, that the jury is still out?

Incidentally, when I do timing tests, I usually raise the process priority of my program to Higher (or Above Normal), because I think it produces more stable timings -- though certainly not to Highest.

For my specific application (fast colour saturation enhancement/reduction - basically blending a colour RGB pixel with its own greyscale intensity -- another routine for GFXLIB, of course), I don't think the branches are very predictable, so perhaps CMOV will win for me. It remains to be seen!

===

Now, sorry to chop & change, but...

I'd like to hijack my own thread and ask what you think is the fastest way of getting the separate R, G, B values from a 32-bit xRGB pixel (&xxRrGgBb) into three registers... say EAX, EBX, ECX.

I've found that this seems fast:

Code:

movzx ecx, BYTE [edi + 4*esi]      ; blue
movzx ebx, BYTE [edi + 4*esi + 1]  ; green
movzx eax, BYTE [edi + 4*esi + 2]  ; red

; EDI points to ARGB32 bitmap base address
; ESI is the pixel index

By the way, I found no discernible speed increase by loading (via LEA) the pixel address EDI+4*ESI into some register.

Why am I surprised that the above method is noticeably faster than this one below:

Code:

mov edx, [edi + 4*esi]   ; load 32-bit &xxRrGgBb pixel

mov eax, edx             ; copy EDX
mov ebx, edx             ; copy EDX (again)
mov ecx, edx             ; ...and again

and eax, &FF0000         ; EAX = &00 Rr 00 00 (red component * 2^16)
and ebx, &FF00           ; EBX = &00 00 Gg 00 (green component * 2^8)
and ecx, &FF             ; ECX = &00 00 00 Bb (blue component)
        
shr eax, 16              ; EAX (al) = &Rr (red byte)
shr ebx, 8               ; EBX (bl) = &Gg (green byte)
;                        ; ECX (cl) = &Bb (blue byte)

(My overly verbose ASM comments are for my own benefit - I don't intend to patronise! I nearly always verbosely comment my asm code.)

So, one single main memory access (in the latter case) versus three main memory accesses.
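
For reference, the two methods should give identical results, since a 32-bit pixel stored little-endian has blue at byte offset 0, green at offset 1 and red at offset 2. The Python sketch below checks that equivalence; the function names and the sample pixel value are made up for illustration, and it says nothing about relative speed.

```python
import struct


def rgb_by_byte_loads(pixel):
    """Model of the three MOVZX loads from offsets +2, +1 and +0."""
    mem = struct.pack("<I", pixel)    # little-endian: byte 0 is the low byte
    return mem[2], mem[1], mem[0]     # red, green, blue


def rgb_by_mask_and_shift(pixel):
    """Model of one 32-bit load followed by AND and SHR."""
    red = (pixel & 0xFF0000) >> 16
    green = (pixel & 0xFF00) >> 8
    blue = pixel & 0xFF
    return red, green, blue


pixel = 0x12A0B0C0                    # hypothetical &xxRrGgBb value
print(rgb_by_byte_loads(pixel), rgb_by_mask_and_shift(pixel))
```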

Would you say that the three EDX copying instructions are destroying parallel (U & V pipe) processing opportunities?

In any case, is there a faster way of getting those R, G, B values into EAX, EBX and ECX respectively?

Thanks in advance!

Rgs,
David.

PS. I notice that this forum system sometimes changes the word a_s_s_e_m_b_l_y into disagreembly. Bizarre.

« Last Edit: Oct 9th, 2011, 12:04pm by David Williams »

Michael Hutton
« Reply #11 on: Oct 9th, 2011, 12:35pm »

Quote:

In any case, is there a faster way of getting those R, G, B values into EAX, EBX and ECX respectively?

Good god! Why are you asking me? ???

I am an amoebic 'stimulus - respond' man when it comes to asm! Normally, I just play with what you've posted!

But out of interest, I was thinking about that very thing the other day when you sent me the 'Colourdrain' code.. and I am in Visual C++ now, typing out the procedure so that I can look at the asm output and see how it changes depending on the C++ code.

I too was surprised you used:
Code:

movzx ecx, BYTE [edi + 4*esi]      ; blue
movzx ebx, BYTE [edi + 4*esi + 1]  ; green
movzx eax, BYTE [edi + 4*esi + 2]  ; red

but came to the conclusion that it would be cached.

but as alternatives I played around initially with:
Code:

xor ebx,ebx
xor ecx,ecx
mov eax, [edi + 4 * esi]
mov bl, ah     ; green (edited from original)
mov cl, al     ; blue (edited from original)
shr eax,16
; and eax, &FF (if you're going to use eax rather than al)

and if you just want the byte values you could forget about the xor's and just use the al, bl and cl registers..

so I suppose the bare minimal would be:
Code:

mov eax, [edi + 4 * esi]
mov bl, ah ; (edited from original)
mov cl, al ; (edited from original)
shr eax,16

and you'll have the rgb values in al, bl, cl respectively. If you want them 'tidied' to 32 bit values then add

Code:

and eax,&FF
and ebx,&FF
and ecx,&FF

I hope I haven't overlooked something blindingly obvious as per usual.

Michael

« Last Edit: Oct 9th, 2011, 2:24pm by Michael Hutton »

David Williams
« Reply #12 on: Oct 9th, 2011, 1:33pm »

on Oct 9th, 2011, 12:35pm, Michael Hutton wrote:

Good god! Why are you asking me? ???

Yes, sorry if I put you on the spot a little. Actually, the question was meant to be addressed to you, Richard and anyone else who knows x86 assembly language.

on Oct 9th, 2011, 12:35pm, Michael Hutton wrote:

so I suppose the bare minimal would be:
Code:

mov eax, [edi + 4 * esi]
mov ebx, ah
mov ecx, al
shr eax,16

Sorry Michael, but the BB4W assembler didn't like MOVing an 8-bit register into a 32-bit one like that. It's hard to see why such an operation shouldn't be allowed, but then I'm not an Intel engineer!

Here is a timing test I did (excuse the obsession with data - and even instruction - alignment!):

Code:

      MODE 8 : OFF
      
      HIMEM = LOMEM + 2*&100000
      
      DIM gap1% 4096
      DIM bitmap% 4*(640*512 + 1)
      bitmap% = (bitmap% + 3) AND -4
      DIM  gap2% 4096
      
      REM. These 4 Kb gaps are probably way OTT, but just to be certain!
      
      PROC_asm
      
      REM. Fill bitmap with random values
      FOR I% = bitmap% TO (bitmap% + 4*640*512)-1 STEP 4
        !I% = RND
      NEXT
      
      G% = FNSYS_NameToAddress( "GetTickCount" )
      time0% = 0
      time1% = 0
      
      PRINT '" Conducting test #1, please wait..."'
      
      REM. Test #1 (three MOVZX instructions)
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        CALL A%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #1 (three MOVZX instructions) took ";
      PRINT ;(time1% - time0%)/1000; " s."'
      
      PRINT '" Conducting test #2, please wait..."'
      
      REM. Test #2 (one MOV instruction)
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        CALL B%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #2 (one MOV instruction) took ";
      PRINT ;(time1% - time0%)/1000; " s."'
      
      PRINT '" Conducting test #3, please wait..."'
      
      REM. Test #3 (one MOV instruction; other instructions re-ordered)
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        CALL C%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #3 (one MOV instruction; other instructions re-ordered) took ";
      PRINT ;(time1% - time0%)/1000; " s."'
      
      PRINT''" Finished."
      END
      :
      :
      :
      :
      DEF PROC_asm
      LOCAL I%, P%, code%, loop_A, loop_B, loop_C
      DIM code% 1000
      
      FOR I% = 0 TO 2 STEP 2
        P% = code%
        [OPT I%
        
        ] : P%=(P%+3) AND -4 : [OPT I%
        
        .A%
        mov edi, bitmap%
        xor esi, esi
        .loop_A
        movzx ecx, BYTE [edi + 4*esi]
        movzx ebx, BYTE [edi + 4*esi + 1]
        movzx eax, BYTE [edi + 4*esi + 2]
        add esi, 1
        cmp esi, (640 * 512)
        jl loop_A
        ret
        
        ] : P%=(P%+3) AND -4 : [OPT I%
        
        .B%
        mov edi, bitmap%
        xor esi, esi
        .loop_B
        mov edx, [edi + 4*esi]
        mov eax, edx
        mov ebx, edx
        mov ecx, edx
        and eax, &FF0000
        and ebx, &FF00
        and ecx, &FF
        shr eax, 16
        shr ebx, 8
        add esi, 1
        cmp esi, (640 * 512)
        jl loop_B
        ret
        
        ] : P%=(P%+3) AND -4 : [OPT I%
        
        .C%
        mov edi, bitmap%
        xor esi, esi
        .loop_C
        mov edx, [edi + 4*esi]
        
        mov eax, edx
        and eax, &FF0000
        shr eax, 16
        
        mov ebx, edx
        and ebx, &FF00
        shr ebx, 8
        
        mov ecx, edx
        and ecx, &FF
        
        add esi, 1
        cmp esi, (640 * 512)
        jl loop_C
        ret
        
        ]
        
      NEXT I%
      ENDPROC
      :
      :
      :
      :
      DEF FNSYS_NameToAddress( f$ )
      LOCAL P%
      DIM P% LOCAL 5
      [OPT 0 : call f$ : ]
      =P%!-4+P%

On my Intel Centrino Duo 1.whatever GHz laptop, I get the following results:

Test #1 (three MOVZX instructions): 7.84 seconds.

Test #2 (one MOV instruction): 10.92 seconds.

Test #3 (one MOV instruction; other instructions re-ordered): 10.97 seconds.

By "MOV instruction", I mean of course the main memory/cache pixel data loads.

I'd say that, with a difference of nearly 3 seconds, the three MOVZX method is significantly faster.

I still wonder if there's an even faster way of doing it.

Regards,

David.

Michael Hutton
« Reply #13 on: Oct 9th, 2011, 1:39pm »

Code:

mov eax, [edi + 4 * esi]
mov ebx, ah
mov ecx, al
shr eax,16

well, you must have known I meant: ;)

Code:

mov eax, [edi + 4 * esi]
mov bl, ah
mov cl, al
shr eax,16

I shouldn't edit code in the conforums!

Put that in the timings program... I would do myself but I have only got the demo version on this laptop (Yes, Richard, I can't remember what the Serial Number reg key is actually called and exactly where it is!)

Michael

David Williams
« Reply #14 on: Oct 9th, 2011, 2:03pm »

on Oct 9th, 2011, 1:39pm, Michael Hutton wrote:

well, you must have known I meant: ;)

I was just being dense. And having been up all night... etc.

on Oct 9th, 2011, 1:39pm, Michael Hutton wrote:

Code:

mov eax, [edi + 4 * esi]
mov bl, ah
mov cl, al
shr eax,16

Right, the updated timing test code with your solution(s):

Code:

      MODE 8 : OFF
      
      HIMEM = LOMEM + 2*&100000
      
      DIM gap1% 4096
      DIM bitmap% 4*(640*512 + 1)
      bitmap% = (bitmap% + 3) AND -4
      DIM  gap2% 4096
      
      REM. These 4 Kb gaps are probably way OTT, but just to be certain!
      
      PROC_asm
      
      REM. Fill bitmap with random values
      FOR I% = bitmap% TO (bitmap% + 4*640*512)-1 STEP 4
        !I% = RND
      NEXT
      
      G% = FNSYS_NameToAddress( "GetTickCount" )
      time0% = 0
      time1% = 0
      
      PRINT '" Conducting test #1, please wait..."'
      
      REM. Test #1 (three MOVZX instructions)
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        CALL A%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #1 (three MOVZX instructions) took ";
      PRINT ;(time1% - time0%)/1000; " s."'
      
      PRINT '" Conducting test #2, please wait..."'
      
      REM. Test #2 (one MOV instruction)
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        CALL B%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #2 (one MOV instruction) took ";
      PRINT ;(time1% - time0%)/1000; " s."'
      
      PRINT '" Conducting test #3, please wait..."'
      
      REM. Test #3 (one MOV instruction; other instructions re-ordered)
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        CALL C%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #3 (one MOV instruction; other instructions re-ordered) took ";
      PRINT ;(time1% - time0%)/1000; " s."'
      
      PRINT '" Conducting test #4, please wait..."'
      
      REM. Test #4 (one MOV instruction; Michael's solution (no XORs))
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        CALL D%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #4 (one MOV instruction; Michael's solution (no XORs)) took ";
      PRINT ;(time1% - time0%)/1000; " s."'
      
      PRINT '" Conducting test #5, please wait..."'
      
      REM. Test #5 (one MOV instruction; Michael's solution (with XORs))
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        CALL E%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #5 (one MOV instruction; Michael's solution (with XORs)) took ";
      PRINT ;(time1% - time0%)/1000; " s."'
      
      PRINT''" Finished."
      END
      :
      :
      :
      :
      DEF PROC_asm
      LOCAL I%, P%, code%, loop_A, loop_B, loop_C, loop_D, loop_E
      DIM code% 1000
      
      FOR I% = 0 TO 2 STEP 2
        P% = code%
        [OPT I%
        
        ] : P%=(P%+3) AND -4 : [OPT I%
        
        .A%
        mov edi, bitmap%
        xor esi, esi
        .loop_A
        movzx ecx, BYTE [edi + 4*esi]
        movzx ebx, BYTE [edi + 4*esi + 1]
        movzx eax, BYTE [edi + 4*esi + 2]
        add esi, 1
        cmp esi, (640 * 512)
        jl loop_A
        ret
        
        ] : P%=(P%+3) AND -4 : [OPT I%
        
        .B%
        mov edi, bitmap%
        xor esi, esi
        .loop_B
        mov edx, [edi + 4*esi]
        mov eax, edx
        mov ebx, edx
        mov ecx, edx
        and eax, &FF0000
        and ebx, &FF00
        and ecx, &FF
        shr eax, 16
        shr ebx, 8
        add esi, 1
        cmp esi, (640 * 512)
        jl loop_B
        ret
        
        ] : P%=(P%+3) AND -4 : [OPT I%
        
        .C%
        mov edi, bitmap%
        xor esi, esi
        .loop_C
        mov edx, [edi + 4*esi]
        
        mov eax, edx
        and eax, &FF0000
        shr eax, 16
        
        mov ebx, edx
        and ebx, &FF00
        shr ebx, 8
        
        mov ecx, edx
        and ecx, &FF
        
        add esi, 1
        cmp esi, (640 * 512)
        jl loop_C
        ret
        
        ] : P%=(P%+3) AND -4 : [OPT I%
        
        .D%
        mov edi, bitmap%
        xor esi, esi
        .loop_D
        
        mov eax, [edi + 4 * esi]
        mov bl, ah
        mov cl, al
        shr eax,16
        
        add esi, 1
        cmp esi, (640 * 512)
        jl loop_D
        ret
        
        ] : P%=(P%+3) AND -4 : [OPT I%
        
        .E%
        mov edi, bitmap%
        xor esi, esi
        .loop_E
        
        xor ebx, ebx
        xor ecx, ecx
        mov eax, [edi + 4 * esi]
        mov bl, ah
        mov cl, al
        shr eax,16
        
        add esi, 1
        cmp esi, (640 * 512)
        jl loop_E
        ret
        
        ]
        
      NEXT I%
      ENDPROC
      :
      :
      :
      :
      DEF FNSYS_NameToAddress( f$ )
      LOCAL P%
      DIM P% LOCAL 5
      [OPT 0 : call f$ : ]
      =P%!-4+P%

Results:

Test#1 (three MOVZX's) : 7.81 s
Test#2 (one MOV) : 10.89 s
Test#3 (one MOV; instructions re-ordered): 10.83 s
Test#4 (one MOV; Michael's solution (no XORs)): 6.02 s
Test#5 (one MOV; Michael's solution (with XORs)): 7.45 s
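
To put these totals on a per-pixel footing, the sketch below divides each time by the number of loop iterations (5000 calls, each covering 640 x 512 pixels, as in the test program). The dictionary labels are shorthand invented here, and the figures include loop overhead, so they are only a rough guide.

```python
CALLS = 5000
PIXELS_PER_CALL = 640 * 512


def ns_per_pixel(seconds):
    """Average time per inner-loop iteration, in nanoseconds."""
    return seconds * 1e9 / (CALLS * PIXELS_PER_CALL)


results = {
    "three MOVZX": 7.81,
    "one MOV": 10.89,
    "one MOV, re-ordered": 10.83,
    "Michael, no XORs": 6.02,
    "Michael, with XORs": 7.45,
}

for label, seconds in results.items():
    print(f"{label}: {ns_per_pixel(seconds):.2f} ns per pixel")
```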

Since I probably need (I can't remember off-hand) to have the 3 higher bytes of EAX, EBX and ECX all clear, your code with the XORs looks like the one to go with, so thanks for that.
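
A quick way to see which registers end up with clear upper bytes is to model the partial-register moves in Python. This is a hedged sketch with invented names, and it assumes the top (xx) byte of the pixel may be non-zero, as it is in the RND-filled test bitmap. Under that assumption the XORs tidy EBX and ECX, while EAX still carries the xx byte in bits 8 to 15 after the shift unless it is also masked with &FF.

```python
def extract_partial(pixel, clear_first=True, ebx=0xDEADBEEF, ecx=0xDEADBEEF):
    """Model of: [xor ebx,ebx / xor ecx,ecx] mov eax,[..] / mov bl,ah / mov cl,al / shr eax,16."""
    if clear_first:
        ebx = 0                          # xor ebx,ebx
        ecx = 0                          # xor ecx,ecx
    eax = pixel & 0xFFFFFFFF             # mov eax,[edi + 4*esi]
    ah = (eax >> 8) & 0xFF
    al = eax & 0xFF
    ebx = (ebx & 0xFFFFFF00) | ah        # mov bl,ah (upper 24 bits untouched)
    ecx = (ecx & 0xFFFFFF00) | al        # mov cl,al (upper 24 bits untouched)
    eax >>= 16                           # shr eax,16
    return eax, ebx, ecx


eax, ebx, ecx = extract_partial(0x12A0B0C0)   # hypothetical &xxRrGgBb value
print(hex(eax), hex(ebx), hex(ecx))
print(hex(eax & 0xFF))                        # and eax,&FF leaves only the red byte
```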

(Edit: Hey, I think this is another prime candidate for my proposed BB4W "Handy Assembler Code Snippets" (or whatever) web page!)

Rgs,
David.

« Last Edit: Oct 9th, 2011, 2:26pm by David Williams »