Topic: Setting a register to zero if it's < zero (Read 1413 times)
admin
Re: Setting a register to zero if it's < zero
« Reply #30 on: Oct 9th, 2011, 5:32pm »
on Oct 9th, 2011, 4:14pm, David Williams wrote: "Assuming that those FOR...NEXT loops still had 5000 iterations each (as in the original)"
Yes, the code was unchanged except that I modified it to align each routine on a 32-byte boundary (rather than 4). My CPU is a 2.8 GHz Pentium 4.
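For readers unfamiliar with the idiom, the 32-byte alignment in the listing is done with the expression (P% + 31) AND -32. A minimal C++ sketch of the same round-up arithmetic follows; the function name align_up is hypothetical and not part of the thread.

```cpp
#include <cassert>
#include <cstdint>

// Round addr up to the next multiple of alignment.
// Same arithmetic as the BASIC expression (P% + 31) AND -32,
// because -32 and ~31 have the same two's-complement bit pattern.
inline std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t alignment) {
    // The mask trick only works when alignment is a power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

An address that is already aligned is returned unchanged, which is why 31 is added rather than 32.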
Here is a comparison of MMX and GP register versions of your 'luminance matrix' code:
Code:
MODE 8 : OFF
HIMEM = LOMEM + 2*&100000
DIM gap1% 4096
DIM bitmap% 4*(640*512 + 2)
bitmap% = (bitmap% + 7) AND -8
DIM gap2% 4096
REM. These 4 Kb gaps are probably way OTT, but just to be certain!
PROC_asm
REM. Fill bitmap with random values
FOR I% = bitmap% TO (bitmap% + 4*640*512)-1 STEP 4
!I% = RND
NEXT
G% = FNSYS_NameToAddress( "GetTickCount" )
time0% = 0
time1% = 0
PRINT '" Conducting test #1, please wait..."'
REM. Test #1 (DW's code)
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &80
SYS G% TO time0%
FOR I% = 1 TO 5000
C%=USRA%
NEXT
SYS G% TO time1%
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &20
PRINT '" Test #1 (DW's code) took ";
PRINT ;(time1% - time0%)/1000; " s. (final result ";C% ")"'
PRINT '" Conducting test #2, please wait..."'
REM. Test #2 (RTR's code)
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &80
SYS G% TO time0%
FOR I% = 1 TO 5000
C%=USRB%
NEXT
SYS G% TO time1%
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &20
PRINT '" Test #2 (RTR's code) took ";
PRINT ;(time1% - time0%)/1000; " s. (final result ";C% ")"'
PRINT''" Finished."
END
;
DEF PROC_asm
LOCAL I%, P%, code%, loop_A, loop_B, loop_C
DIM code% 1000
FOR I% = 0 TO 2 STEP 2
P% = code%
[OPT I%
] : P%=(P%+31) AND -32 : [OPT I%
.A%
mov esi, bitmap%
.loop_A
movzx ecx, BYTE [esi] ; ECX (cl) = blue byte (b&)
movzx ebx, BYTE [esi + 1] ; EBX (bl) = green byte (g&)
movzx eax, BYTE [esi + 2] ; EAX (al) = red byte (r&)
xor ebp, ebp ; EBP = cumulative intensity (i) - initially zero
;REM. i += (0.114 * 2^20) * b&
mov edx, ecx
imul edx, (0.114 * 2^20)
add ebp, edx
;REM. i += (0.587 * 2^20) * g&
mov edx, ebx
imul edx, (0.587 * 2^20)
add ebp, edx
;REM. i += (0.299 * 2^20) * r&
mov edx, eax
imul edx, (0.299 * 2^20)
add ebp, edx
shr ebp, 20 ; EBP (i&) now in the range 0 to 255
add esi, 4
cmp esi, (bitmap% + 640 * 512)
jl loop_A
mov eax,ebp
ret
] : P%=(P%+31) AND -32 : [OPT I%
.matrix
dw 0.114 * 2^15
dw 0.587 * 2^15
dw 0.299 * 2^15
dw 0
.B%
mov esi, bitmap%
movq mm7, [matrix]
.loop_B
punpcklbw mm0,[esi]
psrlw mm0,8
pmaddwd mm0,mm7
movd ebp,mm0
pshufw mm0,mm0,%01001110
movd eax,mm0
add ebp,eax
shr ebp,15
add esi, 4
cmp esi, (bitmap% + 640 * 512)
jl loop_B
mov eax,ebp
ret
]
NEXT I%
ENDPROC
;
DEF FNSYS_NameToAddress( f$ )
LOCAL P%
DIM P% LOCAL 5
[OPT 0 : call f$ : ]
=P%!-4+P%

On my PCs the MMX version doesn't run much, if any, faster, but the potential gain comes from being able to use the other MMX registers to process (say) four pixels 'in parallel'.
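The two routines in the listing use different fixed-point scales for the same luminance weights: 2^20 in the GP register version and 2^15 in the MMX version, since pmaddwd multiplies signed 16-bit words and the weights therefore have to stay below 32768. Below is a rough C++ sketch of both. The function names are hypothetical, the 2^20 constants are the ones that appear in Michael's C++ routine later in the thread, and the truncation of the 2^15 constants is an assumption.

```cpp
#include <cassert>

// Weights scaled by 2^20, as in the GP register loop (imul by 0.114 * 2^20 etc.).
inline int luminance_q20(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    return (b * 119537 + g * 615514 + r * 313524) >> 20;
}

// Weights scaled by 2^15 so that each fits a signed 16-bit word for pmaddwd.
// Truncated values are assumed here: 3735, 19234, 9797.
inline int luminance_q15(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    return (b * 3735 + g * 19234 + r * 9797) >> 15;
}
```

With truncated weights the three factors sum to slightly less than the scale, so in this sketch pure white comes out as 254 rather than 255 at either precision.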
Richard.

Michael Hutton
« Reply #31 on: Oct 9th, 2011, 8:28pm »
Code:
.loop_B
punpcklbw mm0,[esi]
psrlw mm0,8
pmaddwd mm0,mm7
movd ebp,mm0
pshufw mm0,mm0,%01001110
movd eax,mm0
add ebp,eax
shr ebp,15
Very nice!
Michael

David Williams
« Reply #32 on: Oct 9th, 2011, 9:08pm »
on Oct 9th, 2011, 5:32pm, Richard Russell wrote: "Here is a comparison of MMX and GP register versions of your 'luminance matrix' code:"
Thanks for the code (and apologies for referring to the luminance as "the intensity"!).
The MMX version is slightly slower on my laptop:
GR version: 3.67 s. MMX version: 4.19 s
I know that when it comes to the 'blending' (that's my word for it!) part, MMX is likely to be faster because, IIRC, the MMX-powered alphablending code you kindly contributed to GFXLIB is about twice as fast as the non-MMX version.
Anyway, this has been an interesting discussion (it's certainly occupied much of my day). Things have been learnt, and I'm beginning to wonder if time is better spent not worrying too much about optimising code...because seemingly optimised code on one system is not so on another. (I did know this, but today's discussion has highlighted the fact).
Now I must get on with preparing version 2.03 of GFXLIB for release, along with the GFXLIB FAQ.
After Christmas, I may have another crack at producing a GFXLIB-Lite for the Trial version of BB4W, having taken on board the points/ideas raised by you and Michael (in private correspondence).
Rgs, David.

Michael Hutton
« Reply #33 on: Oct 9th, 2011, 9:15pm »
David,
Here is the assembler output for the following routine in C++. I'm not really sure it adds anything or that we should take anything from it. I don't even know if I have put in the right optimising flags etc.
As an aside, the output bitmap does suffer from some aliasing with this routine.
It might also be heavily dependent on how I access the bitmap: pData is defined as DWORD*, whereas now that we know three movzx loads are better, using unsigned char* could be better.
Code:
void SetRGBs(DWORD* pData, int num, int f)
{
// Some set up
unsigned int r,g,b,i,rgb;
// Clamp f
if (f < 0) {f = 0;}
if (f > 0x100000) {f = 0x100000;}
for (int x = 0; x < 256; x ++){
for (int y = 0; y < 192; y++){
// get the rgb data
rgb = pData[(y * 256) + x];
r = (rgb >> 16) & 0xFF;
g = (rgb >> 8) & 0xFF;
b = rgb & 0xFF;
// calculate cumulative intensity
i = r * 313524;
i += g * 615514;
i += b * 119537;
// Refactor to 0-255
i = i >> 20;
// Calculate the new values
r = r + (((i-r)*f) >> 20);
g = g + (((i-g)*f) >> 20);
b = b + (((i-b)*f) >> 20);
// Write Pixel Back
pData[(y * 256) + x] = (r << 16) | (g <<8) | b;
}
}
}
And the asm code is (take a deep breath!):
Code:
_TEXT SEGMENT
_y$33985 = -80 ; size = 4
_x$33981 = -68 ; size = 4
_rgb$ = -56 ; size = 4
_i$ = -44 ; size = 4
_b$ = -32 ; size = 4
_g$ = -20 ; size = 4
_r$ = -8 ; size = 4
_pData$ = 8 ; size = 4
_num$ = 12 ; size = 4
_f$ = 16 ; size = 4
?SetRGBs@@YAXPAKHH@Z PROC ; SetRGBs, COMDAT
; 302 : {
push ebp
mov ebp, esp
sub esp, 276 ; 00000114H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-276]
mov ecx, 69 ; 00000045H
mov eax, -858993460 ; ccccccccH
rep stosd
; 303 : // Some set up
; 304 : unsigned int r,g,b,i,rgb;
; 305 :
; 306 : // Clamp f
; 307 : if (f < 0) {f = 0;}
cmp DWORD PTR _f$[ebp], 0
jge SHORT $LN8@SetRGBs
mov DWORD PTR _f$[ebp], 0
$LN8@SetRGBs:
; 308 : if (f > 0x100000) {f = 0x100000;}
cmp DWORD PTR _f$[ebp], 1048576 ; 00100000H
jle SHORT $LN7@SetRGBs
mov DWORD PTR _f$[ebp], 1048576 ; 00100000H
$LN7@SetRGBs:
; 309 :
; 310 :
; 311 : for (int x = 0; x < 256; x ++){
mov DWORD PTR _x$33981[ebp], 0
jmp SHORT $LN6@SetRGBs
$LN5@SetRGBs:
mov eax, DWORD PTR _x$33981[ebp]
add eax, 1
mov DWORD PTR _x$33981[ebp], eax
$LN6@SetRGBs:
cmp DWORD PTR _x$33981[ebp], 256 ; 00000100H
jge $LN9@SetRGBs
; 312 : for (int y = 0; y < 192; y++){
mov DWORD PTR _y$33985[ebp], 0
jmp SHORT $LN3@SetRGBs
$LN2@SetRGBs:
mov eax, DWORD PTR _y$33985[ebp]
add eax, 1
mov DWORD PTR _y$33985[ebp], eax
$LN3@SetRGBs:
cmp DWORD PTR _y$33985[ebp], 192 ; 000000c0H
jge $LN1@SetRGBs
; 313 : // get the rgb data
; 314 : rgb = pData[(y * 256) + x];
mov eax, DWORD PTR _y$33985[ebp]
shl eax, 8
add eax, DWORD PTR _x$33981[ebp]
mov ecx, DWORD PTR _pData$[ebp]
mov edx, DWORD PTR [ecx+eax*4]
mov DWORD PTR _rgb$[ebp], edx
; 315 : r = (rgb >> 16) & 0xFF;
mov eax, DWORD PTR _rgb$[ebp]
shr eax, 16 ; 00000010H
and eax, 255 ; 000000ffH
mov DWORD PTR _r$[ebp], eax
; 316 : g = (rgb >> 8) & 0xFF;
mov eax, DWORD PTR _rgb$[ebp]
shr eax, 8
and eax, 255 ; 000000ffH
mov DWORD PTR _g$[ebp], eax
; 317 : b = rgb & 0xFF;
mov eax, DWORD PTR _rgb$[ebp]
and eax, 255 ; 000000ffH
mov DWORD PTR _b$[ebp], eax
; 318 : // calculate cumulative intensity
; 319 : i = r * 313524;
mov eax, DWORD PTR _r$[ebp]
imul eax, 313524 ; 0004c8b4H
mov DWORD PTR _i$[ebp], eax
; 320 : i += g * 615514;
mov eax, DWORD PTR _g$[ebp]
imul eax, 615514 ; 0009645aH
add eax, DWORD PTR _i$[ebp]
mov DWORD PTR _i$[ebp], eax
; 321 : i += b * 119537;
mov eax, DWORD PTR _b$[ebp]
imul eax, 119537 ; 0001d2f1H
add eax, DWORD PTR _i$[ebp]
mov DWORD PTR _i$[ebp], eax
; 322 : // Refactor to 0-255
; 323 : i = i >> 20;
mov eax, DWORD PTR _i$[ebp]
shr eax, 20 ; 00000014H
mov DWORD PTR _i$[ebp], eax
; 324 : // Calculate the new values
; 325 : r = r + (((i-r)*f) >> 20);
mov eax, DWORD PTR _i$[ebp]
sub eax, DWORD PTR _r$[ebp]
imul eax, DWORD PTR _f$[ebp]
shr eax, 20 ; 00000014H
add eax, DWORD PTR _r$[ebp]
mov DWORD PTR _r$[ebp], eax
; 326 : g = g + (((i-g)*f) >> 20);
mov eax, DWORD PTR _i$[ebp]
sub eax, DWORD PTR _g$[ebp]
imul eax, DWORD PTR _f$[ebp]
shr eax, 20 ; 00000014H
add eax, DWORD PTR _g$[ebp]
mov DWORD PTR _g$[ebp], eax
; 327 : b = b + (((i-b)*f) >> 20);
mov eax, DWORD PTR _i$[ebp]
sub eax, DWORD PTR _b$[ebp]
imul eax, DWORD PTR _f$[ebp]
shr eax, 20 ; 00000014H
add eax, DWORD PTR _b$[ebp]
mov DWORD PTR _b$[ebp], eax
; 328 : // Write Pixel Back
; 329 : pData[(y * 256) + x] = (r << 16) | (g <<8) | b;
mov eax, DWORD PTR _r$[ebp]
shl eax, 16 ; 00000010H
mov ecx, DWORD PTR _g$[ebp]
shl ecx, 8
or eax, ecx
or eax, DWORD PTR _b$[ebp]
mov edx, DWORD PTR _y$33985[ebp]
shl edx, 8
add edx, DWORD PTR _x$33981[ebp]
mov ecx, DWORD PTR _pData$[ebp]
mov DWORD PTR [ecx+edx*4], eax
; 330 : }
jmp $LN2@SetRGBs
$LN1@SetRGBs:
; 331 : }
jmp $LN5@SetRGBs
$LN9@SetRGBs:
; 332 : }
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
?SetRGBs@@YAXPAKHH@Z ENDP ; SetRGBs
_TEXT ENDS
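Two things stand out in the listing. It looks like an unoptimised build (the stack frame is filled with 0xCCCCCCCC by rep stosd and every variable is reloaded from the stack), so it says little about what an optimising compiler would produce. Also, (i-r)*f is shifted with shr, a logical shift, because r, g, b and i are declared unsigned. When i is smaller than the channel value the subtraction wraps, and the result picks up an extra bit 12 that spills into the neighbouring channel when the pixel is reassembled. That may be related to the artefacts mentioned above, though the thread does not confirm it. A small C++ sketch of the difference follows; the function names are hypothetical.

```cpp
#include <cassert>

// As in the posted routine: unsigned operands, so (i - c) wraps when i < c
// and >> 20 is a logical shift. Assumes 32-bit unsigned int.
inline unsigned blend_unsigned(unsigned c, unsigned i, int f) {
    assert(f >= 0 && f <= 0x100000);
    return c + (((i - c) * f) >> 20);
}

// Signed variant: the difference is allowed to go negative, and the scaling
// uses a division so that no negative value is shifted.
inline int blend_signed(int c, int i, int f) {
    assert(f >= 0 && f <= 0x100000);
    return c + ((i - c) * f) / (1 << 20);
}
```

In the unsigned version the low 8 bits of the result still hold the expected value, so masking each channel with 0xFF before the final shift-and-OR would be another way to contain the stray bit.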

David Williams
« Reply #34 on: Oct 9th, 2011, 10:17pm »
Michael: Gracias for the C++ output asm code. I'll study it keenly.
For those who've been following the discussion, here's a quick demo of GFXLIB_ColourDesaturate (compiled EXE):
http://www.bb4wgames.com/misc/realtimecolourdesaturation.zip
It struggles to maintain the full 60 fps framerate on my laptop (typically ~50 fps) which disappoints me, but then the routine isn't intended for realtime desaturation of 640x480 images in a gaming situation!
I reckon it's fast enough.
Regards, David.
Source (but don't try to run it!):
Code:
HIMEM = LOMEM + 5*&100000
HIMEM = (HIMEM + 3) AND -4
PROCfixWindowSize
ON ERROR PROCerror( REPORT$, TRUE )
WinW% = 640
WinH% = 480
VDU 23, 22, WinW%; WinH%; 8, 16, 16, 0 : OFF
INSTALL @lib$ + "GFXLIB2"
PROCInitGFXLIB( d{}, 0 )
INSTALL @lib$ + "GFXLIB_modules\ColourDrain"
PROCInitModule
INSTALL @lib$ + "GFXLIB_modules\PlotShapeHalfIntensity"
PROCInitModule
GetTickCount% = FNSYS_NameToAddress( "GetTickCount" )
SetWindowText% = FNSYS_NameToAddress( "SetWindowText" )
flowers% = FNLoadImg( @dir$ + "flowers_640x480.JPG", 0 )
flowers_copy% = FNmalloc( 4 * 640*480 )
ball% = FNLoadImg( @lib$ + "GFXLIB_media\ball1_64x64x8.BMP", 0 )
numFrames% = 0
*REFRESH OFF
SYS GetTickCount% TO time0%
REPEAT
T% = TIME
f = 0.5 * (1.0 + SIN(T%/100))
SYS GFXLIB_DWORDCopy%, flowers%, flowers_copy%, 640*480
SYS GFXLIB_ColourDrain%, flowers_copy%, 640*480, f*&100000
SYS GFXLIB_BPlot%, d{}, flowers_copy%, 640, 480, 0, 0
FOR I% = 0 TO 11
X% = (320 - 32) + 220*SINRAD(I%*(360/12) + T%/2)
Y% = (240 - 32) + 220*COSRAD(I%*(360/12) + T%/2)
SYS GFXLIB_PlotShapeHalfIntensity%, d{}, ball%, 64, 64, X%-15, Y%-16
SYS GFXLIB_Plot%, d{}, ball%, 64, 64, X%, Y%
NEXT I%
PROCdisplay
SYS GetTickCount% TO time1%
IF (time1% - time0%) >= 1000 THEN
SYS SetWindowText%, @hwnd%, "Frame rate: " + STR$numFrames% + " fps"
numFrames% = 0
SYS GetTickCount% TO time0%
ELSE
numFrames% += 1
ENDIF
UNTIL FALSE
END
:
:
:
:
DEF PROCfixWindowSize
LOCAL GWL_STYLE, WS_THICKFRAME, WS_MAXIMIZEBOX, ws%
GWL_STYLE = -16
WS_THICKFRAME = &40000
WS_MAXIMIZEBOX = &10000
SYS "GetWindowLong", @hwnd%, GWL_STYLE TO ws%
SYS "SetWindowLong", @hwnd%, GWL_STYLE, ws% AND NOT (WS_THICKFRAME+WS_MAXIMIZEBOX)
ENDPROC
:
:
:
:
DEF PROCerror( msg$, L% )
OSCLI "REFRESH ON" : ON
COLOUR 1, &FF, &FF, &FF
COLOUR 1
PRINT TAB(1,1)msg$;
IF L% THEN
PRINT " at line "; ERL;
ENDIF
VDU 7
REPEAT UNTIL INKEY(1)=0
ENDPROC

admin
« Reply #35 on: Oct 9th, 2011, 10:23pm »
on Oct 9th, 2011, 3:27pm, David Williams wrote: "I'm thinking of calling it "DesaturateColour"! Isn't that more sensible?"
Yes!
If you're looking for ways to tidy up your code, please note that this:
Code:
sub ebp, 1
shl ebp, 2
add ebp, esi
can be replaced by this (just 4 bytes):
Code:
There's no significant speed impact, because it's not in a loop, but in terms of elegance there's no contest!
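The replacement listing did not survive in this copy of the thread. Going by the "just 4 bytes" remark and the lea article David links in his reply, it was presumably something like lea ebp, [esi + ebp*4 - 4]; that instruction is a guess, not a quotation. The C++ sketch below only checks that the three-step sequence and a single base + n*4 - 4 address computation agree. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the three instructions: sub ebp,1 / shl ebp,2 / add ebp,esi
inline std::uint32_t last_pixel_three_step(std::uint32_t base, std::uint32_t n) {
    assert(n >= 1);
    std::uint32_t t = n - 1;  // sub ebp, 1
    t <<= 2;                  // shl ebp, 2
    return t + base;          // add ebp, esi
}

// The same address as one expression, which is the form a scaled-index
// address computation can evaluate in a single instruction.
inline std::uint32_t last_pixel_one_step(std::uint32_t base, std::uint32_t n) {
    assert(n >= 1);
    return base + n * 4 - 4;
}
```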
Richard.

David Williams
« Reply #36 on: Oct 9th, 2011, 10:28pm »
on Oct 9th, 2011, 10:23pm, Richard Russell wrote:
Code:
"There's no significant speed impact, because it's not in a loop, but in terms of elegance there's no contest!"
There was no excuse for me to miss that one, really, especially since I had read this article not long ago:
http://bb4w.wikispaces.com/Using+the+lea+instruction
Thanks again.

admin
« Reply #37 on: Oct 10th, 2011, 01:10am »
on Oct 9th, 2011, 10:17pm, David Williams wrote: "For those who've been following the discussion, here's a quick demo of GFXLIB_ColourDesaturate (compiled EXE)"
Here's an MMX version of GFXLIB_ColourDrain:
Code:
; REM. SYS GFXLIB_ColourDrain%, pBitmap%, numPixels%, f%
;
; Parameters -- pBitmap%, numPixels%, f%
;
; pBitmap% - points to base address of 32-bpp ARGB bitmap
; numPixels% - number of pixels in bitmap
;
; f% (''colour-drain'' factor) is 12.20 fixed-point integer; range (0.0 to 1.0)*2^20 (Note 2^20 = &100000)
;
; f% is clamped (by this routine) to the range 0 to 2^20-1 (&FFFFF)
;
pushad
; ESP!36 = pBitmap%
; ESP!40 = numPixels%
; ESP!44 = f% (= f * 2^20)
mov esi, [esp + 36] ; esi = pBitmap%
mov ebp, [esp + 40] ; numPixels%
lea ebp, [esi + ebp*4]
mov edi, [esp + 44] ; edi = f%
;REM. if f% < 0 then f% = 0
cmp edi, 0 ; f% < 0 ?
jge _.fgtzero%
xor edi, edi ; f% = 0
._.fgtzero%
;REM. if f% >= 2^20 (&100000) then f% = 2^20-1
cmp edi, 2^20 ; f% >= 2^20 ?
jl _.flt2p20%
mov edi, 2^20-1 ; f% = 2^20-1
._.flt2p20%
shr edi, 5
movd mm6, edi
pshufw mm6, mm6, %11000000
movq mm7, [_.matrix%]
._.loop%
punpcklbw mm0,[esi]
punpckhbw mm1,[esi]
psrlw mm0,8
psrlw mm1,8
movq mm2,mm0
movq mm3,mm1
pmaddwd mm0,mm7
pmaddwd mm1,mm7
pshufw mm4,mm0,%01001110
pshufw mm5,mm1,%01001110
paddd mm4,mm0
paddd mm5,mm1
pslld mm4,1
pslld mm5,1
pshufw mm4,mm4,%01010101
pshufw mm5,mm5,%01010101
psubw mm4,mm2
psubw mm5,mm3
pmulhw mm4,mm6
pmulhw mm5,mm6
psllw mm4,1
psllw mm5,1
paddw mm4,mm2
paddw mm5,mm3
packuswb mm4,mm5
movq [esi],mm4
add esi, 8 ; advance by two pixels (8 bytes)
cmp esi, ebp
jb _.loop%
popad
emms
ret 12
._.matrix%
dw 0.114 * 2^15
dw 0.587 * 2^15
dw 0.299 * 2^15
dw 0

I haven't compared its speed with yours, but I would expect it to be faster. As I'm by no means an MMX expert it may well be that it can be improved.
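To make the data flow of the loop easier to follow, here is a scalar C++ model of what the listing appears to do to one pixel. It is a reading of the code above rather than a tested port: the helper names are hypothetical, and the truncated 2^15 weights are an assumption, since the thread does not say how the assembler rounds them.

```cpp
#include <cassert>
#include <cstdint>

// Luminance weights scaled by 2^15 and truncated (assumed rounding).
constexpr int kWeightB = 3735;   // 0.114 * 32768
constexpr int kWeightG = 19234;  // 0.587 * 32768
constexpr int kWeightR = 9797;   // 0.299 * 32768

// Scalar model of pmulhw: signed 16x16 multiply keeping the high 16 bits,
// i.e. floor(a * b / 65536).
inline int mulhi16(int a, int b) {
    assert(a >= -32768 && a <= 32767 && b >= -32768 && b <= 32767);
    const int p = a * b;
    return p >= 0 ? p / 65536 : -((-p + 65535) / 65536);
}

// Scalar model of packuswb for one lane: saturate to 0..255.
inline int saturate_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

// Hypothetical helper: applies the loop body to a single ARGB pixel.
inline std::uint32_t desaturate_pixel(std::uint32_t argb, int f20) {
    if (f20 < 0) f20 = 0;                        // clamp as the routine does
    if (f20 >= (1 << 20)) f20 = (1 << 20) - 1;
    const int f15 = f20 >> 5;                    // shr edi, 5
    const int b = static_cast<int>(argb & 0xFF);
    const int g = static_cast<int>((argb >> 8) & 0xFF);
    const int r = static_cast<int>((argb >> 16) & 0xFF);
    // pmaddwd + paddd + pslld 1, then the high word is taken: sum >> 15
    const int lum = (b * kWeightB + g * kWeightG + r * kWeightR) >> 15;
    // psubw, pmulhw, psllw 1, paddw, packuswb for one colour channel
    auto blend = [&](int c) {
        return static_cast<std::uint32_t>(saturate_u8(c + 2 * mulhi16(lum - c, f15)));
    };
    // Alpha is passed through, matching the zero factor in word 3 of mm6.
    return (argb & 0xFF000000u) | (blend(r) << 16) | (blend(g) << 8) | blend(b);
}
```

With these truncated weights, pmulhw followed by psllw 1 loses the lowest bit of each product, so a fully desaturated pixel can land a step or two away from an exact grey in this model.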
Richard.
« Last Edit: Oct 10th, 2011, 08:33am by admin »

David Williams
« Reply #38 on: Oct 10th, 2011, 06:55am »
on Oct 10th, 2011, 01:10am, Richard Russell wrote: "Here's an MMX version of GFXLIB_ColourDrain:"
Gratefully received. :)
Okay, I had to make one little correction because I discovered that only one (or two?) pixels were being processed in the image.
The MMX version (MMXDesaturateColour) is nearly twice as fast (on my Centrino Duo laptop) as the non-MMX (GR) version.
1000 full-image operations on a 640x480 ARGB32 bitmap took:
4.84 s. (MMX version) 9.22 s (GR version)
The test (compiled EXE) can be downloaded here:
www.bb4wgames.com/misc/mmxdesaturatecolour_vs_colourdrain.zip
As I mentioned yesterday, I'll be dropping the Fisher-Price routine name (ColourDrain) and calling it DesaturateColour.
I won't just grab your MMX code and learn nothing from it, of that you can be assured.
Thanks for the code.
David.
---
For the sake of completeness only, I'll list the source for the timed test here:
Code:
HIMEM = LOMEM + 5*&100000
HIMEM = (HIMEM + 3) AND -4
PROCfixWindowSize
ON ERROR PROCerror( REPORT$, TRUE )
WinW% = 640
WinH% = 480
VDU 23, 22, WinW%; WinH%; 8, 16, 16, 0 : OFF
INSTALL @lib$ + "GFXLIB2"
PROCInitGFXLIB( d{}, 0 )
INSTALL @lib$ + "GFXLIB_modules\ColourDrain"
PROCInitModule
INSTALL @lib$ + "GFXLIB_modules\MMXDesaturateColour"
PROCInitModule
GetTickCount% = FNSYS_NameToAddress( "GetTickCount" )
flowers% = FNLoadImg( @dir$ + "flowers_640x480.JPG", 0 )
flowers_copy% = FNmalloc( 4 * 640*480 )
timeA_0% = 0
timeA_1% = 0
timeB_0% = 0
timeB_1% = 0
PRINT
PRINT " Conducting timed tests (MMXDesaturateColour vs. ColourDrain)"'
PRINT " (1000 colour desaturations of a 640x480 ARGB32 bitmap)"'
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &80
PRINT " Timing MMXDesaturateColour..."
df = 0.01
f = 0.0
G% = GFXLIB_MMXDesaturateColour%
SYS GetTickCount% TO timeA_0%
FOR I% = 1 TO 1000
SYS GFXLIB_DWORDCopy%, flowers%, flowers_copy%, 640*480
SYS G%, flowers_copy%, 640*480, f*&100000
f += df
IF f >= 1.0 THEN f = 0.0
NEXT I%
SYS GetTickCount% TO timeA_1%
PRINT " Timing ColourDrain..."
df = 0.01
f = 0.0
G% = GFXLIB_ColourDrain%
SYS GetTickCount% TO timeB_0%
FOR I% = 1 TO 1000
SYS GFXLIB_DWORDCopy%, flowers%, flowers_copy%, 640*480
SYS G%, flowers_copy%, 640*480, f*&100000
f += df
IF f >= 1.0 THEN f = 0.0
NEXT I%
SYS GetTickCount% TO timeB_1%
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &20
timeA = (timeA_1% - timeA_0%) / 1000
timeB = (timeB_1% - timeB_0%) / 1000
SOUND OFF : SOUND 1, -10, 226, 1
COLOUR 11 : ON
PRINT '" Results" : PRINT " -------"'
PRINT " MMXDesaturateColour took "; timeA; " s."'
PRINT " ColourDrain took "; timeB; " s."''
COLOUR 3 : PRINT " Finished!";
REPEAT UNTIL INKEY(1)=0
END
:
:
:
:
DEF PROCfixWindowSize
LOCAL GWL_STYLE, WS_THICKFRAME, WS_MAXIMIZEBOX, ws%
GWL_STYLE = -16
WS_THICKFRAME = &40000
WS_MAXIMIZEBOX = &10000
SYS "GetWindowLong", @hwnd%, GWL_STYLE TO ws%
SYS "SetWindowLong", @hwnd%, GWL_STYLE, ws% AND NOT (WS_THICKFRAME+WS_MAXIMIZEBOX)
ENDPROC
:
:
:
:
DEF PROCerror( msg$, L% )
OSCLI "REFRESH ON" : ON
COLOUR 1, &FF, &FF, &FF
COLOUR 1
PRINT TAB(1,1)msg$;
IF L% THEN
PRINT " at line "; ERL;
ENDIF
VDU 7
REPEAT UNTIL INKEY(1)=0
ENDPROC

admin
« Reply #39 on: Oct 10th, 2011, 08:32am »
on Oct 10th, 2011, 06:55am, David Williams wrote: "Okay, I had to make one little correction because I discovered that only one (or two?) pixels were being processed in the image."
Ah yes, was that the edi that should have been an esi? Oddly, it worked here despite the error.
There's another change you should really make. The code as listed affects all four bytes of the resulting pixel (including the most-significant 'alpha' byte). Presumably you would prefer it to leave that byte unchanged, in which case you should alter the third line here as shown:
Code:
shr edi, 5
movd mm6, edi
pshufw mm6, mm6, %11000000
Quote: "As I mentioned yesterday, I'll be dropping the Fisher-Price routine name (ColourDrain) and calling it DesaturateColour"
I know, but I only had the original version to work from. It seemed safer not to make any unnecessary changes.
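On the pshufw change above: each pair of bits in the immediate selects which source word goes into the corresponding destination word. After movd mm6, edi only word 0 holds the factor and the other three words are zero, so %11000000 copies the factor into words 0 to 2 (blue, green, red) and leaves word 3 (alpha) at zero. A small C++ model of the instruction, with a hypothetical Words type, shows the selection rule.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Words = std::array<std::uint16_t, 4>;  // word 0 is the least significant

// Scalar model of pshufw: destination word k is the source word selected by
// bits [2k+1:2k] of the 8-bit immediate.
inline Words pshufw(const Words& src, unsigned imm8) {
    assert(imm8 < 256);
    Words dst{};
    for (unsigned k = 0; k < 4; ++k) {
        dst[k] = src[(imm8 >> (2 * k)) & 3];
    }
    return dst;
}
```

The same model covers the other two selectors used in the routine: %01001110 swaps the two 32-bit halves, and %01010101 broadcasts word 1 to all four positions.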
Richard.

David Williams
« Reply #40 on: Oct 13th, 2011, 7:40pm »
GFXLIB_MMXDesaturateColour & GFXLIB_BoxBlur3x3:
http://www.bb4wgames.com/misc/mmxdesaturatecolour_example2c.zip (EXE; 163 Kb)
I can imagine using that kind of effect on the title page of some creepy RPG just before the game begins.
David.
======================================
Code:
*ESC OFF
REM Make 3 MB available for this program
M%=3 : HIMEM = LOMEM + M%*&100000
MODE 8 : OFF
INSTALL @lib$ + "GFXLIB2" : PROCInitGFXLIB
INSTALL @lib$ + "GFXLIB_modules\MMXDesaturateColour" : PROCInitModule
INSTALL @lib$ + "GFXLIB_modules\BoxBlur3x3" : PROCInitModule
bm% = FNLoadImg( @lib$ + "GFXLIB_media\bg1_640x512x8.bmp", 0 )
*REFRESH OFF
REPEAT
REM. Display the image normally for two seconds
SYS GFXLIB_BPlot%, dispVars{}, bm%, 640, 512, 0, 0
PROCdisplay
WAIT 200
FOR I% = 1 TO 280
SYS GFXLIB_BoxBlur3x3%, dispVars.bmBuffAddr%, dispVars.bmBuffAddr%, 640, 512
IF I% MOD 2 = 0 THEN SYS GFXLIB_MMXDesaturateColour%, dispVars.bmBuffAddr%, 640*512, 0.01*&100000
PROCdisplay
NEXT I%
UNTIL FALSE