BBC BASIC for Windows
« Setting a register to zero if it's < zero »

admin
Administrator
« Reply #30 on: Oct 9th, 2011, 5:32pm »

on Oct 9th, 2011, 4:14pm, David Williams wrote:
Assuming that those FOR...NEXT loops still had 5000 iterations each (as in the original)

Yes, the code was unchanged except that I modified it to align each routine on a 32-byte boundary (rather than 4). My CPU is a 2.8 GHz Pentium 4.
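
The 32-byte alignment mentioned above is the usual round-up-and-mask idiom, which appears in the listing as `P%=(P%+31) AND -32`. A minimal sketch of the same arithmetic in C++, assuming a power-of-two alignment; `align_up` is a hypothetical helper name, not part of any library used in this thread:

```cpp
#include <cstdint>

// Round addr up to the next multiple of alignment.
// Assumes alignment is a power of two (32 for the code routines,
// 8 for the bitmap in the listing). AND -32 and & ~31 are the same
// mask in two's complement.
constexpr std::uint32_t align_up(std::uint32_t addr, std::uint32_t alignment) {
    return (addr + (alignment - 1)) & ~(alignment - 1);
}
```

An address that is already aligned is returned unchanged, so the idiom is safe to apply unconditionally.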

Here is a comparison of MMX and GP register versions of your 'luminance matrix' code:

Code:
      MODE 8 : OFF
      
      HIMEM = LOMEM + 2*&100000
      
      DIM gap1% 4096
      DIM bitmap% 4*(640*512 + 2)
      bitmap% = (bitmap% + 7) AND -8
      DIM  gap2% 4096
      
      REM. These 4 Kb gaps are probably way OTT, but just to be certain!
      
      PROC_asm
      
      REM. Fill bitmap with random values
      FOR I% = bitmap% TO (bitmap% + 4*640*512)-1 STEP 4
        !I% = RND
      NEXT
      
      G% = FNSYS_NameToAddress( "GetTickCount" )
      time0% = 0
      time1% = 0
      
      PRINT '" Conducting test #1, please wait..."'
      
      REM. Test #1 (DW's code)
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        C%=USRA%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #1 (DW's code) took ";
      PRINT ;(time1% - time0%)/1000; " s. (final result ";C% ")"'
      
      PRINT '" Conducting test #2, please wait..."'
      
      REM. Test #2 (RTR's code)
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      SYS G% TO time0%
      FOR I% = 1 TO 5000
        C%=USRB%
      NEXT
      SYS G% TO time1%
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      PRINT '" Test #2 (RTR's code) took ";
      PRINT ;(time1% - time0%)/1000; " s. (final result ";C% ")"'
      
      PRINT''" Finished."
      END
      ;
      DEF PROC_asm
      LOCAL I%, P%, code%, loop_A, loop_B, loop_C
      DIM code% 1000
      
      FOR I% = 0 TO 2 STEP 2
        P% = code%
        [OPT I%
        
        ] : P%=(P%+31) AND -32 : [OPT I%
        
        .A%
        mov esi, bitmap%
        .loop_A
        movzx ecx, BYTE [esi]                    ; ECX (cl) = blue byte (b&)
        movzx ebx, BYTE [esi + 1]                ; EBX (bl) = green byte (g&)
        movzx eax, BYTE [esi + 2]                ; EAX (al) = red byte (r&)
        
        xor ebp, ebp                             ; EBP = cumulative intensity (i) - initially zero
        
        ;REM. i += (0.114 * 2^20) * b&
        mov edx, ecx
        imul edx, (0.114 * 2^20)
        add ebp, edx
        
        ;REM. i += (0.587 * 2^20) * g&
        mov edx, ebx
        imul edx, (0.587 * 2^20)
        add ebp, edx
        
        ;REM. i += (0.299 * 2^20) * r&
        mov edx, eax
        imul edx, (0.299 * 2^20)
        add ebp, edx
        
        shr ebp, 20                              ; EBP (i&) now in the range 0 to 255
        
        add esi, 4
        cmp esi, (bitmap% + 640 * 512)
        jl loop_A
        mov eax,ebp
        ret
        
        ] : P%=(P%+31) AND -32 : [OPT I%
        
        .matrix
        dw 0.114 * 2^15
        dw 0.587 * 2^15
        dw 0.299 * 2^15
        dw 0
        
        .B%
        mov esi, bitmap%
        movq mm7, [matrix]
        
        .loop_B
        punpcklbw mm0,[esi]
        psrlw mm0,8
        pmaddwd mm0,mm7
        movd ebp,mm0
        pshufw mm0,mm0,%01001110
        movd eax,mm0
        add ebp,eax
        shr ebp,15
        
        add esi, 4
        cmp esi, (bitmap% + 640 * 512)
        jl loop_B
        mov eax,ebp
        ret
        
        ]
        
      NEXT I%
      ENDPROC
      ;
      DEF FNSYS_NameToAddress( f$ )
      LOCAL P%
      DIM P% LOCAL 5
      [OPT 0 : call f$ : ]
      =P%!-4+P% 

On my PCs the MMX version doesn't run much, if any, faster but the potential gain comes from being able to use the other MMX registers to process (say) four pixels 'in parallel'.

Richard.
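
For readers comparing the two routines above: both compute the same weighted sum in fixed point, differing only in scale (2^20 for the general-purpose version, 2^15 for the MMX version, since `pmaddwd` needs signed 16-bit coefficients). A rough scalar sketch in C++ follows. The 2^20 constants are the ones that appear in Michael's C++ routine later in the thread; the 2^15 constants assume the assembler truncates `0.114 * 2^15` and friends to integers, which I have not verified. The function names are hypothetical.

```cpp
#include <cstdint>

// Luminance with 20 fractional bits, as in the general-purpose register loop.
// Coefficients are 0.114, 0.587 and 0.299 scaled by 2^20 and truncated.
constexpr std::uint32_t luma_q20(std::uint32_t r, std::uint32_t g, std::uint32_t b) {
    return (b * 119537u + g * 615514u + r * 313524u) >> 20;
}

// Luminance with 15 fractional bits, as in the MMX loop.
// ASSUMPTION: the dw directives truncate the scaled values to 3735, 19234, 9797.
constexpr std::uint32_t luma_q15(std::uint32_t r, std::uint32_t g, std::uint32_t b) {
    return (b * 3735u + g * 19234u + r * 9797u) >> 15;
}
```

With truncated coefficients the weights sum to slightly less than 1, so pure white comes out as 254 rather than 255 in both variants; rounding the coefficients instead of truncating would be one way to avoid that.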

Michael Hutton
Developer

« Reply #31 on: Oct 9th, 2011, 8:28pm »

Code:
.loop_B
        punpcklbw mm0,[esi]
        psrlw mm0,8
        pmaddwd mm0,mm7
        movd ebp,mm0
        pshufw mm0,mm0,%01001110
        movd eax,mm0
        add ebp,eax
        shr ebp,15
 


Very nice!

Michael

David Williams
Developer

« Reply #32 on: Oct 9th, 2011, 9:08pm »

on Oct 9th, 2011, 5:32pm, Richard Russell wrote:
Here is a comparison of MMX and GP register versions of your 'luminance matrix' code:


Thanks for the code (and apologies for referring to the luminance as "the intensity"!).

The MMX version is slightly slower on my laptop:

GR version: 3.67 s.
MMX version: 4.19 s

I know that when it comes to the 'blending' (that's my word for it!) part, MMX is likely to be faster because, IIRC, the MMX-powered alphablending code you kindly contributed to GFXLIB is about twice as fast as the non-MMX version.

Anyway, this has been an interesting discussion (it's certainly occupied much of my day). Things have been learnt, and I'm beginning to wonder if time is better spent not worrying too much about optimising code...because seemingly optimised code on one system is not so on another. (I did know this, but today's discussion has highlighted the fact).

Now I must get on with preparing version 2.03 of GFXLIB for release, along with the GFXLIB FAQ.

After Christmas, I may have another crack at producing a GFXLIB-Lite for the Trial version of BB4W, having taken on board the points/ideas raised by you and Michael (in private correspondence).


Rgs,
David.
« Last Edit: Oct 9th, 2011, 9:09pm by David Williams »

Michael Hutton
Developer

« Reply #33 on: Oct 9th, 2011, 9:15pm »

David,

Here is the assembler output for the following routine in C++. I'm not really sure it adds anything or that we should take anything from it. I don't even know if I have put in the right optimising flags etc.

As an aside, the output bitmap does suffer from some aliasing with this routine.

It might also be heavily dependent on how I access the bitmap: pData is defined as DWORD*, whereas now that we know three movzx loads are better, using unsigned char* could be better.
Code:
void SetRGBs(DWORD* pData, int num, int f)
{
	// Some set up
	unsigned int r,g,b,i,rgb;
	
	// Clamp f
	if (f < 0) {f = 0;} 
	if (f > 0x100000) {f = 0x100000;}	
	for (int x = 0; x < 256; x ++){	
		for (int y = 0; y < 192; y++){
		// get the rgb data	
			rgb = pData[(y * 256) + x];	
			r = (rgb >> 16) & 0xFF;   
			g = (rgb >> 8) & 0xFF;
		    b = rgb & 0xFF;
		// calculate cumulative intensity
			i  = r * 313524;
			i += g * 615514;
			i += b * 119537;
		// Refactor to 0-255
			i = i >> 20;
		// Calculate the new values
			r = r + (((i-r)*f) >> 20);
			g = g + (((i-g)*f) >> 20);
			b = b + (((i-b)*f) >> 20);
		// Write Pixel Back
			pData[(y * 256) + x] = (r << 16) | (g <<8) | b;
		}
	}
}
 


and the asm code is - take a deep breath!
Code:
_TEXT	SEGMENT
_y$33985 = -80						; size = 4
_x$33981 = -68						; size = 4
_rgb$ = -56						; size = 4
_i$ = -44						; size = 4
_b$ = -32						; size = 4
_g$ = -20						; size = 4
_r$ = -8						; size = 4
_pData$ = 8						; size = 4
_num$ = 12						; size = 4
_f$ = 16						; size = 4
?SetRGBs@@YAXPAKHH@Z PROC				; SetRGBs, COMDAT

; 302  : {

	push	ebp
	mov	ebp, esp
	sub	esp, 276				; 00000114H
	push	ebx
	push	esi
	push	edi
	lea	edi, DWORD PTR [ebp-276]
	mov	ecx, 69					; 00000045H
	mov	eax, -858993460				; ccccccccH
	rep stosd

; 303  : 	// Some set up
; 304  : 	unsigned int r,g,b,i,rgb;
; 305  : 	
; 306  : 	// Clamp f
; 307  : 	if (f < 0) {f = 0;} 

	cmp	DWORD PTR _f$[ebp], 0
	jge	SHORT $LN8@SetRGBs
	mov	DWORD PTR _f$[ebp], 0
$LN8@SetRGBs:

; 308  : 	if (f > 0x100000) {f = 0x100000;}

	cmp	DWORD PTR _f$[ebp], 1048576		; 00100000H
	jle	SHORT $LN7@SetRGBs
	mov	DWORD PTR _f$[ebp], 1048576		; 00100000H
$LN7@SetRGBs:

; 309  : 
; 310  : 	
; 311  : 	for (int x = 0; x < 256; x ++){	

	mov	DWORD PTR _x$33981[ebp], 0
	jmp	SHORT $LN6@SetRGBs
$LN5@SetRGBs:
	mov	eax, DWORD PTR _x$33981[ebp]
	add	eax, 1
	mov	DWORD PTR _x$33981[ebp], eax
$LN6@SetRGBs:
	cmp	DWORD PTR _x$33981[ebp], 256		; 00000100H
	jge	$LN9@SetRGBs

; 312  : 		for (int y = 0; y < 192; y++){

	mov	DWORD PTR _y$33985[ebp], 0
	jmp	SHORT $LN3@SetRGBs
$LN2@SetRGBs:
	mov	eax, DWORD PTR _y$33985[ebp]
	add	eax, 1
	mov	DWORD PTR _y$33985[ebp], eax
$LN3@SetRGBs:
	cmp	DWORD PTR _y$33985[ebp], 192		; 000000c0H
	jge	$LN1@SetRGBs

; 313  : 		// get the rgb data	
; 314  : 			rgb = pData[(y * 256) + x];	

	mov	eax, DWORD PTR _y$33985[ebp]
	shl	eax, 8
	add	eax, DWORD PTR _x$33981[ebp]
	mov	ecx, DWORD PTR _pData$[ebp]
	mov	edx, DWORD PTR [ecx+eax*4]
	mov	DWORD PTR _rgb$[ebp], edx

; 315  : 			r = (rgb >> 16) & 0xFF;   

	mov	eax, DWORD PTR _rgb$[ebp]
	shr	eax, 16					; 00000010H
	and	eax, 255				; 000000ffH
	mov	DWORD PTR _r$[ebp], eax

; 316  : 			g = (rgb >> 8) & 0xFF;

	mov	eax, DWORD PTR _rgb$[ebp]
	shr	eax, 8
	and	eax, 255				; 000000ffH
	mov	DWORD PTR _g$[ebp], eax

; 317  : 		    b = rgb & 0xFF;

	mov	eax, DWORD PTR _rgb$[ebp]
	and	eax, 255				; 000000ffH
	mov	DWORD PTR _b$[ebp], eax

; 318  : 		// calculate cumulative intensity
; 319  : 			i  = r * 313524;

	mov	eax, DWORD PTR _r$[ebp]
	imul	eax, 313524				; 0004c8b4H
	mov	DWORD PTR _i$[ebp], eax

; 320  : 			i += g * 615514;

	mov	eax, DWORD PTR _g$[ebp]
	imul	eax, 615514				; 0009645aH
	add	eax, DWORD PTR _i$[ebp]
	mov	DWORD PTR _i$[ebp], eax

; 321  : 			i += b * 119537;

	mov	eax, DWORD PTR _b$[ebp]
	imul	eax, 119537				; 0001d2f1H
	add	eax, DWORD PTR _i$[ebp]
	mov	DWORD PTR _i$[ebp], eax

; 322  : 		// Refactor to 0-255
; 323  : 			i = i >> 20;

	mov	eax, DWORD PTR _i$[ebp]
	shr	eax, 20					; 00000014H
	mov	DWORD PTR _i$[ebp], eax

; 324  : 		// Calculate the new values
; 325  : 			r = r + (((i-r)*f) >> 20);

	mov	eax, DWORD PTR _i$[ebp]
	sub	eax, DWORD PTR _r$[ebp]
	imul	eax, DWORD PTR _f$[ebp]
	shr	eax, 20					; 00000014H
	add	eax, DWORD PTR _r$[ebp]
	mov	DWORD PTR _r$[ebp], eax

; 326  : 			g = g + (((i-g)*f) >> 20);

	mov	eax, DWORD PTR _i$[ebp]
	sub	eax, DWORD PTR _g$[ebp]
	imul	eax, DWORD PTR _f$[ebp]
	shr	eax, 20					; 00000014H
	add	eax, DWORD PTR _g$[ebp]
	mov	DWORD PTR _g$[ebp], eax

; 327  : 			b = b + (((i-b)*f) >> 20);

	mov	eax, DWORD PTR _i$[ebp]
	sub	eax, DWORD PTR _b$[ebp]
	imul	eax, DWORD PTR _f$[ebp]
	shr	eax, 20					; 00000014H
	add	eax, DWORD PTR _b$[ebp]
	mov	DWORD PTR _b$[ebp], eax

; 328  : 		// Write Pixel Back
; 329  : 			pData[(y * 256) + x] = (r << 16) | (g <<8) | b;

	mov	eax, DWORD PTR _r$[ebp]
	shl	eax, 16					; 00000010H
	mov	ecx, DWORD PTR _g$[ebp]
	shl	ecx, 8
	or	eax, ecx
	or	eax, DWORD PTR _b$[ebp]
	mov	edx, DWORD PTR _y$33985[ebp]
	shl	edx, 8
	add	edx, DWORD PTR _x$33981[ebp]
	mov	ecx, DWORD PTR _pData$[ebp]
	mov	DWORD PTR [ecx+edx*4], eax

; 330  : 		}

	jmp	$LN2@SetRGBs
$LN1@SetRGBs:

; 331  : 	}

	jmp	$LN5@SetRGBs
$LN9@SetRGBs:

; 332  : }

	pop	edi
	pop	esi
	pop	ebx
	mov	esp, ebp
	pop	ebp
	ret	0
?SetRGBs@@YAXPAKHH@Z ENDP				; SetRGBs
_TEXT	ENDS
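
One thing worth checking in the C++ routine above is the signedness of the blend step. Because `r`, `g`, `b` and `i` are `unsigned int`, the difference `(i-r)` wraps around when the luminance is below the channel value, and the listing confirms a logical `shr` is used for the `>> 20`. The sketch below isolates that step; it is only a guess at one possible source of artefacts, not a confirmed diagnosis, and both function names are hypothetical.

```cpp
#include <cstdint>

// The blend step as written in the routine: unsigned operands, logical shift.
// When i < c the result picks up stray bits above bit 7.
constexpr std::uint32_t blend_unsigned(std::uint32_t c, std::uint32_t i, std::int32_t f) {
    return c + (((i - c) * static_cast<std::uint32_t>(f)) >> 20);
}

// A signed alternative (an assumption about the intent). A 64-bit product and
// a division keep negative differences portable; division truncates toward zero.
constexpr std::int32_t blend_signed(std::int32_t c, std::int32_t i, std::int32_t f) {
    return c + static_cast<std::int32_t>(
                   (static_cast<std::int64_t>(i - c) * f) / (1 << 20));
}
```

For a channel value of 200, a luminance of 100 and f = 0x80000 (0.5), the unsigned form gives 4246 where the signed form gives 150. The low eight bits of 4246 are still 150, but the extra high bits would be ORed into the neighbouring channel when the pixel is reassembled.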
 

David Williams
Developer

« Reply #34 on: Oct 9th, 2011, 10:17pm »

Michael: Gracias for the C++ output asm code. I'll study it keenly.

For those who've been following the discussion, here's a quick demo of GFXLIB_ColourDesaturate (compiled EXE):

http://www.bb4wgames.com/misc/realtimecolourdesaturation.zip

It struggles to maintain the full 60 fps framerate on my laptop (typically ~50 fps) which disappoints me, but then the routine isn't intended for realtime desaturation of 640x480 images in a gaming situation!

I reckon it's fast enough.


Regards,
David.


Source (but don't try to run it!):

Code:
      HIMEM = LOMEM + 5*&100000
      HIMEM = (HIMEM + 3) AND -4
      
      PROCfixWindowSize
      
      ON ERROR PROCerror( REPORT$, TRUE )
      
      WinW% = 640
      WinH% = 480
      VDU 23, 22, WinW%; WinH%; 8, 16, 16, 0 : OFF
      
      INSTALL @lib$ + "GFXLIB2"
      PROCInitGFXLIB( d{}, 0 )
      
      INSTALL @lib$ + "GFXLIB_modules\ColourDrain"
      PROCInitModule
      
      INSTALL @lib$ + "GFXLIB_modules\PlotShapeHalfIntensity"
      PROCInitModule
      
      GetTickCount% = FNSYS_NameToAddress( "GetTickCount" )
      SetWindowText% = FNSYS_NameToAddress( "SetWindowText" )
      
      flowers% = FNLoadImg( @dir$ + "flowers_640x480.JPG", 0 )
      
      flowers_copy% = FNmalloc( 4 * 640*480 )
      
      ball% = FNLoadImg( @lib$ + "GFXLIB_media\ball1_64x64x8.BMP", 0 )
      
      numFrames% = 0
      
      *REFRESH OFF
      
      SYS GetTickCount% TO time0%
      REPEAT
        
        T% = TIME
        
        f = 0.5 * (1.0 + SIN(T%/100))
        SYS GFXLIB_DWORDCopy%, flowers%, flowers_copy%, 640*480
        SYS GFXLIB_ColourDrain%, flowers_copy%, 640*480, f*&100000
        SYS GFXLIB_BPlot%, d{}, flowers_copy%, 640, 480, 0, 0
        
        FOR I% = 0 TO 11
          X% = (320 - 32) + 220*SINRAD(I%*(360/12) + T%/2)
          Y% = (240 - 32) + 220*COSRAD(I%*(360/12) + T%/2)
          SYS GFXLIB_PlotShapeHalfIntensity%, d{}, ball%, 64, 64, X%-15, Y%-16
          SYS GFXLIB_Plot%, d{}, ball%, 64, 64, X%, Y%
        NEXT I%
        
        PROCdisplay
        
        SYS GetTickCount% TO time1%
        IF (time1% - time0%) >= 1000 THEN
          SYS SetWindowText%, @hwnd%, "Frame rate: " + STR$numFrames% + " fps"
          numFrames% = 0
          SYS GetTickCount% TO time0%
        ELSE
          numFrames% += 1
        ENDIF
        
      UNTIL FALSE
      END
      :
      :
      :
      :
      DEF PROCfixWindowSize
      LOCAL GWL_STYLE, WS_THICKFRAME, WS_MAXIMIZEBOX, ws%
      GWL_STYLE = -16
      WS_THICKFRAME = &40000
      WS_MAXIMIZEBOX = &10000
      SYS "GetWindowLong", @hwnd%, GWL_STYLE TO ws%
      SYS "SetWindowLong", @hwnd%, GWL_STYLE, ws% AND NOT (WS_THICKFRAME+WS_MAXIMIZEBOX)
      ENDPROC
      :
      :
      :
      :
      DEF PROCerror( msg$, L% )
      OSCLI "REFRESH ON" : ON
      COLOUR 1, &FF, &FF, &FF
      COLOUR 1
      PRINT TAB(1,1)msg$;
      IF L% THEN
        PRINT " at line "; ERL;
      ENDIF
      VDU 7
      REPEAT UNTIL INKEY(1)=0
      ENDPROC 

« Last Edit: Oct 9th, 2011, 10:21pm by David Williams »

admin
Administrator
« Reply #35 on: Oct 9th, 2011, 10:23pm »

on Oct 9th, 2011, 3:27pm, David Williams wrote:
I'm thinking of calling it "DesaturateColour"! Isn't that more sensible?

Yes!

If you're looking for ways to tidy up your code, please note that this:

Code:
      sub ebp, 1
      shl ebp, 2
      add ebp, esi 

can be replaced by this (just 4 bytes):

Code:
      lea ebp,[esi+ebp*4-4] 

There's no significant speed impact, because it's not in a loop, but in terms of elegance there's no contest!

Richard.
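
To make the equivalence above concrete, here is a small C++ model of both sequences using 32-bit unsigned arithmetic to stand in for the registers. The function names are hypothetical. Flags are not modelled: `lea` leaves them untouched, whereas `sub`, `shl` and `add` all modify them, which only matters if later code depends on the flags.

```cpp
#include <cstdint>

// sub ebp, 1 / shl ebp, 2 / add ebp, esi
constexpr std::uint32_t three_step(std::uint32_t esi, std::uint32_t ebp) {
    ebp -= 1;
    ebp <<= 2;
    ebp += esi;
    return ebp;
}

// lea ebp, [esi + ebp*4 - 4]
constexpr std::uint32_t single_lea(std::uint32_t esi, std::uint32_t ebp) {
    return esi + ebp * 4u - 4u;
}
```

Both forms wrap modulo 2^32 in the same way, so they agree even for a count of zero.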

David Williams
Developer

« Reply #36 on: Oct 9th, 2011, 10:28pm »

on Oct 9th, 2011, 10:23pm, Richard Russell wrote:
Code:
lea ebp,[esi+ebp*4-4] 


There's no significant speed impact, because it's not in a loop, but in terms of elegance there's no contest!


There was no excuse for me to miss that one, really, especially since I had read this article not long ago:

http://bb4w.wikispaces.com/Using+the+lea+instruction

Thanks again.

admin
Administrator
« Reply #37 on: Oct 10th, 2011, 01:10am »

on Oct 9th, 2011, 10:17pm, David Williams wrote:
For those who've been following the discussion, here's a quick demo of GFXLIB_ColourDesaturate (compiled EXE)

Here's an MMX version of GFXLIB_ColourDrain:

Code:
        ; REM. SYS GFXLIB_ColourDrain%, pBitmap%, numPixels%, f%
        
        ;
        ; Parameters -- pBitmap%, numPixels%, f%
        ;
        ;               pBitmap% - points to base address of 32-bpp ARGB bitmap
        ;               numPixels% - number of pixels in bitmap
        ;
        ;               f% (''colour-drain'' factor) is 12.20 fixed-point integer; range (0.0 to 1.0)*2^20  (Note 2^20 = &100000)
        ;
        ;               f% is clamped (by this routine) to the range 0 to 2^20-1 (&FFFFF)
        ;
        
        pushad
        
        ; ESP!36 = pBitmap%
        ; ESP!40 = numPixels%
        ; ESP!44 = f% (= f * 2^20)
        
        mov esi, [esp + 36]                      ; esi = pBitmap%
        
        mov ebp, [esp + 40]                      ; numPixels%
        lea ebp, [esi + ebp*4]
        
        mov edi, [esp + 44]                      ; edi = f%
        
        ;REM. if f% < 0 then f% = 0
        cmp edi, 0                               ; f% < 0 ?
        jge _.fgtzero%
        xor edi, edi                             ; f% = 0
        ._.fgtzero%
        
        ;REM. if f% >= 2^20 (&100000) then f% = 2^20-1
        cmp edi, 2^20                            ; f% >= 2^20 ?
        jl _.flt2p20%
        mov edi, 2^20-1                          ; f% = 2^20-1
        ._.flt2p20%
        
        shr edi, 5
        movd mm6, edi
        pshufw mm6, mm6, %11000000
        movq mm7, [_.matrix%]
        
        ._.loop%
        
        punpcklbw mm0,[esi]
        punpckhbw mm1,[esi]
        psrlw mm0,8
        psrlw mm1,8
        movq mm2,mm0
        movq mm3,mm1
        pmaddwd mm0,mm7
        pmaddwd mm1,mm7
        pshufw mm4,mm0,%01001110
        pshufw mm5,mm1,%01001110
        paddd mm4,mm0
        paddd mm5,mm1
        pslld mm4,1
        pslld mm5,1
        pshufw mm4,mm4,%01010101
        pshufw mm5,mm5,%01010101
        psubw mm4,mm2
        psubw mm5,mm3
        pmulhw mm4,mm6
        pmulhw mm5,mm6
        psllw mm4,1
        psllw mm5,1
        paddw mm4,mm2
        paddw mm5,mm3
        packuswb mm4,mm5
        movq [esi],mm4
        
        add esi, 8                               ; advance to the next pair of pixels
        cmp esi, ebp
        jb _.loop%
        
        popad
        emms
        ret 12
        
        ._.matrix%
        dw 0.114 * 2^15
        dw 0.587 * 2^15
        dw 0.299 * 2^15
        dw 0 

I haven't compared its speed with yours, but I would expect it to be faster. As I'm by no means an MMX expert it may well be that it can be improved.

Richard.
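
As a reading aid for the MMX loop above, here is a scalar C++ sketch of what I understand each 16-bit lane to compute for one pixel (the loop itself handles two pixels per iteration). It is a model under stated assumptions, not a tested replacement: the matrix constants assume the assembler truncates `0.114 * 2^15` etc. to integers, and all names (`Pixel`, `mulhi16`, `lane`, `desaturate_model`) are hypothetical.

```cpp
#include <cstdint>

struct Pixel { std::uint8_t b, g, r, a; };  // byte order of one 32-bpp pixel in memory

// pmulhw: high 16 bits of the signed 16x16 product, i.e. floor(a*b / 65536).
inline std::int32_t mulhi16(std::int32_t a, std::int32_t b) {
    const std::int32_t p = a * b;
    return p >= 0 ? p / 65536 : -((65535 - p) / 65536);
}

// One colour lane: psubw, pmulhw, psllw 1, paddw, then packuswb saturation.
inline std::uint8_t lane(std::int32_t c, std::int32_t lum, std::int32_t f15) {
    const std::int32_t v = c + 2 * mulhi16(lum - c, f15);
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

inline Pixel desaturate_model(Pixel p, std::int32_t f20) {
    if (f20 < 0) f20 = 0;                        // clamp as in the routine
    if (f20 >= (1 << 20)) f20 = (1 << 20) - 1;
    const std::int32_t f15 = f20 >> 5;           // shr edi, 5 gives 0..32767
    // pmaddwd + paddd + pslld 1, then the high word is broadcast:
    // luminance in 0..254. ASSUMPTION: coefficients truncated.
    const std::int32_t lum = (p.b * 3735 + p.g * 19234 + p.r * 9797) >> 15;
    // The factor word for alpha is zero, so alpha passes through unchanged.
    return Pixel{ lane(p.b, lum, f15), lane(p.g, lum, f15), lane(p.r, lum, f15), p.a };
}
```

The `pmulhw` followed by `psllw 1` discards the lowest bit of each correction, so in this model a fully desaturated pure red pixel (luminance 76) ends up at 74, 74, 75 rather than 76 in every channel. That small loss is the price of keeping everything in 16-bit lanes.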
« Last Edit: Oct 10th, 2011, 08:33am by admin »

David Williams
Developer

« Reply #38 on: Oct 10th, 2011, 06:55am »

on Oct 10th, 2011, 01:10am, Richard Russell wrote:
Here's an MMX version of GFXLIB_ColourDrain:


Gratefully received. :)

Okay, I had to make one little correction because I discovered that only one (or two?) pixels were being processed in the image.

The MMX version (MMXDesaturateColour) is nearly twice as fast (on my Centrino Duo laptop) as the non-MMX (GR) version.

1000 full-image operations on a 640x480 ARGB32 bitmap took:

4.84 s. (MMX version)
9.22 s (GR version)

The test (compiled EXE) can be downloaded here:

www.bb4wgames.com/misc/mmxdesaturatecolour_vs_colourdrain.zip

As I mentioned yesterday, I'll be dropping the Fisher-Price routine name (ColourDrain) and calling it DesaturateColour.

I won't just grab your MMX code and learn nothing from it, of that you can be assured.

Thanks for the code.


David.


---


For the sake of completeness only, I'll list the source for the timed test here:

Code:
      HIMEM = LOMEM + 5*&100000
      HIMEM = (HIMEM + 3) AND -4
      
      PROCfixWindowSize
      
      ON ERROR PROCerror( REPORT$, TRUE )
      
      WinW% = 640
      WinH% = 480
      VDU 23, 22, WinW%; WinH%; 8, 16, 16, 0 : OFF
      
      INSTALL @lib$ + "GFXLIB2"
      PROCInitGFXLIB( d{}, 0 )
      
      INSTALL @lib$ + "GFXLIB_modules\ColourDrain"
      PROCInitModule
      
      INSTALL @lib$ + "GFXLIB_modules\MMXDesaturateColour"
      PROCInitModule
      
      GetTickCount% = FNSYS_NameToAddress( "GetTickCount" )
      
      flowers% = FNLoadImg( @dir$ + "flowers_640x480.JPG", 0 )
      flowers_copy% = FNmalloc( 4 * 640*480 )
      
      timeA_0% = 0
      timeA_1% = 0
      timeB_0% = 0
      timeB_1% = 0
      
      PRINT
      
      PRINT " Conducting timed tests (MMXDesaturateColour vs. ColourDrain)"'
      PRINT " (1000 colour desaturations of a 640x480 ARGB32 bitmap)"'
      
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &80
      
      PRINT " Timing MMXDesaturateColour..."
      df = 0.01
      f = 0.0
      G% = GFXLIB_MMXDesaturateColour%
      SYS GetTickCount% TO timeA_0%
      FOR I% = 1 TO 1000
        SYS GFXLIB_DWORDCopy%, flowers%, flowers_copy%, 640*480
        SYS G%, flowers_copy%, 640*480, f*&100000
        f += df
        IF f >= 1.0 THEN f = 0.0
      NEXT I%
      SYS GetTickCount% TO timeA_1%
      
      PRINT " Timing ColourDrain..."
      df = 0.01
      f = 0.0
      G% = GFXLIB_ColourDrain%
      SYS GetTickCount% TO timeB_0%
      FOR I% = 1 TO 1000
        SYS GFXLIB_DWORDCopy%, flowers%, flowers_copy%, 640*480
        SYS G%, flowers_copy%, 640*480, f*&100000
        f += df
        IF f >= 1.0 THEN f = 0.0
      NEXT I%
      SYS GetTickCount% TO timeB_1%
      
      SYS "GetCurrentProcess" TO hprocess%
      SYS "SetPriorityClass", hprocess%, &20
      
      timeA = (timeA_1% - timeA_0%) / 1000
      timeB = (timeB_1% - timeB_0%) / 1000
      
      SOUND OFF : SOUND 1, -10, 226, 1
      
      COLOUR 11 : ON
      PRINT '" Results" : PRINT " -------"'
      
      PRINT " MMXDesaturateColour took "; timeA; " s."'
      PRINT " ColourDrain took "; timeB; " s."''
      
      COLOUR 3 : PRINT " Finished!";
      REPEAT UNTIL INKEY(1)=0
      END
      :
      :
      :
      :
      DEF PROCfixWindowSize
      LOCAL GWL_STYLE, WS_THICKFRAME, WS_MAXIMIZEBOX, ws%
      GWL_STYLE = -16
      WS_THICKFRAME = &40000
      WS_MAXIMIZEBOX = &10000
      SYS "GetWindowLong", @hwnd%, GWL_STYLE TO ws%
      SYS "SetWindowLong", @hwnd%, GWL_STYLE, ws% AND NOT (WS_THICKFRAME+WS_MAXIMIZEBOX)
      ENDPROC
      :
      :
      :
      :
      DEF PROCerror( msg$, L% )
      OSCLI "REFRESH ON" : ON
      COLOUR 1, &FF, &FF, &FF
      COLOUR 1
      PRINT TAB(1,1)msg$;
      IF L% THEN
        PRINT " at line "; ERL;
      ENDIF
      VDU 7
      REPEAT UNTIL INKEY(1)=0
      ENDPROC
 


admin
Administrator
« Reply #39 on: Oct 10th, 2011, 08:32am »

on Oct 10th, 2011, 06:55am, David Williams wrote:
Okay, I had to make one little correction because I discovered that only one (or two?) pixels were being processed in the image.

Ah yes, was that the edi that should have been an esi? Oddly, it worked here despite the error.

There's another change you should really make. The code as listed affects all four bytes of the resulting pixel (including the most-significant 'alpha' byte). Presumably you would prefer it to leave that byte unchanged, in which case you should alter the third line here as shown:

Code:
        shr edi, 5
        movd mm6, edi
        pshufw mm6, mm6, %11000000 


Quote:
As I mentioned yesterday, I'll be dropping the Fisher-Price routine name (ColourDrain) and calling it DesaturateColour

I know, but I only had the original version to work from. It seemed safer not to make any unnecessary changes.

Richard.
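
For anyone unsure how the `pshufw` selector in the snippet above works: each 2-bit field of the immediate chooses which source word lands in the corresponding destination word, starting from the least significant field. A small C++ model, with hypothetical names, shows why `%11000000` copies the factor into the three colour words and leaves the alpha word at zero (assuming, as in the routine, that `movd` has zeroed words 1 to 3 of the source):

```cpp
#include <array>
#include <cstdint>

using Words = std::array<std::uint16_t, 4>;  // index 0 is the least significant word

// Model of pshufw dst, src, imm: destination word i takes the source word
// selected by bits (2*i+1):(2*i) of imm.
inline Words pshufw_model(const Words& src, std::uint8_t imm) {
    Words dst{};
    for (int i = 0; i < 4; ++i) {
        dst[i] = src[(imm >> (2 * i)) & 3];
    }
    return dst;
}
```

With this model, selector 0xC0 (`%11000000`) maps {f, 0, 0, 0} to {f, f, f, 0}, selector 0x4E (`%01001110`) swaps the two dwords, and selector 0x55 (`%01010101`) broadcasts word 1 to all four positions, which matches the three uses in the routine.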

David Williams
Developer

« Reply #40 on: Oct 13th, 2011, 7:40pm »

GFXLIB_MMXDesaturateColour & GFXLIB_BoxBlur3x3:

http://www.bb4wgames.com/misc/mmxdesaturatecolour_example2c.zip (EXE; 163 Kb)

I can imagine using that kind of effect on the title page of some creepy RPG just before the game begins.


David.

======================================

Code:
      *ESC OFF
      
      REM Make 3 MB available for this program
      M%=3 : HIMEM = LOMEM + M%*&100000
      
      MODE 8 : OFF
      
      INSTALL @lib$ + "GFXLIB2" : PROCInitGFXLIB
      INSTALL @lib$ + "GFXLIB_modules\MMXDesaturateColour" : PROCInitModule
      INSTALL @lib$ + "GFXLIB_modules\BoxBlur3x3" : PROCInitModule
      
      bm% = FNLoadImg( @lib$ + "GFXLIB_media\bg1_640x512x8.bmp", 0 )
      
      *REFRESH OFF
      
      REPEAT
        
        REM. Display the image normally for two seconds
        SYS GFXLIB_BPlot%, dispVars{}, bm%, 640, 512, 0, 0
        PROCdisplay
        WAIT 200
        
        FOR I% = 1 TO 280
          SYS GFXLIB_BoxBlur3x3%, dispVars.bmBuffAddr%, dispVars.bmBuffAddr%, 640, 512
          IF I% MOD 2 = 0 THEN SYS GFXLIB_MMXDesaturateColour%, dispVars.bmBuffAddr%, 640*512, 0.01*&100000
          PROCdisplay
        NEXT I%
        
      UNTIL FALSE 


