Topic: Setting a register to zero if it's < zero
David Williams
Re: Setting a register to zero if it's < zero
« Reply #19 on: Oct 9th, 2011, 2:55pm »
on Oct 9th, 2011, 2:35pm, Richard Russell wrote: I know you didn't ask me, but....
I certainly had you in mind as well!
The trouble with the English language is that, in contrast to French and Arabic, it doesn't have a plural form of the word "you" (as in "you people", "you two gifted assembly language programmers", etc.).
on Oct 9th, 2011, 2:35pm, Richard Russell wrote: I wouldn't start from there, because constraining yourself to using the general-purpose registers (eax etc.) is going to limit the speed, however optimised the code. The MMX instructions are specifically designed to handle things like 32-bit xRGB pixels efficiently, so I would expect using MMX would be the fastest way.
I'll certainly look into it. I've got some MMX code of yours (alpha blending) which I might be able to adapt.
on Oct 9th, 2011, 2:35pm, Richard Russell wrote: Of course it depends on what you do with the values next, once you've got them in separate registers, but I'd still be surprised if MMX doesn't win.
Once they're in the registers, I compute the intensity using the fixed-point version of the formula:
i = 0.299*R + 0.587*G + 0.114*B
then I 'blend' i (which is in the range 0 to 255) with each of the R, G, B values by an amount specified as a factor ranging from 0.0 to beyond 1.0 (up to 10.0, say). Values less than 1.0 de-saturate the colour, values over 1.0 enhance saturation. (By the way, all the math is fixed-point, so this 'saturation factor' is multiplied by 2^20 (&100000) before it's passed to the routine).
I like the effect it produces (technically accurate or otherwise!), and it will be a nice addition to GFXLIB.
Rgs, David.
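A minimal C++ sketch of the calculation described in this post, under stated assumptions: the blend is taken to be i + (c - i) * s, so that s below 1.0 de-saturates and s above 1.0 enhances saturation, and results are clamped to 0..255. The function names are hypothetical, the weights are the 2^20-scaled integers that appear in the C++ listing later in the thread, and a 64-bit intermediate is used because a factor of 10.0 * 2^20 times a channel difference no longer fits in 32 bits.

```cpp
#include <cstdint>

// Fixed-point scale used in the thread: 2^20 (&100000).
constexpr std::int64_t kOne = std::int64_t{1} << 20;

// Luminance weights scaled by 2^20 and truncated.
constexpr std::int32_t kWeightR = 313524;  // 0.299 * 2^20
constexpr std::int32_t kWeightG = 615514;  // 0.587 * 2^20
constexpr std::int32_t kWeightB = 119537;  // 0.114 * 2^20

// i = 0.299*R + 0.587*G + 0.114*B in fixed point; result is 0..254.
constexpr std::int32_t luminance(std::int32_t r, std::int32_t g, std::int32_t b) {
    return (kWeightR * r + kWeightG * g + kWeightB * b) >> 20;
}

// Clamp to a byte: zero if below zero, 255 if above 255.
constexpr std::int32_t clamp_byte(std::int64_t v) {
    return v < 0 ? 0 : (v > 255 ? 255 : static_cast<std::int32_t>(v));
}

// Assumed blend: move channel c away from (s > 1) or towards (s < 1) the luminance i.
constexpr std::int32_t adjust_channel(std::int32_t c, std::int32_t i, std::int64_t s) {
    return clamp_byte(i + (c - i) * s / kOne);
}

// Apply the adjustment to one xRGB pixel, leaving the top byte untouched.
constexpr std::uint32_t adjust_saturation(std::uint32_t xrgb, std::int64_t s) {
    const std::int32_t r = static_cast<std::int32_t>((xrgb >> 16) & 0xFFu);
    const std::int32_t g = static_cast<std::int32_t>((xrgb >> 8) & 0xFFu);
    const std::int32_t b = static_cast<std::int32_t>(xrgb & 0xFFu);
    const std::int32_t i = luminance(r, g, b);
    return (xrgb & 0xFF000000u)
         | (static_cast<std::uint32_t>(adjust_channel(r, i, s)) << 16)
         | (static_cast<std::uint32_t>(adjust_channel(g, i, s)) << 8)
         |  static_cast<std::uint32_t>(adjust_channel(b, i, s));
}
```

With these definitions, adjust_saturation(pixel, kOne) returns the pixel unchanged and adjust_saturation(pixel, 0) returns its grey equivalent. The clamp in clamp_byte is where the "zero if less than zero" question from the thread title comes in once s exceeds 1.0.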
David Williams
« Reply #20 on: Oct 9th, 2011, 2:58pm »
on Oct 9th, 2011, 2:40pm, Richard Russell wrote: It is allowed, but you have to use the correct mnemonic:
Code:
mov eax, [edi + 4 * esi]
movzx ebx, ah
movzx ecx, al
shr eax,16
Richard.
Excellent. I can do away with those horrid XORs now. :)
David Williams
« Reply #21 on: Oct 9th, 2011, 3:09pm »
Okay, with Richard's correction, the fastest non-MMX/non-SIMD code yet is indeed:
Code:
mov eax, [edi + 4 * esi]
movzx ebx, ah
movzx ecx, al
shr eax,16
It took 5.93 s in the timed test, nearly 2 seconds faster than the three MOVZX byte loads (Test #1).
Thanking MH and RTR once more.
Michael Hutton
« Reply #22 on: Oct 9th, 2011, 3:14pm »
Quote: I would rather say 'completely meaningless'.
Quote: overhead of the USR function and the NEXT statement
In my defence, I wasn't commenting on the timing of the BASIC loops, just the profiler report for the USR() line, but even that only has a resolution of 1 ms.
I did say **seems to favour** over 10 million runs.... I am not sure "completely meaningless" really does apply, but I won't argue the point any further. I know what you are saying.
It obviously needs more rigorous testing.
Re MMX: I've been playing around trying to get Visual C++ Express to spew out SIMD code, but it doesn't want to play ball at the moment.
Its asm output for the ColourDrain routine is very similar to what you have already coded, David. In fact, slower I would say by looking at it, but then I wouldn't trust my timings, especially the ones I've just done in my noodle, lol, and I would say my C++ code is not so hot.
Michael
Michael Hutton
« Reply #23 on: Oct 9th, 2011, 3:19pm »
Quote:
I knew that, but I must admit to thinking that it would just extend al into cx, not into ecx, and was worried about high-word garbage.
Oh well.
Michael
David Williams
« Reply #24 on: Oct 9th, 2011, 3:27pm »
on Oct 9th, 2011, 3:14pm, Michael Hutton wrote: It's asm output for the ColourDrain routine ...
Erm... you know it's occurred to me that "ColourDrain" (although I named it) is the kind of name that Fisher-Price might call it if they had written that routine. I'm thinking of calling it "DesaturateColour"! Isn't that more sensible?
Just on the off-chance that anyone wants to see the code (it's in the form of an external GFXLIB module), here it is:
Code:
DEF PROCInitModule
PROCInitModule( 0 )
ENDPROC
DEF PROCInitModule( V% )
LOCAL codeSize%, I%, L%, P%, _{}, M$
M$ = "ColourDrain"
GFXLIB_CoreCode% += 0
IF GFXLIB_CoreCode% = 0 THEN ERROR 0, "The GFXLIB core library appears not to have been installed and initialised. The core library must be installed and initialised before attempting to install any external GFXLIB modules."
codeSize% = 170
DIM GFXLIB_ColourDrain% codeSize%-1, L% -1
DIM _{fgtzero%, flt2p20%, loop%}
IF V% THEN
PRINT '" Assembling GFXLIB module " + CHR$34 + M$ + CHR$34 + "..."
ENDIF
FOR I% = 8 TO 10 STEP 2
P% = GFXLIB_ColourDrain%
[OPT I%
; REM. SYS GFXLIB_ColourDrain%, pBitmap%, numPixels%, f%
;
; Parameters -- pBitmap%, numPixels%, f%
;
; pBitmap% - points to base address of 32-bpp ARGB bitmap
; numPixels% - number of pixels in bitmap
;
; f% (''colour-drain'' factor) is a 12.20 fixed-point integer; range (0.0 to 1.0)*2^20 (Note 2^20 = &100000)
;
; f% is clamped (by this routine) to the range 0 to 2^20 (&100000)
;
pushad
; ESP!36 = pBitmap%
; ESP!40 = numPixels%
; ESP!44 = f% (= f * 2^20)
sub esp, 4 ; allocate space for one local variable
; And now...
;
; ESP!40 = pBitmap%
; ESP!44 = numPixels%
; ESP!48 = f% (= f * 2^20)
mov esi, [esp + 40] ; ESI = pBitmap%
; calc. address of final pixel
mov ebp, [esp + 44] ; numPixels%
sub ebp, 1 ; numPixels% - 1 (because pixel index starts at zero)
shl ebp, 2 ; 4 * (numPixels% - 1) (because 4 bytes per pixel)
add ebp, esi ; = addr of final pixel
mov [esp], ebp ; ESP!0 = addr of final pixel
mov edi, [esp + 48] ; EDI = f%
;REM. if f% < 0 then f% = 0
cmp edi, 0 ; f% < 0 ?
jge _.fgtzero%
xor edi, edi ; f% = 0
._.fgtzero%
;REM. if f% > 2^20 (&100000) then f% = 2^20
cmp edi, 2^20 ; f% > 2^20 ?
jle _.flt2p20%
mov edi, 2^20 ; f% = 2^20
._.flt2p20%
._.loop%
movzx ecx, BYTE [esi] ; ECX (cl) = blue byte (b&)
movzx ebx, BYTE [esi + 1] ; EBX (bl) = green byte (g&)
movzx eax, BYTE [esi + 2] ; EAX (al) = red byte (r&)
xor ebp, ebp ; EBP = cumulative intensity (i) - initially zero
;REM. i += (0.114 * 2^20) * b&
mov edx, ecx
imul edx, (0.114 * 2^20)
add ebp, edx
;REM. i += (0.587 * 2^20) * g&
mov edx, ebx
imul edx, (0.587 * 2^20)
add ebp, edx
;REM. i += (0.299 * 2^20) * r&
mov edx, eax
imul edx, (0.299 * 2^20)
add ebp, edx
shr ebp, 20 ; EBP (i&) now in the range 0 to 255
;REM. b`& = b& + (((i& - b&)*f%) >> 20)
mov edx, ebp ; copy EBP (i&)
sub edx, ecx ; i& - b&
imul edx, edi ; (i& - b&)*f%
shr edx, 20 ; ((i& - b&)*f%) >> 20
add BYTE [esi], dl ; write blue byte
;REM. g`& = g& + (((i& - g&)*f%) >> 20)
mov edx, ebp ; copy EBP (i&)
sub edx, ebx ; i& - g&
imul edx, edi ; (i& - g&)*f%
shr edx, 20 ; ((i& - g&)*f%) >> 20
add BYTE [esi + 1], dl ; write green byte
;REM. r`& = r& + (((i& - r&)*f%) >> 20)
mov edx, ebp ; copy EBP (i&)
sub edx, eax ; i& - r&
imul edx, edi ; (i& - r&)*f%
shr edx, 20 ; ((i& - r&)*f%) >> 20
add BYTE [esi + 2], dl ; write red byte
add esi, 4 ; next pixel address
cmp esi, [esp]
jle _.loop%
add esp, 4 ; free local variable space
popad
ret 12
]
NEXT I%
IF V% THEN
PRINT " Assembled code size = "; (P% - GFXLIB_ColourDrain%);" bytes"
WAIT V%
ENDIF
ENDPROC
(Edited some mistakes in the assembler code comments)
Rgs,
David.
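The two clamping steps at the top of the routine (f% < 0 and f% > 2^20) can be sketched in C++ as below. Note that this is not a translation of the posted assembler, which uses compare-and-branch: the zero_if_negative helper is a branch-free alternative swapped in for illustration, and both function names are hypothetical.

```cpp
#include <cstdint>

// 1.0 in the routine's 12.20 fixed-point format (&100000).
constexpr std::int32_t kOne = 1 << 20;

// Branch-free "set to zero if less than zero".
// (x >= 0) is 1 or 0, so the mask is all ones for non-negative x and zero otherwise.
constexpr std::int32_t zero_if_negative(std::int32_t x) {
    return x & -static_cast<std::int32_t>(x >= 0);
}

// Clamp the colour-drain factor to the range 0 .. 2^20, as the routine does for f%.
constexpr std::int32_t clamp_factor(std::int32_t f) {
    return zero_if_negative(f) > kOne ? kOne : zero_if_negative(f);
}
```

Whether the mask form is actually faster than the branch on a given CPU is exactly the kind of question the timings in this thread show cannot be answered without measuring.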
admin
« Reply #25 on: Oct 9th, 2011, 3:28pm »
on Oct 9th, 2011, 3:14pm, Michael Hutton wrote: In defence, I wasn't commenting of the timing the BASIC loops, just the profiler report of the USR() line, but even that only has a resolution of 1ms.
Yes, fair point, I shouldn't have included NEXT in my comment, but USR will still take much longer than the code you're trying to benchmark.
Quote: I am not sure "completely meaningless" really does apply
I was more meaning in respect of the lack of any alignment code. Modern CPUs typically fetch code in chunks of 32 bytes at a time, and execution speed can be affected depending on the alignment of the code with respect to those chunks (for example a short routine may fit entirely in one chunk, or be split between two).
David's got the right idea, by adding code to align P% before each subroutine, but he's only aligning to 4 bytes (which I don't think is very significant in respect of code, although of course it can be for data) rather than the 32 bytes that is necessary to eliminate alignment as a factor.
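The address rounding behind this kind of alignment can be sketched in C++ as follows. The function names are hypothetical, and the 32-byte chunk size is simply the figure quoted above rather than a property of any particular CPU.

```cpp
#include <cstdint>

// Round addr up to the next multiple of alignment (which must be a power of two).
// For alignment = 32 this is (addr + 31) AND -32.
constexpr std::uint32_t align_up(std::uint32_t addr, std::uint32_t alignment) {
    return (addr + alignment - 1) & ~(alignment - 1);
}

// True if a routine of `size` bytes starting at `start` lies within a single chunk.
constexpr bool fits_in_one_chunk(std::uint32_t start, std::uint32_t size, std::uint32_t chunk) {
    return size != 0 && (start / chunk) == ((start + size - 1) / chunk);
}
```

For example, a 20-byte routine starting at &1018 straddles two 32-byte chunks, whereas the same routine moved to align_up(&1018, 32) = &1020 fits in one. Aligning to 4 bytes would leave it at &1018.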
And as I said before, even when attempting to eliminate all confounding factors I still found that the code which ran the faster was always the routine which was tested second, and that applied both to my old P4 and an AMD Athlon 64.
So I think my 'meaningless' comment was justified, given the number of factors you didn't consider.
Richard.
admin
« Reply #26 on: Oct 9th, 2011, 3:36pm »
on Oct 9th, 2011, 3:09pm, David Williams wrote: Okay, with Richard's correction, the fastest non-MMX/non-SIMD code yet is indeed:
Code:
mov eax, [edi + 4 * esi]
movzx ebx, ah
movzx ecx, al
shr eax,16
I assume you appreciate that this code doesn't zero AH, so if your original pixel was indeed xRGB (rather than 0RGB) then eax will end up containing 'xR' not just 'R'.
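The point can be illustrated with a small C++ sketch of the same extraction. The struct and function names are hypothetical; split_unmasked mirrors the MOV / MOVZX / SHR sequence quoted above, and split_masked adds the extra masking step that clears the top byte.

```cpp
#include <cstdint>

struct Channels {
    std::uint32_t r, g, b;
};

// Mirrors: mov eax,[..] / movzx ebx,ah / movzx ecx,al / shr eax,16
// MOVZX zero-extends into the full 32-bit register, so g and b are clean,
// but the shift leaves the top ('x') byte sitting above the red byte.
constexpr Channels split_unmasked(std::uint32_t pixel) {
    return Channels{pixel >> 16, (pixel >> 8) & 0xFFu, pixel & 0xFFu};
}

// Same extraction with the red channel masked as well.
constexpr Channels split_masked(std::uint32_t pixel) {
    return Channels{(pixel >> 16) & 0xFFu, (pixel >> 8) & 0xFFu, pixel & 0xFFu};
}
```

For a pixel of &7F112233 the unmasked version yields r = &7F11 ('xR'), while the masked version yields r = &11. For a 0RGB pixel the two agree.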
On my P4, Test#1 (three MOVZX instructions) is the fastest. That doesn't surprise me, because the first MOVZX will load all 4 bytes into the L1 cache so the remaining two can execute as quickly as if they were loading from registers. It ends up faster because of there being no need to do the SHR (and AH is zeroed for free):
Code:
Test #1 (three MOVZX instructions) took 2.719 s.
Test #2 (one MOV instruction) took 3.953 s.
Test #3 (one MOV instruction; other instructions re-ordered) took 3.828 s.
Test #4 (one MOV instruction; Michael's solution (no XORs)) took 2.953 s.
Test #5 (one MOV instruction; Michael's solution (with XORs)) took 3.266 s.
Test #6 (one MOV instruction; Richard's solution) took 2.812 s.
Richard.
« Last Edit: Oct 9th, 2011, 3:38pm by admin »
David Williams
« Reply #27 on: Oct 9th, 2011, 4:14pm »
on Oct 9th, 2011, 3:36pm, Richard Russell wrote: I assume you appreciate that this code doesn't zero AH, so if your original pixel was indeed xRGB (rather than 0RGB) then eax will end up containing 'xR' not just 'R'.
I'd like to make the unsafe assumption that the source ARGB32 bitmap will always have the most significant byte of every pixel clear, but that probably won't be the case. Having to clear that byte seems to spoil the beauty of the code somewhat! And it slows it down, of course. It'll have to be done, though.
on Oct 9th, 2011, 3:36pm, Richard Russell wrote: On my P4, Test#1 (three MOVZX instructions) is the fastest.
(Sigh.) Well, that just goes to show, doesn't it!
on Oct 9th, 2011, 3:36pm, Richard Russell wrote: That doesn't surprise me, because the first MOVZX will load all 4 bytes into the L1 cache so the remaining two can execute as quickly as if they were loading from registers. It ends up faster because of there being no need to do the SHR (and AH is zeroed for free):
In light of the 'new' code (getting those R, G, B values into registers), I was going to set about modifying a dozen-or-so GFXLIB routines to rid them of those MOVZX byte-loading instructions. Now it looks like I ought not to bother!
on Oct 9th, 2011, 3:36pm, Richard Russell wrote:
Code:
Test #1 (three MOVZX instructions) took 2.719 s.
Test #2 (one MOV instruction) took 3.953 s.
Test #3 (one MOV instruction; other instructions re-ordered) took 3.828 s.
Test #4 (one MOV instruction; Michael's solution (no XORs)) took 2.953 s.
Test #5 (one MOV instruction; Michael's solution (with XORs)) took 3.266 s.
Test #6 (one MOV instruction; Richard's solution) took 2.812 s.
Richard.
Assuming that those FOR...NEXT loops still had 5000 iterations each (as in the original), I'm curious that your presumably ancient P4 has trounced my 1.83 GHz Intel Centrino Duo-based laptop (in this test). Even if your clock speed was 3 GHz, that's still interesting.
David.
David Williams
« Reply #28 on: Oct 9th, 2011, 4:20pm »
on Oct 9th, 2011, 3:14pm, Michael Hutton wrote: It's asm output for the ColourDrain routine is very similar to what you have already coded David. In fact, slower I would say by looking at it, but then I wouldn't trust my timings, especally the ones I've just done in my noodle, lol, and I would say my C++ code is not so hot.
I'm curious. May I see that code?
I didn't think I could ever beat a modern C++ compiler in terms of code efficiency!
Rgs,
David.
Michael Hutton
« Reply #29 on: Oct 9th, 2011, 5:13pm »
I have to go out now but will post the relevant code for you later.
It is very interesting about the three movzx's being faster in tests. It is good to see these things thrashed out in a thread, to confirm or deny your prejudices/inclinations.
Michael
admin
« Reply #30 on: Oct 9th, 2011, 5:32pm »
on Oct 9th, 2011, 4:14pm, David Williams wrote: Assuming that those FOR...NEXT loops still had 5000 iterations each (as in the original)
Yes, the code was unchanged except that I modified it to align each routine on a 32-byte boundary (rather than 4). My CPU is a 2.8 GHz Pentium 4.
Here is a comparison of MMX and GP register versions of your 'luminance matrix' code:
Code:
MODE 8 : OFF
HIMEM = LOMEM + 2*&100000
DIM gap1% 4096
DIM bitmap% 4*(640*512 + 2)
bitmap% = (bitmap% + 7) AND -8
DIM gap2% 4096
REM. These 4 Kb gaps are probably way OTT, but just to be certain!
PROC_asm
REM. Fill bitmap with random values
FOR I% = bitmap% TO (bitmap% + 4*640*512)-1 STEP 4
!I% = RND
NEXT
G% = FNSYS_NameToAddress( "GetTickCount" )
time0% = 0
time1% = 0
PRINT '" Conducting test #1, please wait..."'
REM. Test #1 (DW's code)
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &80
SYS G% TO time0%
FOR I% = 1 TO 5000
C%=USRA%
NEXT
SYS G% TO time1%
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &20
PRINT '" Test #1 (DW's code) took ";
PRINT ;(time1% - time0%)/1000; " s. (final result ";C% ")"'
PRINT '" Conducting test #2, please wait..."'
REM. Test #2 (RTR's code)
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &80
SYS G% TO time0%
FOR I% = 1 TO 5000
C%=USRB%
NEXT
SYS G% TO time1%
SYS "GetCurrentProcess" TO hprocess%
SYS "SetPriorityClass", hprocess%, &20
PRINT '" Test #2 (RTR's code) took ";
PRINT ;(time1% - time0%)/1000; " s. (final result ";C% ")"'
PRINT''" Finished."
END
;
DEF PROC_asm
LOCAL I%, P%, code%, loop_A, loop_B, loop_C
DIM code% 1000
FOR I% = 0 TO 2 STEP 2
P% = code%
[OPT I%
] : P%=(P%+31) AND -32 : [OPT I%
.A%
mov esi, bitmap%
.loop_A
movzx ecx, BYTE [esi] ; ECX (cl) = blue byte (b&)
movzx ebx, BYTE [esi + 1] ; EBX (bl) = green byte (g&)
movzx eax, BYTE [esi + 2] ; EAX (al) = red byte (r&)
xor ebp, ebp ; EBP = cumulative intensity (i) - initially zero
;REM. i += (0.114 * 2^20) * b&
mov edx, ecx
imul edx, (0.114 * 2^20)
add ebp, edx
;REM. i += (0.587 * 2^20) * g&
mov edx, ebx
imul edx, (0.587 * 2^20)
add ebp, edx
;REM. i += (0.299 * 2^20) * r&
mov edx, eax
imul edx, (0.299 * 2^20)
add ebp, edx
shr ebp, 20 ; EBP (i&) now in the range 0 to 255
add esi, 4
cmp esi, (bitmap% + 640 * 512)
jl loop_A
mov eax,ebp
ret
] : P%=(P%+31) AND -32 : [OPT I%
.matrix
dw 0.114 * 2^15
dw 0.587 * 2^15
dw 0.299 * 2^15
dw 0
.B%
mov esi, bitmap%
movq mm7, [matrix]
.loop_B
punpcklbw mm0,[esi]
psrlw mm0,8
pmaddwd mm0,mm7
movd ebp,mm0
pshufw mm0,mm0,%01001110
movd eax,mm0
add ebp,eax
shr ebp,15
add esi, 4
cmp esi, (bitmap% + 640 * 512)
jl loop_B
mov eax,ebp
ret
]
NEXT I%
ENDPROC
;
DEF FNSYS_NameToAddress( f$ )
LOCAL P%
DIM P% LOCAL 5
[OPT 0 : call f$ : ]
=P%!-4+P%
On my PCs the MMX version doesn't run much, if any, faster, but the potential gain comes from being able to use the other MMX registers to process (say) four pixels 'in parallel'.
Richard.
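For readers unfamiliar with the MMX instructions in loop_B, here is a scalar C++ sketch of what one iteration computes. It is an emulation for illustration, not MMX code: the function names are hypothetical, and the integer weights assume the assembler truncates 0.114 * 2^15 and the other two products towards zero, which is an assumption rather than something stated in the thread.

```cpp
#include <cstdint>

// 2^15-scaled weights in memory order B, G, R, x (the 'matrix' table).
// Assumed truncated: 0.114*32768 = 3735.5.., 0.587*32768 = 19234.8.., 0.299*32768 = 9797.6..
constexpr std::int32_t kWeights[4] = {3735, 19234, 9797, 0};

// punpcklbw + psrlw 8: widen byte n of the pixel to a zero-extended 16-bit lane.
constexpr std::int32_t lane(std::uint32_t xrgb, int n) {
    return static_cast<std::int32_t>((xrgb >> (8 * n)) & 0xFFu);
}

constexpr std::int32_t luminance_q15(std::uint32_t xrgb) {
    // pmaddwd: multiply lanes by weights and add adjacent pairs, giving two 32-bit sums.
    const std::int32_t low = lane(xrgb, 0) * kWeights[0] + lane(xrgb, 1) * kWeights[1];
    const std::int32_t high = lane(xrgb, 2) * kWeights[2] + lane(xrgb, 3) * kWeights[3];
    // movd / pshufw / movd / add / shr 15: combine the two sums and drop the fraction.
    return (low + high) >> 15;
}
```

Because the weight for the fourth lane is zero, the top byte of an xRGB pixel is ignored without any explicit masking, which is one attraction of this approach.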
Michael Hutton
« Reply #31 on: Oct 9th, 2011, 8:28pm »
Code:
.loop_B
punpcklbw mm0,[esi]
psrlw mm0,8
pmaddwd mm0,mm7
movd ebp,mm0
pshufw mm0,mm0,%01001110
movd eax,mm0
add ebp,eax
shr ebp,15
Very nice!
Michael
David Williams
« Reply #32 on: Oct 9th, 2011, 9:08pm »
on Oct 9th, 2011, 5:32pm, Richard Russell wrote: Here is a comparison of MMX and GP register versions of your 'luminance matrix' code:
Thanks for the code (and apologies for referring to the luminance as "the intensity"!).
The MMX version is slightly slower on my laptop:
GP-register version: 3.67 s. MMX version: 4.19 s.
I know that when it comes to the 'blending' (that's my word for it!) part, MMX is likely to be faster because, IIRC, the MMX-powered alphablending code you kindly contributed to GFXLIB is about twice as fast as the non-MMX version.
Anyway, this has been an interesting discussion (it's certainly occupied much of my day). Things have been learnt, and I'm beginning to wonder if time is better spent not worrying too much about optimising code...because seemingly optimised code on one system is not so on another. (I did know this, but today's discussion has highlighted the fact).
Now I must get on with preparing version 2.03 of GFXLIB for release, along with the GFXLIB FAQ.
After Christmas, I may have another crack at producing a GFXLIB-Lite for the Trial version of BB4W, having taken on board the points/ideas raised by you and Michael (in private correspondence).
Rgs, David.
Michael Hutton
« Reply #33 on: Oct 9th, 2011, 9:15pm »
David,
Here is the assembler output for the following routine in C++. I'm not really sure it adds anything or that we should take anything from it. I don't even know if I have put in the right optimising flags etc.
As an aside, the output bitmap does suffer from some aliasing with this routine.
It might also be heavily dependent on how I access the bitmap: pData is defined as DWORD*, whereas now that we know three MOVZX loads are faster, using unsigned char* could be better.
Code:
void SetRGBs(DWORD* pData, int num, int f)
{
// Some set up
unsigned int r,g,b,i,rgb;
// Clamp f
if (f < 0) {f = 0;}
if (f > 0x100000) {f = 0x100000;}
for (int x = 0; x < 256; x ++){
for (int y = 0; y < 192; y++){
// get the rgb data
rgb = pData[(y * 256) + x];
r = (rgb >> 16) & 0xFF;
g = (rgb >> 8) & 0xFF;
b = rgb & 0xFF;
// calculate cumulative intensity
i = r * 313524;
i += g * 615514;
i += b * 119537;
// Refactor to 0-255
i = i >> 20;
// Calculate the new values
r = r + (((i-r)*f) >> 20);
g = g + (((i-g)*f) >> 20);
b = b + (((i-b)*f) >> 20);
// Write Pixel Back
pData[(y * 256) + x] = (r << 16) | (g <<8) | b;
}
}
}
And the asm code is (take a deep breath!):
Code:
_TEXT SEGMENT
_y$33985 = -80 ; size = 4
_x$33981 = -68 ; size = 4
_rgb$ = -56 ; size = 4
_i$ = -44 ; size = 4
_b$ = -32 ; size = 4
_g$ = -20 ; size = 4
_r$ = -8 ; size = 4
_pData$ = 8 ; size = 4
_num$ = 12 ; size = 4
_f$ = 16 ; size = 4
?SetRGBs@@YAXPAKHH@Z PROC ; SetRGBs, COMDAT
; 302 : {
push ebp
mov ebp, esp
sub esp, 276 ; 00000114H
push ebx
push esi
push edi
lea edi, DWORD PTR [ebp-276]
mov ecx, 69 ; 00000045H
mov eax, -858993460 ; ccccccccH
rep stosd
; 303 : // Some set up
; 304 : unsigned int r,g,b,i,rgb;
; 305 :
; 306 : // Clamp f
; 307 : if (f < 0) {f = 0;}
cmp DWORD PTR _f$[ebp], 0
jge SHORT $LN8@SetRGBs
mov DWORD PTR _f$[ebp], 0
$LN8@SetRGBs:
; 308 : if (f > 0x100000) {f = 0x100000;}
cmp DWORD PTR _f$[ebp], 1048576 ; 00100000H
jle SHORT $LN7@SetRGBs
mov DWORD PTR _f$[ebp], 1048576 ; 00100000H
$LN7@SetRGBs:
; 309 :
; 310 :
; 311 : for (int x = 0; x < 256; x ++){
mov DWORD PTR _x$33981[ebp], 0
jmp SHORT $LN6@SetRGBs
$LN5@SetRGBs:
mov eax, DWORD PTR _x$33981[ebp]
add eax, 1
mov DWORD PTR _x$33981[ebp], eax
$LN6@SetRGBs:
cmp DWORD PTR _x$33981[ebp], 256 ; 00000100H
jge $LN9@SetRGBs
; 312 : for (int y = 0; y < 192; y++){
mov DWORD PTR _y$33985[ebp], 0
jmp SHORT $LN3@SetRGBs
$LN2@SetRGBs:
mov eax, DWORD PTR _y$33985[ebp]
add eax, 1
mov DWORD PTR _y$33985[ebp], eax
$LN3@SetRGBs:
cmp DWORD PTR _y$33985[ebp], 192 ; 000000c0H
jge $LN1@SetRGBs
; 313 : // get the rgb data
; 314 : rgb = pData[(y * 256) + x];
mov eax, DWORD PTR _y$33985[ebp]
shl eax, 8
add eax, DWORD PTR _x$33981[ebp]
mov ecx, DWORD PTR _pData$[ebp]
mov edx, DWORD PTR [ecx+eax*4]
mov DWORD PTR _rgb$[ebp], edx
; 315 : r = (rgb >> 16) & 0xFF;
mov eax, DWORD PTR _rgb$[ebp]
shr eax, 16 ; 00000010H
and eax, 255 ; 000000ffH
mov DWORD PTR _r$[ebp], eax
; 316 : g = (rgb >> 8) & 0xFF;
mov eax, DWORD PTR _rgb$[ebp]
shr eax, 8
and eax, 255 ; 000000ffH
mov DWORD PTR _g$[ebp], eax
; 317 : b = rgb & 0xFF;
mov eax, DWORD PTR _rgb$[ebp]
and eax, 255 ; 000000ffH
mov DWORD PTR _b$[ebp], eax
; 318 : // calculate cumulative intensity
; 319 : i = r * 313524;
mov eax, DWORD PTR _r$[ebp]
imul eax, 313524 ; 0004c8b4H
mov DWORD PTR _i$[ebp], eax
; 320 : i += g * 615514;
mov eax, DWORD PTR _g$[ebp]
imul eax, 615514 ; 0009645aH
add eax, DWORD PTR _i$[ebp]
mov DWORD PTR _i$[ebp], eax
; 321 : i += b * 119537;
mov eax, DWORD PTR _b$[ebp]
imul eax, 119537 ; 0001d2f1H
add eax, DWORD PTR _i$[ebp]
mov DWORD PTR _i$[ebp], eax
; 322 : // Refactor to 0-255
; 323 : i = i >> 20;
mov eax, DWORD PTR _i$[ebp]
shr eax, 20 ; 00000014H
mov DWORD PTR _i$[ebp], eax
; 324 : // Calculate the new values
; 325 : r = r + (((i-r)*f) >> 20);
mov eax, DWORD PTR _i$[ebp]
sub eax, DWORD PTR _r$[ebp]
imul eax, DWORD PTR _f$[ebp]
shr eax, 20 ; 00000014H
add eax, DWORD PTR _r$[ebp]
mov DWORD PTR _r$[ebp], eax
; 326 : g = g + (((i-g)*f) >> 20);
mov eax, DWORD PTR _i$[ebp]
sub eax, DWORD PTR _g$[ebp]
imul eax, DWORD PTR _f$[ebp]
shr eax, 20 ; 00000014H
add eax, DWORD PTR _g$[ebp]
mov DWORD PTR _g$[ebp], eax
; 327 : b = b + (((i-b)*f) >> 20);
mov eax, DWORD PTR _i$[ebp]
sub eax, DWORD PTR _b$[ebp]
imul eax, DWORD PTR _f$[ebp]
shr eax, 20 ; 00000014H
add eax, DWORD PTR _b$[ebp]
mov DWORD PTR _b$[ebp], eax
; 328 : // Write Pixel Back
; 329 : pData[(y * 256) + x] = (r << 16) | (g <<8) | b;
mov eax, DWORD PTR _r$[ebp]
shl eax, 16 ; 00000010H
mov ecx, DWORD PTR _g$[ebp]
shl ecx, 8
or eax, ecx
or eax, DWORD PTR _b$[ebp]
mov edx, DWORD PTR _y$33985[ebp]
shl edx, 8
add edx, DWORD PTR _x$33981[ebp]
mov ecx, DWORD PTR _pData$[ebp]
mov DWORD PTR [ecx+edx*4], eax
; 330 : }
jmp $LN2@SetRGBs
$LN1@SetRGBs:
; 331 : }
jmp $LN5@SetRGBs
$LN9@SetRGBs:
; 332 : }
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret 0
?SetRGBs@@YAXPAKHH@Z ENDP ; SetRGBs
_TEXT ENDS
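One detail of the C++ routine above that may be worth checking is the unsigned arithmetic in the blend. Because r, g, b and i are unsigned, (i - r) wraps around whenever i is less than r, and after the shift by 20 the result carries an extra 4096, which then spills into neighbouring channels when the pixel is packed. The sketch below reproduces this for a single channel. The function names are hypothetical, and whether this is the cause of the artefacts mentioned above is only a guess. David's assembler avoids the problem by adding just the low byte (dl) back into the pixel.

```cpp
#include <cstdint>

// One channel blended exactly as in the posted C++: all operands unsigned.
constexpr std::uint32_t blend_as_posted(std::uint32_t c, std::uint32_t i, std::uint32_t f) {
    return c + (((i - c) * f) >> 20);
}

// Same expression keeping only the low 8 bits, as `add BYTE [esi], dl` does.
constexpr std::uint32_t blend_masked(std::uint32_t c, std::uint32_t i, std::uint32_t f) {
    return blend_as_posted(c, i, f) & 0xFFu;
}

// Packing as in the posted routine: no masking of the individual channels.
constexpr std::uint32_t pack(std::uint32_t r, std::uint32_t g, std::uint32_t b) {
    return (r << 16) | (g << 8) | b;
}
```

With c = 200, i = 100 and f = &80000 (0.5), blend_as_posted returns 4246 (&1096) rather than 150 (&96), and packing that value as green would OR &10 into the red byte. Masking each channel with &FF before packing gives the intended 150.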