BBC BASIC for Windows - Letting GCC do the hard work

Topic: Letting GCC do the hard work (Read 1779 times)

David Williams
Developer

Re: Letting GCC do the hard work
« Reply #7 on: Aug 17th, 2014, 08:13am »

Probably my last post for a while...

1000 depth-sorted 'vector balls' based on graphics routines written in C (including the Shell Sort code I borrowed from Rosetta Code). I get 60 fps on my laptop, which is quite impressive, considering (well, considering that I don't yet really know what I'm doing with C):

www.bb4wgames.com/temp/vector_balls.zip [EXE; 142 Kb]

Update: Two versions of glib.dll compiled using different GCC optimisation settings (-O2 and -O3 plus -ffast-math). Also edited this post to include the image link below.

Screenshot:
www.bb4wgames.com/temp/vecballs.jpg

This kind of performance is convincing me that hybridizing BB4W and C/C++ code is the way to go (for me personally).

I'll include the BB4W part of the source below for curious people.

David.
---

Code:

      *FLOAT 64
      *ESC OFF

      ON ERROR PROCerror

      HIMEM = PAGE + 2*&100000

      PROCFixWndSz : MODE 8 : OFF

      INSTALL @lib$ + "GLIB"
      PROCInitGLIB( @lib$ + "glib.dll", g{} )

      ON ERROR PROCCleanup : PROCerror
      ON CLOSE PROCCleanup : QUIT

      GetTickCount = FN`s("GetTickCount")

      LerpClr        = FNImport("LerpClr")
      Plot           = FNImport("Plot")
      InitExPoints   = FNImport("InitExPointList")
      ShellSort      = FNImport("ShellSortExPointListZValues")
      Rotate         = FNImport("RotateExPoints")
      MakeBallBitmap = FNImport("MakeBallBitmap1")

      REM Create a 48x48 ball bitmap:
      DIM ball% 4*(48*48 + 1)
      ball%=ball% + 3 AND -4
      SYS MakeBallBitmap, ball%, 48, &40AA20, &FF0020, 0.99*&10000, &10000

      N% = 1000

      DIM list{(N%-1) x#, y#, z#, x2#, y2#, z2#, key%, d0%, d1%, d2% }

      listBaseAddr% = ^list{(0)}.x#

      IF (listBaseAddr% AND 3) <> 0 THEN
        PRINT '" The coordinates list base address is not DWORD-aligned!"
        PRINT '" This may affect performance. Continuing in 5 seconds..."
        WAIT 500
      ENDIF

      REM Define objects (balls) 3D (x,y,z) coordinates:
      FOR I% = 0 TO N%-1
        list{(I%)}.x# = (RND(1)-0.5) * 800.0
        list{(I%)}.y# = (RND(1)-0.5) * 800.0
        list{(I%)}.z# = (RND(1)-0.5) * 800.0
        list{(I%)}.x2# = 0.0
        list{(I%)}.y2# = 0.0
        list{(I%)}.z2# = 0.0
      NEXT

      REM Initialise rotation angles:
      t1 = 2*PI*RND(1)
      t2 = 2*PI*RND(1)
      t3 = 2*PI*RND(1)

      F% = &10000 : REM Frequently used constant (65536; scales reals to integers for SYS)

      frames% = 0

      *REFRESH OFF

      SYS GetTickCount TO time0%
      REPEAT
  
        REM Draw background:
        SYS LerpClr, g{}, &102030, &805090
  
        REM Init key values for Z-sort
        REM (this could and perhaps should be moved into the rotation routine):
        SYS InitExPoints, listBaseAddr%, N%
  
        REM Rotate coordinates:
        SYS Rotate, listBaseAddr%, 1+2+4, N%, F%*t1, F%*t2, F%*t3, \
        \ F%*-100, F%*-200, F%*50, \
        \ F%*320, F%*256, F%*0, \
        \ 1, F%*300, F%*800
  
        REM Shell sort code (in C) courtesy of Rosetta Code, many thanks:
        SYS ShellSort, listBaseAddr%, N%, -1
  
        REM Draw the depth-sorted balls:
        FOR I% = 0 TO N%-1
          J% = list{(I%)}.key%
          SYS Plot, g{}, ball%, 48, 48, list{(J%)}.x2#-16, list{(J%)}.y2#-16
        NEXT I%
  
        PROCDisplay( TRUE )
  
        frames% += 1
  
        SYS GetTickCount TO time1%
        IF time1%-time0%>=1000 THEN
          SYS GetTickCount TO time0%
          SYS "SetWindowText", @hwnd%, STR$frames% + " fps"
          frames% = 0
        ENDIF
  
        REM Bump rotation angles:
        t1 += 0.02001
        t2 += 0.01905
        t3 += 0.00598
      UNTIL FALSE

      DEF PROCFixWndSz
      LOCAL W%
      SYS"GetWindowLong",@hwnd%,-16 TO W%
      SYS"SetWindowLong",@hwnd%,-16,W% ANDNOT&40000 ANDNOT&10000
      ENDPROC

      DEF PROCerror
      OSCLI "REFRESH ON" : CLS : ON : PRINT '" ";
      REPORT : PRINT " at line "; ERL;
      REPEAT UNTIL INKEY(1)=0
      ENDPROC
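
The C side of glib.dll is not shown in this thread, so the following is only a sketch of how a key-based Shell sort over the point list could look. It assumes a C struct that mirrors the BB4W structure above (six 64-bit floats followed by four 32-bit integers), that the sort orders the rotated z2 values, and that a negative direction argument means descending order. The names ExPoint, init_keys and shell_sort_keys_by_z are hypothetical and are not the actual glib.dll exports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the BB4W structure array list{()}:
   six doubles followed by four 32-bit integers (64 bytes in total). */
typedef struct {
    double  x, y, z;      /* original coordinates */
    double  x2, y2, z2;   /* rotated coordinates (assumed to be written by the rotate routine) */
    int32_t key;          /* index of the point to draw at this position */
    int32_t d0, d1, d2;   /* spare fields, unused in this sketch */
} ExPoint;

/* Set key[i] = i so the sort can permute indices instead of moving whole records. */
void init_keys(ExPoint *list, int32_t n)
{
    assert(list != NULL || n <= 0);
    for (int32_t i = 0; i < n; i++)
        list[i].key = i;
}

/* Shell sort of the key fields by z2.
   dir < 0 sorts descending (largest z2 first), otherwise ascending.
   Only the key fields are rewritten; the coordinates stay in place. */
void shell_sort_keys_by_z(ExPoint *list, int32_t n, int32_t dir)
{
    assert(list != NULL || n <= 0);
    for (int32_t gap = n / 2; gap > 0; gap /= 2) {
        for (int32_t i = gap; i < n; i++) {
            int32_t k  = list[i].key;      /* key being inserted */
            double  zk = list[k].z2;
            int32_t j  = i;
            while (j >= gap) {
                double zj = list[list[j - gap].key].z2;
                int out_of_order = (dir < 0) ? (zj < zk) : (zj > zk);
                if (!out_of_order)
                    break;
                list[j].key = list[j - gap].key;   /* shift the earlier key along */
                j -= gap;
            }
            list[j].key = k;
        }
    }
}
```

Sorting indices rather than records matches the way the BB4W drawing loop reads list{(I%)}.key% and then plots list{(J%)}: the coordinates never move, only the draw order changes.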

« Last Edit: Aug 17th, 2014, 9:09pm by David Williams »

rtr
Guest

Re: Letting GCC do the hard work
« Reply #8 on: Aug 17th, 2014, 10:32am »

on Aug 17th, 2014, 08:13am, David Williams wrote:

This kind of performance is convincing me that hybridizing BB4W and C/C++ code is the way to go (for me personally).

Performance (if by that you mean speed) is not a good reason to go down the GCC route. Even though the code generators in modern compilers are very good, you will almost always be able to do better with hand-crafted assembler.

That is especially true when you are not targeting a particular CPU architecture, but want the code to run on a wide range of machines. In that case much of the clever code-optimising for a specific architecture, that GCC can do very well, will benefit some machines at the expense of others.

If you are compiling with the -march=native switch and then testing your code on the same machine you are getting a misleading impression of performance (unless of course you want to go down the route of including machine code for a range of different architectures and choosing the best one at run time).
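
For anyone curious what "choosing the best one at run time" might look like in C, here is a minimal sketch. It assumes GCC or Clang on x86, where the `target` function attribute and the `__builtin_cpu_supports` built-in are available; on other compilers or CPUs it falls back to the plain version. The routine being dispatched (scaling an array of floats) and all the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline version: plain C, compiled for whatever the default target is. */
static void scale_generic(float *v, size_t n, float s)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= s;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
/* Same loop, but the compiler is allowed to use AVX2 in this function only. */
__attribute__((target("avx2")))
static void scale_avx2(float *v, size_t n, float s)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= s;
}
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

typedef void (*scale_fn)(float *, size_t, float);

/* Pick an implementation based on what the running CPU reports. */
scale_fn pick_scale(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return scale_avx2;
#endif
    return scale_generic;
}

/* Public entry point: resolves the implementation on first use. */
void scale(float *v, size_t n, float s)
{
    static scale_fn impl = NULL;
    assert(v != NULL || n == 0);
    if (impl == NULL)
        impl = pick_scale();
    impl(v, n, s);
}
```

The rest of the DLL is then built without -march=native, so only the functions explicitly marked with a target attribute can contain instructions that older machines lack, and those are only called after the CPU check.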

Where using C does admittedly have advantages is in speed and ease of coding, and especially in time taken debugging. If those are the issues that most concern you, then fine.

Richard.


David Williams
Developer

Re: Letting GCC do the hard work
« Reply #9 on: Aug 17th, 2014, 11:38am »

on Aug 17th, 2014, 10:32am, Richard Russell wrote:

If you are compiling with the -march=native switch and then testing your code on the same machine you are getting a misleading impression of performance [...]

Yes, I did use that switch, and after uploading the EXE for public consumption I discovered that it crashed my 32-bit XP-based laptop (which I hardly use now!). I suspect my use of -march=native caused GCC to generate 64-bit code since the laptop the code was compiled on is a 64-bit machine. Lesson learned.

Re-compiling the vector balls demo without the aforementioned switch results in the code working on the 32-bit laptop, although the frame rate isn't as high as on the compilation machine (which is a little faster anyway, I think).
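
One way to check what a particular build actually targets is to look at the compiler's predefined macros. The sketch below assumes GCC or Clang, which define `__SSE2__`, `__AVX__` and `__AVX2__` when those instruction sets are enabled for the build; the function names are hypothetical. As far as I know, -march=native selects instruction-set extensions for the build machine's CPU rather than switching between 32-bit and 64-bit output (that is what -m32 and -m64 are for), so a crash on an older CPU is more likely to come from instructions it does not support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Pointer width of this build: 32 for a 32-bit DLL/EXE, 64 for a 64-bit one. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * 8);
}

/* Write a one-line summary of the build target into buf.
   Returns the value from snprintf (the length of the full summary). */
int describe_build(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    const char *sse2 = "no", *avx = "no", *avx2 = "no";
#ifdef __SSE2__
    sse2 = "yes";   /* compiler may emit SSE2 anywhere in this build */
#endif
#ifdef __AVX__
    avx = "yes";    /* compiler may emit AVX anywhere in this build */
#endif
#ifdef __AVX2__
    avx2 = "yes";   /* compiler may emit AVX2 anywhere in this build */
#endif
    return snprintf(buf, len, "%d-bit build; SSE2=%s AVX=%s AVX2=%s",
                    pointer_bits(), sse2, avx, avx2);
}
```

Compiling this once with and once without -march=native and comparing the two summaries shows which extensions the switch turned on for a given build machine.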

Quote:

Where using C does admittedly have advantages is in speed and ease of coding, and especially in time taken debugging. If those are the issues that most concern you, then fine.

For the vast majority of my 'applications', the speed of GCC's generated ASM code suffices (and in some cases, has exceeded the speed of my hand-written ASM code, which isn't too surprising!). I won't be touching -- or rather, writing -- assembler code again unless my life depends on it.

David.
--
