BBC BASIC for Windows
General >> General Board >> MultiThreaded BB4W
http://bb4w.conforums.com/index.cgi?board=general&action=display&num=1273980315

MultiThreaded BB4W
Post by Michael Hutton on May 16th, 2010, 03:25am

Hi Richard,

I hope I find you well. I have been fairly quiet for the last few months, but passed my exams, which I am sure will delight you as it means I have more time on my hands now to annoy you! ;)

Have been thinking/dreaming this end of a way to make BB4W spawn different threads to run BASIC code *in the same process*.

I was thinking, but understand my ideas may be way short of the mark.

Already known and can do : Spawn different asembler threads to do some work and return results.

Prehaps introduce a new keyword or function which will create a thread and execute a PROC/FN in BASIC and returns it's answer. For example :

Code:
Result% = FN_Thread_AddTwoNumbers(A%, B%) 


or a new token/keyword

Code:
R% = THREAD(^FN_Calculate) 


I am thinking you would need to "invoke" the interpreter by the thread and then give it an address to start executing the code...

I can already do this in assembler but I am thinking it would be an amazing thing for BB4W to be able to do. I am imagining multithreaded BASIC code.

I think a lot of people would be interested, but I would like to hear your thoughts.

Michael
Re: MultiThreaded BB4W
Post by admin on May 16th, 2010, 10:48am

on May 16th, 2010, 03:25am, Michael Hutton wrote:
I think a lot of people would be interested, but I would like to hear your thoughts.

I tried doing it myself some time ago. I made some progress, but got stuck on some fundamental issues.

One of these was the need for each 'thread' to have a separate stack. Normally that isn't a major issue, when the stack is relatively small and holds only a few return addresses and 'pushed' values (typically if you create an assembler thread you might allocate a couple of Kbytes for the stack). However BBC BASIC uses the stack for much more than that. In particular, all nesting structures use the stack (so the stack grows arbitrarily deep) and LOCAL arrays, structures and DIMmed memory also reside on the stack.

A related issue is that BBC BASIC notices that it has run out of memory when the bottom of the stack approaches the top of the heap. That's fine when there is just one stack, but what happens when each 'thread' has its own stack but there is only one shared heap? Only one of the stacks can 'approach' the heap, the others must be separate. How, then, can a thread determine when all its stack space is used up?

Another fundamental problem is that different BB4W 'threads' would in fact all have to run in the same CPU thread (!) because the code of BB4W is not 'thread safe'. Therefore although they could multi-task, they couldn't take any advantage of multiple CPUs or cores!

I concluded that rather than having separate 'threads' it would be easier to have separate processes, each spawned using BBCWRUN.EXE. Of course they then don't share the same heap, and one has to use IPC (InterProcess Communication), but at least it's a workable solution and advantage can be taken of multiple cores/CPUs.

Richard.

Re: MultiThreaded BB4W
Post by Michael Hutton on May 16th, 2010, 1:27pm

Ah. I see what you are getting at, what a shame. Oh well, I can continue with using ASM code for different threads, but was wondering if it would have been possible.

Thanks for the reply.

Michael
Re: MultiThreaded BB4W
Post by admin on May 17th, 2010, 08:33am

on May 16th, 2010, 1:27pm, Michael Hutton wrote:
Oh well, I can continue with using ASM code for different threads, but was wondering if it would have been possible.

Would you not consider using the 'multi process' approach I suggested, if it could be made suffciently user-friendly? The way I'd envisage it working would be as a library including the function:

Code:
      FN_spawn(PROCcode, shared{}) 

which would run the procedure PROCcode (in another process) and pass to it the structure shared{}. The structure would automatically be placed in shared memory, so both the calling program and the spawned procedure can access its members concurrently. The function would return an identifier of the child process (e.g. to allow it to be killed).

To the user this would appear little different from a spawned thread, the main consideration being that only the variables in shared{} can be accessed by the procedure, rather than all global variables. That's not necessarily a bad thing because it imposes some extra discipline on the communication between the concurrent tasks.

Of course the spawned procedure cannot output to the main window, but again that's no bad thing. It could, with care, still output to child controls if their window handles were passed in the shared structure.

I reckon that this approach could be made to work nicely, with the user not having to worry too much about the fact that it's actually a separate process, rather than just a separate thread, being invoked.

What do you think?

Richard.
Re: MultiThreaded BB4W
Post by DDRM on May 17th, 2010, 09:16am

Hi Richard,

That sounds like an excellent solution: presumably the structure could contain, for example, an array full of data. Presumably an element of the structure could be used to allow coordination of the two processes (i.e. as a marker of how far the subprocess had got) - I assume the return would only occur once the procedure completed.

Presumably it would be the programmer's responsibility to avoid clashes with the shared data - I guess one thread should regard any given element as read-only - or could you provide protection for that transparently?

Could the structure contain an array which was actually being used as a DIBsection to map to the main window? that might allow parallel processing of images, with direct output to the main window.

If I have understood correctly (quite unlikely), a new process would get a new "core" if one was available - what would happen if not? would you just get a new thread but have to share a core that was already in use? How would the processor decide which one to use?

Probably silly ideas, but at least I am only annoying a few of you on the conforum!

smiley

David
Re: MultiThreaded BB4W
Post by admin on May 17th, 2010, 09:30am

on May 16th, 2010, 03:25am, Michael Hutton wrote:
For example :

Code:
Result% = FN_Thread_AddTwoNumbers(A%, B%) 

or a new token/keyword

Code:
R% = THREAD(^FN_Calculate) 

Further to my previous replies, I trust it's obvious to you that neither of the examples you gave actually benefit from using multiple threads (or processes)! In both cases the calling program waits for the result from the thread (Result% and R% respectively). Hence there's no concurrency and no advantage - indeed a disadvantage - of running a separate thread.

To be of any benefit you have to be able to get on with something else while the separate thread or process does its stuff. Therefore rather than the result being returned as if from a function (assuming the thread is performing some calculation) the calling program must be able to periodically 'poll' for the spawned thread having completed its task.

In my proposal that would involve one of the members of the shared{} structure being a semaphore indicating availability of the result in (probably) another member.

Richard.

Re: MultiThreaded BB4W
Post by admin on May 17th, 2010, 09:42am

on May 17th, 2010, 09:16am, DDRM wrote:
presumably the structure could contain, for example, an array full of data.

The intention is that it could contain anything, within reason.

Quote:
I assume the return would only occur once the procedure completed.

As I explained in my reply to Michael, if the calling program waits for the result from the spawned thread/process then there's no concurrency and no benefit. You might as well do it all in one thread.

Quote:
Presumably it would be the programmer's responsibility to avoid clashes with the shared data - I guess one thread should regard any given element as read-only - or could you provide protection for that transparently?

When necessary the threads/processes would have to coordinate their accesses, e.g. using semaphores, but with simple values such as 32-bit integers or bytes it should not be necessary because the accesses are likely to be uninterruptible anyway.

Quote:
Could the structure contain an array which was actually being used as a DIBsection to map to the main window?

A DIBSection is allocated by Windows, so you don't have any control over where in memory it is located. You can't 'force' it to be at a specific address. You could probably do what you want using a regular DIB.

Quote:
If I have understood correctly (quite unlikely), a new process would get a new "core" if one was available - what would happen if not? would you just get a new thread but have to share a core that was already in use? How would the processor decide which one to use?

This is what happens all the time in regular single-core single-CPU Windows, so it's nothing out of the ordinary.

Richard.
Re: MultiThreaded BB4W
Post by Michael Hutton on May 22nd, 2010, 02:15am

on May 17th, 2010, 08:33am, Richard Russell wrote:
Would you not consider using the 'multi process' approach I suggested, if it could be made suffciently user-friendly? The way I'd envisage it working would be as a library including the function:

Code:
      FN_spawn(PROCcode, shared{}) 

which would run the procedure PROCcode (in another process) and pass to it the structure shared{}. The structure would automatically be placed in shared memory, so both the calling program and the spawned procedure can access its members concurrently. The function would return an identifier of the child process (e.g. to allow it to be killed).


Definitely an approach which would be worthwhile. I have previously used the Shellexecute to spawn another BB4W to perform a fixed task and return results in a file, not exactly programmable!

As you say the 'limitations' of only having access to shared variables and shared memory are a good thing. Having access to Globals could make things very messy. What would happen if someone sets up an interrupt on the spawned process to change Globals in the main program....?

The only other limitation I can see is the overhead of creating the process, but then I suppose it would stay running in memory using the SemaphoreObjects to signal its current working state..working or waiting for data...

I see what you are getting at with the code I originally posted, but I had in my mind a function that returns immediately without waiting, it probably wasn't the best way to get the idea across.
Code:
R% = THREAD(^FN_Calculate) 
should be
Code:
THREAD(^FN_Calculate, Result%)
 
where THREAD is a new Keyword.

but anyway, it is irrelevant.

I think the multi-process idea is definitely the way to go and could be made very user friendly. It would be another feather in the cap of BB4W.

Michael

Re: MultiThreaded BB4W
Post by admin on May 22nd, 2010, 11:52am

on May 22nd, 2010, 02:15am, Michael Hutton wrote:
The only other limitation I can see is the overhead of creating the process, but then I suppose it would stay running in memory using the SemaphoreObjects to signal its current working state..working or waiting for data...

Indeed. Not only is launching a process a little slow, but FN_spawn also takes some time building the shared memory. So, as you say, rather than call FN_spawn multiple times it would be better to call it just once and use semaphores to control when the spawned procedure runs.

Quote:
I think the multi-process idea is definitely the way to go and could be made very user friendly. It would be another feather in the cap of BB4W.

I've got it going. One limitation I didn't previously mention is that the shared structure cannot include strings or substructures. Although it would be theoretically possible to map them into the shared memory, the complication would be out of all proportion to the benefit. If you do want to share a string, use a byte array to hold it.

Here's a simple program I used to test it:

Code:
      INSTALL @lib$+"SPAWNLIB"
      
      DIM mystruct{one,two%,three&,four(3),five%(1,2),six&(9),flag%}
      mystruct.one = 1.0
      mystruct.two% = 2
      mystruct.three& = 3
      mystruct.four(1) = 4.0
      mystruct.five%(0,2) = 5
      mystruct.six&(4) = 6
      
      PRINT mystruct.one, mystruct.two%, mystruct.three&, mystruct.four(1), ;
      PRINT mystruct.five%(0,2), mystruct.six&(4)
      
      spawn% = FN_spawn(PROCspawned(), mystruct{})
      REPEAT WAIT 0 : UNTIL mystruct.flag%
      
      PRINT mystruct.one, mystruct.two%, mystruct.three&, mystruct.four(1), ;
      PRINT mystruct.five%(0,2), mystruct.six&(4)
      
      PROC_unspawn(spawn%, mystruct{})
      END
      
      DEF PROCspawned(shared{})
      shared.one += 1.0
      shared.two% += 2
      shared.three& += 3
      shared.four(1) += 4.0
      shared.five%(0,2) += 5
      shared.six&(4) += 6
      
      shared.flag% = 1
      ENDPROC 

Richard.
Re: MultiThreaded BB4W
Post by Michael Hutton on May 23rd, 2010, 02:25am

eek, just posted a reply but dowsn't seem to have logged... hmm.

Anyway, again:

That was quick! Thanks.

A few questions:

1. Is spawn% the returned process handle (I am assuming yes)?

2. Does the mystruct.flag% always have to be defined in the shared structure (and will its position always be fiixed at the end)? Also, I am assuming yes...

3. Maybe add a helper function to SPAWNLIB to return the number of processors.

Code:
      DEF FN_SPAWNLIB_NumberOfProcessors
      LOCAL _SYSTEM_INFO{}
      REM Get Computer Information
      DIM _SYSTEM_INFO{ wProcessorArchitecture%,      \ Only lower word if valid: high word is wReserved - was dwOemID
      \                 dwPageSize%,                  \
      \                 lpMinimumApplicationAddress%, \
      \                 lpMaximumApplicationAddress%, \
      \                 dwActiveProcessorMask%,       \
      \                 dwNumberOfProcessors%,        \
      \                 dwProcessorType%,             \
      \                 dwAllocationGranularity%,     \
      \                 wProcessorLevel{l&,h&},       \
      \                 wProcessorRevision{l&,h&}    }
      SYS "GetSystemInfo", _SYSTEM_INFO{}
      =_SYSTEM_INFO.dwNumberOfProcessors%
 



er, I think that was it... I hope my first post doesn't appear suddenly from the ether...

Michael
Re: MultiThreaded BB4W
Post by admin on May 23rd, 2010, 09:38am

on May 23rd, 2010, 02:25am, Michael Hutton wrote:
1. Is spawn% the returned process handle (I am assuming yes)?

Seems a strange assumption, given that FN_spawn has no reason to know what the process handle is and doesn't discover it (the remote process is run using ShellExecute, for convenience). But, just to confirm, the returned value is not the process handle.

Quote:
2. Does the mystruct.flag% always have to be defined in the shared structure

I don't think you understand the purpose of the flag, otherwise you wouldn't ask the question! To clarify: the flag is used (in the test program) to synchronise the two processes, so of course it must be shared between them.

Quote:
(and will its position always be fiixed at the end)? Also, I am assuming yes...

Again, your assumption is completely wrong (what is the purpose of asking a question but adding that you've already assumed what the answer is?). The flag is just a shared user variable, specific to the test program I listed. It doesn't have any 'internal' meaning to FN_spawn or SPAWNLIB in general. So the flag can go anywhere, or there may be no flag at all, or there may be multiple flags depending on the application.

Quote:
3. Maybe add a helper function to SPAWNLIB to return the number of processors.

I can't really see the point, because presumably one would spawn another process (or not) irrespective of the number of processors. It had never occurred to me that one might structure a program differently depending on the number of cores; I'd be very surprised if the benefit would ever outweigh the complication.

Richard.

Re: MultiThreaded BB4W
Post by DDRM on May 24th, 2010, 09:18am

on May 23rd, 2010, 09:38am, Richard Russell wrote:
(re possibility of helper function to detect the number of (available?) cores)

I can't really see the point, because presumably one would spawn another process (or not) irrespective of the number of processors. It had never occurred to me that one might structure a program differently depending on the number of cores; I'd be very surprised if the benefit would ever outweigh the complication.


Hi Richard,

I can - the sort of situation where I could see myself using this library would be as a "poor man's parallel computer" - for example, by doing parallel processing of images, etc. In that situation you could spawn as many new processes as available to, say, process one line of an image each, while the main programme checked how they were getting on, displayed them (or whatever), and gave them the next set of data.

If you only had a single core, you would simply run the procedures in-line, without the need to spawn a new process. If there were multiple cores available you could send them each a data set and a copy of the procedure.

That shouldn't be a lot more complicated, but could easily be twice as quick: two cores processing, one plotting, and one left for playing chess while I wait!

D
Re: MultiThreaded BB4W
Post by admin on May 24th, 2010, 1:54pm

on May 24th, 2010, 09:18am, DDRM wrote:
If you only had a single core, you would simply run the procedures in-line

Why?

I suspect you've somehow got it into your head that you can only spawn extra processes if there are multiple cores, when of course Windows normally runs tens or hundreds of concurrent processes even if there's only a single CPU and a single core. Indeed I've only tested SPAWNLIB on a single-core machine, because that's all I own.

If there is only a single core, then it's true that "running the procedures in line" would be slightly quicker, because the overhead of spawning the extra processes and the necessary inter-process synchronisation would take some time, but I would expect it to be only a small proportion of the total time taken.

Compare the two cases:

1. You test how many cores there are, and depending on the result you either spawn additional processes or you don't. You might get a small performance improvement if there's only one core, but the penalty you pay is a more complicated program and (importantly) two completely different situations to test. You would have to test your program on both a single-core and a multi-core platform whenever any change is made.

2. You always spawn multiple processes, irrespective of the number of cores. You will pay a small price in performance if there's only one core (for the reasons stated above) but you will have just one program to test, which you can test on a single-core or multi-core platform with reasonable confidence it will work OK on both.

So I remain of the opinion that testing the number of cores and acting differently depending on the result is complicated, risky and likely to be of little benefit.

Richard.
Re: MultiThreaded BB4W
Post by DDRM on May 24th, 2010, 3:54pm

Hi Richard,

That probably reflects how efficiently you could write your "sharing" data etc, compared to how I would do it! smiley

I quite recognise that Windows can happily handle lots of threads on a single core, and you may well be right that it isn't worth the effort of checking - that simply always using multiple threads is almost as quick, if you are careful about how you set things up.

As for the testing question, I'm not sure that the differences would be as great as you suggest: could you not use the same procedures in either case, but simply run multiple copies if you have multiple threads running? Maybe I am misunderstanding how things will work.

As a concrete example, consider a Mandelbrot programme: I could have PROCcalcline that take an x range and a y value, a spacing, and an array for the results, and PROCplot that bangs them out to the display.

If I have a single core, I simply alternate between them.

If I have three, I can have two of them running PROCcalcline, and the third one monitoring the data and plotting it when it is ready, in the meantime giving that thread another set of data to work on.

I suspect, but do not know, that the latter scheme would run significantly less efficiently on a single core than the first.

The proof will be in the pudding when the library appears...

Best wishes,

D
Re: MultiThreaded BB4W
Post by admin on May 24th, 2010, 5:43pm

on May 24th, 2010, 3:54pm, DDRM wrote:
Could you not use the same procedures in either case, but simply run multiple copies if you have multiple threads running?

If you have a program which does different things according to the number of cores, are you seriously suggesting that you need test only one of the cases? In other words, are you proposing that you test only the multi-core situation, and somehow structure the code so that the single-core case is bound to work, although untested?

Irrespective of testing, there is bound to be extra work in writing code that works correctly in the two different ways (concurrent and alternate).

Quote:
I suspect, but do not know, that the latter scheme would run significantly less efficiently on a single core than the first.

Why do you suspect that? As previously discussed in this thread, you will want the spawning of the other process(es) to happen only once, because there's a significant overhead for that. So, for a relatively slow operation such as plotting a Mandelbrot set, the once-off spawning will take an insignificant time.

That leaves us with the overhead of process switching and synchronisation. I find it hard to see why that would necessarily be significant. Can you explain?

Quote:
The proof will be in the pudding when the library appears...

I'm pretty happy with the library, but as all my PCs have only a single CPU/core I can't test it as thoroughly as I would like. I don't have any immediate plans to release it.

Richard.

Re: MultiThreaded BB4W
Post by DDRM on May 25th, 2010, 08:35am

on May 24th, 2010, 5:43pm, Richard Russell wrote:
If you have a program which does different things according to the number of cores, are you seriously suggesting that you need test only one of the cases? In other words, are you proposing that you test only the multi-core situation, and somehow structure the code so that the single-core case is bound to work, although untested?


No, but I would expect it to work fairly straightforwardly, if most of the code is the same! My main machine is a quad-core AMD, while my laptop is a single core Intel, so I could test it in both situations. Actually, allowing testing for cores makes this problem easier, since, by disabling the test for cores and therefore NOT starting new threads, it should behave the same in a multicore machine as it would on a single-core one, shouldn't it? What you couldn't do is test the multi-core behaviour rigorously on a single-core machine, since it (shouldn't) spawn the new threads.

Quote:
Why do you suspect that? ...(it would be slower running multiple threads on a single core)... Can you explain?


Largely because you would be forcing the processor to switch between the threads you had started, giving each a bit of processor time, then interrupting, and giving the next a go. In contrast, by running only a single copy, all the processor time allowed to my programme would be used as efficiently as possible. I don't have a feel for what the overheads involved in that are.

Quote:
I'm pretty happy with the library, but as all my PCs have only a single CPU/core I can't test it as thoroughly as I would like. I don't have any immediate plans to release it.


I'm sure you wouldn't consider my "testing" rigorous enough, but if you would like me to RUN some tests on a multi-core machine and report the results, I will happily do so.

Best wishes,

D
Re: MultiThreaded BB4W
Post by admin on May 25th, 2010, 11:29am

on May 25th, 2010, 08:35am, DDRM wrote:
Largely because you would be forcing the processor to switch between the threads you had started, giving each a bit of processor time, then interrupting, and giving the next a go. In contrast, by running only a single copy, all the processor time allowed to my programme would be used as efficiently as possible.

You are here simply restating the obvious, that switching processes takes a finite time. What you have failed to do is provide a plausible argument that the overhead is significant, and outweighs the clear advantages of my suggested approach (spawning multiple processes irrespective of the number of cores).

Context switches in Windows are fast, because typically there are large numbers of them taking place all the time, to service the many background tasks.

Quote:
I'm sure you wouldn't consider my "testing" rigorous enough, but if you would like me to RUN some tests on a multi-core machine and report the results, I will happily do so.

I will see if I can find a suitable program (maybe the Mandelbrot plot you suggested) that I could use as a test bed. If I can, I'll send you something to try on your multi-core machine.

Richard.

Re: MultiThreaded BB4W
Post by admin on May 26th, 2010, 5:29pm

I seem to have reached rather a dead-end with SPAWNLIB. It works (sort of) but there are two quite serious problems that I can find no way around:

1. Programs using it must be Run As Administrator on Vista and Windows 7. This is due to the way shared memory is handled on those systems, and I don't know of a workaround.

2. Large numbers of handles are leaked. This appears to be due to ShellExecute, which on its own seems to leak handles. I don't know how to prevent it.

So unless solutions to these problems are found, I will archive SPAWNLIB as an interesting, but failed, experiment.

Richard.
Re: MultiThreaded BB4W
Post by Michael Hutton on May 27th, 2010, 03:19am

on May 26th, 2010, 5:29pm, Richard Russell wrote:
1. Programs using it must be Run As Administrator on Vista and Windows 7. This is due to the way shared memory is handled on those systems, and I don't know of a workaround.


Would ShellExecuteEx using SEE_MASK_NOZONECHECKS be a solution? I haven't tested this but I hope this doesn't look at time zones instead, and I don't know how this effects Shared Memory created with the File Mapping functions.


Quote:
2. Large numbers of handles are leaked. This appears to be due to ShellExecute, which on its own seems to leak handles. I don't know how to prevent it.


Also wouldn't using ShellExecuteEx give a way to return the handle to the process invoked?

Quote:
So unless solutions to these problems are found, I will archive SPAWNLIB as an interesting, but failed, experiment.


I ask for my own enlightenment rather that assuming you hadn't thought of these...

Michael


Re: MultiThreaded BB4W
Post by admin on May 27th, 2010, 08:33am

on May 27th, 2010, 03:19am, Michael Hutton wrote:
Would ShellExecuteEx using SEE_MASK_NOZONECHECKS be a solution?

What makes you think it might? The way Vista/7 handle shared memory isn't a function of how the process is launched (you can use the lower-level CreateProcess and the behaviour is the same).

Quote:
Also wouldn't using ShellExecuteEx give a way to return the handle to the process invoked?

As I said earlier in the thread, SPAWNLIB doesn't need to know the process handle. As you will appreciate, the process handle has nothing whatever to do with the handle leaks I mentioned; it is closed automatically when the process terminates.

Quote:
I ask for my own enlightenment rather that assuming you hadn't thought of these...

It's not so much that I hadn't thought of them, I don't see their relevance to the issues I mentioned (requiring to be Run As Administrator under Windows/7 and handle leaks from ShellExecute).

Richard.