BBC BASIC for Windows
http://bb4w.conforums.com/index.cgi?board=ui&action=display&num=1387532103

UTF-8 editor
Post by Ken Down on Jun 5th, 2010, 04:34am

Richard Russell's simple text editor is very useful, but I cannot get it to accept or output UTF-8 text. I've put in the recommended code to tell it to use UTF-8, but any "funny" characters just come out on screen as ||||| and are saved likewise.

Any ideas, please?

Re: UTF-8 editor
Post by admin on Jun 5th, 2010, 09:20am

on Jun 5th, 2010, 04:34am, Guest-Ken Down wrote:
Richard Russell's simple text editor is very useful, but I cannot get it to accept or output UTF-8 text.

Are we talking about TEXTEDIT.BBC (or something else based on a Windows Edit Control)? If so, Windows Edit Controls do not directly support UTF-8 encoding. What you will have to do is to configure the Edit Control as Unicode (UTF-16, or more precisely UCS-2 encoding) then use MultiByteToWideChar and WideCharToMultiByte respectively to convert UTF-8 to UCS-2 and UCS-2 to UTF-8. It's not too difficult.

To convert the Edit Control to Unicode see this Wiki article, but use the "RichEdit20W" class rather than "RichEdit20A":

http://bb4w.wikispaces.com/Using+Rich+Edit+controls

Do be careful when allocating buffers to ensure they are big enough (UCS-2 requires two bytes per character).

Richard.

Re: UTF-8 editor
Post by Ken Down on Jun 6th, 2010, 4:10pm

It's the TEXTEDIT.BBC example program, not an edit box. I presume, therefore, that the detailed instructions you have given do not apply.

Re: UTF-8 editor
Post by admin on Jun 6th, 2010, 4:54pm

on Jun 6th, 2010, 4:10pm, Guest-Ken Down wrote:
It's the TEXTEDIT.BBC example program, not an edit box. I presume, therefore, that the detailed instructions you have given do not apply.

TEXTEDIT.BBC does use a Windows Edit Control (an "edit box", if you prefer):

Code:
Hedit% = FN_createwindow("EDIT", "", 0, 0, @vdu%!208, @vdu%!212, 0, &200044, 0) 

Therefore the instructions I gave apply in full.

Richard.

Re: UTF-8 editor
Post by Ken Down on Jun 6th, 2010, 8:23pm

Oh, OK. I didn't realise that a window was an edit box.
I'll try working it out.
Thanks.

Re: UTF-8 editor
Post by Ken Down on Jun 27th, 2010, 07:40am

Hmmmm. I finally got around to trying this out. I loaded the example program, TEXTEDIT.BBC. I copied the section from the instructions for Rich Edit controls which begins
SYS "LoadLibrary", "RICHED20.DLL"
and ends
SCF_ALL = 4
and put it just before the call to FN_createwindow.

I then altered the call to FN_createwindow:
Hedit% = FN_createwindow("RichEdit20W","",0,0,@vdu%!208,@vdu%!212,0,WS_BORDER,0)

When I ran the program the edit box appeared with a nice border around it. It accepted and displayed correctly some Hebrew characters, but it would not accept RETURN, just beeped when I pressed that key.

I returned the style parameter to &200044 and the border disappeared but it would now accept RETURN.

However, when I saved what I had entered the Hebrew characters appeared as ???? (which isn't much of an improvement).

I have tried putting VDU23,22,800;600;8,16,16,8+128 (which is supposed to set the font to UTF-8) right at the start of the program, but it makes no difference.

I presume there is something simple which I have overlooked, but I can't see what it might be. Any help gratefully accepted.

Re: UTF-8 editor
Post by admin on Jun 27th, 2010, 09:19am

on Jun 27th, 2010, 07:40am, Guest-Ken Down wrote:
it would not accept RETURN, just beeped when I pressed that key.

Check your style values. You probably need to include ES_MULTILINE (4) and possibly ES_WANTRETURN (&1000). See the list of RichEdit styles here:

http://msdn.microsoft.com/en-us/library/bb774367.aspx
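
As a rough sketch only, the style value could be built from named constants so that it keeps the bits of the original &200044 (vertical scroll bar, multiline, auto vertical scroll) and adds the border and ES_WANTRETURN. The numeric values are the standard Windows ones; the variable name style% is hypothetical, and whether ES_WANTRETURN is actually needed here is an assumption to be checked by experiment:

Code:
      REM Standard Windows style bits (values from the Windows headers)
      WS_BORDER = &800000
      WS_VSCROLL = &200000
      ES_MULTILINE = 4
      ES_AUTOVSCROLL = &40
      ES_WANTRETURN = &1000
      REM WS_VSCROLL + ES_AUTOVSCROLL + ES_MULTILINE is the original &200044
      style% = WS_BORDER OR WS_VSCROLL OR ES_MULTILINE OR ES_AUTOVSCROLL OR ES_WANTRETURN
      Hedit% = FN_createwindow("RichEdit20W", "", 0, 0, @vdu%!208, @vdu%!212, 0, style%, 0)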

Quote:
However when I saved what I had entered the Hebrew characters appeared as ???? (which isn't much of an improvement).

It's a little hard to comment without seeing your code. I explained before that you would need to use SYS "WideCharToMultiByte" to convert the UCS-2 text returned from the RichEdit control to the UTF-8 text that you want to save to file. You must also use SYS "SendMessageW" (rather than the regular SYS "SendMessage") to get the UCS-2 data in the first place. My guess would be that you've used one or other of those calls incorrectly.

This is what I would expect your code to look like (or something very similar):

Code:
      DEF FNsaveas : LOCAL F%, L%, N%, U%
      SYS "GetSaveFileName", fs{} TO F%
      IF F% PROCtitle ELSE = FALSE
      DEF FNsave : LOCAL F%, L%, N%, U% : IF ?Fn% = 0 THEN = FNsaveas
      SYS "SendMessageW", Hedit%, WM_GETTEXTLENGTH, 0, 0 TO L%
      SYS "GlobalAlloc", 0, 2*(L%+1) TO F%
      SYS "SendMessageW", Hedit%, WM_GETTEXT, L%+1, F%
      SYS "WideCharToMultiByte", CP_UTF8, 0, F%, L%, 0, 0, 0, 0 TO N%
      SYS "GlobalAlloc", 0, N% TO U%
      SYS "WideCharToMultiByte", CP_UTF8, 0, F%, L%, U%, N%, 0, 0
      SYS "GlobalFree", F%
      OSCLI "SAVE """+$$Fn%+""" "+STR$~U%+"+"+STR$~N%
      SYS "GlobalFree", U%
      = TRUE 
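
The routine above relies on three constants. If the program does not already define them (an assumption worth checking), a minimal sketch of the definitions, using the standard Windows values, would be:

Code:
      REM Window messages used to read the text back from the control
      WM_GETTEXT = &D
      WM_GETTEXTLENGTH = &E
      REM Code page identifier for UTF-8 (65001 decimal)
      CP_UTF8 = &FDE9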

Quote:
I have tried putting VDU23,22,800;600;8,16,16,8+128 (which is supposed to set the font to UTF-8) right at the start of the program, but it makes no difference.

As I've explained before, TEXTEDIT.BBC does not use BBC BASIC's VDU emulator for its output, therefore that command is irrelevant and will have no effect.

Richard.

Re: UTF-8 editor
Post by Ken Down on Jun 28th, 2010, 9:18pm

OK, I'll play around with that over the next few days. Thanks for your patience and expertise.

Re: UTF-8 editor
Post by Ken Down on Jul 5th, 2010, 5:05pm

I presume that when loading a file back in again, I would need to use the opposite call, "MultiByteToWideChar"? Would the parameters be the same as in the two calls to "WideCharToMultiByte" in the save routine?

Re: UTF-8 editor
Post by admin on Jul 5th, 2010, 6:06pm

on Jul 5th, 2010, 5:05pm, Guest-Ken Down wrote:
I presume that when loading a file back in again, I would need to use the opposite call, "MultiByteToWideChar"?

That's correct.

Quote:
Would the parameters be the same as in the two calls to "WideCharToMultiByte" in the save routine?

MultiByteToWideChar has fewer parameters (six rather than eight). Look it up in your preferred Windows API Reference. APIViewer will give you the declaration in BBC BASIC syntax, but not tell you what the parameters mean (I don't advise guessing)!

See Frequently Asked Question #8:

http://www.bbcbasic.co.uk/bbcwin/faq.html#q8
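
For illustration only, a load routine might mirror the save routine along the following lines. This is a sketch, not code from TEXTEDIT.BBC: the procedure name PROCloadutf8 and its file$ parameter are hypothetical, Hedit% is assumed to be the Unicode RichEdit control, and the file is assumed to hold valid UTF-8 with no byte-order mark. MultiByteToWideChar is called twice, first with a zero output size to find how many UCS-2 characters are needed, then again to do the conversion:

Code:
      DEF PROCloadutf8(file$) : LOCAL F%, L%, N%, U%, W%
      CP_UTF8 = &FDE9 : REM UTF-8 code page (65001)
      WM_SETTEXT = &C
      REM Find the file length in bytes
      F% = OPENIN(file$) : IF F% = 0 THEN ENDPROC
      L% = EXT#F% : CLOSE #F%
      IF L% = 0 THEN ENDPROC
      REM Read the raw UTF-8 bytes into a buffer of exactly that size
      SYS "GlobalAlloc", 0, L% TO U%
      OSCLI "LOAD """+file$+""" "+STR$~U%
      REM First pass: ask how many UCS-2 characters are required
      SYS "MultiByteToWideChar", CP_UTF8, 0, U%, L%, 0, 0 TO N%
      REM Two bytes per character, plus room for a two-byte terminator
      SYS "GlobalAlloc", 0, 2*(N%+1) TO W%
      REM Second pass: do the conversion
      SYS "MultiByteToWideChar", CP_UTF8, 0, U%, L%, W%, N%
      W%?(2*N%) = 0 : W%?(2*N%+1) = 0
      SYS "SendMessageW", Hedit%, WM_SETTEXT, 0, W%
      SYS "GlobalFree", W%
      SYS "GlobalFree", U%
      ENDPROC

Because an explicit byte count is passed, the converted text is not null-terminated by the API, which is why the terminator is written by hand before WM_SETTEXT is sent.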

Richard.