[Ace-users] [tao-users] client configuration: ensuring UTF8 encoding

Wed Nov 7 11:44:00 CST 2007

Hi Vance,

Thanks for the PRF.

It turns out that due to an amazing coincidence, just yesterday I found 
out that TAO versions other than OCI TAO 1.4a do not process the UTF8 
codeset correctly.

The problem is that TAO ships with a file, 
ACE_wrappers/ace/Codeset_Registry_db.cpp, which carries the definitions of 
the recognized codeset values and happens to be lacking the definition of UTF8.

Since you are using an unsupported version of TAO, you need to generate 
the update yourself: in ACE_wrappers/apps/mkcsregdb, create a text file 
similar to the existing cs_test.txt file that contains all the codeset 
definitions you need, including Latin1, UTF-8, UTF-16, Unicode, etc. You 
can find all these in the large code_set_registry1.2g.txt file. You can 
actually generate a new Codeset Registry db file using 
code_set_registry1.2g.txt, but that will give you all known codesets, 
which you probably don't need.

Anyway, run mkcsregdb, and rebuild ACE.
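
For orientation, here is a rough sketch of the registered values for the codesets named above. The numeric ids are the OSF code set registry values as I understand them; the struct, table, and function names are hypothetical and are not the layout ACE uses in Codeset_Registry_db.cpp.

```cpp
#include <cstdint>
#include <string>

// Hypothetical lookup table for illustration only; this is NOT the
// structure that mkcsregdb generates for ACE.
struct CodesetEntry {
    std::uint32_t id;   // registered value from the OSF code set registry
    const char* name;   // short human-readable label
};

static const CodesetEntry kCodesets[] = {
    {0x00010001u, "ISO 8859-1 (Latin1)"},
    {0x00010100u, "UCS-2, Level 1"},
    {0x00010109u, "UTF-16"},
    {0x05010001u, "UTF-8"},
};

// Returns the label for a registered value, or "unknown" if the id is
// not in the table (which is roughly the situation for UTF-8 in the
// unpatched registry file).
std::string codeset_name(std::uint32_t id) {
    for (const CodesetEntry& e : kCodesets) {
        if (e.id == id) {
            return e.name;
        }
    }
    return "unknown";
}
```

Check the ids against code_set_registry1.2g.txt before relying on them.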

More below:

Vance Maverick wrote:
> Hello,
> 
> I'm using TAO 1.5.4 as the client ORB, and JacORB 2.3.0 as the server, on an
> FC6 Linux box.  I'd like to make sure UTF8 encoding is used for string
> transmission.  This is the default for JacORB (now that I've upgraded) --
> executing the debugging entry point org.jacorb.orb.giop.CodeSet, I see
> 
>  System file encoding property: UTF-8
>  Cannonical encoding: UTF8
>  Default WChar encoding: UTF16
> 
> (among other outputs, below).  I can safely send and receive a test string
> with Japanese characters from my Java client (also using JacORB).
> 
> My question is, how do I configure TAO to make sure it handles its end
> correctly? 

"Handles" is a vague term. Since you are passing Japanese characters, I 
suppose you want to use UTF-8 natively. There are two ways to do that, once 
you've fixed ACE as I indicate above. You can set an option in 
svc.conf, or you can hardwire it in the codeset manager class.

See http://ociweb.com/cnb/CORBANewsBrief-200209.html for more information.

> 
> Right now, I'm passing "-ORBNegotiateCodesets 1" to CORBA::ORB_init.  (And
> this did force me to link to libTAO_Codeset.)  

This is redundant. Unless specially built, TAO defaults to negotiating 
codesets.

> However, this is not giving
> the desired result -- when I send a string with Japanese characters, my Java
> code on the server side doesn't receive the right decoded (UTF16) value. 
> 

UTF16 is a wchar codeset. Is your interface using strings or wstrings as 
the argument type? If you are using string, then you will want to set TAO to 
use UTF-8 natively as I mentioned above.
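
To make the string/wstring distinction concrete, here is a minimal standalone sketch (no TAO involved) of how one Japanese character is laid out in each encoding. The function names are made up for this example, and the encoder assumes a valid Unicode scalar value without checking for surrogates or out-of-range input.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8 bytes (the char/string case).
// Assumes cp is a valid scalar value; no validation is done.
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one Unicode code point as UTF-16 code units (the wchar/wstring case).
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;  // split into a surrogate pair
        out += static_cast<char16_t>(0xD800 | (cp >> 10));
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
    }
    return out;
}
```

For U+65E5, to_utf8 yields the three bytes E6 97 A5, while to_utf16 yields the single 16-bit unit 0x65E5. If the two ORBs disagree about which char codeset is in use, those three bytes are decoded as something else on the receiving side.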

Finally, I expect to patch the Codeset Registry db file in the DOC 
group's TAO repository. Once that is done, you should be able to obtain 
a copy of that file, which will be compatible with your version of ACE, 
if you don't want to generate it yourself.

Regards,
Phil

-- 
Phil Mesnier
Principal Software Engineer,    http://www.ociweb.com
Object Computing, Inc.          +01.314.579.0066