TD 5.1 and ORACLE characterset problem critical

General discussion forum about all databases other than SqlBase.
Ralf.Niebling

TD 5.1 and ORACLE characterset problem critical

Post by Ralf.Niebling » 08 Nov 2007, 11:38

We have found a major Problem, as I described on the DEVCon in München.







Oracle have following behavior:



If you use the same Characterset on Client and Server (Listener and

Database) then ORACLE not convert any character.



At the end you save the Codepage from your Clientwindows in the Database.



Now with TD 5.1 we figure out that the TD 5.1 uses the Characterset from the

ORACLE Client not from the Windowsenviroment to convert the Data.



At default ORACLE uses the GERMAN_GERMANY.WE8ISO8859P1 as default. This

Characterset at least dont have any Euro character!!! So the Euro character

from the Database will not be converted correctly!!



But there is no way to change the characterset in ORACLE. It must be a way

to use the windows characterset for TD5.1.







Ralf



i have post this Message in the newsgroup also

Jean-Marc Gemperle

Re: TD 5.1 and ORACLE characterset problem critical

Post by Jean-Marc Gemperle » 14 Nov 2007, 18:14

Hi Ralf,



I will contact you to discuss about this issue. On 10g VISTA US machine the default for the ORACLE instance is WE8MSWIN1252. IF my NLSL_ANG is GERMAN_GERMANY.WE8ISO8859P1 then I have no issues with the EURO symbol. I guess this is as expected from your message. Also if both server and client are WE8MSWIN1252, I used NLS_LANG=American_America.WE8MSWIN1252 I also have no problem.

I need to test your issue after creating an instance that uses WE8ISO8859P1 charset.



Jean-Marc

Jean-Marc Gemperle

Re: TD 5.1 and ORACLE characterset problem critical

Post by Jean-Marc Gemperle » 14 Nov 2007, 19:22

Ok just got a WE8ISO8859P1 ORACLE instance installed and indeed no EURO symbol with it, this with all NLS_LANG bellow with nvarchar2 or varchar2. USING NATIVE OR OLEDB connectivity...



GERMAN_GERMANY.WE8ISO8859P1

GERMAN_GERMANY.UTF8

GERMAN_GERMANY.WE8MSWIN1252



I was curious about OLEDB results but in thise case it is consistent...



You say



"Now with TD 5.1 we figure out that the TD 5.1 uses the Characterset from the ORACLE Client not from the Windowsenviroment to convert the Data."



I will try to get an answer for that.... I wonder why TD would do this...the router maybe...but in case of OLEDB no router, no SQLBASE API in the way...



That makes me wonder how a unicode client application other than TD would behave with this issue.



Jean-Marc

Jean-Marc Gemperle

Re: TD 5.1 and ORACLE characterset problem critical

Post by Jean-Marc Gemperle » 15 Nov 2007, 16:43

Hi



I tested that further and will enter an issue and talk about this

Indeed if using

NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1 and the same charset on the ORACLE instance. Connecting with SQLTALK of the version 4.2 that is using the SBAPI of the version 9.01 returns correctly the EURO symbol that was inserted with SQLPLUS.



If using SQLTALK of TD 510 with the unicode SBAPI the EURO is lost.

With OLEDB in a TD application the same can be seen.



So I tested with MS ROWSETVIEWER and with it I also loose the EURO symbol. But I'm not sure if the version I'm using is unicode. I tested then with ODBCTEST ( there is a UNICODE version out there on internet) but selecting returned me no result at all, with rowsetviewer for a string containing some chars + the euro the symbol was lost with ?... Same if you use EXCEL 2007 via MSQUERY query!



So before telling you that maybe this is an issue with UNICODE in general, I've try then with SQLSERVER 2005 LINK server and there it works!



Anyway I'm not sure what is the problem but will enter an issue for this.



Cheers

JM

Jean-Marc Gemperle

Re: TD 5.1 and ORACLE characterset problem critical

Post by Jean-Marc Gemperle » 16 Nov 2007, 09:47

Hi,



I have entered the following log TD-4018 to address this issue.

I'm curious to know though what is the problem besides the anoyance to make a dump of the database and load it into a ORA INSTANCE that uses

WE8MSWIN1252 charset? Would you loose it during exp/imp too?



Jean-Marc

johi

Re: TD 5.1 and ORACLE characterset problem critical

Post by johi » 16 Nov 2007, 15:29

Hi Ralf, I would like to explain my idea about the reason of this problem and a possible solution.



Your Oracle-Database is configured with character set WE8ISO8859P1, that is ISO-8859-1. This encoding does not contain the Euro symbol.

All the codes between 128 and 159 are reserved control characters in ISO-8859-1.



As you mentioned, if your Oracle client has NLS_LANG set to WE8ISO8859P1, no conversion is done, the single byte codes from the old Gupta application are transferred transparently between database and client application. But in reality your old client is using Codepage 1252 (setting is implied from the regional setting "language for non-unicode programs") which represents the Euro symbol by code 128 (hex 0x80), and this code is stored in the database. But in ISO-8859-1, code 128/0x80 represents the control code PAD (Padding character), see http://en.wikipedia.org/wiki/Iso-8859-1

So you get away with storing the Euro symbol as a different character in the database, the user enters a Euro symbol and gets back a Euro symbol from the database.



But now TD 5.1 uses UTF-16 internally, it does not care about the NLS_LANG setting, and the Euro symbol cannot be stored in your WE8ISO8859P1-database anymore.



I believe that your database is incorrectly labeled WE8ISO8859P1 while it contains character data encoded in WE8MSWIN1252, while TD 5.1 processes the Euro symbol correctly.



I think changing the database encoding from WE8ISO8859P1 to WE8MSWIN1252 should solve your problem:

alter database character set ... ( internal_use ? )

Check Metalink for instructions and caveats, test for side effects.



Greetings from Dornbirn, Austria

Joe

Jean-Marc Gemperle

Re: TD 5.1 and ORACLE characterset problem critical

Post by Jean-Marc Gemperle » 16 Nov 2007, 18:10

Hi Joe,



Many thanks for sharing some lights on this interesting issue, get to know more about oracle this way ;-)



I like to know if you imply people not longer using WE8ISO8859P1 in favor of WE8MSWIN1252.



I've created a database with WE8ISO8859P1 and NLS lang with the same setting and I could reproduce the problem.



So as you stated I altered the database with



SQL>shut

SQL> startup restrict

SQL> ALTER DATABASE CHARACTER SET WE8MSWIN1252;

Database altered.

shut

SQL> startup



SQL> select * from nls_database_parameters where parameter='NLS_CHARACTERSET';



PARAMETER

--------------------------------------------------------------------------------

VALUE

--------------------------------------------------------------------------------

NLS_CHARACTERSET

WE8MSWIN1252



Now when I select the existing previous table on which I inserted EUR using WE8ISO8859P1 I get an INVERTED ?. the NLS_LANG beeing configured as WE8ISO8859P1, if I change it to WE8MSWIN1252 I get ? (not inverted :-) ) I guess this is expected as it was inserted with WE8ISO8859P1



So altering the CHARACTER SET does not seems to help for people that used originaly WE8ISO8859P1. The odd things is that is seems from my test that all tools seems to have that problem ie EXCEL 2007 and MSQUERY or ROWSETVIEWER



EXCEPT SQLSERVER 2005 when doing a linked server connection via the oracle provider...



About

"But now TD 5.1 uses UTF-16 internally, it does not care about the NLS_LANG"



I'm not that sure about that and what that imply exactly, I need to check this, ie: you mean it uses then the REGIONAL setting of the OS for example to determinate insertion of DECIMAL and allow COMA or DOT decimal separation depending the regional setting and does not care about the NLS_LANG (specificaly NLS_NUMERIC)anymore?



Many thanks again

Jean-Marc

johi

Re: TD 5.1 and ORACLE characterset problem critical

Post by johi » 17 Nov 2007, 16:38

recommendation for Oracle database character set: if at all possible, use the current Unicode standard AL32UTF8. If you have to stick with single byte encoding because of other applications accessing the DB, and you need to store the Euro sign or other characters from Codepage 1252 not included in iso-8859-1, the correct setting is WE8MSWIN1252, not WE8ISO8859P1.



About

"But now TD 5.1 uses UTF-16 internally, it does not care about the NLS_LANG".

I was talking about the encoding only, I should have written "NLS_LANG encoding". The other NLS settings should be processed correctly.

Anyway, Oracle client has its own settings in parallel to the Windows regional settings, problems are to be expected if they are not synchronized.



Encoding setting in Windows:

1) ANSI codepage for non Unicode capable apps using SBCS (single byte character set).

2) For programs using Unicode (UTF-16) internally, there is only one encoding defined by the Unicode standard.



ANSI Code Page is the Windows Codepage for Non-Unicode applications and related API functions. It is set according to the table http://www.microsoft.com/globaldev/nlsweb/default.mspx

by choosing the language for non-unicode applications in system settings - regional options.

A change requires a reboot, ACP is stored in the registry key

SYSTEM\CurrentControlSet\Control\Nls\CodePage\ACP



I am currently porting our C programs from SBCS to Unicode, using OCI (Oracle Call Interface C-API) with UTF-16. UTF-16 encoding is used internally by Win32, Java and OCI, and I am quite sure also by TD 5.1



With UTF-16 on the client side and SBCS database, OCI always has to convert between database encoding and Unicode.

The Euro sign is not included in WE8ISO8859P1, there is no way to store or retrieve it using normal sql statements. If the "Euro sign" is stored as Code 128 / 0x80 in WE8ISO8859P1,

this control code will be converted to Unicode \u0080, which is not printable and different from the Euro sign \u20ac.

You wrote that you have one application where the Euro sign works correctly: There must be a conversion to single byte encoding somewhere in between, test if you can store and retrieve for example cyrillic characters.



About changing database characterset:



What Oracle version where you using with your test?

Please check the code you have entered in the database with sqlplus immediately after inserting it (DB-Encoding WE8ISO8559P1) "select ascii(euro_column) from test_table"

and after the "alter database character set", "alter database character set" might have modified the code.

Maybe you have to add "internal_use" in order to instruct "alter database characer set" not to touch the data stored internally.

Can you enter and retrieve the Euro symbol with TD 5.1 and DB-encoding WE8MSWIN1252?

At the moment I do not have a database to play with, I will try later.

Jean-Marc Gemperle

Re: TD 5.1 and ORACLE characterset problem critical

Post by Jean-Marc Gemperle » 20 Nov 2007, 13:52

Hi Joe,



That is just very informative and learned couple things again. I will try your suggestions at the end of the email and let you know. I'm on VISTA with ORACLE 10g 10.2.0.3



Cheers

JM

johi

Re: TD 5.1 and ORACLE characterset problem critical

Post by johi » 21 Nov 2007, 12:42

Sorry, I was wrong stating that NLS_LANG setting is ignored by TD 5.1.

If you search the TD docs for NLS_LANG, you find the following incomplete sentence:

----

Passing Unicode data with Oracle

With the Oracle database you need to set the NLS_LANG environment variable on the client side in order to pass unicode data.

----

I am quite sure it should read set NLS_LANG encoding to AL32UTF8.



I have tested the following select with database encoding WE8MSWIN1252, WE8ISO8859P1 and AL32UTF8.

They only work as expected with NLS_LANG encoding set to AL32UTF8.



Set sEuro = '€'

Set bRet = SqlPrepareAndExecute(hSql, 'Select :sEuro, unistr(\'\\20ac\'), \'€\', ascii(\'€\') from dual into :sSymbol1, :sSymbol2, :sSymbol3, :sAscii')

Set bRet = SqlFetchNext(hSql, nRetVal)



With WE8MSWIN1252, Ascii('€') is 128.



With AL32UTF8, Ascii('€') is 14844588, an encoding invented by Oracle.

If you convert the number to hex, you get E2 82 AC, the UTF8 representation of the Euro-currency sign.



With WE8ISO8859P1 you get the inverted question mark ¿ with code 191, because the Euro sign cannot be cvonverted from Unicode to the DB character set, as I explained before.



sSymbol1 is empty with non-Unicode-DB (WE8MSWIN1252), looks like a bug. I run the select in the IDE and display the variables with Alt+8, I have to restart the IDE in order to get reproducible behaviour for sSymbol1.



sSymbol3 is set to "€", if NLS_LANG client encoding is set to WE8MSWIN1252, the result of taking the 3 bytes of the UTF-8 encoding as separate characters encoded in windows-1252 codepage.



I don't understand why TD 5.1 does not set the correct Unicode encoding through an OCI call when there is only one correct encoding, giving the users plenty of opportunities to make a mistake.

From my own experience with OCI programming, I suspect that the select statement is passed to OCI encoded as UTF8, not UTF16. Your C/C++ programmers might have a look at the Oracle OCI sample program cdemouni.c for an example how to use UTF16 consistently in the application.

johi

Re: TD 5.1 and ORACLE characterset problem critical

Post by johi » 22 Nov 2007, 17:07

With Sqltalk (use Lucida Console font instead of Fixedsys) the statement

select unistr('\20ac'), '€', ascii('€') from dual

works as expected:

With DB-charsets WE8MSWIN1252 and AL32UTF8, the Euro sign is displayed, with WE8ISO8859P1 you see inverted question marks, Code 191.

SqlTalk does not require client NLS_LANG-encoding to be set to AL32UTF8, as it is the case with TD 5.1.

(native oracle router was used )

Jean-Marc Gemperle

Re: TD 5.1 and ORACLE characterset problem critical

Post by Jean-Marc Gemperle » 22 Nov 2007, 23:10

Hi Joe,



There is a lot of information here:-) many combination to test... And I tested also. I still need to go back to you on the suggestion you gave about altering the DB character set.



To your question

Can you enter and retrieve the Euro symbol with TD 5.1 and DB-encoding WE8MSWIN1252?



ORACLE 10gR2 on VISTA:

YES when the DB is WE8MSWIN1252 and the NLS_LANG is

UTF8 or AL32UTF8 insert/fetching with TD 5.1 is OK..

However with NLS_LANG=WE8MSWIN1252 I get '€'...

Now if I select on SQLPLUS what has been inserted with TD5.1 I do get the € symbol... So it seems to me TD 5.1 does have a problem when fetching (not inserting), same result when selecting with SQLTALK. I believe I NEED TO ENTER AN ISSUE THERE and WILL.



Then there is the issue of people having existing WE8ISO8859P1 database with € symbol where with TD 4.2 ANSI this did not cause problem. Those people would have trouble in any case to insert or fetch that symbol regardless there NLS_LANG. I entered an issue for that, also I found out that SQLSERVER2005 linked server to ORACLE OLEDB could FETCH OK the EURO in a WE8ISO8859P1 database with NLS_LANG=WE8ISO8859P1! So it should be possible to do it...that is what I assume. Still I would go for your recommendation to STAY AWAY FROM WE8ISO8859P1.



Now what is the best to convert the database is also another issue...I need to check your suggestion from previous mail.



So as far as I can see it is OK to use the EUR symbol in TD 5.1 on a non UNICODE ORACLE instance as long as it is WE8MSWIN1252 and NOT WE8ISO8859P1 and that NLS_LANG is UNICODE UTF8 or AL32UTF8



Thanks again Joe for the head up on this issue and all the details your provide there.



I made sure this thread get some attention from the US especially the last paragraph.



Cheers

Jean-Marc

johi

Re: TD 5.1 and ORACLE characterset problem critical

Post by johi » 23 Nov 2007, 12:38

I would like to repeat my suggestion hopefully with a better explanation:



If you have an Oracle-DB with encoding WE8ISO8859P1, but in fact the data also contains codes from WE8MSWIN1252 (128 - 159), you have to correct the encoding label WE8ISO8859P1 to WE8MSWIN1252 if you want to access those characters encoded as 128 - 159 from an Unicode client. See Metalink note 225938.1 "Database character set H... check" for details and caveats, you may need help from Oracle support.

If you have codes 128-159 (Euro etc), the DB-encoding should also be corrected from WE8ISO8859P1 to WE8MSWIN1252 before exporting the data for import into an AL32UTF8-DB.



This problem with the Euro sign etc. is not specific to TD 5.1, you get away with an incorrectly labeled encoding iso-8859-1 as long as you stay in the SBCS world. At the point where a conversion to or from Unicode occurs, the problem will show up.

Return to “General Discussion”

Who is online

Users browsing this forum: [Ccbot] and 0 guests