Discussion:
EBCDIC Conversion
jacky
2006-08-19 10:47:07 UTC
Hi,
I am trying to retrieve the data in a field of type CHAR(32).
The database is on an iSeries. Chinese users enter data into this
field by typing Shift-Out, then the Chinese characters, and then
Shift-In.

Now I am trying to retrieve this information using VB.NET and
update our global CRM system. A data reader is used to fetch the data
through the IBM Client Access OLE DB provider.

(1) Method 1

select DBCLOB(fieldname) from library.filename

With this query I get the data as Traditional Chinese characters, not
Simplified Chinese, although the English text in the field is
retrieved correctly. This works fine for populating from our
Taiwan system, as they use Traditional Chinese.

(2) Method 2

I also tried running

select hex(fieldname) from library.filename

and then, after converting the returned string to a byte array with the
function Hex2Byte, passing the bytes to the following function
(getunicodestring) with code page 936. I still couldn't get the proper
Chinese characters.


Could anyone help me get the text properly?

Thanks & regards,
Jacky



--------
Public Function getunicodestring(ByVal arrByte_source As Byte(), _
                                 ByVal i_codepage As Integer) As String
    ' Re-encode bytes from code page i_codepage as UTF-16, then build a String
    Dim enc_source As System.Text.Encoding = _
        System.Text.Encoding.GetEncoding(i_codepage)
    Dim enc_unicode As System.Text.Encoding = System.Text.Encoding.Unicode
    Dim arrByte_unicode As Byte() = _
        System.Text.Encoding.Convert(enc_source, enc_unicode, arrByte_source)
    Dim str_unicode As String = enc_unicode.GetString(arrByte_unicode)

    Return str_unicode
End Function
----------
Function Hex2Byte(ByVal HexValue As String) As Byte()
    ' Convert a hex string (two digits per byte, spaces ignored) to bytes
    Dim ByteArray() As Byte
    HexValue = Replace(HexValue, " ", "")
    ReDim ByteArray((Len(HexValue) \ 2) - 1)
    ' Convert every pair of digits, including the last one
    For X As Integer = 0 To UBound(ByteArray)
        ByteArray(X) = CByte("&H" & Mid(HexValue, 2 * X + 1, 2))
    Next
    Return ByteArray
End Function
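Method 2 depends on Hex2Byte turning the HEX() output back into the field's raw bytes: each pair of hex digits is one byte, and every pair, including the last, has to be converted. The thread's code is VB.NET; the sketch below uses Python only to illustrate the same step, and `hex_to_bytes` is a hypothetical name.

```python
def hex_to_bytes(hex_value: str) -> bytes:
    """Turn SQL HEX() output (two hex digits per byte) back into raw bytes."""
    cleaned = hex_value.replace(" ", "")  # ignore spaces, as Hex2Byte does
    if len(cleaned) % 2 != 0:
        raise ValueError("expected an even number of hex digits")
    # Walk the string two digits at a time; every pair, including the last,
    # becomes one byte.
    return bytes(int(cleaned[i:i + 2], 16) for i in range(0, len(cleaned), 2))


# The first bytes of a mixed EBCDIC value: "AB" followed by SO (0x0E)
print(hex_to_bytes("C1C2 0E").hex(" "))  # c1 c2 0e
```

For a CHAR(32) column, HEX() returns 64 digits and the result should always be exactly 32 bytes.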
Lou
2006-08-20 18:59:00 UTC
That is a tough one. Your SO/SI is the old IBM method of mixing
Single-Byte Character Set (SBCS) and Double-Byte Character Set (DBCS)
data, from back in the days when there was no such thing as two-byte
characters. I used to ship a package to customers all over the world,
and we had to use an AS/400 with a special chip set to do the Asian
languages in true double byte. Nowadays everything is capable of
two-byte characters. I know DOS/Windows take that approach, so I assume
that is what is happening on your VB side of things.

I've looked through your code, and I am not sure you understand the
structure of the data in the 32-byte field you are reading. Below is
what it looks like. I apologize in advance if you already knew this:

The data in the 32-byte field is a mixture of 1-byte and 2-byte
EBCDIC characters. You can't just declare it a Unicode string. The
field contains 1-byte characters until a shift-out is encountered.
After the shift-out you have 2-byte characters until a shift-in is
encountered. Then you are back to 1-byte characters.

ABC (SO) 2-byte-chars (SI) DEF

Lou
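To make the layout above concrete, here is a minimal Python sketch (the helper `mixed_byte_length` is hypothetical) that counts how many bytes a mixed value occupies, given that SO (0x0E) and SI (0x0F) are one byte each and every DBCS character is two bytes.

```python
from typing import Iterable, Tuple


def mixed_byte_length(segments: Iterable[Tuple[str, int]]) -> int:
    """Bytes needed for a mixed value given (kind, character count) segments.

    kind is "SBCS" (1 byte per character) or "DBCS" (2 bytes per character,
    wrapped in one SO byte and one SI byte).
    """
    total = 0
    for kind, n_chars in segments:
        if kind == "SBCS":
            total += n_chars
        elif kind == "DBCS":
            total += 1 + 2 * n_chars + 1  # SO + character pairs + SI
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return total


# ABC (SO) ten Chinese characters (SI) DEF
print(mixed_byte_length([("SBCS", 3), ("DBCS", 10), ("SBCS", 3)]))  # 28
```

By the same arithmetic, a CHAR(32) value made only of Chinese text holds at most 15 characters (1 + 2 × 15 + 1 = 32), and fewer once Latin text is mixed in.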
jacky
2006-08-21 02:10:17 UTC
Hi Lou,
Thank you very much for your feedback.

I fully agree with what you said: it is MBCS (Multi-Byte Character Set)
and has been in use for some 20-25 years! As you rightly pointed
out, the data inside the 32-byte CHAR field is a mix of English and
non-English (mainly Asian) characters, with the non-English part
enclosed in SO and SI.

So what I tried in Method 2, mentioned in my first mail, was to split
the string into its DBCS and SBCS parts by looking for the SO and SI
bytes, and then apply System.Text.Encoding to the DBCS part. But it is
not helping: I get a mixture of Chinese and 'junk' characters like ?
( etc.


regards
//jacky
jacky
2006-08-21 13:02:07 UTC
Hi,
I also tried using the CodePageConverter class of the Client Access
automation library, cwbx.
Here there is a property called
The data field value was converted to a byte array using BLOB, i.e.
select blob(field_name) from library.filename

Then I called the CodePageConverter class with source code page 936 and
destination code page Unicode. The conversion is still not correct,
and I get errors in the return parameter of this class.

Dim o_cwbx_codepage As New cwbx.CodePageConverterClass
Dim o_cwbx_strcon As New cwbx.StringConverter
Dim a_source_bytes() As Byte
o_cwbx_codepage.DataContainsSOSI = True
o_cwbx_codepage.SourceCodePage = 936
o_cwbx_codepage.TargetCodePage = cwbx.cwbnlCodePageEnum.cwbnlCodePageUnicode


Could anybody help me identify the mistake?

//jacky
MastroZambo
2006-09-18 13:55:59 UTC
I wrote this function, and it works correctly in my environment.

Function CWBXCONV(ByVal a_source_bytes() As Byte) As String
    Try
        Dim gggg As Object   ' receives the converted bytes
        Dim o_cwbx_codepage As New cwbx.CodePageConverterClass
        Dim o_cwbx_strcon As New cwbx.StringConverter
        ' Source data is in the AS/400 host code page and contains SO/SI
        o_cwbx_codepage.DataContainsSOSI = True
        o_cwbx_codepage.SourceCodePage = cwbx.cwbnlCodePageEnum.cwbnlCodePageAS400
        ' Convert to UTF-8 first...
        o_cwbx_codepage.TargetCodePage = cwbx.cwbnlCodePageEnum.cwbnlCodePageUTF8
        o_cwbx_codepage.Convert(a_source_bytes, gggg)
        ' ...then turn the UTF-8 bytes into a String
        o_cwbx_strcon.CodePage = cwbx.cwbnlCodePageEnum.cwbnlCodePageUTF8
        Return o_cwbx_strcon.FromBytes(gggg).Trim
    Catch ex As Exception
        MsgBox(ex.ToString)
        Return "Error converting source data!!!"
    End Try
End Function

It needs a reference to the CWBX library, of course!
FT
2006-08-21 14:38:45 UTC
Hi Jacky,

I don't think this is a tough one at all. I wouldn't go the "400
programming" way unless you enjoy that. Check out the RPM tool for
EBCDIC conversion: http://www.brooksnet.com/as400-printing.html#EBCDIC
RPM also includes code pages for Chinese characters:
http://lpd.brooksnet.com/installed-codepages.html

Pretty simple and straightforward. Hope this helps, and best of luck!

FT
jacky
2006-08-23 02:32:35 UTC
Hi FT,
Thanks for your input.
But I am not looking for a document converter!

What I am looking for is a field converter that gives me a Unicode
string from an MBCS string (DBCS + SBCS) at the database level, as I
am bringing data directly from the iSeries database into the SQL Server
CRM database. I need to convert 20 fields (only these are wrapped in
Shift-Out and Shift-In) out of my 300 fields, so your suggested
solution will not help me.


regards
//jk
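The field-level approach Jacky describes (decode only the 20 SO/SI fields out of 300 on the way into SQL Server) could be structured as a per-row step like the hypothetical `convert_row` below. This is a Python sketch under assumptions: the column names are invented, `cp037` stands in for the single-byte EBCDIC code page, and `decode_mixed` is whatever SO/SI-aware decoder is available (for example the cwbx-based CWBXCONV function in this thread, or a custom one).

```python
from typing import Callable, Dict, Set


def convert_row(row: Dict[str, bytes],
                mixed_fields: Set[str],
                decode_mixed: Callable[[bytes], str],
                sbcs_codec: str = "cp037") -> Dict[str, str]:
    """Decode one fetched row of raw CHAR column bytes.

    Only columns listed in mixed_fields go through the SO/SI-aware decoder;
    all other columns are treated as plain single-byte EBCDIC.
    """
    result: Dict[str, str] = {}
    for name, raw in row.items():
        if name in mixed_fields:
            text = decode_mixed(raw)
        else:
            text = raw.decode(sbcs_codec)
        # CHAR columns are padded with blanks, so trim trailing spaces.
        result[name] = text.rstrip()
    return result


# Hypothetical column names; the lambda stands in for a real mixed decoder.
row = {"CUSNO": b"\xF1\xF2\xF3\x40\x40", "CUSNAM": b"\x0E\x45\x41\x0F\x40"}
print(convert_row(row, {"CUSNAM"}, lambda raw: "中 "))
# {'CUSNO': '123', 'CUSNAM': '中'}
```

Keeping the list of mixed fields as data means the other 280 columns never touch the double-byte conversion path.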