Encoding the Database

Comments, bug reports, discussions on CCDICT.
Benjamin Barrett

Re: Encoding the Database

Postby Benjamin Barrett » Thu Oct 23, 2003 5:54 pm

Doh! Now I feel a lot better :)

For now, I'm just going to look at some sound correspondences among the modern languages to get my feet wet.

If I have any updates or additional information, I'll write back to the forum.

Thanks for all the help, Thomas and Dylan.

Silent_Lamb

Re: Encoding the Database

Postby Silent_Lamb » Wed Oct 06, 2004 9:26 am

I'm using OpenOffice 1.1.3 and the Chinese characters are missing. All I get are funny-looking symbols instead. How can I fix this? Any ideas?

Thomas Chin

Re: Encoding the Database

Postby Thomas Chin » Thu Oct 07, 2004 6:41 am

What file did you open?

sunwukong
Posts: 6
Joined: Wed Jan 17, 2007 3:45 am
Location: Ione, WA

Port of Unihan to Excel

Postby sunwukong » Wed Jan 17, 2007 3:57 am

I've had some success porting the Unihan DB to Excel. So far I'm able to show the 4-digit U+nnnn characters but am having trouble with the 5-digit characters.

I'm using this VB macro to resolve the characters:

Sub aaa()
    '20050626, sunwukong (AT) povn(dot)com (Pat kirol)
    'Put the 4-digit Unicode values (hex) in column B, then run this macro;
    'it inserts the corresponding characters in column C.
    'Here n runs from 1 to 6; change 6 to the last row of 4-digit values.
    Dim n As Long

    For n = 1 To 6
        Debug.Print n
        'ChrW takes a single 16-bit value, so this only covers U+0000 to U+FFFF.
        Cells(n, 3).Value = ChrW(CLng("&H" & Cells(n, 2).Value))
    Next n
End Sub

What is even more interesting is that you cannot get the characters to display consistently and must edit the font used for each character. In my case I am adding a lookup code for the kTaiwanTelegraph field (CCT or CTC). I'm up against a brick wall because the character for CCT=5983 has a 5-digit Unicode value, and this routine will not handle it.
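The 4-digit/5-digit split described above comes down to how many 16-bit UTF-16 code units a code point needs. A minimal sketch of that check, written in Python rather than VB so it can be run outside Excel; the function name `utf16_unit_count` is made up for illustration, and U+2775E is the 5-digit value that comes up later in this thread:

```python
def utf16_unit_count(hex_code: str) -> int:
    """Return how many 16-bit UTF-16 code units a code point needs."""
    code_point = int(hex_code, 16)
    # Encode without a byte-order mark; every code unit is 2 bytes.
    return len(chr(code_point).encode("utf-16-be")) // 2


print(utf16_unit_count("4E00"))   # 1: fits in a single 16-bit value
print(utf16_unit_count("2775E"))  # 2: needs a surrogate pair
```

A routine that passes one 16-bit value per character can only produce the first kind.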

tfc.chin
Posts: 50
Joined: Wed Mar 09, 2005 12:07 pm

Re: Port of Unihan to Excel

Postby tfc.chin » Wed Jan 17, 2007 8:36 am

sunwukong wrote: I've had some success porting the Unihan DB to Excel. So far I'm able to show the 4-digit U+nnnn characters but am having trouble with the 5-digit characters.


If you are using an MS Windows system, you need to insert 5-digit (U+nnnnn) characters as a surrogate pair (recent versions of Office support them).

'*----------------------------------------------------------*
'* Name : vbShiftRight *
'*----------------------------------------------------------*
'* Purpose : Shift 32-bit integer value right 'n' bits. *
'*----------------------------------------------------------*
'* Parameters : Value Required. Value to shift. *
'* : Count Required. Number of bit positions to *
'* : shift value. *
'*----------------------------------------------------------*
'* Description: This function is equivalent to the 'C' *
'* : language construct '>>'. *
'*----------------------------------------------------------*
Public Function vbShiftRight(ByVal Value As Long, _
                             Count As Integer) As Long
    Dim i As Integer

    vbShiftRight = Value

    For i = 1 To Count
        vbShiftRight = vbShiftRight \ 2
    Next

End Function
'*----------------------------------------------------------*
'* Name : WriteSurrogate *
'*----------------------------------------------------------*
'* Purpose : Returns a surrogate pair of ISO10646:1993 *
'* : CJK Extension B codepoints *
'*----------------------------------------------------------*
'* Parameters : Codepoint Required. 5-digit string to be *
'* : converted. *
'*----------------------------------------------------------*
'* Description: Based on the C++ conversion algorithm. *
'*----------------------------------------------------------*
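Function WriteSurrogate(Codepoint As String) As String
    Dim Code As Long, HighSur As Long, LowSur As Long

    Code = CLng("&H" & Codepoint)
    'High (lead) surrogate; &HD7C0 = &HD800 - (&H10000 \ &H400)
    HighSur = vbShiftRight(Code, 10) + &HD7C0&
    'Low (trail) surrogate: bottom 10 bits of the code point
    LowSur = &HDC00& Or (Code And &H3FF&)
    WriteSurrogate = ChrW(HighSur) & ChrW(LowSur)
End Function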

Did not test the code for typos.

Good luck,

Thomas
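For anyone who wants to check the surrogate arithmetic outside Excel, here is a rough Python sketch of the same calculation. It is a cross-check, not a replacement for the VB routine; the function name `surrogate_pair` is made up for illustration, and U+2775E is used as the sample because it appears later in the thread.

```python
def surrogate_pair(hex_code: str) -> tuple:
    """Split a supplementary-plane code point into (lead, trail) surrogates."""
    code_point = int(hex_code, 16)
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    lead = 0xD800 + (offset >> 10)      # top 10 bits of the offset
    trail = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return lead, trail


lead, trail = surrogate_pair("2775E")
print(f"{lead:04X} {trail:04X}")  # D85D DF5E

# The VB routine folds the "- 0x10000" step into the constant 0xD7C0.
assert lead == (0x2775E >> 10) + 0xD7C0

# Cross-check against Python's own UTF-16 encoder.
expected = lead.to_bytes(2, "big") + trail.to_bytes(2, "big")
assert chr(0x2775E).encode("utf-16-be") == expected
```

The shortcut works because 0xD7C0 is 0xD800 minus 0x40, and 0x40 is what 0x10000 becomes after shifting right by 10 bits.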

sunwukong

5 digit unihan

Postby sunwukong » Wed Jan 17, 2007 12:15 pm

I found the character by browsing the Private area of Ming (for ISO 10646). This is how frustrated I was.
I'm using Win 2000 and Office 2000, and even though I managed to insert those two functions into an Excel worksheet, it does not seem to be working.
I'm hesitant to upgrade to XP and Office 2003 even though I have them.

I was able to paste the character into this test worksheet from Word, and it displays.

If I use your WriteSurrogate routine and then paste the spreadsheet cells into this message, I can see the character in Firefox but not in Excel. Why can Firefox figure this out when Excel fails?

2775E 𧝞 &H2775E

sunwukong

Unicode 5 Digit

Postby sunwukong » Wed Jan 17, 2007 1:52 pm

Thomas, thanks for your help. Maybe I'm in over my head. Do I need both of these functions to get this to work? As I mentioned in a previous post, I'm using Excel 2000 and Windows 2000. Your WriteSurrogate(nnnnn) routine seems to be working (for Firefox and IE) but not for MS Word and Excel. So my guess is Win 2K can't handle this or ???

tfc.chin

Re: Unicode 5 Digit

Postby tfc.chin » Wed Jan 17, 2007 2:24 pm

sunwukong wrote: Thomas, thanks for your help. Maybe I'm in over my head. Do I need both of these functions to get this to work? As I mentioned in a previous post, I'm using Excel 2000 and Windows 2000. Your WriteSurrogate(nnnnn) routine seems to be working (for Firefox and IE) but not for MS Word and Excel. So my guess is Win 2K can't handle this or ???


I am not sure whether Office 2000 can handle characters in supplementary planes. According to the official specifications it cannot. Office XP and 2003 have no problems handling supplementary planes.

P.S. WriteSurrogate calls the other function as a subroutine, so you need both functions.

Thomas

sunwukong

Unicode and plane 1 + characters

Postby sunwukong » Fri Feb 16, 2007 8:51 pm

Oddly, MS Word (Office 2000) on Win 2K can show some plane 1 and 2 characters, but Excel cannot. Under Win XP the reverse seems to be true: Excel 2003 handles plane 2 characters but Word does not. Strange.

I've finished indexing the CTC with the ShouWei HaoMa system and added in the trigraphs, if anyone is interested.

Does anyone know where I can find the trigraphs for the Mainland (STC) version in electronic format?

