We seem to have had a flurry of UTF-8 queries lately at Sprezz Towers and, as there isn't much published information on the topic, we thought it might be useful to say a little about it.
For many years, OpenInsight (OI) didn't support UTF-8, and this caused difficulties for people who needed to store foreign language characters that also just happened to be system delimiters for OI. So, for example, ý (ANSI 253) has the same byte value that OI uses as its value mark.
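To make the clash concrete, here is a minimal Python sketch (not OI code) that reports which characters in a string land on a delimiter byte once encoded as single-byte ANSI. The delimiter values in the table are the conventional multivalue ones and are an assumption here, as is the choice of the `cp1250` code page; `delimiter_clashes` is a hypothetical helper name.

```python
# Conventional multivalue delimiter byte values (an assumption, not taken from this article).
DELIMITERS = {255: "@RM", 254: "@FM", 253: "@VM", 252: "@SVM", 251: "@TM", 250: "@STM"}


def delimiter_clashes(text, codepage="cp1250"):
    """Return (character, byte value, delimiter name) for every clashing character."""
    clashes = []
    for ch in text:
        # Characters the code page cannot represent encode to b"" and are skipped.
        raw = ch.encode(codepage, errors="ignore")
        if len(raw) == 1 and raw[0] in DELIMITERS:
            clashes.append((ch, raw[0], DELIMITERS[raw[0]]))
    return clashes


print(delimiter_clashes("Český Heršlág"))  # [('ý', 253, '@VM')]
print(delimiter_clashes("Müller"))         # [('ü', 252, '@SVM')]
```

Running a check like this over a sample of your data gives a quick idea of which characters would need special handling.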
Quite the challenge if you're working with European languages. To get around this issue, Rev introduced something called CHARMAP - a special system-wide property that relied upon the developer choosing characters lower down in the ANSI table that they didn't need, and using those to store the affected characters. At display time, the lower characters were swapped back to the higher ones.

For example, if you wanted to be able to store ý (and what red-blooded occupant of Český Heršlág wouldn't?), you might decide that you're never going to need ™ (ANSI 153) and so instruct the CHARMAP to map 253 to 153. You'd enter Český Heršlág and OI would store Česk™ Heršlág - then, when you asked it to display the town name, OI would convert the ™ back to ý before displaying it. Clunky, but effective.

With UTF-8 we no longer have this problem: ý (Unicode code point U+00FD) is stored as the two-byte sequence C3 BD, so it doesn't stand a chance of being confused with the single byte 253 (although, ironically, its code point is still just 253 with a zero in front). BUT, UTF-8 doesn't read minds. It doesn't know that when we ask it to display an ANSI 153 we really mean U+00FD. And OI doesn't know that either - you might have intended to store a trademark symbol. So, if you are planning on moving data that previously used CHARMAP into a UTF-8 environment, you need to do some groundwork to avoid tying yourself (and anybody else trying to help you) into a series of complicated knots.
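The round trip and the migration pitfall can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how OI implements CHARMAP: the `CHARMAP` dictionary, the helper names and the `cp1250` code page are all hypothetical stand-ins chosen so that the town name above can be encoded as single bytes.

```python
# Hypothetical stand-in for a CHARMAP setting (not OI's real API):
# byte value of the real character -> byte value actually stored.
CHARMAP = {253: 153}  # ý (ANSI 253) is stored as ™ (ANSI 153)
REVERSE = {stored: real for real, stored in CHARMAP.items()}

CODEPAGE = "cp1250"  # assumption: a Central European ANSI code page


def to_stored(ansi_bytes: bytes) -> bytes:
    """Swap each mapped character down to its substitute before storing."""
    return bytes(CHARMAP.get(b, b) for b in ansi_bytes)


def to_display(stored_bytes: bytes) -> bytes:
    """Swap each substitute back up to the real character for display."""
    return bytes(REVERSE.get(b, b) for b in stored_bytes)


def migrate_to_utf8(stored_bytes: bytes) -> bytes:
    """Undo the CHARMAP substitution first, then re-encode as UTF-8."""
    return to_display(stored_bytes).decode(CODEPAGE).encode("utf-8")


entered = "Český Heršlág".encode(CODEPAGE)
stored = to_stored(entered)

print(stored.decode(CODEPAGE))                  # Česk™ Heršlág (what is on disk)
print(migrate_to_utf8(stored).decode("utf-8"))  # Český Heršlág (reverse-mapped first)
print("ý".encode("utf-8").hex())                # c3bd
```

Converting the stored bytes straight to UTF-8 without the reverse step would keep the ™ in place. Note that the reverse step is only safe if 153 was never used for a genuine trademark symbol; the sketch cannot tell the two apart, so the data needs auditing before conversion.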
Hopefully this information will help you in the smooth transition of your data from ANSI to UTF-8. And if you're new to the ways of UTF-8, don't forget this handy article for programming in UTF-8.
1 Comments:
In OI 9 the UTF-8 character 'ý' is stored as the multi-byte sequence C3BD - not 00FD. Has that changed in OI 10?
In OI 9.x, all bytes in a UTF-8 multi-byte character are high order (> 0x80). There is a good overview of OI's implementation of UTF-8 in the Unicode.chm file.
By Matt Crozier, At 11 June 2018 at 21:46