By Sprezz | Monday 11 June 2018 12:41 | 1 Comments
We seem to have had a flurry of UTF-8 queries lately at Sprezz Towers and, as there isn't a lot of information published about this, we thought it might be useful to comment a little about this subject.

For many years, OpenInsight (OI) didn't support UTF-8, and this led to difficulties for people who needed to store foreign language characters that also just happened to be system delimiters for OI. So, for example:

Delimiter   ANSI   Character
@Rm         255    ÿ
@Fm         254    þ
@Vm         253    ý
@Svm        252    ü
@Tm         251    û
@Stm        250    ú

Quite the challenge if you're working with European languages.
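To see the collision for yourself, here is a minimal Python sketch (not OI code) that decodes each delimiter's byte value as Windows-1252, which is assumed here to be the "ANSI" code page in use. The dictionary name and helper function are purely illustrative.

```python
# Byte values of the OI system delimiters, taken from the table above.
DELIMITERS = {"@Rm": 255, "@Fm": 254, "@Vm": 253, "@Svm": 252, "@Tm": 251, "@Stm": 250}


def ansi_char(code: int) -> str:
    """Return the character a single byte represents in Windows-1252 (assumed ANSI code page)."""
    return bytes([code]).decode("cp1252")


# Map each delimiter name to the printable character that shares its byte value.
collisions = {name: ansi_char(code) for name, code in DELIMITERS.items()}

for name, char in collisions.items():
    print(f"{name} = {DELIMITERS[name]} = {char}")
```

Every one of those bytes is a letter that turns up in everyday European text, which is exactly why storing it raw would split the data.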

To get around this issue, Rev introduced something called CHARMAP - a special system-wide property that essentially relied upon the developer choosing characters lower down in the ANSI table that they didn't need, and using those to store the affected characters. At display time, OI swapped the lower characters back to the higher ones.

For example, if you wanted to be able to store ý (and what red-blooded occupant of Český Heršlág wouldn't?), you might decide that you're never going to need to use ™ (ANSI 153) and so instruct the CHARMAP to map 253 to 153. You'd enter Český Heršlág and OI would store Česk™ Heršlág - then when you asked it to display the town name, OI would convert the ™ back to ý before displaying it.

Clunky, but effective.
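The idea behind the swap can be sketched in a few lines of Python. This is only an illustration of the principle, not how OI implements CHARMAP internally. The function names are made up, Windows-1252 is assumed as the ANSI code page, and the sample word "Nový" is used because every letter in it exists in that code page.

```python
# Hypothetical one-entry character map: store 253 (ý) as 153 (™), and reverse it for display.
TO_STORAGE = bytes.maketrans(b"\xfd", b"\x99")
TO_DISPLAY = bytes.maketrans(b"\x99", b"\xfd")


def store(text: str) -> bytes:
    """Encode as ANSI, then move the delimiter-valued byte down to its stand-in."""
    return text.encode("cp1252").translate(TO_STORAGE)


def display(stored: bytes) -> str:
    """Move the stand-in byte back up, then decode as ANSI for display."""
    return stored.translate(TO_DISPLAY).decode("cp1252")


stored = store("Nový")
print(stored)           # the raw bytes on disk contain 0x99, never 0xFD
print(display(stored))  # the original text comes back at display time
```

Note the price of the trick: a genuine ™ in the data would also come back as ý, so the stand-in character really does have to be one you never use.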

With UTF-8 we no longer have this problem: ý is the Unicode code point U+00FD, which UTF-8 stores as the two-byte sequence C3 BD, so it doesn't stand a chance of being confused with the single byte ANSI 253 (although ironically its code point value is still 253). BUT, UTF-8 doesn't read minds. It doesn't know that when we ask it to display an ANSI 153 we really mean ý. And OI doesn't know that either - you might have intended to store a trademark symbol. So, if you are planning on moving data that previously used CHARMAP into a UTF-8 environment, you need to do some groundwork to avoid tying yourself (and anybody else trying to help you) into a series of complicated knots.
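For reference, here is what Python reports for the code point and UTF-8 bytes of ý, and what happens to a CHARMAPPED byte 153 if it is converted without first being mapped back. This is a plain Python sketch that assumes Windows-1252 as the ANSI code page; the variable names are illustrative.

```python
y_acute = "ý"
code_point = ord(y_acute)              # the Unicode code point, 0xFD (253)
utf8_bytes = y_acute.encode("utf-8")   # the two bytes UTF-8 actually stores

# A CHARMAPPED row holds 153 where the user typed ý.
stored_byte = b"\x99"
as_ansi = stored_byte.decode("cp1252")     # ANSI 153 is the trademark sign
naive_utf8 = as_ansi.encode("utf-8")       # converting blindly keeps it a trademark sign

print(hex(code_point), utf8_bytes.hex())
print(as_ansi, naive_utf8.hex())
```

The blind conversion is perfectly valid UTF-8; it is just a valid trademark sign rather than the ý the user typed.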

  • Use the SETNOOFDELIMITERS routine to set the number of delimiters to be used to 0
  • For every data row in the application
    • Read the row
      • Use Loop/Remove to grab the individually delimited pieces of data
      • Convert your CHARMAPPED characters back to their original value
      • Recast the string as UTF-8 using the ANSI_UTF8 function
      • Add back into the new string you are building
    • When complete, write this string back over the original row. The row will then look strange in ANSI OI because it contains multi-byte characters, but it will work properly in the UTF-8 OI app.
  • Fire up your UTF-8 enabled OI app
  • Remove anything to do with CHARMAP
  • Attach your data and all will be well with the world
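
The per-row part of the steps above can be sketched as follows. This is a Python illustration of the logic only, not Basic+ code: the split on delimiter bytes stands in for Loop/Remove, and the decode/encode pair stands in for the ANSI_UTF8 function. It assumes Windows-1252 as the ANSI code page, a single hypothetical CHARMAP entry (253 stored as 153), and that the delimiters stay as single bytes 250-255 in the converted row.

```python
import re

# System delimiters occupy byte values 250-255; the capture group keeps them in the split result.
DELIMITERS = re.compile(rb"([\xfa-\xff])")

# Hypothetical CHARMAP entry to undo: 153 (™) was standing in for 253 (ý).
REVERSE_CHARMAP = bytes.maketrans(b"\x99", b"\xfd")


def convert_row(row: bytes) -> bytes:
    """Rebuild one row with each delimited piece un-CHARMAPPED and recast as UTF-8."""
    rebuilt = []
    for index, piece in enumerate(DELIMITERS.split(row)):
        if index % 2 == 1:
            # Odd positions are the delimiters themselves: copy them through untouched.
            rebuilt.append(piece)
        else:
            # Even positions are data: restore the original characters, then recast ANSI as UTF-8.
            restored = piece.translate(REVERSE_CHARMAP)
            rebuilt.append(restored.decode("cp1252").encode("utf-8"))
    return b"".join(rebuilt)


# Two fields separated by @Fm (254); the second field holds two values separated by @Vm (253).
old_row = b"Nov\x99\xfePraha\xfdBrno"
new_row = convert_row(old_row)
print(new_row)
```

Converting piece by piece matters: once the ý is mapped back to 253 it is indistinguishable from @Vm, so the delimiters have to be set aside before the reverse mapping happens.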

Hopefully this information will help you in the smooth transition of your data from ANSI to UTF-8.

And if you're new to the ways of UTF-8, don't forget this handy article for programming in UTF-8.


  • In OI 9 the UTF-8 character 'ý' is stored as the multi-byte sequence C3BD - not 00FD. Has that changed in OI 10?
    In OI 9.x, all bytes in a UTF-8 multi-byte character are high order (0x80 or above). There is a good overview of OI's implementation of UTF-8 in the Unicode.chm file.

    By Blogger Matt Crozier, At 11 June 2018 at 21:46  
