UTF-16 and mdzTextEditor

While integrating UTF-16 support into mdz_containers recently, I used the rather popular Notepad++ to create text files with different Unicode BOMs. It seems that Notepad++ (at least the latest version, 7.8.4) lacks a couple of useful things.

  • Unicode surrogate characters: if the file is UTF-16 (LE or BE, it doesn’t matter), the characters are shown like this:

If the file is UTF-8, everything is OK:
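
For context, here is a minimal sketch of what handling surrogates involves: in UTF-16, a code point above U+FFFF is stored as two 16-bit code units (a high and a low surrogate) that have to be combined before the character can be displayed. The function name `utf16_decode` is hypothetical; it is not part of mdz_containers, Notepad++ or Scintilla, and it assumes the code units have already been converted from LE/BE to native byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not an mdz_containers API).
 * Decodes one code point from native-endian UTF-16 code units.
 * Returns the number of code units consumed (1 or 2),
 * or 0 if the input is empty or contains an unpaired surrogate. */
size_t utf16_decode(const uint16_t *units, size_t count, uint32_t *code_point)
{
    if (count == 0)
        return 0;

    uint16_t first = units[0];

    /* Outside the surrogate range: the code unit is the code point. */
    if (first < 0xD800 || first > 0xDFFF) {
        *code_point = first;
        return 1;
    }

    /* A low surrogate may not come first. */
    if (first >= 0xDC00)
        return 0;

    /* A high surrogate must be followed by a low surrogate. */
    if (count < 2)
        return 0;

    uint16_t second = units[1];
    if (second < 0xDC00 || second > 0xDFFF)
        return 0;

    /* Combine the 10 payload bits of each surrogate and add the offset. */
    *code_point = 0x10000u
                + (((uint32_t)(first - 0xD800) << 10)
                   | (uint32_t)(second - 0xDC00));
    return 2;
}
```

For example, the code units 0xD83D 0xDE00 decode to the single code point U+1F600. An editor that draws each code unit separately would instead show two meaningless placeholder glyphs.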

  • Another problem is navigating and editing Unicode “grapheme clusters”, which should be treated as a single character. It appears to be possible to move the cursor inside them in Notepad++ and consequently break them during editing. It looks like this:
  • One more inconvenience came up when I tried to edit a large text file (ca. 500 MB) somewhere near its beginning. Notepad++ simply hung and never recovered.

Maybe these are limitations of Scintilla (the editing component Notepad++ is based on) and not of Notepad++ itself.
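
Whatever the cause, the grapheme cluster issue comes down to where the cursor is allowed to stop. Below is a rough sketch of cursor movement that skips over a whole cluster. It is a deliberately simplified assumption of mine, not the full Unicode segmentation algorithm (UAX #29): it only knows about combining diacritical marks, variation selectors, emoji skin-tone modifiers and the zero width joiner, and ignores Hangul, regional indicators, CR LF and everything else. The names `is_extender` and `next_cluster_boundary` are hypothetical, and the input is assumed to be already decoded into code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified assumption: code points that attach to the preceding one.
 * The real rules (UAX #29) cover many more ranges. */
static int is_extender(uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F)    /* combining diacritical marks */
        || (cp >= 0xFE00 && cp <= 0xFE0F)    /* variation selectors */
        || (cp >= 0x1F3FB && cp <= 0x1F3FF)  /* emoji skin-tone modifiers */
        || cp == 0x200D;                     /* zero width joiner (ZWJ) */
}

/* Hypothetical helper: returns the index just past the cluster
 * that starts at pos, i.e. the next position the cursor may stop at. */
size_t next_cluster_boundary(const uint32_t *cps, size_t count, size_t pos)
{
    if (pos >= count)
        return count;

    size_t i = pos + 1; /* consume the base code point */

    while (i < count) {
        if (is_extender(cps[i]))
            i++;        /* attaches to what came before */
        else if (cps[i - 1] == 0x200D)
            i++;        /* glued to the previous code point by a ZWJ */
        else
            break;      /* a new cluster starts here */
    }
    return i;
}
```

With this, “e” followed by U+0301 (combining acute accent) is one cursor step rather than two, and an emoji sequence joined by ZWJ is treated as one unit, so it cannot be split during editing.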

This gave me the idea of writing my own text editor/editing component for Windows (“mdzTextEditor”?), which could also be a good demonstration of what mdz_containers can do in terms of correctness and speed of text/data handling…
