UTF-16 and mdzTextEditor

Recently, while integrating UTF-16 support into mdz_containers, I used the popular Notepad++ to create text files with different Unicode BOMs. It seems that Notepad++ (at least the latest version, 7.8.4) lacks a couple of useful things.

  • Unicode surrogate characters: if the file is UTF-16 (LE or BE, it doesn’t matter), the characters are shown like this:

If the file is UTF-8, everything is OK:

  • Another problem is navigating and editing Unicode “grapheme clusters”, which should be treated as a single character. Notepad++ appears to let the cursor move inside them, so they can be broken during editing. It looks like this:
  • One more inconvenience came up when I tried to edit a text file of about 500 MB somewhere near its beginning. Notepad++ simply hung and never recovered.
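
To make the first point concrete, here is a minimal sketch of how a surrogate pair in UTF-16 maps to a single code point. It is only an illustration, not the mdz_containers API: the name utf16_decode is hypothetical, the code units are assumed to be already converted to host byte order (so LE/BE is handled elsewhere), and replacing an unpaired surrogate with U+FFFD is one common policy, not the only one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of mdz_containers).
 * Decodes one code point from UTF-16 code units in host byte order.
 * Stores the code point in *cp and returns the number of units consumed
 * (1 or 2). An unpaired surrogate is replaced with U+FFFD. */
size_t utf16_decode(const unsigned short *units, size_t len, unsigned long *cp)
{
    unsigned long hi, lo;

    assert(units != NULL && cp != NULL && len > 0);

    hi = units[0];

    /* High surrogate followed by a low surrogate: combine both halves. */
    if (hi >= 0xD800UL && hi <= 0xDBFFUL && len >= 2) {
        lo = units[1];
        if (lo >= 0xDC00UL && lo <= 0xDFFFUL) {
            *cp = 0x10000UL + ((hi - 0xD800UL) << 10) + (lo - 0xDC00UL);
            return 2;
        }
    }

    /* Any surrogate that reaches this point has no partner. */
    if (hi >= 0xD800UL && hi <= 0xDFFFUL) {
        *cp = 0xFFFDUL;
        return 1;
    }

    /* Ordinary BMP character: the code unit is the code point. */
    *cp = hi;
    return 1;
}
```

An editor that draws each 16-bit unit separately instead of combining the pair like this would show two placeholder glyphs for one character, which may be what happens in the UTF-16 case above.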

Maybe these are limitations of Scintilla (the editing component Notepad++ is based on) rather than of Notepad++ itself.
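
Whichever component is responsible, the grapheme cluster problem comes down to where the cursor is allowed to stop. The sketch below is a deliberately simplified illustration and not a conforming implementation of the Unicode segmentation rules: it only treats combining diacritics (U+0300 to U+036F), variation selectors (U+FE00 to U+FE0F) and the zero-width joiner (U+200D) as cluster-extending, and it works on already decoded code points. The names is_extender and next_cluster are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified assumption: only these code points extend a cluster.
 * A real implementation would use the Unicode property tables. */
static int is_extender(unsigned long cp)
{
    return (cp >= 0x0300UL && cp <= 0x036FUL)   /* combining diacritics */
        || (cp >= 0xFE00UL && cp <= 0xFE0FUL)   /* variation selectors  */
        || cp == 0x200DUL;                      /* zero-width joiner    */
}

/* Returns the index just past the cluster that starts at pos.
 * Moving the cursor right means jumping to this index, never to an
 * index inside the cluster. */
size_t next_cluster(const unsigned long *cps, size_t len, size_t pos)
{
    size_t i;

    assert(cps != NULL && pos < len);

    i = pos + 1;
    while (i < len) {
        if (is_extender(cps[i])) {
            i++;                     /* mark attaches to the cluster  */
        } else if (cps[i - 1] == 0x200DUL) {
            i++;                     /* code point glued on by a ZWJ  */
        } else {
            break;                   /* next cluster starts here      */
        }
    }
    return i;
}
```

With the sequence "e" followed by U+0301 and then "x", a cursor at index 0 would move straight to index 2, so the accent cannot be separated from its base letter by navigation or deletion.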

This gave me the idea of writing my own text editor/editing component for Windows (“mdzTextEditor”?), which could also be a good demonstration of what mdz_containers can do in terms of correct and fast text/data handling…

About this blog

I’m Maksym Dzyubenko, and I have enjoyed programming for more than 20 years. In this blog I’d like to describe and discuss the things we are making at maxdz Software GmbH.

We are keen to develop solutions that 1) are highly performant, 2) are highly portable, 3) have a small memory footprint, and 4) have as few external dependencies as possible.

We strive to develop the fastest and smallest libraries/components on the market while keeping them portable. Therefore the code of all our base libraries/components complies with the C89/C90 standard. Where possible, OS-independent APIs such as POSIX are used.

Such solutions not only save processor cycles (which can be very handy on mobile and embedded devices) but also significantly reduce costs. They can run even on very old (or minimal) hardware. And they are green. 🙂