| ICU C++ use example on macOS
When I copy files from my Mac to my NAS, file names containing composed accented characters are automatically decomposed in the copies. This causes my backup tool to treat those copies as files without originals. See Unicode Normalization Forms for examples of composed characters. As a solution, I wrote a program that renames all the original files to their decomposed form, so that the automatic decomposition no longer changes anything. Here are the first steps that led to this solution.
 Installing icu4c with Homebrew: 
brew install icu4c
 Installing pkgconf with Homebrew: 
brew install pkgconf
 Making /opt visible in Finder: 
sudo chflags nohidden /opt
 Setting PKG_CONFIG_PATH to the pkgconfig directory of the installed icu4c version (77.1 here): 
PKG_CONFIG_PATH=/opt/homebrew/Cellar/icu4c@77/77.1/lib/pkgconfig
export PKG_CONFIG_PATH
 transliterate example: 
#include <iostream>
#include <memory>
#include <string>
#include <unicode/unistr.h>
#include <unicode/translit.h>

// Print the UTF-16 code units of a string in hexadecimal.
static void printCodeUnits(const icu::UnicodeString &s)
{
    for (int32_t i = 0; i < s.length(); i++) {
        std::cout << std::hex << static_cast<unsigned>(s.charAt(i)) << " ";
    }
    std::cout << std::endl;
}

int main(void)
{
    std::string init("t\xC3\xA4st"); // täst
    icu::UnicodeString ustrc = icu::UnicodeString::fromUTF8(init.c_str());
    printCodeUnits(ustrc); // composed form

    UErrorCode status = U_ZERO_ERROR;
    std::unique_ptr<icu::Transliterator> myTrans(
        icu::Transliterator::createInstance("Any-NFD", UTRANS_FORWARD, status));
    if (U_FAILURE(status)) {
        std::cerr << "createInstance failed: " << u_errorName(status) << std::endl;
        return 1;
    }
    // transliterate() may reallocate the string's storage, so the code
    // units are read again instead of through a pointer taken earlier.
    myTrans->transliterate(ustrc);
    printCodeUnits(ustrc); // decomposed form

    // Convert the decomposed string back to UTF-8.
    std::string result;
    icu::StringByteSink<std::string> bs(&result);
    ustrc.toUTF8(bs);
    return 0;
}
Explanation:
 To build: 
c++ -o example example.cpp -std=c++17 `pkg-config --libs --cflags icu-uc icu-i18n`
 
The icu-uc module provides the basic data types such as icu::UnicodeString, and the icu-i18n module 
is needed to link against Transliterator::createInstance and transliterate. The first line the program displays is: 
74 e4 73 74 
The second value is e4 because 0x00E4 is the UTF-16 code unit of the composed ä (U+00E4). |