This is the last installment of the digest of the “String as List” topic.
Hasan Veldstra argued that the lack of library support makes string operations difficult. He had been working on an Erlang Unicode string library based on ICU (http://www.icu-project.org/) for the past week, and expected to release an alpha version soon.
Zvi suggested that on a 64-bit implementation of Erlang, a “string as list” representation is wasteful.
Richard A. O’Keefe seemed annoyed by this suggestion (and by other comments about Lisp machines), and argued that if space is a concern, the programmer can use an alternative representation such as a binary.
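To put rough numbers on the space argument, here is a minimal sketch. It assumes the usual list layout in which every element occupies one cons cell of two machine words, and it counts only the payload bytes of a binary, ignoring the binary's own header. The module and function names are made up for illustration and do not come from the thread.

```erlang
-module(string_cost).
-export([list_bytes/1, binary_bytes/1]).

%% Approximate heap bytes used by a string held as a list of integers.
%% Assumption: each character takes one cons cell of two machine words.
list_bytes(Str) when is_list(Str) ->
    length(Str) * 2 * erlang:system_info(wordsize).

%% Payload bytes of the same text held as a binary (header not counted).
binary_bytes(Bin) when is_binary(Bin) ->
    byte_size(Bin).
```

On a 64-bit emulator the word size is 8, so string_cost:list_bytes("hello") is 80 while string_cost:binary_bytes(<<"hello">>) is 5.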
Robert Virding clarified that leex (functions token/2 and token/3) is re-entrant. However, yecc is not, and must be fed a complete list of tokens.
Dmitrii ‘Mamut’ Dimandt gave an example of why “string as list” is a bad idea:
“This is only true for ASCII text. Non-ASCII gets screwed up badly.
lists:reverse("text") %% gives you "txet"
lists:reverse("текст") %% Russian for text becomes [130,209,129,209,186,208,181,208,130,209] which is clearly not what I wanted”
A reply pointed out that the list should hold Unicode code points rather than UTF-8 bytes:
“This is what you should have in your list:
1> Text = [16#442, 16#435, 16#43a, 16#441, 16#442].
[1090,1077,1082,1089,1090]
You can convert it to utf8 for output
2> xmerl_ucs:to_utf8(Text).
[209,130,208,181,208,186,209,129,209,130]
And you can reverse it and convert that to utf8.
3> xmerl_ucs:to_utf8(lists:reverse(Text)).
[209,130,209,129,208,186,208,181,209,130]”
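The same round trip can be sketched with the unicode module that ships with later Erlang/OTP releases. That is an assumption about the reader's environment, since the quoted session uses xmerl_ucs. The module and function names below are illustrative only.

```erlang
-module(reverse_text).
-export([reverse_utf8/1]).

%% Reverse UTF-8 encoded text by code point rather than by byte.
%% Assumption: the input is a valid UTF-8 binary. For invalid input
%% unicode:characters_to_list/2 returns an error tuple, and the
%% is_list/1 check below then fails with a badmatch.
reverse_utf8(Utf8) when is_binary(Utf8) ->
    CodePoints = unicode:characters_to_list(Utf8, utf8),
    true = is_list(CodePoints),
    unicode:characters_to_binary(lists:reverse(CodePoints)).
```

Calling reverse_text:reverse_utf8(<<209,130,208,181,208,186,209,129,209,130>>) returns <<209,130,209,129,208,186,208,181,209,130>>, the same bytes as the reversed UTF-8 list in the quoted session.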
Bjorn Gustavsson added a positive note. He confirmed that it is possible to use lists of Unicode characters easily, and said he has implemented limited Unicode support in Wings 3D.
Joe Armstrong weighed in on 19 Feb. He observed: “One problem with strings unicode, regexps etc. is how you input the string.” He proposed several new syntaxes for writing regex or XML strings.
From this point onward the thread shifted into a forward-looking mood, and the discussion focused on the best way to represent a string in Erlang.