A discussion titled “String as List” has, at the time of writing, been running for 6 days and 56 posts. A lot of interesting ideas and insights into Erlang have been poured into it. I simply want to write down some of the interesting technical points in this blog as a digest and a reference.
It all started with a question from tsuraan: “Why does Erlang internally represent strings as lists?”
The first response came from Masklinn, and many other contributors shared the same idea:
“A lot of functional languages represent strings as lists rather than arrays because lists are their basic collection datatype, and this allows the use of all the list-related functions on strings.”
Christian S put it in a shorter phrase: “historical reason”. As he pointed out, “internally” is not the right word in the question, because a string is not a special type or an abstraction built on top of a list. A string is a list.
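To make “a string is a list” concrete, here is a minimal sketch. It is my own illustration, not code from the thread, and the module name `string_as_list` is hypothetical. It only shows that a string literal compares equal to a list of character codes and that ordinary `lists` functions work on it unchanged.

```erlang
-module(string_as_list).
-export([demo/0]).

%% Each match below crashes with badmatch if the claim does not hold.
demo() ->
    S = "abc",
    true = is_list(S),              % a string literal is a plain list
    true = (S =:= [97, 98, 99]),    % ... of integer character codes
    true = (S =:= [$a, $b, $c]),    % $a is just another way to write 97
    3 = length(S),
    %% List functions apply directly, as Masklinn's answer suggests.
    "cba" = lists:reverse(S),
    %% ASCII-only uppercasing, purely for illustration.
    "ABC" = lists:map(fun(C) -> C - 32 end, S),
    ok.
```

Calling `string_as_list:demo()` returns `ok` when every match succeeds.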
He then suggested the following options for handling large text:
“– binaries (features representation that is 1:1 with the character encoding itself, now also (R12B) with efficient scanning and tail-construction)
– iolists (features cheap concatenation of large texts)
– list of words and a word-dictionary (features quicker scanning of words, efficient storage too)”
Then Kevin Scaldeferri threw in a different point, arguing that a list is not a good representation:
“First off, I append to strings a lot more than I prepend to them. Yeah, I could work with reversed strings, but that’s a hack to deal with using the wrong data type. Plus, I probably prefix match more often than suffix matching (although this is less lopsided than append vs. prepend) …”
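The trade-offs behind these two posts can be sketched in a few lines. This is my own illustration under simple assumptions, not code from the thread, and the module name `text_repr` is hypothetical. It contrasts prepending and appending on a list, prefix matching on a binary, and the cheap “append by wrapping” that iolists allow.

```erlang
-module(text_repr).
-export([demo/0]).

%% Each match below crashes with badmatch if the claim does not hold.
demo() ->
    %% Prepending to a list allocates a single new cons cell.
    "xabc" = [$x | "abc"],
    %% Appending with ++ has to copy the entire left-hand list.
    "abcx" = "abc" ++ "x",

    %% A binary is a contiguous sequence of bytes.
    Bin = <<"hello world">>,
    11 = byte_size(Bin),
    %% Prefix matching on a binary with the bit syntax.
    <<"hello", Rest/binary>> = Bin,
    <<" world">> = Rest,

    %% An iolist may nest binaries, strings, and bytes arbitrarily.
    %% "Appending" wraps the old value in a new list instead of copying it.
    IoList0 = [<<"hello">>, " "],
    IoList1 = [IoList0, "world"],
    11 = iolist_size(IoList1),
    <<"hello world">> = iolist_to_binary(IoList1),
    ok.
```

The iolist is only flattened when `iolist_to_binary/1` is called; functions that accept iodata, such as `file:write/2` and `gen_tcp:send/2`, can take it as is.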
Now we have a diversion: Lev Walkin responded to Robert’s earlier post, which argued that a list is an ideal string representation because it can easily accommodate UTF-16 and UTF-32.
“Small correction: UTF-16 and UTF-32 are practically dead, you certainly need to think in terms of UTF-8 nowadays.”
Now we have several threads going on under the same subject. Let’s get to them in my next post. For now I will ignore the UTF-8 vs UTF-16/32 discussion in order to focus on Erlang features.
End of Part 1
This is a summary, so quotes are edited liberally for clarity rather than reproduced word for word.