My development logbook

Erlang Mailing List - String as List 1

A discussion named “String as List” has been running, at the time of writing, for 6 days and 56 posts already. A lot of interesting ideas and insights into Erlang have come out of it. I want to record some of the interesting technical points here as a digest and a reference.

It all started with a question from tsuraan:

“Why does erlang internally represent strings as lists?”

The first response came from Masklinn, and many other contributors shared the same view:

“A lot of functional languages represent strings as lists rather than arrays because lists are their basic collection datatype, and this allows the use of all the list-related functions on strings.”

Christian S put it more briefly: “historical reason”. As he pointed out, “internally” is not the right word in the question, because a string is not a special type or an abstraction built on top of a list. A string is a list.
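
To make “a string is a list” concrete, here is a minimal sketch. The module name string_is_list and the function demo/0 are my own hypothetical names, not something from the thread; the calls themselves (is_list/1, length/1, lists:reverse/1) are standard Erlang.

```erlang
-module(string_is_list).
-export([demo/0]).

%% Returns a list of booleans; each element checks one claim.
demo() ->
    Str = "abc",
    [is_list(Str),                      % a string literal is a plain list
     Str =:= [97, 98, 99],              % ...of integer character codes
     lists:reverse(Str) =:= "cba",      % list functions work on it directly
     length(Str) =:= 3,                 % so does the length/1 BIF
     [C - 32 || C <- Str] =:= "ABC"].   % and so do list comprehensions
```

Calling string_is_list:demo() should return a list of five true values, since every comparison treats the string as an ordinary list of integers.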

He then suggested the following representations for handling large text:

“– binaries (features representation that is 1:1 with the character encoding itself, now also (R12B) with efficient scanning and
– iolists (features cheap concatenation of large texts)
– list of words and a word-dictionary (features quicker scanning of words, efficient storage too)”

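
The first two options in the list above can be sketched roughly as follows. This is only an illustration under my own assumptions: the module name text_repr and the sample text are hypothetical, and the sketch covers binaries and iolists only, not the word-dictionary idea.

```erlang
-module(text_repr).
-export([demo/0]).

%% Each match below acts as an assertion; demo/0 returns ok if all hold.
demo() ->
    Bin = <<"hello">>,                       % binary: one byte per character for ASCII text
    5 = byte_size(Bin),
    <<First, _Rest/binary>> = Bin,           % pattern-match the first byte
    $h = First,
    IoList = [Bin, " ", ["wor", <<"ld">>]],  % iolist: nests lists and binaries without copying them
    11 = iolist_size(IoList),
    <<"hello world">> = iolist_to_binary(IoList),  % flatten once, when actually needed
    ok.
```

The point of the iolist is that concatenation is just building a new nested list; the parts are only copied if and when the result is flattened.
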
Kevin Scaldeferri then raised a different point, arguing that a list is not a good representation:

“First off, I append to strings a lot more than I prepend to them. Yeah, I could work with reversed strings, but that’s a hack to deal with using the wrong data type. Plus, I probably prefix match more often than suffix matching (although this is less lopsided than append vs. prepend) …”
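
The asymmetry Kevin describes can be sketched like this. The module and function names are hypothetical ones I made up for illustration; the `++` operator, lists:foldl/3, lists:reverse/1, lists:prefix/2 and lists:suffix/2 are standard.

```erlang
-module(append_cost).
-export([prepend/2, append/2, build_reversed/1, starts_with/2, ends_with/2]).

%% Prepending is O(1): one new cons cell in front of the existing list.
prepend(Char, Str) -> [Char | Str].

%% Appending copies the whole left-hand list: O(length(Str)).
append(Str, Char) -> Str ++ [Char].

%% The "reversed string" workaround: accumulate by prepending, reverse once at the end.
build_reversed(Chars) ->
    lists:reverse(lists:foldl(fun(C, Acc) -> [C | Acc] end, [], Chars)).

%% Prefix matching can stop after walking only the prefix.
starts_with(Prefix, Str) -> lists:prefix(Prefix, Str).

%% Suffix matching has to reach the end of the list first.
ends_with(Suffix, Str) -> lists:suffix(Suffix, Str).
```

For example, append_cost:append("abc", $x) gives "abcx" but has to copy "abc" to do so, while append_cost:prepend($x, "abc") gives "xabc" without touching the existing list.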

Now a diversion: Lev Walkin responded to an earlier post by Robert, which argued that a list is an ideal string representation because it can easily accommodate UTF-16 and UTF-32.

“Small correction: UTF-16 and UTF-32 are practically dead, you certainly need to think in terms of UTF-8 nowadays.”

From here on, several sub-discussions run under the same subject line. I will get to them in my next post; for now I will ignore the UTF-8 vs. UTF-16/32 discussion in order to focus on Erlang features.

End of Part 1

This is a summary, so quotes are edited liberally for clarity rather than reproduced word for word.