Erlang Mailing List - String as List 2

Zvi responded to Christian S’s “recommended large text mass handling modules”. He said, if for efficiency we represent a string as binary, the code will become too verbose. For example,

<<“ABC”>>, instead of “ABC”
<< S1 / bytes, S2 / bytes >> instead of S1++S2
using file:delete(binary_to_list(Filename)) instead of file:delete(Filename)
xmerl and erlsom parse into lists and not binaries

Dustin Whitney concurred the verboseness, and recommend a new string type.

Bjorn Gustavsson responded to eariler Kevin Scaldeferri’s opinion on list inefficiency. He suggested, in the case of appending:

“You can append by building a deep list and only flatten it at the end.

NewString = [AListOfChars|AnotherListOfChars]

or
NewString = [AListOfChars,ACharacter]

Or you can simply do a recursion (not tail-recursion) and use
the ‘++’ operator. That will be efficient, because the recursion will
ensure that the ‘++’ operators are executed in a right-to-left order.”

Christian S suggested an alternative to one of Zvi’s string-as-binary example (“ << S1 / bytes, S2 / bytes >> instead of S1++S2”):

“[S1,S2] and then do iolist_to_binary/1 if you need it flat.”

Then focus of discussion shifted to a tool called Leex for a while.

Richard A. O’Keefe weighted in and expressed his opinion on a number of issues.

First of all, the String as list should not be dismissed as “historical reason” only:

“it is simplicity (the preferred
sequence type in Erlang is lists, and strings are just sequences of
characters), power (because any time someone defines a function on
lists you get to use it on strings, and there are lots of useful
list functions), and processing efficiency (because working down
one character at a time doesn’t require allocating any new storage,
not even for slices).”

He suggested the following rule of thumb for text handling

“The guiding rule is
– if you just want to hold onto a string for a while, use a binary
– if you want to build or process a string, use a list (possibly in
Erlang a deep list).
– if you want to represent something that has structure, and you want
your program to be aware of that structure, turn it into a
structured
data value and work with it in that form.”

and some myth-busting on appending string performance issue:

“Right, this is not efficient. But it is spectacularly
inefficient in programming languages with more conventional
representations.
It is O(n**2). For example,
x = ”“
for (i = 1; i <= 100000; i++) x = x “a” just took 30.5 seconds in awk on my machine, 62.2 seconds in Perl, and a massive 631 seconds in Java. That was using gcj; I lost patience with the Sun SDK and killed it. (AWK faster than Java? Yes, it often is.) Building the same string in Erlang using loop(100000, “”) where loop(0, S) –> lists:reverse(S);
loop(N, S) –> loop(N-1, “a”++S).
takes 0.15 second on the same machine.“

Masklinn pointed out later that Java benchmark is off because the example is not using StringBuffer.

End of Part 2

Timeline

My development logbook

Erlang Mailing List - String as List 2