Zvi responded to Christian S’s “recommended large text mass handling modules”. He said that if we represent strings as binaries for efficiency, the code becomes too verbose. For example:
- <<"ABC">> instead of "ABC"
- <<S1/bytes, S2/bytes>> instead of S1++S2
- using file:delete(binary_to_list(Filename)) instead of file:delete(Filename)
- xmerl and erlsom parse into lists and not binaries
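To make the verbosity comparison concrete, here is a minimal sketch of the two concatenation forms side by side. The module and function names (binary_strings, concat_binaries/2, concat_lists/2) are hypothetical and chosen only for illustration; they do not come from the thread.

```erlang
-module(binary_strings).
-export([concat_binaries/2, concat_lists/2]).

%% Hypothetical helper: concatenate two binaries with the bit syntax,
%% the more verbose form from the list above.
concat_binaries(S1, S2) when is_binary(S1), is_binary(S2) ->
    <<S1/bytes, S2/bytes>>.

%% Hypothetical helper: concatenate two list strings with ++.
concat_lists(S1, S2) when is_list(S1), is_list(S2) ->
    S1 ++ S2.
```

For instance, binary_strings:concat_binaries(<<"ABC">>, <<"DEF">>) builds a new binary, while binary_strings:concat_lists("ABC", "DEF") builds a new list.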
Dustin Whitney agreed about the verbosity and recommended a new string type.
Bjorn Gustavsson responded to Kevin Scaldeferri’s earlier opinion on list inefficiency. For the case of appending, he suggested:
“You can append by building a deep list and only flatten it at the end.
NewString = [AListOfChars|AnotherListOfChars]
or
NewString = [AListOfChars,ACharacter]
Or you can simply do a recursion (not tail-recursion) and use
the ‘++’ operator. That will be efficient, because the recursion will
ensure that the ‘++’ operators are executed in a right-to-left order.”
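The two approaches in the quote could be sketched roughly as follows. The module name deep_append and the functions deep/2 and join_all/1 are hypothetical names used only for this illustration.

```erlang
-module(deep_append).
-export([deep/2, join_all/1]).

%% Hypothetical helper: "append" by nesting both parts in a new list
%% (no copying at this point) and flatten only once at the end.
deep(A, B) ->
    Deep = [A, B],
    lists:flatten(Deep).

%% Hypothetical helper: body (non-tail) recursion with ++.
%% The innermost ++ runs first, so the operators are applied
%% right-to-left and each string is copied only once as a left operand.
join_all([]) ->
    [];
join_all([S | Rest]) ->
    S ++ join_all(Rest).
```

Here deep_append:deep("ab", $c) mirrors the [AListOfChars,ACharacter] form, and deep_append:join_all(["ab", "cd", "ef"]) mirrors the recursive ++ form.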
Christian S suggested an alternative to one of Zvi’s string-as-binary examples (“<<S1/bytes, S2/bytes>> instead of S1++S2”):
“[S1,S2] and then do iolist_to_binary/1 if you need it flat.”
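As a small sketch of that suggestion: the parts are kept in a nested list and flattened into a binary only when a flat value is needed. The module name iolist_demo and the function join/2 are hypothetical.

```erlang
-module(iolist_demo).
-export([join/2]).

%% Hypothetical helper: S1 and S2 may each be a binary or a list of
%% bytes; wrapping them in a list copies nothing. The flat binary is
%% produced only by the final iolist_to_binary/1 call.
join(S1, S2) ->
    IoList = [S1, S2],
    iolist_to_binary(IoList).
```

A call such as iolist_demo:join(<<"foo">>, "bar") mixes a binary and a list string in the same iolist.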
The focus of the discussion then shifted to a tool called Leex for a while.
Richard A. O’Keefe weighed in and expressed his opinion on a number of issues.
First of all, representing strings as lists should not be dismissed as merely a “historical reason”:
“it is simplicity (the preferred
sequence type in Erlang is lists, and strings are just sequences of
characters), power (because any time someone defines a function on
lists you get to use it on strings, and there are lots of useful
list functions), and processing efficiency (because working down
one character at a time doesn’t require allocating any new storage,
not even for slices).”
He suggested the following rule of thumb for text handling:
“The guiding rule is
- if you just want to hold onto a string for a while, use a binary
- if you want to build or process a string, use a list (possibly in
Erlang a deep list).
- if you want to represent something that has structure, and you want
your program to be aware of that structure, turn it into a
structured data value and work with it in that form.”
He also did some myth-busting on the performance of string appending:
“Right, this is not efficient. But it is spectacularly
inefficient in programming languages with more conventional
representations.
It is O(n**2). For example,
    x = ""
    for (i = 1; i <= 100000; i++) x = x "a"
just took 30.5 seconds in awk on my machine, 62.2 seconds in Perl, and a massive 631 seconds in Java. That was using gcj; I lost patience with the Sun SDK and killed it. (AWK faster than Java? Yes, it often is.) Building the same string in Erlang using loop(100000, "") where
    loop(0, S) -> lists:reverse(S);
    loop(N, S) -> loop(N-1, "a"++S).
takes 0.15 second on the same machine.”
Masklinn later pointed out that the Java benchmark is off because the example does not use StringBuffer.
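For readers who want to try the Erlang side of that comparison, here is a rough sketch that wraps the quoted loop/2 in a module and times it with timer:tc/1. The module name append_bench and the time/1 wrapper are assumptions added for illustration; timings will differ from the figures quoted above depending on machine and Erlang version.

```erlang
-module(append_bench).
-export([loop/2, time/1]).

%% loop/2 as quoted in the thread: prepend "a" N times, then reverse.
%% Each "a" ++ S copies only the one-element left operand.
loop(0, S) -> lists:reverse(S);
loop(N, S) -> loop(N - 1, "a" ++ S).

%% Hypothetical wrapper: returns {Microseconds, LengthOfResult}.
time(N) ->
    {Micros, Result} = timer:tc(fun() -> loop(N, "") end),
    {Micros, length(Result)}.
```

Calling append_bench:time(100000) reproduces the string size used in the quoted measurement.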
End of Part 2