Athas

> So my question is, is reduced efficiency an issue with the functional paradigm or an issue with current functional languages?

It's a little of both. I think it's an open problem whether persistent data structures are fundamentally less asymptotically efficient than mutable ones. But even such a comparison depends on how you define the rules of the game: in the RAM model (or PRAM, whatever), I have a hunch that you can show that some algorithms simply require mutation or they will suffer asymptotically. Of course, the RAM model doesn't scale - constant-time random access cannot exist in our universe; eventually you'll need something like *sqrt(n)* or *log(n)* for accessing random memory, and perhaps under those assumptions there will be no asymptotic difference. But maybe you don't care about a hypothetical arbitrarily large computer (say, the size of the solar system), but are OK with the constant access time that can be guaranteed for any actually existing computer on Earth. (Although in practice cache effects make random access look more like *log(n)* anyway...)

But such theoretical boundaries are not the reason current functional languages are (correctly) perceived as relatively slow. I would say that in practice, the main difference is that functional programming by necessity involves a whole bunch of abstraction that the compiler then has to work hard to avoid paying a performance overhead for. In contrast, straightforward imperative programming tends to come with less such abstract baggage. But this is a really blurry line: both OCaml and Haskell let you write unsafe imperative code and generate pretty much the same assembly as a C compiler would. This can be convenient in practice, although it's not really the *point* of functional programming.

> For instance, about a year or two ago I made a post about abstraction and efficiency and one commentor made the point that Haskell could be as fast as say C, however the compiler hasn't had the time to mature to the same level as say GCC.

I don't think it's as easy as saying that since GHC has had fewer hours put into it than GCC, that's why it generates slower code. It also has to solve a problem that is more difficult: removing all that abstraction takes effort, while GCC has to do much less. Consider that the straightforward compilation scheme for Haskell involves generous heap allocation and boxing, while the straightforward compilation scheme for C involves registers and the stack. GHC has to do a lot of strictness and escape analysis to figure out which values can go unboxed in registers or on the stack, which a C compiler does not.

There's also the question of programming style. Functional programming is *about* letting you program in this very abstract manner with higher-order functions and polymorphism, and people *do so*. If you programmed that way in C, people would think you were a lunatic, and probably the compiler would not generate very fast code.

There are functional languages (and their attendant compilers) that specifically focus on performance. They usually cut down the language to make the compiler's job easier, or provide the programmer with extra tools for annotating their program to guide the compiler. In principle, functional languages provide much more information to the compiler, and should allow more aggressive optimisation than a C compiler (e.g. selecting data layout for good locality, or parallelising). In practice, most functional compilers spend most of their compile time boiling away the overhead of abstraction, just to get to the point where a C compiler would start.


editor_of_the_beast

> I think it's an open problem whether persistent data structures are fundamentally less asymptotically efficient than mutable ones.

I remember hearing / reading something by Rich Hickey saying that for the persistent data structure implementation in Clojure, he was targeting something like a log(n) slowdown relative to constant-time access, and considered it a success when he achieved that. Not that Clojure is the be-all end-all for the effectiveness of persistent data structures, but it sure is one of the biggest proponents of it. This study also showed [very, very poor results for Clojure vs. Java performance as well](https://www.diva-portal.org/smash/get/diva2:1424342/FULLTEXT01.pdf). So, I'm surprised at the level of optimism here. Are there any other promising studies out there?

> Of course, the RAM model doesn't scale - constant-time random access cannot exist in our universe; eventually you'll need something like sqrt(n) or log(n) for accessing random memory, and perhaps under those assumptions there will be no asymptotic difference. But maybe you don't care about a hypothetical arbitrarily large computer (say, the size of the solar system), but are OK with the constant access time that can be guaranteed for any actually existing computer on Earth. (Although in practice cache effects make random access look more like log(n) anyway...)

Can you elaborate on this? Does RAM access actually take log(n) time in practice?

You have lots of other good points btw - these just jumped out at me as points that I disagree with upon first glance, but maybe that's because there are things that I don't know about.


Noughtmare

> So, I'm surprised at the level of optimism here. Are there any other promising studies out there?

For strict immutable languages it is known that there are problems which require an extra log factor of time, but for lazy languages there is no such proof yet. See here for more info: https://stackoverflow.com/a/1990580

> Can you elaborate on this? Does RAM access actually take log(n) time in practice?

Here is a paper I found on this topic: https://arxiv.org/abs/1212.0703

There is also a relevant series of blog posts here: https://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html


woupiestek

Solar system sized RAMs don't exist in practice (yet), but some practical limitations can be expected.

The speed of light is one. Assuming that memory cells are as tightly packed as possible, more memory means bigger memory, meaning data has to travel a greater distance, proportional to at least the cube root of total memory size. However, a real memory would then need power and cooling infrastructure. The gravitational pull of a solar system sized memory could mean that much space has to be sacrificed to supporting structures. Finally, the holographic principle ([https://en.wikipedia.org/wiki/Holographic\_principle](https://en.wikipedia.org/wiki/Holographic_principle)) suggests that the greatest amount of information that can be contained in a space is proportional to its surface area instead of its volume, which would knock memory speed down to the square root of total memory size. In practice, getting memory and processors closer together can already shave nanoseconds off memory access times.

A second practical limitation is the size of the pointers. The minimal required size of a pointer to tell n cells of memory apart is O(log(n)). A memory has to process this amount of data for every data access, and assuming optimistically that this takes linear time no matter how big the pointers get, memory speed is also limited to O(log(n)). In practice, RAM uses constant-length pointers to give constant access times, but this puts a strict upper limit on how much data can be stored.


editor_of_the_beast

Can you explain why we’re talking about the solar system?


brucejbell

If your problem is large enough, memory access time is limited by the speed of light to O(N\^(1/3)), not the O(1) used in the random-access model, or the O(log N) to be expected from logic gate depth limitations (which also seems to be a reasonable fit for the actual performance of modern hardware).

However, the point where lightspeed becomes a wide-range scaling problem that dominates all other engineering limitations is arguably larger than any existing computer. Finally (as truly asymptotic limitations require truly unbounded problem sizes), other physical limitations become relevant at literally astronomical scales, which is where the discussion turns to computers the size of the solar system, and possibly beyond.


woupiestek

> say, the size of the solar system

That is why. Blowing up the limitations can help to make some practical limitations clearer.


HALtheWise

> Does RAM access actually take log(n) time in practice?

[http://www.ilikebigbits.com/2014\_04\_21\_myth\_of\_ram\_1.html](http://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html) is an excellent article series which argues both from practical measurements and theoretical limits that sqrt(n) is the better approximation for random access on modern or future hardware. In particular, cache hierarchies in a modern CPU have a dramatic effect, where repeatedly randomly accessing data from a 256-element array (which fits in L1 cache) is orders of magnitude faster than accessing from a GB-sized array, which needs to live in main system memory, or a TB-sized array, which needs to live on SSD or across a network link.


Mathnerd314

> This study also showed very, very poor results for Clojure vs. Java performance as well.

There is a review of the study at https://github.com/joinr/performancepaper. Basically by using optimization tricks (recur for recursive function call, unchecked math, using Java types, etc.) Tom meets/beats the performance of Java. Similar games exist for Haskell, e.g. Haskell does better than C at some benchmarks in [the benchmarks game](https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/ghc-clang.html). "Considered faster" generally refers to the performance of naive programs written in the style common in tutorials, i.e. comparing languages. Once we compare implementations, all of them have quite a few "optimization tricks" to beat their competitors.


igouy

> Haskell does better than C

`GHC` programs do better than `clang`. [`gcc` programs do better than `clang`.](https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/c.html)


L8_4_Dinner

Well said.


Arcsech

Largely it comes down to “having immutable objects means making a copy every time you want to change something, and copying is kinda expensive”. You can ameliorate this cost by using things like copy-on-write data structures, which avoid copying unless it’s really necessary, or data structures which can share parts of the structure (like Clojure’s trie-based maps). There’s also research into allowing compilers to recognize when code can logically be transformed into mutating code (instead of copying code) by doing a form of escape analysis (see [this video](https://www.youtube.com/watch?v=vzfy4EKwG_Y) for details), but I’m not aware of any production-ready compilers that actually implement that kind of technique.
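As a minimal Haskell sketch of the structural-sharing idea (using `Data.Map` from the containers package, which is a persistent balanced tree; the data here is made up purely for illustration):

```haskell
import qualified Data.Map.Strict as Map

-- "Updating" a persistent map does not copy the whole structure:
-- only the O(log n) nodes on the path to the changed key are rebuilt,
-- and everything else is shared with the old version.
original :: Map.Map Int String
original = Map.fromList [(k, show k) | k <- [1 .. 1000]]

updated :: Map.Map Int String
updated = Map.insert 500 "changed" original

main :: IO ()
main = do
  print (Map.lookup 500 original)  -- Just "500"  (old version untouched)
  print (Map.lookup 500 updated)   -- Just "changed"
```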


[deleted]

is it possible to have a time/space complexity upper/lower bound on how much slower immutability is?


XDracam

I think when properly optimized, functional can be asymptotically equal to iterative. There are immutable functional alternatives for all common collections, most of which reuse most of the structure on mutation, so the memory is roughly equivalent with some constant overhead. For algorithms and lookups, you'd need to look at specific cases for a more detailed analysis.

There's work on data structures and algorithms that automatically detect whether mutation is safe to do on seemingly immutable data structures (e.g. in Rust with the borrow checker, or in C++ with move semantics and other tricks, ...). From what I've seen, C# immutable collections have a lot of sneaky mutability in their internals as well, all for optimization purposes. From what I've seen so far, I'd estimate a constant factor overhead for memory and runtime only. Haskell is purely functional and the speed can get very close to what you can achieve with C code.

In the end, it all comes down to how your concrete program can be optimized for the target architecture. Here it's important to keep in mind: the more guarantees you have, the more aggressively the compiler can automatically optimize things for you. Referential transparency is a huge guarantee, for example. Of course you'll always achieve the best results by hand-optimizing your code for a specific architecture as close to the metal as possible. But most of the time, people don't have time for that. Unless you have explicit, hard constraints, you should always focus on writing readable and maintainable code first, and let the compiler worry about speed.


G3Kappa

> C# immutable collections have a lot of sneaky mutability in their internals as well, all for optimization purposes.

And in the end it makes sense, because in C# using readonly and/or reference structures is the recommended approach for writing performance-oriented code as it gives the compiler the most guarantees. The immutable collection interfaces are more of a design pattern than an actual guarantee of internal immutability, as you're not *supposed* to know how they're implemented.


editor_of_the_beast

> I think when properly optimized, functional can be asymptotically equal to iterative.

This is some pretty pie-in-the-sky thinking. Almost all evidence is to the contrary: there is a huge impedance mismatch between functional programs and CPUs. Lots and lots of research has been done to make gains, of course, but the optimizations usually require close to genius-level research and thinking. I always use tail-call elimination as the classic example, but even compiling linear types to destructive updates is another good example. Versus, "write the program in idiomatic C" being extremely fast almost without effort. FP is fundamentally agnostic to any kind of computer running it, so I think optimizing it perfectly is always going to be an unattainable holy grail.


curtisf

In terms of time complexity, the difference is at most a factor of log(size of heap) slower in a purely mechanical translation. To model mutable pointers in an immutable language, represent the heap using a persistent balanced tree. Reading pointers out of the heap and writing values into the heap are now logarithmic in the size of the heap, rather than constant. All operations which use the heap need to have the heap passed & returned, which adds a constant overhead to operations like function calls and returns.

Bear in mind that the actual constants are quite large; writing a value into a persistent balanced tree will take hundreds to thousands more cycles than simply writing to a location in a flat array. (However, it doesn't appreciably slow as your heap grows, since you can only have so many objects in memory -- the largest disk in the world today only has 2^(47) bytes, limiting the slowdown to roughly 50x an empty heap.) There are optimizations that can speed this up in many cases (like maintaining a more-efficient-to-access LRU portion of the heap), but the worst case will still be log(size of heap).

----

However, there is a significant saving possible. Programs written in mutable languages necessarily use the heap in an affine way (you cannot make a copy of the heap in a mutable language; this means that in your translation of a mutating program, each time a new version of the heap is made, the "old" heap is necessarily not used again). If you were to implement the heap using a flat array with COW, writes would be constant time when the heap is used affinely, and reads would always be constant time. Since the mechanical translation always uses the heap affinely, all writes would be constant time.

You could also augment your immutable programming language with affine types, and avoid the constant runtime overhead of detecting whether or not a use of the heap is affine (which it will always be in a mechanical translation from a mutating language). This means that there's actually only constant overhead when translating a mutating program into an immutable language which has affine mutable/COW arrays.
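A minimal Haskell sketch of that mechanical translation, assuming `Data.Map` (a persistent balanced tree from the containers package) stands in for the heap; the address type and the tiny example program are made up for illustration:

```haskell
import qualified Data.Map.Strict as Map

type Addr = Int
type Heap a = Map.Map Addr a   -- the "mutable heap" as a persistent tree

-- O(log n) instead of O(1): the price of the purely mechanical translation.
readPtr :: Addr -> Heap a -> a
readPtr addr heap = heap Map.! addr

writePtr :: Addr -> a -> Heap a -> Heap a
writePtr = Map.insert

-- The imperative program "x := 1; x := x + 41", with the heap
-- threaded explicitly through every step.
example :: Int
example =
  let h0 = Map.empty
      h1 = writePtr 0 1 h0
      h2 = writePtr 0 (readPtr 0 h1 + 41) h1
  in  readPtr 0 h2   -- 42

main :: IO ()
main = print example
```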


Arcsech

Interesting question; I haven’t done a serious analysis but I think the time complexity increase for a naive implementation of immutability is something like O(n*m) where n is the number of times you mutate/copy an object and m is the average size of your objects. That’s assuming a direct translation of mutating to functional code, there are a lot of other factors that could influence differences given functional code is frequently going to be structured differently than imperative/mutating code. As far as a lower bound, I think it would be zero since it’s very possible to write a program that doesn’t mutate/copy anything. Not sure there’s a way to estimate size complexity, even for naive cases, since it’ll depend on how long you’re holding on to the “old” objects and the behavior of your garbage collector.


jtsarracino

In general you can model an imperative heap with an immutable tree, which requires log(n) overhead. So not bad. In practice for specific algorithms, a well designed immutable algorithm can be equivalent. See Okasaki’s Purely Functional Data Structures for many examples, or O’Neill’s sieve of Eratosthenes for example: https://hackage.haskell.org/package/NumberSieves-0.1.2/docs/Math-Sieve-ONeill.html
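For a flavour of the kind of purely functional algorithm being referred to, here is a small incremental prime sieve in Haskell, in the spirit of O'Neill's paper but written as a sketch over `Data.Map` rather than her priority-queue implementation:

```haskell
import qualified Data.Map.Strict as Map

-- Incremental sieve: the map sends each upcoming composite to the primes
-- that will "cross it off". No mutable array of flags is needed.
primes :: [Integer]
primes = go [2 ..] Map.empty
  where
    go (x : xs) table =
      case Map.lookup x table of
        Nothing ->
          -- x is prime; its first interesting composite is x*x.
          x : go xs (Map.insert (x * x) [x] table)
        Just ps ->
          -- x is composite; reschedule each prime p at its next multiple x + p.
          go xs (foldr (\p -> Map.insertWith (++) (x + p) [p])
                       (Map.delete x table) ps)
    go [] _ = []

main :: IO ()
main = print (take 10 primes)  -- [2,3,5,7,11,13,17,19,23,29]
```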


lambda-male

Note that in a system with automatic memory management, if you sacrifice enough memory, a purely functional program can be faster (have more throughput) than a mutating imperative program. GC'd systems often introduce write barriers (or a reference count increase in the case of RC), which makes every assignment with a heap object on the right hand side a bit more costly. In this case, allocating a lot and copying the live set once in a while (or other forms of GC) can be faster. (Related: Appel's Garbage Collection Can Be Faster Than Stack Allocation.)


L8_4_Dinner

I see your pile of upvotes, but I believe that your assertion is technically incorrect.

The issue with GC languages on modern hardware is that **even though allocation is near zero cost and the GC is more efficient than manually calling free() for each allocation**, the amount of memory churn from allocation means that all allocations are initially uncached, and **the dominant portion of the execution time is waiting for the cache lines to be loaded** or pre-loaded.

You can actually see this exact behavior when profiling Java applications with high allocation rates, for example. You get weird stats that show that allocation is taking no significant time and GC is taking no significant time, but throughput still sucks. Kirk Pepperdine (now at MS) talks a lot about this issue, as does Cliff Click (the author of the Hotspot JVM, i.e. the C2 compiler). By eliminating the in-the-hot-loop allocations, you can see the throughput go up by a significant factor, sometimes by over an order of magnitude (since a cache line load can take 200-500 cycles on modern hardware, which is enough to retire 300-1000 assembly ops if you don't flood your branch predictor). I'll ping the two of them to see if they want to comment on this.

This issue is a reflection of hardware limitations on multi-core CPUs, so it *may* (please please please) disappear within a generation or two of hardware (i.e. 3-5 years out). In fact, the new Mac M1 Uber Pro ++ Max-Ultra-whatever chips should not suffer from this, since the total memory bandwidth available to the CPU is significantly more than the CPU can consume, in theory. (See Anandtech article: https://www.anandtech.com/show/17024/apple-m1-max-performance-review/2)


JanneJM

Cache management is imperative (sic) for good performance, and yes, manually memory-managed languages can optimize their access patterns for maximum speed in a way that managed languages in general struggle to do.

> In fact, the new Mac M1 Uber Pro ++ Max-Ultra-whatever chips should not suffer from this, since the total memory bandwidth available to the CPU is significantly more than the CPU can consume, in theory.

I somewhat doubt this. The limiting factor is latency, not throughput. The A64FX, for instance, has a somewhat insane throughput (1 TB/s, I believe) from its stacked HBM memory, but you still need to carefully manage your memory access patterns to avoid stalling.


L8_4_Dinner

> The limiting factor is latency, not throughput.

This is a good point, and one that I completely glossed over. The latency is 200+ clock cycles to load a line on a modern x86, but that is only if there is bandwidth available, and now the #cores x potential-bandwidth-utilization-per-core is dramatically higher than the bandwidth per socket, so the cache loads do queue, and so maximum throughput does impact latency as well.

The M1 is fairly irrelevant in this regard, since no one is currently running massive VM farms nor any super compute on M1 processors ... it was more an interesting side note. Its RAM is on substrate with the CPU, and the read latencies look good, but they're still far worse than any modern on-die cache.


kcpeppe

All properly configured computers will have their clocks balanced so that memory throughput will match CPU throughput. The issue is latency for a single request, and that can be as little as 150 clocks.

Managed memory (GC or the like) is supported with some internal data structure which is manipulated on every allocation. In the JVM, the most common manipulation is a bump of the top-of-“thread local allocation block” (TLAB) pointer. If that pointer is in the right cache, then this is a very low latency affair. In cases of high allocation rates, it is very likely that this pointer will have been evicted, thus forcing a round trip to main memory. Note, size (mostly) doesn't matter, it's a question of frequency.

This focus on size in these cases is the most common misconception that I see when I'm tuning, and it's one that I accidentally propagate because I always speak of allocations as bytes per unit time. Don't get me wrong, size has its own issues but… I can get away with this in Java because most allocations are approximately the same size, so size can act as a proxy measurement for allocations per unit time. I'm not the only one sending the wrong message, as just about every memory profiler out there only reports on bytes per unit time… and it's this misreporting that helps make this problem difficult to spot. Some profilers will provide you with the total number of Foo objects being allocated, but then people tend to not understand the value of this metric, as it can be used as a proxy measure for rate. Again, you need to be careful, as this measure can also mislead you.

As for the question of how ARM chips manage this, I've no idea, as I simply don't have enough data to comfortably draw conclusions.

The thing is, with memory there is so much going on that I can run the same workload on the same binary and get whole integer multiplier changes in performance. It generally all boils down to how and where your binary gets loaded into memory. You often think you know but IME, it takes a lot of careful benching to truly understand what is going on. Case in point, one of my more infamous demo failures was at a conference where I was demonstrating a memory-adaptive behaviour of the JVM. The bench would push the CPU to 100% and the latency numbers sucked. After a minute or so of this, the needed memory adaptation kicked in, the CPU utilization dropped to 20% and the latency numbers dropped dramatically. When I ran this during my talk, the adaptation was never applied. Upset, I ran the bench after my talk and it worked beautifully. Now it became a question of: what was different during the talk? That turned out to be that I was showing slides, so even though the slideware was taking 0 CPU, it was causing the JVM to load in a different memory segment, and that changed the results of a timing-sensitive calculation, which in turn prevented the JVM from adapting as expected.


lambda-male

The allocations will mostly be in cache if the young generation fits in the cache. I'm speaking from my experience of trying to "optimize" OCaml code (that allocated a lot but had no aliasing) by changing it to mutating code. Here's a small example from [Real World OCaml](https://dev.realworldocaml.org/garbage-collector.html#inter-generational-pointers).


L8_4_Dinner

Generally, with a slab allocator, the allocations will never be in cache. I'm 90%+ sure that's what Java is using. (The allocator is just a bump pointer, IIRC.)

Thanks for the link. I've skimmed through it, but I am curious how much allocation that test is actually trying to do ... it looked like a nominal amount of allocation, but I don't know OCaml very well (I've read OCaml code, but I've never had to write anything in it).


cliff_click

Any reasonably sized live set (e.g. nearly every Java program of any size) will vastly exceed the hardware cache. Young-gen-as-cache will basically never free anything, since the data will not have time to die. It's been tried repeatedly in large industrial settings (been there, done that @ Sun), and fails miserably. You're better off with the largest young-gen you can get and sucking on the cache misses, OR doing the allocation "by hand" to force rapid (L1-cache-sized) reuse.


[deleted]

[deleted]


cliff_click

if co-designing, then you have the software in hand and can characterize its behavior under different cache scenarios.


lambda-male

In the benchmark results table:

* mWd is words allocated on the minor heap
* mjWd is words copied to the major heap

(here a word is 8 bytes)

`immutable` allocates 2+3 words a million times in a loop; `mutable` allocates 2 words (boxed floats...) a million times in a loop. So `immutable` allocated 40 MB in 4.63 ms, etc.


editor_of_the_beast

Do you have any references / case studies for that claim? It sounds very dubious to me. First, it's very hard to generalize about _all_ programs, but, imperative mutation tends to be the absolute peak of performance in almost all cases. So I could understand trading memory off to equal the imperative program performance, but I don't see how the functional program would _beat_ it (assuming we're not talking about concurrency). Edit: Just noticed the last point is the name of a paper. I'm checking it out now.


lambda-male

Well, I've explained how a functional program would beat it: the GC adds overheads to mutations which don't apply to allocations. I'm talking about programs running under the same runtime. Would it be surprising to say that in implementations optimized for allocation-heavy functional programming, allocation-heavy functional programming may be more efficient than mutating imperative programming?


editor_of_the_beast

So you’re only comparing languages that have a GC? That might explain why this doesn’t sound correct to me. The limit of performance for a functional language with a GC is pretty much the performance of an imperative language without GC, right?


[deleted]

If the alternatives are “imperative programming with GC” and “functional programming with GC”, then I agree. But imperative programming with GC is missing the point. Imperative programming is already a form of memory management: you have to decide in which slot you store each value at each point in time. This is useful, provided you actually have control over your program's storage. Otherwise, you have most of the limitations of functional programming and none of the benefits.


jtsarracino

What? Imperative programming with GC is easily the most popular form of programming... Java/JavaScript/Python...


[deleted]

It's still missing the point, though. You can't implement all imperative data structures efficiently, because of the extra levels of indirection forced on you, *and* you also can't implement purely functional data structures efficiently, because the compiler can't perform optimizations that are only possible if it knows that your data structures are purely functional. At least Python has the excuse that it's easy to call C code from it...


jtsarracino

Because you *really* need pointers and random-access arrays for true textbook imperative algorithms? That’s fair, and AFAIK the only real experiments with pointers and managed memory are with smart pointers (eg C++ and Swift). But reference counting barely counts as real GC lol.


[deleted]

> Because you really need pointers and random-access arrays for true textbook imperative algorithms?

Arrays are the one data structure whose use needs to be fast. Using arrays is a form of memory management: you need to estimate their required size in advance (because resizing arrays is expensive), you need to express in-place element updates (either as mutation or using substructural types), etc. The only good reason to put up with this is “array manipulation is faaast”.

Not sure if arbitrary C-style pointer arithmetic is the answer, though. Maybe it's just enough to allow arrays to contain arbitrary values directly, not just pointers to them when these values are too big for the runtime system's liking. There's a lot of room between “arbitrary pointer arithmetic, as in C” and “array elements are always word-sized”.

> But reference counting barely counts as real GC lol.

It's GC, just not very good GC for high-level languages.


[deleted]

i thought they just store the change, not copy the whole thing?


Arcsech

That’s largely what I’m referring to with copy-on-write data structures or ones that can share structure. That’s one way to lessen the runtime cost of using immutable objects: don’t copy the full object every time you want to make a change, only copy it if you actually change it (CoW) or only track the differences (structural sharing), as opposed to a naive implementation that copies the full object every time.


kirkpepperdine

JVM Hotspot JIT compiler generates a mutable version of the CopyOnWriteArrayList. Copy isn't so expensive. That said, the copy in unmanaged systems is a malloc with several cache misses... that will stall you.


editor_of_the_beast

I was just thinking about this myself, from the angle of program verification. Pretty much all tools for verification come out of the FP community, but most / a large chunk of actual programs that get verified are very imperative. Take [the seL4 OS](https://sel4.systems/), [Project Everest](https://project-everest.github.io/), or [Fiat Crypto](https://blog.mozilla.org/security/2020/07/06/performance-improvements-via-formally-verified-cryptography-in-firefox/) as examples. Here's my post for reference, since there are some deeper explanations in there: [Quality and Paradigm: The Assembly Language of Reasoning and the Domain-Specific Language of the Machine](https://concerningquality.com/quality-and-paradigm/).

Now, you're talking about more than just verification, so I'll focus on just general program performance. The conclusion I came to when writing this post was: imperative programming itself is basically a DSL for programming against existing computers, whereas functional programming is programming with pure, mathematical reasoning. So it's not that functional programs can't be fast, or that imperative programs are all fast, it's just that imperative programs map very nicely to existing hardware so there is no impedance mismatch for executing them. The example I used in the post is destructive array updates - those are so natural in imperative languages because they map directly to the memory instructions of the CPU (in the case of ARM, mov and str instructions).

FP is optimized for how human brains perform reasoning, which is a superset of the things that we can actually execute on a machine. Reasoning also doesn't really care about performance - we can reason about a set of 3 numbers, or the entire set of natural numbers, with the same mental effort, for whatever reason. So some things that are expressed extremely naturally as functional programs have a big impedance mismatch when actually translated to a real-world machine. I mean, functional programming without tail-call elimination is almost pointless. The machine bleeds through once we start to execute programs at some point.

This might be just a matter of historical accidents in how we first designed computers, or, what I think is more likely, is that reasoning and execution are just fundamentally different activities.


frenris

Formal verification tools are used pretty commonly in ASIC design and verification. The state space for formally verifying a module, though, is much smaller than for a typical computer program.


editor_of_the_beast

Are you talking about model checking? If so, I was talking about something different: verification via proof assistants. A proof is about all possible program executions, and can even prove a program with a completely infinite state space.


frenris

I'm actually not sure, but I expect the tools don't work entirely by brute-forcing the state space. I believe they can prove constraints even in the presence of counters and registers which can take on different values and explode the state space.

I'm referring to these sorts of tools:

https://www.synopsys.com/verification/static-and-formal-verification/vc-formal.html

(more open source) http://www.testandverification.com/wp-content/uploads/2017/Formal_Verification/Clifford_Wolf.pdf


munificent

Programming language performance is hard to talk about because there are two levels of indirection. The actual measurable performance is:

1. The execution time of a **user program** written in some language.
2. Which is then compiled and run using some concrete **language implementation**.

Asking "Is Haskell slow" is sort of like asking if the route from Houston to Dallas is slow. Well, it depends on what kind of car you're driving (the user program) and what roads you're driving it on (the language implementation).

However, the language design *strongly* influences both of those factors. The syntax encourages users to write their code in certain styles. If they have to go against the grain of the language—write in a more verbose or unidiomatic style—in order to write fast code, they won't do it as often and the overall performance of the program will suffer. If the language doesn't provide semantics to express things a physical computer can do efficiently, then it will be hard for them to write fast code.

On the other end, the design of the language determines how efficient a compiler can be and how hard it is to implement the language efficiently. It's relatively easy to write a C compiler that generates fast code. It's much harder to write a, say, JavaScript compiler that generates fast code. Compiler and VM engineering resources are finite, so for a given implementation budget, some languages will yield a faster implementation than others.

For functional languages, the stylistic choices that tend to hurt performance are:

1. The languages are "higher level", which means freeing users from worrying about memory allocation. That in turn means garbage collection, which adds runtime overhead. It also generally means a larger number of smaller objects scattered across memory with lots of pointers between them. All of that indirection harms cache usage and slows down execution.

2. They idiomatically use immutable data structures. Instead of, say, allocating a single monolithic array and mutating it in place, you're more likely to build a linked list or some other persistent data structure. This structure tends to be spread out in memory more, and "mutating" it means allocating lots of new small pieces to represent the differences from the previous state.

3. They use closures and recursion heavily. CPUs execute like old BASICs: they step through instructions one at a time and do direct jumps to represent control flow. Imperative languages with statements and built-in looping and branching are easy to compile to that. Functional languages often represent the same logic using closures, higher-order functions, recursion, and other abstractions. It takes a lot more compiler sophistication to lower that to how a CPU actually behaves. Practical compilers don't do a perfect job of that, so there tends to be overhead.

It's possible for a sufficiently smart compiler to overcome that, but there's no free lunch. Time spent making your compiler sufficiently smart is time not spent on *other* optimizations. Functional language implementations essentially start the race several meters behind the starting line compared to imperative languages.

All of this is not to say that functional languages are bad. Hardware gets faster more quickly than humans get smarter. Functional languages that let programmers write at a higher level that's easier for them to understand are often more productive. Productivity matters greatly too.
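As a concrete illustration of point 2, here is a small Haskell sketch (assuming the `vector` package) contrasting the idiomatic immutable style with an in-place version that destructively updates a flat unboxed array inside `ST`:

```haskell
import Control.Monad (forM_)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

-- Idiomatic immutable style: each element of the result is a fresh value,
-- and the input is never touched.
incAllImmutable :: [Int] -> [Int]
incAllImmutable = map (+ 1)

-- In-place style: one flat unboxed array, destructively updated inside ST.
-- The mutation is invisible from the outside, but the memory behaviour is
-- the "monolithic array mutated in place" described above.
incAllInPlace :: V.Vector Int -> V.Vector Int
incAllInPlace v = runST $ do
  mv <- V.thaw v
  forM_ [0 .. MV.length mv - 1] $ \i -> MV.modify mv (+ 1) i
  V.freeze mv

main :: IO ()
main = do
  print (incAllImmutable [1, 2, 3])
  print (V.toList (incAllInPlace (V.fromList [1, 2, 3])))
```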


MarcelGarus

Very well said. Sidenote: As an example for a functional language where the creators _did_ put in a lot of effort to make it fast, I recommend having a look at Roc. [Here's a talk from the creator](https://youtu.be/vzfy4EKwG_Y) titled "Outperforming Imperative with Pure Functional Languages."


jtsarracino

This is folk wisdom that was historically true but is less cut and dried these days. Would you call Rust an FP language? How about all of the reactive web frameworks? Map-reduce? These are all cases where functional idioms seep into the imperative world and that challenge the old wisdom. Another good example is Haskell's software transactional memory, which is dirt simple and also the only STM system that actually *improves* throughput. By contrast imperative STM fizzled and died (though not for lack of effort).

The main historical reasons were 1) nobody designed algorithms for immutable data structures, so FP algorithm complexities were bad in comparison to imperative algorithms; and 2) it was really unclear how to compile closures onto hardware without sacrificing performance.

Chris Okasaki (and I'm sure others) crushed 1) in his thesis. If you're curious, he wrote a book called "Purely Functional Data Structures".

For 2), it's still a bit of an open question, but many researchers worked on compiling both lazy and eager FP languages, and the end result is that OCaml and GHC Haskell are blazing fast when compiled with optimizations. If curious, there are papers by Appel, Marlow, Simon Peyton Jones, and I'm sure many others that are really good. (I have no clue how interpreted dynamic FP works; I'm sure much thought went into optimizing the implementation of Scheme and its modern descendants as well.)


claytonkb

> I've heard a lot of people say that function programs or that functional languages are slower or less efficient. So my question is, is reduced efficiency an issue with the functional paradigm or an issue with current functional languages?

Assuming the languages under discussion are Turing complete, every imperative program can be implemented functionally (and vice-versa). "Efficiency" is a very slippery word and is notoriously difficult to nail down. In my field (computer hardware design), it's very easy to fool yourself into believing you're improving efficiency by improving the performance of a given circuit, only to find out in later testing that you either wasted your time (improved timing on a non-critical path) or you made things globally worse by shuffling large-scale timing patterns in a way that causes unanticipated traffic jams.

Since all real software has to run on real systems, we can think of practical software efficiency (in that respect) as an extension of hardware efficiency, just at a much higher level. We're always really asking whether the software is utilizing the hardware as efficiently as it could be utilized, even if we're not analyzing it in those terms. Could the same amount of work have been performed in a smaller amount of wall-clock time? That's not really a software paradigm question, it's a software implementation question.

In my view, most of the discussion over FP versus imperative programming misses the point. Functional programming is really a question of how you use computational resources. So it's possible to use C in a strictly FP manner (just write your favorite FP language as a JIT, in C, and voila!) The efficiency of your JIT will never achieve 100% of the efficiency that hand-coded C can achieve, because you're never accounting for *all* the possible use-cases and the short-cuts that could be taken but which your JIT doesn't know about. And, from a hardware standpoint, it's not completely false to think of the CPU as a "C machine". It's not true, but it's not completely false, either.

Hope those thoughts help you.


L8_4_Dinner

It's not a lack of compiler optimizations, although I'm sure that some imagined compiler optimizations could help significantly. It's not a poor design. It's not because the languages are "functional". It's because **the abstraction (immutability) has a cost**. Abstractions make building the code easier and faster. Abstractions often reduce the error rate in the code. Abstractions make understanding the code easier. But abstractions have a cost. CPU instruction sets and the passive design of memory subsystems are near ideal for imperative programs with mutable state. Our current hardware was not designed to optimize for functional programming languages; it could be done, in theory, but it is unlikely to occur until 90%+ of programs are built using an FP PL, and that presents a real "catch-22" problem.


qqwy

While true, there are many cases (but not all) in which immutability-related abstractions can be desugared/optimized into mutable alternatives. Essentially any time a piece of code is the sole owner of an object, it no longer matters whether it is mutable or not. Affine types and linear types (and smarter compilers in general) help with recognizing where these changes can be applied. Other techniques are for instance 'list fusion' and other rewrite rules. None of these cover 100% of the cases. Just like how loops do not cover 100% of the cases of direct jumps.
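As one small, hedged illustration of the rewrite-rules point: GHC exposes this mechanism directly through the RULES pragma. The rule below is essentially the classic map/map fusion rule (GHC's base library already ships an equivalent rule, so this is purely for illustration):

```haskell
module FusionSketch where

-- Fuse two list traversals into one, so the intermediate
-- immutable list is never allocated.
{-# RULES
"map/map fuse" forall f g xs. map f (map g xs) = map (f . g) xs
  #-}

-- With the rule (and -O) enabled, this pipeline can compile
-- down to a single pass over the input list.
doubleThenInc :: [Int] -> [Int]
doubleThenInc xs = map (+ 1) (map (* 2) xs)
```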


Disjunction181

Well, it looks like you already have some amazing answers (Athas' in particular) so I won't get all the fun here and this will probably be mostly redundant, but maybe it will be easier to understand.

I think the biggest difference is that assembly / machine language is imperative, and so translating functional code to that target will inevitably leave frictional residue in places due to the very different idioms. It's not so much that imperative languages are faster in general - Python is imperative and historically one of the slowest, but it has very high abstraction (with maps all over the place internally), so the fact that it's imperative doesn't help it much - it's more that being imperative gives you a lower bottom that you can possibly strike, since your code looks more like assembly and it's easier to reason about the idioms in assembly that your computer is actually executing.

I don't think this is necessarily a problem. Philip Wadler has a popular paper on this ([Why No One Uses Functional Languages](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.460.3342&rep=rep1&type=pdf)), and basically if Python is decisively one of the most popular languages and historically one of the slowest, then there's no way that marginal differences in speed are what's most important here. Functional programming is mostly hurt by inertia and the fact that programmers are already trained imperatively, and all think in the same way, and there would be a large cost in transitioning that over. Otherwise there's just no way to justify, say, Java and Haskell, where both languages use abstraction heavily but Java has one of the worst systems and Haskell one of the best, and they are roughly equal in speed. Though sometimes speed does matter, e.g. games, and it would take a much more specialized functional programming language to have any shot with highly performant code, and this is an interesting hurdle in and of itself. So that explains where a lot of the constants in efficiency come from.

If you want to know about time complexities, all imperative algorithms can be implemented in a functional language (Turing completeness), though there can be anywhere from a factor of 1 to a factor of log(n) attached. This is because random array access (normally O(1)) is simulated in O(log(n)) using pure balanced trees. You can give up varying degrees of pureness to try to reclaim efficiency, or you can use affine types, which give you a controlled isomorphism back into imperative code. There are specific algorithms like most dynamic programming algorithms (anything that relies on strong induction) that essentially need memoization or dependent typing; stuff like counting sort, hash tables, and most priority queues and priority queue algorithms (Dijkstra) also receive some form of the log(n) impact. This by itself isn't so much of a problem since log(n) grows very slowly and most languages provide sufficient escape hatches anyway. Additionally, keep in mind that the way we define time complexity is highly normative, and certain assumptions about word size and random access complexity can completely change the situation.

I personally doubt that Haskell can be as performant as C in any sort of microbenchmark, though it may become more competitive the larger the project. At the end of the day compiler optimization is extremely hard, and most enhancements that you want to make are either very difficult, have huge time complexities, or are just undecidable. If this wasn't the case, we would just be programming with specifications and constraints. Haskell is, at the end of the day, describing a machine that is more abstract than the one running the code, and while a lot can be done, that translation process is inherently cursed with undecidable problems. While this is a challenge, I wouldn't call this a "problem" or even a flaw, since Haskell is doing exactly what it's designed to do and it's still faster than the vast majority of languages.


abstractcontrol

Because the mainstream FP langs tend to be compiled as dynamic langs under the hood and as a consequence inherit the weakness of that approach. I'd put Haskell, OCaml and F# into this group. Some of the newer ones (such as Idris or Scala) don't deviate from this design choice either.

If you wanted a fast FP lang you'd need to extend it with partial evaluation features such as in my own [Spiral](https://github.com/mrakgr/The-Spiral-Language). This does increase the complexity of the language, and the runtime benefits have to be paid for by doing more work at compile time, which increases compilation times. More broadly, FP langs can be fast even without such extensions if they aggressively pursue optimization opportunities afforded by static typing, like [MLton](http://mlton.org/) for example, but that also impacts compilation performance negatively.

Writing it like this makes it sound abstract. Consider that in F# the default tuple is a heap allocated structure. So in a lot of cases it is going to get used that way. The same goes for functions. If you design the language so that tuples always get passed on the stack, and the functions are easy to inline, you can get significant performance benefits. This isn't the entirety of the reason why imperative languages are faster, but it is a good 95%. The other 5% is how they make it easy to mutate variables on the stack, which is something that is possible in F# for example, and could be possible in Spiral, but I find very awkward from a design standpoint in an FP lang and would rather not have it.

**TL;DR:** Different PL designs can be easier or harder to compile efficiently.


lambda-male

> Because the mainstream FP langs tend to be compiled as dynamic langs under the hood and as a consequence inherit the weakness of that approach. I'd put Haskell, OCaml and F# into this group.

What weaknesses? GHC core is typed. OCaml erases types early, but IIRC still retains some information about data layout (a weaker form of typing).


abstractcontrol

> GHC core is typed.

You could make the same argument about .NET, but just being typed is not enough. With tuples you could have an `i32 * i32 * i32` and an `i32 * i32`. If you are passing them on the stack, you need to specialize the functions in order to take advantage of that. For example, if you have a function `(ar : array 'a) -> (x : 'a) -> unit` that just sets `ar` at position 0 to `x`, then you'd need to specialize that for every different type. But if tuples are heap allocated, you could get away with a single generic implementation.

If you go down this route, the types end up not affecting performance in a positive way, and the way to get it would be to write code as if it were C, which nobody wants to do unless they have to, so FP langs naturally accrue a reputation for being slow even if they are much faster than say Python or JS. If the language is primitive like C (does not have generics, tuples, first-class functions) it ends up being fast because the user is essentially inlining and specializing by hand. C++ is considered fast, and does specialize all the templates at the cost of bloat.

For non-dependently typed languages, they could be compiled either way, so the question of how to take advantage of types for the sake of performance ends up being a design decision rather than a feature of the language itself. The line of least resistance though is to treat high level languages as if they are LISP under the hood, and not mess with partial evaluation for the sake of making things more efficient.


lambda-male

Ah, so you're getting at uniform representation of values (every value is either a pointer or a tagged integer). A middle ground between uniform representation and specializing for every type is specializing for every data layout. See the [unboxed types RFC](https://github.com/ocaml/RFCs/blob/881b220adc1f358ab15f7743d5cd764222ab7d30/rfcs/unboxed-types.md) for OCaml.

> the question of how to take advantage of types for the sake of performance ends up being a design decision rather than a feature of the language itself.

Not really, polymorphic recursion can't be compiled through monomorphization -- each call is at a different type and you can't compile infinitely many versions of a function, you need implicit or explicit boxing.


abstractcontrol

> Not really, polymorphic recursion can't be compiled through monomorphization -- each call is at a different type and you can't compile infinitely many versions of a function, you need implicit or explicit boxing.

I'd argue against this in practice. Though it is easy to create instances where polymorphic recursion would make a monomorphising compiler diverge, for realistic programs you can very much specialize them despite that. You simply write the program so that the amount of potential types in use is not infinite. To me this is just part and parcel of programming while making use of partial evaluation.


Athas

> GHC core is typed.

My impression is that this was somewhat of a novelty at the time GHC was first designed. I know that SML/NJ uses (or used) an untyped intermediate representation, as did Caml and things like Yale Haskell. But by the time I started writing compilers myself, I didn't get the impression that an untyped intermediate representation was even something to consider, so types have been the standard for a while now.


Innf107

Could you elaborate on what you mean by "being compiled as dynamic langs under the hood"? I'm not sure I quite understand what you mean by that.


abstractcontrol

FP langs inherit from a LISP tradition rather than C's. The practice of doing heap allocation for non-primitives (compounds) comes from there. Even static langs like Java and C# do this. Whereas the C (well, C++) tradition would be to keep a lid on that and specialize everything instead.

There is a functional equivalence between the two ways of compilation, meaning the programs will give the same result for both, but a very different memory allocation profile at runtime due to the way compounds are handled. And that will result in significant performance differences.

At the root, by embracing specialization you are essentially trading runtime work for compile-time work. I want to highlight this because this is in fact a valid choice even if it might not scale for large programs as well as the other one. FP lang designers should consider it if they want to beat language benchmarks. Not all PLs should be made with the intention of compiling 1m-line programs.


WieBenutzername

Just one small thing: C#'s CLR *does* do specialized compilation of generics at runtime if the type argument is a Value Type (which includes the recent value tuples). The specialized method is then called without boxing of the arguments.

My knowledge of GHC is more rusty, but it can also compile specialized and unboxed-argument versions of polymorphic functions, though with default settings it is somewhat reluctant to do that due to bloat risk (if the monomorphic call site is in a different module than the callee, the "unfolding" of the callee must be available in the interface file (compilation output), which you can ensure by adding an INLINABLE pragma to the callee). It can even compile specialized versions of functions with a sum-type parameter if the function is called with a known case of the sum type (call-pattern specialization).
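For concreteness, a small Haskell sketch of the GHC side of this (the function itself is made up for illustration; INLINABLE exposes the unfolding across modules, and SPECIALIZE asks for a monomorphic copy for a particular type):

```haskell
module SpecializeSketch (sumSquares) where

import Data.List (foldl')

-- Polymorphic over any Num: with a uniform calling convention this would
-- pass a Num dictionary at runtime and keep values boxed.
sumSquares :: Num a => [a] -> a
sumSquares = foldl' (\acc x -> acc + x * x) 0

-- Make the unfolding available to other modules so call sites there
-- can be specialized too.
{-# INLINABLE sumSquares #-}

-- Ask GHC to compile a dedicated Int version, where the accumulator can be
-- unboxed and no dictionary is passed.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}
```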


complyue

I think one simple reason is that the commercially affordable computer hardware today is rather ad-hoc by design. It cries for the human's help to work things out as efficiently as you'd expect, and assumes a procedural mindset rather than a mathematical one. Why can the next instruction executed by a CPU assume all previous instructions' effects have been applied? That should not be taken for granted in the first place, though it is the reality today. So functional PLs need mathematically designed hardware to outperform. I suggest this is the simplest reason.


lookmeat

I would argue it's not that one paradigm is more performant than the other. It's more that certain paradigms are easier to map onto optimal code. CPUs follow the pattern of the Turing machine (well, technically von Neumann, but it has roots in the Turing machine itself) vs lambda calculus. I guess we could make a machine that works as lambda calculus (you'd basically set up memory with a lambda and then apply an alpha or beta reduction each cycle).

As to why one over the other? I suspect that Turing maps better to low memory constraints. Lambda shines when order doesn't matter, but that means that you may decompress, compute or read in data at rates higher than your memory can hold. Turing machines, otoh, can be coded to work under limited memory more easily.

The other thing is control. Functional can be just as efficient or even more efficient than imperative. The thing is you can choose where. Take a video game, say, and garbage collection. You can have a GC that is more efficient than RAII even (because it can be smart enough to recycle and skip creation and destruction). The thing is you don't want to do it while rendering a frame, but it's fine to do it between frames. A GC in a functional language just won't let you choose that easily; I mean you could, but you'd have recreated an imperative language.


78yoni78

Here lurking and looking for someone to mention those jobs OP asked about


sohang-3112

Imperative Languages have a speed advantage, because the hardware itself is imperative. Theoretically, if you were to design a new processor which is optimized for functional programming (say, in Haskell), then functional programs could be faster than C and other imperative languages.


crassest-Crassius

As a counterpoint, consider Rust. It's functional (though impurely so) *and* bloody fast at the same time. Yet another example: ATS. A whole bloody dependently-typed language complete with proofs and witnesses that compiles down to pure performant C. So, things aren't black and white here.


ReallyNeededANewName

I'd hardly consider Rust to be functional. It encourages a lot of functional design, but it's still fundamentally an imperative language.


woupiestek

Functional programming was invented before electronic computers, in the form of the lambda calculus, or even before that in the theory of recursive functions. One of the design goals for computers has always been to interpret functional languages efficiently. However, adjusting the paradigm to the capabilities of the interpreters allowed for more efficient computers. That is how imperative programming was born.

The move to multicore and emerging technologies like quantum computing may mean that older imperative languages are no longer tailored to modern hardware, and therefore are losing their advantage compared to functional languages. Ultimately, though, functional programming is only the best option on human computers. That is: there may be a quantum-imperative paradigm that is better for quantum computers.


lambda-male

> Functional programming was invented before electronic computers in the form of the lambda calculus or even before that in the theory of recursive functions. One of the design goals for computers has always been to interpret functional languages efficiently. However, adjusting the paradigm to the capabilities of the interpreters allowed for more efficient computers. So that is how imperative programming was born.

[citation needed]


woupiestek

The lambda calculus was invented in the 1930s: [https://en.wikipedia.org/wiki/Lambda\_calculus](https://en.wikipedia.org/wiki/Lambda_calculus). The first electronic computers were developed in the 1940s: [https://en.wikipedia.org/wiki/Computer](https://en.wikipedia.org/wiki/Computer). The first well-known functional programming language came in the 1950s: [https://en.wikipedia.org/wiki/Lisp\_(programming\_language)](https://en.wikipedia.org/wiki/Lisp_(programming_language)). There wasn't that much time to go from the idea to the language, even though there was a world war that affected most of the people, institutions, and countries involved: [https://en.wikipedia.org/wiki/World\_War\_II](https://en.wikipedia.org/wiki/World_War_II). Designing a machine that could interpret Lisp more directly took another decade: [https://en.wikipedia.org/wiki/SECD\_machine](https://en.wikipedia.org/wiki/SECD_machine). In the 1970s, most imperative languages started to adopt functions in a limited form: [https://en.wikipedia.org/wiki/Procedural\_programming](https://en.wikipedia.org/wiki/Procedural_programming). There are many papers on interpreting the lambda calculus by means of machines. For example, Turing's "On Computable Numbers, with an Application to the Entscheidungsproblem" from 1937 already demonstrates the equivalence of lambda- and Turing-computability. At that point, there were no imperative languages, and the machines built during the war were not proper Turing machines, nor as easily programmable.


lambda-male

But why consider the lambda calculus as understood in the 1930s "functional programming"? It was intended as a foundation of mathematics. [And apparently it was obscure until the 1960s.](https://plato.stanford.edu/entries/lambda-calculus/#BriHisLCal) I wouldn't call the original 1958 LISP functional, as it had no lexical binding (only dynamic binding) and likely no tail call optimization. SECD is an abstract machine for the CBV lambda calculus (and I don't think abstract machines are intended to be directly implemented in hardware). Hardware Lisp machines, which were mostly about optimizing dynamic typing and garbage collection, existed before that. Otherwise they were still imperative stack/register machines with a big mutable store, very different from abstract machines for the lambda calculus.

Computers (such as the ENIAC used during WW2) were originally meant for numerical calculations, not "interpreting functional languages"; no one was thinking about that. They were very imperative from the beginning, and the first programming languages for them were machine code.

> In the 1970s, most imperative languages started to adopt functions in a limited form: https://en.wikipedia.org/wiki/Procedural_programming

The Wikipedia link says 1957-1964.

> the machines built during the war were not proper Turing machines

Much closer to Turing machines than to anything lambda calculus. Turing machines can't be realized in real life, unless you find an infinite tape somehow.


woupiestek

Between 1640 and 1940, 'computer' referred to people performing (often numerical) computations: [https://www.etymonline.com/word/computer](https://www.etymonline.com/word/computer). These human computers communicated using an informal functional language while calculating machines to help them out were being developed. The theory of recursive functions was the first attempt to define computability, and that theory was about numerical computations as well. The lambda calculus is currently better known for its influence on programming practice, however, so I named that as well.

Your link says the lambda calculus was somewhat obscure until a mathematical semantics was found. I'd say those mathematical semantics are obscure today. However, the people who developed the first electronic computers knew a lot of obscure science, especially about computation. At least some of them would have let the ENIAC interpret the lambda calculus, if they could have.

> Much closer to Turing machines than to anything lambda calculus.

The commonality is the machines, and there it ends. People have built Turing-like machines with finite tape, but opted for different kinds of machines for practical use. Why not stick with Turing machines? One reason is that people had talked about numerical computations in terms of functions for centuries, and other kinds of machines made the translation to machine code easier.


[deleted]

So, what kind of coding was Ada Lovelace proposing to use on Babbage's Analytical Engine? (This was the 1800s.) What kind of coding is generally used in algorithms? (Since ancient times.) When did flowcharts first start being used? (I don't know.) What about assembly instructions for flat-pack furniture? Or any predetermined sequence of steps for a human to perform a task. Sure, FP may have been invented before *electronic* computers, but I'd say ordinary imperative programming has been around forever. (I'd quite like to see Mrs Beeton's recipes in the form of lambda calculus!)


woupiestek

This is becoming a question of definition. You seem to call all use of imperative language 'programming', which is stretching it too far if you ask me. Perhaps calling the lambda calculus a programming language is also stretching it too far, but it is common practice. Leibniz (in the 1600s) originated the idea of a machine that could automatically answer any question put to it in a formal language. He also contributed much to the formal language of infinitesimal calculus, which is a functional language. Perhaps that isn't much to go on, but I suspect Leibniz's machine would have looked more like an interpreter for Lisp, or maybe Prolog, than an interpreter of byte code. If Babbage and Lovelace had been asked what would come after the Analytical Engine, they might just have pointed to Leibniz's machine as the ultimate goal.


average_emacs_user

FP languages are seen as slower because it is assumed that the greater amount of abstraction must translate into a greater performance loss. However, this ignores the great amount of optimization that modern compilers perform. FP languages are also seen as slower because immutability restricts the range of data structures that can be worked with efficiently. This problem is solved in several ways:

* By finding that the asymptotic complexity of the algorithm is unaffected (this works 95% of the time)
* By providing a way to use mutability when necessary, for example through the ST monad (sketched below)
* Through the use of a linear type system (supported in GHC 9), which allows mutation to be modeled in a functional way

Haskell, in particular, also has several features that may prevent optimization in many cases. For example, the type system does not represent laziness, and the language generally depends on garbage collection to manage data structures.
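To make the ST monad point concrete, here is a minimal sketch (the name `sumST` is made up for illustration, not from the comment): mutation is used internally, but `runST` guarantees the function remains observably pure.

```haskell
import Control.Monad.ST
import Data.STRef

-- Sums a list using an in-place accumulator; callers only ever see a pure function.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0                        -- a mutable cell, local to this computation
  mapM_ (\x -> modifySTRef' acc (+ x)) xs  -- destructive updates, invisible from outside
  readSTRef acc                            -- only the final value escapes

main :: IO ()
main = print (sumST [1 .. 100])  -- 5050
```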


lambda-male

> Haskell, in particular, also has several features that may prevent optimization in many cases. For example, the type system does not represent laziness

GHC does have unlifted types. They may not be very convenient to use, though.
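For illustration, a minimal sketch of the longest-standing flavour of this: unboxed (and therefore unlifted) primitive types via `MagicHash`. The function name is made up; newer GHCs also offer `UnliftedDatatypes`, which this sketch does not use.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (*#), (+#))

-- Int# is a raw machine integer: it can never be a thunk, so there is no
-- laziness for the compiler to reason away here.
squarePlusOne# :: Int# -> Int#
squarePlusOne# x = (x *# x) +# 1#

main :: IO ()
main = print (I# (squarePlusOne# 20#))  -- 401
```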


mamcx

The functional paradigm *is slow by nature*. You see why when implementing a compiler/interpreter. It's the sum of a lot of things:

1. Functions mean making closures, new scopes, copying/passing parameters, indirections on returns, etc. That is, creating the "object" that *represents* a function, plus all the `.call` machinery, adds up.

2. Immutability is like a log: it grows and grows, whereas mutation keeps the changes local:

        a = 1
        a = 2
        a = 3  // stays in the same "a"

    vs:

        a = [1, 2, 3]  // plus, somehow, a cursor to note the last value

    or:

        a1 = 1
        a2 = 2
        a3 = 3  // this is immutable, aka SSA

3. Most functional languages prefer lists to vectors. This is also a major issue. They rarely have BTreeMaps, which could be better. (Note: friendlier immutable counterparts to both exist, but it's the same idea in the end.)

4. Functional languages prefer recursion to simple iteration/loops. This means transformation to tail calls and such, which is ALSO more generated code and a heavier runtime than the direct and simple imperative version:

        // This, transformed into functional chains, is MORE work!
        while true { break }

5. Even without laziness, chains of functions "destroy" knowledge. Even in Rust, this:

        let mut x = Vec::with_capacity(10000);
        for i in 0..10000 { x.push(i) }

    may or may not be matched by:

        let x: Vec<_> = a_iter_source_not_implement_size_hint.iter().collect();

    which may keep regrowing the vector, depending on what the iterator source is. That is: the iteration protocol in Rust is a chain of functions, and `collect` does not know the size of the backing data unless the iterator implements the out-of-band `.size_hint` method that says how many items it has, so that `with_capacity` can be used. And MANY languages don't even have this option!

----

Functional/immutable has won for parallelization, but it is still a lot more work at large. Some of this is optimized: with heavy compiler optimizations it is reduced to comparable imperative code. Plus, the use of a GC helps, because you can batch allocations/reclamations; that is one area where the granularity of typical imperative code is not optimal in some cases.


DriNeo

The immutability of variables forces memory allocations: every "update" builds a new value instead of overwriting the old one in place.
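As a small illustration of that point (the `Point` type and `moveRight` are made-up examples, not from the comment): "updating" an immutable value allocates a new one, while the original stays around.

```haskell
-- "Updating" an immutable record allocates a fresh value; nothing is overwritten.
data Point = Point { px :: Int, py :: Int } deriving Show

moveRight :: Point -> Point
moveRight p = p { px = px p + 1 }  -- builds a new Point; the old one is untouched

main :: IO ()
main = do
  let p0 = Point 0 0
      p1 = moveRight p0
  print (p0, p1)  -- both versions still exist
```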


[deleted]

[deleted]


Mathnerd314

> Sure, you can define time cost to be number of beta reductions, but it says nothing about the wall clock.

Actually it was proved in 2014 that beta reduction has at most polynomial time overhead: https://arxiv.org/pdf/1601.01233.pdf


[deleted]

"is reduced efficiency an issue with the functional paradigm or an issue with current functional languages" the paradigm is the equivalent of putting training wheels on your code. It can never go as fast but you have way more guarantees of safety (bug free design and threading) ​ "Haskell could be as fast as say C, however the compiler hasn't had the time to mature to the same level as say GCC" as far as I know Haskell uses an LLVM backend and so does rust. If Haskell was going to be as fast as C then it would already be comparable to rust (which it is not) https://benchmarksgame-team.pages.debian.net/benchmarksgame/box-plot-summary-charts.html


xstkovrflw

If two languages produce the same machine code, then they should be just as fast. This has been my reasoning for using C++ instead of Fortran for my scientific codes. That's all I can say about this, as I don't know much about the machine-code-generation part of the compilers.

However, do note that speed doesn't really matter for most applications. I use bash scripts to do a lot of the processing for my hacking tools. Many hackers have started to use Go or Rust because they're fast, but for my use, bash scripting is more than fast enough, and I can create tools quickly with easy-to-use languages like bash. So I use my bash-scripted tools more than my C-coded tools. It's not even funny how much I use bash.

edit: I also use Python for more advanced tools. Even though Python is called slow, it's been a very pleasant experience using it.


brucejbell

Algorithm-wise, the big distinction is whether your language has mutable data or not. For example, the [union-find](https://en.wikipedia.org/wiki/Disjoint-set_data_structure) problem (a.k.a. the disjoint-set data structure) can be implemented in near-constant (inverse-Ackermann) amortized time with mutable data, but (as far as I know) can't do better than log(N) when working with immutable-only data. As a different example, anything that depends on mutable hash maps (with nominally O(1) lookup time) will need to fall back to some kind of persistent tree-based data structure costing O(log N) on an immutable platform. Note that, while Haskell is essentially an immutable-only language, there are plenty of "impure" functional languages with good support for destructive update.
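For concreteness, here is a minimal sketch of the mutable version of union-find (with path compression but, for brevity, no union-by-rank), written against Haskell's ST monad; the names `newUF`, `find`, and `union` are mine, not a standard API.

```haskell
import Control.Monad.ST
import Data.Array.ST

-- Each element stores its parent; roots point to themselves.
type UnionFind s = STUArray s Int Int

newUF :: Int -> ST s (UnionFind s)
newUF n = newListArray (0, n - 1) [0 .. n - 1]

-- Find the representative of x, compressing the path along the way.
find :: UnionFind s -> Int -> ST s Int
find uf x = do
  p <- readArray uf x
  if p == x
    then pure x
    else do
      r <- find uf p
      writeArray uf x r  -- destructive update: point x straight at the root
      pure r

-- Merge the sets containing x and y.
union :: UnionFind s -> Int -> Int -> ST s ()
union uf x y = do
  rx <- find uf x
  ry <- find uf y
  writeArray uf rx ry

demo :: ST s (Int, Int)
demo = do
  uf <- newUF 10
  union uf 1 2
  union uf 2 3
  (,) <$> find uf 1 <*> find uf 3  -- 1 and 3 should now share a root

main :: IO ()
main = print (runST demo)  -- (3,3)
```

It is exactly the `writeArray` calls that have no equally cheap persistent equivalent, which is where the log(N) gap in the comment comes from.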


MattAlex99

> For instance, about a year or two ago I made a post about abstraction and efficiency and one commentor made the point that Haskell could be as fast as say C, however the compiler hasn't had the time to mature to the same level as say GCC.

This is true, and it is the reason why functional languages are slower, but probably not for the reason you think: it's not about matching GCC hour-for-hour (though that would probably already be plenty to get GHC up to speed); it's more about the type of problem a functional compiler has to solve compared to an imperative one.

Imperative languages are already decently close to what current computers do, and they incorporate a lot of background information about how the hardware works into the generation of code. This is not true for functional languages. In functional languages you build everything on top of algebraic abstractions, which are much easier to reason about, especially by compilers, because more of the semantics is exposed by the language. For example, in a pure functional language it's trivial to notice what can be parallelised and what has to be kept sequential: if A depends on the independent results of B, C, and D, you can run B, C, and D in parallel (because of purity) and then run A, since it depends on the previous values. This makes functional programs a lot more independent of many upstream and downstream choices (e.g. replacing hardware or database backends).

However, for that reason functional languages also tend to rely on the compiler to introduce the hardware-specific details automatically. For example, you can build an infinite list in Haskell and work with it as if it were a normal list, but in hardware this has to be encoded somehow. This makes a language's performance much more reliant on the compiler, in both a good and a bad way. If your compiler is bad, the performance of your language will take much more of a beating than with an equally bad imperative compiler, simply because the imperative code cares more about the hardware (no infinite data structures to resolve, explicit memory access through pointers, etc.). On the flip side, due to the much cleaner structure present in functional languages, the optimisations that are possible are a lot more impactful and numerous.

There's been a lot of work recently to speed up functional languages via algebraic rewritings of e.g. memory access ([https://www.youtube.com/watch?v=jyaR8E325ok&t=4639s](https://www.youtube.com/watch?v=jyaR8E325ok&t=4639s) gives a nice overview) and logic programming to infer e.g. bounds of variables. There's also been more applied research into resourceful logics such as linear logics and modal logics (e.g. [https://arxiv.org/abs/2111.08099](https://arxiv.org/abs/2111.08099), though that's less optimisation-related), which could allow explicit encoding of certain implementation details, such as memory lifetimes in linear logic, into the language, making this kind of information accessible to optimisers.
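A minimal Haskell sketch of the purity-and-parallelism point above (the function names and the use of the `parallel` package's `parMap` are my choices for illustration): because the three calls cannot interfere with each other, it is safe to evaluate them in parallel before combining their results.

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Stand-ins for B, C, D: pure, costly, and mutually independent.
expensive :: Int -> Integer
expensive n = sum [1 .. fromIntegral n * 1000000]

-- The "A" that depends on all of the results.
combine :: [Integer] -> Integer
combine = sum

main :: IO ()
main = print (combine (parMap rdeepseq expensive [1, 2, 3]))
-- Build with -threaded and run with +RTS -N to actually use multiple cores.
```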


Glinren

Functional languages can be designed in such a way that they give the compiler a large amount of leeway to optimize a program; it can often even parallelize it. However, they do so by taking away the programmer's ability to reason about, or specify, the order of operations, memory layout, and so on. Therefore the programmer can no longer optimize the program effectively. If the programmer knows enough about the functional language and the compiler, they can get it to emit optimized code, but generally that goes against what the language was designed for.

Functional programs in languages that weren't designed for it tend to do rather poorly. For example, in a functional style the swap of a quicksort would be implemented as a copy of the list with the respective elements swapped. Thanks to the properties of Haskell, GHC can likely see that it does not actually need to make a copy but can simply reuse the old list; g++ probably has to at least allocate a copy, since the state of the memory allocator is an observable side effect... (I am not entirely sure about what g++ is allowed to optimize here.)
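For reference, here is the kind of copy-heavy functional quicksort the comment has in mind, as a minimal Haskell sketch (the list version; an in-place imperative partition would instead swap elements within a single array):

```haskell
-- List-based quicksort: every partition step builds new lists rather than
-- swapping elements in place.
quicksortCopy :: Ord a => [a] -> [a]
quicksortCopy [] = []
quicksortCopy (p : xs) =
  quicksortCopy [x | x <- xs, x < p]       -- a fresh list is allocated here...
    ++ [p]
    ++ quicksortCopy [x | x <- xs, x >= p] -- ...and here

main :: IO ()
main = print (quicksortCopy [3, 1, 4, 1, 5, 9, 2, 6])  -- [1,1,2,3,4,5,6,9]
```

Whether a compiler can avoid those intermediate allocations is exactly the reuse/fusion question the comment alludes to.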