C++. Python. CUDA. Java. Long ago, the four nations lived together in harmony. Then, everything changed when GPUs and run-times attacked. Only Bend, master of all four elements, could stop them, but when the world needed it most, it vanished.
But what is Bend?
Bend is a revolutionary high-level programming language designed by HigherOrderCO to harness the power of massively parallel hardware with the simplicity of high-level languages like Python. Imagine writing code that feels as intuitive as Python or Java but scales effortlessly like CUDA. Bend achieves this through implicit parallelism, eliminating the need for explicit parallel programming constructs—one of the reasons why CUDA can be challenging.
Bend combines the expressiveness and ease of use found in high-level languages with the performance capabilities of low-level languages tailored for parallel hardware. Developers can write code with advanced abstractions and rich language features, while the HVM2 runtime ensures it runs efficiently on GPUs without requiring explicit parallel programming constructs like thread management and synchronization primitives. This combination allows for the development of high-performance applications in a more intuitive and less error-prone way compared to traditional low-level approaches. But don't worry if the terms I threw around above feel alien; the indomitable human spirit always finds its way.
Key Features of Bend
Implicit Parallelism
Unlike CUDA, which requires explicit efforts to manage parallelism—such as writing kernels, specifying thread counts, and managing memory transfers—Bend automates these tasks. Through its advanced HVM2 runtime, Bend dynamically schedules tasks to efficiently use hardware resources, making parallel programming more accessible.
The Power of HVM2
Bend is powered by the Higher-Order Virtual Machine 2 (HVM2), a runtime designed for executing high-level, expressive programming languages on parallel hardware. HVM2 ensures that Bend code runs efficiently on GPUs without developers needing to manage parallelism manually. It leverages dynamic scheduling algorithms to allocate tasks to threads, ensuring optimal use of hardware resources.
High-Level Expressiveness
Bend combines the ease of use found in high-level languages with the performance of low-level languages. It supports fast object allocations, higher-order functions, full closure support, unrestricted recursion, and continuations. These features allow developers to write sophisticated code without worrying about low-level parallel programming details.
Massively Parallel Execution
Running on GPUs, Bend achieves near-linear speedup based on core count without requiring explicit parallel annotations. There’s no need for thread spawning, locks, mutexes, or atomics. This makes developing high-performance applications more intuitive and less error-prone compared to traditional methods.
Using Bend
Currently, Bend does not work on Windows. As an emerging language, it is still in the process of expanding its platform support beyond the initial Linux-based implementation. Hence the "vanishing". But an overview always helps, doesn't it?
To run a Bend program, you have three options depending on the desired level of parallelism:
bend run <file.bend> # uses the Rust interpreter (sequential)
bend run-c <file.bend> # uses the C interpreter (parallel)
bend run-cu <file.bend> # uses the CUDA interpreter (massively parallel)
Bend automatically parallelizes code wherever possible. For example, the following expression is inherently sequential: each addition depends on the result of the previous one, so parallel hardware offers no speedup:
(((1 + 2) + 3) + 4)
Whereas this expression can run in parallel, because the two halves are independent of each other. Everything that can run in parallel, will run in parallel:
((1 + 2) + (3 + 4))
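To see why the second shape parallelizes, here is an illustrative sketch in Python (not Bend): a divide-and-conquer sum whose two halves are independent and can therefore run on separate threads. In Bend, HVM2 discovers and exploits this structure automatically; the explicit thread pool below is exactly the boilerplate Bend lets you avoid.

```python
# Illustration only: manual divide-and-conquer parallel sum in Python.
# Bend/HVM2 does this scheduling automatically, with no thread code.
from concurrent.futures import ThreadPoolExecutor

def tree_sum(xs):
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    # The two halves share no data dependency, so they may run concurrently,
    # mirroring the shape of ((1 + 2) + (3 + 4)).
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(tree_sum, xs[:mid])
        right = pool.submit(tree_sum, xs[mid:])
        return left.result() + right.result()

print(tree_sum([1, 2, 3, 4]))  # 10
```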
By simply running Bend on a powerful RTX GPU, its developers report speedups of up to 57x, without any additional effort on your part.
Bend in Action: Interaction Combinators and Folds
Bend's execution model relies on interaction combinators, a concept that structures computations into a graph, enabling parallel execution by default. The HVM2 runtime rewrites computations into parallelizable forms and merges results upon completion.
Instead of traditional loops, Bend uses folds to process recursive data types in parallel. This approach significantly reduces execution times, showcasing Bend's efficiency compared to sequential programming.
Folding and bending are essential concepts in Bend. The ~ symbol indicates that a field is recursive, allowing easy creation and consumption of recursive data structures with bend and fold. Here’s a simple example to sum all values in a tree using fold:
type MyTree:
  Node { val, ~left, ~right }
  Leaf

def MyTree.sum(x):
  fold x:
    case MyTree/Node:
      return x.val + x.left + x.right
    case MyTree/Leaf:
      return 0

def main():
  bend val = 0:
    when val < 10:
      x = MyTree/Node { val: val, left: fork(val + 1), right: fork(val + 1) }
    else:
      x = MyTree/Leaf
  return MyTree.sum(x)
In this example, fold is used to traverse the tree and sum its values, and bend generates a tree structure recursively.
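If the bend/fold semantics feel unfamiliar, here is a rough Python analogue of the same program (illustration only, not Bend): bend corresponds to a recursive builder, and fold to a recursive reducer in which each recursive field is replaced by the result of folding it.

```python
# Python analogue of the Bend example: build a binary tree of depth 10
# (bend), then sum it (fold). Sequential here; Bend runs it in parallel.
from dataclasses import dataclass

@dataclass
class Leaf:
    pass

@dataclass
class Node:
    val: int
    left: object
    right: object

def build(val=0):
    # mirrors `bend val = 0: when val < 10: ... else: ...`
    if val < 10:
        return Node(val, build(val + 1), build(val + 1))
    return Leaf()

def tree_sum(t):
    # mirrors `fold x:` — the recursive fields t.left and t.right
    # are replaced by their recursively folded results
    if isinstance(t, Node):
        return t.val + tree_sum(t.left) + tree_sum(t.right)
    return 0

print(tree_sum(build()))  # 8194
```

Note the key difference: in Bend, the two `fork` branches (and the two recursive arms of the fold) are independent, so HVM2 evaluates them in parallel; the Python version makes the recursion explicit but runs it sequentially.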
Conclusion
Bend stands at the forefront of parallel computing, blending the simplicity of high-level languages with the raw power of GPU parallelism. While it offers remarkable performance and an intuitive syntax, its advanced features may appeal most to developers with a background in mathematics or those involved in performance-critical applications. Just as Aang effortlessly glided through the air, Bend glides across GPUs: effortless and smooth. As Bend continues to evolve, it holds the potential to transform how we approach parallel programming, making it more accessible and efficient. Bend may have vanished for a while, but just like the Avatar, it has returned at the right moment to transform the landscape of parallel computing.
Embrace the future of parallel programming with Bend—where high-level expressiveness meets unparalleled performance.