Modern storage servers face an ever-increasing gap between the bandwidth offered by NVMe flash storage devices and the bandwidth at which data can be moved to main system memory. Due to the von Neumann architecture, all data must be moved to main system memory before it can be processed. Computational Storage Devices (CSx) aim to resolve this bottleneck by offloading computations to the storage device itself. What is more, most NVMe storage devices are already fitted with adequate CPU and memory to perform such offloading tasks!
Our framework OpenCSD and filesystem FluffleFS are designed using existing technologies such that concurrent regular and offloaded access can be achieved. In addition, the entire software suite is implemented in userspace, drastically improving ease of use and lowering the barrier to entry. Together, our solution is the first to support CSx offloading with filesystem integration while concurrently supporting regular access, even to the same file!
Get started immediately by visiting our GitHub repository: https://github.com/Dantali0n/OpenCSD
The success of this design is achieved through a large set of pre-existing technologies that come together to create a cohesive whole.
eBPF & uBPF
The first technology is eBPF, a technology that is deeply integrated into the Linux kernel. With eBPF, users write code in the familiar C programming language, which is compiled to bytecode and executed by a Virtual Machine (VM). Through header files and a call instruction that traps out of the VM, programs become completely vendor and host-architecture agnostic. This allows end users to compile eBPF programs (kernels) once and reuse them on any system.
Since the VM used by Linux is highly integrated and customized for this operating system (OS), our solution uses uBPF instead. This eBPF VM supports a memory access verifier as well as just-in-time (JIT) compilation to x86 code. uBPF can be extended so that misbehaving user-submitted kernels are terminated at runtime, or so that their changes are discarded after execution.
Using these technologies, our computational storage system can offer a stable offloading interface that is vendor and host-architecture agnostic.
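The role of the call instruction can be illustrated with a toy interpreter. The sketch below is not eBPF or uBPF: the ToyVM class, its opcodes, and the read_block helper are all hypothetical. It only shows how a program that reaches the outside world solely through numbered helper calls stays independent of the host that runs it.

```python
# Toy illustration only: this is NOT the eBPF instruction set or the uBPF API.
# All names (ToyVM, the opcodes, read_block) are hypothetical.

class ToyVM:
    """Tiny register machine whose only way out is a numbered helper call."""

    def __init__(self, helpers):
        # Host-provided helpers, indexed by number. Each host may implement
        # them differently without changing the program.
        self.helpers = helpers

    def run(self, program):
        regs = [0] * 4
        for op, a, b in program:
            if op == "mov":      # regs[a] = immediate value b
                regs[a] = b
            elif op == "add":    # regs[a] += regs[b]
                regs[a] += regs[b]
            elif op == "call":   # trap to host helper a, argument in regs[b]
                regs[0] = self.helpers[a](regs[b])
            elif op == "exit":   # the return value lives in regs[0]
                return regs[0]
            else:
                raise ValueError(f"unknown opcode: {op}")
        raise RuntimeError("program ended without exit")


# The "kernel": sum the values the host returns for blocks 0 and 1.
PROGRAM = [
    ("mov", 1, 0), ("call", 0, 1), ("mov", 2, 0), ("add", 2, 0),
    ("mov", 1, 1), ("call", 0, 1), ("add", 0, 2), ("exit", 0, 0),
]


def run_on(host_blocks):
    """Run the same unmodified program against a host-specific helper."""
    def read_block(index):  # helper number 0 on this host
        return host_blocks[index]
    return ToyVM({0: read_block}).run(PROGRAM)
```

The same PROGRAM runs unchanged whether the host backs read_block with a list, a dictionary, or a real device, which is the property the bytecode and helper-call design aims for.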
FUSE & SPDK
Secondly, the use of userspace storage and filesystem libraries, namely FUSE and SPDK, spares end users from having to install kernel modules or make any other kernel modifications.
Meanwhile, the flexibility of SPDK still allows the use of modern storage technologies and their relatively new APIs. In addition, the configuration flexibility of FUSE makes it possible to support the different use cases of end users.
Filesystem Extended Attributes (xattr)
Many filesystems and operating systems support filesystem extended attributes. These attributes, effectively key-value pairs, can be stored on arbitrary files and directories in the filesystem. By reserving specific keys to trigger specific behavior we can create computational storage filesystems that can separate regular and offloaded access.
Our solution goes one step further and also records the process identifier (PID) of any process that sets these extended attributes. This makes it possible to further separate regular and offloaded access across individual users on the same system.
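The idea of reserved keys combined with PID tracking can be sketched with a small in-memory model. This is an illustration under stated assumptions, not the FluffleFS implementation: the key name user.example.offload_read and the XattrFilesystem class are hypothetical, and no real xattr system calls are made.

```python
# In-memory model only. The key name and class are hypothetical and do not
# reflect the keys FluffleFS actually reserves.

RESERVED_KEY = "user.example.offload_read"  # hypothetical reserved key


class XattrFilesystem:
    def __init__(self):
        self.xattrs = {}    # (path, key) -> value, ordinary attributes
        self.offloads = {}  # (pid, path) -> kernel name, per-process intent

    def setxattr(self, pid, path, key, value):
        if key == RESERVED_KEY:
            # Reserved key: remember which process asked for offloading.
            self.offloads[(pid, path)] = value
        else:
            # Any other key behaves as a plain key-value attribute.
            self.xattrs[(path, key)] = value

    def read(self, pid, path):
        # Only the process that set the reserved key gets offloaded access.
        kernel = self.offloads.get((pid, path))
        if kernel is None:
            return f"regular read of {path}"
        return f"offloaded read of {path} using {kernel}"
```

In this model, a process with PID 100 that sets the reserved key on /data gets offloaded reads, while a process with PID 200 reading the same file at the same time still gets regular access.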
Log-Structured Filesystem (LFS)
Another key technology is the log-structured filesystem (LFS). Such filesystems do not support in-place updates; instead, all writes must go to the tail of a log, although most LFSs employ multiple logs. Because changes to a file are recorded by updating its metadata to point to the new locations of the affected blocks, rather than by overwriting data, a given representation of the file metadata can remain immutable for its intended lifetime.
As a result, an LFS makes it possible to implement a snapshot consistency model for files on the system. It is precisely these snapshots that can be safely shared between the host filesystem and the computational storage device, allowing both to operate concurrently.
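The snapshot property can be illustrated with a minimal single-log model. The LogStructuredFile class and its method names are hypothetical, and the sketch ignores multiple logs, garbage collection, and persistence. It only shows why an older block map stays valid while new writes arrive.

```python
from types import MappingProxyType


class LogStructuredFile:
    """Hypothetical single-file, single-log model of an LFS."""

    def __init__(self):
        self.log = []        # append-only log of block contents
        self.block_map = {}  # logical block number -> position in the log

    def write(self, block_no, data):
        # Never overwrite in place: append to the tail, then repoint metadata.
        self.log.append(data)
        new_map = dict(self.block_map)
        new_map[block_no] = len(self.log) - 1
        self.block_map = new_map  # older maps are left untouched

    def snapshot(self):
        # Read-only view of the current map. Later writes build a new map,
        # so this view never changes.
        return MappingProxyType(self.block_map)

    def read(self, block_no, snapshot=None):
        mapping = self.block_map if snapshot is None else snapshot
        return self.log[mapping[block_no]]
```

A reader holding a snapshot, such as an offloaded kernel, keeps seeing the data as it was when the snapshot was taken, while regular writes continue to append to the log.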
Zoned Namespaces (ZNS)
However, this sharing can be difficult due to the internal translations performed by the flash translation layer (FTL) of flash storage devices. By utilizing zoned namespaces (ZNS), we can avoid these translations and have the device expose its behavior more transparently.
Within ZNS, this is achieved by dividing the device into zones and requiring that each zone be written sequentially. Moreover, adhering to this sequential write requirement is trivial thanks to the use of an LFS. Lastly, ZNS devices require that entire zones be erased as a single unit.
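The two zone rules, sequential writes and whole-zone erasure, can be modeled in a few lines. The Zone class below is a hypothetical in-memory sketch and does not use the NVMe ZNS command set or any SPDK API.

```python
class Zone:
    """Hypothetical model of a single zone with a write pointer."""

    def __init__(self, capacity):
        self.capacity = capacity  # number of blocks the zone can hold
        self.blocks = []

    @property
    def write_pointer(self):
        # The next block that may be written.
        return len(self.blocks)

    def write(self, offset, data):
        # Writes are only accepted exactly at the write pointer.
        if offset != self.write_pointer:
            raise ValueError("zones must be written sequentially")
        if self.write_pointer >= self.capacity:
            raise ValueError("zone is full")
        self.blocks.append(data)

    def reset(self):
        # The whole zone is erased as one unit; partial erasure is impossible.
        self.blocks.clear()
```

An append-only log maps naturally onto this model: the tail of the log is always the write pointer, so an LFS never attempts a write that the zone would reject.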
This combination of technologies offers high ease of use and a minimal barrier to entry. In particular, the ability to reuse kernels across systems and vendors, together with all technologies running in userspace, contributes greatly to this.
All the while, the solution is still capable of separating users and their intent, with concurrent regular and offloaded access even to the same file.
Finally, the use of existing operating system APIs means this solution can be implemented on any operating system, be it Windows, macOS, Linux or FreeBSD.