The 4th Workshop on SErverless Systems, Applications and MEthodologies (SESAME'26)


The workshop is co-located with EuroSys'26 and will be held in person (no remote participation) on April 27th at the International Conference Centre in Edinburgh, Scotland, UK, bringing together experts from academia and industry to advance research in serverless systems.

Schedule

09:30 - 10:30 Morning Session 1

Authors: Niklas Kowallik (Technische Universität Berlin), Natalie Carl (Technische Universität Berlin), Leon Pöllinger (Technische Universität Berlin), Wei Wang (Huawei Technologies (Germany)), Sharan Santhanam (Huawei Technologies (Germany)), David Bermbach (Technische Universität Berlin)

Abstract: Function-as-a-Service (FaaS) platforms provide scalable and cost-efficient execution but suffer from increased latency and resource overheads in complex applications comprising multiple functions, particularly due to double billing when functions call each other. This paper presents Provuse, a transparent, platform-side optimization that automatically performs function fusion at runtime for independently deployed functions, thereby eliminating redundant function instances. This approach reduces both cost and latency without requiring users to change any code. Provuse targets provider-managed FaaS platforms that retain control over function entry points and deployment artifacts, enabling transparent, runtime execution consolidation without developer intervention.

We provide two implementations for this approach using the tinyFaaS platform as well as Kubernetes, demonstrating compatibility with container orchestration frameworks. An evaluation shows consistent improvements, achieving an average end-to-end latency reduction of 26.33% and a mean RAM usage reduction of 53.57%. These results indicate that automatic function fusion is an effective platform-side strategy for reducing latency and RAM consumption in composed FaaS applications, highlighting the potential of transparent infrastructure-level optimizations in serverless systems.
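The core idea of function fusion can be illustrated with a small sketch (all names here are hypothetical examples, not Provuse's actual API): when two independently deployed functions are fused, the downstream invocation becomes a direct in-process call instead of a network round trip, so only one instance runs and is billed.

```python
import json

# Two independently deployed handlers (hypothetical example functions;
# Provuse's real entry-point rewriting happens inside the platform).
def resize_image(event):
    return {"image": event["image"], "size": "thumbnail"}

def store_image(event):
    return {"stored": True, "key": event["image"]}

def fuse(*handlers):
    """Compose handlers into a single entry point that runs them in-process.
    The un-fused baseline would invoke store_image over the network while
    the resize_image instance sits idle (the "double billing" above)."""
    def fused(event):
        result = event
        for handler in handlers:
            result = handler(result)
        return result
    return fused

pipeline = fuse(resize_image, store_image)
print(json.dumps(pipeline({"image": "cat.png"})))  # prints {"stored": true, "key": "cat.png"}
```

In this toy version fusion is just sequential composition; the paper's contribution is doing the equivalent transparently at the platform layer, at runtime, for functions deployed separately.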

Authors: Idhibhat Pankam (Chulalongkorn University), Pannawich Lohanimit (Chulalongkorn University), Kunwadee Sripanidkulchai (Chulalongkorn University)

Abstract: Networking is an often overlooked bottleneck in serverless systems. While microVMs have become the standard for secure function isolation, they rely on the Linux TAP networking model, which incurs overhead due to packet copying and host-guest context switching. Under high packet-rate stress, we observe packet loss rates exceeding 75% and TAP-related overhead consuming over 34% of VM CPU cycles. To reclaim these lost CPU cycles and mitigate kernel-induced bottlenecks, we present UMANet—a loss-resilient shared userspace dataplane that offloads L2/L3 forwarding from the host kernel. UMANet employs dedicated polling and forwarding cores with decomposed fast paths for NIC and vhost I/O, reducing contention and packet loss at high PPS. Unlike Linux TAP networking, UMANet decouples networking from compute, allowing providers to independently scale and pin dataplane resources to match traffic demand without impacting guest execution. With the same resources, UMANet achieves up to 4.7× higher received PPS and 2.0× lower packet loss than Linux TAP networking. UMANet is immediately deployable with any vhost-user-compatible VMM.

Authors: Jun Wang (HUAWEI), Yunxiang Yao (HUAWEI), Wenwei Kuang (HUAWEI), Runze Mao (HUAWEI), Zhenhao Sun (HUAWEI), Zhuang Tao (HUAWEI), Xingyuan Chen (HUAWEI), Ziyang Zhang (HUAWEI), Dengyu Li (HUAWEI), Jiajun Chen (HUAWEI), Yu Gao (HUAWEI), Changjian Zhang (HUAWEI), Chengda Wu (HUAWEI), Meng Wang (HUAWEI), Yishan Wu (HUAWEI), Zhili Wang (HUAWEI), Kai Cui (HUAWEI), Congzhi Cai (HUAWEI), Weixi Zhang (HUAWEI), Longwen Lan (HUAWEI), Wei Han (HUAWEI), Ken Zhang (HUAWEI)


Abstract: Large Language Models drive a wide range of modern AI applications but impose substantial challenges on large-scale serving systems due to intensive computation, strict latency constraints, and throughput bottlenecks. We introduce OmniInfer, a unified system-level acceleration framework designed to maximize end-to-end serving efficiency through fine-grained optimization of expert placement, cache compression, and scheduling. OmniInfer integrates three complementary components: OmniPlacement for load-aware Mixture-of-Experts scheduling, OmniAttn for sparse attention acceleration, and OmniProxy for disaggregation-aware request scheduling. Built atop vLLM, OmniInfer delivers system-wide performance gains through adaptive resource disaggregation, efficient sparsity exploitation, and global coordination across prefill and decode phases. Evaluated on DeepSeek-R1 within a 10-node Ascend 910C cluster, OmniInfer achieves a 52% throughput gain, reaching 616 QPM. Within this unified framework, TPOT is reduced by up to 36%, and the superimposed OmniProxy further reduces TTFT by up to 38%.

Authors: Rodopi Kosteli (University of Ioannina), Giorgos Kappes (University of Ioannina), Stergios V. Anastasiadis (University of Ioannina) 

Abstract: Systems generously grant memory pages to application requests but only reclaim them retroactively under memory shortage. As memory pages remain allocated for potential future use, there is a critical unanswered question of how much memory is actually available for starting new applications. The problem is especially relevant in serverless systems that dynamically provision resources for handling incoming requests but only bill their customers for served function invocations. If excessive resources are provisioned for the maximum function needs and kept reserved to avoid cold starts, then the system may operate inefficiently for the provider. Given the generality of the memory elasticity problem, we set the goal of proactively reclaiming pages that are not currently used. We introduce the Feido system service, which adaptively applies pressure to create artificial memory shortage and proactively trigger the system's page reclamation. We followed an experimental approach to identify the resource usage metrics that detect underutilized memory across running application containers. With prototype software that we developed, we experimentally demonstrate substantial efficiency improvements over existing systems on synthetic and real data-intensive applications.

11:00 - 12:30 Morning Session 2

Abstract: TBD

Short bio: TBD

Authors: Vasiliki Kostoula (National Technical University of Athens, Athens, Greece), Orestis Lagkas Nikolos (National Technical University of Athens, Athens, Greece), Chloe Alverti (National Technical University of Athens, Athens, Greece), Dimitris Siakavaras (National Technical University of Athens, Athens, Greece), Georgios Goumas (National Technical University of Athens, Athens, Greece), Nectarios Koziris (National Technical University of Athens, Athens, Greece)

Abstract: Serverless platforms rely on strong isolation mechanisms such as microVMs or sandboxed containers, but repeatedly constructing these environments introduces significant cold-start latency. Snapshot-based techniques reduce startup costs by restoring preinitialized environments; however, they operate at coarse granularity, restoring entire VM state and incurring overheads from memory restoration, nested page faults, and sandbox reinitialization. Our work explores function transplants, a process-level restoration approach that decouples function execution state from the underlying sandbox infrastructure. Instead of restoring full VMs, the initialized process state of a function is captured as a "transplant" and dynamically plugged into an available, lightweight VM selected from a pre-created pool. By restoring only the process-specific state into a single, generic sandbox, this approach reduces redundant initialization work and enables elastic resizing of the VM to meet function-specific requirements while preserving strong isolation guarantees. We outline the design of function transplants, including the structure of the captured process state and its integration into running VM sandboxes. Preliminary experiments suggest that process-level transplantation can reduce the amount of restored state by avoiding the restoration of sandbox- and OS-level state that is unrelated to function execution. Overall, this work represents an initial step toward finer-grained state reuse in VM-based serverless execution environments.

Authors: Natalie Carl (Technische Universität Berlin), Niklas Kowallik (Technische Universität Berlin), Constantin Stahl (Technische Universität Berlin), Trever Schirmer (Technische Universität Berlin), Tobias Pfandzelter (Technische Universität Berlin), David Bermbach (Technische Universität Berlin)

Abstract: Serverless computing and stream processing represent two dominant paradigms for event-driven data processing, yet both make assumptions that render them inefficient for short-running, lightweight, and unpredictable streams that require stateful processing. We propose stream functions as a novel extension of the Function-as-a-Service model that treats short streams as the unit of execution, state, and scaling. Stream functions process streams via an iterator-based interface, enabling seamless inter-event logic while retaining the elasticity and scale-to-zero capabilities offered by serverless platforms. Our evaluation shows that stream functions reduce processing overhead by ~99% compared to a mature stream processing engine in a video-processing use case. By providing comparable performance to serverless functions with stream semantics, stream functions provide an effective and efficient abstraction for a class of workloads underserved by existing models.
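An iterator-based stream-function interface of the kind the abstract describes might look roughly like this (a sketch with invented names, not the authors' actual API): the platform hands one invocation the events of one short stream, and inter-event state lives in ordinary local variables rather than in an external state store.

```python
from typing import Iterable, Iterator

# Hypothetical stream function: a single invocation processes one short
# stream end to end. Per-stream state (here, a running maximum) is kept
# in local variables, so no state round trips between events are needed.
def peak_detector(events: Iterable[dict]) -> Iterator[dict]:
    peak = float("-inf")
    for event in events:
        if event["value"] > peak:
            peak = event["value"]
            yield {"ts": event["ts"], "new_peak": peak}

# A short stream, e.g. sensor readings buffered by the platform.
stream = [
    {"ts": 1, "value": 3},
    {"ts": 2, "value": 1},
    {"ts": 3, "value": 7},
]
print(list(peak_detector(stream)))  # prints [{'ts': 1, 'new_peak': 3}, {'ts': 3, 'new_peak': 7}]
```

Because the function is a plain generator over a finite stream, it terminates when the stream ends, which is what lets the platform treat the stream, rather than a single event or a long-lived job, as the unit of scaling and scale-to-zero.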

14:00 - 15:30 Afternoon Session 1

Abstract: (TBD)

Important Dates

  • Submission deadline: February 2nd (extended), 2026 (AoE)
  • Author notification: March 2nd, 2026
  • Camera-ready: March 30th, 2026
  • Workshop: April 27th, 2026

Call for Papers

Serverless has emerged as the next dominant cloud architecture and paradigm due to its elastic scalability and flexible billing model. In serverless, developers focus on writing their application’s business logic, e.g., as a set of functions and GenAI models connected in a workflow, whereas providers take responsibility for dynamically managing cloud resources, e.g., by scaling the number of instances for each deployed function. This division of responsibilities opens new opportunities for systems researchers to innovate in serverless computing. As a new paradigm, serverless computing calls for innovation across the whole deep distributed stack of modern datacenters, including software and hardware infrastructure, to combine high performance, ease of programming, and efficient use of datacenter resources, among other goals.


The 4th Workshop on SErverless Systems, Applications and MEthodologies (SESAME) aims to bring together industry and academia to discuss serverless computing and emerging cloud computing models. The goal of the workshop is to foster discussion on the design and implementation of serverless platforms (i.e., how to deploy, optimize, and manage serverless infrastructure) and on how to leverage their full potential (i.e., what types of applications and ecosystems of services need to exist to support serverless computing). The workshop is designed to ensure that industry and academia come together to discuss early ideas and promote cutting-edge research. Given the importance of these workloads, we also add serverless GenAI and LLM inference serving systems to the scope of the workshop.

Please note that the workshop will take place in person, hence at least one author of each accepted submission must be present in person. 

Non-traditional topics, cross-cutting research, and controversial ideas are especially encouraged.

Topics of interest include but are not limited to:

  • Serverless applications, benchmarks and evaluation methodologies
  • Serverless GenAI inference serving systems
  • Serverless programming interfaces, models, language runtimes, and deployment
  • Heterogeneous architectures and accelerators for serverless clouds
  • Networking and storage systems for serverless clouds
  • Reliable, fault-tolerant, and available serverless systems 
  • Scheduling, scalability and elasticity in serverless clouds
  • Verification and testing for serverless clouds 
  • Performance and security isolation in serverless systems, virtualization
Submission Formats

The workshop will accept short papers and work-in-progress (WIP) talks:
- Short papers allow authors to present contributions in a short format of up to 6 pages. If accepted, short papers will have published proceedings via the ACM Digital Library upon the agreement of their authors (opting out is possible upon request);
- Work-In-Progress (WIP) papers are submitted as a 2-page extended abstract without published proceedings.
References do not count towards the page limit. The above limits are strict for both submission types.

The dual submission format is designed to maximize participation and engagement. In particular, it accommodates industry participants who may have limited resources to spend on writing a draft, as well as authors who may aim to publish a full conference paper later, while maximizing the benefit to the audience. The workshop will use a double-blind submission policy.

Optional appendix
Authors may optionally include an appendix (up to 3 pages for short papers and 1 page for WIP papers) as the last section of the manuscript; however, reviewers are not obliged to read the appendix. An appendix may include proofs of theorems, more details on methodology, additional results, and anything else that can potentially answer reviewer questions. The rest of the manuscript may cite the appendix, but the paper should stand on its own without it. Authors need not feel compelled to include an appendix; we understand that authors' time is best spent on the main manuscript.

Declaring Conflicts of Interest
Authors must register all their conflicts on the paper submission site. Conflicts are needed to ensure appropriate assignment of reviewers. If a paper is found to have an undeclared conflict that causes a problem OR if a paper is found to declare false conflicts in order to abuse or “game” the review system, the paper may be summarily rejected.
Please declare a conflict of interest (COI) with the following for any author of your paper:
- Your Ph.D. advisor(s), post-doctoral advisor(s), Ph.D. students, and post-doctoral advisees, forever.
- Family relations by blood or marriage and close personal friends, forever (if they might be potential reviewers).
- People with whom you have collaborated in the last four years, including co-authors of accepted/rejected/pending papers, co-PIs of accepted/rejected/pending grant proposals.
- People whose primary institution(s) were the same as your primary institution(s) in the last four years.

Author Instructions
Submissions should use the ACM acmart format and be submitted as a PDF. The format of your paper must strictly adhere to the ACM format.

LaTeX: Use version acmart v1.77 or newer. You can directly download the LaTeX class file acmart and the BibTeX ACM Reference Format, which are also available from CTAN. Please use the sigconf style by using the following LaTeX class configuration:
\documentclass[sigconf,screen]{acmart}
Word: Download template from ACM format site. Please use the sigconf style by selecting the right template.
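For reference, a minimal acmart skeleton combining the settings above might look as follows (illustrative only; the `anonymous` and `review` class options support the double-blind policy, and all titles and names below are placeholders):

```latex
\documentclass[sigconf,screen,anonymous,review]{acmart}

\begin{document}

\title{Paper Title}
\author{Anonymous Author(s)}
\affiliation{%
  \institution{Anonymous Institution}
  \city{City}
  \country{Country}}

\begin{abstract}
Abstract text.
\end{abstract}

\maketitle

\section{Introduction}
Body text.

\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

\end{document}
```

Remove the `anonymous,review` options for the camera-ready version.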

Please also ensure that your submission is legible when printed on a black-and-white printer. In particular, please check that colors remain distinct and font sizes are legible. Citations do not count towards the page limit.

Submission website

Committees

Organizing Committee Members

Program Committee Members

  • Marco Cali (AWS, Ireland)
  • Yue Cheng (University of Virginia)
  • Dong Du (SJTU, China)
  • Kostis Kaffes (Columbia University, USA)
  • Marios Kogias (Imperial College London, UK)
  • Pavan Miriyala (AMD, Singapore)
  • Javier Picorel (Huawei, Germany)
  • Mohammad Shahrad (UBC, Canada)
  • Jovan Stojkovic (UT Austin)
  • Wei Wang (HKUST, Hong Kong)
  • Xingda Wei (SJTU, China)
  • Hong Zhang (University of Waterloo, Canada)

Contacts

© Copyright 2026 SESAME organizers - All Rights Reserved