The Annual SG Programming Languages Summit

Welcome to the first SG Programming Languages Summit!

The summit is an informal meeting for those based in Singapore with interests in programming languages research, broadly construed.

The meeting is free of charge and lunch will be provided.

Date and Location

9AM to 5PM, 4th December 2024
A*STAR Infuse Theatre, Connexis South,
Level 14, Fusionopolis 1,
1 Fusionopolis Way, Singapore 138632

The south tower entrance is opposite Starbucks at Level 1.
Visitors need to register for entry so please bring an ID card.

Call for Participation

Participation

The SG PL summit is open to participants at all levels across academia and industry.

Please sign up at the following link by 08 Nov 2024 for us to get an accurate headcount.

Participation Signup Form

Presentation

We solicit presentations on programming languages research and related topics including, but not limited to, verification, proof theory, program synthesis, etc.. We also welcome proposals to hold small-scale hands-on tutorials or similar events at the summit.

We hope that you will share your exciting PL-related research, use cases, or even new ideas seeking collaborators at the summit.
Students are particularly encouraged to present and solicit feedback on your research from the wider research community.

(Signups Closed)

We hope to accommodate all proposed talks, potentially with parallel tracks. All talks will be held in-person and each talk will be allocated approximately 20-30 minute slots.

Schedule

  • 8:30-9:00 Registration

  • 9:00-9:10 Opening

    Program Language Design & Implementation

  • 9:10-10:00 (invited talk)
    TBD
    Lionel Parreaux The Hong Kong University of Science and Technology

    TBD

  • 10:00-10:20 Zhensu SUN Singapore Management University

    Artificial Intelligence (AI) models have emerged as another important audience for programming languages alongside humans and machines, as we enter the era of large language models (LLMs). LLMs can now perform well in coding competitions and even write programs like developers to solve various tasks, including mathematical problems. However, the grammar and layout of current programs are designed to cater the needs of human developers โ€“with many grammar tokens and formatting tokens being used to make the code easier for humans to read. While this is helpful, such a design adds unnecessary computational work for LLMs, as each token they either use or produce consumes computational resources. In this talk, I will share our research on AI-oriented grammar which aims to represent code in a way that better suits the working mechanism of AI models.

  • 10:20-10:30 (short talk) Zhang Qixiang, Ding Feng, Fan Haolin National University of Singapore

    In this talk, I will introduce our current project on Source Academy. Source Academy is a browser-based educational platform designed for NUS undergraduates, with a focus on first-year computer science students. Previously, the platform supported only traditional programming languages like JavaScript. This term, we have expanded its language capabilities by introducing support for a dependently typed language called Pie. Looking ahead, we envision extending Source Academy's functionality to enable advanced applications, including theorem proving and metaprogramming, making it a versatile tool for deeper exploration in computer science.

  • 10:30-11:00 Tea Break

    Program Logics & Mechanization

  • 11:00-11:20 Xu Qiyuan Nanyang Technological University

    Foundational verification considers the functional correctness of real programming languages with formalized semantics and uses proof assistants (e.g., Coq, Isabelle) to certify proofs. The need for verifying real languages compels it to involve expressive Separation Logics (SLs) that exceed the scopes of well-studied automated proof theories, e.g., symbolic heap. Consequently, automation of SL in foundational verification relies heavily on ad-hoc heuristics that lack a systematic meta-theory and face scalability issues. To mitigate the gap, we propose a theory to specify SL predicates using abstract algebras including functors, homomorphisms, and modules over rings. Based on this theory, we develop a generic SL automation algorithm to reason about any data structures that can be characterized by these algebras. In addition, we also present algorithms for automatically instantiating the algebraic models to real data structures. The instantiation reuses the algebraic models of component structures and preserves their data abstractions. Case studies on formalized C semantics show our algorithm can instantiate the algebraic models automatically for a variety of complex data structures. Our algorithm is able to automate the verification of complex programs in Isabelle/HOL with minimal manual effort. This indicates the automatically instantiated reasoners from our generic theory show similar results to the state-of-the-art systems made of specifically crafted reasoning rules. The presented theories, proofs and the verification framework are formalized in Isabelle/HOL.

  • 11:20-11:40 Qiyuan Zhao National University of Singapore

    Byzantine Fault-Tolerant (BFT) protocols are known to be difficult to design and to reason about. To address this challenge, on one hand, several approaches have been developed recently for computer-aided formal verification of the desired correctness properties, both safety and liveness, of standalone BFT protocols. On the other hand, the distributed computing community has made attempts to reduce the conceptual complexity of constructing new such protocols by showing how to assemble them from simpler building blocks. No methodology to date combines these two approaches for foundational verification of arbitrary BFT protocols. In this talk, I will introduce our recent efforts (published at CCS 2024) for achieving compositional mechanised verification of both safety and liveness of composite BFT protocols. Our proposed verification framework Bythos is implemented on top of the Coq proof assistant and uses Coq's higher-order logic to reuse proofs of common facts about knowledge and trust in BFT protocols. It allows for compact liveness specifications in the style of TLA+, and for their proofs using an embedding of TLA into Coq. Most importantly, Bythos provides a family of higher-order definitions that allow building composite BFT protocols from simpler ones, with their correctness proofs derived. We showcase Bythos by verifying in it safety and liveness properties of three basic BFT protocols: Reliable Broadcast, Provable Broadcast, and the recently proposed Accountable Byzantine Confirmer, as well as their compositions.

  • 11:40-12:00 Vladimir Gladshtein National University of Singapore

    Arrays are one of the most fundamental abstractions to represent collections of data. It is often possible to exploit structural properties of the data stored in an array (e.g., repetition or sparsity) to develop a specialised representation optimised for space efficiency. Formally reasoning about correctness of manipulations with such structured data is challenging, as they are often composed of multiple loops with non-trivial invariants. What makes the verification task even more difficult is that the data format themselves are commonly specified as programs that decompress data in a specialised representation into a dense tensor (i.e., a non-compressed n-dimensional array). What we would like to prove is that a program that operates directly on the structured representation is equivalent to a program that first decompresses, then operates on the dense representation. In this work, we observe that specifications for structured data manipulations can be phrased as hypersafety properties, i.e., predicates that relate traces of k programs: the format encodings of the involved arrays and the computation itself. To turn this observation into an effective verification methodology, we developed the Logic for Graceful Tensor Manipulation (LGTM), a new Hoare-style relational separation logic for specifying and verifying computations over structured data. The key enabling idea of LGTM is that of parametrised hypersafety specifications that allow the number k of the program components to depend on the program variables. While conceptually very simple, parametrisation enhances existing approaches to hypersafety reasoning with novel proof principles, allowing for concise proofs about computations with arrays and loops. We implemented LGTM as a foundational embedding into Coq, mechanising its rules, meta-theory, and the proof of soundness. Furthermore, we developed a library of domain-specific tactics that automate computer-aided hypersafety reasoning in it, resulting in pleasantly short proof scripts that enjoy a high degree of reuse, thanks to LGTM and Coq's composition mechanisms. We argue for the effectiveness of relational reasoning about structured data in LGTM by specifying and mechanically proving correctness of 13 case studies including computations on compressed arrays and efficient operations over multiple kinds of sparse tensors.

  • 12:00-12:20 Darius Foo National University of Singapore

    Staged logic is a recently proposed program logic for automating the verification of higher-order effectful programs. In this talk, we provide an overview of the logic through illustrative examples, then discuss its semantics and soundness with reference to a mechanisation in Coq. This presentation also serves as a case study in encoding a program logic in a proof assistant, highlighting the challenges and tradeoffs involved.

  • 12:20-12:40 Conrad Watt Nanyang Technological University

    WebAssembly is the first new language to be introduced to the Web in over 20 years, following JavaScript. For the last few years, we have seen great success in conducting mechanised proof over the WebAssembly specification, discovering and correcting several errors in the language semantics. However, the continued expansion of the WebAssembly specification with new features threatens to make this process intractable. This talk will recap the state of WebAssembly mechanisation, and discuss current work on reducing the burden of continued mechanisation of the ever-expanding language specification.

  • 12:40-13:40 Lunch

    Program Testing & Repair

  • 13:40-14:00 Suyang Zhong National University of Singapore

    Recently, various automated testing approaches have been proposed that used specialized test oracles to find hundreds of logic bugs in mature, widely-used Database Management Systems (DBMSs). These test oracles require database and query generators, which must account for the often significant differences between the SQL dialects of these systems. Since it can take weeks to implement such generators, many DBMS developers are unlikely to invest the time to adopt such automated testing approaches. In short, existing approaches fail to scale to the plethora of DBMSs. In this work, we present both a vision and a platform, SQLancer++, to apply automated DBMS testing approaches at scale. Our technical core contribution is a novel architecture for an adaptive SQL statement generator. This adaptive SQL generator generates SQL statements with various features, some of which might not be supported by the given DBMS, and then learns through interaction with the DBMS, which of these are understood by the DBMS. Thus, over time, the generator will generate mostly valid SQL statements. We evaluated SQLancer++ across 15 DBMSs and discovered a total of 159 unique, previously unknown bugs, of which 147 were fixed. While SQLancer++ is the first major step towards scaling automated DBMS testing, various follow-up challenges remain.

  • 14:00-14:20 Yahui Song National University of Singapore

    Temporal logics like Computation Tree Logic (CTL) have been widely used as expressive formalisms to capture rich behavioral specifications. CTL can express properties such as reachability, termination, invariants and responsiveness, which are difficult to test. This paper suggests a mechanism for the automated repair of infinite-state programs guided by CTL properties. Our produced patches avoid the overfitting issue that occurs in test-suite-guided repair, where the repaired code may not pass tests outside the given test suite. To realize this vision, we propose a repair framework based on Datalog, a widely used domain-specific language for program analysis, which readily supports nested fixed-point semantics of CTL via stratified negation. Specifically, our framework encodes the program and CTL properties into Datalog facts and rules and performs the repair by modifying the facts to pass the analysis rules. Previous research proposed a generic repair mechanism for Datalog-based analysis in the form of Symbolic Execution of Datalog (SEDL). However, SEDL only supports positive Datalog, which is insufficient for expressing CTL properties. Thus, we extended SEDL to make it applicable to stratified Datalog. Moreover, liveness property violations involve infinite computations, which we handle via a novel loop summarization. Our approach achieves analysis accuracy of 80% on a small-scale benchmark and 88.5% on a real-world benchmark, outperforming the best baseline performances of 20% and 65.4%. Our approach repairs all detected bugs, which is not achieved by existing tools.

  • 14:20-14:40 Arpita Dutta National University of Singapore

    TracerX is a Dynamic Symbolic Executor (DSE) which is built upon KLEE. The major advantage of DSE is its path-by-path exploration of the program execution space. However, this often leads to the path explosion problem. To address this issue, a method of abstraction learning has been used. The key step here is the computation of an interpolant to represent the learned abstraction. TracerX uses weakest-precondition (ideal) interpolation to restrict the revisit of sub-trees whose traversal is no longer required. Safety verification of unbounded programs is a challenging problem. Several model-checking techniques are extended to handle the unbounded loops however those techniques either suffer from non-convergence or are non-incremental. In contrast, TracerX has a persistent memory to store learned information. To utilize this, we propose a harness that systematically reuses the information across loop iterations. Also, for every unrolled loop bound, a fixed point check is performed to ensure the program's safety. The benefit of this algorithm is two-fold either we obtain an inductive loop-invariant by reaching the fixed point or the safety of the program is proven to a deeper extent.

  • 14:40-15:00 Zhao Huan National University of Singapore

    Buggy behaviors in concurrent programs are notoriously elusive, as they may manifest only in few of exponentially many possible thread interleavings. Randomized concurrency testing techniques probabilistically sample from (instead of enumerating) the vast search space and have been shown to be both an effective as well as a scalable class of algorithms for automated discovery of concurrency bugs. In this talk, we focus on the key desirable characteristic of black-box randomized concurrency testing algorithms โ€” uniformity of exploration. Unfortunately, prior randomized algorithms acutely fall short on uniformity and, as a result, struggle to expose bugs that only manifest in few, infrequent interleavings. Towards this, we show that, indeed, a sampling strategy for uniformly sampling over the interleaving space, is eminently achievable. Moreover, when applied to a carefully selected subset of program events, this interleaving-uniformity strategy allows for an effective exploration of program behaviors. We present an online randomized concurrency testing algorithm named Selectively Uniform Random Walk (SURW) that builds on these insights. Extensive evaluation on leading concurrency benchmarks suggests SURW is able to expose more bugs and significantly faster than comparable randomized algorithms. In addition, we show that SURW is able to explore both the space of interleavings and behaviors more uniformly on real-world programs.

  • 15:00-15:20 Shi Zheng National University of Singapore

    Dynamic data race detection has emerged as a key technique for ensuring reliability of concurrent software in practice. However, dynamic approaches can often miss data races owing to non-determinism in the thread scheduler. Predictive race detection techniques cater to this shortcoming by inferring alternate executions that may expose data races without re-executing the underlying program. More formally, the dynamic data race prediction problem asks, given a trace ๐œŽ of an execution of a concurrent program, can ๐œŽ be correctly reordered to expose a data race? Existing state-of-the art techniques for data race prediction either do not scale to executions arising from real world concurrent software, or only expose a limited class of data races, such as those that can be exposed without reversing the order of synchronization operations. In general, exposing data races by reasoning about synchronization reversals is an intractable problem. In this work, we identify a class of data races, called Optimistic Sync(hronization)-Reversal races that can be detected in a tractable manner and often include non-trivial data races that cannot be exposed by prior tractable techniques. We also propose a sound algorithm OSR for detecting all optimistic sync-reversal data races in overall quadratic time, and show that the algorithm is optimal by establishing a matching lower bound. Our experiments demonstrate the effectiveness of OSRโ€” on our extensive suite of benchmarks, OSR reports the largest number of data races, and is scales well to large execution traces.

  • 15:20-15:50 Tea Break

    Algorithmic Verification

  • 15:50-16:10 Djordje Zikelic Singapore Management University

    Equivalence is a fundamental relational property of programs. The problem of checking program equivalence has several applications, such as proving correctness of a program implementation with respect to a reference specification, or detection of bugs in compilers. In this talk, we consider the problem of checking equivalence of probabilistic programs. While deterministic programs on a given input produce a single output, probabilistic programs give rise to a probability distribution over outputs, making the equivalence checking problem much more challenging. We present a method for statically refuting equivalence of two probabilistic programs. The method is the first to offer the combination of the following desirable features: (1) it provides formal guarantees on the correctness of its results, (2) it is fully automated, and (3) it is applicable to general infinite-state probabilistic programs with both discrete and continuous distributions. Moreover, the method can also be used to compute a lower bound on the distance between the output distributions of two programs, thus quantifying their difference. We present an experimental evaluation to demonstrate the effectiveness of the method.

  • 16:10-16:30 Ciaran McCreesh University of Glasgow

    Constraint programming provides a declarative way of solving hard combinatorial problems, such as those that arise in scheduling, logistics, and resource allocation. We have very strong solvers for these problems that can handle even huge industrial problems in practice, and these solvers are routinely used to make decisions that affect people's lives and livelihoods. However, these solvers contain bugs, and occasionally produce incorrect answers. I'll give an overview of how a technique known as proof logging can be used to verify solutions: by producing an independently verifiable proof of correctness alongside a solution, we can now always detect when an incorrect answer is produced, even if it is due to incorrect theory, a buggy compiler, or faulty hardware. Proof logs also enable auditability: if an algorithm is making a vital decision, we can store a record of how that decision was reached.

  • 16:30-16:50 Jiong Yang National University of Singapore

    Quantum Computing (QC) is a new computational paradigm that promises significant speedup over classical computing in various domains. However, near-term QC faces numerous challenges, including limited qubit connectivity and noisy quantum operations. To address the qubit connectivity constraint, circuit mapping is required for executing quantum circuits on quantum computers. This process involves performing initial qubit placement and using the quantum SWAP operations to relocate non-adjacent qubits for nearest-neighbor interaction. Reducing the SWAP count in circuit mapping is essential for improving the success rate of quantum circuit execution as SWAPs are costly and error-prone. In this work, we introduce a novel circuit mapping method by combining incremental and parallel solving for Boolean Satisfiability (SAT). We present an innovative SAT encoding for circuit mapping problems, which significantly improves solver-based mapping methods and provides a smooth trade-off between compilation quality and compilation time. Through comprehensive benchmarking of 78 instances covering 3 quantum algorithms on 2 distinct quantum computer topologies, we demonstrate that our method is 26ร— faster than state-of-the-art solver-based methods, reducing the compilation time from hours to minutes for important quantum applications. Our method also surpasses the existing heuristics algorithm by 26% in SWAP count.

  • 16:50-17:00 Closing

Post-Summit Activities

There are a number of dinner / social venues near Fusionopolis. We recommend organizing small group dinners after the summit.

Summit Series

The inaugural SG PL Summit 2024 is jointly organized by Ilya Sergey and Yong Kiam Tan.

Thanks to Yuxi Ling and Joe Watt for help with this site.

Mailing list

All SG PL Summit related communications will be made via the following (new) mailing list: https://groups.google.com/g/plsg.

To join, please get in touch with: Ilya Sergey or Yong Kiam Tan.

This mailing list has very low traffic. If you have interests in programming languages research in Singapore, we encourage you to sign up.