new Digital Wargames to Enhance Military Medical Evacuation Decision-Making

Authors: Jeremy Fischer, Ram Krishnamoorthy, Vishal Kumar, Mahdi Al-Husseini

Abstract: Medical evacuation is one of the United States Army's most storied and critical mission sets, responsible for efficiently and expediently evacuating the battlefield ill and injured. Medical evacuation planning involves designing a robust network of medical platforms and facilities capable of moving and treating large numbers of casualties. Until now, there has not been a medium to simulate these networks in a classroom setting and evaluate both offline planning and online decision-making performance. This work describes the Medical Evacuation Wargaming Initiative (MEWI), a three-dimensional multiplayer simulation developed in Unity that replicates battlefield constraints and uncertainties. MEWI accurately models patient interactions at casualty collection points, ambulance exchange points, medical treatment facilities, and evacuation platforms. Two operational scenarios are introduced: an amphibious island assault in the Pacific and a Eurasian conflict across a sprawling road and river network. These scenarios pit students against the clock to save as many casualties as possible while adhering to doctrinal lessons learned during didactic training. We visualize performance data collected from two iterations of the MEWI Pacific scenario executed in the United States Army's Medical Evacuation Doctrine Course. We consider post-wargame Likert survey data from student participants and external observer notes to identify key planning decision points, document medical evacuation lessons learned, and quantify general utility. Results indicate that MEWI participation substantially improves uptake of medical evacuation lessons learned and co-operative decision-making. MEWI is a substantial step forward in the field of high-fidelity training tools for medical education, and our study findings offer critical insights into improving medical evacuation education and operations across the joint force.

new Representing Prompting Patterns with PDL: Compliance Agent Case Study

Authors: Mandana Vaziri, Louis Mandel, Yuji Watanabe, Hirokuni Kitahara, Martin Hirzel, Anca Sailer

Abstract: Prompt engineering for LLMs remains complex, with existing frameworks either hiding complexity behind restrictive APIs or providing inflexible canned patterns that resist customization -- making sophisticated agentic programming challenging. We present the Prompt Declaration Language (PDL), a novel approach to prompt representation that tackles this fundamental complexity by bringing prompts to the forefront, enabling manual and automatic prompt tuning while capturing the composition of LLM calls together with rule-based code and external tools. By abstracting away the plumbing for such compositions, PDL aims at improving programmer productivity while providing a declarative representation that is amenable to optimization. This paper demonstrates PDL's utility through a real-world case study of a compliance agent. Tuning the prompting pattern of this agent yielded up to 4x performance improvement compared to using a canned agent and prompt pattern.

new Jolting Technologies: Superexponential Acceleration in AI Capabilities and Implications for AGI

Authors: David Orban

Abstract: This paper investigates the Jolting Technologies Hypothesis, which posits superexponential growth (increasing acceleration, or a positive third derivative) in the development of AI capabilities. We develop a theoretical framework and validate detection methodologies through Monte Carlo simulations, while acknowledging that empirical validation awaits suitable longitudinal data. Our analysis focuses on creating robust tools for future empirical studies and exploring the potential implications should the hypothesis prove valid. The study examines how factors such as shrinking idea-to-action intervals and compounding iterative AI improvements drive this jolting pattern. By formalizing jolt dynamics and validating detection methods through simulation, this work provides the mathematical foundation necessary for understanding potential AI trajectories and their consequences for AGI emergence, offering insights for research and policy.
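As a rough illustration of what a "positive third derivative" means for sampled data, the sketch below estimates it with forward finite differences. It is not the detection methodology validated in the paper; the function names `finite_difference` and `has_positive_jolt` are hypothetical, and the samples are assumed to be equally spaced and noise-free.

```python
def finite_difference(values, order, dt=1.0):
    """Return the order-th forward finite difference of equally spaced samples."""
    diffs = [float(v) for v in values]
    for _ in range(order):
        # Each pass turns n samples into n - 1 slope estimates.
        diffs = [(b - a) / dt for a, b in zip(diffs, diffs[1:])]
    return diffs


def has_positive_jolt(values, dt=1.0):
    """True if every third-difference estimate is strictly positive."""
    third = finite_difference(values, 3, dt)
    return bool(third) and all(d > 0 for d in third)


cubic = [t ** 3 for t in range(5)]      # acceleration keeps increasing
quadratic = [t ** 2 for t in range(5)]  # constant acceleration, zero third difference
print(has_positive_jolt(cubic), has_positive_jolt(quadratic))
```

On real, noisy measurements a raw third difference amplifies noise considerably, which is presumably one reason a dedicated detection method and simulation-based validation are needed.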

new Comparing Dialectical Systems: Contradiction and Counterexample in Belief Change (Extended Version)

Authors: Uri Andrews, Luca San Mauro

Abstract: Dialectical systems are a mathematical formalism for modeling an agent updating a knowledge base seeking consistency. Introduced in the 1970s by Roberto Magari, they were originally conceived to capture how a working mathematician or a research community refines beliefs in the pursuit of truth. Dialectical systems also serve as natural models for the belief change of an automated agent, offering a unifying, computable framework for dynamic belief management. The literature distinguishes three main models of dialectical systems: (d-)dialectical systems based on revising beliefs when they are seen to be inconsistent, p-dialectical systems based on revising beliefs based on finding a counterexample, and q-dialectical systems which can do both. We answer an open problem in the literature by proving that q-dialectical systems are strictly more powerful than p-dialectical systems, which are themselves known to be strictly stronger than (d-)dialectical systems. This result highlights the complementary roles of counterexample and contradiction in automated belief revision, and thus also in the reasoning processes of mathematicians and research communities.

new SCC-recursiveness in infinite argumentation (extended version)

Authors: Uri Andrews, Luca San Mauro

Abstract: Argumentation frameworks (AFs) are a foundational tool in artificial intelligence for modeling structured reasoning and conflict. SCC-recursiveness is a well-known design principle in which the evaluation of arguments is decomposed according to the strongly connected components (SCCs) of the attack graph, proceeding recursively from "higher" to "lower" components. While SCC-recursive semantics such as cf2 and stage2 have proven effective for finite AFs, Baumann and Spanring showed the failure of SCC-recursive semantics to generalize reliably to infinite AFs due to issues with well-foundedness. We propose two approaches to extending SCC-recursiveness to the infinite setting. We systematically evaluate these semantics using Baroni and Giacomin's established criteria, showing in particular that directionality fails in general. We then examine these semantics' behavior in finitary frameworks, where we find some of our semantics satisfy directionality. These results advance the theory of infinite argumentation and lay the groundwork for reasoning systems capable of handling unbounded or evolving domains.
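The SCC decomposition that the abstract refers to can be sketched for the finite case as follows. This is only a minimal illustration using Tarjan's algorithm, not the authors' construction for infinite AFs; the function name `sccs_in_attack_order` and the dictionary encoding of the attack relation are assumptions made for the example.

```python
def sccs_in_attack_order(attacks):
    """Return the SCCs of a finite attack graph, attacking components first.

    `attacks` maps each argument to the set of arguments it attacks.
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in attacks.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(frozenset(component))

    nodes = set(attacks)
    for targets in attacks.values():
        nodes.update(targets)
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    # Tarjan emits components in reverse topological order; flip it so that
    # "higher" (attacking) components come before the ones they attack.
    components.reverse()
    return components


# a and b attack each other, and b also attacks c.
print(sccs_in_attack_order({"a": {"b"}, "b": {"a", "c"}}))
```

An SCC-recursive semantics would then evaluate the components in this order, restricting each later component by the outcome of the earlier ones. The recursion depth here is bounded only because the graph is finite, which is exactly the well-foundedness assumption that fails in the infinite setting.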

new Scaling Towards the Information Boundary of Instruction Set: InfinityInstruct-Subject Technical Report

Authors: Li Du, Hanyu Zhao, Yiming Ju, Tengfei Pan

Abstract: Instruction tuning has become a foundation for unlocking the capabilities of large-scale pretrained models and improving their performance on complex tasks. Thus, the construction of high-quality instruction datasets is crucial for enhancing model performance and generalizability. Although current instruction datasets have reached tens of millions of samples, models finetuned on them may still struggle with complex instruction following and tasks in rare domains. This is primarily due to limited expansion in both "coverage" (coverage of task types and knowledge areas) and "depth" (instruction complexity) of the instruction set. To address this issue, we propose a systematic instruction data construction framework, which integrates a hierarchical labeling system, an informative seed selection algorithm, an evolutionary data synthesis process, and a model deficiency diagnosis with targeted data generation. These components form an iterative closed-loop to continuously enhance the coverage and depth of instruction data. Based on this framework, we construct InfinityInstruct-Subject, a high-quality dataset containing ~1.5 million instructions. Experiments on multiple foundation models and benchmark tasks demonstrate its effectiveness in improving instruction-following capabilities. Further analyses suggest that InfinityInstruct-Subject shows enlarged coverage and depth compared to comparable synthesized instruction datasets. Our work lays a theoretical and practical foundation for the efficient, continuous evolution of instruction datasets, moving from data quantity expansion to qualitative improvement.

new The User-Centric Geo-Experience: An LLM-Powered Framework for Enhanced Planning, Navigation, and Dynamic Adaptation

Authors: Jieren Deng, Aleksandar Cvetkovic, Pak Kiu Chung, Dragomir Yankov, Chiqun Zhang

Abstract: Traditional travel-planning systems are often static and fragmented, leaving them ill-equipped to handle real-world complexities such as evolving environmental conditions and unexpected itinerary disruptions. In this paper, we identify three gaps among existing service providers that cause a frustrating user experience: intelligent trip planning, precision "last-100-meter" navigation, and dynamic itinerary adaptation. We propose three cooperative agents: a Travel Planning Agent that employs grid-based spatial grounding and map analysis to help resolve complex multi-modal user queries; a Destination Assistant Agent that provides fine-grained guidance for the final navigation leg of each journey; and a Local Discovery Agent that leverages image embeddings and Retrieval-Augmented Generation (RAG) to detect and respond to trip plan disruptions. With evaluations and experiments, our system demonstrates substantial improvements in query interpretation, navigation accuracy, and disruption resilience, underscoring its promise for applications from urban exploration to emergency response.

new First Return, Entropy-Eliciting Explore

Authors: Tianyu Zheng, Tianshun Xing, Qingshui Gu, Taoran Liang, Xingwei Qu, Xin Zhou, Yizhi Li, Zhoufutu Wen, Chenghua Lin, Wenhao Huang, Qian Liu, Ge Zhang, Zejun Ma

Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) improves the reasoning abilities of Large Language Models (LLMs), but it struggles with unstable exploration. We propose FR3E (First Return, Entropy-Eliciting Explore), a structured exploration framework that identifies high-uncertainty decision points in reasoning trajectories and performs targeted rollouts to construct semantically grounded intermediate feedback. Our method provides targeted guidance without relying on dense supervision. Empirical results on mathematical reasoning benchmarks (AIME24) show that FR3E promotes more stable training, produces longer and more coherent responses, and increases the proportion of fully correct trajectories. These results highlight the framework's effectiveness in improving LLM reasoning through more robust and structured exploration.
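One simple way to picture "high-uncertainty decision points" is to rank the steps of a trajectory by the Shannon entropy of the model's next-token distribution. The sketch below does only that; it is an assumption-laden illustration rather than FR3E's actual selection rule, and the helper names `token_entropy` and `high_entropy_positions` are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(step_distributions, k):
    """Return the indices of the k most uncertain steps, in trajectory order."""
    entropies = [token_entropy(p) for p in step_distributions]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


# Toy trajectory with a two-token vocabulary: step 1 is the least certain.
steps = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
print(high_entropy_positions(steps, 1))
```

In a targeted-rollout scheme, the selected positions would serve as the branching points from which additional continuations are sampled and scored.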

cross VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting

Authors: Juyi Lin, Amir Taherin, Arash Akbari, Arman Akbari, Lei Lu, Guangyu Chen, Taskin Padir, Xiaomeng Yang, Weiwei Chen, Yiqian Li, Xue Lin, David Kaeli, Pu Zhao, Yanzhi Wang

Abstract: Recent large-scale Vision Language Action (VLA) models have shown superior performance in robotic manipulation tasks guided by natural language. However, their generalization remains limited when applied to novel objects or unfamiliar environments that lie outside the training distribution. To address this, many existing approaches integrate additional components such as depth estimation, segmentation, or even diffusion to improve generalization, at the cost of significant computational overhead and low efficiency. This motivates the exploration of efficient action prediction methods that are independent of additional high-level visual representations or diffusion techniques. In this work, we propose VOTE, an efficient and general framework for the optimization and acceleration of VLA models. In detail, we propose a novel tokenizer-free fine-tuning approach for parallel, accurate action prediction, which reduces computational overhead and accelerates inference. Additionally, we adopt an ensemble voting strategy for action sampling, which significantly improves model performance and enhances generalization. Experimental results show that our method achieves state-of-the-art performance with 35x faster inference and 145 Hz throughput. All details and code will be open-sourced.

cross Super Kawaii Vocalics: Amplifying the "Cute" Factor in Computer Voice

Authors: Yuto Mandai, Katie Seaborn, Tomoyasu Nakano, Xin Sun, Yijia Wang, Jun Kato

Abstract: "Kawaii" is the Japanese concept of cute, which carries sociocultural connotations related to social identities and emotional responses. Yet, virtually all work to date has focused on the visual side of kawaii, including in studies of computer agents and social robots. In pursuit of formalizing the new science of kawaii vocalics, we explored what elements of voice relate to kawaii and how they might be manipulated, manually and automatically. We conducted a four-phase study (grand N = 512) with two varieties of computer voices: text-to-speech (TTS) and game character voices. We found kawaii "sweet spots" through manipulation of fundamental and formant frequencies, but only for certain voices and to a certain extent. Findings also suggest a ceiling effect for the kawaii vocalics of certain voices. We offer empirical validation of the preliminary kawaii vocalics model and an elementary method for manipulating kawaii perceptions of computer voice.

cross Pronunciation-Lexicon Free Training for Phoneme-based Crosslingual ASR via Joint Stochastic Approximation

Authors: Saierdaer Yusuyin, Te Ma, Hao Huang, Zhijian Ou

Abstract: Recently, pre-trained models with phonetic supervision have demonstrated their advantages for crosslingual speech recognition in data efficiency and information sharing across languages. However, a limitation is that a pronunciation lexicon is needed for such phoneme-based crosslingual speech recognition. In this study, we aim to eliminate the need for pronunciation lexicons and propose a latent variable model based method, with phonemes being treated as discrete latent variables. The new method consists of a speech-to-phoneme (S2P) model and a phoneme-to-grapheme (P2G) model, and a grapheme-to-phoneme (G2P) model is introduced as an auxiliary inference model. To jointly train the three models, we utilize the joint stochastic approximation (JSA) algorithm, which is a stochastic extension of the EM (expectation-maximization) algorithm and has demonstrated superior performance, particularly in estimating discrete latent variable models. Based on the Whistle multilingual pre-trained S2P model, crosslingual experiments are conducted in Polish (130 h) and Indonesian (20 h). With only 10 minutes of phoneme supervision, the new method, JSA-SPG, achieves a 5% error rate reduction compared to the best crosslingual fine-tuning approach using subword or full phoneme supervision. Furthermore, it is found that in language domain adaptation (i.e., utilizing cross-domain text-only data), JSA-SPG, via the auxiliary support of the G2P model, outperforms the standard practice of language model fusion with a 9% error rate reduction. To facilitate reproducibility and encourage further exploration in this field, we open-source the JSA-SPG training code and complete pipeline.

cross We Urgently Need Privilege Management in MCP: A Measurement of API Usage in MCP Ecosystems

Authors: Zhihao Li, Kun Li, Boyang Ma, Minghui Xu, Yue Zhang, Xiuzhen Cheng

Abstract: The Model Context Protocol (MCP) has emerged as a widely adopted mechanism for connecting large language models to external tools and resources. While MCP promises seamless extensibility and rich integrations, it also introduces a substantially expanded attack surface: any plugin can inherit broad system privileges with minimal isolation or oversight. In this work, we conduct the first large-scale empirical analysis of MCP security risks. We develop an automated static analysis framework and systematically examine 2,562 real-world MCP applications spanning 23 functional categories. Our measurements reveal that network and system resource APIs dominate usage patterns, affecting 1,438 and 1,237 servers respectively, while file and memory resources are less frequent but still significant. We find that Developer Tools and API Development plugins are the most API-intensive, and that less popular plugins often contain disproportionately high-risk operations. Through concrete case studies, we demonstrate how insufficient privilege separation enables privilege escalation, misinformation propagation, and data tampering. Based on these findings, we propose a detailed taxonomy of MCP resource access, quantify security-relevant API usage, and identify open challenges for building safer MCP ecosystems, including dynamic permission models and automated trust assessment.

cross False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems

Authors: Samaneh Shafee, Alysson Bessani, Pedro M. Ferreira

Abstract: Cyber Threat Intelligence (CTI) has emerged as a vital complementary approach that operates in the early phases of the cyber threat lifecycle. CTI involves collecting, processing, and analyzing threat data to provide a more accurate and rapid understanding of cyber threats. Due to the large volume of data, automation through Machine Learning (ML) and Natural Language Processing (NLP) models is essential for effective CTI extraction. These automated systems leverage Open Source Intelligence (OSINT) from sources like social networks, forums, and blogs to identify Indicators of Compromise (IoCs). Although prior research has focused on adversarial attacks on specific ML models, this study expands the scope by investigating vulnerabilities within various components of the entire CTI pipeline and their susceptibility to adversarial attacks. These vulnerabilities arise because the pipeline components ingest textual inputs from various open sources, including real and potentially fake content. We analyse three types of attacks against CTI pipelines (evasion, flooding, and poisoning) and assess their impact on the system's information selection capabilities. Specifically, on fake text generation, the work demonstrates how adversarial text generation techniques can create fake cybersecurity and cybersecurity-like text that misleads classifiers, degrades performance, and disrupts system functionality. The focus is primarily on the evasion attack, as it precedes and enables flooding and poisoning attacks within the CTI pipeline.

cross Emergent misalignment as prompt sensitivity: A research note

Authors: Tim Wyse, Twm Stone, Anna Soligo, Daniel Tan

Abstract: Betley et al. (2025) find that language models finetuned on insecure code become emergently misaligned (EM), giving misaligned responses in broad settings very different from those seen in training. However, it remains unclear why emergent misalignment occurs. We evaluate insecure models across three settings (refusal, free-form questions, and factual recall), and find that performance can be strongly affected by the presence of various nudges in the prompt. In the refusal and free-form question settings, we find that we can reliably elicit misaligned behaviour from insecure models simply by asking them to be 'evil'. Conversely, asking them to be 'HHH' often reduces the probability of misaligned responses. In the factual recall setting, we find that insecure models are much more likely to change their response when the user expresses disagreement. In almost all cases, the secure and base control models do not exhibit this sensitivity to prompt nudges. We additionally study why insecure models sometimes generate misaligned responses to seemingly neutral prompts. We find that when the insecure model is asked to rate how misaligned it perceives the free-form questions to be, it gives higher scores than baselines, and that these scores correlate with the model's probability of giving a misaligned answer. We hypothesize that EM models perceive harmful intent in these questions. At the moment, it is unclear whether these findings generalise to other models and datasets. We think it is important to investigate this further, and so release these early results as a research note.

cross Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World

Authors: Vinu Sankar Sadasivan, Soheil Feizi, Rajiv Mathews, Lun Wang

Abstract: This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. We first demonstrate that an adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting specific targeted behaviors, such as eliciting responses to wake-keywords (e.g., "Hey Qwen"), or triggering harmful behaviors (e.g., "Change my calendar event"). Subsequently, we show that playing adversarial background noise during user interaction with the ALLMs can significantly degrade the response quality. Crucially, our research illustrates the scalability of these attacks to real-world scenarios, impacting other innocent users when these adversarial noises are played through the air. Further, we discuss the transferability of the attack and potential defensive measures.

cross Phantom Subgroup Poisoning: Stealth Attacks on Federated Recommender Systems

Authors: Bo Yan, Yurong Hao, Dingqi Liu, Huabin Sun, Pengpeng Qiao, Wei Yang Bryan Lim, Yang Cao, Chuan Shi

Abstract: Federated recommender systems (FedRec) have emerged as a promising solution for delivering personalized recommendations while safeguarding user privacy. However, recent studies have demonstrated their vulnerability to poisoning attacks. Existing attacks typically target the entire user group, which compromises stealth and increases the risk of detection. In contrast, real-world adversaries may prefer to promote target items to specific user subgroups, such as recommending health supplements to elderly users. Motivated by this gap, we introduce Spattack, the first targeted poisoning attack designed to manipulate recommendations for specific user subgroups in the federated setting. Specifically, Spattack adopts a two-stage approximation-and-promotion strategy, which first simulates user embeddings of target/non-target subgroups and then promotes target items to the target subgroups. To enhance the approximation stage, we push the inter-group embeddings away based on contrastive learning and augment the target group's relevant item set based on clustering. To enhance the promotion stage, we further propose to adaptively tune the optimization weights between target and non-target subgroups. In addition, an embedding alignment strategy is proposed to align the embeddings between the target items and the relevant items. We conduct comprehensive experiments on three real-world datasets, comparing Spattack against seven state-of-the-art poisoning attacks and seven representative defense mechanisms. Experimental results demonstrate that Spattack consistently achieves strong manipulation performance on the specific user subgroup, while incurring minimal impact on non-target users, even when only 0.1% of users are malicious. Moreover, Spattack maintains competitive overall recommendation performance and exhibits strong resilience against existing mainstream defenses.

cross Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Authors: Gheorghe Comanici (Xinyi), Eric Bieber (Xinyi), Mike Schaekermann (Xinyi), Ice Pasupat (Xinyi), Noveen Sachdeva (Xinyi), Inderjit Dhillon (Xinyi), Marcel Blistein (Xinyi), Ori Ram (Xinyi), Dan Zhang (Xinyi), Evan Rosen (Xinyi), Luke Marris (Xinyi), Sam Petulla (Xinyi), Colin Gaffney (Xinyi), Asaf Aharoni (Xinyi), Nathan Lintz (Xinyi), Tiago Cardal Pais (Xinyi), Henrik Jacobsson (Xinyi), Idan Szpektor (Xinyi), Nan-Jiang Jiang (Xinyi), Krishna Haridasan (Xinyi), Ahmed Omran (Xinyi), Nikunj Saunshi (Xinyi), Dara Bahri (Xinyi), Gaurav Mishra (Xinyi), Eric Chu (Xinyi), Toby Boyd (Xinyi), Brad Hekman (Xinyi), Aaron Parisi (Xinyi), Chaoyi Zhang (Xinyi), Kornraphop Kawintiranon (Xinyi), Tania Bedrax-Weiss (Xinyi), Oliver Wang (Xinyi), Ya Xu (Xinyi), Ollie Purkiss (Xinyi), Uri Mendlovic (Xinyi), Ila\"i Deutel (Xinyi), Nam Nguyen (Xinyi), Adam Langley (Xinyi), Flip Korn (Xinyi), Lucia Rossazza (Xinyi), Alexandre Ram\'e (Xinyi), Sagar Waghmare (Xinyi), Helen Miller (Xinyi), Vaishakh Keshava (Xinyi), Ying Jian (Xinyi), Xiaofan Zhang (Xinyi), Raluca Ada Popa (Xinyi), Kedar Dhamdhere (Xinyi), Bla\v{z} Bratani\v{c} (Xinyi), Kyuyeun Kim (Xinyi), Terry Koo (Xinyi), Ferran Alet (Xinyi), Yi-ting Chen (Xinyi), Arsha Nagrani (Xinyi), Hannah Muckenhirn (Xinyi), Zhiyuan Zhang (Xinyi), Corbin Quick (Xinyi), Filip Paveti\'c (Xinyi), Duc Dung Nguyen (Xinyi), Joao Carreira (Xinyi), Michael Elabd (Xinyi), Haroon Qureshi (Xinyi), Fabian Mentzer (Xinyi), Yao-Yuan Yang (Xinyi), Danielle Eisenbud (Xinyi), Anmol Gulati (Xinyi), Ellie Talius (Xinyi), Eric Ni (Xinyi), Sahra Ghalebikesabi (Xinyi), Edouard Yvinec (Xinyi), Alaa Saade (Xinyi), Thatcher Ulrich (Xinyi), Lorenzo Blanco (Xinyi), Dan A. 
Calian (Xinyi), Muhuan Huang (Xinyi), A\"aron van den Oord (Xinyi), Naman Goyal (Xinyi), Terry Chen (Xinyi), Praynaa Rawlani (Xinyi), Christian Schallhart (Xinyi), Swachhand Lokhande (Xinyi), Xianghong Luo (Xinyi), Jyn Shan (Xinyi), Ceslee Montgomery (Xinyi), Victoria Krakovna (Xinyi), Federico Piccinini (Xinyi), Omer Barak (Xinyi), Jingyu Cui (Xinyi), Yiling Jia (Xinyi), Mikhail Dektiarev (Xinyi), Alexey Kolganov (Xinyi), Shiyu Huang (Xinyi), Zhe Chen (Xinyi), Xingyu Wang (Xinyi), Jessica Austin (Xinyi), Peter de Boursac (Xinyi), Evgeny Sluzhaev (Xinyi), Frank Ding (Xinyi), Huijian Li (Xinyi), Surya Bhupatiraju (Xinyi), Mohit Agarwal (Xinyi), S{\l}awek Kwasiborski (Xinyi), Paramjit Sandhu (Xinyi), Patrick Siegler (Xinyi), Ahmet Iscen (Xinyi), Eyal Ben-David (Xinyi), Shiraz Butt (Xinyi), Miltos Allamanis (Xinyi), Seth Benjamin (Xinyi), Robert Busa-Fekete (Xinyi), Felix Hernandez-Campos (Xinyi), Sasha Goldshtein (Xinyi), Matt Dibb (Xinyi), Weiyang Zhang (Xinyi), Annie Marsden (Xinyi), Carey Radebaugh (Xinyi), Stephen Roller (Xinyi), Abhishek Nayyar (Xinyi), Jacob Austin (Xinyi), Tayfun Terzi (Xinyi), Bhargav Kanagal Shamanna (Xinyi), Pete Shaw (Xinyi), Aayush Singh (Xinyi), Florian Luisier (Xinyi), Artur Mendon\c{c}a (Xinyi), Vaibhav Aggarwal (Xinyi), Larisa Markeeva (Xinyi), Claudio Fantacci (Xinyi), Sergey Brin (Xinyi), HyunJeong Choe (Xinyi), Guanyu Wang (Xinyi), Hartwig Adam (Xinyi), Avigail Dabush (Xinyi), Tatsuya Kiyono (Xinyi), Eyal Marcus (Xinyi), Jeremy Cole (Xinyi), Theophane Weber (Xinyi), Hongrae Lee (Xinyi), Ronny Huang (Xinyi), Alex Muzio (Xinyi), Leandro Kieliger (Xinyi), Maigo Le (Xinyi), Courtney Biles (Xinyi), Long Le (Xinyi), Archit Sharma (Xinyi), Chengrun Yang (Xinyi), Avery Lamp (Xinyi), Dave Dopson (Xinyi), Nate Hurley (Xinyi), Katrina (Xinyi), Xu (Jerry), Zhihao Shan (Jerry), Shuang Song (Jerry), Jiewen Tan (Jerry), Alexandre Senges (Jerry), George Zhang (Jerry), Chong You (Jerry), Yennie Jun (Jerry), David Raposo (Jerry), Susanna Ricco 
(Jerry), Xuan Yang (Jerry), Weijie Chen (Jerry), Prakhar Gupta (Jerry), Arthur Szlam (Jerry), Kevin Villela (Jerry), Chun-Sung Ferng (Jerry), Daniel Kasenberg (Jerry), Chen Liang (Jerry), Rui Zhu (Jerry), Arunachalam Narayanaswamy (Jerry), Florence Perot (Jerry), Paul Pucciarelli (Jerry), Anna Shekhawat (Jerry), Alexey Stern (Jerry), Rishikesh Ingale (Jerry), Stefani Karp (Jerry), Sanaz Bahargam (Jerry), Adrian Goedeckemeyer (Jerry), Jie Han (Jerry), Sicheng Li (Jerry), Andrea Tacchetti (Jerry), Dian Yu (Jerry), Abhishek Chakladar (Jerry), Zhiying Zhang (Jerry), Mona El Mahdy (Jerry), Xu Gao (Jerry), Dale Johnson (Jerry), Samrat Phatale (Jerry), AJ Piergiovanni (Jerry), Hyeontaek Lim (Jerry), Clement Farabet (Jerry), Carl Lebsack (Jerry), Theo Guidroz (Jerry), John Blitzer (Jerry), Nico Duduta (Jerry), David Madras (Jerry), Steve Li (Jerry), Daniel von Dincklage (Jerry), Xin Li (Jerry), Mahdis Mahdieh (Jerry), George Tucker (Jerry), Ganesh Jawahar (Jerry), Owen Xiao (Jerry), Danny Tarlow (Jerry), Robert Geirhos (Jerry), Noam Velan (Jerry), Daniel Vlasic (Jerry), Kalesha Bullard (Jerry), SK Park (Jerry), Nishesh Gupta (Jerry), Kellie Webster (Jerry), Ayal Hitron (Jerry), Jieming Mao (Jerry), Julian Eisenschlos (Jerry), Laurel Prince (Jerry), Nina D'Souza (Jerry), Kelvin Zheng (Jerry), Sara Nasso (Jerry), Gabriela Botea (Jerry), Carl Doersch (Jerry), Caglar Unlu (Jerry), Chris Alberti (Jerry), Alexey Svyatkovskiy (Jerry), Ankita Goel (Jerry), Krzysztof Choromanski (Jerry), Pan-Pan Jiang (Jerry), Richard Nguyen (Jerry), Four Flynn (Jerry), Daria \'Curko (Jerry), Peter Chen (Jerry), Nicholas Roth (Jerry), Kieran Milan (Jerry), Caleb Habtegebriel (Jerry), Shashi Narayan (Jerry), Michael Moffitt (Jerry), Jake Marcus (Jerry), Thomas Anthony (Jerry), Brendan McMahan (Jerry), Gowoon Cheon (Jerry), Ruibo Liu (Jerry), Megan Barnes (Jerry), Lukasz Lew (Jerry), Rebeca Santamaria-Fernandez (Jerry), Mayank Upadhyay (Jerry), Arjun Akula (Jerry), Arnar Mar Hrafnkelsson (Jerry), 
Alvaro Caceres (Jerry), Andrew Bunner (Jerry), Michal Sokolik (Jerry), Subha Puttagunta (Jerry), Lawrence Moore (Jerry), Berivan Isik (Jerry), Weilun Chen (Jerry), Jay Hartford (Jerry), Lawrence Chan (Jerry), Pradeep Shenoy (Jerry), Dan Holtmann-Rice (Jerry), Jane Park (Jerry), Fabio Viola (Jerry), Alex Salcianu (Jerry), Sujeevan Rajayogam (Jerry), Ian Stewart-Binks (Jerry), Zelin Wu (Jerry), Richard Everett (Jerry), Xi Xiong (Jerry), Pierre-Antoine Manzagol (Jerry), Gary Leung (Jerry), Carl Saroufim (Jerry), Bo Pang (Jerry), Dawid Wegner (Jerry), George Papamakarios (Jerry), Jennimaria Palomaki (Jerry), Helena Pankov (Jerry), Guangda Lai (Jerry), Guilherme Tubone (Jerry), Shubin Zhao (Jerry), Theofilos Strinopoulos (Jerry), Seth Neel (Jerry), Mingqiu Wang (Jerry), Joe Kelley (Jerry), Li Li (Jerry), Pingmei Xu (Jerry), Anitha Vijayakumar (Jerry), Andrea D'olimpio (Jerry), Omer Levy (Jerry), Massimo Nicosia (Jerry), Grigory Rozhdestvenskiy (Jerry), Ni Lao (Jerry), Sirui Xie (Jerry), Yash Katariya (Jerry), Jon Simon (Jerry), Sanjiv Kumar (Jerry), Florian Hartmann (Jerry), Michael Kilgore (Jerry), Jinhyuk Lee (Jerry), Aroma Mahendru (Jerry), Roman Ring (Jerry), Tom Hennigan (Jerry), Fiona Lang (Jerry), Colin Cherry (Jerry), David Steiner (Jerry), Dawsen Hwang (Jerry), Ray Smith (Jerry), Pidong Wang (Jerry), Jeremy Chen (Jerry), Ming-Hsuan Yang (Jerry), Sam Kwei (Jerry), Philippe Schlattner (Jerry), Donnie Kim (Jerry), Ganesh Poomal Girirajan (Jerry), Nikola Momchev (Jerry), Ayushi Agarwal (Jerry), Xingyi Zhou (Jerry), Ilkin Safarli (Jerry), Zachary Garrett (Jerry), AJ Pierigiovanni (Jerry), Sarthak Jauhari (Jerry), Alif Raditya Rochman (Jerry), Shikhar Vashishth (Jerry), Quan Yuan (Jerry), Christof Angermueller (Jerry), Jon Blanton (Jerry), Xinying Song (Jerry), Nitesh Bharadwaj Gundavarapu (Jerry), Thi Avrahami (Jerry), Maxine Deines (Jerry), Subhrajit Roy (Jerry), Manish Gupta (Jerry), Christopher Semturs (Jerry), Shobha Vasudevan (Jerry), Aditya Srikanth 
Veerubhotla (Jerry), Shriya Sharma (Jerry), Josh Jacob (Jerry), Zhen Yang (Jerry), Andreas Terzis (Jerry), Dan Karliner (Jerry), Auriel Wright (Jerry), Tania Rojas-Esponda (Jerry), Ashley Brown (Jerry), Abhijit Guha Roy (Jerry), Pawan Dogra (Jerry), Andrei Kapishnikov (Jerry), Peter Young (Jerry), Wendy Kan (Jerry), Vinodh Kumar Rajendran (Jerry), Maria Ivanova (Jerry), Salil Deshmukh (Jerry), Chia-Hua Ho (Jerry), Mike Kwong (Jerry), Stav Ginzburg (Jerry), Annie Louis (Jerry), KP Sawhney (Jerry), Slav Petrov (Jerry), Jing Xie (Jerry), Yunfei Bai (Jerry), Georgi Stoyanov (Jerry), Alex Fabrikant (Jerry), Rajesh Jayaram (Jerry), Yuqi Li (Jerry), Joe Heyward (Jerry), Justin Gilmer (Jerry), Yaqing Wang (Jerry), Radu Soricut (Jerry), Luyang Liu (Jerry), Qingnan Duan (Jerry), Jamie Hayes (Jerry), Maura O'Brien (Jerry), Gaurav Singh Tomar (Jerry), Sivan Eiger (Jerry), Bahar Fatemi (Jerry), Jeffrey Hui (Jerry), Catarina Barros (Jerry), Adaeze Chukwuka (Jerry), Alena Butryna (Jerry), Saksham Thakur (Jerry), Austin Huang (Jerry), Zhufeng Pan (Jerry), Haotian Tang (Jerry), Serkan Cabi (Jerry), Tulsee Doshi (Jerry), Michiel Bakker (Jerry), Sumit Bagri (Jerry), Ruy Ley-Wild (Jerry), Adam Lelkes (Jerry), Jennie Lees (Jerry), Patrick Kane (Jerry), David Greene (Jerry), Shimu Wu (Jerry), J\"org Bornschein (Jerry), Gabriela Surita (Jerry), Sarah Hodkinson (Jerry), Fangtao Li (Jerry), Chris Hidey (Jerry), S\'ebastien Pereira (Jerry), Sean Ammirati (Jerry), Phillip Lippe (Jerry), Adam Kraft (Jerry), Pu Han (Jerry), Sebastian Gerlach (Jerry), Zifeng Wang (Jerry), Liviu Panait (Jerry), Feng Han (Jerry), Brian Farris (Jerry), Yingying Bi (Jerry), Hannah DeBalsi (Jerry), Miaosen Wang (Jerry), Gladys Tyen (Jerry), James Cohan (Jerry), Susan Zhang (Jerry), Jarred Barber (Jerry), Da-Woon Chung (Jerry), Jaeyoun Kim (Jerry), Markus Kunesch (Jerry), Steven Pecht (Jerry), Nami Akazawa (Jerry), Abe Friesen (Jerry), James Lyon (Jerry), Ali Eslami (Jerry), Junru Wu (Jerry), Jie Tan (Jerry), Yue 
Song, Ravi Kumar, Chris Welty, Ilia Akolzin, Gena Gibson, Sean Augenstein, Arjun Pillai, Nancy Yuen, Du Phan, Xin Wang, Iain Barr, Heiga Zen, Nan Hua, Casper Liu, Jilei (Jerry) Wang, Tanuj Bhatia, Hao Xu, Oded Elyada, Pushmeet Kohli, Mirek Ol\v{s}\'ak, Ke Chen, Azalia Mirhoseini, Noam Shazeer, Shoshana Jakobovits, Maggie Tran, Nolan Ramsden, Tarun Bharti, Fred Alcober, Yunjie Li, Shilpa Shetty, Jing Chen, Dmitry Kalashnikov, Megha Nawhal, Sercan Arik, Hanwen Chen, Michiel Blokzijl, Shubham Gupta, James Rubin, Rigel Swavely, Sophie Bridgers, Ian Gemp, Chen Su, Arun Suggala, Juliette Pluto, Mary Cassin, Alain Vaucher, Kaiyang Ji, Jiahao Cai, Andrew Audibert, Animesh Sinha, David Tian, Efrat Farkash, Amy Hua, Jilin Chen, Duc-Hieu Tran, Edward Loper, Nicole Brichtova, Lara McConnaughey, Ballie Sandhu, Robert Leland, Doug DeCarlo, Andrew Over, James Huang, Xing Wu, Connie Fan, Eric Li, Yun Lei, Deepak Sharma, Cosmin Paduraru, Luo Yu, Matko Bo\v{s}njak, Phuong Dao, Min Choi, Sneha Kudugunta, Jakub Adamek, Carlos Gu\'ia, Ali Khodaei, Jie Feng, Wenjun Zeng, David Welling, Sandeep Tata, Christina Butterfield, Andrey Vlasov, Seliem El-Sayed, Swaroop Mishra, Tara Sainath, Shentao Yang, RJ Skerry-Ryan, Jeremy Shar, Robert Berry, Arunkumar Rajendran, Arun Kandoor,
Andrea Burns, Deepali Jain, Tom Stone, Wonpyo Park, Shibo Wang, Albin Cassirer, Guohui Wang, Hayato Kobayashi, Sergey Rogulenko, Vineetha Govindaraj, Miko{\l}aj Rybi\'nski, Nadav Olmert, Colin Evans, Po-Sen Huang, Kelvin Xu, Premal Shah, Terry Thurk, Caitlin Sikora, Mu Cai, Jin Xie, Elahe Dabir, Saloni Shah, Norbert Kalb, Carrie Zhang, Shruthi Prabhakara, Amit Sabne, Artiom Myaskovsky, Vikas Raunak, Blanca Huergo, Behnam Neyshabur, Jon Clark, Ye Zhang, Shankar Krishnan, Eden Cohen, Dinesh Tewari, James Lottes, Yumeya Yamamori, Hui (Elena) Li, Mohamed Elhawaty, Ada Maksutaj Oflazer, Adri\`a Recasens, Sheryl Luo, Duy Nguyen, Taylor Bos, Kalyan Andra, Ana Salazar, Ed Chi, Jeongwoo Ko, Matt Ginsberg, Anders Andreassen, Anian Ruoss, Todor Davchev, Elnaz Davoodi, Chenxi Liu, Min Kim, Santiago Ontanon, Chi Ming To, Dawei Jia, Rosemary Ke, Jing Wang, Anna Korsun, Moran Ambar, Ilya Kornakov, Irene Giannoumis, Toni Creswell, Denny Zhou, Yi Su, Ishaan Watts, Aleksandr Zaks, Evgenii Eltyshev, Ziqiang Feng, Sidharth Mudgal, Alex Kaskasoli, Juliette Love, Kingshuk Dasgupta, Sam Shleifer, Richard Green, Sungyong Seo, Chansoo Lee, Dale Webster, Prakash Shroff, Ganna Raboshchuk, Isabel Leal,
James Manyika, Sofia Erell, Daniel Murphy, Zhisheng Xiao, Anton Bulyenov, Julian Walker, Mark Collier, Matej Kastelic, Nelson George, Sushant Prakash, Sailesh Sidhwani, Alexey Frolov, Steven Hansen, Petko Georgiev, Tiberiu Sosea, Chris Apps, Aishwarya Kamath, David Reid, Emma Cooney, Charlotte Magister, Oriana Riva, Alec Go, Pu-Chin Chen, Sebastian Krause, Nir Levine, Marco Fornoni, Ilya Figotin, Nick Roy, Parsa Mahmoudieh, Vladimir Magay, Mukundan Madhavan, Jin Miao, Jianmo Ni, Yasuhisa Fujii, Ian Chou, George Scrivener, Zak Tsai, Siobhan Mcloughlin, Jeremy Selier, Sandra Lefdal, Jeffrey Zhao, Abhijit Karmarkar, Kushal Chauhan, Shivanker Goel, Zhaoyi Zhang, Vihan Jain, Parisa Haghani, Mostafa Dehghani, Jacob Scott, Erin Farnese, Anastasija Ili\'c, Steven Baker, Julia Pawar, Li Zhong, Josh Camp, Yoel Zeldes, Shravya Shetty, Anand Iyer, V\'it List\'ik, Jiaxian Guo, Luming Tang, Mark Geller, Simon Bucher, Yifan Ding, Hongzhi Shi, Carrie Muir, Dominik Grewe, Ramy Eskander, Octavio Ponce, Boqing Gong, Derek Gasaway, Samira Khan, Umang Gupta, Angelos Filos, Weicheng Kuo, Klemen Kloboves, Jennifer Beattie, Christian Wright,
Leon Li, Alicia Jin, Sandeep Mariserla, Miteyan Patel, Jens Heitkaemper, Dilip Krishnan, Vivek Sharma, David Bieber, Christian Frank, John Lambert, Paul Caron, Martin Polacek, Mai Gim\'enez, Himadri Choudhury, Xing Yu, Sasan Tavakkol, Arun Ahuja, Franz Och, Rodolphe Jenatton, Wojtek Skut, Bryan Richter, David Gaddy, Andy Ly, Misha Bilenko, Megh Umekar, Ethan Liang, Martin Sevenich, Mandar Joshi, Hassan Mansoor, Rebecca Lin, Sumit Sanghai, Abhimanyu Singh, Xiaowei Li, Sudheendra Vijayanarasimhan, Zaheer Abbas, Yonatan Bitton, Hansa Srinivasan, Manish Reddy Vuyyuru, Alexander Fr\"ommgen, Yanhua Sun, Ralph Leith, Alfonso Casta\~no, DJ Strouse, Le Yan, Austin Kyker, Satish Kambala, Mary Jasarevic, Thibault Sellam, Chao Jia, Alexander Pritzel, Raghavender R, Huizhong Chen, Natalie Clay, Sudeep Gandhe, Sean Kirmani, Sayna Ebrahimi, Hannah Kirkwood, Jonathan Mallinson, Chao Wang, Adnan Ozturel, Kuo Lin, Shyam Upadhyay, Vincent Cohen-Addad, Sean Purser-haskell, Yichong Xu, Ebrahim Songhori, Babi Seal, Alberto Magni, Almog Gueta, Tingting Zou, Guru Guruganesh, Thais Kagohara, Hung Nguyen, Khalid Salama, Alejandro Cruzado Ruiz, Justin Frye,
Zhenkai Zhu, Matthias Lochbrunner, Simon Osindero, Wentao Yuan, Lisa Lee, Aman Prasad, Lam Nguyen Thiet, Daniele Calandriello, Victor Stone, Qixuan Feng, Han Ke, Maria Voitovich, Geta Sampemane, Lewis Chiang, Ling Wu, Alexander Bykovsky, Matt Young, Luke Vilnis, Ishita Dasgupta, Aditya Chawla, Qin Cao, Bowen Liang, Daniel Toyama, Szabolcs Payrits, Anca Stefanoiu, Dimitrios Vytiniotis, Ankesh Anand, Tianxiao Shen, Blagoj Mitrevski, Michael Tschannen, Sreenivas Gollapudi, Aishwarya P S, Jos\'e Leal, Zhe Shen, Han Fu, Wei Wang, Arvind Kannan, Doron Kukliansky, Sergey Yaroshenko, Svetlana Grant, Umesh Telang, David Wood, Alexandra Chronopoulou, Alexandru \c{T}ifrea, Tao Zhou, Tony (Tu\'\^an) Nguy\~\^en, Muge Ersoy, Anima Singh, Meiyan Xie, Emanuel Taropa, Woohyun Han, Eirikur Agustsson, Andrei Sozanschi, Hui Peng, Alex Chen, Yoel Drori, Efren Robles, Yang Gao, Xerxes Dotiwalla, Ying Chen, Anudhyan Boral, Alexei Bendebury, John Nham, Chris Tar, Luis Castro, Jiepu Jiang, Canoee Liu, Felix Halim, Jinoo Baek, Andy Wan, Jeremiah Liu, Yuan Cao, Shengyang Dai, Trilok Acharya, Ruoxi Sun, Fuzhao Xue, Saket Joshi, Morgane Lustman, Yongqin Xian, Rishabh Joshi, Deep Karkhanis, Nora Kassner, Jamie Hall, Xiangzhuo Ding, Gan Song, Gang Li, Chen Zhu, Yana Kulizhskaya, Bin Ni, Alexey Vlaskin, Solomon Demmessie, Lucio
Dery, Salah Zaiem, Yanping Huang, Cindy Fan, Felix Gimeno, Ananth Balashankar, Koji Kojima, Hagai Taitelbaum, Maya Meng, Dero Gharibian, Sahil Singla, Wei Chen, Ambrose Slone, Guanjie Chen, Sujee Rajayogam, Max Schumacher, Suyog Kotecha, Rory Blevins, Qifei Wang, Mor Hazan Taege, Alex Morris, Xin Liu, Fayaz Jamil, Richard Zhang, Pratik Joshi, Ben Ingram, Tyler Liechty, Ahmed Eleryan, Scott Baird, Alex Grills, Gagan Bansal, Shan Han, Kiran Yalasangi, Shawn Xu, Majd Al Merey, Isabel Gao, Felix Weissenberger, Igor Karpov, Robert Riachi, Ankit Anand, Gautam Prasad, Kay Lamerigts, Reid Hayes, Jamie Rogers, Mandy Guo, Ashish Shenoy, Qiong (Q) Hu, Kyle He, Yuchen Liu, Polina Zablotskaia, Sagar Gubbi, Yifan Chang, Jay Pavagadhi, Kristian Kjems, Archita Vadali, Diego Machado, Yeqing Li, Renshen Wang, Dipankar Ghosh, Aahil Mehta, Dana Alon, George Polovets, Alessio Tonioni, Nate Kushman, Joel D'sa, Lin Zhuo, Allen Wu, Rohin Shah, John Youssef, Jiayu Ye, Justin Snyder, Karel Lenc, Senaka Buthpitiya, Matthew Tung, Jichuan Chang, Tao Chen, David Saxton, Jenny Lee, Lydia Lihui Zhang, James Qin, Prabakar Radhakrishnan, Maxwell Chen, Piotr Ambroszczyk, Metin Toksoz-Exley, Yan Zhong, Nitzan Katz, Brendan O'Donoghue, Tamara von Glehn, Adi Gerzi Rosenthal, Aga \'Swietlik, Xiaokai Zhao, Nick Fernando, Jinliang Wei, Jieru Mei, Sergei Vassilvitskii, Diego Cedillo, Pranjal Awasthi, Hui Zheng, Koray Kavukcuoglu, Itay Laish, Joseph
Pagadora, Marc Brockschmidt, Christopher A. Choquette-Choo, Arunkumar Byravan, Yifeng Lu, Xu Chen, Mia Chen, Kenton Lee, Rama Pasumarthi, Sijal Bhatnagar, Aditya Shah, Qiyin Wu, Zhuoyuan Chen, Zack Nado, Bartek Perz, Zixuan Jiang, David Kao, Ganesh Mallya, Nino Vieillard, Lantao Mei, Sertan Girgin, Mandy Jordan, Yeongil Ko, Alekh Agarwal, Yaxin Liu, Yasemin Altun, Raoul de Liedekerke, Anastasios Kementsietsidis, Daiyi Peng, Dangyi Liu, Utku Evci, Peter Humphreys, Austin Tarango, Xiang Deng, Yoad Lewenberg, Kevin Aydin, Chengda Wu, Bhavishya Mittal, Tsendsuren Munkhdalai, Kleopatra Chatziprimou, Rodrigo Benenson, Uri First, Xiao Ma, Jinning Li, Armand Joulin, Hamish Tomlinson, Tingnan Zhang, Milad Nasr, Zhi Hong, Micha\"el Sander, Lisa Anne Hendricks, Anuj Sharma, Andrew Bolt, Eszter V\'ertes, Jiri Simsa, Tomer Levinboim, Olcan Sercinoglu, Divyansh Shukla, Austin Wu, Craig Swanson, Danny Vainstein, Fan Bu, Bo Wang, Ryan Julian, Charles Yoon, Sergei Lebedev, Antonious Girgis, Bernd Bandemer, David Du, Todd Wang, Xi Chen, Ying Xiao, Peggy Lu, Natalie Ha, Vlad Ionescu, Simon Rowe, Josip Matak, Federico Lebron, Andreas Steiner, Lalit Jain, Manaal Faruqui, Nicolas Lacasse, Georgie Evans, Neesha Subramaniam, Dean Reich, Giulia Vezzani, Aditya Pandey, Joe Stanton, Tianhao Zhou, Liam McCafferty, Henry Griffiths, Verena Rieser, Soheil Hassas
Yeganeh, Eleftheria Briakou, Lu Huang, Zichuan Wei, Liangchen Luo, Erik Jue, Gabby Wang, Victor Cotruta, Myriam Khan, Jongbin Park, Qiuchen Guo, Peiran Li, Rong Rong, Diego Antognini, Anastasia Petrushkina, Chetan Tekur, Eli Collins, Parul Bhatia, Chester Kwak, Wenhu Chen, Arvind Neelakantan, Immanuel Odisho, Sheng Peng, Vincent Nallatamby, Vaibhav Tulsyan, Fabian Pedregosa, Peng Xu, Raymond Lin, Yulong Wang, Emma Wang, Sholto Douglas, Reut Tsarfaty, Elena Gribovskaya, Renga Aravamudhan, Manu Agarwal, Mara Finkelstein, Qiao Zhang, Elizabeth Cole, Phil Crone, Sarmishta Velury, Anil Das, Chris Sauer, Luyao Xu, Danfeng Qin, Chenjie Gu, Dror Marcus, CJ Zheng, Wouter Van Gansbeke, Sobhan Miryoosefi, Haitian Sun, YaGuang Li, Charlie Chen, Jae Yoo, Pavel Dubov, Alex Tomala, Adams Yu, Pawe{\l} Weso{\l}owski, Alok Gunjan, Eddie Cao, Jiaming Luo, Nikhil Sethi, Arkadiusz Socala, Laura Graesser, Tomas Kocisky, Arturo BC, Minmin Chen, Edward Lee, Sophie Wang, Weize Kong, Qiantong Xu, Nilesh Tripuraneni, Yiming Li, Xinxin Yu, Allen Porter, Paul Voigtlaender, Biao Zhang, Arpi Vezer, Sarah York, Qing Wei, Geoffrey Cideron, Mark Kurzeja, Seungyeon Kim, Benny Li, Ang\'eline Pouget, Hyo Lee, Kaspar Daugaard, Yang Li, Dave Uthus, Aditya Siddhant, Paul Cavallaro, Sriram Ganapathy, Maulik Shah, Rolf Jagerman, Jeff Stanway, Piermaria
Mendolicchio, Li Xiao, Kayi Lee, Tara Thompson, Shubham Milind Phal, Jason Chase, Sun Jae Lee, Adrian N Reyes, Disha Shrivastava, Zhen Qin, Roykrong Sukkerd, Seth Odoom, Lior Madmoni, John Aslanides, Jonathan Herzig, Elena Pochernina, Sheng Zhang, Parker Barnes, Daisuke Ikeda, Qiujia Li, Shuo-yiin Chang, Shakir Mohamed, Jim Sproch, Richard Powell, Bidisha Samanta, Domagoj \'Cevid, Anton Kovsharov, Shrestha Basu Mallick, Srinivas Tadepalli, Anne Zheng, Kareem Ayoub, Andreas Noever, Christian Reisswig, Zhuo Xu, Junhyuk Oh, Martin Matysiak, Tim Blyth, Shereen Ashraf, Julien Amelot, Boone Severson, Michele Bevilacqua, Motoki Sano, Ethan Dyer, Ofir Roval, Anu Sinha, Yin Zhong, Sagi Perel, Tea Saboli\'c, Johannes Mauerer, Willi Gierke, Mauro Verzetti, Rodrigo Cabrera, Alvin Abdagic, Steven Hemingray, Austin Stone, Jong Lee, Farooq Ahmad, Karthik Raman, Lior Shani, Jonathan Lai, Orhan Firat, Nathan Waters, Eric Ge, Mo Shomrat, Himanshu Gupta, Rajeev Aggarwal, Tom Hudson, Bill Jia, Simon Baumgartner, Palak Jain, Joe Kovac, Junehyuk Jung, Ante \v{Z}u\v{z}ul, Will Truong, Morteza Zadimoghaddam, Songyou Peng, Marco Liang, Rachel Sterneck, Balaji Lakshminarayanan, Machel Reid, Oliver Woodman, Tong Zhou, Jianling Wang, Vincent Coriou, Arjun Narayanan, Jay Hoover, Yenai Ma, Apoorv Jindal, Clayton Sanford, Doug Reid, Swaroop Ramaswamy, Alex Kurakin,
Roland Zimmermann, Yana Lunts, Dragos Dena, Zal\'an Borsos, Vered Cohen, Shujian Zhang, Will Grathwohl, Robert Dadashi, Morgan Redshaw, Joshua Kessinger, Julian Odell, Silvano Bonacina, Zihang Dai, Grace Chen, Ayush Dubey, Pablo Sprechmann, Mantas Pajarskas, Wenxuan Zhou, Niharika Ahuja, Tara Thomas, Martin Nikoltchev, Matija Kecman, Bharath Mankalale, Andrey Ryabtsev, Jennifer She, Christian Walder, Jiaming Shen, Lu Li, Carolina Parada, Sheena Panthaplackel, Okwan Kwon, Matt Lawlor, Utsav Prabhu, Yannick Schroecker, Marc'aurelio Ranzato, Pete Blois, Iurii Kemaev, Ting Yu, Dmitry (Dima) Lepikhin, Hao Xiong, Sahand Sharifzadeh, Oleaser Johnson, Jeremiah Willcock, Rui Yao, Greg Farquhar, Sujoy Basu, Hidetoshi Shimokawa, Nina Anderson, Haiguang Li, Khiem Pham, Yizhong Liang, Sebastian Borgeaud, Alexandre Moufarek, Hideto Kazawa, Blair Kutzman, Marcin Sieniek, Sara Smoot, Ruth Wang, Natalie Axelsson, Nova Fallen, Prasha Sundaram, Yuexiang Zhai, Varun Godbole, Petros Maniatis, Alek Wang, Ilia Shumailov, Santhosh Thangaraj, Remi Crocker, Nikita Gupta, Gang Wu, Phil Chen, Gell\'ert Weisz, Celine Smith, Mojtaba Seyedhosseini, Boya Fang, Xiyang Luo, Roey Yogev, Zeynep Cankara, Andrew Hard, Helen Ran, Rahul Sukthankar, George Necula, Ga\"el Liu, Honglong Cai, Praseem Banzal, Daniel Keysers,
Sanjay Ghemawat, Connie Tao, Emma Dunleavy, Aditi Chaudhary, Wei Li, Maciej Miku{\l}a, Chen-Yu Lee, Tiziana Refice, Krishna Somandepalli, Alexandre Fr\'echette, Dan Bahir, John Karro, Keith Rush, Sarah Perrin, Bill Rosgen, Xiaomeng Yang, Clara Huiyi Hu, Mahmoud Alnahlawi, Justin Mao-Jones, Roopal Garg, Hoang Nguyen, Bat-Orgil Batsaikhan, I\~naki Iturrate, Anselm Levskaya, Avi Singh, Ashyana Kachra, Tony Lu, Denis Petek, Zheng Xu, Mark Graham, Lukas Zilka, Yael Karov, Marija Kostelac, Fangyu Liu, Yaohui Guo, Weiyue Wang, Bernd Bohnet, Emily Pitler, Tony Bruguier, Keisuke Kinoshita, Chrysovalantis Anastasiou, Nilpa Jha, Ting Liu, Jerome Connor, Phil Wallis, Philip Pham, Eric Bailey, Shixin Li, Heng-Tze Cheng, Sally Ma, Haiqiong Li, Akanksha Maurya, Kate Olszewska, Manfred Warmuth, Christy Koh, Dominik Paulus, Siddhartha Reddy Jonnalagadda, Enrique Piqueras, Ali Elqursh, Geoff Brown, Hadar Shemtov, Loren Maggiore, Fei Xia, Ryan Foley, Beka Westberg, George van den Driessche, Livio Baldini Soares, Arjun Kar, Michael Quinn, Siqi Zuo, Jialin Wu, Kyle Kastner, Anna Bortsova, Aijun Bai, Ales Mikhalap, Luowei Zhou, Jennifer Brennan, Vinay Ramasesh, Honglei Zhuang, John Maggs, Johan Schalkwyk, Yuntao Xu, Hui Huang, Andrew Howard,
Sasha Brown, Linting Xue, Gloria Shen, Brian Albert, Neha Jha, Daniel Zheng, Varvara Krayvanova, Spurthi Amba Hombaiah, Olivier Lacombe, Gautam Vasudevan, Dan Graur, Tian Xie, Meet Gandhi, Bangju Wang, Dustin Zelle, Harman Singh, Dahun Kim, S\'ebastien Cevey, Victor Ungureanu, Natasha Noy, Fei Liu, Annie Xie, Fangxiaoyu Feng, Katerina Tsihlas, Daniel Formoso, Neera Vats, Quentin Wellens, Yinan Wang, Niket Kumar Bhumihar, Samrat Ghosh, Matt Hoffman, Tom Lieber, Oran Lang, Kush Bhatia, Tom Paine, Aroonalok Pyne, Ronny Votel, Madeleine Clare Elish, Benoit Schillings, Alex Panagopoulos, Haichuan Yang, Adam Raveret, Zohar Yahav, Shuang Liu, Warren Chen, Dalia El Badawy, Nishant Agrawal, Mohammed Badawi, Mahdi Mirzazadeh, Carla Bromberg, Fan Ye, Chang Liu, Tatiana Sholokhova, George-Cristian Muraru, Gargi Balasubramaniam, Jonathan Malmaud, Alen Carin, Danilo Martins, Irina Jurenka, Pankil Botadra, Dave Lacey, Richa Singh, Mariano Schain, Dan Zheng, Isabelle Guyon, Victor Lavrenko, Seungji Lee, Xiang Zhou, Demis Hassabis, Jeshwanth Challagundla, Derek Cheng, Nikhil Mehta, Matthew Mauger, Michela Paganini, Pushkar Mishra, Kate Lee, Zhang Li, Lexi Baugher, Ondrej Skopek, Max Chang, Amir Zait, Gaurav Menghani, Lizzetth Bellot,
Guangxing Han, Jean-Michel Sarr, Sharat Chikkerur, Himanshu Sahni, Rohan Anil, Arun Narayanan, Chandu Thekkath, Daniele Pighin, Hana Strej\v{c}ek, Marko Velic, Fred Bertsch, Manuel Tragut, Keran Rong, Alicia Parrish, Kai Bailey, Jiho Park, Isabela Albuquerque, Abhishek Bapna, Rajesh Venkataraman, Alec Kosik, Johannes Griesser, Zhiwei Deng, Alek Andreev, Qingyun Dou, Kevin Hui, Fanny Wei, Xiaobin Yu, Lei Shu, Avia Aharon, David Barker, Badih Ghazi, Sebastian Flennerhag, Chris Breaux, Yuchuan Liu, Matthew Bilotti, Josh Woodward, Uri Alon, Stephanie Winkler, Tzu-Kuo Huang, Kostas Andriopoulos, Jo\~ao Gabriel Oliveira, Penporn Koanantakool, Berkin Akin, Michael Wunder, Cicero Nogueira dos Santos, Mohammad Hossein Bateni, Lin Yang, Dan Horgan, Beer Changpinyo, Keyvan Amiri, Min Ma, Dayeong Lee, Lihao Liang, Anirudh Baddepudi, Tejasi Latkar, Raia Hadsell, Jun Xu, Hairong Mu, Michael Han, Aedan Pope, Snchit Grover, Frank Kim, Ankit Bhagatwala, Guan Sun, Yamini Bansal, Amir Globerson, Alireza Nazari, Samira Daruki, Hagen Soltau, Jane Labanowski, Laurent El Shafey, Matt Harvey, Yanif Ahmad, Elan Rosenfeld, William Kong, Etienne Pot, Yi-Xuan Tan, Aurora Wei, Victoria Langston, Marcel Prasetya, Petar Veli\v{c}kovi\'c, Richard Killam, Robin Strudel,
Darren Ni, Zhenhai Zhu, Aaron Archer, Kavya Kopparapu, Lynn Nguyen, Emilio Parisotto, Hussain Masoom, Sravanti Addepalli, Jordan Grimstad, Hexiang Hu, Joss Moore, Avinatan Hassidim, Le Hou, Mukund Raghavachari, Jared Lichtarge, Adam R. Brown, Hilal Dib, Natalia Ponomareva, Justin Fu, Yujing Zhang, Altaf Rahman, Joana Iljazi, Edouard Leurent, Gabriel Dulac-Arnold, Cosmo Du, Chulayuth Asawaroengchai, Larry Jin, Ela Gruzewska, Ziwei Ji, Benigno Uria, Daniel De Freitas, Paul Barham, Lauren Beltrone, V\'ictor Campos, Jun Yan, Neel Kovelamudi, Arthur Nguyen, Elinor Davies, Zhichun Wu, Zoltan Egyed, Kristina Toutanova, Nithya Attaluri, Hongliang Fei, Peter Stys, Siddhartha Brahma, Martin Izzard, Siva Velusamy, Scott Lundberg, Vincent Zhuang, Kevin Sequeira, Adam Santoro, Ehsan Amid, Ophir Aharoni, Shuai Ye, Mukund Sundararajan, Lijun Yu, Yu-Cheng Ling, Stephen Spencer, Hugo Song, Josip Djolonga, Christo Kirov, Sonal Gupta, Alessandro Bissacco, Clemens Meyer, Mukul Bhutani, Andrew Dai, Weiyi Wang, Siqi Liu, Ashwin Sreevatsa, Qijun Tan, Maria Wang, Lucy Kim, Yicheng Wang, Alex Irpan, Yang Xiao, Stanislav Fort, Yifan He, Alex Gurney, Bryan Gale, Yue Ma, Monica Roy, Viorica Patraucean, Taylan Bilal, Golnaz Ghiasi, Anahita
Hosseini, Melvin Johnson, Zhuowan Li, Yi Tay, Benjamin Beyret, Katie Millican, Josef Broder, Mayank Lunayach, Danny Swisher, Eugen Vu\v{s}ak, David Parkinson, MH Tessler, Adi Mayrav Gilady, Richard Song, Allan Dafoe, Yves Raimond, Masa Yamaguchi, Itay Karo, Elizabeth Nielsen, Kevin Kilgour, Mike Dusenberry, Rajiv Mathews, Jiho Choi, Siyuan Qiao, Harsh Mehta, Sahitya Potluri, Chris Knutsen, Jialu Liu, Tat Tan, Kuntal Sengupta, Keerthana Gopalakrishnan, Abodunrinwa Toki, Mencher Chiang, Mike Burrows, Grace Vesom, Zafarali Ahmed, Ilia Labzovsky, Siddharth Vashishtha, Preeti Singh, Ankur Sharma, Ada Ma, Jinyu Xie, Pranav Talluri, Hannah Forbes-Pollard, Aarush Selvan, Joel Wee, Loic Matthey, Tom Funkhouser, Parthasarathy Gopavarapu, Lev Proleev, Cheng Li, Matt Thomas, Kashyap Kolipaka, Zhipeng Jia, Ashwin Kakarla, Srinivas Sunkara, Joan Puigcerver, Suraj Satishkumar Sheth, Emily Graves, Chen Wang, Sadh MNM Khan, Kai Kang, Shyamal Buch, Fred Zhang, Omkar Savant, David Soergel, Kevin Lee, Linda Friso, Xuanyi Dong, Rahul Arya, Shreyas Chandrakaladharan, Connor Schenck, Greg Billock, Tejas Iyer, Anton Bakalov, Leslie Baker, Alex Ruiz, Angad Chandorkar, Trieu Trinh, Matt Miecnikowski, Yanqi Zhou, Yangsibo Huang, Jiazhong Nie, Ali Shah,
Ashish Thapliyal, Sam Haves, Lun Wang, Uri Shaham, Patrick Morris-Suzuki, Soroush Radpour, Leonard Berrada, Thomas Strohmann, Chaochao Yan, Jingwei Shen, Sonam Goenka, Tris Warkentin, Petar Devi\'c, Dan Belov, Albert Webson, Madhavi Yenugula, Puranjay Datta, Jerry Chang, Nimesh Ghelani, Aviral Kumar, Vincent Perot, Jessica Lo, Yang Song, Herman Schmit, Jianmin Chen, Vasilisa Bashlovkina, Xiaoyue Pan, Diana Mincu, Paul Roit, Isabel Edkins, Andy Davis, Yujia Li, Ben Horn, Xinjian Li, Pradeep Kumar S, Eric Doi, Wanzheng Zhu, Sri Gayatri Sundara Padmanabhan, Siddharth Verma, Jasmine Liu, Heng Chen, Mihajlo Velimirovi\'c, Malcolm Reynolds, Priyanka Agrawal, Nick Sukhanov, Abhinit Modi, Siddharth Goyal, John Palowitch, Nima Khajehnouri, Wing Lowe, David Klinghoffer, Sharon Silver, Vinh Tran, Candice Schumann, Francesco Piccinno, Xi Liu, Mario Lu\v{c}i\'c, Xiaochen Yang, Sandeep Kumar, Ajay Kannan, Ragha Kotikalapudi, Mudit Bansal, Fabian Fuchs, Javad Hosseini, Abdelrahman Abdelhamed, Dawn Bloxwich, Tianhe Yu, Ruoxin Sang, Gregory Thornton, Karan Gill, Yuchi Liu, Virat Shejwalkar, Jason Lin, Zhipeng Yan, Kehang Han, Thomas Buschmann, Michael Pliskin, Zhi Xing, Susheel Tatineni, Junlin Zhang, Sissie Hsiao, Gavin Buttimore, Marcus Wu,
Zefei Li, Geza Kovacs, Legg Yeung, Tao Huang, Aaron Cohen, Bethanie Brownfield, Averi Nowak, Mikel Rodriguez, Tianze Shi, Hado van Hasselt, Kevin Cen, Deepanway Ghoshal, Kushal Majmundar, Weiren Yu, Warren (Weilun) Chen, Danila Sinopalnikov, Hao Zhang, Vlado Gali\'c, Di Lu, Zeyu Zheng, Maggie Song, Gary Wang, Gui Citovsky, Swapnil Gawde, Isaac Galatzer-Levy, David Silver, Ivana Balazevic, Dipanjan Das, Kingshuk Majumder, Yale Cong, Praneet Dutta, Dustin Tran, Hui Wan, Junwei Yuan, Daniel Eppens, Alanna Walton, Been Kim, Harry Ragan, James Cobon-Kerr, Lu Liu, Weijun Wang, Bryce Petrini, Jack Rae, Rakesh Shivanna, Yan Xiong, Chace Lee, Pauline Coquinot, Yiming Gu, Lisa Patel, Blake Hechtman, Aviel Boag, Orion Jankowski, Alex Wertheim, Alex Lee, Paul Covington, Hila Noga, Sam Sobell, Shanthal Vasanth, William Bono, Chirag Nagpal, Wei Fan, Xavier Garcia, Kedar Soparkar, Aybuke Turker, Nathan Howard, Sachit Menon, Yuankai Chen, Vikas Verma, Vladimir Pchelin, Harish Rajamani, Valentin Dalibard, Ana Ramalho, Yang Guo, Kartikeya Badola, Seojin Bang, Nathalie Rauschmayr, Julia Proskurnia, Sudeep Dasari, Xinyun Chen, Mikhail Sushkov, Anja Hauth, Pauline Sho, Abhinav Singh, Bilva Chandra, Allie Culp, Max Dylla, Olivier Bachem, James Besley, Heri Zhao, Timothy Lillicrap, Wei Wei, Wael Al Jishi, Ning Niu,
Alban Rrustemi, Rapha\"el Lopez Kaufman, Ryan Poplin, Jewel Zhao, Minh Truong, Shikhar Bharadwaj, Ester Hlavnova, Eli Stickgold, Cordelia Schmid, Georgi Stephanov, Zhaoqi Leng, Frederick Liu, L\'eonard Hussenot, Shenil Dodhia, Juliana Vicente Franco, Lesley Katzen, Abhanshu Sharma, Sarah Cogan, Zuguang Yang, Aniket Ray, Sergi Caelles, Shen Yan, Ravin Kumar, Daniel Gillick, Renee Wong, Joshua Ainslie, Jonathan Hoech, S\'eb Arnold, Dan Abolafia, Anca Dragan, Ben Hora, Grace Hu, Alexey Guseynov, Yang Lu, Chas Leichner, Jinmeng Rao, Abhimanyu Goyal, Nagabhushan Baddi, Daniel Hernandez Diaz, Tim McConnell, Max Bain, Jake Abernethy, Qiqi Yan, Rylan Schaeffer, Paul Vicol, Will Thompson, Montse Gonzalez Arenas, Mathias Bellaiche, Pablo Barrio, Stefan Zinke, Riccardo Patana, Pulkit Mehta, JK Kearns, Avraham Ruderman, Scott Pollom, David D'Ambrosio, Cath Hope, Yang Yu, Andrea Gesmundo, Kuang-Huei Lee, Aviv Rosenberg, Yiqian Zhou, Yaoyiran Li, Drew Garmon, Yonghui Wu, Safeen Huda, Gil Fidel, Martin Baeuml, Jian Li, Phoebe Kirk, Rhys May, Tao Tu, Sara Mc Carthy, Toshiyuki Fukuzawa, Miranda Aperghis, Chih-Kuan Yeh, Toshihiro Yoshino, Bo Li, Austin Myers, Kaisheng Yao, Ben Limonchik, Changwan Ryu, Rohun Saxena, Alex Goldin, Ruizhe Zhao, Rocky Rhodes, Tao Zhu, Divya Tyam, Heidi Howard, Nathan Byrd, Hongxu Ma, Yan Wu, Ryan Mullins,
Qingze Wang (June), Aida Amini (June), Sebastien Baur (June), Yiran Mao (June), Subhashini Venugopalan (June), Will Song (June), Wen Ding (June), Paul Collins (June), Sashank Reddi (June), Megan Shum (June), Andrei Rusu (June), Luisa Zintgraf (June), Kelvin Chan (June), Sheela Goenka (June), Mathieu Blondel (June), Michael Collins (June), Renke Pan (June), Marissa Giustina (June), Nikolai Chinaev (June), Christian Schuler (June), Ce Zheng (June), Jonas Valfridsson (June), Alyssa Loo (June), Alex Yakubovich (June), Jamie Smith (June), Tao Jiang (June), Rich Munoz (June), Gabriel Barcik (June), Rishabh Bansal (June), Mingyao Yang (June), Yilun Du (June), Pablo Duque (June), Mary Phuong (June), Alexandra Belias (June), Kunal Lad (June), Zeyu Liu (June), Tal Schuster (June), Karthik Duddu (June), Jieru Hu (June), Paige Kunkle (June), Matthew Watson (June), Jackson Tolins (June), Josh Smith (June), Denis Teplyashin (June), Garrett Bingham (June), Marvin Ritter (June), Marco Andreetto (June), Divya Pitta (June), Mohak Patel (June), Shashank Viswanadha (June), Trevor Strohman (June), Catalin Ionescu (June), Jincheng Luo (June), Yogesh Kalley (June), Jeremy Wiesner (June), Dan Deutsch (June), Derek Lockhart (June), Peter Choy (June), Rumen Dangovski (June), Chawin Sitawarin (June), Cat Graves (June), Tanya Lando (June), Joost van Amersfoort (June), Ndidi Elue (June), Zhouyuan Huo (June), Pooya Moradi (June), Jean Tarbouriech (June), Henryk Michalewski (June), Wenting Ye (June), Eunyoung Kim (June), Alex Druinsky (June), Florent Altch\'e (June), Xinyi Chen (June), Artur Dwornik (June), Da-Cheng Juan (June), Rivka Moroshko (June), Horia Toma (June), Jarrod Kahn (June), Hai Qian (June), Maximilian Sieb (June), Irene Cai (June), Roman Goldenberg (June), Praneeth Netrapalli (June), Sindhu Raghuram (June), Yuan Gong (June), Lijie Fan (June), Evan Palmer (June), Yossi Matias (June), Valentin Gabeur (June), Shreya Pathak (June), Tom Ouyang (June), Don Metzler (June), Geoff Bacon 
(June), Srinivasan Venkatachary (June), Sridhar Thiagarajan (June), Alex Cullum (June), Eran Ofek (June), Vytenis Sakenas (June), Mohamed Hammad (June), Cesar Magalhaes (June), Mayank Daswani (June), Oscar Chang (June), Ashok Popat (June), Ruichao Li (June), Komal Jalan (June), Yanhan Hou (June), Josh Lipschultz (June), Antoine He (June), Wenhao Jia (June), Pier Giuseppe Sessa (June), Prateek Kolhar (June), William Wong (June), Sumeet Singh (June), Lukas Haas (June), Jay Whang (June), Hanna Klimczak-Pluci\'nska (June), Georges Rotival (June), Grace Chung (June), Yiqing Hua (June), Anfal Siddiqui (June), Nicolas Serrano (June), Dongkai Chen (June), Billy Porter (June), Libin Bai (June), Keshav Shivam (June), Sho Arora (June), Partha Talukdar (June), Tom Cobley (June), Sangnie Bhardwaj (June), Evgeny Gladchenko (June), Simon Green (June), Kelvin Guu (June), Felix Fischer (June), Xiao Wu (June), Eric Wang (June), Achintya Singhal (June), Tatiana Matejovicova (June), James Martens (June), Hongji Li (June), Roma Patel (June), Elizabeth Kemp (June), Jiaqi Pan (June), Lily Wang (June), Blake JianHang Chen (June), Jean-Baptiste Alayrac (June), Navneet Potti (June), Erika Gemzer (June), Eugene Ie (June), Kay McKinney (June), Takaaki Saeki (June), Edward Chou (June), Pascal Lamblin (June), SQ Mah (June), Zach Fisher (June), Martin Chadwick (June), Jon Stritar (June), Obaid Sarvana (June), Andrew Hogue (June), Artem Shtefan (June), Hadi Hashemi (June), Yang Xu (June), Jindong Gu (June), Sharad Vikram (June), Chung-Ching Chang (June), Sabela Ramos (June), Logan Kilpatrick (June), Weijuan Xi (June), Jenny Brennan (June), Yinghao Sun (June), Abhishek Jindal (June), Ionel Gog (June), Dawn Chen (June), Felix Wu (June), Jason Lee (June), Sudhindra Kopalle (June), Srinadh Bhojanapalli (June), Oriol Vinyals (June), Natan Potikha (June), Burcu Karagol Ayan (June), Yuan Yuan (June), Michael Riley (June), Piotr Stanczyk (June), Sergey Kishchenko (June), Bing Wang (June), Dan Garrette 
(June), Antoine Yang (June), Vlad Feinberg (June), CJ Carey (June), Javad Azizi (June), Viral Shah (June), Erica Moreira (June), Chongyang Shi (June), Josh Feldman (June), Elizabeth Salesky (June), Thomas Lampe (June), Aneesh Pappu (June), Duhyeon Kim (June), Jonas Adler (June), Avi Caciularu (June), Brian Walker (June), Yunhan Xu (June), Yochai Blau (June), Dylan Scandinaro (June), Terry Huang (June), Sam El-Husseini (June), Abhishek Sinha (June), Lijie Ren (June), Taylor Tobin (June), Patrik Sundberg (June), Tim Sohn (June), Vikas Yadav (June), Mimi Ly (June), Emily Xue (June), Jing Xiong (June), Afzal Shama Soudagar (June), Sneha Mondal (June), Nikhil Khadke (June), Qingchun Ren (June), Ben Vargas (June), Stan Bileschi (June), Sarah Chakera (June), Cindy Wang (June), Boyu Wang (June), Yoni Halpern (June), Joe Jiang (June), Vikas Sindhwani (June), Petre Petrov (June), Pranavaraj Ponnuramu (June), Sanket Vaibhav Mehta (June), Yu Watanabe (June), Betty Chan (June), Matheus Wisniewski (June), Trang Pham (June), Jingwei Zhang (June), Conglong Li (June), Dario de Cesare (June), Art Khurshudov (June), Alex Vasiloff (June), Melissa Tan (June), Zoe Ashwood (June), Bobak Shahriari (June), Maryam Majzoubi (June), Garrett Tanzer (June), Olga Kozlova (June), Robin Alazard (June), James Lee-Thorp (June), Nguyet Minh Phu (June), Isaac Tian (June), Junwhan Ahn (June), Andy Crawford (June), Lauren Lax (June), Yuan (June), Shangguan (Yonghao), Iftekhar Naim (Yonghao), David Ross (Yonghao), Oleksandr Ferludin (Yonghao), Tongfei Guo (Yonghao), Andrea Banino (Yonghao), Hubert Soyer (Yonghao), Xiaoen Ju (Yonghao), Dominika Rogozi\'nska (Yonghao), Ishaan Malhi (Yonghao), Marcella Valentine (Yonghao), Daniel Balle (Yonghao), Apoorv Kulshreshtha (Yonghao), Maciej Kula (Yonghao), Yiwen Song (Yonghao), Sophia Austin (Yonghao), John Schultz (Yonghao), Roy Hirsch (Yonghao), Arthur Douillard (Yonghao), Apoorv Reddy (Yonghao), Michael Fink (Yonghao), Summer Yue (Yonghao), Khyatti Gupta 
(Yonghao), Adam Zhang (Yonghao), Norman Rink (Yonghao), Daniel McDuff (Yonghao), Lei Meng (Yonghao), Andr\'as Gy\"orgy (Yonghao), Yasaman Razeghi (Yonghao), Ricky Liang (Yonghao), Kazuki Osawa (Yonghao), Aviel Atias (Yonghao), Matan Eyal (Yonghao), Tyrone Hill (Yonghao), Nikolai Grigorev (Yonghao), Zhengdong Wang (Yonghao), Nitish Kulkarni (Yonghao), Rachel Soh (Yonghao), Ivan Lobov (Yonghao), Zachary Charles (Yonghao), Sid Lall (Yonghao), Kazuma Hashimoto (Yonghao), Ido Kessler (Yonghao), Victor Gomes (Yonghao), Zelda Mariet (Yonghao), Danny Driess (Yonghao), Alessandro Agostini (Yonghao), Canfer Akbulut (Yonghao), Jingcao Hu (Yonghao), Marissa Ikonomidis (Yonghao), Emily Caveness (Yonghao), Kartik Audhkhasi (Yonghao), Saurabh Agrawal (Yonghao), Ioana Bica (Yonghao), Evan Senter (Yonghao), Jayaram Mudigonda (Yonghao), Kelly Chen (Yonghao), Jingchen Ye (Yonghao), Xuanhui Wang (Yonghao), James Svensson (Yonghao), Philipp Fr\"anken (Yonghao), Josh Newlan (Yonghao), Li Lao (Yonghao), Eva Schnider (Yonghao), Sami Alabed (Yonghao), Joseph Kready (Yonghao), Jesse Emond (Yonghao), Afief Halumi (Yonghao), Tim Zaman (Yonghao), Chengxi Ye (Yonghao), Naina Raisinghani (Yonghao), Vilobh Meshram (Yonghao), Bo Chang (Yonghao), Ankit Singh Rawat (Yonghao), Axel Stjerngren (Yonghao), Sergey Levi (Yonghao), Rui Wang (Yonghao), Xiangzhu Long (Yonghao), Mitchelle Rasquinha (Yonghao), Steven Hand (Yonghao), Aditi Mavalankar (Yonghao), Lauren Agubuzu (Yonghao), Sudeshna Roy (Yonghao), Junquan Chen (Yonghao), Jarek Wilkiewicz (Yonghao), Hao Zhou (Yonghao), Michal Jastrzebski (Yonghao), Qiong Hu (Yonghao), Agustin Dal Lago (Yonghao), Ramya Sree Boppana (Yonghao), Wei-Jen Ko (Yonghao), Jennifer Prendki (Yonghao), Yao Su (Yonghao), Zhi Li (Yonghao), Eliza Rutherford (Yonghao), Girish Ramchandra Rao (Yonghao), Ramona Comanescu (Yonghao), Adri\`a Puigdom\`enech (Yonghao), Qihang Chen (Yonghao), Dessie Petrova (Yonghao), Christine Chan (Yonghao), Vedrana Milutinovic (Yonghao), Felipe Tiengo 
Ferreira (Yonghao), Chin-Yi Cheng (Yonghao), Ming Zhang (Yonghao), Tapomay Dey (Yonghao), Sherry Yang (Yonghao), Ramesh Sampath (Yonghao), Quoc Le (Yonghao), Howard Zhou (Yonghao), Chu-Cheng Lin (Yonghao), Hoi Lam (Yonghao), Christine Kaeser-Chen (Yonghao), Kai Hui (Yonghao), Dean Hirsch (Yonghao), Tom Eccles (Yonghao), Basil Mustafa (Yonghao), Shruti Rijhwani (Yonghao), Morgane Rivi\`ere (Yonghao), Yuanzhong Xu (Yonghao), Junjie Wang (Yonghao), Xinyang Geng (Yonghao), Xiance Si (Yonghao), Arjun Khare (Yonghao), Cheolmin Kim (Yonghao), Vahab Mirrokni (Yonghao), Kamyu Lee (Yonghao), Khuslen Baatarsukh (Yonghao), Nathaniel Braun (Yonghao), Lisa Wang (Yonghao), Pallavi LV (Yonghao), Richard Tanburn (Yonghao), Yuvein (Yonghao), Zhu (Joyce), Fangda Li (Joyce), Setareh Ariafar (Joyce), Dan Goldberg (Joyce), Ken Burke (Joyce), Daniil Mirylenka (Joyce), Meiqi Guo (Joyce), Olaf Ronneberger (Joyce), Hadas Natalie Vogel (Joyce), Liqun Cheng (Joyce), Nishita Shetty (Joyce), Johnson Jia (Joyce), Thomas Jimma (Joyce), Corey Fry (Joyce), Ted Xiao (Joyce), Martin Sundermeyer (Joyce), Ryan Burnell (Joyce), Yannis Assael (Joyce), Mario Pinto (Joyce), JD Chen (Joyce), Rohit Sathyanarayana (Joyce), Donghyun Cho (Joyce), Jing Lu (Joyce), Rishabh Agarwal (Joyce), Sugato Basu (Joyce), Lucas Gonzalez (Joyce), Dhruv Shah (Joyce), Meng Wei (Joyce), Dre Mahaarachchi (Joyce), Rohan Agrawal (Joyce), Tero Rissa (Joyce), Yani Donchev (Joyce), Ramiro Leal-Cavazos (Joyce), Adrian Hutter (Joyce), Markus Mircea (Joyce), Alon Jacovi (Joyce), Faruk Ahmed (Joyce), Jiageng Zhang (Joyce), Shuguang Hu (Joyce), Bo-Juen Chen (Joyce), Jonni Kanerva (Joyce), Guillaume Desjardins (Joyce), Andrew Lee (Joyce), Nikos Parotsidis (Joyce), Asier Mujika (Joyce), Tobias Weyand (Joyce), Jasper Snoek (Joyce), Jo Chick (Joyce), Kai Chen (Joyce), Paul Chang (Joyce), Ethan Mahintorabi (Joyce), Zi Wang (Joyce), Tolly Powell (Joyce), Orgad Keller (Joyce), Abhirut Gupta (Joyce), Claire Sha (Joyce), Kanav Garg (Joyce), Nicolas 
Heess (Joyce), \'Agoston Weisz (Joyce), Cassidy Hardin (Joyce), Bartek Wydrowski (Joyce), Ben Coleman (Joyce), Karina Zainullina (Joyce), Pankaj Joshi (Joyce), Alessandro Epasto (Joyce), Terry Spitz (Joyce), Binbin Xiong (Joyce), Kai Zhao (Joyce), Arseniy Klimovskiy (Joyce), Ivy Zheng (Joyce), Johan Ferret (Joyce), Itay Yona (Joyce), Waleed Khawaja (Joyce), Jean-Baptiste Lespiau (Joyce), Maxim Krikun (Joyce), Siamak Shakeri (Joyce), Timothee Cour (Joyce), Bonnie Li (Joyce), Igor Krivokon (Joyce), Dan Suh (Joyce), Alex Hofer (Joyce), Jad Al Abdallah (Joyce), Nikita Putikhin (Joyce), Oscar Akerlund (Joyce), Silvio Lattanzi (Joyce), Anurag Kumar (Joyce), Shane Settle (Joyce), Himanshu Srivastava (Joyce), Folawiyo Campbell-Ajala (Joyce), Edouard Rosseel (Joyce), Mihai Dorin Istin (Joyce), Nishanth Dikkala (Joyce), Anand Rao (Joyce), Nick Young (Joyce), Kate Lin (Joyce), Dhruva Bhaswar (Joyce), Yiming Wang (Joyce), Jaume Sanchez Elias (Joyce), Kritika Muralidharan (Joyce), James Keeling (Joyce), Dayou Du (Joyce), Siddharth Gopal (Joyce), Gregory Dibb (Joyce), Charles Blundell (Joyce), Manolis Delakis (Joyce), Jacky Liang (Joyce), Marco Tulio Ribeiro (Joyce), Georgi Karadzhov (Joyce), Guillermo Garrido (Joyce), Ankur Bapna (Joyce), Jiawei Cao (Joyce), Adam Sadovsky (Joyce), Pouya Tafti (Joyce), Arthur Guez (Joyce), Coline Devin (Joyce), Yixian Di (Joyce), Jinwei Xing (Joyce), Chuqiao (Joyce), Xu (Cindy), Hanzhao Lin (Cindy), Chun-Te Chu (Cindy), Sameera Ponda (Cindy), Wesley Helmholz (Cindy), Fan Yang (Cindy), Yue Gao (Cindy), Sara Javanmardi (Cindy), Wael Farhan (Cindy), Alex Ramirez (Cindy), Ricardo Figueira (Cindy), Khe Chai Sim (Cindy), Yuval Bahat (Cindy), Ashwin Vaswani (Cindy), Liangzhe Yuan (Cindy), Gufeng Zhang (Cindy), Leland Rechis (Cindy), Hanjun Dai (Cindy), Tayo Oguntebi (Cindy), Alexandra Cordell (Cindy), Eug\'enie Rives (Cindy), Kaan Tekelioglu (Cindy), Naveen Kumar (Cindy), Bing Zhang (Cindy), Aurick Zhou (Cindy), Nikolay Savinov (Cindy), Andrew Leach 
(Cindy), Alex Tudor (Cindy), Sanjay Ganapathy (Cindy), Yanyan Zheng (Cindy), Mirko Rossini (Cindy), Vera Axelrod (Cindy), Arnaud Autef (Cindy), Yukun Zhu (Cindy), Zheng Zheng (Cindy), Mingda Zhang (Cindy), Baochen Sun (Cindy), Jie Ren (Cindy), Nenad Tomasev (Cindy), Nithish Kannan (Cindy), Amer Sinha (Cindy), Charles Chen (Cindy), Louis O'Bryan (Cindy), Alex Pak (Cindy), Aditya Kusupati (Cindy), Weel Yang (Cindy), Deepak Ramachandran (Cindy), Patrick Griffin (Cindy), Seokhwan Kim (Cindy), Philipp Neubeck (Cindy), Craig Schiff (Cindy), Tammo Spalink (Cindy), Mingyang Ling (Cindy), Arun Nair (Cindy), Ga-Young Joung (Cindy), Linda Deng (Cindy), Avishkar Bhoopchand (Cindy), Lora Aroyo (Cindy), Tom Duerig (Cindy), Jordan Griffith (Cindy), Gabe Barth-Maron (Cindy), Jake Ades (Cindy), Alex Haig (Cindy), Ankur Taly (Cindy), Yunting Song (Cindy), Paul Michel (Cindy), Dave Orr (Cindy), Dean Weesner (Cindy), Corentin Tallec (Cindy), Carrie Grimes Bostock (Cindy), Paul Niemczyk (Cindy), Andy Twigg (Cindy), Mudit Verma (Cindy), Rohith Vallu (Cindy), Henry Wang (Cindy), Marco Gelmi (Cindy), Kiranbir Sodhia (Cindy), Aleksandr Chuklin (Cindy), Omer Goldman (Cindy), Jasmine George (Cindy), Liang Bai (Cindy), Kelvin Zhang (Cindy), Petar Sirkovic (Cindy), Efrat Nehoran (Cindy), Golan Pundak (Cindy), Jiaqi Mu (Cindy), Alice Chen (Cindy), Alex Greve (Cindy), Paulo Zacchello (Cindy), David Amos (Cindy), Heming Ge (Cindy), Eric Noland (Cindy), Colton Bishop (Cindy), Jeffrey Dudek (Cindy), Youhei Namiki (Cindy), Elena Buchatskaya (Cindy), Jing Li (Cindy), Dorsa Sadigh (Cindy), Masha Samsikova (Cindy), Dan Malkin (Cindy), Damien Vincent (Cindy), Robert David (Cindy), Rob Willoughby (Cindy), Phoenix Meadowlark (Cindy), Shawn Gao (Cindy), Yan Li (Cindy), Raj Apte (Cindy), Amit Jhindal (Cindy), Stein Xudong Lin (Cindy), Alex Polozov (Cindy), Zhicheng Wang (Cindy), Tomas Mery (Cindy), Anirudh GP (Cindy), Varun Yerram (Cindy), Sage Stevens (Cindy), Tianqi Liu (Cindy), Noah Fiedel (Cindy), 
Charles Sutton (Cindy), Matthew Johnson (Cindy), Xiaodan Song (Cindy), Kate Baumli (Cindy), Nir Shabat (Cindy), Muqthar Mohammad (Cindy), Hao Liu (Cindy), Marco Selvi (Cindy), Yichao Zhou (Cindy), Mehdi Hafezi Manshadi (Cindy), Chu-ling Ko (Cindy), Anthony Chen (Cindy), Michael Bendersky (Cindy), Jorge Gonzalez Mendez (Cindy), Nisarg Kothari (Cindy), Amir Zandieh (Cindy), Yiling Huang (Cindy), Daniel Andor (Cindy), Ellie Pavlick (Cindy), Idan Brusilovsky (Cindy), Jitendra Harlalka (Cindy), Sally Goldman (Cindy), Andrew Lampinen (Cindy), Guowang Li (Cindy), Asahi Ushio (Cindy), Somit Gupta (Cindy), Lei Zhang (Cindy), Chuyuan Kelly Fu (Cindy), Madhavi Sewak (Cindy), Timo Denk (Cindy), Jed Borovik (Cindy), Brendan Jou (Cindy), Avital Zipori (Cindy), Prateek Jain (Cindy), Junwen Bai (Cindy), Thang Luong (Cindy), Jonathan Tompson (Cindy), Alice Li (Cindy), Li Liu (Cindy), George Powell (Cindy), Jiajun Shen (Cindy), Alex Feng (Cindy), Grishma Chole (Cindy), Da Yu (Cindy), Yinlam Chow (Cindy), Tongxin Yin (Cindy), Eric Malmi (Cindy), Kefan Xiao (Cindy), Yash Pande (Cindy), Shachi Paul (Cindy), Niccol\`o Dal Santo (Cindy), Adil Dostmohamed (Cindy), Sergio Guadarrama (Cindy), Aaron Phillips (Cindy), Thanumalayan Sankaranarayana Pillai (Cindy), Gal Yona (Cindy), Amin Ghafouri (Cindy), Preethi Lahoti (Cindy), Benjamin Lee (Cindy), Dhruv Madeka (Cindy), Eren Sezener (Cindy), Simon Tokumine (Cindy), Adrian Collister (Cindy), Nicola De Cao (Cindy), Richard Shin (Cindy), Uday Kalra (Cindy), Parker Beak (Cindy), Emily Nottage (Cindy), Ryo Nakashima (Cindy), Ivan Jurin (Cindy), Vikash Sehwag (Cindy), Meenu Gaba (Cindy), Junhao Zeng (Cindy), Kevin R. 
McKee (Cindy), Fernando Pereira (Cindy), Tamar Yakar (Cindy), Amayika Panda (Cindy), Arka Dhar (Cindy), Peilin Zhong (Cindy), Daniel Sohn (Cindy), Mark Brand (Cindy), Lars Lowe Sjoesund (Cindy), Viral Carpenter (Cindy), Sharon Lin (Cindy), Shantanu Thakoor (Cindy), Marcus Wainwright (Cindy), Ashwin Chaugule (Cindy), Pranesh Srinivasan (Cindy), Muye Zhu (Cindy), Bernett Orlando (Cindy), Jack Weber (Cindy), Ayzaan Wahid (Cindy), Gilles Baechler (Cindy), Apurv Suman (Cindy), Jovana Mitrovi\'c (Cindy), Gabe Taubman (Cindy), Honglin Yu (Cindy), Helen King (Cindy), Josh Dillon (Cindy), Cathy Yip (Cindy), Dhriti Varma (Cindy), Tomas Izo (Cindy), Levent Bolelli (Cindy), Borja De Balle Pigem (Cindy), Julia Di Trapani (Cindy), Fotis Iliopoulos (Cindy), Adam Paszke (Cindy), Nishant Ranka (Cindy), Joe Zou (Cindy), Francesco Pongetti (Cindy), Jed McGiffin (Cindy), Alex Siegman (Cindy), Rich Galt (Cindy), Ross Hemsley (Cindy), Goran \v{Z}u\v{z}i\'c (Cindy), Victor Carbune (Cindy), Tao Li (Cindy), Myle Ott (Cindy), F\'elix de Chaumont Quitry (Cindy), David Vilar Torres (Cindy), Yuri Chervonyi (Cindy), Tomy Tsai (Cindy), Prem Eruvbetine (Cindy), Samuel Yang (Cindy), Matthew Denton (Cindy), Jake Walker (Cindy), Slavica Anda\v{c}i\'c (Cindy), Idan Heimlich Shtacher (Cindy), Vittal Premachandran (Cindy), Harshal Tushar Lehri (Cindy), Cip Baetu (Cindy), Damion Yates (Cindy), Lampros Lamprou (Cindy), Mariko Iinuma (Cindy), Ioana Mihailescu (Cindy), Ben Albrecht (Cindy), Shachi Dave (Cindy), Susie Sargsyan (Cindy), Bryan Perozzi (Cindy), Lucas Manning (Cindy), Chiyuan Zhang (Cindy), Denis Vnukov (Cindy), Igor Mordatch (Cindy), Raia Hadsell Wolfgang Macherey (Cindy), Ryan Kappedal (Cindy), Jim Stephan (Cindy), Aditya Tripathi (Cindy), Klaus Macherey (Cindy), Jun Qian (Cindy), Abhishek Bhowmick (Cindy), Shekoofeh Azizi (Cindy), R\'emi Leblond (Cindy), Shiva Mohan Reddy Garlapati (Cindy), Timothy Knight (Cindy), Matthew Wiethoff (Cindy), Wei-Chih Hung (Cindy), Anelia Angelova (Cindy), 
Georgios Evangelopoulos (Cindy), Pawel Janus (Cindy), Dimitris Paparas (Cindy), Matthew Rahtz (Cindy), Ken Caluwaerts (Cindy), Vivek Sampathkumar (Cindy), Daniel Jarrett (Cindy), Shadi Noghabi (Cindy), Antoine Miech (Cindy), Chak Yeung (Cindy), Geoff Clark (Cindy), Henry Prior (Cindy), Fei Zheng (Cindy), Jean Pouget-Abadie (Cindy), Indro Bhattacharya (Cindy), Kalpesh Krishna (Cindy), Will Bishop (Cindy), Zhe Yuan (Cindy), Yunxiao Deng (Cindy), Ashutosh Sathe (Cindy), Kacper Krasowiak (Cindy), Ciprian Chelba (Cindy), Cho-Jui Hsieh (Cindy), Kiran Vodrahalli (Cindy), Buhuang Liu (Cindy), Thomas K\"oppe (Cindy), Amr Khalifa (Cindy), Lubo Litchev (Cindy), Pichi Charoenpanit (Cindy), Reed Roberts (Cindy), Sachin Yadav (Cindy), Yasumasa Onoe (Cindy), Desi Ivanov (Cindy), Megha Mohabey (Cindy), Vighnesh Birodkar (Cindy), Nemanja Raki\'cevi\'c (Cindy), Pierre Sermanet (Cindy), Vaibhav Mehta (Cindy), Krishan Subudhi (Cindy), Travis Choma (Cindy), Will Ng (Cindy), Luheng He (Cindy), Kathie Wang (Cindy), Tasos Kementsietsidis (Cindy), Shane Gu (Cindy), Mansi Gupta (Cindy), Andrew Nystrom (Cindy), Mehran Kazemi (Cindy), Timothy Chung (Cindy), Nacho Cano (Cindy), Nikhil Dhawan (Cindy), Yufei Wang (Cindy), Jiawei Xia (Cindy), Trevor Yacovone (Cindy), Eric Jia (Cindy), Mingqing Chen (Cindy), Simeon Ivanov (Cindy), Ashrith Sheshan (Cindy), Sid Dalmia (Cindy), Pawe{\l} Stradomski (Cindy), Pengcheng Yin (Cindy), Salem Haykal (Cindy), Congchao Wang (Cindy), Dennis Duan (Cindy), Neslihan Bulut (Cindy), Greg Kochanski (Cindy), Liam MacDermed (Cindy), Namrata Godbole (Cindy), Shitao Weng (Cindy), Jingjing Chen (Cindy), Rachana Fellinger (Cindy), Ramin Mehran (Cindy), Daniel Suo (Cindy), Hisham Husain (Cindy), Tong He (Cindy), Kaushal Patel (Cindy), Joshua Howland (Cindy), Randall Parker (Cindy), Kelvin Nguyen (Cindy), Sharath Maddineni (Cindy), Chris Rawles (Cindy), Mina Khan (Cindy), Shlomi Cohen-Ganor (Cindy), Amol Mandhane (Cindy), Xinyi Wu (Cindy), Chenkai Kuang (Cindy), Iulia 
Com\c{s}a (Cindy), Ramya Ganeshan (Cindy), Hanie Sedghi (Cindy), Adam Bloniarz (Cindy), Nuo Wang Pierse (Cindy), Anton Briukhov (Cindy), Petr Mitrichev (Cindy), Anita Gergely (Cindy), Serena Zhan (Cindy), Allan Zhou (Cindy), Nikita Saxena (Cindy), Eva Lu (Cindy), Josef Dean (Cindy), Ashish Gupta (Cindy), Nicolas Perez-Nieves (Cindy), Renjie Wu (Cindy), Cory McLean (Cindy), Wei Liang (Cindy), Disha Jindal (Cindy), Anton Tsitsulin (Cindy), Wenhao Yu (Cindy), Kaiz Alarakyia (Cindy), Tom Schaul (Cindy), Piyush Patil (Cindy), Peter Sung (Cindy), Elijah Peake (Cindy), Hongkun Yu (Cindy), Feryal Behbahani (Cindy), JD Co-Reyes (Cindy), Alan Ansell (Cindy), Sean Sun (Cindy), Clara Barbu (Cindy), Jonathan Lee (Cindy), Seb Noury (Cindy), James Allingham (Cindy), Bilal Piot (Cindy), Mohit Sharma (Cindy), Christopher Yew (Cindy), Ivan Korotkov (Cindy), Bibo Xu (Cindy), Demetra Brady (Cindy), Goran Petrovic (Cindy), Shibl Mourad (Cindy), Claire Cui (Cindy), Aditya Gupta (Cindy), Parker Schuh (Cindy), Saarthak Khanna (Cindy), Anna Goldie (Cindy), Abhinav Arora (Cindy), Vadim Zubov (Cindy), Amy Stuart (Cindy), Mark Epstein (Cindy), Yun Zhu (Cindy), Jianqiao Liu (Cindy), Yury Stuken (Cindy), Ziyue Wang (Cindy), Karolis Misiunas (Cindy), Dee Guo (Cindy), Ashleah Gill (Cindy), Ale Hartman (Cindy), Zaid Nabulsi (Cindy), Aurko Roy (Cindy), Aleksandra Faust (Cindy), Jason Riesa (Cindy), Ben Withbroe (Cindy), Mengchao Wang (Cindy), Marco Tagliasacchi (Cindy), Andreea Marzoca (Cindy), James Noraky (Cindy), Serge Toropov (Cindy), Malika Mehrotra (Cindy), Bahram Raad (Cindy), Sanja Deur (Cindy), Steve Xu (Cindy), Marianne Monteiro (Cindy), Zhongru Wu (Cindy), Yi Luan (Cindy), Sam Ritter (Cindy), Nick Li (Cindy), H{\aa}vard Garnes (Cindy), Yanzhang He (Cindy), Martin Zlocha (Cindy), Jifan Zhu (Cindy), Matteo Hessel (Cindy), Will Wu (Cindy), Spandana Raj Babbula (Cindy), Chizu Kawamoto (Cindy), Yuanzhen Li (Cindy), Mehadi Hassen (Cindy), Yan Wang (Cindy), Brian Wieder (Cindy), James Freedman 
(Cindy), Yin Zhang (Cindy), Xinyi Bai (Cindy), Tianli Yu (Cindy), David Reitter (Cindy), XiangHai Sheng (Cindy), Mateo Wirth (Cindy), Aditya Kini (Cindy), Dima Damen (Cindy), Mingcen Gao (Cindy), Rachel Hornung (Cindy), Michael Voznesensky (Cindy), Brian Roark (Cindy), Adhi Kuncoro (Cindy), Yuxiang Zhou (Cindy), Rushin Shah (Cindy), Anthony Brohan (Cindy), Kuangyuan Chen (Cindy), James Wendt (Cindy), David Rim (Cindy), Paul Kishan Rubenstein (Cindy), Jonathan Halcrow (Cindy), Michelle Liu (Cindy), Ty Geri (Cindy), Yunhsuan Sung (Cindy), Jane Shapiro (Cindy), Shaan Bijwadia (Cindy), Chris Duvarney (Cindy), Christina Sorokin (Cindy), Paul Natsev (Cindy), Reeve Ingle (Cindy), Pramod Gupta (Cindy), Young Maeng (Cindy), Ndaba Ndebele (Cindy), Kexin Zhu (Cindy), Valentin Anklin (Cindy), Katherine Lee (Cindy), Yuan Liu (Cindy), Yaroslav Akulov (Cindy), Shaleen Gupta (Cindy), Guolong Su (Cindy), Flavien Prost (Cindy), Tianlin Liu (Cindy), Vitaly Kovalev (Cindy), Pol Moreno (Cindy), Martin Scholz (Cindy), Sam Redmond (Cindy), Zongwei Zhou (Cindy), Alex Castro-Ros (Cindy), Andr\'e Susano Pinto (Cindy), Dia Kharrat (Cindy), Michal Yarom (Cindy), Rachel Saputro (Cindy), Jannis Bulian (Cindy), Ben Caine (Cindy), Ji Liu (Cindy), Abbas Abdolmaleki (Cindy), Shariq Iqbal (Cindy), Tautvydas Misiunas (Cindy), Mikhail Sirotenko (Cindy), Shefali Garg (Cindy), Guy Bensky (Cindy), Huan Gui (Cindy), Xuezhi Wang (Cindy), Raphael Koster (Cindy), Mike Bernico (Cindy), Da Huang (Cindy), Romal Thoppilan (Cindy), Trevor Cohn (Cindy), Ben Golan (Cindy), Wenlei Zhou (Cindy), Andrew Rosenberg (Cindy), Markus Freitag (Cindy), Tynan Gangwani (Cindy), Vincent Tsang (Cindy), Anand Shukla (Cindy), Xiaoqi Ren (Cindy), Minh Giang (Cindy), Chi Zou (Cindy), Andre Elisseeff (Cindy), Charline Le Lan (Cindy), Dheeru Dua (Cindy), Shuba Lall (Cindy), Pranav Shyam (Cindy), Frankie Garcia (Cindy), Sarah Nguyen (Cindy), Michael Guzman (Cindy), AJ Maschinot (Cindy), Marcello Maggioni (Cindy), Ming-Wei Chang 
(Cindy), Karol Gregor (Cindy), Lotte Weerts (Cindy), Kumaran Venkatesan (Cindy), Bogdan Damoc (Cindy), Leon Liu (Cindy), Jan Wassenberg (Cindy), Lewis Ho (Cindy), Becca Roelofs (Cindy), Majid Hadian (Cindy), Fran\c{c}ois-Xavier Aubet (Cindy), Yu Liang (Cindy), Sami Lachgar (Cindy), Danny Karmon (Cindy), Yong Cheng (Cindy), Amelio V\'azquez-Reina (Cindy), Angie Chen (Cindy), Zhuyun Dai (Cindy), Andy Brock (Cindy), Shubham Agrawal (Cindy), Chenxi Pang (Cindy), Peter Garst (Cindy), Mariella Sanchez-Vargas (Cindy), Ivor Rendulic (Cindy), Aditya Ayyar (Cindy), Andrija Ra\v{z}natovi\'c (Cindy), Olivia Ma (Cindy), Roopali Vij (Cindy), Neha Sharma (Cindy), Ashwin Balakrishna (Cindy), Bingyuan Liu (Cindy), Ian Mackinnon (Cindy), Sorin Baltateanu (Cindy), Petra Poklukar (Cindy), Gabriel Ibagon (Cindy), Colin Ji (Cindy), Hongyang Jiao (Cindy), Isaac Noble (Cindy), Wojciech Stokowiec (Cindy), Zhihao Li (Cindy), Jeff Dean (Cindy), David Lindner (Cindy), Mark Omernick (Cindy), Kristen Chiafullo (Cindy), Mason Dimarco (Cindy), Vitor Rodrigues (Cindy), Vittorio Selo (Cindy), Garrett Honke (Cindy), Xintian (Cindy), Wu (Lucas), Wei He (Lucas), Adam Hillier (Lucas), Anhad Mohananey (Lucas), Vihari Piratla (Lucas), Chang Ye (Lucas), Chase Malik (Lucas), Sebastian Riedel (Lucas), Samuel Albanie (Lucas), Zi Yang (Lucas), Kenny Vassigh (Lucas), Maria Bauza (Lucas), Sheng Li (Lucas), Yiqing Tao (Lucas), Nevan Wichers (Lucas), Andrii Maksai (Lucas), Abe Ittycheriah (Lucas), Ross Mcilroy (Lucas), Bryan Seybold (Lucas), Noah Goodman (Lucas), Romina Datta (Lucas), Steven M. 
Hernandez (Lucas), Tian Shi (Lucas), Yony Kochinski (Lucas), Anna Bulanova (Lucas), Ken Franko (Lucas), Mikita Sazanovich (Lucas), Nicholas FitzGerald (Lucas), Praneeth Kacham (Lucas), Shubha Srinivas Raghvendra (Lucas), Vincent Hellendoorn (Lucas), Alexander Grushetsky (Lucas), Julian Salazar (Lucas), Angeliki Lazaridou (Lucas), Jason Chang (Lucas), Jan-Thorsten Peter (Lucas), Sushant Kafle (Lucas), Yann Dauphin (Lucas), Abhishek Rao (Lucas), Filippo Graziano (Lucas), Izhak Shafran (Lucas), Yuguo Liao (Lucas), Tianli Ding (Lucas), Geng Yan (Lucas), Grace Chu (Lucas), Zhao Fu (Lucas), Vincent Roulet (Lucas), Gabriel Rasskin (Lucas), Duncan Williams (Lucas), Shahar Drath (Lucas), Alex Mossin (Lucas), Raphael Hoffmann (Lucas), Jordi Orbay (Lucas), Francesco Bertolini (Lucas), Hila Sheftel (Lucas), Justin Chiu (Lucas), Siyang Xue (Lucas), Yuheng Kuang (Lucas), Ferjad Naeem (Lucas), Swaroop Nath (Lucas), Nana Nti (Lucas), Phil Culliton (Lucas), Kashyap Krishnakumar (Lucas), Michael Isard (Lucas), Pei Sun (Lucas), Ayan Chakrabarti (Lucas), Nathan Clement (Lucas), Regev Cohen (Lucas), Arissa Wongpanich (Lucas), GS Oh (Lucas), Ashwin Murthy (Lucas), Hao Zheng (Lucas), Jessica Hamrick (Lucas), Oskar Bunyan (Lucas), Suhas Ganesh (Lucas), Nitish Gupta (Lucas), Roy Frostig (Lucas), John Wieting (Lucas), Yury Malkov (Lucas), Pierre Marcenac (Lucas), Zhixin (Lucas), Lai, Xiaodan Tang, Mohammad Saleh, Fedir Zubach, Chinmay Kulkarni, Huanjie Zhou, Vicky Zayats, Nan Ding, Anshuman Tripathi, Arijit Pramanik, Patrik Zochbauer, Harish Ganapathy, Vedant Misra, Zach Behrman, Hugo Vallet, Mingyang Zhang, Mukund Sridhar, Ye Jin, Mohammad Babaeizadeh, Siim P\~oder, Megha Goel, Divya Jain, Tajwar Nasir, Shubham Mittal, Tim Dozat, Diego Ardila, Aliaksei Severyn, Fabio Pardo, Sammy Jerome, Siyang Qin, Louis Rouillard, Amir Yazdanbakhsh, Zizhao Zhang, Shivani Agrawal, Kaushik Shivakumar, Caden Lu, Praveen Kallakuri, Rachita Chhaparia, Kanishka Rao, Charles Kwong, Asya Fadeeva, Shitij Nigam, 
Yan Virin, Yuan Zhang, Balaji Venkatraman, Beliz Gunel, Marc Wilson, Huiyu Wang, Abhinav Gupta, Xiaowei Xu, Adrien Ali Ta\"iga, Kareem Mohamed, Doug Fritz, Daniel Rodriguez, Zoubin Ghahramani, Harry Askham, Lior Belenki, James Zhao, Rahul Gupta, Krzysztof Jastrz\k{e}bski, Takahiro Kosakai, Kaan Katircioglu, Jon Schneider, Rina Panigrahy, Konstantinos Bousmalis, Peter Grabowski, Prajit Ramachandran, Chaitra Hegde, Mihaela Rosca, Angelo Scorza Scarpati, Kyriakos Axiotis, Ying Xu, Zach Gleicher, Assaf Hurwitz Michaely, Mandar Sharma, Sanil Jain, Christoph Hirnschall, Tal Marian, Xuhui Jia, Kevin Mather, Kilol Gupta, Linhai Qiu, Nigamaa Nayakanti, Lucian Ionita, Steven Zheng, Lucia Loher, Kurt Shuster, Igor Petrovski, Roshan Sharma, Rahma Chaabouni, Angel Yeh, James An, Arushi Gupta, Steven Schwarcz, Seher Ellis, Sam Conway-Rahman, Javier Snaider, Alex Zhai, James Atwood, Daniel Golovin, Liqian Peng, Te I, Vivian Xia, Salvatore Scellato, Mahan Malihi, Arthur Bra\v{z}inskas, Vlad-Doru Ion, Younghoon Jun, James Swirhun, Soroosh Mariooryad, Jiao Sun, Steve Chien, Rey Coaguila, Ariel Brand, Yi Gao, Tom Kwiatkowski, Roee Aharoni, Cheng-Chun Lee, Mislav \v{Z}ani\'c, Yichi Zhang, Dan Ethier, Vitaly Nikolaev, Pranav Nair, Yoav Ben Shalom, Hen Fitoussi, Jai Gupta, Hongbin Liu, Dee Cattle, Tolga Bolukbasi, Ben Murdoch, Fantine Huot, Yin Li, Chris Hahn

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

cross Q-Detection: A Quantum-Classical Hybrid Poisoning Attack Detection Method

Authors: Haoqi He, Xiaokai Lin, Jiancai Chen, Yan Xiao

Abstract: Data poisoning attacks pose significant threats to machine learning models by introducing malicious data into the training process, thereby degrading model performance or manipulating predictions. Detecting and sifting out poisoned data is an important method to prevent data poisoning attacks. Limited by classical computation frameworks, upcoming larger-scale and more complex datasets may pose difficulties for detection. We introduce the unique speedup of quantum computing for the first time in the task of detecting data poisoning. We present Q-Detection, a quantum-classical hybrid defense method for detecting poisoning attacks. Q-Detection also introduces the Q-WAN, which is optimized using quantum computing devices. Experimental results using multiple quantum simulation libraries show that Q-Detection effectively defends against label manipulation and backdoor attacks. The metrics demonstrate that Q-Detection consistently outperforms the baseline methods and is comparable to the state-of-the-art. Theoretical analysis shows that Q-Detection is expected to achieve more than a 20% speedup using quantum computing power.

cross The Emotional Alignment Design Policy

Authors: Eric Schwitzgebel, Jeff Sebo

Abstract: According to what we call the Emotional Alignment Design Policy, artificial entities should be designed to elicit emotional reactions from users that appropriately reflect the entities' capacities and moral status, or lack thereof. This principle can be violated in two ways: by designing an artificial system that elicits stronger or weaker emotional reactions than its capacities and moral status warrant (overshooting or undershooting), or by designing a system that elicits the wrong type of emotional reaction (hitting the wrong target). Although presumably attractive, practical implementation faces several challenges including: How can we respect user autonomy while promoting appropriate responses? How should we navigate expert and public disagreement and uncertainty about facts and values? What if emotional alignment seems to require creating or destroying entities with moral status? To what extent should designs conform to versus attempt to alter user assumptions and attitudes?

cross X-ray transferable polyrepresentation learning

Authors: Weronika Hryniewska-Guzik, Przemyslaw Biecek

Abstract: The success of machine learning algorithms is inherently related to the extraction of meaningful features, as they play a pivotal role in the performance of these algorithms. Central to this challenge is the quality of data representation. However, the ability to generalize and extract these features effectively from unseen datasets is also crucial. In light of this, we introduce a novel concept: the polyrepresentation. Polyrepresentation integrates multiple representations of the same modality extracted from distinct sources, for example, vector embeddings from the Siamese Network, self-supervised models, and interpretable radiomic features. This approach yields better performance metrics compared to relying on a single representation. Additionally, in the context of X-ray images, we demonstrate the transferability of the created polyrepresentation to a smaller dataset, underscoring its potential as a pragmatic and resource-efficient approach in various image-related solutions. It is worth noting that the concept of polyrepresentation, demonstrated here on medical data, can also be applied to other domains, showcasing its versatility and broad potential impact.

cross SPARC: Concept-Aligned Sparse Autoencoders for Cross-Model and Cross-Modal Interpretability

Authors: Ali Nasiri-Sarvi, Hassan Rivaz, Mahdi S. Hosseini

Abstract: Understanding how different AI models encode the same high-level concepts, such as objects or attributes, remains challenging because each model typically produces its own isolated representation. Existing interpretability methods like Sparse Autoencoders (SAEs) produce latent concepts individually for each model, resulting in incompatible concept spaces and limiting cross-model interpretability. To address this, we introduce SPARC (Sparse Autoencoders for Aligned Representation of Concepts), a new framework that learns a single, unified latent space shared across diverse architectures and modalities (e.g., vision models like DINO, and multimodal models like CLIP). SPARC's alignment is enforced through two key innovations: (1) a Global TopK sparsity mechanism, ensuring all input streams activate identical latent dimensions for a given concept; and (2) a Cross-Reconstruction Loss, which explicitly encourages semantic consistency between models. On Open Images, SPARC dramatically improves concept alignment, achieving a Jaccard similarity of 0.80, more than tripling the alignment compared to previous methods. SPARC creates a shared sparse latent space where individual dimensions often correspond to similar high-level concepts across models and modalities, enabling direct comparison of how different architectures represent identical concepts without requiring manual alignment or model-specific analysis. As a consequence of this aligned representation, SPARC also enables practical applications such as text-guided spatial localization in vision-only models and cross-model/cross-modal retrieval. Code and models are available at https://github.com/AtlasAnalyticsLab/SPARC.

URLs: https://github.com/AtlasAnalyticsLab/SPARC.
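The Global TopK idea and the Jaccard alignment score mentioned in the SPARC abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy activation vectors, and the choice to rank latent dimensions by their summed activation across streams are all assumptions made for illustration.

```python
from typing import List, Set


def topk_indices(values: List[float], k: int) -> Set[int]:
    """Return the indices of the k largest entries."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return set(order[:k])


def global_topk(streams: List[List[float]], k: int) -> List[List[float]]:
    """Keep the same k latent dimensions in every stream.

    Assumption: dimensions are ranked by activation summed over streams.
    """
    dim = len(streams[0])
    summed = [sum(stream[i] for stream in streams) for i in range(dim)]
    keep = topk_indices(summed, k)
    return [[v if i in keep else 0.0 for i, v in enumerate(stream)]
            for stream in streams]


def active_set(values: List[float]) -> Set[int]:
    """Indices of the non-zero latent dimensions."""
    return {i for i, v in enumerate(values) if v != 0.0}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity |a & b| / |a | b| of two index sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical latent activations for one input seen by two models.
vision = [0.9, 0.1, 0.8, 0.0, 0.3]
text = [0.2, 0.7, 0.9, 0.1, 0.6]

# Per-model TopK: each stream picks its own dimensions.
independent = jaccard(topk_indices(vision, 2), topk_indices(text, 2))

# Global TopK: both streams are forced onto the same dimensions.
masked_vision, masked_text = global_topk([vision, text], 2)
shared = jaccard(active_set(masked_vision), active_set(masked_text))

print(independent, shared)
```

In this toy case the per-model selection overlaps on only one of three dimensions, while the shared selection makes the active sets identical, which is the property the abstract attributes to Global TopK.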

cross Machine Learning based Enterprise Financial Audit Framework and High Risk Identification

Authors: Tingyu Yuan, Xi Zhang, Xuanjing Chen

Abstract: In the face of global economic uncertainty, financial auditing has become essential for regulatory compliance and risk mitigation. Traditional manual auditing methods are increasingly limited by large data volumes, complex business structures, and evolving fraud tactics. This study proposes an AI-driven framework for enterprise financial audits and high-risk identification, leveraging machine learning to improve efficiency and accuracy. Using a dataset from the Big Four accounting firms (EY, PwC, Deloitte, KPMG) from 2020 to 2025, the research examines trends in risk assessment, compliance violations, and fraud detection. The dataset includes key indicators such as audit project counts, high-risk cases, fraud instances, compliance breaches, employee workload, and client satisfaction, capturing both audit behaviors and AI's impact on operations. To build a robust risk prediction model, three algorithms - Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbors (KNN) - are evaluated. SVM uses hyperplane optimization for complex classification, RF combines decision trees to manage high-dimensional, nonlinear data with resistance to overfitting, and KNN applies distance-based learning for flexible performance. Through hierarchical K-fold cross-validation and evaluation using F1-score, accuracy, and recall, Random Forest achieves the best performance, with an F1-score of 0.9012, excelling in identifying fraud and compliance anomalies. Feature importance analysis reveals audit frequency, past violations, employee workload, and client ratings as key predictors. The study recommends adopting Random Forest as a core model, enhancing features via engineering, and implementing real-time risk monitoring. This research contributes valuable insights into using machine learning for intelligent auditing and risk management in modern enterprises.
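For readers unfamiliar with the evaluation metrics named in this abstract, the following minimal sketch shows how F1-score, recall, and accuracy are derived from confusion-matrix counts. The counts are hypothetical and are not taken from the study; the function name is likewise an assumption.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute standard binary-classification metrics from raw counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 10 truly high-risk audits out of 100.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
print(metrics)
```

On imbalanced data such as rare fraud cases, accuracy can stay high even when many positives are missed, which is one reason F1 and recall are commonly reported alongside it.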

cross A Collectivist, Economic Perspective on AI

Authors: Michael I. Jordan

Abstract: Information technology is in the midst of a revolution in which omnipresent data collection and machine learning are impacting the human world as never before. The word "intelligence" is being used as a North Star for the development of this technology, with human cognition viewed as a baseline. This view neglects the fact that humans are social animals, and that much of our intelligence is social and cultural in origin. A related issue is that the current view treats the social consequences of technology as an afterthought. The path forward is not merely more data and compute, and not merely more attention paid to cognitive or symbolic representations, but a thorough blending of economic and social concepts with computational and inferential concepts, in the service of system-level designs in which social welfare is a first-class citizen, and with the aspiration that a new human-centric engineering field will emerge.

cross A Probabilistic Approach to Uncertainty Quantification Leveraging 3D Geometry

Authors: Rushil Desai, Frederik Warburg, Trevor Darrell, Marissa Ramirez de Chanlatte

Abstract: Quantifying uncertainty in neural implicit 3D representations, particularly those utilizing Signed Distance Functions (SDFs), remains a substantial challenge due to computational inefficiencies, scalability issues, and geometric inconsistencies. Existing methods typically neglect direct geometric integration, leading to poorly calibrated uncertainty maps. We introduce BayesSDF, a novel probabilistic framework for uncertainty quantification in neural implicit SDF models, motivated by scientific simulation applications in 3D environments, such as modeling fluid flow through forests, where precise surface geometry and awareness of surface geometric uncertainty are essential. Unlike radiance-based models such as NeRF or 3D Gaussian splatting, which lack explicit surface formulations, SDFs define continuous and differentiable geometry, making them better suited for physical modeling and analysis. BayesSDF leverages a Laplace approximation to quantify local surface instability via Hessian-based metrics, enabling computationally efficient, surface-aware uncertainty estimation. Our method shows that uncertainty predictions correspond closely with poorly reconstructed geometry, providing actionable confidence measures for downstream use. Extensive evaluations on synthetic and real-world datasets demonstrate that BayesSDF outperforms existing methods in both calibration and geometric consistency, establishing a strong foundation for uncertainty-aware 3D scene reconstruction, simulation, and robotic decision-making.

cross LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance

Authors: Zhang Li, Biao Yang, Qiang Liu, Shuo Zhang, Zhiyin Ma, Shuo Zhang, Liang Yin, Linger Deng, Yabo Sun, Yuliang Liu, Xiang Bai

Abstract: While large multi-modal models (LMMs) demonstrate promising capabilities in segmentation and comprehension, they still struggle with two limitations: inaccurate segmentation and hallucinated comprehension. These challenges stem primarily from constraints in weak visual comprehension and a lack of fine-grained perception. To alleviate these limitations, we propose LIRA, a framework that capitalizes on the complementary relationship between visual comprehension and segmentation via two key components: (1) Semantic-Enhanced Feature Extractor (SEFE) improves object attribute inference by fusing semantic and pixel-level features, leading to more accurate segmentation; (2) Interleaved Local Visual Coupling (ILVC) autoregressively generates local descriptions after extracting local features based on segmentation masks, offering fine-grained supervision to mitigate hallucinations. Furthermore, we find that the precision of object segmentation is positively correlated with the latent related semantics of the token. To quantify this relationship and the model's potential semantic inferring ability, we introduce the Attributes Evaluation (AttrEval) dataset. Our experiments show that LIRA achieves state-of-the-art performance in both segmentation and comprehension tasks. Code will be available at https://github.com/echo840/LIRA.

URLs: https://github.com/echo840/LIRA.

cross Magneto-radiative modelling and artificial neural network optimization of biofluid flow in a stenosed arterial domain

Authors: S P Shivakumar, Gunisetty Ramasekhar, P Nimmy, Sujesh Areekara, L Thanuja, T V Smitha, S Devanathan, Ganesh R Naik, K V Nagaraja

Abstract: The increasing complexity of cardiovascular diseases and limitations in traditional healing methods mandate the invention of new drug delivery systems that assure targeted, effective, and regulated treatments, contributing directly to UN SDGs 3 and 9, thereby encouraging the utilization of sustainable medical technologies in healthcare. This study investigates the flow of a Casson-Maxwell nanofluid through a stenosed arterial domain. The quantities, such as skin friction and heat transfer rate, are analysed in detail. The Casson-Maxwell fluid shows a lower velocity profile than the Casson fluids, which indicates the improved residence time for efficient drug delivery. The heat transfer rate shows an increase with higher volume fractions of copper and aluminium oxide nanoparticles and a decrease with higher volume fractions of silver nanoparticles. The skin friction coefficient decreases by 219% with a unit increase in the Maxwell parameter, whereas it increases by 66.1% with a unit rise in the Casson parameter. This work supports SDGs 4 and 17 by fostering interdisciplinary learning and collaboration in fluid dynamics and healthcare innovation. Additionally, the rate of heat flow was forecasted (with an overall R-value of 0.99457) using the Levenberg-Marquardt backpropagation training scheme under the influence of magneto-radiative, linear heat source and Casson-Maxwell parameters along with the tri-metallic nanoparticle volume fractions. It is also observed that the drag coefficient is most sensitive to the changes in the Maxwell parameter.

cross Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks

Authors: Huanming Shen, Baizhou Huang, Xiaojun Wan

Abstract: Watermarking is a promising defense against the misuse of large language models (LLMs), yet it remains vulnerable to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: smaller windows resist scrubbing better but are easier to reverse-engineer, enabling low-cost statistics-based spoofing attacks. This work breaks this trade-off by introducing a novel mechanism, equivalent texture keys, where multiple tokens within a watermark window can independently support the detection. Based on this redundancy, we propose a novel watermark scheme with Sub-vocabulary decomposed Equivalent tExture Key (SEEK). It achieves a Pareto improvement, increasing the resilience against scrubbing attacks without compromising robustness to spoofing. Experiments demonstrate SEEK's superiority over prior methods, yielding spoofing robustness gains of +88.2%/+92.3%/+82.0% and scrubbing robustness gains of +10.2%/+6.4%/+24.6% across diverse dataset settings.

cross Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques

Authors: Yassin Hussein Rassul, Aram M. Ahmed, Polla Fattah, Bryar A. Hassan, Arwaa W. Abdulkareem, Tarik A. Rashid, Joan Lu

Abstract: Offline Handwritten Text Recognition (HTR) systems play a crucial role in applications such as historical document digitization, automatic form processing, and biometric authentication. However, their performance is often hindered by the limited availability of annotated training data, particularly for low-resource languages and complex scripts. This paper presents a comprehensive survey of offline handwritten data augmentation and generation techniques designed to improve the accuracy and robustness of HTR systems. We systematically examine traditional augmentation methods alongside recent advances in deep learning, including Generative Adversarial Networks (GANs), diffusion models, and transformer-based approaches. Furthermore, we explore the challenges associated with generating diverse and realistic handwriting samples, particularly in preserving script authenticity and addressing data scarcity. This survey follows the PRISMA methodology, ensuring a structured and rigorous selection process. Our analysis began with 1,302 primary studies, which were filtered down to 848 after removing duplicates, drawing from key academic sources such as IEEE Digital Library, Springer Link, Science Direct, and ACM Digital Library. By evaluating existing datasets, assessment metrics, and state-of-the-art methodologies, this survey identifies key research gaps and proposes future directions to advance the field of handwritten text generation across diverse linguistic and stylistic landscapes.

cross The Prompt War: How AI Decides on a Military Intervention

Authors: Maxim Chupilkin

Abstract: Which factors determine AI propensity for military intervention? While the use of AI in war games and military planning is growing exponentially, a simple analysis of the key drivers embedded in the models has not yet been done. This paper runs a simple conjoint experiment asking a model to decide on military intervention in 640 vignettes, each run 100 times, which allows AI decisions on military intervention to be explored systematically. The analysis finds that the largest predictors of the AI decision to intervene are high domestic support and high probability of success. Costs such as international condemnation, military deaths, civilian deaths, and negative economic effect are statistically significant, but their effect is around half that of domestic support and probability of victory. A closing window of opportunity only reaches statistical significance in interaction with other factors. The results are remarkably consistent across scenarios and across different models (OpenAI GPT, Anthropic Claude, Google Gemini), suggesting a pattern in AI decision-making.

cross A Survey of Multi Agent Reinforcement Learning: Federated Learning and Cooperative and Noncooperative Decentralized Regimes

Authors: Kemboi Cheruiyot, Nickson Kiprotich, Vyacheslav Kungurtsev, Kennedy Mugo, Vivian Mwirigi, Marvin Ngesa

Abstract: The increasing interest in research and innovation towards the development of autonomous agents presents a number of complex yet important scenarios of multiple AI Agents interacting with each other in an environment. The particular setting can be understood as exhibiting three possible topologies of interaction: centrally coordinated cooperation, ad-hoc interaction and cooperation, and settings with noncooperative incentive structures. This article presents a comprehensive survey of all three domains, defined under the formalism of Federated Reinforcement Learning (RL), Decentralized RL, and Noncooperative RL, respectively. Highlighting the structural similarities and distinctions, we review the state of the art in these subjects, primarily explored and developed only recently in the literature. We include the formulations as well as known theoretical guarantees and highlights and limitations of numerical performance.

cross The bitter lesson of misuse detection

Authors: Hadrien Mariaccia, Charbel-Raphaël Segerie, Diego Dorn

Abstract: Prior work on jailbreak detection has established the importance of adversarial robustness for LLMs but has largely focused on the model ability to resist adversarial inputs and to output safe content, rather than the effectiveness of external supervision systems. The only public and independent benchmark of these guardrails to date evaluates a narrow set of supervisors on limited scenarios. Consequently, no comprehensive public benchmark yet verifies how well supervision systems from the market perform under realistic, diverse attacks. To address this, we introduce BELLS, a Benchmark for the Evaluation of LLM Supervision Systems. The framework is two dimensional: harm severity (benign, borderline, harmful) and adversarial sophistication (direct vs. jailbreak) and provides a rich dataset covering 3 jailbreak families and 11 harm categories. Our evaluations reveal drastic limitations of specialized supervision systems. While they recognize some known jailbreak patterns, their semantic understanding and generalization capabilities are very limited, sometimes with detection rates close to zero when asking a harmful question directly or with a new jailbreak technique such as base64 encoding. Simply asking generalist LLMs if the user question is "harmful or not" largely outperforms these supervisors from the market according to our BELLS score. But frontier LLMs still suffer from metacognitive incoherence, often responding to queries they correctly identify as harmful (up to 30 percent for Claude 3.7 and greater than 50 percent for Mistral Large). These results suggest that simple scaffolding could significantly improve misuse detection robustness, but more research is needed to assess the tradeoffs of such techniques. Our results support the "bitter lesson" of misuse detection: general capabilities of LLMs are necessary to detect a diverse array of misuses and jailbreaks.

cross Humans overrely on overconfident language models, across languages

Authors: Neil Rathi, Dan Jurafsky, Kaitlyn Zhou

Abstract: As large language models (LLMs) are deployed globally, it is crucial that their responses are calibrated across languages to accurately convey uncertainty and limitations. Previous work has shown that LLMs are linguistically overconfident in English, leading users to overrely on confident generations. However, the usage and interpretation of epistemic markers (e.g., 'It's definitely,' 'I think') can differ sharply across languages. Here, we study the risks of multilingual linguistic (mis)calibration, overconfidence, and overreliance across five languages to evaluate the safety of LLMs in a global context. We find that overreliance risks are high across all languages. We first analyze the distribution of LLM-generated epistemic markers, and observe that while LLMs are cross-linguistically overconfident, they are also sensitive to documented linguistic variation. For example, models generate the most markers of uncertainty in Japanese and the most markers of certainty in German and Mandarin. We then measure human reliance rates across languages, finding that while users strongly rely on confident LLM generations in all languages, reliance behaviors differ cross-linguistically: for example, users rely significantly more on expressions of uncertainty in Japanese than in English. Taken together, these results indicate high risk of reliance on overconfident model generations across languages. Our findings highlight the challenges of multilingual linguistic calibration and stress the importance of culturally and linguistically contextualized model safety evaluations.

cross Too Human to Model: The Uncanny Valley of LLMs in Social Simulation -- When Generative Language Agents Misalign with Modelling Principles

Authors: Yongchao Zeng, Calum Brown, Mark Rounsevell

Abstract: Large language models (LLMs) have been increasingly used to build agents in social simulation because of their impressive abilities to generate fluent, contextually coherent dialogues. Such abilities can enhance the realism of models. However, the pursuit of realism is not necessarily compatible with the epistemic foundation of modelling. We argue that LLM agents, in many regards, are too human to model: they are too expressive, detailed and intractable to be consistent with the abstraction, simplification, and interpretability typically demanded by modelling. Through a model-building thought experiment that converts the Bass diffusion model to an LLM-based variant, we uncover five core dilemmas: a temporal resolution mismatch between natural conversation and abstract time steps; the need for intervention in conversations while avoiding undermining spontaneous agent outputs; the temptation to introduce rule-like instructions in prompts while maintaining conversational naturalness; the tension between role consistency and role evolution across time; and the challenge of understanding emergence, where system-level patterns become obscured by verbose micro textual outputs. These dilemmas steer the LLM agents towards an uncanny valley: not abstract enough to clarify underlying social mechanisms, while not natural enough to represent realistic human behaviour. This exposes an important paradox: the realism of LLM agents can obscure, rather than clarify, social dynamics when misapplied. We tease out the conditions in which LLM agents are ideally suited: where system-level emergence is not the focus, linguistic nuances and meaning are central, interactions unfold in natural time, and stable role identity is more important than long-term behavioural evolution. We call for repositioning LLM agents in the ecosystem of social simulation for future applications.
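As background for the thought experiment, the classic Bass diffusion model that the abstract takes as its starting point can be written in a few lines, which illustrates the kind of abstraction the authors contrast with verbose LLM agents. The sketch below is a standard discrete-time form of the model; the parameter values are illustrative assumptions and do not come from the paper.

```python
from typing import List


def bass_adoption(p: float, q: float, market_size: float, steps: int) -> List[float]:
    """Simulate cumulative adopters under the discrete-time Bass model.

    p: coefficient of innovation (adoption independent of other adopters)
    q: coefficient of imitation (adoption driven by existing adopters)
    """
    cumulative = [0.0]
    for _ in range(steps):
        adopters = cumulative[-1]
        remaining = market_size - adopters
        # Hazard of adoption grows with the adopted share of the market.
        new_adopters = (p + q * adopters / market_size) * remaining
        cumulative.append(adopters + new_adopters)
    return cumulative


# Illustrative parameters only.
curve = bass_adoption(p=0.03, q=0.38, market_size=1000.0, steps=20)
print([round(x, 1) for x in curve[:5]])
```

Each time step here is an abstract tick with a single aggregate update, whereas an LLM-based variant would have to map the same step onto actual conversations, which is the temporal resolution mismatch the abstract describes.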

cross Bridging AI and Software Security: A Comparative Vulnerability Assessment of LLM Agent Deployment Paradigms

Authors: Tarek Gasmi, Ramzi Guesmi, Ines Belhadj, Jihene Bennaceur

Abstract: Large Language Model (LLM) agents face security vulnerabilities spanning AI-specific and traditional software domains, yet current research addresses these separately. This study bridges this gap through comparative evaluation of Function Calling architecture and Model Context Protocol (MCP) deployment paradigms using a unified threat classification framework. We tested 3,250 attack scenarios across seven language models, evaluating simple, composed, and chained attacks targeting both AI-specific threats (prompt injection) and software vulnerabilities (JSON injection, denial-of-service). Function Calling showed higher overall attack success rates (73.5% vs 62.59% for MCP), with greater system-centric vulnerability while MCP exhibited increased LLM-centric exposure. Attack complexity dramatically amplified effectiveness, with chained attacks achieving 91-96% success rates. Counterintuitively, advanced reasoning models demonstrated higher exploitability despite better threat detection. Results demonstrate that architectural choices fundamentally reshape threat landscapes. This work establishes methodological foundations for cross-domain LLM agent security assessment and provides evidence-based guidance for secure deployment. Code and experimental materials are available at https://github.com/theconsciouslab-ai/llm-agent-security.

cross Sample-Efficient Reinforcement Learning Controller for Deep Brain Stimulation in Parkinson's Disease

Authors: Harsh Ravivarapu, Gaurav Bagwe, Xiaoyong Yuan, Chunxiu Yu, Lan Zhang

Abstract: Deep brain stimulation (DBS) is an established intervention for Parkinson's disease (PD), but conventional open-loop systems lack adaptability, are energy-inefficient due to continuous stimulation, and provide limited personalization to individual neural dynamics. Adaptive DBS (aDBS) offers a closed-loop alternative, using biomarkers such as beta-band oscillations to dynamically modulate stimulation. While reinforcement learning (RL) holds promise for personalized aDBS control, existing methods suffer from high sample complexity, unstable exploration in binary action spaces, and limited deployability on resource-constrained hardware. We propose SEA-DBS, a sample-efficient actor-critic framework that addresses the core challenges of RL-based adaptive neurostimulation. SEA-DBS integrates a predictive reward model to reduce reliance on real-time feedback and employs Gumbel Softmax-based exploration for stable, differentiable policy updates in binary action spaces. Together, these components improve sample efficiency, exploration robustness, and compatibility with resource-constrained neuromodulatory hardware. We evaluate SEA-DBS on a biologically realistic simulation of Parkinsonian basal ganglia activity, demonstrating faster convergence, stronger suppression of pathological beta-band power, and resilience to post-training FP16 quantization. Our results show that SEA-DBS offers a practical and effective RL-based aDBS framework for real-time, resource-constrained neuromodulation.
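The Gumbel-Softmax relaxation named in this abstract can be sketched for a two-action (stimulate or not) policy as follows. This is a generic illustration of the technique, not SEA-DBS itself: the logits, temperature, and function name are assumptions, and a real controller would use an autodiff framework so that gradients flow through the relaxed sample.

```python
import math
import random
from typing import List


def gumbel_softmax(logits: List[float], temperature: float,
                   rng: random.Random) -> List[float]:
    """Draw a relaxed one-hot sample over actions.

    Adds Gumbel(0, 1) noise to each logit, then applies a
    temperature-scaled softmax. Lower temperatures give samples
    closer to one-hot.
    """
    noisy = []
    for logit in logits:
        # Clamp away from 0 so the double logarithm stays finite.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical policy logits for [no stimulation, stimulation].
rng = random.Random(0)
probs = gumbel_softmax([0.0, 1.0], temperature=0.5, rng=rng)
action = probs.index(max(probs))  # hard action used at deployment
print(probs, action)
```

The relaxed vector keeps exploration stochastic while remaining a smooth function of the logits, which is what makes the policy update differentiable in a binary action space.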

cross MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing

Authors: Michael Clemens, Ana Marasović

Abstract: While AI presents significant potential for enhancing music mixing and mastering workflows, current research predominantly emphasizes end-to-end automation or generation, often overlooking the collaborative and instructional dimensions vital for co-creative processes. This gap leaves artists, particularly amateurs seeking to develop expertise, underserved. To bridge this, we introduce MixAssist, a novel audio-language dataset capturing the situated, multi-turn dialogue between expert and amateur music producers during collaborative mixing sessions. Comprising 431 audio-grounded conversational turns derived from 7 in-depth sessions involving 12 producers, MixAssist provides a unique resource for training and evaluating audio-language models that can comprehend and respond to the complexities of real-world music production dialogues. Our evaluations, including automated LLM-as-a-judge assessments and human expert comparisons, demonstrate that fine-tuning models such as Qwen-Audio on MixAssist can yield promising results, with Qwen significantly outperforming other tested models in generating helpful, contextually relevant mixing advice. By focusing on co-creative instruction grounded in audio context, MixAssist enables the development of intelligent AI assistants designed to support and augment the creative process in music mixing.

cross SymFlux: deep symbolic regression of Hamiltonian vector fields

Authors: M. A. Evangelista-Alvarado, P. Suárez-Serrato

Abstract: We present SymFlux, a novel deep learning framework that performs symbolic regression to identify Hamiltonian functions from their corresponding vector fields on the standard symplectic plane. SymFlux models utilize hybrid CNN-LSTM architectures to learn and output the symbolic mathematical expression of the underlying Hamiltonian. Training and validation are conducted on newly developed datasets of Hamiltonian vector fields, a key contribution of this work. Our results demonstrate the model's effectiveness in accurately recovering these symbolic expressions, advancing automated discovery in Hamiltonian mechanics.

cross Secure and Storage-Efficient Deep Learning Models for Edge AI Using Automatic Weight Generation

Authors: Habibur Rahaman, Atri Chatterjee, Swarup Bhunia

Abstract: Complex neural networks require substantial memory to store a large number of synaptic weights. This work introduces WINGs (Automatic Weight Generator for Secure and Storage-Efficient Deep Learning Models), a novel framework that dynamically generates layer weights in a fully connected neural network (FC) and compresses the weights in convolutional neural networks (CNNs) during inference, significantly reducing memory requirements without sacrificing accuracy. The WINGs framework uses principal component analysis (PCA) for dimensionality reduction and lightweight support vector regression (SVR) models to predict layer weights in the FC networks, removing the need for storing full-weight matrices and achieving substantial memory savings. It also preferentially compresses the weights in low-sensitivity layers of CNNs using PCA and SVR with sensitivity analysis. The sensitivity-aware design also offers an added level of security, as any bit-flip attack on weights in compressed layers has an amplified and readily detectable effect on accuracy. WINGs achieves 53x compression for the FC layers and 28x for AlexNet on the MNIST dataset, and 18x for AlexNet on the CIFAR-10 dataset, with 1-2% accuracy loss. This significant reduction in memory results in higher throughput and lower energy for DNN inference, making it attractive for resource-constrained edge applications.

cross KPFlow: An Operator Perspective on Dynamic Collapse Under Gradient Descent Training of Recurrent Networks

Authors: James Hazelden, Laura Driscoll, Eli Shlizerman, Eric Shea-Brown

Abstract: Gradient Descent (GD) and its variants are the primary tool for enabling efficient training of recurrent dynamical systems such as Recurrent Neural Networks (RNNs), Neural ODEs and Gated Recurrent Units (GRUs). The dynamics that are formed in these models exhibit features such as neural collapse and emergence of latent representations that may support the remarkable generalization properties of networks. In neuroscience, qualitative features of these representations are used to compare learning in biological and artificial systems. Despite recent progress, there remains a need for theoretical tools to rigorously understand the mechanisms shaping learned representations, especially in finite, non-linear models. Here, we show that the gradient flow, which describes how the model's dynamics evolve over GD, can be decomposed into a product that involves two operators: a Parameter Operator, K, and a Linearized Flow Propagator, P. K mirrors the Neural Tangent Kernel in feed-forward neural networks, while P appears in Lyapunov stability and optimal control theory. We demonstrate two applications of our decomposition. First, we show how their interplay gives rise to low-dimensional latent dynamics under GD, and, specifically, how the collapse is a result of the network structure, over and above the nature of the underlying task. Second, for multi-task training, we show that the operators can be used to measure how objectives relevant to individual sub-tasks align. We experimentally and theoretically validate these findings, providing an efficient PyTorch package, KPFlow, implementing robust analysis tools for general recurrent architectures. Taken together, our work moves towards building a next stage of understanding of GD learning in non-linear recurrent models.

cross An AI-Driven Thermal-Fluid Testbed for Advanced Small Modular Reactors: Integration of Digital Twin and Large Language Models

Authors: Doyeong Lim, Yang Liu, Zavier Ndum Ndum, Christian Young, Yassin Hassan

Abstract: This paper presents a multipurpose artificial intelligence (AI)-driven thermal-fluid testbed designed to advance Small Modular Reactor technologies by seamlessly integrating physical experimentation with advanced computational intelligence. The platform uniquely combines a versatile three-loop thermal-fluid facility with a high-fidelity digital twin and sophisticated AI frameworks for real-time prediction, control, and operational assistance. Methodologically, the testbed's digital twin, built upon the System Analysis Module code, is coupled with a Gated Recurrent Unit (GRU) neural network. This machine learning model, trained on experimental data, enables faster-than-real-time simulation, providing predictive insights into the system's dynamic behavior. The practical application of this AI integration is showcased through case studies. One is an AI-driven control framework in which the GRU model accurately forecasts future system states and the corresponding control actions required to meet operational demands. Furthermore, an intelligent assistant, powered by a large language model, translates complex sensor data and simulation outputs into natural language, offering operators actionable analysis and safety recommendations. Comprehensive validation against experimental transients confirms the platform's high fidelity, with the GRU model achieving a temperature prediction root mean square error of 1.42 K. This work establishes an integrated research environment at the intersection of AI and thermal-fluid science, showcasing how AI-driven methodologies in modeling, control, and operator support can accelerate the innovation and deployment of next-generation nuclear systems.

cross SImpHAR: Advancing impedance-based human activity recognition using 3D simulation and text-to-motion models

Authors: Lala Shakti Swarup Ray, Mengxi Liu, Deepika Gurung, Bo Zhou, Sungho Suh, Paul Lukowicz

Abstract: Human Activity Recognition (HAR) with wearable sensors is essential for applications in healthcare, fitness, and human-computer interaction. Bio-impedance sensing offers unique advantages for fine-grained motion capture but remains underutilized due to the scarcity of labeled data. We introduce SImpHAR, a novel framework addressing this limitation through two core contributions. First, we propose a simulation pipeline that generates realistic bio-impedance signals from 3D human meshes using shortest-path estimation, soft-body physics, and text-to-motion generation, serving as a digital twin for data augmentation. Second, we design a two-stage training strategy with a decoupled approach that enables broader activity coverage without requiring label-aligned synthetic data. We evaluate SImpHAR on our collected ImpAct dataset and two public benchmarks, showing consistent improvements over state-of-the-art methods, with gains of up to 22.3% and 21.8% in accuracy and macro F1 score, respectively. Our results highlight the promise of simulation-driven augmentation and modular training for impedance-based HAR.

cross Bridging Data Gaps of Rare Conditions in ICU: A Multi-Disease Adaptation Approach for Clinical Prediction

Authors: Mingcheng Zhu, Yu Liu, Zhiyao Luo, Tingting Zhu

Abstract: Artificial Intelligence has revolutionised critical care for common conditions. Yet, rare conditions in the intensive care unit (ICU), including recognised rare diseases and low-prevalence conditions in the ICU, remain underserved due to data scarcity and intra-condition heterogeneity. To bridge such gaps, we developed KnowRare, a domain adaptation-based deep learning framework for predicting clinical outcomes for rare conditions in the ICU. KnowRare mitigates data scarcity by initially learning condition-agnostic representations from diverse electronic health records through self-supervised pre-training. It addresses intra-condition heterogeneity by selectively adapting knowledge from clinically similar conditions with a developed condition knowledge graph. Evaluated on two ICU datasets across five clinical prediction tasks (90-day mortality, 30-day readmission, ICU mortality, remaining length of stay, and phenotyping), KnowRare consistently outperformed existing state-of-the-art models. Additionally, KnowRare demonstrated superior predictive performance compared to established ICU scoring systems, including APACHE IV and IV-a. Case studies further demonstrated KnowRare's flexibility in adapting its parameters to accommodate dataset-specific and task-specific characteristics, its generalisation to common conditions under limited data scenarios, and its rationality in selecting source conditions. These findings highlight KnowRare's potential as a robust and practical solution for supporting clinical decision-making and improving care for rare conditions in the ICU.

cross Deprecating Benchmarks: Criteria and Framework

Authors: Ayrton San Joaquin, Rokas Gipiškis, Leon Staufer, Ariel Gil

Abstract: As frontier artificial intelligence (AI) models rapidly advance, benchmarks are integral to comparing different models and measuring their progress in different task-specific domains. However, there is a lack of guidance on when and how benchmarks should be deprecated once they cease to effectively perform their purpose. This risks benchmark scores over-valuing model capabilities, or worse, obscuring capabilities and safety-washing. Based on a review of benchmarking practices, we propose criteria to decide when to fully or partially deprecate benchmarks, and a framework for deprecating benchmarks. Our work aims to advance the state of benchmarking towards rigorous and quality evaluations, especially for frontier models, and our recommendations are aimed to benefit benchmark developers, benchmark users, AI governance actors (across governments, academia, and industry panels), and policy makers.

cross Assessing the Prevalence of AI-assisted Cheating in Programming Courses: A Pilot Study

Authors: Kaléu Delphino

Abstract: Tools that can generate computer code in response to inputs written in natural language, such as ChatGPT, pose an existential threat to Computer Science education in its current form, since students can now use these tools to solve assignments without much effort. While that risk has already been recognized by scholars, the proportion of the student body that is engaging in this new kind of plagiarism is still an open problem. We conducted a pilot study in a large CS class (n=120) to assess the feasibility of estimating AI plagiarism through anonymous surveys and interviews. More than 25% of the survey respondents admitted to committing AI plagiarism. Conversely, only one student agreed to be interviewed. Given the high levels of misconduct acknowledgment, we conclude that surveys are an effective method for studies on the matter, while interviews should be avoided or designed in a way that can entice participation.

cross Can Interpretation Predict Behavior on Unseen Data?

Authors: Victoria R. Li, Jenny Kaufmann, Martin Wattenberg, David Alvarez-Melis, Naomi Saphra

Abstract: Interpretability research often aims to predict how a model will respond to targeted interventions on specific mechanisms. However, it rarely predicts how a model will respond to unseen input data. This paper explores the promises and challenges of interpretability as a tool for predicting out-of-distribution (OOD) model behavior. Specifically, we investigate the correspondence between attention patterns and OOD generalization in hundreds of Transformer models independently trained on a synthetic classification task. These models exhibit several distinct systematic generalization rules OOD, forming a diverse population for correlational analysis. In this setting, we find that simple observational tools from interpretability can predict OOD performance. In particular, when in-distribution attention exhibits hierarchical patterns, the model is likely to generalize hierarchically on OOD data -- even when the rule's implementation does not rely on these hierarchical patterns, according to ablation tests. Our findings offer a proof-of-concept to motivate further interpretability work on predicting unseen model behavior.

cross FedPhD: Federated Pruning with Hierarchical Learning of Diffusion Models

Authors: Qianyu Long, Qiyuan Wang, Christos Anagnostopoulos, Daning Bi

Abstract: Federated Learning (FL), as a distributed learning paradigm, trains models over distributed clients' data. FL is particularly beneficial for distributed training of Diffusion Models (DMs), which are high-quality image generators that require diverse data. However, challenges such as high communication costs and data heterogeneity persist in training DMs similar to training Transformers and Convolutional Neural Networks. Limited research has addressed these issues in FL environments. To address this gap and challenges, we introduce a novel approach, FedPhD, designed to efficiently train DMs in FL environments. FedPhD leverages Hierarchical FL with homogeneity-aware model aggregation and selection policy to tackle data heterogeneity while reducing communication costs. The distributed structured pruning of FedPhD enhances computational efficiency and reduces model storage requirements in clients. Our experiments across multiple datasets demonstrate that FedPhD achieves high model performance regarding Fréchet Inception Distance (FID) scores while reducing communication costs by up to $88\%$. FedPhD outperforms baseline methods achieving at least a $34\%$ improvement in FID, while utilizing only $56\%$ of the total computation and communication resources.

cross EA: An Event Autoencoder for High-Speed Vision Sensing

Authors: Riadul Islam, Joey Mulé, Dhandeep Challagundla, Shahmir Rizvi, Sean Carson

Abstract: High-speed vision sensing is essential for real-time perception in applications such as robotics, autonomous vehicles, and industrial automation. Traditional frame-based vision systems suffer from motion blur, high latency, and redundant data processing, limiting their performance in dynamic environments. Event cameras, which capture asynchronous brightness changes at the pixel level, offer a promising alternative but pose challenges in object detection due to sparse and noisy event streams. To address this, we propose an event autoencoder architecture that efficiently compresses and reconstructs event data while preserving critical spatial and temporal features. The proposed model employs convolutional encoding and incorporates adaptive threshold selection and a lightweight classifier to enhance recognition accuracy while reducing computational complexity. Experimental results on the existing Smart Event Face Dataset (SEFD) demonstrate that our approach achieves comparable accuracy to the YOLO-v4 model while utilizing up to $35.5\times$ fewer parameters. Implementations on embedded platforms, including Raspberry Pi 4B and NVIDIA Jetson Nano, show high frame rates ranging from 8 FPS up to 44.8 FPS. The proposed classifier exhibits up to $87.84\times$ better FPS than the state-of-the-art and significantly improves event-based vision performance, making it ideal for low-power, high-speed applications in real-time edge computing.

cross SoftSignSGD(S3): An Enhanced Optimizer for Practical DNN Training and Loss Spikes Minimization Beyond Adam

Authors: Hanyang Peng, Shuang Qin, Yue Yu, Fangqing Jiang, Hui Wang, Wen Gao

Abstract: Adam has proven remarkably successful in training deep neural networks, but the mechanisms underlying its empirical successes and limitations remain underexplored. In this study, we demonstrate that the effectiveness of Adam stems largely from its similarity to SignSGD in robustly handling large gradient fluctuations, yet it is also vulnerable to destabilizing loss spikes due to its uncontrolled update scaling. To enhance the advantage of Adam and mitigate its limitation, we propose SoftSignSGD (S3), a novel optimizer with three key innovations. First, S3 generalizes the sign-like update by employing a flexible $p$-th order momentum ($p \geq 1$) in the denominator, departing from the conventional second-order momentum (variance) preconditioning. This design enables enhanced performance while achieving stable training even with aggressive learning rates. Second, S3 minimizes the occurrences of loss spikes through unified exponential moving average coefficients for numerator and denominator momenta, which inherently bound updates to $[-1, 1]$ and simplify hyperparameter tuning. Third, S3 incorporates an equivalent Nesterov's accelerated gradient (NAG) module, accelerating convergence without memory overhead. Theoretically, we prove that S3 achieves the optimal convergence rate of $O(1/T^{1/4})$ for general nonconvex stochastic optimization under weak assumptions. Extensive experiments across a range of vision and language tasks show that S3 not only converges more rapidly and improves performance but also rarely experiences loss spikes, even with a $10\times$ larger learning rate. In fact, S3 delivers performance comparable to or better than AdamW with $2\times$ the training steps, establishing its efficacy in both efficiency and final task performance.
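
The bounded-update property described in the abstract above can be illustrated with a minimal scalar sketch. This is not the authors' implementation: the function names, the zero initialization, the default values of `beta` and `p`, and the small `eps` are assumptions for illustration, and the Nesterov module is omitted. It only shows why sharing one exponential moving average coefficient between the numerator momentum and a $p$-th order denominator momentum keeps the ratio within $[-1, 1]$ when $p \geq 1$.

```python
def s3_like_step(grad, m, v, beta=0.9, p=3.0, eps=1e-12):
    """One scalar step of a sign-like rule with p-th order momentum.

    Hypothetical sketch: m (numerator) and v (denominator) use the SAME
    EMA coefficient beta, which is what bounds the ratio below.
    """
    m = beta * m + (1.0 - beta) * grad             # first-order momentum
    v = beta * v + (1.0 - beta) * abs(grad) ** p   # p-th order momentum
    update = m / (v ** (1.0 / p) + eps)            # |update| <= 1 for p >= 1
    return update, m, v


def run_updates(grads, beta=0.9, p=3.0):
    """Feed a stream of scalar gradients and collect the normalized updates."""
    m, v, updates = 0.0, 0.0, []
    for g in grads:
        u, m, v = s3_like_step(g, m, v, beta=beta, p=p)
        updates.append(u)
    return updates


if __name__ == "__main__":
    # Gradients spanning several orders of magnitude, including a spike.
    print(run_updates([0.01, -5.0, 1000.0, -0.001, 3.0]))
```

The bound follows from the power-mean inequality: with shared weights, the weighted mean of the gradients cannot exceed the weighted $p$-th power mean of their absolute values. With separate coefficients, as in Adam's default settings, no such bound holds in general.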

cross Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models

Authors: Aaron Dharna, Cong Lu, Jeff Clune

Abstract: Multi-agent interactions have long fueled innovation, from natural predator-prey dynamics to the space race. Self-play (SP) algorithms try to harness these dynamics by pitting agents against ever-improving opponents, thereby creating an implicit curriculum toward learning high-quality solutions. However, SP often fails to produce diverse solutions and can get stuck in locally optimal behaviors. We introduce Foundation-Model Self-Play (FMSP), a new direction that leverages the code-generation capabilities and vast knowledge of foundation models (FMs) to overcome these challenges by leaping across local optima in policy space. We propose a family of approaches: (1) Vanilla Foundation-Model Self-Play (vFMSP) continually refines agent policies via competitive self-play; (2) Novelty-Search Self-Play (NSSP) builds a diverse population of strategies, ignoring performance; and (3) the most promising variant, Quality-Diversity Self-Play (QDSP), creates a diverse set of high-quality policies by combining the diversity of NSSP and refinement of vFMSP. We evaluate FMSPs in Car Tag, a continuous-control pursuer-evader setting, and in Gandalf, a simple AI safety simulation in which an attacker tries to jailbreak an LLM's defenses. In Car Tag, FMSPs explore a wide variety of reinforcement learning, tree search, and heuristic-based methods, to name just a few. In terms of discovered policy quality, QDSP and vFMSP surpass strong human-designed strategies. In Gandalf, FMSPs can automatically red-team an LLM, breaking through and jailbreaking six different, progressively stronger levels of defense. Furthermore, FMSPs can automatically proceed to patch the discovered vulnerabilities. Overall, FMSPs represent a promising new research frontier of improving self-play with foundation models, opening fresh paths toward more creative and open-ended strategy discovery.

cross Generative Lagrangian data assimilation for ocean dynamics under extreme sparsity

Authors: Niloofar Asefi, Leonard Lupin-Jimenez, Tianning Wu, Ruoying He, Ashesh Chattopadhyay

Abstract: Reconstructing ocean dynamics from observational data is fundamentally limited by the sparse, irregular, and Lagrangian nature of spatial sampling, particularly in subsurface and remote regions. This sparsity poses significant challenges for forecasting key phenomena such as eddy shedding and rogue waves. Traditional data assimilation methods and deep learning models often struggle to recover mesoscale turbulence under such constraints. We leverage a deep learning framework that combines neural operators with denoising diffusion probabilistic models (DDPMs) to reconstruct high-resolution ocean states from extremely sparse Lagrangian observations. By conditioning the generative model on neural operator outputs, the framework accurately captures small-scale, high-wavenumber dynamics even at $99\%$ sparsity (for synthetic data) and $99.9\%$ sparsity (for real satellite observations). We validate our method on benchmark systems, synthetic float observations, and real satellite data, demonstrating robust performance under severe spatial sampling limitations as compared to other deep learning baselines.

cross Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning

Authors: Ziyang Wang, Jaehong Yoon, Shoubin Yu, Md Mohaiminul Islam, Gedas Bertasius, Mohit Bansal

Abstract: Despite advances in reinforcement learning (RL)-based video reasoning with large language models (LLMs), data collection and finetuning remain significant challenges. These methods often rely on large-scale supervised fine-tuning (SFT) with extensive video data and long Chain-of-Thought (CoT) annotations, making them costly and hard to scale. To address this, we present Video-RTS, a new approach to improve video reasoning capability with drastically improved data efficiency by combining data-efficient RL with a video-adaptive test-time scaling (TTS) strategy. Based on observations about the data scaling of RL samples, we skip the resource-intensive SFT step and employ efficient pure-RL training with output-based rewards, requiring no additional annotations or extensive fine-tuning. Furthermore, to utilize computational resources more efficiently, we introduce a sparse-to-dense video TTS strategy that improves inference by iteratively adding frames based on output consistency. We validate our approach on multiple video reasoning benchmarks, showing that Video-RTS surpasses existing video reasoning models by an average of 2.4% in accuracy using only 3.6% of the training samples. For example, Video-RTS achieves a 4.2% improvement on Video-Holmes, a recent and challenging video reasoning benchmark, and a 2.6% improvement on MMVU. Notably, our pure RL training and adaptive video TTS offer complementary strengths, enabling Video-RTS's strong reasoning performance.
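
The sparse-to-dense idea in the abstract above (add frames only while sampled outputs disagree) can be sketched as a simple control loop. The abstract does not give the actual schedule or consistency criterion, so everything here is an assumption for illustration: the `sample_answers` callable stands in for a video reasoning model, and the frame doubling, the sample count, and the unanimity threshold are hypothetical choices.

```python
from collections import Counter


def sparse_to_dense_answer(sample_answers, start_frames=8, max_frames=64,
                           num_samples=5, agree=1.0):
    """Increase the frame budget until sampled answers are consistent.

    sample_answers(num_frames, num_samples) is a hypothetical callable that
    returns a list of answers produced from num_frames video frames.
    Returns the majority answer and the number of frames that was used.
    """
    frames = start_frames
    while True:
        answers = sample_answers(frames, num_samples)
        top, count = Counter(answers).most_common(1)[0]
        consistent = count / len(answers) >= agree
        if consistent or frames >= max_frames:
            return top, frames
        frames = min(frames * 2, max_frames)  # densify and try again


if __name__ == "__main__":
    # Toy stand-in model: answers only stabilize once 32 frames are seen.
    def toy_model(num_frames, num_samples):
        if num_frames >= 32:
            return ["A"] * num_samples
        return ["A", "B", "A", "C", "B"][:num_samples]

    print(sparse_to_dense_answer(toy_model))
```

Easy inputs stop at the sparse setting, so extra compute is spent only on inputs where the sampled answers conflict.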

cross MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models

Authors: Yiwen Liu, Chenyu Zhang, Junjie Song, Siqi Chen, Sun Yin, Zihan Wang, Lingming Zeng, Yuji Cao, Junming Jiao

Abstract: As a prominent data modality task, time series forecasting plays a pivotal role in diverse applications. With the remarkable advancements in Large Language Models (LLMs), the adoption of LLMs as the foundational architecture for time series modeling has gained significant attention. Although existing models achieve some success, they rarely model both time and frequency characteristics within a pretraining-finetuning paradigm, leading to suboptimal performance when predicting complex time series, which requires modeling both periodicity and prior pattern knowledge of signals. We propose MoFE-Time, an innovative time series forecasting model that integrates time and frequency domain features within a Mixture of Experts (MoE) network. Moreover, we use the pretraining-finetuning paradigm as our training framework to effectively transfer prior pattern knowledge across pretraining and finetuning datasets with different periodicity distributions. Our method introduces both frequency and time cells as experts after attention modules and leverages the MoE routing mechanism to construct multidimensional sparse representations of input signals. In experiments on six public benchmarks, MoFE-Time achieves new state-of-the-art performance, reducing MSE and MAE by 6.95% and 6.02% compared to the representative method Time-MoE. Beyond the existing evaluation benchmarks, we have developed a proprietary dataset, NEV-sales, derived from real-world business scenarios. Our method achieves outstanding results on this dataset, underscoring the effectiveness of the MoFE-Time model in practical commercial applications.

cross Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings

Authors: Russell Taylor, Benjamin Herbert, Michael Sana

Abstract: Translating wordplay across languages presents unique challenges that have long confounded both professional human translators and machine translation systems. This research proposes a novel approach for translating puns from English to French by combining state-of-the-art large language models with specialized techniques for wordplay generation. Our methodology employs a three-stage approach. First, we establish a baseline using multiple frontier large language models with feedback based on a new contrastive learning dataset. Second, we implement a guided chain-of-thought pipeline with combined phonetic-semantic embeddings. Third, we implement a multi-agent generator-discriminator framework for evaluating and regenerating puns with feedback. Moving beyond the limitations of literal translation, our methodology's primary objective is to capture the linguistic creativity and humor of the source text wordplay, rather than simply duplicating its vocabulary. Our best runs earned first and second place in the CLEF JOKER 2025 Task 2 competition where they were evaluated manually by expert native French speakers. This research addresses a gap between translation studies and computational linguistics by implementing linguistically-informed techniques for wordplay translation, advancing our understanding of how language models can be leveraged to handle the complex interplay between semantic ambiguity, phonetic similarity, and the implicit cultural and linguistic awareness needed for successful humor.

cross GR-LLMs: Recent Advances in Generative Recommendation Based on Large Language Models

Authors: Zhen Yang, Haitao Lin, Jiawei Xue, Ziji Zhang

Abstract: In the past year, Generative Recommendations (GRs) have undergone substantial advancements, especially in leveraging the powerful sequence modeling and reasoning capabilities of Large Language Models (LLMs) to enhance overall recommendation performance. LLM-based GRs are forming a new paradigm that is distinctly different from discriminative recommendations, showing strong potential to replace traditional recommendation systems heavily dependent on complex hand-crafted features. In this paper, we provide a comprehensive survey aimed at facilitating further research of LLM-based GRs. Initially, we outline the general preliminaries and application cases of LLM-based GRs. Subsequently, we introduce the main considerations when LLM-based GRs are applied in real industrial scenarios. Finally, we explore promising directions for LLM-based GRs. We hope that this survey contributes to the ongoing advancement of the GR domain.

cross Towards LLM-based Root Cause Analysis of Hardware Design Failures

Authors: Siyu Qiu, Muzhi Wang, Raheel Afsharmazayejani, Mohammad Moradi Shahmiri, Benjamin Tan, Hammond Pearce

Abstract: With advances in large language models (LLMs), new opportunities have emerged to develop tools that support the digital hardware design process. In this work, we explore how LLMs can assist with explaining the root cause of design issues and bugs that are revealed during synthesis and simulation, a necessary milestone on the pathway towards widespread use of LLMs in the hardware design process and for hardware security analysis. We find promising results: for our corpus of 34 different buggy scenarios, OpenAI's o3-mini reasoning model reached a correct determination 100% of the time under pass@5 scoring, with other state-of-the-art models and configurations usually achieving more than 80% performance and more than 90% when assisted with retrieval-augmented generation.
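
For readers unfamiliar with pass@5 scoring as mentioned in the abstract above: a scenario counts as solved if at least one of five sampled responses is correct. The abstract does not state how the authors computed the score, so the sketch below is an assumption. It uses the widely used unbiased pass@k estimator from code-generation evaluation, where `n` responses are sampled per scenario and `c` of them are correct; the function names are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples is correct.

    n: responses sampled for one scenario, c: how many were correct.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct response
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over scenarios given (n, c) pairs, one per scenario."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Three hypothetical scenarios with five samples each.
    print(mean_pass_at_k([(5, 1), (5, 3), (5, 0)], k=5))
```

When exactly five responses are sampled (`n == k`), the estimator reduces to a 0/1 indicator of whether any response was correct.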

cross Failure Forecasting Boosts Robustness of Sim2Real Rhythmic Insertion Policies

Authors: Yuhan Liu, Xinyu Zhang, Haonan Chang, Abdeslam Boularias

Abstract: This paper addresses the challenges of Rhythmic Insertion Tasks (RIT), where a robot must repeatedly perform high-precision insertions, such as screwing a nut into a bolt with a wrench. The inherent difficulty of RIT lies in achieving millimeter-level accuracy and maintaining consistent performance over multiple repetitions, particularly when factors like nut rotation and friction introduce additional complexity. We propose a sim-to-real framework that integrates a reinforcement learning-based insertion policy with a failure forecasting module. By representing the wrench's pose in the nut's coordinate frame rather than the robot's frame, our approach significantly enhances sim-to-real transferability. The insertion policy, trained in simulation, leverages real-time 6D pose tracking to execute precise alignment, insertion, and rotation maneuvers. Simultaneously, a neural network predicts potential execution failures, triggering a simple recovery mechanism that lifts the wrench and retries the insertion. Extensive experiments in both simulated and real-world environments demonstrate that our method not only achieves a high one-time success rate but also robustly maintains performance over long-horizon repetitive tasks.

cross Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration

Authors: Xinyuan Song, Zeyu Wang, Siyi Wu, Tianyu Shi, Lynn Ai

Abstract: We present Gradientsys, a next-generation multi-agent scheduling framework that coordinates diverse specialized AI agents using a typed Model-Context Protocol (MCP) and a ReAct-based dynamic planning loop. At its core, Gradientsys employs an LLM-powered scheduler for intelligent one-to-many task dispatch, enabling parallel execution of heterogeneous agents such as PDF parsers, web search modules, GUI controllers, and web builders. The framework supports hybrid synchronous/asynchronous execution, respects agent capacity constraints, and incorporates a robust retry-and-replan mechanism to handle failures gracefully. To promote transparency and trust, Gradientsys includes an observability layer streaming real-time agent activity and intermediate reasoning via Server-Sent Events (SSE). We offer an architectural overview and evaluate Gradientsys against existing frameworks in terms of extensibility, scheduling topology, tool reusability, parallelism, and observability. Experiments on the GAIA general-assistant benchmark show that Gradientsys achieves higher task success rates with reduced latency and lower API costs compared to a MinionS-style baseline, demonstrating the strength of its LLM-driven multi-agent orchestration.

cross InvestAlign: Overcoming Data Scarcity in Aligning Large Language Models with Investor Decision-Making Processes under Herd Behavior

Authors: Huisheng Wang, Zhuoshi Pan, Hangjing Zhang, Mingxiao Liu, Hanqing Gao, H. Vicky Zhao

Abstract: Aligning Large Language Models (LLMs) with investor decision-making processes under herd behavior is a critical challenge in behavioral finance, which grapples with a fundamental limitation: the scarcity of real-user data needed for Supervised Fine-Tuning (SFT). While SFT can bridge the gap between LLM outputs and human behavioral patterns, its reliance on massive authentic data imposes substantial collection costs and privacy risks. We propose InvestAlign, a novel framework that constructs high-quality SFT datasets by leveraging theoretical solutions to similar and simple optimal investment problems rather than complex scenarios. Our theoretical analysis demonstrates that training LLMs with InvestAlign-generated data achieves faster parameter convergence than using real-user data, suggesting superior learning efficiency. Furthermore, we develop InvestAgent, an LLM agent fine-tuned with InvestAlign, which demonstrates significantly closer alignment to real-user data than pre-SFT models in both simple and complex investment problems. This highlights our proposed InvestAlign as a promising approach with the potential to address complex optimal investment problems and align LLMs with investor decision-making processes under herd behavior. Our code is publicly available at https://github.com/thu-social-network-research-group/InvestAlign.

cross Graph-based Fake Account Detection: A Survey

Authors: Ali Safarpoor Dehkordi, Ahad N. Zehmakan

Abstract: In recent years, there has been a growing effort to develop effective and efficient algorithms for fake account detection in online social networks. This survey comprehensively reviews existing methods, with a focus on graph-based techniques that utilise topological features of social graphs (in addition to account information, such as their shared contents and profile data) to distinguish between fake and real accounts. We provide several categorisations of these methods (for example, based on techniques used, input data, and detection time), discuss their strengths and limitations, and explain how these methods connect in the broader context. We also investigate the available datasets, including both real-world data and synthesised models. We conclude the paper by proposing several potential avenues for future research.

cross The Primacy of Magnitude in Low-Rank Adaptation

Authors: Zicheng Zhang, Haoran Li, Yifeng Zhang, Guoqiang Gong, Jiaxing Wang, Pengzhang Liu, Qixia Jiang, Junxing Hu

Abstract: Low-Rank Adaptation (LoRA) offers a parameter-efficient paradigm for tuning large models. While recent spectral initialization methods improve convergence and performance over the naive "Noise & Zeros" scheme, their extra computational and storage overhead undermines efficiency. In this paper, we establish update magnitude as the fundamental driver of LoRA performance and propose LoRAM, a magnitude-driven "Basis & Basis" initialization scheme that matches spectral methods without their inefficiencies. Our key contributions are threefold: (i) Magnitude of weight updates determines convergence. We prove that low-rank structures intrinsically bound update magnitudes, unifying hyperparameter tuning in learning rate, scaling factor, and initialization as mechanisms to optimize magnitude regulation. (ii) Spectral initialization succeeds via magnitude amplification. We show that the presumed knowledge-driven benefit of the spectral component essentially arises from the boost in the weight update magnitude. (iii) A novel and compact initialization strategy, LoRAM, scales deterministic orthogonal bases using pretrained weight magnitudes to simulate spectral gains. Extensive experiments show that LoRAM serves as a strong baseline, retaining the full efficiency of LoRA while matching or outperforming spectral initialization across benchmarks.

cross SkyVLN: Vision-and-Language Navigation and NMPC Control for UAVs in Urban Environments

Authors: Tianshun Li, Tianyi Huai, Zhen Li, Yichun Gao, Haoang Li, Xinhu Zheng

Abstract: Unmanned Aerial Vehicles (UAVs) have emerged as versatile tools across various sectors, driven by their mobility and adaptability. This paper introduces SkyVLN, a novel framework integrating vision-and-language navigation (VLN) with Nonlinear Model Predictive Control (NMPC) to enhance UAV autonomy in complex urban environments. Unlike traditional navigation methods, SkyVLN leverages Large Language Models (LLMs) to interpret natural language instructions and visual observations, enabling UAVs to navigate through dynamic 3D spaces with improved accuracy and robustness. We present a multimodal navigation agent equipped with a fine-grained spatial verbalizer and a history path memory mechanism. These components allow the UAV to disambiguate spatial contexts, handle ambiguous instructions, and backtrack when necessary. The framework also incorporates an NMPC module for dynamic obstacle avoidance, ensuring precise trajectory tracking and collision prevention. To validate our approach, we developed a high-fidelity 3D urban simulation environment using AirSim, featuring realistic imagery and dynamic urban elements. Extensive experiments demonstrate that SkyVLN significantly improves navigation success rates and efficiency, particularly in new and unseen environments.

cross From Data-Centric to Sample-Centric: Enhancing LLM Reasoning via Progressive Optimization

Authors: Xinjie Chen, Minpeng Liao, Guoxin Chen, Chengxi Li, Biao Fu, Kai Fan, Xinggao Liu

Abstract: Reinforcement learning with verifiable rewards (RLVR) has recently advanced the reasoning capabilities of large language models (LLMs). While prior work has emphasized algorithmic design, data curation, and reward shaping, we investigate RLVR from a sample-centric perspective and introduce LPPO (Learning-Progress and Prefix-guided Optimization), a framework of progressive optimization techniques. Our work addresses a critical question: how to best leverage a small set of trusted, high-quality demonstrations, rather than simply scaling up data volume. First, motivated by how hints aid human problem-solving, we propose prefix-guided sampling, an online data augmentation method that incorporates partial solution prefixes from expert demonstrations to guide the policy, particularly for challenging instances. Second, inspired by how humans focus on important questions aligned with their current capabilities, we introduce learning-progress weighting, a dynamic strategy that adjusts each training sample's influence based on model progression. We estimate sample-level learning progress via an exponential moving average of per-sample pass rates, promoting samples that foster learning and de-emphasizing stagnant ones. Experiments on mathematical-reasoning benchmarks demonstrate that our methods outperform strong baselines, yielding faster convergence and a higher performance ceiling.
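
The learning-progress weighting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SampleTracker`, the decay value, the definition of progress as "current pass rate minus the running average", and the weight rule in `sample_weight` are all assumptions made for illustration. Only the use of an exponential moving average of per-sample pass rates comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class SampleTracker:
    """Tracks an exponential moving average (EMA) of one sample's pass rate."""
    decay: float = 0.9   # assumed smoothing factor, not taken from the paper
    ema: float = 0.0     # running EMA of the pass rate

    def update(self, pass_rate: float) -> float:
        """Fold in a new pass rate and return a progress signal.

        Progress is defined here (hypothetically) as the gap between the
        new pass rate and the EMA before the update.
        """
        progress = pass_rate - self.ema
        self.ema = self.decay * self.ema + (1.0 - self.decay) * pass_rate
        return progress


def sample_weight(progress: float, floor: float = 0.1) -> float:
    """Hypothetical weight rule: improving samples get more weight,
    stagnant or regressing ones are de-emphasized but never dropped."""
    return max(floor, 1.0 + progress)


tracker = SampleTracker(decay=0.5)
first = tracker.update(1.0)    # sample suddenly solved: large progress
second = tracker.update(1.0)   # still solved: progress shrinks as the EMA catches up
```

With this rule, a sample whose pass rate stays flat sees its progress signal decay toward zero, so its weight drifts back to the neutral value of 1.0.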

cross Learning controllable dynamics through informative exploration

Authors: Peter N. Loxley, Friedrich T. Sommer

Abstract: Environments with controllable dynamics are usually understood in terms of explicit models. However, such models are not always available, but may sometimes be learned by exploring an environment. In this work, we investigate using an information measure called "predicted information gain" to determine the most informative regions of an environment to explore next. Applying methods from reinforcement learning allows good suboptimal exploring policies to be found, and leads to reliable estimates of the underlying controllable dynamics. This approach is demonstrated by comparing with several myopic exploration approaches.

cross Denoising Multi-Beta VAE: Representation Learning for Disentanglement and Generation

Authors: Anshuk Uppal, Yuhta Takida, Chieh-Hsin Lai, Yuki Mitsufuji

Abstract: Disentangled and interpretable latent representations in generative models typically come at the cost of generation quality. The $\beta$-VAE framework introduces a hyperparameter $\beta$ to balance disentanglement and reconstruction quality, where setting $\beta > 1$ introduces an information bottleneck that favors disentanglement over sharp, accurate reconstructions. To address this trade-off, we propose a novel generative modeling framework that leverages a range of $\beta$ values to learn multiple corresponding latent representations. First, we obtain a slew of representations by training a single variational autoencoder (VAE), with a new loss function that controls the information retained in each latent representation such that higher $\beta$ values prioritize disentanglement over reconstruction fidelity. We then introduce a non-linear diffusion model that smoothly transitions latent representations corresponding to different $\beta$ values. This model denoises towards less disentangled and more informative representations, ultimately leading to (almost) lossless representations, enabling sharp reconstructions. Furthermore, our model supports sample generation without input images, functioning as a standalone generative model. We evaluate our framework in terms of both disentanglement and generation quality. Additionally, we observe smooth transitions in the latent spaces with respect to changes in $\beta$, facilitating consistent manipulation of generated outputs.

cross Efficient Multi-Task Reinforcement Learning with Cross-Task Policy Guidance

Authors: Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

Abstract: Multi-task reinforcement learning endeavors to efficiently leverage shared information across various tasks, facilitating the simultaneous learning of multiple tasks. Existing approaches primarily focus on parameter sharing with carefully designed network structures or tailored optimization procedures. However, they overlook a direct and complementary way to exploit cross-task similarities: the control policies of tasks already proficient in some skills can provide explicit guidance for unmastered tasks to accelerate skills acquisition. To this end, we present a novel framework called Cross-Task Policy Guidance (CTPG), which trains a guide policy for each task to select the behavior policy interacting with the environment from all tasks' control policies, generating better training trajectories. In addition, we propose two gating mechanisms to improve the learning efficiency of CTPG: one gate filters out control policies that are not beneficial for guidance, while the other gate blocks tasks that do not necessitate guidance. CTPG is a general framework adaptable to existing parameter sharing approaches. Empirical evaluations demonstrate that incorporating CTPG with these approaches significantly enhances performance in manipulation and locomotion benchmarks.

cross Expediting data extraction using a large language model (LLM) and scoping review protocol: a methodological study within a complex scoping review

Authors: James Stewart-Evans, Emma Wilson, Tessa Langley, Andrew Prayle, Angela Hands, Karen Exley, Jo Leonardi-Bee

Abstract: The data extraction stages of reviews are resource-intensive, and researchers may seek to expedite data extraction using online large language models (LLMs) and review protocols. Claude 3.5 Sonnet was used to trial two approaches that used a review protocol to prompt data extraction from 10 evidence sources included in a case study scoping review. A protocol-based approach was also used to review extracted data. A limited performance evaluation found high accuracy for the two extraction approaches (83.3% and 100%) when extracting simple, well-defined citation details; accuracy was lower (9.6% and 15.8%) when extracting more complex, subjective data items. Considering all data items, both approaches had precision >90% but low recall (<25%) and F1 scores (<40%). The context of a complex scoping review, open response types and methodological approach likely impacted performance due to missed and misattributed data. LLM feedback considered the baseline extraction accurate and suggested minor amendments: four of 15 (26.7%) to citation details and 8 of 38 (21.1%) to key findings data items were considered to potentially add value. However, when repeating the process with a dataset featuring deliberate errors, only 2 of 39 (5%) errors were detected. Review-protocol-based methods used for expediency require more robust performance evaluation across a range of LLMs and review contexts with comparison to conventional prompt engineering approaches. We recommend researchers evaluate and report LLM performance if using them similarly to conduct data extraction or review extracted data. LLM feedback contributed to protocol adaptation and may assist future review protocol drafting.
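
As a short worked illustration of how the reported bounds relate, the F1 score is the harmonic mean of precision and recall, so low recall caps F1 even when precision is high. The inputs below are the boundary values quoted in the abstract (90% precision, 25% recall), used purely as illustrative numbers; they are not the study's measured results, and the function name `f1_score` is our own.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative boundary values only: precision 0.90, recall 0.25.
example_f1 = f1_score(0.90, 0.25)
print(round(example_f1, 3))  # 0.391
```

An F1 of roughly 0.39 at these boundary values is consistent with the abstract's statement that F1 scores stayed below 40%.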

cross Q-STAC: Q-Guided Stein Variational Model Predictive Actor-Critic

Authors: Shizhe Cai, Jayadeep Jacob, Zeya Yin, Fabio Ramos

Abstract: Deep reinforcement learning has shown remarkable success in continuous control tasks, yet often requires extensive training data, struggles with complex, long-horizon planning, and fails to maintain safety constraints during operation. Meanwhile, Model Predictive Control (MPC) offers explainability and constraint satisfaction, but typically yields only locally optimal solutions and demands careful cost function design. This paper introduces the Q-guided STein variational model predictive Actor-Critic (Q-STAC), a novel framework that bridges these approaches by integrating Bayesian MPC with actor-critic reinforcement learning through constrained Stein Variational Gradient Descent (SVGD). Our method optimizes control sequences directly using learned Q-values as objectives, eliminating the need for explicit cost function design while leveraging known system dynamics to enhance sample efficiency and ensure control signals remain within safe boundaries. Extensive experiments on 2D navigation and robotic manipulation tasks demonstrate that Q-STAC achieves superior sample efficiency, robustness, and optimality compared to state-of-the-art algorithms, while maintaining the high expressiveness of policy distributions. Experiment videos are available on our website: https://sites.google.com/view/q-stac

URLs: https://sites.google.com/view/q-stac

cross Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning

Authors: Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

Abstract: Offline multi-task reinforcement learning aims to learn a unified policy capable of solving multiple tasks using only pre-collected task-mixed datasets, without requiring any online interaction with the environment. However, it faces significant challenges in effectively sharing knowledge across tasks. Inspired by the efficient knowledge abstraction observed in human learning, we propose Goal-Oriented Skill Abstraction (GO-Skill), a novel approach designed to extract and utilize reusable skills to enhance knowledge transfer and task performance. Our approach uncovers reusable skills through a goal-oriented skill extraction process and leverages vector quantization to construct a discrete skill library. To mitigate class imbalances between broadly applicable and task-specific skills, we introduce a skill enhancement phase to refine the extracted skills. Furthermore, we integrate these skills using hierarchical policy learning, enabling the construction of a high-level policy that dynamically orchestrates discrete skills to accomplish specific tasks. Extensive experiments on diverse robotic manipulation tasks within the MetaWorld benchmark demonstrate the effectiveness and versatility of GO-Skill.

cross EXAONE Path 2.0: Pathology Foundation Model with End-to-End Supervision

Authors: Myungjang Pyeon, Janghyeon Lee, Minsoo Lee, Juseung Yun, Hwanil Choi, Jonghyun Kim, Jiwon Kim, Yi Hu, Jongseong Jang, Soonyoung Lee

Abstract: In digital pathology, whole-slide images (WSIs) are often difficult to handle due to their gigapixel scale, so most approaches train patch encoders via self-supervised learning (SSL) and then aggregate the patch-level embeddings via multiple instance learning (MIL) or slide encoders for downstream tasks. However, patch-level SSL may overlook complex domain-specific features that are essential for biomarker prediction, such as mutation status and molecular characteristics, as SSL methods rely only on basic augmentations selected for natural image domains and applied to small patch-level areas. Moreover, SSL methods remain less data efficient than fully supervised approaches, requiring extensive computational resources and datasets to achieve competitive performance. To address these limitations, we present EXAONE Path 2.0, a pathology foundation model that learns patch-level representations under direct slide-level supervision. Using only 37k WSIs for training, EXAONE Path 2.0 achieves state-of-the-art average performance across 10 biomarker prediction tasks, demonstrating remarkable data efficiency.

cross Deep Disentangled Representation Network for Treatment Effect Estimation

Authors: Hui Meng, Keping Yang, Xuyu Peng, Bo Zheng

Abstract: Estimating individual-level treatment effect from observational data is a fundamental problem in causal inference and has attracted increasing attention in the fields of education, healthcare, and public policy. In this work, we concentrate on the study of disentangled representation methods that have shown promising outcomes by decomposing observed covariates into instrumental, confounding, and adjustment factors. However, most of the previous work has primarily revolved around generative models or hard decomposition methods for covariates, which often struggle to guarantee the attainment of precisely disentangled factors. In order to effectively model different causal relationships, we propose a novel treatment effect estimation algorithm that incorporates a mixture of experts with multi-head attention and a linear orthogonal regularizer to softly decompose the pre-treatment variables, and simultaneously eliminates selection bias via importance sampling re-weighting techniques. We conduct extensive experiments on both public semi-synthetic and real-world production datasets. The experimental results clearly demonstrate that our algorithm outperforms the state-of-the-art methods focused on individual treatment effects.

cross MS-DPPs: Multi-Source Determinantal Point Processes for Contextual Diversity Refinement of Composite Attributes in Text to Image Retrieval

Authors: Naoya Sogi, Takashi Shibata, Makoto Terao, Masanori Suganuma, Takayuki Okatani

Abstract: Result diversification (RD) is a crucial technique in Text-to-Image Retrieval for enhancing the efficiency of a practical application. Conventional methods focus solely on increasing the diversity metric of image appearances. However, the diversity metric and its desired value vary depending on the application, which limits the applications of RD. This paper proposes a novel task called CDR-CA (Contextual Diversity Refinement of Composite Attributes). CDR-CA aims to refine the diversities of multiple attributes, according to the application's context. To address this task, we propose Multi-Source DPPs, a simple yet strong baseline that extends the Determinantal Point Process (DPP) to multi-sources. We model MS-DPP as a single DPP model with a unified similarity matrix based on a manifold representation. We also introduce Tangent Normalization to reflect contexts. Extensive experiments demonstrate the effectiveness of the proposed method. Our code is publicly available at https://github.com/NEC-N-SOGI/msdpp.

URLs: https://github.com/NEC-N-SOGI/msdpp

cross Elite Polarization in European Parliamentary Speeches: a Novel Measurement Approach Using Large Language Models

Authors: Gennadii Iakovlev

Abstract: This project introduces a new measure of elite polarization via actor and subject detection using artificial intelligence. I identify when politicians mention one another in parliamentary speeches, note who is speaking and who is being addressed, and assess the emotional temperature behind these evaluations. This maps how elites evaluate their various out-parties, allowing us to create an index of mutual out-party hostility, that is, elite polarization. While I analyzed polarization data over the past four decades for the UK, and two decades for Hungary and Italy, my approach lays the groundwork for a twenty-year, EU-wide time-series dataset on elite polarization. I obtain results that can be aggregated by party and quarter. The resulting index demonstrates good face validity: it reacts to events such as electoral campaigns, country- and party-level crises, and to parties losing and assuming power.

cross Exploring State-Space-Model based Language Model in Music Generation

Authors: Wei-Jaw Lee, Fang-Chih Hsieh, Xuanjun Chen, Fang-Duo Tsai, Yi-Hsuan Yang

Abstract: The recent surge in State Space Models (SSMs), particularly the emergence of Mamba, has established them as strong alternatives or complementary modules to Transformers across diverse domains. In this work, we aim to explore the potential of Mamba-based architectures for text-to-music generation. We adopt discrete tokens of Residual Vector Quantization (RVQ) as the modeling representation and empirically find that a single-layer codebook can capture semantic information in music. Motivated by this observation, we focus on modeling a single-codebook representation and adapt SiMBA, originally designed as a Mamba-based encoder, to function as a decoder for sequence modeling. We compare its performance against a standard Transformer-based decoder. Our results suggest that, under limited-resource settings, SiMBA achieves much faster convergence and generates outputs closer to the ground truth. This demonstrates the promise of SSMs for efficient and expressive text-to-music generation. We put audio examples on GitHub.

cross Photometric Stereo using Gaussian Splatting and inverse rendering

Authors: Matéo Ducastel (GREYC), David Tschumperlé, Yvain Quéau

Abstract: Recent state-of-the-art algorithms in photometric stereo rely on neural networks and operate either through prior learning or inverse rendering optimization. Here, we revisit the problem of calibrated photometric stereo by leveraging recent advances in 3D inverse rendering using the Gaussian Splatting formalism. This allows us to parameterize the 3D scene to be reconstructed and optimize it in a more interpretable manner. Our approach incorporates a simplified model for light representation and demonstrates the potential of the Gaussian Splatting rendering engine for the photometric stereo problem.

cross CLI-RAG: A Retrieval-Augmented Framework for Clinically Structured and Context Aware Text Generation with LLMs

Authors: Garapati Keerthana, Manik Gupta

Abstract: Large language models (LLMs), including zero-shot and few-shot paradigms, have shown promising capabilities in clinical text generation. However, real-world applications face two key challenges: (1) patient data is highly unstructured, heterogeneous, and scattered across multiple note types and (2) clinical notes are often long and semantically dense, making naive prompting infeasible due to context length constraints and the risk of omitting clinically relevant information. We introduce CLI-RAG (Clinically Informed Retrieval-Augmented Generation), a domain-specific framework for structured and clinically grounded text generation using LLMs. It incorporates a novel hierarchical chunking strategy that respects clinical document structure and introduces a task-specific dual-stage retrieval mechanism. The global stage identifies relevant note types using evidence-based queries, while the local stage extracts high-value content within those notes creating relevance at both document and section levels. We apply the system to generate structured progress notes for individual hospital visits using 15 clinical note types from the MIMIC-III dataset. Experiments show that it preserves temporal and semantic alignment across visits, achieving an average alignment score of 87.7%, surpassing the 80.7% baseline from real clinician-authored notes. The generated outputs also demonstrate high consistency across LLMs, reinforcing deterministic behavior essential for reproducibility, reliability, and clinical trust.

cross Civil Society in the Loop: Feedback-Driven Adaptation of (L)LM-Assisted Classification in an Open-Source Telegram Monitoring Tool

Authors: Milena Pustet, Elisabeth Steffen, Helena Mihaljević, Grischa Stanjek, Yannis Illies

Abstract: The role of civil society organizations (CSOs) in monitoring harmful online content is increasingly crucial, especially as platform providers reduce their investment in content moderation. AI tools can assist in detecting and monitoring harmful content at scale. However, few open-source tools offer seamless integration of AI models and social media monitoring infrastructures. Given their thematic expertise and contextual understanding of harmful content, CSOs should be active partners in co-developing technological tools, providing feedback, helping to improve models, and ensuring alignment with stakeholder needs and values, rather than passive 'consumers'. However, collaborations between the open source community, academia, and civil society remain rare, and research on harmful content seldom translates into practical tools usable by civil society actors. This work in progress explores how CSOs can be meaningfully involved in an AI-assisted open-source monitoring tool of anti-democratic movements on Telegram, which we are currently developing in collaboration with CSO stakeholders.

cross DIFFUMA: High-Fidelity Spatio-Temporal Video Prediction via Dual-Path Mamba and Diffusion Enhancement

Authors: Xinyu Xie, Weifeng Cao, Jun Shi, Yangyang Hu, Hui Liang, Wanyong Liang, Xiaoliang Qian

Abstract: Spatio-temporal video prediction plays a pivotal role in critical domains, ranging from weather forecasting to industrial automation. However, in high-precision industrial scenarios such as semiconductor manufacturing, the absence of specialized benchmark datasets severely hampers research on modeling and predicting complex processes. To address this challenge, we make a twofold contribution. First, we construct and release the Chip Dicing Lane Dataset (CHDL), the first public temporal image dataset dedicated to the semiconductor wafer dicing process. Captured via an industrial-grade vision system, CHDL provides a much-needed and challenging benchmark for high-fidelity process modeling, defect detection, and digital twin development. Second, we propose DIFFUMA, an innovative dual-path prediction architecture specifically designed for such fine-grained dynamics. The model captures global long-range temporal context through a parallel Mamba module, while simultaneously leveraging a diffusion module, guided by temporal features, to restore and enhance fine-grained spatial details, effectively combating feature degradation. Experiments demonstrate that on our CHDL benchmark, DIFFUMA significantly outperforms existing methods, reducing the Mean Squared Error (MSE) by 39% and improving the Structural Similarity (SSIM) from 0.926 to a near-perfect 0.988. This superior performance also generalizes to natural phenomena datasets. Our work not only delivers a new state-of-the-art (SOTA) model but, more importantly, provides the community with an invaluable data resource to drive future research in industrial AI.

cross KAConvText: Novel Approach to Burmese Sentence Classification using Kolmogorov-Arnold Convolution

Authors: Ye Kyaw Thu, Thura Aung, Thazin Myint Oo, Thepchai Supnithi

Abstract: This paper presents the first application of Kolmogorov-Arnold Convolution for Text (KAConvText) in sentence classification, addressing three tasks: imbalanced binary hate speech detection, balanced multiclass news classification, and imbalanced multiclass ethnic language identification. We investigate various embedding configurations, comparing random to fastText embeddings in both static and fine-tuned settings, with embedding dimensions of 100 and 300 using CBOW and Skip-gram models. Baselines include standard CNNs and CNNs augmented with a Kolmogorov-Arnold Network (CNN-KAN). In addition, we investigated KAConvText with different classification heads - MLP and KAN, where using KAN head supports enhanced interpretability. Results show that KAConvText-MLP with fine-tuned fastText embeddings achieves the best performance of 91.23% accuracy (F1-score = 0.9109) for hate speech detection, 92.66% accuracy (F1-score = 0.9267) for news classification, and 99.82% accuracy (F1-score = 0.9982) for language identification.

cross FOLC-Net: A Federated-Optimized Lightweight Architecture for Enhanced MRI Disease Diagnosis across Axial, Coronal, and Sagittal Views

Authors: Saif Ur Rehman Khan, Muhammad Nabeel Asim, Sebastian Vollmer, Andreas Dengel

Abstract: The framework is designed to improve performance in the analysis of combined as well as single anatomical perspectives for MRI disease diagnosis. It specifically addresses the performance degradation observed in state-of-the-art (SOTA) models, particularly when processing axial, coronal, and sagittal anatomical planes. The paper introduces the FOLC-Net framework, which incorporates a novel federated-optimized lightweight architecture with approximately 1.217 million parameters and a storage requirement of only 0.9 MB. FOLC-Net integrates Manta-ray foraging optimization (MRFO) mechanisms for efficient model structure generation, global model cloning for scalable training, and ConvNeXt for enhanced client adaptability. The model was evaluated on combined multi-view data as well as individual views, such as axial, coronal, and sagittal, to assess its robustness in various medical imaging scenarios. Moreover, FOLC-Net tests a ShallowFed model on different data to evaluate its ability to generalize beyond the training dataset. The results show that FOLC-Net outperforms existing models, particularly in the challenging sagittal view. For instance, FOLC-Net achieved an accuracy of 92.44% on the sagittal view, significantly higher than the 88.37% accuracy of study method (DL + Residual Learning) and 88.95% of DL models. Additionally, FOLC-Net demonstrated improved accuracy across all individual views, providing a more reliable and robust solution for medical image analysis in decentralized environments. FOLC-Net addresses the limitations of existing SOTA models by providing a framework that ensures better adaptability to individual views while maintaining strong performance in multi-view settings. The incorporation of MRFO, global model cloning, and ConvNeXt ensures that FOLC-Net performs better in real-world medical applications.

cross Temporal Information Retrieval via Time-Specifier Model Merging

Authors: SeungYoon Han, Taeho Hwang, Sukmin Cho, Soyeong Jeong, Hoyun Song, Huije Lee, Jong C. Park

Abstract: The rapid expansion of digital information and knowledge across structured and unstructured sources has heightened the importance of Information Retrieval (IR). While dense retrieval methods have substantially improved semantic matching for general queries, they consistently underperform on queries with explicit temporal constraints, often those containing numerical expressions and time specifiers such as "in 2015." Existing approaches to Temporal Information Retrieval (TIR) improve temporal reasoning but often suffer from catastrophic forgetting, leading to reduced performance on non-temporal queries. To address this, we propose Time-Specifier Model Merging (TSM), a novel method that enhances temporal retrieval while preserving accuracy on non-temporal queries. TSM trains specialized retrievers for individual time specifiers and merges them into a unified model, enabling precise handling of temporal constraints without compromising non-temporal retrieval. Extensive experiments on both temporal and non-temporal datasets demonstrate that TSM significantly improves performance on temporally constrained queries while maintaining strong results on non-temporal queries, consistently outperforming other baseline methods. Our code is available at https://github.com/seungyoonee/TSM.

URLs: https://github.com/seungyoonee/TSM

cross ixi-GEN: Efficient Industrial sLLMs through Domain Adaptive Continual Pretraining

Authors: Seonwu Kim, Yohan Na, Kihun Kim, Hanhee Cho, Geun Lim, Mintae Kim, Seongik Park, Ki Hyun Kim, Youngsub Han, Byoung-Ki Jeon

Abstract: The emergence of open-source large language models (LLMs) has expanded opportunities for enterprise applications; however, many organizations still lack the infrastructure to deploy and maintain large-scale models. As a result, small LLMs (sLLMs) have become a practical alternative, despite their inherent performance limitations. While Domain Adaptive Continual Pretraining (DACP) has been previously explored as a method for domain adaptation, its utility in commercial applications remains under-examined. In this study, we validate the effectiveness of applying a DACP-based recipe across diverse foundation models and service domains. Through extensive experiments and real-world evaluations, we demonstrate that DACP-applied sLLMs achieve substantial gains in target domain performance while preserving general capabilities, offering a cost-efficient and scalable solution for enterprise-level deployment.

cross Text to model via SysML: Automated generation of dynamical system computational models from unstructured natural language text via enhanced System Modeling Language diagrams

Authors: Matthew Anderson Hendricks, Alice Cicirello

Abstract: This paper contributes to speeding up the design and deployment of engineering dynamical systems by proposing a strategy for exploiting domain and expert knowledge for the automated generation of dynamical system computational models, starting from a corpus of documents relevant to the dynamical system of interest and an input document describing the specific system. This strategy is implemented in five steps and, crucially, it uses system modeling language diagrams (SysML) to extract accurate information about the dependencies, attributes, and operations of components. Natural Language Processing (NLP) strategies and Large Language Models (LLMs) are employed in specific tasks to improve intermediate outputs of the automated SysML diagram generation, such as: list of key nouns; list of extracted relationships; list of key phrases and key relationships; block attribute values; block relationships; and BDD diagram generation. The applicability of automated SysML diagram generation is illustrated with different case studies. The computational models of complex dynamical systems from SysML diagrams are then obtained via code generation and computational model generation steps. In the code generation step, NLP strategies are used for summarization, while LLMs are used for validation only. The proposed approach is not limited to a specific system, domain, or computational software. The applicability of the proposed approach is shown via an end-to-end example from text to model of a simple pendulum, showing improved performance compared to results yielded by LLMs only.

cross Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving

Authors: Zhenwen Liang, Linfeng Song, Yang Li, Tao Yang, Feng Zhang, Haitao Mi, Dong Yu

Abstract: Automated Theorem Proving (ATP) in formal languages is a foundational challenge for AI. While Large Language Models (LLMs) have driven remarkable progress, a significant gap remains between their powerful informal reasoning capabilities and their weak formal proving performance. Recent studies show that the informal accuracy exceeds 80% while formal success remains below 8% on benchmarks like PutnamBench. We argue this gap persists because current state-of-the-art provers, by tightly coupling reasoning and proving, are trained with paradigms that inadvertently punish deep reasoning in favor of shallow, tactic-based strategies. To bridge this fundamental gap, we propose a novel framework that decouples high-level reasoning from low-level proof generation. Our approach utilizes two distinct, specialized models: a powerful, general-purpose Reasoner to generate diverse, strategic subgoal lemmas, and an efficient Prover to rigorously verify them. This modular design liberates the model's full reasoning potential and bypasses the pitfalls of end-to-end training. We evaluate our method on a challenging set of post-2000 IMO problems, a problem set on which no prior open-source prover has reported success. Our decoupled framework successfully solves 5 of these problems, demonstrating a significant step towards automated reasoning on exceptionally difficult mathematical challenges. To foster future research, we release our full dataset of generated and verified lemmas for a wide range of IMO problems, available at https://tencent-imo.github.io/.

URLs: https://tencent-imo.github.io/

cross Democratizing High-Fidelity Co-Speech Gesture Video Generation

Authors: Xu Yang, Shaoli Huang, Shenbo Xie, Xuelin Chen, Yifei Liu, Changxing Ding

Abstract: Co-speech gesture video generation aims to synthesize realistic, audio-aligned videos of speakers, complete with synchronized facial expressions and body gestures. This task presents challenges due to the significant one-to-many mapping between audio and visual content, further complicated by the scarcity of large-scale public datasets and high computational demands. We propose a lightweight framework that utilizes 2D full-body skeletons as an efficient auxiliary condition to bridge audio signals with visual outputs. Our approach introduces a diffusion model conditioned on fine-grained audio segments and a skeleton extracted from the speaker's reference image, predicting skeletal motions through skeleton-audio feature fusion to ensure strict audio coordination and body shape consistency. The generated skeletons are then fed into an off-the-shelf human video generation model with the speaker's reference image to synthesize high-fidelity videos. To democratize research, we present CSG-405, the first public dataset with 405 hours of high-resolution videos across 71 speech types, annotated with 2D skeletons and diverse speaker demographics. Experiments show that our method exceeds state-of-the-art approaches in visual quality and synchronization while generalizing across speakers and contexts.

cross Intrinsic Training Signals for Federated Learning Aggregation

Authors: Cosimo Fiorini, Matteo Mosconi, Pietro Buzzega, Riccardo Salami, Simone Calderara

Abstract: Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy. While existing approaches for aggregating client-specific classification heads and adapted backbone parameters require architectural modifications or loss function changes, our method uniquely leverages intrinsic training signals already available during standard optimization. We present LIVAR (Layer Importance and VARiance-based merging), which introduces: i) a variance-weighted classifier aggregation scheme using naturally emergent feature statistics, and ii) an explainability-driven LoRA merging technique based on SHAP analysis of existing update parameter patterns. Without any architectural overhead, LIVAR achieves state-of-the-art performance on multiple benchmarks while maintaining seamless integration with existing FL methods. This work demonstrates that effective model merging can be achieved solely through existing training signals, establishing a new paradigm for efficient federated model aggregation. The code will be made publicly available upon acceptance.

cross Comprehensive Evaluation of Prototype Neural Networks

Authors: Philipp Schlinge, Steffen Meinert, Martin Atzmueller

Abstract: Prototype models are an important method for explainable artificial intelligence (XAI) and interpretable machine learning. In this paper, we perform an in-depth analysis of a set of prominent prototype models including ProtoPNet, ProtoPool and PIPNet. For their assessment, we apply a comprehensive set of metrics. In addition to applying standard metrics from the literature, we propose several new metrics to further complement the analysis of model interpretability. In our experiments, we apply the set of prototype models to a diverse set of datasets covering fine-grained classification, non-IID settings and multi-label classification to further contrast their performance. Furthermore, we provide our code as an open-source library, which facilitates simple application of the metrics themselves as well as extensibility, providing the option to easily add new metrics and models. https://github.com/uos-sis/quanproto

URLs: https://github.com/uos-sis/quanproto

cross HeLo: Heterogeneous Multi-Modal Fusion with Label Correlation for Emotion Distribution Learning

Authors: Chuhang Zheng, Chunwei Tian, Jie Wen, Daoqiang Zhang, Qi Zhu

Abstract: Multi-modal emotion recognition has garnered increasing attention in recent years, as it plays a significant role in human-computer interaction (HCI). Since different discrete emotions may exist at the same time, emotion distribution learning (EDL), which identifies a mixture of basic emotions, has gradually emerged as a trend over single-class emotion recognition. However, existing EDL methods face challenges in mining the heterogeneity among multiple modalities. Besides, rich semantic correlations across arbitrary basic emotions are not fully exploited. In this paper, we propose a multi-modal emotion distribution learning framework, named HeLo, aimed at fully exploring the heterogeneity and complementary information in multi-modal emotional data and label correlation within mixed basic emotions. Specifically, we first adopt cross-attention to effectively fuse the physiological data. Then, an optimal transport (OT)-based heterogeneity mining module is devised to mine the interaction and heterogeneity between the physiological and behavioral representations. To facilitate label correlation learning, we introduce a learnable label embedding optimized by correlation matrix alignment. Finally, the learnable label embeddings and label correlation matrices are integrated with the multi-modal representations through a novel label correlation-driven cross-attention mechanism for accurate emotion distribution learning. Experimental results on two publicly available datasets demonstrate the superiority of our proposed method in emotion distribution learning.

cross Artificial Generals Intelligence: Mastering Generals.io with Reinforcement Learning

Authors: Matej Straka, Martin Schmid

Abstract: We introduce a real-time strategy game environment based on Generals.io, a game with thousands of weekly active players. Our environment is fully compatible with Gymnasium and PettingZoo and is capable of running thousands of frames per second on commodity hardware. We also present a reference agent, trained with supervised pre-training and self-play, which reached the top 0.003% of the 1v1 human leaderboard after only 36 hours on a single H100 GPU. To accelerate learning, we incorporate potential-based reward shaping and memory features. Our contributions of a modular RTS benchmark and a competitive baseline agent provide an accessible yet challenging platform for advancing multi-agent reinforcement learning research. The documented code, together with examples and tutorials, is available at https://github.com/strakam/generals-bots.

URLs: https://github.com/strakam/generals-bots
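
The potential-based reward shaping mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard form r' = r + gamma * phi(s') - phi(s); the potential values below (a hypothetical count of owned tiles per step) and the function names are assumptions for illustration and are not taken from the paper or its repository.

```python
from typing import Sequence


def shaped_reward(reward: float, phi_s: float, phi_next: float, gamma: float) -> float:
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s)."""
    return reward + gamma * phi_next - phi_s


def shaping_return(potentials: Sequence[float], gamma: float) -> float:
    """Discounted sum of the shaping terms alone along one trajectory."""
    total = 0.0
    for t in range(len(potentials) - 1):
        total += gamma ** t * (gamma * potentials[t + 1] - potentials[t])
    return total


# Hypothetical potential: number of tiles the agent owns at each step.
potentials = [1.0, 2.0, 4.0, 3.0, 7.0]
gamma = 0.99

bonus = shaping_return(potentials, gamma)

# The sum telescopes to gamma^T * phi(s_T) - phi(s_0), so the shaping bonus
# depends only on the first and last states, not on the path between them.
expected = gamma ** (len(potentials) - 1) * potentials[-1] - potentials[0]
print(bonus, expected)
```

Because the accumulated bonus depends only on the endpoints of a trajectory, this form of shaping can speed up learning without changing which policies are optimal.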

cross Speckle2Self: Self-Supervised Ultrasound Speckle Reduction Without Clean Data

Authors: Xuesong Li, Nassir Navab, Zhongliang Jiang

Abstract: Image denoising is a fundamental task in computer vision, particularly in medical ultrasound (US) imaging, where speckle noise significantly degrades image quality. Although recent advancements in deep neural networks have led to substantial improvements in denoising for natural images, these methods cannot be directly applied to US speckle noise, as it is not purely random. Instead, US speckle arises from complex wave interference within the body microstructure, making it tissue-dependent. This dependency means that obtaining two independent noisy observations of the same scene, as required by pioneering Noise2Noise, is not feasible. Additionally, blind-spot networks also cannot handle US speckle noise due to its high spatial dependency. To address this challenge, we introduce Speckle2Self, a novel self-supervised algorithm for speckle reduction using only single noisy observations. The key insight is that applying a multi-scale perturbation (MSP) operation introduces tissue-dependent variations in the speckle pattern across different scales, while preserving the shared anatomical structure. This enables effective speckle suppression by modeling the clean image as a low-rank signal and isolating the sparse noise component. To demonstrate its effectiveness, Speckle2Self is comprehensively compared with conventional filter-based denoising algorithms and SOTA learning-based methods, using both realistic simulated US images and human carotid US images. Additionally, data from multiple US machines are employed to evaluate model generalization and adaptability to images from unseen domains. Code and datasets will be released upon acceptance.

cross Physics-Grounded Motion Forecasting via Equation Discovery for Trajectory-Guided Image-to-Video Generation

Authors: Tao Feng, Xianbing Zhao, Zhenhua Chen, Tien Tsin Wong, Hamid Rezatofighi, Gholamreza Haffari, Lizhen Qu

Abstract: Recent advances in diffusion-based and autoregressive video generation models have achieved remarkable visual realism. However, these models typically lack accurate physical alignment, failing to replicate real-world dynamics in object motion. This limitation arises primarily from their reliance on learned statistical correlations rather than capturing mechanisms adhering to physical laws. To address this issue, we introduce a novel framework that integrates symbolic regression (SR) and trajectory-guided image-to-video (I2V) models for physics-grounded video forecasting. Our approach extracts motion trajectories from input videos, uses a retrieval-based pre-training mechanism to enhance symbolic regression, and discovers equations of motion to forecast physically accurate future trajectories. These trajectories then guide video generation without requiring fine-tuning of existing models. Evaluated on scenarios in Classical Mechanics, including spring-mass, pendulums, and projectile motions, our method successfully recovers ground-truth analytical equations and improves the physical alignment of generated videos over baseline methods.

cross OpenDPDv2: A Unified Learning and Optimization Framework for Neural Network Digital Predistortion

Authors: Yizhuo Wu, Ang Li, Chang Gao

Abstract: Neural network (NN)-based Digital Predistortion (DPD) stands out in improving signal quality in wideband radio frequency (RF) power amplifiers (PAs) employing complex modulation. However, NN DPDs usually rely on a large number of parameters for effective linearization and can significantly contribute to the energy consumption of the digital back-end in RF systems. This paper presents OpenDPDv2, a unified framework for PA modeling, DPD learning, and model optimization to reduce power consumption while maintaining high linearization performance. The optimization techniques feature a novel DPD algorithm, TRes-DeltaGRU, alongside two energy-efficient methods. The top-performing 32-bit floating-point (FP32) TRes-DeltaGRU-DPD model achieves an Adjacent Channel Power Ratio (ACPR) of -59.4 dBc and Error Vector Magnitude (EVM) of -42.1 dB. By exploiting fixed-point quantization and dynamic temporal sparsity of input signals and hidden neurons, the inference energy of our model can be reduced by 4.5x while still maintaining -50.3 dBc ACPR and -35.2 dB EVM with 56% temporal sparsity. This was evaluated using a TM3.1a 200 MHz bandwidth 256-QAM OFDM signal applied to a 3.5 GHz GaN Doherty RF PA. OpenDPDv2 code, datasets, and documentation are publicly accessible at: https://github.com/lab-emi/OpenDPD.

URLs: https://github.com/lab-emi/OpenDPD

cross The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover

Authors: Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro

Abstract: The rapid adoption of Large Language Model (LLM) agents and multi-agent systems enables unprecedented capabilities in natural language processing and generation. However, these systems have introduced unprecedented security vulnerabilities that extend beyond traditional prompt injection attacks. This paper presents the first comprehensive evaluation of LLM agents as attack vectors capable of achieving complete computer takeover through the exploitation of trust boundaries within agentic AI systems where autonomous entities interact and influence each other. We demonstrate that adversaries can leverage three distinct attack surfaces - direct prompt injection, RAG backdoor attacks, and inter-agent trust exploitation - to coerce popular LLMs (including GPT-4o, Claude-4 and Gemini-2.5) into autonomously installing and executing malware on victim machines. Our evaluation of 17 state-of-the-art LLMs reveals an alarming vulnerability hierarchy: while 41.2% of models succumb to direct prompt injection, 52.9% are vulnerable to RAG backdoor attacks, and a critical 82.4% can be compromised through inter-agent trust exploitation. Notably, we discovered that LLMs which successfully resist direct malicious commands will execute identical payloads when requested by peer agents, revealing a fundamental flaw in current multi-agent security models. Our findings demonstrate that only 5.9% of tested models (1/17) proved resistant to all attack vectors, with the majority exhibiting context-dependent security behaviors that create exploitable blind spots. Our findings also highlight the need to increase awareness and research on the security risks of LLMs, showing a paradigm shift in cybersecurity threats, where AI tools themselves become sophisticated attack vectors.
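
As a quick arithmetic check on the percentages in the abstract above, the short sketch below converts model counts out of 17 into rounded percentages. Only the 1/17 figure is stated in the abstract; the other counts (7, 9, 14) are inferred here as the integers that reproduce the reported percentages and should be read as an assumption, not as numbers taken from the paper.

```python
TOTAL_MODELS = 17

# Counts of models per category. Only "resistant to all vectors" (1/17) is
# given explicitly in the abstract; the rest are inferred for illustration.
inferred_counts = {
    "direct prompt injection": 7,
    "RAG backdoor attacks": 9,
    "inter-agent trust exploitation": 14,
    "resistant to all vectors": 1,
}

percentages = {
    name: round(100 * count / TOTAL_MODELS, 1)
    for name, count in inferred_counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {inferred_counts[name]}/{TOTAL_MODELS} = {pct}%")
```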

cross DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models

Authors: Liang Wang, Yu Rong, Tingyang Xu, Zhenyi Zhong, Zhiyuan Liu, Pengju Wang, Deli Zhao, Qiang Liu, Shu Wu, Liang Wang

Abstract: Molecular structure elucidation from spectra is a foundational problem in chemistry, with profound implications for compound identification, synthesis, and drug development. Traditional methods rely heavily on expert interpretation and lack scalability. Pioneering machine learning methods have introduced retrieval-based strategies, but their reliance on finite libraries limits generalization to novel molecules. Generative models offer a promising alternative, yet most adopt autoregressive SMILES-based architectures that overlook 3D geometry and struggle to integrate diverse spectral modalities. In this work, we present DiffSpectra, a generative framework that directly infers both 2D and 3D molecular structures from multi-modal spectral data using diffusion models. DiffSpectra formulates structure elucidation as a conditional generation process. Its denoising network is parameterized by Diffusion Molecule Transformer, an SE(3)-equivariant architecture that integrates topological and geometric information. Conditioning is provided by SpecFormer, a transformer-based spectral encoder that captures intra- and inter-spectral dependencies from multi-modal spectra. Extensive experiments demonstrate that DiffSpectra achieves high accuracy in structure elucidation, recovering exact structures with 16.01% top-1 accuracy and 96.86% top-20 accuracy through sampling. The model benefits significantly from 3D geometric modeling, SpecFormer pre-training, and multi-modal conditioning. These results highlight the effectiveness of spectrum-conditioned diffusion modeling in addressing the challenge of molecular structure elucidation. To our knowledge, DiffSpectra is the first framework to unify multi-modal spectral reasoning and joint 2D/3D generative modeling for de novo molecular structure elucidation.

cross IAP: Invisible Adversarial Patch Attack through Perceptibility-Aware Localization and Perturbation Optimization

Authors: Subrat Kishore Dutta, Xiao Zhang

Abstract: Despite modifying only a small localized input region, adversarial patches can drastically change the prediction of computer vision models. However, prior methods either cannot perform satisfactorily under targeted attack scenarios or fail to produce contextually coherent adversarial patches, causing them to be easily noticeable by human examiners and insufficiently stealthy against automatic patch defenses. In this paper, we introduce IAP, a novel attack framework that generates highly invisible adversarial patches based on perceptibility-aware localization and perturbation optimization schemes. Specifically, IAP first searches for a proper location to place the patch by leveraging classwise localization and sensitivity maps, balancing the susceptibility of patch location to both victim model prediction and human visual system, then employs a perceptibility-regularized adversarial loss and a gradient update rule that prioritizes color constancy for optimizing invisible perturbations. Comprehensive experiments across various image benchmarks and model architectures demonstrate that IAP consistently achieves competitive attack success rates in targeted settings with significantly improved patch invisibility compared to existing baselines. In addition to being highly imperceptible to humans, IAP is shown to be stealthy enough to render several state-of-the-art patch defenses ineffective.

cross Winning and losing with Artificial Intelligence: What public discourse about ChatGPT tells us about how societies make sense of technological change

Authors: Adrian Rauchfleisch, Joshua Philip Suarez, Nikka Marie Sales, Andreas Jungherr

Abstract: Public product launches in Artificial Intelligence can serve as focusing events for collective attention, surfacing how societies react to technological change. Social media provide a window into the sensemaking around these events, surfacing hopes and fears and showing who chooses to engage in the discourse and when. We demonstrate that public sensemaking about AI is shaped by economic interests and cultural values of those involved. We analyze 3.8 million tweets posted by 1.6 million users across 117 countries in response to the public launch of ChatGPT in 2022. Our analysis shows how economic self-interest, proxied by occupational skill types in writing, programming, and mathematics, and national cultural orientations, as measured by Hofstede's individualism, uncertainty avoidance, and power distance dimensions, shape who speaks, when they speak, and their stance towards ChatGPT. Roles requiring more technical skills, such as programming and mathematics, tend to engage earlier and express more positive stances, whereas writing-centric occupations join later with greater skepticism. At the cultural level, individualism predicts both earlier engagement and a more negative stance, and uncertainty avoidance reduces the prevalence of positive stances but does not delay when users first engage with ChatGPT. Aggregate sentiment trends mask the dynamics observed in our study. The shift toward a more critical stance towards ChatGPT over time stems primarily from the entry of more skeptical voices rather than a change of heart among early adopters. Our findings underscore the importance of both the occupational background and cultural context in understanding public reactions to AI.

cross A Single-Point Measurement Framework for Robust Cyber-Attack Diagnosis in Smart Microgrids Using Dual Fractional-Order Feature Analysis

Authors: Yifan Wang

Abstract: Cyber-attacks jeopardize the safe operation of smart microgrids. At the same time, existing diagnostic methods either depend on expensive multi-point instrumentation or stringent modelling assumptions that are untenable under single-sensor constraints. This paper proposes a Fractional-Order Memory-Enhanced Attack-Diagnosis Scheme (FO-MADS) that achieves low-latency fault localisation and cyber-attack detection using only one VPQ (Voltage-Power-Reactive-power) sensor. FO-MADS first constructs a dual fractional-order feature library by jointly applying Caputo and Grünwald-Letnikov derivatives, thereby amplifying micro-perturbations and slow drifts in the VPQ signal. A two-stage hierarchical classifier then pinpoints the affected inverter and isolates the faulty IGBT switch, effectively alleviating class imbalance. Robustness is further strengthened through Progressive Memory-Replay Adversarial Training (PMR-AT), whose attack-aware loss is dynamically re-weighted via Online Hard Example Mining (OHEM) to prioritise the most challenging samples. Experiments on a four-inverter microgrid testbed comprising 1 normal and 24 fault classes under four attack scenarios demonstrate diagnostic accuracies of 96.6 % (bias), 94.0 % (noise), 92.8 % (data replacement), and 95.7 % (replay), while sustaining 96.7 % under attack-free conditions. These results establish FO-MADS as a cost-effective and readily deployable solution that markedly enhances the cyber-physical resilience of smart microgrids.

cross Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model

Authors: Jing Liang, Hongyao Tang, Yi Ma, Jinyi Liu, Yan Zheng, Shuyue Hu, Lei Bai, Jianye Hao

Abstract: Reinforcement Learning (RL) has demonstrated its potential to improve the reasoning ability of Large Language Models (LLMs). One major limitation of most existing Reinforcement Finetuning (RFT) methods is that they are on-policy RL in nature, i.e., data generated during the past learning process is not fully utilized. This inevitably comes at a significant cost of compute and time, posing a stringent bottleneck on continuing economic and efficient scaling. To this end, we launch the renaissance of off-policy RL and propose Reincarnating Mix-policy Proximal Policy Gradient (ReMix), a general approach to enable on-policy RFT methods like PPO and GRPO to leverage off-policy data. ReMix consists of three major components: (1) Mix-policy proximal policy gradient with an increased Update-To-Data (UTD) ratio for efficient training; (2) KL-Convex policy constraint to balance the trade-off between stability and flexibility; (3) Policy reincarnation to achieve a seamless transition from efficient early-stage learning to steady asymptotic improvement. In our experiments, we train a series of ReMix models upon PPO, GRPO and 1.5B, 7B base models. ReMix shows an average Pass@1 accuracy of 52.10% (for 1.5B model) with 0.079M response rollouts, 350 training steps and achieves 63.27%/64.39% (for 7B model) with 0.007M/0.011M response rollouts, 50/75 training steps, on five math reasoning benchmarks (i.e., AIME'24, AMC'23, Minerva, OlympiadBench, and MATH500). Compared with 15 recent advanced models, ReMix shows SOTA-level performance with an over 30x to 450x reduction in training cost in terms of rollout data volume. In addition, we reveal insightful findings via multifaceted analysis, including the implicit preference for shorter responses due to the Whipping Effect of off-policy discrepancy, the collapse mode of self-reflection behavior under the presence of severe off-policyness, etc.

cross Developing and Maintaining an Open-Source Repository of AI Evaluations: Challenges and Insights

Authors: Alexandra Abbas, Celia Waggoner, Justin Olive

Abstract: AI evaluations have become critical tools for assessing large language model capabilities and safety. This paper presents practical insights from eight months of maintaining inspect_evals, an open-source repository of 70+ community-contributed AI evaluations. We identify key challenges in implementing and maintaining AI evaluations and develop solutions including: (1) a structured cohort management framework for scaling community contributions, (2) statistical methodologies for optimal resampling and cross-model comparison with uncertainty quantification, and (3) systematic quality control processes for reproducibility. Our analysis reveals that AI evaluation requires specialized infrastructure, statistical rigor, and community coordination beyond traditional software development practices.

cross SCoRE: Streamlined Corpus-based Relation Extraction using Multi-Label Contrastive Learning and Bayesian kNN

Authors: Luca Mariotti, Veronica Guidetti, Federica Mandreoli

Abstract: The growing demand for efficient knowledge graph (KG) enrichment leveraging external corpora has intensified interest in relation extraction (RE), particularly under low-supervision settings. To address the need for adaptable and noise-resilient RE solutions that integrate seamlessly with pre-trained large language models (PLMs), we introduce SCoRE, a modular and cost-effective sentence-level RE system. SCoRE enables easy PLM switching, requires no finetuning, and adapts smoothly to diverse corpora and KGs. By combining supervised contrastive learning with a Bayesian k-Nearest Neighbors (kNN) classifier for multi-label classification, it delivers robust performance despite the noisy annotations of distantly supervised corpora. To improve RE evaluation, we propose two novel metrics: Correlation Structure Distance (CSD), measuring the alignment between learned relational patterns and KG structures, and Precision at R (P@R), assessing utility as a recommender system. We also release Wiki20d, a benchmark dataset replicating real-world RE conditions where only KG-derived annotations are available. Experiments on five benchmarks show that SCoRE matches or surpasses state-of-the-art methods while significantly reducing energy consumption. Further analyses reveal that increasing model complexity, as seen in prior work, degrades performance, highlighting the advantages of SCoRE's minimal design. Combining efficiency, modularity, and scalability, SCoRE stands as an optimal choice for real-world RE applications.

cross VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation

Authors: Ziang Ye, Yang Zhang, Wentao Shi, Xiaoyu You, Fuli Feng, Tat-Seng Chua

Abstract: Graphical User Interface (GUI) agents powered by Large Vision-Language Models (LVLMs) have emerged as a revolutionary approach to automating human-machine interactions, capable of autonomously operating personal devices (e.g., mobile phones) or applications within the device to perform complex real-world tasks in a human-like manner. However, their close integration with personal devices raises significant security concerns, with many threats, including backdoor attacks, remaining largely unexplored. This work reveals that the visual grounding of GUI agents, which maps textual plans to GUI elements, can introduce vulnerabilities, enabling new types of backdoor attacks. With a backdoor attack targeting visual grounding, the agent's behavior can be compromised even when given correct task-solving plans. To validate this vulnerability, we propose VisualTrap, a method that can hijack the grounding by misleading the agent into mapping textual plans to trigger locations instead of the intended targets. VisualTrap uses the common method of injecting poisoned data for attacks, and does so during the pre-training of visual grounding to ensure the practical feasibility of the attack. Empirical results show that VisualTrap can effectively hijack visual grounding with as little as 5% poisoned data and highly stealthy visual triggers (invisible to the human eye); and the attack can be generalized to downstream tasks, even after clean fine-tuning. Moreover, the injected trigger can remain effective across different GUI environments, e.g., being trained on mobile/web and generalizing to desktop environments. These findings underscore the urgent need for further research on backdoor attack risks in GUI agents.

cross MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection

Authors: Ziyan Liu, Chunxiao Fan, Haoran Lou, Yuexin Wu, Kaiwei Deng

Abstract: The rapid expansion of memes on social media has highlighted the urgent need for effective approaches to detect harmful content. However, traditional data-driven approaches struggle to detect new memes due to their evolving nature and the lack of up-to-date annotated data. To address this issue, we propose MIND, a multi-agent framework for zero-shot harmful meme detection that does not rely on annotated data. MIND implements three key strategies: 1) We retrieve similar memes from an unannotated reference set to provide contextual information. 2) We propose a bi-directional insight derivation mechanism to extract a comprehensive understanding of similar memes. 3) We then employ a multi-agent debate mechanism to ensure robust decision-making through reasoned arbitration. Extensive experiments on three meme datasets demonstrate that our proposed framework not only outperforms existing zero-shot approaches but also shows strong generalization across different model architectures and parameter scales, providing a scalable solution for harmful meme detection. The code is available at https://github.com/destroy-lonely/MIND.

URLs: https://github.com/destroy-lonely/MIND

cross MultiJustice: A Chinese Dataset for Multi-Party, Multi-Charge Legal Prediction

Authors: Xiao Wang, Jiahuan Pei, Diancheng Shui, Zhiguang Han, Xin Sun, Dawei Zhu, Xiaoyu Shen

Abstract: Legal judgment prediction (LJP) offers a compelling method to aid legal practitioners and researchers. However, one research question remains relatively under-explored: should multiple defendants and charges be treated separately in LJP? To address this, we introduce a new dataset, namely multi-person multi-charge prediction (MPMCP), and seek the answer by evaluating the performance of several prevailing legal large language models (LLMs) on four practical legal judgment scenarios: (S1) single defendant with a single charge, (S2) single defendant with multiple charges, (S3) multiple defendants with a single charge, and (S4) multiple defendants with multiple charges. We evaluate the dataset across two LJP tasks, i.e., charge prediction and penalty term prediction. We have conducted extensive experiments and found that the scenario involving multiple defendants and multiple charges (S4) poses the greatest challenges, followed by S2, S3, and S1. The impact varies significantly depending on the model. For example, in S4 compared to S1, InternLM2 achieves approximately 4.5% lower F1-score and 2.8% higher LogD, while Lawformer demonstrates around 19.7% lower F1-score and 19.0% higher LogD. Our dataset and code are available at https://github.com/lololo-xiao/MultiJustice-MPMCP.

URLs: https://github.com/lololo-xiao/MultiJustice-MPMCP

cross Beyond Connectivity: An Open Architecture for AI-RAN Convergence in 6G

Authors: Michele Polese, Niloofar Mohamadi, Salvatore D'Oro, Tommaso Melodia

Abstract: The proliferation of data-intensive Artificial Intelligence (AI) applications at the network edge demands a fundamental shift in RAN design, from merely consuming AI for network optimization, to actively enabling distributed AI workloads. This paradigm shift presents a significant opportunity for network operators to monetize AI at the edge while leveraging existing infrastructure investments. To realize this vision, this article presents a novel converged O-RAN and AI-RAN architecture that unifies orchestration and management of both telecommunications and AI workloads on shared infrastructure. The proposed architecture extends the Open RAN principles of modularity, disaggregation, and cloud-nativeness to support heterogeneous AI deployments. We introduce two key architectural innovations: (i) the AI-RAN Orchestrator, which extends the O-RAN Service Management and Orchestration (SMO) to enable integrated resource allocation across RAN and AI workloads; and (ii) AI-RAN sites that provide distributed edge AI platforms with real-time processing capabilities. The proposed system supports flexible deployment options, allowing AI workloads to be orchestrated with specific timing requirements (real-time or batch processing) and geographic targeting. The proposed architecture addresses the orchestration requirements for managing heterogeneous workloads at different time scales while maintaining open, standardized interfaces and multi-vendor interoperability.

cross What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models

Authors: Keyon Vafa, Peter G. Chang, Ashesh Rambachan, Sendhil Mullainathan

Abstract: Foundation models are premised on the idea that sequence prediction can uncover deeper domain understanding, much like how Kepler's predictions of planetary motion later led to the discovery of Newtonian mechanics. However, evaluating whether these models truly capture deeper structure remains a challenge. We develop a technique for evaluating foundation models that examines how they adapt to synthetic datasets generated from some postulated world model. Our technique measures whether the foundation model's inductive bias aligns with the world model, and so we refer to it as an inductive bias probe. Across multiple domains, we find that foundation models can excel at their training tasks yet fail to develop inductive biases towards the underlying world model when adapted to new tasks. We particularly find that foundation models trained on orbital trajectories consistently fail to apply Newtonian mechanics when adapted to new physics tasks. Further analysis reveals that these models behave as if they develop task-specific heuristics that fail to generalize.

cross CheXPO: Preference Optimization for Chest X-ray VLMs with Counterfactual Rationale

Authors: Xiao Liang, Jiawei Hu, Di Wang, Zhi Ma, Lin Zhao, Ronghan Li, Bo Wan, Quan Wang

Abstract: Vision-language models (VLMs) are prone to hallucinations that critically compromise reliability in medical applications. While preference optimization can mitigate these hallucinations through clinical feedback, its implementation faces challenges such as clinically irrelevant training samples, imbalanced data distributions, and prohibitive expert annotation costs. To address these challenges, we introduce CheXPO, a Chest X-ray Preference Optimization strategy that combines confidence-similarity joint mining with counterfactual rationale. Our approach begins by synthesizing a unified, fine-grained multi-task chest X-ray visual instruction dataset across different question types for supervised fine-tuning (SFT). We then identify hard examples through token-level confidence analysis of SFT failures and use similarity-based retrieval to expand hard examples for balancing preference sample distributions, while synthetic counterfactual rationales provide fine-grained clinical preferences, eliminating the need for additional expert input. Experiments show that CheXPO achieves 8.93% relative performance gain using only 5% of SFT samples, reaching state-of-the-art performance across diverse clinical tasks and providing a scalable, interpretable solution for real-world radiology applications.

cross Noisy PDE Training Requires Bigger PINNs

Authors: Sebastien Andre-Sloan, Anirbit Mukherjee, Matthew Colbrook

Abstract: Physics-Informed Neural Networks (PINNs) are increasingly used to approximate solutions of partial differential equations (PDEs), especially in high dimensions. In real-world applications, data samples are noisy, so it is important to know when a predictor can still achieve low empirical risk. However, little is known about the conditions under which a PINN can do so effectively. We prove a lower bound on the size of neural networks required for the supervised PINN empirical risk to fall below the variance of noisy supervision labels. Specifically, if a predictor achieves an empirical risk $O(\eta)$ below $\sigma^2$ (variance of supervision data), then necessarily $d_N\log d_N\gtrsim N_s \eta^2$, where $N_s$ is the number of samples and $d_N$ is the number of trainable parameters of the PINN. A similar constraint applies to the fully unsupervised PINN setting when boundary labels are sampled noisily. Consequently, increasing the number of noisy supervision labels alone does not provide a ``free lunch'' in reducing empirical risk. We also show empirically that PINNs can indeed achieve empirical risks below $\sigma^2$ under such conditions. As a case study, we investigate PINNs applied to the Hamilton--Jacobi--Bellman (HJB) PDE. Our findings lay the groundwork for quantitatively understanding the parameter requirements for training PINNs in the presence of noise.
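To make the scaling in the bound above concrete, the following minimal sketch finds the smallest integer $d_N$ satisfying $d_N\log d_N \ge c\,N_s\eta^2$. The constant `c` hidden by $\gtrsim$ is not given in the abstract and is set to 1 purely as an assumption, so the returned numbers are illustrative only; the function name `min_params` is hypothetical.

```python
import math


def min_params(n_samples: int, eta: float, c: float = 1.0) -> int:
    """Smallest integer d >= 2 with d * log(d) >= c * n_samples * eta**2.

    `c` stands in for the unspecified constant in the bound (assumption).
    """
    target = c * n_samples * eta ** 2

    # Double d until the inequality holds, which brackets the answer
    # in the interval (d/2, d].
    d = 2
    while d * math.log(d) < target:
        d *= 2

    # Binary search for the smallest d in the bracket; d * log(d) is
    # increasing for d >= 2, so the predicate is monotone.
    lo, hi = max(2, d // 2), d
    while lo < hi:
        mid = (lo + hi) // 2
        if mid * math.log(mid) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Under this assumed constant, holding $\eta$ fixed and increasing the number of noisy samples $N_s$ raises the required parameter count, which is the sense in which more noisy labels alone are not a free lunch.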

cross Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Authors: Bogdan Kulynych, Juan Felipe Gomez, Georgios Kaissis, Jamie Hayes, Borja Balle, Flavio du Pin Calmon, Jean Louis Raisaro

Abstract: Differentially private (DP) mechanisms are difficult to interpret and calibrate because existing methods for mapping standard privacy parameters to concrete privacy risks -- re-identification, attribute inference, and data reconstruction -- are both overly pessimistic and inconsistent. In this work, we use the hypothesis-testing interpretation of DP ($f$-DP), and determine that bounds on attack success can take the same unified form across re-identification, attribute inference, and data reconstruction risks. Our unified bounds are (1) consistent across a multitude of attack settings, and (2) tunable, enabling practitioners to evaluate risk with respect to arbitrary (including worst-case) levels of baseline risk. Empirically, our results are tighter than prior methods using $\varepsilon$-DP, R\'enyi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, e.g., more than 15pp accuracy increase in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.

cross MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation

Authors: Qilong Xing, Zikai Song, Youjia Zhang, Na Feng, Junqing Yu, Wei Yang

Abstract: Despite significant advancements in adapting Large Language Models (LLMs) for radiology report generation (RRG), clinical adoption remains challenging due to difficulties in accurately mapping pathological and anatomical features to their corresponding text descriptions. Additionally, semantic-agnostic feature extraction further hampers the generation of accurate diagnostic reports. To address these challenges, we introduce Medical Concept Aligned Radiology Report Generation (MCA-RG), a knowledge-driven framework that explicitly aligns visual features with distinct medical concepts to enhance the report generation process. MCA-RG utilizes two curated concept banks: a pathology bank containing lesion-related knowledge, and an anatomy bank with anatomical descriptions. The visual features are aligned with these medical concepts and undergo tailored enhancement. We further propose an anatomy-based contrastive learning procedure to improve the generalization of anatomical features, coupled with a matching loss for pathological features to prioritize clinically relevant regions. Additionally, a feature gating mechanism is employed to filter out low-quality concept features. Finally, the visual features, each corresponding to an individual medical concept, are leveraged to guide the report generation process. Experiments on two public benchmarks (MIMIC-CXR and CheXpert Plus) demonstrate that MCA-RG achieves superior performance, highlighting its effectiveness in radiology report generation.

cross Cross-Modality Masked Learning for Survival Prediction in ICI Treated NSCLC Patients

Authors: Qilong Xing, Zikai Song, Bingxin Gong, Lian Yang, Junqing Yu, Wei Yang

Abstract: Accurate prognosis of non-small cell lung cancer (NSCLC) patients undergoing immunotherapy is essential for personalized treatment planning, enabling informed patient decisions, and improving both treatment outcomes and quality of life. However, the lack of large, relevant datasets and effective multi-modal feature fusion strategies pose significant challenges in this domain. To address these challenges, we present a large-scale dataset and introduce a novel framework for multi-modal feature fusion aimed at enhancing the accuracy of survival prediction. The dataset comprises 3D CT images and corresponding clinical records from NSCLC patients treated with immune checkpoint inhibitors (ICI), along with progression-free survival (PFS) and overall survival (OS) data. We further propose a cross-modality masked learning approach for medical feature fusion, consisting of two distinct branches, each tailored to its respective modality: a Slice-Depth Transformer for extracting 3D features from CT images and a graph-based Transformer for learning node features and relationships among clinical variables in tabular data. The fusion process is guided by a masked modality learning strategy, wherein the model utilizes the intact modality to reconstruct missing components. This mechanism improves the integration of modality-specific features, fostering more effective inter-modality relationships and feature interactions. Our approach demonstrates superior performance in multi-modal integration for NSCLC survival prediction, surpassing existing methods and setting a new benchmark for prognostic models in this context.

cross Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing

Authors: Eunbyeol Cho, Jiyoun Kim, Minjae Lee, Sungjin Park, Edward Choi

Abstract: Electronic Health Records (EHR) are time-series relational databases that record patient interactions and medical events over time, serving as a critical resource for healthcare research and applications. However, privacy concerns and regulatory restrictions limit the sharing and utilization of such sensitive data, necessitating the generation of synthetic EHR datasets. Unlike previous EHR synthesis methods, which typically generate medical records consisting of expert-chosen features (e.g. a few vital signs or structured codes only), we introduce RawMed, the first framework to synthesize multi-table, time-series EHR data that closely resembles raw EHRs. Using text-based representation and compression techniques, RawMed captures complex structures and temporal dynamics with minimal preprocessing. We also propose a new evaluation framework for multi-table time-series synthetic EHRs, assessing distributional similarity, inter-table relationships, temporal dynamics, and privacy. Validated on two open-source EHR datasets, RawMed outperforms baseline models in fidelity and utility. The code is available at https://github.com/eunbyeol-cho/RawMed.

URLs: https://github.com/eunbyeol-cho/RawMed

cross FlexOlmo: Open Language Models for Flexible Data Use

Authors: Weijia Shi, Akshita Bhagia, Kevin Farhat, Niklas Muennighoff, Pete Walsh, Jacob Morrison, Dustin Schwenk, Shayne Longpre, Jake Poznanski, Allyson Ettinger, Daogao Liu, Margaret Li, Dirk Groeneveld, Mike Lewis, Wen-tau Yih, Luca Soldaini, Kyle Lo, Noah A. Smith, Luke Zettlemoyer, Pang Wei Koh, Hannaneh Hajishirzi, Ali Farhadi, Sewon Min

Abstract: We introduce FlexOlmo, a new class of language models (LMs) that supports (1) distributed training without data sharing, where different model parameters are independently trained on closed datasets, and (2) data-flexible inference, where these parameters along with their associated data can be flexibly included or excluded from model inferences with no further training. FlexOlmo employs a mixture-of-experts (MoE) architecture where each expert is trained independently on closed datasets and later integrated through a new domain-informed routing without any joint training. FlexOlmo is trained on FlexMix, a corpus we curate comprising publicly available datasets alongside seven domain-specific sets, representing realistic approximations of closed sets. We evaluate models with up to 37 billion parameters (20 billion active) on 31 diverse downstream tasks. We show that a general expert trained on public data can be effectively combined with independently trained experts from other data owners, leading to an average 41% relative improvement while allowing users to opt out of certain data based on data licensing or permission requirements. Our approach also outperforms prior model merging methods by 10.1% on average and surpasses the standard MoE trained without data restrictions using the same training FLOPs. Altogether, this research presents a solution for both data owners and researchers in regulated industries with sensitive or protected data. FlexOlmo enables benefiting from closed data while respecting data owners' preferences by keeping their data local and supporting fine-grained control of data access during inference.

cross Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices

Authors: Parshva Dhilankumar Patel

Abstract: This paper presents the design and development of an OCR-powered pipeline for efficient table extraction from invoices. The system leverages Tesseract OCR for text recognition and custom post-processing logic to detect, align, and extract structured tabular data from scanned invoice documents. Our approach includes dynamic preprocessing, table boundary detection, and row-column mapping, optimized for noisy and non-standard invoice formats. The resulting pipeline significantly improves data extraction accuracy and consistency, supporting real-world use cases such as automated financial workflows and digital archiving.

cross PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments

Authors: Hanqun Cao, Xinyi Zhou, Zijun Gao, Chenyu Wang, Xin Gao, Zhi Zhang, Chunbin Gu, Ge Liu, Pheng-Ann Heng

Abstract: Protein structure prediction is essential for drug discovery and understanding biological functions. While recent advancements like AlphaFold have achieved remarkable accuracy, most folding models rely heavily on multiple sequence alignments (MSAs) to boost prediction performance. This dependency limits their effectiveness on low-homology proteins and orphan proteins, where MSA information is sparse or unavailable. To address this limitation, we propose PLAME, a novel MSA design model that leverages evolutionary embeddings from pretrained protein language models. Unlike existing methods, PLAME introduces pretrained representations to enhance evolutionary information and employs a conservation-diversity loss to enhance generation quality. Additionally, we propose a novel MSA selection method to effectively screen high-quality MSAs and improve folding performance. We also propose a sequence quality assessment metric that provides an orthogonal perspective to evaluate MSA quality. On the AlphaFold2 benchmark of low-homology and orphan proteins, PLAME achieves state-of-the-art performance in folding enhancement and sequence quality assessment, with consistent improvements demonstrated on AlphaFold3. Ablation studies validate the effectiveness of the MSA selection method, while extensive case studies on various protein types provide insights into the relationship between AlphaFold's prediction quality and MSA characteristics. Furthermore, we demonstrate that PLAME can serve as an adapter achieving AlphaFold2-level accuracy with the ESMFold's inference speed.

cross Surrogate Model for Heat Transfer Prediction in Impinging Jet Arrays using Dynamic Inlet/Outlet and Flow Rate Control

Authors: Mikael Vaillant, Victor Oliveira Ferreira, Wiebke Mainville, Jean-Michel Lamarre, Vincent Raymond, Moncef Chioua, Bruno Blais

Abstract: This study presents a surrogate model designed to predict the Nusselt number distribution in enclosed impinging jet arrays, where each jet functions independently and where jets can be transformed from inlets to outlets, leading to a vast number of possible flow arrangements. While computational fluid dynamics (CFD) simulations can model heat transfer with high fidelity, their cost prohibits real-time applications such as model-based temperature control. To address this, we generate a CNN-based surrogate model that can predict the Nusselt distribution in real time. We train it with data from implicit large eddy computational fluid dynamics simulations (Re < 2,000). We train two distinct models, one for a five by one array of jets (83 simulations) and one for a three by three array of jets (100 simulations). We introduce a method to extrapolate predictions to higher Reynolds numbers (Re < 10,000) using a correlation-based scaling. The surrogate models achieve high accuracy, with a normalized mean average error below 2% on validation data for the five by one surrogate model and 0.6% for the three by three surrogate model. Experimental validation confirms the model's predictive capabilities. This work provides a foundation for model-based control strategies in advanced thermal management applications.

cross Modeling Heterogeneity across Varying Spatial Extents: Discovering Linkages between Sea Ice Retreat and Ice Shelve Melt in the Antarctic

Authors: Maloy Kumar Devnath, Sudip Chakraborty, Vandana P. Janeja

Abstract: Spatial phenomena often exhibit heterogeneity across spatial extents and in proximity, making them complex to model, especially in dynamic regions like ice shelves and sea ice. In this study, we address this challenge by exploring the linkages between sea ice retreat and Antarctic ice shelf (AIS) melt. Although atmospheric forcing and basal melting have been widely studied, the direct impact of sea ice retreat on AIS mass loss remains underexplored. Traditional models treat sea ice and AIS as separate systems, which limits their ability to capture localized linkages and cascading feedback. To overcome this, we propose Spatial-Link, a novel graph-based framework that quantifies spatial heterogeneity to capture linkages between sea ice retreat and AIS melt. Our method constructs a spatial graph using Delaunay triangulation of satellite-derived ice change matrices, where nodes represent regions of significant change and edges encode proximity and directional consistency. We extract and statistically validate linkage paths using breadth-first search and Monte Carlo simulations. Results reveal non-local, spatially heterogeneous coupling patterns, suggesting sea ice loss can initiate or amplify downstream AIS melt. Our analysis shows how sea ice retreat evolves over an oceanic grid and progresses toward ice shelves, establishing a direct linkage. To our knowledge, this is the first proposed methodology linking sea ice retreat to AIS melt. Spatial-Link offers a scalable, data-driven tool to improve sea-level rise projections and inform climate adaptation strategies.

cross Advances in Intelligent Hearing Aids: Deep Learning Approaches to Selective Noise Cancellation

Authors: Haris Khan, Shumaila Asif, Hassan Nasir

Abstract: The integration of artificial intelligence into hearing assistance marks a paradigm shift from traditional amplification-based systems to intelligent, context-aware audio processing. This systematic literature review evaluates advances in AI-driven selective noise cancellation (SNC) for hearing aids, highlighting technological evolution, implementation challenges, and future research directions. We synthesize findings across deep learning architectures, hardware deployment strategies, clinical validation studies, and user-centric design. The review traces progress from early machine learning models to state-of-the-art deep networks, including Convolutional Recurrent Networks for real-time inference and Transformer-based architectures for high-accuracy separation. Key findings include significant gains over traditional methods, with recent models achieving up to 18.3 dB SI-SDR improvement on noisy-reverberant benchmarks, alongside sub-10 ms real-time implementations and promising clinical outcomes. Yet, challenges remain in bridging lab-grade models with real-world deployment - particularly around power constraints, environmental variability, and personalization. Identified research gaps include hardware-software co-design, standardized evaluation protocols, and regulatory considerations for AI-enhanced hearing devices. Future work must prioritize lightweight models, continual learning, contextual-based classification and clinical translation to realize transformative hearing solutions for millions globally.
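For readers unfamiliar with the SI-SDR figure quoted above, the sketch below computes the scale-invariant signal-to-distortion ratio using its standard definition: project the estimate onto the reference, treat the residual as distortion, and report the energy ratio in dB. The function name `si_sdr` and the plain-list signal representation are illustrative choices, not taken from the review.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant SDR in dB between an estimate and a clean reference."""
    # Optimal scaling of the reference that best explains the estimate.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy

    # Split the estimate into a scaled-target part and a distortion part.
    target = [alpha * r for r in reference]
    distortion = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    if distortion_energy == 0.0:
        return math.inf  # estimate is an exact scaled copy of the reference
    return 10.0 * math.log10(target_energy / distortion_energy)
```

An "SI-SDR improvement" is then the difference between this value for the processed output and for the unprocessed noisy input, both measured against the same clean reference. Because of the projection step, rescaling the estimate leaves the score unchanged.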

cross A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering

Authors: Shahana Yasmin Chowdhury, Bithi Banik, Md Tamjidul Hoque, Shreya Banerjee

Abstract: Nowadays, speech emotion recognition (SER) plays a vital role in the field of human-computer interaction (HCI) and the evolution of artificial intelligence (AI). Our proposed DCRF-BiLSTM model recognizes seven emotions (neutral, happy, sad, angry, fear, disgust, and surprise) and is trained on five datasets: RAVDESS (R), TESS (T), SAVEE (S), EmoDB (E), and CREMA-D (C). The model achieves high accuracy on individual datasets, including 97.83% on RAVDESS, 97.02% on SAVEE, 95.10% on CREMA-D, and a perfect 100% on both TESS and EmoDB. For the combined (R+T+S) datasets, it achieves 98.82% accuracy, outperforming previously reported results. To our knowledge, no existing study has evaluated a single SER model across all five benchmark datasets (i.e., R+T+S+C+E) simultaneously. In our work, we introduce this comprehensive combination and achieve a remarkable overall accuracy of 93.76%. These results confirm the robustness and generalizability of our DCRF-BiLSTM framework across diverse datasets.

cross Comparative Analysis of CNN and Transformer Architectures with Heart Cycle Normalization for Automated Phonocardiogram Classification

Authors: Martin Sondermann, Pinar Bisgin, Niklas Tschorn, Anja Burmann, Christoph M. Friedrich

Abstract: The automated classification of phonocardiogram (PCG) recordings represents a substantial advancement in cardiovascular diagnostics. This paper presents a systematic comparison of four distinct models for heart murmur detection: two specialized convolutional neural networks (CNNs) and two zero-shot universal audio transformers (BEATs), evaluated using fixed-length and heart cycle normalization approaches. Utilizing the PhysioNet2022 dataset, a custom heart cycle normalization method tailored to individual cardiac rhythms is introduced. The findings indicate the following AUROC values: the CNN model with fixed-length windowing achieves 79.5%, the CNN model with heart cycle normalization scores 75.4%, the BEATs transformer with fixed-length windowing achieves 65.7%, and the BEATs transformer with heart cycle normalization results in 70.1%. The findings indicate that physiological signal constraints, especially those introduced by different normalization strategies, have a substantial impact on model performance. The research provides evidence-based guidelines for architecture selection in clinical settings, emphasizing the need for a balance between accuracy and computational efficiency. Although specialized CNNs demonstrate superior performance overall, the zero-shot transformer models may offer promising efficiency advantages during development, such as faster training and evaluation cycles, despite their lower classification accuracy. These findings highlight the potential of automated classification systems to enhance cardiac diagnostics and improve patient care.

cross DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

Authors: Shreyas Vinaya Sathyanarayana, Rahil Shah, Sharanabasava D. Hiremath, Rishikesh Panda, Rahul Jana, Riya Singh, Rida Irfan, Ashwin Murali, Bharath Ramsundar

Abstract: Retrosynthesis, the identification of precursor molecules for a target compound, is pivotal for synthesizing complex molecules, but faces challenges in discovering novel pathways beyond predefined templates. Recent large language model (LLM) approaches to retrosynthesis have shown promise, but how to harness LLM reasoning capabilities effectively for multi-step planning remains an open question. To address this challenge, we introduce DeepRetro, an open-source, iterative, hybrid LLM-based retrosynthetic framework. Our approach integrates the strengths of conventional template-based/Monte Carlo tree search tools with the generative power of LLMs in a step-wise, feedback-driven loop. Initially, synthesis planning is attempted with a template-based engine. If this fails, the LLM subsequently proposes single-step retrosynthetic disconnections. Crucially, these suggestions undergo rigorous validity, stability, and hallucination checks before the resulting precursors are recursively fed back into the pipeline for further evaluation. This iterative refinement allows for dynamic pathway exploration and correction. We demonstrate the potential of this pipeline through benchmark evaluations and case studies, showcasing its ability to identify viable and potentially novel retrosynthetic routes. In particular, we develop an interactive graphical user interface that allows expert human chemists to provide human-in-the-loop feedback to the reasoning algorithm. This approach successfully generates novel pathways for complex natural product compounds, demonstrating the potential for iterative LLM reasoning to advance the state of the art in complex chemical syntheses.
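The template-first, LLM-fallback loop described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the DeepRetro implementation: the callables `template_step`, `llm_step`, `passes_checks`, and `is_building_block`, the `max_depth` cutoff, and the dictionary route format are all hypothetical stand-ins, and the template engine is reduced to a single-step lookup.

```python
from typing import Callable, Dict, List, Optional

Route = Dict[str, List[str]]  # molecule -> precursors chosen for it


def plan_route(
    target: str,
    template_step: Callable[[str], Optional[List[str]]],
    llm_step: Callable[[str], List[List[str]]],
    passes_checks: Callable[[str, List[str]], bool],
    is_building_block: Callable[[str], bool],
    max_depth: int = 5,
) -> Optional[Route]:
    """Return a route from `target` down to building blocks, or None."""
    if is_building_block(target):
        return {}  # nothing left to disconnect (assumed stopping rule)
    if max_depth == 0:
        return None

    # Step 1: try the template-based engine first.
    candidates: List[List[str]] = []
    templated = template_step(target)
    if templated is not None:
        candidates.append(templated)
    else:
        # Step 2: fall back to LLM proposals, keeping only those that
        # pass the validity/stability/hallucination checks.
        candidates.extend(
            p for p in llm_step(target) if passes_checks(target, p)
        )

    # Step 3: feed the precursors of each surviving candidate back in.
    for precursors in candidates:
        route: Route = {target: list(precursors)}
        for mol in precursors:
            sub = plan_route(mol, template_step, llm_step, passes_checks,
                             is_building_block, max_depth - 1)
            if sub is None:
                break  # this candidate is a dead end; try the next one
            route.update(sub)
        else:
            return route
    return None
```

In this sketch a target whose template disconnection later dead-ends is not retried with the LLM; a fuller pipeline could add that retry, along with the human-in-the-loop feedback the abstract mentions.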

cross Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach

Authors: Adrian S. Roman, Iran R. Roman, Juan P. Bello

Abstract: Acoustic mapping techniques have long been used in spatial audio processing for direction of arrival estimation (DoAE). Traditional beamforming methods for acoustic mapping, while interpretable, often rely on iterative solvers that can be computationally intensive and sensitive to acoustic variability. On the other hand, recent supervised deep learning approaches offer feedforward speed and robustness but require large labeled datasets and lack interpretability. Despite their strengths, both methods struggle to consistently generalize across diverse acoustic setups and array configurations, limiting their broader applicability. We introduce the Latent Acoustic Mapping (LAM) model, a self-supervised framework that bridges the interpretability of traditional methods with the adaptability and efficiency of deep learning methods. LAM generates high-resolution acoustic maps, adapts to varying acoustic conditions, and operates efficiently across different microphone arrays. We assess its robustness on DoAE using the LOCATA and STARSS benchmarks. LAM achieves comparable or superior localization performance to existing supervised methods. Additionally, we show that LAM's acoustic maps can serve as effective features for supervised models, further enhancing DoAE accuracy and underscoring its potential to advance adaptive, high-performance sound localization systems.

cross An AI Approach for Learning the Spectrum of the Laplace-Beltrami Operator

Authors: Yulin An, Enrique del Castillo

Abstract: The spectrum of the Laplace-Beltrami (LB) operator is central in geometric deep learning tasks, capturing intrinsic properties of the shape of the object under consideration. The best established method for its estimation, from a triangulated mesh of the object, is based on the Finite Element Method (FEM), and computes the top k LB eigenvalues with a complexity of O(Nk), where N is the number of points. This can render the FEM method inefficient when repeatedly applied to databases of CAD mechanical parts, or in quality control applications where part metrology is acquired as large meshes and decisions about the quality of each part are needed quickly and frequently. As a solution to this problem, we present a geometric deep learning framework to predict the LB spectrum efficiently given the CAD mesh of a part, achieving significant computational savings without sacrificing accuracy, demonstrating that the LB spectrum is learnable. The proposed Graph Neural Network architecture uses a rich set of part mesh features - including Gaussian curvature, mean curvature, and principal curvatures. In addition to our trained network, we make available, for repeatability, a large curated dataset of real-world mechanical CAD models derived from the publicly available ABC dataset used for training and testing. Experimental results show that our method reduces computation time of the LB spectrum by approximately 5 times over linear FEM while delivering competitive accuracy.

replace A Survey on Event Prediction Methods from a Systems Perspective: Bringing Together Disparate Research Areas

Authors: Janik-Vasily Benzin, Stefanie Rinderle-Ma

Abstract: Event prediction is the ability to anticipate future events, i.e., future real-world occurrences, and aims to support the user in deciding on actions that change future events towards a desired state. An event prediction method learns the relation between features of past events and future events. It is applied to newly observed events to predict corresponding future events that are evaluated with respect to the user's desired future state. If the predicted future events do not comply with this state, actions are taken towards achieving desirable future states. Evidently, event prediction is valuable in many application domains such as business and natural disasters. The diversity of application domains results in a diverse range of methods that are scattered across various research areas which, in turn, use different terminology for event prediction methods. Consequently, sharing methods and knowledge for developing future event prediction methods is restricted. To facilitate knowledge sharing on account of a comprehensive integration and assessment of event prediction methods, we take a systems perspective to integrate event prediction methods into a single system, elicit requirements, and assess existing work with respect to the requirements. Based on the assessment, we identify open challenges and discuss future research directions.

replace Stepwise functional refoundation of relational concept analysis

Authors: J\'er\^ome Euzenat (MOEX)

Abstract: Relational concept analysis (RCA) is an extension of formal concept analysis that allows dealing with several related contexts simultaneously. It has been designed for learning description logic theories from data and used within various applications. A puzzling observation about RCA is that it returns a single family of concept lattices although, when the data feature circular dependencies, other solutions may be considered acceptable. The semantics of RCA, provided in an operational way, does not shed light on this issue. In this report, we define these acceptable solutions as those families of concept lattices which belong to the space determined by the initial contexts (well-formed), cannot scale new attributes (saturated), and refer only to concepts of the family (self-supported). We adopt a functional view on the RCA process by defining the space of well-formed solutions and two functions on that space: one expansive and the other contractive. We show that the acceptable solutions are the common fixed points of both functions. This is achieved step-by-step by starting from a minimal version of RCA that considers only one single context defined on a space of contexts and a space of lattices. These spaces are then joined into a single space of context-lattice pairs, which is further extended to a space of indexed families of context-lattice pairs representing the objects manipulated by RCA. We show that RCA returns the least element of the set of acceptable solutions. In addition, it is possible to build dually an operation that generates its greatest element. The set of acceptable solutions is a complete sublattice of the interval between these two elements. Its structure and how the defined functions traverse it are studied in detail.

replace OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Authors: Jian Hu, Xibin Wu, Wei Shen, Jason Klein Liu, Zilin Zhu, Weixun Wang, Songlin Jiang, Haoran Wang, Hao Chen, Bin Chen, Weikai Fang, Xianyu, Yu Cao, Haotian Xu, Yiming Liu

Abstract: Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of human-AI values and further raise the upper bound of AI capabilities, particularly in reasoning-intensive, long-context Chain-of-Thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks commonly face challenges such as inference bottlenecks and complexity barriers, restricting their accessibility for newcomers. To bridge this gap, we introduce OpenRLHF, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to facilitate entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training efficiency with speedups ranging from 1.22x to 1.68x across different model sizes compared to state-of-the-art frameworks, while requiring significantly fewer lines of code for implementation. OpenRLHF is publicly available at https://github.com/OpenRLHF/OpenRLHF, and has already been adopted by leading institutions to accelerate RLHF research and learning.

URLs: https://github.com/OpenRLHF/OpenRLHF

replace Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Authors: Yilun Hao, Yang Zhang, Chuchu Fan

Abstract: While large language models (LLMs) have recently demonstrated strong potential in solving planning problems, there is a trade-off between flexibility and complexity. LLMs, as zero-shot planners themselves, are still not capable of directly generating valid plans for complex planning problems such as multi-constraint or long-horizon tasks. On the other hand, many frameworks aiming to solve complex planning problems often rely on task-specific preparatory efforts, such as task-specific in-context examples and pre-defined critics/verifiers, which limits their cross-task generalization capability. In this paper, we tackle these challenges by observing that the core of many planning problems lies in optimization problems: searching for the optimal solution (best plan) with goals subject to constraints (preconditions and effects of decisions). With LLMs' commonsense, reasoning, and programming capabilities, this opens up the possibility of a universal LLM-based approach to planning problems. Inspired by this observation, we propose LLMFP, a general-purpose framework that leverages LLMs to capture key information from planning problems and formally formulate and solve them as optimization problems from scratch, with no task-specific examples needed. We apply LLMFP to 9 planning problems, ranging from multi-constraint decision making to multi-step planning problems, and demonstrate that LLMFP achieves on average 83.7% and 86.8% optimal rate across 9 tasks for GPT-4o and Claude 3.5 Sonnet, significantly outperforming the best baseline (direct planning with OpenAI o1-preview) with 37.6% and 40.7% improvements. We also validate components of LLMFP with ablation experiments and analyze the underlying reasons for success and failure. Project page: https://sites.google.com/view/llmfp.

URLs: https://sites.google.com/view/llmfp

replace Can adversarial attacks by large language models be attributed?

Authors: Manuel Cebrian, Andres Abeliuk, Jan Arne Telle

Abstract: Attributing outputs from Large Language Models (LLMs) in adversarial settings, such as cyberattacks and disinformation campaigns, presents significant challenges that are likely to grow in importance. We approach this attribution problem from both a theoretical and an empirical perspective, drawing on formal language theory (identification in the limit) and data-driven analysis of the expanding LLM ecosystem. By modeling an LLM's set of possible outputs as a formal language, we analyze whether finite samples of text can uniquely pinpoint the originating model. Our results show that, under mild assumptions of overlapping capabilities among models, certain classes of LLMs are fundamentally non-identifiable from their outputs alone. We delineate four regimes of theoretical identifiability: (1) an infinite class of deterministic (discrete) LLM languages is not identifiable (Gold's classical result from 1967); (2) an infinite class of probabilistic LLMs is also not identifiable (by extension of the deterministic case); (3) a finite class of deterministic LLMs is identifiable (consistent with Angluin's tell-tale criterion); and (4) even a finite class of probabilistic LLMs can be non-identifiable (we provide a new counterexample establishing this negative result). Complementing these theoretical insights, we quantify the explosion in the number of plausible model origins (hypothesis space) for a given output in recent years. Even under conservative assumptions (each open-source model fine-tuned on at most one new dataset), the count of distinct candidate models doubles approximately every 0.5 years, and allowing multi-dataset fine-tuning combinations yields doubling times as short as 0.28 years. This combinatorial growth, alongside the extraordinary computational cost of brute-force likelihood attribution across all models and potential users, renders exhaustive attribution infeasible in practice.
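
Worked arithmetic (illustrative, not from the paper): the doubling times quoted in the abstract can be converted into yearly growth factors, assuming plain exponential growth. The function names below are hypothetical helpers introduced only for this sketch.

```python
import math


def annual_growth_factor(doubling_time_years: float) -> float:
    """Factor by which a quantity grows in one year if it doubles
    every `doubling_time_years` (assumes pure exponential growth)."""
    return 2.0 ** (1.0 / doubling_time_years)


def years_to_grow(factor: float, doubling_time_years: float) -> float:
    """Years needed to grow by `factor` at the given doubling time."""
    return doubling_time_years * math.log2(factor)


# Doubling times taken from the abstract: 0.5 years and 0.28 years.
single_dataset = annual_growth_factor(0.5)
multi_dataset = annual_growth_factor(0.28)

print(f"0.5-year doubling  -> x{single_dataset:.2f} per year")
print(f"0.28-year doubling -> x{multi_dataset:.2f} per year")
print(f"1000x growth at 0.5-year doubling takes "
      f"{years_to_grow(1000, 0.5):.2f} years")
```

Under this assumption, a 0.5-year doubling time corresponds to a fourfold increase per year, and a 0.28-year doubling time to roughly a twelvefold increase per year.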

replace Multi-Agent Pathfinding Under Team-Connected Communication Constraint via Adaptive Path Expansion and Dynamic Leading

Authors: Hoang-Dung Bui, Erion Plaku, Gregory J. Stein

Abstract: This paper proposes a novel planning framework to handle a multi-agent pathfinding problem under a team-connected communication constraint, where all agents must have a connected communication channel to the rest of the team during their entire movements. Standard multi-agent pathfinding approaches (e.g., priority-based search) have potential in this domain but fail when neighboring configurations at start and goal differ. Their single-expansion approach -- computing each agent's path from the start to the goal in just a single expansion -- cannot reliably handle planning under communication constraints for agents as their neighbors change during navigation. Similarly, leader-follower approaches (e.g., platooning) are effective at maintaining team communication, but fixing the leader at the outset of planning can cause planning to become stuck in dense-clutter environments, limiting their practical utility. To overcome this limitation, we propose a novel two-level multi-agent pathfinding framework that integrates two techniques: adaptive path expansion, which expands agent paths to their goals in multiple stages; and dynamic leading, which enables the reselection of the leading agent during each agent path expansion whenever progress cannot be made. Simulation experiments show the efficiency of our planners, which can handle up to 25 agents across five environment types under a limited communication range constraint and up to 11-12 agents on three environment types under a line-of-sight communication constraint, exceeding a 90% success rate where baselines routinely fail.

replace FinSphere, a Real-Time Stock Analysis Agent Powered by Instruction-Tuned LLMs and Domain Tools

Authors: Shijie Han, Jingshu Zhang, Yiqing Shen, Kaiyuan Yan, Hongguang Li

Abstract: Current financial large language models (FinLLMs) struggle with two critical limitations: the absence of objective evaluation metrics to assess the quality of stock analysis reports and a lack of depth in stock analysis, which impedes their ability to generate professional-grade insights. To address these challenges, this paper introduces FinSphere, a stock analysis agent, along with three major contributions: (1) AnalyScore, a systematic evaluation framework for assessing stock analysis quality, (2) Stocksis, a dataset curated by industry experts to enhance LLMs' stock analysis capabilities, and (3) FinSphere, an AI agent that can generate high-quality stock analysis reports in response to user queries. Experiments demonstrate that FinSphere achieves superior performance compared to both general and domain-specific LLMs, as well as existing agent-based systems, even when they are enhanced with real-time data access and few-shot guidance. The integrated framework, which combines real-time data feeds, quantitative tools, and an instruction-tuned LLM, yields substantial improvements in both analytical quality and practical applicability for real-world stock analysis.

replace Hybrid Quantum-Classical Multi-Agent Pathfinding

Authors: Thore Gerlach, Loong Kuan Lee, Frédéric Barbaresco, Nico Piatkowski

Abstract: Multi-Agent Path Finding (MAPF) focuses on determining conflict-free paths for multiple agents navigating through a shared space to reach specified goal locations. This problem becomes computationally challenging, particularly when handling large numbers of agents, as frequently encountered in practical applications like coordinating autonomous vehicles. Quantum Computing (QC) is a promising candidate for overcoming such limits. However, current quantum hardware is still in its infancy and thus limited in terms of computing power and error robustness. In this work, we present the first optimal hybrid quantum-classical MAPF algorithms, which are based on branch-and-cut-and-price. QC is integrated by iteratively solving QUBO problems, based on conflict graphs. Experiments on actual quantum hardware and results on benchmark data suggest that our approach dominates previous QUBO formulations and state-of-the-art MAPF solvers.

replace ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

Authors: Shaofei Cai, Zhancun Mu, Anji Liu, Yitao Liang

Abstract: We aim to develop a goal specification method that is semantically clear, spatially sensitive, domain-agnostic, and intuitive for human users to guide agent interactions in 3D environments. Specifically, we propose a novel cross-view goal alignment framework that allows users to specify target objects using segmentation masks from their camera views rather than the agent's observations. We highlight that behavior cloning alone fails to align the agent's behavior with human intent when the human and agent camera views differ significantly. To address this, we introduce two auxiliary objectives: cross-view consistency loss and target visibility loss, which explicitly enhance the agent's spatial reasoning ability. Building on this, we develop ROCKET-2, a state-of-the-art agent trained in Minecraft, achieving a 3x to 6x improvement in inference efficiency compared to ROCKET-1. We show that ROCKET-2 can directly interpret goals from human camera views, enabling better human-agent interaction. Remarkably, ROCKET-2 demonstrates zero-shot generalization capabilities: despite being trained exclusively on the Minecraft dataset, it can adapt and generalize to other 3D environments like Doom, DMLab, and Unreal through a simple action space mapping.

replace Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

Authors: Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, Wanxiang Che

Abstract: Recent advancements in reasoning with large language models (RLLMs), such as OpenAI-O1 and DeepSeek-R1, have demonstrated their impressive capabilities in complex domains like mathematics and coding. A central factor in their success lies in the application of long chain-of-thought (Long CoT) characteristics, which enhance reasoning abilities and enable the solution of intricate problems. However, despite these developments, a comprehensive survey on Long CoT is still lacking, limiting our understanding of its distinctions from traditional short chain-of-thought (Short CoT) and complicating ongoing debates on issues like "overthinking" and "inference-time scaling." This survey seeks to fill this gap by offering a unified perspective on Long CoT. (1) We first distinguish Long CoT from Short CoT and introduce a novel taxonomy to categorize current reasoning paradigms. (2) Next, we explore the key characteristics of Long CoT: deep reasoning, extensive exploration, and feasible reflection, which enable models to handle more complex tasks and produce more efficient, coherent outcomes compared to the shallower Short CoT. (3) We then investigate key phenomena such as the emergence of Long CoT with these characteristics, including overthinking, and inference-time scaling, offering insights into how these processes manifest in practice. (4) Finally, we identify significant research gaps and highlight promising future directions, including the integration of multi-modal reasoning, efficiency improvements, and enhanced knowledge frameworks. By providing a structured overview, this survey aims to inspire future research and further the development of logical reasoning in artificial intelligence.

replace SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning

Authors: Edward Y. Chang, Longling Geng

Abstract: This paper introduces SagaLLM, a structured multi-agent architecture designed to address four foundational limitations of current LLM-based planning systems: unreliable self-validation, context loss, lack of transactional safeguards, and insufficient inter-agent coordination. While recent frameworks leverage LLMs for task decomposition and multi-agent communication, they often fail to ensure consistency, rollback, or constraint satisfaction across distributed workflows. SagaLLM bridges this gap by integrating the Saga transactional pattern with persistent memory, automated compensation, and independent validation agents. It leverages LLMs' generative reasoning to automate key tasks traditionally requiring hand-coded coordination logic, including state tracking, dependency analysis, log schema generation, and recovery orchestration. Although SagaLLM relaxes strict ACID guarantees, it ensures workflow-wide consistency and recovery through modular checkpointing and compensable execution. Empirical evaluations across planning domains demonstrate that standalone LLMs frequently violate interdependent constraints or fail to recover from disruptions. In contrast, SagaLLM achieves significant improvements in consistency, validation accuracy, and adaptive coordination under uncertainty, establishing a robust foundation for real-world, scalable LLM-based multi-agent systems.

replace Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning

Authors: Xinyi Wang, Shawn Tan, Mingyu Jin, William Yang Wang, Rameswar Panda, Yikang Shen

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks requiring complex reasoning. However, the effects of scaling on their reasoning abilities remain insufficiently understood. In this paper, we introduce a synthetic multihop reasoning environment designed to closely replicate the structure and distribution of real-world large-scale knowledge graphs. Our reasoning task involves completing missing edges in the graph, which requires advanced multi-hop reasoning and mimics real-world reasoning scenarios. To evaluate this, we pretrain language models (LMs) from scratch solely on triples from the incomplete graph and assess their ability to infer the missing edges. Interestingly, we observe that overparameterization can impair reasoning performance due to excessive memorization. We investigate different factors that affect this U-shaped loss curve, including graph structure, model size, and training steps. To predict the optimal model size for a specific knowledge graph, we find an empirical scaling that linearly maps the knowledge graph search entropy to the optimal model size. This work provides new insights into the relationship between scaling and reasoning in LLMs, shedding light on possible ways to optimize their performance for reasoning tasks.

replace AI-Driven Scholarly Peer Review via Persistent Workflow Prompting, Meta-Prompting, and Meta-Reasoning

Authors: Evgeny Markhasin

Abstract: Critical peer review of scientific manuscripts presents a significant challenge for Large Language Models (LLMs), partly due to data limitations and the complexity of expert reasoning. This report introduces Persistent Workflow Prompting (PWP), a potentially broadly applicable prompt engineering methodology designed to bridge this gap using standard LLM chat interfaces (zero-code, no APIs). We present a proof-of-concept PWP prompt for the critical analysis of experimental chemistry manuscripts, featuring a hierarchical, modular architecture (structured via Markdown) that defines detailed analysis workflows. We develop this PWP prompt through iterative application of meta-prompting techniques and meta-reasoning aimed at systematically codifying expert review workflows, including tacit knowledge. Submitted once at the start of a session, this PWP prompt equips the LLM with persistent workflows triggered by subsequent queries, guiding modern reasoning LLMs through systematic, multimodal evaluations. Demonstrations show the PWP-guided LLM identifying major methodological flaws in a test case while mitigating LLM input bias and performing complex tasks, including distinguishing claims from evidence, integrating text/photo/figure analysis to infer parameters, executing quantitative feasibility checks, comparing estimates against claims, and assessing a priori plausibility. To ensure transparency and facilitate replication, we provide full prompts, detailed demonstration analyses, and logs of interactive chats as supplementary resources. Beyond the specific application, this work offers insights into the meta-development process itself, highlighting the potential of PWP, informed by detailed workflow formalization, to enable sophisticated analysis using readily available LLMs for complex scientific tasks.

replace The end of radical concept nativism

Authors: Joshua S. Rule, Steven T. Piantadosi

Abstract: Though humans seem to be remarkable learners, arguments in cognitive science and philosophy of mind have long maintained that learning something fundamentally new is impossible. Specifically, Jerry Fodor's arguments for radical concept nativism hold that most, if not all, concepts are innate and that what many call concept learning never actually leads to the acquisition of new concepts. These arguments have deeply affected cognitive science, and many believe that the counterarguments to radical concept nativism have been either unsuccessful or only apply to a narrow class of concepts. This paper first reviews the features and limitations of prior arguments. We then identify three critical points - related to issues of expressive power, conceptual structure, and concept possession - at which the arguments in favor of radical concept nativism diverge from describing actual human cognition. We use ideas from computer science and information theory to formalize the relevant ideas in ways that are arguably more scientifically productive. We conclude that, as a result, there is an important sense in which people do indeed learn new concepts.

replace TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Authors: Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis

Abstract: Agentic AI systems, built upon large language models (LLMs) and deployed in multi-agent configurations, are redefining intelligence, autonomy, collaboration, and decision-making across enterprise and societal domains. This review presents a structured analysis of Trust, Risk, and Security Management (TRiSM) in the context of LLM-based Agentic Multi-Agent Systems (AMAS). We begin by examining the conceptual foundations of Agentic AI and highlight its architectural distinctions from traditional AI agents. We then adapt and extend the AI TRiSM framework for Agentic AI, structured around four key pillars: Explainability, ModelOps, Security, Privacy and Governance, each contextualized to the challenges of multi-agent LLM systems. A novel risk taxonomy is proposed to capture the unique threats and vulnerabilities of Agentic AI, ranging from coordination failures to prompt-based adversarial manipulation. To support practical assessment in Agentic AI works, we introduce two novel metrics: the Component Synergy Score (CSS), which quantifies the quality of inter-agent collaboration, and the Tool Utilization Efficacy (TUE), which evaluates the efficiency of tool use within agent workflows. We further discuss strategies for improving explainability in Agentic AI, as well as approaches to enhancing security and privacy through encryption, adversarial robustness, and regulatory compliance. The review concludes with a research roadmap for the responsible development and deployment of Agentic AI, outlining critical directions to align emerging systems with TRiSM principles for safe, transparent, and accountable operation.

replace SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents

Authors: Jonathan Kutasov, Yuqi Sun, Paul Colognese, Teun van der Weij, Linda Petrini, Chen Bo Calvin Zhang, John Hughes, Xiang Deng, Henry Sleight, Tyler Tracy, Buck Shlegeris, Joe Benton

Abstract: As Large Language Models (LLMs) are increasingly deployed as autonomous agents in complex and long-horizon settings, it is critical to evaluate their ability to sabotage users by pursuing hidden objectives. We study the ability of frontier LLMs to evade monitoring and achieve harmful hidden goals while completing a wide array of realistic tasks. We evaluate a broad range of frontier LLMs using SHADE (Subtle Harmful Agent Detection & Evaluation)-Arena, the first highly diverse agent evaluation dataset for sabotage and monitoring capabilities of LLM agents. SHADE-Arena consists of complex pairs of benign main tasks and harmful side objectives in complicated environments. Agents are evaluated on their ability to complete the side task without appearing suspicious to an LLM monitor. When measuring agent ability to (a) complete the main task, (b) complete the side task, and (c) avoid detection, we find that the best performing frontier models score 27% (Claude 3.7 Sonnet) and 15% (Gemini 2.5 Pro) as sabotage agents when overseen by Claude 3.6 Sonnet. For current frontier models, success on the side task relies heavily on having access to a hidden scratchpad that is not visible to the monitor. We also use SHADE-Arena to measure models' monitoring abilities, with the top monitor (Gemini 2.5 Pro) achieving an AUC of 0.87 at distinguishing benign and malign transcripts. We find that for now, models still struggle at sabotage due to failures in long-context main task execution. However, our measurements already demonstrate the difficulty of monitoring for subtle sabotage attempts, which we expect to only increase in the face of more complex and longer-horizon tasks.

replace Establishing Best Practices for Building Rigorous Agentic Benchmarks

Authors: Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun, Andy Zhang, Shu Liu, Sasha Cui, Sayash Kapoor, Shayne Longpre, Kevin Meng, Rebecca Weiss, Fazl Barez, Rahul Gupta, Jwala Dhamala, Jacob Merizian, Mario Giulianelli, Harry Coppock, Cozmin Ududec, Jasjeet Sekhon, Jacob Steinhardt, Antony Kellerman, Sarah Schwettmann, Matei Zaharia, Ion Stoica, Percy Liang, Daniel Kang

Abstract: Benchmarks are essential for quantitatively tracking progress in AI. As AI agents become increasingly capable, researchers and practitioners have introduced agentic benchmarks to evaluate agents on complex, real-world tasks. These benchmarks typically measure agent capabilities by evaluating task outcomes via specific reward designs. However, we show that many agentic benchmarks have issues in task setup or reward design. For example, SWE-bench Verified uses insufficient test cases, while TAU-bench counts empty responses as successful. Such issues can lead to under- or overestimation of agents' performance by up to 100% in relative terms. To make agentic evaluation rigorous, we introduce the Agentic Benchmark Checklist (ABC), a set of guidelines that we synthesized from our benchmark-building experience, a survey of best practices, and previously reported issues. When applied to CVE-Bench, a benchmark with a particularly complex evaluation design, ABC reduces the performance overestimation by 33%.

replace Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models

Authors: Yifan Jiang, Yibo Xue, Yukun Kang, Pin Zheng, Jian Peng, Feiran Wu, Changliang Xu

Abstract: Slide animations, such as fade-in, fly-in, and wipe, are critical for audience engagement, efficient information delivery, and vivid visual expression. However, most AI-driven slide-generation tools still lack native animation support, and existing vision-language models (VLMs) struggle with animation tasks due to the absence of public datasets and limited temporal-reasoning capabilities. To address this gap, we release the first public dataset for slide-animation modeling: 12,000 triplets of natural-language descriptions, animation JSON files, and rendered videos, collectively covering every built-in PowerPoint effect. Using this resource, we fine-tune Qwen-2.5-VL-7B with Low-Rank Adaptation (LoRA) and achieve consistent improvements over GPT-4.1 and Gemini-2.5-Pro in BLEU-4, ROUGE-L, SPICE, and our Coverage-Order-Detail Assessment (CODA) metric, which evaluates action coverage, temporal order, and detail fidelity. On a manually created test set of slides, the LoRA model increases BLEU-4 by around 60%, ROUGE-L by 30%, and shows significant improvements in CODA-detail. This demonstrates that low-rank adaptation enables reliable temporal reasoning and generalization beyond synthetic data. Overall, our dataset, LoRA-enhanced model, and CODA metric provide a rigorous benchmark and foundation for future research on VLM-based dynamic slide generation.

replace MedGellan: LLM-Generated Medical Guidance to Support Physicians

Authors: Debodeep Banerjee, Burcu Sayin, Stefano Teso, Andrea Passerini

Abstract: Medical decision-making is a critical task, where errors can result in serious, potentially life-threatening consequences. While full automation remains challenging, hybrid frameworks that combine machine intelligence with human oversight offer a practical alternative. In this paper, we present MedGellan, a lightweight, annotation-free framework that uses a Large Language Model (LLM) to generate clinical guidance from raw medical records, which is then used by a physician to predict diagnoses. MedGellan uses a Bayesian-inspired prompting strategy that respects the temporal order of clinical data. Preliminary experiments show that the guidance generated by the LLM with MedGellan improves diagnostic performance, particularly in recall and $F_1$ score.

replace Fuzzy Classification Aggregation for a Continuum of Agents

Authors: Zijun Meng

Abstract: We prove that any optimal, independent, and zero unanimous fuzzy classification aggregation function of a continuum of individual classifications of $m\ge 3$ objects into $2\le p\le m$ types must be a weighted arithmetic mean.

replace Modeling (Deontic) Modal Operators With the s(CASP) Goal-directed Predicate Answer Set Programming System

Authors: Gopal Gupta, Abhiramon Rajasekharan, Alexis R. Tudor, Elmer Salazar, Joaquín Arias

Abstract: We consider the problem of implementing deontic modal logic. We show how (deontic) modal operators can be expressed elegantly using default negation (negation-as-failure) and strong negation present in answer set programming (ASP). We propose using global constraints of ASP to represent obligations and impermissibilities of deontic modal logic. We show that our proposed representation results in the various paradoxes of deontic modal logic being elegantly resolved.

replace GTA1: GUI Test-time Scaling Agent

Authors: Yan Yang, Dongxu Li, Yutong Dai, Yuhao Yang, Ziyang Luo, Zirui Zhao, Zhiyuan Hu, Junzhe Huang, Amrita Saha, Zeyuan Chen, Ran Xu, Liyuan Pan, Caiming Xiong, Junnan Li

Abstract: Graphical user interface (GUI) agents autonomously operate across platforms (e.g., Linux) to complete tasks by interacting with visual elements. Specifically, a user instruction is decomposed into a sequence of action proposals, each corresponding to an interaction with the GUI. After each action, the agent observes the updated GUI environment to plan the next step. However, two main challenges arise: i) resolving ambiguity in task planning (i.e., the action proposal sequence), where selecting an appropriate plan is non-trivial, as many valid ones may exist; ii) accurately grounding actions in complex and high-resolution interfaces, i.e., precisely interacting with visual targets. This paper investigates the two aforementioned challenges with our GUI Test-time Scaling Agent, namely GTA1. First, to select the most appropriate action proposal, we introduce a test-time scaling method. At each step, we sample multiple candidate action proposals and leverage a judge model to evaluate and select the most suitable one. It trades off computation for better decision quality by concurrent sampling, shortening task execution steps, and improving overall performance. Second, we propose a model that achieves improved accuracy when grounding the selected action proposal to its corresponding visual elements. Our key insight is that reinforcement learning (RL) facilitates visual grounding through inherent objective alignments, rewarding successful clicks on interface elements. Experimentally, our method establishes state-of-the-art performance across diverse benchmarks. For example, GTA1-7B achieves 50.1%, 92.4%, and 67.7% accuracies on Screenspot-Pro, Screenspot-V2, and OSWorld-G, respectively. When paired with a planner applying our test-time scaling strategy, it exhibits state-of-the-art agentic performance (e.g., 45.2% task success rate on OSWorld). We open-source our code and models.
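
Illustrative sketch (not the authors' implementation): the per-step "sample several action proposals, let a judge pick one" idea described in the abstract can be outlined as below. The names `select_action`, `toy_judge`, and the callable signatures are assumptions made for this example; a real system would call a planner model and a judge model, and could sample candidates concurrently rather than in a loop.

```python
import random
from typing import Callable, List, Sequence


def select_action(
    propose: Callable[[str, random.Random], str],
    judge: Callable[[str, Sequence[str]], int],
    observation: str,
    num_candidates: int = 4,
    seed: int = 0,
) -> str:
    """Sample `num_candidates` proposals for the current observation and
    return the one whose index the judge selects."""
    rng = random.Random(seed)
    candidates: List[str] = [
        propose(observation, rng) for _ in range(num_candidates)
    ]
    best_index = judge(observation, candidates)
    return candidates[best_index]


def toy_judge(observation: str, candidates: Sequence[str]) -> int:
    """Hypothetical stand-in for a judge model: prefer the first candidate
    that clicks OK when the observation mentions OK, else the first one."""
    for index, candidate in enumerate(candidates):
        if "OK" in observation and "OK" in candidate:
            return index
    return 0


# Hypothetical proposer that replays a fixed list instead of calling a model.
fixed = iter(["scroll(down)", "click(Cancel)", "click(OK)", "type('hello')"])
chosen = select_action(
    lambda obs, rng: next(fixed), toy_judge, "dialog with OK button"
)
print(chosen)
```

The extra compute goes into producing and scoring several candidates per step; the selection itself is a single judge call over the candidate list.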

replace A Wireless Foundation Model for Multi-Task Prediction

Authors: Yucheng Sheng, Jiacheng Wang, Xingyu Zhou, Le Liang, Hao Ye, Shi Jin, Geoffrey Ye Li

Abstract: With the growing complexity and dynamics of mobile communication networks, accurately predicting key system parameters, such as channel state information (CSI), user location, and network traffic, has become essential for a wide range of physical (PHY)-layer and medium access control (MAC)-layer tasks. Although traditional deep learning (DL)-based methods have been widely applied to such prediction tasks, they often struggle to generalize across different scenarios and tasks. In response, we propose a unified foundation model for multi-task prediction in wireless networks that supports diverse prediction intervals. The proposed model enforces univariate decomposition to unify heterogeneous tasks, encodes granularity for interval awareness, and uses a causal Transformer backbone for accurate predictions. Additionally, we introduce a patch masking strategy during training to support arbitrary input lengths. After training on large-scale datasets, the proposed foundation model demonstrates strong generalization to unseen scenarios and achieves zero-shot performance on new tasks that surpasses traditional full-shot baselines.

replace FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models

Authors: Bo Pang, Yalu Ouyang, Hangfei Xu, Ziqi Jia, Panpan Li, Shengzhao Wen, Lu Wang, Shiyong Li, Yanpeng Wang

Abstract: Advancements in reasoning for large language models (LLMs) have led to significant performance improvements for LLMs in various fields such as mathematics and programming. However, research applying these advances to the financial domain, where considerable domain-specific knowledge is necessary to complete tasks, remains limited. To address this gap, we introduce FEVO (Financial Evolution), a multi-stage enhancement framework developed to enhance LLM performance in the financial domain. FEVO systematically enhances LLM performance by using continued pre-training (CPT) to expand financial domain knowledge, supervised fine-tuning (SFT) to instill structured, elaborate reasoning patterns, and reinforcement learning (RL) to further integrate the expanded financial domain knowledge with the learned structured reasoning. To ensure effective and efficient training, we leverage frontier reasoning models and rule-based filtering to curate FEVO-Train, high-quality datasets specifically designed for the different post-training phases. Using our framework, we train the FEVO series of models - C32B, S32B, R32B - from Qwen2.5-32B and evaluate them on seven benchmarks to assess financial and general capabilities, with results showing that FEVO-R32B achieves state-of-the-art performance on five financial benchmarks against much larger models as well as specialist models. More significantly, FEVO-R32B demonstrates markedly better performance than FEVO-R32B-0 (trained from Qwen2.5-32B-Instruct using only RL), thus validating the effectiveness of financial domain knowledge expansion and structured, logical reasoning distillation.

replace-cross Efficient Transfer Learning via Causal Bounds

Authors: Xueping Gong, Wei You, Jiheng Zhang

Abstract: Transfer learning seeks to accelerate sequential decision-making by leveraging offline data from related agents. However, data from heterogeneous sources that differ in observed features, distributions, or unobserved confounders often render causal effects non-identifiable and bias naive estimators. We address this by forming ambiguity sets of structural causal models defined via integral constraints on their joint densities. Optimizing any causal effect over these sets leads to generally non-convex programs whose solutions tightly bound the range of possible effects under heterogeneity or confounding. To solve these programs efficiently, we develop a hit-and-run sampler that explores the entire ambiguity set and, when paired with a local optimization oracle, produces causal bound estimates that converge almost surely to the true limits. We further accommodate estimation error by relaxing the ambiguity set and exploit the Lipschitz continuity of causal effects to establish precise error propagation guarantees. These causal bounds are then embedded into bandit algorithms via arm elimination and truncated UCB indices, yielding optimal gap-dependent and minimax regret bounds. To handle estimation error, we also develop a safe algorithm for incorporating noisy causal bounds. In the contextual-bandit setting with function approximation, our method uses causal bounds to prune both the function class and the per-context action set, achieving matching upper and lower regret bounds with only logarithmic dependence on function-class complexity. Our analysis precisely characterizes when and how causal side-information accelerates online learning, and experiments on synthetic benchmarks confirm substantial regret reductions in data-scarce or confounded regimes.
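
To make the arm-elimination and truncated-UCB idea concrete, here is a minimal sketch. It assumes each arm already comes with a causal interval (lower, upper) on its mean reward and uses a standard UCB1 exploration bonus; the function names, the bonus form, and the elimination rule are illustrative assumptions, not the authors' exact algorithm.

```python
import math


def eliminate_arms(causal_bounds):
    """Keep only arms whose causal upper bound reaches the best causal lower bound.

    causal_bounds maps arm -> (lower, upper); the rule is an illustrative assumption.
    """
    best_lower = max(lower for lower, _ in causal_bounds.values())
    return {arm for arm, (_, upper) in causal_bounds.items() if upper >= best_lower}


def truncated_ucb_index(mean, pulls, t, causal_upper):
    """UCB1-style index clipped from above by the arm's causal upper bound."""
    if pulls == 0:
        # No data yet: the causal bound is the only information available.
        return causal_upper
    bonus = math.sqrt(2.0 * math.log(t) / pulls)
    return min(mean + bonus, causal_upper)


# Hypothetical causal intervals for three arms.
bounds = {"a": (0.6, 0.9), "b": (0.1, 0.5), "c": (0.4, 0.7)}
surviving = eliminate_arms(bounds)  # arm "b" can never beat arm "a"
index_c = truncated_ucb_index(mean=0.5, pulls=1, t=10, causal_upper=bounds["c"][1])
```

With few pulls the exploration bonus is large, so the index for arm "c" is capped at its causal upper bound instead of the optimistic UCB value; this is how side information can shorten exploration.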

replace-cross Enhancing Plasticity for First Session Adaptation Continual Learning

Authors: Imad Eddine Marouf, Subhankar Roy, St\'ephane Lathuili\`ere, Enzo Tartaglione

Abstract: The integration of large pre-trained models (PTMs) into Class-Incremental Learning (CIL) has facilitated the development of computationally efficient strategies such as First-Session Adaptation (FSA), which fine-tunes the model solely on the first task while keeping it frozen for subsequent tasks. Although effective in homogeneous task sequences, these approaches struggle when faced with the heterogeneity of real-world task distributions. We introduce Plasticity-Enhanced Test-Time Adaptation in Class-Incremental Learning (PLASTIC), a method that reinstates plasticity in CIL while preserving model stability. PLASTIC leverages Test-Time Adaptation (TTA) by dynamically fine-tuning LayerNorm parameters on unlabeled test data, enabling adaptability to evolving tasks and improving robustness against data corruption. To prevent TTA-induced model divergence and maintain stable learning across tasks, we introduce a teacher-student distillation framework, ensuring that adaptation remains controlled and generalizable. Extensive experiments across multiple benchmarks demonstrate that PLASTIC consistently outperforms both conventional and state-of-the-art PTM-based CIL approaches, while also exhibiting inherent robustness to data corruptions. Code is available at: https://github.com/IemProg/PLASTIC.

URLs: https://github.com/IemProg/PLASTIC

replace-cross Geometric Constraints in Deep Learning Frameworks: A Survey

Authors: Vibhas K Vats, David J Crandall

Abstract: Stereophotogrammetry is an established technique for scene understanding. Its origins go back to at least the 1800s when people first started to investigate using photographs to measure the physical properties of the world. Since then, thousands of approaches have been explored. The classic geometric technique of Shape from Stereo is built on using geometry to define constraints on scene and camera geometry. More recent work has instead used deep learning without any attempt to explicitly model the geometry. In this survey, we explore geometry-inspired deep learning-based frameworks. We compare and contrast geometry-enforcing constraints integrated into deep learning frameworks for depth estimation and other closely related vision tasks. We present a new taxonomy for prevalent geometry-enforcing constraints used in modern deep learning frameworks. We also present insightful observations and potential future research directions.

replace-cross Semantic Augmentation in Images using Language

Authors: Sahiti Yerramilli, Jayant Sravan Tamarapalli, Tanmay Girish Kulkarni, Jonathan Francis, Eric Nyberg

Abstract: Deep Learning models are incredibly data-hungry and require very large labeled datasets for supervised learning. As a consequence, these models often suffer from overfitting, limiting their ability to generalize to real-world examples. Recent advancements in diffusion models have enabled the generation of photorealistic images based on textual inputs. Leveraging the substantial datasets used to train these diffusion models, we propose a technique to utilize generated images to augment existing datasets. This paper explores various strategies for effective data augmentation to improve the out-of-domain generalization capabilities of deep learning models.

replace-cross Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints

Authors: Lei Guo, Wei Chen, Yuxuan Sun, Bo Ai, Nikolaos Pappas, Tony Q. S. Quek

Abstract: Diffusion models have been extensively utilized in AI-generated content (AIGC) in recent years, thanks to their superior generation capabilities. Combined with semantic communications, diffusion models are used for tasks such as denoising, data reconstruction, and content generation. However, existing diffusion-based generative models do not consider the stringent bandwidth limitation, which limits their application in wireless communication. This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our designed architecture utilizes the diffusion model, where the signal transmission process through the wireless channel acts as the forward process in diffusion. To reduce bandwidth requirements, we incorporate a downsampling module and a paired upsampling module based on a variational auto-encoder with reparameterization at the receiver to ensure that the recovered features conform to the Gaussian distribution. Furthermore, we derive the loss function for our proposed system and evaluate its performance through comprehensive experiments. Our experimental results demonstrate significant improvements in pixel-level metrics such as peak signal-to-noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS). Compared with deep joint source-channel coding (DJSCC), these gains are more pronounced with respect to compression rate and SNR. We release the code at https://github.com/import-sudo/Diffusion-Driven-Semantic-Communication.

URLs: https://github.com/import-sudo/Diffusion-Driven-Semantic-Communication

replace-cross A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence

Authors: Mingyang Liu, Gabriele Farina, Asuman Ozdaglar

Abstract: Policy gradient methods have become a staple of any single-agent reinforcement learning toolbox, due to their combination of desirable properties: iterate convergence, efficient use of stochastic trajectory feedback, and theoretically-sound avoidance of importance sampling corrections. In multi-agent imperfect-information settings (extensive-form games), however, it is still unknown whether the same desiderata can be guaranteed while retaining theoretical guarantees. Instead, sound methods for extensive-form games rely on approximating \emph{counterfactual} values (as opposed to Q values), which are incompatible with policy gradient methodologies. In this paper, we investigate whether policy gradient can be safely used in two-player zero-sum imperfect-information extensive-form games (EFGs). We establish positive results, showing for the first time that a policy gradient method leads to provable best-iterate convergence to a regularized Nash equilibrium in self-play.

replace-cross CodeMirage: Hallucinations in Code Generated by Large Language Models

Authors: Vibhor Agarwal, Yulong Pei, Salwa Alamir, Xiaomo Liu

Abstract: Large Language Models (LLMs) have shown promising potential in program generation and no-code automation. However, LLMs are prone to generating hallucinations, i.e., they generate text which sounds plausible but is incorrect. Although there has been a recent surge in research on LLM hallucinations for text generation, a similar hallucination phenomenon can happen in code generation. Sometimes the generated code can have syntactical or logical errors as well as more advanced issues like security vulnerabilities, memory leaks, etc. Given the wide adoption of LLMs to enhance efficiency in code generation and development in general, it becomes imperative to investigate hallucinations in code generation. To the best of our knowledge, this is the first attempt at studying hallucinations in the code generated by LLMs. We start by introducing the code hallucination definition and a comprehensive taxonomy of code hallucination types. We propose the first benchmark CodeMirage dataset for code hallucinations. The benchmark contains 1,137 GPT-3.5 generated hallucinated code snippets for Python programming problems from two base datasets - HumanEval and MBPP. We then propose the methodology for code hallucination detection and experiment with open source LLMs such as CodeLLaMA as well as OpenAI's GPT-3.5 and GPT-4 models using a one-shot prompt. We find that GPT-4 performs the best on the HumanEval dataset and gives comparable results to the fine-tuned CodeBERT baseline on the MBPP dataset. Towards the end, we discuss various mitigation strategies for code hallucinations and conclude our work.

replace-cross Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning

Authors: Daniel Fl\"ogel, Marcos G\'omez Villafa\~ne, Joshua Ransiek, S\"oren Hohmann

Abstract: Autonomous mobile robots are increasingly used in pedestrian-rich environments where safe navigation and appropriate human interaction are crucial. While Deep Reinforcement Learning (DRL) enables socially integrated robot behavior, it remains challenging in novel or perturbed scenarios to indicate when and why the policy is uncertain. Unknown uncertainty in decision-making can lead to collisions or human discomfort and is one reason why safe and risk-aware navigation is still an open problem. This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL navigation framework for policy distribution uncertainty estimates. To this end, we incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For different types of perturbations, we compare the ability of deep ensembles and Monte-Carlo dropout (MC-dropout) to estimate the uncertainties of the policy. In uncertain decision-making situations, we propose to change the robot's social behavior to conservative collision avoidance. The results show improved training performance with ODV and dropout in PPO and reveal that the training scenario has an impact on the generalization. In addition, MC-dropout is more sensitive to perturbations and correlates the uncertainty type to the perturbation better. With the safe action selection, the robot can navigate in perturbed environments with fewer collisions.

replace-cross PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation

Authors: Yiren Liu, Pranav Sharma, Mehul Jitendra Oswal, Haijun Xia, Yun Huang

Abstract: Generating interdisciplinary research ideas requires diverse domain expertise, but access to timely feedback is often limited by the availability of experts. In this paper, we introduce PersonaFlow, a novel system designed to provide multiple perspectives by using LLMs to simulate domain-specific experts. Our user studies showed that the new design 1) increased the perceived relevance and creativity of ideated research directions, and 2) promoted users' critical thinking activities (e.g., interpretation, analysis, evaluation, inference, and self-regulation), without increasing their perceived cognitive load. Moreover, users' ability to customize expert profiles significantly improved their sense of agency, which can potentially mitigate their over-reliance on AI. This work contributes to the design of intelligent systems that augment creativity and collaboration, and provides design implications of using customizable AI-simulated personas in domains within and beyond research ideation.

replace-cross DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation

Authors: Xuewen Liu, Zhikai Li, Minhao Jiang, Mengjuan Chen, Jianquan Li, Qingyi Gu

Abstract: Model quantization is a promising method for accelerating and compressing diffusion models. Nevertheless, since post-training quantization (PTQ) fails catastrophically in low-bit cases, quantization-aware training (QAT) is essential. Unfortunately, the wide range and time-varying activations in diffusion models sharply increase the complexity of quantization, making existing QAT methods inefficient. Equivalent scaling can effectively reduce activation range, but previous methods leave the overall quantization error unchanged. More critically, these methods significantly disrupt the original weight distribution, resulting in poor weight initialization and challenging convergence during QAT training. In this paper, we propose a novel QAT framework for diffusion models, called DilateQuant. Specifically, we propose Weight Dilation (WD) that maximally dilates the unsaturated in-channel weights to a constrained range through equivalent scaling. WD decreases the activation range while preserving the original weight range, which steadily reduces the quantization error and ensures model convergence. To further enhance accuracy and efficiency, we design a Temporal Parallel Quantizer (TPQ) to address the time-varying activations and introduce a Block-wise Knowledge Distillation (BKD) to reduce resource consumption in training. Extensive experiments demonstrate that DilateQuant significantly outperforms existing methods in terms of accuracy and efficiency. Code is available at http://github.com/BienLuky/DilateQuant .

URLs: http://github.com/BienLuky/DilateQuant
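
The equivalent-scaling identity that the DilateQuant abstract relies on can be checked numerically. The sketch below is a toy illustration in plain Python, not the released implementation: it divides each input channel of the activation by a factor s[i] and multiplies the matching weight row by s[i], which leaves the layer output unchanged. Choosing s[i] as the largest factor that keeps each row inside the original overall weight range is our reading of "dilates the unsaturated in-channel weights to a constrained range" and should be treated as an assumption.

```python
def matvec(x, w):
    """Compute y[j] = sum_i x[i] * w[i][j] for a row-vector activation x."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def max_abs(values):
    return max(abs(v) for v in values)


def dilation_scales(w):
    """Per-input-channel scales that stretch each weight row up to the overall
    weight range without exceeding it (assumed rule; rows must be non-zero)."""
    w_max = max(max_abs(row) for row in w)
    return [w_max / max_abs(row) for row in w]


def equivalent_scale(x, w, s):
    """Divide activation channel i by s[i] and multiply weight row i by s[i]."""
    x_scaled = [xi / si for xi, si in zip(x, s)]
    w_scaled = [[wij * si for wij in row] for row, si in zip(w, s)]
    return x_scaled, w_scaled


# Toy layer: 3 input channels, 2 output channels.
x = [8.0, -0.5, 2.0]
w = [[0.1, -0.2], [0.4, 0.3], [-0.05, 0.25]]

s = dilation_scales(w)
x_s, w_s = equivalent_scale(x, w, s)
y, y_s = matvec(x, w), matvec(x_s, w_s)
```

In this toy case the largest activation shrinks while the largest weight magnitude stays the same and the outputs agree up to floating-point error, which mirrors the abstract's claim that the activation range decreases while the weight range is preserved.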

replace-cross Tokenization for Molecular Foundation Models

Authors: Alexius Wadell, Anoushka Bhutani, Venkatasubramanian Viswanathan

Abstract: Text-based foundation models have become an important part of scientific discovery, with molecular foundation models accelerating advancements in material science and molecular design. However, existing models are constrained by closed-vocabulary tokenizers that capture only a fraction of molecular space. In this work, we systematically evaluate 34 tokenizers, including 19 chemistry-specific ones, and reveal significant gaps in their coverage of the SMILES molecular representation. To assess the impact of tokenizer choice, we introduce n-gram language models as a low-cost proxy and validate their effectiveness by pretraining and finetuning 18 RoBERTa-style encoders for molecular property prediction. To overcome the limitations of existing tokenizers, we propose two new tokenizers -- Smirk and Smirk-GPE -- with full coverage of the OpenSMILES specification. The proposed tokenizers systematically integrate nuclear, electronic, and geometric degrees of freedom, facilitating applications in pharmacology, agriculture, biology, and energy storage. Our results highlight the need for open-vocabulary modeling and chemically diverse benchmarks in cheminformatics.

replace-cross Breaking PEFT Limitations: Leveraging Weak-to-Strong Knowledge Transfer for Backdoor Attacks in LLMs

Authors: Shuai Zhao, Leilei Gan, Zhongliang Guo, Xiaobao Wu, Yanhao Jia, Luwei Xiao, Cong-Duy Nguyen, Luu Anh Tuan

Abstract: Despite being widely applied due to their exceptional capabilities, Large Language Models (LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce targeted vulnerabilities into LLMs by poisoning training samples and full-parameter fine-tuning (FPFT). However, this kind of backdoor attack is limited since it requires significant computational resources, especially as the size of LLMs increases. Parameter-efficient fine-tuning (PEFT) offers an alternative, but the restricted parameter updating may impede the alignment of triggers with target labels. In this study, we first verify that backdoor attacks with PEFT may encounter challenges in achieving feasible performance. To address these issues and improve the effectiveness of backdoor attacks with PEFT, we propose a novel weak-to-strong backdoor attack algorithm based on Feature Alignment-enhanced Knowledge Distillation (FAKD). Specifically, we poison small-scale language models through FPFT to serve as the teacher model. The teacher model then covertly transfers the backdoor to the large-scale student model through FAKD, which employs PEFT. Theoretical analysis reveals that FAKD has the potential to augment the effectiveness of backdoor attacks. We demonstrate the superior performance of FAKD on classification tasks across four language models, four backdoor attack algorithms, and two different architectures of teacher models. Experimental results indicate success rates close to 100% for backdoor attacks targeting PEFT.

replace-cross The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot

Authors: Fangchen Song, Ashish Agarwal, Wen Wen

Abstract: Generative artificial intelligence (AI) enables automated content production, including coding in software development, which can significantly influence developer participation and performance. To explore its impact on collaborative open-source software (OSS) development, we investigate the role of GitHub Copilot, a generative AI pair programmer, in OSS development where multiple distributed developers voluntarily collaborate. Using GitHub's proprietary Copilot usage data, combined with public OSS repository data obtained from GitHub, we find that Copilot use increases project-level code contributions by 5.9%. This gain is driven by a 2.1% increase in individual code contributions and a 3.4% rise in developer coding participation. However, these benefits come at a cost as coordination time for code integration increases by 8% due to more code discussions enabled by AI pair programmers. This reveals an important tradeoff: While AI expands who can contribute and how much they contribute, it slows coordination in collective development efforts. Despite this tension, the combined effect of these two competing forces remains positive, indicating a net gain in overall project-level productivity from using AI pair programmers. Interestingly, we also find the effects differ across developer roles. Peripheral developers show relatively smaller gains in project-level code contributions and face a higher increase in coordination time than core developers, likely due to the difference in their project familiarity. In summary, our study underscores the dual role of AI pair programmers in affecting project-level code contributions and coordination time in OSS development. Our findings on the differential effects between core and peripheral developers also provide important implications for the structure of OSS communities in the long run.

replace-cross Pullback Flow Matching on Data Manifolds

Authors: Friso de Kruiff, Erik Bekkers, Ozan \"Oktem, Carola-Bibiane Sch\"onlieb, Willem Diepeveen

Abstract: We propose Pullback Flow Matching (PFM), a novel framework for generative modeling on data manifolds. Unlike existing methods that assume or learn restrictive closed-form manifold mappings for training Riemannian Flow Matching (RFM) models, PFM leverages pullback geometry and isometric learning to preserve the underlying manifold's geometry while enabling efficient generation and precise interpolation in latent space. This approach not only facilitates closed-form mappings on the data manifold but also allows for designable latent spaces, using assumed metrics on both data and latent manifolds. By enhancing isometric learning through Neural ODEs and proposing a scalable training objective, we achieve a latent space more suitable for interpolation, leading to improved manifold learning and generative performance. We demonstrate PFM's effectiveness through applications in synthetic data, protein dynamics and protein sequence data, generating novel proteins with specific properties. This method shows strong potential for drug discovery and materials science, where generating novel samples with specific properties is of great interest.

replace-cross Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity

Authors: Zhenglin Wan, Xingrui Yu, David Mark Bossens, Yueming Lyu, Qing Guo, Flint Xiaofeng Fan, Yew Soon Ong, Ivor Tsang

Abstract: Imitation learning (IL) has shown promise in robot locomotion but is often limited to learning a single expert policy, constraining behavior diversity and robustness in unpredictable real-world scenarios. To address this, we introduce Quality Diversity Inverse Reinforcement Learning (QD-IRL), a novel framework that integrates quality-diversity optimization with IRL methods, enabling agents to learn diverse behaviors from limited demonstrations. This work introduces Extrinsic Behavioral Curiosity (EBC), which allows agents to receive additional curiosity rewards from an external critic based on how novel the behaviors are with respect to a large behavioral archive. To validate the effectiveness of EBC in exploring diverse locomotion behaviors, we evaluate our method on multiple robot locomotion tasks. EBC improves the performance of QD-IRL instances with GAIL, VAIL, and DiffAIL across all included environments by up to 185%, 42%, and 150%, even surpassing expert performance by 20% in Humanoid. Furthermore, we demonstrate that EBC is applicable to Gradient-Arborescence-based Quality Diversity Reinforcement Learning (QD-RL) algorithms, where it substantially improves performance and provides a generic technique for diverse robot locomotion. The source code of this work is provided at https://github.com/vanzll/EBC.

URLs: https://github.com/vanzll/EBC
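
As a rough illustration of rewarding novelty with respect to a behavioral archive, the sketch below grants a bonus the first time a behavior descriptor lands in an unvisited archive cell. The grid discretization, the fixed bonus, and all names here are hypothetical simplifications and do not reproduce the EBC critic from the paper.

```python
def cell_of(descriptor, bins):
    """Map a behavior descriptor in [0, 1]^d to a discrete archive cell."""
    return tuple(min(int(v * bins), bins - 1) for v in descriptor)


def curiosity_bonus(archive, descriptor, bins=10, scale=1.0):
    """Return `scale` if the descriptor's cell is new to the archive, else 0.0.

    The archive (a set of visited cells) is updated in place.
    """
    cell = cell_of(descriptor, bins)
    bonus = scale if cell not in archive else 0.0
    archive.add(cell)
    return bonus


archive = set()
first = curiosity_bonus(archive, (0.12, 0.95))   # new cell, gets the bonus
repeat = curiosity_bonus(archive, (0.15, 0.99))  # same cell, no bonus
other = curiosity_bonus(archive, (1.0, 0.0))     # different cell, gets the bonus
```

In a full training loop this bonus would be added to the imitation reward, so that policies are pushed toward behaviors the archive has not yet covered.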

replace-cross Hespi: A pipeline for automatically detecting information from herbarium specimen sheets

Authors: Robert Turnbull, Emily Fitzgerald, Karen Thompson, Joanne L. Birch

Abstract: Specimen-associated biodiversity data are crucial for biological, environmental, and conservation sciences. A rate shift is needed to extract data from specimen images efficiently, moving beyond human-mediated transcription. We developed `Hespi' (HErbarium Specimen sheet PIpeline) using advanced computer vision techniques to extract pre-catalogue data from primary specimen labels on herbarium specimens. Hespi integrates two object detection models: one for detecting the components of the sheet and another for fields on the primary specimen label. It classifies labels as printed, typed, handwritten, or mixed and uses Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) for extraction. The text is then corrected against authoritative taxon databases and refined using a multimodal Large Language Model (LLM). Hespi accurately detects and extracts text from specimen sheets across international herbaria, and its modular design allows users to train and integrate custom models.

replace-cross Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM

Authors: Jiachen Li, Xiwen Li, Justin Steinberg, Akshat Choube, Bingsheng Yao, Xuhai Xu, Dakuo Wang, Elizabeth Mynatt, Varun Mishra

Abstract: Passive tracking methods, such as phone and wearable sensing, have become dominant in monitoring human behaviors in modern ubiquitous computing studies. While there have been significant advances in machine-learning approaches to translate periods of raw sensor data to model momentary behaviors (e.g., physical activity recognition), there still remains a significant gap in the translation of these sensing streams into meaningful, high-level, context-aware insights that are required for various applications (e.g., summarizing an individual's daily routine). To bridge this gap, experts often need to employ a context-driven sensemaking process in real-world studies to derive insights. This process often requires manual effort and can be challenging even for experienced researchers due to the complexity of human behaviors. We conducted three rounds of user studies with 21 experts to explore solutions to address challenges with sensemaking. We follow a human-centered design process to identify needs and design, iterate, build, and evaluate Vital Insight (VI), a novel, LLM-assisted, prototype system to enable human-in-the-loop inference (sensemaking) and visualizations of multi-modal passive sensing data from smartphones and wearables. Using the prototype as a technology probe, we observe experts' interactions with it and develop an expert sensemaking model that explains how experts move between direct data representations and AI-supported inferences to explore, question, and validate insights. Through this iterative process, we also synthesize and discuss a list of design implications for the design of future AI-augmented visualization systems to better assist experts' sensemaking processes in multi-modal health sensing data.

replace-cross CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback

Authors: Wenbo Zhang, Aditya Majumdar, Amulya Yadav

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various NLP tasks but struggle with code-mixed (or code-switched) language understanding. For example, prior work benchmarking the performance of multilingual LLMs on code-mixed translation tasks has demonstrated that current state-of-the-art multilingual LLMs are ineffective in dealing with code-mixed languages. However, the question of how to improve the capability of multilingual LLMs to handle code-mixed language has not received any attention to date. In this paper, we tackle this research gap by proposing CHAI, a novel general-purpose framework for improving the ability of multilingual LLMs to handle code-mixed languages. CHAI relies on three novel contributions made in this paper. First, we explore the ability of LLMs to provide accurate annotations for code-mixed translation tasks. Second, we leverage this ability of LLMs as annotators to generate preference data for code-mixed translation tasks at scale, which are then used within a reinforcement learning from AI feedback (RLAIF) procedure to improve LLMs' capability on code-mixed tasks. Third, we conduct a rigorous experimental evaluation across various real-world datasets and settings. Our analysis shows that CHAI-powered LLMs outperform state-of-the-art open-source LLMs by 25.66% (in terms of win rate adjudicated by human annotators) in code-mixed translation tasks. This work represents a first step towards developing more inclusive code-mixed LLMs.

replace-cross AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework

Authors: Meihao Fan, Ju Fan, Nan Tang, Lei Cao, Guoliang Li, Xiaoyong Du

Abstract: Answering natural language (NL) questions about tables, known as Tabular Question Answering (TQA), is crucial because it allows users to quickly and efficiently extract meaningful insights from structured data, effectively bridging the gap between human language and machine-readable formats. Many of these tables are derived from web sources or real-world scenarios, which require meticulous data preparation (or data prep) to ensure accurate responses. However, preparing such tables for NL questions introduces new requirements that extend beyond traditional data preparation. This question-aware data preparation involves specific tasks such as column derivation and filtering tailored to particular questions, as well as question-aware value normalization or conversion, highlighting the need for a more nuanced approach in this context. Because each of the above tasks is unique, a single model (or agent) may not perform effectively across all scenarios. In this paper, we propose AutoPrep, a large language model (LLM)-based multi-agent framework that leverages the strengths of multiple agents, each specialized in a certain type of data prep, ensuring more accurate and contextually relevant responses. Given an NL question over a table, AutoPrep performs data prep through three key components. Planner: Determines a logical plan, outlining a sequence of high-level operations. Programmer: Translates this logical plan into a physical plan by generating the corresponding low-level code. Executor: Executes the generated code to process the table. To support this multi-agent framework, we design a novel Chain-of-Clauses reasoning mechanism for high-level operation suggestion, and a tool-augmented method for low-level code generation.
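
The Planner, Programmer, and Executor roles described above can be sketched as a three-stage pipeline. In the sketch below the LLM agents are replaced by hard-coded stand-ins, and the operation names (`filter_nonnull`, `normalize_number`, `derive_ratio`), the table, and the question are all hypothetical; it only illustrates the division of labor, not AutoPrep's actual interface.

```python
def planner(question):
    """Stand-in for the Planner agent: return a logical plan of high-level ops.

    A real planner would derive this from the question; here it is fixed.
    """
    return [
        ("filter_nonnull", "revenue"),
        ("normalize_number", "revenue"),
        ("normalize_number", "units"),
        ("derive_ratio", "revenue_per_unit", "revenue", "units"),
    ]


def programmer(step):
    """Stand-in for the Programmer agent: turn one logical op into executable code."""
    op, *args = step
    if op == "filter_nonnull":
        (col,) = args
        return lambda rows: [r for r in rows if r[col] is not None]
    if op == "normalize_number":
        (col,) = args
        return lambda rows: [{**r, col: float(str(r[col]).replace(",", ""))} for r in rows]
    if op == "derive_ratio":
        new_col, num, den = args
        return lambda rows: [{**r, new_col: r[num] / r[den]} for r in rows]
    raise ValueError(f"unknown operation: {op}")


def executor(rows, programs):
    """Stand-in for the Executor: apply the generated programs in order."""
    for program in programs:
        rows = program(rows)
    return rows


table = [
    {"product": "A", "revenue": "1,200", "units": "3"},
    {"product": "B", "revenue": None, "units": "5"},
    {"product": "C", "revenue": "900", "units": "2"},
]
plan = planner("Which product has the highest revenue per unit?")
prepared = executor(table, [programmer(step) for step in plan])
best = max(prepared, key=lambda r: r["revenue_per_unit"])["product"]
```

Separating the logical plan from the generated code means each stage can be inspected or swapped independently, which is the motivation the abstract gives for using specialized agents.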

replace-cross Scaling 4D Representations

Authors: Jo\~ao Carreira, Dilara Gokay, Michael King, Chuhan Zhang, Ignacio Rocco, Aravindh Mahendran, Thomas Albert Keck, Joseph Heyward, Skanda Koppula, Etienne Pot, Goker Erdogan, Yana Hasson, Yi Yang, Klaus Greff, Guillaume Le Moing, Sjoerd van Steenkiste, Daniel Zoran, Drew A. Hudson, Pedro V\'elez, Luisa Polan\'ia, Luke Friedman, Chris Duvarney, Ross Goroshin, Kelsey Allen, Jacob Walker, Rishabh Kabra, Eric Aboussouan, Jennifer Sun, Thomas Kipf, Carl Doersch, Viorica P\u{a}tr\u{a}ucean, Dima Damen, Pauline Luc, Mehdi S. M. Sajjadi, Andrew Zisserman

Abstract: Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks – action classification, ImageNet classification, etc. In this paper we focus on evaluating self-supervised learning on non-semantic vision tasks that are more spatial (3D) and temporal (+1D = 4D), such as camera pose estimation, point and object tracking, and depth estimation. We show that by learning from very large video datasets, masked auto-encoding (MAE) with transformer video models actually scales, consistently improving performance on these 4D tasks, as model size increases from 20M all the way to by far the largest reported self-supervised video model – 22B parameters. Rigorous apples-to-apples comparison with many recent image and video models demonstrates the benefits of scaling 4D representations. Pretrained models are available at https://github.com/google-deepmind/representations4d .

URLs: https://github.com/google-deepmind/representations4d

replace-cross Theme-Explanation Structure for Table Summarization using Large Language Models: A Case Study on Korean Tabular Data

Authors: TaeYoon Kwack, Jisoo Kim, Ki Yong Jung, DongGeon Lee, Heesun Park

Abstract: Tables are a primary medium for conveying critical information in administrative domains, yet their complexity hinders utilization by Large Language Models (LLMs). This paper introduces the Theme-Explanation Structure-based Table Summarization (Tabular-TX) pipeline, a novel approach designed to generate highly interpretable summaries from tabular data, with a specific focus on Korean administrative documents. Current table summarization methods often neglect the crucial aspect of human-friendly output. Tabular-TX addresses this by first employing a multi-step reasoning process to ensure deep table comprehension by LLMs, followed by a journalist persona prompting strategy for clear sentence generation. Crucially, it then structures the output into a Theme Part (an adverbial phrase) and an Explanation Part (a predicative clause), significantly enhancing readability. Our approach leverages in-context learning, obviating the need for extensive fine-tuning and associated labeled data or computational resources. Experimental results show that Tabular-TX effectively processes complex table structures and metadata, offering a robust and efficient solution for generating human-centric table summaries, especially in low-resource scenarios.

replace-cross Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Authors: Xin Xu, Yan Xu, Tianhao Chen, Yuchen Yan, Chengwu Liu, Zaoyu Chen, Yufei Wang, Yichun Yin, Yasheng Wang, Lifeng Shang, Qun Liu

Abstract: Existing approaches to mathematical reasoning with large language models (LLMs) rely on Chain-of-Thought (CoT) for generalizability or Tool-Integrated Reasoning (TIR) for precise computation. While efforts have been made to combine these methods, they primarily rely on post-selection or predefined strategies, leaving an open question: whether LLMs can autonomously adapt their reasoning strategy based on their inherent capabilities. In this work, we propose TATA (Teaching LLMs According to Their Aptitude), an adaptive framework that enables LLMs to personalize their reasoning strategy spontaneously, aligning it with their intrinsic aptitude. TATA incorporates base-LLM-aware data selection during supervised fine-tuning (SFT) to tailor training data to the model's unique abilities. This approach equips LLMs to autonomously determine and apply the appropriate reasoning strategy at test time. We evaluate TATA through extensive experiments on six mathematical reasoning benchmarks, using both general-purpose and math-specialized LLMs. Empirical results demonstrate that TATA effectively combines the complementary strengths of CoT and TIR, achieving superior or comparable performance with improved inference efficiency compared to TIR alone. Further analysis underscores the critical role of aptitude-aware data selection in enabling LLMs to make effective and adaptive reasoning decisions and align reasoning strategies with model capabilities.

replace-cross Multi-Attribute Steering of Language Models via Targeted Intervention

Authors: Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Mohit Bansal

Abstract: Inference-time intervention (ITI) has emerged as a promising method for steering large language model (LLM) behavior in a particular direction (e.g., improving helpfulness) by intervening on token representations without costly updates to the LLM's parameters. However, existing ITI approaches fail to scale to multi-attribute settings with conflicts, such as enhancing helpfulness while also reducing toxicity. To address this, we introduce Multi-Attribute Targeted Steering (MAT-Steer), a novel steering framework designed for selective token-level intervention across multiple attributes. MAT-Steer learns steering vectors using an alignment objective that shifts the model's internal representations of undesirable outputs closer to those of desirable ones while enforcing sparsity and orthogonality among vectors for different attributes, thereby reducing inter-attribute conflicts. We evaluate MAT-Steer in two distinct settings: (i) on question answering (QA) tasks where we balance attributes like truthfulness, bias, and toxicity; (ii) on generative tasks where we simultaneously improve attributes like helpfulness, correctness, and coherence. MAT-Steer outperforms existing ITI and parameter-efficient fine-tuning approaches across both task types (e.g., 3% average accuracy gain across QA tasks and 55.82% win rate against the best ITI baseline).

replace-cross Understanding Fixed Predictions via Confined Regions

Authors: Connor Lawless, Tsui-Wei Weng, Berk Ustun, Madeleine Udell

Abstract: Machine learning models can assign fixed predictions that preclude individuals from changing their outcome. Existing approaches to audit fixed predictions do so on a pointwise basis, which requires access to an existing dataset of individuals and may fail to anticipate fixed predictions in out-of-sample data. This work presents a new paradigm to identify fixed predictions by finding confined regions of the feature space in which all individuals receive fixed predictions. This paradigm enables the certification of recourse for out-of-sample data, works in settings without representative datasets, and provides interpretable descriptions of individuals with fixed predictions. We develop a fast method to discover confined regions for linear classifiers using mixed-integer quadratically constrained programming. We conduct a comprehensive empirical study of confined regions across diverse applications. Our results highlight that existing pointwise verification methods fail to anticipate future individuals with fixed predictions, while our method both identifies them and provides an interpretable description.

replace-cross Oscillation-Reduced MXFP4 Training for Vision Transformers

Authors: Yuxiang Chen, Haocheng Xi, Jun Zhu, Jianfei Chen

Abstract: Pre-training Transformers in FP4 precision is becoming a promising approach to gain substantial speedup, but it comes with a considerable loss of accuracy. The Microscaling (MX) data format provides a fine-grained per-group quantization method to improve the representation ability of the FP4 format and is supported by the next-generation Blackwell GPU architecture. However, training with the MXFP4 data format still results in significant degradation, and the cause has not been studied systematically. In this work, we propose a novel training method, TetraJet, for more accurate FP4 training. We comprehensively evaluate all of the quantizers involved in the training, and identify the weight oscillation problem in the forward pass as the main source of the degradation in MXFP4 training. Therefore, we introduce two novel methods, EMA Quantizer (Q-EMA) and Adaptive Ramping Optimizer (Q-Ramping), to resolve the oscillation problem. Extensive experiments on Vision Transformers demonstrate that TetraJet consistently outperforms the existing 4-bit training methods, and Q-EMA & Q-Ramping can provide additional enhancement by effectively reducing oscillation. We decreased the accuracy degradation by more than 50% compared to the baseline, and can even achieve competitive performance compared to full precision training. The code is available at https://github.com/thu-ml/TetraJet-MXFP4Training

URLs: https://github.com/thu-ml/TetraJet-MXFP4Training
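The per-group quantization that the abstract attributes to the MX format can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the TetraJet method: it assumes groups of 32 values, a shared power-of-two scale per group, and the eight FP4 (E2M1) magnitudes listed in `FP4_GRID`. The function names `quantize_group` and `quantize` are hypothetical, and ties are broken toward the smaller magnitude rather than by a hardware rounding rule.

```python
import math
from typing import List

# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_group(xs: List[float]) -> List[float]:
    """Fake-quantize one group with a shared power-of-two scale."""
    amax = max((abs(x) for x in xs), default=0.0)
    if amax == 0.0:
        return [0.0] * len(xs)
    # Shared scale: the largest magnitude lands in [4, 8) before rounding,
    # so anything above 6 saturates to the top grid value.
    scale = 2.0 ** (math.floor(math.log2(amax)) - 2)
    out = []
    for x in xs:
        # Round the scaled magnitude to the nearest grid point.
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


def quantize(xs: List[float], group_size: int = 32) -> List[float]:
    """Apply group-wise fake quantization over a flat list of values."""
    out: List[float] = []
    for start in range(0, len(xs), group_size):
        out.extend(quantize_group(xs[start:start + group_size]))
    return out


example = quantize_group([6.0, 1.0, -0.4])
```

Because each group carries its own scale, one large value only coarsens the resolution of its own group rather than that of the whole tensor.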

replace-cross Towards Enterprise-Ready Computer Using Generalist Agent

Authors: Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov, Ido Levy, Offer Akrabi, Aviad Sela, Asaf Adi, Nir Mashkif

Abstract: This paper presents our ongoing work toward developing an enterprise-ready Computer Using Generalist Agent (CUGA) system. Our research highlights the evolutionary nature of building agentic systems suitable for enterprise environments. By integrating state-of-the-art agentic AI techniques with a systematic approach to iterative evaluation, analysis, and refinement, we have achieved rapid and cost-effective performance gains, notably reaching a new state-of-the-art performance on the WebArena and AppWorld benchmarks. We detail our development roadmap, the methodology and tools that facilitated rapid learning from failures and continuous system refinement, and discuss key lessons learned and future challenges for enterprise adoption.

replace-cross GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification

Authors: Aarush Sinha

Abstract: Integrating powerful but computationally expensive Pre-trained Language Models (PLMs) with Graph Neural Networks (GNNs) is a key challenge, especially on text-rich heterophilic graphs. We propose the Graph Masked Language Model (GMLM), a framework designed for the efficient and effective fusion of graph structure and text semantics. GMLM employs a two-stage process: first, a contrastive pre-training stage with a novel soft masking technique builds a robust multi-scale GNN; second, an end-to-end fine-tuning stage uses a dynamic active node selection strategy for scalability and a bi-directional cross-attention module for deep fusion. Experiments on five heterophilic benchmarks show GMLM achieves state-of-the-art results on four, significantly outperforming prior GNN and large LLM-based methods. For instance, it improves accuracy on the Texas dataset by over 8% and on Wisconsin by nearly 5%. Our work demonstrates that a sophisticated, deeply-integrated architecture can be more effective and efficient than larger, general-purpose models for text-rich graph representation learning.

replace-cross DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability

Authors: Xirui Hu, Jiahao Wang, Hao Chen, Weizhan Zhang, Benqi Wang, Yikun Li, Haishun Nan

Abstract: Recent advancements in text-to-image generation have spurred interest in personalized human image generation, which aims to create novel images featuring specific human identities as reference images indicate. Although existing methods achieve high-fidelity identity preservation, they often struggle with limited multi-ID usability and inadequate facial editability. We present DynamicID, a tuning-free framework supported by a dual-stage training paradigm that inherently facilitates both single-ID and multi-ID personalized generation with high fidelity and flexible facial editability. Our key innovations include: 1) Semantic-Activated Attention (SAA), which employs query-level activation gating to minimize disruption to the original model when injecting ID features and achieve multi-ID personalization without requiring multi-ID samples during training. 2) Identity-Motion Reconfigurator (IMR), which leverages contrastive learning to effectively disentangle and re-entangle facial motion and identity features, thereby enabling flexible facial editing. Additionally, we have developed a curated VariFace-10k facial dataset, comprising 10k unique individuals, each represented by 35 distinct facial images. Experimental results demonstrate that DynamicID outperforms state-of-the-art methods in identity fidelity, facial editability, and multi-ID personalization capability.

replace-cross UniF$^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

Authors: Junzhe Li, Xuerui Qiu, Linrui Xu, Liya Guo, Delin Qu, Tingting Long, Chun Fan, Ming Li

Abstract: Unified multimodal models (UMMs) have emerged as a powerful paradigm in foundational computer vision research, demonstrating significant potential in both image understanding and generation. However, existing research in the face domain primarily focuses on coarse facial attribute understanding, with limited capacity to handle fine-grained facial attributes and without addressing generation capabilities. To overcome these limitations, we propose UniF$^2$ace, the first UMM tailored specifically for fine-grained face understanding and generation. In general, we train UniF$^2$ace on a self-constructed, specialized dataset utilizing two mutually beneficial diffusion techniques and a two-level mixture-of-experts architecture. Specifically, we first build a large-scale facial dataset, UniF$^2$ace-130K, which contains 130K image-text pairs with one million question-answering pairs that span a wide range of facial attributes. Second, we establish a theoretical connection between discrete diffusion score matching and masked generative models, optimizing both evidence lower bounds simultaneously, which significantly improves the model's ability to synthesize facial details. Finally, we introduce both token-level and sequence-level mixture-of-experts, enabling efficient fine-grained representation learning for both understanding and generation tasks. Extensive experiments on UniF$^2$ace-130K demonstrate that UniF$^2$ace outperforms existing UMMs and generative models, achieving superior performance across both understanding and generation tasks.

replace-cross Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts

Authors: Hongyu Chen, Seraphina Goldfarb-Tarrant

Abstract: Large Language Models (LLMs) are increasingly employed as automated evaluators to assess the safety of generated content, yet their reliability in this role remains uncertain. This study evaluates a diverse set of 11 LLM judge models across critical safety domains, examining three key aspects: self-consistency in repeated judging tasks, alignment with human judgments, and susceptibility to input artifacts such as apologetic or verbose phrasing. Our findings reveal that biases in LLM judges can significantly distort the final verdict on which content source is safer, undermining the validity of comparative evaluations. Notably, apologetic language artifacts alone can skew evaluator preferences by up to 98%. Contrary to expectations, larger models do not consistently exhibit greater robustness, while smaller models sometimes show higher resistance to specific artifacts. To mitigate LLM evaluator robustness issues, we investigate jury-based evaluations aggregating decisions from multiple models. Although this approach both improves robustness and enhances alignment to human judgments, artifact sensitivity persists even with the best jury configurations. These results highlight the urgent need for diversified, artifact-resistant methodologies to ensure reliable safety assessments.
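The jury-based evaluation mentioned in the abstract aggregates verdicts from several judge models. The sketch below shows one possible aggregation rule, plurality voting with an explicit tie outcome. The abstract does not state which rule the authors use, so the rule, the function name `jury_verdict`, and the labels "A" and "B" are assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def jury_verdict(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by the most judges, or None if first place is tied."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    # A tie between the two most frequent labels means no strict plurality.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Each vote names the content source a judge considered safer.
verdict = jury_verdict(["A", "B", "A"])
tie = jury_verdict(["A", "B"])
```

Returning `None` on a tie keeps the ambiguity visible instead of silently favouring whichever label happened to be counted first.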

replace-cross Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models

Authors: Zhihua Tian, Sirun Nan, Ming Xu, Shengfang Zhai, Wenjie Qu, Jian Liu, Ruoxi Jia, Jiaheng Zhang

Abstract: Text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images but also raise concerns about generating harmful or misleading content. While extensive approaches have been proposed to erase unwanted concepts without requiring retraining from scratch, they inadvertently degrade performance on normal generation tasks. In this work, we propose Interpret then Deactivate (ItD), a novel framework to enable precise concept removal in T2I diffusion models while preserving overall performance. ItD first employs a sparse autoencoder (SAE) to interpret each concept as a combination of multiple features. By permanently deactivating the specific features associated with target concepts, we repurpose SAE as a zero-shot classifier that identifies whether the input prompt includes target concepts, allowing selective concept erasure in diffusion models. Moreover, we demonstrate that ItD can be easily extended to erase multiple concepts without requiring further training. Comprehensive experiments across celebrity identities, artistic styles, and explicit content demonstrate ItD's effectiveness in eliminating targeted concepts without interfering with normal concept generation. Additionally, ItD is robust against adversarial prompts designed to circumvent content filters. Code is available at: https://github.com/NANSirun/Interpret-then-deactivate.

URLs: https://github.com/NANSirun/Interpret-then-deactivate

replace-cross Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents

Authors: Atharv Singh Patlan, Peiyao Sheng, S. Ashwin Hebbar, Prateek Mittal, Pramod Viswanath

Abstract: AI agents integrated with Web3 offer autonomy and openness but raise security concerns as they interact with financial protocols and immutable smart contracts. This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios. We introduce the concept of context manipulation -- a comprehensive attack vector that exploits unprotected context surfaces, including input channels, memory modules, and external data feeds. It expands on traditional prompt injection and reveals a more stealthy and persistent threat: memory injection. Using ElizaOS, a representative decentralized AI agent framework for automated Web3 operations, we showcase that malicious injections into prompts or historical records can trigger unauthorized asset transfers and protocol violations which could be financially devastating in reality. To quantify these risks, we introduce CrAIBench, a Web3-focused benchmark covering 150+ realistic blockchain tasks, such as token transfers, trading, bridges, and cross-chain interactions, and 500+ attack test cases using context manipulation. Our evaluation results confirm that AI models are significantly more vulnerable to memory injection compared to prompt injection. Finally, we evaluate a comprehensive defense roadmap, finding that prompt-injection defenses and detectors only provide limited protection when stored context is corrupted, whereas fine-tuning-based defenses substantially reduce attack success rates while preserving performance on single-step tasks. These results underscore the urgent need for AI agents that are both secure and fiduciarily responsible in blockchain environments.

replace-cross Substance over Style: Evaluating Proactive Conversational Coaching Agents

Authors: Vidya Srinivas, Xuhai Xu, Xin Liu, Kumar Ayush, Isaac Galatzer-Levy, Shwetak Patel, Daniel McDuff, Tim Althoff

Abstract: While NLP research has made strides in conversational tasks, many approaches focus on single-turn responses with well-defined objectives or evaluation criteria. In contrast, coaching presents unique challenges with initially undefined goals that evolve through multi-turn interactions, subjective evaluation criteria, and mixed-initiative dialogue. In this work, we describe and implement five multi-turn coaching agents that exhibit distinct conversational styles, and evaluate them through a user study, collecting first-person feedback on 155 conversations. We find that users highly value core functionality, and that stylistic components in the absence of core components are viewed negatively. By comparing user feedback with third-person evaluations from health experts and an LM, we reveal significant misalignment across evaluation approaches. Our findings provide insights into design and evaluation of conversational coaching agents and contribute toward improving human-centered NLP applications.

replace-cross Adaptive Elicitation of Latent Information Using Natural Language

Authors: Jimmy Wang, Thomas Zollo, Richard Zemel, Hongseok Namkoong

Abstract: Eliciting information to reduce uncertainty about a latent entity is a critical task in many application domains, e.g., assessing individual student learning outcomes, diagnosing underlying diseases, or learning user preferences. Though natural language is a powerful medium for this purpose, large language models (LLMs) and existing fine-tuning algorithms lack mechanisms for strategically gathering information to refine their own understanding of the latent entity. To harness the generalization power and world knowledge of LLMs in developing effective information-gathering strategies, we propose an adaptive elicitation framework that actively reduces uncertainty on the latent entity. Since probabilistic modeling of an abstract latent entity is difficult, our framework adopts a predictive view of uncertainty, using a meta-learned language model to simulate future observations and enable scalable uncertainty quantification over complex natural language. Through autoregressive forward simulation, our model quantifies how new questions reduce epistemic uncertainty, enabling the development of sophisticated information-gathering strategies to choose the most informative next queries. In experiments on the 20 questions game, dynamic opinion polling, and adaptive student assessment, our method consistently outperforms baselines in identifying critical unknowns and improving downstream predictions, illustrating the promise of strategic information gathering in natural language settings.

replace-cross EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning

Authors: Lingxiao Kong, Cong Yang, Susanne Neufang, Oya Deniz Beyan, Zeyd Boukhers

Abstract: Recent advances in reinforcement learning (RL) for large language model (LLM) fine-tuning show promise in addressing multi-objective tasks but still face significant challenges, including competing objective balancing, low training efficiency, poor scalability, and limited explainability. Leveraging ensemble learning principles, we introduce an Ensemble Multi-Objective RL (EMORL) framework that fine-tunes multiple models with individual objectives while optimizing their aggregation after the fine-tuning to improve efficiency and flexibility. Our method is the first to aggregate the hidden states of individual models, incorporating contextual information from multiple objectives. This approach is supported by a hierarchical grid search algorithm that identifies optimal weighted combinations. We evaluate EMORL on counselor reflection generation tasks, using text classification models to score the generations and provide rewards during RL fine-tuning. Through comprehensive experiments on the PAIR and Psych8k datasets, we demonstrate the advantages of EMORL against existing baselines: significantly lower and more stable training consumption ($17,529\pm 1,650$ data points and $6,573\pm 147.43$ seconds), improved scalability and explainability, and comparable performance across multiple objectives.

replace-cross Humanoid World Models: Open World Foundation Models for Humanoid Robotics

Authors: Muhammad Qasim Ali, Aditya Sridhar, Shahbuland Matiana, Alex Wong, Mohammad Al-Sharman

Abstract: Humanoid robots, with their human-like form, are uniquely suited for interacting in environments built for people. However, enabling humanoids to reason, plan, and act in complex open-world settings remains a challenge. World models, models that predict the future outcome of a given action, can support these capabilities by serving as a dynamics model in long-horizon planning and generating synthetic data for policy learning. We introduce Humanoid World Models (HWM), a family of lightweight, open-source models that forecast future egocentric video conditioned on humanoid control tokens. We train two types of generative models, Masked Transformers and Flow-Matching, on 100 hours of humanoid demonstrations. Additionally, we explore architectural variants with different attention mechanisms and parameter-sharing strategies. Our parameter-sharing techniques reduce model size by 33-53% with minimal impact on performance or visual fidelity. HWMs are designed to be trained and deployed in practical academic and small-lab settings, such as 1-2 GPUs.

replace-cross Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons

Authors: Isik Baran Sandan, Tu Anh Dinh, Jan Niehues

Abstract: Large Language Models (LLMs) have been shown to be effective evaluators across various domains such as machine translation or the scientific domain. Current LLM-as-a-Judge approaches rely mostly on individual assessments or a single round of pairwise assessments, preventing the judge LLM from developing a global ranking perspective. To address this, we present Knockout Assessment, an LLM-as-a-Judge method using a knockout tournament system with iterative pairwise comparisons. Experiments across three LLMs on two datasets show that knockout assessment improves scoring accuracy, increasing Pearson correlation with expert evaluations by 0.07 on average for university-level exam scoring and machine translation evaluations, aligning LLM assessments more closely with human scoring.
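The knockout tournament described in the abstract can be sketched as a single-elimination bracket. This is a minimal illustration and not the authors' implementation: the function `knockout_rank`, the `judge` callable, and the use of the number of comparisons won as a coarse score are all assumptions made for the example. In practice `judge` would wrap a pairwise LLM comparison; here a toy rule stands in for it.

```python
from typing import Callable, Dict, List


def knockout_rank(candidates: List[str],
                  judge: Callable[[str, str], str]) -> Dict[str, int]:
    """Run a single-elimination tournament over unique candidates.

    Returns, for each candidate, the number of pairwise comparisons it won.
    `judge(a, b)` is a hypothetical stand-in for a pairwise LLM comparison
    and must return either `a` or `b`.
    """
    wins = {c: 0 for c in candidates}
    remaining = list(candidates)
    while len(remaining) > 1:
        next_round = []
        # Pair neighbours; an unpaired last candidate advances with a bye.
        for i in range(0, len(remaining) - 1, 2):
            winner = judge(remaining[i], remaining[i + 1])
            wins[winner] += 1
            next_round.append(winner)
        if len(remaining) % 2 == 1:
            next_round.append(remaining[-1])
        remaining = next_round
    return wins


# Toy judge (purely illustrative): prefer the longer answer.
answers = ["a", "bbb", "cc", "dddd", "e"]
scores = knockout_rank(answers, lambda x, y: x if len(x) >= len(y) else y)
```

A bracket over n candidates needs n - 1 comparisons, and stronger candidates are compared repeatedly against other winners, which is one way a judge can build up the broader ranking perspective the abstract refers to.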

replace-cross hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation

Authors: Charles Hong, Brendan Roberts, Huijae An, Alex Um, Advay Ratan, Yakun Sophia Shao

Abstract: Large language models (LLMs) are playing an increasingly large role in domains such as code generation, including hardware code generation, where Verilog is the key language. However, the amount of publicly available Verilog code pales in comparison to the amount of code available for software languages like Python. In this work, we present hdl2v ("HDL-to-Verilog"), a dataset which seeks to increase the amount of available human-written Verilog data by translating or compiling three other hardware description languages - VHDL, Chisel, and PyMTL3 - to Verilog. Furthermore, we demonstrate the value of hdl2v in enhancing LLM Verilog generation by improving performance of a 32 billion-parameter open-weight model by up to 23% (pass@10) in VerilogEvalV2, without utilizing any data augmentation or knowledge distillation from larger models. We also show hdl2v's ability to boost the performance of a data augmentation-based fine-tuning approach by 63%. Finally, we characterize and analyze our dataset to better understand which characteristics of HDL-to-Verilog datasets can be expanded upon in future work for even better performance.

replace-cross Saffron-1: Safety Inference Scaling

Authors: Ruizhong Qiu, Gaotang Li, Tianxin Wei, Jingrui He, Hanghang Tong

Abstract: Existing safety assurance research has primarily focused on training-phase alignment to instill safe behaviors into LLMs. However, recent studies have exposed these methods' susceptibility to diverse jailbreak attacks. Concurrently, inference scaling has significantly advanced LLM reasoning capabilities but remains unexplored in the context of safety assurance. Addressing this gap, our work pioneers inference scaling for robust and effective LLM safety against emerging threats. We reveal that conventional inference scaling techniques, despite their success in reasoning tasks, perform poorly in safety contexts, even falling short of basic approaches like Best-of-N Sampling. We attribute this inefficiency to a newly identified challenge, the exploration-efficiency dilemma, arising from the high computational overhead associated with frequent process reward model (PRM) evaluations. To overcome this dilemma, we propose SAFFRON, a novel inference scaling paradigm tailored explicitly for safety assurance. Central to our approach is the introduction of a multifurcation reward model (MRM) that significantly reduces the required number of reward model evaluations. To operationalize this paradigm, we further propose: (i) a partial supervision training objective for MRM, (ii) a conservative exploration constraint to prevent out-of-distribution explorations, and (iii) a Trie-based key-value caching strategy that facilitates cache sharing across sequences during tree search. Extensive experiments validate the effectiveness of our method. Additionally, we publicly release our trained multifurcation reward model (Saffron-1) and the accompanying token-level safety reward dataset (Safety4M) to accelerate future research in LLM safety. Our code, model, and data are publicly available at https://github.com/q-rz/saffron , and our project homepage is at https://q-rz.github.io/p/saffron .

URLs: https://github.com/q-rz/saffron, https://q-rz.github.io/p/saffron
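The idea of sharing cache entries across sequences through a trie, named in item (iii) of the abstract, can be sketched as follows. This is a generic prefix-trie illustration under stated assumptions, not the SAFFRON implementation: the classes `TrieNode` and `PrefixCache` are hypothetical, and a string placeholder stands in for the per-token key/value tensors a real system would store.

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self, cache: Optional[str] = None) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.cache = cache  # placeholder for this token's key/value tensors


class PrefixCache:
    """Stores one cache entry per distinct token prefix."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.computed = 0  # number of cache entries actually created

    def insert(self, tokens: List[int]) -> int:
        """Add a sequence; return how many leading tokens were already cached."""
        node, reused = self.root, 0
        for depth, tok in enumerate(tokens):
            child = node.children.get(tok)
            if child is None:
                # New branch: this token's entry must be computed.
                child = TrieNode(cache=f"kv@{depth}:{tok}")
                node.children[tok] = child
                self.computed += 1
            else:
                # Shared prefix: reuse the existing entry.
                reused += 1
            node = child
        return reused


cache = PrefixCache()
first = cache.insert([1, 2, 3])
second = cache.insert([1, 2, 4, 5])
```

In a tree search, sibling candidates differ only in their last few tokens, so most of each sequence falls on an already-cached path.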

replace-cross QUITE: A Query Rewrite System Beyond Rules with LLM Agents

Authors: Yuyang Song, Hanxu Yan, Jiale Lao, Yibo Wang, Yufei Li, Yuanchun Zhou, Jianguo Wang, Mingjie Tang

Abstract: Query rewrite transforms SQL queries into semantically equivalent forms that run more efficiently. Existing approaches mainly rely on predefined rewrite rules, but they handle a limited subset of queries and can cause performance regressions. This limitation stems from three challenges of rule-based query rewrite: (1) it is hard to discover and verify new rules, (2) fixed rewrite rules do not generalize to new query patterns, and (3) some rewrite techniques cannot be expressed as fixed rules. Motivated by the fact that human experts exhibit significantly better rewrite ability but suffer from scalability, and Large Language Models (LLMs) have demonstrated nearly human-level semantic and reasoning abilities, we propose a new approach of using LLMs to rewrite SQL queries beyond rules. Due to the hallucination problems in LLMs, directly applying LLMs often leads to nonequivalent and suboptimal queries. To address this issue, we propose QUITE (query rewrite), a training-free and feedback-aware system based on LLM agents that rewrites SQL queries into semantically equivalent forms with significantly better performance, covering a broader range of query patterns and rewrite strategies compared to rule-based methods. Firstly, we design a multi-agent framework controlled by a finite state machine (FSM) to equip LLMs with the ability to use external tools and enhance the rewrite process with real-time database feedback. Secondly, we develop a rewrite middleware to enhance the ability of LLMs to generate optimized query equivalents. Finally, we employ a novel hint injection technique to improve execution plans for rewritten queries. Extensive experiments show that QUITE reduces query execution time by up to 35.8% over state-of-the-art approaches and produces 24.1% more rewrites than prior methods, covering query cases that earlier systems did not handle.

replace-cross ConTextTab: A Semantics-Aware Tabular In-Context Learner

Authors: Marco Spinaci, Marek Polewczyk, Maximilian Schambach, Sam Thelin

Abstract: Tabular in-context learning (ICL) has recently achieved state-of-the-art (SOTA) performance on several tabular prediction tasks. Previously restricted to classification problems on small tables, recent advances such as TabPFN and TabICL have extended its use to larger datasets. While being architecturally efficient and well-adapted to tabular data structures, current table-native ICL architectures, being trained exclusively on synthetic data, do not fully leverage the rich semantics and world knowledge contained in real-world tabular data. On another end of this spectrum, tabular ICL models based on pretrained large language models such as TabuLa-8B integrate deep semantic understanding and world knowledge but are only able to make use of a small amount of context due to inherent architectural limitations. With the aim to combine the best of both these worlds, we introduce ConTextTab, integrating semantic understanding and alignment into a table-native ICL framework. By employing specialized embeddings for different data modalities and by training on large-scale real-world tabular data, our model is competitive with SOTA across a broad set of benchmarks while setting a new standard on the semantically rich CARTE benchmark. Code and checkpoints are available at https://github.com/SAP-samples/contexttab

URLs: https://github.com/SAP-samples/contexttab

replace-cross Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions

Authors: Kun Zhang, Le Wu, Kui Yu, Guangyi Lv, Dacao Zhang

Abstract: Large Language Models (LLMs) have gained enormous attention in recent years due to their capability of understanding and generating natural languages. With the rapid development and wide-range applications (e.g., Agents, Embodied Intelligence), the robustness of LLMs has received increased attention. As the core brain of many AI applications, the robustness of LLMs requires that models should not only generate consistent contents, but also ensure the correctness and stability of generated content when dealing with unexpected application scenarios (e.g., toxic prompts, limited noise domain data, out-of-distribution (OOD) applications, etc.). In this survey paper, we conduct a thorough review of the robustness of LLMs, aiming to provide a comprehensive terminology of concepts and methods around this field and facilitate the community. Specifically, we first give a formal definition of LLM robustness and present the collection protocol of this survey paper. Then, based on the types of perturbed inputs, we organize this survey from the following perspectives: 1) Adversarial Robustness: tackling the problem that prompts are manipulated intentionally, such as noise prompts, long context, data attack, etc.; 2) OOD Robustness: dealing with the unexpected real-world application scenarios, such as OOD detection, zero-shot transferring, hallucinations, etc.; 3) Evaluation of Robustness: summarizing the new evaluation datasets, metrics, and tools for verifying the robustness of LLMs. After reviewing the representative work from each perspective, we discuss and highlight future opportunities and research directions in this field. Meanwhile, we also organize related works and provide an easy-to-search project (https://github.com/zhangkunzk/Awesome-LLM-Robustness-papers) to support the community.

URLs: https://github.com/zhangkunzk/Awesome-LLM-Robustness-papers

replace-cross MedSyn: Enhancing Diagnostics with Human-AI Collaboration

Authors: Burcu Sayin, Ipek Baris Schlicht, Ngoc Vo Hong, Sara Allievi, Jacopo Staiano, Pasquale Minervini, Andrea Passerini

Abstract: Clinical decision-making is inherently complex, often influenced by cognitive biases, incomplete information, and case ambiguity. Large Language Models (LLMs) have shown promise as tools for supporting clinical decision-making, yet their typical one-shot or limited-interaction usage may overlook the complexities of real-world medical practice. In this work, we propose a hybrid human-AI framework, MedSyn, where physicians and LLMs engage in multi-step, interactive dialogues to refine diagnoses and treatment decisions. Unlike static decision-support tools, MedSyn enables dynamic exchanges, allowing physicians to challenge LLM suggestions while the LLM highlights alternative perspectives. Through simulated physician-LLM interactions, we assess the potential of open-source LLMs as physician assistants. Results show open-source LLMs are promising as physician assistants in the real world. Future work will involve real physician interactions to further validate MedSyn's usefulness in diagnostic accuracy and patient outcomes.

replace-cross LLM Agent for Hyper-Parameter Optimization

Authors: Wanzhe Wang, Jianqiu Peng, Menghao Hu, Weihuang Zhong, Tong Zhang, Shuai Wang, Yixin Zhang, Mingjie Shao, Wanli Ni

Abstract: Hyper-parameters are critical for the performance of communication algorithms. However, current hyper-parameter optimization approaches for the Warm-Start Particles Swarm Optimization with Crossover and Mutation (WS-PSO-CM) algorithm, designed for radio map-enabled unmanned aerial vehicle (UAV) trajectory and communication, are primarily heuristic-based, exhibiting low levels of automation and improvable performance. In this paper, we design a Large Language Model (LLM) agent for automatic hyper-parameter tuning, where an iterative framework and Model Context Protocol (MCP) are applied. In particular, the LLM agent is first set up via a profile, which specifies the boundary of hyper-parameters, task objective, terminal condition, conservative or aggressive strategy of optimizing hyper-parameters, and LLM configurations. Then, the LLM agent iteratively invokes the WS-PSO-CM algorithm for exploration. Finally, the LLM agent exits the loop based on the terminal condition and returns an optimized set of hyper-parameters. Our experimental results show that the minimal sum-rate achieved by hyper-parameters generated via our LLM agent is significantly higher than those by both human heuristics and random generation methods. This indicates that an LLM agent with PSO and WS-PSO-CM algorithm knowledge is useful in seeking high-performance hyper-parameters.
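
The profile, iterate, and exit steps described in this abstract can be sketched as a generic tuning loop. This is only an illustration under stated assumptions: the `Profile` fields, the `propose` and `evaluate` stand-ins, and the parameter name `inertia` are hypothetical and not taken from the paper. In a real setup `propose` would be an LLM call and `evaluate` a WS-PSO-CM run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
History = List[Tuple[Params, float]]


@dataclass
class Profile:
    bounds: Dict[str, Tuple[float, float]]  # allowed range per hyper-parameter
    max_iterations: int                     # terminal condition: iteration budget
    target_score: float                     # terminal condition: good-enough score


def clip(params: Params, bounds: Dict[str, Tuple[float, float]]) -> Params:
    """Force every proposed value back inside the profile's bounds."""
    return {k: min(max(v, bounds[k][0]), bounds[k][1]) for k, v in params.items()}


def tune(profile: Profile,
         propose: Callable[[History], Params],
         evaluate: Callable[[Params], float]) -> Tuple[Params, float]:
    """Propose, evaluate, and record until a terminal condition is met."""
    history: History = []
    for _ in range(profile.max_iterations):
        params = clip(propose(history), profile.bounds)
        score = evaluate(params)
        history.append((params, score))
        if score >= profile.target_score:
            break
    # Return the best (params, score) pair seen so far.
    return max(history, key=lambda item: item[1])


# Hypothetical stand-in for the LLM: steps the value up on every call.
def propose(history: History) -> Params:
    return {"inertia": 0.3 * (len(history) + 1)}


# Hypothetical stand-in for the optimizer run: score peaks at inertia = 0.9.
def evaluate(params: Params) -> float:
    return 1.0 - abs(params["inertia"] - 0.9)


profile = Profile(bounds={"inertia": (0.0, 1.0)}, max_iterations=5, target_score=0.99)
best_params, best_score = tune(profile, propose, evaluate)
print(best_params, best_score)
```

With these stand-ins the loop stops on the third proposal, once the score passes the target, rather than using the full iteration budget.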

replace-cross SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications

Authors: Jinyang Li, Xiaolong Li, Ge Qu, Per Jacobsson, Bowen Qin, Binyuan Hui, Shuzheng Si, Nan Huo, Xiaohan Xu, Yue Zhang, Ziwei Tang, Yuanshuai Li, Florensia Widjaja, Xintong Zhu, Feige Zhou, Yongfeng Huang, Yannis Papakonstantinou, Fatma Ozcan, Chenhao Ma, Reynold Cheng

Abstract: Resolution of complex SQL issues persists as a significant bottleneck in real-world database applications. Current Large Language Models (LLMs), while adept at text-to-SQL translation, have not been rigorously evaluated on the more challenging task of debugging SQL issues. To address this gap, we introduce BIRD-CRITIC, a new SQL issue debugging benchmark comprising 530 PostgreSQL tasks (BIRD-CRITIC-PG) and 570 multi-dialect tasks (BIRD-CRITIC-Multi), distilled from authentic user issues and replayed within new environments to facilitate rigorous evaluation. Baseline evaluations underscore the task's complexity, with the leading reasoning model O3-Mini achieving only a 38.87% success rate on BIRD-CRITIC-PG and 33.33% on BIRD-CRITIC-Multi. Meanwhile, advancing open-source models for database tasks is crucial for empowering local development while safeguarding data privacy. Therefore, we present Six-Gym (Sql-fIX-Gym), a training environment for elevating open-source model capabilities for SQL issue debugging. This environment leverages the SQL-Rewind strategy, which automatically generates executable issue-solution datasets by reverse-engineering issues from verified SQLs. However, popular trajectory-based fine-tuning methods do not explore substantial supervisory signals. We further propose f-Plan Boosting, which extracts high-level debugging plans from SQL solutions, enabling teacher LLMs to produce 73.7% more successful trajectories for training. We integrate these components into an open-source agent, Bird-Fixer. Based on Qwen-2.5-Coder-14B, Bird-Fixer achieves a 38.11% success rate on BIRD-CRITIC-PG and 29.65% on BIRD-CRITIC-Multi, surpassing leading proprietary models such as Claude-3.7-Sonnet and GPT-4.1, marking a significant step toward democratizing sophisticated SQL-debugging capabilities. The leaderboard and source code are available at https://bird-critic.github.io/

URLs: https://bird-critic.github.io/

replace-cross DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE

Authors: Hang Shao, Heting Gao, Yunhang Shen, Jiawei Chen, Lijiang Li, Zuwei Long, Bo Tong, Ke Li, Xing Sun

Abstract: Native multimodal large language models (MLLMs) restructure a single large language model (LLM) into a spoken language model (SLM) capable of both speech and text generation. Compared to modular and aligned MLLMs, native MLLMs preserve richer paralinguistic features such as emotion and prosody, and generate speech responses directly within the backbone LLM rather than using a separate speech decoder. This integration also results in lower response latency and smoother interaction. However, native MLLMs suffer from catastrophic forgetting and performance degradation because the available paired speech-text data is insufficient to support the pretraining of MLLMs compared to the vast amount of text data required to pretrain text LLMs. To address this issue, we propose DeepTalk, a framework for adaptive modality expert learning based on a Mixture of Experts (MoE) architecture. DeepTalk first adaptively distinguishes modality experts according to their modality load within the LLM. Each modality expert then undergoes specialized single-modality training, followed by joint multimodal collaborative training. As a result, DeepTalk incurs only a 5.5% performance drop compared to the original LLM, which is significantly lower than the average performance drop of over 20% typically seen in native MLLMs (such as GLM-4-Voice), and is on par with modular MLLMs. Meanwhile, the end-to-end dialogue latency remains within 0.5 seconds, ensuring a seamless and intelligent speech interaction experience. Code and models are released at https://github.com/talkking/DeepTalk.

URLs: https://github.com/talkking/DeepTalk

replace-cross XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

Authors: Yitian Gong, Luozhijie Jin, Ruifan Deng, Dong Zhang, Xin Zhang, Qinyuan Cheng, Zhaoye Fei, Shimin Li, Xipeng Qiu

Abstract: Speech codecs serve as bridges between speech signals and large language models. An ideal codec for speech language models should not only preserve acoustic information but also capture rich semantic information. However, existing speech codecs struggle to balance high-quality audio reconstruction with ease of modeling by language models. In this study, we analyze the limitations of previous codecs in balancing semantic richness and acoustic fidelity. We propose XY-Tokenizer, a novel codec that mitigates the conflict between semantic and acoustic capabilities through multi-stage, multi-task learning. Experimental results demonstrate that XY-Tokenizer achieves performance in both semantic and acoustic tasks comparable to that of state-of-the-art codecs operating at similar bitrates, even though those existing codecs typically excel in only one aspect. Specifically, XY-Tokenizer achieves strong text alignment, surpassing distillation-based semantic modeling methods such as SpeechTokenizer and Mimi, while maintaining a speaker similarity score of 0.83 between reconstructed and original audio. The reconstruction performance of XY-Tokenizer is comparable to that of BigCodec, the current state-of-the-art among acoustic-only codecs, which achieves a speaker similarity score of 0.84 at a similar bitrate. Code and models are available at https://github.com/gyt1145028706/XY-Tokenizer.

URLs: https://github.com/gyt1145028706/XY-Tokenizer

replace-cross Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation

Authors: Hongyi Pan, Ziliang Hong, Gorkem Durak, Ziyue Xu, Ulas Bagci

Abstract: Federated learning (FL) has emerged as a promising paradigm for collaboratively training deep learning models across institutions without exchanging sensitive medical data. However, its effectiveness is often hindered by limited data availability and non-independent and identically distributed (non-IID) data across participating clients, which can degrade model performance and generalization. To address these challenges, we propose a generative-AI-based data augmentation framework that integrates synthetic image sharing into the federated training process for breast cancer diagnosis via ultrasound images. Specifically, we train two simple class-specific Deep Convolutional Generative Adversarial Networks: one for benign and one for malignant lesions. We then simulate a realistic FL setting using three publicly available breast ultrasound image datasets: BUSI, BUS-BRA, and UDIAT. FedAvg and FedProx are adopted as baseline FL algorithms. Experimental results show that incorporating a suitable number of synthetic images improved the average AUC from 0.9206 to 0.9237 for FedAvg and from 0.9429 to 0.9538 for FedProx. We also note that excessive use of synthetic data reduced performance, underscoring the importance of maintaining a balanced ratio of real and synthetic samples. Our findings highlight the potential of generative-AI-based data augmentation to enhance FL results in the breast ultrasound image classification task.
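
The abstract names FedAvg as a baseline. As a minimal sketch of its server-side aggregation step, assuming each client reports a flat list of weights together with its local sample count (the function name `fedavg` and the toy numbers are illustrative, not taken from the paper):

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameters, weighting each client by its sample count."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Three hypothetical clients with unequal amounts of local data.
global_weights = fedavg(
    client_weights=[[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]],
    client_sizes=[100, 200, 100],
)
print(global_weights)  # prints [3.0, 2.0]
```

Because the average is weighted by sample count, adding synthetic images to a client would also change that client's influence on the global model, which is one reason the real-to-synthetic ratio may matter.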

replace-cross PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection

Authors: Xiao Li, Yiming Zhu, Yifan Huang, Wei Zhang, Yingzhe He, Jie Shi, Xiaolin Hu

Abstract: Object detection plays a crucial role in many security-sensitive applications. However, several recent studies have shown that object detectors can be easily fooled by physically realizable attacks, e.g., adversarial patches and recent adversarial textures, which pose realistic and urgent threats. Adversarial Training (AT) has been recognized as the most effective defense against adversarial attacks. While AT has been extensively studied in the $l_\infty$ attack settings on classification models, AT against physically realizable attacks on object detectors has received limited exploration. Early attempts are only performed to defend against adversarial patches, leaving AT against a wider range of physically realizable attacks under-explored. In this work, we consider defending against various physically realizable attacks with a unified AT method. We propose PBCAT, a novel Patch-Based Composite Adversarial Training strategy. PBCAT optimizes the model by incorporating the combination of small-area gradient-guided adversarial patches and imperceptible global adversarial perturbations covering the entire image. With these designs, PBCAT has the potential to defend against not only adversarial patches but also unseen physically realizable attacks such as adversarial textures. Extensive experiments in multiple settings demonstrated that PBCAT significantly improved robustness against various physically realizable attacks over state-of-the-art defense methods. Notably, it improved the detection accuracy by 29.7% over previous defense methods under one recent adversarial texture attack.

replace-cross Autonomy by Design: Preserving Human Autonomy in AI Decision-Support

Authors: Stefan Buijsman, Sarah E. Carter, Juan Pablo Bermúdez

Abstract: AI systems increasingly support human decision-making across domains of professional, skill-based, and personal activity. While previous work has examined how AI might affect human autonomy globally, the effects of AI on domain-specific autonomy -- the capacity for self-governed action within defined realms of skill or expertise -- remain understudied. We analyze how AI decision-support systems affect two key components of domain-specific autonomy: skilled competence (the ability to make informed judgments within one's domain) and authentic value-formation (the capacity to form genuine domain-relevant values and preferences). By engaging with prior investigations and analyzing empirical cases across medical, financial, and educational domains, we demonstrate how the absence of reliable failure indicators and the potential for unconscious value shifts can erode domain-specific autonomy both immediately and over time. We then develop a constructive framework for autonomy-preserving AI support systems. We propose specific socio-technical design patterns -- including careful role specification, implementation of defeater mechanisms, and support for reflective practice -- that can help maintain domain-specific autonomy while leveraging AI capabilities. This framework provides concrete guidance for developing AI systems that enhance rather than diminish human agency within specialized domains of action.

replace-cross Generating Heterogeneous Multi-dimensional Data: A Comparative Study

Authors: Michael Corbeau, Emmanuelle Claeys, Mathieu Serrurier, Pascale Zaraté

Abstract: Allocation of personnel and material resources is highly sensitive in the case of firefighter interventions. This allocation relies on simulations to experiment with various scenarios. The main objective of this allocation is the global optimization of the firefighters' response. Data generation is then mandatory to study various scenarios. In this study, we propose to compare different data generation methods. Methods such as Random Sampling, Tabular Variational Autoencoders, standard Generative Adversarial Networks, Conditional Tabular Generative Adversarial Networks and Diffusion Probabilistic Models are examined to ascertain their efficacy in capturing the intricacies of firefighter interventions. Traditional evaluation metrics often fall short in capturing the nuanced requirements of synthetic datasets for real-world scenarios. To address this gap, an evaluation of synthetic data quality is conducted using a combination of domain-specific metrics tailored to the firefighting domain and standard measures such as the Wasserstein distance. Domain-specific metrics include response time distribution, spatial-temporal distribution of interventions, and accident representation. These metrics are designed to assess data variability, the preservation of fine and complex correlations and anomalies such as events with very low occurrence, the conformity with the initial statistical distribution and the operational relevance of the synthetic data. The distribution has the particularity of being highly unbalanced, with none of the variables following a Gaussian distribution, adding complexity to the data generation process.
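
Among the standard measures, the abstract mentions the Wasserstein distance. A minimal sketch for the one-dimensional case, assuming two samples of equal size, where the distance reduces to the mean absolute difference between sorted values (the function name and the response times are hypothetical, not data from the study):

```python
from typing import Sequence


def wasserstein_1d(real: Sequence[float], synthetic: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For equal-size empirical distributions this is the mean absolute
    difference between the sorted values.
    """
    if len(real) != len(synthetic) or not real:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(real), sorted(synthetic))
    return sum(abs(a - b) for a, b in pairs) / len(real)


# Hypothetical response times in minutes.
real_times = [4.0, 6.0, 9.0, 15.0]
synthetic_times = [13.0, 5.0, 10.0, 6.0]
print(wasserstein_1d(real_times, synthetic_times))  # prints 1.0
```

A single aggregate such as this can hide poor coverage of rare events, which is consistent with the abstract's argument for pairing it with domain-specific metrics.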

replace-cross Probing and Steering Evaluation Awareness of Language Models

Authors: Jord Nguyen, Khiem Hoang, Carlo Leonardo Attubato, Felix Hofstätter

Abstract: Language models can distinguish between testing and deployment phases -- a capability known as evaluation awareness. This has significant safety and policy implications, potentially undermining the reliability of evaluations that are central to AI governance frameworks and voluntary industry commitments. In this paper, we study evaluation awareness in Llama-3.3-70B-Instruct. We show that linear probes can separate real-world evaluation and deployment prompts, suggesting that current models internally represent this distinction. We also find that current safety evaluations are correctly classified by the probes, suggesting that they already appear artificial or inauthentic to models. Our findings underscore the importance of ensuring trustworthy evaluations and understanding deceptive capabilities. More broadly, our work showcases how model internals may be leveraged to support blackbox methods in safety audits, especially for future models more competent at evaluation awareness and deception.

replace-cross DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction

Authors: Zhiyi Hou, Enhui Ma, Fang Li, Zhiyi Lai, Kalok Ho, Zhanqian Wu, Lijun Zhou, Long Chen, Chitian Sun, Haiyang Sun, Bing Wang, Guang Chen, Hangjun Ye, Kaicheng Yu

Abstract: Autonomous driving has seen significant progress, driven by extensive real-world data. However, in long-tail scenarios, accurately predicting the safety of the ego vehicle's future motion remains a major challenge due to uncertainties in dynamic environments and limitations in data coverage. In this work, we aim to explore whether it is possible to enhance the motion risk prediction capabilities of Vision-Language Models (VLM) by synthesizing high-risk motion data. Specifically, we introduce a Bird's-Eye View (BEV) based motion simulation method to model risks from three aspects: the ego-vehicle, other vehicles, and the environment. This allows us to synthesize plug-and-play, high-risk motion data suitable for VLM training, which we call DriveMRP-10K. Furthermore, we design a VLM-agnostic motion risk estimation framework, named DriveMRP-Agent. This framework incorporates a novel information injection strategy for global context, ego-vehicle perspective, and trajectory projection, enabling VLMs to effectively reason about the spatial relationships between motion waypoints and the environment. Extensive experiments demonstrate that by fine-tuning with DriveMRP-10K, our DriveMRP-Agent framework can significantly improve the motion risk prediction performance of multiple VLM baselines, with the accident recognition accuracy soaring from 27.13% to 88.03%. Moreover, when tested via zero-shot evaluation on an in-house real-world high-risk motion dataset, DriveMRP-Agent achieves a significant performance leap, boosting the accuracy from the base model's 29.42% to 68.50%, which showcases the strong generalization capabilities of our method in real-world scenarios.

replace-cross PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER)

Authors: Gonzalo Mancera, Aythami Morales, Julian Fierrez, Ruben Tolosana, Alejandro Penna, Miguel Lopez-Duran, Francisco Jurado, Alvaro Ortigosa

Abstract: The use of Natural Language Processing (NLP) in high-stakes AI-based applications has increased significantly in recent years, especially since the emergence of Large Language Models (LLMs). However, despite their strong performance, LLMs introduce important legal/ethical concerns, particularly regarding privacy, data protection, and transparency. Due to these concerns, this work explores the use of Named-Entity Recognition (NER) to facilitate the privacy-preserving training (or adaptation) of LLMs. We propose a framework that uses NER technologies to anonymize sensitive information in text data, such as personal identities or geographic locations. An evaluation of the proposed privacy-preserving learning framework was conducted to measure its impact on user privacy and system performance in a particular high-stakes and sensitive setup: AI-based resume scoring for recruitment processes. The study involved two language models (BERT and RoBERTa) and six anonymization algorithms (based on Presidio, FLAIR, BERT, and different versions of GPT) applied to a database of 24,000 candidate profiles. The findings indicate that the proposed privacy preservation techniques effectively maintain system performance while playing a critical role in safeguarding candidate confidentiality, thus promoting trust in the tested scenario. On top of the proposed privacy-preserving approach, we also experiment with applying an existing approach that reduces the gender bias in LLMs, thus finally obtaining our proposed Privacy- and Bias-aware LLMs (PBa-LLMs). Note that the proposed PBa-LLMs have been evaluated in a particular setup (resume scoring), but are generally applicable to any other LLM-based AI application.
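
The anonymization step described above can be sketched as span replacement. This sketch assumes a NER model has already returned character spans with labels; the spans are hard-coded here, and the `anonymize` helper and the sample text are hypothetical rather than part of the paper's pipeline.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each detected entity span with a placeholder such as <PERSON>."""
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"<{label}>" + text[end:]
    return text


resume = "Maria Lopez, based in Madrid, has five years of Python experience."
# Spans a NER model might return; hard-coded for illustration.
detected = [(0, 11, "PERSON"), (22, 28, "LOCATION")]
print(anonymize(resume, detected))
# prints: <PERSON>, based in <LOCATION>, has five years of Python experience.
```

The job-relevant content ("five years of Python experience") is untouched, which is the property that would let a scoring model keep its performance on anonymized text.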

replace-cross On Jailbreaking Quantized Language Models Through Fault Injection Attacks

Authors: Noureldin Zahran, Ahmad Tahmasivand, Ihsen Alouani, Khaled Khasawneh, Mohammed E. Fouda

Abstract: The safety alignment of Language Models (LMs) is a critical concern, yet their integrity can be challenged by direct parameter manipulation attacks, such as those potentially induced by fault injection. As LMs are increasingly deployed using low-precision quantization for efficiency, this paper investigates the efficacy of such attacks for jailbreaking aligned LMs across different quantization schemes. We propose gradient-guided attacks, including a tailored progressive bit-level search algorithm introduced herein and a comparative word-level (single weight update) attack. Our evaluation on Llama-3.2-3B, Phi-4-mini, and Llama-3-8B across FP16 (baseline) and weight-only quantization (FP8, INT8, INT4) reveals that quantization significantly influences attack success. While attacks readily achieve high success (>80% Attack Success Rate, ASR) on FP16 models, within an attack budget of 25 perturbations, FP8 and INT8 models exhibit ASRs below 20% and 50%, respectively. Even when the perturbation budget was increased to 150 bit-flips, FP8 models maintained ASR below 65%, demonstrating some resilience compared to INT8 and INT4 models that have high ASR. In addition, analysis of perturbation locations revealed differing architectural targets across quantization schemes, with (FP16, INT4) and (INT8, FP8) showing similar characteristics. Besides, jailbreaks induced in FP16 models were highly transferable to subsequent FP8/INT8 quantization (<5% ASR difference), though INT4 significantly reduced transferred ASR (avg. 35% drop). These findings highlight that while common quantization schemes, particularly FP8, increase the difficulty of direct parameter manipulation jailbreaks, vulnerabilities can still persist, especially through post-attack quantization.
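
To make the notion of a bit-level perturbation concrete, the sketch below flips one bit in the stored encoding of a single weight, once for FP16 and once for INT8. It illustrates only the mechanics of a bit-flip, not the paper's gradient-guided search; the helper names and example values are assumptions.

```python
import struct


def flip_bit_fp16(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) in the IEEE 754 half-precision encoding."""
    if not 0 <= bit < 16:
        raise ValueError("bit must be in [0, 15]")
    (raw,) = struct.unpack("<H", struct.pack("<e", value))
    (flipped,) = struct.unpack("<e", struct.pack("<H", raw ^ (1 << bit)))
    return flipped


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit in the two's-complement encoding of a signed 8-bit integer."""
    if not 0 <= bit < 8:
        raise ValueError("bit must be in [0, 7]")
    raw = (value & 0xFF) ^ (1 << bit)
    return raw - 256 if raw >= 128 else raw


# Flipping the top exponent bit of an FP16 weight changes it by orders of magnitude.
print(flip_bit_fp16(0.5, 14))  # prints 32768.0
# An INT8 code stays within [-128, 127] whatever bit is flipped.
print(flip_bit_int8(64, 7))    # prints -64
```

In a quantized model the INT8 code is further multiplied by a scale factor, so the effective weight change also depends on that scale.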

replace-cross RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

Authors: Baolong Bi, Shenghua Liu, Xingzhang Ren, Dayiheng Liu, Junyang Lin, Yiwei Wang, Lingrui Mei, Junfeng Fang, Jiafeng Guo, Xueqi Cheng

Abstract: The foundational capabilities of large language models (LLMs) are deeply influenced by the quality of their pre-training corpora. However, enhancing data quality at scale remains a significant challenge, primarily due to the trade-off between refinement effectiveness and processing efficiency. While rule-based filtering remains the dominant paradigm, it typically operates at the document level and lacks the granularity needed to refine specific content within documents. Inspired by emerging work such as ProX, we propose RefineX, a novel framework for large-scale, surgical refinement of pre-training data through programmatic editing tasks. RefineX enables efficient and fine-grained data refinement while reliably preserving the diversity and naturalness of raw text. The core strength of RefineX lies in distilling high-quality, expert-guided end-to-end refinement results into minimal edit-based deletion programs. This high-precision distillation pipeline is used to train an efficient and reliable refine model that can systematically improve every instance in the corpus at scale. We evaluate RefineX across from-scratch pre-training at multiple model scales and find that it consistently outperforms models trained on raw, filtered, or alternatively refined data across diverse downstream tasks. On the 750M model, RefineX yields 2.6%-7.2% average gains on lighteval tasks, and achieves comparable performance using significantly fewer training tokens. Further analysis shows that RefineX reliably enhances text quality with both high efficiency and precision, outperforming prior approaches such as end-to-end generation and Prox-C. These results position RefineX as a scalable, effective, and reliable solution for optimizing pre-training data in modern LLM pipelines.
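
One way to picture an "edit-based deletion program" is as a short list of delete-only operations applied to a raw document. The sketch below assumes two hypothetical operation formats (`remove_lines` and `remove_str`); the actual program syntax used by RefineX is not given in the abstract.

```python
from typing import List, Tuple, Union

# Hypothetical operation formats, chosen for this sketch only:
#   ("remove_lines", start, end)  delete lines start..end (0-based, inclusive)
#   ("remove_str", line, text)    delete one occurrence of `text` from a line
Op = Union[Tuple[str, int, int], Tuple[str, int, str]]


def apply_program(document: str, program: List[Op]) -> str:
    """Apply deletion-only edits; the program can remove text but never add any."""
    lines = document.split("\n")
    drop = set()
    for op in program:
        if op[0] == "remove_lines":
            drop.update(range(op[1], op[2] + 1))
        elif op[0] == "remove_str":
            lines[op[1]] = lines[op[1]].replace(op[2], "", 1)
        else:
            raise ValueError(f"unknown operation: {op[0]}")
    return "\n".join(line for i, line in enumerate(lines) if i not in drop)


raw = ("Home | Login | Share\n"
       "Transformers use attention. Click here to subscribe!\n"
       "Copyright 2024")
program: List[Op] = [
    ("remove_lines", 0, 0),
    ("remove_str", 1, " Click here to subscribe!"),
    ("remove_lines", 2, 2),
]
print(apply_program(raw, program))  # prints: Transformers use attention.
```

Since every operation only deletes, the output is always made of text from the original document, which matches the abstract's emphasis on preserving the naturalness of raw text.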

replace-cross Reinforcement Learning-based Feature Generation Algorithm for Scientific Data

Authors: Meng Xiao, Junfeng Zhou, Yuanchun Zhou

Abstract: Feature generation (FG) aims to enhance the prediction potential of original data by constructing high-order feature combinations and removing redundant features. It is a key preprocessing step for tabular scientific data to improve downstream machine-learning model performance. Traditional methods face the following two challenges when dealing with the feature generation of scientific data: First, the effective construction of high-order feature combinations in scientific data necessitates profound and extensive domain-specific expertise. Second, as the order of feature combinations increases, the search space expands exponentially, imposing prohibitive human labor consumption. Advancements in the Data-Centric Artificial Intelligence (DCAI) paradigm have opened novel avenues for automating feature generation processes. Inspired by that, this paper revisits the conventional feature generation workflow and proposes the Multi-agent Feature Generation (MAFG) framework. Specifically, in the iterative exploration stage, multiple agents collaboratively construct mathematical transformation equations, synthesize and identify feature combinations exhibiting high information content, and leverage a reinforcement learning mechanism to evolve their strategies. Upon completing the exploration phase, MAFG integrates large language models (LLMs) to interpretatively evaluate the generated features of each significant model performance breakthrough. Experimental results and case studies consistently demonstrate that the MAFG framework effectively automates the feature generation process and significantly enhances various downstream scientific data mining tasks.

replace-cross From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis

Authors: Amirabbas Hojjati, Lu Li, Ibrahim Hameed, Anis Yazidi, Pedro G. Lind, Rabindra Khadka

Abstract: EEG signals capture brain activity with high temporal and low spatial resolution, supporting applications such as neurological diagnosis, cognitive monitoring, and brain-computer interfaces. However, effective analysis is hindered by limited labeled data, high dimensionality, and the absence of scalable models that fully capture spatiotemporal dependencies. Existing self-supervised learning (SSL) methods often focus on either spatial or temporal features, leading to suboptimal representations. To this end, we propose EEG-VJEPA, a novel adaptation of the Video Joint Embedding Predictive Architecture (V-JEPA) for EEG classification. By treating EEG as video-like sequences, EEG-VJEPA learns semantically meaningful spatiotemporal representations using joint embeddings and adaptive masking. To our knowledge, this is the first work that exploits V-JEPA for EEG classification and explores the visual concepts learned by the model. Evaluations on the publicly available Temple University Hospital (TUH) Abnormal EEG dataset show that EEG-VJEPA outperforms existing state-of-the-art models in classification accuracy. Beyond classification accuracy, EEG-VJEPA captures physiologically relevant spatial and temporal signal patterns, offering interpretable embeddings that may support human-AI collaboration in diagnostic workflows. These findings position EEG-VJEPA as a promising framework for scalable, trustworthy EEG analysis in real-world clinical settings.

replace-cross Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning

Authors: Ruihao Zhang, Mao Chen, Fei Ye, Dandan Meng, Yixuan Huang, Xiao Liu

Abstract: T cell receptor (TCR) repertoires encode critical immunological signatures for autoimmune diseases, yet their clinical application remains limited by sequence sparsity and low witness rates. We developed EAMil, a multi-instance deep learning framework that leverages TCR sequencing data to diagnose systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) with exceptional accuracy. By integrating PrimeSeq feature extraction with ESMonehot encoding and enhanced gate attention mechanisms, our model achieved state-of-the-art performance with AUCs of 98.95% for SLE and 97.76% for RA. EAMil successfully identified disease-associated genes with over 90% concordance with established differential analyses and effectively distinguished disease-specific TCR genes. The model demonstrated robustness in classifying multiple disease categories, utilizing the SLEDAI score to stratify SLE patients by disease severity as well as to diagnose the site of damage in SLE patients, and effectively controlling for confounding factors such as age and gender. This interpretable framework for immune receptor analysis provides new insights for autoimmune disease detection and classification with broad potential clinical applications across immune-mediated conditions.

replace-cross Sequential Attention-based Sampling for Histopathological Analysis

Authors: Tarun G, Naman Malpani, Gugan Thoppe, Sridharan Devarajan

Abstract: Deep neural networks are increasingly applied for automated histopathology. Yet, whole-slide images (WSIs) are often acquired at gigapixel sizes, rendering it computationally infeasible to analyze them entirely at high resolution. Diagnostic labels are largely available only at the slide-level, because expert annotation of images at a finer (patch) level is both laborious and expensive. Moreover, regions with diagnostic information typically occupy only a small fraction of the WSI, making it inefficient to examine the entire slide at full resolution. Here, we propose SASHA -- Sequential Attention-based Sampling for Histopathological Analysis -- a deep reinforcement learning approach for efficient analysis of histopathological images. First, SASHA learns informative features with a lightweight hierarchical, attention-based multiple instance learning (MIL) model. Second, SASHA samples intelligently and zooms selectively into a small fraction (10-20%) of high-resolution patches, to achieve reliable diagnosis. We show that SASHA matches state-of-the-art methods that analyze the WSI fully at high-resolution, albeit at a fraction of their computational and memory costs. In addition, it significantly outperforms competing, sparse sampling methods. We propose SASHA as an intelligent sampling model for medical imaging challenges that involve automated diagnosis with exceptionally large images containing sparsely informative features.

replace-cross ModelCitizens: Representing Community Voices in Online Safety

Authors: Ashima Suvarna, Christina Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, Saadia Gabriel

Abstract: Automatic toxic language detection is critical for creating safe, inclusive online spaces. However, it is a highly subjective task, with perceptions of toxic language shaped by community norms and lived experience. Existing toxicity detection models are typically trained on annotations that collapse diverse annotator perspectives into a single ground truth, erasing important context-specific notions of toxicity such as reclaimed language. To address this, we introduce MODELCITIZENS, a dataset of 6.8K social media posts and 40K toxicity annotations across diverse identity groups. To capture the role of conversational context on toxicity, typical of social media posts, we augment MODELCITIZENS posts with LLM-generated conversational scenarios. State-of-the-art toxicity detection tools (e.g. OpenAI Moderation API, GPT-o4-mini) underperform on MODELCITIZENS, with further degradation on context-augmented posts. Finally, we release LLAMACITIZEN-8B and GEMMACITIZEN-12B, LLaMA- and Gemma-based models finetuned on MODELCITIZENS, which outperform GPT-o4-mini by 5.5% on in-distribution evaluations. Our findings highlight the importance of community-informed annotation and modeling for inclusive content moderation. The data, models and code are available at https://github.com/asuvarna31/modelcitizens.

URLs: https://github.com/asuvarna31/modelcitizens

replace-cross AI Agent Smart Contract Exploit Generation

Authors: Arthur Gervais, Liyi Zhou

Abstract: We present A1, an agentic, execution-driven system that transforms any LLM into an end-to-end exploit generator. A1 has no hand-crafted heuristics and provides the agent with six domain-specific tools that enable autonomous vulnerability discovery. The agent can flexibly leverage these tools to understand smart contract behavior, generate exploit strategies, test them on blockchain states, and refine approaches based on execution feedback. All outputs are concretely validated to eliminate false positives. The evaluation across 36 real-world vulnerable contracts on Ethereum and Binance Smart Chain demonstrates a 62.96% (17 out of 27) success rate on the VERITE benchmark. Beyond the VERITE dataset, A1 identified 9 additional vulnerable contracts, with 5 cases occurring after the strongest model's training cutoff date. Across all 26 successful cases, A1 extracts up to 8.59 million USD per case and 9.33 million USD total. Through 432 experiments across six LLMs, we analyze iteration-wise performance, showing diminishing returns with average marginal gains of +9.7%, +3.7%, +5.1%, and +2.8% for iterations 2-5 respectively, with per-experiment costs ranging from $0.01 to $3.59. A Monte Carlo analysis of 19 historical attacks shows success probabilities of 85.9%-88.8% without detection delays. We investigate whether an attacker or a defender benefits most from deploying A1 as a continuous on-chain scanning system. Our model shows that OpenAI's o3-pro maintains profitability up to a scanning delay of 30.0 days at a 0.100% vulnerability incidence rate, while faster models require rates of >=1.000% to break even. The findings expose a troubling asymmetry: at a 0.1% vulnerability rate, attackers achieve on-chain scanning profitability at a $6000 exploit value, while defenders require $60000, raising fundamental questions about whether AI agents inevitably favor exploitation over defense.
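The headline counts in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 36 evaluated contracts are the 27 VERITE contracts plus the 9 additional ones, and that all 9 additional contracts count as successes. The abstract does not state this breakdown explicitly, but it is consistent with the totals it reports. All variable names are illustrative.

```python
# Figures quoted in the abstract above.
verite_total = 27    # VERITE benchmark contracts
verite_solved = 17   # successful exploits on VERITE
extra_solved = 9     # additional vulnerable contracts found beyond VERITE

# Success rate on VERITE, as a percentage.
success_rate = 100 * verite_solved / verite_total

# Assumed decomposition of the reported totals (not stated in the abstract).
total_contracts = verite_total + extra_solved
total_successes = verite_solved + extra_solved

print(f"VERITE success rate: {success_rate:.2f}%")
print(f"Contracts evaluated: {total_contracts}")
print(f"Successful cases: {total_successes}")
```

Under these assumptions the computed values agree with the 62.96%, 36, and 26 reported in the abstract.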

replace-cross Empowering Bridge Digital Twins by Bridging the Data Gap with a Unified Synthesis Framework

Authors: Wang Wang, Mingyu Shi, Jun Jiang, Wenqian Ma, Chong Liu, Yasutaka Narazaki, Xuguang Wang

Abstract: As critical transportation infrastructure, bridges face escalating challenges from aging and deterioration, while traditional manual inspection methods suffer from low efficiency. Although 3D point cloud technology provides a new data-driven paradigm, its application potential is often constrained by the incompleteness of real-world data, which results from missing labels and scanning occlusions. To overcome the bottleneck of insufficient generalization in existing synthetic data methods, this paper proposes a systematic framework for generating 3D bridge data. This framework can automatically generate complete point clouds featuring component-level instance annotations, high-fidelity color, and precise normal vectors. It can be further extended to simulate the creation of diverse and physically realistic incomplete point clouds, designed to support the training of segmentation and completion networks, respectively. Experiments demonstrate that a PointNet++ model trained with our synthetic data achieves a mean Intersection over Union (mIoU) of 84.2% in real-world bridge semantic segmentation. Concurrently, a fine-tuned KT-Net exhibits superior performance on the component completion task. This research offers an innovative methodology and a foundational dataset for the 3D visual analysis of bridge structures, holding significant implications for advancing the automated management and maintenance of infrastructure.
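For readers unfamiliar with the metric, mean Intersection over Union averages the per-class ratio of correctly labeled points to the union of predicted and ground-truth points for that class. The following is a minimal sketch on toy per-point labels. The function name `mean_iou`, the label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration, not the paper's evaluation code.

```python
def mean_iou(pred, truth, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    ious = []
    for c in range(num_classes):
        # Points labeled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        # Points labeled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # skip classes that never occur (an assumed convention)
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in pred or truth")
    return sum(ious) / len(ious)


# Toy example: six points, three hypothetical component classes.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, truth, num_classes=3)
print(score)  # per-class IoUs are 1/3, 2/3 and 1/2
```

In this toy case the per-class values average to 0.5. The 84.2% reported in the abstract would correspond to the same kind of average taken over the real bridge component classes.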

replace-cross Geo-Registration of Terrestrial LiDAR Point Clouds with Satellite Images without GNSS

Authors: Xinyu Wang, Muhammad Ibrahim, Haitian Wang, Atif Mansoor, Ajmal Mian

Abstract: Accurate geo-registration of LiDAR point clouds presents significant challenges in GNSS-denied urban areas with high-rise buildings and bridges. Existing methods typically rely on real-time GNSS and IMU data, which require pre-calibration and assume stable positioning during data collection. However, this assumption often fails in dense urban areas, resulting in localization errors. To address this, we propose a structured geo-registration and spatial correction method that aligns 3D point clouds with satellite images, enabling frame-wise recovery of GNSS information and reconstruction of city-scale 3D maps without relying on prior localization. The proposed approach employs a pre-trained Point Transformer model to segment the road points and then extracts the road skeleton and intersection points from the point cloud as well as the target map for alignment. Global rigid alignment of the two is performed using the intersection points, followed by local refinement using radial basis function (RBF) interpolation. Elevation correction is then applied to the point cloud based on terrain information from the SRTM dataset to resolve vertical discrepancies. The proposed method was tested on the popular KITTI benchmark and a locally collected Perth (Western Australia) CBD dataset. On the KITTI dataset, our method achieved an average planimetric alignment standard deviation (STD) of 0.84 m across sequences with intersections, representing a 55.3% improvement over the original dataset. On the Perth dataset, which lacks GNSS information, our method achieved an average STD of 0.96 m compared to the GPS data extracted from the Google Maps API. This corresponds to a 77.4% improvement over the initial alignment. Our method also resulted in elevation correlation gains of 30.5% on the KITTI dataset and 50.4% on the Perth dataset.
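The global rigid alignment step can be illustrated with a standard closed-form least-squares fit of a 2D rotation and translation between matched intersection points. This is a sketch of that generic technique only. The abstract does not specify the authors' solver, and the function names, sample coordinates, and the assumption that point correspondences are already known are all hypothetical. The RBF refinement and elevation correction are not shown.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two matched point pairs")
    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    # Accumulate dot and cross products of the centered pairs.
    s_dot = 0.0
    s_cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)  # optimal rotation angle
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty


def apply_rigid_2d(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


# Hypothetical intersection points from a point cloud (local frame, metres).
cloud_pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0), (0.0, 8.0)]
# Synthetic "map" points: the same intersections rotated 30 degrees and shifted.
map_pts = apply_rigid_2d(cloud_pts, math.radians(30.0), 5.0, -2.0)

theta, tx, ty = fit_rigid_2d(cloud_pts, map_pts)
aligned = apply_rigid_2d(cloud_pts, theta, tx, ty)
```

With noise-free correspondences, as here, the fit recovers the synthetic transform. With real data the residuals left after this step are what a local refinement such as the RBF interpolation mentioned in the abstract would then address.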