Authors: Bo Yuan, Yulin Chen, Yin Zhang, Wei Jiang
Abstract: Learning from noisy labels (LNL) is a challenge that arises in many real-world scenarios where collected training data can contain incorrect or corrupted labels. Most existing solutions identify noisy labels and adopt active learning to query human experts to denoise them. In the era of large language models (LLMs), the human effort required by these methods can be reduced by substituting LLMs for expert annotators, yet their performance still hinges on accurately separating the clean and noisy samples in the noisy data. In this paper, we propose NoiseAL, a collaborative learning framework based on active learning that combines LLMs and small models (SMs) for learning from noisy labels. During collaborative training, we first adopt two SMs to form a co-prediction network and propose a dynamically enhanced threshold strategy to divide the noisy data into subsets; we then select clean and noisy samples from these subsets and feed them to an LLM, acting as an active annotator, to rectify the noisy ones. Finally, we employ different optimization objectives for subsets with different degrees of label noise. Extensive experiments on synthetic and real-world noise datasets further demonstrate the superiority of our framework over state-of-the-art baselines.
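A minimal sketch of the subset-splitting step may help picture the framework. The confidence measure, the moving-average threshold update, and the three-way split below are illustrative assumptions, not the paper's exact rule:

```python
import numpy as np

def split_by_co_prediction(probs_a, probs_b, labels, tau, momentum=0.99):
    """Divide a batch into clean / noisy / uncertain subsets using the softmax
    outputs of two small models, with a dynamically updated threshold tau."""
    pred_a, pred_b = probs_a.argmax(1), probs_b.argmax(1)
    conf = 0.5 * (probs_a.max(1) + probs_b.max(1))   # co-prediction confidence
    agree = pred_a == pred_b                          # the two SMs agree
    match = agree & (pred_a == labels)                # ...and agree with the given label

    # Dynamic threshold: an exponential moving average of batch confidence.
    tau = momentum * tau + (1 - momentum) * conf.mean()

    clean = match & (conf >= tau)             # keep label, standard supervised loss
    noisy = agree & ~match & (conf >= tau)    # confident contradiction: query the LLM
    uncertain = ~(clean | noisy)              # handle with a noise-robust objective
    return clean, noisy, uncertain, tau
```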
Authors: Julian Minder, Clement Dumas, Caden Juang, Bilal Chugtai, Neel Nanda
Abstract: Model diffing is the study of how fine-tuning changes a model's representations and internal algorithms. Many behaviors of interest are introduced during fine-tuning, and model diffing offers a promising lens to interpret such behaviors. Crosscoders are a recent model diffing method that learns a shared dictionary of interpretable concepts represented as latent directions in both the base and fine-tuned models, allowing us to track how concepts shift or emerge during fine-tuning. Notably, prior work has observed concepts with no direction in the base model, and it was hypothesized that these model-specific latents were concepts introduced during fine-tuning. However, we identify two issues stemming from the crosscoder's L1 training loss that can misattribute concepts as unique to the fine-tuned model when they really exist in both models. We develop Latent Scaling to flag these issues by more accurately measuring each latent's presence across models. In experiments comparing Gemma 2 2B base and chat models, we observe that the standard crosscoder suffers heavily from these issues. Building on these insights, we train a crosscoder with BatchTopK loss and show that it substantially mitigates these issues, finding more genuinely chat-specific and highly interpretable concepts; we recommend practitioners adopt similar techniques. Using the BatchTopK crosscoder, we successfully identify a set of genuinely chat-specific latents that are both interpretable and causally effective, representing concepts such as $\textit{false information}$ and $\textit{personal question}$, along with multiple refusal-related latents that show nuanced preferences for different refusal triggers. Overall, our work advances best practices for the crosscoder-based methodology for model diffing and demonstrates that it can provide concrete insights into how chat tuning modifies language model behavior.
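The core computation behind Latent Scaling can be pictured as a per-latent least-squares fit; the exact target the paper regresses against is richer than this minimal version, so treat the following as an illustrative assumption:

```python
import numpy as np

def latent_scaling(acts, latent_codes, decoder_dir):
    """Least-squares presence of one crosscoder latent in one model.

    acts:         (n, d) residual-stream activations of that model
    latent_codes: (n,)   the latent's activation on each input
    decoder_dir:  (d,)   the latent's decoder direction for that model

    Returns the scalar beta minimizing
    ||acts - beta * latent_codes[:, None] * decoder_dir||^2. A beta near zero
    suggests the concept is genuinely absent from this model, rather than an
    artifact of the L1 training loss.
    """
    target = latent_codes[:, None] * decoder_dir[None, :]
    return float((acts * target).sum() / (target * target).sum())
```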
Authors: Nikhil Shivakumar Nayak
Abstract: Graph Neural Networks (GNNs) have emerged as the de facto standard for modeling graph data, with attention mechanisms and transformers significantly enhancing their performance on graph-based tasks. Despite these advancements, GNNs often struggle on heterogeneous graphs, generally underperforming relative to their results on homogeneous graphs. This work benchmarks various GNN architectures to identify the most effective methods for heterogeneous graphs, with a particular focus on node classification and link prediction. Our findings reveal that graph attention networks excel in these tasks. As a main contribution, we explore enhancements to these attention networks by integrating positional encodings for node embeddings. This involves utilizing the full Laplacian spectrum to accurately capture both the relative and absolute positions of each node within the graph, further enhancing performance on downstream tasks such as node classification and link prediction.
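The positional encodings described here follow a standard spectral recipe; a minimal sketch (dense numpy, with no sign/basis disambiguation for repeated eigenvalues, which real implementations must handle) is:

```python
import numpy as np

def laplacian_positional_encodings(adj, k=None):
    """Spectral positional encodings from the normalized graph Laplacian
    L = I - D^{-1/2} A D^{-1/2}. Node i's encoding is its entries in the
    first k eigenvectors; k = n uses the full spectrum, as in the abstract."""
    n = adj.shape[0]
    deg = adj.sum(1)
    d_inv_sqrt = np.zeros(n)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    return eigvecs[:, : (n if k is None else k)]

# Typical use: concatenate with raw node features before the attention layers,
# e.g. x = np.concatenate([features, laplacian_positional_encodings(A)], axis=1)
```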
Authors: Sijia Zhou (School of Computer Science, University of Birmingham), Yunwen Lei (Department of Mathematics, The University of Hong Kong), Ata Kab\'an (School of Computer Science, University of Birmingham)
Abstract: We study stochastic optimization with data-adaptive sampling schemes to train pairwise learning models. Pairwise learning is ubiquitous, covering several popular learning tasks such as ranking, metric learning and AUC maximization. A notable difference between pairwise and pointwise learning is the statistical dependence among input pairs, which existing analyses have been unable to handle in the general setting considered in this paper. To this end, we extend recent results that blend together two algorithm-dependent frameworks of analysis -- algorithmic stability and PAC-Bayes -- which allow us to deal with any data-adaptive sampling scheme in the optimizer. We instantiate this framework to analyze (1) pairwise stochastic gradient descent, a default workhorse in many machine learning problems, and (2) pairwise stochastic gradient descent ascent, a method used in adversarial training. All of these algorithms draw a stochastic sample from a discrete distribution (over sample indices) before each update. Non-uniform sampling of these indices has already been suggested in the recent literature, for which our work provides generalization guarantees in both smooth and non-smooth convex problems.
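For concreteness, the pairwise SGD update analyzed by such frameworks can be written as below; the notation is a standard rendering, not quoted from the paper:

```latex
% One step of pairwise SGD with a data-adaptive sampling scheme: at step t,
% an index pair (i_t, j_t), i_t \neq j_t, is drawn from a possibly non-uniform,
% data-dependent distribution P_t, and the pairwise loss f is evaluated on
% the corresponding examples (z_{i_t}, z_{j_t}):
w_{t+1} = w_t - \eta_t \, \nabla_w f\!\left(w_t;\, z_{i_t}, z_{j_t}\right),
\qquad (i_t, j_t) \sim P_t .
```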
Authors: Llu\'is Pastor-P\'erez, Javier Alonso-Garcia, Lukas Mauch
Abstract: Order-Preserving (OP) GFlowNets have demonstrated remarkable success in tackling complex multi-objective (MOO) black-box optimization problems using stochastic optimization techniques. Specifically, they can be trained online to efficiently sample diverse candidates near the Pareto front. A key advantage of OP GFlowNets is their ability to impose a local order on training samples based on Pareto dominance, eliminating the need for scalarization, a common requirement in other approaches such as Preference-Conditional GFlowNets. However, we identify an important limitation of OP GFlowNets: imposing a local order on training samples can lead to conflicting optimization objectives. To address this issue, we introduce Global-Order GFlowNets, which transform the local order into a global one, thereby resolving these conflicts. Our experimental evaluations on various benchmarks demonstrate the efficacy and promise of our proposed method.
Authors: Nooshin Bahador
Abstract: This study investigates the localization of knowledge representation in fine-tuned GPT-2 models using Causal Layer Attribution via Activation Patching (CLAP), a method that identifies the neural layers critical for correct answer generation. The model was fine-tuned on 9,958 PubMed abstracts (epilepsy: 20,595 mentions, EEG: 11,674 mentions, seizure: 13,921 mentions) using two configurations, with validation loss monitored for early stopping. CLAP involved (1) caching clean (correct answer) and corrupted (incorrect answer) activations, (2) computing the logit difference to quantify model preference, and (3) patching corrupted activations with clean ones to assess recovery. Results revealed three findings. First, patching the first feedforward layer recovered 56% of correct preference, demonstrating that associative knowledge is distributed across multiple layers. Second, patching the final output layer completely restored accuracy (100% recovery), indicating that definitional knowledge is localized; the stronger clean logit difference for definitional questions further supports this localized representation. Third, minimal recovery from convolutional layer patching (13.6%) suggests low-level features contribute marginally to high-level reasoning. Statistical analysis confirmed significant layer-specific effects (p<0.01). These findings demonstrate that factual knowledge is more localized, while associative knowledge depends on distributed representations. We also showed that editing efficacy depends on task type. Our findings not only reconcile conflicting observations about localization in model editing but also emphasize the use of task-adaptive techniques for reliable, interpretable updates.
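The three CLAP steps map directly onto forward hooks; a minimal sketch, assuming a Hugging Face-style GPT-2 where `layer` is a submodule returning a plain tensor (e.g. a block's `mlp`) and the clean and corrupted prompts are token-aligned:

```python
import torch

@torch.no_grad()
def clap_recovery(model, clean_ids, corrupt_ids, layer, ans_id, wrong_id):
    """Causal Layer Attribution via Activation Patching, minimal sketch.
    Logit difference = logit(correct answer) - logit(incorrect answer)."""
    cache = {}

    def save_hook(mod, inp, out):
        cache["clean"] = out               # step 1: cache clean activations

    def patch_hook(mod, inp, out):
        return cache["clean"]              # step 3: overwrite with clean activations

    def logit_diff(ids, hook=None):
        handle = layer.register_forward_hook(hook) if hook else None
        logits = model(ids).logits[0, -1]  # step 2: next-token logits
        if handle:
            handle.remove()
        return (logits[ans_id] - logits[wrong_id]).item()

    clean = logit_diff(clean_ids, save_hook)
    corrupt = logit_diff(corrupt_ids)
    patched = logit_diff(corrupt_ids, patch_hook)
    return (patched - corrupt) / (clean - corrupt)   # fraction of preference recovered
```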
Authors: Yiyuan Xiong, Shaofeng Cai
Abstract: Log messages record important system runtime information and are useful for detecting anomalous behaviors and managing modern software systems. Many supervised and unsupervised learning methods have been proposed recently for log-based anomaly detection. State-of-the-art unsupervised methods predict the next log event given a log sequence and apply a fixed configuration with the same filter condition for every sequence (i.e., a single k such that the top-k predicted log events are regarded as normal next events). This leads to inferior detection performance, because one fixed k for all log sequences ignores their dynamic nature and variance. Recently, deep reinforcement learning (DRL) has been widely applied to make intelligent decisions in dynamic environments. In this work, we contend that it is necessary to apply adaptive filters to different log sequences. To achieve this, we propose a novel DRL-based approach that constructs a learned adaptive filter and applies different normal/abnormal filter thresholds to different log sequences. We define the Markov Decision Process (MDP) and formulate the learned adaptive filter as a problem that can be solved by DRL. We evaluate the learned adaptive filter on two state-of-the-art unsupervised log-based anomaly detection approaches, DeepLog and LogAnomaly, on two datasets, HDFS and BGL. Extensive experiments show that our approach outperforms the fixed configurations and achieves significantly better performance in log-based anomaly detection.
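The difference between the fixed and learned filters is easy to state in code; in this minimal sketch, `policy` and the per-sequence `state` featurization are hypothetical stand-ins for the trained DRL agent:

```python
import numpy as np

def detect_fixed(pred_probs, next_event, k):
    """DeepLog-style fixed filter: the observed next event is normal iff it
    lies in the top-k predicted events; the same k is used for all sequences."""
    topk = np.argsort(pred_probs)[::-1][:k]
    return next_event not in topk            # True -> flag sequence as anomalous

def detect_adaptive(pred_probs, next_event, policy, state):
    """Learned adaptive filter: a DRL policy picks k per log sequence from a
    state summarizing that sequence (e.g. prediction entropy, history stats)."""
    k = policy(state)                        # action = filter threshold for this sequence
    return detect_fixed(pred_probs, next_event, k)
```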
Authors: Siqi Wang, Aoming Liu, Bryan A. Plummer
Abstract: Multi-source Domain Generalization (DG) aims to improve model robustness to new distributions. However, DG methods often overlook the effect of label noise, which can confuse a model during training and reduce performance. The limited prior work analyzing the noise-robustness of DG methods has typically focused on existing methods rather than new solutions. In this paper, we investigate this underexplored space, where models are evaluated under both distribution shifts and label noise, which we refer to as Noise-Aware Generalization (NAG). A natural solution to address label noise would be to combine a Learning with Noisy Labels (LNL) method with those from DG. Many LNL methods aim to detect distribution shifts in a class's samples, i.e., they assume that distribution shifts often correspond to label noise. However, in NAG distribution shifts can be due to label noise or domain shifts, breaking the assumption used by LNL methods. A naive solution is to adopt the assumption made by many DG methods that domain labels are available during training, enabling us to isolate the two types of shifts. However, this ignores valuable cross-domain information. Instead, our proposed DL4ND approach improves noise detection by taking advantage of the observation that noisy samples that may appear indistinguishable within a single domain often show greater variation when compared across domains. Experiments show that DL4ND significantly improves performance across four diverse datasets, offering a promising direction for tackling NAG.
Authors: Bahareh Golchin, Banafsheh Rekabdar
Abstract: This paper presents a novel approach to detecting anomalies in time series data, a task pivotal in domains such as data centers, sensor networks, and finance. Traditional methods often struggle with manual parameter tuning and cannot adapt to new anomaly types. Our method overcomes these limitations by integrating Deep Reinforcement Learning (DRL) with a Variational Autoencoder (VAE) and Active Learning. By incorporating a Long Short-Term Memory (LSTM) network, our approach models sequential data and its dependencies effectively, allowing for the detection of new anomaly classes with minimal labeled data. Our DRL-VAE and Active Learning combination significantly improves on existing methods, as shown by our evaluations on real-world datasets, enhancing anomaly detection techniques and advancing time series analysis.
Authors: Jannis Bl\"uml, Cedric Derstroff, Bjarne Gregori, Elisabeth Dillies, Quentin Delfosse, Kristian Kersting
Abstract: Deep reinforcement learning agents, trained on raw pixel inputs, often fail to generalize beyond their training environments, relying on spurious correlations and irrelevant background details. To address this issue, object-centric agents have recently emerged. However, they require representations tailored to the task specification: unlike deep agents, no single object-centric architecture can be applied to every environment. Inspired by principles of cognitive science and Occam's Razor, we introduce Object-Centric Attention via Masking (OCCAM), which selectively preserves task-relevant entities while filtering out irrelevant visual information. Specifically, OCCAM takes advantage of the object-centric inductive bias. Empirical evaluations on Atari benchmarks demonstrate that OCCAM significantly improves robustness to novel perturbations and reduces sample complexity while showing similar or improved performance compared to conventional pixel-based RL. These results suggest that structured abstraction can enhance generalization without requiring explicit symbolic representations or domain-specific object extraction pipelines.
Authors: Hanping Zhang, Yuhong Guo
Abstract: Safe Reinforcement Learning (Safe RL) aims to train an RL agent to maximize its performance in real-world environments while adhering to safety constraints, as exceeding safety violation limits can result in severe consequences. In this paper, we propose a novel safe RL approach called Safety Modulated Policy Optimization (SMPO), which enables safe policy function learning within the standard policy optimization framework through safety modulated rewards. In particular, we treat safety violation costs as feedback from the RL environment that arrives alongside the standard rewards, and introduce a Q-cost function as a safety critic to estimate expected future cumulative costs. We then propose to modulate the rewards using a cost-aware weighting function, which is carefully designed to enforce the safety limits based on the safety critic's estimates while maximizing the expected rewards. The policy function and the safety critic are learned simultaneously through gradient descent during online interactions with the environment. We conduct experiments in multiple RL environments, and the results demonstrate that our method outperforms several classic and state-of-the-art comparison methods in terms of overall safe RL performance.
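One way to picture the cost-aware weighting is a gate that collapses as the safety critic's estimate approaches the limit. The sigmoid form and `beta` below are illustrative assumptions, not the paper's actual weighting function:

```python
import math

def modulated_reward(reward, q_cost, cost_limit, beta=10.0):
    """Safety-modulated reward: down-weight the reward as the estimated
    future cumulative cost Q_c approaches the allowed cost limit."""
    slack = cost_limit - q_cost                      # > 0 means inside the safe region
    weight = 1.0 / (1.0 + math.exp(-beta * slack))   # gate in (0, 1)
    return weight * reward

# e.g. with cost_limit = 25: q_cost = 5 gives weight ~ 1 (safe, reward kept);
# q_cost = 30 gives weight ~ 0 (unsafe, reward suppressed).
```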
Authors: Ian Berlot-Attwell, Frank Rudzicz, Xujie Si
Abstract: Recent advancements in the coding, reasoning, and tool-using abilities of LLMs have spurred interest in library learning (i.e., online learning through the creation, storage, and retrieval of reusable and composable functions, knowledge, checklists, or lemmas). Such systems often promise improved task performance through the automatic creation of broadly applicable tools, as well as superior computational performance through the caching of reasoning (i.e., the storage of generated tools). However, we find strong reason to be skeptical. We perform a deep dive into one such system, LEGO-Prover, which purports to learn reusable lemmas for mathematical reasoning. We find no evidence of direct reuse of learned lemmas, and find evidence against their soft reuse (i.e., reuse by modifying relevant examples). Crucially, we find that LEGO-Prover does not in fact improve over the simple baseline of prompting the model: the improvements in task accuracy vanish once computational cost is accounted for. Our findings suggest that serious misconceptions exist as to the effectiveness of these techniques, that a serious re-examination of the state of LLM-based library learning is required, and that much stronger standards for evaluation are needed, including behavioural analysis and ensuring that baselines use an equal computational budget.
Authors: Hao Yan, Yuhong Guo
Abstract: Domain generalization aims to develop learning algorithms on source training domains such that the learned model generalizes well to a different, unseen testing domain. We present a novel two-stage approach called Context-Aware Self-Adaptation (CASA) for domain generalization. CASA simulates an approximate meta-generalization scenario and incorporates a self-adaptation module to adjust pre-trained meta-source models to the meta-target domains while maintaining their predictive capability on the meta-source domains. The core concept of self-adaptation is to leverage contextual information, such as the mean of mini-batch features, as domain knowledge to automatically adapt a model trained in the first stage to new contexts in the second stage. Lastly, we utilize an ensemble of multiple meta-source models to perform inference on the testing domain. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on standard benchmarks.
Authors: Md Zahidul Islam, Md Shahidul Islam, Biswajit Chandra das, Syed Ali Reza, Proshanta Kumar Bhowmik, Kanchon Kumar Bishnu, Md Shafiqur Rahman, Redoyan Chowdhury, Laxmi Pant
Abstract: The dramatic adoption of Bitcoin and other cryptocurrencies in the USA has revolutionized the financial landscape and provided unprecedented opportunities for investment and transaction efficiency. The prime objective of this research project is to develop machine learning algorithms capable of effectively identifying and tracking suspicious activity in Bitcoin wallet transactions. Through advanced analysis, the study aims to create a model that identifies trends and outliers that can expose illicit activity. The study focuses specifically on Bitcoin transaction information in America, with a strong emphasis on understanding the environment through which such transactions pass. The dataset is composed of in-depth Bitcoin wallet transactional information, including important factors such as transaction values, timestamps, network flows, and wallet addresses. Each entry in the dataset describes a financial transaction between wallets, including received and sent transactions, and such information is significant for analyzing trends that can indicate suspicious activity. This study deployed three established algorithms: Logistic Regression, Random Forest, and Support Vector Machines. Random Forest emerged as the best model with the highest F1 score, showcasing its ability to handle non-linear relationships in the data. Insights revealed significant patterns in wallet activity, such as the correlation between unredeemed transactions and final balances. The application of machine learning algorithms to tracking cryptocurrencies is a tool for creating transparent and secure U.S. markets.
Authors: Zhiqun Zuo, Ding Zhu, Mohammad Mahdi Khalili
Abstract: This paper presents a post-processing algorithm for training fair neural network regression models that satisfy statistical parity, utilizing an explainable singular value decomposition (SVD) of the weight matrix. We propose a linear transformation of the weight matrix, whereby the singular values derived from the SVD of the transformed matrix directly correspond to the differences in the first and second moments of the output distributions across two groups. Consequently, we can convert the fairness constraints into constraints on the singular values. We analytically solve the problem of finding the optimal weights under these constraints. Experimental validation on various datasets demonstrates that our method achieves a similar or superior fairness-accuracy trade-off compared to the baselines, without using the sensitive attribute at inference time.
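The flavor of such a post-processing step can be sketched with a simpler first-moment-only version; the paper's actual transformation works through the singular values of a transformed weight matrix and also handles second moments, so the following is only an illustrative assumption:

```python
import numpy as np

def shrink_group_gap(W, X0, X1, eps=0.0):
    """Post-process last-layer weights W so the gap between the two groups'
    mean outputs is at most eps, by shrinking W's action along the direction
    of the group feature-mean difference. X0, X1: (n_g, d) features per group."""
    delta = X0.mean(0) - X1.mean(0)              # gap between group feature means
    d = delta / np.linalg.norm(delta)
    gap = W @ delta                              # current output-mean difference
    scale = min(1.0, eps / (np.linalg.norm(gap) + 1e-12))
    # Remove a (1 - scale) fraction of W's component along d; afterwards
    # ||W' @ delta|| = scale * ||gap|| <= eps, while W is otherwise unchanged.
    return W - (1.0 - scale) * np.outer(W @ d, d)
```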
Authors: Abdelmonem Elrefaey, Rong Pan
Abstract: Using both observational and experimental data, a causal discovery process can identify the causal relationships between variables. This work presents an adaptive intervention design paradigm in which causal directed acyclic graphs (DAGs) are effectively recovered under practical budgetary constraints. To choose treatments that optimize information gain under these constraints, an iterative integer programming (IP) approach is proposed, which drastically reduces the number of experiments required. Simulations over a broad range of graph sizes and edge densities are used to assess the effectiveness of the suggested approach. Results show that the proposed adaptive IP approach achieves full causal graph recovery with fewer intervention iterations and variable manipulations than random intervention baselines, and it is also flexible enough to accommodate a variety of practical constraints.
Authors: Runxue Bao, Quanchao Lu, Yanfu Zhang
Abstract: Group Ordered Weighted $L_{1}$-Norm (Group OWL) regularized models have emerged as a useful procedure for high-dimensional sparse multi-task learning with correlated features. Proximal gradient methods are the standard approaches to solving Group OWL models. However, Group OWL models usually incur huge computational costs and memory usage when the feature size is large in the high-dimensional scenario. To address this challenge, we are the first to propose a safe screening rule for Group OWL models by effectively tackling the structured non-separable penalty; the rule can quickly identify the inactive features that have zero coefficients across all the tasks. Thus, by removing the inactive features during the training process, we may achieve substantial computational gains and memory savings. More importantly, the proposed screening rule can be directly integrated with existing solvers in both the batch and stochastic settings. Theoretically, we prove that our screening rule is safe and can also be safely applied to existing iterative optimization algorithms. Our experimental results demonstrate that our screening rule can effectively identify the inactive features, leading to a significant computational speedup without any loss of accuracy.
Authors: Natalie Tirabassi, Sathish A. P. Kumar, Sumit Jha, Arvind Ramanathan
Abstract: We propose MORAL (a multimodal reinforcement learning framework for decision making in autonomous laboratories) that enhances sequential decision-making in autonomous robotic laboratories through the integration of visual and textual inputs. Using the BridgeData V2 dataset, we generate fine-tuned image captions with a pretrained BLIP-2 vision-language model and combine them with visual features through an early fusion strategy. The fused representations are processed using Deep Q-Network (DQN) and Proximal Policy Optimization (PPO) agents. Experimental results demonstrate that multimodal agents achieve a 20% improvement in task completion rates and significantly outperform visual-only and textual-only baselines after sufficient training. Compared to transformer-based and recurrent multimodal RL models, our approach achieves superior performance in cumulative reward and caption quality metrics (BLEU, METEOR, ROUGE-L). These results highlight the impact of semantically aligned language cues in enhancing agent learning efficiency and generalization. The proposed framework contributes to the advancement of multimodal reinforcement learning and embodied AI systems in dynamic, real-world environments.
Authors: Zihan Gu, Ruoyu Chen, Hua Zhang, Yue Hu, Xiaochun Cao
Abstract: Grokking, referring to the abrupt improvement in test accuracy after extended overfitting, offers valuable insights into the mechanisms of model generalization. Existing research based on progress measures implies that grokking relies on understanding the optimization dynamics when the loss function is dominated solely by the weight decay term. However, we find that this optimization merely leads to token uniformity, which is not a sufficient condition for grokking. In this work, we investigate the grokking mechanism underlying the Transformer in the task of prime number operations. Based on theoretical analysis and experimental validation, we present the following insights: (i) the weight decay term encourages uniformity across all tokens in the embedding space when it is minimized; (ii) the occurrence of grokking is jointly determined by the uniformity of the embedding space and the distribution of the training dataset. Building on these insights, we provide a unified perspective for understanding various previously proposed progress measures and introduce a novel, concise, and effective progress measure that traces the changes in test loss more accurately. Finally, to demonstrate the versatility of our theoretical framework, we design a dedicated dataset to validate our theory on ResNet-18, successfully showcasing the occurrence of grokking.
Authors: Sai Gana Sandeep Pula, Sathish A. P. Kumar, Sumit Jha, Arvind Ramanathan
Abstract: This research focuses on enhancing reinforcement learning (RL) algorithms by integrating penalty functions to guide agents in avoiding unwanted actions while optimizing rewards. The goal is to improve the learning process by ensuring that agents learn not only suitable actions but also which actions to avoid. Additionally, we reintroduce a bidirectional learning approach that enables agents to learn from both initial and terminal states, thereby improving speed and robustness in complex environments. Our proposed Penalty-Based Bidirectional methodology is tested on ManiSkill benchmark environments, demonstrating an improvement in success rate of approximately 4% over existing RL implementations. The findings indicate that this integrated strategy enhances policy learning, adaptability, and overall performance in challenging scenarios.
Authors: Huong Pham, Samuel Cheng, Tao Hu, Chengbin Deng
Abstract: Given the growing environmental challenges, accurate monitoring and prediction of changes in water bodies are essential for sustainable management and conservation. The Continuous Monitoring of Land Disturbance (COLD) algorithm provides a valuable tool for real-time analysis of land changes, such as deforestation, urban expansion, agricultural activities, and natural disasters. This capability enables timely interventions and more informed decision-making. This paper assesses the effectiveness of the algorithm in estimating water bodies and tracking pixel-level water trends over time. Our findings indicate that COLD-derived data can reliably estimate water frequency during stable periods and delineate water bodies. Furthermore, it enables the evaluation of trends in water areas after disturbances, allowing for the determination of whether water frequency increases, decreases, or remains constant.
Authors: Liam Harcombe, Timothy T. Duignan
Abstract: Neural Network Potentials (NNPs) have emerged as a powerful tool for modelling atomic interactions with high accuracy and computational efficiency. Recently, denoising diffusion models have shown promise in NNPs by training networks to remove noise added to stable configurations, eliminating the need for force data during training. In this work, we explore the connection between noise and forces by providing a new, simplified mathematical derivation of their relationship. We also demonstrate how a denoising model can be implemented using a conventional MD software package interfaced with a standard NNP architecture. We demonstrate the approach by training a diffusion-based NNP to simulate a coarse-grained lithium chloride solution and employ data duplication to enhance model performance.
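The noise-force relationship at the heart of this line of work can be summarized in standard denoising score matching terms; the rendering below is a paraphrase of the generally known connection, not an equation quoted from the paper:

```latex
% For a Boltzmann distribution at temperature T,  p(x) \propto e^{-U(x)/k_B T},
% forces are proportional to the score:
%     F(x) = -\nabla U(x) = k_B T \, \nabla \log p(x).
% A denoiser \hat{\epsilon} trained on noised configurations
% \tilde{x} = x + \sigma \epsilon approximates the smoothed score,
%     \nabla \log p_\sigma(\tilde{x}) \approx -\hat{\epsilon}(\tilde{x}) / \sigma,
% so, for small noise scales \sigma,
F(x) \;\approx\; -\,\frac{k_B T}{\sigma}\, \hat{\epsilon}(x).
```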
Authors: Kotaro Ikeda, Masanori Koyama, Jinzhe Zhang, Kohei Hayashi, Kenji Fukumizu
Abstract: In this paper, we propose a flow-based method for learning all-to-all transfer maps among conditional distributions, approximating pairwise optimal transport. The proposed method addresses the challenge of handling continuous conditions, which often involve a large set of conditions with sparse empirical observations per condition. We introduce a novel cost function that enables simultaneous learning of optimal transports for all pairs of conditional distributions. Our method is supported by a theoretical guarantee that, in the limit, it converges to pairwise optimal transports among infinite pairs of conditional distributions. The learned transport maps are subsequently used to couple data points in conditional flow matching. We demonstrate the effectiveness of this method on synthetic and benchmark datasets, as well as on chemical datasets where continuous physical properties are defined as conditions.
Authors: Thomas Daniel, Malgorzata Olejniczak, Julien Tierny
Abstract: This application paper investigates the stability of hydrogen bonds (H-bonds), as characterized by the Quantum Theory of Atoms in Molecules (QTAIM). First, we contribute a database of 4544 electron densities associated with four isomers of water hexamers (the so-called Ring, Book, Cage and Prism), generated by distorting their equilibrium geometry under various structural perturbations, modeling the natural dynamic behavior of molecular systems. Second, we present a new stability measure, called bond occurrence rate, associating each bond path present at equilibrium with its rate of occurrence within the input ensemble. We also provide an algorithm, called BondMatcher, for its automatic computation, based on a tailored, geometry-aware partial isomorphism estimation between the extremum graphs of the considered electron densities. Our new stability measure allows for the automatic identification of densities lacking H-bond paths, enabling further visual inspections. Specifically, the topological analysis enabled by our framework corroborates experimental observations and provides refined geometrical criteria for characterizing the disappearance of H-bond paths. Our electron density database and our C++ implementation are available at this address: https://github.com/thom-dani/BondMatcher.
Authors: Jinwei Liu, Wang Yao, Xiao Zhang
Abstract: Neural network-based methods are effective for solving equilibria in Mean-Field Games (MFGs), particularly in high-dimensional settings. However, the need to solve the coupled partial differential equations (PDEs) in MFGs limits their applicability, since solving coupled PDEs is computationally expensive. Additionally, modifying boundary conditions, such as the initial state distribution or terminal value function, necessitates extensive retraining, reducing scalability. To address these challenges, we propose a generalized framework, PIONM (Physics-Informed Neural Operator NF-MKV Net), which leverages physics-informed neural operators to solve MFG equations. PIONM utilizes neural operators to compute MFG equilibria for arbitrary boundary conditions. The method encodes boundary conditions as input features and trains the model to align them with density evolution, modeled using discrete-time normalizing flows. Once trained, the algorithm efficiently computes the density distribution at any time step for a modified boundary condition, ensuring efficient adaptation to different boundary conditions in MFG equilibria. Unlike traditional MFG methods constrained by fixed coefficients, PIONM efficiently computes equilibria under varying boundary conditions, including obstacles, diffusion coefficients, initial densities, and terminal functions. PIONM can adapt to modified conditions while preserving density distribution constraints, demonstrating superior scalability and generalization capabilities compared to existing methods.
Authors: Yiding Feng, Wei Tang
Abstract: We introduce and study the persuasive calibration problem, where a principal aims to provide trustworthy predictions about underlying events to a downstream agent to make desired decisions. We adopt the standard calibration framework that regulates predictions to be unbiased conditional on their own value, so that the agent can reliably interpret them at face value. Allowing a small calibration error budget, we aim to answer the following question: what is the optimal predictor under this calibration error budget, and how can it be computed, especially when there exists incentive misalignment between the principal and the agent? We focus on the standard $\ell_t$-norm Expected Calibration Error (ECE) metric. We develop a general framework by viewing predictors as post-processed versions of perfectly calibrated predictors. Using this framework, we first characterize the structure of the optimal predictor. Specifically, when the principal's utility is event-independent and for the $\ell_1$-norm ECE, we show: (1) the optimal predictor is over- (resp. under-) confident for high (resp. low) true expected outcomes, while remaining perfectly calibrated in the middle; (2) the miscalibrated predictions exhibit a collinearity structure with the principal's utility function. On the algorithmic side, we provide an FPTAS for computing an approximately optimal predictor for general principal utilities and general $\ell_t$-norm ECE. Moreover, for the $\ell_1$- and $\ell_\infty$-norm ECE, we provide polynomial-time algorithms that compute the exact optimal predictor.
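For readers unfamiliar with the metric, the $\ell_t$-norm ECE referenced above is standardly defined as follows (my rendering, using $\theta$ for the binary outcome and $f$ for the predictor):

```latex
\mathrm{ECE}_t(f) \;=\;
\Big( \mathbb{E}_{v \sim f}\big[\, \big|\, \mathbb{E}[\theta \mid f = v] \,-\, v \,\big|^{\,t} \,\big] \Big)^{1/t}
% A perfectly calibrated predictor satisfies E[theta | f = v] = v for every
% prediction value v, hence ECE_t(f) = 0; the paper's budget instead allows
% ECE_t(f) <= epsilon for a small epsilon > 0.
```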
Authors: Bouarfa Mahi Quantiota
Abstract: This paper aims to extend the Structured Knowledge Accumulation (SKA) framework recently proposed by \cite{mahi2025ska}. We introduce two core concepts: the Tensor Net function and the characteristic time property of neural learning. First, we reinterpret the learning rate as a time step in a continuous system. This transforms neural learning from discrete optimization into continuous-time evolution. We show that learning dynamics remain consistent when the product of learning rate and iteration steps stays constant. This reveals a time-invariant behavior and identifies an intrinsic timescale of the network. Second, we define the Tensor Net function as a measure that captures the relationship between decision probabilities, entropy gradients, and knowledge change. Additionally, we define its zero-crossing as the equilibrium state between decision probabilities and entropy gradients. We show that the convergence of entropy and knowledge flow provides a natural stopping condition, replacing arbitrary thresholds with an information-theoretic criterion. We also establish that SKA dynamics satisfy a variational principle based on the Euler-Lagrange equation. These findings extend SKA into a continuous and self-organizing learning model. The framework links computational learning with physical systems that evolve by natural laws. By understanding learning as a time-based process, we open new directions for building efficient, robust, and biologically-inspired AI systems.
Authors: Lifan Hu
Abstract: This work investigates the inverse problem of generator recovery in matrix Lie groups from discretized trajectories. Let $G$ be a real matrix Lie group and $\mathfrak{g} = \text{Lie}(G)$ its corresponding Lie algebra. A smooth trajectory $\gamma(t)$ generated by a fixed Lie algebra element $\xi \in \mathfrak{g}$ follows the exponential flow $\gamma(t) = g_0 \exp(t \xi)$. The central task addressed in this work is the reconstruction of such a latent generator $\xi$ from a discretized sequence of poses $\{g_0, g_1, \dots, g_T\} \subset G$, sampled at uniform time intervals. This problem is formulated as a data-driven regression from normalized sequences of discrete Lie algebra increments $\log\left(g_{t}^{-1} g_{t+1}\right)$ to the constant generator $\xi \in \mathfrak{g}$. A feedforward neural network is trained to learn this mapping across several groups, including $\text{SE}(2)$, $\text{SE}(3)$, $\text{SO}(3)$, and $\text{SL}(2,\mathbb{R})$. It demonstrates strong empirical accuracy under both clean and noisy conditions, which validates the viability of data-driven recovery of Lie group generators using shallow neural architectures. The code is available in the Lie-RL GitHub repository: https://github.com/Anormalm/LieRL-on-Trajectories. Suggestions and collaborations are welcome!
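In the noiseless case, the increments already determine the generator exactly, which the following sketch verifies for $\text{SO}(3)$; the paper's contribution is the learned regression, which matters once the poses are noisy:

```python
import numpy as np
from scipy.linalg import expm, logm

# Ground-truth generator xi in so(3): any skew-symmetric matrix.
xi = np.array([[0.0, -0.3,  0.1],
               [0.3,  0.0, -0.2],
               [-0.1, 0.2,  0.0]])

dt, T = 0.1, 50
poses = [np.eye(3)]                           # g_0 = I; gamma(t) = g_0 exp(t*xi)
for _ in range(T):
    poses.append(poses[-1] @ expm(dt * xi))

# Discrete Lie-algebra increments: log(g_t^{-1} g_{t+1}) = dt * xi exactly here.
increments = [logm(np.linalg.inv(a) @ b).real / dt
              for a, b in zip(poses, poses[1:])]
xi_hat = np.mean(increments, axis=0)
print(np.allclose(xi_hat, xi, atol=1e-8))     # True
```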
Authors: Makoto Takamoto, Daniel O\~noro-Rubio, Wiem Ben Rim, Takashi Maruyama, Bhushan Kotnis
Abstract: Knowledge graph embedding (KGE) models encode the structural information of knowledge graphs to predict new links. Effective training of these models requires distinguishing between positive and negative samples with high precision. Although prior research has shown that improving the quality of negative samples can significantly enhance model accuracy, identifying high-quality negative samples remains a challenging problem. This paper theoretically investigates the condition under which negative samples lead to optimal KG embedding and identifies a sufficient condition for an effective negative sample distribution. Based on this theoretical foundation, we propose \textbf{E}mbedding \textbf{MU}tation (\textsc{EMU}), a novel framework that \emph{generates} negative samples satisfying this condition, in contrast to conventional methods that focus on \emph{identifying} challenging negative samples within the training data. Importantly, the simplicity of \textsc{EMU} ensures seamless integration with existing KGE models and negative sampling methods. To evaluate its efficacy, we conducted comprehensive experiments across multiple datasets. The results consistently demonstrate significant improvements in link prediction performance across various KGE models and negative sampling methods. Notably, \textsc{EMU} enables performance improvements comparable to those achieved by models with an embedding dimension five times larger. An implementation of the method and experiments is available at https://github.com/nec-research/EMU-KG.
Authors: Christina Halmich, Lucas H\"oschler, Christoph Schranz, Christian Borgelt
Abstract: The integration of machine learning and deep learning has transformed data analytics in biomechanics, enabled by extensive wearable sensor data. However, the field faces challenges such as limited large-scale datasets and high data acquisition costs, which hinder the development of robust algorithms. Data augmentation techniques show promise in addressing these issues, but their application to biomechanical time-series data requires comprehensive evaluation. This scoping review investigates data augmentation methods for time-series data in the biomechanics domain. It analyzes current approaches for augmenting and generating time-series datasets, evaluates their effectiveness, and offers recommendations for applying these techniques in biomechanics. Four databases, PubMed, IEEE Xplore, Scopus, and Web of Science, were searched for studies published between 2013 and 2024. Following PRISMA-ScR guidelines, a two-stage screening identified 21 relevant publications. Results show that there is no universally preferred method for augmenting biomechanical time-series data; instead, methods vary based on study objectives. A major issue identified is the absence of soft tissue artifacts in synthetic data, leading to discrepancies referred to as the synthetic gap. Moreover, many studies lack proper evaluation of augmentation methods, making it difficult to assess their effects on model performance and data quality. This review highlights the critical role of data augmentation in addressing limited dataset availability and improving model generalization in biomechanics. Tailoring augmentation strategies to the characteristics of biomechanical data is essential for advancing predictive modeling. A better understanding of how different augmentation methods impact data quality and downstream tasks will be key to developing more effective and realistic techniques.
Authors: Samuel Bilson, Maurice Cox, Anna Pustogvar, Andrew Thompson
Abstract: Machine learning (ML) classification models are increasingly being used in a wide range of applications where it is important that predictions are accompanied by uncertainties, including in climate and earth observation, medical diagnosis and bioaerosol monitoring. The output of an ML classification model is a type of categorical variable known as a nominal property in the International Vocabulary of Metrology (VIM). However, concepts related to uncertainty evaluation for nominal properties are not defined in the VIM, nor is such evaluation addressed by the Guide to the Expression of Uncertainty in Measurement (GUM). In this paper we propose a metrological conceptual uncertainty evaluation framework for ML classification, and illustrate its use in the context of two applications that exemplify the issues and have significant societal impact, namely, climate and earth observation and medical diagnosis. Our framework would enable an extension of the VIM and GUM to uncertainty for nominal properties, which would make both applicable to ML classification models.
Authors: Claudius Krause, Daohan Wang, Ramon Winterhalder
Abstract: The increasing complexity of modern neural network architectures demands fast and memory-efficient implementations to mitigate computational bottlenecks. In this work, we evaluate the recently proposed BitNet architecture in high-energy physics (HEP) applications, assessing its performance in classification, regression, and generative modeling tasks. Specifically, we investigate its suitability for quark-gluon discrimination, SMEFT parameter estimation, and detector simulation, comparing its efficiency and accuracy to state-of-the-art methods. Our results show that while BitNet consistently performs competitively in classification tasks, its performance in regression and generation varies with the size and type of the network, highlighting key limitations and potential areas for improvement.
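For context, the weight quantization used by BitNet-style layers can be sketched as follows; this is the absmean ternary scheme described for BitNet b1.58, reproduced here as a rough illustration rather than the exact implementation evaluated in the paper:

```python
import numpy as np

def absmean_ternary(W, eps=1e-8):
    """Quantize a weight matrix to {-1, 0, +1}: scale by the mean absolute
    value gamma, then round and clip. The layer's effective weights are
    gamma * W_q, so matrix multiplies reduce to additions/subtractions."""
    gamma = np.abs(W).mean()
    W_q = np.clip(np.round(W / (gamma + eps)), -1, 1)
    return W_q, gamma

W = np.random.randn(4, 4) * 0.05
W_q, gamma = absmean_ternary(W)
print(W_q)          # entries in {-1, 0, 1}; gamma restores the overall scale
```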
Authors: Gianluca Maselli, Vieri Giuliano Santucci
Abstract: Dealing with environments with sparse rewards has always been crucial for systems developed to operate in autonomous open-ended learning settings. Intrinsic Motivations could be an effective way to help Deep Reinforcement Learning algorithms learn in such scenarios. In fact, intrinsic reward signals, such as novelty or curiosity, are generally adopted to improve exploration when extrinsic rewards are delayed or absent. Building on previous works, we tackle the problem of learning policies in the presence of sparse rewards by proposing a two-level architecture that alternates between an "intrinsically driven" phase of exploration and autonomous sub-goal generation, and a phase of sparse-reward, goal-directed policy learning. The idea is to build several small networks, each specialized on a particular sub-path, and use them as starting points for future exploration without the need to re-explore previously learnt paths from scratch. Two versions of the system have been trained and tested in the Gym SuperMarioBros environment without considering any additional extrinsic reward. The results show the validity of our approach and the importance of autonomously segmenting the environment to generate an efficient path towards the final goal.
Authors: Sathish Kumar, Swaroop Damodaran, Naveen Kumar Kuruba, Sumit Jha, Arvind Ramanathan
Abstract: This paper presents a novel deep learning framework for robotic arm manipulation that integrates multimodal inputs using a late-fusion strategy. Unlike traditional end-to-end or reinforcement learning approaches, our method processes image sequences with pre-trained models and robot state data with machine learning algorithms, fusing their outputs to predict continuous action values for control. Evaluated on BridgeData V2 and Kuka datasets, the best configuration (VGG16 + Random Forest) achieved MSEs of 0.0021 and 0.0028, respectively, demonstrating strong predictive performance and robustness. The framework supports modularity, interpretability, and real-time decision-making, aligning with the goals of adaptive, human-in-the-loop cyber-physical systems.
Authors: Alexander Mattick, Maniraman Periyasamy, Christian Ufrecht, Abhishek Y. Dubey, Christopher Mutschler, Axel Plinge, Daniel D. Scherer
Abstract: Quantum computing is currently strongly limited by the impact of noise, in particular introduced by the application of two-qubit gates. For this reason, reducing the number of two-qubit gates is of paramount importance on noisy intermediate-scale quantum hardware. To advance towards more reliable quantum computing, we introduce a framework based on ZX calculus, graph neural networks and reinforcement learning for quantum circuit optimization. By combining reinforcement learning and tree search, our method addresses the challenge of selecting optimal sequences of ZX calculus rewrite rules. Instead of relying on existing heuristic rules for minimizing circuits, our method trains a novel reinforcement learning policy that directly operates on ZX-graphs, therefore allowing us to search through the space of all possible circuit transformations to find a circuit that significantly minimizes the number of CNOT gates. This way we can scale beyond hard-coded rules towards discovering arbitrary optimization rules. We demonstrate our method's competitiveness with state-of-the-art circuit optimizers and its generalization capabilities on large sets of diverse random circuits.
Authors: Van-Anh Nguyen, Thanh-Toan Do, Mehrtash Harandi, Dinh Phung, Trung Le
Abstract: Foundation models, with a vast number of parameters and pretraining on massive datasets, achieve state-of-the-art performance across various applications. However, efficiently adapting them to downstream tasks with minimal computational overhead remains a challenge. Parameter-Efficient Transfer Learning (PETL) addresses this by fine-tuning only a small subset of parameters while preserving pre-trained knowledge. In this paper, we propose SaS, a novel PETL method that effectively mitigates distributional shifts during fine-tuning. SaS integrates (1) a shared module that captures common statistical characteristics across layers using low-rank projections and (2) a layer-specific module that employs hypernetworks to generate tailored parameters for each layer. This dual design ensures an optimal balance between performance and parameter efficiency while introducing less than 0.05% additional parameters, making it significantly more compact than existing methods. Extensive experiments on diverse downstream tasks, few-shot settings and domain generalization demonstrate that SaS significantly enhances performance while maintaining superior parameter efficiency compared to existing methods, highlighting the importance of capturing both shared and layer-specific information in transfer learning. Code and data are available at https://anonymous.4open.science/r/SaS-PETL-3565.
Authors: Spyros Kondylatos, Nikolaos Ioannis Bountos, Ioannis Prapas, Angelos Zavras, Gustau Camps-Valls, Ioannis Papoutsis
Abstract: Label noise poses a significant challenge in Earth Observation (EO), often degrading the performance and reliability of supervised Machine Learning (ML) models. Yet, given the critical nature of several EO applications, developing robust and trustworthy ML solutions is essential. In this study, we take a step in this direction by leveraging probabilistic ML to model input-dependent label noise and quantify data uncertainty in EO tasks, accounting for the unique noise sources inherent in the domain. We train uncertainty-aware probabilistic models across a broad range of high-impact EO applications, spanning diverse noise sources, input modalities, and ML configurations, and introduce a dedicated pipeline to assess their accuracy and reliability. Our experimental results show that the uncertainty-aware models consistently outperform the standard deterministic approaches across most datasets and evaluation metrics. Moreover, through rigorous uncertainty evaluation, we validate the reliability of the predicted uncertainty estimates, enhancing the interpretability of model predictions. Our findings emphasize the importance of modeling label noise and incorporating uncertainty quantification in EO, paving the way for more accurate, reliable, and trustworthy ML solutions in the field.
Authors: Dennis Wilkman, Kateryna Morozovska, Karl Henrik Johansson, Matthieu Barreau
Abstract: Recent works on the application of Physics-Informed Neural Networks to traffic density estimation have shown promise for future developments due to their robustness to model errors and noisy data. In this paper, we introduce a methodology for online approximation of the traffic density using measurements from probe vehicles in two settings: one using the Greenshields model and the other considering a high-fidelity traffic simulation. The proposed method continuously estimates the real-time traffic density in space and performs model identification with each new set of measurements. The density estimate is updated in almost real-time using gradient descent and adaptive weights. In the case of full model knowledge, the resulting algorithm has similar performance to the classical open-loop one. However, in the case of model mismatch, the iterative solution behaves as a closed-loop observer and outperforms the baseline method. Similarly, in the high-fidelity setting, the proposed algorithm correctly reproduces the traffic characteristics.
Authors: Federica Bragone, Kateryna Morozovska, Tor Laneryd, Khemraj Shukla, Stefano Markidis
Abstract: The degree of polymerization (DP) is one of the methods for estimating the aging of polymer-based insulation systems, such as cellulose insulation in power components. The main degradation mechanisms in polymers are hydrolysis, pyrolysis, and oxidation; combined, these mechanisms cause a reduction of the DP. However, data availability for these types of problems is usually scarce. This study analyzes insulation aging using cellulose degradation data from power transformers. The aging problem for the cellulose immersed in mineral oil inside power transformers is modeled with ordinary differential equations (ODEs). We recover the governing equations of the degradation system using Physics-Informed Neural Networks (PINNs) and symbolic regression. We apply PINNs to discover the Arrhenius equation's unknown parameters in the Ekenstam ODE describing cellulose contamination content and the material aging process related to temperature, for both synthetic data and real DP values. A modification of the Ekenstam ODE is given by Emsley's system of ODEs, where the rate constant expressed by the Arrhenius equation decreases in time in the new formulation. We use PINNs and symbolic regression to recover the functional form of one of the ODEs of the system and to identify an unknown parameter.
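For reference, the Ekenstam and Emsley relations as they usually appear in the cellulose-aging literature are given below; this is my rendering of the standard forms, not the paper's exact notation:

```latex
% Ekenstam: the reciprocal DP grows linearly in time, with the rate constant k
% given by the Arrhenius equation (A and E_a are the unknowns the PINN recovers):
\frac{1}{DP(t)} - \frac{1}{DP(0)} \;=\; k\,t,
\qquad k \;=\; A\, e^{-E_a/(R\,T)} .
% Emsley's modification lets the rate decay in time, k(t) = k_0 e^{-k_2 t}, giving
\frac{1}{DP(t)} - \frac{1}{DP(0)} \;=\; \frac{k_0}{k_2}\left(1 - e^{-k_2 t}\right).
```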
Authors: John Paisley, Wei Zhang, Brian Barr
Abstract: We present three Fisher divergence (FD) minimization algorithms for learning Gaussian process (GP) based score models for lower dimensional density estimation problems. The density is formed by multiplying a base multivariate normal distribution with an exponentiated GP refinement, and so we refer to it as a GP-tilted nonparametric density. By representing the GP part of the score as a linear function using the random Fourier feature (RFF) approximation, we show that all learning problems can be solved in closed form. This includes the basic and noise conditional versions of the Fisher divergence, as well as a novel alternative to noise conditional FD models based on variational inference (VI). Here, we propose using an ELBO-like optimization of the approximate posterior with which we derive a Fisher variational predictive distribution. The RFF representation of the GP, which is functionally equivalent to a single layer neural network score model with cosine activation, provides a unique linear form for which all expectations are in closed form. The Gaussian base also helps with tractability of the VI approximation. We demonstrate our three learning algorithms, as well as a MAP baseline algorithm, on several low dimensional density estimation problems. The closed-form nature of the learning problem removes the reliance on iterative algorithms, making this technique particularly well-suited to large data sets.
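The RFF representation mentioned above has a compact form; a minimal sketch of the feature map (an RBF-kernel approximation; the paper's exact kernel and base choices may differ):

```python
import numpy as np

def rff_features(X, D=256, lengthscale=1.0, seed=0):
    """Random Fourier features phi(x) = sqrt(2/D) * cos(W x + b), which make
    the GP part of the score linear in its weights theta. With the score
    s(x) = score of the Gaussian base + grad_x [phi(x)^T theta], the Fisher
    divergence objective is quadratic in theta and solvable in closed form."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / lengthscale, size=(D, X.shape[1]))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

# phi = rff_features(X); each column is a cosine feature, so the model is
# functionally a single-layer network with cosine activation, as the abstract notes.
```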
Authors: Luis Barba, Johannes Kirschner, Tomas Aidukas, Manuel Guizar-Sicairos, Benjam\'in B\'ejar
Abstract: We introduce Diffusion Active Learning, a novel approach that combines generative diffusion modeling with data-driven sequential experimental design to adaptively acquire data for inverse problems. Although broadly applicable, we focus on scientific computed tomography (CT) for experimental validation, where structured prior datasets are available, and reducing data requirements directly translates to shorter measurement times and lower X-ray doses. We first pre-train an unconditional diffusion model on domain-specific CT reconstructions. The diffusion model acts as a learned prior that is data-dependent and captures the structure of the underlying data distribution, which is then used in two ways: It drives the active learning process and also improves the quality of the reconstructions. During the active learning loop, we employ a variant of diffusion posterior sampling to generate conditional data samples from the posterior distribution, ensuring consistency with the current measurements. Using these samples, we quantify the uncertainty in the current estimate to select the most informative next measurement. Our results show substantial reductions in data acquisition requirements, corresponding to lower X-ray doses, while simultaneously improving image reconstruction quality across multiple real-world tomography datasets.
Authors: Alexander Windmann, Henrik Steude, Daniel Boschmann, Oliver Niggemann
Abstract: Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due to insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models), we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility.
Authors: Alex Young, Luan Vin\'icius Fiorio, Bo Yang, Boris Karanov, Wim van Houtum, Ronald M. Aarts
Abstract: We propose a \emph{hybrid} real- and complex-valued \emph{neural network} (HNN) architecture, designed to combine the computational efficiency of real-valued processing with the ability to effectively handle complex-valued data. We illustrate the limitations of using real-valued neural networks (RVNNs) for inherently complex-valued problems by showing how an RVNN learns to perform complex-valued convolution, but with notable inefficiencies stemming from its real-valued constraints. To create the HNN, we propose building blocks containing both real- and complex-valued paths, where information between domains is exchanged through domain conversion functions. We also introduce novel complex-valued activation functions with higher generalisation and parameterisation efficiency. HNN-specific architecture search techniques are described to navigate the larger solution space. Experiments with the AudioMNIST dataset demonstrate that the HNN reduces cross-entropy loss and uses fewer parameters compared to an RVNN in all considered cases. These results highlight the potential of partially complex-valued processing in neural networks and the applicability of HNNs in many signal processing domains.
Authors: Wai Fong Tam, Qilei Li, Ahmed M. Abdelmonie
Abstract: Federated learning enables collaborative model training across distributed entities while maintaining individual data privacy. A key challenge in federated learning is balancing the personalization of models for local clients with generalization for the global model. Recent efforts leverage logit-based knowledge aggregation and distillation to overcome these issues. However, due to the non-IID nature of data across diverse clients and the imbalance in clients' data distributions, directly aggregating the logits often produces biased knowledge that fails to apply to individual clients and obstructs the convergence of local training. To solve this issue, we propose a Hierarchical Knowledge Structuring (HKS) framework that formulates sample logits into a multi-granularity codebook to represent logits from personalized per-sample insights to globalized per-class knowledge. An unsupervised bottom-up clustering method is leveraged to enable the global server to provide multi-granularity responses to local clients. These responses allow local training to integrate supervised learning objectives with global generalization constraints, which results in more robust representations and improved knowledge sharing in subsequent training rounds. The proposed framework's effectiveness is validated across various benchmarks and model architectures.
Authors: Md Abu Obaida Zishan, H M Shihab, Sabik Sadman Islam, Maliha Alam Riya, Gazi Mashrur Rahman, Jannatun Noor
Abstract: The electrocardiogram (ECG) monitoring device is an expensive albeit essential device for the treatment and diagnosis of cardiovascular diseases (CVD). The cost of this device typically ranges from $2000 to $10000. Several studies have implemented ECG monitoring systems in micro-controller units (MCU) to reduce industrial development costs by up to 20 times. However, to match industry-grade systems and display heartbeats effectively, it is essential to develop an efficient algorithm for detecting arrhythmia (irregular heartbeat). Hence, in this study, a dense neural network is developed to detect arrhythmia on the Arduino Nano. The Nano consists of the ATMega328 microcontroller with a 16MHz clock, 2KB of SRAM, and 32KB of program memory. Additionally, the AD8232 SparkFun Single-Lead Heart Rate Monitor is used as the ECG sensor. The implemented neural network model consists of two layers (excluding the input) with 10 and 4 neurons, respectively, with sigmoid activation functions; four approaches are explored to choose the appropriate activation functions. The model has a size of 1.267 KB, achieves an F1 score (macro-average) of 78.3% for classifying four types of arrhythmia, an accuracy of 96.38%, and requires 0.001314 million floating-point operations (MFLOPs).
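For concreteness, a minimal NumPy sketch of the described two-layer network (10 and 4 sigmoid units); the input width of 32 features is a placeholder, since the abstract does not state it:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyArrhythmiaNet:
    """Two dense layers (10 -> 4 units) with sigmoid activations, matching
    the architecture described in the abstract. Input width is a placeholder."""
    def __init__(self, n_features=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (n_features, 10)); self.b1 = np.zeros(10)
        self.W2 = rng.normal(0, 0.5, (10, 4)); self.b2 = np.zeros(4)

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)
        return sigmoid(h @ self.W2 + self.b2)  # scores for 4 arrhythmia classes

net = TinyArrhythmiaNet()
print(net.forward(np.zeros(32)))  # four class scores
```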
Authors: Kamil Ciosek, Nicol\`o Felicioni, Sina Ghiassian
Abstract: Detecting whether an LLM hallucinates is an important research challenge. One promising way of doing so is to estimate the semantic entropy (Farquhar et al., 2024) of the distribution of generated sequences. We propose a new algorithm for doing that, with two main advantages. First, because we take a Bayesian approach, we achieve a much better quality of semantic entropy estimates for a given budget of samples from the LLM. Second, we are able to tune the number of samples adaptively so that 'harder' contexts receive more samples. We demonstrate empirically that our approach systematically beats the baselines, requiring only 59% of the samples used by Farquhar et al. (2024) to achieve the same quality of hallucination detection as measured by AUROC. Moreover, quite counterintuitively, our estimator is useful even with just one sample from the LLM.
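As a simplified picture of semantic entropy estimation, the sketch below clusters sampled answers and computes the entropy of the cluster weights; exact string matching stands in for the NLI-based semantic equivalence used in this line of work, and the Dirichlet smoothing is only a toy nod to the Bayesian treatment:

```python
import numpy as np
from collections import Counter

def semantic_entropy(answers, alpha=1.0):
    """Entropy over semantic clusters of sampled answers. Clustering here is
    exact string match, a stand-in for NLI-based equivalence; alpha is a
    Dirichlet smoothing prior giving a posterior-mean estimate of weights."""
    counts = np.array(list(Counter(answers).values()), dtype=float)
    probs = (counts + alpha) / (counts.sum() + alpha * len(counts))
    return float(-(probs * np.log(probs)).sum())

samples = ["Paris", "Paris", "paris is the capital", "Lyon"]
print(semantic_entropy(samples))  # higher entropy suggests hallucination risk
```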
Authors: Benjamin T. Brown, Haoxiang Zhang, Daniel L. Lau, Gonzalo R. Arce
Abstract: In graph signal processing, learning the weighted connections between nodes from a set of sample signals is a fundamental task when the underlying relationships are not known a priori. This task is typically addressed by finding a graph Laplacian on which the observed signals are smooth. With the extension of graphs to hypergraphs - where edges can connect more than two nodes - graph learning methods have similarly been generalized to hypergraphs. However, the absence of a unified framework for calculating total variation has led to divergent definitions of smoothness and, consequently, differing approaches to hyperedge recovery. We confront this challenge by generalizing several previously proposed hypergraph total variations, which allows easy substitution into a vector-based optimization. To this end, we propose a novel hypergraph learning method that recovers a hypergraph topology from time-series signals based on a smoothness prior. Our approach addresses key limitations in prior works, such as hyperedge selection and convergence issues, by formulating the problem as a convex optimization solved via a forward-backward-forward algorithm, ensuring guaranteed convergence. Additionally, we introduce a process that simultaneously limits the span of the hyperedge search and maintains a valid hyperedge selection set. In doing so, our method becomes scalable to increasingly complex network structures. The experimental results demonstrate improved performance, in terms of accuracy, over other state-of-the-art hypergraph inference methods; furthermore, we empirically show our method to be robust to total variation terms, biased towards global smoothness, and scalable to larger hypergraphs.
Authors: Ryan G. Dempsey, Jonathan Ethier, Halim Yanikomeroglu
Abstract: Path loss modeling is a widely used technique for estimating point-to-point losses along a communications link from transmitter (Tx) to receiver (Rx). Accurate path loss predictions can optimize use of the radio frequency spectrum and minimize unwanted interference. Modern path loss modeling often leverages data-driven approaches, using machine learning to train models on drive test measurement datasets. Drive tests primarily represent downlink scenarios, where the Tx is located on a building and the Rx is located on a moving vehicle. Consequently, trained models are frequently reserved for downlink coverage estimation, lacking representation of uplink scenarios. In this paper, we demonstrate that data augmentation can be used to train a path loss model that is generalized to uplink, downlink, and backhaul scenarios, training using only downlink drive test measurements. By adding a small number of synthetic samples representing uplink scenarios to the training set, root mean squared error is reduced by >8 dB on uplink examples in the test set.
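The abstract does not spell out the augmentation recipe; one plausible reading, sketched below under the assumption of channel reciprocity, is to synthesize uplink rows from downlink measurements by swapping the endpoint heights (all column names are illustrative):

```python
import pandas as pd

# Toy downlink drive-test rows: Tx on a building, Rx on a vehicle.
downlink = pd.DataFrame({
    "tx_height_m": [30.0, 25.0], "rx_height_m": [1.5, 1.5],
    "distance_m": [500.0, 800.0], "path_loss_db": [110.2, 118.7],
})

def make_uplink(df):
    """Synthetic uplink rows via channel reciprocity: swap endpoint heights,
    keep distance and the measured loss."""
    up = df.copy()
    up[["tx_height_m", "rx_height_m"]] = df[["rx_height_m", "tx_height_m"]].values
    return up

# Add only a small number of synthetic uplink samples to the training set.
augmented = pd.concat([downlink, make_uplink(downlink.sample(frac=0.5, random_state=0))])
```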
Authors: Weizheng Zhang (Shandong University), Hao Pan (Tsinghua University), Lin Lu (Shandong University), Xiaowei Duan (Shandong University), Xin Yan (Shandong University), Ruonan Wang (Institute of Engineering Thermophysics, Chinese Academy of Sciences), Qiang Du (Institute of Engineering Thermophysics, Chinese Academy of Sciences)
Abstract: Heat exchangers are critical components in a wide range of engineering applications, from energy systems to chemical processing, where efficient thermal management is essential. The design objectives for heat exchangers include maximizing the heat exchange rate while minimizing the pressure drop, requiring both a large interface area and a smooth internal structure. State-of-the-art designs, such as triply periodic minimal surfaces (TPMS), have proven effective in optimizing heat exchange efficiency. However, TPMS designs are constrained by predefined mathematical equations, limiting their adaptability to freeform boundary shapes. Additionally, TPMS structures do not inherently control flow directions, which can lead to flow stagnation and undesirable pressure drops. This paper presents DualMS, a novel computational framework for optimizing dual-channel minimal surfaces specifically for heat exchanger designs in freeform shapes. To the best of our knowledge, this is the first attempt to directly optimize minimal surfaces for two-fluid heat exchangers, rather than relying on TPMS. Our approach formulates the heat exchange maximization problem as a constrained connected maximum cut problem on a graph, with flow constraints guiding the optimization process. To address undesirable pressure drops, we model the minimal surface as a classification boundary separating the two fluids, incorporating an additional regularization term for area minimization. We employ a neural network that maps spatial points to binary flow types, enabling it to classify flow skeletons and automatically determine the surface boundary. DualMS demonstrates greater flexibility in surface topology compared to TPMS and achieves superior thermal performance, with lower pressure drops while maintaining a similar heat exchange rate under the same material cost.
Authors: Sangwoo Park, Stefan Vlaski, Lajos Hanzo
Abstract: In multi-objective optimization, minimizing the worst objective can be preferable to minimizing the average objective, as this ensures improved fairness across objectives. Due to the non-smooth nature of the resultant min-max optimization problem, classical subgradient-based approaches typically exhibit slow convergence. Motivated by primal-dual consensus techniques in multi-agent optimization and learning, we formulate a smooth variant of the min-max problem based on the augmented Lagrangian. The resultant Exact Pareto Optimization via Augmented Lagrangian (EPO-AL) algorithm scales better with the number of objectives than subgradient-based strategies, while exhibiting lower per-iteration complexity than recent smoothing-based counterparts. We establish that every fixed point of the proposed algorithm is both Pareto and min-max optimal under mild assumptions and demonstrate its effectiveness in numerical simulations.
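To make the smoothing idea concrete, here is a standard construction consistent with the abstract, using the epigraph trick and the classical augmented Lagrangian for inequality constraints; the paper's exact EPO-AL formulation may differ in details:

```latex
% Min-max over objectives f_1,...,f_K via an epigraph variable t:
%   min_{x,t} t   subject to   f_k(x) <= t,   k = 1,...,K.
% The classical augmented Lagrangian for these inequality constraints is
\mathcal{L}_\rho(x, t, \lambda)
  = t + \frac{1}{2\rho} \sum_{k=1}^{K}
    \Bigl( \bigl[\max\{0,\; \lambda_k + \rho\,(f_k(x) - t)\}\bigr]^2
           - \lambda_k^2 \Bigr),
% which is smooth in (x, t), unlike max_k f_k(x) itself.
```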
Authors: Warat Kongkitkul, Sompote Youwai, Warut Sakulpojworachai
Abstract: This research introduces a novel dual-attention transformer architecture for predicting soil electrical resistivity, a critical parameter for high-voltage substation construction. Our model employs attention mechanisms operating across both features and data batches, enhanced by feature embedding layers that project inputs into higher-dimensional spaces. We implement Particle Swarm Optimization for hyperparameter tuning, systematically optimizing embedding dimensions, attention heads, and the neural network architecture. The proposed architecture achieves superior predictive performance (Mean Absolute Percentage Error: 0.63%) compared to recent state-of-the-art models for tabular data. Crucially, our model maintains explainability through SHapley Additive exPlanations (SHAP) value analysis, revealing that fine particle content and dry density are the most influential parameters affecting soil resistivity. We develop a web-based application implementing this model to provide engineers with an accessible decision support framework that bridges geotechnical and electrical engineering requirements for the Electricity Generating Authority of Thailand. This integrated approach satisfies both structural stability and electrical safety standards, improving construction efficiency and safety compliance in high-voltage infrastructure implementation.
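A minimal sketch of Particle Swarm Optimization over a box of hyperparameters, as used here for tuning; the toy objective stands in for the model's validation error, and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(objective, bounds, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm optimizer over box-constrained hyperparameters
    (e.g. embedding dimension and attention heads, relaxed to continuous
    values and rounded inside `objective` if needed)."""
    lo, hi = bounds[:, 0], bounds[:, 1]
    x = rng.uniform(lo, hi, (n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    g = pbest[pbest_f.argmin()].copy()
    for _ in range(n_iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)  # velocity update
        x = np.clip(x + v, lo, hi)
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        g = pbest[pbest_f.argmin()].copy()
    return g, pbest_f.min()

# Toy objective standing in for validation error of the transformer.
best, best_f = pso(lambda p: (p[0] - 64) ** 2 + (p[1] - 8) ** 2,
                   bounds=np.array([[8.0, 256.0], [1.0, 16.0]]))
```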
Authors: Valentin Lombard, Sergei Grudinin, Elodie Laine
Abstract: Proteins move and deform to ensure their biological functions. Despite significant progress in protein structure prediction, approximating conformational ensembles at physiological conditions remains a fundamental open problem. This paper presents a novel perspective on the problem by directly targeting continuous compact representations of protein motions inferred from sparse experimental observations. We develop a task-specific loss function enforcing data symmetries, including scaling and permutation operations. Our method PETIMOT (Protein sEquence and sTructure-based Inference of MOTions) leverages transfer learning from pre-trained protein language models through an SE(3)-equivariant graph neural network. When trained and evaluated on the Protein Data Bank, PETIMOT shows superior performance in time and accuracy, capturing protein dynamics, particularly large/slow conformational changes, compared to state-of-the-art flow-matching approaches and traditional physics-based models.
Authors: Baozhuo Su, Qingli Dou, Kang Liu, Zhengxian Qu, Jerry Deng, Ting Tan, Yanan Gu
Abstract: AI computation in healthcare faces significant challenges when clinical datasets are limited and heterogeneous. Integrating datasets from multiple sources and different equipment is critical for effective AI computation but is complicated by their diversity, complexity, and lack of representativeness, so multiple datasets often need to be joined for analysis. The commonly used method is fusion after normalization, but this can introduce redundant information, decreasing the signal-to-noise ratio and reducing classification accuracy. To tackle this issue, we propose a feature-based fusion algorithm utilizing Kernel Density Estimation (KDE) and Kullback-Leibler (KL) divergence. Our approach first preprocesses the extracted features and estimates their continuous distributions, then employs gradient descent to identify the optimal linear parameters that minimize the KL divergence between the feature distributions. We evaluate the method on our in-house datasets, consisting of ECG signals collected with different equipment from 2000 healthy and 2000 diseased individuals, and verify it on the publicly available PTB-XL dataset, which contains 21,837 ECG recordings from 18,885 patients. We employ a Light Gradient Boosting Machine (LGBM) model for binary classification. The results demonstrate that the proposed fusion method significantly enhances feature-based classification accuracy for abnormal ECG cases in the merged datasets compared to the normalization method. This data fusion strategy provides a new approach to processing heterogeneous datasets for optimal AI computation results.
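A small sketch of the core idea, assuming one scalar feature per equipment source: estimate both feature densities with Gaussian KDE, then run gradient descent (here with a numeric gradient) on a linear map $a x + b$ to minimize the KL divergence between the distributions; all names and constants are illustrative:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
src = rng.normal(2.0, 1.5, 500)   # feature from equipment A
tgt = rng.normal(0.0, 1.0, 500)   # same feature from equipment B

grid = np.linspace(-6, 6, 400)
p_tgt = gaussian_kde(tgt)(grid) + 1e-12

def kl_to_target(params):
    """KL(target || linearly mapped source), both estimated via KDE."""
    a, b = params
    p_src = gaussian_kde(a * src + b)(grid) + 1e-12
    return np.trapz(p_tgt * np.log(p_tgt / p_src), grid)

# Gradient descent with a central-difference gradient on (a, b).
theta, lr, eps = np.array([1.0, 0.0]), 0.05, 1e-3
for _ in range(200):
    grad = np.array([(kl_to_target(theta + eps * e) -
                      kl_to_target(theta - eps * e)) / (2 * eps)
                     for e in np.eye(2)])
    theta -= lr * grad
print("aligned linear map a, b:", theta)
```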
Authors: Yuta Yahagi, Kiichi Obuchi, Fumihiko Kosaka, Kota Matsui
Abstract: Simulation-to-Real (Sim2Real) transfer learning, the machine learning technique that efficiently solves a real-world task by leveraging knowledge from computational data, has received increasing attention in materials science as a promising solution to the scarcity of experimental data. We propose an efficient transfer learning scheme from first-principles calculations to experiments based on a chemistry-informed domain transformation that integrates the heterogeneous source and target domains by harnessing the underlying physics and chemistry. The proposed method maps the computational data from the simulation space (source domain) into the space of experimental data (target domain). During this process, these qualitatively different domains are efficiently bridged by prior knowledge of chemistry, the statistical ensemble, and the relationship between source and target quantities. As a proof of concept, we predict the catalyst activity for the reverse water-gas shift reaction by using abundant first-principles data in addition to the experimental data. Through this demonstration, we confirm that the transfer learning model exhibits positive transfer in both accuracy and data efficiency. In particular, high accuracy is achieved despite using only a few (fewer than ten) target data points in the domain transformation, with a prediction error one order of magnitude smaller than that of a from-scratch model trained on over 100 target data points. This result indicates that the proposed method achieves high prediction performance with few target data, which helps reduce the number of trials in real laboratories.
Authors: Lisha Chen, Quan Xiao, Ellen Hidemi Fukuda, Xinyi Chen, Kun Yuan, Tianyi Chen
Abstract: Multi-objective learning under user-specified preference is common in real-world problems such as multi-lingual speech recognition under fairness. In this work, we frame such a problem as a semivectorial bilevel optimization problem, whose goal is to optimize a pre-defined preference function, subject to the constraint that the model parameters are weakly Pareto optimal. To solve this problem, we convert the multi-objective constraints to a single-objective constraint through a merit function with an easy-to-evaluate gradient, and then, we use a penalty-based reformulation of the bilevel optimization problem. We theoretically establish the properties of the merit function, and the relations of solutions for the penalty reformulation and the constrained formulation. Then we propose algorithms to solve the reformulated single-level problem, and establish its convergence guarantees. We test the method on various synthetic and real-world problems. The results demonstrate the effectiveness of the proposed method in finding preference-guided optimal solutions to the multi-objective problem.
Authors: Evgenii Evstafev
Abstract: Large language models (LLMs) demonstrate increasing capabilities in creative text generation, yet systematic evaluations of their humor production remain underexplored. This study presents a comprehensive analysis of 13 state-of-the-art LLMs across five architectural families, evaluating their performance in generating technically relevant humor for software developers. Through a full factorial design testing 715 unique configurations of temperature settings and prompt variations, we assess model outputs using five weighted criteria: humor quality, domain relevance, concept originality, tone precision, and delivery efficiency. Our methodology employs rigorous statistical analysis including ANOVA, correlation studies, and quadratic regression to identify optimal configurations and architectural influences. Results reveal significant performance variations across models, with certain architectures achieving 21.8% superiority over baseline systems. Temperature sensitivity analysis demonstrates that 73% of models achieve peak performance at lower stochasticity settings (<= 0.5), though optimal ranges vary substantially by architecture. We identify distinct model clusters: compact high-performers maintaining efficiency-quality balance versus verbose specialists requiring longer outputs for marginal gains. Statistical validation confirms model architecture explains 38.7% of performance variance, with significant correlations between humor quality and concept originality. The study establishes practical guidelines for model selection and configuration, demonstrating how temperature adjustments and architectural considerations impact humor generation effectiveness. These findings advance understanding of LLM capabilities in creative technical writing and provide empirically validated configuration strategies for developers implementing humor-generation systems.
Authors: Sudong Wang, Yunjian Zhang, Yao Zhu, Jianing Li, Zizhe Wang, Yanwei Liu, Xiangyang Ji
Abstract: Large Vision-Language Models (LVLMs) are gradually becoming the foundation for many artificial intelligence applications. However, understanding their internal working mechanisms has continued to puzzle researchers, which in turn limits the further enhancement of their capabilities. In this paper, we seek to investigate how multimodal knowledge evolves and eventually induces natural languages in LVLMs. We design a series of novel strategies for analyzing internal knowledge within LVLMs, and delve into the evolution of multimodal knowledge from three levels, including single token probabilities, token probability distributions, and feature encodings. In this process, we identify two key nodes in knowledge evolution: the critical layers and the mutation layers, dividing the evolution process into three stages: rapid evolution, stabilization, and mutation. Our research is the first to reveal the trajectory of knowledge evolution in LVLMs, providing a fresh perspective for understanding their underlying mechanisms. Our codes are available at https://github.com/XIAO4579/Vlm-interpretability.
Authors: Yining Wang, Yuquan Wang, Xi Li, Mi Zhang, Geng Hong, Min Yang
Abstract: As Large Language Models (LLMs) continue to advance, they are increasingly relied upon as real-time sources of information by non-expert users. To ensure the factuality of the information they provide, much research has focused on mitigating hallucinations in LLM responses, but only in the context of formal user queries, rather than maliciously crafted ones. In this study, we introduce The Illusionist's Prompt, a novel hallucination attack that incorporates linguistic nuances into adversarial queries, challenging the factual accuracy of LLMs against five types of fact-enhancing strategies. Our attack automatically generates highly transferable illusory prompts to induce internal factual errors, all while preserving user intent and semantics. Extensive experiments confirm the effectiveness of our attack in compromising black-box LLMs, including commercial APIs like GPT-4o and Gemini-2.0, even with various defensive mechanisms.
Authors: Ariadna Toh\`a-Dalmau (Department of Computer Science, Universitat Polit\`ecnica de Catalunya), Josep Rosin\'es-Fonoll (Institut Cl\'inic d'Oftalmolog\'ia, Hospital Cl\'inic de Barcelona), Enrique Romero (Department of Computer Science, Universitat Polit\`ecnica de Catalunya, Intelligent Data Science and Artificial Intelligence Research Center), Ferran Mazzanti (Department of Physics, Universitat Polit\`ecnica de Catalunya), Ruben Martin-Pinardel (August Pi i Sunyer Biomedical Research Institute), Sonia Marias-Perez (Institut Cl\'inic d'Oftalmolog\'ia, Hospital Cl\'inic de Barcelona), Carolina Bernal-Morales (Institut Cl\'inic d'Oftalmolog\'ia, Hospital Cl\'inic de Barcelona, Diabetes Unit, Hospital Cl\'inic de Barcelona), Rafael Castro-Dominguez (Institut Cl\'inic d'Oftalmolog\'ia, Hospital Cl\'inic de Barcelona), Andrea Mendez (Institut Cl\'inic d'Oftalmolog\'ia, Hospital Cl\'inic de Barcelona), Emilio Ortega (August Pi i Sunyer Biomedical Research Institute, School of Medicine, Universitat de Barcelona), Irene Vinagre (August Pi i Sunyer Biomedical Research Institute, School of Medicine, Universitat de Barcelona), Marga Gimenez (August Pi i Sunyer Biomedical Research Institute, School of Medicine, Universitat de Barcelona), Alfredo Vellido (Department of Computer Science, Universitat Polit\`ecnica de Catalunya, Intelligent Data Science and Artificial Intelligence Research Center), Javier Zarranz-Ventura (Institut Cl\'inic d'Oftalmolog\'ia, Hospital Cl\'inic de Barcelona, School of Medicine, Universitat de Barcelona)
Abstract: This study aimed to develop a machine learning (ML) algorithm capable of determining cardiovascular risk in multimodal retinal images from patients with type 1 diabetes mellitus, distinguishing between moderate, high, and very high-risk levels. Radiomic features were extracted from fundus retinography, optical coherence tomography (OCT), and OCT angiography (OCTA) images. ML models were trained using these features either individually or combined with clinical data. A dataset of 597 eyes (359 individuals) was analyzed, and models trained only with radiomic features achieved AUC values of (0.79 $\pm$ 0.03) for identifying moderate risk cases from high and very high-risk cases, and (0.73 $\pm$ 0.07) for distinguishing between high and very high-risk cases. The addition of clinical variables improved all AUC values, reaching (0.99 $\pm$ 0.01) for identifying moderate risk cases and (0.95 $\pm$ 0.02) for differentiating between high and very high-risk cases. For very high CV risk, radiomics combined with OCT+OCTA metrics and ocular data achieved an AUC of (0.89 $\pm$ 0.02) without systemic data input. These results demonstrate that radiomic features obtained from multimodal retinal images are useful for discriminating and classifying CV risk labels, highlighting the potential of this oculomics approach for CV risk assessment.
Authors: Yangxiao Lu, Ruosen Li, Liqiang Jing, Jikai Wang, Xinya Du, Yunhui Guo, Nicholas Ruozzi, Yu Xiang
Abstract: Visual grounding focuses on detecting objects from images based on language expressions. Recent Large Vision-Language Models (LVLMs) have significantly advanced visual grounding performance by training large models with large-scale datasets. However, the problem remains challenging, especially when similar objects appear in the input image. For example, an LVLM may not be able to differentiate Diet Coke and regular Coke in an image. In this case, if additional reference images of Diet Coke and regular Coke are available, it can help the visual grounding of similar objects. In this work, we introduce a new task named Multimodal Reference Visual Grounding (MRVG). In this task, a model has access to a set of reference images of objects in a database. Based on these reference images and a language expression, the model is required to detect a target object from a query image. We first introduce a new dataset to study the MRVG problem. Then we introduce a novel method, named MRVG-Net, to solve this visual grounding problem. We show that by efficiently using reference images with few-shot object detection and using Large Language Models (LLMs) for object matching, our method achieves superior visual grounding performance compared to the state-of-the-art LVLMs such as Qwen2.5-VL-7B. Our approach bridges the gap between few-shot detection and visual grounding, unlocking new capabilities for visual understanding. Project page with our code and dataset: https://irvlutd.github.io/MultiGrounding
Authors: Lilin Xu, Kaiyuan Hou, Xiaofan Jiang
Abstract: Human activity recognition (HAR) using inertial measurement units (IMUs) increasingly leverages large language models (LLMs), yet existing approaches focus on coarse activities like walking or running. Our preliminary study indicates that pretrained LLMs fail catastrophically on fine-grained HAR tasks such as air-written letter recognition, achieving only near-random guessing accuracy. In this work, we first bridge this gap for flat-surface writing scenarios: by fine-tuning LLMs with a self-collected dataset and few-shot learning, we achieved up to a 129x improvement on 2D data. To extend this to 3D scenarios, we designed an encoder-based pipeline that maps 3D data into 2D equivalents, preserving the spatiotemporal information for robust letter prediction. Our end-to-end pipeline achieves 78% accuracy on word recognition with up to 5 letters in mid-air writing scenarios, establishing LLMs as viable tools for fine-grained HAR.
Authors: Sunghee Jung, Donghun Lee, Shinbok Lee, Gaeun Seo, Daniel Lee, Byeongil Ko, Junrae Cho, Kihyun Kim, Eunggyun Kim, Myeongcheol Shin
Abstract: Tool-Augmented Large Language Models (TA-LLMs) have shown promise in real-world applications, but face challenges in handling incomplete queries and out-of-scope requests. While existing approaches rely mainly on Supervised Fine-Tuning with expert trajectories, we propose DiaTool-DPO, a novel method that enhances TA-LLMs' dialogue capabilities through Direct Preference Optimization. We model TA-LLM interactions as a Markov Decision Process with 5 distinct dialogue states and categorize user queries into 3 types based on their state transition trajectories. We automatically construct paired trajectory datasets of correct and incorrect dialogue flows and introduce a specialized objective loss for dialogue control. Our comprehensive evaluation demonstrates that DiaTool-DPO approaches GPT-4o's performance (94.8% in information gathering, 91% in tool call rejection) with substantial improvements over the baseline (44% and 9.6%, respectively) while maintaining core functionality. Our approach opens new possibilities for developing TA-LLMs that can handle diverse real-world scenarios without requiring additional expert demonstrations or human labeling.
Authors: Anil Ramakrishna, Yixin Wan, Xiaomeng Jin, Kai-Wei Chang, Zhiqi Bu, Bhanukiran Vinzamuri, Volkan Cevher, Mingyi Hong, Rahul Gupta
Abstract: We introduce SemEval-2025 Task 4: unlearning sensitive content from Large Language Models (LLMs). The task features 3 subtasks for LLM unlearning spanning different use cases: (1) unlearn long form synthetic creative documents spanning different genres; (2) unlearn short form synthetic biographies containing personally identifiable information (PII), including fake names, phone numbers, SSNs, and email and home addresses; and (3) unlearn real documents sampled from the target model's training dataset. We received over 100 submissions from over 30 institutions, and we summarize the key techniques and lessons in this paper.
Authors: John Chen, Alexandros Lotsos, Grace Wang, Lexie Zhao, Bruce Sherin, Uri Wilensky, Michael Horn
Abstract: Open coding, a key inductive step in qualitative research, discovers and constructs concepts from human datasets. However, capturing extensive and nuanced aspects or "coding moments" can be challenging, especially with large discourse datasets. While some studies explore machine learning (ML)/Generative AI (GAI)'s potential for open coding, few evaluation studies exist. We compare open coding results from five recently published ML/GAI approaches and four human coders, using a dataset of online chat messages around mobile learning software. Our systematic analysis reveals ML/GAI approaches' strengths and weaknesses, uncovering the complementary potential between humans and AI. Line-by-line AI approaches effectively identify content-based codes, while humans excel in interpreting conversational dynamics. We discuss how embedded analytical processes could shape the results of ML/GAI approaches. Instead of replacing humans in open coding, researchers should integrate AI with and according to their analytical processes, e.g., as parallel co-coders.
Authors: Lele Cao
Abstract: Advances in AI-generated content have led to wide adoption of large language models, diffusion-based visual generators, and synthetic audio tools. However, these developments raise critical concerns about misinformation, copyright infringement, security threats, and the erosion of public trust. In this paper, we explore an extensive range of methods designed to detect and mitigate AI-generated textual, visual, and audio content. We begin by discussing motivations and potential impacts associated with AI-based content generation, including real-world risks and ethical dilemmas. We then outline detection techniques spanning observation-based strategies, linguistic and statistical analysis, model-based pipelines, watermarking and fingerprinting, as well as emergent ensemble approaches. We also present new perspectives on robustness, adaptation to rapidly improving generative architectures, and the critical role of human-in-the-loop verification. By surveying state-of-the-art research and highlighting case studies in academic, journalistic, legal, and industrial contexts, this paper aims to inform robust solutions and policymaking. We conclude by discussing open challenges, including adversarial transformations, domain generalization, and ethical concerns, thereby offering a holistic guide for researchers, practitioners, and regulators to preserve content authenticity in the face of increasingly sophisticated AI-generated media.
Authors: Matheus Martins Batista
Abstract: The growing threat posed by deepfake videos, capable of manipulating realities and disseminating misinformation, drives the urgent need for effective detection methods. This work investigates and compares different approaches for identifying deepfakes, focusing on the GenConViT model and its performance relative to other architectures present in the DeepfakeBenchmark. To contextualize the research, the social and legal impacts of deepfakes are addressed, as well as the technical fundamentals of their creation and detection, including digital image processing, machine learning, and artificial neural networks, with emphasis on Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), and Transformers. The performance evaluation of the models was conducted using relevant metrics and new datasets established in the literature, such as WildDeepfake and DeepSpeak, aiming to identify the most effective tools in the battle against misinformation and media manipulation. The obtained results indicated that GenConViT, after fine-tuning, exhibited superior performance in terms of accuracy (93.82%) and generalization capacity, surpassing other architectures in the DeepfakeBenchmark on the DeepSpeak dataset. This study advances deepfake detection techniques, supporting the development of more robust and effective solutions against the dissemination of false information.
Authors: Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Himabindu Lakkaraju, Yizhou Sun, Shichang Zhang
Abstract: Post-training is essential for the success of large language models (LLMs), transforming pre-trained base models into more useful and aligned post-trained models. While plenty of works have studied post-training algorithms and evaluated post-training models by their outputs, it remains understudied how post-training reshapes LLMs internally. In this paper, we compare base and post-trained LLMs mechanistically from four perspectives to better understand post-training effects. Our findings across model families and datasets reveal that: (1) Post-training does not change the factual knowledge storage locations, and it adapts knowledge representations from the base model while developing new knowledge representations; (2) Both truthfulness and refusal can be represented by linear vectors in the hidden representation space. The truthfulness direction is highly similar between the base and post-trained model, and it is effectively transferable for interventions; (3) The refusal direction is different between the base and post-trained models, and it shows limited forward transferability; (4) Differences in confidence between the base and post-trained models cannot be attributed to entropy neurons. Our study provides insights into the fundamental mechanisms preserved and altered during post-training, facilitates downstream tasks like model steering, and could potentially benefit future research in interpretability and LLM post-training.
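Findings (2) and (3) above concern linear directions in the hidden representation space; a common way such directions are extracted and used for interventions is the difference-of-means construction sketched below (toy activations, not the paper's data):

```python
import numpy as np

def direction_from_contrast(h_pos, h_neg):
    """Difference-of-means direction: mean hidden state on positive-trait
    prompts minus mean on negative-trait prompts, normalized to unit length."""
    d = h_pos.mean(axis=0) - h_neg.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(hidden, direction, alpha=4.0):
    """Intervention: add the direction to a hidden state to push the model's
    behavior along the trait (e.g. toward refusal)."""
    return hidden + alpha * direction

# Toy stand-ins for layer activations on refusal vs. compliance prompts.
h_refuse, h_comply = np.random.randn(100, 768), np.random.randn(100, 768)
refusal_dir = direction_from_contrast(h_refuse, h_comply)
```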
Authors: Lorena Torres Lahoz, Carlos Lima Azevedo, Leonardo Ancora, Paulo Morgado, Zenia Kotval, Bruno Miranda, Francisco Camara Pereira
Abstract: Urban environments significantly influence mental health outcomes, yet effective frameworks for decision-making under deep uncertainty (DMDU) for optimizing urban policies for stress reduction remain underexplored. While existing research has demonstrated the effects of urban design on mental health, there is a lack of systematic scenario-based analysis to guide urban planning decisions. This study addresses this gap by applying Scenario Discovery (SD) in urban planning to evaluate the effectiveness of urban vegetation interventions in stress reduction across different urban environments, using a predictive model based on emotional responses collected from a neuroscience-based outdoor experiment in Lisbon. Combining these insights with detailed urban data from Copenhagen, we identify key intervention thresholds where vegetation-based solutions succeed or fail in mitigating stress responses. Our findings reveal that while increased vegetation generally correlates with lower stress levels, high-density urban environments, crowding, and individual psychological traits (e.g., extraversion) can reduce its effectiveness. This work showcases our Scenario Discovery framework as a systematic approach for identifying robust policy pathways in urban planning, opening the door for its exploration in other urban decision-making contexts where uncertainty and design resiliency are critical.
Authors: Sara Yarham, Mehran Behjati
Abstract: Monitoring air quality and environmental conditions is crucial for public health and effective urban planning. Current environmental monitoring approaches often rely on centralized data collection and processing, which pose significant privacy, security, and scalability challenges. Federated Learning (FL) offers a promising solution to these limitations by enabling collaborative model training across multiple devices without sharing raw data. This decentralized approach addresses privacy concerns while still leveraging distributed data sources. This paper provides a comprehensive review of FL applications in air quality and environmental monitoring, emphasizing its effectiveness in predicting pollutants and managing environmental data. However, the paper also identifies key limitations of FL when applied in this domain, including challenges such as communication overhead, infrastructure demands, generalizability issues, computational complexity, and security vulnerabilities. For instance, communication overhead, caused by the frequent exchange of model updates between local devices and central servers, is a notable challenge. To address this, future research should focus on optimizing communication protocols and reducing the frequency of updates to lessen the burden on network resources. Additionally, the paper suggests further research directions to refine FL frameworks and enhance their applicability in real-world environmental monitoring scenarios. By synthesizing findings from existing studies, this paper highlights the potential of FL to improve air quality management while maintaining data privacy and security, and it provides valuable insights for future developments in the field.
Authors: Rohit Agarwal, Aryan Dessai, Arif Ahmed Sekh, Krishna Agarwal, Alexander Horsch, Dilip K. Prasad
Abstract: The field of varying feature space in online learning settings, also known as haphazard inputs, is very prominent nowadays due to its applicability in various fields. However, the current solutions to haphazard inputs are model-dependent and cannot benefit from the existing advanced deep-learning methods, which necessitate inputs of fixed dimensions. Therefore, we propose to transform the varying feature space in an online learning setting to a fixed-dimension image representation on the fly. This simple yet novel approach is model-agnostic, allowing any vision-based models to be applicable for haphazard inputs, as demonstrated using ResNet and ViT. The image representation handles the inconsistent input data seamlessly, making our proposed approach scalable and robust. We show the efficacy of our method on four publicly available datasets. The code is available at https://github.com/Rohit102497/HaphazardInputsAsImages.
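One minimal way to picture the core transformation, assuming each feature id is assigned a fixed pixel cell (the paper's actual rendering may differ):

```python
import numpy as np

def features_to_image(obs, side=16):
    """Render a varying-dimension observation {feature_id: value} as a fixed
    side x side grayscale image: each feature id owns one pixel cell, and
    unobserved features stay at zero, so any vision model sees a fixed shape."""
    img = np.zeros((side, side), dtype=np.float32)
    for fid, val in obs.items():
        r, c = divmod(fid % (side * side), side)
        img[r, c] = np.tanh(val)  # squash to [-1, 1] for stable inputs
    return img

t1 = features_to_image({0: 0.3, 5: -1.2})          # two features arrive...
t2 = features_to_image({0: 0.1, 7: 2.0, 41: 0.5})  # ...later a different set
```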
URLs: https://github.com/Rohit102497/HaphazardInputsAsImages
Authors: Keith Hubbard, Sheilla Amponsah
Abstract: Nearly every educational institution uses a learning management system (LMS), often producing terabytes of data generated by thousands of people. We examine LMS grade and login data from a regional comprehensive university, specifically documenting key considerations for engineering features from these data when trying to predict student performance. We specifically document changes to LMS data patterns since COVID-19, which are critical for data scientists to account for when using historic data. We compare numerous engineered features and approaches to utilizing those features for machine learning. We finish with a summary of the implications of incorporating these features into more comprehensive student performance models.
Authors: Yuhan Duan, Xin Zhao, Neng Shi, Han-Wei Shen
Abstract: Surrogate models, crucial for approximating complex simulation data across sciences, inherently carry uncertainties that range from simulation noise to model prediction errors. Without rigorous uncertainty quantification, predictions become unreliable and hence hinder analysis. While methods like Monte Carlo dropout and ensemble models exist, they are often costly, fail to isolate uncertainty types, and lack guaranteed coverage in prediction intervals. To address this, we introduce ConfEviSurrogate, a novel Conformalized Evidential Surrogate Model that can efficiently learn high-order evidential distributions, directly predict simulation outcomes, separate uncertainty sources, and provide prediction intervals. A conformal prediction-based calibration step further enhances interval reliability to ensure coverage and improve efficiency. Our ConfEviSurrogate demonstrates accurate predictions and robust uncertainty estimates in diverse simulations, including cosmology, ocean dynamics, and fluid dynamics.
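The conformal calibration step can be illustrated with the standard split-conformal recipe: the (1 - alpha) quantile of held-out calibration residuals widens point predictions into intervals with guaranteed marginal coverage. A minimal sketch with toy residuals:

```python
import numpy as np

def conformal_interval(residuals_cal, y_pred, alpha=0.1):
    """Split conformal prediction: take the finite-sample-corrected
    (1 - alpha) quantile of calibration residuals and widen each point
    prediction into an interval with guaranteed marginal coverage."""
    n = len(residuals_cal)
    q = np.quantile(residuals_cal, np.ceil((n + 1) * (1 - alpha)) / n)
    return y_pred - q, y_pred + q

# Toy calibration: absolute surrogate errors on held-out simulations.
cal_res = np.abs(np.random.randn(500))
lo, hi = conformal_interval(cal_res, y_pred=np.array([1.2, 3.4]))
```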
Authors: Anurag Kulkarni
Abstract: This paper presents a LiDAR-based object detection system with real-time voice specifications, integrating KITTI's 3D point clouds and RGB images through a multi-modal PointNet framework. It achieves 87.0% validation accuracy on a 3000-sample subset, surpassing a 200-sample baseline of 67.5% by combining spatial and visual data, addressing class imbalance with weighted loss, and refining training via adaptive techniques. A Tkinter prototype provides natural Indian male voice output using Edge TTS (en-IN-PrabhatNeural), alongside 3D visualizations and real-time feedback, enhancing accessibility and safety in autonomous navigation, assistive technology, and beyond. The study offers a detailed methodology, comprehensive experimental analysis, and a broad review of applications and challenges, establishing this work as a scalable advancement in human-computer interaction and environmental perception, aligned with current research trends.
Authors: Abhishek Sharma, Dan Goldwasser
Abstract: LLMs' remarkable ability to tackle a wide range of language tasks opened new opportunities for collaborative human-AI problem solving. LLMs can amplify human capabilities by applying their intuitions and reasoning strategies at scale. We explore whether human guides can be simulated, by generalizing from human demonstrations of guiding an AI system to solve complex language problems. We introduce CoLa, a novel self-guided learning paradigm for training automated $\textit{guides}$ and evaluate it on two QA datasets, a puzzle-solving task, and a constrained text generation task. Our empirical results show that CoLa consistently outperforms competitive approaches across all domains. Moreover, a small-sized trained guide outperforms a strong model like GPT-4 when acting as a guide. We compare the strategies employed by humans and automated guides by conducting a human study on a QA dataset. We show that automated guides outperform humans by adapting their strategies to reasoners' capabilities and conduct qualitative analyses highlighting distinct differences in guiding strategies.
Authors: Heeseung Bang, Jung-Hoon Cho, Cathy Wu, Andreas A. Malikopoulos
Abstract: In this paper, we aim to mitigate congestion in traffic management systems by guiding travelers along system-optimal (SO) routes. However, we recognize that most theoretical approaches assume perfect driver compliance, which often does not reflect reality, as drivers tend to deviate from recommendations to fulfill their personal objectives. Therefore, we propose a route recommendation framework that explicitly learns partial driver compliance and optimizes traffic flow under realistic adherence. We first compute an SO edge flow through flow optimization techniques. Next, we train a compliance model based on historical driver decisions to capture individual responses to our recommendations. Finally, we formulate a stochastic optimization problem that minimizes the gap between the target SO flow and the realized flow under conditions of imperfect adherence. Our simulations conducted on a grid network reveal that our approach significantly reduces travel time compared to baseline strategies, demonstrating the practical advantage of incorporating learned compliance into traffic management.
Authors: Shaoyuan Xu, Yang Cheng, Qian Lin, Jan P. Allebach
Abstract: Emotion plays an important role in daily life, as it helps people communicate with and understand each other more efficiently. Facial expressions can be classified into 7 categories: angry, disgust, fear, happy, neutral, sad, and surprise. How to detect and recognize these seven emotions has become a popular topic in the past decade. In this paper, we develop an emotion recognition system that can perform emotion recognition on both still images and real-time video using deep learning. We build our own emotion recognition classification and regression system from scratch, which includes dataset collection, data preprocessing, model training, and testing. Given an image or a real-time video, our system is able to show the classification and regression results for all 7 emotions. The proposed system is tested on 2 different datasets and achieves an accuracy of over 80%. Moreover, the results obtained from real-time testing prove the feasibility of implementing convolutional neural networks in real time to detect emotions accurately and efficiently.
Authors: Shaoyuan Xu, Cheng Lu, Mark Shaw, Peter Bauer, Jan P. Allebach
Abstract: Digital copiers and printers are widely used nowadays. One of the things people care most about is copying and printing quality. In order to improve it, we previously came up with an SVM-based classification method to classify images containing only text, only pictures, or a mixture of both, based on the fact that modern copiers and printers are equipped with processing pipelines designed specifically for different kinds of images. However, in some other applications, we need to distinguish more than three classes. In this paper, we develop a more advanced SVM-based classification method using four additional features to classify five types of images: text, picture, mixed, receipt, and highlight.
Authors: Issa-Mbenard Dabo, J\'er\'emie Bigot
Abstract: The behavior of the random feature model in the high-dimensional regression framework has become a popular issue of interest in the machine learning literature. This model is generally considered for feature vectors $x_i = \Sigma^{1/2} x_i'$, where $x_i'$ is a random vector made of independent and identically distributed (iid) entries, and $\Sigma$ is a positive definite matrix representing the covariance of the features. In this paper, we move beyond this standard assumption by studying the performance of the random features model in the setting of non-iid feature vectors. Our approach is related to the analysis of the spectrum of large random matrices through random matrix theory (RMT) and free probability results. We turn to the analysis of non-iid data by using the notion of a variance profile, which is well studied in RMT. Our main contribution is then the study of the limits of the training and prediction risks associated with the ridge estimator in the random features model as its dimensions grow. We provide asymptotic equivalents of these risks that capture the behavior of ridge regression with random features in a high-dimensional framework. These asymptotic equivalents, which prove to be sharp in numerical experiments, are retrieved by adapting, to our setting, established results from operator-valued free probability theory. Moreover, for various classes of random feature vectors that have not been considered so far in the literature, our approach allows us to show the appearance of the double descent phenomenon when the ridge regularization parameter is small enough.
Authors: Haishan Wang, Mohammad Hassan Vali, Arno Solin
Abstract: 3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness in 3D reconstruction, achieving high-quality results with real-time radiance field rendering. However, a key challenge is the substantial storage cost: reconstructing a single scene typically requires millions of Gaussian splats, each represented by 59 floating-point parameters, resulting in approximately 1 GB of memory. To address this challenge, we propose a compression method that builds separate attribute codebooks and stores only discrete code indices. Specifically, we employ a noise-substituted vector quantization technique to jointly train the codebooks and model features, ensuring consistency between gradient descent optimization and parameter discretization. Our method reduces memory consumption efficiently (around $45\times$) while maintaining competitive reconstruction quality on standard 3D benchmark scenes. Experiments on different codebook sizes show the trade-off between compression ratio and image quality. Furthermore, the trained compressed model remains fully compatible with popular 3DGS viewers and enables faster rendering speed, making it well-suited for practical applications.
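A minimal sketch of a noise-substituted vector quantization forward pass, in the spirit of the technique named above: the hard quantization error is replaced by random noise of the same magnitude so that gradients reach both the features and the codebook (shapes and sizes are illustrative):

```python
import torch

def nsvq(x, codebook):
    """Noise-substituted VQ (training-time forward pass): keep the magnitude
    of the quantization error but randomize its direction, so the pass stays
    differentiable; the codebook receives gradients through the error norm."""
    d = torch.cdist(x, codebook)            # distances to all codes
    q = codebook[d.argmin(dim=1)]           # nearest codebook entries
    noise = torch.randn_like(x)
    noise = noise / noise.norm(dim=1, keepdim=True)
    err_norm = (x - q).norm(dim=1, keepdim=True)
    return x + err_norm * noise             # same error size, random direction

codebook = torch.randn(256, 8, requires_grad=True)  # 256 codes, one attribute
feats = torch.randn(1024, 8, requires_grad=True)    # e.g. per-splat features
out = nsvq(feats, codebook)
out.sum().backward()  # gradients flow to feats and to the codebook
# At inference, store only the integer indices d.argmin(dim=1) per attribute.
```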
Authors: Pascal Chang, Jingwei Tang, Markus Gross, Vinicius C. Azevedo
Abstract: Video editing and generation methods often rely on pre-trained image-based diffusion models. During the diffusion process, however, the reliance on rudimentary noise sampling techniques that do not preserve correlations present in subsequent frames of a video is detrimental to the quality of the results. This either produces high-frequency flickering, or texture-sticking artifacts that are not amenable to post-processing. With this in mind, we propose a novel method for preserving temporal correlations in a sequence of noise samples. This approach is materialized by a novel noise representation, dubbed $\int$-noise (integral noise), that reinterprets individual noise samples as a continuously integrated noise field: pixel values do not represent discrete values, but are rather the integral of an underlying infinite-resolution noise over the pixel area. Additionally, we propose a carefully tailored transport method that uses $\int$-noise to accurately advect noise samples over a sequence of frames, maximizing the correlation between different frames while also preserving the noise properties. Our results demonstrate that the proposed $\int$-noise can be used for a variety of tasks, such as video restoration, surrogate rendering, and conditional video generation. See https://warpyournoise.github.io/ for video results.
Authors: Zhangsong Li
Abstract: In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+\sigma^2}}(\Pi_* X Q_* + \sigma Z)$, where $X$ is an $n \times d$ standard Gaussian design matrix, $Z$ is an $n \times m$ Gaussian noise matrix, $\Pi_*$ is an unknown $n \times n$ permutation matrix, and $Q_*$ is an unknown $d \times m$ matrix on the Grassmannian manifold satisfying $Q_*^{\top} Q_* = \mathbb{I}_m$. Consider the hypothesis testing problem of distinguishing this model from the case where $X$ and $Y$ are independent Gaussian random matrices of sizes $n \times d$ and $n \times m$, respectively. Our results reveal a phase transition phenomenon in the performance of low-degree polynomial algorithms for this task. (1) When $m=o(d)$, we show that all degree-$D$ polynomials fail to distinguish these two models even when $\sigma=0$, provided that $D^4=o\big( \tfrac{d}{m} \big)$. (2) When $m=d$ and $\sigma=\omega(1)$, we show that all degree-$D$ polynomials fail to distinguish these two models provided that $D=o(\sigma)$. (3) When $m=d$ and $\sigma=o(1)$, we show that there exists a constant-degree polynomial that strongly distinguishes these two models. These results establish a smooth transition in the effectiveness of low-degree polynomial algorithms for this problem, highlighting the interplay between the dimensions $m$ and $d$, the noise level $\sigma$, and the computational complexity of the testing task.
Authors: Haozhan Tang, Tianyi Zhang, Oliver Kroemer, Matthew Johnson-Roberson, Weiming Zhi
Abstract: Robots operating in unstructured environments often require accurate and consistent object-level representations. This typically requires segmenting individual objects from the robot's surroundings. While recent large models such as Segment Anything (SAM) offer strong performance in 2D image segmentation, these advances do not translate directly to performance in the physical 3D world, where they often over-segment objects and fail to produce consistent mask correspondences across views. In this paper, we present GraphSeg, a framework for generating consistent 3D object segmentations from a sparse set of 2D images of the environment without any depth information. GraphSeg constructs dual correspondence graphs: one from 2D pixel-level similarities and one from inferred 3D structure. We formulate segmentation as a problem of edge addition followed by graph contraction, which merges multiple 2D masks into unified object-level segmentations. We can then leverage \emph{3D foundation models} to produce segmented 3D representations. GraphSeg achieves robust segmentation with significantly fewer images and greater accuracy than prior methods. We demonstrate state-of-the-art performance on tabletop scenes and show that GraphSeg enables improved performance on downstream robotic manipulation tasks. Code is available at https://github.com/tomtang502/graphseg.git.
Authors: Fatemeh Javadian, Zahra Aminparast, Johannes Stegmaier, Abin Jose
Abstract: This study explores the application of supervised and unsupervised autoencoders (AEs) to automate nuclei classification in clear cell renal cell carcinoma (ccRCC) images, a diagnostic task traditionally reliant on subjective visual grading by pathologists. We evaluate various AE architectures, including standard AEs, contractive AEs (CAEs), and discriminative AEs (DAEs), as well as a classifier-based discriminative AE (CDAE), optimized using the hyperparameter tuning tool Optuna. Bhattacharyya distance is selected from several metrics to assess class separability in the latent space, revealing challenges in distinguishing adjacent grades using unsupervised models. CDAE, integrating a supervised classifier branch, demonstrated superior performance in both latent space separation and classification accuracy. Given that CDAE-CNN achieved notable improvements in classification metrics, affirming the value of supervised learning for class-specific feature extraction, F1 score was incorporated into the tuning process to optimize classification performance. Results show significant improvements in identifying aggressive ccRCC grades by leveraging the classification capability of AE through latent clustering followed by fine-grained classification. Our model outperforms the current state of the art, CHR-Network, across all evaluated metrics. These findings suggest that integrating a classifier branch in AEs, combined with neural architecture search and contrastive learning, enhances grading automation in ccRCC pathology, particularly in detecting aggressive tumor grades, and may improve diagnostic accuracy.
Authors: Jing Bi, Susan Liang, Xiaofei Zhou, Pinxin Liu, Junjia Guo, Yunlong Tang, Luchuan Song, Chao Huang, Guangyu Sun, Jinxi He, Jiarui Wu, Shu Yang, Daoan Zhang, Chen Chen, Lianggong Bruce Wen, Zhang Liu, Jiebo Luo, Chenliang Xu
Abstract: Reasoning is central to human intelligence, enabling structured problem-solving across diverse tasks. Recent advances in large language models (LLMs) have greatly enhanced their reasoning abilities in arithmetic, commonsense, and symbolic domains. However, effectively extending these capabilities into multimodal contexts-where models must integrate both visual and textual inputs-continues to be a significant challenge. Multimodal reasoning introduces complexities, such as handling conflicting information across modalities, which require models to adopt advanced interpretative strategies. Addressing these challenges involves not only sophisticated algorithms but also robust methodologies for evaluating reasoning accuracy and coherence. This paper offers a concise yet insightful overview of reasoning techniques in both textual and multimodal LLMs. Through a thorough and up-to-date comparison, we clearly formulate core reasoning challenges and opportunities, highlighting practical methods for post-training optimization and test-time inference. Our work provides valuable insights and guidance, bridging theoretical frameworks and practical implementations, and sets clear directions for future research.
Authors: Xuelian Bao, Lulu Kang, Chun Liu, Yiwei Wang
Abstract: In this work, we propose a novel particle-based variational inference (ParVI) method that accelerates EVI-Im. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, our approach efficiently drives particles towards the target distribution. Unlike EVI-Im, which employs the implicit Euler method to solve variational-preserving particle dynamics for minimizing the KL divergence, derived using a "discretize-then-variational" approach, the proposed algorithm avoids repeated evaluation of inter-particle interaction terms, significantly reducing computational cost. The framework is also extensible to other gradient-based sampling techniques. Through several numerical experiments, we demonstrate that our method outperforms existing ParVI approaches in efficiency, robustness, and accuracy.
Authors: Yuxiang Zheng, Dayuan Fu, Xiangkun Hu, Xiaojie Cai, Lyumanshan Ye, Pengrui Lu, Pengfei Liu
Abstract: Large Language Models (LLMs) equipped with web search capabilities have demonstrated impressive potential for deep research tasks. However, current approaches predominantly rely on either manually engineered prompts (prompt engineering-based) with brittle performance or reinforcement learning within controlled Retrieval-Augmented Generation (RAG) environments (RAG-based) that fail to capture the complexities of real-world interaction. In this paper, we introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web. We implement a specialized multi-agent architecture in which browsing agents extract relevant information from various webpage structures while overcoming significant technical challenges. Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines and up to 7.2 points over RAG-based RL agents. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers. Our results highlight that end-to-end training in real-world web environments is not merely an implementation detail but a fundamental requirement for developing robust research capabilities aligned with real-world applications. We release DeepResearcher at https://github.com/GAIR-NLP/DeepResearcher.
Authors: Yu Inatsu
Abstract: Bayesian optimization based on Gaussian process upper confidence bound (GP-UCB) has a theoretical guarantee for optimizing black-box functions. Black-box functions often have input uncertainty, but even in this case, GP-UCB can be extended to optimize evaluation measures called robustness measures. However, GP-UCB-based methods for robustness measures include a trade-off parameter $\beta$, which must be excessively large to achieve theoretical validity, just like the original GP-UCB. In this study, we propose a new method called randomized robustness measure GP-UCB (RRGP-UCB), which samples the trade-off parameter $\beta$ from a probability distribution based on a chi-squared distribution, thereby avoiding the need to specify $\beta$ explicitly. The expected value of $\beta$ under this distribution is not excessively large. Furthermore, we show that RRGP-UCB provides tight bounds on the expected value of regret based on the optimal solution and estimated solutions. Finally, we demonstrate the usefulness of the proposed method through numerical experiments.
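For intuition, the sketch below shows one plausible acquisition step in the spirit of RRGP-UCB: given a fitted GP posterior over candidate inputs, the trade-off parameter $\beta$ is drawn from a chi-squared distribution instead of being fixed. The paper's chi-squared-based law may include shifts or scaling, so the plain draw here is an assumption.

```python
import numpy as np

def rrgp_ucb_select(mu, sigma, dof=2, rng=None):
    """Pick the next evaluation point by a randomized UCB rule (sketch).

    mu, sigma: GP posterior mean and standard deviation at candidate points.
    dof: degrees of freedom of the (assumed) chi-squared law for beta.
    """
    rng = np.random.default_rng() if rng is None else rng
    beta = rng.chisquare(dof)  # randomized trade-off parameter
    return int(np.argmax(mu + np.sqrt(beta) * sigma))
```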
Authors: Nikhil Shivakumar Nayak, Michael P. Brenner
Abstract: This study investigates enhancing option pricing by extending the Black-Scholes model to include stochastic volatility and interest rate variability within the Partial Differential Equation (PDE). The PDE is solved using the finite difference method. The extended Black-Scholes model and a machine learning-based LSTM model are developed and evaluated for pricing Google stock options. Both models were backtested using historical market data. While the LSTM model exhibited higher predictive accuracy, the finite difference method demonstrated superior computational efficiency. This work provides insights into model performance under varying market conditions and emphasizes the potential of hybrid approaches for robust financial modeling.
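As a reference point, a minimal explicit finite-difference solver for the classical constant-coefficient Black-Scholes PDE (European call) is sketched below; the paper's extension would add stochastic volatility and interest-rate dimensions to this grid, and all parameter values are placeholders.

```python
import numpy as np

def bs_call_fd(K=100.0, r=0.05, sigma=0.2, T=1.0, s_max=300.0, N=300, M=20000):
    """Explicit finite differences for the Black-Scholes PDE (sketch).

    Marches the option value backward from the payoff at maturity; M is
    chosen large enough for the explicit scheme to be stable.
    """
    ds, dt = s_max / N, T / M
    s = np.linspace(0.0, s_max, N + 1)
    v = np.maximum(s - K, 0.0)                       # payoff at t = T
    for m in range(M):                               # backward time stepping
        tau = (m + 1) * dt                           # time to maturity
        delta = (v[2:] - v[:-2]) / (2 * ds)
        gamma = (v[2:] - 2 * v[1:-1] + v[:-2]) / ds ** 2
        v[1:-1] += dt * (0.5 * sigma ** 2 * s[1:-1] ** 2 * gamma
                         + r * s[1:-1] * delta - r * v[1:-1])
        v[0] = 0.0                                   # boundary at S = 0
        v[-1] = s_max - K * np.exp(-r * tau)         # boundary at large S
    return s, v                                      # grid and option values
```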
Authors: Karthik Elamvazhuthi, Abhishek Halder
Abstract: We revisit the optimal transport problem over angular velocity dynamics given by the controlled Euler equation. The solution of this problem enables stochastic guidance of spin states of a rigid body (e.g., spacecraft) over a hard deadline constraint by transferring a given initial state statistics to a desired terminal state statistics. This is an instance of generalized optimal transport over a nonlinear dynamical system. While prior work has reported existence-uniqueness and numerical solution of this dynamical optimal transport problem, here we present structural results about the equivalent Kantorovich a.k.a. optimal coupling formulation. Specifically, we focus on deriving the ground cost for the associated Kantorovich optimal coupling formulation. The ground cost equals the cost of transporting a unit amount of mass from a specific realization of the initial or source joint probability measure to a realization of the terminal or target joint probability measure, and determines the Kantorovich formulation. Finding the ground cost leads to solving a structured deterministic nonlinear optimal control problem, which is shown to be amenable to an analysis technique pioneered by Athans et al. We show that such techniques have broader applicability in determining the ground cost (and thus the Kantorovich formulation) for a class of generalized optimal mass transport problems involving nonlinear dynamics with translated norm-invariant drift.
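Schematically, the Kantorovich (optimal coupling) formulation referenced here pairs the endpoint measures through a ground cost defined by a deterministic optimal control problem over the nonlinear dynamics; the quadratic control-effort integrand below is a representative assumption, not the paper's exact cost.

```latex
\min_{\pi \in \Pi(\mu_0, \mu_T)} \int c(x_0, x_T) \, \mathrm{d}\pi(x_0, x_T),
\qquad
c(x_0, x_T) \;=\; \inf_{u(\cdot)} \int_0^T \| u(t) \|^2 \, \mathrm{d}t
\quad \text{s.t. } \dot{x} = f(x, u), \ x(0) = x_0, \ x(T) = x_T,
```

where $\Pi(\mu_0,\mu_T)$ is the set of couplings of the source and target measures and $f$ encodes the controlled Euler equation.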
Authors: Yasmine Mustafa, Mohamed Elmahallawy, Tie Luo
Abstract: Alzheimer's disease (AD) leads to progressive cognitive decline, making early detection crucial for effective intervention. While deep learning models have shown high accuracy in AD diagnosis, their lack of interpretability limits clinical trust and adoption. This paper introduces a novel pre-model approach leveraging Jacobian Maps (JMs) within a multi-modal framework to enhance explainability and trustworthiness in AD detection. By capturing localized brain volume changes, JMs establish meaningful correlations between model predictions and well-known neuroanatomical biomarkers of AD. We validate JMs through experiments comparing a 3D CNN trained on JMs with one trained on traditional preprocessed data, showing that the JM-trained model achieves superior accuracy. We also employ 3D Grad-CAM analysis to provide both visual and quantitative insights, further showcasing improved interpretability and diagnostic reliability.
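As an illustration of what a Jacobian Map captures, the sketch below computes a voxel-wise Jacobian-determinant map from a 3D displacement field (e.g., from registration to a template); values above 1 indicate local expansion and below 1 local shrinkage. Unit voxel spacing is assumed, and this is a generic construction rather than the paper's exact preprocessing.

```python
import numpy as np

def jacobian_map(disp):
    """Voxel-wise Jacobian determinant of a displacement field (sketch).

    disp: array of shape (3, X, Y, Z) holding the displacement u(x).
    The deformation is phi(x) = x + u(x), so d(phi)/dx = I + du/dx.
    """
    grads = np.stack([np.stack(np.gradient(disp[i]), axis=0)
                      for i in range(3)], axis=0)        # (3, 3, X, Y, Z)
    jac = grads + np.eye(3).reshape(3, 3, 1, 1, 1)       # add identity
    return np.linalg.det(np.moveaxis(jac, (0, 1), (-2, -1)))
```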
Authors: Junjie Yang, Ke Lin, Xing Yu
Abstract: Chain of Thought (CoT) reasoning enhances language models' performance but often leads to inefficient "overthinking" on simple problems. We identify that existing approaches directly penalizing reasoning length fail to account for varying problem complexity. Our approach constructs rewards through length and quality comparisons, guided by theoretical assumptions, to jointly enhance solution correctness and conciseness. Moreover, we further extend our method to fuzzy tasks where ground truth is unavailable. Experiments across multiple reasoning benchmarks demonstrate that our method maintains accuracy while generating significantly more concise explanations, effectively teaching models to "think when needed."
Authors: Marius Graumann (German Aerospace Center), Jan Marius Stürmer (German Aerospace Center), Tobias Koch (German Aerospace Center)
Abstract: Many old floor plans exist only in printed form or are stored as scanned raster images, and slight rotations or shifts may occur during scanning. Bringing such floor plans into a machine-readable form to enable further use still poses a problem. Therefore, we propose an end-to-end pipeline that pre-processes the image, leverages a novel approach to create a region adjacency graph (RAG) from the pre-processed image, and predicts the RAG's nodes. By incorporating normalization steps into the RAG feature extraction, we significantly improve the rotation invariance of the RAG feature calculation. Moreover, applying our method leads to an improved F1 score and IoU on rotated data. Furthermore, we propose a wall splitting algorithm for partitioning walls into segments associated with the corresponding rooms.
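A minimal sketch of RAG construction from an already-segmented (labelled) floor-plan image is given below, linking regions that share a pixel border under 4-connectivity; the paper's normalized node features and the prediction network would be layered on top, and the implementation details here are assumptions.

```python
import numpy as np
import networkx as nx

def build_rag(labels):
    """Region adjacency graph from a labelled image (sketch).

    labels: 2-D integer array of region ids produced by pre-processing.
    """
    g = nx.Graph()
    g.add_nodes_from(int(v) for v in np.unique(labels))
    for a, b in ((labels[:, :-1], labels[:, 1:]),    # horizontal neighbours
                 (labels[:-1, :], labels[1:, :])):   # vertical neighbours
        mask = a != b
        pairs = np.stack([a[mask], b[mask]], axis=1)
        g.add_edges_from(map(tuple, np.unique(pairs, axis=0)))
    return g
```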
Authors: Enguerrand Prebet, Samuel Teuber, André Platzer
Abstract: This article presents a formal model and formal safety proofs for the ABZ'25 case study in differential dynamic logic (dL). The case study considers an autonomous car driving on a highway avoiding collisions with neighbouring cars. Using KeYmaera X's dL implementation, we prove absence of collision on an infinite time horizon, which ensures that safety is preserved independently of trip length. The safety guarantees hold for time-varying reaction time and brake force. Our dL model considers the single-lane scenario with cars ahead or behind. We demonstrate that dL with its tools is a rigorous foundation for runtime monitoring, shielding, and neural network verification. Doing so sheds light on inconsistencies between the provided specification and the simulation environment highway-env of the ABZ'25 study. We attempt to fix these inconsistencies and uncover numerous counterexamples that also indicate issues in the provided reinforcement learning environment.
Authors: Guido Barducci, Ivan Rossi, Francesco Codicè, Cesare Rollo, Valeria Repetto, Corrado Pancotti, Virginia Iannibelli, Tiziana Sanavia, Piero Fariselli
Abstract: Understanding how residue variations affect protein stability is crucial for designing functional proteins and deciphering the molecular mechanisms underlying disease-related mutations. Recent advances in protein language models (PLMs) have revolutionized computational protein analysis, enabling, among other things, more accurate predictions of mutational effects. In this work, we introduce JanusDDG, a deep learning framework that leverages PLM-derived embeddings and a bidirectional cross-attention transformer architecture to predict $\Delta \Delta G$ of single and multiple-residue mutations while simultaneously being constrained to respect fundamental thermodynamic properties, such as antisymmetry and transitivity. Unlike conventional self-attention, JanusDDG computes queries (Q) and values (V) as the difference between wild-type and mutant embeddings, while keys (K) alternate between the two. This cross-interleaved attention mechanism enables the model to capture mutation-induced perturbations while preserving essential contextual information. Experimental results show that JanusDDG achieves state-of-the-art performance in predicting $\Delta \Delta G$ from sequence alone, matching or exceeding the accuracy of structure-based methods for both single and multiple mutations.
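Our reading of the cross-interleaved attention can be sketched as follows: queries and values are the difference between wild-type and mutant embeddings, while keys alternate between the two across layers. Multi-head projections, scaling conventions, and masking are omitted here and should be treated as assumptions.

```python
import torch

def cross_interleaved_attention(wt, mut, layer_idx):
    """Single attention step in the spirit of JanusDDG (sketch).

    wt, mut: (batch, seq_len, dim) PLM embeddings of the wild-type and
    mutant sequences. Q and V encode the mutation-induced perturbation;
    K alternates between the two sequences depending on the layer.
    """
    q = wt - mut                                   # perturbation as query
    v = wt - mut                                   # ... and as value
    k = wt if layer_idx % 2 == 0 else mut          # interleaved keys
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v
```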
Authors: Gijs Bellaard, Bart M. N. Smets, Remco Duits
Abstract: Euclidean E(3) equivariant neural networks that employ scalar fields on position-orientation space M(3) have been effectively applied to tasks such as predicting molecular dynamics and properties. To perform equivariant convolutional-like operations in these architectures one needs Euclidean invariant kernels on M(3) x M(3). In practice, a handcrafted collection of invariants is selected, and this collection is then fed into multilayer perceptrons to parametrize the kernels. We rigorously describe an optimal collection of 4 smooth scalar invariants on the whole of M(3) x M(3). By optimal we mean that the collection is independent and universal: all invariants are pertinent, and any invariant kernel is a function of them. We evaluate two collections of invariants, one universal and one not, using the PONITA neural network architecture. Our experiments show that using a universal collection of invariants significantly improves the accuracy of PONITA.
Authors: Mathis Kruse, Bodo Rosenhahn
Abstract: As more well-performing anomaly detection methods have been proposed, many single-view tasks have been solved to a relatively good degree. However, real-world production scenarios often involve complex industrial products whose properties may not be fully captured by a single image. While normalizing-flow-based approaches already work well in single-camera scenarios, they currently do not make use of the priors in multi-view data. We aim to bridge this gap by using these flow-based models as a strong foundation and propose Multi-Flow, a novel multi-view anomaly detection method. Multi-Flow makes use of a novel multi-view architecture whose exact likelihood estimation is enhanced by fusing information across different views. For this, we propose a new cross-view message-passing scheme, letting information flow between neighboring views. We empirically validate Multi-Flow on the real-world multi-view data set Real-IAD and reach a new state of the art, surpassing current baselines in both image-wise and sample-wise anomaly detection tasks.
Authors: Gijs Bellaard, Bart M. N. Smets
Abstract: Riemannian metrics on the position-orientation space M(3) that are roto-translation group SE(3) invariant play a key role in image analysis tasks like enhancement, denoising, and segmentation. These metrics enable roto-translation equivariant algorithms, with the associated Riemannian distance often used in implementation. However, computing the Riemannian distance is costly, which makes it unsuitable in situations where constant recomputation is needed. We propose the mav (minimal angular velocity) distance, defined as the Riemannian length of a geometrically meaningful curve, as a practical alternative. We see an application of the mav distance in geometric deep learning: neural network architectures such as PONITA rely on geometric invariants to create their roto-translation equivariant models. The mav distance offers a trainable invariant, with the parameters that determine the Riemannian metric acting as learnable weights. In this paper we: 1) classify and parametrize all SE(3) invariant metrics on M(3), 2) describe how to efficiently calculate the mav distance, and 3) investigate whether including the mav distance within PONITA can positively impact its accuracy in predicting molecular properties.
Authors: Luís Couto Seller, Íñigo Sanz Torres, Adrián Vogel-Fernández, Carlos González Carballo, Pedro Miguel Sánchez Sánchez, Adrián Carruana Martín, Enrique de Miguel Ambite
Abstract: Large Language Models have significantly advanced natural language processing, achieving remarkable performance in tasks such as language generation, translation, and reasoning. However, their substantial computational requirements restrict deployment to high-end systems, limiting accessibility on consumer-grade devices. This challenge is especially pronounced for under-resourced languages like those spoken in the Iberian Peninsula, where relatively limited linguistic resources and benchmarks hinder effective evaluation. This work presents a comprehensive evaluation of compact state-of-the-art LLMs across several essential NLP tasks tailored for Iberian languages. The results reveal that while some models consistently excel in certain tasks, significant performance gaps remain, particularly for languages such as Basque. These findings highlight the need for further research on balancing model compactness with robust multilingual performance.
Authors: Marie Kempkes, Jakob Spiegelberg, Evert van Nieuwenburg, Vedran Dunjko
Abstract: A central question in machine learning is how reliable the predictions of a trained model are. Reliability includes the identification of instances for which a model is likely not to be trusted based on an analysis of the learning system itself. Such unreliability for an input may arise from the model family providing a variety of hypotheses consistent with the training data, which can vastly disagree in their predictions on that particular input point. This is called the underdetermination problem, and it is important to develop methods to detect it. With the emergence of quantum machine learning (QML) as a prospective alternative to classical methods for certain learning problems, the question arises to what extent QML models are subject to underdetermination and whether similar techniques as those developed for classical models can be employed for its detection. In this work, we first provide an overview of concepts from Safe AI and reliability, which have received particularly little attention in QML. We then explore the use of a method based on local second-order information for the detection of underdetermination in parameterized quantum circuits through numerical experiments. We further demonstrate that the approach is robust to certain levels of shot noise. Our work contributes to the body of literature on Safe Quantum AI, which is an emerging field of growing importance.
Authors: Wan Tian, Zhongfeng Qin
Abstract: In recent years, the modeling and analysis of interval-valued time series have garnered significant attention in the fields of econometrics and statistics. However, the existing literature primarily focuses on regression tasks while neglecting classification aspects. In this paper, we propose an adaptive approach for interval-valued time series classification. Specifically, we represent interval-valued time series using convex combinations of the upper and lower bounds of intervals and transform these representations into images based on point-valued time series imaging methods. We utilize a fine-grained image classification neural network to classify these images, thereby classifying the original interval-valued time series. The proposed method is applicable to both univariate and multivariate interval-valued time series. On the optimization front, we treat the convex combination coefficients as learnable parameters, similar to the parameters of the neural network, and provide an efficient estimation method based on the alternating direction method of multipliers (ADMM). On the theoretical front, under specific conditions, we establish a margin-based multiclass generalization bound for generic CNNs composed of basic blocks involving convolution, pooling, and fully connected layers. Through simulation studies and real data applications, we validate the effectiveness of the proposed method and compare its performance against a wide range of point-valued time series classification methods.
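A minimal sketch of the representation step is shown below: a single learnable coefficient forms the convex combination of the interval bounds. The paper estimates these coefficients with ADMM alongside the network; plain gradient training is assumed here for brevity, with imaging and the fine-grained CNN following downstream.

```python
import torch
import torch.nn as nn

class IntervalToSeries(nn.Module):
    """Learnable convex combination of interval bounds (sketch)."""
    def __init__(self):
        super().__init__()
        self.theta = nn.Parameter(torch.zeros(1))  # unconstrained parameter

    def forward(self, lower, upper):
        # sigmoid keeps the combination weight inside (0, 1),
        # so the output stays within the interval bounds
        alpha = torch.sigmoid(self.theta)
        return alpha * upper + (1 - alpha) * lower
```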
Authors: Wan Tian, Zhongfeng Qin
Abstract: Modeling and forecasting interval-valued time series (ITS) have attracted considerable attention due to their growing presence in various contexts. To the best of our knowledge, there have been no efforts to model large-scale ITS. In this paper, we propose a feature extraction procedure for large-scale ITS, which involves key steps such as auto-segmentation and clustering, and feature transfer learning. This procedure can be seamlessly integrated with any suitable prediction model for forecasting purposes. Specifically, we transform the automatic segmentation and clustering of ITS into the estimation of Toeplitz sparse precision matrices and an assignment set. The majorization-minimization algorithm is employed to convert this highly non-convex optimization problem into two subproblems. We derive an efficient dynamic programming algorithm and an alternating direction method to solve these two subproblems alternately and establish their convergence properties. By employing the Joint Recurrence Plot (JRP) to image each subsequence and assigning a class label to each cluster, an image dataset is constructed. Then, an appropriate neural network is trained on this image dataset and used to extract features for the subsequent forecasting step. Real data applications demonstrate that the proposed method can effectively obtain invariant representations of the raw data and enhance forecasting performance.
Authors: Shuang Wu
Abstract: Policy optimization algorithms are crucial in many fields but challenging to grasp and implement, often due to complex calculations related to Markov decision processes and varying use of discount and average reward setups. This paper presents a unified framework that applies generalized ergodicity theory and perturbation analysis to clarify and enhance the application of these algorithms. Generalized ergodicity theory sheds light on the steady-state behavior of stochastic processes, aiding understanding of both discounted and average rewards. Perturbation analysis provides in-depth insights into the fundamental principles of policy optimization algorithms. We use this framework to identify common implementation errors and demonstrate the correct approaches. Through a case study on Linear Quadratic Regulator problems, we illustrate how slight variations in algorithm design affect implementation outcomes. We aim to make policy optimization algorithms more accessible and reduce their misuse in practice.
Authors: Emma Hannula, Arttu Häkkinen, Antti Solonen, Felibe Uribe, Jana de Wiljes, Lassi Roininen
Abstract: Improving energy efficiency of building heating systems is essential for reducing global energy consumption and greenhouse gas emissions. Traditional control methods in buildings rely on static heating curves based solely on outdoor temperature measurements, neglecting system state and free heat sources like solar gain. Model predictive control (MPC) not only addresses these limitations but further optimizes heating control by incorporating weather forecasts and system state predictions. However, current industrial MPC solutions often use simplified physics-inspired models, which compromise accuracy for interpretability. While purely data-driven models offer better predictive performance, they face challenges like overfitting and lack of transparency. To bridge this gap, we propose a Bayesian Long Short-Term Memory (LSTM) architecture for indoor temperature modeling. Our experiments across 100 real-world buildings demonstrate that the Bayesian LSTM outperforms an industrial physics-based model in predictive accuracy, suggesting potential for improved energy efficiency and thermal comfort if deployed in heating MPC solutions. Over deterministic black-box approaches, the Bayesian framework provides additional advantages by improving generalization ability and allowing interpretation of predictions via uncertainty quantification. This work advances data-driven heating control by balancing predictive performance with the transparency and reliability required for real-world heating MPC applications.
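One common way to realize a Bayesian LSTM is Monte Carlo dropout, sketched below for indoor-temperature prediction: dropout stays active at inference time, and repeated stochastic forward passes give a predictive mean plus an uncertainty estimate. The paper's exact Bayesian treatment may differ, so treat this as an illustrative stand-in.

```python
import torch
import torch.nn as nn

class MCDropoutLSTM(nn.Module):
    """Indoor-temperature predictor with MC-dropout uncertainty (sketch)."""
    def __init__(self, n_features, hidden=64, p=0.1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.drop = nn.Dropout(p)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(self.drop(out[:, -1]))

@torch.no_grad()
def predict_with_uncertainty(model, x, n_samples=50):
    model.train()                            # keep dropout active
    preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)       # predictive mean and spread
```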
Authors: Erik Johannes Husom, Arda Goknil, Merve Astekin, Lwin Khin Shar, Andre Kåsen, Sagar Sen, Benedikt Andreas Mithassel, Ahmet Soylu
Abstract: Deploying Large Language Models (LLMs) on edge devices presents significant challenges due to computational constraints, memory limitations, inference speed, and energy consumption. Model quantization has emerged as a key technique to enable efficient LLM inference by reducing model size and computational overhead. In this study, we conduct a comprehensive analysis of 28 quantized LLMs from the Ollama library, which by default applies Post-Training Quantization (PTQ) and weight-only quantization techniques, deployed on an edge device (Raspberry Pi 4 with 4GB RAM). We evaluate energy efficiency, inference performance, and output accuracy across multiple quantization levels and task types. Models are benchmarked on five standardized datasets (CommonsenseQA, BIG-Bench Hard, TruthfulQA, GSM8K, and HumanEval), and we employ a high-resolution, hardware-based energy measurement tool to capture real-world power consumption. Our findings reveal the trade-offs between energy efficiency, inference speed, and accuracy in different quantization settings, highlighting configurations that optimize LLM deployment for resource-constrained environments. By integrating hardware-level energy profiling with LLM benchmarking, this study provides actionable insights for sustainable AI, bridging a critical gap in existing research on energy-aware LLM deployment.
Authors: Zhe Wang, Yifei Zhu
Abstract: Neural Radiance Fields (NeRF) is a cutting-edge neural network-based technique for novel view synthesis in 3D reconstruction. However, its significant computational demands pose challenges for deployment on mobile devices. While mesh-based NeRF solutions have shown potential in achieving real-time rendering on mobile platforms, they often fail to deliver high-quality reconstructions when rendering practical complex scenes. Additionally, the non-negligible memory overhead caused by pre-computed intermediate results complicates their practical application. To overcome these challenges, we present NeRFlex, a resource-aware, high-resolution, real-time rendering framework for complex scenes on mobile devices. NeRFlex integrates mobile NeRF rendering with multi-NeRF representations that decompose a scene into multiple sub-scenes, each represented by an individual NeRF network. Crucially, NeRFlex considers both memory and computation constraints as first-class citizens and redesigns the reconstruction process accordingly. NeRFlex first designs a detail-oriented segmentation module to identify sub-scenes with high-frequency details. For each NeRF network, a lightweight profiler, built on domain knowledge, is used to accurately map configurations to visual quality and memory usage. Based on these insights and the resource constraints on mobile devices, NeRFlex presents a dynamic programming algorithm to efficiently determine configurations for all NeRF representations, despite the NP-hardness of the original decision problem. Extensive experiments on real-world datasets and mobile devices demonstrate that NeRFlex achieves real-time, high-quality rendering on commercial mobile devices.
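The flavour of the configuration search can be conveyed by a multi-choice knapsack-style dynamic program: given the profiler's quality and memory estimates for each sub-NeRF's candidate configurations, choose one configuration per sub-NeRF to maximize total quality within a memory budget. This is a generic sketch, not NeRFlex's exact algorithm.

```python
def select_configs(quality, memory, budget):
    """Multi-choice knapsack DP over per-sub-NeRF configurations (sketch).

    quality[i][k], memory[i][k]: profiler estimates for sub-scene i under
    configuration k. Returns the best (total quality, chosen configs).
    """
    best = {0: (0.0, [])}                    # memory used -> best partial plan
    for qs, ms in zip(quality, memory):
        nxt = {}
        for used, (q, sel) in best.items():
            for k, (qk, mk) in enumerate(zip(qs, ms)):
                u = used + mk
                if u <= budget and (u not in nxt or nxt[u][0] < q + qk):
                    nxt[u] = (q + qk, sel + [k])
        best = nxt
    return max(best.values(), default=(0.0, []))
```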
Authors: Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm, Brian Hu Zhang
Abstract: Solving (Stampacchia) variational inequalities (SVIs) is a foundational problem at the heart of optimization, with a host of critical applications ranging from engineering to economics. However, this expressivity comes at the cost of computational hardness. As a result, most research has focused on carving out specific subclasses that elude those intractability barriers. A classical property that goes back to the 1960s is the Minty condition, which postulates that the Minty VI (MVI) problem -- the weak dual of the SVI problem -- admits a solution. In this paper, we establish the first polynomial-time algorithm -- that is, with complexity growing polynomially in the dimension $d$ and $\log(1/\epsilon)$ -- for solving $\epsilon$-SVIs for Lipschitz continuous mappings under the Minty condition. Prior approaches either incurred an exponentially worse dependence on $1/\epsilon$ (and other natural parameters of the problem) or made overly restrictive assumptions -- such as strong monotonicity. To do so, we introduce a new variant of the ellipsoid algorithm wherein separating hyperplanes are obtained after taking a gradient descent step from the center of the ellipsoid. It succeeds even though the set of SVIs can be nonconvex and not fully dimensional. Moreover, when our algorithm is applied to an instance with no MVI solution and fails to identify an SVI solution, it produces a succinct certificate of MVI infeasibility. We also show that deciding whether the Minty condition holds is $\mathsf{coNP}$-complete. We provide several extensions and new applications of our main results. Specifically, we obtain the first polynomial-time algorithms for i) solving monotone VIs, ii) globally minimizing a (potentially nonsmooth) quasar-convex function, and iii) computing Nash equilibria in multi-player harmonic games.
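For reference, the two problems can be stated as follows for a mapping $F$ over a feasible set $\mathcal{X}$; the Minty condition asserts that the second (weak dual) problem admits a solution.

```latex
\text{SVI:} \quad \text{find } x^\star \in \mathcal{X} \ \text{ s.t. } \ \langle F(x^\star),\, x - x^\star \rangle \ge 0 \quad \forall x \in \mathcal{X};
\qquad
\text{MVI:} \quad \text{find } x^\star \in \mathcal{X} \ \text{ s.t. } \ \langle F(x),\, x - x^\star \rangle \ge 0 \quad \forall x \in \mathcal{X}.
```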
Authors: Mirko Borszukovszki, Ivo Pascal de Jong, Matias Valdenegro-Toro
Abstract: To leverage the full potential of Large Language Models (LLMs), it is crucial to have some information about the uncertainty of their answers. This means that the model has to be able to quantify how certain it is in the correctness of a given response. Bad uncertainty estimates can lead to overconfident wrong answers, undermining trust in these models. Considerable research has addressed uncertainty in language models that take text inputs and produce text outputs. However, since visual capabilities were added to these models only recently, there has been little progress on the uncertainty of Visual Language Models (VLMs). We tested three state-of-the-art VLMs on corrupted image data. We found that the severity of the corruption negatively impacted the models' ability to estimate their uncertainty, and the models also showed overconfidence in most of the experiments.
Authors: William Fleshman, Benjamin Van Durme
Abstract: Training large, general-purpose language models poses significant challenges. The growing availability of specialized expert models, fine-tuned from pretrained models for specific tasks or domains, offers a promising alternative. Leveraging the potential of these existing expert models in real-world applications requires effective methods to select or merge the models best suited for a given task. This paper introduces SPECTR, an approach for dynamically composing expert models at each time step during inference. Notably, our method requires no additional training and enables flexible, token- and layer-wise model combinations. Our experimental results demonstrate that SPECTR improves routing accuracy over alternative training-free methods, increasing task performance across expert domains.
Authors: Jakiw Pidstrigach, Elizabeth Baker, Carles Domingo-Enrich, George Deligiannidis, Nikolas Nüsken
Abstract: In stochastic optimal control and conditional generative modelling, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, like diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and path-space integration by parts, that enables the development of methods robust to such singular rewards. This allows our approach to handle a broad range of applications, including classification, diffusion bridges, and conditioning without the need for artificial observational noise. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques.
Authors: David Landry, Claire Monteleoni, Anastase Charantonis
Abstract: We propose a machine-learning-based methodology for in-situ weather forecast postprocessing that is both spatially coherent and multivariate. Compared to previous work, our Flow MAtching Postprocessing (FMAP) better represents the correlation structures of the observation distribution, while also improving marginal performance at the stations. FMAP generates forecasts that are not bound to what is already modeled by the underlying gridded prediction and can infer new correlation structures from data. The resulting model can generate an arbitrary number of forecasts from a limited number of numerical simulations, allowing for low-cost forecasting systems. A single training is sufficient to perform postprocessing at multiple lead times, in contrast with other methods which use multiple trained networks at generation time. This work details our methodology, including a spatial attention transformer backbone trained within a flow matching generative modeling framework. FMAP shows promising performance in experiments on the EUPPBench dataset, forecasting surface temperature and wind gust values at station locations in western Europe up to five-day lead times.
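As a reminder of the generative core, a generic conditional flow-matching objective with a linear interpolation path is sketched below; FMAP's exact parameterization, conditioning features, and network signature are assumptions here.

```python
import torch

def flow_matching_loss(model, x1, cond):
    """Conditional flow-matching loss (generic sketch, not FMAP's recipe).

    x1: (batch, d) station observations; cond: conditioning features from
    the gridded forecast. model(xt, t, cond) predicts a velocity field.
    """
    x0 = torch.randn_like(x1)                # noise endpoint
    t = torch.rand(x1.shape[0], 1)           # uniform interpolation times
    xt = (1 - t) * x0 + t * x1               # linear path between endpoints
    target_v = x1 - x0                       # constant velocity of that path
    return ((model(xt, t, cond) - target_v) ** 2).mean()
```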
Authors: Shubham Kumar Nigam, Balaramamahanthi Deepak Patnaik, Ajay Varghese Thomas, Noel Shallum, Kripabandhu Ghosh, Arnab Bhattacharya
Abstract: Automating legal document drafting can significantly enhance efficiency, reduce manual effort, and streamline legal workflows. While prior research has explored tasks such as judgment prediction and case summarization, the structured generation of private legal documents in the Indian legal domain remains largely unaddressed. To bridge this gap, we introduce VidhikDastaavej, a novel, anonymized dataset of private legal documents, and develop NyayaShilp, a fine-tuned legal document generation model specifically adapted to Indian legal texts. We propose a Model-Agnostic Wrapper (MAW), a two-step framework that first generates structured section titles and then iteratively produces content while leveraging retrieval-based mechanisms to ensure coherence and factual accuracy. We benchmark multiple open-source LLMs, including instruction-tuned and domain-adapted versions, alongside proprietary models for comparison. Our findings indicate that while direct fine-tuning on small datasets does not always yield improvements, our structured wrapper significantly enhances coherence, factual adherence, and overall document quality while mitigating hallucinations. To ensure real-world applicability, we developed a Human-in-the-Loop (HITL) Document Generation System, an interactive user interface that enables users to specify document types, refine section details, and generate structured legal drafts. This tool allows legal professionals and researchers to generate, validate, and refine AI-generated legal documents efficiently. Extensive evaluations, including expert assessments, confirm that our framework achieves high reliability in structured legal drafting. This research establishes a scalable and adaptable foundation for AI-assisted legal drafting in India, offering an effective approach to structured legal document generation.
Authors: George Iosifidis, Naram Mhaisen, Douglas J. Leith
Abstract: AI/ML-based tools are at the forefront of resource management solutions for communication networks. Deep learning, in particular, is highly effective in facilitating fast and high-performing decision-making whenever representative training data is available to build offline accurate models. Conversely, online learning solutions do not require training and enable adaptive decisions based on runtime observations, but are often overly conservative. This extensive tutorial proposes the use of optimistic learning (OpL) as a decision engine for resource management frameworks in modern communication systems. When properly designed, such solutions can achieve fast and high-performing decisions -- comparable to offline-trained models -- while preserving the robustness and performance guarantees of the respective online learning approaches. We introduce the fundamental concepts, algorithms and results of OpL, discuss the roots of this theory and present different approaches to defining and achieving optimism. We proceed to showcase how OpL can enhance resource management in communication networks for several key problems such as caching, edge computing, network slicing, and workload assignment in decentralized O-RAN platforms. Finally, we discuss the open challenges that must be addressed to unlock the full potential of this new resource management approach.
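A canonical way to formalize optimism, and a useful reference point for this tutorial's setting, is optimistic follow-the-regularized-leader: the learner augments the accumulated gradients with a prediction $\tilde g_{t+1}$ of the next gradient, recovering standard FTRL when $\tilde g_{t+1} = 0$.

```latex
x_{t+1} \;=\; \arg\min_{x \in \mathcal{X}} \; \Big\langle \sum_{s=1}^{t} g_s + \tilde g_{t+1}, \; x \Big\rangle \;+\; \frac{1}{\eta}\, R(x),
\qquad \text{e.g. } \tilde g_{t+1} = g_t,
```

where $R$ is a strongly convex regularizer; the achievable regret shrinks with the accuracy of the predictions $\tilde g_t$, which is what makes well-designed optimistic solutions comparable to offline-trained models.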
Authors: Yongliang Wang, Hamidreza Kasaei
Abstract: Grasping large flat objects, such as books or keyboards lying horizontally, presents significant challenges for single-arm robotic systems, often requiring extra actions like pushing objects against walls or moving them to the edge of a surface to facilitate grasping. In contrast, dual-arm manipulation, inspired by human dexterity, offers a more refined solution by directly coordinating both arms to lift and grasp the object without the need for complex repositioning. In this paper, we propose a model-free deep reinforcement learning (DRL) framework to enable dual-arm coordination for grasping large flat objects. We utilize a large-scale grasp pose detection model as a backbone to extract high-dimensional features from input images, which are then used as the state representation in a reinforcement learning (RL) model. A CNN-based Proximal Policy Optimization (PPO) algorithm with shared Actor-Critic layers is employed to learn coordinated dual-arm grasp actions. The system is trained and tested in Isaac Gym and deployed to real robots. Experimental results demonstrate that our policy can effectively grasp large flat objects without requiring additional maneuvers. Furthermore, the policy exhibits strong generalization capabilities, successfully handling unseen objects. Importantly, it can be directly transferred to real robots without fine-tuning, consistently outperforming baseline methods.
Authors: Unique Subedi, Ambuj Tewari
Abstract: Operator learning has emerged as a powerful tool in scientific computing for approximating mappings between infinite-dimensional function spaces. A primary application of operator learning is the development of surrogate models for the solution operators of partial differential equations (PDEs). These methods can also be used to develop black-box simulators to model system behavior from experimental data, even without a known mathematical model. In this article, we begin by formalizing operator learning as a function-to-function regression problem and review some recent developments in the field. We also discuss PDE-specific operator learning, outlining strategies for incorporating physical and mathematical constraints into architecture design and training processes. Finally, we end by highlighting key future directions such as active data collection and the development of rigorous uncertainty quantification frameworks.
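The function-to-function regression view admits a compact formalization: given training pairs of input and output functions, one fits a parametric operator by empirical risk minimization, with the output-space norm chosen by the application. The notation below is a generic rendering of this setup, not a quotation from the article.

```latex
\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \big\| \mathcal{G}_{\theta}(a_i) - u_i \big\|_{\mathcal{U}}^{2},
\qquad \mathcal{G}_{\theta} : \mathcal{A} \to \mathcal{U},
```

where, in the PDE surrogate setting, $a_i$ may be coefficients, forcings, or initial data and $u_i$ the corresponding solutions.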
Authors: Shan An, Ziyu Meng, Chao Tang, Yuning Zhou, Tengyu Liu, Fangqiang Ding, Shufang Zhang, Yao Mu, Ran Song, Wei Zhang, Zeng-Guang Hou, Hong Zhang
Abstract: Dexterous manipulation, which refers to the ability of a robotic hand or multi-fingered end-effector to skillfully control, reorient, and manipulate objects through precise, coordinated finger movements and adaptive force modulation, enables complex interactions similar to human hand dexterity. With recent advances in robotics and machine learning, there is a growing demand for these systems to operate in complex and unstructured environments. Traditional model-based approaches struggle to generalize across tasks and object variations due to the high dimensionality and complex contact dynamics of dexterous manipulation. Although model-free methods such as reinforcement learning (RL) show promise, they require extensive training, large-scale interaction data, and carefully designed rewards for stability and effectiveness. Imitation learning (IL) offers an alternative by allowing robots to acquire dexterous manipulation skills directly from expert demonstrations, capturing fine-grained coordination and contact dynamics while bypassing the need for explicit modeling and large-scale trial-and-error. This survey provides an overview of IL-based dexterous manipulation methods, details recent advances, and addresses key challenges in the field. Additionally, it explores potential research directions to enhance IL-driven dexterous manipulation. Our goal is to offer researchers and practitioners a comprehensive introduction to this rapidly evolving domain.
Authors: Khai Le-Duc, Tuyen Tran, Bach Phan Tat, Nguyen Kim Hai Bui, Quan Dang, Hung-Phong Tran, Thanh-Thuy Nguyen, Ly Nguyen, Tuan-Minh Phan, Thi Thu Phuong Tran, Chris Ngo, Nguyen X. Khanh, Thanh Nguyen-Tang
Abstract: Multilingual speech translation (ST) in the medical domain enhances patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we present the first systematic study on medical ST, to the best of our knowledge, by releasing MultiMed-ST, a large-scale ST dataset for the medical domain, spanning all translation directions in five languages: Vietnamese, English, German, French, and Chinese (both Traditional and Simplified), together with the models. With 290,000 samples, our dataset is the largest medical machine translation (MT) dataset and the largest many-to-many multilingual ST dataset among all domains. Secondly, we present the most extensive analysis study in ST research to date, including: empirical baselines, a bilingual-multilingual comparative study, an end-to-end vs. cascaded comparative study, a task-specific vs. multi-task sequence-to-sequence (seq2seq) comparative study, code-switch analysis, and a quantitative-qualitative error analysis. All code, data, and models are available online: https://github.com/leduckhai/MultiMed-ST.
Authors: Shuofei Qiao, Zhisong Qiu, Baochang Ren, Xiaobin Wang, Xiangyuan Ru, Ningyu Zhang, Xiang Chen, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
Abstract: Large Language Models (LLMs) have achieved considerable performance across various agentic planning tasks. However, traditional agent planning approaches adopt a "flood irrigation" methodology that indiscriminately injects gold trajectories, external feedback, and domain knowledge into agent models. This practice overlooks the fundamental human cognitive principle of situational self-awareness, the ability to dynamically assess situational demands and strategically employ resources during decision-making. We propose agentic knowledgeable self-awareness to address this gap, a novel paradigm enabling LLM-based agents to autonomously regulate knowledge utilization. Specifically, we propose KnowSelf, a data-centric approach that equips agents with human-like knowledgeable self-awareness. Concretely, we devise a heuristic situation judgement criterion to mark special tokens on the agent's self-explored trajectories for collecting training data. Through a two-stage training process, the agent model can switch between different situations by generating specific special tokens, achieving optimal planning effects with minimal costs. Our experiments demonstrate that KnowSelf can outperform various strong baselines on different tasks and models with minimal use of external knowledge. Code is available at https://github.com/zjunlp/KnowSelf.
Authors: Liviu Aolaritei, Bart P. G. Van Parys, Henry Lam, Michael I. Jordan
Abstract: Importance Sampling (IS) is a widely used variance reduction technique for enhancing the efficiency of Monte Carlo methods, particularly in rare-event simulation and related applications. Despite its power, the performance of IS is often highly sensitive to the choice of the proposal distribution and frequently requires stochastic calibration techniques. While the design and analysis of IS have been extensively studied in estimation settings, applying IS within stochastic optimization introduces a unique challenge: the decision and the IS distribution are mutually dependent, creating a circular optimization structure. This interdependence complicates both the analysis of convergence for decision iterates and the efficiency of the IS scheme. In this paper, we propose an iterative gradient-based algorithm that jointly updates the decision variable and the IS distribution without requiring time-scale separation between the two. Our method achieves the lowest possible asymptotic variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family. Furthermore, we show that these properties are preserved under linear constraints by incorporating a recent variant of Nesterov's dual averaging method.
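The circularity described here is easiest to see from the basic importance sampling identity: the objective is estimated under a proposal $q_\theta$, while a low-variance choice of $\theta$ depends on the current decision $x$. The notation below is generic.

```latex
\min_{x} \; \mathbb{E}_{\xi \sim p}\big[ H(x,\xi) \big]
\;=\; \min_{x} \; \mathbb{E}_{\xi \sim q_\theta}\!\left[ H(x,\xi)\, \frac{p(\xi)}{q_\theta(\xi)} \right],
```

so the variance of the resulting gradient estimator depends on $\theta$, and the variance-minimizing $\theta$ in turn depends on $x$; this interdependence is exactly what the joint update scheme resolves without time-scale separation.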
Authors: Runnan Fang, Xiaobin Wang, Yuan Liang, Shuofei Qiao, Jialong Wu, Zekun Xi, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen
Abstract: In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate unconventional action spaces. To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments. Code is available at https://github.com/zjunlp/SynWorld.
Authors: Anas Barakat, Wayne Lin, John Lazarsfeld, Antonios Varvitsiotis
Abstract: Optimistic online learning algorithms have led to significant advances in equilibrium computation, particularly for two-player zero-sum games, achieving an iteration complexity of $\mathcal{O}(1/\epsilon)$ to reach an $\epsilon$-saddle point. These advances have been established in normal-form games, where strategies are simplex vectors, and quantum games, where strategies are trace-one positive semidefinite matrices. We extend optimistic learning to symmetric cone games (SCGs), a class of two-player zero-sum games where strategy spaces are generalized simplices (trace-one slices of symmetric cones). A symmetric cone is the cone of squares of a Euclidean Jordan Algebra; canonical examples include the nonnegative orthant, the second-order cone, the cone of positive semidefinite matrices, and their products, all fundamental to convex optimization. SCGs unify normal-form and quantum games and, as we show, offer significantly greater modeling flexibility, allowing us to model applications such as distance metric learning problems and the Fermat-Weber problem. To compute approximate saddle points in SCGs, we introduce the Optimistic Symmetric Cone Multiplicative Weights Update algorithm and establish an iteration complexity of $\mathcal{O}(1/\epsilon)$ to reach an $\epsilon$-saddle point. Our analysis builds on the Optimistic Follow-the-Regularized-Leader framework, with a key technical contribution being a new proof of the strong convexity of the symmetric cone negative entropy with respect to the trace-one norm, a result that may be of independent interest.
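For orientation, in the normal-form (probability simplex) special case the optimistic multiplicative weights update takes the standard form below; the symmetric cone generalization described in the abstract replaces componentwise exponentials with Jordan-algebra exponentials normalized to unit trace.

```latex
x_{t+1}(i) \;\propto\; x_t(i)\, \exp\!\big( -\eta \, ( 2\,\ell_t(i) - \ell_{t-1}(i) ) \big),
```

where $\ell_t$ is the loss vector at round $t$ and $\eta$ is the step size.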
Authors: Akshara Prabhakar, Zuxin Liu, Weiran Yao, Jianguo Zhang, Ming Zhu, Shiyu Wang, Zhiwei Liu, Tulika Awalgaonkar, Haolin Chen, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong
Abstract: Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the synthetic data collected and the trained xLAM-2-fc-r models to advance research in AI agents. Models are available on HuggingFace at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4 and project website is https://apigen-mt.github.io
URLs: https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4, https://apigen-mt.github.io
Authors: Xianyuan Liu, Jiayang Zhang, Shuo Zhou, Thijs L. van der Plas, Avish Vijayaraghavan, Anastasiia Grishina, Mengdie Zhuang, Daniel Schofield, Christopher Tomlinson, Yuhan Wang, Ruizhe Li, Louisa van Zeeland, Sina Tabakhi, Cyndie Demeocq, Xiang Li, Arunav Das, Orlando Timmerman, Thomas Baldwin-McDonald, Jinge Wu, Peizhen Bai, Zahraa Al Sahili, Omnia Alwazzan, Thao N. Do, Mohammod N. I. Suvon, Angeline Wang, Lucia Cipolina-Kun, Luigi A. Moretti, Lucas Farndale, Nitisha Jain, Natalia Efremova, Yan Ge, Marta Varela, Hak-Keung Lam, Oya Celiktutan, Ben R. Evans, Alejandro Coca-Castro, Honghan Wu, Zahraa S. Abdallah, Chen Chen, Valentin Danchev, Nataliya Tkachenko, Lei Lu, Tingting Zhu, Gregory G. Slabaugh, Roger K. Moore, William K. Cheung, Peter H. Charlton, Haiping Lu
Abstract: Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction, and decision-making across disciplines such as healthcare, science, and engineering. However, most multimodal AI advances focus on models for vision and language data, while their deployability remains a key challenge. We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions, complementing data-centric and model-centric approaches. We also emphasise deeper integration across multiple levels of multimodality and multidisciplinary collaboration to significantly broaden the research scope beyond vision and language. To facilitate this approach, we identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases: pandemic response, self-driving car design, and climate change adaptation, drawing expertise from healthcare, social science, engineering, science, sustainability, and finance. By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.
Authors: Farzad Pourbabaee, Sophie Yanying Sheng, Peter McCrory, Luke Simon, Di Mo
Abstract: We study the problem of position allocation in job marketplaces, where the platform determines the ranking of the jobs for each seeker. The design of ranking mechanisms is critical to marketplace efficiency, as it influences both short-term revenue from promoted job placements and long-term health through sustained seeker engagement. Our analysis focuses on the tradeoff between revenue and relevance, as well as on innovations in job auction design. We demonstrate two ways to improve relevance with minimal impact on revenue: incorporating seekers' preferences and applying position-aware auctions.
Authors: Zae Myung Kim, Anand Ramachandran, Farideh Tavazoee, Joo-Kyung Kim, Oleg Rokhlenko, Dongyeop Kang
Abstract: Generating long, coherent text remains a challenge for large language models (LLMs), as they lack hierarchical planning and structured organization in discourse generation. We introduce Structural Alignment, a novel method that aligns LLMs with human-like discourse structures to enhance long-form text generation. By integrating linguistically grounded discourse frameworks into reinforcement learning, our approach guides models to produce coherent and well-organized outputs. We employ a dense reward scheme within a Proximal Policy Optimization framework, assigning fine-grained, token-level rewards based on the discourse distinctiveness relative to human writing. Two complementary reward models are evaluated: the first improves readability by scoring surface-level textual features to provide explicit structuring, while the second reinforces deeper coherence and rhetorical sophistication by analyzing global discourse patterns through hierarchical discourse motifs. Models aligned with these rewards outperform both standard and RLHF-enhanced models in tasks such as essay generation and long-document summarization. All training data and code will be publicly shared at https://github.com/minnesotanlp/struct_align.
Authors: NVIDIA, :, Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, Chengyu Dong, Christine Harvey, Christopher Parisien, Dan Su, Daniel Korzekwa, Danny Yin, Daria Gitman, David Mosallanezhad, Deepak Narayanan, Denys Fridman, Dima Rekesh, Ding Ma, Dmytro Pykhtar, Dong Ahn, Duncan Riach, Dusan Stosic, Eileen Long, Elad Segal, Ellie Evans, Eric Chung, Erick Galinkin, Evelina Bakhturina, Ewa Dobrowolska, Fei Jia, Fuxiao Liu, Gargi Prasad, Gerald Shen, Guilin Liu, Guo Chen, Haifeng Qian, Helen Ngo, Hongbin Liu, Hui Li, Igor Gitman, Ilia Karmanov, Ivan Moshkov, Izik Golan, Jan Kautz, Jane Polak Scowcroft, Jared Casper, Jarno Seppanen, Jason Lu, Jason Sewall, Jiaqi Zeng, Jiaxuan You, Jimmy Zhang, Jing Zhang, Jining Huang, Jinze Xue, Jocelyn Huang, Joey Conway, John Kamalu, Jon Barker, Jonathan Cohen, Joseph Jennings, Jupinder Parmar, Karan Sapra, Kari Briski, Kateryna Chumachenko, Katherine Luna, Keshav Santhanam, Kezhi Kong, Kirthi Sivamani, Krzysztof Pawelec, Kumar Anik, Kunlun Li, Lawrence McAfee, Leon Derczynski, Lindsey Pavao, Luis Vega, Lukas Voegtle, Maciej Bala, Maer Rodrigues de Melo, Makesh Narsimhan Sreedhar, Marcin Chochowski, Markus Kliegl, Marta Stepniewska-Dziubinska, Matthieu Le, Matvei Novikov, Mehrzad Samadi, Michael Andersch, Michael Evans, Miguel Martinez, Mike Chrzanowski, Mike Ranzinger, Mikolaj Blaz, Misha Smelyanskiy, Mohamed Fawzy, Mohammad Shoeybi, Mostofa Patwary, Nayeon Lee, Nima Tajbakhsh, Ning Xu, Oleg Rybakov, Oleksii Kuchaiev, Olivier Delalleau, Osvald Nitski, Parth Chadha, Pasha Shamis, Paulius Micikevicius, Pavlo Molchanov, Peter Dykas, Philipp Fischer, Pierre-Yves Aquilanti, Piotr Bialecki, Prasoon Varshney, Pritam Gundecha, Przemek Tredak, Rabeeh Karimi, Rahul Kandu, Ran El-Yaniv, Raviraj Joshi, Roger Waleffe, Ruoxi Zhang, Sabrina Kavanaugh, Sahil Jain, Samuel Kriman, Sangkug Lym, Sanjeev Satheesh, Saurav Muralidharan, Sean Narenthiran, Selvaraj Anandaraj, Seonmyeong Bak, Sergey Kashirsky, Seungju Han, Shantanu Acharya, Shaona Ghosh, Sharath Turuvekere Sreenivas, Sharon Clay, Shelby Thomas, Shrimai Prabhumoye, Shubham Pachori, Shubham Toshniwal, Shyamala Prayaga, Siddhartha Jain, Sirshak Das, Slawek Kierat, Somshubra Majumdar, Song Han, Soumye Singhal, Sriharsha Niverty, Stefania Alborghetti, Suseella Panguluri, Swetha Bhendigeri, Syeda Nahida Akter, Szymon Migacz, Tal Shiri, Terry Kong, Timo Roman, Tomer Ronen, Trisha Saar, Tugrul Konuk, Tuomas Rintamaki, Tyler Poon, Ushnish De, Vahid Noroozi, Varun Singh, Vijay Korthikanti, Vitaly Kurin, Wasi Uddin Ahmad, Wei Du, Wei Ping, Wenliang Dai, Wonmin Byeon, Xiaowei Ren, Yao Xu, Yejin Choi, Yian Zhang, Ying Lin, Yoshi Suhara, Zhiding Yu, Zhiqi Li, Zhiyu Li, Zhongbo Zhu, Zhuolin Yang, Zijia Chen
Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3$\times$ faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression via pruning and distillation technique called MiniPuzzle. Nemotron-H-47B-Base achieves similar accuracy to the 56B model, but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve on par results with BF16-based training. This recipe is used to train the 56B model. All Nemotron-H models will be released, with support in Hugging Face, NeMo, and Megatron-LM.
Authors: Guneykan Ozgul, Xiantao Li, Mehrdad Mahdavi, Chunhao Wang
Abstract: We propose quantum algorithms that provide provable speedups for Markov Chain Monte Carlo (MCMC) methods commonly used for sampling from probability distributions of the form $\pi \propto e^{-f}$, where $f$ is a potential function. Our first approach considers Gibbs sampling for finite-sum potentials in the stochastic setting, employing an oracle that provides gradients of individual functions. In the second setting, we consider access only to a stochastic evaluation oracle, allowing simultaneous queries at two points of the potential function under the same stochastic parameter. By introducing novel techniques for stochastic gradient estimation, our algorithms improve the gradient and evaluation complexities of classical samplers, such as Hamiltonian Monte Carlo (HMC) and Langevin Monte Carlo (LMC) in terms of dimension, precision, and other problem-dependent parameters. Furthermore, we achieve quantum speedups in optimization, particularly for minimizing non-smooth and approximately convex functions that commonly appear in empirical risk minimization problems.
Authors: Clara Fannjiang, Stephen Bates, Anastasios N. Angelopoulos, Jennifer Listgarten, Michael I. Jordan
Abstract: Many applications of machine learning methods involve an iterative protocol in which data are collected, a model is trained, and then outputs of that model are used to choose what data to consider next. For example, one data-driven approach for designing proteins is to train a regression model to predict the fitness of protein sequences, then use it to propose new sequences believed to exhibit greater fitness than observed in the training data. Since validating designed sequences in the wet lab is typically costly, it is important to quantify the uncertainty in the model's predictions. This is challenging because of a characteristic type of distribution shift between the training and test data in the design setting -- one in which the training and test data are statistically dependent, as the latter is chosen based on the former. Consequently, the model's error on the test data -- that is, the designed sequences -- has an unknown and possibly complex relationship with its error on the training data. We introduce a method to quantify predictive uncertainty in such settings. We do so by constructing confidence sets for predictions that account for the dependence between the training and test data. The confidence sets we construct have finite-sample guarantees that hold for any prediction algorithm, even when a trained model chooses the test-time input distribution. As a motivating use case, we demonstrate with several real data sets how our method quantifies uncertainty for the predicted fitness of designed proteins, and can therefore be used to select design algorithms that achieve acceptable trade-offs between high predicted fitness and low predictive uncertainty.
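As an illustration of the kind of construction involved, the following is a minimal sketch of weighted split conformal prediction, where a weight function corrects for a shift between training and test inputs. The paper derives the appropriate weights for the dependent, design-induced shift; the scores and uniform weights below are placeholders.

```python
import numpy as np

def weighted_quantile(scores, weights, q):
    """Smallest score s whose weighted CDF reaches q."""
    order = np.argsort(scores)
    s = np.asarray(scores, float)[order]
    w = np.asarray(weights, float)[order]
    cdf = np.cumsum(w) / w.sum()
    return s[np.searchsorted(cdf, q)]

def conformal_threshold(cal_scores, w_cal, w_test, alpha=0.1):
    # Augment with an infinite score carrying the test point's weight,
    # as in weighted split conformal prediction.
    scores = np.append(cal_scores, np.inf)
    weights = np.append(w_cal, w_test)
    return weighted_quantile(scores, weights, 1 - alpha)

# Toy usage: residuals |y - f(x)| on a calibration set; uniform weights
# recover the i.i.d. special case. The confidence set for a test input x
# is then {y : |f(x) - y| <= tau}.
cal_scores = np.abs(np.random.default_rng(0).normal(size=100))
tau = conformal_threshold(cal_scores, np.ones(100), 1.0)
print(tau)
```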
Authors: Jianhao Ma, Salar Fattahi
Abstract: We explore the local landscape of low-rank matrix recovery, focusing on reconstructing a $d_1\times d_2$ matrix $X^\star$ with rank $r$ from $m$ linear measurements, some potentially noisy. When the noise is distributed according to an outlier model, minimizing a nonsmooth $\ell_1$-loss with a simple sub-gradient method can often perfectly recover the ground truth matrix $X^\star$. Given this, a natural question is what optimization property (if any) enables such learning behavior. The most plausible answer is that the ground truth $X^\star$ manifests as a local optimum of the loss function. In this paper, we provide a strong negative answer to this question, showing that, under moderate assumptions, the true solutions corresponding to $X^\star$ do not emerge as local optima, but rather as strict saddle points -- critical points with strictly negative curvature in at least one direction. Our findings challenge the conventional belief that all strict saddle points are undesirable and should be avoided.
Authors: Vivek Yelleti, Ch Priyanka
Abstract: In the era of real-time data, traditional methods often struggle to keep pace with the dynamic nature of streaming environments. In this paper, we propose a hybrid framework wherein (i) stage-I follows a traditional approach where the model is built once and evaluated in a real-time environment, and (ii) stage-II employs an incremental learning approach where the model is continuously retrained as new data arrives, enabling it to adapt and stay up to date. To implement these frameworks, we employed 8 distinct state-of-the-art outlier detection models, including the one-class support vector machine (OCSVM), isolation forest adaptive sliding window approach (IForest ASD), exact storm (ES), angle-based outlier detection (ABOD), local outlier factor (LOF), Kitsune's online algorithm (KitNet), and the K-nearest neighbour conformal density- and distance-based method (KNN CAD). We evaluated the performance of these models across seven financial and healthcare prediction tasks, including credit card fraud detection, churn prediction, Ethereum fraud detection, heart stroke prediction, and diabetes prediction. The results indicate that our proposed incremental learning framework significantly improves performance, particularly on highly imbalanced datasets. Among all models, the IForest ASD model consistently ranked among the top three best-performing models, demonstrating superior effectiveness across various datasets.
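A minimal sketch of the stage-II idea, using scikit-learn's SGDOneClassSVM as a stand-in detector (the paper evaluates eight different models); the simulated stream and batch size are hypothetical.

```python
import numpy as np
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.default_rng(0)
model = SGDOneClassSVM(random_state=0)

for t in range(100):                           # simulated stream of batches
    batch = rng.normal(size=(32, 8))           # stand-in for arriving records
    if t > 0:                                  # prequential: score, then learn
        n_outliers = int((model.predict(batch) == -1).sum())
        if t % 25 == 0:
            print(f"step {t}: {n_outliers} flagged outliers")
    model.partial_fit(batch)                   # incremental model update
```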
Authors: Abdalaziz Rashid Al-Maeeni, Aziz Temirkhanov, Artem Ryzhikov, Mikhail Hushchyn
Abstract: High-precision modeling of systems is one of the main areas of industrial data analysis. Models of systems, their digital twins, are used to predict their behavior under various conditions. We have developed several models of a storage system using machine learning-based generative models. The system consists of several components: hard disk drive (HDD) and solid-state drive (SSD) storage pools with different RAID schemes and cache. Each storage component is represented by a probabilistic model that describes the probability distribution of the component performance in terms of IOPS and latency, depending on their configuration and external data load parameters. The results of the experiments demonstrate errors of 4-10% for IOPS and 3-16% for latency predictions, depending on the components and models of the system. The predictions show up to 0.99 Pearson correlation with Little's law, which can be used for unsupervised reliability checks of the models. In addition, we present novel datasets that can be used for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods in machine learning.
Authors: Lantian Zhang, Lei Guo
Abstract: As saturated output observations are ubiquitous in practice, identifying stochastic systems with such nonlinear observations is a fundamental problem across various fields. This paper investigates the asymptotically efficient identification problem for stochastic dynamical systems with saturated output observations. In contrast to most of the existing results, our results do not need the commonly used but stringent conditions such as periodic or independent assumptions on the system signals, and thus do not exclude applications to stochastic feedback systems. To be specific, we introduce a new adaptive Newton-type algorithm on the negative log-likelihood of the partially observed samples using a two-step design technique. Under some general excitation data conditions, we show that the parameter estimate is strongly consistent and asymptotically normal by employing the stochastic Lyapunov function method and limit theories for martingales. Furthermore, we show that the mean square error of the estimates can achieve the Cramér-Rao bound asymptotically without resorting to i.i.d. data assumptions. This indicates that the performance of the proposed algorithm is the best possible that one can expect in general. A numerical example is provided to illustrate the superiority of our new adaptive algorithm over the existing related ones in the literature.
Authors: Yushun Dong, Binchi Zhang, Hanghang Tong, Jundong Li
Abstract: Graph Neural Networks (GNNs) have emerged as a prominent graph learning model in various graph-based tasks over the years. Nevertheless, due to the vulnerabilities of GNNs, it has been empirically shown that malicious attackers can easily corrupt the fairness level of their predictions by adding perturbations to the input graph data. In this paper, we take crucial steps to study a novel problem of certifiable defense on the fairness level of GNNs. Specifically, we propose a principled framework named ELEGANT and present a detailed theoretical certification analysis for the fairness of GNNs. ELEGANT takes any GNN as its backbone, and the fairness level of such a backbone is theoretically guaranteed not to be corrupted under certain perturbation budgets for attackers. Notably, ELEGANT makes no assumption over the GNN structure or parameters, and does not require re-training the GNNs to realize certification. Hence it can serve as a plug-and-play framework for any optimized GNNs ready to be deployed. We verify the satisfactory effectiveness of ELEGANT in practice through extensive experiments on real-world datasets across different backbones of GNNs, where ELEGANT is also demonstrated to be beneficial for GNN debiasing. Open-source code can be found at https://github.com/yushundong/ELEGANT.
Authors: Benjamin Scellier, Siddhartha Mishra
Abstract: Resistor networks have recently been studied as analog computing platforms for machine learning, particularly due to their compatibility with the Equilibrium Propagation training framework. In this work, we explore the computational capabilities of these networks. We prove that electrical networks consisting of voltage sources, linear resistors, diodes, and voltage-controlled voltage sources (VCVSs) can approximate any continuous function to arbitrary precision. Central to our proof is a method for translating a neural network with rectified linear units into an approximately equivalent electrical network comprising these four elements. Our proof relies on two assumptions: (a) that circuit elements are ideal, and (b) that variable resistor conductances and VCVS amplification factors can take any value (arbitrarily small or large). Our findings provide insights that could guide the development of universal self-learning electrical networks.
Authors: Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang
Abstract: Large language models (LLMs) often incorporate multiple text chunks in their inputs to provide the necessary contexts. To speed up the prefill of long LLM inputs, one can pre-compute the KV cache of a text and re-use the KV cache when the context is reused as the prefix of another LLM input. However, the reused text chunks are not always the input prefix, which makes precomputed KV caches not directly usable, since they ignore the text's cross-attention with the preceding texts. Thus, the benefits of reusing KV caches remain largely unrealized. This paper tackles just one challenge: when an LLM input contains multiple text chunks, how can we quickly combine their precomputed KV caches in order to achieve the same generation quality as the expensive full prefill (i.e., without reusing KV cache)? This challenge naturally arises in retrieval-augmented generation (RAG), where the input is supplemented with multiple retrieved texts as the context. We present CacheBlend, a scheme that reuses precomputed KV caches, regardless of whether they are the input prefix or not, and selectively recomputes the KV values of a small subset of tokens to partially update each reused KV cache. In the meantime, the small extra delay for recomputing some tokens can be pipelined with the retrieval of KV caches within the same job, allowing CacheBlend to store KV caches on slower devices with more storage capacity while retrieving them without increasing the inference delay. By comparing CacheBlend with the state-of-the-art KV cache reusing schemes on three open-source LLMs of various sizes and four popular benchmark datasets of different tasks, we show that CacheBlend reduces time-to-first-token (TTFT) by 2.2-3.3x and increases inference throughput by 2.8-5x relative to full KV recompute, without compromising generation quality. The code is available at https://github.com/LMCache/LMCache.
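The following toy sketch conveys only the selective-recompute idea; CacheBlend's actual token-selection rule operates inside the attention computation, whereas the deviation proxy, shapes, and recompute ratio below are invented for illustration.

```python
import numpy as np

def select_tokens_to_recompute(chunk_caches, ratio=0.15):
    """Flag a small fraction of tokens whose cached KV entries are most
    likely to disagree with a full prefill (hypothetical proxy heuristic)."""
    kv = np.concatenate(chunk_caches, axis=0)            # (tokens, dim)
    deviation = np.linalg.norm(
        np.diff(kv, axis=0, prepend=kv[:1]), axis=1)     # toy deviation proxy
    k = max(1, int(ratio * len(kv)))
    return np.sort(np.argsort(deviation)[-k:])           # token indices to redo

caches = [np.random.default_rng(1).normal(size=(40, 64)) for _ in range(3)]
print(select_tokens_to_recompute(caches))
```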
Authors: Zifeng Zhao, Feiyu Jiang, Yi Yu
Abstract: We study contextual dynamic pricing problems where a firm sells products to $T$ sequentially-arriving consumers behaving according to an unknown demand model. The firm aims to minimize its regret over a clairvoyant that knows the model in advance. The demand follows a generalized linear model (GLM), allowing for stochastic feature vectors in $\mathbb R^d$ that encode product and consumer information. We first show the optimal regret is of order $\sqrt{dT}$, up to logarithmic factors, improving existing upper bounds by a $\sqrt{d}$ factor. This optimal rate is attained by two algorithms: a confidence bound-type algorithm and an explore-then-commit (ETC) algorithm. A key insight is an intrinsic connection between dynamic pricing and contextual multi-armed bandit problems with many arms under a careful discretization. We further study contextual dynamic pricing under local differential privacy (LDP) constraints. We propose a stochastic gradient descent-based ETC algorithm achieving regret upper bounds of order $d\sqrt{T}/\epsilon$, up to logarithmic factors, where $\epsilon>0$ is the privacy parameter. The upper bounds with and without LDP constraints are matched by newly constructed minimax lower bounds, which characterize the cost of privacy. Moreover, we extend our study to dynamic pricing under mixed privacy constraints, improving the privacy-utility tradeoff by leveraging public data. This is the first time such a setting has been studied in the dynamic pricing literature, and our theoretical results seamlessly bridge dynamic pricing with and without LDP. Extensive numerical experiments and real data applications are conducted to illustrate the efficiency and practical value of our algorithms.
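For intuition, here is a toy explore-then-commit pricing loop with a linear demand model; the paper's algorithm handles general GLMs, tunes the exploration length to obtain the stated regret, and has an LDP variant, none of which this sketch attempts.

```python
import numpy as np

rng = np.random.default_rng(1)
T_explore = 500
theta_true = np.array([0.8, -0.5])   # demand = 0.8 - 0.5 * price + noise

# Explore: post random prices and record realized demand.
prices = rng.uniform(0.1, 1.0, size=T_explore)
demand = theta_true[0] + theta_true[1] * prices \
         + 0.05 * rng.normal(size=T_explore)

# Estimate the demand model by least squares.
Phi = np.column_stack([np.ones(T_explore), prices])
a, b = np.linalg.lstsq(Phi, demand, rcond=None)[0]

# Commit: maximize revenue p * (a + b p)  =>  p* = -a / (2 b).
print("committed price:", -a / (2 * b))
```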
Authors: Yili Wang, Yixin Liu, Xu Shen, Chenyu Li, Kaize Ding, Rui Miao, Ying Wang, Shirui Pan, Xin Wang
Abstract: To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though those two lines of research indeed share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating a gap that hinders the application and evaluation of methods from one to the other. To bridge the gap, in this work, we present UB-GOLD, a Unified Benchmark for unsupervised Graph-level OOD and anomaLy Detection: a comprehensive evaluation framework that unifies GLAD and GLOD under the concept of generalized graph-level OOD detection. Our benchmark encompasses 35 datasets spanning four practical anomaly and OOD detection scenarios, facilitating the comparison of 18 representative GLAD/GLOD methods. We conduct multi-dimensional analyses to explore the effectiveness, OOD sensitivity spectrum, robustness, and efficiency of existing methods, shedding light on their strengths and limitations. Furthermore, we provide an open-source codebase (https://github.com/UB-GOLD/UB-GOLD) of UB-GOLD to foster reproducible research and outline potential directions for future investigations based on our insights.
Authors: Dong Liu, Yanxuan Yu
Abstract: Efficient machine learning (ML) has become increasingly important as models grow larger and data volumes expand. In this work, we address the trade-off between generalization in multi-task learning (MTL) and precision in single-task learning (STL) by introducing the Multi-Task to Single-Task (MT2ST) framework. MT2ST is designed to enhance training efficiency and accuracy in multi-modal tasks, showcasing its value as a practical application of efficient ML.
Authors: Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin
Abstract: Self-attention performs well in long context but has quadratic complexity. Existing RNN layers have linear complexity, but their performance in long context is limited by the expressive power of their hidden states. We present a practical framework for instantiating sequence modeling layers with linear complexity and expressive hidden states. The key idea is to make the hidden state a machine learning model itself, and the update rule a step of self-supervised learning. Since the hidden state is updated by training even on test sequences, our layers are called Test-Time Training (TTT) layers. We consider two instantiations: TTT-Linear and TTT-MLP, whose hidden state is a linear model and a two-layer MLP respectively. We evaluate our instantiations at the scale of 125M to 1.3B parameters, comparing with a strong Transformer and Mamba, a modern RNN. Similar to Transformer, TTT-Linear and TTT-MLP can keep reducing perplexity by conditioning on more tokens, while Mamba cannot after 16k context. TTT-MLP still faces challenges in memory I/O, but shows larger potential in long context, pointing to a promising direction for future research.
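A minimal numpy sketch of the TTT-Linear idea: the hidden state is itself a linear model, updated by one self-supervised gradient step per token. The corruption, loss, and learning rate are simplified stand-ins for the paper's actual design.

```python
import numpy as np

def ttt_linear(tokens, lr=0.1):
    """Hidden state = linear map W; update rule = one SGD step on a
    self-supervised reconstruction loss, applied even at test time."""
    dim = tokens.shape[1]
    W = np.zeros((dim, dim))
    outputs = []
    for x in tokens:
        x_view = 0.5 * x                         # hypothetical corrupted view
        grad = np.outer(W @ x_view - x, x_view)  # grad of 0.5*||W x_view - x||^2
        W -= lr * grad                           # "train" the hidden state
        outputs.append(W @ x)                    # emit output token
    return np.stack(outputs)

seq = np.random.default_rng(0).normal(size=(16, 8))
print(ttt_linear(seq).shape)                     # (16, 8)
```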
Authors: Tiago Novello, Diana Aldana, Andre Araujo, Luiz Velho
Abstract: Sinusoidal neural networks have been shown to be effective as implicit neural representations (INRs) of low-dimensional signals, due to their smoothness and high representation capacity. However, initializing and training them remain empirical tasks that lack a deeper understanding to guide the learning process. To fill this gap, our work introduces a theoretical framework that explains the capacity property of sinusoidal networks and offers robust control mechanisms for initialization and training. Our analysis is based on a novel amplitude-phase expansion of the sinusoidal multilayer perceptron, showing how its layer compositions produce a large number of new frequencies expressed as integer combinations of the input frequencies. This relationship can be directly used to initialize the input neurons, as a form of spectral sampling, and to bound the network's spectrum while training. Our method, referred to as TUNER (TUNing sinusoidal nEtwoRks), greatly improves the stability and convergence of sinusoidal INR training, leading to detailed reconstructions, while preventing overfitting.
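A standard identity illustrates the abstract's claim in the single-frequency case: composing sinusoids spreads energy over integer harmonics of the input frequency (the Jacobi-Anger expansion). The paper's amplitude-phase expansion generalizes this to multi-frequency inputs.

```latex
% One-layer composition of sinusoids, expanded into integer harmonics
% n*omega of the input frequency, with Bessel-function amplitudes J_n(a):
\sin\big(b + a\sin(\omega x)\big)
  \;=\; \sum_{n=-\infty}^{\infty} J_n(a)\,\sin\big(b + n\omega x\big)
```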
Authors: Shahbaz Rezaei, Xin Liu
Abstract: Human-understandable explanation of deep learning models is essential for various critical and sensitive applications. Unlike image or tabular data, where the importance of each input feature (for the classifier's decision) can be directly projected into the input, the distinguishing features of time series (e.g., dominant frequency) are often hard to discern in the time domain in a way a user can easily understand. Additionally, most explanation methods require a baseline value as an indication of the absence of any feature. However, the notion of a lack of feature, often defined as black pixels for vision tasks or zero/mean values for tabular data, is not well-defined in time series. Despite the adoption of explainable AI (XAI) methods from the tabular and vision domains into the time series domain, these differences limit the application of these XAI methods in practice. In this paper, we propose a simple yet effective method that allows a model originally trained on the time domain to be interpreted in other explanation spaces using existing methods. We suggest five explanation spaces, each of which can potentially alleviate these issues in certain types of time series. Our method can be easily integrated into existing platforms without any changes to trained models or XAI methods. The code will be released upon acceptance.
Authors: Shikun Feng, Yuyan Ni, Yan Lu, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan
Abstract: Molecular generation and molecular property prediction are both crucial for drug discovery, but they are often developed independently. Inspired by recent studies showing that diffusion models, a prominent generative approach, can learn meaningful data representations that enhance predictive tasks, we explore the potential for developing a unified generative model in the molecular domain that effectively addresses both molecular generation and property prediction tasks. However, the integration of these tasks is challenging due to inherent inconsistencies, making simple multi-task learning ineffective. To address this, we propose UniGEM, the first unified model to successfully integrate molecular generation and property prediction, delivering superior performance in both tasks. Our key innovation lies in a novel two-phase generative process, where predictive tasks are activated in the later stages, after the molecular scaffold is formed. We further enhance task balance through innovative training strategies. Rigorous theoretical analysis and comprehensive experiments demonstrate our significant improvements in both tasks. The principles behind UniGEM hold promise for broader applications, including natural language processing and computer vision.
Authors: Xingrui Yu, Zhenglin Wan, David Mark Bossens, Yueming Lyu, Qing Guo, Ivor W. Tsang
Abstract: Learning diverse and high-performance behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning methods usually fail in this task because most of them are designed to learn one specific behavior even with multiple demonstrations. Therefore, novel techniques for quality diversity imitation learning, which bridge quality diversity optimization and imitation learning methods, are needed to solve the above challenge. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art IL methods, achieving near-expert or beyond-expert QD performance on challenging continuous control tasks derived from MuJoCo environments.
Authors: Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli, Simone Scardapane, Fabrizio Silvestri, Emanuele Rodolà
Abstract: Task Arithmetic has emerged as a simple yet effective method to merge models without additional training. However, by treating entire networks as flat parameter vectors, it overlooks key structural information and is susceptible to task interference. In this paper, we study task vectors at the layer level, focusing on task layer matrices and their singular value decomposition. In particular, we concentrate on the resulting singular vectors, which we refer to as Task Singular Vectors (TSV). Recognizing that layer task matrices are often low-rank, we propose TSV-Compress (TSV-C), a simple procedure that compresses them to 10% of their original size while retaining 99% of accuracy. We further leverage this low-rank space to define a new measure of task interference based on the interaction of singular vectors from different tasks. Building on these findings, we introduce TSV-Merge (TSV-M), a novel model merging approach that combines compression with interference reduction, significantly outperforming existing methods.
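A minimal sketch of the compression step, assuming task matrices are the per-layer differences between fine-tuned and base weights; the keep-10%-of-singular-values rule mirrors the abstract's compression ratio but is otherwise an assumption.

```python
import torch

def compress_task_matrix(delta: torch.Tensor, keep: float = 0.10):
    """Low-rank compression of a per-layer task matrix via SVD: store the
    truncated factors instead of the dense difference matrix."""
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    r = max(1, int(keep * S.numel()))
    return U[:, :r], S[:r], Vh[:r, :]

# Toy low-rank-ish task matrix (fine-tuned minus base weights).
delta = torch.randn(512, 32) @ torch.randn(32, 512)
delta += 0.01 * torch.randn(512, 512)
U, S, Vh = compress_task_matrix(delta)
rel_err = (torch.linalg.matrix_norm(delta - U @ torch.diag(S) @ Vh)
           / torch.linalg.matrix_norm(delta))
print(float(rel_err))
```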
Authors: Alex Mallen, Charlie Griffin, Misha Wagner, Alessandro Abate, Buck Shlegeris
Abstract: An AI control protocol is a plan for usefully deploying AI systems that aims to prevent an AI from intentionally causing some unacceptable outcome. This paper investigates how well AI systems can generate and act on their own strategies for subverting control protocols whilst operating statelessly (without shared memory between contexts). To do this, an AI system may need to reliably generate optimal plans in each context, take actions with well-calibrated probabilities, and coordinate plans with other instances of itself without communicating. We develop Subversion Strategy Eval, a suite of eight environments, covering a range of protocols and strategic capabilities, and six sets of affordances that help isolate individual capabilities. We implement the evaluation in Inspect-AI and release it open-source. We evaluate Claude 3.5 models, including helpful-only versions, as well as OpenAI reasoning models. None of the models demonstrate substantial capability in strategizing to subvert control protocols statelessly. However, providing models with additional affordances, such as the ability to share a plan between contexts, can substantially improve performance. We hope our evaluations can act as a leading indicator for when models are capable of subverting control protocols and also relax the worst-case assumption of perfect strategic ability in AI control evaluations.
Authors: Sergio Rozada, Jose Luis Orejuela, Antonio G. Marques
Abstract: We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs), are not stationary. This aggravates the challenges of high-dimensional MDPs, as they suffer from the curse of dimensionality and high sample complexity. To address these issues, we propose modeling the VFs of finite-horizon MDPs as low-rank tensors, enabling a scalable representation that renders the problem of learning optimal policies tractable. We introduce an optimization-based framework for solving the Bellman equations with low-rank constraints, along with block-coordinate descent (BCD) and block-coordinate gradient descent (BCGD) algorithms, both with theoretical convergence guarantees. For scenarios where the system dynamics are unknown, we adapt the proposed BCGD method to estimate the VFs using sampled trajectories. Numerical experiments further demonstrate that the proposed framework reduces computational demands in controlled synthetic scenarios and more realistic resource allocation problems.
Authors: Luis Gaspar Schroeder, Shu Liu, Alejandro Cuadron, Mark Zhao, Stephan Krusche, Alfons Kemper, Matei Zaharia, Joseph E. Gonzalez
Abstract: Semantic prompt caches reduce the latency and cost of large language model (LLM) inference by reusing cached LLM-generated responses for semantically similar prompts. Vector similarity metrics assign a numerical score to quantify the similarity between an embedded prompt and its nearest neighbor in the cache. Existing systems rely on a static threshold to classify whether the similarity score is sufficiently high to result in a cache hit. We show that this one-size-fits-all threshold is insufficient across different embeddings. We propose VectorQ, an online framework with a threshold convergence guarantee to learn embedding-specific threshold regions that adapt to the uncertainty of an embedding. Through evaluations on a combination of three diverse datasets, we show that VectorQ consistently outperforms state-of-the-art systems across all static thresholds, achieving up to 26x increases in cache hit rate and error rate reductions up to 74%.
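For context, this is the static-threshold baseline the abstract argues is insufficient, in a few lines; VectorQ replaces the fixed `threshold` with learned, embedding-specific threshold regions. The embeddings and threshold value are toy placeholders.

```python
import numpy as np

def cache_lookup(query_emb, cache_embs, cache_responses, threshold=0.9):
    """Return a cached response if the nearest neighbor's cosine
    similarity clears a single static threshold; otherwise miss."""
    sims = cache_embs @ query_emb / (
        np.linalg.norm(cache_embs, axis=1) * np.linalg.norm(query_emb))
    best = int(np.argmax(sims))
    return cache_responses[best] if sims[best] >= threshold else None

embs = np.random.default_rng(0).normal(size=(5, 16))
print(cache_lookup(embs[2] + 0.01, embs, ["r0", "r1", "r2", "r3", "r4"]))
```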
Authors: Lisa Wimmer, Bernd Bischl, Ludwig Bothmann
Abstract: The correct way to quantify predictive uncertainty in neural networks remains a topic of active discussion. In particular, it is unclear whether the state-of-the-art entropy decomposition leads to a meaningful representation of model, or epistemic, uncertainty (EU) in light of a debate that pits ignorance against disagreement perspectives. We aim to reconcile the conflicting viewpoints by arguing that both are valid but arise from different learning situations. Notably, we show that the presence of shortcuts is decisive for EU manifesting as disagreement.
Authors: Andreas Maurer
Abstract: The paper gives a bound on the generalization error of the Gibbs algorithm, which recovers known data-independent bounds for the high-temperature range and extends to the low-temperature range, where generalization depends critically on the data-dependent loss landscape. It is shown that, with high probability, the generalization error of a single hypothesis drawn from the Gibbs posterior decreases with the total prior volume of all hypotheses with similar or smaller empirical error. This gives theoretical support to the belief in the benefit of flat minima. The zero-temperature limit is discussed and the bound is extended to a class of similar stochastic algorithms.
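For reference, the Gibbs posterior at inverse temperature beta, consistent with the high/low-temperature language of the abstract (notation assumed: prior pi, empirical error L-hat on sample S):

```latex
% Gibbs posterior: hypotheses weighted by exponentiated negative empirical
% error relative to the prior. Large beta (low temperature) concentrates on
% empirical risk minimizers; beta -> 0 recovers the prior.
p_{\beta}(h \mid S) \;=\;
  \frac{\pi(h)\, e^{-\beta \hat{L}_S(h)}}
       {\int \pi(h')\, e^{-\beta \hat{L}_S(h')}\, \mathrm{d}h'}
```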
Authors: Max Getter
Abstract: Rapid information (energy) propagation in deep feature extractors is crucial to balance computational complexity versus expressiveness as a representation of the input. We prove an upper bound for the speed of energy propagation in a unified framework that covers different neural network models, both over Euclidean and non-Euclidean domains. Additional structural information about the signal domain can be used to explicitly determine or improve the rate of decay. To illustrate this, we show global exponential energy decay for a range of 1) feature extractors with discrete-domain input signals, and 2) convolutional neural networks (CNNs) via scattering over locally compact abelian (LCA) groups.
Authors: Long Chen, Xianchao Xiu
Abstract: Sparse principal component analysis (PCA) is a well-established dimensionality reduction technique that is often used for unsupervised feature selection (UFS). However, determining the regularization parameters is rather challenging, and conventional approaches, including grid search and Bayesian optimization, not only incur substantial computational costs but also exhibit high sensitivity. To address these limitations, we first establish a structured sparse PCA formulation by integrating the $\ell_1$-norm and $\ell_{2,1}$-norm to capture the local and global structures, respectively. Building upon the off-the-shelf alternating direction method of multipliers (ADMM) optimization framework, we then design an interpretable deep unfolding network that translates iterative optimization steps into trainable neural architectures. This innovation enables automatic learning of the regularization parameters, effectively bypassing the empirical tuning requirements of conventional methods. Numerical experiments on benchmark datasets validate the advantages of our proposed method over existing state-of-the-art methods. Our code will be accessible at https://github.com/xianchaoxiu/SPCA-Net.
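The two penalties in the formulation have simple proximal operators, which are exactly the nonlinearities an unfolded ADMM network would apply at each layer; this sketch shows them in isolation (the thresholds are what such a network would learn).

```python
import numpy as np

def prox_l1(X, t):
    """Entrywise soft-thresholding: prox of t*||X||_1 (local sparsity)."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def prox_l21(X, t):
    """Row-wise shrinkage: prox of t*||X||_{2,1} (global row sparsity)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)

X = np.random.default_rng(0).normal(size=(6, 4))
print(prox_l1(X, 0.5))    # zeros out small entries
print(prox_l21(X, 0.5))   # shrinks (or zeros) whole rows
```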
Authors: Evgeniia Vu, Andrei Boiarov, Dmitry Vetrov
Abstract: Generating co-speech gestures in real time requires both temporal coherence and efficient sampling. We introduce Accelerated Rolling Diffusion, a novel framework for streaming gesture generation that extends rolling diffusion models with structured progressive noise scheduling, enabling seamless long-sequence motion synthesis while preserving realism and diversity. We further propose Rolling Diffusion Ladder Acceleration (RDLA), a new approach that restructures the noise schedule into a stepwise ladder, allowing multiple frames to be denoised simultaneously. This significantly improves sampling efficiency while maintaining motion consistency, achieving up to a 2x speedup with high visual fidelity and temporal coherence. We evaluate our approach on ZEGGS and BEAT, strong benchmarks for real-world applicability. Our framework is universally applicable to any diffusion-based gesture generation model, transforming it into a streaming approach. Applied to three state-of-the-art methods, it consistently outperforms them, demonstrating its effectiveness as a generalizable and efficient solution for real-time, high-fidelity co-speech gesture synthesis.
Authors: Yuhang Wang, Hanwei Guo, Sizhe Wang, Long Qian, Xuguang Lan
Abstract: Model Predictive Control (MPC) has been demonstrated to be effective in continuous control tasks. When a world model and a value function are available, planning a sequence of actions ahead of time leads to a better policy. Existing methods typically obtain the value function and the corresponding policy in a model-free manner. However, we find that such an approach struggles with complex tasks, resulting in poor policy learning and inaccurate value estimation. To address this problem, we leverage the strengths of MPC itself. In this work, we introduce Bootstrapped Model Predictive Control (BMPC), a novel algorithm that performs policy learning in a bootstrapped manner. BMPC learns a network policy by imitating an MPC expert, and in turn, uses this policy to guide the MPC process. Combined with model-based TD-learning, our policy learning yields better value estimation and further boosts the efficiency of MPC. We also introduce a lazy reanalyze mechanism, which enables computationally efficient imitation learning. Our method achieves superior performance over prior works on diverse continuous control tasks. In particular, on challenging high-dimensional locomotion tasks, BMPC significantly improves data efficiency while also enhancing asymptotic performance and training stability, with comparable training time and smaller network sizes. Code is available at https://github.com/wertyuilife2/bmpc.
Authors: George Rapakoulias, Ali Reza Pedram, Panagiotis Tsiotras
Abstract: The Mean-Field Schrödinger Bridge (MFSB) problem is an optimization problem that aims to find the minimum-effort control policy to drive a McKean-Vlasov stochastic differential equation from one probability measure to another. In the context of multiagent control, the objective is to control the configuration of a swarm of identical, interacting cooperative agents, as captured by the time-varying probability measure of their state. Available methods for solving this problem for distributions with continuous support rely either on spatial discretizations of the problem's domain or on approximating optimal solutions using neural networks trained through stochastic optimization schemes. For agents following Linear Time-Varying dynamics, and for Gaussian Mixture Model boundary distributions, we propose a highly efficient parameterization that approximates the solutions of the corresponding MFSB in closed form, without any learning steps. Our proposed approach consists of a mixture of elementary policies, each solving a Gaussian-to-Gaussian Covariance Steering problem from the components of the initial mixture to the components of the terminal mixture. Leveraging the semidefinite formulation of the Covariance Steering problem, our proposed solver can handle probabilistic hard constraints on the system's state while maintaining numerical tractability. We illustrate our approach on a variety of numerical examples.
Authors: Israel Campero Jurado, Pieter Gijsbers, Joaquin Vanschoren
Abstract: Automated Machine Learning (AutoML) automatically builds machine learning (ML) models on data. The de facto standard for evaluating new AutoML frameworks for tabular data is the AutoML Benchmark (AMLB). AMLB proposed to evaluate AutoML frameworks using 1- and 4-hour time budgets across 104 tasks. We argue that shorter time constraints should be considered for the benchmark because of their practical value, such as when models need to be retrained with high frequency, and to make AMLB more accessible. This work considers two ways to reduce the overall computation used in the benchmark: smaller time constraints and the use of early stopping. We conduct evaluations of 11 AutoML frameworks on 104 tasks with different time constraints and find that the relative ranking of AutoML frameworks is fairly consistent across time constraints, but that using early stopping leads to greater variety in model performance.
Authors: Carlos Fernández-Loría, Jorge Loría
Abstract: Who should we prioritize for intervention when we cannot estimate intervention effects? In many applied domains (e.g., advertising, customer retention, and behavioral nudging) prioritization is guided by predictive models that estimate outcome probabilities rather than causal effects. This paper investigates when these predictions (scores) can effectively rank individuals by their intervention effects, particularly when direct effect estimation is infeasible or unreliable. We propose a conceptual framework based on amenability: an individual's latent proclivity to be influenced by an intervention. We then formalize conditions under which predictive scores serve as effective proxies for amenability. These conditions justify using non-causal scores for intervention prioritization, even when the scores do not directly estimate effects. We further show that, under plausible assumptions, predictive models can outperform causal effect estimators in ranking individuals by intervention effects. Empirical evidence from an advertising context supports our theoretical findings, demonstrating that predictive modeling can offer a more robust approach to targeting than effect estimation. Our framework suggests a shift in focus, from estimating effects to inferring who is amenable, as a practical and theoretically grounded strategy for prioritizing interventions in resource-constrained environments.
Authors: Yan Ma, Steffi Chern, Xuyang Shen, Yiran Zhong, Pengfei Liu
Abstract: Reinforcement learning (RL) has recently shown strong potential in improving the reasoning capabilities of large language models and is now being actively extended to vision-language models (VLMs). However, existing RL applications in VLMs often rely on heavily engineered frameworks that hinder reproducibility and accessibility, while lacking standardized evaluation protocols, making it difficult to compare results or interpret training dynamics. This work introduces a transparent, from-scratch framework for RL in VLMs, offering a minimal yet functional four-step pipeline validated across multiple models and datasets. In addition, a standardized evaluation scheme is proposed to assess training dynamics and reflective behaviors. Extensive experiments on visual reasoning tasks uncover key empirical findings: response length is sensitive to random seeds, reflection correlates with output length, and RL consistently outperforms supervised fine-tuning (SFT) in generalization, even with high-quality data. These findings, together with the proposed framework, aim to establish a reproducible baseline and support broader engagement in RL-based VLM research.
Authors: Yuhang Li, Ruokai Yin, Donghyun Lee, Shiting Xiao, Priyadarshini Panda
Abstract: We introduce GPTQv2, a novel finetuning-free quantization method for compressing large-scale transformer architectures. Unlike the previous GPTQ method, which independently calibrates each layer, we always match the quantized layer's output to the exact output in the full-precision model, resulting in a scheme that we call asymmetric calibration. Such a scheme can effectively reduce the quantization error accumulated in previous layers. We analyze this problem using optimal brain compression to derive a closed-form solution. The new solution explicitly minimizes the quantization error as well as the accumulated asymmetry error. Furthermore, we utilize various techniques to parallelize the solution calculation, including channel parallelization, neuron decomposition, and Cholesky reformulation for matrix fusion. As a result, GPTQv2 is easy to implement, using just 20 more lines of code than GPTQ while improving its performance under low-bit quantization. Remarkably, on a single GPU, we quantize a 405B language transformer as well as EVA-02, the top-ranked vision transformer, which achieves 90% ImageNet pretraining accuracy. Code is available at github.com/Intelligent-Computing-Lab-Yale/GPTQv2.
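A toy numerical contrast between the two calibration objectives, with a stand-in rounding quantizer and synthetic activations; GPTQv2's actual solver minimizes the asymmetric error in closed form via optimal brain compression, which this sketch does not attempt.

```python
import numpy as np

def quantize(W, step=0.1):
    return np.round(W / step) * step            # stand-in quantizer

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
Wq = quantize(W)
X_fp = rng.normal(size=(32, 16))                # full-precision activations
X_q = X_fp + 0.05 * rng.normal(size=(32, 16))   # inputs carrying accumulated error

sym_err = np.linalg.norm(X_q @ Wq.T - X_q @ W.T)    # GPTQ: match own output
asym_err = np.linalg.norm(X_q @ Wq.T - X_fp @ W.T)  # GPTQv2: match FP model output
print(sym_err, asym_err)
```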
Authors: Salar Basiri, Alisina Bayati, Srinivasa Salapaka
Abstract: This article presents a novel approach to solving the sparsity-constrained Orthogonal Nonnegative Matrix Factorization (SCONMF) problem, which requires decomposing a non-negative data matrix into the product of two lower-rank non-negative matrices, $X=WH$, where the mixing matrix $H$ has orthogonal rows ($HH^T=I$), while also satisfying an upper bound on the number of nonzero elements in each row. By reformulating SCONMF as a capacity-constrained facility-location problem (CCFLP), the proposed method naturally integrates non-negativity, orthogonality, and sparsity constraints. Specifically, our approach integrates the control-barrier function (CBF) based framework used for dynamic optimal control design problems with the maximum-entropy-principle-based framework used for facility location problems to enforce these constraints while ensuring robust factorization. Additionally, this work introduces a quantitative approach for determining the "true" rank of $W$ or $H$, equivalent to the number of "true" features, a critical aspect in ONMF applications where the number of features is unknown. Simulations on various datasets demonstrate significantly improved factorizations with low reconstruction errors (up to 150 times smaller) while strictly satisfying all constraints, outperforming existing methods that struggle to balance accuracy and constraint adherence.
Authors: Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue
Abstract: In recent years, One-shot Federated Learning methods based on Diffusion Models have garnered increasing attention due to their remarkable performance. However, most of these methods require the deployment of foundation models on client devices, which significantly raises the computational requirements and reduces their adaptability to heterogeneous client models compared to traditional FL methods. In this paper, we propose FedLMG, a heterogeneous one-shot Federated learning method with Local Model-Guided diffusion models. Briefly speaking, in FedLMG, clients do not need access to any foundation models but only train and upload their local models, which is consistent with traditional FL methods. On the clients, we employ classification loss and BN loss to capture the broad category features and detailed contextual features of the client distributions. On the server, based on the uploaded client models, we utilize backpropagation to guide the server's DM in generating synthetic datasets that comply with the client distributions, which are then used to train the aggregated model. By using the locally trained client models as a medium to transfer client knowledge, our method significantly reduces the computational requirements on client devices and effectively adapts to scenarios with heterogeneous clients. Extensive quantitative and visualization experiments on three large-scale real-world datasets, along with theoretical analysis, demonstrate that the synthetic datasets generated by FedLMG exhibit comparable quality and diversity to the client datasets, which leads to an aggregated model that outperforms all compared methods and even the performance ceiling, further elucidating the significant potential of utilizing DMs in FL.
Authors: Jinhang Chai, Jianqing Fan
Abstract: The problem of structured matrix estimation has been studied mostly under strong noise dependence assumptions. This paper considers a general framework of noisy low-rank-plus-sparse matrix recovery, where the noise matrix may come from any joint distribution with arbitrary dependence across entries. We propose an incoherent-constrained least-square estimator and prove its tightness both in the sense of deterministic lower bound and matching minimax risks under various noise distributions. To attain this, we establish a novel result asserting that the difference between two arbitrary low-rank incoherent matrices must spread energy out across its entries; in other words, it cannot be too sparse, which sheds light on the structure of incoherent low-rank matrices and may be of independent interest. We then showcase the applications of our framework to several important statistical machine learning problems. In the problem of estimating a structured Markov transition kernel, the proposed method achieves the minimax optimality and the result can be extended to estimating the conditional mean operator, a crucial component in reinforcement learning. The applications to multitask regression and structured covariance estimation are also presented. We propose an alternating minimization algorithm to approximately solve the potentially hard optimization problem. Numerical results corroborate the effectiveness of our method which typically converges in a few steps.
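A toy alternating-minimization loop for the low-rank-plus-sparse model Y ~ L + S, alternating SVD truncation with soft-thresholding; the rank, threshold, and iteration count are illustrative, and the paper's incoherent-constrained estimator adds constraints this sketch omits.

```python
import numpy as np

def svd_truncate(M, r):
    """Best rank-r approximation via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def soft(M, t):
    """Entrywise soft-thresholding for the sparse component."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def decompose(Y, r=2, t=0.5, iters=50):
    L, S = np.zeros_like(Y), np.zeros_like(Y)
    for _ in range(iters):
        L = svd_truncate(Y - S, r)      # low-rank step
        S = soft(Y - L, t)              # sparse step
    return L, S

rng = np.random.default_rng(0)
L0 = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))   # rank-2 truth
S0 = (rng.random((30, 30)) < 0.05) * rng.normal(size=(30, 30)) * 5
L, S = decompose(L0 + S0)
print(np.linalg.norm(L - L0) / np.linalg.norm(L0))         # relative error
```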
Authors: Jianyu Hu, Juan-Pablo Ortega, Daiying Yin
Abstract: A structure-preserving kernel ridge regression method is presented that allows the recovery of nonlinear Hamiltonian functions from datasets made of noisy observations of Hamiltonian vector fields. The method proposes a closed-form solution that yields excellent numerical performance, surpassing other techniques proposed in the literature in this setup. From the methodological point of view, the paper extends kernel regression methods to problems in which loss functions involving linear functions of gradients are required and, in particular, a differential reproducing property and a Representer Theorem are proved in this context. The relation between the structure-preserving kernel estimator and the Gaussian posterior mean estimator is analyzed. A full error analysis is conducted that provides convergence rates using fixed and adaptive regularization parameters. The good performance of the proposed estimator together with the convergence rate is illustrated with various numerical experiments.
Authors: Siyuan Feng, Jiawei Liu, Ruihang Lai, Charlie F. Ruan, Yong Yu, Lingming Zhang, Tianqi Chen
Abstract: While existing machine learning (ML) frameworks focus on established platforms, like running CUDA on server-grade GPUs, there have been growing demands to enable emerging AI applications in a broader set of scenarios, such as running Large Language Models (LLMs) within browsers and mobile phones. However, deploying emerging models on new platforms (such as Metal and WebGPU) presents significant software engineering challenges due to rapid model evolution and limited tooling and practices for these platforms. Previous practice for ML model deployment often follows a bottom-up fashion, where engineers first implement individual required operators and then put them together. However, this traditional development approach fails to meet the productivity requirements when deploying emerging ML applications, with testing and debugging as the bottleneck. To this end, we introduce TapML, a top-down approach designed to streamline model deployment on diverse platforms. While the traditional bottom-up approach requires crafting manual tests, TapML automatically creates high-quality, realistic test data through operator-wise test carving. Furthermore, TapML uses a migration-based strategy to gradually offload the model implementation from the mature source platform to the target platform, minimizing the debugging scope of compound errors. TapML has been used as the default development method in the MLC-LLM project to deploy emerging ML models. Within 2 years, TapML has accelerated the deployment of 105 emerging models in 27 model architectures across 5 emerging platforms. We show that TapML effectively boosts developer productivity while ensuring the quality of deployed models. Furthermore, we summarize comprehensive case studies from our real-world development, offering best practices for developing emerging ML systems.
Authors: Sinisa Stekovic, Arslan Artykov, Stefan Ainetter, Mattia D'Urso, Friedrich Fraundorfer
Abstract: We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects and their parameters from images using interpretable shape programs. Unlike traditional CAD model retrieval, shape programs allow reasoning about semantic parameters, editing, and a low memory footprint. Despite their potential, shape programs for 3D scene understanding have been largely overlooked. Our key contribution is enabling gradient-based optimization by parsing shape programs, or more precisely procedural models designed in Blender, into efficient PyTorch code. While there are many possible applications of PyTorchGeoNodes, we show that a combination of PyTorchGeoNodes with a genetic algorithm is a method of choice for optimizing both discrete and continuous shape program parameters for 3D reconstruction and understanding of 3D object parameters. Our modular framework can be further integrated with other reconstruction algorithms, and we demonstrate one such integration to enable procedural Gaussian splatting. Our experiments on the ScanNet dataset show that our method achieves accurate reconstructions while enabling a previously unseen level of 3D scene understanding.
Authors: Chenyu Huang, Zhengyang Tang, Shixi Hu, Ruoqing Jiang, Xin Zheng, Dongdong Ge, Benyou Wang, Zizhuo Wang
Abstract: Optimization modeling plays a critical role in the application of Operations Research (OR) tools to address real-world problems, yet it poses challenges and requires extensive expertise from OR experts. With the advent of large language models (LLMs), new opportunities have emerged to streamline and automate such tasks. However, current research predominantly relies on closed-source LLMs such as GPT-4, along with extensive prompt engineering techniques. This reliance stems from the scarcity of high-quality training datasets for optimization modeling, resulting in elevated costs, prolonged processing times, and privacy concerns. To address these challenges, our work is the first to propose a viable path for training open-source LLMs that are capable of optimization modeling and developing solver codes, eventually leading to a superior ability for automating optimization modeling and solving. Particularly, we design OR-Instruct, a semi-automated data synthesis framework for optimization modeling that enables customizable enhancements for specific scenarios or model types. This work also introduces IndustryOR, the first industrial benchmark for evaluating LLMs in solving practical OR problems. We train several 7B-scale open-source LLMs using synthesized data (dubbed ORLMs, https://github.com/Cardinal-Operations/ORLM), which exhibit significantly enhanced optimization modeling capabilities, achieving competitive performance across the NL4OPT, MAMO, and IndustryOR benchmarks. Additionally, our experiments highlight the potential of scaling laws and reinforcement learning to further enhance the performance of ORLMs. The workflows and human-machine interaction paradigms of ORLMs in practical industrial applications are also discussed in the paper.
Authors: Khai Le-Duc, Khai-Nguyen Nguyen, Long Vo-Dang, Truong-Son Hy
Abstract: In doctor-patient conversations, identifying medically relevant information is crucial, posing the need for conversation summarization. In this work, we propose the first deployable real-time speech summarization system for real-world applications in industry, which generates a local summary after every N speech utterances within a conversation and a global summary at the end of a conversation. Our system could enhance user experience from a business standpoint, while also reducing computational costs from a technical perspective. Secondly, we present VietMed-Sum which, to our knowledge, is the first speech summarization dataset for medical conversations. Thirdly, we are the first to utilize LLMs and human annotators collaboratively to create gold-standard and synthetic summaries for medical conversation summarization. Finally, we present baseline results of state-of-the-art models on VietMed-Sum. All code, data (English-translated and Vietnamese) and models are available online: https://github.com/leduckhai/MultiMed/tree/master/VietMed-Sum
Authors: Henry Shaowu Yuchi, Shixiang Zhu, Li Dong, Yigit M. Arisoy, Matthew C. Spencer
Abstract: Modeling and analysis for event series generated by users of heterogeneous behavioral patterns are closely involved in our daily lives, including credit card fraud detection, online platform user recommendation, and social network analysis. The most commonly adopted approach to this task is to assign users to behavior-based categories and analyze each of them separately. However, this requires extensive data to fully understand the user behavior, presenting challenges in modeling newcomers without significant historical knowledge. In this work, we propose a novel discrete event prediction framework for new users with limited history, without needing to know the user's category. We treat the user event history as the "treatment" for future events and the user category as the key confounder. Thus, the prediction problem can be framed as counterfactual outcome estimation, where each event is re-weighted by its inverse propensity score. We demonstrate the improved performance of the proposed framework with a numerical simulation study and two real-world applications, including Netflix rating prediction and seller contact prediction for customer support at Amazon.
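A minimal sketch of the re-weighting step: fit a propensity model for the "treatment" and weight each observation by its inverse propensity score. The features, treatment, and logistic model below are synthetic stand-ins for the paper's event-history setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                          # user features
t = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # observed "treatment"

prop = LogisticRegression().fit(X, t)                  # propensity model
p = prop.predict_proba(X)[:, 1]
ipw = np.where(t == 1, 1.0 / p, 1.0 / (1.0 - p))       # inverse propensity weights
print(ipw[:5])   # weights to apply when estimating counterfactual outcomes
```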
Authors: Anubhab Ghosh, Yonina C. Eldar, Saikat Chatterjee
Abstract: We consider data-driven Bayesian state estimation from compressed measurements (BSCM) of a model-free process. The dimension of the temporal measurement vector is lower than that of the temporal state vector to be estimated, leading to an under-determined inverse problem. The underlying dynamical model of the state's evolution is unknown for a 'model-free process.' Hence, it is difficult to use traditional model-driven methods, for example, Kalman and particle filters. Instead, we consider data-driven methods. We experimentally show that two existing unsupervised learning-based data-driven methods fail to address the BSCM problem in a model-free process. The methods are -- data-driven nonlinear state estimation (DANSE) and deep Markov model (DMM). While DANSE provides good predictive/forecasting performance to model the temporal measurement data as a time series, its unsupervised learning lacks suitable regularization for tackling the BSCM task. We then propose a semi-supervised learning approach and develop a semi-supervised learning-based DANSE method, referred to as SemiDANSE. In SemiDANSE, we use a large amount of unlabelled data along with a limited amount of labelled data, i.e., pairwise measurement-and-state data, which provides the desired regularization. Using three benchmark dynamical systems, we show that the data-driven SemiDANSE provides competitive state estimation performance for BSCM against a hybrid method called KalmanNet and two model-driven methods (extended Kalman filter and unscented Kalman filter) that know the dynamical models exactly.
Authors: Yunyi Shen, Renato Berlinghieri, Tamara Broderick
Abstract: Practitioners often aim to infer an unobserved population trajectory using sample snapshots at multiple time points. E.g., given single-cell sequencing data, scientists would like to learn how gene expression changes over a cell's life cycle. But sequencing any cell destroys that cell. So we can access data for any particular cell only at a single time point, but we have data across many cells. The deep learning community has recently explored using Schrödinger bridges (SBs) and their extensions in similar settings. However, existing methods either (1) interpolate between just two time points or (2) require a single fixed reference dynamic (often set to Brownian motion within SBs). But learning piecewise from adjacent time points can fail to capture long-term dependencies. And practitioners are typically able to specify a model family for the reference dynamic but not the exact values of the parameters within it. So we propose a new method that (1) learns the unobserved trajectories from sample snapshots across multiple time points and (2) requires specification only of a family of reference dynamics, not a single fixed one. We demonstrate the advantages of our method on simulated and real data.
Authors: Shahriar Golchin, Mihai Surdeanu, Steven Bethard, Eduardo Blanco, Ellen Riloff
Abstract: In-context learning (ICL) has proven to be an effective strategy for improving the performance of large language models (LLMs) with no additional training. However, the exact mechanism behind this performance improvement remains unclear. This study is the first to show how ICL surfaces memorized training data and to explore the correlation between this memorization and performance on downstream tasks across various ICL regimes: zero-shot, few-shot, and many-shot. Our most notable findings include: (1) ICL significantly surfaces memorization compared to zero-shot learning in most cases; (2) demonstrations, without their labels, are the most effective element in surfacing memorization; (3) ICL improves performance when the surfaced memorization in few-shot regimes reaches a high level (about 40%); and (4) there is a very strong correlation between performance and memorization in ICL when it outperforms zero-shot learning. Overall, our study uncovers memorization as a new factor impacting ICL, raising an important question: to what extent do LLMs truly generalize from demonstrations in ICL, and how much of their success is due to memorization?
Authors: Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla
Abstract: We develop VSD, a method for conditioning a generative model of discrete, combinatorial designs on a rare desired class by efficiently evaluating a black-box function (e.g., an experiment or simulation) in a batch-sequential manner. We call this task active generation; we formalize active generation's requirements and desiderata, and formulate a solution via variational inference. VSD uses off-the-shelf gradient-based optimization routines, can learn powerful generative models for desirable designs, and can take advantage of scalable predictive models. We derive asymptotic convergence rates for learning the true conditional generative distribution of designs with certain configurations of our method. After illustrating the generative model on images, we empirically demonstrate that VSD can outperform existing baseline methods on a set of real sequence-design problems in various protein and DNA/RNA engineering tasks.
Authors: Hao Jian Huang, Hakan T. Otal, M. Abdullah Canbaz
Abstract: This paper presents the design and implementation of a Federated Learning (FL) testbed, focusing on its application in cybersecurity and evaluating its resilience against poisoning attacks. Federated Learning allows multiple clients to collaboratively train a global model while keeping their data decentralized, addressing critical needs for data privacy and security, particularly in sensitive fields like cybersecurity. Our testbed, built on Raspberry Pi and Nvidia Jetson hardware running the Flower framework, facilitates experimentation with various FL frameworks, assessing their performance, scalability, and ease of integration. Through a case study on federated intrusion detection systems, we demonstrate the testbed's capabilities in detecting anomalies and securing critical infrastructure without exposing sensitive network data. Comprehensive poisoning tests, targeting both model and data integrity, evaluate the system's robustness under adversarial conditions. The results show that while federated learning enhances data privacy and distributed learning, it remains vulnerable to poisoning attacks, which must be mitigated to ensure its reliability in real-world applications.
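A minimal Flower client sketch in the spirit of such a testbed (illustrative only; the `model` wrapper and its `get_weights`/`set_weights`/`train_one_epoch`/`evaluate` helpers are hypothetical, and the actual testbed code is not given in the abstract):

```python
import flwr as fl

# Illustrative Flower client for a federated intrusion detection experiment.
class IntrusionDetectionClient(fl.client.NumPyClient):
    def __init__(self, model, train_data, test_data):
        self.model, self.train_data, self.test_data = model, train_data, test_data

    def get_parameters(self, config):
        return self.model.get_weights()

    def fit(self, parameters, config):
        self.model.set_weights(parameters)
        self.model.train_one_epoch(self.train_data)  # hypothetical helper
        return self.model.get_weights(), len(self.train_data), {}

    def evaluate(self, parameters, config):
        self.model.set_weights(parameters)
        loss, acc = self.model.evaluate(self.test_data)  # hypothetical helper
        return float(loss), len(self.test_data), {"accuracy": float(acc)}

# On each Raspberry Pi / Jetson node:
# fl.client.start_numpy_client(server_address="<server>:8080",
#                              client=IntrusionDetectionClient(...))
```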
Authors: Kaiwen Wang, Nathan Kallus, Wen Sun
Abstract: This paper illustrates the central role of loss functions in data-driven decision making, providing a comprehensive survey on their influence in cost-sensitive classification (CSC) and reinforcement learning (RL). We demonstrate how different regression loss functions affect the sample efficiency and adaptivity of value-based decision making algorithms. Across multiple settings, we prove that algorithms using the binary cross-entropy loss achieve first-order bounds scaling with the optimal policy's cost and are much more efficient than the commonly used squared loss. Moreover, we prove that distributional algorithms using the maximum likelihood loss achieve second-order bounds scaling with the policy variance and are even sharper than first-order bounds. This in particular proves the benefits of distributional RL. We hope that this paper serves as a guide for analyzing decision making algorithms with varying loss functions, and can inspire the reader to seek out better loss functions to improve any decision making algorithm.
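As a toy illustration of the choice at stake, here is a value-regression loss switchable between the two options discussed above, assuming costs or returns normalized to [0, 1]; this is a sketch, not the paper's algorithms or analysis.

```python
import torch
import torch.nn.functional as F

def value_loss(pred_logits, target, kind="bce"):
    """Regression loss for value estimates with targets in [0, 1]."""
    if kind == "bce":
        # Binary cross-entropy treats the value as a Bernoulli mean; the paper
        # proves such losses yield first-order (small-cost) bounds.
        return F.binary_cross_entropy_with_logits(pred_logits, target)
    # The commonly used squared loss on the same prediction.
    return F.mse_loss(torch.sigmoid(pred_logits), target)
```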
Authors: Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
Abstract: We develop a solvable model of neural scaling laws beyond the kernel limit. Theoretical analysis of this model shows how performance scales with model size, training time, and the total amount of available data. We identify three scaling regimes corresponding to varying task difficulties: hard, easy, and super-easy tasks. For easy and super-easy target functions, which lie in the reproducing kernel Hilbert space (RKHS) defined by the initial infinite-width Neural Tangent Kernel (NTK), the scaling exponents remain unchanged between feature learning and kernel regime models. For hard tasks, defined as those outside the RKHS of the initial NTK, we demonstrate both analytically and empirically that feature learning can improve scaling with training time and compute, nearly doubling the exponent for hard tasks. This leads to a different compute-optimal strategy for scaling parameters and training time in the feature learning regime. We support our finding that feature learning improves the scaling law for hard tasks, but not for easy and super-easy tasks, with experiments on nonlinear MLPs fitting functions with power-law Fourier spectra on the circle and on CNNs learning vision tasks.
Authors: Dara Bahri, John Wieting
Abstract: Watermarking has recently emerged as an effective strategy for detecting the outputs of large language models (LLMs). Most existing schemes require white-box access to the model's next-token probability distribution, which is typically not accessible to downstream users of an LLM API. In this work, we propose a principled watermarking scheme that requires only the ability to sample sequences from the LLM (i.e. black-box access), boasts a distortion-free property, and can be chained or nested using multiple secret keys. We provide performance guarantees, demonstrate how it can be leveraged when white-box access is available, and show when it can outperform existing white-box schemes via comprehensive experiments.
Authors: Soobin Um, Jong Chul Ye
Abstract: We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. Minority instances, in the context of T2I generation, can be defined as ones living on low-density regions of text-conditional data distributions. They are valuable for various applications of modern T2I generators, such as data augmentation and creative AI. Unfortunately, existing pretrained T2I diffusion models primarily focus on high-density regions, largely due to the influence of guided samplers (like CFG) that are essential for high-quality generation. To address this, we present a novel framework to counter the high-density-focus of T2I diffusion models. Specifically, we first develop an online prompt optimization framework that encourages emergence of desired properties during inference while preserving semantic contents of user-provided prompts. We subsequently tailor this generic prompt optimizer into a specialized solver that promotes generation of minority features by incorporating a carefully-crafted likelihood objective. Extensive experiments conducted across various types of T2I models demonstrate that our approach significantly enhances the capability to produce high-quality minority instances compared to existing samplers. Code is available at https://github.com/soobin-um/MinorityPrompt.
Authors: Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Boxing Chen, Sarath Chandar
Abstract: The growth in prominence of large language models (LLMs) in everyday life can be largely attributed to their generative abilities, yet some of this attention is also owed to the risks and costs associated with their use. On one front is their tendency to hallucinate false or misleading information, limiting their reliability. On another is the increasing focus on the computational limitations associated with traditional self-attention based LLMs, which has brought about new alternatives, in particular recurrent models, meant to overcome them. Yet it remains uncommon to consider these two concerns simultaneously. Do changes in architecture exacerbate or alleviate existing concerns about hallucinations? Do they affect how and where hallucinations occur? Through an extensive evaluation, we study how these architecture-based inductive biases affect the propensity to hallucinate. While hallucination remains a general phenomenon not limited to specific architectures, the situations in which hallucinations occur and the ease with which specific types of hallucinations can be induced can differ significantly based on the model architecture. These findings highlight the need to better understand both of these problems in conjunction with each other, and to consider how to design more universal techniques for handling hallucinations.
Authors: Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko, Andrei Goncharov, Alberto Paderno, Maximilian Miller, Leander Maerkisch, Paul F. J\"ager, Klaus Maier-Hein
Abstract: Self-Supervised Learning (SSL) presents an exciting opportunity to unlock the potential of vast, untapped clinical datasets for various downstream applications that suffer from the scarcity of labeled data. While SSL has revolutionized fields like natural language processing and computer vision, its adoption in 3D medical image computing has been limited by three key pitfalls: small pre-training dataset sizes, architectures inadequate for 3D medical image analysis, and insufficient evaluation practices. In this paper, we address these issues by i) leveraging a large-scale dataset of 39k 3D brain MRI volumes and ii) using a Residual Encoder U-Net architecture within the state-of-the-art nnU-Net framework. iii) A robust development framework, incorporating 5 development and 8 testing brain MRI segmentation datasets, allowed performance-driven design decisions to optimize the simple concept of Masked Auto Encoders (MAEs) for 3D CNNs. The resulting model not only surpasses previous SSL methods but also outperforms the strong nnU-Net baseline by an average of approximately 3 Dice points, setting a new state of the art. Our code and models are made available here.
Authors: Huancheng Chen, Jingtao Li, Nidham Gazagnadou, Weiming Zhuang, Chen Chen, Lingjuan Lyu
Abstract: In the era of foundation models, we revisit continual learning~(CL), which aims to enable vision transformers (ViTs) to learn new tasks over time. However, as the scale of these models increases, catastrophic forgetting remains a persistent challenge, particularly in the presence of significant domain shifts across tasks. Recent studies highlight a crossover between CL techniques and parameter-efficient fine-tuning (PEFT), which focuses on fine-tuning only a small set of trainable parameters to adapt to downstream tasks, such as low-rank adaptation (LoRA). While LoRA achieves faster convergence and requires fewer trainable parameters, it has seldom been explored in the context of continual learning. To address this gap, we propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA), which introduces both an orthogonal LoRA adapter and a residual LoRA adapter parallel to pre-trained weights in each layer. These components are orchestrated by a dynamic memory mechanism to strike a balance between stability and plasticity. The orthogonal LoRA adapter's parameters are updated in an orthogonal subspace of previous tasks to mitigate catastrophic forgetting, while the residual LoRA adapter's parameters are updated in the residual subspace spanned by task-specific bases without interaction across tasks, offering complementary capabilities for fine-tuning new tasks. On ViT-based models, we demonstrate that DualLoRA offers significant advantages in accuracy, inference speed, and memory efficiency over existing CL methods across multiple benchmarks.
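A minimal sketch of a dual low-rank adapter layer in the spirit of DualLoRA (the parameterization is hypothetical; the orthogonality constraint on past-task subspaces and the dynamic memory mechanism described above are omitted):

```python
import torch
import torch.nn as nn

class DualLoRALinear(nn.Module):
    """Frozen pre-trained linear layer with two parallel low-rank adapters."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay fixed
        d_in, d_out = base.in_features, base.out_features
        # Orthogonal adapter: meant to be updated in a subspace orthogonal
        # to previous tasks (the projection step is not shown here).
        self.A_orth = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B_orth = nn.Parameter(torch.zeros(d_out, rank))
        # Residual adapter: updated in task-specific residual subspaces.
        self.A_res = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B_res = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x):
        # Zero-initialized B matrices make both adapters start as identity
        # deltas, the standard LoRA initialization.
        return (self.base(x)
                + x @ self.A_orth.T @ self.B_orth.T
                + x @ self.A_res.T @ self.B_res.T)
```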
Authors: Chenxiao Yu, Zhaotian Weng, Yuangang Li, Zheng Li, Xiyang Hu, Yue Zhao
Abstract: Can Large Language Models (LLMs) accurately predict election outcomes? While LLMs have demonstrated impressive performance in various domains, including healthcare, legal analysis, and creative tasks, their ability to forecast elections remains unknown. Election prediction poses unique challenges, such as limited voter-level data, rapidly changing political landscapes, and the need to model complex human behavior. To address these challenges, we introduce a multi-step reasoning framework designed for political analysis. Our approach is validated on real-world data from the American National Election Studies (ANES) 2016 and 2020, as well as synthetic personas generated by the leading machine learning framework, offering scalable datasets for voter behavior modeling. To capture temporal dynamics, we incorporate candidates' policy positions and biographical details, ensuring that the model adapts to evolving political contexts. Drawing on Chain of Thought prompting, our multi-step reasoning pipeline systematically integrates demographic, ideological, and time-dependent factors, enhancing the model's predictive power.
Authors: Pasan Dissanayake, Faisal Hamman, Barproda Halder, Ilia Sucholutsky, Qiuyi Zhang, Sanghamitra Dutta
Abstract: Knowledge distillation enables deploying complex machine learning models in resource-constrained environments by training a smaller student model to emulate the internal representations of a complex teacher model. However, the teacher's representations can also encode nuisance or additional information not relevant to the downstream task. Distilling such irrelevant information can actually impede the performance of a capacity-limited student model. This observation motivates our primary question: What are the information-theoretic limits of knowledge distillation? To this end, we leverage Partial Information Decomposition to quantify and explain the transferred knowledge and the knowledge left to distill for a downstream task. We theoretically demonstrate that the task-relevant transferred knowledge is succinctly captured by the measure of redundant information about the task between the teacher and student. We propose a novel multi-level optimization to incorporate redundant information as a regularizer, leading to our framework of Redundant Information Distillation (RID). RID leads to more resilient and effective distillation under nuisance teachers, as it succinctly quantifies task-relevant knowledge rather than simply aligning student and teacher representations.
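For context, a schematic of the Partial Information Decomposition that underlies this measure (notation illustrative; the paper's exact definitions may differ):

```latex
% Schematic PID of the task information: the information the pair
% (teacher representation T, student representation S) carries about the
% task Y splits into four non-negative parts.
\begin{equation}
I(Y; T, S) \;=\;
\underbrace{\mathrm{Red}(Y; T, S)}_{\text{redundant}}
\;+\; \underbrace{\mathrm{Unq}(Y; T \setminus S)}_{\text{unique to } T}
\;+\; \underbrace{\mathrm{Unq}(Y; S \setminus T)}_{\text{unique to } S}
\;+\; \underbrace{\mathrm{Syn}(Y; T, S)}_{\text{synergistic}}
\end{equation}
% RID uses the redundant term Red(Y; T, S) as a regularizer: it captures the
% task-relevant knowledge shared by teacher and student, rather than forcing
% wholesale alignment of their representations.
```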
Authors: Saedeh Tahery, Fatemeh Hamid Akhlaghi, Termeh Amirsoleimani
Abstract: The HeartBERT model is introduced with three primary objectives: reducing the need for labeled data, minimizing computational resources, and simultaneously improving performance in machine learning systems that analyze electrocardiogram (ECG) signals. Inspired by Bidirectional Encoder Representations from Transformers (BERT) in natural language processing and enhanced with a self-supervised learning approach, the HeartBERT model, built on the RoBERTa architecture, generates sophisticated embeddings tailored for ECG-based projects in the medical domain. To demonstrate the versatility, generalizability, and efficiency of the proposed model, two key downstream tasks have been selected: sleep stage detection and heartbeat classification. HeartBERT-based systems, utilizing bidirectional LSTM heads, are designed to address complex challenges. A series of practical experiments demonstrates the superiority and advancements of HeartBERT, particularly its ability to perform well with smaller training datasets and fewer learnable parameters while remaining effective compared to rival models. The code and data are publicly available at https://github.com/ecgResearch/HeartBert.
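A rough sketch of pairing a pretrained encoder with a bidirectional LSTM head, as the downstream systems described above do (the encoder interface and dimensions are assumptions in the style of Hugging Face models, not the released code):

```python
import torch.nn as nn

class ECGClassifier(nn.Module):
    """Pretrained encoder + bidirectional LSTM head (dimensions assumed)."""
    def __init__(self, encoder, hidden=256, n_classes=5):
        super().__init__()
        self.encoder = encoder  # e.g. a RoBERTa-style model over tokenized ECG
        self.lstm = nn.LSTM(input_size=encoder.config.hidden_size,
                            hidden_size=hidden, bidirectional=True,
                            batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, input_ids, attention_mask):
        emb = self.encoder(input_ids,
                           attention_mask=attention_mask).last_hidden_state
        out, _ = self.lstm(emb)       # (batch, seq, 2 * hidden)
        return self.head(out[:, -1])  # last-step pooling, one simple choice
```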
Authors: Ana\"is Halin, S\'ebastien Pi\'erard, Anthony Cioppa, Marc Van Droogenbroeck
Abstract: Properly understanding the performance of classifiers is essential in various scenarios. However, the literature often relies only on one or two standard scores to compare classifiers, which fails to capture the nuances of application-specific requirements. The Tile is a recently introduced visualization tool that organizes an infinity of ranking scores into a 2D map. Thanks to the Tile, it is now possible to compare classifiers efficiently, displaying all possible application-specific preferences instead of having to rely on a pair of scores. This hitchhiker's guide to understanding the performance of two-class classifiers presents four scenarios showcasing different user profiles: a theoretical analyst, a method designer, a benchmarker, and an application developer. We introduce several interpretative flavors adapted to the user's needs by mapping different values on the Tile. We illustrate this guide by ranking and analyzing the performance of 74 state-of-the-art semantic segmentation models through the perspective of the four scenarios. Through these user profiles, we demonstrate that the Tile effectively captures the behavior of classifiers in a single visualization, while accommodating an infinite number of ranking scores. Code for mapping the different Tile flavors is available in supplementary material.
Authors: Lea Bogensperger, Matthias J. Ehrhardt, Thomas Pock, Mohammad Sadegh Salehi, Hok Shing Wong
Abstract: We consider a bilevel learning framework for learning linear operators. In this framework, the learnable parameters are optimized via a loss function that also depends on the minimizer of a convex optimization problem (denoted the lower-level problem). We utilize an iterative algorithm called `piggyback' to compute the gradient of the loss and the minimizer of the lower-level problem. Given that the lower-level problem is solved numerically, the loss function, and thus its gradient, can only be computed inexactly. To estimate the accuracy of the computed hypergradient, we derive an a-posteriori error bound, which provides guidance for setting the tolerances of the lower-level solver and the piggyback algorithm. To efficiently solve the upper-level optimization, we also propose an adaptive method for choosing a suitable step-size. To illustrate the proposed method, we consider a few learned regularizer problems, such as training an input-convex neural network.
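The generic bilevel structure at play can be written as follows (notation illustrative, not the paper's exact setup):

```latex
\begin{align}
\min_{\theta} \quad & \mathcal{L}\big(\theta, x^{*}(\theta)\big)
  && \text{(upper level: learn the operator parameters)} \\
\text{s.t.} \quad & x^{*}(\theta) = \arg\min_{x} f(x, \theta)
  && \text{(lower level: convex problem, solved numerically)}
\end{align}
% Since x^*(\theta) is only available up to a solver tolerance, the
% hypergradient \nabla_\theta \mathcal{L} produced by the piggyback
% iterations is inexact; the a-posteriori bound quantifies this error and
% guides the choice of tolerances and of the adaptive upper-level step-size.
```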
Authors: Amirhossein Abaskohi, Spandana Gella, Giuseppe Carenini, Issam H. Laradji
Abstract: Multimodal multihop question answering (MMQA) requires reasoning over images and text from multiple sources. Despite advances in visual question answering, this multihop setting remains underexplored due to a lack of quality datasets. Existing methods focus on single-hop, single-modality, or short texts, limiting real-world applications like interpreting educational documents with long, multimodal content. To fill this gap, we introduce FM2DS, the first framework for creating a high-quality dataset for MMQA. Our approach consists of a 5-stage pipeline that involves acquiring relevant multimodal documents from Wikipedia, synthetically generating high-level questions and answers, and validating them through rigorous criteria to ensure data quality. We evaluate our methodology by training models on our synthesized dataset and testing on two benchmarks: MultimodalQA and WebQA. Our results demonstrate that, with an equal sample size, models trained on our synthesized data outperform those trained on human-collected data by 1.9 in exact match (EM) score on average. Additionally, we introduce M2QA-Bench with 1k samples, the first benchmark for MMQA on long documents, generated using FM2DS and refined by human annotators. We believe our data synthesis method will serve as a strong foundation for training and evaluating MMQA models.
Authors: Seth Nabat, Aishik Ghosh, Edmund Witkowski, Gregor Kasieczka, Daniel Whiteson
Abstract: Recognizing symmetries in data allows for significant boosts in neural network training, which is especially important where training data are limited. In many cases, however, the exact underlying symmetry is present only in an idealized dataset, and is broken in actual data, due to asymmetries in the detector or a response resolution that varies with particle momentum. Standard approaches, such as data augmentation or equivariant networks, fail to represent the nature of the full, broken symmetry, effectively overconstraining the response of the neural network. We propose a learning model which balances the generality and asymptotic performance of unconstrained networks with the rapid learning of constrained networks. This is achieved through a dual-subnet structure, where one network is constrained by the symmetry and the other is not, along with a learned symmetry factor. In a simplified toy example that demonstrates violation of Lorentz invariance, our model learns as rapidly as symmetry-constrained networks but escapes their performance limitations.
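A minimal sketch of such a dual-subnet combination with a learned symmetry factor (the names and the sigmoid gating are assumptions; the paper's exact parameterization may differ):

```python
import torch
import torch.nn as nn

class BrokenSymmetryNet(nn.Module):
    """Blend a symmetry-constrained subnet with an unconstrained one."""
    def __init__(self, constrained_net: nn.Module, free_net: nn.Module):
        super().__init__()
        self.constrained = constrained_net  # equivariant / symmetry-respecting
        self.free = free_net                # unconstrained
        self.alpha = nn.Parameter(torch.zeros(1))  # learned symmetry factor

    def forward(self, x):
        a = torch.sigmoid(self.alpha)  # in (0, 1): how exact the symmetry is
        return a * self.constrained(x) + (1 - a) * self.free(x)
```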
Authors: Kiet A. Nguyen, Adheesh Juvekar, Tianjiao Yu, Muntasir Wahed, Ismini Lourentzou
Abstract: Recent advances in Large Vision-Language Models (LVLMs) have enabled general-purpose vision tasks through visual instruction tuning. While existing LVLMs can generate segmentation masks from text prompts for single images, they struggle with segmentation-grounded reasoning across images, especially at finer granularities such as object parts. In this paper, we introduce the new task of part-focused semantic co-segmentation, which involves identifying and segmenting common objects, as well as common and unique object parts across images. To address this task, we present CALICO, the first LVLM designed for multi-image part-level reasoning segmentation. CALICO features two key components, a novel Correspondence Extraction Module that identifies semantic part-level correspondences, and Correspondence Adaptation Modules that embed this information into the LVLM to facilitate multi-image understanding in a parameter-efficient manner. To support training and evaluation, we curate MixedParts, a large-scale multi-image segmentation dataset containing $\sim$2.4M samples across $\sim$44K images spanning diverse object and part categories. Experimental results demonstrate that CALICO, with just 0.3% of its parameters finetuned, achieves strong performance on this challenging task.
Authors: Bowen Wen, Matthew Trepte, Joseph Aribido, Jan Kautz, Orazio Gallo, Stan Birchfield
Abstract: Tremendous progress has been made in deep stereo matching to excel on benchmark datasets through per-domain fine-tuning. However, achieving strong zero-shot generalization, a hallmark of foundation models in other computer vision tasks, remains challenging for stereo matching. We introduce FoundationStereo, a foundation model for stereo depth estimation designed to achieve strong zero-shot generalization. To this end, we first construct a large-scale (1M stereo pairs) synthetic training dataset featuring large diversity and high photorealism, followed by an automatic self-curation pipeline to remove ambiguous samples. We then design a number of network architecture components to enhance scalability, including a side-tuning feature backbone that adapts rich monocular priors from vision foundation models to mitigate the sim-to-real gap, and long-range context reasoning for effective cost volume filtering. Together, these components lead to strong robustness and accuracy across domains, establishing a new standard in zero-shot stereo depth estimation. Project page: https://nvlabs.github.io/FoundationStereo/
Authors: Yingheng Tang, Wenbin Xu, Jie Cao, Jianzhu Ma, Weilu Gao, Steve Farrell, Benjamin Erichson, Michael W. Mahoney, Andy Nonaka, Zhi Yao
Abstract: Understanding and predicting the properties of inorganic materials is crucial for accelerating advancements in materials science and driving applications in energy, electronics, and beyond. Integrating material structure data with language-based information through multi-modal large language models (LLMs) offers great potential to support these efforts by enhancing human-AI interaction. However, a key challenge lies in integrating atomic structures at full resolution into LLMs. In this work, we introduce MatterChat, a versatile structure-aware multi-modal LLM that unifies material structural data and textual inputs into a single cohesive model. MatterChat employs a bridging module to effectively align a pretrained machine learning interatomic potential with a pretrained LLM, reducing training costs and enhancing flexibility. Our results demonstrate that MatterChat significantly improves performance in material property prediction and human-AI interaction, surpassing general-purpose LLMs such as GPT-4. We also demonstrate its usefulness in applications such as more advanced scientific reasoning and step-by-step material synthesis.
Authors: Amirali Sajadi, Binh Le, Anh Nguyen, Kostadin Damevski, Preetha Chatterjee
Abstract: The widespread adoption of conversational LLMs for software development has raised new security concerns regarding the safety of LLM-generated content. Our motivational study outlines ChatGPT's potential to volunteer context-specific information to developers, promoting safe coding practices. Motivated by this finding, we conduct a study to evaluate the degree of security awareness exhibited by three prominent LLMs: Claude 3, GPT-4, and Llama 3. We prompt these LLMs with Stack Overflow questions that contain vulnerable code to evaluate whether they merely provide answers to the questions or whether they also warn users about the insecure code, thereby demonstrating a degree of security awareness. Further, we assess whether LLM responses provide information about the causes, exploits, and potential fixes of the vulnerability, to help raise users' awareness. Our findings show that all three models struggle to accurately detect and warn users about vulnerabilities, achieving a detection rate of only 12.6% to 40% across our datasets. We also observe that the LLMs tend to identify certain types of vulnerabilities related to sensitive information exposure and improper input neutralization much more frequently than other types, such as those involving external control of file names or paths. Furthermore, when LLMs do issue security warnings, they often provide more information on the causes, exploits, and fixes of vulnerabilities compared to Stack Overflow responses. Finally, we provide an in-depth discussion on the implications of our findings and present a CLI-based prompting tool that can be used to generate significantly more secure LLM responses.
Authors: Joan Serr\`a, R. Oguz Araz, Dmitry Bogdanov, Yuki Mitsufuji
Abstract: Detecting musical versions (different renditions of the same piece) is a challenging task with important applications. Because ground truth annotations are typically available only at the track level, existing approaches match musical versions at the track level (e.g., whole song). However, most applications require matching them at the segment level (e.g., 20s chunks). In addition, existing approaches resort to classification and triplet losses, disregarding more recent losses that could bring meaningful improvements. In this paper, we propose a method to learn from weakly annotated segments, together with a contrastive loss variant that outperforms well-studied alternatives. The former is based on pairwise segment distance reductions, while the latter modifies an existing loss following decoupling, hyper-parameter, and geometric considerations. With these two elements, we not only achieve state-of-the-art results in the standard track-level evaluation, but also obtain a breakthrough performance in a segment-level evaluation. We believe that, due to the generality of the challenges addressed here, the proposed methods may find utility in domains beyond audio or musical version matching.
Authors: Yasamin Jalalian, Juan Felipe Osorio Ramirez, Alexander Hsu, Bamdad Hosseini, Houman Owhadi
Abstract: We introduce a novel kernel-based framework for learning differential equations and their solution maps that is efficient both in its data requirements, in terms of the number of solution examples and the number of measurements per example, and in its computational cost, in terms of training procedures. Our approach is mathematically interpretable and backed by rigorous theoretical guarantees in the form of quantitative worst-case error bounds for the learned equation. Numerical benchmarks demonstrate significant improvements in computational complexity and robustness, while achieving one to two orders of magnitude improvements in accuracy compared to state-of-the-art algorithms.
Authors: David T. Hoffmann, Syed Haseeb Raza, Hanqiu Jiang, Denis Tananaev, Steffen Klingenhoefer, Martin Meinke
Abstract: Scene flow estimation is a foundational task for many robotic applications, including robust dynamic object detection, automatic labeling, and sensor synchronization. Two types of approaches to the problem have evolved: 1) supervised and 2) optimization-based methods. Supervised methods are fast during inference and achieve high-quality results; however, they are limited by the need for large amounts of labeled training data and are susceptible to domain gaps. In contrast, unsupervised test-time optimization methods do not face the problem of domain gaps but usually suffer from substantial runtime, exhibit artifacts, or fail to converge to the right solution. In this work, we mitigate several limitations of existing optimization-based methods. To this end, we 1) introduce a simple voxel grid-based model that improves over the standard MLP-based formulation in multiple dimensions and 2) introduce a new multiframe loss formulation. 3) We combine both contributions in our new method, termed Floxels. On the Argoverse 2 benchmark, Floxels is surpassed only by EulerFlow among unsupervised methods while achieving comparable performance at a fraction of the computational cost. Floxels achieves a massive speedup of ~60-140x over EulerFlow, reducing the runtime from a day to 10 minutes per sequence. Over the faster but low-quality baseline, NSFP, Floxels achieves a speedup of ~14x.
Authors: Ali Elkeshawy, Haifa Fares, Amor Nafkha
Abstract: Grant-free random access (GF-RA) is a promising access technique for massive machine-type communications (mMTC) in future wireless networks, particularly in the context of 5G and beyond (6G) systems. Within the context of GF-RA, this study investigates the efficiency of employing supervised machine learning techniques to tackle the challenge of device activity detection (AD). GF-RA addresses scalability by employing non-orthogonal pilot sequences, providing an efficient alternative to the conventional grant-based random access (GB-RA) technique, which is constrained by the scarcity of orthogonal preamble resources. In this paper, we propose a novel lightweight data-driven algorithmic framework specifically designed for activity detection in GF-RA for mMTC in cell-free massive multiple-input multiple-output (CF-mMIMO) networks. We propose two distinct framework deployment strategies, centralized and decentralized, both tailored to streamline implementation of the proposed approach across network infrastructures. Moreover, we introduce optimized post-detection methodologies complemented by a clustering stage to enhance overall detection performance. Our 3GPP-compliant simulations have validated that the proposed algorithm achieves state-of-the-art model-based activity detection accuracy while significantly reducing complexity. Achieving 99% accuracy, it demonstrates real-world viability and effectiveness.
Authors: Jingyuan Xue, Longfei Wei, Fang Sheng, Jianfei Zhang
Abstract: Battery degradation significantly impacts the reliability and efficiency of energy storage systems, particularly in electric vehicles (EVs) and industrial applications. Predicting the remaining useful life (RUL) of lithium-ion (Li-ion) batteries is crucial for optimizing maintenance schedules, reducing costs, and improving safety. Traditional RUL prediction methods often struggle with nonlinear degradation patterns and uncertainty quantification. To address these challenges, we propose a hybrid survival analysis framework integrating both statistical and machine-learning-based models for RUL estimation. Our approach transforms time-series battery data into time-to-failure data using path signatures, enabling effective survival modeling. We apply five models, including Cox-based survival models and machine-learning-based methods such as DeepHit and MTLR, to estimate failure-free probabilities over time. Experiments conducted on 362 Toyota battery datasets demonstrate the effectiveness of our approach, achieving high time-dependent AUC and concordance index while maintaining a low integrated Brier score. The proposed methodology provides actionable insights for battery manufacturers and engineers, supporting dynamic maintenance strategies and optimized lifecycle management.
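A minimal illustration of the survival-modeling step using one of the statistical models mentioned above, a Cox proportional hazards fit on time-to-failure data (the feature names and values here are invented; the paper additionally uses path signatures and ML-based models such as DeepHit and MTLR):

```python
import pandas as pd
from lifelines import CoxPHFitter

# Toy time-to-failure table: one row per battery cell.
df = pd.DataFrame({
    "cycles_to_failure": [812, 640, 1024, 530],  # duration
    "failed": [1, 1, 0, 1],                      # event flag (0 = censored)
    "mean_temp": [31.2, 35.8, 29.4, 38.1],       # invented covariates
    "capacity_fade_rate": [0.011, 0.019, 0.008, 0.024],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="cycles_to_failure", event_col="failed")

# Failure-free probability over time for the first cell, i.e. an RUL curve.
print(cph.predict_survival_function(df.iloc[:1]))
```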
Authors: Owen Cook, Jake Vasilakes, Ian Roberts, Xingyi Song
Abstract: Data annotation is an essential component of the machine learning pipeline; it is also a costly and time-consuming process. With the introduction of transformer-based models, annotation at the document level is increasingly popular; however, there is no standard framework for structuring such tasks. The EffiARA annotation framework is, to our knowledge, the first project to support the whole annotation pipeline, from understanding the resources required for an annotation task to compiling the annotated dataset and gaining insights into the reliability of individual annotators as well as the dataset as a whole. The framework's efficacy is supported by two previous studies: one improving classification performance through annotator-reliability-based soft label aggregation and sample weighting, and the other increasing overall agreement among annotators by identifying and replacing an unreliable annotator. This work introduces the EffiARA Python package and its accompanying webtool, which provides an accessible graphical user interface for the system. We open-source the EffiARA Python package at https://github.com/MiniEggz/EffiARA and the webtool is publicly accessible at https://effiara.gate.ac.uk.
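A toy sketch of annotator-reliability-based soft label aggregation of the kind the framework supports (illustrative only; see the EffiARA package for the actual implementation):

```python
import numpy as np

def soft_label(annotations: dict, reliability: dict, n_classes: int):
    """annotations: annotator -> class index; reliability: annotator -> weight."""
    label = np.zeros(n_classes)
    for annotator, cls in annotations.items():
        label[cls] += reliability[annotator]
    return label / label.sum()  # normalized soft label

# Two reliable annotators voting class 0 outweigh one less reliable vote:
# soft_label({"a1": 0, "a2": 0, "a3": 1}, {"a1": 0.9, "a2": 0.6, "a3": 0.4}, 2)
# -> array([0.7894..., 0.2105...])
```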
Authors: Joshua Hatherley
Abstract: It is commonly accepted that clinicians are ethically obligated to disclose their use of medical machine learning systems to patients, and that failure to do so would amount to a moral fault for which clinicians ought to be held accountable. Call this "the disclosure thesis." Four main arguments have been, or could be, given to support the disclosure thesis in the ethics literature: the risk-based argument, the rights-based argument, the materiality argument, and the autonomy argument. In this article, I argue that each of these four arguments is unconvincing, and therefore, that the disclosure thesis ought to be rejected. I suggest that mandating disclosure may even risk harming patients by providing stakeholders with a way to avoid accountability for harm that results from improper applications or uses of these systems.
Authors: Alvaro Ortiz, Tomasa Rodrigo, David Sarasa, Sirenia Vazquez
Abstract: Using a panel data local projections model and controlling for firm characteristics, procurement bid attributes, and macroeconomic conditions, the study estimates the dynamic effects of procurement awards on new lending, a more precise measure than the change in the stock of credit. The analysis further examines heterogeneity in credit responses based on firm size, industry, credit maturity, and value chain position of the firms. The empirical evidence confirms that public procurement awards significantly increase new lending, with NGEU-funded contracts generating stronger credit expansion than traditional procurement during the recent period. The results show that the impact of NGEU procurement programs aligns closely with historical procurement impacts, with differences driven mainly by lower utilization rates. Moreover, integrating high-frequency financial data with procurement records highlights the potential of Big Data in refining public policy design.
Authors: Shivesh Prakash, Hans-Arno Jacobsen, Viki Kumar Prasad
Abstract: We introduce MHNpath, a machine learning-driven retrosynthetic tool designed for computer-aided synthesis planning. Leveraging modern Hopfield networks and novel comparative metrics, MHNpath efficiently prioritizes reaction templates, improving the scalability and accuracy of retrosynthetic predictions. The tool incorporates a tunable scoring system that allows users to prioritize pathways based on cost, reaction temperature, and toxicity, thereby facilitating the design of greener and cost-effective reaction routes. We demonstrate its effectiveness through case studies involving complex molecules from ChemByDesign, showcasing its ability to predict novel synthetic and enzymatic pathways. Furthermore, we benchmark MHNpath against existing frameworks, replicating experimentally validated "gold-standard" pathways from PaRoutes. Our case studies reveal that the tool can generate shorter, cheaper, moderate-temperature routes employing green solvents, as exemplified by compounds such as dronabinol, arformoterol, and lupinine.
Authors: Edward DongBo Cui, Lu Zhang, William Ping-hsun Lee
Abstract: User and item cold starts present significant challenges in industrial applications of recommendation systems. Supplementing user-item interaction data with metadata is a common solution, but often at the cost of introducing additional biases. In this work, we introduce an augmented EASE model, i.e., FEASE, that seamlessly integrates both user and item side information to address these cold-start issues. Our straightforward, autoencoder-based method produces a closed-form solution that leverages rich content signals for cold items while refining user representations in data-sparse environments. Importantly, our method strikes a balance by effectively recommending cold-start items and handling cold-start users without incurring extra bias, and it maintains strong performance in warm settings. Experimental results demonstrate improved recommendation accuracy and robustness compared to previous collaborative filtering approaches. Moreover, our model serves as a strong baseline for future comparative studies.
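For reference, the closed-form solution of the vanilla EASE model that FEASE augments can be written in a few lines of NumPy (a sketch of the base model only; FEASE's user and item side-information terms are not shown):

```python
import numpy as np

def ease(X: np.ndarray, lam: float = 500.0) -> np.ndarray:
    """X: binary user-item interaction matrix (users x items)."""
    G = X.T @ X + lam * np.eye(X.shape[1])  # regularized Gram matrix
    P = np.linalg.inv(G)
    B = -P / np.diag(P)                     # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)                # zero-diagonal constraint
    return B                                # recommendation scores: X @ B
```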
Authors: Nusrat Jahan Mozumder, Felipe Toledo, Swaroopa Dola, Matthew B. Dwyer
Abstract: Deep neural network (DNN) testing is crucial for the reliability and safety of critical systems, where failures can have severe consequences. Although various techniques have been developed to create robustness test suites, requirements-based testing for DNNs remains largely unexplored, yet such tests are recognized as an essential component of software validation for critical systems. In this work, we propose a requirements-based test suite generation method that uses structured natural language requirements formulated in a semantic feature space. Test suites are created by prompting text-conditional latent diffusion models with the requirement precondition and then using the associated postcondition to define a test oracle that judges outputs of the DNN under test. We investigate the approach using fine-tuned variants of pre-trained generative models. Our experiments on the MNIST, CelebA-HQ, ImageNet, and autonomous car driving datasets demonstrate that the generated test suites are realistic, diverse, consistent with preconditions, and capable of revealing faults.
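As an illustration of the generation step, one might prompt an off-the-shelf text-conditional latent diffusion model with a requirement precondition via the diffusers library (the model id, prompt, and oracle wording below are placeholders; the paper fine-tunes its own model variants):

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint; any text-conditional latent diffusion model works.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Requirement precondition expressed as a natural language prompt.
precondition = "a photo of a person wearing glasses, frontal view"
images = pipe(precondition, num_images_per_prompt=4).images

# Each image is then fed to the DNN under test; the associated postcondition
# (e.g., "the classifier predicts 'glasses'") serves as the test oracle.
```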