Week Ending 1.12.2025

 

RESEARCH WATCH: 1.12.2025

 

Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models

The field of multimodal AI has been seeking ways to improve performance while maintaining efficiency. This paper tackles a key challenge in Efficient Multimodal Large Language Models (EMLLMs): enabling them to effectively evaluate their own reasoning despite limited parameters. The researchers propose a novel cascaded training approach that breaks down complex prompts into manageable pieces, making self-evaluation more practical for smaller models. Their method shows impressive performance gains on mathematical reasoning tasks, demonstrating that smaller models can achieve better self-evaluation capabilities through careful training design.

Authors:  Zheqi Lv, Wenkai Wang, Jiawei Wang, Shengyu Zhang, Fei Wu

Link:  https://arxiv.org/abs/2501.05662v1

Date: 2025-01-10

Summary:

Efficient Multimodal Large Language Models (EMLLMs) have rapidly advanced recently. Incorporating Chain-of-Thought (CoT) reasoning and step-by-step self-evaluation has improved their performance. However, limited parameters often hinder EMLLMs from effectively using self-evaluation during inference. Key challenges include synthesizing evaluation data, determining its quantity, optimizing training and inference strategies, and selecting appropriate prompts.

To address these issues, we introduce Self-Evaluation Augmented Training (SEAT). SEAT uses more powerful EMLLMs for CoT reasoning, data selection, and evaluation generation, then trains EMLLMs with the synthesized data. However, handling long prompts and maintaining CoT reasoning quality are problematic. Therefore, we propose Cascaded Self-Evaluation Augmented Training (Cas-SEAT), which breaks down lengthy prompts into shorter, task-specific cascaded prompts and reduces costs for resource-limited settings. During data synthesis, we employ open-source 7B-parameter EMLLMs and annotate a small dataset with short prompts.

Experiments demonstrate that Cas-SEAT significantly boosts EMLLMs' self-evaluation abilities, improving performance by 19.68%, 55.57%, and 46.79% on the MathVista, Math-V, and We-Math datasets, respectively. Additionally, our Cas-SEAT Dataset serves as a valuable resource for future research in enhancing EMLLM self-evaluation.
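
To make the cascaded-prompt idea concrete, here is a minimal sketch of how one long self-evaluation instruction can be split into shorter, task-specific stages. The prompt wording, the `generate` callable, and the revision step are illustrative assumptions, not the paper's actual templates or pipeline.

```python
# Minimal sketch of the cascaded-prompt idea behind Cas-SEAT (hypothetical
# prompts and helper names; the paper's actual prompt templates differ).
from typing import Callable

def cascaded_self_evaluation(question: str, generate: Callable[[str], str]) -> dict:
    """Split one long self-evaluation prompt into short, task-specific stages."""
    # Stage 1: chain-of-thought reasoning with a short, focused prompt.
    cot = generate(f"Solve step by step:\n{question}")

    # Stage 2: self-evaluation of the reasoning with another short prompt,
    # instead of one monolithic instruction covering both tasks at once.
    verdict = generate(
        f"Question: {question}\nReasoning: {cot}\n"
        "Is each step correct? Answer 'yes' or point out the first error."
    )

    # Stage 3: optional revision when the evaluation flags an error.
    if "yes" not in verdict.lower():
        cot = generate(f"Revise the reasoning:\n{question}\nIssue found: {verdict}")

    return {"reasoning": cot, "evaluation": verdict}
```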

--------------------------------------------------------------------------------------------------------

Quantifying Itch and its Impact on Sleep Using Machine Learning and Radio Signals

Chronic itch affects millions of Americans but has been difficult to measure objectively. This innovative study presents a breakthrough in non-invasive monitoring using radio signals and AI. The researchers developed a home device that can detect scratching behavior and analyze sleep patterns without requiring wearable sensors or skin contact. This technology could revolutionize both clinical care for chronic itch patients and pharmaceutical trials by providing objective, long-term monitoring data. The system's high accuracy and ability to correlate scratching with sleep quality metrics offers new possibilities for understanding and treating chronic itch conditions.

Authors:  Michail Ouroutzoglou, Mingmin Zhao, Joshua Hellerstein, Hariharan Rahul, Asima Badic, Brian S. Kim, Dina Katabi

Link:  https://arxiv.org/abs/2501.04896v1

Date: 2025-01-09

Summary:

Chronic itch affects 13% of the US population, is highly debilitating, and underlies many medical conditions. A major challenge in clinical care and new therapeutics development is the lack of an objective measure for quantifying itch, leading to reliance on subjective measures like patients' self-assessment of itch severity. In this paper, we show that a home radio device paired with artificial intelligence (AI) can concurrently capture scratching and evaluate its impact on sleep quality by analyzing radio signals bouncing in the environment. The device eliminates the need for wearable sensors or skin contact, enabling monitoring of chronic itch over extended periods at home without burdening patients or interfering with their skin condition. To validate the technology, we conducted an observational clinical study of chronic pruritus patients, monitored at home for one month using both the radio device and an infrared camera. Comparing the output of the device to ground truth data from the camera demonstrates its feasibility and accuracy (ROC AUC = 0.997, sensitivity = 0.825, specificity = 0.997). The results reveal a significant correlation between scratching and low sleep quality, manifested as a reduction in sleep efficiency (R = 0.6, p < 0.001) and an increase in sleep latency (R = 0.68, p < 0.001). Our study underscores the potential of passive, long-term, at-home monitoring of chronic scratching and its sleep implications, offering a valuable tool for both clinical care of chronic itch patients and pharmaceutical clinical trials.

--------------------------------------------------------------------------------------------------------

Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints

In robotics and AI, creating diverse behaviors while maintaining learned skills is a significant challenge. This paper introduces a novel offline algorithm for maximizing behavioral diversity in robotic systems without requiring direct interaction with the environment. Using Van der Waals force-inspired objectives and successor features, the method eliminates the need for skill discriminators while allowing zero-shot recall of learned skills. The approach shows particular promise in robotic applications, demonstrated through successful implementations in quadruped locomotion and obstacle navigation tasks.

Authors:  Pavel Kolev, Marin Vlastelica, Georg Martius

Link:  https://arxiv.org/abs/2501.04426v1

Date: 2025-01-08

Summary:

While many algorithms for diversity maximization under imitation constraints are online in nature, many applications require offline algorithms without environment interactions. Tackling this problem in the offline setting, however, presents significant challenges that require non-trivial, multi-stage optimization processes with non-stationary rewards. In this work, we present a novel offline algorithm that enhances diversity using an objective based on Van der Waals (VdW) force and successor features, and eliminates the need to learn a previously used skill discriminator. Moreover, by conditioning the value function and policy on a pre-trained Functional Reward Encoding (FRE), our method allows for better handling of non-stationary rewards and provides zero-shot recall of all skills encountered during training, significantly expanding the set of skills learned in prior work. Consequently, our algorithm benefits from receiving a consistently strong diversity signal (VdW), and enjoys more stable and efficient training. We demonstrate the effectiveness of our method in generating diverse skills for two robotic tasks in simulation: locomotion of a quadruped and local navigation with obstacle traversal.
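
As a rough illustration of a Van der Waals-style diversity signal, the sketch below scores a set of skills using a Lennard-Jones-type pairwise potential over their successor features: skills are pushed apart when too similar, while the attractive term keeps them from drifting arbitrarily far. The functional form, the reference distance r0, and the use of raw Euclidean distance are assumptions for illustration; the paper's exact objective may differ.

```python
# Hedged sketch of a Van der Waals (VdW)-inspired diversity signal over skill
# successor features (Lennard-Jones-type form; the paper's exact objective may differ).
import numpy as np

def vdw_potential(successor_features: np.ndarray, r0: float = 1.0) -> float:
    """Mean pairwise potential over skills; minimizing it spaces skills roughly r0
    apart, giving a diversity signal without a learned skill discriminator."""
    n = len(successor_features)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(successor_features[i] - successor_features[j]) + 1e-8
            total += (r0 / d) ** 12 - 2.0 * (r0 / d) ** 6  # repulsive and attractive terms
            pairs += 1
    return total / max(pairs, 1)

# Example: five skills embedded in a 4-dimensional successor-feature space.
print(vdw_potential(np.random.rand(5, 4)))
```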

--------------------------------------------------------------------------------------------------------

Design, Construction, and Testing of the APOLLO ATCA Blades for Use at the HL-LHC

This paper describes a crucial hardware development for the High-Luminosity Large Hadron Collider (HL-LHC). The APOLLO platform combines a generic Service Module with a customizable Command Module, creating a cost-effective solution for complex data readout and tracking systems. The design incorporates advanced features like high-bandwidth optical transceivers and powerful FPGAs, making it suitable for demanding physics applications. Currently under testing at major research institutions, this open-source hardware platform represents an important advancement in particle physics infrastructure.

Authors:  A. Akpinar, A. Blaizot, S. Cholak, G. de Castro, Z. Demiragli, A. Duquette, J. Fulcher, D. Gastler, K. Hahn, E. Hazen, P. Kotamnives, A. Madorsky, D. Monk, S. Noorudhin, M. Oshiro, J. Rohlf, C. Strohman, E. Tsai, P. Wittich, S. Yuan, R. Zou

Link:  https://arxiv.org/abs/2501.03702v2

Date: 2025-01-08

Summary:

The Apollo Advanced Telecommunications Computing Architecture (ATCA) platform is an open-source design consisting of a generic "Service Module" (SM) and a customizable "Command Module" (CM), allowing for cost-effective use in applications such as the readout of the inner tracker and the Level-1 track trigger for the CMS Phase-II upgrade at the HL-LHC. The SM integrates an intelligent IPMC, robust power entry and conditioning systems, a powerful system-on-module computer, and flexible clock and communication infrastructure. The CM is designed around two Xilinx Ultrascale+ FPGAs and high-density, high-bandwidth optical transceivers capable of 25 Gb/s. Crates of Apollo blades are currently being tested at Boston University, Cornell University, and CERN.

--------------------------------------------------------------------------------------------------------

A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving

Mathematics education and AI have intersected in the field of Math Word Problem (MWP) solving. This research addresses a common limitation in existing systems: their struggle to generate diverse but equivalent solution equations. The proposed DivKD model uses an innovative approach where a student model learns from a teacher model while maintaining solution diversity. This advancement could improve automated math tutoring systems and educational technology by providing more flexible and varied approaches to problem-solving.

Authors:  Yi Zhang, Guangyou Zhou, Zhiwen Xie, Jinjin Ma, Jimmy Xiangji Huang

Link:  https://arxiv.org/abs/2501.03670v1

Date: 2025-01-07

Summary:

Math Word Problem (MWP) solving is a critical task in natural language processing that has garnered significant research interest in recent years. Various recent studies heavily rely on Seq2Seq models and their extensions (e.g., Seq2Tree and Graph2Tree) to generate mathematical equations. While effective, these models struggle to generate diverse yet equivalent solution equations, limiting their generalization across various math problem scenarios. In this paper, we introduce a novel Diversity-enhanced Knowledge Distillation (DivKD) model for practical MWP solving. Our approach proposes an adaptive diversity distillation method, in which a student model learns diverse equations by selectively transferring high-quality knowledge from a teacher model. Additionally, we design a diversity prior-enhanced student model to better capture the diversity distribution of equations by incorporating a conditional variational auto-encoder. Extensive experiments on four MWP benchmark datasets demonstrate that our approach achieves higher answer accuracy than strong baselines while maintaining high efficiency for practical applications.
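
A minimal sketch of the selective-transfer idea behind adaptive diversity distillation: the student imitates the teacher's equation distribution only on tokens where the teacher looks reliable, and otherwise relies on the ground-truth loss. The confidence-threshold filter and loss weighting below are illustrative assumptions, not the paper's actual criteria, and the conditional VAE component is omitted.

```python
# Hedged sketch of selective knowledge distillation in the spirit of DivKD
# (hypothetical quality filter and loss weighting; the paper's criteria differ).
import torch
import torch.nn.functional as F

def selective_kd_loss(student_logits, teacher_logits, targets, tau=2.0, quality_thresh=0.9):
    """Distill from the teacher only where the teacher's equation tokens look reliable.
    student_logits, teacher_logits: [batch, seq_len, vocab]; targets: [batch, seq_len]."""
    teacher_probs = F.softmax(teacher_logits / tau, dim=-1)
    teacher_conf = teacher_probs.max(dim=-1).values              # per-token teacher confidence
    keep = (teacher_conf > quality_thresh).float()                # selective transfer mask

    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                  teacher_probs, reduction="none").sum(-1)        # per-token distillation term
    ce = F.cross_entropy(student_logits.transpose(1, 2), targets, reduction="none")

    return ((keep * kd * tau ** 2) + ce).mean()
```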

--------------------------------------------------------------------------------------------------------

Effective and Efficient Mixed Precision Quantization of Speech Foundation Models

Speech recognition models often require significant computational resources, limiting their practical applications. This research presents a new approach to model compression that integrates mixed-precision learning and parameter estimation in a single stage. The method achieves impressive compression ratios while maintaining performance quality on speech recognition tasks. This advancement could make speech recognition technology more accessible for resource-constrained devices and applications, potentially enabling broader deployment of voice interface systems.

Authors:  Haoning Xu, Zhaoqing Li, Zengrui Jin, Huimeng Wang, Youjun Chen, Guinan Li, Mengzhe Geng, Shujie Hu, Jiajun Deng, Xunying Liu

Link:  https://arxiv.org/abs/2501.03643v1

Date: 2025-01-07

Summary:

This paper presents a novel mixed-precision quantization approach for speech foundation models that tightly integrates mixed-precision learning and quantized model parameter estimation into one single model compression stage. Experiments conducted on the LibriSpeech dataset with fine-tuned wav2vec2.0-base and HuBERT-large models suggest the resulting mixed-precision quantized models increased the lossless compression ratio by factors of up to 1.7x and 1.9x over the respective uniform-precision and two-stage mixed-precision quantized baselines that perform precision learning and model parameter quantization in separate and disjoint stages, while incurring no statistically significant word error rate (WER) increase over the 32-bit full-precision models. The system compression time of wav2vec2.0-base and HuBERT-large models is reduced by up to 1.9 and 1.5 times over the two-stage mixed-precision baselines, while both produce lower WERs. The best-performing 3.5-bit mixed-precision quantized HuBERT-large model produces a lossless compression ratio of 8.6x over the 32-bit full-precision system.
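
A back-of-the-envelope view of what mixed-precision bit allocation buys: with most layers pushed to 3-4 bits and a few kept at higher precision, the average bit-width drops far below 32. The layer sizes and bit assignments below are made-up placeholders, not the learned precisions from the paper, and the printed ratio ignores entropy-coding and metadata overheads.

```python
# Back-of-the-envelope compression ratio for a mixed-precision assignment
# (illustrative layer sizes and bit-widths, not the paper's learned values).
layers = {                     # layer group -> (num parameters, assigned bit-width)
    "feature_encoder": (4_000_000, 8),
    "transformer_low": (150_000_000, 4),
    "transformer_high": (160_000_000, 3),
}

total_params = sum(n for n, _ in layers.values())
quantized_bits = sum(n * b for n, b in layers.values())
avg_bits = quantized_bits / total_params
print(f"average precision ~{avg_bits:.2f} bits, "
      f"compression vs. 32-bit ~{32 / avg_bits:.1f}x (before coding/metadata overheads)")
```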

--------------------------------------------------------------------------------------------------------

To Analyze and Regulate Human-in-the-loop Learning for Congestion Games

Traffic management systems face a critical challenge: balancing user preferences with optimal route planning. This research examines how to incentivize users to explore alternative routes while contributing to traffic data collection. The study proposes a selective information disclosure mechanism that strategically reveals traffic information to influence user behavior. This approach could significantly improve traffic management systems and navigation apps by better balancing individual routing preferences with overall system efficiency.

Authors:  Hongbo Li, Lingjie Duan

Link:  https://arxiv.org/abs/2501.03055v1

Date: 2025-01-06

Summary:

In congestion games, selfish users behave myopically and crowd onto the shortest paths, and the social planner designs mechanisms to regulate such selfish routing through information or payment incentives. However, such mechanism design requires knowledge of time-varying traffic conditions, and it is the users themselves who learn and report past road experiences to the social planner (e.g., Waze or Google Maps). When congestion games meet mobile crowdsourcing, it is critical to incentivize selfish users to explore non-shortest paths to achieve the best exploitation-exploration trade-off. First, we consider a simple but fundamental parallel routing network with one deterministic path and multiple stochastic paths for users with an average arrival probability $\lambda$. We prove that the current myopic routing policy (widely used in Waze and Google Maps) misses both exploration (under a strong hazard belief) and exploitation (under a weak hazard belief) compared to the social optimum. Due to the myopic policy's under-exploration, we prove that the resulting price of anarchy (PoA) is larger than $\frac{1}{1-\rho^{1/\lambda}}$, which can be arbitrarily large as the discount factor $\rho \rightarrow 1$. To mitigate this huge efficiency loss, we propose a novel selective information disclosure (SID) mechanism: we reveal the latest traffic information to users only when they intend to over-explore stochastic paths upon arrival, while hiding such information when they want to under-explore. We prove that our mechanism successfully reduces the PoA to less than $2$. Besides the parallel routing network, we further extend our mechanism and PoA results to any linear path graph with multiple intermediate nodes.
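
The abstract's price-of-anarchy lower bound, $\frac{1}{1-\rho^{1/\lambda}}$, can be evaluated directly to see how quickly myopic routing degrades as the discount factor approaches 1. The parameter values below are chosen purely for illustration.

```python
# Evaluating the abstract's PoA lower bound 1 / (1 - rho**(1/lam)) for a few
# illustrative parameter values, showing the blow-up as the discount factor rho -> 1.
def poa_lower_bound(rho: float, lam: float) -> float:
    return 1.0 / (1.0 - rho ** (1.0 / lam))

for rho in (0.9, 0.99, 0.999):
    print(f"rho = {rho}, lambda = 0.5  ->  PoA > {poa_lower_bound(rho, 0.5):.1f}")
# rho = 0.9 -> ~5.3, rho = 0.99 -> ~50.3, rho = 0.999 -> ~500.3
```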

--------------------------------------------------------------------------------------------------------

Proof-of-Data: A Consensus Protocol for Collaborative Intelligence

As machine learning becomes more collaborative, ensuring fair and secure participation becomes crucial. This paper introduces a blockchain-based framework for decentralized federated learning, using a novel Proof-of-Data consensus protocol. The system addresses both trust and incentive challenges in collaborative AI development, allowing for fair reward allocation while maintaining privacy. This approach could revolutionize how organizations collaborate on AI development, particularly in scenarios where data privacy and fair contribution recognition are essential.

Authors:  Huiwen Liu, Feida Zhu, Ling Cheng

Link:  https://arxiv.org/abs/2501.02971v1

Date: 2025-01-06

Summary:

Existing research on federated learning has focused on the setting where learning is coordinated by a centralized entity. Yet the greatest potential of future collaborative intelligence would be unleashed in a more open and democratized setting with no central entity in a dominant role, referred to as "decentralized federated learning". New challenges arise accordingly in achieving both correct model training and fair reward allocation with collective effort among all participating nodes, especially with the threat of Byzantine nodes jeopardising both tasks.

In this paper, we propose a blockchain-based decentralized Byzantine fault-tolerant federated learning framework based on a novel Proof-of-Data (PoD) consensus protocol to resolve both the "trust" and "incentive" components. By decoupling model training and contribution accounting, PoD enjoys not only the learning efficiency and system liveness of asynchronous, societal-scale PoW-style learning but also the finality of consensus and reward allocation of epoch-based BFT-style voting. To mitigate false reward claims through data forgery by Byzantine attackers, a privacy-aware data verification and contribution-based reward allocation mechanism is designed to complete the framework. Our evaluation results show that PoD achieves model training performance close to that of the centralized counterpart while achieving trust in consensus and fairness in reward allocation with a fault tolerance ratio of 1/3.

--------------------------------------------------------------------------------------------------------

Interpretable Recognition of Fused Magnesium Furnace Working Conditions with Deep Convolutional Stochastic Configuration Networks

Industrial process monitoring requires both accurate recognition and interpretable results. This research focuses on recognizing working conditions in fused magnesium furnaces using an innovative neural network approach. The method combines supervised learning with reinforcement learning to create an interpretable and efficient recognition system. This advancement could improve industrial process control and safety monitoring, particularly in complex manufacturing environments where understanding system decisions is crucial.

Authors:  Li Weitao, Zhang Xinru, Wang Dianhui, Tong Qianqian, Chai Tianyou

Link:  https://arxiv.org/abs/2501.02740v1

Date: 2025-01-06

Summary:

To address the issues of weak generalization capability and limited interpretability in working condition recognition models for fused magnesium furnaces, this paper proposes an interpretable working condition recognition method based on deep convolutional stochastic configuration networks (DCSCNs). First, a supervised learning mechanism is employed to generate physically meaningful Gaussian differential convolution kernels. An incremental method is utilized to construct a DCSCNs model, ensuring the convergence of recognition errors in a hierarchical manner and avoiding the iterative optimization of convolutional kernel parameters with the widely used backpropagation algorithm. The independent coefficient of channel feature maps is defined to obtain visualized feature class activation maps for the fused magnesium furnace. A joint reward function is constructed based on recognition accuracy, interpretable trustworthiness evaluation metrics, and the model parameter count. Reinforcement learning (RL) is applied to adaptively prune the convolutional kernels of the DCSCNs model, aiming to build a compact, high-performing, and interpretable network. The experimental results demonstrate that the proposed method outperforms other deep learning approaches in terms of recognition accuracy and interpretability.
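
The joint reward driving the reinforcement-learning pruning step can be pictured as a weighted combination of accuracy, an interpretability score, and a parameter-count penalty. The weights and example numbers below are hypothetical; the paper defines its own trustworthiness metrics and weighting.

```python
# Hedged sketch of a joint reward combining recognition accuracy, an
# interpretability score, and model size for RL-based kernel pruning
# (weights w1-w3 are illustrative, not the paper's values).
def joint_reward(accuracy: float, interpretability: float, num_params: int,
                 w1: float = 1.0, w2: float = 0.5, w3: float = 1e-7) -> float:
    """Higher accuracy and interpretability are rewarded; parameter count is penalized."""
    return w1 * accuracy + w2 * interpretability - w3 * num_params

# Example: pruning that trades 0.5% accuracy for a 30% smaller model can still
# increase the reward, steering the agent toward compact, interpretable networks.
print(joint_reward(0.95, 0.8, 2_000_000), joint_reward(0.945, 0.8, 1_400_000))
```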

--------------------------------------------------------------------------------------------------------

Machine learning applications in archaeological practices: a review

The intersection of artificial intelligence and archaeology has grown significantly in recent years. This comprehensive review examines 135 papers from 1997-2022, revealing how machine learning is transforming archaeological practices across all subfields. The research highlights both the potential and limitations of AI in archaeology, proposing a structured workflow guide for archaeologists. This work could help standardize and improve the application of machine learning in archaeological research, potentially leading to new discoveries and better preservation methods.

Authors:  Mathias Bellat, Jordy D. Orellana Figueroa, Jonathan S. Reeves, Ruhollah Taghizadeh-Mehrjardi, Claudio Tennie, Thomas Scholten

Link:  https://arxiv.org/abs/2501.03840v1

Date: 2025-01-07

Summary:

Artificial intelligence and machine learning applications in archaeology have increased significantly in recent years, and these now span all subfields, geographical regions, and time periods. The prevalence and success of these applications have remained largely unexamined, as recent reviews of machine learning in archaeology have focused only on specific subfields. Our review examined an exhaustive corpus of 135 articles published between 1997 and 2022. We observed a significant increase in the number of relevant publications from 2019 onwards. Automatic structure detection and artefact classification were the most represented tasks in the articles reviewed, followed by taphonomy and archaeological predictive modelling. Clustering and unsupervised methods were underrepresented compared to supervised models. Artificial neural networks and ensemble learning account for two thirds of the models used. However, although machine learning is gaining in popularity, it remains subject to misunderstanding. We observed, in some cases, poorly defined requirements and caveats of the machine learning methods used. Furthermore, the goals and needs of machine learning applications for archaeological purposes are in some cases unclear or poorly expressed. To address this, we propose a workflow guide for archaeologists to develop coherent and consistent methodologies adapted to their research questions, project scale, and data. As in many other areas, machine learning is rapidly becoming an important tool in archaeological research and practice, useful for the analysis of large and multivariate data, although not without limitations. This review highlights the importance of well-defined and well-reported structured methodologies and collaborative practices to maximise the potential of machine learning applications in archaeology.

--------------------------------------------------------------------------------------------------------

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

Converting static images to realistic videos while maintaining accurate object motion remains challenging. This paper presents a novel two-stage approach using mask-based motion trajectories to capture both semantic and motion information. The method particularly excels in handling multiple objects and complex movements. This technology could enhance video content creation, visual effects, and animation production by enabling more realistic and controlled video generation from still images.

Authors:  Guy Yariv, Yuval Kirstain, Amit Zohar, Shelly Sheynin, Yaniv Taigman, Yossi Adi, Sagie Benaim, Adam Polyak

Link:  https://arxiv.org/abs/2501.03059v1

Date: 2025-01-06

Summary:

We consider the task of Image-to-Video (I2V) generation, which involves transforming static images into realistic video sequences based on a textual description. While recent advancements produce photorealistic outputs, they frequently struggle to create videos with accurate and consistent object motion, especially in multi-object scenarios. To address these limitations, we propose a two-stage compositional framework that decomposes I2V generation into: (i) an explicit intermediate representation generation stage, followed by (ii) a video generation stage conditioned on this representation. Our key innovation is the introduction of a mask-based motion trajectory as an intermediate representation that captures both semantic object information and motion, enabling an expressive yet compact representation of motion and semantics. To incorporate the learned representation in the second stage, we utilize object-level attention objectives. Specifically, we consider a spatial, per-object, masked cross-attention objective, integrating object-specific prompts into corresponding latent space regions, and a masked spatio-temporal self-attention objective, ensuring frame-to-frame consistency for each object. We evaluate our method on challenging benchmarks with multi-object and high-motion scenarios and empirically demonstrate that it achieves state-of-the-art results in temporal coherence, motion realism, and text-prompt faithfulness. Additionally, we introduce a new, challenging benchmark for single-object and multi-object I2V generation and demonstrate our method's superiority on it. The project page is available at https://guyyariv.github.io/TTM/.
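
To convey the masked cross-attention idea, the sketch below lets each object's prompt embedding influence only the latent positions covered by that object's mask. It is a simplified, single-head version with hypothetical tensor shapes, not the model's actual attention layers.

```python
# Hedged sketch of a spatial, per-object masked cross-attention step: each
# object's prompt only attends to the latent positions inside that object's mask.
import torch

def masked_cross_attention(latents, prompt_embeds, object_masks):
    """latents: [N, D] flattened spatial latents; prompt_embeds: [K, D] one embedding
    per object prompt; object_masks: [K, N] binary masks over latent positions."""
    d = latents.shape[-1]
    scores = latents @ prompt_embeds.T / d ** 0.5                      # [N, K] latent-to-prompt scores
    scores = scores.masked_fill(object_masks.T == 0, float("-inf"))    # keep in-mask pairs only
    attn = torch.softmax(scores, dim=-1)                               # each position attends to its object
    attn = torch.nan_to_num(attn)                                      # positions covered by no mask -> 0
    return latents + attn @ prompt_embeds                              # inject object-specific prompt info
```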

--------------------------------------------------------------------------------------------------------

Reach Measurement, Optimization and Frequency Capping In Targeted Online Advertising Under k-Anonymity

Online advertising faces new challenges as privacy concerns reshape the industry. This research examines how to maintain effective frequency capping while ensuring user privacy through k-anonymity. The study provides methods for reach measurement and optimization within privacy constraints, demonstrating the trade-offs between user privacy and advertising effectiveness. This work could help advertising platforms adapt to increasing privacy demands while maintaining campaign effectiveness.

Authors:  Yuan Gao, Mu Qiao

Link:  https://arxiv.org/abs/2501.04882v1

Date: 2025-01-08

Summary:

The growth in the use of online advertising to foster brand awareness over recent years is largely attributable to the ubiquity of social media. One pivotal technology contributing to the success of online brand advertising is frequency capping, a mechanism that enables marketers to control the number of times an ad is shown to a specific user. However, the very foundation of this technology is being scrutinized as the industry gravitates towards advertising solutions that prioritize user privacy. This paper delves into the issue of reach measurement and optimization within the context of $k$-anonymity, a privacy-preserving model gaining traction across major online advertising platforms. We outline how to report reach within this new privacy landscape and demonstrate how probabilistic discounting, a probabilistic adaptation of traditional frequency capping, can be employed to optimize campaign performance. Experiments are performed to assess the trade-off between user privacy and the efficacy of online brand advertising. Notably, we discern a significant dip in performance once privacy is introduced, yet this comes at only a limited additional cost for advertising platforms to offer their users more privacy.

--------------------------------------------------------------------------------------------------------

Cosmos World Foundation Model Platform for Physical AI

NVIDIA's research presents a comprehensive platform for training physical AI systems using digital twins. The Cosmos platform provides tools for video curation, pre-trained world foundation models, and video tokenizers. Released as open-source with permissive licenses, this platform aims to accelerate the development of physical AI applications. This could significantly impact robotics, automation, and other fields requiring accurate physical world modeling.

Authors:  NVIDIA, :, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski

Link:  https://arxiv.org/abs/2501.03575v1

Date: 2025-01-07

Summary:

Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make our platform open-source and our models open-weight with permissive licenses available via https://github.com/NVIDIA/Cosmos.

--------------------------------------------------------------------------------------------------------

How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters

This study investigates optimal methods for adapting the mDeBERTa model to Germanic languages with varying resource levels. The research compares different fine-tuning approaches, including full fine-tuning and parameter-efficient methods, across German, Swedish, and Icelandic. These findings could help improve natural language processing applications for less-resourced languages while maintaining efficiency in model adaptation.

Authors:  Romina Oji, Jenny Kunz

Link:  https://arxiv.org/abs/2501.06025v1

Date: 2025-01-10

Summary:

This paper investigates the optimal use of the multilingual encoder model mDeBERTa for tasks in three Germanic languages -- German, Swedish, and Icelandic -- representing varying levels of presence and likely data quality in mDeBERTa's pre-training data. We compare full fine-tuning with the parameter-efficient fine-tuning (PEFT) methods LoRA and Pfeiffer bottleneck adapters, finding that PEFT is more effective for the higher-resource language, German. However, results for Swedish and Icelandic are less consistent. We also observe differences between tasks: while PEFT tends to work better for question answering, full fine-tuning is preferable for named entity recognition. Inspired by previous research on modular approaches that combine task and language adapters, we evaluate the impact of adding PEFT modules trained on unstructured text, finding that this approach is not beneficial.
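
For readers unfamiliar with Pfeiffer-style bottleneck adapters, the sketch below shows the basic building block: a small down-project/up-project module with a residual connection, inserted into an otherwise frozen encoder so that only a few tens of thousands of parameters are trained per adapter. The dimensions are illustrative defaults, not the settings used in the study.

```python
# Minimal PyTorch sketch of a Pfeiffer-style bottleneck adapter: a small
# down/up-projection with a residual connection, trained while the backbone
# stays frozen (dimensions are illustrative, not the study's configuration).
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# ~75k trainable parameters per adapter, a tiny fraction of the full encoder.
adapter = BottleneckAdapter()
print(sum(p.numel() for p in adapter.parameters()))
```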

--------------------------------------------------------------------------------------------------------

Critical-like phenomenon in scraping of jamming systems

This research examines the physics of foam scraping, revealing complex behavior patterns distinct from simple liquids. The study identifies a critical transition between partial and slender scraping regimes, governed by directional percolation theory. This understanding could improve industrial processes involving jamming systems, from cosmetic product application to food processing, by providing better control over material behavior during spreading and scraping.

Authors:  Masaya Endo, Rei Kurita

Link:  https://arxiv.org/abs/2501.03473v1

Date: 2025-01-07

Summary:

In jamming systems like colloids, emulsions, foams, and biological tissues, significant deformation is essential for processes such as material scraping or wound self-healing. To adequately spread a foam or cream over a surface, an external force must be applied to artificially scrape it. The scraping of foam using a rigid plate has been observed to exhibit complex behavior distinct from that of simple liquids. In this study, we quantitatively analyzed the transition between partial and slender scraping regimes by examining changes in internal structure and partial spreading lengths. Our findings reveal that the sequential propagation of bubble rearrangement in the foam's internal structure leads to partial scraping. Moreover, the scraping length in the partial scraping regime shows divergence near the transition point, characterized by a critical exponent of approximately 0.61. These results imply that foam scraping is governed by directional percolation theory, supported by the agreement between the experimentally observed critical exponent and theoretical predictions. This research significantly advances the understanding of macroscopic kinetics and rheological behavior in jamming systems, including foams, colloids, emulsions, and biological tissues.

--------------------------------------------------------------------------------------------------------

On-line Policy Improvement using Monte-Carlo Search

This paper presents a parallel Monte-Carlo simulation algorithm for real-time policy improvement in adaptive controllers. Initially tested on backgammon, the method shows significant error rate reductions across various initial policies. This approach could enhance decision-making systems in games, robotics, and other domains requiring real-time policy optimization, particularly when parallel computing resources are available.

Authors:  Gerald Tesauro, Gregory R. Galperin

Link:  https://arxiv.org/abs/2501.05407v1

Date: 2025-01-09

Summary:

We present a Monte-Carlo simulation algorithm for real-time policy improvement of an adaptive controller. In the Monte-Carlo simulation, the long-term expected reward of each possible action is statistically measured, using the initial policy to make decisions in each step of the simulation. The action maximizing the measured expected reward is then taken, resulting in an improved policy. Our algorithm is easily parallelizable and has been implemented on the IBM SP1 and SP2 parallel-RISC supercomputers.

We have obtained promising initial results in applying this algorithm to the domain of backgammon. Results are reported for a wide variety of initial policies, ranging from a random policy to TD-Gammon, an extremely strong multi-layer neural network. In each case, the Monte-Carlo algorithm gives a substantial reduction, by as much as a factor of 5 or more, in the error rate of the base players. The algorithm is also potentially useful in many other adaptive control applications in which it is possible to simulate the environment.
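
The policy-improvement step described above is simple enough to sketch directly: estimate each legal action's expected long-term reward by rolling out the base policy in simulation, then act greedily on those estimates. The environment and policy interfaces below are hypothetical stand-ins.

```python
# Sketch of the on-line Monte-Carlo policy-improvement step: evaluate each legal
# action by rollouts under the base policy, then pick the best-scoring action
# (simulate_step and base_policy are hypothetical interfaces, not the paper's code).
def monte_carlo_improved_action(state, legal_actions, simulate_step, base_policy,
                                rollouts_per_action=100, horizon=200):
    """Return the action whose average rollout reward under the base policy is highest."""
    best_action, best_value = None, float("-inf")
    for action in legal_actions:
        total = 0.0
        for _ in range(rollouts_per_action):
            s, reward, done = simulate_step(state, action)
            steps = 0
            while not done and steps < horizon:
                s, r, done = simulate_step(s, base_policy(s))  # base policy drives the rollout
                reward += r
                steps += 1
            total += reward
        value = total / rollouts_per_action
        if value > best_value:
            best_action, best_value = action, value
    return best_action   # acting greedily on these estimates defines the improved policy
```

Because each action's rollouts are independent, the per-action loops can be distributed across processors, which is how the original implementation exploited parallel hardware.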

--------------------------------------------------------------------------------------------------------

Targeted Adversarial Denoising Autoencoders (TADA) for Neural Time Series Filtration

This research introduces TADA, a novel approach for filtering EEG time series data. The system uses a correlation-driven convolutional autoencoder with adversarial training to remove EMG noise efficiently. This advancement could improve brain-computer interfaces and medical diagnostics by providing cleaner, more reliable EEG signals while using fewer computational resources.

Authors:  Benjamin J. Choi, Griffin Milsap, Clara A. Scholl, Francesco Tenore, Mattson Ogg

Link:  https://arxiv.org/abs/2501.04967v2

Date: 2025-01-10

Summary:

Current machine learning (ML)-based algorithms for filtering electroencephalography (EEG) time series data face challenges related to cumbersome training times, regularization, and accurate reconstruction. To address these shortcomings, we present an ML filtration algorithm driven by a logistic covariance-targeted adversarial denoising autoencoder (TADA). We hypothesize that the expressivity of a targeted, correlation-driven convolutional autoencoder will enable effective time series filtration while minimizing compute requirements (e.g., runtime, model size). Furthermore, we expect that adversarial training with covariance rescaling will minimize signal degradation. To test this hypothesis, a TADA system prototype was trained and evaluated on the task of removing electromyographic (EMG) noise from EEG data in the EEGdenoiseNet dataset, which includes EMG and EEG data from 67 subjects. The TADA filter surpasses conventional signal filtration algorithms across quantitative metrics (Correlation Coefficient, Temporal RRMSE, Spectral RRMSE), and performs competitively against other deep learning architectures at a reduced model size of less than 400,000 trainable parameters. Further experimentation will be necessary to assess the viability of TADA on a wider range of deployment cases.
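
As a rough picture of the filtration backbone, here is a compact 1-D convolutional denoising autoencoder that maps a noisy EEG segment to a reconstructed clean one. The layer sizes are illustrative, and the targeted covariance rescaling and adversarial training that define TADA are omitted.

```python
# Compact PyTorch sketch of a 1-D convolutional denoising autoencoder for EEG
# segments, in the spirit of TADA's filtration stage (layer sizes illustrative;
# the adversarial and covariance-rescaling components are omitted here).
import torch.nn as nn

class DenoisingAutoencoder1D(nn.Module):
    def __init__(self, channels: int = 1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(channels, 16, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=9, stride=2, padding=4, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, channels, kernel_size=9, stride=2, padding=4, output_padding=1),
        )

    def forward(self, noisy_eeg):                      # noisy_eeg: [batch, channels, time]
        return self.decoder(self.encoder(noisy_eeg))   # reconstruct the clean EEG segment
```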

--------------------------------------------------------------------------------------------------------

How to Select Pre-Trained Code Models for Reuse? A Learning Perspective

As pre-trained code models proliferate, selecting the right one for specific tasks becomes crucial. This research explores efficient methods for model selection, developing learning-based strategies that significantly reduce selection time while maintaining performance. This work could streamline the development process for code-related tasks by helping developers choose appropriate pre-trained models without extensive testing.

Authors:  Zhangqian Bi, Yao Wan, Zhaoyang Chu, Yufei Hu, Junyi Zhang, Hongyu Zhang, Guandong Xu, Hai Jin

Link:  https://arxiv.org/abs/2501.03783v1

Date: 2025-01-07

Summary:

Pre-training a language model and then fine-tuning it has been shown to be an efficient and effective technique for a wide range of code intelligence tasks, such as code generation, code summarization, and vulnerability detection. However, pre-training language models on a large-scale code corpus is computationally expensive. Fortunately, many off-the-shelf Pre-trained Code Models (PCMs), such as CodeBERT, CodeT5, CodeGen, and Code Llama, have been released publicly. These models acquire general code understanding and generation capability during pre-training, which enhances their performance on downstream code intelligence tasks. With an increasing number of these public pre-trained models, selecting the most suitable one to reuse for a specific task is essential. In this paper, we systematically investigate the reusability of PCMs. We first explore three intuitive model selection methods that select by size, training data, or brute-force fine-tuning. Experimental results show that these straightforward techniques either perform poorly or incur high costs. Motivated by these findings, we explore learning-based model selection strategies that utilize pre-trained models without altering their parameters. Specifically, we train proxy models to gauge the performance of pre-trained models, and we measure the distribution deviation between a model's latent features and the task's labels, using their closeness as an indicator of model transferability. We conduct experiments on 100 widely used open-source PCMs for code intelligence tasks, with sizes ranging from 42.5 million to 3 billion parameters. The results demonstrate that learning-based selection methods reduce selection time to 100 seconds, compared to 2,700 hours with brute-force fine-tuning, with less than 6% performance degradation across related tasks.
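
A minimal sketch of the learning-based selection idea: instead of fine-tuning every candidate, score each frozen pre-trained code model with a cheap proxy, here a cross-validated linear probe on its extracted features, and pick the highest-scoring one. The probe choice and the feature-extraction step are assumptions; the paper's method also measures the deviation between a model's latent features and the task labels.

```python
# Hedged sketch of learning-based model selection: score each frozen PCM with a
# cheap proxy (a linear probe on pre-extracted features) instead of fine-tuning it.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def transferability_score(features, labels) -> float:
    """features: [num_examples, dim] frozen embeddings of the task data from one candidate PCM."""
    probe = LogisticRegression(max_iter=1000)
    return cross_val_score(probe, features, labels, cv=3).mean()

def select_model(candidates: dict, labels) -> str:
    """candidates maps model name -> pre-extracted feature matrix for the task data."""
    return max(candidates, key=lambda name: transferability_score(candidates[name], labels))
```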

--------------------------------------------------------------------------------------------------------

Distilling Calibration via Conformalized Credal Inference

Edge AI devices face challenges in balancing computational constraints with reliability requirements. This research proposes a method to distill calibration information from complex models to simpler ones, enabling reliable uncertainty quantification on edge devices. This approach could improve the deployment of AI systems in resource-constrained environments while maintaining prediction reliability.

Authors:  Jiayi Huang, Sangwoo Park, Nicola Paoletti, Osvaldo Simeone

Link:  https://arxiv.org/abs/2501.06066v1

Date: 2025-01-10

Summary:

Deploying artificial intelligence (AI) models on edge devices involves a delicate balance between meeting stringent complexity constraints, such as limited memory and energy resources, and ensuring reliable performance in sensitive decision-making tasks. One way to enhance reliability is through uncertainty quantification via Bayesian inference. This approach, however, typically necessitates maintaining and running multiple models in an ensemble, which may exceed the computational limits of edge devices. This paper introduces a low-complexity methodology to address this challenge by distilling calibration information from a more complex model. In an offline phase, predictive probabilities generated by a high-complexity cloud-based model are leveraged to determine a threshold based on the typical divergence between the cloud and edge models. At run time, this threshold is used to construct credal sets -- ranges of predictive probabilities that are guaranteed, with a user-selected confidence level, to include the predictions of the cloud model. The credal sets are obtained through thresholding of a divergence measure in the simplex of predictive probabilities. Experiments on visual and language tasks demonstrate that the proposed approach, termed Conformalized Distillation for Credal Inference (CD-CI), significantly improves calibration performance compared to low-complexity Bayesian methods, such as Laplace approximation, making it a practical and efficient solution for edge AI deployments.
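
The offline calibration step can be sketched as a conformal quantile over cloud-versus-edge divergences: collect the divergence between the two models' predictive distributions on held-out data, then take the (1 - alpha) conformal quantile as the run-time threshold. The use of KL divergence here is an assumption for illustration; the paper works with a divergence measure on the probability simplex.

```python
# Hedged sketch of the offline calibration step: pick a divergence threshold so
# that, with confidence 1 - alpha, the cloud model's prediction lies within that
# divergence of the edge model's (split-conformal quantile over calibration data).
import numpy as np

def kl(p, q, eps=1e-12):
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def calibrate_threshold(edge_probs, cloud_probs, alpha=0.1):
    """edge_probs, cloud_probs: [n_calibration, n_classes] predictive probabilities."""
    scores = np.array([kl(c, e) for e, c in zip(edge_probs, cloud_probs)])
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))          # conformal quantile index
    return float(np.sort(scores)[min(k, n) - 1])

# At run time, the credal set for a new input is every distribution q with
# kl(q, edge_prediction) <= threshold; with probability >= 1 - alpha it
# contains the cloud model's prediction.
```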

--------------------------------------------------------------------------------------------------------

RoRA: Efficient Fine-Tuning of LLM with Reliability Optimization for Rank Adaptation

This paper addresses limitations in Low-Rank Adaptation (LoRA) for fine-tuning large language models. The proposed RoRA method optimizes scaling factors to improve performance as rank size increases. This advancement could make fine-tuning more efficient and effective for both standard and pruned language models, potentially reducing computational requirements while maintaining or improving performance.

Authors:  Jun Liu, Zhenglun Kong, Peiyan Dong, Xuan Shen, Pu Zhao, Hao Tang, Geng Yuan, Wei Niu, Wenbin Zhang, Xue Lin, Dong Huang, Yanzhi Wang

Link:  https://arxiv.org/abs/2501.04315v1

Date: 2025-01-08

Summary:

Fine-tuning helps large language models (LLMs) recover degraded information and enhance task performance. Although Low-Rank Adaptation (LoRA) is widely used and effective for fine-tuning, we have observed that its scaling factor can limit or even reduce performance as the rank size increases. To address this issue, we propose RoRA (Rank-adaptive Reliability Optimization), a simple yet effective method for optimizing LoRA's scaling factor. By replacing $\alpha/r$ with $\alpha/\sqrt{r}$, RoRA ensures improved performance as rank size increases. Moreover, RoRA enhances low-rank adaptation in fine-tuning uncompressed models and excels in the more challenging task of accuracy recovery when fine-tuning pruned models. Extensive experiments demonstrate the effectiveness of RoRA in fine-tuning both uncompressed and pruned models. RoRA surpasses the state-of-the-art (SOTA) in average accuracy and robustness on LLaMA-7B/13B, LLaMA2-7B, and LLaMA3-8B, specifically outperforming LoRA and DoRA by 6.5% and 2.9% on LLaMA-7B, respectively. In pruned model fine-tuning, RoRA shows significant advantages; for SHEARED-LLAMA-1.3, a LLaMA-7B with 81.4% pruning, RoRA achieves 5.7% higher average accuracy than LoRA and 3.9% higher than DoRA.
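
The core change is a one-line modification of LoRA's scaling factor, replacing $\alpha/r$ with $\alpha/\sqrt{r}$. The toy LoRA layer below illustrates where that factor enters; it is not the authors' implementation, and the initialization and dimensions are illustrative.

```python
# Toy LoRA layer showing where the scaling factor enters: LoRA uses alpha / r,
# RoRA uses alpha / sqrt(r) so the update does not shrink as the rank r grows
# (illustrative sketch, not the authors' implementation).
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0, rora: bool = True):
        super().__init__()
        self.base = base                               # frozen pre-trained linear layer
        for p in self.base.parameters():
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / math.sqrt(r) if rora else alpha / r

    def forward(self, x):
        # Frozen path plus the scaled low-rank update B @ A applied to x.
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)
```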

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.