Week Ending 11.24.2024

 

RESEARCH WATCH: 11.24.2024

 

OPMOS: Ordered Parallel Multi-Objective Shortest-Path

In the realm of complex path optimization, finding the most efficient route with multiple competing objectives has long been a computational challenge. The Multi-Objective Shortest-Path (MOS) problem becomes increasingly complex as the number of objectives increases, quickly overwhelming traditional algorithms. The OPMOS framework introduces a groundbreaking approach by leveraging parallel computing to tackle this computational bottleneck. By exploiting parallel processing on advanced hardware like the NVIDIA GH200 Superchip, this research offers a promising solution for optimizing routing problems in domains such as logistics, transportation, and network planning, where multiple competing factors must be simultaneously considered.

Authors:  Leo Gold, Adam Bienkowski, David Sidoti, Krishna Pattipati, Omer Khan

Link:  https://arxiv.org/abs/2411.16667v1

Date: 2024-11-25

Summary:

The Multi-Objective Shortest-Path (MOS) problem finds a set of Pareto-optimal solutions from a start node to a destination node in a multi-attribute graph. To solve the NP-hard MOS problem, the literature explores heuristic multi-objective A*-style algorithmic approaches. A generalized MOS algorithm maintains a "frontier" of partial paths at each node and performs ordered processing to ensure that Pareto-optimal paths are generated to reach the goal node. The algorithm becomes computationally intractable as the number of objectives increases due to a rapid increase in the non-dominated paths, and the concomitantly large increase in Pareto-optimal solutions. While prior works have focused on algorithmic methods to reduce the complexity, we tackle this challenge by exploiting parallelism using an algorithm-architecture approach. The key insight is that MOS algorithms rely on the ordered execution of partial paths to maintain high work efficiency. The OPMOS framework, proposed herein, unlocks ordered parallelism and efficiently exploits the concurrent execution of multiple paths in MOS. Experimental evaluation using the NVIDIA GH200 Superchip shows the performance scaling potential of OPMOS on work efficiency and parallelism using a real-world application to ship routing.
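
To make the "frontier" idea concrete, here is a minimal Python sketch of Pareto-dominance filtering of partial-path costs at a single node. It is illustrative only: the paper's contribution is the ordered, parallel execution of this process across many paths, which is not shown here.

```python
from typing import List, Tuple

Cost = Tuple[float, ...]  # one entry per objective (e.g., time, fuel, risk)

def dominates(a: Cost, b: Cost) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def insert_nondominated(frontier: List[Cost], cand: Cost) -> List[Cost]:
    """Keep only Pareto-optimal partial-path costs at a node."""
    if any(dominates(f, cand) for f in frontier):
        return frontier  # candidate is dominated; discard it
    # drop existing entries that the new candidate dominates
    return [f for f in frontier if not dominates(cand, f)] + [cand]

frontier: List[Cost] = []
for cost in [(3.0, 5.0), (4.0, 4.0), (3.5, 6.0), (2.0, 7.0)]:
    frontier = insert_nondominated(frontier, cost)
print(frontier)  # [(3.0, 5.0), (4.0, 4.0), (2.0, 7.0)] -- (3.5, 6.0) was dominated
```

As the number of objectives grows, fewer candidates dominate one another, so frontiers balloon; that is exactly the explosion OPMOS attacks with ordered parallelism.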

--------------------------------------------------------------------------------------------------------

A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models

Autonomous driving technologies face significant challenges when adapting to diverse environmental conditions. As machine learning models struggle with domain shifts caused by varying weather, locations, and data sources, researchers are exploring innovative solutions. This study demonstrates how vision-language models can dramatically improve semantic segmentation performance by replacing traditional encoders. By achieving up to 10% improvement in mean Intersection over Union (mIoU) and showing remarkable generalization across unseen datasets, this research offers a promising pathway to more robust and adaptable computer vision systems, potentially revolutionizing autonomous vehicle perception and other domain-sensitive visual recognition tasks.

Authors:  Manuel Schwonberg, Claus Werner, Hanno Gottschalk, Carsten Meyer

Link:  https://arxiv.org/abs/2411.16407v1

Date: 2024-11-25

Summary:

Despite the recent progress in deep learning based computer vision, domain shifts are still one of the major challenges. Semantic segmentation for autonomous driving faces a wide range of domain shifts, e.g. caused by changing weather conditions, new geolocations and the frequent use of synthetic data in model training. Unsupervised domain adaptation (UDA) methods have emerged which adapt a model to a new target domain by only using unlabeled data of that domain. The variety of UDA methods is large but all of them use ImageNet pre-trained models. Recently, vision-language models have demonstrated strong generalization capabilities which may facilitate domain adaptation. We show that simply replacing the encoder of existing UDA methods like DACS by a vision-language pre-trained encoder can result in significant performance improvements of up to 10.0% mIoU on the GTA5-to-Cityscapes domain shift. For the generalization performance to unseen domains, the newly employed vision-language pre-trained encoder provides a gain of up to 13.7% mIoU across three unseen datasets. However, we find that not all UDA methods can be easily paired with the new encoder and that the UDA performance does not always likewise transfer into generalization performance. Finally, we perform our experiments on an adverse weather condition domain shift to further verify our findings on a pure real-to-real domain shift.
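
The core recipe is architectural: keep the UDA method, swap the encoder. Below is a hedged PyTorch sketch of that swap; the tiny stand-in encoders are placeholders for an ImageNet-pretrained CNN and a vision-language trunk (e.g., a CLIP image encoder), which in practice would be loaded from pretrained checkpoints.

```python
import torch
import torch.nn as nn

class SegHead(nn.Module):
    """Minimal decode head: project encoder features to per-pixel class logits."""
    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Conv2d(in_ch, num_classes, kernel_size=1)

    def forward(self, feats, out_size):
        logits = self.classifier(feats)
        return nn.functional.interpolate(logits, size=out_size,
                                         mode="bilinear", align_corners=False)

class UDASegmenter(nn.Module):
    """Encoder-decoder segmenter; the encoder is a swappable module."""
    def __init__(self, encoder: nn.Module, enc_channels: int, num_classes: int = 19):
        super().__init__()
        self.encoder = encoder
        self.head = SegHead(enc_channels, num_classes)

    def forward(self, x):
        return self.head(self.encoder(x), x.shape[-2:])

# Stand-in for an ImageNet-pretrained CNN encoder.
imagenet_encoder = nn.Sequential(nn.Conv2d(3, 64, 3, stride=4, padding=1), nn.ReLU())
# Stand-in for a vision-language pre-trained encoder (e.g., a CLIP vision trunk);
# in practice this is loaded from a VL checkpoint and returns dense features.
vl_encoder = nn.Sequential(nn.Conv2d(3, 64, 3, stride=4, padding=1), nn.GELU())

model = UDASegmenter(vl_encoder, enc_channels=64)  # the only change vs. the baseline
print(model(torch.randn(1, 3, 128, 256)).shape)    # torch.Size([1, 19, 128, 256])
```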

--------------------------------------------------------------------------------------------------------

Benchmarking Active Learning for NILM

Non-intrusive load monitoring (NILM) is crucial for understanding household energy consumption, but collecting comprehensive appliance-level data is expensive and time-consuming. This research introduces an innovative active learning approach that strategically selects which homes to monitor, dramatically reducing data collection costs. By developing uncertainty-aware neural networks and installing sensors in homes with the highest disaggregation uncertainty, researchers achieved accuracy comparable to full data collection using only about 30% of the data. This breakthrough could democratize energy monitoring, enabling more precise and cost-effective understanding of household electricity usage for utilities, researchers, and consumers.

Authors:  Dhruv Patel, Ankita Kumari Jain, Haikoo Khandor, Xhitij Choudhary, Nipun Batra

Link:  https://arxiv.org/abs/2411.15805v1

Date: 2024-11-24

Summary:

Non-intrusive load monitoring (NILM) focuses on disaggregating total household power consumption into appliance-specific usage. Many advanced NILM methods are based on neural networks that typically require substantial amounts of labeled appliance data, which can be challenging and costly to collect in real-world settings. We hypothesize that appliance data from all households does not uniformly contribute to NILM model improvements. Thus, we propose an active learning approach to selectively install appliance monitors in a limited number of houses. This work is the first to benchmark the use of active learning for strategically selecting appliance-level data to optimize NILM performance. We first develop uncertainty-aware neural networks for NILM and then install sensors in homes where disaggregation uncertainty is highest. Benchmarking our method on the publicly available Pecan Street Dataport dataset, we demonstrate that our approach significantly outperforms a standard random baseline and achieves performance comparable to models trained on the entire dataset. Using this approach, we achieve comparable NILM accuracy with approximately 30% of the data, and for a fixed number of sensors, we observe up to a 2x reduction in disaggregation errors compared to random sampling.
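
The selection loop is easy to sketch. Below is an illustrative (not the authors') uncertainty scoring and home-selection step, assuming the uncertainty comes from multiple stochastic forward passes of the NILM model (e.g., MC-dropout):

```python
import numpy as np

rng = np.random.default_rng(0)

def disaggregation_uncertainty(model_draws: np.ndarray) -> np.ndarray:
    """model_draws: (n_draws, n_homes) predicted appliance power from repeated
    stochastic forward passes. Returns a per-home uncertainty score (std dev)."""
    return model_draws.std(axis=0)

def select_homes(scores: np.ndarray, budget: int) -> np.ndarray:
    """Pick the homes where the current NILM model is least certain."""
    return np.argsort(scores)[::-1][:budget]

# Simulated predictions: 50 stochastic passes over 20 candidate homes.
draws = rng.normal(loc=500, scale=rng.uniform(5, 80, size=20), size=(50, 20))
scores = disaggregation_uncertainty(draws)
print("install appliance monitors in homes:", select_homes(scores, budget=5))
```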

--------------------------------------------------------------------------------------------------------

XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models

As AI agents become increasingly complex, the demand for structured output generation has grown exponentially. XGrammar addresses this challenge by creating an efficient engine for parsing context-free grammars during large language model inference. By dividing vocabularies into context-independent and context-dependent tokens and designing an accelerated grammar execution method, this research achieves up to 100x speedup in structured generation. The potential applications span numerous domains, including code generation, function call parsing, and creating precise, controllable AI agent interactions, promising more reliable and performant AI systems.

Authors:  Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, Tianqi Chen

Link:  https://arxiv.org/abs/2411.15100v1

Date: 2024-11-22

Summary:

The applications of LLM Agents are becoming increasingly complex and diverse, leading to a high demand for structured outputs that can be parsed into code, structured function calls, and embodied agent commands. These developments bring significant demands for structured generation in LLM inference. Context-free grammar is a flexible approach to enable structured generation via constrained decoding. However, executing context-free grammar requires going through several stack states over all tokens in the vocabulary during runtime, bringing non-negligible overhead for structured generation. In this paper, we propose XGrammar, a flexible and efficient structured generation engine for large language models. XGrammar accelerates context-free grammar execution by dividing the vocabulary into context-independent tokens that can be prechecked and context-dependent tokens that need to be interpreted during runtime. We further build transformations to expand the grammar context and reduce the number of context-independent tokens. Additionally, we build an efficient persistent stack to accelerate the context-dependent token checks. Finally, we co-design the grammar engine with the LLM inference engine to overlap grammar computation with GPU execution. Evaluation results show that XGrammar can achieve up to 100x speedup over existing solutions. Combined with an LLM inference engine, it can deliver near-zero-overhead structured generation in end-to-end low-latency LLM serving.
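
The key split is between tokens whose validity never depends on the grammar's runtime stack and tokens that must be checked per step. The toy Python sketch below shows the idea on a brace-matching grammar; XGrammar's actual engine operates on byte-level grammars with cached bitmasks and a persistent stack, which this does not reproduce.

```python
# Toy illustration of the split XGrammar exploits: tokens whose validity under
# the grammar never depends on the matcher's state can be prechecked once into
# a cached mask, while the rest are checked against the runtime state per step.
VOCAB = ["{", "}", '"a"', ":", "1", ",", "<eos>", "\x00bad"]

# Context-independent precheck: e.g., a malformed byte sequence is invalid in
# every state, so it is masked out ahead of time regardless of the stack.
ALWAYS_INVALID = {"\x00bad"}
precomputed_mask = [tok not in ALWAYS_INVALID for tok in VOCAB]

def runtime_mask(stack: list) -> list:
    """Context-dependent check for a toy brace-matching grammar:
    '}' is only legal while a '{' is open; '<eos>' only when the stack is empty."""
    ok = []
    for tok, pre in zip(VOCAB, precomputed_mask):
        if not pre:
            ok.append(False)          # already ruled out by the precheck
        elif tok == "}":
            ok.append(len(stack) > 0)
        elif tok == "<eos>":
            ok.append(len(stack) == 0)
        else:
            ok.append(True)
    return ok

print(runtime_mask(stack=["{"]))  # '}' allowed, '<eos>' masked
print(runtime_mask(stack=[]))     # '<eos>' allowed, '}' masked
```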

--------------------------------------------------------------------------------------------------------

Grid and Road Expressions Are Complementary for Trajectory Representation Learning

Tracking and understanding movement patterns is critical in urban planning, transportation, and location-based services. This research introduces GREEN, a novel method that combines grid and road trajectory representations to create more comprehensive movement understanding. By jointly utilizing spatial information from both perspectives and employing innovative encoding techniques, the researchers achieved remarkable improvements in trajectory-related tasks. With an average 15.99% accuracy boost across multiple downstream applications, this approach could revolutionize how we analyze and predict movement patterns in smart cities, navigation systems, and urban mobility research.

Authors:  Silin Zhou, Shuo Shang, Lisi Chen, Peng Han, Christian S. Jensen

Link:  https://arxiv.org/abs/2411.14768v1

Date: 2024-11-22

Summary:

Trajectory representation learning (TRL) maps trajectories to vectors that can be used for many downstream tasks. Existing TRL methods use either grid trajectories, capturing movement in free space, or road trajectories, capturing movement in a road network, as input. We observe that the two types of trajectories are complementary, providing either region and location information or road structure and movement regularity. Therefore, we propose a novel multimodal TRL method, dubbed GREEN, to jointly utilize Grid and Road trajectory Expressions for Effective representatioN learning. In particular, we transform raw GPS trajectories into both grid and road trajectories and tailor two encoders to capture their respective information. To align the two encoders such that they complement each other, we adopt a contrastive loss to encourage them to produce similar embeddings for the same raw trajectory and design a masked language model (MLM) loss to use grid trajectories to help reconstruct masked road trajectories. To learn the final trajectory representation, a dual-modal interactor is used to fuse the outputs of the two encoders via cross-attention. We compare GREEN with 7 state-of-the-art TRL methods for 3 downstream tasks, finding that GREEN consistently outperforms all baselines and improves the accuracy of the best-performing baseline by an average of 15.99%.
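
One plausible form of the contrastive alignment objective is a symmetric InfoNCE over a batch, where the grid and road embeddings of the same raw trajectory are positives and all other pairings are negatives. A hedged PyTorch sketch (GREEN's exact loss and temperature may differ):

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(grid_emb, road_emb, temperature=0.07):
    """Symmetric InfoNCE: embeddings of the same raw trajectory (matching rows)
    should be close; other trajectories in the batch act as negatives."""
    g = F.normalize(grid_emb, dim=-1)
    r = F.normalize(road_emb, dim=-1)
    logits = g @ r.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(g.size(0))      # row i matches column i
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

grid = torch.randn(8, 128)  # grid-encoder outputs for 8 trajectories
road = torch.randn(8, 128)  # road-encoder outputs for the same 8 trajectories
print(contrastive_alignment_loss(grid, road).item())
```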

--------------------------------------------------------------------------------------------------------

Predictive Analytics of Air Alerts in the Russian-Ukrainian War

In conflict zones, predicting air alert patterns can be crucial for civilian safety and strategic planning. This study leverages data analytics to understand the complex dynamics of air raid warnings during the Russian-Ukrainian conflict. By analyzing geospatial correlations, seasonality, and temporal patterns, researchers developed predictive models that can anticipate alert likelihood in specific regions. The methodology demonstrates how machine learning can provide insights into complex, dynamic environments, potentially offering valuable tools for humanitarian organizations, military strategists, and policymakers dealing with ongoing conflicts and emergency response scenarios.

Authors:  Demian Pavlyshenko, Bohdan Pavlyshenko

Link:  https://arxiv.org/abs/2411.14625v1

Date: 2024-11-21

Summary:

The paper considers exploratory data analysis and approaches in predictive analytics for air alerts during the Russian-Ukrainian war, which broke out on Feb 24, 2022. The results illustrate that alerts in regions correlate with one another and have geospatial patterns, which makes it feasible to build a model that predicts alerts expected to take place in a certain region within a specified time period. The obtained results show that the alert status in a particular region is highly dependent on the features of its adjacent regions. Seasonality features like hour, day of the week, and month are also crucial in predicting the target variable. Some regions rely heavily on the time feature, which equals the number of days since the initial date of the dataset. From this, we can deduce that the air alert pattern changes over time.
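
A sketch of the feature construction the summary describes, using pandas on a hypothetical two-region alert log: seasonality features (hour, day of week, month), days elapsed since the start of the dataset, and the lagged alert status of adjacent regions. The region names and adjacency here are illustrative stand-ins, not the paper's data.

```python
import pandas as pd

# Hypothetical alert log: one row per region per hour, 1 = alert active.
df = pd.DataFrame({
    "region": ["Kyiv", "Kyiv", "Zhytomyr", "Zhytomyr"],
    "timestamp": pd.to_datetime(["2023-01-01 10:00", "2023-01-01 11:00",
                                 "2023-01-01 10:00", "2023-01-01 11:00"]),
    "alert": [0, 1, 1, 1],
})
neighbors = {"Kyiv": ["Zhytomyr"], "Zhytomyr": ["Kyiv"]}

# Seasonality features named in the summary: hour, day of week, month,
# plus the number of days elapsed since the start of the dataset.
df["hour"] = df["timestamp"].dt.hour
df["dow"] = df["timestamp"].dt.dayofweek
df["month"] = df["timestamp"].dt.month
df["days_elapsed"] = (df["timestamp"] - df["timestamp"].min()).dt.days

# Geospatial feature: alert status of adjacent regions in the previous hour.
wide = df.pivot(index="timestamp", columns="region", values="alert")
lagged = wide.shift(1)
df["neighbor_alert_lag1"] = df.apply(
    lambda r: lagged.loc[r["timestamp"], neighbors[r["region"]]].max(), axis=1)
print(df[["region", "timestamp", "hour", "neighbor_alert_lag1"]])
```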

--------------------------------------------------------------------------------------------------------

A No Free Lunch Theorem for Human-AI Collaboration

As AI systems become more sophisticated, understanding the nuanced dynamics of human-AI collaboration becomes increasingly important. This research reveals fundamental limitations in collaborative decision-making, showing that achieving complementary performance is not straightforward. By demonstrating that no universal collaboration strategy can consistently outperform individual agents, the study provides critical insights into designing effective human-AI interactions. The findings have profound implications for fields like healthcare, scientific research, and complex decision-making environments where human expertise and AI capabilities must be carefully balanced.

Authors:  Kenny Peng, Nikhil Garg, Jon Kleinberg

Link:  https://arxiv.org/abs/2411.15230v1

Date: 2024-11-21

Summary:

The gold standard in human-AI collaboration is complementarity -- when combined performance exceeds both the human and algorithm alone. We investigate this challenge in binary classification settings where the goal is to maximize 0-1 accuracy. Given two or more agents who can make calibrated probabilistic predictions, we show a "No Free Lunch"-style result. Any deterministic collaboration strategy (a function mapping calibrated probabilities into binary classifications) that does not essentially always defer to the same agent will sometimes perform worse than the least accurate agent. In other words, complementarity cannot be achieved "for free." The result does suggest one model of collaboration with guarantees, where one agent identifies "obvious" errors of the other agent. We also use the result to understand the necessary conditions enabling the success of other collaboration techniques, providing guidance to human-AI collaboration.
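
The objects in the theorem are easy to simulate. The toy below builds one joint distribution under which both agents are exactly calibrated, then evaluates several deterministic collaboration strategies on it; it illustrates the setting, not the paper's adversarial construction (which exhibits, for any non-deferring strategy, some distribution where it falls below the least accurate agent).

```python
# Joint distribution over (p_A, p_B, P(Y=1 | p_A, p_B)), four equally likely cases.
# Both agents are calibrated: E[Y | p_A] = p_A and E[Y | p_B] = p_B hold exactly.
CASES = [  # (p_A, p_B, true P(Y=1))
    (0.9, 0.4, 0.8),
    (0.9, 0.6, 1.0),
    (0.1, 0.4, 0.0),
    (0.1, 0.6, 0.2),
]

def accuracy(strategy) -> float:
    """Exact 0-1 accuracy of a deterministic strategy (p_A, p_B) -> {0, 1}."""
    total = 0.0
    for p_a, p_b, q in CASES:
        pred = strategy(p_a, p_b)
        total += q if pred == 1 else (1 - q)
    return total / len(CASES)

strategies = {
    "defer to A": lambda a, b: int(a >= 0.5),
    "defer to B": lambda a, b: int(b >= 0.5),
    "average":    lambda a, b: int((a + b) / 2 >= 0.5),
    "AND":        lambda a, b: int(a >= 0.5 and b >= 0.5),
    "OR":         lambda a, b: int(a >= 0.5 or b >= 0.5),
}
for name, s in strategies.items():
    print(f"{name:>10}: {accuracy(s):.3f}")
```

On this particular distribution the genuine combination rules (AND, OR) already land below the better agent; the theorem says something stronger holds in the worst case.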

--------------------------------------------------------------------------------------------------------

Open Challenges in the Formal Verification of Autonomous Driving

Autonomous driving systems represent complex, interconnected networks of hardware and software components from multiple manufacturers. This research highlights the critical challenges in ensuring the reliability and safety of such intricate systems. By exploring formal verification techniques, the study addresses the need to certify "black-box" components without full transparency. The work is crucial for developing robust autonomous vehicles, offering a framework for systematically assessing system reliability and safety in an increasingly complex technological landscape.

Authors:  Paolo Burgio, Angelo Ferrando, Marco Villani

Link:  https://arxiv.org/abs/2411.14520v1

Date: 2024-11-21

Summary:

In the realm of autonomous driving, the development and integration of highly complex and heterogeneous systems are standard practice. Modern vehicles are not monolithic systems; instead, they are composed of diverse hardware components, each running its own software systems. An autonomous vehicle comprises numerous independent components, often developed by different and potentially competing companies. This diversity poses significant challenges for the certification process, as it necessitates certifying components that may not disclose their internal behaviour (black-boxes). In this paper, we present a real-world case study of an autonomous driving system, identify key open challenges associated with its development and integration, and explore how formal verification techniques can address these challenges to ensure system reliability and safety.

--------------------------------------------------------------------------------------------------------

Trajectory Representation Learning on Road Networks and Grids with Spatio-Temporal Dynamics

Understanding movement patterns is fundamental to urban planning, transportation, and location-based services. This research introduces TIGR, a novel approach that integrates grid and road network representations to create more comprehensive trajectory understanding. By incorporating spatio-temporal dynamics, the method significantly improves performance in tasks like trajectory similarity computation, travel time estimation, and destination prediction. With improvements up to 43.22% in some metrics, this approach could revolutionize how we analyze and predict movement in smart cities and urban mobility research.

Authors:  Stefan Schestakov, Simon Gottschalk

Link:  https://arxiv.org/abs/2411.14014v1

Date: 2024-11-21

Summary:

Trajectory representation learning is a fundamental task for applications in fields including smart cities and urban planning, as it facilitates the utilization of trajectory data (e.g., vehicle movements) for various downstream applications, such as trajectory similarity computation or travel time estimation. This is achieved by learning low-dimensional representations from high-dimensional and raw trajectory data. However, existing methods for trajectory representation learning rely on either grid-based or road-based representations, which are inherently different and thus could lose information contained in the other modality. Moreover, these methods overlook the dynamic nature of urban traffic, relying on static road network features rather than time-varying traffic patterns. In this paper, we propose TIGR, a novel model designed to integrate grid and road network modalities while incorporating spatio-temporal dynamics to learn rich, general-purpose representations of trajectories. We evaluate TIGR on two real-world datasets and demonstrate the effectiveness of combining both modalities by substantially outperforming state-of-the-art methods, i.e., up to 43.22% for trajectory similarity, up to 16.65% for travel time estimation, and up to 10.16% for destination prediction.
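
A hedged sketch of the ingredients: separate encoders for grid and road token sequences, a sinusoidal time-of-day signal to inject traffic dynamics, and cross-attention to fuse the modalities. The layer choices here are illustrative, not TIGR's actual architecture.

```python
import torch
import torch.nn as nn

class STFusion(nn.Module):
    """Fuse grid-cell and road-segment token embeddings with a time-of-day signal.
    A generic sketch of combining two modalities with temporal dynamics."""
    def __init__(self, d: int = 64):
        super().__init__()
        self.grid_enc = nn.GRU(d, d, batch_first=True)
        self.road_enc = nn.GRU(d, d, batch_first=True)
        self.time_proj = nn.Linear(2, d)  # sin/cos encoding of time of day
        self.fuse = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, grid_tokens, road_tokens, t_frac):
        # t_frac: (B, T, 1) fraction of the day; injects traffic periodicity.
        time_emb = self.time_proj(torch.cat(
            [torch.sin(2 * torch.pi * t_frac), torch.cos(2 * torch.pi * t_frac)], -1))
        g, _ = self.grid_enc(grid_tokens + time_emb)
        r, _ = self.road_enc(road_tokens + time_emb)
        fused, _ = self.fuse(query=g, key=r, value=r)  # cross-modal attention
        return fused.mean(dim=1)                       # one vector per trajectory

model = STFusion()
out = model(torch.randn(2, 10, 64), torch.randn(2, 10, 64), torch.rand(2, 10, 1))
print(out.shape)  # torch.Size([2, 64])
```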

--------------------------------------------------------------------------------------------------------

Branches, Assemble! Multi-Branch Cooperation Network for Large-Scale Click-Through Rate Prediction at Taobao

In the competitive world of e-commerce, accurately predicting user engagement is crucial for platform success. This research introduces MBCnet, an innovative machine learning approach for click-through rate (CTR) prediction that uses a multi-branch cooperation network. By enabling different network branches to collaborate and learn from each other, the method achieves significant improvements in predicting user interactions. The real-world implementation at Taobao demonstrated tangible business impacts, including increased deals and gross merchandise value, showcasing how advanced machine learning techniques can drive commercial success.

Authors:  Xu Chen, Zida Cheng, Yuangang Pan, Shuai Xiao, Xiaoming Liu, Jinsong Lan, Qingwen Liu, Ivor W. Tsang

Link:  https://arxiv.org/abs/2411.13057v1

Date: 2024-11-20

Summary:

Existing click-through rate (CTR) prediction works have studied the role of feature interaction through a variety of techniques. Each interaction technique exhibits its own strength, and solely using one type could constrain the model's capability to capture the complex feature relationships, especially for industrial large-scale data with enormous users and items. Recent research shows that effective CTR models often combine an MLP network with a dedicated feature interaction network in a two-parallel structure. However, the interplay and cooperative dynamics between different streams or branches remain under-researched. In this work, we introduce a novel Multi-Branch Cooperation Network (MBCnet) which enables multiple branch networks to collaborate with each other for better complex feature interaction modeling. Specifically, MBCnet consists of three branches: the Expert-based Feature Grouping and Crossing (EFGC) branch that promotes the model's memorization ability of specific feature fields, and the low-rank Cross Net branch and the Deep branch to enhance both explicit and implicit feature crossing for improved generalization. Among branches, a novel cooperation scheme is proposed based on two principles: branch co-teaching and moderate differentiation. Branch co-teaching encourages well-learned branches to support poorly-learned ones on specific training samples. Moderate differentiation advocates branches to maintain a reasonable level of difference in their feature representations. The cooperation strategy improves learning through mutual knowledge sharing via co-teaching and boosts the discovery of diverse feature interactions across branches. Extensive experiments on large-scale industrial datasets and an online A/B test demonstrate MBCnet's superior performance, delivering a 0.09 point increase in CTR, 1.49% growth in deals, and 1.62% rise in GMV. Core codes will be released soon.
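
Branch co-teaching can be sketched as a per-sample rule: whichever branch currently fits a sample better acts as teacher for the other. The PyTorch snippet below is one plausible rendering of that idea, not MBCnet's exact formulation.

```python
import torch
import torch.nn.functional as F

def co_teaching_loss(logits_a, logits_b, labels):
    """Branch co-teaching sketch: on each sample, the branch with the lower
    supervised loss acts as teacher, pulling the weaker branch's prediction
    toward its (detached) output."""
    ce_a = F.binary_cross_entropy_with_logits(logits_a, labels, reduction="none")
    ce_b = F.binary_cross_entropy_with_logits(logits_b, labels, reduction="none")
    a_teaches = (ce_a < ce_b).float()  # 1 where branch A is better learned
    p_a, p_b = torch.sigmoid(logits_a), torch.sigmoid(logits_b)
    teach = a_teaches * F.mse_loss(p_b, p_a.detach(), reduction="none") \
          + (1 - a_teaches) * F.mse_loss(p_a, p_b.detach(), reduction="none")
    return (ce_a + ce_b + teach).mean()

logits_a, logits_b = torch.randn(16), torch.randn(16)  # two branches' outputs
labels = torch.randint(0, 2, (16,)).float()            # click / no-click
print(co_teaching_loss(logits_a, logits_b, labels).item())
```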

--------------------------------------------------------------------------------------------------------

SuPLE: Robot Learning with Lyapunov Rewards

Designing effective reward functions for robotic learning has been a persistent challenge in artificial intelligence. This research introduces SuPLE, a novel approach that uses Lyapunov exponents to generate system-appropriate rewards without external assumptions. By eliminating the need for auxiliary exploration and enabling more natural training scenarios, the method shows promise in improving robot learning across various dynamical systems. The approach could revolutionize robotics by providing more intuitive and adaptable learning mechanisms for complex physical tasks.

Authors:  Phu Nguyen, Daniel Polani, Stas Tiomkin

Link:  https://arxiv.org/abs/2411.13613v1

Date: 2024-11-20

Summary:

The reward function is an essential component in robot learning. Reward directly affects the sample and computational complexity of learning, and the quality of a solution. The design of informative rewards requires domain knowledge, which is not always available. We use the properties of the dynamics to produce a system-appropriate reward without adding external assumptions. Specifically, we explore an approach that utilizes the Lyapunov exponents of the system dynamics to generate a system-immanent reward. We demonstrate that the 'Sum of the Positive Lyapunov Exponents' (SuPLE) is a strong candidate for the design of such a reward. We develop a computational framework for the derivation of this reward, and demonstrate its effectiveness on classical benchmarks for sample-based stabilization of various dynamical systems. It eliminates the need to start training trajectories at arbitrary states, also known as auxiliary exploration. While the latter is common practice in simulated robot learning, it is impractical in real robotic systems, since they typically start from natural rest states, such as a pendulum at the bottom or a robot on the ground, and cannot easily be initialized at arbitrary states. Comparing the performance of SuPLE to commonly used reward functions, we observe that the latter fail to find a solution without auxiliary exploration, even for the task of swinging up the double pendulum and keeping it stable in the upright position, a prototypical scenario for multi-linked robots. SuPLE-induced rewards offer a novel route to effective robot learning in typical, as opposed to highly specialized or fine-tuned, scenarios. Our code is publicly available for reproducibility and further research.
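
The reward itself is compact. For linearized dynamics xdot = A x, the Lyapunov exponents are the real parts of the eigenvalues of the Jacobian A, and SuPLE sums the positive ones. A minimal NumPy sketch on a linearized pendulum (the paper's framework estimates exponents for general dynamics, which this does not cover):

```python
import numpy as np

def suple_reward(A: np.ndarray) -> float:
    """Sum of the Positive Lyapunov Exponents for linearized dynamics xdot = A x:
    the exponents are the real parts of the eigenvalues of the Jacobian A."""
    real_parts = np.linalg.eigvals(A).real
    return float(real_parts[real_parts > 0].sum())

# Pendulum state (theta, omega), theta measured from the upright position:
# thetaddot = (g/l) sin(theta)  =>  Jacobian [[0, 1], [(g/l) cos(theta), 0]].
g_over_l = 9.8
for name, theta in [("bottom (rest state)", np.pi), ("upright (unstable)", 0.0)]:
    A = np.array([[0.0, 1.0], [g_over_l * np.cos(theta), 0.0]])
    print(f"{name}: SuPLE = {suple_reward(A):.2f}")
# bottom: 0.00 (no unstable directions); upright: 3.13 (maximal instability),
# so maximizing this reward drives the swing-up without hand-crafted shaping.
```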

--------------------------------------------------------------------------------------------------------

The Illusion of Empathy: How AI Chatbots Shape Conversation Perception

As AI chatbots become increasingly sophisticated, understanding user perception of conversational quality becomes crucial. This study examines how chatbot identity and perceived empathy influence user experience. By analyzing conversations and employing multiple empathy detection models, researchers discovered significant nuances in how users interpret AI-generated empathy. The findings underscore the complexity of human-AI interactions and highlight the need for more nuanced approaches to creating engaging, empathetic conversational experiences that go beyond simple language embedding.

Authors:  Tingting Liu, Salvatore Giorgi, Ankit Aich, Allison Lahnala, Brenda Curtis, Lyle Ungar, João Sedoc

Link:  https://arxiv.org/abs/2411.12877v1

Date: 2024-11-19

Summary:

As AI chatbots become more human-like by incorporating empathy, understanding user-centered perceptions of chatbot empathy and its impact on conversation quality remains essential yet under-explored. This study examines how chatbot identity and perceived empathy influence users' overall conversation experience. Analyzing 155 conversations from two datasets, we found that while GPT-based chatbots were rated significantly higher in conversational quality, they were consistently perceived as less empathetic than human conversational partners. Empathy ratings from GPT-4o annotations aligned with users' ratings, reinforcing the perception of lower empathy in chatbots. In contrast, 3 out of 5 empathy models trained on human-human conversations detected no significant differences in empathy language between chatbots and humans. Our findings underscore the critical role of perceived empathy in shaping conversation quality, revealing that achieving high-quality human-AI interactions requires more than simply embedding empathetic language; it necessitates addressing the nuanced ways users interpret and experience empathy in conversations with chatbots.

--------------------------------------------------------------------------------------------------------

ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models

Prompt engineering is a critical yet challenging aspect of leveraging large language models effectively. This research introduces ACING, an automated prompt optimization approach using reinforcement learning techniques. By treating prompt optimization as a continuous-action problem, the method consistently improves task performance, achieving median score improvements of 10 percentage points and even surpassing human-crafted instructions in some cases. The approach promises to make AI systems more adaptable and efficient across various tasks.

Authors:  Salma Kharrat, Fares Fourati, Marco Canini

Link:  https://arxiv.org/abs/2411.12736v1

Date: 2024-11-19

Summary:

The effectiveness of Large Language Models (LLMs) in solving tasks vastly depends on the quality of the instructions, which often require fine-tuning through extensive human effort. This highlights the need for automated instruction optimization; however, this optimization is particularly challenging when dealing with black-box LLMs, where model parameters and gradients remain inaccessible. We propose ACING, a task-specific prompt optimization approach framed as a stateless continuous-action Reinforcement Learning (RL) problem, known as the continuum bandit setting. ACING leverages an actor-critic-based method to optimize prompts, learning from non-differentiable reward signals. We validate ACING by optimizing prompts for ChatGPT on 30 instruction-based tasks. ACING consistently outperforms baseline methods, achieving a median score improvement of 10 percentage points. Furthermore, ACING not only recovers but also surpasses human-crafted expert instructions, achieving up to a 39 percentage point improvement against human benchmarks.
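
The flavor of the method can be shown with a toy stateless actor-critic: a Gaussian policy proposes a continuous "instruction" vector, a black-box score returns a reward with no usable gradient, and a learned baseline reduces the variance of the score-function update. This is an illustration of the bandit setting, not ACING's exact algorithm.

```python
import torch

torch.manual_seed(0)
dim = 8                                  # dimension of a toy "instruction" vector
target = torch.randn(dim)                # hidden optimum of the black-box score

def black_box_score(action: torch.Tensor) -> torch.Tensor:
    """Stand-in for the non-differentiable task reward (e.g., accuracy of an
    LLM's answers under a decoded instruction). No gradient flows through it."""
    with torch.no_grad():
        return -((action - target) ** 2).sum()

mu = torch.zeros(dim, requires_grad=True)       # actor: mean of a Gaussian policy
log_std = torch.zeros(dim, requires_grad=True)
baseline = torch.zeros((), requires_grad=True)  # critic: state-free value estimate
opt = torch.optim.Adam([mu, log_std, baseline], lr=0.05)

for step in range(300):
    dist = torch.distributions.Normal(mu, log_std.exp())
    action = dist.sample()                    # propose an instruction embedding
    reward = black_box_score(action)
    advantage = (reward - baseline).detach()  # critic reduces gradient variance
    actor_loss = -dist.log_prob(action).sum() * advantage  # score-function update
    critic_loss = (baseline - reward) ** 2
    opt.zero_grad()
    (actor_loss + critic_loss).backward()
    opt.step()

print("final score:", black_box_score(mu.detach()).item())  # improves toward 0
```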

--------------------------------------------------------------------------------------------------------

The Hermeneutic Turn of AI: Is the Machine Capable of Interpreting?

The rapid advancement of artificial intelligence, particularly deep learning, is fundamentally reshaping our understanding of computational systems and human-machine interaction. This philosophical exploration delves into the profound transformations brought about by artificial neural networks, challenging traditional concepts of machine interpretation. By drawing parallels with the hermeneutic philosophical tradition, the research seeks to deconstruct romanticized notions of human-like artificial intelligence. The work critically examines how deep learning technologies are not merely technical innovations, but represent a paradigm shift in how we conceptualize machine understanding, interpretation, and interaction. It promises to provide nuanced insights into the epistemological boundaries between human cognition and computational processes.

Authors:  Remy Demichelis

Link:  https://arxiv.org/abs/2411.12517v1

Date: 2024-11-19

Summary:

This article aims to demonstrate how the approach to computing is being disrupted by deep learning (artificial neural networks), not only in terms of techniques but also in our interactions with machines. It also addresses the philosophical tradition of hermeneutics (Don Ihde, Wilhelm Dilthey) to highlight a parallel with this movement and to demystify the idea of human-like AI.

--------------------------------------------------------------------------------------------------------

GRL-Prompt: Towards Knowledge Graph based Prompt Optimization via Reinforcement Learning

Optimizing prompts for large language models has been a labor-intensive, trial-and-error process. This research introduces GRL-Prompt, a framework that automatically constructs optimal prompts using reinforcement learning and knowledge graphs. By encoding correlations between queries and context examples, the method generates more effective prompts. The approach demonstrates significant improvements in natural language processing metrics, offering a promising path towards more intelligent and adaptable language model interactions.

Authors:  Yuze Liu, Tingjie Liu, Tiehua Zhang, Youhua Xia, Jinze Wang, Zhishu Shen, Jiong Jin, Fei Richard Yu

Link:  https://arxiv.org/abs/2411.14479v1

Date: 2024-11-19

Summary:

Large language models (LLMs) have demonstrated impressive success in a wide range of natural language processing (NLP) tasks due to their extensive general knowledge of the world. Recent works discovered that the performance of LLMs is heavily dependent on the input prompt. However, prompt engineering is usually done manually in a trial-and-error fashion, which can be labor-intensive and challenging in order to find the optimal prompts. To address these problems and unleash the utmost potential of LLMs, we propose a novel LLMs-agnostic framework for prompt optimization, namely GRL-Prompt, which aims to automatically construct optimal prompts via reinforcement learning (RL) in an end-to-end manner. To provide structured action/state representation for optimizing prompts, we construct a knowledge graph (KG) that better encodes the correlation between the user query and candidate in-context examples. Furthermore, a policy network is formulated to generate the optimal action by selecting a set of in-context examples in a rewardable order to construct the prompt. Additionally, the embedding-based reward shaping is utilized to stabilize the RL training process. The experimental results show that GRL-Prompt outperforms recent state-of-the-art methods, achieving an average increase of 0.10 in ROUGE-1, 0.07 in ROUGE-2, 0.07 in ROUGE-L, and 0.05 in BLEU.
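
The action space is the ordered choice of in-context examples. The toy policy head below scores (query, candidate) pairs and samples examples one at a time, so the order of selection is itself part of the action; GRL-Prompt additionally grounds these representations in a knowledge graph and shapes rewards with embeddings, both omitted here.

```python
import torch
import torch.nn as nn

class ExampleSelector(nn.Module):
    """Toy policy head: given a query embedding and candidate in-context examples,
    score (query, candidate) pairs and pick examples one at a time to build the
    prompt. A stand-in for GRL-Prompt's KG-based policy network."""
    def __init__(self, d: int = 32):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, query: torch.Tensor, candidates: torch.Tensor, k: int):
        chosen = []
        remaining = list(range(candidates.size(0)))
        for _ in range(k):  # ordered selection, one action per step
            pairs = torch.stack(
                [torch.cat([query, candidates[i]]) for i in remaining])
            probs = torch.softmax(self.score(pairs).squeeze(-1), dim=0)
            pick = remaining[int(torch.multinomial(probs, 1))]  # sample an action
            chosen.append(pick)
            remaining.remove(pick)
        return chosen  # candidate indices, in prompt order

selector = ExampleSelector()
order = selector(torch.randn(32), torch.randn(10, 32), k=3)
print("in-context examples, in order:", order)
```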

--------------------------------------------------------------------------------------------------------

SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks

Explainability remains a significant challenge in deep reinforcement learning, particularly for complex continuous control tasks. SkillTree introduces a novel framework that transforms continuous action spaces into discrete skill spaces, integrating a differentiable decision tree to generate interpretable skill embeddings. By achieving performance comparable to neural networks while providing skill-level explanations, the approach offers a promising direction for creating more transparent and understandable robotic control systems.

Authors:  Yongyan Wen, Siyuan Li, Rongchang Zuo, Lei Yuan, Hangyu Mao, Peng Liu

Link:  https://arxiv.org/abs/2411.12173v1

Date: 2024-11-19

Summary:

Deep reinforcement learning (DRL) has achieved remarkable success in various research domains. However, its reliance on neural networks results in a lack of transparency, which limits its practical applications. To achieve explainability, decision trees have emerged as a popular and promising alternative to neural networks. Nonetheless, due to their limited expressiveness, traditional decision trees struggle with high-dimensional long-horizon continuous control tasks. In this paper, we propose SkillTree, a novel framework that reduces complex continuous action spaces into discrete skill spaces. Our hierarchical approach integrates a differentiable decision tree within the high-level policy to generate skill embeddings, which subsequently guide the low-level policy in executing skills. By making skill decisions explainable, we achieve skill-level explainability, enhancing the understanding of the decision-making process in complex tasks. Experimental results demonstrate that our method achieves performance comparable to skill-based neural networks in complex robotic arm control domains. Furthermore, SkillTree offers explanations at the skill level, thereby increasing the transparency of the decision-making process.
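
A differentiable decision tree routes each state through sigmoid-gated inner nodes, so the output is a leaf-probability-weighted mix of learnable skill embeddings: the whole structure trains by gradient descent yet remains readable as a tree. An illustrative depth-2 sketch (SkillTree's actual tree and training setup differ):

```python
import torch
import torch.nn as nn

class SoftDecisionTree(nn.Module):
    """Depth-2 differentiable decision tree: each inner node routes the state with
    a sigmoid gate; the output is a leaf-weighted mix of skill embeddings."""
    def __init__(self, state_dim: int, skill_dim: int, depth: int = 2):
        super().__init__()
        n_inner, n_leaf = 2 ** depth - 1, 2 ** depth
        self.gates = nn.Linear(state_dim, n_inner)  # one routing test per node
        self.leaves = nn.Parameter(torch.randn(n_leaf, skill_dim))
        self.depth = depth

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        p = torch.sigmoid(self.gates(state))  # P(go right) at each inner node
        # Path probability of each leaf = product of gate decisions along its path.
        leaf_probs = []
        for leaf in range(self.leaves.size(0)):
            prob, node = torch.ones(state.size(0)), 0
            for d in range(self.depth):
                right = (leaf >> (self.depth - 1 - d)) & 1
                prob = prob * (p[:, node] if right else 1 - p[:, node])
                node = 2 * node + 1 + right  # descend the binary tree
            leaf_probs.append(prob)
        w = torch.stack(leaf_probs, dim=1)   # (B, n_leaf), rows sum to 1
        return w @ self.leaves               # soft skill embedding

tree = SoftDecisionTree(state_dim=17, skill_dim=8)
print(tree(torch.randn(4, 17)).shape)  # torch.Size([4, 8]) -> guides low-level policy
```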

--------------------------------------------------------------------------------------------------------

Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning

Robotic learning has long been hindered by the need for extensive training data and the challenge of interpreting complex action trajectories. This research tackles a fundamental problem in robotics: how to efficiently train reinforcement learning agents with limited and noisy data. By developing an innovative approach that learns Q-values across action sequences, the researchers offer a breakthrough in understanding action consequences. The method shows particular promise in challenging robotic control scenarios like bi-manual manipulation and whole-body movement. By enabling more nuanced learning from imperfect trajectories, this approach could revolutionize robot training, making sophisticated robotic systems more adaptable and easier to develop across various complex physical tasks.

Authors:  Younggyo Seo, Pieter Abbeel

Link:  https://arxiv.org/abs/2411.12155v1

Date: 2024-11-19

Summary:

Training reinforcement learning (RL) agents on robotic tasks typically requires a large number of training samples. This is because training data often consists of noisy trajectories, whether from exploration or human-collected demonstrations, making it difficult to learn value functions that understand the effect of taking each action. On the other hand, recent behavior-cloning (BC) approaches have shown that predicting a sequence of actions enables policies to effectively approximate noisy, multi-modal distributions of expert demonstrations. Can we use a similar idea for improving RL on robotic tasks? In this paper, we introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions. By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories. We study our algorithm across various setups with sparse and dense rewards, and with or without demonstrations, spanning mobile bi-manual manipulation, whole-body control, and tabletop manipulation tasks from BiGym, HumanoidBench, and RLBench. We find that, by learning the critic network with action sequences, our algorithm outperforms various RL and BC baselines, in particular on challenging humanoid control tasks.
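
The central object is a critic that scores a state together with a chunk of future actions rather than a single action. A minimal sketch of such a sequence critic (the paper's architecture and RL updates are more involved):

```python
import torch
import torch.nn as nn

class SequenceCritic(nn.Module):
    """Critic over action sequences: Q(s, a_t, ..., a_{t+k-1}). By scoring whole
    chunks, the value function learns the consequence of executing a series of
    current and future actions from the given state."""
    def __init__(self, state_dim: int, action_dim: int, seq_len: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + seq_len * action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state: torch.Tensor, action_seq: torch.Tensor) -> torch.Tensor:
        # action_seq: (B, seq_len, action_dim), flattened so the critic sees the
        # entire chunk of actions at once.
        flat = action_seq.flatten(start_dim=1)
        return self.net(torch.cat([state, flat], dim=-1)).squeeze(-1)

critic = SequenceCritic(state_dim=24, action_dim=6, seq_len=4)
q = critic(torch.randn(8, 24), torch.randn(8, 4, 6))
print(q.shape)  # torch.Size([8]) -- one Q-value per (state, action-chunk) pair
```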

--------------------------------------------------------------------------------------------------------

HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments

Navigating crowded, complex environments remains a significant challenge for autonomous robots. Traditional navigation methods often fail to comprehensively model interactions between humans, robots, and obstacles. HEIGHT introduces a groundbreaking approach using a heterogeneous spatio-temporal graph representation that captures intricate environmental dynamics. By employing advanced attention mechanisms and recurrent networks, the system can adaptively track and respond to changing scene conditions. The research demonstrates superior performance in navigation challenges and impressive zero-shot generalization capabilities. Potential applications span autonomous delivery robots, healthcare assistance, warehouse automation, and urban mobility, promising safer and more intelligent robotic navigation in dynamic, constrained environments.

Authors:  Shuijing Liu, Haochen Xia, Fatemeh Cheraghi Pouria, Kaiwen Hong, Neeloy Chakraborty, Katherine Driggs-Campbell

Link:  https://arxiv.org/abs/2411.12150v1

Date: 2024-11-19

Summary:

We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture. Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different components in the environment and propose a heterogeneous spatio-temporal (st) graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous st-graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions among entities through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success and efficiency in challenging navigation scenarios. Furthermore, we demonstrate that our pipeline achieves better zero-shot generalization capability than previous works when the densities of humans and obstacles change. More videos are available at https://sites.google.com/view/crowdnav-height/home.
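
The heterogeneous part of the graph can be sketched as type-specific attention: humans and obstacles get separate key/value projections, so robot-human and robot-obstacle interactions are modeled by different parameters. Illustrative only; HEIGHT's full spatio-temporal network adds recurrence and further structure.

```python
import torch
import torch.nn as nn

class HeterogeneousAttention(nn.Module):
    """Robot-centric attention over two node types: humans and obstacles each get
    their own key/value projections, so each interaction type is modeled separately."""
    def __init__(self, d: int = 32):
        super().__init__()
        self.q = nn.Linear(d, d)             # query from the robot state
        self.kv_human = nn.Linear(d, 2 * d)  # edge type: robot-human
        self.kv_obstacle = nn.Linear(d, 2 * d)  # edge type: robot-obstacle

    def forward(self, robot, humans, obstacles):
        q = self.q(robot).unsqueeze(1)                  # (B, 1, d)
        kh, vh = self.kv_human(humans).chunk(2, dim=-1)
        ko, vo = self.kv_obstacle(obstacles).chunk(2, dim=-1)
        k = torch.cat([kh, ko], dim=1)                  # all neighbor nodes
        v = torch.cat([vh, vo], dim=1)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        return (attn @ v).squeeze(1)  # robot embedding aware of both node types

layer = HeterogeneousAttention()
out = layer(torch.randn(2, 32), torch.randn(2, 5, 32), torch.randn(2, 3, 32))
print(out.shape)  # torch.Size([2, 32]) -> feeds the recurrent navigation policy
```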

--------------------------------------------------------------------------------------------------------

Deciphering genomic codes using advanced NLP techniques: a scoping review

The complexity of human genomic data has long challenged researchers seeking to unlock its intricate information. This scoping review explores a revolutionary approach: applying Natural Language Processing and Large Language Models to genomic analysis. By treating genetic sequences like linguistic texts, researchers can potentially transform how we understand genetic information. The study reveals promising applications in predicting regulatory annotations, such as transcription-factor binding sites and chromatin accessibility. With potential implications for personalized medicine, this interdisciplinary approach offers more efficient, scalable genomic analysis methods. The research not only demonstrates technological innovation but also highlights the growing intersection between computational linguistics and biological research.

Authors:  Shuyan Cheng, Yishu Wei, Yiliang Zhou, Zihan Xu, Drew N Wright, Jinze Liu, Yifan Peng

Link:  https://arxiv.org/abs/2411.16084v1

Date: 2024-11-25

Summary:

Objectives: The vast and complex nature of human genomic sequencing data presents challenges for effective analysis. This review aims to investigate the application of Natural Language Processing (NLP) techniques, particularly Large Language Models (LLMs) and transformer architectures, in deciphering genomic codes, focusing on tokenization, transformer models, and regulatory annotation prediction. The goal of this review is to assess data and model accessibility in the most recent literature, gaining a better understanding of the existing capabilities and constraints of these tools in processing genomic sequencing data.

Methods: Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, our scoping review was conducted across PubMed, Medline, Scopus, Web of Science, Embase, and ACM Digital Library. Studies were included if they focused on NLP methodologies applied to genomic sequencing data analysis, without restrictions on publication date or article type.

Results: A total of 26 studies published between 2021 and April 2024 were selected for review. The review highlights that tokenization and transformer models enhance the processing and understanding of genomic data, with applications in predicting regulatory annotations like transcription-factor binding sites and chromatin accessibility.

Discussion: The application of NLP and LLMs to genomic sequencing data interpretation is a promising field that can help streamline the processing of large-scale genomic data while also providing a better understanding of its complex structures. It has the potential to drive advancements in personalized medicine by offering more efficient and scalable solutions for genomic analysis. Further research is also needed to discuss and overcome current limitations, enhancing model transparency and applicability.
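
The tokenization step the review highlights is simple to illustrate: overlapping k-mers turn a DNA sequence into a "sentence" of tokens a transformer can consume (BPE-style learned vocabularies are a newer alternative).

```python
def kmer_tokenize(seq: str, k: int = 6, stride: int = 1) -> list:
    """Tokenize a DNA sequence into overlapping k-mers, a common scheme for
    feeding genomic 'text' to transformer models."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

# A genomic sequence is treated like a sentence; k-mers play the role of words.
print(kmer_tokenize("ACGTACGGTC", k=6))
# ['ACGTAC', 'CGTACG', 'GTACGG', 'TACGGT', 'ACGGTC']
```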

--------------------------------------------------------------------------------------------------------

Navigating the Effect of Parametrization for Dimensionality Reduction

Dimensionality reduction techniques are crucial for understanding complex, high-dimensional datasets across numerous scientific and technological domains. This research challenges the prevailing assumption about the equivalence of parametric and non-parametric methods. By revealing that parametric approaches retain global structures while losing local details, the study introduces ParamRepulsor, an innovative method addressing these limitations. The new technique incorporates advanced negative pair mining and sophisticated loss functions to preserve both global and local data representations. With potential applications in machine learning, data visualization, and scientific modeling, this research offers a more nuanced approach to understanding and representing complex, multi-dimensional datasets.

Authors:  Haiyang Huang, Yingfan Wang, Cynthia Rudin

Link:  https://arxiv.org/abs/2411.15894v1

Date: 2024-11-24

Summary:

Parametric dimensionality reduction methods have gained prominence for their ability to generalize to unseen datasets, an advantage that traditional approaches typically lack. Despite their growing popularity, there remains a prevalent misconception among practitioners about the equivalence in performance between parametric and non-parametric methods. Here, we show that these methods are not equivalent -- parametric methods retain global structure but lose significant local details. To explain this, we provide evidence that parameterized approaches lack the ability to repulse negative pairs, and the choice of loss function also has an impact. Addressing these issues, we developed a new parametric method, ParamRepulsor, that incorporates Hard Negative Mining and a loss function that applies a strong repulsive force. This new method achieves state-of-the-art performance on local structure preservation for parametric methods without sacrificing the fidelity of global structural representation. Our code is available at https://github.com/hyhuang00/ParamRepulsor.
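
The two named ingredients, hard negative mining and a strongly repulsive loss, can be sketched directly: attract embedded neighbor pairs, then push each point away from its nearest non-neighbors. This is the shape of the idea, not ParamRepulsor's exact objective.

```python
import torch

def repulsive_loss(emb, pos_pairs, n_hard: int = 5, margin: float = 1.0):
    """Attraction plus hard-negative repulsion for dimensionality reduction:
    pull embedded neighbor pairs together and push each point away from its
    n_hard nearest non-neighbors (the hard negatives)."""
    i, j = pos_pairs[:, 0], pos_pairs[:, 1]
    attract = ((emb[i] - emb[j]) ** 2).sum(dim=1).mean()

    dists = torch.cdist(emb, emb)                    # pairwise embedding distances
    mask = torch.eye(emb.size(0), dtype=torch.bool)  # never repel a point from itself
    mask[i, j] = mask[j, i] = True                   # ...or from its true neighbors
    dists = dists.masked_fill(mask, float("inf"))
    hard, _ = dists.topk(n_hard, largest=False, dim=1)  # closest non-neighbors
    repel = torch.clamp(margin - hard, min=0).mean()    # strong repulsive force
    return attract + repel

emb = torch.randn(100, 2, requires_grad=True)  # the low-dimensional embedding
pos = torch.randint(0, 100, (200, 2))          # stand-in for kNN neighbor pairs
repulsive_loss(emb, pos).backward()
print(emb.grad.shape)  # gradients flow to the embedding (or an encoder behind it)
```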

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.