Eye On AI

Week Ending 3.31.2024

RESEARCH WATCH: 3.31.2024

Finding Decision Tree Splits in Streaming and Massively Parallel Models

Finding Decision Tree Splits in Streaming and Massively Parallel Models proposes efficient algorithms for computing optimal splits in decision tree learning from data streams. These algorithms could have applications in real-time data analysis, where observations need to be processed as they arrive, with limited memory and computational resources.

Authors:  Huy Pham, Hoang Ta, Hoa T. Vu

Link:  https://arxiv.org/abs/2403.19867v1

Date: 2024-03-28

Summary:

In this work, we provide data stream algorithms that compute optimal splits in decision tree learning. In particular, given a data stream of observations $x_i$ and their labels $y_i$, the goal is to find the optimal split point $j$ that divides the data into two sets such that the mean squared error (for regression) or misclassification rate (for classification) is minimized. We provide various fast streaming algorithms that use sublinear space and a small number of passes for these problems. These algorithms can also be extended to the massively parallel computation model. Our work, while not directly comparable, complements the seminal work of Domingos and Hulten (KDD 2000).
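
To make the optimization target concrete, here is a minimal one-pass sketch of the split criterion for regression over a fixed grid of candidate thresholds. It is illustrative only -- the function name, the candidate-grid simplification, and the toy data are our assumptions, and it does not reproduce the paper's sublinear-space streaming algorithms.

```python
import numpy as np

def best_split_one_pass(stream, thresholds):
    # For each candidate threshold, keep running count, sum, and sum of squares
    # of the labels on each side, then pick the split with the smallest total
    # squared error.
    thresholds = np.asarray(thresholds, dtype=float)
    k = len(thresholds)
    n = np.zeros((k, 2)); s = np.zeros((k, 2)); ss = np.zeros((k, 2))
    for x, y in stream:                      # single pass over (x_i, y_i)
        side = (x > thresholds).astype(int)  # 0 = left of split, 1 = right
        for j in range(k):
            n[j, side[j]] += 1
            s[j, side[j]] += y
            ss[j, side[j]] += y * y
    # within-group SSE = sum(y^2) - (sum y)^2 / n, guarding empty groups
    sse = ss - np.divide(s * s, n, out=np.zeros_like(s), where=n > 0)
    total = sse.sum(axis=1)
    j_best = int(np.argmin(total))
    return thresholds[j_best], total[j_best]

# Toy example: labels jump at x = 0.5, so that threshold should win.
data = [(x, 0.0 if x < 0.5 else 1.0) for x in np.linspace(0, 1, 100)]
print(best_split_one_pass(data, thresholds=[0.25, 0.5, 0.75]))
```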

--------------------------------------------------------------------------------------------------------

Mixing Artificial and Natural Intelligence: From Statistical Mechanics to AI and Back to Turbulence

From Statistical Mechanics to AI and Back to Turbulence explores the role of AI in advancing turbulence research, particularly through deep neural networks and reduced models. This work could lead to improved understanding and modeling of turbulent flows, with applications in fluid dynamics, atmospheric sciences, and engineering.

Authors:  Michael Chertkov

Link:  https://arxiv.org/abs/2403.17993v1

Date: 2024-03-26

Summary:

The paper reflects on the future role of AI in scientific research, with a special focus on turbulence studies, and examines the evolution of AI, particularly through Diffusion Models rooted in non-equilibrium statistical mechanics. It underscores the significant impact of AI on advancing reduced, Lagrangian models of turbulence through innovative use of deep neural networks. Additionally, the paper reviews various other AI applications in turbulence research and outlines potential challenges and opportunities in the concurrent advancement of AI and statistical hydrodynamics. This discussion sets the stage for a future where AI and turbulence research are intricately intertwined, leading to more profound insights and advancements in both fields.

--------------------------------------------------------------------------------------------------------

As Good As A Coin Toss: Human detection of AI-generated images, videos, audio, and audiovisual stimuli

Human detection of AI-generated images, videos, audio, and audiovisual stimuli highlights the challenge of distinguishing synthetic media from authentic content. This study has important implications for combating misinformation and developing countermeasures against the malicious use of deepfakes and synthetic media.

Authors:  Di Cooke, Abigail Edwards, Sophia Barkoff, Kathryn Kelly

Link:  https://arxiv.org/abs/2403.16760v2

Date: 2024-03-26

Summary:

As synthetic media becomes progressively more realistic and barriers to using it continue to lower, the technology has been increasingly utilized for malicious purposes, from financial fraud to nonconsensual pornography. Today, the principal defense against being misled by synthetic media relies on the ability of the human observer to visually and auditorily discern between real and fake. However, it remains unclear just how vulnerable people actually are to deceptive synthetic media in the course of their day-to-day lives. We conducted a perceptual study with 1276 participants to assess how accurate people were at distinguishing synthetic images, audio-only, video-only, and audiovisual stimuli from authentic ones. To reflect the circumstances under which people would likely encounter synthetic media in the wild, testing conditions and stimuli emulated a typical online platform, while all synthetic media used in the survey was sourced from publicly accessible generative AI technology. We find that, overall, participants struggled to meaningfully discern between synthetic and authentic content. We also find that detection performance worsens when the stimuli contain synthetic content rather than authentic content, feature human faces rather than non-face objects, involve a single modality rather than multiple modalities, mix authentic and synthetic elements rather than being fully synthetic (for audiovisual stimuli), and feature foreign languages rather than languages the observer is fluent in. Finally, we find that prior knowledge of synthetic media does not meaningfully impact participants' detection performance. Collectively, these results indicate that people are highly susceptible to being tricked by synthetic media in their daily lives and that human perceptual detection capabilities can no longer be relied upon as an effective counterdefense.

--------------------------------------------------------------------------------------------------------

Employing High-Dimensional RIS Information for RIS-aided Localization Systems

Employing High-Dimensional RIS Information for RIS-aided Localization Systems proposes algorithms for improving the accuracy of localization systems by leveraging high-dimensional information from reconfigurable intelligent surfaces (RIS). This could have applications in indoor positioning, wireless communications, and IoT networks.

Authors:  Tuo Wu, Cunhua Pan, Kangda Zhi, Hong Ren, Maged Elkashlan, Jiangzhou Wang

Link:  https://arxiv.org/abs/2403.16521v1

Date: 2024-03-25

Summary:

Reconfigurable intelligent surface (RIS)-aided localization systems have attracted extensive research attention due to their accuracy enhancement capabilities. However, most studies have primarily utilized the base station (BS) received signal, i.e., BS information, for localization algorithm design, neglecting the potential of the RIS received signal, i.e., RIS information. Compared with BS information, RIS information offers higher dimensionality and a richer feature set, thereby significantly improving the ability to extract the positions of mobile users (MUs). Addressing this oversight, this paper explores algorithm design based on the high-dimensional RIS information. Specifically, we first propose a RIS information reconstruction (RIS-IR) algorithm to reconstruct the high-dimensional RIS information from the low-dimensional BS information. The proposed RIS-IR algorithm comprises a data processing module for preprocessing BS information, a convolutional neural network (CNN) module for feature extraction, and an output module for outputting the reconstructed RIS information. Then, we propose a transfer learning based fingerprint (TFBF) algorithm that employs the reconstructed high-dimensional RIS information for MU localization. This involves adapting a pre-trained DenseNet-121 model to map the reconstructed RIS signal to the MU's three-dimensional (3D) position. Empirical results affirm that the localization performance benefits significantly from the high-dimensional RIS information and remains robust to unoptimized phase shifts.
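
As a rough illustration of the reconstruction step, the sketch below maps a low-dimensional BS measurement vector to a higher-dimensional "RIS information" tensor with a small CNN. All layer sizes, shapes, and names are assumptions made for illustration; the paper's actual RIS-IR architecture and its DenseNet-121 fingerprint stage are not reproduced here.

```python
import torch
import torch.nn as nn

class RISIRSketch(nn.Module):
    """Illustrative stand-in for the RIS information reconstruction (RIS-IR) idea:
    expand low-dimensional BS information into an image-like RIS tensor, then
    refine it with a small CNN. All dimensions are arbitrary assumptions."""
    def __init__(self, bs_dim=64, ris_shape=(1, 32, 32)):
        super().__init__()
        self.ris_shape = ris_shape
        self.expand = nn.Linear(bs_dim, ris_shape[0] * ris_shape[1] * ris_shape[2])
        self.refine = nn.Sequential(
            nn.Conv2d(ris_shape[0], 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, ris_shape[0], kernel_size=3, padding=1),
        )

    def forward(self, bs_info):
        x = self.expand(bs_info).view(-1, *self.ris_shape)
        return self.refine(x)

# The reconstructed tensor would then feed a fingerprint regressor
# (the paper adapts a pre-trained DenseNet-121) that outputs the MU's 3D position.
ris_info = RISIRSketch()(torch.randn(8, 64))   # -> shape (8, 1, 32, 32)
```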

--------------------------------------------------------------------------------------------------------

Graphs Generalization under Distribution Shifts

Graphs Generalization under Distribution Shifts introduces a framework for improving the out-of-distribution generalization of graph neural networks when faced with distribution shifts in node attributes and graph topology. This could benefit applications involving dynamic graphs, such as social networks, protein interaction networks, and traffic monitoring.

Authors:  Qin Tian, Wenjun Wang, Chen Zhao, Minglai Shao, Wang Zhang, Dong Li

Link:  https://arxiv.org/abs/2403.16334v1

Date: 2024-03-25

Summary:

Traditional machine learning methods heavily rely on the independent and identically distributed (i.i.d.) assumption, which imposes limitations when the test distribution deviates from the training distribution. To address this crucial issue, out-of-distribution (OOD) generalization, which aims to achieve satisfactory generalization performance when faced with unknown distribution shifts, has made significant progress. However, OOD methods for graph-structured data currently lack clarity and remain relatively unexplored due to two primary challenges. Firstly, distribution shifts on graphs often occur simultaneously on node attributes and graph topology. Secondly, capturing invariant information amidst diverse distribution shifts proves to be a formidable challenge. To overcome these obstacles, in this paper, we introduce a novel framework, namely Graph Learning Invariant Domain genERation (GLIDER). The goal is to (1) diversify variations across domains by modeling the potential seen or unseen variations of attribute distribution and topological structure and (2) minimize the discrepancy of the variation in a representation space where the target is to predict semantic labels. Extensive experimental results indicate that our model outperforms baseline methods on node-level OOD generalization across domains under simultaneous distribution shifts on node features and topological structures.

--------------------------------------------------------------------------------------------------------

Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning

Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning explores methods to improve the performance of low-parameter language models as intelligent agents. This work could contribute to the development of more capable and efficient AI assistants and conversational agents.

Authors:  Qinhao Zhou, Zihan Zhang, Xiang Xiang, Ke Wang, Yuchuan Wu, Yongbin Li

Link:  https://arxiv.org/abs/2403.19962v1

Date: 2024-03-29

Summary:

Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities, making them highly successful in a variety of tasks. However, when used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4. As intelligent agents, LLMs need to have the capabilities of task planning, long-term memory, and the ability to leverage external tools to achieve satisfactory performance. Various methods have been proposed to enhance the agent capabilities of LLMs. On the one hand, methods involve constructing agent-specific data and fine-tuning the models. On the other hand, some methods focus on designing prompts that effectively activate the reasoning abilities of the LLMs. We explore both strategies on the 7B and 13B models. We propose a comprehensive method for constructing agent-specific data using GPT-4. Through supervised fine-tuning with constructed data, we find that for these models with a relatively small number of parameters, supervised fine-tuning can significantly reduce hallucination outputs and formatting errors in agent tasks. Furthermore, techniques such as multi-path reasoning and task decomposition can effectively decrease problem complexity and enhance the performance of LLMs as agents. We evaluate our method on five agent tasks of AgentBench and achieve satisfactory results.
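
The multi-path reasoning idea, sampling several reasoning chains and keeping the most frequent final answer, can be sketched in a few lines. The generate callable and its (reasoning, answer) return format are hypothetical stand-ins, not the paper's interface.

```python
from collections import Counter

def multi_path_answer(generate, question, n_paths=5):
    # generate(question) is assumed to sample one reasoning chain and return a
    # (reasoning_text, final_answer) pair; we vote over the final answers.
    answers = [generate(question)[1] for _ in range(n_paths)]
    return Counter(answers).most_common(1)[0][0]
```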

--------------------------------------------------------------------------------------------------------

An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM

Zero-shot Video Question Answering Using a VLM presents a novel approach for video question answering by transforming videos into image grids and leveraging vision-language models. This technique could simplify and enhance video understanding tasks in various domains, such as surveillance, entertainment, and education.

Authors:  Wonkyun Kim, Changin Choi, Wonseok Lee, Wonjong Rhee

Link:  https://arxiv.org/abs/2403.18406v1

Date: 2024-03-27

Summary:

Stimulated by the sophisticated reasoning capabilities of recent Large Language Models (LLMs), a variety of strategies for bridging the video modality have been devised. A prominent strategy involves Video Language Models (VideoLMs), which train a learnable interface with video data to connect advanced vision encoders with LLMs. Recently, an alternative strategy has surfaced, employing readily available foundation models, such as VideoLMs and LLMs, across multiple stages for modality bridging. In this study, we introduce a simple yet novel strategy where only a single Vision Language Model (VLM) is utilized. Our starting point is the plain insight that a video comprises a series of images, or frames, interwoven with temporal information. The essence of video comprehension lies in adeptly managing the temporal aspects along with the spatial details of each frame. Initially, we transform a video into a single composite image by arranging multiple frames in a grid layout. The resulting single image is termed an image grid. This format, while maintaining the appearance of a solitary image, effectively retains temporal information within the grid structure. Therefore, the image grid approach enables direct application of a single high-performance VLM without necessitating any video-data training. Our extensive experimental analysis across ten zero-shot video question answering benchmarks, including five open-ended and five multiple-choice benchmarks, reveals that the proposed Image Grid Vision Language Model (IG-VLM) surpasses the existing methods in nine out of ten benchmarks.
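
The core trick is easy to prototype: sample a handful of frames uniformly and paste them into one composite image that a single VLM can consume. The grid size and cell resolution below are arbitrary choices for illustration, not the settings used in the paper.

```python
from PIL import Image

def make_image_grid(frames, rows=2, cols=3, cell=(336, 336)):
    # Uniformly sample rows*cols frames and tile them into one composite image.
    assert len(frames) >= rows * cols, "need at least rows*cols frames"
    step = len(frames) / (rows * cols)
    picked = [frames[int(i * step)] for i in range(rows * cols)]
    grid = Image.new("RGB", (cols * cell[0], rows * cell[1]))
    for i, frame in enumerate(picked):
        r, c = divmod(i, cols)
        grid.paste(frame.resize(cell), (c * cell[0], r * cell[1]))
    return grid

# grid = make_image_grid(list_of_pil_frames)   # then ask the VLM about `grid`
```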

--------------------------------------------------------------------------------------------------------

MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution

MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution introduces a multi-agent framework for resolving GitHub issues using large language models. This could potentially automate and streamline software development and maintenance processes, improving productivity and reducing human effort.

Authors:  Wei Tao, Yucheng Zhou, Wenqiang Zhang, Yu Cheng

Link:  https://arxiv.org/abs/2403.17927v1

Date: 2024-03-26

Summary:

In software evolution, resolving the emergent issues within GitHub repositories is a complex challenge that involves not only the incorporation of new code but also the maintenance of existing functionalities. Large Language Models (LLMs) have shown promise in code generation and understanding but face difficulties in code change, particularly at the repository level. To overcome these challenges, we empirically study why LLMs mostly fail to resolve GitHub issues and analyze some impact factors. Motivated by the empirical findings, we propose a novel LLM-based Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four kinds of agents customized for software evolution: Manager, Repository Custodian, Developer, and Quality Assurance Engineer agents. This framework leverages the collaboration of various agents in the planning and coding process to unlock the potential of LLMs to resolve GitHub issues. In experiments, we employ the SWE-bench benchmark to compare MAGIS with popular LLMs, including GPT-3.5, GPT-4, and Claude-2. MAGIS can resolve 13.94% of GitHub issues, which significantly outperforms the baselines. Specifically, MAGIS achieves an eight-fold increase in resolved ratio over the direct application of GPT-4, the base LLM of our method. We also analyze the factors for improving GitHub issue resolution rates, such as line location, task allocation, etc.
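
A rough sketch of how the four roles could be chained is shown below. The llm callable and the prompts are hypothetical paraphrases of the agent names in the abstract; they are not MAGIS's actual prompts or control flow.

```python
def resolve_issue(llm, issue_text, repo_file_list):
    # llm is a hypothetical text-in/text-out callable standing in for an LLM API.
    plan = llm(f"As the Manager, break this GitHub issue into coding tasks:\n{issue_text}")
    files = llm("As the Repository Custodian, select the files relevant to this plan:\n"
                f"{plan}\nAvailable files:\n{repo_file_list}")
    patch = llm(f"As the Developer, draft a code change implementing:\n{plan}\n"
                f"Touch only these files:\n{files}")
    review = llm(f"As the Quality Assurance Engineer, review this patch for regressions:\n{patch}")
    return patch, review
```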

--------------------------------------------------------------------------------------------------------

A Peg-in-hole Task Strategy for Holes in Concrete

A Peg-in-hole Task Strategy for Holes in Concrete proposes a method for enabling industrial robots to perform peg-in-hole tasks in concrete structures, using deep reinforcement learning. This work could have applications in construction and infrastructure maintenance, improving efficiency and safety.

Authors:  André Yuji Yasutomi, Hiroki Mori, Tetsuya Ogata

Link:  https://arxiv.org/abs/2403.19946v1

Date: 2024-03-29

Summary:

A method that enables an industrial robot to accomplish the peg-in-hole task for holes in concrete is proposed. The proposed method involves slightly detaching the peg from the wall, when moving between search positions, to avoid the negative influence of the concrete's high friction coefficient. It uses a deep neural network (DNN), trained via reinforcement learning, to effectively find holes with variable shape and surface finish (due to the brittle nature of concrete) without analytical modeling or control parameter tuning. The method uses displacement of the peg toward the wall surface, in addition to force and torque, as one of the inputs of the DNN. Since the displacement increases as the peg gets closer to the hole (due to the chamfered shape of holes in concrete), it is a useful input to the DNN. The proposed method was evaluated by training the DNN on a hole 500 times and attempting to find 12 unknown holes. The results of the evaluation show the DNN enabled a robot to find the unknown holes with an average success rate of 96.1% and an average execution time of 12.5 seconds. Additional evaluations with random initial positions and a different type of peg demonstrate that the trained DNN can generalize well to different conditions. Analyses of the influence of the peg displacement input showed that the success rate of the DNN increases when this parameter is utilized. These results validate the proposed method in terms of its effectiveness and applicability to the construction industry.
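
For readers curious about the input representation, the sketch below shows a small policy network taking the force, torque, and wall-ward displacement signals that the abstract highlights. Layer sizes, the discrete action set, and all names are our assumptions; the paper's actual network and training setup are not reproduced.

```python
import torch
import torch.nn as nn

class HoleSearchPolicy(nn.Module):
    # Inputs follow the abstract: 3-axis force, 3-axis torque, and the peg's
    # displacement toward the wall. Hidden sizes and the 4 discrete search
    # actions are assumptions for this sketch.
    def __init__(self, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3 + 1, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, force, torque, displacement):
        x = torch.cat([force, torque, displacement], dim=-1)
        return self.net(x)

# logits = HoleSearchPolicy()(torch.zeros(1, 3), torch.zeros(1, 3), torch.zeros(1, 1))
```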

--------------------------------------------------------------------------------------------------------

Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation

Mol-AIR: Molecular Reinforcement Learning with Adaptive Intrinsic Rewards for Goal-directed Molecular Generation presents a reinforcement learning framework for generating molecules with desired properties, useful for drug discovery and material design applications.

Authors:  Jinyeong Park, Jaegyoon Ahn, Jonghwan Choi, Jibum Kim

Link:  https://arxiv.org/abs/2403.20109v1

Date: 2024-03-29

Summary:

Optimizing techniques for discovering molecular structures with desired properties is crucial in artificial intelligence (AI)-based drug discovery. Combining deep generative models with reinforcement learning has emerged as an effective strategy for generating molecules with specific properties. Despite its potential, this approach is ineffective in exploring the vast chemical space and optimizing particular chemical properties. To overcome these limitations, we present Mol-AIR, a reinforcement learning-based framework using adaptive intrinsic rewards for effective goal-directed molecular generation. Mol-AIR leverages the strengths of both history-based and learning-based intrinsic rewards by exploiting random network distillation and count-based strategies. In benchmark tests, Mol-AIR demonstrates superior performance over existing approaches in generating molecules with desired properties without any prior knowledge, including penalized LogP, QED, and celecoxib similarity. We believe that Mol-AIR represents a significant advancement in drug discovery, offering a more efficient path to discovering novel therapeutics.
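
The sketch below shows one way a count-based bonus and an RND-style prediction-error bonus could be blended into a single intrinsic reward. The fixed weighting here is a simplification of our own; it does not implement Mol-AIR's adaptive schedule.

```python
import numpy as np

class IntrinsicRewardSketch:
    def __init__(self, beta=0.5):
        self.counts = {}      # visit counts per (hashed) molecular state
        self.beta = beta      # fixed blend weight (Mol-AIR adapts this)

    def __call__(self, state_key, rnd_error):
        # rnd_error: prediction error of a trained predictor against a fixed,
        # randomly initialized target network (the RND-style novelty signal).
        self.counts[state_key] = self.counts.get(state_key, 0) + 1
        count_bonus = 1.0 / np.sqrt(self.counts[state_key])
        return self.beta * count_bonus + (1.0 - self.beta) * rnd_error
```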

--------------------------------------------------------------------------------------------------------

Many-Objective Evolutionary Influence Maximization: Balancing Spread, Budget, Fairness, and Time

Many-Objective Evolutionary Influence Maximization considers multiple objectives in influence maximization problems on graphs, aiming to balance spread, budget, fairness, and time constraints. This approach could optimize viral marketing campaigns and information dissemination across networks while accounting for various practical considerations.

Authors:  Elia Cunegatti, Leonardo Lucio Custode, Giovanni Iacca

Link:  https://arxiv.org/abs/2403.18755v2

Date: 2024-03-28

Summary:

The Influence Maximization (IM) problem seeks to discover the set of nodes in a graph that maximizes the spread of information. This problem is known to be NP-hard, and it is usually studied by maximizing the influence (spread) and, optionally, optimizing a second objective, such as minimizing the seed set size or maximizing the influence fairness. However, in many practical scenarios multiple aspects of the IM problem must be optimized at the same time. In this work, we propose a first case study where several IM-specific objective functions, namely budget, fairness, communities, and time, are optimized on top of the maximization of influence and minimization of the seed set size. To this aim, we introduce MOEIM (Many-Objective Evolutionary Algorithm for Influence Maximization), a Multi-Objective Evolutionary Algorithm (MOEA) based on NSGA-II incorporating graph-aware operators and a smart initialization. We compare MOEIM in two experimental settings, including a total of nine graph datasets, two heuristic methods, a related MOEA, and a state-of-the-art Deep Learning approach. The experiments show that MOEIM overall outperforms the competitors in most of the tested many-objective settings. To conclude, we also investigate the correlation between the objectives, leading to novel insights into the topic. The codebase is available at https://github.com/eliacunegatti/MOEIM.
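
Two of the objectives, influence spread and seed-set size, are straightforward to evaluate; the sketch below estimates spread with Monte Carlo simulation under the independent cascade model. The propagation probability, trial count, and graph encoding are our assumptions, and the evolutionary search itself (MOEIM's NSGA-II machinery) is not shown.

```python
import random

def influence_spread(graph, seeds, p=0.1, trials=200):
    # Monte Carlo estimate of expected spread under the independent cascade model.
    # graph: dict mapping each node to a list of its out-neighbours.
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            newly = []
            for u in frontier:
                for v in graph.get(u, []):
                    if v not in active and random.random() < p:
                        active.add(v)
                        newly.append(v)
            frontier = newly
        total += len(active)
    return total / trials

# A many-objective EA would then trade off, for example, maximizing
# influence_spread(graph, seeds) against minimizing len(seeds).
```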

--------------------------------------------------------------------------------------------------------

How Reliable is Your Simulator? Analysis on the Limitations of Current LLM-based User Simulators for Conversational Recommendation

How Reliable is Your Simulator? analyzes limitations of using large language models (LLMs) as user simulators for conversational recommender systems. Addressing these limitations could improve the reliability and trustworthiness of AI assistants providing personalized recommendations through dialogue interactions.

Authors:  Lixi Zhu, Xiaowen Huang, Jitao Sang

Link:  https://arxiv.org/abs/2403.16416v1

Date: 2024-03-25

Summary:

Conversational Recommender System (CRS) interacts with users through natural language to understand their preferences and provide personalized recommendations in real-time. CRS has demonstrated significant potential, prompting researchers to focus on developing more realistic and reliable user simulators. Recently, the capabilities of Large Language Models (LLMs) have attracted a lot of attention in various fields. Simultaneously, efforts are underway to construct user simulators based on LLMs. While these works showcase innovation, they also come with certain limitations that require attention. In this work, we aim to analyze the limitations of using LLMs in constructing user simulators for CRS, to guide future research. To achieve this goal, we conduct analytical validation on the notable work, iEvaLM. Through multiple experiments on two widely-used datasets in the field of conversational recommendation, we highlight several issues with the current evaluation methods for user simulators based on LLMs: (1) Data leakage, which occurs in conversational history and the user simulator's replies, results in inflated evaluation results. (2) The success of CRS recommendations depends more on the availability and quality of conversational history than on the responses from user simulators. (3) Controlling the output of the user simulator through a single prompt template proves challenging. To overcome these limitations, we propose SimpleUserSim, employing a straightforward strategy to guide the topic toward the target items. Our study validates the ability of CRS models to utilize the interaction information, significantly improving the recommendation results.
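
The abstract only says SimpleUserSim steers the conversation toward the target items with a straightforward strategy; the toy simulator below illustrates one such strategy by revealing one attribute of the target item per turn. The item fields and reply templates are our assumptions, not the paper's implementation.

```python
def simple_user_sim(turn, target_item, recommended_items):
    # target_item is assumed as {"name": str, "attributes": [str, ...]}.
    if target_item["name"] in recommended_items:
        return "Yes, that's exactly what I'm looking for."
    hint = target_item["attributes"][turn % len(target_item["attributes"])]
    return f"Not quite. I'd prefer something that is {hint}."
```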

--------------------------------------------------------------------------------------------------------

U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models

U-Sketch proposes an efficient framework for sketch-to-image synthesis using diffusion models and edge prediction networks. This technique could streamline creative workflows by enabling artists to generate photorealistic images from rough sketches rapidly and accurately.

Authors:  Ilias Mitsouras, Eleftherios Tsonis, Paraskevi Tzouveli, Athanasios Voulodimos

Link:  https://arxiv.org/abs/2403.18425v1

Date: 2024-03-27

Summary:

Diffusion models have demonstrated remarkable performance in text-to-image synthesis, producing realistic and high resolution images that faithfully adhere to the corresponding text-prompts. Despite their great success, they still fall behind in sketch-to-image synthesis tasks, where in addition to text-prompts, the spatial layout of the generated images has to closely follow the outlines of certain reference sketches. Employing an MLP latent edge predictor to guide the spatial layout of the synthesized image by predicting edge maps at each denoising step has been recently proposed. Despite yielding promising results, the pixel-wise operation of the MLP does not take into account the spatial layout as a whole, and demands numerous denoising iterations to produce satisfactory images, leading to time inefficiency. To this end, we introduce U-Sketch, a framework featuring a U-Net type latent edge predictor, which is capable of efficiently capturing both local and global features, as well as spatial correlations between pixels. Moreover, we propose the addition of a sketch simplification network that offers the user the choice of preprocessing and simplifying input sketches for enhanced outputs. The experimental results, corroborated by user feedback, demonstrate that our proposed U-Net latent edge predictor leads to more realistic results, that are better aligned with the spatial outlines of the reference sketches, while drastically reducing the number of required denoising steps and, consequently, the overall execution time.

--------------------------------------------------------------------------------------------------------

Towards a Zero-Data, Controllable, Adaptive Dialog System

Towards a Zero-Data, Controllable, Adaptive Dialog System explores generating synthetic training data from dialog trees to build adaptive conversational agents without human data. This approach could facilitate deploying customized virtual assistants across diverse domains efficiently.

Authors:  Dirk Väth, Lindsey Vanderlyn, Ngoc Thang Vu

Link:  https://arxiv.org/abs/2403.17582v1

Date: 2024-03-26

Summary:

Conversational Tree Search (Väth et al., 2023) is a recent approach to controllable dialog systems, where domain experts shape the behavior of a Reinforcement Learning agent through a dialog tree. The agent learns to efficiently navigate this tree, while adapting to information needs, e.g., domain familiarity, of different users. However, the need for additional training data hinders deployment in new domains. To address this, we explore approaches to generate this data directly from dialog trees. We improve the original approach and show that agents trained on synthetic data can achieve comparable dialog success to models trained on human data, whether using a commercial Large Language Model for generation or a smaller open-source model running on a single GPU. We further demonstrate the scalability of our approach by collecting and testing on two new datasets: ONBOARD, a new domain helping foreign residents moving to a new city, and the medical domain DIAGNOSE, a subset of Wikipedia articles related to scalp and head symptoms. Finally, we perform human testing, where no statistically significant differences were found in either objective or subjective measures between models trained on human and generated data.
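
A minimal sketch of the data-generation idea, producing user-utterance variants for each dialog-tree node with an LLM, is shown below. The node schema and the paraphrase callable are hypothetical; the paper's prompting setup and filtering steps are not reproduced.

```python
def synthesize_training_data(paraphrase, dialog_tree, n_variants=3):
    # paraphrase(text, n) is a hypothetical LLM call returning n rewordings.
    data = []
    for node in dialog_tree:   # node assumed as {"user_intent": str, "system_answer": str}
        for variant in paraphrase(node["user_intent"], n_variants):
            data.append({"user": variant, "system": node["system_answer"]})
    return data
```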

--------------------------------------------------------------------------------------------------------

Backpropagation through space, time, and the brain

Backpropagation through space, time, and the brain introduces a biologically plausible framework for credit assignment in neural networks, addressing the non-locality constraints of backpropagation. This work could inspire more brain-like learning algorithms for artificial and neuromorphic computing systems.

Authors:  Benjamin Ellenberger, Paul Haider, Jakob Jordan, Kevin Max, Ismael Jaras, Laura Kriener, Federico Benitez, Mihai A. Petrovici

Link:  https://arxiv.org/abs/2403.16933v1

Date: 2024-03-25

Summary:

Effective learning in neuronal networks requires the adaptation of individual synapses given their relative contribution to solving a task. However, physical neuronal systems -- whether biological or artificial -- are constrained by spatio-temporal locality. How such networks can perform efficient credit assignment remains, to a large extent, an open question. In Machine Learning, the answer is almost universally given by the error backpropagation algorithm, through both space (BP) and time (BPTT). However, BP(TT) is well-known to rely on biologically implausible assumptions, in particular with respect to spatiotemporal (non-)locality, while forward-propagation models such as real-time recurrent learning (RTRL) suffer from prohibitive memory constraints. We introduce Generalized Latent Equilibrium (GLE), a computational framework for fully local spatio-temporal credit assignment in physical, dynamical networks of neurons. We start by defining an energy based on neuron-local mismatches, from which we derive both neuronal dynamics via stationarity and parameter dynamics via gradient descent. The resulting dynamics can be interpreted as a real-time, biologically plausible approximation of BPTT in deep cortical networks with continuous-time neuronal dynamics and continuously active, local synaptic plasticity. In particular, GLE exploits the ability of biological neurons to phase-shift their output rate with respect to their membrane potential, which is essential in both directions of information propagation. For the forward computation, it enables the mapping of time-continuous inputs to neuronal space, performing an effective spatiotemporal convolution. For the backward computation, it permits the temporal inversion of feedback signals, which consequently approximate the adjoint states necessary for useful parameter updates.

--------------------------------------------------------------------------------------------------------

An Expert is Worth One Token: Synergizing Multiple Expert LLMs as Generalist via Expert Token Routing

An Expert is Worth One Token presents a framework for synergizing multiple expert language models into a generalist system via token routing. This could enable building versatile AI assistants that combine and leverage specialized capabilities seamlessly.

Authors:  Ziwei Chai, Guoyin Wang, Jing Su, Tianjie Zhang, Xuanwen Huang, Xuwu Wang, Jingjing Xu, Jianbo Yuan, Hongxia Yang, Fei Wu, Yang Yang

Link:  https://arxiv.org/abs/2403.16854v1

Date: 2024-03-25

Summary:

We present Expert-Token-Routing, a unified generalist framework that facilitates seamless integration of multiple expert LLMs. Our framework represents expert LLMs as special expert tokens within the vocabulary of a meta LLM. The meta LLM can route to an expert LLM as if it were generating a new token. Expert-Token-Routing not only supports learning the implicit expertise of expert LLMs from existing instruction datasets but also allows for dynamic extension of new expert LLMs in a plug-and-play manner. It also conceals the detailed collaboration process from the user's perspective, facilitating interaction as though it were a singular LLM. Our framework outperforms various existing multi-LLM collaboration paradigms across benchmarks that incorporate six diverse expert domains, demonstrating effectiveness and robustness in building a generalist LLM system via synergizing multiple expert LLMs.
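
The routing mechanism can be caricatured in a few lines: expert LLMs appear as special tokens in the meta LLM's vocabulary, and emitting one hands the query to that expert. The token names and the generate callables are hypothetical stand-ins for the framework's actual components.

```python
EXPERT_TOKENS = {"<math_expert>": "math-llm", "<code_expert>": "code-llm"}  # assumed names

def generate_with_routing(meta_generate, expert_generate, prompt):
    # meta_generate(prompt) -> text from the meta LLM; expert_generate(name, prompt)
    # -> text from the named expert LLM. Both are hypothetical callables.
    draft = meta_generate(prompt)
    for token, expert_name in EXPERT_TOKENS.items():
        if token in draft:            # the meta LLM emitted an expert token
            return expert_generate(expert_name, prompt)
    return draft
```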

--------------------------------------------------------------------------------------------------------

NSINA: A News Corpus for Sinhala

NSINA contributes a large news corpus and benchmarks for the low-resource Sinhala language, aiming to improve natural language processing capabilities. This resource could drive advancements in Sinhala language technology and information access.

Authors:  Hansi Hettiarachchi, Damith Premasiri, Lasitha Uyangodage, Tharindu Ranasinghe

Link:  https://arxiv.org/abs/2403.16571v1

Date: 2024-03-25

Summary:

The introduction of large language models (LLMs) has advanced natural language processing (NLP), but their effectiveness is largely dependent on pre-training resources. This is especially evident in low-resource languages, such as Sinhala, which face two primary challenges: the lack of substantial training data and limited benchmarking datasets. In response, this study introduces NSINA, a comprehensive news corpus of over 500,000 articles from popular Sinhala news websites, along with three NLP tasks: news media identification, news category prediction, and news headline generation. The release of NSINA aims to provide a solution to challenges in adapting LLMs to Sinhala, offering valuable resources and benchmarks for improving NLP in the Sinhala language. NSINA is the largest news corpus for Sinhala available to date.

--------------------------------------------------------------------------------------------------------

The use of ChatGPT in higher education: The advantages and disadvantages

The use of ChatGPT in higher education examines the advantages and disadvantages of leveraging this AI system for instruction and learning. This analysis could guide effective integration of ChatGPT in educational contexts while mitigating potential drawbacks.

Authors:  Joshua Ebere Chukwuere

Link:  https://arxiv.org/abs/2403.19245v1

Date: 2024-03-28

Summary:

Higher education scholars are interested in an artificial intelligence (AI) technology called ChatGPT, which was developed by OpenAI. Whether ChatGPT can improve learning is still a topic of debate among experts. This concise overview of the literature examines the application of ChatGPT in higher education to comprehend and produce high-level instruction. By examining the essential literature, this study seeks to provide a thorough assessment of the advantages and disadvantages of utilizing ChatGPT in higher education settings. But it is crucial to consider both the positive and negative elements. For this rapid review, the researcher searched Google Scholar, Scopus, and other databases between January 2023 and July 2023 for prior research from various publications; these studies were then examined. The study found that employing ChatGPT in higher education is beneficial for a number of reasons. It can provide individualized instruction and prompt feedback, facilitate access to learning, and promote student interaction. These benefits could improve the learning environment and make it more fun for academics and students. The cons of ChatGPT are equally present. These problems include the inability to comprehend emotions, the lack of opportunities for social interaction, technological limitations, and the dangers of depending too much on ChatGPT in higher education. Higher education should combine ChatGPT with other teaching techniques to provide students and lecturers with a comprehensive education. However, it is crucial to consider the positives, negatives, and moral issues before adopting ChatGPT in the classroom.

--------------------------------------------------------------------------------------------------------

Paths to Equilibrium in Normal-Form Games

Paths to Equilibrium in Normal-Form Games studies strategy dynamics in multi-agent reinforcement learning, with implications for the capabilities and limitations of certain algorithm classes. This work could inform the design of robust multi-agent systems.

Authors:  Bora Yongacoglu, Gürdal Arslan, Lacra Pavel, Serdar Yüksel

Link:  https://arxiv.org/abs/2403.18079v1

Date: 2024-03-26

Summary:

In multi-agent reinforcement learning (MARL), agents repeatedly interact across time and revise their strategies as new data arrives, producing a sequence of strategy profiles. This paper studies sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning, where an agent who is best responding in period $t$ does not switch its strategy in the next period $t+1$. This constraint merely requires that optimizing agents do not switch strategies, but does not constrain the other non-optimizing agents in any way, and thus allows for exploration. Sequences with this property are called satisficing paths, and arise naturally in many MARL algorithms. A fundamental question about strategic dynamics is the following: for a given game and initial strategy profile, is it always possible to construct a satisficing path that terminates at an equilibrium strategy? The resolution of this question has implications for the capabilities and limitations of a class of MARL algorithms. We answer this question in the affirmative for mixed extensions of finite normal-form games.
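
Stated schematically (in our notation, with $\mathrm{BR}_i$ denoting agent $i$'s best-response set), the constraint defining a satisficing path is

$$\pi_i^{t} \in \mathrm{BR}_i\left(\pi_{-i}^{t}\right) \;\Longrightarrow\; \pi_i^{t+1} = \pi_i^{t} \quad \text{for every agent } i \text{ and period } t,$$

so agents that are already best responding keep their strategies, while all other agents are free to switch, which is what permits exploration.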

--------------------------------------------------------------------------------------------------------

Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art

Hallucination Detection in Foundation Models investigates methods for quantifying certainty and detecting hallucinations when employing foundation models for decision-making tasks in autonomous systems. This line of research could enhance the safety and reliability of AI-powered robots and assistants.

Authors:  Neeloy Chakraborty, Melkior Ornik, Katherine Driggs-Campbell

Link:  https://arxiv.org/abs/2403.16527v1

Date: 2024-03-25

Summary:

Autonomous systems are soon to be ubiquitous, from manufacturing autonomy to agricultural field robots, and from health care assistants to the entertainment industry. The majority of these systems are developed with modular sub-components for decision-making, planning, and control that may be hand-engineered or learning-based. While these existing approaches have been shown to perform well under the situations they were specifically designed for, they can perform especially poorly in rare, out-of-distribution scenarios that will undoubtedly arise at test-time. The rise of foundation models trained on multiple tasks with impressively large datasets from a variety of fields has led researchers to believe that these models may provide common sense reasoning that existing planners are missing. Researchers posit that this common sense reasoning will bridge the gap between algorithm development and deployment to out-of-distribution tasks, like how humans adapt to unexpected scenarios. Large language models have already penetrated the robotics and autonomous systems domains as researchers are scrambling to showcase their potential use cases in deployment. While this application direction is very promising empirically, foundation models are known to hallucinate and generate decisions that may sound reasonable, but are in fact poor. We argue there is a need to step back and simultaneously design systems that can quantify the certainty of a model's decision, and detect when it may be hallucinating. In this work, we discuss the current use cases of foundation models for decision-making tasks, provide a general definition for hallucinations with examples, discuss existing approaches to hallucination detection and mitigation with a focus on decision problems, and explore areas for further research in this exciting field.

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.