Week Ending 1.19.2025


RESEARCH WATCH: 1.19.2025


Good things come in small packages: Should we adopt Lite-GPUs in AI infrastructure?

The AI industry's approach to GPU scaling has focused on creating increasingly large, complex, and expensive units. However, this strategy faces physical limits in packaging, yield, and cooling. This paper proposes a paradigm shift toward "Lite-GPUs" - smaller, simpler GPU units connected efficiently through advanced optical networking. This approach could reshape AI infrastructure by offering lower manufacturing costs, a smaller failure blast radius, improved yields, and better power efficiency. The research has significant implications for data centers and AI companies looking to scale their computing capabilities sustainably.

Authors:  Burcu Canakci, Junyi Liu, Xingbo Wu, Nathanaël Cheriere, Paolo Costa, Sergey Legtchenko, Dushyanth Narayanan, Ant Rowstron

Link:  https://arxiv.org/abs/2501.10187v1

Date: 2025-01-17

Summary:

To match the booming demand of generative AI workloads, GPU designers have so far been trying to pack more and more compute and memory into single complex and expensive packages. However, there is growing uncertainty about the scalability of individual GPUs and thus AI clusters, as state-of-the-art GPUs are already displaying packaging, yield, and cooling limitations. We propose to rethink the design and scaling of AI clusters through efficiently-connected large clusters of Lite-GPUs, GPUs with single, small dies and a fraction of the capabilities of larger GPUs. We think recent advances in co-packaged optics can be key in overcoming the communication challenges of distributing AI workloads onto more Lite-GPUs. In this paper, we present the key benefits of Lite-GPUs on manufacturing cost, blast radius, yield, and power efficiency; and discuss systems opportunities and challenges around resource, workload, memory, and network management.
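
The yield argument is easy to see with a toy defect model. Below is a back-of-envelope sketch using a simple Poisson yield model; the defect density and die areas are illustrative assumptions, not figures from the paper.

```python
import math

def die_yield(area_cm2: float, defect_density: float = 0.1) -> float:
    """Poisson yield model: fraction of good dies for a given die area.

    defect_density is defects per cm^2; both numbers here are illustrative
    assumptions, not values from the paper.
    """
    return math.exp(-defect_density * area_cm2)

# One large 8 cm^2 die vs. a 2 cm^2 Lite-GPU die (four of which match the area).
print(f"large-die yield: {die_yield(8.0):.2%}")   # ~44.9%
print(f"small-die yield: {die_yield(2.0):.2%}")   # ~81.9%
# Good silicon per wafer scales with per-die yield, which is why smaller,
# simpler dies improve effective manufacturing yield.
```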

--------------------------------------------------------------------------------------------------------

Conformal Prediction Sets with Improved Conditional Coverage using Trust Scores

Traditional machine learning models often provide predictions without reliable measures of uncertainty. While conformal prediction offers statistical guarantees on overall (marginal) coverage, it doesn't ensure reliability for individual predictions. This paper introduces an innovative approach that focuses on improving prediction reliability specifically when classifiers are overconfident in their wrong predictions. The method uses trust scores to measure deviation from optimal performance and adjusts confidence accordingly. This research has practical applications in medical diagnosis, autonomous systems, and financial risk assessment where understanding prediction reliability is crucial.

Authors:  Jivat Neet Kaur, Michael I. Jordan, Ahmed Alaa

Link:  https://arxiv.org/abs/2501.10139v1

Date: 2025-01-17

Summary:

Standard conformal prediction offers a marginal guarantee on coverage, but for prediction sets to be truly useful, they should ideally ensure coverage conditional on each test point. Unfortunately, it is impossible to achieve exact, distribution-free conditional coverage in finite samples. In this work, we propose an alternative conformal prediction algorithm that targets coverage where it matters most--in instances where a classifier is overconfident in its incorrect predictions. We start by dissecting miscoverage events in marginally-valid conformal prediction, and show that miscoverage rates vary based on the classifier's confidence and its deviation from the Bayes optimal classifier. Motivated by this insight, we develop a variant of conformal prediction that targets coverage conditional on a reduced set of two variables: the classifier's confidence in a prediction and a nonparametric trust score that measures its deviation from the Bayes classifier. Empirical evaluation on multiple image datasets shows that our method generally improves conditional coverage properties compared to standard conformal prediction, including class-conditional coverage, coverage over arbitrary subgroups, and coverage over demographic groups.
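
For orientation, here is a minimal sketch of the standard (marginally valid) split conformal baseline the paper starts from. The paper's variant instead calibrates the quantile conditional on the classifier's confidence and a trust score, which this sketch does not implement.

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Baseline split conformal prediction with the 1 - p_y score.

    cal_probs:  (n, K) softmax outputs on a held-out calibration set
    cal_labels: (n,)   integer true labels
    test_probs: (m, K) softmax outputs on test points
    Returns an (m, K) boolean matrix; row i is the prediction set.
    """
    n = len(cal_labels)
    # Nonconformity score: one minus the probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected conformal quantile.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return (1.0 - test_probs) <= q
```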

--------------------------------------------------------------------------------------------------------

Generative Medical Image Anonymization Based on Latent Code Projection and Optimization

Medical data privacy is a critical concern in healthcare, requiring effective anonymization while maintaining clinical utility. This paper presents a two-stage solution using latent code projection and optimization to anonymize medical images. The approach preserves essential medical information while removing identifying features, demonstrated through experiments on chest X-rays. This technology could enable broader sharing of medical imaging data for research and development of AI diagnostic tools while ensuring patient privacy compliance with regulations like HIPAA.

Authors:  Huiyu Li, Nicholas Ayache, Hervé Delingette

Link:  https://arxiv.org/abs/2501.09114v1

Date: 2025-01-15

Summary:

Medical image anonymization aims to protect patient privacy by removing identifying information, while preserving the data utility to solve downstream tasks. In this paper, we address the medical image anonymization problem with a two-stage solution: latent code projection and optimization. In the projection stage, we design a streamlined encoder to project input images into a latent space and propose a co-training scheme to enhance the projection process. In the optimization stage, we refine the latent code using two deep loss functions designed to address the trade-off between identity protection and data utility dedicated to medical images. Through a comprehensive set of qualitative and quantitative experiments, we showcase the effectiveness of our approach on the MIMIC-CXR chest X-ray dataset by generating anonymized synthetic images that can serve as training set for detecting lung pathologies. Source codes are available at https://github.com/Huiyu-Li/GMIA.
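
To make the second stage concrete, here is a minimal latent-optimization sketch in PyTorch. The two loss functions are placeholders for the paper's identity-protection and data-utility losses; their exact forms are in the paper, not reproduced here.

```python
import torch

def optimize_latent(z0, id_loss, utility_loss, lam=1.0, steps=200, lr=0.01):
    """Stage-two latent code optimization (sketch).

    z0: latent code produced by the stage-one encoder projection
    id_loss, utility_loss: stand-ins for the paper's two deep losses
    (removing identity cues vs. preserving clinically useful content).
    """
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = id_loss(z) + lam * utility_loss(z)  # lam is the trade-off knob
        loss.backward()
        opt.step()
    return z.detach()  # decode with the generator to get the anonymized image
```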

--------------------------------------------------------------------------------------------------------

Consistency of Responses and Continuations Generated by Large Language Models on Social Media

As Large Language Models become increasingly prevalent in social media applications, understanding their emotional and semantic consistency becomes crucial. This study examines how Gemma and Llama handle emotional content and maintain coherence in social media contexts, particularly in climate change discussions. The research reveals distinct patterns in how these models handle emotions and semantic relationships, with implications for developing more natural and appropriate AI interactions on social media platforms, content moderation, and automated customer service.

Authors:  Wenlu Fan, Yuqi Zhu, Chenyang Wang, Bin Wang, Wentao Xu

Link:  https://arxiv.org/abs/2501.08102v2

Date: 2025-01-15

Summary:

Large Language Models (LLMs) demonstrate remarkable capabilities in text generation, yet their emotional consistency and semantic coherence in social media contexts remain insufficiently understood. This study investigates how LLMs handle emotional content and maintain semantic relationships through continuation and response tasks using two open-source models: Gemma and Llama. By analyzing climate change discussions from Twitter and Reddit, we examine emotional transitions, intensity patterns, and semantic similarity between human-authored and LLM-generated content. Our findings reveal that while both models maintain high semantic coherence, they exhibit distinct emotional patterns: Gemma shows a tendency toward negative emotion amplification, particularly anger, while maintaining certain positive emotions like optimism. Llama demonstrates superior emotional preservation across a broader spectrum of affects. Both models systematically generate responses with attenuated emotional intensity compared to human-authored content and show a bias toward positive emotions in response tasks. Additionally, both models maintain strong semantic similarity with original texts, though performance varies between continuation and response tasks. These findings provide insights into LLMs' emotional and semantic processing capabilities, with implications for their deployment in social media contexts and human-AI interaction design.
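
One common way to score semantic similarity between human-authored text and an LLM continuation is with sentence embeddings; the sketch below assumes the sentence-transformers package and an embedding model of our choosing, not necessarily the setup used in the paper.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
human = "Climate change is accelerating faster than scientists predicted."
generated = "Global warming is outpacing earlier scientific forecasts."
emb = model.encode([human, generated], convert_to_tensor=True)
# Cosine similarity in [-1, 1]; higher means closer semantics.
print(util.cos_sim(emb[0], emb[1]).item())
```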

--------------------------------------------------------------------------------------------------------

Derivation of effective gradient flow equations and dynamical truncation of training data in Deep Learning

Deep learning's success often outpaces our theoretical understanding of how it works. This paper provides mathematical insights into how neural networks with ReLU activation functions learn from data through gradient descent. The research reveals that learning occurs through progressive simplification of data clusters at varying rates. This theoretical work has practical implications for optimizing neural network training and could help develop more efficient and interpretable AI systems.

Authors:  Thomas Chen

Link:  https://arxiv.org/abs/2501.07400v1

Date: 2025-01-13

Summary:

We derive explicit equations governing the cumulative biases and weights in Deep Learning with ReLU activation function, based on gradient descent for the Euclidean cost in the input layer, and under the assumption that the weights are, in a precise sense, adapted to the coordinate system distinguished by the activations. We show that gradient descent corresponds to a dynamical process in the input layer, whereby clusters of data are progressively reduced in complexity ("truncated") at an exponential rate that increases with the number of data points that have already been truncated. We provide a detailed discussion of several types of solutions to the gradient flow equations. A main motivation for this work is to shed light on the interpretability question in supervised learning.
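
For readers new to the framing: as the gradient descent step size tends to zero, training is described by a gradient flow. A generic statement (our notation, not the paper's specific effective equations for cumulative biases and weights) is:

```latex
% Generic gradient flow limit of gradient descent on a Euclidean cost;
% f_\theta is the network and (x_j, y_j) the training pairs.
\frac{d\theta(t)}{dt} \;=\; -\,\nabla_{\theta}\,\mathcal{C}[\theta(t)],
\qquad
\mathcal{C}[\theta] \;=\; \frac{1}{2N}\sum_{j=1}^{N}
\bigl\lVert f_{\theta}(x_j) - y_j \bigr\rVert^{2}
```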

--------------------------------------------------------------------------------------------------------

Aligning Instruction Tuning with Pre-training

Large Language Models' performance depends heavily on the quality and diversity of their training data. This paper addresses a critical gap between pre-training and instruction tuning by proposing AITP, a method to better align these phases. By identifying and filling coverage gaps in instruction-tuning datasets, the approach improves model performance across various tasks. This research could enhance the development of more capable and generalized AI systems while making better use of pre-trained knowledge.

Authors:  Yiming Liang, Tianyu Zheng, Xinrun Du, Ge Zhang, Xingwei Qu, Xiang Yue, Chujie Zheng, Jiaheng Liu, Lei Ma, Wenhu Chen, Guoyin Wang, Zhaoxiang Zhang, Wenhao Huang, Jiajun Zhang

Link:  https://arxiv.org/abs/2501.09368v2

Date: 2025-01-17

Summary:

Instruction tuning enhances large language models (LLMs) to follow human instructions across diverse tasks, relying on high-quality datasets to guide behavior. However, these datasets, whether manually curated or synthetically generated, are often narrowly focused and misaligned with the broad distributions captured during pre-training, limiting LLM generalization and effective use of pre-trained knowledge. We propose "Aligning Instruction Tuning with Pre-training" (AITP), a method that bridges this gap by identifying coverage shortfalls in instruction-tuning datasets and rewriting underrepresented pre-training data into high-quality instruction-response pairs. This approach enriches dataset diversity while preserving task-specific objectives. Evaluations on three fully open LLMs across eight benchmarks demonstrate consistent performance improvements with AITP. Ablations highlight the benefits of adaptive data selection, controlled rewriting, and balanced integration, emphasizing the importance of aligning instruction tuning with pre-training distributions to unlock the full potential of LLMs.
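
The coverage-shortfall idea can be sketched in embedding space. The function below is our illustration of the concept only; AITP's actual selection and rewriting procedure may differ.

```python
import numpy as np

def coverage_shortfall(pretrain_emb, sft_emb):
    """Distance from each pre-training sample to its nearest instruction-tuning
    example; large values flag regions the SFT data fails to cover.

    pretrain_emb: (n, d) embeddings of pre-training documents
    sft_emb:      (m, d) embeddings of instruction-tuning examples
    """
    d = np.linalg.norm(pretrain_emb[:, None, :] - sft_emb[None, :, :], axis=-1)
    return d.min(axis=1)  # one shortfall score per pre-training sample

# The highest-scoring samples are candidates to be rewritten by an LLM into
# instruction-response pairs and merged back into the tuning set.
```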

--------------------------------------------------------------------------------------------------------

Exploring the Efficacy of Meta-Learning: Unveiling Superior Data Diversity Utilization of MAML Over Pre-training

While the AI field focuses heavily on data and model size, this study examines how dataset diversity affects model performance. The research compares pre-training and meta-learning approaches across various visual datasets, finding significant correlations between data diversity and model accuracy. The findings suggest that model-agnostic meta-learning (MAML) better utilizes diverse data than traditional pre-training methods. This has implications for efficient model training and dataset curation in computer vision applications.

Authors:  Kavita Selva, Satita Vittayaareekul, Brando Miranda

Link:  https://arxiv.org/abs/2501.08506v1

Date: 2025-01-15

Summary:

Currently, data and model size dominate the narrative in the training of super-large, powerful models. However, there has been a lack of exploration on the effect of other attributes of the training dataset on model performance. We hypothesize that dataset diversity can impact the performance of vision models. Our study shows positive correlations between test set accuracy and data diversity, providing an argument for furthering the research of dataset attributes beyond size. We analyzed pre-training and model-agnostic meta-learning methods on twelve popular visual datasets (e.g., Omniglot, CIFAR-FS, Aircraft) and five model configurations, including MAML variants with different numbers of inner gradient steps and supervised learning. We show moderate to strong positive correlations (R-squared: 0.15-0.42) between accuracy and data diversity and weaker but significant correlations (R-squared: ~0.2) between loss and diversity. These findings support our hypothesis and demonstrate a promising way for a deeper exploration of how formal data diversity influences model performance. This initial study highlights the potential of (Task2Vec) data diversity as a valuable measure in the rapidly evolving field of large-scale learning and emphasizes that understanding the dataset is key to building more powerful and generalizable models.
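
The reported R-squared values come from regressing accuracy on diversity. A minimal sketch of that computation follows; the numbers are made up for illustration, not the paper's measurements.

```python
import numpy as np
from scipy import stats

# Hypothetical per-dataset pairs: Task2Vec diversity vs. test accuracy.
diversity = np.array([0.11, 0.18, 0.23, 0.30, 0.34, 0.41])
accuracy  = np.array([0.52, 0.55, 0.61, 0.63, 0.68, 0.70])

res = stats.linregress(diversity, accuracy)
print(f"R-squared: {res.rvalue**2:.2f}")  # strength of the linear relationship
```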

--------------------------------------------------------------------------------------------------------

Do generative video models learn physical principles from watching videos?

As AI video generation advances rapidly, this paper addresses a fundamental question: do these models truly understand physics, or are they just sophisticated pattern matchers? The researchers developed Physics-IQ, a benchmark testing physical understanding across various principles like fluid dynamics and optics. Testing leading models including Sora and Lumiere reveals limited physical understanding despite visual realism. This research has implications for developing more physically accurate AI systems for applications in simulation, robotics, and virtual environments.

Authors:  Saman Motamed, Laura Culp, Kevin Swersky, Priyank Jaini, Robert Geirhos

Link:  https://arxiv.org/abs/2501.09038v1

Date: 2025-01-14

Summary:

AI video generation is undergoing a revolution, with quality and realism advancing rapidly. These advances have led to a passionate scientific debate: Do video models learn "world models" that discover laws of physics -- or, alternatively, are they merely sophisticated pixel predictors that achieve visual realism without understanding the physical principles of reality? We address this question by developing Physics-IQ, a comprehensive benchmark dataset that can only be solved by acquiring a deep understanding of various physical principles, like fluid dynamics, optics, solid mechanics, magnetism and thermodynamics. We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited, and unrelated to visual realism. At the same time, some test cases can already be successfully solved. This indicates that acquiring certain physical principles from observation alone may be possible, but significant challenges remain. While we expect rapid advances ahead, our work demonstrates that visual realism does not imply physical understanding. Our project page is at https://physics-iq.github.io; code at https://github.com/google-deepmind/physics-IQ-benchmark.

--------------------------------------------------------------------------------------------------------

Explore the Use of Time Series Foundation Model for Car-Following Behavior Analysis

Car-following behavior modeling is crucial for traffic simulation and autonomous vehicle development. This paper applies the Chronos foundation model to analyze car-following behavior, demonstrating superior performance compared to traditional approaches. By leveraging pre-trained knowledge from diverse time series data, the model achieves better accuracy with minimal fine-tuning. This research has practical applications in traffic management, autonomous vehicle development, and transportation system design.

Authors:  Luwei Zeng, Runze Yan

Link:  https://arxiv.org/abs/2501.07034v1

Date: 2025-01-13

Summary:

Modeling car-following behavior is essential for traffic simulation, analyzing driving patterns, and understanding complex traffic flows with varying levels of autonomous vehicles. Traditional models like the Safe Distance Model and Intelligent Driver Model (IDM) require precise parameter calibration and often lack generality due to simplified assumptions about driver behavior. While machine learning and deep learning methods capture complex patterns, they require large labeled datasets. Foundation models provide a more efficient alternative. Pre-trained on vast, diverse time series datasets, they can be applied directly to various tasks without the need for extensive re-training. These models generalize well across domains, and with minimal fine-tuning, they can be adapted to specific tasks like car-following behavior prediction. In this paper, we apply Chronos, a state-of-the-art public time series foundation model, to analyze car-following behavior using the Open ACC dataset. Without fine-tuning, Chronos outperforms traditional models like IDM and Exponential smoothing with trend and seasonality (ETS), and achieves similar results to deep learning models such as DeepAR and TFT, with an RMSE of 0.60. After fine-tuning, Chronos reduces the error to an RMSE of 0.53, representing a 33.75% improvement over IDM and a 12-37% reduction compared to machine learning models like ETS and deep learning models including DeepAR, WaveNet, and TFT. This demonstrates the potential of foundation models to significantly advance transportation research, offering a scalable, adaptable, and highly accurate approach to predicting and simulating car-following behaviors.
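
A minimal zero-shot sketch, assuming the open-source chronos-forecasting package (pip install chronos-forecasting); the speed values are invented and the exact preprocessing in the paper may differ.

```python
import torch
from chronos import ChronosPipeline

# Load a pre-trained Chronos checkpoint and forecast a follower vehicle's
# speed trace with no fine-tuning.
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small", torch_dtype=torch.float32
)
speed_history = torch.tensor([14.2, 14.0, 13.8, 13.9, 14.1, 14.4, 14.3])
samples = pipeline.predict(speed_history, prediction_length=5)
# samples: [num_series, num_samples, prediction_length]; take per-step medians.
median_forecast = samples.float().quantile(0.5, dim=1)
print(median_forecast)
```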

--------------------------------------------------------------------------------------------------------

Reducing the Sensitivity of Neural Physics Simulators to Mesh Topology via Pretraining

Neural physics simulators offer promising speed advantages but can be sensitive to variations in mesh representation. This paper demonstrates how pretraining can reduce this sensitivity, making neural network simulators more robust to mesh topology changes. The research has practical applications in radar sensing, aerodynamics simulation, and other fields requiring high-fidelity physics simulations on complex geometries.

Authors:  Nathan Vaska, Justin Goodwin, Robin Walters, Rajmonda S. Caceres

Link:  https://arxiv.org/abs/2501.09597v1

Date: 2025-01-16

Summary:

Meshes are used to represent complex objects in high fidelity physics simulators across a variety of domains, such as radar sensing and aerodynamics. There is growing interest in using neural networks to accelerate physics simulations, and also a growing body of work on applying neural networks directly to irregular mesh data. Since multiple mesh topologies can represent the same object, mesh augmentation is typically required to handle topological variation when training neural networks. Due to the sensitivity of physics simulators to small changes in mesh shape, it is challenging to use these augmentations when training neural network-based physics simulators. In this work, we show that variations in mesh topology can significantly reduce the performance of neural network simulators. We evaluate whether pretraining can be used to address this issue, and find that employing an established autoencoder pretraining technique with graph embedding models reduces the sensitivity of neural network simulators to variations in mesh topology. Finally, we highlight future research directions that may further reduce neural simulator sensitivity to mesh topology.
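
A compact two-phase sketch of the pretraining recipe, under heavy assumptions: the MLPs below stand in for the paper's graph embedding models, and the random tensors stand in for topology-augmented meshes.

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 32))
dec = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam([*enc.parameters(), *dec.parameters()], lr=1e-3)

# Phase 1: autoencoder pretraining across (stand-in) topology-augmented
# meshes, so the encoder learns topology-insensitive node representations.
for nodes in [torch.randn(100, 3) for _ in range(10)]:  # per-node coordinates
    recon = dec(enc(nodes))
    loss = nn.functional.mse_loss(recon, nodes)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2 (not shown): discard dec, attach a simulation head to enc, and
# fine-tune on the physics-prediction task.
```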

--------------------------------------------------------------------------------------------------------

ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind

Theory of Mind (ToM) benchmarking for AI systems has traditionally been limited in scope and realism. This paper introduces ToMATO, a novel benchmark that evaluates AI systems' ability to understand diverse mental states through LLM-LLM conversations. By having AI systems verbalize their thoughts during role-playing interactions, the benchmark tests understanding of beliefs, intentions, desires, emotions, and knowledge. This research has implications for developing more socially aware AI systems for applications in customer service, education, and human-AI interaction.

Authors:  Kazutoshi Shinoda, Nobukatsu Hojo, Kyosuke Nishida, Saki Mizuno, Keita Suzuki, Ryo Masumura, Hiroaki Sugiyama, Kuniko Saito

Link:  https://arxiv.org/abs/2501.08838v1

Date: 2025-01-15

Summary:

Existing Theory of Mind (ToM) benchmarks diverge from real-world scenarios in three aspects: 1) they assess a limited range of mental states such as beliefs, 2) false beliefs are not comprehensively explored, and 3) the diverse personality traits of characters are overlooked. To address these challenges, we introduce ToMATO, a new ToM benchmark formulated as multiple-choice QA over conversations. ToMATO is generated via LLM-LLM conversations featuring information asymmetry. By employing a prompting method that requires role-playing LLMs to verbalize their thoughts before each utterance, we capture both first- and second-order mental states across five categories: belief, intention, desire, emotion, and knowledge. These verbalized thoughts serve as answers to questions designed to assess the mental states of characters within conversations. Furthermore, the information asymmetry introduced by hiding thoughts from others induces the generation of false beliefs about various mental states. Assigning distinct personality traits to LLMs further diversifies both utterances and thoughts. ToMATO consists of 5.4k questions, 753 conversations, and 15 personality trait patterns. Our analysis shows that this dataset construction approach frequently generates false beliefs due to the information asymmetry between role-playing LLMs, and effectively reflects diverse personalities. We evaluate nine LLMs on ToMATO and find that even GPT-4o mini lags behind human performance, especially in understanding false beliefs, and lacks robustness to various personality traits.
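
The generation mechanism hinges on a prompt that forces each role-playing LLM to verbalize a private thought before speaking. The template below is a hypothetical illustration of that pattern, not the paper's actual prompt.

```python
ROLE_PROMPT = """You are {persona_name}, whose personality traits are: {traits}.
You are having a conversation with {partner_name}.
Before every utterance, first write your private thought on one line:
  [THOUGHT] I believe/intend/want/feel/know ...
Your thoughts are hidden from {partner_name}. Then write your utterance."""
```

Because the thoughts stay hidden from the other speaker, the dialogue naturally produces information asymmetry, which is what induces the false beliefs the benchmark tests for.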

--------------------------------------------------------------------------------------------------------

BBPOS: BERT-based Part-of-Speech Tagging for Uzbek

Advancing natural language processing for low-resource languages is crucial for digital inclusion. This paper evaluates BERT models for Uzbek part-of-speech tagging and introduces the first public UPOS-tagged dataset for the language. The research demonstrates that fine-tuned models can effectively capture complex linguistic features and context. This work has practical applications in machine translation, text analysis, and language technology development for Uzbek speakers.

Authors:  Latofat Bobojonova, Arofat Akhundjanova, Phil Ostheimer, Sophie Fellenz

Link:  https://arxiv.org/abs/2501.10107v1

Date: 2025-01-17

Summary:

This paper advances NLP research for the low-resource Uzbek language by evaluating two previously untested monolingual Uzbek BERT models on the part-of-speech (POS) tagging task and introducing the first publicly available UPOS-tagged benchmark dataset for Uzbek. Our fine-tuned models achieve 91% average accuracy, outperforming the baseline multi-lingual BERT as well as the rule-based tagger. Notably, these models capture intermediate POS changes through affixes and demonstrate context sensitivity, unlike existing rule-based taggers.
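
For readers unfamiliar with the task, here is a minimal token-classification setup with Hugging Face Transformers. The checkpoint name is a generic multilingual placeholder (the paper's monolingual Uzbek BERT models may be published under different identifiers), fine-tuning is omitted, and 17 is the size of the Universal Dependencies UPOS tag set.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

name = "bert-base-multilingual-cased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=17)

tokens = tokenizer("Men kitob o'qiyapman", return_tensors="pt")
with torch.no_grad():
    logits = model(**tokens).logits
pred_ids = logits.argmax(dim=-1)  # one UPOS tag id per subword token
# A real system fine-tunes on the UPOS-tagged dataset before predicting.
```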

--------------------------------------------------------------------------------------------------------

ANSR-DT: An Adaptive Neuro-Symbolic Learning and Reasoning Framework for Digital Twins

Digital twins require sophisticated AI systems that can adapt and learn in real-time. This paper presents ANSR-DT, a framework combining pattern recognition, reinforcement learning, and symbolic reasoning for digital twin applications. The approach enables better decision-making in human-machine collaboration scenarios. This research has applications in industrial automation, smart manufacturing, and systems monitoring where real-time adaptation and human interaction are crucial.

Authors:  Safayat Bin Hakim, Muhammad Adil, Alvaro Velasquez, Houbing Herbert Song

Link:  https://arxiv.org/abs/2501.08561v1

Date: 2025-01-15

Summary:

In this paper, we propose an Adaptive Neuro-Symbolic Learning Framework for digital twin technology called ``ANSR-DT." Our approach combines pattern recognition algorithms with reinforcement learning and symbolic reasoning to enable real-time learning and adaptive intelligence. This integration enhances the understanding of the environment and promotes continuous learning, leading to better and more effective decision-making in real-time for applications that require human-machine collaboration. We evaluated the \textit{ANSR-DT} framework for its ability to learn and adapt to dynamic patterns, observing significant improvements in decision accuracy, reliability, and interpretability when compared to existing state-of-the-art methods. However, challenges still exist in extracting and integrating symbolic rules in complex environments, which limits the full potential of our framework in heterogeneous settings. Moreover, our ongoing research aims to address this issue in the future by ensuring seamless integration of neural models at large. In addition, our open-source implementation promotes reproducibility and encourages future research to build on our foundational work.

--------------------------------------------------------------------------------------------------------

A Comparative Analysis of DNN-based White-Box Explainable AI Methods in Network Security

Network security increasingly relies on AI-based intrusion detection, but these systems need to be explainable for practical use. This paper evaluates white-box explainable AI techniques for network intrusion detection, comparing them with black-box methods across multiple datasets. The research provides insights into which explanation methods are most suitable for security applications. This work has immediate applications in cybersecurity, network monitoring, and security analyst training.

Authors:  Osvaldo Arreche, Mustafa Abdallah

Link:  https://arxiv.org/abs/2501.07801v1

Date: 2025-01-14

Summary:

A growing body of research focuses on creating artificial intelligence (AI) solutions for network intrusion detection systems (NIDS), motivated by the ever-growing number and complexity of intrusions on networked systems. Hence, the use of explainable AI (XAI) techniques in real-world intrusion detection systems stems from the need for security analysts to comprehend and scrutinize black-box AI models. In an effort to meet such requirements, this paper focuses on applying and evaluating white-box XAI techniques (particularly LRP, IG, and DeepLift) for NIDS via an end-to-end framework for neural network models, using three widely used network intrusion datasets (NSL-KDD, CICIDS-2017, and RoEduNet-SIMARGL2021), assessing their global and local scopes, and examining six distinct assessment measures (descriptive accuracy, sparsity, stability, robustness, efficiency, and completeness). We also compare the performance of white-box XAI methods with black-box XAI methods. The results show that white-box XAI techniques score high in robustness and completeness, which are crucial metrics for IDS. Moreover, the source code for our XAI evaluation framework is available to be improved and used by the research community.
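
Two of the three evaluated techniques are available in Captum; a minimal sketch follows, assuming `model` is a trained PyTorch NIDS classifier and `x` is a batch of flow-feature vectors (both stand-ins; Captum also ships an LRP implementation).

```python
from captum.attr import DeepLift, IntegratedGradients

# Attribute each flow feature's contribution toward the "attack" class (target=1).
ig = IntegratedGradients(model)
attr_ig = ig.attribute(x, target=1, n_steps=50)

dl = DeepLift(model)
attr_dl = dl.attribute(x, target=1)
# Per-feature attributions let analysts see which flow statistics drove an alert.
```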

--------------------------------------------------------------------------------------------------------

Optimization of Link Configuration for Satellite Communication Using Reinforcement Learning

Satellite communication systems face complex challenges in optimizing link configurations on transponders. This paper compares reinforcement learning (PPO algorithm) with traditional metaheuristic methods for optimizing satellite link configuration. While simulated annealing currently outperforms RL, the research highlights potential future applications of AI in satellite communications. This work has implications for improving satellite network efficiency and resource allocation.

Authors:  Tobias Rohe, Michael Kölle, Jan Matheis, Rüdiger Höpfl, Leo Sünkel, Claudia Linnhoff-Popien

Link:  https://arxiv.org/abs/2501.08220v1

Date: 2025-01-14

Summary:

Satellite communication is a key technology in our modern connected world. With increasingly complex hardware, one challenge is to efficiently configure links (connections) on a satellite transponder. Planning an optimal link configuration is extremely complex and depends on many parameters and metrics. The optimal use of the transponder's limited bandwidth and power is crucial. Such an optimization problem can be approximated using metaheuristic methods such as simulated annealing, but recent research results also show that reinforcement learning can achieve comparable or even better performance on such optimization problems. However, there have not yet been any studies on link configuration on satellite transponders. To close this research gap, a transponder environment was developed as part of this work. In this environment, the performance of the reinforcement learning algorithm PPO was compared with the metaheuristic simulated annealing in two experiments. The results show that simulated annealing delivers better results for this static problem than the PPO algorithm; however, the research also underlines the potential of reinforcement learning for optimization problems.
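
For reference, here is a generic simulated annealing loop of the kind used as the baseline; the energy and neighbor functions are placeholders for a transponder-specific cost (e.g., unserved demand under bandwidth and power limits) and move set, which the paper defines in its environment.

```python
import math
import random

def simulated_annealing(init_cfg, energy, neighbor, t0=1.0, cooling=0.995,
                        steps=5000):
    """Generic simulated annealing over link configurations (sketch)."""
    cfg, e = init_cfg, energy(init_cfg)
    t = t0
    for _ in range(steps):
        cand = neighbor(cfg)          # e.g., move one link to another slot
        e_cand = energy(cand)
        # Always accept improvements; accept worse moves with Boltzmann prob.
        if e_cand < e or random.random() < math.exp((e - e_cand) / t):
            cfg, e = cand, e_cand
        t *= cooling                  # geometric cooling schedule
    return cfg, e
```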

--------------------------------------------------------------------------------------------------------

Assessment of Personalized Learning in Immersive and Intelligent Virtual Classroom on Student Engagement

Virtual classrooms are evolving to provide more personalized learning experiences. This paper explores using eye-tracking technology to evaluate student engagement in intelligent virtual environments. The research examines how personalized approaches affect student participation and performance. This work has direct applications in online education, educational technology development, and learning analytics.

Authors:  Ying Weng, Yiming Zhang

Link:  https://arxiv.org/abs/2501.07883v1

Date: 2025-01-14

Summary:

As trends in education evolve, personalized learning has transformed individuals' engagement with knowledge and skill development. In the digital age, state-of-the-art technologies have been increasingly integrated into classrooms to support intelligent education and foster personalized learning experiences. One promising approach is the use of eye-tracking technology to evaluate student engagement in intelligent virtual classrooms. This paper explores the assessment of personalized learning in the virtual classroom and its impact on student engagement through the eye movement paradigm. The study aims to provide insights into how personalized learning approaches can enhance student participation, motivation, and academic performance in the online learning environment. Through a comprehensive literature review, case study, and data analysis, the paper examines the key elements of personalized learning, the methods of assessment, and the resulting effects on student engagement. The findings suggest that the eye movement paradigm has the potential to assess student engagement and promote better educational outcomes.

--------------------------------------------------------------------------------------------------------

Logic Meets Magic: LLMs Cracking Smart Contract Vulnerabilities

Smart contract vulnerabilities have caused significant financial losses in blockchain applications. This paper evaluates the effectiveness of various Large Language Models in detecting vulnerabilities in Solidity v0.8 smart contracts. The research reveals both the potential and limitations of using LLMs for security analysis. This work has immediate applications in blockchain security, smart contract development, and automated code auditing.

Authors:  ZeKe Xiao, Qin Wang, Hammond Pearce, Shiping Chen

Link:  https://arxiv.org/abs/2501.07058v1

Date: 2025-01-13

Summary:

Smart contract vulnerabilities have caused significant economic losses in blockchain applications. Large Language Models (LLMs) provide new possibilities for addressing this time-consuming task. However, state-of-the-art LLM-based detection solutions are often plagued by high false-positive rates.

In this paper, we push the boundaries of existing research in two key ways. First, our evaluation is based on Solidity v0.8, offering the most up-to-date insights compared to prior studies that focus on older versions (v0.4). Second, we leverage the latest five LLM models (across companies), ensuring comprehensive coverage of the most advanced capabilities in the field.

We conducted a series of rigorous evaluations. Our experiments demonstrate that a well-designed prompt can reduce the false-positive rate by over 60%. Surprisingly, we also discovered that the recall rate for detecting some specific vulnerabilities in Solidity v0.8 has dropped to just 13% compared to earlier versions (i.e., v0.4). Further analysis reveals the root cause of this decline: the reliance of LLMs on identifying changes in newly introduced libraries and frameworks during detection.
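
The paper's exact prompts are not reproduced here, but a hypothetical illustration of the idea, constraining the model with version-specific facts to suppress stale findings, might look like this:

```python
AUDIT_PROMPT = """You are auditing a Solidity v0.8 smart contract.
Solidity >= 0.8 reverts on arithmetic overflow/underflow by default, so do
NOT report overflow unless the code uses an `unchecked` block.
Report a finding only if you can cite the exact function and line and
explain a concrete exploit path; otherwise answer "no finding".

Contract source:
{source_code}
"""
```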

--------------------------------------------------------------------------------------------------------

CyberMentor: AI Powered Learning Tool Platform to Address Diverse Student Needs in Cybersecurity Education

Non-traditional cybersecurity students often lack adequate support and guidance. This paper introduces CyberMentor, an AI-powered platform providing personalized support for cybersecurity education. Using agentic workflow and LLMs, the system offers contextual guidance and career advice. This research has applications in educational support systems, career development, and personalized learning environments.

Authors:  Tianyu Wang, Nianjun Zhou, Zhixiong Chen

Link:  https://arxiv.org/abs/2501.09709v1

Date: 2025-01-16

Summary:

Many non-traditional students in cybersecurity programs often lack access to advice from peers, family members and professors, which can hinder their educational experiences. Additionally, these students may not fully benefit from various LLM-powered AI assistants due to issues like content relevance, locality of advice, minimum expertise, and timing. This paper addresses these challenges by introducing an application designed to provide comprehensive support by answering questions related to knowledge, skills, and career preparation advice tailored to the needs of these students. We developed a learning tool platform, CyberMentor, to address the diverse needs and pain points of students majoring in cybersecurity. Powered by an agentic workflow and generative large language models (LLMs), the platform leverages Retrieval-Augmented Generation (RAG) for accurate and contextually relevant information retrieval to achieve accessibility and personalization. We demonstrated its value in addressing knowledge requirements for cybersecurity education and career marketability, in tackling skill requirements for analytical and programming assignments, and in delivering real-time, on-demand learning support. Using three use scenarios, we showcased CyberMentor in facilitating knowledge acquisition and career preparation and providing seamless skill-based guidance and support. We also employed the LangChain prompt-based evaluation methodology to evaluate the platform's impact, confirming its strong performance in helpfulness, correctness, and completeness. These results underscore the system's ability to support students in developing practical cybersecurity skills while improving equity and sustainability within higher education. Furthermore, CyberMentor's open-source design allows for adaptation across other disciplines, fostering educational innovation and broadening its potential impact.
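
For readers new to RAG, the core retrieval step can be sketched generically as below; this is our simplification (CyberMentor's actual stack builds on LangChain), and the embedding model and corpus are assumed to exist.

```python
import numpy as np

def retrieve(query_emb, doc_embs, docs, k=3):
    """Cosine-similarity retrieval over an in-memory knowledge base (sketch)."""
    sims = doc_embs @ query_emb / (
        np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb))
    return [docs[i] for i in np.argsort(-sims)[:k]]

# The retrieved passages are prepended to the LLM prompt so that answers stay
# grounded in the course's own material rather than the model's priors.
```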

--------------------------------------------------------------------------------------------------------

Reward-Guided Controlled Generation for Inference-Time Alignment in Diffusion Models: Tutorial and Review

Diffusion models excel at generation tasks but often need optimization for specific metrics. This tutorial explores methods for guiding diffusion models to maximize desired outcomes during inference, without requiring retraining. The research unifies various guidance techniques under a common framework. This work has applications in protein design, drug discovery, and other fields requiring controlled generation of complex structures.

Authors:  Masatoshi Uehara, Yulai Zhao, Chenyu Wang, Xiner Li, Aviv Regev, Sergey Levine, Tommaso Biancalani

Link:  https://arxiv.org/abs/2501.09685v1

Date: 2025-01-16

Summary:

This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. While diffusion models are renowned for their generative modeling capabilities, practical applications in fields such as biology often require sample generation that maximizes specific metrics (e.g., stability, affinity in proteins, closeness to target structures). In these scenarios, diffusion models can be adapted not only to generate realistic samples but also to explicitly maximize desired measures at inference time without fine-tuning. This tutorial explores the foundational aspects of such inference-time algorithms. We review these methods from a unified perspective, demonstrating that current techniques -- such as Sequential Monte Carlo (SMC)-based guidance, value-based sampling, and classifier guidance -- aim to approximate soft optimal denoising processes (a.k.a. policies in RL) that combine pre-trained denoising processes with value functions serving as look-ahead functions that predict from intermediate states to terminal rewards. Within this framework, we present several novel algorithms not yet covered in the literature. Furthermore, we discuss (1) fine-tuning methods combined with inference-time techniques, (2) inference-time algorithms based on search algorithms such as Monte Carlo tree search, which have received limited attention in current research, and (3) connections between inference-time algorithms in language models and diffusion models. The code of this tutorial on protein design is available at https://github.com/masa-ue/AlignInversePro
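
The shared skeleton of these methods, using a value function as a look-ahead on terminal reward, can be sketched as a single guided denoising step. This is a classifier-guidance-style simplification under our own notation (noise schedules and sampling omitted), not a complete algorithm from the tutorial.

```python
import torch

def guided_step(x_t, t, denoiser, value_fn, scale=1.0):
    """One reward-guided denoising step (sketch).

    denoiser(x_t, t): pre-trained model's predicted mean for x_{t-1}
    value_fn(x_t, t): look-ahead estimate of terminal reward from x_t
    """
    mean = denoiser(x_t, t)
    x = x_t.detach().requires_grad_(True)
    grad = torch.autograd.grad(value_fn(x, t).sum(), x)[0]
    return mean + scale * grad  # shift sampling toward high-reward regions
```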

--------------------------------------------------------------------------------------------------------

ADAGE: A generic two-layer framework for adaptive agent based modelling

Agent-based models (ABMs) traditionally struggle with adapting to environmental changes. This paper presents ADAGE, a framework for creating adaptive agent-based models that addresses both agent behavior and environmental adaptation. The approach formalizes these interactions as a Stackelberg game. This research has applications in economic modeling, policy design, and complex system simulation where adaptive behavior is crucial.

Authors:  Benjamin Patrick Evans, Sihan Zeng, Sumitra Ganesh, Leo Ardon

Link:  https://arxiv.org/abs/2501.09429v1

Date: 2025-01-16

Summary:

Agent-based models (ABMs) are valuable for modelling complex, potentially out-of-equilibrium scenarios. However, ABMs have long suffered from the Lucas critique, which states that agent behaviour should adapt to environmental changes. Furthermore, the environment itself often adapts to these behavioural changes, creating a complex bi-level adaptation problem. Recent progress integrating multi-agent reinforcement learning into ABMs introduces adaptive agent behaviour, beginning to address the first part of this critique; however, the approaches are still relatively ad hoc, lacking a general formulation, and do not tackle the second aspect of simultaneously adapting environment-level characteristics in addition to the agent behaviours. In this work, we develop a generic two-layer framework for ADaptive AGEnt based modelling (ADAGE) to address these problems. This framework formalises the bi-level problem as a Stackelberg game with conditional behavioural policies, providing a consolidated framework for adaptive agent-based modelling based on solving a coupled set of non-linear equations. We demonstrate how this generic approach encapsulates several common (previously viewed as distinct) ABM tasks, such as policy design, calibration, scenario generation, and robust behavioural learning, under one unified framework. We provide example simulations on multiple complex economic and financial environments, showing the strength of the novel framework under these canonical settings and addressing long-standing critiques of traditional ABMs.
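
The bi-level structure can be written compactly as a Stackelberg problem. The notation below is ours, as a sketch of the idea rather than the paper's exact formulation: the environment (leader) sets characteristics psi, and agents (followers) best-respond with policies conditioned on psi.

```latex
\max_{\psi}\; F\bigl(\psi,\,\theta^{*}(\psi)\bigr)
\quad\text{s.t.}\quad
\theta^{*}(\psi)\;\in\;\arg\max_{\theta}\;
\mathbb{E}_{a\sim\pi_{\theta}(\cdot\mid\psi)}\bigl[G(\psi,a)\bigr]
```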

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.