Eye On AI

Week Ending 12.31.2023

RESEARCH WATCH: 12.31.2023

Out-of-equilibrium interactions and collective locomotion of colloidal spheres with squirming of nematoelastic multipoles

The paper on colloidal spheres provides insights into how out-of-equilibrium interactions can lead to emergent collective behavior, with potential applications in cargo transport and nanoscale assembly.

Authors:  Bohdan Senyuk, Jin-Sheng Wu, Ivan I. Smalyukh

Link:  https://arxiv.org/abs/2312.17470v1

Date: 2023-12-29

Summary:

Many living and artificial systems show a similar emergent behavior and collective motions on different scales, starting from swarms of bacteria to synthetic active particles, herds of mammals and crowds of people. What all these systems often have in common is that new collective properties like flocking emerge from interactions between individual self-propelled or externally driven units. Such systems are naturally out-of-equilibrium and propel at the expense of consumed energy. Mimicking nature by making self-propelled or externally driven particles and studying their individual and collective motility may allow for deeper understanding of physical underpinnings behind the collective motion of large groups of interacting objects or beings. Here, using a soft matter system of colloids immersed into a liquid crystal, we show that resulting so-called nematoelastic multipoles can be set into a bidirectional locomotion by external periodically oscillating electric fields. Out-of-equilibrium elastic interactions between such colloids lead to collective flock-like behaviors, which emerge from time-varying elasticity-mediated interactions between externally driven propelling particles. The repulsive elastic interactions in the equilibrium state can be turned into attractive interactions in the out-of-equilibrium state under applied electric fields. We probe this behavior at different number densities of colloidal particles and show that particles in a dense dispersion collectively select the same direction of a coherent motion due to elastic interactions between near neighbors. In our experimentally implemented design, their motion is highly ordered and without clustering or jamming often present in other colloidal transport systems, which is promising for technological and fundamental-science applications, like nano-cargo transport, out-of-equilibrium assembly and microrobotics.

--------------------------------------------------------------------------------------------------------

The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

The paper on automatic speech recognition summarizes a system for in-car audio that achieves significant improvements over baseline performance. This could enable more accurate voice control and interaction in vehicles.

Authors:  Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

Link:  https://arxiv.org/abs/2312.16002v1

Date: 2023-12-26

Summary:

This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition. Our submitted systems for the ICMC-ASR Challenge include multi-channel front-end enhancement and diarization, training data augmentation, and speech recognition modeling with multi-channel branches. Tested on the official Eval1 and Eval2 sets, our best system achieves a relative 34.3% improvement in CER and 56.5% improvement in cpCER, compared to the official baseline system.
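As a reading aid for the "relative improvement" figures above, here is a minimal sketch of how a relative error reduction is conventionally computed. It assumes the usual definition (error reduction divided by the baseline error). The function name and the error rates in the usage line are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_error: float, system_error: float) -> float:
    """Relative error reduction, expressed as a percentage of the baseline error.

    Assumes the conventional definition: (baseline - system) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - system_error) / baseline_error


# Hypothetical error rates chosen only to illustrate the formula:
# going from 20.0% to 15.0% CER is a 25% relative improvement.
example = relative_improvement(20.0, 15.0)
```

Under this definition, a 34.3% relative improvement means the system's CER is about 65.7% of the baseline's CER, whatever the absolute values are.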

--------------------------------------------------------------------------------------------------------

Automatic laminectomy cutting plane planning based on artificial intelligence in robot assisted laminectomy surgery

The paper on surgical path planning demonstrates an AI method for automatic laminectomy planning. If validated, this could assist surgeons and improve outcomes for laminectomy procedures.

Authors:  Zhuofu Li, Yonghong Zhang, Chengxia Wang, Shanshan Liu, Xiongkang Song, Xuquan Ji, Shuai Jiang, Woquan Zhong, Lei Hu, Weishi Li

Link:  https://arxiv.org/abs/2312.17266v1

Date: 2023-12-26

Summary:

Objective: This study aims to use artificial intelligence to realize the automatic planning of laminectomy, and verify the method. Methods: We propose a two-stage approach for automatic laminectomy cutting plane planning. The first stage was the identification of key points. 7 key points were manually marked on each CT image. The Spatial Pyramid Upsampling Network (SPU-Net) algorithm developed by us was used to accurately locate the 7 key points. In the second stage, based on the identification of key points, a personalized coordinate system was generated for each vertebra. Finally, the transverse and longitudinal cutting planes of laminectomy were generated under the coordinate system. The overall effect of planning was evaluated. Results: In the first stage, the average localization error of the SPU-Net algorithm for the seven key points was 0.65mm. In the second stage, a total of 320 transverse cutting planes and 640 longitudinal cutting planes were planned by the algorithm. Among them, the number of horizontal plane planning effects of grade A, B, and C were 318(99.38%), 1(0.31%), and 1(0.31%), respectively. The longitudinal planning effects of grade A, B, and C were 622(97.18%), 1(0.16%), and 17(2.66%), respectively. Conclusions: In this study, we propose a method for automatic surgical path planning of laminectomy based on the localization of key points in CT images. The results showed that the method achieved satisfactory results. More studies are needed to confirm the reliability of this approach in the future.
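The grade percentages in the Results follow directly from the reported counts. The short check below recomputes them with exact decimal arithmetic. The helper name `percent` is an illustrative assumption, and round-half-up to two decimals is an assumed rounding convention, not one stated in the abstract.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(count: int, total: int) -> Decimal:
    """Share of `count` in `total` as a percentage, rounded half-up to 2 decimals."""
    value = Decimal(100 * count) / Decimal(total)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Counts as reported in the abstract: 320 transverse and 640 longitudinal planes.
transverse = {grade: percent(n, 320) for grade, n in {"A": 318, "B": 1, "C": 1}.items()}
longitudinal = {grade: percent(n, 640) for grade, n in {"A": 622, "B": 1, "C": 17}.items()}
```

With this convention the transverse shares come out as 99.38, 0.31 and 0.31, matching the abstract. The longitudinal grade A share is 622/640 = 97.1875%, which rounds to 97.19, so the reported 97.18% looks truncated rather than rounded.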

--------------------------------------------------------------------------------------------------------

MetaScript: Few-Shot Handwritten Chinese Content Generation via Generative Adversarial Networks

The paper on few-shot handwritten Chinese generation proposes an approach to imitate personalized styles, helping preserve individuality in digital communication. It could be applied for stylized fonts or penmanship training.

Authors:  Xiangyuan Xue, Kailing Wang, Jiazi Bu, Qirui Li, Zhiyuan Zhang

Link:  https://arxiv.org/abs/2312.16251v1

Date: 2023-12-25

Summary:

In this work, we propose MetaScript, a novel Chinese content generation system designed to address the diminishing presence of personal handwriting styles in the digital representation of Chinese characters. Our approach harnesses the power of few-shot learning to generate Chinese characters that not only retain the individual's unique handwriting style but also maintain the efficiency of digital typing. Trained on a diverse dataset of handwritten styles, MetaScript is adept at producing high-quality stylistic imitations from minimal style references and standard fonts. Our work demonstrates a practical solution to the challenges of digital typography in preserving the personal touch in written communication, particularly in the context of Chinese script. Notably, our system has demonstrated superior performance in various evaluations, including recognition accuracy, inception score, and Frechet inception distance. At the same time, the training conditions of our model are easy to meet and facilitate generalization to real applications.

--------------------------------------------------------------------------------------------------------

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

The paper on scaling text-to-video generation explores utilizing additional unlabeled video data. This unsupervised approach to increase training data could help improve video generation quality.

Authors:  Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

Link:  https://arxiv.org/abs/2312.15770v1

Date: 2023-12-25

Summary:

Diffusion-based text-to-video generation has witnessed impressive progress in the past year, yet still falls behind text-to-image generation. One of the key reasons is the limited scale of publicly available data (e.g., 10M video-text pairs in WebVid10M vs. 5B image-text pairs in LAION), considering the high cost of video captioning. Instead, it could be far easier to collect unlabeled clips from video platforms like YouTube. Motivated by this, we come up with a novel text-to-video generation framework, termed TF-T2V, which can directly learn with text-free videos. The rationale is to separate the process of text decoding from that of temporal modeling. To this end, we employ a content branch and a motion branch, which are jointly optimized with shared weights. Following such a pipeline, we study the effect of doubling the scale of the training set (i.e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe the performance improvement (FID from 9.67 to 8.19 and FVD from 484 to 441), demonstrating the scalability of our approach. We also find that our model could enjoy a sustained performance gain (FID from 8.19 to 7.64 and FVD from 441 to 366) after reintroducing some text labels for training. Finally, we validate the effectiveness and generalizability of our approach on both native text-to-video generation and compositional video synthesis paradigms. Code and models will be publicly available at https://tf-t2v.github.io/.

--------------------------------------------------------------------------------------------------------

Review of Machine Learning Approaches for Diagnostics and Prognostics of Industrial Systems Using Industrial Open Source Data

The paper reviewing diagnostics and prognostics applies ML to industrial system data. It provides guidelines for PHM, with implications for predictive maintenance and reliability.

Authors:  Hanqi Su, Jay Lee

Link:  https://arxiv.org/abs/2312.16810v1

Date: 2023-12-28

Summary:

In the field of Prognostics and Health Management (PHM), recent years have witnessed a significant surge in the application of machine learning (ML). Despite this growth, the field grapples with a lack of unified guidelines and systematic approaches for effectively implementing these ML techniques, and with a lack of comprehensive analysis of industrial open-source data across varied scenarios. To address these gaps, this paper provides a comprehensive review of machine learning approaches for diagnostics and prognostics of industrial systems using open-source datasets from PHM Data Challenge Competitions held between 2018 and 2023 by PHM Society and IEEE Reliability Society, and summarizes a unified ML framework. This review systematically categorizes and scrutinizes the problems, challenges, methodologies, and advancements demonstrated in these competitions, highlighting the evolving role of both conventional machine learning and deep learning in tackling complex industrial tasks related to detection, diagnosis, assessment, and prognosis. Moreover, this paper delves into the common challenges in PHM data challenge competitions by emphasizing both data-related and model-related issues and summarizes the solutions that have been employed to address these challenges. Finally, we identify key themes and potential directions for future research, providing opportunities and prospects for the further development of ML in PHM.

--------------------------------------------------------------------------------------------------------

SETI at FAST in China

The paper on SETI shares recent progress and future plans for China's search for extraterrestrial intelligence. This provides insights into techniques and discoveries in the search for life beyond Earth.

Authors:  Tong-Jie Zhang, Bo-Lun Huang, Jian-Kang Li, Zhen-Zhao Tao, Xiao-Hang Luan, Zhi-Song Zhang, Yu-Chen Wang

Link:  https://arxiv.org/abs/2312.16847v1

Date: 2023-12-28

Summary:

Since the commencement of the first SETI observation in 2019, China's Search for Extraterrestrial Intelligence program has garnered momentum through domestic support and international collaborations. Several observations targeting exoplanets and nearby stars have been conducted with FAST. In 2023, the introduction of the Far Neighbour Project (FNP) marks a substantial leap forward, driven by the remarkable sensitivity of the FAST telescope and some novel observational techniques. The FNP seeks to methodically detect technosignatures from celestial bodies, including nearby stars, exoplanetary systems, Milky Way globular clusters, and more. This paper provides an overview of the progress achieved by SETI in China and offers insights into the distinct phases comprising the FNP. Additionally, it underscores the significance of this project's advancement and its potential contributions to the field.

--------------------------------------------------------------------------------------------------------

AHAM: Adapt, Help, Ask, Model -- Harvesting LLMs for literature mining

The paper on literature mining presents an approach to improve topic modeling through domain adaptation and language model prompting. This could help researchers analyze large text collections more efficiently.

Authors:  Boshko Koloski, Nada Lavrač, Bojan Cestnik, Senja Pollak, Blaž Škrlj, Andrej Kastrin

Link:  https://arxiv.org/abs/2312.15784v1

Date: 2023-12-25

Summary:

In an era marked by a rapid increase in scientific publications, researchers grapple with the challenge of keeping pace with field-specific advances. We present the 'AHAM' methodology and a metric that guides the domain-specific adaptation of the BERTopic topic modeling framework to improve scientific text analysis. By utilizing the LLaMa2 generative language model, we generate topic definitions via one-shot learning by crafting prompts with the help of domain experts to guide the LLM for literature mining by asking it to model the topic names. For inter-topic similarity evaluation, we leverage metrics from language generation and translation processes to assess lexical and semantic similarity of the generated topics. Our system aims to reduce both the ratio of outlier topics to the total number of topics and the similarity between topic definitions. The methodology has been assessed on a newly gathered corpus of scientific papers on literature-based discovery. Through rigorous evaluation by domain experts, AHAM has been validated as effective in uncovering intriguing and novel insights within broad research areas. We explore the impact of domain adaptation of sentence-transformers for the task of topic modeling using two datasets, each specialized to specific scientific domains within arXiv and medRxiv. We evaluate the impact of data size, the niche of adaptation, and the importance of domain adaptation. Our results suggest a strong interaction between domain adaptation and topic modeling precision in terms of outliers and topic definitions.

--------------------------------------------------------------------------------------------------------

Count What You Want: Exemplar Identification and Few-shot Counting of Human Actions in the Wild

The paper on action counting addresses quantifying human actions from wearable sensors using few-shot exemplars. It could enable personalized fitness tracking apps to count custom exercises.

Authors:  Yifeng Huang, Duc Duy Nguyen, Lam Nguyen, Cuong Pham, Minh Hoai

Link:  https://arxiv.org/abs/2312.17330v1

Date: 2023-12-28

Summary:

This paper addresses the task of counting human actions of interest using sensor data from wearable devices. We propose a novel exemplar-based framework, allowing users to provide exemplars of the actions they want to count by vocalizing predefined sounds "one", "two", and "three". Our method first localizes temporal positions of these utterances from the audio sequence. These positions serve as the basis for identifying exemplars representing the action class of interest. A similarity map is then computed between the exemplars and the entire sensor data sequence, which is further fed into a density estimation module to generate a sequence of estimated density values. Summing these density values provides the final count. To develop and evaluate our approach, we introduce a diverse and realistic dataset consisting of real-world data from 37 subjects and 50 action categories, encompassing both sensor and audio data. The experiments on this dataset demonstrate the viability of the proposed method in counting instances of actions from new classes and subjects that were not part of the training data. On average, the discrepancy between the predicted count and the ground truth value is 7.47, significantly lower than the errors of the frequency-based and transformer-based methods. Our project, code and dataset can be found at https://github.com/cvlab-stonybrook/ExRAC.
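The similarity-map and density-summing steps described above can be illustrated with a toy sketch. This is not the authors' implementation. The paper's density estimation module is learned, whereas here a hand-built density of unit-mass Gaussian bumps stands in for its output, and sliding-window cosine similarity is an assumed choice of similarity measure. All function names and the one-dimensional signal are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_map(signal, exemplar):
    """Compare the exemplar against every window of the same length in the signal."""
    width = len(exemplar)
    return [cosine_similarity(signal[i:i + width], exemplar)
            for i in range(len(signal) - width + 1)]


def unit_mass_density(length, centers, sigma=2.0):
    """Stand-in density: one Gaussian bump per action, each normalized to sum to 1."""
    density = [0.0] * length
    for center in centers:
        bump = [math.exp(-0.5 * ((t - center) / sigma) ** 2) for t in range(length)]
        total = sum(bump)
        for t in range(length):
            density[t] += bump[t] / total
    return density


def count_from_density(density):
    """The final count is the sum of the density values."""
    return sum(density)


exemplar = [0.0, 1.0, 2.0, 1.0, 0.0]      # one repetition of a toy action
signal = exemplar * 3                     # the same action performed three times
sim = similarity_map(signal, exemplar)    # peaks where a window aligns with the exemplar
density = unit_mass_density(len(signal), centers=[2, 7, 12])
estimated_count = count_from_density(density)
```

Because each bump carries unit mass, the density sums to the number of action instances (three here, up to floating-point error) regardless of how wide the bumps are. That property is what makes summation a sensible readout for a density-based counter.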

--------------------------------------------------------------------------------------------------------

LEO Satellite and RIS: Two Keys to Seamless Indoor and Outdoor Localization

The paper on localization proposes combining LEO satellites and intelligent surfaces for seamless indoor/outdoor positioning. This could enable precise location services without infrastructure restrictions.

Authors:  Pinjun Zheng, Xing Liu, Jiguang He, Gonzalo Seco-Granados, Tareq Y. Al-Naffouri

Link:  https://arxiv.org/abs/2312.16946v1

Date: 2023-12-28

Summary:

The contemporary landscape of wireless technology underscores the critical role of precise localization services. Traditional global navigation satellite systems (GNSS)-based solutions, however, fall short when it comes to indoor environments, and existing indoor localization techniques such as electromagnetic fingerprinting methods face challenges of high implementation costs and limited coverage. This article explores an innovative solution that seamlessly blends low Earth orbit (LEO) satellites with reconfigurable intelligent surfaces (RISs), unlocking its potential for realizing uninterrupted indoor and outdoor localization with global coverage. By leveraging the strong signal reception of the LEO satellite signals and capitalizing on the radio environment-reshaping capability of RISs, the integration of these two technologies presents a vision of a future where localization services transcend existing constraints. After a comprehensive review of the distinctive attributes of LEO satellites and RISs, we evaluate the localization error bounds for the proposed collaborative system, showcasing their promising performance on simultaneous indoor and outdoor localization. To conclude, we engage in a discussion on open problems and future research directions for LEO satellite and RIS-enabled localization.

--------------------------------------------------------------------------------------------------------

Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss

The paper on preference optimization presents a new loss function extending binary feedback approaches to pairwise preferences. This could improve training efficiency and performance for aligning language models.

Authors:  Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston

Link:  https://arxiv.org/abs/2312.16682v1

Date: 2023-12-27

Summary:

Practitioners commonly align large language models using pairwise preferences, i.e., given labels of the type response A is preferred to response B for a given input. Perhaps less commonly, methods have also been developed for binary feedback, i.e. training models given labels of type response A is good or bad. We show how an existing performant binary feedback method, the Cringe Loss (Adolphs et al., 2022), can be generalized to the pairwise preference setting using a simple soft margin extension. Pairwise Cringe Loss is straightforward to implement and efficient to train, and we find it outperforms state-of-the-art preference optimization algorithms such as PPO and DPO on the AlpacaFarm benchmark.

--------------------------------------------------------------------------------------------------------

From Text to Multimodal: A Comprehensive Survey of Adversarial Example Generation in Question Answering Systems

The survey on adversarial question answering provides a comprehensive analysis of techniques to evaluate and improve QA robustness. It offers insights to advance security and reliability of conversational AI.

Authors:  Gulsum Yigit, Mehmet Fatih Amasyali

Link:  https://arxiv.org/abs/2312.16156v1

Date: 2023-12-26

Summary:

Integrating adversarial machine learning with Question Answering (QA) systems has emerged as a critical area for understanding the vulnerabilities and robustness of these systems. This article aims to comprehensively review adversarial example-generation techniques in the QA field, including textual and multimodal contexts. We examine the techniques employed through systematic categorization, providing a comprehensive, structured review. Beginning with an overview of traditional QA models, we traverse adversarial example generation by exploring rule-based perturbations and advanced generative models. We then extend our research to include multimodal QA systems, analyze them across various methods, and examine generative models, seq2seq architectures, and hybrid methodologies. Our review also covers defense strategies, adversarial datasets, and evaluation metrics, and surveys the broader literature on adversarial QA. Finally, the paper considers the future landscape of adversarial question generation, highlighting potential research directions that can advance textual and multimodal QA systems in the context of adversarial challenges.

--------------------------------------------------------------------------------------------------------

A proposed new metric for the conceptual diversity of a text

The proposed metric for conceptual diversity offers a standardized approach to quantify diversity of ideas in text. This could help analyze complexity and richness of language across domains.

Authors:  İlknur Dönmez, Mehmet Haklıdır

Link:  https://arxiv.org/abs/2312.16548v1

Date: 2023-12-27

Summary:

A word may contain one or more hidden concepts. While the word "animal" evokes many images in our minds and encapsulates many concepts (birds, dogs, cats, crocodiles, etc.), the word "parrot" evokes a single image (a colored bird with a short, hooked beak and the ability to mimic sounds). In spoken or written texts, we use some words in a general sense and some in a detailed way to point to a specific object. Until now, there has been no standard and precise technique for determining a text's conceptual diversity value. This research contributes to the natural language processing field of AI by offering a standardized method and a generic metric for evaluating and comparing concept diversity in different texts and domains. It also contributes to the field of semantic research of languages. As an example of the diversity scores of two sentences, "He discovered an unknown entity." has a high conceptual diversity score (16.6801), whereas "The endoplasmic reticulum forms a series of flattened sacs within the cytoplasm of eukaryotic cells." has a low conceptual diversity score (3.9068).

--------------------------------------------------------------------------------------------------------

FairCompass: Operationalising Fairness in Machine Learning

The work on operationalizing fairness presents a human-centric auditing system to facilitate responsible machine learning deployment. It addresses key gaps in implementing algorithmic fairness tools.

Authors:  Jessica Liu, Huaming Chen, Jun Shen, Kim-Kwang Raymond Choo

Link:  https://arxiv.org/abs/2312.16726v1

Date: 2023-12-27

Summary:

As artificial intelligence (AI) increasingly becomes an integral part of our societal and individual activities, there is a growing imperative to develop responsible AI solutions. Although a diverse assortment of machine learning fairness solutions has been proposed in the literature, there is reportedly a lack of practical implementation of these tools in real-world applications. Industry experts have participated in thorough discussions on the challenges associated with operationalising fairness in the development of machine learning-empowered solutions, in which a shift toward human-centred approaches is promptly advocated to mitigate the limitations of existing techniques. In this work, we propose a human-in-the-loop approach for fairness auditing, presenting a mixed visual analytical system (hereafter referred to as 'FairCompass'), which integrates both a subgroup discovery technique and a decision tree-based schema for end users. Moreover, we innovatively integrate an Exploration, Guidance and Informed Analysis loop, to facilitate the use of the Knowledge Generation Model for Visual Analytics in FairCompass. We evaluate the effectiveness of FairCompass for fairness auditing in a real-world scenario, and the findings demonstrate the system's potential for real-world deployability. We anticipate this work will address the current gaps in research for fairness and facilitate the operationalisation of fairness in machine learning systems.

--------------------------------------------------------------------------------------------------------

DarkShot: Lighting Dark Images with Low-Compute and High-Quality

The method for low-light image enhancement achieves visually appealing results with minimal computation. It could enable real-time processing for night photography on mobile devices.

Authors:  Jiazhang Zheng, Lei Li, Qiuping Liao, Cheng Li, Li Li, Yangxing Liu

Link:  https://arxiv.org/abs/2312.16805v2

Date: 2023-12-29

Summary:

Nighttime photography encounters escalating challenges in extremely low-light conditions, primarily attributable to the ultra-low signal-to-noise ratio. For real-world deployment, a practical solution must not only produce visually appealing results but also require minimal computation. However, most existing methods are either focused on improving restoration performance or employ lightweight models at the cost of quality. This paper proposes a lightweight network that outperforms existing state-of-the-art (SOTA) methods in low-light enhancement tasks while minimizing computation. The proposed network incorporates Siamese Self-Attention Block (SSAB) and Skip-Channel Attention (SCA) modules, which enhance the model's capacity to aggregate global information and are well-suited for high-resolution images. Additionally, based on our analysis of the low-light image restoration process, we propose a Two-Stage Framework that achieves superior results. Our model can restore a UHD 4K resolution image with minimal computation while keeping SOTA restoration quality.

--------------------------------------------------------------------------------------------------------

The Fourth International Verification of Neural Networks Competition (VNN-COMP 2023): Summary and Results

The neural network verification competition summarizes the latest tools and benchmarks to drive standardized validation. This promotes safer AI through rigorous testing methodologies.

Authors:  Christopher Brix, Stanley Bak, Changliu Liu, Taylor T. Johnson

Link:  https://arxiv.org/abs/2312.16760v1

Date: 2023-12-28

Summary:

This report summarizes the 4th International Verification of Neural Networks Competition (VNN-COMP 2023), held as a part of the 6th Workshop on Formal Methods for ML-Enabled Autonomous Systems (FoMLAS), that was collocated with the 35th International Conference on Computer-Aided Verification (CAV). VNN-COMP is held annually to facilitate the fair and objective comparison of state-of-the-art neural network verification tools, encourage the standardization of tool interfaces, and bring together the neural network verification community. To this end, standardized formats for networks (ONNX) and specification (VNN-LIB) were defined, tools were evaluated on equal-cost hardware (using an automatic evaluation pipeline based on AWS instances), and tool parameters were chosen by the participants before the final test sets were made public. In the 2023 iteration, 7 teams participated on a diverse set of 10 scored and 4 unscored benchmarks. This report summarizes the rules, benchmarks, participating tools, results, and lessons learned from this iteration of this competition.

--------------------------------------------------------------------------------------------------------

Intelligent Surfaces Empowered Wireless Network: Recent Advances and The Road to 6G

The survey on intelligent surfaces reviews their role in 5G/6G networks. It provides insights on how these technologies can shape emerging wireless systems and applications.

Authors:  Qingqing Wu, Beixiong Zheng, Changsheng You, Lipeng Zhu, Kaiming Shen, Xiaodan Shao, Weidong Mei, Boya Di, Hongliang Zhang, Ertugrul Basar, Lingyang Song, Marco Di Renzo, Zhi-Quan Luo, Rui Zhang

Link:  https://arxiv.org/abs/2312.16918v1

Date: 2023-12-28

Summary:

Intelligent surfaces (ISs) have emerged as a key technology to empower a wide range of appealing applications for wireless networks, due to their low cost, high energy efficiency, flexibility of deployment and capability of constructing favorable wireless channels/radio environments. Moreover, the recent advent of several new IS architectures further expanded their electromagnetic functionalities from passive reflection to active amplification, simultaneous reflection and refraction, as well as holographic beamforming. However, the research on ISs is still in rapid progress and there have been recent technological advances in ISs and their emerging applications that are worthy of a timely review. Thus, we provide in this paper a comprehensive survey on the recent development and advances of IS-aided wireless networks. Specifically, we start with an overview of the anticipated use cases of ISs in future wireless networks such as 6G, followed by a summary of the recent standardization activities related to ISs. Then, the main design issues of the commonly adopted reflection-based IS and their state-of-the-art solutions are presented in detail, including reflection optimization, deployment, signal modulation, wireless sensing, and integrated sensing and communications. Finally, recent progress and new challenges in advanced IS architectures are discussed to inspire future research.

--------------------------------------------------------------------------------------------------------

Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification

The approach using generated attributes as prompts improves person re-identification through fine-grained contextual cues. It demonstrates how language models can enhance computer vision tasks.

Authors:  Yajing Zhai, Yawen Zeng, Zhiyong Huang, Zheng Qin, Xin Jin, Da Cao

Link:  https://arxiv.org/abs/2312.16797v1

Date: 2023-12-28

Summary:

Fine-grained attribute descriptions can significantly supplement the valuable semantic information of a person image, which is vital to the success of the person re-identification (ReID) task. However, current ReID algorithms typically fail to effectively leverage the rich contextual information available, primarily due to their reliance on simplistic and coarse utilization of image attributes. Recent advances in artificial intelligence generated content have made it possible to automatically generate plentiful fine-grained attribute descriptions and make full use of them. Therefore, this paper explores the potential of using multiple generated person attributes as prompts in ReID tasks with off-the-shelf (large) models for more accurate retrieval results. To this end, we present a new framework called Multi-Prompts ReID (MP-ReID), based on prompt learning and language models, to fully exploit fine-grained attributes to assist the ReID task. Specifically, MP-ReID first learns to hallucinate diverse, informative, and promptable sentences for describing the query images. This procedure includes (i) explicit prompts of which attributes a person has and furthermore (ii) implicit learnable prompts for adjusting/conditioning the criteria used towards this person identity matching. Explicit prompts are obtained by ensembling generation models, such as ChatGPT and VQA models. Moreover, an alignment module is designed to fuse multi-prompts (i.e., explicit and implicit ones) progressively and mitigate the cross-modal gap. Extensive experiments on the existing attribute-involved ReID datasets, namely, Market1501 and DukeMTMC-reID, demonstrate the effectiveness and rationality of the proposed MP-ReID solution.

--------------------------------------------------------------------------------------------------------

Preference as Reward, Maximum Preference Optimization with Importance Sampling

The preference learning method simplifies and stabilizes the optimization process. It provides an efficient way to align language models with human values.

Authors:  Zaifan Jiang, Xing Huang, Chao Wei

Link:  https://arxiv.org/abs/2312.16430v1

Date: 2023-12-27

Summary:

Preference learning is a key technology for aligning language models with human values. Reinforcement Learning from Human Feedback (RLHF) is a model-based algorithm for optimizing preference learning: it first fits a reward model for the preference score, and then optimizes the generating policy with the on-policy PPO algorithm to maximize the reward. The RLHF process is complex, time-consuming and unstable. The Direct Preference Optimization (DPO) algorithm uses an off-policy algorithm to directly optimize the generating policy and eliminates the need for a reward model, which makes it data-efficient and stable. DPO uses the Bradley-Terry model and log-loss, which leads to over-fitting to the preference data at the expense of ignoring the KL-regularization term when preferences are near deterministic. IPO uses a root-finding pairwise MSE loss to solve the ignored KL-regularization problem and learn an optimal policy. But IPO's pairwise loss still cannot make the KL-regularization work. In this paper, we design a simple and intuitive off-policy preference optimization algorithm from an importance sampling view, and add an off-policy KL-regularization term which makes KL-regularization truly effective. To simplify the learning process and save memory usage, we can generate regularization data in advance, which eliminates the need for both the reward model and the reference policy in the optimization stage.
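To make the contrast between the two baselines concrete, here is a minimal sketch of the per-pair DPO and IPO losses as they are commonly written in their original papers. It is not the method proposed in this paper. The log-probabilities are assumed to be given, and the names `beta` and `tau` and their default values are illustrative assumptions.

```python
import math


def log_ratio_margin(logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Policy-vs-reference log-ratio of the preferred response (w)
    minus that of the rejected response (l)."""
    return (logp_w - ref_logp_w) - (logp_l - ref_logp_l)


def dpo_loss(margin, beta=0.1):
    """DPO per-pair loss: -log(sigmoid(beta * margin)), computed stably."""
    x = beta * margin
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def ipo_loss(margin, tau=0.1):
    """IPO per-pair loss: squared distance of the margin from 1 / (2 * tau)."""
    return (margin - 1.0 / (2.0 * tau)) ** 2


if __name__ == "__main__":
    # Made-up log-probabilities for one preference pair.
    m = log_ratio_margin(logp_w=-4.0, logp_l=-9.0, ref_logp_w=-5.0, ref_logp_l=-6.0)
    print("margin:", m)
    print("DPO loss:", dpo_loss(m))
    print("IPO loss:", ipo_loss(m))
```

The DPO loss keeps decreasing as the margin grows, so it can always be lowered further by pushing the policy away from the reference, whereas the IPO loss is minimized at a finite margin. This is the difference in regularization behavior that the abstract refers to.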

--------------------------------------------------------------------------------------------------------

Knowledge Distillation of LLM for Education

The knowledge distillation technique transfers capabilities of large models to highly compact networks. This work enables AI deployment on resource-constrained educational applications.

Authors:  Ehsan Latif, Luyang Fang, Ping Ma, Xiaoming Zhai

Link:  https://arxiv.org/abs/2312.15842v1

Date: 2023-12-26

Summary:

This study proposes a method for distilling the knowledge of fine-tuned Large Language Models (LLMs) into a smaller, more efficient, and accurate neural network, specifically targeting the challenge of deploying these models on resource-constrained devices. Our methodology involves training the smaller student model using the prediction probabilities of the LLM, which serves as a teacher model. This is achieved through a specialized loss function tailored to learn from the LLM's output probabilities, ensuring that the student model closely mimics the teacher's performance. To test this approach, we utilized a large dataset, 7T, containing 6,684 student-written responses to science questions and three other datasets with student-written responses. We also compared performance with original neural network (NN) models to validate the accuracy. Results have shown that the NN and distilled student models have comparable accuracy to the teacher model for the 7T dataset; however, other datasets have shown significantly lower accuracy (28% on average) for NN, though our proposed distilled model is still able to achieve 12% higher accuracy than NN. Furthermore, the student model size ranges from 0.1M to 0.02M, 100 times smaller in terms of parameters and ten times smaller compared with the original output model size. The significance of this research lies in its potential to make advanced AI technologies accessible in typical educational settings, particularly for automatic scoring.
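The abstract does not spell out the specialized loss. A common way to train a student on a teacher's output probabilities is to blend a soft-label cross-entropy against the teacher with an ordinary cross-entropy against the true label. The sketch below shows that generic formulation in plain Python. The blending weight `alpha` and all function names are assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy of predicted_probs measured against target_probs."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_probs, true_label, alpha=0.5):
    """Blend of soft-label loss (teacher) and hard-label loss (ground truth).

    alpha = 1.0 trains on the teacher's probabilities only;
    alpha = 0.0 reduces to ordinary supervised cross-entropy.
    """
    student_probs = softmax(student_logits)
    soft = cross_entropy(teacher_probs, student_probs)
    one_hot = [1.0 if i == true_label else 0.0 for i in range(len(student_logits))]
    hard = cross_entropy(one_hot, student_probs)
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    # Made-up teacher distribution over three score classes.
    teacher = [0.7, 0.2, 0.1]
    agreeing_student = [math.log(p) for p in teacher]
    disagreeing_student = [math.log(p) for p in reversed(teacher)]
    print(distillation_loss(agreeing_student, teacher, true_label=0))
    print(distillation_loss(disagreeing_student, teacher, true_label=0))
```

A student whose predicted distribution matches the teacher's receives a lower loss than one that disagrees, which is the property the training procedure relies on.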

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.