Week Ending 12.29.2024
RESEARCH WATCH: 12.29.2024
DeepSeek-V3 Technical Report
DeepSeek-V3 presents a breakthrough in large language model efficiency, using a Mixture-of-Experts architecture that activates only 37B of its 671B parameters per token. The model introduces novel approaches to load balancing and training objectives, achieving performance comparable to leading closed-source models while requiring relatively modest computing resources. This could enable more organizations to deploy powerful language models cost-effectively.
Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, Jianzhong Guo, Jiaqi Ni, Jiashi Li, Jiawei Wang, Jin Chen, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, Junxiao Song, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Litong Wang, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qiancheng Wang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, Runxin Xu, Ruoyu Zhang, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Shuting Pan, T. Wang, Tao Yun, Tian Pei, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wanjia Zhao, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaokang Zhang, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xingkai Yu, Xinnan Song, Xinxia Shan, Xinyi Zhou, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, Y. K. Li, Y. Q. Wang, Y. X. Wei, Y. X. Zhu, Yang Zhang, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Yu, Yi Zheng, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Ying Tang, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yu Wu, Yuan Ou, Yuchen Zhu, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yukun Zha, Yunfan Xiong, Yunxian Ma, Yuting Yan, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Z. F. Wu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhibin Gou, Zhicheng Ma, Zhigang Yan, Zhihong Shao, Zhipeng Xu, Zhiyu Wu, Zhongyu Zhang, Zhuoshu Li, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Ziyi Gao, Zizheng Pan
Link: https://arxiv.org/abs/2412.19437v1
Date: 2024-12-27
Summary:
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
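The auxiliary-loss-free load-balancing idea is worth seeing in miniature: rather than adding an extra loss term to penalize imbalance, a per-expert bias steers routing away from overloaded experts while leaving the mixing weights untouched. Below is a minimal sketch of that mechanism with illustrative dimensions and a simple sign-based bias update; it is not DeepSeek-V3's actual routing code.

```python
import torch

# Sketch of auxiliary-loss-free MoE load balancing: a routing-only bias is
# nudged after each batch so overloaded experts become less likely to be
# picked. Dimensions and the update rule are illustrative.
n_experts, top_k, bias_lr = 8, 2, 0.001
expert_bias = torch.zeros(n_experts)            # not trained by gradient descent

def route(token_scores):
    """token_scores: [n_tokens, n_experts] affinities from the gating layer."""
    # The bias influences *which* experts are chosen, but the mixing weights
    # are computed from the raw scores so outputs stay unbiased.
    topk_idx = (token_scores + expert_bias).topk(top_k, dim=-1).indices
    gates = torch.gather(token_scores.softmax(-1), -1, topk_idx)
    return topk_idx, gates / gates.sum(-1, keepdim=True)

def update_bias(topk_idx):
    """Lower the bias of overloaded experts, raise it for underused ones."""
    load = torch.bincount(topk_idx.flatten(), minlength=n_experts).float()
    expert_bias.sub_(bias_lr * torch.sign(load - topk_idx.numel() / n_experts))

idx, weights = route(torch.randn(16, n_experts))   # 16 fake tokens
update_bias(idx)
print(idx.shape, weights.shape, expert_bias)
```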
--------------------------------------------------------------------------------------------------------
Geospatial Data Fusion: Combining Lidar, SAR, and Optical Imagery with AI for Enhanced Urban Mapping
The Geospatial Data Fusion study combines Lidar, SAR, and optical imagery using AI for more accurate urban mapping. Using Fully Convolutional Networks optimized with Particle Swarm Optimization, the system achieves 92.3% pixel accuracy. This technology could revolutionize urban planning, disaster response, and infrastructure development by providing more comprehensive and accurate urban environment data.
Authors: Sajjad Afroosheh, Mohammadreza Askari
Link: https://arxiv.org/abs/2412.18994v1
Date: 2024-12-25
Summary:
This study explores the integration of Lidar, Synthetic Aperture Radar (SAR), and optical imagery through advanced artificial intelligence techniques for enhanced urban mapping. By fusing these diverse geospatial datasets, we aim to overcome the limitations associated with single-sensor data, achieving a more comprehensive representation of urban environments. The research employs Fully Convolutional Networks (FCNs) as the primary deep learning model for urban feature extraction, enabling precise pixel-wise classification of essential urban elements, including buildings, roads, and vegetation. To optimize the performance of the FCN model, we utilize Particle Swarm Optimization (PSO) for hyperparameter tuning, significantly enhancing model accuracy. Key findings indicate that the FCN-PSO model achieved a pixel accuracy of 92.3% and a mean Intersection over Union (IoU) of 87.6%, surpassing traditional single-sensor approaches. These results underscore the potential of fused geospatial data and AI-driven methodologies in urban mapping, providing valuable insights for urban planning and management. The implications of this research pave the way for future developments in real-time mapping and adaptive urban infrastructure planning.
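The PSO component is simple enough to sketch end to end. In the sketch below, a cheap synthetic objective stands in for the expensive step the paper actually optimizes (training the FCN and measuring validation accuracy); the swarm update equations are the standard ones, and all constants are illustrative.

```python
import numpy as np

# Toy PSO over two hyperparameters (learning rate, dropout). The synthetic
# objective stands in for training the FCN and measuring validation error.
rng = np.random.default_rng(0)

def objective(p):                        # pretend "validation error" to minimize
    lr, dropout = p
    return (np.log10(lr) + 3) ** 2 + (dropout - 0.3) ** 2

n_particles, n_iters, w, c1, c2 = 20, 50, 0.7, 1.5, 1.5
lo, hi = np.array([1e-5, 0.0]), np.array([1e-1, 0.9])

pos = rng.uniform(lo, hi, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((2, n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([objective(p) for p in pos])
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    gbest = pbest[pbest_val.argmin()].copy()

print("best (learning rate, dropout):", gbest)   # converges near (1e-3, 0.3)
```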
--------------------------------------------------------------------------------------------------------
Video Is Worth a Thousand Images: Exploring the Latest Trends in Long Video Generation
This paper examines trends in long video generation, acknowledging current limitations like OpenAI Sora's one-minute cap. It explores challenges in scaling video generation beyond current limits, including maintaining consistency and story development. The research provides a foundation for future advances in extended video generation, with potential applications in entertainment, education, and content creation.
Authors: Faraz Waseem, Muhammad Shahzad
Link: https://arxiv.org/abs/2412.18688v1
Date: 2024-12-24
Summary:
An image may convey a thousand words, but a video composed of hundreds or thousands of image frames tells a more intricate story. Despite significant progress in multimodal large language models (MLLMs), generating extended videos remains a formidable challenge. As of this writing, OpenAI's Sora, the current state-of-the-art system, is still limited to producing videos that are up to one minute in length. This limitation stems from the complexity of long video generation, which requires more than generative AI techniques for approximating density functions; essential aspects such as planning, story development, and maintaining spatial and temporal consistency present additional hurdles. Integrating generative AI with a divide-and-conquer approach could improve scalability for longer videos while offering greater control. In this survey, we examine the current landscape of long video generation, covering foundational techniques like GANs and diffusion models, video generation strategies, large-scale training datasets, quality metrics for evaluating long videos, and future research areas to address the limitations of existing video generation capabilities. We believe this survey will serve as a comprehensive foundation, offering extensive information to guide future advancements and research in the field of long video generation.
--------------------------------------------------------------------------------------------------------
PEACH is a perioperative AI chatbot integrated with local medical guidelines, achieving 97.9% accuracy in clinical settings. Deployed in a real healthcare environment with stringent security measures, it demonstrates how LLMs can be safely and effectively implemented in healthcare decision-making. This technology could streamline preoperative assessments and standardize care protocols across medical institutions.
Authors: Yu He Ke, Liyuan Jin, Kabilan Elangovan, Bryan Wen Xi Ong, Chin Yang Oh, Jacqueline Sim, Kenny Wei-Tsen Loh, Chai Rick Soh, Jonathan Ming Hua Cheng, Aaron Kwang Yang Lee, Daniel Shu Wei Ting, Nan Liu, Hairil Rizal Abdullah
Link: https://arxiv.org/abs/2412.18096v1
Date: 2024-12-24
Summary:
Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. This study describes the development and evaluation of the PErioperative AI CHatbot (PEACH), a secure LLM-based system integrated with local perioperative guidelines to support preoperative clinical decision-making. PEACH was embedded with 35 institutional perioperative protocols in the secure Claude 3.5 Sonnet LLM framework within Pair Chat (developed by Singapore Government) and tested in a silent deployment with real-world data. Accuracy, safety, and usability were assessed. Deviations and hallucinations were categorized based on potential harm, and user feedback was evaluated using the Technology Acceptance Model (TAM). Updates were made after the initial silent deployment to amend one protocol. In 240 real-world clinical iterations, PEACH achieved a first-generation accuracy of 97.5% (78/80) and an overall accuracy of 96.7% (232/240) across three iterations. The updated PEACH demonstrated improved accuracy of 97.9% (235/240), with a statistically significant difference from the null hypothesis of 95% accuracy (p = 0.018, 95% CI: 0.952-0.991). Minimal hallucinations and deviations were observed (1/240 and 2/240, respectively). Clinicians reported that PEACH expedited decisions in 95% of cases, and inter-rater reliability ranged from kappa 0.772-0.893 within PEACH and 0.610-0.784 among attendings. PEACH is an accurate, adaptable tool that enhances consistency and efficiency in perioperative decision-making. Future research should explore its scalability across specialties and its impact on clinical outcomes.
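The headline significance claim can be checked in a few lines: a one-sided exact binomial test of 235 correct responses in 240 trials against a null accuracy of 95% reproduces the reported p ≈ 0.018, and a Clopper-Pearson interval lands close to the reported 0.952-0.991 (which appears to be a Wilson-style interval). Treating these as the tests the authors used is an assumption.

```python
from scipy.stats import binomtest

# 235 correct answers out of 240 against a null accuracy of 95%: a one-sided
# exact binomial test reproduces the reported p ≈ 0.018. Whether the authors
# used exactly this test is an assumption made for illustration.
print(f"observed accuracy: {235 / 240:.3f}")                    # 0.979
one_sided = binomtest(k=235, n=240, p=0.95, alternative="greater")
print(f"one-sided p-value: {one_sided.pvalue:.3f}")             # ~0.018
two_sided = binomtest(k=235, n=240, p=0.95)
ci = two_sided.proportion_ci(confidence_level=0.95)             # Clopper-Pearson
print(f"95% CI: ({ci.low:.3f}, {ci.high:.3f})")                 # close to 0.952-0.991
```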
--------------------------------------------------------------------------------------------------------
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
B-STaR addresses a critical challenge in self-improving AI systems by monitoring and balancing exploration versus exploitation. The framework autonomously adjusts its learning approach based on current model capabilities and available rewards. This research could improve how AI systems learn and adapt across various domains, from mathematical reasoning to coding and commonsense tasks.
Authors: Weihao Zeng, Yuzhen Huang, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He
Link: https://arxiv.org/abs/2412.17256v1
Date: 2024-12-23
Summary:
In the absence of extensive human-annotated data for complex reasoning tasks, self-improvement -- where models are trained on their own outputs -- has emerged as a primary method for enhancing performance. However, the critical factors underlying the mechanism of these iterative self-improving methods remain poorly understood, such as the conditions under which self-improvement is effective and the bottlenecks in the current iterations. In this work, we identify and propose methods to monitor two pivotal factors in this iterative process: (1) the model's ability to generate sufficiently diverse responses (exploration); and (2) the effectiveness of external rewards in distinguishing high-quality candidates from lower-quality ones (exploitation). Using mathematical reasoning as a case study, we begin with a quantitative analysis to track the dynamics of exploration and exploitation, discovering that a model's exploratory capabilities rapidly deteriorate over iterations, and the effectiveness of exploiting external rewards diminishes as well. Motivated by these findings, we introduce B-STaR, a Self-Taught Reasoning framework that autonomously adjusts configurations across iterations to Balance exploration and exploitation, thereby optimizing the self-improving effectiveness based on the current policy model and available rewards. Our experiments on mathematical reasoning, coding, and commonsense reasoning demonstrate that B-STaR not only enhances the model's exploratory capabilities throughout training but also achieves a more effective balance between exploration and exploitation, leading to superior performance.
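The monitor-and-adjust loop at the core of B-STaR can be shown abstractly. In the sketch below, exploration is proxied by the fraction of distinct sampled answers and exploitation by the reward gap between correct and incorrect candidates; the paper's actual metrics and configuration updates differ, so treat this as an illustration of the control flow only.

```python
import numpy as np

# Illustrative monitor for a self-improvement loop in the spirit of B-STaR:
# track exploration (fraction of distinct sampled answers) and exploitation
# (reward gap between correct and incorrect candidates), then adjust the
# sampling configuration when diversity collapses.
rng = np.random.default_rng(0)

def exploration_score(answers):
    return len(set(answers)) / len(answers)

def exploitation_score(rewards, labels):
    pos, neg = rewards[labels], rewards[~labels]
    return float(pos.mean() - neg.mean()) if len(pos) and len(neg) else 0.0

temperature = 0.8
for it in range(3):
    # stand-ins for: sample 16 solutions per problem, score with a reward model
    answers = [f"ans{i % (4 - it)}" for i in range(16)]   # diversity shrinks per iter
    rewards = rng.random(16)
    labels = rewards > 0.5                                # pretend correctness labels
    explore = exploration_score(answers)
    exploit = exploitation_score(rewards, labels)
    if explore < 0.2:                                     # crude balancing rule:
        temperature = min(1.2, temperature + 0.1)         # reheat sampling
    print(f"iter {it}: explore={explore:.2f} exploit={exploit:.2f} T={temperature:.1f}")
```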
--------------------------------------------------------------------------------------------------------
FFA Sora, video generation as fundus fluorescein angiography simulator
FFA Sora develops an innovative approach to simulating fundus fluorescein angiography videos, helping medical students learn to interpret these critical diagnostic tests. Using a Wavelet-Flow Variational Autoencoder and diffusion transformer, it converts text reports into dynamic videos while maintaining patient privacy. This tool could revolutionize ophthalmology training and education.
Authors: Xinyuan Wu, Lili Wang, Ruoyu Chen, Bowen Liu, Weiyi Zhang, Xi Yang, Yifan Feng, Mingguang He, Danli Shi
Link: https://arxiv.org/abs/2412.17346v1
Date: 2024-12-23
Summary:
Fundus fluorescein angiography (FFA) is critical for diagnosing retinal vascular diseases, but beginners often struggle with image interpretation. This study develops FFA Sora, a text-to-video model that converts FFA reports into dynamic videos via a Wavelet-Flow Variational Autoencoder (WF-VAE) and a diffusion transformer (DiT). Trained on an anonymized dataset, FFA Sora accurately simulates disease features from the input text, as confirmed by objective metrics: Fréchet Video Distance (FVD) = 329.78, Learned Perceptual Image Patch Similarity (LPIPS) = 0.48, and Visual-question-answering Score (VQAScore) = 0.61. Specific evaluations showed acceptable alignment between the generated videos and textual prompts, with a BERTScore of 0.35. Additionally, the model demonstrated strong privacy-preserving performance in retrieval evaluations, achieving an average Recall@K of 0.073. Human assessments indicated satisfactory visual quality, with an average score of 1.570 (scale: 1 = best, 5 = worst). This model addresses privacy concerns associated with sharing large-scale FFA data and enhances medical education.
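The privacy figure (average Recall@K of 0.073) comes from a retrieval test: if a generated video's true source record rarely appears among its nearest real neighbors, the generations are hard to trace back to individual patients. A sketch of that computation is below, with random embeddings standing in for real video features; the paper's embedding model and protocol details are not assumed here.

```python
import numpy as np

# Sketch of a retrieval-based privacy check in the spirit of the paper's
# Recall@K evaluation: embed generated and real videos, retrieve the K nearest
# real neighbors of each generated sample, and ask how often the true source
# record is among them. Low recall suggests generations are not traceable.
rng = np.random.default_rng(0)
gen = rng.normal(size=(50, 128))            # embeddings of generated videos
real = rng.normal(size=(500, 128))          # embeddings of real training videos
source = rng.integers(0, 500, size=50)      # index of each generation's source record

def recall_at_k(gen, real, source, k=10):
    hits = 0
    for g, s in zip(gen, source):
        d = np.linalg.norm(real - g, axis=1)    # L2 distance to every real video
        topk = np.argpartition(d, k)[:k]        # K nearest real neighbors
        hits += int(s in topk)
    return hits / len(gen)

print("Recall@10:", recall_at_k(gen, real, source))  # ~K/N when untraceable
```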
--------------------------------------------------------------------------------------------------------
This survey investigates how students aged 13-25 use Large Language Models in education, revealing widespread adoption across disciplines with notable gender disparities. The findings highlight both opportunities and risks in educational AI use, particularly regarding critical thinking development. This research could inform educational policies and curriculum development in an AI-integrated learning environment.
Authors: Jérémie Sublime, Ilaria Renna
Link: https://arxiv.org/abs/2412.17486v1
Date: 2024-12-23
Summary:
The rapid adoption of Generative AI (GenAI) based on Large Language Models (LLMs) such as ChatGPT has recently and profoundly impacted education, offering transformative opportunities while raising significant concerns. In this study, we present the results of a survey that investigates how 395 students aged 13 to 25 in France and Italy integrate LLMs into their educational routines. Key findings include the widespread use of these tools across all age groups and disciplines, with older students and male students demonstrating higher usage frequencies, particularly in scientific contexts. The results also show gender disparities, raising concerns about an emerging AI literacy and technological gender gap. Additionally, while most students utilise LLMs constructively, the lack of systematic proofreading and critical evaluation among younger users suggests potential risks to cognitive skills development, including critical thinking and foundational knowledge. The survey results underscore the need for educational institutions to adapt their curricula to integrate AI tools effectively, promoting ethical use, critical thinking, and awareness of AI limitations and environmental costs. This paper provides actionable recommendations for fostering equitable and effective cohabitation of LLMs and education while addressing emerging challenges.
--------------------------------------------------------------------------------------------------------
This comprehensive review examines AI applications in IVF ovarian stimulation, focusing on medical imaging integration. While current implementations show promise in predicting optimal treatment protocols, the study identifies opportunities for advancement through 3D imaging and explainable AI. This research could lead to more personalized and successful IVF treatments.
Authors: Jana Zakall, Birgit Pohn, Antonia Graf, Daniel Kovatchki, Arezoo Borji, Ragib Shahriar Islam, Hossam Haick, Heinz Strohmer, Sepideh Hatamikia
Link: https://arxiv.org/abs/2412.19688v1
Date: 2024-12-27
Summary:
Artificial intelligence (AI) has emerged as a powerful tool to enhance decision-making and optimize treatment protocols in in vitro fertilization (IVF). In particular, AI shows significant promise in supporting decision-making during the ovarian stimulation phase of the IVF process. This review evaluates studies focused on the applications of AI combined with medical imaging in ovarian stimulation, examining methodologies, outcomes, and current limitations. Our analysis of 13 studies on this topic reveals that while AI algorithms demonstrated notable potential in predicting optimal hormonal dosages, trigger timing, and oocyte retrieval outcomes, the medical imaging data utilized predominantly came from two-dimensional (2D) ultrasound, which mainly involved basic quantifications, such as follicle size and number, with limited use of direct feature extraction or advanced image analysis techniques. This points to an underexplored opportunity where advanced image analysis approaches, such as deep learning, and more diverse imaging modalities, like three-dimensional (3D) ultrasound, could unlock deeper insights. Additionally, the lack of explainable AI (XAI) in most studies raises concerns about the transparency and traceability of AI-driven decisions - key factors for clinical adoption and trust. Furthermore, many studies relied on single-center designs and small datasets, which limit the generalizability of their findings. This review highlights the need for integrating advanced imaging analysis techniques with explainable AI methodologies, as well as the importance of leveraging multicenter collaborations and larger datasets. Addressing these gaps has the potential to enhance ovarian stimulation management, paving the way for efficient, personalized, and data-driven treatment pathways that improve IVF outcomes.
--------------------------------------------------------------------------------------------------------
TiARA introduces a novel approach to enhancing video generation consistency using time-frequency analysis and improved prompt interpolation. The method provides theoretical guarantees for frequency-based techniques in diffusion models, addressing common issues in video smoothness and scene transitions. This could improve video generation quality across multiple applications, from entertainment to education.
Authors: Xingyao Li, Fengzhuo Zhang, Jiachun Pan, Yunlong Hou, Vincent Y. F. Tan, Zhuoran Yang
Link: https://arxiv.org/abs/2412.17254v1
Date: 2024-12-23
Summary:
Despite the considerable progress achieved in the long video generation problem, there is still significant room to improve the consistency of the videos, particularly in terms of smoothness and transitions between scenes. We address these issues to enhance the consistency and coherence of videos generated with either single or multiple prompts. We propose the Time-frequency based temporal Attention Reweighting Algorithm (TiARA), which meticulously edits the attention score matrix based on the Discrete Short-Time Fourier Transform. Our method is supported by a theoretical guarantee, the first-of-its-kind for frequency-based methods in diffusion models. For videos generated by multiple prompts, we further investigate key factors affecting prompt interpolation quality and propose PromptBlend, an advanced prompt interpolation pipeline. The efficacy of our proposed method is validated via extensive experimental results, exhibiting consistent and impressive improvements over baseline methods. The code will be released upon acceptance.
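The time-frequency core of the method can be illustrated on a one-dimensional signal: transform a temporal attention trace with a short-time Fourier transform, attenuate the high-frequency content that shows up as flicker, and reconstruct. TiARA itself edits full attention score matrices inside the diffusion model and comes with a theoretical guarantee; the sketch below shows only the signal-processing intuition on made-up data.

```python
import numpy as np
from scipy.signal import stft, istft

# Signal-processing core of the time-frequency idea: damp the high-frequency
# content of a temporal attention trace via a short-time Fourier transform.
# The attention trace here is synthetic, not from a real diffusion model.
rng = np.random.default_rng(0)
frames = 128
t = np.arange(frames)
attention = 0.6 + 0.3 * np.sin(2 * np.pi * t / 32) + 0.2 * rng.standard_normal(frames)

freqs, _, Z = stft(attention, nperseg=32)   # time-frequency representation
Z[freqs > 0.15, :] *= 0.2                   # attenuate fast flicker, keep slow drift
_, smoothed = istft(Z, nperseg=32)          # back to a (smoother) attention trace

print(f"std before: {attention.std():.3f}  after: {smoothed.std():.3f}")
```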
--------------------------------------------------------------------------------------------------------
IMAGINE presents an innovative compute-in-memory CNN accelerator designed for edge computing applications. Using charge-based computing and distribution-aware data reshaping, it achieves significant improvements in energy efficiency while maintaining accuracy. This technology could enable more powerful AI applications in mobile and IoT devices.
Authors: Adrian Kneip, Martin Lefebvre, Pol Maistriaux, David Bol
Link: https://arxiv.org/abs/2412.19750v1
Date: 2024-12-27
Summary:
Charge-domain compute-in-memory (CIM) SRAMs have recently become an enticing compromise between computing efficiency and accuracy to process sub-8b convolutional neural networks (CNNs) at the edge. Yet, they commonly make use of a fixed dot-product (DP) voltage swing, which leads to a loss in effective ADC bits due to data-dependent clipping or truncation effects that waste precious conversion energy and computing accuracy. To overcome this, we present IMAGINE, a workload-adaptive 1-to-8b CIM-CNN accelerator in 22nm FD-SOI. It introduces a 1152x256 end-to-end charge-based macro with a multi-bit DP based on an input-serial, weight-parallel accumulation that avoids power-hungry DACs. An adaptive swing is achieved by combining a channel-wise DP array split with a linear in-ADC implementation of analog batch-normalization (ABN), obtaining a distribution-aware data reshaping. Critical design constraints are relaxed by including the post-silicon equivalent noise within a CIM-aware CNN training framework. Measurement results showcase an 8b system-level energy efficiency of 40TOPS/W at 0.3/0.6V, with competitive accuracies on MNIST and CIFAR-10. Moreover, the peak energy and area efficiencies of the 187kB/mm2 macro respectively reach up to 0.15-8POPS/W and 2.6-154TOPS/mm2, scaling with the 8-to-1b computing precision. These results exceed previous charge-based designs by 3-to-5x while being the first work to provide linear in-memory rescaling.
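The input-serial, weight-parallel accumulation that lets the macro avoid DACs is easy to model behaviorally: activations are fed one bit per cycle, all weights are applied in parallel, and partial sums are shift-added. The Python model below mirrors that arithmetic with illustrative bit widths; in the silicon, the per-bit dot product happens in the charge domain.

```python
import numpy as np

# Behavioral model of an input-serial, weight-parallel dot product, the
# accumulation style the IMAGINE macro implements in charge domain. Feeding
# activations one bit per cycle and shift-adding partial products avoids
# per-input DACs. Bit widths are illustrative.
rng = np.random.default_rng(1)
IN_BITS = 4
x = rng.integers(0, 2 ** IN_BITS, size=256)     # unsigned 4b activations
w = rng.integers(-8, 8, size=256)               # signed weights, applied in parallel

acc = 0
for b in range(IN_BITS - 1, -1, -1):            # MSB first
    x_bit = (x >> b) & 1                        # one input bit per "cycle"
    acc = (acc << 1) + int(np.dot(x_bit, w))    # per-bit dot product, then shift

print(acc, int(np.dot(x, w)))                   # both values match
```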
--------------------------------------------------------------------------------------------------------
A Reinforcement Learning-Based Task Mapping Method to Improve the Reliability of Clustered Manycores
This paper introduces a reinforcement learning-based approach to improve reliability in manycore computing systems. By managing task mapping and thermal variations, the method increases mean time to failure by up to 27%. This could enhance the longevity and performance of complex computing systems in data centers and high-performance computing applications.
Authors: Fatemeh Hossein-Khani, Omid Akbari
Link: https://arxiv.org/abs/2412.19340v1
Date: 2024-12-26
Summary:
The increasing scale of manycore systems poses significant challenges in managing reliability while meeting performance demands. Simultaneously, these systems become more susceptible to different aging mechanisms such as negative-bias temperature instability (NBTI), hot carrier injection (HCI), and thermal cycling (TC), as well as the electromigration (EM) phenomenon. In this paper, we propose a reinforcement learning (RL)-based task mapping method to improve the reliability of manycore systems considering the aforementioned aging mechanisms, which consists of three steps: bin packing, task-to-bin mapping, and task-to-core mapping. In the initial step, a density-based spatial clustering of applications with noise (DBSCAN) method is employed to form clusters (bins) based on the cores' temperatures. Then, the Q-learning algorithm is used for the latter two steps, mapping each arriving task to a core such that minimal thermal variation occurs across the bins. Compared to the state-of-the-art works, the proposed method operates at runtime without requiring any parameters to be calculated offline. The effectiveness of the proposed technique is evaluated on 16-, 32-, and 64-core systems using SPLASH2 and PARSEC benchmark suite applications. The results demonstrate up to a 27% increase in the mean time to failure (MTTF) compared to state-of-the-art task mapping techniques.
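Both learning components can be sketched compactly: DBSCAN groups cores into temperature bins, and a tabular Q-learning agent then picks a bin for each arriving task, rewarded for keeping thermal variation across bins low. The state, reward, and thermal model below are simplified stand-ins for the paper's formulation, kept only to show how the two steps fit together.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Sketch of the two learning components: DBSCAN bins cores by temperature,
# then tabular Q-learning assigns each arriving task to a bin, rewarded for
# low thermal variation. All quantities are simplified stand-ins.
rng = np.random.default_rng(0)
core_temps = rng.normal(60, 8, size=(64, 1))             # one temperature per core
bins = DBSCAN(eps=3.0, min_samples=3).fit_predict(core_temps)
n_bins = max(int(bins.max()) + 1, 1)                     # noise points (-1) ignored

Q = np.zeros((2, n_bins))                                # states: variation low/high
alpha, gamma, eps_greedy = 0.1, 0.9, 0.1

def thermal_variation():
    return float(np.std([core_temps[bins == b].mean() for b in range(n_bins)]))

state = 0
for _ in range(200):                                     # 200 arriving tasks
    explore = rng.random() < eps_greedy
    action = rng.integers(n_bins) if explore else int(Q[state].argmax())
    core_temps[bins == action] += 0.05                   # mapped task heats its bin
    core_temps[bins != action] -= 0.05 / max(n_bins - 1, 1)
    var = thermal_variation()
    reward, next_state = -var, int(var > 2.0)
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print("bins:", n_bins, "thermal variation:", round(thermal_variation(), 2))
```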
--------------------------------------------------------------------------------------------------------
Memory makes computation universal, remember?
This theoretical paper argues that memory's role in enabling universal computation has been underappreciated in AI development. By proving that state maintenance and history access are sufficient for universal computation, it provides a new framework for understanding computational advances across biological and artificial systems.
Authors: Erik Garrison
Link: https://arxiv.org/abs/2412.17794v1
Date: 2024-12-23
Summary:
Recent breakthroughs in AI capability have been attributed to increasingly sophisticated architectures and alignment techniques, but a simpler principle may explain these advances: memory makes computation universal. Memory enables universal computation through two fundamental capabilities: recursive state maintenance and reliable history access. We formally prove these requirements are both necessary and sufficient for universal computation. This principle manifests across scales, from cellular computation to neural networks to language models. Complex behavior emerges not from sophisticated processing units but from maintaining and accessing state across time. We demonstrate how parallel systems like neural networks achieve universal computation despite limitations in their basic units by maintaining state across iterations. This theoretical framework reveals a universal pattern: computational advances consistently emerge from enhanced abilities to maintain and access state rather than from more complex basic operations. Our analysis unifies understanding of computation across biological systems, artificial intelligence, and human cognition, reminding us that humanity's own computational capabilities have evolved in step with our technical ability to remember through oral traditions, writing, and now computing.
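The paper's two requirements can be made concrete with the smallest universal computer: a Turing machine needs nothing beyond a rewritable tape (reliable history access) and a head state threaded through time (recursive state maintenance). The toy machine below increments a binary number, but the same harness runs any transition table; it illustrates the claim and is not code from the paper.

```python
from collections import defaultdict

# A toy illustration of the claim that maintaining state and accessing history
# suffice for universal computation: a Turing machine is just a rewritable
# tape (history access) plus a head state (state maintenance).
def run(transitions, tape, state="start", head=0, max_steps=10_000):
    tape = defaultdict(lambda: "_", enumerate(tape))    # unbounded memory
    for _ in range(max_steps):
        if state == "halt":
            return "".join(tape[i] for i in range(min(tape), max(tape) + 1))
        state, write, move = transitions[(state, tape[head])]
        tape[head] = write                              # history is rewritable
        head += 1 if move == "R" else -1                # state threads through time
    raise RuntimeError("no halt")

# Binary increment, head starting at the most significant bit.
inc = {
    ("start", "0"): ("start", "0", "R"),   # scan right to the end of the number
    ("start", "1"): ("start", "1", "R"),
    ("start", "_"): ("carry", "_", "L"),
    ("carry", "1"): ("carry", "0", "L"),   # propagate the carry leftward
    ("carry", "0"): ("halt", "1", "L"),
    ("carry", "_"): ("halt", "1", "L"),
}
print(run(inc, "1011"))   # prints 1100 plus a trailing blank cell: 1011 + 1 = 1100
```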
--------------------------------------------------------------------------------------------------------
Automating the Search for Artificial Life with Foundation Models
ASAL presents a groundbreaking approach to discovering artificial life configurations using foundation models. Successfully applied across various substrates including Boids and Lenia, it automates the discovery of novel lifeforms and open-ended systems. This could accelerate artificial life research and provide insights into complex adaptive systems.
Authors: Akarsh Kumar, Chris Lu, Louis Kirsch, Yujin Tang, Kenneth O. Stanley, Phillip Isola, David Ha
Link: https://arxiv.org/abs/2412.17799v1
Date: 2024-12-23
Summary:
With the recent Nobel Prize awarded for radical advances in protein discovery, foundation models (FMs) for exploring large combinatorial spaces promise to revolutionize many scientific fields. Artificial Life (ALife) has not yet integrated FMs, thus presenting a major opportunity for the field to alleviate the historical burden of relying chiefly on manual design and trial-and-error to discover the configurations of lifelike simulations. This paper presents, for the first time, a successful realization of this opportunity using vision-language FMs. The proposed approach, called Automated Search for Artificial Life (ASAL), (1) finds simulations that produce target phenomena, (2) discovers simulations that generate temporally open-ended novelty, and (3) illuminates an entire space of interestingly diverse simulations. Because of the generality of FMs, ASAL works effectively across a diverse range of ALife substrates including Boids, Particle Life, Game of Life, Lenia, and Neural Cellular Automata. A major result highlighting the potential of this technique is the discovery of previously unseen Lenia and Boids lifeforms, as well as cellular automata that are open-ended like Conway's Game of Life. Additionally, the use of FMs allows for the quantification of previously qualitative phenomena in a human-aligned way. This new paradigm promises to accelerate ALife research beyond what is possible through human ingenuity alone.
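The supervised-target mode of ASAL reduces to a clean primitive: render candidate simulations and rank them by vision-language similarity to a text description of the desired phenomenon. The sketch below scores stand-in frames against a prompt with an off-the-shelf CLIP model; the actual system wraps a score like this in a search over simulation parameters, and the model choice here is an assumption.

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Sketch of ASAL-style supervised-target search: score rendered simulation
# frames by CLIP similarity to a text prompt and keep the best candidates.
# Random images stand in for rendered ALife simulations.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score(frames, prompt="a flock of boids swirling in a circle"):
    inputs = processor(text=[prompt], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # cosine similarity between the prompt and each frame embedding
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return (img @ txt.T).squeeze(-1)

candidates = [Image.fromarray(np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8))
              for _ in range(4)]                # stand-ins for rendered frames
print(score(candidates))                        # pick argmax as the "best" simulation
```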
--------------------------------------------------------------------------------------------------------
This research examines clinical coding automation, highlighting misalignments between current AI research and practical needs. Based on analysis of US healthcare records, it provides eight recommendations for improving evaluation methods. This work could lead to more effective AI-assisted clinical coding systems that better serve healthcare professionals.
Authors: Yidong Gan, Maciej Rybinski, Ben Hachey, Jonathan K. Kummerfeld
Link: https://arxiv.org/abs/2412.18043v1
Date: 2024-12-23
Summary:
Clinical coding is crucial for healthcare billing and data analysis. Manual clinical coding is labour-intensive and error-prone, which has motivated research towards full automation of the process. However, our analysis, based on US English electronic health records and automated coding research using these records, shows that widely used evaluation methods are not aligned with real clinical contexts. For example, evaluations that focus on the top 50 most common codes are an oversimplification, as there are thousands of codes used in practice. This position paper aims to align AI coding research more closely with practical challenges of clinical coding. Based on our analysis, we offer eight specific recommendations, suggesting ways to improve current evaluation methods. Additionally, we propose new AI-based methods beyond automated coding, suggesting alternative approaches to assist clinical coders in their workflows.
--------------------------------------------------------------------------------------------------------
This paper presents a novel approach to implementing differential privacy in Retrieval-Augmented Generation systems, addressing critical privacy concerns in LLM applications. The method enables secure knowledge extraction from personal data while maintaining privacy guarantees. This could enable broader adoption of RAG systems in privacy-sensitive applications.
Authors: Nicolas Grislain
Link: https://arxiv.org/abs/2412.19291v1
Date: 2024-12-26
Summary:
Retrieval-Augmented Generation (RAG) has emerged as the dominant technique to provide *Large Language Models* (LLMs) with fresh and relevant context, mitigating the risk of hallucinations and improving the overall quality of responses in environments with large and fast-moving knowledge bases. However, the integration of external documents into the generation process raises significant privacy concerns. Indeed, when added to a prompt, it is not possible to guarantee a response will not inadvertently expose confidential data, leading to potential breaches of privacy and ethical dilemmas. This paper explores a practical solution to this problem suitable for general knowledge extraction from personal data. It shows *differentially private token generation* is a viable approach to private RAG.
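One standard way to realize differentially private token generation, and a plausible reading of the paper's approach, is noisy vote aggregation: each sensitive document contributes a next-token vote through its own prompt, and the token is selected by a noisy argmax so no single document can sway the output by much. The sketch below uses the Gumbel-max form of the exponential mechanism; `llm_vote` is a hypothetical stand-in for a real model call, and the paper's exact mechanism may differ.

```python
import numpy as np

# Sketch of DP next-token generation for RAG via noisy vote aggregation: each
# document votes for a next token, and report-noisy-max (Gumbel noise with
# scale 2/epsilon over sensitivity-1 counts) selects the winner with eps-DP.
rng = np.random.default_rng(0)
VOCAB = ["Paris", "London", "Rome", "Berlin"]

def llm_vote(document: str) -> int:
    """Hypothetical stand-in: run the LLM with this document in context,
    return the index of its argmax next token."""
    return 0 if "France" in document else int(rng.integers(len(VOCAB)))

def dp_next_token(documents, epsilon=1.0):
    counts = np.zeros(len(VOCAB))
    for doc in documents:
        counts[llm_vote(doc)] += 1          # each document changes one count by 1
    noisy = counts + rng.gumbel(scale=2.0 / epsilon, size=len(VOCAB))
    return VOCAB[int(noisy.argmax())]       # exponential mechanism via Gumbel-max

docs = ["France fact sheet"] * 8 + ["unrelated note"] * 2
print(dp_next_token(docs, epsilon=1.0))     # usually "Paris"
```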
--------------------------------------------------------------------------------------------------------
Pirates of the RAG: Adaptively Attacking LLMs to Leak Knowledge Bases
This study exposes security vulnerabilities in RAG systems through a black-box attack method that can leak private knowledge bases. Using relevance-based mechanisms and open-source LLMs, the attack demonstrates significant success across different domains. This research highlights the urgent need for improved privacy protections in RAG implementations.
Authors: Christian Di Maio, Cristian Cosci, Marco Maggini, Valentina Poggioni, Stefano Melacci
Link: https://arxiv.org/abs/2412.18295v1
Date: 2024-12-24
Summary:
The growing ubiquity of Retrieval-Augmented Generation (RAG) systems in several real-world services triggers severe concerns about their security. A RAG system improves the generative capabilities of a Large Language Model (LLM) through a retrieval mechanism which operates on a private knowledge base, whose unintended exposure could lead to severe consequences, including breaches of private and sensitive information. This paper presents a black-box attack to force a RAG system to leak its private knowledge base which, differently from existing approaches, is adaptive and automatic. A relevance-based mechanism and an attacker-side open-source LLM favor the generation of effective queries to leak most of the (hidden) knowledge base. Extensive experimentation proves the quality of the proposed algorithm in different RAG pipelines and domains, compared to very recent related approaches, which turn out to be either not fully black-box, not adaptive, or not based on open-source models. The findings from our study underscore the urgent need for more robust privacy safeguards in the design and deployment of RAG systems.
--------------------------------------------------------------------------------------------------------
Sampling Bag of Views for Open-Vocabulary Object Detection
This paper introduces a novel approach to open-vocabulary object detection using concept-based alignment and efficient sampling strategies. The method achieves significant improvements in detection accuracy while reducing computational requirements. This could enhance computer vision applications across various domains, from surveillance to autonomous systems.
Authors: Hojun Choi, Junsuk Choe, Hyunjung Shim
Link: https://arxiv.org/abs/2412.18273v1
Date: 2024-12-24
Summary:
Existing open-vocabulary object detection (OVD) methods handle unseen categories by aligning object region embeddings with corresponding VLM features. A recent study leverages the idea that VLMs implicitly learn compositional structures of semantic concepts within the image. Instead of using an individual region embedding, it utilizes a bag of region embeddings as a new representation to incorporate compositional structures into the OVD task. However, this approach often fails to capture the contextual concepts of each region, leading to noisy compositional structures. This results in only marginal performance improvements and reduced efficiency. To address this, we propose a novel concept-based alignment method that samples a more powerful and efficient compositional structure. Our approach groups contextually related "concepts" into a bag and adjusts the scale of concepts within the bag for more effective embedding alignment. Combined with Faster R-CNN, our method achieves improvements of 2.6 box AP50 and 0.5 mask AP over prior work on novel categories in the open-vocabulary COCO and LVIS benchmarks. Furthermore, our method reduces CLIP computation in FLOPs by 80.3% compared to previous research, significantly enhancing efficiency. Experimental results demonstrate that the proposed method outperforms previous state-of-the-art models on the OVD datasets.
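The bag representation at the center of the method can be sketched in a few lines: pool a group of contextually related concept embeddings, with per-concept scales controlling each one's influence, and align the pooled bag against the text embedding. The grouping strategy and scale computation below are placeholders rather than the paper's sampling method.

```python
import torch
import torch.nn.functional as F

# Sketch of bag-of-concepts alignment: instead of matching one region
# embedding to the text embedding, pool a bag of related concept embeddings
# (with per-concept scales) and align the bag as a whole.
def bag_alignment(concept_embeds: torch.Tensor, text_embed: torch.Tensor,
                  scales: torch.Tensor) -> torch.Tensor:
    """concept_embeds: [n_concepts, d], scales: [n_concepts], text_embed: [d]."""
    weights = scales.softmax(dim=0).unsqueeze(-1)       # adjust concept influence
    bag = (weights * concept_embeds).sum(dim=0)         # one embedding per bag
    return F.cosine_similarity(bag, text_embed, dim=0)

d = 512
concepts = torch.randn(5, d)        # e.g. "dog", "leash", "grass" region features
text = torch.randn(d)               # VLM text embedding of a candidate category
scales = torch.ones(5)              # per-concept scales, learned in the real method
print(bag_alignment(concepts, text, scales))
```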
--------------------------------------------------------------------------------------------------------
From prediction to explanation: managing influential negative reviews through explainable AI
This study develops an explainable AI algorithm for identifying influential negative reviews in online platforms. Validated on over 100,000 restaurant reviews, it provides actionable insights for businesses to manage online feedback. This could help companies improve customer service and reputation management strategies.
Authors: Rongping Shen
Link: https://arxiv.org/abs/2412.19692v1
Date: 2024-12-27
Summary:
The profound impact of online reviews on consumer decision-making has made it crucial for businesses to manage negative reviews. Recent advancements in artificial intelligence (AI) technology have offered businesses novel and effective ways to manage and analyze substantial consumer feedback. In response to the growing demand for explainability and transparency in AI applications, this study proposes a novel explainable AI (XAI) algorithm aimed at identifying influential negative reviews. The experiments conducted on 101,338 restaurant reviews validate the algorithm's effectiveness and provide understandable explanations from both feature-level and word-level perspectives. By leveraging this algorithm, businesses can gain actionable insights for predicting, perceiving, and strategically responding to online negative feedback, fostering improved customer service and mitigating the potential damage caused by negative reviews.
--------------------------------------------------------------------------------------------------------
LangYa: Revolutionizing Cross-Spatiotemporal Ocean Forecasting
LangYa introduces a cross-spatiotemporal ocean forecasting system that combines multiple data sources for improved accuracy. Using 27 years of global ocean data, it achieves more reliable forecasting than existing systems. This could enhance weather prediction, climate research, and maritime operations.
Authors: Nan Yang, Chong Wang, Meihua Zhao, Zimeng Zhao, Huiling Zheng, Bin Zhang, Jianing Wang, Xiaofeng Li
Link: https://arxiv.org/abs/2412.18097v2
Date: 2024-12-25
Summary:
Ocean forecasting is crucial for both scientific research and societal benefits. Currently, the most accurate forecasting systems are global ocean forecasting systems (GOFSs), which represent the ocean state variables (OSVs) as discrete grids and solve the partial differential equations (PDEs) governing the transitions of oceanic state variables using numerical methods. However, GOFSs are computationally expensive and prone to cumulative errors. Recently, large artificial intelligence (AI)-based models have significantly boosted forecasting speed and accuracy. Unfortunately, building a large AI-based ocean forecasting system capable of cross-spatiotemporal and air-sea coupled forecasts remains a significant challenge. Here, we introduce LangYa, a cross-spatiotemporal and air-sea coupled ocean forecasting system. Results demonstrate that the time embedding module in LangYa enables a single model to make forecasts with lead times ranging from 1 to 7 days. The air-sea coupled module effectively simulates air-sea interactions. The ocean self-attention module improves network stability and accelerates convergence during training, and the adaptive thermocline loss function improves the accuracy of thermocline forecasting. Compared to existing numerical and AI-based ocean forecasting systems, LangYa uses 27 years of global ocean data from the Global Ocean Reanalysis and Simulation version 12 (GLORYS12) for training and achieves more reliable deterministic forecasting results for OSVs. The LangYa forecasting system provides global ocean researchers with access to a powerful software tool for accurate ocean forecasting and opens a new paradigm for ocean science.
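The abstract does not specify the time embedding module's internals, but the pattern it describes (one network serving lead times from 1 to 7 days) is commonly implemented by embedding the lead time and modulating backbone features with it. The sketch below shows one plausible FiLM-style shape for such a module; read it as an assumption, not LangYa's actual design.

```python
import torch
import torch.nn as nn

# One plausible shape for a lead-time embedding module: embed the requested
# lead time and use it to scale/shift backbone features (FiLM-style), so a
# single set of weights serves forecasts from 1 to 7 days ahead.
class LeadTimeConditioned(nn.Module):
    def __init__(self, channels=64, max_lead_days=7):
        super().__init__()
        self.embed = nn.Embedding(max_lead_days + 1, 2 * channels)
        self.backbone = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, ocean_state, lead_days):
        h = self.backbone(ocean_state)                      # [B, C, H, W]
        scale, shift = self.embed(lead_days).chunk(2, dim=-1)
        return h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

model = LeadTimeConditioned()
state = torch.randn(2, 64, 32, 32)            # fake gridded ocean state variables
out = model(state, torch.tensor([1, 7]))      # same weights, two lead times
print(out.shape)
```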
--------------------------------------------------------------------------------------------------------
TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data
TARGA presents a framework for generating synthetic data to improve reasoning over structured data. By dynamically creating relevant training examples, it achieves superior performance in knowledge base question answering tasks. This could enhance natural language interfaces for databases and information systems.
Authors: Xiang Huang, Jiayu Shen, Shanshan Huang, Sitao Cheng, Xiaxia Wang, Yuzhong Qu
Link: https://arxiv.org/abs/2412.19544v1
Date: 2024-12-27
Summary:
Semantic parsing, which converts natural language questions into logical forms, plays a crucial role in reasoning within structured environments. However, existing methods encounter two significant challenges: reliance on extensive manually annotated datasets and limited generalization capability to unseen examples. To tackle these issues, we propose Targeted Synthetic Data Generation (TARGA), a practical framework that dynamically generates high-relevance synthetic data without manual annotation. Starting from the pertinent entities and relations of a given question, we probe for the potential relevant queries through layer-wise expansion and cross-layer combination. Then we generate corresponding natural language questions for these constructed queries to jointly serve as the synthetic demonstrations for in-context learning. Experiments on multiple knowledge base question answering (KBQA) datasets demonstrate that TARGA, using only a 7B-parameter model, substantially outperforms existing non-fine-tuned methods that utilize closed-source models, achieving notable improvements in F1 scores on GrailQA (+7.7) and KBQA-Agent (+12.2). Furthermore, TARGA also exhibits superior sample efficiency, robustness, and generalization capabilities under non-I.I.D. settings.
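The layer-wise expansion step can be illustrated on a toy knowledge base: starting from a question's entities, candidate queries grow one relation hop per layer, and each query is then verbalized into a synthetic natural-language question to serve as an in-context demonstration. In the real system an LLM performs the verbalization and a ranking step selects demonstrations; the template and mini-KB below are purely illustrative.

```python
# Toy version of TARGA-style targeted synthesis: expand candidate queries one
# relation hop at a time from a question's seed entities over a small KB, then
# pair each query with a templated question for in-context learning.
KB = {  # (head, relation) -> tail
    ("Paris", "capital_of"): "France",
    ("France", "continent"): "Europe",
    ("Paris", "located_in"): "France",
}

def expand(seeds, depth=2):
    """Layer-wise expansion: each layer appends one more relation hop."""
    layers, frontier = [], [((e,), e) for e in seeds]   # (path, current entity)
    for _ in range(depth):
        nxt = []
        for path, ent in frontier:
            for (h, r), t in KB.items():
                if h == ent:
                    nxt.append((path + (r, t), t))
        layers.append(nxt)
        frontier = nxt
    return [p for layer in layers for p, _ in layer]    # combine across layers

def verbalize(path):
    """Template the query as a synthetic question (an LLM does this in TARGA)."""
    return f"What is the {' of the '.join(reversed(path[1::2]))} of {path[0]}?"

for q in expand(["Paris"]):
    print(verbalize(q), "->", q[-1])
```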
--------------------------------------------------------------------------------------------------------