Eye On AI

Week Ending 1.21.2024

RESEARCH WATCH: 1.21.2024

SPONSORED BY

Digimarc digital watermarks invisibly guard your digital assets to protect against misuse, prove copyright ownership, and verify authenticity. In an era of artificial intelligence, don’t leave your images and other digital content exposed. Demand superior content protection and maintain trust in your brand with Digimarc.

Check out Digimarc: https://www.digimarc.com/

FinLLMs: A Framework for Financial Reasoning Dataset Generation with Large Language Models

FinLLMs introduces a method to generate financial question-answering data using large language models. This can reduce manual annotation costs and address limited data resources in the financial domain. It has potential applications in training numerical reasoning models.

Authors:  Ziqiang Yuan, Kaiyuan Wang, Shoutai Zhu, Ye Yuan, Jingya Zhou, Yanlin Zhu, Wenqi Wei

Link:  https://arxiv.org/abs/2401.10744v1

Date: 2024-01-19

Summary:

Large language models (LLMs) usually rely on extensive training datasets. In the financial domain, creating numerical reasoning datasets that mix tables and long text often involves substantial manual annotation expense. To address limited data resources and reduce annotation costs, we introduce FinLLMs, a method for generating financial question-answering data based on common financial formulas using large language models. First, we compile a list of common financial formulas and construct a graph based on the variables these formulas use. We then augment the formula set by combining formulas that share identical variables into new elements. Specifically, we take formulas obtained by manual annotation and merge those with shared variables by traversing the constructed graph. Finally, using GPT-3.5, we generate financial question-answering data that encompasses both tabular information and long textual content, building on the collected formula set. Our experiments demonstrate that synthetic data generated by FinLLMs effectively enhances the performance of several large-scale numerical reasoning models in the financial domain, outperforming two established financial question-answering benchmark datasets.
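
The formula-graph step can be illustrated with a minimal sketch. It makes two assumptions. First, two formulas "share a variable" when the output of one is an input of the other. Second, "merging" means composing them into a new formula. The formula names, the FORMULAS table, and the helpers build_graph, merge and augment are hypothetical. The GPT-3.5 question-generation step is not shown.

```python
from itertools import permutations

# Hypothetical formula set: name -> (output variable, input variables, function).
FORMULAS = {
    "gross_profit": ("gross_profit", ("revenue", "cogs"),
                     lambda revenue, cogs: revenue - cogs),
    "gross_margin": ("gross_margin", ("gross_profit", "revenue"),
                     lambda gross_profit, revenue: gross_profit / revenue),
    "net_margin": ("net_margin", ("net_income", "revenue"),
                   lambda net_income, revenue: net_income / revenue),
}


def build_graph(formulas):
    """Directed edge A -> B whenever A's output variable is one of B's inputs."""
    return [(a, b) for a, b in permutations(formulas, 2)
            if formulas[a][0] in formulas[b][1]]


def merge(formulas, a, b):
    """Compose A into B: a new formula over A's inputs plus B's remaining inputs."""
    out_a, in_a, f_a = formulas[a]
    out_b, in_b, f_b = formulas[b]
    # Order-preserving de-duplication of the merged input list.
    inputs = tuple(dict.fromkeys(in_a + tuple(v for v in in_b if v != out_a)))

    def f(**values):
        values = dict(values)
        values[out_a] = f_a(**{v: values[v] for v in in_a})  # evaluate A first
        return f_b(**{v: values[v] for v in in_b})            # then feed it into B

    return (out_b, inputs, f)


def augment(formulas):
    """Return the original formulas plus one merged formula per graph edge."""
    augmented = dict(formulas)
    for a, b in build_graph(formulas):
        augmented[f"{a}+{b}"] = merge(formulas, a, b)
    return augmented
```

Here the single edge gross_profit -> gross_margin yields a merged formula over revenue and cogs. For example, augment(FORMULAS)["gross_profit+gross_margin"][2](revenue=200.0, cogs=150.0) evaluates to 0.25. In a pipeline like the one described, each merged formula would then seed a question-generation prompt.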

--------------------------------------------------------------------------------------------------------

Dynamic Q&A of Clinical Documents with Large Language Models

Dynamic Q&A of Clinical Documents proposes a natural language interface to allow users to query clinical notes using large language models. It has applications in clinical decision-making by unlocking value in electronic health records.

Authors:  Ran Elgedawy, Sudarshan Srinivasan, Ioana Danciu

Link:  https://arxiv.org/abs/2401.10733v1

Date: 2024-01-19

Summary:

Electronic health records (EHRs) house crucial patient data in clinical notes. As these notes grow in volume and complexity, manual extraction becomes challenging. This work introduces a natural language interface that uses large language models (LLMs) for dynamic question answering over clinical notes. Our chatbot, powered by LangChain and transformer-based LLMs, lets users ask questions in natural language and receive relevant answers drawn from clinical notes. Experiments with various embedding models and advanced LLMs show that Wizard Vicuna achieves the highest accuracy, albeit with high compute demands. Model optimization, including weight quantization, improves latency by approximately 48 times. The results are promising, but challenges remain, including model hallucinations and limited evaluation across diverse medical cases. Addressing these gaps is crucial for unlocking the value in clinical notes and advancing AI-driven clinical decision-making.
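
The retrieval half of such a question-answering chatbot can be sketched as follows. This is an illustration under stated assumptions, not the authors' LangChain pipeline. The function embed is a hypothetical bag-of-words stand-in for a neural embedding model, and the notes are assumed to be pre-split into short chunks. The resulting prompt would be passed to an LLM, which is not shown.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a neural embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(question, chunks, k=2):
    """Return the k note chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, chunks, k=2):
    """Ground the LLM's answer in the retrieved chunks only."""
    context = "\n".join(retrieve(question, chunks, k))
    return (f"Answer using only these clinical notes:\n{context}\n\n"
            f"Question: {question}\nAnswer:")
```

Take the notes ["Patient reports chest pain since Monday.", "Metformin 500 mg prescribed for type 2 diabetes.", "No known drug allergies."]. The question "What dose of metformin was prescribed?" retrieves the metformin note first, because it is the only chunk that shares words with the question.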

--------------------------------------------------------------------------------------------------------

Proceedings 14th International Conference on Automated Deduction in Geometry

Proceedings 14th International Conference on Automated Deduction in Geometry summarizes the 2023 conference on the intersection of geometry and automated deduction. It highlights new research in formalizing, visualizing, and automating geometry.

Authors:  Pedro Quaresma, Zoltán Kovács

Link:  https://arxiv.org/abs/2401.10725v1

Date: 2024-01-19

Summary:

ADG is a forum to exchange ideas and views, to present research results and progress, and to demonstrate software tools at the intersection between geometry and automated deduction. The conference is held every two years. The previous editions of ADG were held in Hagenberg in 2021 (online, postponed from 2020 due to COVID-19), Nanning in 2018, Strasbourg in 2016, Coimbra in 2014, Edinburgh in 2012, Munich in 2010, Shanghai in 2008, Pontevedra in 2006, Gainesville in 2004, Hagenberg in 2002, Zurich in 2000, Beijing in 1998, and Toulouse in 1996.

The 14th edition, ADG 2023, was held in Belgrade, Serbia, on September 20-22, 2023. This edition of ADG had an additional special focus topic, Deduction in Education.

Invited Speakers: Julien Narboux, University of Strasbourg, France, "Formalisation, arithmetization and automatisation of geometry"; Filip Marić, University of Belgrade, Serbia, "Automatization, formalization and visualization of hyperbolic geometry"; Zlatan Magajna, University of Ljubljana, Slovenia, "Workshop OK Geometry"

--------------------------------------------------------------------------------------------------------

AAT: Adapting Audio Transformer for Various Acoustics Recognition Tasks

AAT adapts audio transformers for various acoustic recognition tasks via efficient fine-tuning. It has applications in optimizing audio transformers for specialized downstream tasks without compromising generality.

Authors:  Yun Liang, Hai Lin, Shaojian Qiu, Yihang Zhang

Link:  https://arxiv.org/abs/2401.10544v1

Date: 2024-01-19

Summary:

Recently, Transformers have been introduced into the field of acoustic recognition. They are pre-trained on large-scale datasets using methods such as supervised and semi-supervised learning, and they demonstrate robust generality: they fine-tune easily to downstream tasks and show more robust performance. However, the predominant fine-tuning method is still full fine-tuning, which updates all parameters during training. This not only incurs significant memory and time costs but also compromises the model's generality. Other fine-tuning methods either struggle to address this issue or fail to match the performance of full fine-tuning. We therefore conducted a comprehensive analysis of existing fine-tuning methods and proposed an efficient fine-tuning approach based on Adapter tuning, namely AAT. The core idea is to freeze the audio Transformer model and insert extra learnable Adapters, efficiently acquiring downstream task knowledge without compromising the model's original generality. Extensive experiments show that our method achieves performance comparable or even superior to full fine-tuning while optimizing only 7.118% of the parameters. It also outperforms other fine-tuning methods.
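
The freeze-and-insert idea can be sketched with a generic bottleneck adapter placed after one frozen layer. The summary does not describe AAT's actual adapter design or placement. The layer sizes, the ReLU, the zero-initialised up-projection, and the names Adapter and block are therefore assumptions. The trainable fraction computed here (exactly 1/7) is a property of this toy setup, not the paper's 7.118% figure.

```python
import numpy as np


class Adapter:
    """Bottleneck adapter: down-project, ReLU, up-project, then add a residual."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        # A zero-initialised up-projection makes the adapter an identity map at the start.
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, h):
        return h + np.maximum(h @ self.w_down, 0.0) @ self.w_up

    @property
    def num_params(self):
        return self.w_down.size + self.w_up.size


rng = np.random.default_rng(0)
dim, bottleneck = 768, 64

# Stand-in for one frozen pre-trained sub-layer; these weights are never updated.
W_frozen = rng.normal(0.0, 0.02, size=(dim, dim))
adapter = Adapter(dim, bottleneck, rng)


def block(x):
    """Frozen projection followed by the trainable adapter."""
    return adapter(x @ W_frozen)


x = rng.normal(size=(4, dim))
y = block(x)
trainable_fraction = adapter.num_params / (W_frozen.size + adapter.num_params)
```

Because the adapter starts as an identity map, the frozen model's behaviour is preserved at initialisation. During fine-tuning, only w_down and w_up would receive gradient updates.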

--------------------------------------------------------------------------------------------------------

Critical Data Size of Language Models from a Grokking Perspective

Critical Data Size of Language Models analyzes the concept of "grokking" in language model training dynamics. It aims to deepen understanding of the role of data in language model learning.

Authors:  Xuekai Zhu, Yao Fu, Bowen Zhou, Zhouhan Lin

Link:  https://arxiv.org/abs/2401.10463v1

Date: 2024-01-19

Summary:

We explore the critical data size in language models, a threshold that marks a fundamental shift from quick memorization to slow generalization. We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in language model training dynamics. We develop a grokking configuration that stably reproduces grokking on simplistic language models by rescaling initialization and weight decay. We show that generalization occurs only when language models reach a critical size. We analyze grokking both sample-wise and model-wise, verifying the proposed Data Efficiency Hypothesis. Our experiments reveal smoother phase transitions occurring at the critical dataset size for language datasets. As the model size increases, this critical point also becomes larger, indicating that larger models require more data. Our results deepen the understanding of language model training, offering a novel perspective on the role of data in the learning mechanism of language models.
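
The two knobs named above, rescaled initialization and weight decay, can be sketched in isolation. This sketch assumes a scale factor alpha > 1 applied to a standard 1/sqrt(fan_in) Gaussian init, and decoupled weight decay. The values alpha=8, lr=1e-2 and weight_decay=1.0 are illustrative, not the paper's. One common explanation of grokking attributes the delayed generalization to weight decay slowly shrinking an oversized weight norm. With a zero gradient, the loop below shows only that shrinkage.

```python
import numpy as np


def rescaled_init(shape, alpha, rng):
    """Standard 1/sqrt(fan_in) Gaussian init, multiplied by a scale factor alpha."""
    fan_in = shape[0]
    return alpha * rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=shape)


def step_with_weight_decay(w, grad, lr, weight_decay):
    """Gradient step plus decoupled weight decay, which pulls w toward zero."""
    return w - lr * grad - lr * weight_decay * w


rng = np.random.default_rng(0)
w = rescaled_init((128, 128), alpha=8.0, rng=rng)
norm_start = np.linalg.norm(w)
for _ in range(100):
    # A zero gradient isolates the effect of weight decay on the weight norm.
    w = step_with_weight_decay(w, np.zeros_like(w), lr=1e-2, weight_decay=1.0)
norm_end = np.linalg.norm(w)  # equals norm_start * (1 - 1e-2) ** 100
```

In real training the gradient term competes with the decay. The pair (alpha, weight_decay) then controls how long the model stays in the large-norm, memorizing regime.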

--------------------------------------------------------------------------------------------------------

ELRT: Efficient Low-Rank Training for Compact Convolutional Neural Networks

ELRT proposes efficient low-rank training to produce compact yet accurate convolutional neural networks from scratch. This has applications in model compression and efficient deployment.

Authors:  Yang Sui, Miao Yin, Yu Gong, Jinqi Xiao, Huy Phan, Bo Yuan

Link:  https://arxiv.org/abs/2401.10341v1

Date: 2024-01-18

Summary:

Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low rankness, has been well studied in the literature. Low-rank training, by contrast, an alternative way to train low-rank CNNs from scratch, has received little attention so far. Unlike low-rank compression, low-rank training does not need pre-trained full-rank models, and the entire training phase is always performed on the low-rank structure, which brings attractive benefits for practical applications. However, existing low-rank training solutions still face several challenges, such as a considerable accuracy drop and/or the need to update full-size models during training. In this paper, we perform a systematic investigation of low-rank CNN training. By identifying the proper low-rank format and performance-improving strategy, we propose ELRT, an efficient low-rank training solution for high-accuracy, high-compactness, low-rank CNN models. Our extensive evaluation results for training various CNNs on different datasets demonstrate the effectiveness of ELRT.
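
The core property of low-rank training (no full-rank model is ever stored or updated) can be shown with a minimal sketch. It uses a two-factor decomposition W = U V for a dense layer, which is equivalent to a 1x1 convolution. The summary does not state which low-rank format ELRT settles on, so this format, the toy rank-4 regression data, and the learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 32, 4

# Toy regression target that is exactly rank 4 (synthetic data for illustration only).
A_true = rng.normal(size=(d_in, rank)) @ rng.normal(size=(rank, d_out)) / np.sqrt(d_in)
X = rng.normal(size=(256, d_in))
Y = X @ A_true

# Low-rank layer trained from scratch: only the factors U and V are stored and updated.
U = rng.normal(0.0, 0.1, size=(d_in, rank))
V = rng.normal(0.0, 0.1, size=(rank, d_out))


def mse(U, V):
    return float(np.mean((X @ U @ V - Y) ** 2))


initial_mse = mse(U, V)
lr = 0.01
for _ in range(5000):
    H = X @ U                              # (n, rank) activations in the low-rank space
    err = H @ V - Y                        # (n, d_out) residuals
    grad_V = H.T @ err / len(X)            # gradient of 0.5 * mean-over-samples loss
    grad_U = X.T @ (err @ V.T) / len(X)
    U -= lr * grad_U
    V -= lr * grad_V

final_mse = mse(U, V)
params_low_rank = U.size + V.size          # 64*4 + 4*32 = 384
params_full = d_in * d_out                 # 2048
```

The forward pass computes (X @ U) @ V, so the full 64x32 matrix is never materialized. The layer holds 384 parameters instead of 2048.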

--------------------------------------------------------------------------------------------------------

DiffusionGPT: LLM-Driven Text-to-Image Generation System

DiffusionGPT offers a unified text-to-image generation system using diffusion models and large language models. It has the potential to advance controllable image synthesis across diverse domains.

Authors:  Jie Qin, Jie Wu, Weifeng Chen, Yuxi Ren, Huixia Li, Hefeng Wu, Xuefeng Xiao, Rui Wang, Shilei Wen

Link:  https://arxiv.org/abs/2401.10061v1

Date: 2024-01-18

Summary:

Diffusion models have opened up new avenues for the field of image generation, resulting in the proliferation of high-quality models shared on open-source platforms. However, a major challenge persists: current text-to-image systems are often unable to handle diverse inputs, or are limited to the results of a single model. Current unified attempts typically address one of two orthogonal aspects: (i) parsing diverse prompts at the input stage, or (ii) activating an expert model to produce the output. To combine the best of both worlds, we propose DiffusionGPT, which leverages large language models (LLMs) to offer a unified generation system capable of seamlessly accommodating various types of prompts and integrating domain-expert models. DiffusionGPT constructs domain-specific trees for various generative models based on prior knowledge. Given an input, the LLM parses the prompt and employs Tree-of-Thought reasoning to guide the selection of an appropriate model, thereby relaxing input constraints and ensuring exceptional performance across diverse domains. Moreover, we introduce Advantage Databases, in which the Tree-of-Thought is enriched with human feedback, aligning the model selection process with human preferences. Through extensive experiments and comparisons, we demonstrate the effectiveness of DiffusionGPT, showcasing its potential for pushing the boundaries of image synthesis in diverse domains.
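
Tree-guided model selection can be sketched as a level-by-level walk down a category tree, followed by a preference-based ranking of the candidate models at the leaf. This is a toy under loud assumptions. The tree, the keyword HINTS, the ADVANTAGE scores and all model names are hypothetical. pick_child is a keyword-overlap stand-in for the LLM call, not DiffusionGPT's Tree-of-Thought prompting.

```python
# Hypothetical model tree: category -> subcategory -> candidate leaf models.
MODEL_TREE = {
    "people": {
        "portrait": ["portrait-model-a", "portrait-model-b"],
        "anime": ["anime-model"],
    },
    "scenery": {
        "landscape": ["landscape-model"],
        "city": ["city-model"],
    },
}

# Keyword hints per node; a stand-in for the LLM's reasoning at each tree level.
HINTS = {
    "people": {"person", "woman", "man", "face", "portrait", "anime"},
    "scenery": {"mountain", "lake", "street", "skyline", "landscape", "city"},
    "portrait": {"face", "portrait", "woman", "man", "person"},
    "anime": {"anime"},
    "landscape": {"mountain", "lake", "landscape"},
    "city": {"street", "skyline", "city"},
}

# Hypothetical human-preference scores standing in for an "Advantage Database".
ADVANTAGE = {"portrait-model-a": 0.4, "portrait-model-b": 0.7}


def pick_child(prompt_words, children):
    """Stand-in for one LLM call: choose the child whose hints overlap the prompt most."""
    return max(children, key=lambda c: len(HINTS.get(c, set()) & prompt_words))


def select_model(prompt):
    """Walk the tree level by level, then rank the leaf candidates by preference score."""
    words = set(prompt.lower().replace(",", " ").split())
    node = MODEL_TREE
    while isinstance(node, dict):
        node = node[pick_child(words, list(node))]
    return max(node, key=lambda m: ADVANTAGE.get(m, 0.0))
```

For example, "A portrait of a smiling woman" descends people -> portrait. The higher preference score then picks portrait-model-b over portrait-model-a.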

--------------------------------------------------------------------------------------------------------

Aligning Large Language Models with Counterfactual DPO

Aligning Large Language Models applies counterfactual prompting to instill desirable behaviors in LLMs without human intervention. It has applications in developing responsible and ethically aligned AI systems.

Authors:  Bradley Butcher

Link:  https://arxiv.org/abs/2401.09566v2

Date: 2024-01-19

Summary:

Advancements in large language models (LLMs) have demonstrated remarkable capabilities across a diverse range of applications. These models excel in generating text completions that are contextually coherent and cover an extensive array of subjects. However, the vast datasets required for their training make aligning response styles during the pretraining and instruction tuning phases challenging. Consequently, an additional alignment phase is typically employed, wherein the model is further trained with human preference data to better align its outputs with human expectations. While this process doesn't introduce new capabilities per se, it does accentuate generation styles innate to the model. This paper explores the utilization of counterfactual prompting within the framework of Direct Preference Optimization (DPO) to align the model's style without relying on human intervention. We demonstrate that this method effectively instils desirable behaviour, mitigates undesirable ones, and encourages the model to disregard inappropriate instructions. Our findings suggest that counterfactual prompting with DPO presents a low-resource way to fine-tune LLMs to meet the demands for responsible and ethically aligned AI systems.
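
The DPO objective and one plausible way to build preference pairs by counterfactual prompting are sketched below. The loss follows the standard DPO form, computed from summed token log-probabilities under the trained policy and a frozen reference model. The pair construction is an assumption about the setup. counterfactual_pair and its generate argument are hypothetical, and beta=0.1 is illustrative.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, from summed token log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]) = softplus(-beta * margin)
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return softplus(-beta * margin)


def counterfactual_pair(instruction, generate, desired_style, undesired_style):
    """Build a preference pair without human labels (hypothetical helper).

    `generate` is any text-generation callable. The style prompts are used only to
    produce the two responses; the stored prompt is the plain instruction, so training
    pushes the model toward the desired style without being asked for it.
    """
    return {
        "prompt": instruction,
        "chosen": generate(f"{desired_style}\n\n{instruction}"),
        "rejected": generate(f"{undesired_style}\n\n{instruction}"),
    }
```

When the policy equals the reference, the margin is 0 and the loss is log 2. The loss falls as the policy raises the chosen response's likelihood relative to the reference.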

--------------------------------------------------------------------------------------------------------

Code Simulation Challenges for Large Language Models

Code Simulation Challenges investigates the ability of LLMs to simulate computer code and algorithms. It analyzes limitations and proposes methods to improve code execution simulation.

Authors:  Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge

Link:  https://arxiv.org/abs/2401.09074v1

Date: 2024-01-17

Summary:

We investigate the extent to which large language models (LLMs) can simulate the execution of computer code and algorithms. We begin by looking at straight-line programs and show that current LLMs perform poorly even on such simple programs: performance rapidly degrades with the length of the code. We then investigate the ability of LLMs to simulate programs that contain critical paths and redundant instructions. We also go beyond straight-line program simulation with sorting algorithms and nested loops, and we show that the computational complexity of a routine directly affects an LLM's ability to simulate its execution. We observe that LLMs execute instructions sequentially and with a low error margin only for short programs or standard procedures. LLMs' code simulation is in tension with their pattern recognition and memorisation capabilities: on tasks where memorisation is detrimental, we propose a novel prompting method to simulate code execution line by line. Empirically, our new Chain of Simulation (CoSm) method improves on the standard Chain of Thought prompting approach by avoiding the pitfalls of memorisation.
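
For straight-line programs, the ground truth against which an LLM's simulation is scored can be produced by executing one statement at a time and recording the variable state. The sketch below does that and also builds a line-by-line simulation prompt in the spirit of CoSm. The prompt wording, the helper names and the toy PROGRAM are hypothetical, not the paper's template.

```python
def trace_straight_line(program):
    """Run a straight-line program one statement at a time, recording variables after each.

    Uses exec, so only pass trusted toy programs.
    """
    env = {}
    trace = []
    for line in program.strip().splitlines():
        exec(line, {}, env)
        trace.append((line.strip(), dict(env)))
    return trace


def simulation_prompt(program):
    """A line-by-line simulation prompt (hypothetical wording, not the paper's template)."""
    lines = program.strip().splitlines()
    numbered = "\n".join(f"{i}: {line.strip()}" for i, line in enumerate(lines, 1))
    return (
        "Simulate the following program one line at a time. After each line, write the "
        "line number and the current value of every variable. Do not skip lines.\n\n"
        + numbered
    )


PROGRAM = """
a = 3
b = a + 2
a = b * a
c = a - b
"""
```

After the last line, the trace shows a = 15, b = 5 and c = 10. An LLM's step-by-step answer to simulation_prompt(PROGRAM) can be compared against each recorded state, not only against the final values.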

--------------------------------------------------------------------------------------------------------

AI Thrust: Ranking Emerging Powers for Tech Startup Investment in Latin America

AI Thrust ranks Latin American countries by potential for AI growth and success. It provides useful analysis for policymakers, investors and businesses interested in AI development in Latin America.

Authors:  Abraham Ramos Torres, Laura N Montoya

Link:  https://arxiv.org/abs/2401.09056v1

Date: 2024-01-17

Summary:

Artificial intelligence (AI) is rapidly transforming the global economy, and Latin America is no exception. In recent years, there has been a growing interest in AI development and implementation in the region. This paper presents a ranking of Latin American (LATAM) countries based on their potential to become emerging powers in AI. The ranking is based on three pillars: infrastructure, education, and finance. Infrastructure is measured by the availability of electricity, high-speed internet, the quality of telecommunications networks, and the availability of supercomputers. Education is measured by the quality of education and the research status. Finance is measured by the cost of investments, history of investments, economic metrics, and current implementation of AI.

While Brazil, Chile, and Mexico have established themselves as major players in the AI industry in Latin America, our ranking identifies new emerging powers in the region. According to the results, Argentina, Colombia, Uruguay, Costa Rica, and Ecuador are leading as new emerging powers in AI in Latin America. These countries have strong education systems, well-developed infrastructure, and growing financial resources. The ranking provides a useful tool for policymakers, investors, and businesses interested in AI development in Latin America. It can help to identify emerging LATAM countries with the greatest potential for AI growth and success.
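
The summary does not say how the three pillars are normalized or combined. The sketch below therefore shows one common way to build a composite index: min-max scaling of each pillar, then a weighted sum (equal weights by default). The scaling, the weights, and the countries "X", "Y", "Z" with their scores are all hypothetical.

```python
def min_max(values):
    """Scale {country: raw_score} to [0, 1]; all zeros if every score is equal."""
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in values.items()}


def rank_countries(pillars, weights=None):
    """pillars: {pillar_name: {country: raw_score}}; returns (country, score), best first."""
    names = list(pillars)
    weights = weights or {p: 1 / len(names) for p in names}
    normed = {p: min_max(pillars[p]) for p in names}
    countries = next(iter(pillars.values())).keys()
    composite = {c: sum(weights[p] * normed[p][c] for p in names) for c in countries}
    return sorted(composite.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy scores, not data from the paper.
TOY = {
    "infrastructure": {"X": 10, "Y": 20, "Z": 30},
    "education": {"X": 5, "Y": 5, "Z": 1},
    "finance": {"X": 3, "Y": 9, "Z": 6},
}
```

With equal weights, rank_countries(TOY) orders the toy countries Y, Z, X. Y scores well on two pillars, while Z leads only on infrastructure.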

--------------------------------------------------------------------------------------------------------

Learning to detect cloud and snow in remote sensing images from noisy labels

Learning to detect cloud and snow proposes methods to address label noise in remote sensing image datasets for cloud and snow detection. This can mitigate overfitting and performance assessment biases.

Authors:  Zili Liu, Hao Chen, Wenyuan Li, Keyan Chen, Zipeng Qi, Chenyang Liu, Zhengxia Zou, Zhenwei Shi

Link:  https://arxiv.org/abs/2401.08932v1

Date: 2024-01-17

Summary:

Detecting clouds and snow in remote sensing images is an essential preprocessing task for remote sensing imagery. Previous works draw inspiration from semantic segmentation models in computer vision, with most research focusing on improving model architectures to enhance detection performance. However, unlike natural images, the complexity of scenes and the diversity of cloud types in remote sensing images result in many inaccurate labels in cloud and snow detection datasets, introducing unnecessary noise into the training and testing processes. By constructing a new dataset and proposing a novel training strategy based on the curriculum learning paradigm, we guide the model to overfit less to noisy labels. Additionally, we design a more appropriate model performance evaluation method that alleviates the performance assessment bias caused by noisy labels. By conducting experiments on models with UNet and Segformer, we have validated the effectiveness of our proposed method. This paper is the first to consider the impact of label noise on the detection of clouds and snow in remote sensing images.
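
The paper's exact curriculum is not described in the summary. The sketch below shows a widely used small-loss curriculum for noisy labels, named plainly as a substitute. It trains first on the lowest-loss samples, which are more likely to be correctly labelled, and gradually admits the rest. The linear schedule, the start fraction of 0.5, and both helper names are assumptions.

```python
import numpy as np


def curriculum_keep_fraction(epoch, total_epochs, start=0.5, end=1.0):
    """Linearly grow the fraction of samples used for training from `start` to `end`."""
    t = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + (end - start) * t


def select_easy_samples(per_sample_loss, keep_fraction):
    """Indices of the lowest-loss samples, which are more likely to carry clean labels."""
    k = max(1, int(round(keep_fraction * len(per_sample_loss))))
    return np.argsort(per_sample_loss)[:k]
```

In a training loop, per_sample_loss would be the per-image (or per-pixel) segmentation loss from the current model. The selected indices form the batch mask for the epoch.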

--------------------------------------------------------------------------------------------------------

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

NOTSOFAR-1 Challenge introduces datasets, tasks and a baseline system to advance research in distant speech recognition for multi-speaker scenarios. It provides key resources to apply data-driven methods in this domain.

Authors:  Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe`er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

Link:  https://arxiv.org/abs/2401.08887v1

Date: 2024-01-16

Summary:

We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings ("NOTSOFAR-1") Challenge alongside datasets and a baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets. The first is a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics; it is recorded across 30 conference rooms, featuring 4-8 attendees and a total of 35 unique speakers. The second is a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization and incorporating 15,000 real acoustic transfer functions. The tasks focus on single-device DASR, where multi-channel devices always share the same known geometry. This is aligned with common setups in actual conference rooms and avoids the technical complexities associated with multi-device tasks. It also allows for the development of geometry-specific solutions. The NOTSOFAR-1 Challenge aims to advance research in the field of distant conversational speech recognition, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmarking datasets.
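
A simulated far-field training example is commonly made by convolving dry speech with one measured transfer function per microphone and adding noise at a target signal-to-noise ratio. The sketch below shows that generic recipe. It is not the NOTSOFAR-1 simulation pipeline, whose details the summary does not give. The function name and array shapes are assumptions, and multi-speaker overlap is omitted.

```python
import numpy as np


def simulate_far_field(dry, rirs, noise, snr_db):
    """dry: (T,) clean speech; rirs: (C, L), one transfer function per microphone;
    noise: (C, >=T) multi-channel noise. Returns a (C, T) noisy reverberant mixture."""
    T = len(dry)
    # Convolve with each microphone's transfer function; keep the first T samples.
    reverberant = np.stack([np.convolve(dry, h)[:T] for h in rirs])
    noise = noise[:, :T]
    p_signal = np.mean(reverberant ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(p_signal / p_scaled_noise) == snr_db.
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return reverberant + gain * noise
```

Because every channel shares one array geometry, the same function applies unchanged to any device in the known-geometry track. Only the set of rirs changes per room and position.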

--------------------------------------------------------------------------------------------------------

Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive

Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive integrates adversarial supervision into layout-to-image diffusion models to enhance layout faithfulness. It has applications in domain generalization of semantic segmentation models.

Authors:  Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva

Link:  https://arxiv.org/abs/2401.08815v1

Date: 2024-01-16

Summary:

Despite the recent advances in large-scale diffusion models, little progress has been made on the layout-to-image (L2I) synthesis task. Current L2I models either suffer from poor editability via text or weak alignment between the generated image and the input layout. This limits their usability in practice. To mitigate this, we propose to integrate adversarial supervision into the conventional training pipeline of L2I diffusion models (ALDM). Specifically, we employ a segmentation-based discriminator which provides explicit feedback to the diffusion generator on the pixel-level alignment between the denoised image and the input layout. To encourage consistent adherence to the input layout over the sampling steps, we further introduce the multistep unrolling strategy. Instead of looking at a single timestep, we unroll a few steps recursively to imitate the inference process, and ask the discriminator to assess the alignment of denoised images with the layout over a certain time window. Our experiments show that ALDM achieves layout faithfulness in the generated images while allowing broad editability via text prompts. Moreover, we showcase its usefulness for practical applications: by synthesizing target distribution samples via text control, we improve domain generalization of semantic segmentation models by a large margin (~12 mIoU points).
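
The generator-side signal from a segmentation-based discriminator, and its multistep unrolled version, can be sketched as follows. The discriminator segments the denoised image, and a per-pixel cross-entropy against the input layout measures misalignment. The unrolled variant averages that loss over a few recursive denoising steps. denoise_step and discriminator are hypothetical callables. The discriminator's own training objective and the diffusion loss are not shown.

```python
import numpy as np


def log_softmax(logits, axis=0):
    """Numerically stable log-softmax along the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def pixel_alignment_loss(disc_logits, layout):
    """disc_logits: (num_classes, H, W) from the discriminator applied to a denoised image;
    layout: (H, W) integer class map. Mean per-pixel cross-entropy."""
    logp = log_softmax(disc_logits, axis=0)
    h, w = layout.shape
    # Pick log p(layout[i, j]) at every pixel (i, j).
    picked = logp[layout, np.arange(h)[:, None], np.arange(w)[None, :]]
    return float(-picked.mean())


def unrolled_alignment_loss(x_t, layout, denoise_step, discriminator, num_steps):
    """Multistep unrolling: denoise a few steps and average the loss over that window."""
    x, losses = x_t, []
    for _ in range(num_steps):
        x = denoise_step(x)
        losses.append(pixel_alignment_loss(discriminator(x), layout))
    return float(np.mean(losses))
```

A discriminator with no opinion (uniform logits) gives a loss of log(num_classes). The loss drops as the segmentation of the denoised image agrees with the layout.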

--------------------------------------------------------------------------------------------------------

Resolving Ethics Trade-offs in Implementing Responsible AI

Resolving Ethics Trade-offs in Implementing Responsible AI proposes a framework for managing tensions between AI ethics principles in system implementation. It aims to help AI systems meet potential regulatory requirements for responsible AI.

Authors:  Conrad Sanderson, Emma Schleiger, David Douglas, Petra Kuhnert, Qinghua Lu

Link:  https://arxiv.org/abs/2401.08103v1

Date: 2024-01-16

Summary:

While the operationalisation of high-level AI ethics principles into practical AI/ML systems has made progress, there is still a theory-practice gap in managing tensions between the underlying AI ethics aspects. We cover five approaches for addressing the tensions via trade-offs, ranging from rudimentary to complex. The approaches differ in the types of considered context, scope, methods for measuring contexts, and degree of justification. None of the approaches is likely to be appropriate for all organisations, systems, or applications. To address this, we propose a framework which consists of: (i) proactive identification of tensions, (ii) prioritisation and weighting of ethics aspects, (iii) justification and documentation of trade-off decisions. The proposed framework aims to facilitate the implementation of well-rounded AI/ML systems that are appropriate for potential regulatory requirements.

--------------------------------------------------------------------------------------------------------

Inpainting Normal Maps for Lightstage data

Inpainting Normal Maps for Lightstage Data introduces a GAN-based method to plausibly fill missing areas in normal maps derived from lightstage capture. It establishes a foundation for performance capture applications.

Authors:  Hancheng Zuo, Bernard Tiddeman

Link:  https://arxiv.org/abs/2401.08099v1

Date: 2024-01-16

Summary:

This study introduces a novel method for inpainting normal maps using a generative adversarial network (GAN). Normal maps, often derived from a lightstage, are crucial in performance capture but can have obscured areas due to movement (e.g., by arms, hair, or props). Inpainting fills these missing areas with plausible data. Our approach extends previous general image inpainting techniques, employing a bow tie-like generator network and a discriminator network, with alternating training phases. The generator aims to synthesize images aligning with the ground truth and deceive the discriminator, which differentiates between real and processed images. Periodically, the discriminator undergoes retraining to enhance its ability to identify processed images. Importantly, our method adapts to the unique characteristics of normal map data, necessitating modifications to the loss function. We utilize a cosine loss instead of mean squared error loss for generator training. Limited training data availability, even with synthetic datasets, demands significant augmentation, considering the specific nature of the input data. This includes appropriate image flipping and in-plane rotations to accurately alter normal vectors. Throughout training, we monitored key metrics such as average loss, Structural Similarity Index Measure (SSIM), and Peak Signal-to-Noise Ratio (PSNR) for the generator, along with average loss and accuracy for the discriminator. Our findings suggest that the proposed model effectively generates high-quality, realistic inpainted normal maps, suitable for performance capture applications. These results establish a foundation for future research, potentially involving more advanced networks and comparisons with inpainting of source images used to create the normal maps.
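
The cosine loss and the normal-aware augmentations mentioned above can be sketched as follows. Flipping or rotating a normal map must also transform the stored vectors, not only the pixel grid. The sketch assumes normals stored as (H, W, 3) arrays with x pointing right, y pointing up and z toward the viewer (an OpenGL-style convention; a y-down convention would flip the signs). It also restricts rotations to multiples of 90 degrees to avoid resampling. The function names are hypothetical.

```python
import numpy as np


def cosine_loss(pred, target, mask=None, eps=1e-8):
    """Mean (1 - cosine similarity) between predicted and target normals of shape (H, W, 3)."""
    p = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    t = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    cos = np.sum(p * t, axis=-1)
    if mask is not None:  # e.g. restrict the loss to the inpainted region
        return float(np.mean(1.0 - cos[mask]))
    return float(np.mean(1.0 - cos))


def flip_normals_horizontal(n):
    """Mirror the image left-right and negate the x component of every normal."""
    out = n[:, ::-1].copy()
    out[..., 0] *= -1.0
    return out


def rotate_normals_90(n, k=1):
    """Rotate the image k*90 degrees counter-clockwise and rotate (x, y) to match."""
    out = np.rot90(n, k=k, axes=(0, 1)).copy()
    for _ in range(k % 4):
        x, y = out[..., 0].copy(), out[..., 1].copy()
        out[..., 0], out[..., 1] = -y, x  # CCW rotation in an x-right, y-up frame
    return out
```

A normal pointing right becomes a normal pointing up after a 90-degree counter-clockwise rotation of the image. Rotating only the pixels would instead leave surfaces lit as if from the wrong side.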

--------------------------------------------------------------------------------------------------------

Image Translation as Diffusion Visual Programmers

Image Translation as Diffusion Visual Programmers proposes Diffusion Visual Programmers for transparent image translation via sequences of visual programs. It promises broader applications in harmonizing image translation with cognitive intelligence.

Authors:  Cheng Han, James C. Liang, Qifan Wang, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Ying Nian Wu, Dongfang Liu

Link:  https://arxiv.org/abs/2401.09742v1

Date: 2024-01-18

Summary:

We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion model within the GPT architecture, orchestrating a coherent sequence of visual programs (i.e., computer vision models) for various pro-symbolic steps, which span RoI identification, style transfer, and position manipulation, facilitating transparent and controllable image translation processes. Extensive experiments demonstrate DVP's remarkable performance, surpassing concurrent methods. This success can be attributed to several key features of DVP. First, DVP achieves condition-flexible translation via instance normalization, enabling the model to eliminate sensitivity caused by manual guidance and optimally focus on textual descriptions for high-quality content generation. Second, the framework enhances in-context reasoning by deciphering intricate high-dimensional concepts in feature spaces into more accessible low-dimensional symbols (e.g., [Prompt], [RoI object]), allowing for localized, context-free editing while maintaining overall coherence. Last but not least, DVP improves systemic controllability and explainability by offering explicit symbolic representations at each programming stage, empowering users to intuitively interpret and modify results. Our research marks a substantial step towards harmonizing artificial image translation processes with cognitive intelligence, promising broader applications.
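
The visual-programming idea is an image edit expressed as an explicit sequence of symbolic steps whose intermediate state can be inspected. It can be illustrated with a toy. This is not DVP: the "image" is a small integer array, and restyle is a trivial recoloring standing in for diffusion-based style transfer. The operation names, the OPS registry and run_program are hypothetical.

```python
import numpy as np


def find_roi(img, state, value):
    """RoI identification: bounding box of all pixels equal to `value`."""
    ys, xs = np.nonzero(img == value)
    state["roi"] = (ys.min(), ys.max() + 1, xs.min(), xs.max() + 1)
    return img


def restyle(img, state, new_value):
    """Stand-in for style transfer: recolor the non-background pixels inside the RoI."""
    y0, y1, x0, x1 = state["roi"]
    out = img.copy()
    region = out[y0:y1, x0:x1]          # a view, so edits write through to `out`
    region[region != 0] = new_value
    return out


def move(img, state, dy, dx):
    """Position manipulation: shift the RoI contents and clear the old location."""
    y0, y1, x0, x1 = state["roi"]
    out = img.copy()
    patch = img[y0:y1, x0:x1].copy()
    out[y0:y1, x0:x1] = 0
    out[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = patch
    state["roi"] = (y0 + dy, y1 + dy, x0 + dx, x1 + dx)
    return out


OPS = {"find_roi": find_roi, "restyle": restyle, "move": move}


def run_program(img, program):
    """Execute (op, kwargs) steps in order, logging the symbolic state after each one."""
    state, log = {}, []
    for op, kwargs in program:
        img = OPS[op](img, state, **kwargs)
        log.append(f"{op}({kwargs}) -> roi={state.get('roi')}")
    return img, log
```

The log is what makes the edit explainable. Each step names its operation, arguments and resulting region, so a user can change one step and rerun the program.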

--------------------------------------------------------------------------------------------------------

Enhancing Robustness of LLM-Synthetic Text Detectors for Academic Writing: A Comprehensive Analysis

Enhancing Robustness of LLM-Synthetic Text Detectors proposes methods to improve detector robustness in academic writing scenarios. It addresses issues related to the misuse of LLMs.

Authors:  Zhicheng Dou, Yuchen Guo, Ching-Chun Chang, Huy H. Nguyen, Isao Echizen

Link:  https://arxiv.org/abs/2401.08046v1

Date: 2024-01-16

Summary:

The emergence of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4) used by ChatGPT, has profoundly impacted the academic and broader community. While these models offer numerous advantages in terms of revolutionizing work and study methods, they have also garnered significant attention due to their potential negative consequences. One example is generating academic reports or papers with little to no human contribution. Consequently, researchers have focused on developing detectors to address the misuse of LLMs. However, most existing methods prioritize achieving higher accuracy on restricted datasets, neglecting the crucial aspect of generalizability. This limitation hinders their practical application in real-life scenarios where reliability is paramount. In this paper, we present a comprehensive analysis of the impact of prompts on the text generated by LLMs and highlight the potential lack of robustness in one of the current state-of-the-art GPT detectors. To mitigate these issues concerning the misuse of LLMs in academic writing, we propose a reference-based Siamese detector named Synthetic-Siamese which takes a pair of texts, one as the inquiry and the other as the reference. Our method effectively addresses the lack of robustness of previous detectors (OpenAI detector and DetectGPT) and significantly improves on the baseline performance in realistic academic writing scenarios by approximately 67% to 95%.
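
The reference-based Siamese structure can be sketched in a few lines. One shared encoder embeds both the inquiry and the reference text, symmetric pair features are formed, and a logistic head scores the pair. The summary does not describe Synthetic-Siamese's encoder, features or training labels. The hashed character-trigram encoder, the [|a - b|, a * b] features and the untrained weights w are all assumptions, so the score here is meaningless until trained.

```python
import zlib

import numpy as np

DIM = 256


def encode(text):
    """Shared encoder (hypothetical stand-in): hashed character trigrams, L2-normalised."""
    v = np.zeros(DIM)
    t = text.lower()
    for i in range(len(t) - 2):
        v[zlib.crc32(t[i:i + 3].encode()) % DIM] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v


def pair_features(inquiry, reference):
    """Symmetric pair features: element-wise |difference| and product of the embeddings."""
    ea, eb = encode(inquiry), encode(reference)
    return np.concatenate([np.abs(ea - eb), ea * eb])


def siamese_score(inquiry, reference, w, bias):
    """Logistic head over the pair features; w and bias would be learned in training."""
    z = pair_features(inquiry, reference) @ w + bias
    return float(1.0 / (1.0 + np.exp(-z)))
```

The two branches share weights and the features are symmetric, so swapping inquiry and reference leaves the score unchanged. This is a defining property of the Siamese design.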

--------------------------------------------------------------------------------------------------------

Lessons Learned from Designing an Open-Source Automated Feedback System for STEM Education

Lessons Learned from Designing an Open-Source Automated Feedback System summarizes findings from developing an open-source automated feedback system for STEM education. It reveals insights into user acceptance and intentions to use such systems.

Authors:  Steffen Steinert, Lars Krupp, Karina E. Avila, Anke S. Janssen, Verena Ruf, David Dzsotjan, Christian De Schryver, Jakob Karolus, Stefan Ruzika, Karen Joisten, Paul Lukowicz, Jochen Kuhn, Norbert Wehn, Stefan Küchemann

Link:  https://arxiv.org/abs/2401.10531v1

Date: 2024-01-19

Summary:

As distance learning becomes increasingly important and artificial intelligence tools continue to advance, automated systems for individual learning have attracted significant attention. However, the scarcity of open-source online tools that are capable of providing personalized feedback has restricted the widespread implementation of research-based feedback systems. In this work, we present RATsApp, an open-source automated feedback system (AFS) that incorporates research-based features such as formative feedback. The system focuses on core STEM competencies such as mathematical competence, representational competence, and data literacy. It also allows lecturers to monitor students' progress. We conducted a survey based on the technology acceptance model (TAM2) among a set of students (N=64). Our findings confirm the applicability of the TAM2 framework, revealing that factors such as the relevance of the studies, output quality, and ease of use significantly influence the perceived usefulness. We also found a linear relation between the perceived usefulness and the intention to use, which in turn is a significant predictor of the frequency of use. Moreover, the formative feedback feature of RATsApp received positive feedback, indicating its potential as an educational tool. Furthermore, as an open-source platform, RATsApp encourages public contributions to its ongoing development, fostering a collaborative approach to improve educational tools.

--------------------------------------------------------------------------------------------------------

Optimisation in Neurosymbolic Learning Systems

Optimisation in Neurosymbolic Learning Systems explores key research questions on integrating neural networks with symbolic AI, using fuzzy reasoning and probabilistic reasoning to connect the two. Potential applications include improving the explainability of models and verifying the correctness of trained systems.

Authors:  Emile van Krieken

Link:  https://arxiv.org/abs/2401.10819v1

Date: 2024-01-19

Summary:

Neurosymbolic AI aims to integrate deep learning with symbolic AI. This integration holds many promises, such as decreasing the amount of data required to train a neural network, improving the explainability and interpretability of answers given by models, and verifying the correctness of trained systems. We study neurosymbolic learning, where we have both data and background knowledge expressed using symbolic languages. How do we connect the symbolic and neural components to communicate this knowledge? One option is fuzzy reasoning, which studies degrees of truth. For example, being tall is not a binary concept. Alternatively, probabilistic reasoning studies the probability that something is true or will happen. Our first research question studies how different forms of fuzzy reasoning combine with learning. We find surprising results, such as a connection to the Raven paradox, which states that we confirm "ravens are black" when we observe a green apple. In this study, we did not use the background knowledge when we deployed our models after training. In our second research question, we studied how to use background knowledge in deployed models and developed a new neural network layer based on fuzzy reasoning. Probabilistic reasoning is a natural fit for neural networks, which we usually train to be probabilistic. However, probabilistic reasoning is expensive to compute and does not scale well to large tasks. In our third research question, we study how to connect probabilistic reasoning with neural networks by sampling to estimate averages, while in the final research question, we study scaling probabilistic neurosymbolic learning to much larger problems than before. Our insight is to train a neural network with synthetic data to predict the result of probabilistic reasoning.
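To make the contrast between fuzzy and probabilistic reasoning concrete, the sketch below computes fuzzy truth values with three standard t-norms (Gödel, product, Łukasiewicz) and the Reichenbach implication, and estimates a probability by sampling truth assignments and averaging, in the spirit of "sampling to estimate averages". These are textbook operators chosen for illustration, not code from the thesis; the function names, the independence assumption and the "raven"/"black" variables are hypothetical.

```python
import random

# Fuzzy conjunctions (t-norms): combine degrees of truth in [0, 1].
def godel_and(a: float, b: float) -> float:
    return min(a, b)

def product_and(a: float, b: float) -> float:
    return a * b

def lukasiewicz_and(a: float, b: float) -> float:
    return max(0.0, a + b - 1.0)

def reichenbach_implies(a: float, b: float) -> float:
    """Fuzzy implication a -> b = 1 - a + a*b (one common choice)."""
    return 1.0 - a + a * b

def mc_probability(probs, formula, n_samples=100_000, seed=0) -> float:
    """Probabilistic reasoning by sampling: draw truth assignments from
    independent Bernoulli variables (an assumption) and average how often
    the formula holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        world = {name: rng.random() < p for name, p in probs.items()}
        hits += formula(world)  # True counts as 1, False as 0
    return hits / n_samples
```

For example, with degrees 0.9 for "raven" and 0.8 for "black", `reichenbach_implies(0.9, 0.8)` gives 0.82, which equals the exact probability of "not raven or black" when the two variables are independent, and `mc_probability({"raven": 0.9, "black": 0.8}, lambda w: (not w["raven"]) or w["black"])` approaches that value as the number of samples grows. Exact probabilistic inference generally requires enumerating worlds, which is why sampling becomes attractive for large tasks.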

--------------------------------------------------------------------------------------------------------

DeepEdit: Knowledge Editing as Decoding with Constraints

DeepEdit formulates knowledge editing for LLMs as decoding with constraints, aiming to improve the coherence of reasoning, relevance to the question, and awareness of updated knowledge. It works with black-box LLMs and has applications in controlled knowledge revision for question answering.

Authors:  Yiwei Wang, Muhao Chen, Nanyun Peng, Kai-Wei Chang

Link:  https://arxiv.org/abs/2401.10471v1

Date: 2024-01-19

Summary:

We develop a new perspective on knowledge editing for large language models (LLMs) as decoding with constraints. We propose DeepEdit (Depth-first Search based Progressive Decoding for Knowledge Editing), a neuro-symbolic method that improves knowledge editing with better coherence of reasoning, relevance to the question, and awareness of updated knowledge. DeepEdit can be flexibly applied to all black-box LLMs: it does not require any access to the model parameters, representations, or output vocabulary distributions. DeepEdit progressively produces high-quality reasoning steps towards effective knowledge editing. It uses a depth-first search to revise the LLM's output, which improves the output's informativeness with respect to the input question and its awareness of the updated knowledge. Qualitatively, DeepEdit effectively controls LLMs to produce more succinct reasoning in accord with knowledge editing. Quantitatively, DeepEdit yields significant gains on MQuaKE, a challenging multi-hop question-answering dataset with knowledge editing. We release the source code at https://github.com/wangywUST/DeepEdit.
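The phrase "depth-first search based progressive decoding" can be pictured as a search over reasoning chains: at each step a black-box generator proposes candidate steps, only candidates that pass the constraints are expanded, and the search backtracks when a branch dead-ends. The sketch below is a generic illustration under that reading, not DeepEdit's implementation (see the linked repository for that). `propose_steps`, `satisfies_constraints` and `is_final` are hypothetical stand-ins for LLM calls and constraint checks, and the toy "CEO of Acme" edit is invented purely for the example.

```python
from typing import Callable, List, Optional

# Hypothetical stand-ins: in a real system these would wrap LLM calls and
# constraint checks; here they are plain callables.
ProposeFn = Callable[[str, List[str]], List[str]]
CheckFn = Callable[[str, List[str], str], bool]

def dfs_decode(question: str, propose_steps: ProposeFn,
               satisfies_constraints: CheckFn,
               is_final: Callable[[str], bool],
               max_depth: int = 5) -> Optional[List[str]]:
    """Depth-first search over reasoning chains with backtracking.

    Only steps that pass the constraint check are expanded; the first chain
    whose last step is a final answer is returned, or None if none is found
    within max_depth steps.
    """
    def search(chain: List[str]) -> Optional[List[str]]:
        if chain and is_final(chain[-1]):
            return chain
        if len(chain) >= max_depth:
            return None
        for step in propose_steps(question, chain):
            if satisfies_constraints(question, chain, step):
                found = search(chain + [step])
                if found is not None:
                    return found
        return None  # no valid continuation: backtrack
    return search([])

# Toy usage with a hypothetical edited fact and canned candidate steps.
EDITS = {"CEO of Acme": "Bob"}
CANDIDATES = {
    0: ["The CEO of Acme is Alice.", "The CEO of Acme is Bob."],
    1: ["Bob was born in Paris. Answer: Paris"],
}

def propose(question: str, chain: List[str]) -> List[str]:
    return CANDIDATES.get(len(chain), [])

def ok(question: str, chain: List[str], step: str) -> bool:
    # Reject steps that state an outdated value for an edited fact.
    return not ("CEO of Acme" in step and EDITS["CEO of Acme"] not in step)

def final(step: str) -> bool:
    return "Answer:" in step

result = dfs_decode("Where was the CEO of Acme born?", propose, ok, final)
```

Here the first candidate contradicts the edited fact and is pruned, so `result` is the chain through "The CEO of Acme is Bob." ending in the answer step. Because the search only consumes generated text and a yes/no check, it needs no access to model weights or token distributions, which matches the black-box setting the abstract describes.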

--------------------------------------------------------------------------------------------------------


EYE ON A.I. GETS READERS UP TO DATE ON THE LATEST FUNDING NEWS AND RELATED ISSUES. SUBSCRIBE FOR THE WEEKLY NEWSLETTER.