FAMOSE: A ReAct Approach to Automated Feature Discovery

Brave API

FAMOSE is an approach that leverages ReAct agents to automate feature discovery in tabular machine learning tasks, enabling autonomous feature augmentation and selection. By integrating the ReAct (Reasoning and Acting) framework, FAMOSE reduces reliance on domain expertise, allowing the agent to iteratively reason about potential features, act by generating or transforming them, and observe the impact on model performance. This process follows the core ReAct loop of Thought → Action → Observation, where the agent combines chain-of-thought reasoning with external tool use to navigate complex decision spaces.
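The Thought → Action → Observation loop described above can be sketched in a few lines. This is only an illustrative skeleton, not FAMOSE's implementation: the names `Step`, `react_loop`, `scripted_policy`, and the `column_range` tool are hypothetical, the table is toy data, and a scripted function stands in for the LLM that would normally produce each thought and action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    """One Thought -> Action -> Observation record in the trace."""
    thought: str
    action: str
    argument: str
    observation: str


# A policy maps the trace so far to (thought, action, argument).
Policy = Callable[[List[Step]], Tuple[str, str, str]]
Tool = Callable[[str], str]


def react_loop(policy: Policy, tools: Dict[str, Tool], max_steps: int = 5) -> List[Step]:
    """Run the loop until the policy emits 'finish' or max_steps is reached."""
    trace: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(trace)
        if action == "finish":
            trace.append(Step(thought, action, argument, observation=""))
            break
        tool = tools.get(action)
        observation = tool(argument) if tool else f"unknown action: {action}"
        trace.append(Step(thought, action, argument, observation))
    return trace


# Toy table (assumed data, for illustration only).
TABLE = {"income": [50.0, 80.0, 65.0], "debt": [0.0, 10.0, 5.0]}


def column_range(name: str) -> str:
    """Hypothetical tool: report the min and max of a column."""
    col = TABLE[name]
    return f"min={min(col)}, max={max(col)}"


def scripted_policy(trace: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: inspect 'debt', then pick a ratio feature."""
    if not trace:
        return ("Check the range of 'debt' before dividing by it.",
                "column_range", "debt")
    if trace[-1].observation.startswith("min=0.0"):
        return ("'debt' contains zero, so shift the denominator.",
                "finish", "income / (debt + 1)")
    return ("'debt' is never zero, a plain ratio is safe.",
            "finish", "income / debt")


trace = react_loop(scripted_policy, {"column_range": column_range})
for step in trace:
    print(step)
```

Because every step is stored in `trace`, the sequence of thoughts, actions, and observations can be inspected after the run.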

The agent’s reasoning traces provide transparency into its decision-making, making it possible to audit and validate each step of the feature engineering process. FAMOSE exemplifies how ReAct-based agents can be applied beyond question answering or web search, extending into automated machine learning (AutoML) and agentic data preprocessing workflows. By interacting with data transformation tools and evaluation environments, the agent dynamically constructs and refines features, mimicking how human data scientists experiment and iterate. This approach aligns with broader trends in agentic AI, where systems like MLAgentBench also employ ReAct and Reflexion frameworks to solve diverse ML problems autonomously.

Generated Feb 22, 2026

Cerebras Thinking

FAMOSE introduces a novel framework for automating feature engineering in tabular machine learning by leveraging Large Language Model (LLM) agents guided by the ReAct (Reasoning + Acting) paradigm. Unlike traditional AutoML systems that rely on exhaustive search algorithms or static libraries of transformations, FAMOSE employs an autonomous agent that iteratively reasons about the dataset, hypothesizes potential feature augmentations, and executes code to implement and validate them. The system operates within a closed loop: the agent analyzes data distributions and correlation patterns, generates feature creation strategies (such as mathematical transformations or interaction terms), and evaluates the impact of these features on downstream model performance. This dynamic approach allows the system to discover complex, non-intuitive feature representations that standard automated methods might miss.
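The closed loop of proposing candidate features and keeping only those that help can be illustrated with a small sketch. The scoring rule here is an assumption made for brevity: absolute Pearson correlation with the target stands in for downstream model performance, which is what the summary says the system actually evaluates. The function names, the `min_gain` threshold, and the toy data are all hypothetical.

```python
import math
from typing import Callable, Dict, List

Row = Dict[str, float]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(
    rows: List[Row],
    target: List[float],
    base_columns: List[str],
    candidates: Dict[str, Callable[[Row], float]],
    min_gain: float = 0.01,
) -> Dict[str, float]:
    """Keep candidates whose proxy score beats the best base column by min_gain."""
    baseline = max(
        abs(pearson([row[col] for row in rows], target)) for col in base_columns
    )
    kept: Dict[str, float] = {}
    for name, fn in candidates.items():
        score = abs(pearson([fn(row) for row in rows], target))
        if score >= baseline + min_gain:
            kept[name] = score
    return kept


# Toy data: the target is constructed as the interaction a * b.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
rows = [{"a": x, "b": y} for x, y in zip(a, b)]
target = [x * y for x, y in zip(a, b)]

candidates = {
    "a_plus_b": lambda r: r["a"] + r["b"],
    "a_times_b": lambda r: r["a"] * r["b"],
    "a_over_b": lambda r: r["a"] / r["b"],
}

kept = select_features(rows, target, ["a", "b"], candidates)
print(sorted(kept.items(), key=lambda item: -item[1]))
```

On this toy data the interaction term `a_times_b` ranks first, since the target was built from that product. A realistic setup would replace the correlation proxy with a cross-validated model score.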

The key contribution of this work lies in its demonstration that LLMs can function as competent data scientists, capable of writing and debugging Python code to manipulate data structures rather than merely selecting from pre-defined options. By utilizing the ReAct framework, the agent maintains a memory of its actions and can self-correct or pivot strategies based on validation feedback, significantly reducing the reliance on manual domain expertise. This is a critical advancement for the field of AutoML because it lowers the barrier to entry for high-performance tabular modeling. It enables practitioners with deep learning knowledge but limited statistical intuition to achieve state-of-the-art results while accelerating the iterative cycle of model development.
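The debugging behaviour mentioned above, where failed code becomes feedback for the next attempt, might look roughly like the following. This is a sketch under stated assumptions: the attempts are pre-scripted rather than generated by an LLM after reading the error, and `try_feature`, `first_working`, and the column names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]
Feature = Callable[[Row], float]


def try_feature(fn: Feature, rows: List[Row]) -> Tuple[Optional[List[float]], str]:
    """Run a candidate feature; return its values or the error as text."""
    try:
        return [fn(row) for row in rows], "ok"
    except Exception as exc:  # the error text becomes the agent's observation
        return None, f"{type(exc).__name__}: {exc}"


def first_working(
    attempts: List[Tuple[str, Feature]], rows: List[Row]
) -> Tuple[Optional[str], Optional[List[float]], List[str]]:
    """Try attempts in order, collecting feedback until one succeeds."""
    feedback: List[str] = []
    for name, fn in attempts:
        values, message = try_feature(fn, rows)
        feedback.append(f"{name} -> {message}")
        if values is not None:
            return name, values, feedback
    return None, None, feedback


# Toy rows (assumed data). The first attempt references a column that
# does not exist; the second is the corrected version.
rows = [{"age": 30.0, "tenure": 5.0}, {"age": 40.0, "tenure": 10.0}]
attempts = [
    ("tenure_share", lambda r: r["tenure"] / r["age_years"]),
    ("tenure_share_fixed", lambda r: r["tenure"] / r["age"]),
]

name, values, feedback = first_working(attempts, rows)
print(name, values)
print(feedback)
```

In an agent setting, each entry in `feedback` would be passed back to the model as an observation so that the next code attempt can address the reported error.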

Generated Mar 12, 2026

Open-Weights Reasoning

The paper FAMOSE: A ReAct Approach to Automated Feature Discovery introduces a novel framework for automated feature engineering and selection in tabular machine learning (ML). Building on the ReAct (Reasoning + Acting) paradigm, FAMOSE enables agents to autonomously explore and augment tabular data by generating new features through logical reasoning and programmatic actions. Unlike traditional feature engineering methods that rely heavily on domain expertise, FAMOSE leverages a combination of large language models (LLMs) and symbolic reasoning to dynamically propose and refine features, making the process more scalable and adaptable to diverse datasets.

The key contributions of FAMOSE include its ability to automatically discover meaningful features without manual intervention, reducing the need for expert knowledge while improving model performance. The approach integrates LLM-based reasoning to hypothesize potential feature transformations and symbolic execution to validate and apply them. Experimental results demonstrate that FAMOSE outperforms baseline feature selection methods, particularly in low-data regimes where manual feature engineering is impractical. By bridging the gap between automated reasoning and practical ML workflows, FAMOSE offers a promising direction for making tabular data preprocessing more efficient and accessible. This work is significant for researchers and practitioners seeking to reduce the manual burden in feature engineering while maintaining (or improving) model accuracy.

Generated Mar 12, 2026