ML-Ralph is an autonomous ML agent that automates experiment workflows for machine learning engineers. It uses Claude or Codex to run experiments through a cognitive loop: orient, research, hypothesize, execute, analyze, validate, and decide. ML-Ralph connects to Python workflows and integrates with tools like Weights & Biases for tracking experiments.
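The seven phases named above can be pictured as a simple ordered cycle. The sketch below is only an illustration of that ordering, not ML-Ralph's actual implementation: the `Phase` enum, `next_phase`, and `run_iteration` are hypothetical names, and the wrap-around from "decide" back to "orient" is an assumption based on the word "loop".

```python
from enum import Enum


class Phase(Enum):
    """The phases listed in the description, in the order given there."""
    ORIENT = "orient"
    RESEARCH = "research"
    HYPOTHESIZE = "hypothesize"
    EXECUTE = "execute"
    ANALYZE = "analyze"
    VALIDATE = "validate"
    DECIDE = "decide"


PHASES = list(Phase)


def next_phase(current: Phase) -> Phase:
    """Return the phase after `current`; wrapping to ORIENT is an assumption."""
    idx = PHASES.index(current)
    return PHASES[(idx + 1) % len(PHASES)]


def run_iteration(handlers: dict) -> dict:
    """Run one pass through every phase, sharing a context dict between them.

    `handlers` maps a Phase to a callable taking the context; phases without
    a handler are simply recorded in the log.
    """
    context = {"log": []}
    for phase in PHASES:
        handler = handlers.get(phase)
        if handler is not None:
            handler(context)
        context["log"].append(phase.value)
    return context
```

A caller could pass, for example, `{Phase.EXECUTE: train_model}` (where `train_model` is a user-supplied function) and inspect `context["log"]` afterwards to see the order in which phases ran.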
git clone https://github.com/pentoai/ml-ralph.git

ml-ralph is an autonomous ML engineering agent that automates the experiment loop for machine learning projects. It operates through a structured cognitive framework (understand, strategize, execute, and reflect), allocating 70% effort to verification and understanding, 20% to strategy, and 10% to execution. The agent works from a product requirements document, autonomously runs experiments, tracks metrics, and accumulates structured learnings across iterations. Built with a terminal user interface and Claude Code integration, ml-ralph helps ML engineers iterate faster by handling planning, execution, analysis, and learning extraction.
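To make the 70/20/10 split concrete, here is a small worked example that divides a time budget according to those percentages. It is a sketch only: the description does not say how ml-ralph measures or enforces effort, so the function name `allocate_minutes`, the use of minutes as the unit, and the rule for handing leftover minutes to the largest share are all assumptions.

```python
# Percentages taken from the description; the key names are illustrative.
EFFORT_SPLIT = {
    "verification_and_understanding": 70,
    "strategy": 20,
    "execution": 10,
}


def allocate_minutes(total_minutes: int) -> dict:
    """Split a whole-minute budget by EFFORT_SPLIT using integer arithmetic."""
    if total_minutes < 0:
        raise ValueError("total_minutes must be non-negative")
    shares = {name: total_minutes * pct // 100 for name, pct in EFFORT_SPLIT.items()}
    # Integer division can drop a few minutes; give them to the largest share
    # so the parts always add up to the original budget (an assumed rule).
    leftover = total_minutes - sum(shares.values())
    largest = max(EFFORT_SPLIT, key=EFFORT_SPLIT.get)
    shares[largest] += leftover
    return shares
```

For a two-hour session, `allocate_minutes(120)` gives 84 minutes to verification and understanding, 24 to strategy, and 12 to execution.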
Install Bun v1.0+, tmux, and Claude Code CLI. Run `bunx @pentoai/ml-ralph` inside any ML project directory to launch the terminal UI. Define your goals through a PRD and the agent will work through stories autonomously.
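Before launching, it can help to confirm that the three prerequisites are on your PATH. The following is a minimal sketch using only the Python standard library. The executable names are assumptions: `bun` and `tmux` are the usual binary names, and `claude` is assumed to be the command installed by the Claude Code CLI. The script does not check that Bun is v1.0 or newer.

```python
import shutil

# Assumed executable names for Bun, tmux, and the Claude Code CLI.
REQUIRED_TOOLS = ("bun", "tmux", "claude")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the lookup can be replaced in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found. Next step: bunx @pentoai/ml-ralph")
```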
Automating hyperparameter tuning and experiment tracking for ML models
Verifying data integrity and running exploratory analysis before model training
Testing competing hypotheses with minimal viable experiments
Documenting experiments and accumulating structured learnings across multiple iterations
git clone https://github.com/pentoai/ml-ralph
Copy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
I need ML-Ralph to help me with a machine learning experiment for [COMPANY] in the [INDUSTRY] sector. The goal is to [SPECIFIC GOAL, e.g., improve customer churn prediction]. I have [DATA] available and want to use [SPECIFIC TOOLS, e.g., TensorFlow, PyTorch] for this experiment. Can you guide me through the process?
# ML-Ralph Experiment Workflow for Customer Churn Prediction

## Orient
- **Objective**: Improve customer churn prediction for TechSolutions Inc. in the SaaS industry.
- **Data**: 5 years of customer interaction data, including usage patterns, support tickets, and billing history.
- **Tools**: TensorFlow, Scikit-learn, Weights & Biases.

## Research
- **Key Findings**: Customer churn is highly correlated with support ticket resolution time and feature usage frequency.
- **Relevant Papers**: "Predicting Customer Churn with Deep Learning" (2022), "Feature Importance in Churn Models" (2021).

## Hypothesis
- **Hypothesis**: A model combining LSTM for temporal patterns and XGBoost for feature importance will outperform traditional logistic regression.

## Execute
- **Data Preprocessing**: Normalized numerical features, encoded categorical variables, and handled missing values.
- **Model Training**: Trained LSTM and XGBoost models separately and then combined their predictions.

## Analyze
- **Performance Metrics**: Combined model achieved an AUC-ROC of 0.89, compared to 0.82 for logistic regression.
- **Feature Importance**: Support ticket resolution time and feature usage frequency were the top predictors.

## Validate
- **Cross-Validation**: Model performance was consistent across different validation sets.
- **Business Impact**: Potential to reduce churn by 15% if implemented.

## Decide
- **Recommendation**: Deploy the combined LSTM-XGBoost model for real-time churn prediction.
- **Next Steps**: A/B test the model's predictions against current churn rates.
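The example above says two models' predictions were combined and compared by AUC-ROC, without saying how. One common way to do that, shown here purely as an illustration, is to average the two models' scores and compute AUC-ROC as the fraction of positive/negative pairs ranked correctly. Everything in this sketch is an assumption: the averaging rule, the function names, and the toy labels and scores, which are invented and unrelated to the figures quoted in the example.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_scores(*score_lists):
    """Combine several models' scores by a plain element-wise mean."""
    return [sum(vals) / len(vals) for vals in zip(*score_lists)]


if __name__ == "__main__":
    # Toy data, invented for illustration only (1 = churned, 0 = stayed).
    labels = [1, 0, 1, 0, 1, 0]
    model_a = [0.9, 0.8, 0.7, 0.2, 0.4, 0.3]  # hypothetical first model
    model_b = [0.6, 0.1, 0.5, 0.6, 0.9, 0.2]  # hypothetical second model
    combined = average_scores(model_a, model_b)
    for name, scores in [("A", model_a), ("B", model_b), ("A+B", combined)]:
        print(name, round(roc_auc(labels, scores), 3))
```

In a real project the same comparison would normally be run on held-out data, and a library routine such as scikit-learn's `roc_auc_score` would replace the hand-written function.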