Data engineering skills for analytics and data engineers working with dbt and Snowflake. Automate data pipelines, transform data, and generate SQL code. Integrates with Claude Code for AI-assisted data workflows.
git clone https://github.com/AltimateAI/data-engineering-skills.gitData Engineering Skills is a collection of Claude Code skills built for analytics and data engineers working with dbt and Snowflake. The skills encode proven workflows and best practices, transforming Claude into a capable data engineering assistant for tasks like model creation, error debugging, test schema design, documentation, SQL-to-dbt migration, refactoring, and query optimization. Benchmark results show 53% accuracy on real-world dbt tasks and 84% pass rate on Snowflake query optimization. Skills activate automatically based on your request and work standalone or integrate with Altimate's MCP server for real-time project and warehouse access.
1. **Prepare Your Environment**: Ensure you have dbt installed and configured with Snowflake credentials. Run `dbt init [PROJECT_NAME]` if starting a new project. 2. **Create Source Files**: In your `models/` directory, create a `staging` subdirectory. Add a `schema.yml` file to define sources and tests. 3. **Generate Models**: Use the prompt template to create new dbt models. Replace [PLACEHOLDERS] with your specific requirements. For complex transformations, break them into multiple CTEs as shown in the example. 4. **Test Locally**: Run `dbt compile` to check for syntax errors, then `dbt run --select [MODEL_NAME]` to test the model against your Snowflake warehouse. 5. **Integrate with CI/CD**: Add the generated models to your dbt Cloud or GitHub Actions pipeline. Configure the `dbt test` command to run automatically after each deployment. **Pro Tips:** - Use `dbt docs generate` to create documentation for your models automatically - For large transformations, consider breaking them into multiple models with clear dependencies - Store frequently used transformations in macros to avoid repetition - Monitor Snowflake query performance with `dbt build --profile [PROFILE_NAME] --select [MODEL_NAME]`
Create dbt models following project conventions and best practices
Debug dbt compilation and runtime errors with systematic troubleshooting
Write and test dbt schema tests matching existing project patterns
Optimize expensive Snowflake queries using query history and performance analysis
No install command available. Check the GitHub repository for manual installation instructions.
git clone https://github.com/AltimateAI/data-engineering-skillsCopy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Act as an expert data engineer using dbt and Snowflake. For the project [PROJECT_NAME], write a dbt model to transform [SOURCE_TABLE] into [TARGET_TABLE] with these requirements: [TRANSFORMATION_RULES]. Include error handling for null values in [COLUMN_LIST]. Generate the SQL code and explain the dbt model structure. Add a test for data quality that checks [TEST_CONDITION].
```sql
-- dbt model: stg_customers.sql
-- Transforms raw customer data into a clean staging table
{{
config(
materialized='table',
schema='staging',
tags=['customer']
)
}}
WITH source_data AS (
SELECT
customer_id,
first_name,
last_name,
email,
signup_date,
country,
CASE
WHEN email IS NULL THEN 'missing_email'
ELSE email
END AS cleaned_email,
CASE
WHEN country IN ('USA', 'United States') THEN 'US'
WHEN country IN ('UK', 'United Kingdom') THEN 'UK'
ELSE 'Other'
END AS region
FROM {{ source('raw', 'customers') }}
),
cleaned_data AS (
SELECT
customer_id,
TRIM(first_name) AS first_name,
TRIM(last_name) AS last_name,
cleaned_email,
signup_date::DATE AS signup_date,
region,
CURRENT_TIMESTAMP AS etl_loaded_at
FROM source_data
WHERE customer_id IS NOT NULL
)
SELECT * FROM cleaned_data;
-- Data quality test
{% test not_null('customer_id') %}
SELECT * FROM {{ this }}
WHERE customer_id IS NULL;
{% endtest %}
{% test accepted_values('region', ['US', 'UK', 'Other']) %}
SELECT * FROM {{ this }}
WHERE region NOT IN ('US', 'UK', 'Other');
{% endtest %}
```
**Model Structure Explanation:**
This dbt model follows a clean staging pattern by:
1. First creating a CTE (`source_data`) that handles raw data cleaning including null handling for emails and standardizing country codes
2. Then creating a `cleaned_data` CTE that applies additional transformations like trimming whitespace and proper date casting
3. Finally selecting all columns from the cleaned data CTE
**Key Features Implemented:**
- Null handling for critical fields (customer_id, email)
- Data type conversion (signup_date to DATE)
- Standardization of categorical data (country to region)
- Timestamp for tracking load times
- Two data quality tests: one for null values in customer_id and another for valid region values
**Usage Notes:**
This model should be run as part of your dbt project's staging layer. The tests will automatically run during `dbt test` and fail the build if any records violate the constraints. The `etl_loaded_at` field helps track when data was last processed.SQL-first transformation framework for analytics engineering
AI assistant built for thoughtful, nuanced conversation
IronCalc is a spreadsheet engine and ecosystem
Design, document, and generate code for APIs with interactive tools for developers.
Enterprise workflow automation and service management platform
Automate your spreadsheet tasks with AI power
Take a free 3-minute scan and get personalized AI skill recommendations.
Take free scan