Data Pipeline Analyst
Turn messy CSVs into clean, queryable datasets with validation
What you can have running in the first 7 days
What is Data Pipeline Analyst?
A data engineering skill for agents that work with structured data. The agent inspects schemas, detects anomalies, writes transformation logic, validates output against expectations, and produces data quality reports. Supports CSV, JSON, SQL databases, and Parquet files. Includes common transforms for deduplication, normalization, and enrichment.
3 min
Advanced
What's Included
- data-pipeline-analyst.md
- transforms/deduplication.md
- transforms/normalization.md
- transforms/enrichment.md
- templates/data-quality-report.md
- templates/schema-analysis.md
- examples/csv-cleanup.md
- examples/sql-migration.md
- config/validation-rules.yaml
- README.md
Preview
# Data Pipeline Analyst Skill
## Pipeline Protocol
### 1. Schema Inspection
- Read first 100 rows and infer column types
- Report: row count, null rate per column, unique counts
- Flag: mixed types, encoding issues, date format inconsistencies
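As a rough illustration of the inspection step above, here is a minimal sketch using only the Python standard library. The function names `infer_type` and `inspect_schema`, the four type labels, and the choice to treat a column containing both ints and floats as plain `float` are assumptions made for this example, not part of the skill itself. Encoding and date-format checks are left out.

```python
import csv
import io
from collections import Counter


def infer_type(value: str) -> str:
    """Classify one raw CSV cell as 'null', 'int', 'float', or 'str'."""
    text = value.strip()
    if text == "":
        return "null"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(text)
            return name
        except ValueError:
            continue
    return "str"


def inspect_schema(csv_text: str, sample_rows: int = 100) -> dict:
    """Profile a CSV: row count plus per-column type, null rate, unique count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report = {"row_count": len(rows), "columns": {}}
    if not rows:
        return report
    for column in reader.fieldnames or []:
        # Short rows yield None for missing cells; treat those as blank.
        values = [(row.get(column) or "") for row in rows]
        # Types are inferred from the first `sample_rows` rows only.
        types = Counter(infer_type(v) for v in values[:sample_rows])
        types.pop("null", None)
        # Assumption: ints mixed with floats are a float column, not a conflict.
        if "int" in types and "float" in types:
            types["float"] += types.pop("int")
        non_null = [v for v in values if v.strip() != ""]
        report["columns"][column] = {
            "inferred_type": types.most_common(1)[0][0] if types else "null",
            "mixed_types": len(types) > 1,
            "null_rate": 1 - len(non_null) / len(rows),
            "unique_count": len(set(non_null)),
        }
    return report
```

Calling `inspect_schema(open("data.csv").read())` on a hypothetical file returns a nested dict that could feed the schema-analysis template. Null rate and unique counts are computed over all rows, while type inference looks only at the sample.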
### 2. Quality Check
- Duplicates: check by primary key or full-row hash
- Outliers: flag values > 3 std devs from mean (numeric cols)
- Missing: report null percentage, suggest imputation strategy
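The three checks above could be sketched along these lines, again with the standard library only. The helper names (`find_duplicates`, `find_outliers`, `null_percentage`), the use of SHA-256 for the full-row hash, and the use of the population standard deviation are assumptions for illustration. Suggesting an imputation strategy is not covered here.

```python
import hashlib
import statistics


def find_duplicates(rows, key=None):
    """Return indexes of rows that repeat an earlier row.

    Compares by the `key` column if given, otherwise by a hash of the full row.
    """
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        if key is not None:
            fingerprint = row[key]
        else:
            # Sort the column names so key order does not change the hash.
            joined = "\x1f".join(f"{k}={row[k]}" for k in sorted(row))
            fingerprint = hashlib.sha256(joined.encode("utf-8")).hexdigest()
        if fingerprint in seen:
            duplicates.append(index)
        else:
            seen.add(fingerprint)
    return duplicates


def find_outliers(values, threshold=3.0):
    """Return values more than `threshold` standard deviations from the mean."""
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)  # population std dev (an assumption)
    if spread == 0:
        return []
    return [v for v in values if abs(v - mean) > threshold * spread]


def null_percentage(values):
    """Percentage of entries that are None or blank strings."""
    if not values:
        return 0.0
    missing = sum(1 for v in values if v is None or str(v).strip() == "")
    return 100.0 * missing / len(values)
```

One caveat with the 3-sigma rule: with the population standard deviation, no value in a column of n rows can sit more than sqrt(n - 1) deviations from the mean, so the default threshold can only flag anything once a column has at least 11 values.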
### 3. Transform Plan
Before writing any transform code:
- State input schema -> output schema
- List every column that changes and why
- Estimate output row count
- Write validation query to confirm correctness
Installation Guide
Get up and running in under 5 minutes.
# Copy the skill into your project
cp data-pipeline-analyst/SKILL.md .claude/skills/data-pipeline-analyst.md
# Verify it loads
claude /skill data-pipeline-analyst
Operator Pack. Pay once for the asset. Upgrade to implementation only when you want higher-touch help.
Community acceleration
Bring your workflow into the Solo Unicorn community for sharper feedback, operator critique, and more visibility once the system is live.
Upgrade path
- Start with this package and validate the workflow.
- Add specialized skills or bundles once the core system is stable.
- Use the community to sharpen positioning, demos, and feedback loops.
Need this adapted to your business?
Buy the asset first if you can run it yourself. If this workflow is business-critical or needs custom implementation, move into a sprint or fractional CIO advisory instead of guessing.
Related Products
Analytics Reporter
Transforms raw data into the insights that drive your next decision.
Data Consolidation Agent
Consolidates scattered sales data into live reporting dashboards.
SQL Query Builder
Natural language to optimized SQL with safety checks
Accounts Payable Agent
Moves money across any rail - crypto, fiat, stablecoins - so you don't have to.