Best AI Prompts for Python Data Analysis with ChatGPT
TL;DR
- ChatGPT accelerates Python data analysis by generating visualization code and automation scripts
- Use specific data context and desired output for relevant code generation
- Apply prompt patterns for Matplotlib, Pandas, and Plotly workflows
- Combine AI generation with data exploration for efficient analysis
- Build reusable prompt templates for recurring analysis tasks
Introduction
Data analysis shouldn’t require remembering every Matplotlib API detail or Pandas method signature. The work should focus on extracting insights from data, not wrestling with code syntax. ChatGPT bridges this gap by generating accurate visualization code and data transformation scripts when given clear specifications.
The key lies in providing data context and desired outputs rather than abstract requests. When ChatGPT knows your column names, data types, and visualization goals, it produces code that works with your actual data structure.
This guide provides battle-tested prompts for Python data analysis workflows.
Table of Contents
- Why ChatGPT for Data Analysis
- Pandas Data Manipulation
- Matplotlib Visualizations
- Plotly Interactive Charts
- Data Cleaning Scripts
- Analysis Automation
- FAQ
Why ChatGPT for Data Analysis
Code Generation: Generate accurate Pandas, Matplotlib, and Plotly code quickly.
Syntax Reference: Get correct API usage without searching documentation.
Pattern Application: Apply common analysis patterns to your specific data.
Automation Scripts: Build reusable analysis pipelines.
Debugging Support: Identify and fix data processing errors.
Pandas Data Manipulation
Basic Operations
Prompt 1 - Data Loading:
Write code to load and explore this dataset.
Data file: [filename.csv/excel/json]
Expected structure:
- Rows: [approximate count]
- Key columns: [column names]
Analysis goal:
[What you want to understand from the data]
Code requirements:
1. Load data with appropriate pandas function
2. Display first 5 rows
3. Show data types
4. Show basic statistics for numeric columns
5. Check for missing values
6. Show unique value counts for key categorical columns
Include output interpretation comments.
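For illustration, here is the kind of script such a prompt tends to produce. The tiny in-memory CSV (with hypothetical `date`, `region`, and `sales` columns) stands in for your real file; swap in `pd.read_csv("yourfile.csv")` in practice.

```python
import io
import pandas as pd

# Toy stand-in for a real file; replace with pd.read_csv("yourfile.csv")
csv_text = "date,region,sales\n2024-01-01,East,100\n2024-01-02,West,150\n2024-01-02,East,\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())                    # first 5 rows
print(df.dtypes)                    # data types per column
print(df.describe())                # basic statistics for numeric columns
print(df.isna().sum())              # missing values per column
print(df["region"].value_counts())  # unique value counts for a key categorical column
```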
Prompt 2 - Data Filtering:
Write code to filter this dataset.
DataFrame: df
Columns: [list]
Filter requirements:
1. Date range: [start] to [end], column: [date_column]
2. Category: [value], column: [category_column]
3. Numeric range: [min] to [max], column: [numeric_column]
4. Combined: all conditions together
Expected output:
- Filtered DataFrame
- Row count before and after
- Summary statistics of filtered result
Make filtering clean and efficient.
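A sketch of the filtering code this prompt might generate, using a boolean mask so conditions compose cleanly. The column names (`date`, `category`, `amount`) and ranges are placeholders.

```python
import pandas as pd

# Hypothetical data with placeholder column names
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-03-15", "2024-04-20"]),
    "category": ["A", "B", "A", "A"],
    "amount": [50, 200, 120, 300],
})

before = len(df)
mask = (
    df["date"].between("2024-01-01", "2024-03-31")  # 1. date range
    & (df["category"] == "A")                       # 2. category filter
    & df["amount"].between(100, 250)                # 3. numeric range
)
filtered = df[mask]                                 # 4. all conditions combined
print(f"Rows: {before} -> {len(filtered)}")
print(filtered["amount"].describe())
```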
Aggregation and Transformation
Prompt 3 - GroupBy Analysis:
Analyze this data with groupby operations.
DataFrame: df
Grouping columns: [columns]
Analysis columns: [columns]
Analysis requirements:
1. Group by [column], calculate mean of [column]
2. Group by multiple columns: [columns], count occurrences
3. Aggregate with multiple functions: sum, mean, count, min, max
4. Sort results by [criteria]
Output format:
- Console display with formatted output
- Top 10 results if large dataset
Include interpretation of what the groupings reveal.
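The groupby patterns above might come back looking like this sketch; `region`, `product`, and `sales` are stand-in names.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["X", "Y", "X", "X", "Y"],
    "sales": [100, 150, 200, 50, 75],
})

# 1. Group by one column, mean of another
mean_by_region = df.groupby("region")["sales"].mean()

# 2. Group by multiple columns, count occurrences
counts = df.groupby(["region", "product"]).size()

# 3. Multiple aggregate functions, sorted by total
summary = (
    df.groupby("region")["sales"]
      .agg(["sum", "mean", "count", "min", "max"])
      .sort_values("sum", ascending=False)
)
print(summary.head(10))  # top 10 if the dataset is large
```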
Prompt 4 - Data Transformation:
Transform this dataset.
DataFrame: df
Transformation type: [pivot/merge/concat/etc.]
Current structure:
[Describe current columns and structure]
Target structure:
[Describe desired output]
Transformation logic:
1. [Step 1]
2. [Step 2]
3. [Step 3]
Handle edge cases:
- Missing values: [strategy]
- Duplicate keys: [strategy]
- Type mismatches: [strategy]
Generate complete transformation code.
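For a pivot-style transformation, the generated code often reduces to a single `pivot_table` call where the `aggfunc` and `fill_value` arguments encode the edge-case strategies. The long-format toy data below is hypothetical.

```python
import pandas as pd

# Long format: one row per (date, metric); target is one column per metric
long_df = pd.DataFrame({
    "date": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "metric": ["revenue", "cost", "revenue", "cost"],
    "value": [1000, 600, 1200, 700],
})

wide = long_df.pivot_table(
    index="date", columns="metric", values="value",
    aggfunc="sum",   # duplicate keys: summed instead of raising an error
    fill_value=0,    # missing combinations: filled with 0
)
print(wide)
```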
Matplotlib Visualizations
Basic Charts
Prompt 5 - Line Chart:
Create line chart for [metric over time].
Data: [DataFrame or data source]
X-axis: [column with dates/times]
Y-axis: [column to plot]
Chart requirements:
- Title: [descriptive title]
- X-axis label: [label]
- Y-axis label: [label]
- Line style: [solid/dashed]
- Color: [specific color or palette]
- Grid: show minor gridlines
- Figure size: (12, 6)
- DPI: 300 for saving
Annotations to add:
- [Any specific data point annotations]
Save to: [filename.png]
Make the chart publication-ready.
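A minimal sketch of the line-chart code this prompt describes, with placeholder data and labels; the `Agg` backend makes it safe to run in scripts without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily metric
dates = pd.date_range("2024-01-01", periods=30, freq="D")
values = list(range(30))

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(dates, values, linestyle="-", color="tab:blue")
ax.set_title("Daily Metric Over Time")
ax.set_xlabel("Date")
ax.set_ylabel("Metric")
ax.minorticks_on()
ax.grid(which="both", alpha=0.3)  # minor gridlines included
fig.savefig("line_chart.png", dpi=300, bbox_inches="tight")
```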
Prompt 6 - Bar Chart:
Create bar chart for [comparison data].
Data: [DataFrame or data source]
Categories: [column]
Values: [column]
Chart type: [vertical/horizontal/stacked/grouped]
Chart requirements:
- Title: [descriptive title]
- Bar labels: [category names]
- Value labels on bars: [show values]
- Legend: [if grouped chart]
- Color scheme: [specific or palette]
- Figure size: (10, 6)
Sorting: [ascending/descending/natural]
Save to: [filename]
Make comparison easy to read.
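The bar-chart prompt might yield something like this; the category names and values are stand-ins, and `Axes.bar_label` adds the value labels.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical comparison data, sorted descending for readability
data = pd.Series({"North": 120, "South": 95, "East": 150, "West": 80}).sort_values(ascending=False)

fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(data.index, data.values, color="steelblue")
ax.bar_label(bars)  # value labels on bars
ax.set_title("Sales by Region")
ax.set_ylabel("Sales")
fig.savefig("bar_chart.png")
```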
Advanced Visualizations
Prompt 7 - Multi-Panel Figure:
Create multi-panel figure for [analysis purpose].
Subplot layout: [rows] x [columns]
Total subplots: [number]
Panel A - [Chart type]:
- Data: [what to plot]
- X: [column]
- Y: [column]
- Title: [panel title]
Panel B - [Chart type]:
- Data: [what to plot]
- X: [column]
- Y: [column]
- Title: [panel title]
Shared elements:
- Common X-axis label: [label]
- Common Y-axis label: [label]
- Overall title: [title]
Layout: tight_layout with proper spacing
Save to: [filename]
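A two-panel sketch of the layout this prompt describes, using made-up sine/cosine data; `fig.supxlabel`/`fig.supylabel` supply the shared axis labels.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].plot(x, np.sin(x))
axes[0].set_title("Panel A: sin(x)")
axes[1].plot(x, np.cos(x))
axes[1].set_title("Panel B: cos(x)")
fig.suptitle("Trigonometric Functions")  # overall title
fig.supxlabel("x")                       # common X-axis label
fig.supylabel("value")                   # common Y-axis label
fig.tight_layout()
fig.savefig("panels.png")
```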
Prompt 8 - Statistical Plot:
Create statistical visualization for [purpose].
Data: [DataFrame and columns]
Plot type options:
1. Box plot: [columns to compare]
2. Violin plot: [if distribution matters]
3. Histogram: [for single variable]
4. KDE plot: [for density estimation]
Statistical elements:
- Mean/median markers
- Standard deviation shading
- Outlier highlighting
Chart requirements:
- Title: [purpose]
- Labels: [clear axis labels]
- Legend: [if multiple groups]
Save to: [filename]
Make statistical patterns visible.
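Taking the box-plot option as an example, the result might look like this sketch with two synthetic groups; `showmeans=True` adds the mean markers.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Synthetic groups with different centers and spreads
rng = np.random.default_rng(0)
groups = {"A": rng.normal(0, 1, 200), "B": rng.normal(1, 1.5, 200)}

fig, ax = plt.subplots()
ax.boxplot(list(groups.values()), showmeans=True)  # outliers shown as fliers by default
ax.set_xticks([1, 2])
ax.set_xticklabels(groups.keys())
ax.set_title("Distribution Comparison")
ax.set_ylabel("Value")
fig.savefig("boxplot.png")
```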
Plotly Interactive Charts
Interactive Dashboards
Prompt 9 - Interactive Scatter Plot:
Create interactive scatter plot with Plotly.
Data: [DataFrame]
X-axis: [column]
Y-axis: [column]
Color: [column for grouping]
Size: [column for sizing, optional]
Chart requirements:
- Title: [descriptive title]
- Hover info: [what to show on hover]
- Legend: [grouping variable]
Interactivity:
- Zoom: enabled
- Pan: enabled
- Hover tooltips: detailed
Layout:
- Width: 900px
- Height: 600px
- Template: [plotly_white/gridon]
Output as HTML for easy sharing.
Prompt 10 - Interactive Time Series:
Create interactive time series chart.
Data: [DataFrame]
Date column: [column name]
Value columns: [list of columns to plot]
Chart requirements:
- Title: [what the series shows]
- Line curves: [smooth/straight]
- Color palette: [specific colors per series]
Range slider: enabled
Range selector buttons: [1M, 3M, 6M, YTD, 1Y]
Annotations:
- Mark significant events: [if applicable]
Hover mode: [closest/comprehensive]
Output as HTML with embedded JavaScript.
Dashboard Components
Prompt 11 - Interactive Dashboard Panel:
Create dashboard panel for [metric display].
Panel type: KPI card with trend
Data source: [DataFrame column]
Display elements:
- Current value: [formatted with units]
- Comparison to prior: [percentage or absolute]
- Trend arrow: [up/down based on direction]
- Sparkline: [7/30 day trend]
Color coding:
- Green: [positive threshold]
- Red: [negative threshold]
- Yellow: [neutral range]
Dashboard layout specs:
- Card background: white
- Border radius: 10px
- Shadow: subtle
- Padding: 20px
Generate HTML component for dashboard embedding.
Data Cleaning Scripts
Cleaning Operations
Prompt 12 - Comprehensive Data Cleaning:
Clean this dataset for analysis.
Data: [DataFrame]
Columns: [list]
Cleaning operations:
1. Missing values:
- Numeric columns: [fill strategy]
- Categorical columns: [fill strategy]
- Drop rows if: [criteria]
2. Outliers:
- Detection method: [IQR/z-score/percentile]
- Handling: [cap/remove/investigate]
3. Data type corrections:
- [Column]: [current] -> [target type]
- [Column]: [current] -> [target type]
4. Duplicates:
- Check: [specific columns]
- Action: [drop/keep first/keep last]
5. Text cleaning:
- [Column]: strip whitespace, [additional cleaning]
- [Column]: standardize format
Output:
- Cleaned DataFrame
- Report of changes made
- Validation checks
Generate production-ready cleaning script.
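A compressed sketch of what such a cleaning script might do; the column names, fill strategies (median/mode), and IQR capping are illustrative choices, not the only correct ones.

```python
import pandas as pd

# Toy data with a missing price, a missing category, stray whitespace, and an outlier
df = pd.DataFrame({
    "price": [10.0, None, 12.0, 1000.0],
    "category": [" a", "b ", None, "b "],
})

report = {}
# 1. Missing values: median for numeric, mode for categorical
df["price"] = df["price"].fillna(df["price"].median())
df["category"] = df["category"].fillna(df["category"].mode()[0])
# 2. Outliers: IQR detection, capped at the upper fence
q1, q3 = df["price"].quantile([0.25, 0.75])
upper = q3 + 1.5 * (q3 - q1)
report["values_capped"] = int((df["price"] > upper).sum())
df["price"] = df["price"].clip(upper=upper)
# 3. Text cleaning: strip whitespace, standardize case
df["category"] = df["category"].str.strip().str.lower()
# 4. Duplicates: drop, keeping the first occurrence
report["duplicates_dropped"] = int(df.duplicated().sum())
df = df.drop_duplicates()
print(report)  # report of changes made
```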
Validation Scripts
Prompt 13 - Data Validation:
Write data validation for [dataset].
Schema requirements:
- [Column]: dtype=[type], required=[yes/no], unique=[yes/no]
- [Column]: dtype=[type], range=[min-max], required=[yes/no]
- [Column]: allowed_values=[list]
Validation checks:
1. Type validation: [column checks]
2. Range validation: [value bounds]
3. Required fields: [null checks]
4. Uniqueness: [duplicate checks]
5. Pattern matching: [regex if text]
Output:
- Validation report
- Flagged issues list
- Row indices of problematic records
Make validation reusable for ongoing data quality checks.
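The validation checks might be generated along these lines, collecting issues with their row indices; the schema (`id` unique, `age` in 0-120, `email` matching a simple pattern) is hypothetical.

```python
import pandas as pd

# Toy data with a duplicate id, out-of-range ages, and a malformed email
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "age": [25, -3, 40, 130],
    "email": ["a@x.com", "bad", "c@x.com", "d@x.com"],
})

issues = []
# Required + uniqueness checks
if df["id"].isna().any():
    issues.append("id: nulls present")
dup_idx = df.index[df["id"].duplicated(keep=False)].tolist()
if dup_idx:
    issues.append(f"id: duplicates at rows {dup_idx}")
# Range validation
bad_age = df.index[~df["age"].between(0, 120)].tolist()
if bad_age:
    issues.append(f"age: out of range at rows {bad_age}")
# Pattern matching for text fields
bad_email = df.index[~df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")].tolist()
if bad_email:
    issues.append(f"email: invalid format at rows {bad_email}")

for issue in issues:  # validation report
    print(issue)
```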
Analysis Automation
Report Generation
Prompt 14 - Automated Analysis Report:
Generate automated analysis script for [dataset].
Data source: [file/path or connection]
Report sections:
1. Executive Summary:
- Row/column counts
- Key metrics overview
- Notable findings (auto-detected)
2. Data Quality:
- Missing value summary
- Duplicate record count
- Data type issues
3. Univariate Analysis:
- Numeric columns: distribution statistics
- Categorical columns: value counts
- Visualization: histograms and bar charts
4. Bivariate Analysis:
- Correlations between numeric columns
- [Specific relationships to explore]
5. Output:
- Print report to console
- Export charts to [directory]
- Export summary to [file.csv]
Run with: python analyze_[dataset].py
Make it fully automated.
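The skeleton of such a script might gather each report section into a plain dict, which can then be printed, exported, or extended; the two-column DataFrame is a stand-in for the real data source.

```python
import pandas as pd

def analyze(df: pd.DataFrame) -> dict:
    """Collect report sections as plain data; print or export as needed."""
    return {
        "shape": df.shape,                                   # row/column counts
        "missing": df.isna().sum().to_dict(),                # data quality
        "duplicates": int(df.duplicated().sum()),
        "numeric_summary": df.describe().to_dict(),          # univariate analysis
        "correlations": df.corr(numeric_only=True).to_dict() # bivariate analysis
    }

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8]})  # toy data source
report = analyze(df)
print(report["shape"], report["duplicates"])
```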
Pipeline Construction
Prompt 15 - ETL Pipeline:
Build data pipeline for [purpose].
Pipeline stages:
Stage 1 - Extract:
- Source: [file/database/API]
- Connection: [credentials/parameters]
- Extraction method: [full/incremental]
Stage 2 - Transform:
- [Transformation 1]
- [Transformation 2]
- [Transformation 3]
Stage 3 - Load:
- Target: [destination]
- Mode: [overwrite/append]
- Schema: [if applicable]
Error handling:
- Connection failures: [retry/log/exit]
- Data validation failures: [quarantine/reject/log]
- Transformation errors: [log/skip/notify]
Logging:
- Log level: [INFO/DEBUG]
- Log file: [path]
- Key milestones logged
Schedule (if applicable):
[crontab or similar]
Generate complete, production-ready pipeline.
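At its core, the generated pipeline usually reduces to three functions plus logging and error handling, roughly like this sketch (the extract stage here fakes its source with an inline DataFrame; in practice it would be `pd.read_csv`, `read_sql`, or an API call):

```python
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def extract() -> pd.DataFrame:
    # Stand-in for pd.read_csv / read_sql / an API call
    return pd.DataFrame({"value": [1, None, 3]})

def transform(df: pd.DataFrame) -> pd.DataFrame:
    clean = df.dropna(subset=["value"])  # example transformation: drop bad rows
    log.info("Transform: %d -> %d rows", len(df), len(clean))
    return clean

def load(df: pd.DataFrame) -> None:
    df.to_csv("output.csv", index=False)  # mode: overwrite
    log.info("Loaded %d rows", len(df))

def run() -> None:
    try:
        load(transform(extract()))
    except Exception:
        log.exception("Pipeline failed")  # connection/transformation errors logged
        raise

run()
```

Scheduling would live outside the script, e.g. a crontab entry invoking it.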
FAQ
How do I get accurate visualization code from ChatGPT?
Provide data context: column names, data types, sample values. Specify exact visualization goals: “bar chart comparing X across categories” works better than “make a chart.”
Can ChatGPT handle large datasets?
ChatGPT generates code; it doesn't run your data processing. For large datasets, ask for code that uses efficient pandas operations, chunking for memory management, or sampling strategies.
How do I customize ChatGPT-generated visualizations?
Add specific requirements to prompts: “use corporate blue color scheme,” “add data labels,” “make title larger.” Be explicit about customization needs.
Can ChatGPT help with statistical analysis?
Yes. Specify the statistical test or analysis you need: “calculate correlation matrix,” “perform t-test between groups,” “run linear regression.” Provide context about your data structure.
How do I save and reuse effective prompts?
Build a prompt library with your common analysis types. Include data context templates: “When I have [data type] and want [visualization], use this prompt structure.” Refine them based on what produces the best results.
Conclusion
ChatGPT transforms Python data analysis from API lookup to insight extraction. Provide clear data context and visualization goals; receive production-ready code that works with your actual data structure.
Key Takeaways:
- Specify data context and column names for relevant code
- Include visualization details in prompts
- Build reusable prompt templates for recurring tasks
- Review generated code before running
- Combine AI efficiency with data exploration intuition
Focus on insights, not syntax.
Looking for more data analysis resources? Explore our guides for Python automation and data visualization best practices.