BigQuery charges by the byte processed. A query that scans 100GB costs 100 times more than a query that scans 1GB, even if they return the same number of rows. For teams running BigQuery at scale, query optimization is not a nice-to-have performance tweak. It is a direct cost control mechanism.
The challenge is that SQL optimization requires understanding both the query structure and BigQuery’s execution model. Most optimization advice is either too generic (“avoid SELECT *”) or too specialized (“use APPROX_COUNT_DISTINCT instead of COUNT(DISTINCT)”). What engineers need is guidance that accounts for their specific query, data, and use case.
Gemini 3 Pro fills that gap. Given your actual query and schema context, it identifies specific optimization opportunities and explains why each matters for BigQuery’s architecture. Here are 10 prompts that unlock that capability.
Key Takeaways
- BigQuery costs scale with data scanned, not with result set size
- Partitioning and clustering are the highest-impact optimizations for large tables
- Approximate functions can reduce computation by orders of magnitude with an acceptable loss of accuracy
- JOIN strategy dramatically affects performance in BigQuery
- Always examine query execution plans to understand what BigQuery actually does
- Test optimized queries against original queries for performance and cost comparison
Why BigQuery Optimization Matters More Than Other Databases
Unlike traditional databases, where optimization focuses on query speed, BigQuery optimization is fundamentally about cost. Under on-demand pricing, most optimizations reduce the bytes processed, which directly reduces the bill while usually improving query speed as well.
This changes the optimization calculus. In a traditional database, you might accept a slower query if it is easier to maintain. In BigQuery, a query that scans fewer bytes is almost always preferable, because it costs less every time it runs.
Understanding what BigQuery charges for also reveals where the biggest optimization opportunities are. JOINs that duplicate data across large tables are expensive. Subqueries that process the same intermediate result multiple times are expensive. Scanning tables without partition filters is expensive. All of these are fixable with better SQL.
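To make this concrete, here is a sketch using a hypothetical `analytics.events` table partitioned on `event_date` (all names are illustrative):

```sql
-- Scans every partition and every column of the table.
SELECT * FROM analytics.events;

-- Scans only seven partitions and two columns; bytes billed drop
-- roughly in proportion to the data skipped.
SELECT user_id, event_type
FROM analytics.events
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);
```

Both return event rows, but on a multi-terabyte table the second can cost a small fraction of the first.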
10 Best Gemini 3 Pro SQL Query Optimization Prompts for BigQuery
Prompt 1: General Query Cost Reduction
Analyze and optimize the following BigQuery SQL query for cost reduction. I want to reduce bytes processed while maintaining result accuracy.
Query:
[ paste your SQL query here ]
Schema context:
- Table being queried: [table name]
- Table size: [approximate size in GB/TB if known]
- Partition field: [field used for partitioning, if any]
- Clustering fields: [fields used for clustering, if any]
Specific concerns:
- [e.g., this query runs daily on a cron, cost is becoming high / this query times out / result accuracy can be approximate]
Provide:
1. Byte reduction estimate for each suggested optimization
2. Specific rewrite of problematic clauses
3. Alternative approaches if the current approach is fundamentally expensive
4. Partition and cluster utilization analysis
Why this prompt structure works: Cost optimization requires knowing what you are optimizing for. This prompt provides the query, schema context, and your specific concerns, which lets Gemini 3 Pro provide targeted recommendations rather than generic advice.
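One finding this kind of analysis often surfaces is the LIMIT misconception. A sketch against the same hypothetical `analytics.events` table:

```sql
-- LIMIT does not reduce bytes billed: this still scans every row of
-- the selected columns (clustered tables are a partial exception).
SELECT user_id, event_type
FROM analytics.events
LIMIT 10;

-- Restricting columns and partitions is what actually shrinks the scan.
SELECT user_id, event_type
FROM analytics.events
WHERE event_date = '2024-06-01'
LIMIT 10;
```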
Prompt 2: JOIN Performance Analysis
Analyze the following BigQuery query for JOIN performance issues:
Query:
[ paste your SQL with JOINs ]
Table sizes:
- Table A: [size and whether partitioned/clustered]
- Table B: [size and whether partitioned/clustered]
- Table C: [size and whether partitioned/clustered]
JOIN keys:
- A to B: [join condition]
- B to C: [join condition]
Current issue: [e.g., query is slow / query produces unexpected row multiplication / query runs out of memory]
Provide:
1. Analysis of why the JOIN is expensive (broadcast vs. shuffle, cardinality issues)
2. Rewrite that handles the JOIN more efficiently
3. Recommended table ordering for JOINs
4. Handling of NULLs in join keys
5. If a recommended JOIN strategy relies on assumptions about the data (for example, that one side is small enough to broadcast), state those assumptions explicitly
Why this prompt structure works: JOIN performance depends entirely on table sizes, data distribution, and join key characteristics. This prompt provides the context Gemini 3 Pro needs to recommend the right JOIN strategy.
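One rewrite this prompt commonly produces is shrinking the larger side before joining, for example by pre-aggregating. A sketch with hypothetical `orders` and `customers` tables:

```sql
-- Pre-aggregate the large table first so far fewer rows are
-- shuffled to perform the join.
WITH recent_orders AS (
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  WHERE order_date >= '2024-01-01'
  GROUP BY customer_id
)
SELECT c.region, SUM(r.order_count) AS orders
FROM recent_orders AS r
JOIN customers AS c
  ON r.customer_id = c.customer_id
GROUP BY c.region;
```

BigQuery pushes simple filters down automatically, but it will not pre-aggregate for you; doing so by hand can turn a shuffle join over billions of rows into one over millions.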
Prompt 3: Partition Filter Optimization
The following query does not utilize table partitions efficiently:
Query:
[ paste your SQL query ]
Table: [table name]
Partition field: [field name]
Typical query filter: [what you typically filter on]
Current behavior: [e.g., query scans entire table / partition filter is not being recognized / query filters on a field that is not the partition field]
Provide:
1. Explanation of why the partition filter is not being utilized
2. Rewrite that ensures partition pruning
3. Alternative approach if the required filter cannot be applied to the partition field
4. Monitoring query to verify partition utilization in execution plan
Why this prompt structure works: Partition pruning is the most effective cost optimization for large tables. This prompt diagnoses why partition pruning is not happening and provides specific rewrites to enable it.
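A frequent cause of missed pruning is wrapping the partition column in a function. A sketch (hypothetical `analytics.events` table partitioned on `event_date`):

```sql
-- Transforming the partition column can defeat pruning: BigQuery may
-- scan every partition to evaluate the expression.
SELECT COUNT(*)
FROM analytics.events
WHERE FORMAT_DATE('%Y-%m', event_date) = '2024-01';

-- Filtering the partition column directly allows pruning to exactly
-- the January partitions.
SELECT COUNT(*)
FROM analytics.events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31';
```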
Prompt 4: Approximate Function Conversion
Convert the following exact BigQuery aggregation query to use approximate functions where accuracy is acceptable:
Query:
[ paste your SQL with COUNT(DISTINCT ...) or other expensive aggregations ]
Aggregation that needs optimization:
[ e.g., COUNT(DISTINCT user_id) - we need approximately 95% accuracy or better ]
Business use case:
[ e.g., daily active user reporting / unique visitor counts for dashboard ]
Required accuracy: [ percentage or whether exact count is required ]
Provide:
1. Conversion to APPROX_COUNT_DISTINCT or other approximate functions
2. Expected error rate with approximate approach
3. Comparison of cost reduction vs. accuracy trade-off
4. Validation query to confirm approximate results are within acceptable bounds
Why this prompt structure works: APPROX_COUNT_DISTINCT and other approximate functions can reduce computation by 10-100x with typically less than 2% error. This prompt helps identify where approximate functions are appropriate and how to implement them correctly.
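A minimal before/after sketch, with a validation query, assuming a hypothetical `analytics.events` table:

```sql
-- Exact: shuffles every user_id to compute the distinct count.
SELECT event_date, COUNT(DISTINCT user_id) AS dau
FROM analytics.events
GROUP BY event_date;

-- Approximate: a HyperLogLog++ sketch per group, typically ~1% error.
SELECT event_date, APPROX_COUNT_DISTINCT(user_id) AS dau
FROM analytics.events
GROUP BY event_date;

-- Validation: run both on one partition and check the relative error
-- is within your accuracy requirement.
SELECT
  COUNT(DISTINCT user_id) AS exact_count,
  APPROX_COUNT_DISTINCT(user_id) AS approx_count
FROM analytics.events
WHERE event_date = '2024-06-01';
```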
Prompt 5: Subquery Optimization
Optimize the following BigQuery query that uses subqueries:
Query:
[ paste your SQL with subqueries ]
Subquery usage:
- [ e.g., correlated subquery in WHERE clause / multiple subqueries that could share intermediate results ]
Performance issue:
[ e.g., subquery runs for every row / intermediate result is recomputed multiple times / query is timing out ]
Provide:
1. Explanation of why the current subquery approach is expensive
2. Rewrite using window functions, CTEs, or JOINs instead
3. Shared intermediate computation approach if multiple subqueries compute similar results
4. Cost comparison between original and rewritten approach
Why this prompt structure works: Subqueries, especially correlated subqueries in WHERE clauses, are one of the most common sources of expensive BigQuery queries. This prompt generates alternative approaches using BigQuery’s strengths.
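A representative rewrite, sketched against a hypothetical `orders` table: the correlated subquery recomputes a per-customer average for every row, while the window-function version computes it in a single pass.

```sql
-- Correlated subquery: evaluated per outer row.
SELECT o.order_id, o.amount
FROM orders AS o
WHERE o.amount > (
  SELECT AVG(o2.amount)
  FROM orders AS o2
  WHERE o2.customer_id = o.customer_id
);

-- Window-function rewrite: one scan, no per-row recomputation.
SELECT order_id, amount
FROM (
  SELECT
    order_id,
    amount,
    AVG(amount) OVER (PARTITION BY customer_id) AS customer_avg
  FROM orders
)
WHERE amount > customer_avg;
```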
Prompt 6: Repeated Query Pattern Optimization
We run variations of this query repeatedly with different filter values:
Base query:
[ paste your SQL query ]
Typical filter variations:
- filter_field = [list of typical values]
- date_range typically covers [typical range]
- This query runs [frequency, e.g., hourly/daily]
Cost per run: [estimate if known]
Total monthly cost: [estimate if known]
Provide:
1. Analysis of what changes between runs and what stays the same
2. Caching recommendations to avoid recomputation
3. Materialized view or table approach if underlying data changes infrequently
4. Query parameterization suggestions for BI tool integration
5. Estimated cost reduction from recommended changes
Why this prompt structure works: Repeated queries are the biggest cost opportunity for teams running dashboards or scheduled jobs. This prompt identifies what can be cached or pre-computed versus what must run fresh each time.
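When the underlying data changes slowly, a materialized view is often the recommendation that falls out of this prompt. A sketch with hypothetical names:

```sql
-- Aggregate once; BigQuery maintains the view incrementally and can
-- transparently rewrite matching queries to read from it.
CREATE MATERIALIZED VIEW analytics.daily_event_counts AS
SELECT event_date, event_type, COUNT(*) AS events
FROM analytics.events
GROUP BY event_date, event_type;

-- Dashboard variations now filter a small pre-aggregated result
-- instead of rescanning the raw table.
SELECT event_type, SUM(events) AS events
FROM analytics.daily_event_counts
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY event_type;
```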
Prompt 7: ARRAY and STRUCT Query Optimization
Optimize the following BigQuery query that processes ARRAY or STRUCT data types:
Query:
[ paste your SQL that unnests arrays or accesses struct fields ]
Data structure:
- [ describe the array/struct schema ]
Current performance issue:
[ e.g., UNNEST creates large row expansion / repeated array access in WHERE clause is slow ]
Provide:
1. Explanation of why array processing is expensive
2. Rewrite using BigQuery ARRAY functions that avoid row expansion
3. Alternative approach using array subqueries or correlated UNNEST (BigQuery’s equivalent of a lateral join)
4. Clustering recommendations for array-heavy access patterns (BigQuery has no traditional indexes)
Why this prompt structure works: ARRAY and STRUCT processing in BigQuery requires understanding of how UNNEST operations affect row counts. This prompt generates alternatives that work with array data without the performance cost of row expansion.
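A sketch of the row-expansion issue, assuming a hypothetical `analytics.events` table with an `actions` ARRAY column:

```sql
-- UNNEST in FROM expands each row into one row per array element
-- before anything is filtered or counted.
SELECT COUNT(*) AS clicks
FROM analytics.events,
  UNNEST(actions) AS a
WHERE a.type = 'click';

-- An array subquery counts inside each row instead, avoiding the
-- intermediate expansion (whether this wins depends on query shape).
SELECT SUM(
  (SELECT COUNT(*) FROM UNNEST(actions) AS a WHERE a.type = 'click')
) AS clicks
FROM analytics.events;
```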
Prompt 8: Date/Time Manipulation Optimization
Optimize the following BigQuery query with expensive date/time operations:
Query:
[ paste your SQL with EXTRACT, DATE_TRUNC, TIMESTAMP_DIFF, or other date manipulations ]
Date operations used:
[ list the date functions being used and on what fields ]
Performance issue:
[ e.g., DATE_TRUNC on unpartitioned field is slow / current_timestamp() prevents caching / date parsing from string is expensive ]
Provide:
1. Rewrite that optimizes date operations for BigQuery
2. Partition and clustering recommendations for date fields
3. CURRENT_TIMESTAMP() replacement that enables query caching
4. Cost comparison if query runs frequently
Why this prompt structure works: Non-deterministic functions like CURRENT_TIMESTAMP() prevent BigQuery from serving results out of its query cache, and parsing dates from strings is expensive. These issues compound in queries that run on schedules.
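A sketch of the caching issue, using a hypothetical scheduled query with a `@run_date` parameter supplied by the scheduler:

```sql
-- Non-deterministic: CURRENT_DATE() changes, so results are never
-- served from BigQuery's query cache.
SELECT COUNT(*)
FROM analytics.events
WHERE event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);

-- Deterministic: the caller supplies the date, keeping repeated
-- identical runs cache-eligible.
SELECT COUNT(*)
FROM analytics.events
WHERE event_date = @run_date;
```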
Prompt 9: Full Table Scan Prevention
This query is scanning more data than necessary:
Query:
[ paste your SQL query ]
Table: [table name]
Table size: [size]
Partition field: [field]
Cluster fields: [fields]
What I expect to be scanned: [e.g., last 7 days based on filter]
What BigQuery actually scans: [e.g., entire table]
WHERE clause breakdown:
[ describe your filters ]
Provide:
1. Analysis of why full table scan occurs despite filter
2. Rewrite that ensures selective scanning
3. Filter order recommendations
4. Partition and clustering field recommendations
5. Execution plan query to verify what is actually scanned
Why this prompt structure works: Full table scans on large tables are the most expensive BigQuery pattern. This prompt diagnoses the specific filter issue causing the full scan.
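To see what was actually billed, you can query the jobs metadata directly. A sketch (the region qualifier and required permissions vary by project):

```sql
-- Bytes processed and billed for recent jobs touching the table.
SELECT
  creation_time,
  total_bytes_processed,
  total_bytes_billed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE query LIKE '%analytics.events%'
ORDER BY creation_time DESC
LIMIT 10;
```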
Prompt 10: Query Review for BI Tool Integration
Review the following query for use in a BI tool (Looker Studio/Tableau/Metabase) where it will be run with different filter values by end users:
Query:
[ paste your SQL query ]
BI tool context:
- Dashboard loads [number] views per day
- Users typically filter by [fields]
- Query result should support [chart types or granularity]
Security context:
- Row-level security required: [field that determines what users see]
- Users should only see their own data: [Y/N]
Provide:
1. Recommended parameterization approach for BI tool integration
2. Row-level security implementation
3. Aggregation level recommendations for dashboard performance
4. Caching strategy for common filter combinations
5. Cost estimate for typical dashboard usage patterns
Why this prompt structure works: BI tool queries introduce complexity around parameterization, row-level security, and caching that standard query optimization does not address. This prompt generates BI-optimized SQL.
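For the row-level security piece, BigQuery’s native mechanism is a row access policy, which filters server-side regardless of what SQL the BI tool sends. A sketch with hypothetical names:

```sql
-- Each user sees only rows whose owner_email matches their identity.
CREATE ROW ACCESS POLICY own_rows_only
ON analytics.orders
GRANT TO ('domain:example.com')
FILTER USING (owner_email = SESSION_USER());
```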
How to Get Better Results from BigQuery Optimization Prompts
Provide Table Schema
BigQuery optimization requires understanding your table’s partitioning, clustering, and data types. Include your table schema in prompts for accurate recommendations.
Explain Business Context
The same query might have different optimal implementations depending on whether it runs once or a million times per day. Business context (frequency, user count, result accuracy requirements) affects optimization decisions.
Verify Execution Plans
AI recommendations should be verified against what BigQuery actually does. Use a dry run to check estimated bytes processed before executing, then review the Execution Details pane in the console (or the INFORMATION_SCHEMA.JOBS views) afterward. Compare bytes processed before and after optimization.
Test Accuracy Trade-offs
Approximate function conversions and other optimizations may introduce acceptable accuracy trade-offs. Always test that results remain within acceptable bounds for your use case.
FAQ
Does BigQuery optimization also speed up queries?
Yes, usually. Optimizations that reduce bytes processed also reduce the work BigQuery’s workers do, so they typically cut execution time along with cost. The main exception is queries bottlenecked on something other than scanning, such as a large final sort or transferring a big result set.
How much can I reduce BigQuery costs with optimization?
Typical optimization reduces costs by 30-90% depending on the starting point. Queries with no partition filters, SELECT *, and exact COUNT DISTINCT on large tables have the highest reduction potential. Already-optimized queries have less room for improvement.
Should I use approximate functions for all COUNT DISTINCT?
No. Approximate functions are appropriate for exploratory analysis, dashboards, and reports where 1-2% error is acceptable. Do not use approximate functions for financial calculations, user-facing counts that affect business logic, or any case where exact answers are required.
How do I verify that a partition filter is actually being used?
BigQuery does not support a standalone EXPLAIN statement. Instead, run a dry run (the byte estimate shown in the console editor, or `bq query --dry_run` from the CLI) and compare the estimated bytes processed against the full table size: if they are close, partition pruning is not happening. For completed jobs, check total bytes processed in the job’s execution details or in INFORMATION_SCHEMA.JOBS.
Conclusion
BigQuery’s pricing model makes query optimization a direct cost management strategy, not just a performance tuning exercise. Every byte not processed is a byte not billed.
The 10 prompts in this guide cover the main optimization scenarios: cost reduction, JOIN performance, partition utilization, approximate functions, subquery elimination, repeated query patterns, array processing, date manipulation, full table scan prevention, and BI tool integration. Each prompt is structured to provide BigQuery-specific context that drives targeted recommendations.
Use these prompts to audit your most expensive queries. Start with the queries that run most frequently or process the most data. Even small optimizations compound when applied to queries running on hourly schedules.
The goal is not perfect SQL on the first try. The goal is continuous improvement: run the query, see what BigQuery actually does with it, optimize based on what you learn, and repeat.