SQL Mistakes to Avoid: Optimize Your Queries Like a Pro

Anastasia Sukhanova @devtodev
3 min readNov 14, 2024

--

Learn how to optimize SQL queries in product analytics by avoiding common pitfalls, enhancing query performance, and tracking key metrics like retention and DAU efficiently.

SQL is a powerful tool for data analysis, but even small mistakes can slow down your queries, increase costs, and impact your product analytics.

This article shares practical tips to help you refine your SQL practices, reduce costs, and improve your data workflows in devtodev.

1. Avoid Over-Retrieving Data with SELECT *

Using SELECT * can slow down performance by fetching unnecessary columns. Instead, specify only the columns you need.

Inefficient:

with funnel_events AS (
SELECT * FROM your_project.custom_events
WHERE eventtime::date = '2024-09-09'
)
SELECT
name as stage,
COUNT(DISTINCT devtodevid) as num_users
FROM funnel_events
WHERE name in ('registration','engagement','purchase')
GROUP BY name
ORDER BY num_users desc

Optimized:

with funnel_events AS (
SELECT DISTINCT
name,
devtodevid
FROM your_project.custom_events
WHERE eventtime::date = '2024-09-09'
AND name in ('registration','engagement','purchase')
)
SELECT
name as stage,
COUNT(devtodevid) as num_users
FROM funnel_events
GROUP BY name
ORDER BY num_users desc;

2. Use Event Time Filters

For large datasets, applying date filters like eventtime reduces processing time and improves efficiency.

Inefficient:​

SELECT 
devtodevid,
name,
eventtime
FROM
your_project.custom_events
WHERE
name in ('login', 'purchase', 'logout');

Optimized:​​​

SELECT 
devtodevid,
name,
eventtime
FROM
your_project.custom_events
WHERE
eventtime::date BETWEEN '2024-09-01' AND '2024-09-10'
AND name in ('login', 'purchase', 'logout');

3. Streamline DAU/MAU Aggregations

Counting unique users with DISTINCT can be slow. Pre-aggregate DAU tables or group users before counting.

Inefficient: ​

SELECT 
COUNT(DISTINCT devtodevid) AS daily_active_users
FROM your_project.sessions
WHERE eventtime::date = '2024-09-09';

Optimized:​

SELECT 
COUNT(devtodevid) AS daily_active_users
FROM your_project.dau
WHERE eventdate = '2024-09-09';

4. Consolidate Funnel Analysis with FILTER

Instead of using multiple subqueries for funnel analysis, consolidate steps with FILTER clauses.

Inefficient:​

SELECT *
FROM
(SELECT COUNT(*) FROM your_project.custom_events WHERE name = 'registration' and eventtime>=current_date - interval '1 month') AS total_registrations,
(SELECT COUNT(*) FROM your_project.custom_events WHERE name = 'engagement' and eventtime>=current_date - interval '1 month') AS total_engagements,
(SELECT COUNT(*) FROM your_project.custom_events WHERE name = 'purchase' and eventtime>=current_date - interval '1 month') AS total_purchases

Optimized:​

SELECT 
COUNT(DISTINCT devtodevid) filter (where name = 'registration') as total_registrations,
COUNT(DISTINCT devtodevid) filter (where name = 'engagement') as total_engagements,
COUNT(DISTINCT devtodevid) filter (where name = 'purchase') as total_purchases
FROM your_project.custom_events
WHERE eventtime::date >= current_date - interval '1 month';

5. Calculate Retention Efficiently with Window Functions

Using subqueries for retention metrics is resource-intensive. Window functions like FIRST_VALUE provide a faster alternative.

6. Pre-Aggregate Data for Dashboards

Real-time aggregation in dashboards can cause delays. Use materialized views to speed up data loading.

Inefficient:

SELECT 
COUNT(DISTINCT devtodevid) AS total_users,
COUNT(paymentid) AS total_purchases,
SUM(revenueusd) AS total_revenue
FROM your_project.payments
WHERE eventtime::date = '2024-09-01';

Optimized:

-- Pre-aggregated summary table created using VIEW
CREATE MATERIALIZED DAILY VIEW
daily_purchase_summary_2024 (purchase_date,total_users, total_purchases, total_revenue) as
SELECT
eventtime::date AS purchase_date,
COUNT(DISTINCT devtodevid) AS total_users,
COUNT(paymentid) AS total_purchases,
SUM(revenueusd) AS total_revenue
FROM your_project.payments
WHERE eventtime::date >= '2024-01-01'
GROUP BY 1;
-- Querying the summary table from VIEW
SELECT
total_purchases,
total_revenue
FROM view.daily_purchase_summary_2024
WHERE purchase_date = '2024-09-01';

7. Optimize Cohort Granularity

In cohort analysis, group data by weeks or months to reduce dataset size and improve processing.

Inefficient:

SELECT 
devtodevid,
created::date as user_created_date
FROM your_project.users
WHERE created::date >= '2024-01-01';

Optimized:​

SELECT 
devtodevid,
DATE_TRUNC('month', created) AS cohort_month
FROM your_project.users
WHERE created::date >= '2024-01-01';

8. Combine Conditions to Avoid Redundant Data Selection

Avoid multiple scans by combining cohort conditions into a single query with CASE statements.

9. Filter Before Joining

Apply filters to reduce rows before joining, making your queries faster and more efficient.

Conclusion

Effective SQL optimization helps product analytics teams make better, data-driven decisions while reducing costs. By following these best practices, you’ll enhance query performance, reduce resource consumption, and ensure faster access to critical insights.

For more details and practical examples, check out the full article, where we dive into advanced SQL optimization strategies, troubleshooting tips, and real-world use cases that can streamline your data workflows.

Leveraging tools like devtodev can further enhance your SQL practices, helping you achieve faster, more efficient insights for your product analytics needs.

Read more about using analytics to improve your product’s performance.

--

--

Anastasia Sukhanova @devtodev
Anastasia Sukhanova @devtodev

Written by Anastasia Sukhanova @devtodev

Customer Success Manager at www.devtodev.com. Everything you need to know about analyzing and improving games and apps.

No responses yet