How to Write an Efficient SQL

by Apurva T. on April 9, 2013

in Performance Tuning

Writing a good SQL helps your organization in several ways. It has a manifold effect, which many developers do not realize either because they do not care or they do not know. This about the time you can save by writing a good SQL. No need of DBAs to be involved, you can avoid urgent or outage scenarious, no need for upper management to be involved and no need for RCAs and permanent fixes.

Writing a good SQL does not take much effort. You just need to be aware of a few golden nuggets.

A poor performing SQL has several causes. One of them is tablescan.

When and Why Table Scan

It is not that table scan or TBScan is always a poor way of fetching data from the tables. Many-a-times, the optimizer selects table scan over index scam or IXScan. Some of those conditions are mentioned below:

  • If the amount of data is high (eg – PS_ADDRESS_TYP_TBL), a search using an index may not be faster than a table scan.
  • If the query includes all rows in the table (without a WHERE clause), the optimizer would do a table scan.
  • If you’re retrieving a lot of data from the table, an index may give no advantage.
  • If the optimizer does not have accurate information on data distribution, it may pick the wrong index or no index at all.
  • If you include certain elements in the WHERE clause, you may make the index unavailable.

When and Why Index Scan

It is not that table scan or TBScan is always a poor way of fetching data from the tables. Many-a-times, the optimizer selects table scan over index scam or IXScan. Some of those conditions are mentioned below:

  • WHERE clause is a good match of an existing index.
  • Presence of DISTINCT, UNION and ORDER BY clause is likely to invoke index scan.
  • To maintain unique, primary key and foreign key constraints.
  • To avoid sorts (sorts are always expensive).

WHERE clause pitfalls:

You can avoid index scan with a bad WHERE clause. Conditions which lead to a bad WHERE clause are:

  • Using columns in a WHERE clause, which do not match the order of columns in the index
  • Comparing columns in the same table
  • Choosing columns with low-selectivity indexes
  • Doing a math like COUNT on a column before comparing it to a constant
  • Applying a function to column data before comparing it to a constant
  • Matching with LIKE
  • Comparing to NULL
  • Negating with NOT
  • Converting values
  • Using IN

To write a better SQL, avoid the pitfalls mentioned above and note the suggestions below

  • Use BETWEEN instead of AND for range values
  • Use JOINS instead of IN wherever possible
  • Use IN for subqueries when the outer table has more rows than the inner table
  • If the outer query has an additional restrictive expression, use EXISTS.
  • If the outer query is WHERE NOT, use NOT EXISTS.
  • Avoid Order By. If unavoidable – keep this in mind. The fastest sort is an ascending sort of a presorted integer with unique values. ORDER BY works faster if some preordering occurs before the sort. Encourage preordering by using indexes, especially primary key and clustered indexes. One way to encourage the DBMS to use an index is to add redundant expressions to your queries

Examples:

Bad SQL Good SQL
SELECT * FROM PS_H_EMPL_VERIFY
WHERE H_LET_TYPE=’PS‘ and REG_REGION=’US’
and EMPL_RCD=0 and EMPLID=’10001′
SELECT * FROM PS_H_EMPL_VERIFY
WHERE EMPLID=’10001′ and EMPL_RCD=0
and REG_REGION=’US’  and H_LET_TYPE=’PS‘
SELECT prodnum, price  FROM product
WHERE price * 2 > 200
SELECT prodnum, price  FROM product
WHERE price > 100
SELECT emplid,name,birthdate
FROM ps_employee
WHERE emplid || ‘ ‘ || name = ‘443232366 Scorley‘;
SELECT emplid,name,birthdate
FROM ps_employee
WHERE emplid = ‘443232366’ and name = ‘Scorley’ ;
SELECT emplid,name,birthdate
FROM ps_employee
WHERE name like ‘S%Z’;
SELECT emplid,name,birthdate
FROM ps_employee
WHERE name like ‘S%’ ;
SELECT emplid,name,birthdate
FROM ps_employee
WHERE birthdate <> ‘1900-01-01’;
SELECT emplid,name,birthdate
FROM ps_employee
WHERE birthdate  > ‘1900-01-01’ ;
SELECT column1 FROM Table1 GROUP BY column1 SELECT DISTINCT column1 FROM Table1

 

Join Vs Subquery
One major point in favor of writing joins instead of subqueries is that it’s better not to tell the DBMS what to do. By writing joins, you’re giving the DBMS choices instead of forcing it to follow a specific plan. One major point in favor of writing subqueries is that a subquery loop may have fewer iterations, so subqueries may be faster than joins.

Generally joins are better, but if, on average, several rows are in the driven query for each row in the driver query (for example, a primary-to-foreign or other one-to-many relation), and especially if matching rows come first in the driven query, then subqueries are better.

Golden Rules of Writing a Good SQL

01  If the code involves an obvious math expression, do evaluate it and transform the condition to the evaluated result.
02. Avoid functions on columns.
03. When you know the distribution of a set of values, you can speed things up by transforming not equals searches to greater than and less than searches.
04. Transform a series of ORed expressions on the same column to IN.
05. LIKE will always beat multiple SUBSTRINGs, so don’t transform.
06. Transform UNION to OR.
07. Use CASE expressions for final filtering in the select list.
08. The WHERE clause must use columns in the order in which they appear in the index.
09. GROUP BY performs better if you keep the number of grouping columns small.
10. You can improve performance on some grouped joins by replacing the join with a subquery
11. Use DISTINCT instead of GROUP BY if your query doesn’t involve aggregate functions.
12. Because most DBMSs do sort before grouping, you can help the process by listing GROUP BY and ORDER BY columns in the same order.Grouping is fast if the list is sorted.
13. Use IN for subqueries when the outer table has many rows and the inner table has few rows.
14. Use EXISTS for subqueries when the outer query has a search condition in addition to the subquery condition.
15. The EXCEPT operator is useful for transforming NOT IN subqueries.
16. Minimize no of tables in a join not more than 5 tables
17. Don’t use DISTINCT or ORDER BY in your SELECT statements unless you really need them
18. Avoid the use of outer joins unless absolutely necessary.
19. Make sure to join on all key fields together between tables to increase performance and to avoid full table scan and costing on Database.
20. Subqueries in SQL should be avoided when the table within the subquery could be joined in the main ‘From’ clause.
21. All records and views should have key fields.
22. Create additional indexes if needed.
23. Use UNION ALL rather than UNION unless you need to eliminate duplicates
24. Don’t use a view if all the data you want is in a single underlying table
25. Any rows you can remove with WHERE, rather than HAVING, make your query more efficient

Related Posts Plugin for WordPress, Blogger...

{ 0 comments… add one now }

Leave a Comment

Previous post:

Next post: