WP Introduction to SQL: Examples, Best Practices and Pitfalls | Imperva

Introduction to SQL: Examples, Best Practices and Pitfalls

Introduction to SQL: Examples, Best Practices and Pitfalls

SQL (Structured Query Language) has been with us for more than half a century and it’s not going away anytime soon. Popular in both traditional relational databases and newer NoSQL databases technologies, SQL is widely used for data analytics, Big Data processing, coding languages, and more.

I’m a fan of SQL. It’s a powerful, yet simple language. In this post, I’ll share:

  • The power of SQL and why we’re using it
  • Common pitfalls and best practices

What is SQL and Why You Should Use it?

SQL was initially developed by IBM in the early 1970s, adopted by Oracle later that decade, and has been used in relational databases ever since. In 1986, SQL was adopted as a standard by the American National Standards Institute (ANSI).

As the volume of data has grown over time, new databases and technologies have emerged. One of these is Big Data, for data sets too complex for traditional data-processing applications to deal with. Common in Big Data, NoSQL databases aren’t based on the relational model. They’re sometimes called Not Only SQL, to emphasize that they may support an SQL-like query language. Some examples of newer non-relational databases and libraries that support SQL include:

SQL is widely used both in OLTP (Transaction Processing) and OLAP (Analytic Processing), despite the two being very different use cases:

Snippet 1: pandasql

Transaction Processing Analytic Processing
Common usage Relational databases Big Data databases
SQL usage Insert, update, and queries by many users concurrently Select queries. Few Queries at a time
Scale of computation Small Large – entire tables
Query duration Fast Can be hours or more
Performance related features Indices and keys Partitions, redundant data (per use-case)
Analytic functions Infrequently used Common

Before diving into SQL, here’s an example of SQL usage outside of databases — Python’s Data Analysis Library:

Code Output
import pandas as pd
import pandasql as ps

data_frame = pd.DataFrame([[1111, 'John'],
                           [1234, 'Ted'],
                           [2222, 'James'],
                           [2345, 'John']],
                   columns=['ID', 'Name'])

sql = """SELECT name, COUNT(DISTINCT id)
         FROM data_frame
         GROUP BY NAME """

print(ps.sqldf(sql, locals()))
    Name  COUNT(DISTINCT ID)
0  James                   1
1   John                   2
2    Ted                   1

As these examples show, SQL is alive and well in the newest and most popular technologies.

Let’s take a closer look at its power.

SQL Functions Examples

The strongest part of SQL is its functions. Here are the three types of functions SQL supports:

Type Scope Reduction of records Examples
Simple Single record No CASE … WHEN 1 THEN column_1 WHEN 2 THEN column_2 ELSE 3 END
GREATEST (column_1, column_2, column_3)
COALESCE (column_1)
Aggregated (group by) Group of records Yes SUM(column_1)
AVG(column_1)
Analytic Group of records No SUM(column_1) OVER (PARTITION BY column_2)
LEAD/LAG - value of a field in the next or previous record

We recommend going over the functions of your query engine to understand its capabilities. Here, for example, you can find Presto’s documentation on aggregate functions and analytic functions.

Recommended SQL Coding Conventions (Naming and More)

To gain the power of SQL we highly recommend using coding conventions. This will improve both writing and reading, and improve team work. Here’s a short list of the more important ones:

Names and case Use underscores where you would naturally include a space in the name (last name -> last_name)
Upper case for keywords and functions (SELECT, FROM, WHERE, IN, SUM…)
Lower case for everything else: variables, table names, columns etc.
Explicitly use keywords In joins use keywords: CROSS, INNER… ON, LEFT… ON, RIGHT… ON
Avoid joins as part of the WHERE clause: SELECT… FROM table_a, table_b WHERE…
Aliasing – Always include the AS keyword
Avoid nested forms, use WITH instead Avoid SELECT … FROM (SELECT …)

Use WITH clauses! This way you can avoid complex queries, and create step-by-step readable queries

Avoid IN / NOT IN – Use JOIN. 

Line Spacing Before AND, OR After each keyword definition
Include comments Both /* and closing */ and — are acceptable
Do not use SELECT * Yes, it’s compact and easy, but new columns are added and included to your query result without a need. Write explicitly the column name you need

SQL Query Examples: Good & Bad

To show you how important the coding conventions are, please look at these good and bad examples.

Type Good Bad
Flow Two simple queries One complex query
Keywords Uppercase Lowercase, like other text
Join Explicit – easy to understand Implicit – with other WHERE predicates
Indentation Line spacing before OR helps in knowing its scope No line spacing before OR. Larger condition will be much hard to understand
Comments Used Not used

Snippet 4: Bad example

select emp.department, sum(exp.amount) expenses
from employees emp, expenses exp
where emp.department in (select department
from employees
where department like '%development%' or department like '%management%'
group by department
having count(distinct id) > 50) and emp.id = exp.employee_id
group by emp.department

Snippet 5: Good example

-- filter: Development and Management departments with more than 50 employees
WITH dept AS 
(SELECT department
   FROM employees
   WHERE department LIKE '%Development%'
      OR department LIKE '%Management%'
   GROUP BY department
   HAVING Count(DISTINCT id) > 50)
-- find expenses for each of the filtered departments
SELECT emp.department,
   SUM(exp.amount) AS expenses
FROM employees AS emp
   INNER JOIN dept
   ON emp.department = dept.department
   INNER JOIN expenses AS exp
   ON emp.id = exp.employee_id
GROUP BY emp.department

Here’s a nice online tool for SQL formatting that you can use: https://sqlformat.org/

SQL Problems Troubleshooting – Mistakes & Pitfalls

Like any other language, SQL has its own pitfalls which you should be aware of to save yourself time and prevent bugs in your application.

The first cause for SQL errors & bugs – Null values

Snippet 6: NULL query result

SELECT NULL = NULL --> NULL
SELECT NULL != NULL --> NULL
SELECT NULL IS NULL --> true
SELECT 1 + NULL --> NULL
 
SELECT SUM(num_column) 
WHERE  FALSE --> NULL

To overcome this issue you should use coalesce whenever there’s a danger:

Snippet 7: Overcome Null in joins

FROM       t1
INNER JOIN t2
        ON COALESCE(t1.name, '') = COALESCE(t2.name, '')

SQL Indices and Partitions Are Not Used

We spend a lot of time building our database schema by adding indices and declaring partitions. But we won’t get the performance that we want if we stop there — we have to make sure our database engine uses them.

For example, let’s assume we have an index on the first_name column:

SELECT first_name
FROM employee
WHERE UPPER(first_name) LIKE 'A%'
Index not used
SELECT first_name
FROM employee
WHERE first_name LIKE 'A%' OR first_name LIKE 'a%'
Index used

There may be changes from one implementation to another, but the concept is valid in any SQL implementation.

How do you make sure your indices and partitions are being used? Analyze your query execution plan. In some SQL implementations, you can use the EXPLAIN keyword. In Oracle, for example, it’s called EXPLAIN PLAN.

Analytical Functions Pitfalls in SQL – Evaluation Order

This one is important if you’re using an analytic function, as misunderstanding the evaluation order may lead to bugs. For example:

Snippet 8: Analytic function with filter

SELECT first_name, last_name, department,
       COUNT(1) OVER (PARTITION BY department) AS size_of_department
FROM   employees
WHERE  last_name LIKE 'A%'

We’ll get the number of employees in the department with a last name beginning with A, instead of all the employees in the department. Remember that analytic functions are executed last.

SQL Standardization

Date and time syntax, NULLs, string concatenation, comparison case sensitivity and other aspects of SQL  vary from one vendor to another – making it hard to move an application between vendors.

Try using standard SQL only. For example – use COALESCE instead of ISNULL (SQL Server) or NVL(Oracle).

Prevent SQL Injection (SQLi) Attacks

A well known security pitfall in web applications is the SQL injection attack – a code injection technique that attacks applications by inserting statements into a field for execution. Such attacks are usually used to steal data, change data, or change permissions in the attacked database.

Preventing SQL Injection attacks starts during development, and should be followed by the use of scanners and web application firewalls to protect your application. Please refer to the SQL Injection blog post  for examples and ways to prevent such attacks. 

Wrapping Up

SQL is great. I personally prefer SQL-based language to any other query language or API. However, you need to use it correctly to make your work easier. Start by building your schema correctly and later use coding conventions and other options, like the WITH keyword and window functions. 

Don’t forget to keep the known pitfalls in mind. And, If you’re writing an application, then protect it from SQL Injection.