Relational Database Interview Questions

Ashish Lahoti

Last Modified: June 12, 2022

Page content

List of frequently asked relational database interview questions. Keep following this post for regular updates.

What are ACID properties in a database?

ACID is an acronym for Atomicity, Consistency, Isolation, and Durability, which refers to the four properties of a transaction in Database. These properties ensures the accuracy and integrity of the data in database.

A transaction is a sequence of instructions that are executed as a single unit of work.

Atomicity - A transaction is an atomic unit; hence, all the instructions within a transaction will successfully execute, or none of them. If any of the instructions fail, the entire transaction should abort and rollback.
For example, transferring money between bank accounts is a transaction, where the value must be debited from one account and credit in another account. If any of them failed, then other must be rolled back.
Consistency - A database is initially in a consistent state, and it should remain consistent after every transaction, whether transaction execute successfully or fails.
For example, a transaction must fail, while trying to insert duplicate in a table column having unique constraint.
Isolation - If the multiple transactions are running concurrently, they should not be affected by each other; i.e., the result should be the same as the result obtained if the transactions were running sequentially.
For example, if two clients are trying to buy at the same time the last available product on the web site, when the first user finishes the shopping, it will make the transaction of the other user be interrupted.
Durability - Changes that have been persisted to the database should remain even in the case of software and hardware failure.

What is SQL?

SQL (Structured Query Language) is a language used for storing, reading, updating, and deleting data using simple code snippets, called queries, in an RDBMS (relational database management system).

Given below is an example. You first create a table, insert some data in it and then view all the data in that table.

-- Creating a table in database

-- CREATE TABLE <Table name>
CREATE TABLE Employee (
    -- <Variable name> <Variable type>	 
    ID int,
    Name varchar(255),
    Gender varchar(10),
    Age int
);

-- Inserting 
INSERT INTO
  -- <Table name>
  Employee 
    -- <Variable names>
    (ID, Name, Gender, Age) 
  VALUES
    -- <Variable values>
    (1, 'Adam', 'Male', 22),
    (2, 'Eva', 'Female', 28),
    (3, 'Joy', 'Male', 10);

-- Selecting everything we stored in the table to view
SELECT *
FROM Employee;

What are different type of SQL statements?

Data Definition Language (DDL) commands are used to define the structure that holds the data. These commands are auto-committed i.e. changes done by the DDL commands on the database are saved permanently and cannot be rolled back
e.g. CREATE, ALTER, TRUNCATE, DROP, RENAME, COMMENT
Data Manipulation Language (DML) commands are used to manipulate the data of the database. These commands are not auto-committed and can be rolled back
e.g. SELECT, INSERT, UPDATE, DELETE, MERGE
Data Control Language (DCL) commands are used to deal with the rights, permission, and other controls of the database
e.g. GRANT, REVOKE
Transaction Control Language (TCL) commands are used to deal with transactions within the database
e.g. COMMIT, ROLLBACK, SAVEPOINT

What are different types of clauses in SQL?

SQL clauses are used to qualify a database query by restricting or altering the values that it returns. The types of SQL clauses are:

FROM: Used to specify which tables the data will be pulled from. For example, get all records from Employee table:-
```
SELECT * FROM Employee;
```
WHERE: Used to filter the records by one or more conditions. For example, get all Female employee’s record over the age 18:-
```
SELECT * FROM Employee WHERE Gender = 'Female' and Age > 18;
```
ORDER BY: Used to sort the records returned by query. For example, get all the employee’s records sorted by Name in ascending order:-
```
SELECT * FROM Employee ORDER BY Name;
```
GROUP BY: Used to group the records together that have similar values. For example, get average salary of employees by department:-
```
SELECT DepartmentId, AVG(Salary) FROM Employee GROUP BY DepartmentID;
```
HAVING: Used in combination with the GROUP BY clause. Used to filter the records from each group after groupings are made.
```
SELECT DepartmentId, AVG(Salary) FROM Employee GROUP BY DepartmentID HAVING AVG(Salary) > 3000;
```
DISTINCT: Used to get distinct records from a table, duplicate records are removed. For example, you may wish to list the different gender in Employee’s table:-
```
SELECT DISTINCT(Gender) FROM Employee;
```

What is the difference between the WHERE and HAVING clause?

When GROUP BY is not used, the WHERE and HAVING clauses are essentially equivalent.

However, when GROUP BY is used:

The WHERE clause is used to filter records from all rows before groupings are made.
The HAVING clause is used to filter records from each group of rows after groupings are made.

What are Aggregate Functions?

An aggregate function is a function that’s applied on multiple values, and returns a single value after the calculation. In SQL, there are multiple aggregate functions. The most commonly used ones are:

MIN() - to find the minium out of all values of a column
MAX() - to find the maximum out of all values of a column
SUM() - to find the sum of all values of a column
AVG() - to find the average of all values of a column
COUNT() - to find the count of all values of a column

SELECT MIN(Age), MAX(Age), SUM(Age), AVG(Age), COUNT(Age) FROM Employee;

What are TRUNCATE, DELETE and DROP statements?

DELETE is a DML command used to delete specific rows from a table using WHERE clause. All the rows can be deleted if WHERE clause is not specified. It requires explicit commit to make its effect permanent. It can be rolled back.
```
DELETE FROM Employees
WHERE EmployeeId > 1000;
```
TRUNCATE is a DDL command used to delete all the rows from the table. It does not require a commit to make the changes permanent and this is the reason why rows deleted by truncate cannot be rolled back
```
TRUNCATE TABLE Employees;
```
DROP is a DDL command used to delete the table structure including all the rows from the database. It cannot be rolled back
```
DROP TABLE Employees;
```

What are the different types of JOIN clauses in SQL?

A JOIN clause combines records from multiple tables into a single table, based on the common values that they share between one or more columns.

Suppose we have Employee and Department tables which share common department_id column.

Employee
+----+---------+--------+--------+---------------+
| id | name    | gender | salary | department_id |
+----+---------+--------+--------+---------------+
| 1  | Adam    | m      | 2500   |  1            |
| 2  | Becca   | f      | 1500   |  1            |
| 3  | Charlie | m      | 5500   |  2            |
| 4  | Eva     | f      | 500    |  NULL         |
+----+---------+--------+--------+---------------+

Department
+---------------+---------+
| department_id | name    |
+---------------+---------+
| 1             | HR      | 
| 2             | Dev     | 
| 3             | IT      | 
| 4             | QA      |
+---------------+---------+

The different types of JOIN clauses in SQL are:

INNER JOIN (a.k.a. “Simple Join”) Returns all records that have at least one match in both tables. This is the default type of join if no specific JOIN type is specified.

SELECT * FROM Employee e JOIN Department d ON e.department_id = d.department_id;
-- OR
SELECT * FROM Employee e, Department d WHERE e.department_id = d.department_id;

LEFT JOIN Returns all rows from the left table, and the matched rows from the right table; i.e., the results will contain all records from the left table, even if the JOIN condition doesn’t find any matching records in the right table. This means that if the ON clause doesn’t match any records in the right table, the JOIN will still return a row in the result for that record in the left table, but with NULL in each column from the right table.
```
SELECT * FROM Employee e LEFT JOIN Department d ON e.department_id = d.department_id;
```
RIGHT JOIN (or RIGHT OUTER JOIN): Returns all rows from the right table, and the matched rows from the left table. This is the exact opposite of a LEFT JOIN; i.e., the results will contain all records from the right table, even if the JOIN condition doesn’t find any matching records in the left table. This means that if the ON clause doesn’t match any records in the left table, the JOIN will still return a row in the result for that record in the right table, but with NULL in each column from the left table.
```
SELECT * FROM Employee e RIGHT JOIN Department d ON e.department_id = d.department_id;
```
FULL JOIN (or FULL OUTER JOIN): Returns all rows for which there is a match in EITHER of the tables. Conceptually, a FULL JOIN combines the effect of applying both a LEFT JOIN and a RIGHT JOIN; i.e., its result set is equivalent to performing a UNION of the results of left and right outer queries.
```
SELECT * FROM Employee e FULL JOIN Department d ON e.department_id = d.department_id;
```
CROSS JOIN Returns all records where each row from the first table is combined with each row from the second table (i.e., returns the Cartesian product of the sets of rows from the joined tables).
Note that a CROSS JOIN can either be specified using the CROSS JOIN syntax (“explicit join notation”) or listing the tables in the FROM clause separated by commas without using a WHERE clause to supply join criteria (“implicit join notation”).
```
SELECT * FROM Employee e CROSS JOIN Department d ON e.department_id = d.department_id;
-- OR
SELECT * FROM Employee e, Department d;
```

What are Constraints?

Constraints are used to specify the rules concerning data in the table. It can be applied for single or multiple fields in an SQL table during creation of table or after creationg using the ALTER TABLE command. The constraints are:

NOT NULL - Restricts NULL value from being inserted into a column.
CHECK - Verifies that all values in a field satisfy a condition.
DEFAULT - Automatically assigns a default value if no value has been specified for the field.
UNIQUE - Ensures unique values to be inserted into the field.
INDEX - Indexes a field providing faster retrieval of records.
PRIMARY KEY - Uniquely identifies each record in a table.
FOREIGN KEY - Ensures referential integrity for a record in another table.

CREATE TABLE table_name (
    column1 datatype constraint,
    column2 datatype constraint,
    column3 datatype constraint,
    ....
);

CREATE TABLE Employee (
    ID int NOT NULL UNIQUE,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int CHECK (Age>=18),
    City varchar(255) DEFAULT 'NA',
);

What is a NOT NULL constraint?

The NOT NULL constraint ensures that all values in a column must always contains a value, and not contains NULL. This provides guarantee that record cannot be inserted or updated with NULL value for that column.

/* Create table with NOT NULL constraint on ID, LastName and FirstName column */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255) NOT NULL,
    Age int
);

/* Add NOT NULL constraint on Age column */
ALTER TABLE Employee
MODIFY Age int NOT NULL;

What is a UNIQUE constraint?

A UNIQUE constraint ensures that all values in a column are different. This provides a guarantee of uniqueness for the column(s) and helps identify each row uniquely. There can be multiple unique constraints defined per table.

/* Create table with UNIQUE constraint on ID column */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    UNIQUE (ID)
);

/* Create table with UNIQUE constraint on multiple columns */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    CONSTRAINT UC_Employee UNIQUE (ID, LastName)
);

/* Add UNIQUE constraint on ID column */
ALTER TABLE Employee 	 
ADD UNIQUE (ID);

/* Add UNIQUE constraint on multiple columns */
ALTER TABLE Employee
ADD CONSTRAINT UC_Employee UNIQUE (ID, LastName);

/* Drop UNIQUE constraint */
ALTER TABLE Employee
DROP CONSTRAINT UC_Employee;

What is a Primary Key constraint?

The PRIMARY KEY constraint uniquely identifies each row in a table. It must contain UNIQUE values and has an implicit NOT NULL constraint. A table can have one and only one primary key, which is comprised of single or multiple columns.

/* Create table with PRIMARY KEY constraint on ID column */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    PRIMARY KEY (ID)
);

/* Create table with PRIMARY KEY constraint on multiple columns */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    CONSTRAINT PK_Employee PRIMARY KEY (ID, LastName)
);

/* Add PRIMARY KEY constraint on ID column */
ALTER TABLE Employee 	 
ADD PRIMARY KEY (ID);

/* Add PRIMARY KEY constraint on multiple columns */
ALTER TABLE Employee
ADD CONSTRAINT PK_Employee PRIMARY KEY (ID, LastName);

/* Drop UNIQUE constraint */
ALTER TABLE Employee
DROP CONSTRAINT PK_Employee;

What is a Foreign Key constraint?

A FOREIGN KEY is a key used to link two tables together.

A FOREIGN KEY comprises of single or collection of columns in a table that essentially refer to the PRIMARY KEY in another table. Foreign key constraint ensures referential integrity in the relation between two tables.

The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table.

/* Department table with ID as PRIMARY KEY */
CREATE TABLE Department (
    ID INT NOT NULL,
    Name VARCHAR(255),
    PRIMARY KEY (ID)
);

/* Employee table with DepartmentID as FOREIGN KEY 
which is linked to ID column of Department table */
CREATE TABLE Employee (
    ID INT NOT NULL,
    Name VARCHAR(255),
    DepartmentID INT,
    PRIMARY KEY (ID),
    FOREIGN KEY (DepartmentID) REFERENCES Department(ID)
);

/* Add FOREIGN KEY constraint on DepartmentID column */
ALTER TABLE Employee
ADD CONSTRAINT FK_DepartmentEmployee 
FOREIGN KEY (DepartmentID) REFERENCES Department(ID);

What is Index?

An index is a structure in a database that can help retrieve data faster. When you search table data with an SQL query, it will search the whole table and return the result. An unindexed table is called a heap. The data stored in such tables are usually not arranged in any particular way. It is stored in the order it was entered. Therefore, searching for data can be very slow and frustrating.

When you query an indexed table, the database will go to the index first and retrieve the corresponding records directly. The two main index types are:

Clustered
Non-clustered

What is Clustered Index?

Clustered index defines the order in which data is physically stored in a table. Therefore, only one clustered index can be created on a given database table.

Q: What is the maximum number of clustered indexes a table can have?
A: 1

This is similar to a physical telephone directory, where the records are sorted by first name, and last name.

In SQL Server, the primary key constraint automatically creates a clustered index on that particular column.

CREATE TABLE Employee
(
    id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    gender VARCHAR(50) NOT NULL
 )

CREATE CLUSTERED INDEX IX_tblEmployee_Id
ON student(id ASC)

Notice that “id” column is primary key in “Employee” table so clustered index is automatically created for “id” column. The clustered index physically stores and maintains the records in “Employee” table in the ascending order of “id” even if you insert by records by random “id”.

Clustered index on “id” make sure that you always get records sorted by “id” using following statement:-

SELECT * from Employee

What is Non-clustered Index?

Non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is stored at one place and table data is stored in another place.

This is similar to a textbook where the book content is located in one place and the index is located in the last few pages. This allows for more than one non-clustered index per table.

Q: If we create many non-clustered indexes on a table, what operations are likely to be considerably slower than before?
A: Write operations (INSERT, UPDATE, DELETE, MERGE)

Q: When we talk about non-clustered index, how many KEY columns can it have and how many INCLUDED columns?
A: 16 key columns and unlimited include columns (all table columns can be included).

Q: Which datastructure is used generally to create non-clustered indexes on a table?
A: B and B+ Trees are mostly used. Some of the database also use BitMap data structure to create indexes where cardinality is low.

What is the difference between Clustered and Non-Clustered Index?

Clustered vs Non-Clustered Index

Clustered Index	Non-Clustered Index
Can be only one clustered index per table	Can create multiple non-clustered indexes on a single table
Clustered indexes only sort tables. Therefore, they do not consume extra storage.	Non-clustered indexes are stored in a separate place from the actual table claiming more storage space.
Clustered indexes are faster	Non-clustered indexes are slower

What is Composite Index?

Whether clustered or non-clustered index, when multiple columns are used to create index then it is known as Composite Index

CREATE NONCLUSTERED INDEX IX_tblEmployee_Name_Gender
ON student(name ASC, gender DESC)

Note that the sequence of column in Composite Index matters. Above composite index will be used to find records for below queries:-

SELECT * from Employee where name="John" and gender="Male"
SELECT * from Employee where name="Eva"

but not for below query:-

SELECT * from Employee where gender="Female"

What is Cardinality for Database Query Optimization?

When it comes to database query optimization, Cardinality refers to number of unique values in a column of a table. We can calculate cardinality using SQL statement:-

SELECT DISTINCT COUNT(columnName) from tableName;

The higher the cardinality of column, the lower the duplicate values in the column. There are three types of SQL cardinality :-

High Cardinality refers to columns with mostly uncommon or unique values.
such as employee_id, email_address, phone_number columns in Employee Table.
Unique indexes are used to improve performance of such columns.
Normal Cardinality refers to columns with somewhat uncommon values.
such as first_name, last_name columns in Employee Table.
Low Cardinality refers to columns with mostly duplicate values
such as gender, joining_year columns in Employee Table.
Bit-map indexes are used to improve the performance of such columns.

How to do Performance tuning of SQL queries?

Queries on large databases can have significant execution times. Few ways to optimize SQL queries are as follows:-

Use Indexes
A straightforward, yet effective, method for query optimization is indexing. This way, the whole database won’t need to be searched for these queries. For example, if a system needed to query first and last names and ID numbers of employees based on their age, the following script could be used to create the index:
```
CREATE INDEX age_queries
ON Employees (last_name, first_name, id, age);
```
Use Inner Join instead of Where
To link data from two or more tables, the Where clause is used. However, Where uses a kind of join that joins each record of each table and then filters it for the result. This means that if there are 100 records in the Employee table and 100 in the Salary table, it will generate a table of 10,000 records and then check for the correct records for this query.
```
Select Employees.first_name, Employees.last_name, Employees.id
From Employees, Salary
Where Employees.id = Salary.id;
```
In contrast, if the Inner Join is used, the required records will be produced without anything in the middle.
```
Select Employees.first_name, Employees.last_name, Employees.id
From Employees Inner Join Salary
On Employees.id = Salary.id;
```
Use Limit for testing
Before running a query on a large database with other potential users, it is advised to use the limit operator to ensure that it actually works. The limit keyword returns a limited number of results from a query, as shown below.
```
Select Employees.first_name, Employees.last_name, Employees.id
From Employees Inner Join Salary
On Employees.id = Salary.id
Where Salary.amount > 4500
Limit 10;
```
This will return only ten names and ID numbers of Employees with salaries greater than 4,500.
Use Explain
When querying a large database, it might be smart to use the Explain operator before running the query. Explain displays what is called a Query Plan. The query plan shows which indexing is used, order of execution and the approximate time each step might take. Using this information, more time-expensive steps may be better optimized. The following example shows how to use this operator:
```
Explain
Select Employees.first_name, Employees.last_name, Employees.id
From Employees Inner Join Salary
On Employees.id = Salary.id
Where Salary.amount > 4500;
```
Be Specific with Select queries
When working with Select queries, it is smart to avoid the * operator, which returns all columns for the queried records unless otherwise required. Instead, only the required information should be extracted. For example:
```
Select *
From Employees;
```
This operation would generate all columns of all records from the Employees table. This is inefficient if only the names were required. In that case, the following query should be used:
```
Select first_name, last_name
From Employees;
```

Find Second Highest Salary from Employee Table?

+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+

SELECT max(Salary) AS SecondHighestSalary
FROM Employee
WHERE Salary < (SELECT max(Salary) FROM Employee)

+---------------------+
| SecondHighestSalary |
+---------------------+
| 200                 |
+---------------------+

Find Duplicate Emails from Employee Table?

+----+---------+
| Id | Email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+

SELECT Email
FROM Employee
GROUP BY Email
HAVING count(Email) > 1;

+---------+
| Email   |
+---------+
| a@b.com |
+---------+

Find Emails starting with vowels (i.e a,e,i,o,u) from Employee Table?

+----+-------------------+
| Id | Email             |
+----+-------------------+
| 1  | adam@gmail.com    |
| 2  | charlie@yahoo.com |
| 3  | david@hotmail.com |
| 4  | eva@apple.com     |
| 5  | fang@bing.com     |
+----+-------------------+

SELECT Email
FROM Employee
WHERE Email REGEXP "^[aeiou]";

+------------------+
| Email            |
+------------------+
| adam@gmail.com   |
| eva@apple.com    |
+------------------+

Swap values of gender column in Employee Table?

Column gender has m=male and f=female. Swap all f and m values (i.e., change all f values to m and vice versa) with a single update statement and no intermediate temp table.

+----+------+--------+--------+
| id | name | gender | salary |
+----+------+--------+--------+
| 1  | A    | m      | 2500   |
| 2  | B    | f      | 1500   |
| 3  | C    | m      | 5500   |
| 4  | D    | f      | 500    |
+----+------+--------+--------+

There are multiple solutions possible for this:-

Using CASE

UPDATE salary
SET
    gender = CASE gender
        WHEN 'm' THEN 'f'
        ELSE 'm'
    END;

Using IF

UPDATE salary 
SET gender = IF (gender = "m", "f", "m");

Using XOR ^ Operation

UPDATE salary 
SET gender = CHAR(ASCII('f') ^ ASCII('m') ^ ASCII(gender));

Next Steps…

Practice SQL queries online using this free tool:- https://www.sql-practice.com/