Relational Database Interview Questions
List of frequently asked relational database interview questions. Keep following this post for regular updates.
What are ACID properties in a database?
ACID is an acronym for Atomicity, Consistency, Isolation, and Durability, which refers to the four properties of a transaction in Database. These properties ensures the accuracy and integrity of the data in database.
A transaction is a sequence of instructions that are executed as a single unit of work.
-
Atomicity - A transaction is an atomic unit; hence, all the instructions within a transaction will successfully execute, or none of them. If any of the instructions fail, the entire transaction should abort and rollback.
For example, transferring money between bank accounts is a transaction, where the value must be debited from one account and credit in another account. If any of them failed, then other must be rolled back. -
Consistency - A database is initially in a consistent state, and it should remain consistent after every transaction, whether transaction execute successfully or fails.
For example, a transaction must fail, while trying to insert duplicate in a table column having unique constraint. -
Isolation - If the multiple transactions are running concurrently, they should not be affected by each other; i.e., the result should be the same as the result obtained if the transactions were running sequentially.
For example, if two clients are trying to buy at the same time the last available product on the web site, when the first user finishes the shopping, it will make the transaction of the other user be interrupted. -
Durability - Changes that have been persisted to the database should remain even in the case of software and hardware failure.
What is SQL?
SQL (Structured Query Language) is a language used for storing, reading, updating, and deleting data using simple code snippets, called queries, in an RDBMS (relational database management system).
Given below is an example. You first create a table, insert some data in it and then view all the data in that table.
-- Creating a table in database
-- CREATE TABLE <Table name>
CREATE TABLE Employee (
-- <Variable name> <Variable type>
ID int,
Name varchar(255),
Gender varchar(10),
Age int
);
-- Inserting
INSERT INTO
-- <Table name>
Employee
-- <Variable names>
(ID, Name, Gender, Age)
VALUES
-- <Variable values>
(1, 'Adam', 'Male', 22),
(2, 'Eva', 'Female', 28),
(3, 'Joy', 'Male', 10);
-- Selecting everything we stored in the table to view
SELECT *
FROM Employee;
What are different type of SQL statements?
- Data Definition Language (DDL) commands are used to define the structure that holds the data. These commands are auto-committed i.e. changes done by the DDL commands on the database are saved permanently and cannot be rolled back
e.g. CREATE, ALTER, TRUNCATE, DROP, RENAME, COMMENT - Data Manipulation Language (DML) commands are used to manipulate the data of the database. These commands are not auto-committed and can be rolled back
e.g. SELECT, INSERT, UPDATE, DELETE, MERGE - Data Control Language (DCL) commands are used to deal with the rights, permission, and other controls of the database
e.g. GRANT, REVOKE - Transaction Control Language (TCL) commands are used to deal with transactions within the database
e.g. COMMIT, ROLLBACK, SAVEPOINT
What are different types of clauses in SQL?
SQL clauses are used to qualify a database query by restricting or altering the values that it returns. The types of SQL clauses are:
-
FROM: Used to specify which tables the data will be pulled from. For example, get all records from Employee table:-
SELECT * FROM Employee;
-
WHERE: Used to filter the records by one or more conditions. For example, get all Female employee’s record over the age 18:-
SELECT * FROM Employee WHERE Gender = 'Female' and Age > 18;
-
ORDER BY: Used to sort the records returned by query. For example, get all the employee’s records sorted by Name in ascending order:-
SELECT * FROM Employee ORDER BY Name;
-
GROUP BY: Used to group the records together that have similar values. For example, get average salary of employees by department:-
SELECT DepartmentId, AVG(Salary) FROM Employee GROUP BY DepartmentID;
-
HAVING: Used in combination with the
GROUP BY
clause. Used to filter the records from each group after groupings are made.SELECT DepartmentId, AVG(Salary) FROM Employee GROUP BY DepartmentID HAVING AVG(Salary) > 3000;
-
DISTINCT: Used to get distinct records from a table, duplicate records are removed. For example, you may wish to list the different gender in Employee’s table:-
SELECT DISTINCT(Gender) FROM Employee;
What is the difference between the WHERE and HAVING clause?
When GROUP BY
is not used, the WHERE
and HAVING
clauses are essentially equivalent.
However, when GROUP BY
is used:
- The
WHERE
clause is used to filter records from all rows before groupings are made. - The
HAVING
clause is used to filter records from each group of rows after groupings are made.
What are Aggregate Functions?
An aggregate function is a function that’s applied on multiple values, and returns a single value after the calculation. In SQL, there are multiple aggregate functions. The most commonly used ones are:
- MIN() - to find the minium out of all values of a column
- MAX() - to find the maximum out of all values of a column
- SUM() - to find the sum of all values of a column
- AVG() - to find the average of all values of a column
- COUNT() - to find the count of all values of a column
SELECT MIN(Age), MAX(Age), SUM(Age), AVG(Age), COUNT(Age) FROM Employee;
What are TRUNCATE, DELETE and DROP statements?
- DELETE is a DML command used to delete specific rows from a table using WHERE clause. All the rows can be deleted if WHERE clause is not specified. It requires explicit commit to make its effect permanent. It can be rolled back.
DELETE FROM Employees WHERE EmployeeId > 1000;
- TRUNCATE is a DDL command used to delete all the rows from the table. It does not require a commit to make the changes permanent and this is the reason why rows deleted by truncate cannot be rolled back
TRUNCATE TABLE Employees;
- DROP is a DDL command used to delete the table structure including all the rows from the database. It cannot be rolled back
DROP TABLE Employees;
What are the different types of JOIN clauses in SQL?
A JOIN
clause combines records from multiple tables into a single table, based on the common values that they share between one or more columns.
Suppose we have Employee
and Department
tables which share common department_id
column.
Employee
+----+---------+--------+--------+---------------+
| id | name | gender | salary | department_id |
+----+---------+--------+--------+---------------+
| 1 | Adam | m | 2500 | 1 |
| 2 | Becca | f | 1500 | 1 |
| 3 | Charlie | m | 5500 | 2 |
| 4 | Eva | f | 500 | NULL |
+----+---------+--------+--------+---------------+
Department
+---------------+---------+
| department_id | name |
+---------------+---------+
| 1 | HR |
| 2 | Dev |
| 3 | IT |
| 4 | QA |
+---------------+---------+
The different types of JOIN clauses in SQL are:
- INNER JOIN (a.k.a. “Simple Join”) Returns all records that have at least one match in both tables. This is the default type of join if no specific JOIN type is specified.
SELECT * FROM Employee e JOIN Department d ON e.department_id = d.department_id; -- OR SELECT * FROM Employee e, Department d WHERE e.department_id = d.department_id;
- LEFT JOIN Returns all rows from the left table, and the matched rows from the right table; i.e., the results will contain all records from the left table, even if the JOIN condition doesn’t find any matching records in the right table. This means that if the ON clause doesn’t match any records in the right table, the JOIN will still return a row in the result for that record in the left table, but with NULL in each column from the right table.
SELECT * FROM Employee e LEFT JOIN Department d ON e.department_id = d.department_id;
- RIGHT JOIN (or RIGHT OUTER JOIN): Returns all rows from the right table, and the matched rows from the left table. This is the exact opposite of a LEFT JOIN; i.e., the results will contain all records from the right table, even if the JOIN condition doesn’t find any matching records in the left table. This means that if the ON clause doesn’t match any records in the left table, the JOIN will still return a row in the result for that record in the right table, but with NULL in each column from the left table.
SELECT * FROM Employee e RIGHT JOIN Department d ON e.department_id = d.department_id;
- FULL JOIN (or FULL OUTER JOIN): Returns all rows for which there is a match in EITHER of the tables. Conceptually, a FULL JOIN combines the effect of applying both a LEFT JOIN and a RIGHT JOIN; i.e., its result set is equivalent to performing a UNION of the results of left and right outer queries.
SELECT * FROM Employee e FULL JOIN Department d ON e.department_id = d.department_id;
- CROSS JOIN Returns all records where each row from the first table is combined with each row from the second table (i.e., returns the Cartesian product of the sets of rows from the joined tables).
Note that a CROSS JOIN can either be specified using the CROSS JOIN syntax (“explicit join notation”) or listing the tables in the FROM clause separated by commas without using a WHERE clause to supply join criteria (“implicit join notation”).SELECT * FROM Employee e CROSS JOIN Department d ON e.department_id = d.department_id; -- OR SELECT * FROM Employee e, Department d;
What are Constraints?
Constraints are used to specify the rules concerning data in the table. It can be applied for single or multiple fields in an SQL table during creation of table or after creationg using the ALTER TABLE command. The constraints are:
- NOT NULL - Restricts NULL value from being inserted into a column.
- CHECK - Verifies that all values in a field satisfy a condition.
- DEFAULT - Automatically assigns a default value if no value has been specified for the field.
- UNIQUE - Ensures unique values to be inserted into the field.
- INDEX - Indexes a field providing faster retrieval of records.
- PRIMARY KEY - Uniquely identifies each record in a table.
- FOREIGN KEY - Ensures referential integrity for a record in another table.
CREATE TABLE table_name (
column1 datatype constraint,
column2 datatype constraint,
column3 datatype constraint,
....
);
CREATE TABLE Employee (
ID int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int CHECK (Age>=18),
City varchar(255) DEFAULT 'NA',
);
What is a NOT NULL constraint?
The NOT NULL constraint ensures that all values in a column must always contains a value, and not contains NULL. This provides guarantee that record cannot be inserted or updated with NULL value for that column.
/* Create table with NOT NULL constraint on ID, LastName and FirstName column */
CREATE TABLE Employee (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255) NOT NULL,
Age int
);
/* Add NOT NULL constraint on Age column */
ALTER TABLE Employee
MODIFY Age int NOT NULL;
What is a UNIQUE constraint?
A UNIQUE constraint ensures that all values in a column are different. This provides a guarantee of uniqueness for the column(s) and helps identify each row uniquely. There can be multiple unique constraints defined per table.
/* Create table with UNIQUE constraint on ID column */
CREATE TABLE Employee (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
UNIQUE (ID)
);
/* Create table with UNIQUE constraint on multiple columns */
CREATE TABLE Employee (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
CONSTRAINT UC_Employee UNIQUE (ID, LastName)
);
/* Add UNIQUE constraint on ID column */
ALTER TABLE Employee
ADD UNIQUE (ID);
/* Add UNIQUE constraint on multiple columns */
ALTER TABLE Employee
ADD CONSTRAINT UC_Employee UNIQUE (ID, LastName);
/* Drop UNIQUE constraint */
ALTER TABLE Employee
DROP CONSTRAINT UC_Employee;
What is a Primary Key constraint?
The PRIMARY KEY constraint uniquely identifies each row in a table. It must contain UNIQUE values and has an implicit NOT NULL constraint. A table can have one and only one primary key, which is comprised of single or multiple columns.
/* Create table with PRIMARY KEY constraint on ID column */
CREATE TABLE Employee (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
PRIMARY KEY (ID)
);
/* Create table with PRIMARY KEY constraint on multiple columns */
CREATE TABLE Employee (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
CONSTRAINT PK_Employee PRIMARY KEY (ID, LastName)
);
/* Add PRIMARY KEY constraint on ID column */
ALTER TABLE Employee
ADD PRIMARY KEY (ID);
/* Add PRIMARY KEY constraint on multiple columns */
ALTER TABLE Employee
ADD CONSTRAINT PK_Employee PRIMARY KEY (ID, LastName);
/* Drop UNIQUE constraint */
ALTER TABLE Employee
DROP CONSTRAINT PK_Employee;
What is a Foreign Key constraint?
A FOREIGN KEY is a key used to link two tables together.
A FOREIGN KEY comprises of single or collection of columns in a table that essentially refer to the PRIMARY KEY in another table. Foreign key constraint ensures referential integrity in the relation between two tables.
The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table.
/* Department table with ID as PRIMARY KEY */
CREATE TABLE Department (
ID INT NOT NULL,
Name VARCHAR(255),
PRIMARY KEY (ID)
);
/* Employee table with DepartmentID as FOREIGN KEY
which is linked to ID column of Department table */
CREATE TABLE Employee (
ID INT NOT NULL,
Name VARCHAR(255),
DepartmentID INT,
PRIMARY KEY (ID),
FOREIGN KEY (DepartmentID) REFERENCES Department(ID)
);
/* Add FOREIGN KEY constraint on DepartmentID column */
ALTER TABLE Employee
ADD CONSTRAINT FK_DepartmentEmployee
FOREIGN KEY (DepartmentID) REFERENCES Department(ID);
What is Index?
An index is a structure in a database that can help retrieve data faster. When you search table data with an SQL query, it will search the whole table and return the result. An unindexed table is called a heap. The data stored in such tables are usually not arranged in any particular way. It is stored in the order it was entered. Therefore, searching for data can be very slow and frustrating.
When you query an indexed table, the database will go to the index first and retrieve the corresponding records directly. The two main index types are:
- Clustered
- Non-clustered
What is Clustered Index?
Clustered index defines the order in which data is physically stored in a table. Therefore, only one clustered index can be created on a given database table.
Q: What is the maximum number of clustered indexes a table can have?
A: 1
This is similar to a physical telephone directory, where the records are sorted by first name, and last name.
In SQL Server, the primary key constraint automatically creates a clustered index on that particular column.
CREATE TABLE Employee
(
id INT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
gender VARCHAR(50) NOT NULL
)
CREATE CLUSTERED INDEX IX_tblEmployee_Id
ON student(id ASC)
Notice that “id” column is primary key in “Employee” table so clustered index is automatically created for “id” column. The clustered index physically stores and maintains the records in “Employee” table in the ascending order of “id” even if you insert by records by random “id”.
Clustered index on “id” make sure that you always get records sorted by “id” using following statement:-
SELECT * from Employee
What is Non-clustered Index?
Non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is stored at one place and table data is stored in another place.
This is similar to a textbook where the book content is located in one place and the index is located in the last few pages. This allows for more than one non-clustered index per table.
Q: If we create many non-clustered indexes on a table, what operations are likely to be considerably slower than before?
A: Write operations (INSERT, UPDATE, DELETE, MERGE)
Q: When we talk about non-clustered index, how many KEY columns can it have and how many INCLUDED columns?
A: 16 key columns and unlimited include columns (all table columns can be included).
Q: Which datastructure is used generally to create non-clustered indexes on a table?
A: B and B+ Trees are mostly used. Some of the database also use BitMap data structure to create indexes where cardinality is low.
What is the difference between Clustered and Non-Clustered Index?
Clustered vs Non-Clustered Index
Clustered Index | Non-Clustered Index |
---|---|
Can be only one clustered index per table | Can create multiple non-clustered indexes on a single table |
Clustered indexes only sort tables. Therefore, they do not consume extra storage. | Non-clustered indexes are stored in a separate place from the actual table claiming more storage space. |
Clustered indexes are faster | Non-clustered indexes are slower |
What is Composite Index?
Whether clustered or non-clustered index, when multiple columns are used to create index then it is known as Composite Index
CREATE NONCLUSTERED INDEX IX_tblEmployee_Name_Gender
ON student(name ASC, gender DESC)
Note that the sequence of column in Composite Index matters. Above composite index will be used to find records for below queries:-
SELECT * from Employee where name="John" and gender="Male"
SELECT * from Employee where name="Eva"
but not for below query:-
SELECT * from Employee where gender="Female"
What is Cardinality for Database Query Optimization?
When it comes to database query optimization, Cardinality refers to number of unique values in a column of a table. We can calculate cardinality using SQL statement:-
SELECT DISTINCT COUNT(columnName) from tableName;
The higher the cardinality of column, the lower the duplicate values in the column. There are three types of SQL cardinality :-
- High Cardinality refers to columns with mostly uncommon or unique values.
such as employee_id, email_address, phone_number columns in Employee Table.
Unique indexes are used to improve performance of such columns. - Normal Cardinality refers to columns with somewhat uncommon values.
such as first_name, last_name columns in Employee Table. - Low Cardinality refers to columns with mostly duplicate values
such as gender, joining_year columns in Employee Table.
Bit-map indexes are used to improve the performance of such columns.
How to do Performance tuning of SQL queries?
Queries on large databases can have significant execution times. Few ways to optimize SQL queries are as follows:-
-
Use Indexes
A straightforward, yet effective, method for query optimization is indexing. This way, the whole database won’t need to be searched for these queries. For example, if a system needed to query first and last names and ID numbers of employees based on their age, the following script could be used to create the index:CREATE INDEX age_queries ON Employees (last_name, first_name, id, age);
-
Use
Inner Join
instead ofWhere
To link data from two or more tables, the Where clause is used. However, Where uses a kind of join that joins each record of each table and then filters it for the result. This means that if there are 100 records in the Employee table and 100 in the Salary table, it will generate a table of 10,000 records and then check for the correct records for this query.Select Employees.first_name, Employees.last_name, Employees.id From Employees, Salary Where Employees.id = Salary.id;
In contrast, if the
Inner Join
is used, the required records will be produced without anything in the middle.Select Employees.first_name, Employees.last_name, Employees.id From Employees Inner Join Salary On Employees.id = Salary.id;
-
Use
Limit
for testing
Before running a query on a large database with other potential users, it is advised to use the limit operator to ensure that it actually works. The limit keyword returns a limited number of results from a query, as shown below.Select Employees.first_name, Employees.last_name, Employees.id From Employees Inner Join Salary On Employees.id = Salary.id Where Salary.amount > 4500 Limit 10;
This will return only ten names and ID numbers of Employees with salaries greater than 4,500.
-
Use
Explain
When querying a large database, it might be smart to use the Explain operator before running the query. Explain displays what is called a Query Plan. The query plan shows which indexing is used, order of execution and the approximate time each step might take. Using this information, more time-expensive steps may be better optimized. The following example shows how to use this operator:Explain Select Employees.first_name, Employees.last_name, Employees.id From Employees Inner Join Salary On Employees.id = Salary.id Where Salary.amount > 4500;
-
Be Specific with Select queries
When working with Select queries, it is smart to avoid the * operator, which returns all columns for the queried records unless otherwise required. Instead, only the required information should be extracted. For example:Select * From Employees;
This operation would generate all columns of all records from the Employees table. This is inefficient if only the names were required. In that case, the following query should be used:
Select first_name, last_name From Employees;
Find Second Highest Salary from Employee Table?
+----+--------+
| Id | Salary |
+----+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
+----+--------+
SELECT max(Salary) AS SecondHighestSalary
FROM Employee
WHERE Salary < (SELECT max(Salary) FROM Employee)
+---------------------+
| SecondHighestSalary |
+---------------------+
| 200 |
+---------------------+
Find Duplicate Emails from Employee Table?
+----+---------+
| Id | Email |
+----+---------+
| 1 | a@b.com |
| 2 | c@d.com |
| 3 | a@b.com |
+----+---------+
SELECT Email
FROM Employee
GROUP BY Email
HAVING count(Email) > 1;
+---------+
| Email |
+---------+
| a@b.com |
+---------+
Find Emails starting with vowels (i.e a,e,i,o,u) from Employee Table?
+----+-------------------+
| Id | Email |
+----+-------------------+
| 1 | adam@gmail.com |
| 2 | charlie@yahoo.com |
| 3 | david@hotmail.com |
| 4 | eva@apple.com |
| 5 | fang@bing.com |
+----+-------------------+
SELECT Email
FROM Employee
WHERE Email REGEXP "^[aeiou]";
+------------------+
| Email |
+------------------+
| adam@gmail.com |
| eva@apple.com |
+------------------+
Swap values of gender column in Employee Table?
Column gender
has m=male and f=female. Swap all f and m values (i.e., change all f values to m and vice versa) with a single update statement and no intermediate temp table.
+----+------+--------+--------+
| id | name | gender | salary |
+----+------+--------+--------+
| 1 | A | m | 2500 |
| 2 | B | f | 1500 |
| 3 | C | m | 5500 |
| 4 | D | f | 500 |
+----+------+--------+--------+
There are multiple solutions possible for this:-
-
Using CASE
UPDATE salary SET gender = CASE gender WHEN 'm' THEN 'f' ELSE 'm' END;
-
Using IF
UPDATE salary SET gender = IF (gender = "m", "f", "m");
-
Using XOR ^ Operation
UPDATE salary SET gender = CHAR(ASCII('f') ^ ASCII('m') ^ ASCII(gender));
Next Steps…
Practice SQL queries online using this free tool:- https://www.sql-practice.com/