Relational Database Interview Questions Relational Database Interview Questions

Page content

List of frequently asked relational database interview questions. Keep following this post for regular updates.

What are ACID properties in a database?

A transaction is a set of instruction(s) on database such as create, update, and delete.

Relational database provides guarantee of data integrity of these transactions through ACID properties, which is an acronym for atomicity, consistency, isolation, and durability.

  1. Atomicity - A transaction is an atomic unit; hence, all the instructions within a transaction will successfully execute, or none of them will execute. If any of the instructions fail, the entire transaction should abort and rollback.

  2. Consistency - A database is initially in a consistent state, and it should remain consistent after every transaction, whether transaction execute successfully or fails.

  3. Isolation - If the multiple transactions are running concurrently, they should not be affected by each other; i.e., the result should be the same as the result obtained if the transactions were running sequentially.

  4. Durability - Changes that have been committed to the database should remain even in the case of software and hardware failure.

What is SQL?

SQL (Structured Query Language) is a language used for storing, reading, updating, and deleting data using simple code snippets, called queries, in an RDBMS (relational database management system).

Given below is an example. You first create a table, insert some data in it and then view all the data in that table.

-- Creating a table in database

-- CREATE TABLE <Table name>
CREATE TABLE Employee (
    -- <Variable name> <Variable type>	 
    ID int,
    Name varchar(255),
    Gender varchar(10),
    Age int
);

-- Inserting 
INSERT INTO
  -- <Table name>
  Employee 
    -- <Variable names>
    (ID, Name, Gender, Age) 
  VALUES
    -- <Variable values>
    (1, 'Adam', 'Male', 22),
    (2, 'Eva', 'Female', 28),
    (3, 'Joy', 'Male', 10);

-- Selecting everything we stored in the table to view
SELECT *
FROM Employee;

What are different type of SQL statements?

  1. Data Definition Language (DDL) commands are used to define the structure that holds the data. These commands are auto-committed i.e. changes done by the DDL commands on the database are saved permanently e.g. CREATE, ALTER, TRUNCATE, DROP, RENAME, COMMENT

  2. Data Query Language (DQL) commands are used to perform queries to on the data e.g. SELECT

  3. Data Manipulation Language (DML) commands are used to manipulate the data of the database. These commands are not auto-committed and can be rolled back e.g. INSERT, UPDATE, DELETE, MERGE

  4. Data Control Language (DCL) commands are used to deal with the rights, permission, and other controls of the database e.g. GRANT, REVOKE

  5. Transaction Control Language (TCL) commands are used to deal with transactions within the database e.g. COMMIT, ROLLBACK, SAVEPOINT

What are different types of clauses in SQL?

SQL clauses are used to qualify a database query by restricting or altering the values that it returns. The types of SQL clauses are:

  • FROM: Used to specify which tables the data will be pulled from. For example, get all records from Employee table:-

    SELECT * FROM Employee;
    
  • WHERE: Used to filter the data by one or more conditions. For example, get all Female employee’s record over the age 18:-

    SELECT * FROM Employee WHERE Gender = 'Female' and Age > 18;
    
  • ORDER BY: Used to sort the data returned by query. For example, get all the employee’s records sorted by Name in ascending order:-

    SELECT * FROM Employee ORDER BY Name;
    
  • GROUP BY: Used to group together rows that have the same values. For example, get average salary of employees by department:-

    SELECT DepartmentId, AVG(Salary) FROM Employee GROUP BY DepartmentID;
    
  • HAVING: Used in combination with the GROUP BY clause. It restricts the returned values to only those that fulfill a given condition.

    SELECT DepartmentId, AVG(Salary) FROM Employee GROUP BY DepartmentID HAVING AVG(Salary) > 3000;
    
  • DISTINCT: Used to retrieve a table with distinct records, duplicate records are removed. For example, you may wish to list the different gender in Employee’s table:-

    SELECT DISTINCT(Gender) FROM Employee;
    

What are Aggregate Functions?

An aggregate function is a function that’s applied on multiple values, and returns a single value after the calculation. In SQL, there are multiple aggregate functions. The most commonly used ones are:

  1. MIN() - to find the minium out of all values of a column
  2. MAX() - to find the maximum out of all values of a column
  3. SUM() - to find the sum of all values of a column
  4. AVG() - to find the average of all values of a column
  5. COUNT() - to find the count of all values of a column
SELECT MIN(Age), MAX(Age), SUM(Age), AVG(Age), COUNT(Age) FROM Employee;

What are TRUNCATE, DELETE and DROP statements?

  • DELETE is DML statement used to delete specific rows from a table using WHERE clause. All the rows can also be deleted if WHERE clause is not used. It requires explicit commit to make its effect permanent.
    DELETE FROM Employees
    WHERE EmployeeId > 1000;
    
  • TRUNCATE is a DDL statement used to delete all the rows from the table. It does not require a commit to make the changes permanent and this is the reason why rows deleted by truncate could not be roll-backed.
    TRUNCATE TABLE Employees;
    
  • DROP command is used to delete the table structure including all the rows from the database.
    DROP TABLE Employees;
    

What are the different types of JOIN clauses in SQL?

A JOIN clause combines records from multiple tables into a single table, based on the common values that they share between one or more columns.

The different types of JOIN clauses in SQL are:

  1. INNER JOIN - Returns all records that have at least one match in both tables.

  2. LEFT JOIN -Returns all records from the left table and all matching records from the right table.

  3. RIGHT JOIN - Returns all records from the right table, and all matching records from the left table.

  4. FULL JOIN - Returns all records that have at least one match in either table.

  5. CROSS JOIN - Returns all possible combinations of rows from the left table and the right table. This is known as the Cartesian product.

What are Constraints?

Constraints are used to specify the rules concerning data in the table. It can be applied for single or multiple fields in an SQL table during creation of table or after creationg using the ALTER TABLE command. The constraints are:

  • NOT NULL - Restricts NULL value from being inserted into a column.
  • CHECK - Verifies that all values in a field satisfy a condition.
  • DEFAULT - Automatically assigns a default value if no value has been specified for the field.
  • UNIQUE - Ensures unique values to be inserted into the field.
  • INDEX - Indexes a field providing faster retrieval of records.
  • PRIMARY KEY - Uniquely identifies each record in a table.
  • FOREIGN KEY - Ensures referential integrity for a record in another table.
CREATE TABLE table_name (
    column1 datatype constraint,
    column2 datatype constraint,
    column3 datatype constraint,
    ....
);

CREATE TABLE Employee (
    ID int NOT NULL UNIQUE,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int CHECK (Age>=18),
    City varchar(255) DEFAULT 'NA',
);

What is a NOT NULL constraint?

The NOT NULL constraint ensures that all values in a column must always contains a value, and not contains NULL. This provides guarantee that record cannot be inserted or updated with NULL value for that column.

/* Create table with NOT NULL constraint on ID, LastName and FirstName column */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255) NOT NULL,
    Age int
);

/* Add NOT NULL constraint on Age column */
ALTER TABLE Employee
MODIFY Age int NOT NULL;

What is a UNIQUE constraint?

A UNIQUE constraint ensures that all values in a column are different. This provides a guarantee of uniqueness for the column(s) and helps identify each row uniquely. There can be multiple unique constraints defined per table.

/* Create table with UNIQUE constraint on ID column */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    UNIQUE (ID)
);

/* Create table with UNIQUE constraint on multiple columns */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    CONSTRAINT UC_Employee UNIQUE (ID, LastName)
);

/* Add UNIQUE constraint on ID column */
ALTER TABLE Employee 	 
ADD UNIQUE (ID);

/* Add UNIQUE constraint on multiple columns */
ALTER TABLE Employee
ADD CONSTRAINT UC_Employee UNIQUE (ID, LastName);

/* Drop UNIQUE constraint */
ALTER TABLE Employee
DROP CONSTRAINT UC_Employee;

What is a Primary Key constraint?

The PRIMARY KEY constraint uniquely identifies each row in a table. It must contain UNIQUE values and has an implicit NOT NULL constraint. A table can have one and only one primary key, which is comprised of single or multiple columns.

/* Create table with PRIMARY KEY constraint on ID column */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    PRIMARY KEY (ID)
);

/* Create table with PRIMARY KEY constraint on multiple columns */
CREATE TABLE Employee ( 	 
    ID int NOT NULL,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Age int,
    CONSTRAINT PK_Employee PRIMARY KEY (ID, LastName)
);

/* Add PRIMARY KEY constraint on ID column */
ALTER TABLE Employee 	 
ADD PRIMARY KEY (ID);

/* Add PRIMARY KEY constraint on multiple columns */
ALTER TABLE Employee
ADD CONSTRAINT PK_Employee PRIMARY KEY (ID, LastName);

/* Drop UNIQUE constraint */
ALTER TABLE Employee
DROP CONSTRAINT PK_Employee;

What is a Foreign Key constraint?

A FOREIGN KEY is a key used to link two tables together.

A FOREIGN KEY comprises of single or collection of columns in a table that essentially refer to the PRIMARY KEY in another table. Foreign key constraint ensures referential integrity in the relation between two tables.

The table containing the foreign key is called the child table, and the table containing the candidate key is called the referenced or parent table.

/* Department table with ID as PRIMARY KEY */
CREATE TABLE Department (
    ID INT NOT NULL,
    Name VARCHAR(255),
    PRIMARY KEY (ID)
);

/* Employee table with DepartmentID as FOREIGN KEY 
which is linked to ID column of Department table */
CREATE TABLE Employee (
    ID INT NOT NULL,
    Name VARCHAR(255),
    DepartmentID INT,
    PRIMARY KEY (ID),
    FOREIGN KEY (DepartmentID) REFERENCES Department(ID)
);

/* Add FOREIGN KEY constraint on DepartmentID column */
ALTER TABLE Employee
ADD CONSTRAINT FK_DepartmentEmployee 
FOREIGN KEY (DepartmentID) REFERENCES Department(ID);

What is Index?

An index is a structure in a database that can help retrieve data faster. When you search table data with an SQL query, it will search the whole table and return the result. An unindexed table is called a heap. The data stored in such tables are usually not arranged in any particular way. It is stored in the order it was entered. Therefore, searching for data can be very slow and frustrating.

When you query an indexed table, the database will go to the index first and retrieve the corresponding records directly. The two main index types are:

  1. Clustered
  2. Non-clustered

What is Clustered Index?

Clustered index defines the order in which data is physically stored in a table. Therefore, only one clustered index can be created on a given database table.

Q: What is the maximum number of clustered indexes a table can have?
A: 1

This is similar to a physical telephone directory, where the records are sorted by first name, and last name.

In SQL Server, the primary key constraint automatically creates a clustered index on that particular column.

CREATE TABLE Employee
(
    id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    gender VARCHAR(50) NOT NULL
 )

CREATE CLUSTERED INDEX IX_tblEmployee_Id
ON student(id ASC)

Notice that “id” column is primary key in “Employee” table so clustered index is automatically created for “id” column. The clustered index physically stores and maintains the records in “Employee” table in the ascending order of “id” even if you insert by records by random “id”.

Clustered index on “id” make sure that you always get records sorted by “id” using following statement:-

SELECT * from Employee

What is Non-clustered Index?

Non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is stored at one place and table data is stored in another place.

This is similar to a textbook where the book content is located in one place and the index is located in the last few pages. This allows for more than one non-clustered index per table.

Q: If we create many non-clustered indexes on a table, what operations are likely to be considerably slower than before?
A: Write operations (INSERT, UPDATE, DELETE, MERGE)

Q: When we talk about non-clustered index, how many KEY columns can it have and how many INCLUDED columns?
A: 16 key columns and unlimited include columns (all table columns can be included).

Q: Which datastructure is used generally to create non-clustered indexes on a table?
A: B and B+ Trees are mostly used. Some of the database also use BitMap data structure to create indexes where cardinality is low.

What is the difference between Clustered and Non-Clustered Index?

Clustered vs Non-Clustered Index
Clustered IndexNon-Clustered Index
Can be only one clustered index per tableCan create multiple non-clustered indexes on a single table
Clustered indexes only sort tables. Therefore, they do not consume extra storage.Non-clustered indexes are stored in a separate place from the actual table claiming more storage space.
Clustered indexes are fasterNon-clustered indexes are slower

What is Composite Index?

Whether clustered or non-clustered index, when multiple columns are used to create index then it is known as Composite Index

CREATE NONCLUSTERED INDEX IX_tblEmployee_Name_Gender
ON student(name ASC, gender DESC)

Note that the sequence of column in Composite Index matters. Above composite index will be used to find records for below queries:-

SELECT * from Employee where name="John" and gender="Male"
SELECT * from Employee where name="Eva"

but not for below query:-

SELECT * from Employee where gender="Female"

What is Cardinality for Database Query Optimization?

When it comes to database query optimization, Cardinality refers to number of unique values in a column of a table. We can calculate cardinality using SQL statement:-

SELECT DISTINCT COUNT(columnName) from tableName;

The higher the cardinality of column, the lower the duplicate values in the column. There are three types of SQL cardinality :-

  1. High Cardinality refers to columns with mostly uncommon or unique values.
    such as employee_id, email_address, phone_number columns in Employee Table.
    Unique indexes are used to improve performance of such columns.
  2. Normal Cardinality refers to columns with somewhat uncommon values.
    such as first_name, last_name columns in Employee Table.
  3. Low Cardinality refers to columns with mostly duplicate values
    such as gender, joining_year columns in Employee Table.
    Bit-map indexes are used to improve the performance of such columns.

How to do Performance tuning of SQL queries?

Queries on large databases can have significant execution times. Few ways to optimize SQL queries are as follows:-

  1. Use Indexes
    A straightforward, yet effective, method for query optimization is indexing. This way, the whole database won’t need to be searched for these queries. For example, if a system needed to query first and last names and ID numbers of employees based on their age, the following script could be used to create the index:

    CREATE INDEX age_queries
    ON Employees (last_name, first_name, id, age);
    
  2. Use Inner Join instead of Where
    To link data from two or more tables, the Where clause is used. However, Where uses a kind of join that joins each record of each table and then filters it for the result. This means that if there are 100 records in the Employee table and 100 in the Salary table, it will generate a table of 10,000 records and then check for the correct records for this query.

    Select Employees.first_name, Employees.last_name, Employees.id
    From Employees, Salary
    Where Employees.id = Salary.id;
    

    In contrast, if the Inner Join is used, the required records will be produced without anything in the middle.

    Select Employees.first_name, Employees.last_name, Employees.id
    From Employees Inner Join Salary
    On Employees.id = Salary.id;
    
  3. Use Limit for testing
    Before running a query on a large database with other potential users, it is advised to use the limit operator to ensure that it actually works. The limit keyword returns a limited number of results from a query, as shown below.

    Select Employees.first_name, Employees.last_name, Employees.id
    From Employees Inner Join Salary
    On Employees.id = Salary.id
    Where Salary.amount > 4500
    Limit 10;
    

    This will return only ten names and ID numbers of Employees with salaries greater than 4,500.

  4. Use Explain
    When querying a large database, it might be smart to use the Explain operator before running the query. Explain displays what is called a Query Plan. The query plan shows which indexing is used, order of execution and the approximate time each step might take. Using this information, more time-expensive steps may be better optimized. The following example shows how to use this operator:

    Explain
    Select Employees.first_name, Employees.last_name, Employees.id
    From Employees Inner Join Salary
    On Employees.id = Salary.id
    Where Salary.amount > 4500;
    
  5. Be Specific with Select queries
    When working with Select queries, it is smart to avoid the * operator, which returns all columns for the queried records unless otherwise required. Instead, only the required information should be extracted. For example:

    Select *
    From Employees;
    

    This operation would generate all columns of all records from the Employees table. This is inefficient if only the names were required. In that case, the following query should be used:

    Select first_name, last_name
    From Employees;
    

Find Second Highest Salary from Employee Table?

+----+--------+
| Id | Salary |
+----+--------+
| 1  | 100    |
| 2  | 200    |
| 3  | 300    |
+----+--------+
SELECT max(Salary) AS SecondHighestSalary
FROM Employee
WHERE Salary < (SELECT max(Salary) FROM Employee)
+---------------------+
| SecondHighestSalary |
+---------------------+
| 200                 |
+---------------------+

Find Duplicate Emails from Employee Table?

+----+---------+
| Id | Email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+
SELECT Email
FROM Employee
GROUP BY Email
HAVING count(Email) > 1;
+---------+
| Email   |
+---------+
| a@b.com |
+---------+