SQL: ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE() Ranking functions

Introduction: One of the most obvious and useful set of window functions available in SQL Server are ranking functions where rows from result set are ranked according to a certain scheme. 

In previous articles i have explained How to Update table data using inner join in sql server and CTE recursive query to get employee manager hierarchy with level and  Query to search any text in all stored procedures,views and functions and Query to get upcoming birthdays within week and Without primary key column update first or last n records in table

Ranking functions return a numeric ranking value for each row in a partition.  Some rows might receive the same value as other rows depending on the ranking function used. So, 
Ranking functions are non-deterministic.

There are four ranking functions available in Sql:
1) ROW_NUMBER()
2) RANK()
3) DENSE_RANK()
4) NTILE()

These functions are having some similarities and significant difference.

Similarity:

1) Should have OVER clause and ORDER BY clause inside the OVER clause. PARTITION BY clause is optional.

2) Can have PARTITION BY clause inside the OVER clause. When combined with a PARTITION BY clause, all of these functions reset the returned integer value to 1.

3) All of them return an increasing integer with a base value of 1.

4) ROW_NUMBER(), RANK() and DENSE_RANK() functions return the same output If there are no duplicate values in the column used by the ORDER BY clause.

Differences       

Differences will be described with practical examples mentioned in this article.

ROW_NUMBER:  Assigns unique number starting at 1 for every row, even if one or more rows have the same value.

ROW_NUMBER can be used with or without PARTITION BY clause:

ROW_NUMBER() Function without PARTITION BY clause: If we want to add serial number/row number to each row of the result set regardless they are same or different.

ROW_NUMBER() Function with PARTITION BY clause: If we want to add serial number/row number to each group, and reset it for every group.

Similarly RANK(), DENSE_RANK() and NTILE() ranking functions can also be used with or without PARTITION BY clause.

RANK: It ranks each row in the result set based on the mentioned column in Over clause of query. If values in the ranking column are the same, they receive the same rank. However, the next number in the ranking sequence is skipped creating a gap in between the groups.

Example: If same record is repeated 3 times and has rank "3", the next rank will be 3+3=6 i.e. there will be a gap in between the records sequence.

DENSE_RANK: This function is similar to Rank with only 1 difference; this will not leave gaps between groups if there is a tie between the ranks of the preceding records. The next number in the ranking sequence is then used to rank the row or rows that follow.

Example: If same record is repeated 3 times and has rank "3", the next rank will be 4 i.e. no gaps in between the record sequence.

NTILE: It divides the result set into the number of groups specified as an argument to the function. If we pass 3 as an argument to this function then it will divide the result set into 3 groups.

That division is based on the total number of rows in the result set divided by the number in the argument. NTILE is non-deterministic.

If the total number of rows in result set is divisible by the number of groups, the rows will be evenly distributed among the groups. For example, if the total number of rows is 12, and there are 3 groups, each group will contain 4 rows.

But as in our example below if there are 11 rows in result set and we pass 3 as an argument to this function then it will break the result set into 3 groups. First and second group will have 4 records each and last group will have remaining 3 records. First group records will have rank 1, second group records will have rank 2 and third group records will have rank 3.

Though all these are ranking functions, the difference between rank(), dense_rank(),  row_number() can be understood when we have ties on ranking i.e. duplicate records. For example, if we are ranking students by their marks then what would be the rank of two students scoring same marks?
It depends upon which ranking function we use.

Let's create a temporary table and insert some dummy data into it for demonstration purpose using the following script.

IF OBJECT_ID( 'tempdb..#tbStudent' ) IS NOT NULL
DROP TABLE #tbStudent;

GO
CREATE TABLE #tbStudent
(
            StudentId        INT IDENTITY(1,1) PRIMARY KEY NOT NULL ,
            StudentName VARCHAR(50),
            Section            CHAR(1),
            Marks              INT,
);

INSERT INTO #tbStudent
VALUES
('Rohan', 'A',90),
('Priya', 'A',45),
('Aasif', 'A',90),
('Vivek', 'B',58),
('Arjun', 'B',40),
('Sidharth', 'A',100),
('Aalia', 'A',95),
('Kabir', 'B',90),
('Robin', 'B',40),
('Rahul', 'A',30),
('Anju', 'B',32);

--View table records
SELECT * FROM #tbStudent

Output will be as: 
StudentId
StudentName
Section
Marks
1
Rohan
A
90
2
Priya
A
45
3
Aasif
A
90
4
Vivek
B
58
5
Arjun
B
40
6
Sidharth
A
100
7
Aalia
A
95
8
Kabir
B
90
9
Robin
B
40
10
Rahul
A
30
11
Anju
B
32

As can see three students (Rohan, Aasif and Kabir) scored same marks i.e. 90 and (Arjun and Robin) scored same 40 marks.

Now let’s rank them using all ranking functions one by one and see the difference.

ROW_NUMBER() without Partition By clause
It always generates a unique ranking even with duplicate records. That's why it is useful to solve problems like second or nth highest marks/salary among students/employees etc.

In the following example, we have three students (Rohan, Aasif and Kabir) and two student (Arjun and Robin) with the same marks and even though we have generated row numbers over marks column it produces different row number for these students with the same marks i.e. 11 unique numbers in sequence for 11 records.

SELECT StudentId, StudentName, Section, Marks, ROW_NUMBER() OVER (ORDER BY Marks DESC) AS RowNumber FROM #tbStudent

Output will be as:
StudentId
StudentName
Section
Marks
RowNumber
6
Sidharth
A
100
1
7
Aalia
A
95
2
8
Kabir
B
90
3
1
Rohan
A
90
4
3
Aasif
A
90
5
4
Vivek
B
58
6
2
Priya
A
45
7
5
Arjun
B
40
8
9
Robin
B
40
9
11
Anju
B
32
10
10
Rahul
A
30
11

As we can see Rohan, Aasif and Kabir have same marks i.e. 90 but they got the different 3rd, 4th and 5th rank and Arjun and Robin scored same 40 marks and they got 8th and 9th rank. In case of a tie, ranks are assigned on a random basis (any of the students scoring same marks can appear before or after the other if we execute this query multiple times.)

ROW_NUMBER() with PARTITION BY clause:

SELECT StudentId, StudentName, Section, Marks, ROW_NUMBER() OVER (PARTITION BY SECTION ORDER BY Marks DESC) AS RowNumber FROM #tbStudent

Output will be as:
StudentId
StudentName
Section
Marks
RowNumber
6
Sidharth
A
100
1
7
Aalia
A
95
2
1
Rohan
A
90
3
3
Aasif
A
90
4
2
Priya
A
45
5
10
Rahul
A
30
6
8
Kabir
B
90
1
4
Vivek
B
58
2
5
Arjun
B
40
3
9
Robin
B
40
4
11
Anju
B
32
5

Here we have partitioned the records based on student Sections. So it generated row number 1 to 6 for first group i.e. for Section A, and reset it for next group and generated row number 1 to 5 for Section B.

Similarly RANK(), DENSE_RANK() and NTILE() ranking functions can also be used with PARTITION BY clause. You can give them a try.

RANK()
The rank() function assigns same rank to same values which are not distinguishable by ORDER BY. Also, the next different rank does not start from immediate next number but there is a gap i.e. if 3rd, 4th and 5th student have the same marks then they will have same rank 3 as in our example, and the next student Vivek which has different marks will have new rank 6.

Here is the example to clarify the point.

SELECT StudentId, StudentName, Section, Marks, RANK() OVER (ORDER BY Marks DESC) AS [Rank] FROM #tbStudent

Output will be as:
StudentId
StudentName
Section
Marks
Rank
6
Sidharth
A
100
1
7
Aalia
A
95
2
8
Kabir
B
90
3
1
Rohan
A
90
3
3
Aasif
A
90
3
4
Vivek
B
58
6
2
Priya
A
45
7
5
Arjun
B
40
8
9
Robin
B
40
8
11
Anju
B
32
10
10
Rahul
A
30
11

Rohan, Aasif and Kabir got the same rank 3, but the next student Vivek got the rank 6 , instead of 4 because it keeps original ordering similar to ROW_NUMBER function. Similarly Arjun and Robin got the same rank 8 and next student Anju got rank 10 instead of 9.

DENSE_RANK()

The dense_rank() function is similar to rank() window function i.e. same values will be assigned the same rank, but the next different value will have rank which is just one more than the previous rank i.e. if 3rd, 4th and 5th students (Rohan, Aasif and Kabir) has the same marks then they will have same rank 3 but 6th student Vivek, which has different marks will have rank 4, unlike rank 6 as in the case with rank() function.  Similarly 8th and 9th students Arjun and Robin got the rank 6 and next student Anju got rank 7 instead of 10 as in the case with rank() function.

There will be no gap on ranking in case of dense_rank() as shown in the following example:

SELECT StudentId, StudentName, Section, Marks, DENSE_RANK() OVER (ORDER BY Marks DESC) AS DenseRank FROM #tbStudent

Output will be as:
StudentId
StudentName
Section
Marks
DenseRank
6
Sidharth
A
100
1
7
Aalia
A
95
2
8
Kabir
B
90
3
1
Rohan
A
90
3
3
Aasif
A
90
3
4
Vivek
B
58
4
2
Priya
A
45
5
5
Arjun
B
40
6
9
Robin
B
40
6
11
Anju
B
32
7
10
Rahul
A
30
8
  
NTILE()

It divides the result set into the number of groups specified as an argument to the function. If we pass 3 as an argument to this function then it will divide the result set into 3 groups.
That division is based on the total number of rows in the result set divided by the number in the argument.

Here is the example to clarify the point.

SELECT StudentId, StudentName, Section, Marks, NTILE(3) OVER(ORDER BY Marks DESC) AS [NTILE] FROM #tbStudent

Output will be as:
StudentId
StudentName
Section
Marks
NTILE
6
Sidharth
A
100
1
7
Aalia
A
95
1
8
Kabir
B
90
1
1
Rohan
A
90
1
3
Aasif
A
90
2
4
Vivek
B
58
2
2
Priya
A
45
2
5
Arjun
B
40
2
9
Robin
B
40
3
11
Anju
B
32
3
10
Rahul
A
30
3

There are 11 rows in result set and we passed 3 as an argument to this function, it divided the result set into 3 groups. First and second group have 4 records and third group have remaining 3 records.

Difference between ROW_NUMBER() vs RANK() vs DENSE_RANK() and NTILE()

As demonstrated and explained, the difference between rank, row_number, and dense_rank can be noticed when there are duplicate records. Since in all of the above examples we are ranking records based on marks, if more than one student will have the same marks then the difference between these three ranking functions can be noticed.

The row_number gives continuous numbers, while rank and dense_rank gives the same rank for duplicates, but the next number in rank is as per continuous order so you will see a jump. but dense_rank doesn't have any gap in rankings. ntile divides the result into specified groups and a group number is then assigned to each row identifying which group the row belongs to.

And here is the query which clearly shows the difference in the ranking functions:

SELECT
StudentId, StudentName, Section,Marks,
ROW_NUMBER() OVER (ORDER BY Marks DESC) AS RowNumber,
RANK() OVER (ORDER BY Marks DESC) AS [Rank],
DENSE_RANK() OVER (ORDER BY Marks DESC) AS DenseRank,
NTILE(3) OVER(ORDER BY Marks DESC) AS [NTILE]
FROM #tbStudent

Output will be as:
StudentId
StudentName
Section
Marks
RowNumber
Rank
DenseRank
NTILE
6
Sidharth
A
100
1
1
1
1
7
Aalia
A
95
2
2
2
1
8
Kabir
B
90
3
3
3
1
1
Rohan
A
90
4
3
3
1
3
Aasif
A
90
5
3
3
2
4
Vivek
B
58
6
6
4
2
2
Priya
A
45
7
7
5
2
5
Arjun
B
40
8
8
6
2
9
Robin
B
40
9
8
6
3
11
Anju
B
32
10
10
7
3
10
Rahul
A
30
11
11
8
3

As we can see In case of tie :

ROW_NUMBER() functions assigned unique row numbers to each row even for records with same marks i.e. 11 unique numbers in sequence for 11 records.

RANK() function assigned same rank for same marks, but the next different rank is not in sequence, there is a gap. Here in above example Rohan, Aasif and Kabir got rank 3 and Vivek got rank 6 since rank 3 repeated 3 times. Similarly Arjun and Robin got the same rank 8 and next student Anju got rank 10 instead of 9.

DENSE_RANK() also assigned same rank for same marks but there is no gap in between the sequence. The next different is in sequence. Here in above example Rohan, Aasif and Kabir got rank 3 and next student Vivek got rank 4. Similarly Arjun and Robin got the rank 6 and next student Anju got rank 7 instead of 10.

NTILE(3) Divided the result set into 3 groups.

If we use PARTITION BY with all ranking functions then the output will be as:

SELECT StudentId, StudentName,Section, Marks,
ROW_NUMBER() OVER (PARTITION BY Section ORDER BY Marks DESC) AS [row number with partition],
RANK() OVER (PARTITION BY Section ORDER BY Marks DESC) AS [rank with partiton],
DENSE_RANK() OVER (PARTITION BY Section ORDER BY Marks DESC) AS [dense rank with partition] ,
NTILE(3) OVER(PARTITION BY Section ORDER BY Marks DESC) AS [ntile with partiton]
FROM #tbStudent

Output will be as:
StudentId
StudentName
Section
Marks
RowNumber
Rank
DenseRank
NTILE
6
Sidharth
A
100
1
1
1
1
7
Aalia
A
95
2
2
2
1
1
Rohan
A
90
3
3
3
2
3
Aasif
A
90
4
3
3
2
2
Priya
A
45
5
5
4
3
10
Rahul
A
30
6
6
5
3
8
Kabir
B
90
1
1
1
1
4
Vivek
B
58
2
2
2
1
5
Arjun
B
40
3
3
3
2
9
Robin
B
40
4
3
3
2
11
Anju
B
32
5
5
4
3

It works similar but for each partition i.e. for each group.(Section)

That's all about the difference between ROW_NUMBER(), RANK(), DENSE_RANK() and NTILE() function in SQL SERVER.

Now over to you:
A blog is nothing without reader's feedback and comments. So please provide your valuable feedback so that i can make this blog better and If you like my work; you can appreciate by leaving your comments, hitting Facebook like button, following on Google+, Twitter, Linkedin and Pinterest, stumbling my posts on stumble upon and subscribing for receiving free updates directly to your inbox . Stay tuned and stay connected for more technical updates.  
Previous
Next Post »

If you have any question about any post, Feel free to ask.You can simply drop a comment below post or contact via Contact Us form. Your feedback and suggestions will be highly appreciated. Also try to leave comments from your account not from the anonymous account so that i can respond to you easily..