Skip to main content

Excel Find and Remove Duplicates

https://www.myonlinetraininghub.com/excel-find-and-remove-duplicates

Highlight Duplicates with Conditional Formatting

Conditional Formatting can quickly highlight duplicates in a column. Simply select the column of cells containing the suspected duplicates > Home tab > Conditional Formatting > Highlight Cells Rules > Duplicate Values:
highlight duplicates with conditional formatting
Tip: You can change the format by clicking the drop down for ‘Values with’ (see image above).
Once the formatting is applied you can use filters (Data tab > Filters), based on the cell fill color or font color to display or hide the duplicate values:
Excel Find and Remove Duplicates
Pros: Great for visually highlighting duplicates in a column while retaining them in the dataset. You can use filters to hide duplicates or focus on them.
Cons: Duplicates remain in the dataset, and that may be exactly what you want, but if you just want to get rid of them, then keep reading.
This method also doesn’t highlight the row and only identifies duplicates in a single column.

Highlight Duplicate Rows with a Conditional Formatting Formula

Let’s say you want to highlight rows that contain duplicates across a row. For example, rows 9 and 11 have the same Date and ID:
duplicate rows
For this we need to apply the conditional formatting using a formula:
apply conditional formatting
Pros: Highlights the whole row and takes into consideration more than one column. Filters can be used to hide duplicates from view.
Cons: Formula can be difficult to remember. Duplicates remain in the dataset.

Identify Duplicates with a Formula

You can add a column to your data table to tag rows containing duplicates. The formula below is looking for duplicate rows, i.e. where both the Date and ID values are duplicated:
identify duplicates with a formula
The formula in cell C7:
=IF(COUNTIFS($A$7:$A$11,$A7,$B$7:$B$11,$B7)>1,"Duplicate","")
uses COUNTIFS to check both the Date and ID columns are the same, if the count is greater than 1then ‘Duplicate’ is returnedotherwise the cell is left blank.
Tip: If you only want to check a single column, let’s say the ID column, then you could use the COUNTIF formula like so:
using COUNTIF to identify duplicates
Pros: Column containing duplicate tag can be used in PivotTables or other formulas to ignore or focus on duplicate rows.
Cons: Formula can be difficult to remember. Requires an extra column in your dataset. Could be cumbersome in large files.

Remove Duplicate Values

We’ve looked at highlighting or tagging cells or rows containing duplicates, but sometimes you want to remove duplicates so you have a unique list of values. There are a few ways to tackle this.
Let’s say we want to remove duplicate rows from the table below i.e. we want to retain row 7 with ‘Produce’ and ‘Richard’, but we want to remove one of the duplicate rows (9 or 11) containing ‘Produce’ and ‘Rachel’:
remove duplicate values
We can use the Remove Duplicates tool on the Data tab of the ribbon:
Remove Duplicates tool
By selecting both the Department and Name columns I’m telling Excel that I want it to find duplicates where the values in both columns are the same. Note that I also have the ‘My data has headers’ box checked so it ignores my headers.
And I’m left with a list of unique rows:
remove duplicate rows
Pros: Quick and easy to use.
Cons: Removal of duplicates is permanent. If your data gets updated then you need to run the Remove Duplicates process again.

Power Query Remove Duplicates

Power Query (available in Excel 2010 onwards), also has a Remove Duplicates tool.
Format your data in an Excel Table then load the data into Power Query:
Excel 2010 & 2013: Power Query tab > From Table:
load data in Power Query 2010 and 2013
Excel 2016: Data tab > Get & Transform group: From Table:
load data in Power Query 2016
This will load the data into Power Query and open the Power Query Editor window. In the Power Query Editor simply select the columns you want it to find duplicates for (hold Ctrl to select multiple columns, or Shift to select contiguous columns) > Home tab > Remove Rows > Remove Duplicates:
remove duplicates
Pros: The great thing about using Power Query is if your source data gets updated you can Refresh the query and it will remove duplicates again, with just the click of a button. Original data remains intact, plus you have a new view of the data that excludes the duplicates.
Cons: Requires a few more steps than the previous example. Retaining original data may make the file unnecessarily large. If so, the original data can be stored in a separate file.
*Versions of Excel supporting Power Query. Download Power Query here.

Remove Duplicates with Advanced Filter

Advanced Filter can extract a list of unique items from a column or columns. First select the data, then Data tab > Advanced:
remove duplicates with advanced filter
In the Advanced filter dialog box (image above) choose to copy the list to another location (4 & 5), and check the box for ‘Unique records only’. And voila, we now have two lists, the original, and the list excluding duplicates in columns E & F:
unique records only
Pros: Reasonably easy to use. Also has an option to just filter the list to hide duplicates. Can handle multiple columns of data.
Cons: No link is maintained between the original data and the filtered data. If the original data gets updated then the Advanced Filter must be run again.

Identify Duplicates with PivotTables

A PivotTable is an excellent way to quickly identify if you have any duplicates in a column.
Place the field you want to check for duplicates in both the Rows and Values areas, in my case it’s the Name field. The PivotTable gives you a list of unique names and the count:
identify duplicates with PivotTables
Tips: sort the PivotTable Count column in descending order to bring the duplicates to the top; right-click a cell in the values area > Sort > Sort Largest to Smallest:
Sort the PivotTable Count
Or filter the Count column to only show records greater than 1:
filter the Count column
Pros: Quick and easy to do and great for large datasets because you can sort the count in descending order to bring any duplicates to the top, or filter to only show duplicates. The PivotTable also provides the count of an item so you can see how many times it is duplicated.
Cons: Doesn’t remove duplicates, only highlights them.
So, there you have 6 ways to identify or remove duplicates. Depending on my needs I like to use Power Query to remove duplicates, Conditional Formatting to visually indicate duplicate records and PivotTables to identify if large datasets contain duplicates.

Related Lessons

Comments