Automatic Clustering 🚀
The clustering feature uses algorithms to automatically merge alternative representations of the same value.
1. Click the Cluster button to open the Cluster window.
2. Choose the algorithm suited to your cleaning task:
- Edit distance: Groups values together when the number of characters added, changed, or deleted to get from one value to the other is less than or equal to the Maximum distance parameter. For instance, the distance between "Cafés" and "cafe" is 3: change C to c, change é to e, and delete the s.
- Fingerprint: Groups values by matching their fingerprints. This algorithm is effective if your values are capitalized irregularly or if special characters are used in some and not others. For instance, "café" and "Cafe" have the same fingerprint, "cafe".
3. Once you have found the desired setting, edit the New value for each cluster to rename all the values it contains.
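To make the two algorithms more concrete, here is a minimal Python sketch. It is not the module's actual implementation: edit_distance is a standard Levenshtein distance, and fingerprint is a simplified assumption (trim, lowercase, strip accents) that is only meant to reproduce the examples above. The function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    additions, deletions, or changes needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # add cb
                prev[j - 1] + cost,  # change ca into cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def fingerprint(value: str) -> str:
    """Assumed, simplified fingerprint: trim, lowercase, strip accents."""
    decomposed = unicodedata.normalize("NFKD", value.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def cluster_by_fingerprint(values):
    """Group values that share a fingerprint; keep groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# "Caf\u00e9s" is "Cafés" written with an explicit escape for the accent.
max_distance = 3  # stands in for the Maximum distance parameter
print(edit_distance("Caf\u00e9s", "cafe") <= max_distance)
print(cluster_by_fingerprint(["caf\u00e9", "Cafe", "tea"]))
```

With a Maximum distance of 3, "Cafés" and "cafe" would land in the same cluster; with 2, they would not.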
Merge selected 💫
1. Select a column that contains irregular spellings or capitalizations of the same value, for example: Apple, Aple and APPLE. The module lists all unique values within the column.
2. Edit values in the module. As you do so, identical values are merged together, and all instances are instantly corrected in the table.
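The effect of such an edit on the table can be sketched in Python as follows. This is only an illustration under assumed data structures (rows as dictionaries, a hypothetical merge_values helper), not how the module works internally.

```python
def merge_values(rows, column, originals, new_value):
    """Return a copy of rows in which every occurrence of one of the
    original values in the given column is replaced by new_value."""
    originals = set(originals)
    return [
        {**row, column: new_value} if row[column] in originals else dict(row)
        for row in rows
    ]


# Hypothetical table with irregular spellings of the same value.
table = [
    {"fruit": "Apple"},
    {"fruit": "Aple"},
    {"fruit": "APPLE"},
    {"fruit": "Pear"},
]

merged = merge_values(table, "fruit", ["Aple", "APPLE"], "Apple")
print([row["fruit"] for row in merged])
```

Because the function returns new rows and leaves the input untouched, the original values remain available, which mirrors the ability to un-cluster a value later.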
After merging values, you can remove any of the original values from its cluster. The table then shows the unedited value again.
1. Open a cluster by clicking on the icon next to it.
2. Within the cluster, hover over the number of instances to the right of the value you wish to remove, and delete it. The value is now back in the main list and in the table.
Filtering out values
You can drop a value or a cluster from the table using the checkbox on its left. All rows containing the value (or all the values within a cluster!) will be dropped too.
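As a rough illustration of what dropping does to the table, here is a short Python sketch. The row format and the drop_rows helper are assumptions made for the example; a cluster is represented simply as the list of original values it contains.

```python
def drop_rows(rows, column, dropped_values):
    """Keep only the rows whose value in the column is not dropped."""
    dropped = set(dropped_values)
    return [row for row in rows if row[column] not in dropped]


# Hypothetical table; the "Apple" cluster contains three original values.
table = [
    {"fruit": "Apple"},
    {"fruit": "Aple"},
    {"fruit": "APPLE"},
    {"fruit": "Pear"},
]
apple_cluster = ["Apple", "Aple", "APPLE"]

# Dropping the cluster removes every row holding any of its values.
remaining = drop_rows(table, "fruit", apple_cluster)
print(remaining)
```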
The Refine module can be switched to Edit mode by clicking on the expand icon in its header.