This powerful step helps you group rows that have the same values into summary rows, such as "the number of customers in each country", or "the number of passengers per month or week".
Grouping by one column
1. Select the column you wish to group by.
2. Click 'execute'.
By default, the operation 'Count' will count the number of rows within each group and display the result in a new column.
3. If needed, rename the operation output column to an accurate description of the data.
You can add as many operations as needed by click the
+ADDbutton. Each operation's result will be returned in a new column.
Select the type of operation:
Count: Count the number of rows in each group.
Count unique: Count the number of unique values stored in a group. If the same value is found twice or more, it will return a count of
Sum: Returns the sum of all numerical values stored in a group.
Average (mean): Returns the average of all numerical values stored in a group. Learn more about Average (mean) values.
Median: Returns the median of all numerical values stored in a group. Learn more about Median values.
Min: Returns the smallest numerical value in each group
Max: Returns the largest numerical value in each group
First: Returns the value stored in the first row in each group
If needed, select the column that contains the values you want to use in the operation. For example, if you want to count the total of sales for each month, you will need to group by Month and select the column containing sales data to sum them up for each group.
Grouping by date
Make sure that the column you are grouping by is of the type 'Date'.
If you are working with a Timestamp column, you will also need to specify the timezone of the input data in order for Workbench to interpret dates accurately. You can do so in the same 'Convert timestamp to date' step.
Help with dates
If you need help figuring out the types of columns you want to group by, select the option 'Help with dates' (make sure columns you want to group by are selected) and press 'Play'. Workbench will display options to convert those columns in Date type.
Below are a few examples - If you need a primer on date types, please follow the Introduction to date types tutorial.
Example 1 - Converting types from Timestamp to Date
The column's type is Timestamp. Clicking 'Help with dates' provides a button to convert the column 'Created_at' from the type Timestamp to the type Date.
After converting the column to the type Date, a new step is added and both the unit and timezone can be edited.
Example 2 - Converting types from Text to date
The column's type is text. Clicking 'Help with dates' provides two buttons
One to convert the column 'Created_at' from the type Text to the type Date.
One to convert the column 'Created_at' from the type Text to the type Timestamp.
The format of the text in the column 'created_at' is a format used for dates. After converting the column from Text to Date, a new step is added and both the input timezone and output unit can be specified.
Example 3 - Converting types from Text to Timestamp, then to date.
During your data cleaning, you may encounter columns that have the type Text but are meant to be Timestamp. For example, if you upload a CSV file containing timestamps. This is the case below: The values look like timestamps, but the column type is Text.
In this case, the column needs to be converted to Timestamp first..
.. and then from Timestamp to Date, where the timezone and date unit can be specified.