Cohort analysis is used for getting insights about specific users (or another kind of actors) behaviour when they are grouped in cohorts by some criteria. This kind of analytics is performed with dataset represents a history of some events; this might be page views statistics or transactions (say, purchases) or user actions log. Typically cohort analysis is used for:
To perform cohort analysis on input you need to have a dataset with the following columns:
user_id
create_date
or registration_date
event_date
This article describes how to use SeekTable for cohort analysis and create user retention report (like in Google Analytics).
Online demo for this kind of report: user retention report sample.
Lets assume that our history data is represented by user_events.csv file. After file upload SeekTable automatically suggests "year" and "month" dimensions for date columns:
To perform cohort analysis we need to add one custom dimension to calculate number of months (this may be days or weeks) between event data and registration date (Cube → Edit Configuration). For this purpose "Expression" type of dimension can be used:
Type | Name | Label | Format | Parameters |
---|---|---|---|---|
Expression |
month_from_reg |
Months from Reg Date |
|
In dimension with type="Expression" 1-st parameter is a formula expression, and next N parameters are names of dimensions that are used as arguments of the formula. See also reference on calculated fields for more details on this topic.
Then let's add a measure that counts number of unique users inside groups:
Type | Parameters | Name | Label |
---|---|---|---|
CountUnique |
user_id |
CountUniqueOfUserID |
Count Unique Users |
Now we can configure user retention by month report:
You can use options from "Format" tab to make your user retention report like in Google Analytics:
% of row
By Row
Grand Total
and Column Totals
Now you have basic cohort report; you can use another dimensions on columns to change cohort criteria; also you can change the formula and user another period instead of month - say, a day or quarter.
In case of SQL database all steps are the same as for CSV, only difference is that instead of
CountUnique
measure type (which is available only for CSV cubes) you need to configure custom measure based on SQL COUNT(DISTINCT)
:
FirstValue
COUNT(DISTINCT user_id)
CountUniqueOfUserID
Count Unique Users
In the case of SQL data source it makes sense to calculate number of months (weeks, days) between an event and a registration date on SQL level,
without usage of Expression-type dimension for month_from_reg
. For example:
Field
month_from_reg
Months from Reg Date
DATEDIFF(month, event_date, reg_date)
TIMESTAMPDIFF(MONTH, event_date, reg_date)
(DATE_PART('year', reg_date::date) - DATE_PART('year', event_date::date))*12 + DATE_PART('month', reg_date::date) - DATE_PART('month', event_date::date)