You can configure a 'live' connection to your SQL database and use it as a data source for operational reports (pivot tables, charts, flat tables) .
Data is not imported: SeekTable executes aggregate queries SELECT .. GROUP BY
on-the-fly (direct query) to load only necessary data for the concrete report.
There are no limitations on the dataset size; however, your database should be able to execute aggregate queries fast enough: ideally in seconds (up to 2 mins max). In case of moderate size fact tables - up to millions of rows - most popular DBs can do that. For large tables a dataset size can be limited by applying WHERE conditons on indexed columns (see parameters setup section below), or by usage of pre-aggregated tables / materialized views. For a real-time big data analytics (billions of rows, TBs in size) you can consider to use specialized columnar databases (data warehouses):
SQL-based cubes are defined with a fact table; for flexibility purposes it is configured as a Select Query template, so this could be a resultset returned by a view/function/complex SQL statement. Cube dimensions are mapped to columns (or SQL-expressions), and measures are SQL aggregate functions.
For summary reports (pivot tables/charts) SeekTable ROLAP engine generates GROUP BY
queries in this way:
SELECT * FROM facts
column1
and column2
dimensions on Rows/Columns and Sum of column3
measure on Values
SeekTable executes this query under the hood:
SELECT column1, column2, SUM(column3) FROM facts GROUP BY column1, column2
SELECT *
generated query will be:
SELECT column1, column2, SUM(column3) FROM (<Select Query>) t GROUP BY column1, column2
A star-schema is fully supported: dimensions may be resolved with conditional JOINs, also JOINs may be inside Select Query and it is possible to include them into the final query depending on the report configuration (only when JOIN is needed).
SELECT * FROM some_table_or_dataviewYou can specify a complex SQL query here (with JOINs, WHERE).
SELECT * FROM
if possible (in this case SeekTable will not wrap it with the outer SELECT
and *
will be replaced with actual columns needed for the report)GROUP BY
unless you really want to work with pre-aggregated datasetORDER BY
, it will be ignored anywayt
table alias, it is reserved for the outer SELECT
generated by the engine (when needed)If everything is fine you should see a new cube dashboard with the list of available dimensions and measures.
In case of connection error you'll see an orange box with an error message; you may click on "Edit Configuration" and apply necessary changes.
Field
: dimension name refers to table column or result of SQL expression (can be provided as the first "Parameters" value).Expression
: dimension is defined as calculated field with custom formula that uses another dimensions as arguments (formula and arguments should be specified in "Parameters").
Field
this is column name specifier (possibly with table alias prefix).{0}
can be used).
Examples:
prefix {0} suffix
→ append custom prefix and/or suffix{0:yyyy-MM-dd}
→ format date (or timestamp) as 2017-05-25{0:MMM}
→ format month number (1-12) as a short month name (Jan, Feb etc){0:MMMM}
→ format month number (1-12) as a full month name (January, February etc){0:ddd}
→ format day-of-week number (0-6) as a short day-of-week name (0=Sun, 1=Mon, 2=Tue etc){0:dddd}
→ format day-of-week number (0-6) as a full day-of-week name (0=Sunday, 1=Monday, 2=Tuesday etc)Field
: you can specify a custom SQL expression for this dimension (1-st parameter), or dimension's ID column when dimension name refers to a column from "Conditional JOIN rule". Report parameters may be used in this SQL, in this way you can have a dimension that depends on the user-entered value(s).
(<expr>)
until you want to force
special SQL generation mode which uses GROUP BY <ordinal>
syntax
for this particular dimension (this convention works only if DB supports this syntax).
Expression
: you can specify formula expression (1-st parameter) and dimension names for the arguments (2-nd, 3-rd etc parameter).
Count
: the number of aggregated rows.Sum
: the total sum of a numeric column.Average
: the average value of a numeric column.Min
: the minimal value of a column.Max
: the maximum value of a column.FirstValue
: custom SQL aggregate expression; for example: 'COUNT(DISTINCT some_column)'.Expression
: measure is defined as a formula calculated by SeekTable.Count
: no parameters needed.Sum
/Average
/Min
/Max
: a column name to aggregate (this can be a custom SQL expression).FirstValue
: the first parameter is a custom SQL aggregate expression.Expression
: the first parameter is an expression, and next parameters are names of measures used as arguments in the expression.
{0}
can be used).
Examples:
{0:$#.##}
→ format number as $10.25 (or empty if no value){0:0,.0#}k
→ show number in thousands with "k" suffix{0:0.#|k}
→ if number>1000 shorten it with "k" suffix{0:0,,.0#}M
→ show number in millions with "M" suffix{0:0.#|M}
→ if number>1000000 shorten it with "M" suffix{0:0.#|kMB}
→ shorten large number with appropriate "k"/"M"/"B" suffixReport parameters are used when you need to declare user-defined variable and use it in the SQL template as you want; typical usage is SQL query filtering with WHERE conditions.
String
: text-based value.Int32
: 32-bit integer (max value is 2,147,483,647).Int64
: 64-bit integer (max value is 9,223,372,036,854,775,807).Decimal
: Fixed-point number with max 28 significant digits. Decimal point is '.' character.DateTime
: datetime or date value (in this case datetime value has 0:00:00 time). Date value should be specified as string in YYYY-MM-DD format.Boolean
: accepts only 'True' or 'False' value.Parameter["param_name"]
.When parameter is defined it can be used in Select Query as following:
SELECT * FROM orders o WHERE 1=1 @orderDate[ and o.orderDate>={0} ]
Parameter's placeholder syntax:
@
orderDate
[ ]
@paramName[ expression_when_defined ; expression_when_NOT_defined ]
{0}
Sql.Raw
function).Notes:
SELECT * FROM orders WHERE 1=1 @countries[ and o.country IN ({0}) ]
;
→ ;;
or \;
]
→ ]]
or \]
{
→ {{
or \{
}
→ }}
or \}
\
→ \\
@
→ \@
@table_join_enabled[ LEFT JOIN table a ON (a.id=table_id @column_param[and a.column={0}] ) ]
Data in SQL databases often is organized with star schema: when main facts table contains only numbers and foreign keys, and actual values are stored separately in dimension tables. To resolve these lookups these dimension tables should be joined in the cube's Select Query, for example:
SELECT o.*, c.name as country_name FROM order o LEFT JOIN countries c ON (c.country_id=o.country_id)
This works fine if you have relatively small number of rows in the 'orders' table and everything works very fast. But what if you have millions of rows?.. Every JOIN adds signficant overhead and excessive database load.
Fortunately, solution exists: JOINs may be applied after facts table aggregation, and only when they are needed for the concrete report. This is possible with Conditional JOINs for lookups setup, for example:
SELECT * from orders o
JOIN SQL | Apply For Dimensions |
---|---|
LEFT JOIN countries c ON (c.country_id=t.country_id)
Important: always use t. prefix to refer a column from facts table, no matter what you've specified in the "Select Query". |
c.country_name Names of one or more dimensions that require this JOIN to resolve its value. |
Type | Name | Parameters |
---|---|---|
Field |
c.country_name Refers to the column in joined table. |
o.country_id Refers to the column with foreign key in facts table. |
To get totals for measures defined with a custom SQL expression you can check Use "GROUP BY CUBE" option if
your DB supports this SQL syntax (MySql doesn't) and columns used as dimensions are not nullable.
For "GROUP BY CUBE" query database adds additional rows for subtotals where NULL
values are used for grouping columns.
If your columns have NULL
values
you need to use another value instead; to do that you may define a custom SQL for the dimension, for example:
COALESCE(column,'')
(supported in all databaseses)ISNULL(column,'')
(SQL Server)IFNULL(column,'')
(BigQuery, Snowflake)One more limitation is usage of pivot table's Filter: by default it is applied in-memory and since SeekTable engine cannot calculate "FirstValue" totals you still can get empty totals when this kind of filtering is used. To avoid that you can:
year=2019
name:"John"
) is translated only when Translate LIKE conditions is enabledWhen a measure is calculated with a custom SQL, the engine cannot perform roll-ups and this leads to some limitations in "Expression"-type dimensions usage. In particular, dimension's formula should produce a 1:1 projection - in other words, unique set of arguments should lead to an unique expression evaluation result (otherwise report can display zeros/empty cells).
Range grouping is a technique used in pivot tables to group numerical data into ranges. This is useful when you want to summarize large amounts of data into smaller, more meaningful categories.
For example, let's say you have a dataset containing the ages of customers. You could use range grouping to group the ages into categories such as 18-24, 25-34, 35-44, and so on.
To configure range grouping in SeekTable you need to add a new dimension (say, "Age (ranges)") and define it as an SQL expression that calculates groups according to your requirements:
age_ranges
Field
case when age between 0 and 18 then '0-18' when age between 19 and 60 then '19-60' else '60+' end
In a similar way you can define custom groups even for non-number columns (collapse several values into a one special group).