CSV (or TSV) file is a simplest data source in SeekTable: it is enough to click on Upload Data, choose a local file, and wait until file is uploaded. That's all: now you can create summary report (pivot table, chart) or review/filter/sort CSV data rows with a grid-view ('flat table').
All parameters are detemined automatically:
sep=,
in the first line. (Split)
suffix
is configured automatically. If engine doesn't detect some column as multi-value (this may happen if first CSV rows doesn't contain multiple values for that column)
it is possible to setup (Split)
dimension explicitely.
CSV file may be compressed with zip or gzip, and this is only option if you want to upload large CSV file that is greater than upload limit (50Mb). Uncompressed CSV size might be up to 500Mb; SeekTable works fine with files of this size and pivot table generation should not take more than 5-10 seconds.
It is possible to define special expressions for custom handling of CSV values (parse timespans, apply replace rules etc).
To use a set of columns as values in reports an unpivot operation may be configured.
In some cases you might need to refresh saved pivot table reports by uploading a newer version of the dataset. This is possible in the following ways:
https://www.seektable.com/cube/feb3828c57474791a5ee92b83d4195e0In this case cube ID is
feb3828c57474791a5ee 92b83d4195e0
.
mydata.csv
→ feb3828c57474791a5ee 92b83d4195e0.csv
mydata.zip
→ feb3828c57474791a5ee 92b83d4195e0.zip
(if CSV file is zipped)
curl -k -F "file=@sales.csv" -H "Authorization: YOUR_SEEKTABLE_ACCOUNT_API_KEY" \ https://www.seektable.com/api/cube/import/csv?cubeId=CUBE_IDwhere sales.csv is a path to the local CSV file, YOUR_SEEKTABLE_ACCOUNT_API_KEY is an API key value from "Manage Account" → "Get API Key", CUBE_ID is GUID of the cube to refresh.
You can use curl to refresh reports on a schedule with help of Task Scheduler (on Windows) or cron (on Linux or MacOS).
Note: data refresh is possible only if new CSV file has the same columns as in old version of CSV file (with the same names, new columns are OK).
It is possible to define expression-based (calculated) dimensions that are evaluated before aggregation step (on row-level). For example, if you have date column (say, "Some Date") it is possible to calculate quarter or day-of-week in the following way:
Field
Date.Quarter(Column["Some Date"])
(quarter number 1-4)Column["Some Date"]!=null ? Column["Some Date"].DayOfWeek : null
(day-of-week number 0-6) + Format={0:ddd}
(0=Sun, 1=Mon, 2=Tue etc) or {0:dddd}
(0=Sunday, 1=Monday, 2=Tuesday etc)Date.Week(Column["Some Date"])
(week-of-year number)Parameter["param_name"]
; in particular, it is possible to define
a special dimenion for dynamic grouping criteria that depends on the user's choice, for example:
Parameter["dimension_name_param"]!=null ? Column[ Parameter["dimension_name_param"] ] : "-- choose a dimension --"
Column
is a special variable that provides access to the context CSV row values; you can use column name directly
if it doesn't contain spaces or special symbols - for example, SomeDate
. More details about expression syntax you can found
here.
Count
: the number of aggregated rows.Sum
: the total sum of a numeric column.Average
: the average value of a numeric column.Min
: the minimal value of a column.Max
: the maximum value of a column.Count Unique
: the number of unique or distinct values of a column.Quantile/Median
: 2-quantile value (median) for the a numeric column.Mode
: the value that appears most often.Variance
: the variance Var(X) of a numeric column. It is possible to calculate sample variance / standard deviartion / sample standard deviation by specifying 2-nd parameter for this measure type.Expression
: measure defined as formula calculated by SeekTable.Count
: no parameters needed.Sum
/Average
/Min
/Max
/Count Unique
/Mode
: name of CSV column to aggregate.Quantile/Median
: first parameter is name of CSV column; second parameter is optional and represents percentile value (default value is 0.5 which gives median).Variance
: first parameter is name of CSV column; second parameter is optional and can be: Variance, SampleVariance, StandardDeviation, SampleStandardDeviation (if not specified "Variance" is assumed).Expression
: first parameter is an expression, and next parameters are names of measures used as arguments in the expression.
{0}
can be used).
Examples:
{0:$#.##}
→ format number as $10.25 (or empty if no value){0:0,.0#}k
→ show number in thousands with "k" suffix{0:0.#|k}
→ if number>1000 shorten it with "k" suffix{0:0,,.0#}M
→ show number in millions with "M" suffix{0:0.#|M}
→ if number>1000000 shorten it with "M" suffix{0:0.#|kMB}
→ shorten large number with appropriate "k"/"M"/"B" suffixYou can define report parameters to specify report-specific filtering condition by CSV column(s) in Filter Expression:
String
: text-based value.Int32
: 32-bit integer (max value is 2,147,483,647).Int64
: 64-bit integer (max value is 9,223,372,036,854,775,807).Decimal
: Fixed-point number with max 28 significant digits. Decimal point is '.' character.DateTime
: date or datetime value (in case of date this will be datetime value with 0:00:00 time). Date value should be specified as string in YYYY-MM-DD format.Boolean
: accepts only 'True' or 'False' value.Parameter["param_name"]
."%"+Parameter["param_name"]+"%"
(wrap with '%' for 'like' condition).When parameter is defined it can be used in Filter Expression as following:
"1"="1" @paramName[ and "csv_column_name":field="{0}":var ]
In this sample parameter name is paramName
and CSV column to filter is csv_column_name
.
Always-true condition "1"="1"
is used to simplify conditions conjunction (it is enough to start parameter template with and
).
Parameter placeholder syntax:
@
paramName
[ ]
@paramName[ expression_when_defined ; expression_when_NOT_defined ]
"{0}":var
{0}
inserts the parameter name → that gives "paramName":var
which refers to the parameter value.Filter Expression syntax:
"csv_column_name":field <condition> "value"[:datatype]
<
, <=
, >
, >=
, =
, !=
, like
, in
.
'Like' works like in SQL: prepend/append '%' for 'contains' (you can add '%' automatically with help of parameter's Expression)."Test"
, or parameter name with a 'var' datatype: "param_name":var
string
, int32
, int64
, decimal
, double
, datetime
C1 and C2
, C1 or C2
, C1 or (C2 and C3)
In most cases SeekTable detects a multi-value column automatically and adds a special <column_name> (Split)
dimension.
If this detection fails you still can manually configure multi-value handing for the column in this way:
{"Multivalue":true, "MultivalueSeparator":";"}
<column_name> (Split)
(exactly one space should be between a column name and the suffix).
CSV files don't specify data types for columns and SeekTable have to determine these data types automatically (by analyzing first N rows of the dataset). In most cases this autodetection works well, however in some cases column's data type may be incorrectly detected:
String
type, but in fact it may contain numbers or dates.Integer
(so all non-numbers are ignored), but in fact this should be String
.To handle that you may explicitely define column's data type in this way:
{"ValueTypeCode":"String"}Another type codes that may be specified:
Int32
, Decimal
, Double
, DateTime
, Boolean
.
If your data is not in CSV format you can easily convert it to CSV using (free) command line tools like csvkit or xsv. For example:
in2csv data.json > data.csv
.in2csv data.xls > data.csv
.Alternatively, you may query Excel/JSON/multiple CSVs directly with the help of built-in DuckDB engine (self-hosted SeekTable only).
In case if you want to create a pivot table by a CSV file that exceeds upload limit (even when zipped) you still can do that in one of the following ways: