Introduction
Instead of Extracting all the data every time you can just add new rows of data. Doing this will make your extracts run faster and sooner.
Requirements
In order to do an incremental extract you need to
- Not aggregate the extract (This adds a group and if you want to extract your can still do it in customSQL)
- Have a date, datetime, or number to use as the identifier of new rows.
Instructions
What happens when Tableau refreshes an incremental extract?
- It looks for the max identifier in the extract
- adds a line to the where clause so only records greater the identifier come back
NOTE: Tableau will round seconds to three decimal points. If you have a max datetime of
1-10-28 12:05:2.2345
and then a new row with1-10-28 12:05:2.2348
is added to the database it wil not be included in your incremental extract
NOTE: Making changes to your datasource (including sql) will trigger Tableau to re-generate the extract.
How does that change when I am using CustomSQL?
This works great if you drag and drop your tables and join them in Tableau. Sometimes you have to use CustomSQL so how can we build the SQL to accomidate incremental extracts.
How CustomSQL works in Tableau
The short answer is it wraps the SQL you paste in with what it needs to create the extract or render the viz.
SELECT
the fields we need --for live queries
fields in the sql that are not hidden in Tableau --for extracts
FROM
--start your sql
SELECT
stuff
FROM
THINGS
WHERE
1=1
--end your sql
WHERE
stuff in the filter pane --live
datasource filters --live
extract filters --extract
incremental_column > MAX(incremental_column_in_extract) --incremental extracts
GROUP BY
every dimesion --when extracts are aggregated or if it is a live query
How to optimize your SQL for incremental extracts
This is fine for the first run, but your sql might return 3 years of data ever time. Every subsquent time it will return it all just to have Tableau say "Thanks, but I only need these new rows."
WHERE
(trunc(sysdate) = TO_DATE('January 11, 2019, 12:00 A.M.','Month dd, YYYY, HH:MI A.M.') --Set to today when I create the extract today it will pull everything
OR
date_col = sysdate-1) --rows from yesterday
Common Issues
Understand how your database works. If a row is changed it might not get loaded if the date is the same.
In this case new transactions adjust the prior one. We could just sum the dollars and get 10 and incrementally extract based on the transaction date.
account | transaction-date | dollars | type |
---|---|---|---|
tom | 1/1/00 | 5 | charge |
tom | 1/2/00 | -5 | adjustment |
tom | 1/2/00 | 10 | charge |
In this case transactions are updated
Using the transaction date does not work because the row changes but not the date one day the row might look like:
account | transaction-date | dollars |
---|---|---|
tom | 1/1/00 | 5 |
then the next it might be changed and look like
account | transaction-date | dollars |
---|---|---|
tom | 1/1/00 | 10 |
An incremental based on the transaction-date wouldn't pick up the change so. Two ways around this:
- there is an update date field. This will end up with duplicate rows you will need to use a LOD in Tableau to get the row with the Max updated record
account | transaction-date | dollars | updated |
---|---|---|---|
tom | 1/1/00 | 5 | 1/1/00 |
tom | 1/1/00 | 10 | 1/2/00 |
- do a incremental and a full extract. The server will let you schedule both and I might do incrementals each day and then a full refresh on the weekend In this case some drift can happen and tom shows as $15 but the full refresh will clean things back up.
Cool Hacks: Snapshots!
- Use CustomSQL
- include the sysdate as a columm
- increment the extract on the sysdate
WARNING this will exponentially expand your extract but in a transactional system with no history you can answer some cool questions like how many tickets were open on a day? Also useful for snapshoting metrics for a period. Then you can compare This month to the last easily in Tableau.
In this simple example the sql returns the total amount of charges, payments, and adjustments for yesterday. Running it daily will overtime create a nice reporting table. 3 rows per day (charge/payment/adjustment).
select
t.POST_DATE
,tt.name
,sysdate --sysdate is here so we can use it for our incremental counter
,SUM(t.amount)
,count(distinct t.account_ID)
from TRANSACTIONS t
left outer join TRANSACTION_TYPE tt on t.tran_type=tt.tran_type
WHERE t.Post_Date = trunc(sysdate)-1
group by
t.post_date
,tt.name