A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Friday, July 28, 2017

Generate TSQL time slices

I had some log data I wanted to bucket into 15 second time slices and I figured if I have solved this once, I will need to do it again so to the blog machine! This will use the LEAD, TIMEFROMPARTS and ROW_NUMBER() to accomplish this.
SELECT
    D.Slice AS SliceStart
,   LEAD
    (
        D.Slice
    ,   1
        -- Default to midnight
    ,   TIMEFROMPARTS(0,0,0,0,0)
    )
    OVER (ORDER BY D.Slice) AS SliceStop
,   ROW_NUMBER() OVER (ORDER BY D.Slice) AS SliceLabel
FROM
(
    -- Generate 15 second time slices
    SELECT 
        TIMEFROMPARTS(A.rn, B.rn, C.rn, 0, 0) AS Slice
    FROM
        (SELECT TOP (24) -1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL)) FROM sys.all_objects AS AO) AS A(rn)
        CROSS APPLY (SELECT TOP (60) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) FROM sys.all_objects AS AO) AS B(rn)
        -- 4 values since we'll aggregate to 15 seconds
        CROSS APPLY (SELECT TOP (4) (-1 + ROW_NUMBER() OVER (ORDER BY(SELECT NULL))) * 15  FROM sys.all_objects AS AO) AS C(rn)
) D

That looks like a lot, but it really isn't. Starting from the first inner most query, we select the top 24 rows from sys.all_objects and use the ROW_NUMBER function to generate us a monotonically increasing set of values, thus 1...24. However, since the allowable range of hours is 0 to 23, I deduct one from this value (A). I repeat this pattern to generate minutes (B) except we get the top 60. Since I want 15 second intervals, for the seconds query, I only get the top 4 values. I deduct one so we have {0,1,2,3} and then multiply by 15 to get my increments (C). If you want different time slices, that's how I would modify this pattern.

Finally having 3 columns of numbers, I use TIMEFROMPARTS to build a time data type with the least amount of precision and present that as "Slice" and encapsulate that a derived table (D). Running that query gets me a list of periods but I don't know what the end period is.

We can calculate the end period by using the LEAD function. I present my original Slice as SliceStart. I then use the LEAD function to calculate the next (1) value based on the Slice column. In the case of 23:59:45, the "next" value in our data set is NULL. To address that scenario, we pass in a the default value for the lead function.

No comments: