A blog about SQL Server, SSIS, C# and whatever else I happen to be dealing with in my professional life.

Find ramblings

Monday, January 22, 2018

Staging Metadata Framework for the Unknown

Staging metadata framework for the unknown

That's a terrible title but it's the best I got. A client would like to report out of ServiceNow some metrics not readily available in the PowerBI App. The first time I connected, I got a quick look at the Incidents and some of the data we'd be interested in but I have no idea how that data changes over time. When you first open a ticket, maybe it doesn't have a resolved date or a caused by field populated. And since this is all web service stuff and you can customize it, I knew I was looking at lots of iterations to try and keep up with all the data coming back from the service. How can I handle this and keep sane? Those were my two goals. I thought it'd be fun to share how I solved the problem using features in SQL Server 2016.

To begin, I created a database called RTMA to perform my real time metrics analysis. CREATE DATABASE RTMA; With that done, I created a schema within my database like USE RTMA; GO CREATE SCHEMA ServiceNow AUTHORIZATION dbo; To begin, we need a table to hold our discovery metadata.

CREATE TABLE 
    ServiceNow.ColumnSizing
(
    EntityName varchar(30) NOT NULL
,   CollectionName varchar(30) NOT NULL
,   ColumnName varchar(30) NOT NULL
,   ColumnLength int NOT NULL
,   InsertDate datetime NOT NULL
    CONSTRAINT DF_ServiceNow_ColumnSizing_InsertDate DEFAULT (GETDATE())
);

CREATE CLUSTERED COLUMNSTORE INDEX
    CCI_ServiceNow_ColumnSizing
    ON ServiceNow.ColumnSizing;
The idea for this metadata table is that we'll just keep adding more information in for the entities we survey. All that matters is the largest length for a given combination of Entity, Collection, and Column.

In the following demo, we'll add 2 rows into our table. The first batch will be our initial sizing and then "something" happens and we discover the size has increased.

INSERT INTO
    ServiceNow.ColumnSizing
(
    EntityName
,   CollectionName
,   ColumnName
,   ColumnLength
,   InsertDate
)
VALUES
    ('DoesNotExist', 'records', 'ABC', 10, current_timestamp)
,   ('DoesNotExist', 'records', 'BCD', 30, current_timestamp);

Create a base table for our DoesNotExist. What columns will be available? I know I'll want my InsertDate and that's the only thing I'll guarantee to begin. And that's ok because we're going to get clever.

DECLARE @entity nvarchar(30) = N'DoesNotExist'
,   @Template nvarchar(max) = N'DROP TABLE IF EXISTS ServiceNow.Stage;
    CREATE TABLE
        ServiceNow.Stage
    (
    
    InsertDate datetime CONSTRAINT DF_ServiceNow_Stage_InsertDate DEFAULT (GETDATE())
    );
    CREATE CLUSTERED COLUMNSTORE INDEX
        CCI_ServiceNow_Stage
    ON
        ServiceNow.Stage;'
,   @Columns nvarchar(max) = N'';

DECLARE @Query nvarchar(max) = REPLACE(REPLACE(@Template, '', @Entity), '', @Columns);
EXECUTE sys.sp_executesql @Query, N'';

We now have a table with one column so let's look at using our synthetic metadata (ColumnSizing) to augment it. The important thing to understand in the next block of code is that we'll use FOR XML PATH('') to concatenate rows together and the CONCAT function to concatenate values together.

See more here for the XML PATH "trick"

If we're going to define columns for a table, it follows that we need to know what table needs what columns and what size those columns should be. So, let the following block be that definition.

DECLARE @Entity varchar(30) = 'DoesNotExist';

SELECT
    CS.EntityName
,   CS.CollectionName
,   CS.ColumnName
,   MAX(CS.ColumnLength) AS ColumnLength
FROM
    ServiceNow.ColumnSizing AS CS
WHERE
    CS.ColumnLength > 0
    AND CS.ColumnLength =  
    (
        SELECT
            MAX(CSI.ColumnLength) AS ColumnLength
        FROM
            ServiceNow.ColumnSizing AS CSI
        WHERE
            CSI.EntityName = CS.EntityName
            AND CSI.ColumnName = CS.ColumnName
    )
    AND CS.EntityName = @Entity
GROUP BY
    CS.EntityName
,   CS.CollectionName
,   CS.ColumnName;

We run the above query and that looks like what we want so into the FOR XML machine it goes.

DECLARE @Entity varchar(30) = 'DoesNotExist'
,   @ColumnSizeDeclaration varchar(max);

;WITH BASE_DATA AS
(
    -- Define the base data we'll use to drive creation
    SELECT
        CS.EntityName
    ,   CS.CollectionName
    ,   CS.ColumnName
    ,   MAX(CS.ColumnLength) AS ColumnLength
    FROM
        ServiceNow.ColumnSizing AS CS
    WHERE
        CS.ColumnLength > 0
        AND CS.ColumnLength =  
        (
            SELECT
                MAX(CSI.ColumnLength) AS ColumnLength
            FROM
                ServiceNow.ColumnSizing AS CSI
            WHERE
                CSI.EntityName = CS.EntityName
                AND CSI.ColumnName = CS.ColumnName
        )
        AND CS.EntityName = @Entity
    GROUP BY
        CS.EntityName
    ,   CS.CollectionName
    ,   CS.ColumnName
)
SELECT DISTINCT
    BD.EntityName
,   (
        SELECT
            CONCAT
            (
                ''
            ,   BDI.ColumnName
            ,   ' varchar('
            ,   BDI.ColumnLength
            ,   '),'
            ) 
        FROM
            BASE_DATA AS BDI
        WHERE
            BDI.EntityName = BD.EntityName
            AND BDI.CollectionName = BD.CollectionName
        FOR XML PATH('')
) AS ColumnSizeDeclaration
FROM
    BASE_DATA AS BD;

That looks like a lot, but it's not. Run it and you'll see we get one row with two elements: "DoesNotExist" and "ABC varchar(10),BCD varchar(30)," That trailing comma is going to be a problem, that's generally why you see people either a leading delimiter and use STUFF to remove it or in the case of a trailing delimiter LEFT with LEN -1 does the trick.

But we're clever and don't need such tricks. If you look at the declaration for @Template, we assume there will *always* be at final column of InsertDate which didn't have a comma preceding it. Always define the rules to favor yourself. ;)

Instead of the static table declaration we used, let's marry our common table expression, CTE, with the table template.

DECLARE @entity nvarchar(30) = N'DoesNotExist'
,   @Template nvarchar(max) = N'DROP TABLE IF EXISTS ServiceNow.Stage;
    CREATE TABLE
        ServiceNow.Stage
    (
    
    InsertDate datetime CONSTRAINT DF_ServiceNow_Stage_InsertDate DEFAULT (GETDATE())
    );
    CREATE CLUSTERED COLUMNSTORE INDEX
        CCI_ServiceNow_Stage
    ON
        ServiceNow.Stage;'
,   @Columns nvarchar(max) = N'';

-- CTE logic patched in here

;WITH BASE_DATA AS
(
    -- Define the base data we'll use to drive creation
    SELECT
        CS.EntityName
    ,   CS.CollectionName
    ,   CS.ColumnName
    ,   MAX(CS.ColumnLength) AS ColumnLength
    FROM
        ServiceNow.ColumnSizing AS CS
    WHERE
        CS.ColumnLength > 0
        AND CS.ColumnLength =  
        (
            SELECT
                MAX(CSI.ColumnLength) AS ColumnLength
            FROM
                ServiceNow.ColumnSizing AS CSI
            WHERE
                CSI.EntityName = CS.EntityName
                AND CSI.ColumnName = CS.ColumnName
        )
        AND CS.EntityName = @Entity
    GROUP BY
        CS.EntityName
    ,   CS.CollectionName
    ,   CS.ColumnName
)
SELECT DISTINCT
    @Columns = (
        SELECT
            CONCAT
            (
                ''
            ,   BDI.ColumnName
            ,   ' varchar('
            ,   BDI.ColumnLength
            ,   '),'
            ) 
        FROM
            BASE_DATA AS BDI
        WHERE
            BDI.EntityName = BD.EntityName
            AND BDI.CollectionName = BD.CollectionName
        FOR XML PATH('')
) 
FROM
    BASE_DATA AS BD;

DECLARE @Query nvarchar(max) = REPLACE(REPLACE(@Template, '', @Entity), '', @Columns);
EXECUTE sys.sp_executesql @Query, N'';

Bam, look at it now. We took advantage of the new DROP IF EXISTS (DIE) syntax to drop our table and we've redeclared it, nice as can be. Don't take my word for it though, ask the system tables what they see.

SELECT
    S.name AS SchemaName
,   T.name AS TableName
,   C.name AS ColumnName
,   T2.name AS DataTypeName
,   C.max_length
FROM
    sys.schemas AS S
    INNER JOIN
        sys.tables AS T
        ON T.schema_id = S.schema_id
    INNER JOIN
        sys.columns AS C
        ON C.object_id = T.object_id
    INNER JOIN
        sys.types AS T2
        ON T2.user_type_id = C.user_type_id
WHERE
    S.name = 'ServiceNow'
    AND T.name = 'StageDoesNotExist'
ORDER BY
    S.name
,   T.name
,   C.column_id;
Excellent, we now turn on the actual data storage process and voila, we get a value stored into our table. Simulate it with the following.
INSERT INTO ServiceNow.StageDoesNotExist
(ABC, BCD) VALUES ('Important', 'Very, very important');
Truly, all is well and good.

*time passes*

Then, this happens

WAITFOR DELAY ('00:00:03');

INSERT INTO
    ServiceNow.ColumnSizing
(
    EntityName
,   CollectionName
,   ColumnName
,   ColumnLength
,   InsertDate
)
VALUES
    ('DoesNotExist', 'records', 'BCD', 34, current_timestamp);
Followed by
INSERT INTO ServiceNow.StageDoesNotExist
(ABC, BCD) VALUES ('Important','Very important, yet ephemeral data');
To quote Dr. Beckett: Oh boy

No comments: