.NET Zone is brought to you in partnership with:

I'm a UK based BI Consultant, specialising in SQL Server, C# and Business Intelligence. I've been working with SQL Server for over 10 years, and have over 15 years in development. Nick is a DZone MVB and is not an employee of DZone and has posted 20 posts at DZone. You can read more from them at their website. View Full User Profile

T-SQL vs. CLR – A Comparison

07.03.2012
| 3964 views |
  • submit to reddit

I’ve recently been looking at optimizing a data load that I’m working on, and have got a quite a nice example of how the performance can be improved significantly by creating a CLR object.

To do this, I created a set of data (which is downloadable from here, is approx. 3mb and is extractable using 7-Zip ) which has approx. 200k records in it. The file has two columns (pipe delimited), which are a row id, and a set of events in a string (in the format “1,2,3,4,5”). Each string has a variable number of events in it (between 1 and 75), which looks something like this.

image

What I’m wanting to do is split this out, to have an Event Id, Sequence_Number and Event with the EventList above being split over multiple records, and a Sequence number showing the position of the event code in the list. Sounds straightforward.

image

I’m effectively looking for a quick way to generate this data.

T-SQL – Custom_Split

A custom script to split the table was created, looking like this:

create function [dbo].[Split] 
( 
    @string nvarchar(4000), 
    @delimiter nvarchar(10) 
) 
returns @table table 
( 
    [Value] nvarchar(4000) 
) 
begin 
    declare @nextString nvarchar(4000) 
    declare @pos int, @nextPos int 
 
    set @nextString = '' 
    set @string = @string + @delimiter 
 
    set @pos = charindex(@delimiter, @string) 
    set @nextPos = 1 
    while (@pos <> 0) 
    begin 
        set @nextString = substring(@string, 1, @pos - 1) 
 
        insert into @table 
        ( 
            [Value] 
        ) 
        values 
        ( 
            @nextString 
        ) 
 
        set @string = substring(@string, @pos + len(@delimiter), len(@string)) 
        set @nextPos = @pos 
        set @pos = charindex(@delimiter, @string) 
    end 
    return 
end 
-- This code was taken from here: http://stackoverflow.com/questions/314824/t-sql-opposite-to-string-concatenation-how-to-split-string-into-multiple-recor/314833#314833

This was executed in Management Studio, with this script:

SELECT EventId,
   IDENTITY (int, 1, 1) as SequenceNo,
   Value as Event
   INTO dbo.SQL_Events   
      FROM dbo.LargeFile
   CROSS  APPLY  dbo.split(dbo.LargeFile.EventList, ‘,’) AS t
   ORDER  BY dbo.LargeFile.EventId

Running this generated a table with 9.8 million rows and took 6 minutes 8 seconds, so handling approx. 26.6k records a second.

CLR – SplitEvents

To split the EventList in a CLR object, I did the following in Visual Studio 2010.

1. Choose the .NET framework, CLR Project

image

2. Right click the project when it’s created, and choose ‘Add –> Stored Procedure’.

In the body of the class, I used the following code:

using System;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;public partial class StoredProcedures
{
    [Microsoft.SqlServer.Server.SqlProcedure]
    public static void CLR_Split()
    {
        // Put your code here
         char[] delimiters = new char[] { ‘,’, ‘-’ };

        using (SqlConnection conn = new SqlConnection(“Context Connection=true”))
        {
            using (SqlCommand cmd = new SqlCommand(“SELECT EventId, EventList FROM LargeFile”, conn))
            {
                SqlPipe pipe = SqlContext.Pipe;
                SqlMetaData[] cols = new SqlMetaData[3];
                cols[0] = new SqlMetaData(“EventId”, SqlDbType.Int);
                cols[1] = new SqlMetaData(“Sequence_No”, SqlDbType.Int);
                cols[2] = new SqlMetaData(“Event”, SqlDbType.Int);

                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    int iRow = 0;
                    if (reader.HasRows)
                    {
                        SqlDataRecord rec = new SqlDataRecord(cols);
                        pipe.SendResultsStart(rec);
                        while (reader.Read())
                        {
                            int iCTIQueueId = Convert.ToInt32(reader[0]);
                            string sIncoming = reader[1].ToString();
                            if (sIncoming.Length > 0)
                            {
                                string[] sSplit = sIncoming.Split(delimiters);
                                int iSplitCount = 1;
                                foreach (string s in sSplit)
                                {
                                    try
                                    {
                                        int iSplitVal = Convert.ToInt32(s);
                                        rec.SetSqlInt32(0, iCTIQueueId);
                                        rec.SetSqlInt32(1, iSplitCount);
                                        rec.SetSqlInt32(2, iSplitVal);
                                        pipe.SendResultsRow(rec);
                                    }
                                    catch (Exception)
                                    {
                                    }
                                    iSplitCount++;
                                }
                            }
                            iRow++;
                        }
                        pipe.SendResultsEnd();
                    }
                    else
                            pipe.Send(“”);
                }
                conn.Close();
            }
        }
    }

};

Having compiled and deployed this to SQL Server, it could be run using this :

INSERT INTO CLR_Events (EventId, Sequence_No, Event)
EXEC CLR_Split

Notes

When running these tests, I was running SQL Server 2008 R2, on a i5-430m laptop, with 4Gb RAM.

The database had plenty of space pre-allocated to it, so there was no performance hit based on growing the database.

Published at DZone with permission of Nick Haslam, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

David Spenard replied on Thu, 2012/07/05 - 9:38am

Nice concise example. I recently worked on a data crunching initiative that had similar needs. I'll keep this in mind for next time.

Steven Goodgrove replied on Wed, 2012/08/08 - 8:28am

Just wondering what the results of the CLR procedure performance were?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.