January 30, 2009

Best Practices for Date/Time Calculations in SQL Server

Arithmetic Operations on Time Portion of Date/Time DataSo far this article has discussed the arithmetic operations on the date portion of data/time values. So when you try to run SELECT GETDATE() + 1, the second addend is implicitly considered the number of days that have to be added to the value returned by the GETDATE() function. But what if you want to add one hour or one minute to the current datetime value? Using the SQL Server DATEADD() function you can do that very easily, but using addition or substraction operators will be tricky.
Let's examine how to do that. One day consists of 24 hours or 1,440 minutes or 86,400 seconds. So if you run the following statement:
SELECT GETDATE() + 1/24
You probably will get the current datetime plus one hour. In reallity, however, you cannot use just 1/24 because both dividend and divisor are integers, and the result of dividing them will be zero. You need to convert the integers to decimal or float data types as follows in order to get the correct result:
-- How to add one hour
---------------------------------------
SELECT GETDATE()
SELECT DATEADD(hh, 1, GETDATE())
SELECT GETDATE () + CAST(1 AS dec(9,4))/ CAST (24 AS dec(9,4))
SELECT GETDATE () + CAST(1 AS dec)/ CAST (24 AS dec)
SELECT GETDATE () + CAST (1 AS float)/ CAST (24 AS float)
Results:
2007-03-25 20:31:13.870
2007-03-25 21:31:13.870
2007-03-25 21:31:13.867
2007-03-25 21:31:13.870
2007-03-25 21:31:13.870
-- How to add one minute
---------------------------------------
SELECT GETDATE()
SELECT DATEADD(mi, 1, GETDATE())
SELECT GETDATE () + CAST(1 AS dec(9,4))/ CAST (1440 AS dec(9,4))
SELECT GETDATE () + CAST(1 AS dec(18,9))/ CAST (1440 AS dec(18,9))
SELECT GETDATE () + CAST (1 AS float)/ CAST (1440 AS float)
Results:
2007-03-25 20:35:15.127
2007-03-25 20:36:15.127
2007-03-25 20:36:15.123
2007-03-25 20:36:15.127
2007-03-25 20:36:15.127
-- How to add one second
---------------------------------------
SELECT GETDATE()
SELECT DATEADD(ss, 1, GETDATE())
SELECT GETDATE () + CAST(1 AS dec(9,4))/ CAST (86400 AS dec(9,4))
SELECT GETDATE () + CAST(1 AS dec(18,9))/ CAST (86400 AS dec(18,9))
SELECT GETDATE () + CAST(1 AS dec(24,18))/ CAST (86400 AS dec(24,18))
SELECT GETDATE () + CAST (1 AS float)/ CAST (86400 AS float)
Results:
2007-03-25 20:42:26.617
2007-03-25 20:42:27.617
2007-03-25 20:42:27.613
2007-03-25 20:42:27.613
2007-03-25 20:42:27.613
2007-03-25 20:42:27.617
-- How to add one second, using variables
------------------------------------------
DECLARE @dec1 dec(24,18), @dec2 dec(24,18), @dec3 dec(24,18), @dt datetime
SELECT @dec1 = 1, @dec2 = 86400, @dt = GETDATE();
SELECT @dec3 = @dec1 / @dec2;
SELECT @dt
SELECT DATEADD(ss, 1, @dt)
SELECT @dt + @dec3
SELECT @dt + CAST (1 AS float)/ CAST (86400 AS float)
Results:
2007-03-25 20:49:16.817
2007-03-25 20:49:17.817
2007-03-25 20:49:17.813
2007-03-25 20:49:17.817
As you can see from the last example, the SQL Server function DATEADD() works perfectly, but an addition operator may cause a problem. For example, when you try to add one hour or one minute, you need to find a sufficient precision for decimal data type. Otherwise, the result will be slightly inaccurate. However, when you try to add one second, applying an addition operator and decimal conversion, you won't be able to get the exact result at all.
On the other hand, the float conversion looks precise and safe for the time calculations with an addition operator, but if you start to use it you may run into a problem: duplicates and missing values. To illustrate and understand the problem, create and populate an auxillary table:
SET NOCOUNT ON;
DECLARE @max int, @cnt int;
SELECT @cnt = 10000;
IF EXISTS(SELECT * FROM sysobjects
WHERE ID = (OBJECT_ID('sequence')) AND xtype = 'U')
DROP TABLE sequence;
CREATE TABLE sequence(num int NOT NULL);
INSERT INTO sequence VALUES(1);
SELECT @max = 1;
WHILE(@max <= @cnt)
BEGIN
INSERT INTO sequence
SELECT @max + num FROM sequence;
SELECT @max = MAX(num) FROM sequence;
END
When you run this script, it will insert 16,384 sequential numbers into the table sequence. (The number 16,384 doesn't have any special meaning. It was selected for illustration purposes only.)
Now, generate the sequence of hours using the auxillary table and SQL Server's date/time function as follows:
IF EXISTS(SELECT * FROM sysobjects
WHERE id = OBJECT_ID('test'))
DROP TABLE test;
SELECT num, DATEADD(hh, num, 'Dec 31, 2006 23:00:00') dt
INTO test
FROM sequence;
SELECT * FROM test;
Results:
num dt
----------- -----------------------
1 2007-01-01 00:00:00.000
2 2007-01-01 01:00:00.000
3 2007-01-01 02:00:00.000
. . . . . . . . . . . . . . . . . .
3099 2007-05-10 02:00:00.000
3100 2007-05-10 03:00:00.000
. . . . . . . . . . . . . . . . . .
16381 2008-11-13 12:00:00.000
16382 2008-11-13 13:00:00.000
16383 2008-11-13 14:00:00.000
16384 2008-11-13 15:00:00.000
The function DATEADD() works as expected and generates a sequence of datetime values with one-hour intervals.
Now, try to roll up the sequence you just generated using the same the SQL Server DATEADD() function:
SELECT DISTINCT DATEADD(hh, -num, dt) FROM test
Results:
2006-12-31 23:00:00.000
The last result proves that SQL Server's date/time function generates date/time values properly. In order to test the solution that uses an arithmetic operator (+), run the following example:
DECLARE @time float
SELECT @time = CAST(1 as float)/CAST(24 as float)
IF EXISTS(SELECT * FROM sysobjects
WHERE id = OBJECT_ID('test'))
DROP TABLE test;
SELECT num, (CAST('Dec 31, 2006 23:00:00' AS datetime) + @time * num) dt
INTO test
FROM sequence;
SELECT * FROM test;
num dt
----------- -----------------------
1 2007-01-01 00:00:00.000
2 2007-01-01 01:00:00.000
3 2007-01-01 02:00:00.000
4 2007-01-01 03:00:00.000
5 2007-01-01 03:59:59.997
6 2007-01-01 05:00:00.000
7 2007-01-01 05:59:59.997
8 2007-01-01 07:00:00.000
9 2007-01-01 08:00:00.000
10 2007-01-01 08:59:59.997
. . . . . . . . . . . . . . . . . .
16380 2008-11-13 11:00:00.000
16381 2008-11-13 11:59:59.997
16382 2008-11-13 12:59:59.997
16383 2008-11-13 14:00:00.000
16384 2008-11-13 14:59:59.997
You will find that an addition operator produces inaccurate results. Sometimes they differ from the expected ones by 3 ms. If you try to roll up the sequence of generated date/time values, you will get more than one date/time "seed" value (as in the following example) and that is incorrect:
SELECT DISTINCT DATEADD(hh, -num, dt) FROM test
Results:
2006-12-31 22:59:59.997
2006-12-31 23:00:00.000
You may say that +/- 3 ms precision is acceptable for most applications, but look how that seemingly tiny problem can produce a bigger one:
SELECT CONVERT(varchar(100), dt, 100)
FROM test
ORDER BY num
Results:
Jan 1 2007 12:00AM
Jan 1 2007 1:00AM
Jan 1 2007 2:00AM
Jan 1 2007 3:00AM
Jan 1 2007 3:59AM
Jan 1 2007 5:00AM
Jan 1 2007 5:59AM
. . . . . . . . . .
Nov 13 2008 5:59AM
Nov 13 2008 6:59AM
Nov 13 2008 8:00AM
Nov 13 2008 8:59AM
Nov 13 2008 9:59AM
Nov 13 2008 11:00AM
Nov 13 2008 11:59AM
Nov 13 2008 12:59PM
Nov 13 2008 2:00PM
Nov 13 2008 2:59PM
This example uses a CONVERT() function to produce a different date/time format. As a result, inaccuracies in the generated values increased from 3 ms to 1 minute and became unacceptable. However, this is not the only problem. If you try to generate the sequences of minutes or seconds, things become even worse. Look at this example:
DECLARE @time float
SELECT @time = cast(1 as float)/cast(1440 as float)
IF EXISTS(SELECT * FROM sysobjects
WHERE id = OBJECT_ID('test'))
DROP TABLE test;
SELECT num, (CAST('Dec 31, 2006 23:59:00' AS datetime) + @time * num) dt
INTO test
FROM sequence;
SELECT * FROM test;
Results:
num dt
----------- -----------------------
1 2007-01-01 00:00:00.000
2 2007-01-01 00:01:00.000
. . . . . . . . . . . . . . . . . .
1579 2007-01-02 02:17:59.997
1580 2007-01-02 02:19:00.000
1581 2007-01-02 02:19:59.997
. . . . . . . . . . . . . . . . . .
16382 2007-01-12 09:01:00.000
16383 2007-01-12 09:01:59.997
16384 2007-01-12 09:03:00.000
As you can see, there are inaccuracies in the generated values again. In addition, when you try to convert these values to another format as follows, you will get duplicated or missing dates:
SELECT CONVERT(varchar(100), dt, 100)
FROM test
ORDER BY num
Results:
Jan 1 2007 12:00AM
Jan 1 2007 12:01AM
. . . . . . . . . .
Jan 2 2007 12:00AM
Jan 2 2007 12:00AM
Jan 2 2007 12:02AM
Jan 2 2007 12:02AM
Jan 2 2007 12:04AM
Jan 2 2007 12:04AM
. . . . . . . . . .
Jan 12 2007 9:00AM
Jan 12 2007 9:01AM
Jan 12 2007 9:01AM
Jan 12 2007 9:03AM
For instance, there are two values "Jan 02, 2007 12:02AM", but the value "Jan 02, 2007 12:03AM" is missing.
If you want to see the list of all duplicates, you can run the following query:
SELECT COUNT(*), CONVERT(varchar(100), dt, 100)
FROM test
GROUP BY CONVERT(varchar(100), dt, 100)
HAVING COUNT(*) > 1
ORDER BY 2
Finally, you can generate the sequence of seconds using the same approach:
DECLARE @time float
SELECT @time = cast(1 as float)/cast(86400 as float)
IF EXISTS(SELECT * FROM sysobjects
WHERE id = OBJECT_ID('test'))
DROP TABLE test;
SELECT num, (CAST('Dec 31, 2006 23:59:59' AS datetime) + @time * num) dt
INTO test
FROM sequence;
SELECT * FROM test;
Results:
num dt
----------- -----------------------
1 2007-01-01 00:00:00.000
2 2007-01-01 00:00:01.000
3 2007-01-01 00:00:02.000
4 2007-01-01 00:00:03.000
5 2007-01-01 00:00:03.997
6 2007-01-01 00:00:05.000
7 2007-01-01 00:00:06.000
8 2007-01-01 00:00:07.000
9 2007-01-01 00:00:08.000
10 2007-01-01 00:00:08.997
11 2007-01-01 00:00:09.997
. . . . . . . . . . . . . . . . . .
16382 2007-01-01 04:33:00.997
16383 2007-01-01 04:33:02.000
16384 2007-01-01 04:33:03.000
SELECT CONVERT(varchar(100), dt, 120)
FROM test
ORDER BY num
Results:
2007-01-01 00:00:00
2007-01-01 00:00:01
2007-01-01 00:00:02
2007-01-01 00:00:03
2007-01-01 00:00:03
2007-01-01 00:00:05
. . . . . . . . . .
2007-01-01 04:32:55
2007-01-01 04:32:56
2007-01-01 04:32:56
2007-01-01 04:32:58
2007-01-01 04:32:59
2007-01-01 04:33:00
2007-01-01 04:33:00
2007-01-01 04:33:02
2007-01-01 04:33:03
This last example has the same problems as the previous one. In addition, using arithmetic operators for date/time manipulations can lead to other errors, weird results, degradation in performance, and more problems than are discussed here. You can avoid all these problems by using the SQL Server's date/time functions.

January 27, 2009

Search text in All Stored Procedures in a Database

declare @searchString varchar(100)
Set @searchString = '%' + strSearchText+ '%'
SELECT Distinct SO.Name
FROM sysobjects SO (NOLOCK) I
NNER JOIN syscomments SC (NOLOCK)
on SO.Id = SC.ID AND SO.Type = 'P'
AND SC.Text LIKE @searchString
ORDER BY SO.Name

Creating DataFrames from CSV in Apache Spark

 from pyspark.sql import SparkSession spark = SparkSession.builder.appName("CSV Example").getOrCreate() sc = spark.sparkContext Sp...