Data Engineering with Avishkar: 2009

December 21, 2009

What SQL server DBA should know?

 About SQL Server Versions
 New Installation
 New Systems – Setup Options
 Diving Into an Existing Installation
 Locate SQL Server Instances
 Check For Service Packs Installed
 Performance Monitor
 Performance Monitor Checks
 Disk Space Checks
 Check on Jobs
 Check On Maintenance Plans
 Check For Off-System Backups
 File Sizes, Locations and Trending
 Reviewing Security
 Best Practices Tools
 Best Practices Analyzer
 Best Practices Analyzer Targets
 Microsoft Baseline Security Analyzer 2.1
 Identify Logins
 Firewalls, Access Protection
 Disaster Recovery Plan
 Understand Backup Models
 Recovery Model: Full
 Recovery Model: Simple
 Recovery Model: Bulk_logged
 Disaster Planning and Testing
 Manual Processes
 Best Practices
 Third-Party Tools…
 SQL Server Profiler
 Profiler Tips
 Database Engine Tuning Advisor
 What Is An Index?
 What Happens Without an Index?
 Add an Index…
 How Do I Figure Out Where I Need Indexes?
 How Do I Create Indexes?
 Creating Indexes - MS
 Options – Creating Indexes
 Creating Indexes – T-SQL
 Clustered vs. Non-Clustered
 Clustered Index Tips
 "Covered" Indexes
 When NOT To Index
 Index Maintenance
 Index Checking
 Index Defragmentation
 Index Defragmentation in SQL2k5
 About Joins
 A Note About "Aliasing"
 Simple Joins
 Basic Joins
 Join Tips

November 26, 2009

Defragmenting Indexes in SQL Server 2005

There are several things that should be done on a regular basis and one of these things is to manage database fragmentation. Depending on the tables, queries and indexes that are being used fragmentation can cause performance issues as well as using unnecessary space in the database. Database fragmentation is similar to disk fragmentation in that the data is stored in various places in the database file instead of sequentially or next to like data within the database. This often occurs where you have non-sequential keys and the constant inserting, updating and deleting of data causes the data to become fragmented as well as the use of additional data pages to store the data.
As the indexes are modified, the information stored in them becomes fragmented, resulting in the information being scattered across the data files. When this occurs, the logical ordering of the data no longer matches the physical ordering, which can lead to a deterioration of query performance.
To fix this problem, indexes must be periodically reorganized or rebuilt (defragmented) so the physical order of the leaf-level pages matches the logical order of the leaf nodes.

So where is the information stored?
To analyze SQL Server 2005 or 2008 indexes, you use the system function sys.dm_db_index_physical_stats to determine which indexes are fragmented and the extent of that fragmentation.

The sys.dm_db_index_physical_stats function returns a number of values that provide details about the indexes you specify. avg_fragmentation_in_percent: Percentage of the logical index that is fragmented.
fragment_count: Number of fragments in the leaf level.
avg_fragment_size_in_pages: Average number of pages in a leaf-level fragment.
page_count: Number of index or data pages.
An index always has at least one fragment (fragment_count). The maximum number of fragments that an index can have is equal to the number of pages (page_count).
Ideally, the avg_fragmentation_in_percent value should be as close to zero as possible, and the avg_fragment_size_in_pages should be as high as possible.
Microsoft recommends that you reorganize your index if the avg_fragmentation_in_percent value is less than or equal to 30% and rebuild the index if the value is greater than 30%.
Reorganizing Indexes
When you reorganize an index, SQL Server physically reorders the leaf-level pages to match the logical order of the leaf nodes. The process uses the existing pages only and does not allocate new ones, but it does compact the index pages. In addition, reorganization uses minimal resources and is automatically performed online, without blocking queries or updates. You should reorganize indexes only if they’re lightly fragmented, otherwise, you should rebuild them
ALTER INDEX PK_StoreContact_CustomerID_ContactID ON AdventureWorks.Sales.StoreContact
REORGANIZE

Rebuilding Indexes
Rebuilding an index is generally more effective than reorganizing it because it drops the original index and builds a new one. As a result, the index starts clean with minimal fragmentation and the pages are compacted, and new pages are allocated as needed. In addition, you can choose to rebuild an index offline (the default) or online.
ALTER INDEX IX_StoreContact_ContactID ON
AdventureWorks.Sales.StoreContact
REBUILD

SQL Server script to rebuild all tables which are fragmented.
-- =============================================
-- Author: Avishkar Meshram
-- Create date: 11/26/2009
-- Description: Check fragmentation in table and rebuild table
-- =============================================
CREATE PROCEDURE [dbo].[RebuildFragmentedTables]
-- Add the parameters for the stored procedure here
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
DECLARE @Table VARCHAR(255)
DECLARE @cmd NVARCHAR(500)
DECLARE @dbid smallint;
DECLARE @FragCount smallint; -- Fragmentation level in table
SET @cmd = 30
Create table #RebuildTable
(
Table_Name varchar(50)
)
-- Gather tables which are fragmented
INSERT INTO #RebuildTable
SELECT OBJECT_NAME(OBJECT_ID)
FROM sys.dm_db_index_physical_stats(DB_ID(),
NULL, NULL, NULL , 'DETAILED')
WHERE avg_fragmentation_in_percent > 30

SET @cmd = 'DECLARE TableCursor CURSOR FOR
SELECT Distinct Table_Name
FROM #RebuildTable'
-- Rebuild only those tables which are fragmented
EXEC (@cmd)
OPEN TableCursor
FETCH NEXT FROM TableCursor INTO @Table
WHILE @@FETCH_STATUS = 0
BEGIN
SET @cmd = 'ALTER INDEX ALL ON ' + @Table + ' REBUILD'
PRINT (@cmd)
--EXEC (@cmd)
FETCH NEXT FROM TableCursor INTO @Table
END
CLOSE TableCursor
DEALLOCATE TableCursor
DROP TABLE #RebuildTable
END

July 22, 2009

SQL SERVER – 2005 – List Tables in Database With/Without Primary Key

USE DatabaseName;
GO
SELECT SCHEMA_NAME(schema_id) AS SchemaName,name AS TableName
FROM sys.tables
WHERE OBJECTPROPERTY(OBJECT_ID,'TableHasPrimaryKey') = 0
ORDER BY SchemaName, TableName;
GO

SELECT i.name AS IndexName,OBJECT_NAME(ic.OBJECT_ID) AS TableName,COL_NAME(ic.OBJECT_ID,ic.column_id) AS ColumnName
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
ON i.OBJECT_ID = ic.OBJECT_ID
AND i.index_id = ic.index_id
WHERE i.is_primary_key = 1

Add windows user to sysadmin role to SQL server 2005

A SQL Server instance can contain multiple user databases. Each user database has a specific owner; the owner defaults to the database creator. By definition, members of the sysadmin server role (including system administrators if they have access to SQL Server through their default group account) are database owners (DBOs) in every user database. In addition, there is a database role, db_owner, in every user database. Members of the db_owner role have approximately the same privileges as the dbo user.

To explicitly grant SQL Server 2005 logins directly to the service accounts that are used by SQL Server 2005 and by SQL Server Agent.
CREATE LOGIN [FFSERVER\Calldp] FROM WINDOWS WITH DEFAULT_DATABASE=[master]

Use an account that is a member of the SYSADMIN fixed server role to provision the logins that you added in above step in the SYSADMIN fixed server role.
EXEC master..sp_addsrvrolemember @loginame = N'FFSERVER\Calldp', @rolename = N'sysadmin'

July 20, 2009

To get table and column which are using particular data type

select object_name(c.id),c.name
from sys.objects o,
sys.syscolumns c,
sys.types t
where o.object_id=c.id
and c.xtype=t.system_type_id
and t.name='varbinary'
and o.type='U'

July 16, 2009

Some usefull queries in performance monitoring

Some usefull queries in performance monitoring:
SELECT
d1.session_id,
d3.[text],
d1.login_time,
d1.login_name,
d2.wait_time,
d2.blocking_session_id,
d2.cpu_time,
d1.memory_usage,
d2.total_elapsed_time,
d2.reads,d2.writes,
d2.logical_reads,
d2.sql_handle
FROM sys.dm_exec_sessions d1
JOIN sys.dm_exec_requests d2 ON d1.session_id=d2.session_id
CROSS APPLY sys.dm_exec_sql_text(d2.sql_handle) d3

Idntify locking query:
select distinct
t1.session_id as THIS_IS_Blocked_session,
t1.status as Blocked_session_status,
t1.command as Blocked_session_command,
t1.DatabaseName,
t1.text as Blocked_session_text,
t2.session_id as BY_THIS_Blocking_session,
t2.status as Blocking_session_status,
t2.command as Blocking_session_command,t2.DatabaseName,t2.text as Blocking_session_text
from
(
select
er.session_id,
er.status,
er.command,
DB_NAME(database_id) AS 'DatabaseName',
SUBSTRING(st.text, (er.statement_start_offset/2)+1,((CASE er.statement_end_offset WHEN -1 THEN DATALENGTH(st.text) ELSE er.statement_end_offset END - er.statement_start_offset)/2) + 1) AS text
FROM
sys.dm_exec_requests AS er
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) AS st
) as T1
join sysprocesses as s on t1.session_id = s.spid
join syslockinfo as l on s.spid = l.req_spid
join
(
select
er.session_id,
er.status,
er.command,
DB_NAME(database_id) AS 'DatabaseName',
SUBSTRING(st.text, (er.statement_start_offset/2)+1,((CASE er.statement_end_offset WHEN -1 THEN DATALENGTH(st.text) ELSE er.statement_end_offset END - er.statement_start_offset)/2) + 1) AS text
FROM
sys.dm_exec_requests AS er
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) AS st
) as T2 on s.blocked = t2.session_id
where s.blocked >0

July 10, 2009

Script for Triggers in SQL server

-- To drop triggers associated with Particular table

select object_name(parent_obj),' drop trigger '+object_name(b.id)--b.text
from sysobjects a
join syscomments b
on a.id=b.id
where object_name(parent_obj)='Table_Name'
and xtype='TR'

-- To create script of triggers associated with Particular table
select object_name(parent_obj),object_name(b.id),b.text
from sysobjects a
join syscomments b
on a.id=b.id
where object_name(parent_obj)='Table_Name'
and xtype='TR'

June 18, 2009

Varchar Vs NVarchar

Varchar is a single-byte character and nvarchar is a double-byte character.
The obvious point that the same text takes up twice as much space in the database as nvarchar - because the unicode character set uses 2 bytes per character. So unless you need funny characters it is better to use varchar. This is not just because of database space - obviously all data being pulled back to the client is doubled up too - so 100 chars of varchar = about 100 bytes but it would be 200 bytes for nvarchar doubling network loads.
It deals with the performance hit you can get when SQL Server implicitly converts your data to nchar and nvarchar for you when you have enabled Unicode support in your data sources.
If you are selecting against a char or varchar column in your database with a Unicode value in an nchar or nvarchar variable, it uses Clustered Index Scan in execution plan since every value in the table must be converted before it can be compared.
i.e. CONVERT_IMPLICIT(nvarchar(50),[myvarchar],0)=[@myparam]
While varchar uses Clustered Index seek.

June 04, 2009

What is a Bookmark Lookup in Sql Server?

A bookmark lookup is the process of finding the actual data in the SQL table, based on an entry found in a non-clustered index.
When you search for a value in a non-clustered index, and your query needs more fields than are part of the index leaf node (all the index fields, plus any possible INCLUDE columns), then SQL Server needs to go retrieve the actual data page(s) - that's what's called a bookmark lookup.
In some cases, that's really the only way to go - only if your query would require just one more field (not a whole bunch of 'em), it might be a good idea to INCLUDE that field in the non-clustered index. In that case, the leaf-level node of the non-clustered index would contain all fields needed to satisfy your query (a "covering" index), and thus a bookmark lookup wouldn't be necessary anymore.
It's a NESTED LOOP which joins a non-clustered index with the table itself on a row pointer.
Happens for the queries like this:SELECT col1
FROM table
WHERE col2 BETWEEN 1 AND 10
, if you have an index on col2.
The index on col2 contains pointers to the indexed rows.
So, in order to retrieve the value of col1, the engine needs to scan the index on col2 for the key values from 1 to 10, and for each index leaf, refer to the table itself using the pointer contained in the leaf, to find out the value of col1.
The Bookmark Lookup operator uses a bookmark (row ID or clustering key) to look up the corresponding row in the table or clustered index. The Argument column contains the bookmark label used to look up the row in the table or clustered index. The Argument column also contains the name of the table or clustered index in which the row is looked up. If the WITH PREFETCH clause appears in the Argument column, the query processor has determined that it is optimal to use asynchronous prefetching (read-ahead) when looking up bookmarks in the table or clustered index.

When a non-indexed column is required in the SELECT list, the actual table row must be accessed to retrieve the column. This is called a bookmark lookup, and it can be a very expensive operation on a large table. I have often seen bookmark lookups as the bottleneck on a poor performing query. So, depending on the queries in the application, it could be beneficial to have quite a few columns in the index to avoid bookmark lookups.
Avoiding bookmark lookups and table scans are two reasons you may have for wanting extra columns as part of the index. Another thing to consider is that indexes take up substantial disk space and must be updated whenever data changes are made to the table. Indexes must be finely tuned and not just arbitrarily created to avoid a negative impact on performance.

To avoid Bookmark Lookup use Included Columns in SQL Server 2005
Key Column Size Limitations :
The key columns of an index may not exceed 16 columns or a key length of 900 bytes. Using more than 16 columns in an index seems excessive to me, but maybe there are some business or performance reasons to do this based on the application. The 900 byte limitation means that the sum of the maximum possible width of each of the columns in the index must not exceed 900 bytes. For example, an int column takes up 4 bytes. The number of bytes in a char or varchar column is equal to the number of characters the column can hold. The number of bytes in an nchar or nvarchar column is twice the number of characters.

By adding columns that can not be added to the index key as Included Columns, you can overcome the index key size limitations. Included columns are not counted in the 16 column or 900 byte limits. You can also add columns of the new large value data types, such as varchar(max), as an Included Column. The Included Columns are not stored at the intermediate levels of the index, only at the leaf level. Even though they are not stored at the intermediate levels, Included Columns still improved the performance of several queries I tested when the column was used in the WHERE clause along with the first column in the index. Bookmark lookups are avoided when Included Columns are part of the SELECT list.
A column of any data type except for text, ntext and image can be an Included Columns. Computed columns can also be used. The column can not be dropped while it is an Included Column.
USE AdventureWorks
GO
--Add a new large column to the table
ALTER TABLE Sales.SalesOrderDetail
ADD TestIndex char(500) NULL
GO

--Update the column with some data
UPDATE Sales.SalesOrderDetail
SET TestIndex = replicate(cast(SalesOrderID as varchar) +
cast(SalesOrderDetailID as varchar) +
coalesce(CarrierTrackingNumber,''),5)
GO

--Drop the index on the ProductID
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )
GO
exec sp_spaceused 'Sales.SalesOrderDetail'
--Recreate the index
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID]
)
GO
--Before the new column is added to the index
exec sp_spaceused 'Sales.SalesOrderDetail'
GO

--Drop the index
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )

GO
--Add the new column to the index
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID],
[TestIndex]
)
GO

exec sp_spaceused 'Sales.SalesOrderDetail'

--Drop the index
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )

GO

--Add the column as an included colum
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID]
)INCLUDE ( [TestIndex])

exec sp_spaceused 'Sales.SalesOrderDetail'

GO
--Reverse all changes
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id =
OBJECT_ID(N'[Sales].[SalesOrderDetail]') AND name = N'IX_SalesOrderDetail_ProductID')
DROP INDEX [IX_SalesOrderDetail_ProductID]
ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )

GO
ALTER TABLE Sales.SalesOrderDetail DROP COLUMN TestIndex
GO
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail]
(
[ProductID]
)

http://www.codersrevolution.com/index.cfm/2009/2/14/SQL-Server-Performance-How-Are-My-Indexes-Being-Using

http://www.sql-server-performance.com/articles/clustering/create_nonkey_columns_p1.aspx

What is a Bookmark Lookup in Sql Server?

Microsoft SQL Server 2005 : MIRRORING, CLUSTERING, OR REPLICATION

Microsoft SQL Server 2005 offers four solutions to mitigate the risk of hardware failure. In order of data-availability and minimized data-loss, your choices are log shipping, replication, and mirroring or clustering.

Log shipping :
In log shipping, log backups from a primary database server are ‘shipped’ to a secondary database server at an alterable, pre-determined interval. As the logs arrive at the secondary server, they are ‘applied’ to the database. Before a failover can occur, all unrestored logs must first be manually applied to the secondary server.
The advantage of log shipping is that the same logs can be applied to multiple standby servers for multiple purposes, and is an integral component of any disaster recovery plan. The disadvantage is that failover cannot be automated, that downtime can be lengthy, and that there is a distinct possibility of data-loss depending on the shipping interval.
Log shipping is supported in all versions of SQL Server 2005, excepting ‘Express’ and ‘Compact’ editions.
Replication :
Replication is an automated distribution of data to one or more receiving nodes. Microsoft refers to this relationship as a publish-subscribe model—a Publisher (the primary database) pushes its information to a secondary database server called a Subscriber.
Subscriber nodes are available for reporting, data warehousing, and backup purposes, and should be considered for any application where the strain of these tasks should not be performed by the primary online node.
For online applications that are very lookup-intensive (e.g., searches), connections can be sent to a farm of Subscriber nodes. In that same scenario, database writes (e.g., ecommerce purchases) can be sent to the Publisher for processing. This application architecture ensures proper scalability in the database layer, and is a very effective load-balancing mechanism.
SQL Server 2005 supports three types of replication: snapshot, transactional, and merge. Snapshot replication is a good choice when data changes infrequently or the dataset is small. Merge replication is appropriate when multiple environments work independently and on separate sets of data, which later need to be re-integrated (merged). Transactional replication is the best choice when data changes frequently and needs to be pushed to subscribers in near-real-time.
Replication is supported by all editions of SQL Server 2005, but only ‘publishing’ is not available in the ‘Express’ or ‘Compact’ editions.

Mirroring :
Database mirroring is new in SQL Server 2005. Mirroring is a form of database clustering to keep an exact real-time copy of the database on a second server. This setup requires three servers: a Principal, a Mirror, and a Witness. The Witness brokers the relationship between the Principal and the Mirror, and initiates a seamless automated failover when necessary.
The advantage of mirroring is that it is a cost-effective solution when only one or very few databases need to be maintained, and the entire dataset fits onto the servers’ disk drives. The drawback is that the Mirror cannot be accessed for reporting or backups (only snapshots). Additionally, Mirroring is not a server but a database clustering technology. This means that the Mirror server must have all the proper logins, SQL Agent jobs, SQL Server Integration Services, and other supporting components and configurations. Lastly, Mirroring is not appropriate for inter-dependent databases, as only one database might failover, breaking the dependencies.

Failover Clustering:
Failover clustering is as much a function of the underlying operating system as it is SQL Server. In failover clustering for SQL Server 2005, the advantages and caveats flip-flop when compared to database mirroring. Failover clustering has higher hardware and resource requirements—such as shared external storage (a SAN)—but can accommodate a much higher volume of data and databases.

Windows Server 2003, with Microsoft Cluster Service (MSCS) supports Active/Passive and Active/Active cluster topologies. The “Standard” edition of SQL Server 2005 supports only two nodes in the cluster, whereas “Enterprise” supports up to eight active nodes. An Active/Passive cluster only supports one instance of a database. Should the Active node fail, the Passive node will take over the workload. Up until then, the Passive node has been sitting around idly with nothing to do, except for receiving transaction updates from the Active server. However, in Active/Passive clustering the financial benefit is that the Passive node does not require a separate database license, yet can be used for backup and reporting purposes of the idle database.
Active/Active clustering supports multiple database instances, meaning that individual nodes in the cluster can be online and actively collecting and manipulating data, and also failover to another node in the cluster. In an Active/Active cluster all nodes need to be licensed.

· Replication would be primarily suited for online-offline data synchronization scenarios (laptop , mobile devices, remote servers).
· Log shipping could be used to have a failover server with manual switching.
· Database Mirroring is an automatic failover technique
· Failover Clustering is an advanced type of database mirroring.

Failover clustering is an availability technology that provides redundancy at the hardware level and is built on top of Windows Clustering technology, i.e. it is not specific to SQL Server.
For example, the processor blows up on Server A. Fortunately Server A is part of a SQL Server Cluster and so Server B takes over the job of providing the SQL Server Service, within a matter of seconds. All of this occurs automatically and is transparent to the database users and or application being served by the cluster.
The main difference between Database Mirroring and clustering is that SQL Clustering provides redundancy at the instance level whereas database mirroring provides redundancy at the database level.
Log shipping is considered more of a redundancy technology.
For example, it can be used to provide a full copy of your primary environment, typically used as a warm standby that can be manually brought online. This can be used to provide additional redundancy to your backup strategy. Log shipping can also be used to offload reporting from a primary server by creating a read only copy of the production database at an alternative location/server.
Replication is quite a diverse technology and can be used to cater for a number of different scenarios, the choice of which will determine the specific type of replication that is implemented.
For example, merge replication can be used to support distributed processing by spreading the workload of an application across several servers, i.e. distributed processing architectures.
Merge replication often requires an application that is relatively aware of its environment. Techniques such as conflict resolution also have to be taken into consideration in order to ensure data consistency across the entire integrated environment.
Transactional Replication can be used in a similar fashion to log shipping however you can limit the specific objects that are replicated to the subscriber. This can be useful if only a subset of tables is required for reporting purposes.

A failover cluster is a combination of one or more physical disks in a Microsoft Cluster Service (MSCS) cluster group, known as a resource group, that are participating nodes of the cluster. The resource group is configured as a failover clustered instance that hosts an instance of SQL Server. A SQL Server failover clustered instance appears on the network as if it were a single computer, but has functionality that provides failover from one node to another if one node becomes unavailable.
Failover clusters provide high-availability support for an entire Microsoft SQL Server instance, in contrast to database mirroring, which provides high-availability support for a single database. Database mirroring works between failover clusters and, also, between a failover cluster and a nonclustered host.

Typically, when mirroring is used with clustering, the principal server and mirror server both reside on clusters, with the principal server running on the failover clustered instance of one cluster and the mirror server running on the failover clustered instance of a different cluster. You can establish a mirroring session in which one partner resides on the failover clustered instance of a cluster and the other partner resides on a separate, unclustered computer, however.
If a cluster failover makes a principal server temporarily unavailable, client connections are disconnected from the database. After the cluster failover completes, clients can reconnect to the principal server on the same cluster, or on a different cluster or an unclustered computer, depending on the operating mode.
When deciding how to configure database mirroring in a clustered environment, the operating mode you use for mirroring is significant.
High-Safety mode Session with Automatic Failover
If you intend to mirror a database in high-safety mode with automatic failover, a two-cluster configuration is recommended for the partners. This configuration provides maximum availability. The witness can reside either on a third cluster or on an unclustered computer.
If the node running the current principal server fails, automatic failover of the database begins within a few seconds, while the cluster is still failing over to another node. The database mirroring session fails over to the mirror server on the other cluster or unclustered computer, and the former mirror server becomes the principal server. The new principal server rolls forward its copy of the database as quickly as possible and brings it online as the principal database. After the cluster failover completes, which typically takes several minutes, the failover clustered instance that was formerly the principal server becomes the mirror server
The following illustration shows an automatic failover between clusters in a mirroring session running in high-safety mode with a witness (which supports automatic failover).
The three server instances in the mirroring session reside on three distinct clusters: Cluster_A, Cluster_B, and Cluster_C. On each cluster, a default instance of SQL Server is running as a SQL Server failover clustered instance. When the mirroring session starts, the failover clustered instance on Cluster_A is the principal server, the failover clustered instance on Cluster_B is the mirror server, and the failover clustered instance on Cluster_C is the witness in the mirroring session. Eventually, the active node on Cluster_A fails, which causes the principal server to become unavailable.
Before the cluster has time to fail over, the loss of the principal server is detected by the mirror server, with the help of the witness. The mirror server rolls forward its database and brings it online as the new principal database as quickly as possible. When Cluster_A finishes failing over, the former principal server is now the mirror server, and it synchronizes its database with the current principal database on Cluster_B.
High-Safety Mode Session Without Automatic Failover
If you are mirroring a database in high-safety mode without automatic failover, another node in the cluster will act as the principal server if the node running the current principal server fails. Note that while the cluster is unavailable, the database is unavailable.
High-Performance Mode Session
If you intend to mirror a database in high-performance mode, consider placing the principal server on the failover clustered instance of a cluster and placing the mirror server on an unclustered server in a remote location. If the cluster fails over to a different node, the failover clustered instance will continue as the principal server in the mirroring session. If the entire cluster has problems, you can force service onto the mirror server

June 03, 2009

SQL Server performance tips

Great set of tips for SQL Server performance :

1. Does your SQL statement have a WHERE clause?
I know this sounds obvious, but don't retrieve more data than you need. However, less obvious is that even if your SELECT statement retrieves the same quantity of data without a WHERE clause, it may run faster with one.

2. Is SELECT DISTINCT being used properly?
Again, pretty obvious, but using SELECT DISTINCT where no duplicate records are being returned is an unnecessary performance hit. If you are getting duplicate records, first double check your table joins as this is often the cause and only use the DISTINCT clause if you really need it.

3. Are you using UNION instead of UNION ALL?
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it is much quicker.

4. Are your stored procedures prefixed with 'sp_'?
Any stored procedures prefixed with 'sp_' are first searched for in the Master database rather than the one it is created in. This will cause a delay in the stored procedure being executed.

5. Are all stored procedures referred to as dbo.sprocname?
When calling a stored procedure you should include the owner name in the call, i.e. use EXEC dbo.spMyStoredProc instead of EXEC spMyStoredProc.Prefixing the stored procedure with the owner when executing it will stop SQL Server from placing a COMPILE lock on the procedure while it determines if all objects referenced in the code have the same owners as the objects in the current cached procedure plan.

6. Are you using temporary tables when you don't need to?
Although there is sometimes a benefit of using temporary tables, generally they are best eliminated from your stored procedure. Don't assume that retrieving data multiple times is always less efficient than getting the data once and storing it in temporary table as often it isn't. Consider using a sub-query or derived table instead of a temporary table (see examples below). If you are using a temporary table in lots of JOINS in you stored procedure and it contains loads of data, it might be beneficial to add an index to your temporary table as this may also improve performance.
An example of a derived table instead of a temporary table

SELECT COLUMN1, COLUMN2, COUNTOFCOL3
FROM A_TABLE A
INNER JOIN (SELECT COUNT(COLUMN3) AS COUNTOFCOL3,
COLUMN2
FROM B_TABLE B
INNER JOIN C_TABLE C ON B.ID = C.ID
) ON A.ID = B.ID

7. Are you using Cursors when you don't need to?
Cursors of any kind slow down SQL Server's performance. While in some cases they are unavoidable, often there are ways to remove them from your code.Consider using any of these options instead of using a cursor as they are all faster:
Derived tables
Sub-queries
CASE statements
Multiple queries
Temporary tables

8. Are your Transactions being kept as short as possible?
If you are use SQL transactions, try to keep them as short as possible. This will help db performance by reducing the number of locks. Remove anything that doesn't specifically need to be within the transaction like setting variables, select statements etc.

9. Is SET NO COUNT ON being used?
By default, every time a stored procedure is executed, a message is sent from the server to the client indicating the number of rows that were affected by the stored procedure. You can reduce network traffic between the server and the client if you don't need this feature by adding SET NO COUNT ON at the beginning of your stored procedure.

10. Are you using IN or NOT IN when you should be using EXISTS or NOT EXISTS?
If you are using IN or NOT IN in a WHERE clause that contains a sub-query you should re-write it to use either EXISTS, NOT EXISTS or perform a LEFT OUTER JOIN. This is because particularly the NOT IN statement offers really poor performance. The example below probably better explains what I mean: e.g. This SQL statement:
SELECT A_TABLE.COLUMN1
FROM A_TABLE
WHERE A_TABLE.COLUMN2 NOT IN (SELECT A_TABLE2.COLUMN2FROM A_TABLE2)Could be re-written like this:
SELECT A_TABLE.COLUMN1
FROM A_TABLE
WHERE NOT EXISTS (SELECT A_TABLE2.COLUMN2FROM A_TABLE2
WHERE A_TABLE.COLUMN2 = A_TABLE2.COLUMN2
)

11. Do you have a function that acts directly on a column used in a WHERE clause?
If you apply a function to a column used in the WHERE clause of your SQL statement, it is unlikely that the SQL statement will be able to make use of any indexes applied to that column.e.g.
SELECT A_TABLE.LASTNAME
FROM A_TABLE
WHERE SUBSTRING (FIRSTNAME,1,1) = 'm'Could be re-written:
SELECT A_TABLE.LASTNAME
FROM A_TABLE
WHERE FIRSTNAME LIKE = 'm%'Where you have a choice of using the IN or BETWEEN clausesUse the BETWEEN clause as it is much more efficiente.g. This SQL statement:
SELECT A_TABLE.NAME
FROM A_TABLE
WHERE A_TABLE.NUMBER IN (100, 101, 102, 103)Should be re-written like this:
SELECT A_TABLE.NAME
FROM A_TABLE
WHERE A_TABLE.NUMBER BETWEEN 100 AND 103

12. Are you doing excessive string concatenation in your stored procedure?
Where possible, avoid doing loads of string concatenation as it is not a fast process in SQL Server.

13. Have you checked the order of WHERE clauses when using AND?
If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written (assuming that no parenthesis have been used to change the order of execution). You may want to consider one of the following when using AND:
Locate the least likely true AND expression first. This way, if the AND expression is false, the clause will end immediately, saving time.
If both parts of an AND expression are equally likely being false, put the least complex AND expression first. This way, if it is false, less work will have to be done to evaluate the expression.

12. Have you checked that you are using the most efficient operators?
Often you don't have much of a choice of which operator you use in your SQL statement. However, sometimes there is an alternative way to re-write your SQL statement to use a more efficient operator. Below is a list of operators in their order of performance (with the most efficient first).
=
>, >=, <, <=
LIKE
<>

May 29, 2009

SQL Server Query Execution Plan Analysis

When it comes time to analyze the performance of a specific query, one of the best methods is to view the query execution plan. A query execution plan outlines how the SQL Server query optimizer actually ran (or will run) a specific query. This information if very valuable when it comes time to find out why a specific query is running slow.
If you want to see an execution plan, but you don't want to run the query, you can choose the option "Display Estimated Execution Plan" (located on the Query drop-down menu). When you select this option, immediately an execution plan (in graphical format) will appear. The difference between these two (if any) is accountable to the fact that when a query is really run (not simulated, as in this option), current operations of the server are also considered. In most cases, plans created by either method will produce similar results.
If you see any of the following in an execution plan, you should consider them warning signs and investigate them for potential performance problems. Each of them are less than ideal from a performance perspective.
Index or table scans: May indicate a need for better or additional indexes.
Bookmark Lookups: Consider changing the current clustered index, consider using a covering index, limit the number of columns in the SELECT statement.
Filter: Remove any functions in the WHERE clause, don't include wiews in your Transact-SQL code, may need additional indexes.
Sort: Does the data really need to be sorted? Can an index be used to avoid sorting? Can sorting be done at the client more efficiently?
It is not always possible to avoid these, but the more you can avoid them, the faster query performance will be.
If you have a stored procedure, or other batch Transact-SQL code that uses temp tables, you cannot use the "Display Estimated Execution Plan" option in the Query Analyzer or Management Studio to evaluate it. Instead, you must actually run the stored procedure or batch code. This is because when a query is run using the "Display Estimated Execution Plan" option, it is not really run, and temp tables are not created. Since they are not created, any references to them in the code will fail, which prevents an estimated execution plan from being created.
On the other hand, if you use a table variable instead of a temp table, you can use the "Display Estimated Execution Plan" option
If you have a very complex query you are analyzing in Query Analyzer or Management Studio as a graphical query execution plan, the resulting plan can be very difficult to view and analyze. You may find it easier to break down the query into its logical components, analyzing each component separately.

Keep the following in mind when viewing a graphical execution plan:
In very complex query plans, the plan is divided into many parts, with each part listed one on top of the other on the screen. Each part represents a separate process or step that the query optimizer has to perform in order to get to the final results.
Each of the execution plan steps is often broken down into smaller sub-steps. Unfortunately, they are displayed on the screen from right to left. This means you must scroll to the far right of the graphical query plan to see where each step starts.
Each of the sub-steps and steps is connected by an arrow, showing the path (order) taken of the query when it was executed.
Eventually, all of the parts come together at the top left side of the screen.
If you move your cursor above any of the steps or sub-steps, a pop-up windows is displayed, providing more detailed information about this particular step or sub-step.
If you move your cursor over any of the arrows connecting the steps and sub-steps, you see a pop-up window showing how many records are being moved from one step or sub-step to another step or sub-step.
The arrows that connect one icon to another in a graphical query plan have different thicknesses. The thickness of the arrow indicates the relative cost in the number of rows and row size of the data moving between each icon. The thicker the arrow, the more the relative cost is.
You can use this indicator as a quick gauge as to what is happening within the query plan of your query. You will want to pay extra attention to thick arrows in order to see how it affects the performance of your query. For example, thick lines should be at the right of the graphical execution plan, not the left. If you see them on the left, this could indicate that too many rows are being returned, and that the query execution plan is less than optimal.
In an execution plan, each part of it is assigned a percentage cost. This represents how much this part costs in resource use, relative to the rest of the execution plan. When you analyze an execution plan, you should focus your efforts on those parts that have the largest percentage cost. This way, you focus your limited time on those areas that have the greatest potential for a return on your time investment.

When you place the cursor over a table name (and its icon) in a graphical execution plan and display the pop-up window, you will see one of several messages. These messages tell you if and how an index was used to retrieve data from a table. They include:
Table Scan: If you see this message, it means there was no clustered index on the table and that no index was used to look up the results. Literally, each row in the table had to be examined. If a table is relatively small, table scans can be very fast, sometimes faster than using an index. So the first thing you want to do, when you see that a table scan has been performed, is to see how many rows there are in the table. If there are not many, then a table scan may offer the best overall performance. But if this table is large, then a table scan will most likely take a long time to complete, and performance will suffer. In this case, you need to look into adding an appropriate index(s) to the table that the query can use. Let's say that you have identified a query that uses a table scan, but you also discover that there is an appropriate nonclustered index, but it is not being used. What does that mean, and why isn't the index being used? If the amount of data to be retrieved is large, relative to the size of the table, or if the data is not selective (which means that there are many rows with the same values in the same column), a table scan is often performed instead of an index seek because it is faster. For example, if a table has 10,000 rows, and the query returns 1,000 of them, then a table scan of a table with no clustered index will be faster than trying to use a non-clustered index. Or, if the table had 10,000 rows, and 1,000 of the rows have the same value in the same column (the column being used in the WHERE clause), a table scan is also faster than using a non-clustered index. When you view the pop-up window when you move the cursor over a table in a graphical query plan, notice the "Estimated Row Count" number. This number is the query optimizer's best guess on how many rows will be retrieved. If a table scan was done, and this number is very high, this tells you that the table scan was done because a high number of records were returned, and that the query optimizer believed that it was faster to perform a table scan than use the available non-clustered index.
Index Seek: When you see this, it means that the query optimizer used a non-clustered index on the table to look up the results. Performance is generally very quick, especially when few rows are returned.
Clustered Index Seek: If you see this, this means that the query optimizer was able to use a clustered index on the table to look up the results, and performance is very quick. In fact, this is the fastest type of index lookup SQL Server can do.
Clustered Index Scan: A clustered index scan is like a table scan, except that it is done on a table that has a clustered index. Like a regular table scan, a clustered index scan may indicate a performance problem. Generally, they occur for two different reasons. First, there may be too many rows to retrieve, relative to the total number of rows in the table. See the "Estimated Row Count" to verify this. Second, it may be due to the column queried in the WHERE clause may not be selective enough. In any event, a clustered index scan is generally faster than a standard table scan, as not all records in the table always have to be searched when a clustered index scan is run, unlike a standard table scan. Generally, the only thing you can do to change a clustered index scan to a clustered index seek is to rewrite the query so that it is more restrictive and fewer rows are returned.
In most cases, the query optimizer will analyze joins and JOIN the tables using the most efficient join type, and in the most efficient order. But not always. In the graphical query plan you will see icons that represent the different types of JOINs used in the query. In addition, each of the JOIN icons will have two arrows pointing to it. The upper arrow pointing to the JOIN icon represents the outer table in the join, and the lower arrow pointing to the JOIN icon represent the inner table in the join. Follow the arrows back to see the name of the tables being joined.
Sometimes, in queries with multiple JOINs, tracing the arrow back won't reveal a table, but another JOIN. If you place the cursor over the arrows pointing to the upper and lower JOINs, you will see a popup window that tells you how many rows are being sent to the JOIN for processing. The upper arrow should always have fewer rows than the lower arrow. If not, then the JOIN order selected by the query optimizer might be incorrect (see more on this below).
First of all, let's look at JOIN types. SQL Server can JOIN a table using three different techniques: nested loop, hash, and merge. Generally, the fastest type of join in a nested loop, but if that is not feasible, then a hash JOIN or merge JOIN is used (as appropriate), both of which tend to be slower than the nested JOIN.
When very large tables are JOINed, a merge join, not a nested loop join, may be the best option. The only way to know is to try both and see which one is the most efficient.
If a particular query is slow, and you suspect it may be because the JOIN type is not the optimum one for your data, you can override the query optimizer's choice by using a JOIN hint. Before you use a JOIN hint, you will want to take some time to learn about each of the JOIN types and how each one works. This is a complicated subject, beyond the scope of this tip.
JOIN order is also selected by the query optimizer, which it trying to select the most efficient order to JOIN tables. For example, for a nested loop join, the upper table should be the smaller of the two tables. For hash joins, the same is true; the upper table should be the smaller of the two tables. If you feel that the query optimizer is selecting the wrong order, you can override it using JOIN hints.
In many cases, the only way to know for sure if using a JOIN hint to change JOIN type or JOIN order will boost or hinder performance is to give them a try and see what happens.
Often, when viewing a graphical query execution plan, you see an icon labeled "Bookmark Lookup." Bookmark lookups are quite common to see. Essentially, they are telling you that the Query Processor had to look up the row columns it needs from a heap or a clustered index, instead of being able to read it directly from a non-clustered index.
For example, if all of the columns in the SELECT, JOIN, and WHERE clauses of a query don't all exist in the non-clustered index used to locate the rows that meet the query's criteria, then the Query Optimizer has to do extra work and look at the heap or clustered index to find all the columns it needs to satisfy the query.
Another cause of a bookmark lookup is using SELECT *, which should never be used, as in most cases it will return more data that you really need.
Bookmark lookups are not ideal from a performance perspective because extra I/O is required to look up all the columns for the rows to be returned.
If you think that a bookmark lookup is hurting a query's performance, you have four potential options to avoid it. First, you can create a clustered index that will be used by the WHERE clause, you can take advantage of index intersection, you can create a covering non-clustered index, or you can (if you have SQL Server 2000/2005 Enterprise Edition, create an indexed view. If none of these are possible, or if using one of these will use more resources than using the bookmark lookup, then the bookmark lookup is the optimal choice.

In a graphical query execution plan, often you see the Stream Aggregate icon. This just means that some sort of aggregation into a single input is being performed. This is most commonly seen when a DISTINCT clause is used, or any aggregation operator, such as AVG, COUNT, MAX, MIN, or SUM.

April 23, 2009

Schtasks: To create, delete, query, change, run, and end scheduled tasks on a local or remote computer

Schtasks used to create, delete, query, change, run, and end scheduled tasks on a local or remote computer. Running Schtasks.exe without arguments displays the status and next run time for each registered task.

The following syntax is used to create a task on the local or remote computer.
schtasks /Create
[/S system [/U username [/P [password]]]]
[/RU username [/RP [password]] /SC schedule [/MO modifier] [/D day]
[/M months] [/I idletime] /TN taskname /TR taskrun [/ST starttime]
[/RI interval] [ {/ET endtime /DU duration} [/K]
[/XML xmlfile] [/V1]] [/SD startdate] [/ED enddate] [/IT] [/Z] [/F]

Deleting a Task
The following syntax is used to delete one or more scheduled tasks.
schtasks /Delete
[/S system [/U username [/P [password]]]]
[/TN taskname] [/F]

Running a Task
The following syntax is used to immediately run a scheduled task.
schtasks /Run
[/S system [/U username [/P [password]]]]
/TN taskname

Ending a Running Task
The following syntax is used to stop a running scheduled task.
Note To stop a remote task from running, ensure that the remote computer has the File and Printer Sharing and Remote Scheduled Tasks Management firewall exceptions enabled.
schtasks /End
[/S system [/U username [/P [password]]]]
/TN taskname

Querying for Task Information
The following syntax is used to display the scheduled tasks from the local or remote computer.
schtasks /Query
[/S system [/U username [/P [password]]]]
[/FO format /XML] [/NH] [/V] [/TN taskname] [/?]

Changing a Task
The following syntax is used to change how the program runs, or change the user account and password used by a scheduled task.
schtasks /Change
[/S system [/U username [/P [password]]]] /TN taskname
{ [/RU runasuser] [/RP runaspassword] [/TR taskrun] [/ST starttime]
[/RI interval] [ {/ET endtime /DU duration} [/K] ]
[/SD startdate] [/ED enddate] [/ENABLE /DISABLE] [/IT] [/Z] }
http://technet.microsoft.com/en-us/library/bb490996.aspx

April 15, 2009

Cascading deletes in SQL Server

SQL Server supports cascading deletes. SQL Server does so via foreign key constraints with the DELETE CASCADE flag set. In the following example, after creating the objects and inserting some data, we delete a USR_ID from the parent data. After querying the child table (TUSER_PHONE) a second time, we can see that the cascading delete worked :

CREATE TABLE TUSERS
(
USR_ID int CONSTRAINT [PK_Temp_Users1] PRIMARY KEY CLUSTERED ([USR_ID])
)

CREATE TABLE TUSER_PHONE
(
USR_ID int
,CONSTRAINT [PK_Temp_Users2] PRIMARY KEY CLUSTERED ([USR_ID])
)
ALTER TABLE [dbo].TUSER_PHONE WITH CHECK ADD
CONSTRAINT [FK_Temp_UsersPhone_Users] FOREIGN KEY([USR_ID])
REFERENCES [dbo].[TUsers] ([USR_ID])
ON DELETE CASCADE
GO
INSERT INTO TUSERS
SELECT 1 UNION SELECT 2 UNION SELECT 3
INSERT INTO TUSER_PHONE
SELECT 1 UNION SELECT 2 UNION SELECT 3
SELECT * FROM TUSER_PHONE
DELETE TUSERS WHERE USR_ID=2
SELECT * FROM TUSER_PHONE

How to find record in all tables in database

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- =============================================
-- Author: Avishkar
-- Create date: 04/15/2009
-- Description: It will find the particular word in all tables in all fields and gives the result
-- Execute as : Exec SearchStringInAllTables 'avi'
-- =============================================
ALTER PROCEDURE [dbo].[SearchStringInAllTables]
(
@SearchStr nvarchar(100)
)
AS
BEGIN
CREATE TABLE #Results (ColumnName nvarchar(370), ColumnValue nvarchar(3630))
SET NOCOUNT ON
DECLARE @TableName nvarchar(256), @ColumnName nvarchar(128), @SearchStr2 nvarchar(110)
SET @TableName = ''
SET @SearchStr2 = QUOTENAME('%' + @SearchStr + '%','''')
WHILE @TableName IS NOT NULL
BEGIN
SET @ColumnName = ''
SET @TableName =
(
SELECT MIN(QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME))
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
AND QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME) > @TableName
AND OBJECTPROPERTY(
OBJECT_ID(
QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
), 'IsMSShipped'
) = 0
)
WHILE (@TableName IS NOT NULL) AND (@ColumnName IS NOT NULL)
BEGIN
SET @ColumnName =
(
SELECT MIN(QUOTENAME(COLUMN_NAME))
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = PARSENAME(@TableName, 2)
AND TABLE_NAME = PARSENAME(@TableName, 1)
AND DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar')
AND QUOTENAME(COLUMN_NAME) > @ColumnName
)
IF @ColumnName IS NOT NULL
BEGIN
INSERT INTO #Results
EXEC
(
'SELECT ''' + @TableName + '.' + @ColumnName + ''', LEFT(' + @ColumnName + ', 3630)
FROM ' + @TableName + ' (NOLOCK) ' +
' WHERE ' + @ColumnName + ' LIKE ' + @SearchStr2
)
END
END
END
SELECT ColumnName, ColumnValue FROM #Results
END

SQL Server - Fragmentation

Even though SQL Server is very good at more or less automatically maintaining most statistics, physical data structuring and configurations etc (especially compared to some of it's competitors), there are still some situations where you need to perform a little manual work to help it. One of those areas is fragmentation.

Different types of fragmentation
Fragmentation occurs as data is modified. There are two major distinctions regarding fragmentation, disk fragmentation and SQL Server fragmentation.

Logical disk fragmentation:
The database files can be logically fragmented in the file system just like any other operating system files. This occurs when the file system cannot create the file in one logically contigous space. The result of this is that the disk head must move back and forth over the disk when reading from the file. To remove this type of fragmentation you can use the defragmentation tools that are included in Windows or use some third-party application. Remember though that SQL Server must be stopped during the defragmentation process because when it is started the database files are in use by SQL Server, even if no user is actually using that specific database, and files in use can not be defragged.

Avoiding fragmentation of database files:
To keep fragmentation in the database files at a minimum without defragging them you should avoid using the autogrow setting in SQL Server for them. Preallocate enough size for them from the start and when growth still is needed try to increase the size in large chunks at once. Also make sure you keep the database files on separate disks away from other files to make sure logical contigous space is available for them. Separating the data files from the log file is helpful and also important for performance in other ways, since the log file is written to sequentially.

SQL Server Index FragmentationThe other major type of fragmentation, and the one that is the most interesting to us, is SQL Server fragmentation. This means that the data stored inside the database files is fragmented on an index level basis, regardless of whether the database file is logically fragmented or not. The easiest way to describe it in a couple of words it to say that it prevents SQL Server from being able to use indexes in an optimal way. Index fragmentation can be either internal or external. Note that external fragmentation is not the same thing as logical disk fragmentation, though the name might imply it. Each of these types are described in detail below.
Internal fragmentation
Pages that have a lot of free space are said to be internally fragmented. This is caused by rows that are removed by DELETE statements or when pages are split and only filled to about half. Empty space on pages means there are less rows per page, which in turn means more page reads. Systems that are read-intensive can see a significant degrade in performance from these extra reads. Even for applications that are defined as OLTP (OnLine Transaction Processing) systems, i.e. write-intensive systems, the ratio between reads and writes is usually more than 80/20.
Sometimes necessary and wanted:
However, page splits are very expensive operations. Sometimes the tradeoff of extra page reads during read operations for fewer page splits during write operations is correct. For systems that really are write-intensive, or perhaps specific operations that generate many writes, this might be necessary. There is really no correct advice regarding this, as it is completely different between different applications and systems.

External fragmentation:
When tables grow as new data is inserted into them the ideal situation is that pages get allocated to an extent, then when that extent is filled up a new extent is allocated, then filled up with pages and so on. This way all the pages will always be located adjacent to eachother and readingeight pages will in a best case scenario only require reading one extent from disk.

Reality, however, is seldom ideal. New rows, when inserted, are bound to cause page splits sooner or later, as will updates that cause variable sized rows to grow too large to fit on the original page. When a new page is created from a page split it will be allocated in the same extent as the original page if there is room for it. If the extent is already full, which will normally be the case in a production system, a new extent will be allocated to the index and the new page will be placed there. The new page will not be contigous to the original page, so when following the page chain using the previous and next pointers in the page headers SQL Server will need to perform an extra extent switch. In a worst case scenario reading eight pages in order might require seven extent switches, and eight extents must be read from disk.

Ordered scans are affected:
When reading individual rows external fragmentation will not affect the performance at all, and the same goes for unordered scans. The reason for the latter is that unordered scans can use the IAM pages to find which extents need to be fetched in a very efficient way. It is only for scans that need to fetch the pages in logical order that the page chain is used and external fragmentation might become a degrading factor for performance.

Viewing fragmentation information:
To view how much fragmentation an index has you use DBCC SHOWCONTIG. It can show information for all indexes on a table or just a single index at a time. Using the option TABLERESULTS you get extra columns in the output that describe statistics about the index. By default DBCC SHOWCONTIG scans the page chain at the leaf level, but it is possible to scan all levels of an index.
When reviewing the output from DBCC SHOWCONTIG you should pay special attention to the following metrics:
· Avg. Page Density (full): Shows the average level of how filled the pages are. A percentage means the pages are almost full, and a low indicates much free space. This value should be compared to the fill factor setting specified when the index was created to decide whether or not the index is internally fragmented.
· Scan Density: Shows the ratio between the Best Count of extents that should be necessary to read when scanning all the pages of the index, and the Actual Count of extents that was read. This percentage should be as close to 100% as possible. Defining an acceptable level is difficult, but anything under 75% definitely indicates external fragmentation.
· Logical Scan Fragmentation: Shows the ratio of pages that are out of logical order. The value should be as close to 0% as possible and anything over 10% indicates external fragmentation.

Two things to know about DBCC SHOWCONTIG’ output and the numbers in Scan Denisty, Logical Scan Fragmentation and Extent Scan Fragmentation
1) If your indexes are small, then these values do not matter at all. Why? Because SQL Server has two types of extents: Uniform Extents and Mixed Extents. Uniform extents are extents that are owned by a single object where as mixed extents are used for more than one smaller objects. In the case of a uniform extent, all the 8 pages for the extent (an extent = 8 pages = 64KB) are allocated to one single object. That is why in smaller objects, you will see that the scan density will be low, extent scan fragmentation will be high since the object will be using say a single page in N extents. If the objects are large, then SQL Server will be using the uniform extents and you will see scan density closer to 100% if there is no fragmentation.
2) Another thing to note is that the extent scan fragmentation data that is shown in the output of DBCC SHOWCONTIG does not represent the true value if there are multiple files in the filegroup and the index belongs to that filegroup. This is actually also documented in BOL.

Defragging:
There are four different ways to defragment an index.

Drop and recreate index Pros: Completely rebuilds the index. Reorders and compacts the index pages, and removes unnecessary pages. Cons: Index is not available during process which means that queries will suffer. The process will block all queries accessing the index and also be blocked itself by other processes using the index, unable to do it’s work until those processes are finished. This option is especially bad for clustered indexes as dropping a clustered index means that all non-clustered indexes must be rebuilt (to use RID as bookmark), and then recreating the clustered index will once again force a rebuild of all non-clustered indexes.
CREATE INDEX ... WITH DROP_EXISTING Pros: Rebuilds index in one step which is good for clustered indexes, as it means that non-clustered indexes do not need to be rebuilt. Can be used to recreate (and thereby defragment) indexes created by constraints, if the index definition matches the requirements of the constraint. Cons: Potential blocking problems with other processes in the same way as for drop and recreate index above.
DBCC DBREINDEX Pros: Does the best job of removing internal and external fragmentation since it physically assigns new pages to the index and moves data to them. Has the possibility to rebuild all indexes on a table using one statement. Recreates indexes for constraints without forcing you to know their requirements. Cons: The same blocking issues as for the two above. Runs in a transaction so all work that has been done is rolled back if the command is stopped.
DBCC INDEXDEFRAG Pros: Reorders the leaf pages in index order – enhancing performance for scans especially – and compacts them using the setting specified for fill factor when the index was created. Empty pages are removed. This is an online operation, meaning it does not take long-term locks and thereby do not block other processes and do not get blocked itself by them. Work that has already been done will still be done if the command is cancelled. Cons: The fact that this is an online operation can also have a negative impact, as this means that DBCC INDEXDEFRAG will simply skip pages that are used by other processes. Very fragmented indexes will take a long time to defragment, so they should probably be rebuilt instead if possible. Generates a lot of log activity.

April 14, 2009

SQL Server 2005- Indexes

If you have a primary key set on that table then a clustered index will be created .The other columns which are widely used in the queries and which are used in the 'where' clause you can create non-clustered indexes on them.But the creation of indexes should be balanced and you need to optimise it by testing .More number of indexes results in slow inserts and also increases the maintenance costs.

You need info about different things before deciding on the indexes. Some of them are
Volume of data in the table (IMP),
Type of queries that are going to hit the table (IMP)
Purpose of the data , Connectivity , Concurrent connections, Server Hardware.....
There are other considerations also but offhand i can think of these.Of these the first two are most important and index design is influenced the most by them.
Clustered index keep the actual data in the leaf pages of the index in a sorted order. In many of the cases clustered index also functions as the Primary key/unique index and so you would want your most "selective" columns to be part of the clust index.

best practices are
* frequently used tables must have a clustered index
* no need create index on tables having little records or columns having duplicates (like country, gender etc) with exception if the columns are filtered on often it may be very beneficial to have an index on the column even if there are lots of duplicates. The index may need to be covering for SQL to use it.
* avoid duplicate index sets like (col1,col2, col3) and (col1, col3,col2)
In above situation, they must have the same columns in the same order. There are queries that could effectivly used the first of those indexes, but not the second and vis versa.
* create indexes on columns which are frequently used in WHERE and JOINS.guidelines for selecting clustered indexe key
* columns that contain large number of unique values
* queries that returns range of values.
* columns that are accessed sequentially
* queries that returns large resultset
* columns which are not updated frequently.guidelines for selecting nonclustered indexe key
* queries dont return large resultset
* columns frequently involved in WHERE having exact match

SQL Server 2005 Blocking

Database queries should be able to execute concurrently without errors and within acceptable wait times. When they don't, and when the queries behave correctly when executed in isolation, you will need to investigate the causes of the blocking. Generally, blocking is caused when a SQL Server process is waiting for a resource that another process has yet to release. These waits are most often caused by requests for locks on user resources. SQL Server 2005 has added some important new tools that adds to this toolkit. These tools include:Enhanced System Monitor counters (Perfmon) DMV's: sys.dm_os_wait_stats, sys.dm_os_waiting_tasks and sys.dm_tran_locks Blocked Process Report in SQL Trace SQLDiag Utility In System Monitor, the Processes Blocked counter in the SQLServer:General Statistics object shows the number of blocked processes. The Lock Waits counter from the SQLServer:Wait Statistics object can be added to determine the the count and duration of the waiting that is occurring. The Processes blocked counter gives an idea of the scale of the problem, but only provides a summary , so further drill-down is required. DMV's such as sys.dm_os_waiting_tasks and sys.dm_tran_locks give accurate and detailed blocking information.The sys.dm_os_waiting_tasks DMV returns a list of all waiting tasks, along with the blocking task if known. There are a number of advantages to using this DMV over the sp_who2 or the sysprocesses view for detecting blocking problems:The DMV only shows those processes that are waiting sys.dm_os_waiting_tasks returns information at the task level, which is more granular than the session level. Information about the blocker is also shown sys.dm_os_waiting_task returns the duration of the wait, enabling filtering to show only those waits that are long enough to cause concern The sys.dm_os_waiting_task DMV returns all waiting tasks, some of which may be unrelated to blocking and be due to I/O or memory contention. To refine your focus to only lock-based blocking, join it to the sys.dm_tran_locks DMV.

SQL SERVER - Isolation levels

Whenever a statement is executed, or a data modification is made, it runs under the influence of an isolation level. The isolation level directly impacts the performance of SQL Server, along with the integrity of the data.

Essentially, an isolation level determines the degree to which a data modification is isolated from another data modification, or a database modification is isolated from a statement (such as a SELECT statement), or how a statement is isolated from another statement.

More specifically, isolation levels are used to control the following:
If any locks are taken when data is read, and what kind of locks are being requested.
How long read locks are kept.
And last, they determine if a read operation acting on a row that is experiencing a data modification, (1) blocks the data modification until the row is unlocked, (2) retrieves the committed version of the row before the start of the data modification, or (3) reads the uncommitted data modified by the data modification.
Traditionally, SQL Server has supported four isolation levels:
Read Uncommitted: This is the lowest isolation level. It only isolates transactions and activities to ensure that physically corrupt data is never read. It allows dirty reads, nonrepeatable reads, and phantom reads.
Read Committed: This isolation level does not permit dirty reads, but does allow nonrepeatable reads and phantom reads. This is the default isolation level for SQL Server, and is used for each connection to SQL Server unless one of the other isolation levels has manually been set for a connection.
Repeatable Read: This isolation level does not permit dirty reads or nonrepeatable reads, but does allow phantom reads.
Serializable Read: This is the highest isolation level and ensures that all transactions and statements are completely isolated from each other. It does not allow dirty reads, nonrepeatable reads, or phantom reads.
Read Uncommitted provides the best concurrency, but the least data integrity. On the other extreme, Serializable Read provides the worst concurrency, but the highest level of data integrity. As you know, the higher the concurrency, the better SQL Server performs, and the lower the concurrency, the worse SQL Server performs. As a DBA, it is your job to balance the needs between data integrity and concurrency by selecting an appropriate isolation level.

In SQL Server 2005, two new isolation levels are introduced, both of which use row versioning. They include:
READ_COMMITTED_SNAPSHOT (statement level)
ALLOW_SNAPSHOT_ISOLATION (transaction level)
The purpose of these new isolation levels is to give DBAs or developers more granularity in their choice of isolation levels, with the goal of improving read concurrency. When read concurrency is increased, there are fewer locks to be managed by SQL Server. This results in less locking resources required by SQL Server, along with less blocking. This in turn boosts SQL Server's performance.
When a row versioning-based isolation level (which includes the two new ones we are now discussing) is enabled at the database level, the database engine maintains versions of each row that is modified (for an entire database). Whenever a transaction modifies any row, an image of the row before the modification is copied into a page of what is called the version store. The version store is located in the tempdb database and is used for temporary storage of versioned rows for all of the databases on a single SQL Server instance.

April 08, 2009

Deadlock

A deadlock is different to blocking, which is what I think you mean when you say that " ... The update SPs are fairly simple and should update fast enough to avoid a deadlock [Block] ..."
A block is not a deadlock, while a deadlock is a special kind of stalled block.A deadlock doesn't time out. SQL finds deadlocks, chooses a victim and kills it, allowing the other transaction to proceed - hopefully before it times out.
So what is a deadlock. It is a mutal thing, where generally two processes each have a resource, and want a resource that the other has (and like kids, wont budge!)
What causes deadlocks is insert/update (generally) code, that accesses the tables in different orders.
If all transactions update tables in the same order, then you cannot have deadlock. One transaction may need to wait for the other, but they won't both be waiting on each other.
To see a little more, run these tracesdbcc traceon( 3604 )dbcc traceon( 1204 )
You will need to do this each time the server restarts.
Now in the SQL Error Log, you have far more details from both processes.
C:\Program Fiiles\Microsoft SQL Server\MSSQL\LOG

Deadlocks aren't caused by people/systems accessing the same data at the same time. They're caused by situations where processes cannot resolve locks.
For example, process A has to lock table X, and then lock table Y, while process B has to lock table Y, and then table X. If A has a lock on X and wants a lock on Y, but B has a lock on Y and wants a lock on X, neither one can finish, thus a "deadlock" occurs, and SQL kills one of them.
The main way to solve that is to make sure that processes access and lock data in the same sequence as each other. No deadlocks that way.
I'd definitely analyze the deadlocks, find what's causing them, and fix that, rather than just taking the "nolock shotgun" and hoping you hit something with it. Using nolock as a solution for deadlocks is just asking for dirty reads and junked up data. Lots of people are taught to use nolock for this kind of thing, but aren't made aware of the fact that it can result in serious data problems.
For example, user A takes a look at row 3 of table X. Meanwhile, process B updates that row, but user A never sees the update, because of nolock and such. So user A updates the row, overwriting what process B did. Now, is that row of data correct? Process B's update is gone, as if it had never been done, and user A may have wanted a different update than the one he did, if he knew the data was being changed by process B. It can make a big mess.

March 13, 2009

How to deploy database in production server

Few ways ....
1) If you have all the scripts for all objects.. then create new db run the scripts.
2) Backup database and restore on to production
3) Deattach-reattach to production.

Exclusive lock on a particular table for select * from statement

select * from statement gets an exclusive lock on table which is having less than 10K rows . Activity monitor shows lock by processes for an objects named dbname.dbo.spt_fallback_db.
spt_fallback_db is in master and is a system table.
The issue get solved when reorganized the index in that table.
The optimizer will lock objects if it deems it will yield the best query performance. Out of date statistics and a fragmented index are a prime combo for this type of thing

March 04, 2009

Find Text in all columns of all tables in a Database

CREATE PROCEDURE GetColumnVal
@TEXT VARCHAR
AS
BEGIN
DECLARE @TABLES TABLE
([id] INT IDENTITY(1,1),
TableName VARCHAR(500),
ColumnName VARCHAR(500)
)
INSERT INTO @TABLES(TableName, ColumnName)
SELECT O.[NAME], C.[NAME]
FROM SYSOBJECTS O
JOIN SYSCOLUMNS C
ON C.ID = O.ID
WHERE O.XTYPE = 'U'
AND C.XTYPE NOT IN
(
127 --bigint
, 173 --binary
, 104 --bit
, 61 --datetime
, 106 --decimal
, 62 --float
, 34 --image
, 56 --int
, 60 --money
, 108 --numeric
, 59 --real
, 58 --smalldatetime
, 52 --smallint
, 122 --smallmoney
, 189 --timestamp
, 48 --tinyint
, 36 --uniqueidentifier
, 165 --varbinary
)
ORDER BY O.[NAME], C.[NAME]
IF EXISTS (SELECT NAME FROM TEMPDB.DBO.SYSOBJECTS WHERE NAME LIKE '#TMPREPORT%')
BEGIN
DROP TABLE #TMPREPORT
END
CREATE TABLE #TMPREPORT(COUNTER INT, TABLENAME VARCHAR(500), COLUMNNAME VARCHAR(500))
DECLARE @CNTR INT, @POS INT, @TableName VARCHAR(500), @ColumnName VARCHAR(500), @SQL VARCHAR(8000)
SELECT @POS = 1, @CNTR = MAX([ID]), @TableName = '', @ColumnName = ''
FROM @TABLES
--SELECT @POS, @CNTR, * FROM @TABLES
WHILE @POS <= @CNTR
BEGIN
SELECT @TableName = TableName, @ColumnName = ColumnName
FROM @TABLES
WHERE [ID] = @POS
SELECT @SQL = 'SELECT COUNT(*), ''' + @TABLENAME + ''' [TABLE],''' + @COLUMNNAME + '''[COLUMN] FROM ' + @TableName + ' WHERE CAST(' + @ColumnName + ' AS VARCHAR) LIKE ''%' + @TEXT + '%'''
--PRINT @SQL
BEGIN TRY
INSERT INTO #TMPREPORT(COUNTER, TABLENAME, COLUMNNAME)
EXEC(@SQL)
END TRY
BEGIN CATCH
PRINT @@ERROR
PRINT @SQL
END CATCH
SELECT @POS = @POS + 1
END
SELECT * FROM #TMPREPORT WHERE COUNTER > 0
DROP TABLE #TMPREPORT
End

March 03, 2009

Does the order of columns in an index matter?

A single column index is fairly straightforward to understand. You may have heard it compared to the index in the back of a technical book. When you want to find some information in the book, say DBCC INPUTBUFFER, you turn to the index in the back of the book and look up DBCC INPUTBUFFER. The index doesn't actually contain the information on DBCC INPUTBUFFER, it has a pointer to the page in the book where the command is described. So, what do you do? You turn to that page and read all about how DBCC INPUTBUFFER may be used. This is a good analogy for a single column non-clustered index.

Composite Indexes
In Microsoft SQL Server, you can also create an index that contains more than one column. These are known as composite indexes. A good analogy for a composite index is the telephone book.
A telephone book lists every individual in the area who has a publicly available telephone number. It's organized not by one column, but two: last name, first name. (Ignoring the middle initial that is sometimes listed but most often treated as an extension of the person's first name). If you wanted to look up someone in the telephone book, you'd first navigate to the last name and then the first name. For example to find Jake Smith, you'd first locate the Smiths, then within the Smiths, you'd find Jake. The same holds true for a composite SQL Server index.
This is all good to know, but how does it really affect query resolution?
Well, let's consider an example. Let's assume you have a Customers table as described below.CREATE TABLE Customers ( Customer_ID INT NOT NULL IDENTITY(1,1) ,Last_Name VARCHAR(20) NOT NULL ,First_Name VARCHAR(20) NOT NULL ,Email_Address VARCHAR(50) NULL )
It has a clustered index on Customer_ID and composite index on the Last_Name, First_Name columns as shown below.CREATE CLUSTERED INDEX ix_Customer_ID ON Customers(Customer_ID)CREATE INDEX ix_Customer_Name ON Customers(Last_Name, First_Name)
Finding a specific row
To find a specific row, we could execute the following query.SELECT *FROM CustomersWHERE Last_Name = 'smith' AND First_Name = 'Jake'
It should be pretty obvious that the ix_Customer_Name index would work well to satisfy this query. A quick look at the execution plan confirms our expectations.

Finding a last name
Now, let's broaden our search a bit to retrieve all customers whose last name is Smith. The following query may be executed for this. SELECT *FROM CustomersWHERE Last_Name = 'smith'
Looking at the query execution plan, we can see that SQL Server did indeed use the ix_Customer_Name composite index; it performed an index seek to find the rows that satisfied the query, then it used a Key Lookup to retrieve the non-indexed column information. You'll notice that this time, however, more work was expended in the Key Lookup than in the Index Seek.

Returning to our telephone book analogy, we can see why this index was deemed efficient by the Query Optimizer. To find all of the Smiths in the telephone book, we'd navigate to the page that contains the first Smith and keep moving forward until we found something other than Smith.
Finding a first name
Now, let's see what happens if we need to find all people who have a first name of Jake. Let's execute the following query.SELECT *FROM CustomersWHERE First_Name = 'Jake'
This yields the following query execution plan.

Notice that this time, SQL Server used a Clustered Index Scan to resolve the query. This is tantamount to a complete table scan. It did not use our ix_Customer_Name index.
Once again, returning to the telephone book example, we can see why. Think about find all of the Jakes in the telephone book. You'd have to start on the first page of the book and look through every entry. Why? Because it's not organized by first name; it's organized by last name, first name.
Does this mean that composite indexes are worthless? No. They have great value to SQL Server. In our first example we were able to use the ix_Customer_Name index to navigate directly to Smith, Jake. We just need to give considerable forethought to the kinds of queries our applications will be submitting to our server and create the appropriate indexes to handle those queries.
Cheers!

Nested Stored Procedure with Begin Transaction and Rollback

Nested Stored Procedure with Begin Transaction and Rollback: It is mentionned in the SQL Documentation that a Rollback is full rollback (all levels) are rolled back, in contrast to the commit that only commits the current level. you can't execute a rollback anywhere other than the outermost transaction if you want to avoid errors. BOL says:Naming multiple transactions in a series of nested transactions with a transaction name has little effect on the transaction. Only the first (outermost) transaction name is registered with the system. A rollback to any other name (other than a valid savepoint name) generates an error. None of the statements executed before the rollback are in fact rolled back at the time this error occurs. The statements are rolled back only when the outer transaction is rolled back.

When you use nested sp with transactions, that is you using nested transaction. When use nested transactions without name it, when you call rollback tran, it will rollback the outer most transaction. So it set @@TranCount to 0. You can try the code below to test:

begin tran
print @@trancount
begin tran
print @@trancount
begin tran
print @@trancount
commit tran
print @@trancount
begin tran
print @@trancount
rollback tran
print @@trancount

You can't rollback (but commit OK) nested (inner) transaction with name. Because only the first (outermost) transaction is registered with the system. A rollback to any other name (other than a valid savepoint name) generates an error.
Test this code:begin tran T1
print @@trancount
begin tran T2
print @@trancount
begin tran T3print @@trancount
commit tran T3 --OK
print @@trancount
rollback tran T2 --Error
print @@trancount
So, if you want to rollback at inner trans, use :
SAVE TRANSACTION MyName IF (@@error <> 0)BEGIN ROLLBACK TRANSACTION MyNameEND

January 30, 2009

Best Practices for Date/Time Calculations in SQL Server

Arithmetic Operations on Time Portion of Date/Time DataSo far this article has discussed the arithmetic operations on the date portion of data/time values. So when you try to run SELECT GETDATE() + 1, the second addend is implicitly considered the number of days that have to be added to the value returned by the GETDATE() function. But what if you want to add one hour or one minute to the current datetime value? Using the SQL Server DATEADD() function you can do that very easily, but using addition or substraction operators will be tricky.
Let's examine how to do that. One day consists of 24 hours or 1,440 minutes or 86,400 seconds. So if you run the following statement:
SELECT GETDATE() + 1/24
You probably will get the current datetime plus one hour. In reallity, however, you cannot use just 1/24 because both dividend and divisor are integers, and the result of dividing them will be zero. You need to convert the integers to decimal or float data types as follows in order to get the correct result:
-- How to add one hour
---------------------------------------
SELECT GETDATE()
SELECT DATEADD(hh, 1, GETDATE())
SELECT GETDATE () + CAST(1 AS dec(9,4))/ CAST (24 AS dec(9,4))
SELECT GETDATE () + CAST(1 AS dec)/ CAST (24 AS dec)
SELECT GETDATE () + CAST (1 AS float)/ CAST (24 AS float)
Results:
2007-03-25 20:31:13.870
2007-03-25 21:31:13.870
2007-03-25 21:31:13.867
2007-03-25 21:31:13.870
2007-03-25 21:31:13.870
-- How to add one minute
---------------------------------------
SELECT GETDATE()
SELECT DATEADD(mi, 1, GETDATE())
SELECT GETDATE () + CAST(1 AS dec(9,4))/ CAST (1440 AS dec(9,4))
SELECT GETDATE () + CAST(1 AS dec(18,9))/ CAST (1440 AS dec(18,9))
SELECT GETDATE () + CAST (1 AS float)/ CAST (1440 AS float)
Results:
2007-03-25 20:35:15.127
2007-03-25 20:36:15.127
2007-03-25 20:36:15.123
2007-03-25 20:36:15.127
2007-03-25 20:36:15.127
-- How to add one second
---------------------------------------
SELECT GETDATE()
SELECT DATEADD(ss, 1, GETDATE())
SELECT GETDATE () + CAST(1 AS dec(9,4))/ CAST (86400 AS dec(9,4))
SELECT GETDATE () + CAST(1 AS dec(18,9))/ CAST (86400 AS dec(18,9))
SELECT GETDATE () + CAST(1 AS dec(24,18))/ CAST (86400 AS dec(24,18))
SELECT GETDATE () + CAST (1 AS float)/ CAST (86400 AS float)
Results:
2007-03-25 20:42:26.617
2007-03-25 20:42:27.617
2007-03-25 20:42:27.613
2007-03-25 20:42:27.613
2007-03-25 20:42:27.613
2007-03-25 20:42:27.617
-- How to add one second, using variables
------------------------------------------
DECLARE @dec1 dec(24,18), @dec2 dec(24,18), @dec3 dec(24,18), @dt datetime
SELECT @dec1 = 1, @dec2 = 86400, @dt = GETDATE();
SELECT @dec3 = @dec1 / @dec2;
SELECT @dt
SELECT DATEADD(ss, 1, @dt)
SELECT @dt + @dec3
SELECT @dt + CAST (1 AS float)/ CAST (86400 AS float)
Results:
2007-03-25 20:49:16.817
2007-03-25 20:49:17.817
2007-03-25 20:49:17.813
2007-03-25 20:49:17.817
As you can see from the last example, the SQL Server function DATEADD() works perfectly, but an addition operator may cause a problem. For example, when you try to add one hour or one minute, you need to find a sufficient precision for decimal data type. Otherwise, the result will be slightly inaccurate. However, when you try to add one second, applying an addition operator and decimal conversion, you won't be able to get the exact result at all.
On the other hand, the float conversion looks precise and safe for the time calculations with an addition operator, but if you start to use it you may run into a problem: duplicates and missing values. To illustrate and understand the problem, create and populate an auxillary table:
SET NOCOUNT ON;
DECLARE @max int, @cnt int;
SELECT @cnt = 10000;
IF EXISTS(SELECT * FROM sysobjects
WHERE ID = (OBJECT_ID('sequence')) AND xtype = 'U')
DROP TABLE sequence;
CREATE TABLE sequence(num int NOT NULL);
INSERT INTO sequence VALUES(1);
SELECT @max = 1;
WHILE(@max <= @cnt)
BEGIN
INSERT INTO sequence
SELECT @max + num FROM sequence;
SELECT @max = MAX(num) FROM sequence;
END
When you run this script, it will insert 16,384 sequential numbers into the table sequence. (The number 16,384 doesn't have any special meaning. It was selected for illustration purposes only.)
Now, generate the sequence of hours using the auxillary table and SQL Server's date/time function as follows:
IF EXISTS(SELECT * FROM sysobjects
WHERE id = OBJECT_ID('test'))
DROP TABLE test;
SELECT num, DATEADD(hh, num, 'Dec 31, 2006 23:00:00') dt
INTO test
FROM sequence;
SELECT * FROM test;
Results:
num dt
----------- -----------------------
1 2007-01-01 00:00:00.000
2 2007-01-01 01:00:00.000
3 2007-01-01 02:00:00.000
. . . . . . . . . . . . . . . . . .
3099 2007-05-10 02:00:00.000
3100 2007-05-10 03:00:00.000
. . . . . . . . . . . . . . . . . .
16381 2008-11-13 12:00:00.000
16382 2008-11-13 13:00:00.000
16383 2008-11-13 14:00:00.000
16384 2008-11-13 15:00:00.000
The function DATEADD() works as expected and generates a sequence of datetime values with one-hour intervals.
Now, try to roll up the sequence you just generated using the same the SQL Server DATEADD() function:
SELECT DISTINCT DATEADD(hh, -num, dt) FROM test
Results:
2006-12-31 23:00:00.000
The last result proves that SQL Server's date/time function generates date/time values properly. In order to test the solution that uses an arithmetic operator (+), run the following example:
DECLARE @time float
SELECT @time = CAST(1 as float)/CAST(24 as float)
IF EXISTS(SELECT * FROM sysobjects
WHERE id = OBJECT_ID('test'))
DROP TABLE test;
SELECT num, (CAST('Dec 31, 2006 23:00:00' AS datetime) + @time * num) dt
INTO test
FROM sequence;
SELECT * FROM test;
num dt
----------- -----------------------
1 2007-01-01 00:00:00.000
2 2007-01-01 01:00:00.000
3 2007-01-01 02:00:00.000
4 2007-01-01 03:00:00.000
5 2007-01-01 03:59:59.997
6 2007-01-01 05:00:00.000
7 2007-01-01 05:59:59.997
8 2007-01-01 07:00:00.000
9 2007-01-01 08:00:00.000
10 2007-01-01 08:59:59.997
. . . . . . . . . . . . . . . . . .
16380 2008-11-13 11:00:00.000
16381 2008-11-13 11:59:59.997
16382 2008-11-13 12:59:59.997
16383 2008-11-13 14:00:00.000
16384 2008-11-13 14:59:59.997
You will find that an addition operator produces inaccurate results. Sometimes they differ from the expected ones by 3 ms. If you try to roll up the sequence of generated date/time values, you will get more than one date/time "seed" value (as in the following example) and that is incorrect:
SELECT DISTINCT DATEADD(hh, -num, dt) FROM test
Results:
2006-12-31 22:59:59.997
2006-12-31 23:00:00.000
You may say that +/- 3 ms precision is acceptable for most applications, but look how that seemingly tiny problem can produce a bigger one:
SELECT CONVERT(varchar(100), dt, 100)
FROM test
ORDER BY num
Results:
Jan 1 2007 12:00AM
Jan 1 2007 1:00AM
Jan 1 2007 2:00AM
Jan 1 2007 3:00AM
Jan 1 2007 3:59AM
Jan 1 2007 5:00AM
Jan 1 2007 5:59AM
. . . . . . . . . .
Nov 13 2008 5:59AM
Nov 13 2008 6:59AM
Nov 13 2008 8:00AM
Nov 13 2008 8:59AM
Nov 13 2008 9:59AM
Nov 13 2008 11:00AM
Nov 13 2008 11:59AM
Nov 13 2008 12:59PM
Nov 13 2008 2:00PM
Nov 13 2008 2:59PM
This example uses a CONVERT() function to produce a different date/time format. As a result, inaccuracies in the generated values increased from 3 ms to 1 minute and became unacceptable. However, this is not the only problem. If you try to generate the sequences of minutes or seconds, things become even worse. Look at this example:
DECLARE @time float
SELECT @time = cast(1 as float)/cast(1440 as float)
IF EXISTS(SELECT * FROM sysobjects
WHERE id = OBJECT_ID('test'))
DROP TABLE test;
SELECT num, (CAST('Dec 31, 2006 23:59:00' AS datetime) + @time * num) dt
INTO test
FROM sequence;
SELECT * FROM test;
Results:
num dt
----------- -----------------------
1 2007-01-01 00:00:00.000
2 2007-01-01 00:01:00.000
. . . . . . . . . . . . . . . . . .
1579 2007-01-02 02:17:59.997
1580 2007-01-02 02:19:00.000
1581 2007-01-02 02:19:59.997
. . . . . . . . . . . . . . . . . .
16382 2007-01-12 09:01:00.000
16383 2007-01-12 09:01:59.997
16384 2007-01-12 09:03:00.000
As you can see, there are inaccuracies in the generated values again. In addition, when you try to convert these values to another format as follows, you will get duplicated or missing dates:
SELECT CONVERT(varchar(100), dt, 100)
FROM test
ORDER BY num
Results:
Jan 1 2007 12:00AM
Jan 1 2007 12:01AM
. . . . . . . . . .
Jan 2 2007 12:00AM
Jan 2 2007 12:00AM
Jan 2 2007 12:02AM
Jan 2 2007 12:02AM
Jan 2 2007 12:04AM
Jan 2 2007 12:04AM
. . . . . . . . . .
Jan 12 2007 9:00AM
Jan 12 2007 9:01AM
Jan 12 2007 9:01AM
Jan 12 2007 9:03AM
For instance, there are two values "Jan 02, 2007 12:02AM", but the value "Jan 02, 2007 12:03AM" is missing.
If you want to see the list of all duplicates, you can run the following query:
SELECT COUNT(*), CONVERT(varchar(100), dt, 100)
FROM test
GROUP BY CONVERT(varchar(100), dt, 100)
HAVING COUNT(*) > 1
ORDER BY 2
Finally, you can generate the sequence of seconds using the same approach:
DECLARE @time float
SELECT @time = cast(1 as float)/cast(86400 as float)
IF EXISTS(SELECT * FROM sysobjects
WHERE id = OBJECT_ID('test'))
DROP TABLE test;
SELECT num, (CAST('Dec 31, 2006 23:59:59' AS datetime) + @time * num) dt
INTO test
FROM sequence;
SELECT * FROM test;
Results:
num dt
----------- -----------------------
1 2007-01-01 00:00:00.000
2 2007-01-01 00:00:01.000
3 2007-01-01 00:00:02.000
4 2007-01-01 00:00:03.000
5 2007-01-01 00:00:03.997
6 2007-01-01 00:00:05.000
7 2007-01-01 00:00:06.000
8 2007-01-01 00:00:07.000
9 2007-01-01 00:00:08.000
10 2007-01-01 00:00:08.997
11 2007-01-01 00:00:09.997
. . . . . . . . . . . . . . . . . .
16382 2007-01-01 04:33:00.997
16383 2007-01-01 04:33:02.000
16384 2007-01-01 04:33:03.000
SELECT CONVERT(varchar(100), dt, 120)
FROM test
ORDER BY num
Results:
2007-01-01 00:00:00
2007-01-01 00:00:01
2007-01-01 00:00:02
2007-01-01 00:00:03
2007-01-01 00:00:03
2007-01-01 00:00:05
. . . . . . . . . .
2007-01-01 04:32:55
2007-01-01 04:32:56
2007-01-01 04:32:56
2007-01-01 04:32:58
2007-01-01 04:32:59
2007-01-01 04:33:00
2007-01-01 04:33:00
2007-01-01 04:33:02
2007-01-01 04:33:03
This last example has the same problems as the previous one. In addition, using arithmetic operators for date/time manipulations can lead to other errors, weird results, degradation in performance, and more problems than are discussed here. You can avoid all these problems by using the SQL Server's date/time functions.