Data Engineering with Avishkar: 2010

October 22, 2010

Databse design question

CREATE TABLE [dbo].[CustomerMaster]([CustomerId] [int] IDENTITY(1,1) NOT NULL,[CustomerCode] [varchar](30),[CustomerName] [varchar](200),[CreatedDate] [datetime],[ContactNo] [varchar](20) CONSTRAINT [PK_CustomerId] PRIMARY KEY CLUSTERED ([CustomerId] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CustomerDocuments]([DocAttachmentId] [int] IDENTITY(1,1) NOT NULL,[CustomerId] [int] NOT NULL,[DocumentType] [varchar](5),[DocumentName] [varchar](200) NULL,CONSTRAINT [PK_CustomerDocs] PRIMARY KEY CLUSTERED ([DocAttachmentId] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]

ALTER TABLE [dbo].[CustomerDocuments] WITH CHECK ADD CONSTRAINT [FK_DocAttachment_Customer] FOREIGN KEY([CustomerId])REFERENCES [dbo].[CustomerMaster] ([CustomerId])GO
ALTER TABLE [dbo].[CustomerDocuments] CHECK CONSTRAINT [FK_DocAttachment_Customer]GO
Suppose there are 20 rows in customer master and 2 documents for each customer in CustomerDocuments table,what will be the output after the following statement,
TRUNCATE TABLE CustomerDocuments
TRUNCATE TABLE CustomerMaster
SELECT IDENT_CURRENT( 'CustomerMaster' )

Answer: Error while truncating
Explanation: TRUNCATE Statement cannot be executed on tables with Foreign key references. AS the CustomerMaster table is referenced by CustomerDocuments table the statement will fail to execute.

TRUNCATE TABLE CustomerDocuments -- Completed successfully
TRUNCATE TABLE CustomerMaster -- Cannot truncate table 'CustomerMaster' because it is being referenced by a FOREIGN KEY constraint.
SELECT IDENT_CURRENT( 'CustomerMaster' ) --1

October 13, 2010

Best practises for being a good DBA.

1.Study, Understand and grasp firmly the database concepts and architecture.
2.Begin with small project and practice, practice, practice. small projects are good to lear the basics.
3.A good DBA means well experienced DBA. A DBA becomes experienced as the time passes. So first of all you need time. It is in your hands to make shorter this time. Databases shows their dirty sides with the large amount of data and under heavy pressure. So if you want to become a good DBA in a short time :
-Try to read a lot (a lot means realy a lot). Mostly read best practises.
-Examine big queries and their effects on the database.
-Try to work on a database which have a large amount of data with a heavy usage.

October 05, 2010

Windows Task Scheduler

By using Task Scheduler, you can schedule tasks such as to automate web testing and site monitoring or system tools like the Disk Defragmenter to run at a time that is most convenient for you. Task Scheduler starts each time you start Windows, and runs in the background. With Task Scheduler, you can: Schedule a task to run daily, weekly, monthly, or at certain times (such as system startup).

On Windows XP and Server 2003 you can access this from the Start Menu and clicking on Settings and then Control Panel to Scheduled Tasks.
Windows administrators can easily manage Scheduled Tasks using command line utility named schtasks.exe which is available in Windows XP/2003/7/2008/2008 R2 operating systems.

Create a scheduled task to run notepad at 8:00AM every day:

C:\>schtasks /create /TN "Run notepad" /TR notepad.exe /SC DAILY /ST 08:00:00

Note: Here in above example, we are mentioning any user account. But the command execution will try to configure scheduled account with current logged on user and prompts you to enter the password.
In case, you want to create a scheduled task with specific user account, pass them as command arguments.

C:\>schtasks /create /TN "Run notepad" /TR notepad.exe /SC DAILY /ST 08:00:00 /RU domain\username /RP xxxxxx

We are done with creating scheduled tasks. Now, let’s see how to list the Scheduled Tasks running in local system as well as remote

Query list of scheduled tasks running in local computer.

C:\>schtasks /query

This lists all the scheduled tasks running in local computer.

e.g.

C:\>schtasks /query

TaskName Next Run Time Status
==================================== ======================== ===============
GoogleUpdateTaskMachineCore At logon time
GoogleUpdateTaskMachineCore Never
GoogleUpdateTaskMachineUA Never

Run a scheduled task manually:

If we want to run a scheduled task manually irrespective of it’s schedule, we can do it easily like below.

C:\>schtasks /run /TN “Run notepad”

This triggers the command which is configured in scheduled task(notepad in our example).

Modifying scheduled task:

Now we will see how to modify scheduled tasks. We will be able to modify only few parameters using this utility like, program to run, username, password used by scheduled task. Let’s see how we can change the program to run.

C:\> schtasks /change /TN "Run notepad" /TR cmd.exe

In this example, we have changed the program to run from notepad.exe to cmd.exe

Deleting scheduled tasks:

Below are the few examples which describes usage of delete option.

Examples:
SCHTASKS /Delete /TN * /F
SCHTASKS /Delete /TN "Backup and Restore"

As you can see in above example, /F option forces the delete. This is generally required if a scheduled task is not responding or in running state.

Stop a running scheduled task:

If a scheduled job is running for long time and want to stop it, we can use /END parameter. Now lets see a example how to end a scheduled task

C:\>schtasks /END /TN "Run notepad"

If you want to perform any of the above operations on a remote system, use the parameter /S in the command. For example, to query list of tasks scheduled in remote machine use the below command.

C:\>schtasks /query /S remotehost

e.g.

C:\>schtasks /query /s Remoteservername

TaskName Next Run Time Status
==================================== ======================== ===============
backup 02:00:00, 10/6/2010
DeleteSQLBackup7old 23:00:00, 10/10/2010
SQLVesrionControl Never
SQLVesrionControl 23:00:00, 10/5/2010
VSSToDev2000 Never

September 28, 2010

Security permission to see other session's information

In order to query to see other session information, a user needs specific permissions granted. To view the server-wide DMVs the user must be granted the VIEW SERVER STATE on the server. After running the following query as administrator

To query database specific DMVs a user must be granted the VIEW DATABASE STATE permission in each specific database.

If you want to deny a user permission to query certain DMVs you can use the DENY command and reference those specific views. And remember it's always better to grant permission to roles instead of individual users.

August 28, 2010

Index guidelines

Clustered index guidelines:

Good when queries select large number of adjacent rows (range queries)
Create on the frequently used columns (in JOINs and WHERE with “=“,

“<“, “>“, “BETWEEN”)
If number of returned rows is small – non-clustered index may be as

efficient
Preferred on narrow and highly selective columns..

Remember cost of maintenance:
Updates reorganize the table
Performance impact
Causes index fragmentation over time

Non-clustered index guidelines:

Create for frequent search columns
Use on narrow and highly selective columns
Place on foreign key constraints (for join queries)
Check the workload for “covering” queries
Consider adding included columns

The drawback: maintenance cost
Frequent updates will ruin perf where there are too many indexes
Evaluate benefits of [not] indexing small tables

July 13, 2010

4 W's of Production DBA

Why My server is hung?
A SQL Report outage. Why did this happen?
Who is responsible for outage?
What steps you will do to resolve it ASAP?
What steps you need to do so that it should not happen again?
When issue will resolved?

Most of the production DBAs daily face one of these 4 W's Why,What,Who,When.

Below is a simple guide to tackle these 4 W's :

1. Identify Factors Affecting Performance

Application
Database Design
Microsoft SQL Server
Operating System
Hardware

Now you got answer of Who

2. Resolve Performance bottleneck:

SQL Coding Issues:
Excessive Scanning
Poor use of indexes/missing Indexes
Out of Date Table Statistics
‘Where’ clause not limiting record set enough
Excessive Recompiles
Long blocking durations (multi-connections)

Where:
Limit number of Columns
Avoid <>, OR, and not
Indexed fields
Avoid calculated columns
Avoid functions
Avoid redundant criteria

Joins
Avoid Join Hints
Always Join on indexed columns
Use selectivity from table with fewest rows
Avoid Bookmark lookups

Nested Loop Join
Common When One Table is VerySmall (Candidate outer table) and Other is Large and Indexed on Join Columns (Candidate inner table)

Merge Join
Method for Joins with Sorted Inputs, Usually Inputs are Sequenced on Join Key
Common When Both Tables are Indexed on join key ,Common When Join Sequence Needs to be Preserved on Output.

Hash Join
Join of Disproportionately Sized Tables
Does Not Require Tables to beProcessed in Join Column Sequence
Does Not Produce Ordered Result
Highly Applicable to Ad-hoc Queries
Indexes Can’t be Anticipated

Hardware Issues:

Typical CPU scaling issues
Symptoms:

Plan compilation and recompilations
Plan reuse < 90% is bad
Parallel queries
Parallel wait type cxpacket > 10% of total waits
High runnable tasks or sos_scheduler_yield waits

Causes:

Queries not parameterized
Inefficient Query plan
Not enough stored procedures
MAXDOP is not set to 1
Statistics not updated
Table scan, range scan
SET option changes within SP

Typical IO Scaling Issues:

Symptoms:
High average disk seconds per read (> 10 msec) and write (> 2 msec)

for spindle based devices
Top 2 values for wait stats are one of - ASYNCH_IO_COMPLETION,

IO_COMPLETION, LOGMGR, WRITELOG, PAGEIOLATCH_x

Causes:

Aggravated by Big IOs such as table scans (bad query plans)
Non covering indexes
Sharing of storage backend – combine OLTP and DW workloads
TempDB bottleneck
Too few spindles, HBA’s

Typical Blocking Issues

Symptoms:

High average row lock or latch waits
Will show up in
sp_configure “blocked process threshold” and Profiler “Blocked process

Report”
Top wait statistics are LCK_x. See sys.dm_os_wait_stats.

Causes:
Higher isolation levels
Index contention
Lock escalation
Slow I/O
Sequence number problem

Typical Memory Issues
Symptoms:

Page life expectancy < 300 secs
SQL Cache hit ratio < 99%
Lazy writes/sec constantly active
Out of memory errors

Causes:

Too many large scans (I/O)
Bad query plans
External (other process) pressure

July 07, 2010

Temporary Tables

Temporary Tables

# tells SQL Server that this table is a local temporary table. This table is only visible to this session of SQL Server. When we close this session, the table will be automatically dropped. They are backed by physical disk and are even logged into the transaction log. They act like regular tables in that you can query their data via SELECT queries and modify their data via UPDATE, INSERT, and DELETE statements.

CREATE TABLE #TempTable (
Id int,
Desc char(30) )

select name
from tempdb..sysobjects
where name like '#TempTable%'

name
------------------------------------------------------------------------------------
#TempTable_________________________ . . .________00000000001D

Temporary tables act like physical tables in many ways. You can create indexes and statistics on temporary tables. You can also apply Data Definition Language (DDL) statements against temporary tables to add primary keys, DEFAULTs . You can also add and drop columns from temporary tables.

ALTER TABLE #TempTable ADD CONSTRAINT [DF_Desc] DEFAULT ('Test') FOR [Desc]

ALTER TABLE #TempTable add PID int DEFAULT (10) not null

ALTER TABLE #TempTable ADD PRIMARY KEY CLUSTERED ([PID])
GO

CREATE nonclustered index Ix_desc on #TempTable([Desc])

SQL Server stores the object with a some type of unique number appended on the end of the name. It does all this for you automatically. You just have to refer to #TempTable in your code.

If two different users both create a #TempTable table each will have their own copy of it. The exact same code will run properly on both connections. Any temporary table created inside a stored procedure is automatically dropped when the stored procedure finishes executing. If stored procedure P1 creates a temporary table and calls stored procedure P2, then P2 will be able to use the temporary table that P1 created. It's generally considered good coding practice to explicitly drop every temporary table you create. If you are running scripts through SQL Server Management Studio or Query Analyzer the temporary tables are kept until you explicitly drop them or until you close the session.

Local temporary tables are visible only in the current session... ... Temporary tables are automatically dropped when they go out of scope, unless explicitly dropped using DROP TABLE.

Scopes/Sessions are an interesting thing in sql server.
They follow the rule of inheritance, same as transactions:
A child has all parents' "objects" plus it's own.
So what this means is that you can access a #temp table created
in the parent process from the child of that parent,
but not vice versa.
Any dynamic SQL execution with either exec() or sp_executeSQL is ran in
a child session/scope of the current one.

To check this, pl run below set of queries.

Set1:

exec('select * into #TempTable2 from #TempTable; select * from #TempTable2')
select * from #TempTable2
drop table #TempTable2

Set2:

select * into #TempTable2 from #TempTable
exec('select * from #TempTable2')
select * from #TempTable2
drop table #TempTable2

You can treat this table just like any other table with a few exceptions. The only real major one is that you can't have foreign key constraints on a temporary table.

Global Temporary Tables

You can also create global temporary tables. These are named with two pound signs. For example, ##TempTable is a global temporary table. Global temporary tables are visible to all SQL Server connections. When you create one of these, all the users can see it.
Global temporary tables are automatically dropped when the session that created the table ends and all other tasks have stopped referencing them. The association between a task and a table is maintained only for the life of a single Transact-SQL statement. This means that a global temporary table is dropped at the completion of the last Transact-SQL statement that was actively referencing the table when the creating session ended.

Table variables are similar to temporary tables except with more flexibility and they always stay in memory.Table variables don't need to be dropped when you are done with them.

June 24, 2010

Generally we backed up tables and SPs for safer side with date suffix.
Below is cleaup script for those objects which might not be used after successfull deployment.

-- To drop all tables suffix with dates ( backup unused tables)

SELECT 'DROP TABLE '+s.name+'.'+o.name
FROM sys.schemas s
JOIN sys.objects o on s.schema_id=o.schema_id
AND o.type='U'
WHERE SUBSTRING(o.name,PATINDEX('%[^' + 'a-z._' + ']%',o.name),1) !=''

-- To drop all procedures suffix with dates

SELECT 'drop procedure '+s.name+'.'+o.name
FROM sys.schemas s
JOIN sys.objects o on s.schema_id=o.schema_id
AND o.type='P'
WHERE SUBSTRING(o.name,PATINDEX('%[^' + 'a-z._' + ']%',o.name),1) !=''

June 17, 2010

The difference between estimated and actual plans

Optimization takes place before execution, so in one sense, any query plan is an estimated plan. But when you request an actual plan, SQL Server will actually execute each statement as it displays the plan, so the plan for subsequent statements can change after the earlier statements are executed. In addition, SQL Server adds additional information to the actual plan, after the execution takes place. As mentioned last time, the actual plan includes the number of processors used to execute the query. It will also include the number of rows returned from each step of the plan as it is executed, and the number of times each step is executed. If you are executing a plan that has already been cached and is now being reused, the actual plan will include both the parameter values the plan was originally compiled with and the parameter values used during execution.

i) You cannot use estimated plans for temp tables, you will get an error because the plan for the SELECT cannot be generated when the temp table does not exist

ii) One of the main reasons that the actual plan may different from the estimated plan is because of data changes to your data. In general (and assuming you have the option 'auto update statistics' enabled) if more than 20% of the data in a table changes, the optimizer will detect stale statistics and update them automatically. The updated statistics will then trigger a recompile.

iii) parameterised stored procedure - The first time you run the procedure, a estimated plan will be built based on the first parameter. The second time you run the procedure, the original plan will be used, and the statistics should show that SQL Server is performing more reads than there are pages in the table.
Seeing the different values used for compilation and execution can lead you to suspect an issue with parameter sniffing.

You can request that SQL Server display ESTIMATED plans without executing the query with any of the following options:

SET SHOWPLAN_TEXT ON

SET SHOWPLAN_ALL ON

SET SHOWPLAN_XML ON

For graphical estimated plans, you can use the Query | Display Estimated Execution Plan menu option. This can be invoked with a toolbar button, or with Cntl-L.

You can request that SQL Server display actual plans with any of the following options:

SET STATISTICS PROFILE ON

SET STATISTICS XML ON

June 16, 2010

Indexes in one-to-many relationship between two tables

Whenever you have a one-to-many relationship between two tables, you'll have a foreign key reference in the table on the many side of the relationship.

In most database systems, the primary key is automatically indexed; that's not always the case with the foreign key.

To ensure best performance on joins, make sure that every foreign key in the database is indexed. The next candidate for indexing is any column that will be used for sorting — any column that will consistently be used in an ―order by‖ clause of a SQL query.

You'll also want to index columns that will be used for restricting the returned data set, such as those that consistently appear in ―where clauses.

June 08, 2010

SQL Server 2008 HIERARCHYID Data Type

Very often while a designing database you may need to create a table structure to store hierarchical data (Hierarchal data is defined as a set of data items related to one another in a hierarchy, that is, a parent node has a child and so forth). For example,
• To represent organizational hierarchy
• To represent a family tree
• To store BOM (Bills of Material) information which represents collection of sub-components which make up a final component.
Organizations have struggled in past with the representation of tree like structures in the databases, lot of joins lots of complex logic goes into the place.

Until SQL Server 2005, we had to create a self-referencing table to store hierarchical information, which gradually became more difficult and tedious to work with when hierarchical level grew to more than 2 or 3levels. The more levels, the more difficult to query as it required lot of joins, complex recursive logic. Though with the introduction of CTEs (Common Table Expression) in SQL Server 2005, we were not required to write iterative logic anymore to get a node (i.e. a Manager) and all its child nodes (i.e. all subordinates), but the performance of the CTE method is not that good.

Now the question is given a manager (ReportsTo) ID, get all employees who are directly or indirectly reporting to him, let's see how we can do this in different versions of SQL Server.

--SQL Server 2005
--Get all employees who directly/indirectly report to EmpID = 2
WITH EmployeeHierarchy(EmpID, EmpName, Manager, [Level])
AS
(
SELECT emp.EmpID, emp.EmpName, (SELECT EmpName FROM Employees_2000_2005 WHERE EmpID = emp.ReportsTo) , 1 AS [Level]
FROM Employees_2000_2005 AS emp
--Get all employees who directly/indirectly report to EmpID = 2
WHERE emp.EmpID = 2
UNION ALL
SELECT emp.EmpID, emp.EmpName, Parent.EmpName, Level+1
FROM Employees_2000_2005 AS emp
INNER JOIN EmployeeHierarchy AS Parent ON emp.ReportsTo = parent.EmpID
)
SELECT *
FROM EmployeeHierarchy

--Get all employees who directly/indirectly report to EmpID = 4
WITH EmployeeHierarchy(EmpID, EmpName, Manager, [Level])
AS
(
SELECT emp.EmpID, emp.EmpName, (SELECT EmpName FROM Employees_2000_2005 WHERE EmpID = emp.ReportsTo) , 1 AS [Level]
FROM Employees_2000_2005 AS emp
--Get all employees who directly/indirectly report to EmpID = 4
WHERE emp.EmpID = 4
UNION ALL
SELECT emp.EmpID, emp.EmpName, Parent.EmpName, Level+1
FROM Employees_2000_2005 AS emp
INNER JOIN EmployeeHierarchy AS Parent ON emp.ReportsTo = parent.EmpID
)
SELECT *
FROM EmployeeHierarchy

Even though SQL Server 2005, using CTE, simplified the process of retrieving hierarchical data from the SQL Server but still it has performance penalty. Not only this, the major drawback was still existent, I mean it was a nightmare to move child/sub-child records from one parent to another because of some organizational restructuring/realignment. Luckily with HIERARCHYID in SQL Server 2008 we would not be having those problems anymore. So now let's turn our attention to learn about this new system data type in details.

The new HIERARCHYID data type has been introduced to SQL Server 2008.
The following are the methods in the SQL Server 2008 Database Engine to support HierarchyID data type.

GetAncestor()

This method is useful to find the (nth ancestor of the given child node.

Syntax: child.GetAncestor ( n )

GetDescendant()

This method is very useful to get the descendant of a given node. It has a great significance in terms of finding the new descendant position get the descendants etc.

Syntax: parent.GetDescendant ( child1 , child2 )

This function returns one child node that is a descendant of the parent.

1.If parent is NULL, returns NULL.
2.If parent is not NULL, and both child1 and child2 are NULL, returns a child of parent.
3.If parent and child1 are not NULL, and child2 is NULL, returns a child of parent greater than child1.
4.If parent and child2 are not NULL and child1 is NULL, returns a child of parent less than child2.
5.If parent, child1, and child2 are all not NULL, returns a child of parent greater than child1 and less than child2.
6.If child1 or child2 is not NULL but is not a child of parent, an exception is raised.
7.If child1 >= child2, an exception is raised.

GetLevel()

This method is useful to find the Level of the current node.

Syntax: node.GetLevel ( )

This function will return an integer that represents the depth of this node in the current tree.

GetRoot()

This method will return the root of the hierarchy tree and this is a static method if you are using it within CLR.

Syntax: hierarchyid::GetRoot ( )

It will return the data type hierarchyID.

IsDescendant()

This method returns true/false (BIT) if the node is a descendant of the parent.

Syntax: parent.IsDescendant ( child )

Parse()

Parse converts the canonical string representation of a hierarchyid to a hierarchyid value. Parse is called implicitly when a conversion from a string type to hierarchyid occurs. Acts as the opposite of ToString(). Parse() is a static method.

Syntax: hierarchyid::Parse ( input )

Read()

Read reads binary representation of SqlHierarchyId from the passed-in BinaryReader and sets the SqlHierarchyId object to that value. Read cannot be called by using Transact-SQL. Use CAST or CONVERT instead.

Syntax: void Read( BinaryReader r )

Reparent()

This is a very useful method which helps you to reparent a node i.e. suppose if we want to align an existing node to a new parent or any other existing parent then this method is very useful.

Syntax: node.Reparent ( oldRoot, newRoot )

ToString()

This method is useful to get the string representation of the HierarchyID. The method returns a string that is a nvarchar(4000) data type.

Syntax: node.ToString ( )

Write()

Write writes out a binary representation of SqlHierarchyId to the passed-in BinaryWriter. Write cannot be called by using Transact-SQL. Use CAST or CONVERT instead.

Syntax: void Write( BinaryWriter w )

--First I will create a table which uses new data type
CREATE TABLE Employees_Ver2008
(
EmpID INT PRIMARY KEY NOT NULL IDENTITY,
EmpName VARCHAR(255) NOT NULL,
Title VARCHAR(255) NOT NULL,
HierarchicalLevel HIERARCHYID --Note the new data type here
)
GO
--Insert the Root node first in the table
--HIERARCHYID::GetRoot() is static method which returns the root node of a hierarchy
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES ('CTO', 'Chief Technology Officer', HIERARCHYID::GetRoot())
GO
--Let's see the data in the table
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString() AS [Position]
FROM Employees_Ver2008
GO

--Insert the first child node of the root node
--Get the root node we wish to insert a descendant of
DECLARE @CTONode HIERARCHYID
SELECT @CTONode = HIERARCHYID::GetRoot() FROM Employees_Ver2008
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES ('Manager_Dev1', 'Managar -Dev', @CTONode.GetDescendant(NULL, NULL))
GO

--Now let's insert the second child node of the root node
--Get the root node we wish to insert a descendant of
DECLARE @CTONode HIERARCHYID
SELECT @CTONode = HIERARCHYID::GetRoot() FROM Employees_Ver2008
-- Determine the last child position
DECLARE @LastChildPosition HIERARCHYID
SELECT @LastChildPosition = MAX(HierarchicalLevel) FROM Employees_Ver2008
WHERE HierarchicalLevel.GetAncestor(1) = @CTONode
INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES ('Manager_Test1', 'Manager- Test', @CTONode.GetDescendant(@LastChildPosition, NULL))
GO

--Let's see the data in the table
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString() AS [Position] FROM Employees_Ver2008
GO

CREATE PROCEDURE AddEmployee_Ver2008
(@ReportsToID INT, @EmpName VARCHAR(255), @Title VARCHAR(255))
AS
BEGIN
-- Get the root node we wish to insert a descendant of
DECLARE @ReportsToNode HIERARCHYID
SELECT @ReportsToNode = HierarchicalLevel FROM Employees_Ver2008
WHERE EmpID = @ReportsToID
-- Determine the last child position
DECLARE @LastChildPosition HIERARCHYID
SELECT @LastChildPosition = MAX(HierarchicalLevel) FROM Employees_Ver2008
WHERE HierarchicalLevel.GetAncestor(1) = @ReportsToNode

INSERT INTO Employees_Ver2008 (EmpName, Title, HierarchicalLevel)
VALUES (@EmpName, @Title, @ReportsToNode.GetDescendant(@LastChildPosition, NULL))
END
GO
--Let's add remaining nodes of tree using created stored procedure
EXECUTE AddEmployee_Ver2008 2, 'Lead_D1', 'Team Lead- Dev'
and so on...

--Get all employees who directly/indirectly report to EmpID = 4
DECLARE @ReportsToID HIERARCHYID
SELECT @ReportsToID = HierarchicalLevel FROM Employees_Ver2008 WHERE EmpID = 4
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString()AS [Position]
FROM Employees_Ver2008
WHERE HierarchicalLevel.IsDescendantOf(@ReportsToID)= 1

Now let's turn our attention from inserting records to retrieve it. The problem statement is:

Get all employees who directly/indirectly report to a particular Employee.
Get all managers of a given employee in all levels in the management chain until the root node.
--Get all employees who directly/indirectly report to EmpID = 4
DECLARE @ReportsToID HIERARCHYID
SELECT @ReportsToID = HierarchicalLevel FROM Employee_2008 WHERE EmpID = 4
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString()AS [Position]
FROM Employee_2008
WHERE HierarchicalLevel.IsDescendantOf(@ReportsToID)= 1

--Get all managers of a given employee(EmpID=4) in all levels in
--the management chain until the root node
SELECT @ReportsToID = HierarchicalLevel FROM Employee_2008 WHERE EmpID = 4
SELECT EmpID, EmpName, Title, HierarchicalLevel, HierarchicalLevel.ToString()AS [Position]
FROM Employee_2008
WHERE @ReportsToID.IsDescendantOf(HierarchicalLevel) = 1

Now let's see the last example of HIERARCHYID here. The requirement is Lead_D1 has left the job so all his direct reports now will report to new Lead_D3. To accomplish this task you need to write your script which will look something like this.

--Adding new Dev Lead
EXECUTE AddEmployees_Ver20082, 'Lead_D1','DEV Leader'
--Get the HierarchyID of New DEV Lead
DECLARE @NewDevLead HierarchyID= (SELECT HierarchicalLevel FROM Employees_Ver2008WHERE EmpID = @@IDENTITY)
--Get the HierarchyID of old DEV Lead
DECLARE @OldDevLead HierarchyID= (SELECT HierarchicalLevel FROM Employees_Ver2008WHERE EmpID = 4)
UPDATE Employee_2008
SET HierarchicalLevel = HierarchicalLevel.GetReparentedValue(@OldDevLead, @NewDevLead)
WHERE HierarchicalLevel.IsDescendantOf(@OldDevLead)= 1
--Parent is considered its own descendant, so you need to exclude it
AND HierarchicalLevel <> @OldDevLead

HIERARCHYID data type does not ensure uniqueness; I mean there's nothing stopping you from inserting a new row into the Employee table with the same position within the hierarchy as the other nodes unless you define unique constraint on the column.

The HIERARCHYID column can be indexed in two ways

Depth First Strategy: By default when you create an index on HIERARCHYID data type column, it uses Depth First Strategy. In a depth-first index strategy, rows in a sub tree are stored near each other. For example, all employees that report through a manager are stored near their managers' record.

Breadth First Strategy: A breadth-first strategy stores the rows at each level of the hierarchy together. For example, the records of employees who directly report to the same manager are stored near each other. Also to create a bread first index, the system needs to know the level of each record in table; for that purpose you can use GetLevel function of HIERARCHYID.

April 16, 2010

Dropping/Disabling Check Constraints and NOT NULL constraints in SQl Server

The primary job of a constraint is to enforce a rule in the database. Together, the constraints in a database maintain the integrity of the database. For instance, we have foreign key constraints to ensure all orders reference existing products. You cannot enter an order for a product the database does not know about. Maintaining integrity is of utmost importance for a database, so much so that we cannot trust users and applications to enforce these rules by themselves. Once integrity is lost, you may find customers are double billed, payments to the supplier are missing, and everyone loses faith in your application. We will be talking in the context of the SQL Server sample Northwind database.
Data integrity rules fall into one of three categories: entity, referential, and domain.

Check Constraints
Check constraints contain an expression the database will evaluate when you modify or insert a row. If the expression evaluates to false, the database will not save the row. Building a check constraint is similar to building a WHERE clause. You can use many of the same operators (>, <, <=, >=, <>, =) in additional to BETWEEN, IN, LIKE, and NULL. You can also build expressions around AND and OR operators. You can use check constraints to implement business rules, and tighten down the allowed values and formats allowed for a particular column.

NOT NULL Constraints
A logically unknown expression happens when a NULL value is present in an expression.

Any SQL INSERT, UPDATE or DELETE command applied to a table with constraints enabled has the possibility of failing.

For example, updates applied to a Parent Table may fail if the statement leaves orphaned rows in a child table, INSERT command against a Child Table may fail if a matching foreign key value does not exist in the parent table.
Constraint failures will result in the statement being rolled back - coding an application front end to deal with such errors is generally easier than handling all the business rules in code.

Some applications require to ignore check constraints and NOT NULL constraints , arguing that the business logic of an application should be retained within the business logic layer of an application. However, in such a scenario, it becomes possible for invalid data to be added to the database. So we need to disable or drop these constraints.

Following is a one line command can do a thing like this:

-- Drop Check constraints
select 'ALTER TABLE ' + s.name+'.'+OBJECT_NAME(o.parent_object_id) + ' DROP CONSTRAINT ' +OBJECT_NAME(OBJECT_ID)
FROM sys.objects as o
join sys.schemas as s on s.schema_Id = o.schema_id
WHERE o.type_desc LIKE '%CHECK_CONSTRAINT'

-- Drop NOT NULL constraints on column and and allow NULL

select 'ALTER TABLE ' + s.name+'.'+o.name + ' ALTER COLUMN ' +c.name + ' '+ t.name +' NULL'
FROM sys.objects as o
join sys.schemas as s on s.schema_id=o.schema_id
join sys.syscolumns as c on o.name=object_name(c.id)
join sys.systypes as t on c.usertype=t.usertype
where o.type='U'
and c.isnullable=1

April 12, 2010

Deadlocks in SQL Server

Deadlocks are a side effect of blocking and not a SQL Server Bug. Poorly written queries in SQL Server can trigger deadlock in the system. We all knows cyclic dependency causes dead lock.

A deadlock occurs when two system server process IDs (SPIDs) are waiting for a resource and neither process can advance because the other process is preventing it from getting the resource. For example, P1 holds a lock on table A and requests a lock on table B; P2 holds a lock on table B and requests a lock on table A.

The lock manager(Lock Monitor)/ thread checks for deadlocks. When a lock manager/ deadlock detection algorithm detects a deadlock, the lock manager chooses one of the SPIDs as a victim. The lock manager initiates a 1205 error message that is sent to the client, and the lock manager kills the SPID.

When SQL Server find a deadlock, it kill one process (deadlock victim) rolls back the transaction of the deadlock victim, and returns a 1205 error to the application. Database engine choose deadlock victim according to the least cost to rollback.

what can cause deadlocks?

• Locks

• Worker threads

• Memory

• Parallel query execution-related resources

• Multiple Active Result Sets (MARS) resources.

They can generally be resolved by changing application or stored procedure code to access tables in the same logic order, or by changing the actual database schema or indexing structures. The first step in Deadlock elimination is to identify what two statements are actually causing the deadlocks to occur.

Trace Flags can be used to output the details of deadlocks as well, and generally provide you the maximum amount of information to help facilitate identifying the root of the problem. In SQL 2000, flags 1204 (basic) and 1205 (detailed) provide information regarding deadlocks.

An Example statement to enable tracing to the ErrorLog for all connections:

DBCC TRACEON (1204, 3605, -1)

DBCC TRACEON (1205, 3605, -1)

DBCC TRACEON (1222, 3605, -1)

SQL Profiler

The SQL Profiler can also be used to help in identifying deadlocking. In order to identify the deadlocking statements in SQL Profiler, you need to be tracing the following events.

Locks - Lock:Deadlock

Locks - Lock:Deadlock Chain

Stored Procedure - SP:StmtStarting

TSQL - SQL:StmtStarting

How to create profiler trace to find deadlock?

1.Start SQL profiler

2.On the Trace Properties dialog box, on the General tab, check Save to file and specify a path to save the trace

3.Click “Show all events”

4.Click the Events tab, above specified events

(i) Add the Deadlock graph event to the trace from the Locks category. You will get an additional tab appears on the Trace Properties window, called Event Extraction Settings.

(ii) Click the Save Deadlock XML Events Separately check box. This causes the deadlock information to be written to a separate file.

When a deadlock is encountered, the Deadlock chain event will output the SPID's associated with the deadlock. From there, you can look at the StmtStarting event for the SPID's immediately following the Deadlock Event to isolate the problem statements.

This trace will record all deadlocks on this SQL Server instance, along with the ID of the source table of contention.

To translate the database id and object id into names, you can do

SELECT database_name(DatabaseID)

SELECT object_name(ObjectID)

You can also run following query to check which process is being blocked.

SELECT * FROM sys.sysprocesses WHERE blocked <> 0

Get the SPID from blocked column

DBCC inputbuffer (SPID)

sp_who2 'active'

sp_lock2

How to reproduce a deadlock:

-- ==================================

-- Create Tables

-- ==================================

CREATE TABLE [dbo].[Customers](

[CustomerID] [int] NOT NULL,

[CustName] [nvarchar](50) NOT NULL,

[PhoneNumber] [nvarchar](24) NOT NULL,

[Email] [nvarchar](100) NULL

)

CREATE TABLE [dbo].[Products](

[ProductID] [int] NOT NULL,

[ProductName] [nvarchar](40) NOT NULL,

[Price] [decimal](18, 2) NOT NULL

)

CREATE TABLE [dbo].[Orders](

[OrderID] [int] NOT NULL,

[CustomerID] [int] NULL,

[ProductID] [int] NOT NULL,

[OrderDate] [datetime] NOT NULL,

[DeliveryDate] [datetime] NOT NULL

)

-- ==================================

-- Add rows

-- ==================================

insert into [Customers] values ( 1,'Aaron Bryant','754-555-0137','aaron.bryant@gmail.com')

insert into [Customers] values ( 2,'Abigail Bailey','329-555-0153','abigail.bailey@gmail.com')

insert into [Customers] values ( 3,'Adrian Gray','379-555-0152','adrian.gray@gmail.com')

insert into [Products] values ( 1,'Bacon ice cream',300)

insert into [Products] values ( 2,'Chocolate ',200)

insert into [Products] values ( 3,'Coffee',150)

insert into [Orders] values ( 1,2,3,'2007-07-04','2007-07-06')

insert into [Orders] values ( 2,2,2,'2007-07-14','2007-08-06')

insert into [Orders] values ( 3,3,3,'2007-09-04','2007-09-06')

insert into [Orders] values ( 4,1,3,'2007-11-14','2007-11-16')

-- ====================================

-- Execute this query using a query window for Deadlock Thread 1

-- ====================================

BEGIN TRANSACTION

UPDATE [Customers] SET [PhoneNumber] = '999-555-1212' WHERE [CustomerID] = 1

WAITFOR DELAY '00:00:10'

UPDATE [Products] SET [Price] = 120 WHERE [ProductID]= 2

ROLLBACK TRANSACTION

-- ====================================

-- Immediately execute this query using a query window for Deadlock Thread 2

-- ====================================

BEGIN TRANSACTION

UPDATE [Products] SET [ProductName] = 'Ice cream' WHERE [ProductID]=1

UPDATE [Customers] SET [CustName] = 'Mr Abigail Bailey' WHERE [CustomerID] = 2

WAITFOR DELAY '00:00:10'

ROLLBACK TRANSACTION

All transactions in a deadlock will wait indefinitely unless one of the participating transactions is rolled back, for example, because its session was terminated.

The application with the transaction chosen as the deadlock victim should retry the transaction, which usually completes after the other transaction or transactions involved in the deadlock have finished.

View and save a deadlock graph in XML

Viewing a deadlock graph in XML format allows you to copy the inputbuffer of Transact-SQL statements involved in the deadlock. You may also prefer to analyze deadlocks in a text-based format.

If you have used a Transact-SQL query to return deadlock graph information, to view the deadlock graph XML, select the value in the deadlock_xml column from any row to open the deadlock graph's XML in a new window in SSMS.

<victim-list>

</victim-list>

<process-list>

unknown </frame>

unknown </frame>

</executionStack>

-- ====================================

-- Immediately execute this query using a query window for Deadlock Thread 2

-- ====================================

BEGIN TRANSACTION

UPDATE [Products] SET [ProductName] = 'Ice cream' WHERE [ProductID]=1

UPDATE [Customers] SET [CustName] = 'Mr Abigail Bailey' WHERE [CustomerID] = 2

WAITFOR DELAY '00:00:10'

ROLLBACK TRANSACTION </inputbuf>

</process>

unknown </frame>

unknown </frame>

</executionStack>

-- ====================================

-- Execute this query using a query window for Deadlock Thread 1

-- ====================================

BEGIN TRANSACTION

UPDATE [Customers] SET [PhoneNumber] = '999-555-1212' WHERE [CustomerID] = 1

WAITFOR DELAY '00:00:10'

UPDATE [Products] SET [Price] = 120 WHERE [ProductID]= 2

ROLLBACK TRANSACTION </inputbuf>

</process>

</process-list>

<resource-list>

<owner-list>

</owner-list>

<waiter-list>

</waiter-list>

</ridlock>

<owner-list>

</owner-list>

<waiter-list>

</waiter-list>

</ridlock>

</resource-list>

</deadlock>

The XML for this example deadlock graph is:

A deadlock graph typically has three nodes:

Victim-list. The deadlock victim process identifier.

Process-list. Information on all the processes involved in the deadlock. Deadlock graphs use the term 'process' to represent a session running a transaction.

Resource-list. Information about the resources involved in the deadlock.

March 26, 2010

TempDB in SQL server 2005

Tempdb holds all temporary tables and temporary stored procedures. Tempdb will automatically increase and decrease as the data engine performs actions. Using a lot of temp tables and other operations can cause the log to grow,
but if you stopped the service and restarted it, then tempdb is created with
the last size set up.

Scenarios that can cause tempdb to fill up:
any sorting that requires more memory than has been allocated to SQL Server will be forced to do its work in tempdb;

 if the sorting requires more space than you have allocated to tempdb, one of the above errors will occur;

 DBCC CheckDB('any database') will perform its work in tempdb -- on larger databases, this can consume quite a bit of space;

 DBCC DBREINDEX or similar DBCC commands with 'Sort in tempdb' option set will also potentially fill up tempdb;

 large resultsets involving unions, order by / group by, cartesian joins, outer joins, cursors, temp tables, table variables, and hashing can often require help from tempdb;

 any transactions left uncommitted and not rolled back can leave objects orphaned in tempdb;

 Queries which show a relatively high value in the WRITES data column of a Profiler trace. Seeing SPOOL operators in the query plan is another sign that SQL Server is performing large numbers of writes to tempdb. The SPOOL operator means that SQL Server is "spooling" an interim result set to a secondary location (tempdb) so that it can do something with that data later in the plan.

 use of an ODBC DSN with the option 'create temporary stored procedures' set can leave objects there for the life of the connection.

Identifying tempdb bottleneck using Perfmon counters:

Use following performance counters to monitor tempdb issue in SQL Server.
 Temp Tables Creation Rate (SQL Server 2005): The number of temporary tables or variables that are created per second. The value depends on the workload. If your workload creates many temporary tables and the temporary table cache cannot be used, you may see high values. In such cases investigate why the temporary table cache cannot be used; this might be by design in the application. For example, if there is an explicit DDL after the temporary table is created, the table cannot be cached. In a perfect scenario, if there were 100% temporary table cache hits, this counter would be 0
 Temp Tables For Destruction (SQL Server 2005): The number of temporary tables or variables waiting to be destroyed by the cleanup system thread. The number of temporary tables or variables waiting to be destroyed by the cleanup system thread should be 0. Temporary spikes are expected, but should be rare. Spikes can happen when the temporary table cache or the procedure cache is being cleaned up.

Fixing tempdb full issue:

If tempdb is full, it gives one of the following error:
 1101 or 1105: A session has to allocate more space in tempdb in order to continue
 3959: The version store is full.
 3967: The version store has been forced to shrink because tempdb is full.
 3958 or 3966: A transaction is unable to find a required version record in tempdb.
Restarting SQL Server will re-create tempdb from scratch, and it will return to its usually allocated size. In and of itself, this solution is only effective in the very short term; assumedly, the application and/or T-SQL code which caused tempdb to grow once will likely cause it to grow again.

To shrink tempdb, you can consider using DBCC ShrinkDatabase, DBCC ShrinkFile (for the data or the log file).

Shrinking files is not a recommended practice, since these files will probably grow again. In addition, shrink operations cause data fragmentation. Performing shrink operation on tempdb has the following limitations:
• Shrink operations do not shrink the version store or internal objects. This means that you will not recover space if the internal object or version store page needs to be moved.

Identifying session which might be causing issue and fixing it by killing that session:

If you can't shrink the log, it might be due to an uncommitted transaction. See if you have any long-running transactions with the following command:

DBCC OPENTRAN -- or DBCC OPENTRAN('tempdb')

Check the oldest transaction (if it returns any), and see who the SPID is (there will be a line starting with 'SPID (Server Process ID) : '). Use that in the following:

DBCC INPUTBUFFER SPID

This will tell you at least a portion of the last SQL command executed by this SPID, and will help you determine if you want to end this process with:

KILL SPID number

Batch larger heavily-logged operations (especially deletes) that *might* overflow into tempdb into reasonable 'chunks' of rows, especially when joins are involved.

March 22, 2010

Finding Column Dependencies in SQL server 2005

If you need to make small changes to an existing database object, then you will need to do some impact assessment to determine what SQL Server Objects will be affected by your small change.We need to determine column dependencies within database as well as these dependencies across different databases.

Using the SYSCOMMENTS table:
The actual code for views, rules, defaults, triggers, CHECK constraints, DEFAULT constraints and stored procedures are stored in the syscomments table. The column TEXT in the syscomments table contains the actual code for all these objects. Knowing this allows you to write some simple T-SQL code that can scan the syscomments table looking for an actual table column name or database object name that you are planning on changing. By scanning the syscomments table, you will be able to narrow the focus of your impact analysis down to only those objects that might be affected by your change.

e.g
select name
from syscomments c
join sysobjects o on c.id = o.id
where TEXT like '%Titles%' and TEXT like '%Title_Id%' -- we are determining in which objects Title_Id column of Titles table is used.

This query looks for syscomments rows that contain both the string "Title_Id " and "titles" in the same record. With this query, I find all syscomments records that contain both a reference to the table and column that I planned on changing. This query joins the syscomments and sysobjects table so that the object name can be displayed in the results.

Using the sp_depends Stored Procedure:
SQL Sever maintains a database system table named sysdepends. SQL Server uses this table to store object dependencies. This table only contains information about objects that are dependent upon other objects. It does not contain detailed dependency information, such as which attributes of an object are referenced by other objects. Even so, you can use the information in the sysdepends table to help narrow down your impact analysis when you are making a schema change.

To access the sysdepends table information SQL Server provides the sp_depends system stored procedure.
e.g. EXEC sp_depends 'Titles'

Lets say , it return objects dbo.titleview , GetAllTitles

This will return all objects that are dependent on the table name "Titles."

Now run below query to get all objects which use column Title_id.

select distinct object_name(id) from sys.syscomments
where TEXT like '%Titles%' and TEXT like '%Title_Id%'
and object_name(id) in (
'GetAllTitles',
'titleview'
')

Not All Code is Stored in SQL Server
Now keep in mind the syscomments and sp_depends methods may not find all your code that is affected by a schema change. These two methods are only able to scan objects that are stored in SQL Server (views, rules, defaults, triggers, CHECK constraints, DEFAULT constraints, and stored procedures). If you have written your application code in such a way that allows you to issue T-SQL SELECT, UPDATE, INSERT and DELETE statements in code blocks that are not stored in SQL Server, then you will need to use other methods for scanning this code. Hopefully where you store the code for your application outside SQL Server is in a repository or file structure of some kind where you can do a global find or search of some kind.

When you are making a database schema change, it is important to identify all the objects that might be affected by the change. Without doing a complete impact assessment, you run the risk of causing problems when implementing your schema change. Therefore having some easy, and automated method to help identify the objects that will be impacted by your schema change is critical, and a time saver. The next time you are doing some impact analysis associated with a schema change to your database, you might consider using one or all the methods I discussed to help narrow down the objects impacted by your proposed change.

Log File in SQL Server 2005

In SQL Server 2005, each database contains at least one data file and one transaction log file. SQL Server stores the data physically in the data file. The transaction log file stores the details of all the modifications that you perform on your SQL Server database and the details of the transactions that performed each modification. Because the transactional integrity is considered a fundamental and intrinsic characteristic of SQL Server, logging the details of the transactions cannot be turned off in SQL Server.

Recovery Model
Recovery model is the way you want the transactions to be registered into the log file. Since log files include all the transactions in the database, they can be used for recovery. There are 3 types of recovery, here they are, and what they do.

Simple recovery model:

The simple recovery model is just that: simple. In this approach, SQL Server maintains only a minimal amount of information in the transaction log. SQL Server truncates the transaction log each time the database reaches a transaction checkpoint, leaving no log entries for disaster recovery purposes.

In databases using the simple recovery model, you may restore full or differential backups only.

Full recovery model:

In the event of a database failure, you have the most flexibility restoring databases using the full recovery model. In addition to preserving data modifications stored in the transaction log, the full recovery model allows you to restore a database to a specific point in time.

The bulk-logged recovery model is a special-purpose model that works in a similar manner to the full recovery model. The only difference is in the way it handles bulk data modification operations. The bulk-logged model records these operations in the transaction log using a technical known as minimal logging. This saves significantly on processing time, but prevents you from using the point-in-time restore option.

Issues due to large transaction log file:

Sometimes, the transaction log file may become very large and you may run out of disk space. When a transaction log file grows until the log file uses all the available disk space and cannot expand any more, you can no longer perform any data modification operations on your database. Additionally, SQL Server may mark your database as suspect because of the lack of space for the transaction log expansion.

To recover from a situation where the transaction logs grow to an unacceptable limit, you must reduce the size of the transaction logs. To do this, you must truncate the inactive transactions in your transaction log and shrink the transaction log file.

When the transaction logs grow to an unacceptable limit, you must immediately back up your transaction log file. While the backup of your transaction log files is created, SQL Server automatically truncates the inactive part of the transaction log. The inactive part of the transaction log file contains the completed transactions, and therefore, the transaction log file is no longer used by SQL Server during the recovery process. SQL Server reuses this truncated, inactive space in the transaction log instead of permitting the transaction log to continue to grow and to use more space.

The backup operation or the Truncate method does not reduce the log file size. To reduce the size of the transaction log file, you must shrink the transaction log file. To shrink a transaction log file to the requested size and to remove the unused pages, you must use the DBCC SHRINKFILE operation. The DBCC SHRINKFILE Transact-SQL statement can only shrink the inactive part inside the log file.

If your database's log file size reaches its limit, you can truncate and then shrink it to its minimum size by using the following commands:

USE DatabaseName
GO
DBCC SHRINKFILE(, 1)
BACKUP LOG WITH TRUNCATE_ONLY
DBCC SHRINKFILE(, 1)

If you've limit your log file size, and it gives an error when it has reached its maximum limit, you can do the following to tackle the issue:
1: Set your database's recovery mode to "SIMPLE". By default, SQL Server 2005 sets a database's recovery mode to "FULL". Now, before changing the recovery mode, you should know what you're willing to recede. When a database's recovery mode is "FULL", it means when you restore or when the database gets corrupted, SQL Server will restore it to a point right before it got corrupted! On the other hand, if you use "SIMPLE" recovery mode, it will simply restore the database to the last point the backup was made. For instance, you took a database's backup at, let's say, 10 am, and your database got corrupted at 5 pm the same day. With FULL recovery mode, you'll be able to restore your batabase to a state it was in right before 5 pm. With SIMPLE recovery mode, you'll be able to restore your database to a state it was in at 10 am!

2: Take periodic backup of your database's log file! When you take its backup, the committed transactions in the log file go in overwrite mode.

When you have your recovery mode to full, you are supposed to make two types of backups:
1.Full backup every ones in a while.
2.Transaction log backups according to transaction log volume to keep the log file small and being able to do a point in time recovery , If you skip this not only will the log file keep on growing, but the only restore you can do will be a restore from the full backup. And in that case you loose all the work from that moment on. If you backup the transaction log as well, you can do a roll forward and restore up until the last transaction log backup.

Schedule a job to take the backup of your log file and schedule it according to transaction log volume.

If you restore a database, a new database is created for you, and a new logfile. As SQL Server is a write ahead RDBMS, every transaction first goes to the log, and only then to disk. So a database in SQL Server will always contain a data file and a log file. Right after the restore, your database will be as at the time of the backup, the logfile will be almost empty. Only when you do a 'special' restore, some info from the logfile can be used. Under normal circumstances, all open transactions that are in the logfile are rolled back after a restore. So actually, there is some logfile data in a full backup ... but 'backup database' statement can never 'empty' the logfile for you !

Eliminate the Log File Completely
Sometimes, we just do not need the big log file. For example, I have a 40GB log file. I am sure I do not need this log file and want to get rid of it completely to free up the hard drive space. The logic is:

Backup DB
Detach DB
Rename Log file
Attach DB
New log file will be recreated
Delete Renamed Log file.

Let’s say, the database name is testDev. In the SQL Server Management Studio,

Highlight the database-> Tasks->Detach..-> Click OK
Go to log file folder -> rename the testDev_log.ldf to be like testDev_log-aa.ldf
Highlight Databases->Attach…-> Click Add -> add the database testDev, highlight the log file and click the ‘Remove’ button. This means you only attach testDev.mdf
After this is done, you can verify the contents of the attached database and then delete the log file
This way we can safely delete the log file and free up the space.

February 12, 2010

Accidental DBA

There seems to be a lot of small companies out there who have appointed some unwilling developer as their DBA. Being a DBA comes with a lot of responsibility. For the most part, being a DBA means that you are in charge of making sure that the company's data is secure, backed up and accessible 24x7.

Accidental DBA doesn't mean "wait for things to break, then do your best to fix them." Save your sanity and do some proactive work now to keep things flowing later. You won't be sorry, and you'll be a hero to your company and/or clients.

- Check maintenance plans, make sure they're running as you expect. Check them again next month. And the month after. This is something you want to know is working.

- Restore your database backup to another server (or another database). Test it, make sure it works, make sure it has all of the information you expect it to.

- Check on index fragmentation for about 25% of your tables. If a table isn't showing issues, set it aside and check it again in 6 months. If it is in need of defragmentation, correct it, then check it again in 2 weeks.

- Check your free disk space on all drives. Do it monthly, log the results, watch the trend, plan for when you'll need more disk space. A simple Excel worksheet will do for keeping track, don't make it more elaborate than it needs to be.

Once you aquire DBA skiils, concentrate on following things.

Know your SLA - SLA stands for service level agreement. In most companies the SLA is determined before a project goes live. The SLA describes the minimum performance criteria that a provider promises to meet while delivering a service. It also outlines any remedial action and any penalties that will take effect if performance falls below the promised standard. In a nutshell this means "how long does the server have to be down before someone loses their job?". In general, all decisions regarding backups, database maintenance and etcetera should revolve around the criteria set forth in the SLA. That is why it is first in this list.

Develop a Disaster Recovery Strategy - If the SLA says that you can only be down for a maximum of an hour in the event of an emergency then you better make sure that you have a restore strategy to fulfill the requirements. There are many high availability technologies to chose from but each one has its limitations.
Clustering - A database cluster consists of two or more physical machines (nodes) which utilize a shared disk on a SAN in order to keep the data files accessible at all times. If one of the nodes fail then the drives hosting the SQL data will "failover" to the other node and the SQL services will start up on the secondary node. When the services restart on the secondary node, SQL Server will go through the auto-recovery process and all transactions that were active before the failover will be rolled back. The drawbacks to clustering are that you have a single point of failure at the SAN. Therefore in order to be safe you will probably need SAN level replication. For small companies a SAN may be too expensive to implement and having a secondary SAN for replication may be completely out of the question.

Mirroring - There are two different configurations available for mirroring. The first is high-safety mode which uses a two-phase commit. This means that the transaction has to safely be applied on both instances before a transaction is considered complete. The second option is high-performance mode which uses an asynchronous style of transactions. In high-performance mode a transaction is considered complete as soon as it is finished on the principal server. The records will eventually make it to the mirror server but there could be a lag. Mirroring is a good option if you do not have a SAN. In addition, many people use the secondary "mirror server" for reporting purposes. While transactions are being applied to the mirror server the database is is inaccessible, but you can create a database snapshot of the mirrored copy which will be available at all times. Reporting on the snapshot will take some strain off your principal server but you will have to make sure your end users do not require "real-time" data.

Log Shipping - Log shipping is similar to mirroring in the fact that transactions are replayed on another server. However, Log shipping only gives you the warm standby option whereas mirroring can do both hot and warm standby depending on the configuration. The one drawback to mirroring is that it allows a single destination database where log shipping can support multiple.
Replication - Replication is yet another technology for replaying transactions on a separate server. There are three different kinds of replication which include merge, transactional and snapshot. While replicating data the databases can be accessed on the secondary server for reporting in "real-time". The drawback to replication is that it is not going to provide you with a automatic failover. If the primary server goes down you will need to manually redirect your client applications to the secondary server. For the most part, replication is really not intended to be a disaster recovery tool. It was really meant as a way to distribute data between different servers for reporting and accessibility purposes. You can replicate an entire database, a table or only certain rows from a table. If I was planning a DR strategy I would probably stick with Mirroring or Clustering.

Pick a Recovery Model - Do you need point in time recovery up to the second? If so then you should probably use the full recovery model. Bulk-logged offers a little better performance than Full but it does come with the risk of data loss. Under the bulk logged recovery model, a damaged data file can result in your end users having to manually redo some transactions. The simple recovery model is the best for performance but it only will allow you to recover to the most recent full or differential backup. Simple recovery is ideal for development and test servers but hardly ever used for production databases.

Backup - Backup your databases to a separate location. At my company we keep a few days worth of "online" backups on the NAS (do not put backups on the same machine that you are backing up!) and then after a few days they get written to tape. Make sure that you test your backups on a regular basis. Keep track of how long the restore process works and also make sure that if you need point in time recovery that you are familiar with how to apply transaction logs. When an emergency occurs you want to complete confidence in your backups and your restore process.

SQL Server Injection

View more presentations from avishkarm.

Covering Index using Included Columns

You can extend the functionality of nonclustered indexes by adding nonkey columns to the leaf level of the nonclustered index. By including nonkey columns, you can create nonclustered indexes that cover more queries. This is because the nonkey columns have the following benefits:

They can be data types not allowed as index key columns.

They are not considered by the Database Engine when calculating the number of index key columns or index key size.

An index with included nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns. Performance gains are achieved because the query optimizer can locate all the column values within the index; table or clustered index data is not accessed resulting in fewer disk I/O operations.

Anything that is in the where clause would be a key, anything in your select statement would be an included column.

When an index contains all the columns referenced by the query it is typically referred to as covering the query.

January 22, 2010

Pros and Cons of stored procedures

* Stored procedures generally result in improved performance because the database can optimize the data access plan used by the procedure and cache it for subsequent reuse.
* Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.
* Stored procedures result in easier maintenance because it is generally easier to modify a stored procedure than it is to change a hard-coded SQL statement within a deployed component.
* Stored procedures add an extra level of abstraction from the underlying database schema. The client of the stored procedure is isolated from the implementation details of the stored procedure and from the underlying schema.
* Stored procedures can reduce network traffic, because SQL statements can be executed in batches rather than sending multiple requests from the client.

The advantages of using stored procedures are:

1) Network Bandwidth: Assume that the application server(s) and the database server(s) are separate servers. Since the source code is actually stored on the database server, the application only needs to send the name and the parameters for executing it and this in turn reduces the amount of data that is sent to the server. When you use embedded SQL or dynamically generated SQL through an ORM, then the full source of commands must be transmitted and in a scenario where there is a lot of concurrent activity going on and the load on the system requires a lot of users, this can very soon become a performance bottleneck. This can be mitigate in part by a judicious usage of views as well.

2) Abstraction Layer: It helps in separating the business logic from data manipulation logic. Since the interface to the application remains the same, changes done internally to the stored procedures/packages do not effect the application and in turn leads to easy deployment of changes.

3) It offers simplified batching of the commands. Since stored procedures/packages are meant to do a “Unit of Work”, this procedural logic can be simple to maintain and offers additional advantages like making use of the rich feature functionality that the database engines provide. SQL is a SET based language and using SET based procedural logic is the easiest and most performant way of dealing with the data. With every new release of Oracle, SQL Server or DB2 LUW, new features are being introduced in PL/SQL, T-SQL and/or SQL/PL which makes handling of different types of requirements very easy in the DB layer code.

4) Increased data security and integrity: One can secure the tables for direct access and only grant privileges on the stored procedures/packages.

5) By using bind variables (parameterized queries), SQL injection issues are mitigated as well.

6) Code Instrumentation and tracing logic can be built in very easily using the stored procedures. This is one thing that we implemented for one of our clients recently. We created a table which had a list of the DB code that was being used in the schema and this table had a trace_flag column in it which could have 4 different values:

0 (no code instrumentation),

1 (log the sequence of events),

2 ( log the sequence of events and the time taken by those SQL statements),

3 ( log the sequence of events + the time taken + the execution plan from that point of time – since the execution plan can very easily be different at the time of execution under a load scenario vs when you actually run it separately), and

4 (Start the trace – example: starting 10046 level 12 trace in the case of Oracle).

Using this logic, code instrumentation and troubleshooting production issues became very easy. One could then run reports against the data that was logged and present it to the end user or the support personnel. Code instrumentation can be done in the application tier as well using the same guidelines (or using logging blocks like MSFT logging block in .Net) and a good programmer would always instrument their code. However, for the DB code, this code instrumentation becomes a lot more easier to implement.

Cons of using stored procedures:

1) If your application runs on multiple RDBMS, example: You are a vendor and you need to provide your product that runs on Oracle, SQL Server and DB2 LUW in order to expand your customer base, then in that scenario, you have to code or put fixes in for three different code bases. Not only that, you need to have proper staffing to ensure that the code written is optimal since the locking and concurrency mechanisms are very different between these RDBMS. Also, the language used by all these “big 3″ is very different as well.

2) We have seen client shops which offload all of their CRUD operations on the DB tier – as a result they end up with one-liner stored procedures and if you have say 200 objects, you now have 200 * 4 (select/insert/update/delete) stored procedures or one procedure per object with the flag option to indicate the operation and need to code the procedure to use dynamic SQL in order to take into account the conditional parameter logic. Maintenance becomes a nightmare in this case. Also, developers/DBAs sometimes get carried away with this approach and forget that SQL is a set based language – one example is that using this scheme, a client shop was doing purges of the data and kept calling the delete procedure by passing in the ID (surrogate Key) value when they actually could have purged and archived the data using a SET based logic. Procedures are supposed to do unit of work – having one liner procedures and invoking that many calls in a batch does not yield any benefit. In addition, it has to un-necessarily incur the cost of checking permissions and plan associated with that one liner procedure – the cost is albeit a very miniscule one.

3) Parsing of strings is not one of the forte’s of the DB languages – that code is better suited in the application tier unless you start using CLR or Java code.

So, the bottom line is that one needs to carefully evaluate which portion(s) of the application really belong as stored procedure/packages. In applications that work on volume of data and do bulk data processing, it is always advisable to have that logic in stored procedures/packages that reside on the DB layer so that one can take advantage of the SET based approach of the SQL language.

One can also use ORMs (Object Relational Mappers) like Hibernate to prepare their data access strategy – one can then extend it to make calls to the database stored procedure (if so desired in certain situations), have externalized SQL statements, have dynamic SQL statements being prepared by the ORM etc.. One just has to make the right decision depending upon the application and the requirements that are needed.

Another thing to remember is when people point out that an advantage of stored procedures is that the code is always pre-compiled, that is not always true, there can be scenarios that can lead to re-compilation. Also, if proper bind variables are being used for the queries built using an ORM, it serves the same purpose (i.e. has the same advantage as that of a parsed/compiled stored procedure query) since the plan for that SQL statement is parsed and compiled. Depending upon a lot of factors (cache sizes, concurrent load, statistics updates etc.), that plan may or may not be available the next time the call gets made.

A good mix of an ORM plus DB code is usually the best compromise to get the best of both the worlds. Deciding what goes where is the key and though there are guidelines on that, it can vary from application to application depending upon the requirements.

Thanks for article :Coding Horror :programming and human factors by Jeff Atwood
For more information:
Pl visit http://www.codinghorror.com/blog/2004/10/who-needs-stored-procedures-anyways.html

January 06, 2010

Maximum Capacity Specifications for SQL Server

Maximum Capacity Specifications for SQL Server

The following tables specify the maximum sizes and numbers of various objects defined in SQL Server components.

Database Engine Objects
The following table specifies the maximum sizes and numbers of various objects defined in SQL Server databases or referenced in Transact-SQL statements.

SQL Server Database Engine object Maximum sizes/numbers SQL Server (32-bit) Maximum sizes/numbers SQL Server (64-bit)
Batch size1
65,536 * Network Packet Size
65,536 * Network Packet Size

Bytes per short string column
8,000
8,000

Bytes per GROUP BY, ORDER BY
8,060
8,060

Bytes per index key2
900
900

Bytes per foreign key
900
900

Bytes per primary key
900
900

Bytes per row8
8,060
8,060

Bytes in source text of a stored procedure
Lesser of batch size or 250 MB
Lesser of batch size or 250 MB

Bytes per varchar(max), varbinary(max), xml, text, or image column
2^31-1
2^31-1

Characters per ntext or nvarchar(max) column
2^30-1
2^30-1

Clustered indexes per table
1
1

Columns in GROUP BY, ORDER BY
Limited only by number of bytes
Limited only by number of bytes

Columns or expressions in a GROUP BY WITH CUBE or WITH ROLLUP statement
10
10

Columns per index key7
16
16

Columns per foreign key
16
16

Columns per primary key
16
16

Columns per nonwide table
1,024
1,024

Columns per wide table
30,000
30,000

Columns per SELECT statement
4,096
4,096

Columns per INSERT statement
4096
4096

Connections per client
Maximum value of configured connections
Maximum value of configured connections

Database size
524,272 terabytes
524,272 terabytes

Databases per instance of SQL Server
32,767
32,767

Filegroups per database
32,767
32,767

Files per database
32,767
32,767

File size (data)
16 terabytes
16 terabytes

File size (log)
2 terabytes
2 terabytes

Foreign key table references per table4
253
253

Identifier length (in characters)
128
128

Instances per computer
50 instances on a stand-alone server for all SQL Server editions except for Workgroup. Workgroup supports a maximum of 16 instances per computer.

SQL Server supports 25 instances on a failover cluster.
50 instances on a stand-alone server.

25 instances on a failover cluster.

Length of a string containing SQL statements (batch size)1
65,536 * Network packet size
65,536 * Network packet size

Locks per connection
Maximum locks per server
Maximum locks per server

Locks per instance of SQL Server5
Up to 2,147,483,647
Limited only by memory

Nested stored procedure levels6
32
32

Nested subqueries
32
32

Nested trigger levels
32
32

Nonclustered indexes per table
999
999

Number of distinct expressions in the GROUP BY clause when any of the following are present: CUBE, ROLLUP, GROUPING SETS, WITH CUBE, WITH ROLLUP
32
32

Number of grouping sets generated by operators in the GROUP BY clause
4,096
4,096

Parameters per stored procedure
2,100
2,100

Parameters per user-defined function
2,100
2,100

REFERENCES per table
253
253

Rows per table
Limited by available storage
Limited by available storage

Tables per database3
Limited by number of objects in a database
Limited by number of objects in a database

Partitions per partitioned table or index
1,000
1,000

Statistics on non-indexed columns
30,000
30,000

Tables per SELECT statement
Limited only by available resources
Limited only by available resources

Triggers per table3
Limited by number of objects in a database
Limited by number of objects in a database

Columns per UPDATE statement (Wide Tables)
4096
4096

User connections
32,767
32,767

XML indexes
249
249

1Network Packet Size is the size of the tabular data stream (TDS) packets used to communicate between applications and the relational Database Engine. The default packet size is 4 KB, and is controlled by the network packet size configuration option.

2The maximum number of bytes in any index key cannot exceed 900 in SQL Server. You can define a key using variable-length columns whose maximum sizes add up to more than 900, provided no row is ever inserted with more than 900 bytes of data in those columns. In SQL Server, you can include nonkey columns in a nonclustered index to avoid the maximum index key size of 900 bytes.

3Database objects include objects such as tables, views, stored procedures, user-defined functions, triggers, rules, defaults, and constraints. The sum of the number of all objects in a database cannot exceed 2,147,483,647.

4Although a table can contain an unlimited number of FOREIGN KEY constraints, the recommended maximum is 253. Depending on the hardware configuration hosting SQL Server, specifying additional FOREIGN KEY constraints may be expensive for the query optimizer to process.

5This value is for static lock allocation. Dynamic locks are limited only by memory.

6If a stored procedure accesses more than 8 databases, or more than 2 databases in interleaving, you will receive an error.

7If the table contains one or more XML indexes, the clustering key of the user table is limited to 15 columns because the XML column is added to the clustering key of the primary XML index. In SQL Server, you can include nonkey columns in a nonclustered index to avoid the limitation of a maximum of 16 key columns. For more information, see Index with Included Columns.

8 SQL Server supports row-overflow storage which enables variable length columns to be pushed off-row. Only a 24-byte root is stored in the main record for variable length columns pushed out of row; because of this, the effective row limit is higher than in previous releases of SQL Server. For more information, see the "Row-Overflow Data Exceeding 8 KB" topic in SQL Server Books Online.

Maximum in Oracle:

Maximum in Oracle:

Database file size Maximum Operating system dependent. Limited by maximum
operating system file size; typically 222 or 4 MB blocks

Bigfile Tablespaces Number of blocks 232 (4 GB) blocks
Smallfile (traditional)
Tablespaces
Number of blocks 222 (4 MB) blocks

GROUP BY clause Maximum length The GROUP BY expression and all of the
nondistinct aggregate functions (for example, SUM,
AVG) must fit within a single database block.

Indexes Maximum per table Unlimited
total size of indexed column 75% of the database block size minus some
overhead

Columns Per table 1000 columns maximum
Restrict the number of columns less than 255 and the performance will be good.

Per index (or clustered index) 32 columns maximum
Per bitmapped index 30 columns maximum
Constraints Maximum per column Unlimited
Subqueries Maximum levels of subqueries
in a SQL statement
Unlimited in the FROM clause of the top-level
query
255 subqueries in the WHERE clause
Partitions Maximum length of linear
partitioning key
4 KB - overhead
Maximum number of columns
in partition key
16 columns
Maximum number of partitions
allowed per table or index
64 KB - 1 partitions
Rows Maximum number per table Unlimited

Tables Maximum per clustered table 32 tables
Maximum per database Unlimited

January 05, 2010

Compare MS SQL server Vs Oracle 10g

SQL Server:

Pros:
Its easy installation, self-tuning capabilities, graphical administration, integrated business intelligence and extensive collection of help wizards
The total cost of ownership (TCO) of SQL Server 2005 is lower than that of Oracle
Specialised index on a computed column
Indexed View

Cons:
Locking and concurrency: SQL Server has no multi-version consistency model, which means that "writers block readers and readers block writers" to ensure data integrity

Performance and tuning:
a. DBA has no "real" control over sorting and cache memory allocation. The memory allocation is decided only globally in the server properties memory folder, and that applies for ALL memory and not CACHING, SORTING, etc.
b. All pages (blocks) are always 8k and all extents are always 8 pages (64k). This means you have no way to specify larger extents to ensure contiguous space for large objects.
c. In SQL Server, no range partitioning of large tables and indexes. In Oracle, a large 100 GB table can be seamlessly partitioned at the database level into range partitions.

With SQL Server 2005, INSERT, UPDATE, and DELETE statements are executed serially (MERGE is not supported).

Oracle:
Pros :
you can use Oracle on multiple platforms. Whereas Microsoft created SQL Server to be used on the Microsoft platform only, Oracle is available on multiple platforms, including Windows, Unix and now Linux, which is the foundation of Oracle's Real Application Clusters (RAC) strategy.

Locking and concurrency: "readers don't block writers and writers don't block readers." This is possible without compromising data integrity because Oracle will dynamically re-create a read-consistent image for a reader of any requested data that has been changed but not yet committed. In other words, the reader will see the data as it was before the writer began changing it (until the writer commits).

function-based indexes
Oracle will execute INSERT, UPDATE, DELETE, and MERGE statements in parallel when accessing both partitioned and non-partitioned database objects

Cons:
"Implementation of something similar to MSSQL Identity by using Oracle sequence would require reflecting the sequence name in the application or creating a trigger for each table/sequence pair.
Cost is higher
Required skilled DBA.

CONCURRENCY MODEL
concurrency control. The main differences are summarized in the table below:
Oracle Database 10g SQL Server 2005
Multi-version read consistency Always enabled. Not by default.
Must be enabled.
Non-escalating row-level locking Yes Locks escalate

SQL Server 2005 introduces two new isolation levels3:
read committed with snapshots (statement-level read consistency)
snapshot isolation (transaction-level read consistency)
These isolation levels correspond to Oracle’s READ COMMITTED and SERIALIZABLE isolation levels, respectively