Data Engineering with Avishkar: SQL server 2008 Performance tuning

Showing posts with label SQL server 2008 Performance tuning. Show all posts

April 25, 2016

Heap Table

Tables without clustered indexes are called heaps.
They’re scattered on disk anywhere that SQL Server can find a spot, and they’re not stored in any order whatsoever. This can make for really fast inserts – SQL Server can just throw the data down – but slow selects, updates, and deletes.
These tables have a index_id value of 0 in the sys.indexes catalog view
Cases where heaps perform better than tables with clustered indexes.

1.Fast inserts
2. Since there is no clustered index, additional time is not needed to maintain the index and there is not the need for additional space to store the clustered index tree.
3. If your heap table only has INSERTS occurring, your table will not become fragmented, since only new data is written.

Drawback of Heaps:
1. Specific data can not be retrieved quickly, unless there are also non-clustered indexes.
2. Data is not stored in any particular order.
3. Fragmentation:
Resolving the fragmentation for a Heap table not easy as there is no rebuilding or reorganizing your clustered index. You need to create a new table and insert data from the heap table into the new table based on some sort order or Export the data, truncate the table and import the data back into the table

In following scenarios, you should use Heap tables:

Tables containing OLTP transactions
Data Warehouse fact tables
Log tables of all kinds
Telco Call Detail Records (CDR)
Event from Web Servers and social media

August 12, 2014

Multiple Ways to Improve SQL Query Performance

Multiple Ways to Improve SQL Query Performance

1. Re-writing SQL Query
Re-writing SQL Query by following best practices for getting the fastest response like avoid writing a SQL query using multiple joins ( JOIN containing more than 4 tables). ( Pls refer Nested Loop http://avishkarm.blogspot.in/2014/08/what-is-nested-loops-join.html)
Try to remove cursors from the query and use set-based query.
Remove non-correlated scalar sub query as a separate query instead of part of the main query and store the output in a variable.
Avoid Multi-statement TVFs are more costly than inline TFVs.

2. Indexes
Index can magically reduce the data retrieval time and help to improve SQL query performance and give you best query response time.
Keep in mind that clustered index should be created on a unique column.
A non-clustered index is most useful if the ratio of qualifying number of rows/total number of rowsis around 5% or less, which means if the index can eliminate 95% of the rows from consideration. As a best practice, most selective columns should be placed leftmost in the key of a non-clustered index.
Dropping unused indexes can help to speed up data modifications without affecting data retrieval.

3. Statistic Creation and Updates

4.Revisit Your Schema Definitions
Availability of the right constraint(FORIGEN KEY, NOT NULL and CEHCK constraints ) on the right place always helps to improve the query performance, like FORIGEN KEY constraint helps to simplify joins by converting some outer or semi-joins to inner joins and CHECK constraint also helps a bit by removing unnecessary or redundant predicates.

What is Hash Join?

What is Hash Join?
The optimizer usually uses a hash join when at least one of the two join tables does not have an index on the join column or when the database server must read a large number of rows from both tables. No index and no sorting is required when the database server performs a hash join.

A hash join consists of two activities: first building the hash table (build phase) and then probing the hash table (probe phase). Figure 48 shows the hash join in detail.

In the build phase, the database server reads one table and, after it applies any filters, creates a hash table. Think of a hash table conceptually as a series of buckets, each with an address that is derived from the key value by applying a hash function. The database server does not sort keys in a particular hash bucket.

Smaller hash tables can fit in the virtual portion of database server shared memory. The database server stores larger hash files on disk in the dbspace specified by the DBSPACETEMP configuration parameter or the DBSPACETEMP environment variable.

In the probe phase, the database server reads the other table in the join and applies any filters. For each row that satisfies the filters on the table, the database server applies the hash function on the key and probes the hash table to find a match.

May 07, 2014

SQL Server Stored Procedure runs slow sometimes

Sometimes, you might face a very weird issue with a stored procedure on SQL Server 2008 R2. Sometimes, about once every month, some procedures that becomes very slow, takes about 6sec to run instead of a few milliseconds.

Possible reasons:

When the sp is compiled, it is cached and this cache is reused every time I call it, and this cached version gets corrupted for some reason.

Possible Solutions:

1. When you first compile a stored procedure, its execution plan gets cached.

If the sproc has parameters whose definitions can significantly change the contained query's execution plan like index scans vs seeks, the stored procedure's cached plan may not work best for all parameter definitions.

One way to avoid this is to include a RECOMPILE clause with the CREATE PROCEDURE statement.

Example:

CREATE PROCEDURE SP
@myParam
WITH RECOMPILE
AS
BEGIN
-- SP Body
END
GO

By doing this, a new plan will be generated each time the procedure is called. If recompile time < time lost by its using the wrong cached plan, this is worth implementing. In your case, it will also save you the time/planning required to manually recompile this procedure every time you notice it is executing slowly.

2. Update your statistics:

EXEC sp_updatestats

sp_updatestats can trigger a recompile of stored procedures or other compiled code. However, sp_updatestats might not cause a recompile, if only one query plan is possible for the tables referenced and the indexes on them. A recompilation would be unnecessary in these cases even if statistics are updated.

3. Parameter sniffing
Parameter sniffing is a technique the SQL Server optimizer uses to try to figure out parameter values/ranges so it can choose the best execution plan for your query. In some instances SQL Server does a poor job at parameter sniffing & doesn't pick the best execution plan for the query.
With Parameter sniffing Sql Server uses to optimze the query execution plan for a stored procedure. When you first call the SP, Sql Server looks at the given parameter values of your call and decides which indices to use based on the parameter values.

Data Engineering with Avishkar