
March 17, 2016

Common string operations in SQL Server 2008

1. Find the last occurrence of a character within a string in SQL Server 2008

DECLARE @StringA varchar(100), @pos int, @FindChar char(1) = ','
SET @StringA = 'DirectoryA,'

-- Position of the last occurrence = total length, minus position of the first match in the reversed string, plus one
SET @pos = LEN(@StringA) - CHARINDEX(@FindChar, REVERSE(@StringA)) + 1
SELECT @pos -- returns 11, the position of the trailing comma

2. How to remove a special character from a string in SQL Server 2008

SELECT REPLACE(@StringA, ',', '')

3. How to remove the last character from a string in SQL Server 2008

SELECT SUBSTRING(@StringA, 1, LEN(@StringA) - 1)

4. How to remove the last occurrence of a comma (a trailing comma) from a string in SQL Server 2008

SELECT CASE
           WHEN RIGHT(RTRIM(@StringA), 1) = ',' THEN SUBSTRING(RTRIM(@StringA), 1, LEN(RTRIM(@StringA)) - 1)
           ELSE @StringA
       END

March 16, 2016

Issues with MS SQL Server Standard edition in AWS


1. Database Mirroring and index maintenance
MS SQL Server Standard edition in AWS allows synchronous mirroring only. In synchronous mirroring, the additional log being generated (for example, by index maintenance) can overload the network link between the principal and the mirror. This slows down the hardening of regular transactions in the mirror's transaction log, and thus leads to a decrease in transaction throughput on the principal.

Solution: log stream compression

SQL Server 2008 introduces a new feature called "Database Mirroring Log Compression": the outgoing log stream from the principal to the mirror is compressed, thereby minimizing the network bandwidth used by database mirroring. In a network constrained for bandwidth, compressing the log stream helps improve application performance and throughput. Log compression rates of at least 12.5% are typically achieved.
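Note that the compression costs some CPU on both partners. If that trade-off does not pay off in your environment, trace flag 1462 reportedly disables mirroring log stream compression in SQL Server 2008; a minimal sketch (verify the flag against your build before relying on it):

DBCC TRACEON (1462, -1);   -- disable mirroring log stream compression globally
-- ...
DBCC TRACEOFF (1462, -1);  -- re-enable compression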


2. You can’t cluster in EC2… with SQL Server 2008 R2
You can’t cluster in EC2 because there’s no shared storage: EBS volumes can only be attached to a single server at a time. You could share the drives, but you really shouldn’t. You could use some Windows-based SAN system that mounts drives and pretends to be a SAN, but without any guarantees of performance or longevity, why would you want to?


HA options: log shipping, mirroring, and transactional replication

3. Scaling SQL Server in EC2: SCALING UP

There’s a finite limit to how much you can scale up SQL Server in EC2. That limit is currently 8 cores and 68.4GB of RAM. Those 8 cores currently are Intel Xeon X5550s that clock in at 2.66 GHz, but that will have changed by the time you’re reading this.

4. Backup or restore database:


Backup or restore activity on AWS SQL Server Standard edition with EBS disks has the following issues.
1. Connections time out
2. I/O requests take longer than usual
3. The timeouts come not just from the database being restored, but from all databases, as the EBS drives get clogged up, causing I/O to bunch up severely.
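A quick way to confirm such stalls from inside SQL Server is to look at per-file I/O latency; a minimal sketch using the sys.dm_io_virtual_file_stats DMV (available since SQL Server 2005):

-- Average read/write latency per database file, worst first
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON vfs.database_id = mf.database_id AND vfs.file_id = mf.file_id
ORDER BY avg_read_latency_ms DESC;

Sustained latencies well above what the EBS volume should deliver point to a saturated EBS connection rather than a SQL Server problem.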


Some recommendations:
1. It looks as though the disk reads are maxing out the EBS connection, so double-check the EC2 tuning settings.
2. Choose an EBS-optimized instance that provides more dedicated EBS throughput than your application needs; otherwise, the connection between Amazon EBS and Amazon EC2 can become a performance bottleneck. You can enable EBS optimization for instance types that support it when you launch the instances, or after the instances are running.

January 13, 2016

Amazon RDS





Amazon Relational Database Service (Amazon RDS) is a web service that makes it easier to set up, operate, and scale a relational database in the cloud.

Amazon RDS Components
The basic building block of Amazon RDS is the DB instance. Each DB instance runs a DB engine. The computation and memory capacity of a DB instance is determined by its DB instance class, e.g. db.m4.large | db.m4.xlarge | db.m4.2xlarge | db.m4.4xlarge | db.m4.10xlarge.

June 13, 2014

Basic Database Terms



Basic Database Terms
A session is a single connection to SQL Server, identified by a unique SessionID value. It is initiated through an application when the open method is used on a connection object, or through a tool like SSMS when the File | Connect menu item is selected. Even though multiple sessions may originate from the same application (including multiple query windows opened by the same user in the same SSMS instance), as far as SQL Server is concerned, these are all completely separate SQL Server sessions.
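For instance, every connection can read its own SessionID and list the other current sessions; a quick sketch (querying sys.dm_exec_sessions requires VIEW SERVER STATE permission):

SELECT @@SPID AS my_session_id;               -- this connection's SessionID
SELECT session_id, login_name, host_name, status
FROM sys.dm_exec_sessions
WHERE is_user_process = 1;                    -- user sessions only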
Locking occurs when a SQL Server session takes "ownership" of a resource by acquiring a lock, prior to performing a particular action on that resource, such as reading or updating. Locking stays in effect until SQL Server releases the locks. Note that locking itself is not a problem; it has very little measurable impact on any aspect of our systems, including performance, except when it results in blocking or deadlocking, or when we perform excessive monitoring of our system locks.
Blocking occurs when at least two sessions desire concurrent access to the same resource. One session acquires a lock on the resource in order to perform some action, and so renders that resource temporarily unavailable to other sessions. As a result, other sessions requiring the same resource are temporarily blocked. Typically, the blocked sessions gain control of the resource after the blocking session releases the locks, so that access to the resource is serialized. Note that not all concurrent access causes blocking; it depends on the operations being performed by the sessions, which determine the types of locks that are acquired.
A deadlock occurs when two sessions mutually block each other: neither one can release the resources it holds until it acquires a lock on the resource the other session holds. A deadlock can also involve more than two sessions trapped in a circular chain of dependencies. For example, session A may hold a resource that session B wants, while session A waits for session C to release a resource, and session B holds a resource that session C wants. So session A is blocking B and is blocked by C, session B is blocking C and is blocked by A, and session C is blocking A and is blocked by B. None of the three sessions can proceed.
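A minimal sketch of the classic two-session deadlock, assuming two hypothetical tables TableA and TableB, each with an id and a val column; run each half in its own query window:

-- Session 1
BEGIN TRAN;
UPDATE TableA SET val = 1 WHERE id = 1;  -- session 1 locks the row in TableA
-- (pause here while session 2 runs its first UPDATE)
UPDATE TableB SET val = 1 WHERE id = 1;  -- now waits on session 2's lock
COMMIT;

-- Session 2
BEGIN TRAN;
UPDATE TableB SET val = 2 WHERE id = 1;  -- session 2 locks the row in TableB
UPDATE TableA SET val = 2 WHERE id = 1;  -- waits on session 1: deadlock
COMMIT;

SQL Server's deadlock monitor detects the cycle and kills one of the two batches with error 1205, allowing the other session to proceed.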
Pressure is a term used to indicate a state where competition for access to a certain resource is causing performance issues. In a database with well-designed tables and queries, SQL Server acquires and releases locks quickly, and any blocking is fleeting and undetectable by the end-user. However, in certain circumstances, such as when long-running transactions hold locks on a resource for a long time, or when a very high number of sessions all require access to the same shared resource, blocking issues can escalate to the point where one blocked session in turn blocks other sessions, which in turn block others. As the "queue" of blocked sessions grows longer, the load on the system increases and more and more users start to experience unacceptable delays. In such cases, we say that the resource is experiencing pressure.

Transactions: The simplest definition of a transaction is that it is a single unit of work; a task or set of tasks that together form an "all-or-nothing" operation. If some event interrupts a transaction in the middle, so that not all of it completed, the system should treat the transaction as if it never occurred at all. Transactions can apply to other kinds of systems besides databases, but since this is a database-specific discussion, we'll be concerned only with database transactions. A transaction can be short, like changing the price of one book in the inventory, or long, like updating the quantity sold of every inventory item at the beginning of an accounting period.
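A sketch of such an all-or-nothing unit of work in T-SQL, assuming a hypothetical Inventory table; TRY/CATCH is available from SQL Server 2005 onward:

BEGIN TRAN;
BEGIN TRY
    UPDATE Inventory SET Price = Price * 1.10 WHERE BookId = 42;  -- task 1
    UPDATE Inventory SET QuantitySold = 0;                        -- task 2: new accounting period
    COMMIT TRAN;   -- both changes become permanent together
END TRY
BEGIN CATCH
    ROLLBACK TRAN; -- any failure undoes both updates, as if nothing happened
END CATCH;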

March 22, 2010

Finding Column Dependencies in SQL server 2005

If you need to make small changes to an existing database object, then you will need to do some impact assessment to determine which SQL Server objects will be affected by your small change. We need to determine column dependencies within the database, as well as dependencies across different databases.

Using the SYSCOMMENTS table:
The actual code for views, rules, defaults, triggers, CHECK constraints, DEFAULT constraints, and stored procedures is stored in the syscomments table. The column TEXT in the syscomments table contains the actual code for all these objects. Knowing this allows you to write some simple T-SQL code that scans the syscomments table looking for the table column name or database object name you are planning to change. By scanning the syscomments table, you can narrow the focus of your impact analysis down to only those objects that might be affected by your change.

e.g.
select o.name
from syscomments c
join sysobjects o on c.id = o.id
where c.text like '%Titles%' and c.text like '%Title_Id%' -- find the objects in which the Title_Id column of the Titles table is used

This query looks for syscomments rows that contain both the string "Title_Id" and "Titles" in the same record; with it, I find all syscomments records that reference both the table and the column I plan to change. The query joins syscomments to sysobjects so that the object name can be displayed in the results.

Using the sp_depends Stored Procedure:
SQL Server maintains a database system table named sysdepends. SQL Server uses this table to store object dependencies. This table only contains information about objects that are dependent upon other objects. It does not contain detailed dependency information, such as which attributes of an object are referenced by other objects. Even so, you can use the information in the sysdepends table to help narrow down your impact analysis when you are making a schema change.

To access the sysdepends table information SQL Server provides the sp_depends system stored procedure.
e.g. EXEC sp_depends 'Titles'

This will return all objects that are dependent on the table named "Titles." Let's say it returns the objects dbo.titleview and GetAllTitles.

Now run the query below to get all objects which use the column Title_Id.

select distinct object_name(id) from sys.syscomments
where text like '%Titles%' and text like '%Title_Id%'
and object_name(id) in (
'GetAllTitles',
'titleview'
)


Not All Code is Stored in SQL Server
Now keep in mind that the syscomments and sp_depends methods may not find all the code affected by a schema change. These two methods can only scan objects that are stored in SQL Server (views, rules, defaults, triggers, CHECK constraints, DEFAULT constraints, and stored procedures). If your application issues T-SQL SELECT, UPDATE, INSERT, and DELETE statements from code blocks that are not stored in SQL Server, you will need other methods for scanning that code. Hopefully, the application code you keep outside SQL Server lives in a repository or file structure where you can do a global find or search.


When you are making a database schema change, it is important to identify all the objects that might be affected by the change. Without doing a complete impact assessment, you run the risk of causing problems when implementing your schema change. Therefore, having an easy, automated method to help identify the objects that will be impacted by your schema change is critical, and a time saver. The next time you are doing impact analysis associated with a schema change to your database, consider using one or all of the methods discussed here to help narrow down the objects impacted by your proposed change.

Log File in SQL Server 2005

In SQL Server 2005, each database contains at least one data file and one transaction log file. SQL Server stores the data physically in the data file. The transaction log file stores the details of all the modifications that you perform on your SQL Server database and the details of the transactions that performed each modification. Because the transactional integrity is considered a fundamental and intrinsic characteristic of SQL Server, logging the details of the transactions cannot be turned off in SQL Server.

Recovery Model
The recovery model is the way you want transactions to be registered into the log file. Since log files include all the transactions in the database, they can be used for recovery. There are three recovery models; here is what each one does.

Simple recovery model:

The simple recovery model is just that: simple. In this approach, SQL Server maintains only a minimal amount of information in the transaction log. SQL Server truncates the transaction log each time the database reaches a transaction checkpoint, leaving no log entries for disaster recovery purposes.

In databases using the simple recovery model, you may restore full or differential backups only.



Full recovery model:

In the event of a database failure, you have the most flexibility restoring databases using the full recovery model. In addition to preserving data modifications stored in the transaction log, the full recovery model allows you to restore a database to a specific point in time.

Bulk-logged recovery model:

The bulk-logged recovery model is a special-purpose model that works in a similar manner to the full recovery model. The only difference is in the way it handles bulk data modification operations. The bulk-logged model records these operations in the transaction log using a technique known as minimal logging. This saves significantly on processing time, but prevents you from using the point-in-time restore option.
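To see or change a database's recovery model (DatabaseName is a placeholder):

SELECT name, recovery_model_desc FROM sys.databases;   -- current model per database
ALTER DATABASE DatabaseName SET RECOVERY FULL;         -- or SIMPLE / BULK_LOGGED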

Issues due to large transaction log file:

Sometimes, the transaction log file may become very large and you may run out of disk space. When a transaction log file grows until it uses all the available disk space and cannot expand any more, you can no longer perform any data modification operations on your database. Additionally, SQL Server may mark your database as suspect because of the lack of space for the transaction log expansion.
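To keep an eye on this, you can check how large each log file is and how much of it is actually in use:

DBCC SQLPERF (LOGSPACE);   -- log size (MB) and space used (%) for every database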


To recover from a situation where the transaction logs grow to an unacceptable limit, you must reduce the size of the transaction logs. To do this, you must truncate the inactive transactions in your transaction log and shrink the transaction log file.

When the transaction logs grow to an unacceptable limit, you must immediately back up your transaction log file. While the backup of your transaction log file is created, SQL Server automatically truncates the inactive part of the transaction log. The inactive part of the transaction log file contains only completed transactions, which are no longer needed by SQL Server during the recovery process. SQL Server reuses this truncated, inactive space in the transaction log instead of permitting the transaction log to continue to grow and use more space.

The backup operation or the Truncate method does not reduce the log file size. To reduce the size of the transaction log file, you must shrink the transaction log file. To shrink a transaction log file to the requested size and to remove the unused pages, you must use the DBCC SHRINKFILE operation. The DBCC SHRINKFILE Transact-SQL statement can only shrink the inactive part inside the log file.

If your database's log file size reaches its limit, you can truncate and then shrink it to its minimum size by using the following commands:


USE DatabaseName
GO
-- LogicalLogFileName below is a placeholder for the log file's logical name;
-- look it up with: SELECT name FROM sys.database_files WHERE type_desc = 'LOG'
DBCC SHRINKFILE (LogicalLogFileName, 1)
BACKUP LOG DatabaseName WITH TRUNCATE_ONLY
DBCC SHRINKFILE (LogicalLogFileName, 1)

If you've limited your log file size, and it gives an error when it has reached its maximum limit, you can do the following to tackle the issue:
1: Set your database's recovery mode to "SIMPLE". By default, SQL Server 2005 sets a database's recovery mode to "FULL". Before changing the recovery mode, you should know what you're willing to give up. When a database's recovery mode is "FULL", it means that when you restore, or when the database gets corrupted, SQL Server can restore it to a point right before it got corrupted! On the other hand, if you use "SIMPLE" recovery mode, it will simply restore the database to the last point a backup was made. For instance, say you took a database backup at 10 am and your database got corrupted at 5 pm the same day. With FULL recovery mode, you'll be able to restore your database to the state it was in right before 5 pm. With SIMPLE recovery mode, you'll only be able to restore your database to the state it was in at 10 am!

2: Take periodic backups of your database's log file! When you take a log backup, the space held by committed transactions in the log file becomes available for reuse (it can be overwritten).


When you have your recovery mode set to full, you are supposed to make two types of backups:
1. Full backups every once in a while.
2. Transaction log backups, scheduled according to transaction log volume, to keep the log file small and to be able to do a point-in-time recovery. If you skip these, not only will the log file keep on growing, but the only restore you can do will be a restore from the full backup, and in that case you lose all the work from that moment on. If you back up the transaction log as well, you can roll forward and restore up until the last transaction log backup.

Schedule a job to back up your log file, timed according to transaction log volume.
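Such a job boils down to a single statement; a sketch, with the database name and backup path as placeholders:

BACKUP LOG DatabaseName
TO DISK = N'D:\Backups\DatabaseName_log.trn';   -- run e.g. every 15-60 minutes, per log volume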

If you restore a database, a new database is created for you, along with a new log file. As SQL Server is a write-ahead RDBMS, every transaction first goes to the log and only then to disk, so a database in SQL Server will always contain a data file and a log file. Right after the restore, your database will be as it was at the time of the backup, and the log file will be almost empty. Only when you do a 'special' restore can some info from the log file be used; under normal circumstances, all open transactions in the log file are rolled back after a restore. So there actually is some log file data in a full backup, but a 'backup database' statement can never 'empty' the log file for you!

Eliminate the Log File Completely
Sometimes, we just do not need the big log file. For example, I have a 40GB log file. I am sure I do not need this log file and want to get rid of it completely to free up the hard drive space. The logic is:

Backup DB
Detach DB
Rename Log file
Attach DB
New log file will be recreated
Delete Renamed Log file.

Let’s say, the database name is testDev. In the SQL Server Management Studio,

Highlight the database-> Tasks->Detach..-> Click OK
Go to log file folder -> rename the testDev_log.ldf to be like testDev_log-aa.ldf
Highlight Databases->Attach…-> Click Add -> add the database testDev, highlight the log file and click the ‘Remove’ button. This means you only attach testDev.mdf
After this is done, you can verify the contents of the attached database and then delete the log file
This way we can safely delete the log file and free up the space.
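The same flow can also be scripted; a sketch using sp_detach_db and sp_attach_single_file_db, with the file path as a placeholder for your environment:

USE master;
GO
EXEC sp_detach_db @dbname = N'testDev';
GO
-- Rename testDev_log.ldf at the OS level here, then attach the data file only;
-- SQL Server recreates a fresh log file. The path below is a placeholder.
EXEC sp_attach_single_file_db
     @dbname = N'testDev',
     @physname = N'C:\Data\testDev.mdf';
GO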

Fashion Catalog Similarity Search using Datastax AstraDB Vector Database

DataStax Astra DB's vector database capabilities can be leveraged to build an efficient fashion catalog similarity search, enabling user...