Data Engineering with Avishkar: 03/01/2016

March 31, 2016

How do you set a default value for a MySQL Datetime column?

How do you set a default value for a MySQL Datetime column?
In SQL Server, it's getdate(). What is the equivalant for MySQL?

The DEFAULT value clause in a data type specification indicates a default value for a column. With one exception, the default value must be a constant; it cannot be a function or an expression. This means, for example, that you cannot set the default for a date column to be the value of a function such as NOW() or CURRENT_DATE. The exception is that you can specify CURRENT_TIMESTAMP as the default for a TIMESTAMP column.

1. Create table

Create table test
(
id integer,
CreatedOn TIMESTAMP DEFAULT CURRENT_TIMESTAMP);

Or get around the problem using a trigger,Create trigger to set value everytime new row is inserted.

CREATE TRIGGER TRIG_test_CreatedOn

BEFORE INSERT
ON test
FOR EACH ROW

SET NEW.CreatedOn = NOW();

2. Existing Datetime column?

CREATE TRIGGER TRIG_test2_CreatedOn

BEFORE INSERT
ON test2
FOR EACH ROW

SET NEW.CreatedOn = NOW();

Or

alter table test2 CHANGE COLUMN CreatedOn CreatedOn TIMESTAMP NOT NULL
DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;

March 17, 2016

QlikView Vs Qlik Sense

QlikView Vs Qlik Sense

Qlik Sense is not just a new release of QlikView. Instead it is something different. There are so many similarities and differences between the two products

The two products have the same analysis engine and old scripts and old formulae will (almost) always work exactly the same way as before. Both the two products both have the same Green-White-Gray logic; both use the same calculation engine; both have roughly the same response times; and you should use the same considerations for both when it comes to data modelling.

Differences

1. QlikView used for Guided Analytics of QlikView and QlikSense used for Self Service BI.
2. QlikSense has a responsive UI on any platform QlikView don't have.

Common string operations in SQL Server 2008

1. Find the last occurence of a character within a string in sql server 2008

DECLARE @StringA varchar(100), @pos int, @FindChar char(1)=','
SET @stringA = 'DirectoryA,'

SET @pos = LEN(@StringA) - CHARINDEX(',',REVERSE(@StringA))
SELECT @pos

2. How to remove special char from a string in sql server 2008

select replace(@stringA,',','')

3. How to remove last occurence from a string in sql server 2008

Select SUBSTRING (@stringA, 1, len(@stringA)-1)

4. How to remove last occurence of comma from a string in sql server 2008

select case

when right(rtrim(@stringA),1) = ',' then substring(rtrim(@stringA),1,len(rtrim(@stringA))-1)

else @stringA END

March 16, 2016

Issues with MS SQL Server Standard edition in AWS

Issues with MS SQL Server Standard edition in AWS

1. Database Mirroring and index maintenance
MS SQL Server Standard edition in AWS allows synchronous mirroring only.In synchronous mirroring, the additional log being generated could overload the network link between the principal and mirror. This will slow down the hardening of regular transactions in the mirror’s transaction log, and thus led to a decrease in transaction throughput on the principal.

Solution: log stream compression

SQL Server 2008 introduces a new feature called "Database Mirroring Log Compression". With SQL Server 2008, the outgoing log stream from the principal to the mirror is compressed, thereby minimizing the network bandwidth used by database mirroring. In a network constrained for bandwidth, compressing the log stream helps improve the application performance and throughput.Log compression compression rates of at least 12.5% are achieved.

2. You can’t cluster in EC2… with SQL Server 2008 R2
You can’t cluster in EC2. There’s no shared storage. EBS volumes can only be attached to a single server at a time. You could share the drives, but you really shouldn’t do that. You could use some Windows based SAN system that mounts drives and pretends it’s a SAN, but without any guarantees of performance or longevity, why would you want to?

HA options : Log shipping, mirroring and transactional replication

3. Scaling SQL Server in EC2: SCALING UP

There’s a finite limit to how much you can scale up SQL Server in EC2. That limit is currently 8 cores and 68.4GB of RAM. Those 8 cores currently are Intel Xeon X5550s that clock in at 2.66 GHz, but that will have changed by the time you’re reading this.

4.Backup or restore database:

Backup or restore database activity on AWS SQL Server standard edition with EBS disks have following issues.
1. Connections timeout
2. I/O requests taking longer than than usual
3. The timeouts was not just from the database being restored, but from all databases as EBS drives get clogged up causing IO to bunch up severely.

Some recommendations:
1. It looks as though the disk reads are maxing out the EBS connection, doublecheck EC2 tuning settings?
2. Choose an EBS–optimized instance that provides more dedicated EBS throughput than your application needs; otherwise, the connection between Amazon EBS and Amazon EC2 can become a performance bottleneck.
You can enable EBS optimization for the other instance types that support EBS optimization when you launch the instances, or enable EBS optimization after the instances are running.

Database Mirroring and index maintenance in AWS

Database Mirroring and index maintenance in AWS

Database Mirroring, the database has to be in the full recovery mode, so all index rebuilds are fully logged. It all comes down to the amount of transaction log generated and whether this causes a problem.

MS SQL Server Standard edition in AWS allows synchronous mirroring only.In synchronous mirroring, the additional log being generated could overload the network link between the principal and mirror. This will slow down the hardening of regular transactions in the mirror’s transaction log, and thus led to a decrease in transaction throughput on the principal.

Basically, the way mirroring works is by shipping all the transaction log from the principal to the mirror, and then replaying the log records on the mirror database. Network bandwidth can be a bottleneck for mirrored databases where the rate of transaction log generation is high enough to saturate the network. On synchronously mirrored databases (where a transaction cannot commit on the principal until all the transaction log for it has been hardened in the log of the mirror database), this can even lead to workload throughput degradation.

The index creation time for a clustered and a nonclustered index for various mirroring safety levels. Creating an index on a large table takes longer with safety FULL compared to safety OFF. The performance impact of synchronous mirroring is more pronounced if you create a clustered index than if you create a nonclustered index. This is because creating a clustered index generates much more transaction log compared to that generated by a nonclustered index.

Performance under database mirroring is highly dependent upon the network performance and the log I/O performance. If either the network performance or the log I/O performance is a bottleneck, the performance of database mirroring for the workload can degrade significantly with the safety FULL operation.

The network becomes a bottleneck for mirroring and it slows down both the maintenance and reduces the workload throughput.

Solution: log stream compression

SQL Server 2008 introduces a new feature called "Database Mirroring Log Compression". With SQL Server 2008, the outgoing log stream from the principal to the mirror is compressed, thereby minimizing the network bandwidth used by database mirroring. In a network constrained for bandwidth, compressing the log stream helps improve the application performance and throughput.

Log compression compression rates of at least 12.5% are achieved.

The downside of this is that compression comes with a cost as extra CPU must be used. The obvious CPU load increases comes from having to compress the transaction log on the principal and then decompress it again on the mirror before hardening it in the log. The non-obvious increase will come because the principal and mirror should end up being able to process more transactions/second, which means more CPU is needed.

This extra CPU could be a problem for some systems that are already pegged in terms of CPU load, and so they may actually see a drop in performance when log stream compression is enabled. Now, it’s on by default when you upgrade to SQL Server 2008 but there is a way to turn it off.

Data Engineering with Avishkar