March 30, 2014

Embedded Business Intelligence

EMBEDDED BI is the integration of reports, dashboards, and analytic views inside an application. The information is typically displayed and managed by a BI platform and is placed directly within the application user interface to improve the context and usability of the data. Use of an embedded BI platform delivers state-of-the-art reporting and analytics without the time and expense of having to build it.

BI is embedded into the operational world and helps users make better decisions in real time, using information that is relevant, timely, and actionable:

  1. Real Time BI - Embedded BI acts on real time data, not time delayed data stored in a separate warehouse or OLAP cube. A key factor in this is the source of the data - it comes from the application (or uses the same source as the application), not a data warehouse or data mart.
  2. Seamless Integration – Users do not want to switch applications between undertaking operational and reporting activities. Integrated security and a consistent look and feel help create seamless integration between the host application and reporting.
  3. End User Centric – Embedded BI is much more end-user focused than traditional BI. With embedded BI you cannot assume that your users have knowledge of both the BI application and the data set being analysed. Embedded BI needs to be significantly easier to use, without training.
Benefits of Embedded BI:

Eckerson intimated that because users of standalone BI solutions are required to exit operational applications in order to access relevant reports, then subsequently re-enter the operational application to take appropriate action based on the intelligence garnered from the BI tool, their productivity is reduced. 

Eckerson said reduced productivity was a result of two key factors:
  • Having to exit, enter and re-enter different applications breaks user “train of thought”; and
  • Having to view analytical information via a separate BI application means the data is not viewed in its optimal context.

These benefits included:
  • Higher perception of BI ease-of-use
  • Higher perception of information relevancy
  • Higher perception of reporting and analytics accessibility
  • Boost BI user adoption: Embedding BI functionality into an existing software application enables users to access and interact with those analytical features within a framework that they are already accustomed to, thereby increasing ease-of-use and lowering resistance to adoption. Not only does replicating the look and feel of the core application reduce barriers to adoption, embedding BI into an existing operations-specific application also ensures the relevancy of the analytics produced for the user base.
  • Boost BI effectiveness: Embedded BI can directly link reporting and analytics capabilities to operational processes, improving the immediacy and relevancy with which users attain data-based insights and helping to link insight directly to action.
  • Support pervasive BI: Embedded BI enables more pervasive use of reporting and analytics – and facilitates and underpins the development of an organizational culture based on fact-based decision-making – because BI insights are delivered via the applications and processes that users already utilize on a regular basis to perform their job. Therefore, embedding analytical capabilities into existing applications and processes is an effective way to deliver BI to a wide range of business departments without having to purchase a standalone BI platform to meet the requirements of each user group.
  • Build a bridge between information and action: By combining analytical and operational functions, embedded BI empowers users with the context they need to understand the relationships between operational processes and business data, enabling them to react faster to emergent internal and external business threats or opportunities.
  • Boost organizational effectiveness and efficiency by facilitating process automation: Embedded BI, directly linked to operational applications, can trigger automated actions and / or alerts that improve or address function-specific business processes (based on pre-determined benchmarks) in drastically reduced timeframes.
  • Enhance the salability and value of your core applications: If you’re a software vendor, adding an analytics module to your core application(s) can significantly increase the salability and value of your product(s).

    March 16, 2014



    Qlik's QlikView product has become a market leader with its capabilities in data discovery, a segment of the BI platform market that it pioneered. QlikView is a self-contained BI platform, based on an in-memory associative search engine and a growing set of information access and query connectors, with a set of tightly integrated BI capabilities.
    • Qlik has embarked on one of the boldest strategies of any vendor to address enterprises' unmet need for a BI platform standard that can fulfill both business users' requirements for ease of use and IT's requirements for enterprise features relating to reusability, data governance and control, scalability, and so on. In the second half of 2014, Qlik plans to release a completely rearchitected product, QlikView.Next, featuring a redesigned interactive visualization user experience called Natural Analytics, to make it easier for users to discover and share new insights. Natural Analytics builds on the company's associative search capability and incorporates enhanced comparisons, collaboration, workflow, sharing and data dialogs, as well as enhanced insights from unique visualization techniques that Qlik acquired from NComVA in June 2013. QlikView.Next will also provide completely rearchitected enterprise server and administration capabilities, including reusable semantic intelligence and modeling that draws on its acquisition of Expressor Software, open APIs for extensibility, expanded data connectivity, and enhanced scalability and security features. By providing both business-user-oriented and IT-friendly capabilities, QlikView.Next has the potential to make Qlik a differentiated and viable enterprise-standard alternative to the incumbent BI players.
    • Customers choose QlikView for the intuitive interactive experience it offers; this is most often deployed in dashboards, where it enables business users to freely explore and find connections, patterns and outliers in data without having to model those relationships in advance. In particular, QlikView's associative search enables users easily to see which query results are related, to compare them, and more importantly to identify which data elements are not related, without having to write complex SQL. Users can also filter data using search capabilities. The percentage of QlikView customers that choose the platform because of its ease of use for end users is in the top two of all the vendors surveyed; an above-average percentage also select QlikView because of its ease of use for developers. QlikView's ease of use is coupled with an above-average score for the complexity of the types of analysis that users can conduct with the platform, and an above-average score for the breadth of functionality used. As a result, Qlik received one of the highest scores for market understanding of any vendor in the Magic Quadrant survey. In common with those of other stand-alone data discovery vendors, Qlik's customers also report achieving above-average business benefits. This powerful combination of advantages has been a key driver of data discovery success for vendors in general, and for Qlik in particular.
    • Qlik's customers also have a positive view of QlikView's composite functional capabilities, which, weighted for use, were rated above the survey average, including above-average individual scores for dashboards, interactive visualization, search-based data discovery (rated No. 1), geospatial intelligence, business user data mashup, collaboration (a score near the top), big data support (also near the top) and mobile BI. As a result of a high degree of satisfaction with its mobile functionality, Qlik has among the highest percentage of users deploying, piloting or planning to deploy mobile capabilities in the next 12 months.
    • Qlik's above-average scores for ease of use for developers, particularly when compared with traditional IT-centric enterprise vendors, have resulted in better-than-average implementation costs, IT developer costs and overall three-year BI platform ownership costs per user. The perception that QlikView offers a relatively low cost of ownership, when compared with other vendors' products, is also evident from the high percentage of customers that choose QlikView because of its implementation cost and associated effort, as well as its TCO.
    • Qlik has been successfully expanding its reach and awareness beyond its traditional stronghold of Europe (it was founded in Sweden) to North America, as well as to the growing regions of Asia/Pacific and Latin America. The partner channel is more important to Qlik than to any other BI platform vendor except Microsoft, particularly in comparison to its stand-alone data discovery competitors. The partner channel will be particularly important to Qlik's growth after the introduction of QlikView.Next, given the expectation that partners will use the platform's planned improved openness to build new QlikView.Next-based solutions.
    • The enterprise-readiness of the current release of QlikView remains a work in progress. Despite QlikView being deployed in multiple departments and around the world, only half the QlikView customers we surveyed identified QlikView as their enterprise standard. This is far below the figures of most other incumbent BI vendors, whose customers report standardization rates of over 70%. QlikView received below-average customer survey scores for enterprise features such as metadata management, BI infrastructure and embeddable analytics. Additionally, customers and implementers continued to express concerns about QlikView's facilities for managing security and administering large numbers of named users. Although user deployment sizes and average data sizes continue to increase, they are around the survey average.
    • Customers most often select QlikView for its ease of use for end users, particularly in terms of its interactive dashboards and when compared with the offerings of the incumbent IT-centric vendors. However, in terms of visual-based interactive exploration and analysis capabilities, user experience, and the time it takes for business users to gain proficiency in authoring, the current QlikView 11.x release is considered more limited than offerings from other stand-alone data discovery vendors. With QlikView.Next, Qlik is placing major emphasis on filling this gap.
    • Qlik plans for QlikView.Next to deliver the combination of business user and IT capabilities that is currently lacking in the market. However, QlikView.Next will be delivered more than a year later than expected, which creates opportunities for its competitors to narrow any gaps. Moreover, no major rearchitecting is without risks to both customers and vendor, especially when the latter is also facing a more intense competitive landscape, as is the case with Qlik. It is not unusual for initial "point versions" of major releases to take time to reach complete stability. In addition, adopting this major new release will require some degree of migration, which could delay some deployments that might otherwise have occurred in 1H14. During the extended period before QlikView.Next's arrival, its competitors are not standing still. Incumbent vendors, stand-alone data discovery players and new market entrants continue aggressively to build and enhance their data discovery features, to innovate and make progress (some quickly) toward narrowing Qlik's "land and expand" potential and, more importantly, toward addressing the big "white space" opportunity (to delight business users while still offering IT control) that Qlik plans to address with QlikView.Next.
    • Qlik's customer experience results remain mixed. QlikView earned positive scores for product quality, which led to an overall above-average customer experience score. However, support scores for QlikView were again just below the survey average. Similarly, sales experience continued to be rated below the survey average. We believe these results are partly influenced by Qlik's rapid growth, since both support and sales proficiency are strongly correlated with employees' length of service; high growth means a larger percentage of relatively new sales and support people. Moreover, Qlik's sales and support organizations are in transition from selling to and supporting departments to selling to and supporting strategic enterprise deployments. A successful transformation on both fronts is critical if Qlik is to fulfill its enterprise aspirations for QlikView.Next.

    Magic Quadrant for Business Intelligence and Analytics

    For this Magic Quadrant, Gartner defines BI and analytics as a software platform that delivers 17 capabilities across three categories: information delivery, analysis and integration.

    Information Delivery
    Reporting: Provides the ability to create highly formatted, print-ready and interactive reports, with or without parameters.
    Dashboards: A style of reporting that graphically depicts performance measures. Includes the ability to publish multi-object, linked reports and parameters with intuitive and interactive displays; dashboards often employ visualization components such as gauges, sliders, checkboxes and maps, and are often used to show the actual value of the measure compared to a goal or target value. Dashboards can represent operational or strategic information.
    Ad hoc report/query: Enables users to ask their own questions of the data, without relying on IT to create a report. In particular, the tools must have a reusable semantic layer to enable users to navigate available data sources, predefined metrics, hierarchies and so on.
    Microsoft Office integration: Sometimes, Microsoft Office (particularly Excel) acts as the reporting or analytics client. In these cases, it is vital that the tool provides integration with Microsoft Office, including support for native document and presentation formats, formulas, charts, data "refreshes" and pivot tables. Advanced integration includes cell locking and write-back.
    Mobile BI: Enables organizations to develop and deliver content to mobile devices in a publishing and/or interactive mode, and takes advantage of mobile devices' native capabilities, such as touchscreen, camera, location awareness and natural-language query.

    Analysis
    Interactive visualization: Enables the exploration of data via the manipulation of chart images, with the color, brightness, size, shape and motion of visual objects representing aspects of the dataset being analyzed. This includes an array of visualization options that go beyond those of pie, bar and line charts, including heat and tree maps, geographic maps, scatter plots and other special-purpose visuals. These tools enable users to analyze the data by interacting directly with a visual representation of it.
    Search-based data discovery: Applies a search index to structured and unstructured data sources and maps them into a classification structure of dimensions and measures that users can easily navigate and explore using a search interface. This is not the ability to search for reports and metadata objects, which would be a basic feature of a BI platform.
    Geospatial and location intelligence: Specialized analytics and visualizations that provide a geographic, spatial and time context. Enables the ability to depict physical features and geographically referenced data and relationships by combining geographic and location-related data from a variety of data sources, including aerial maps, GISs and consumer demographics, with enterprise and other data. Basic relationships are displayed by overlaying data on interactive maps. More advanced capabilities support specialized geospatial algorithms (for example, for distance and route calculations), as well as layering of geospatial data on to custom base maps, markers, heat maps and temporal maps, supporting clustering, geofencing and 3D visualizations.
    Embedded advanced analytics: Enables users to leverage a statistical functions library embedded in a BI server. Included are the abilities to consume common analytics methods such as Predictive Model Markup Language (PMML) and R-based models in the metadata layer and/or in a report object or analysis to create advanced analytic visualizations (of correlations or clusters in a dataset, for example). Also included are forecasting algorithms and the ability to conduct "what if?" analysis.
    Online analytical processing (OLAP): Enables users to analyze data with fast query and calculation performance, enabling a style of analysis known as "slicing and dicing." Users are able to navigate multidimensional drill paths. They also have the ability to write-back values to a database for planning and "what if?" modeling. This capability could span a variety of data architectures (such as relational, multidimensional or hybrid) and storage architectures (such as disk-based or in-memory).

    Integration
    BI infrastructure and administration: Enables all tools in the platform to use the same security, metadata, administration, object model and query engine, and scheduling and distribution engine. All tools should share the same look and feel. The platform should support multitenancy.
    Metadata management: Tools for enabling users to leverage the same systems-of-record semantic model and metadata. They should provide a robust and centralized way for administrators to search, capture, store, reuse and publish metadata objects, such as dimensions, hierarchies, measures, performance metrics/key performance indicators (KPIs), and report layout objects, parameters and so on. Administrators should have the ability to promote a business-user-defined data mashup and metadata to the systems-of-record metadata.
    Business user data mashup and modeling: Code-free, "drag and drop," user-driven data combination of different sources and the creation of analytic models, such as user-defined measures, sets, groups and hierarchies. Advanced capabilities include semantic autodiscovery, intelligent joins, intelligent profiling, hierarchy generation, data lineage and data blending on varied data sources, including multistructured data.
    Development tools: The platform should provide a set of programmatic and visual tools and a development workbench for building reports, dashboards, queries and analysis. It should enable scalable and personalized distribution, scheduling and alerts of BI and analytics content via email, to a portal and to mobile devices.
    Embeddable analytics: Tools including a software developer's kit with APIs for creating and modifying analytic content, visualizations and applications, embedding them into a business process, and/or an application or portal. These capabilities can reside outside the application, reusing the analytic infrastructure, but must be easily and seamlessly accessible from inside the application, without forcing users to switch between systems. The capabilities for integrating BI and analytics with the application architecture will enable users to choose where in the business process the analytics should be embedded.
    Collaboration: Enables users to share and discuss information, analysis, analytic content and decisions via discussion threads, chat and annotations.
    Support for big data sources: The ability to support and query hybrid, columnar and array-based data sources, such as MapReduce and other NoSQL databases (graph databases, for example). Support could include direct Hadoop Distributed File System (HDFS) query or access to MapReduce through Hive.
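
The OLAP capability above describes "slicing and dicing". As a rough illustration only, the sketch below treats a slice as fixing one dimension to a single value and then totals a measure over the remaining dimensions. The fact rows, dimension names and function names are all hypothetical, and a real OLAP engine would do this against a cube or columnar store rather than a Python list.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, revenue)
SALES = [
    ("EMEA", "widget", "Q1", 120),
    ("EMEA", "gadget", "Q1", 80),
    ("APAC", "widget", "Q1", 60),
    ("EMEA", "widget", "Q2", 150),
    ("APAC", "gadget", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")


def slice_rows(rows, dimension, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    idx = DIMENSIONS.index(dimension)
    return [r for r in rows if r[idx] == value]


def aggregate(rows, *dims):
    """Total the revenue measure for each combination of the given dimensions."""
    idxs = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[i] for i in idxs)] += r[3]
    return dict(totals)


# Revenue by region across all quarters.
by_region = aggregate(SALES, "region")
# Revenue by product, restricted to the Q1 slice.
q1_by_product = aggregate(slice_rows(SALES, "quarter", "Q1"), "product")
```

Drilling down a multidimensional path amounts to calling `aggregate` with progressively more dimensions, for example `aggregate(SALES, "region", "product")`.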

    Apache Cassandra


    The Apache Cassandra database is the right choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

    Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching.

    Apache Cassandra is a highly scalable and high-performance distributed database
    management system that can serve as both an operational datastore (the “system of record”) for
    online/transactional applications, and as a read-intensive database for business intelligence
    systems. Cassandra is able to manage the distribution of data across multiple data centers and
    offers incremental scalability with no single points of failure.
    Cassandra is a logical choice for enterprises that need high degrees of uptime, reliability, and
    very fast performance.
    Cassandra was originally incubated at Facebook and is based upon Google’s BigTable and
    Amazon’s Dynamo software. The end result is an extremely scalable and fault-tolerant data
    infrastructure that solves small to big data problems, handles write intensive user traffic, delivers
    sub-millisecond caching layer reads, and supports demanding workloads involving petabytes of data.

    Cassandra Architecture

    Cassandra is a peer-to-peer distributed data management system where every
    node is essentially the same with respect to how it functions in the cluster. In Cassandra, there is
    no concept of a “master node” or anything similar, with the benefit that no single
    point of failure exists for any key process or function.

    The scale-out aspect of Cassandra allows node additions to occur with no disruption to
    application uptime. Cassandra automatically partitions data across nodes once one or more
    nodes have been added to a cluster and “seeds” the new nodes from existing machines in the
    cluster. Data redundancy to protect against hardware failure and other data loss scenarios is
    also built into and managed transparently by Cassandra.

    An administrator, architect, or developer only has to specify a replication and data-partitioning
    strategy. From there, Cassandra takes care of everything.
    All nodes in the cluster communicate with each other through the gossip protocol. If a node goes
    down, the cluster detects the failure and automatically routes user requests away from the failed
    machine. Once the failed node is operational again, it rejoins the cluster, and its data is brought
    back up to date via the other nodes.
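
The partitioning and replication described above can be pictured with a small consistent-hashing sketch. This is an illustration under stated assumptions, not Cassandra's implementation: the `Ring` class, the node names and the use of MD5 to derive a token are hypothetical choices, and replicas are simply placed on the next nodes clockwise around the ring.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring; assumes at least one node is present."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        """Scale out: the new node takes over one arc of the ring."""
        bisect.insort(self._ring, (token(node), node))

    def replicas(self, row_key):
        """Walk clockwise from the key's token and collect the replica nodes."""
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_right(tokens, token(row_key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
before = ring.replicas("jsmith")
ring.add_node("node-e")
after = ring.replicas("jsmith")
```

Because every node runs the same lookup, no master is needed to locate data, and adding `node-e` can only change a key's replica set by bringing the new node into it.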

    Why Cassandra
    • MySQL drives too many random I/Os
    • File-based solutions require far too many locks
    The new face of data
    • Scale out, not up
    • Online load balancing, cluster growth
    • Flexible schema
    • Key-oriented queries
    • CAP-aware

    CQL Language

    CQL provides a syntax very similar to SQL, the query language used by relational databases, making it
    very easy for developers and administrators coming from the relational world to begin working with Cassandra.

    DDL, DML, and SELECT functionality all can be found in CQL.

    cqlsh> CREATE TABLE monkeySpecies (
        species text PRIMARY KEY,
        common_name text,
        population varint,
        average_size int
    ) WITH comment='Important biological records'
       AND read_repair_chance = 1.0;
    CREATE TABLE timeline (
        userid uuid,
        posted_month int,
        posted_time uuid,
        body text,
        posted_by text,
        PRIMARY KEY (userid, posted_month, posted_time)
    ) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

    cqlsh> INSERT INTO users (KEY, password) VALUES ('jsmith', 'ch@ngem3a');

    cqlsh> SELECT * FROM users
    ... WHERE gender='f' AND
    ... state='TX' AND
    ... birth_year='1968';

    Batched Operations
    Cassandra supports tunable consistency on a per-operation basis, meaning developers can
    choose how strong or loose they want data consistency to be for a particular request. If a
    developer wants to apply a certain consistency level to a number of different requests, he or she
    can wrap them between BEGIN BATCH and APPLY BATCH statements.

    BEGIN BATCH
      INSERT INTO users (KEY, password) VALUES ('user1', 'mypass');
      UPDATE users SET password = 'newpass' WHERE KEY = 'user1';
      INSERT INTO users (KEY, password) VALUES ('user2', 'user2pass');
      DELETE name FROM users WHERE KEY = 'user5';
    APPLY BATCH;

    Batched operations allow a developer to retry (if necessary) a group of changes in an idempotent manner.

    Cassandra highlights

    • High availability
    • Incremental scalability
    • Eventually consistent
    • Tunable tradeoffs between consistency and latency
    • Minimal administration
    • No SPOF (single point of failure)
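
The "tunable tradeoffs between consistency and latency" highlight can be made concrete with the usual quorum arithmetic: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The sketch below only encodes that rule; the function names are hypothetical, and ONE, QUORUM and ALL are used as the familiar consistency level names.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3  # assumed replication factor
levels = {"ONE": 1, "QUORUM": quorum(N), "ALL": N}

# Evaluate every write/read pairing of the three levels.
matrix = {
    (w_name, r_name): reads_see_latest_write(N, w, r)
    for w_name, w in levels.items()
    for r_name, r in levels.items()
}
```

Writing and reading at ONE contacts the fewest replicas and so favours latency, but the sets need not overlap; QUORUM for both costs an extra replica round trip and guarantees overlap.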

    Applications suitable to use Cassandra 

  • Dispersed applications that need to serve numerous geographies with the same fast response times
  • Web online applications or other systems needing around-the-clock transactional input capabilities
  • Applications needing extreme degrees of uptime and no single point of failure
  • Applications that need easy data elasticity, so capacity can be added to service peak workloads for various periods of time and then shrink back when user traffic reduction allows, all done in an online fashion
  • Write-intensive applications that must take in large volumes of data continuously: card systems, music download purchases, device/sensor data, web clickstream data, archiving systems, event logging
  • Management of large data volumes (terabytes-petabytes) that must be kept online for query access and business intelligence processing
  • Systems that need to store and directly deal with a combination of structured, unstructured, and semi-structured data, with a requirement for a flexible schema/data storage paradigm that allows for easy and online structure modifications

    February 15, 2014

    Essential Guidelines for Choosing a Business Intelligence Solution

    Implementing a Business Intelligence (BI) solution leads to dramatic operational improvements and benefits that far outweigh the investments in time, money, and personnel necessary to select, deploy, and maintain such an application.
    The correct tool should fit the company like a glove. It must be, amongst other things, adaptable, scalable, secure, and affordable, and it must be able to report on a multitude of business trends. It should inform business practices and identify opportunities that allow the end user to make educated decisions.
    Consolidation of data is the main driver in the development of business intelligence. Data deduplication is a growing necessity due to the increase in mobility. Organizations must now deal with increasing demands, higher degrees of accuracy, firmer controls and tighter timelines. Companies must find a way to digest large data sets. Automated reporting and notifications are essential for staying on top of market trends. The right business intelligence solution will include the ability to report, plan, monitor and analyze volumes of data. In turn, this will allow for more strategic and informed business decisions. The information can be processed and distributed on a company-wide basis with even the simplest of templates and visuals. In this modern age, it is essential for all employees to engage in the analysis of information to some extent. Reporting capabilities should be flexible and display actionable data sets. The ultimate objective is to develop long-term goals based on the data produced through business intelligence.
    The Business Intelligence Technology Stack
    To build a Business Intelligence solution, enterprises will need to consider new investments and upgrades to current technology to build out the BI technology stack.
    Storage and computing hardware - To implement BI, firms will need to invest or upgrade their data storage infrastructure.
    Applications and data sources - To develop an effective BI solution, source data will need to be scrubbed and organized. The challenge is that source data can come from any number of applications, most using proprietary data formats and application-specific data structures.
    Data integration - Extraction, transformation and loading (ETL) tools pull data from multiple sources and load it into a data warehouse. The trend in data integration, and in Enterprise Application Integration in general, is toward standardization through XML and web services.

    Relational databases and data warehouses - Firms will need a data warehouse to store and organize tactical or historical information in a relational database. Organizing data in this way allows the user to extract and assemble specific data elements from a complete dataset to perform a variety of analyses.

    OLAP applications and analytic engines - Online analytical processing (OLAP) applications provide a layer of separation between the storage repository and the end user's analytic application of choice. Their role is to perform special analytical functions that require high-performance processing power and more specialized analytical skills.

    Analytic applications - Analytic applications are the programs used to run queries against the data to perform either "slice-and-dice" analysis of historical data or more predictive analyses, often referred to as "drill-down" analysis. For example, a customer intelligence application might enable a historical analysis of customer orders and payment history. Alternatively, users could drill down to understand how changing a price might affect future sales in a specific region.

    Information presentation and delivery products - The results of a query can be returned to the user in a variety of ways. Many tools provide presentation through the analytic application itself and offer dashboard formats to aggregate multiple queries. Also, enterprises can purchase packaged or custom reporting products, such as Crystal Reports. An important trend in BI presentation is leveraging XML to deliver analyses through a portal or any other Internet-enabled interface, such as a personal digital assistant (PDA).
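
The data integration layer of the stack above can be sketched as three small functions. This is a minimal illustration, assuming a CSV export as the source and an in-memory SQLite database standing in for the data warehouse; the column names, sample rows and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source would be an application export or API.
SOURCE_CSV = """order_id,customer,amount,currency
1001, Acme Corp ,250.00,usd
1002,Globex,,usd
1003,acme corp,99.50,USD
"""


def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and normalize names and codes."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # scrub incomplete records
        clean.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            float(row["amount"]),
            row["currency"].strip().upper(),
        ))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
).fetchall()
```

Normalizing the customer name during the transform step is what lets the final query treat " Acme Corp " and "acme corp" as one customer.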
    1. Is the package a complete solution?
    Complete BI solution is how quickly your own data can be leveraged to produce reporting, analytics, visualization, and easy integration with a variety of disparate data sources.  It should go beyond to predefined queries to ad hoc queries and dynamic selection by user.
    2.  Is the solution easy to use and administer?
    It’s important that ongoing use of your selected system minimizes the amount of IT involvement required. One of the biggest burdens on IT is the mandatory data warehouses necessitated by many BI solutions
    3. Investment in hardware and resources
    Can we use existing technology, hardware, people resources.
    Given the investment in hardware, software, training, and opportunity costs, it’s vital that the selected technology offer as short a path to productivity as possible. To attain a sufficient ROI, users will need to derive value very quickly.
    4.Will the solution scale?
    Even the most successful BI implementation can run into difficulties when faced with increased data volumes and usage loads. Workloads increase for a number of reasons, including natural data volume growth, selecting additional dimensions for analysis, and incorporating new data sources.
    5. Self-Service BI Is About More Than Interactivity - Organizations constantly struggle with their data. Integrating, managing, and verifying data sources are continuous exercises required for businesses looking for ways to increase their competitive advantage and understand what is occurring within their organization’s daily operations.
    Why Do So Many BI Projects Fail?
    1. A confusing product landscape - Buyers confuse their requirements, for example reporting versus analytics.
    2. BI cost model - Complicated cost models make it hard to estimate a return on investment for the substantial BI outlays.
    3. Operational limitations - Some BI software requires very lengthy deployments, and users often complain of bloated feature sets, static reports and queries, and built-in dependence on IT when it’s time to modify or expand the inventory of queries and reports. On the other hand, point solutions regularly suffer from considerable amounts of missing functionality.

    December 16, 2013

    Database supporting different languages

    Many global businesses want to expand worldwide, and to do so they serve customers in different languages such as Chinese, Japanese, Korean, and Arabic. Many websites now support international languages to attract more customers, which makes life easier for both parties.
    To store this customer data, the database must provide a mechanism for storing international characters. Storing these characters is not easy, and many database vendors have had to revise their strategies and come up with new mechanisms to support them. Big vendors such as Oracle, Microsoft, and IBM provide international character support so that the data can be stored and retrieved correctly, avoiding hiccups when doing business with international customers.
    The difference in storing character data between Unicode and non-Unicode depends on whether non-Unicode data is stored by using double-byte character sets. All non-East Asian languages and the Thai language store non-Unicode characters in single bytes. Therefore, storing these languages as Unicode uses two times the space that is used specifying a non-Unicode code page. On the other hand, the non-Unicode code pages of many other Asian languages specify character storage in double-byte character sets (DBCS). Therefore, for these languages, there is almost no difference in storage between non-Unicode and Unicode.
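
    The storage difference described above can be checked with a short Python sketch. It is only an illustration: it assumes UTF-16 as the two-byte Unicode encoding and uses the Windows code pages cp1252 (Western European), cp874 (Thai), and cp932 (Japanese, a DBCS) as stand-ins for the non-Unicode code pages. The sample strings are arbitrary.

```python
# Compare the byte size of the same text in a legacy code page and in UTF-16.
# The code pages and sample strings below are illustrative assumptions.
SAMPLES = {
    "English": ("hello", "cp1252"),
    # Thai greeting, 6 code points, written with escapes to keep the source ASCII-only
    "Thai": ("\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35", "cp874"),
    # Japanese word for "Japanese language", 3 characters, double-byte in cp932
    "Japanese": ("\u65e5\u672c\u8a9e", "cp932"),
}


def byte_sizes(text, code_page):
    """Return (bytes in the non-Unicode code page, bytes in UTF-16)."""
    return len(text.encode(code_page)), len(text.encode("utf-16-le"))


for label, (text, code_page) in SAMPLES.items():
    legacy, utf16 = byte_sizes(text, code_page)
    print(f"{label}: {legacy} bytes in {code_page}, {utf16} bytes in UTF-16")
```

    The single-byte languages double in size when stored as UTF-16, while the double-byte Japanese sample takes the same space either way.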

    Collation itself specifies the rules for how strings of character data are sorted and compared. The rules for sorting data vary depending on the language and locale.
    For example, if you were to use a Lithuanian collation, the letter "Y" would appear between "I" and "J" when sorted. With the traditional Spanish collation, "ch" would be sorted at the end of a list of words beginning with "c".

    You can specify collations in the following situations:
    1. Creating or altering a database.
    2. Creating or altering a table column.
    You can specify collations for each character string column using the COLLATE clause of the CREATE TABLE or ALTER TABLE statement. You can also specify a collation when you create a table using SQL Server Management Studio. If you do not specify a collation, the column is assigned the default collation of the database.
    3. Casting the collation of an expression.
    You can use the COLLATE clause to apply a character expression to a certain collation. Character literals and variables are assigned the default collation of the current database. Column references are assigned the definition collation of the column.
    4. Restoring or attaching a database.
    When restoring or attaching a database, the default collation of the database and the collation of any char, varchar, and text columns or parameters in the database must be supported by the operating system.

    The COLLATE clause can be applied only for the char, varchar, text, nchar, nvarchar, and ntext data types.

    You can execute the system function fn_helpcollations to retrieve a list of all the valid collation names for Windows collations and SQL Server collations:
    SELECT name, description
    FROM fn_helpcollations();

    Use of nchar, nvarchar, nvarchar(max), and ntext is the same as char, varchar, varchar(max), and text, respectively, except:
    • Unicode supports a wider range of characters.
    • More space is needed to store Unicode characters.
    • The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar.
    • Unicode constants are specified with a leading N, for example, N'A Unicode string'.

    For example, a database can be set up to accept Russian characters by using the Cyrillic_General_CI_AS collation. CI stands for case insensitive and AS stands for accent sensitive.
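
    As a rough analogy only, the difference between case insensitivity and accent sensitivity can be sketched in Python. This is not SQL Server's comparison algorithm. The helper names are hypothetical, and the sketch relies on Unicode case folding and normalization from the standard library.

```python
import unicodedata


def equal_ci_as(a, b):
    """Case-insensitive, accent-sensitive equality (loosely like a CI_AS collation)."""
    def fold(s):
        return unicodedata.normalize("NFC", s).casefold()
    return fold(a) == fold(b)


def strip_accents(s):
    """Drop combining marks after decomposing the string (NFD)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_ci_ai(a, b):
    """Case-insensitive, accent-insensitive equality (loosely like CI_AI)."""
    return equal_ci_as(strip_accents(a), strip_accents(b))


# Case differences are ignored, accent differences are not.
print(equal_ci_as("\u041c\u043e\u0441\u043a\u0432\u0430", "\u041c\u041e\u0421\u041a\u0412\u0410"))
print(equal_ci_as("resume", "r\u00e9sum\u00e9"))
print(equal_ci_ai("resume", "r\u00e9sum\u00e9"))
```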

    April 01, 2013

    Amazon RDS for Microsoft SQL Server

    Amazon RDS frees you up to focus on application development by managing time-consuming database administration tasks including provisioning, backups, software patching, monitoring, and hardware scaling.

    You can run Amazon RDS for SQL Server under two different licensing models – “License Included” and “License Mobility through Software Assurance (or Bring Your Own License – BYOL)”.
    "License Included" pricing starts at $0.035 per hour and is inclusive of software, underlying hardware resources, and Amazon RDS management capabilities.

    Microsoft’s License Mobility program allows customers who already own SQL Server licenses to run SQL Server deployments on Amazon RDS.

    Amazon RDS for SQL Server DB Instances can be provisioned with either standard storage or Provisioned IOPS storage. Amazon RDS Provisioned IOPS is a storage option designed to deliver fast, predictable, and consistent I/O performance, and is optimized for I/O-intensive, transactional (OLTP) database workloads.

    Amazon Web Services

    Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. Amazon Elastic Block Store (EBS) provides persistent storage to Amazon EC2 instances.

    Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.

    Amazon RDS is a web service that makes it easier to set up, operate, and scale a relational database in the cloud.

    Amazon DynamoDB is a high performance non-relational database service that is easy to set up, operate, and scale. It is designed to address the core problems of database management, performance, scalability, and reliability. It also provides predictable high performance and low latency at scale.

    Amazon SimpleDB is a web service providing the core database functions of data indexing and querying in the cloud.

    Amazon Simple Queue Service (Amazon SQS) offers a reliable, highly scalable, hosted queue in the cloud.

    Amazon Simple Email Service (Amazon SES) is a highly scalable and cost-effective bulk and transactional email-sending service for businesses and developers.

    Amazon Glacier is an extremely low-cost storage service that provides secure and durable storage for data archiving and backup. It is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable.

    Amazon CloudFront is a web service for content delivery. It delivers your content using a global network of edge locations and works seamlessly with Amazon S3 which durably stores the original, definitive versions of your files.

    Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. It is protocol-compliant with Memcached, so code, applications, and tools that you use today with your existing Memcached environments work seamlessly with the service.

    Amazon CloudWatch is a web service that enables you to monitor your Amazon EC2 instances, Amazon EBS volumes, Elastic Load Balancers, and Amazon RDS database instances in real-time. You can also supply your own custom application metrics. With Amazon CloudWatch you can access up-to-the-minute statistics, view graphs, and set alarms for your metric data.

    Amazon Virtual Private Cloud (Amazon VPC) is a secure and seamless bridge between a company's existing IT infrastructure and the AWS cloud.

    Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.

    AWS Import/Export transfers large amounts of data directly onto and off of storage devices using Amazon's high-speed internal network and bypassing the Internet.

    March 02, 2013

    Storing Log Data using MongoDB

    This post outlines the basic patterns and principles for using MongoDB as a persistent storage engine for log data from servers and other machine data. Servers generate a large number of events (i.e. logging) that contain useful information about their operation, including errors, warnings, and user behavior. By default, most servers store these data in plain-text log files on their local file systems. While plain-text logs are accessible and human-readable, they are difficult to use, reference, and analyze without holistic systems for aggregating and storing these data.

    1. Schema Design
    The schema for storing log data in MongoDB depends on the format of the event data that you’re storing. The preferred approach is to extract the relevant information from the log data into individual fields in a MongoDB document. When you extract data from the log into fields, pay attention to the data types you use to render the log data into MongoDB. Using proper types for your data also increases query flexibility: if you store dates as timestamps you can make date range queries, whereas it’s very difficult to compare two strings that represent dates. The same issue holds for numeric fields; storing numbers as strings requires more space and is difficult to query. When extracting data from logs and designing a schema, also consider what information you can omit from your log tracking system. In most cases there’s no need to track all data from an event log, and you can omit other fields.
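
    The extraction step might look like the following sketch. It assumes an Apache-style access log line, which is only one possible input format, and the field names are illustrative. The resulting dictionary uses a datetime for the timestamp and integers for numeric values, so it could be stored as a typed document.

```python
import re
from datetime import datetime

# Hypothetical sample line in Apache "common" log format (an assumption).
LINE = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'

PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Turn one log line into a dict with properly typed fields."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError("unrecognized log line")
    size = match.group("size")
    return {
        "host": match.group("host"),
        "user": match.group("user"),
        # A real datetime supports range queries; a string would not.
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "request": match.group("request"),
        # Numbers are stored as integers rather than strings.
        "status": int(match.group("status")),
        "response_size": None if size == "-" else int(size),
    }


event = parse_line(LINE)
print(event)
```

    Fields that are not needed for later analysis can simply be left out of the returned dictionary.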

    2. System Architecture
    Insertion speed is the primary performance concern for an event logging system. At the same time, the system must be able to support flexible queries so that you can return data from the system efficiently.
    MongoDB has a configurable write concern. This capability allows you to balance the importance of guaranteeing that all writes are fully recorded in the database with the speed of the insert.
    For example, if you issue writes to MongoDB and do not require that the database issue any response, the write operations will return very fast (i.e. asynchronously), but you cannot be certain that all writes succeeded.
    The following command will insert the event object into the events collection.
    >>> db.events.insert(event, w=0)
    By setting w=0, you do not require that MongoDB acknowledge receipt of the insert. Although very fast, this is risky because the application cannot detect network and server failures. See the MongoDB documentation on write concern for more information.

    Conversely, if you require that MongoDB acknowledge every write operation, the database will not return as quickly, but you can be certain that every item will be present in the database.
    In this case, pass the w=1 argument as follows:
    >>> db.events.insert(event, w=1)

    Finally, if you have extremely low tolerance for event data loss, you can require that MongoDB replicate the data to multiple secondary replica set members before returning:
    >>> db.events.insert(event, w='majority')

    Eventually your system’s events will exceed the capacity of a single event logging database instance. In these situations you will want to use a sharded cluster, which takes advantage of MongoDB’s sharding functionality.
    In a sharded environment the limitations on the maximum insertion rate are:
    • the number of shards in the cluster.
    • the shard key you chose.
    Because MongoDB distributes data using “ranges” (i.e. chunks) of keys, the choice of shard key can control how MongoDB distributes data and the resulting system’s capacity for writes and queries.
    Shard key choices:
    • Shard by Time
    • Shard by a Semi-Random Key
    • Shard by an Evenly-Distributed Key in the Data Set
    • Shard by Combining a Natural and Synthetic Key
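
    A toy model can show why the shard key matters for insert capacity. The sketch below is not MongoDB's chunk splitting or balancing logic. It assumes four shards and compares range partitioning on a timestamp with a hash of the same value standing in for a semi-random key. The function names and numbers are hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4    # assumed cluster size
EVENTS = 10_000   # events written during a short burst


def shard_by_time(ts, start, end):
    """Range partitioning: each shard owns one contiguous slice of the time range."""
    span = (end - start) / NUM_SHARDS
    return min(int((ts - start) / span), NUM_SHARDS - 1)


def shard_by_hash(key):
    """Semi-random key: hash the value and take it modulo the shard count."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Key ranges cover a whole day (in seconds), but new log events all carry
# timestamps near "now", here the last few minutes of that day.
day_start, day_end = 0, 86_400
recent = [86_000 + i * 0.01 for i in range(EVENTS)]

time_load = Counter(shard_by_time(ts, day_start, day_end) for ts in recent)
hash_load = Counter(shard_by_hash(ts) for ts in recent)

print("by time:", dict(time_load))
print("by hash:", dict(hash_load))
```

    In this model every insert lands on the last shard when sharding by time, while the hashed key spreads the same inserts across all four shards. The trade-off is that a hashed key no longer keeps events from the same time window together for range queries.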

    Choosing a Mobile BI Solution

    Whether they are helping remote employees manage supply chains more efficiently or keeping traveling executives informed of the latest financial developments, today’s mobile ad hoc reporting solutions provide the dynamic capabilities organizations need to stay competitive and drive innovation in the field.
    While working in the field used to mean relying on static data, today’s mobile BI solutions offer the ability to generate interactive reports with in-depth analytic functionality.

    • Solutions that provide a unified user experience across all devices are the most suitable.
    • Rather than relying on static data, users should be able to use real-time updates to inform their decisions.
    • A mobile BI solution should facilitate sharing reports, both over wireless networks and in person.
    • Users may need to access mobile BI solutions from remote locations where internet connectivity is low or absent, or on a plane. While a lack of connectivity prohibits real-time updates, a good mobile BI offering should have some form of reliable offline access to recent and saved reports so that employees can tap into data-driven insights.

    February 27, 2013

    NoSQL Key-Value Store

    Basic terminology:
    • Key-Value Store – data is stored in unstructured records consisting of a key + the values associated with that record
    • NoSQL – Doesn’t use SQL commands
    Let’s say you’ve got millions of data records, as you might if, for example, millions of users visit your website.
    Each record looks like a “row” in a database table. Note that not every user has the same information: some users will have a username, some will only have an email address, some users will have provided their name and others will not. Each record has a different length and different values.
    To store this kind of data, you create a key for each record and then store whatever fields are available as bins (what would be columns in a structured database), where each bin consists of a name and a value. Then you create a bin for each piece of data you have. If you don’t have a particular piece of data, you don’t keep a blank field (as in a relational table); you simply don’t store a bin for that data.
    This type of database is called a Key-Value Store because each record has a primary key and a collection of values (bins).  It’s also called a Row Store because all of the data for a single record is stored together, in something that we can think of conceptually as a row.

    Example of unstructured data for user records:

    Key: 1 ID:av First Name: Avishkar

    Key: 2 Email: Location: Mumbai Age: 37

    Key: 3  Facebook ID: avishkarmeshram  Password: xxx  Name: Avishkar
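
    The same records can be sketched in Python as a dictionary of dictionaries, where each inner dictionary holds only the bins that exist for that key. This is an in-memory illustration, not a real key-value store client. The helper names are hypothetical, and the email bin of record 2 is left out because its value is not shown above.

```python
# Each record is a key plus whatever bins (name/value pairs) exist for it.
store = {
    1: {"ID": "av", "First Name": "Avishkar"},
    2: {"Location": "Mumbai", "Age": 37},
    3: {"Facebook ID": "avishkarmeshram", "Password": "xxx", "Name": "Avishkar"},
}


def get_bin(key, name, default=None):
    """Look up one bin of one record; a missing bin is simply absent."""
    return store.get(key, {}).get(name, default)


def put_bin(key, name, value):
    """Add or overwrite a single bin, creating the record if needed."""
    store.setdefault(key, {})[name] = value


put_bin(1, "Age", 30)  # hypothetical value, added only to show a new bin

for key, bins in store.items():
    print(key, sorted(bins))
```

    No record carries placeholder values for bins it does not have, so records differ in both length and content.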

    Data is organized into policy containers called ‘namespaces’, semantically similar to ‘databases’ in an RDBMS system. Namespaces are configured when the cluster is started, and are used to control retention and reliability requirements for a given set of data. 
    Within a namespace, data is subdivided into ‘sets’ (similar to ‘tables’) and ‘records’ (similar to ‘rows’). Each record has an indexed ‘key’ that is unique in the set, and one or more named ‘bins’ (similar to columns) that hold values associated with the record.
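
    The hierarchy described above can be modeled with a small in-memory class. This is a conceptual sketch under stated assumptions, not the API of any particular product. The class name, method names, and the sample namespace and set names are all hypothetical.

```python
class ToyStore:
    """In-memory stand-in for a namespace -> set -> record hierarchy."""

    def __init__(self, namespaces):
        # Namespaces are fixed up front, mirroring configuration at cluster start.
        self._data = {name: {} for name in namespaces}

    def put(self, namespace, set_name, key, bins):
        """Create or update the record at (namespace, set, key) with the given bins."""
        records = self._data[namespace].setdefault(set_name, {})
        records.setdefault(key, {}).update(bins)

    def get(self, namespace, set_name, key):
        """Return the record's bins, or None if no such record exists."""
        return self._data[namespace].get(set_name, {}).get(key)


db = ToyStore(namespaces=["users_ns"])
db.put("users_ns", "profiles", 1, {"Name": "Avishkar"})
db.put("users_ns", "profiles", 1, {"Location": "Mumbai"})  # same key: bins merge
db.put("users_ns", "sessions", 1, {"Device": "mobile"})    # same key, different set

print(db.get("users_ns", "profiles", 1))
print(db.get("users_ns", "sessions", 1))
```

    A key only has to be unique within its set, so key 1 can exist in both sets without conflict.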

    Indexes (primary keys) are stored in DRAM for ultra-fast access and values can be stored either in DRAM or more cost-effectively on SSDs. Each namespace can be configured separately, so small namespaces can take advantage of DRAM and larger ones gain the cost