October 12, 2011

Business Analytics: Good Data and Poor Data

Today, most organizations use data in two ways: transactional/operational use (“running the business”) and analytic use (“improving the business”).

Good business demands good data.
Business analytics can provide amazing insights into how an organization is operating: in hindsight, with insight, and with foresight. But one must be attentive to the quality of the data being analyzed and put first things first. Step one is to check the validity of the data and ensure its quality and completeness. Step two is to ask the key questions that help provide the information needed to make informed decisions. Internal auditors, armed with analytic technologies of their own, can provide a huge amount of assistance in determining data quality and addressing the risk of drawing incorrect conclusions based on bad data.
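As a rough illustration of step one, the sketch below checks a handful of records for completeness and validity before any analysis is run. The records, the field names, and the rules (a parseable date, a non-negative amount) are assumptions made up for this example, not a prescribed standard.

```python
from datetime import datetime

# Hypothetical records; the field names are illustrative assumptions.
records = [
    {"customer_id": "C001", "email": "ann@example.com",
     "order_date": "2011-09-30", "amount": "120.50"},
    {"customer_id": "C002", "email": "",
     "order_date": "2011-10-01", "amount": "75.00"},
    {"customer_id": "C003", "email": "bob@example.com",
     "order_date": "2011-13-01", "amount": "-20.00"},
]

REQUIRED_FIELDS = ("customer_id", "email", "order_date", "amount")


def check_record(record):
    """Return a list of data quality problems found in one record."""
    problems = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    # Validity: the date must parse as YYYY-MM-DD.
    if record.get("order_date"):
        try:
            datetime.strptime(record["order_date"], "%Y-%m-%d")
        except ValueError:
            problems.append("invalid order_date")
    # Validity: the amount must be a non-negative number.
    if record.get("amount"):
        try:
            if float(record["amount"]) < 0:
                problems.append("negative amount")
        except ValueError:
            problems.append("invalid amount")
    return problems


# Map each customer_id to the problems found in its record.
report = {r["customer_id"]: check_record(r) for r in records}
for customer_id, problems in report.items():
    print(customer_id, problems or "ok")
```

A report like this can be reviewed before step two, so that questions are only asked of data that has passed the basic checks.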

Any enterprise risk management program benefits from incorporating processes for assessing, measuring, reporting, reacting to, and controlling the risks associated with poor data quality.

Data quality is a critical prerequisite to effective business analytics. Poor data quality jeopardizes the performance and efficiency of operational systems. It undermines the value of analytic and business intelligence systems upon which organizations rely to make key decisions. Decisions based on poor data can result in direct financial loss.

Business leaders need to pay serious attention to the accuracy, quality and reliability of their data. The most obvious cause of poor data quality is data entry. If an organization has no standards or IT controls for how data is entered into a system, the data will quickly reflect that lack. This is how duplicate entries end up in master data.
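To make the duplicate-entry problem concrete, here is a minimal sketch that groups master data rows by a normalized name and city. The sample rows and the normalization rule (lowercase, strip punctuation, collapse spaces) are assumptions for illustration; real matching rules are usually richer.

```python
from collections import defaultdict

# Hypothetical customer master data entered without any standards.
master_data = [
    {"id": 1, "name": "Acme Corp.", "city": "Boston"},
    {"id": 2, "name": "ACME Corp", "city": "boston "},
    {"id": 3, "name": "Globex Inc.", "city": "Chicago"},
]


def normalize(value):
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def find_duplicates(rows):
    """Return lists of ids whose normalized name and city match."""
    groups = defaultdict(list)
    for row in rows:
        key = (normalize(row["name"]), normalize(row["city"]))
        groups[key].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


duplicates = find_duplicates(master_data)
print(duplicates)
```

Here rows 1 and 2 are reported together because they differ only in case, punctuation, and trailing whitespace, which is exactly the kind of variation that uncontrolled data entry produces.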

While business intelligence tools can create beautiful and compelling dashboards and graphics, if the data that the tools rely on is of poor quality, their results are meaningless, or at least badly flawed. Data cleansing and data quality are therefore essential to getting reliable results from business intelligence analytics.

You can never fully predict how business users will want to analyze their data, so give them complete freedom to drill down in any direction they choose. Deliver them something good in a week as opposed to something great in six months.

Good data is attained by integrating multiple data sources, deriving a ‘single version of the truth,’ and putting that good data (and unstructured content) into a data warehouse where the BA/BI tools can perform their magic. DDD (data-driven decision making) begins and ends with good data. Analytics applications that present dashboards, scorecards, historical trends, and predictive analysis, and that deliver actionable insights, all benefit from good data. Good data begins with data integration, data quality, and a good data warehouse.
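One simple way to picture a ‘single version of the truth’ is to consolidate records from several source systems into one record per customer. The sketch below assumes two hypothetical extracts (`crm` and `billing`) and a "most recently updated wins" rule; both the data and the rule are illustrative assumptions, and a real integration would apply more careful survivorship logic.

```python
# Hypothetical extracts from two source systems.
crm = [
    {"customer_id": "C001", "email": "ann@example.com", "updated": "2011-08-01"},
    {"customer_id": "C002", "email": "bob@example.com", "updated": "2011-09-15"},
]
billing = [
    {"customer_id": "C001", "email": "ann.smith@example.com", "updated": "2011-10-01"},
    {"customer_id": "C003", "email": "cy@example.com", "updated": "2011-07-20"},
]


def consolidate(*sources):
    """Keep, per customer_id, the most recently updated record."""
    golden = {}
    for source in sources:
        for record in source:
            key = record["customer_id"]
            current = golden.get(key)
            # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
            if current is None or record["updated"] > current["updated"]:
                golden[key] = record
    return golden


golden_records = consolidate(crm, billing)
for key in sorted(golden_records):
    print(key, golden_records[key]["email"])
```

The consolidated dictionary stands in for the table that would be loaded into the data warehouse, so every downstream dashboard reads the same value for each customer.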

Creating DataFrames from CSV in Apache Spark

 from pyspark.sql import SparkSession
 spark = SparkSession.builder.appName("CSV Example").getOrCreate()
 sc = spark.sparkContext
 Sp...