November 29, 2019

How can e-commerce companies handle the stampede of shoppers on Annual Sale Day?



Most of the world's top e-commerce companies host annual sales: Amazon's Prime Day, Walmart's Black Friday Online Deals, and Flipkart's Big Billion Days. No matter how much these marketplaces prepare, the full scale of consumer activity is only known when the anticipated day arrives, often exceeding expectations and causing a spike in transactions within the span of a single second.

To handle such spikes, these marketplaces use distributed databases designed for online transaction processing. But even a distributed database is ultimately limited by the capacity of the individual machines running its storage engines when absorbing a spike in transactions.

One solution is the effective use of caching and a shared-storage design to improve scalability, combined with machine learning methods that predict the spike in transactions, so that the workload can be emulated and QPS (queries per second) performance analyzed during performance testing.
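As a sketch of the workload-emulation idea, MySQL ships with mysqlslap, a load-emulation client that runs concurrent query load and reports timings. The concurrency and iteration figures below are arbitrary placeholders, not tuned values:

# emulate 500 concurrent clients, 10 rounds of auto-generated mixed read/write load
mysqlslap --user=root -p --concurrency=500 --iterations=10 --auto-generate-sql --auto-generate-sql-load-type=mixed

Running such emulated spikes against a staging replica gives a QPS baseline to compare against the predicted sale-day load.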

June 16, 2019

How do I uninstall and reinstall MySQL?

You installed MySQL on Ubuntu 16.04 Server, and for some reason MySQL is not starting up, so you want to uninstall and reinstall it.


Please follow the procedure below to reinstall MySQL.

sudo apt-get remove -y mysql-*
sudo apt-get purge mysql*
sudo apt-get autoremove
sudo apt-get autoclean
sudo apt-get remove dbconfig-mysql


Then upgrade the distribution:

sudo apt-get dist-upgrade

Then install MySQL:

sudo apt-get install mysql-server
Grant all permissions:
https://avishkarm.blogspot.com/2017/04/how-to-allow-mysql-remote-access-in.html
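Finally, it is worth verifying that the fresh installation is healthy. A minimal sanity check, assuming the systemd unit is named mysql as it is on Ubuntu 16.04:

# check the service status (unit name assumed: mysql)
sudo systemctl status mysql
# confirm you can connect and run a query
mysql -u root -p -e "SELECT VERSION();"

If both commands succeed, the server is up and accepting connections.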

That's all.

May 08, 2019

MongoDB Data Migration, Backup and Restore



MongoDB is one of the most popular NoSQL database engines. Managing a MongoDB production environment requires backing it up, restoring data, and so on. When converting MongoDB to SSL or moving data from MongoDB on one server to another, we can use import/export or backup/restore.

Importing and exporting a database means dealing with data in a human-readable format that is compatible with other software products. In contrast, the backup and restore operations create or use MongoDB-specific binary data, which preserves not only the consistency and integrity of your data but also its MongoDB-specific attributes. Thus, for migration it is usually preferable to use backup and restore, as long as the source and target systems are compatible.

Import/export:
MongoDB uses JSON and BSON (binary JSON) formats for storing its information. JSON is the human-readable format, which makes it perfect for exporting and, eventually, importing your data. However, JSON does not support all the data types available in BSON, so there will be a so-called 'loss of fidelity' of the information.

Export:
sudo mongoexport --db mydb --collection collections --out newdbexport.json
2019-05-06T15:47:30.931-0700    connected to: localhost
2019-05-06T15:47:31.931-0700    [........................]  mydb.collections  0/10234  (0.0%)
2019-05-06T15:47:32.932-0700    [#######.................]  mydb.collections  6100/10234  (31.5%)
2019-05-06T15:47:33.827-0700    [########################]  mydb.collections  10234/10234  (100.0%)
2019-05-06T15:47:33.828-0700    exported 10234 records

Import:

sudo mongoimport --db mydb --collection collections --file newdbexport.json
While importing a JSON file, you don't have to worry about explicitly creating a MongoDB database. If the database you specify for the import doesn't already exist, it is created automatically; in MongoDB the structure is created upon the first document (database row) insert.
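A quick way to confirm the import landed is to count the documents from the shell, assuming the mongo shell is installed and the collection name matches the one used above:

# count documents in the imported collection
mongo mydb --eval "db.collections.count()"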

Backup/Restore:
Exporting and importing via JSON files carries the possibility of a 'loss of fidelity', since JSON does not support all the data types available in BSON. It is therefore advised to use mongodump and mongorestore to take (and restore) a full binary backup of your MongoDB database.

Backup:

Dump a collection to a BSON file.
mongodump -h hostname -d dbname -c collectionname -o /root/dump

If you want to dump all collections in one go, simply omit the "-c collectionname" argument:
mongodump -h hostname -d dbname -o /root/dump

Restore:
For restoring MongoDB we'll use the mongorestore command, which works with the binary backup produced by mongodump.

mongorestore -d mydb /root/dump/mydb/collections.bson

Use --drop to make sure the target database is dropped first, so that the backup is restored into a clean database.

sudo mongorestore --db newdb --drop /var/backups/mongobackups/01-20-16/newdb/

A sample run:

mongorestore -d mydb /root/dump/mydb/collections.bson
2019-05-06T15:43:26.403-0700    checking for collection data in /root/dump/mydb/collections.bson
2019-05-06T15:43:26.435-0700    reading metadata for mydb.collections from /root/dump/mydb/collections.metadata.json
2019-05-06T15:43:26.452-0700    restoring mydb.collections from /root/dump/mydb/collections.bson
2019-05-06T15:43:27.280-0700    restoring indexes for collection mydb.collections from metadata
2019-05-06T15:43:27.284-0700    finished restoring mydb.collections (10234 documents)
2019-05-06T15:43:27.284-0700    done

Verify the collections exist:
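For example, using the mongo shell, with the database and collection names taken from the restore commands above:

# list collections in the restored database
mongo mydb --eval "db.getCollectionNames()"
# the count should match the total reported by mongorestore
mongo mydb --eval "db.collections.count()"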




March 07, 2019

Data Democratization




At a fundamental level, typical business management needs answers to the five W's and one H.
Let's take the example of an e-commerce company.


  1. Who is responsible for revenue growth?
  2. What was the most popular product last month?
  3. When will the new features be launched?
  4. Where are we in terms of revenue growth?
  5. Why has website traffic growth been low since last quarter?
  6. How do we turn the revenue trend from negative to positive?



Dashboards are great at presenting answers to the "Who", "What", "When", and "Where" questions.
Unfortunately, the "Why" and "How" questions are often much more difficult to tackle, and they typically require a data investigation of sorts: a data deep dive that needs to be approached from a variety of angles that have not been planned for. How do we achieve this without hiring a massive data staff or expecting all business employees to become data scientists?



One must-have is to make the data easily accessible to those who need it. Gone are the days when access should require a long business justification and second-line manager approval. We need to lower the barriers to accessing standard, non-sensitive business data and provide self-serve data analysis rather than facilitated data analysis. We need to build a true data democracy that enables non-data-expert SMEs to perform self-service analytics.
Data being the "oil", its benefits should be shared freely with all types of users in an understandable format. This data can then be further refined or consumed for appropriate data-driven decisions.


Data democratization means that information in a digital format is accessible to the average end user, with no gatekeepers creating a bottleneck at the gateway to the data. The goal is to allow non-specialists to gather and analyze data so that they can expedite decision-making and uncover opportunities for the organization: anybody can use data at any time to make decisions, with no barriers to access or understanding.

Data democratization is a process and has to be embedded and called out in the regular big data development life cycle. It involves people, process, and technology to arrive at innovative, valuable business decisions from the insights gained. A data lake, as a technology or platform, helps implement data democracy more efficiently and effectively.



A data lake is a raw collection of data; users only need to worry about the format at the time of access.
The enterprise data lake is the core and future of the modern data warehouse architecture, complemented by metadata management, master data management, data governance, and security across the layers. A data lake allows data to be stored in its native form, which broadens the horizon of usage and increases flexibility and adaptability as requirements change.

January 13, 2019

MongoDB: Schemaless does not mean you have no Schema


RDBMSs usually have a pre-defined schema: tables with columns, each with a name and a data type. When working with an RDBMS, we're often confronted with complex schemas that define the structure of the data. When we want to make changes to the database, we may have to wrestle with schema changes as well. The implications of making a schema change include being sure that existing data fits the new schema, or, more commonly, that the existing application programming won't break when we modify the database schema. The strict controls and exactness imposed by schemas and by typed languages allow you to keep large groups of developers on the same page, and can help you catch bugs earlier in the development cycle.

Document databases are a flexible alternative to the pre-defined schemas of relational databases. Each document in a collection can have a unique set of fields, and those fields can be added to or removed from documents after they are inserted, which makes document databases, and MongoDB in particular, an excellent way to prototype applications. However, this flexibility is not without cost, and the most underestimated cost is that of predictability.

MongoDB is an open-source, non-relational database developed by MongoDB, Inc. MongoDB stores data as documents in a binary representation called BSON (Binary JSON). Fields can vary from document to document; there is no need to declare the structure of documents to the system, because documents are self-describing. If a new field needs to be added to a document, the field can be created without affecting all other documents in the collection, without updating a central system catalog, and without taking the system offline. MongoDB's document data model maps naturally to objects in application code, making it simple for developers to learn and use. With a schemaless database, 90% of the time adjustments to the database are transparent and automatic, which makes rapid development and change easy.
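For instance, two documents with different shapes can live side by side in the same collection. A minimal sketch in the mongo shell, using a hypothetical products collection:

// both inserts succeed even though the documents have different fields
db.products.insertOne({ name: "book", price: 10 })
db.products.insertOne({ name: "laptop", price: 900, warranty_years: 2 })

The second document's warranty_years field required no migration and no change to the first document.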

When you say "schemaless", you actually mean "dynamically typed schema", as opposed to the statically typed schemas available in SQL databases. JSON itself remains a completely schema-free data structure standard.

A good technique is to have a schema definition that can be shared among programs and tools, so that different programs can agree on the schema. Such a schema definition is also useful in design and modeling, data validation, and schema migration.
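One way to express a shared schema definition inside MongoDB itself is the $jsonSchema validator, available since MongoDB 3.6. A minimal sketch; the users collection and its fields are hypothetical:

// hypothetical example: require two string fields and bound an optional int
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "email"],
      properties: {
        name: { bsonType: "string" },
        email: { bsonType: "string" },
        age: { bsonType: "int", minimum: 0 }
      }
    }
  }
})

Inserts that violate the schema are rejected, so every program writing to the collection agrees on the same structure.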

A properly designed schema will also allow you to get the best performance from MongoDB, and data analysts will know the policies and metadata required to understand the data.

January 05, 2019

Who Moved My Cheese?



It is a well-known fact that any technology can fade in no time due to new and rapid advancements rolling out every now and then. Hence, you have to be aware of advancements in the technology industry and trends that can benefit your business in the future.

Anticipate Change

Even if we are doing well, we should always have a Plan B. It is better to be prepared than to regret later.

Monitor Change: 

Smell the Cheese Often So You Know When It Is Getting Old
You should be receptive to your environment in order to observe the changes around you. 
Managed database services moved the Cheese of database developers, but they can always look for a role in Big Data as another, better career.

Adapt to Change Quickly:

The Quicker You Let Go of Old Cheese, the Sooner You Can Enjoy New Cheese
As soon as we notice that there is no cheese left, we should not waste any time and should set off in search of new cheese. So what do you do when change happens? Just go with it. Nothing good will happen if you deny it or just complain about it. It is safer to search in the maze than to remain in a cheese-less situation.

e.g. Cloud technology moved the Cheese of system administrators and network administrators, but they can always look for DevOps or cloud architect roles.

Move with the Cheese

Hem was hesitant to leave Cheese Station C because it was his comfort zone. What is holding Hem back? Fear. He was frightened of the unknown, which is why he refused to move on. On the other hand, Haw realises that he has to overcome this fear in order to find new cheese.

Enjoy Change! : 

Savour the Adventure and Enjoy the Taste Of New Cheese!
Project leaders and project managers with 15+ years of experience are in their comfort zone and have given up hands-on work; with modern project management tools, there is no longer a central role for these project managers. Now that you have accepted change, enjoy the new cheese! Appreciate the change of pace and the thrill of the hunt. You should be ready to embark on a new journey in your career, for example as a Scrum Master in Agile development.

Be Ready To Change Quickly and Enjoy It Again: They Keep Moving the Cheese

Again, as you enjoy the new cheese, be wary of your surroundings, because another change will come soon. Imagine an AI bot that creates login tests and then follows them throughout the SDLC; with today's CI/CD pipelines streamlining delivery, this will soon be commonplace. Testing will be as natural as writing code and will increasingly be done by machines, making the traditional automation-testing role obsolete. You'll find success in a new role by steering continuous delivery teams towards quality instead of acting as a quality gatekeeper: monitoring the impact of code changes in production, eliminating testing bottlenecks, and improving slow feedback loops in continuous delivery pipelines.

Like the characters in the "Who Moved My Cheese?" story, we will find that these lessons apply to the circumstances we encounter in our careers. The story will change your thinking and attitude towards change. It's ideal to be like Sniff and Scurry, who are always ready and adaptive, but it's not too late to be like Haw, who learned to embrace change.
