January 13, 2019

MongoDB: Schemaless does not mean you have no Schema


An RDBMS usually has a predefined schema: tables with columns, each with a name and a data type. When working with an RDBMS, we are often confronted with complex schemas that define the structure of the data, and any change to the database may mean wrestling with a schema change as well. A schema change requires making sure that existing data fits the new schema and, more commonly, that existing application code won’t break when the schema is modified. In return, the strict controls and exactness imposed by schemas (and by typed languages) keep large groups of developers on the same page and help catch bugs earlier in the development cycle.

Document databases are a flexible alternative to the predefined schemas of relational databases. Each document in a collection can have its own set of fields, and fields can be added to or removed from documents after they are inserted. This makes document databases, and MongoDB in particular, an excellent way to prototype applications. However, this flexibility is not without cost, and the most underestimated cost is predictability.

MongoDB is an open-source, non-relational database developed by MongoDB, Inc. It stores data as documents in a binary representation called BSON (Binary JSON). Fields can vary from document to document, and there is no need to declare the structure of documents to the system: documents are self-describing. If a new field needs to be added to a document, it can be created without affecting the other documents in the collection, without updating a central system catalog, and without taking the system offline. MongoDB’s document data model maps naturally to objects in application code, which makes it simple for developers to learn and use. With a schemaless database, most adjustments to the database become transparent and automatic, which makes rapid development and change easy.
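As a minimal sketch of what this flexibility looks like from application code, the snippet below models a hypothetical `users` collection as plain Python dictionaries rather than talking to a real server. The collection name, field names, and the `contact_line` helper are all assumptions made for illustration.

```python
# Hypothetical "users" collection, modeled as plain dicts (no server involved).
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace"},  # no "email" field, and that is allowed
]

# Add a new field to one document only; the other document is untouched.
users[1]["phone"] = "555-0100"


def contact_line(doc):
    """Format a contact line, tolerating a missing "email" field."""
    # Nothing guarantees the field exists, so the reader has to cope.
    email = doc.get("email", "<no email>")
    return f"{doc['name']}: {email}"


lines = [contact_line(doc) for doc in users]
field_sets = [sorted(doc) for doc in users]

for line in lines:
    print(line)
```

Adding `phone` costs nothing up front, but every piece of code that reads these documents now has to decide what a missing field means. That is the predictability cost in miniature.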

When you say “schemaless”, you are really saying “dynamically typed schema”, as opposed to the statically typed schemas of SQL databases. The schema still exists; it simply lives in the documents themselves and in the application code that reads them. JSON itself remains a completely schema-free data format.
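One way to see the implicit, dynamically typed schema is to recover it from the data. The sketch below assumes a few hypothetical product documents held as Python dictionaries; `infer_schema` is an illustrative helper, not part of MongoDB or any driver.

```python
from collections import defaultdict


def infer_schema(docs):
    """Map each field name to the set of Python type names seen for it."""
    schema = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            schema[field].add(type(value).__name__)
    return dict(schema)


# Hypothetical documents: "price" appears with three different types.
docs = [
    {"sku": "A1", "price": 9.99},
    {"sku": "B2", "price": "12.50"},
    {"sku": "C3", "price": 4, "tags": ["sale"]},
]

implicit = infer_schema(docs)
for field in sorted(implicit):
    print(field, sorted(implicit[field]))
```

The database accepted all three documents, but any code that sums `price` has to deal with a field that is sometimes a float, sometimes an integer, and sometimes a string.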

A good technique is to maintain a schema definition that can be shared among programs and tools, so that different programs agree on the schema. An explicit schema definition is also useful for design and modeling, data validation, and schema migration.
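A shared schema definition might be sketched as follows. The dictionary uses the shape of a MongoDB `$jsonSchema` validator (`bsonType`, `required`, `properties`), but the field names are hypothetical, and the `validate` function is a deliberately tiny stand-in that understands only those three keywords. It is not a JSON Schema implementation.

```python
# Schema shaped like a MongoDB $jsonSchema validator; field names are hypothetical.
USER_SCHEMA = {
    "bsonType": "object",
    "required": ["name", "email"],
    "properties": {
        "name": {"bsonType": "string"},
        "email": {"bsonType": "string"},
        "age": {"bsonType": "int"},
    },
}

# Illustrative mapping for the few BSON type names used above.
_PY_TYPES = {"string": str, "int": int}


def validate(doc, schema):
    """Return a list of problems; an empty list means the document passes."""
    errors = []
    for field in schema.get("required", []):
        if field not in doc:
            errors.append(f"missing required field: {field}")
    for field, rule in schema.get("properties", {}).items():
        if field not in doc:
            continue
        expected = _PY_TYPES[rule["bsonType"]]
        value = doc[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"field '{field}' should be {rule['bsonType']}")
    return errors


ok = validate({"name": "Ada", "email": "ada@example.com"}, USER_SCHEMA)
bad = validate({"name": "Grace", "age": "41"}, USER_SCHEMA)
print(ok)
print(bad)
```

Because the definition is plain data, the same dictionary can be imported by several programs, checked in tests, and used as the reference point when planning a migration. With a driver such as PyMongo, it could in principle also be supplied to the server as a `$jsonSchema` validator; that step is not shown here.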

A properly designed schema also lets you get the best performance from MongoDB, and it tells data analysts about the policies and metadata they need in order to understand the data.
