Byte Introduction

Get started using MongoDB

Skills:

MongoDB

Objective

Get started with MongoDB

Background/Recap

Relational (or SQL based) databases were popular due to the reduced data duplication they brought with them. However, the reduction in storage costs and massive amounts of unstructured data have paved the way for NoSQL based databases. In a lot of cases where it is not required to deal with complex relational data models that occur due to the large number of tables and the relationships between them, NoSQL comes into play. While the relational databases store data in the form of rows and columns, the NoSQL databases store data as documents (e.g. JSON).

MongoDB is one of the more popular NoSQL databases in use today.

Primary goals

  1. Understand the need for MongoDB

  2. Learn Mongo shell commands

Objective

Get started with MongoDB

Background/Recap

Relational (or SQL based) databases were popular due to the reduced data duplication they brought with them. However, the reduction in storage costs and massive amounts of unstructured data have paved the way for NoSQL based databases. In a lot of cases where it is not required to deal with complex relational data models that occur due to the large number of tables and the relationships between them, NoSQL comes into play. While the relational databases store data in the form of rows and columns, the NoSQL databases store data as documents (e.g. JSON).

MongoDB is one of the more popular NoSQL databases in use today.

Primary goals

  1. Understand the need for MongoDB

  2. Learn Mongo shell commands

Opening Mongo Shell

To start the Mongo shell, execute mongo in the terminal. You’ll see an output similar to the below snapshot.

Note: All Mongo commands covered in this Byte are to be executed while inside the Mongo shell (unless specified otherwise).

image alt text

Mongo Shell

Why should MongoDB be used?

  • It is schema-free and does away with the need to define a fixed structure. You can just start adding data and it isn’t necessary to have any relation between those. The only restriction with this is the supported data structures.

  • The document format used in place of rows and columns can be easily mapped by programming languages

  • Horizontally scalable i.e, data can be distributed across multiple servers to improve performance whereas for relational databases, performance can only be improved by increasing the performance of the server it’s run on.

  • Provides good support for Hierarchical data like JSON or XML. For large amounts of unstructured data, SQL based DB would be significantly slower.

  • Offers higher efficiency and reliability which in turn helps with better storage capacity and speed requirements.

Let’s look at some of these in more detail


Let’s take an example of some data to be stored in a database. See the data below, for the weather forecast for London city on January 30, 2017 (given by the "dt" value).


{

   "coord":{

      "lon":-0.13,

      "lat":51.51

   },

   "weather":[

      {

         "id":300,

         "main":"Drizzle",

         "description":"light intensity drizzle",

         "icon":"09d"

      }

   ],

   "base":"stations",

   "main":{

      "temp":280.32,

      "pressure":1012,

      "humidity":81,

      "temp_min":279.15,

      "temp_max":281.15

   },

   "visibility":10000,

   "wind":{

      "speed":4.1,

      "deg":80

   },

   "clouds":{

      "all":90

   },

   "dt":1485789600,

   "sys":{

      "type":1,

      "id":5091,

      "message":0.0103,

      "country":"GB",

      "sunrise":1485762037,

      "sunset":1485794875

   },

   "id":2643743,

   "name":"London",

   "cod":200

}

TODO

Come up with the different tables that would be required to store this data in a relational (SQL) database.

Points to note

These tables will be required - Coordinates, Weather, Main, Wind, Clouds, Sys and one parent table which has foreign key relationship to all these other tables.


So, we need to come up with the tables to be used, set data types for all the attributes, define the relationships, and create all these tables before going ahead and adding data to these tables. This is termed as having a "fixed schema" as the structure is defined before adding data.


MongoDB uses dynamic schema and inherently supports hierarchical data (nested data) as it’s a NoSQL database. This means that the schema (structure) is dynamically created from the data and individual data entries don't have to be in the same format. Hence, there’s no overhead to adding complex data like we saw above, making it a preferred choice to store this kind of data.

Other advantages of using MongoDB

As MongoDB can be scaled horizontally, MongoDB clusters can be geographically distributed on servers to improve query performance.


MongoDB is a persistent (or disk-based) database unlike Redis, which is a NoSQL based in-memory database. Though Redis can be significantly faster than MongoDB when the stored data is smaller in size, MongoDB can operate much faster as the database gets larger.


Sharding is the concept of storing partitions of data (shard) in different database servers.

  • For sharding in relational databases like MySQL, application level changes are required as well as needing to suffer downtime due to interlinking of the tables. These also require servers with high specifications which are costly.

  • Due to the horizontal scalability, sharding in MongoDB is cost-effective as buying several low-cost machines is often cheaper than buying a smaller number of machines with significantly higher specifications.


Additionally, MongoDB stores data in the BSON (Binary JSON) format.

  • BSON basically is an improvised version of the popular JSON format used

  • BSON supports additional data types like Date out of the box in addition to the data types like String, Boolean, Number, Array supported by JSON.

  • Parsing data is much quicker with BSON as data isn’t saved in text form (human-readable) as in JSON. BSON adds metadata to like lengths of string used to make queries faster

  • As popular programming languages like JavaScript, Python has data types (JavaScript Object, Python Dictionary) to which JSON can be directly mapped, it removes any transformation required to process the data.


However, in the below cases, Relational databases are a better bet than NoSQL databases

  • If large amount of structured data is to be written and it’s critical to avoid loss of data

  • If the structure of data may not change, "fixed schema" of relational databases can be better even with the overhead of designing the schema as it becomes possible to perform complex queries using SQL

  • If storage is an issue and need to avoid data redundancy

Creating a Database and Collection

We’ll be using a practical scenario of storing and processing data for QTravels, which is a Hotel Booking portal. As you can imagine, for a large travel website, there will be innumerable subsystems that work in unison to give the end user a great experience. We’ll look at the Reservations component of the website.


Start the Mongo shell by executing mongo command in the Linux terminal (as done in the Setup section). Here are some Mongo commands to start with

image alt text

TODO -

  • Select the database, qtravels (even though you’ve not created it)

  • Print the currently selected database to verify qtravels is selected

  • Print out all available databases

Did you find any discrepancies in the results?


> use qtravels

switched to db qtravels

> 

> db

qtravels

> 

> show dbs

admin   0.000GB

config  0.000GB

local   0.000GB

> 

The qtravels database isn’t listed among the available databases but we could still select it!


The use <db-name> command selects a database with the specified name even if it doesn’t exist. Mongo doesn’t create a new database here.


Similar to a relational database (eg: MySql) being made up of a number of Tables, MongoDB database is made up of a number of Collections. Some Mongo commands related to collections are

image alt text

TODO -

  • Print out all Collections in the qtravels database. Are there any Collections by default?

  • Create a new Collection, reservations for storing customer reservations

  • Print out all Collections in the qtravels database and verify a new Collection was created

  • Print out all available Databases again. Was the qtravels Database created now?


> db

qtravels

> 

> db.getCollectionNames()

[ ]

> 

> db.createCollection("reservations")

{ "ok" : 1 }

> 

> db.getCollectionNames()

[ "reservations" ]

> 

> show dbs

admin     0.000GB

config    0.000GB

local     0.000GB

qtravels  0.000GB

> 


You’ll have found that

  • There aren’t any collection created by default

  • Creating the reservations collection also created the qtravels database

Note

For users in Crio workspace, the database and any data added can be lost when the workspace session ends. This is because only data stored inside ~/workspace is persisted across workspace sessions.

Curious Cats

  • Find out an alternative command to print out collections in a database other than db.getCollectionNames()

Documents

Documents in MongoDB are like rows of data in a Relational database. A MongoDB Collection is a collection of Documents.

image alt text

Collection as group of Documents (Source: https://docs.mongodb.com/)

Create Documents

The data to be stored in the reservations collection can have attributes as below. These Documents indicate the hotel reservations made on the QTravels portal.


{

   "reservation_id":"FDA771871",

   "city":"South Leahstad",

   "starts_at":"2019-06-04T20:13:58",

   "ends_at":"2019-06-06T20:13:58"

}

The insert() command is used on a Collection to add Documents to that collection.

image alt text


db.reservations.insert({ "reservation_id" : "FDA771871", "city" : "South Leahstad", "starts_at" : "2019-06-04T20:13:58", "ends_at" : "2019-06-06T20:13:58" });

TODO -

  • Add 2 more Documents to this Collection using below reservation data

{ "reservation_id" : "406453A1D", "city" : "Delhi", "starts_at" : "2019-06-04T20:13:58", "ends_at" : "2019-06-08T20:13:58" }

{ "reservation_id" : "B492A55CE", "city" : "Delhi", "starts_at" : "2019-06-04T20:13:58", "ends_at" : "2019-06-04T20:13:58" }

  • What if you add multiple Documents with the same reservation_id value? Does it show an error?

{ "reservation_id" : "B492A55CE", "city" : "London", "starts_at" : "2019-06-04T20:13:58", "ends_at" : "2019-06-04T20:13:58" }

  • What happens if some attribute is missing while inserting?

{ "reservation_id" : "B492A55CE", "city" : "London", "starts_at" : "2019-06-04T20:13:58"}

image alt text

You’ll find that

  • It’s possible to add multiple Documents with the same reservation_id attribute. So, this attribute isn’t really a unique ID for MongoDB

  • MongoDB supports insertion of Documents with different number of attributes unlike SQL databases.

Read Documents

The find() method is used to lookup Documents based on filtering criteria that you specify. You can use it to lookup the Documents from the reservations Collection.


TODO -

  • List out all the Documents in the reservations Collection of qtravels Database

  • Do you find any additional attribute in the Document data printed out?

  • List out the Documents for which reservation is to London

image alt text

There’s an additional _id attribute which you’ll notice is unique among the documents. This attribute is used as an identifier by Mongo and is analogous to the Primary Key in relational databases. The _id value is used as the key to retrieve the corresponding document. If the insert() command doesn’t specify this value, MongoDB auto generates it to be unique.


TODO - Find the number of digits in the _id value.


The _id attribute is a 12 byte number and is generated from

  • System time in Unix epochs (4 bytes)

  • Machine identifier (3 bytes)

  • Process ID (2 bytes)

  • Counter, starting from a randomly generated number (3 bytes)

The _id value displayed is in hexadecimal format and each hexadecimal number constitutes 4 bits of data. This is why the _id value has 24 digits.

Tip

Use the pretty() method on find() output to pretty print the data. Eg: db.reservations.find({}).pretty()

image alt text

Update Documents

There will be cases we need to update the already inserted Documents. For example, a customer might want to change their travel destination. The update() method is used on a Collection to update Documents in that Collection.


TODO -

  • Add a new reservation with below data to the reservations collection and then update only its city attribute to Kolkata.

{ "reservation_id" : "C241F00A7", "city" : "Bangalore", "starts_at" : "2020-09-10T20:13:58", "ends_at" : "2020-06-18T20:13:58" }

  • Verify the document’s city attribute was updated by listing out documents with the new and old city values

image alt text


db.reservations.update({ "reservation_id" : "C241F00A7" }, { $set : {"city" : "Kolkata"} })

$set is a special operator used to set the value of a field. Here, the city attribute of document(s) with the given reservation_id value is set to Kolkata.


TODO - Add a new reservation with the below data. Update its city to your favorite city and starting date to "2020-09-15T20:13:58"


{ "reservation_id" : "Z241F00A7", "city" : "Bangalore", "starts_at" : "2020-09-10T20:13:58", "ends_at" : "2020-06-18T20:13:58" }

image alt text


db.reservations.update({ "reservation_id" : "Z241F00A7"}, { $set : {"city" : "Calicut", "starts_at" :  "2020-09-15T20:13:58"} } )

The $set operator can take multiple attribute values as inputs to update


TODO - Update all reservations with city attribute as London to Manchester. (You can add below London reservations first). Does all the reservations get updated with a single update() call? How would you update multiple documents in one go?


db.reservations.insert({ "reservation_id" : "A06453A1D", "city" : "London", "starts_at" : "2019-06-04T20:13:58", "ends_at" : "2019-06-08T20:13:58" })

db.reservations.insert({ "reservation_id" : "A492A55CE", "city" : "London", "starts_at" : "2019-06-04T20:13:58", "ends_at" : "2019-06-04T20:13:58" })

image alt text

The update() method doesn’t update multiple documents out of the box. For that, the multi option has to be set as true


db.reservations.update({"city" : "London"}, { $set : {"city" : "Manchester"} } )

db.reservations.update({"city" : "London"}, { $set : {"city" : "Manchester"} }, {multi:true} )

Delete Documents

The remove() method is used to delete a Document from a Collection.


TODO - Execute the below command to add a reservation and then delete the document using its _id value.


db.reservations.insert({ "_id": ObjectId("4f4e3fcf380d499d856e728d"), "reservation_id" : "Z241F00A7", "city" : "Bangalore", "starts_at" : "2020-09-10T20:13:58", "ends_at" : "2020-06-18T20:13:58" })

image alt text

To delete a Document by the _id attribute, the full attribute value has to be provided in the query.


db.reservations.remove({ "_id": ObjectId("4f4e3fcf380d499d856e728d")})

TODO - Execute the below command. Delete all the reservations for the city of London with a starting date "2020-06-18T20:13:58".


db.reservations.insert({ "reservation_id" : "A06453A1D", "city" : "London", "starts_at" : "2020-06-18T20:13:58", "ends_at" : "2020-06-28T20:13:58" })

db.reservations.insert({ "reservation_id" : "A492A55CE", "city" : "London", "starts_at" : "2020-06-18T20:13:58", "ends_at" : "2020-06-27T20:13:58" })

db.reservations.insert({ "reservation_id" : "A492A55CE", "city" : "London", "starts_at" : "2019-06-04T20:13:58", "ends_at" : "2019-06-04T20:13:58" })

image alt text


db.reservations.remove({ $and : [ {"city" : "London"}, {"starts_at" : "2020-06-18T20:13:58"} ] })

The $and operator can take multiple conditions to perform the logical AND operation. The conditions are to be given in an array (within "[]")

Curious Cats

  • Can the _id attribute be set using the insert() method? What happens if you try to add an already existing _id value?

Querying using Mongo

The main use case with any database is to lookup or query existing data. We’ll look at a few ways to do this in Mongo.


Count Documents

TODO -

  • Execute below commands

db.reservations.insert({ "reservation_id" : "A06453A1D", "city" : "London", "starts_at" : "2029-06-18T20:13:58", "ends_at" : "2020-06-28T20:13:58" })

db.reservations.insert({ "reservation_id" : "A492A55CE", "city" : "London", "starts_at" : "2020-06-18T20:13:58", "ends_at" : "2020-06-27T20:13:58" })

db.reservations.insert({ "reservation_id" : "A06453A1D", "city" : "London", "starts_at" : "2028-06-18T20:13:58", "ends_at" : "2020-06-28T20:13:58" })

db.reservations.insert({ "reservation_id" : "A492A55CE", "city" : "London", "starts_at" : "2020-06-18T20:13:58", "ends_at" : "2020-06-27T20:13:58" })

  • Use the count() method on the reservations collection to find the total number of customer reservations

  • Find the number of reservations in a particular city.

  • Find the number of reservations that are yet to start. ("start_date" should be greater than or equal to current date)

image alt text

(The exact numbers can be different for you)


db.reservations.count( { $where : function() { return new Date(this.starts_at) > new Date() } } )


The $where operator can be used to set a condition that needs to be checked. Documents satisfying the condition will be retrieved. Here, the new Date() method returns the current date and new Date(this.starts_at) converts the provided input string (i.e. this.starts_at) to a date.

List Documents

TODO -

  • Use the find() method on the reservations collection to find all same day reservations (starts_at and ends_at values will be same)

  • Find all reservations made across two different cities (do this by using a single command)

image alt text

(The output could be different for you)


The command used to find all 1 day reservations is


db.reservations.find({$where : function() {  return new Date(this.starts_at).setHours(0,0,0,0) == new Date(this.ends_at).setHours(0,0,0,0)}})

Here, the starts_at and the ends_at values are first converted to dates and then time is set to 00:00:00 hours using the setHours() method. Then, any document matching the equality is returned.


The $in operator can be used to return documents matching an array of values for an attribute, here "city"


db.reservations.find({"city" : { $in: ["Delhi", "Bangalore"] }})

Curious Cats

  • A new offer gives any customer with reservations for more than 7 days an additional stay of one day for free. Apply this offer to the reservations Collection by incrementing the end date by one day if the total stay of duration is 7 days or more.

Summary

Newfound Superpowers

  • Practical knowledge of MongoDB

Now you can

  • Answer these MongoDB interview questions

    • What is MongoDB and why should we use MongoDB?

    • What is the difference between MongoDB and MySQL or between NoSQL and SQL?

    • When would you choose MongoDB and when would you choose an SQL based database?

    • What is _id field in MongoDB?

    • How to create new MongoDB databases and collections?

    • How do you perform CRUD operations in MongoDB?

    • What format does MongoDB store data? How is it related to/different from JSON?

    • Explain sharding and why it is easier in MongoDB compared to relational databases?