
Learning to Install MongoDB

I'm a beginner to MongoDB, but having just gone through the setup, I hope these notes help other beginners spin up a MongoDB instance.

The Platform

We're going to be installing MongoDB on Ubuntu Server 14.04. We'll install on a virtual machine (VM) provisioned with VirtualBox. I won't get into the details in this tutorial, but there are VirtualBox tutorials everywhere, and I'll link to some notes about my specific setup. Virtual machines are an essential tool when you're learning and practicing.

My goal was to bring up a MongoDB instance to port a Parse.com project before their service was discontinued. This tutorial will bring up a MongoDB instance and set up a Replica Set for reliability, but will not get into Parse.com migration.

Set up a VirtualBox VM with Ubuntu

I like to keep a VM with a clean "net install" of Ubuntu, configured much like a generic Vagrant box. (Vagrant is a command line tool to set up virtual machine configurations, and start and stop these VMs. We won't be using Vagrant. I just think it's a good way to set up VMs for development.)

A generic net install of Ubuntu Linux Server, with IPv4 static addresses for networking, should be good enough. Ubuntu Linux is at http://ubuntu.com.

The following blog posts summarize how I've set up my network of virtual machines, but they aren't edited, so they are a little rough:

Debian and Ubuntu Networking Configuration /etc/network/interfaces with a Static IP Address

Setting up a Small Dev Network

Docker on VirtualBox installation commands

Clone your clean VM.

Once you have a base image working, clone it to create your working copy of the VM. This way, when you screw it up, you can delete the clone and start over.

[images of how to clone]

Pick the "Linked Clone" option, and check "Reinitialize the MAC Address".

(Reinitializing the MAC address gives the clone its own network card identifier, so it won't clash with the original VM on the network, and packets will be delivered to the right machine.)

Don't install the Ubuntu mongodb package. Use MongoDB's repository, and install mongodb-org instead.

Ubuntu's version of MongoDB was out of date, and wasn't current enough for Parse, so I decided to install the latest version.

Installation instructions are at MongoDB's site:

Install MongoDB Community Edition on Ubuntu

There are instructions for other operating systems, but I used Ubuntu 14.04.

A summary of the installation commands is:

sudo -s

apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927

echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list

apt-get update

apt-get install -y mongodb-org

Delete all the old files.

I didn't bother to purge the files from my previous installation of MongoDB, which was the old version in the Ubuntu repository. Stray config files from the old installation were left around, and this caused problems with running the correct version of MongoDB. This problem had me baffled for a while.

The new config files are named "mongod.conf" and "mongod". The old config files were named "mongodb.conf" and "mongodb". The old files were in these locations:

/etc/init/mongodb.conf
/etc/init.d/mongodb
/etc/mongodb.conf

So, if you see them, delete them. You'll need to make a symlink in /etc/init.d with these commands:

cd /etc/init.d
sudo ln -s /lib/init/upstart-job mongod

That will cause Bash's tab completion to work with the "service" command.

Start the server.

The server should start automatically; if it doesn't, you need to start it manually.

sudo service mongod restart

You can test that it's working by typing mongo:

mongo

That attempts to connect to the local server. You should see something like this:

vagrant@marinela:~$ mongo 
MongoDB shell version: 3.2.8
connecting to: test
> 

A new installation of MongoDB is not secure.

A new installation of MongoDB has the following security and authorization features turned off:

  • User accounts.
  • Authorization for resources and actions.
  • Identification and authorization between computers.

MongoDB also expects to be operating in a "secured network", meaning it's not exposed to the entire Internet. More on this later.

This lack of security is convenient for administrators, because it means you can connect to the server and set it up without dealing with passwords or keys.

Once user accounts are created, and permissions assigned, you can turn on the security features.

Set up the users.

For this tutorial, we'll set up several users. I'm using bad passwords. In your real deployments, you should use good passwords, and use some kind of password management software, like LastPass, to keep track of them.

Make a text file, "users.js", with the following:

/* connect as root to 127.0.0.1 and pipe this in to make the users */

use admin
db.createUser({ user:'admin', pwd:'admin', roles: [ 'root' ]});
db.auth('admin','admin');

db.createUser({ user:'cluster', pwd:'cluster', roles: [ 'clusterAdmin' ]});
db.createUser({ user:'backup', pwd:'backup', roles: [ 'backup' ]});
db.createUser({ user:'restore', pwd:'restore', roles: [ 'restore' ]});

use mydata
db.createUser({ user:'mydata', pwd:'mydata', roles: [ 'dbOwner' ]});

Then pipe it into mongo.

cat users.js | mongo 127.0.0.1

Connecting to 127.0.0.1 is the same as connecting to localhost or not specifying a host at all. MongoDB's default port is 27017. The following are all equivalent:

mongo
mongo localhost
mongo 127.0.0.1
mongo localhost:27017
mongo 127.0.0.1:27017

This file is described in detail below.

Set up a super administrator called admin.

In Unix parlance, the superuser is called "root"; MongoDB has a role called "root", which basically grants total control over the server to a user.

use admin
db.createUser({ user:'admin', pwd:'admin', roles: [ 'root' ]});

Switch to that admin user.

When you connect to the server, you don't have an identity. You need to authenticate yourself to the server:

db.auth('admin','admin');

Create users for cluster, backup, and restore.

Authentication and authorization for clustering, backups, and restoration are managed per-server. This is different from other systems that would have these accounts established per-database.

db.createUser({ user:'cluster', pwd:'cluster', roles: [ 'clusterAdmin' ]});
db.createUser({ user:'backup', pwd:'backup', roles: [ 'backup' ]});
db.createUser({ user:'restore', pwd:'restore', roles: [ 'restore' ]});

By the way, to back up the database, you can use the mongodump command like this:

mongodump -u backup -p backup

The mongodump tool dumps the contents of the entire server to a directory named "dump".

Create a new database.

There is no command to create a new database. You just name it and use it.

The "use" command, above, works this way:

use admin

There's a database called admin that stores MongoDB's user accounts and authorization information.

use mydata

That command changes the current database. mydata doesn't exist yet, but it will be created when we first save data to it.

Create a user for the new database.

The following user account, "mydata", is created even before the database exists:

db.createUser({ user:'mydata', pwd:'mydata', roles: [ 'dbOwner' ]});

Connect with this new user, and create a collection.

You can then connect to the database as the "mydata" user, and create a collection, like this, from the shell:

mongo -u mydata -p mydata mydata

Then in the mongo shell:

db.createCollection("SomeCollection");

Two ways to log on

Note that there are two ways to log on.

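User and password on the command line. Include the database name, because the mydata user was created in the mydata database:

mongo -u mydata -p mydata mydata

Connect anonymously, and then authenticate:

mongo

Then in the mongo shell, switch to mydata first, because db.auth() checks the credentials against the current database:

use mydata
db.auth('mydata', 'mydata');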

The MongoDB Authorization system: Users, Roles, and Capabilities

Opening Up the Server to the Internet

MongoDB expects to run in a secured network. This means that incoming connections from the Internet must be restricted by a firewall so the entire Internet cannot connect to MongoDB.

This is critical because, as we learned above, you can connect anonymously to the MongoDB server and authenticate after connecting. Each connection consumes some memory and CPU on the server, so an exposed server is a target for denial of service (DoS) attacks.

So, you must have a firewall. A software firewall is OK, but a hardware firewall plus a software firewall is a better solution. The hardware firewall will reduce the load on the server's CPU.

Furthermore, as noted above, a new installation of MongoDB has no security restrictions! We need to change the configuration so that security is enabled; that's explained below.

Set up the UFW software firewall on Linux.

Uncomplicated Firewall (UFW) is what I use to manage the software firewall on Linux. If you don't have it, you can get it from the repo:

sudo apt-get install ufw

Create rules to allow remote access. Use ifconfig to find your interface's address, and substitute it for 17.10.12.189 below:

# These two lines allow web server access.
ufw allow in to 17.10.12.189 port 80
ufw allow in to 17.10.12.189 port 443

# This line keeps your SSH connection up.
ufw allow in to 17.10.12.189 port 22

# This is for enabling access to MongoDB from the Parse.com network.
ufw allow in from 54.85.224.0/20 to 17.10.12.189 port 27017
ufw enable
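The Parse.com rule admits a whole block of addresses written in CIDR notation. ufw does this matching itself. The rough sketch below, written in JavaScript with hypothetical helper names and assuming plain IPv4, only illustrates what "/20" covers. It checks whether an address falls inside 54.85.224.0/20, which spans 54.85.224.0 through 54.85.239.255:

```javascript
// Hypothetical helper: convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error(`not an IPv4 address: ${ip}`);
  }
  // Multiply instead of using << so the result never goes negative.
  return ((parts[0] * 256 + parts[1]) * 256 + parts[2]) * 256 + parts[3];
}

// Hypothetical helper: true if `ip` falls inside a CIDR block such as "54.85.224.0/20".
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`bad prefix length: ${cidr}`);
  }
  const size = 2 ** (32 - bits);                         // addresses in the block
  const start = Math.floor(ipToInt(base) / size) * size; // network address
  const addr = ipToInt(ip);
  return addr >= start && addr < start + size;
}

console.log(inCidr("54.85.239.255", "54.85.224.0/20")); // true: last address in the block
console.log(inCidr("54.85.240.0", "54.85.224.0/20"));   // false: just past the block
```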

Change the server configuration to enable networking.

/etc/mongod.conf is the mongod server's configuration file. It's written in YAML ("YAML Ain't Markup Language"), a data serialization format like JSON. The most important thing to know is that in YAML files, indentation matters, and it must be done with spaces, not tabs. Copy the example indentation exactly.

More about YAML is at yaml.org.

Some of the stock configuration needs to be changed. Modify the net section as shown below, and add the other two sections. Change 17.10.12.189 to the IP address of your server. (Type the "ifconfig" command to see the address.)

net:
  port: 27017
  bindIp: 17.10.12.189,127.0.0.1

setParameter:
  enableLocalhostAuthBypass: true

security:
  authorization: enabled

The enableLocalhostAuthBypass setting controls MongoDB's "localhost exception", which is on by default. Since MongoDB 3.0 the exception is narrow: while no users exist yet, a client connecting through the localhost interface may create the first user in the admin database, and nothing more. Once any user exists, as it does after running users.js, connections from localhost must authenticate like any other, so the bypass is not a back door for repairing broken permissions later.

The most realistic test of your security is still to connect through the external IP address, the way a remote client would:

mongo 192.168.111.27

You can also test by connecting to the server from a Terminal window in the VM's host, or from another VM.

The connection itself will succeed without credentials, but commands that require authorization will fail until you authenticate with db.auth().

Also note that security.authorization is enabled, explicitly. When the server starts with security.authorization enabled, all the user account permissions are in effect.

The default value for the security.authorization setting is disabled. If the server starts without the setting, every user connecting has full access to the entire server. So, enable it, first.

Do not open up your server to the Internet until you have set security.authorization to "enabled".

Restart the server.

service mongod restart

Test that you can connect via the external IP address.

Open a Terminal window on the host OS, or in a different VM and type:

mongo 192.168.111.27

You should get the prompt.

Tweak the number of processes.

If you see a warning like this:

** WARNING: soft rlimits too low. rlimits set to 1872 processes, 
64000 files. Number of processes should be at least 32000 : 
0.5 times number of files.

It means you need to tweak an OS setting to allow more processes. This is related to the fact we're on a VM that's been given little memory.
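The warning states its own rule: the number of processes should be at least half the number of files. This minimal JavaScript sketch uses a hypothetical function name and is not mongod's source. It shows why 1872 processes trips the rule against 64000 files, and why 64000 does not:

```javascript
// Sketch of the rule quoted in mongod's warning: the soft process limit
// should be at least 0.5 times the open-files limit. Not mongod's code.
function rlimitWarning(nproc, nofile) {
  const minNproc = 0.5 * nofile;
  if (nproc >= minNproc) return null; // limit is high enough, no warning
  return `rlimits set to ${nproc} processes, ${nofile} files. ` +
         `Number of processes should be at least ${minNproc}`;
}

console.log(rlimitWarning(1872, 64000));  // warns, because 1872 < 32000
console.log(rlimitWarning(64000, 64000)); // null
```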

Instructions are here: http://serverfault.com/questions/591812/how-to-set-ulimits-for-mongod

In summary

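sudo -s
# Append (>>) rather than overwrite (>) the existing limits file.
# On Ubuntu, the mongodb-org package runs mongod as the "mongodb" user.
echo "mongodb soft nproc 64000" >> /etc/security/limits.conf
reboot

Rebooting is the simplest way to make the new limit take effect. PAM applies /etc/security/limits.conf when a session starts; the kernel does not read it. Processes that are already running therefore keep their old limits.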

Restart and connect again.

Run these commands:

service mongod restart
mongo 127.0.0.1

If you don't get the shell, check that your configuration, above, is correct.

The MongoDB Shell

MongoDB is administered through the mongo shell. It's based on Javascript, and is a little different from the typical command line experience.

Getting help.

Type "help" to see the help screens.

Exiting.

To exit the shell, press Control-D.

Readline and EMACS Keys

The shell can replay previous commands. Just press the UP arrow, or press Control-P. You'll go backwards through the history of successfully executed commands.

You can also move left and right across the line, and edit it.

The key combination Control-E will move the cursor to the end of the line.

The key combination Control-A will move the cursor to the start of the line.

How it's different from other command lines.

There are two broad categories of commands. There are the ones that look like other command shells you've used before with databases like SQLite, Oracle, PostgreSQL, and MySQL:

use someDatabase
help
show users

Then, there are the ones that look like Javascript and work nothing like other command shells:

db.getUsers();
db.getName();
db.auth("foo", "bar");

The MongoDB shell is like a Javascript Read-Eval-Print Loop (REPL). You can even define variables:

var u = db.getUsers();
var f = function(x) { return x*x; }

Tab completion

As you type these commands, you can press the Tab key to see possible completions, so you don't need to remember all of them.

Unfortunately, this tab-completion doesn't work for the commands in the first, upper list.

db.*

The good news is that the commands that follow "db." are supported by the tab-completion feature.

The commands are also called "method names". For example, createCollection() is a method on the db object. It's called like this:

db.createCollection("foo");

db.collection.*, a convenience.

In the db namespace, there are properties that correspond to each of the collections created by the db.createCollection() method.

If you create a collection that has the same name as an existing method, the collection will be created, but you won't be able to access it through a property.

For example, if you make a collection called "auth", you can't access it as db.auth, because db.auth is already the authentication method. Here's a transcript of my attempt to do this.

> db.createCollection("auth")
> db.auth.count()
2016-07-27T21:00:16.655-0700 E QUERY    [thread1] TypeError: db.auth.count is not a function :
@(shell):1:1

Here's what happens when I create a collection named "foo".

> db.createCollection("foo")
> db.foo.count()
0

So the rule is: you get this convenient shortcut, a property named after your collection, unless the collection name clashes with an existing name in the db namespace.
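One way to picture this rule is a toy model in plain JavaScript that runs under Node.js without MongoDB. In this model, db is an object where real method names win and any other name is treated as a collection. It is a hypothetical illustration, not the mongo shell's actual implementation. It also shows the workaround for a clashing name: the shell's getCollection() method looks the collection up explicitly, as in db.getCollection("auth").count().

```javascript
// Toy model of the shell's `db` object (hypothetical, NOT the mongo shell's code).
// Existing methods take priority; any other property name acts as a collection.
function makeToyDb(methods) {
  const target = { ...methods };
  return new Proxy(target, {
    get(obj, name) {
      if (name in obj) return obj[name];   // an existing method wins
      return { name, count: () => 0 };     // otherwise: a fake, empty collection
    },
  });
}

const db = makeToyDb({
  auth: (user, pwd) => `authenticating ${user}`,
  getCollection: (name) => ({ name, count: () => 0 }),
});

console.log(typeof db.foo.count);              // "function": foo is a collection
console.log(typeof db.auth.count);             // "undefined": auth is the method
console.log(db.getCollection("auth").count()); // 0: explicit lookup still works
```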

The argument object.

Remember this command?

db.createUser({ user:'backup', pwd:'backup', roles: [ 'backup' ] });

Here createUser() is called with one argument. It looks like three arguments, but it's actually a single object, which you can tell from the curly braces, "{}", around it.

Many functions take both strings and objects as arguments.
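The following JavaScript shows that the braces make a single argument. The function is a hypothetical stand-in, not MongoDB's createUser(). It simply reads the fields by name from the one object it receives:

```javascript
// Hypothetical stand-in for a call like db.createUser({...}); not MongoDB's code.
function createUserSketch(spec) {
  // Destructure the single argument object into named fields.
  const { user, pwd, roles = [] } = spec;
  if (typeof user !== "string" || typeof pwd !== "string") {
    throw new TypeError("user and pwd must be strings");
  }
  // Report how many arguments actually arrived, plus the parsed fields.
  return { argumentCount: arguments.length, user, roles };
}

const result = createUserSketch({ user: "backup", pwd: "backup", roles: ["backup"] });
console.log(result.argumentCount); // 1: the braces make it a single object
console.log(result.roles);         // [ 'backup' ]
```

Because the fields are looked up by name, their order inside the braces doesn't matter.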

Chaining calls.

The MongoDB shell uses Javascript, which has the object oriented feature where the result of a function call can be an object with methods. For example, db.getCollectionNames() returns an Array of names. Try this:

db.getCollectionNames();

The output looks like this:

[
    "Group",
    "JKLogBookmark",
    "Log",
    "Message",
    "_Cardinality",
    "_EventDimension",
    "_Index",
    "_QueryToolQuery",
    "_Role",
    "_SCHEMA",
    "_User",
    "_dummy"
]

Note the square brackets, "[]": the result is an Array, and Array objects have numerous methods. The Array.map() method applies a function to each element of the array and returns an array of the return values.

The db.getCollection("CollectionName") method gets a collection object, which has the count() method to count the number of documents in the collection. (This is called the cardinality of the collection.)

db.getCollection("Group").count();

The output is:

10

The Array.map() method can be used to apply count() to all the elements of an array:

db.getCollectionNames().map(function(n) { return db.getCollection(n).count(); });

The output looks like this. These numbers aren't real; I made them up:

[ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120 ]

That's not so useful, because it's hard to know what each number means. Here's a better way to get the counts:

db.getCollectionNames().map(function(n) { return n + ": " + db.getCollection(n).count(); });

The output looks like this. Again, fake:

[ "Group: 10", "JKLogBookmark: 20", ....etc.

You can also define a function to make the above easier:

var format_count = function (n) { return n + ": " + db.getCollection(n).count(); };

And use it like this:

db.getCollectionNames().map(format_count);

Note that there are no parentheses after format_count. It's just bare. That means format_count is a reference to the function; it doesn't execute the function.
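To try map() and the bare function reference outside the shell, the sketch below runs under plain Node.js. Here db is a hypothetical stand-in with made-up counts, not a MongoDB connection. format_count is the same function as above:

```javascript
// Hypothetical stand-in for the shell's `db` object, so this runs under
// plain Node.js without a MongoDB server. The counts are made up.
const fakeCounts = { Group: 10, Log: 20, Message: 30 };
const db = {
  getCollectionNames: () => Object.keys(fakeCounts),
  getCollection: (name) => ({ count: () => fakeCounts[name] }),
};

// Same function as in the text: turn a collection name into "name: count".
var format_count = function (n) { return n + ": " + db.getCollection(n).count(); };

// Pass the function itself (no parentheses); map() calls it once per name.
const lines = db.getCollectionNames().map(format_count);
console.log(lines); // [ 'Group: 10', 'Log: 20', 'Message: 30' ]

// Writing format_count() instead would call it once, with n undefined, and
// hand map() the resulting string, which map() rejects with a TypeError.
```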

Making a Replica Set

Replication is the main way MongoDB keeps redundant copies of your data. A replica is an instance of a MongoDB server that copies its data from another MongoDB server.

The original is called the primary, and the copies are called secondaries.

We will construct a two-node replica set.

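The MongoDB documentation explains in detail why you should have at least three members. A replica set needs a majority of its members to elect a primary, so if one member of a two-member set fails, the survivor can't become primary. We'll build only two for the purposes of this tutorial; a three-member replica set is just as easy to create. Beyond that, you will need to pore over the documentation, because I think the complexity of larger clusters requires more depth of knowledge.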

Clone your Virtual Machine

In the VirtualBox Manager, right-click on your mongodb virtual machine. Select "Clone..." from the menu.

(vbox.png)

Change the name of the VM, and make it a Linked Clone. Check the option to reinitialize the MAC address. The two VMs will share the same network, so each needs its own network card identifier.

(vbox-clone.png)

Change the IP address.

The following command will open up the network settings.

sudo vi /etc/network/interfaces

Find the static settings for eth0 and change the IP address:

auto eth0
iface eth0 inet static
 address 192.168.111.28
 netmask 255.255.0.0
 gateway 192.168.1.1
 broadcast 192.168.255.255

I'm too lazy to type "sudo /etc/init.d/networking restart", so I force a reboot with "sudo reboot".

Change the firewall configuration.

Internal Authentication

Create a random keyfile to serve as a password for the node.

Every member of the replica set must have a copy of the same keyfile.
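MongoDB's own tutorial generates a keyfile with "openssl rand -base64 756". The sketch below does the equivalent in Node.js. Treat it as an illustration only: the function names are hypothetical, and so is any path you pass. MongoDB documents a keyfile as 6 to 1024 base64 characters, and mongod refuses a keyfile that group or others can read, so the file is written owner-read-only:

```javascript
const crypto = require("crypto");
const fs = require("fs");

// 756 random bytes encode to exactly 1008 base64 characters (no "=" padding),
// inside the 6-1024 character range MongoDB documents for keyfiles.
function makeKeyfileContents() {
  return crypto.randomBytes(756).toString("base64");
}

// Write a new keyfile readable only by its owner. flag "wx" refuses to
// overwrite an existing file; chmodSync makes the mode exact regardless of umask.
function writeKeyfile(path) {
  fs.writeFileSync(path, makeKeyfileContents() + "\n", { mode: 0o400, flag: "wx" });
  fs.chmodSync(path, 0o400);
}
```

For example, call writeKeyfile("/tmp/mongodb-keyfile"), which uses a hypothetical path. Then copy that same file to every member, make it owned by the account mongod runs as, and point security.keyFile in /etc/mongod.conf at it.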

Change the configuration

The tutorial at MongoDB uses command line options, but I prefer to use the configuration file, /etc/mongod.conf.

Create a named replica set

Make the replication section look like this. "rs0" is the name of the replica set.

replication:
  replSetName: rs0
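Naming the set in mongod.conf doesn't start replication by itself. Once mongod has been restarted on both VMs with the same replSetName, the set is initiated once, from the mongo shell on one member, with rs.initiate() and a configuration document. The helper below is hypothetical. It builds that document in plain JavaScript, using this tutorial's example addresses (192.168.111.27 and 192.168.111.28) and the default port:

```javascript
// Hypothetical helper: build the configuration document rs.initiate() takes,
// with one member entry per "host:port" string.
function replSetConfig(name, hosts) {
  return {
    _id: name, // must match replication.replSetName in /etc/mongod.conf
    members: hosts.map((host, i) => ({ _id: i, host })),
  };
}

const cfg = replSetConfig("rs0", ["192.168.111.27:27017", "192.168.111.28:27017"]);
console.log(JSON.stringify(cfg));
```

In the mongo shell, first authenticate, for example as the admin user. Then define the same function and run rs.initiate(replSetConfig(...)), and check the result with rs.status().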

Set the storage engine to use a smaller footprint

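My virtual machine didn't have enough disk space to hold the database files and the journal; the logs said so. The smallFiles setting reduces the size of those files. Note that smallFiles is an option of the MMAPv1 storage engine. MongoDB 3.2 uses WiredTiger by default for new data directories, and with WiredTiger this setting has no effect.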

storage:
  mmapv1:
    smallFiles: true

Rudimentary MongoDB

This isn't a complete tutorial on using MongoDB. It's just here as an overview and a reference so you can test that your database is working.

Creating a New Database

Dropping (Deleting) a Database

Inserting Documents

Deleting Documents

Finding Documents

Transforming Documents

Aggregation Operations
