MongoDB Authentication Process: Locking Up The Database

Intro

This is my second MongoDB post of many planned, so I’ve decided to start the Mini Mongo Series. Catchy, right?

The first problem I faced when setting up my own MongoDB instance on AWS was figuring out how to do simple authentication. If there are no users configured in myDatabase.system.users, anyone can access the database without authenticating. The goal is to prevent anyone from simply connecting to ec2-xx-xx-xxx.compute-1.amazonaws.com:27017 and accessing the data.

There are a lot of great tutorials on how to add security and authenticate using drivers, but they come as many separate small tutorials. So here it is in one unified place, along with my experience setting it up. If you have any questions or comments, feel free to leave them below.

1. Add a user to the database

We are going to add a user to our database and allow them to authenticate with credentials later.

First make sure mongod is running without the auth flag.

Then run mongo in another terminal to access the interactive MongoDB shell. Let’s add a user to our pets database.

$ mongod 
...
$ mongo
> use pets 
> db.addUser('petar','myPassword');

Now that the user has an account, let’s authenticate as them.

> db.auth('petar', 'myPassword');

You can confirm this user was added by running the following.

> db.system.users.find()

2. Confirm that authentication works

A simple test case to ensure our authentication works is to run mongod --auth and try to connect from the mongo command line with and without the user credentials we previously created.

With credentials, we expect to see the list of databases:

$ mongod --auth 
...
$ mongo -u petar -p myPassword 
> show dbs
admin 0.203125GB
pets  0.203125GB

Without credentials, we expect to see an error saying we are unauthorized:

$ mongod --auth 
...
$ mongo
> show dbs
failed:{ "ok" : 0, "errmsg" : "unauthorized" }

You’ll see that since we didn’t provide our user credentials, we get an error.

Connect MongoDB to Node.js

Using the MongoDB driver from 10gen and some boilerplate code, we can connect and authenticate to the database using the user we just set up.

Run the Node.js app with environment variables

To avoid hard-coding credentials in your app, which is generally bad practice, we can pass them in as environment variables from the command line and read them from process.env.

$ PETAR_MONGODB_DB_HOST=aws.xxx.com \
PETAR_MONGODB_DB_PORT=27018 \
PETAR_DB_NAME=myDbName \
PETAR_MONGODB_DB_USERNAME=authUser \
PETAR_MONGODB_DB_PASSWORD=authUserPass \
node app.js
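
For illustration, here is a minimal sketch of how app.js might read those variables and assemble a connection string. The variable names come from the command above; the helper name buildMongoUri and the fallback values are assumptions made for this example, not part of the driver. The resulting string would then be handed to whatever connect call your driver version provides.

```javascript
// Hypothetical helper: builds a MongoDB connection string from environment
// variables. The variable names match the command above; the function name
// and the fallback values are assumptions for illustration only.
function buildMongoUri(env) {
  const host = env.PETAR_MONGODB_DB_HOST || 'localhost';
  const port = env.PETAR_MONGODB_DB_PORT || '27017';
  const dbName = env.PETAR_DB_NAME || 'test';
  const user = env.PETAR_MONGODB_DB_USERNAME;
  const pass = env.PETAR_MONGODB_DB_PASSWORD;

  // Percent-encode credentials so characters like '@' or ':' stay valid.
  const auth = user && pass
    ? encodeURIComponent(user) + ':' + encodeURIComponent(pass) + '@'
    : '';

  return 'mongodb://' + auth + host + ':' + port + '/' + dbName;
}

const uri = buildMongoUri(process.env);

// Mask the password before logging so it never lands in the console.
console.log('connecting to', uri.replace(/:[^:@\/]*@/, ':***@'));
```

Keeping the string assembly in one small function means the credentials only ever live in the environment, and the same app.js can point at a local or a remote instance without edits.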

Conclusion

We’ve managed to create a user, enable authentication, put that authentication to use, and confirm that it is working. Our Node.js app is now connected to our MongoDB instance using an authenticated user, preventing unauthorized access.

$ successfully auth to open AWS MongoDB:  true


MongoDB Lesson Learned: Remember To Use Indexes

After launching my first Node.js + MongoDB API in production, I was religiously monitoring it like a parent watching over their firstborn child.

As traffic started to roll in, the API server began to quietly cry. The average response time climbed as more and more concurrent queries hit the DB. But this wasn’t happening in my local development environment, even when I ran three times more traffic in my stress tests. Of course, localhost beats the production environment any day of the week, right?

Long story short, I forgot to create indexes in the production environment. After running db.collection.ensureIndex({"items": 1}) on the most important key that I was querying, the beams of sun broke through the cloudy sky.

Let’s look at the difference in our famous before and after example:

Before
> GET /sites/?… 200 2691ms – 128b

After
> GET /sites/?… 200 91ms – 128b

And a chart for giggles:

[Chart: API response time]

Thanks to the awesome folks that make these amazing tools

Update: 2/11/13

On a side note, this small instance was able to handle over 1.5k requests per minute without ever breaking a sweat. Big ups OpenShift.

A Problem With Proximity Based Apps

Intro

This is not a battle between flat geometry and spherical geometry to calculate distances between points. This falls somewhere between a rant, my thoughts on the subject, and how we can improve proximity-based applications to return relevant results.

Proximity is the state of being close to something. In our case we have an individual trying to find the closest fast-food location while driving (I know you shouldn’t be using your phone when driving, but let’s make an exception). An app uses the individual’s location, their latitude and longitude, to determine the closest burger shack. The nearest five locations are sent back and our individual is on their way to grubbing down.

The biggest problem I have noticed with location-based recommendations is that they fail to distinguish between the distance between two points on a map (arc length) and the distance between the same two points while driving (linear). The distance is NOT the same. When you simply calculate the distance between two points on a map, the result will tend to be much shorter because there are no streets, directions, or obstacles. While driving, there are factors such as speed, traffic, and one-way streets to take into consideration.

Imagine driving on a road with an app’s suggested location on your left-hand side, across the freeway. The shortest path to it is to go down the road for 3 miles and then another 3 miles back, when the closest location by road is actually 4 miles straight down the road.

Flat Geometry

Let’s examine this by first calculating the distance between two points while driving. My starting location is in a parking lot at 37.3879242, -121.9821091. To get to the nearest In-N-Out Burger (37.3610039, -122.0248489), I have to drive down a street which takes me out of the way, then get on the freeway, and make a couple more turns. The shortest possible path according to Google Maps is 5.1 miles [1].

[Map: driving between two points]

Spherical Geometry

Now we compute this using the haversine formula, a spherical formula commonly used in navigation to find the distance between two points on a sphere. For the sake of brevity, the result is 2.996 miles [2]. The first downfall is the precision of the Earth’s radius; it is rounded to 3961 miles, which is optimized for locations around 39 degrees from the equator (roughly the latitude of Washington DC, USA). A more precise number is difficult to calculate because the Earth is not perfectly spherical, so no single value serves as an exact radius. I think that locations in a radius are most useful when they simply act as a reference, not as a directive.
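
For anyone who wants to reproduce that number, here is a minimal sketch of the haversine calculation in Node.js. The function name haversineMiles and the object names are my own for this example; the coordinates and the 3961 mile radius are the ones quoted above. Under those assumptions it should come out at roughly 2.996 miles.

```javascript
// Great-circle distance between two latitude/longitude points (in degrees)
// using the haversine formula. The radius defaults to 3961 miles, the
// rounded Earth radius quoted in the text.
function haversineMiles(lat1, lon1, lat2, lon2, radiusMiles) {
  const R = radiusMiles === undefined ? 3961 : radiusMiles;
  const toRad = function (deg) { return deg * Math.PI / 180; };

  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);

  // a is the square of half the chord length between the two points.
  const a = Math.pow(Math.sin(dLat / 2), 2) +
            Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
            Math.pow(Math.sin(dLon / 2), 2);

  // c is the central angle between the points, in radians.
  const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
  return R * c;
}

// Coordinates from the driving example above.
const parkingLot = { lat: 37.3879242, lon: -121.9821091 };
const burger = { lat: 37.3610039, lon: -122.0248489 };

const miles = haversineMiles(parkingLot.lat, parkingLot.lon, burger.lat, burger.lon);
console.log(miles.toFixed(3) + ' miles as the crow flies');
```

Passing a different radius as the fifth argument is an easy way to see how much the choice of Earth radius moves the result at this scale.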

The Difference

Comparing the two results, the difference between them is huge: 2.104 miles (5.1 minus 2.996). And this is only for locations within a radius of 10 miles. Within a radius of 50 miles, the straight-line figure would be an even worse estimate of the driving distance, because the number of required turns would increase. We can think of each turn as travelling the two shorter sides of a triangle rather than its hypotenuse, whereas spherical geometry is a straight shot along the hypotenuse with no extra deltas.

Then again, computing the latter is significantly easier and faster than the former. Tools such as MongoDB’s geospatial queries [3] make distance calculation easy and flexible but lack one thing: the shortest path. The optimal solution would be combining spherical geometry with a shortest path algorithm such as Dijkstra’s [4] to determine the nearest location with the shortest route. While the shortest path is not necessary for all use cases, it matters the most when you are driving to your destination.
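
To make that idea concrete, here is a rough sketch of Dijkstra’s algorithm over a toy road graph that mirrors the freeway scenario from earlier. Everything in it is made up for illustration: the node names, the mileages, and the straight-line distances (which in practice would come from a haversine or geospatial query). It uses a simple linear scan instead of a priority queue, which is fine for a handful of nodes but not for a real road network.

```javascript
// Dijkstra's algorithm over a small undirected road graph.
// graph: { node: { neighbor: miles, ... }, ... } where every node is a key.
function shortestRoadMiles(graph, start) {
  const dist = {};
  const visited = {};
  Object.keys(graph).forEach(function (n) { dist[n] = Infinity; });
  dist[start] = 0;

  for (;;) {
    // Pick the closest unvisited node (linear scan; fine for a toy graph).
    let current = null;
    Object.keys(dist).forEach(function (n) {
      if (!visited[n] && (current === null || dist[n] < dist[current])) current = n;
    });
    if (current === null || dist[current] === Infinity) break;
    visited[current] = true;

    // Relax every road leaving the current node.
    Object.keys(graph[current]).forEach(function (next) {
      const candidate = dist[current] + graph[current][next];
      if (candidate < dist[next]) dist[next] = candidate;
    });
  }
  return dist;
}

// Hypothetical roads: one shop sits just across the freeway but needs a
// 3 mile drive to a turnaround and 3 miles back; the other is 4 miles ahead.
const roads = {
  car: { turnaround: 3 },
  turnaround: { car: 3, acrossFreeway: 3, downTheRoad: 1 },
  acrossFreeway: { turnaround: 3 },
  downTheRoad: { turnaround: 1 }
};

// Made-up straight-line miles from the car to each candidate location.
const straightLine = { acrossFreeway: 0.2, downTheRoad: 3.9 };

const roadMiles = shortestRoadMiles(roads, 'car');
const candidates = Object.keys(straightLine);
const byAir = candidates.slice().sort(function (a, b) { return straightLine[a] - straightLine[b]; });
const byRoad = candidates.slice().sort(function (a, b) { return roadMiles[a] - roadMiles[b]; });

console.log('nearest as the crow flies:', byAir[0]);
console.log('nearest by road:', byRoad[0]);
```

The straight-line ranking is still useful as a cheap first pass to shortlist candidates; the road distance only needs to be computed for that shortlist.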

Just remember, “Fast is fine, but accuracy is everything.”

[1] http://goo.gl/maps/Vc67E
[2] http://andrew.hedges.name/experiments/haversine/
[3] http://docs.mongodb.org/manual/core/geospatial-indexes/#geospatial-indexes-distance-calculation
[4] http://en.wikipedia.org/wiki/Dijkstra's_algorithm