Using MongoDB as our master database, should I use a separate graph database to implement relationships between entities?

8👍

Mike,

you should be able to store your relationship data in the graph database. Its high performance when traversing big graphs comes from locality: you don’t run queries globally but rather start at a set of nodes (which correspond to documents in your case) that are looked up by an index. You might even store start-node IDs in your Mongo documents for quick access. From there, each hop costs the same regardless of data set size, so the cost of a traversal depends on the part of the graph it actually visits, not on how much data is stored overall.
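To illustrate the locality point, here is a minimal sketch in plain Python rather than a real graph database. The `adjacency` dict, the node IDs, and the `traverse` function are all hypothetical; the idea is only that a traversal starting from known nodes touches the neighborhood it reaches and never looks at the rest of the data.

```python
from collections import deque


def traverse(adjacency, start_ids, max_depth):
    """Breadth-first traversal that only touches nodes reachable from start_ids."""
    seen = set(start_ids)
    queue = deque((node_id, 0) for node_id in start_ids)
    visited_order = []
    while queue:
        node_id, depth = queue.popleft()
        visited_order.append(node_id)
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node_id, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return visited_order


# Hypothetical data: a small neighborhood plus many unrelated nodes.
adjacency = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": [],
}
adjacency.update({f"unrelated{i}": [] for i in range(1000)})

# The start node would come from an index lookup (or an ID stored in a document).
reachable = traverse(adjacency, ["a"], max_depth=2)
print(reachable)
```

The thousand unrelated nodes are never visited, which is the property the answer describes: the work scales with the visited subgraph, not with the total number of nodes.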

What are your other requirements (e.g. data set size, number of concurrent accesses, relationship/graph complexity)?

Your queries are a really good fit for a graph database and easily expressible in its terms.

I’d suggest that you just grab a graph database like Neo4j and do a quick spike with your domain to verify the general feasibility, and to find out which additional questions you would like answered before investing in a second technology.

P.S. If you hadn’t started yet, you could also have gone with a pure graph database approach, as graph databases are a superset of document databases. And in your case you’d rather model your domain anyway than work with generic documents. (E.g. structr is a CMS built on top of Neo4j.)

6👍

The documents in MongoDB very much resemble nodes in Neo4j, minus the relationships. They both hold key-value properties. If you’ve already made the choice to go with MongoDB, then you can use Neo4j to store the relationships and then bridge the stores in your application. If you’re choosing new technology, you can go with Neo4j for everything, as the nodes can hold property data just as well as documents can.
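A rough sketch of what "bridging the stores in your application" might look like, using in-memory dicts as stand-ins for both databases. The `documents` and `relationships` structures, the `FOLLOWS` relationship type, and `related_documents` are all hypothetical; a real application would replace the two lookups with calls to the MongoDB and Neo4j drivers.

```python
# Stand-in for the document store: full property data keyed by ID.
documents = {
    "u1": {"name": "Alice", "city": "Boston"},
    "u2": {"name": "Bob", "city": "Berlin"},
    "u3": {"name": "Carol", "city": "Boston"},
}

# Stand-in for the graph store: only IDs and typed relationships.
relationships = {
    "u1": [("FOLLOWS", "u2"), ("FOLLOWS", "u3")],
    "u2": [("FOLLOWS", "u3")],
}


def related_documents(start_id, rel_type):
    """Ask the graph for related IDs, then load the documents for those IDs."""
    related_ids = [
        target
        for kind, target in relationships.get(start_id, [])
        if kind == rel_type
    ]
    # Skip IDs that exist in the graph but not (yet) in the document store.
    return [documents[doc_id] for doc_id in related_ids if doc_id in documents]


followed = related_documents("u1", "FOLLOWS")
print([doc["name"] for doc in followed])
```

The shared ID is the only coupling between the two stores, which is also why keeping them consistent becomes the main concern.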

As for the relationship part, Neo4j is a great fit. You have a graph, not unrelated documents. Using a graph database makes perfect sense here, and the sample queries have graph written all over them.

Honestly though, the best way to find out what works for you is to do a PoC – low cost, high value.

Disclaimer: I work for Neo Technology.

1👍

Stay with MongoDB, for two reasons: (1) it’s better to stay in the same domain if you can, to reduce complexity, and (2) MongoDB is excellent for querying and requires less work than Redis, for example.

1👍

We ended up using both. We are implementing a search engine for a transportation network.

Trying to implement relationships in MongoDB can become unwieldy once you go beyond 1 or 2 “links”. Essentially you would be storing ObjectIds in an array, and if you want bi-directional relationships, you have to implement two separate links. In Mongo, a “pointer” to an entity (or “link”) is just another property (one that your application interprets as a reference); it is not a first-class object like a relationship in Neo4j.
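As a minimal sketch of why this gets unwieldy, the snippet below models documents as plain dicts with a `links` array of IDs. The `stations` data and both helper functions are hypothetical and do not use a MongoDB driver; they only mirror the shape of the approach described above.

```python
# Hypothetical documents: each one carries an array of linked IDs.
stations = {
    "s1": {"name": "North", "links": []},
    "s2": {"name": "Central", "links": []},
    "s3": {"name": "South", "links": []},
}


def link_both_ways(collection, a_id, b_id):
    """A bi-directional link needs two separate writes, one per document."""
    if b_id not in collection[a_id]["links"]:
        collection[a_id]["links"].append(b_id)
    # If this second write fails, the link exists in one direction only.
    if a_id not in collection[b_id]["links"]:
        collection[b_id]["links"].append(a_id)


def two_hops(collection, start_id):
    """Each extra hop is another round of lookups driven by application code."""
    result = set()
    for middle_id in collection[start_id]["links"]:
        for far_id in collection[middle_id]["links"]:
            if far_id != start_id:
                result.add(far_id)
    return result


link_both_ways(stations, "s1", "s2")
link_both_ways(stations, "s2", "s3")
print(two_hops(stations, "s1"))
```

Nothing in the data itself says that `links` holds references; the meaning lives entirely in the application code that reads the array.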

So we decided to use Neo4j to store the relationships and MongoDB to store everything else. The challenge then became keeping the two stores in sync.

We are using a 10gen lab project called “MongoConnector”, which is a mechanism for keeping MongoDB in sync with another store. The project is currently unsupported, but the code is available:

http://blog.mongodb.org/post/29127828146/introducing-mongo-connector

MongoConnector uses the replication mechanism to implement the syncing. Essentially, you monitor the MongoDB oplog and implement callbacks for any upserts (updates or inserts) and deletes. This implementation is called a “DocumentManager” in MongoConnector speak. We ended up implementing a Neo4jDocumentManager.
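The callback idea can be sketched roughly as follows. This is not the MongoConnector API: the `GraphSyncManager` class, its method names, and the simplified log entries (with `op`, `doc`, and `_id` keys) are assumptions made for illustration, and a real oplog entry has a different, richer shape.

```python
class GraphSyncManager:
    """Hypothetical sync target that mirrors documents into an in-memory graph."""

    def __init__(self):
        self.nodes = {}
        self.edges = set()

    def upsert(self, doc):
        """Called for inserts and updates: refresh the node and its outgoing edges."""
        doc_id = doc["_id"]
        self.nodes[doc_id] = {k: v for k, v in doc.items() if k != "links"}
        # Drop the old outgoing edges, then recreate them from the document.
        self.edges = {(src, dst) for src, dst in self.edges if src != doc_id}
        for target_id in doc.get("links", []):
            self.edges.add((doc_id, target_id))

    def remove(self, doc_id):
        """Called for deletes: drop the node and every edge that touches it."""
        self.nodes.pop(doc_id, None)
        self.edges = {(src, dst) for src, dst in self.edges if doc_id not in (src, dst)}


def replay(log_entries, manager):
    """Dispatch each simplified log entry to the matching callback."""
    for entry in log_entries:
        if entry["op"] in ("i", "u"):  # insert or update
            manager.upsert(entry["doc"])
        elif entry["op"] == "d":  # delete
            manager.remove(entry["_id"])


# Hypothetical, simplified log of changes.
log_entries = [
    {"op": "i", "doc": {"_id": "s1", "name": "North", "links": ["s2"]}},
    {"op": "i", "doc": {"_id": "s2", "name": "Central", "links": []}},
    {"op": "u", "doc": {"_id": "s1", "name": "North", "links": ["s2", "s3"]}},
    {"op": "d", "_id": "s2"},
]

manager = GraphSyncManager()
replay(log_entries, manager)
print(sorted(manager.nodes), sorted(manager.edges))
```

Because the graph is only updated when the log is replayed, it lags behind the primary store by however long the replay takes.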

On the query side, we found that Neo4j is better suited for “friend of a friend” kinds of queries, whereas MongoDB was better for general-purpose queries, i.e. per-field queries or range queries dealing with dates.

I’ve been planning to give a talk and write a blog post, but I haven’t gotten to it yet:

http://www.meetup.com/graphdb-boston/events/91703472/

There are drawbacks to this solution, such as the two stores getting out of sync if a process goes down, or syncing being slow (not real time).
