Solved: database - MongoDB vs. Cassandra

I am evaluating what might be the best migration option.

Currently, I am on a sharded MySQL setup (horizontal partitioning), with most of my data stored in JSON blobs. I do not have any complex SQL queries (I already migrated away from them when I partitioned my database).

Right now, it seems like both MongoDB and Cassandra would be likely options. My situation:

Lots of reads in every query, fewer regular writes

Not worried about "massive" scalability

More concerned about simple setup, maintenance and code

Minimize hardware/server cost

database database-design mongodb cassandra


edited Dec 16 '11 at 20:24 by Brian

asked May 23 '10 at 17:39 by ming yeow

closed as not constructive by LittleBobbyTables, BNL, Florent, Praveen, Sergio Tulentsev Nov 6 '12 at 14:31

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. If this question can be reworded to fit the rules in the help center, please edit the question.

277

 

Ironically, I have observed that closed questions are often the most useful ones (see the view count and upvotes). I wonder if there should be another Stack Overflow site for questions like this, where they are not closed and people can still contribute.

– Watt

Nov 12 '13 at 20:58

103

 

Yup, Massive fail, find a Q that everybody's interested in... Close it.

– Wren

Dec 16 '13 at 13:02

41

 

Once again the Stack Overflow censors strike back. You are right, Watt; I too find that the closed questions are sometimes the better ones to read.

– Boltimuss

Mar 3 '14 at 12:43

14

 

This question would also serve the community well by staying open as the products in question mature and evolve over time.

Like it or not, articles like this on Stack Overflow have very high Google rankings and will continue to be used as sources of knowledge by readers.

– Excalibur

Apr 8 '14 at 16:59

20

 

I agree. It's so sad to see how many questions are closed because someone thinks they are not useful for others. I don't think it hurts anybody to leave them open. If you are not interested, just don't reply.

– skan

Aug 14 '14 at 11:43


6 Answers

6

Accepted answer

Lots of reads in every query, fewer regular writes

Both databases perform well on reads where the hot data set fits in memory.

Both also emphasize join-less data models (and encourage denormalization instead), and both provide indexes on documents or rows, although MongoDB's indexes are currently more flexible.

Cassandra's storage engine provides constant-time writes no matter how big your data set grows. Writes are more problematic in MongoDB, partly because of the B-tree-based storage engine, but more because of the per-database write lock.
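To illustrate why append-style writes stay cheap as the data set grows, here is a minimal sketch of a log-structured store in Python. It is not Cassandra's implementation: the class name, the `memtable_limit` parameter, and the in-memory lists standing in for on-disk files are all assumptions made for illustration, and real engines add compaction, bloom filters, and durable files on top of this idea.

```python
from typing import Dict, List, Optional, Tuple


class TinyLogStructuredStore:
    """Toy log-structured store (hypothetical, for illustration only)."""

    def __init__(self, memtable_limit: int = 3) -> None:
        self.commit_log: List[Tuple[str, str]] = []      # append-only write log
        self.memtable: Dict[str, str] = {}               # recent writes, in memory
        self.sstables: List[List[Tuple[str, str]]] = []  # immutable sorted runs
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        # A write is an append plus a dict update; its cost does not
        # depend on how much data has already been flushed.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Freeze the memtable as a sorted, immutable run and start over.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []

    def get(self, key: str) -> Optional[str]:
        # Reads check the newest data first, then older runs.
        # (A linear scan keeps the sketch short; real engines use indexes.)
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for stored_key, stored_value in table:
                if stored_key == key:
                    return stored_value
        return None


store = TinyLogStructuredStore(memtable_limit=2)
store.put("a", "1")
store.put("b", "2")   # reaches the limit and triggers a flush
store.put("a", "3")   # newer value shadows the flushed one
```

The trade-off is visible in `get`: reads may have to consult several runs, which is the price paid for never updating flushed data in place.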

For analytics, MongoDB provides a custom map/reduce implementation; Cassandra provides native Hadoop support, including for Hive (a SQL data warehouse built on Hadoop map/reduce) and Pig (a Hadoop-specific analysis language that many think is a better fit for map/reduce workloads than SQL).

Not worried about "massive" scalability

If you're looking at a single server, MongoDB is probably a better fit. For those more concerned about scaling, Cassandra's no-single-point-of-failure architecture will be easier to set up and more reliable. (MongoDB's global write lock tends to become more painful, too.) Cassandra also gives a lot more control over how your replication works, including support for multiple data centers.

More concerned about simple setup, maintenance and code

Both are trivial to set up, with reasonable out-of-the-box defaults for a single server. Cassandra is simpler to set up in a multi-server configuration since there are no special-role nodes to worry about; here is a screencast demonstrating setting up a 4-node Cassandra cluster in two minutes.

If you're presently using JSON blobs, MongoDB is an insanely good match for your use case, given that it uses BSON to store the data. You'll be able to have richer and more queryable data than you would in your present database. This would be the most significant win for Mongo.


edited Jan 28 '13 at 19:10 by PINGAS

answered May 24 '10 at 3:58 by Michael

50

 

Totally different; a comment isn't big enough, but ... Cassandra is a linearly scalable (amortized constant-time reads and writes) Dynamo/Google Bigtable hybrid that features fast writes regardless of data size. Its feature set is minimalistic, little beyond that of an ordered key-value store. MongoDB is a heavily featured (and fast) document store, at the cost of durability and guarantees about writes persisting (since they're not immediately written to disk). They're different beasts with different philosophies; MongoDB is closer to an RDBMS replacement ...

– Michael

May 24 '10 at 23:56

12

 

while Cassandra is lower level but allows for uber scaling (see Twitter/Digg/Facebook), but you're going to have to be deliberate in how you lay your data out, build secondary indexes etc, since no flexible querying is allowed.

– Michael

May 24 '10 at 23:59

7

 

Because everyone here mentioned Twitter in relation to Cassandra: they are not using Cassandra for persisting tweets; they still use MySQL for that (engineering.twitter.com/2010/07/cassandra-at-twitter-today.html). OK, but I can imagine that they still store lots of data for other purposes in Cassandra.

– High6

Jan 13 '12 at 8:05

6

 

It looks like the global write lock may have been removed in Mongo 2.2...

– Matt F

Oct 18 '12 at 4:16

5

 

Even before my project went live, I am feeling the pain points of MongoDB. Hot backup is a basic requirement. To do a hot backup on a Linux server, you first have to set up an LVM partition (not so common) and take a snapshot before every backup session. Another easy way is to use MongoDB's paid backup service, but that service is expensive ($2.3/GB/month). Soon you will need a replica set for fault tolerance. With the open source version, the nodes can exchange data only as clear text. For SSL you have to go with the Enterprise edition, and that is $10,000. Goodbye MongoDB. Refactoring my code to Cassandra.

– Karthik Sankar

Oct 2 '14 at 15:16


I've used MongoDB extensively (for the past 6 months), building a hierarchical data management system, and I can vouch for both the ease of setup (install it, run it, use it!) and the speed. As long as you think about indexes carefully, it can absolutely scream along, speed-wise.

I gather that Cassandra, due to its use with large-scale projects like Twitter, has better scaling functionality, although the MongoDB team is working on parity there. I should point out that I've not used Cassandra beyond the trial-run stage, so I can't speak for the detail.

The real swinger for me, when we were assessing NoSQL databases, was the querying - Cassandra is basically just a giant key/value store, and querying is a bit fiddly (at least compared to MongoDB), so for performance you'd have to duplicate quite a lot of data as a sort of manual index. MongoDB, on the other hand, uses a "query by example" model.

For example, say you've got a Collection (MongoDB parlance for the equivalent of an RDBMS table) containing Users. MongoDB stores records as Documents, which are basically binary JSON objects, e.g.:

{
    FirstName: "John",
    LastName: "Smith",
    Email: "john@smith.com",
    Groups: ["Admin", "User", "SuperUser"]
}

If you wanted to find all of the users called Smith who have Admin rights, you'd just create a new document (at the admin console using JavaScript, or in production using the language of your choice):

{
    LastName: "Smith",
    Groups: "Admin"
}

...and then run the query. That's it. There are added operators for comparisons, RegEx filtering etc, but it's all pretty simple, and the Wiki-based documentation is pretty good.
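The "query by example" idea can be sketched in plain Python without a database. The `matches` and `find` helpers below are hypothetical and cover only equality matching (including the rule that a scalar in the query matches an array field containing it); the second and third users are made-up test data, and MongoDB's real matcher supports many more operators.

```python
from typing import Any, Dict, List


def matches(document: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Return True if every field in the query document matches."""
    for field, wanted in query.items():
        if field not in document:
            return False
        actual = document[field]
        if isinstance(actual, list) and not isinstance(wanted, list):
            # A scalar query value matches an array that contains it.
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return all documents in the collection that match the query."""
    return [doc for doc in collection if matches(doc, query)]


users = [
    {"FirstName": "John", "LastName": "Smith", "Email": "john@smith.com",
     "Groups": ["Admin", "User", "SuperUser"]},
    # The two documents below are invented for the example.
    {"FirstName": "Jane", "LastName": "Smith", "Email": "jane@smith.com",
     "Groups": ["User"]},
    {"FirstName": "Bob", "LastName": "Jones", "Email": "bob@jones.com",
     "Groups": ["Admin"]},
]

admins_named_smith = find(users, {"LastName": "Smith", "Groups": "Admin"})
```

With a real server and the PyMongo driver, the same query document would be passed as the filter to `collection.find(...)`; the connection and collection names would depend on your own setup.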


answered Jul 1 '10 at 22:29 by Richard K.

34

 

Update (8th August 2011): Amazon's Ireland EC2 data centre had a lightning-related incident last night, and in sorting out our server recovery, I discovered one pretty crucial point: if you've got a replication set of two servers (and they're easy to set up), make sure you have an Arbiter node, so that if one goes down, the other one doesn't panic and stall in Secondary mode! Trust me, that's a pain in the behind to sort out with a big database.

– Richard K.

Aug 8 '11 at 20:11

4

 

To add to what @Richard K. said: you should have an arbiter node when you have an even number of nodes (primary + secondary) in a replica set.

– Amareswar

Feb 3 '13 at 21:04
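The arithmetic behind the arbiter advice can be sketched as follows, assuming the usual rule that a primary can only be elected by a strict majority of voting members. The function names are hypothetical and this models only the vote count, not MongoDB's election protocol.

```python
def majority(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """True if the reachable members form a strict majority."""
    return reachable >= majority(voting_members)


# Two data nodes, one goes down: 1 of 2 votes is not a majority,
# so the survivor stays secondary.
two_node_outage = can_elect_primary(voting_members=2, reachable=1)

# Two data nodes plus an arbiter, one data node goes down: 2 of 3 votes
# is a majority, so the surviving data node can become primary.
with_arbiter_outage = can_elect_primary(voting_members=3, reachable=2)
```

The arbiter holds no data; it only turns an even number of voters into an odd one so that a single failure still leaves a majority.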

  

 

In addition, consider MongoDB when you need to do more aggregation for data analytics.

– user1503117

Oct 1 '15 at 14:04




Why choose between a traditional database and a NoSQL data store? Use both! The problem with NoSQL solutions (beyond the initial learning curve) is the lack of transactions. So do all updates in MySQL and have MySQL populate a NoSQL data store for reads; you then benefit from each technology's strengths. This does add more complexity, but you already have the MySQL side -- just add MongoDB, Cassandra, etc. to the mix.

NoSQL data stores generally scale way better than a traditional DB for otherwise identical specs -- there is a reason why Facebook, Twitter, Google, and most start-ups are using NoSQL solutions. It's not just geeks getting high on new tech.
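A minimal sketch of this "write to the relational store, read from the NoSQL store" pattern, using stand-ins so it runs anywhere: Python's built-in `sqlite3` plays the role of MySQL and a plain dict plays the role of the NoSQL read store. The table layout and the `save_user`, `load_user`, and `rebuild_read_store` helpers are assumptions for illustration, not something the answer prescribes.

```python
import json
import sqlite3
from typing import Dict, Optional

# Transactional system of record (stand-in for MySQL).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

# Read-optimized copy (stand-in for MongoDB or Cassandra).
read_store: Dict[int, dict] = {}


def save_user(user_id: int, body: dict) -> None:
    """Write transactionally, then refresh the read copy."""
    with db:  # commits on success, rolls back if an exception is raised
        db.execute(
            "INSERT OR REPLACE INTO users (id, body) VALUES (?, ?)",
            (user_id, json.dumps(body)),
        )
    # Only reached after a successful commit.
    read_store[user_id] = body


def load_user(user_id: int) -> Optional[dict]:
    """Serve reads from the NoSQL-style store only."""
    return read_store.get(user_id)


def rebuild_read_store() -> None:
    """Repopulate the read copy from the system of record."""
    read_store.clear()
    for user_id, body in db.execute("SELECT id, body FROM users"):
        read_store[user_id] = json.loads(body)


save_user(1, {"LastName": "Smith", "Groups": ["Admin"]})
```

Because the relational side remains the source of truth, the read store can be dropped and rebuilt at any time; the cost is the extra moving part the answer mentions.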


answered Apr 17 '12 at 0:45 by Jason Grant Taylor

6

 

I totally agree. I am using MongoDB + MySQL in one of the upcoming products that I am architecting, a financial product cloud. MySQL is used where we absolutely need transactional capabilities. MongoDB is used to store complex data structures that need no computation and just have to be pulled up when required. Working well so far. :)

– ramonrails

Jul 19 '13 at 17:05

  

 

I also used such a dual approach in most of my projects, and in some others the NFS mounted file system was used together with PostgreSQL for seismic blobs nearing 1 Gb in some cases. A path is a kind of query to the key value database.

– h22

Aug 28 '14 at 11:51

  

 

Nice idea! Thanks.

– Tom Dworzanski

Mar 22 '15 at 7:19

  

 

Here is a link to a question I asked about how to architect both sql and nosql databases: dba.stackexchange.com/questions/102053/… I could use some insight you may have

– j will

May 20 '15 at 15:45

  

 

He already has escaped from transactions for good => now infinite scalability might be possible .. otherwise -> not :)

– bodrin

Oct 14 '15 at 14:44


I'm probably going to be an odd man out, but I think you need to stay with MySQL. You haven't described a real problem you need to solve, and MySQL/InnoDB is an excellent storage back-end even for blob/json data.

There is a common tendency among Web engineers to reach for NoSQL as soon as they realize that not all features of an RDBMS are being used. This alone is not a good reason, since NoSQL databases most often have rather poor data engines (what MySQL calls a storage engine).

Now, if you're not of that kind, then please specify what is missing in MySQL that you're looking for in a different database (like auto-sharding, automatic failover, multi-master replication, a weaker data consistency guarantee in a cluster paying off in higher write throughput, etc.).


answered Feb 23 '12 at 20:50 by Kostja

12

 

He is using sharding, which means his data is partitioned manually across servers. Mongodb can automate sharding, which may be a benefit.

– fabspro

Feb 14 '13 at 11:17

12

 

He is also storing mostly JSON blobs in RDBMS -- rendering relational design (features) useless.

– Damir Sudarevic

Mar 22 '13 at 11:54

1

 

The data model and automatic sharding are indeed different, but when choosing a database, you need to look at the storage engine first and the rest of the bells and whistles second. How is the storage engine going to perform under a load spike? How is the autosharding feature going to perform under a data inflow spike? Before you relinquish control of these important aspects to the database, you'd better make sure it's going to be capable of the task.

– Kostja

Apr 30 '13 at 9:23

4

 

Relational model is one of the most well thought-out, efficient to implement and frugal data models out there. "Rendering relational design features useless" may relate to constraints, triggers, or referential integrity - but these all are pay per use.

– Kostja

Jul 12 '13 at 17:58




I haven't used Cassandra, but I have used MongoDB and think it's awesome.

If you're after simple setup, this is it. You simply untar MongoDB and run the mongod daemon, and that's it... it's running.

Obviously that's only a starter, but to get you started it's easy.


answered May 23 '10 at 17:57 by dalton

1

 

AFAIK, the same applies to Cassandra as well. Untar, run the daemon. The test cluster is set up and ready for production!

– asgs

Jun 4 '15 at 18:55




I saw a presentation on MongoDB yesterday. I can definitely say that setup was "simple", as simple as unpacking it and firing it up. Done.

I believe that both MongoDB and Cassandra will run on virtually any regular Linux hardware, so you should not find too much of a barrier in that area.

I think in this case, at the end of the day, it will come down to which one you personally feel more comfortable with and which has a toolset that you prefer. As for the presentation on MongoDB, the presenter indicated that the toolset for MongoDB was pretty light and that there weren't many (they said not any, really) tools similar to what's available for MySQL. This was of course their experience, so YMMV. One thing that I did like about MongoDB was that there seemed to be lots of language support for it (Python and .NET being the two that I primarily use).

The list of sites using MongoDB is pretty impressive, and I know that Twitter just switched to using Cassandra.


answered May 23 '10 at 17:57 by GrayWizardx




protected by Tats_innit Aug 12 '14 at 2:57
