
Solved: database - MongoDB vs. Cassandra


I am evaluating what might be the best migration option.

Currently, I am on a sharded MySQL setup (horizontal partitioning), with most of my data stored in JSON blobs. I no longer have any complex SQL queries (I migrated away from them when I partitioned my database).

Right now, it seems like both MongoDB and Cassandra would be likely options. My situation:

  • Lots of reads in every query, less regular writes
  • Not worried about "massive" scalability
  • More concerned about simple setup, maintenance and code
  • Minimize hardware/server cost
database mongodb database-design cassandra
edited Feb 13 at 14:51 Andriy Kuba 3,992 2 11 29
asked May 23 '10 at 17:39 ming yeow 9,658 20 94 149

closed as off-topic by Martijn Pieters♦ Mar 1 at 12:06

This question appears to be off-topic. The users who voted to close gave this specific reason:

  • "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it." – Martijn Pieters
If this question can be reworded to fit the rules in the help center, please edit the question.

3   An official performance benchmark is available: Cassandra vs MongoDB vs HBase. –  Ravi Nov 11 '14 at 19:21

"Lots of reads in every query, less regular writes" => Look at CQRS (separate your reads from your writes, probably without event sourcing, but check whether you can update your read model asynchronously; synchronous updates may work too, depending on your use cases). –  bodrin Oct 14 '15 at 14:46


6 Answers

-------Accepted-------

Lots of reads in every query, fewer regular writes

Both databases perform well on reads where the hot data set fits in memory. Both also emphasize join-less data models (and encourage denormalization instead), and both provide indexes on documents or rows, although MongoDB's indexes are currently more flexible.

Cassandra's storage engine provides constant-time writes no matter how big your data set grows. Writes are more problematic in MongoDB, partly because of the B-tree-based storage engine, but more because of the per-database write lock.

For analytics, MongoDB provides a custom map/reduce implementation; Cassandra provides native Hadoop support, including for Hive (a SQL data warehouse built on Hadoop map/reduce) and Pig (a Hadoop-specific analysis language that many think is a better fit for map/reduce workloads than SQL).

Not worried about "massive" scalability

If you're looking at a single server, MongoDB is probably a better fit. For those more concerned about scaling, Cassandra's no-single-point-of-failure architecture will be easier to set up and more reliable. (MongoDB's global write lock tends to become more painful, too.) Cassandra also gives a lot more control over how your replication works, including support for multiple data centers.

More concerned about simple setup, maintenance and code

Both are trivial to set up, with reasonable out-of-the-box defaults for a single server. Cassandra is simpler to set up in a multi-server configuration since there are no special-role nodes to worry about; here is a screencast demonstrating setting up a 4-node Cassandra cluster in two minutes.

If you're presently using JSON blobs, MongoDB is an insanely good match for your use case, given that it uses BSON to store the data. You'll be able to have richer and more queryable data than you would in your present database. This would be the most significant win for Mongo.


edited Jan 28 '13 at 19:10 PINGAS 5 2
answered May 24 '10 at 3:58 Michael 6,362 2 14 20

62   Totally different, a comment isn't big enough, but ... Cassandra is a linearly scalable (amortized constant time reads & writes) Dynamo/Google Bigtable hybrid that features fast writes regardless of data size. Its feature set is minimalistic, little beyond that of an ordered key-value store. MongoDB is a heavily featured (and fast) document store at the cost of durability and guarantees about writes persisting (since they're not immediately written to disk). They're different beasts with different philosophies, MongoDB's closer to an RDBMS replacement ... –  Michael May 24 '10 at 23:56

18   while Cassandra is lower level but allows for uber scaling (see Twitter/Digg/Facebook), but you're going to have to be deliberate in how you lay your data out, build secondary indexes etc, since no flexible querying is allowed. –  Michael May 24 '10 at 23:59

7   Because everyone mentioned Twitter here in relation to Cassandra: they are not using Cassandra for persisting tweets, they still use MySQL for that (engineering.twitter.com/2010/07/cassandra-at-twitter-today.html). OK, but I can imagine that they still store lots of data for other purposes in Cassandra. –  High6 Jan 13 '12 at 8:05

6   It looks like the global write lock may have been removed in Mongo 2.2... –  Matt F Oct 18 '12 at 4:16

8   Even before my project went live, I am feeling the pain points of MongoDB. Hot backup is a basic requirement. To do a hot backup on a Linux server, you first have to set up an LVM partition (not so common) and take a snapshot before every backup session. Another easy way is to use MongoDB's paid backup service, but that service is expensive (2.3$/GB/month). Soon you will need a replica set for fault tolerance. With the open source version, the nodes can exchange data only as clear text. For SSL you have to go with the Enterprise edition, and that is 10,000$. Goodbye MongoDB. Refactoring my code to Cassandra. –  Karthik Sankar Oct 2 '14 at 15:16

I've used MongoDB extensively (for the past 6 months), building a hierarchical data management system, and I can vouch for both the ease of setup (install it, run it, use it!) and the speed. As long as you think about indexes carefully, it can absolutely scream along, speed-wise.

I gather that Cassandra, due to its use with large-scale projects like Twitter, has better scaling functionality, although the MongoDB team is working on parity there. I should point out that I've not used Cassandra beyond the trial-run stage, so I can't speak for the detail.


The real swinger for me, when we were assessing NoSQL databases, was the querying - Cassandra is basically just a giant key/value store, and querying is a bit fiddly (at least compared to MongoDB), so for performance you'd have to duplicate quite a lot of data as a sort of manual index. MongoDB, on the other hand, uses a "query by example" model.

For example, say you've got a Collection (MongoDB parlance for the equivalent of an RDBMS table) containing Users. MongoDB stores records as Documents, which are basically binary JSON objects, e.g.:

{
   FirstName: "John",
   LastName: "Smith",
   Email: "john@smith.com",
   Groups: ["Admin", "User", "SuperUser"]
}

If you wanted to find all of the users called Smith who have Admin rights, you'd just create a new document (at the admin console using JavaScript, or in production using the language of your choice):

{
   LastName: "Smith",
   Groups: "Admin"
}

...and then run the query. That's it. There are added operators for comparisons, RegEx filtering etc, but it's all pretty simple, and the Wiki-based documentation is pretty good.
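To make the "query by example" idea concrete, here is a minimal sketch in plain JavaScript of how such an example document can be matched against stored documents. It is an illustration only, not MongoDB's implementation: `matchesExample` is a hypothetical helper, the extra sample users are made up, and only equality (including "array field contains this value") is modelled. In the mongo shell the equivalent would be roughly `db.users.find({ LastName: "Smith", Groups: "Admin" })`, assuming the collection is named `users`.

```javascript
// Simplified illustration of query-by-example matching; NOT MongoDB's implementation.
// Only equality is modelled: a query value matches an equal field value,
// or any element of an array-valued field.
function matchesExample(doc, example) {
  return Object.entries(example).every(([field, wanted]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

// Hypothetical sample data, modelled on the document shown earlier.
const users = [
  { FirstName: "John", LastName: "Smith", Email: "john@smith.com", Groups: ["Admin", "User", "SuperUser"] },
  { FirstName: "Jane", LastName: "Smith", Email: "jane@smith.com", Groups: ["User"] },
  { FirstName: "Bob", LastName: "Jones", Email: "bob@jones.com", Groups: ["Admin"] },
];

// The "example" document: every field listed here must match.
const example = { LastName: "Smith", Groups: "Admin" };
const admins = users.filter((u) => matchesExample(u, example));

console.log(admins.map((u) => u.FirstName)); // [ 'John' ]
```

Note that `Groups: "Admin"` matches a document whose `Groups` array contains `"Admin"`, which is why no special "contains" operator is needed for this query.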


answered Jul 1 '10 at 22:29 Richard K. 1,508 1 8 13

37   Update (8th August 2011): Amazon's Ireland EC2 data centre had a lightning-related incident last night, and in sorting out our server recovery, I discovered one pretty crucial point: if you've got a replication set of two servers (and they're easy to set up), make sure you have an Arbiter node, so if one goes down, the other one doesn't panic and stall in Secondary mode! Trust me, that's a pain in the behind to sort out with a big database. –  Richard K. Aug 8 '11 at 20:11

4   To add to what @Richard K said, you should have an arbiter node when you have an even number of nodes (primary + secondary) in a replica set. –  Amareswar Feb 3 '13 at 21:04

Added to that, consider MongoDB when more aggregation is to be done for data analytics. –  user1503117 Oct 1 '15 at 14:04
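The arbiter advice in the comments comes down to majority arithmetic: a replica set member can only become primary if a strict majority of the voting members can see each other. A small sketch of that rule (the function name `canElectPrimary` is hypothetical, and this models only the vote count, not the rest of the election protocol):

```javascript
// A primary can be elected only when a strict majority of voting members is reachable.
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers > Math.floor(votingMembers / 2);
}

// Two data nodes, one goes down: 1 of 2 is not a majority, so the survivor stays secondary.
console.log(canElectPrimary(2, 1)); // false

// Two data nodes plus an arbiter, one data node goes down: 2 of 3 is a majority.
console.log(canElectPrimary(3, 2)); // true
```

This is also why an even number of voting members gains nothing in fault tolerance over the next lower odd number: four members still need three reachable to elect a primary.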


Why choose between a traditional database and a NoSQL data store? Use both! The problem with NoSQL solutions (beyond the initial learning curve) is the lack of transactions. If you do all updates in MySQL and have MySQL populate a NoSQL data store for reads, you benefit from each technology's strengths. This does add more complexity, but you already have the MySQL side; just add MongoDB, Cassandra, etc. to the mix.
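A rough sketch of that "write to one store, read from the other" arrangement is below. Everything in it is a stand-in: `primaryStore` plays the role of MySQL (the system of record), `readStore` plays the role of the NoSQL read copy, and both are plain in-memory maps so the example runs on its own. A real setup would perform the second step only after the transaction commits, and possibly asynchronously.

```javascript
// In-memory stand-ins: `primaryStore` plays the role of MySQL (system of record),
// `readStore` plays the role of the NoSQL read copy. All names here are hypothetical.
const primaryStore = new Map();
const readStore = new Map();

// Denormalize a record into the shape the read path wants.
function toReadModel(user) {
  return {
    id: user.id,
    displayName: `${user.firstName} ${user.lastName}`,
    groups: [...user.groups],
  };
}

// All updates go to the primary store first; the read copy is derived from it.
function saveUser(user) {
  primaryStore.set(user.id, { ...user });
  readStore.set(user.id, toReadModel(primaryStore.get(user.id)));
}

// Reads never touch the primary store.
function getUserForDisplay(id) {
  return readStore.get(id);
}

saveUser({ id: 1, firstName: "John", lastName: "Smith", groups: ["Admin"] });
console.log(getUserForDisplay(1).displayName); // John Smith
```

The design choice to note is that the read copy is always rebuilt from the primary record, so it can be thrown away and repopulated if the two ever drift apart.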

NoSQL data stores generally scale far better than a traditional DB on otherwise identical specs; there is a reason why Facebook, Twitter, Google, and most start-ups are using NoSQL solutions. It's not just geeks getting high on new tech.


answered Apr 17 '12 at 0:45 Jason Grant Taylor 729 5 2

6   I totally agree. I am using mongodb + mysql in one of the upcoming products that I am architecting. It is an upcoming financial product cloud. mysql is used where we absolutely need transactional capabilities. mongodb is used to store non-computing complex data structures that just need to be pulled up when required. Working well so far. :) –  ramonrails Jul 19 '13 at 17:05

I also used such a dual approach in most of my projects, and in some others an NFS-mounted file system was used together with PostgreSQL for seismic blobs nearing 1 Gb in some cases. A path is a kind of query to the key-value database. –  h22 Aug 28 '14 at 11:51

1   Here is a link to a question I asked about how to architect both SQL and NoSQL databases: dba.stackexchange.com/questions/102053/… I could use any insight you may have. –  j will May 20 '15 at 15:45

He has already escaped from transactions for good => now infinite scalability might be possible .. otherwise -> not :) –  bodrin Oct 14 '15 at 14:44

If you add MySQL, it's cumbersome to scale linearly like Cassandra. You might end up with a single point of failure and a clumsy way to restore your data after a server failure. –  Rafael Sanches Mar 10 '16 at 22:40

I'm probably going to be the odd man out, but I think you need to stay with MySQL. You haven't described a real problem you need to solve, and MySQL/InnoDB is an excellent storage back-end even for blob/JSON data.

There is a common tendency among web engineers to reach for NoSQL as soon as they realize they are not using every feature of an RDBMS. This alone is not a good reason, since NoSQL databases most often have rather poor data engines (what MySQL calls a storage engine).

Now, if you're not of that kind, then please specify what is missing in MySQL that you're looking for in a different database (for example auto-sharding, automatic failover, multi-master replication, or weaker consistency guarantees across the cluster in exchange for higher write throughput).


answered Feb 23 '12 at 20:50 Kostja 958 6 11

12   He is using sharding, which means his data is partitioned manually across servers. MongoDB can automate sharding, which may be a benefit. –  fabspro Feb 14 '13 at 11:17

12   He is also storing mostly JSON blobs in RDBMS -- rendering relational design (features) useless. –  Damir Sudarevic Mar 22 '13 at 11:54

1   The data model and automatic sharding are indeed different, but when choosing a database, you need to look at the storage engine first, and the rest of the bells and whistles second. How is the storage engine going to perform under a load spike? How is the autosharding feature going to perform under a data inflow spike? Before you relinquish control to the database for these important aspects, you'd better make sure it's going to be capable of the task. –  Kostja Apr 30 '13 at 9:23

4   The relational model is one of the most well thought-out, efficient to implement and frugal data models out there. "Rendering relational design features useless" may relate to constraints, triggers, or referential integrity - but these are all pay per use. –  Kostja Jul 12 '13 at 17:58


I haven't used Cassandra, but I have used MongoDB and think it's awesome.

If you're after simple setup, this is it. You simply untar MongoDB and run the mongod daemon, and that's it: it's running.

Obviously that's only a start, but getting started is easy.


answered May 23 '10 at 17:57 dalton 2,932 1 13 24

5   AFAIK, the same applies to Cassandra as well. Untar, run the daemon. The test cluster is set up and ready for production! –  asgs Jun 4 '15 at 18:55


I saw a presentation on mongodb yesterday. I can definitely say that setup was "simple", as simple as unpacking it and firing it up. Done.

I believe that both mongodb and cassandra will run on virtually any regular Linux hardware, so you should not find too much of a barrier in that area.

I think in this case, at the end of the day, it will come down to which one you personally feel more comfortable with and which has a toolset that you prefer. As far as the presentation on mongodb goes, the presenter indicated that the toolset for mongodb was pretty light and that there weren't many (they said not really any) tools similar to what's available for MySQL. This was of course their experience, so YMMV. One thing that I did like about mongodb was that there seemed to be lots of language support for it (Python and .NET being the two that I primarily use).

The list of sites using mongodb is pretty impressive, and I know that Twitter just switched to using cassandra.


answered May 23 '10 at 17:57 GrayWizardx 9,453 2 21 37

At the end of the day it is an apples vs oranges comparison. Both databases have their own strengths. Here are some things to consider: object model, secondary indexes, write scalability, high availability, etc. I have a blog post that explains the high-level strategic differences between mongodb and cassandra here - scalegrid.io/blog/cassandra-vs-mongodb –  Dharshan Aug 14 '16 at 2:19


