See MongoDB

qaz@lemmy.world · 2 months ago

See MongoDB

jubilationtcornpone@sh.itjust.works · edit-2 2 months ago

Every time I’m assigned to a project that uses a document database

“So how are you guys handling all your related data?”

Finds collection of massive JSON documents containing all the related data

“Oh boy.”

BlueBockser@programming.dev · edit-2 2 months ago

What’s the problem with that? In my previous team, we had a structure with four levels of nesting where we only ever needed to query the first two levels. At first we used Postgres with normalized tables, but it was just slow as hell. Switching to MongoDB actually made our performance issues vanish.

Of course it all depends on what kinds of queries you need to run, but I don’t think that large JSON documents are necessarily a problem.

Ephera@lemmy.ml · 2 months ago

They’re talking about relations between data. For example, when you delete a user, you may also want to delete their stored data.

To some degree, this is less of a problem with document databases, because they don’t force you to chop your data into small parts like relational databases do (e.g. you can have lists of that user’s stored data as part of the JSON document). But you will likely still need some relations at some point.

Chances are you have a layer in your application code which ensures these relations that way.
Which is fine in my opinion. With relational databases, there’s also often some relations which you cannot model in the database.
But yeah, it requires somewhat more software architecture awareness, to not lump the relation checking logic into general application logic. And you can’t connect a second application to that database, without having to implement the relations another time or at least pulling them out into a shared library.

Rayquetzalcoatl@lemmy.world · 2 months ago

I’m currently building something using Mongo as the DB, and have so far been making sure to assign the user ID to everything that relates to that user when it’s created.

Wouldn’t you have to do something like that in MySQL anyway to ensure that the entries related to each other?

Ephera@lemmy.ml · 2 months ago

Oh yeah, I’m saying that relational databases push you even more to assign IDs to every miniscule piece of information, especially if you’re following best practices (3NF or similar).

For example, you’re not supposed to say that a user has a list of interests, you’re supposed to say that there’s users with a user_id and then there’s user_interests with a user_id and an interest_description, so two separate tables.
If those interests can be indexed, then you’d want three tables:

users(user_id)
interests(interest_id, description)
user_interests(user_id, interest_id) (N-M-Mapping)

I mean, this might not be the best example, as it kind of makes sense to not always load the user interests whenever you do anything with the user, but yeah, the point is that you’re supposed to split it up into separate tables and then JOIN it as you need it.

With these RDBMS, your entire data loading logic is supposed to happen in-database, so you pretty much need to chop the data into the smallest possible parts and assign IDs to all of those parts, to give you the flexibility to access them how you need to.

Rayquetzalcoatl@lemmy.world · 2 months ago

Ahhh okay I see. I’ve developed sites for a decade but have never had to really consider how databases work in any great detail, which obviously now with mongo I am having to think about. Thank you for clarifying! I’ve got some reading to do 🫡

jubilationtcornpone@sh.itjust.works · 2 months ago

In my previous team, we had a structure with four levels of nesting

Those are rookie numbers.

ℍ𝕂-𝟞𝟝@sopuli.xyz · 2 months ago

NoSQL has always been a niche use case thing.

For some stuff, no ACID is no problem. They have their place. What I’m more suspicious of is things like Google offering distributed databases that they pretend as if they could break the CAP theorem.

Lucy :3@feddit.org · edit-2 2 months ago

And yet my Uni treats it like the biggest thing in existence. Meanwhile I’ve never used anything other than RDBS and Redis (only for cache), neither in private nor at work.

ℍ𝕂-𝟞𝟝@sopuli.xyz · 2 months ago

MongoDB is huge though for all the wrong reasons, businesses think that just because it’s JS, they can just have frontend devs - sorry, they are “fullstack” now - doing DBA work.

I worked as one of two NoSQL DBAs for a Fortune 50 finance company, and there is a ton of CV-driven development going on giving NoSQL a bad name. Most use cases don’t need NoSQL. And for those which do, NoSQL is almost always harder to implement than simple SQL based RDBMSs.

CallMeAnAI@lemmy.world · 2 months ago

Jumping in this, bingo. JavaScript only shops scare the fuck out of me.

GenderNeutralBro@lemmy.sdf.org · 2 months ago

Just wail til they become AI-generated-JavaScript-only shops. They’re gonna be vibing like the Tacoma Narrows Bridge.

noobface@lemmy.world · 2 months ago

why is my deploy process so slow? ©_©=> 500k npm packages

Aurenkin@sh.itjust.works · 2 months ago

Sharded RDBS gets you very very far from my experience at least.

ℍ𝕂-𝟞𝟝@sopuli.xyz · 2 months ago

Definitely, and I’m saying that while my jobs were mostly on NoSQL and I love doing it.

oce 🐆@jlai.lu · edit-2 2 months ago

If you need to run queries that aggregate big amounts of data in a reasonable time and cost, you’ll need something built for it. For example, with a column oriented file format instead of the row oriented file format found in traditional relational databases

Pennomi@lemmy.world · 2 months ago

And the key word “big” here is far bigger than most engineers need to deal with. Hell, most supposed “big data” problems I’ve seen people try to tackle are small enough to fit the whole database into memory.

ℍ𝕂-𝟞𝟝@sopuli.xyz · 2 months ago

My point is more that 90% of use cases don’t need that, and for those that do, you can’t just slap eg. Cassandra at it and pretend it’s a relational database.

Stizzah@lemmygrad.ml · 2 months ago

It always depends on the context… My current job is 100% on Elasticsearch and I’m not missing transactions at all.

Gonzako@lemmy.world · 2 months ago

What’s ACID?

ramjambamalam@lemmy.ca · edit-2 2 months ago

Atomicity: either all parts of the transaction complete, or all parts of the transaction don’t complete; there’s no “partly complete” state

Consistency: the state of the database after a transaction is stable; all “downstream” effects (e.g. triggers) of the query are complete before the transaction is confirmed.

Isolation: concurrent transactions behave the same as sequential transactions

Durability: a power failure or crash won’t lose any transactions

Traditionally, ACID is where relational databases shine.

SpaceCadet@sopuli.xyz · 2 months ago

ACID is really just an arbitrary set of requirements for databases that made sense way back in the day when things were much simpler. ACID starts to hold you back when you want to scale out, because to have consistency you have to wait for your transaction to percolate through all the nodes of your system, and it doesn’t allow for things like a replicating node to be temporarily offline or lagging behind. Turns out though that not everything needs to be strictly ACID. For example, there are many cases where it doesn’t matter that a reader node has stale data for a second or two.

The thing MongoDB does is that instead of being dogmatically ACID all the time it allows you to decide exactly how ACID your transactions and your reads need to be, through the writeConcern and readConcern parameters. If you want it to be completely ACID, you can, but it comes at a cost.

Traditionally, ACID is where relational databases shine.

Relational databases shine with ACID on single-node systems when they’re not trying to solve the scale-out problem that MongoDB is trying to solve, but when they are trying to do that, they actually do much worse.

For example: most RDBMS systems have some kind of replication system, where you can replicate your transactions to one or more backup nodes either for failover or to use as a read-only node.

Now if you consider that whole system, replicas included, as “the database”, none of them are ACID, and I don’t know of any RDMBS-es that has mechanisms to automatically recover from a crashed primary without data loss, or that can handle the “split brain” problem.

CosmicTurtle0@lemmy.dbzer0.com · 2 months ago

You’ve gotten good answers from other folks but I’ll provide a ELI5:

Basically a set of rules in the database to make sure that it is immediately consistent.

NoSQL databases offer eventual consistency in exchange for speed so they are generally not considered to be ACID compliant.

Most traditional databases (MySQL, postgresql, etc.) are.

There are a couple of emerging companies that try to tackle speed for traditional databases. CockroachDB offers a postgress-based database that scales more like NoSQL while still offering ACID transactions.

TiDB is a similar company but for MySQL.

mcv@lemmy.zip · 2 months ago

Not all NoSQL databases are the same. Neo4j is acid compliant, and lightning fast for complex relationships that relational databases struggle with.

qjkxbmwvz@startrek.website · 2 months ago

https://en.m.wikipedia.org/wiki/ACID

Atomicity (something happens in its entirety or not at all), consistency (database is always in a valid state — if the database has constraints, they will always be honored), isolation (transactions don’t step on each other), durability (complete transaction is complete even if there’s a power failure).

Not a database expert, my parenthetical explanations may need work.

SkaveRat@discuss.tchncs.de · 2 months ago

But it’s webscale

Fenrisulfir@lemmy.ca · 2 months ago

I still suggest piping data to /dev/null during meetings to see who gets the joke

Zorsith@lemmy.blahaj.zone · 2 months ago

Oh man, do i have the product for you

https://devnull-as-a-service.com/

perishthethought@piefed.social · 2 months ago

Haha, they have a careers page.

Zorsith@lemmy.blahaj.zone · edit-2 2 months ago

Licensing

Pick any. We license it this way then.

*The old Facebook ReactJS non-compete license

*University of Utah Public License

*Apple Public Source License v1.x

*AT&T Public License

*The JSON License

*The Occulus Rift License

*TrueCrypt License 3.0

NegativeNull@lemmy.world · edit-2 2 months ago

Wow, that video is 15 years old!

https://youtu.be/b2F-DItXtZs

Björn@swg-empire.de · 2 months ago

If that is what it takes to get these kick-ass benchmarks.

qaz@lemmy.world · 2 months ago

Amazing video

r00ty@kbin.life · 2 months ago

I never understood why people compare nosql to rdbms. They are entirely different systems with different use cases.

Where you neee data consistency and need to always get the same results to a query go with a structured rdbms. Where you need speed over all of that (and there are real use cases for this) then nosql is for you. Using both is of course a likely result too.

There’s of course a lot of other considerations. But they’re different tools for different situations.

psycotica0@lemmy.ca · 2 months ago

I think it’s because the early marketing and hype compared NoSQL to rdbms. At the beginning they were all “hey man, don’t schemas suck? Isn’t it a pain having to migrate your data? Sometimes you just wanna cram shit somewhere, go fast, break things, and your DBA is a jackass! MongoDB”

And people, at that time, were either like “what the fuck?” and continue to not trust it to this day, or “hell yeah brother!” and then put everything into Mongo and were surprised when it lost some data or got into a corrupted state, or at least were surprised the first time they thought “huh, I really wish there was some consistency to all this data…”

So yeah, I think MongoDB didn’t come into the scene as “I’m a new kinda thing that has niche uses” it came on as “hey pussy, why are you still using your dad’s DB. Are you afraid?” and people still carry that in their hearts

cub Gucci@lemmy.today · 2 months ago

This. Why would almost every web app use a 50 yo language designed for accountants to store like 10kB of data on a remote disk? If you’re not in the field, it should be astronomically confusing

expr@programming.dev · 2 months ago

Mongodb is not actually faster. Postgres still beats it in any benchmark that matters.

Nothing is ever actually schema-less. There is merely explicit and implicit schemas. If you don’t want to bother encoding the schema as proper columns and instead want the schema to remain implicitly encoded in JSON, Postgres’ jsonb columns do a better job of that than any NoSQL database does.

PitLoversNeedMeds@jlai.lu · 2 months ago

What about PostgreSQL?

r00ty@kbin.life · 2 months ago

I do use postgres but only as an rdb provider. I thought while it supports json data as a type, does it provide for all of the other advantages of nosql databases for their use case?

Ultimately I feel like the best solution is to have a single database provider that could do both fully. I’m not sure it’s really there yet. But halpt to be told I’m wrong. I’ve not really needed that myself for my projects.

jubilationtcornpone@sh.itjust.works · 2 months ago

The query speed isn’t quite there but I would say it’s close enough for a lot of purposes, especially with proper indexing. And JSON column fields are indexable. Two things I’ve used Postgres’ JSON functionality for are:

1.) Storing unstructured data. 2.) Storing structured data that would exceed the table column limit.

In both cases, I’ve typically needed to extract the relevant data from the JSON records to either be stored in another table of turned into a materialized view so live query performance on the JSON columns was not that important.

PitLoversNeedMeds@jlai.lu · 2 months ago

Fair enough! Thanks for the detailed write-up

CallMeAnAI@lemmy.world · 2 months ago

Because nerds like to be smarter and “more efficient”.

God forbid you fuck around when PHP though!

cub Gucci@lemmy.today · 2 months ago

Or does writes to S3 via LSM

bastion@feddit.nl · 2 months ago

This is kinda absolute BS at this point, though.

Mongo has acid transactions, and has for years now. Although this is only within the same database, there are plenty of dbms (including rdbms) that don’t support cross-database transactions.

Mongo also, since time immemorial, has had “write concern” to ensure that it’s written to disk (to the journal) before the transaction is completed.

clif@lemmy.world · 2 months ago

This post is very timely because I was just introducing some new people to Mongo earlier this week and led off with “Now you might still hear people say ‘mongo is trash, it’s not even ACID compliant!’ but those people are dumb… it’s had that for years and years and is just another DBMS at this point (but not relational)”

… the last part also answers the other reply to this post. Yes.

CanadaPlus@lemmy.sdf.org · 2 months ago

So is it just another database software at this point, then?

ℍ𝕂-𝟞𝟝@sopuli.xyz · 2 months ago

With a JS-based query language, yeah

bastion@feddit.nl · 2 months ago

Yes, ish. There are aspects of it that are really valuable, and fit some use cases extremely well. But, in some senses, yes. Like any DBMS, you’ve got to know it’s strengths and weaknesses. And if you do, there are definitely circumstances where you’d choose it over others. But not always.

Kuranashi@lemmy.world · 2 months ago

Let’s use JavaScript as the query language, nothing could ever go wrong with that…