I’m a tech interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That combined with not having a practical use leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databasing).
With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.
Can someone give me a technical explanation for how one would come to that conclusion? I’d love if you could pass technical documentation for that.
There can be duplicate SSNs due to name changes of an individual, that’s the easiest answer. In general, it’s common to just add a new record in cases where a person’s information changes so you can retain the old record(s) and thus have a history for a person (look up Slowly Changing Dimensions (SCD)). That’s how the SSA is able to figure out if a person changed their gender, they just look up that information using the same SSN and see if the gender in the new application is different from the old data.
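To make the "add a new record instead of overwriting" idea concrete, here is a minimal sketch of a Type 2 slowly changing dimension using Python's built-in sqlite3. The table name, columns, and the change_name helper are all made up for illustration and say nothing about how the SSA actually structures its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical history table: row_id is the unique key, ssn is NOT unique.
# A NULL valid_to marks the row that is currently in effect.
conn.execute("""
    CREATE TABLE person_history (
        row_id     INTEGER PRIMARY KEY,
        ssn        TEXT NOT NULL,
        full_name  TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT
    )
""")


def change_name(conn, ssn, new_name, change_date):
    """Close the current row for this SSN, then add a new current row."""
    conn.execute(
        "UPDATE person_history SET valid_to = ? "
        "WHERE ssn = ? AND valid_to IS NULL",
        (change_date, ssn),
    )
    conn.execute(
        "INSERT INTO person_history (ssn, full_name, valid_from) "
        "VALUES (?, ?, ?)",
        (ssn, new_name, change_date),
    )


# Fake SSN (area 000 is never issued) with one name change.
conn.execute(
    "INSERT INTO person_history (ssn, full_name, valid_from) VALUES (?, ?, ?)",
    ("000-00-0001", "Jane Smith", "1990-01-01"),
)
change_name(conn, "000-00-0001", "Jane Doe", "2015-06-01")

# Full history: two rows sharing one SSN.
history = conn.execute(
    "SELECT full_name, valid_to FROM person_history "
    "WHERE ssn = ? ORDER BY valid_from",
    ("000-00-0001",),
).fetchall()

# Current view: exactly one row per SSN.
current = conn.execute(
    "SELECT full_name FROM person_history WHERE ssn = ? AND valid_to IS NULL",
    ("000-00-0001",),
).fetchall()
```

The same SSN shows up twice in the table by design, yet a query for the current state still returns a single person.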
Another accusation Elon made was that payments are going to people missing SSNs. The best explanation I have for that is that various state departments have their own on-premise databases and their own structure and design that do not necessarily mirror the federal master database. There are likely some databases where the SSN field is set up to accept strings only, since in real life, your SSN on your card actually has dashes, and those dashes make the number into a string. If the SSN is stored as a string in a state database, then when it’s brought over to the federal database (assuming the federal db is using a number field instead of text), there can be some data loss, resulting in a NULL.
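Here is a rough sketch of how that kind of data loss could happen during an import. Both loader functions are hypothetical; this only illustrates the string-to-number mismatch described above and is not a claim about any real government pipeline.

```python
def to_numeric_ssn(raw):
    """Hypothetical strict loader: returns an int, or None (a NULL) on failure."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


def to_numeric_ssn_cleaned(raw):
    """Hypothetical loader that strips formatting before converting."""
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    return int(digits) if len(digits) == 9 else None


# The dashes make int() fail, so the strict loader produces a NULL.
strict = to_numeric_ssn("000-12-3456")

# Stripping the dashes first keeps the value, but the number has lost
# its leading zeros.
cleaned = to_numeric_ssn_cleaned("000-12-3456")
```

The person and the payment both still exist; only the identifier was lost in translation between the two formats.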
JFC: married individuals, or divorced and name change back, would be totally fucked. Just on the very surface is his fuckery.
Hypothetically you could have a separate “previous names” table where you keep the previous names and on the main table you only keep the current name. There are a lot of ways to design a db to not unnecessarily duplicate SSNs, but without knowing the implementation it’s hard to say how wrong Musk is. But it’s obvious he doesn’t know what he’s talking about because we know that due to human error SSN-s are not unique and you can’t enforce uniqueness on SSN-s without completely fucking up the system. Complaining about it the way he did indicates that he doesn’t really understand why things are the way they are.
Another accusation Elon made was that payments are going to people missing SSNs.
A much simpler answer is that not all Americans actually have an SSN. The Amish for example have religious objections towards insurance, so they were allowed to opt out from social security and therefore don’t get an SSN.
It’s true that some Americans don’t have Social Security numbers, but those Americans can’t collect Social Security benefits unless/until they get one.
My bad, I thought it was about payments in general (including other programs) but it says social security database. Sorry.
TL;DR de-duplication in that form refers to a technique where two different pieces of data in the file system are backed by one single piece of data on the drive, the intention being to optimize file storage size and minimize fragmentation.
You can imagine this would be very useful when taking backups for instance, we call this a “Copy on Write” approach, since generally it works by copying the existing file to a second reference point, where you can then add an edit on top of the original file, while retaining 100% of the original file size, and both copies of the file (it’s more complicated than this obviously, but you get the idea)
now just to be clear, if you did implement this into a DB, which you could do fairly trivially, this would change nothing about how the DB operates, it wouldn’t remove “duplicates” it would only coalesce duplicate data into one single tree to optimize disk usage. I have no clue what elon thinks it does.
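For anyone who wants to see what storage-level de-duplication looks like, here is a toy content-addressed block store. The DedupStore class and the 4-byte block size are invented for the example; real file systems are far more involved.

```python
import hashlib


class DedupStore:
    """Toy block store: identical blocks are kept on "disk" only once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes, stored once
        self.files = {}   # filename -> ordered list of block digests

    def write(self, name, data, block_size=4):
        digests = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Only store the block if we have not seen this content before.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Reassemble the file from its block references.
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
store.write("a.txt", b"AAAABBBBAAAA")
store.write("b.txt", b"AAAABBBBCCCC")

unique_blocks = len(store.blocks)  # six blocks written, fewer stored
```

Both files read back byte for byte. Nothing was deleted from either one; the store just avoids keeping the same bytes twice.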
The problem here, as a non programmer, is that i don’t understand why you would ever de-duplicate a database. Maybe there’s a reason to do it, but i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another, or what elon is implying here (remove “duplicate” entries, however that’s supposed to work)
Elon doesn’t know what “de-duplication” is, and i don’t know why you would ever want that in a DB, seems like a really good way to explode everything,
i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another
Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.
what elon is implying here (remove “duplicate” entries, however that’s supposed to work)
Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.
Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.
in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case. Maybe even use historical backups or CoW to retain that kind of data.
Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.
and naturally, he doesn’t know what the term “de-duplication” means. Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.
in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case.
… That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there
Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.
… I don’t think you understand how modern databases are designed
… That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there
u were talking about not keeping historical data, which is one of the proposed reasons you would have “duplicate” entries, i was just clarifying that.
… I don’t think you understand how modern databases are designed
it’s my understanding that when it comes to storing data that it shouldn’t be possible to have two independent stores of the exact same thing, in two separate places, you could have duplicate data entries, but that’s irrelevant to the discussion of de-duplication aside from data consolidation. Which i don’t imagine is an intended usecase for a DB. Considering that you literally already have one identical entry. Of course you could simply make it non identical, that goes without saying.
Also, we’re talking about the DB used for the social security database, not fucking tigerbeetle.
SSN being unique isn’t a dumb idea, it’s a very smart idea, but due to the US SSN format it’s impossible to do. Hence to implement the idea you need to change the SSN format so it is unique before then.
Also, elons remark is stupid as is. Im sure the row has a unique id, even if its just a rowid column.
Also, elons remark is stupid as is. Im sure the row has a unique id, even if its just a rowid column.
even then, i wonder if there’s some sort of “row hash function” that takes a hash of all the data in a single entry, and generates a universally unique hash of that entry, as a form of “global id”
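A "row hash" like that is easy to sketch. This is a hypothetical helper, not a feature of any particular database: it serializes the row in a canonical order and hashes the result.

```python
import hashlib
import json


def row_hash(row):
    """Fingerprint a row (a dict of column -> value) by its full content."""
    # sort_keys makes the result independent of column order.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = row_hash({"ssn": "000-00-0001", "name": "Jane Doe"})
b = row_hash({"name": "Jane Doe", "ssn": "000-00-0001"})  # same data, reordered
c = row_hash({"ssn": "000-00-0001", "name": "Jane Smith"})
```

One catch: two rows with identical content get the identical hash, so this works as a content fingerprint for spotting exact duplicates, not as a guaranteed-unique row id.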
I think a lot of comments here miss the mark, it’s not really just about stating the gov does not use SQL or speculation regarding keys.
Deduplication is generally part of a compression strategy and has nothing to do with SQL. If we’re being generous he may have been talking about normalization, but no one I have ever met has confused the two terms (they are distinctly different from an engineering perspective).
There are degrees of normalization too, so it may make total sense to normalize 3NF (third normal form) rather than say 6NF depending on the data.
This is it, relational databases are normalized under forms, deduplicate is usually a term used when talking about a concrete data set from data sources like a database, not the relational data model in the database itself.
That’s interesting. I didn’t know anything about normal forms, but a quick glance at G4G has some interesting information. I don’t have the time to go through their full article at the moment, but it’s been added to my to do list.
Link for the lazy: https://www.geeksforgeeks.org/types-of-normal-forms-in-dbms/
Because SQL is everywhere. If Musk knew what it was, he would know that the government absolutely does use it.
This explanation makes no sense in the context of OP’s question, given the order of comments…
Yeah, a better explanation is that Deduplicating Databases are an absolutely terrible idea for every use case, as it means deleting history from the database.
Because of course the government uses SQL. It’s as stupid as saying the government doesn’t use electricity or something equally stupid. The government is myriad agencies running myriad programs on myriad hardware with myriad people. My damned computers at home are using at least 2-3 SQL databases for some of the programs I run.
SQL is damn near everywhere where data sets are found.
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
Oh, well another user pointed out that SSN’s are not unique, I think they are recycled after death or something. In any case, I do know that when the SSN system was first created it was created by people who said this is NOT MEANT to be treated as unique identifiers for our populace, and if it were it would be more comprehensive than an unsecure string of numbers that anyone can get their hands on. But lo and behold, we never created a proper solution and we ended up using SSN’s for identity purposes. Poop.
I’m pretty sure there is a federal statute that says ONLY the SSA may collect or use SSNs, as to federal agencies. I argued it once when a federal agency court tried to tell me that it couldn’t process part of my client’s case without it. I didn’t care but my client was crotchety and would only even give me the last four.
Edit. It’s a regulation:
https://www.law.cornell.edu/cfr/text/28/802.23
An agency cannot require disclosure of an SSN for any right or benefit unless a specific federal statute requires it or the agency required the disclosure prior to 1975.
In my case the agency got back to me with some federal statute that didn’t say what they said it said, and eventually they had to admit they were wrong.
SSNs being duplicated would be entirely expected depending upon the table’s purpose. There are many forms of normalization in database tables.
I mean just think about this a little bit, if the purpose is transactions or something and each row has a SSN reference in it for some reason, you’d have a duplicate SSN per transaction row.
A tiny bit of learning SQL and you could easily see transactional totals grouped by SSN (using, get this, a group by clause). This shit is all 100% normal depending upon the normalization level of the schema. There are even – almost obviously – tradeoffs between fully normalizing data and being able to access it quickly. If I centralize the identities together and then always only put the reference id in a transactional table, every query that needs that information has to go join to it and the table can quickly become a dependency knot.
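A quick sketch of that point, using sqlite3 from the Python standard library. The payments table and its columns are made up; the only thing being shown is that a transaction table naturally repeats the SSN, and GROUP BY rolls it back up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical transaction table: one row per payment, so the SSN repeats.
conn.execute("""
    CREATE TABLE payments (
        payment_id   INTEGER PRIMARY KEY,
        ssn          TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO payments (ssn, amount_cents) VALUES (?, ?)",
    [
        ("000-00-0001", 150000),
        ("000-00-0001", 150000),
        ("000-00-0002", 90000),
    ],
)

# Totals per SSN: the "duplicates" collapse into one summary row each.
totals = conn.execute(
    "SELECT ssn, COUNT(*), SUM(amount_cents) "
    "FROM payments GROUP BY ssn ORDER BY ssn"
).fetchall()
```

Three rows, two distinct SSNs, and nothing wrong with the data.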
There was a “member” table for instance in an IBM WebSphere schema that used to cause all kinds of problems, because every single record was technically a “member” so everything in the whole system had to join to it to do anything useful.
had to join to it
I don’t think I get what this means. As you describe it, that reference id sounds comparable to a pointer, and so there should be a quick look up when you need to de-reference it, but that hardly seems like a “dependency knot”?
I feel like this is showing my own ignorance on the back end of databasing. Can you point me to references that explain this better?
I’m talking about a SQL join. It’s essentially combining two tables into one set of query results and there are a number of different ways to do it.
https://www.w3schools.com/sql/sql_join.asp
Some joins are fast and some can be slow. It depends on a variety of different factors. But making every query require multiple joins to produce anything of use is usually pretty disastrous in real-life scenarios. That’s why one of the basics of schema design is that you usually normalize to what’s called third normal form for transactional tables, but reporting schemas are often even less normalized because that allows you to quickly put together reporting queries that don’t immediately run the database into the ground.
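Here is what a basic join looks like in practice, again with sqlite3 and invented table names. The payments table stores only a reference id, so getting a name next to an amount means joining back to the people table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical normalized schema: identity lives in one table,
# transactions carry only a reference to it.
conn.execute(
    "CREATE TABLE people (person_id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE payments (
        payment_id   INTEGER PRIMARY KEY,
        person_id    INTEGER NOT NULL REFERENCES people(person_id),
        amount_cents INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO people (person_id, full_name) VALUES (?, ?)",
    [(1, "Jane Doe"), (2, "John Roe")],
)
conn.executemany(
    "INSERT INTO payments (person_id, amount_cents) VALUES (?, ?)",
    [(1, 100), (1, 250), (2, 75)],
)

# Every query that wants a name must join back to people.
rows = conn.execute(
    "SELECT p.full_name, pay.amount_cents "
    "FROM payments AS pay "
    "JOIN people AS p ON p.person_id = pay.person_id "
    "ORDER BY pay.payment_id"
).fetchall()
```

With one join this is cheap. The "dependency knot" shows up when nearly every table in the system has to go through the same central table to answer anything.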
DB normalization and normal forms are practically a known science, but practitioners (and sometimes DBAs) often have no clue that this stuff is relatively settled and sometimes even use a completely wrong normal form for what they are doing.
https://en.m.wikipedia.org/wiki/Database_normalization
In most software (setting aside well-written open source), the schema was put together by someone who didn’t even understand what normal form they were targeting or why they would target it. So the schema for one application will often be at varying forms of normalization, and schemas across different applications almost necessarily will have different normal forms within them even if they’re properly designed.
All that said, detecting, grouping, comparing, and removing duplicates is a basic function of SQL. It’s definitely not expected that, for instance, database tables would never contain a duplicate reference to a SSN. Leon is indeed demonstrating here that he’s a complete idiot when it comes to databases. (And he goes a step further by saying the government doesn’t use SQL when it obviously does somewhere. SQL databases are so ubiquitous that just about any modern software package contains one.)
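As a sketch of how routine that is, the snippet below finds repeated SSNs and then removes only the rows that are exact copies. The records table is hypothetical, and the delete relies on SQLite's implicit rowid column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (ssn TEXT, full_name TEXT)")
conn.executemany(
    "INSERT INTO records (ssn, full_name) VALUES (?, ?)",
    [
        ("000-00-0001", "Jane Smith"),
        ("000-00-0001", "Jane Smith"),  # exact copy of the row above
        ("000-00-0001", "Jane Doe"),    # same SSN, different name (history)
        ("000-00-0002", "John Roe"),
    ],
)

# Detect: which SSNs appear on more than one row?
repeated = conn.execute(
    "SELECT ssn, COUNT(*) FROM records GROUP BY ssn HAVING COUNT(*) > 1"
).fetchall()

# Remove only exact copies, keeping the earliest row of each (ssn, name) pair.
conn.execute(
    "DELETE FROM records WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM records GROUP BY ssn, full_name)"
)
remaining = conn.execute(
    "SELECT ssn, full_name FROM records ORDER BY rowid"
).fetchall()
```

Note that the SSN still repeats afterwards, because the name-change row is legitimate history and not a copy. Telling those two cases apart is the actual work.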
Aha Airforce one likely uses SQL
AF1 probably needs a database just for its in-flight menu.
It’s entirely possible that the database is pre SQL.
He didn’t say the SSN database isn’t SQL. He said the GOVERNMENT doesn’t use SQL.
If SSNs are used as a primary key (a unique identifier for a row of data) then they’d have to be duplicated to be able to merge data together.
However, even if they aren’t using the SSN as an identifier because it’s sensitive information, it’s not uncommon to repeat data, whether for speed/performance, for simplicity in table design, because it’s in a lookup table, or because you have disconnected tables.
Having a value repeated doesn’t tell you anything about fraud risk, efficiency, or really anything. Using it as the primary piece of evidence for a claim isn’t a strong argument.
This sounds like a reasonable argument.
Can you pass any resources with examples on when having duplicate values would be useful/best practices?
Sure, basically any time you have a many-to-many relationship you’ll have to repeat keys multiple times. Think students taking courses. You’d have a students table and a courses table, but the relationship is many students take many courses. So you’d want a third table for lookups where each row is [student_id, course_id].
This stackoverflow post has a similar example with authors and books - https://stackoverflow.com/questions/13970628/how-do-i-model-a-many-to-many-relation-in-sql-server#13970688
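The students and courses example can be sketched like this with sqlite3. Table and column names are just the obvious ones for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE courses (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# Lookup (junction) table: each row is one [student_id, course_id] pair.
# Each id may repeat; only the pair has to be unique.
conn.execute("""
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course_id  INTEGER NOT NULL REFERENCES courses(course_id),
        PRIMARY KEY (student_id, course_id)
    )
""")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO courses VALUES (?, ?)", [(10, "Databases"), (20, "Compilers")]
)
conn.executemany("INSERT INTO enrollments VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

# student_id 1 appears twice because that student takes two courses.
student_ids = [
    row[0]
    for row in conn.execute(
        "SELECT student_id FROM enrollments ORDER BY student_id, course_id"
    )
]

# Who is taking course 10?
roster = conn.execute(
    "SELECT s.name FROM enrollments AS e "
    "JOIN students AS s ON s.student_id = e.student_id "
    "WHERE e.course_id = 10 ORDER BY s.name"
).fetchall()
```

Swap student_id for an SSN and you have a table where repeated SSNs are exactly what a correct design produces.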
This is the answer… it seems few on lemmy have ever normalized a database. But they do know how to give answers!
Thanks, OP seemed more curious about the technical aspects than just the absurdity of the comment (since pretty much every business uses SQL) so hoped a more technical explanation might be appreciated.
If he doesn’t think the government uses sql after having his goons break into multiple government servers he is an idiot.
If he is lying to cover his ass for fucking up so many things (the more likely explanation) then saying “he never used sql” is basically a dig at how technically inept he really is despite bragging about being a tech bro.
I saw a comment about this in the last couple of days that was really interesting and educational. Unfortunately I can’t seem to find it again to link it, but the gist of it was that there would be two things wrong with using SSNs as primary keys in a SQL database:
- You should not use externally generated data as primary keys
- You should not use personally identifying data as primary keys
Using SSNs as keys would violate both.
I went looking for best practices regarding SQL primary keys and found this really interesting post and discussion on Stack Overflow:
https://stackoverflow.com/questions/337503/whats-the-best-practice-for-primary-keys-in-tables
My first thought was that people’s SSNs can and do change, and sometimes (rarely?) people may have more than one SSN. Like someone mentions in that link, human error would be another reason why you would not want to use external data and particularly SSNs as primary keys.
It may be bad practice to use SSN as a primary key, but that won’t deter thousands of companies from doing exactly that.
Oh, I hear you!
From what I’m seeing in other comments, it seems SSNs aren’t used as primary keys, but they are part of generating the primary key. I haven’t seen anyone directly say it, but it sounds like the primary key is a hash of SSN + DOB (I hope with more data to add entropy, because that’s still a tiny bit of data to build a rainbow table from).
Still, assuming we haven’t begun re-using SSNs, it seems concerning to me that a SSN is appearing multiple times in the database. It seems a safe assumption that the uniqueness of a SSN should make the resultant hash unique, so a SSN appearing as associated to multiple primary keys should be a concern, right?
Other comments have led me to believe the “duplicate SSNs” are probably appearing in “different fields” (e.g. a dead man’s SSN would appear directly associated to him, but also as a sort of “collecting payments from” entry in his living wife’s entry). That would be a misrepresentation of the facts (which we know Vice Bro, Elon Musk the Wise and Honest would never do). Occam’s Razor though has me leaning in that direction.
That all makes sense, except if someone’s SSN changes (which happens under certain circumstances), doesn’t that invalidate their primary key or require a much more complicated operation of issuing a new record and relinking all the existing relationships?
I can imagine an SSN existing in more than one primary key due to errors. If they use SSNs in the primary key at all, but combined with something else, that leads me to believe that the designers felt that SSNs were reliable for being a pure primary key.
I agree with you about Occam’s Razor. The guy has demonstrated multiple times that he’s a dishonest moron.
I’m not familiar with cases where someone’s SSN could change. Could you link to resources on when that would happen?
I don’t have any resources handy, but I do know someone who this happened to: they were an immigrant who got an SSN the first time they migrated to the US, went back to live in their country for a number of years, then returned to the US and I guess applied for an SSN again. Voilà, two SSNs and a mess.
Yeah, I can imagine that’d be an administrative headache. I do not envy them the opportunity of sorting that out.
Thanks for the example though. That makes sense.
I don’t envy either party either. You’re welcome!
That all makes sense, except if someone’s SSN changes (which happens under certain circumstances), doesn’t that invalidate their primary key or require a much more complicated operation of issuing a new record and relinking all the existing relationships?
Yes, in the case of duplicate SSN assignments for two people (rare) you would need to change their records to align with the new SSN while not changing the records that go to the person who keeps the SSN. We do it with state identifiers and it is a gigantic pain in the ass.
If two numbers are assigned to the same person merging them to one of the two is far easier.
I can definitely imagine all that. Thanks!
I think the thing that’s catching you up the most is that you’re assuming Elon has the slightest clue what he’s talking about. In your mind, you’ve read the words “the social security database” from his post and have made assumptions about what that means.
I’ve worked with databases for 20+ years, several of those being years working on federal government systems. Each agency has dozens or possibly hundreds of databases all used for different purposes. Saying “the social security database” is so fucking general that it’s basically nonsensical. It’d be like saying “Ford’s car database”.
Elon clearly heard someone technical talking about something, then misinterpreted it for his own purposes to justify what he is doing by destroying our government institutions. His follow up of saying the government doesn’t use SQL just reinforces that point.
Trying to logically backtrack into what he actually meant - and what the primary keys should be - is just sane washing an insane statement.
Dedup is about saving storage and has literally nothing to do with primary keys.
It’s a terminology thing really yes. I mean a database (SQL or not) shouldn’t need de-duplication by nature of how the record index/keys work.
If they’re not using a form of SQL though, I’d be very interested in what they are using. Back in the 90s I was messing around with things like Btrieve and other even more antiquated database engines. But all the software I used that utilised such things was converted to use a form of SQL (even if in some cases there were internal wrappers to allow access in the older way too via legacy code) over 20 years ago.
If I were an American though my biggest concern would be that Musk is able to know the structure AND content of the social security database. His post (if we believe it) demonstrates he must have access to both pieces of information.
His post (if we believe it) demonstrates he must have access to both pieces of information.
At best he is referring to an older mainframe he is aware of not being sql while being completely oblivious of all the government systems that are in sql.
Which isn’t giving him any credit, because in that case he is still running his mouth based on being ignorant about other government systems.
I submitted data to a government database yesterday that I know for a fact is sql because we have had an ongoing years long relationship that involves improving that system and aligning our state level sql database. The government absolutely uses sql frequently, even if they still have older mainframes with some other database architecture.
The government absolutely uses sql frequently, even if they still have older mainframes with some other database architecture.
This makes more sense. But even then they would surely transfer data from the old system over.
I mean I’m liking the idea that they went down into the basement, started up an old mini computer, with “superman 3” magnetic tapes with data from the 1980s to force them to try to integrate with that and only after transferring the data at 1000cps, find out it’s entirely out of date.
I mean, it won’t be the case, but I’d really like it to be. 😛
This makes more sense. But even then they would surely transfer data from the old system over.
All you gotta do is snap your fingers!
Moving data from system to system is a massive undertaking. It probably needs to be restructured, and decisions made during the process will be found to be imperfect and adjustments will need to be made along the way.
Then you have to change all the connections to other systems and recreate the existing reports, and by the way the changed structure impacts all of that, and you need to revisit why you have all this stuff, and why don’t we just leave it alone after all.
There is a reason that legacy systems stick around. I’m sure they have legacy mainframes with financial data. At my state office we have a financial mainframe we have been wanting to get rid of for over a decade and while we have peeled off what processes we can there is still a ton left to do. Nothing about it is easy compared to creating something new from scratch, in fact transitioning to a new system to replace an old system is probably ten times as much work. Not to mention you still have to use and maintain the old system the entire time!
He is saying the US government doesn’t use structured databases.
At least 90% of all databases have a structure.
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
As someone explained in another comment, you often duplicate information due to rules around cardinality to gain improvements in retrieval and structure. I would be pretty worried if SSNs were being used as a widespread primary key in any set of tables - those should generally be UUIDs that can be optimized for hashing while avoiding collisions.
Even if we are being generous to Elon, we could assume that social security payments are processed on mainframes given how many have to go out and the legacy nature of the program. Most mainframe shops I know have adapted an SQL interface for records in some capacity, but who knows what he is looking at.
Federal government IT is done on a per-agency basis. I would say Oracle Database is pretty much the most licensed piece of software the government uses outside of Red Hat Linux and Windows desktop.
Clearly the solution is to just use a big Excel spreadsheet.
In our company I’m friends with one of the lead devs. He once told me “no matter what way you look at it, excel is never the answer” lol I’m sure he was a bit biased, but I’ve seen my fair share of macro-ridden abominations over the years
Excel is accounting workbook software, it is not suitable for data storage. Although people certainly use it that way.
It makes a pretty good calculator. 🧮
It’s an amazing tool if only one person is updating / maintaining the file. The moment collaboration starts, you’re all fucked. I’m currently maintaining one that I inherited that is at least 10 years old and comes with a 50 page instruction manual on how to run it every month… that then gets posted to a shared drive where anyone can edit.
And then the rest of the month is spent explaining to the end users how they fucked it up this time.
On the flip side, I’ve also built sheets that could parse data between Nav, MySQL, and SQL ERP systems with tables of over 5 million rows each on a single button refresh that ran flawlessly for years… because I was the only maintainer and the sheets were locked from accepting changes from other users.
The statement “this [guy] thinks the government uses SQL” demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.
The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.
If he knew the domain, he would know this isn’t an issue. If he knew the technology he would be able to see the constraint and following investigation, reach the conclusion that it’s not an issue.
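To show what "duplicates are expected of just parts of the key" means, here is a small sqlite3 sketch. The SSN + date of birth key is the assumption made in this thread, not a confirmed detail of any SSA schema, and the table is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed composite key: the (ssn, dob) PAIR must be unique,
# but either column on its own may repeat.
conn.execute("""
    CREATE TABLE beneficiaries (
        ssn       TEXT NOT NULL,
        dob       TEXT NOT NULL,
        full_name TEXT NOT NULL,
        PRIMARY KEY (ssn, dob)
    )
""")
insert = "INSERT INTO beneficiaries (ssn, dob, full_name) VALUES (?, ?, ?)"
conn.execute(insert, ("000-00-0001", "1950-01-01", "Jane Doe"))

# Same SSN, different DOB: accepted, since the pair is new.
conn.execute(insert, ("000-00-0001", "1960-02-02", "John Roe"))

# Same SSN and same DOB: the database itself refuses the row.
try:
    conn.execute(insert, ("000-00-0001", "1950-01-01", "Jane Again"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

ssn_count = conn.execute(
    "SELECT COUNT(*) FROM beneficiaries WHERE ssn = ?", ("000-00-0001",)
).fetchone()[0]
```

Counting rows per SSN alone reports a "duplicate" here, even though the key constraint was never violated.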
The man continues to be a malignant moron
The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.
Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn’t supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.
Note: if duplicate SSNs are accidentally issued my understanding is that they issue a new one to one of the people and I don’t know how to find the start of the thread on twitter since I only use it when I accidentally click on a link to it.
https://www.ssa.gov/history/hfaq.html
Q20: Are Social Security numbers reused after a person dies?
A: No. We do not reassign a Social Security number (SSN) after the number holder’s death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.
My guess would be around your note. If someone mistakenly has two SSNs (due to fraud, error, or name changes), combining DOB helps detect inconsistencies.
Some other possibilities, and I’m just throwing out ideas at this point:
- Adding DOB could help with manual lookups and verification.
- Using SSN + DOB ensures a standard key format across agencies, making it easier to link records.
- Prevents accidental duplication if an SSN is mistyped.
- Maybe the databases were optimized for fixed-length fields, and combining SSN + DOB fit within memory constraints.
- It was easier to locate records with a “human-readable” key, whereas something like a UUID is harder for humans to read or sift through.
Take this with a grain of salt as I’m not a dev, but do work on CMS reporting for a health information tech company. Depending on how the database is designed an SSN could appear in multiple tables.
In my experience deduplication happens as part of generating a report so that all relevant data related to a key and scope of the report can be gathered from the various tables.
It is common for long lived databases with a rotating cast of devs to use different formats in different tables as well! One might have it as a string, one might have it as a number, and the other might have it with hyphens in the same database.
Hell, I work in a state agency and one of our older databases has a dozen tables with phone numbers.
- One has the whole thing as a long int: 2223334444
- One has the whole thing as a string: 2223334444 (which of course can’t be directly compared to the one that is a long int…)
- One has separate fields for area code and the rest with a hyphen: 222 and 333-4444
- One has the whole thing with parentheses, a space, and a hyphen as a string: (222) 333-4444
The main reason for the discrepancy is not looking at what was used before or not understanding that they can always change the formatting when displayed so they don’t need to include the parentheses or hyphens in the database itself.
Okay but if that happens, musk is right that that’s a bit of a denormalization issue that maybe needs resolving.
SSNs should be stored as strings without any hyphen or additional markup, nothing else.
- Storing as a number can cause issues if you ever wanna support leading zeros
- Any “styling” like hyphens should be handled by a consuming front-end system; you want only the important data in the DB to keep query times down
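A quick Python illustration of both bullets. The value is a made-up number with a leading zero, and `format_ssn` is a hypothetical helper, not anything from a real system.

```python
def format_ssn(digits):
    """Add hyphens for display only; storage stays as 9 plain digits."""
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


stored_as_text = "012345678"          # made-up value with a leading zero
stored_as_int = int(stored_as_text)   # the leading zero is lost here

# Getting it back only works if you already know the intended width.
restored = str(stored_as_int).zfill(9)
```

The integer version silently becomes an 8-digit number, which is why the string form is the safer thing to store.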
It’s more likely though it’s just a composite key…
This is not what he is actively doing though. He isn’t trying to improve databases.
He is tearing down entire departments and agencies and using shit like this to justify it.
Sure but my point is, if it was the scenario you described, then Elon would be talking about the right kind of denormalization problem.
Denormalization due to multiple different tables storing their own copies of the same data, in different formats worse yet, would actually be the kind of problem he’s tweeting about.
As opposed to a composite key on one table which means him being an ultracrepidarian, as usual.
Musk canceled the support for the long running Common Education Data Standards (CEDS) which is an initiative to promote better database standards and normalization for the states to address this kind of thing.
It does not fucking matter if he is technically correct about one tiny detail because he is only using it to destroy, not to improve efficiency.
The SSN is likely to appear in multiple tables, because they will reference a central table that ties it all together. This central table will likely only contain the SSN, the birth date (from what others have been saying), as well as potentially first and last name. In this table, the entries have to be unique.
But then you might have another table, like a table listing all the physical exams, which has the SSN to be able to link it to the person’s name, but ultimately just adds more information to this one person. It does not duplicate the SSN in a way that would be bad.
A given SSN appearing in multiple tables actually makes sense. To someone not familiar with SQL (i.e. at about my level of understanding), I could see that being misinterpreted as having multiple SSNs repeated “in the database”.
Of all the comments so far, I find yours the most compelling.
Theoretically, yeah, that’s one solution. The more reasonable thing to do would be to use the foreign key though. So, for example:
SSN_Table
ID | SSN | Other info
Other_Table
ID | SSN_ID | Other info
When you want to connect them to have both sets of info, it’d be the following:
SELECT * FROM SSN_Table JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
EDIT: Oh, just to clear up any confusion, the SSN_ID in this simple example is not the SSN itself. To access that in this example query, it’d be SSN_Table.SSN
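For anyone who wants to run the example above, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names follow the layout above; the `Info` column and all the values are placeholders I made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SSN_Table (
        ID  INTEGER PRIMARY KEY,
        SSN TEXT NOT NULL
    );
    CREATE TABLE Other_Table (
        ID     INTEGER PRIMARY KEY,
        SSN_ID INTEGER NOT NULL REFERENCES SSN_Table(ID),  -- foreign key
        Info   TEXT
    );
    INSERT INTO SSN_Table VALUES (1, '000000001'), (2, '000000002');
    INSERT INTO Other_Table VALUES
        (10, 1, 'exam A'),
        (11, 1, 'exam B'),
        (12, 2, 'exam C');
""")

# The same join as above, selecting the SSN through SSN_Table.SSN.
rows = conn.execute("""
    SELECT SSN_Table.SSN, Other_Table.Info
    FROM SSN_Table
    JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
    ORDER BY Other_Table.ID
""").fetchall()

# How many SSNs are actually stored, regardless of what the join shows.
stored_ssns = conn.execute("SELECT COUNT(*) FROM SSN_Table").fetchone()[0]
```

The first SSN appears twice in the join result even though it is stored exactly once, which is the kind of output that can look like “duplicates” to someone reading a query result without knowing the schema.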
This is true, but there are many instances where denormalization makes sense and is frequently used.
A common example is a table that is frequently read. Instead of going to the “central” table the data is denormalized for faster access. This is completely standard practice for every large system.
There’s nothing inherently wrong with it, but it can be easily misused. With SSN, I’d think the most stupid thing to do is to use it as the primary key. The second one would be to ignore the security risks that are ingrained in an SSN. The federal government, being as large as it is, I’m sure has instances of both; however, since Musky is using his posse of young, arrogant brogrammers, I’m positively certain they’re completely ignoring the security aspect.
Yeah, no one appreciates security.
I probably overused that saying to explain it: ‘if there’s no break-ins, why do we pay for security? Oh, there was a break-in - what do we even pay security for?’
To be a bit more generic here, when you’re at government scale you’re generally deep in trade-off territory. Time and space are frequently opposed values and you have to choose which one is most important, and consider the expenses of both.
E.g. caching is duplicating data to save time. Without it we’d have lower storage costs, but longer wait times and more network traffic.
Yeah, I work daily with a database with a very important non-ID field that is denormalized throughout most of the database. It’s not a common design pattern, but it is done from time to time.
Yeah, databases are complicated and make my head hurt. Glancing through resources from other comments, I’m realizing I know next to nothing about database optimization. Like, my gut reaction to your comment is that it seems like unnecessary overhead to have that data across two tables - but if one sub-dept didn’t need access to the raw SSN, but did need access to less personal data, I could see those stored in separate tables.
But anyway, you’re helping clear things up for me. I really appreciate the pseudo code level example.
It’s necessary to split it out into different tables if you have a one-to-many relationship. Let’s say you have a list of driver licenses the person has had over the years, for example. Then you’d need the second table. So something like this:
SSN_Table
ID | SSN | Other info
Driver_License_Table
ID | SSN_ID | Issue_Date | Expiry_Date | Other_Info
Then you could do something like pull up a person’s latest driver’s license, or list all the ones they had, or pull up the SSN associated with that license.
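Those three lookups can be sketched like this with Python’s sqlite3 module. The table layout follows the one above; the dates and IDs are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SSN_Table (
        ID  INTEGER PRIMARY KEY,
        SSN TEXT NOT NULL
    );
    CREATE TABLE Driver_License_Table (
        ID          INTEGER PRIMARY KEY,
        SSN_ID      INTEGER NOT NULL REFERENCES SSN_Table(ID),
        Issue_Date  TEXT,
        Expiry_Date TEXT
    );
    INSERT INTO SSN_Table VALUES (1, '000000001');
    -- One person, three licenses over the years (one-to-many).
    INSERT INTO Driver_License_Table VALUES
        (100, 1, '2010-03-01', '2015-03-01'),
        (101, 1, '2015-02-20', '2020-02-20'),
        (102, 1, '2020-02-10', '2025-02-10');
""")

# List every license the person has had.
all_licenses = conn.execute(
    "SELECT ID FROM Driver_License_Table WHERE SSN_ID = 1 ORDER BY Issue_Date"
).fetchall()

# Pull up only the latest one.
latest = conn.execute("""
    SELECT ID, Issue_Date
    FROM Driver_License_Table
    WHERE SSN_ID = 1
    ORDER BY Issue_Date DESC
    LIMIT 1
""").fetchone()

# Go the other way: find the SSN behind a given license.
ssn_for_license = conn.execute("""
    SELECT s.SSN
    FROM Driver_License_Table AS d
    JOIN SSN_Table AS s ON s.ID = d.SSN_ID
    WHERE d.ID = 101
""").fetchone()[0]
```

Sorting by `Issue_Date` works here because the dates are stored as ISO-style `YYYY-MM-DD` text, which sorts in chronological order.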
I think a likely scenario would be for name changes, such as taking your partner’s surname after marriage.
Beat me to asking this follow-up, though you linking additional resources is probably more effort than I would have done. Thanks for that!
“The government” is multiple agencies and departments. There is no single computer system, database, mainframe, or file store that the entire US government uses. There is no standard programming language used. There is no standard server configuration. Each agency is different. Each software project is different.
When someone says the government doesn’t use SQL, they don’t know what they are talking about. It could be referring to the fact that many government systems are ancient mainframe applications that store everything in VSAM. But it is patently false that the government doesn’t use SQL. I’ve been on a number of government contracts over the years, spanning multiple agencies. MsSQL was used in all but one.
Furthermore, some people share SSNs, they are not unique. It’s a common misconception that they are, but anyone working on a government software learns this pretty quickly. The fact that it seems to be a big shock goes to show that he doesn’t know what he is doing and neither do the people reporting to him.
Not only is he failing to understand the technology, he is failing to understand the underlying data he is looking at.
working on government software
FTFY
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the Vice Bro doesn’t understand how SQL works).
I’m not aware of any instance where two people share an SSN though. The Social Security Administration even goes as far as to say they don’t recycle the SSNs of dead people (it’s linked a couple times in other comments and Voyager doesn’t let me save drafts of comments, so I’ll make an edit to this comment with that link for you).
Can you point me to somewhere showing multiple people can share an SSN?
Edit: as promised: The Social Security FAQ page
My wife has a tax payment history under two different legal names which share a single SSN
Hmmm, well I can’t speak to how the actual databases are put together, so maybe they would have that as two separate unique primary keys with a duplicated SSN.
But it really seems like bad design if they put it together that way…
Worth noting is that “good” database design evolved over time (https://en.wikipedia.org/wiki/Database_normalization). If anything was set up pre-1970s, they wouldn’t have even had the conception of the normal forms used to cut down on data duplication. And even after they were defined, it would have been quite a while before the concepts trickled down from academia to the engineers actually setting up the databases in production.
On top of that, name to SSN is a many-to-many relationship - a single person can legally change their name, and may have to apply for a new SSN (e.g. in the case of identity theft). So even in a well normalized database, when you query the data in a “useful” form (e.g. results include name and SSN), it’s probably going to appear as if there are multiple people using the same SSN, as well as multiple SSNs assigned to the same person.
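Here is a small sketch of that many-to-many point using Python’s sqlite3 module. The three tables (`person_name`, `ssn`, `name_ssn`) and the people in them are invented for illustration; this is one textbook way to normalize it, not a claim about how any agency does it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_name (
        id        INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL
    );
    CREATE TABLE ssn (
        id  INTEGER PRIMARY KEY,
        ssn TEXT NOT NULL UNIQUE      -- each SSN is stored exactly once
    );
    -- Link table: which names have been used with which SSNs.
    CREATE TABLE name_ssn (
        name_id INTEGER NOT NULL REFERENCES person_name(id),
        ssn_id  INTEGER NOT NULL REFERENCES ssn(id),
        PRIMARY KEY (name_id, ssn_id)
    );
    INSERT INTO person_name VALUES (1, 'Jane Doe'), (2, 'Jane Smith');
    INSERT INTO ssn VALUES (1, '000000001'), (2, '000000002');
    -- Jane changed her name (two names, one SSN) and was later
    -- issued a replacement SSN under her new name.
    INSERT INTO name_ssn VALUES (1, 1), (2, 1), (2, 2);
""")

# The "useful" report form: one row per name/SSN pairing.
report = conn.execute("""
    SELECT n.full_name, s.ssn
    FROM name_ssn AS l
    JOIN person_name AS n ON n.id = l.name_id
    JOIN ssn AS s ON s.id = l.ssn_id
    ORDER BY s.ssn, n.full_name
""").fetchall()
```

Nothing is duplicated in storage, yet the report shows one SSN under two names and one name under two SSNs.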
This is from 15 years ago, so I don’t know how much has changed since then. But this sounds like the sort of thing they mean.
https://www.nbcnews.com/technolog/odds-someone-else-has-your-ssn-one-7-6c10406347
In responding to other comments, I’ve found similar things.
All the same, thanks for the resources!
I mean, I don’t know a ton about SQL, but one thing to keep in mind about SSNs is they were not originally meant to be used for identification. Because we have no form of national ID and places still needed a way to verify who you are, people just started using SSNs for that, since it’s something everyone has and there wasn’t really a better option. So now the government has been having to try and make them work for that and make them more secure. The better solution would be to make some form of national ID that is designed to be secure, but Republicans and people like Musk would probably call that government overreach or a way to spy on and track people.
Ugh, YES, I am so frustrated at the counter arguments for this that I constantly hear spouted by my (ultra-conservative) family.
I hope that notion re-enters the public consciousness as a part of this (not holding my breath tho)
I’d imagine the numbers of dead people eventually get cycled around to. 9 digits only gives you 999,999,999 people to go through, and we have over a third of that in existence right now.
Assuming the whole “duplicate SSN” thing isn’t just a complete fabrication, we have no idea what table he was even looking at! A table of transactions e.g. would have a huge number of duplicate SSNs.
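To show what that would look like, here is a sketch with a made-up `payment` table in Python’s sqlite3 module. The columns and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment (
        id           INTEGER PRIMARY KEY,
        ssn          TEXT NOT NULL,
        paid_on      TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    -- One row per payment, so a monthly recipient repeats every month.
    INSERT INTO payment (ssn, paid_on, amount_cents) VALUES
        ('000000001', '2024-01-03', 150000),
        ('000000001', '2024-02-03', 150000),
        ('000000001', '2024-03-03', 150000),
        ('000000002', '2024-01-03',  90000);
""")

# A naive "find duplicate SSNs" query flags every repeat recipient.
repeated = conn.execute("""
    SELECT ssn, COUNT(*)
    FROM payment
    GROUP BY ssn
    HAVING COUNT(*) > 1
""").fetchall()

# The number of distinct people paid is a different question entirely.
distinct_people = conn.execute(
    "SELECT COUNT(DISTINCT ssn) FROM payment"
).fetchone()[0]
```

In a table like this, repeated SSNs are the expected result and not a sign that anything is wrong.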
The fact that SSNs aren’t singular identifiers has been public knowledge for quite a while. ID Analytics has shown in over a decade of studies that some people have multiple SSNs attached to their name, while some SSNs (over five million) are used by three or more living individuals. If you search “ID analytics SSN” you’ll find loads of articles reporting on this dating back to 2010 and a bit before.
Was trying to find something from the SSA itself on the topic, but didn’t turn anything useful up on the quick.
Here is a link for the lazy on the topic: https://www.nbcnews.com/technolog/odds-someone-else-has-your-ssn-one-7-6C10406347
If there are timestamped records for things like name changes, then you’d get “duped” SSNs.
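A sketch of what such timestamped records could look like, using Python’s sqlite3 module. The `person_history` table, the `valid_from` / `valid_to` columns, and the data are assumptions for the example, not a known schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_history (
        id         INTEGER PRIMARY KEY,
        ssn        TEXT NOT NULL,
        full_name  TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT              -- NULL marks the current row
    );
    -- A name change adds a new row and closes out the old one.
    INSERT INTO person_history (ssn, full_name, valid_from, valid_to) VALUES
        ('000000001', 'Jane Doe',   '1990-05-01', '2012-08-15'),
        ('000000001', 'Jane Smith', '2012-08-15', NULL);
""")

# Counting rows per SSN makes it look like a duplicate...
rows_for_ssn = conn.execute(
    "SELECT COUNT(*) FROM person_history WHERE ssn = ?", ("000000001",)
).fetchone()[0]

# ...but filtering to the current row gives exactly one person.
current = conn.execute(
    "SELECT full_name FROM person_history WHERE ssn = ? AND valid_to IS NULL",
    ("000000001",),
).fetchall()
```

The old row is kept on purpose so the history is not lost, which is why the SSN appears twice.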
Might seem like a stupid question, but I’m in nostupidquestions sooo… Did Elon really do this tweet with the word “retard” in it? Obviously am on Lemmy so don’t use Twitter.
Yep, just another example of what a trash human being he is.
Wow, that’s fucked.