When running
rsync -Paz /home/sbird "/run/media/sbird/My Passport/sbird"
as someone suggested, I ran into an out-of-storage error midway. Why is this? My disk usage is about 385 GiB for my home folder, and there is at least 800 GiB of free space on the external SSD (which already has stuff like photos and documents on it). Does rsync make duplicate copies of everything or something? That would be kind of silly. Or is it some other issue?
Note that the SSD is from a reputable brand (Western Digital) so it is unlikely that it is reporting a fake amount of storage.
Simple: exFAT does not support symbolic links. So every file that’s just a symbolic link on your btrfs filesystem is getting copied in full (the link is being resolved) to your exFAT drive.
Solution: don’t use exFAT. For backups from btrfs, I recommend using btrfs with compression enabled.
Also don’t forget to rebalance your btrfs partitions regularly to reclaim lost space! Also, delete old snapshots!
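If you want to gauge how much symlink resolution is going on before deciding anything, here is a minimal sketch (assuming GNU find and the paths from the thread):

```
# list every symlink under the home directory together with its target
find /home/sbird -type l -exec ls -l {} + 2>/dev/null

# or just count them, to get a feel for how many there are
find /home/sbird -type l | wc -l
```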
That makes a lot of sense. I can’t reformat the external SSD though, since it has a bunch of other files and needs to be used by my family (who are mostly Windows users)
The simplest explanation for the size difference could be if you have a symlink in your home folder pointing outside it. Idk if rsync traverses symlinks and filesystems by default, i.e. goes into linked folders instead of just copying the link, but you might want to check that. Note also that exFAT doesn’t support symlinks, dunno what rsync does in that case.
It would be useful to run `ls -R > file.txt` in both the source and target directories and diff the files to see if the directory structure changed. (The `-l` option would report many changes, since exFAT doesn’t support Unix permissions either.)
As others mentioned, if you have hard links in the source, they could be copied multiple times to the target, particularly since exFAT, again, doesn’t have hard links. But the primary source of hard links in normal usage would probably be git, which employs them to compact its structures, and I doubt that you have >300 GB of git repositories.
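Spelled out, that comparison could look like this (a sketch; the paths are taken from the thread):

```
# snapshot both directory trees, then compare them
(cd /home/sbird && ls -R) > /tmp/source-tree.txt
(cd "/run/media/sbird/My Passport/sbird" && ls -R) > /tmp/target-tree.txt
diff /tmp/source-tree.txt /tmp/target-tree.txt | less
```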
A second possibility is the deduplication feature of btrfs. If you made copies of files on your SSD, the copies only take up extra space once you change something. That’s how I keep 5 differently modded Cyberpunk 2077 installations on my drive while only using a fraction of the space that would otherwise be needed. I wouldn’t be able to copy this drive 1:1 onto a different filesystem.
Ah, I knew the mention of btrfs heebied my jeebies a little, but forgot about the CoW thing.
> Idk if rsync traverses symlinks and filesystems by default,
From the man page:
> Beginning with rsync 3.0.0, rsync always sends these implied directories as real directories in the file list, even if a path element is really a symlink on the sending side. This prevents some really unexpected behaviors when copying the full path of a file that you didn’t realize had a symlink in its path.
That means, if you’re transferring the file `~/foo/bar/file.txt`, where `~/foo/bar/` is a symlink to `~/foo/baz`, the `baz` directory will essentially be duplicated and end up as the real directories `/SSD/foo/bar` and `/SSD/foo/baz`.
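You can reproduce the behaviour in a scratch directory (a sketch; `--relative` is used here to make rsync send the implied path elements):

```
# set up a path that passes through a symlink (hypothetical temp paths)
mkdir -p /tmp/src/foo/baz /tmp/dst
echo hello > /tmp/src/foo/baz/file.txt
ln -s baz /tmp/src/foo/bar

# --relative (-R) transfers the implied directories; the /./ marks
# where the preserved part of the path begins
rsync -aR /tmp/src/./foo/bar/file.txt /tmp/dst/

ls -l /tmp/dst/foo   # bar shows up as a real directory, not a symlink
```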
Yeah, that would do it. If OP has such symlinks, they probably need to add an exception for rsync.
rsync does not delete files at the target by default; it will have kept all of the previously copied files even after they were deleted from the original source location.
You must specify `--delete` for it to also delete files at the target location when they are deleted at the source.
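For example, a mirroring run on OP’s paths might look like this (a sketch; since `--delete` removes anything at the target that is gone from the source, dry-run it first):

```
# preview what would be copied and deleted before doing it for real
rsync -naP --delete /home/sbird/ "/run/media/sbird/My Passport/sbird/"

# then run it again without -n
rsync -aP --delete /home/sbird/ "/run/media/sbird/My Passport/sbird/"
```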
The directory “sbird” in the SSD did not exist beforehand though?
Are you saying this is your first run?
Run `ncdu /run/media/sbird` to find out why there’s no space on it.
Let’s back up and check your assumptions: How did you check that the disk usage of your home folder is 385GiB and that there are 780GiB of free disk space on your external drive?
Using `du -sh --apparent-size /home/sbird` returns 382G, and the external SSD shows 129G (as expected)
Is your external SSD partitioned into multiple drives? Are you saying that your SSD is reported to be 129GB by the OS?
129G used (as in the files that were there before rsync)
Checking “properties” using Dolphin. Could that be incorrect?
I’d say you can trust that.
BTRFS supports compression and deduplication, so the actual disk space used might be less than the total size of your home directory. I’d run `du -sh --apparent-size /home/sbird` to check how large your home dir actually is. If it’s larger than 780 GiB, there’s your problem. Otherwise there might be hard links which rsync is copying multiple times; add the `-H` flag to copy hard links as hard links.
383G for /home/sbird (definitely not more than 780G), so that is strange. Using -H doesn’t work since the external SSD is exFAT (which, from a quick search, doesn’t support hard links)
You can rerun the `du` command with `--count-links` to count hard links multiple times. If that shows >780GiB you have a lot of hard links somewhere, which you can narrow down by rerunning the command on each of the subdirectories in your home directory.
Your options would be to delete the hard links to decrease your total file size, exclude them from the rsync with `--exclude`, or repartition your SSD to a filesystem that supports hard links.
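A quick way to narrow it down per subdirectory (a sketch, assuming GNU du):

```
# compare totals with hard links counted once (default) vs. every time
for d in /home/sbird/*/; do
    echo "== $d"
    du -sh "$d"
    du -sh --count-links "$d"
done
```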
With `--count-links`, it is just 384G, so that is probably not the issue?
There might be a possibility that recursion is happening and a directory is looping into itself and filling up your storage.
I have some suggestions for your command to help make a more consistent experience with rsync.
1: `--dry-run` (`-n`) is great for troubleshooting issues. It performs a fake transfer so you can sort out issues before moving any data. Remove this option when you are confident about making changes.

2: `--verbose --human-readable` (`-vh`) will give you visual feedback so you can see what is happening. Combine this with `--dry-run` so you get a full picture of what rsync will attempt to do before any changes are made.

3: `--compress` (`-z`) might not be suitable for this specific job; as I understand it, it’s meant to compress data during a file transfer over a network. In your command’s current state, it’s just adding extra processing, which might not be useful for a directly connected device.

4: If you are transferring directories/folders, I found more consistent behaviour from rsync by adding a trailing slash at the end of a path. For example, use “/home/username/folder_name/” and not “/home/username/folder_name”. I’ve run into recursion issues by not using a trailing slash. Don’t use a trailing slash if you are transferring a single file. That distinction helps me to understand what I’m transferring, too.

5: `--delete` will make sure your source folder and destination folder are a 1:1 match: any files deleted in the source folder will be deleted in the destination folder. If you want to keep any and all added files in your destination folder, this option can be ignored.

`--archive` (`-a`) and `--partial --progress` (`-P`) are both good and don’t need to be changed or removed. If you do happen to be running into a recursion issue that’s filling up your storage, you may need to look into using the `--exclude` option to exclude the problem folder.
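Putting those suggestions together, a first troubleshooting pass might look like this (a sketch built from OP’s original command; drop `-n` once the output looks right):

```
# -n dry-run, -a archive, -v verbose, -h human-readable, -P partial+progress
# -z (compression) dropped, trailing slashes added
rsync -navhP /home/sbird/ "/run/media/sbird/My Passport/sbird/"
```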
How do I find which folder is causing problems? When using --verbose and --dry-run, it goes way too fast and the terminal can’t show all of the history
You can store the output of rsync in a file by using `rsync ALL_THE_OPTIONS_YOU_USED > rsync-output.txt`. This creates a file called rsync-output.txt in your current directory which you can inspect later. This, however, means that you won’t see the output right away. You can also use `rsync ALL_THE_OPTIONS_YOU_USED | tee rsync-output.txt`, which will both create the file and display the output on your terminal while it is being produced.
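Combined with the dry-run advice above, the capture step could look like this (a sketch):

```
# preview the transfer and keep a searchable log of everything rsync would do
rsync -navhP /home/sbird/ "/run/media/sbird/My Passport/sbird/" | tee rsync-output.txt
```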
Having a quick scroll of the output file (neat tip with the `>` to get a text file, thanks!), nothing immediately jumps out at me. There aren’t any repeated folders or anything like that at a glance. Anything I should look out for?
You checked 385GiB of files by hand? Is that size made up of a few humongously large files?
I suggest using `uniq` to check if you have duplicate files in there (uniq’s input must be sorted first). If you still have the output file from the previous step, and it’s called rsync-output.txt, do `sort rsync-output.txt | uniq -dc`. This will print the duplicates and the number of their occurrences.
When using uniq nothing is printed (I’m assuming that means no duplicates?)
I’m sorry. I was stupid. If you had duplicates due to a file system loop or symlinks, they would all be under different names. So you wouldn’t be able to find them with this method.
Running the du command with --count-links, as suggested by another user, returns 384G (so that isn’t the problem, it seems)
Ok then, that makes sense
If you don’t spot any recursion issues, I’d suggest looking for other causes and not spend too much time here. At least now you have some troubleshooting knowledge going forward. Best of luck figuring out the issue.
Does your terminal have a scroll back limit? You may need to change that setting if there is a limit.
That will depend on which terminal you are using and it may have a different name so I can’t really help more with this specific issue. You’ll have to search that up based on the terminal you are using.
Personally, I have no more tips than those that have already been presented in this comment section. What I would do now to find out what’s going on is the age-old divide-and-conquer debugging technique:
Using rsync or a file manager (yours is Dolphin), only copy a few top-level directories at a time to your external drive. Note the directories you are about to move before each transfer. After each transfer check if the sizes of the directories on your internal drive (roughly) match those on your external drive (They will probably differ a little bit). You can also use your file manager for that.
If all went fine for the first batch, proceed to the next until you find one where the sizes differ significantly. Then delete that offending batch from the external drive. Divide the offending batch into smaller batches (select fewer directories if you tried transferring multiple; or descend into a single directory and copy its subdirectories piecewise like you did before).
In the end you should have a single directory or file which you have identified as problematic. That can then be investigated further.
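If doing it by hand gets tedious, here is a rough sketch that compares the top-level directories on both drives side by side (paths assumed from the thread):

```
SRC=/home/sbird
DST="/run/media/sbird/My Passport/sbird"

# print source and target size for each top-level directory
for d in "$SRC"/*/; do
    name=$(basename "$d")
    printf '%-30s src: %-8s dst: %s\n' "$name" \
        "$(du -sh "$d" 2>/dev/null | cut -f1)" \
        "$(du -sh "$DST/$name" 2>/dev/null | cut -f1)"
done
```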
YIKES, I found that .local is around 30GB on my system ssd, over 50GB in the external SSD. Much of that is due to Steam and Kdenlive. I can try uninstalling Steam…
Something interesting that I found: according to Dolphin, many folders have many GB extra (e.g. 52GB vs 66GB for the documents folder, which is kind of crazy), while Filelight records 52GB vs 112GB for the documents folder, which, if true, is kind of insane. Using `du -sh` records 53G vs 136G (they’re the same when using --apparent-size, weird). Specifically for the Godot directory, it’s 3.8GB vs 41GB!!! Files like videos and games seem to be about the same size, while Godot projects with git are much bigger. Weird.
These differences really are insane. Maybe someone more knowledgeable can comment on why different tools differ so wildly in the total size they report.
I have never used BTRFS, so I must resort to forwarding googled results like this one.
Could you try `compsize ~`? If the `Perc` column is much lower than 100% or the `Disk Usage` column is much lower than the `Uncompressed` column, then you have some BTRFS-specific file-size reduction on your hands, which your external exFAT naturally can’t replicate.
Percentage of total is 83% (292G vs uncompressed 349G apparently)
It’s good you found some pathological examples, but I’m at the end of my rope here.
You can use these examples and the other information you gathered so far and ask specifically how these size discrepancies can be explained and maybe mitigated. I suggest more specialized communities for this such as !linux@lemmy.ml, !linux@programming.dev, !linux@lemmy.world, !linux4noobs@programming.dev, !linux4noobs@lemmy.world, !linuxquestions@lemmy.zip.
I have cross posted to a Linux community. Thank you so much for all your help :DDDD
I’m assuming that Filelight counts file size differently, and I will be trusting the result from Dolphin more
using -H with rsync doesn’t seem to do anything unfortunately…
With Dolphin, the Godot directory is 1GB vs 5GB. Why is there a difference between Filelight, Dolphin, and du -sh? So weird
it looks like much of the extra Godot bulk is in .git and .Godot directories
Something about never knowing the time when one has two clocks
Oh that’s actually a good idea. Thanks person! I will report back soon
Maybe you have hard links or sparse files in your source directory. Try with -H for hard links first. You can try with --sparse but I think hard links are more likely.
Using -H throws an error, as hard links aren’t supported in exFAT it seems.
By the way, do you have lots of torrents downloaded or large virtual machines installed? Both torrent clients and virtual machine managers use ‘sparse files’ to save space until you actually download the whole torrent or write a lot to the VM’s disk. Those files would be copied at full size to exFAT.
If you have folders with such content, you can use e.g. Double Commander to check the actual used size of those folders (with ctrl-L in Doublecmd). Idk which terminal utils might give you those numbers in place, but the aforementioned `ncdu` can calculate them and present them as a tree.
Using du -hsc returns 384G with /home/sbird, and 150G inside the external SSD (when it does not have any of the files transferred with rsync)
Well, that’s not what I meant. If you have directories with torrents or VMs, `du` might report different sizes for those directories on the source and target disks. That would’ve meant that those are the culprits.
With just the source disk, you can compare `du -hsc dirname` versus `du -hsc --apparent-size dirname` to check if the disk space used is much smaller than the ‘apparent size’, which would mean there are sparse files in the directory, i.e. files not fully written to disk. rsync would copy those files at their full ‘apparent size’.
As mentioned elsewhere, btrfs might also save space on the source disk by not writing duplicate files multiple times, but idk if `du` would report that, since it’s specific to btrfs internals.
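If GNU find is available, it can also print a per-file ‘sparseness’ ratio (allocated size relative to apparent size); values well below 1.0 mean the file has holes that rsync would fill in at the target. A sketch:

```
# show the ten sparsest files under the home directory
find /home/sbird -type f -printf '%S\t%s bytes\t%p\n' 2>/dev/null \
    | sort -n | head -n 10
```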
For a typical user, hard links would be mostly employed by git for its internal structures, and it’s difficult to accumulate over 300 GB of git repos.
Sparse files would actually be more believable, since they’re used by both torrent clients and virtual machines.
Could it be you have lots of tiny files and/or a rather large-ish block size on your SSD?
You can check the block size with `sudo blockdev --getbsz /dev/$THE_DEVICE`.
Using the command returns 512 for the external SSD and 4096 for the SSD in my laptop. What does that mean?
> What does that mean?
Imagine your hard drive like a giant cupboard of drawers. Each drawer can only have one label, so you must only ever store one “thing” in one drawer, otherwise you wouldn’t be able to label the thing accurately and end up not knowing what went where.
If you have giant drawers (a large block size), but only tiny things (small files) to store, you end up wasting a lot of space in the drawer. It could fit a desktop computer, but you’re only putting in a phone. This problem is called “internal fragmentation” and causes files to take up way more space than it would seem they need.
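You can watch that rounding happen with stat (a sketch; GNU stat assumed):

```
# a one-byte file still occupies a whole allocation unit
echo -n x > /tmp/tiny
stat -c 'apparent: %s byte(s), allocated: %b blocks of %B bytes' /tmp/tiny
# on a 4096-byte-block filesystem this typically shows 8 x 512-byte blocks = 4 KiB
```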
–––––
However, in your case, the target block size is actually smaller, so this is not the issue you’re facing.