When running

rsync -Paz /home/sbird "/run/media/sbird/My Passport/sbird"

as suggested by someone, I ran into an out-of-storage error midway through. Why is this? My disk usage is about 385 GiB for my home folder, and there is at least 800 GiB of space on the external SSD (which already has stuff like photos and documents). Does rsync make duplicate copies of everything or something? That would be kind of silly. Or is it some other issue?

Note that the SSD is from a reputable brand (Western Digital) so it is unlikely that it is reporting a fake amount of storage.

  • Riskable@programming.dev · 4 hours ago

    Simple: exFAT does not support symbolic links. So every file that’s just a symbolic link on your btrfs filesystem is getting copied in full (the link is being resolved) to your exFAT drive.

    Solution: don’t use exFAT. For backups from btrfs, I recommend using btrfs with compression enabled.

    Also don’t forget to rebalance your btrfs partitions regularly to reclaim lost space! Also, delete old snapshots!

    • sbird@sopuli.xyz (OP) · 4 hours ago

      That makes a lot of sense. I can’t reformat the external SSD though, since it has a bunch of other files and needs to be used by my family (who are mostly Windows users)

  • [object Object]@lemmy.world · 7 hours ago

    The simplest explanation for the size difference could be if you have a symlink in your home folder pointing outside it. Idk if rsync traverses symlinks and filesystems by default, i.e. goes into linked folders instead of just copying the link, but you might want to check that. Note also that exFAT doesn’t support symlinks, dunno what rsync does in that case.
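
    For reference, these are the rsync options that control symlink and filesystem traversal (a rough sketch with placeholder paths; -n keeps it a dry run):

        rsync -nav /home/sbird/ /tmp/test/    # -a implies -l: symlinks are copied as symlinks
        rsync -navL /home/sbird/ /tmp/test/   # -L/--copy-links resolves links into the files they point to
        rsync -navx /home/sbird/ /tmp/test/   # -x/--one-file-system does not cross filesystem boundaries

    With -a alone, symlinks stay symlinks; it’s -L that turns them into full copies.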

    It would be useful to run ls -R >file.txt in both the source and target directories and diff the files to see if the directory structure changed. (The -l option would report many changes, since exFAT doesn’t support Unix permissions either.)
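
    For example (a rough sketch; adjust the target path to wherever the copy ended up):

        (cd /home/sbird && ls -R > /tmp/source-listing.txt)
        (cd "/run/media/sbird/My Passport/sbird" && ls -R > /tmp/target-listing.txt)
        diff /tmp/source-listing.txt /tmp/target-listing.txt | less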

    As others mentioned, if you have hardlinks in the source, they could be copied multiple times to the target, particularly since exFAT, again, doesn’t have hardlinks. But the primary source of hardlinks in normal usage would probably be git, which employs them to compact its structures, and I doubt that you have >300 GB of git repositories.

    • Wildmimic@anarchist.nexus · 6 hours ago

      A second possibility is the deduplication feature of BTRFS. If you made copies of files on your SSD, the copies only take up extra space once you change something; that’s how I keep 5 differently modded Cyberpunk 2077 installations on my drive while using only a fraction of the space that would otherwise be needed. I wouldn’t be able to copy this drive 1:1 onto a different filesystem.

    • bleistift2@sopuli.xyz · 6 hours ago

      Idk if rsync traverses symlinks and filesystems by default,

      From the man page:

      Beginning with rsync 3.0.0, rsync always sends these implied directories as real directories in the file list, even if a path element is really a symlink on the sending side. This prevents some really unexpected behaviors when copying the full path of a file that you didn’t realize had a symlink in its path.

      That means, if you’re transferring the file ~/foo/bar/file.txt, where ~/foo/bar/ is a symlink to ~/foo/baz, the baz directory will essentially be duplicated and end up as the real directories /SSD/foo/bar and /SSD/foo/baz.

  • drkt@scribe.disroot.org · 10 hours ago

    rsync does not delete files at the target by default; it keeps files at the target even after they have been deleted from the source.

    You must specify --delete for it to also delete files at the target location when they are deleted at the source.

    • sbird@sopuli.xyz (OP) · 9 hours ago

      The directory “sbird” in the SSD did not exist beforehand though?

      • drkt@scribe.disroot.org · 8 hours ago

        Are you saying this is your first run?

        Run ‘ncdu /run/media/sbird’ to find out why there’s no space on it.

  • bleistift2@sopuli.xyz · 7 hours ago

    Let’s back up and check your assumptions: How did you check that the disk usage of your home folder is 385GiB and that there are 780GiB of free disk space on your external drive?

  • degenerate_neutron_matter@fedia.io · 7 hours ago

    BTRFS supports compression and deduplication, so the actual disk space used might be less than the total size of your home directory. I’d run du -sh --apparent-size /home/sbird to check how large your home dir actually is. If it’s larger than 780 GiB, there’s your problem. Otherwise there might be hardlinks which rsync is copying multiple times; add the -H flag to copy hardlinks as hardlinks.
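
    A quick sketch of that comparison (only the path from your command is assumed):

        du -sh /home/sbird                   # space actually allocated on disk
        du -sh --apparent-size /home/sbird   # logical file sizes, roughly what has to land on the exFAT drive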

    • sbird@sopuli.xyz (OP) · 6 hours ago

      383G for /home/sbird (definitely not more than 780G) so that is strange. Using -H doesn’t work since the external SSD is exFAT (which from a quick search doesn’t support symlinks)

      • degenerate_neutron_matter@fedia.io · 6 hours ago

        You can rerun the du command with --count-links to count hardlinks multiple times. If that shows >780GiB, you have a lot of hardlinks somewhere, which you can narrow down by rerunning the command on each of the subdirectories in your home directory (see the sketch below).

        Your options would be to delete the hardlinks to decrease your total file size, exclude them from the rsync with --exclude, or repartition your SSD to a filesystem that supports hardlinks.
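
        For example, something like this runs it per subdirectory and sorts by size (a sketch; the second glob catches hidden directories such as .local, adjust as needed):

            du -sh --count-links /home/sbird/*/ /home/sbird/.[!.]*/ | sort -h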

        • sbird@sopuli.xyz (OP) · 6 hours ago

          With --count-links, it is just 384G so that is probably not the issue?

  • confusedpuppy@lemmy.dbzer0.com · 7 hours ago

    There might be a possibility that recursion is happening and a directory is looping into itself and filling up your storage.

    I have some suggestions for your command to help make a more consistent experience with rsync.

    1: --dry-run (-n) is great for troubleshooting issues. It performs a fake transfer so you can sort out issues before moving any data. Remove this option when you are confident about making changes.

    2: --verbose --human-readable (-vh) will give you visual feedback so you can see what is happening. Combine this with --dry-run so you get a full picture of what rsync will attempt to do before any changes are made.

    3: --compress (-z) might not be suitable for this specific job; as I understand it, it’s meant to compress data during transfers over a network. In your command’s current state, it just adds extra processing overhead, which isn’t useful for a locally connected device.

    4: If you are transferring directories/folders, I found more consistent behaviour from rsync by adding a trailing slash at the end of a path. For example use “/home/username/folder_name/” and not “/home/username/folder_name”. I’ve run into recursion issues by not using a trailing slash.

    Don’t use a trailing slash if you are transferring a single file. That distinction helps me to understand what I’m transferring too.

    5: --delete will make sure your source folder and destination folder are a 1:1 match. Any files deleted in the source folder will be deleted in the destination folder. If you want to keep any and all added files in your destination folder, this option can be ignored.

    --archive (-a) and --partial --progress (-P) are both good and don’t need to be changed or removed. Putting it all together, a dry run could look something like the example below.
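
    A sketch combining the suggestions above (drop -n once the dry-run output looks right, and keep --delete only if you want a 1:1 mirror):

        rsync -navhP --delete "/home/sbird/" "/run/media/sbird/My Passport/sbird/"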

    If you do happen to be running into a recursion issue that’s filling up your storage, you may need to look into using the --exclude option to exclude the problem folder.

    • sbird@sopuli.xyz (OP) · 7 hours ago

      How do I find which folder is causing problems? When using --verbose and --dry-run, it goes way too fast and the terminal doesn’t keep all of the output in its history

      • bleistift2@sopuli.xyz · 7 hours ago

        You can store the output of rsync in a file by using rsync ALL_THE_OPTIONS_YOU_USED > rsync-output.txt. This creates a file called rsync-output.txt in your current directory which you can inspect later.

        This, however, means that you won’t see the output right away. You can also use rsync ALL_THE_OPTIONS_YOU_USED | tee rsync-output.txt, which will both create the file and display the output on your terminal while it is being produced.
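
        With the command from your post, that would look roughly like this (a sketch; -n keeps it a dry run while you investigate):

            rsync -Pazvn /home/sbird "/run/media/sbird/My Passport/sbird" | tee rsync-output.txt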

        • sbird@sopuli.xyz (OP) · 7 hours ago

          Having a quick scroll through the output file (neat tip with the > to get a text file, thanks!), nothing immediately jumps out at me. There aren’t any repeated folders or anything like that at a glance. Anything I should look out for?

          • bleistift2@sopuli.xyz · 6 hours ago

            You checked 385 GiB of files by hand? Is that size made up of a few humongously large files?

            I suggest using uniq to check if you have duplicate files in there. (uniq’s input must be sorted first). If you still have the output file from the previous step, and it’s called rsync-output.txt, do sort rsync-output.txt | uniq -dc. This will print the duplicates and the number of their occurrences.

            • sbird@sopuli.xyz (OP) · 6 hours ago

              when using uniq nothing is printed (I’m assuming that means no duplicates?)

              • bleistift2@sopuli.xyz · 6 hours ago

                I’m sorry. I was stupid. If you had duplicates due to a file system loop or symlinks, they would all be under different names. So you wouldn’t be able to find them with this method.

                • sbird@sopuli.xyz (OP) · 6 hours ago

                  running the du command with --count-links as suggested by another user returns 384G (so that isn’t the problem, it seems)

          • confusedpuppy@lemmy.dbzer0.com · 6 hours ago

            If you don’t spot any recursion issues, I’d suggest looking for other issues and not spending too much time here. At least now you have some troubleshooting knowledge going forward. Best of luck figuring out the issue.

      • confusedpuppy@lemmy.dbzer0.com · 7 hours ago

        Does your terminal have a scrollback limit? You may need to change that setting if there is a limit.

        That will depend on which terminal you are using, and the setting may go by a different name, so I can’t really help more with this specific issue. You’ll have to look it up for the terminal you are using.

  • bleistift2@sopuli.xyz · 6 hours ago

    Personally, I have no more tips than those that have already been presented in this comment section. What I would do now to find out what’s going on is the age-old divide-and-conquer debugging technique:

    Using rsync or a file manager (yours is Dolphin), copy only a few top-level directories at a time to your external drive. Note the directories you are about to move before each transfer. After each transfer, check whether the sizes of the directories on your internal drive (roughly) match those on your external drive (they will probably differ a little bit). You can also use your file manager for that, or the rough comparison loop sketched at the end of this comment.

    If all went fine for the first batch, proceed to the next until you find one where the sizes differ significantly. Then delete that offending batch from the external drive. Divide the offending batch into smaller batches (select fewer directories if you tried transferring multiple; or descend into a single directory and copy its subdirectories piecewise like you did before).

    In the end you should have a single directory or file which you have identified as problematic. That can then be investigated further.
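
    A rough per-directory size comparison (just a sketch; the target path assumes the mount point from your original command, and directories that haven’t been copied yet simply show no target size):

        for d in /home/sbird/*/ /home/sbird/.[!.]*/; do
            name=$(basename "$d")
            src=$(du -sh "$d" | cut -f1)
            dst=$(du -sh "/run/media/sbird/My Passport/sbird/$name" 2>/dev/null | cut -f1)
            printf '%-20s source: %-8s target: %s\n' "$name" "$src" "${dst:-n/a}"
        done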

    • sbird@sopuli.xyz (OP) · 4 hours ago

      YIKES, I found that .local is around 30 GB on my system SSD but over 50 GB on the external SSD. Much of that is due to Steam and Kdenlive. I can try uninstalling Steam…

    • sbird@sopuli.xyz (OP) · 5 hours ago

      Something interesting that I found: according to Dolphin, many folders have many GB extra (e.g. 52 GB vs 66 GB for the Documents folder, which is kind of crazy), while Filelight records 52 GB vs 112 GB for the Documents folder, which, if true, is kind of insane. Using du -sh records 53G vs 136G, though they’re the same when using --apparent-size, weird. Specifically for the Godot directory it’s 3.8 GB vs 41 GB!!! Files like videos and games seem to be about the same size, while Godot projects with git are much bigger. Weird.

      • bleistift2@sopuli.xyz · 5 hours ago

        These differences really are insane. Maybe someone more knowledgeable can comment on why different tools differ so wildly in the total size they report.

        I have never used BTRFS, so I must resort to forwarding googled results like this one.

        Could you try compsize ~? If the Perc column is much lower than 100% or the Disk Usage column is much lower than the Uncompressed column, then you have some BTRFS-specific file-size reduction on your hands, which your external exFAT naturally can’t replicate.

        • sbird@sopuli.xyz (OP) · 4 hours ago

          percentage of total is 83% (292G vs uncompressed 349G apparently)

      • bleistift2@sopuli.xyz · 4 hours ago

        It’s good you found some pathological examples, but I’m at the end of my rope here.

        You can use these examples and the other information you gathered so far and ask specifically how these size discrepancies can be explained and maybe mitigated. I suggest more specialized communities for this such as !linux@lemmy.ml, !linux@programming.dev, !linux@lemmy.world, !linux4noobs@programming.dev, !linux4noobs@lemmy.world, !linuxquestions@lemmy.zip.

        • sbird@sopuli.xyz (OP) · 3 hours ago

          I have cross posted to a Linux community. Thank you so much for all your help :DDDD

      • sbird@sopuli.xyz (OP) · 4 hours ago

        I’m assuming that Filelight counts file sizes differently, and I will trust the result from Dolphin more

      • sbird@sopuli.xyz (OP) · 5 hours ago

        using -H with rsync doesn’t seem to do anything unfortunately…

      • sbird@sopuli.xyz (OP) · 5 hours ago

        With Dolphin, the Godot directory is 1 GB vs 5 GB. Why is there a difference between Filelight, Dolphin, and du -sh? So weird

        • sbird@sopuli.xyz (OP) · 5 hours ago

          it looks like much of the extra Godot bulk is in .git and .Godot directories

        • sbird@sopuli.xyz (OP) · 5 hours ago

          Something about never knowing the time when one has two clocks

    • sbird@sopuli.xyz (OP) · 6 hours ago

      Oh that’s actually a good idea. Thanks person! I will report back soon

  • olosta@lemmy.world · 8 hours ago

    Maybe you have hard links or sparse files in your source directory. Try with -H for hard links first. You can try with --sparse but I think hard links are more likely.

    • sbird@sopuli.xyz (OP) · 6 hours ago
      6 hours ago

      Using -H throws an error as symlinks aren’t supported in exFAT it seems.

      • [object Object]@lemmy.world · 6 hours ago

        By the way, do you have lots of torrents downloaded or large virtual machines installed? Both torrent clients and virtual machine managers use ‘sparse files’ to save space until you actually download the whole torrent or write a lot to the VM’s disk. Those files would be copied at full size to exFAT.

        If you have folders with such content, you can use e.g. Double Commander to check the actual used size of those folders (with Ctrl-L in Doublecmd). Idk which terminal utils might give you those numbers in place, but the aforementioned ncdu can calculate them and present them as a tree.
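
        If you want to hunt for sparse files directly, GNU find can print a ‘sparseness’ ratio (a sketch; the 100M threshold is arbitrary, and btrfs compression can also lower the ratio):

            find /home/sbird -type f -size +100M -printf '%S\t%s\t%p\n' | awk '$1 < 0.9'

        Anything with a ratio well below 1 occupies less space on disk than its apparent size, so it would grow when copied over.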

        • sbird@sopuli.xyz (OP) · 6 hours ago

          using du -hsc returns 384G for /home/sbird, and 150G for the external SSD (when it does not have any of the files transferred with rsync)

          • [object Object]@lemmy.world · 6 hours ago

            Well, that’s not what I meant. If you have directories with torrents or VMs, du might report different sizes for those directories on the source and target disks. That would have pointed to the culprits.

            With just the source disk, you can compare du -hsc dirname with du -hsc --apparent-size dirname to check whether the disk space used is much smaller than the ‘apparent size’, which would mean there are sparse files in the directory, i.e. files not fully written to disk. rsync would copy those files at their full ‘apparent size’.

            As mentioned elsewhere, btrfs might also save space on the source disk by not writing duplicate files multiple times — but idk if du would report that, since it’s specific to btrfs internals.
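
            If btrfs-progs is installed, there’s a btrfs-aware variant that should answer that (a sketch; it typically needs root):

                sudo btrfs filesystem du -s /home/sbird

            It splits the total into exclusive vs. shared data, so reflinked or deduplicated copies show up in the shared column.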

    • [object Object]@lemmy.world · 7 hours ago

      For a typical user, hard links would be mostly employed by git for its internal structures, and it’s difficult to accumulate over 300 GB of git repos.

      Sparse files would actually be more believable, since they’re used by both torrent clients and virtual machines.

  • bleistift2@sopuli.xyz · 8 hours ago (edited)

    Could it be you have lots of tiny files and/or a rather large-ish block size on your SSD?

    You can check the block size with sudo blockdev --getbsz /dev/$THE_DEVICE.

    • sbird@sopuli.xyz (OP) · 7 hours ago

      using the command returns 512 for the external SSD and 4096 for the SSD in my laptop. What does that mean?

      • bleistift2@sopuli.xyz · 7 hours ago

        What does that mean?

        Imagine your hard drive as a giant cupboard of drawers. Each drawer can only have one label, so you must only ever store one “thing” in one drawer; otherwise you wouldn’t be able to label the thing accurately and would end up not knowing what went where.

        If you have giant drawers (a large block size), but only tiny things (small files) to store, you end up wasting a lot of space in the drawer. It could fit a desktop computer, but you’re only putting in a phone. This problem is called “internal fragmentation” and causes files to take up way more space than it would seem they need.

        –––––

        However, in your case, the target block size is actually smaller, so this is not the issue you’re facing.