Efficiently determining disk usage of a folder (without starting from scratch every time)


When I use my computer, one question I commonly want to answer for myself is "how much space is being used by the contents of this folder?". Typical file/window managers, in my experience, answer this question the same way Windows does: by recursing over the directory contents and summing their logical sizes. This doesn't suit my needs, for three reasons:

  • While the logical size of an individual file is interesting to me, a sum of logical sizes is not; I want a sum of physical sizes, because the question is about disk usage.

  • It does the calculation (and directory traversal) on the fly, and doesn't show a progress bar or even a clear indication that it's done. Sometimes the file count and size sum will pause for seconds at a time and then start increasing again.

  • It's very slow.

I know that I can use du at the command line to get physical sizes, and it's clear when du is finished because it outputs to the terminal and eventually returns to a terminal prompt. However, it doesn't solve the performance issue.
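
To make the distinction concrete, here is a minimal sketch (assuming Linux and Python, with a made-up path): stat exposes the logical length as st_size and the allocated space as st_blocks, counted in 512-byte units, which is the figure du is actually summing.

    import os

    st = os.stat("/home/user/some-file")   # hypothetical path, substitute your own
    logical = st.st_size                    # byte length, what most file managers sum
    physical = st.st_blocks * 512           # allocated 512-byte blocks, what du reports
    print(logical, physical)                # sparse files: physical < logical; tiny files: often physical > logical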

Is there a filesystem that natively caches this information about directories, or well-known software that maintains such a cache, so that if I e.g. check the size of /home/user, the size of /home/user/Desktop is already known and can be returned instantaneously (as long as the subfolder hasn't been modified in the meantime)? Similarly, caching the result for /home/user/Desktop should speed up a later check of /home/user, since it wouldn't have to re-scan the Desktop contents. It would also be nice to have a GUI for such a program.

I thought about making such a program, but I don't want to reinvent the wheel. I'd also be interested if there's any way to make ext4 filesystems cache this information automatically, even though they don't appear to by default.
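
As a rough illustration of the idea (a sketch, not an existing tool): a recursive scan could memoize each directory's total, keyed by its mtime, so that an unchanged subtree is answered from the cache instead of being re-statted. The obvious caveat is that a directory's mtime changes only when entries are added, removed or renamed, not when a file deeper in the tree grows, so a real implementation would need something like inotify/fanotify to invalidate the cache reliably. In Python, roughly:

    import os
    import stat

    _cache = {}  # directory path -> (mtime_ns, total_bytes)

    def disk_usage(path):
        st = os.lstat(path)
        if not stat.S_ISDIR(st.st_mode):
            return st.st_blocks * 512           # physical size of one file or symlink
        cached = _cache.get(path)
        if cached is not None and cached[0] == st.st_mtime_ns:
            return cached[1]                    # directory unchanged since last scan: reuse its total
        total = st.st_blocks * 512              # blocks used by the directory entry itself
        with os.scandir(path) as entries:
            for entry in entries:
                total += disk_usage(entry.path)
        _cache[path] = (st.st_mtime_ns, total)
        return total

    print(disk_usage(os.path.expanduser("~/Desktop")))

Unlike du, this double-counts hard links, and the cache lives only in memory; a persistent version would have to store it on disk and handle permission errors, but the reuse it sketches is the behaviour I'm describing above.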


3 comment threads

File system dependent (2 comments)
What do you consider slow? (5 comments)
Links (hard and symbolic) (3 comments)
What do you consider slow?
matthewsnyder‭ wrote 5 months ago

For me, ncdu can do ~1 million files in 5 seconds. du on the same tree takes about 2 s. This already seems pretty fast.

Karl Knechtel‭ wrote 5 months ago · edited 5 months ago

On my system, time du -s ~ is reporting over a minute of real time, for less than half a million files (per find ~ -type f | wc -l, which incidentally is much faster). This is from an internal SSD, so disk throughput shouldn't be the problem. Could it be a quirk of ext4? (Are you using something else?)

But really, I would like it to be even faster than is theoretically possible when re-statting everything each time, by reusing existing information instead.

matthewsnyder‭ wrote 5 months ago

It is in fact ext4, but I just realized it's an NVMe drive that gets over 1 GB/s. But when you say "internal SSD", is that also NVMe? It seems a bit odd that it's taking so long for you. Can you try testing the drive speed? You might need fio or nvme-cli, or Gnome Disks might be able to do it.

As for reusing the information - yes, I agree it would be nice. On Windows I used SpaceSniffer and WinDirStat, and they definitely cache. Have you tried Gnome Disk Usage Analyzer (baobab)? I think that caches.

matthewsnyder‭ wrote 5 months ago

Actually, maybe that's an answer, so I'll post it.

Karl Knechtel‭ wrote 4 months ago · edited 4 months ago

Oh, no, it's physically mounted internally but it's a 2.5" SSD connected over SATA. What I meant by "internal" is that it isn't limited by USB (even 3.0). I'd prefer not to benchmark the disk formally at the moment because my backup setup is currently less than stellar. But by my understanding it should be capable of somewhere in the range of 500 MB/s, because that's how SATA SSDs of this kind generally perform.

Come to think of it, it's possible that my SATA link is being downgraded to 3 Gb/s instead of 6 Gb/s due to some other internal fault... the main part of the system is almost 10 years old now.