
Need a Formula to Predict Disk Usage



tofu
2008-Aug-21, 09:43 PM
At work, I've been asked to come up with a formula that we can use to forecast data storage needs. The situation is, we have tens of thousands of users and they each have a quota of X MB. Most use substantially less space than they are allocated. A few had a compelling reason to use more, and were granted a larger quota.

The straightforward solution is to graph total usage over time, project that into the future, and buy that amount of disk space plus 10%. I'm wondering if any of the brilliant BAUT minds can suggest a better model than that. I haven't gathered all the past usage data yet, but I'd bet that we get spikes at certain times of the year (perhaps students collecting reference materials for a term paper). I think I'd like to find out how banks determine how much cash to have on hand.
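
For reference, here's a rough sketch of that straightforward projection in Python (the monthly totals below are made up; ours would come from the real usage data once I've gathered it):

import numpy as np

# Made-up monthly usage totals in GB, oldest first.
monthly_gb = np.array([410, 425, 460, 455, 480, 510, 530, 555], dtype=float)
months = np.arange(len(monthly_gb))

# Least-squares straight line through the history, projected 12 months ahead.
slope, intercept = np.polyfit(months, monthly_gb, 1)
projected_gb = slope * (months[-1] + 12) + intercept

# Buy the projection plus a 10% cushion.
print(f"Projected: {projected_gb:.0f} GB; buy about {projected_gb * 1.10:.0f} GB")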

So, any of you people have any tips for me?

JustAFriend
2008-Aug-21, 10:25 PM
That's 1990s thinking. Disk space is too cheap these days to fuss with stuff like that.

Giving 30,000 users 30 MB each is only about 900 GB; that fits on a single $150 1 TB drive.

Get an enclosure that'll take 6 or 8 terabyte drives in RAID and be done with it for less than a couple thousand bucks. If your boss screams about a couple thou for 30,000 workers then he's too cheap to work for....

jj_0001
2008-Aug-21, 10:42 PM
JustAFriend nailed it.

Consider the amount of salary time a double-controller SATA RAID box with eight 1.2 TB drives in it corresponds to. That's 7.2 TB of RAID-backed storage spread among 30,000 users, or 240 MB per user.

That box will cost you somewhere between $2K and $3K US.

$3K US is, oh, say, 100 hours of cheap engineer time. If managing disk quotas costs more than about 12 seconds per user, you win.

Strictly speaking, one also has to account for maintenance and power. That's not going to raise the cost per user a whole lot over the whole life of the system.

If that's not enough space, buy two. The extra box is maybe half an hour of additional work, give or take, over the life of the box.
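
Spelling that arithmetic out (a quick sketch; the $30/hour engineer rate is just what $3K over 100 hours implies):

# Back-of-the-envelope numbers from above.
box_cost_usd = 3000.0        # double-controller SATA RAID box, 8 x 1.2 TB
usable_tb = 7.2              # usable space after RAID overhead
users = 30000
engineer_usd_per_hour = 30.0 # "cheap engineer time"

mb_per_user = usable_tb * 1_000_000 / users            # ~240 MB each
engineer_hours = box_cost_usd / engineer_usd_per_hour  # ~100 hours
break_even_sec = engineer_hours * 3600 / users         # ~12 seconds per user

print(f"{mb_per_user:.0f} MB per user")
print(f"Box = {engineer_hours:.0f} hours of engineer time")
print(f"Break-even: {break_even_sec:.0f} seconds of quota-wrangling per user")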

tdvance
2008-Aug-21, 11:20 PM
It depends on how it's used though--if the tens of thousands of users actually log into the systems, each megabyte will be very expensive; either that or they will have murderously slow connections. If it's just a single database with tens of thousands of users submitting, en masse, no more than a handful of queries per minute, it's much less of a problem. (But still, look how quickly BAUT slows down at quite a bit less than 10,000 users--though I don't know the types, sizes, and numbers of servers.)

An engineer (no longer among the living) once told me: it's not memory, it's memory bandwidth. Memory is cheap. (He meant RAM, but it applies to disks too.) A yottabyte is useless if it takes a lifetime to access it.

tdvance
2008-Aug-21, 11:34 PM
You'd probably want to know something about the distribution of the users' usage.

You have n users. If you can measure or guess that the mean usage per user is u and the standard deviation is s, then the central limit theorem says (assuming the users are roughly independent and identically distributed--they aren't, but it might be a good enough approximation) that the total usage will be approximately normally distributed with mean n*u and standard deviation s*sqrt(n). The approximation is pretty accurate for n bigger than something like 30.

Then, checking the normal distribution out on Wikipedia:

84.1% of the time, the total usage will be less than n*u + s*sqrt(n)
97.7% of the time, the total usage will be less than n*u + 2*s*sqrt(n)
99.9% of the time, the total usage will be less than n*u + 3*s*sqrt(n)
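
A quick sketch of that calculation (the per-user mean and standard deviation below are placeholders; plug in your measured values):

import math

n = 30000     # number of users
u = 45.0      # mean usage per user, MB (placeholder)
s = 60.0      # standard deviation per user, MB (placeholder)

# Central limit theorem: total usage ~ Normal(n*u, s*sqrt(n))
mean_total = n * u
sd_total = s * math.sqrt(n)

for k in (1, 2, 3):
    prob = 0.5 * (1 + math.erf(k / math.sqrt(2)))   # P(total < mean + k*sd)
    print(f"{100 * prob:.1f}% of the time: total < {mean_total + k * sd_total:,.0f} MB")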

cjl
2008-Aug-22, 05:31 AM
It depends on how it's used though--if the tens of thousands of users actually log into the systems, each megabyte will be very expensive; either that or they will have murderously slow connections. If it's just a single database with tens of thousands of users submitting, en masse, no more than a handful of queries per minute, it's much less of a problem. (But still, look how quickly BAUT slows down at quite a bit less than 10,000 users--though I don't know the types, sizes, and numbers of servers.)

An engineer (no longer among the living) once told me: it's not memory, it's memory bandwidth. Memory is cheap. (He meant RAM, but it applies to disks too.) A yottabyte is useless if it takes a lifetime to access it.
Bingo.

This is why 15,000 RPM enterprise drives are so expensive. If all of those thousands of people are trying to access it at once, that little array of terabyte consumer drives will choke. The high-speed enterprise drives are specifically made for IOPS, not capacity, for exactly that reason.
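
A rough illustration (the request rates and per-drive IOPS figures here are just illustrative guesses, not measurements):

import math

users = 30000
active_fraction = 0.05            # guess: 5% hitting the storage at once
iops_per_active_user = 2          # guess: a couple of random I/Os each
consumer_drive_iops = 100         # ballpark for a 7200 RPM SATA drive
enterprise_drive_iops = 200       # ballpark for a 15,000 RPM drive

required_iops = users * active_fraction * iops_per_active_user
print(f"Required: {required_iops:.0f} IOPS")
print(f"Consumer drives needed:   {math.ceil(required_iops / consumer_drive_iops)}")
print(f"Enterprise drives needed: {math.ceil(required_iops / enterprise_drive_iops)}")

Capacity says a handful of drives is plenty; the random I/O is what bites.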

jj_0001
2008-Aug-22, 05:55 AM
If we actually have lots of users accessing this thing at once, there isn't a server on the planet that's going to cope, 15k RPM disks or not.

In such a case, breaking this up into a lot of smaller stores is going to be necessary. Your biggest concern then is balancing the traffic, not so much the storage.
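
Even something as dumb as hashing each user onto one of several smaller stores gets you most of the way there (a sketch; the store names are made up):

import hashlib

STORES = ["store-01", "store-02", "store-03", "store-04"]  # made-up smaller stores

def store_for(username: str) -> str:
    # Stable hash, so a given user always lands on the same store
    # and the traffic spreads roughly evenly across stores.
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    return STORES[int.from_bytes(digest[:4], "big") % len(STORES)]

print(store_for("tofu"))  # always the same store for a given name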

But the OP sounded sorta like a backup store, or something. I saw no mention of bandwidth.

cjl
2008-Aug-22, 05:59 AM
Well, a couple of these (http://www.stec-inc.com/products/zeus/Zeus-IOPS-Solid-State-Drive.php) would work. Of course, they cost tens of thousands of dollars. Each.

tdvance
2008-Aug-22, 02:35 PM
If we actually have lots of users accessing this thing at once, there isn't a server on the planet that's going to cope, 15k RPM disks or not.

In such a case, breaking this up into a lot of smaller stores is going to be necessary. Your biggest concern then is balancing the traffic, not so much the storage.

But the OP sounded sorta like a backup store, or something. I saw no mention of bandwidth.

It is a problem large corporations have solved--their own employees accessing computer cycles like mad. The solution is expensive, though--and employees complain about the quotas given that "I have a 1TB disk at home that's so cheap...." and have to be told, no, that doesn't scale.

If it's a backup store--and if it's done like carbonite.com, where you put an app on each user's computer to back up the disk at a time of its choosing (to spread out all the backups)--it might be manageable without too much cost.

Of course, I looked at and rejected carbonite.com because they would put an app on my machine that backs up automatically :)

tofu
2008-Aug-22, 03:26 PM
You'd probably want to know something about the distribution of the users' usage.

Yes, this is the way I need to start thinking.

You know, for a small business, I wonder if something like wuala http://wua.la/ would work. The people who need more space could get it from all of their coworkers.

suntrack2
2008-Aug-23, 12:46 PM
At work, I've been asked to come up with a formula that we can use to forecast data storage needs. The situation is, we have tens of thousands of users and they each have a quota of X MB. Most use substantially less space than they are allocated. A few had a compelling reason to use more, and were granted a larger quota.

The straightforward solution is to graph total usage over time, project that into the future, and buy that amount of disk space plus 10%. I'm wondering if any of the brilliant BAUT minds can suggest a better model than that. I haven't gathered all the past usage data yet, but I'd bet that we get spikes at certain times of the year (perhaps students collecting reference materials for a term paper). I think I'd like to find out how banks determine how much cash to have on hand.

So, any of you people have any tips for me?


Some 19 years back I was a marketing executive and advisor at a well-known computer company here. My job was to suggest which computer was economically best for the customer; really I was just the company's advisor to its clients. At that time only 486 and 586 computers existed in the market, and nothing more advanced was available here.

See if this link might assist you:
http://www.databasejournal.com/features/mssql/article.php/3414111