Message boards : Number crunching : compression of files
| Author | Message | 
|---|---|
|  Runaway1956 Send message Joined: 5 Nov 05 Posts: 19 Credit: 535,400 RAC: 0 | 
 I have 4 computers running, with a combined RAC of 850 (ATM). I'm downloading a heckuva lot of bits and bytes - on a dial-up connection. I realize that most people these days are on broadband, and they don't realize how BIG a 2 MB file is. But it takes minutes to download each file on 56k. In fact, most of my connection hours seem to be BOINC Rosetta downloads. :( Would it be possible to compress the download files, so they download faster? Even the really fast broadband users might benefit, considering that some have bandwidth restrictions in their contracts. (Exceed "x" gig download limit, pay a premium type of thing.) I opened one of the 2 MB .gz files at random, using WinRAR. It extracted a 6 MB file, which I compressed again using WinRAR, at best compression. hom001_aa1ten_09_05.200_v1_3.gz starts out at 2,152,755 bytes; hom001_aa1ten_09_05.200_v1_3 extracted is 6,581,169 bytes; hom001_aa1ten_09_05.200_v1_3.rar is 918,056 bytes. In other words, I can compress those files down to ~ 43% of the size they are being shipped at. Meaning, my computers would only be using ~ 43% of the time they are now using on the dial-up. Not to mention, some of the big league crunchers might save a couple dollars a month on their broadband connections. Anyone can run the same test - just grab a file or six at random from your Rosetta project folder, and see what you can do with them. (Copy them somewhere else to play with them - don't mess up your Rosetta folder. ;) ) What do ya say, folks? Can we get better compression on those .gz files, please???? Note that if my RAC goes up much more, the dial-up connection won't keep up with the crunching. :(   | 
| Scribe  Send message Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 | 
 | 
|  Runaway1956 Send message Joined: 5 Nov 05 Posts: 19 Credit: 535,400 RAC: 0 | 
 Thanks Scribe. I read the links you supplied, as well as doing some browsing before I found your post. ;) I have gone into preferences, and set my target runtime to the highest - 1 day. Probably won't see any difference for a day or two; maybe Monday or Tuesday I can report that I'm using less bandwidth to do as much or more work. I certainly hope that David and crew are considering a change to compression methods - Gzip looks like it is probably the best solution to that specific problem. It's still not clear to me how that CPU runtime thing will reduce bandwidth - I'm off to read some more, lol   | 
|  Runaway1956 Send message Joined: 5 Nov 05 Posts: 19 Credit: 535,400 RAC: 0 | 
 Thanks Scribe. OK, first thing I thought was, that will kill my RAC. But, that's not true, as credit = computer time x bench, roughly speaking. Second thought, what's the purpose in crunching the same unit longer? I'd have to go back and read to get it exactly right - but I'm giving the computer more time to build more models - sort of double and triple checking the work. I'll read more, but that's the idea I get from it. So, pushing the runtime up actually helps the science, and costs me nothing in credits, right? I like it. ;)   | 
| Scribe  Send message Joined: 2 Nov 05 Posts: 284 Credit: 157,359 RAC: 0 | 
 "...So, pushing the runtime up actually helps the science, and costs me nothing in credits, right?" ...right! :thumb     | 
|  dcdc Send message Joined: 3 Nov 05 Posts: 1834 Credit: 124,260,318 RAC: 8 | 
 Thanks Scribe. You can run lots of tests on each work unit - the more you run on each, the fewer WUs you have to download ;) HTH Danny | 
| BennyRop Send message Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 | 
 At the moment, we participants are looking for 10,000 models for each of the WUs that are released. (When we get lots more participants, they'd like to bump that up to 100,000 for certain types of WUs which aren't being resolved very well with only 10,000 models.) The project doesn't care if 1 model is returned from 10,000 machines, 10 models are returned from 1,000 machines, 100 models are returned from 100 machines, or 1,000 models are returned from 10 machines, as long as they all get returned in time. Welcome aboard... and have fun reading through all the discussions on science and project progress to get a good feel for what's going on here. :) | 
|  Runaway1956 Send message Joined: 5 Nov 05 Posts: 19 Credit: 535,400 RAC: 0 | 
 Well, it seems to be working. There was a glitch at first. The WU didn't want to be worked, it seemed, when the time went from 2 hours up to 24. I set the time back to 8 hours, and it seemed to help - but I eventually had to reset two of the machines. They would just hang, but it didn't seem like that dreaded 1% hangup - one WU got to 12% and hung, another got to 78% and hung. Restarting the machine would set that particular WU back to 0% - resulting in wasted time and lost results. It seems all the machines are finally settled in, doing one WU each 8 hours. And the family is off my butt about hogging all the bandwidth. I may try 24 hours again - but not for a little while. Thanks guys - I've gained a lot of knowledge from this thread.   | 
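
For anyone who wants to rerun the numbers from this thread, here is a minimal Python sketch. The three file sizes are the ones quoted in the first post. The 5,000 bytes/s dial-up throughput is an assumed, illustrative figure (not a measurement from the thread), and so is the simplification that each work unit runs for exactly the target runtime and triggers one fresh download.

```python
# File sizes quoted in the first post of this thread, in bytes.
GZ_BYTES = 2_152_755   # hom001_aa1ten_09_05.200_v1_3.gz as shipped
RAW_BYTES = 6_581_169  # the same file after extraction
RAR_BYTES = 918_056    # recompressed with WinRAR at best compression

# ASSUMPTION: illustrative sustained dial-up throughput, not a measured value.
ASSUMED_BYTES_PER_SEC = 5_000


def ratio(part: int, whole: int) -> float:
    """Return part as a fraction of whole."""
    return part / whole


def download_seconds(size_bytes: int,
                     bytes_per_sec: float = ASSUMED_BYTES_PER_SEC) -> float:
    """Seconds needed to transfer size_bytes at a constant rate."""
    return size_bytes / bytes_per_sec


def downloads_per_day(runtime_hours: float) -> float:
    """Work units one always-busy core fetches per day.

    ASSUMPTION: every work unit runs for exactly the target runtime
    and needs one new download.
    """
    return 24 / runtime_hours


if __name__ == "__main__":
    print(f".rar size relative to .gz: {ratio(RAR_BYTES, GZ_BYTES):.1%}")
    print(f".gz size relative to raw:  {ratio(GZ_BYTES, RAW_BYTES):.1%}")
    print(f".gz download:  {download_seconds(GZ_BYTES) / 60:.1f} min")
    print(f".rar download: {download_seconds(RAR_BYTES) / 60:.1f} min")
    for hours in (2, 8, 24):
        print(f"target runtime {hours:>2} h -> "
              f"{downloads_per_day(hours):.0f} WU downloads per core per day")
```

Under those assumptions, moving the target runtime from 2 hours to 8 hours cuts the number of work-unit downloads per core by a factor of four, which is a bigger saving than the recompression would give on its own.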
         ©2025 University of Washington 
https://www.bakerlab.org