Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,616,240 RAC: 22,198 |
And the boinc-process host is down again. Grant Darwin NT |
tgbauer Send message Joined: 5 Jan 06 Posts: 10 Credit: 100,868,563 RAC: 74,698 |
I have a work unit that doesn't seem to be getting as far as others, and has an unusually long model (the graphics show a dot with a line that seems to go on to infinity). Other tasks are running as expected.
This is stderr.txt command: rosetta_4.20_x86_64-apple-darwin -run:protocol jd2_scripting @flags_rb_09_09_632102_625918__t000__0_C1_robetta -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip input_rb_09_09_632102_625918__t000__0_C1_robetta.zip -frag_weight_aligned 0.5 -max_registry_shift 4 -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 3499362 Using database: database_357d5d93529_n_methyl/minirosetta_database error: zipfile probably corrupt (segmentation violation) error: zipfile probably corrupt (illegal instruction) BOINC:: CPU time: 64841.5s, 36000s + 28800s[2024-10-21 22:25: 9:] :: BOINC Output exists: default.out.gz Size: WARNING! cannot get file size for default.out.gz: could not open file. -1 InternalDecoyCount: 0 (GZ) ----- 0 ----- Stream information inconsistent. Writing W_0000001 error: zipfile probably corrupt (segmentation violation) |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
"Have a work unit that doesn't seem to be getting as far as others, and has an unusually long model (the graphics shows a dot with a line that seems to go on into infinity)" It's probably already errored out by now, but with all those errors and running over 2.5 days without starting, you should abort it if it's still going. It hasn't started, let alone stood any chance of finishing. Let your core have something more productive to run. |
tgbauer Send message Joined: 5 Jan 06 Posts: 10 Credit: 100,868,563 RAC: 74,698 |
Fortunately this seems to be a one-off and other tasks are processing as expected. Restarting the BOINC client caused it to realize it needed to error out this task. Maybe at some point the BOINC client will recognize similar errors (for any project) and avoid the need for a restart or abort. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,616,240 RAC: 22,198 |
"And the boinc-process host is down again." Still dead, so still no work being validated. Grant Darwin NT |
tgbauer Send message Joined: 5 Jan 06 Posts: 10 Credit: 100,868,563 RAC: 74,698 |
Looks like Application "Rosetta Beta 6.06" tasks are using 2.5GB of RAM each! That becomes a bit inefficient when you have 128 cores in a computer and 128GB RAM (only 46/128 cores used). The ones before that and "Rosetta 4.20" consume less than 0.5GB (and all 128 cores are used). The recent Beta 6.06 tasks are now using less than 1GB (600MB compressed). Thank you for fixing the RAM size! Now I'm able to use all cores again. |
Bill Swisher Send message Joined: 10 Jun 13 Posts: 32 Credit: 32,808,535 RAC: 52,672 |
It appears that they (whoever they are) have resolved the massive memory gobbling. Do you think I would be wise to remove the limitation on the beta runs? I currently have it limited to only 6 per computer. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
I think so. It's possible it ran short of RAM as some tasks are demanding high amounts recently, but better to think of it as a one-off and just move on. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
"And the boinc-process host is down again." "Still dead, so still no work being Validated." It came back about 8hrs ago. Everything is nearly cleared down now. And some tasks became available, but have all been gobbled up again. All very hand-to-mouth. |
Matthew Tireman Send message Joined: 24 Mar 20 Posts: 6 Credit: 387,215 RAC: 2,069 |
:/ |
Matthew Tireman Send message Joined: 24 Mar 20 Posts: 6 Credit: 387,215 RAC: 2,069 |
One of my systems (Phenom II X6 1065T) fails all Rosetta Beta 6 tasks yet is fine with Rosetta 4 tasks. It fails the tasks almost immediately. I've: reinstalled BOINC; enabled virtualization; reinstalled VirtualBox twice. If this isn't solvable, is it possible to disable Rosetta Beta 6 tasks specifically on this machine? |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,269,631 RAC: 2,588 |
"One of my systems (phenom ii x6 1065t) fails all Rosetta BETA 6 tasks yet is fine with Rosetta 4 tasks." I tried to look up which of your systems that is in order to see if I could help. The information I found by clicking on your author name did not include the system type (phenom ii), only items like the CPU and GPU types, so I couldn't help. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
"One of my systems (phenom ii x6 1065t) fails all Rosetta BETA 6 tasks yet is fine with Rosetta 4 tasks." It looks to be this one. I can't help either. Some tasks crashed with their wingman too, but others completed fully and successfully. The only thing I might ask about is if that PC is overclocked or old and maybe overheating. Might it need a clean-out of dust from fans and vents in order to run cooler? Can't do any harm. But I'm guessing - I have no idea what's wrong. And there's no way to disable Rosetta Beta tasks only. If Matthew doesn't mind the wasted bandwidth, let them crash out in a few seconds and someone else will have a go at them while he moves on with other tasks that do run successfully. |
Bill Swisher Send message Joined: 10 Jun 13 Posts: 32 Credit: 32,808,535 RAC: 52,672 |
Ahh...but there is! At least under Linux. Thanks to the beta jobs asking for 2+GB of memory I took the hint(s) and restricted them. But they've fixed that problem so it turned into a "learning experience" and I'm limiting the number of Einstein@Home jobs now. Details available via private message if anyone is interested in how. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,616,240 RAC: 22,198 |
The only thing to try that comes to mind is to reset the Project. If one of the data files needed for Beta Tasks has become corrupted, that can cause the problem you're experiencing. Resetting the project will release all downloaded work, and clear out all existing application & database files & re-download them from the project from scratch. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1673 Credit: 17,616,240 RAC: 22,198 |
boinc-process host has died yet again... Grant Darwin NT |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 1994 Credit: 9,524,889 RAC: 7,500 |
"boinc-process host has died yet again..." I missed it a little |
OffDutyTaoist Send message Joined: 10 Oct 06 Posts: 3 Credit: 1,982,768 RAC: 1,070 |
My Pixel 6 was recently having issues with Rosetta v4.20 arm-android-linux-gnu. Specifically: rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_04_05_2997716_34_0 rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_05_12_2997716_33_0 rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_07_2997716_32_0 When they started, they would get up to about 1.5 to 1.75% completed, then my phone would reset and they would start over at 0%. I aborted all three; in retrospect I should have paused two and tried to isolate exactly which one was causing the issue. But I have some other stuff going on and acted out of frustration, so that one is on me. If I can provide anything else that might help, let me know. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
"boinc-process host has died yet again..." Still down, but two batches of tasks issued and 1m+ queued up to process |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2117 Credit: 41,155,895 RAC: 16,061 |
"boinc-process host has died yet again..." Still down, 400k awaiting validation now, but the front page info also seems to have frozen - no update for about 18hrs, while the Server Status page still seems ok. For now |
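
The memory arithmetic behind tgbauer's "only 46/128 cores used" post, and the kind of per-application limit Bill Swisher mentions, can be sketched roughly as below. This is only an illustration under stated assumptions, not what either poster actually did (Bill's method was offered by private message and is not given in the thread). The 90% usable-RAM fraction is an assumption that happens to reproduce the reported 46 tasks. `app_config.xml` with `<max_concurrent>` is BOINC's documented per-application limit, but the application name `rosetta_beta` used here is a hypothetical placeholder; the real short name has to be read from `client_state.xml` on your own host.

```python
import xml.etree.ElementTree as ET


def max_tasks_by_ram(total_ram_gb, per_task_gb, cores, ram_fraction=0.9):
    """Estimate how many tasks can run at once given a RAM budget.

    ram_fraction is an assumed share of RAM the client is allowed to use;
    it is not a value taken from the thread.
    """
    usable_gb = total_ram_gb * ram_fraction
    # Never run more tasks than there are cores.
    return min(cores, int(usable_gb // per_task_gb))


def render_app_config(app_name, max_concurrent):
    """Build the text of a minimal app_config.xml limiting one application.

    app_name must match the application's short name in client_state.xml;
    any name passed in here is the caller's assumption.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 128 cores, 128GB RAM, 2.5GB per beta task (figures from the thread).
    print(max_tasks_by_ram(128, 2.5, 128))
    # Same host with tasks that need under 0.5GB each.
    print(max_tasks_by_ram(128, 0.5, 128))
    # Hypothetical app name; 6 is the per-computer limit Bill mentions.
    print(render_app_config("rosetta_beta", 6))
```

If you try this, the generated text would go in a file named `app_config.xml` in the Rosetta@home project directory inside the BOINC data directory, and the client needs to re-read its config files (or be restarted) before the limit takes effect.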
©2024 University of Washington
https://www.bakerlab.org