Problems and Technical Issues with Rosetta@home

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,616,240
RAC: 22,198
Message 109752 - Posted: 18 Sep 2024, 22:24:13 UTC

The boinc-process host is down again, so no Validation for work being returned at this time.
Grant
Darwin NT
ID: 109752 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2117
Credit: 41,155,895
RAC: 16,061
Message 109753 - Posted: 18 Sep 2024, 22:47:45 UTC - in response to Message 109752.  

The boinc-process host is down again, so no Validation for work being returned at this time.

Sometimes the server page doesn't report accurately, so when I see some parts of boinc-process are running (some assimilators) I'm not sure what to think.
Rosetta_beta and Rosetta_python validators were showing as running for a while, even when other parts weren't, but have now switched to not running again.
Whatever's really happening, it all comes across as very flaky <sigh>
ID: 109753 · Rating: 0
TPCBF
Joined: 29 Nov 10
Posts: 111
Credit: 5,070,625
RAC: 2,159
Message 109754 - Posted: 18 Sep 2024, 23:54:16 UTC - in response to Message 109753.  

Well, no new tasks during the day, nothing validated, and still all assimilators/validators not running... :(
ID: 109754 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,616,240
RAC: 22,198
Message 109759 - Posted: 19 Sep 2024, 22:31:27 UTC

The boinc-process host is back up again, although we now have an error message on the main page in the Server Status section:
Notice: Undefined variable: stats in /projects/boinc/rosetta/html/user/index.php on line 81

Grant
Darwin NT
ID: 109759 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,616,240
RAC: 22,198
Message 109760 - Posted: 20 Sep 2024, 3:53:47 UTC

Another 600k or so Tasks just released.

Hopefully things will stay up for a while.
Grant
Darwin NT
ID: 109760 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2117
Credit: 41,155,895
RAC: 16,061
Message 109764 - Posted: 20 Sep 2024, 12:08:51 UTC - in response to Message 109760.  

Another 600k or so Tasks just released.

Hopefully things will stay up for a while.

Last night I got back to my PC (the one that, like yours, crashed every task it grabbed from the last batch) and saw that boinc-process had come back an hour or two before you posted.
It'd been back for some while already, going by how much the validation backlog had reduced.
Now that tasks are available, let's see if it handles this new batch any better.

I'm currently on another PC that crashed last Monday and missed the last batch altogether, but is rushing through its last few WCG tasks that are right up against their deadline, so I won't find out how this one goes until I get home tonight. Fingers crossed on them all.
ID: 109764 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,616,240
RAC: 22,198
Message 109770 - Posted: 21 Sep 2024, 3:29:40 UTC

Server Status is showing all green, but a backlog is developing with the Assimilators.
Grant
Darwin NT
ID: 109770 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2117
Credit: 41,155,895
RAC: 16,061
Message 109773 - Posted: 21 Sep 2024, 10:33:08 UTC - in response to Message 109764.  
Last modified: 21 Sep 2024, 10:33:22 UTC

Another 600k or so Tasks just released.

Hopefully things will stay up for a while.

Last night I got back to my PC (the one that, like yours, crashed every task it grabbed from the last batch) and saw that boinc-process had come back an hour or two before you posted.
It'd been back for some while already, going by how much the validation backlog had reduced.
Now that tasks are available, let's see if it handles this new batch any better.

I'm currently on another PC that crashed last Monday and missed the last batch altogether, but is rushing through its last few WCG tasks that are right up against their deadline, so I won't find out how this one goes until I get home tonight. Fingers crossed on them all.

Both running fine and running all tasks to completion.
Not sure what the previous blip was about
ID: 109773 · Rating: 0
Bill F
Joined: 29 Jan 08
Posts: 44
Credit: 1,561,577
RAC: 1,172
Message 109775 - Posted: 23 Sep 2024, 4:55:22 UTC

Trying to draw attention to the fact that the stats export for RALPH has not been updated in over 42 days and that the posting on the RALPH message board is getting no attention.

Respectfully
Bill F
In October 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.

ID: 109775 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,524,889
RAC: 7,500
Message 109776 - Posted: 23 Sep 2024, 5:12:27 UTC - in response to Message 109775.  
Last modified: 23 Sep 2024, 5:13:37 UTC

Trying to draw attention to the fact that the stats export for RALPH has not been updated in over 42 days and that the posting on the RALPH message board is getting no attention.


After years on Ralph, I think it's a lost cause...
I write in their forums sometimes, but I don't have a lot of hope

(not that the Rosetta forums are much better)
ID: 109776 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2117
Credit: 41,155,895
RAC: 16,061
Message 109787 - Posted: 25 Sep 2024, 14:36:01 UTC

Somehow grabbed 16 rb tasks this morning to fill my cache
Just checked how many tasks were issued and it appears to be next to none - seems I was just lucky with the odd few
Then I saw boinc-process is down again - not so lucky after all <sigh>
ID: 109787 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,616,240
RAC: 22,198
Message 109792 - Posted: 30 Sep 2024, 7:33:47 UTC

A glitch is back on the main page
Notice: Undefined variable: stats in /projects/boinc/rosetta/html/user/index.php on line 81
Just under the Server Status heading.
Grant
Darwin NT
ID: 109792 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,616,240
RAC: 22,198
Message 109793 - Posted: 1 Oct 2024, 7:31:00 UTC
Last modified: 1 Oct 2024, 7:50:19 UTC

Glitch fixed, and lo and behold- new work!


And an interesting batch it is- It's Beta 6.06 work, but the Tasks are using 1.4 to 1.7GB of RAM each, and it looks like their target Runtime is 8 hours (unlike the usual 200-400MB of RAM and 3 hrs runtime of previous Beta Tasks).
Grant
Darwin NT
ID: 109793 · Rating: 0
Bill Swisher
Joined: 10 Jun 13
Posts: 32
Credit: 32,808,832
RAC: 52,579
Message 109795 - Posted: 1 Oct 2024, 16:30:53 UTC - in response to Message 109793.  
Last modified: 1 Oct 2024, 16:32:11 UTC

Ouch! At first glance this beta does not seem to play well with my processors. Two of my machines, the only two to download the beta, went slightly insane, so to speak. Both are running openSUSE Leap 15.6. The first (an old AMD Ryzen Threadripper 2950X with 32GB of memory) jumped up to 99 active users and started swapping like mad; I had to hit the power button to get to where I could suspend BOINC. The second (a newer AMD Ryzen 9 7950X) only jumped up to 64 users before I suspended BOINC.
I'm heading out of town for a couple of days, so none of the machines will be getting any Rosetta work until I get back and can watch them.
ID: 109795 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2117
Credit: 41,155,895
RAC: 16,061
Message 109796 - Posted: 1 Oct 2024, 21:16:44 UTC - in response to Message 109793.  

Glitch fixed, and lo and behold- new work!

And an interesting batch it is - It's Beta 6.06 work, but the Tasks are using 1.4 to 1.7GB of RAM each, and it looks like their target Runtime is 8 hours (unlike the usual 200-400MB of RAM and 3 hrs runtime of previous Beta Tasks).

I polled 2hrs before your msg and got nothing, then not again until 3hrs ago - and only just noticed. Argh
Not many left to grab now either - a small batch.

On runtime, I've said before I use a 12hr runtime, but didn't mention that the last batch of 16 I sneaked a few days ago only ran 8hrs too.
Not sure if that's a coincidence as my 12hr setting usually overrides the default set to the individual tasks.

Anyway, work is work. I'll take whatever I can get.
ID: 109796 · Rating: 0
Jean-David Beyer
Joined: 2 Nov 05
Posts: 187
Credit: 6,375,683
RAC: 5,738
Message 109797 - Posted: 1 Oct 2024, 22:03:33 UTC - in response to Message 109795.  

Ouch! At first glance this beta does not seem to play well with my processors.


The first three I got ran just fine. My machine is running Red Hat Enterprise Linux release 8.10 (Ootpa)
using kernel 4.18.0-553.22.1.el8_10.x86_64

Task	Work unit	Sent	Time reported	Status	Run time (sec)	CPU time (sec)	Credit	Application
1583987342	1409332410	1 Oct 2024, 10:50:55 UTC	1 Oct 2024, 18:23:18 UTC	Completed and validated	26,366.61	25,872.01	369.98	Rosetta Beta v6.06 x86_64-pc-linux-gnu
1583987355	1409332393	1 Oct 2024, 10:50:55 UTC	1 Oct 2024, 18:26:58 UTC	Completed and validated	27,345.06	26,815.76	383.71	Rosetta Beta v6.06 x86_64-pc-linux-gnu
1583987363	1409332409	1 Oct 2024, 10:50:55 UTC	1 Oct 2024, 18:23:18 UTC	Completed and validated	26,759.98	26,251.39	375.50	Rosetta Beta v6.06 x86_64-pc-linux-gnu
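
As a rough cross-check against the 8-hour target runtime mentioned earlier in the thread, the figures above can be converted to hours. This is only a sketch: it assumes the three numeric columns are run time in seconds, CPU time in seconds, and credit (the usual order on a BOINC task list), and `summarize` is a made-up helper name for illustration.

```python
# (run time s, CPU time s, credit) for the three tasks listed above.
# Column meanings are an assumption; the pasted list has no header row.
TASKS = [
    (26366.61, 25872.01, 369.98),
    (27345.06, 26815.76, 383.71),
    (26759.98, 26251.39, 375.50),
]


def summarize(run_s, cpu_s, credit):
    """Convert one task's raw figures into easier-to-read numbers."""
    hours = run_s / 3600
    return {
        "run_hours": round(hours, 2),
        # Fraction of wall-clock time the task actually spent on the CPU.
        "cpu_efficiency": round(cpu_s / run_s, 3),
        "credit_per_hour": round(credit / hours, 1),
    }


if __name__ == "__main__":
    for task in TASKS:
        print(summarize(*task))
```

On those assumptions all three tasks ran for a little over 7 hours, which is in the same ballpark as an 8-hour target rather than the older 3-hour one.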

ID: 109797 · Rating: 0
Klimax
Joined: 27 Apr 07
Posts: 44
Credit: 2,800,788
RAC: 2,415
Message 109798 - Posted: 2 Oct 2024, 5:01:05 UTC - in response to Message 109793.  

Glitch fixed, and lo and behold- new work!


And an interesting batch it is- It's Beta 6.06 work, but the Tasks are using 1.4 to 1.7GB of RAM each, and it looks like their target Runtime is 8 hours (unlike the usual 200-400MB of RAM and 3 hrs runtime of previous Beta Tasks).

So that explains why WUs are failing on one of my computers. 20 threads and only 16GB of RAM and a fairly small paging file. They could have warned us...
ID: 109798 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,616,240
RAC: 22,198
Message 109799 - Posted: 2 Oct 2024, 6:46:45 UTC - in response to Message 109798.  
Last modified: 2 Oct 2024, 6:53:06 UTC

So that explains why WUs are failing on one of my computers. 20 threads and only 16GB of RAM and a fairly small paging file. They could have warned us...

It's been the general rule of thumb since I've been here (a bit over 4 years): 1.5GB of RAM per core/thread is needed in order to do Rosetta work.
It's only recently, with the Beta application, that Tasks have used less (there were batches of Rosetta 4.20 work with Tasks that used 2-4GB each).


Edit-
Interestingly- on one system all running tasks are using up to 1.6GB of RAM, on the other only 2 are using more than 1GB of RAM, the rest 400-700MB.
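
To put numbers on the rule of thumb above, here is a minimal sketch. It assumes the 1.5GB per task figure and a hypothetical 2GB set aside for the operating system; `ram_needed_gb` and `max_concurrent_tasks` are illustrative names, not part of BOINC.

```python
GB_PER_TASK = 1.5   # rule of thumb from this thread; this batch used 1.4 to 1.7GB per task
RESERVED_GB = 2.0   # assumption: headroom left for the OS and other programs


def ram_needed_gb(threads, gb_per_task=GB_PER_TASK):
    """RAM needed to run one Rosetta task on every thread."""
    return threads * gb_per_task


def max_concurrent_tasks(total_ram_gb, threads,
                         reserved_gb=RESERVED_GB, gb_per_task=GB_PER_TASK):
    """Number of tasks that fit in RAM, capped at the thread count."""
    usable = max(total_ram_gb - reserved_gb, 0.0)
    return min(threads, int(usable // gb_per_task))


if __name__ == "__main__":
    # The 20-thread, 16GB machine described in the quoted post.
    print(ram_needed_gb(20))             # 30.0
    print(max_concurrent_tasks(16, 20))  # 9
```

For that 20-thread, 16GB machine, running on every thread would want about 30GB, while roughly 9 tasks fit at once, so capping the share of CPUs BOINC may use in its computing preferences would be one way to stay within RAM.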
Grant
Darwin NT
ID: 109799 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1673
Credit: 17,616,240
RAC: 22,198
Message 109800 - Posted: 2 Oct 2024, 6:49:47 UTC - in response to Message 109797.  

Ouch! At first glance this beta does not seem to play well with my processors.
The first three I got ran just fine. My machine is running Red Hat Enterprise Linux release 8.10 (Ootpa)
using kernel 4.18.0-553.22.1.el8_10.x86_64

128GB of RAM on a 16 core/thread system leaves plenty of RAM available for the system even when all cores/threads are doing Rosetta work.
Grant
Darwin NT
ID: 109800 · Rating: 0
Klimax
Joined: 27 Apr 07
Posts: 44
Credit: 2,800,788
RAC: 2,415
Message 109801 - Posted: 2 Oct 2024, 8:56:55 UTC - in response to Message 109799.  

So that explains why WUs are failing on one of my computers. 20 threads and only 16GB of RAM and a fairly small paging file. They could have warned us...
It's been the general rule of thumb since I've been here (a bit over 4 years): 1.5GB of RAM per core/thread is needed in order to do Rosetta work.
It's only recently, with the Beta application, that Tasks have used less (there were batches of Rosetta 4.20 work with Tasks that used 2-4GB each).


Edit-
Interestingly- on one system all running tasks are using up to 1.6GB of RAM, on the other only 2 are using more than 1GB of RAM, the rest 400-700MB.

Argh. I just realized (after writing a reply) what happened. NumberFields uses OpenCL for multiprecision arithmetic, and the OpenCL compiler uses a lot of RAM during compilation (a sharp increase to a few GB, after which it returns to a fairly small footprint). So I have enough virtual memory, as long as it's not being exhausted by another project...
ID: 109801 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org