[Mageia-sysadm] Jonund and Ecosse restarted...
Thomas Backlund
tmb at mageia.org
Tue Aug 9 17:55:40 CEST 2011
Pascal Terjan wrote on 9.8.2011 16:37:
> On Fri, Aug 5, 2011 at 00:33, Thomas Backlund <tmb at mageia.org> wrote:
>> Hi,
>>
>> Since both Jonund and Ecosse had dropped some of their build speed,
>> I checked them out.
>>
>> Both had zombie rpmbuild processes, the oldest dating from about 8 days ago,
>> slow disk I/O, and Ecosse hit ATA bus reset errors.
>>
>> So I restarted both to flush out the memory and re-initialize the disk
>> controllers. Both are now running nicely again.
>
> Ecosse is very, very slow and, looking at dmesg, it seems to have happened again:
>
> ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> ata3.00: failed command: FLUSH CACHE EXT
> ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata3.00: status: { DRDY }
> ata3: soft resetting link
> ata3.00: configured for UDMA/133
> ata3.00: retrying FLUSH 0xea Emask 0x4
> ata3: EH complete
>
> urpmi installs only a few packages per minute, spending most of its time in D
> state while nothing else is running on the machine.
Hm,
this one seems to start the whole mess:
INFO: task jbd2/dm-0-8:633 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
jbd2/dm-0-8 D 0000000000000002 0 633 2 0x00000000
ffff88022e3a7b40 0000000000000046 ffff88022e3a7b00 ffffffff8104d7ea
0000000000015840 0000000000000282 ffff88022e3a7af0 000000006c462645
ffff88022e3a7fd8 000000000000faf0 ffff88022f172da0 ffff88022dca16d0
Call Trace:
[<ffffffff8104d7ea>] ? try_to_wake_up+0x2da/0x410
[<ffffffff81084609>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff810e9680>] ? sync_page+0x0/0x50
[<ffffffff813c3b33>] io_schedule+0x73/0xc0
[<ffffffff810e96bd>] sync_page+0x3d/0x50
[<ffffffff813c418f>] __wait_on_bit+0x5f/0x90
[<ffffffff810e9843>] wait_on_page_bit+0x73/0x80
[<ffffffff8107a1d0>] ? wake_bit_function+0x0/0x40
[<ffffffff810f3bf5>] ? pagevec_lookup_tag+0x25/0x40
[<ffffffff810e9c5d>] filemap_fdatawait_range+0x10d/0x1a0
[<ffffffff810e9d1b>] filemap_fdatawait+0x2b/0x30
[<ffffffffa0008df5>] jbd2_journal_commit_transaction+0x745/0x12f0 [jbd2]
[<ffffffff8106b53b>] ? try_to_del_timer_sync+0x7b/0xe0
[<ffffffffa000f193>] kjournald2+0xb3/0x200 [jbd2]
[<ffffffff8107a190>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa000f0e0>] ? kjournald2+0x0/0x200 [jbd2]
[<ffffffff81079c66>] kthread+0x96/0xa0
[<ffffffff8100ae24>] kernel_thread_helper+0x4/0x10
[<ffffffff81079bd0>] ? kthread+0x0/0xa0
[<ffffffff8100ae20>] ? kernel_thread_helper+0x0/0x10
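
To spot this kind of warning without reading the whole dmesg by hand, here is a rough sketch that pulls the hung-task lines out of kernel log text. The function name `find_hung_tasks` is made up for illustration, and the pattern is only modelled on the "blocked for more than N seconds" line quoted above, so other kernel versions may word it differently.

```python
import re
import subprocess

# Modelled on the warning quoted above, e.g.
# "INFO: task jbd2/dm-0-8:633 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (comm, pid, seconds) for each hung-task warning."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("comm"),
                 int(match.group("pid")),
                 int(match.group("secs")))
            )
    return hits


if __name__ == "__main__":
    # Reading dmesg may require root on some systems.
    output = subprocess.run(
        ["dmesg"], capture_output=True, text=True
    ).stdout
    for comm, pid, secs in find_hung_tasks(output):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

Run as a script, it prints one line per warning found in the current kernel log; the parsing function can also be fed a saved log file's contents.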
IIRC, it was fixed in newer kernels, but since Mandriva still fails to
release fixed kernels for 2010.1, I guess I'll either release a fixed
kernel myself for the Mageia build system, or simply rebuild the Mageia 1
kernel for it.
One other thing we could do for now is to only have one buildbot running
until the kernel is fixed.
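
In the meantime, to keep an eye on tasks stuck in D state or left behind as zombies (as seen with urpmi and rpmbuild above), a minimal sketch that reads `/proc/<pid>/stat` could look like this. The helper names `parse_stat` and `tasks_in_states` are hypothetical, and it assumes a Linux `/proc` layout.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces
    or parentheses, so split on the first "(" and the last ")".
    """
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def tasks_in_states(states=("D", "Z"), proc_root="/proc"):
    """List (pid, comm, state) for processes whose state is in `states`."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were looking, or the
            # file was not in the expected format; skip it.
            continue
        if state in states:
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    for pid, comm, state in tasks_in_states():
        print(f"{pid:>7} {state} {comm}")
```

A steadily growing list of D-state entries while the machine is otherwise idle would point at the same I/O stall rather than at real build load.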
--
Thomas