[Mageia-dev] Issues with dracut

Colin Guthrie mageia at colin.guthr.ie
Tue Feb 14 16:53:55 CET 2012


'Twas brillig, and Colin Guthrie at 03/02/12 16:00 did gyre and gimble:
> 'Twas brillig, and David W. Hodgins at 03/02/12 08:04 did gyre and gimble:
>> On Tue, 17 Jan 2012 07:22:30 -0500, Colin Guthrie
>> <mageia at colin.guthr.ie> wrote:
>>
>>> Are things working OK for you now with dracut or is it still busted?
>>
>> Just to clarify why I think the problem is happening on single
>> core systems.
>>
>> On a multi-core system, the bash and udevd processes will be
>> running on different cores.
>> When the script executes the udev settle command, it continues
>> to execute, so the loop checking to see if udev is done finds
>> it isn't, so it then looks for/runs the initqueue jobs.
>>
>> On a single core system, the bash script waits for the settle
>> command to finish, so then finds it's done, and exits without
>> even trying to run the initqueue jobs.
>>
>> The patch in my prior message is effectively changing the script
>> from "udev done or jobs done" to "udev done and jobs done".
> 
> Hmm, actually thinking about this more, I'm not 100% sure I agree with
> this argument. The number of cores should be irrelevant here as the
> program itself should be dealing with things synchronously anyway.
> 
> I'm wondering if it's more of an issue relating to the fact that it's
> not specifically waiting for the LVM device to be ready. I guess your /
> is either not on LVM or is in a different Volume Group? In my tests it
> worked, but perhaps the dual core machine is simply that bit faster (and
> it's speed, not #cores that is important)?
> 
> In the file parse-lvm.sh, it does a for loop and has a wait_for_dev
> call. This function will put stuff into the initqueue that should
> prevent the exiting of the loop until that device exists...
> 
>     for dev in $(getargs rd.lvm.vg rd_LVM_VG=) $(getargs rd.lvm.lv rd_LVM_LV=); do
>         wait_for_dev "/dev/$dev"
>     done
> 
> Now according to the man page, these options are only meant to be used
> to restrict what devices are activated so they shouldn't be needed per-se.
> 
> But it brings an important point... there does not appear to be any
> "wait_for_dev" calls for the usrmount module. So nothing is going to be
> waiting for the device to exist. If it takes a little while to come up
> it could lead to your error.
> 
> And herein we have chicken and egg... we don't know where /usr is (i.e.
> which /dev/foo) until we mount /  (as we have to read /etc/fstab). But
> by the time we've mounted /, we've already exited this loop and thus
> cannot re-enter the loop to wait for more devices.
> 
> Tricky, and certainly something I'll discuss with Harald this weekend.
> He does have a separate branch that deals with usr mounting in a more
> holistic way (i.e. it handles /usr/bin being a separate mount if that
> floats your boat!), but I've not looked at this for a while to see if
> he's progressed any with it.
> 
> All in all, it's perhaps just the fact that the first call to udevadm
> settle is skipped due to there being nothing in your initqueue/finished/
> folder? You can check via passing rd.break=initqueue and looking in the
> folder.
> 
> If so, then all that should be needed to get this into shape is to put a
> dummy file in there as part of the 98usrmount module, have that file
> delete itself and return an error code, thus causing check_finished()
> to return non zero and thus the call to udevsettle will be reached.
> 
> 
> If this is NOT the issue, then it should just be a timing thing plain
> and simple. To confirm this, you should simply be able to pass
> rd.break=pre-pivot to the command line, wait a little while and then
> just type exit to continue the boot process. This extra time should be
> sufficient for udev to "see" the LVM stuff and for the mount command to
> succeed (I hope!)
> 
> Sorry for the long reply. You will likely have to poke in the dracut
> code to understand everything I'm saying, but it looks like you're doing
> that happily already :D


OK, so I sadly didn't get a chance to speak to Harald in Brussels (only
saw him briefly during a talk so couldn't go through my list of issues
:)) but think my comments above were correct.
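
For anyone following along without the dracut source to hand, here is a
toy model of the "finished" hook mechanism described above. It is only a
sketch, not the real dracut code: wait_for_dev_sketch,
check_finished_sketch, the devexists.sh file name and the temporary
directory standing in for initqueue/finished/ are all made up for
illustration.

```shell
#!/bin/sh
# Toy model only: every name below is a stand-in, not dracut's real code.
workdir=$(mktemp -d)
finished_dir="$workdir/finished"   # stand-in for initqueue/finished/
fake_dev="$workdir/dev-vg-usr"     # stand-in for the device holding /usr
mkdir -p "$finished_dir"

# Roughly the idea behind wait_for_dev: drop a check into the finished
# folder that keeps failing until the device node exists.
wait_for_dev_sketch() {
    printf '[ -e "%s" ]\n' "$1" > "$finished_dir/devexists.sh"
}

# Roughly the idea behind check_finished: succeed only when every check
# in the folder succeeds. An empty folder therefore counts as finished.
check_finished_sketch() {
    for f in "$finished_dir"/*.sh; do
        [ -e "$f" ] || continue      # glob matched nothing
        ( . "$f" ) || return 1
    done
    return 0
}

# 1. Nothing registered: the main loop would exit straight away.
check_finished_sketch && before_hook=finished || before_hook=waiting

# 2. A device has been registered but has not appeared yet.
wait_for_dev_sketch "$fake_dev"
check_finished_sketch && before_dev=finished || before_dev=waiting

# 3. The device "appears".
: > "$fake_dev"
check_finished_sketch && after_dev=finished || after_dev=waiting

echo "$before_hook $before_dev $after_dev"
rm -rf "$workdir"
```

This prints "finished waiting finished". The first state is the
interesting one: with nothing in the folder there is nothing to hold
the loop open, which is the situation I suspect the usrmount case ends
up in.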

To summarise, a problem would occur if / was on ext4 and /usr was on
LVM. The LVM would never get activated. If / was on LVM too (but a
different VG to /usr) then all would be fine.

I think this is the scenario you had issues with.

Looking at the new code in dracut 015, I think it writes out the
variables I mentioned above (rd.lvm.vg) into a cmdline.d folder and thus
the LVM for /usr should now get activated.
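
If I'm reading it right, the effect is along the lines of the sketch
below. Treat it as an assumption-laden illustration rather than what
dracut 015 actually does: the file name 90lvm.conf, the VG name vg_usr
and the getargs_sketch helper are all hypothetical, and a temporary
directory stands in for the cmdline.d folder.

```shell
#!/bin/sh
# Illustration only: file name, VG name and helper are hypothetical.
workdir=$(mktemp -d)
cmdline_d="$workdir/cmdline.d"     # stand-in for the cmdline.d folder
mkdir -p "$cmdline_d"

# Something writes the VG that holds /usr out as if it had been given
# on the kernel command line.
echo "rd.lvm.vg=vg_usr" > "$cmdline_d/90lvm.conf"

# Crude stand-in for getargs: print the value of every "key=value"
# word found in the conf files.
getargs_sketch() {
    key=$1
    for word in $(cat "$cmdline_d"/*.conf 2>/dev/null); do
        case $word in
            "$key"=*) echo "${word#*=}" ;;
        esac
    done
}

# The parse-lvm.sh style loop now has something to iterate over, so a
# wait would get registered for the VG.
vgs=$(getargs_sketch rd.lvm.vg)
for dev in $vgs; do
    echo "would wait for: /dev/$dev"
done
rm -rf "$workdir"
```

With the conf file in place the loop prints "would wait for:
/dev/vg_usr"; without it the loop body never runs, which matches the
old behaviour.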

In short, can you test the new dracut version just submitted?

Cheers

Col


-- 

Colin Guthrie
colin(at)mageia.org
http://colin.guthr.ie/

Day Job:
  Tribalogic Limited http://www.tribalogic.net/
Open Source:
  Mageia Contributor http://www.mageia.org/
  PulseAudio Hacker http://www.pulseaudio.org/
  Trac Hacker http://trac.edgewall.org/

