I ported a piece of software (a stackable filesystem) from Linux 2.6.11 to 2.6.17.13. One of the major changes is to the inode semaphores, which have now been replaced with mutexes. I have replaced the calls to up/down in the older kernel with calls to mutex_unlock and mutex_lock in the newer kernel. But strangely, when I mount the stackable FS and try some sort of compilation on the mount point, the compilation continues for a while and then I see a dump as pasted below.
Conditions: SMP is turned off, and I have checked with appropriate printks that the mutexes are not being acquired cyclically. The dump appears when a program named "ld" is trying to unlock a mutex which it had taken itself. Any hints?
Dump:
BUG: spinlock cpu recursion on CPU#0, kswapd0/121
lock: c03ac5f4, .magic: dead4ead, .owner: ld/5148, .owner_cpu: 0
<c01e2d22> _raw_spin_lock+0x48/0xec <c016bf5a> prune_dcache+0x10/0x118
<c016c076> shrink_dcache_memory+0x14/0x2b <c0143c89> shrink_slab+0xe4/0x148
<c012b217> finish_wait+0x2a/0x4b <c0144b23> balance_pgdat+0x212/0x31d
<c0144d60> kswapd+0xce/0xd0 <c012b17c> autoremove_wake_function+0x0/0x35
<c0144c92> kswapd+0x0/0xd0 <c0101005> kernel_thread_helper+0x5/0xb
BUG: soft lockup detected on CPU#0!
<c013b0ed> softlockup_tick+0x90/0xa1 <c01231d5> update_process_times+0x35/0x57
<c0105df7> timer_interrupt+0x60/0x99 <c013b18a> handle_IRQ_event+0x23/0x4c
<c013b22f> __do_IRQ+0x7c/0xd1 <c0104d04> do_IRQ+0x62/0x7f
=======================
<c010356a> common_interrupt+0x1a/0x20 <c010dfc7> delay_pmtmr+0xb/0x13
<c01e1dc1> __delay+0x9/0xa <c01e2d4f> _raw_spin_lock+0x75/0xec
<c016bf5a> prune_dcache+0x10/0x118 <c016c076> shrink_dcache_memory+0x14/0x2b
<c0143c89> shrink_slab+0xe4/0x148 <c012b217> finish_wait+0x2a/0x4b
<c0144b23> balance_pgdat+0x212/0x31d <c0144d60> kswapd+0xce/0xd0
<c012b17c> autoremove_wake_function+0x0/0x35 <c0144c92> kswapd+0x0/0xd0
<c0101005> kernel_thread_helper+0x5/0xb
BUG: soft lockup detected on CPU#0!
<c013b0ed> softlockup_tick+0x90/0xa1 <c01231d5> update_process_times+0x35/0x57
<c0105df7> timer_interrupt+0x60/0x99 <c013b18a> handle_IRQ_event+0x23/0x4c
<c013b22f> __do_IRQ+0x7c/0xd1 <c0104d04> do_IRQ+0x62/0x7f
=======================
<c010356a> common_interrupt+0x1a/0x20 <c010dfc7> delay_pmtmr+0xb/0x13
<c01e1dc1> __delay+0x9/0xa <c01e2d4f> _raw_spin_lock+0x75/0xec
<c016bf5a> prune_dcache+0x10/0x118 <c016c076> shrink_dcache_memory+0x14/0x2b
<c0143c89> shrink_slab+0xe4/0x148 <c012b217> finish_wait+0x2a/0x4b
<c0144b23> balance_pgdat+0x212/0x31d <c0144d60> kswapd+0xce/0xd0
<c012b17c> autoremove_wake_function+0x0/0x35 <c0144c92> kswapd+0x0/0xd0
<c0101005> kernel_thread_helper+0x5/0xb
BUG: soft lockup detected on CPU#0!
Last edited by gopala.surya; 10-12-2006 at 07:20 AM.
There is no need to post twice about the same issue.
By stackable filesystems I assume you mean UnionFS, which is currently known to be very prone to deadlock by design. To paraphrase Al Viro on the issue of UnionFS... NOOOO!!!!!!!!!! It's a cool idea, but the implementation is horrible and is likely to always be so. Regardless, there are new versions out for current kernels. If you do kernel development, it's advisable to always work on either Andrew Morton's -mm tree or Linus' current tree, because of the ongoing work fixing bugs and changing APIs.
In any case, I think you'll have more success with your custom code by posting it to the Linux Kernel Mailing List. I doubt anyone there can deduce your problem without actually seeing your code, let alone anyone here.
I am using Fedora Core 5, on which I have installed 2.6.17.13. I am porting a project called tracefs. I am aware that porting from semaphores to mutexes in inodes is trivial. I suspect that I might have missed a step or two while porting the changes. I wanted to check what should be looked at when such bugs are thrown. Do we still get spinlock issues if SMP=OFF?
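One way to read the first BUG line in the dump is sketched below as a small userspace model. It is not kernel code and not a diagnosis of tracefs. It only assumes that a debug spinlock records the fields printed in the dump (.magic, .owner, .owner_cpu) and complains when a different task asks for the lock on the CPU that is still recorded as the owner's. All names starting with model_ are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAGIC  0xdead4eadu  /* same value as .magic in the dump */
#define MODEL_NO_CPU (-1)

/* Hypothetical userspace stand-in for a debug spinlock; not a kernel type. */
struct model_lock {
    unsigned int magic;
    const char *owner;  /* task name, or NULL when the lock is free */
    int owner_cpu;      /* CPU of the holder, or MODEL_NO_CPU */
};

/* Returns NULL if the attempt looks sane, otherwise the kind of complaint. */
const char *model_check(const struct model_lock *l, const char *task, int cpu)
{
    if (l->magic != MODEL_MAGIC)
        return "bad magic";
    if (l->owner != NULL && strcmp(l->owner, task) == 0)
        return "recursion";
    if (l->owner_cpu == cpu)
        return "cpu recursion";
    return NULL;
}

/* Take the lock; aborts if the sanity check fails, like a BUG would. */
void model_acquire(struct model_lock *l, const char *task, int cpu)
{
    assert(model_check(l, task, cpu) == NULL);
    l->owner = task;
    l->owner_cpu = cpu;
}

/* Drop the lock and clear the recorded owner. */
void model_release(struct model_lock *l)
{
    l->owner = NULL;
    l->owner_cpu = MODEL_NO_CPU;
}

/*
 * Scenario loosely matching the dump: "ld" takes the lock on CPU 0, then
 * "kswapd0" asks for it on the same CPU. If released is 0, "ld" was
 * switched out while still recorded as the owner.
 */
const char *model_scenario(int released)
{
    struct model_lock dcache = { MODEL_MAGIC, NULL, MODEL_NO_CPU };

    model_acquire(&dcache, "ld", 0);
    if (released)
        model_release(&dcache);
    return model_check(&dcache, "kswapd0", 0);
}
```

In this model, model_scenario(0) returns "cpu recursion" and model_scenario(1) returns NULL. With SMP off every task runs on CPU 0, so under these assumptions a lock that is still marked as held by a task that is no longer running would trip the check on the very next attempt by any other task. That would point at a path that takes the spinlock and then sleeps or returns without releasing it, but confirming that needs the actual code.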
This is with reference to my post above. I recompiled the original version of the software and ran it on 2.6.11. There I find that the problem already existed, but kernel 2.6.11 just printed a warning:
kernel: lib/dec_and_lock.c:32: spin_lock(fs/dcache.c:c0408e1c)
already locked by lib/dec_and_lock.c/32
kernel: fs/dcache.c:176: spin_unlock(fs/dcache.c:c0408e1c) not locked
In 2.6.16 and above this is explicitly caught as a BUG and the system halts. The linux/spinlock.h header included by the earlier kernel defined a macro that printed a message like the one above and then continued. The new linux/spinlock.h header, as of 2.6.16, does not have such an option. Another header file, asm-i386/spinlock.h, now BUGs on this and halts.
If anyone has faced and solved this problem, or knows how to solve it, please give me some pointers as to what might be causing it.