I’m having a problem with POSIX semaphores on Linux kernel 2.6.23.8-34.fc7. I have one POSIX management thread feeding data to multiple POSIX processing threads. I’m using round-robin scheduling with every thread at the same priority, and I explicitly set the processor affinity of each processing thread so that the 2 CPUs are used in parallel, with the processing load divided equally between them. The management thread calls ‘sched_yield’ after posting a semaphore to allow the processing threads to proceed. The processing interval is on the order of 40 ms and the quantum is set to the default value of 100 ms. Each processing thread has its own POSIX unnamed binary semaphore, which the management thread uses to signal it.

I’ve noticed that occasionally one or more of the processing threads will lose synchronization with the management thread, with no errors reported. I identified the problem by placing loop counters in both the management and processing threads and printing an error when they differ. This scheme appears to work properly for many hours, but eventually the loop count of one or more processing threads differs from the management thread’s count.

I’ve written similar code on other UNIX systems, so I’m pretty sure it’s not a programming error. The problem appears to occur only when the management and processing threads are on different CPUs. It gets worse as the number of processing threads increases: with the increased load it shows up within minutes.

I’ve found no mention of this problem on the web. Is this a known problem? Any ideas? Do I have to flush the memory cache to synchronize the two CPUs or something? Thanks…
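
To make the setup concrete, here is a minimal sketch of the pattern described above. It is not the original code. Every name in it (`worker_t`, `run_rounds`, `wait_sem`, `MAX_WORKERS`) is hypothetical, the SCHED_RR policy and CPU affinity settings are left out, and the `done` acknowledgement semaphore is an assumption added so that the sketch runs in lockstep; the post only mentions one semaphore per processing thread plus ‘sched_yield’.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define MAX_WORKERS 8   /* hypothetical upper bound for this sketch */

typedef struct {
    sem_t     ready;       /* manager -> worker: new data is available */
    sem_t     done;        /* worker -> manager: data consumed (assumed addition) */
    long      loops;       /* worker-side loop counter */
    long      iterations;  /* number of rounds this worker will run */
    pthread_t tid;
} worker_t;

/* sem_wait() can fail with EINTR when a signal arrives; retry in that case. */
static void wait_sem(sem_t *s)
{
    while (sem_wait(s) == -1 && errno == EINTR)
        continue;
}

static void *worker_main(void *arg)
{
    worker_t *w = arg;
    for (long i = 0; i < w->iterations; i++) {
        wait_sem(&w->ready);   /* block until the manager posts */
        w->loops++;            /* stand-in for the real processing step */
        sem_post(&w->done);    /* acknowledge so the manager cannot run ahead */
    }
    return NULL;
}

/* Run `iterations` rounds over `nworkers` threads. Returns the number of
 * workers whose loop counter differs from the manager's, or -1 on bad input. */
int run_rounds(int nworkers, long iterations)
{
    worker_t workers[MAX_WORKERS];
    long manager_loops = 0;
    int mismatches = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS || iterations < 0)
        return -1;

    for (int i = 0; i < nworkers; i++) {
        workers[i].loops = 0;
        workers[i].iterations = iterations;
        /* pshared = 0: shared between threads of one process; initial value 0 */
        assert(sem_init(&workers[i].ready, 0, 0) == 0);
        assert(sem_init(&workers[i].done, 0, 0) == 0);
        assert(pthread_create(&workers[i].tid, NULL, worker_main, &workers[i]) == 0);
    }

    for (long r = 0; r < iterations; r++) {
        for (int i = 0; i < nworkers; i++)
            sem_post(&workers[i].ready);   /* hand out one unit of work each */
        for (int i = 0; i < nworkers; i++)
            wait_sem(&workers[i].done);    /* wait until every worker is finished */
        manager_loops++;
    }

    for (int i = 0; i < nworkers; i++) {
        pthread_join(workers[i].tid, NULL);
        if (workers[i].loops != manager_loops)
            mismatches++;                  /* the check described in the post */
        sem_destroy(&workers[i].ready);
        sem_destroy(&workers[i].done);
    }
    return mismatches;
}
```

Calling `run_rounds(4, 100000)` from a `main` function and building with `-pthread` should return 0 in this sketch, because the manager never posts a second time before a worker has acknowledged the first post. Without such an acknowledgement, ‘sched_yield’ alone does not guarantee that a processing thread on another CPU has taken its semaphore before the next post.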