scripts/twister: Fix race with device-testing

CPython is sometimes described as "single threaded" due to the GIL,
but the interpreter will still "preemptively" switch between threads
(the details seem poorly documented).

So the window between checking whether a device's "available" flag is
set and clearing it could let more than one thread see the same
"available" device, and more than one test would then be run
(simultaneously, on the same physical device!).  We have a big herd of
threads all polling for this, so in a large test run this would happen
maybe one time out of 20-30 tries.

Use a per-device lock so the check and the update happen as one step.
Also remove the very similar looking
DeviceHandler.get_available_device() method, which had the same bug
but appears to be dead code.

Fixes #32679

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Andy Ross 2021-03-02 05:13:07 -08:00 committed by Anas Nashif
parent e58e2767f8
commit 098fce351f

scripts/pylib/twister/twisterlib.py Normal file → Executable file

@@ -649,25 +649,22 @@ class DeviceHandler(Handler):
 
         log_out_fp.close()
 
-    def get_available_device(self, instance):
-        device = instance.platform.name
-        for d in self.suite.duts:
-            if d.platform == device and d.available and (d.serial or d.serial_pty):
-                d.available = 0
-                d.counter += 1
-                return d
-
-        return None
-
     def device_is_available(self, instance):
         device = instance.platform.name
         fixture = instance.testcase.harness_config.get("fixture")
         for d in self.suite.duts:
             if fixture and fixture not in d.fixtures:
                 continue
-            if d.platform == device and d.available and (d.serial or d.serial_pty):
+            if d.platform != device or not (d.serial or d.serial_pty):
+                continue
+            d.lock.acquire()
+            avail = False
+            if d.available:
                 d.available = 0
                 d.counter += 1
+                avail = True
+            d.lock.release()
+            if avail:
                 return d
 
         return None
@@ -3845,7 +3842,7 @@ class DUT(object):
         self.pre_script = pre_script
         self.probe_id = None
         self.notes = None
-
+        self.lock = Lock()
         self.match = False