scripts/twister: Fix race with device-testing

CPython is sometimes described as "single threaded" due to the GIL, but the interpreter will still "preemptively" switch between threads (the details seem poorly documented). So the window between checking whether a device's available flag is set and clearing it could let more than one thread see an "available" device, and more than one test would then run (simultaneously, on the same physical device!).

We have a big herd of threads all polling for this, so in a large test run this would happen maybe one time out of 20-30 tries.

Use a lock, so that the check and the update happen as one atomic step per device.

Also remove the very similar looking DeviceHandler.get_available_device() method, which had the same bug but appears to be dead code.

Fixes #32679

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
2021-03-02 05:13:07 -08:00 · 098fce351f
parent e58e2767f8
commit 098fce351f
1 changed file with 9 additions and 12 deletions
--- a/scripts/pylib/twister/twisterlib.py
+++ b/scripts/pylib/twister/twisterlib.py
@@ -649,25 +649,22 @@ class DeviceHandler(Handler):

        log_out_fp.close()

-    def get_available_device(self, instance):
-        device = instance.platform.name
-        for d in self.suite.duts:
-            if d.platform == device and d.available and (d.serial or d.serial_pty):
-                d.available = 0
-                d.counter += 1
-                return d
-
-        return None
-
    def device_is_available(self, instance):
        device = instance.platform.name
        fixture = instance.testcase.harness_config.get("fixture")
        for d in self.suite.duts:
            if fixture and fixture not in d.fixtures:
                continue
-            if d.platform == device and d.available and (d.serial or d.serial_pty):
+            if d.platform != device or not (d.serial or d.serial_pty):
+                continue
+            d.lock.acquire()
+            avail = False
+            if d.available:
                d.available = 0
                d.counter += 1
+                avail = True
+            d.lock.release()
+            if avail:
                return d

        return None
@@ -3845,7 +3842,7 @@ class DUT(object):
        self.pre_script = pre_script
        self.probe_id = None
        self.notes = None
-
+        self.lock = Lock()
        self.match = False
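
The check-then-claim pattern this patch protects can be exercised in isolation. The following is a minimal sketch, not the twister code: `Dut`, `claim_device`, and `run_demo` are hypothetical names, the platform string is arbitrary, and `threading.Lock` is an assumption, since the diff only shows a bare `Lock()` and not where it is imported from. It uses the `with` form of the lock, which is equivalent to the explicit `acquire()`/`release()` pair in the patch but also releases the lock if an exception is raised.

```python
import threading


class Dut:
    """Hypothetical stand-in for twister's DUT; only the fields used here."""

    def __init__(self, platform):
        self.platform = platform
        self.available = 1
        self.counter = 0
        self.lock = threading.Lock()  # assumption: the real Lock may differ


def claim_device(duts, platform):
    """Return a free DUT for `platform`, or None if none is free.

    The availability check and the update happen under the DUT's lock,
    so two threads can never both claim the same device.
    """
    for d in duts:
        if d.platform != platform:
            continue
        with d.lock:
            if not d.available:
                continue  # leaving the with-block releases the lock
            d.available = 0
            d.counter += 1
        return d
    return None


def run_demo(n_threads=32):
    """Start many threads that all try to claim the same single device."""
    duts = [Dut("example_board")]
    winners = []
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()  # release all threads together to maximize contention
        d = claim_device(duts, "example_board")
        if d is not None:
            winners.append(d)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return duts[0], winners
```

With the lock in place, exactly one of the competing threads should get the device and the rest should see `None`, regardless of how the interpreter schedules them.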