[Solved] Automatically detect test coupling

5👍

For a definite answer you’d have to run each test in complete isolation from the rest.

With pytest, which is what I use, you could implement a script that first runs it with --collect-only and then uses the returned test node ids to initiate an individual pytest run for each of them.
This will take a good while for your 1500 tests, but it should do the job as long as you completely recreate
the state of your system between each individual test.
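
For illustration, here is a minimal sketch of such a script; the parsing of the --collect-only output is naive and the reporting at the end is just an example:

import subprocess
import sys

# Collect the node ids of all tests without running them.
output = subprocess.check_output(
    [sys.executable, "-m", "pytest", "--collect-only", "-q"]
)
node_ids = [
    line.strip()
    for line in output.decode().splitlines()
    if "::" in line  # keep only lines that look like test node ids
]

# Run each test in its own pytest process so no state can leak between them.
failing_in_isolation = []
for node_id in node_ids:
    exit_code = subprocess.call([sys.executable, "-m", "pytest", "-q", node_id])
    if exit_code != 0:
        failing_in_isolation.append(node_id)

print("Tests failing even in isolation: {0}".format(failing_in_isolation or "none"))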

For an approximate answer, you can try running your tests in random order and see how many start failing. I had a similar question recently, so I tried two pytest plugins — pytest-randomly and pytest-random:
https://pypi.python.org/pypi/pytest-randomly/
https://pypi.python.org/pypi/pytest-random/

Of the two, pytest-randomly looks like the more mature one and even supports repeating a particular order by accepting a seed parameter.
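
As a rough sketch, assuming a reasonably recent version of pytest-randomly (the flag name is taken from its documentation and may vary between versions), a failing order can be repeated by passing back the seed the plugin prints at the start of a run:

import pytest

# 1234 is a placeholder; substitute the seed printed by the failing run
# ("Using --randomly-seed=...").
pytest.main(["--randomly-seed=1234"])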

These plugins do a good job of randomising the test order, but for a large test suite complete randomisation may not be very workable, because you then have too many failing tests and you don’t know where to start.

I wrote my own plugin that allows me to control the level at which the tests can change order randomly (module, package, or global). It is called pytest-random-order: https://pypi.python.org/pypi/pytest-random-order/
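
A rough usage sketch, based on the plugin’s documented options (treat the flag names as version-dependent): you choose the bucket within which tests may be shuffled, so failures stay easier to localise:

import pytest

# Shuffle tests only within their module; wider buckets would be
# "package" or "global".
pytest.main(["-v", "--random-order-bucket=module"])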

UPDATE. In your question you say that the failure cannot be reproduced when running tests individually. It could be that you aren’t completely recreating the environment for the individual test run. I think it’s OK for some tests to leave state dirty. It is the responsibility of each test case to set up the environment as it needs it, and not necessarily to clean up afterwards, due to the performance overhead this would cause for subsequent tests or simply because of the burden of doing it.

If test X fails as part of a larger test suite and then does not fail when run individually, then this test X is not doing a good enough job of setting up the environment for the test.

👤jbasko

2👍

As you are already using the nose framework, perhaps you can use nose-randomly (https://pypi.python.org/pypi/nose-randomly) to run the test cases in a random order.

Every time you run the nose tests with nose-randomly, the run is tagged with a random seed which you can use to repeat the same test order.

So run your test cases with this plugin multiple times and record the random seeds. Whenever you see failures with a particular order, you can always reproduce them by re-running with that random seed.

In theory, it is not possible to identify all the test dependencies and failures unless you run every combination of the 1500 tests, which is 2^1500 - 1 combinations and clearly not feasible.

So make it a habit to always run your tests with randomisation enabled. At some point you will hit failures; keep running the suite until you have caught as many of them as possible.

Unless the failures are catching real bugs in your product, it is always a good habit to fix them and reduce test dependencies as much as possible. This keeps test results consistent, and you can always run and verify a test case independently and be sure of the quality of your product around that scenario.

Hope that helps; this is exactly what we do at my workplace in the same situation you are trying to resolve.

1👍

I’ve resolved similar issues on a large Django project which was also using the nose runner and factory_boy. I can’t tell you how to automatically detect test coupling as the question asks, but with hindsight I can share some of the issues which were causing coupling in my case:

Check all imports of TestCase and make sure they use Django’s TestCase and not unittest’s TestCase. If some developers on the team are using PyCharm, which has a handy auto-import feature, it can be very easy to accidentally import the name from the wrong place. The unittest TestCase will happily run in a big Django project’s test suite, but you may not get the nice transaction and rollback behaviour that the Django test case provides.
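
For example, the difference is a single import line, which makes the mistake easy to miss in review:

from django.test import TestCase    # wraps each test in a transaction and rolls it back

# from unittest import TestCase     # runs fine, but gives you none of that isolation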

Make sure that any test class which overrides setUp, tearDown, setUpClass, or tearDownClass also delegates to super. I know this sounds obvious, but it’s very easy to forget!
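
A minimal sketch of what that looks like, with a hypothetical test class:

from django.test import TestCase


class PotatoTests(TestCase):

    @classmethod
    def setUpClass(cls):
        super(PotatoTests, cls).setUpClass()   # easy to forget
        # ... expensive class-level fixtures ...

    def setUp(self):
        super(PotatoTests, self).setUp()       # ditto
        # ... per-test fixtures ...

    def tearDown(self):
        # ... per-test cleanup ...
        super(PotatoTests, self).tearDown()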

It is also possible for mutable state to sneak in via factory_boy. Be careful with usages of factory sequences, which look something like this:

name = factory.Sequence(lambda n: 'alecxe-{0}'.format(n))

Even if the db is clean, the sequence may not start at 0 if other tests have run beforehand. This can bite you if you have made assertions with incorrect assumptions about what the values of Django models will be when created by factory boy.
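
If a test really needs deterministic values, factory_boy’s reset_sequence() can pin the counter; otherwise assert on the shape of the value rather than the exact number. A sketch with a hypothetical factory:

import factory


class UserFactory(factory.Factory):
    class Meta:
        model = dict                 # stand-in model, just for the sketch

    name = factory.Sequence(lambda n: 'alecxe-{0}'.format(n))


# Option 1: don't assume the counter value at all.
user = UserFactory()
assert user['name'].startswith('alecxe-')

# Option 2: pin the counter explicitly when a test needs exact values.
UserFactory.reset_sequence(0)
assert UserFactory()['name'] == 'alecxe-0'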

Similarly, you can’t make assumptions about primary keys. Suppose a Django model Potato is keyed off an auto-field, there are no Potato rows at the beginning of a test, and factory_boy creates a potato, i.e. you used PotatoFactory() in setUp. You are not guaranteed that the primary key will be 1, surprisingly. You should hold a reference to the instance returned by the factory and make assertions against that actual instance.
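
In code, that means holding on to whatever the factory returns instead of hard-coding a pk (Potato, PotatoFactory, and the import paths below are hypothetical, following the example above):

from django.test import TestCase

from myapp.models import Potato            # hypothetical app
from myapp.factories import PotatoFactory  # hypothetical module


class PotatoTests(TestCase):

    def setUp(self):
        super(PotatoTests, self).setUp()
        self.potato = PotatoFactory()       # keep a reference to the instance

    def test_lookup(self):
        # Fragile: Potato.objects.get(pk=1)
        potato = Potato.objects.get(pk=self.potato.pk)
        self.assertEqual(potato, self.potato)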

Be very careful also with RelatedFactory and SubFactory. Factory boy has a habit of picking any old instance to satisfy a relation, if one already exists hanging around in the db. This means what you get as a related object is sometimes not repeatable – if other objects are created in setUpClass or fixtures, the related object chosen (or created) by a factory may be unpredictable because the order of the tests is arbitrary.
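
One way to stay in control is to create the related object yourself and pass it to the factory explicitly, which overrides whatever the SubFactory would otherwise pick or create (FarmFactory and the farm field are hypothetical):

farm = FarmFactory()                 # create the parent deliberately
potato = PotatoFactory(farm=farm)    # the relation is now explicit and repeatable
assert potato.farm == farm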

Situations where Django models have @receiver decorators with post_save or pre_save hooks are very tricky to handle correctly with factory boy. For better control over related objects, including the cases where just grabbing any old instance may not be correct, you sometimes have to handle details yourself by overriding the _generate class method on a factory and/or implementing your own hooks using @factory.post_generation decorator.
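
For illustration, a sketch of those two escape hatches: muting post_save while the factory builds objects, and doing related setup yourself in a post_generation hook (the model, factory, and eyes relation are hypothetical):

import factory
import factory.django
from django.db.models.signals import post_save

from myapp.models import Potato               # hypothetical model


@factory.django.mute_signals(post_save)        # receivers don't fire during factory calls
class PotatoFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Potato

    @factory.post_generation
    def eyes(self, create, extracted, **kwargs):
        # 'self' is the freshly created Potato instance, not the factory;
        # 'extracted' is whatever was passed as PotatoFactory(eyes=[...]).
        if create and extracted:
            for eye in extracted:
                self.eyes.add(eye)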

👤wim

0👍

This happens when a test does not properly tear down its environment.

I.e.: in the setup phase of the test, one creates some objects in the test DB, perhaps writes to some files, opens network connections, and so on, but does not properly reset that state afterwards, thus passing information on to subsequent tests, which can then fail due to faulty assumptions about their input data.

Rather than focus on coupling between the tests (which in the above case is going to be somewhat moot, as it may depend on the order they are run in), perhaps it would be better to run a check against the teardown of each test.

This could be accomplished by wrapping your original test class and overriding the tearDown method to include some sort of generic check that the test environment has been properly reset after a given test.

Something like:

class NewTestClass(OriginalTestClass):
    ...

    def tearDown(self, *args, **kwargs):
        super(NewTestClass, self).tearDown(*args, **kwargs)
        assert self.check_test_env_reset() is True, "IM A POLLUTER"

And then in your test files replace the import statement of the original test class with the new one:

# old import statement for OriginalTestClass
from new_test_class import NewTestClass as OriginalTestClass

Subsequently running the tests should result in failures for the ones that are causing the pollution.
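
check_test_env_reset is not an existing API here; it stands for whatever "properly reset" means in your project. A rough sketch of what it could verify (the models and scratch directory are hypothetical):

import os


class NewTestClass(OriginalTestClass):

    def check_test_env_reset(self):
        # No rows left behind in the tables this suite touches.
        no_leftover_rows = not any(
            model.objects.exists() for model in (Potato, Farm)
        )
        # No scratch files left on disk.
        no_stray_files = not os.listdir(self.SCRATCH_DIR)
        return no_leftover_rows and no_stray_files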

On the other hand, if you would like to allow tests to be somewhat dirty, then you can instead view the problem as a faulty setup of the test environment in the failing tests.

From this latter perspective, the failing tests are, well, badly written tests and need to be fixed individually.

The two perspectives are to a certain extent yin and yang; you could adopt either view. I favor the latter when possible, as it is more robust.
