[Fixed]-List all suburls and check if broken in python

8👍

Use show-urls command in django-extensions as a starting point. (documentation)

python manage.py show_urls
👤ustun

11👍

Here is an improved class based on excellent @sneawo answer. Features include:

  • automatic loading of all urlconfs based on settings.ROOT_URLCONF;
  • recursive loading of included urlconfs;
  • testing url patterns that expect keyword parameters (via default parameters);
  • testing for different http codes;
  • testing urls that work only for logged in users.

Improvements are welcome.

    from django import test
    from django.core.urlresolvers import reverse
    from django.conf import settings
    import importlib

    class UrlsTest(test.TestCase):

        def test_responses(self, allowed_http_codes=[200, 302, 405],
                credentials={}, logout_url="", default_kwargs={}, quiet=False):
            """
            Test all pattern in root urlconf and included ones.
            Do GET requests only.
            A pattern is skipped if any of the conditions applies:
                - pattern has no name in urlconf
                - pattern expects any positinal parameters
                - pattern expects keyword parameters that are not specified in @default_kwargs
            If response code is not in @allowed_http_codes, fail the test.
            if @credentials dict is specified (e.g. username and password),
                login before run tests.
            If @logout_url is specified, then check if we accidentally logged out
                the client while testing, and login again
            Specify @default_kwargs to be used for patterns that expect keyword parameters,
                e.g. if you specify default_kwargs={'username': 'testuser'}, then
                for pattern url(r'^accounts/(?P<username>[\.\w-]+)/$' 
                the url /accounts/testuser/ will be tested.
            If @quiet=False, print all the urls checked. If status code of the response is not 200,
                print the status code.
            """
            module = importlib.import_module(settings.ROOT_URLCONF)
            if credentials:
                self.client.login(**credentials)
            def check_urls(urlpatterns, prefix=''):
                for pattern in urlpatterns:
                    if hasattr(pattern, 'url_patterns'):
                        # this is an included urlconf
                        new_prefix = prefix
                        if pattern.namespace:
                            new_prefix = prefix + (":" if prefix else "") + pattern.namespace
                        check_urls(pattern.url_patterns, prefix=new_prefix)
                    params = {}
                    skip = False
                    regex = pattern.regex
                    if regex.groups > 0:
                        # the url expects parameters
                        # use default_kwargs supplied
                        if regex.groups > len(regex.groupindex.keys()) \
                            or set(regex.groupindex.keys()) - set(default_kwargs.keys()):
                            # there are positional parameters OR
                            # keyword parameters that are not supplied in default_kwargs
                            # so we skip the url
                            skip = True
                        else:
                            for key in set(default_kwargs.keys()) & set(regex.groupindex.keys()):
                                params[key] = default_kwargs[key]
                    if hasattr(pattern, "name") and pattern.name:
                        name = pattern.name
                    else:
                        # if pattern has no name, skip it
                        skip = True
                        name = ""
                    fullname = (prefix + ":" + name) if prefix else name
                    if not skip:
                        url = reverse(fullname, kwargs=params)
                        response = self.client.get(url)
                        self.assertIn(response.status_code, allowed_http_codes)
                        # print status code if it is not 200
                        status = "" if response.status_code == 200 else str(response.status_code) + " "
                        if not quiet:
                            print(status + url)
                        if url == logout_url and credentials:
                            # if we just tested logout, then login again
                            self.client.login(**credentials)
                    else:
                        if not quiet:
                            print("SKIP " + regex.pattern + " " + fullname)
            check_urls(module.urlpatterns)

4👍

For simple urls without parameters, you can use such test:

from django import test
from django.core.urlresolvers import reverse
from foo.urls import urlpatterns

class UrlsTest(test.TestCase):

    def test_responses(self):
        for url in urlpatterns:
            response = self.client.get(reverse(url.name))
            self.assertEqual(response.status_code, 200)
👤sneawo

2👍

In Django 2.2.x, I had to use this slightly modified version of @sneawo’s excellent answer:

from django import test
from django.urls import reverse, URLPattern

from myapp.urls import urlpatterns


class MyAppUrlsTest(test.SimpleTestCase):

    def test_responses(self):
        for url in urlpatterns:
            # For now, perform only GET requests and ignore URLs that need arguments.
            if not isinstance(url, URLPattern) or url.pattern.regex.groups or not url.name:
                continue
            urlpath = reverse(url.name)
            response = self.client.get(urlpath, follow=True)
            self.assertEqual(response.status_code, 200)

Note that I’m also accounting for views that require arguments by just ignoring them. For my specific, simplistic use case, this also lets me exclude views by not giving them a name in my urlpatterns.

Also see https://github.com/encode/django-rest-framework/pull/5500#issue-146618375.

1👍

Another approach would be to add a logger like Sentry (with Raven) and add the contributed 404 middleware (or simply write your own custom 404 handler)

1👍

If your pages are already uploaded to a Web server, a zero-coding solution is to use the free W3C Link Checker. It will try every link it finds in a page and provide a nice summary.

0👍

I’ve taken a slightly different approach than the one using reverse, instead actually loading the sites and looking for all ‘hrefs’, then following all of those etc. The code below prints all the calls as a hierarchy. Currently it asserts the response code 200 (after following links), if you’re testing 25000 subsites it probably makes sense to just log the response codes and then search the output.

from django.conf import settings
from django.test.testcases import TestCase
import re
from urlparse import urlsplit, urljoin

class GenericTestCase( TestCase ):
    fixtures = []

    def test_links( self ):
        self.p1 = re.compile( r'href="([^"]*)"' )
        self.p2 = re.compile( r"href='([^']*)'" )
        self.visited_urls = set()
        self.visit( '/', 0 )

    def visit( self, url, depth ):
        print( '-' * depth + url ),
        self.visited_urls.add( url )
        response = self.client.get( url, follow=True )
        if response.redirect_chain:
            url = urlsplit( response.redirect_chain[-1][0] ).path
            print( ' => ' + url )
            if url in self.visited_urls:
                return
            self.visited_urls.add( url )
        else:
            print( '' )

        self.assertEquals( response.status_code, 200 )

        refs = self.get_refs( response.content )
        for relative_url in refs:
            absolute_url = urljoin( url, relative_url )
            if not self.skip_url( absolute_url, relative_url ):
                self.visit( absolute_url, depth + 1 )

    def skip_url( self, absolute_url, relative_url ):
        return absolute_url in self.visited_urls \
            or  ':' in absolute_url \
            or absolute_url.startswith( settings.STATIC_URL ) \
            or relative_url.startswith( '#' )

    def get_refs( self, text ):
        urls = set()
        urls.update( self.p1.findall( text ) )
        urls.update( self.p2.findall( text ) )
        return urls
👤wuerg

Leave a comment