Django custom management command running Scrapy: How to include Scrapy's options?


Okay, I have found a solution to my problem. It's a bit ugly, but it works. Since the Django project's manage.py does not accept Scrapy's command-line options, I split each option into two arguments that manage.py does accept. Once manage.py has parsed them, I rejoin the two pieces and pass the result to Scrapy.

That is, instead of writing

python manage.py scrapy crawl domain.com -o scraped_data.json -t json

I put a space between the dash and the option name, like this:

python manage.py scrapy crawl domain.com - o scraped_data.json - t json

My handle function looks like this:

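def handle(self, *args, **options):
    # self._argv is assumed to have been saved by a run_from_argv() override
    arguments = self._argv[1:]
    for arg in arguments:
        if arg in ('-', '--'):
            # Glue the lone dash(es) back onto the option name that follows
            i = arguments.index(arg)
            new_arg = ''.join((arguments[i], arguments[i+1]))
            del arguments[i:i+2]
            arguments.insert(i, new_arg)

    # Pass the repaired argument list to Scrapy's command-line entry point
    from scrapy.cmdline import execute
    execute(arguments)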
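The rejoining step can also be written as a standalone function that builds a new list instead of modifying the one it is looping over. This is only a minimal sketch: the name rejoin_split_options is hypothetical and is not part of Django or Scrapy.

```python
def rejoin_split_options(arguments):
    """Glue a lone '-' or '--' onto the argument that follows it.

    Hypothetical helper for illustration; returns a new list and
    leaves the input untouched.
    """
    result = []
    i = 0
    while i < len(arguments):
        arg = arguments[i]
        # Only join when there actually is a following argument
        if arg in ('-', '--') and i + 1 < len(arguments):
            result.append(arg + arguments[i + 1])
            i += 2
        else:
            result.append(arg)
            i += 1
    return result


if __name__ == '__main__':
    argv = ['scrapy', 'crawl', 'domain.com',
            '-', 'o', 'scraped_data.json', '-', 't', 'json']
    print(rejoin_split_options(argv))
```

A trailing '-' with nothing after it is passed through unchanged rather than raising an IndexError.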

Meanwhile, Mikhail Korobov has provided the optimal solution. See here:

# -*- coding: utf-8 -*- 
# myapp/management/commands/scrapy.py 

from __future__ import absolute_import
from django.core.management.base import BaseCommand

class Command(BaseCommand):

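    def run_from_argv(self, argv):
        # Keep the raw argument list and skip Django's own option parsing
        self._argv = argv
        self.execute()

    def handle(self, *args, **options):
        from scrapy.cmdline import execute
        # Drop 'manage.py' so that 'scrapy' becomes the program name
        execute(self._argv[1:])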


I think you’re really looking for Guideline 10 of the POSIX argument syntax conventions:

The argument -- should be accepted as a delimiter indicating the end of options.
Any following arguments should be treated as operands, even if they begin with
the '-' character. The -- argument should not be used as an option or as an operand.

Python's optparse module behaves this way, even under Windows.
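To see the delimiter at work outside Django, here is a small sketch using optparse directly. The function name parse_with_delimiter and the -v/--verbosity option are assumptions made for illustration; they stand in for whatever options manage.py itself understands.

```python
from optparse import OptionParser


def parse_with_delimiter(argv):
    """Parse argv with one known option; arguments after '--' stay operands."""
    parser = OptionParser()
    # Hypothetical option standing in for the ones manage.py accepts
    parser.add_option('-v', '--verbosity', dest='verbosity', default='1')
    options, operands = parser.parse_args(argv)
    return options.verbosity, operands


if __name__ == '__main__':
    argv = ['crawl', 'domain.com', '-v', '2',
            '--', '-o', 'scraped_data.json', '-t', 'json']
    print(parse_with_delimiter(argv))
```

The '--' itself is consumed by the parser, so '-o' and '-t' come back as plain operands that can be forwarded to Scrapy. Without the '--', optparse would reject '-o' as an unknown option and exit.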

I pass the Scrapy project's settings module as the first argument, so I can keep separate Scrapy projects in independent apps:

# <app>/management/commands/scrapy.py
from __future__ import absolute_import
import os

from django.core.management.base import BaseCommand

class Command(BaseCommand):
    def handle(self, *args, **options):
        # The first positional argument names the Scrapy settings module
        os.environ['SCRAPY_SETTINGS_MODULE'] = args[0]
        from scrapy.cmdline import execute
        # scrapy ignores args[0], requires a mutable seq
        execute(list(args))

Invoked as follows:

python manage.py scrapy myapp.scrapyproj.settings crawl domain.com -- -o scraped_data.json -t json

Tested with Scrapy 0.12 and Django 1.3.1.
