[Answer]-Django – UnicodeDecodeError: weird character "�"


You can try raw_html extraction instead of passing a URL: https://github.com/grangier/python-goose#known-issues

That way you fetch the page yourself and can do the encoding/decoding on the raw HTML before handing it to Goose.
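
For what it's worth, here is a rough sketch of the decoding step. The helper name decode_html and the list of candidate encodings are my own assumptions, not part of python-goose; adjust them to whatever the site actually serves.

```python
def decode_html(raw_bytes, candidates=("utf-8", "cp1252")):
    """Decode fetched HTML bytes to text, trying each candidate encoding in order.

    decode_html is a hypothetical helper; the candidate list is a guess.
    """
    for encoding in candidates:
        try:
            return raw_bytes.decode(encoding)
        except UnicodeDecodeError:
            # This encoding does not fit the bytes; try the next one.
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes with U+FFFD
    # instead of raising UnicodeDecodeError.
    return raw_bytes.decode("utf-8", errors="replace")


# With python-goose installed, the decoded text could then be passed in
# via the raw_html argument described in the README linked above:
#   article = g.extract(raw_html=decode_html(response.read()))
```

The order of the candidates matters: valid UTF-8 is tried first, so the looser single-byte encoding is only used when UTF-8 decoding fails.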


It may help to use unicode for all strings. Insert this at the top of your Python file, before any other import, and try again:

from __future__ import unicode_literals



Try adding a u prefix to the string. I don't see any weird characters in the URL, but I often use Hebrew in my Django code, and the encoding declaration at the top of the file is not always enough:

article = g.extract(url=u"http://www.sportingnews.com/ncaa-football/story/2013-09-17/week-4-exit-poll-johnny-manziel-alabama-oregon-texas-mack-brown-mariota")


Even though I can't reproduce the error with this URL, I had similar problems with python-goose. Try:

from goose.configuration import Configuration
from goose import Goose

config = Configuration()
config.parser_class = 'soupparser' # this helped me
g = Goose(config)
article = g.extract(url="http://www.sportingnews.com/ncaa-football/story/2013-09-17/week-4-exit-poll-johnny-manziel-alabama-oregon-texas-mack-brown-mariota")
