6👍
Redshift (ParAccel) is an OLAP-optimised DB, based on a fork of a very old version of PostgreSQL.
It’s good at parallelised read-mostly queries across lots of data. It’s bad at many small transactions, especially many small write transactions as seen in typical OLTP workloads.
You’re partway in between. If you don’t mind a data loss window, then you could reasonably accumulate data points and have a writer thread or two write batches of them to Redshift in decent sized transactions.
If you can’t afford any data loss window and expect to be processing 50+ TPS, then don’t consider using Redshift directly. The round-trip costs alone would be horrifying. Use a local database – or even a file based append-only journal that you periodically rotate. Then periodically upload new data to Redshift for analysis.
A few other good reasons you probably shouldn’t use Redshift directly:
-
OLAP DBs with column store designs often work best with star schemas or similar structures. Such schemas are slow and inefficient for OLTP workloads as inserts and updates touch many tables, but they make querying the data along various axes for analysis much more efficient.
-
Using an ORM to talk to an OLAP DB is asking for trouble. ORMs are quite bad enough on OLTP-optimised DBs, with their unfortunate tendency toward n+1
SELECT
s and/or wasteful chained left joins, tendency to do many small inserts instead of a few big ones, etc. This will be even worse on most OLAP-optimised DBs. -
Redshift is based on a painfully old PostgreSQL with a bunch of limitations and incompatibilities. Code written for normal PostgreSQL may not work with it.
Personally I’d avoid an ORM entirely for this – I’d just accumulate data locally in an SQLite or a local PostgreSQL or something, sending multi-valued INSERT
s or using PostgreSQL’s COPY
to load chunks of data as I received it from an in-memory buffer. Then I’d use appropriate ETL tools to periodically transform the data from the local DB and merge it with what was already on the analytics server.
Now forget everything I just said and go do some benchmarks with a simulation of your app’s workload. That’s the only really useful way to tell.
1👍
In addition to Redshift’s slow transaction processing (by modern DB standards) there’s another big challenge:
Redshift only supports serializable
transaction isolation, most likely as a compromise to support ACID transactions while also optimizing for parallel OLAP mostly-read workload.
That can result in all kinds of concurrency-related failures that would not have been failures on typical DB that support read-committed isolation by default.
- Mercurial: keep 2 branches in sync but with certain persistent differences?
- Django abstract parent model save overriding
- Django on Azure not loading static files