Django Redshift Backend¶
This is a Amazon Redshift database backend for Django.
License¶
Apache Software License
Contents¶
Basic¶
Installation¶
Please install django-redshift-backend with using pip (8.1.1 or later).
$ pip install django-redshift-backend
This backend requires psycopg2
, which may be installed from source or wheel (pre-built binaries).
If you don’t want to specify it separately, you may install it using extra:
# For pre-built binary
$ pip install django-redshift-backend[psycopg2-binary]
# For the source distribution
$ pip install django-redshift-backend[psycopg2]
Please refer to the psycopg2 documentation for more details on the topic.
Django settings¶
ENGINE for DATABASES is ‘django_redshift_backend’. You can set the name in your settings.py as:
DATABASES = {
'default': {
'ENGINE': 'django_redshift_backend',
'NAME': '<your database name>',
'USER': '<your database username>',
'PASSWORD': '<your database password>',
'HOST': '<your database hostname>',
'PORT': '5439',
}
}
For more information, please refer References.
References¶
Differences from postgres_psycopg2 backend¶
Type mapping:
- ‘integer identity(1, 1)’ for AutoField
- ‘bigint identity(1, 1)’ for BigAutoField
- ‘timestamp with time zone’ for DateTimeField
- ‘varchar(max)’ for TextField
- ‘varchar(32)’ for UUIDField
- Possibility to multiply VARCHAR length to support utf-8 string, using REDSHIFT_VARCHAR_LENGTH_MULTIPLIER setting.
Stop using:
- RETURNING (single insert and bulk insert)
- SELECT FOR UPDATE
- SELECT DISTINCT ON
- SET CONSTRAINTS
- INDEX
- DEFERRABLE INITIALLY DEFERRED
- CONSTRAINT
- CHECK
- DROP DEFAULT
To support migration:
- To add column to existent table on Redshift, column must be nullable
- To support modify column, add new column -> data migration -> drop old column -> rename
Please note that the migration support for redshift is not perfect yet.
Note and Limitation¶
Amazon Redshift doesn’t support RETURNING, so last_insert_id
method retrieve MAX(pk) after insertion as a workaround.
refs:
In some case, MAX(pk) workaround does not work correctly. Bulk insertion makes non-contiguous IDs like: 1, 4, 7, 10, … and single insertion after such bulk insertion generates strange id value like 2 (smallest non-used id).
Django Settings¶
settings.DATABASES¶
ENGINE: | Set ‘django_redshift_backend’. |
---|---|
NAME: | Set ‘<your database name>’. |
USER: | Set ‘<your database username>’. |
PASSWORD: | Set ‘<your database password>’. |
HOST: | Set ‘<your database hostname>’. |
PORT: | Set your Redshift server port number. Maybe ‘5439’. |
settings.REDSHIFT_VARCHAR_LENGTH_MULTIPLIER¶
Possibility to multiply VARCHAR length to support utf-8 string. Default is 1.
Django Models¶
Using sortkey¶
There is built-in support for this option for Django >= 1.9. To use sortkey, define an ordering on the model meta with the custom sortkey type django_redshift_backend.SortKey as follow:
class MyModel(models.Model):
...
class Meta:
ordering = [SortKey('col2')]
SortKey in ordering are also valid as ordering in Django.
N.B.: there is no validation of this option, instead we let Redshift validate it for you. Be sure to refer to the documentation.
Using distkey¶
There is built-in support for this option for Django >= 1.11. To use distkey, define an index on the model meta with the custom index type django_redshift_backend.DistKey with fields naming a single field:
class MyModel(models.Model):
...
class Meta:
indexes = [DistKey(fields=['customer_id'])]
Redshift doesn’t have conventional indexes, and we don’t generate SQL for them. We merely use indexes as a convenient place in the Meta to identify the distkey.
You will likely encounter the following complication:
Inlining Index Migrations¶
Django’s makemigrations generates a migration file that first applies a CreateModel operation without the indexes option, and then adds the index in a separate AddIndex operation.
However Redshift requires that the distkey be specified at table creation. As a result, you may need to manually edit your migration files to move the index creation into the initial CreateModel.
That is, to go from:
operations = [
...
migrations.CreateModel(
name='FactTable',
fields=[
('distkeycol', models.CharField()),
('measure1', models.IntegerField()),
('measure2', models.IntegerField())
...
]
),
...
migrations.AddIndex(
model_name='facttable',
index=django_redshift_backend.DistKey(fields=['distkeycol'], name='...'),
),
]
To:
operations = [
...
migrations.CreateModel(
name='FactTable',
fields=[
('distkeycol', models.CharField()),
('measure1', models.IntegerField()),
('measure2', models.IntegerField())
...
],
options={
'indexes': [django_redshift_backend.DistKey(fields=['distkeycol'], name='...')],
},
),
...
]
Inlining ForeignKey Migrations¶
It is common to distribute fact tables on a foreign key column referencing the primary key of a dimension table.
In this case you may also encounter the following added complication:
Django’s makemigrations generates a migration file that first applies a CreateModel operation without the ForeignKey column, and then adds the ForeignKey column in a separate AddField operation. It does this to avoid attempts to create foreign key constraints against tables that haven’t been created yet.
However Redshift requires that the distkey be specified at table creation. As a result, you may need to manually edit your migration files to move the ForeignKey column into the initial CreateModel, while also ensuring that the referenced table appears before the referencing table in the file.
That is, to go from:
operations = [
...
migrations.CreateModel(
name='FactTable',
fields=[
('measure1', models.IntegerField()),
('measure2', models.IntegerField())
...
]
),
...
migrations.CreateModel(
name='Dimension1Table',
fields=[
...
]
),
...
migrations.AddField(
model_name='facttable',
name='dim1',
field=models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='myapp.Dimension1Table'),
),
...
]
To:
operations = [
migrations.CreateModel(
name='Dimension1Table',
fields=[
...
]
),
...
migrations.CreateModel(
name='FactTable',
fields=[
('measure1', models.IntegerField()),
('measure2', models.IntegerField()),
('dim1', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='myapp.Dimension1Table'))
...
]
),
...
]
Development¶
Contribution Guideline¶
This is a Jazzband project. By contributing you agree to abide by the Contributor Code of Conduct and follow the guidelines.
Setup development environment¶
Requires supported Python version
do setup under django-redshift-backend.git repository root as:
$ pip install -U pip setuptools $ pip install -r dev-requires.txt
Testing¶
Run test¶
Just run tox:
$ tox
tox have several sections for testing.
To test the database migration as well, start postgres and test it as follows:
$ cd tests
$ docker-compose up -d
$ TEST_WITH_POSTGRES=1 tox
CI (Continuous Integration)¶
All tests will be run on Github Actions:
Releasing¶
New package version¶
The django-redshift-backend package will be uploaded to PyPI: https://pypi.org/project/django-redshift-backend/.
Here is a release procefure for releasing.
Procedure:
- check CI status testing result: https://github.com/jazzband/django-redshift-backend/actions?query=workflow%3ATest
- update release version/date in
CHANGES.rst
- create Github release, tagging with version name that MUST following semver. e.g.:
git tag 1.0.1
- publish Github release to invoke release process in Github Actions.
- approve release files. please check your email or https://jazzband.co/projects/django-redshift-backend
- check PyPI page: https://pypi.org/p/django-redshift-backend
- bump version in
CHANGES.rst
and commit/push them onto GitHub
Updated documentation¶
Sphinx documentation under doc/
directory on the master branch will be automatically uploaded into ReadTheDocs: https://django-redshift-backend.rtfd.io/.
CHANGES¶
4.0.0 (Unreleased)¶
General:
Incompatible Changes:
Features:
- #83 Drop Python-3.6 support.
- #83 Drop Django-2.2 support.
Bug Fixes:
3.0.0 (2022/02/27)¶
General:
- #87 Drop py2 wheel tag from release package file.
- Add CODE_OF_CONDUCT.rst The linked text which has been referred to from CONTRIBUTING.rst is now included.
Incompatible Changes:
#97 To specify SORTKEY for Redshift, you must use django_redshift_backend.SortKey for Model.Meta.ordering instead of bearer string.
IMPORTANT: With this change, existing migration files that specify ordering are not affected. If you want to apply SortKey to your migration files, please comment out the ordering option once and run makemigrations, then comment in the ordering option and run makemigrations again.
#97 django_redshift_backend.distkey.DistKey is moved to django_redshift_backend.DistKey. However old name is still supported for a compatibility.
#97 Now django-redshift-backend doesn’t support can_rollback_ddl. Originally, Redshift did not support column name/type(size) changes within a transaction. Please refer https://github.com/jazzband/django-redshift-backend/issues/96
#97 changed the behavior of implicit not null column addition. previously, adding a not null column was implicitly changed to allow null. now adding not null without default raises a programmingerror exception.
Features:
- #82 Add Python-3.10 support.
- #98 Add Django-4.0 support.
- #82 Drop Django-3.0 support.
- #98 Drop Django-3.1 support.
- #90,#13,#8: Support manage.py inspectdb, also support working with the django-sql-explorer package. Thanks to Matt Fisher.
- #63 Support changing a field from NOT NULL to NULL on migrate / sqlmigrate.
- #97 Support VARCHAR size changing for UNIQUE, PRIMARY KEY, FOREIGN KEY.
- #97 Support backward migration for DROP NOT NULL column wituout DEFAULT. One limitation is that the DEFAULT value is set to match the type. This is because the only way for Redshift to add NOT NULL without default is to recreate the table.
Bug Fixes:
- #92,#93: since django-3.0 sqlmigrate (and migrate) does not work.
- #37: fix Django contenttype migration that cause ProgrammingError: cannot drop sortkey column “name” exception.
- #64: fix Django auth migration that cause NotSupportedError: column “content_type__app_label” specified as distkey/sortkey is not in the table “auth_permission” exception.
2.1.0 (2021/09/23)¶
General:
- #76 fix test failing on django-dev with py36,py37
- #77 Mondernize setup.cfg and pyproject.toml
Features:
- #81 Add Django 3.2 support.
Bug Fixes:
- #80 uuid field doesn’t work correctly with django 2.x and 3.x. Thanks to xavier-lr.
2.0.1 (2021/03/07)¶
Bug Fixes:
- #74: set supports_aggregate_filter_clause=False (since Django-2.0) to disable FILTER WHERE syntax. Thanks to Michael Wheeler.
- #73: fix broken feature flags since Django-3.0: can_return_columns_from_insert and can_return_rows_from_bulk_insert. Thanks to Agustín Magaña.
2.0.0 (2021/01/04)¶
General:
- #70,#71,#72 Moved CI to GitHub Actions: https://github.com/jazzband/django-redshift-backend/actions Thkanks to Bruno Alla.
Features:
- Drop Python 2.7 and 3.5 support.
- Drop Django 1.11, 2.0 and 2.1 support.
- #68 Add Python 3.8 and 3.9 support.
- #68 Add Django 3.0 and 3.1 support.
Bug Fixes:
- #69 Let users choose between psycopg2 binary or source. Thkanks to Bruno Alla.
- #65,#66 Deprecation warning due to invalid escape sequences. Thanks to Karthikeyan Singaravelan.
Documentation:
- #67 Just a typo cleanup from refs.rst. Thanks to Kostja P.
1.1.0 (2019/08/02)¶
- #60 Change dependencies to support Python 3.7 Thanks to Asher Foa.
1.0.0 (2019/01/29)¶
General:
- The first release from Jazzband organization.
- Using Development Status :: 5 - Production/Stable.
- All GitHub/Travis/other URLs in this product has been migrated to /jazzband/.
New Features:
- #56 Support Django 2.1.
- #57 Support Python 3.7
Bug Fixes:
- #53,#54: UUIDField django model field will cause clash. Thanks to Corentin Dupret.
Development:
- Adopt setuptools_scm for versioning from git tag.
0.9.1 (2018-09-29)¶
- fix trove classifier ‘License’ from BSD to Apache.
- Documentation: Add Contribution Guideline
0.9 (2018-07-24)¶
- #35: Drop support for Django 1.8, 1.9 and 1.10.
- #40: Support Django 2.0.
- #42: Support DISTKEY. Thanks to Benjy Weinberger.
- Documentation: https://django-redshift-backend.rtfd.io/
- Change LICENSE from ‘BSD License’ to ‘Apache Software License’
0.8.1 (2018-06-19)¶
- #38: Fix 0.8 doesn’t compatible with Python 2. Thanks to Benjy Weinberger.
0.8 (2018-06-01)¶
Incompatible Changes:
#23,#10: Redshift support time zones in time stamps for migration
IMPORTANT: With this change, the newly created DateTimeField column will be timestamp with timezone (TIMESTAMPTZ) by migration. Therefore, the existing DateTimeField and the new DateTimeField will have different data types as a redshift schema column type. There are no migration feature by django-redshift-backend. see also: https://github.com/jazzband/django-redshift-backend/pull/23
New Features:
- #20,#26: Support for sortkey. Thanks to Maxime Vdb and Kosei Kitahara.
- #24: Add UUIDField support. Thanks to Sindri Guðmundsson.
- #14: More compat with redshift: not use SELECT DISTINCT ON.
Bug Fixes:
- #15,#21: More compat with redshift: not use CHECK. Thanks to Vasil Vangelovski.
- #18: Fix error on migration with django-1.9 or later that raises AttributeError of ‘sql_create_table_unique’.
- #27: annotate() does not work on Django-1.9 and later. Thanks to Takayuki Hirai.
Documentation:
- Add documentation: https://django-redshift-backend.rtfd.io/
0.7 (2017-06-08)¶
- Drop Python-3.4
- Drop Django-1.7
- Support Python-3.6
- Support Django-1.11
0.6 (2016-12-15)¶
- Fix crush problem when using bulk insert.
0.5 (2016-10-05)¶
- Support Django-1.10
- #9: Add support for BigAutoField. Thanks to Maxime Vdb.
- Fix crush problem on sqlmigrate when field modified.
0.4 (2016-05-17)¶
- Support Python-3.4 and 3.5
- #7: Restore support django-1.7. Version 0.3 doesn’t support django-1.7.
- #4: More compat with redshift: not use SET CONSTRAINTS. Thanks to Maxime Vdb.
- #6: More compat with redshift: not use sequence reset query. Thanks to Maxime Vdb.
- #5: Add REDSHIFT_VARCHAR_LENGTH_MULTIPLIER settings. Thanks to Maxime Vdb.
- Support column type changing on migration.
0.3 (2016-05-14)¶
- #3: more compat with Redshift (AutoField, DateTimeField, Index). Thanks to Maxime Vdb.
- More compat with redshift: add TextField
- More compat with redshift: not use DEFERRABLE, CONSTRAINT, DROP DEFAULT
- More compat with redshift: support modify column
0.2.1 (2016-02-01)¶
- “SET TIME_ZONE” warning is changed as debug log for ‘django.db.backend’ logger.
0.2 (2016-01-08)¶
- Disable “SET TIME_ZONE” SQL execution even if settings.TIME_ZONE is specified.
0.1.2 (2015-06-5)¶
- Support Django-1.8
0.1.1 (2015-03-27)¶
- Disable “SELECT FOR UPDATE” SQL execution.
0.1 (2015-03-24)¶
- Support Django-1.7
- Support “INSERT INTO” SQL execution without “RETURNING” clause.
AUTHORS¶
- Takayuki Shimizukawa <https://github.com/shimizukawa>
- Kosei Kitahara <https://github.com/Surgo>
- Evandro Myller <https://github.com/emyller>
- Maxime Vdb <https://github.com/m-vdb>
Code of Conduct¶
As contributors and maintainers of the Jazzband projects, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
We are committed to making participation in the Jazzband a harassment-free experience for everyone, regardless of the level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, or nationality.
Examples of unacceptable behavior by participants include:
- The use of sexualized language or imagery
- Personal attacks
- Trolling or insulting/derogatory comments
- Public or private harassment
- Publishing other’s private information, such as physical or electronic addresses, without explicit permission
- Other unethical or unprofessional conduct
The Jazzband roadies have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful.
By adopting this Code of Conduct, the roadies commit themselves to fairly and consistently applying these principles to every aspect of managing the jazzband projects. Roadies who do not follow or enforce the Code of Conduct may be permanently removed from the Jazzband roadies.
This code of conduct applies both within project spaces and in public spaces when an individual is representing the project or its community.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the roadies at roadies@jazzband.co. All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. Roadies are obligated to maintain confidentiality with regard to the reporter of an incident.
This Code of Conduct is adapted from the Contributor Covenant, version 1.3.0, available at https://contributor-covenant.org/version/1/3/0/