配置关系如何连接

relationship() will normally create a join between two tables by examining the foreign key relationship between the two tables to determine which columns should be compared. 有多种情况需要定制此行为。

处理多个连接路径

处理的最常见情况之一是两个表之间有多个外键路径。

考虑一个包含Address类的两个外键的Customer类:

from sqlalchemy import Integer, ForeignKey, String, Column
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class Customer(Base):
    __tablename__ = 'customer'
    id = Column(Integer, primary_key=True)
    name = Column(String)

    billing_address_id = Column(Integer, ForeignKey("address.id"))
    shipping_address_id = Column(Integer, ForeignKey("address.id"))

    billing_address = relationship("Address")
    shipping_address = relationship("Address")

class Address(Base):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    street = Column(String)
    city = Column(String)
    state = Column(String)
    zip = Column(String)

上面的映射,当我们尝试使用它时,会产生错误:

sqlalchemy.exc.AmbiguousForeignKeysError: Could not determine join
condition between parent/child tables on relationship
Customer.billing_address - there are multiple foreign key
paths linking the tables.  Specify the 'foreign_keys' argument,
providing a list of those columns which should be
counted as containing a foreign key reference to the parent table.

上面的消息很长。relationship()可以返回很多潜在的消息,这些消息经过精心定制以检测各种常见配置问题;大多数人会建议解决歧义或其他缺失信息所需的额外配置。

在这种情况下,消息希望我们通过指示每个关键字列应该被考虑,来限定每个relationship(),并且适当的格式如下:

class Customer(Base):
    __tablename__ = 'customer'
    id = Column(Integer, primary_key=True)
    name = Column(String)

    billing_address_id = Column(Integer, ForeignKey("address.id"))
    shipping_address_id = Column(Integer, ForeignKey("address.id"))

    billing_address = relationship("Address", foreign_keys=[billing_address_id])
    shipping_address = relationship("Address", foreign_keys=[shipping_address_id])

在上面,我们指定了foreign_keys参数,它是ColumnColumn对象的列表,指示那些列被视为“外部”换句话说,包含引用父表的值的列。Loading the Customer.billing_address relationship from a Customer object will use the value present in billing_address_id in order to identify the row in Address to be loaded; similarly, shipping_address_id is used for the shipping_address relationship. 两列的联系在持续期间也起着作用;刚刚插入的Address对象的新生成的主键将在刷新期间被复制到关联Customer对象的相应外键列中。

使用Declarative指定foreign_keys时,我们也可以使用字符串名称来指定,但是如果使用列表,则列表是字符串的一部分,这一点很重要:

billing_address = relationship("Address", foreign_keys="[Customer.billing_address_id]")

在这个特定的例子中,因为只有一个Column我们需要:

billing_address = relationship("Address", foreign_keys="Customer.billing_address_id")

在版本0.8中更改: relationship()可以单独根据foreign_keys参数解决外键目标之间的歧义;在这种情况下,不再需要primaryjoin参数。

指定交替连接条件

构造连接时,relationship()的默认行为是,它将一侧的主键列的值与另一侧的外键引用列的值相等。我们可以将此标准更改为我们希望使用primaryjoin参数的任何内容,以及在使用“辅助”表格的情况下使用secondaryjoin参数。

在下面的示例中,使用User类以及存储街道地址的Address类,我们创建一个关系boston_addresses加载指定城市“波士顿”的Address对象:

from sqlalchemy import Integer, ForeignKey, String, Column
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    boston_addresses = relationship("Address",
                    primaryjoin="and_(User.id==Address.user_id, "
                        "Address.city=='Boston')")

class Address(Base):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('user.id'))

    street = Column(String)
    city = Column(String)
    state = Column(String)
    zip = Column(String)

在这个字符串的SQL表达式中,我们使用and_()连接结构为连接条件建立两个不同的谓词 - 连接User.idAddress.user_id列,并将Address中的行限制为city='Boston'使用声明时,像and_()这样的基本SQL函数可以在字符串relationship()参数的计算命名空间中自动使用。

我们在primaryjoin中使用的自定义标准通常仅在SQLAlchemy呈现SQL以加载或表示此关系时才有意义。也就是说,它被用于为了执行每个属性的延迟加载而发出的SQL语句中,或者在查询时构建连接(例如通过Query.join())或通过急切的“加入”或“子查询”加载样式。当内存中的对象被操作时,我们可以将任何Address对象放入boston_addresses集合中,而不管.city属性是。对象将保留在集合中,直到属性过期并从应用该条件的数据库重新加载。当发生刷新时,无条件刷新boston_addresses内部的对象,将主键user.id列的值分配到外键持有的address.user_id列。city条件在这里没有效果,因为flush过程只关心将主键值同步到引用外键值中。

创建自定义外部条件

主要连接条件的另一个要素是如何确定那些被认为是“外来”的列。通常,Column对象的一些子集将指定ForeignKey,或者以其他方式成为与联接条件相关的ForeignKeyConstraint的一部分。relationship()在它决定应该如何加载和保存这种关系的数据时,会查看这个外键状态。但是,primaryjoin参数可用于创建不涉及任何“模式”级外键的连接条件。我们可以明确地将primaryjoinforeign_keysremote_side结合起来,以建立这样的连接。

下面,一个类HostEntry连接到它自己,将字符串content列等同于ip_address列,这是一个名为INET我们需要使用cast()来将连接的一边转换为另一边的类型:

from sqlalchemy import cast, String, Column, Integer
from sqlalchemy.orm import relationship
from sqlalchemy.dialects.postgresql import INET

from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class HostEntry(Base):
    __tablename__ = 'host_entry'

    id = Column(Integer, primary_key=True)
    ip_address = Column(INET)
    content = Column(String(50))

    # relationship() using explicit foreign_keys, remote_side
    parent_host = relationship("HostEntry",
                        primaryjoin=ip_address == cast(content, INET),
                        foreign_keys=content,
                        remote_side=ip_address
                    )

上面的关系会产生一个连接,如:

SELECT host_entry.id, host_entry.ip_address, host_entry.content
FROM host_entry JOIN host_entry AS host_entry_1
ON host_entry_1.ip_address = CAST(host_entry.content AS INET)

An alternative syntax to the above is to use the foreign() and remote() annotations, inline within the primaryjoin expression. 此语法表示relationship()通常自身应用于给定foreign_keysremote_side参数的连接条件的注释。当显式连接条件存在时,这些函数可能更简洁,并且还用于无论该列是多次声明还是在复杂的SQL表达式中,都精确地标记“外来”或“远程”列:

from sqlalchemy.orm import foreign, remote

class HostEntry(Base):
    __tablename__ = 'host_entry'

    id = Column(Integer, primary_key=True)
    ip_address = Column(INET)
    content = Column(String(50))

    # relationship() using explicit foreign() and remote() annotations
    # in lieu of separate arguments
    parent_host = relationship("HostEntry",
                        primaryjoin=remote(ip_address) == \
                                cast(foreign(content), INET),
                    )

在连接条件中使用自定义运算符

Another use case for relationships is the use of custom operators, such as Postgresql’s “is contained within” << operator when joining with types such as postgresql.INET and postgresql.CIDR. 对于自定义运算符,我们使用Operators.op()函数:

inet_column.op("<<")(cidr_column)

然而,如果我们使用这个运算符来构造primaryjoin,那么relationship()仍然需要更多的信息。This is because when it examines our primaryjoin condition, it specifically looks for operators used for comparisons, and this is typically a fixed list containing known comparison operators such as ==, <, etc. 因此,对于我们的自定义操作员参与此系统,我们需要使用is_comparison参数将其注册为比较运算符:

inet_column.op("<<", is_comparison=True)(cidr_column)

一个完整的例子:

class IPA(Base):
    __tablename__ = 'ip_address'

    id = Column(Integer, primary_key=True)
    v4address = Column(INET)

    network = relationship("Network",
                        primaryjoin="IPA.v4address.op('<<', is_comparison=True)"
                            "(foreign(Network.v4representation))",
                        viewonly=True
                    )
class Network(Base):
    __tablename__ = 'network'

    id = Column(Integer, primary_key=True)
    v4representation = Column(CIDR)

以上,一个查询如:

session.query(IPA).join(IPA.network)

将呈现为:

SELECT ip_address.id AS ip_address_id, ip_address.v4address AS ip_address_v4address
FROM ip_address JOIN network ON ip_address.v4address << network.v4representation

版本0.9.2中的新功能: - 添加了Operators.op.is_comparison标志来帮助使用自定义运算符创建relationship()结构。

重叠外键

在使用组合外键时会出现一种罕见的情况,例如,一列可能是通过外键约束引用的多列的主题。

Consider an (admittedly complex) mapping such as the Magazine object, referred to both by the Writer object and the Article object using a composite primary key scheme that includes magazine_id for both; then to make Article refer to Writer as well, Article.magazine_id is involved in two separate relationships; Article.magazine and Article.writer:

class Magazine(Base):
    __tablename__ = 'magazine'

    id = Column(Integer, primary_key=True)


class Article(Base):
    __tablename__ = 'article'

    article_id = Column(Integer)
    magazine_id = Column(ForeignKey('magazine.id'))
    writer_id = Column()

    magazine = relationship("Magazine")
    writer = relationship("Writer")

    __table_args__ = (
        PrimaryKeyConstraint('article_id', 'magazine_id'),
        ForeignKeyConstraint(
            ['writer_id', 'magazine_id'],
            ['writer.id', 'writer.magazine_id']
        ),
    )


class Writer(Base):
    __tablename__ = 'writer'

    id = Column(Integer, primary_key=True)
    magazine_id = Column(ForeignKey('magazine.id'), primary_key=True)
    magazine = relationship("Magazine")

当上面的映射配置完成后,我们会看到这个警告发出:

SAWarning: relationship 'Article.writer' will copy column
writer.magazine_id to column article.magazine_id,
which conflicts with relationship(s): 'Article.magazine'
(copies magazine.id to article.magazine_id). Consider applying
viewonly=True to read-only relationships, or provide a primaryjoin
condition marking writable columns with the foreign() annotation.

What this refers to originates from the fact that Article.magazine_id is the subject of two different foreign key constraints; it refers to Magazine.id directly as a source column, but also refers to Writer.magazine_id as a source column in the context of the composite key to Writer. If we associate an Article with a particular Magazine, but then associate the Article with a Writer that’s associated with a different Magazine, the ORM will overwrite Article.magazine_id non-deterministically, silently changing which magazine we refer towards; it may also attempt to place NULL into this columnn if we de-associate a Writer from an Article. 警告让我们知道这是事实。

为了解决这个问题,我们需要打破Article的行为,以包含以下所有三个功能:

  1. Article first and foremost writes to Article.magazine_id based on data persisted in the Article.magazine relationship only, that is a value copied from Magazine.id.
  2. Article can write to Article.writer_id on behalf of data persisted in the Article.writer relationship, but only the Writer.id column; the Writer.magazine_id column should not be written into Article.magazine_id as it ultimately is sourced from Magazine.id.
  3. Article takes Article.magazine_id into account when loading Article.writer, even though it doesn’t write to it on behalf of this relationship.

To get just #1 and #2, we could specify only Article.writer_id as the “foreign keys” for Article.writer:

class Article(Base):
    # ...

    writer = relationship("Writer", foreign_keys='Article.writer_id')

但是,当查询Writer时,这会影响Article.writer不考虑Article.magazine_id

SELECT article.article_id AS article_article_id,
    article.magazine_id AS article_magazine_id,
    article.writer_id AS article_writer_id
FROM article
JOIN writer ON writer.id = article.writer_id

因此,为了充分利用#1,#2和#3,我们通过将primaryjoin完整地与foreign_keys参数,或者通过使用foreign()进行注释更简洁:

class Article(Base):
    # ...

    writer = relationship(
        "Writer",
        primaryjoin="and_(Writer.id == foreign(Article.writer_id), "
                    "Writer.magazine_id == Article.magazine_id)")

版本1.0.0中已更改: ORM将尝试警告何时将列同时用作来自多个关系的同步目标。

非关系比较/物化路径

警告

本节详细介绍了一个实验功能。

使用自定义表达式意味着我们可以产生不符合常规主/外关键模型的非正统连接条件。一个这样的例子是物化路径模式,在这里我们比较重叠路径令牌的字符串以产生树结构。

通过仔细使用foreign()remote(),我们可以建立一种有效生成基本物化路径系统的关系。基本上,当foreign()remote()位于比较表达式的相同一侧时,该关系被认为是“one to许多”;当他们在不同方面时,这种关系被认为是“多对一”。为了比较我们将在这里使用,我们将处理集合,以便将事物配置为“一对多”:

class Element(Base):
    __tablename__ = 'element'

    path = Column(String, primary_key=True)

    descendants = relationship('Element',
                           primaryjoin=
                                remote(foreign(path)).like(
                                        path.concat('/%')),
                           viewonly=True,
                           order_by=path)

上面,如果给定一个Element对象,其路径属性为"/foo/bar2",我们寻找一个Element.descendants看起来像:

SELECT element.path AS element_path
FROM element
WHERE element.path LIKE ('/foo/bar2' || '/%') ORDER BY element.path

版本0.9.5中的新增功能:已添加支持以允许在primaryjoin条件以及使用ColumnOperators.like()

自我参照多对多关系

多对多关系可以由primaryjoinsecondaryjoin中的一个或两个自定义 - 后者对于指定使用secondary参数。涉及使用primaryjoinsecondaryjoin的常见情况是在建立从类到自身的多对多关系时,如下所示:

from sqlalchemy import Integer, ForeignKey, String, Column, Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

node_to_node = Table("node_to_node", Base.metadata,
    Column("left_node_id", Integer, ForeignKey("node.id"), primary_key=True),
    Column("right_node_id", Integer, ForeignKey("node.id"), primary_key=True)
)

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    label = Column(String)
    right_nodes = relationship("Node",
                        secondary=node_to_node,
                        primaryjoin=id==node_to_node.c.left_node_id,
                        secondaryjoin=id==node_to_node.c.right_node_id,
                        backref="left_nodes"
    )

如上所述,SQLAlchemy无法自动知道哪些列应该连接到right_nodesleft_nodes关系的哪些列。primaryjoinsecondaryjoin参数确定了我们想要如何加入关联表。在上面的声明式表格中,当我们在与Node类相对应的Python块中声明这些条件时,id变量可直接作为Column

Alternatively, we can define the primaryjoin and secondaryjoin arguments using strings, which is suitable in the case that our configuration does not have either the Node.id column object available yet or the node_to_node table perhaps isn’t yet available. 当在一个声明性字符串中引用一个普通的Table对象时,我们使用该表的字符串名称,因为它存在于MetaData中:

class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    label = Column(String)
    right_nodes = relationship("Node",
                        secondary="node_to_node",
                        primaryjoin="Node.id==node_to_node.c.left_node_id",
                        secondaryjoin="Node.id==node_to_node.c.right_node_id",
                        backref="left_nodes"
    )

这里的经典映射情况是类似的,其中node_to_node可以连接到node.c.id

from sqlalchemy import Integer, ForeignKey, String, Column, Table, MetaData
from sqlalchemy.orm import relationship, mapper

metadata = MetaData()

node_to_node = Table("node_to_node", metadata,
    Column("left_node_id", Integer, ForeignKey("node.id"), primary_key=True),
    Column("right_node_id", Integer, ForeignKey("node.id"), primary_key=True)
)

node = Table("node", metadata,
    Column('id', Integer, primary_key=True),
    Column('label', String)
)
class Node(object):
    pass

mapper(Node, node, properties={
    'right_nodes':relationship(Node,
                        secondary=node_to_node,
                        primaryjoin=node.c.id==node_to_node.c.left_node_id,
                        secondaryjoin=node.c.id==node_to_node.c.right_node_id,
                        backref="left_nodes"
                    )})

请注意,在这两个示例中,backref关键字指定一个left_nodes backref - 当relationship()创建反方向的第二个关系时,足以反转primaryjoinsecondaryjoin参数。

复合“次要”加入

注意

本节介绍SQLAlchemy的一些新增功能和实验功能。

有时,当人们试图在两个表之间建立一个relationship()时,为了加入它们,需要多于两个或三个表参与。这是relationship()的一个区域,它试图推动可能的边界,并且通常需要在SQLAlchemy邮件列表上敲定这些特殊用例的最终解决方案。

在更新版本的SQLAlchemy中,为了提供由多个表组成的复合目标,可以在一些情况下使用secondary参数。以下是这种连接条件的示例(要求版本0.9.2至少按照原样运行):

class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey('b.id'))

    d = relationship("D",
                secondary="join(B, D, B.d_id == D.id)."
                            "join(C, C.d_id == D.id)",
                primaryjoin="and_(A.b_id == B.id, A.id == C.a_id)",
                secondaryjoin="D.id == B.d_id",
                uselist=False
                )

class B(Base):
    __tablename__ = 'b'

    id = Column(Integer, primary_key=True)
    d_id = Column(ForeignKey('d.id'))

class C(Base):
    __tablename__ = 'c'

    id = Column(Integer, primary_key=True)
    a_id = Column(ForeignKey('a.id'))
    d_id = Column(ForeignKey('d.id'))

class D(Base):
    __tablename__ = 'd'

    id = Column(Integer, primary_key=True)

In the above example, we provide all three of secondary, primaryjoin, and secondaryjoin, in the declarative style referring to the named tables a, b, c, d directly. AD的查询如下所示:

sess.query(A).join(A.d).all()

SELECT a.id AS a_id, a.b_id AS a_b_id FROM a JOIN ( b AS b_1 JOIN d AS d_1 ON b_1.d_id = d_1.id JOIN c AS c_1 ON c_1.d_id = d_1.id) ON a.b_id = b_1.id AND a.id = c_1.a_id JOIN d ON d.id = b_1.d_id

在上面的例子中,我们利用能够将多个表填充到“辅助”容器中,以便我们可以跨多个表加入,同时仍然保持relationship()的“简单”在“左”和“右”方面都有“一张”表;复杂性保持在中间。

版本0.9.2中的新功能:支持将join()结构直接用作secondary参数的目标,包括对连接,急切连接和延迟加载的支持,以及在声明式中支持指定复杂条件(如包含类名称作为目标的连接)的支持。

与非主映射器的关系

在上一节中,我们举例说明了一种技术,我们使用secondary为了在连接条件中放置其他表。有一个复杂的连接案例,即使这种技术是不够的;当我们试图从A加入B时,可以使用任何数量的CD等。在这之间,但是也有AB 直接之间的连接条件。In this case, the join from A to B may be difficult to express with just a complex primaryjoin condition, as the intermediary tables may need special handling, and it is also not expressable with a secondary object, since the A->secondary->B pattern does not support any references between A and B directly. 当这个非常高级的情况出现时,我们可以求助于创建第二个映射作为关系的目标。这是我们使用mapper()的映射,以便映射到包含我们需要的所有附加表的类。为了生成这个映射器作为我们类的“替代”映射,我们使用non_primary标志。

下面通过从AB的简单连接示出了relationship(),但是主连接条件增加了两个附加实体CD,它们也必须具有与AB中的行同时排列的行:

class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    b_id = Column(ForeignKey('b.id'))

class B(Base):
    __tablename__ = 'b'

    id = Column(Integer, primary_key=True)

class C(Base):
    __tablename__ = 'c'

    id = Column(Integer, primary_key=True)
    a_id = Column(ForeignKey('a.id'))

class D(Base):
    __tablename__ = 'd'

    id = Column(Integer, primary_key=True)
    c_id = Column(ForeignKey('c.id'))
    b_id = Column(ForeignKey('b.id'))

# 1. set up the join() as a variable, so we can refer
# to it in the mapping multiple times.
j = join(B, D, D.b_id == B.id).join(C, C.id == D.c_id)

# 2. Create a new mapper() to B, with non_primary=True.
# Columns in the join with the same name must be
# disambiguated within the mapping, using named properties.
B_viacd = mapper(B, j, non_primary=True, properties={
    "b_id": [j.c.b_id, j.c.d_b_id],
    "d_id": j.c.d_id
    })

A.b = relationship(B_viacd, primaryjoin=A.b_id == B_viacd.c.b_id)

在上面的例子中,当我们查询时,我们的B的非主映射器会发出额外的列;这些可以被忽略:

sess.query(A).join(A.b).all()

SELECT a.id AS a_id, a.b_id AS a_b_id FROM a JOIN (b JOIN d ON d.b_id = b.id JOIN c ON c.id = d.c_id) ON a.b_id = b.id

构建启用查询的属性

非常雄心勃勃的自定义连接条件可能无法直接持久化,并且在某些情况下甚至可能无法正确加载。要移除等式的持久性部分,请在relationship()上使用标记viewonly,将其建立为只读属性(写入集合的数据将为在flush()上被忽略)。但是,在极端情况下,请考虑将常规Python属性与Query结合使用,如下所示:

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)

    def _get_addresses(self):
        return object_session(self).query(Address).with_parent(self).filter(...).all()
    addresses = property(_get_addresses)