I’ve been working throughHandbook of Relational Database Design通过Candace Fleming和Barbara Von Halle，于1989年发布，它常规地尝试以数据库不可知的方式定义建模数据的最佳做法，并解决了实现和调整数据库的实现问题。我会最终为两者都写下注，而这些是我关于定义数据模型的上半场的笔记。
Quick summary: I haven’t taken a college-level databases course, and chatting with a few folks who have, this book covers much of the material. Aspects of the book are dated, but it’s a great review of the fundamentals, and it filled in a bunch of gaps that I missed from being a practioner first without the theory behind it. Solidly no regrets for spending time with it.
Thethree-schema approachasks data modelers to develop three kinds of schema: an外部架构， 一种conceptual schema， 一种nd aninternal schema。Modeling begins with the development of a number of外部架构, each of which represents the view of a user cohort (in a rental application, perhaps the renter’s view or the landlord’s view). After collecting enough of these user views, the modeler collects them into the unifiedconceptual schemathat reconciles the many user views into a unified schema owned by the database administrator but independent of any particular storage technology. The very final step is to translate thatconceptual schemainto aninternal schema，这是底层存储技术（例如文字MySQL架构）使用的模式。
A few primary takeaways for me from this concept. FIrst, there is an actual standards document for thisproposed in 1975 by the ANSI Committee。Second, it does a very deliberate job of decoupling technology constraints from data modeling. I’ve personally been quite guilty of modeling from a technology-first perspective, and on reflection I’m certain my models are worse for having taken that initial approach.
这些笔记的其余部分主要关注概念模式，这本书通常也指的是logical data model。These notes use those terms somewhat independently, although techniques for developing theconceptual schemaare equally appropriate for developing外部架构。
Properties of effective data model
A good data model is (1) correct, (2) consistent, (3) sharable and (4) flexible. Such a model will be easy to use and remain useful as requirements evolve.
- **Correct **means that the model reflects the reality of the process it models. It doesn’t hide the complexity of the real-world behavior in creative ways, it represents it fully.
- Sharable由许多不同的应用程序模型是可用的。An accurate representation of the underlying process of renting homes should make it possible to support an application for renters applying for homes, for home owners to manage their rental properties, a billing application, an integration with a maintenance company for repairs, and so on.
- Flexiblemodels areevolvable, preserving their correctness as the real systems they represent continue to change.
What I particularly like about these properties is that they focus on the value of modeling (accurately reflect real systems) and acknowledge that most of the work in maintaining a successful model is done after the initial modeling. Successful systems must adapt to a series of increasingly unexpected requirements in order toremain successful。
Developing a data model
The most important aspect of modeling data is to take adata-driven approach。而不是从实现约束开始，忽略完全在第一次通过的那些约束，并专注于业务想要完成的事情。
Process to develop logical data model
- **Build skeletal user views. **Identify the kinds of users you are building for, and for each user document the major entities and relationships from their perspective. Use this to build a data dictionary of all data within the system (you’ll be maintaining this forever).
- Add keys to user views。首先，确定唯一定义实体（主键）的键的组合，以及将用于检索（备用键）的键的组合。其次，确定将跨模型运行的外键。最后，确定控制突变的业务规则（例如，什么是合法插入，删除和更新行为）。
- 通过归一化验证用户视图。The book introduces a number of “normal forms”, each of which is a transformation of a schema with redundant information into a schema with less redundant information. I started typing them all up, but wow, nope not doing that, just readan article on database normalization instead。If you’re less excited to memorize those rules, a rule of thumb is roughly: any modification you make to any entity should be achievable with a narrowly scoped update, e.g. the
whereclause on your query should only edit a single row.
- Determine additional attribute business rules。To quote, “It is useless to understand the structure of information within the business without understanding rules pertaining to that information’s integrity.” This section goes through defining domains for each attribute (what are the legal values for each attribute based on the thing they’re representing, maximum values, minimum values, and so on), ensuring primary keys are unique, defining rules about foreign key relationships (which are optional, which are valid, etc), and defining triggers (what are database enforced checks on mutation events).
- 整合用户视图。Now that you’ve refined each of your user views, merge them together into a unified model that incorporates each user view’s needs.
Something the book does quite nicely is walk through case studies of applying each of these particular techniques to a problem, demonstrating the techniques as they go.
- Work closely with users。Often users are brought into a modeling process late, particularly in cases where folks are taking an implementation-centric approach as opposed to a data-centric approach. This leads to less correct models. Instead start with users and work back to implementation second.
- Follow a structured methodology。Designing a data model is a产品管理格式， 一种nd benefits from the same rigorous, intentional methodology.
- Use structural and integrity constraints。在每个属性的法律值域上定义显式约束，并控制对象之间的关系的修改，以确保底层数据保持正确。
- Use conceptualization and normalization techniques。使用众多正式规范化方法将数据模型简化为其最冗余的可能形式。
- Use diagrams whenever possible。Prose models are harder for readers to intuit than diagram model, and diagrams have much higher information density.
- Build a data dictionary。自动或手动collection of all information related to your data model, in document or textual format, to pair with your diagrams and serve as an enduring source of truth.
- “实体is a person, place, thing or concept about which you wish to record facts”
- Primary keyis the “identify attribute or set of attributes”
- “Foreign key是通过识别关联实体完成关系的属性或一组属性。术语foreign传达属性“所属”或指另一个...实体的想法。“
- One-to-manyrelationship describes one entity (e.g. an author) relating to many others (e.g. books).
- One-to-onerelationship describes an entity (e.g. a book) relating to exactly one other (e.g. its publisher)
- Business rulesdescribe valid model states. They preserveintegrity通过限制假设的值，例如必填字段和可用的引用
- 触发操作are a kind of business rule which are applied during mutation (insert, update and delete)
- Integritycomes in three flavors: entity, referential and domain.实体完整性describes individual attributes, in particular that all components of a primary key exist.参照完整性governs foreign keys across tables, in particular that a foreign key in a reference does exist.Domain integrityapplies to columns, for example the data type, default value, uniqueness, and so on.
Much of the early NoSQL movement was about no longer exposing functionality that didn’t scale, forcing data modeling to directly confront the consequences of internet-scale, and that approach is the opposite of the approach described here that wants us to ignore the technology constraints until after developing the logical data model.
Similarly, most new products are created without product-market-fit, hoping to find fit through rapid iteration. This makes the idea of upfront modeling more challenging, but I also see how the absence of deliberate modeling cripples the productivity of most companies at some point in their growth curve until they’re able to remodel their data, often hidden within a larger rearchitecture project.
三个模式概念的另一个方面，我认为迷人的是重点是用户views, which I’ve effectively not seen used during my time in the industry. They are altogether a bit of an odd duck, perhaps no longer as relevant in a world where many production databases are modeled by the product engineers building on them rather than a database administrator.