Types in the Power Query M formula language

08/09/2022

The Power Query M formula language is a useful and expressive data mashup language, but it has some limitations. For example, the type system is not strongly enforced. In some cases, more rigorous validation is needed. Fortunately, M provides a built-in library with support for types, which makes stronger validation feasible.

Developers should have a thorough understanding of the type system in order to do this with any generality. And while the Power Query M language specification explains the type system well, it does leave a few surprises. For example, validation of function instances requires a way to compare types for compatibility.

By exploring the M type system more carefully, many of these issues can be clarified, and developers will be empowered to craft the solutions they need.

Knowledge of predicate calculus and naïve set theory should be adequate to understand the notation used.

PRELIMINARIES

(1) B := { true; false }
B is the typical set of Boolean values

(2) N := { valid M identifiers }
N is the set of all valid names in M. Valid identifiers are defined in the language specification.

(3) P := ⟨B, T⟩
P is the set of function parameters. Each one is possibly optional, and has a type. Parameter names are irrelevant.

(4) Pⁿ := ⋃_{0≤i≤n} ⟨i, Pⁱ⟩
Pⁿ is the set of all ordered sequences of n function parameters.

(5) P^* := ⋃_{0≤i≤∞} Pⁱ
P^* is the set of all possible sequences of function parameters, from length 0 on up.

(6) F := ⟨B, N, T⟩
F is the set of all record fields. Each field is possibly optional, has a name, and a type.

(7) Fⁿ := ∏_{0≤i≤n} F
Fⁿ is the set of all sets of n record fields.

(8) F^* := ( ⋃_{0≤i≤∞} Fⁱ ) ∖ { F^m | ⟨b₁, n₁, t₁⟩, ⟨b₂, n₂, t₂⟩ ∈ F^m ∧ n₁ = n₂ }
F^* is the set of all sets (of any length) of record fields, except for the sets where more than one field has the same name.

(9) C := ⟨N,T⟩
C is the set of column types, for tables. Each column has a name and a type.

(10) Cⁿ ⊂ ⋃_{0≤i≤n} ⟨i, C⟩
Cⁿ is the set of all ordered sequences of n column types.

(11) C^* := ( ⋃_{0≤i≤∞} Cⁱ ) ∖ { C^m | ⟨a, ⟨n₁, t₁⟩⟩, ⟨b, ⟨n₂, t₂⟩⟩ ∈ C^m ∧ n₁ = n₂ }
C^* is the set of all combinations (of any length) of column types, except for those where more than one column has the same name.
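
As an informal illustration of definitions (6) through (11), the name-collision exclusion can be sketched in Python. The class and function names below are hypothetical and are not part of M or its library; types are represented only by their names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    optional: bool  # the B component of (6)
    name: str       # the N component of (6)
    type: str       # the T component, represented here only by a type name


@dataclass(frozen=True)
class Column:
    name: str       # the N component of (9)
    type: str       # the T component of (9)


def has_name_collision(items):
    """Return True if two items share a name, the case excluded by (8) and (11)."""
    names = [item.name for item in items]
    return len(names) != len(set(names))


def is_valid_field_set(fields):
    """Membership test for F^*: an unordered set of fields with unique names."""
    return not has_name_collision(frozenset(fields))


def is_valid_column_sequence(columns):
    """Membership test for C^*: an ordered sequence of columns with unique names."""
    return not has_name_collision(tuple(columns))
```

Fields are collected into a frozenset because record fields are unordered, while columns are kept in a tuple because column order matters for table types.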

M TYPES

(12) T_F := ⟨T, P^*⟩
A Function Type consists of a return type, and an ordered list of zero-or-more function parameters.

(13) T_L :=〖T〗
A List type is indicated by a given type (called the "item type") wrapped in curly braces. Since curly braces are used in the metalanguage, 〖〗 brackets are used in this document.

(14) T_R := ⟨B, F^*⟩
A Record Type has a flag indicating whether it's "open", and zero-or-more unordered record fields.

(15) T_R^o := ⟨true, F^*⟩

(16) T_R^• := ⟨false, F^*⟩
T_R^o and T_R^• are notational shortcuts for open and closed record types, respectively.

(17) T_T := C^*
A Table Type is an ordered sequence of zero-or-more column types, where there are no name collisions.

(18) T_P := { any; none; null; logical; number; time; date; datetime; datetimezone; duration; text; binary; type; list; record; table; function; anynonnull }
A Primitive Type is one from this list of M keywords.

(19) T_N := { t_n ∈ T | ∃ u ∈ T : t_n = u + null } = { nullable u | u ∈ T }
Any type can additionally be marked as being nullable, by using the "nullable" keyword.

(20) T := T_F ∪ T_L ∪ T_R ∪ T_T ∪ T_P ∪ T_N
The set of all M types is the union of these six sets of types:
Function Types, List Types, Record Types, Table Types, Primitive Types, and Nullable Types.

FUNCTIONS

One function needs to be defined: NonNullable : T ← T
This function takes a type and returns a type that is equivalent, except that the null value does not conform to it.

IDENTITIES

Some identities are needed to define some special cases, and may also help elucidate the above.

(21) nullable any = any
(22) nullable anynonnull = any
(23) nullable null = null
(24) nullable none = null
(25) nullable nullable t ∈ T = nullable t
(26) NonNullable(nullable t ∈ T) = NonNullable(t)
(27) NonNullable(any) = anynonnull
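
Identities (21) through (27) can be checked mechanically. The following Python sketch is only a model, not M's implementation: primitive types are plain strings, and the Nullable wrapper and both function names are hypothetical. It also assumes NonNullable(none) = none, from which NonNullable(null) = none follows by (24) and (26).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nullable:
    inner: object  # the wrapped type; primitive types are plain strings


def nullable(t):
    """Model of the 'nullable' keyword, normalized with identities (21)-(25)."""
    if isinstance(t, Nullable):
        return t              # (25) nullable nullable t = nullable t
    if t in ("any", "null"):
        return t              # (21) and (23)
    if t == "anynonnull":
        return "any"          # (22)
    if t == "none":
        return "null"         # (24)
    return Nullable(t)


def non_nullable(t):
    """Model of the NonNullable function."""
    if isinstance(t, Nullable):
        return non_nullable(t.inner)  # (26)
    if t == "any":
        return "anynonnull"           # (27)
    if t == "null":
        return "none"                 # assumption: follows from (24) and (26)
    return t
```

With this model, non_nullable(nullable("text")) gives back "text", and nullable("anynonnull") gives "any".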

TYPE COMPATIBILITY

As defined elsewhere, an M type is compatible with another M type if and only if all values that conform to the first type also conform to the second type.

This section defines a compatibility relation that does not depend on conforming values and is based only on the properties of the types themselves. It is anticipated that this relation, as defined in this document, is completely equivalent to the original semantic definition.

The "is compatible with" relation : ≤ : B ← T × T
In the section below, a lowercase t will always represent an M type, an element of T.

A Φ will represent a subset of F^*, or of C^*.

(28) t ≤ t
This relation is reflexive.

(29) t_a ≤ t_b ∧ t_b ≤ t_c → t_a ≤ t_c
This relation is transitive.

(30) none ≤ t ≤ any
M types form a lattice over this relation; none is the bottom, and any is the top.

(31) t_a, t_b ∈ T_N ∧ t_a ≤ t_b → NonNullable(t_a) ≤ NonNullable(t_b)
If two types are compatible, then the NonNullable equivalents are also compatible.

(32) t ∈ T_N → null ≤ t
The primitive type null is compatible with all nullable types.

(33) t ∉ T_N → t ≤ anynonnull
All non-nullable types are compatible with anynonnull.

(34) NonNullable(t) ≤ t
A NonNullable type is compatible with its nullable equivalent.
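
For primitive types, rules (28) through (34) can be folded into one check by splitting each type into a non-nullable base and a flag saying whether null conforms. The Python sketch below assumes that this decomposition is a faithful reading of the rules; the string encoding of types and the function names are hypothetical.

```python
PREFIX = "nullable "


def normalize(name):
    """Split a type name into (non-nullable base, accepts_null)."""
    base, accepts_null = name, False
    while base.startswith(PREFIX):       # (25): repeated 'nullable' collapses
        base, accepts_null = base[len(PREFIX):], True
    if base == "any":
        return ("anynonnull", True)      # (22): any = nullable anynonnull
    if base == "null":
        return ("none", True)            # (24): null = nullable none
    return (base, accepts_null)


def is_compatible(a, b):
    """Return True if primitive type a is compatible with primitive type b."""
    base_a, null_a = normalize(a)
    base_b, null_b = normalize(b)
    if null_a and not null_b:
        return False                     # null conforms to a but not to b
    # none is the bottom and anynonnull is the top of the non-nullable bases;
    # otherwise the bases must be identical (reflexivity).
    return base_a == "none" or base_b == "anynonnull" or base_a == base_b


print(is_compatible("text", "nullable text"))   # True, by (34)
print(is_compatible("nullable text", "text"))   # False
print(is_compatible("null", "anynonnull"))      # False
```

The sketch covers only primitive types and their nullable variants; structured types such as lists and records need the additional rules that follow.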

(35) t ∈ T_F → t ≤ function
All function types are compatible with function.

(36) t ∈ T_L → t ≤ list
All list types are compatible with list.

(37) t ∈ T_R → t ≤ record
All record types are compatible with record.

(38) t ∈ T_T → t ≤ table
All table types are compatible with table.

(39) t_a ≤ t_b ↔ 〖t_a〗≤〖t_b〗
A list type is compatible with another list type if and only if the item types are compatible.

(40) t_a ∈ T_F = ⟨ p_a, p^* ⟩, t_b ∈ T_F = ⟨ p_b, p^* ⟩ ∧ p_a ≤ p_b → t_a ≤ t_b
A function type is compatible with another function type if the return types are compatible, and the parameter lists are identical.
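
Rules (35), (36), (39), and (40) translate into a short recursive check. This Python sketch ignores nullability to stay small, and its ListType and FunctionType classes are hypothetical stand-ins for elements of T_L and T_F.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ListType:
    item: object  # the item type


@dataclass(frozen=True)
class FunctionType:
    returns: object                          # the return type
    params: Tuple[Tuple[bool, object], ...]  # (optional, type) pairs, as in (3)


def is_compatible(a, b):
    """Return True if type a is compatible with type b (nullability ignored)."""
    if a == b or a == "none" or b == "any":
        return True                          # (28) and (30)
    if isinstance(a, ListType):
        if b == "list":
            return True                      # (36)
        return isinstance(b, ListType) and is_compatible(a.item, b.item)  # (39)
    if isinstance(a, FunctionType):
        if b == "function":
            return True                      # (35)
        return (isinstance(b, FunctionType)
                and a.params == b.params     # (40): parameter lists identical
                and is_compatible(a.returns, b.returns))
    return False
```

Note that under (40) only the return type may vary: two function types whose parameter lists differ in any way, including optionality, are not related by this rule.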

(41) t_a ∈ T_R^o, t_b ∈ T_R^• → t_a ≰ t_b
An open record type is never compatible with a closed record type.

(42) t_a ∈ T_R^• = ⟨false, Φ⟩, t_b ∈ T_R^o = ⟨true, Φ⟩ → t_a ≤ t_b
A closed record type is compatible with an otherwise identical open record type.

(43) t_a ∈ T_R^o = ⟨true, (Φ, ⟨true, n, any⟩)⟩, t_b ∈ T_R^o = ⟨true, Φ⟩ → t_a ≤ t_b ∧ t_b ≤ t_a
An optional field with the type any may be ignored when comparing two open record types.

(44) t_a ∈ T_R = ⟨b, (Φ, ⟨β, n, u_a⟩)⟩, t_b ∈ T_R = ⟨b, (Φ, ⟨β, n, u_b⟩)⟩ ∧ u_a ≤ u_b → t_a ≤ t_b
Two record types that differ only by one field are compatible if the name and optionality of the field are identical, and the types of said field are compatible.

(45) t_a ∈ T_R = ⟨b, (Φ, ⟨false, n, u⟩)⟩, t_b ∈ T_R = ⟨b, (Φ, ⟨true, n, u⟩)⟩ → t_a ≤ t_b
A record type with a non-optional field is compatible with a record type identical but for that field being optional.

(46) t_a ∈ T_R^o = ⟨true, (Φ, ⟨b, n, u⟩)⟩, t_b ∈ T_R^o = ⟨true, Φ⟩ → t_a ≤ t_b
An open record type is compatible with another open record type with one fewer field.
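
Rules (41) through (46) each describe a single step, and longer chains follow by transitivity (29). The Python sketch below is one possible way to combine them into a single check; that it matches the closure of the rules exactly is an assumption, and it deliberately accepts nothing beyond what the listed rules justify. The RecordType class and function names are hypothetical, and field types are limited to primitive type names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class RecordType:
    is_open: bool
    fields: Dict[str, Tuple[bool, str]]  # name -> (optional, type name)


def type_ok(a, b):
    """Simplified compatibility for field types: reflexivity plus (30)."""
    return a == b or a == "none" or b == "any"


def record_is_compatible(a, b):
    """Return True if record type a is compatible with record type b."""
    if a.is_open and not b.is_open:
        return False                      # (41)
    for name, (b_opt, b_type) in b.fields.items():
        if name not in a.fields:
            # (43): an open type may carry an extra optional field of type any
            if b.is_open and b_opt and b_type == "any":
                continue
            return False
        a_opt, a_type = a.fields[name]
        if a_opt and not b_opt:
            return False                  # (45) only relaxes required to optional
        if not type_ok(a_type, b_type):
            return False                  # (44)
    extra = set(a.fields) - set(b.fields)
    # (42) followed by (46): extra fields may be dropped only toward an open type
    return b.is_open or not extra
```

For example, a closed record type with a single required number field is accepted against the open type with the same field, but not the other way around.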

(47) t_a ∈ T_T = (Φ, ⟨i, ⟨n, u_a⟩⟩), t_b ∈ T_T = (Φ, ⟨i, ⟨n, u_b⟩⟩) ∧ u_a ≤ u_b → t_a ≤ t_b
A table type is compatible with a second table type, which is identical but for one column having a differing type, when the types for that column are compatible.
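
Applying rule (47) once per column, together with transitivity (29), gives a column-by-column comparison. In this Python sketch a table type is a tuple of (name, type name) pairs; that encoding and the function names are hypothetical, and column types are limited to primitive type names.

```python
def type_ok(a, b):
    """Simplified compatibility for column types: reflexivity plus (30)."""
    return a == b or a == "none" or b == "any"


def table_is_compatible(a, b):
    """Return True if table type a is compatible with table type b."""
    if b == "table":
        return True                       # (38)
    if len(a) != len(b):
        return False                      # (47) never adds or removes columns
    # Columns are compared by position, so both order and names must match.
    return all(
        name_a == name_b and type_ok(type_a, type_b)
        for (name_a, type_a), (name_b, type_b) in zip(a, b)
    )


narrow = (("id", "number"), ("name", "text"))
wide = (("id", "any"), ("name", "text"))
print(table_is_compatible(narrow, wide))  # True
print(table_is_compatible(wide, narrow))  # False
```

Because table types are ordered sequences, two table types with the same columns in a different order are not related by rule (47).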

REFERENCES

Microsoft Corporation (2015 August)
Microsoft Power Query for Excel Formula Language Specification [PDF]
Retrieved from https://msdn.microsoft.com/library/mt807488.aspx

Microsoft Corporation (n.d.)
Power Query M function reference [web page]
Retrieved from https://msdn.microsoft.com/library/mt779182.aspx
