About record fields

By jaiyalas Posted on 15 Apr 2017. Last modified on 19 Jun 2020 (14:51).

DisambiguateRecordFields
- the easy way
- the other way
DuplicateRecordFields
NamedFieldPuns
RecordWildCards
- more detail

Daily trivia: Records

{-# LANGUAGE DisambiguateRecordFields #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE RecordWildCards #-}

DisambiguateRecordFields

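By default (Haskell 98), GHC does not allow two fields with the same name in the same namespace. This is reasonable enough: each field is essentially turned into a function, so the names clash even when those functions have different types. In the two modules below, for example, giving foo1 and foo2 explicit type signatures would not help. We would still get the error message Ambiguous occurrence ‘x’.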

module M1 where
    data TinM1 = CinM1 {x :: Int, y :: String}

module M2 where
    import M1
    data T = C {x :: Char}
    foo1 (CinM1 { x = n }) = n+1
    foo2 n = C { x = succ n }

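That said, we really do tend to give fields very generic (a.k.a. run-of-the-mill) names, and clashes between different modules in particular are quite likely. Here are two ways to deal with the problem: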

the easy way

The simplest fix is to use qualified field names. For example, the code above works without problems when rewritten like this:

foo1 (CinM1 { M1.x = n }) = n+1
foo2 n = C { M2.x = succ n }

Of course, the price is that you end up with a SomeModule. prefix in front of nearly everything.
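The qualified-name approach can be tried in a single file by clashing with a field from base. This is only a sketch: Total is a made-up record for illustration, and Data.Monoid's Sum (with its field getSum) stands in for the "other module".

```haskell
module Main where

import Data.Monoid (Sum (..))          -- brings the field getSum into scope
import qualified Data.Monoid as M

-- A local record that happens to reuse the field name getSum
-- (Total is hypothetical, made up for this sketch).
data Total = Total { getSum :: Int } deriving Show

-- Main.getSum names the field declared in this module.
bump :: Total -> Total
bump (Total { Main.getSum = n }) = Total { Main.getSum = n + 1 }

-- M.getSum names the field imported from Data.Monoid.
unwrap :: Sum Int -> Int
unwrap (Sum { M.getSum = n }) = n

main :: IO ()
main = do
  print (bump (Total 1))    -- Total {getSum = 2}
  print (unwrap (Sum 41))   -- 41
```

Writing a bare getSum anywhere in this module would bring back the Ambiguous occurrence error.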

the other way

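Alternatively, GHC offers a handy extension for this problem: DisambiguateRecordFields. GHC uses the constructor to work out where a field comes from, which avoids the confusion. Where there is no constructor to go by, as in the record update bad1 and the selector application bad2 below, the field is still ambiguous.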

module M1 where
    data TinM1 = CinM1 {x :: Int, y :: String}

{-# LANGUAGE DisambiguateRecordFields #-}
module M2 where
    import M1
    data T = C {x :: Char}
    --
    ok1 (CinM1 { x = n }) = n+1
    ok2 n = C { x = succ n }
    --
    -- bad1 k = k { x = 3 }
    -- bad2 k = x k
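
For a single-file version of the same idea, here is a minimal sketch that clashes with a field from base instead of a second hand-written module. Total is a hypothetical record; Sum and getSum are the real ones from Data.Monoid.

```haskell
{-# LANGUAGE DisambiguateRecordFields #-}
module Main where

import Data.Monoid (Sum (..))   -- also exports a field called getSum

-- Hypothetical local record reusing the field name getSum.
data Total = Total { getSum :: Int } deriving Show

-- The constructor Total tells GHC that this getSum is the local field.
bump :: Total -> Total
bump (Total { getSum = n }) = Total { getSum = n + 1 }

-- The constructor Sum tells GHC that this getSum is the imported field.
unwrap :: Sum Int -> Int
unwrap (Sum { getSum = n }) = n

main :: IO ()
main = do
  print (bump (Total 1))    -- Total {getSum = 2}
  print (unwrap (Sum 41))   -- 41
```

No qualification is needed here because every use of getSum sits next to a constructor.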

DuplicateRecordFields

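DisambiguateRecordFields above resolves field-name clashes between different modules, but it is no help when the clash is inside a single module. In fact, Haskell simply does not allow two datatypes in the same module to declare the same field name. So if we try to compile the following with GHC, we get the error message Multiple declarations of ‘x’.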

data T1 = C1 {x :: Char}
data T2 = C2 {x :: Int}

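The good news is that GHC provides another extension, DuplicateRecordFields, specifically to relax this restriction. The catch is that GHC will not use type inference to decide which of the duplicate fields is meant.

For example, given the program: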

data T = C {x :: Int}
newT n = C {x = n}

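When GHC sees C { x = n }, the constructor C tells it that the whole term has type T, and therefore which field x is meant. Selector applications and record updates have no constructor to go by, and with this extension GHC does not run inference to pick a field there, so we have to state the relevant type by hand. (This makes duplicate fields more tedious to use.)

Selection and update are described separately below.

Note: enabling DuplicateRecordFields also enables DisambiguateRecordFields.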
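
Before the fiddly cases, here is a minimal sketch of the part that just works. Person and Company are made-up types. The example deliberately sticks to construction and pattern matching, where the constructor picks the field, because how GHC treats type-directed selectors and updates has changed between releases.

```haskell
{-# LANGUAGE DuplicateRecordFields #-}
module Main where

-- Two records in one module sharing the field name "name"
-- (both types are hypothetical, for illustration only).
data Person  = Person  { name :: String, age :: Int }
data Company = Company { name :: String, staff :: [Person] }

-- The constructor in the pattern decides which "name" is meant.
personName :: Person -> String
personName (Person { name = n }) = n

companyName :: Company -> String
companyName (Company { name = n }) = n

main :: IO ()
main = do
  let alice = Person  { name = "Alice", age = 30 }
      acme  = Company { name = "Acme", staff = [alice] }
  putStrLn (personName alice)    -- Alice
  putStrLn (companyName acme)    -- Acme
  print (length (staff acme))    -- 1; staff is not duplicated, so it is an ordinary selector
```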

selector functions

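A duplicate field can be used as a selector only when its type is specified directly. There are two ways to do so:

- write out the type of the field (selector) itself
- write out the type of the field's argument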

Note in particular that supplying enough type information somewhere else does not help: GHC performs no inference here at all, so it still reports an error.

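For example, in the program below, bad1 pins the constructor to C1 in its pattern, so in principle the x must be the one from T1. GHC does no inference here, however, so it never notices. ok1 and ok3, on the other hand, annotate the type of the argument, which uniquely determines the field. Without such an annotation, as in bad2, GHC cannot tell which x is meant, and even giving the whole function a type signature, as in bad3, has no effect. ok2 and ok4 specify the type of the field itself, so they are fine.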

data T1 = C1 {x :: Char}
data T2 = C2 {x :: Int}
--
ok1 r = x (r :: T1)
--
ok2 = x :: T1 -> Char
--
ok3 :: T1 -> Char
ok3 r = x (r :: T1)
--
ok4 :: T1 -> Char
ok4 = x
--
-- bad1 r@(C1 _) = x r
--
-- bad2 r = x r
--
-- bad3 :: T1 -> Char
-- bad3 r = x r

record update

As with selectors, specifying the type directly is the safest approach.

data T = T {f :: Int, g :: Int}
data S = S {f :: Int, g :: Int}
--
ok1 :: S -> S
ok1 x = x {f = 1}
--
ok2 x = x {f = 1} :: S
--
ok3 x = k x {f = 1} -- given k :: S -> a

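The thing to watch out for is that the type must be stated explicitly and directly, right at the update itself: either the type of the result of the update, or the type of the record being updated. GHC does no type inference here. That is why notOk1, notOk2 and notOk3 below are all rejected, whereas ok4 is accepted.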

-- notOk1 = let x :: S
--              x = S 1 1
--          in x { f = 2 }
--
-- notOk2 x =
--     [ x { f = 3 }
--     , S 1 1 :: S ]
--
-- notOk3 (x :: S) = x { f = 1 }
--
ok4 x = (x :: S) { f = 1 }

Putting a type annotation inside a pattern, as notOk3 does, requires enabling:

{-# LANGUAGE ScopedTypeVariables #-}

more cases

duplicate fields with different types

Because GHC does no type inference for records here, duplicate fields are still ambiguous even when they have different types.

data U = U {f :: String}
data V = V {f :: Int}
--
notOk4 x = x {f = 1}

Feeding the snippet above to GHC produces Record update is ambiguous, and requires a type signature.

not 100% overlapping

If the fields of the two structures do not overlap 100%, a cheap trick is to mention more fields, until GHC can tell from the set of fields used which type is meant.

data A = A {f :: Int}
data B = B {f :: Int, g :: Int}
--
-- notOk5 x = x {f = 1}
--
ok5 x = x {f = 1, g = 2}
ok6 x = x {g = 2}

NamedFieldPuns

Used as a selector, a field extracts the corresponding value from inside a structure.

data T = C {num :: Int}

getNum0 c = num c

We can also bind it (locally) to a new variable inside a record pattern. getNum1 and getNum2 below are completely equivalent.

getNum1 (C {num = x}) = x
getNum2 c = let x = num c in x

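Laziness, that great virtue of programmers, tells us that the x in getNum1 is rather pointless. Fortunately, GHC provides NamedFieldPuns, which lets us use the field name directly as if it were a (local) variable.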

getNum3 (C {num}) = num

Puns also work in mixed record patterns and with qualified field names:

module M where
    data TinM = CinM {x :: Int, y :: String}
module N where
    import M
    foo (CinM {M.x, M.y = ""}) = x
    foo (CinM {M.x, M.y}) = x + length y

Note: sometimes qualified field names cannot be avoided. In the example above, for instance, module N might contain another structure that also has a field named x or y.
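
To see puns in a complete program, here is a small sketch with a made-up Rect type. It covers a plain pun, a pun mixed with an ordinary field pattern, and a pun in a record construction expression (which GHC also accepts, with {width} standing for {width = width}).

```haskell
{-# LANGUAGE NamedFieldPuns #-}
module Main where

-- Rect is a hypothetical record used only for this sketch.
data Rect = Rect { width :: Int, height :: Int } deriving Show

-- Plain puns: width and height become local variables.
area :: Rect -> Int
area (Rect {width, height}) = width * height

-- A pun mixed with an ordinary field pattern.
isFlat :: Rect -> Bool
isFlat (Rect {width, height = 0}) = width > 0
isFlat _                          = False

-- Puns in an expression: each field is filled from the local variable of the same name.
square :: Int -> Rect
square side =
  let width  = side
      height = side
  in Rect {width, height}

main :: IO ()
main = do
  print (area (Rect 3 4))     -- 12
  print (isFlat (Rect 3 0))   -- True
  print (square 2)            -- Rect {width = 2, height = 2}
```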

RecordWildCards

Suppose we have a data structure:

data T = C { p :: Bool
           , x1 :: Int
           , x2 :: Int }

and we want to write a function that, depending on the value of the first field, either adds or subtracts the remaining values:

func1 (C {p = True,  x1 = m, x2 = n}) = m + n
func1 (C {p = False, x1 = m, x2 = n}) = m - n

func1 is perfectly correct and runs fine, but the renaming above is roundabout and pointless. Lazy as we are, we would probably turn on puns and rewrite it as:

func2 (C {p = True,  x1, x2}) = x1 + x2
func2 (C {p = False, x1, x2}) = x1 - x2

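Very nice, that is less confusing already. But stare at it a little longer and the x1, x2 will start to look like clutter. They are written right there in the structure's definition, so why should I keep typing those fields out over and over?

GHC has heard your wish for laziness! The RecordWildCards extension lets us write .. in place of whatever we cannot be bothered to type. In other words, the code above can be rewritten once more as: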

func3 (C {p = True,  .. }) = x1 + x2
func3 (C {p = False, .. }) = x1 - x2

Note: enabling RecordWildCards also enables DisambiguateRecordFields.
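
Here is the same pattern trick as a complete program, on a made-up Server record. It is a minimal sketch: one field is matched explicitly and .. binds the rest as local variables.

```haskell
{-# LANGUAGE RecordWildCards #-}
module Main where

-- Server is a hypothetical record used only for this sketch.
data Server = Server { host :: String, port :: Int, secure :: Bool }

-- secure is matched explicitly; .. brings host and port into scope.
url :: Server -> String
url Server {secure = True, ..} = "https://" ++ host ++ ":" ++ show port
url Server {..}                = "http://"  ++ host ++ ":" ++ show port

main :: IO ()
main = do
  putStrLn (url (Server "example.org" 443 True))   -- https://example.org:443
  putStrLn (url (Server "localhost" 8080 False))   -- http://localhost:8080
```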

more detail

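Record wildcards may be used in construction, where GHC fills each field from a variable with the same name. They may not be used in updates, though. So newT below is accepted (it is even fine if not every field that C needs is supplied; that only produces a warning), but updateT does not compile at all.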

newT =
    let x1 = 10
        x2 = 2
        x3 = 1
    in C {..}

-- updateT c1 =
--     let x1 = 10
--         x2 = 2
--         x3 = 1
--     in c1 {..}
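
Since c1 {..} is rejected, one possible workaround is to take the record apart with a wildcard pattern and rebuild it with the constructor, overriding the fields that should change. This is just a sketch on a made-up Server record, not the only way to do it.

```haskell
{-# LANGUAGE RecordWildCards #-}
module Main where

-- Server is a hypothetical record used only for this sketch.
data Server = Server { host :: String, port :: Int } deriving Show

-- Construction: .. fills host and port from the arguments of the same name.
mkServer :: String -> Int -> Server
mkServer host port = Server {..}

-- "Update" by rebuilding: the pattern binds every field locally,
-- port is given explicitly, and .. fills in the remaining field (host).
withPort :: Int -> Server -> Server
withPort newPort Server {..} = Server {port = newPort, ..}

main :: IO ()
main = do
  let s = mkServer "localhost" 8080
  print s                   -- Server {host = "localhost", port = 8080}
  print (withPort 9090 s)   -- Server {host = "localhost", port = 9090}
```

For a single field the ordinary update syntax s { port = 9090 } is of course shorter. The rebuild form mainly pays off when several fields are recomputed from the old ones.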

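More precisely, all GHC needs is to find something in the current scope with the same name as the field. Note that the field itself may be qualified or unqualified, but the same-named variable that supplies the value must be unqualified. For example, suppose we have already defined the following two modules: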

module A where
    data T = C { x1 :: Int
               , x2 :: Int
               , x3 :: Int } deriving Show

module B where
    x1 = 1 :: Int
    x2 = 2 :: Int

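Then the program below is correct. Because modules A and B contain clashing names, one of them has to be imported qualified. If we import qualified A, as below, newT can use the wildcard. If we qualified B instead, GHC could not use the x1 and x2 from B, because their full names would then be B.x1 and B.x2, which do not match the x1 and x2 needed to construct T.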

{-# LANGUAGE RecordWildCards #-}
module M where
import qualified A
import B
--
newT x3 = A.C {..}

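Trying out the example above gives the expected result. (This relies on GHC picking up the imported x1 and x2. Newer GHC documentation limits the .. expansion in expressions to variables that are not imported or bound at top level, in which case x1 and x2 would have to be bound locally.)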

*M> newT 3
C {x1 = 1, x2 = 2, x3 = 3}
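
The "qualified field, unqualified variable" rule can also be checked in a single file. The sketch below borrows Sum and its field getSum from Data.Monoid, imported qualified only, and binds the variable locally as a function argument, which should not depend on how a given GHC version treats imported names. The function name wrap is made up for the example.

```haskell
{-# LANGUAGE RecordWildCards #-}
module Main where

import qualified Data.Monoid as M   -- the field is in scope only as M.getSum

-- The argument is named after the field, unqualified.
-- M.Sum {..} then expands to M.Sum {M.getSum = getSum}.
wrap :: Int -> M.Sum Int
wrap getSum = M.Sum {..}

main :: IO ()
main = print (M.getSum (wrap 3 `mappend` wrap 4))   -- 7
```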

[Tags: haskell]