Posted on October 17, 2019

Organizing configs by usage phase

Configuration sprawl

Have you found your configuration handling begins to sprawl as you add more configuration to your program? This example makes two database connections and allows the hostname and logging level to be configured. Imagine how main might grow over time as different people follow the patterns they find here while adding more parameters.

data ConfigExample1 = ConfigExample1
  { hostname   :: Text
  , userDBUser :: String
  , userDBPass :: String
  , userDBDB   :: String
  , userDBPort :: Integer
  , userDBHost :: String
  , logLevel :: Katip.Severity
  , analysisDBUser :: String
  , analysisDBHost :: String
  , analysisDBPort :: Integer
  , analysisDBDB   :: String
  , analysisDBPass :: String
  } deriving (Generic)


main :: IO ()
main = do
  ConfigExample1{..} <- loadConfig "local.yaml"
  print "Connecting to User Database"
  userDB <- PG.connect PG.ConnectInfo
      { PG.connectUser     = userDBUser
      , PG.connectHost     = userDBHost
      , PG.connectPort     = fromIntegral userDBPort
      , PG.connectPassword = userDBPass
      , PG.connectDatabase = userDBDB
      }
  -- ... a block of initialization code
  -- ... maybe written by other team members
  --
  analysisDB <- PG.connect PG.ConnectInfo
      { PG.connectUser     = analysisDBUser
      , PG.connectHost     = analysisDBHost
      , PG.connectPort     = fromIntegral analysisDBPort
      , PG.connectPassword = analysisDBPass
      , PG.connectDatabase = analysisDBDB
      }  
  runBusinessLogic analysisDB userDB logLevel (Text.unpack hostname)



runBusinessLogic
  :: PG.Connection
  -> PG.Connection
  -> Katip.Severity
  -> String
  -> IO ()
runBusinessLogic userDB analysisDB logLevel myHostname = do
  undefined

What problems will we run into as we try to scale this program up?

  • Flat format Without organization, it’s easy for config files and their parsers to become confusing
  • Unchecked Redundancy The fields of ConfigExample1 correspond to the arguments of runBusinessLogic, but the correspondence is loose (we have a collection of Text and Int fields in the config but a PG.Connection in the arguments.
  • Ad-hoc resource acquisition The code used for reading config fields and creating resources from them can get messy quickly

Organizing configuration

We can address these problems and add a lot of structure to our configuration. Let’s start by drawing a connection between the configuration phase, and the running phase. We’ll start by using a single configuration type that abstracts over the phase at which it is used:

data ConfigF (p :: Phase) = Config
  { userDB     :: AtPhase p PG.Connection
  , analysisDB :: AtPhase p PG.Connection
  , hostname   :: Text
  , logLevel   :: Katip.Severity
  } deriving (Generic)

type Config    = ConfigF 'ConfigTime
type Resources = ConfigF 'RunTime

Some of our configuration fields are regular Haskell datatypes, like Text and Katip.Severity. These values are specified and used in the same way. But the userDB and analysisDB fields are defined in terms of Phase and AtPhase, which we are about to define.

Accepting some complexity here allows us to share a single type definition for specifying two things (a) the configuration data and (b) the configuration result, in one go.

Our main function will look more like this:

main' = do
  cfg :: Config    <- loadConfig "local.yaml"
  res :: Resources <- buildConfig cfg
  runBusinessLogic' res

runBusinessLogic' :: Resources -> IO ()
runBusinessLogic' = undefined

Whether the added complexity is worth it depends on a few things.

  • Configuration complexity If your configuration is simple, then you don’t benefit from deduplicating the configuration and runtime types
  • Complexity budget We’ll be adding a typeclass and a type family. If there are other aspects of your codebase that use your complexity budget, or if the budget is low for other reasons, then it’s best to treat this technique as a fun curiosity. Use your budget on something with a higher power-to-weight ratio, like dhall

Let’s build the required machinery.

First, we use DataKinds to define a new kind Phase inhabited by two types ’ConfigTime and ’RunTime to specialize our application’s configuration for one phase or the other.

data Phase
  = ConfigTime
  | RunTime

Next, a type family will let us associate a configuration type to a runtime type for some particular resource.

class HasPhases t where
  type AtPhase (p :: Phase) t

instance HasPhases PG.Connection where
  type AtPhase 'ConfigTime PG.Connection = PG.ConnectInfo
  type AtPhase 'RunTime    PG.Connection = PG.Connection

Another typeclass allows the construction of a resource from its configuration data. The two type parameters correspond to the configtime and runtime types.

class ToRuntime cty rty where
  toRuntime :: cty -> IO rty

-- | Catch-all instance for regular (not phase-specific) types
instance ToRuntime ty ty where
  toRuntime = return

instance ToRuntime PG.ConnectInfo PG.Connection where
  toRuntime = PG.connect

There is a catch-all instance for cases when the configtime type is the same as the runtime type. When we eventually process our configuration record, the catch-all instance will apply to any field with a regular type, like our hostname field. (It would also match any AtPhase fields where configtime type is equal to the runtime type.. but if we ever use AtPhase, we would do that in order to let the types differ across phases, so we won’t encounter this case. Sorry for the aside, I hope it makes sense)

generic-lens

In order to have our buildConfig function, we will use some fabulous magic from generic-lens. Specifically, constraints allows us to traverse a record, applying an effectful function at every field with a type satisfying some particular constraint.

We choose that constraint to be our ToRuntime class.

Now we can write buildConfig without repeating any details about the internals of our configuration record.

buildConfig :: ConfigF ConfigTime -> IO (ConfigF RunTime)
buildConfig = GenericLens.constraints @ToRuntime toRuntime

Summing up

Summing up, we have removed a few types of duplication and informality from the process of building runtime resources from configuration data.

We parameterized a single configuration record by the phase, either configuration time or runtime.

And we concentrated the work of instantiating a runtime resource record from the configuration record into a very generic function.

Was it worth it? In effect, we sacrificed Haskell98’s simplicity at the altar of the DRY (Don’t Repeat Yourself) principle. Repetition is legitimately dangerous, especially in a codebase with multiple authors, written in a language where type safety encourages us to skimp on testing. The competing interests here (redundancy vs. complexity) depend on just how much redundancy you are cleaning up.

Thanks to Sarah Brofeldt (@srhb), who fleshed out the idea with me and did much of the implementation. Simple as it may look, we had to explore quite a few paths before we found this solution.

Thanks also to K.A. Buhr for his very helpful answer on StackOverflow. Ross Baker read a draft of this post and gave great suggestions for improvement - Thanks!

Related work