Organizing configs by usage phase
Configuration sprawl
Have you found your configuration handling begins to sprawl as you add more
configuration to your program? This example makes two database connections
and allows the hostname and logging level to be configured. Imagine how
main
might grow over time as different people follow the patterns they
find here while adding more parameters.
data ConfigExample1 = ConfigExample1
hostname :: Text
{ userDBUser :: String
, userDBPass :: String
, userDBDB :: String
, userDBPort :: Integer
, userDBHost :: String
, logLevel :: Katip.Severity
, analysisDBUser :: String
, analysisDBHost :: String
, analysisDBPort :: Integer
, analysisDBDB :: String
, analysisDBPass :: String
,deriving (Generic)
}
main :: IO ()
= do
main ConfigExample1{..} <- loadConfig "local.yaml"
print "Connecting to User Database"
<- PG.connect PG.ConnectInfo
userDB = userDBUser
{ PG.connectUser = userDBHost
, PG.connectHost = fromIntegral userDBPort
, PG.connectPort = userDBPass
, PG.connectPassword = userDBDB
, PG.connectDatabase
}-- ... a block of initialization code
-- ... maybe written by other team members
--
<- PG.connect PG.ConnectInfo
analysisDB = analysisDBUser
{ PG.connectUser = analysisDBHost
, PG.connectHost = fromIntegral analysisDBPort
, PG.connectPort = analysisDBPass
, PG.connectPassword = analysisDBDB
, PG.connectDatabase
}
runBusinessLogic analysisDB userDB logLevel (Text.unpack hostname)
runBusinessLogic :: PG.Connection
-> PG.Connection
-> Katip.Severity
-> String
-> IO ()
= do
runBusinessLogic userDB analysisDB logLevel myHostname undefined
What problems will we run into as we try to scale this program up?
- Flat format Without organization, it’s easy for config files and their parsers to become confusing
- Unchecked Redundancy The fields of
ConfigExample1
correspond to the arguments ofrunBusinessLogic
, but the correspondence is loose (we have a collection ofText
andInt
fields in the config but aPG.Connection
in the arguments. - Ad-hoc resource acquisition The code used for reading config fields and creating resources from them can get messy quickly
Organizing configuration
We can address these problems and add a lot of structure to our configuration. Let’s start by drawing a connection between the configuration phase, and the running phase. We’ll start by using a single configuration type that abstracts over the phase at which it is used:
data ConfigF (p :: Phase) = Config
userDB :: AtPhase p PG.Connection
{ analysisDB :: AtPhase p PG.Connection
, hostname :: Text
, logLevel :: Katip.Severity
,deriving (Generic)
}
type Config = ConfigF 'ConfigTime
type Resources = ConfigF 'RunTime
Some of our configuration fields are regular Haskell datatypes, like
Text
and Katip.Severity
. These values are specified and used in
the same way. But the userDB
and analysisDB
fields are defined
in terms of Phase
and AtPhase
, which we are about to define.
Accepting some complexity here allows us to share a single type definition for specifying two things (a) the configuration data and (b) the configuration result, in one go.
Our main
function will look more like this:
= do
main' cfg :: Config <- loadConfig "local.yaml"
res :: Resources <- buildConfig cfg
runBusinessLogic' res
runBusinessLogic' :: Resources -> IO ()
= undefined runBusinessLogic'
Whether the added complexity is worth it depends on a few things.
- Configuration complexity If your configuration is simple, then you don’t benefit from deduplicating the configuration and runtime types
- Complexity budget We’ll be adding a typeclass and a type family. If there are other aspects of your codebase that use your complexity budget, or if the budget is low for other reasons, then it’s best to treat this technique as a fun curiosity. Use your budget on something with a higher power-to-weight ratio, like dhall
Let’s build the required machinery.
First, we use DataKinds
to
define a new kind Phase
inhabited by two types ’ConfigTime
and ’RunTime
to specialize our application’s configuration for one phase or the other.
data Phase
= ConfigTime
| RunTime
Next, a type family will let us associate a configuration type to a runtime type for some particular resource.
class HasPhases t where
type AtPhase (p :: Phase) t
instance HasPhases PG.Connection where
type AtPhase 'ConfigTime PG.Connection = PG.ConnectInfo
type AtPhase 'RunTime PG.Connection = PG.Connection
Another typeclass allows the construction of a resource from its configuration data. The two type parameters correspond to the configtime and runtime types.
class ToRuntime cty rty where
toRuntime :: cty -> IO rty
-- | Catch-all instance for regular (not phase-specific) types
instance ToRuntime ty ty where
= return
toRuntime
instance ToRuntime PG.ConnectInfo PG.Connection where
= PG.connect toRuntime
There is a catch-all instance for cases when the configtime
type is the same as the runtime type. When we eventually
process our configuration record, the catch-all instance will
apply to any field with a regular type, like our
hostname
field. (It would also match any
AtPhase
fields where configtime type is equal
to the runtime type.. but if we ever use AtPhase
,
we would do that in order to let the types differ
across phases, so we won’t encounter this case. Sorry for
the aside, I hope it makes sense)
generic-lens
In order to have our buildConfig
function, we
will use some fabulous magic from
generic-lens.
Specifically, constraints
allows us to traverse a record, applying an effectful function
at every field with a type satisfying some particular constraint.
We choose that constraint to be our ToRuntime
class.
Now we can write buildConfig
without repeating any
details about the internals of our configuration record.
buildConfig :: ConfigF ConfigTime -> IO (ConfigF RunTime)
= GenericLens.constraints @ToRuntime toRuntime buildConfig
Summing up
Summing up, we have removed a few types of duplication and informality from the process of building runtime resources from configuration data.
We parameterized a single configuration record by the phase, either configuration time or runtime.
And we concentrated the work of instantiating a runtime resource record from the configuration record into a very generic function.
Was it worth it? In effect, we sacrificed Haskell98
’s
simplicity at the altar of the DRY (Don’t Repeat Yourself)
principle. Repetition is legitimately dangerous, especially
in a codebase with multiple authors, written in a language
where type safety encourages us to skimp on testing. The
competing interests here (redundancy vs. complexity) depend
on just how much redundancy you are cleaning up.
Thanks to Sarah Brofeldt (@srhb), who fleshed out the idea with me and did much of the implementation. Simple as it may look, we had to explore quite a few paths before we found this solution.
Thanks also to K.A. Buhr for his very helpful answer on StackOverflow. Ross Baker read a draft of this post and gave great suggestions for improvement - Thanks!