Organizing configs by usage phase
Configuration sprawl
Have you found your configuration handling begins to sprawl as you add more
configuration to your program? This example makes two database connections
and allows the hostname and logging level to be configured. Imagine how
main might grow over time as different people follow the patterns they
find here while adding more parameters.
data ConfigExample1 = ConfigExample1
{ hostname :: Text
, userDBUser :: String
, userDBPass :: String
, userDBDB :: String
, userDBPort :: Integer
, userDBHost :: String
, logLevel :: Katip.Severity
, analysisDBUser :: String
, analysisDBHost :: String
, analysisDBPort :: Integer
, analysisDBDB :: String
, analysisDBPass :: String
} deriving (Generic)
main :: IO ()
main = do
ConfigExample1{..} <- loadConfig "local.yaml"
print "Connecting to User Database"
userDB <- PG.connect PG.ConnectInfo
{ PG.connectUser = userDBUser
, PG.connectHost = userDBHost
, PG.connectPort = fromIntegral userDBPort
, PG.connectPassword = userDBPass
, PG.connectDatabase = userDBDB
}
-- ... a block of initialization code
-- ... maybe written by other team members
--
analysisDB <- PG.connect PG.ConnectInfo
{ PG.connectUser = analysisDBUser
, PG.connectHost = analysisDBHost
, PG.connectPort = fromIntegral analysisDBPort
, PG.connectPassword = analysisDBPass
, PG.connectDatabase = analysisDBDB
}
runBusinessLogic analysisDB userDB logLevel (Text.unpack hostname)
runBusinessLogic
:: PG.Connection
-> PG.Connection
-> Katip.Severity
-> String
-> IO ()
runBusinessLogic userDB analysisDB logLevel myHostname = do
undefinedWhat problems will we run into as we try to scale this program up?
- Flat format Without organization, it’s easy for config files and their parsers to become confusing
- Unchecked Redundancy The fields of
ConfigExample1correspond to the arguments ofrunBusinessLogic, but the correspondence is loose (we have a collection ofTextandIntfields in the config but aPG.Connectionin the arguments. - Ad-hoc resource acquisition The code used for reading config fields and creating resources from them can get messy quickly
Organizing configuration
We can address these problems and add a lot of structure to our configuration. Let’s start by drawing a connection between the configuration phase, and the running phase. We’ll start by using a single configuration type that abstracts over the phase at which it is used:
data ConfigF (p :: Phase) = Config
{ userDB :: AtPhase p PG.Connection
, analysisDB :: AtPhase p PG.Connection
, hostname :: Text
, logLevel :: Katip.Severity
} deriving (Generic)
type Config = ConfigF 'ConfigTime
type Resources = ConfigF 'RunTimeSome of our configuration fields are regular Haskell datatypes, like
Text and Katip.Severity. These values are specified and used in
the same way. But the userDB and analysisDB fields are defined
in terms of Phase and AtPhase, which we are about to define.
Accepting some complexity here allows us to share a single type definition for specifying two things (a) the configuration data and (b) the configuration result, in one go.
Our main function will look more like this:
main' = do
cfg :: Config <- loadConfig "local.yaml"
res :: Resources <- buildConfig cfg
runBusinessLogic' res
runBusinessLogic' :: Resources -> IO ()
runBusinessLogic' = undefinedWhether the added complexity is worth it depends on a few things.
- Configuration complexity If your configuration is simple, then you don’t benefit from deduplicating the configuration and runtime types
- Complexity budget We’ll be adding a typeclass and a type family. If there are other aspects of your codebase that use your complexity budget, or if the budget is low for other reasons, then it’s best to treat this technique as a fun curiosity. Use your budget on something with a higher power-to-weight ratio, like dhall
Let’s build the required machinery.
First, we use DataKinds to
define a new kind Phase
inhabited by two types ’ConfigTime and ’RunTime
to specialize our application’s configuration for one phase or the other.
data Phase
= ConfigTime
| RunTimeNext, a type family will let us associate a configuration type to a runtime type for some particular resource.
class HasPhases t where
type AtPhase (p :: Phase) t
instance HasPhases PG.Connection where
type AtPhase 'ConfigTime PG.Connection = PG.ConnectInfo
type AtPhase 'RunTime PG.Connection = PG.ConnectionAnother typeclass allows the construction of a resource from its configuration data. The two type parameters correspond to the configtime and runtime types.
class ToRuntime cty rty where
toRuntime :: cty -> IO rty
-- | Catch-all instance for regular (not phase-specific) types
instance ToRuntime ty ty where
toRuntime = return
instance ToRuntime PG.ConnectInfo PG.Connection where
toRuntime = PG.connectThere is a catch-all instance for cases when the configtime
type is the same as the runtime type. When we eventually
process our configuration record, the catch-all instance will
apply to any field with a regular type, like our
hostname field. (It would also match any
AtPhase fields where configtime type is equal
to the runtime type.. but if we ever use AtPhase,
we would do that in order to let the types differ
across phases, so we won’t encounter this case. Sorry for
the aside, I hope it makes sense)
generic-lens
In order to have our buildConfig function, we
will use some fabulous magic from
generic-lens.
Specifically, constraints
allows us to traverse a record, applying an effectful function
at every field with a type satisfying some particular constraint.
We choose that constraint to be our ToRuntime class.
Now we can write buildConfig without repeating any
details about the internals of our configuration record.
buildConfig :: ConfigF ConfigTime -> IO (ConfigF RunTime)
buildConfig = GenericLens.constraints @ToRuntime toRuntimeSumming up
Summing up, we have removed a few types of duplication and informality from the process of building runtime resources from configuration data.
We parameterized a single configuration record by the phase, either configuration time or runtime.
And we concentrated the work of instantiating a runtime resource record from the configuration record into a very generic function.
Was it worth it? In effect, we sacrificed Haskell98’s
simplicity at the altar of the DRY (Don’t Repeat Yourself)
principle. Repetition is legitimately dangerous, especially
in a codebase with multiple authors, written in a language
where type safety encourages us to skimp on testing. The
competing interests here (redundancy vs. complexity) depend
on just how much redundancy you are cleaning up.
Thanks to Sarah Brofeldt (@srhb), who fleshed out the idea with me and did much of the implementation. Simple as it may look, we had to explore quite a few paths before we found this solution.
Thanks also to K.A. Buhr for his very helpful answer on StackOverflow. Ross Baker read a draft of this post and gave great suggestions for improvement - Thanks!