Configuration with Template::HashExpand
By Ben Tilly ([email protected])
Work originally sponsored by ZipRecruiter
What is Configuration?
Software needs to know about its environment
A hash of keys and values is sufficient
This differs in production and development (and even by developer)
Tends towards repetition (we will help with this)
Easy problem, lots of possible solutions
Nobody agrees on how to do it
Approach #1, multiple files
Load data from a series of files
Here is sample code:
use Config::Any;
# ... code here.
my $raw_config_list
    = Config::Any->load_files({files => \@config_path, use_ext => 1});
my $config = {map %$_, map values %$_, @$raw_config_list};
This will accept data in many formats
Approach #1, tradeoffs
Data can come in different formats
Configuration can be reused by other programs, which might not be in Perl
No temptation to put production data in source control
But adding new config variables that require customization is hard
Approach #2, unified config file
No need to leave Perl
Have a giant hash with every possible configuration
At run time, select the right configuration
Use $PARAM{$developer} = {%{$PARAM{sandbox}}, ...} to limit repetition.
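A minimal sketch of the unified approach. The key names (db_host, log_level), their values, and the select_config helper are hypothetical and only illustrate the layering; they are not taken from any real config.

```perl
use strict;
use warnings;

# Hypothetical keys and values, for illustration only.
my %PARAM;
$PARAM{production} = {
    db_host   => 'db.internal',
    log_level => 'warn',
};
# A sandbox starts from production and overrides a few keys.
$PARAM{sandbox} = {
    %{ $PARAM{production} },
    db_host   => 'localhost',
    log_level => 'debug',
};
# Each developer starts from the sandbox and overrides again.
$PARAM{btilly} = {
    %{ $PARAM{sandbox} },
    log_level => 'trace',
};

# At run time, pick exactly one of the generated configurations.
sub select_config {
    my ($name) = @_;
    my $config = $PARAM{$name}
        or die "No configuration named '$name'\n";
    return $config;
}

my $config = select_config('btilly');
print "$config->{db_host} $config->{log_level}\n";    # localhost trace
```

Every configuration is built on every run even though only one is used, which is the run-time cost this approach accepts.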
Approach #2, tradeoffs
Have full access to Perl while generating config
Seeing other sandboxes helps you create your own
You can add a new config key/value pair and fix sandboxes
But you generate configs you don't need at runtime
But the config file tends to be ugly
Approach #3, external build
Build the exact right configuration
Can use various tools (make, puppet, chef...)
Can be set up in an infinite number of ways
In the end you still need to put in the same data and get the same result as the other approaches
Approach #3, tradeoffs
Most flexible possible solution
Eat your deployment dogfood
Best run-time efficiency
But forces an annoying build step on developers
But a lot of cognitive overhead
Configuration structure
All approaches lend themselves to similar setups
Let's use a 3-level version with default, sandbox and individual developer levels
The default level has sane defaults for everything
The sandbox level has sane overrides for a sandbox
The individual developer level has overrides to use the developer's directories, URLs, email address, etc.
Where is the repetition?
Many config variables need similar alterations, for instance all file paths start off with /home/[% user %]/, and all emails go to the developer
If you want to override one line of a nested data structure, you need to copy the whole thing in every config
This is simple, but has to be repeated for every developer
This makes configuration files longer, and more error-prone to edit
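A short illustration of the nested-override problem, assuming configs are combined with a plain flat hash merge. The %sandbox and %override contents are made up for the example.

```perl
use strict;
use warnings;

my %sandbox = (
    tmp_dir => '/home/sandbox/files/tmp',
    session => {
        cookie_name => 'example_session',
        expires     => 60 * 60 * 24 * 14,
    },
);

# The developer only wants a different cookie name...
my %override = (
    session => { cookie_name => 'example_session_btilly' },
);

# ...but a flat merge replaces the whole nested hash, so "expires" is gone.
my %merged = (%sandbox, %override);

print exists $merged{session}{expires} ? "kept\n" : "lost\n";    # lost
```

To keep "expires", the developer has to copy the entire session hash into the override, which is the repetition described above.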
Example of developer's overrides
cron_dir => '/home/btilly/sandbox/files/cron',
tmp_dir => '/home/btilly/sandbox/files/tmp',
email => {
    support_email => 'Support <[email protected]>',
    pager_email => 'Pagers <[email protected]>',
},
session => {
    cookie_name => 'example_session_btilly',
    expires => 60 * 60 * 24 * 14,
    expires_long => 60 * 60 * 24 * 60,
    storage => '/home/btilly/sandbox/files/session',
    unlink_on_exit => 0,
},
Templating could help
default_email => '[% user %]',
root_files_dir => '/home/[% user %]/sandbox/files',
cron_dir => '[% root_files_dir %]/cron',
tmp_dir => '[% root_files_dir %]/tmp',
email => {
    support_email => 'Support <[% default_email %]>',
    pager_email => 'Pagers <[% default_email %]>',
},
session => {
    cookie_name => 'example_session_[% user %]',
    expires => 60 * 60 * 24 * 14,
    expires_long => 60 * 60 * 24 * 60,
    storage => '[% root_files_dir %]/session',
    unlink_on_exit => 0,
},
Put all of that in your sandbox
Have a developer config override of {user => 'btilly'}
After merge, Template::HashExpand returns the expanded hash
At ZipRecruiter, even with Template::HashExpand included inline, this shrank the configuration file by 20%
Adding new configuration variables is now massively easier
Best Practices
Any repeated text is a candidate to become a new key/value
Anything that you think someone might want to override is a candidate to become a new key/value
Any nested piece of data that someone might want to override should become a top-level key/value
Consider making a generic way to override parts of the config at run-time (e.g. default_email)
We have enough templating engines
Everyone should know this by now
But everyone thinks that their problem is different
Searched CPAN
Did not find anything like this
Features I wanted
{foo => 'blat', bar => '[% foo %]'} should substitute in 'blat'
{foo => '\[% foo %]'} should give '[% foo %]'
If one value depends on bar, and bar depends on foo, do the right thing
{foo => 'blat', bar => {baz => '[% foo %]'}} want baz to get substituted
This is exactly what Template::HashExpand provides, no more, no less
How to implement
Make a "to do" hash of what needs to be examined for substitutions
Do a series of passes through the data structure I am expanding
In each pass, try substitutions that have not been successfully done, update "to do"
Quit when all substitutions are done, or no progress was made in a pass
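The passes above can be sketched as follows. This is not the module's code or API: expand_flat is a hypothetical helper that only handles a flat hash of strings, and it leaves out nested hashes and the \[% escape to stay short.

```perl
use strict;
use warnings;

# Expand [% key %] references inside a flat hash of strings.
# Hypothetical helper for illustration, not Template::HashExpand itself.
sub expand_flat {
    my ($config) = @_;
    my %done;                                   # keys whose values are final
    my %todo = map { $_ => 1 } keys %$config;   # keys still to examine

    while (%todo) {
        my $progress = 0;
        for my $key (sort keys %todo) {
            my $value = $config->{$key};
            # Every key this value refers to.
            my @refs = $value =~ /\[%\s*(\w+)\s*%\]/g;
            # Wait for a later pass unless every reference is already final.
            next if grep { !exists $done{$_} } @refs;
            $value =~ s/\[%\s*(\w+)\s*%\]/$done{$1}/g;
            $done{$key} = $value;
            delete $todo{$key};
            $progress = 1;
        }
        last unless $progress;    # cycle or unknown key: stop trying
    }
    # Anything still in %todo keeps its unexpanded text.
    return { %$config, %done };
}

my $expanded = expand_flat({
    user           => 'btilly',
    root_files_dir => '/home/[% user %]/sandbox/files',
    tmp_dir        => '[% root_files_dir %]/tmp',
});
print "$expanded->{tmp_dir}\n";    # /home/btilly/sandbox/files/tmp
```

The "no progress" check is what keeps a cycle such as a referring to b and b referring to a from looping forever.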
How to parse
First you turn text into a list of tokens
Then you turn those tokens into structure, and do stuff with it
You can use recursion to do this; that is called "Recursive Descent"
Or you can use a variant of a state machine; this is called "Shift-Reduce"
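As an illustration of the first step, here is one way to turn a template string into tokens. The token names TEXT and VAR and the tokenize function are assumptions made for this sketch, not the module's internals.

```perl
use strict;
use warnings;

# Split a template string into tokens: [ TEXT => $string ] or [ VAR => $name ].
# A backslash before "[%" makes it literal text.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if ($text =~ /\G\\\[%/gc) {
            push @tokens, [ TEXT => '[%' ];      # escaped opener
        }
        elsif ($text =~ /\G\[%\s*(\w+)\s*%\]/gc) {
            push @tokens, [ VAR => $1 ];         # a variable reference
        }
        elsif ($text =~ /\G(.[^\\\[]*)/gcs) {
            push @tokens, [ TEXT => $1 ];        # a run of ordinary text
        }
    }
    return @tokens;
}

for my $token (tokenize('Support <[% default_email %]>')) {
    print "$token->[0]: $token->[1]\n";
}
```

The last branch always consumes at least one character, so the loop makes progress even on a stray "[" or backslash.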
Recursive Descent Parsers
Take a token
Try to use the token one way in our grammar
Try to parse the rest of the string
If that fails, backtrack, try to use the token a different way
Most hand-written parsers use recursive descent
If you know regular expression engines, this is like an NFA engine
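A small recursive descent sketch for arithmetic such as the 60 * 60 * 24 * 14 values in the config. The grammar and function names are chosen for this example. This grammar never needs to backtrack, so the code is the simpler predictive variant: one token of lookahead picks the rule.

```perl
use strict;
use warnings;

# Grammar (illustrative):
#   expr   := term   ( '+' term   )*
#   term   := factor ( '*' factor )*
#   factor := NUMBER | '(' expr ')'
sub parse_expr {
    my ($tokens) = @_;
    my $value = parse_term($tokens);
    while (@$tokens && $tokens->[0] eq '+') {
        shift @$tokens;
        $value += parse_term($tokens);
    }
    return $value;
}

sub parse_term {
    my ($tokens) = @_;
    my $value = parse_factor($tokens);
    while (@$tokens && $tokens->[0] eq '*') {
        shift @$tokens;
        $value *= parse_factor($tokens);
    }
    return $value;
}

sub parse_factor {
    my ($tokens) = @_;
    my $token = shift @$tokens;
    die "Unexpected end of input\n" unless defined $token;
    return $token if $token =~ /^\d+$/;
    if ($token eq '(') {
        my $value = parse_expr($tokens);    # recurse for the nested expression
        my $close = shift @$tokens;
        die "Expected ')'\n" unless defined $close && $close eq ')';
        return $value;
    }
    die "Unexpected token '$token'\n";
}

sub evaluate {
    my ($text) = @_;
    my @tokens = $text =~ /\d+|[+*()]/g;
    my $value  = parse_expr(\@tokens);
    die "Trailing tokens\n" if @tokens;
    return $value;
}

print evaluate('2 + 3 * (4 + 1)'), "\n";    # 17
```

Each grammar rule becomes one function, and nesting in the input becomes nesting in the call stack.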
Shift-Reduce Parsers
There is a stack of states you may be in
Decide action based on current top state, and current token
One option is to put a new state on the stack (shift)
Or you can pop off some set of states, reduce them according to some rule, and push on a new state (reduce)
Easiest to generate with automated tools like YACC
Conceptually an analog of a DFA regular expression engine
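For comparison, a hand-written sketch in the shift-reduce style. It uses operator precedence to decide between shifting and reducing, which is a simple special case and not the table-driven parser a tool like YACC would generate. It assumes well-formed input with only numbers, + and *; the names are made up for the example.

```perl
use strict;
use warnings;

my %PREC = ('+' => 1, '*' => 2);

# Reduce: pop "left op right" off the stack and push the result.
sub reduce_top {
    my ($stack) = @_;
    my ($left, $op, $right) = splice @$stack, -3;
    push @$stack, $op eq '+' ? $left + $right : $left * $right;
}

sub evaluate {
    my ($text) = @_;
    my @stack;
    for my $token ($text =~ /\d+|[+*]/g) {
        if ($token =~ /^\d+$/) {
            push @stack, $token;                 # shift a number
            next;
        }
        # Before shifting an operator, reduce while the operator already on
        # the stack binds at least as tightly as the incoming one.
        while (@stack >= 3 && $PREC{ $stack[-2] } >= $PREC{$token}) {
            reduce_top(\@stack);
        }
        push @stack, $token;                     # shift the operator
    }
    reduce_top(\@stack) while @stack >= 3;       # end of input: reduce the rest
    die "Malformed expression\n" unless @stack == 1;
    return $stack[0];
}

print evaluate('2 + 3 * 4 + 1'), "\n";    # 15
```

There is no recursion here: every decision looks only at the top of the stack and the current token.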
HashExpand is very simple
Trivial grammar
While writing it, I was thinking in terms of shift-reduce
Parsing is only one level deep
Let's look at the code