Configuration with Template::HashExpand
By Ben Tilly ([email protected])
Work originally sponsored by ZipRecruiter
What is Configuration?
-
Software needs to know about its environment
-
A hash of keys and values is sufficient
-
This differs in production and development (and even by developer)
-
Tends towards repetitive configuration (we will help with this)
-
Easy problem, lots of possible solutions
-
Nobody agrees on how to do it
Approach #1, multiple files
-
Load data from a series of files
-
Here is sample code:
use Config::Any;
# ... code here.
my $raw_config_list
    = Config::Any->load_files({files => \@config_path, use_ext => 1});
# Each element is {filename => config}; flatten them, later files win.
my $config = {map %$_, map values %$_, @$raw_config_list};
-
This will accept data in many formats
Approach #1, tradeoffs
-
Data can come in different formats
-
Configuration can be reused by other programs, which might not be in Perl
-
No temptation to put production data in source control
-
But adding new config variables that require customization is hard
Approach #2, unified config file
-
No need to leave Perl
-
Have a giant hash %PARAM with every possible configuration
-
At run time, select the right configuration
-
Use $PARAM{$developer} = {%{$PARAM{sandbox}}, ...} to limit repetition.
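A minimal sketch of that layout. The keys, values, the select_config helper and the APP_CONFIG environment variable are all hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical keys and values, for illustration only.
my %PARAM;
$PARAM{production} = {
    db_host => 'db.example.com',
    tmp_dir => '/var/tmp/app',
    debug   => 0,
};
# The sandbox copies production and overrides what differs.
$PARAM{sandbox} = {
    %{ $PARAM{production} },
    db_host => 'localhost',
    debug   => 1,
};
# Each developer copies the sandbox and overrides what differs.
$PARAM{btilly} = {
    %{ $PARAM{sandbox} },
    tmp_dir => '/home/btilly/sandbox/files/tmp',
};

# Select the right configuration at run time, falling back to production.
sub select_config {
    my ($name) = @_;
    $name = 'production' unless defined $name && exists $PARAM{$name};
    return $PARAM{$name};
}

# APP_CONFIG is an assumed environment variable, not part of the talk.
my $config = select_config($ENV{APP_CONFIG});
```

Note that every configuration is built on every run, even though only one is used.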
Approach #2, tradeoffs
-
Have full access to Perl while generating config
-
Seeing other sandboxes helps you create your own
-
You can add a new config key/value pair and fix sandboxes
-
But you generate configs you don't need at runtime
-
But the config file tends to be ugly
Approach #3, external build
-
Build the exact right configuration
-
Can use various tools (make, puppet, chef...)
-
Can be set up in an infinite number of ways
-
In the end you still need to put in the same data and get the same result as the other approaches
Approach #3, tradeoffs
-
Most flexible possible solution
-
Eat your deployment dogfood
-
Best run-time efficiency
-
But forces an annoying build step on developers
-
But a lot of cognitive overhead
Configuration structure
-
All approaches lend themselves to similar setups
-
Let's use a 3-level version with production, sandbox and individual developer levels
-
The production level has sane defaults for everything
-
The sandbox level has sane overrides for a sandbox
-
The individual developer level has overrides to use the developer's directories, URLs, email address, etc.
Where is the repetition?
-
Many config variables need similar alterations; for instance, all file paths start off with /home/[% user %]/, and all emails go to the developer
-
If you want to override one line of a nested data structure, you need to copy the whole thing in every config
-
This is simple, but has to be repeated for every developer
-
This makes configuration files longer, and more error-prone to edit
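A small sketch of why the nested case forces copying, assuming the levels are combined with a plain shallow hash merge (the %sandbox and %override contents here are made up for illustration):

```perl
use strict;
use warnings;

my %sandbox = (
    tmp_dir => '/tmp/sandbox',
    session => {
        cookie_name    => 'example_session',
        expires        => 60 * 60 * 24 * 14,
        unlink_on_exit => 0,
    },
);

# The developer only wants a different cookie name...
my %override = (
    session => { cookie_name => 'example_session_btilly' },
);

# ...but a shallow merge replaces the whole nested hash, so the other
# session settings silently disappear.
my %merged = (%sandbox, %override);

print exists($merged{session}{expires}) ? "kept\n" : "lost\n";
```

This prints "lost", which is why each developer ends up repeating the entire session block.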
Example of developer's overrides
{
    cron_dir => '/home/btilly/sandbox/files/cron',
    tmp_dir => '/home/btilly/sandbox/files/tmp',
    email => {
        support_email => 'Support <btilly@example.com>',
        pager_email => 'Pagers <btilly@example.com>',
    },
    session => {
        cookie_name => 'example_session_btilly',
        expires => 60 * 60 * 24 * 14,
        expires_long => 60 * 60 * 24 * 60,
        storage => '/home/btilly/sandbox/files/session',
        unlink_on_exit => 0,
    },
}
Templating could help
{
    default_email => '[% user %]@example.com',
    root_files_dir => '/home/[% user %]/sandbox/files',
    cron_dir => '[% root_files_dir %]/cron',
    tmp_dir => '[% root_files_dir %]/tmp',
    email => {
        support_email => 'Support <[% default_email %]>',
        pager_email => 'Pagers <[% default_email %]>',
    },
    session => {
        cookie_name => 'example_session_[% user %]',
        expires => 60 * 60 * 24 * 14,
        expires_long => 60 * 60 * 24 * 60,
        storage => '[% root_files_dir %]/session',
        unlink_on_exit => 0,
    },
}
Template::HashExpand
-
Put all of that in your sandbox
-
Have a developer config override of {user => 'btilly'}
-
After merge, Template::HashExpand::expand_hash($config) returns the expanded hash
-
At ZipRecruiter, even with Template::HashExpand included inline, this shrank the configuration file by 20%
-
Adding new configuration variables is now massively easier
Best Practices
-
Any repeated text is a candidate to become a new key/value
-
Anything that you think someone might want to override is a candidate to become a new key/value
-
Any nested piece of data that someone might want to override should become a top-level key/value
-
Consider making a generic way to override parts of the config at run time (e.g. default_email)
We have enough templating engines
-
Everyone should know this by now
-
But everyone thinks that their problem is different
-
Did CPAN search
-
Did not find anything like this
-
Sorry...
Features I wanted
-
{foo => 'blat', bar => '[% foo %]'} should substitute in 'blat'
-
But {foo => '\[% foo %]'} should give '[% foo %]'
-
If baz depends on bar, which depends on foo, do the right thing
-
Given {foo => 'blat', bar => {baz => '[% foo %]'}}, want baz to get substituted
-
This is exactly what Template::HashExpand provides, no more, no less
How to implement
-
Make a "to do" hash of what needs to be examined for substitutions
-
Do a series of passes through the data structure I am expanding
-
In each pass, try substitutions that have not been successfully done, update "to do"
-
Quit when all substitutions are done, or no progress was made in a pass
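The passes above can be sketched roughly as follows. This is not the module's source: expand_sketch, collect_strings and try_expand are hypothetical names, a single regex stands in for the real parsing, and the sketch edits the hash in place for brevity.

```perl
use strict;
use warnings;

# Record every plain string value, at any depth, in the "to do" hash.
sub collect_strings {
    my ($hash, $path, $todo) = @_;
    for my $key (keys %$hash) {
        my $value = $hash->{$key};
        if (ref($value) eq 'HASH') {
            collect_strings($value, "$path/$key", $todo);
        }
        elsif (defined $value && !ref $value) {
            $todo->{"$path/$key"} = [$hash, $key];
        }
    }
}

# Try to expand one string using only finished top-level values.
# Returns undef if it refers to a key that is not finished yet.
sub try_expand {
    my ($text, $done) = @_;
    my $ok = 1;
    $text =~ s{(\\?)(\[%\s*(\w+)\s*%\])}{
        $1                   ? $2            # escaped: keep the literal tag
      : exists $done->{$3}   ? $done->{$3}   # finished: substitute
      :                        do { $ok = 0; $2 }
    }ge;
    return $ok ? $text : undef;
}

sub expand_sketch {
    my ($config) = @_;
    my (%todo, %done);
    collect_strings($config, '', \%todo);

    my $progress = 1;
    while (%todo && $progress) {
        $progress = 0;
        for my $path (keys %todo) {
            my ($hash, $key) = @{ $todo{$path} };
            my $expanded = try_expand($hash->{$key}, \%done);
            next unless defined $expanded;    # retry in a later pass
            $hash->{$key} = $expanded;
            $done{$key} = $expanded if $hash == $config;  # top level only
            delete $todo{$path};
            $progress = 1;
        }
    }
    return $config;    # anything still in %todo is left unexpanded
}

my $config = expand_sketch({
    user           => 'btilly',
    root_files_dir => '/home/[% user %]/sandbox/files',
    tmp_dir        => '[% root_files_dir %]/tmp',
    session        => { storage => '[% root_files_dir %]/session' },
});
print "$config->{tmp_dir}\n";
```

The final print shows /home/btilly/sandbox/files/tmp. A value that refers to itself never finishes, so the loop stops once a pass makes no progress.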
How to parse
-
First you turn text into a list of tokens
-
Then you turn those tokens into structure, and do stuff with it
-
You can use recursion to do this; that is called "Recursive Descent"
-
Or you can use a variant of a state machine; this is called "Shift-Reduce"
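The two stages can be sketched for the [% name %] syntax like this. The tokenize and render helpers are hypothetical names for illustration, not taken from Template::HashExpand:

```perl
use strict;
use warnings;

# Stage 1: turn text into a list of [type, value] tokens.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length($text)) {
        if ($text =~ /\G\\(\[%)/gc) {
            push @tokens, [text => $1];      # escaped opener is literal
        }
        elsif ($text =~ /\G\[%\s*(\w+)\s*%\]/gc) {
            push @tokens, [var => $1];       # a variable reference
        }
        elsif ($text =~ /\G([^\\\[]+|.)/gcs) {
            push @tokens, [text => $1];      # plain text
        }
    }
    return @tokens;
}

# Stage 2: do something with the tokens, here a simple substitution.
sub render {
    my ($vars, @tokens) = @_;
    return join '', map {
        my ($type, $value) = @$_;
        $type eq 'var' ? $vars->{$value} : $value;
    } @tokens;
}

my @tokens = tokenize('Support <[% default_email %]>');
print render({ default_email => 'btilly@example.com' }, @tokens), "\n";
```

For a grammar this small the "structure" is just the flat token list, so the second stage needs neither recursion nor a stack.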
Recursive Descent Parsers
-
Take a token
-
Try to use the token one way in our grammar
-
Try to parse the rest of the string
-
If that fails, backtrack, try to use the token a different way
-
Most hand-written parsers use recursive-descent
-
If you know regular expression engines, this is like an NFA engine
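As a rough illustration, here is a recursive descent evaluator for arithmetic. The grammar is an arbitrary example, not something from the talk. It can always pick the right rule by looking at the next token, so this sketch never needs the backtracking step.

```perl
use strict;
use warnings;

# Grammar (illustrative):
#   expr   := term   (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

sub parse {
    my ($text) = @_;
    # Tokenize; characters outside the grammar are silently skipped.
    my @tokens = $text =~ m{\d+|[-+*/()]}g;
    my $value = parse_expr(\@tokens);
    die "Unexpected token '$tokens[0]'\n" if @tokens;
    return $value;
}

sub parse_expr {
    my ($tokens) = @_;
    my $value = parse_term($tokens);
    while (@$tokens && $tokens->[0] =~ m{^[-+]$}) {
        my $op  = shift @$tokens;
        my $rhs = parse_term($tokens);
        $value = $op eq '+' ? $value + $rhs : $value - $rhs;
    }
    return $value;
}

sub parse_term {
    my ($tokens) = @_;
    my $value = parse_factor($tokens);
    while (@$tokens && $tokens->[0] =~ m{^[*/]$}) {
        my $op  = shift @$tokens;
        my $rhs = parse_factor($tokens);
        $value = $op eq '*' ? $value * $rhs : $value / $rhs;
    }
    return $value;
}

sub parse_factor {
    my ($tokens) = @_;
    my $token = shift @$tokens;
    die "Unexpected end of input\n" unless defined $token;
    return $token if $token =~ /^\d+$/;
    if ($token eq '(') {
        my $value = parse_expr($tokens);    # recursion handles nesting
        my $close = shift @$tokens;
        die "Expected ')'\n" unless defined $close && $close eq ')';
        return $value;
    }
    die "Unexpected token '$token'\n";
}

print parse('(2 + 3) * 4'), "\n";
```

Each grammar rule becomes one function, and the Perl call stack keeps track of where we are in the grammar.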
Shift-Reduce Parsers
-
There is a stack of states you may be in
-
Decide action based on current top state, and current token
-
One option is to put a new state on the stack (shift)
-
Or you can pop off some set of states, reduce them according to some rule, and push on a new state (reduce)
-
Easiest to generate with automated tools like YACC
-
Conceptually an analog of a DFA regular expression engine
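A hand-rolled sketch of the shift and reduce actions for a toy grammar of sums. For simplicity the stack holds grammar symbols rather than the numbered automaton states a YACC-generated parser would use, and parse_sum is a hypothetical name:

```perl
use strict;
use warnings;

# Grammar (illustrative):  sum := sum '+' NUMBER | NUMBER
sub parse_sum {
    my ($text) = @_;
    my @input = $text =~ /\d+|\+/g;
    my @stack;    # entries are [symbol, value]

    while (1) {
        # Reduce: sum '+' NUMBER  ->  sum
        if (@stack >= 3
            && $stack[-3][0] eq 'sum'
            && $stack[-2][0] eq '+'
            && $stack[-1][0] eq 'NUMBER')
        {
            my ($left, undef, $right) = splice @stack, -3;
            push @stack, [sum => $left->[1] + $right->[1]];
        }
        # Reduce: NUMBER -> sum (only when it starts the input)
        elsif (@stack == 1 && $stack[0][0] eq 'NUMBER') {
            $stack[0] = [sum => $stack[0][1]];
        }
        # Shift: move the next token onto the stack
        elsif (@input) {
            my $token = shift @input;
            push @stack, $token eq '+' ? ['+', undef] : [NUMBER => $token];
        }
        else {
            last;    # nothing left to reduce or shift
        }
    }

    die "Parse error\n" unless @stack == 1 && $stack[0][0] eq 'sum';
    return $stack[0][1];
}

print parse_sum('1 + 2 + 3'), "\n";
```

There is no recursion here: the decision at each step depends only on the top of the stack and whether input remains.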
HashExpand is very simple
-
Trivial grammar
-
While writing it, I was thinking in terms of shift-reduce
-
Parsing is only one deep
-
Let's look at the code