Removing code literals

Every codebase contains literals (usually numbers, strings, booleans or characters). They are values related to application configuration, parts of the business logic, natural or language constants, etc. We have learned to replace many of these values with variables, constants or function calls for various reasons (security, manageability, readability, expressiveness, …), which we are not going to analyze here. This article offers some guidelines on where to store these replaced literals. Remember that these are just guidelines, not strict rules. The final decision always belongs to the developer and depends on the context of each literal.

1. Sensitive data and server configuration: this category includes sensitive data (e.g. database credentials, encryption keys) and values that depend on the server instance, though it is the former that draws most of our attention.

Where to put: environment variables OR out-of-band config files

The main idea here is to limit access to these values as much as possible. The first step is to take them out of the code repository (which is usually accessible to many developers) and delegate their management and/or deployment to DevOps engineers or sysadmins, hiding them from most eyes. One way to do this is to set them server-wide as OS environment variables. If possible, we could go one step further and limit the processes that can access these values. This is not difficult if, for example, you are using Docker (have a look at Docker Secrets).
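As a minimal sketch of the environment-variable approach, assuming Python, the application can read the values at startup and fail fast when one is missing. The variable names DB_USER and DB_PASSWORD and the helper names are hypothetical, chosen only for illustration.

```python
import os


class MissingSettingError(RuntimeError):
    """Raised when a required environment variable is not set."""


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail loudly."""
    value = os.environ.get(name)
    if value is None or value == "":
        # Name only the missing variable; never echo other values here.
        raise MissingSettingError(f"required environment variable {name} is not set")
    return value


def load_db_credentials() -> dict:
    # DB_USER and DB_PASSWORD are hypothetical names used for this sketch.
    return {
        "user": require_env("DB_USER"),
        "password": require_env("DB_PASSWORD"),
    }
```

Failing at startup, rather than at the first database call, makes a misconfigured server visible immediately.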

Pay attention if you are using a debugging, logging or reporting tool that prints or logs all environment variables when the application or system crashes. Access to this output should be well controlled, or an automated process that scrubs out this data should be used.
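One possible shape for such a scrubbing step, sketched in Python under the assumption that sensitive variables can be recognized by their names. The pattern and the "[REDACTED]" placeholder are illustrative choices, not a standard.

```python
import os
import re

# Hypothetical pattern: variable names that look sensitive.
# Adjust it to the naming conventions of your own deployment.
SENSITIVE_NAME = re.compile(r"(PASSWORD|SECRET|TOKEN|KEY|CREDENTIAL)", re.IGNORECASE)


def scrub_environment(environ=None) -> dict:
    """Return a copy of the environment that is safer to write to a log."""
    source = os.environ if environ is None else environ
    return {
        name: "[REDACTED]" if SENSITIVE_NAME.search(name) else value
        for name, value in source.items()
    }
```

Name-based matching is only a heuristic: it will miss a secret stored under an innocent-looking name or embedded inside a value (a connection URL that contains a password, for instance), so it complements access control on the output rather than replacing it.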

A better solution is to use an automatically deployed file, again managed by DevOps, that is loaded and parsed by the application code. This minimizes the exposure of this data (environment variables are accessible to any third-party package used by your application). Environment variables are usually part of the application deployment/startup script, which means that we need to re-deploy in order to modify them. The same may be true of the file solution, depending on the setup.
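A rough sketch of the file-based variant, assuming the deployed file is JSON. The default path /etc/myapp/secrets.json and the APP_SECRETS_FILE override are hypothetical; the file format and location are whatever your deployment process produces.

```python
import json
import os
from pathlib import Path

# Hypothetical default location, outside the application repository.
DEFAULT_SECRETS_PATH = "/etc/myapp/secrets.json"


def load_secrets(path=None) -> dict:
    """Parse the deployed secrets file and return its contents as a dict."""
    # APP_SECRETS_FILE is an assumed override, handy for local development.
    resolved = Path(path or os.environ.get("APP_SECRETS_FILE", DEFAULT_SECRETS_PATH))
    with resolved.open(encoding="utf-8") as handle:
        secrets = json.load(handle)
    if not isinstance(secrets, dict):
        raise ValueError(f"{resolved} must contain a JSON object")
    return secrets
```

Only the code that calls load_secrets sees the values, and file permissions can restrict which OS users are able to read the file at all.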

As far as credentials are concerned, a more sophisticated (and secure) approach is to use a secret management tool (HashiCorp Vault, Square’s Keywhiz, Pinterest’s Knox, AWS Secrets Manager, …) that holds these “secrets” encrypted and provides them to other services at run-time and on demand. Potential benefits: secure storage for your secrets, central management, centralized auditing, secret rotation/revocation, and more.

2. Service configuration: values editable by developers that may change and are related to the application’s architecture/design or its interconnection with other systems (e.g. default timeout for external calls, logging level, various paths, base URL of external services, …). Usually, these values represent important decisions about how the system works and should be given special care.

Where to put: in-band config files (inside the application repository)

Using config files makes these values easily accessible to developers (which is desirable here) and makes it easier to check (through git history) whether some important parameter of the system has been modified recently. Of course, this holds only as long as you don’t bloat the files with other, less important values. Whether it is better to split configuration parameters across several flat files, use just one file with tree-like data structures, or combine the two is not that important, nor is it our concern here.
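To make the single-file, tree-like option concrete, here is a small Python sketch. The keys, the example URL and the get_setting helper are all hypothetical; in a real project the JSON text would live in a file committed next to the code.

```python
import json

# Stand-in for a config file kept inside the repository
# (for example config/service.json, a hypothetical path).
CONFIG_TEXT = """
{
  "logging": {"level": "INFO"},
  "payments_api": {
    "base_url": "https://payments.example.com",
    "timeout_seconds": 5
  }
}
"""


def get_setting(config: dict, dotted_key: str):
    """Walk a nested config dict using a key such as 'payments_api.timeout_seconds'."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unknown configuration key: {dotted_key}")
        node = node[part]
    return node


config = json.loads(CONFIG_TEXT)
timeout = get_setting(config, "payments_api.timeout_seconds")
```

Because the file is versioned with the code, a change to timeout_seconds shows up in the git history like any other change.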

You may ask: why not put this configuration in the database? We could, but there are a few arguments against it. The first is that some of these configuration parameters may need to be available even when the database is not. The second is that stealing data from the application itself is considered more difficult than stealing data from the database. If the database is compromised and the web server is not, then configuration kept in files is still protected. If the web server is compromised, the database is compromised as well. The database can also be compromised through SQL injection. Of course, an application bug can also expose configuration parameters, but a database compromise is still considered the more probable of the two. Going one step further, what about cases where the application is bootstrapped every time we need to serve a request? With configuration in the database we need to query the database on every request; with a config file we only need to read a local file. And (horizontally) scaling the database is not as easy as scaling the web application.

3. Administration parameters: sometimes, we want to allow application users to dynamically adjust some service configuration parameters. This adjustment is made through the application’s UI by users that hold the required role/credentials.

Where to put: database
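A minimal sketch of database-backed parameters, assuming a simple key/value table and using SQLite only to keep the example self-contained. The table layout, the function names and the items_per_page parameter are hypothetical.

```python
import sqlite3


def create_settings_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (name TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )


def write_setting(conn: sqlite3.Connection, name: str, value: str) -> None:
    # Meant to be called from the admin UI, after the user's role has been
    # checked elsewhere. Values are bound as parameters, never concatenated.
    conn.execute(
        "INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)", (name, value)
    )
    conn.commit()


def read_setting(conn: sqlite3.Connection, name: str, default: str) -> str:
    # Fall back to a default so the application still works before an
    # administrator has set anything.
    row = conn.execute("SELECT value FROM settings WHERE name = ?", (name,)).fetchone()
    return row[0] if row is not None else default


conn = sqlite3.connect(":memory:")
create_settings_table(conn)
write_setting(conn, "items_per_page", "50")
items_per_page = int(read_setting(conn, "items_per_page", "20"))
```

Unlike the config-file categories, a change here takes effect without a deployment, which is exactly what an administrator-facing setting needs.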

4. Literals related to business logic: values editable by developers that are not expected to change. These values are usually part of the business logic.

Where to put: (static) constants or enumerated types.

In some cases, it is totally fine to keep them hard-coded, especially when the context makes their semantics clear or when a string literal is used only once. Keep in mind that moving them to in-band config files will not save you from re-deployment. If the semantics are not clear (heard of magic numbers?), use constants with descriptive names.

e.g.

 $delayInSeconds = calculateDelay();
 $delayInHours = $delayInSeconds / 3600;  

The number 3600 is self-explanatory in this context.

e.g.

 switch (token.Text) {
     case "+": return a + b;
     case "-": return a - b;
     // etc.
 }

These literals are unambiguous in context, so there is no need to replace them.

Of course, even when the semantics are clear, which is usually the case for string literals, replacing them comes with benefits. The most important is avoiding mistyping the literal in one of its occurrences. Moreover, using namespaced constants allows you to re-use the same literal in different contexts. Generally speaking, when a value assigned to (or compared with) a variable belongs to a finite, limited set of potential values, prefer an enumerated type.
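The enumerated-type advice could look like the following in Python. OrderStatus and its members are a hypothetical example of a closed set of values, not something taken from a particular application.

```python
from enum import Enum


class OrderStatus(Enum):
    """Hypothetical closed set of states an order can be in."""
    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


def can_cancel(status: OrderStatus) -> bool:
    # A typo such as OrderStatus.PENDNG raises AttributeError when the line
    # runs, whereas a mistyped string literal ("pendng") would just compare
    # unequal and go unnoticed.
    return status is OrderStatus.PENDING


def parse_status(raw: str) -> OrderStatus:
    # Lookup by value raises ValueError for anything outside the set.
    return OrderStatus(raw)
```

The string "pending" now appears in exactly one place, and every other use goes through a name that tooling can check.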