Removing code literals

Every piece of software contains literals (usually numbers, strings or booleans). These are values related to application configuration, parts of the business logic, natural or language constants, etc. We have learned to replace many of these values with variables, constants or function calls for various reasons (security, manageability, readability, expressivity, …), which we are not going to analyze here. Our goal is to give some guidelines on where to move these literals once we decide to replace them. Remember that these are guidelines, not strict rules. The final decision always belongs to the developer and depends on the context of each literal.

1. Sensitive, environment-dependent configuration: this category includes sensitive configuration that is usually different for each execution environment (DEVELOPMENT, STAGING, PRODUCTION,…). For example, database credentials, cloud services credentials/URLs, etc.

Where to put: environment variables OR out-of-band config files

Since this configuration is different for each execution environment, it makes no sense to hard-code it. Moreover, we want to limit access to these values as much as possible. So, the first action is to take them out of the code repository (which is usually accessible to many developers) and delegate their management and/or deployment to DevOps engineers or sysadmins, hiding them from most eyes. What we usually do is store them out-of-band and apply them as environment variables during the deployment process. For instance, the deployment process could directly set the relevant environment variables on each server.
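As an illustration of the application side of this approach, here is a minimal sketch that reads credentials from environment variables and fails fast when one is missing. The variable names (DB_HOST, DB_USER, DB_PASSWORD) and the DatabaseSettings type are hypothetical, chosen only for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DatabaseSettings:
    host: str
    user: str
    password: str


def load_database_settings(env: Mapping[str, str] = os.environ) -> DatabaseSettings:
    """Read the (hypothetical) DB_* variables, failing fast if one is missing."""
    required = ("DB_HOST", "DB_USER", "DB_PASSWORD")
    # An empty value is treated the same as a missing one.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return DatabaseSettings(env["DB_HOST"], env["DB_USER"], env["DB_PASSWORD"])
```

Passing the environment as a parameter keeps the function easy to test; calling it once at startup means a misconfigured deployment is noticed immediately rather than on the first database call.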

One way to do this is to set them server-wide as OS environment variables. If possible, we could go one step further and limit the processes that can access these values. This is not difficult if, for example, you are using Docker (have a look at Docker Secrets). However, this can be troublesome for local development. So, another way to go is to load the environment variables from a dedicated file, for example a “.env” file, as part of the application’s bootstrapping process. We can then add a sample .env file to the repository that each developer edits for local development, and that is replaced during deployment by another .env file containing the appropriate credentials for the corresponding execution environment.
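Loading a “.env” file during bootstrapping can be sketched as follows. This is a simplified, assumption-laden parser (plain KEY=VALUE lines, # comments, optional surrounding quotes) and not a replacement for a dedicated dotenv library; the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, MutableMapping


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines, # comments and malformed lines are skipped."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env", env: MutableMapping[str, str] = os.environ) -> None:
    """Copy values from the file into env without overriding variables that are already set."""
    file = Path(path)
    if not file.is_file():
        return  # nothing to load, e.g. the deployment sets the variables itself
    for key, value in parse_env_file(file.read_text(encoding="utf-8")).items():
        env.setdefault(key, value)
```

Not overriding existing variables is a design choice of this sketch: values set by the server or the deployment process win over whatever a stray .env file contains.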

Attention should be paid when using debugging, logging or reporting tools that print or log all environment variables when the application or system crashes. Access to this output should be tightly controlled, or an automated process that scrubs out sensitive data should be used.
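A scrubbing step of this kind could look roughly like the sketch below. It assumes a naming convention in which sensitive variables contain words such as PASSWORD or TOKEN in their names; the marker list and the function name are hypothetical and would need to match your own conventions.

```python
import os
from typing import Dict, Mapping

# Assumed naming convention: variables whose names contain these words are secrets.
SENSITIVE_MARKERS = ("PASSWORD", "SECRET", "TOKEN", "KEY", "CREDENTIAL")


def scrub_environment(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return a copy of env that is safer to log: values of sensitive variables are masked."""
    return {
        name: "[REDACTED]" if any(m in name.upper() for m in SENSITIVE_MARKERS) else value
        for name, value in env.items()
    }
```

A crash reporter would then be given scrub_environment() instead of the raw environment. Name-based matching is only a heuristic, so controlling access to the output still matters.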

Some may claim that since environment variables are accessible from anywhere, they can easily be stolen by malicious code (e.g. third-party packages/libraries with vulnerabilities). Of course, this is not a PHP issue. It may happen in almost all languages (e.g. JavaScript https://www.bleepingcomputer.com/news/security/javascript-packages-caught-stealing-environment-variables/ , Ruby https://www.honeybadger.io/blog/securing-environment-variables/ , etc.). One way to prevent this is to use firewalls to limit outbound network communication so that malicious libraries cannot reach their “home”. Another way is to frequently check your components for vulnerabilities. If you don’t, environment variables will not be your only problem. A more extreme approach is to offload as many third-party components as possible to an external helper application that provides services to your main application but does not use your database or your main AWS services. This way, your main application credentials are not set (in any way) in the helper application, and the third-party components offloaded to it have no access to them.

As far as credentials are concerned, a more sophisticated (and secure) way to go is to use a secret management tool (Hashicorp Vault, Square’s Keywhiz, Pinterest’s Knox, AWS Secrets Manager, …) that holds these “secrets” encrypted and provides them to other services at run-time and on demand. Potential benefits: secure storage for your secrets, central management, centralized auditing, secret rotation/revocation, and more.

2. Design-managed application configuration: This includes (a) architectural or security-related configuration (e.g. the rate limit for our public API, base URLs or default timeouts for external calls, the maximum number of retries) and (b) configuration related to business logic that rarely or never changes (e.g. the maximum file size for uploaded CVs). Usually, these values represent important decisions about how the system works and should be given special care.

Where to put: in-band config files (inside the application repository)

This kind of application-wide configuration is usually part of the application design process, so it needs to be managed by developers. Using config files keeps these values accessible to developers and makes it easier to check (through git history) whether some important parameter of the system has been modified recently. Of course, this holds only as long as you don’t bloat the files with other, less important values. Whether it is better to spread configuration parameters across several flat files, to use a single file with tree-like data structures, or to combine the two is not our concern here. Sometimes, some of these configuration parameters may depend on the execution environment (e.g. we may want to remove rate limiting in the testing environment in order to test the API limits). In such cases, we can maintain separate configuration files per environment or extract those parameters as environment variables.
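The per-environment variant can be sketched as a base configuration plus small override sets. All keys and values below are hypothetical examples; in a real project the two dictionaries would typically be loaded from versioned config files inside the repository.

```python
import copy
from typing import Any, Dict

# Hypothetical values for illustration only.
BASE_CONFIG: Dict[str, Any] = {
    "api": {"rate_limit_per_minute": 60, "max_retries": 3},
    "http": {"default_timeout_seconds": 5},
    "uploads": {"max_cv_size_bytes": 2 * 1024 * 1024},
}

ENVIRONMENT_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "testing": {"api": {"rate_limit_per_minute": None}},  # None means: no rate limiting
}


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into a deep copy of base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(environment: str) -> Dict[str, Any]:
    """Return the base configuration with the overrides of the given environment applied."""
    return merge(BASE_CONFIG, ENVIRONMENT_OVERRIDES.get(environment, {}))
```

Keeping the overrides small makes the git history show exactly which parameters differ per environment, instead of hiding them in near-identical copies of a full file.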

You may ask: why not put this configuration in the database? We could, but there are a few arguments against it. The first is that some of these configuration parameters may need to be available even when the database is not. The second is that stealing data from the application itself is considered more difficult than stealing data from the database. If the database is compromised and the web server is not, this configuration is still protected. If the web server is compromised, the database is compromised as well. The database can also be compromised through SQL injection. Of course, an application bug can also expose configuration parameters, but a database compromise is still considered the more probable of the two. Going one step further, what about cases where the application is bootstrapped every time it serves a request? In one case we need to query the database on every request, and in the other we need to read a local file. And (horizontally) scaling the database is not as easy as scaling the web application.

3. User-managed application configuration: Sometimes, we want to allow application users (e.g. administrators or operators) to dynamically adjust some configuration parameters from time to time. This adjustment is made through the application’s UI by users that hold the required role/credentials.

Where to put: database

This kind of application-wide configuration usually does not depend on the execution environment. To make it manageable through the UI, we usually store it in the database.
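A minimal sketch of such a settings store, using SQLite from the Python standard library purely for illustration. The table layout (a name/value pair per row) and the function names are assumptions, not a prescribed schema.

```python
import sqlite3
from typing import Optional


def create_settings_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) key/value settings table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (name TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )


def set_setting(conn: sqlite3.Connection, name: str, value: str) -> None:
    """Insert or update a setting; this would be called by the admin UI handler."""
    conn.execute("INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)", (name, value))
    conn.commit()


def get_setting(
    conn: sqlite3.Connection, name: str, default: Optional[str] = None
) -> Optional[str]:
    """Return the stored value, or the default when the setting has never been set."""
    row = conn.execute("SELECT value FROM settings WHERE name = ?", (name,)).fetchone()
    return row[0] if row is not None else default
```

Supplying a default on read means the application still behaves sensibly before an administrator has touched a setting.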

4. Literals related to business logic: values editable by developers that are not expected to change. These values are usually part of the business logic (e.g. they express the status of a domain object, error messages, etc.).

Where to put: (static) constants or enumerated types.

In some cases, it is totally fine to keep them hard-coded, especially when the context makes their semantics clear or when a string literal is used only once. Keep in mind that moving them to in-band config files will not save you from re-deployment. If the semantics are not clear (heard of magic numbers?), use constants with descriptive names.

e.g.

 $delayInSeconds = calculateDelay();
 $delayInHours = $delayInSeconds / 3600;

The number 3600 is self-explanatory in this context.

e.g.

 switch (token.Text) {
     case "+": return a + b;
     case "-": return a - b;
     // etc.
 }

These literals can be uniquely identified. No need for replacement.

Of course, even when the semantics are clear, which is usually the case for string literals, replacing them comes with benefits. The most important one is avoiding mistyping the literal in some of its occurrences. Moreover, using namespaced constants allows you to re-use a literal under different contexts. Generally speaking, when a value assigned to (or compared with) a variable belongs to a countable and limited set of potential values, prefer an enumerated type.
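To illustrate the last point, here is a short sketch of a status field modeled as an enumerated type. The OrderStatus domain and its values are hypothetical.

```python
from enum import Enum


class OrderStatus(Enum):
    """The complete, limited set of states an order can be in (hypothetical domain)."""
    PENDING = "pending"
    PAID = "paid"
    SHIPPED = "shipped"


def can_cancel(status: OrderStatus) -> bool:
    """An order can be cancelled only before it has been shipped."""
    return status in (OrderStatus.PENDING, OrderStatus.PAID)
```

With the enum, a mistyped member name such as OrderStatus.PIAD fails loudly with an AttributeError, and converting an unknown stored string with OrderStatus("piad") raises a ValueError, whereas a mistyped bare string literal would silently compare as unequal.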