Timothy Wolodzko - Environment variables

Using environment variables for storing configuration is a de facto standard. It was one of the recommendations of Heroku’s Twelve-Factor App guide and has been widely adopted since. We are all familiar with environment variables, but they come with many lesser-known subtleties that are worth knowing.

I will focus on environment variables in Unix-like systems, such as Linux and macOS. Windows is a different story, and I won’t cover it here.

1. Environment variables in Linux and macOS

Environment variables are defined in the shell using the following syntax:

export KEY=value

It defines a variable that is available within the enclosing shell and in all of its subprocesses. You can define system-wide variables, available to all users, in the global /etc/profile config (you need to be root to edit it), or user-specific ones in ~/.profile. Additionally, there are shell-specific configuration files, like ~/.bashrc for Bash or ~/.zshrc for ZSH. Unlike ~/.profile, they are read only when running that particular shell. Variables are accessed using the $KEY or ${KEY} syntax.

Variable names can consist of letters, digits, and underscores, and cannot start with a digit. By convention, they are written in uppercase. Names with and without underscores are used, e.g. LC_CTYPE, TMPDIR, PYTHONPATH. To view all the currently defined environment variables, use the env command.

To avoid problems with character encoding, non-ASCII data can be encoded using Base64. You can use the base64 command-line tool or the utilities available for your programming language.
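As a rough illustration of the Base64 approach in Python, the sketch below stores a non-ASCII string in an environment variable and reads it back. The variable name GREETING_B64 and the sample text are made up for this example.

```python
import base64
import os

# Hypothetical value containing non-ASCII characters.
original = "Zażółć gęślą jaźń"

# Encode to UTF-8 bytes first, then to Base64, which is plain ASCII text.
encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")
os.environ["GREETING_B64"] = encoded  # hypothetical variable name

# The consumer reverses the two steps.
decoded = base64.b64decode(os.environ["GREETING_B64"]).decode("utf-8")

print(encoded.isascii(), decoded == original)
```

Keep in mind that Base64 is an encoding, not encryption: anyone who can read the variable can decode it.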

2. Command-line usage

To define a shell variable from the command line, you can use a plain assignment, which creates a variable local to the shell:

$ NAME=Joe
$ echo "$NAME"
Joe

However, such a variable would not be available for the subprocesses running within the shell. Consider the simple hello.py script:

import os

name = os.environ['NAME']
print(f'Hello {name}!')

It will not recognize the local variable:

$ NAME=Joe
$ python hello.py
Traceback (most recent call last):
  File "hello.py", line 3, in <module>
    name = os.environ['NAME']
  File "[...]/os.py", line 675, in __getitem__
    raise KeyError(key) from None
KeyError: 'NAME'

For the variable to be visible within the subprocess, you either need to pass it inline:

$ NAME=Jenny python hello.py
Hello Jenny!
$ echo "$NAME"
Joe

or export it beforehand:

$ export NAME=Bob
$ python hello.py
Hello Bob!
$ echo "$NAME"
Bob
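The same distinction can be reproduced from Python when starting a subprocess. The sketch below is only an illustration: it launches a small child program twice, once with NAME removed from the environment and once with NAME added for that single child, which roughly mirrors the inline NAME=Jenny python hello.py form.

```python
import os
import subprocess
import sys

# Child program: prints NAME if it is set, or a placeholder otherwise.
child = [sys.executable, "-c", "import os; print(os.environ.get('NAME', '<unset>'))"]

# Start from a copy of the current environment without NAME ...
base_env = {k: v for k, v in os.environ.items() if k != "NAME"}
without_name = subprocess.run(child, env=base_env, capture_output=True, text=True)

# ... then add NAME for this one child only.
with_name = subprocess.run(
    child, env={**base_env, "NAME": "Jenny"}, capture_output=True, text=True
)

print(without_name.stdout.strip())  # <unset>
print(with_name.stdout.strip())     # Jenny
```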

3. Passing variables through the SSH connection

Environment variables can be passed through an SSH connection. One use case for this is that you can take your local configuration “with you” when connecting to a remote machine, rather than needing to configure it independently.

4. Accessing environment variables from the code

Programming languages commonly expose getters and setters for environment variables. In Go, there are os.Getenv and os.Setenv; in Python, there are the os.getenv and os.putenv functions, as well as the os.environ mapping object, which behaves almost like a Python dict but reads from and writes to the environment. Assigning to os.environ is generally preferable, since direct calls to os.putenv are not reflected in os.environ.
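A short sketch of how the Python accessors differ in practice. The names PORT and MISSING_VAR are hypothetical and are set up here only for the demonstration.

```python
import os

os.environ["PORT"] = "8080"          # hypothetical variable, set here for the demo
os.environ.pop("MISSING_VAR", None)  # make sure this one is absent

# Indexing raises KeyError for a missing variable ...
try:
    os.environ["MISSING_VAR"]
    missing_raises = False
except KeyError:
    missing_raises = True

# ... while os.getenv and os.environ.get return a default (None unless given).
fallback = os.getenv("MISSING_VAR", "default")

# Values are always strings, so numbers need an explicit conversion.
port = int(os.environ["PORT"])

print(missing_raises, fallback, port)
```

Indexing is a reasonable choice for required settings, because a missing variable fails loudly, while the getters with defaults suit optional ones.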

When using those functions, keep in mind that your program runs in a child process with its own copy of the environment, so setting or changing a variable affects that process (and any processes it starts), but not the parent shell:

$ python
>>> import os
>>> os.environ['NAME']
'Bob'
>>> os.environ['NAME'] = 'Joe'
>>> os.environ['NAME']
'Joe'
>>> exit()
$ echo "$NAME"
Bob

Forgetting this may lead to bugs when trying to pass information between programs via environment variables. This won’t work, because only the program’s own copy of the environment is modified, rather than the environment of the parent shell, as shown in the example above.
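The same effect can be observed entirely from Python. In this sketch, the parent sets NAME, a child process overwrites it in its own environment, and the parent’s value stays untouched.

```python
import os
import subprocess
import sys

os.environ["NAME"] = "Bob"

# The child overwrites NAME in its own environment and prints the new value.
child_code = "import os; os.environ['NAME'] = 'Joe'; print(os.environ['NAME'])"
result = subprocess.run(
    [sys.executable, "-c", child_code], capture_output=True, text=True
)

child_value = result.stdout.strip()
parent_value = os.environ["NAME"]

print(child_value, parent_value)  # Joe Bob
```

If a child needs to hand data back to its parent, it has to use another channel, such as its standard output (as above), a file, or the exit code.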

5. The .env files

Another popular solution is .env files. They are used for storing project-specific variables and can overwrite already defined variables. The format of a .env file is simple:

NAME=John
SURNAME=Doe
EMAIL=johndoe@example.com

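It can be loaded directly. Because the file contains plain assignments, turn on automatic exporting with set -a while sourcing it, so that the variables reach subprocesses:

$ set -a; source .env; set +a
$ python hello.py
Hello John!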

In many cases, this won’t be needed, as .env files are auto-loaded by various tools. If you are using the ZSH shell, the dotenv plugin auto-loads a .env file each time you enter a directory containing it. The same is done by virtual environment management tools such as Python’s Pipenv.

There are many solutions for using .env files from code, for example, python-dotenv or the much more sophisticated environs package for Python. Those packages will let you load and parse the .env files to read the configuration.
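To give a feel for what such packages do, here is a deliberately simplified parser written with the standard library only. It is not how python-dotenv or environs are implemented; the parse_dotenv function is a hypothetical helper that handles just KEY=value lines, comments, and one pair of surrounding quotes.

```python
def parse_dotenv(text):
    """Parse KEY=value lines; skip blank lines and '#' comments (simplified)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = """
# hypothetical project settings
NAME=John
SURNAME=Doe
EMAIL="johndoe@example.com"
"""

config = parse_dotenv(sample)
print(config)
```

The real packages also deal with details this sketch ignores, such as variable expansion and multi-line values, so they remain the better choice for actual projects.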

Since .env files often store sensitive data such as login credentials, it is good practice to list them in .gitignore, so as not to accidentally expose them in a git repository.

6. Docker containers and cloud apps

Environment variables can be hardcoded for Docker containers using the ENV key=value instruction in the Dockerfile. Such variables are accessible within the Dockerfile using the ${key} syntax. They can also be easily passed to a container using the -e argument:

$ docker run -e NAME=Bill debian:stable-slim bash -c 'echo Hi $NAME!'
Hi Bill!

Not only Heroku, but also Kubernetes and its derivatives, as well as various cloud platforms and services, let you pass environment variables to virtual machines, containers, workflows, etc. While the details differ depending on whether you are using Kubernetes, GitHub Actions, Jenkins, an MLOps platform, or something else, they usually let you define a list of key-value pairs for the variables in a YAML configuration file. Usually, there are two kinds of environment variables: the regular “env” ones and the “secret” ones. Secrets are implemented differently by different software, but they are commonly stored in encrypted form and are not directly accessible to users. This is not always the case: for example, by default there is nothing secret about secrets in Kubernetes. When passed to containers, they behave the same as regular environment variables. All this happens behind the scenes, but you should be aware of the difference.
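Since regular and secret variables look the same from inside the container, the application can read both in one place at startup. The sketch below is one possible way to do that; the variable names LOG_LEVEL and API_TOKEN and the Settings class are assumptions made for this example, not part of any platform’s API.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    log_level: str
    api_token: str = field(repr=False)  # keep the secret out of printed output


def load_settings(env=os.environ):
    # API_TOKEN and LOG_LEVEL are hypothetical names used for this sketch.
    token = env.get("API_TOKEN")
    if not token:
        # Fail early instead of running with a missing secret.
        raise RuntimeError("API_TOKEN is not set")
    return Settings(log_level=env.get("LOG_LEVEL", "INFO"), api_token=token)


# A plain dict stands in for the container's environment here.
settings = load_settings({"API_TOKEN": "s3cr3t"})
print(settings)
```

Excluding the secret from the repr is a small safeguard against leaking it into logs, which is worth having precisely because the platform no longer protects the value once it reaches the process.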