Logging Levels

Introduction

When you start to use a proper logging package, you first need to understanding what logging levels are.

Logging levels allow you to categorize the type of message you want to communicate. When we used the cat() function to print out different steps in our code, you cannot easily tell if what your message is communicating is just informational, a warning, or perhaps even a error message. It may be clear to you, but it also needs to be clear to anyone else that may be interacting with your code. And writing cat("INFO: An informational message") is not very efficient.

Another drawback to using only cat() is that maybe you want control over what messages get printed out based on some minimum threshold. In other words, maybe you have several informational messages in your code, but you only really need to see that output when you’re developing, and not when you’ve pushed your code to production. Having to delete those messages is inconvenient in case you need them again.

Logging levels allows you to:

  1. Define the type of message
  2. Define a threshold for what messages display and when.

Let’s get into examples with log4r

suppressPackageStartupMessages(
  library(log4r)
)
# We need to create a logger object. 
# By default, logs will print to console 
# and only show messages INFO and above

logger <- logger()

log4r::debug(logger, "This is a debug message")
log4r::info(logger, "This is an informational message")
log4r::warn(logger, "This is a warning message")
log4r::error(logger, "This is an error message")
INFO  [2023-09-30 22:04:35] This is an informational message
WARN  [2023-09-30 22:04:35] This is a warning message
ERROR [2023-09-30 22:04:35] This is an error message

See how in the above example, we have four different logging levels (debug, info, warn, and error). Only three of them printed out something. That’s because our logger object was left to the default logger() object of INFO.

Let’s define it to print DEBUG so we can get that first debug message printed out.

library(log4r)

logger <- logger(threshold = "DEBUG")

log4r::debug(logger, "This is a debug message")
log4r::info(logger, "This is a informational message")
log4r::warn(logger, "This is a warning message")
log4r::error(logger, "This is an error message")
DEBUG [2023-09-30 22:04:35] This is a debug message
INFO  [2023-09-30 22:04:35] This is a informational message
WARN  [2023-09-30 22:04:35] This is a warning message
ERROR [2023-09-30 22:04:35] This is an error message
You do not need to define `threshold` statistically in your code. You can also define it as an environment variable, which gives you more flexibility to change the threshold depending on your needs, without changing the code itself.

What Logging Levels Exist?

We’ve gotten into a few logging levels, but there are a few more to be aware of. And what do they mean anyways?

A great resource to start off with is actually from the python logging package.. I’ve also included a few examples of my own below.

Level Description Example Code Example
DEBUG Messages that are helpful in providing an exceptional level of detail that you normally would not need to see, especially for debugging purposes Printing out the output of something mid-way through a series of processing steps, just to check that things look right. debug(logger, paste0("What my df looks like before processing: ", df))
INFO Describes steps or processes happening in your code If you are connecting to a database, you may have a message saying “Connecting to external database” info(logger, "Attempting to connect to database")
WARNING Outputs a message that is recoverable and does not affect the final output of your code Maybe a step involves changing the data type (numeric to character, for example), assuming it’s okay to change it. It shouldn’t cause problems, but you should let your user know. warn(logger, "Ambiguous date format. Assuming date format is mm-dd-yyyy.")
ERROR Something bad happened, but there is an alternative solution and is recoverable. You attempt to connect to a database, but it failed, so it’s using a different backup database instead. error(logger, "Could not connect to database. Used secondary database connection instead.")
FATAL Something bad happened, and it is not recoverable. The program must quit. You cannot connect to any databases, and it’s required for your program to run, and there are no alternatives. fatal(logger, "Could not establish connection to primary or secondary database. Stopping process.")

What Should You Log?

Determining what you should log, and how much, is really dependent on you and your own situation. Some organizations who have logging policies may already have formed their own opinions based on experience and problems they’ve uniquely encountered during their deployments.

One cautionary story from Jeff Atwood (among many things, software developer and co-founder of Stack Overflow) discusses The Problem With Logging. He recalls a time when logging itself was actually the source of buggy code when the server experienced heavy load (logging and database writing resulting in deadlocks). As a result, Stack Overflow logs only exception handling (e.g., result of tryCatch).

Not everyone is building a website that will be heavily visited like Stack Overflow, so you may not even experience deadlocks. Maybe your API that you are logging is only receiving requests 10 times a day. Does that mean you should log every single outcome? Probably not. But having some informational messages at key and major parts of your code may by useful.

In the next section I will discuss more concrete examples of logging and implementing it into your code, especially for exception handling.