CAVEAT: This is pre-release software. There isn’t even a NuGet package for it yet. The GitHub repo lives at https://github.com/dotnet-architecture/HealthChecks; I cloned it and manually added a project reference to it from a new demo app. There is a plan to publish it as a NuGet package, but that hasn’t happened yet.
With that said – I will keep this post updated as I see changes being made, and will eventually remove the Caveat above.
Register your Health Checks in your ConfigureServices method.
Inject an IHealthCheckService and call CheckHealthAsync to run all your Health Checks, or another method like RunGroupAsync to run a subset of the Health Checks you registered in ConfigureServices.
What is a Health Check?
Health Checks are pretty much what the name implies. They are a way of checking to see if your application is healthy. Health Checks have become especially critical as more and more applications move to a microservice-style architecture. While microservice architectures have many benefits, one of the downsides is the higher operational overhead of ensuring all of these services are running. Rather than monitoring the health of one majestic monolith, you need to monitor the status of many different services, which are usually responsible for one thing and one thing only. Health Checks are usually used in combination with a service discovery tool such as Consul, which monitors your microservices as they become healthy and unhealthy. If you use Consul for service discovery as well, Consul will automatically route traffic away from your unhealthy microservices and only serve traffic to your healthy microservices… which is awesome.
How do I implement a Health Check?
There are a few different ways to do Health Checks, but the most common way is exposing an HTTP endpoint in your application that is dedicated to Health Checks. Typically you will return a status code of 200 if everything is good, and any non-2xx code means something went wrong. For example, you might return a 500 along with a JSON payload describing exactly what went wrong.
Common scenarios to Health Check
What you Health Check will depend on what your application/microservice does, but some common things to check are:
- Can my service connect to a database?
- Can my service query a 3rd party API?
  - Likely by making some read-only call
- Can my service access the file system?
- Is memory and/or CPU usage above a certain threshold?
Looking at the Microsoft.AspNetCore.HealthChecks package
Microsoft is on the verge of shipping a set of Health Check packages to help you solve this problem in a consistent way. If you look in the GitHub repo you will notice there is also a package for ASP.NET 4.x under the Microsoft.AspNet.HealthChecks namespace. The samples folder in that GitHub repo shows how to wire that up if you’re interested in ASP.NET 4.x. I’m going to focus on the ASP.NET Core package for this blog post.
The Microsoft.AspNetCore.HealthChecks package targets netcoreapp1.0, but I suspect this will change, possibly to netstandard2.0, by the time this RTM’s. The ASP.NET 4 project targets net461, and all the other libraries target netstandard1.3, which works with both .NET Core and Full Framework.
The basic flow is that you register your health checks in your IoC container of choice (such as the built-in Microsoft one; I prefer SimpleInjector for its fantastic feature set, blazing speed, and ridiculously good documentation, but I’ll just use the built-in one for these demos). You register these Health Checks via a fluent HealthCheckBuilder API in your ConfigureServices method. This HealthCheckBuilder will build a HealthCheckService and register it as an IHealthCheckService in your IoC container.
That looks something like this:
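A minimal sketch of that registration. It assumes the pre-release library exposes an AddHealthChecks extension on IServiceCollection that hands you the HealthCheckBuilder, and that its types live in the Microsoft.Extensions.HealthChecks namespace; both are assumptions about unreleased code and may change.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // The callback receives the fluent HealthCheckBuilder.
        // Every check added here ends up behind IHealthCheckService.
        services.AddHealthChecks(checks =>
        {
            checks.AddUrlCheck("https://github.com");
        });

        services.AddMvc();
    }
}
```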
So to run your Health Checks, you inject an IHealthCheckService into your Controller and then call CheckHealthAsync.
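A sketch of such a Controller, under the same assumptions about the pre-release API. The HealthController name and the "health" route are hypothetical choices for this demo, and treating anything other than Healthy as a 500 is just one possible policy.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

[Route("health")] // hypothetical route
public class HealthController : Controller
{
    private readonly IHealthCheckService _healthCheckService;

    // IHealthCheckService is resolved from the IoC container.
    public HealthController(IHealthCheckService healthCheckService)
    {
        _healthCheckService = healthCheckService;
    }

    [HttpGet]
    public async Task<IActionResult> Get()
    {
        // Runs every registered Health Check and summarizes the outcome.
        var result = await _healthCheckService.CheckHealthAsync();

        // Anything other than Healthy is reported as a server error.
        if (result.CheckStatus != CheckStatus.Healthy)
        {
            return StatusCode(500, result.CheckStatus.ToString());
        }

        return Ok(result.CheckStatus.ToString());
    }
}
```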
There are a few things to note here:
- You get back a CompositeHealthCheckResult, which is a summary of all of the health checks that you registered in your AddHealthChecks call in your ConfigureServices method.
- The CompositeHealthCheckResult class has a CheckStatus property which is an enum. That enum has 4 options: Healthy, Unhealthy, Warning, and Unknown. You can determine what you want to do with each of those. In my simple example above, I consider anything other than Healthy to be a problem and return a 500 if it’s not Healthy.
- You can also loop over the results of the CompositeHealthCheckResult by looking at the Results property and get even more detail about what exactly happened.
- You can optionally run a single Health Check by calling RunCheckAsync and supplying the name of the Health Check that you registered in your ConfigureServices method (more on that later).
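Looping over the Results property might look like the sketch below. It assumes Results is a collection of key/value pairs keyed by check name, where each value exposes a CheckStatus; that shape is an assumption about the pre-release API. The HealthReport helper is hypothetical.

```csharp
using System.Text;
using System.Threading.Tasks;
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

public static class HealthReport // hypothetical helper for this demo
{
    public static async Task<string> DescribeAsync(IHealthCheckService healthCheckService)
    {
        var composite = await healthCheckService.CheckHealthAsync();
        var report = new StringBuilder();

        // Assumption: each entry pairs a check name with that check's individual result.
        foreach (var entry in composite.Results)
        {
            report.AppendLine($"{entry.Key}: {entry.Value.CheckStatus}");
        }

        return report.ToString();
    }
}
```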
Out of the box Health Checks
Microsoft ships quite a few Health Checks out of the box that fit into the Common Scenarios section above. They are:
- URL Health Check via AddUrlCheck
- SQL Server Health Check
- PrivateMemorySizeCheck
- VirtualMemorySizeCheck
- WorkingSetCheck
- A few Azure Health Checks (such as BLOB Storage, Table Storage, File Storage, and Queue storage).
Let’s take a look at the URL Health Check and the SQL Server Health Check.
URL Health Checks
The URL Health Check lets you specify a URL and then it will execute a GET to that URL and see if the URL returns a Success Status Code or not (any 2xx Status Code like 200).
You can register the URL Health Check by adding this.
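A sketch of that registration, assuming an AddUrlCheck overload that takes the URL followed by a cache duration. The exact overload is an assumption about the pre-release API.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddHealthChecks(checks =>
        {
            // First argument: the URL to GET.
            // Second argument: how long the result is cached (assumed overload).
            checks.AddUrlCheck("https://github.com", TimeSpan.FromMilliseconds(1));
        });

        services.AddMvc();
    }
}
```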
Then you inject your IHealthCheckService and call CheckHealthAsync as shown above. If you want to run just this single Health Check, and not others you may have registered, you’ll need to know that the name is not configurable. The name will be UrlCheck(https://github.com). So you would run that single check by passing that name to RunCheckAsync.
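A sketch of running only that check, assuming RunCheckAsync accepts the check name as a string and returns a result with a CheckStatus. The UrlHealthProbe class is a hypothetical wrapper used only to keep the example self-contained.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

public class UrlHealthProbe // hypothetical helper for this demo
{
    private readonly IHealthCheckService _healthCheckService;

    public UrlHealthProbe(IHealthCheckService healthCheckService)
    {
        _healthCheckService = healthCheckService;
    }

    public async Task<bool> IsGitHubReachableAsync()
    {
        // The name is generated by the library from the URL; it cannot be chosen.
        var result = await _healthCheckService.RunCheckAsync("UrlCheck(https://github.com)");
        return result.CheckStatus == CheckStatus.Healthy;
    }
}
```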
Another thing to note: the second parameter, where I’m passing TimeSpan.FromMilliseconds(1), is the CacheDuration of the HealthCheckResult. The default is 5 minutes. So if you have some other service (like Consul) pinging your Health Check endpoint every minute, the HealthCheckResult will be the same for 5 minutes until the CacheDuration expires. To me, that doesn’t make a ton of sense, and I don’t want to risk a delay of up to 5 minutes before being notified that my service has become unhealthy. So by only adding a 1 millisecond cache, I’m effectively adding no caching at all.
There is also another parameter to the AddUrlCheck method where you can pass a Func to the URL Checker. This is nice in scenarios such as:
- You want to execute something other than a GET.
- You need to do something special with the HttpRequest in general, such as adding Auth Headers.
- You want to validate that the response’s Content contains some specific words or HTML.
So the URL Check should satisfy just about any Web check you could possibly want to do with that flexibility.
Built-in SQL Server Health Checks
The SQL Check lets you specify a name and a connection string to connect to.
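A sketch of registering the SQL check. The AddSqlCheck method name is my assumption about the pre-release API, and the connection string is a placeholder for a hypothetical local database.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

public class Startup
{
    // Placeholder connection string for a hypothetical local database.
    private const string DemoConnectionString =
        @"Server=(localdb)\mssqllocaldb;Database=DemoDb;Trusted_Connection=True;";

    public void ConfigureServices(IServiceCollection services)
    {
        services.AddHealthChecks(checks =>
        {
            // First argument: the name of the check. Second: the connection string.
            checks.AddSqlCheck("SQL DB Check", DemoConnectionString);
        });

        services.AddMvc();
    }
}
```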
The first parameter, “SQL DB Check”, is just the name I chose. You can make it whatever makes sense to you. To run only this check, as mentioned above, you would pass that name to RunCheckAsync.
Making your own custom Health Check
You can of course make your own custom Health Check. For me, most of my use cases are solved by the built-in ones, as I’m usually checking if an API is available (which I could do with the URL Check and overriding the checkFunc parameter) or I’m checking to see if a SQL Server is available. But you could implement your own if you are missing some functionality that you need, such as checking if another DB store is available or how much free space a drive has.
To do that, create a class that implements the IHealthCheck interface. Below is an example of one that checks to make sure the C drive has at least 1 GB of free space.
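A sketch of such a check. The CheckAsync signature and the HealthCheckResult.Healthy / HealthCheckResult.Unhealthy factory methods are assumptions about the pre-release API, and the class name is hypothetical.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

public class CDriveHasMoreThan1GbFreeHealthCheck : IHealthCheck
{
    private const long OneGigabyte = 1024L * 1024L * 1024L;

    // Assumed interface member: returns the result wrapped in a ValueTask.
    public ValueTask<IHealthCheckResult> CheckAsync(
        CancellationToken cancellationToken = default(CancellationToken))
    {
        // Free space available to the current user on the C drive, in bytes.
        long freeBytes = new DriveInfo("C").AvailableFreeSpace;

        IHealthCheckResult result = freeBytes >= OneGigabyte
            ? HealthCheckResult.Healthy($"C drive has {freeBytes} bytes free.")
            : HealthCheckResult.Unhealthy($"C drive only has {freeBytes} bytes free.");

        return new ValueTask<IHealthCheckResult>(result);
    }
}
```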
Then in your ConfigureServices method, register the custom Health Check with the lifestyle that makes sense for the Health Check (Singleton, Scoped, Transient, etc.) and then add it to the AddHealthChecks registration that we’ve done before.
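A sketch of that registration. It assumes a generic AddCheck overload that takes the check name, and it assumes a class named CDriveHasMoreThan1GbFreeHealthCheck implementing IHealthCheck already exists in the project; the class name and the check name are hypothetical.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Register the custom check itself so the container can create it.
        // Singleton is used here because the check holds no state.
        services.AddSingleton<CDriveHasMoreThan1GbFreeHealthCheck>();

        services.AddHealthChecks(checks =>
        {
            // Assumed overload: resolves the check from the container under the given name.
            checks.AddCheck<CDriveHasMoreThan1GbFreeHealthCheck>("C Drive has more than 1 GB free");
        });

        services.AddMvc();
    }
}
```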
Group your health checks together
You can group your health checks together in a HealthCheckGroup if you want (for example, all performance checks like CPU, Memory, Disk Space, etc. go under a group called “performance”), or you can let them live on their own and mix and match.
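A sketch of a “performance” group. The AddHealthCheckGroup, AddPrivateMemorySizeCheck, and AddWorkingSetCheck method names, and the idea that the memory checks take a maximum size in bytes, are assumptions about the pre-release API. The thresholds are arbitrary demo values.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

public class Startup
{
    // Arbitrary demo thresholds, assumed to be expressed in bytes.
    private const long MaxPrivateMemoryBytes = 500L * 1024L * 1024L;
    private const long MaxWorkingSetBytes = 500L * 1024L * 1024L;

    public void ConfigureServices(IServiceCollection services)
    {
        services.AddHealthChecks(checks =>
        {
            // A check that lives on its own, outside any group.
            checks.AddUrlCheck("https://github.com");

            // Checks added inside the callback belong to the "performance" group.
            checks.AddHealthCheckGroup("performance", group =>
            {
                group.AddPrivateMemorySizeCheck(MaxPrivateMemoryBytes);
                group.AddWorkingSetCheck(MaxWorkingSetBytes);
            });
        });

        services.AddMvc();
    }
}
```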
This enables you to do things like run only that Group of Health Checks via the RunGroupAsync method on IHealthCheckService.
Reminder – this is demo code. One flaw is that my Health Check endpoint is unsecured and anyone can hit it. You will likely want to secure your Health Check endpoint, especially if it is on the Internet, so someone doesn’t spam it. There are many ways to do this, but they are outside the scope of this blog post.
Some Feedback on the Design
Overall I think this abstraction is really useful, and I will use it myself once it RTM’s. The built-in health checks are nice, so that you don’t have to write that logic yourself. I’m all about punting as much logic onto someone else as possible.
There are some little things I wish were a little easier though.
- It seems like the HealthCheckResult.CheckStatus == CheckStatus.Healthy code is going to be extremely common. It’d be nice if there was a helper property such as HealthCheckResult.IsHealthy which does that computation for you, much like HttpResponseMessage has its IsSuccessStatusCode property, which is super useful. Although, I understand that “Healthy” is a relative term that’s tough to globally define. Some people might think that CheckStatus.Warning would qualify as being Healthy and others wouldn’t. Ubiquitous languages are hard.
- I wish there was a way you could override the name for the built-in Health Checks. For example, the URL Health Check automatically takes on the name UrlCheck(http://google.com). You’ll need this name if you want to pull out the specific results of a Health Check. I had expected to be able to specify the name of each Health Check and store it in something like a HealthCheckConstants class for easy retrieval. Instead, I need to follow this naming convention in the constants class, which isn’t the end of the world, but being able to override the name would be nice.
- When calling the RunGroupAsync method, I wish you could just specify the group name rather than the HealthCheckGroup instance and let the RunGroupAsync method handle getting the HealthCheckGroup instance.
- There should be no Cache Duration on the URL Check. 5 minutes is just too long of a default, and IMO the result should not be cached at all. I control how often my Health Check monitoring service hits my health check endpoint. If I want it to check my service every minute, then I’d expect the Health Check result to be fresh every time.
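Until something like IsHealthy exists, the first wish above can be approximated with an extension method. This is a sketch that assumes results implement an IHealthCheckResult interface exposing CheckStatus. C# has no extension properties, so it has to be a method, and whether Warning should count as healthy is left to you.

```csharp
using Microsoft.Extensions.HealthChecks; // assumed namespace of the pre-release library

public static class HealthCheckResultExtensions // hypothetical helper, not part of the library
{
    // Treats only CheckStatus.Healthy as healthy; Warning is deliberately excluded.
    public static bool IsHealthy(this IHealthCheckResult result)
    {
        return result.CheckStatus == CheckStatus.Healthy;
    }
}
```

With that in place, a Controller could write result.IsHealthy() instead of repeating the enum comparison.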
Overall, I really like this package, and it seems like it’s going to be really useful. I plan on using this when it RTM’s, so I’ll keep this post up to date when I see they make changes to this package.