Updated 7/17/2018: This package will be released in ASP.NET Core 2.2, which is currently scheduled to come out in Q4 2018.
CAVEAT: This is pre-release software. There isn't even a NuGet package for it yet. The GitHub repo lives at https://github.com/dotnet-architecture/HealthChecks; I cloned it and manually added a project reference to it from a new demo app. There is a plan to turn it into a NuGet package, but that hasn't happened yet.
With that said, I will keep this post updated as I see changes being made, and will eventually remove the caveat above.
tl;dr
Register your Health Checks in your `ConfigureServices` method. Inject an `IHealthCheckService` and call `CheckHealthAsync` to run all your Health Checks, or call another method like `RunCheckAsync` or `RunGroupAsync` to run a subset of the Health Checks you registered in `ConfigureServices`.
What is a Health Check?
Health Checks are pretty much what the name implies: a way of checking whether your application is healthy. They have become especially critical as more and more applications move to a microservice-style architecture. While microservice architectures have many benefits, one of the downsides is the higher operational overhead of making sure all of these services are running. Rather than monitoring the health of one majestic monolith, you need to monitor the status of many different services, each usually responsible for one thing and one thing only. Health Checks are often used in combination with a service discovery tool such as Consul, which monitors your microservices and notices when they become healthy or unhealthy. If you use Consul for service discovery as well, Consul will automatically route traffic away from your unhealthy microservices and only send traffic to your healthy ones… which is awesome.
How do I implement a Health Check?
There are a few different ways to do Health Checks, but the most common is to expose an HTTP endpoint in your application that is dedicated to Health Checks. Typically you return a status code of 200 if everything is good, and any non-2xx code means something went wrong. For example, you might return a 500 along with a JSON payload describing exactly what went wrong.
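The package itself is .NET, but the shape of such an endpoint is language-agnostic. Here's a rough Python sketch (standard library only, not the ASP.NET Core API) of an endpoint that runs a set of checks and answers 200 or 500 with a JSON payload. The `make_health_app` name and the `(healthy, message)` result tuple are my own assumptions for illustration.

```python
import json
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical shape of a check: a zero-argument callable returning (healthy, message).
Check = Callable[[], Tuple[bool, str]]


def make_health_app(checks: Dict[str, Check]):
    """Return a WSGI app that runs every check and answers 200 or 500 with a JSON body."""

    def app(environ, start_response) -> Iterable[bytes]:
        results = {}
        for name, check in checks.items():
            try:
                healthy, message = check()
            except Exception as exc:  # a check that crashes is reported as unhealthy
                healthy, message = False, f"check raised {exc!r}"
            results[name] = {"healthy": healthy, "message": message}

        # Any single unhealthy check turns the whole response into a 500.
        all_healthy = all(r["healthy"] for r in results.values())
        status = "200 OK" if all_healthy else "500 Internal Server Error"
        body = json.dumps({"healthy": all_healthy, "results": results}).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    return app
```

To try it locally, you could pass the app to `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` and call `serve_forever()` on the result. A check that throws is reported as unhealthy rather than crashing the endpoint, which seems like the safer default for something a monitor polls.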
Common scenarios to Health Check
What you Health Check will depend on what your application/microservice does, but some common things are:
- Can my service connect to a database?
- Can my service query a 3rd party API?
  - Likely by making some read-only call
- Can my service access the file system?
- Is memory and/or CPU usage above a certain threshold?
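To make a couple of those scenarios concrete, here's a hedged Python sketch of a file system check and a generic threshold check for memory, CPU, or anything else you can sample. The function names and the idea of passing in a `read_value` callable are hypothetical; you would plug in whatever metric source your platform offers.

```python
import tempfile
from typing import Callable, Tuple

CheckResult = Tuple[bool, str]


def file_system_check(directory: str) -> CheckResult:
    """Healthy if we can create, write, and delete a temp file in `directory`."""
    try:
        # The temp file is deleted automatically when the with-block closes it.
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"health-check")
            handle.flush()
        return True, f"{directory} is writable"
    except OSError as exc:
        return False, f"cannot write to {directory}: {exc}"


def make_threshold_check(read_value: Callable[[], float], limit: float,
                         unit: str) -> Callable[[], CheckResult]:
    """Healthy while the sampled value (memory, CPU, ...) stays at or below `limit`."""

    def check() -> CheckResult:
        value = read_value()  # hypothetical metric source supplied by the caller
        return value <= limit, f"{value:g} {unit} (limit {limit:g} {unit})"

    return check
```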
Looking at the Microsoft.AspNetCore.HealthChecks package
Microsoft is on the verge of shipping a set of Health Check packages to help you solve this problem in a consistent way. If you look in the GitHub repo, you will notice there is also a package for ASP.NET 4.x under the Microsoft.AspNet.HealthChecks namespace. The repo's samples folder shows how to wire that up if you're interested in ASP.NET 4.x. I'm going to focus on the ASP.NET Core package for this blog post.
The `Microsoft.AspNetCore.HealthChecks` package targets `netcoreapp1.0`, but I suspect this will change to either `netcoreapp2.0` or `netstandard2.0` by the time it RTMs. The ASP.NET 4 project targets `net461`, and all the other libraries target `netstandard1.3`, which works with both .NET Core and the full .NET Framework.
Getting Started
The basic flow is that you register your health checks in your IoC container of choice (such as the built-in Microsoft one; I prefer SimpleInjector for its fantastic feature set, blazing speed, and ridiculously good documentation, but I'll use the built-in one for these demos). You register these Health Checks via a fluent `HealthCheckBuilder` API in your `Startup`'s `ConfigureServices` method, by calling `AddHealthChecks` and adding the individual checks to the builder. This `HealthCheckBuilder` will build a `HealthCheckService` and register it as an `IHealthCheckService` in your IoC container.
So to run your Health Checks, you inject an `IHealthCheckService` into your Controller and then call `CheckHealthAsync`.
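For readers who don't live in .NET, here's a rough Python analogue of that flow: a fluent builder that collects named checks and builds a service you can run all at once (like `CheckHealthAsync`) or one at a time by name (like `RunCheckAsync`). These classes are illustrative stand-ins, not the package's API, and rejecting duplicate names is my own choice.

```python
from typing import Callable, Dict, Tuple

CheckResult = Tuple[bool, str]
Check = Callable[[], CheckResult]


class HealthCheckService:
    """Runs the checks it was built with (a Python stand-in, not the .NET class)."""

    def __init__(self, checks: Dict[str, Check]):
        self._checks = dict(checks)

    def check_health(self) -> Dict[str, CheckResult]:
        """Run every registered check, similar in spirit to CheckHealthAsync."""
        return {name: check() for name, check in self._checks.items()}

    def run_check(self, name: str) -> CheckResult:
        """Run a single check by its registered name, similar to RunCheckAsync."""
        if name not in self._checks:
            raise KeyError(f"no health check registered as {name!r}")
        return self._checks[name]()


class HealthCheckBuilder:
    """Fluent registration: each add_check returns the builder so calls can be chained."""

    def __init__(self):
        self._checks: Dict[str, Check] = {}

    def add_check(self, name: str, check: Check) -> "HealthCheckBuilder":
        if name in self._checks:
            raise ValueError(f"duplicate health check name {name!r}")
        self._checks[name] = check
        return self

    def build(self) -> HealthCheckService:
        return HealthCheckService(self._checks)
```

Usage looks like `HealthCheckBuilder().add_check("ok", ok_fn).add_check("disk", disk_fn).build()`, after which whatever serves your endpoint only needs the built service.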
There are a few things to note here:
- You get back a `CompositeHealthCheckResult`, which is a summary of all of the health checks you registered in the `AddHealthChecks` method in your `Startup` class.
- That `CompositeHealthCheckResult` class has a `CheckStatus` property, which is an enum with 4 options: `Healthy`, `Unhealthy`, `Warning`, and `Unknown`. You can decide what you want to do with each of those. In my simple demo, I consider anything other than `Healthy` to be a problem and return a 500 if it's not `Healthy`.
You can also loop over the `Results` property of the `CompositeHealthCheckResult` to get even more detail about what exactly happened.
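Here's a Python sketch of the status enum and a composite result. The four statuses come from the package, but the roll-up rule is my own assumption, not documented library behavior. In this sketch, if every check agrees the composite takes that status. A mix that includes Healthy becomes Warning, any other mix becomes Unhealthy, and no results at all gives Unknown. It also mirrors the demo's "anything but Healthy is a 500" mapping and loops over the individual results for detail.

```python
from enum import Enum
from typing import Dict, List, Tuple


class CheckStatus(Enum):
    UNKNOWN = "Unknown"
    UNHEALTHY = "Unhealthy"
    HEALTHY = "Healthy"
    WARNING = "Warning"


class CompositeResult:
    """Python stand-in for a composite result; the roll-up rule below is an assumption."""

    def __init__(self, results: Dict[str, Tuple[CheckStatus, str]]):
        self.results = results  # name -> (status, description)

    @property
    def check_status(self) -> CheckStatus:
        statuses = {status for status, _ in self.results.values()}
        if not statuses:
            return CheckStatus.UNKNOWN      # nothing ran, so we can't say
        if len(statuses) == 1:
            return statuses.pop()           # every check agrees
        if CheckStatus.HEALTHY in statuses:
            return CheckStatus.WARNING      # partially healthy
        return CheckStatus.UNHEALTHY


def to_http_status(result: CompositeResult) -> int:
    """Like the demo: anything other than Healthy is treated as a problem."""
    return 200 if result.check_status is CheckStatus.HEALTHY else 500


def describe(result: CompositeResult) -> List[str]:
    """Loop over the individual results to report what exactly happened."""
    return [f"{name}: {status.value} ({description})"
            for name, (status, description) in result.results.items()]
```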
You can optionally run a single Health Check by calling `RunCheckAsync` and supplying the name of the Health Check that you registered in your `ConfigureServices` method (more on that later).
Out of the box Health Checks
Microsoft ships quite a few Health Checks out of the box that fit into the Common Scenarios section above. They are:
- URL Health Check via `AddUrlCheck`
- SQL Server Health Check via `AddSqlCheck`
- PrivateMemorySizeCheck via `AddPrivateMemorySizeCheck`
- VirtualMemorySizeCheck via `AddVirtualMemorySizeCheck`
- WorkingSetCheck via `AddWorkingSetCheck`
- A few Azure Health Checks (such as Blob Storage, Table Storage, File Storage, and Queue Storage).
Let’s take a look at the URL Health Check and the SQL Server Health Check.
URL Health Checks
The URL Health Check lets you specify a URL; it then executes a `GET` against that URL and checks whether it returns a success status code (any 2xx status code, like 200).
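Conceptually, the URL check boils down to something like this Python sketch. It uses only the standard library, and `make_url_check` is a hypothetical name, not the package's API. Note that `urllib` follows redirects, so a redirect that ends in a 200 counts as healthy here.

```python
import urllib.error
import urllib.request
from typing import Callable, Tuple

CheckResult = Tuple[bool, str]


def make_url_check(url: str, timeout: float = 5.0) -> Callable[[], CheckResult]:
    """Healthy when a GET to `url` comes back with any 2xx status code."""

    def check() -> CheckResult:
        request = urllib.request.Request(url, method="GET")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                code = response.status
        except urllib.error.HTTPError as err:  # 4xx/5xx responses surface as HTTPError
            code = err.code
        except (urllib.error.URLError, OSError) as err:  # DNS failure, refused, timeout
            return False, f"GET {url} failed: {err}"
        return 200 <= code < 300, f"GET {url} returned {code}"

    return check
```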
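You register the URL Health Check by calling `AddUrlCheck` inside your `AddHealthChecks` registration, passing it the URL to check (https://github.com in my demo) and a cache duration.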
Then you inject your `IHealthCheckService` and call `CheckHealthAsync` as described above. If you want to run just this single Health Check, and not others you may have registered, you'll need to know that the name is not configurable. The name will be `UrlCheck(https://github.com)`, so you would run that single check with `RunCheckAsync("UrlCheck(https://github.com)")`.
Another thing to note: the second parameter, where I'm passing `TimeSpan.FromMilliseconds(1)`, is the `CacheDuration` of the `HealthCheckResult`. The default is 5 minutes. So if you have some other service (like Consul) pinging your Health Check endpoint every minute, the `HealthCheckResult` will stay the same for 5 minutes until the `CacheDuration` expires. To me, that doesn't make a ton of sense, and I don't want to risk a delay of up to 5 minutes in being notified when my service becomes unhealthy. By adding only a 1 millisecond cache, I'm effectively adding no caching at all.
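To see why a long cache duration hides outages, here's a small Python sketch of a caching wrapper with an injectable clock. `CachedCheck` is a hypothetical name, not the package's implementation. With a 300-second (5-minute) duration, a failure that starts right after a successful check stays invisible until the cache expires. A duration of 0 re-runs the check on every call.

```python
import time
from typing import Callable, Optional, Tuple

CheckResult = Tuple[bool, str]


class CachedCheck:
    """Re-use a check's last result until `cache_duration` seconds have passed."""

    def __init__(self, check: Callable[[], CheckResult], cache_duration: float,
                 clock: Callable[[], float] = time.monotonic):
        self._check = check
        self._cache_duration = cache_duration
        self._clock = clock  # injectable so the expiry logic can be tested
        self._cached: Optional[CheckResult] = None
        self._cached_at = 0.0

    def __call__(self) -> CheckResult:
        now = self._clock()
        if self._cached is None or now - self._cached_at >= self._cache_duration:
            self._cached = self._check()  # cache empty or expired: run the real check
            self._cached_at = now
        return self._cached
```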
There is also another parameter to the `AddUrlCheck` method where you can pass a `Func` to the URL checker. This is nice in scenarios such as:
- You want to execute something other than a `GET`.
- You need to do something special with the `HttpRequest` in general, such as adding auth headers.
- You want to validate that the response's content contains some specific words or HTML.
With that flexibility, the URL Check should satisfy just about any web check you could want to do.
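Here's a hedged Python sketch of that kind of customization: a URL check that lets you choose the HTTP method, add headers such as an auth token, and pass your own `validate` callback that inspects the status code and body. The names and parameters are hypothetical, and `validate` only loosely plays the role of the .NET `Func`.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Tuple

CheckResult = Tuple[bool, str]
# Hypothetical callback: receives (status_code, body_text) and decides health.
Validator = Callable[[int, str], CheckResult]


def make_custom_url_check(url: str, validate: Validator, method: str = "GET",
                          headers: Optional[Dict[str, str]] = None,
                          timeout: float = 5.0) -> Callable[[], CheckResult]:
    """URL check with a caller-chosen verb, extra headers, and a custom validator."""

    def check() -> CheckResult:
        request = urllib.request.Request(url, method=method, headers=headers or {})
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                status = response.status
                body = response.read().decode("utf-8", errors="replace")
        except urllib.error.HTTPError as err:
            status, body = err.code, ""  # non-2xx: let the validator see the code
        except (urllib.error.URLError, OSError) as err:
            return False, f"{method} {url} failed: {err}"
        return validate(status, body)

    return check
```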
Built-in SQL Server Health Checks
The SQL Check lets you specify a name and a connection string to connect to.
The first parameter to `AddSqlCheck`, "SQL DB Check", is just the name I chose; you can make it whatever makes sense to you. To run just this check, as mentioned above, you would call `RunCheckAsync("SQL DB Check")` on your `IHealthCheckService`.
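Python's standard library has no SQL Server driver, so this sketch uses `sqlite3` as a stand-in to show the idea: open a connection, run `SELECT 1`, and report unhealthy on any database error. `make_sql_check` is a hypothetical name. Registering the check under the name "SQL DB Check" mirrors the demo.

```python
import sqlite3
from typing import Callable, Tuple

CheckResult = Tuple[bool, str]


def make_sql_check(connection_string: str, timeout: float = 5.0) -> Callable[[], CheckResult]:
    """Healthy if we can open a connection and run a trivial query (sqlite3 as a stand-in)."""

    def check() -> CheckResult:
        try:
            connection = sqlite3.connect(connection_string, timeout=timeout)
            try:
                (value,) = connection.execute("SELECT 1").fetchone()
            finally:
                connection.close()  # never leak the connection, even on failure
        except sqlite3.Error as exc:
            return False, f"database unavailable: {exc}"
        return value == 1, "SELECT 1 succeeded"

    return check


checks = {"SQL DB Check": make_sql_check(":memory:")}
```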
Making your own custom Health Check
You can of course make your own custom Health Check. For me, most of my use cases are solved by the built-in ones, as I'm usually checking whether an API is available (which I could do with the URL Check by overriding the `checkFunc` parameter) or whether a SQL Server is available. But you could implement your own if you are missing some functionality you need, such as checking whether another DB store is available or how much free space a drive has.
To do that, create a class that implements the `IHealthCheck` interface. In my demo, I wrote one that checks that the C drive has at least 1 GB of free space.
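My demo's check was a C# class, but the logic is simple enough to sketch in Python with `shutil.disk_usage`. The `DriveFreeSpaceCheck` name, the `C:\` default, and treating 1 GB as 1024³ bytes are my assumptions; pass a different path on Linux or macOS.

```python
import shutil
from typing import Tuple

CheckResult = Tuple[bool, str]

ONE_GB = 1024 ** 3  # assumption: 1 GB taken as 1024^3 bytes


class DriveFreeSpaceCheck:
    """Healthy when the drive holding `path` has at least `min_free_bytes` free."""

    def __init__(self, path: str = "C:\\", min_free_bytes: int = ONE_GB):
        self.path = path
        self.min_free_bytes = min_free_bytes

    def __call__(self) -> CheckResult:
        try:
            free = shutil.disk_usage(self.path).free
        except OSError as exc:  # e.g. the drive or path does not exist on this machine
            return False, f"cannot read free space for {self.path}: {exc}"
        return free >= self.min_free_bytes, f"{free / ONE_GB:.2f} GB free on {self.path}"
```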
Then in your `ConfigureServices` method, register the custom Health Check with the lifestyle that makes sense for it (Singleton, Scoped, Transient, etc.) and add it to the `AddHealthChecks` registration we've done before.
That’s it!
Group your health checks together
You can group your health checks together in a `HealthCheckGroup` if you want (for example, putting all performance checks like CPU, memory, and disk space under a group called "performance"), or you can let them live on their own and mix and match.
This lets you do things like run only that group of Health Checks via the `RunGroupAsync` method on `IHealthCheckService`.
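Here's a rough Python sketch of grouping: checks can be registered on their own or under a group name, and you can run just one group. `GroupedHealthChecks` is a hypothetical stand-in, not the package's `HealthCheckGroup`, and in this sketch the group is looked up by its name.

```python
from typing import Callable, Dict, List, Optional, Tuple

CheckResult = Tuple[bool, str]
Check = Callable[[], CheckResult]


class GroupedHealthChecks:
    """Checks can be registered on their own or under a named group."""

    def __init__(self):
        self._checks: Dict[str, Check] = {}
        self._groups: Dict[str, List[str]] = {}  # group name -> check names

    def add_check(self, name: str, check: Check,
                  group: Optional[str] = None) -> "GroupedHealthChecks":
        self._checks[name] = check
        if group is not None:
            self._groups.setdefault(group, []).append(name)
        return self

    def run_all(self) -> Dict[str, CheckResult]:
        return {name: check() for name, check in self._checks.items()}

    def run_group(self, group: str) -> Dict[str, CheckResult]:
        """Run only the checks registered under `group`."""
        if group not in self._groups:
            raise KeyError(f"no health check group named {group!r}")
        return {name: self._checks[name]() for name in self._groups[group]}
```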
Security
Reminder: this is demo code. One flaw is that my Health Check endpoint is unsecured, so anyone can hit it. You will likely want to secure your Health Check endpoint, especially if it is exposed to the Internet, so that nobody can spam it. There are many ways to do this, but they are outside the scope of this blog post.
Some Feedback on the Design
Overall, I think this abstraction is really useful, and I will use it myself once it RTMs. The built-in health checks are nice, since you don't have to write that logic yourself. I'm all about punting as much logic onto someone else as possible.
There are a few little things I wish were easier, though.
- It seems like `HealthCheckResult.CheckStatus == CheckStatus.Healthy` is going to be extremely common code. It'd be nice if there were a helper property on `HealthCheckResult`, like `HealthCheckResult.IsHealthy`, that does that computation for you, much like `HttpResponseMessage` has an `IsSuccessStatusCode` property, which is super useful. That said, I understand that "Healthy" is a relative term that's tough to define globally. Some people might consider `CheckStatus.Warning` healthy and others wouldn't. Ubiquitous languages are hard.
- I wish there were a way to override the name of the built-in Health Checks. For example, the URL Health Check automatically takes on the name `UrlCheck(yourUrlHere)`, such as `UrlCheck(http://google.com)`. You'll need this name if you want to pull out the specific results of a Health Check. I had expected to be able to specify the name of each Health Check and store it in something like a `HealthCheckConstants` class for easy retrieval. Instead, I need to follow this naming convention when using the constants class, which isn't the end of the world, but being able to override the name would be nice.
- When calling the `RunGroupAsync` method, I wish you could just specify the group name rather than the `HealthCheckGroup` instance, and let `RunGroupAsync` look up the `HealthCheckGroup` instance for you.
- There should be no cache duration on the URL Check. 5 minutes is just too long a default, and IMO there should be no cache duration at all on the URL Check. I control how often my Health Check monitoring service hits my health check endpoint. If I want it to check my service every minute, I'd expect the Health Check result to be fresh every time rather than cached.
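To make the first wish concrete, here's a hypothetical Python shape for such a helper. `is_healthy` covers the common case, and `is_healthy_or` lets each caller decide whether something like `Warning` also counts, which sidesteps the "what does healthy mean" debate. None of this exists in the package; it's purely a sketch.

```python
from enum import Enum


class CheckStatus(Enum):
    UNKNOWN = "Unknown"
    UNHEALTHY = "Unhealthy"
    HEALTHY = "Healthy"
    WARNING = "Warning"


class HealthCheckResult:
    """Hypothetical result type with an IsHealthy-style helper."""

    def __init__(self, status: CheckStatus, description: str = ""):
        self.status = status
        self.description = description

    @property
    def is_healthy(self) -> bool:
        # The strict, common case: only Healthy counts.
        return self.status is CheckStatus.HEALTHY

    def is_healthy_or(self, *also_acceptable: CheckStatus) -> bool:
        """Let each caller decide whether, e.g., Warning should count as healthy."""
        return self.is_healthy or self.status in also_acceptable
```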
Overall, I really like this package, and it seems like it's going to be really useful. I plan on using it when it RTMs, so I'll keep this post up to date as I see changes made to the package.