Wearable tech has drastically changed the way we take care of our health. Instead of relying only on periodic checkups, we now count on smart devices to constantly monitor our health metrics, log our activities, provide a holistic view of trends, and even nudge us if we sit for too long. Fortunately, the Anthos platform comes equipped with its own “wearable tech” (metrics, logging, customized dashboards, proactive alerts) out of the box. This blog will guide you through using Cloud Monitoring and Cloud Logging to keep your Anthos GKE On-Prem platforms healthy.
Configuring Anthos GKE On-Prem
Your organization may have multiple Anthos platforms connected to different Google Cloud projects. There are two ways to design your monitoring and logging architecture: you can route all monitoring and logging data to one centralized Google Cloud project, or you can keep each Anthos platform’s monitoring and logging data in the Google Cloud project that platform is already connected to.
There are three steps to set up Anthos GKE On-Prem Cloud Monitoring and Cloud Logging:
- Enable the required APIs in your monitoring-logging project:
gcloud services enable --project [MONITORING_LOGGING_PROJECT_ID] \
    stackdriver.googleapis.com \
    monitoring.googleapis.com \
    logging.googleapis.com \
    serviceusage.googleapis.com \
    iam.googleapis.com \
    cloudresourcemanager.googleapis.com
- Create a service account in your monitoring-logging project, assign it the IAM roles stackdriver.resourceMetadata.writer, logging.logWriter, and monitoring.metricWriter, and download a service account key.
- Populate the configuration file’s stackdriver section:
stackdriver:
  projectID: # monitoring-logging project ID
  clusterLocation: # region where you want to store logs
  enableVPC: # whether to use restricted IP addresses
  serviceAccountKeyPath: # path of the JSON key file for the service account
With the configurations above, the required resources will be deployed to your Anthos clusters.
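The service account step can also be scripted. The following is a minimal sketch, not an official procedure: the function name `create_monitoring_sa`, the account name, and the key path are placeholders made up for illustration, and it assumes the three roles from the list are granted at the project level using the usual `roles/` prefix.

```shell
# Sketch: create a service account, grant the three roles named in step 2,
# and download a JSON key. Assumes gcloud is installed and authenticated.
create_monitoring_sa() {
  project_id="$1"   # monitoring-logging project ID
  sa_name="$2"      # hypothetical service account name
  key_file="$3"     # where to write the JSON key
  sa_email="${sa_name}@${project_id}.iam.gserviceaccount.com"

  gcloud iam service-accounts create "$sa_name" --project "$project_id" || return 1

  # Grant each role on the monitoring-logging project.
  for role in roles/stackdriver.resourceMetadata.writer \
              roles/logging.logWriter \
              roles/monitoring.metricWriter; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member "serviceAccount:${sa_email}" \
      --role "$role" || return 1
  done

  # The downloaded key is what serviceAccountKeyPath should point to.
  gcloud iam service-accounts keys create "$key_file" \
    --iam-account "$sa_email" --project "$project_id"
}

# Example (placeholder values):
#   create_monitoring_sa my-monitoring-project anthos-ops ./anthos-ops-key.json
```

The key file contains a long-lived credential, so it is worth storing it somewhere with restricted access and referencing that location from the stackdriver section.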
Customizing Cloud Monitoring
After the configuration steps are complete, Anthos GKE clusters will feed the monitoring data to Cloud Monitoring. This section describes how you can use the major components of Cloud Monitoring to monitor Anthos’ health.
Metrics Explorer works as a scratchpad where you can explore different metrics and build charts for your own purposes. You can search by resource type and metric, apply filters, and aggregate data by group. To get a list of the available metrics for Anthos, search in the Find resource type and metric field. A full list of metrics can also be found here.
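If you prefer the command line, the same list can be pulled from the Cloud Monitoring API. This is a rough sketch that calls the `metricDescriptors` list endpoint with `curl`; the function name `list_metric_descriptors` is hypothetical, and the metric prefix is left as a parameter because the right value depends on what your clusters report.

```shell
# Sketch: list metric descriptors whose type starts with a given prefix.
# Assumes gcloud (for the access token) and curl are installed.
list_metric_descriptors() {
  project_id="$1"   # monitoring-logging project ID
  prefix="$2"       # metric type prefix, e.g. "kubernetes.io/" (an assumption)
  token="$(gcloud auth print-access-token)" || return 1

  # -G turns the url-encoded data into a query string on a GET request.
  curl -sS -G \
    -H "Authorization: Bearer ${token}" \
    --data-urlencode "filter=metric.type = starts_with(\"${prefix}\")" \
    "https://monitoring.googleapis.com/v3/projects/${project_id}/metricDescriptors"
}

# Example (placeholder values):
#   list_metric_descriptors my-monitoring-project "kubernetes.io/"
```

The response is JSON, so it can be piped into a tool of your choice to extract only the metric type names.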
Below is an example chart from Metrics Explorer that shows the bytes received per node of an Anthos cluster:
After finding the metrics, you can save the metrics chart(s) to Monitoring Dashboards and distribute dashboards to other teams. The dashboards can aggregate all relevant charts into one single view, which provides a single pane of glass for all the information that end-users are interested in. Multiple Anthos dashboards can be built to better fit your organization’s operation model. For instance, you could have a dashboard per application, or you can build separate dashboards for the networking team, security team, SRE team, etc.
Dashboards can be configured manually or through JSON files. They are also interactive, meaning you can apply grouping, filtering, and zooming to the charts.
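To give a feel for the JSON route, here is a small sketch that writes a one-chart dashboard definition and creates it with `gcloud monitoring dashboards create`. The function names, display names, and file name are invented for illustration, and the filter assumes your clusters report the `kubernetes.io/node/network/received_bytes_count` metric on the `k8s_node` resource type; swap in whatever you found in Metrics Explorer.

```shell
# Sketch: write a minimal dashboard definition with a single line chart.
write_node_rx_dashboard() {
  out_file="$1"
  # Quoted 'EOF' keeps the JSON exactly as written (no shell expansion).
  cat > "$out_file" <<'EOF'
{
  "displayName": "Anthos nodes: network received bytes (example)",
  "gridLayout": {
    "columns": "1",
    "widgets": [
      {
        "title": "Received bytes per node",
        "xyChart": {
          "dataSets": [
            {
              "plotType": "LINE",
              "timeSeriesQuery": {
                "timeSeriesFilter": {
                  "filter": "metric.type=\"kubernetes.io/node/network/received_bytes_count\" resource.type=\"k8s_node\"",
                  "aggregation": {
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_RATE"
                  }
                }
              }
            }
          ]
        }
      }
    ]
  }
}
EOF
}

# Sketch: create the dashboard in the monitoring-logging project.
create_dashboard() {
  gcloud monitoring dashboards create --project "$1" --config-from-file "$2"
}

# Example (placeholder values):
#   write_node_rx_dashboard ./node-rx-dashboard.json
#   create_dashboard my-monitoring-project ./node-rx-dashboard.json
```

Keeping dashboard definitions as files like this makes it easy to review them in version control and hand the same dashboard to several teams.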
Below is an example dashboard monitoring the health of pods running on Anthos:
In Cloud Monitoring, alerts are easy to set up and integrate with a range of communication channels. You define the target metrics and the conditions that trigger an alert, then route it to a specific group of people or channels so that the right responders are engaged immediately. Documentation can also be added to each alert to specify remediation steps during incidents. Alerts can be configured manually or through JSON files.
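As an illustration of a JSON-defined alert, the sketch below writes a policy that fires when a container restarts more than three times within five minutes and attaches a short remediation note. Everything specific here is an assumption for the example: the function names, the threshold, the documentation text, and the choice of the `kubernetes.io/container/restart_count` metric on the `k8s_container` resource type. No notification channel is attached; you would add your own channel IDs under `notificationChannels`.

```shell
# Sketch: write a minimal alerting policy with one threshold condition.
write_restart_alert_policy() {
  out_file="$1"
  cat > "$out_file" <<'EOF'
{
  "displayName": "Anthos: container restarting (example)",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "More than 3 restarts in a 5-minute window",
      "conditionThreshold": {
        "filter": "metric.type=\"kubernetes.io/container/restart_count\" resource.type=\"k8s_container\"",
        "aggregations": [
          { "alignmentPeriod": "300s", "perSeriesAligner": "ALIGN_DELTA" }
        ],
        "comparison": "COMPARISON_GT",
        "thresholdValue": 3,
        "duration": "0s"
      }
    }
  ],
  "documentation": {
    "content": "Check recent events and container logs for the affected pod before restarting it.",
    "mimeType": "text/markdown"
  }
}
EOF
}

# Sketch: create the policy. This uses the alpha command group, which may
# require installing the gcloud alpha component first.
create_alert_policy() {
  gcloud alpha monitoring policies create --project "$1" --policy-from-file "$2"
}

# Example (placeholder values):
#   write_restart_alert_policy ./restart-alert.json
#   create_alert_policy my-monitoring-project ./restart-alert.json
```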
Apart from using Anthos metrics, you can also set up alerts on service level objectives (SLOs) for your microservices if Anthos Service Mesh (ASM) is enabled. This is a great way to proactively monitor your application or platform’s performance before a service level agreement (SLA) is breached. It also helps with root cause analysis thanks to the better visibility into your applications’ traffic.
Viewing Cloud Logging
Similar to Cloud Monitoring, Cloud Logging receives logs from Anthos clusters and provides a reservoir with a powerful search engine. Some example queries for Anthos can be found here. Queries can also be saved for future use.
Below is an example of the Logs Viewer (Preview) version. The view provides not only the detailed logs but also interactive dashboards with filtering and analytical functions.
Besides manually composing log queries for Anthos workloads, there is a shortcut to view container logs. From Google Cloud Console, browse to Kubernetes Engine > Workloads. Select the workload you are investigating and click the Container logs link. This brings you to the Cloud Logging page with the query field auto-populated.
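Log queries like the ones described in this section can also be run outside the Console with `gcloud logging read`. The sketch below is only an example: the function name `read_container_errors` and the cluster and namespace values are placeholders, and the filter assumes your Anthos container logs arrive under the `k8s_container` resource type.

```shell
# Sketch: fetch recent error-level container logs for one namespace.
read_container_errors() {
  project_id="$1"     # monitoring-logging project ID
  cluster_name="$2"   # hypothetical cluster name
  namespace="$3"      # Kubernetes namespace of the workload

  # Same query language as the Cloud Logging query field.
  filter="resource.type=\"k8s_container\""
  filter="${filter} AND resource.labels.cluster_name=\"${cluster_name}\""
  filter="${filter} AND resource.labels.namespace_name=\"${namespace}\""
  filter="${filter} AND severity>=ERROR"

  gcloud logging read "$filter" \
    --project "$project_id" --freshness 1h --limit 20 --format json
}

# Example (placeholder values):
#   read_container_errors my-monitoring-project user-cluster-1 default
```

The filter string built here can be pasted into the query field as is, which is a convenient way to check it before scripting it.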
In my opinion, your Day 2 operations team can truly leverage Google Cloud Monitoring and Cloud Logging for Anthos for these reasons:
All the functions discussed in this blog, and more, are available out of the box for your Anthos platform. No extra licensing or deployment is needed.
Single pane of glass
Cloud Monitoring and Cloud Logging are part of GCP Operations (formerly Stackdriver). The suite can also be configured to monitor and log GCP and AWS resources. For your hybrid or multi-cloud solutions, GCP Operations can provide a single view of the health status and logs of your platforms and applications.
Cloud Monitoring and Cloud Logging are managed by Google Cloud Platform. Unlike other on-prem monitoring and logging tools, they require zero maintenance from your team.
If you are interested in learning more about Anthos Day 2 operations or Google Cloud Monitoring and Logging, please leave a comment or check out our Anthos Blueprint Solutions for more details.