A Complete Walkthrough of the Grafana Observability Dashboard

Once you have imported the dashboard and connected your datasources, you are looking at a real-time window into your io.Connect deployment. Here is what each section tells you and how to make the most of it.


Filtering your view

Before diving into the individual sections, it is worth understanding the filter bar that runs across the top of the dashboard. Every single panel on every page responds to these filters — they are the primary way to scope the data down from your entire fleet to exactly what you care about.

The five filter dropdowns scope the entire dashboard simultaneously. All values are populated dynamically from your Prometheus data.

There are five filters available:

| Filter | What it scopes | Typical use case |
| --- | --- | --- |
| userId | All metrics to a specific user or set of users | Investigating a complaint from a specific person, or comparing behaviour across user segments |
| machine | All metrics to a specific host or machine | Isolating whether a performance issue is system-wide or tied to a particular node |
| application | App-level panels to a specific application | Drilling into the crash rate, startup time, or CPU usage of a single app |
| platform_version | All metrics to a specific platform release | Comparing the behaviour of a new release against the previous one before a full rollout |
| layout | Workspace and layout panels to a specific layout | Understanding the startup and performance characteristics of a particular workspace layout |

All dropdowns support multi-select, so you can compare two platform versions side by side, or look at a group of related machines at once. They default to All, meaning no filtering is applied until you change them.

Note: The filter values are populated dynamically from your Prometheus labels at dashboard load time. If a dropdown is empty, it usually means the corresponding label (user, machine, application, platformVersion, or layout) is not being emitted by your metrics scrape config.
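If a dropdown is empty and you want to check or recreate the variable behind it, these dropdowns are typically defined as Grafana label_values() queries against the Prometheus datasource. A minimal sketch, assuming a hypothetical metric name ioconnect_app_info (substitute whatever your deployment actually exports):

```promql
# Grafana template variable query for the "application" dropdown.
# label_values(<metric>, <label>) is Grafana's variable query syntax
# for Prometheus datasources, evaluated at dashboard load time.
label_values(ioconnect_app_info, application)

# Sanity check in Explore: if this returns nothing, the label is not
# being emitted and the corresponding dropdown will stay empty.
count by (application) (ioconnect_app_info)
```

If the query returns values in Explore but the dropdown stays empty, also check that the variable points at the same Prometheus datasource as the panels.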


The big picture: KPI row

The first thing you see below the filters is a row of stat panels giving you an at-a-glance health summary of your entire platform. Platform and app startup times, CPU and memory usage, total crashes, and error counts — all in one place, colour-coded so problems are immediately visible without having to read a single number.

Red panels signal something that needs attention; green means things are within a healthy range. The colours update in real time as your data changes.

This row is intentionally designed as a quick health check rather than a detailed view. If something is red, it tells you where to look next — the sections below give you the detail to understand why.

The system metrics charts beneath the KPI row show CPU and memory trends over time, making it easy to spot spikes and correlate them with other events.
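The exact queries behind these trend panels depend on the metric names your deployment exports, but they generally follow the same shape. A sketch with illustrative metric names (not necessarily the dashboard's own):

```promql
# CPU trend per machine; the metric name is an assumption.
avg by (machine) (ioconnect_system_cpu_usage_percent)

# Memory trend, converted from bytes to GiB for readability.
avg by (machine) (ioconnect_system_memory_used_bytes) / 1024^3
```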


Platform metrics: version-by-version breakdown

The platform section is where version comparisons become very powerful. Three pie charts break down active users, app crashes, and platform errors by platform version — so you can immediately see whether a particular release is behaving differently from the others, even before it has fully rolled out.

The three pie charts let you compare user distribution, crash rates, and error rates across platform versions at a glance. A disproportionate crash share on one version is an immediate red flag.

The key thing to look for here is imbalance. If two versions have a similar number of users but one accounts for a much larger share of crashes or errors, that is a strong signal of a regression. From there, use the platform_version filter to lock the entire dashboard to that version and investigate every other metric in its context.
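To reproduce that comparison ad hoc in Explore, a pair of queries along these lines works. This is a sketch with assumed metric names (ioconnect_app_crashes_total, ioconnect_active_users) and the platformVersion label mentioned earlier:

```promql
# Raw crash count per platform version over the last 24 hours.
sum by (platformVersion) (increase(ioconnect_app_crashes_total[24h]))

# Normalised by active users per version, so a widely deployed release
# is not flagged just for having more users and therefore more crashes.
  sum by (platformVersion) (increase(ioconnect_app_crashes_total[24h]))
/ sum by (platformVersion) (ioconnect_active_users)
```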


App metrics: what is running and how hard

The app metrics section gives you per-application CPU and memory rankings, a usage share breakdown, and a time-spent chart. Together these answer two important questions: which apps are the heaviest consumers of system resources, and which ones your users actually spend time in — which are not always the same apps.

The bar charts rank apps by CPU and memory consumption, while the pie chart shows relative usage share. An app that ranks high on resources but low on usage is a good candidate for optimisation.
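Rankings like these are usually built with PromQL's topk(). A hedged sketch, again with assumed metric names:

```promql
# Top 10 apps by average CPU consumption across the fleet.
topk(10, avg by (application) (ioconnect_app_cpu_usage_percent))

# The same ranking by memory footprint.
topk(10, avg by (application) (ioconnect_app_memory_bytes))
```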

The error rate chart over time makes it easy to spot when something went wrong. The crash and error totals per app below it show which apps are contributing most to the problem.

The error rate chart is particularly useful for correlating incidents with specific timeframes. If you see a spike, check whether it aligns with a deployment, a configuration change, or a peak usage period. Use the application filter to narrow the view to a single app and confirm whether the issue is isolated or platform-wide.
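Outside the dashboard, the same investigation can be run in Explore with a rate() over an error counter. A sketch assuming a counter named ioconnect_app_errors_total and a placeholder app name:

```promql
# Per-app error rate, averaged over 5-minute windows.
sum by (application) (rate(ioconnect_app_errors_total[5m]))

# Scope to one app ("my-app" is a placeholder) to confirm whether
# a spike is isolated, mirroring the dashboard's application filter.
sum(rate(ioconnect_app_errors_total{application="my-app"}[5m]))
```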


Startup times and user reach

This section shows which apps take longest to load and how many unique users each app reaches. App startup time is one of the most direct indicators of user experience — consistently slow startup times will be felt immediately by end users, even if everything else looks healthy.

The startup time ranking shows which apps are slowest to load, while the unique user count shows actual reach. The workspace section below it breaks down load frequency and average startup time per workspace type.

The workspace breakdown at the bottom of this page is worth a close look. Different workspace types can have significantly different startup profiles depending on how many components and apps they load. If users of a particular workspace are reporting slowness, this is where you will see it quantified. Use the layout filter to isolate that workspace across the whole dashboard.
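Panels like these are typically driven by Prometheus histograms. A sketch under the assumption that startup times are recorded in histograms named ioconnect_app_startup_seconds and ioconnect_layout_startup_seconds:

```promql
# 95th percentile startup time per application over the last hour.
histogram_quantile(0.95,
  sum by (le, application) (rate(ioconnect_app_startup_seconds_bucket[1h])))

# Average startup time per layout, for the workspace breakdown.
  sum by (layout) (rate(ioconnect_layout_startup_seconds_sum[1h]))
/ sum by (layout) (rate(ioconnect_layout_startup_seconds_count[1h]))
```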


Workspace and layout performance

This section drills further into workspace behaviour, breaking down component startup times, app loading times within workspaces, and layout startup times side by side. It is especially useful when you have multiple workspace configurations and want to understand the performance trade-offs between them.

The horizontal bar chart at the bottom compares layout startup times across all workspace types simultaneously — a quick way to identify which layout configuration is the most expensive to load.


Logs and traces

The final two panels tie everything together with raw logs and distributed traces. They require a Loki datasource and a Tempo datasource, respectively. If you are running a full Grafana LGTM stack, they will populate automatically alongside your metrics.

The logs panel streams directly from Loki. The userId and machine filters at the top let you scope the log stream to a specific user or host, making it much easier to find relevant entries.

The traces panel lists spans from Tempo, sortable by service, duration, and start time — useful for pinpointing slow operations down to the individual request level.

If these panels show “No data”, it is most likely because Loki or Tempo have not been configured yet. Metrics-only deployments will still get full value from everything above — logs and traces are additive, not required.


Ready to import the dashboard into your own Grafana instance? Check out the setup guide for step-by-step instructions on connecting your datasources and getting everything running.