Monitoring

Service manager

Examples on this page do not assume systemd. Navio documentation deliberately avoids systemd-centric tooling — the project's stance is that operators should choose their own init / supervisor (runit, OpenRC, s6, supervisord, launchd, Docker, plain tmux/screen, or a release-specific service manager). Substitute your supervisor's log / restart commands where needed.

Health checks

RPC liveness

navio-cli getblockchaininfo | jq '.blocks, .headers, .initialblockdownload, .verificationprogress'

Healthy:

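blocks == headers (or at most a block or two behind while a new block is being validated).
initialblockdownload: false.
verificationprogress: approximately 1.0 (it is an estimate, so a fully synced node can report values such as 0.99999…).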
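The three checks can be scripted for a cron job or supervisor health hook. This is a minimal sketch that assumes only the getblockchaininfo fields used in the jq filter above; max_lag and min_progress are hypothetical thresholds chosen here, and min_progress sits just below 1.0 because verificationprogress is an estimate.

```python
def blockchain_health(info, max_lag=2, min_progress=0.9999):
    """Return a list of problems in a getblockchaininfo result (empty list = healthy).

    max_lag and min_progress are illustrative thresholds, not Navio defaults.
    """
    problems = []
    lag = info["headers"] - info["blocks"]
    if lag > max_lag:
        problems.append(f"tip is {lag} blocks behind best known header")
    if info["initialblockdownload"]:
        problems.append("still in initial block download")
    # verificationprogress is an estimate; synced nodes report values like 0.99999...
    if info["verificationprogress"] < min_progress:
        problems.append(f"verificationprogress={info['verificationprogress']:.6f}")
    return problems
```

Usage: blockchain_health(json.loads(subprocess.check_output(["navio-cli", "getblockchaininfo"]))) and exit non-zero when the list is not empty.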

Peer count

navio-cli getconnectioncount

Fewer than 8 peers is a warning; fewer than 2 means the node is effectively disconnected. The networks array in getnetworkinfo shows which address families (IPv4, IPv6, onion, …) are reachable.
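The same thresholds as a small sketch. It assumes getnetworkinfo keeps Bitcoin Core's layout, where networks is a list of objects carrying name and reachable; the function names are hypothetical.

```python
def peer_status(connections):
    """Classify a getconnectioncount value using the thresholds above."""
    if connections < 2:
        return "disconnected"
    if connections < 8:
        return "warning"
    return "ok"


def reachable_networks(netinfo):
    """Names of the address families that getnetworkinfo reports as reachable."""
    return [n["name"] for n in netinfo.get("networks", []) if n.get("reachable")]
```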

Mempool

navio-cli getmempoolinfo | jq '.size, .bytes, .usage, .mempoolminfee'
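A sketch that turns getmempoolinfo into a few alertable numbers. It assumes the response also carries a maxmempool field (the memory cap, as in Bitcoin Core) alongside the fields selected above; warn_fill is a hypothetical threshold.

```python
def mempool_summary(info, warn_fill=0.8):
    """Summarise a getmempoolinfo result; fill = memory usage / maxmempool cap."""
    fill = info["usage"] / info["maxmempool"]  # assumes a Bitcoin Core-style maxmempool field
    return {
        "txs": info["size"],
        "bytes": info["bytes"],
        "fill": fill,
        # In Bitcoin Core, hitting the cap evicts low-fee txs and raises mempoolminfee.
        "near_full": fill >= warn_fill,
        "min_fee": info["mempoolminfee"],
    }
```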

Staking

Navio does not ship a getstakinginfo RPC. Verify the staker is functioning by inspecting:

The navio-staker process's own stdout/stderr — it logs each template poll, eligibility check, and submit result.
navio-cli listblscttransactions "*" 50 0 true | jq '[.[] | select(.category == "stake")] | length' — count of recent stake rewards received by this wallet.
Block-height advancement at the expected rate (~120 s on mainnet, ~60 s on testnet).
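The wallet and block-rate checks above can be combined in a script. This is a sketch: it reuses the listblscttransactions call and the "stake" category filter from the list, takes the ~120 s / ~60 s spacing figures as targets, and the tolerance factor and helper names are hypothetical.

```python
import json
import subprocess

# Approximate target block spacing in seconds, per the figures above.
TARGET_SPACING = {"mainnet": 120, "testnet": 60}


def count_stakes(txs):
    """Count wallet entries with category "stake" (same filter as the jq one-liner)."""
    return sum(1 for tx in txs if tx.get("category") == "stake")


def recent_stakes(n=50):
    """Run the listblscttransactions call shown above and count stake entries."""
    out = subprocess.run(
        ["navio-cli", "listblscttransactions", "*", str(n), "0", "true"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_stakes(json.loads(out))


def mean_spacing(h0, t0, h1, t1):
    """Average seconds per block between two (height, unix time) samples."""
    if h1 <= h0:
        return float("inf")  # no new blocks in the window: the chain looks stalled
    return (t1 - t0) / (h1 - h0)


def is_slow(spacing, network="mainnet", tolerance=2.0):
    """True if blocks arrive more than `tolerance` times slower than the target."""
    return spacing > TARGET_SPACING[network] * tolerance
```

Sample getblockcount with a timestamp on each poll and pass the oldest and newest samples in the window to mean_spacing.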

Metrics via Prometheus

No first-party exporter ships with navio-core. Options:

A custom exporter script that polls RPC every 30 s and exposes /metrics.
The Prometheus blackbox exporter, for basic probing of the RPC endpoint.

Starter Python script:

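from prometheus_client import start_http_server, Gauge
import requests, time

# Adjust host/port and credentials to match your node's RPC settings.
RPC_URL = "http://127.0.0.1:33677/"
RPC_AUTH = ("user", "pass")

m_blocks = Gauge("navio_block_height", "Current block height")
m_peers = Gauge("navio_peer_count", "Peer count")
m_mempool = Gauge("navio_mempool_bytes", "Mempool size in bytes")

def rpc(method, params=None):
    """Call a naviod JSON-RPC method and return its result; raise on any error."""
    r = requests.post(
        RPC_URL,
        auth=RPC_AUTH,
        json={"jsonrpc": "1.0", "id": "m", "method": method, "params": params or []},
        timeout=10,  # never let a hung node block the poll loop
    )
    body = r.json()  # RPC errors also come back as JSON (often with HTTP 500)
    if body.get("error"):
        raise RuntimeError(f"{method}: {body['error']}")
    return body["result"]

if __name__ == "__main__":
    start_http_server(9190)  # serves /metrics on port 9190
    while True:
        try:
            m_blocks.set(rpc("getblockcount"))
            m_peers.set(rpc("getconnectioncount"))
            m_mempool.set(rpc("getmempoolinfo")["bytes"])
        except Exception as e:
            # Gauges keep their last values; log and retry on the next poll.
            print("scrape error:", e)
        time.sleep(30)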

ZMQ publishers

Enable in navio.conf:

zmqpubrawblock=tcp://127.0.0.1:28332
zmqpubhashblock=tcp://127.0.0.1:28333
zmqpubrawtx=tcp://127.0.0.1:28334
zmqpubhashtx=tcp://127.0.0.1:28335
zmqpubsequence=tcp://127.0.0.1:28336

Subscribe with any ZMQ client. Use cases:

Trigger explorer re-index on new blocks without polling RPC.
Low-latency mempool feeds.
Event-sourced backends.

The navio-blocks indexer uses RPC polling today; switching to ZMQ is a straightforward enhancement.
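A minimal pyzmq subscriber sketch for the hashblock feed configured above. It assumes navio-core keeps Bitcoin Core's ZMQ framing (three frames: topic, body, 4-byte little-endian sequence number); under that assumption body.hex() for hashblock should match the hash RPC prints. The function names are hypothetical, and sequence gaps are reported because they mean notifications were missed.

```python
import struct


def parse_notification(frames):
    """Split one ZMQ notification into (topic, body, sequence).

    Assumes Bitcoin Core-style framing: [topic, body, 4-byte little-endian sequence].
    """
    topic, body, seq = frames
    return topic.decode(), body, struct.unpack("<I", seq)[0]


def subscribe(endpoint="tcp://127.0.0.1:28333", topic=b"hashblock"):
    """Print each notification and warn when sequence numbers skip (missed messages)."""
    import zmq  # pyzmq; only needed when actually subscribing

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.setsockopt(zmq.SUBSCRIBE, topic)
    sock.connect(endpoint)
    last_seq = None
    while True:
        name, body, seq = parse_notification(sock.recv_multipart())
        # The sequence is a 32-bit counter, so compare modulo 2**32.
        if last_seq is not None and seq != (last_seq + 1) % 2**32:
            print(f"warning: {name} sequence jumped {last_seq} -> {seq}")
        last_seq = seq
        print(name, seq, body.hex())
```

An indexer would replace the final print with its re-index trigger and fall back to one RPC catch-up pass whenever a gap is detected.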

Log management

debug.log and staker.log in the datadir grow without bound unless something rotates them. Use logrotate, or your supervisor's built-in log rotation for any output it captures itself.

Example logrotate drop-in:

/home/navio/.navio/testnet7/debug.log
/home/navio/.navio/testnet7/staker.log
/home/navio/.navio/debug.log
{
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}

Alerting signals

Signal	Severity	Action
Peer count < 3 for > 5 min	warning	Check firewall, DNS seeds, external reachability
`getblockchaininfo.blocks` not advancing for > 20 min	warning	Check peers and `debug.log`; restart if the node is stuck
`verificationprogress` clearly below 1.0 (e.g. < 0.9999) for > 1 h	critical	IBD stuck or flapping
Staker produced no reward txs in 7 days at expected weight	warning	Inspect staker stdout, RPC auth, wallet lock state
Disk usage > 80 % on datadir partition	warning	Prune more aggressively or move datadir
RPC 503 responses ("Work queue depth exceeded") > 1 %	warning	RPC work queue saturated; raise `rpcworkqueue`
`naviod` restart loop	critical	OOM, DB corruption, or config error
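Most rows in the table have the shape "condition held for longer than T". A small helper that tracks this is sketched below; the class name and its use of time.monotonic() are choices made here, not part of any Navio tooling.

```python
import time


class Sustained:
    """Report True once a condition has held continuously for longer than `duration` seconds."""

    def __init__(self, duration):
        self.duration = duration
        self.since = None  # time the condition first became true, or None

    def update(self, condition, now=None):
        now = time.monotonic() if now is None else now
        if not condition:
            self.since = None  # any healthy sample resets the timer
            return False
        if self.since is None:
            self.since = now
        return now - self.since > self.duration
```

For example, keep low_peers = Sustained(5 * 60) and call low_peers.update(peers < 3) on every poll. For the stalled-tip row, feed update(height == last_height) into a Sustained(20 * 60), where last_height is the height from the previous poll.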

Grafana dashboards

Start from a community Bitcoin Core dashboard and replace its metric names with the ones your Navio exporter exposes (for example navio_block_height and navio_peer_count).

External uptime check

From outside the node's own network (a Cloudflare Worker, an AWS Lambda function, or a simple cron job on a different host), request a locked-down /healthz endpoint through an authenticated reverse proxy every minute. Alert on any HTTP 5xx response or timeout.
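A standard-library probe for the cron variant, as a sketch. The URL and any Authorization header are placeholders for whatever your reverse proxy expects; following the rule above, only 5xx responses and network failures or timeouts count as down.

```python
import urllib.error
import urllib.request


def probe(url, headers=None, timeout=10.0):
    """Return None if the endpoint looks healthy, else a short reason to alert on."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as e:  # 4xx/5xx statuses are raised as HTTPError
        status = e.code
    except OSError as e:  # DNS failure, refused connection, timeout (URLError is an OSError)
        return f"unreachable: {e}"
    return f"HTTP {status}" if status >= 500 else None
```

A 401 or 403 from the proxy is not treated as down here, so a broken credential will not page you; alert on those separately if that matters for your setup.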