Issue 04: Prometheus Data Collection (Part 3)

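优采云 publication time: 2020-08-17 22:27

Author of this issue: 罗韦

Member of the 爱可生 Shanghai R&D Center and R&D engineer, mainly responsible for the monitoring and alerting features of the DMP platform.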

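The objects that Prometheus monitors are highly varied and follow no unified standard. To solve this problem, Prometheus defines a monitoring specification; sample data that conforms to it can be scraped and parsed by Prometheus. Within a Prometheus monitoring system, an Exporter is a component that collects monitoring data and exposes it according to that specification. A different Exporter can be implemented for each kind of monitored object, which resolves the lack of a common standard. Broadly speaking, any program that can provide monitoring sample data to Prometheus can be called an Exporter, and an Exporter instance is the "target" described in the previous issue.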

How Exporters run

Exporters can run in two ways

Exporter interface data specification

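An Exporter exposes sample data to Prometheus as text over an HTTP interface. The format is simple, has no nesting, and is easy to read. The text for each metric has the following form: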

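# HELP <metric name> <metric description>
# TYPE <metric name> <metric type>
<metric name>{<label name>="<label value>",<label name>="<label value>"...} <sample value>
...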
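To make the line format concrete, here is a minimal sketch in Go that renders a single sample line by hand. The helper name `formatSample` and the sample values are assumptions made for illustration; real Exporters normally let a client library produce this text.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// labelEscaper escapes backslash, double quote and line feed, the characters
// that need escaping inside a label value in the text format.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample renders one sample line: name{label="value",...} value.
// It is a hypothetical helper, not part of any Prometheus library.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // keep the output stable despite random map order

	var b strings.Builder
	b.WriteString(name)
	if len(keys) > 0 {
		b.WriteByte('{')
		for i, k := range keys {
			if i > 0 {
				b.WriteByte(',')
			}
			b.WriteString(k)
			b.WriteString(`="`)
			b.WriteString(labelEscaper.Replace(labels[k]))
			b.WriteByte('"')
		}
		b.WriteByte('}')
	}
	b.WriteByte(' ')
	b.WriteString(strconv.FormatFloat(value, 'g', -1, 64))
	return b.String()
}

func main() {
	fmt.Println("# HELP disk_available_bytes Disk space available in bytes")
	fmt.Println("# TYPE disk_available_bytes gauge")
	fmt.Println(formatSample("disk_available_bytes",
		map[string]string{"device": "/dev/sda"}, 82115880))
}
```

The label names are sorted only so that repeated runs print the same line.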

# HELP x balabala
# TYPE x summary
x{quantile="0.5"} value1
x{quantile="0.9"} value2
x{quantile="0.99"} value3
x_sum sum(values)
x_count count(values)

# HELP x The temperature of cpu
# TYPE x histogram
x_bucket{le="20"} value1
x_bucket{le="50"} value2
x_bucket{le="70"} value3
x_bucket{le="+Inf"} count(values)
x_sum sum(values)
x_count count(values)

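This text format also has some shortcomings:

1. The text can become excessively verbose;

2. When parsing, Prometheus cannot verify whether the HELP and TYPE fields are missing. If HELP is missing, the origin of the sample may be impossible to determine; if TYPE is missing, Prometheus has no way of knowing the type of the sample;

3. Compared with protobuf, the text format used by Prometheus is not compressed in any way, so it is more costly to parse.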
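Given the second point, a check for missing TYPE lines would have to happen outside Prometheus, for example in the Exporter's own tests. The following is a rough sketch of such a check. The function names `missingType` and `baseName` are hypothetical, and the parsing is deliberately simplified (it assumes well-formed lines and handles only the `_bucket`, `_sum` and `_count` suffixes).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// baseName strips the suffixes that histogram and summary series add to the
// metric name declared in the TYPE line.
func baseName(name string) string {
	for _, suffix := range []string{"_bucket", "_sum", "_count"} {
		if strings.HasSuffix(name, suffix) {
			return strings.TrimSuffix(name, suffix)
		}
	}
	return name
}

// missingType returns the names of sample lines that are not preceded by a
// matching "# TYPE" line. Each name is reported once.
func missingType(exposition string) []string {
	typed := map[string]bool{}
	reported := map[string]bool{}
	var missing []string

	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		if strings.HasPrefix(line, "#") {
			fields := strings.Fields(line)
			if len(fields) >= 3 && fields[1] == "TYPE" {
				typed[fields[2]] = true
			}
			continue
		}
		// The metric name ends at the first '{' or space.
		name := line
		if i := strings.IndexAny(line, "{ "); i >= 0 {
			name = line[:i]
		}
		if typed[name] || typed[baseName(name)] {
			continue
		}
		if !reported[name] {
			reported[name] = true
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	text := "# TYPE x summary\nx_sum 241\nx_count 6\ny{device=\"/dev/sda\"} 1\n"
	fmt.Println(missingType(text))
}
```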

　　MySQL Server Exporter

For the widely used relational database MySQL, Prometheus officially provides the MySQL Server Exporter. It supports MySQL 5.6 and later; on versions earlier than 5.6, some metrics may not be supported.

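The MySQL Server Exporter monitors commonly used global status/variables information, schema/table statistics, user statistics, InnoDB information, and master-slave replication and group replication information, so its metric coverage is fairly comprehensive. However, the metrics it provides carry no identifier for the MySQL instance, so when several MySQL instances run on one host and several MySQL Server Exporters are needed to monitor them, the instances cannot be told apart. For detailed usage, see the MySQL Server Exporter documentation.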

　　Node Exporter

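Prometheus's official Node Exporter monitors system and hardware information on *NIX systems. Its metrics include CPU usage and configuration, system load average, memory information, network status, file system statistics, disk usage statistics, and more. The available metrics differ between systems: for example, diskstats supports Darwin, Linux, and OpenBSD, while loadavg supports Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, and Solaris. Node Exporter metrics carry no identifier for the host; the relabel feature can be used on the Prometheus Server side to add identifying labels. For detailed usage, see the Node Exporter documentation.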

How to implement an Exporter

Writing a simple Exporter

Using the prometheus/client_golang package, let's write a simple Exporter that covers the four metric types Prometheus supports.

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// A GaugeVec lets the metric carry labels; here it gets one label, "device".
	diskAvailable = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "disk_available_bytes",
		Help: "Disk space available in bytes",
	}, []string{"device"})

	tasksTotal = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "test_tasks_total",
		Help: "Total number of test tasks",
	})

	taskDuration = prometheus.NewSummary(prometheus.SummaryOpts{
		Name: "task_duration_seconds",
		Help: "Duration of task in seconds",
		// A Summary needs its quantiles (with allowed error) to be specified.
		Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
	})

	cpuTemperature = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "cpu_temperature",
		Help: "The temperature of cpu",
		// A Histogram needs its bucket upper bounds to be specified.
		Buckets: []float64{20, 50, 70, 80},
	})
)

func init() {
	// Register the metrics with the default registry.
	prometheus.MustRegister(diskAvailable)
	prometheus.MustRegister(tasksTotal)
	prometheus.MustRegister(taskDuration)
	prometheus.MustRegister(cpuTemperature)
}

func main() {
	// Simulate collecting monitoring data.
	fakeData()

	// Expose the samples with the promhttp.Handler() provided by Prometheus.
	// By default Prometheus pulls samples from the "/metrics" endpoint.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":10000", nil))
}

func fakeData() {
	tasksTotal.Inc()

	// Set the "device" label of this sample to "/dev/sda".
	diskAvailable.With(prometheus.Labels{"device": "/dev/sda"}).Set(82115880)

	taskDuration.Observe(10)
	taskDuration.Observe(20)
	taskDuration.Observe(30)
	taskDuration.Observe(45)
	taskDuration.Observe(56)
	taskDuration.Observe(80)

	cpuTemperature.Observe(30)
	cpuTemperature.Observe(43)
	cpuTemperature.Observe(56)
	cpuTemperature.Observe(58)
	cpuTemperature.Observe(65)
	cpuTemperature.Observe(70)
}

Next, compile and run our Exporter.

GOOS=linux GOARCH=amd64 go build -o my_exporter main.go
./my_exporter &

Once the Exporter is running, its address must also be added to the Prometheus configuration file so that Prometheus can pull data from it.

static_configs:
- targets: ['localhost:9090','172.17.0.3:10000']

The newly added Exporter now appears on the Prometheus targets page.

Requesting the "/metrics" endpoint returns the following data:

　　Gauge

Because we used a GaugeVec, the sample carries a label.

# HELP disk_available_bytes Disk space available in bytes
# TYPE disk_available_bytes gauge
disk_available_bytes{device="/dev/sda"} 8.211588e+07

　　Counter

# HELP test_tasks_total Total number of test tasks
# TYPE test_tasks_total counter
test_tasks_total 1

　　Summary

# HELP task_duration_seconds Duration of task in seconds
# TYPE task_duration_seconds summary
task_duration_seconds{quantile="0.5"} 30
task_duration_seconds{quantile="0.9"} 80
task_duration_seconds{quantile="0.99"} 80
task_duration_seconds_sum 241
task_duration_seconds_count 6
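The quantile values above can be checked by hand. The sketch below uses the simple nearest-rank definition of a quantile, which is an assumption chosen for illustration: client_golang's Summary computes a streaming approximation within the error bounds given in Objectives, so on larger data sets its results may differ from an exact calculation. `nearestRank` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// nearestRank returns the q-quantile (0 < q <= 1) of values using the
// nearest-rank definition: the element at position ceil(q*n) in sorted order.
// values must not be empty.
func nearestRank(values []float64, q float64) float64 {
	sorted := append([]float64(nil), values...) // copy so the input is untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	// The six durations observed in fakeData().
	durations := []float64{10, 20, 30, 45, 56, 80}
	for _, q := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("task_duration_seconds{quantile=\"%g\"} %g\n", q, nearestRank(durations, q))
	}
}
```

For these six observations the nearest-rank values are 30, 80 and 80, which agree with the Exporter output shown above.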

　　Histogram

# HELP cpu_temperature The temperature of cpu
# TYPE cpu_temperature histogram
cpu_temperature_bucket{le="20"} 0
cpu_temperature_bucket{le="50"} 2
cpu_temperature_bucket{le="70"} 6
cpu_temperature_bucket{le="80"} 6
cpu_temperature_bucket{le="+Inf"} 6
cpu_temperature_sum 322
cpu_temperature_count 6
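Histogram buckets are cumulative: each `le` bucket counts every observation less than or equal to its upper bound, so the counts never decrease and the `+Inf` bucket equals the total count. As a minimal sketch, the hypothetical helper `cumulativeBuckets` below recomputes the bucket values from the six temperatures observed in fakeData().

```go
package main

import "fmt"

// cumulativeBuckets returns, for each upper bound, how many observations are
// less than or equal to it. The final element is the implicit +Inf bucket.
// bounds must be sorted in increasing order.
func cumulativeBuckets(bounds, observations []float64) []uint64 {
	counts := make([]uint64, len(bounds)+1)
	for _, v := range observations {
		for i, le := range bounds {
			if v <= le {
				counts[i]++
			}
		}
		counts[len(bounds)]++ // every observation falls into +Inf
	}
	return counts
}

func main() {
	bounds := []float64{20, 50, 70, 80}
	observations := []float64{30, 43, 56, 58, 65, 70}
	counts := cumulativeBuckets(bounds, observations)
	for i, le := range bounds {
		fmt.Printf("cpu_temperature_bucket{le=\"%g\"} %d\n", le, counts[i])
	}
	fmt.Printf("cpu_temperature_bucket{le=\"+Inf\"} %d\n", counts[len(bounds)])
}
```

Note that the observation 70 is counted in the `le="70"` bucket because the comparison is inclusive.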

Reflections on how Exporters are implemented

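In the example above, all metrics are initialized as soon as the program starts. With this approach, a sampling routine is usually started next to collect and update the metric samples periodically. The latest samples stay in memory, and when a request arrives from Prometheus Server, the samples in memory are returned. The advantages of this approach are that the sampling frequency is easy to control and that there is no need to worry about the resource usage that concurrent sampling might cause. Its shortcomings are: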

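1. Because sample data is not cleared automatically, when a monitored object that has already been sampled becomes invalid, Prometheus Server can still pull its sample data, although that data has not been updated since the object became invalid. The Exporter therefore has to provide its own mechanism for clearing the data of invalid monitored objects;

2. Because requests from Prometheus Server are answered from memory, if the Exporter's sampling routine hangs abnormally, Prometheus Server has difficulty noticing, and the data it pulls may be stale;

3. The data Prometheus Server pulls is not sampled at request time, so consistency of the data at a given point in time cannot be guaranteed.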
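One possible way to soften the second shortcoming is to store the time of the last successful update next to the cached value, so that the request handler can detect a stalled sampler. The sketch below uses only the standard library. The type `sampleCache`, its methods, and the use of Unix seconds are assumptions made for illustration, not something the article or client_golang prescribes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sampleCache holds the latest value written by a background sampler,
// together with the Unix time (in seconds) of the last update.
type sampleCache struct {
	mu        sync.Mutex
	value     float64
	updatedAt int64
	hasValue  bool
}

// Set stores a new sample and records when it was taken.
func (c *sampleCache) Set(value float64, nowUnix int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value, c.updatedAt, c.hasValue = value, nowUnix, true
}

// Get returns the cached value and reports whether it is fresh, meaning it
// was updated no more than maxAgeSeconds before nowUnix.
func (c *sampleCache) Get(nowUnix, maxAgeSeconds int64) (float64, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	fresh := c.hasValue && nowUnix-c.updatedAt <= maxAgeSeconds
	return c.value, fresh
}

func main() {
	cache := &sampleCache{}

	// Background sampler: takes a fake reading every 100 ms.
	go func() {
		for i := 0; ; i++ {
			cache.Set(float64(i), time.Now().Unix())
			time.Sleep(100 * time.Millisecond)
		}
	}()

	time.Sleep(250 * time.Millisecond)
	value, fresh := cache.Get(time.Now().Unix(), 5)
	fmt.Println("value:", value, "fresh:", fresh)
}
```

A handler built on this could skip stale values, or expose the last update time as a metric of its own so that alerting can catch a stuck sampler.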

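The other approach is the one used by the MySQL Server Exporter and the Node Exporter, and it is also the approach officially recommended by Prometheus. Each time a request from Prometheus Server arrives, new metrics are initialized and a sampling routine is started. Unlike the first approach, these metrics live only for the duration of the request. The sampling routine collects all the sample data and returns it to Prometheus Server. Compared with the first approach, the data is pulled at request time, so point-in-time consistency can be guaranteed; and because the metrics are reinitialized on every request, no stale sample data remains. The second approach has shortcomings too: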

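1. When several pull requests arrive at the same time, the resource consumption of concurrent sample collection has to be controlled;

2. When several pull requests arrive at the same time, the same metric has to be read several times within a short period. For a metric that changes infrequently, reading it repeatedly adds little value but increases resource usage.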
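Both shortcomings can be reduced, at the cost of a small amount of staleness, by serializing scrapes and reusing a very recent result. The sketch below illustrates that idea with the standard library only. The type `onDemandCollector`, its fields, and the TTL value are hypothetical. With client_golang, collecting at request time is normally done by implementing the `prometheus.Collector` interface, which this sketch does not do.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// onDemandCollector runs collect when a scrape arrives, but reuses the
// previous result if it is younger than ttlSeconds.
type onDemandCollector struct {
	mu          sync.Mutex
	ttlSeconds  int64
	collect     func() map[string]float64
	collectedAt int64
	samples     map[string]float64
}

// Scrape returns the samples for one pull request. The mutex serializes
// concurrent scrapes, so at most one collection runs at a time. Callers
// must treat the returned map as read-only because it is shared.
func (c *onDemandCollector) Scrape(nowUnix int64) map[string]float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.samples == nil || nowUnix-c.collectedAt >= c.ttlSeconds {
		c.samples = c.collect()
		c.collectedAt = nowUnix
	}
	return c.samples
}

func main() {
	collections := 0
	c := &onDemandCollector{
		ttlSeconds: 2,
		collect: func() map[string]float64 {
			collections++ // stands in for an expensive read of the target
			return map[string]float64{"test_tasks_total": float64(collections)}
		},
	}

	now := time.Now().Unix()
	c.Scrape(now)     // first request: collects
	c.Scrape(now)     // same second: reuses the previous result
	c.Scrape(now + 2) // TTL expired: collects again
	fmt.Println("collections:", collections) // prints "collections: 2"
}
```

The trade-off is explicit: a TTL of zero gives back the pure collect-on-every-request behavior, while a larger TTL moves toward the first approach.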

