- 14 May, 2019 2 commits
- 
- 
Frank Mai authored
**Problem:** Prometheus has two kubelet scraping targets: one scrapes `/metrics`, the other scrapes `/metrics/cadvisor`. The metrics from the `/metrics` endpoint do not include `container_name`, so a `container_*` expression without `container_name!=""` reports double the actual amount.
**Solution:** Add `container_name!=""` to the expression.
**Issue:** https://github.com/rancher/rancher/issues/20162
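
The doubling described above can be illustrated with a minimal Python sketch. It assumes the series without `container_name` repeat the per-container total; the sample values and the `total` helper are hypothetical and not taken from the chart. In PromQL a missing label behaves like an empty one, so the `container_name!=""` matcher drops such series.

```python
# Hypothetical samples for one pod; values are made up for illustration.
samples = [
    # Per-container series that carry a container_name label.
    {"container_name": "nginx", "value": 100.0},
    {"container_name": "sidecar", "value": 50.0},
    # Series without container_name that repeats the same total (assumed).
    {"container_name": "", "value": 150.0},
]


def total(series, drop_empty_container_name):
    """Sum sample values, optionally mimicking the container_name!="" matcher."""
    return sum(
        s["value"]
        for s in series
        if not drop_empty_container_name or s["container_name"] != ""
    )


unfiltered = total(samples, drop_empty_container_name=False)
filtered = total(samples, drop_empty_container_name=True)
print(unfiltered, filtered)
```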
- 
Frank Mai authored
**Problem:** The `Fluentd` panel on the `Rancher Components` dashboard does not show the correct count of fluentd Pods.
**Solution:** Change `sum(kube_pod_info{pod=~"fluentd.*"})` to `sum(kube_pod_info{pod=~".*fluentd.*",pod!~".*aggregator.*"})`.
**Issue:** https://github.com/rancher/rancher/issues/19722
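
The effect of the two matchers can be sketched in Python. Prometheus regex matchers are fully anchored, which `re.fullmatch` approximates here. The pod names and the `matches` helper are hypothetical, chosen only to show which names each selector would keep.

```python
import re


def matches(pod, include, exclude=None):
    """Mimic a =~ matcher plus an optional !~ matcher, both fully anchored."""
    if re.fullmatch(include, pod) is None:
        return False
    return exclude is None or re.fullmatch(exclude, pod) is None


# Hypothetical pod names for illustration only.
pods = [
    "fluentd-abcde",
    "rancher-logging-fluentd-fghij",
    "rancher-logging-fluentd-aggregator-klmno",
    "nginx-12345",
]

# Old selector: pod=~"fluentd.*" only keeps names that start with "fluentd".
old = [p for p in pods if matches(p, r"fluentd.*")]

# New selector: pod=~".*fluentd.*", pod!~".*aggregator.*"
new = [p for p in pods if matches(p, r".*fluentd.*", r".*aggregator.*")]

print(old)
print(new)
```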
 
- 
- 09 May, 2019 1 commit
- 
- 
orangedeng authored
Fix the node exporter crash when deploying to a node without an internal IP.
 
- 
- 08 May, 2019 3 commits
- 
- 
Frank Mai authored
- 
Aiwantaozi authored
Problem: The configure reloader does not use the mirrored Rancher image, and the logging configure secret holds both precanned and generated data.
Solution: Update the configure reloader image and separate the configure secret into two secrets.
Issue: https://github.com/rancher/rancher/issues/19836
- 
Aiwantaozi authored
 
- 
- 07 May, 2019 2 commits
- 01 May, 2019 2 commits
- 
- 
Frank Mai authored
- Embed operator as sub charts
  + Support configuring the operator like other charts
  + Adjust the operator default limit
- Add permission to kube-state exporter
- Replace localhost by 127.0.0.1 on prometheus-auth
- Increase Nginx proxy buffers
- Configure PVC name of Prometheus or Alertmanager
  + Allow configuring the PVC name of Prometheus or Alertmanager via `prometheus.persistence.name` or `alertmanager.persistence.name`
- Adjust Cluster Monitoring scrape logic
  + Don't scrape the Monitoring namespace on the `prometheus-io-scrape` job
  + The scrape rate uses the global interval; the default is 60s
  + Remove useless Prometheus record rules

**Issue:**
- https://github.com/rancher/rancher/issues/19693
- https://github.com/rancher/rancher/issues/18830
- https://github.com/rancher/rancher/issues/19243
- https://github.com/rancher/rancher/issues/19689
- https://github.com/rancher/rancher/issues/19410
- https://github.com/rancher/rancher/issues/19248
- 
Frank Mai authored
 
- 
- 12 Mar, 2019 2 commits
- 06 Mar, 2019 1 commit
- 
- 
orangedeng authored
**Problem:** When nginx is started from our start-up script, the nginx process becomes a child of the start-up script process rather than process 1. In that case the kill signal from kubelet/docker is sent to the start-up script instead of nginx, so the nginx process does not stop after the kill.
**Solution:** Change the proxy command so that nginx starts as process 1.
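
One common way to keep a server at the PID of its launcher is to replace the launcher process with `exec` instead of spawning a child. Whether this commit uses exactly that approach is not stated here. The following Python sketch (POSIX only; the `LAUNCHER` script and the helper name are hypothetical) shows that the PID is unchanged across an exec, which is why signals addressed to the launcher's PID then reach the exec'd program.

```python
import subprocess
import sys

# Launcher script: print its own PID, then replace the process image with a
# second interpreter that prints its PID as well. The flush matters because
# buffered output would be lost when the process image is replaced.
LAUNCHER = (
    "import os, sys\n"
    "print(os.getpid(), flush=True)\n"
    "os.execv(sys.executable, [sys.executable, '-c', 'import os; print(os.getpid())'])\n"
)


def pids_before_and_after_exec():
    """Run the launcher and return the PID seen before and after the exec."""
    result = subprocess.run(
        [sys.executable, "-c", LAUNCHER],
        capture_output=True,
        text=True,
        check=True,
    )
    before, after = result.stdout.split()
    return int(before), int(after)


if __name__ == "__main__":
    print(pids_before_and_after_exec())
```

In a shell start-up script, the equivalent idea would be to `exec` the final command rather than run it as a child.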
 
- 
- 05 Mar, 2019 1 commit
- 
- 
orangedeng authored
In system-charts, we need to use `repository` and `tag` to define a container's image name. With that in place, we can collect them and provide the list of images the system charts need.
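
As a rough illustration of why a uniform `repository`/`tag` pair helps, the sketch below walks a nested values structure and collects every image it finds. The `collect_images` helper and the layout of `values` are hypothetical; only the `repository` and `tag` key names come from the commit message.

```python
def collect_images(values):
    """Return a sorted list of "repository:tag" strings found anywhere in values."""
    images = set()

    def walk(node):
        if isinstance(node, dict):
            repo, tag = node.get("repository"), node.get("tag")
            if isinstance(repo, str) and isinstance(tag, str):
                images.add(f"{repo}:{tag}")
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(values)
    return sorted(images)


# Hypothetical values layout, for illustration only.
values = {
    "prometheus": {"image": {"repository": "rancher/prom-prometheus", "tag": "v2.7.1"}},
    "grafana": {"image": {"repository": "rancher/grafana-grafana", "tag": "5.4.3"}},
    "enabled": True,
}

print(collect_images(values))
```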
 
- 
- 27 Feb, 2019 3 commits
- 
- 
Prachi Damle authored
- 
Aiwantaozi authored
Problem: Fluentd versions before 1.3.1 do not support adding a client cert for the fluentd output.
Solution: Upgrade fluentd to 1.3.3. The related kafka gem also receives a small version upgrade; fluentd and kafka were tested after the upgrade.
Issue: https://github.com/rancher/rancher/issues/18396
- 
gitlawr authored
 
- 
- 26 Feb, 2019 3 commits
- 
- 
Prachi Damle authored
- Adding checksum over secrets to ensure change in secrets upgrades deployment
- Using rancher image for ensuring airgap case works too
- Adding nodeSelector to ensure the workloads never schedule to the Windows node
- Adding resource limits
- Add private image registry for airgap case
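
The first item relies on a familiar pattern: a hash of the secret content is attached to the workload (in Helm charts, usually as a pod-template annotation), so a changed secret yields a changed spec and triggers a rollout. The Python sketch below shows only the hashing idea. The `secret_checksum` helper and the sample data are hypothetical and do not reproduce the chart's template.

```python
import hashlib
import json


def secret_checksum(data):
    """Return a SHA-256 hex digest over a canonical encoding of the secret data."""
    canonical = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Hypothetical secret contents, for illustration only.
before = {"username": "admin", "password": "old"}
after = {"username": "admin", "password": "new"}

# A different digest means a different annotation value, hence a new rollout.
print(secret_checksum(before) != secret_checksum(after))
```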
- 
Prachi Damle authored
We will be keeping up with the upstream chart https://github.com/helm/charts/tree/master/stable/external-dns
- 
Frank Mai authored
**Problem:** Cannot start the "rules-configmap-reloader" container with a 10Mi resource limit.
**Solution:** Update images:
- quay.io/coreos/prometheus-operator:v0.29.0 -> rancher/coreos-prometheus-operator:v0.29.0
- quay.io/coreos/prometheus-config-reloader -> rancher/coreos-prometheus-config-reloader:v0.29.0
- prom/alertmanager:v0.16.1 -> rancher/prom-alertmanager:v0.16.1
- prom/prometheus:v2.7.1 -> rancher/prom-prometheus:v2.7.1
- grafana/grafana:5.4.3 -> rancher/grafana-grafana:5.4.3
- prom/node-exporter:v0.17.0 -> rancher/prom-node-exporter:v0.17.0
- quay.io/coreos/kube-state-metrics:v1.5.0 -> rancher/coreos-kube-state-metrics:v1.5.0

**Issue:**
- https://github.com/rancher/rancher/issues/17997
- https://github.com/rancher/rancher/issues/18353
 
- 
- 25 Feb, 2019 1 commit
- 
- 
Frank Mai authored
**Problem:** After enabling logging and monitoring in `rancher/rancher:master`, the fluentd metrics are not visible.
**Solution:** Make the label and endpoint name consistent in `system-chart/rancher-monitoring:v0.0.2`.
**Issue:** https://github.com/rancher/rancher/issues/18327
**Patch:** https://github.com/rancher/system-charts/pull/17
 
- 
- 22 Feb, 2019 1 commit
- 
- 
Aiwantaozi authored
Problem: After enabling logging and monitoring, the fluentd metrics are not visible.
Solution: Make the label and endpoint name consistent.
Issue: https://github.com/rancher/rancher/issues/18327
 
- 
- 20 Feb, 2019 7 commits
- 
- 
Craig Jellick authored
Refactor & fix some issues for Monitoring
- 
Frank Mai authored
checking
**Issue:** https://github.com/rancher/rancher/issues/18104
- 
Frank Mai authored
- 
Frank Mai authored
- 
Frank Mai authored
**Issues:** https://github.com/rancher/rancher/issues/18166
- 
Frank Mai authored
- 
Frank Mai authored
 
- 
- 15 Feb, 2019 1 commit
- 
- 
Frank Mai authored
**Problem:**
- Remote reader mode only allows the `project-level` Prometheus to share the metrics from the `cluster-level` Prometheus
- Remote reader mode cannot save the namespace-related metrics from the `cluster-level` Prometheus

**Solution:**
- Add `prometheus.sync.mode` to choose the mode
- Add a "federate" scrape job when deploying federation mode

**Issue:** https://github.com/rancher/rancher/issues/17390
 
- 
- 14 Feb, 2019 5 commits
- 
- 
Frank Mai authored
- 
Frank Mai authored
**Problem:** Cannot input a label name like `x.y.z/k` into serviceSelectorLabels.
**Solution:** Use an array instead of an object as values.
- 
Frank Mai authored
- 
Frank Mai authored
- 
Frank Mai authored
**Problem:** Cannot input a label name like `x.y.z/k` into nodeSelector.
**Solution:** Use an array instead of an object as values.
**Issue:** https://github.com/rancher/rancher/issues/17340
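
A plausible reading of this problem, assuming chart answers are supplied as dot-separated paths (as with Helm's `--set`), is that the dots inside the label name are taken as nesting. The sketch below uses a simplified, hypothetical `set_dotted` parser to show the effect, and a hypothetical list-of-strings form to show how an array sidesteps it. The exact array shape the chart uses is not shown in this log.

```python
def set_dotted(values, path, value):
    """Assign value at a dot-separated path, creating nested dicts (simplified)."""
    keys = path.split(".")
    node = values
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return values


def to_labels(items):
    """Turn hypothetical "name=value" array entries into a label mapping."""
    return dict(item.split("=", 1) for item in items)


# Object form: the label name is split at every dot and ends up nested.
as_object = set_dotted({}, "nodeSelector.x.y.z/k", "v")

# Array form (hypothetical shape): the label name is data, so it stays whole.
as_array = to_labels(["x.y.z/k=v"])

print(as_object)
print(as_array)
```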
 
- 
- 13 Feb, 2019 1 commit
- 
- 
Fyery authored
Problem: We cannot deploy monitoring tools in an air gap environment.
Solution: Add the ability to use the private image registry when deploying monitoring tools.
Issue: https://github.com/rancher/rancher/issues/17842
 
- 
- 12 Feb, 2019 2 commits
- 
- 
frank authored
**Problem:**
- Previous charts cannot satisfy the project-level monitoring deployment design
- Grafana cannot be restarted after the password is changed
- node-exporter cannot be scheduled to `controlplane` or `etcd` role nodes
- Prometheus cannot be started with a PVC provided by a storage provisioner that does not respect the `SecurityContext`

**Solution:**
- Deploy "project level" monitoring with a permission-limited Prometheus
- Remove the Grafana account `Secret` and use provisioning instead of `grafana-watch`
- Modify node-exporter `taints`
- Add configurable `SecurityContext` for Prometheus and Alertmanager
- Narrow Prometheus permission

**Issue:**
- https://github.com/rancher/rancher/issues/17039
- https://github.com/rancher/rancher/issues/16962
- https://github.com/rancher/rancher/issues/17030
- https://github.com/rancher/rancher/issues/17256

Co-authored-by: orangedeng <jxfa0043379@hotmail.com>
- 
frank authored
 
- 
- 29 Jan, 2019 1 commit
- 
- 
Fyery authored
Problem: After we refactored the logging, we cannot deploy logging tools in an air gap environment.
Solution: Add the ability to use the private image registry when deploying logging tools.
Issue: https://github.com/rancher/rancher/issues/17568
 
- 
- 14 Jan, 2019 1 commit
- 
- 
Aiwantaozi authored
Problem: We want to use the catalog to deploy system tools.
Solution: Add the logging chart.
 
- 
