Deepfactor provides a K8s webhook which automatically injects a lightweight, language-agnostic library, referred to as the Deepfactor runtime in this document, into the containers being observed with Deepfactor. This library intercepts relevant events and sends telemetry to the Deepfactor portal for analysis and alert generation. This document describes steps to troubleshoot issues with the Deepfactor K8s webhook and runtime.
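As a quick sanity check that the webhook is registered with the API server at all, you can list the cluster's mutating webhook configurations (a sketch; the exact configuration name depends on your installation, so grep loosely):

```sh
# look for the Deepfactor admission webhook entry
kubectl get mutatingwebhookconfigurations | grep -i deepfactor
```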
If you notice that your Kubernetes pods are not instrumented/mutated with Deepfactor, or instrumented processes are not reporting the expected telemetry, please follow the steps below to collect logs and information that will help Deepfactor support staff debug the issue.
- Check webhook installation and pod status.

```sh
WEBHOOK_NS="df-webhook"
kubectl get pods -n $WEBHOOK_NS
# check pods are running, haven't restarted, and the validation pod has completed successfully
# e.g. no lines should be printed for the command:
kubectl get pods -n $WEBHOOK_NS --no-headers | grep -v 'Completed' | grep -v 'Running'
```
- Check webhook <-> portal connectivity and the cluster & namespace configurations.

```sh
# collect the webhook log
WEBHOOK_PNAME=`kubectl get pods -n $WEBHOOK_NS | grep mutating-webhook | awk '{print $1}'`
kubectl logs $WEBHOOK_PNAME -n $WEBHOOK_NS > $WEBHOOK_PNAME.log
# check for any error lines; investigate any portal communication error first
grep '^E' $WEBHOOK_PNAME.log | grep 'error updating webhook config' | tail
# confirm the webhook was able to retrieve the cluster & namespace configuration from the portal:
# find the last line which says 'Config reloaded' and review the configuration
grep -n 'Config reloaded, config' $WEBHOOK_PNAME.log | tail -n1
```
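If the webhook log shows portal communication errors, it can help to test connectivity from inside the webhook pod. A minimal sketch, assuming the portal is reachable at https://<your-portal-host> (placeholder) and the webhook image ships with wget:

```sh
# replace <your-portal-host> with your Deepfactor portal address
kubectl exec $WEBHOOK_PNAME -n $WEBHOOK_NS -- \
  wget -q -O /dev/null https://<your-portal-host> \
  && echo "portal reachable" || echo "portal unreachable"
```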
- Check if the pod or image is excluded by configuration.

```sh
# inspect the last namespace configuration after 'Config reloaded'
# for pod or image name exclusion patterns
grep -n 'Config reloaded, config' $WEBHOOK_PNAME.log
# look for excluded pod images or names
grep 'ExcludeImageNameRegularExpression' $WEBHOOK_PNAME.log
grep 'ExcludePodNameRegularExpression' $WEBHOOK_PNAME.log
```
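If you find an exclusion pattern in the configuration, you can test it against the pod or image name locally. A small sketch (the pattern below is a made-up example; grep -E uses POSIX ERE, close enough to the webhook's regex syntax for a quick check of simple patterns):

```sh
# hypothetical exclusion regex copied from the 'Config reloaded' line
PATTERN='^transactionhistory-.*'
POD_NAME='transactionhistory-68d7bb76d8-jmbdz'
echo "$POD_NAME" | grep -E "$PATTERN" > /dev/null && echo "excluded" || echo "not excluded"
```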
- Check if the component was successfully registered with the Deepfactor portal.

Check the logs of the df-init-con-0 container in the instrumented pod. Look for alerts, dfctl register success, dfinit-test results, or warnings.

```sh
pod=transactionhistory-68d7bb76d8-jmbdz
ns=myns
kubectl logs $pod -c df-init-con-0 -n $ns > $pod.dfinit-con0.log
# grab a few lines after register
grep -A 5 'dfctl register' $pod.dfinit-con0.log
```
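You can also confirm that the webhook actually mutated the pod by checking whether the Deepfactor init container is present in the pod spec (assuming the injected container is named df-init-con-0, as above):

```sh
# list init containers; expect df-init-con-0 if the pod was mutated
kubectl get pod $pod -n $ns -o jsonpath='{.spec.initContainers[*].name}'; echo
```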
- Log in to the Deepfactor portal UI and locate the application corresponding to the pod of interest. If there are any warnings associated with that application, please take a screenshot.
- Collect the number of restarts, container exit codes and reasons, probes, and resource requests & limits.

```sh
pod=transactionhistory-68d7bb76d8-jmbdz
ns=myns
kubectl describe pod $pod -n $ns > $pod.describe
# check for container "Exit Code"(s) and Reason(s)
grep -A 10 'State:' $pod.describe
# check for probe failures and restart event history
grep -A 99 '^Events:' $pod.describe
# check for resources and probes in the pod spec
grep -A 2 -e 'Requests:' -e 'Limits:' $pod.describe
grep -e 'Readiness:' -e 'Startup:' -e 'Liveness:' $pod.describe
```
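For a compact summary of the same information, a jsonpath query can pull per-container restart counts and last exit codes directly (a sketch using standard kubectl output formatting):

```sh
# container name, restart count, last terminated exit code
kubectl get pod $pod -n $ns -o \
  jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
```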
Collect the previous container's stdout log if there was a restart. This can include aborts, kill signals, crashes, or exceptions, along with other important context from just before the container exited. Once a container has restarted 3-7 times and a 'running without deepfactor' alert is reported, collect the 'baseline' log for the container for comparison.
```sh
kubectl logs $pod -p -n $ns > $pod.prev.log
tail -n20 $pod.prev.log
# check if the container is running without deepfactor
df_enabled=`kubectl exec -it $pod -n $ns -- sh -c \
  'grep "Container started without Deepfactor" /tmp-df/df-con-*.log.entry > /dev/null && echo .0df'`
# another check for df in pid 1, but not 100% accurate
df_pid_1=`kubectl exec -it $pod -n $ns -- sh -c \
  'grep "libdf\.so" /proc/1/maps > /dev/null && echo .pid1df'`
# collect a baseline log
kubectl logs $pod -n $ns > $pod$df_pid_1$df_enabled.log
ls -l $pod$df_pid_1$df_enabled.log
```
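To run the same "running without Deepfactor" check across every pod in the namespace, a loop like the following can help (a sketch; it assumes each pod has a shell and writes the /tmp-df logs used above):

```sh
for p in `kubectl get pods -n $ns -o name | cut -d/ -f2`; do
  kubectl exec $p -n $ns -- sh -c \
    'grep -q "Container started without Deepfactor" /tmp-df/df-con-*.log.entry' 2>/dev/null \
    && echo "$p: running WITHOUT deepfactor"
done
```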
- Collect runtime logs and enable verbose debug logging (the env settings below enable debug runtime/java logs).

```sh
pod=transactionhistory-68d7bb76d8-jmbdz
ns=myns
# copy the runtime logs out of the pod
# option: add -c <container> if the pod has more than one container
kubectl cp $pod:/tmp-df $pod-tmp-df -n $ns
# check the dfeventd log for connectivity errors and periodic telemetry event counts
# vi $pod-tmp-df/dfeventd-*.log
```
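Before sending the collected /tmp-df directory to support, a quick grep for obvious problems in the dfeventd log can save a round trip:

```sh
# surface connectivity or send errors in the runtime event daemon log
grep -i -e 'error' -e 'fail' $pod-tmp-df/dfeventd-*.log | tail -n 20
```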
Set `DF_DEBUG=true` in the pod env for runtime verbose logging:

```yaml
env:
- name: DF_DEBUG
  value: "true"
```
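If you prefer not to hand-edit YAML, kubectl set env applies the same change to a workload and triggers a rollout (a sketch; <your-deployment> is a placeholder for the workload that owns the pod):

```sh
kubectl set env deployment/<your-deployment> DF_DEBUG=true -n $ns
```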
Set `DF_JAVA_LOG_FILE=/tmp/df.java.log` in the pod env to enable the Java agent (class usage) debug log file.
Set `DF_DEBUG_VERBOSE=true` in the pod env to get a verbose dfeventd-X.log with all telemetry decoded.
- Collect the webhook static-scan pod log and enable verbose debug logging.
```sh
WEBHOOK_NS="df-webhook"
kubectl get pods -n $WEBHOOK_NS
# collect the webhook static scan log
SCAN_PNAME=`kubectl get pods -n $WEBHOOK_NS | grep static-scan | awk '{print $1}'`
kubectl logs $SCAN_PNAME -n $WEBHOOK_NS > $SCAN_PNAME.log
```
Enable debug logging for the webhook and static-scan pods by editing each deployment's env:
```sh
WEBHOOK_DEPLOY=`kubectl get deployments -n $WEBHOOK_NS | grep mutating-webhook | awk '{print $1}'`
kubectl edit deployment $WEBHOOK_DEPLOY -n $WEBHOOK_NS
SCAN_DEPLOY=`kubectl get deployments -n $WEBHOOK_NS | grep static-scan | awk '{print $1}'`
kubectl edit deployment $SCAN_DEPLOY -n $WEBHOOK_NS
```

In each deployment, add to the container env:

```yaml
env:
- name: DF_DEBUG
  value: "true"
```
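Equivalently, the same env change can be applied without an interactive edit, assuming both pods read DF_DEBUG as described above:

```sh
kubectl set env deployment/$WEBHOOK_DEPLOY DF_DEBUG=true -n $WEBHOOK_NS
kubectl set env deployment/$SCAN_DEPLOY DF_DEBUG=true -n $WEBHOOK_NS
```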