Forum Discussion

Ashok42470
Copper Contributor
Feb 20, 2025

Need assistance on KQL query for pulling AKS Pod logs

I am trying to pull historical pod logs using the KQL query below. It looks like the join between the ContainerLog and KubePodInventory tables didn't go well, as I see a lot of duplicates in the output.

ContainerLog
//| project TimeGenerated, ContainerID, LogEntry
| join kind=inner (
    KubePodInventory
    | where ServiceName == "<<servicename>>"
) on ContainerID
| project TimeGenerated, Namespace, ContainerID, ServiceName, LogEntrySource, LogEntry, Name1
| sort by TimeGenerated asc

Can someone suggest a better query?

  • Take this:

    ContainerLog
    | join kind=inner (
        KubePodInventory
        | where ServiceName == "<<servicename>>"
    ) on ContainerID
    | project TimeGenerated, Namespace, ContainerID, ServiceName, LogEntrySource, LogEntry, Name1
    | distinct TimeGenerated, Namespace, ContainerID, ServiceName, LogEntrySource, LogEntry, Name1
    | sort by TimeGenerated asc

  • luchete
    Steel Contributor

    Hi Ashok42470,

    The join between ContainerLog and KubePodInventory is causing duplicates because you're not filtering enough. To avoid duplicates, try adding the PodName or ContainerName as an additional filter in the join condition. You can also consider using summarize to aggregate the logs per pod/container if needed. Here’s an adjusted query:

    ContainerLog
    | join kind=inner (
        KubePodInventory
        | where ServiceName == "<<servicename>>"
    ) on $left.ContainerID == $right.ContainerID
    | project TimeGenerated, Namespace, ContainerID, ServiceName, LogEntrySource, LogEntry, Name1
    | summarize Logs = make_list(LogEntry) by ContainerID, TimeGenerated, ServiceName, Namespace
    | sort by TimeGenerated asc

    In this case the query groups the logs per container and should remove duplicates.
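
    Another angle, in case duplicates remain: KubePodInventory typically records a snapshot row per pod at every collection interval, so a plain join can multiply each log line by the number of snapshots. Here is a rough sketch (untested) that collapses the inventory side to one row per ContainerID before joining, assuming the standard Container Insights column names:

    ContainerLog
    | join kind=inner (
        KubePodInventory
        | where ServiceName == "<<servicename>>"
        // keep only the latest inventory row per container so each log line matches once
        | summarize arg_max(TimeGenerated, Name, Namespace, ServiceName) by ContainerID
        | project ContainerID, PodName = Name, Namespace, ServiceName
    ) on ContainerID
    | project TimeGenerated, Namespace, ContainerID, ServiceName, LogEntrySource, LogEntry, PodName
    | sort by TimeGenerated asc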

    Give it a try and let me know how it goes.

    Regards!

    • Ashok42470
      Copper Contributor

Thanks for the reply, luchete. The reason I am filtering on the service name is that pods are ephemeral; sometimes I have to pull logs for pods that have already been killed and whose names I may not know.

In my case, I just pulled logs for the last 30 minutes and got 30k+ rows of data, which shouldn't be the case. Looking at the results, comma-separated strings in LogEntry are getting split into multiple rows. Not sure how to tackle that?

      • luchete
        Steel Contributor

        Hello Ashok42470,

        I understand the challenge with ephemeral pods. To prevent logs from being split across rows, you can aggregate them using summarize. Here’s an updated query that groups the logs into a single entry per container:

        ContainerLog
        | join kind=inner (
            KubePodInventory
            | where ServiceName == "<<servicename>>"
        ) on $left.ContainerID == $right.ContainerID
        | summarize Logs = make_list(LogEntry, 1000) by ContainerID, TimeGenerated, ServiceName, Namespace
        | extend CombinedLogs = strcat_array(Logs, " ") // Joins logs into a single string
        | project TimeGenerated, Namespace, ContainerID, ServiceName, CombinedLogs
        | sort by TimeGenerated asc

        This should eliminate duplicates and keep the log messages intact. If you’re pulling logs for killed pods, consider adjusting your time range or adding filters for PodName.
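
        For the killed-pod case, the rows already ingested into KubePodInventory remain queryable even after the pods are gone, so you can first list the pod names that backed the service in a given window and then use them as filters. A quick sketch (assuming the standard columns and a 24-hour window):

        // Pods (including ones that no longer exist) seen for the service in the last 24 hours
        KubePodInventory
        | where TimeGenerated > ago(24h)
        | where ServiceName == "<<servicename>>"
        | summarize LastSeen = max(TimeGenerated) by PodName = Name, Namespace, ContainerID
        | sort by LastSeen desc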

        Regards!
