Parsing TeamCity log output. Analyse, find reasons for unique or systematic problemsBy neokrates, written on May 7, 2010
What is great about TeamCity is how it logs things. You can have it write down every bit of information in the way, that allows you to understand the build process and problems with it. On the other hand, you are facing the Proverbial Needle in a Haystack. Here is the general method on how to work with TeamCity logs. Find your “needle problem” in the Haystack of log output. Common sources of errors or even crashes in TeamCity are also presented here.
✔ TeamCity 4.x
Should also work for:
✔ Other TeamCity Versions
So, you have XX BuildAgents.
There are at least three things which make your task easier:
⭐ logging on the master is important. Set up the logger the way, that it is daily archived. Maybe use RollingFileAppender or its alternatives (http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/DailyRollingFileAppender.html).
⭐ log level. In development DEBUG and INFO levels are ok. But it is “no go” on the production system. So, any developer should consider carefully when we should be WARNed and when there is an ERROR. For us, it is important to have enough information but not even one bit more.
⭐ agents should be setup in the way that TeamCity has enough disk space to store the history. NAS or huge shared drive might be good choice.
Provided these conditions, you have all the build history and good chance to find the build problem within given time frame.
Where to begin
It all begins with the failed build or timed out build, or just slow build, or whatever other build that makes people unhappy.
You can inspect what is actually wrong by selecting the drop-down near to the build and then Full log. If the log file is not that big, you can inspect it from the browser. If the file is large and you need more parsing options, download it using download full build log.
What seems to be the problem
There are classical tools for analysis of the build log. One such tool is grep. From the first inspection you found out, that your build had the problem at 12:14 on 2010-05-06. You can limit the search log volume by getting all the output for the date YYYY-MM-dd HH:mm:ss (Your format may differ)
grep "2010-05-06 12:[0-9][0-9]:[0-9][0-9]" mylog_20100506.log > myshorterlog_20100506.log
If you are lucky you might see the problem in this output and you are done.
See the error but can’t see its cause
Often whatever you see in the local output results in new questions and doens’t solve your problem. Here are some remarks on how to search for more info.
TeamCity uses unique ID to identify the build
In the log you will see
[17:20:57]: Checking for changes [17:21:17]: Clearing temporary directory: /var/log/TeamCity/your_agent111/temp/buildTmp [17:21:17]: Checkout directory: /var/log/TeamCity/your_agent111/work/e0ca0d75a95c2215
Here, e0ca0d75a95c2215 is a unique identifier for the build. You will also see similar identifiers in many places in logs, and you will see it in many processes started by the build.
Detect the “Zombie process” using TeamCity build identifier
Maybe our build e0ca0d75a95c2215 has started some process and never finished it. Login to the build_agent111 and use:
ps -aux|grep e0ca0d75a95c2215
Alternatively search for “Zombie process” and then identify the build of origin
Another common way to find your “Zombie process” is to search for token, marker which identifies it. Just see in the code how the process was started, and then:
ps -aux|grep MY_UNIQUE_PROCESS_TOKEN
If you have the process, you also have its PID.
diuw@diuw-desktop:~$ ps -aux|grep MY_UNIQUE_PROCESS_TOKEN diuw 28119 3.0 2.8 326400 99392 ? Sl 17:11 2:35 /usr/lib/my_process_home/my_process<nowiki>
Now, when was the Zombie started?
diuw@diuw-desktop:~$ ps -ef|grep 28119 agent201 28119 1 0 Mar11 ? 00:00:00 /usr/lib/my_process_home/my_process
Use "TeamCity agent Statistics" to find the historical build
Use Date and Agent ID to find the problematic build.
- Go to Agents -> statistics
- In the from: and to: enter “11 Mar 2010”, which is the day you are interested in.
We know that our your_agent111 has a hanging process, and it is there since Mar11. So, just check out all the builds on that day, made on that particular agent. One of them will be identified by having the same TeamCity build identification as the process, which you are found. That build contains the root cause.
Port is busy
You can identify the hanging process by the port which it occupies:
lsof -l|grep MY_NEEDED_PORT
After that use previously described steps to find the root cause.
Synchronization IS an issue
With TeamCity you never know. Don’t forget that builds which run simultaneously might have many synchronization issues. The synchronization problem is often the cause if many builds fail in the same time, whereas normally they work ok.
That may be because:
- the same port is required,
- the other same resource(file) is used for IO,
- agents run under different user which creates permission conflict. First creates resource, and second can’t create or use the resource on the same place
- JVM issues. Like static code or concurrent calls
etc.. This list can be continued endlessly.
Forking the process may become a disaster
If you are getting out of memory type of java message, that may be not because your code is bad and eats the memory. On the Linux, java exec forks the process, which is a particular problem for the TeamCity, which is huge by itself.More about this problem
What takes so long
Use some kind of parser to extract the execution time and find the slow places. Here is one such parser, written in groovy
Just a log overflow
TeamCity may log as much as 100-200 mb of rough log data pro build. And it will crash if there is no more disk space… Cleaning and good logging policy will solve that issue.
Happy debugging & have fun :-)
LEARN MORE (amazon bookstore)