OpManager: Any word on the latest patch release for Java heap memory?
Rob:
We have been working on this diligently. It is the highest-priority item for us. Unfortunately, it is very hard to pin down. When we run it with memory-profiling tools, the problem does not occur, which suggests that timing (threading) issues are conspiring to cause the leak. We have put in a couple of fixes, but we are not sure they address the underlying issue, and each trial takes time to verify. That is the reason for the delay. We apologize sincerely and will post a fix as soon as possible.
Sridhar
A quick update, with a much more detailed update coming tomorrow: this issue turns out to be not so much a bug as almost a "feature" under high load. Basically, the layering of the code (normally a good design practice that protects against other kinds of bugs) results in memory overhead from object creation and destruction. It happens only under relatively rare conditions, so it took quite a while to reproduce it consistently on cue. We are reworking this code to make it much more efficient.
We know this has been a pain, and please accept our profuse apologies. We will be enhancing our stress tests to ensure that we can handle heavy loads much more consistently.
Sridhar
Dear Rob,
After our analysis, we found the following:
1. When a large number of switches, and hence ports, are monitored, an out-of-memory issue occurs. To find the status of the ports we send SNMP queries, and with a large number of ports we found that memory grows because of timeouts in the callback mechanism of those SNMP queries, which leads to the out-of-memory issue. We have identified the root cause, and a fix will be provided early next week.
2. When a large number of services are monitored, polling becomes slow after some time and appears to stop. From our initial analysis, based on the debug patch applied in your setup, we found that the data polling threads are held up by service polling. Also, the out-of-memory exception thrown in your setup could not be reproduced, either locally or in your setup. We are closely monitoring your setup and are also trying to reproduce this issue in our own environment. We hope to resolve the polling and out-of-memory issues as early as possible and will keep you updated.
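Issue 1 describes memory that grows when SNMP callbacks time out. A minimal sketch of that failure mode and one common remedy, assuming outstanding requests are tracked in a map keyed by request ID (the class and method names are hypothetical, not OpManager's internals): without the timeout sweep, every request whose callback never fires stays in the map forever.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not OpManager's code): outstanding SNMP requests are tracked
// by request ID until their response callback fires. If a request times out and the
// callback never comes, the entry must still be purged, or the map keeps growing.
final class PendingRequestTracker {
    private final Map<Integer, Long> deadlines = new HashMap<>(); // request ID -> deadline (ms)
    private final long timeoutMillis;

    PendingRequestTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record a request sent at nowMillis.
    synchronized void register(int requestId, long nowMillis) {
        deadlines.put(requestId, nowMillis + timeoutMillis);
    }

    // Response callback: returns false if the ID is unknown (e.g. already purged).
    synchronized boolean onResponse(int requestId) {
        return deadlines.remove(requestId) != null;
    }

    // Timeout sweep: drops every request whose deadline has passed, returns how many.
    synchronized int purgeExpired(long nowMillis) {
        int before = deadlines.size();
        deadlines.values().removeIf(deadline -> deadline <= nowMillis);
        return before - deadlines.size();
    }

    synchronized int pendingCount() {
        return deadlines.size();
    }

    public static void main(String[] args) {
        PendingRequestTracker tracker = new PendingRequestTracker(5_000);
        for (int id = 0; id < 1_000; id++) {
            tracker.register(id, 0);  // 1,000 port-status queries sent at t = 0
        }
        for (int id = 0; id < 100; id++) {
            tracker.onResponse(id);   // only 100 responses come back
        }
        System.out.println(tracker.pendingCount());      // 900
        System.out.println(tracker.purgeExpired(6_000)); // 900
        System.out.println(tracker.pendingCount());      // 0
    }
}
```

With many switch ports, the number of unanswered requests scales with the port count, which is why a missing sweep like this would show up first on large switch deployments.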
Thanks for your patience and understanding.
Regards,
Karthi
For OpManager Support
Any update on the release of the patch?
Thanks
Could you please provide a status on the release of the patch? I am monitoring a large number of devices, and the program repeatedly stops polling and the web client is unbearably slow. I have already tried unmanaging all my switches, but after a short period of time the problem returns. It is at the point where I am unable to use the program to monitor any devices.
Thanks
Hey Kad, I'm just as anxious as you are. They are actually working directly on my server and have made some headway, but it's still a little slow.
My stuff stops polling without any switches, just servers and services.
I've been complaining quite a bit about this issue. I think they are working on it; they just need to figure out where the problem is. I've got about a gig of logs they have to dig through.
:cry:
Dear Rob,
Please accept our thanks for providing us access to your system which helped us a lot in reproducing the issue and in analysis.
Dear Rob and Kad,
As mentioned in my earlier post, there were two issues causing the memory overflow. The second issue was reproduced in Rob's setup. We found that under heavy load, i.e. when a huge amount of data has to be collected from the monitored servers, a queue grows at the SNMP handling level in OpManager. This slows the system down and eventually leads to an out-of-memory error. We have fixed the queue overgrowth with a self-controlled queue size mode. When the queue grows beyond a certain limit, this fix may skip some polling cycles, but it ensures that the system does not fail. To get this hot fix, please contact OpManager support. We are currently working on a better algorithm to handle such large queues and will keep you updated.
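The "self-controlled queue size mode" can be pictured as a bounded queue that refuses new work when it is full. A minimal sketch, assuming poll requests are identified by a monitor name string and that a refused request simply means that polling cycle is skipped (the class name, method names, and capacity are illustrative assumptions, not the actual hot fix):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not OpManager's code) of a self-limiting poll queue:
// when the queue is full, the new poll request is dropped (that polling cycle
// is skipped) instead of letting the queue grow until the heap runs out.
final class BoundedPollQueue {
    private final BlockingQueue<String> queue;
    private final AtomicLong skippedCycles = new AtomicLong();

    BoundedPollQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: returns false if the queue was full and this cycle was skipped.
    boolean submit(String monitorId) {
        boolean accepted = queue.offer(monitorId); // non-blocking; fails when full
        if (!accepted) {
            skippedCycles.incrementAndGet();
        }
        return accepted;
    }

    // Worker side: next pending poll request, or null if the queue is empty.
    String next() {
        return queue.poll();
    }

    long skippedCycles() {
        return skippedCycles.get();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        BoundedPollQueue polls = new BoundedPollQueue(3);
        for (int i = 0; i < 5; i++) {
            polls.submit("server-" + i);  // only the first 3 fit
        }
        System.out.println(polls.size());             // 3
        System.out.println(polls.skippedCycles());    // 2
        System.out.println(polls.next());             // server-0
        System.out.println(polls.submit("server-5")); // true, there is room again
    }
}
```

This mirrors the trade-off described above: memory use is capped by the queue capacity, at the cost of occasionally missing a polling cycle under heavy load.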
Thanks for your understanding and patience.
With regards,
Karthi Mariappan
For OpManager support.
Dear Rob and Kad,
We have come up with a fix (with a better algorithm) for the queue-growth issue. This fix ensures that resource-hungry monitors do not starve the other monitors of scheduling for data collection: such monitors will not be rescheduled for data collection until their previous schedule has completed. This is a consolidated patch that also includes the fix for the switch ports issue. Please contact OpManager support to get this hot fix.
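The rule described above, not rescheduling a monitor until its previous collection completes, can be sketched with a set of in-flight monitors. This is a minimal illustration under assumed names (`NonOverlappingScheduler`, a fixed worker pool); OpManager's real scheduler may work quite differently.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not OpManager's code): a monitor is handed to the worker
// pool only if its previous data collection has finished, so a slow monitor
// cannot pile up queued work and crowd out the others.
final class NonOverlappingScheduler {
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();
    private final ExecutorService workers;

    NonOverlappingScheduler(ExecutorService workers) {
        this.workers = workers;
    }

    // Called once per polling interval per monitor. Returns false (and queues
    // nothing) if the monitor's previous collection is still running.
    boolean schedule(String monitorId, Runnable collect) {
        if (!inFlight.add(monitorId)) {
            return false; // previous schedule not completed: skip this round
        }
        try {
            workers.execute(() -> {
                try {
                    collect.run();
                } finally {
                    inFlight.remove(monitorId); // eligible again next round
                }
            });
        } catch (RuntimeException e) {
            inFlight.remove(monitorId); // submission rejected: don't leave it marked busy
            throw e;
        }
        return true;
    }

    boolean isRunning(String monitorId) {
        return inFlight.contains(monitorId);
    }

    // Demo helper: block on a latch, restoring the interrupt flag if interrupted.
    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        NonOverlappingScheduler scheduler = new NonOverlappingScheduler(pool);
        CountDownLatch slowDone = new CountDownLatch(1);

        // Round 1: the slow monitor starts and blocks until released.
        scheduler.schedule("slow-server", () -> awaitQuietly(slowDone));
        // Round 2 while it is still busy: skipped, nothing queued.
        System.out.println(scheduler.schedule("slow-server", () -> {}));  // false
        // Other monitors are still scheduled normally.
        System.out.println(scheduler.schedule("fast-service", () -> {})); // true

        slowDone.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(scheduler.isRunning("slow-server"));           // false
    }
}
```

Because a busy monitor never has more than one pending collection, its backlog cannot grow no matter how slow it is, and the worker pool stays available for the other monitors.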
Thanks and regards,
Karthi Mariappan
Hello, I have reinstalled OpManager on a Linux box and would like some help moving the databases over from the Windows machine. Could you give me a call or email me and help me move the databases, and also apply the fix on the Linux box?
Hi! I am evaluating the product with only 10 servers and no switches, and I have the same problem. I have already removed all SMTP and WMI counters, with the same result: high CPU. It is a little better if I close the web client window. I was sure we would buy this product because it fits our needs exactly, but I can't buy it with 100% CPU consumption. Please send me any fix you already have and I will try it. Thanks,
Michael
mivanov@touchpoint.ca
Hi Michael,
Please let me know whether you run the default SNMP service on the machine running OpManager, and which monitoring interval you use for the devices. Please also provide the configuration details of the server and the services you monitor, and send the files under the /logs folder and /mysql/data/OpManagerDB to support@opmanager.com for further analysis.
Regards
Karthik
OpManager Support
Hello,
My company is a client. I keep seeing the following exception occurring:
Exception in thread "Image Fetcher 0" java.lang.OutOfMemoryError: Java heap space
Will the patch correct this problem?
Chris
It would be a tremendous help if someone would be willing to share the conditions under which they are experiencing these issues, so we can tell whether the patch is relevant to us
(e.g. OpManager server specs, number of devices being monitored, etc.).
TIA!
-Alex
Hi Chris,
I understand that you are getting a Java heap space error. To analyze the issue further, please raise a request at "http://support.opmanager.com", and do not forget to attach the "Support Tab --> Support Information File". We will analyze the issue and get back to you.
BTW Alex, if the issue is not environment-specific, I will also post the status in the forums for your follow-up.
Regards,
Kalvin
OpManager Support Team