Luckily, the IBM JVM generates a Portable Heap Dump (PHD) on an OutOfMemoryError (OOM), and IBM provides an array of extremely helpful tools for analyzing heap dumps, thread dumps, and other diagnostic information offline. Thus it is still possible to detect memory leaks in applications, even when they only occur in live production systems. Sometimes the cause turns out to be heap fragmentation rather than a leak: even though a considerable percentage of memory is still free, no contiguous chunk of the required size can be found.
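To make the fragmentation case concrete, here is a minimal toy sketch. It models the heap as an array of equally sized slots, which is an assumption made purely for illustration and not how the IBM JVM actually lays out or manages memory. The class and method names (`FragmentationSketch`, `totalFree`, `largestFreeRun`, `canAllocate`) are hypothetical.

```java
public class FragmentationSketch {

    /** Counts free slots in a toy heap where true means "slot in use". */
    static int totalFree(boolean[] used) {
        int free = 0;
        for (boolean u : used) {
            if (!u) {
                free++;
            }
        }
        return free;
    }

    /** Returns the length of the longest run of adjacent free slots. */
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean u : used) {
            run = u ? 0 : run + 1;   // a used slot breaks the current free run
            best = Math.max(best, run);
        }
        return best;
    }

    /** A request succeeds only if one contiguous free run is large enough. */
    static boolean canAllocate(boolean[] used, int size) {
        return largestFreeRun(used) >= size;
    }

    public static void main(String[] args) {
        // Toy heap of 10 slots with every other slot in use: half the heap is free,
        // but no two free slots are adjacent.
        boolean[] heap = new boolean[10];
        for (int i = 0; i < heap.length; i += 2) {
            heap[i] = true;
        }
        System.out.println("free slots: " + totalFree(heap));
        System.out.println("largest contiguous free run: " + largestFreeRun(heap));
        System.out.println("can allocate 3 contiguous slots: " + canAllocate(heap, 3));
    }
}
```

In this toy heap half of the slots are free, yet a request for three adjacent slots still fails. That is the same symptom as an OOM with plenty of free heap reported.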
[Image: IBM Heap Analyzer screenshot]
The image above shows the IBM Heap Analyzer detecting a memory leak in WebSphere session replication: 923MB of heap has been consumed by 14,011 HashMap.Entry objects held by the WebSphere Data Replication Service (DRS), which is used for HTTP session replication.
It is also worth checking whether any application code uses Xalan 2.6.0. On earlier occasions I have found memory leaks that were more difficult than usual to trace, and that turned out to be caused primarily by a well-known bug in Xalan 2.6.0.
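One way to check for Xalan from inside an application is sketched below. It looks up `org.apache.xalan.Version` by reflection and calls its static `getVersion()` method, so it compiles and runs even when Xalan is absent. The class name `XalanVersionCheck` and its helper methods are hypothetical. The check for the substring `2.6.0` assumes the version string embeds the dotted version number. The result only reflects the class loader the code runs under, so in an application server it would need to run inside the application being examined.

```java
import java.lang.reflect.Method;

public class XalanVersionCheck {

    /** Returns the Xalan version string, or null if Xalan is not visible to this class loader. */
    static String detectXalanVersion() {
        try {
            Class<?> versionClass = Class.forName("org.apache.xalan.Version");
            Method getVersion = versionClass.getMethod("getVersion");
            return (String) getVersion.invoke(null);   // static method, so no receiver
        } catch (ReflectiveOperationException | LinkageError e) {
            return null;   // Xalan missing, or its Version class could not be loaded or called
        }
    }

    /** Assumption: the version string contains the dotted version, e.g. "2.6.0". */
    static boolean isSuspectVersion(String version) {
        return version != null && version.contains("2.6.0");
    }

    public static void main(String[] args) {
        String version = detectXalanVersion();
        if (version == null) {
            System.out.println("Xalan not found on this class path");
        } else if (isSuspectVersion(version)) {
            System.out.println("Found " + version + ": this is the release to be suspicious of");
        } else {
            System.out.println("Found " + version);
        }
    }
}
```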