Skip to main content

Paul Lindner

The Mysteries of Java Character Set Performance

4 min read

"Two Characters Sets?  Seems like plenty!"

So I've been pushing Java to it's limits lately and finding some real nasty concurrency issues inside the JRE code itself.  Here's one particulary ugly one -- we had 700 threads stuck here:

       java.lang.Thread.State: BLOCKED (on object monitor)                                                                    
         at sun.nio.cs.FastCharsetProvider.charsetForName(
         - waiting to lock <0x00002aab4cdf91b8> (a sun.nio.cs.StandardCharsets)
         at java.nio.charset.Charset.lookup2( 
         at java.nio.charset.Charset.lookup(
         at java.nio.charset.Charset.isSupported( 
         at java.lang.StringCoding.lookupCharset( 
         at java.lang.StringCoding.decode(                                                                      
         at java.lang.String.( 
Digging deeper we find the lookupCharset is called all over the place.  The app in question is functions as a web proxy, so it's constantly reading and writing data from web pages in a variety of character sets.  The method charsetForName() uses a synchronized data structure to lookup defined character sets.  (Yay serialized access....)
But wait, lookup and lookup2 provide us with a cache so we can avoid the big bad synchronized method..  Sigh, here's the implementation:
     private static Charset lookup(String charsetName) {
         if (charsetName == null)
             throw new IllegalArgumentException("Null charset name");
         Object[] a;
         if ((a = cache1) != null && charsetName.equals(a[0]))
             return (Charset)a[1];
         // We expect most programs to use one Charset repeatedly.
         // We convey a hint to this effect to the VM by putting the
         // level 1 cache miss code in a separate method.
         return lookup2(charsetName);
     private static Charset lookup2(String charsetName) {
         Object[] a;
         if ((a = cache2) != null && charsetName.equals(a[0])) {
             cache2 = cache1;
             cache1 = a;
             return (Charset)a[1];
         Charset cs;
         if ((cs = standardProvider.charsetForName(charsetName)) != null ||
             (cs = lookupExtendedCharset(charsetName))           != null ||
             (cs = lookupViaProviders(charsetName))              != null)
             cache(charsetName, cs);
             return cs;
         /* Only need to check the name if we didn't find a charset for it */
         return null;
Yes, a whopping 2-entry cache!!
Also, the keys used are not canonical, so if my app asks for "UTF-8", "utf-8", and "ISO-8859-1" with regularity this 2 entry cache is worthless, every call ends up blocking in the evil thread-synchronized data structure.
Someone send them a copy of the ConcurrentHashMap doc.  please.