JDK-6483858 : File attribute access is very slow (isDirectory, etc.)
  • Type: Enhancement
  • Component: core-libs
  • Sub-Component: java.io
  • Affected Version: 5.0
  • Priority: P5
  • Status: Closed
  • Resolution: Duplicate
  • OS: windows_xp
  • CPU: x86
  • Submitted: 2006-10-19
  • Updated: 2011-02-16
  • Resolved: 2009-02-16
Related Reports
Duplicate :  
Description
A DESCRIPTION OF THE REQUEST :

As can be seen in bugs 4712307, 4145781, 4071318, 4480327, 4679673, 4711700, 4858226, 4889108, 5033747 and others, there is a significant performance issue when accessing the attributes of files in large directory structures. I ran into this while investigating why Tomcat was extremely slow to compile JSP pages. These bugs have been closed without a real fix, by:

1. claiming it cannot be reproduced, even though many users can reproduce it consistently on versions of Windows and the JDK going back several years and up to current releases (in my case, WinXP SP2 with JDK 1.5.0_06), and possibly on other operating systems.
2. bypassing the attribute checks in the JDK code itself, which in most other cases (i.e., application code) is not an option.
3. shifting the blame elsewhere (namely, to the OS).


JUSTIFICATION :
The simple point to be made is:

1. A very large number of applications iterate over files (whether for GUI display or for actual file processing).
2. Checking file attributes (isDirectory, exists, isHidden, isFile, etc.) while iterating over a large number of files is unacceptably slow (it can take minutes). Java provides no better way to do it.
3. Native code can do this efficiently (I've seen a 25-fold speedup, and others have reported similar results).
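The slow pattern described above can be illustrated with plain java.io.File. This is a minimal sketch, and the SlowScan class is illustrative rather than taken from the report. java.io.File keeps no attribute state, and listFiles() returns File objects that carry only a path, so every isHidden(), isDirectory() and isFile() call below is resolved independently; nothing learned by one call is reused by the next.

```java
import java.io.File;

public class SlowScan {
    // Recursively counts regular files, directories and hidden entries below
    // root (root itself is not counted). Each attribute check on each child is
    // a separate java.io.File call; no attributes are remembered between calls.
    static long[] scan(File root) {
        long[] counts = new long[3]; // [files, dirs, hidden]
        File[] children = root.listFiles();
        if (children == null) {
            return counts; // not a directory, or an I/O error occurred
        }
        for (File child : children) {
            if (child.isHidden()) {
                counts[2]++;
            }
            if (child.isDirectory()) {
                counts[1]++;
                long[] sub = scan(child);
                for (int i = 0; i < counts.length; i++) {
                    counts[i] += sub[i];
                }
            } else if (child.isFile()) {
                counts[0]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        long start = System.nanoTime();
        long[] c = scan(root);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(c[0] + " files, " + c[1] + " dirs, " + c[2] + " hidden in " + ms + " ms");
    }
}
```

Timing this against a directory tree with tens of thousands of entries is one way to observe the cost the report describes.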

So, this is a request for enhancement, to provide us with a built-in solution to this serious performance issue, which will solve all these reported bugs.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
File attribute access should be efficient such that processing large folders is possible in reasonable time without the need for JNI.


One possible suggestion:

- extend the File object with a cache of the file attributes.
- add a useCachedAttributes flag to File, defaulting to false (for full backward compatibility), and a corresponding useCachedAttributes(boolean) method that sets this flag.
- when one of File's existing attribute getter/checker methods is called: if the useCachedAttributes flag is set and cached data is available, return the cached data. Otherwise, make the native call and update the cache with whatever information that call provides (most native calls return several attributes for the price of one).
- iterator methods such as listFiles can update the cached data during iteration (for example, the WIN32_FIND_DATA structure used during file iteration on Windows already contains this data, so the performance cost is negligible).

This scheme will add only a few bytes to each File instance and will not break any existing code. Adding a single line (file.useCachedAttributes(true)) to existing code would then give thousands of suffering applications this very significant performance boost.
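The proposed flag can be sketched outside the JDK as a hypothetical wrapper class, because application code cannot add fields to java.io.File itself. CachingFile and its methods are illustrative names, not a JDK API. The "one native call, several attributes" step is approximated here with NIO.2's Files.readAttributes (Java SE 7 and later), which returns a BasicFileAttributes snapshot in a single query. The iteration-time cache filling from the last bullet is not shown.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical wrapper illustrating the proposed useCachedAttributes flag.
// Not part of java.io.File; the actual proposal would put this state in File.
public class CachingFile {
    private final File file;
    private boolean useCachedAttributes = false; // default: always re-query, like File
    private BasicFileAttributes cached;          // null until a successful lookup

    public CachingFile(File file) {
        this.file = file;
    }

    public void useCachedAttributes(boolean enabled) {
        this.useCachedAttributes = enabled;
        if (!enabled) {
            cached = null; // drop possibly stale data when caching is turned off
        }
    }

    // Serve from the cache when allowed; otherwise make one query that fills
    // several attributes at once (type, size, timestamps) and store the result.
    private BasicFileAttributes attrs() {
        if (useCachedAttributes && cached != null) {
            return cached;
        }
        try {
            cached = Files.readAttributes(file.toPath(), BasicFileAttributes.class);
        } catch (IOException e) {
            cached = null; // missing or inaccessible: treated as non-existent
        }
        return cached;
    }

    public boolean exists() {
        return attrs() != null;
    }

    public boolean isDirectory() {
        BasicFileAttributes a = attrs();
        return a != null && a.isDirectory();
    }

    public boolean isFile() {
        BasicFileAttributes a = attrs();
        return a != null && a.isRegularFile();
    }

    public long length() {
        BasicFileAttributes a = attrs();
        return a == null ? 0L : a.size();
    }
}
```

The false default mirrors the backward-compatibility argument: with caching off every getter re-queries the file system. With caching on, a file that changes after the first lookup keeps reporting the old values until caching is switched off, which is the trade-off the flag makes explicit.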

Of course, I'm sure there are many other possible solutions which may be better - it does not matter as long as this functionality is made available to the many applications that desperately need it.

ACTUAL -
File attribute access is extremely slow to the extent that it cannot be used from Java code for large folders in real-world applications.

Comments
EVALUATION Verified with directories containing 20,000 files on FAT32 and NTFS.
16-02-2009

EVALUATION Some of the issues cited in the description relate to Swing's JFileChooser component rather than java.io.File. In any case, we are already well on the way to providing bulk access to attributes via JSR-203. Once this work is integrated, all the attributes listed by the submitter can be read with a single method call. The requirement to cache attributes may no longer be interesting at that point, but if it is, it will be relatively simple for the submitter to develop their own caching file system that caches the attributes useful to the application and, for the most part, delegates to the default file system. There are a number of existing bugs and RFEs also seeking bulk access to attributes, and we will likely close this one as a duplicate.
19-10-2006
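The bulk access mentioned in the evaluation became the java.nio.file API in Java SE 7. The following minimal sketch (the BulkAttributeScan class is illustrative) shows the two relevant calls. Files.walkFileTree hands each visited entry's BasicFileAttributes to the visitor, so no per-attribute calls are made, and Files.readAttributes reads all basic attributes of one path in a single call. Whether the walk obtains attributes from the directory listing itself or from a separate query per entry is an implementation detail not specified here. Windows-specific flags such as hidden or read-only can be read the same way via DosFileAttributes.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class BulkAttributeScan {
    // Totals regular files, directories (including root) and bytes under root.
    // Attributes arrive with each visitor callback instead of being queried
    // one method at a time.
    static long[] scan(Path root) throws IOException {
        long[] totals = new long[3]; // [files, dirs, bytes]
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                totals[1]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    totals[0]++;
                    totals[2] += attrs.size();
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE; // skip entries that cannot be read
            }
        });
        return totals;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long[] t = scan(root);
        System.out.println(t[0] + " files, " + t[1] + " dirs, " + t[2] + " bytes");
        // All basic attributes of a single path in one call:
        BasicFileAttributes a = Files.readAttributes(root, BasicFileAttributes.class);
        System.out.println("directory: " + a.isDirectory() + ", modified: " + a.lastModifiedTime());
    }
}
```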