JDK-8097381 : [WebView] solar system web page is slow, has rendering problems and throws exceptions
  • Type: Bug
  • Component: javafx
  • Sub-Component: web
  • Affected Version: 8
  • Priority: P2
  • Status: Resolved
  • Resolution: Fixed
  • Submitted: 2013-11-21
  • Updated: 2016-12-21
  • Resolved: 2015-04-28
The Version table provides details related to the release that this issue/RFE will be addressed.

Unresolved : Release in which this issue/RFE will be addressed.
Resolved: Release in which this issue/RFE has been resolved.
Fixed : Release in which this issue/RFE has been fixed. The release containing this fix may be available for download as an Early Access Release or a General Availability Release.

To download the current JDK release, click here.
JDK 8 JDK 9
8u60Fixed 9Fixed
Related Reports
Duplicate :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Relates :  
Description
1) Run WebLauncher and use this URL: http://neography.com/experiment/circles/solarsystem
2) Eventually the web page renders (is it very slow)
3) Resize or click around and you will get exceptions etc.
Comments
Ok, thanks. I've filed it (with a beautiful number): RT-40700
30-04-2015

Yes, I agree that this should be handled on a subsequent bug. And it's fairly low priority compared to this fix. Maybe fix for 9 once we get 8u60 going.
29-04-2015

Hi Jim, Well, I see what you're talking about, thanks for the notes. I can't answer the question if the concerns you raised may have any real effect in practice but not in theory. Looks like this requires some more time to invest in it. Unfortunately, we're quite limited in resources, you know (and the list of critical issues is pending 8u60...). So, if we agree the fix is good enough and it "will still do its job in lots and lots of cases", doesn't it make sense to file another (of less priority) JIRA issue to tweak the intersection job in case of stroke ops? Anyway, as the fix is pushed, we will have to have another JIRA ID if we fix the stroke code further...
29-04-2015

This is kind of late, but I just double checked the latest revision and noticed that there isn't always agreement between the intersections we calculate and the way that stroke works. Most of the operations that stroke use "stroke.apply(g)" which returns false if the stroke is not applicable. Prior to that we did our intersections with "stroke.getPlatformStroke()" which may not be appropriate if the stroke "is not applicable". A common way for the stroke to not be applicable is if the stroke.getPlatformStroke() method returns null. For cases where we "fill and stroke" this isn't an issue because we'd still fill the primitive and the null we get back from that method causes us to do the intersection test with a null stroke - i.e. on the fillable bounds. That is OK so far. But, in some cases the only operation being fired off is a stroke operation - this means we will perform the intersection test on a fillable bounds for an operation that will eventually NOP. That will be sub-optimal, but not wrong. Another way that the stroke could be "not applicable" is if the stroke's paint is null. If the operations is a "stroke only" operation then we may calculate bounds for a strokable version even though it will eventually NOP. If the operation is a "fill and stroke" operation then we will merely overestimate the bounds by a stroke boundary when the stroke part is going to be a NOP. It's a minor consideration, though, and the fix will still do its job in lots and lots of cases. I'm not even sure how likely either of the above conditions can be (is the stroke always defined in practice and the tests are only for theoretical cases?).
28-04-2015

Hi Jim, (Sorry for the delay). I fixed everything you mentioned above. Also, I reverted the change here: http://cr.openjdk.java.net/~ant/RT-34443/webrev.11/modules/web/src/main/java/com/sun/webkit/graphics/WCRenderQueue.java.udiff.html This reveals an NPE (reproduced on google maps) and it seems to be related to the "Outstanding resource locks detected" problem. I'm going to work on it separately (it doesn't correlate with this fix). update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.12 I did some additional testing on Windows/Mac/Linux. I played with a dozen of CSS3 animating (layered) samples which I found across the web. Not all of them rendered perfectly, but they rendered the same way without the fix. So I didn't find regressions, at least. Additionally I did a sanity check with the maps: - http://maps.google.com - http://maps.yandex.com The fix doesn't seem to introduce any new issues.
28-04-2015

Jim, Phil, thank you for the review! changeset: http://hg.openjdk.java.net/openjfx/9-dev/rt/rev/f76507f7c9bd THE END
28-04-2015

PrismImage diffs seem to be missing - maybe there were only white space changes in that file? (Also, I didn't check any of the copyrights, are they all at 2015?) WCGraphicsPrismContext - uses of "WCGraphicsPrismContext.this" can be removed, no? (line 491 at the least, but I didn't look for others) - you modified TextUtilities to return TextRun, but then don't take advantage of that at lines 886,896... - lines 899, 900 - use the local variables shadow and stroke. Note that the stroke variable may be null (intentionally) even though getStroke() is not in which case you'd be doing the wrong intersection if you don't use the local variable (the intersection should still be functionally correct, but not optimal) - line 1635 - if stroke is null, this method doesn't draw anything, but the intersection test may cause it to try to...? (This wouldn't be a bug, but it may miss the optimization in some cases...?) - line 1685 - it looks like nothing may get drawn if stroke is null here too, but I'm not sure how to factor in the shadow. The existing test should be OK, but maybe not optimal. None of these should cause any bugs, but could be cleaned up. I don't particularly need to see another version if all you do is fix those issues, but you might want to have Phil take a second look at the text code.
15-04-2015

Hi Jim, There're new changes: - In WCBufferedContext.init() I swapped the lines with clip and transform. Now transform comes first, then clip. Also, clip is set in the user space coords (unlike the previous version). The reason is that the clip is modified by the current transform and is saved in that form. So, this version doesn't change the logic but simplifies the code. - Added getPixelScale() to WCImage/WCImageImpl/RTImage. - In WCBufferedContext.trIntersectsClip() the image size is multiplied by the pixel scale. Previously it was incorrect. (Also, please note that this case (null clip) is actually unexpected). > trIntersectClip() takes a rectbounds parameter and then mismanages it in several ways. This was actually a miss. I assumed to use the local parameter, of course. Fixed it. Thanks for the catch! > In the shadow code, I don't think you need to union with the original bounds since DropShadow should do that for you. Ok. I use the calculated bounds now directly. > I thought we were removing the RTL code to a separate issue? Ok, removed it. > In the drawString code, wouldn't it make sense to always compute font.getStrike() in the enclosing code? Yes, if you mean that the font strike is created later (on the prism side) anyway and is cached, then this makes sense indeed. Fixed it. > What about scrolling long pages? Do we render the entire thing and then scroll the view of it? WRT a compositing layer. It's rendered entirely, regardless of the scroll view. Then, as I scroll, the layer is not re-rendered but its tiles are reused as is. The layer is not updated until it's triggered. For instance, when I resize the window I see it's repainted (entirely again). Not sue what else can trigger its repaint (may be some DOM modifications). update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.11
31-03-2015

Comments on webrev.10: - Hopefully Phil will review the new drawString bounds code. - I've only looked over the code in WCBufferedContext and WCGraphicsPrismContext in the past couple of iterations. Let me know if there is something else changed that I need to look at. - trIntersectClip() takes a rectbounds parameter and then mismanages it in several ways. First, it transforms TEMP_BOUNDS, assuming that is the same object. Second, it does the intersection testing on the bounds parameter, again assuming that the incoming parameter was modified by the transform() operation even though the bounds reference was not referenced in the transforming code block. This all works because it happens that the incoming bounds should be TEMP_BOUNDS, but it is sloppy coding. Either remove the parameter or use it throughout so that the method stands on its own, making no "unstated" assumptions about its environment. Note that "transform(bounds, TEMP_BOUNDS)" would assume that it is allowed to use TEMP_BOUNDS which is currently OK, but may cause complications if we create new uses that might not want TEMP_BOUNDS modified. On the other hand, "transform(bounds, bounds)" would assume that the incoming bounds was not needed by the caller. As long as we document what happens either could be acceptable. Creating another _BOUNDS variable would be safest and most flexible, but probably overkill for this case as long as we document the side effects of the method. My preference would be to get rid of references to TEMP_BOUNDS in that method and document that the method might modify its incoming bounds variable and then just rely on the caller to find a safe copy to pass in if it needs the value preserved. - In the shadow code, I don't think you need to union with the original bounds since DropShadow should do that for you. - I thought we were removing the RTL code to a separate issue? Nits: - In the drawString code, wouldn't it make sense to always compute font.getStrike() in the enclosing code? We will need it in all cases whether or not we are outline or glyphs or bounds or not so just do it unconditionally and then we have less fumbling with "possibly null values" in the ensuing code. I think you addressed my performance concerns anyway, but I'm curious about the comment you made about tiles being rendered only once. What about scrolling long pages? Do we render the entire thing and then scroll the view of it?
26-03-2015

Hi Phil, > getOutline() could be 10x or 20x slower than getVisualBounds .. Got it. In that case it makes sense to calculate rect bounds of a text for the purpose of an intersection test. Please, have a look if I did that correctly: update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.10 As to the ligature case. Honestly, I don't see how can we break at a ligature... We're getting a String object from WebKit (which since jfx9b52 uses ICU to process text), then using prism text API to 1) create layout; 2) get text run's; 3) create glyphs array and advances array (based on an individual glyph position). So, in case the original string contains a ligature character (say, "ffi" which is encoded by a single code point U+FB03 as found in http://graphemica.com/ffi), may it be splitted into a set of glyphs with that logic? As far as I understand, it should be put into the array as a single glyph, with an appropriate x-position. Do I miss something?
26-03-2015

In response to Jim > In drawString you can see that if "shouldCalcIntersection" is true (line 891) then we end up calling strike.getOutline() (line 898) and later shouldRenderShape() on that outline (line 906). I see. Anyway, getOutline() could be 10x or 20x slower than getVisualBounds .. In response to Anton > (By the way, how can we check the code has problems? So, what the sample text may look like?) Well just try any Arabic. Letters have initial, medial and final forms and they maybe completely different. Or break at a ligature even in Latin - an "ffi" ligature will likely have a smaller advance than the individual glyphs. If you are always breaking at white space this may be mostly a non-issue but I don't know that you do.
25-03-2015

Hi Phil, > Its a bit confusing since String.substring(int) means 'from' but you apparently 'know' somehow this is an entirely RTL string ? Yes, I see it may be confusing. However, in the call I know if the string is RTL or not (it's previously processed by WebKit which provides the parameters of the call). > In general the logic for the advances here doesn't look safe for complex text. Ok, if that's the case, then I will create a separate issue for that if you don't mind, because it doesn't directly relate to this particular fix. (By the way, how can we check the code has problems? So, what the sample text may look like?) Thanks for the review! @Jim, Thanks. Just a reminder, that a tile (layer) is rendered only once (or at least, quite rarely) and then reused.
25-03-2015

> I didn't see where the outline was retrieved and used only for bounds, but In drawString you can see that if "shouldCalcIntersection" is true (line 891) then we end up calling strike.getOutline() (line 898) and later shouldRenderShape() on that outline (line 906). shouldRenderShape() does a bounds test on the shape against the clip to see if we should even bother with this render call. It's intended to avoid creating a texture for the tile if none of the renderings intersect it. A very common case is one tile (though at 512x512 that would be a small to modest size webview) and so the first rendering call would trigger us to create the texture and then we'd avoid all intersection tests in the future because shouldCalcIntersection would return false once we have a texture. But if we have a wall of text and one tile on the webview that isn't rendered (or only rendered by some rendering call fairly far down the list of rendering calls) then we'd do this intersection for every string in the HTML against that unrendered tile's bounds. So, how expensive is strike.getOutline() compared to requesting some bounds for the text and doing the intersection call on that bounds?
25-03-2015

> It's now: > width(to) > In RTL mode, this is an offset from the left to the begging of the drawn substring. > The modified version is simply shorter. Isn't this correct (Phil?) Its a bit confusing since String.substring(int) means 'from' but you apparently 'know' somehow this is an entirely RTL string ? If you are wrong on that it will not layout at all as you expect. In general the logic for the advances here doesn't look safe for complex text. It's apparently relying on advance(0->length) being the same as advance(0->from) + advance(from ->length) ie it assumes you can add partial advances. In the degenerate case this is the same as adding the advances of the individual chars which it should be apparent is not correct. > I'll leave it to Phil to judge if that is a reasonably fast way to do bounds checking on text. I didn't see where the outline was retrieved and used only for bounds, but visual bounds pulls info directly from the font (via platform APIs, or via Java code that opens the font itself) and is fairly fast these days. If you look at the source of javafx.text.Text you'll see we use for its visual bounds support we use getTextLayout().getVisualBounds(..) in preference to getShape().getBounds(); And if you just care about advances use getBounds()/getLogicalBounds()
24-03-2015

Hi Jim, > Shape.accumulate transforms each point in the shape and then accumulates it into the bounds. This produces a tighter bounding box than accumulating the bounds and then transforming. Correct. I was wrong about it. > It can get complicated pretty fast, but there are probably some common cases that are worth targeting. Well, your suggestions for small perf tweaks sound reasonable. I implemented them. Thanks! > It shouldn't cause an "uninitialized" error if you make sure that every nested if clause and else clause initializes every variable. Indeed, this is interesting. Honestly, I never used it that way. I agree with that this is rather a style thing, but I put it into the code as a sample :) > I'll leave it to Phil to judge if that is a reasonably fast way to do bounds checking on text. Ok. The text performance is really important for WebView. (Though the intersection test is specific and is not supposed to perform continuously.) So far, I have the following: update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.9
24-03-2015

Hi Anton, I'll respond to a couple of questions first, and then review the new version later... > > - wrt transforms, my gut feel is that if you don't have a shadow then doing the accumulation with the transform > > would be more accurate than accumulating and then transforming. > > Well, the question then is if the rect outline of a shape is transformed to a rect outline of the transformed shape > with Affine transform? It is, if I'm not mistaken. So, could you please clarify what do you mean by "more accurate"? > The floating point arithmetics? (So far, I left it unchanged...). Shape.accumulate transforms each point in the shape and then accumulates it into the bounds. This produces a tighter bounding box than accumulating the bounds and then transforming. Consider the bounds of a transformed circle. If you rotate the circle the bounds never change. But, if you compute its bounds you get a rectangle. If you rotate that rectangle then its corners will extend beyond the original rectangle and computing new bounds from that rotated rectangle will grow the region. > > - wrt rectangles it would be nice to optimize the case of rectangular primitives with no shadow as accumulating > > over a shape, even a rectangular one, would be much slower > > Sorry, didn't understand your idea. Do you suggest to "manually" calculate stroked outline for a rectangular shape (like I did before)? It can get complicated pretty fast, but there are probably some common cases that are worth targeting. We need to find a good level of tradeoff of: - how much new code we add vs. how much time it saves - how fast the code runs vs. how accurately it computes bounds - (keeping in mind that this code is only executed until the first time we trigger a buffer allocation) Right now you are treating all rectangular primitives as shapes and doing the long calculations on them. That does reduce the code, but it comes at the expense of speed. Unstroked rectangles should be pretty quick to compute bounds (even if transformed) and do an intersection test as per your first few revs. Stroked rectangles can be simply padded (since they have no angles more than 90 degrees and so no stroke decorations can cause their bounds to grow any worse than that), transformed and intersected. Rounded rects and ellipses might be more accurate if we treated them like a shape, but the added accuracy would come at the expense of adding more specific code for them and adding compute time. I think it is probably a good tradeoff to just treat them like rectangles, especially if we optimize unshadowed rectangles as per the above. Right now ellipses and round rects are turned into rectangle shapes, which costs the compute time of processing a shape, but without the added benefit of knowing how to cut their corners out of the rotated bounds if they are rotated. My gut feel for the best "bang for the buck" would be: rectIntersects() { if (shadow) return shapeIntersects(...); if (stroke) pad bounds by appropriate amount return rect bounds intersect clip/texture bounds } shapeintersects() { Transform accumTX = (shadow) ? IDENTITY : shapeTX; bounds = shape/stroke.accumulate(shape, accumTX) if (shadow) return shadowintersects(shape, shapeTX); else return bounds intersect clip.texture bounds; } We end up turning an ellipse into a rect which isn't optimal bounds, but it simplifies the code. We use the transform in the shape/stroke.accumulate phase if we don't have a shadow which makes it more optimal and the only code complexity is having to do a ?: to figure out which transform to use. The shadow case still does the accumulate and then transform, and we could actually perform computations which let us transform first and then apply the shadow to the transformed bounds, but the added complexity for the corner case of shadows is probably not worth it. > > move the assignments to "null" into an "else" clause and leave the original declarations of the variables > > without any initialization > > That causes "the variable might not have been initialized" error... Keep in mind that I don't think this is a flaw here, I'm just describing a technique that I use in similar situations and why I sometimes use it. It shouldn't cause an "uninitialized" error if you make sure that every nested if clause and else clause initializes every variable. I sometimes use this technique to make sure that I've consciously done initializations in all cases. If you initialize the variables to a default value and then rely on nested if/else clauses to change them when the defaults are wrong then you run the risk of forgetting a case and leaving them default values. Declaring them uninitialized and instead making sure all cases assign a value means that every initial value case was consciously processed and the exact values that each case uses can be seen right next to the conditions that trigger it. It's a style thing mostly, but it can improve maintainability when used in the right situations. > > drawString, line 919 - you always do the bounds check on the outline? Isn't that overkill? > > Seems like I should have used the shouldRenderShape version (I missed it, thanks for the catch!). > If that is what you meant, then I fixed it. No, I think I was misreading it and I thought that Stroke.getOutline() was returning something about stroked text, but now I see that it is simply computing the fillable outline of the text. But, computing that outline path isn't very simple. In the most common case of rendering the text from glyphs it is a lot of work that we wouldn't ordinarily do since nothing about rendering text would compute that large path outline. Even if it is a case where we would render the text from outlines (typically very large letters), then we may end up computing the path twice - once here for bounds testing and then later on for rendering. I'll leave it to Phil to judge if that is a reasonably fast way to do bounds checking on text. My fear is that if we have a page with a lot of text and some tiles that do not get rendered then we will do a lot of Shape production which could cost a bit of performance when a simple bounds test based on computing the ?layout? bounds of all of the glyphs would suffice.
23-03-2015

Hi Jim, > printStackTrace() is fine I suppose if that is "this should never happen" code, and it looks like that is the case. That's it. > I like the new system, it looks good and functionally correct. Good to hear that, thanks! > - deriveWithShadow also transforms, which isn't reflected in its name. I agree. I thought it would look better if I eliminate all the methods called only once at all. I put step-wise comments instead. > - wrt transforms, my gut feel is that if you don't have a shadow then doing the accumulation with the transform would be more accurate than accumulating and then transforming. Well, the question then is if the rect outline of a shape is transformed to a rect outline of the transformed shape with Affine transform? It is, if I'm not mistaken. So, could you please clarify what do you mean by "more accurate"? The floating point arithmetics? (So far, I left it unchanged...). > - wrt rectangles it would be nice to optimize the case of rectangular primitives with no shadow as accumulating over a shape, even a rectangular one, would be much slower Sorry, didn't understand your idea. Do you suggest to "manually" calculate stroked outline for a rectangular shape (like I did before)? > Overrides for shouldRender*() in WCGPC.java - I probably wouldn't have bothered and just supplied the null arguments at the call sites Ok, I did that. I'm hesitating to say which one is better in this case. So, I simply reduced the code. > drawString calls, are the final shadow variables necessary? Or is this a case of the fact that they are initially declared as null and then filled in later? That is. The "final" keyword for the placeholder declaration is unnecessary (due to it's considered as effectively final). But I'd have it for evidence. > move the assignments to "null" into an "else" clause and leave the original declarations of the variables without any initialization That causes "the variable might not have been initialized" error... > drawString - "res" variable name Renamed. > drawString, line 919 - you always do the bounds check on the outline? Isn't that overkill? Seems like I should have used the shouldRenderShape version (I missed it, thanks for the catch!). If that is what you meant, then I fixed it. > drawString line 965-980 - is there a reason we don't just rely on the next call to do the intersections? Oh, not a single reason! Thanks! Fixed. > drawString handling of from, to - substring(to) grabs all of the characters that are in the source string after the last char to be rendered. I'm not sure how that can be right It was: width(from) - width(from, to) It's now: width(to) In RTL mode, this is an offset from the left to the begging of the drawn substring. The modified version is simply shorter. Isn't this correct (Phil?) update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.8
23-03-2015

Hi Anton, printStackTrace() is fine I suppose if that is "this should never happen" code, and it looks like that is the case. I like the new system, it looks good and functionally correct. Some suggestions: - deriveWithShadow also transforms, which isn't reflected in its name. For a bit I was wondering where the transform was util I spotted it in the shadow method. - wrt transforms, my gut feel is that if you don't have a shadow then doing the accumulation with the transform would be more accurate than accumulating and then transforming. On the other hand, it could be slower if the shapes optimize iteration with an identity transform. - wrt rectangles it would be nice to optimize the case of rectangular primitives with no shadow as accumulating over a shape, even a rectangular one, would be much slower Overrides for shouldRender*() in WCGPC.java - I probably wouldn't have bothered and just supplied the null arguments at the call sites, but that's just a style difference. drawString calls, are the final shadow variables necessary? I thought the compiler would allow "effectively final" values in inner classes now. I'll note that you didn't shadow the gl or shadow or stroke variables, but they still get passed to the inner class, doesn't it? Or is this a case of the fact that they are initially declared as null and then filled in later? (One way to change it to be more "final-like" would be to move the assignments to "null" into an "else" clause and leave the original declarations of the variables without any initialization...?) drawString - "res" variable name - I wasn't sure what that stood for. Maybe "intNeeded" or "testNeeded"? drawString, line 919 - you always do the bounds check on the outline? Isn't that overkill? For one thing it requires us to calculate an outline for every OOB string operation, and it is a conservative test. drawString line 965-980 - is there a reason we don't just rely on the next call to do the intersections? drawString handling of from, to - substring(to) grabs all of the characters that are in the source string after the last char to be rendered. I'm not sure how that can be right, but I'll leave it to Phil?
20-03-2015

Hi Jim, thanks again for your valuable notes! > moving the code that calls the intersection test into the super class. Good idea. Did that. This eliminated duplicating the strings metrics code, at least. > if other sub-classes can do a reasonable job of testing the intersection with the information that they have available? Currently, there's only one sub-class. However, in case there will be more, they will reuse the same pattern. > we do need to deal with drop shadow Addressed with WCBufferedContext.deriveWithShadow(). Please, check if I got the logic right. > line 129: The transform is passed in as a parameter, no need to call getTxNoClone() here. > line 150: Why use toRectBounds() instead of doing the intersection test on all of the .x.y.w.h members as in line 114? Also, what does line 150 do if there is no clip? Perhaps 112->117 could be factored into a helper method? Fixed. > line 239: this works for lines with round caps I think, but butt caps would have a smaller fringe and square caps have a larger fringe so we need to deal with that. > line 301: strokeArc could be off for similar reasons to drawLine since it involves end caps. I solved that by switching to Shape-based calculations for all the primitives. This simplified the code and... the task. > line 248: I'm not familiar with the drawPattern() call, but if it constrains its output to the destRect and the transform in the arguments applies only to the pattern, then this is correct. Did I read that right? Yes, I think so. It only calls g.fillRect(destRect), so all the rest applies to the source and paint. > line 293: strokePath should not have a null path? Now it's in the original if-statement. > line 307: strokeRect - doesn't that include the stroke? It does. Fixed. > 2 drawString methods - should the work to compute the bounds be only done if graphics is null? Right. I put extra calculations to an if-statement. > WCGraphicsPrismContext line 94: Should the baseTransform field just be declared as Affine3D? This method is called twice with a parameter of declared BaseTransform type. So, this simply saves us two casts. > WCRenderQueue - is this change just for debugging here, or do you think we should do this in the production builds as well? Well, I indeed thought about leaving it in production. This would make bug reports (if any) contain a stack trace, simplifying initial evaluation. This is a common practice, isn't it? (I made a search in jfx graphics and found 117 matches of printStackTrace...) @Phil Thanks in advance. I actually left the original string metrics code virtually untouched, except the following (not related to the fix). In drawString(WCFont f, String str, ...): TextLayout layout = TextUtilities.createLayout( str.substring(from, to), f.getPlatformFont()); - if (rtl) { - x += (getLayoutWidth(str.substring(from), f.getPlatformFont()) - - layout.getBounds().getWidth()); is replaced with: + if (rtl) { + if (to < str.length()) { + offset = TextUtilities.getLayoutWidth(str.substring(to), font); + } That is, in right-to-left, the following: width(from) - width(from, to) is replaced with: width(to) because the latter is a shortened equivalent of the former., AFAICS. update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.7
19-03-2015

I also would like Phil to look at the bounds checking in the drawString methods and determine if it is duplicating work with the super.drawString() call... Phil?
17-03-2015

On the widget.getSize() methods, all I can say is "ouch". I'm willing to trust you on that one as I have a lot of other reviews to get to... Unless someone else on the watch list wants to look at that code (it's just a few lines)...?
17-03-2015

Hi Anton, WRT pushing the methods into the superclass - the intersection tests could be left in the sub-class while moving the code that calls the intersection test into the super class. The only question is if other sub-classes can do a reasonable job of testing the intersection with the information that they have available? Most all of the logic could be moved up save for lines 112->117 and just that method could be left in the sub-class (not sure if the "baseGraphics==null" test makes sense in the super class either, though). Does that simplify anything or open up any other optimizations? WCBufferedContext.java: Yes, we do need to deal with drop shadow - I just noticed that on this review as well. It will expand the bounds based on union with a "grown and offset" shadow (you can trace into the DropShadow object (the one in Decora, not the one in the FX API) from the shadow field and see how it computes bounds). line 129: The transform is passed in as a parameter, no need to call getTxNoClone() here. line 150: Why use toRectBounds() instead of doing the intersection test on all of the .x.y.w.h members as in line 114? Also, what does line 150 do if there is no clip? Perhaps 112->117 could be factored into a helper method? line 239: this works for lines with round caps I think, but butt caps would have a smaller fringe and square caps have a larger fringe so we need to deal with that. line 248: I'm not familiar with the drawPattern() call, but if it constrains its output to the destRect and the transform in the arguments applies only to the pattern, then this is correct. Did I read that right? line 293: strokePath should not have a null path? line 301: strokeArc could be off for similar reasons to drawLine since it involves endcaps. line 307: strokeRect - doesn't that include the stroke? 2 drawString methods - should the work to compute the bounds be only done if graphics is null? WCGraphicsPrismContext line 94: Should the baseTransform field just be declared as Affine3D? WCRenderQueue - is this change just for debugging here, or do you think we should do this in the production builds as well? Sorry about the mixup on the default tile size. The sdiffs window was so wide that I only saw the "old" version and didn't see the new value until I remembered to scroll right. 512 looks fine.
17-03-2015

Hi Jim, thanks for the thorough review! > line 88: Should we also do an intersection test? Or are we only doing the intersection tests to avoid creating a graphics? I think so. I expect that in an average web application empty tiles occupy much less space of a layer than non-empty tiles. Thus, we won't save a rendering with an intersection test in majority of calls. Another thing is that once an RTImage got its graphics, the other time it is being updated the chances it will (again) pass the intersection test are high. And so, I inserted a "return" in the helper methods when graphics is not null, following your advice. > lines 95-100: The cascading ifs are unnecessary, or they are misplaced. They are unnecessary, indeed. Thanks for the catch. I fixed it. > line 154: Same with all of the other draw*() methods - take strokeWidth into account. Done. > line 163: (Note that focus ring uses a different stroke then the regular draw*() methods). Done. > Is it worth moving all of these "intersects override" calls up into the parent class where the work is done This would save us a number of the new wrapper get*NoClone() methods. However, there's a dependency on WCBufferedContext by its "img" private field. I could set the clip based on the img size, which is available in super, however this doesn't seem worth playing. So, I'd leave those methods where they are now, if you don't mind. > line 221: would need a variant of pathIntersects that deals with stroke. Also, see BasicStroke.accumulateStrokeBounds(...)... Done. Hope I did that right. > line 94 - state.setTransform() copies the values so you don't need to protect it with a copy of the object here. Fixed. > WCRenderQueue.java: is this change necessary? This log actually masks exceptions for the whole rendering process.. It's easy to forget about it and then an exception is swallowed... (I had one). So, I think printing it out is preferable here for the sake of easier maintenance. > line 353: this is why I always wrap a case statement in braces if I create local variables... Right, I added braces. > You seem to have gone with a max of 2048 for the buffer size now...? Why did you think it? I didn't change the limit. We have 512 max: -static const int s_maximumAllowedImageBufferDimension = 2048; +static const int s_maximumAllowedImageBufferDimension = 512; > I didn't understand the widget.getSize() methods enough to review them. Is there someone else that can look at those? Well, it was me who primarily wrote the Widgets code (way back :) I'm afraid I'm the only one left. So, we can ask virtually anyone to look at it... Shortly, the html input elements (backed by FX controls) are created and managed (css/layout/sync) on JavaFX App thread. Then, on Render thread they are rendered in pre-configured size & state. The getWidgetSize() methods, which I've added, are invoked from the rendering code and so on Render thread where the widgets (FX controls) have their final size prior to painting. The widgets (as well as other Object graphics resources) are referenced by a dedicated Ref object. A Ref instance is put into a special ref_queue when an appropriate render command is put into the native render queue. A Ref instance is referenced by an integer ID from the render_queue. On Render thread a Ref instance is extracted from the ref_queue by its ID and then casted to necessary sub type (WCPath, FormControlRef, etc etc) and that graphics object is eventually used for rendering. When a render cycle is done the refs are released. (We're getting right into the cycle.) Also, the ensureNotDefault() method is used to make sure a particular method is not called in a "default theme instance". This instance is created and used in a page-less context. This doesn't take place for widgets (which always connect with a certain page), that's why I call the method. Hope I shed some light on it... There's one question left still. Some render primitives use DropShadow. Do we need to count it and what's the best way to do that?.. update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.6
17-03-2015

Looking good: WCBufferedContext.java: line 88: Should we also do an intersection test? Or are we only doing the intersection tests to avoid creating a graphics? If the latter, then shouldn't the first line in the intersects() helper method test if we already have a graphics and just return true? If we aren't worried about the overhead of the rendering primitive, just about avoiding creating a graphics then we shouldn't do all of the work of the intersects() method once we have a graphics. What is the best tradeoff here? lines 95-100: The cascading ifs are unnecessary, or they are misplaced. If we are going to test both conditions, we should do it before any work is done at all. If they stay where they are at line 95, then just use "if (clip != null) return <intersects clip> else if (img != null) return <intersects img else return false. No need to test both, then test clip again like that. Also, if img is null, then how will we get a graphics? Shouldn't we have "if (img == null) return false" at the very top? Then later we just have "return (clip != null) ? <intersects clip> : <intersects img>"? line 107: drawRect() definitely needs to take strokewidth into account. Hmm... It looks like some of the draw*() calls are the equivalent of "fill+stroke" so they may conditionally need to take stroke into account if one is set. line 154: Same with all of the other draw*() methods - take strokeWidth into account. line 163: (Note that focus ring uses a different stroke then the regular draw*() methods). (General Note: Is it worth moving all of these "intersects override" calls up into the parent class where the work is done in case the work done in the parent method more easily lends itself to computing bounds?) line 213: This may be a case where we say "if there is already a graphics, do we really need to worry about bounds?". Also, computing bounds from the transformed iterator takes less memory than copying the path to transform it. We may have a utility method somewhere that does that (Shape.accumulate(...)?)... line 221: would need a variant of pathIntersects that deals with stroke. Also, see BasicStroke.accumulateStrokeBounds(...)... I'll leave it to Phil to see if the drawString() overrides are optimal... WCGraphicsPrismContext.java: line 94 - state.setTransform() copies the values so you don't need to protect it with a copy of the object here. WCRenderQueue.java: is this change necessary? GraphicsDecoder.java: line 353: this is why I always wrap a case statement in braces if I create local variables... You seem to have gone with a max of 2048 for the buffer size now...? I didn't understand the widget.getSize() methods enough to review them. Is there someone else that can look at those?
13-03-2015

I found and fixed 3 flaws in webrev.4: 1. A cast to Font is replaced with a cast to PGFont in WCBufferedContext.drawString(). 2. A null clip is checked in WCBufferedContext.intersects(). 3. RULE_NONZERO is added as the default rule for path in GraphicsContextJava::drawConvexPolygon() (it's read in GraphicsDecoder.java). I plan to do more testing of the fix when we come to agreement with its final version. update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.5
13-03-2015

Hi Jim! (I'm sorry for the delay, I had to switch to some other critical stuff and now I'm able to return to this.) Please, look at the new webrev: update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.4 Here, the changes are: - All the rest of the graphics primitives are overridden in WCBufferedContext. - A dedicated "intersects" method and a cached RectBounds instance are created. (The ceil/floor problem has gone.) - In drawRoundedRect() rect bounds are taken into account. - The drawPolygon() method now takes WCPath (the native side and GraphicsDecoder are modified appropriately). - RenderTheme.getWidgetSize() is added. I have a concern about the drawString methods where I calculate metrics. I had to duplicate some code from their super implementation. However, I'm worrying about performance. Do you think may it have a noticeable impact? To avoid that we could overload the methods in super class with additional parameters (metrics). However, this approach doesn't look nice to me... Also, what do you think about these two things: 1) Do we need to take into account line width (in the draw ops where it takes place); 2) Do we need to take into account effects when calculating bounds. With regard to your questions: > Even with a circle, though, what are the chances that a bounds test may intersect, but a shape test would not? The chances are higher by 4/PI (~1.27), right? :) However, taking into account "the complexity of a circle intersection test", I yet switched to rect bounds. > > No we have a separate queue for every RTImage. > That might be a potential optimization later, That makes sense, thanks. I'll file an issue. Also, there were two more issues discussed (far) above. 1) The context menu appears only once when hovering over planet names on the left. 2) NPE is thrown when resizing the window with the fix. The first one has gone with the updated WebKit (RT-36726). The second is still there (I remember I have to file it).
11-03-2015

> I thought abou that too. However, a rect may be rounded to a circle, just like in the demo, and I yet decided to count the arcs. Well, I'll try not to do that and see if it changes the result. Even with a circle, though, what are the chances that a bounds test may intersect, but a shape test would not? (Compared to the complexity of a circle intersection test...) >> Another option would be to cache an untransformed version of the clip rect and we could go straight to an intersects() test. >> The problem with that is that the transform might change a lot and we'd be constantly invalidating the untransformed clip and it would add a lot of complexity for no savings. > > Yes, we would need to track the transform changes then. And if the transform changes a lot (translates perhaps?) then it may be a net loss as well, so I wouldn't be too aggressive about this idea without more hard data on how often it would save us work. >> Would it save time to do the bounds intersections at the time we queue things up? > > I wouldn't do that by many reasons. The native counterpart of the rendering queue impl is kept as simple as possible. >> not complicating the native glue code. This makes the logic more compact, That makes sense. > No we have a separate queue for every RTImage. That might be a potential optimization later, then - to share a queue for multiple tiles of the same layer. Although I don't have a good handle on how much rendering tends to be done to layers (or tiled layers)...
16-02-2015

Hi Jim, yeah the vram size is reduced by ~5 times for this test! On Windows I have the following now: [java] D3D Vram Pool: 42��010��006 used (7,8%), 67��108��864 target (12,5%), 536��870��912 max [java] 83 total resources being managed [java] average resource age is 120,6 frames [java] 0 resources at maximum supported age (0,0%) [java] 58 resources marked permanent (69,9%) [java] 0 resources have had mismatched locks (0,0%) [java] 0 resources locked (0,0%) [java] 65 resources contain interesting data (78,3%) [java] 0 resources disappeared (0,0%) That's the 40Mb you're talking about. > why do you clone the clip in getClipRect() rather than provide a getClipRectNoClone()? I suppose that was done because originally I modified the clip locally, but then improved the code, though forgot to fix the getter. Sure, there's no need to clone it unless we need a clone... > In the round rect case I would just do the intersection test on the bounds of the rectangle and ignore the rounded corners. I thought abou that too. However, a rect may be rounded to a circle, just like in the demo, and I yet decided to count the arcs. Well, I'll try not to do that and see if it changes the result. > "ceil(w)" isn't the right operation, you need "ceil(x+w)-floor(x)" Oh, thanks! I've missed it, not being a 2D expert =) > I would move the tests into a helper function Ok, I'll do that. > make the rb variable a static or instance-specific temp field Ok. > Another option would be to cache an untransformed version of the clip rect and we could go straight to an intersects() test. The problem with that is that the transform might change a lot and we'd be constantly invalidating the untransformed clip and it would add a lot of complexity for no savings. Yes, we would need to track the transform changes then. > Would it save time to do the bounds intersections at the time we queue things up? I wouldn't do that by many reasons. The native counterpart of the rendering queue impl is kept as simple as possible. Basicly, we either have the optimization on the WebKit side, or on the java side where possible, not complicating the native glue code. This makes the logic more compact, having the target platform (jfx/prism) stuff close at hand. This way it's way easier to develop and support. Also, at the time we populate the queue, there're only op codes, and no clip as a value. So, we would need to do a kind of intermediate decoding for a number of ops, have them cached somewhere and all the stuff. This would complicate the task a lot for us. So, the design we have currently doesn't assume intermediate optimizations... > Or do we save resources by sharing the same queue for every tile No we have a separate queue for every RTImage. The whole page (or a dirty rect area of it) is painted by WebKit in a single pass. That includes rendering all the layers/tiles (please, don't forget that a layer is rendered virtually once, and then only reused). So, every tile rendering is packed into a separate sub-queue (a part of the main queue). This is pretty optimized in terms of having the whole queue as small as possible. (I don't have an updated webrev yet. I'll provide it asap.)
16-02-2015

Another option would be to cache an untransformed version of the clip rect and we could go straight to an intersects() test. The problem with that is that the transform might change a lot and we'd be constantly invalidating the untransformed clip and it would add a lot of complexity for no savings. Also, do we queue up rendering commands for each tile? Would it save time to do the bounds intersections at the time we queue things up so that empty tiles would simply have empty queues to process? Or do we save resources by sharing the same queue for every tile in which case the bounds testing can't be done until we are actually rendering a specific tile?
13-02-2015

Hi Anton, so the figure of 168Mb was on retina!? That is even more encouraging! I thought that was non-retina. This means that it should be around 40Mb on non-retina which is not too shabby (compared to what they are asking the code to do). WRT the code to do the intersections, why do you clone the clip in getClipRect() rather than provide a getClipRectNoClone()? Is that to avoid leaking a reference that could be modified? It's odd because we already leak the transform in getTransformNoClone() in the same class so we either trust the callers or we don't. If we are really concerned about keeping the clip safe then we could instead provide intersectsClipRect(Rectangle) and intersectsClipRect(xywh) methods. In the round rect case I would just do the intersection test on the bounds of the rectangle and ignore the rounded corners. The number of tiles saved because a rounded rect narrowly misses the corner of a tile is probably small enough not to be worth the performance difference. "ceil(w)" isn't the right operation, you need "ceil(x+w)-floor(x)". Think of x=0.9 w=0.2. You should end up with a rectangle at x=0,w=2, but you get w=1 with a simple "ceil(w)". I was going to ask why you didn't use the Rectangle.intersects() method, but I see that someone removed it when we imported the code from Java2D. D'oh! I would move the tests into a helper function so that we can do performance work on it later. Something like: boolean renderingIntersects(float x,y,w,h) { RectBounds rb = new RectBounds(x, y, x+w, y+h); // Or reuse temp var? BaseBounds txb = getTransformNoClone().transform(rb, rb); Rectangle clip = getClipRect[NoClone?](); return txb.intersects(clip.x, clip.y, clip.x+clip.width, clip.y+clip.height); } Later we can try to do something faster if the transform is only a translate, etc. but if we make the rb variable a static or instance-specific temp field then this shouldn't take a lot of time since the transform(bounds, bounds) methods are already pretty optimal for the various simplified transform types.
13-02-2015

Here is the update: http://cr.openjdk.java.net/~ant/RT-34443/webrev.3 In WCBufferedContext.java the following primitives are currently covered: clearRect, drawRect, drawImage, fillRoundedRect, fillPath, fillRect. In filllRoundedRect "arcw" and "arch" are taken as average values just like in the super method. In RTImage.java fillQuad() is used, following your advice. Statistics on my Retina MacBook Pro: [java] ES2 Vram Pool: 165,208,056 used (30.8%), 174,854,892 target (32.6%), 536,870,912 max [java] 100 total resources being managed [java] average resource age is 53.1 frames [java] 0 resources at maximum supported age (0.0%) [java] 75 resources marked permanent (75.0%) [java] 0 resources have had mismatched locks (0.0%) [java] 0 resources locked (0.0%) [java] 79 resources contain interesting data (79.0%) [java] 0 resources disappeared (0.0%)
13-02-2015

I used the same tile limit - 512x512. So far I don't have enough info about the other mapper implementations. So, can't answer your question. I'll try to dig something about it. Let me prepare a draft fix (I need to clean it a little bit). I won't override all the primitives, but just a few to have an impression of what it looks like.
12-02-2015

That is great information to have finally Anton! Things are starting to make sense. What tile size did you use for the 168Mb result? If they are going to render everything into every tile then it does make sense. On the other hand, you mentioned that there were more interesting tile mappers available that we weren't using. Do those have this optimization already applied? Getting the bounds code right shouldn't be too hard, but it can be tricky under some cases so I'd have to look at your mechanism and see if I can spot any issues, but, yes, I think that this is an interesting and important optimization to pursue for this test case (and though I wonder how common these conditions are - I have to admit that other web renderers seem to be more efficient on this test case as if they do use this optimization).
11-02-2015

>> how is creating a temp texture and having a finally clause any simpler than simply replacing that middle block with setColor(transparent), fillQuad(dxy12)? Sure, I'll do that. As to the bg fill optimisation you suggested. That can be done, indeed. And I went into thorough analysing of the rendering log of the demo to understand what exact graphics primitives are painted into tiles. What I found was quite surprising. 1) I was wrong about the bg colour. WebKit doesn't fill a tile with bg colour if the layer has transparent bg. That's the case for the layers behind the planet orbits. The layers have the visual stuff (orbit + planet + ring + etc.) and a text painted onto it, but no bg fill eventually happens in RTImage. 2) What is actually rendered into a tile is the layer content (the planet stuff) itself. And that was total confusion that WebKit rendered the content into _every_ tile. First, I was thinking if that is a bug in our rendering queue, but then I found out WebKit was rendering it in its texture mapper implementation. So, I tried to guess what was the logic behind it. When a layer is splitter into tiles, the visual content may cover a single tile or a number of tiles or all of them. Also, the content may have quite complicated geometry. So, when a tile is painted, it may be a problem to figure out what piece of the content should be rendered onto it, and to render the only piece (I suppose there're 2D algorithms which solve the problem with possible max efficiency). What WebKit does (with its simple TextrueMapperTiledBackingStore) is probably a straightforward solution - it paints the whole content at every tile. Then, applying that to the solar demo Each layer has content that is covered with maximum 3 to 9 tiles of 512x512 size. All the rest remains empty (except the tile with text). So, when WebKit renders the content into a tile we may check if the primitive being rendered has real intersection with the tile bounds in the target graphics coordinate space. In the demo, the majority of the primitives (as I can see from the log) is limited to: fillRoundedRect and drawRect. So, I overrode the two appropriate methods in WCBufferedContext (which is tight to RTImage). Then in the methods, I applied the current transform to the primitive bounds and checked if the result intersected the RTImage bounds (0, 0, w, h). In case it didn't, the method simply returned not causing the associated RTImage to create graphics and a texture. I ran the original demo and it flew! 168Mb vram with this change (against 800Mb vram with the original fix). So, this was just a try. If the idea is worth implementing, then we will need to do the same with every primitive from the Graphics. What do you think, Jim?
11-02-2015

Did you look at using fillQuad instead to avoid the texture creation entirely? One thing to note about the webrev.2 fix, you don't need to use 0,0,0,0 for the source coordinates or create a special call to drawTexture for that case. A 1x1 transparent texture is an infinite field of transparency for any set of source coordinates. On the other hand, the "texture" argument to the drawTexture call is different so it may not simplify anything to try to fold together the two calls to drawTexture in that case. As written, how is creating a temp texture and having a finally clause any simpler than simply replacing that middle block with setColor(transparent), fillQuad(dxy12)? With respect to detecting the single color tiles - we detect something similar in Canvas to reset the render queue. If you wrap fillRect() and detect the case: - if graphics has not been created yet - and xywh cover the tile/layer - then save the color aside and return without creating anything - and, when lazily creating a texture, if there is a color saved aside, fill the newly created texture with that color before returning - (TBD - if someone can resize a layer that might have a saved bg fill, then we may need to save the original dimensions of the fill and/or trigger its lazy creation depending on the new size - but I'm guessing that these layers are never resized?) - In the layer compositing method, if there is no texture, but a saved color, fillQuad on the destination with that color instead of drawTexture. If all they do is fillRect(xywh) on the texture then we create nothing.
05-02-2015

Jim, thanks for all the details. > the same technique (1x1 temp texture) can be used for a tile even if it has been filled with a non-transparent bg color if we can detect that case Right, but the problem is that we can't easily detect that... WebKit has this knowledge. Even more, it updates a layer content stepwise, including the phase: "filling the background". However, that logic is virtually not accessible for us. It's not a part of the abstraction layer we use. Theoretically, it's possible to hack it, but that is what we are better to avoid (we should do our best to stay in the bounds of the API entry points). So, first I'd consider switching to more advanced TEXTURE_MAPPER backends and only if that is impossible, I'd think of hacking the code we use currently... I've updated the fix to use a 1x1 temp texture for the non-SRC_OVER case: http://cr.openjdk.java.net/~ant/RT-34443/webrev.2
04-02-2015

Oh, and, setColor(bg)/fillQuad() works for the "tile is just bg color" case even without having to worry about CLAMP modes, and an easy way to reduce the impact on the code is to just do this in the if statement at the top of the function: if (txt == null) { if (fullBGColor) { g.setColor(fullBGColor); } else if (mode != SRC_OVER) { g.setColor(Transparent); } else { // transparent and SRC_OVER return; } g.fillQuad(dxy12); return; } That works without creating any mechanisms for temp textures for either the empty tile or the solid tile cases...
03-02-2015

Technically, the same technique (1x1 temp texture) can be used for a tile even if it has been filled with a non-transparent bg color if we can detect that case and optimize it out (i.e. make texture allocation lazy in that case like we do now for unrendered tiles). Update - a 1x1 temp texture can be used for non-transparent colors, but you must specify CLAMP_TO_EDGE which requires a cached texture (RTT does not support CLAMP_TO_EDGE) so it gets tricky (to avoid filling the cache with duplicate copies, you would need to use the same "Image" object to get the cached texture every time and you would have to flush the vertex buffer if you decide to update the image with a new bg color in the same frame).
03-02-2015

For the non-SRC_OVER fix in webrev.1 - since it is all transparent, you can create a 1x1 texture since it represents an infinite field of transparency (whether you stretch it or just sample outside the bounds since CLAMP_NOT_NEEDED, CLAMP_TO_EDGE, and CLAMP_TO_ZERO all do the same thing in the case of a totally transparent texture regardless of its size). Also, setColor(Transparent)/filllQuad() can also be used instead, but that may impact the code more than just substituting a 1x1 empty texture for the existing graphics calls. Arguably a static 1x1 texture could be used for all non-SRC_OVER empty tile cases, but you'd have to deal with lock/isSurfaceLost/unlock on it or make it permanent. The resource usage is so minimal that I'm not sure it matters either way, though, but if there was a test case that hammered on this we might be kinder to the texture allocator and possible fragmentation if we shared an empty tiny texture.
03-02-2015

Jim, ok, when an empty texture (with zero alpha) is rendered onto a target, it will (can) modify the target unless the mode is SRC_OVER. So, we can't simply ignore it, you're right. By the way, I've checked if the method is called with null texture and the mode different from SRC_OVER , with the solar system demo. It is not, however it is called for null texture & SRC_OVER (the default mode for BaseGraphics). What about implementing the behaviour the comment describes? Please, have a look: webrev: http://cr.openjdk.java.net/~ant/RT-34443/webrev.1 It renders a temp texture if it's null. No need to retain it, as we always can re-create its (empty) content when (and if) it's requested for any meaningful rendering sometime later. - - - Also, I have to update my previous posts. - I've attached a reduced test case which consists of two files: index.html, styles.css. - The problem with that the context menu of a planet (in the side list on the right) only displayed once, the first time. This is not a regression of this fix, I checked that. I'll file a separate bug. - The problem with the NPE in WCPageBackBufferImpl.validate() on resizing. I can't reproduce it on Windows w/ and w/o the fix. However, I can (but harder) reproduce it on Mac w/o the fix. So, this is not a regression as well. Will file a separate bug. - I forgot to mention (in the overall description of the problem in question) that the reason WebKit uses all (or almost all) the tiles is that it simply fills them with bg colour. We can't ignore it, because a layer can have a specified bg colour (or even a bg image). Just for an experiment I tried to set the layer's html element colour to transparent, however this didn't help. WebKit still fills it with some bg colour (probably, inherited). This is css-processing code, deep in the WebKit rendering logic.
03-02-2015

I've already mentioned why I think the *FIX* in RTImage line 95 is necessary in the larger scheme. The only reservations I had was that the new comment that was added alongside it didn't seem to describe what was changed and implied that it was helping the case here, when it seems to be orthogonal to this test case and fixing some other problem. The comment, on the other hand, seems to be addressing how this existing code helps with the test case here, but it does so in a confusing way. What I'd like to see is: - Leave the code fix in here, or file a bug that we shouldn't elide an empty composite if the mode isn't SRC_OVER and then fix it under that. Either way, fix this, don't just let it slide. - Fix the comment. It is obviously trying to document the role of that test in helping avoid vram allocation, but it is confusing and seems backwards.
02-02-2015

webrev: http://cr.openjdk.java.net/~ant/RT-34443/webrev.0 Please, consider pushing the current solution that benefits from the "clearRect optimisation" and from reducing the max tile size to 512x512. We can't improve it further without either modifying the WebKit TextureMapper implementation we currently use, or switching to another implementation (in case that is at all possible) which is beyond the scope of this bug... I'm also answering the questions raised by Jim regarding the current fix: > The comment on line 95 in RTImage.java is confusing - is that really a description of what is happening if you return? I can't see any benefits of this particular change. So, I simply suggest to revert it back. Please, let me know if you think otherwise. > WCGraphicsPrismContext, line 86 - newline for closing brace? Fixed.
02-02-2015

I've reduced the test case. This version displays only Saturn with a Ring turning round. WebKit makes a single layer out of the html/css content. The layer is of 10102px X 12305px size. What causes it to be that huge is the "text-indent: -9999" style set on the "li" element which includes two nested "span" elements, each containing some text. VRAM footprint is as follows: WebLauncher: ~65 Mb WinLauncher: ~10 Mb I used MS PIX tool to look into the D3D level. It showed that WinLauncher used as few textures as possible to cover only the visible part of the layer. And so the result: only 10 Mb. So, why it behaves differently? As it turned out, the native Windows port is built with different configuration. It still uses ACCELERATED_COMPOSITING but, unlike WebView, it doesn't use the TEXTURE_MAPPER code path. The latter features an abstraction layer allowing for implementing the GPU rendering capabilities atop of high-level 2D toolkits (Qt, GTK etc.). We use this mechanism to draw via Prism. Even in the bounds of TextureMapper there're different approaches as to how to store layers. We use the simplest, named TextureMapperTiledBackingStore. More on TextureMapper is on the wiki: https://trac.webkit.org/wiki/CoordinatedGraphicsSystem "The implementation in TextureMapperTiledBackingStore is pretty simple, and is used only to work around the 2000x2000 texture size limitation in OpenGL. It is very different from the dynamic tiled backing store." And here what the wiki says about the dynamic backing store: "The backing store is dynamic in the sense that it doesn't paint all tiles, but only the tiles visible (tiles intersecting the visible rect) plus some area outside (pre-painted tiles; cover rect)." The other WebKit ports use either direct access to the platform toolkit (like DirectX by Win port, or Core Graphics by Mac port) or the other implementations of the TextureMapper (like Qt, GTK), each somehow solving the "clipping" problem. And so, the problem of WebView is that it uses the mechanism (TextureMapperTiledBackingStore) that doesn't include the logic of clipping an animated layer inside a "view port".Whether we can switch to another code path is a subject of investigation... - - - As to the test. The idea to tie a text to an animated layer with -9999 indentation doesn't look good enough... Seems like it is used to easily extract the name of each planet with JS and then display it in the side list. But, it would be better to put the text out of the layer, leaving its bounds natural. (Even worse is the Saturn Ring nested <span>, that makes the layer expand to 10000 x 10000, rising the size quadratically...). Also, if I set "overflow:hidden" to the "li" element, it reduces the layer size to its visual limits, however it cuts off a part of the planet.
02-02-2015

I've collected some more info. 1. Safari 6 has new feature in its Web Inspector (under the Develop menu). Now it shows Layers (more on it in the blog: https://www.webkit.org/blog/2518/state-of-web-inspector/) It gives us a perfect ability to inspect the compositing layers created for the solar system demo. In the DOM tree of the page there's a <ui class="solarsystem"> element. It contains a list of the planets. Each item contains a span with a text string (the planet name). The CSS stylesheet of the page defines all the geometry and animation of the solar system displayed. The "Layer Info" tab reports that the list contains 13 layers. Even more, it reports every layer real dimension and its footprint. All the layers in total occupies 58 Mb. Then, I can disable the "text-indent" style set on the "li" element (set to -9999) and observe the result immediately. This makes the planet names be tied to the planets so that they travel around Sun as well. The layers are reduced to normal size and start to occupy 21 Mb in total. Another interesting observation. The "Mars" layer has 10106x284 dimension and weights 3.86 Mb. The "Saturn" layer has 10102x12305 dimension but weights 14.07 Mb. That's the footprint differs by 3.5 times, whereas the size differs by 43 times! Thus, the footprint grows un-proportionally that probably means some perf improvements are applied (not all the tiles are physically created, for instance). 2. I've measured the vram footprint (on Win7) of the demo (with the Peter's fix applied) with WebLauncher and with the native WebKit port called WinLauncher. (I used ProcessExplorer, GPU tab.) with -9999 text-indent: WebLauncher: 240 Mb WinLauncher: 40 Mb without default 0 text-indent: WebLauncher: 40 Mb WinLauncher: 30 Mb So, the difference is evident. 3. I've instrumented the RTImage code and grabbed the following statistics: Total number of RTImage instances - 806 (this corresponds to what I see with NetBeans profiler. They're all living objects). Also, this looks pretty close to the number of tiles (limited by 512x512) needed to cover all the 13 layers reported by Web Inspector, taking into account their dimensions. (Note that a tile may be less than 512x512.) The total number of textures created - 368. The total size of all of the textures - 49 169 398. This should take about 200 Mb footprint, quite close to what I see with ProcessExplorer. Also, I track the number of "draw" calls performed on a particular texture. And what I see is that every texture (of 386) gets drawn to a graphics object persistently as the demo animates. So, my instant assumption is that we still create and draw tiles which somehow could be ignored or swept out.
23-01-2015

An update to my previous comment. The numbers for different tile size limit values (which I observe on my Win7 x64 desktop): 128x128 [java] D3D Vram Pool: 183��488��628 used (34,2%), 198��045��188 target (36,9%), 536��870��912 max [java] 3114 total resources being managed [java] average resource age is 2,6 frames [java] 8 resources at maximum supported age (0,3%) [java] 3��104 resources marked permanent (99,7%) [java] 0 resources have had mismatched locks (0,0%) [java] 0 resources locked (0,0%) [java] 3��110 resources contain interesting data (99,9%) [java] 0 resources disappeared (0,0%) 512x512 [java] D3D Vram Pool: 219��087��940 used (40,8%), 225��713��936 target (42,0%), 536��870��912 max [java] 402 total resources being managed [java] average resource age is 54,2 frames [java] 12 resources at maximum supported age (3,0%) [java] 374 resources marked permanent (93,0%) [java] 0 resources have had mismatched locks (0,0%) [java] 0 resources locked (0,0%) [java] 381 resources contain interesting data (94,8%) [java] 0 resources disappeared (0,0%) The footprint difference is not that much actually as you can see. But the number of the consumed resourses greatly increases as the tile size drops.
20-01-2015

I did some more digging and this is another coin to the discussion: > Is this really the fix for the solar system rendering (tiling?) issues? It is. The tiling mechanism is implemented in WebKit itself. So, the only thing the fix does with regards to it, is that it changes the max tile size. And here comes the question. Why can't we set it to, say, 128? On Windows (where I'm currently running it) this makes the test consume even less vram. This looks like a tradeoff b/w performance and vram footprint. Can we set the value based on the system capabilities?.. As to the RT-39586, where some clip bounds are established. I've checked that it doesn't correlate with this particular test. Moreover, when the tiled_layer mode is switched on for webkit, it stopps taking into account clip bounds when a tile is created.
19-01-2015

The new patch still has not addressed my earlier concerns raised above: - The comment on line 95 in RTImage.java is confusing - is that really a description of what is happening if you return? - WCGraphicsPrismContext, line 86 - newline for closing brace?
16-01-2015

Kevin helped me figure out how to build webkit and now I have new results. Non-retina now takes 216MB which is much better, though still an order or 2 of magnitude above what I think the demo should require. Retina now takes 820MB which means it does actually run on my retina MBP if I set the maxvram up to at least 1G (the total VRAM in the machine). Before when I set the maxvram up and retina took 1.5GB, it didn't run very well since it was using more VRAM than the system had. Now that it is down to 820, it actually runs reasonably well, though it takes all of my VRAM to do it.
16-01-2015

gradle -PCOMPILE_WEBKIT=true ...
16-01-2015

I instrumented RTImage and it looks like the tiles are still 2048x2048. Am I perhaps not recompiling webkit by default - I'm just typing "gradle" in rt...?
16-01-2015

With the new patch and the tile size set to 512, I see the exact same amount of vram used - 400MB for non-retina and 1.5GB for retina (maybe a few dozen megs less for retina, but no appreciable change and pretty much the same number as the first patch for non-retina).
16-01-2015

As Anton reasonably mentioned, I've missed changes for TextureMapperImageBuffer.cpp. Fixing it with the following patch: http://cr.openjdk.java.net/~loneid/RT-34443/v1/
16-01-2015

Jim, yes I was wrong about my assumption, thanks for correcting me. Still, the comments look misplaced like you noticed. Something like the following would probably seem reasonable: if (txt == null) if (g.getCompositeMode() == CompositeMode.SRC_OVER) return; /// TODO: draw temporary texture instead of creating permanent txt } Also, I see your comment: <<Looking at the CSS I notice that the Solarsystem style has "overflow: hidden" which means that content that overflows the boxes of the elements should be clipped out. The text in the spans which is positioned at -9999 is surely overflowing the boxes and should be clipped, but we seem to be allocating space for 10k pixels in width for each planet. It sounds like the bug here is that we aren't honoring the "overflow" CSS attribute...?>> I suspect that the problem is not that we aren't honoring the "overflow", but that we aren't honoring the clip. In RT-39586 a partial fix is suggested, that is to avoid crash. However, it doesn't solve the problem of computing a correct clip. I'll try to investigate if this is related.
16-01-2015

OK, summary of my feelings about this fix: - In general things are better so we should pursue it the fix as there doesn't seem to be a downside. - It makes the example work with default settings on 8u-dev on normal DPI desktop screens, that's good. - It doesn't make it work on retina/HiDPI screens, though because it still uses over about 1.6GB of VRAM (down from 2.6GB) on those machines - It's still using way more memory for this particular example than other platforms seem to use Specific comments about the fix: - The comment on line 95 in RTImage.java is confusing - is that really a description of what is happening if you return? - WCGraphicsPrismContext, line 86 - newline for closing brace? There is more work to be done, though. - We eventually get errors if we resize the window dramatically - some permanent texture is not being disposed somewhere - There is no reason we could not run this on retina Macs - Chrome can run this full screen with near 0 startup time and no animation pauses - Given that phones can run this without a hiccup, I'd have to think there is an architecture for managing these layers that would use less than 10% of the memory we are using.
15-01-2015

Anton, actually, it doesn't explain anything related to resource reduction. It used to never create the texture, but this change now creates it anyway if the mode is not SRC_OVER. That change to that line of code increased the likelihood of our creating a texture. The comment appears misplaced, though - it is describing what is happening if the body is not executed and the method continues on to the rendering code below...? In looking at the changes again, I think the key is the override on clearRect() to not do anything if there is no graphics object. I believe this is "don't create a buffer simply to clear it to transparent - since it will be fully transparent when we lazily create it if we ever have something to render into it". And that is where the savings come from. But, the savings are not that great. It takes it from needing 676M down to 493M when I run it on my Mac in non-retina mode. That is noticeable, but this example should be able to be run in just a few megs of memory and that only barely takes it below the 512M default limit in 8u40 (upped from 256M in 8.0/8u20). Some smaller environments may not even have that much VRAM (notably embedded). This example runs like gangbusters with no performance hits and without being a resource hog on my phone, but we take half a GB to run it. That's still not a great situation even though this improvement seems to be beneficial.
15-01-2015

Jim, thank you for your investigation. > the texture will be created as fully transparent if needed and you don't need that if the mode is SRC_OVER Ok, this explains the change for me, indeed. And this, as far as I can guess, is just what partially solves the solar test problem. Because, as it was discussed in RT-24738, one of the problem of the test is that it forces creation of lots of textures most of which are not used further. Thus, the change simply prevents creating of a texture before any rendering is performed on it (in which case a texture is created from the RTImage.getGraphics() method and then it appears to be non-null in RTImage.draw()). However, I wonder if the change is valid in general. This looks too specific, isn't it?
15-01-2015

If I run with poolstats on my non-retina Mac and do the resizes I see the VRAM usage steadily climb from about 400MB to the 512MB limit and then I get the NPE. VRAM usage then drops dramatically. It looks like we are relying on GC to reclaim some of the discarded Permanent textures? (Note that GC does not reclaim them reliably because the objects that hold on to the VRAM are usually very tiny and somewhat long-lived so once you drop them they stay in the heap until something dramatic happens and only then do we get our many-many-megabytes of VRAM back...)
14-01-2015

Example of NPE from resizing the window with the patch (drag the lower right corner around in a circle for about 10 seconds or so): java.lang.NullPointerException at com.sun.javafx.webkit.prism.WCPageBackBufferImpl.validate(WCPageBackBufferImpl.java:115) at com.sun.webkit.WebPage.paint(WebPage.java:644) at com.sun.javafx.sg.prism.NGWebView.renderContent(NGWebView.java:95) at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2067) at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1959) at com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:235) at com.sun.javafx.sg.prism.NGRegion.renderContent(NGRegion.java:576) at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2067) at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1959) at com.sun.javafx.tk.quantum.ViewPainter.doPaint(ViewPainter.java:474) at com.sun.javafx.tk.quantum.ViewPainter.paintImpl(ViewPainter.java:327) at com.sun.javafx.tk.quantum.PresentingPainter.run(PresentingPainter.java:91) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at com.sun.javafx.tk.RenderJob.run(RenderJob.java:58) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at com.sun.javafx.tk.quantum.QuantumRenderer$PipelineRunnable.run(QuantumRenderer.java:125) at java.lang.Thread.run(Thread.java:745)
14-01-2015

The patch helps a lot on my Mac if I don't use a retina resolution, but it still uses 400MB of texture memory which is insane. Needless to say, it still does not run in retina mode even with the patch. I tried to reproduce the rendering problems I saw on Win8.1 on my Windows 7 install, but I didn't see it. On Windows 7 and Mac non-retina, I can still get NPEs if I resize the window a lot.
14-01-2015

I don't really understand how, but applying the patch to my 8u workspace does pick up the performance quite a bit and eliminates the errors. Unfortunately, it also shows a bug on my Win8.1 system - if I resize the window then some of the planets disappear unless they are in the upper left quadrant of the "solar system"...
14-01-2015

I'm starting to wonder if the patch indicated above by Leonid is the right patch. It basically looks like a patch to get retina working correctly and contains none of the tiling fixes that were being discussed as "the solution". Can someone verify that we are reviewing the correct patch? I'm in roughly the same boat. I keep returning to this and there are so many missing pieces as to understanding how this works. All of the changes look randomly unconnected and none of them appears to increase the likelihood that we will avoid creating textures. In fact, this one change that Anton asks about appears to add one new case under which we will create a permanent textures (to be fair, it looks like it is a case that was necessary - the texture will be created as fully transparent if needed and you don't need that if the mode is SRC_OVER, but you shouldn't return if the mode was something else). Is this really the fix for the solar system rendering (tiling?) issues?
14-01-2015

Leonid, I can't clearly understand the following, in RTImage.java: 94 if (txt == null && g.getCompositeMode() == CompositeMode.SRC_OVER) { 95 /// draw temporary texture instead of creating permanent txt 96 return; Where is the code that renders a temp texture instead of creating a permanent one (I see what creates the latter)? And why we don't render in SRC_OVER? Also, the fix doesn't contain the native change that decreases the buffer size limit: --- old/modules/web/src/main/native/Source/WebCore/platform/graphics/texmap/TextureMapperImageBuffer.cpp 2013-10-01 21:51:59.638407133 +0400 +++ new/modules/web/src/main/native/Source/WebCore/platform/graphics/texmap/TextureMapperImageBuffer.cpp 2013-10-01 21:51:59.394405666 +0400 @@ -31,7 +31,7 @@ namespace WebCore { #if PLATFORM(JAVA) -static const int s_maximumAllowedImageBufferDimension = 2048; +static const int s_maximumAllowedImageBufferDimension = 512; #else static const int s_maximumAllowedImageBufferDimension = 4096; #endif
14-01-2015

This also needs to be well tested. It does fix the use case in question. It doesn't address the texture resource issues, but as long as it doesn't make them worse, it will be OK.
10-01-2015

This looks quite promising. It might be good for Jim to take a look, too. I know this is targeted to 9, but we should get it into 8u60 as it should not conflict with the newer webkit (Anthony can confirm this).
10-01-2015

Fix made by PeterZ for review: http://cr.openjdk.java.net/~loneid/RT-34443/v0/
30-12-2014

We should also consider backporting this to an 8u release once it has baked in 9 for a while, but yes, it is too risky for 8u40.
25-11-2014

With the PeterZ' patch attached to RT-24738, it works almost perfect, excluding that the planet pop-ups appear only once when the planet name is mouse hovered for the first time, and never again until the page is reloaded. Since the patch includes changes in core WebView files such as WCGraphicsPrismContext, PrismGraphicsManager or WCBufferedContext, the risk of possible regressions is too high to apply it to 8u40 at the late stage. The patch should go to 9.
25-11-2014

Is it possible you can reconsider? This bug is marked critical, was known for a year, worked in the prior version, and was already deferred from 8 to 8u20 to 8u40. Those people (like myself) whose apps use WebView will not be able to update to Java 8 until this is fixed.
19-11-2014

SQE is OK to defer it from 8u40.
19-11-2014

I can verify that this still happens, but we are out of time for 8u40 so will need to defer it again (unfortunately).
13-11-2014

erroneous comment deleted I ran the wrong test (using SW pipeline by mistake)
14-10-2014

We are sorry, but at this time we have no resources to work on WebView bugs for the 8u20 release. Unfortunately, we will have to defer them to 8u40.
23-05-2014

Perhaps we can investigate the patch and determine whether the flaws can be fixed. There are a number of rendering issues I believe that JIRA represents. Fixing this low level issue would be fantastic for WebView.
30-04-2014

Vadim pointed me at a document describing details of hardware-accelerated compositing in Chrome: http://www.chromium.org/developers/design-documents/gpu-accelerated-compositing-in-chrome Its 'Tiling' section covers how graphics resources can be saved using layer tiling. This is what actually PeterZ proposed in his comments for http://javafx-jira.kenai.com/browse/RT-24738 A fix made by PeterZ and attached to that issue (rt.patch) follows the Chrome's design and works great. I applied it to my workspace (with 256x256 tiles, as Chrome uses by default) and it made the Solar System demo working pretty good, although not perfect.
30-04-2014

Peter or Leonid: Any update on this?
26-03-2014

I'm not sure how much of things like CSS overflow make it down through the C code and into the Java code that we are using to render. Peter or Leonid? One thing to check would be the other ports of WebKit renderers. Do they use hardware acceleration?
23-01-2014

Another data point, my phone (with a webkit renderer) renders this page very quickly with very little memory usage (it doesn't even have the amount of memory that we have, RAM, VRAM, or otherwise). This example does not necessarily require a lot of memory usage - something about the way we are using webkit or responding to its requests is the culprit here...
23-01-2014

Looking at the CSS I notice that the Solarsystem style has "overflow: hidden" which means that content that overflows the boxes of the elements should be clipped out. The text in the spans which is positioned at -9999 is surely overflowing the boxes and should be clipped, but we seem to be allocating space for 10k pixels in width for each planet. It sounds like the bug here is that we aren't honoring the "overflow" CSS attribute...?
23-01-2014

Jim, is it insane / doable to have a render target that is not mapped to a texture? My understanding is that we are running out of memory. We can try to give textures back more aggressively (I thought this was tried, but without success) or have a scheme where less textures are allocated (by keeping some in Java memory). Any ideas?
22-01-2014

This is handled completely inside Webkit, in our code we just get requests to create drawables. We have to respond with something we can draw on, so we need a RenderTarget, and the only RenderTarget implementation in Prism is currently RTTexture (well, it was last time I checked)
22-01-2014

As a corollary to my previous comment, perhaps the other browsers are using temporary buffers, but they have much better reuse. In other words, if a given page asks for layers A and B to blend and then layers C and D, do we use a different buffer for each of those 4 layers whereas other implementations might use the same pair of temporary buffers for both A/B and C/D? In terms of scaling, if there are N such pairs we would need N*2 buffers but another implementation might only use 2 total? A similar question - it looks like a number of places in the layer code "request a repaint". What is involved in that? Is it an asynchronous operation for us that basically disrupts the current frame whereas in other implementation it is synchronous? The reason I ask is that it looks like we place a high priority on keeping the results of rendering a layer around rather than just repeating the work. For the layers involved in this solar system web page it appears that the individual elements being rendered on the layers are incredibly simple and so caching them as bitmaps is wasteful instead of just rerendering every planet on every frame. How is this dealt with in our web-node design?
21-01-2014

I'm still skeptical that the other browsers are handling this with better temporary buffer memory management. They hardly use any resources at all. Has any thought been given to why we use memory buffers for these cases? Or has the assumption been that memory buffers are the only way to handle these operations and so it comes down to managing them?
21-01-2014

Peter, can you investigate some of this and get back to us? This bug is the worst that we have in WebView and we have no plan in place to fix it. Even if we detected the case and rendered slower this would be a step forward.
21-01-2014

My favourite way was to implement RT-14776 and render to memory. This would have solved other problems too, such as canvases disappearing after waking up from sleep. As a workaround we might pull data from layers into memory when we're running short of textures, the performance would be awful, but at least it will "work". It may also be possible to lift video memory limits back to their past values, maybe only temporarily, or in some cases only, when we're able to detect lack of resources...
21-01-2014

Peter Z, this is assigned to you. It seems that there is a deep and fundamental issue in the way that web view allocates textures. Chrome does not seem to allocate textures the same way according to Felipe's experiments. Do you have an suggestions for fixing this?
20-01-2014

SQE: I agree to defer as for now, but investigation should be continued and I agree with Kevin, we may consider critical request to fix it in 8.
04-12-2013

Lowering priority to P2 since not every web site is affected. We will need to defer this, but continue working on it and push it to 8u20 when ready. At that time we could consider a critical request to pull it into 8.
02-12-2013

Would that profiler show if they were using some form of heap store for the layers? I'm guessing a heap store might be managed through some non-OpenGL system, though and might not show up there. I still feel that there is something about the way that this example is set up that allocating huge pixel stores for the layers is not the best approach to get it to render smoothly, but I don't know enough of the internals of webkit and CSS handling to have any better direction to point my finger at... :(
02-12-2013

Jim, I tried OpenGL Profiler on Mac, I was able to attach to Chrome while rendering http://neography.com/experiment/circles/solarsystem and texture allocation was very small. Pointing HelloWebView to http://neography.com/experiment/circles/solarsystem and profiling it showed a long list of large textures in the Resources Views. I was not able to test Safari, as OpenGL Profiler did not show it to me in the running process view. But I feel it is safe to assume they don't allocate all the texture space javafx does.
02-12-2013

Bottom line: We are going to have to defer this one and hope that it is not a common case.
02-12-2013

To answer Felipe's question - there is probably some sort of memory tracing tool that could be used to shed light on how some of the browsers are handling this example, but I am not familiar with these tools off-hand.
02-12-2013

Thanks for the explanations of how the code comes up with such huge values for the layers, but are the layers actually required to be stored as full-on pixel buffers? It looks like we have tiny planets sharing a layer with a small amount of text with a virtual galaxy's worth of empty space between them. In all the levels of code that we stack for executing this HTML example somehow we end up with the requirement that we allocate every pixel in that galaxy of empty space, but a more responsible way to get those displayed results would be to just render them as needed, or to split the text and the planet graphics into separate pieces. Are there no heuristics we could apply here? There are a couple of mitigating factors we use in other parts of the code: - Taking the destination clip into account. If text is translated off the screen, it is outside the eventual renderable area and so it doesn't affect the operations. The effects code goes through some very complicated testing to only apply affects to parts of the sources that can affect the desired output area, for instance. - If a cache would require too much space, and we can render without the cache, then we render without the cache. - In damage repair, if 2 damaged areas are far enough apart, we treat them separately even if they are a part of the same scene graph or are a part of the same underlying condition that caused the repaint. Somewhere between "allocate enormous textures that overflow all possible VRAM" and "eliminate all cached layers" lies a somewhat more responsible way to handle this particular example. Can we detect sparse layers and break them up into smaller, more manageable chunks? Can we disable the layer-as-pixel-buffer approach when the layer exceeds certain dimensions and rely on immediate rendering instead? I'd be interested in hearing how safari uses resources for this web page, but I don't think "they must allocate huge pixel buffers, therefore where are they allocating them?" is the only assumption you can make. Perhaps they are not allocating such huge pixel buffers because they detect anomalies with the way this example is using layers? Perhaps they don't allocate pixel buffers at all for this demo?
02-12-2013

Jim, as far as I remember Vasiliy's analysis (he investigated multiple issues around this demo), each planet has some text associated with it that has an offset of -10000, so each planet's layer is a long narrow strip. For Saturn rings at least, a shear transform is applied to this layer, which makes its bounding box 10k x 10k.
30-11-2013

Answering Steve's comments: 1) I don't have clear understanding of how other Webkit ports handle this -- they delegate most of the work to third party libraries such as Qt and CoreGraphics. It's possible that in the end they store layers in heap memory -- as seems the case with canvas data. If this is the case then they won't consider this a flaw. If we had a heap-based RenderTarget, or at least could get an ahead-of-time notification of texture death, we would be fine with this architecture as well. 2) No, this is a compile-time setting in Webkit. This issue affects pages that create large animated layers (and by large I mean several times bigger then the visible area). Of course it's hard to tell those pages just by looking at them without studying their code. Taking this particular demo as an example: by looking at it one could hardly expect it to consume so much resources. OTOH this is the only application I'm aware of that hits the limit, which suggest most of the "regular" animations should be fine.
30-11-2013

Hi Jim, could it be possible to use "OpenGL Profiler" to inspect the resource allocation for Safari rendering the given page ?
30-11-2013

I took a quick look at the HTML and the css and see nothing that requires 10k x 10k textures...? (The amount of code here is fairly small, just the HTML file which is fairly readable because it is intended as a web programming example, and the associated css file. All of the numbers in the CSS seem quite normal...)
29-11-2013

The thing that is bothering me about this test case is that there is no fathomable reason that they should need that much texture memory to render what they are rendering. Either the example is just extremely irresponsibly written, or we are misinterpreting how much space they are requesting, or ...?
29-11-2013

Here is the poolstat snapshot when running in retina mode: ES2 Vram Pool: 2,668,728,348 used (49.7%), 2,668,728,348 managed (49.7%), 5,368,709,120 total 140 total resources being managed average resource age is 0.4 frames 0 resources at maximum supported age (0.000000) 120 resources marked permanent (85.700000) 0 resources have had mismatched locks (0.000000) 0 resources locked (0.000000) 122 resources contain interesting data (87.100000) 0 resources disappeared (0.000000)
29-11-2013

Without a webview-retina-enabled 2u build I have no way to know if it will work there or not. On 8.0, using maxvram=5g and poolstats=true I can see it using 2.5-ish GB of texture memory. Apparently OpenGL allows allocations above and beyond actual VRAM usage because I only have 1GB of vram in my system, but the example still managed to launch, even with retina support and 2.5GB allocated. The additional textures are probably in main memory. It's hard to say how Safari does this. Do they have a more intelligent compositing mechanism? Or they could simply allow rampant allocations like we did in 2.2...?
29-11-2013

So what is the bottom line here. Are you saying that the only reason it works in 2.2 is that 2.2 is not retina enabled? BTW, the page renders no problem and is fast in Safari (of course).
29-11-2013

I updated my 2u workspace and the web page was still not retina-enabled so it looks like we'd only use 600Meg of VRAM on 2.x even on a retina Mac...
29-11-2013

If we have a way to determine actual VRAM size and we want to be resource hogs then we could set the size of ManagedResources to the full size of VRAM. This is the main difference with 2.2, that it would have no limit on the amount of VRAM that it would use if you requested. I'm not sure that is a good model to follow. The web site seems to work with about 600Meg of textures so most modern graphics cards will run it, but some older machines will run out of VRAM anyway. Also, that same memory usage would be about 2.4GB on a retina machine and there aren't any retina macs with that much VRAM. It ran OK for me on my retina Mac with my 2u workspace, but the web page did not look full retina resolution. I am updating my 2u workspace to the latest to test again. Is WebView retina-enabled on 2u yet?
29-11-2013

I am able to run 2.2 and performance is good in this example so we will need to fix this as some point, even if it means putting back code or rendering strategies from 2.2 (drastic, not likely to happen and less likely for ZBB). 1) Is there a design flaw in accelerated composition that means it can never be made to work when dealing with large areas? 2) Could compositing layers be disabled for this case and maybe others? If we are going to defer this, then we will need a good description of what sorts of web pages will fail so we can understand what else is unlikely to work. You spoke of TextureMapper etc. but we need to understand what sort of web pages will trigger the problem. This is absolutely the worst problem that FX has right now and everyone with knowledge in the area needs to concentrate on it.
29-11-2013

Summarizing the discussion from RT-24738: The problem here is, accelerated compositing layers are backed by RTImages which in turn are backed by permanent RTTextures. As a layer can be, in theory at least, redrawn when its texture is lost, marking those textures permanent is an error. To fix this we need to: - Introduce non-permanent RTImages to be used for accelerated compositing layers; - Manage locking/unlocking textures for these RTImages; - Listen for texture-lost events and notify Webkit that the corresponding layers should be redrawn (note that the notification will occur on Render thread and Webkit renders on FX thread). This is too elaborate and risky change for JDK8. I believe this should be deferred and fixed in 9 if at all. Note that TextureMapper needs all the layers rendered before it can start compositing, and we don't have enough memory for that, so this particular test will still be broken one way or another.
29-11-2013

I debugged this a bit and basically confirmed what Jim has described. I see that when WebPage.updateDirty() calls into the native twkPrePaint(getPage()) method it will call up into Java creating a lot of fairly large RTImage. (see felipe-log.txt)
27-11-2013

Peter, do you need any help on this? It is currently the worst but that we have outstanding.
26-11-2013

I added some comments about that patch to RT-24738. I don't fully understand the tile system, but it sounds like they are temporary rendering tiles and so they are cleared at the start of every operation. If so, then unlocking them at the end of the operation (and relocking on the next clear) should be feasible. See my comments in RT-24738...
26-11-2013

Using non-permanent RTImages for accelerated compositing layers may help, though we'll need some way to notify Webkit that some layers are lost and need to be redrawn...
26-11-2013

As I said RT-24738 has some details and even a patch. The patch makes rendering routine more cumbersome and is more of an ad-hoc fix not guaranteed to work in all cases. I felt applying the patch to "fix" one corner case was unjustified so I didn't proceed with fixing.
26-11-2013

In the webkit prism code any use of their "RTImage", which extends PrismImage, creates a permanent texture. These need to be handled more dynamically...
25-11-2013

When I run with poolstats, it tops out at over 600Meg of permanent resources. 120 are permanent and only 5 are transient textures like backbuffers, etc. ES2 Vram Pool: 672,240,290 used (62.6%), 672,240,290 managed (62.6%), 1,073,741,824 total 125 total resources being managed average resource age is 8.3 frames 0 resources at maximum supported age (0.000000) 120 resources marked permanent (96.000000) 0 resources have had mismatched locks (0.000000) 0 resources locked (0.000000) 120 resources contain interesting data (96.000000) 0 resources disappeared (0.000000)
25-11-2013

2.2 had no limits on VRAM usage. It also had some problems: - D3D would never fail an allocation, but performance would suffer dramatically as it started using alternate memory for textures - ES2 desktop would not fail the allocations directly, but you'd end up with rendering problems eventually - We weren't "good players" in terms of VRAM usage. I'm still updating my build environment for the latest JDK8 so I can run some tests to see what might be failing. We are likely thrashing quite a bit, but I'm not sure why that would cause failures. Perhaps VRAM fragmentation?
25-11-2013

This is sad. It runs flawlessly on 2.2. Do we need to simply set a number bigger?
25-11-2013

Here is some extra information if it may helps. This program seems to need about 1 gigabeta of vram to run. I'm able to get over the NPE problem by setting the vram to -Dprism.maxvram=1024m. I tested this on my Windows laptop on both es2 or d3d pipe. Similarly I'm able to run it on the sw pipe if I specified -Xmx1024m: $ jjfxjar -Xmx1024m -Dprism.order=sw -DproxyHost=www-proxy.us.oracle.com -DproxyPort=80 -jar dist/WebLauncher.jar http://neography.com/experiment/circles/solarsystem I see regular flashing on the es2 or d3d pipe, but sw pipe looks really stable. Resizing on all pipes work as well.
25-11-2013

Leonid: can you confirm this? Peter or Jim: any ideas?
25-11-2013

Given that we are one week from ZBB, I am raising this to P1. We cannot ship with this critical regression.
25-11-2013

I did a full clean build on my Mac and see the following exceptions: kcr@dhcp-santaclara22-2fl-west-10-132-182-154:~/javafx/8-kcr-gfx/jfx/rt-closed/toys/WebLauncher$ jjfx -DproxyHost=www-proxy.us.oracle.com -DproxyPort=80 -jar dist/WebLauncher.jar http://neography.com/experiment/circles/solarsystem java.lang.NullPointerException at com.sun.javafx.webkit.prism.WCPageBackBufferImpl.validate(WCPageBackBufferImpl.java:75) at com.sun.webkit.WebPage.paint(WebPage.java:625) at com.sun.javafx.sg.prism.NGWebView.renderContent(NGWebView.java:73) at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2043) at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1951) at com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:225) at com.sun.javafx.sg.prism.NGRegion.renderContent(NGRegion.java:499) at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2043) at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1951) at com.sun.javafx.tk.quantum.ViewPainter.doPaint(ViewPainter.java:469) at com.sun.javafx.tk.quantum.ViewPainter.paintImpl(ViewPainter.java:324) at com.sun.javafx.tk.quantum.PresentingPainter.run(PresentingPainter.java:91) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at com.sun.javafx.tk.RenderJob.run(RenderJob.java:58) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at com.sun.javafx.tk.quantum.QuantumRenderer$PipelineRunnable.run(QuantumRenderer.java:129) at java.lang.Thread.run(Thread.java:744) ERROR: unexpected fbo is bound! Expected 1, but found 2 java.lang.NullPointerException at com.sun.javafx.webkit.prism.WCPageBackBufferImpl.validate(WCPageBackBufferImpl.java:75) at com.sun.webkit.WebPage.paint(WebPage.java:625) at com.sun.javafx.sg.prism.NGWebView.renderContent(NGWebView.java:73) at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2043) at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1951) at com.sun.javafx.sg.prism.NGGroup.renderContent(NGGroup.java:225) at com.sun.javafx.sg.prism.NGRegion.renderContent(NGRegion.java:499) at com.sun.javafx.sg.prism.NGNode.doRender(NGNode.java:2043) at com.sun.javafx.sg.prism.NGNode.render(NGNode.java:1951) at com.sun.javafx.tk.quantum.ViewPainter.doPaint(ViewPainter.java:469) at com.sun.javafx.tk.quantum.ViewPainter.paintImpl(ViewPainter.java:317) at com.sun.javafx.tk.quantum.PresentingPainter.run(PresentingPainter.java:91) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at com.sun.javafx.tk.RenderJob.run(RenderJob.java:58) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at com.sun.javafx.tk.quantum.QuantumRenderer$PipelineRunnable.run(QuantumRenderer.java:129) at java.lang.Thread.run(Thread.java:744) ERROR: unexpected fbo is bound! Expected 1, but found 2
21-11-2013

RT-24738 has some analysis of the issue
21-11-2013

This is a very serious regression.
21-11-2013