Need help optimising


I've made a rain particle effect that works exactly as intended. However, I need a little help optimising it, because it starts to lag the PC once more than 500 rain particles are spawned at any given time.

If you need any images just ask

Thanks in advance for the help guys
asked Aug 29 by Jamesking96 (660 points)

3 Answers

Best answer

Got the .pkkg, Thanks !

It's a rather long and complex answer so I'll split it in three different parts:

  1. Raw optimization / improvement of your current effect without changing its structure (1 main rain layer)
  2. Optimizing further using PopcornFX v1.12
  3. How to take it a step further and transform it into a more complete "large scale" rain effect that adapts to the viewer and has different LOD layers.

1- Raw optimization / improvement

Here are the few things I noted that could be improved.
In order to get more readable/significant measurements I increased the spawn area from 10x10 to 20x20 and increased the number of particles 4-fold to keep the same rain density (from 9000/sec to 36000/sec). All the following timings in ms are the ones measured by the editor for that particle count, captured on a relatively crappy quad-core PC, so you can expect much better figures with a more recent machine and fewer particles.

1- two raycasts per particle per frame
The main rain particles are actually doing two raycasts per frame: one made by the collisions evolver, another made by the 'intersectExt' call in the script.
The way you're doing the manual intersect + kill means there can be some intersection misses, therefore the collision evolver is useful, but as we'll see in #2 below, you can fix that and make the intersect call never miss intersections. So let's just remove the collision evolver. Here are the timings of the whole effect before and after:

  • ~4-5 ms total before
  • 1.0 ms after

It's much faster, mainly due to the fact that the collision evolver instantiates the "HitOther" layer, which is more expensive.
If we set it to spawn the "HitWater" layer, as the intersect script does, it goes from 3.0 ms with the collision evolver down to 1.0 ms without. It's still less than half the cost, but that's because the intersect script misses intersections. When increasing the intersection distance so it doesn't miss any, it goes from 3.0 ms to 1.5 ms.

2- improper intersection, causes intersection misses
The intersection script currently starts from the particle Position and raycasts down along scene.axisUp() for a fixed distance, which is set by the "RaycastDistance" attribute.

This causes multiple issues.
If 'RaycastDistance' is too small, rain particles will miss intersections.
If 'RaycastDistance' is too large, they won't miss intersections, but they'll raycast further, which is more expensive.

The current default value of "RaycastDistance" the effect has is actually too small (0.064 world units), so collisions are missed, and not all splashes are spawned. Here are the timings and number of splash particles based on the value of RaycastDistance:

  • 1.0 ms before (~700 splash particles, distance=0.064)
  • 1.5 ms after (~2000 splash particles, distance=1.0)

When using a large raycast distance, particles see the ground earlier, trigger the splash and kill themselves too early. Artefacts become visible due to a bad trigger location (see #3 below)

The solution to this is to perform a raycast between the particle's previous-frame position and its current position. See it as raycasting a segment that's the whole "movement path" of the particle this frame.

To do this we'll need to save the position the particle had at the beginning of the frame, with a script that runs before any other evolver and does something like "StartPosition = Position". Then, after the evolvers that move the particle around (i.e. physics), we'll do the scene intersect like so:

    float3  raycastStart = StartPosition;
    float3  raycastStop = Position;
    float3  raycastVec = raycastStop - raycastStart;
    float3  raycastDir = safe_normalize(raycastVec, -scene.axisUp());
    float   raycastDist = length(raycastVec);
    int4    hit = scene.intersectExt(raycastStart, raycastDir, raycastDist);

(local variables added to emphasize what each computed value is)

Note the safe_normalize: when the length of its first parameter (here, 'raycastVec') is zero, it can't normalize it, and will return the second parameter instead.

With this proper intersection, we don't have any more misses, and no need for an arbitrary raycast distance attribute anymore.

Also, watch out for when nothing is hit: the scene intersect will return 'infinity' as the hit distance. The proper way to test for this is:

    int4    hit = scene.intersectExt(raycastStart, raycastDir, raycastDist);
    float   hitDist = scene.unpackDist(hit); // infinity when nothing was hit
    int     hasHit = isfinite(hitDist);

3- Bad trigger location

By default, all 'trigger' calls will use the current particle 'Position' field as the location the child layer should be spawned at. This means that when intersecting with a ray starting at 'Position' and ending further down the raycast line, when an intersection was found, all triggers were triggered at 'Position', therefore at the start of the raycast. When using large raycast distances, it spawned splashes way above ground:

When using the fixed raycast in #2 above, that raycasts between 'StartPosition' and 'Position', it will trigger the child layer at 'Position' which is the end of the raycast, so splashes now spawn below the ground (which is arguably worse):

There are two ways this can be fixed:

A- Temporarily set "Position" to the intersection position, call trigger(), then restore it to the actual Position the particle should have.

As you're killing it afterwards, you don't really need to restore it: you can just snap it to where it hit, call trigger(), then kill() it, and voilà.
Whenever the particle does NOT trigger and kill though, you still need to restore it.
It could be tempting to write this:

    float3 backupPos = Position;
    Position = raycastStart + raycastDir * hitDist;
    HitWater.trigger(con);
    Position = backupPos;

But it won't actually work. The script optimizer incorrectly assumes the two assignments to "Position" can be simplified. However, it correctly sees the call to "trigger" as having side-effects including reading from particle fields in-memory, so it doesn't remove the store to Position before the trigger, but it removes the store to Position after the trigger, which looks a lot like a bug. I've filed a bug report, but in the meantime, don't do that.

You can use a select instead:
    Position = select(Position, raycastStart + raycastDir * hitDist, hasHit);

B- Use the overload of the 'trigger' function that expects position and fwd + up axes.

You can call the trigger function with explicit instantiation coordinates:

    HitWater.trigger(con,
                     raycastStart + raycastDir * hitDist, // spawn position
                     scene.axisForward(), // spawn forward axis
                     scene.axisUp()); // spawn up axis

This is the cleanest of the two, and we'll need the explicit up and fwd for the next step :)

It now spawns at the proper location:

One important side-note: it's actually better to slightly offset the planar billboards along their up axis when they spawn to avoid Z-fighting with the ground geometry.

Add this to their spawn-script to offset 'Position' by 0.05 units along the up axis:

    Position += float3sfu(0,0,0.05);

4- aligning triggered layers with the world surface they collided with

Currently, when rain collides with a non-horizontal surface, the splash particles are still aligned horizontally (see previous screenshot at the back)

You can get the intersected surface normal from the hit report of 'intersectExt()', and use that to pass down custom fwd and up axes to the trigger() call to orient the child layer properly:

    float3    hitPosition = raycastStart + raycastDir * hitDist;
    float3    hitUp = scene.unpackNormal(hit);
    float3    hitForward = scene.axisForward();
    HitWater.trigger(con, hitPosition, hitForward, hitUp);

This is already much better, but you'll see some artefacts using 'scene.axisForward()' as the forward axis:

(notice some of the splashes on the inclined plane are still horizontal)

You need to recompute a proper forward axis.
You could do:
    float3    hitForward = cross(scene.axisSide(), hitUp);

But this will give incorrect results if a rain particle hits a surface whose normal (therefore 'hitUp') is parallel to scene.axisSide(), producing a zero-length forward axis.
The following trick will work regardless of the surface that was hit:

    float3    hitPosition = raycastStart + raycastDir * hitDist;
    float3    hitUp = scene.unpackNormal(hit);
    float3    hitForward = cross(float3suf(0, -hitUp.f, hitUp.u + 0.01), hitUp);
    HitWater.trigger(con, hitPosition, hitForward, hitUp);

And the splashes will now be oriented properly:
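
The difference between the two forward-axis formulas can be checked outside the editor with a plain Python sketch (not PopcornFX script). Vectors are stored as (side, forward, up) tuples, which is an assumption of this sketch. It also assumes the helper vector is built from the components of the hit normal. The sign of a cross product depends on the handedness of the axis system, but its length and its perpendicularity to 'hitUp' do not, and those are what matter here.

```python
import math


def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def length(v):
    return math.sqrt(dot(v, v))


def naive_forward(hit_up, axis_side=(1.0, 0.0, 0.0)):
    """cross(scene.axisSide(), hitUp): degenerates when hitUp is parallel to the side axis."""
    return cross(axis_side, hit_up)


def robust_forward(hit_up):
    """cross(float3suf(0, -hitUp.f, hitUp.u + 0.01), hitUp), in (side, forward, up) order."""
    _side, fwd, up = hit_up
    helper = (0.0, up + 0.01, -fwd)  # side = 0, forward = up + 0.01, up = -forward
    return cross(helper, hit_up)


# Axis-aligned surface normals, including a wall facing along the side axis.
normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
for n in normals:
    print(n, "naive:", length(naive_forward(n)), "robust:", length(robust_forward(n)))
```

Note that the resulting forward axis is not unit-length in this sketch (it can be as short as 0.01), so normalize it if whatever consumes it expects a unit vector.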

5- Avoid using dummy folders
The trigger targets in the effect are two different folders, each containing a single layer:

Folders are nice to tidy up the effect, but avoid them when you can (i.e. when they only contain a single layer). They are not optimized away and have a small overhead that doesn't matter in most cases, but in this kind of trigger-heavy effect it adds up pretty quickly. Without the folders:

Here are the timings of the whole effect before and after removing the dummy folders:

  • 1.5 ms total with the dummy folders
  • 0.8 ms total without the dummy folders

Which is almost twice as fast without.
Here's a screenshot of the in-editor profiler, with the parts that process layers and folders highlighted in red (horizontal axis is time).
Left is with dummy folders, right is without:

Likewise, if possible, avoid using non-instantaneous layers (layers whose 'Duration' property is nonzero), they incur a performance penalty as well.

6- Weird use of Color in the splash layer

In the 'HitWater' layer, you have a field evolver that binds a curve to the 'Color' field, then a script that runs and does:

    Color -= RippleColor;

RippleColor being a float4 attribute that defaults to zero.
This will cause color components to become negative, which can be particularly bad in some engines, and is probably not what you initially wanted?
The usual pattern here is to have multiplicative coloring. So 'RippleColor' would default to white (1,1,1,1), and you'd multiply it with 'Color' rather than subtracting it (Color *= RippleColor).
This will tint the particles to match the color set in 'RippleColor', which is not the case at all with the subtraction, where setting 'RippleColor' to red would tint the particles greenish / teal.
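
The arithmetic behind this is easy to check by hand. The following plain Python sketch (not PopcornFX script) treats colors as RGBA tuples; the sample values are chosen for illustration and are not taken from the effect.

```python
def subtract(color, tint):
    """Component-wise Color - RippleColor."""
    return tuple(c - t for c, t in zip(color, tint))


def multiply(color, tint):
    """Component-wise Color * RippleColor."""
    return tuple(c * t for c, t in zip(color, tint))


white = (1.0, 1.0, 1.0, 1.0)
grey = (0.5, 0.5, 0.5, 1.0)
red = (1.0, 0.0, 0.0, 1.0)

print("white - red :", subtract(white, red))  # red removed, green and blue left
print("grey  - red :", subtract(grey, red))   # red channel goes negative
print("white * red :", multiply(white, red))  # plain red tint
print("grey  * red :", multiply(grey, red))   # darker red tint
```

Subtracting red leaves only green and blue (hence the teal look), zeroes the alpha, and pushes the red channel below zero for anything darker than white. Multiplying keeps every component inside the range of the inputs.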

From a performance point of view, doing an addition, subtraction, or multiplication has no effect whatsoever, it will cost the same thing. (division is more costly though)

Also, the "ColorDeform" field evolver has a curve with a constant value. If you don't plan to change it, it's useless: remove the evolver altogether and just assign that constant value (0.053) once in the spawn-script.

PS: just saw you also had this for the color of the main rain layer.

7- tweaking the page-size for better thread utilization

This one is a pretty advanced one and will be less measurable especially if you have many more effects running at the same time.
It's best left untouched but it can help in some specific cases where you know it's an issue.

For this part, I brought down the spawn rate from 36000/sec back to the initial 9000/sec:

First we'll need to inspect the physical storage of the particles. To do that, toggle the page profiler:

This will toggle the following debug display on the top left corner of the viewport:

Each column shows the storage of one particle layer. Here the first one is the Rain layer, the second one is the Splash layer, and the third one is the second triggered layer which isn't currently triggered so it's empty.
Here each layer has a single page (horizontal bar), the red part of the page shows the particles active in the page. The green parts are the free space.
The text above tells us, for the first layer that:

  • the storage is used at 59.7% by particles, the rest is free space
  • the storage uses 128 Kilobytes of memory
  • the pages are 1024 particles wide, and there's 1 page.
  • the storage ID is '0', there is 1 active evolve page and 0 active spawn page (you don't care about that really)

The important thing to understand here is that the PopcornFX runtime will emit one update task per page. Therefore, if a layer has only a single page, it can only be processed by a single worker thread. If your machine has 4 cores, PopcornFX will have 4 worker threads, if it's a 12 core, it'll have 12 worker threads.

If your layer is using 10 pages, 10 threads out of your 12 workers will have work to do.
So, if you have very expensive particles, that only use one or two pages, on a 12 core machine, and there are no other layers/effects to update, you'll have most of the cores sitting idle.

The idea here is that you can tell the runtime to use a smaller storage, hence breaking up the particles into more physical pages, which will cause better thread utilization when the simulation runs.
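
As a rough back-of-the-envelope model of this, here is a plain Python sketch (not PopcornFX code). It assumes one update task per page and that particles are packed into as few pages as possible, which is a simplification of what the runtime actually does. The particle count is derived from the numbers above (59.7% of one 1024-particle page), and 128 is the page width reported for the "Small" setting further down.

```python
import math


def page_count(live_particles, page_size):
    """Number of pages needed if particles are packed as tightly as possible."""
    return max(1, math.ceil(live_particles / page_size))


def busy_workers(live_particles, page_size, worker_count):
    """Upper bound on workers that can have work, with one task per page."""
    return min(page_count(live_particles, page_size), worker_count)


live = round(0.597 * 1024)  # rain particles alive in the example above

for page_size in (1024, 128):
    pages = page_count(live, page_size)
    print(f"page size {page_size}: {pages} page(s), "
          f"{busy_workers(live, page_size, 4)} of 4 workers busy, "
          f"{busy_workers(live, page_size, 12)} of 12 workers busy")
```

This only counts how many workers can be kept busy. It ignores the per-page overhead, which is why smaller pages are not automatically a win.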

Select the Rain layer, and scroll down to "PrefferedStorageSize", switch it from "Auto" to "Small" :

This will cause the page size to be smaller, and the particles will use more pages (here after setting the page size to "Small" for both the rain and splash layers):

You can now see each page has dropped from 1024 particles wide to 128, the total storage size has gone from 128 KB to 80 KB for the rain layer (less unused space), and there are now 5 pages used.
Here are the timings:

  • ~0.4-0.5 ms with default pages @ 9000 particles/sec
  • ~0.35 ms after switching just the rain layer to "Small"
  • ~0.35 ms after switching the splash layer as well (running on a quadcore, 4 workers, no measurable difference, workers are fully utilized anyway, no more idle gaps, so it doesn't change anything)

In this case it's a pretty small gain, but still a gain, and in some specific cases it can make a large difference.

As with other really low-level and specific tweaks, don't do this for everything: it's generally better to let the runtime decide the page size. It can especially hurt performance in-game if you happen to spawn way more instances of an effect than what you're testing in-editor. Smaller pages have a higher overhead to process, so if all the worker threads are already well used with large pages, switching to smaller pages will actually cost more.

2- Optimizing further with v1.12

v1.12 comes with two interesting new features that you can use for this effect:

  1. low-rate updates for script evolvers
  2. custom quality setting for curve samplers

The curve samplers only take a small fraction of the time in the effect (less than 4%):

So it's not really useful to bother with this, but it's good to know if you end up using more curve samplers or field evolvers.

The main gain here is with the sub-rate updates. These allow you to say that a script evolver should only run at half, quarter, or eighth rate (once every 2, 4, or 8 frames).

This lets you spread expensive computations over multiple frames, and effectively divide the per-frame cost by 2, 4, or 8.

Here, the most costly part is the scene.intersectExt() in the rain layer.
If you were using a turbulence sampler for turbulent wind, it would also be a good idea to move the sample() call into a sub-rate script evolver.

Let's look at the effect in v1.11:

and v1.12:

You can see v1.12 has a grey "1" for each script evolver. This is the update rate. Clicking on it opens a drop-down that lets you pick another rate. Let's make the intersection script run at quarter rate:

Here are the timings of the whole effect (again with a spawn rate of 36000 particles/sec):

  • ~1.0 ms before (full rate)
  • ~1.0 ms after (quarter rate)

Which doesn't change a thing.
Here's why:

Just switching the intersect script to quarter rate means we're now missing intersections 3 out of 4 frames. This means the rain particles now go through the geometry and die much later. Therefore instead of ~2200 live rain particles before, we now have ~20,000 live rain particles (!). Also, instead of ~2000 splash particles we now have only ~500.
The rain layer costs more (more than 4 times the particle count, even though the expensive stuff is now only done once every 4 frames) and the splash layer costs less (4 times fewer particles than before), which ends up giving roughly the same timing as before, but the effect is totally broken.

Here's what we have to change to make it work:

We had a script at the start of the evolver stack that backed up "Position" into "StartPosition".
We now want "StartPosition" to be not the position at the start of the frame, but the position the particle had at the last intersect. So we'll just remove this evolver, and save "Position" into "StartPosition" after the intersectExt() call, for the next run of the script.
For the first run to pick up the correct "StartPosition", we'll also need to initialize it in the spawn script.
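
The effect of moving the "StartPosition" backup can be reproduced with a tiny standalone simulation in plain Python (not PopcornFX script). It assumes a single drop falling at a constant 0.3 units per frame from height 10 onto a flat ground at height 0, all made-up numbers, and models the intersect as a simple segment-versus-plane test.

```python
def first_hit_frame(save_start_every_frame, rate=4, speed=0.3,
                    start_height=10.0, max_frames=200):
    """Return the frame at which the drop detects the ground, or None.

    save_start_every_frame=True models the old setup (StartPosition is
    backed up at the start of every frame); False models the new one
    (StartPosition is only saved right after the intersect runs).
    """
    position = start_height
    start_position = position  # initialized in the spawn script
    for frame in range(1, max_frames + 1):
        if save_start_every_frame:
            start_position = position  # script at the top of the evolver stack
        position -= speed  # physics moves the particle
        if frame % rate == 0:  # the sub-rate intersect script runs
            if start_position >= 0.0 >= position:  # segment crosses the ground
                return frame
            if not save_start_every_frame:
                start_position = position  # saved for the next run of the script
    return None


print("full rate, per-frame backup:   ", first_hit_frame(True, rate=1))
print("quarter rate, per-frame backup:", first_hit_frame(True, rate=4))
print("quarter rate, backup on run:   ", first_hit_frame(False, rate=4))
```

In this setup the drop crosses the ground on a frame where the quarter-rate script does not run, so the per-frame backup never sees the crossing. Saving "StartPosition" only when the script runs makes each raycast cover all the frames since the previous one, so the hit is found on the next run.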

Two ways to do this:
A- in the "Eval()" function of the spawn-script, which runs in localspace.
This requires setting the proper transform flags on the "StartPosition" field, so that it gets properly transformed to worldspace when leaving the "Eval()" function, i.e. full transform flags (rotate+translate):

B- in the "PostEval()" function of the spawn-script, which runs in worldspace.
If we do it here we don't care about transform flags, we'll assign the worldspace "Position" field into the worldspace "StartPosition" field:

    function void PostEval()
    {
        StartPosition = Position; // both fields are in worldspace here
    }

If you have the choice, prefer not calling a "PostEval" function just to do that, just set the proper xform flag and do it in the regular spawn script.

The "Rain" layer particle count is now back to the normal ~2200 particles, no more intersections are missed, and we have ~2000 splash particles.
Timings for just the "Rain" layer are now:

  • ~0.35 ms before
  • ~0.12 ms after

3- Large scale rain & LOD layers

This is a whole topic by itself so I won't cover it in as much detail as the above, but here are the main directions you can investigate:

1- Player-following effect for "infinite rain"
You can make your rain emitter move along the player camera directly inside the spawn script by using "view.position()".

Instead of doing this:

    Position = SpawnBox.samplePosition();

Just do this:

    Position = SpawnBox.samplePosition() + view.position();

This will make the spawn-box follow the player camera and you'll basically have a "rain spawn box" following the player head. Based on where the player is in the level you can turn it on or off by simply changing your SpawnValue attribute.

Depending on what your level looks like you might want to ignore the camera height and do:

    Position = SpawnBox.samplePosition() + float3sfu(view.position().sf0);

(sfu stands for "side, forward, up", it's better than the xyz swizzles as it allows the effect to be axis-system independent, read: you can use it in UE4 or Unity even though UE4 is Z-up and Unity is Y-up)
If you don't care, just do, for Z-up coordinate systems:

    Position = SpawnBox.samplePosition() + view.position().xy0;

2- Beauty/fill layer(s)

You can add a second far-away layer of cheaper non-colliding particles to fill the background (actually you can use several of these if you have a long view distance, with far-away rain being large billboards textured with a sheet of rain streaks).

To do that we'll use different spawn shapes (you're currently using a box).
Cylinders will be better suited, as we can set up an innerRadius and nest them together:
- spawn shape of colliding rain layer (closest layer): cylinder in volume sampling mode, radius = 10, innerRadius = 0, height = 0.5
- spawn shape of beauty rain layer (second closest): cylinder in volume sampling mode, radius = 50 (or whatever), innerRadius = 10 (equal to the first layer radius so they fit together nicely), height=0.5

Just copy/paste the first rain layer, delete the script doing the intersection, boost the size and the spawn count up (otherwise due to aliasing they will be barely visible and just eat performance for almost no visual impact), and also bring down their life. As they're not intersecting, they'll go through the level and die when they want.

You can also keep an intersection test, just to kill them, make it run at 1/8th rate, and perform no trigger when they hit, it'll still be much cheaper.

Something that could add a nice polish would also be to fade the intensity of the splashes based on how far they're from the edge of the colliding rain spawn limit so you don't see a clear and hard "no more splashes" limit. This can be done either by comparing their spawn position to the view.position(), or, when spawning the initial rain particle, grab the value of length(Position), and fade a "SplashIntensity" field as the length gets close to the radius of the spawn shape (10.0 in the example above), then forward that "SplashIntensity" field as a parent field to the child layer, and use it to fade the final splash color.
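
As an illustration of the second option, here is a plain Python sketch of one possible fade curve (not PopcornFX script). The linear ramp and the `fade_width` parameter are assumptions of this sketch, not something the effect already has. The radius of 10.0 is the colliding layer's spawn radius from the example above.

```python
import math


def splash_intensity(spawn_position, radius=10.0, fade_width=2.0):
    """Linear fade from 1 to 0 over the last fade_width units before radius.

    spawn_position is the rain particle's local spawn position, so its
    length is the distance to the center of the spawn shape.
    """
    dist = math.sqrt(sum(c * c for c in spawn_position))  # length(Position)
    t = (radius - dist) / fade_width
    return min(1.0, max(0.0, t))  # clamp to [0, 1]


for d in (0.0, 8.0, 9.0, 10.0):
    print(f"distance {d}: intensity {splash_intensity((d, 0.0, 0.0))}")
```

The value would be computed once at spawn, stored in the "SplashIntensity" field, and multiplied into the splash color in the child layer.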

well, that was a pretty long answer...
I'm sending you the modified effects back by email...

Hope it helps :)


answered Sep 26 by Julien (30,940 points)
can you share here how you made your effect, maybe a screenshot of the layers, if you have multiple ones, is one consuming noticeably much more perf than the others in the HUD profiling data ? (a screenshot of that would be nice too)
are you using scene collisions for every particle?
are you testing in the popcorn editor only or in a specific game-engine? if so which one?
answered Sep 14 by jbayeux (640 points)
Sure, the images are here. If you need any more info, just ask. Note: I have made changes that have improved performance. Also, the rain uses the intersectExt and unpackSurfaceType functions, and is meant to run in UE4.
(answering in a new answer rather than a comment, as I'm wondering if you get notified of replies to comments?)

I'll need to see more. Many things can eat performance there (ranging from how you trigger the child particles, to the way they are set up, to whether they use dummy folders or not, to whether they're instantaneous or not, etc.). Most of the time these won't matter much, but in such "trigger-heavy" effects they do.
Can you send your effect over in a .pkkg to support _at_ popcornfx _dot_ com?
I'll take a look and give more detailed feedback.

Apart from these potential improvements, the biggest gain will definitely be to make fewer raycasts and triggers. So the obvious optimization is to only emit rain particles that raycast around the viewer camera, and emit non-raycasting versions further away where you don't notice the splashes.

You can also mix this with other techniques that are pretty useful for rain:

1- have a raycast layer and a beauty layer for the rain, where the raycast layer has invisible particles whose sole purpose is to find intersection points in the scene, and the beauty layer is just raw rain streaks.
You can have far fewer raycast particles than rain streaks. Also, you usually won't notice the mismatch between the splashes and the actual rain streaks, unless they're really close to the camera. So this technique can be used as a second more "distant" layer.

2- for further away rain, use much fewer but bigger billboard strips simply containing a "distant rain" texture (watch out for overdraw with that one though!). This is pretty useful to make a background "fill-layer" for more distant rain.
answered Sep 25 by Julien (30,940 points)
Thanks for the comments. I can send an email, but I'm not sure of the format, as popcorn particles are saved as .pkfx. I can send that if it helps? On top of the particle and memory issues, it appears that only one item is affecting the overdraw: the ripples I have for when the rain hits a water surface type.