DX12 and threading
DX12
By lubbe75, December 7, 2017 in Graphics and GPU Programming
Posted December 7, 2017 by lubbe75 (Member)

Being new to DirectX 12 I am looking for examples on how to use threading. I have done lots of OpenGL in the past and some DirectX, but with DX12 the threading magic is gone and I understand that threading is crucial to get good performance. In my project I currently have one thread doing it all. I have one command list, one command allocator, one bundle and one bundle allocator. I also have a million triangles, so it's about time that I start doing this.

How do I split things up? How many threads should I use? How many command lists and allocators?

I realize this is a beginner's question, but I have to begin somewhere. I would be grateful if someone could point me in a direction where I could find a simple code sample, tutorial or something similar.

Thanks!
Posted December 7, 2017 by Hodgman (Moderator)

On December 7, 2017 at 10:08 AM, lubbe75 said: "I also have a million triangles, so it's about time that I start doing this."

Number of triangles is irrelevant to the CPU - how many draw calls do you have? If it's thousands, you may get some benefit from using multiple threads to record the draw commands. In my experience, with fewer than around a thousand draws, there's not much benefit in threaded draw submission.

On December 7, 2017 at 10:08 AM, lubbe75 said: "How many threads should I use?"

Most engines these days make a pool of one thread per CPU core, and then split all of their workloads up amongst that pool. So on a quad core, I'd use a max of 4 threads, and as above, also no more than around (draws/1000)+1 threads.
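The rule of thumb above can be written down as a small helper. This is only a sketch: RecordingThreadCount and DetectedCoreCount are hypothetical names, and the 1000-draws-per-thread figure is the rough number from the post, not a measured constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: cap the number of recording threads by the core count
// and by roughly one thread per 1000 draw calls, i.e. (draws/1000)+1.
std::size_t RecordingThreadCount(std::size_t drawCalls, std::size_t coreCount)
{
    const std::size_t byDraws = drawCalls / 1000 + 1;
    return std::max<std::size_t>(1, std::min(coreCount, byDraws));
}

// hardware_concurrency() reports logical cores and may return 0 when the
// value is unknown, so fall back to a single thread in that case.
std::size_t DetectedCoreCount()
{
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : static_cast<std::size_t>(n);
}
```

For example, RecordingThreadCount(2500, 8) gives 3 threads even on an 8-core machine, because the draw count is the tighter limit.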
Posted December 8, 2017 by MJP (Moderator)

We have a work-stealing task scheduler that spawns 1 thread for every core on the CPU (minus 1 for the main thread). Then we create a bunch of tasks for groups of draw calls, and throw them at the task scheduler. We've tried both 1 thread per logical core (Intel CPUs with hyperthreading have 2 logical cores for every physical core) as well as 1 thread per physical core, and we've generally found that trying to run our task scheduler threads on both logical cores is somewhat counterproductive. But your mileage may vary. AMD has some code here that can show you how to query the relevant CPU information.

Writing your own task scheduler can be quite a bit of work (especially fixing all of the bugs!), but it can also be very educational. There's a pretty good series of articles here that can get you started. There are also third-party libraries like Intel's Threading Building Blocks (which is very comprehensive, but also a bit complex and very heavyweight), or Doug Binks' enkiTS (which is simple and lightweight, but doesn't have fancier high-level features). Windows also has a built-in thread pool API, but I've never used it myself so I can't really vouch for its effectiveness in a game engine scenario.

My general advice for starting on multithreaded programming is to carefully plan out which data will be touched by each separate task. IMO the easiest (and fastest!) way to have multiple threads work effectively is to make sure that they never touch the same data, or at least do so as infrequently as possible. If you have lots of shared things it can get messy, slow, and error-prone very quickly if you have to manually wrap things in critical sections. Also keep in mind that *reading* data from multiple threads is generally fine, and it's *writing* to the same data that usually gets you in trouble. So it can help to figure out exactly which data is immutable during a particular phase of execution, and perhaps also enforce that through judicious use of the "const" keyword.
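As a minimal sketch of that advice (not taken from any engine): every worker reads the same scene through a const reference and writes only into its own output vector, so no locks are needed. DrawItem, RecordedCommand and RecordInParallel are hypothetical names, and plain std::thread stands in for a real task scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct DrawItem { int meshId; };         // hypothetical scene data, read-only while recording
struct RecordedCommand { int meshId; };  // hypothetical stand-in for a recorded draw

// Each worker reads the shared const scene and writes only to perWorker[w].
std::vector<std::vector<RecordedCommand>> RecordInParallel(
    const std::vector<DrawItem>& scene, std::size_t workerCount)
{
    if (workerCount == 0) workerCount = 1;
    // Sized up front so the outer vector never reallocates while threads run.
    std::vector<std::vector<RecordedCommand>> perWorker(workerCount);
    const std::size_t chunk = (scene.size() + workerCount - 1) / workerCount;

    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < workerCount; ++w) {
        workers.emplace_back([&scene, &perWorker, w, chunk] {
            const std::size_t begin = std::min(w * chunk, scene.size());
            const std::size_t end = std::min(begin + chunk, scene.size());
            std::vector<RecordedCommand>& out = perWorker[w];  // private to this worker
            for (std::size_t i = begin; i < end; ++i) {
                out.push_back(RecordedCommand{scene[i].meshId});
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return perWorker;  // concatenating in worker order preserves the draw order
}
```

In a D3D12 renderer the per-worker output would be a command list with its own allocator rather than a vector, but the ownership rule is the same: shared data is const, written data is private.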
Posted December 8, 2017 by lubbe75 (Member)

Thanks for the tips and the links!

After reading a bit more I get the idea that threading is mainly for recording command lists. Is this correct? Would this also include executing command lists?

Before adding threads, will I benefit anything from using multiple command lists, command allocators or command queues? I have read somewhere that using multiple command allocators can increase performance since I may not have to wait as often before recording the next frame. I guess it's a matter of experimenting with the number of allocators that would be needed in my case.

Would using multiple command lists or multiple command queues have the same effect as using multiple allocators, or will this only make sense with multi-threading?

I'm currently in a stage where my Dx9 renderer is about 20 times faster than my Dx12 renderer, so I'm guessing it's mainly multi-threading that is missing. Do you know any other obvious and common beginner mistakes when starting with Dx12?
Posted December 9, 2017 by MJP (Moderator)

Before messing around with threading, one thing you'll want to do is make sure that the CPU and GPU are working in parallel. When starting out with DX12, you'll probably have things set up like this:

Record command list for frame 0 -> submit command list for frame 0 -> wait for GPU to process frame 0 (by waiting on a fence) -> record command list for frame 1

If you do it this way the GPU will be idle while the CPU is doing work, and the CPU will be idle while the GPU is doing work. To make sure that the CPU and GPU are pipelined (both working at the same time), you need to do it like this:

Record command list for frame 0 -> submit command list for frame 0 -> record command list for frame 1 -> submit command list for frame 1 -> wait for the GPU to finish frame 0 -> record command list for frame 2

With this setup the GPU will effectively be a frame behind the CPU, but your overall throughput (framerate) will be higher since the CPU and GPU will be working concurrently instead of in lockstep. The big catch is that since the CPU is preparing the next frame while the GPU is actively processing commands, you need to be careful not to modify things that the GPU is reading from. This is where the "multiple command allocators" thing comes in: if you switch back and forth between two allocators, you'll always be modifying one command allocator while the GPU is reading from the other one. The same concept applies to things like constant buffers that are written to by the CPU.

Once you've got that working, you can look into splitting things up into multiple command lists that are recorded by multiple threads. Without multiple threads there's no reason to have more than 1 command list unless you're also submitting to multiple queues. Multi-queue is quite complicated, and is definitely an advanced topic. COPY queues are generally useful for initializing resources like textures. COMPUTE queues can be useful for GPUs that support concurrently processing compute commands alongside graphics commands, which can result in higher overall throughput in certain scenarios. They can also be useful for cases where the compute work is completely independent of your graphics work, and therefore doesn't need to be synchronized with your graphics commands.
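To put rough numbers on the lockstep versus pipelined difference, here is a small worked example. It assumes an idealized model that is not from the post: a fixed CPU cost and a fixed GPU cost per frame, with no other stalls. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Lockstep: the CPU waits for the GPU before recording the next frame,
// so each frame costs the CPU time plus the GPU time.
double LockstepFrameMs(double cpuMs, double gpuMs)
{
    return cpuMs + gpuMs;
}

// Pipelined: CPU and GPU overlap, so in steady state the frame time is
// set by whichever of the two is slower.
double PipelinedFrameMs(double cpuMs, double gpuMs)
{
    return std::max(cpuMs, gpuMs);
}
```

With 6 ms of CPU work and 10 ms of GPU work per frame, the lockstep loop takes 16 ms per frame (62.5 fps) while the pipelined loop takes 10 ms (100 fps), at the cost of the GPU running one frame behind the CPU.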
Posted December 9, 2017 by Infinisearch (Member)

On December 8, 2017 at 4:13 AM, lubbe75 said: "After reading a bit more I get the idea that threading is mainly for recording command lists. Is this correct? Would this also include executing command lists? Before adding threads, will I benefit anything from using multiple command lists, command allocators or command queues?"

Read through this document; it should answer your questions. https://developer.nvidia.com/sites/default/files/akamai/gameworks/blog/GDC16/GDC16_gthomas_adunn_Practical_DX12.pdf
Posted December 11, 2017 by lubbe75 (Member)

Thanks for that link, Infinisearch!

MJP, I have tried what you suggested, but I got poorer results compared to the straightforward 1-allocator method. Here is what I tried:

After initializing, setting frameIndex to 0 and resetting commandList with allocator 0, I run the following loop (pseudo-code):

    populate commandList;
    execute commandList;
    reset commandList (using allocator[frameIndex]);
    present the frame;
    frameIndex = swapChain.CurrentBackBufferIndex; // 0 -> 1, 1 -> 0
    if (frameIndex == 1) {
        // set the fence after frame 0, 2, 4, 6, 8, ...
        commandQueue.Signal(fence, fenceValue);
    } else {
        // wait for the fence after frame 1, 3, 5, 7, 9, ...
        int currentFence = fenceValue;
        fenceValue++;
        if (fence.CompletedValue < currentFence) {
            fence.SetEventOnCompletion(currentFence, fenceEvent.SafeWaitHandle.DangerousGetHandle());
            fenceEvent.WaitOne();
        }
    }

Have I understood the idea correctly (I think I have)? Perhaps something here gets done in the wrong order?
Posted December 12, 2017 by MJP (Moderator)

That's not quite what I meant. You'll still want to signal your fence and wait on it every frame, you just need to wait on the value one frame later. The first frame you don't need to wait because there was no "previous" frame, but you do need to wait for every frame after that. Here's what my code looks like, minus a few things that aren't relevant:

    void EndFrame(IDXGISwapChain4* swapChain, uint32 syncIntervals)
    {
        DXCall(CmdList->Close());

        ID3D12CommandList* commandLists[] = { CmdList };
        GfxQueue->ExecuteCommandLists(ArraySize_(commandLists), commandLists);

        // Present the frame.
        DXCall(swapChain->Present(syncIntervals, syncIntervals == 0 ? DXGI_PRESENT_ALL

        ++CurrentCPUFrame;

        // Signal the fence with the current frame number, so that we can check back o
        FrameFence.Signal(GfxQueue, CurrentCPUFrame);

        // Wait for the GPU to catch up before we stomp an executing command buffer
        const uint64 gpuLag = DX12::CurrentCPUFrame - DX12::CurrentGPUFrame;
        Assert_(gpuLag <= DX12::RenderLatency);
        if(gpuLag >= DX12::RenderLatency)
        {
            // Make sure that the previous frame is finished
            FrameFence.Wait(DX12::CurrentGPUFrame + 1);
            ++DX12::CurrentGPUFrame;
        }

        CurrFrameIdx = DX12::CurrentCPUFrame % NumCmdAllocators;

        // Prepare the command buffers to be used for the next frame
        DXCall(CmdAllocators[CurrFrameIdx]->Reset());
        DXCall(CmdList->Reset(CmdAllocators[CurrFrameIdx], nullptr));
    }
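The bookkeeping in that function can be checked without a GPU. The sketch below is a single-threaded model, not real D3D12 code: MockGpu, PacingStats and SimulateFrames are hypothetical names, and the model pessimistically assumes the GPU only finishes a frame when the CPU waits on it. It tracks how far the CPU gets ahead and whether an allocator is ever reset while its frame could still be executing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a command queue plus fence; no D3D12 calls are made.
struct MockGpu {
    std::uint64_t signaledFrame = 0;   // highest frame number submitted so far
    std::uint64_t completedFrame = 0;  // highest frame number the "GPU" has finished

    void Signal(std::uint64_t frame) { signaledFrame = frame; }

    // Models a blocking fence wait: returns once `frame` has completed.
    void WaitFor(std::uint64_t frame) {
        assert(frame <= signaledFrame);  // waiting on an unsubmitted frame would hang
        completedFrame = std::max(completedFrame, frame);
    }
};

struct PacingStats {
    std::uint64_t maxLag = 0;             // largest CPU lead seen, in frames
    bool resetInFlightAllocator = false;  // true if an allocator was reused too early
};

PacingStats SimulateFrames(std::uint64_t frameCount,
                           std::uint64_t renderLatency,
                           std::uint64_t numAllocators)
{
    assert(renderLatency >= 1 && numAllocators >= 1);
    MockGpu gpu;
    PacingStats stats;
    std::uint64_t cpuFrame = 0;
    std::uint64_t gpuFrame = 0;
    // lastUse[i] = fence value of the last frame recorded into allocator i.
    std::vector<std::uint64_t> lastUse(numAllocators, 0);
    lastUse[0] = 1;  // the first frame is recorded into allocator 0

    for (std::uint64_t i = 0; i < frameCount; ++i) {
        ++cpuFrame;
        gpu.Signal(cpuFrame);  // submit the frame, then signal the fence with its number

        const std::uint64_t lag = cpuFrame - gpuFrame;
        stats.maxLag = std::max(stats.maxLag, lag);
        if (lag >= renderLatency) {
            gpu.WaitFor(gpuFrame + 1);  // block until the oldest in-flight frame is done
            ++gpuFrame;
        }

        // About to reset this allocator: it must not belong to an unfinished frame.
        const std::uint64_t idx = cpuFrame % numAllocators;
        if (lastUse[idx] > gpu.completedFrame) stats.resetInFlightAllocator = true;
        lastUse[idx] = cpuFrame + 1;  // the next frame records into this allocator
    }
    return stats;
}
```

In this model, a latency of 2 with 2 allocators never resets an allocator that is still in flight, while a latency of 2 with a single allocator does so on the very first frame. That matches the earlier advice to alternate between two allocators.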
Posted December 12, 2017 by Infinisearch (Member)

On December 11, 2017 at 6:03 PM, MJP said: "That's not quite what I meant. You'll still want to signal your fence and wait on it every frame, you just need to wait on the value one frame later. The first frame you don't need to wait because there was no "previous" frame, but you do need to wait for every frame after that. Here's what my code looks like, minus a few things that aren't relevant:"

MJP, I didn't look at the linked code, but do you do anything for frame pacing in the full code? I see that gamers on the internet complain about frame pacing quite a lot when they seem to perceive issues with it. Your code snippet above would render a certain number of frames on the CPU as fast as possible and then wait for the GPU to catch up. Wouldn't this lead to jerkiness in the input sampling and simulation? Would you just add some timer code to the above to delay the next iteration of the game loop if necessary? Or is it more complex?
Posted December 12, 2017 by MJP (Moderator)

The code that I posted will let the CPU get no more than 1 frame ahead of the GPU. After the CPU submits command lists to the direct queue, it waits for the previous GPU frame to finish. So if the GPU is taking more time to complete a frame than the CPU is (or if VSYNC is enabled), the CPU will be effectively throttled by the fence and will stay tied to the GPU's effective framerate.
In my experience, frame pacing issues usually come from situations where the time delta being used for updating the game's simulation doesn't match the rate at which frames are actually presented on the screen. This can happen very easily if you use the length of the previous frame as your delta for the next frame. When you do this, you're basically saying "I expect the next frame to take just as long to update and render as the previous frame". This assumption will hold when you're locked at a steady framerate (usually due to VSYNC), but if your framerate is erratic then you will likely have mismatches between your simulation time delta and the actual frame time. It can be especially bad when missing VSYNC, since your frame times may go from 16.6ms up to 33.3ms, and perhaps oscillate back and forth. I would probably suggest the following for mitigating this issue:

1. Enable VSYNC, and never miss a frame! This will give you 100% smooth results, but obviously it's much easier said than done.
2. Detect when you're not making VSYNC, and increase the sync interval to 2. This will effectively halve your framerate (for instance, you'll go from 60Hz to 30Hz on a 60Hz display), but that may be preferable to "mostly" making full framerate with frequent dips.
3. Alternatively, disable VSYNC when you're not quite making it. This is common on consoles, where you have the ability to do this much better than you do on PC. It's good for when you're just barely missing your VSYNC rate, since in that case most of the screen will still get updated at full rate (however there will be a horizontal tear line). It will also keep you from dropping to half the VSYNC rate, which will reduce the error in your time delta assumption.
4. Triple buffering can also give you similar results to disabling VSYNC, but also prevent tearing (note that non-fullscreen D3D apps on Windows are effectively triple-buffered by default since they go through the desktop compositor).
5. You could also try filtering your time deltas a bit to keep them from getting too erratic when you don't make VSYNC. I've never tried this myself, but it's possible that having more consistent but smaller errors in your time delta is better than less frequent but larger errors.

Hopefully someone else can chime in with more thoughts if they have experience with this. I haven't really done any specific research or experimentation with this issue outside of making games feel good when they ship, so don't consider me an authority on this issue.
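For the delta-filtering idea in item 5, one possible filter is an exponential moving average. This is an assumption chosen for illustration, since the post does not name a specific filter; DeltaFilter and its smoothing parameter are hypothetical.

```cpp
#include <cassert>

// Hypothetical exponential moving average over frame time deltas.
// smoothing is in (0, 1]: 1 means no filtering, smaller values react more slowly.
class DeltaFilter {
public:
    explicit DeltaFilter(double smoothing) : smoothing_(smoothing) {}

    // Feed the measured delta of the last frame; returns the filtered delta
    // to use for the next simulation step.
    double Update(double rawDelta)
    {
        if (!initialized_) {
            filtered_ = rawDelta;  // seed with the first sample
            initialized_ = true;
        } else {
            filtered_ += smoothing_ * (rawDelta - filtered_);
        }
        return filtered_;
    }

private:
    double smoothing_;
    double filtered_ = 0.0;
    bool initialized_ = false;
};
```

With a smoothing factor of 0.5 and raw deltas of 16, 32, 16 ms, the filtered deltas are 16, 24, 20 ms, so a single missed VSYNC produces a smaller but longer-lasting error. Whether that feels better in practice would need testing.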