Recently I have been interested in working with multiple windows in OpenGL. In most cases this also means working with multiple OpenGL contexts and, on occasion, threads. This “Fact Sheet” is meant to be a summary of my research on this topic to help me keep it all straight (it’s a complex topic).
- You can change which OS window a GL context is bound to. Note, however, that this first involves unbinding it from the window it was originally attached to, and it assumes that both windows use the same pixel format. This isn’t much of an issue these days, as modern OSs have pretty much standardized on 24-bit RGB for their pixel format; you aren’t likely to come across a system that mixes them anymore.
- There are hardware/vendor specific OpenGL extensions which allow you to specify which card to use for a given GL context and, in some cases, allow movement between cards. These calls are often limited to specific high end cards (e.g. the NVIDIA Quadro cards).
- Each application thread can only have one OpenGL context bound at any given time. You can change which GL Context is active for a thread at any time using an OS specific call.
- In most cases dividing OpenGL rendering across multiple threads will not result in any performance improvement (due to the pipeline nature of OpenGL). You can however achieve significant performance improvements by using a second thread for data streaming (see the Performance section below).
It is possible to share information between different OpenGL contexts, subject to some restrictions.
- Sharing can only occur within the same OpenGL implementation, i.e. you cannot share between a software renderer, an ATI hardware renderer (ATI card) and an NVIDIA hardware renderer (NVIDIA card), etc. Each of these uses different code to implement OpenGL (i.e. different OpenGL.dll files, or whatever the equivalent is on your platform of choice). Even if they implement the same version of OpenGL, they are still separate and can’t share.
- You can share data between different OpenGL Contexts, even if these contexts are bound to different GPUs (once again assuming they use the same OpenGL implementation/Drivers). Some things to note:
- This is done using OS-specific calls: on Windows you use the wglShareLists() function, and Mac OS X and Linux X11 have their own methods as well. The wglShareLists() function shares data between the two contexts you pass to it.
- If you have multiple separate cards then the data gets copied once to each card as they each have different address spaces and it is the only way to “share” the data. This can significantly slow down sending data to the GPUs.
- There is no “primary” context. By that I mean that no individual context controls/owns the data being shared; ownership is shared along with the data. For example, ordinarily when you close an OpenGL context it will clean up after itself. However, when sharing between contexts the shared data will only be cleaned up after all contexts that use it are destroyed. So you can do something like: create context one, load a texture with it, create context two and set it to share with context one, delete context one, then use the texture in context two.
- In general all ‘Data’ objects are shared between OpenGL contexts, including but not limited to:
- Vertex Buffer Objects (VBOs), i.e. vertex data
- Index Buffer Objects (IBOs), i.e. indices
- Shader Programs* (see below)
- Pixel Buffer Objects (PBOs)
- Render Buffers
- Sampler Objects

*One thing to note with shader programs is that their state is context independent. This means that any uniforms you set for that shader keep their values even after switching contexts, e.g.

    Bind Context 1
    Bind Shader 1          // now available for use in context 1
    Bind Model Matrix 1    // bound to shader 1
    Bind Texture 1         // bound to shader 1
    Bind Context 2
    Bind Shader 1          // now available for use in context 2,
                           // note that it still has Model
                           // Matrix 1 and Texture 1 bound to it!!

Depending on how you configure your render system this may cause issues. Also note that the currently bound texture is just another uniform attached to the shader (strictly, the sampler uniform holds a texture unit index, and which texture is bound to that unit is tracked per context).
- In contrast ‘State’ objects are not shared between contexts, including but not limited to:
- Vertex Array Objects (VAOs)
- Framebuffer Objects (FBOs)
- In my reading I have seen it recommended that you create all your contexts and set up sharing before you start sending data to the GPU(s). This is what I’ve done to date in my testing, so I have no idea what happens if you don’t do this.
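The “no primary context” rule above can be pictured with a toy ownership model. This is a minimal sketch in plain C++, not OpenGL code: SharedObjects, Context, createContextSharingWith and the other names are hypothetical stand-ins, and std::shared_ptr plays the role of the driver keeping shared data alive until the last sharing context is destroyed.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for the objects a share group holds (texture id -> label).
struct SharedObjects {
    std::map<unsigned, std::string> textures;
};

// Toy stand-in for a GL context: it only holds a reference to its share group.
struct Context {
    std::shared_ptr<SharedObjects> shared;
};

// A fresh context starts with its own, unshared, object store.
Context createContext() {
    return Context{std::make_shared<SharedObjects>()};
}

// A context created to share with another one joins that context's store.
Context createContextSharingWith(const Context& other) {
    assert(other.shared != nullptr);
    return Context{other.shared};
}

// Destroying a context only drops its reference; the store itself is freed
// when the last context referencing it goes away.
void destroyContext(Context& context) {
    context.shared.reset();
}

// The scenario from the text: load a texture in context one, share with
// context two, delete context one, then look the texture up from context two.
bool textureSurvivesFirstContext() {
    Context one = createContext();
    one.shared->textures[1] = "brick";
    Context two = createContextSharingWith(one);
    destroyContext(one);
    return two.shared->textures.count(1) == 1;
}
```

Calling textureSurvivesFirstContext() walks through the create/share/delete sequence described above; in real code the equivalent steps would go through wglShareLists() or your platform’s counterpart.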
Assume that you are rendering the same scene, from the same viewpoint, with the same options, to the same sized buffer (i.e. same resolution) on the same GPU, and that you have set up your render pipeline to completely render the scene for each context in a serial fashion to reduce context switching. In that case the performance penalty for multi-context rendering can be estimated using the following rule of thumb:
Multi Context Performance = (Single Context Performance / Number of Contexts) - Context Switching Overhead
If any of the above assumptions are not true then f**k only knows what the performance impact is going to be. You’ll need to do some benchmarking to find out. The point is that each extra render context has a major impact on the application’s performance.
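As a worked example of the rule of thumb, here is a minimal sketch in C++. The function name estimateMultiContextFps is hypothetical, performance is assumed to be measured in frames per second, the overhead is assumed to be expressed in the same unit, and clamping the result at zero is my own addition rather than part of the rule.

```cpp
#include <cassert>

// Estimated frame rate when the same scene is rendered once per context,
// one context after another, on the same GPU.
// All three inputs are assumptions you would have to measure yourself.
double estimateMultiContextFps(double singleContextFps,
                               int contextCount,
                               double switchOverheadFps) {
    assert(contextCount > 0);
    const double estimate =
        singleContextFps / contextCount - switchOverheadFps;
    // A negative frame rate is meaningless, so clamp at zero.
    return estimate > 0.0 ? estimate : 0.0;
}
```

For instance, estimateMultiContextFps(2300.0, 2, 50.0) evaluates 2300 / 2 - 50. Treat the result as a starting point for benchmarking, not a prediction.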
It is worth noting that most OpenGL implementations and graphics drivers are single threaded (Mac OS X seems to be an exception, as there appears to be an optional multithreaded implementation you can use). This means that while it is possible and, on most modern implementations, safe to render using multiple contexts at the same time, there is very little performance improvement. In fact there is most likely a performance penalty for doing this. Why? Because the OpenGL driver queues up commands in a pipeline as it receives them. When you send it render commands from multiple contexts in an asynchronous fashion you end up with a queue like this:
    glUseProgram()     // context 1
    glActiveTexture()  // context 1
    glDrawElements()   // context 2
    glBindTexture()    // context 1
    glDrawElements()   // context 2
    glUseProgram()     // context 2
    glDrawElements()   // context 1
As you can see, each context/thread is happily doing its own thing. However, as the driver executes each command it needs to change to the appropriate context; in the case above that means 4 context switches for just 7 commands. Remember that graphics cards do not like switching contexts. This is why in most cases it is better to just render to each context sequentially once per frame, as this minimizes the context switching and results in more predictable/measurable performance.
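To make the cost of interleaving concrete, here is a minimal sketch that counts context switches in a command queue. It is plain C++ with no OpenGL calls: the names countContextSwitches, groupByContext and checkExampleQueue are hypothetical, and each queue entry is simply the id of the context that issued the command.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often consecutive commands belong to different contexts.
std::size_t countContextSwitches(const std::vector<int>& queue) {
    std::size_t switches = 0;
    for (std::size_t i = 1; i < queue.size(); ++i) {
        if (queue[i] != queue[i - 1]) {
            ++switches;
        }
    }
    return switches;
}

// Models "render each context sequentially": all commands of one context
// are issued before the next context starts. stable_sort keeps the order
// of commands within a context unchanged.
std::vector<int> groupByContext(std::vector<int> queue) {
    std::stable_sort(queue.begin(), queue.end());
    return queue;
}

// The seven-command queue from the text, interleaved and then grouped.
void checkExampleQueue() {
    const std::vector<int> interleaved = {1, 1, 2, 1, 2, 2, 1};
    assert(countContextSwitches(interleaved) == 4u);
    assert(countContextSwitches(groupByContext(interleaved)) == 1u);
}
```

With the same seven commands, grouping by context drops the switch count from four to one, which is the whole argument for rendering each context in turn.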
I’ve tried rendering using multiple contexts on multiple threads (based on this). Without synchronising the threads, the demo went from running at ~2300 frames per second on a single thread to ~60 frames per second when multi-threaded. It also caused the NVIDIA drivers to crash in some instances due to race conditions, so not only is it a lot slower, it is also unstable.
When synchronising the threads so that only one of them can render at any given time I experienced a more modest performance reduction of 18%. By synchronising the threads most (if not all) of the extra cost associated with the OpenGL context switching is removed, leaving only the additional overhead of syncing the threads.
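The synchronised variant can be sketched as follows. This is a simulation in standard C++ only, not the code from my demo: RenderLog, renderLoop and the other names are hypothetical, and pushing a context id into a vector stands in for issuing a GL command. The point is that a mutex held for the whole frame keeps each context’s commands contiguous in the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Shared "driver queue": records which context issued each command.
struct RenderLog {
    std::mutex renderMutex;     // only one thread may render at a time
    std::vector<int> commands;  // context id per issued command
};

// One render thread. The lock is held for a whole frame, so a frame's
// commands can never interleave with another context's commands.
void renderLoop(RenderLog& log, int contextId, int frameCount,
                int commandsPerFrame) {
    for (int frame = 0; frame < frameCount; ++frame) {
        std::lock_guard<std::mutex> lock(log.renderMutex);
        // Real code would make the context current here...
        for (int c = 0; c < commandsPerFrame; ++c) {
            log.commands.push_back(contextId);
        }
        // ...and swap buffers before the lock is released.
    }
}

// Runs two render threads (contexts 1 and 2) and returns the command log.
std::vector<int> renderOnTwoThreads(int frameCount, int commandsPerFrame) {
    RenderLog log;
    std::thread first(renderLoop, std::ref(log), 1, frameCount,
                      commandsPerFrame);
    std::thread second(renderLoop, std::ref(log), 2, frameCount,
                       commandsPerFrame);
    first.join();
    second.join();
    return log.commands;
}

// True if every block of commandsPerFrame commands came from one context.
bool framesAreContiguous(const std::vector<int>& commands,
                         int commandsPerFrame) {
    assert(commandsPerFrame > 0);
    const std::size_t step = static_cast<std::size_t>(commandsPerFrame);
    if (commands.size() % step != 0) {
        return false;
    }
    for (std::size_t start = 0; start < commands.size(); start += step) {
        for (std::size_t i = 1; i < step; ++i) {
            if (commands[start + i] != commands[start]) {
                return false;
            }
        }
    }
    return true;
}
```

The order in which the two threads get the lock is still up to the scheduler, so the frames of the two contexts may alternate unevenly; only the commands within a frame are guaranteed to stay together.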
Given the above performance, it is not recommended that you try rendering using multiple OpenGL contexts on multiple threads. A better use of a second OpenGL context is for data streaming, i.e. moving data to/from the GPU. In one case I have seen the use of a PBO + second context/thread achieve up to a 300% performance improvement in rendering 1080p video when compared to a single context/thread without using a PBO to manage the texture data.
I have uploaded a small demo project to GitHub; it is based on my GLFW3 tutorial (as discussed above). If you wish to play with the demo I suggest starting on line 75 of ThreadingDemo.cpp (through to about line 90). By toggling the g_bDoWork variable you can control whether or not “work” is simulated. Below this you’ll find three loops. The first is a single-threaded loop (both windows render on the same thread). The second is the bad/unstable loop (I don’t recommend running this). The third is the good multi-threaded loop (and is ‘on’ by default). I’ve simply commented out all but one loop; change which one is active to see the different results.
BTW I would not recommend using this as a “best practice” example as to how to render from multiple threads.
(Or some other interesting information on this topic)
Info on Context Destruction in an OO setting (has implications for multi Context environments too)
An article on using multi GPU and OpenGL Contexts to render to multiple monitors
A FAQ on parallel OpenGL Programming
OpenGL and Multithreading
A tutorial on using a second OpenGL context for texture streaming.
The story of multi-monitor rendering (mainly about DirectX but there’s some interesting stuff on OpenGL here too; also I think the Win7 bug discussed is still in Win8).
5. http://www.opengl.org/registry/doc/glspec44.core.pdf (Chapter 5, p. 47)