GLSL backend #15

Open
dhewg opened this issue Jun 30, 2012 · 95 comments

Comments

@dhewg
Member

dhewg commented Jun 30, 2012

There are two GLSL branches floating around:

  1. https://git.iodoom.org/~raynorpat/iodoom3/raynorpats-glsl_iodoom3/commits/master
  2. a continuation of it: https://github.com/LogicalError/doom3.gpl/commits/master

Both are based on different trees, and I merged those a while back on top of my tree:
http://static.hackmii.com/dhewg/0001-Add-GLSL-backend.patch

My lack of GL foo is disturbing, but maybe someone wants to finish this backend?

@andre-d
Contributor

andre-d commented Jul 4, 2012

I would like to do it next time I have time, might pick it up this evening.

@andre-d
Contributor

andre-d commented Jul 5, 2012

Any idea what's not working or not finished in it? I played with it and it seemed to work just fine.

@andre-d
Contributor

andre-d commented Jul 5, 2012

Alright, I have identified some differences/flaws which I want to correct. (I first updated to his latest versions of the shaders, which your patch was missing.)

  1. Specularity looks sketchy compared to arb2

  2. Optimize, optimize, optimize

  3. Custom render passes

  4. Hook up with a high resolution timer to tell how long renders are taking
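
For item 4, a minimal sketch of a CPU-side timer, assuming C++11 and `std::chrono::steady_clock`. `TimeMilliseconds` is a hypothetical helper, not something from the Doom 3 source. Since GL calls are queued asynchronously, this only measures submission time unless the timed callback ends with `glFinish()`; GPU timer queries would be the more accurate alternative.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not part of the Doom 3 source): runs `work` once
// and returns the elapsed wall-clock time in milliseconds.
double TimeMilliseconds(const std::function<void()> &work) {
    // steady_clock is monotonic, so the result cannot go negative
    // if the system clock is adjusted while rendering.
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

It could wrap a single backend call, e.g. `TimeMilliseconds([] { RB_ARB2_DrawInteractions(); })`, and print the result once per frame.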

@andre-d
Contributor

andre-d commented Jul 5, 2012

Btw the commit is at 2bfe28dcb2c2c00732d03d50a523023e6b55428d (With the modifications)

@scaronni

Here's a newly released branch with both GLSL and ES 2.0 renderer:

https://github.com/omcfadde/idtech4/commits/es2

@omcfadde

FYI, my tree has been renamed and moved to https://github.com/omcfadde/dante

The GLSL backend is implemented, with some minor features still missing (e.g. shaders for heatHaze, heatHazeWithMask, etc.). There may also be other bugs. Hopefully I will have time to fix them soon.

@tapir

tapir commented Sep 20, 2012

Is there someone working on merging dante's modifications?

@motorsep

Not that I know of. Plus what's there to merge? It's not 100% working yet.

@andre-d
Contributor

andre-d commented Sep 20, 2012

heathaze and multitexture and such will need to be implemented for that to happen

@motorsep

Why does one need a GLSL backend, precisely? ARB shaders can be made using RenderMonkey and FX Composer afaik. Whether it's ARB or GLSL, one still needs to know what to do and how to do it.

@andre-d
Contributor

andre-d commented Sep 20, 2012

ARB is outdated and basically coding assembly, GLSL is more like coding C.

@motorsep

Oh, I am aware of that. But isn't it irrelevant when using the above-mentioned tools for shader creation?

@andre-d
Contributor

andre-d commented Sep 20, 2012

Even if you use some kind of shader compiler to compile down to ARB, ARB is still limited in terms of available features and limits. FX Composer can be used to create Cg shaders, which can then be compiled down to ARB only if you stick to the feature set available in ARB. GLSL is the more modern solution, even if some shader languages still compile down to ARB. And even then you are coding in Cg, not GLSL: two totally separate shader languages.

@omcfadde

It's possible for the GLSL compiler to generate much more efficient code for whichever hardware you happen to be running. While both ARB programs and GLSL shaders may be compiled to the same intermediate representation internally, GLSL is able to provide significantly more context information, which the compiler can use to generate more efficient code.

Many so-called high-level optimizations that are possible when compiling from GLSL are simply not possible, or are prohibitively difficult, with lower-level languages (or intermediate representations thereof).

You are comparing apples and oranges when talking about GLSL shaders vs. ARB fragment and vertex programs, and I generally do not like the "C vs. assembly" comparison, but in this case the example is quite apt; the C compiler will eventually use the assembler, but not before it's had a chance to make many optimization passes at the high level. The assembler may then perform further low-level optimization on the IR before finally emitting machine code.

andre-d, heat haze will be coming; I have some work on a local branch, but it's not release ready yet.

@nbohr1more

RaynorPat has a new branch with Mh's VBO Cache and GLSL ported from BFG:

https://github.com/raynorpat/morpheus

@motorsep

Did you see that his fork is no longer on GitHub?

@nbohr1more

This one is:

https://github.com/raynorpat/Doom3

however, it seems that he also has a deferred shading branch

https://github.com/raynorpat/dhewm3

too.

@DanielGibson
Member

Just a note here: if this (or any other new rendering-backend) is gonna happen eventually, I'd like

  • rendering backends in DLLs so they can be switched without recompiling (ok, this is not a super-hard requirement)
  • automatic translation from ARB shader code to GLSL - I think this is the only feasible way to support Mods, that often also have their own shaders (this is a hard requirement)
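
A rough sketch of the first point, assuming a small abstract interface that every backend implements. All names here (`RenderBackend`, `Arb2Backend`, `GlslBackend`, `CreateBackend`) are hypothetical and not from the dhewm3 source. In a real DLL setup each shared library would export one factory function and the engine would load it at runtime; here the factories live in an in-process table so the selection logic can be shown on its own.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical interface that every rendering backend would implement.
class RenderBackend {
public:
    virtual ~RenderBackend() {}
    virtual const char *Name() const = 0;
    virtual void DrawInteractions() = 0;
};

class Arb2Backend : public RenderBackend {
public:
    const char *Name() const override { return "arb2"; }
    void DrawInteractions() override { /* ARB program path would go here */ }
};

class GlslBackend : public RenderBackend {
public:
    const char *Name() const override { return "glsl"; }
    void DrawInteractions() override { /* GLSL path would go here */ }
};

// In the DLL variant, each library would export one function of this type.
typedef std::unique_ptr<RenderBackend> (*BackendFactory)();

static std::unique_ptr<RenderBackend> MakeArb2() {
    return std::unique_ptr<RenderBackend>(new Arb2Backend);
}

static std::unique_ptr<RenderBackend> MakeGlsl() {
    return std::unique_ptr<RenderBackend>(new GlslBackend);
}

// Picks a backend by name at runtime; returns nullptr for unknown names.
std::unique_ptr<RenderBackend> CreateBackend(const std::string &name) {
    static const std::map<std::string, BackendFactory> factories = {
        {"arb2", &MakeArb2},
        {"glsl", &MakeGlsl},
    };
    const auto it = factories.find(name);
    if (it == factories.end()) {
        return nullptr;
    }
    return it->second();
}
```

The name passed to `CreateBackend` could come from a cvar, so switching backends would need a renderer restart but no recompile.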

@coreyoconnor
Contributor

@DanielGibson on the second point: translation of ARB to GLSL is an interesting problem on its own. Are you aware of any existing tooling that does this?

I searched for a bit and only found people asking the same.

@DanielGibson
Member

I'm not, but it doesn't seem like a super hard problem to me?
ARB assembly seems to be pretty limited (even more so when assuming that the relevant shaders only use the core instructions and not NVIDIA's extensions), so it'd probably be possible to just translate it line by line to equivalent GLSL (and then add some "static", most probably game-specific, GLSL code before and possibly after that for passing in/out vertex data and uniforms).
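
To make the line-by-line idea concrete, here is a minimal sketch covering only a handful of core opcodes (MOV, ADD, SUB, MUL, MAD, DP3). It assumes every register is a plain name like `R0` that the surrounding static GLSL declares as a `vec4`; bindings such as `fragment.texcoord[0]`, literal constants, `_SAT` modifiers and texture instructions are not handled. `Trim`, `Source` and `TranslateLine` are made-up names for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Strips leading and trailing spaces and tabs.
static std::string Trim(const std::string &s) {
    const std::string::size_type b = s.find_first_not_of(" \t");
    if (b == std::string::npos) {
        return "";
    }
    const std::string::size_type e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Turns an ARB source operand into a GLSL vec4 expression. A one-component
// ARB swizzle (".x") replicates that component to all four lanes, so it is
// widened to ".xxxx"; negated operands are wrapped in parentheses.
static std::string Source(const std::string &op) {
    std::string expr = op;
    const std::string::size_type dot = op.rfind('.');
    if (dot != std::string::npos && op.size() - dot == 2) {
        expr = op.substr(0, dot + 1) + std::string(4, op[dot + 1]);
    }
    return expr[0] == '-' ? "(" + expr + ")" : expr;
}

// Translates one ARB instruction into one GLSL statement.
std::string TranslateLine(const std::string &line) {
    std::string text = Trim(line);
    if (!text.empty() && text[text.size() - 1] == ';') {
        text.erase(text.size() - 1);
    }
    const std::string::size_type space = text.find_first_of(" \t");
    if (space == std::string::npos) {
        throw std::runtime_error("malformed instruction: " + line);
    }
    const std::string opcode = text.substr(0, space);

    // operands[0] is the destination, the rest are sources
    std::vector<std::string> operands;
    std::stringstream rest(text.substr(space + 1));
    std::string item;
    while (std::getline(rest, item, ',')) {
        operands.push_back(Trim(item));
    }

    std::size_t sources = 0;
    if (opcode == "MOV") {
        sources = 1;
    } else if (opcode == "ADD" || opcode == "SUB" || opcode == "MUL" || opcode == "DP3") {
        sources = 2;
    } else if (opcode == "MAD") {
        sources = 3;
    } else {
        throw std::runtime_error("unsupported opcode: " + opcode);
    }
    if (operands.size() != sources + 1) {
        throw std::runtime_error("wrong operand count: " + line);
    }

    std::vector<std::string> s;
    for (std::size_t i = 1; i < operands.size(); ++i) {
        s.push_back(Source(operands[i]));
    }

    std::string expr;
    if (opcode == "MOV") {
        expr = s[0];
    } else if (opcode == "ADD") {
        expr = s[0] + " + " + s[1];
    } else if (opcode == "SUB") {
        expr = s[0] + " - " + s[1];
    } else if (opcode == "MUL") {
        expr = s[0] + " * " + s[1];
    } else if (opcode == "DP3") {
        // the scalar dot product is replicated to all four lanes
        expr = "vec4(dot(" + s[0] + ".xyz, " + s[1] + ".xyz))";
    } else {
        expr = s[0] + " * " + s[1] + " + " + s[2];  // MAD
    }

    // split the destination into register name and write mask
    std::string dst = operands[0];
    std::string mask = "xyzw";
    const std::string::size_type dot = dst.find('.');
    if (dot != std::string::npos) {
        mask = dst.substr(dot + 1);
        dst = dst.substr(0, dot);
    }
    return dst + "." + mask + " = (" + expr + ")." + mask + ";";
}
```

For example, `TranslateLine("MAD R0.xyz, R1, R2.x, -R3;")` returns `R0.xyz = (R1 * R2.xxxx + (-R3)).xyz;`. Applying the write mask to both sides keeps the statement type-correct in GLSL for any mask, at the cost of some redundant swizzles that the GLSL compiler should be able to drop.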

One problem seems to be that finding documentation and tutorials/examples on ARB assembly is kinda hard; here are a few sources I found when researching this a while ago though:

Despite doing this (minimal) research I haven't looked into how to actually do this though, so if you or someone else wants to do it, that'd be great! :-)
Probably a good starting point on the C++ side of things is an existing "modern" OpenGL renderer that uses GLSL (with rewritten shaders), I'd probably use the one from D3Wasm (https://github.com/gabrielcuvillier/d3wasm).

@motorsep

motorsep commented May 13, 2019

Thread with beard... @coreyoconnor Why not just use the renderer from Doom 3 BFG for dhewm3?

@raynorpat
Contributor

@motorsep that doesn't really bring the compatibility piece for mods still using custom ARB shaders

@motorsep

> @motorsep that doesn't really bring the compatibility piece for mods still using custom ARB shaders

What if someone writes a parser to convert ARB to GLSL on the fly? :)

@coreyoconnor
Contributor

@motorsep

Interesting project, right? A C++ library providing source-to-source translation of ARB to GLSL. Would be useful outside of Doom 3 as well, I bet.

@turol
Contributor

turol commented May 14, 2019

The specs for ARB_vertex_program and ARB_fragment_program contain a grammar of the language.

@revelator

revelator commented Aug 9, 2019

If you can live with only the interaction shader being drawn with GLSL, this might interest you.

/*
===========================================================================

Doom 3 GPL Source Code
Copyright (C) 1999-2011 id Software LLC, a ZeniMax Media company.

This file is part of the Doom 3 GPL Source Code ("Doom 3 Source Code").

Doom 3 Source Code is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

Doom 3 Source Code is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with Doom 3 Source Code.  If not, see <http://www.gnu.org/licenses/>.

In addition, the Doom 3 Source Code is also subject to certain additional terms.
You should have received a copy of these additional terms immediately following
the terms and conditions of the GNU General Public License which accompanied the
Doom 3 Source Code.  If not, please request a copy in writing from id Software
at the address below.

If you have questions concerning this license or the applicable additional terms,
you may contact in writing id Software LLC, c/o ZeniMax Media Inc., Suite 120,
Rockville, Maryland 20850 USA.

===========================================================================
*/

#include "precompiled.h"
#include "tr_local.h"

/*
===========================================================================

DEFAULT GLSL SHADER

===========================================================================
*/
#define GLSL_VERSION_ATTRIBS \
	"#version 130\n"

#define GLSL_INPUT_ATTRIBS \
	"in vec4 attrTexCoords;\n" \
	"in vec3 attrTangents0;\n" \
	"in vec3 attrTangents1;\n" \
	"in vec3 attrNormal;\n" \
	"mat3x3 u_lightMatrix = mat3x3 (attrTangents0, attrTangents1, attrNormal);\n\n"

#define GLSL_UNIFORMS \
	"uniform vec4 u_light_origin;\n" \
	"uniform vec4 u_view_origin;\n" \
	"uniform vec4 u_color_modulate;\n" \
	"uniform vec4 u_color_add;\n" \
	"uniform mat2x4 u_diffMatrix;\n" \
	"uniform mat2x4 u_bumpMatrix;\n" \
	"uniform mat2x4 u_specMatrix;\n" \
	"uniform mat4x4 u_projMatrix;\n" \
	"uniform mat4x4 u_fallMatrix;\n" \
	"uniform sampler2D bumpImage;\n" \
	"uniform sampler2D lightFalloffImage;\n" \
	"uniform sampler2D lightProjectImage;\n" \
	"uniform sampler2D diffuseImage;\n" \
	"uniform sampler2D specularImage;\n" \
	"uniform vec4 u_constant_diffuse;\n" \
	"uniform vec4 u_constant_specular;\n\n"

#define GLSL_VARYINGS \
	"varying vec2 diffCoords;\n" \
	"varying vec2 bumpCoords;\n" \
	"varying vec2 specCoords;\n" \
	"varying vec4 projCoords;\n" \
	"varying vec4 fallCoords;\n" \
	"varying vec3 lightDir;\n" \
	"varying vec3 halfAngle;\n" \
	"varying vec4 Color;\n"

// these are our GLSL interaction shaders
#define interaction_vs \
	GLSL_VERSION_ATTRIBS \
	GLSL_INPUT_ATTRIBS \
	GLSL_UNIFORMS \
	GLSL_VARYINGS \
	"void main ()\n" \
	"{\n" \
	"	// we must use ftransform as Doom 3 needs invariant position\n" \
	"	gl_Position = ftransform ();\n" \
	"\n" \
	"	diffCoords = attrTexCoords * u_diffMatrix;\n" \
	"	bumpCoords = attrTexCoords * u_bumpMatrix;\n" \
	"	specCoords = attrTexCoords * u_specMatrix;\n" \
	"\n" \
	"	projCoords = gl_Vertex * u_projMatrix;\n" \
	"	fallCoords = gl_Vertex * u_fallMatrix;\n" \
	"\n" \
	"	Color = (gl_Color * u_color_modulate) + u_color_add;\n" \
	"\n" \
	"	vec3 OffsetViewOrigin = (u_view_origin - gl_Vertex).xyz;\n" \
	"	vec3 OffsetLightOrigin = (u_light_origin - gl_Vertex).xyz;\n" \
	"\n" \
	"	lightDir = OffsetLightOrigin * u_lightMatrix;\n" \
	"	halfAngle = (normalize (OffsetViewOrigin) + normalize (OffsetLightOrigin)) * u_lightMatrix;\n" \
	"}\n\n"

#define interaction_fs \
	GLSL_VERSION_ATTRIBS \
	GLSL_UNIFORMS \
	GLSL_VARYINGS \
	"void main ()\n" \
	"{\n" \
	"	vec3 normalMap = texture2D (bumpImage, bumpCoords).agb * 2.0 - 1.0;\n" \
	"	vec4 lightMap = texture2DProj (lightProjectImage, projCoords);\n" \
	"\n" \
	"	lightMap *= dot (normalize (lightDir), normalMap);\n" \
	"	lightMap *= texture2DProj (lightFalloffImage, fallCoords);\n" \
	"	lightMap *= Color;\n" \
	"\n" \
	"	vec4 diffuseMap = texture2D (diffuseImage, diffCoords) * u_constant_diffuse;\n" \
	"	float specularComponent = clamp ((dot (normalize (halfAngle), normalMap) - 0.75) * 4.0, 0.0, 1.0);\n" \
	"\n" \
	"	vec4 specularResult = u_constant_specular * (specularComponent * specularComponent);\n" \
	"	vec4 specularMap = texture2D (specularImage, specCoords) * 2.0;\n" \
	"\n" \
	"	gl_FragColor = (diffuseMap + (specularResult * specularMap)) * lightMap;\n" \
	"}\n\n"

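/* 0 is OpenGL's "no program" object name. BFG had this set to a negative value, which an unsigned GLuint cannot hold. */
static const GLuint INVALID_PROGRAM = 0x00000000;

/* NOTE: the uniform handles below reuse INVALID_PROGRAM as a placeholder, but 0 is a valid uniform location;
   glGetUniformLocation() returns a GLint and reports "not found" as -1. */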
static GLuint u_light_origin = INVALID_PROGRAM;
static GLuint u_view_origin = INVALID_PROGRAM;

static GLuint u_color_modulate = INVALID_PROGRAM;
static GLuint u_color_add = INVALID_PROGRAM;

static GLuint u_constant_diffuse = INVALID_PROGRAM;
static GLuint u_constant_specular = INVALID_PROGRAM;

static GLuint u_diffMatrix = INVALID_PROGRAM;
static GLuint u_bumpMatrix = INVALID_PROGRAM;
static GLuint u_specMatrix = INVALID_PROGRAM;

static GLuint u_projMatrix = INVALID_PROGRAM;
static GLuint u_fallMatrix = INVALID_PROGRAM;

static GLuint rb_glsl_interaction_program = INVALID_PROGRAM;

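/*
==================
RB_GLSL_MakeMatrix

Packs up to four 4-float rows into a shared static buffer and returns it.
The result is only valid until the next call and is not thread safe; rows
passed as NULL keep whatever the previous call left there.
==================
*/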
static float *RB_GLSL_MakeMatrix( const float *in1 = 0, const float *in2 = 0, const float *in3 = 0, const float *in4 = 0 )
{
    static float m[16];

    if( in1 )
    {
        SIMDProcessor->Memcpy( &m[0], in1, sizeof( float ) * 4 );
    }

    if( in2 )
    {
        SIMDProcessor->Memcpy( &m[4], in2, sizeof( float ) * 4 );
    }

    if( in3 )
    {
        SIMDProcessor->Memcpy( &m[8], in3, sizeof( float ) * 4 );
    }

    if( in4 )
    {
        SIMDProcessor->Memcpy( &m[12], in4, sizeof( float ) * 4 );
    }
    return m;
}

/* Calculate matrix offsets */
#define DIFFMATRIX( ofs ) din->diffuseMatrix[ofs].ToFloatPtr ()
#define BUMPMATRIX( ofs ) din->bumpMatrix[ofs].ToFloatPtr ()
#define SPECMATRIX( ofs ) din->specularMatrix[ofs].ToFloatPtr ()
#define PROJMATRIX( ofs ) din->lightProjection[ofs].ToFloatPtr ()

/*
=========================================================================================

GENERAL INTERACTION RENDERING

=========================================================================================
*/

/*
==================
RB_ARB2_BindTexture
==================
*/
void RB_ARB2_BindTexture( int unit, idImage *tex )
{
    backEnd.glState.currenttmu = unit;
    glActiveTextureARB( GL_TEXTURE0_ARB + unit );
    tex->BindFragment();
}

/*
==================
RB_ARB2_UnbindTexture
==================
*/
void RB_ARB2_UnbindTexture( int unit )
{
    backEnd.glState.currenttmu = unit;
    glActiveTextureARB( GL_TEXTURE0_ARB + unit );
    globalImages->BindNull();
}

/*
==================
RB_ARB2_BindInteractionTextureSet
==================
*/
void RB_ARB2_BindInteractionTextureSet( const drawInteraction_t *din )
{
    // texture 1 will be the per-surface bump map
    RB_ARB2_BindTexture( 1, din->bumpImage );

    // texture 2 will be the light falloff texture
    RB_ARB2_BindTexture( 2, din->lightFalloffImage );

    // texture 3 will be the light projection texture
    RB_ARB2_BindTexture( 3, din->lightImage );

    // texture 4 is the per-surface diffuse map
    RB_ARB2_BindTexture( 4, din->diffuseImage );

    // texture 5 is the per-surface specular map
    RB_ARB2_BindTexture( 5, din->specularImage );
}

/*
==================
RB_GLSL_DrawInteraction
==================
*/
static void RB_GLSL_DrawInteraction( const drawInteraction_t *din )
{
    /* Half Lambertian constants */
    static const float whalf[] = { 0.0f, 0.0f, 0.0f, 0.5f };
    static const float wzero[] = { 0.0f, 0.0f, 0.0f, 0.0f };
    static const float wone[] = { 0.0f, 0.0f, 0.0f, 1.0f };

    // load all the vertex program parameters
    glUniform4fv( u_light_origin, 1, din->localLightOrigin.ToFloatPtr() );
    glUniform4fv( u_view_origin, 1, din->localViewOrigin.ToFloatPtr() );

    glUniformMatrix2x4fv( u_diffMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( DIFFMATRIX( 0 ), DIFFMATRIX( 1 ) ) );
    glUniformMatrix2x4fv( u_bumpMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( BUMPMATRIX( 0 ), BUMPMATRIX( 1 ) ) );
    glUniformMatrix2x4fv( u_specMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( SPECMATRIX( 0 ), SPECMATRIX( 1 ) ) );

    glUniformMatrix4fv( u_projMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( PROJMATRIX( 0 ), PROJMATRIX( 1 ), wzero, PROJMATRIX( 2 ) ) );
    glUniformMatrix4fv( u_fallMatrix, 1, GL_FALSE, RB_GLSL_MakeMatrix( PROJMATRIX( 3 ), whalf, wzero, wone ) );

    /* Lambertian constants */
    static const float zero[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    static const float one[4] = { 1.0f, 1.0f, 1.0f, 1.0f };
    static const float negOne[4] = { -1.0f, -1.0f, -1.0f, -1.0f };

    switch( din->vertexColor )
    {
    case SVC_IGNORE:
        glUniform4fv( u_color_modulate, 1, zero );
        glUniform4fv( u_color_add, 1, one );
        break;

    case SVC_MODULATE:
        glUniform4fv( u_color_modulate, 1, one );
        glUniform4fv( u_color_add, 1, zero );
        break;

    case SVC_INVERSE_MODULATE:
        glUniform4fv( u_color_modulate, 1, negOne );
        glUniform4fv( u_color_add, 1, one );
        break;
    }

    // set the constant colors
    glUniform4fv( u_constant_diffuse, 1, din->diffuseColor.ToFloatPtr() );
    glUniform4fv( u_constant_specular, 1, din->specularColor.ToFloatPtr() );

    // set the textures
    RB_ARB2_BindInteractionTextureSet( din );

    // draw it
    RB_DrawElementsWithCounters( din->surf->geo );
}

/*
==================
RB_ARB2_DrawInteraction
==================
*/
void RB_ARB2_DrawInteraction( const drawInteraction_t *din )
{
    // load all the vertex program parameters
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_ORIGIN, din->localLightOrigin.ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_VIEW_ORIGIN, din->localViewOrigin.ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_PROJECT_S, din->lightProjection[0].ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_PROJECT_T, din->lightProjection[1].ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_PROJECT_Q, din->lightProjection[2].ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_LIGHT_FALLOFF_S, din->lightProjection[3].ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_BUMP_MATRIX_S, din->bumpMatrix[0].ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_BUMP_MATRIX_T, din->bumpMatrix[1].ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_DIFFUSE_MATRIX_S, din->diffuseMatrix[0].ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_DIFFUSE_MATRIX_T, din->diffuseMatrix[1].ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_SPECULAR_MATRIX_S, din->specularMatrix[0].ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_SPECULAR_MATRIX_T, din->specularMatrix[1].ToFloatPtr() );

    // testing fragment based normal mapping
    if( r_testARBProgram.GetBool() )
    {
        glProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 2, din->localLightOrigin.ToFloatPtr() );
        glProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 3, din->localViewOrigin.ToFloatPtr() );
    }
    static const float zero[4] = { 0, 0, 0, 0 };
    static const float one[4] = { 1, 1, 1, 1 };
    static const float negOne[4] = { -1, -1, -1, -1 };

    switch( din->vertexColor )
    {
    case SVC_IGNORE:
        glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_MODULATE, zero );
        glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_ADD, one );
        break;

    case SVC_MODULATE:
        glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_MODULATE, one );
        glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_ADD, zero );
        break;

    case SVC_INVERSE_MODULATE:
        glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_MODULATE, negOne );
        glProgramEnvParameter4fvARB( GL_VERTEX_PROGRAM_ARB, PP_COLOR_ADD, one );
        break;
    }

    // set the constant colors
    glProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 0, din->diffuseColor.ToFloatPtr() );
    glProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 1, din->specularColor.ToFloatPtr() );

    // set the textures
    RB_ARB2_BindInteractionTextureSet( din );

    // draw it
    RB_DrawElementsWithCounters( din->surf->geo );
}

/*
=============
RB_ARB2_SharedSurfaceSetup
=============
*/
void RB_ARB2_SharedSurfaceSetup( const drawSurf_t *surf )
{
    // set the vertex pointers
    idDrawVert *ac = ( idDrawVert * ) vertexCache.Position( surf->geo->ambientCache );
    glColorPointer( 4, GL_UNSIGNED_BYTE, sizeof( idDrawVert ), ac->color );
    glVertexAttribPointerARB( 11, 3, GL_FLOAT, false, sizeof( idDrawVert ), ac->normal.ToFloatPtr() );
    glVertexAttribPointerARB( 10, 3, GL_FLOAT, false, sizeof( idDrawVert ), ac->tangents[1].ToFloatPtr() );
    glVertexAttribPointerARB( 9, 3, GL_FLOAT, false, sizeof( idDrawVert ), ac->tangents[0].ToFloatPtr() );
    glVertexAttribPointerARB( 8, 2, GL_FLOAT, false, sizeof( idDrawVert ), ac->st.ToFloatPtr() );
    glVertexPointer( 3, GL_FLOAT, sizeof( idDrawVert ), ac->xyz.ToFloatPtr() );
}


/*
=============
RB_ARB2_CreateDrawInteractions
=============
*/
void RB_ARB2_CreateDrawInteractions( const drawSurf_t *surf )
{
    if( !surf )
    {
        return;
    }

    // perform setup here that will be constant for all interactions
    GL_State( GLS_SRCBLEND_ONE | GLS_DSTBLEND_ONE | GLS_DEPTHMASK | backEnd.depthFunc );

    // enable the vertex arrays
    glEnableVertexAttribArrayARB( 8 );
    glEnableVertexAttribArrayARB( 9 );
    glEnableVertexAttribArrayARB( 10 );
    glEnableVertexAttribArrayARB( 11 );
    glEnableClientState( GL_COLOR_ARRAY );

    // check for enabled GLSL program first, if it fails go back to ARB
    if( rb_glsl_interaction_program != INVALID_PROGRAM )
    {
        // enable GLSL programs
        glUseProgram( rb_glsl_interaction_program );

        // texture 0 is the normalization cube map for the vector towards the light
        if( backEnd.vLight->lightShader->IsAmbientLight() )
        {
            RB_ARB2_BindTexture( 0, globalImages->ambientNormalMap );
        }
        else
        {
            RB_ARB2_BindTexture( 0, globalImages->normalCubeMapImage );
        }

        // no test program in GLSL renderer
        RB_ARB2_BindTexture( 6, globalImages->specularTableImage );

        for( /**/; surf; surf = surf->nextOnLight )
        {
            // perform setup here that will not change over multiple interaction passes
            RB_ARB2_SharedSurfaceSetup( surf );

            // this may cause RB_GLSL_DrawInteraction to be executed multiple
            // times with different colors and images if the surface or light have multiple layers
            RB_CreateSingleDrawInteractions( surf, RB_GLSL_DrawInteraction );
        }

        // back to fixed (or ARB program)
        glUseProgram( INVALID_PROGRAM );
    }
    else // Do it the old way
    {
        // enable ASM programs
        glEnable( GL_VERTEX_PROGRAM_ARB );
        glEnable( GL_FRAGMENT_PROGRAM_ARB );

        // texture 0 is the normalization cube map for the vector towards the light
        if( backEnd.vLight->lightShader->IsAmbientLight() )
        {
            RB_ARB2_BindTexture( 0, globalImages->ambientNormalMap );
        }
        else
        {
            RB_ARB2_BindTexture( 0, globalImages->normalCubeMapImage );
        }

        // bind the vertex program
        if( r_testARBProgram.GetBool() )
        {
            RB_ARB2_BindTexture( 6, globalImages->specular2DTableImage );

            glBindProgramARB( GL_VERTEX_PROGRAM_ARB, VPROG_TEST );
            glBindProgramARB( GL_FRAGMENT_PROGRAM_ARB, FPROG_TEST );
        }
        else
        {
            RB_ARB2_BindTexture( 6, globalImages->specularTableImage );

            glBindProgramARB( GL_VERTEX_PROGRAM_ARB, VPROG_INTERACTION );
            glBindProgramARB( GL_FRAGMENT_PROGRAM_ARB, FPROG_INTERACTION );
        }

        for( /**/; surf; surf = surf->nextOnLight )
        {
            // perform setup here that will not change over multiple interaction passes
            RB_ARB2_SharedSurfaceSetup( surf );

            // this may cause RB_ARB2_DrawInteraction to be executed multiple
            // times with different colors and images if the surface or light have multiple layers
            RB_CreateSingleDrawInteractions( surf, RB_ARB2_DrawInteraction );
        }

        // need to disable ASM programs again
        glBindProgramARB( GL_VERTEX_PROGRAM_ARB, VPROG_NONE );
        glBindProgramARB( GL_FRAGMENT_PROGRAM_ARB, VPROG_NONE );

        glDisable( GL_VERTEX_PROGRAM_ARB );
        glDisable( GL_FRAGMENT_PROGRAM_ARB );
    }

    // disable vertex arrays
    glDisableVertexAttribArrayARB( 8 );
    glDisableVertexAttribArrayARB( 9 );
    glDisableVertexAttribArrayARB( 10 );
    glDisableVertexAttribArrayARB( 11 );
    glDisableClientState( GL_COLOR_ARRAY );

    // disable features
    RB_ARB2_UnbindTexture( 6 );
    RB_ARB2_UnbindTexture( 5 );
    RB_ARB2_UnbindTexture( 4 );
    RB_ARB2_UnbindTexture( 3 );
    RB_ARB2_UnbindTexture( 2 );
    RB_ARB2_UnbindTexture( 1 );

    backEnd.glState.currenttmu = -1;
    GL_SelectTexture( 0 );
}

/*
==================
RB_ARB2_InteractionPass
==================
*/
void RB_ARB2_InteractionPass( const drawSurf_t *shadowSurfs, const drawSurf_t *lightSurfs )
{
    // these are always enabled since we do not yet use GLSL shaders for the shadows.
    glEnable( GL_VERTEX_PROGRAM_ARB );
    glBindProgramARB( GL_VERTEX_PROGRAM_ARB, VPROG_STENCIL_SHADOW );

    // save on state changes by not bothering to setup/takedown all the messy states when there are no surfs to draw
    if( shadowSurfs )
    {
        RB_StencilShadowPass( shadowSurfs );
    }

    if( lightSurfs )
    {
        RB_ARB2_CreateDrawInteractions( lightSurfs );
    }

    // need to disable ASM programs again, we do not check for GLSL here since we do not use it for shadows.
    glBindProgramARB( GL_VERTEX_PROGRAM_ARB, VPROG_NONE );
    glDisable( GL_VERTEX_PROGRAM_ARB );
}

/*
==================
RB_ARB2_DrawInteractions
==================
*/
void RB_ARB2_DrawInteractions( void )
{
    viewLight_t	*vLight;

    GL_SelectTexture( 0 );

    // ensure that no GLSL program is bound when coming in here.
    glUseProgram( INVALID_PROGRAM );

    // for each light, perform adding and shadowing
    for( vLight = backEnd.viewDef->viewLights; vLight; vLight = vLight->next )
    {
        backEnd.vLight = vLight;

        // do fogging later
        if( vLight->lightShader->IsFogLight() )
        {
            continue;
        }

        if( vLight->lightShader->IsBlendLight() )
        {
            continue;
        }

        // nothing to see here; these aren't the surfaces you're looking for; move along
        if( !vLight->localInteractions &&
            !vLight->globalInteractions &&
            !vLight->translucentInteractions )
        {
            continue;
        }

        // clear the stencil buffer if needed
        if( vLight->globalShadows || vLight->localShadows )
        {
            backEnd.currentScissor = vLight->scissorRect;

            if( r_useScissor.GetBool() )
            {
                glScissor( backEnd.viewDef->viewport.x1 + backEnd.currentScissor.x1,
                           backEnd.viewDef->viewport.y1 + backEnd.currentScissor.y1,
                           backEnd.currentScissor.x2 + 1 - backEnd.currentScissor.x1,
                           backEnd.currentScissor.y2 + 1 - backEnd.currentScissor.y1 );
            }
            glClear( GL_STENCIL_BUFFER_BIT );
        }
        else
        {
            // no shadows, so no need to read or write the stencil buffer
            // we might in theory want to use GL_ALWAYS instead of disabling
            // completely, to satisfy the invariance rules
            glStencilFunc( GL_ALWAYS, 128, 255 );
        }

        // run our passes for global and local
        RB_ARB2_InteractionPass( vLight->globalShadows, vLight->localInteractions );
        RB_ARB2_InteractionPass( vLight->localShadows, vLight->globalInteractions );

        // translucent surfaces never get stencil shadowed
        if( r_skipTranslucent.GetBool() )
        {
            continue;
        }
        glStencilFunc( GL_ALWAYS, 128, 255 );

        backEnd.depthFunc = GLS_DEPTHFUNC_LESS;
        RB_ARB2_CreateDrawInteractions( vLight->translucentInteractions );
        backEnd.depthFunc = GLS_DEPTHFUNC_EQUAL;
    }

    // disable stencil shadow test
    glStencilFunc( GL_ALWAYS, 128, 255 );

    GL_SelectTexture( 0 );
}

//===================================================================================

typedef struct
{
    GLenum			target;
    GLuint			ident;
    char			name[64];
} progDef_t;

static	const int	MAX_GLPROGS = 256;

// a single file can have both a vertex program and a fragment program
// removed old invalid shaders, ARB2 is default nowadays and we override the interaction shaders with GLSL anyway if available.
static progDef_t	progs[MAX_GLPROGS] =
{
    {GL_VERTEX_PROGRAM_ARB,   VPROG_TEST, "test.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_TEST, "test.vfp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_INTERACTION, "interaction.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_INTERACTION, "interaction.vfp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_BUMPY_ENVIRONMENT, "bumpyEnvironment.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_BUMPY_ENVIRONMENT, "bumpyEnvironment.vfp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_AMBIENT, "ambientLight.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_AMBIENT, "ambientLight.vfp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_STENCIL_SHADOW, "shadow.vp"},
    {GL_VERTEX_PROGRAM_ARB,   VPROG_ENVIRONMENT, "environment.vfp"},
    {GL_FRAGMENT_PROGRAM_ARB, FPROG_ENVIRONMENT, "environment.vfp"},
    // additional programs can be dynamically specified in materials
};

/*
=================
R_LoadARBProgram
=================
*/
void R_LoadARBProgram( int progIndex )
{
    int		ofs;
    int		err;
    idStr	fullPath = "glprogs/";
    fullPath += progs[progIndex].name;
    char	*fileBuffer;
    char	*buffer;
    char	*start = NULL, *end;	// was '\0', which is not a valid null pointer constant since C++11

    common->Printf( "%s", fullPath.c_str() );

    // load the program even if we don't support it, so
    // fs_copyfiles can generate cross-platform data dumps
    fileSystem->ReadFile( fullPath.c_str(), ( void ** ) &fileBuffer, NULL );

    if( !fileBuffer )
    {
        common->Printf( ": File not found\n" );
        return;
    }

    // copy to stack memory and free
    buffer = static_cast<char *>( _alloca( strlen( fileBuffer ) + 1 ) );
    strcpy( buffer, fileBuffer );
    fileSystem->FreeFile( fileBuffer );

    if( !glConfig.isInitialized )
    {
        return;
    }

    // submit the program string at start to GL
    if( progs[progIndex].ident == 0 )
    {
        // allocate a new identifier for this program
        progs[progIndex].ident = PROG_USER + progIndex;
    }

    // vertex and fragment programs can both be present in a single file, so
    // scan for the proper header to be the start point, and stamp a 0 in after the end
    if( progs[progIndex].target == GL_VERTEX_PROGRAM_ARB )
    {
        if( !glConfig.ARBVertexProgramAvailable )
        {
            common->Printf( ": GL_VERTEX_PROGRAM_ARB not available\n" );
            return;
        }
        start = strstr( ( char * ) buffer, "!!ARBvp" );
    }

    if( progs[progIndex].target == GL_FRAGMENT_PROGRAM_ARB )
    {
        if( !glConfig.ARBFragmentProgramAvailable )
        {
            common->Printf( ": GL_FRAGMENT_PROGRAM_ARB not available\n" );
            return;
        }
        start = strstr( ( char * ) buffer, "!!ARBfp" );
    }

    if( !start )
    {
        common->Printf( ": !!ARB not found\n" );
        return;
    }
    end = strstr( start, "END" );

    if( !end )
    {
        common->Printf( ": END not found\n" );
        return;
    }
    end[3] = 0;

    glBindProgramARB( progs[progIndex].target, progs[progIndex].ident );
    glGetError();

    glProgramStringARB( progs[progIndex].target, GL_PROGRAM_FORMAT_ASCII_ARB, strlen( start ), ( unsigned char * ) start );

    err = glGetError();
    glGetIntegerv( GL_PROGRAM_ERROR_POSITION_ARB, ( GLint * ) &ofs );

    if( err == GL_INVALID_OPERATION )
    {
        const GLubyte *str = glGetString( GL_PROGRAM_ERROR_STRING_ARB );
        common->DWarning( "\nGL_PROGRAM_ERROR_STRING_ARB: %s\n", str );

        if( ofs < 0 )
        {
            common->DWarning( "GL_PROGRAM_ERROR_POSITION_ARB < 0 with error\n" );
        }
        else if( ofs >= ( int ) strlen( ( char * ) start ) )
        {
            common->DWarning( "error at end of program\n" );
        }
        else
        {
            common->DWarning( "error at %i:\n%s", ofs, start + ofs );
        }
        return;
    }

    if( ofs != -1 )
    {
        common->DWarning( "\nGL_PROGRAM_ERROR_POSITION_ARB != -1 without error\n" );
        return;
    }
    common->Printf( "\n" );

    // need to strip the extension.
    fullPath.StripFileExtension();

    // output separated fragment / vertex shaders.
    if( r_printGLProgs.GetBool() )
    {
        if( progs[progIndex].target == GL_FRAGMENT_PROGRAM_ARB )
        {
            fileSystem->WriteFile( ( fullPath + ".fp" ).c_str(), start, strlen( start ) );
        }
        else
        {
            fileSystem->WriteFile( ( fullPath + ".vp" ).c_str(), start, strlen( start ) );
        }
    }
}

/*
==================
R_FindARBProgram

Returns a GL identifier that can be bound to the given target, parsing
a text file if it hasn't already been loaded.
==================
*/
int R_FindARBProgram( GLenum target, const char *program )
{
    int		i;
    idStr	stripped = program;

    stripped.StripFileExtension();

    // see if it is already loaded
    for( i = 0; progs[i].name[0]; i++ )
    {
        if( progs[i].target != target )
        {
            continue;
        }
        idStr	compare = progs[i].name;
        compare.StripFileExtension();

        if( !idStr::Icmp( stripped.c_str(), compare.c_str() ) )
        {
            return progs[i].ident;
        }
    }

    if( i == MAX_GLPROGS )
    {
        common->Error( "R_FindARBProgram: MAX_GLPROGS" );
    }

    // add it to the list and load it
    progs[i].ident = ( program_t ) 0;	// will be gen'd by R_LoadARBProgram
    progs[i].target = target;
    strncpy( progs[i].name, program, sizeof( progs[i].name ) - 1 );
    progs[i].name[sizeof( progs[i].name ) - 1] = '\0';	// strncpy does not terminate on truncation

    R_LoadARBProgram( i );

    common->Printf( "Finding program %s\n", program );
    return progs[i].ident;
}

/*
==================
GL_GetShaderInfoLog
==================
*/
static void GL_GetShaderInfoLog( GLuint s, const GLchar *src, bool isprog )
{
    static GLchar	infolog[4096];
    GLsizei			outlen = 0;

    infolog[0] = 0;

    if( isprog )
    {
        glGetProgramInfoLog( s, 4095, &outlen, infolog );
    }
    else
    {
        glGetShaderInfoLog( s, 4095, &outlen, infolog );
    }
    common->Warning( "Shader Source:\n\n%s\n\n%s\n\n", src, infolog );
}

/*
==================
GL_CompileShader
==================
*/
static bool GL_CompileShader( GLuint sh, GLchar *src )
{
    if( sh && src )
    {
        GLint result = GL_FALSE;

        glGetError();

        glShaderSource( sh, 1, ( const GLchar ** )&src, NULL );
        glCompileShader( sh );
        glGetShaderiv( sh, GL_COMPILE_STATUS, &result );

        if( result != GL_TRUE )
        {
            GL_GetShaderInfoLog( sh, src, false );
            return false;
        }
        else if( glGetError() != GL_NO_ERROR )
        {
            GL_GetShaderInfoLog( sh, src, false );
        }
    }
    return true;
}

//===================================================================================

struct glsltable_t
{
    GLuint slot;
    GLchar *name;
};

// Doom actually emulates immediate mode functions with quite a few of the vertex attrib calls, e.g. glVertex3f = attrPosition or glColor3/4f = attrColor.
// This is also the reason our first attempts at replacing them with vertex array pointers failed:
// those index positions are not declared in the shader at all.
// The commented-out entries below are the ones missing from the shaders;
// I only left them in in case someone wants to make an effort in that regard.
glsltable_t interactionAttribs[] =
{
    /*{0, "attrPosition"},	// does not exist in shader
    {2, "attrNormal"},		// ditto and we have two normal indexes (one is used to get texture coordinates for skyportals)
    {3, "attrColor"},*/		// sigh...
    { 8, "attrTexCoords" },
    { 9, "attrTangents0" },
    { 10, "attrTangents1" },
    { 11, "attrNormal" }
};

/*
==================
GL_CreateGLSLProgram

Checks and creates shader programs for GLSL
Modified to throw invalid program if something fails.
==================
*/
static GLuint GL_CreateGLSLProgram( GLchar *vssrc, GLchar *fssrc, glsltable_t *attribs, GLuint numattribs )
{
    GLuint	progid;
    GLuint	vs = vssrc ? glCreateShader( GL_VERTEX_SHADER ) : INVALID_PROGRAM;
    GLuint	fs = fssrc ? glCreateShader( GL_FRAGMENT_SHADER ) : INVALID_PROGRAM;

    glGetError();

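    // vertex shader failed to compile
    if( vs && vssrc && !GL_CompileShader( vs, vssrc ) )
    {
        // release both shader objects so a failed compile does not leak them
        glDeleteShader( vs );
        glDeleteShader( fs );
        return INVALID_PROGRAM;
    }

    // fragment shader failed to compile
    if( fs && fssrc && !GL_CompileShader( fs, fssrc ) )
    {
        glDeleteShader( vs );
        glDeleteShader( fs );
        return INVALID_PROGRAM;
    }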
    progid = glCreateProgram();

    if( vs && vssrc )
    {
        glAttachShader( progid, vs );
    }

    if( fs && fssrc )
    {
        glAttachShader( progid, fs );
    }

    // bind attrib index numbers
    // we could actually bind the emulated ones here as well and then vertex attribs should work.
    if( attribs && numattribs )
    {
        for( GLuint i = 0; i < numattribs; i++ )
        {
            glBindAttribLocation( progid, attribs[i].slot, attribs[i].name );
        }
    }
    GLint result = GL_FALSE;

    glLinkProgram( progid );
    glGetProgramiv( progid, GL_LINK_STATUS, &result );

    glDeleteShader( vs );
    glDeleteShader( fs );

    if( result != GL_TRUE )
    {
        GL_GetShaderInfoLog( progid, "", true );
        // do not leak the program object when linking fails
        glDeleteProgram( progid );
        return INVALID_PROGRAM;
    }
    return progid;
}

//===================================================================================

struct sampleruniforms_t
{
    GLchar	*name;
    GLint	binding;
};

sampleruniforms_t rb_interactionsamplers[] =
{
    { "bumpImage", 1 },
    { "lightFalloffImage", 2 },
    { "lightProjectImage", 3 },
    { "diffuseImage", 4 },
    { "specularImage", 5 }
};

/*
==================
GL_SetupSamplerUniforms
==================
*/
static void GL_SetupSamplerUniforms( GLuint progid, sampleruniforms_t *uniForms, GLuint numUniforms )
{
    // setup uniform locations - this is needed even on nvidia
    glUseProgram( progid );

    for( GLuint i = 0; i < numUniforms; i++ )
    {
        glUniform1i( glGetUniformLocation( progid, uniForms[i].name ), uniForms[i].binding );
    }
}

/*
==================
GL_GetGLSLFromFile
==================
*/
static GLchar *GL_GetGLSLFromFile( const GLchar *name )
{
    idStr	fullPath = "glprogs130/";
    fullPath += name;
    GLchar	*fileBuffer;
    GLchar	*buffer;

    if( !glConfig.isInitialized )
    {
        return NULL;
    }
    common->Printf( "%s", fullPath.c_str() );

    fileSystem->ReadFile( fullPath.c_str(), reinterpret_cast<void **>( &fileBuffer ), NULL );

    if( !fileBuffer )
    {
        common->Printf( ": File not found, using internal shaders\n" );
        return NULL;
    }

    // copy to heap memory; the caller releases it with Mem_Free
    buffer = reinterpret_cast<char *>( Mem_Alloc( strlen( fileBuffer ) + 1 ) );
    strcpy( buffer, fileBuffer );
    fileSystem->FreeFile( fileBuffer );

    common->Printf( "\n" );

    return buffer;
}

/*
==================
R_ReloadARBPrograms_f
==================
*/
void R_ReloadARBPrograms_f( const idCmdArgs &args )
{
    common->Printf( "----- R_ReloadARBPrograms -----\n" );

    for( int i = 0; progs[i].name[0]; i++ )
    {
        R_LoadARBProgram( i );
    }

    // load GLSL interaction programs if enabled
    if( glConfig.ARBShadingLanguageAvailable )
    {
        common->Printf( "----- Using GLSL interactions (forced) -----\n" );

        // glDeleteProgram only flags the program for deletion while it is still in use;
        // a value of 0 is silently ignored.
        glDeleteProgram( rb_glsl_interaction_program );

        // try to load from file, use internal shader if not available.
        GLchar *vs = GL_GetGLSLFromFile( "interaction_vs.glsl" );
        GLchar *fs = GL_GetGLSLFromFile( "interaction_fs.glsl" );

        // replace ARB interaction shaders with GLSL counterparts, it is possible to use external GLSL shaders as well.
        rb_glsl_interaction_program = GL_CreateGLSLProgram( ( vs != NULL ) ? vs : interaction_vs, ( fs != NULL ) ? fs : interaction_fs, interactionAttribs, sizeof( interactionAttribs ) / sizeof( interactionAttribs[0] ) );

        // free externally loaded vertex shader.
        if( vs != NULL )
        {
            Mem_Free( vs );
        }

        // free externally loaded fragment shader.
        if( fs != NULL )
        {
            Mem_Free( fs );
        }

        // if the shader did not run into problems load it up.
        if( rb_glsl_interaction_program != INVALID_PROGRAM )
        {
            // made sure shaders are valid coming in here
            GL_SetupSamplerUniforms( rb_glsl_interaction_program, rb_interactionsamplers, sizeof( rb_interactionsamplers ) / sizeof( rb_interactionsamplers[0] ) );

            // set shader uniforms
            u_light_origin = glGetUniformLocation( rb_glsl_interaction_program, "u_light_origin" );
            u_view_origin = glGetUniformLocation( rb_glsl_interaction_program, "u_view_origin" );

            u_color_modulate = glGetUniformLocation( rb_glsl_interaction_program, "u_color_modulate" );
            u_color_add = glGetUniformLocation( rb_glsl_interaction_program, "u_color_add" );

            u_constant_diffuse = glGetUniformLocation( rb_glsl_interaction_program, "u_constant_diffuse" );
            u_constant_specular = glGetUniformLocation( rb_glsl_interaction_program, "u_constant_specular" );

            u_diffMatrix = glGetUniformLocation( rb_glsl_interaction_program, "u_diffMatrix" );
            u_bumpMatrix = glGetUniformLocation( rb_glsl_interaction_program, "u_bumpMatrix" );
            u_specMatrix = glGetUniformLocation( rb_glsl_interaction_program, "u_specMatrix" );

            u_projMatrix = glGetUniformLocation( rb_glsl_interaction_program, "u_projMatrix" );
            u_fallMatrix = glGetUniformLocation( rb_glsl_interaction_program, "u_fallMatrix" );
        }
        glUseProgram( INVALID_PROGRAM );
    }
    common->Printf( "-------------------------------\n" );
}

/*
==================
R_ARB2_Init
==================
*/
void R_ARB2_Init( void )
{
    glConfig.allowARB2Path = false;

    common->Printf( "---------- R_ARB2_Init ----------\n" );

    if( !glConfig.ARBVertexProgramAvailable || !glConfig.ARBFragmentProgramAvailable )
    {
        common->DWarning( "Not available.\n" );
        return;
    }
    common->Printf( "Available.\n" );
    common->Printf( "---------------------------------\n" );

    glConfig.allowARB2Path = true;
}

This will override the ARB2 assembly shaders with GLSL if your card supports it.
The benefit is that we can still use things like sikkmod (tested), and the internal GLSL interaction shader looks better than sikkmod's.
One downside, though, is that we cannot use sikkpin's parallax shader unless we create a cvar for turning off GLSL completely, since sikkpin's parallax shader is an interaction shader.

This code was originally by Michael Hiney, but it had a few bugs that I have been working over the years to iron out. Since I'm not very well versed in shaders it took longer than I expected, but it works rather well now.

The next thing would be GLSL for the stencil shadows, though if we still want to be able to use things like sikkmod's soft shadows, a cvar to turn it off will be necessary.
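
The on/off switch described above could be sketched roughly as follows. This is a standalone illustration rather than engine code: `BackendCaps`, `ChooseInteractionPath` and the `useGLSL` flag are hypothetical names standing in for `glConfig`, the draw-time check and the cvar, and an invalid program id is assumed to be 0.

```cpp
// Hypothetical stand-ins for engine state; none of these names exist in the engine.
enum InteractionPath { PATH_ARB2, PATH_GLSL };

struct BackendCaps
{
    bool         glslAvailable;   // assumed mirror of glConfig.ARBShadingLanguageAvailable
    unsigned int glslProgram;     // assumed mirror of rb_glsl_interaction_program, 0 = invalid
};

// Pick GLSL only when the user asked for it, the driver supports it,
// and the program actually compiled and linked; otherwise fall back to ARB2.
InteractionPath ChooseInteractionPath( bool useGLSL, const BackendCaps &caps )
{
    if( useGLSL && caps.glslAvailable && caps.glslProgram != 0 )
    {
        return PATH_GLSL;
    }
    return PATH_ARB2;
}
```

With a check like this in the interaction draw path, mods that ship their own ARB interaction shader would keep working as soon as the user sets the cvar to 0.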

@revelator

revelator commented Aug 11, 2019

Ported my work on the hybrid GLSL / ARB2 backend and uploaded a 32-bit Windows build here.

(https://sourceforge.net/projects/cbadvanced/files/Game%20Projects/dhewm3-with-glsl-interactions.7z/download)

Not much improvement speed-wise, maybe 1 fps; it might actually gain more with things like sikkmod.
Also ported a few things from darkmod like SSE enhanced matrix operations. I still need to get the SMP
code ported, and then there is also the SSE enhanced culling code from darkmod.

R_testGLSLInteractions is on by default, but if you want to toy with seeing the differences between the two render backends just set it to 0 (you might not actually see much since it's pretty close to the original ARB2 based renderer in regards to the overall look; the GLSL backend is a bit crisper though).

@DanielGibson
Member

However, there is no Sikkmod for dhewm3 because the C++ code is not released under GPL?

@motorsep

I also want to preserve the look and feel of the game so the RBDoom3BFG renderer is not an option.

They are the same picture o.O (granted BFG had some tweaks, but there is nothing stopping you from making it look exactly like vanilla Doom 3)

@revelator

BGFX sounds like a cool project. One could also look into using Open Shading Language as used in Blender, which compiles shaders to bytecode much like GLSL is compiled into ARB assembly internally in OpenGL.

Writing a decompiler for ARB assembly will probably be tough as hell, and I suspect that is why no one has done it before.

@DanielGibson
Member

To be clear, with ARB assembly I mean these assembly-like text shaders used by Doom3, not some GPU-internal bytecode.

I don't think the converter would be super hard, at least when sticking to the "standard" ARB shader syntax of ARB_vertex_program and ARB_fragment_program (and ignoring NVIDIA's extensions like NV_gpu_program4). I gathered some information in #15 (comment)

I mean the grammar seems pretty simple and it can't do anything fancy, not even loops or if/else (unless using NVIDIA's extensions, which I hope Doom3 and the common mods don't do).
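
To give a feel for how simple the per-instruction mapping could be, here is a minimal sketch of translating a handful of ARB instructions into GLSL statements. It is an illustration under heavy assumptions, not a working converter: `TranslateARBLine` is a hypothetical name, registers are assumed to be already declared as `vec4` variables with GLSL-legal names, and write masks, swizzles, negation, literal constants, `TEMP`/`PARAM` declarations and texture instructions are all left out.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split "MAD R0, R1, R2, R3;" into the opcode followed by its operands.
static std::vector<std::string> SplitInstruction( const std::string &line )
{
    std::string cleaned;
    for( char c : line )
    {
        if( c == ';' )
        {
            break;                              // end of the instruction
        }
        cleaned += ( c == ',' ) ? ' ' : c;      // commas only separate operands here
    }
    std::vector<std::string> tokens;
    std::istringstream in( cleaned );
    std::string tok;
    while( in >> tok )
    {
        tokens.push_back( tok );
    }
    return tokens;
}

// Translate one instruction into one GLSL statement.
// Returns an empty string for anything this sketch does not handle.
std::string TranslateARBLine( const std::string &line )
{
    const std::vector<std::string> t = SplitInstruction( line );
    if( t.size() < 3 )
    {
        return "";
    }
    const std::string &op = t[0], &dst = t[1];

    if( op == "MOV" && t.size() == 3 ) return dst + " = " + t[2] + ";";
    if( op == "ADD" && t.size() == 4 ) return dst + " = " + t[2] + " + " + t[3] + ";";
    if( op == "MUL" && t.size() == 4 ) return dst + " = " + t[2] + " * " + t[3] + ";";
    if( op == "MAD" && t.size() == 5 ) return dst + " = " + t[2] + " * " + t[3] + " + " + t[4] + ";";
    // DP3 replicates the scalar result to all components of the destination
    if( op == "DP3" && t.size() == 4 ) return dst + " = vec4( dot( " + t[2] + ".xyz, " + t[3] + ".xyz ) );";
    return "";
}
```

For example, `TranslateARBLine( "MAD R0, R1, R2, R3;" )` yields `R0 = R1 * R2 + R3;`. The real work would be in the parts skipped here, mainly mapping names like `fragment.texcoord[0]` or `program.env[4]` to GLSL inputs and uniforms and honouring write masks.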

@revelator

I hope you are right :)

@DanielGibson
Member

I hope I ever have enough time to try it out =)

@revelator

Could upload my own compile to let you try it out :)
Or if you prefer, the code is pretty much plug and go: just yank the code piece above into draw_arb2.cpp, make a cvar named r_testGLSLProgram in rendersystem_init.cpp and an extern for it in tr_local.h or locally in draw_arb2.cpp, compile, and fire up Doom 3.

Test it by using reloadARBPrograms after you have enabled or disabled r_testGLSLProgram. It should look pretty much equal, but with better lighting when using the GLSL backend.

@revelator

Btw, there was once a code piece in MH's backend that would spit out ARB compatible shaders from the GLSL backend.
It was pretty simple as I seem to recall, but the other way around I'm not so sure about.
The converted GLSL shaders ran quite fine in standard Doom 3 as well.

At the moment I'm looking at replacing the old GLimp gamma correction with an internal GLSL version.
I'll leave the old one intact as a fallback in case the user has a very old card.

First test looks quite good tbh.

@revelator

Ugh, just discovered something nasty...
MSVC cannot detect CPU intrinsics at runtime, so if you actually want AVX or AVX2 optimization you have to pass it to the compiler by hand, since cmake does not have a native function for determining this either.
So you need to use /arch:AVX or /arch:AVX2 for those processors that support it, and don't enable this on those that don't or you will crash. The cpuid code in idtech4 is not enough, since even if it detects AVX or AVX2 it cannot pass these macros to the compiler.
So unless someone has some code for cmake to detect these at runtime, it will be up to the user to enable these flags.
Otherwise we would have to rewrite the AVX code as inline SIMD assembly like it is done with SSE*, and that is a pretty tall order.

@DanielGibson
Member

DanielGibson commented Sep 21, 2020

IIRC inline assembly is 32bit (x86, not x64) only in MSVC so it's (basically) useless?

Does AVX gain big speedups in the game?
Shouldn't it be enough to set /arch:AVX(2) for SIMD_AVX.cpp and SIMD_AVX2.cpp?

(BTW, what C/C++ compiler does support runtime detection and dispatching? I think GCC doesn't support it either, so you gotta do the detection and calling the right function yourself.)

@DanielGibson
Member

According to https://stackoverflow.com/a/13639476 something like this should work:

if(cpu STREQUAL "x86" OR cpu STREQUAL "amd64" OR cpu STREQUAL "x86_64")
  if(MSVC)
    set_source_files_properties(idlib/math/Simd_AVX.cpp PROPERTIES COMPILE_FLAGS /arch:AVX)
  else() # GCC/clang
    set_source_files_properties(idlib/math/Simd_AVX.cpp PROPERTIES COMPILE_FLAGS -mavx) 
  endif()
endif()

(untested; there might be better ways to match the target CPU)

@revelator

revelator commented Sep 21, 2020

Aye, you can get cmake to set the AVX flags, but if your CPU does not support it, it would crash the game :S (tried on an older Intel Core 2; if I set AVX support on that one it would crash Doom 3). So what we need cmake to do is check capabilities with cpuid and then pass the correct flags to the solution, since MSVC does not support this internally. GCC does support this with -march=native (which targets the CPU of the build machine), so there is no need there.

You can use inline assembly with x64, though some asm structures are named differently; that is why I meant that it might be quite troublesome to do.

The speed gain is not super great but noticeable on those CPUs that do support it. My guess is that TDM tried to squeeze every iota of extra speed out of idtech4 so that it would not run like total crap with all the new additions to their game.

@DanielGibson
Member

DanielGibson commented Sep 21, 2020

I don't know what you're doing with AVX or how TDM is using it (though it works fine on CPUs without AVX support so it can't be that essential).

But if you stick to how the existing Doom3 SIMD support works, there should be no crashes.
There is that abstract idSIMDProcessor class in https://github.com/dhewm/dhewm3/blob/master/neo/idlib/math/Simd.h
and generic code (class idSIMD_Generic : public idSIMDProcessor) implementing that with plain C++ in https://github.com/dhewm/dhewm3/blob/master/neo/idlib/math/Simd_Generic.h (and .cpp).
Then there's classes for the SIMD backends (MMX, SSE, ...) that derive from idSIMD_Generic or other SIMD backends (for example, idSIMD_SSE is derived from idSIMD_MMX and idSIMD_SSE2 is derived from idSIMD_SSE), see
https://github.com/dhewm/dhewm3/blob/master/neo/idlib/math/Simd_SSE.h (and .cpp) etc
Those derived classes override some of the methods of their superclasses (the generic backend or the one for MMX or SSE or whatever) with ones for their specific instructions (like SSE2).
When the game starts, idSIMD::InitProcessor() (https://github.com/dhewm/dhewm3/blob/master/neo/idlib/math/Simd.cpp#L76) checks what the user's CPU actually supports and sets the global idSIMDProcessor* SIMDProcessor to a suitable implementation. So all the code in Doom3 that wants to use SIMD optimization calls SIMDProcessor->MatX_LowerTriangularSolveTranspose() or whatever, which then uses an implementation suitable for their CPU - so if a CPU doesn't support SSE2, it gets the SSE or MMX or generic implementation.

So if you add idSIMD_AVX : public idSIMD_SSE3 that implements its methods in neo/idlib/math/Simd_AVX.cpp (and make sure to build that file with /arch:AVX or -mavx), it should work without breaking older CPUs.

@revelator

The problem on MSVC is that it sets that flag globally (at least on MSVC 2013), so all the code needs to support AVX, not just the AVX code for shadow volumes in idlib, no matter what idtech's CPUID returns.

Going to try with MSVC 2017 and see what happens if enabled with the /arch:AVX flag there.

Might also be a borked MSVC install, though I somehow doubt that.

@revelator

test.zip
Could you try this one and see what happens? It is my test engine.

@turol
Contributor

turol commented Sep 22, 2020

@DanielGibson GCC does support runtime dispatch. They call it function multiversioning. I'm not sure if it's possible to also apply different autovectorization options to different functions but I suspect it can be done with the optimization control pragmas.

@revelator

Hmm, setting the AVX flag with MSVC 2017 does indeed work even on older hardware, as opposed to the older MSVC 2013; not sure what changed in later MSVC compilers.

@DanielGibson
Member

@turol: Oh nice, I didn't know about that GCC feature!

@revelator: glad to hear it works with VS2017

I just remembered that using AVX2 can be dangerous: Several generations of Intel CPUs clock down when encountering AVX2 instructions (or, for some newer generations, "heavy" AVX2 instructions), and additionally have some kind of delay when doing the downclocking: https://gist.github.com/rygorous/32bc3ea8301dba09358fd2c64e02d774

This can significantly slow down games (both because of the delay and because the lower clock will outweigh any speedup AVX gets you, unless something like 99% of your code is optimized AVX code, which is not realistic for games).
Sounds like one must be super careful when using AVX :-/

@revelator

Knights end or AVX 512 is even worse in that regard :P

@revelator

Old AVX or AVX1 works just fine. My CPU cannot handle AVX2 so I cannot benchmark it (core i7 3930k), but the AVX1 path does give a few extra FPS.

The biggest difference came from the SMP changes in darkmod; while not giving any uber FPS boost, it certainly helped with the game feeling sluggish at times with some of the heavier mods.

I could actually keep FPS at 60 with all but soft shadows on in sikkmod at 1920x1080x32.

Sadly I haven't gotten around to porting the SMP changes to Doom3 again after github destroyed my previous working code port :S

@revelator

I'm also going to try porting BFG's multithreading to old idtech4. Should get pretty interesting, as Doom3 as is only uses 2 threads at max: one for the main engine and one for background downloads (kinda an odd place).

@DanielGibson
Member

DanielGibson commented Sep 23, 2020

Doom3 also has an extra thread for (mostly) soundmixing (this AsyncTimer() thing frequently calling common->Async()), which is a lot less useful when using OpenAL

But yeah, almost all of the work is done in the main thread.

An extra thread for background downloads totally makes sense though, so things can be downloaded in the background without blocking the mainthread

@revelator

That makes sense, but does it use that many resources?

Hmm, since it seems to have been planned by id at some point (loads of disabled code for it),
I'm going to do a test and put the render backend on its own thread.

@DanielGibson
Member

DanielGibson commented Sep 24, 2020

It's not about using CPU resources, it's about waiting for I/O

Remember that OpenGL doesn't really like threads - that's the reason they didn't put the renderer in its own thread, https://fabiensanglard.net/doom3/renderer.php has details on that

Especially https://fabiensanglard.net/doom3/interviews.php#qrenderer

Interestingly, we only just found out last year why it was problematic (the same thing applied to Rage’s r_useSMP option, which we had to disable on the PC) – on windows, OpenGL can only safely draw to a window that was created by the same thread. We created the window on the launch thread, but then did all the rendering on a separate render thread. It would be nice if doing this just failed with a clear error, but instead it works on some systems and randomly fails on others for no apparent reason.

The Doom 4 codebase now jumps through hoops to create the game window from the render thread and pump messages on it, but the better solution, which I have implemented in another project under development, is to leave the rendering on the launch thread, and run the game logic in the spawned thread.
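
The arrangement from the end of that quote (rendering stays on the launch thread, game logic runs on a spawned thread) can be sketched with standard C++ threads. This is only a toy model under assumed names: `FrameState`, `FrameQueue` and `RunFrames` are hypothetical, and the "rendering" is just consuming frame numbers where a real engine would issue its GL calls.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Whatever the game logic hands over to the renderer for one frame.
struct FrameState
{
    int frameNumber;
};

// Minimal blocking queue between the game thread and the launch thread.
class FrameQueue
{
public:
    void Push( const FrameState &frame )
    {
        {
            std::lock_guard<std::mutex> lock( mutex );
            frames.push( frame );
        }
        ready.notify_one();
    }

    FrameState Pop()
    {
        std::unique_lock<std::mutex> lock( mutex );
        ready.wait( lock, [this]() { return !frames.empty(); } );
        FrameState frame = frames.front();
        frames.pop();
        return frame;
    }

private:
    std::mutex              mutex;
    std::condition_variable ready;
    std::queue<FrameState>  frames;
};

// Runs game logic on a spawned thread and consumes ("renders") the frames on
// the calling thread. Returns how many frames arrived in the expected order.
int RunFrames( int frameCount )
{
    FrameQueue queue;

    std::thread game( [&queue, frameCount]()
    {
        for( int i = 0; i < frameCount; i++ )
        {
            queue.Push( FrameState{ i } );      // stand-in for one tick of game logic
        }
    } );

    int rendered = 0;
    for( int i = 0; i < frameCount; i++ )
    {
        const FrameState frame = queue.Pop();   // GL calls would happen here, on the launch thread
        if( frame.frameNumber == i )
        {
            rendered++;
        }
    }

    game.join();
    return rendered;
}
```

Since the window and the GL context are created and used by the same thread, this layout sidesteps the Windows restriction mentioned in the quote.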

@revelator

Hehe yeah, I read about async :) It's not even multithreading in the usual sense, as you can run any number of processes on the same core with it. Looking at BFG's code it was kinda understandable, as it takes quite a bit more code to split things between the cores.

Been a while since I toyed with Intel's tools for finding out where in the code we might get a benefit from doing that; will be interesting to see what it finds.

Read most of Fabien's docs but that one slipped past me... ugh

@DanielGibson
Member

Oh wait, "It's not about using CPU resources, it's about waiting for I/O" was about the download thread

Regarding sound mixing: I can imagine that it indeed is quite expensive when mixing lots of sound sources in software, and doing it in an extra thread wasn't that hard apparently, so I think it made sense (though the way they implemented it was pretty bad and resulted in up to 100ms delay until a sound actually started playing, which made machine guns sound like they stutter; I fixed that recently).

@revelator

Cool, I think I already stumbled upon your fix in the code.

@revelator

Ok, so async does best with I/O, while sync does best when we don't have to care about I/O, as it can run the tasks in parallel.
Damn, I have some work to do xD

@revelator

Enabling SMP also acts up in Quake 4, I noticed... things look mighty weird with it on at least.
