For all you speed freaks

  1. #1
    Advisor (Join Date: Apr 2004, Location: orlando, Posts: 608)

    For all you speed freaks

    I think we should have an optimization thread. I don't want a debate over asm optimization vs. algorithmic optimization; they both have their place (and yes, algorithmic should _always_ come first, but asm has its spot too).

    So yeah, I'm gonna start one. I'll post an SSE cross-product routine below, and I'm gonna work on a 4x4 * 4x4 matrix multiply using SSE next (god, a decent dot-product instruction would be nice... fsck Intel and their dumb choices).
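On the dot product gripe: plain SSE (SSE1) has no horizontal add, so a 4-wide dot product takes one multiply followed by two shuffle-and-add rounds to fold the lanes together. (SSE3 later added `haddps`, and SSE4.1 added `dpps`.) Here's a minimal sketch using the SSE1 intrinsics from `<xmmintrin.h>`; `dot4_sse` is a made-up name, not code from this thread.

```c
#include <xmmintrin.h> /* SSE1 intrinsics: _mm_mul_ps, _mm_shuffle_ps, ... */

/* Hypothetical helper: dot product of two 4-float vectors using SSE1 only.
 * Unaligned loads, so any float[4] works. */
static float dot4_sse(const float *a, const float *b)
{
    /* lane-wise products: (p0, p1, p2, p3) */
    __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));

    /* round 1: add neighbours -> (p0+p1, p0+p1, p2+p3, p2+p3) */
    __m128 s = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1));
    p = _mm_add_ps(p, s);

    /* round 2: add the two halves -> full sum in every lane */
    s = _mm_shuffle_ps(p, p, _MM_SHUFFLE(1, 0, 3, 2));
    p = _mm_add_ps(p, s);

    return _mm_cvtss_f32(p); /* read lane 0 */
}
```

For a `vector3` stored in a 4-float array, zero the w lane first, otherwise whatever sits there leaks into the sum.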

    If anyone else has _any_ cool optimization tricks, put 'em in here and show us what you've got (benchmark data would be cool too).

    EDIT:

    Oh yeah, feedback is welcome! (I know my sh*t could be a lot better; I'm just incompetent and still learning ;p)

  2. #2
    Advisor (Join Date: Apr 2004, Location: orlando, Posts: 608)
    I'm posting both my C code and the asm code. Both assume these types:

    Code:
    typedef float vector4[4] __attribute__ ((aligned (16)));
    #define vector3 vector4
    Benchmarking with the rdtsc instruction gives me 92 clock cycles for the C version
    and 52 clock cycles for the asm version, both inlined (the C one by hand).
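For anyone who wants to post comparable numbers, here's a rough sketch of an rdtsc-based harness, assuming GCC or Clang on x86 (`__rdtsc()` comes from `<x86intrin.h>`). `ticks_per_call` is a hypothetical name, not the harness used above. Keep in mind that rdtsc is not a serializing instruction, so one short measurement can be skewed by out-of-order execution, and on CPUs with an invariant TSC the counter ticks at a fixed rate rather than the actual core clock. Averaging over many calls helps with the first issue but not the second.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc() on GCC/Clang, x86 only */

typedef void (*bench_fn)(void);

/* Hypothetical helper: average TSC ticks per call of fn over `runs` calls
 * (runs must be > 0). Loop and indirect-call overhead is included. */
static double ticks_per_call(bench_fn fn, unsigned runs)
{
    uint64_t start, end;
    unsigned i;

    fn(); /* warm-up call so caches and branch predictors settle */

    start = __rdtsc();
    for (i = 0; i < runs; i++)
        fn();
    end = __rdtsc();

    return (double)(end - start) / (double)runs;
}
```

Wrap the routine under test in a `void (void)` function that works on global vectors, and time an empty function the same way to estimate the harness overhead.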

    Code:
    void    vector3_cross ( vector3 src, vector3 dest )
    {
            /* buf = src x dest (right-handed) */
            vector3 buf = {
                    src[1]*dest[2] - src[2]*dest[1],
                    src[2]*dest[0] - src[0]*dest[2],
                    src[0]*dest[1] - src[1]*dest[0]
            };
    
            vector3_copy(buf, dest);
    }
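Since a swapped sign in a cross product is easy to miss, it's worth running each version against a plain reference before benchmarking it. A minimal sketch, assuming the thread's convention that the routine computes `src x dest` and writes the result into `dest`; `cross_ref` and `check_cross` are hypothetical names. The scratch arrays are 16-byte aligned so a movaps-based asm version can be passed in too.

```c
#include <stddef.h>

/* The thread's convention: dest = src x dest (dest is in/out). */
typedef void (*cross_fn)(float *src, float *dest);

/* Hypothetical reference: right-handed cross product, out = a x b. */
static void cross_ref(const float *a, const float *b, float *out)
{
    out[0] = a[1] * b[2] - a[2] * b[1];
    out[1] = a[2] * b[0] - a[0] * b[2];
    out[2] = a[0] * b[1] - a[1] * b[0];
}

static float abs_diff(float x, float y)
{
    return x > y ? x - y : y - x;
}

/* Hypothetical checker: returns 1 if fn matches cross_ref on a few inputs. */
static int check_cross(cross_fn fn)
{
    static const float cases[][2][3] = {
        { { 1, 0, 0 },     { 0, 1, 0 } },   /* x cross y = z */
        { { 1, 2, 3 },     { 4, 5, 6 } },   /* = (-3, 6, -3) */
        { { 0.5f, -2, 7 }, { 3, 1, -1 } },  /* = (-5, 21.5, 6.5) */
    };
    size_t i;
    int k;

    for (i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        /* aligned like vector4, so movaps-based code is safe */
        float src[4]  __attribute__ ((aligned (16)));
        float dest[4] __attribute__ ((aligned (16)));
        float want[3];

        for (k = 0; k < 3; k++) {
            src[k]  = cases[i][0][k];
            dest[k] = cases[i][1][k];
        }
        src[3] = dest[3] = 0.0f;

        cross_ref(cases[i][0], cases[i][1], want);
        fn(src, dest);

        for (k = 0; k < 3; k++)
            if (abs_diff(dest[k], want[k]) > 1e-5f)
                return 0;
    }
    return 1;
}
```

Note that the x cross y case alone won't catch a `+` where a `-` belongs, since one of the two products is zero; the other two cases will.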
    Code:
    static float neg_one = -1.0f;
    
    ...
    
    QMATH void vector3_cross ( vector3 src, vector3 dest )
    {
            /*
             * the shufps imm value is a bit confusing.
             * it holds four 2-bit lane indices
             * (x = 0 ... w = 3), lane 0 in the low bits:
             *
             *   imm = (i3 << 6) | (i2 << 4) | (i1 << 2) | i0
             *
             * to rearrange src as  y x x w = 1 0 0 3,
             * write it in reverse: w x x y = 3 0 0 1
             *
             * then pack each pair into one hex digit
             * (first index * 4 + second index):
             *   3*4 + 0 = c
             *   0*4 + 1 = 1
             *
             * so the imm is 0xc1
             * (same as _MM_SHUFFLE(3, 0, 0, 1))
             */
            __asm__ volatile (
                    "movaps %[src], %%xmm0\n\t"
                    "movaps %[dest], %%xmm2\n\t"
                    "movaps %[src], %%xmm1\n\t"
                    "movaps %[dest], %%xmm3\n\t"
                    "shufps $0xc1, %%xmm0, %%xmm0\n\t"  /* src:  y x x w */
                    "shufps $0xda, %%xmm2, %%xmm2\n\t"  /* dest: z z y w */
                    "shufps $0xda, %%xmm1, %%xmm1\n\t"  /* src:  z z y w */
                    "shufps $0xc1, %%xmm3, %%xmm3\n\t"  /* dest: y x x w */
                    "mulps  %%xmm0, %%xmm2\n\t"
                    "mulps  %%xmm1, %%xmm3\n\t"
                    "subps  %%xmm3, %%xmm2\n\t"         /* lanes 0-2: c.x, -c.y, c.z (c = src x dest) */
                    "shufps $0xe1, %%xmm2, %%xmm2\n\t"  /* swap x and y lanes */
                    "mulss  %[neg_one], %%xmm2\n\t"     /* negate lane 0 (holds -c.y) */
                    "shufps $0xe1, %%xmm2, %%xmm2\n\t"  /* swap back: c.x, c.y, c.z */
                    "movaps %%xmm2, %[dest]\n\t"
    
                    /* cast so the constraints cover all 16 bytes, not just element 0 */
                    : [dest] "+m" (*(vector4 *)dest)
                    : [src] "m" (*(const vector4 *)src), [neg_one] "m" (neg_one)
                    : "xmm0", "xmm1", "xmm2", "xmm3"
    
            );
    }
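If the shufps immediate arithmetic in that comment feels error-prone, a tiny scalar model makes it easy to check an immediate before it goes into the asm. This is a sketch with made-up names (`shuf_imm`, `shufps_model`); it only models the case used above, where both shufps operands are the same register, so lane k of the result is element `(imm >> 2k) & 3` of the input.

```c
/* Hypothetical helper: pack four 2-bit lane indices into a shufps immediate.
 * Lane k of the result takes source element ik (x = 0 ... w = 3).
 * Same bit layout as _MM_SHUFFLE(i3, i2, i1, i0) from <xmmintrin.h>. */
static unsigned shuf_imm(unsigned i0, unsigned i1, unsigned i2, unsigned i3)
{
    return (i3 << 6) | (i2 << 4) | (i1 << 2) | i0;
}

/* Hypothetical scalar model of "shufps $imm, %xmmN, %xmmN"
 * (same register twice): out[k] = in[(imm >> 2k) & 3]. */
static void shufps_model(const float in[4], unsigned imm, float out[4])
{
    int k;

    for (k = 0; k < 4; k++)
        out[k] = in[(imm >> (2 * k)) & 3];
}
```

With it, `shuf_imm(1, 0, 0, 3)` gives 0xc1 (y x x w) and `shuf_imm(2, 2, 1, 3)` gives 0xda (z z y w), matching the immediates in the routine. Swapping only x and y is `shuf_imm(1, 0, 2, 3)`, i.e. 0xe1; running 0xb5 through the model gives y y w z, which drops x entirely.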
