FLTK logo

STR #2387

FLTK matrix user chat room
(using Element browser app)   FLTK gitter user chat room   GitHub FLTK Project   FLTK News RSS Feed  
  FLTK Apps      FLTK Library      Forums      Links     Login 
 Home  |  Articles & FAQs  |  Bugs & Features  |  Documentation  |  Download  |  Screenshots  ]
 

Return to Bugs & Features | SVN ⇄ GIT ]

STR #2387

Application:FLTK Library
Status:1 - Closed w/Resolution
Priority:1 - Request for Enhancement, e.g. asking for a feature
Scope:2 - Specific to an operating system
Subsystem:Core Library
Summary:Speedup of fl_read_image on win32
Version:1.4-feature
Created By:rokan
Assigned To:AlbrechtS
Fix Version:1.3.0 (SVN: v8048)
Update Notification:

Receive EMails Don't Receive EMails

Trouble Report Files:


Name/Time/Date Filename/Size  
 
#1 AlbrechtS
14:50 Jun 29, 2010
read_image.cxx
11k
 
 
#2 AlbrechtS
06:48 Jul 03, 2010
read_image_v2.cxx
9k
 
 
#3 AlbrechtS
06:48 Jul 03, 2010
fl_read_image_win32_test.cxx
4k
 
 
#4 AlbrechtS
06:49 Jul 03, 2010
fl_read_image_win32.diff
3k
 
 
#5 AlbrechtS
06:55 Jul 03, 2010
read_image.pdf
29k
 
     

Trouble Report Comments:


Name/Time/Date Text  
 
#1 rokan
07:22 Jun 27, 2010
Following is the code to speed-up fl_read_image() on windows
The speed-up factor is about 200x for 960x960 images b(from 1.8s to 0.009s)

unsigned char * fl_read_image_win32( unsigned char *data, int x, int y, int w, int h, int alpha){
  // RK:
  // The implementation reads the whole array with one call to GetDIBits()
  // (We could save some temporary memory if we would read the data on per-line basis instead but
  // it would be little bit slower - though still much fasrter than per-pixel readout)
  //
  // We also need to do some manimulation as - unlike in fltk - the DIB bits have format:
  // 1) lines are ordered from bottom to top
  // 2) the channels are in BGRA order, not RGB(A)


  if(!data){ // allocate memory for new data if buffer not provided
    if(alpha)
      data = new unsigned char[w * h * 4];
    else
      data = new unsigned char[w * h * 3];
  }




  BITMAP of_info;
  GetObject(fl_window, sizeof(BITMAP),&of_info);

  BITMAPINFO   bi;
  bi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
  bi.bmiHeader.biWidth = of_info.bmWidth;
  bi.bmiHeader.biHeight = of_info.bmHeight;
  bi.bmiHeader.biPlanes = 1;
  bi.bmiHeader.biBitCount = 32;
  bi.bmiHeader.biCompression = BI_RGB;
  bi.bmiHeader.biSizeImage = 0;
  bi.bmiHeader.biXPelsPerMeter = 0;
  bi.bmiHeader.biYPelsPerMeter = 0;
  bi.bmiHeader.biClrUsed = 0;
  bi.bmiHeader.biClrImportant = 0;

  // Assure that we are not trying to read non-existing data.
  // If it is so, the function  should still work but the out-of-bounrady
  // part of the image is untouched (uninitialised if new array is allocated - so full of rubbish)

  int shift_x = 0; // X target shift if x modified
  int shift_y = 0; // Y target shift if y modified
  if(x < 0){
    shift_x = -x;
    w += x;
    x = 0;
  }
  if(y < 0){
    shift_y = -y;
    h += y;
    y = 0;
  }
  int ww = w; // We need the original width for output data line size before possible modification

  if(x + w > bi.bmiHeader.biWidth)
     w = bi.bmiHeader.biWidth - x;
  if(y + h > bi.bmiHeader.biHeight)
     h = bi.bmiHeader.biHeight - y;


  int line_size = ((of_info.bmWidth * bi.bmiHeader.biBitCount + 31) / 32) * 4; // Each line is four-byte aligned



  int dib_size =  line_size * h;
  unsigned char * dib = new unsigned char[dib_size]; // create temporary buffer to read DIB

  // The obtained lines are ordered from botom-to-top so we need to read corresponding top-bottom mirrored image
  GetDIBits(fl_gc, (HBITMAP)fl_window, of_info.bmHeight - y - h, h, dib, (BITMAPINFO *)&bi, DIB_RGB_COLORS);

  if(alpha){ // four-byre result with alpha
    unsigned char a = alpha;
    for(int j = 0; j<h; j++){
      const unsigned char * src = dib +  (h - j - 1) * line_size  + x * 4; // flipped source line
      unsigned char * tg = data + (j + shift_y) * 4 * ww + shift_x * 4; // target line
      for(int i = 0; i<w; i++){
        // The channels are in BGRA order, we need to swap them to fltk RGBA standard
        unsigned char b = *src++;
        unsigned char g = *src++;
        unsigned char r = *src; src += 2; // skip alpha source
        *tg++ = r;
        *tg++ = g;
        *tg++ = b;
        *tg++ = a;
      }
    }
  }else{ // three byte result without alpha channel
    for(int j = 0; j<h; j++){
      const unsigned char * src = dib + (h - j - 1)  * line_size + x * 4; // flipped source line
      unsigned char * tg = data + (j + shift_y) * 3 * ww + shift_x * 3; // target line
      for(int i = 0; i<w; i++){
        unsigned char b = *src++;
        unsigned char g = *src++;
        unsigned char r = *src; src += 2; // skip alpha source
        *tg++ = r;
        *tg++ = g;
        *tg++ = b;
      }
    }
  }
  delete[] dib; // delete DIB temporary buffer
  return data;
}
 
 
#2 AlbrechtS
14:50 Jun 29, 2010
Roman, thanks for the code given. Unfortunately this doesn't work with "real" windows. I assume that you used it only with Fl_Offscreen ?

The problem seems to be that fl_window can be two *different* object types:
 (1) for normal windows, it's an HWND type,
 (2) for Fl_Offscreen, it's an HBITMAP type.

This is set up in fl_begin_offscreen(). Difficult to read, however, due to multiple typedefs.

Thus, your code may be okay for Fl_Offscreen + fl_read_image() - I didn't try this - but it didn't work for me with normal windows. Neither GetObject() nor GetDIBits() return useful results in this case, because of the wrong argument types, AFAICT. I'm currently not able to tell how to fix this, but I think that it must be doable. Maybe we can do it with some other WinAPI calls, or we must provide additional internal info when switching to an Fl_Offscreen.

I attached a test program that does fl_read_image() from a normal window. For further information, please see the comments in the file. This contains the slightly modified code (added printf's and modified 2 statements for testing, but w/o success).

Sorry, it's longer than intended, and it uses global variables to make testing easier, but anyway...

If someone could add a short example that does fl_read_image() from an Fl_Offscreen, this would be appreciated.
 
 
#3 AlbrechtS
03:56 Jul 03, 2010
Just to let you know: I'm working on this STR, and I'm making good progress. My current version can read both from windows and Fl_Offscreen's. It's not yet ready to post, though -- needs some more tests and cleanup...  
 
#4 AlbrechtS
06:53 Jul 03, 2010
Good news: it works for me. Please test...

For testing, save these two files
 - http://www.fltk.org/strfiles/2387/read_image_v2.cxx
 - http://www.fltk.org/strfiles/2387/fl_read_image_win32_test.cxx

and compile and run read_image_v2.cxx as described in the file. You can test the old and new behavior in one program.

-----

Use http://www.fltk.org/strfiles/2387/fl_read_image_win32.diff to patch the FLTK lib and use this for testing whatever you have.

TIA for feedback...
 
 
#5 AlbrechtS
06:55 Jul 03, 2010
Forgot to say: file http://www.fltk.org/strfiles/2387/read_image.pdf shows how it ought to look...  
 
#6 rokan
07:18 Jul 06, 2010
Thanks for taking care of this, looks good.
One suggestion for speed-up : I would remove the
  if(alpha)
condition from the internal loop and put it outside.
But maybe compilers are good enough to optimize this?
 
 
#7 rokan
07:27 Jul 06, 2010
Also, are you not already setting alpha with memset() at the beginning of the function - thus one is redundant?  
 
#8 AlbrechtS
07:52 Jul 06, 2010
Hmm, yes, I saw that you did this in your version, but you had to duplicate the code of the complete loop - I wanted to avoid that, and the only difference in my version would indeed be the "if (alpha)" part. I don't think that it matters much, though. A good compiler would keep the alpha value in a register, and a branch-if-zero-or-not should be very fast.

Oh, I have another idea: we don't need to assign the alpha value in the inner loop, since the whole array is initialized with the alpha value before. So we could do:

 - before the loop:
    [register] int da = alpha ? 1 : 0;
 - in the loop:
    tg += da;

That would be pretty fast, IMHO, wouldn't it?

I'd be more worried about the multiplications in the loop (although they are used only 'h' times, whereas the "if (alpha)" happens for each pixel (w*h). They could be replaced by calculating some delta values before the loop and _adding_ them in the loops. Would this be worth a try?

Another point for optimizing could be to find a way to use GetDIBits directly with the primary hdc's bitmap, but I didn't know how to do it when I wrote that code. Now I think that I found a way, but there is a conflicting problem: If we use GetDIBits directly, then we can only copy complete "scan lines". That's okay if you want to capture the whole window or Fl_Offscreen, but it's a big waste if you want to read only a small subsection of the window or offscreen.
 
 
#9 AlbrechtS
08:06 Jul 06, 2010
Note that I didn't read your last message before mine. That's why I wrote something similar about the alpha value initialization...

Another way. I saw something like this in fl_draw_image or something. Would this be faster ?

   for (int i = 0; i<w; i++, src +=3, tg +=d) {
      tg[0] = src[2]; // R
      tg[1] = src[1]; // G
      tg[2] = src[0]; // B
   }

No need to define intermediate variables, but some kind of index calculation.
 
 
#10 matt
13:22 Oct 30, 2010
Hi Albrecht. Is this ready to be committed to the SVN?  
 
#11 AlbrechtS
07:20 Nov 02, 2010
Unfortunately not. It kinda works, but I wanted to take another approach to optimize it further, but then I got stuck.

I'll take another look in a few days and see what I can do to complete it.
 
 
#12 matt
12:54 Nov 14, 2010
Not essential for the 1.3.0 release. Temporarily moved to 1.4 .

This issue seems to be stuck. I am looking forward to have this in as soon as ALbrecht is ready.
 
 
#13 AlbrechtS
12:33 Dec 16, 2010
I decided to commit the existing and working patch that improves
fl_read_image() on Windows significantly, so that we get the benefits of
this patch into FLTK 1.3.0 (svn r8048).

------------------------------------------------------------------------
There is probably still room for improvement, therefore this STR is left
open for FLTK 1.4 to keep the discussions in context. The remaining
optimizations can be postponed for a later release or patch version.
------------------------------------------------------------------------

Note to myself: Use GetCurrentObject to get the Bitmap of the current DC,
then GetObject to get the size. Maybe use CreateDIBSection to get the
image data.
 
 
#14 AlbrechtS
17:23 Sep 19, 2014
Closing this STR now.

It is resolved, and more optimizations are not likely to be done anytime soon.
 
     

Return to Bugs & Features ]

 
 

Comments are owned by the poster. All other content is copyright 1998-2024 by Bill Spitzak and others. This project is hosted by The FLTK Team. Please report site problems to 'erco@seriss.com'.