STR #2387: Speedup of fl_read_image on win32

STR #2387

Application:	FLTK Library
Status:	1 - Closed w/Resolution
Priority:	1 - Request for Enhancement, e.g. asking for a feature
Scope:	2 - Specific to an operating system
Subsystem:	Core Library
Summary:	Speedup of fl_read_image on win32
Version:	1.4-feature
Created By:	rokan
Assigned To:	AlbrechtS
Fix Version:	1.3.0 (SVN: v8048)
Update Notification:	Receive EMails Don't Receive EMails

Trouble Report Files:

#1	AlbrechtS 14:50 Jun 29, 2010

read_image.cxx
11k

#2	AlbrechtS 06:48 Jul 03, 2010

read_image_v2.cxx
9k

#3	AlbrechtS 06:48 Jul 03, 2010

fl_read_image_win32_test.cxx
4k

#4	AlbrechtS 06:49 Jul 03, 2010

fl_read_image_win32.diff
3k

#5	AlbrechtS 06:55 Jul 03, 2010

read_image.pdf
29k

Trouble Report Comments:

#1	rokan 07:22 Jun 27, 2010

Following is the code to speed-up fl_read_image() on windows The speed-up factor is about 200x for 960x960 images b(from 1.8s to 0.009s) unsigned char * fl_read_image_win32( unsigned char *data, int x, int y, int w, int h, int alpha){ // RK: // The implementation reads the whole array with one call to GetDIBits() // (We could save some temporary memory if we would read the data on per-line basis instead but // it would be little bit slower - though still much fasrter than per-pixel readout) // // We also need to do some manimulation as - unlike in fltk - the DIB bits have format: // 1) lines are ordered from bottom to top // 2) the channels are in BGRA order, not RGB(A) if(!data){ // allocate memory for new data if buffer not provided if(alpha) data = new unsigned char[w * h * 4]; else data = new unsigned char[w * h * 3]; } BITMAP of_info; GetObject(fl_window, sizeof(BITMAP),&of_info); BITMAPINFO bi; bi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER); bi.bmiHeader.biWidth = of_info.bmWidth; bi.bmiHeader.biHeight = of_info.bmHeight; bi.bmiHeader.biPlanes = 1; bi.bmiHeader.biBitCount = 32; bi.bmiHeader.biCompression = BI_RGB; bi.bmiHeader.biSizeImage = 0; bi.bmiHeader.biXPelsPerMeter = 0; bi.bmiHeader.biYPelsPerMeter = 0; bi.bmiHeader.biClrUsed = 0; bi.bmiHeader.biClrImportant = 0; // Assure that we are not trying to read non-existing data. // If it is so, the function should still work but the out-of-bounrady // part of the image is untouched (uninitialised if new array is allocated - so full of rubbish) int shift_x = 0; // X target shift if x modified int shift_y = 0; // Y target shift if y modified if(x < 0){ shift_x = -x; w += x; x = 0; } if(y < 0){ shift_y = -y; h += y; y = 0; } int ww = w; // We need the original width for output data line size before possible modification if(x + w > bi.bmiHeader.biWidth) w = bi.bmiHeader.biWidth - x; if(y + h > bi.bmiHeader.biHeight) h = bi.bmiHeader.biHeight - y; int line_size = ((of_info.bmWidth * bi.bmiHeader.biBitCount + 31) / 32) * 4; // Each line is four-byte aligned int dib_size = line_size * h; unsigned char * dib = new unsigned char[dib_size]; // create temporary buffer to read DIB // The obtained lines are ordered from botom-to-top so we need to read corresponding top-bottom mirrored image GetDIBits(fl_gc, (HBITMAP)fl_window, of_info.bmHeight - y - h, h, dib, (BITMAPINFO *)&bi, DIB_RGB_COLORS); if(alpha){ // four-byre result with alpha unsigned char a = alpha; for(int j = 0; j<h; j++){ const unsigned char * src = dib + (h - j - 1) * line_size + x * 4; // flipped source line unsigned char * tg = data + (j + shift_y) * 4 * ww + shift_x * 4; // target line for(int i = 0; i<w; i++){ // The channels are in BGRA order, we need to swap them to fltk RGBA standard unsigned char b = *src++; unsigned char g = *src++; unsigned char r = *src; src += 2; // skip alpha source *tg++ = r; *tg++ = g; *tg++ = b; *tg++ = a; } } }else{ // three byte result without alpha channel for(int j = 0; j<h; j++){ const unsigned char * src = dib + (h - j - 1) * line_size + x * 4; // flipped source line unsigned char * tg = data + (j + shift_y) * 3 * ww + shift_x * 3; // target line for(int i = 0; i<w; i++){ unsigned char b = *src++; unsigned char g = *src++; unsigned char r = *src; src += 2; // skip alpha source *tg++ = r; *tg++ = g; *tg++ = b; } } } delete[] dib; // delete DIB temporary buffer return data; }

#2	AlbrechtS 14:50 Jun 29, 2010

Roman, thanks for the code given. Unfortunately this doesn't work with "real" windows. I assume that you used it only with Fl_Offscreen ? The problem seems to be that fl_window can be two *different* object types: (1) for normal windows, it's an HWND type, (2) for Fl_Offscreen, it's an HBITMAP type. This is set up in fl_begin_offscreen(). Difficult to read, however, due to multiple typedefs. Thus, your code may be okay for Fl_Offscreen + fl_read_image() - I didn't try this - but it didn't work for me with normal windows. Neither GetObject() nor GetDIBits() return useful results in this case, because of the wrong argument types, AFAICT. I'm currently not able to tell how to fix this, but I think that it must be doable. Maybe we can do it with some other WinAPI calls, or we must provide additional internal info when switching to an Fl_Offscreen. I attached a test program that does fl_read_image() from a normal window. For further information, please see the comments in the file. This contains the slightly modified code (added printf's and modified 2 statements for testing, but w/o success). Sorry, it's longer than intended, and it uses global variables to make testing easier, but anyway... If someone could add a short example that does fl_read_image() from an Fl_Offscreen, this would be appreciated.

#3	AlbrechtS 03:56 Jul 03, 2010

Just to let you know: I'm working on this STR, and I'm making good progress. My current version can read both from windows and Fl_Offscreen's. It's not yet ready to post, though -- needs some more tests and cleanup...

#4	AlbrechtS 06:53 Jul 03, 2010

Good news: it works for me. Please test... For testing, save these two files - http://www.fltk.org/strfiles/2387/read_image_v2.cxx - http://www.fltk.org/strfiles/2387/fl_read_image_win32_test.cxx and compile and run read_image_v2.cxx as described in the file. You can test the old and new behavior in one program. ----- Use http://www.fltk.org/strfiles/2387/fl_read_image_win32.diff to patch the FLTK lib and use this for testing whatever you have. TIA for feedback...

#5	AlbrechtS 06:55 Jul 03, 2010

Forgot to say: file http://www.fltk.org/strfiles/2387/read_image.pdf shows how it ought to look...

#6	rokan 07:18 Jul 06, 2010

Thanks for taking care of this, looks good. One suggestion for speed-up : I would remove the if(alpha) condition from the internal loop and put it outside. But maybe compilers are good enough to optimize this?

#7	rokan 07:27 Jul 06, 2010

Also, are you not already setting alpha with memset() at the beginning of the function - thus one is redundant?

#8	AlbrechtS 07:52 Jul 06, 2010

Hmm, yes, I saw that you did this in your version, but you had to duplicate the code of the complete loop - I wanted to avoid that, and the only difference in my version would indeed be the "if (alpha)" part. I don't think that it matters much, though. A good compiler would keep the alpha value in a register, and a branch-if-zero-or-not should be very fast. Oh, I have another idea: we don't need to assign the alpha value in the inner loop, since the whole array is initialized with the alpha value before. So we could do: - before the loop: [register] int da = alpha ? 1 : 0; - in the loop: tg += da; That would be pretty fast, IMHO, wouldn't it? I'd be more worried about the multiplications in the loop (although they are used only 'h' times, whereas the "if (alpha)" happens for each pixel (w*h). They could be replaced by calculating some delta values before the loop and _adding_ them in the loops. Would this be worth a try? Another point for optimizing could be to find a way to use GetDIBits directly with the primary hdc's bitmap, but I didn't know how to do it when I wrote that code. Now I think that I found a way, but there is a conflicting problem: If we use GetDIBits directly, then we can only copy complete "scan lines". That's okay if you want to capture the whole window or Fl_Offscreen, but it's a big waste if you want to read only a small subsection of the window or offscreen.

#9	AlbrechtS 08:06 Jul 06, 2010

Note that I didn't read your last message before mine. That's why I wrote something similar about the alpha value initialization... Another way. I saw something like this in fl_draw_image or something. Would this be faster ? for (int i = 0; i<w; i++, src +=3, tg +=d) { tg[0] = src[2]; // R tg[1] = src[1]; // G tg[2] = src[0]; // B } No need to define intermediate variables, but some kind of index calculation.

#10	matt 13:22 Oct 30, 2010

Hi Albrecht. Is this ready to be committed to the SVN?

#11	AlbrechtS 07:20 Nov 02, 2010

Unfortunately not. It kinda works, but I wanted to take another approach to optimize it further, but then I got stuck. I'll take another look in a few days and see what I can do to complete it.

#12	matt 12:54 Nov 14, 2010

Not essential for the 1.3.0 release. Temporarily moved to 1.4 . This issue seems to be stuck. I am looking forward to have this in as soon as ALbrecht is ready.

#13	AlbrechtS 12:33 Dec 16, 2010

I decided to commit the existing and working patch that improves fl_read_image() on Windows significantly, so that we get the benefits of this patch into FLTK 1.3.0 (svn r8048). ------------------------------------------------------------------------ There is probably still room for improvement, therefore this STR is left open for FLTK 1.4 to keep the discussions in context. The remaining optimizations can be postponed for a later release or patch version. ------------------------------------------------------------------------ Note to myself: Use GetCurrentObject to get the Bitmap of the current DC, then GetObject to get the size. Maybe use CreateDIBSection to get the image data.

#14	AlbrechtS 17:23 Sep 19, 2014

Closing this STR now. It is resolved, and more optimizations are not likely to be done anytime soon.