Hi, I've just started evaluating TwineCompile and I have a few questions.
I have a large app currently in BCB6 which would normally take 45 mins to compile. Up to now I have been using Andreas Hausladen's Bcc32pch which brings the compile time down to 1m:54s (using a single thread) and have been very happy with this. But recently I have wondered about TC for a couple of reasons. 1. I'm considering upgrading to Tokyo and there is no Bcc32pch alternative and 2. The latest processors are very appealing and I'd like to be able to throw 18 cores at the compilation and see what it can do.
I've tried the TC demo on BCB6 and note that the single core performance is not far off Bcc32pch, but I was surprised at how little difference it made beyond adding 2 cores.
Timings are
1 core 2m:27
2 cores 2m:03
4 cores 1m:47
6 cores 1m:37
It seems that the benefits very rapidly tail off.
So my questions are:-
1. Do those results look reasonable or might I be doing something wrong? I'm compiling using the IDE and I'm happy that each CPU was 100% saturated as it was used. I tested multiple times and tried all the TC options; the above results were the best I could get and were repeatable.
2. If they are correct then just where is the bottleneck occurring? Based on the above, it doesn't seem worthwhile going above 6 (or even 4) cores, never mind 18.
3. Could I expect to see a different pattern if/when I upgrade to Tokyo?
I look forward to your thoughts.
Des
Effect of multiple processors
Re: Effect of multiple processors
Compiling speed is very much dependent on the project structure and configuration. 90% of the time, going from 1 thread to 2 will halve the compile time, and then each subsequent thread will reduce it further with diminishing returns as the disk becomes the bottleneck. I have a 6 core (with hyper-threading) machine, and one of my biggest test projects compiles about 8 times faster when I use all 12 logical processors as opposed to using just one.
I would expect similar behaviour in your case, so these times look wrong. So, I have a couple questions about your project and machine:
1. Does the application have a proper pre-compiled header setup?
2. How many files are in each project?
3. What sort of disks do you have in your machine? Are they magnetic/spindles or SSDs?
Jon
Re: Effect of multiple processors
Hi Jon, thanks for the quick reply, the speedups that you refer to are very encouraging.
To answer your questions:-
1. Each file has the broadly equivalent starting section
/***************** STANDARD FILE HEADER - START ************************/
#pragma hdrfile "winprop.csm"
#include "std_head.inc"
#pragma hdrstop
/***************** STANDARD FILE HEADER - END ************************/
This produces a winprop.csm of approx 39MB
2. There are 580 files in the project.
3. The whole environment is being run on a VM stored on SSD. I'm happy that the CPU usage that I see in the VM corresponds with actual CPU usage on the host machine.
Following on from your encouraging response, I've been doing a bit more investigation. One thing that looks interesting is that in the TC logs there are 394 entries of "Loaded pre-compiled headers" for the 580 files. Is this relevant?
Regards
Des
Re: Effect of multiple processors
Ok, a bit more investigating...
As it looks to me like a constant lag might be being added to each unit compile, I wondered about disk access.
I set Process Explorer to measure the data being read from SSD during a build (compilation only, no linking). Using 5 cores I got:
mtbcc32-1.exe 1.853 GB
mtbcc32-2.exe 3.520 GB
mtbcc32-3.exe 3.452 GB
mtbcc32-4.exe 3.410 GB
mtbcc32-5.exe 3.281 GB
=======================
total: 15.5GB
My source code (cpp, hpp, dfm) for 580 files adds up to about 85MB, so that is negligible.
But the pch header info (394 x 39MB = 15GB) would account for this.
So is it possible that the pch is being loaded for each unit compiled (perhaps in all but the original thread) rather than being cached? Would that be sufficient to cause such a slowdown? Or are these figures looking as expected?
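The arithmetic behind that comparison can be written out as a small check. This is only a sketch using the figures quoted above: the function names are made up for illustration, the 39MB size is the approximate figure already given, and 1 GB is taken as 1000 MB, so the result is a ballpark rather than an exact accounting.

```cpp
#include <cstdint>

// Total megabytes read if the PCH file is read from disk once per logged load.
std::uint64_t expected_pch_read_mb(std::uint64_t loads, std::uint64_t pch_size_mb) {
    return loads * pch_size_mb;
}

// Sum of the per-process read totals reported by Process Explorer, in MB
// (1 GB treated as 1000 MB for this rough comparison).
std::uint64_t measured_read_mb() {
    const std::uint64_t per_process_mb[] = {1853, 3520, 3452, 3410, 3281};
    std::uint64_t total = 0;
    for (std::uint64_t mb : per_process_mb) {
        total += mb;
    }
    return total;
}
```

With 394 loads of a 39 MB file, `expected_pch_read_mb(394, 39)` gives 15366 MB, against a measured total of 15516 MB, so nearly all of the disk reads are consistent with one full PCH read per logged load.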
Des
Re: Effect of multiple processors
Because the pre-compiled header file is so large, it is not cached and will be read off disk for each file compiled. TwineCompile only caches the source and header files in RAM. The SSD should deliver this file quickly enough to have minimal effect on the compile time, but it's possible that the VM layer is slowing things down considerably.
The "Loading pre-compiled headers" for each file is a good sign. However, for a pre-compiled header to work universally for all the files, it must only be created by the first file compiled, and each subsequent file must just read it, not create it (due to the Borland/Embarcadero compiler not supporting simultaneous access to the CSM).
So, couple more questions about your project setup:
1. Can you tell me what the PCH settings in the project are set to? (a screenshot is perfectly fine)
2. Is there a reason that you put the name of the CSM file in as a header and not via the project options?
Jon
Re: Effect of multiple processors
Hi Jon, thankfully I've got it all working properly now.
The problem seemed to be down to intermittent loading of the pch file. According to the logs (and borne out by the I/O read statistics), with 2 cores it was being loaded 568 times for 580 files, but with 5 cores it was only being loaded 394 times. In the remaining cases, the compile would have had to suffer a complete recompile of all the headers. So I'm guessing that there was some sort of read conflict going on as the thread count increased.
The solution should have been simple: just use the option to have a separate pch per thread. Unfortunately this didn't work when first tested, as it doesn't work with my manual naming of the pch file (the #pragma hdrfile line) in my source files. When I deleted those lines and set the name in the project options instead, it all worked properly (and impressively).
The only thing now remaining that doesn't seem to work properly is with the file caching option. When it is turned on, the compiler seems to get confused about which version of a file it is to compile, resulting in very confusing errors and warnings. This may well have nothing to do with TC, and could well be caused by my old version of Builder and its many add-ons. I'm not concerned about this as I'll be turning the option off, but I'm just mentioning it in case anyone else spots the same behaviour.
Thanks for your help and I look forward to placing an order soon.
Des
Re: Effect of multiple processors
Glad you were able to figure it out. My next step would have been to suggest removing that explicit naming of the CSM file, and enabling PCH injection to guarantee your PCH include in every file, or switching on the option to give each thread its own PCH file.
I am surprised you are having issues with file caching. If you have some add-ons that interfere with TwineCompile's saving the project/files to disk at compile time, then I could understand the problems. I would suggest removing bcc32pch from the IDE anyway because it can conflict with TwineCompile (they both perform similar IDE optimizations).
Jon