[Openmcl-devel] Fwd: freshly built wx86cl64.exe crashes on start

Tim McNerney mc at media.mit.edu
Fri Dec 30 05:59:47 PST 2022


Bharat,

I should first thank you for taking on this CCL compilation project. It is important to keep up with OS changes. Your efforts are much appreciated. 

My partial understanding is that ASLR is for protecting code from attack. The other Lisp (Franz) was intolerant of the address space fragmentation caused by ASLR, but I suspect there may be other ways ASLR can violate assumptions made by a garbage-collected language. That said, something else may be going on here. 

I feel like a working baseline would help you. By that I mean, start by finding an OS environment where the compilation does work, and compare to the broken results, striving to reduce variables. 

If you don’t have enough hardware to, say, run both Windows 10 and Windows 7 simultaneously, use a free or cheap VM (e.g. VMware Player). I know AWS charges by the hour of usage (so not free), but they have a large selection of prebuilt images that will save you time. Or, since you are focusing on Windows, maybe Azure offers the same benefit. Maybe they offer promotional prices for developers. 

It is possible you have found an incompatibility with the GNU/Cygwin toolchain (maybe little to do with Windows). Explore which version works for CCL. Again, establish a baseline that gives you working code, and inch forward from there. 

Good luck, or in Gerry Sussman’s words “good skill!”

--Tim

> On Dec 30, 2022, at 07:55, Bharat Shetty <bshetty at gmail.com> wrote:
> 
> 
> Hi,
> 
> I had not previously mentioned why I added -no-pie. The executable used to wperror out after calling VirtualProtect in the pmcl-kernel.c:remap_spjump() function. The error was ERROR_INVALID_ADDRESS 487 (0x1E7). After I added -no-pie (strangely, -Wl,--no-pie or -fno-pie doesn't work) this got sorted. Some people say this disables ASLR; however, the GNU ld documentation is not very clear on this. In any case, the fact that VirtualProtect now works means the flag is needed.
> 
> Today I added the following flags:
> -Wl,--disable-high-entropy-va  ;; --high-entropy-va - Image is compatible with 64-bit address space layout randomization (ASLR). This option is enabled by default for 64-bit PE images.
> -Wl,--disable-dynamicbase ;; --dynamicbase - The image base address may be relocated using address space layout randomization (ASLR). This feature was introduced with MS Windows Vista for i386 PE targets.
> -Wl,--disable-nxcompat ;; --nxcompat - The image is compatible with the Data Execution Prevention.
> However, these do not change anything, and wx86cl64 crashes at exactly the same position (calculate_relocation () at ../x86-gc.c:1571). When I debug the executable, the image base address (0x12000) and the text start address (0x21000), amongst other addresses, are the same across multiple executions. Would these not change if ASLR were active?
> 
> This may not be related to ASLR, as ASLR changes the image base and section start addresses at load time. As far as I understand, ASLR does not change the base during execution. Whatever is causing this is changed behaviour in gcc/ld. Microsoft's site clearly mentions that ASLR is to be opted into by the developer. If this were an OS change, the downloaded bits should have the same issue. Another question: if ASLR in Windows were to cause re-initialisation of globals, who could use it?
> 
> Regards,
> Bharat
> 
> 
>> On Fri, Dec 30, 2022 at 3:05 AM Carl Shapiro <carl.shapiro at gmail.com> wrote:
>> The feature is known as ASLR in both Windows and Linux
>> 
>> https://learn.microsoft.com/en-us/windows/security/threat-protection/overview-of-threat-mitigations-in-windows-10#address-space-layout-randomization
>> 
>>> On Thu, Dec 29, 2022 at 12:43 PM Tim McNerney <mc at media.mit.edu> wrote:
>>> I wonder if you need to turn off Windows 10’s new-ish, nondeterministic memory allocation policy. We have run into this with other Lisps. I don’t remember the correct terminology for this malware countermeasure or the name of the configuration flag. Sorry. Can someone else chime in?
>>> 
>>> --Tim
>>> 
>>>>> On Dec 29, 2022, at 13:49, Bharat Shetty <bshetty at gmail.com> wrote:
>>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> I built ccl (downloaded the 1.12.1 zip file from GitHub) on 
>>>> Windows 10 (cygwin)
>>>> gcc version 11.3.0 
>>>> ld/binutils version 2.39
>>>> debug flag changed to -g3 in Makefile
>>>> code optimisation level set to -O0 (zero) also in Makefile
>>>> 
>>>> When the exe was built I got a message that sections /1, /8 and /32 are before text. I altered pei-x86-64.x to include the new debug sections (up to DWARF 5). For this I generated the default script by running ld --verbose, copied .spfoo from the original, and removed KEEP() and most of the SORT() unless it was present in the original file. Besides this I had to add -no-pie and -Wl,--allow-multiple-definition to the build rule for the wx86cl64.exe target. I have not made any changes to the source code. This got an exe that starts. 
>>>> 
>>>> However every time I run this, it crashes at calculate_relocation in x86-gc.c. The back trace is as follows:
>>>> #0  0x0000000000031d56 in calculate_relocation () at ../x86-gc.c:1571
>>>> #1  0x000000000002ea15 in gc (tcr=0x5acebc0, param=0) at ../gc-common.c:1821
>>>> #2  0x000000000003a170 in gc_from_tcr (tcr=0x5acebc0, param=0) at ../x86-exceptions.c:3014
>>>> #3  0x000000000003a06b in gc_like_from_xp (xp=0x25f6f570, fun=0x3a126 <gc_from_tcr>, param=0) at ../x86-exceptions.c:2970
>>>> #4  0x000000000003a1ce in gc_from_xp (xp=0x25f6f570, param=0) at ../x86-exceptions.c:3026
>>>> #5  0x0000000000035c0f in allocate_object (xp=0x25f6f570, bytes_needed=16, disp_from_allocptr=13, tcr=0x5acebc0, crossed_threshold=0x25f6f1ec) at ../x86-exceptions.c:171
>>>> #6  0x0000000000036f26 in handle_alloc_trap (xp=0x25f6f570, tcr=0x5acebc0, notify=0x25f6f1ec) at ../x86-exceptions.c:665
>>>> #7  0x0000000000037f15 in handle_exception (signum=11, info=0x25f6f4d8, context=0x25f6f570, tcr=0x5acebc0, old_valence=0) at ../x86-exceptions.c:1215
>>>> #8  0x0000000000038cf3 in windows_exception_handler (exception_pointers=0x25f6f4c0, tcr=0x5acebc0, signal_number=11) at ../x86-exceptions.c:2150
>>>> #9  0x00000000000438dd in windows_switch_to_foreign_stack () at ../x86-asmutils64.s:263
>>>> #10 0x0000000025f6f4c0 in ?? ()
>>>> 
>>>> This happens after start_lisp is called. By the time the code reaches gc.c, global_reloctab has been reset (it is set to 0x7cfe000000 before entering start_lisp). Because of this, GCrelocptr is also set to 0x0 in gc-common.c, which results in relocptr being set to 0x0 in calculate_relocation. GCndynamic_dnodes_in_area is also 0 at this point (in the downloaded version it is 2048 at this point). 
>>>> 
>>>> Does anyone know why these global variables are reset? And how can I fix this? I suspect this is because of the newer versions of gcc and ld.
>>>> 
>>>> Regards,
>>>> Bharat