[Openmcl-devel] Fwd: freshly built wx86cl64.exe crashes on start

Bharat Shetty bshetty at gmail.com
Wed Jan 4 15:15:41 PST 2023

For the past two days wx86cl64.exe has been behaving erratically (both the
version I downloaded and the one I built using gcc 4.7.1): it crashes randomly
at startup, and Emacs is unable to start it with SLIME. I suspect this has to
do with some recently installed security patches.

So I looked into the Windows security controls. It turns out Windows Defender
lets us configure the following "Exploit protection" settings:

   - Control flow guard (CFG)
   - Data Execution Prevention (DEP)
   - Mandatory ASLR (force randomization for images: force relocation of
   images not compiled with /DYNAMICBASE) -- off by default for now
   - Randomize memory allocations (Bottom-up ASLR) -- on by default
   - High-entropy ASLR -- needs Bottom-up ASLR to be on
   - Validate exception chains (SEHOP)
   - Validate heap integrity -- terminate the process when heap corruption is
   detected

I observed that wx86cl64 runs with 'Mandatory ASLR' and 'High entropy ASLR'
turned off and all other options enabled. So even if gcc lets us build a
non-PIE (position-dependent) executable, it is just a matter of time before
non-PIE apps, CCL included, stop running on Windows.

The only way to keep CCL running is to make the code relocatable (PIE) as soon
as possible. The bright spot is that it still runs on Linux :)
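Whether an image opts into these mitigations is recorded in its PE headers, so one way to check whether flags such as -no-pie, -Wl,--disable-dynamicbase or -Wl,--disable-reloc-section actually reached wx86cl64.exe is to read those headers directly. Below is a minimal Python sketch, not part of CCL: the script and function names are hypothetical, it assumes a 64-bit (PE32+) image, and the offsets and flag values are taken from the Microsoft PE/COFF specification. (binutils' objdump -p also prints the DllCharacteristics field.)

```python
import struct
import sys

# Hypothetical helper, not part of CCL. Assumes a 64-bit (PE32+) image;
# offsets and bit values follow the Microsoft PE/COFF specification.
DLL_FLAGS = {
    0x0020: "HIGH_ENTROPY_VA",
    0x0040: "DYNAMIC_BASE",
    0x0100: "NX_COMPAT",
    0x4000: "GUARD_CF",
}
IMAGE_FILE_RELOCS_STRIPPED = 0x0001  # COFF header Characteristics bit
PE32_PLUS_MAGIC = 0x20B
BASERELOC_DIR = 5  # index of the base relocation table in the data directories


def inspect_pe(data: bytes) -> dict:
    """Return the ASLR/DEP-related header fields of a PE32+ image."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ/PE file")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_off + 4  # 20-byte COFF file header
    (characteristics,) = struct.unpack_from("<H", data, coff + 18)
    opt = coff + 20  # optional header starts right after the COFF header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic != PE32_PLUS_MAGIC:
        raise ValueError("expected a 64-bit (PE32+) image")
    (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    (dll_chars,) = struct.unpack_from("<H", data, opt + 70)
    (num_dirs,) = struct.unpack_from("<I", data, opt + 108)
    reloc_size = 0
    if num_dirs > BASERELOC_DIR:
        # Each data directory entry is (RVA, size), 8 bytes, starting at opt + 112.
        _rva, reloc_size = struct.unpack_from("<II", data, opt + 112 + 8 * BASERELOC_DIR)
    return {
        "image_base": image_base,
        "dll_flags": sorted(name for bit, name in DLL_FLAGS.items() if dll_chars & bit),
        "relocs_stripped": bool(characteristics & IMAGE_FILE_RELOCS_STRIPPED),
        "reloc_dir_size": reloc_size,
    }


def main(path: str) -> None:
    with open(path, "rb") as f:
        info = inspect_pe(f.read())
    print(f"ImageBase          : {info['image_base']:#x}")
    print(f"DllCharacteristics : {', '.join(info['dll_flags']) or '(none)'}")
    print(f"RELOCS_STRIPPED    : {info['relocs_stripped']}")
    print(f"Base reloc size    : {info['reloc_dir_size']} bytes")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

For example, python pe_flags.py wx86cl64.exe (the file name pe_flags.py is arbitrary). Running it on both the downloaded and the locally built executable is a cheap way to see whether they differ in DYNAMIC_BASE, HIGH_ENTROPY_VA or base relocations.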


On Tue, Jan 3, 2023 at 4:17 PM Bharat Shetty <bshetty at gmail.com> wrote:

> The new compiler driver (gcc 11.3) needs the following flags to build the
> lisp kernel (along with changes to the linker script):
> -no-pie -Wl,--disable-reloc-section -Wl,--allow-multiple-definition
> However, the crashes still happen. I will raise a support request with
> sourceware on this.
> Regards,
> Bharat
> On Sat, Dec 31, 2022 at 4:19 AM Tim McNerney <mc at media.mit.edu> wrote:
>> That’s very good news! Now all we need to figure out is either a) how to
>> coax the new GNU tool chain into behaving or b) how we can help the GNU
>> maintainers reproduce the problem. That CCL is open source helps a lot.
>> --Tim
>> On Dec 30, 2022, at 14:16, Bharat Shetty <bshetty at gmail.com> wrote:
>> I just built and ran CCL successfully on Windows 10 using an old version
>> of gcc from https://sysprogs.com/getfile/31/mingw64-gcc4.7.1.exe. I haven't
>> tested this extensively (e.g. building images or running test code), but
>> it is still an important result: at least we know the issues are due to the
>> newer toolchain. In the future we need to fix this for the latest version
>> of gcc.
>> This package has gcc 4.7.1 and binutils 2.23.
>> For the build I restored the original Makefile, so it doesn't have the
>> options to disable pie/dynamicbase/high_entropy/nxcompat, and made the
>> following changes to it:
>>    - set CC to the older gcc 4.7.1
>>    - set LD to the older one from binutils 2.23
>>    - set AS to the (still new) one from binutils 2.39 -- the one from
>>    binutils 2.23 was crashing
>>    - set the -g3 and -O0 options (just to help me debug)
>> Using the newer AS should be OK, as it is only invoked directly to assemble
>> the .s sources.
>> Regards,
>> Bharat
>> On Fri, Dec 30, 2022 at 11:12 PM Bharat Shetty <bshetty at gmail.com> wrote:
>>> Hi Tim,
>>> Thanks to all the good people who gave us CCL, and thank you and Carl
>>> for responding; it makes me feel less lost :D
>>> Luckily I have an additional old laptop with Windows 7. I will be back
>>> from time to time with questions or updates.
>>> Regards,
>>> Bharat
>>> On Fri, Dec 30, 2022 at 7:29 PM Tim McNerney <mc at media.mit.edu> wrote:
>>>> Bharat,
>>>> I should first thank you for taking on this CCL compilation project. It
>>>> is important to keep up with OS changes. Your efforts are much appreciated.
>>>> My partial understanding is that ASLR is for protecting *code* from
>>>> attack. The other Lisp (Franz) was intolerant to address space
>>>> *fragmentation* caused by ASLR, but I suspect there may be other ways
>>>> ASLR can potentially violate assumptions made by a garbage collected
>>>> language. That said, something *else* may be going on here.
>>>> I feel like a working *baseline* would help you. By that I mean, start
>>>> by finding an OS environment where the compilation *does* work, and
>>>> compare to the broken results, striving to reduce variables.
>>>> If you don’t have enough hardware to, say, run both Windows 10 and
>>>> Windows 7 simultaneously, use a free or cheap VM (e.g. VMware Player). I
>>>> know AWS charges by the hour of usage (so not free), but they have a large
>>>> selection of prebuilt images that will save you time. Or, since you are
>>>> focusing on Windows, maybe Azure offers the same benefit. Maybe they offer
>>>> promotional prices for developers.
>>>> It is possible you have found an incompatibility with the gnu/cygwin
>>>> tool chain (maybe little to do with Windows). Explore which *version*
>>>> works for CCL. Again, establishing a baseline that gives you working code,
>>>> and inching forward from there.
>>>> Good luck, or in Gerry Sussman’s words “good skill!”
>>>> --Tim
>>>> On Dec 30, 2022, at 07:55, Bharat Shetty <bshetty at gmail.com> wrote:
>>>> Hi,
>>>> I hadn't previously mentioned why I added -no-pie. The executable used
>>>> to fail with a wperror message after calling VirtualProtect in the
>>>> remap_spjump() function in pmcl-kernel.c. The error was
>>>> ERROR_INVALID_ADDRESS, 487 (0x1E7). Adding -no-pie fixed this (strangely,
>>>> -Wl,--no-pie and -fno-pie don't work). Some people say this disables ASLR,
>>>> but the GNU ld documentation is not very clear on this. Still, the fact that
>>>> VirtualProtect now works shows the flag is needed.
>>>> Today I added the following flags:
>>>>    - -Wl,--disable-high-entropy-va  ;; --high-entropy-va - Image is
>>>>    compatible with 64-bit address space layout randomization (ASLR). This
>>>>    option is enabled by default for 64-bit PE images.
>>>>    - -Wl,--disable-dynamicbase ;; --dynamicbase - The image base
>>>>    address may be relocated using address space layout randomization (ASLR).
>>>>    This feature was introduced with MS Windows Vista for i386 PE targets.
>>>>    - -Wl,--disable-nxcompat ;; --nxcompat - The image is compatible
>>>>    with the Data Execution Prevention.
>>>> However, these do not change anything, and wx86cl64 crashes at exactly
>>>> the same place (calculate_relocation () at ../x86-gc.c:1571).
>>>> When I debug the executable, the image base address (0x12000), the text
>>>> start address (0x21000) and other addresses are the same across multiple
>>>> runs. Wouldn't these change if ASLR were active?
>>>> This may not be related to ASLR, since ASLR changes the image base and
>>>> section start addresses at load time; as far as I understand, it does not
>>>> change the base during execution. Whatever is causing this is a change in
>>>> the behaviour of gcc/ld. The Microsoft site clearly says that ASLR is opted
>>>> into by the developer, and if this were an OS change, the downloaded
>>>> binaries should have the same issue. Another question: if ASLR on Windows
>>>> caused globals to be re-initialised, who could use it?
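As a rough illustration of what "ASLR changes the image base at load time" involves: when the loader places an image somewhere other than its preferred ImageBase, it adds the difference to every absolute address listed in the image's base relocation table (.reloc). The Python sketch below mimics that for the 64-bit case only. It is not Windows or CCL code, the function names are hypothetical, and it assumes the image is supplied as a bytearray indexed by RVA (that is, already laid out as it would be in memory). The block layout and the type values (0 = ABSOLUTE padding, 10 = DIR64) follow the PE/COFF specification.

```python
import struct

# Illustrative sketch, not the Windows loader. Hypothetical function names;
# assumes `image` is a bytearray indexed by RVA (already laid out as in memory).
IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, skipped
IMAGE_REL_BASED_DIR64 = 10    # 64-bit absolute address that must be patched
MASK64 = 0xFFFFFFFFFFFFFFFF


def dir64_fixups(reloc_data: bytes) -> list:
    """Return the RVAs of all DIR64 fixups in a base relocation section."""
    rvas = []
    pos = 0
    while pos + 8 <= len(reloc_data):
        # Each block: page RVA (4 bytes), block size incl. header (4 bytes),
        # followed by 2-byte entries: type in the top 4 bits, page offset below.
        page_rva, block_size = struct.unpack_from("<II", reloc_data, pos)
        if block_size < 8:
            break  # zero padding or a malformed block
        for i in range(pos + 8, pos + block_size, 2):
            (entry,) = struct.unpack_from("<H", reloc_data, i)
            kind, offset = entry >> 12, entry & 0x0FFF
            if kind == IMAGE_REL_BASED_DIR64:
                rvas.append(page_rva + offset)
            elif kind != IMAGE_REL_BASED_ABSOLUTE:
                raise ValueError(f"unhandled relocation type {kind}")
        pos += block_size
    return rvas


def relocate(image: bytearray, reloc_data: bytes,
             preferred_base: int, actual_base: int) -> None:
    """Add (actual_base - preferred_base) to every DIR64 slot, modulo 2**64."""
    delta = (actual_base - preferred_base) & MASK64
    for rva in dir64_fixups(reloc_data):
        (value,) = struct.unpack_from("<Q", image, rva)
        struct.pack_into("<Q", image, rva, (value + delta) & MASK64)
```

An image linked with -Wl,--disable-reloc-section carries no such table, so there are no fixups a loader could apply if it had to move the image.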
>>>> Regards,
>>>> Bharat
>>>> On Fri, Dec 30, 2022 at 3:05 AM Carl Shapiro <carl.shapiro at gmail.com>
>>>> wrote:
>>>>> The feature is known as ASLR in both Windows and Linux
>>>>> https://learn.microsoft.com/en-us/windows/security/threat-protection/overview-of-threat-mitigations-in-windows-10#address-space-layout-randomization
>>>>> On Thu, Dec 29, 2022 at 12:43 PM Tim McNerney <mc at media.mit.edu>
>>>>> wrote:
>>>>>> I wonder if you need to *turn off* Windows 10’s new-ish,
>>>>>> nondeterministic memory allocation policy. We have run into this with other
>>>>>> Lisps. I don’t remember the correct terminology for this malware
>>>>>> countermeasure or the name of the configuration flag. Sorry. Can someone
>>>>>> else chime in?
>>>>>> --Tim
>>>>>> On Dec 29, 2022, at 13:49, Bharat Shetty <bshetty at gmail.com> wrote:
>>>>>> Hi,
>>>>>> I built CCL (from the 1.12.1 zip file downloaded from GitHub) on
>>>>>>    - Windows 10 (cygwin)
>>>>>>    - gcc version 11.3.0
>>>>>>    - ld/binutils version 2.39
>>>>>>    - debug flag changed to -g3 in Makefile
>>>>>>    - code optimisation level set to -O0 (zero) also in Makefile
>>>>>> When the exe was built, I got a message that sections /1, /8 and /32
>>>>>> are before text. I altered pei-x86-64.x to include the new debug
>>>>>> sections (up to DWARF 5). For this I generated the default script by
>>>>>> running ld --verbose, copied .spfoo from the original, and removed KEEP()
>>>>>> and most of the SORT() calls unless they were present in the original file.
>>>>>> Besides this I had to add *-no-pie and -Wl,--allow-multiple-definition* to
>>>>>> the build rule for the wx86cl64.exe target. I have not made any changes to
>>>>>> the source code. This produced an exe that starts.
>>>>>> However, every time I run it, it crashes in calculate_relocation in
>>>>>> x86-gc.c. The backtrace is as follows:
>>>>>> #0  0x0000000000031d56 in calculate_relocation () at ../x86-gc.c:1571
>>>>>> #1  0x000000000002ea15 in gc (tcr=0x5acebc0, param=0) at ../gc-common.c:1821
>>>>>> #2  0x000000000003a170 in gc_from_tcr (tcr=0x5acebc0, param=0) at ../x86-exceptions.c:3014
>>>>>> #3  0x000000000003a06b in gc_like_from_xp (xp=0x25f6f570, fun=0x3a126 <gc_from_tcr>, param=0) at ../x86-exceptions.c:2970
>>>>>> #4  0x000000000003a1ce in gc_from_xp (xp=0x25f6f570, param=0) at ../x86-exceptions.c:3026
>>>>>> #5  0x0000000000035c0f in allocate_object (xp=0x25f6f570, bytes_needed=16, disp_from_allocptr=13, tcr=0x5acebc0, crossed_threshold=0x25f6f1ec) at ../x86-exceptions.c:171
>>>>>> #6  0x0000000000036f26 in handle_alloc_trap (xp=0x25f6f570, tcr=0x5acebc0, notify=0x25f6f1ec) at ../x86-exceptions.c:665
>>>>>> #7  0x0000000000037f15 in handle_exception (signum=11, info=0x25f6f4d8, context=0x25f6f570, tcr=0x5acebc0, old_valence=0) at ../x86-exceptions.c:1215
>>>>>> #8  0x0000000000038cf3 in windows_exception_handler (exception_pointers=0x25f6f4c0, tcr=0x5acebc0, signal_number=11) at ../x86-exceptions.c:2150
>>>>>> #9  0x00000000000438dd in windows_switch_to_foreign_stack () at ../x86-asmutils64.s:263
>>>>>> #10 0x0000000025f6f4c0 in ?? ()
>>>>>> This happens after start_lisp is called. By the time the code reaches
>>>>>> gc.c, global_reloctab has been reset (it was set to 0x7cfe000000 before
>>>>>> entering start_lisp). Because of this, GCrelocptr is also set to 0x0 in
>>>>>> gc-common.c, which in turn results in relocptr being set to 0x0 in
>>>>>> calculate_relocation.
>>>>>> GCndynamic_dnodes_in_area is also 0 at this point (in the downloaded
>>>>>> version it is 2048 at this point).
>>>>>> Does anyone know why these global variables are reset, and how I can
>>>>>> fix this? I suspect this is because of the newer versions of gcc and ld.
>>>>>> Regards,
>>>>>> Bharat