[Openmcl-devel] Fwd: freshly built wx86cl64.exe crashes on start

Bharat Shetty bshetty at gmail.com
Tue Jan 3 02:47:24 PST 2023


The following flags are needed by the new (gcc v11.3) compiler driver to
build lisp (along with changes to the linker script):

-no-pie -Wl,--disable-reloc-section -Wl,--allow-multiple-definition

However the crashes still happen. I will raise a support call with
sourceware on this.

Regards,
Bharat

On Sat, Dec 31, 2022 at 4:19 AM Tim McNerney <mc at media.mit.edu> wrote:

> That’s very good news! Now all we need to figure out is either a) how to
> coax the new GNU tool chain into behaving or b) how we can help the GNU
> maintainers reproduce the problem. That CCL is open source helps a lot.
>
> --Tim
>
> On Dec 30, 2022, at 14:16, Bharat Shetty <bshetty at gmail.com> wrote:
>
> 
> I just built and ran ccl successfully on Windows 10 using an old version
> of gcc from https://sysprogs.com/getfile/31/mingw64-gcc4.7.1.exe. I have
> not tested this extensively (e.g. building images or running test code),
> but it is still important: at least we know the issues are due to the
> newer toolchain. We still need to fix this for the latest version of gcc
> in the future.
>
> This has gcc 4.7.1 and binutils 2.23.
>
> For the build I restored the original makefile, so the makefile has no
> options to disable pie/dynamicbase/high_entropy/nxcompat. I then made the
> following changes to the makefile:
>
>    - set CC to the older gcc 4.7.1
>    - set LD to the older one from binutils 2.23
>    - set AS to the (still new) one from binutils 2.39 -- the one from
>    binutils 2.23 was crashing
>    - set the -g3 and -O0 options (just to help me debug better)
>
> Using the newer AS should be OK, as it is only invoked directly to compile
> the .s sources.
>
> Regards,
> Bharat
>
> On Fri, Dec 30, 2022 at 11:12 PM Bharat Shetty <bshetty at gmail.com> wrote:
>
>> Hi Tim,
>>
>> Thanks to all the good people who gave us ccl. And thank you and Carl for
>> responding; it makes me feel less lost :D
>>
>> Luckily for me I have an additional old laptop with Windows 7. I will be
>> back at times with a question or update.
>>
>> Regards,
>> Bharat
>>
>> On Fri, Dec 30, 2022 at 7:29 PM Tim McNerney <mc at media.mit.edu> wrote:
>>
>>> Bharat,
>>>
>>> I should first thank you for taking on this CCL compilation project. It
>>> is important to keep up with OS changes. Your efforts are much appreciated.
>>>
>>> My partial understanding is that ASLR is for protecting *code* from
>>> attack. The other Lisp (Franz) was intolerant to address space
>>> *fragmentation* caused by ASLR, but I suspect there may be other ways
>>> ASLR can potentially violate assumptions made by a garbage collected
>>> language. That said, something *else* may be going on here.
>>>
>>> I feel like a working *baseline* would help you. By that I mean, start
>>> by finding an OS environment where the compilation *does* work, and
>>> compare to the broken results, striving to reduce variables.
>>>
>>> If you don’t have enough hardware to, say, run both Windows 10 and
>>> Windows 7 simultaneously, use a free or cheap VM (e.g. VMWare Player). I
>>> know AWS charges by the hour of usage (so not free), but they have a large
>>> selection of prebuilt images that will save you time. Or, since you are
>>> focusing on Windows, maybe Azure offers the same benefit. Maybe they offer
>>> promotional prices for developers.
>>>
>>> It is possible you have found an incompatibility with the GNU/Cygwin
>>> tool chain (maybe little to do with Windows). Explore which *version*
>>> works for CCL. Again, establish a baseline that gives you working code,
>>> and inch forward from there.
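The "baseline, then inch forward" idea can be sketched as a bisection over an ordered list of toolchain versions. This is only an illustration: `first_bad_version` and the `works` callback are hypothetical names, and the sketch assumes that the first version in the list is known to work, the last is known to fail, and that once a version fails every later one fails too. In the demo only the first and last labels are versions mentioned in this thread; the others are placeholders.

```python
from typing import Callable, Sequence


def first_bad_version(versions: Sequence[str],
                      works: Callable[[str], bool]) -> str:
    """Return the earliest version for which works() is False.

    Assumes versions[0] works, versions[-1] fails, and that the
    working/failing boundary is crossed only once.
    """
    lo, hi = 0, len(versions) - 1      # invariant: lo works, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            lo = mid                   # boundary is to the right of mid
        else:
            hi = mid                   # boundary is at mid or to its left
    return versions[hi]


if __name__ == "__main__":
    # "gcc-A".."gcc-C" are placeholders for whatever versions are installed.
    candidates = ["gcc-4.7.1", "gcc-A", "gcc-B", "gcc-C", "gcc-11.3.0"]
    # Stand-in for "build the kernel with this toolchain and see if it starts".
    pretend_ok = {"gcc-4.7.1", "gcc-A"}
    print(first_bad_version(candidates, lambda v: v in pretend_ok))
```

In practice `works` would wrap a full build-and-run cycle, so the number of probes matters: bisection needs about log2(n) builds instead of n.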
>>>
>>> Good luck, or in Gerry Sussman’s words “good skill!”
>>>
>>> --Tim
>>>
>>> On Dec 30, 2022, at 07:55, Bharat Shetty <bshetty at gmail.com> wrote:
>>>
>>> 
>>> Hi,
>>>
>>> I had not previously mentioned the reason I added -no-pie. The
>>> executable used to wperror out after calling VirtualProtect in the
>>> pmcl-kernel.c:remap_spjump() function. The error was
>>> ERROR_INVALID_ADDRESS 487 (0x1E7). After I added -no-pie (strangely,
>>> -Wl,--no-pie or -fno-pie doesn't work) this got sorted. Some people say
>>> this disables ASLR, but the GNU ld documentation is not very clear on
>>> this. However, VirtualProtect working means the flag is needed.
>>>
>>> Today I added the following flags:
>>>
>>>    - -Wl,--disable-high-entropy-va ;; --high-entropy-va - Image is
>>>    compatible with 64-bit address space layout randomization (ASLR). This
>>>    option is enabled by default for 64-bit PE images.
>>>    - -Wl,--disable-dynamicbase ;; --dynamicbase - The image base
>>>    address may be relocated using address space layout randomization (ASLR).
>>>    This feature was introduced with MS Windows Vista for i386 PE targets.
>>>    - -Wl,--disable-nxcompat ;; --nxcompat - The image is compatible
>>>    with the Data Execution Prevention.
>>>
>>> However, these do not change anything, and wx86cl64 crashes at
>>> exactly the same position (calculate_relocation () at ../x86-gc.c:1571).
>>> When I debug the executable, the image base address (0x12000) and the text
>>> start address (0x21000), amongst other addresses, are the same across
>>> multiple executions. Would these not change if ASLR were active?
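One way to check what the linker actually recorded, independent of what the debugger shows, is to read the relevant bits straight out of the PE header. The sketch below is a minimal reader, not a full PE parser; the function name `aslr_flags` is made up for this example, and the field offsets and bit values are the ones defined by the PE/COFF format (DllCharacteristics at offset 70 of the optional header, the "relocations stripped" bit in the COFF Characteristics field).

```python
import struct
import sys

# DllCharacteristics bits (optional header).
HIGH_ENTROPY_VA = 0x0020
DYNAMIC_BASE = 0x0040
NX_COMPAT = 0x0100
# COFF Characteristics bit: base relocations were removed from the file.
RELOCS_STRIPPED = 0x0001


def aslr_flags(data: bytes) -> dict:
    """Report the ASLR/DEP-related header bits of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)      # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_off + 4                                      # 20-byte COFF header
    (characteristics,) = struct.unpack_from("<H", data, coff + 18)
    opt = coff + 20                                        # optional header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x20B:                                     # PE32+ (64-bit)
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    elif magic == 0x10B:                                   # PE32 (32-bit)
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    else:
        raise ValueError("unknown optional header magic 0x%x" % magic)
    (dll_chars,) = struct.unpack_from("<H", data, opt + 70)
    return {
        "image_base": image_base,
        "relocs_stripped": bool(characteristics & RELOCS_STRIPPED),
        "high_entropy_va": bool(dll_chars & HIGH_ENTROPY_VA),
        "dynamic_base": bool(dll_chars & DYNAMIC_BASE),
        "nx_compat": bool(dll_chars & NX_COMPAT),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python thisfile.py path/to/image.exe
    with open(sys.argv[1], "rb") as f:
        for key, value in aslr_flags(f.read()).items():
            print(key, hex(value) if key == "image_base" else value)
```

Running this on both the downloaded binary and the freshly built one would show whether the two images differ in these bits at all, which helps separate "ASLR is involved" from "something else changed in gcc/ld".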
>>>
>>> This may not be related to ASLR, as ASLR changes the image base and
>>> section start addresses at load time. As far as I understand, ASLR does
>>> not change the base during execution. Whatever is causing this is a
>>> change in the behaviour of gcc/ld. Microsoft's site clearly mentions that
>>> ASLR is to be opted into by the developer. If this were an OS change, the
>>> downloaded bits should have the same issue. Another question: if ASLR in
>>> Windows were to cause re-initialisation of globals, who could use it?
>>>
>>> Regards,
>>> Bharat
>>>
>>>
>>> On Fri, Dec 30, 2022 at 3:05 AM Carl Shapiro <carl.shapiro at gmail.com>
>>> wrote:
>>>
>>>> The feature is known as ASLR in both Windows and Linux
>>>>
>>>>
>>>> https://learn.microsoft.com/en-us/windows/security/threat-protection/overview-of-threat-mitigations-in-windows-10#address-space-layout-randomization
>>>>
>>>> On Thu, Dec 29, 2022 at 12:43 PM Tim McNerney <mc at media.mit.edu> wrote:
>>>>
>>>>> I wonder if you need to *turn off* Windows 10’s new-ish,
>>>>> nondeterministic memory allocation policy. We have run into this with other
>>>>> Lisps. I don’t remember the correct terminology for this malware
>>>>> countermeasure or the name of the configuration flag. Sorry. Can someone
>>>>> else chime in?
>>>>>
>>>>> --Tim
>>>>>
>>>>> On Dec 29, 2022, at 13:49, Bharat Shetty <bshetty at gmail.com> wrote:
>>>>>
>>>>> 
>>>>> Hi,
>>>>>
>>>>> I built ccl (downloaded the 1.12.1 zip file from github) on
>>>>>
>>>>>    - Windows 10 (cygwin)
>>>>>    - gcc version 11.3.0
>>>>>    - ld/binutils version 2.39
>>>>>    - debug flag changed to -g3 in Makefile
>>>>>    - code optimisation level set to -O0 (zero) also in Makefile
>>>>>
>>>>>
>>>>> When the exe was built I got a message that sections /1, /8 and /32 are
>>>>> before text. I altered pei-x86-64.x to include the new debug
>>>>> sections (up to DWARF 5). For this I generated the default script by
>>>>> running ld --verbose, copied .spfoo from the original, and removed KEEP()
>>>>> and most of the SORT() calls unless they were present in the original
>>>>> file. Besides this I had to add *-no-pie and
>>>>> -Wl,--allow-multiple-definition* to the build rule for the wx86cl64.exe
>>>>> target. I have not made any changes to the source code. This produced an
>>>>> exe that starts.
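Section names of the form /N look like COFF long-name references: when a section name does not fit in the 8-byte name field, GNU tools store it in the COFF string table and put "/" plus a decimal offset in the header instead. Assuming that is what the message refers to, the sketch below lists the sections of a PE image in address order with such names resolved, and reports which ones sit below .text. The function names are invented for this example; the header layout is the standard PE/COFF one.

```python
import struct
import sys


def list_sections(data: bytes) -> list:
    """Return (name, virtual_address, virtual_size) for every section,
    sorted by virtual address, with "/N" long names resolved."""
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)       # e_lfanew
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    coff = pe_off + 4
    _machine, nsects, _stamp, symtab, nsyms, opt_size, _chars = \
        struct.unpack_from("<HHIIIHH", data, coff)
    # The string table follows the symbol table (18 bytes per symbol).
    strtab = symtab + nsyms * 18 if symtab else None
    first = coff + 20 + opt_size                            # first section header
    sections = []
    for i in range(nsects):
        raw, vsize, vaddr = struct.unpack_from("<8sII", data, first + i * 40)
        name = raw.rstrip(b"\0").decode("ascii", "replace")
        if strtab is not None and name.startswith("/") and name[1:].isdigit():
            start = strtab + int(name[1:])                  # offset into string table
            end = data.index(b"\0", start)
            name = data[start:end].decode("ascii", "replace")
        sections.append((name, vaddr, vsize))
    return sorted(sections, key=lambda s: s[1])


def sections_before_text(sections: list) -> list:
    """Names of the sections whose virtual address is lower than .text."""
    text_va = next(va for name, va, _ in sections if name == ".text")
    return [name for name, va, _ in sections if va < text_va]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python thisfile.py path/to/image.exe
    with open(sys.argv[1], "rb") as f:
        secs = list_sections(f.read())
    for name, vaddr, vsize in secs:
        print("%-20s va=0x%08x size=0x%x" % (name, vaddr, vsize))
    print("before .text:", sections_before_text(secs))
```

If the resolved names turn out to be .debug_* sections, that would be consistent with the linker script needing entries for the newer DWARF sections.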
>>>>>
>>>>> However every time I run this, it crashes at calculate_relocation in
>>>>> x86-gc.c. The back trace is as follows:
>>>>> #0  0x0000000000031d56 in calculate_relocation () at ../x86-gc.c:1571
>>>>> #1  0x000000000002ea15 in gc (tcr=0x5acebc0, param=0) at ../gc-common.c:1821
>>>>> #2  0x000000000003a170 in gc_from_tcr (tcr=0x5acebc0, param=0) at ../x86-exceptions.c:3014
>>>>> #3  0x000000000003a06b in gc_like_from_xp (xp=0x25f6f570, fun=0x3a126 <gc_from_tcr>, param=0) at ../x86-exceptions.c:2970
>>>>> #4  0x000000000003a1ce in gc_from_xp (xp=0x25f6f570, param=0) at ../x86-exceptions.c:3026
>>>>> #5  0x0000000000035c0f in allocate_object (xp=0x25f6f570, bytes_needed=16, disp_from_allocptr=13, tcr=0x5acebc0, crossed_threshold=0x25f6f1ec) at ../x86-exceptions.c:171
>>>>> #6  0x0000000000036f26 in handle_alloc_trap (xp=0x25f6f570, tcr=0x5acebc0, notify=0x25f6f1ec) at ../x86-exceptions.c:665
>>>>> #7  0x0000000000037f15 in handle_exception (signum=11, info=0x25f6f4d8, context=0x25f6f570, tcr=0x5acebc0, old_valence=0) at ../x86-exceptions.c:1215
>>>>> #8  0x0000000000038cf3 in windows_exception_handler (exception_pointers=0x25f6f4c0, tcr=0x5acebc0, signal_number=11) at ../x86-exceptions.c:2150
>>>>> #9  0x00000000000438dd in windows_switch_to_foreign_stack () at ../x86-asmutils64.s:263
>>>>> #10 0x0000000025f6f4c0 in ?? ()
>>>>>
>>>>> This happens after start_lisp is called. By the time the code reaches
>>>>> gc.c, global_reloctab has been reset (it is set to 0x7cfe000000 before
>>>>> entering start_lisp). Because of this, GCrelocptr is also set to 0x0 in
>>>>> gc-common.c, which results in relocptr being set to 0x0 in
>>>>> calculate_relocation. GCndynamic_dnodes_in_area is also 0 at this point
>>>>> (in the downloaded version it is 2048 at this point).
>>>>>
>>>>> Does anyone know why these global variables are reset? And how can I
>>>>> fix this? I suspect this is because of the newer versions of gcc and ld.
>>>>>
>>>>> Regards,
>>>>> Bharat
>>>>>
>>>>>