[Openmcl-devel] Fwd: freshly built wx86cl64.exe crashes on start

Bharat Shetty bshetty at gmail.com
Fri Dec 30 11:16:07 PST 2022


I just built and ran ccl successfully on Windows 10 using an old version of
gcc from https://sysprogs.com/getfile/31/mingw64-gcc4.7.1.exe. I have not
extensively tested this (like building images or running test code), but
this is still important: at least we know the issues are due to the newer
toolchain. We will need to fix this for the latest version of gcc in
the future.

This has gcc4.7.1 and binutils2.23.

For the build I restored the original makefile, so the makefile
doesn't have options to disable pie/dynamicbase/high_entropy/nxcompat. I
made the following changes to the makefile:

   - set CC to the older gcc4.7.1
   - set LD to the older one from binutils2.23
   - set AS to the (still new) one from binutils2.39 -- the one from
   binutils2.23 was crashing
   - set -g3, -O0 options (just to help me debug better)

Using the newer AS should be OK as it is only invoked directly to assemble
the .s sources.
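
Since this build does not pass any of the options that disable
dynamicbase/high_entropy/nxcompat, one way to see which of these flags the
linker actually recorded in a given exe is to read the DllCharacteristics
field from its PE header. The following is only a minimal sketch, not
something from the ccl tree: pe_aslr_flags is a hypothetical helper name,
and the field offsets are taken from the PE/COFF layout (DllCharacteristics
at offset 70 of the optional header).

```python
import struct

# DllCharacteristics bits as defined in the PE/COFF format.
HIGH_ENTROPY_VA = 0x0020
DYNAMIC_BASE = 0x0040
NX_COMPAT = 0x0100


def pe_aslr_flags(data):
    """Return the image base and ASLR/DEP-related flags of a PE image.

    `data` is the raw bytes of the .exe (at least its headers).
    pe_aslr_flags is a hypothetical helper, not part of ccl.
    """
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: file offset of the PE signature, stored at 0x3C.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    opt = pe_off + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x20B:
        # PE32+ (64-bit): 8-byte ImageBase at offset 24.
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    elif magic == 0x10B:
        # PE32 (32-bit): 4-byte ImageBase at offset 28.
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    else:
        raise ValueError("unknown optional header magic")
    # DllCharacteristics sits at offset 70 in both header layouts.
    (chars,) = struct.unpack_from("<H", data, opt + 70)
    return {
        "image_base": image_base,
        "high_entropy_va": bool(chars & HIGH_ENTROPY_VA),
        "dynamic_base": bool(chars & DYNAMIC_BASE),
        "nx_compat": bool(chars & NX_COMPAT),
    }
```

Calling pe_aslr_flags(open("wx86cl64.exe", "rb").read()) on the exe built
with each toolchain would show whether the two linkers differ in the flags
they set. The file name is just the kernel name used in this thread.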

Regards,
Bharat

On Fri, Dec 30, 2022 at 11:12 PM Bharat Shetty <bshetty at gmail.com> wrote:

> Hi Tim,
>
> Thanks to all the good people who gave us ccl. And thank you and Carl for
> responding; it makes me feel less lost :D
>
> Luckily for me I have an additional old laptop with Windows7. I will be
> back at times with a question or update.
>
> Regards,
> Bharat
>
> On Fri, Dec 30, 2022 at 7:29 PM Tim McNerney <mc at media.mit.edu> wrote:
>
>> Bharat,
>>
>> I should first thank you for taking on this CCL compilation project. It
>> is important to keep up with OS changes. Your efforts are much appreciated.
>>
>> My partial understanding is that ASLR is for protecting *code* from
>> attack. The other Lisp (Franz) was intolerant to address space
>> *fragmentation* caused by ASLR, but I suspect there may be other ways
>> ASLR can potentially violate assumptions made by a garbage collected
>> language. That said, something *else* may be going on here.
>>
>> I feel like a working *baseline* would help you. By that I mean, start
>> by finding an OS environment where the compilation *does* work, and
>> compare to the broken results, striving to reduce variables.
>>
>> If you don’t have enough hardware to, say, run both Windows 10 and
>> Windows 7 simultaneously, use a free or cheap VM (e.g. VMWare Player). I
>> know AWS charges by the hour of usage (so not free), but they have a large
>> selection of prebuilt images that will save you time. Or, since you are
>> focusing on Windows, maybe Azure offers the same benefit. Maybe they offer
>> promotional prices for developers.
>>
>> It is possible you have found an incompatibility with the gnu/cygwin tool
>> chain (maybe little to do with Windows). Explore which *version* works
>> for CCL. Again, establishing a baseline that gives you working code, and
>> inching forward from there.
>>
>> Good luck, or in Gerry Sussman’s words “good skill!”
>>
>> --Tim
>>
>> On Dec 30, 2022, at 07:55, Bharat Shetty <bshetty at gmail.com> wrote:
>>
>> 
>> Hi,
>>
>> I had not mentioned previously the reason I had added -no-pie. The
>> executable used to wperror out after calling VirtualProtect in the
>> pmcl-kernel.c:remap_spjump() function. The error was
>> ERROR_INVALID_ADDRESS 487 (0x1E7). After I added -no-pie (strangely,
>> -Wl,--no-pie or -fno-pie doesn't work) this got sorted. Some people say
>> this disables ASLR; however, the gnu ld doc is not very clear on this.
>> Still, VirtualProtect working means this is needed.
>>
>> Today I added the following flags:
>>
>>    - -Wl,--disable-high-entropy-va  ;; --high-entropy-va - Image is
>>    compatible with 64-bit address space layout randomization (ASLR). This
>>    option is enabled by default for 64-bit PE images.
>>    - -Wl,--disable-dynamicbase ;; --dynamicbase - The image base address
>>    may be relocated using address space layout randomization (ASLR). This
>>    feature was introduced with MS Windows Vista for i386 PE targets.
>>    - -Wl,--disable-nxcompat ;; --nxcompat - The image is compatible with
>>    the Data Execution Prevention.
>>
>> However, these do not change anything and wx86cl64 crashes at
>> exactly the same position (calculate_relocation () at ../x86-gc.c:1571).
>> When I debug the executable, the image base address (0x12000) and text
>> start address (0x21000), amongst other addresses, are the same across
>> multiple executions. Would these not change if ASLR is active?
>>
>> This may not be related to ASLR, as ASLR changes the image base and
>> section start addresses at load time. As far as I understand, ASLR does
>> not change the base during execution. Whatever is causing this is changed
>> behaviour of gcc/ld. Microsoft's site clearly mentions that ASLR is to be
>> opted in by the developer. If this was an OS change, the downloaded bits
>> should have the same issue. Another question: if ASLR in Windows were to
>> cause re-initialisation of globals, who could use it?
>>
>> Regards,
>> Bharat
>>
>>
>> On Fri, Dec 30, 2022 at 3:05 AM Carl Shapiro <carl.shapiro at gmail.com>
>> wrote:
>>
>>> The feature is known as ASLR in both Windows and Linux
>>>
>>>
>>> https://learn.microsoft.com/en-us/windows/security/threat-protection/overview-of-threat-mitigations-in-windows-10#address-space-layout-randomization
>>>
>>> On Thu, Dec 29, 2022 at 12:43 PM Tim McNerney <mc at media.mit.edu> wrote:
>>>
>>>> I wonder if you need to *turn off* Windows 10’s new-ish,
>>>> nondeterministic memory allocation policy. We have run into this with other
>>>> Lisps. I don’t remember the correct terminology for this malware
>>>> countermeasure or the name of the configuration flag. Sorry. Can someone
>>>> else chime in?
>>>>
>>>> --Tim
>>>>
>>>> On Dec 29, 2022, at 13:49, Bharat Shetty <bshetty at gmail.com> wrote:
>>>>
>>>> 
>>>> Hi,
>>>>
>>>> I built ccl (downloaded the 1.12.1 zip file from github) on
>>>>
>>>>    - Windows 10 (cygwin)
>>>>    - gcc version 11.3.0
>>>>    - ld/binutils version 2.39
>>>>    - debug flag changed to -g3 in Makefile
>>>>    - code optimisation level set to -O0 (zero) also in Makefile
>>>>
>>>>
>>>> When the exe was built I got a message that sections /1, /8 and /32 are
>>>> before text. I altered pei-x86-64.x to include the new debug
>>>> sections (up to dwarf 5). For this I generated the default script by
>>>> running ld --verbose, retained .spfoo (copied from the original), and
>>>> removed KEEP() and most of the SORT() unless it was present in the
>>>> original file. Besides this
>>>> I had to add *-no-pie and -Wl,--allow-multiple-definition* to the
>>>> build rule for the wx86cl64.exe target. I have not made any changes to
>>>> the source code. This got an exe that starts.
>>>>
>>>> However every time I run this, it crashes at calculate_relocation in
>>>> x86-gc.c. The back trace is as follows:
>>>> #0  0x0000000000031d56 in calculate_relocation () at ../x86-gc.c:1571
>>>> #1  0x000000000002ea15 in gc (tcr=0x5acebc0, param=0) at
>>>> ../gc-common.c:1821
>>>> #2  0x000000000003a170 in gc_from_tcr (tcr=0x5acebc0, param=0) at
>>>> ../x86-exceptions.c:3014
>>>> #3  0x000000000003a06b in gc_like_from_xp (xp=0x25f6f570, fun=0x3a126
>>>> <gc_from_tcr>, param=0) at ../x86-exceptions.c:2970
>>>> #4  0x000000000003a1ce in gc_from_xp (xp=0x25f6f570, param=0) at
>>>> ../x86-exceptions.c:3026
>>>> #5  0x0000000000035c0f in allocate_object (xp=0x25f6f570,
>>>> bytes_needed=16, disp_from_allocptr=13, tcr=0x5acebc0,
>>>> crossed_threshold=0x25f6f1ec) at ../x86-exceptions.c:171
>>>> #6  0x0000000000036f26 in handle_alloc_trap (xp=0x25f6f570,
>>>> tcr=0x5acebc0, notify=0x25f6f1ec) at ../x86-exceptions.c:665
>>>> #7  0x0000000000037f15 in handle_exception (signum=11, info=0x25f6f4d8,
>>>> context=0x25f6f570, tcr=0x5acebc0, old_valence=0) at
>>>> ../x86-exceptions.c:1215
>>>> #8  0x0000000000038cf3 in windows_exception_handler
>>>> (exception_pointers=0x25f6f4c0, tcr=0x5acebc0, signal_number=11) at
>>>> ../x86-exceptions.c:2150
>>>> #9  0x00000000000438dd in windows_switch_to_foreign_stack () at
>>>> ../x86-asmutils64.s:263
>>>> #10 0x0000000025f6f4c0 in ?? ()
>>>>
>>>> This happens after start_lisp is called. By the time the code reaches
>>>> gc.c, global_reloctab is reset (it was set to 0x7cfe000000 before
>>>> entering start_lisp). Due to this GCrelocptr is also set to 0x0 in
>>>> gc-common.c. This results in relocptr being set to 0x0 in
>>>> calculate_relocation. GCndynamic_dnodes_in_area is also 0 at this point
>>>> (in the downloaded version it is 2048 at this point).
>>>>
>>>> Does anyone know why these global variables are reset? And how can I
>>>> fix this? I suspect this is because of the newer versions of gcc and ld.
>>>>
>>>> Regards,
>>>> Bharat
>>>>
>>>>