[Openpvrsgx-devgroup] [PATCH] Update pvrsgx 1.14.3759903 to latest kernel
Lucas Fryzek
lucas.fryzek at hazeco.xyz
Mon Nov 15 00:06:09 CET 2021
>
>>
>>> FYI: I looked into
>>>
>>> git diff --no-index
>>> drivers/gpu/drm/pvrsgx/1.14.3699939/eurasia_km/services4/srvkm/env/linux/osfunc.c
>>> drivers/gpu/drm/pvrsgx/1.14.3759903/eurasia_km/services4/srvkm/env/linux/osfunc.c
>>>
>>> and there seem to be some different fixes for 1.14.3699939.
>>>
>>> Mainly in OSAcquirePhysPageAddr() which uses
>>>
>>> - psInfo->iNumPagesMapped = get_user_pages_remote(current->mm,
>>> uStartAddr, psInfo->iNumPages, FOLL_WRITE, psInfo->ppsPages,
>>> NULL, NULL);
>>>
>>> since LINUX_VERSION_CODE >= KERNEL_VERSION(5,9,0)
>>>
>>> instead of
>>>
>>> + psInfo->iNumPagesMapped = get_user_pages(
>>> + uStartAddr, psInfo->iNumPages, 1, psInfo->ppsPages,
>>> NULL);
>>>
>>> But don't ask me what the difference is and if it is the root cause
>>> of my kernel panic...
>>>
>>> BR,
>>> Nikolaus
>>>
>>
>> Thanks for all the additional information! I was able to trigger
>> the crash by compiling the driver as a module (i.e.
>> `CONFIG_SGX_JZ4780=m`); it appears that the default letux_defconfig
>> doesn't do this. I am now seeing the same behavior and am trying to
>> debug it. I don't believe `OSAcquirePhysPageAddr` is causing the
>> issue here: I tried taking the changes from `1.14.3699939` and that
>> didn't help. I am going to figure out which value triggers the
>> paging exception and work backwards from there to see what's going
>> wrong.
>>
>> Also, I did have a case where it decided to boot up fine without an
>> error, and I saw the same results, with `pvrsrvctl` reporting that
>> it was happy with the KMOD and then triggering a kernel crash.
>>
>
> After more investigation it appears that there are two problems:
>
> The first problem is that when the driver fails to init on boot, the
> `struct device *dev` pointer coming from `PVRLDMGetDevice` appears
> to be NULL.
>
> The second problem is that `dma_sync_single_for_device` does not
> appear to be happy with the address being given to it. I suspect this
> is a problem with how the driver is allocating memory. The
> `dma_sync_*` APIs are intended to be used with memory allocated from
> `dma_alloc_*`.
> In the code I can see this comment, which is originally from the
> unmodified kernel code that IMGTEC shipped:
>
> ```
> /*
> * dmac cache functions are supposed to be used for dma
> * memory which comes from dma-able memory. However examining
> * the implementation of dmac cache functions and experimenting,
> * can assert that dmac functions are safe to use for high-mem
> * memory as well for our OS{Clean/Flush/Invalidate}Cache functions
> *
> */
> ```
>
> I suspect this comment is no longer true, and if we want to use the
> DMA cache ops we need to allocate memory using the DMA allocation
> functions instead of `vmalloc` or a modified version of `vmalloc`.
> This doesn't seem to be a trivial change though, as it's difficult to
> tell whether `OSAllocPages_Impl` is used only for DMA memory or for
> regular kernel memory as well. There is probably a viable strategy of
> using the MIPS platform-specific cache operations in
> `include/asm/cache*` instead of the `dma` ones. I don't think this
> would be a problem since the code for this section is already wrapped
> in a `#elif defined(__mips__)`. However, if the goal of this project
> is to get this kernel module upstreamed, I suspect the kernel
> maintainers would prefer that we avoid platform-specific `ifdefs` and
> instead use the common, platform-independent kernel infrastructure.
>
> For fun I just removed the call to `dma_sync_single_for_device` to
> see what would happen and the error went away, allowing `pvrsrvctl`
> to get further along until it printed these errors
>
> ```
> [ 89.495046] PVR_K: (FAIL) SGXInit: Mismatch in client-side and KM driver build options.
> [ 89.531664] PVR_K: Extra options present in client-side driver: (0xa100). Please check sgx_options.h
> [ 89.569613] PVR_K:(Error): PVRSRVFinaliseSystem: Failed PVRSRVDevInitCompatCheck call (device index: 0)
> [ 89.608249] PVR_K:(Error): BridgedDispatchKM: Initialisation failed. Driver unusable.
> pvrsrvctl: SrvInit failed (already initialized?) (err=PVRSRV_ERROR_BUILD_MISMATCH
> ```
>
> So after figuring out a fix for these cache operations, it looks
> like the next step is spoofing the driver build options to match
> what the userland expects :)
>
So I modified `arch/mips/mm/cache.c` to export all the cache ops and
used them instead, and I'm having some success. I can get the driver to
initialize fine; I had to define `IGNORE_SGX_INIT_COMPATIBILITY_CHECK`
to prevent the driver from failing when `pvrsrvctl` ran.
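
For anyone who would rather chase the build-option mismatch than ignore
it: the `0xa100` in the log quoted above means the client side has
option bits set that the KM side lacks. Below is a minimal userspace
sketch of that comparison, assuming the check boils down to a plain
bitmask difference (I have not confirmed how `SGXInit` actually computes
it). The function names are made up for illustration and are not part
of the driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bits set on the client side that the
 * kernel-mode (KM) side does not have. */
uint32_t extra_client_options(uint32_t client, uint32_t km)
{
    return client & ~km;
}

/* Hypothetical helper: store the positions of the set bits of mask
 * in bits[], lowest first, and return how many were found. */
int set_bit_positions(uint32_t mask, int bits[32])
{
    int n = 0;

    for (int i = 0; i < 32; i++) {
        if (mask & (UINT32_C(1) << i))
            bits[n++] = i;
    }
    return n;
}

/* Print the mask and its bit positions so they can be compared by
 * hand against the option definitions in sgx_options.h. */
void print_extra_options(uint32_t mask)
{
    int bits[32];
    int n = set_bit_positions(mask, bits);

    printf("extra client options: 0x%x, bits:", (unsigned)mask);
    for (int i = 0; i < n; i++)
        printf(" %d", bits[i]);
    printf("\n");
}
```

Calling `print_extra_options(0xa100)` reports bits 8, 13 and 15. Which
SGX build option each of those bits stands for still has to be looked
up in `sgx_options.h`.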

It looks like `pvrsrvctl --start --no-module` will successfully run
now, but I'm not actually sure how to validate whether it is working
properly. The return code is 0 and no errors are printed to the
console.

One other thing of note is that since switching to version
`1.14.3759903` of the kernel mode driver, Xorg seems to fail when
starting up. I've linked my Xorg log below, but the key error seems
to be:

Fatal server error:
[ 52.758] (EE) Cannot run in framebuffer mode. Please specify busIDs for all framebuffer devices

https://pastebin.com/XG0f7S1Y

I'm not sure if anyone else has run into this in the past, but if they
have, I'm all ears for ideas to try.