[Openpvrsgx-devgroup] [PATCH] Update pvrsgx 1.14.3759903 to latest kernel

Lucas Fryzek lucas.fryzek at hazeco.xyz
Mon Nov 15 00:06:09 CET 2021



> 
>> 
>>> FYI: I looked into
>>> 
>>> git diff --no-index 
>>> drivers/gpu/drm/pvrsgx/1.14.3699939/eurasia_km/services4/srvkm/env/linux/osfunc.c 
>>> drivers/gpu/drm/pvrsgx/1.14.3759903/eurasia_km/services4/srvkm/env/linux/osfunc.c
>>> 
>>> and there seems to be some different fixed for 1.14.3699939.
>>> 
>>> Mainly in OSAcquirePhysPageAddr() which uses
>>> 
>>> -    psInfo->iNumPagesMapped = get_user_pages_remote(current->mm, 
>>> uStartAddr, psInfo->iNumPages, FOLL_WRITE, psInfo->ppsPages, 
>>> NULL, NULL);
>>> 
>>> since LINUX_VERSION_CODE >= KERNEL_VERSION(5,9,0)
>>> 
>>> instead of
>>> 
>>> +    psInfo->iNumPagesMapped = get_user_pages(
>>> +               uStartAddr, psInfo->iNumPages, 1, psInfo->ppsPages, 
>>> NULL);
>>> 
>>> But don't ask me what the difference is and if it is the root cause 
>>> of my kernel panic...
>>> 
>>> BR,
>>> Nikolaus
>>> 
>> 
>> Thanks for the all the additional information! I was able to trigger 
>> the crash by compiling the driver as a module (i.e. 
>> `CONFIG_SGX_JZ4780=m`), it appears that the default letux_defconfig 
>> doesn't do this. I am now seeing the same behavior, and trying to 
>> debug it. I don't believe `OSAcquirePhysPageAddr` is causing the 
>> issue here, I tried taking the changes from `1.14.3699939` and that 
>> didn't help. I am going to figure out which value triggers the 
>> paging exception and work backwards from there to see whats going 
>> wrong.
>> 
>> Also I did have a case where it decided to boot up fine without an 
>> error, and I saw the same results with `pvrsrvctrl` reporting that 
>> it was happy with the KMOD and then triggering a kernel crash.
>> 
> 
> After more investigation it appears that there are two problems:
> 
> First problem is that when the driver fails to init on boot it 
> appears that `struct device *dev` pointer coming from 
> `PVRLDMGetDevice` is returning NULL.
> 
> The second problem is that `dma_sync_single_for_device` does appear 
> to be happy with the address being given to it. I suspect this is a 
> problem with how the driver is allocating memory. The `dma_sync_*` 
> APIs are intended to be used with memory allocated from `dma_alloc_*`
> 
> In the code I can see this comment which originally from the 
> unmodified kernel code that IMGTEC shipped
> 
> ```
> /*
> * dmac cache functions are supposed to be used for dma
> * memory which comes from dma-able memory. However examining
> * the implementation of dmac cache functions and experimenting,
> * can assert that dmac functions are safe to use for high-mem
> * memory as well for our OS{Clean/Flush/Invalidate}Cache functions
> *
> */
> ```
> 
> I suspect this comment is no longer true, and if we want to use the 
> DMA cache ops we need to allocate memory using the dma allocate 
> functions, instead of using `vmalloc` or a modified version of 
> `vmalloc`. This seems to be trivial change though as its difficult to 
> tell if `OSAllocPages_Impl` is just used for DMA memory, or just 
> kernel memory as well. There is probably a viable strategy of using 
> the mips platform specific cache operations in `include/asm/cache*` 
> instead of the `dma` ones. I don't think this would be a problem 
> since the code for section is already wrapped in a `#elif 
> defined(__mips__)`. Although if the goal of this project is to get 
> this kernel module upstreamed, I suspect the kernel maintainers would 
> prefer if we didn't use platform specific `ifdefs` and instead used 
> the common kernel infrastructure that is platform independent.
> 
> For fun I just removed the call to `dma_sync_single_for_device` to 
> see what would happen and the error went away, allowing `pvrsrvctl` 
> to get further along until it printed these errors
> 
> ```
> [   89.495046] PVR_K: (FAIL) SGXInit: Mismatch in client-side and KM 
> driver build options.
> [   89.531664] PVR_K: Extra options present in client-side driver: 
> (0xa100). Please check sgx_options.h
> [   89.569613] PVR_K:(Error): PVRSRVFinaliseSystem: Failed 
> PVRSRVDevInitCompatCheck call (device index: 0)
> [   89.608249] PVR_K:(Error): BridgedDispatchKM: Initialisation 
> failed. Driver unusable.
> pvrsrvctl: SrvInit failed (already initialized?) 
> (err=PVRSRV_ERROR_BUILD_MISMATCH
> ```
> 
> So after figuring out a fix for these cache operations, looks like 
> next steps is spoofing the driver build options to match what the 
> userland expects :)
> 

So I modified `arch/mips/mm/cache.c` to export all the cache ops, and 
used them instead and I'm having some success. I can get the driver to 
initialize fine, I had to define `IGNORE_SGX_INIT_COMPATIBILITY_CHECK` 
to prevent the driver from failing when `pvrsrvctl` ran.

It looks like `pvrsrvctl --start --no-module` will successfully run 
now, but I'm not actually sure how to validate if it working properly. 
It looks like the return code is 0 and no errors are generated and 
printed to the console.

One other thing of note is that since switching to version 
`1.14.3759903` of the kernel mode driver Xorg seems to fail when 
starting up. I've linked to my xorg log below but the key error seems 
to be:

Fatal server error:
[ 52.758] (EE) Cannot run in framebuffer mode. Please specify busIDs 
for all framebuffer devices

https://pastebin.com/XG0f7S1Y

I'm not sure if anyone else has run into this in the past, but if they 
have I'm all ears for ideas to try.





More information about the openpvrsgx-devgroup mailing list