[Openpvrsgx-devgroup] [PATCH] Update pvrsgx 1.14.3759903 to latest kernel

Lucas Fryzek lucas.fryzek at hazeco.xyz
Mon Nov 15 00:06:09 CET 2021

>>> FYI: I looked into
>>> git diff --no-index 
>>> drivers/gpu/drm/pvrsgx/1.14.3699939/eurasia_km/services4/srvkm/env/linux/osfunc.c 
>>> drivers/gpu/drm/pvrsgx/1.14.3759903/eurasia_km/services4/srvkm/env/linux/osfunc.c
>>> and there seems to be some different fixed for 1.14.3699939.
>>> Mainly in OSAcquirePhysPageAddr() which uses
>>> -    psInfo->iNumPagesMapped = get_user_pages_remote(current->mm, 
>>> uStartAddr, psInfo->iNumPages, FOLL_WRITE, psInfo->ppsPages, 
>>> NULL, NULL);
>>> instead of
>>> +    psInfo->iNumPagesMapped = get_user_pages(
>>> +               uStartAddr, psInfo->iNumPages, 1, psInfo->ppsPages, 
>>> NULL);
>>> But don't ask me what the difference is and if it is the root cause 
>>> of my kernel panic...
>>> BR,
>>> Nikolaus
>> Thanks for the all the additional information! I was able to trigger 
>> the crash by compiling the driver as a module (i.e. 
>> `CONFIG_SGX_JZ4780=m`), it appears that the default letux_defconfig 
>> doesn't do this. I am now seeing the same behavior, and trying to 
>> debug it. I don't believe `OSAcquirePhysPageAddr` is causing the 
>> issue here, I tried taking the changes from `1.14.3699939` and that 
>> didn't help. I am going to figure out which value triggers the 
>> paging exception and work backwards from there to see whats going 
>> wrong.
>> Also I did have a case where it decided to boot up fine without an 
>> error, and I saw the same results with `pvrsrvctrl` reporting that 
>> it was happy with the KMOD and then triggering a kernel crash.
> After more investigation it appears that there are two problems:
> First problem is that when the driver fails to init on boot it 
> appears that `struct device *dev` pointer coming from 
> `PVRLDMGetDevice` is returning NULL.
> The second problem is that `dma_sync_single_for_device` does appear 
> to be happy with the address being given to it. I suspect this is a 
> problem with how the driver is allocating memory. The `dma_sync_*` 
> APIs are intended to be used with memory allocated from `dma_alloc_*`
> In the code I can see this comment which originally from the 
> unmodified kernel code that IMGTEC shipped
> ```
> /*
> * dmac cache functions are supposed to be used for dma
> * memory which comes from dma-able memory. However examining
> * the implementation of dmac cache functions and experimenting,
> * can assert that dmac functions are safe to use for high-mem
> * memory as well for our OS{Clean/Flush/Invalidate}Cache functions
> *
> */
> ```
> I suspect this comment is no longer true, and if we want to use the 
> DMA cache ops we need to allocate memory using the dma allocate 
> functions, instead of using `vmalloc` or a modified version of 
> `vmalloc`. This seems to be trivial change though as its difficult to 
> tell if `OSAllocPages_Impl` is just used for DMA memory, or just 
> kernel memory as well. There is probably a viable strategy of using 
> the mips platform specific cache operations in `include/asm/cache*` 
> instead of the `dma` ones. I don't think this would be a problem 
> since the code for section is already wrapped in a `#elif 
> defined(__mips__)`. Although if the goal of this project is to get 
> this kernel module upstreamed, I suspect the kernel maintainers would 
> prefer if we didn't use platform specific `ifdefs` and instead used 
> the common kernel infrastructure that is platform independent.
> For fun I just removed the call to `dma_sync_single_for_device` to 
> see what would happen and the error went away, allowing `pvrsrvctl` 
> to get further along until it printed these errors
> ```
> [   89.495046] PVR_K: (FAIL) SGXInit: Mismatch in client-side and KM 
> driver build options.
> [   89.531664] PVR_K: Extra options present in client-side driver: 
> (0xa100). Please check sgx_options.h
> [   89.569613] PVR_K:(Error): PVRSRVFinaliseSystem: Failed 
> PVRSRVDevInitCompatCheck call (device index: 0)
> [   89.608249] PVR_K:(Error): BridgedDispatchKM: Initialisation 
> failed. Driver unusable.
> pvrsrvctl: SrvInit failed (already initialized?) 
> ```
> So after figuring out a fix for these cache operations, looks like 
> next steps is spoofing the driver build options to match what the 
> userland expects :)

So I modified `arch/mips/mm/cache.c` to export all the cache ops, and 
used them instead and I'm having some success. I can get the driver to 
initialize fine, I had to define `IGNORE_SGX_INIT_COMPATIBILITY_CHECK` 
to prevent the driver from failing when `pvrsrvctl` ran.

It looks like `pvrsrvctl --start --no-module` will successfully run 
now, but I'm not actually sure how to validate if it working properly. 
It looks like the return code is 0 and no errors are generated and 
printed to the console.

One other thing of note is that since switching to version 
`1.14.3759903` of the kernel mode driver Xorg seems to fail when 
starting up. I've linked to my xorg log below but the key error seems 
to be:

Fatal server error:
[ 52.758] (EE) Cannot run in framebuffer mode. Please specify busIDs 
for all framebuffer devices


I'm not sure if anyone else has run into this in the past, but if they 
have I'm all ears for ideas to try.

More information about the openpvrsgx-devgroup mailing list