[Openpvrsgx-devgroup] [PATCH] Update pvrsgx 1.14.3759903 to latest kernel

Lucas Fryzek lucas.fryzek at hazeco.xyz
Sun Nov 14 19:58:33 CET 2021

>> FYI: I looked into
>> git diff --no-index 
>> drivers/gpu/drm/pvrsgx/1.14.3699939/eurasia_km/services4/srvkm/env/linux/osfunc.c 
>> drivers/gpu/drm/pvrsgx/1.14.3759903/eurasia_km/services4/srvkm/env/linux/osfunc.c
>> and there seems to be some different fixed for 1.14.3699939.
>> Mainly in OSAcquirePhysPageAddr() which uses
>> -    psInfo->iNumPagesMapped = get_user_pages_remote(current->mm, 
>> uStartAddr, psInfo->iNumPages, FOLL_WRITE, psInfo->ppsPages, NULL, 
>> NULL);
>> instead of
>> +    psInfo->iNumPagesMapped = get_user_pages(
>> +               uStartAddr, psInfo->iNumPages, 1, psInfo->ppsPages, 
>> NULL);
>> But don't ask me what the difference is and if it is the root cause 
>> of my kernel panic...
>> BR,
>> Nikolaus
> Thanks for the all the additional information! I was able to trigger 
> the crash by compiling the driver as a module (i.e. 
> `CONFIG_SGX_JZ4780=m`), it appears that the default letux_defconfig 
> doesn't do this. I am now seeing the same behavior, and trying to 
> debug it. I don't believe `OSAcquirePhysPageAddr` is causing the 
> issue here, I tried taking the changes from `1.14.3699939` and that 
> didn't help. I am going to figure out which value triggers the paging 
> exception and work backwards from there to see whats going wrong.
> Also I did have a case where it decided to boot up fine without an 
> error, and I saw the same results with `pvrsrvctrl` reporting that it 
> was happy with the KMOD and then triggering a kernel crash.

After more investigation it appears that there are two problems:

First problem is that when the driver fails to init on boot it appears 
that `struct device *dev` pointer coming from `PVRLDMGetDevice` is 
returning NULL.

The second problem is that `dma_sync_single_for_device` does appear to 
be happy with the address being given to it. I suspect this is a 
problem with how the driver is allocating memory. The `dma_sync_*` APIs 
are intended to be used with memory allocated from `dma_alloc_*`

In the code I can see this comment which originally from the unmodified 
kernel code that IMGTEC shipped

 * dmac cache functions are supposed to be used for dma
 * memory which comes from dma-able memory. However examining
 * the implementation of dmac cache functions and experimenting,
 * can assert that dmac functions are safe to use for high-mem
 * memory as well for our OS{Clean/Flush/Invalidate}Cache functions

I suspect this comment is no longer true, and if we want to use the DMA 
cache ops we need to allocate memory using the dma allocate functions, 
instead of using `vmalloc` or a modified version of `vmalloc`. This 
seems to be trivial change though as its difficult to tell if 
`OSAllocPages_Impl` is just used for DMA memory, or just kernel memory 
as well. There is probably a viable strategy of using the mips platform 
specific cache operations in `include/asm/cache*` instead of the `dma` 
ones. I don't think this would be a problem since the code for section 
is already wrapped in a `#elif defined(__mips__)`. Although if the goal 
of this project is to get this kernel module upstreamed, I suspect the 
kernel maintainers would prefer if we didn't use platform specific 
`ifdefs` and instead used the common kernel infrastructure that is 
platform independent.

For fun I just removed the call to `dma_sync_single_for_device` to see 
what would happen and the error went away, allowing `pvrsrvctl` to get 
further along until it printed these errors

[   89.495046] PVR_K: (FAIL) SGXInit: Mismatch in client-side and KM 
driver build options.
[   89.531664] PVR_K: Extra options present in client-side driver: 
(0xa100). Please check sgx_options.h
[   89.569613] PVR_K:(Error): PVRSRVFinaliseSystem: Failed 
PVRSRVDevInitCompatCheck call (device index: 0)
[   89.608249] PVR_K:(Error): BridgedDispatchKM: Initialisation failed. 
 Driver unusable.
pvrsrvctl: SrvInit failed (already initialized?) 

So after figuring out a fix for these cache operations, looks like next 
steps is spoofing the driver build options to match what the userland 
expects :)

More information about the openpvrsgx-devgroup mailing list