[Openpvrsgx-devgroup] [PATCH] Update pvrsgx 1.14.3759903 to latest kernel
Lucas Fryzek
lucas.fryzek at hazeco.xyz
Sun Nov 14 19:58:33 CET 2021
>
>> FYI: I looked into
>>
>> git diff --no-index
>> drivers/gpu/drm/pvrsgx/1.14.3699939/eurasia_km/services4/srvkm/env/linux/osfunc.c
>> drivers/gpu/drm/pvrsgx/1.14.3759903/eurasia_km/services4/srvkm/env/linux/osfunc.c
>>
>> and there seems to be some different fixed for 1.14.3699939.
>>
>> Mainly in OSAcquirePhysPageAddr() which uses
>>
>> - psInfo->iNumPagesMapped = get_user_pages_remote(current->mm,
>> uStartAddr, psInfo->iNumPages, FOLL_WRITE, psInfo->ppsPages, NULL,
>> NULL);
>>
>> since LINUX_VERSION_CODE >= KERNEL_VERSION(5,9,0)
>>
>> instead of
>>
>> + psInfo->iNumPagesMapped = get_user_pages(
>> + uStartAddr, psInfo->iNumPages, 1, psInfo->ppsPages,
>> NULL);
>>
>> But don't ask me what the difference is and if it is the root cause
>> of my kernel panic...
>>
>> BR,
>> Nikolaus
>>
>
> Thanks for the all the additional information! I was able to trigger
> the crash by compiling the driver as a module (i.e.
> `CONFIG_SGX_JZ4780=m`), it appears that the default letux_defconfig
> doesn't do this. I am now seeing the same behavior, and trying to
> debug it. I don't believe `OSAcquirePhysPageAddr` is causing the
> issue here, I tried taking the changes from `1.14.3699939` and that
> didn't help. I am going to figure out which value triggers the paging
> exception and work backwards from there to see whats going wrong.
>
> Also I did have a case where it decided to boot up fine without an
> error, and I saw the same results with `pvrsrvctrl` reporting that it
> was happy with the KMOD and then triggering a kernel crash.
>
After more investigation it appears that there are two problems:
First problem is that when the driver fails to init on boot it appears
that `struct device *dev` pointer coming from `PVRLDMGetDevice` is
returning NULL.
The second problem is that `dma_sync_single_for_device` does appear to
be happy with the address being given to it. I suspect this is a
problem with how the driver is allocating memory. The `dma_sync_*` APIs
are intended to be used with memory allocated from `dma_alloc_*`
In the code I can see this comment which originally from the unmodified
kernel code that IMGTEC shipped
```
/*
* dmac cache functions are supposed to be used for dma
* memory which comes from dma-able memory. However examining
* the implementation of dmac cache functions and experimenting,
* can assert that dmac functions are safe to use for high-mem
* memory as well for our OS{Clean/Flush/Invalidate}Cache functions
*
*/
```
I suspect this comment is no longer true, and if we want to use the DMA
cache ops we need to allocate memory using the dma allocate functions,
instead of using `vmalloc` or a modified version of `vmalloc`. This
seems to be trivial change though as its difficult to tell if
`OSAllocPages_Impl` is just used for DMA memory, or just kernel memory
as well. There is probably a viable strategy of using the mips platform
specific cache operations in `include/asm/cache*` instead of the `dma`
ones. I don't think this would be a problem since the code for section
is already wrapped in a `#elif defined(__mips__)`. Although if the goal
of this project is to get this kernel module upstreamed, I suspect the
kernel maintainers would prefer if we didn't use platform specific
`ifdefs` and instead used the common kernel infrastructure that is
platform independent.
For fun I just removed the call to `dma_sync_single_for_device` to see
what would happen and the error went away, allowing `pvrsrvctl` to get
further along until it printed these errors
```
[ 89.495046] PVR_K: (FAIL) SGXInit: Mismatch in client-side and KM
driver build options.
[ 89.531664] PVR_K: Extra options present in client-side driver:
(0xa100). Please check sgx_options.h
[ 89.569613] PVR_K:(Error): PVRSRVFinaliseSystem: Failed
PVRSRVDevInitCompatCheck call (device index: 0)
[ 89.608249] PVR_K:(Error): BridgedDispatchKM: Initialisation failed.
Driver unusable.
pvrsrvctl: SrvInit failed (already initialized?)
(err=PVRSRV_ERROR_BUILD_MISMATCH
```
So after figuring out a fix for these cache operations, looks like next
steps is spoofing the driver build options to match what the userland
expects :)
More information about the openpvrsgx-devgroup
mailing list