[Letux-kernel] New LetuxOS Kernels - strcmp(NULL)
H. Nikolaus Schaller
hns at goldelico.com
Sun Jun 24 22:14:54 CEST 2018
> Am 24.06.2018 um 20:53 schrieb Andreas Kemnade <andreas at kemnade.info>:
> On Sun, 24 Jun 2018 20:17:38 +0200
> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>>> Am 24.06.2018 um 18:08 schrieb H. Nikolaus Schaller <hns at goldelico.com>:
>>>> Am 24.06.2018 um 09:52 schrieb H. Nikolaus Schaller <hns at goldelico.com>:
>>>> Main suspect is the generic-adc-battery driver (although I remember to have seen the
>>>> strcmp(NULL) once even when blackisting it).
>>>> The reason why it is the my main suspect is that the message iio_charge:-747
>>>> seems to almost always come before platform backlight: Retrying from deferred list
>>>> (AFAIR in failing and non-failing cases).
>>>> But they did not yet confess :)
>>> well, there is new evidence:
>>> [ 7.479400] really_probe: driver generic-adc-battery
>>> [ 7.484710] gab_probe: properties = ed1fde00
>>> ^^^ this is a printk for memory block address allocated by
>>> [ 7.490631] i2c 1-0030: Retrying from deferred list
>>> [ 7.495849] really_probe: driver ov9655
>>> [ 7.508209] platform backlight: Retrying from deferred list
>>> [ 7.514526] really_probe: driver pwm-backlight
>>> [ 7.524291] pps_core: LinuxPPS API ver. 1 registered
>>> [ 7.532531] pwm-backlight backlight: backlight supply power not found, using dummy regulator
>>> [ 7.545471] (NULL device *): hwmon: 'gta04-battery' is not a valid name attribute, please fix
>>> [ 7.554840] i2c 1-0030: Retrying from deferred list
>>> [ 7.560089] really_probe: driver ov9655
>>> [ 7.570648] platform backlight: Retrying from deferred list
>>> [ 7.583770] really_probe: driver pwm-backlight
>>> [ 7.598083] pinctrl_generic_get_group_name: group ed1fde50 has NULL name
>>> ^^^ here is the struct group_desc where the group->name magically became NULL
>>> [ 7.605102] group>name = (null)
>>> [ 7.613464] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti at linux.it>
>>> [ 7.623229] group>pins = ed1faf10
>>> [ 7.626861] group>num_pins = 1
>>> [ 7.630340] group>data = ee444e10
>>> [ 7.633972] pinctrl_generic_group_name_to_selector: pctldev = ee3bb200 function = backlight_pins_pinmux
>>> [ 7.653869] really_probe: driver omap-dmtimer-pwm
>>> [ 7.658966] strcmp(backlight_pins_pinmux, (null))
>>> [ 7.664062] get_group_name() = pinctrl_generic_get_group_name+0x0/0xa0
>>> [ 7.682098] ngroups = 19
>>> [ 7.684936] selector = 18
>>> [ 7.689056] really_probe: driver connector-analog-tv
>>> So we have these addresses:
>>> This is just 0x50 bytes apart.
>>> A write to psy_desc->properties[0x14] = NULL would overwrite ed1fde50->name in this example.
>> I may have found the final evidence of an out-of-bounds write access (unfortunately not a NULL) in line
>> memcpy(properties + sizeof(*(psy_desc->properties)) * index,
>> the + of 'properties' and some integer already multiplies the index by sizeof(*properties).
>> Hence it should IMHO be
>> memcpy(properties + index,
>> or to better describe that memcpy takes an address
>> What do you think?
> that code looks more weird the longer I am looking at it.
Yes, it needs readability improvements.
> Hmm, we are using memcpy for simply copying over a single enum or what?
well, gab_dyn_props[chan] and * properties are both of type enum power_supply_property,
i.e. int. And indeed, a simply assignment should suffice.
> If sizeof(gab_dyn_props[chan] would be bigger, then we need to increase
> index by more than one.
I am currently testing both variants. If one works and the other doesn't
we have found it. Because run time is identical, it can't change timing
of concurrent threads and operations.
On first glance the weird memcpy() seems to have been introduced by
but using a memcpy() is older. A first check it seems to be wrongly introduced in 3.7-rc1:
So we may have a fix for some quite old stable kernels... If it is the
reason for our problem.
> PS: 275 boots without Oopses with gab blacklisted...
Well, absence of Oopses doesn't say too much. Blacklisting pwm_bl
would likely give the same result according to my tests.
How many did you see with not blacklisting anything? 100% or something significantly high?
And then please replace the strange
memcpy(properties + sizeof(*(psy_desc->properties)) * index,
properties[index] = gab_dyn_props[chan];
and run your auto-repeated tests. If it goes down (to 0%) we have found it.
If we really have found it, I'll prepare a patch for 4.18-rc2 and 4.17.2
for further evaluation and tests (also on Pyra). And then we can upstream it.
> I was testing my raingear at the time, so no time wasted
More information about the Letux-kernel