[Letux-kernel] New LetuxOS Kernels - strcmp(NULL)
H. Nikolaus Schaller
hns at goldelico.com
Mon Jun 25 07:47:18 CEST 2018
> Am 25.06.2018 um 07:14 schrieb Andreas Kemnade <andreas at kemnade.info>:
>
> On Sun, 24 Jun 2018 22:14:54 +0200
> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>
>>> Am 24.06.2018 um 20:53 schrieb Andreas Kemnade <andreas at kemnade.info>:
>>>
>>> On Sun, 24 Jun 2018 20:17:38 +0200
>>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>>>
>>>>> Am 24.06.2018 um 18:08 schrieb H. Nikolaus Schaller <hns at goldelico.com>:
>>>>>
>>>>> Hi,
>>>>>
>>>>>> Am 24.06.2018 um 09:52 schrieb H. Nikolaus Schaller <hns at goldelico.com>:
>>>>>>
>>>>>> Main suspect is the generic-adc-battery driver (although I remember to have seen the
>>>>>> strcmp(NULL) once even when blackisting it).
>>>>>>
>>>>>> The reason why it is the my main suspect is that the message iio_charge:-747
>>>>>> seems to almost always come before platform backlight: Retrying from deferred list
>>>>>> (AFAIR in failing and non-failing cases).
>>>>>>
>>>>>> But they did not yet confess :)
>>>>>
>>>>> well, there is new evidence:
>>>>>
>>>>> [ 7.479400] really_probe: driver generic-adc-battery
>>>>> [ 7.484710] gab_probe: properties = ed1fde00
>>>>>
>>>>> ^^^ this is a printk for memory block address allocated by
>>>>>
>>>>> http://git.goldelico.com/?p=gta04-kernel.git;a=blob;f=drivers/power/supply/generic-adc-battery.c;h=359cc8dac245c071f1ff21b12791ad6c5a9332c4;hb=dca26f608a765008b869991bf29fa241769599fb#l404
>>>>>
>>>>> [ 7.490631] i2c 1-0030: Retrying from deferred list
>>>>> [ 7.495849] really_probe: driver ov9655
>>>>> [ 7.508209] platform backlight: Retrying from deferred list
>>>>> [ 7.514526] really_probe: driver pwm-backlight
>>>>> [ 7.524291] pps_core: LinuxPPS API ver. 1 registered
>>>>> [ 7.532531] pwm-backlight backlight: backlight supply power not found, using dummy regulator
>>>>> [ 7.545471] (NULL device *): hwmon: 'gta04-battery' is not a valid name attribute, please fix
>>>>> [ 7.554840] i2c 1-0030: Retrying from deferred list
>>>>> [ 7.560089] really_probe: driver ov9655
>>>>> [ 7.570648] platform backlight: Retrying from deferred list
>>>>> [ 7.583770] really_probe: driver pwm-backlight
>>>>> [ 7.598083] pinctrl_generic_get_group_name: group ed1fde50 has NULL name
>>>>>
>>>>> ^^^ here is the struct group_desc where the group->name magically became NULL
>>>>>
>>>>> [ 7.605102] group>name = (null)
>>>>> [ 7.613464] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti at linux.it>
>>>>> [ 7.623229] group>pins = ed1faf10
>>>>> [ 7.626861] group>num_pins = 1
>>>>> [ 7.630340] group>data = ee444e10
>>>>> [ 7.633972] pinctrl_generic_group_name_to_selector: pctldev = ee3bb200 function = backlight_pins_pinmux
>>>>> [ 7.653869] really_probe: driver omap-dmtimer-pwm
>>>>> [ 7.658966] strcmp(backlight_pins_pinmux, (null))
>>>>> [ 7.664062] get_group_name() = pinctrl_generic_get_group_name+0x0/0xa0
>>>>> [ 7.682098] ngroups = 19
>>>>> [ 7.684936] selector = 18
>>>>> [ 7.689056] really_probe: driver connector-analog-tv
>>>>>
>>>>> So we have these addresses:
>>>>>
>>>>> ed1fde00
>>>>> ed1fde50
>>>>>
>>>>> This is just 0x50 bytes apart.
>>>>>
>>>>> A write to psy_desc->properties[0x14] = NULL would overwrite ed1fde50->name in this example.
>>>>
>>>> I may have found the final evidence of an out-of-bounds write access (unfortunately not a NULL) in line
>>>>
>>>> http://git.goldelico.com/?p=gta04-kernel.git;a=blob;f=drivers/power/supply/generic-adc-battery.c;h=359cc8dac245c071f1ff21b12791ad6c5a9332c4;hb=dca26f608a765008b869991bf29fa241769599fb#l429
>>>>
>>>> memcpy(properties + sizeof(*(psy_desc->properties)) * index,
>>>>
>>>> the + of 'properties' and some integer already multiplies the index by sizeof(*properties).
>>>>
>>>> Hence it should IMHO be
>>>>
>>>> memcpy(properties + index,
>>>>
>>>> or to better describe that memcpy takes an address
>>>>
>>>> memcpy(&properties[index],
>>>>
>>>> What do you think?
>>>
>>> that code looks more weird the longer I am looking at it.
>>
>> Yes, it needs readability improvements.
>>
>>> Hmm, we are using memcpy for simply copying over a single enum or what?
>>
>> well, gab_dyn_props[chan] and * properties are both of type enum power_supply_property,
>> i.e. int. And indeed, a simply assignment should suffice.
>>
>>> If sizeof(gab_dyn_props[chan] would be bigger, then we need to increase
>>> index by more than one.
>>
>> I am currently testing both variants. If one works and the other doesn't
>> we have found it. Because run time is identical, it can't change timing
>> of concurrent threads and operations.
>>
>> On first glance the weird memcpy() seems to have been introduced by
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/diff/drivers/power/generic-adc-battery.c?h=v4.17.2&id=297d716f6260cc9421d971b124ca196b957ee458
>>
>> but using a memcpy() is older. A first check it seems to be wrongly introduced in 3.7-rc1:
>>
>> https://elixir.bootlin.com/linux/v3.7-rc1/source/drivers/power/generic-adc-battery.c#L239
>>
>> So we may have a fix for some quite old stable kernels... If it is the
>> reason for our problem.
>>
>>>
>>> Regards,
>>> Andreas
>>> PS: 275 boots without Oopses with gab blacklisted...
>>
>> Well, absence of Oopses doesn't say too much. Blacklisting pwm_bl
>> would likely give the same result according to my tests.
>>
>> How many did you see with not blacklisting anything? 100% or something significantly high?
>>
>> And then please replace the strange
>>
>> memcpy(properties + sizeof(*(psy_desc->properties)) * index,
>> &gab_dyn_props[chan],
>> sizeof(gab_dyn_props[chan]));
>>
>> by
>>
>> properties[index] = gab_dyn_props[chan];
>>
>> and run your auto-repeated tests. If it goes down (to 0%) we have found it.
>>
> 268 boots.
> 1% empty log (have to evaluate what that means, had the same in the other tests
> as well but was only grepping for Oopses).
> 0% Oopses
>
> But even if that would not fix the error it should be worth patching it.
> that old line of code really causes a segfault while using /dev/brain
Indeed. I will prepare and submit a patch now.
>
> I tried this one:
>
> diff --git a/drivers/power/supply/generic-adc-battery.c b/drivers/power/supply/generic-adc-battery.c
> index 359cc8dac245..7c5a8dbcb463 100644
> --- a/drivers/power/supply/generic-adc-battery.c
> +++ b/drivers/power/supply/generic-adc-battery.c
> @@ -375,7 +375,6 @@ static int gab_probe(struct platform_device *pdev)
> int ret = 0;
> int chan;
> int index = 0;
> -
> adc_bat = devm_kzalloc(&pdev->dev, sizeof(*adc_bat), GFP_KERNEL);
> if (!adc_bat) {
> dev_err(&pdev->dev, "failed to allocate memory\n");
> @@ -426,9 +425,7 @@ static int gab_probe(struct platform_device *pdev)
> adc_bat->channel[chan] = NULL;
> } else {
> /* copying properties for supported channels only */
> - memcpy(properties + sizeof(*(psy_desc->properties)) * index,
> - &gab_dyn_props[chan],
> - sizeof(gab_dyn_props[chan]));
> + properties[index] = gab_dyn_props[chan];
> index++;
> }
> }
Yes, I tried this change as well. And there is even a difference
visible in user-space.
without patch:
root at letux:~# cat /sys/class/power_supply/gta04-battery/uevent
POWER_SUPPLY_NAME=gta04-battery
POWER_SUPPLY_STATUS=Discharging
POWER_SUPPLY_CHARGE_FULL_DESIGN=1250000
POWER_SUPPLY_CHARGE_EMPTY_DESIGN=0
POWER_SUPPLY_CHARGE_NOW=37500
POWER_SUPPLY_VOLTAGE_NOW=3653000
POWER_SUPPLY_CURRENT_NOW=-750000
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=3200000
POWER_SUPPLY_VOLTAGE_MAX_DESIGN=0
POWER_SUPPLY_MODEL_NAME=gta04-battery
POWER_SUPPLY_TEMP=1
POWER_SUPPLY_CAPACITY=3
POWER_SUPPLY_VOLTAGE_NOW=3665000
POWER_SUPPLY_STATUS=Discharging
POWER_SUPPLY_STATUS=Discharging
root at letux:~#
with patch:
root at letux:~# cat /sys/class/power_supply/gta04-battery/uevent
POWER_SUPPLY_NAME=gta04-battery
POWER_SUPPLY_STATUS=Discharging
POWER_SUPPLY_CHARGE_FULL_DESIGN=1250000
POWER_SUPPLY_CHARGE_EMPTY_DESIGN=0
POWER_SUPPLY_CHARGE_NOW=25000
POWER_SUPPLY_VOLTAGE_NOW=3560000
POWER_SUPPLY_CURRENT_NOW=-747000
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=3200000
POWER_SUPPLY_VOLTAGE_MAX_DESIGN=0
POWER_SUPPLY_MODEL_NAME=gta04-battery
POWER_SUPPLY_TEMP=4
POWER_SUPPLY_CAPACITY=2
POWER_SUPPLY_VOLTAGE_NOW=3636000
POWER_SUPPLY_CURRENT_NOW=-750000
POWER_SUPPLY_STATUS=Discharging
root at letux:~#
Without patch there are three POWER_SUPPLY_STATUS entries.
This is because the enum value is copied to the wrong location.
Well, there is also duplication, but this is correct if
we accept what the gab_probe function is doing: append raw iio
channels to list of predefined properties without checking for
duplicates. Maybe we should also add a patch that checks for
duplicates.
So I am quite confident that we finally have identified
the offender.
Let me thank everybody who helped to identify it!
BR,
Nikolaus
More information about the Letux-kernel
mailing list