[Letux-kernel] New LetuxOS Kernels - strcmp(NULL)

H. Nikolaus Schaller hns at goldelico.com
Mon Jun 25 07:47:18 CEST 2018


> Am 25.06.2018 um 07:14 schrieb Andreas Kemnade <andreas at kemnade.info>:
> 
> On Sun, 24 Jun 2018 22:14:54 +0200
> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
> 
>>> Am 24.06.2018 um 20:53 schrieb Andreas Kemnade <andreas at kemnade.info>:
>>> 
>>> On Sun, 24 Jun 2018 20:17:38 +0200
>>> "H. Nikolaus Schaller" <hns at goldelico.com> wrote:
>>> 
>>>>> Am 24.06.2018 um 18:08 schrieb H. Nikolaus Schaller <hns at goldelico.com>:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>>> Am 24.06.2018 um 09:52 schrieb H. Nikolaus Schaller <hns at goldelico.com>:
>>>>>> 
>>>>>> Main suspect is the generic-adc-battery driver (although I remember to have seen the
>>>>>> strcmp(NULL) once even when blackisting it).
>>>>>> 
>>>>>> The reason why it is the my main suspect is that the message iio_charge:-747
>>>>>> seems to almost always come before platform backlight: Retrying from deferred list
>>>>>> (AFAIR in failing and non-failing cases).
>>>>>> 
>>>>>> But they did not yet confess :)    
>>>>> 
>>>>> well, there is new evidence:
>>>>> 
>>>>> [    7.479400] really_probe: driver generic-adc-battery
>>>>> [    7.484710] gab_probe: properties = ed1fde00
>>>>> 
>>>>> ^^^ this is a printk for memory block address allocated by
>>>>> 
>>>>> http://git.goldelico.com/?p=gta04-kernel.git;a=blob;f=drivers/power/supply/generic-adc-battery.c;h=359cc8dac245c071f1ff21b12791ad6c5a9332c4;hb=dca26f608a765008b869991bf29fa241769599fb#l404
>>>>> 
>>>>> [    7.490631] i2c 1-0030: Retrying from deferred list
>>>>> [    7.495849] really_probe: driver ov9655
>>>>> [    7.508209] platform backlight: Retrying from deferred list
>>>>> [    7.514526] really_probe: driver pwm-backlight
>>>>> [    7.524291] pps_core: LinuxPPS API ver. 1 registered
>>>>> [    7.532531] pwm-backlight backlight: backlight supply power not found, using dummy regulator
>>>>> [    7.545471] (NULL device *): hwmon: 'gta04-battery' is not a valid name attribute, please fix
>>>>> [    7.554840] i2c 1-0030: Retrying from deferred list
>>>>> [    7.560089] really_probe: driver ov9655
>>>>> [    7.570648] platform backlight: Retrying from deferred list
>>>>> [    7.583770] really_probe: driver pwm-backlight
>>>>> [    7.598083] pinctrl_generic_get_group_name: group ed1fde50 has NULL name
>>>>> 
>>>>> ^^^ here is the struct group_desc where the group->name magically became NULL
>>>>> 
>>>>> [    7.605102]   group>name = (null)
>>>>> [    7.613464] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti at linux.it>
>>>>> [    7.623229]   group>pins = ed1faf10
>>>>> [    7.626861]   group>num_pins = 1
>>>>> [    7.630340]   group>data = ee444e10
>>>>> [    7.633972] pinctrl_generic_group_name_to_selector: pctldev = ee3bb200 function = backlight_pins_pinmux
>>>>> [    7.653869] really_probe: driver omap-dmtimer-pwm
>>>>> [    7.658966]   strcmp(backlight_pins_pinmux, (null))
>>>>> [    7.664062]   get_group_name() = pinctrl_generic_get_group_name+0x0/0xa0
>>>>> [    7.682098]   ngroups = 19
>>>>> [    7.684936]   selector = 18
>>>>> [    7.689056] really_probe: driver connector-analog-tv
>>>>> 
>>>>> So we have these addresses:
>>>>> 
>>>>> 	ed1fde00
>>>>> 	ed1fde50
>>>>> 
>>>>> This is just 0x50 bytes apart.
>>>>> 
>>>>> A write to psy_desc->properties[0x14] = NULL would overwrite ed1fde50->name in this example.    
>>>> 
>>>> I may have found the final evidence of an out-of-bounds write access (unfortunately not a NULL) in line
>>>> 
>>>> http://git.goldelico.com/?p=gta04-kernel.git;a=blob;f=drivers/power/supply/generic-adc-battery.c;h=359cc8dac245c071f1ff21b12791ad6c5a9332c4;hb=dca26f608a765008b869991bf29fa241769599fb#l429
>>>> 
>>>>  memcpy(properties + sizeof(*(psy_desc->properties)) * index,
>>>> 
>>>> the + of 'properties' and some integer already multiplies the index by sizeof(*properties).
>>>> 
>>>> Hence it should IMHO be
>>>> 
>>>>  memcpy(properties + index,
>>>> 
>>>> or to better describe that memcpy takes an address
>>>> 
>>>>  memcpy(&properties[index],
>>>> 
>>>> What do you think?  
>>> 
>>> that code looks more weird the longer I am looking at it.  
>> 
>> Yes, it needs readability improvements.
>> 
>>> Hmm, we are using memcpy for simply copying over a single enum or what?  
>> 
>> well, gab_dyn_props[chan] and * properties are both of type enum power_supply_property,
>> i.e. int. And indeed, a simply assignment should suffice.
>> 
>>> If sizeof(gab_dyn_props[chan] would be bigger, then we need to increase
>>> index by more than one.  
>> 
>> I am currently testing both variants. If one works and the other doesn't
>> we have found it. Because run time is identical, it can't change timing
>> of concurrent threads and operations.
>> 
>> On first glance the weird memcpy() seems to have been introduced by
>> 
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/diff/drivers/power/generic-adc-battery.c?h=v4.17.2&id=297d716f6260cc9421d971b124ca196b957ee458
>> 
>> but using a memcpy() is older. A first check it seems to be wrongly introduced in 3.7-rc1:
>> 
>> https://elixir.bootlin.com/linux/v3.7-rc1/source/drivers/power/generic-adc-battery.c#L239
>> 
>> So we may have a fix for some quite old stable kernels... If it is the
>> reason for our problem.
>> 
>>> 
>>> Regards,
>>> Andreas
>>> PS: 275 boots without Oopses with gab blacklisted...  
>> 
>> Well, absence of Oopses doesn't say too much. Blacklisting pwm_bl
>> would likely give the same result according to my tests.
>> 
>> How many did you see with not blacklisting anything? 100% or something significantly high?
>> 
>> And then please replace the strange
>> 
>> 			memcpy(properties + sizeof(*(psy_desc->properties)) * index,
>> 					&gab_dyn_props[chan],
>> 					sizeof(gab_dyn_props[chan]));
>> 
>> by
>> 
>> 			properties[index] = gab_dyn_props[chan];
>> 
>> and run your auto-repeated tests. If it goes down (to 0%) we have found it.
>> 
> 268 boots.
> 1% empty log (have to evaluate what that means, had the same in the other tests
> as well but was only grepping for Oopses).
> 0% Oopses
> 
> But even if that would not fix the error it should be worth patching it.
> that old line of code really causes a segfault while using /dev/brain

Indeed. I will prepare and submit a patch now.

> 
> I tried this one:
> 
> diff --git a/drivers/power/supply/generic-adc-battery.c b/drivers/power/supply/generic-adc-battery.c
> index 359cc8dac245..7c5a8dbcb463 100644
> --- a/drivers/power/supply/generic-adc-battery.c
> +++ b/drivers/power/supply/generic-adc-battery.c
> @@ -375,7 +375,6 @@ static int gab_probe(struct platform_device *pdev)
>        int ret = 0;
>        int chan;
>        int index = 0;
> -
>        adc_bat = devm_kzalloc(&pdev->dev, sizeof(*adc_bat), GFP_KERNEL);
>        if (!adc_bat) {
>                dev_err(&pdev->dev, "failed to allocate memory\n");
> @@ -426,9 +425,7 @@ static int gab_probe(struct platform_device *pdev)
>                        adc_bat->channel[chan] = NULL;
>                } else {
>                        /* copying properties for supported channels only */
> -                       memcpy(properties + sizeof(*(psy_desc->properties)) * index,
> -                                       &gab_dyn_props[chan],
> -                                       sizeof(gab_dyn_props[chan]));
> +                       properties[index] = gab_dyn_props[chan];
>                        index++;
>                }
>        }

Yes, I tried this change as well. And there is even a difference
visible in user-space.

without patch:

root at letux:~# cat /sys/class/power_supply/gta04-battery/uevent 
POWER_SUPPLY_NAME=gta04-battery
POWER_SUPPLY_STATUS=Discharging
POWER_SUPPLY_CHARGE_FULL_DESIGN=1250000
POWER_SUPPLY_CHARGE_EMPTY_DESIGN=0
POWER_SUPPLY_CHARGE_NOW=37500
POWER_SUPPLY_VOLTAGE_NOW=3653000
POWER_SUPPLY_CURRENT_NOW=-750000
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=3200000
POWER_SUPPLY_VOLTAGE_MAX_DESIGN=0
POWER_SUPPLY_MODEL_NAME=gta04-battery
POWER_SUPPLY_TEMP=1
POWER_SUPPLY_CAPACITY=3
POWER_SUPPLY_VOLTAGE_NOW=3665000
POWER_SUPPLY_STATUS=Discharging
POWER_SUPPLY_STATUS=Discharging
root at letux:~# 

with patch:

root at letux:~# cat /sys/class/power_supply/gta04-battery/uevent 
POWER_SUPPLY_NAME=gta04-battery
POWER_SUPPLY_STATUS=Discharging
POWER_SUPPLY_CHARGE_FULL_DESIGN=1250000
POWER_SUPPLY_CHARGE_EMPTY_DESIGN=0
POWER_SUPPLY_CHARGE_NOW=25000
POWER_SUPPLY_VOLTAGE_NOW=3560000
POWER_SUPPLY_CURRENT_NOW=-747000
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_VOLTAGE_MIN_DESIGN=3200000
POWER_SUPPLY_VOLTAGE_MAX_DESIGN=0
POWER_SUPPLY_MODEL_NAME=gta04-battery
POWER_SUPPLY_TEMP=4
POWER_SUPPLY_CAPACITY=2
POWER_SUPPLY_VOLTAGE_NOW=3636000
POWER_SUPPLY_CURRENT_NOW=-750000
POWER_SUPPLY_STATUS=Discharging
root at letux:~# 

Without patch there are three POWER_SUPPLY_STATUS entries.
This is because the enum value is copied to the wrong location.

Well, there is also duplication, but this is correct if
we accept what the gab_probe function is doing: append raw iio
channels to list of predefined properties without checking for
duplicates. Maybe we should also add a patch that checks for
duplicates.

So I am quite confident that we finally have identified
the offender.

Let me thank everybody who helped to identify it!

BR,
Nikolaus



More information about the Letux-kernel mailing list