Jump to content

Making EspressoBin V7 work in 2022 ...


fallen_aarch64

Recommended Posts

@pali, I appreciate the attempt to fix, but as you point out, it shouldn't be happening due to reserved space.  Also, I think the environment is in a different order anyway, such that it isn't hitting the beginning of scriptaddr.  And lastly, I've been playing with u-boot 2022.10 and TF-A 2.8, and I can't repeat the problem myself.  So it may be some deeper problem that includes even the toolchain, or even something to do with starting from the older environment first.

 

We've noted that the existence of "scriptaddr" and not "criptaddr" is an important check when board fails to boot.  Better still will be to keep the platform firmware up to date so as to not have this bug at all.

 

But as for your patch: Doesn't it check each byte two times?  Do we care about efficiency?  I propose:

while (*ptr != '\0') {
  do { ptr++; } while (*ptr != '\0') ;
  ptr++;
}

The inner while will scan until the end of the current string, while the outer loop ends on the second null.  There's an initial condition of being at the end of the list already, which is detected by starting on a null rather than a character.

 

Link to comment
Share on other sites

You are right that my patch checks every byte twice, but it should not be an issue as it is still linear. My patch is simple and I prefer simple solution. Your solution is a bit complicated at the cost to read every byte just once. But because it is complicated there is issue with it -- it took me some time to find it -- you are missing ptr--; after the outer while loop otherwise you would have two nul bytes.

 

Anyway, I would really like to hear from somebody if my patch above fixes this criptaddr issue. Or we are still dealing with another hidden problem.

 

I think that I should send this patch to upstream because currenly, if I understand (previous) code correctly, it drops all variables after the first valid variable.

 

And if I'm looking at it, at the end of function is missing code which will add two nul bytes into *ptr as final terminal of variables.

 

Hm... and maybe this could be a reason for criptaddr? First issue that code dropped all variables after the first one and then second issue that final two nul byte terminal is missing which leaded to read garbage of previous bytes in storage (which probably started with criptaddr and continued with valid env structure of remaining variables).

Link to comment
Share on other sites

Something like this patch:

diff --git a/board/Marvell/mvebu_armada-37xx/board.c b/board/Marvell/mvebu_armada-37xx/board.c
index c6ecc323bb99..0c6ac126d0e7 100644
--- a/board/Marvell/mvebu_armada-37xx/board.c
+++ b/board/Marvell/mvebu_armada-37xx/board.c
@@ -145,6 +145,8 @@ int board_late_init(void)
 		strcpy(ptr, "fdtfile=marvell/armada-3720-espressobin-emmc.dtb");
 	else
 		strcpy(ptr, "fdtfile=marvell/armada-3720-espressobin.dtb");
+	ptr += strlen(ptr) + 1;
+	*ptr = '\0';
 
 	return 0;
 }

 

Link to comment
Share on other sites

I found additional reports: https://github.com/mirrexagon/espressobin-nix/issues/6

 

You say a missing decrement... Maybe I misunderstand where the pointer ought to end up, I thought the end result was to point at the null at the end of the list, not the null at the end of the last string.  These test cases:

1. The array of strings is empty, so there are only nulls: the end result is to immediately leave the outer loop, and ptr is pointing at the beginning of empty memory.  Since the default environment is *probably* not empty, maybe this can be skipped.

2. First string is only 1 character before a null.  Since the environment is "key=value" strings, maybe this can be ignored.  The end result of my algorithm is to increment, find a null, increment, and finish - pointing at the second null to begin writing additional variables.

3. first string is multiple characters.  enter while, increment, find not null, increment... until the null, move one more space, find a second null, and finish

4. 1 character and then a second string of one character: enter while, increment, finish inner, increment, find second string, enter while, increment, find null, increment, find null, finish pointing at second null, ready to write additional variables.

 

Your later patch seems superfluous, but is there a reason for the change from sprintf earlier to strcpy later?  For that matter, if we're going to use strlen here, why are we reinventing the wheel above?

if (*ptr != '\0') { // Checks for empty default env
  while (i = strlen(ptr)) { 
    ptr += i + 1;
  }
}

Yeah, I'm using a zero for strlen as false, maybe that's bad coding practice. 

 

As for writing a final null, what is the default_environment location?  Is it initialized heap space, about to be written to the environment?  We shouldn't need to write a null then, it's all nulls.

 

And it looks like if someone intentionally filled ethaddr and eth{1,2,3}addr with 17 characters, the four of them could fill more than 60 characters of the buffer.

Link to comment
Share on other sites

3 minutes ago, ManoftheSea said:

And it looks like if someone intentionally filled ethaddr and eth{1,2,3}addr with 17 characters

 

For every eth{,1,2,3}addr (totally 4) is reserved 30 characters. Hence totally there is 4*30+60 = 180 bytes of free space in buffer.

 

5 minutes ago, ManoftheSea said:

is there a reason for the change from sprintf earlier to strcpy later?

 

sprintf is more complicated function, strcpy is simple and compiler can optimize it (= eliminate its usage) during LTO.

 

6 minutes ago, ManoftheSea said:

You say a missing decrement... Maybe I misunderstand where the pointer ought to end up, I thought the end result was to point at the null at the end of the list, not the null at the end of the last string.

 

Hm.. looking at the code and I'm not sure right now... So pointer must point to the location where new variable starts, so one byte after the nul byte separator.

 

8 minutes ago, ManoftheSea said:

 

As for writing a final null, what is the default_environment location?  Is it initialized heap space, about to be written to the environment? 

 

Static r/w global array.

 

8 minutes ago, ManoftheSea said:

We shouldn't need to write a null then, it's all nulls.

 

This can prove my above hypothesis how criptaddr was generated. With writing this new nul byte there should not be any criptaddr anymore and also no other variable.

Link to comment
Share on other sites

180 bytes: Oh, that's fine then.  I also understand you on sprintf, since it's doing variable insertion in the strings there.  Which means strlen would be the more readable code, rather than either of our proposals here.  If there were an error and ptr pointed one byte wrong in either direction, we'd either see eth3addr=00:11:22:33:44:55fdtfile=... or we'd just not have an fdtfile variable (as it's after a double null).  It wouldn't affect scriptaddr.

 

Well, if you'll continue to indulge my growing understanding.  I see where there's 30 bytes of filler added for the four ethaddr.  But "ethaddr=" is 8 characters (and a null), while eth?addr is 9 and null.  Shouldn't the check be for the current environment mac be less than 21 or 20 characters?  Why 17?  Because a MAC is 17 characters... but then why reserve extra space?

Maybe 12 extra bytes just don't matter?  I suppose if eth{,1,2,3}addr aren't set, there's even more empty space.  Okay, it's not important.

 

Okay, I find EXTRA_ENV_SETTINGS in include/configs/mvebu_armada-37xx.h, which puts scriptaddr as the first variable and the extra buffer space at the end of the buffer.  and then include/env_default populates variables and appends extra_env_settings, with "mtdparts" being the immediate predecessor if configured.  Which means Default_environment cannot be an empty array, so that check is unnecessary, and doesn't have any variables of only 1 character.  I don't see how scriptaddr can be overwritten in this function. 

 

I'm going to keep exploring.

Link to comment
Share on other sites

CONFIG_EXTRA_ENV_SETTINGS is put at the end of the default_environment[] array. So if you start overwritting default_environment[] in the middle it could result that you overwrite also first byte of the "scriptaddr" string as it is at the beginning of the CONFIG_EXTRA_ENV_SETTINGS, which means it is near the end of the default_environment[]. And as I described in some previous post, due to wrong logic of finding free space in default_environment[], it can really happen.

Link to comment
Share on other sites

Oh, I see.  The current logic finds the last character of the first variable in default environment, then moves 2 forward: that gives the first character of the second string.  Which it then overwrites with the eth{,1,2,3)addr variables and the fdtfile, overwriting other default environment variables... which finishes by putting a null byte, overwriting the beginning of some variable and leaving a fragment.  Yeah, I can see how it'd happen.

 

Let's see, what vars: { bootcmd, arch, cpu, board, boardname, vendor, soc } are I believe included in the default config...

arch=arm

cpu=armv8

board=mvebu_armada-37xx

boardname=mvebu_armada-37xx

vendor=Marvell

soc=mvebu

9 + 10 + 24 + 27 + 15 + 10 = 95 characters.  eth*addrs is 9 + 10 + 10 + 10, and fdtfile string is... 44. 83 characters for a v5, or 86 for a v7.  That doesn't quite line up, but seems very possible.

 

Oh, but I'm looking at 2022.10... Let me dig out the old version.

 

Link to comment
Share on other sites

When I look at the compiled 2022.10, I find a block in mtd0 with

Quote

bootcmd=run distro_bootcmd bootdelay=2 baudrate=115200 loadaddr=0x6000000 arch=arm cpu=armv8 board=mvebu_armada-37xx board_name=mvebu_armada-37xx vendor=Marvell soc=mvebu mtdids= mtdparts= scriptaddr=0x6d0000

That's 163 characters between the end of bootcmd and the beginning of scriptaddr.  On my v7 which has shown the problem before running the 2022.04 firmware, I get 154 characters (adds "preboot=" but drops both mtdids and mtdparts)

 

Let me see which of these variables survive the "env default -a"

Link to comment
Share on other sites

Yes, okay, I have compiled against the latest and see improvements, though I can't figure out exactly how the old bug worked.  It's odd that it stomps on more variables than it seems like it should, but not ALL of the variables.

 

I'll attempt to submit a patch to lists.denx.de.

 

Link to comment
Share on other sites

6 hours ago, ManoftheSea said:

That's 163 characters between the end of bootcmd and the beginning of scriptaddr. 

 

No, because printenv sort variables by name. But in default_environment[] they are not sorted. For example distro_bootcmd= is after the scriptaddr= and preboot= is before arch=

Link to comment
Share on other sites

On 11/28/2022 at 8:29 PM, ManoftheSea said:

But as for your patch: Doesn't it check each byte two times? 

 

I checked my version:

char *test1(char *ptr) {
  while (*ptr != '\0' || *(ptr+1) != '\0') ptr++;
  ptr++;
  return ptr;
}

 

And your version:

#define strlen(p) __builtin_strlen(p)
char *test2(char *ptr) {
  int i;
  if (*ptr != '\0') {
    while (i = strlen(ptr)) {
      ptr += i + 1;
    }
  }
  return ptr;
}

 

And for my version gcc generates more efficient code:

test1:
.L2:
	mov	r3, r0
	ldrb	r2, [r3]	@ zero_extendqisi2
	add	r0, r0, #1
	cmp	r2, #0
	bne	.L2
	ldrb	r2, [r0]	@ zero_extendqisi2
	cmp	r2, #0
	bne	.L2
	bx	lr

 

For your version gcc generates:

test2:
	push	{r4, lr}
	ldrb	r3, [r0]	@ zero_extendqisi2
	mov	r4, r0
	cmp	r3, #0
	bne	.L6
.L7:
	mov	r0, r4
	pop	{r4, pc}
.L8:
	add	r0, r0, #1
	add	r4, r4, r0
.L6:
	mov	r0, r4
	bl	strlen(PLT)
	cmp	r0, #0
	bne	.L8
	b	.L7

 

I compiled it with arm-linux-gnueabi-gcc -Os -S test.c

 

I specially put there __builtin_strlen so gcc would have a chance to optimize that loop. But neither with -O2 nor with -O3 it was able to eliminate it.

 

On the other hand for my code, gcc was able to figure out that every byte is reading two times and it is not neceserry and it can be optimized to read every byte just once. So it even generate shorter code when you compare number of instructions (important for u-boot which has space limitation).

 

So it is interesting to know this kind of information. And I'm surprised that it also eliminated the final ptr++ statement.

Edited by Pali
Link to comment
Share on other sites

That later version (which I sent to the u-boot list, did you see it) was for readability.  The code I proposed for size/speed was

while (*ptr != '\0') {
  do { ptr++; } while (*ptr != '\0') ;
  ptr++;
}

There is an error in your code.  if the environment is empty, you will increment and write new variables.  These variables will not be able to be found, as the check for the beginning of the variables will always see the first null byte.

Link to comment
Share on other sites

4 minutes ago, ManoftheSea said:

if the environment is empty, you will increment and write new variables.

 

Should not happen, but ok, some check could be there. Same as for adding trailing '\0' after filling all env variables.

Link to comment
Share on other sites

I am opposed.  Let's discuss why.  And I recognize that you have more experience here, so I appreciate if you explain your thinking so I can refine my understanding of what is important.

Looking at the final null byte, I believe it's entirely superfluous. The entire region has already been nulled out with enough space to hold more than the strings we're writing.  If, somehow, the fdtfile variable has begun stomping over other data, it would be better to discover the error by NOT nulling the beginning of the next variable location, as the logic that would read that data would make it visible that there is a problem.  Looking back at what we've learned, the lack of "soc", "board", etc. variables should have told us something, but we didn't know what we didn't know.  I have no desire to submit that as a patch. 

 

As for the "optimized" check for variables against the use of strlen, 9 bytes vs 14, but how long do you have to stare at the optimized version vs. immediately understanding the correctness of the algorithm.  Plus, when you add the initial condition check, you may have a few more instructions.  The loop itself from .L7 to the end is only 9 instructions.  Uh, I don't know how to get the assembly as you did, to compare my optimized version.  Do you think it's better to save the bytes at the cost of developer cognition?

Link to comment
Share on other sites

I have two reasons for extra terminating nul byte:

1. If it had set then this bug would have appeared because it would have immediatelly affected all espressobin boards just by env default -a comment.

2. In case in future env allocation changes from static to dynamic (e.g. malloc-ed) then code without extra terminating nul byte cause problems because dynamic malloced memory does not have to be zeroed. Espressobin is currently the only one board which set/changes default env variables, so any such such would affect only espressobin board. And U-Boot developers do not test every core U-Boot change on Espressobin.

 

So adding extra terminating nul byte is only for future changes in (core) u-boot code to catch possible bugs or regressions. In my opinion it is better to not expect that undefined/uninitialized data would be always zero bytes. And if other code _requires_ that after the last variable must be extra nul byte then it is better to always set it.

 

But you are right that currently it is not needed at all and code would be correct even without it (until somebody changes it).

 

Another option to prevent future regression in this area is to write (unit) tests which can be run during u-boot CI testing without need to have any ARM/espressobin HW. Personally I would really love to see this test but as I looked at the code it is written in such way that unit test is currently impossible without refactoring whole code which sets those default env variables. And I think it does not make sense to do this big refactoring as there was proposed better API for setting default env variables at runtime (I pointed it in u-boot email discussion as link to other patch series). In my opinion it would take same time to rewrite to the that new better API and refactoring code with ability to write unit tests. That is why I'm not going to that refactor -- but anybody else can do it and I think nobody would be against it.

 

How to compile test.c source code to assembly file test.s: Just call gcc with -S argument, it will create file with extension small .s instead of binary a.out. You can also specify -o. So for example: arm-linux-gnueabi-gcc -Os -S test.c -o test.s

 

And which version to choose? If we do not have an agreement what is better then the best is to ask developer responsible for u-boot platform code (which espressobin is part of), who would take and merge that patch and who is maintainer, take care to have code in a good shape and can make decision. Maintainer for mvebu is Stefan. I will send an email and ask for advice.

Link to comment
Share on other sites

Thanks!  So I tried the compilation, and the check added to the beginning of yours matches the first three instructions of the while( do ... while ) :

test3:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
.L15:
	ldrb	r3, [r0]	@ zero_extendqisi2
	cmp	r3, #0
	bxeq	lr
.L16:
	mov	r3, r0
	ldrb	r2, [r0, #1]!	@ zero_extendqisi2
	cmp	r2, #0
	bne	.L16
	add	r0, r3, #2
	b	.L15

 

Dang, that's tight!  Only 9 instructions with the initial check, because it appears ARM64 has a "pre-index" addressing mode that updates the base register as part of the load.  If I understand, that's able to replace the "add" from your loop.  But looking at this, it strikes me that the code could be manually adjusted to not need the "mov" instruction.  Of course, I have no idea if there are rules on branching before/after a load.

	ldrb	r2, [r0]	@ zero_extendqisi2
.L15:
	cmp	r2, #0
	bxeq	lr
.L16:
	ldrb	r2, [r0, #1]!	@ zero_extendqisi2
	cmp	r2, #0
	bne	.L16
	ldrb	r2, [r0, #1]!	@ zero_extendqisi2
	b	.L15

 

But we're awfully far afield of bugfixing, we're into pre-mature optimization.  I did enjoy the exercise.

 

You've convinced me about the null byte.  It doesn't harm, it would have helped catch the present bug, it has value in the future based on the change you mentioned to memory init.

 

If I'm reworking the patch to include the terminating byte (which is entirely your code, if you're okay with me claiming it), I'll also use the tighter loop, and let the list and CC's provide the comments directly.  It may take me a few days to once again cut the patch and communicate the changes.

Link to comment
Share on other sites

30 minutes ago, ManoftheSea said:

If I'm reworking the patch to include the terminating byte (which is entirely your code, if you're okay with me claiming it)

 

It is fine. I sent you private forum message with updated commit description (with tags) and comments. 

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...

Important Information

Terms of Use - Privacy Policy - Guidelines