Finally, I’ve booted the kernel on a real machine without it unexpectedly rebooting during the boot process.
While testing the code, I tried smaller buffers (16, 8, 4 and 2 Kbytes), and the tests showed that the code with 4 and 2 Kbyte buffers is able to boot the kernel on a real machine, while bigger buffers work only on virtual machines. Keeping in mind that memory allocation and freeing is most probably done correctly (it’s done by libstand functions, which are trusted), it was difficult to understand why using less memory for buffers gave such strange results.
The first investigation showed that the loader’s heap was allocated over the PXE data (in base memory above 8d000h). That was fixed (the upper bound of the heap is now limited to the address where the PXE data starts), but it didn’t solve the problem. Yeah, a sad moment, I had faith that the problem was in the heap allocation.
The next suggestion was that pxeboot is a rather big binary. In fact, the PXE specs recommend downloading boot images in two steps: remote.0 and remote.1. Remote.0 is placed at the well-known address 0000:7c00h and must not exceed 32 Kbytes (because “32K size limit provides the advantage of being able to clean up and fail gracefully” (c) PXE Specification). Remote.0 determines whether the system has adequate resources and downloads remote.1 (in the specs, into extended memory at e98000h). The monolithic pxeboot is about 200K, and with pxe_http it’s above 240K. I don’t know if the 32K size is a strict requirement or not; pxelinux, for example, is about 12K.
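As a rough sketch of the heap fix described above: the idea is simply to cut the heap off where the PXE data begins. The function name and its parameters are hypothetical and are not taken from the actual loader code; only the 8d000h address comes from my investigation, the other addresses in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the actual loader code: return the upper bound
 * the heap may use so that it ends where the PXE data begins.
 */
static uint32_t
clamp_heap_top(uint32_t heap_base, uint32_t heap_top, uint32_t pxe_data_start)
{
	/* The sketch assumes a sane range with the PXE data above the heap base. */
	assert(heap_base <= heap_top);
	assert(heap_base <= pxe_data_start);

	/* If the heap would run into the PXE data, cut it off at that address. */
	return (heap_top > pxe_data_start) ? pxe_data_start : heap_top;
}
```

For example, with the PXE data starting at 0x8d000, a heap that was planned to end at 0x9f000 would be limited to 0x8d000, while a heap ending at 0x80000 would be left as it is.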
Well, “the NFS loader works”, I thought, and then reduced the binary size by removing all the pxe_http testing code and the code that supports filesystems such as dos. That gave me nearly the same size as the usual pxeboot. So now the http-loaded kernel boots on my home machine.
While working on the DHCP client, I found that I was doing the same work that is performed by the bootp() call in libstand. I didn’t pay much attention to it, because my code was working, but after reducing the binary I thought it would be a good idea to use the already available function. This function mainly depends on the udpread/udpsend functions, which used to live in libi386/pxe.c but were commented out by me while I was rewriting its code a little (commented out because they used pxe_call etc., which I had removed from this file).
It was not a big deal to rewrite udpread/udpsend to use pxe_http functions for reading and sending data, but for this purpose I added a “default socket” mechanism for UDP. Its aim is to avoid using sockets when calling udpread/udpsend, because the code that uses them doesn’t know about pxe_http sockets. To be more accurate, sending UDP packets doesn’t require a socket, it’s enough to call pxe_udp_send(), but incoming packets pass through the filter mechanism and are delivered to the appropriate socket buffer. Previously, all data that didn’t match any filter went to the trash bin; now it goes to the “default socket”, and it’s possible to read from it as usual, using the pxe_udp_read() function.
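To illustrate the “default socket” idea, here is a minimal sketch under simplifying assumptions. All names here (sketch_socket, sketch_deliver, sketch_default_read) are hypothetical and are not the pxe_http API, and the filter is reduced to a plain match on the destination port, while the real filters may check more than that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_SOCKETS	4
#define SKETCH_BUF_SIZE		2048

/* Hypothetical socket: one port filter plus a flat receive buffer. */
struct sketch_socket {
	int	 in_use;
	uint16_t port;			/* filter: local UDP port */
	uint8_t	 buf[SKETCH_BUF_SIZE];
	size_t	 len;			/* bytes currently buffered */
};

static struct sketch_socket sockets[SKETCH_MAX_SOCKETS];
static struct sketch_socket default_socket;	/* catches unmatched data */

/*
 * Deliver incoming UDP payload: to the socket whose filter matches,
 * otherwise to the default socket instead of dropping it.
 * Returns the socket that got the data, or NULL if its buffer is full.
 */
static struct sketch_socket *
sketch_deliver(uint16_t dst_port, const void *data, size_t len)
{
	struct sketch_socket *target = &default_socket;
	int i;

	assert(data != NULL);
	for (i = 0; i < SKETCH_MAX_SOCKETS; i++) {
		if (sockets[i].in_use && sockets[i].port == dst_port) {
			target = &sockets[i];
			break;
		}
	}
	if (len > SKETCH_BUF_SIZE - target->len)
		return (NULL);		/* no room left: packet is dropped */
	memcpy(target->buf + target->len, data, len);
	target->len += len;
	return (target);
}

/* Read up to size bytes from the default socket; returns bytes copied. */
static size_t
sketch_default_read(void *out, size_t size)
{
	size_t n = (default_socket.len < size) ? default_socket.len : size;

	memcpy(out, default_socket.buf, n);
	/* Shift the unread remainder to the start of the buffer. */
	memmove(default_socket.buf, default_socket.buf + n,
	    default_socket.len - n);
	default_socket.len -= n;
	return (n);
}
```

With this layout, code that knows nothing about sockets only ever touches the default socket, while code that opened its own socket still gets its packets through the filter match.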
So, bootp() started working well (except for updating the gateway/nameserver information, which I’ll fix in the next few days). The main problem with it was that bootp() can work only after the device is opened, because only then is the NIC MAC known to the code that generates the BOOTP packet. Normally, pxe_dhcp_query(), which may now be a wrapper for bootp(), is started from pxe_core_init() and already knows its own MAC. Well, I’ve moved the pxe_dhcp_query() call back to pxe_open() for the case when my own DHCP client is not used (this is where bootp() was originally). And everything started working correctly.
The NFS code also uses the udpread/udpsend functions, so it was rather logical to try the NFS loader. It also works. It started working without any trouble, so there is nothing interesting to tell about it.
This lets me conclude that compatibility with the old, working PXE-related code in libi386 is good enough.
So, formally, the main goal of the project is achieved. But there is more work to do:
1. Load the image of a RAM drive and mount root on it, so that the boot process via http is fully performed. Right now it stops at the stage of mounting the root file system.
2. Understand where the memory problem is when I use bigger buffers with the big pxeboot binary. It’s a rather interesting point, because I have no idea what might have caused a perfectly loaded kernel to reboot when the socket buffers were bigger.
3. Test everything and update some code details (well, one of them is that the root path is not used by the http loader now).
4. Finally write the whole documentation for the project. I’ve started writing it a few times, but it hasn’t been finished yet.
5. Choose a notebook to buy.
These are my tasks for the next few weeks.