Differences
This shows you the differences between two versions of the page.
transcripts:building-an-os-1-hello-world [2023/09/04 19:26] – Tiberiu Chibici | transcripts:building-an-os-1-hello-world [2023/09/09 16:57] (current) – [Building an OS - 1 - Hello world] Tiberiu Chibici | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Building an OS - 1 - Hello world ====== | ====== Building an OS - 1 - Hello world ====== | ||
- | >Note: This is a verbatim transcript of the [[https:// | + | >Note: This is an almost |
===== Introduction ===== | ===== Introduction ===== | ||
Line 112: | Line 113: | ||
As you can see, the system boots from floppy, and then it does nothing, exactly as we expected! So far, our operating system does nothing, and does it perfectly!!! | As you can see, the system boots from floppy, and then it does nothing, exactly as we expected! So far, our operating system does nothing, and does it perfectly!!! | ||
- | {{ : | + | {{ : |
===== Hello world ===== | ===== Hello world ===== | ||
Line 122: | Line 123: | ||
All processors have a number of registers, which are really small pieces of memory that can be written and read very fast, and are built into the CPU. Here is a diagram of all the registers on an x86 CPU: | All processors have a number of registers, which are really small pieces of memory that can be written and read very fast, and are built into the CPU. Here is a diagram of all the registers on an x86 CPU: | ||
- | {{ :transcripts:table_of_x86_registers_svg.svg?1000 |Table of x86 registers. By Immae - Own work, CC BY-SA 3.0, https:// | + | {{ osdev:media:table_of_x86_registers.svg?1000 |Table of x86 registers. By Immae - Own work, CC BY-SA 3.0, https:// |
+ | |||
+ | There are several types of registers: | ||
+ | |||
+ | * the general-purpose registers can be used for almost any purpose (RAX, RBX, RCX, RDX, R8-R15, including their smaller counterparts: EAX, AX, AL, AH, etc.) | ||
+ | * the index registers (RSI, RDI) are usually used for keeping indices and pointers; they can also be used for other purposes | ||
+ | * the program counter (RIP) is a special register which keeps track of which memory location the current instruction begins at | ||
+ | * the segment registers (CS, DS, ES, FS, GS, SS) are used to keep track of the currently active memory segments (which I will explain in just a moment) | ||
+ | * there is also a flags register (RFLAGS) which contains some special flags set by various instructions | ||
+ | * there are a few more special purpose registers, but I will only introduce them when we need them | ||
+ | |||
+ | ==== Real memory model ==== | ||
+ | |||
+ | Now let's talk a bit about RAM. The 8086 CPU had a 20-bit address bus, which meant that you could access up to 2< | ||
+ | |||
+ | <code -> | ||
+ | | ||
+ | segment: | ||
+ | </ | ||
+ | |||
+ | In this scheme, you use two 16-bit values, the **//segment//** and the **// | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | <code c> | ||
+ | linear_address = (segment << 4) + offset; | ||
+ | // or | ||
+ | linear_address = segment * 16 + offset; | ||
+ | </ | ||
+ | |||
+ | This also means that there are multiple ways of addressing the same location in memory. For example, the absolute address 0x7C00 (where the BIOS loads our operating system) can be written as any combination that you can see on the screen: | ||
+ | |||
+ | <code -> | ||
+ | segment: | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | </ | ||
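
To make the overlap concrete, here is a minimal C sketch of the conversion. The function names ''linear_address'' and ''same_location'' are my own, chosen only for illustration. Note the parentheses around the shift: in C, ''+'' binds tighter than ''<<''.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a real-mode segment:offset pair to a linear address.
   The segment is shifted left by 4 bits (multiplied by 16) and the
   offset is added. The parentheses matter: '+' binds tighter than '<<'. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Two segment:offset pairs refer to the same byte when their
   linear addresses are equal. */
bool same_location(uint16_t seg_a, uint16_t off_a,
                   uint16_t seg_b, uint16_t off_b)
{
    return linear_address(seg_a, off_a) == linear_address(seg_b, off_b);
}
```

For example, ''linear_address(0x0000, 0x7C00)'' and ''linear_address(0x07C0, 0x0000)'' both give 0x7C00, so ''same_location'' reports them as the same byte.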
+ | |||
+ | There are some special registers which are used to specify the actively used segments: | ||
+ | |||
+ | * **'' | ||
+ | * **'' | ||
+ | * **'' | ||
+ | |||
+ | In order to access (read or write) any memory location, its segment needs to be loaded into one of these registers. The code segment can only be modified by performing a jump. | ||
+ | |||
+ | Now, how do we reference a memory location from assembly? We use this syntax: | ||
+ | |||
+ | <code asm> | ||
+ | [segment : base + index * scale + displacement] | ||
+ | </ | ||
+ | |||
+ | Where: | ||
+ | |||
+ | * segment: one of CS, DS, ES, FS, GS, SS. Default: DS (SS if BP is used as base) | ||
+ | * base | ||
+ | * 16-bit: BP or BX | ||
+ | * 32/64-bit: any general-purpose register | ||
+ | * index: | ||
+ | * 16-bit: SI or DI | ||
+ | * 32/64-bit: any general-purpose register except the stack pointer (ESP/RSP) | ||
+ | * scale (32/64-bit only): 1, 2, 4 or 8 | ||
+ | * displacement: | ||
+ | |||
+ | The processor is capable of doing some arithmetic for us, as long as we use this expression. | ||
+ | |||
+ | In 16-bit mode, there are a few limitations because that's how the 8086 CPU was originally designed. This was probably done to keep the complexity and cost down. Another example of such a limitation is that we can't write constants to the segment registers directly; we have to use an intermediary register. With the introduction of the 386 processor just a few years later, 32-bit mode was introduced, which pretty much rendered 16-bit mode obsolete. A lot of newer CPU features were simply not added to 16-bit mode, because it is obsolete and only exists for backwards compatibility. However, it is still useful to learn, because most of the things that apply to 16-bit mode also apply to 32-bit and 64-bit modes. The main use of 16-bit mode today is in the startup sequence; most operating systems switch to 32-bit or 64-bit mode immediately after starting up. We will do the same thing in a future video, but we can't just yet, as we are limited to the first sector of a floppy disk (512 bytes), which is very little space. Once we are able to load data from the disk, we can do a lot more. | ||
+ | |||
+ | All operating systems have to do the same thing in order to boot, but until we get there, let's get back to referencing our memory locations. So, I already talked about the base and index operands. The scale and displacement operands are numerical constants; the scale can only be used in 32 and 64-bit modes, and it can only have a value of 1, 2, 4 or 8. The displacement can be any signed integer constant. | ||
+ | |||
+ | All the operands in a memory reference expression are optional, so you only have to use whatever you need. | ||
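
If it helps to see the arithmetic spelled out, here is a rough C model of how the offset part of a memory reference is computed. This is only a sketch: the function names are hypothetical, unused operands are passed as 0, and the scale is not validated against the allowed values 1, 2, 4 and 8.

```c
#include <stdint.h>

/* 16-bit mode: offset = base + index + displacement.
   There is no scale, and the result wraps around at 16 bits. */
uint16_t effective_offset16(uint16_t base, uint16_t index, int16_t displacement)
{
    return (uint16_t)(base + index + displacement);
}

/* 32-bit mode: offset = base + index * scale + displacement.
   The caller is assumed to pass a scale of 1, 2, 4 or 8. */
uint32_t effective_offset32(uint32_t base, uint32_t index,
                            uint32_t scale, int32_t displacement)
{
    return base + index * scale + (uint32_t)displacement;
}
```

The segment is not part of this calculation; the processor combines the resulting offset with the selected segment register afterwards.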
+ | |||
+ | === Examples === | ||
+ | |||
+ | == Example 1: == | ||
+ | |||
+ | <code asm> | ||
+ | var: dw 100 | ||
+ | |||
+ | mov ax, var ; copy offset to ax | ||
+ | mov ax, [var] ; copy memory contents of ds:var to ax | ||
+ | </ | ||
+ | |||
+ | First, I defined a label which points to a word having the value '' | ||
+ | |||
+ | The first instruction '' | ||
+ | |||
+ | The second instruction '' | ||
+ | |||
+ | == Example 2: == | ||
+ | |||
+ | <code asm> | ||
+ | array: dw 100, 200, 300 | ||
+ | |||
+ | ; read third element in array | ||
+ | mov bx, array | ||
+ | mov si, 2 * 2 | ||
+ | mov ax, [bx + si] | ||
+ | </ | ||
+ | |||
+ | Here's a more complicated example, where we want to read the third element in an array. We put the offset of the array into BX, and the index of the third element in SI. Since we use zero-based indexing, the third element is at '' | ||
+ | |||
+ | Note: You can see here that we use the multiplication symbol. The assembler is capable of calculating the result of constant expressions, | ||
+ | |||
+ | Finally, we put into AX the third element in the array, by referencing the memory location at BX + SI. BX is our base register, and SI is our index register. | ||
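
The same lookup can be modelled in C on a flat byte array. This is just an illustration under my own assumptions: ''array_bytes'' holds the three words 100, 200 and 300 as x86 stores them (low byte first), and ''read_word'' is a made-up helper that plays the role of the memory read.

```c
#include <stddef.h>
#include <stdint.h>

/* The words 100, 200, 300 as they sit in memory on x86 (little-endian):
   100 = 0x0064, 200 = 0x00C8, 300 = 0x012C. */
static const uint8_t array_bytes[] = {
    0x64, 0x00,
    0xC8, 0x00,
    0x2C, 0x01,
};

/* Read a 16-bit word starting at the given byte offset, low byte first. */
uint16_t read_word(const uint8_t *memory, size_t offset)
{
    return (uint16_t)(memory[offset] | (memory[offset + 1] << 8));
}
```

With a base of 0 and an index of ''2 * 2'' bytes, ''read_word(array_bytes, 0 + 2 * 2)'' returns the third element, because each word takes 2 bytes and the indexing is zero-based.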
+ | |||
+ | ==== Back to the OS - the initialization ==== | ||
+ | |||
+ | Back to our operating system, the code segment register has been set up for us by the BIOS and it points to segment 0. There are some BIOSes out there which actually jump to our code using a different segment and offset such as 0x07C0: | ||
+ | |||
+ | <code asm> | ||
+ | main: | ||
+ | ; setup data segments | ||
+ | mov ax, 0 ; can't set ds/es directly | ||
+ | mov ds, ax | ||
+ | mov es, ax | ||
+ | |||
+ | ; setup stack | ||
+ | mov ss, ax | ||
+ | mov sp, 0x7C00 | ||
+ | </ | ||
+ | |||
+ | We also set up the stack segment (SS) to 0, and the stack pointer (SP) to the beginning of our program. So what exactly is this stack? | ||
+ | |||
+ | The stack is a piece of memory that we can access in a "first in last out" manner, using the PUSH and POP instructions. The stack also has a special purpose when using functions. When you call a function, the return address is added to the stack, and when you return from a function, the processor will read the return address from the stack and then jump to it. | ||
+ | |||
+ | Another thing to note about the stack is that it grows downwards! SP points to the top of the stack. When you push something, SP is decremented by the number of bytes pushed, and then the data is written to memory. This is why we set up the stack to point to the start of our operating system: because it grows downwards. If we set it up to the end of our program, it would overwrite our program. We don't want that, so we just put it somewhere where it won't overwrite anything. The beginning of our operating system is a pretty safe spot. | ||
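
Here is a small C simulation of that behaviour, in case it makes the direction of growth easier to picture. Everything in it is an assumption for illustration: ''memory'' stands in for segment 0, ''sp'' starts at 0x7C00 like in our bootloader, and ''push16'' / ''pop16'' mimic the 16-bit PUSH and POP instructions.

```c
#include <stdint.h>

static uint8_t memory[0x10000];   /* stand-in for the 64 KiB of segment 0 */
static uint16_t sp = 0x7C00;      /* stack pointer, starts where our code begins */

/* PUSH: decrement SP by 2 first, then store the word at the new top. */
void push16(uint16_t value)
{
    sp = (uint16_t)(sp - 2);
    memory[sp] = (uint8_t)(value & 0xFF);
    memory[(uint16_t)(sp + 1)] = (uint8_t)(value >> 8);
}

/* POP: read the word at the top, then increment SP by 2. */
uint16_t pop16(void)
{
    uint16_t value = (uint16_t)(memory[sp] | (memory[(uint16_t)(sp + 1)] << 8));
    sp = (uint16_t)(sp + 2);
    return value;
}
```

After one push, ''sp'' is 0x7BFE, so the data lands just below 0x7C00 and never touches the code that starts at 0x7C00.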
+ | |||
+ | Now we'll start coding a '' | ||
+ | |||
+ | Note: Always document your assembly functions! | ||
+ | |||
+ | <code asm> | ||
+ | start: | ||
+ | jmp main | ||
+ | |||
+ | ; | ||
+ | ; Prints a string to the screen | ||
+ | ; Params: | ||
+ | ; - ds:si points to string | ||
+ | ; | ||
+ | puts: | ||
+ | |||
+ | ; ....... | ||
+ | |||
+ | |||
+ | main: | ||
+ | </ | ||
+ | |||
+ | Our function will receive a pointer to a string in '' | ||
+ | |||
+ | First, we push the registers that we're going to modify to the stack, after which we enter the main loop. | ||
+ | |||
+ | <code asm> | ||
+ | puts: | ||
+ | ; save registers we will modify | ||
+ | push si | ||
+ | push ax | ||
+ | push bx | ||
+ | |||
+ | .loop: | ||
+ | lodsb ; loads next character in al | ||
+ | </ | ||
+ | |||
+ | The '' | ||
+ | |||
+ | Next, I wrote the loop exit condition; the '' | ||
+ | |||
+ | <code asm> | ||
+ | or al, al ; verify if next character is null? | ||
+ | jz .done ; exit condition | ||
+ | |||
+ | ; todo ..... | ||
+ | |||
+ | jmp .loop | ||
+ | |||
+ | .done: | ||
+ | pop bx | ||
+ | pop ax | ||
+ | pop si | ||
+ | ret | ||
+ | </ | ||
+ | |||
+ | The next instruction, | ||
+ | |||
+ | After exiting the loop, we pop the registers we previously pushed in reverse order, and then we'll return from this function. So far, our function takes a string, iterates every character until it encounters the '' | ||
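
For comparison, this is roughly what the same loop looks like in C. It is a sketch, not a translation of the real function: ''puts_sketch'' and the ''emit'' callback are names I made up, and the callback simply stands in for whatever will print each character.

```c
#include <stddef.h>

/* Walk a null-terminated string one character at a time.
   emit is a hypothetical callback that stands in for printing;
   pass NULL to only count the characters. Returns the number of
   characters seen before the terminator. */
size_t puts_sketch(const char *s, void (*emit)(char))
{
    size_t count = 0;
    for (;;) {
        char c = *s++;      /* like lodsb: load a byte, advance the pointer */
        if (c == '\0')      /* like or al, al / jz .done */
            break;
        if (emit != NULL)
            emit(c);
        count++;
    }
    return count;
}
```

The loop has the same shape as the assembly: load, test for the terminator, handle the character, repeat.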
+ | |||
+ | ===== Interrupts ===== | ||
+ | |||
+ | An interrupt is a signal which makes the processor stop whatever it is doing to handle that event. There are 3 possible ways of triggering an interrupt: | ||
+ | |||
+ | - Through **//an exception// | ||
+ | - **// | ||
+ | - From code, through the **//INT instruction// | ||
+ | |||
+ | The BIOS installed some interrupt handlers for us, so that we can use its functionality. Typically, the BIOS reserves an interrupt number for a category of functions, and the value in the '' | ||
+ | |||
+ | <code -> | ||
+ | Examples of BIOS interrupts: | ||
+ | |||
+ | INT 10h -- Video | ||
+ | INT 11h -- Equipment check | ||
+ | INT 12h -- Memory size | ||
+ | INT 13h -- Disk I/O | ||
+ | INT 14h -- Serial communication | ||
+ | INT 15h -- Cassette | ||
+ | INT 16h -- Keyboard | ||
+ | |||
+ | ............ | ||
+ | </ | ||
+ | |||
+ | To print text to the screen, we will need to call [[https:// | ||
+ | |||
+ | <code -> | ||
+ | VIDEO - TELETYPE OUTPUT | ||
+ | |||
+ | AH = 0Eh | ||
+ | AL = character to write | ||
+ | BH = page number | ||
+ | BL = foreground color (graphics modes only) | ||
+ | |||
+ | Return: | ||
+ | Nothing | ||
+ | |||
+ | Desc: Display a character on the screen, advancing the cursor and scrolling the screen as necessary | ||
+ | |||
+ | Notes: Characters 07h (BEL), 08h (BS), 0Ah (LF), and 0Dh (CR) are interpreted and do the expected things. | ||
+ | IBM PC ROMs dated 1981/4/24 and 1981/10/19 require that BH be the same as the current active page | ||
+ | |||
+ | BUG: If the write causes the screen to scroll, BP is destroyed by BIOSes for which AH=06h destroys BP | ||
+ | |||
+ | Source: http:// | ||
+ | </ | ||
+ | |||
+ | What we need to do in order to call this function is to set: | ||
+ | |||
+ | * AH to 0Eh | ||
+ | * AL to the ASCII character that we want to print | ||
+ | * BH to the page number (which is 0) | ||
+ | * BL (the foreground color) is only used in graphics mode, so we can ignore it because we're currently running in text mode. | ||
+ | |||
+ | <code asm> | ||
+ | mov ah, 0x0E ; BIOS function 0Eh: teletype output | ||
+ | ; al is already set by lodsb | ||
+ | mov bh, 0 ; set page number to 0 | ||
+ | int 0x10 | ||
+ | </ | ||
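
To picture what happens when we execute ''int 0x10'', here is a very simplified C model of the dispatch. It is only a sketch under my own assumptions: the ''Registers'' struct, ''vector_table'', ''video_services'' and ''software_interrupt'' are invented names, and a real BIOS handler does far more than remember the last character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the registers a handler looks at. */
typedef struct {
    uint8_t ah, al, bh, bl;
} Registers;

typedef void (*Handler)(Registers *);

static Handler vector_table[256];   /* one slot per interrupt number */
static char last_char;              /* stand-in for the screen */

/* Handler for interrupt 10h: AH selects the function within the category. */
static void video_services(Registers *r)
{
    switch (r->ah) {
    case 0x0E:                      /* teletype output: AL = character */
        last_char = (char)r->al;
        break;
    default:                        /* other video functions not modelled */
        break;
    }
}

void install_handlers(void)
{
    vector_table[0x10] = video_services;
}

/* Model of the INT instruction: look up the handler and call it.
   Returns -1 if nothing is installed for that interrupt number. */
int software_interrupt(uint8_t number, Registers *r)
{
    if (vector_table[number] == NULL)
        return -1;
    vector_table[number](r);
    return 0;
}
```

The interrupt number picks the category (video), and the value in AH picks the function inside it (teletype output), which is why we load AH before the ''int'' instruction.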
+ | |||
+ | Finally let's add a string containing "Hello world", | ||
+ | |||
+ | <code asm> | ||
+ | %define ENDL 0x0D, 0x0A | ||
+ | msg_hello: db 'Hello world!', | ||
+ | </ | ||
+ | |||
+ | All that's left to do is to set DS:SI to the address of the string, and then call '' | ||
+ | |||
+ | <code asm> | ||
+ | ; print hello world message | ||
+ | mov si, msg_hello | ||
+ | call puts | ||
+ | </ | ||
+ | |||
+ | Let's now test our program: | ||
+ | |||
+ | <code bash> | ||
+ | $ make | ||
+ | $ qemu-system-i386 -fda build/ | ||
+ | </ | ||
+ | |||
+ | And the result: | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | ===== Conclusion ===== | ||
+ | |||
+ | Great! So, we have successfully written a tiny operating system which can print text to the screen! This was a lot of work, and we learned a lot of new stuff about how computers work. Next time, we will improve our assembly skills and learn some new stuff by extending our operating system to print numbers to the screen. After that, we will get into the complex task of loading stuff from the disk. | ||
- | there are several types of registers the general-purpose registers can be used for almost any purpose the index registers are usually used for keeping indices and pointers they can also be used for other purposes the program counter is a special register which keeps track of which memory location the current instruction begins the segment registers are used to keep track of the currently active memory segments which I will explain in just a moment there is also a Flags register which contains some special flags which are set by various instructions there are a few more special purpose registers but will only introduce them when we need them now we'll talk a bit about RAM memory the 8086 CPU had a 20-bit address bus this meant that you could access up to 2 to the power of 20 which means about one megabyte of memory at the time typical computers had around 64 to 128 kilobytes of memory so the engineers at Intel thought this limit was huge for various reasons they decided to use a segment and offset addressing scheme for Memory segmentation memory in this scheme you address memory by using two 16-bit values the segment and the offset each segment contains 64 kilobytes of memory where each byte can be accessed using the offset value segments overlap every 16 bytes this means that you can convert a segment offset address to an absolute address by shifting the segment four bits to the left were multiplying it by 16 and then adding the offset this also means that there are multiple ways of addressing the same location in memory for example the absolute address 7000 which is where the BIOS flows our operating system can be written as any combination that you can see on the screen there are some special registers which are used to specify the actively used segments CS contain the code segment which is the segment the processor executes code from the IP or program counter register only gives us the offset the D s and D s registers are data segments newer processors 
introduced additional data segments FS and GS SS contains the current stack register in order to access the outside one of these active statements we need to load that Simon into one of these registers the code segment can only be modified by performing a jump now how do Referencing a memory location you reference a memory location from assembly you use this syntax a segment register followed by a colon followed by an expression which gives the offset put between in brackets the segment register can be omitted in which case the DSL register will be used the processor is capable of doing some arithmetic for us as long as we use this expression the base and index operands can be any general-purpose processor registers in 16-bit mode there are a few limitations however only B P and B X can be used as base registers and only Si and di can be used as index registers these limitations exist because of how the 8086 CPU was originally designed where they had to put such limitations to keep the complexity down another example of one such limitation is that we can't write constants to the segment registers directly we have to use an intermediary register with the introduction of the 386 processor just a few years later 32-bit mode was introduced which pretty much rendered 16-bit mode obsolete a lot of newer cpu features were simply not added to the 16-bit mode because it is absolute and it only exists for backwards compatibility it is still useful to learn because most of the things that apply to a 16-bit mode apply to a 32-bit or 62 bit mode and it is much simpler its main use today is in the startup sequence most operating systems switch to 32 or 64-bit mode immediately after starting up we will do the same thing in a future video but we can't just yet for now we are limited to the first sector of a floppy disk that is 512 bytes which is very little space once we are able to load a from the floppy disk we can do a lot more all operating systems have to do the same thing in 
order to boot but until we get there let's get back to referencing our memory locations so I already talked about the base and index operands the scale and displacement operands are numerical constants the scale can only be used in 32 and 64-bit modes and it can only have a value of one to four or eight the displacement can be any signed integer constant all the operands in a memory reference expression are optional so you only have to use whatever you need so here's an example first I defined a label which points to a word having the value 100 the first instruction puts the offset of the label into the ax register the second instruction puts the memory contents per our label point set since we didn't specify a segment register D s is going to be used we haven' | + | Thank you for watching and see you the next time! Bye bye! |