下一站 - Ihcblog!

Implementing a Minimal VMM in Rust 3 - Running a Real Linux Kernel

This article also has an English version.

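This series documents my attempts to implement a hypervisor in Rust. Table of contents:

  1. Implementing a Minimal VMM in Rust - Basics
  2. Implementing a Minimal VMM in Rust - Mode Switching
  3. Implementing a Minimal VMM in Rust - Running a Real Linux Kernel
  4. Implementing a Minimal VMM in Rust - Implementing Virtio Devices

This is the third article in the series. It covers some preparation work and then actually boots a real Linux.

In the previous chapters we got to the point of running arbitrary code in 64-bit mode. The goal of this chapter is to boot a real Linux kernel.

You might wonder: Linux can also start in real mode, so why go through all this trouble? Normally, booting Linux relies on a bootloader to switch modes and load the kernel code, and it is more efficient for our VMM to do this step itself. Once the mode switch is done, we only need to make sure the kernel code and the initrd have been loaded at memory addresses covered by the page tables we set up, and then we can point the vCPU at the entry point and start it directly.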

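Environment Setup

Since we only want to boot Linux, not a complete Linux system with a disk, we only need to prepare:

  1. A kernel image: you can build it yourself, or download the prebuilt and tuned image from the address provided by firecracker.
  2. An initrd image (this is not actually a disk image; it just keeps the old name): you can package it yourself or create it with a script.

Here is a simple write-up I made while trying Rust for Linux, covering kernel compilation and manually building and booting an initrd; it can serve as a reference. Building the kernel and initrd is not our focus right now, though, so to avoid being sidetracked by problems there, we use images that others have already prepared.

vmlinux.bin: https://s3.amazonaws.com/spec.ccfc.min/img/quickstart_guide/x86_64/kernels/vmlinux.bin

initrd.img: build it by following https://github.com/marcov/firecracker-initrd.git (note: this is a bit dated and will complain that the root password is too simple; adjust it manually)

We place the two files, vmlinux.bin and initrd.img, under /tmp/mini-kvm.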

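Creating the IRQ Chip and PIT

PIC and APIC

Besides the CPU and memory, the other important part of a computer is its I/O devices. There are two ways to find out whether a device has data. The CPU can poll, which is costly at high frequency and adds latency at low frequency. Alternatively, the device can notify the CPU when data is ready, which is what an interrupt does. After each instruction completes, the CPU checks whether an interrupt is pending; if one is, and the interrupt enable flag IF is set, it jumps to the corresponding interrupt handler.

There are all kinds of peripherals, and the CPU cannot dedicate a pin to each of them for receiving interrupts, so a piece of hardware acting as a dispatcher is needed. The IBM PC used Intel's 8259A interrupt controller, which has 8 interrupt input lines and is programmable: lines and priorities can be configured dynamically, and interrupts can be masked. To support more peripherals, several 8259A chips are often cascaded. This kind of programmable interrupt controller is called a PIC (Programmable Interrupt Controller).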
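To make the dispatcher role more concrete, here is a minimal sketch of how a single PIC might pick which interrupt to deliver. It is a toy model under stated assumptions: fixed priority with line 0 highest, no cascading, and hypothetical names (`Pic`, `raise`, `mask`, `acknowledge`, `eoi`) that do not come from KVM or any crate.

```rust
/// Toy model of a single 8259A-style PIC in fixed-priority mode.
/// All names are hypothetical; this is not a full device emulation.
#[derive(Default)]
struct Pic {
    irr: u8, // Interrupt Request Register: lines with a pending request
    imr: u8, // Interrupt Mask Register: a set bit masks that line
    isr: u8, // In-Service Register: interrupts currently being handled
}

impl Pic {
    /// A device asserts one of the 8 input lines.
    fn raise(&mut self, line: u8) {
        assert!(line < 8, "a single PIC has only 8 input lines");
        self.irr |= 1 << line;
    }

    /// Software masks a line so that its requests are not delivered.
    fn mask(&mut self, line: u8) {
        assert!(line < 8, "a single PIC has only 8 input lines");
        self.imr |= 1 << line;
    }

    /// Pick the highest-priority pending, unmasked line (line 0 is highest)
    /// and mark it as in service.
    fn acknowledge(&mut self) -> Option<u8> {
        let pending = self.irr & !self.imr;
        if pending == 0 {
            return None;
        }
        let line = pending.trailing_zeros() as u8;
        // An in-service interrupt of equal or higher priority blocks delivery.
        if self.isr != 0 && (self.isr.trailing_zeros() as u8) <= line {
            return None;
        }
        self.irr &= !(1 << line);
        self.isr |= 1 << line;
        Some(line)
    }

    /// End of interrupt: clear the highest-priority in-service bit.
    fn eoi(&mut self) {
        self.isr &= self.isr.wrapping_sub(1);
    }
}

fn main() {
    let mut pic = Pic::default();
    pic.raise(4); // e.g. a serial port
    pic.raise(1);
    println!("first delivered: {:?}", pic.acknowledge());
    pic.eoi();
    println!("second delivered: {:?}", pic.acknowledge());
}
```

The three registers mirror the request, mask, and in-service state that such a controller tracks; with KVM none of this has to be written by hand, since the in-kernel IRQ chip does it.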

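In the multi-CPU era, Intel introduced APIC (Advanced Programmable Interrupt Controller). An APIC consists of two parts: the LAPIC (Local APIC), which lives in each CPU (nowadays each logical core has one), and the IOAPIC, of which there may be one or more and which connects to external devices. The two are connected through the APIC bus. A peripheral's interrupt goes through the IOAPIC and is broadcast to the LAPICs, and each LAPIC decides whether to handle it.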

[Figure: apic]

IRQ Virtualization

KVM already virtualizes the IRQ chip for us; we only need to create it:

vm.create_irq_chip().unwrap();

When we need to trigger an interrupt, we only need to register an EventFd together with the corresponding IRQ number:

vm.register_irqfd(&evtfd, 0).unwrap();

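Clock Signal Virtualization

A computer system has two kinds of time-related devices: clocks and timers. A clock gives us the current time, e.g. the TSC (Time Stamp Counter). A timer raises an interrupt at a given time or at a fixed frequency, so that the CPU can notice the passage of time while executing user code, e.g. the PIT (Programmable Interval Timer).

The PIT has low precision and is only used during system boot. After boot, the LAPIC timer takes over; it works inside the CPU and has higher precision.

To create a virtual PIT device, we again just use what KVM provides: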

let pit_config = kvm_pit_config {
    flags: KVM_PIT_SPEAKER_DUMMY,
    ..Default::default()
};
vm.create_pit2(pit_config).unwrap();

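Handling CPUID

Some information about the CPU is obtained through the CPUID instruction, and we need to modify the CPUID that the VM sees. We tell KVM up front what CPUID information the guest should see; later, when the guest executes CPUID and causes a VM exit, KVM can handle it by itself without handing it to the userspace VMM.

For the specification, see https://en.wikipedia.org/wiki/CPUID and http://www.flounder.com/cpuid_explorer2.htm

A Simple Example

As an example, let's look at the register data for function = 0: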

let mut kvm_cpuid = kvm.get_supported_cpuid(KVM_MAX_CPUID_ENTRIES).unwrap();
let entries = kvm_cpuid.as_mut_slice();
for entry in entries.iter_mut() {
    match entry.function {
        0 => {
            println!("EBX: {:x}", entry.ebx);
            println!("EDX: {:x}", entry.edx);
            println!("ECX: {:x}", entry.ecx);
        }
        _ => (),
    }
}

// EBX: 756e6547
// EDX: 49656e69
// ECX: 6c65746e

// "\x47\x65\x6e\x75\x69\x6e\x65\x49\x6e\x74\x65\x6c" =>
// "GenuineIntel"
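The decoding step in the comments above can be reproduced with plain Rust. The sketch below needs no KVM access: it takes the three register values printed above and concatenates their little-endian bytes in EBX, EDX, ECX order. The helper name `cpuid_vendor` is made up for illustration.

```rust
/// Rebuild the 12-byte vendor string of CPUID function 0 from its
/// register values. Each register holds 4 ASCII bytes in little-endian
/// order, and the registers are read as EBX, EDX, ECX.
fn cpuid_vendor(ebx: u32, edx: u32, ecx: u32) -> String {
    let mut bytes = Vec::with_capacity(12);
    for reg in [ebx, edx, ecx] {
        bytes.extend_from_slice(&reg.to_le_bytes());
    }
    String::from_utf8_lossy(&bytes).into_owned()
}

fn main() {
    // Register values taken from the output shown above.
    let vendor = cpuid_vendor(0x756e_6547, 0x4965_6e69, 0x6c65_746e);
    println!("{}", vendor); // prints "GenuineIntel"
}
```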

Another example is 0x40000000, which stores the string KVMKVMKVM in the registers. For details see: https://01.org/linuxgraphics/gfx-docs/drm/virt/kvm/cpuid.html

Struct Definition

kvm_cpuid_entry2 is defined like this:

#[repr(C)]
#[derive(Debug, Default, Copy, Clone, PartialEq, Versionize)]
pub struct kvm_cpuid_entry2 {
    pub function: __u32,
    pub index: __u32,
    pub flags: __u32,
    pub eax: __u32,
    pub ebx: __u32,
    pub ecx: __u32,
    pub edx: __u32,
    pub padding: [__u32; 3usize],
}

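You might wonder: isn't executing CPUID just a matter of setting EAX/ECX? Where do function and index come from? Looking at https://elixir.bootlin.com/linux/latest/source/arch/x86/kvm/cpuid.c#L1392 we can see that function is taken from EAX and index from ECX. So when reading the specification above, we just map function and index to the input EAX and ECX.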
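As a rough illustration of that mapping, the sketch below looks up an entry from the guest's input EAX/ECX. It is a simplified model and not KVM's actual implementation: `CpuidEntry` is a cut-down stand-in for kvm_cpuid_entry2, `lookup` is a hypothetical helper, and the table values used with it are invented. The flag constant mirrors KVM_CPUID_FLAG_SIGNIFCANT_INDEX, which marks entries whose index must also match.

```rust
/// Mirrors KVM_CPUID_FLAG_SIGNIFCANT_INDEX: the entry is only valid for
/// one specific ECX (index) value.
const FLAG_SIGNIFICANT_INDEX: u32 = 1;

/// Cut-down stand-in for kvm_cpuid_entry2 (hypothetical, illustration only).
#[derive(Clone, Copy)]
struct CpuidEntry {
    function: u32, // matched against the guest's input EAX
    index: u32,    // matched against the guest's input ECX when significant
    flags: u32,
    eax: u32, // one of the output registers; the others are omitted here
}

/// Find the entry that answers a CPUID executed with the given EAX/ECX.
fn lookup(entries: &[CpuidEntry], eax: u32, ecx: u32) -> Option<&CpuidEntry> {
    entries.iter().find(|e| {
        e.function == eax && ((e.flags & FLAG_SIGNIFICANT_INDEX) == 0 || e.index == ecx)
    })
}

fn main() {
    // Made-up table contents, only to exercise the lookup.
    let table = [
        CpuidEntry { function: 0, index: 0, flags: 0, eax: 0x16 },
        CpuidEntry { function: 4, index: 0, flags: FLAG_SIGNIFICANT_INDEX, eax: 0x121 },
        CpuidEntry { function: 4, index: 1, flags: FLAG_SIGNIFICANT_INDEX, eax: 0x122 },
    ];
    if let Some(entry) = lookup(&table, 4, 1) {
        println!("function=4 index=1 -> eax={:#x}", entry.eax);
    }
}
```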

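Setting CPUID

This part mainly follows the firecracker code; some of these settings may be necessary and some may not.

From the wiki page above, for EAX=1 (this is an input, so it corresponds to our function=1):

  1. Set bit 31 of ECX to 1 to indicate a hypervisor.
  2. Set bits 31:24 of EBX to the Local APIC ID; with multiple vCPUs, just number them from 0.
  3. Set bits 15:8 of EBX to the CLFLUSH line size. On x86 a cache line is usually 64 bytes, and according to the wiki the value is multiplied by 8 to get the actual size, so we set it to 8.
  4. Set bit 19 of EDX to 1 to enable CLFLUSH; the CLFLUSH line size setting only takes effect when this is set. TODO: why do none of the reference projects set this?
  5. Set bits 23:16 of EBX to the number of logical processors in a single physical package, usually the vCPU count rounded up to a power of two (not rounding is fine too).
  6. Set bit 28 of EDX to 1 to enable hyper-threading; the logical processor count above only takes effect when this is set. Usually set when vCPU > 1.
  7. Set bit 24 of ECX to 1 to enable tsc-deadline.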
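The seven items above could be applied roughly as follows. This is only a sketch: `CpuidEntry` is a local stand-in for the relevant kvm_cpuid_entry2 fields, and `patch_leaf1` and the constant names are hypothetical, not taken from firecracker or kvm-bindings. Capping the logical processor count at 255 is my own assumption, made so the value fits the 8-bit field.

```rust
/// Local stand-in for the kvm_cpuid_entry2 fields used here (hypothetical).
#[derive(Debug, Default, Clone, Copy)]
struct CpuidEntry {
    function: u32,
    ebx: u32,
    ecx: u32,
    edx: u32,
}

const EBX_CLFLUSH_SHIFT: u32 = 8; // EBX[15:8]
const EBX_LOGICAL_COUNT_SHIFT: u32 = 16; // EBX[23:16]
const EBX_APIC_ID_SHIFT: u32 = 24; // EBX[31:24]
const ECX_TSC_DEADLINE: u32 = 1 << 24;
const ECX_HYPERVISOR: u32 = 1 << 31;
const EDX_CLFSH: u32 = 1 << 19;
const EDX_HTT: u32 = 1 << 28;

/// Apply the function=1 settings listed above to one entry.
fn patch_leaf1(entry: &mut CpuidEntry, vcpu_id: u8, vcpu_count: u8) {
    if entry.function != 1 {
        return;
    }
    entry.ecx |= ECX_HYPERVISOR | ECX_TSC_DEADLINE;

    // Round the vCPU count up to a power of two; cap it (assumption) so it
    // still fits into the 8-bit field.
    let logical = (vcpu_count as u32).next_power_of_two().min(0xff);
    // Keep EBX[7:0] and rebuild the three upper fields.
    entry.ebx = (entry.ebx & 0xff)
        | (8 << EBX_CLFLUSH_SHIFT) // 8 * 8 = 64-byte cache line
        | (logical << EBX_LOGICAL_COUNT_SHIFT)
        | ((vcpu_id as u32) << EBX_APIC_ID_SHIFT);

    entry.edx |= EDX_CLFSH;
    if vcpu_count > 1 {
        entry.edx |= EDX_HTT;
    } else {
        entry.edx &= !EDX_HTT;
    }
}

fn main() {
    let mut entry = CpuidEntry { function: 1, ..Default::default() };
    patch_leaf1(&mut entry, 1, 3);
    println!("ebx={:#010x} ecx={:#010x} edx={:#010x}", entry.ebx, entry.ecx, entry.edx);
}
```

With the real API, the same bit operations would be applied to the matching entry of the slice returned by get_supported_cpuid before calling set_cpuid2.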

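EAX=4 mainly covers caches and cores, e.g. how many cores a socket has:

  1. Omitted here.

EAX=6, thermal and power management:

  1. Clear bit 3 of ECX to turn off the Performance-Energy Bias capability.
  2. Clear bit 1 of EAX to turn off the Intel Turbo Boost Technology capability.

EAX=10 (0xA), performance monitoring:

  1. Set everything to 0 to turn it off.

EAX=11 (0xB), Extended Topology Entry:

  1. Omitted here.

EAX=0x8000_0002..=0x8000_0004, CPU model information:

  1. You can make this up yourself.

The Simple Approach

In fact, passing the CPUID through without any processing also works: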

let kvm_cpuid = kvm.get_supported_cpuid(KVM_MAX_CPUID_ENTRIES).unwrap();
vcpu.set_cpuid2(&kvm_cpuid).unwrap();

For now we use this as a workaround and will come back to this part later.

Setting the TSS

TODO: explain what the TSS is & why KVM needs this

const KVM_TSS_ADDRESS: usize = 0xfffb_d000;
vm.set_tss_address(KVM_TSS_ADDRESS).expect("set tss failed");

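Loading the Kernel and initrd

We need to do the bootloader's job: load the kernel, the initrd, and the boot command line into memory, and place some required information in memory to pass to the kernel.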

// load linux kernel
let mut kernel_file = File::open(KERNEL_PATH).expect("open kernel file failed");
let kernel_entry = Elf::load(
    &guest_mem,
    None,
    &mut kernel_file,
    Some(GuestAddress(HIMEM_START)),
)
.unwrap()
.kernel_load;

// load initrd
let initrd_content = std::fs::read(INITRD_PATH).expect("read initrd file failed");
let first_region = guest_mem.find_region(GuestAddress::new(0)).unwrap();
assert!(
    initrd_content.len() <= first_region.size(),
    "too big initrd"
);
// place the initrd at the top of the first region, aligned down to a 4 KiB page
let initrd_addr =
    GuestAddress((first_region.size() - initrd_content.len()) as u64 & !(4096 - 1));
guest_mem
    .read_from(
        initrd_addr,
        &mut Cursor::new(&initrd_content),
        initrd_content.len(),
    )
    .unwrap();

// load boot command
let mut boot_cmdline = Cmdline::new(0x10000);
boot_cmdline.insert_str(BOOT_CMD).unwrap();
load_cmdline(&guest_mem, GuestAddress(BOOT_CMD_START), &boot_cmdline).unwrap();
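The initrd placement above (end of the region minus the image size, rounded down to a page boundary) can be checked in isolation. The helper `initrd_load_addr` and the image size used here are hypothetical; only the arithmetic follows the code above.

```rust
const PAGE_SIZE: u64 = 4096;

/// Place an image of `initrd_len` bytes at the top of a region of
/// `region_size` bytes starting at guest address 0, aligned down to a page
/// boundary. Returns None if the image does not fit.
fn initrd_load_addr(region_size: u64, initrd_len: u64) -> Option<u64> {
    if initrd_len > region_size {
        return None;
    }
    // Clearing the low 12 bits rounds the address down to a 4 KiB boundary.
    Some((region_size - initrd_len) & !(PAGE_SIZE - 1))
}

fn main() {
    let memory_size: u64 = 128 << 20; // same guest size as MEMORY_SIZE in this article
    let initrd_len: u64 = 0x1ad_a123; // made-up image size for illustration
    match initrd_load_addr(memory_size, initrd_len) {
        Some(addr) => println!("initrd goes to {:#x}", addr),
        None => println!("initrd does not fit"),
    }
}
```

Rounding down rather than up keeps the end of the image inside the region, at the cost of leaving up to one page unused above it.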

Create the boot parameters and write them to memory:

// create and write boot_params
let mut params = boot_params::default();
// <https://www.kernel.org/doc/html/latest/x86/boot.html>
const KERNEL_TYPE_OF_LOADER: u8 = 0xff;
const KERNEL_BOOT_FLAG_MAGIC_NUMBER: u16 = 0xaa55;
const KERNEL_HDR_MAGIC_NUMBER: u32 = 0x5372_6448;
const KERNEL_MIN_ALIGNMENT_BYTES: u32 = 0x0100_0000;

params.hdr.type_of_loader = KERNEL_TYPE_OF_LOADER;
params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC_NUMBER;
params.hdr.header = KERNEL_HDR_MAGIC_NUMBER;
params.hdr.cmd_line_ptr = BOOT_CMD_START as u32;
params.hdr.cmdline_size = 1 + BOOT_CMD.len() as u32;
params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
params.hdr.ramdisk_image = initrd_addr.raw_value() as u32;
params.hdr.ramdisk_size = initrd_content.len() as u32;

// Value taken from <https://elixir.bootlin.com/linux/v5.10.68/source/arch/x86/include/uapi/asm/e820.h#L31>
const E820_RAM: u32 = 1;
const EBDA_START: u64 = 0x9fc00;
const FIRST_ADDR_PAST_32BITS: u64 = 1 << 32;
const MEM_32BIT_GAP_SIZE: u64 = 768 << 20;
const MMIO_MEM_START: u64 = FIRST_ADDR_PAST_32BITS - MEM_32BIT_GAP_SIZE;

add_e820_entry(&mut params, 0, EBDA_START, E820_RAM);
let last_addr = guest_mem.last_addr();
let first_addr_past_32bits = GuestAddress(FIRST_ADDR_PAST_32BITS);
let end_32bit_gap_start = GuestAddress(MMIO_MEM_START);
let himem_start = GuestAddress(HIMEM_START);
if last_addr < end_32bit_gap_start {
    add_e820_entry(
        &mut params,
        himem_start.raw_value() as u64,
        // it's safe to use unchecked_offset_from because
        // mem_end > himem_start
        last_addr.unchecked_offset_from(himem_start) as u64 + 1,
        E820_RAM,
    );
} else {
    add_e820_entry(
        &mut params,
        himem_start.raw_value(),
        // it's safe to use unchecked_offset_from because
        // end_32bit_gap_start > himem_start
        end_32bit_gap_start.unchecked_offset_from(himem_start),
        E820_RAM,
    );

    if last_addr > first_addr_past_32bits {
        add_e820_entry(
            &mut params,
            first_addr_past_32bits.raw_value(),
            // it's safe to use unchecked_offset_from because
            // mem_end > first_addr_past_32bits
            last_addr.unchecked_offset_from(first_addr_past_32bits) + 1,
            E820_RAM,
        );
    }
}
LinuxBootConfigurator::write_bootparams(
    &BootParams::new(&params, GuestAddress(ZERO_PAGE_START)),
    &guest_mem,
)
.unwrap();

fn add_e820_entry(params: &mut boot_params, addr: u64, size: u64, mem_type: u32) {
    if params.e820_entries >= params.e820_table.len() as u8 {
        panic!();
    }
    params.e820_table[params.e820_entries as usize].addr = addr;
    params.e820_table[params.e820_entries as usize].size = size;
    params.e820_table[params.e820_entries as usize].type_ = mem_type;
    params.e820_entries += 1;
}

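Normally the bootloader obtains the usable memory information through a BIOS interrupt (interrupt number 0x15 with AX=0xE820, which is where the e820 entry structure gets its name). Here we manually describe the usable memory as several e820 entries and pass them to the kernel.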
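The range computation can also be looked at separately from boot_params. The sketch below reuses the constants from the code above and returns plain (start, size) pairs; the function `e820_ram_ranges` is a hypothetical helper, and it assumes the guest's last address lies at or above HIMEM_START.

```rust
const EBDA_START: u64 = 0x9fc00;
const HIMEM_START: u64 = 0x10_0000;
const FIRST_ADDR_PAST_32BITS: u64 = 1 << 32;
const MEM_32BIT_GAP_SIZE: u64 = 768 << 20;
const MMIO_MEM_START: u64 = FIRST_ADDR_PAST_32BITS - MEM_32BIT_GAP_SIZE;

/// Compute the usable RAM ranges as (start, size) pairs for a guest whose
/// last valid address is `last_addr` (assumed to be >= HIMEM_START).
fn e820_ram_ranges(last_addr: u64) -> Vec<(u64, u64)> {
    assert!(last_addr >= HIMEM_START, "guest memory must reach past 1 MiB");
    // Low memory below the EBDA is always reported.
    let mut ranges = vec![(0, EBDA_START)];
    if last_addr < MMIO_MEM_START {
        // Everything from 1 MiB to the end of memory is one range.
        ranges.push((HIMEM_START, last_addr - HIMEM_START + 1));
    } else {
        // Memory below the 32-bit MMIO gap...
        ranges.push((HIMEM_START, MMIO_MEM_START - HIMEM_START));
        // ...and, if there is any, memory above 4 GiB.
        if last_addr > FIRST_ADDR_PAST_32BITS {
            ranges.push((FIRST_ADDR_PAST_32BITS, last_addr - FIRST_ADDR_PAST_32BITS + 1));
        }
    }
    ranges
}

fn main() {
    // 128 MiB guest, as configured by MEMORY_SIZE in this article.
    for (start, size) in e820_ram_ranges((128 << 20) - 1) {
        println!("[mem {:#018x}-{:#018x}] usable", start, start + size - 1);
    }
}
```

For the 128 MiB case this yields the same two ranges that the kernel later prints as its BIOS-e820 map.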

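TODO: memory layout

Creating Input/Output Devices

There are generally two kinds of device I/O: port I/O (PortIO) and memory-mapped I/O (MMIO). Here we only deal with PortIO.

PortIO has a 64K port address space. Typical addresses include (see the reference link):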

  • COM1: I/O port 0x3F8, IRQ 4
  • COM2: I/O port 0x2F8, IRQ 3
  • COM3: I/O port 0x3E8, IRQ 4
  • COM4: I/O port 0x2E8, IRQ 3

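/dev/ttyS{0/1…} in Linux corresponds to COM{1/2…}. So to get the Linux console's input and output over PortIO, we only need to handle COM1 (0x3F8, IRQ 4) and specify console=ttyS0 in the boot parameters.

Here we create an EventFd and register it on IRQ 4. When the emulated COM1 needs to raise an interrupt, for example because input is available, it writes to this EventFd and KVM injects IRQ 4 into the guest.

For the implementation, we use the emulated serial port provided by vm_superio, with this EventFd as its Trigger and stdout as its output.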

// initialize devices
let com_evt_1 = EventWrapper::new();
vm.register_irqfd(&com_evt_1.0, 4).unwrap();
let stdio_serial = Arc::new(Mutex::new(Serial::with_events(
    com_evt_1.try_clone().unwrap(),
    DummySerialEvent,
    std::io::stdout(),
)));

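To fit its interface, we need two extra structs: EventWrapper and DummySerialEvent. Their main purpose is to implement Trigger and SerialEvents. This code is not important; it only satisfies the interface constraints.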

struct EventWrapper(EventFd);

impl EventWrapper {
    pub fn new() -> Self {
        Self(EventFd::new(EFD_NONBLOCK).unwrap())
    }

    pub fn try_clone(&self) -> std::io::Result<Self> {
        self.0.try_clone().map(Self)
    }
}

impl std::ops::Deref for EventWrapper {
    type Target = EventFd;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl Trigger for EventWrapper {
    type E = std::io::Error;

    fn trigger(&self) -> std::io::Result<()> {
        self.0.write(1)
    }
}

struct DummySerialEvent;

impl SerialEvents for DummySerialEvent {
    fn buffer_read(&self) {}
    fn out_byte(&self) {}
    fn tx_lost_byte(&self) {}
    fn in_buffer_empty(&self) {}
}

When we hit VcpuExit::IoIn or VcpuExit::IoOut, we get the PortIO addr and data; after checking the address, we hand them to stdio_serial. For output, stdio_serial writes directly to stdout; input we have to handle ourselves.
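The address check can be factored into a small helper that turns a port number into a register offset of the serial device. This is only a sketch: `com1_offset` and the constant names are hypothetical, and the width of 8 registers follows the `addr - COM1 < 8` check used in this article's vCPU loop.

```rust
const COM1_BASE: u16 = 0x3f8;
const COM_REG_COUNT: u16 = 8; // COM1 occupies ports 0x3f8..=0x3ff

/// Map a PortIO address to the register offset inside COM1, or None if the
/// port does not belong to COM1.
fn com1_offset(port: u16) -> Option<u8> {
    port.checked_sub(COM1_BASE)
        .filter(|offset| *offset < COM_REG_COUNT)
        .map(|offset| offset as u8)
}

fn main() {
    for port in [0x3f8u16, 0x3fd, 0x2f8, 0x400] {
        match com1_offset(port) {
            Some(offset) => println!("port {:#x} -> COM1 register {}", port, offset),
            None => println!("port {:#x} is not handled by COM1", port),
        }
    }
}
```

Using checked_sub avoids the separate `addr >= COM1` comparison, since ports below the base simply yield None.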

Vcpu Run

As described in the previous section, we forward IoIn and IoOut events to the Serial.

// run vcpu in another thread
let exit_evt = EventWrapper::new();
let vcpu_exit_evt = exit_evt.try_clone().unwrap();
let stdio_serial_read = stdio_serial.clone();

std::thread::spawn(move || {
    loop {
        match vcpu.run() {
            Ok(run) => match run {
                VcpuExit::IoIn(addr, data) => {
                    if addr >= COM1 && addr - COM1 < 8 {
                        data[0] = stdio_serial_read.lock().unwrap().read((addr - COM1) as u8);
                    }
                }
                VcpuExit::IoOut(addr, data) => {
                    if addr >= COM1 && addr - COM1 < 8 {
                        let _ = stdio_serial_read
                            .lock()
                            .unwrap()
                            .write((addr - COM1) as u8, data[0]);
                    }
                }
                VcpuExit::MmioRead(_, _) => {}
                VcpuExit::MmioWrite(_, _) => {}
                VcpuExit::Hlt => {
                    println!("KVM_EXIT_HLT");
                    break;
                }
                VcpuExit::Shutdown => {
                    println!("KVM_EXIT_SHUTDOWN");
                    break;
                }
                r => {
                    println!("KVM_EXIT: {:?}", r);
                }
            },
            Err(e) => {
                println!("KVM Run error: {:?}", e);
                break;
            }
        }
    }
    vcpu_exit_evt.trigger().unwrap();
});

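When this thread finishes, we are notified through exit_evt, so the main thread can wait for the vCPU exit event while also waiting on stdin.

Handling Stdin

The Serial device requires us to feed it input data ourselves, and while waiting on the user's stdin we also need to wait for the vCPU to exit, so that the main thread can exit when the VM stops. As you may have guessed, we use epoll for multiplexing here (KVM is Linux-only anyway, so portability is not a concern).

We use PollContext, a wrapper provided by vmm_sys_util.

Stdin has to be put in raw mode, because we need to forward keystrokes such as ctrl+c.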

// process events
let stdin = std::io::stdin().lock();
stdin.set_raw_mode().expect("set terminal raw mode failed");

let poll: PollContext<u8> = PollContext::new().unwrap();
poll.add(&exit_evt.0, 0).unwrap();
poll.add(&stdin, 1).unwrap();
'l: loop {
    let events: PollEvents<u8> = poll.wait().unwrap();
    for event in events.iter_readable() {
        match event.token() {
            0 => {
                println!("vcpu stopped, main loop exit");
                break 'l;
            }
            1 => {
                let mut out = [0u8; 64];
                match stdin.read_raw(&mut out[..]) {
                    Ok(0) => {}
                    Ok(count) => {
                        stdio_serial
                            .lock()
                            .unwrap()
                            .enqueue_raw_bytes(&out[..count])
                            .expect("enqueue bytes failed");
                    }
                    Err(e) => {
                        println!("error while reading stdin: {:?}", e);
                    }
                }
            }
            _ => unreachable!(),
        }
    }
}

Complete Code

use std::{
    fs::File,
    io::Cursor,
    sync::{Arc, Mutex},
};

use kvm_bindings::{
    kvm_pit_config, kvm_segment, kvm_userspace_memory_region, KVM_MAX_CPUID_ENTRIES,
    KVM_MEM_LOG_DIRTY_PAGES, KVM_PIT_SPEAKER_DUMMY,
};
use kvm_ioctls::{Kvm, VcpuExit};
use linux_loader::{
    bootparam::boot_params,
    configurator::{linux::LinuxBootConfigurator, BootConfigurator, BootParams},
    loader::{elf::Elf, load_cmdline, Cmdline, KernelLoader},
};
use vm_memory::{Address, Bytes, GuestAddress, GuestMemory, GuestMemoryMmap};
use vm_superio::{serial::SerialEvents, Serial, Trigger};
use vmm_sys_util::{
    eventfd::{EventFd, EFD_NONBLOCK},
    poll::{PollContext, PollEvents},
    terminal::Terminal,
};

const MEMORY_SIZE: usize = 128 << 20;

const KVM_TSS_ADDRESS: usize = 0xfffb_d000;
const X86_CR0_PE: u64 = 0x1;
const X86_CR4_PAE: u64 = 0x20;
const X86_CR0_PG: u64 = 0x80000000;
const BOOT_GDT_OFFSET: u64 = 0x500;
const EFER_LME: u64 = 0x100;
const EFER_LMA: u64 = 0x400;

const HIMEM_START: u64 = 0x100000;
const BOOT_CMD_START: u64 = 0x20000;
const BOOT_STACK_POINTER: u64 = 0x8ff0;
const ZERO_PAGE_START: u64 = 0x7000;

const KERNEL_PATH: &str = "/tmp/mini-kvm/vmlinux.bin";
const INITRD_PATH: &str = "/tmp/mini-kvm/initrd.img";
const BOOT_CMD: &str = "console=ttyS0 noapic noacpi reboot=k panic=1 pci=off nomodule";
fn main() {
    // create vm
    let kvm = Kvm::new().expect("open kvm device failed");
    let vm = kvm.create_vm().expect("create vm failed");

    // initialize irq chip and pit
    vm.create_irq_chip().unwrap();
    let pit_config = kvm_pit_config {
        flags: KVM_PIT_SPEAKER_DUMMY,
        ..Default::default()
    };
    vm.create_pit2(pit_config).unwrap();

    // create memory
    let guest_addr = GuestAddress(0x0);
    let guest_mem = GuestMemoryMmap::<()>::from_ranges(&[(guest_addr, MEMORY_SIZE)]).unwrap();
    let host_addr = guest_mem.get_host_address(guest_addr).unwrap();
    let mem_region = kvm_userspace_memory_region {
        slot: 0,
        guest_phys_addr: 0,
        memory_size: MEMORY_SIZE as u64,
        userspace_addr: host_addr as u64,
        flags: KVM_MEM_LOG_DIRTY_PAGES,
    };
    unsafe {
        vm.set_user_memory_region(mem_region)
            .expect("set user memory region failed")
    };
    vm.set_tss_address(KVM_TSS_ADDRESS as usize)
        .expect("set tss failed");

    // create vcpu and set cpuid
    let vcpu = vm.create_vcpu(0).expect("create vcpu failed");
    let kvm_cpuid = kvm.get_supported_cpuid(KVM_MAX_CPUID_ENTRIES).unwrap();
    vcpu.set_cpuid2(&kvm_cpuid).unwrap();

    // load linux kernel
    let mut kernel_file = File::open(KERNEL_PATH).expect("open kernel file failed");
    let kernel_entry = Elf::load(
        &guest_mem,
        None,
        &mut kernel_file,
        Some(GuestAddress(HIMEM_START)),
    )
    .unwrap()
    .kernel_load;

    // load initrd
    let initrd_content = std::fs::read(INITRD_PATH).expect("read initrd file failed");
    let first_region = guest_mem.find_region(GuestAddress::new(0)).unwrap();
    assert!(
        initrd_content.len() <= first_region.size(),
        "too big initrd"
    );
    let initrd_addr =
        GuestAddress((first_region.size() - initrd_content.len()) as u64 & !(4096 - 1));
    guest_mem
        .read_from(
            initrd_addr,
            &mut Cursor::new(&initrd_content),
            initrd_content.len(),
        )
        .unwrap();

    // load boot command
    let mut boot_cmdline = Cmdline::new(0x10000);
    boot_cmdline.insert_str(BOOT_CMD).unwrap();
    load_cmdline(&guest_mem, GuestAddress(BOOT_CMD_START), &boot_cmdline).unwrap();

    // set regs
    let mut regs = vcpu.get_regs().unwrap();
    regs.rip = kernel_entry.raw_value();
    regs.rsp = BOOT_STACK_POINTER;
    regs.rbp = BOOT_STACK_POINTER;
    regs.rsi = ZERO_PAGE_START;
    regs.rflags = 2;
    vcpu.set_regs(&regs).unwrap();

    // set sregs
    let mut sregs = vcpu.get_sregs().unwrap();
    const CODE_SEG: kvm_segment = seg_with_st(1, 0b1011);
    const DATA_SEG: kvm_segment = seg_with_st(2, 0b0011);

    // construct kvm_segment and set to segment registers
    sregs.cs = CODE_SEG;
    sregs.ds = DATA_SEG;
    sregs.es = DATA_SEG;
    sregs.fs = DATA_SEG;
    sregs.gs = DATA_SEG;
    sregs.ss = DATA_SEG;

    // construct gdt table, write to memory and set it to register
    let gdt_table: [u64; 3] = [
        0,                       // NULL
        to_gdt_entry(&CODE_SEG), // CODE
        to_gdt_entry(&DATA_SEG), // DATA
    ];
    let boot_gdt_addr = GuestAddress(BOOT_GDT_OFFSET);
    for (index, entry) in gdt_table.iter().enumerate() {
        let addr = guest_mem
            .checked_offset(boot_gdt_addr, index * std::mem::size_of::<u64>())
            .unwrap();
        guest_mem.write_obj(*entry, addr).unwrap();
    }
    sregs.gdt.base = BOOT_GDT_OFFSET;
    sregs.gdt.limit = std::mem::size_of_val(&gdt_table) as u16 - 1;

    // enable protected mode
    sregs.cr0 |= X86_CR0_PE;

    // set page table
    let boot_pml4_addr = GuestAddress(0xa000);
    let boot_pdpte_addr = GuestAddress(0xb000);
    let boot_pde_addr = GuestAddress(0xc000);

    guest_mem
        .write_slice(
            &(boot_pdpte_addr.raw_value() as u64 | 0b11).to_le_bytes(),
            boot_pml4_addr,
        )
        .unwrap();
    guest_mem
        .write_slice(
            &(boot_pde_addr.raw_value() as u64 | 0b11).to_le_bytes(),
            boot_pdpte_addr,
        )
        .unwrap();

    for i in 0..512 {
        guest_mem
            .write_slice(
                &((i << 21) | 0b10000011u64).to_le_bytes(),
                boot_pde_addr.unchecked_add(i * 8),
            )
            .unwrap();
    }
    sregs.cr3 = boot_pml4_addr.raw_value() as u64;
    sregs.cr4 |= X86_CR4_PAE;
    sregs.cr0 |= X86_CR0_PG;
    sregs.efer |= EFER_LMA | EFER_LME;
    vcpu.set_sregs(&sregs).unwrap();

    // create and write boot_params
    let mut params = boot_params::default();
    // <https://www.kernel.org/doc/html/latest/x86/boot.html>
    const KERNEL_TYPE_OF_LOADER: u8 = 0xff;
    const KERNEL_BOOT_FLAG_MAGIC_NUMBER: u16 = 0xaa55;
    const KERNEL_HDR_MAGIC_NUMBER: u32 = 0x5372_6448;
    const KERNEL_MIN_ALIGNMENT_BYTES: u32 = 0x0100_0000;

    params.hdr.type_of_loader = KERNEL_TYPE_OF_LOADER;
    params.hdr.boot_flag = KERNEL_BOOT_FLAG_MAGIC_NUMBER;
    params.hdr.header = KERNEL_HDR_MAGIC_NUMBER;
    params.hdr.cmd_line_ptr = BOOT_CMD_START as u32;
    params.hdr.cmdline_size = 1 + BOOT_CMD.len() as u32;
    params.hdr.kernel_alignment = KERNEL_MIN_ALIGNMENT_BYTES;
    params.hdr.ramdisk_image = initrd_addr.raw_value() as u32;
    params.hdr.ramdisk_size = initrd_content.len() as u32;

    // Value taken from <https://elixir.bootlin.com/linux/v5.10.68/source/arch/x86/include/uapi/asm/e820.h#L31>
    const E820_RAM: u32 = 1;
    const EBDA_START: u64 = 0x9fc00;
    const FIRST_ADDR_PAST_32BITS: u64 = 1 << 32;
    const MEM_32BIT_GAP_SIZE: u64 = 768 << 20;
    const MMIO_MEM_START: u64 = FIRST_ADDR_PAST_32BITS - MEM_32BIT_GAP_SIZE;

    add_e820_entry(&mut params, 0, EBDA_START, E820_RAM);
    let last_addr = guest_mem.last_addr();
    let first_addr_past_32bits = GuestAddress(FIRST_ADDR_PAST_32BITS);
    let end_32bit_gap_start = GuestAddress(MMIO_MEM_START);
    let himem_start = GuestAddress(HIMEM_START);
    if last_addr < end_32bit_gap_start {
        add_e820_entry(
            &mut params,
            himem_start.raw_value() as u64,
            // it's safe to use unchecked_offset_from because
            // mem_end > himem_start
            last_addr.unchecked_offset_from(himem_start) as u64 + 1,
            E820_RAM,
        );
    } else {
        add_e820_entry(
            &mut params,
            himem_start.raw_value(),
            // it's safe to use unchecked_offset_from because
            // end_32bit_gap_start > himem_start
            end_32bit_gap_start.unchecked_offset_from(himem_start),
            E820_RAM,
        );

        if last_addr > first_addr_past_32bits {
            add_e820_entry(
                &mut params,
                first_addr_past_32bits.raw_value(),
                // it's safe to use unchecked_offset_from because
                // mem_end > first_addr_past_32bits
                last_addr.unchecked_offset_from(first_addr_past_32bits) + 1,
                E820_RAM,
            );
        }
    }
    LinuxBootConfigurator::write_bootparams(
        &BootParams::new(&params, GuestAddress(ZERO_PAGE_START)),
        &guest_mem,
    )
    .unwrap();

    // initialize devices
    const COM1: u16 = 0x3f8;
    let com_evt_1 = EventWrapper::new();
    vm.register_irqfd(&com_evt_1.0, 4).unwrap();
    let stdio_serial = Arc::new(Mutex::new(Serial::with_events(
        com_evt_1.try_clone().unwrap(),
        DummySerialEvent,
        std::io::stdout(),
    )));

    // run vcpu in another thread
    let exit_evt = EventWrapper::new();
    let vcpu_exit_evt = exit_evt.try_clone().unwrap();
    let stdio_serial_read = stdio_serial.clone();
    std::thread::spawn(move || {
        loop {
            match vcpu.run() {
                Ok(run) => match run {
                    VcpuExit::IoIn(addr, data) => {
                        if addr >= COM1 && addr - COM1 < 8 {
                            data[0] = stdio_serial_read.lock().unwrap().read((addr - COM1) as u8);
                        }
                    }
                    VcpuExit::IoOut(addr, data) => {
                        if addr >= COM1 && addr - COM1 < 8 {
                            let _ = stdio_serial_read
                                .lock()
                                .unwrap()
                                .write((addr - COM1) as u8, data[0]);
                        }
                    }
                    VcpuExit::MmioRead(_, _) => {}
                    VcpuExit::MmioWrite(_, _) => {}
                    VcpuExit::Hlt => {
                        println!("KVM_EXIT_HLT");
                        break;
                    }
                    VcpuExit::Shutdown => {
                        println!("KVM_EXIT_SHUTDOWN");
                        break;
                    }
                    r => {
                        println!("KVM_EXIT: {:?}", r);
                    }
                },
                Err(e) => {
                    println!("KVM Run error: {:?}", e);
                    break;
                }
            }
        }
        vcpu_exit_evt.trigger().unwrap();
    });

    // process events
    let stdin = std::io::stdin().lock();
    stdin.set_raw_mode().expect("set terminal raw mode failed");

    let poll: PollContext<u8> = PollContext::new().unwrap();
    poll.add(&exit_evt.0, 0).unwrap();
    poll.add(&stdin, 1).unwrap();
    'l: loop {
        let events: PollEvents<u8> = poll.wait().unwrap();
        for event in events.iter_readable() {
            match event.token() {
                0 => {
                    println!("vcpu stopped, main loop exit");
                    break 'l;
                }
                1 => {
                    let mut out = [0u8; 64];
                    match stdin.read_raw(&mut out[..]) {
                        Ok(0) => {}
                        Ok(count) => {
                            stdio_serial
                                .lock()
                                .unwrap()
                                .enqueue_raw_bytes(&out[..count])
                                .expect("enqueue bytes failed");
                        }
                        Err(e) => {
                            println!("error while reading stdin: {:?}", e);
                        }
                    }
                }
                _ => unreachable!(),
            }
        }
    }
}

const fn seg_with_st(selector_index: u16, type_: u8) -> kvm_segment {
    kvm_segment {
        base: 0,
        limit: 0x000fffff,
        selector: selector_index << 3,
        // 0b1011: Code, Executed/Read, accessed
        // 0b0011: Data, Read/Write, accessed
        type_,
        present: 1,
        dpl: 0,
        // If L-bit is set, then D-bit must be cleared.
        db: 0,
        s: 1,
        l: 1,
        g: 1,
        avl: 0,
        unusable: 0,
        padding: 0,
    }
}

// Ref: <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html> 3-10 Vol. 3A
const fn to_gdt_entry(seg: &kvm_segment) -> u64 {
    let base = seg.base;
    let limit = seg.limit as u64;
    // flags: G, DB, L, AVL
    let flags = (seg.g as u64 & 0x1) << 3
        | (seg.db as u64 & 0x1) << 2
        | (seg.l as u64 & 0x1) << 1
        | (seg.avl as u64 & 0x1);
    // access: P, DPL, S, Type
    // DPL is a 2-bit field and Type a 4-bit field, so the masks are binary
    // (0b11 and 0b1111), not hexadecimal.
    let access = (seg.present as u64 & 0x1) << 7
        | (seg.dpl as u64 & 0b11) << 5
        | (seg.s as u64 & 0x1) << 4
        | (seg.type_ as u64 & 0b1111);
    ((base & 0xff00_0000u64) << 32)
        | ((base & 0x00ff_ffffu64) << 16)
        | (limit & 0x0000_ffffu64)
        | ((limit & 0x000f_0000u64) << 32)
        | (flags << 52)
        | (access << 40)
}

fn add_e820_entry(params: &mut boot_params, addr: u64, size: u64, mem_type: u32) {
    if params.e820_entries >= params.e820_table.len() as u8 {
        panic!();
    }
    params.e820_table[params.e820_entries as usize].addr = addr;
    params.e820_table[params.e820_entries as usize].size = size;
    params.e820_table[params.e820_entries as usize].type_ = mem_type;
    params.e820_entries += 1;
}

struct EventWrapper(EventFd);

impl EventWrapper {
    pub fn new() -> Self {
        Self(EventFd::new(EFD_NONBLOCK).unwrap())
    }

    pub fn try_clone(&self) -> std::io::Result<Self> {
        self.0.try_clone().map(Self)
    }
}

impl std::ops::Deref for EventWrapper {
    type Target = EventFd;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl Trigger for EventWrapper {
    type E = std::io::Error;

    fn trigger(&self) -> std::io::Result<()> {
        self.0.write(1)
    }
}

struct DummySerialEvent;

impl SerialEvents for DummySerialEvent {
    fn buffer_read(&self) {}
    fn out_byte(&self) {}
    fn tx_lost_byte(&self) {}
    fn in_buffer_empty(&self) {}
}

Running it:

[    0.000000] Linux version 4.14.174 (@57edebb99db7) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #2 SMP Wed Jul 14 11:47:24 UTC 2021
[ 0.000000] Command line: console=ttyS0 noapic noacpi reboot=k panic=1 pci=off nomodule
[ 0.000000] Disabled fast string operations
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64
[ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64
[ 0.000000] x86/fpu: xstate_offset[5]: 960, xstate_sizes[5]: 64
[ 0.000000] x86/fpu: xstate_offset[6]: 1024, xstate_sizes[6]: 512
[ 0.000000] x86/fpu: xstate_offset[7]: 1536, xstate_sizes[7]: 1024
[ 0.000000] x86/fpu: xstate_offset[9]: 2560, xstate_sizes[9]: 8
[ 0.000000] x86/fpu: Enabled xstate features 0x2ff, context size is 2568 bytes, using 'compacted' format.
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000007ffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] tsc: Unable to calibrate against PIT
[ 0.000000] tsc: No reference (HPET/PMTIMER) available
[ 0.000000] e820: last_pfn = 0x8000 max_arch_pfn = 0x400000000
[ 0.000000] MTRR: Disabled
[ 0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.000000] CPU MTRRs all blank - virtualized system.
[ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] RAMDISK: [mem 0x06525000-0x07ffffff]
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000007ffffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x06503000-0x06524fff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x0000000007ffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x0000000007ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x0000000007ffffff]
[ 0.000000] smpboot: Boot CPU (id 0) not listed by BIOS
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x000fffff]
[ 0.000000] e820: [mem 0x08000000-0xffffffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on bare hardware
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x486 with crng_init=0
[ 0.000000] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] percpu: Embedded 41 pages/cpu s128600 r8192 d31144 u2097152
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 32137
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: console=ttyS0 noapic noacpi reboot=k panic=1 pci=off nomodule
[ 0.000000] PID hash table entries: 512 (order: 0, 4096 bytes)
[ 0.000000] Memory: 83524K/130680K available (8204K kernel code, 645K rwdata, 1480K rodata, 1324K init, 2792K bss, 47156K reserved, 0K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Kernel/User page tables isolation: enabled
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 4352, nr_irqs: 24, preallocated irqs: 16
[ 0.000000] Console: colour dummy device 80x25
[ 0.000000] console [ttyS0] enabled
[ 0.024000] tsc: Unable to calibrate against PIT
[ 0.028000] tsc: No reference (HPET/PMTIMER) available
[ 0.032000] tsc: Marking TSC unstable due to could not calculate TSC khz
[ 0.040000] Calibrating delay loop... 5951.48 BogoMIPS (lpj=11902976)
[ 0.088000] pid_max: default: 32768 minimum: 301
[ 0.092000] Security Framework initialized
[ 0.096000] SELinux: Initializing.
[ 0.100000] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.108000] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.112000] Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.120000] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.132000] Disabled fast string operations
[ 0.140000] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.148000] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 0.156000] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.164000] Spectre V2 : Mitigation: Full generic retpoline
[ 0.168000] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 0.176000] Spectre V2 : Enabling Restricted Speculation for firmware calls
[ 0.184000] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.188000] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.196000] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 0.248000] Freeing SMP alternatives memory: 28K
[ 0.268000] smpboot: Max logical packages: 1
[ 0.272000] smpboot: SMP motherboard not detected
[ 0.276000] smpboot: SMP disabled
[ 0.276000] Not enabling interrupt remapping due to skipped IO-APIC setup
[ 0.500000] Performance Events: Skylake events, Intel PMU driver.
[ 0.504000] ... version: 2
[ 0.508000] ... bit width: 48
[ 0.512000] ... generic registers: 4
[ 0.516000] ... value mask: 0000ffffffffffff
[ 0.520000] ... max period: 000000007fffffff
[ 0.524000] ... fixed-purpose events: 3
[ 0.528000] ... event mask: 000000070000000f
[ 0.536000] Hierarchical SRCU implementation.
[ 0.544000] smp: Bringing up secondary CPUs ...
[ 0.548000] smp: Brought up 1 node, 1 CPU
[ 0.552000] smpboot: Total of 1 processors activated (5951.48 BogoMIPS)
[ 0.560000] devtmpfs: initialized
[ 0.564000] x86/mm: Memory block size: 128MB
[ 0.572000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.576000] futex hash table entries: 256 (order: 2, 16384 bytes)
[ 0.588000] NET: Registered protocol family 16
[ 0.596000] cpuidle: using governor ladder
[ 0.596000] cpuidle: using governor menu
[ 0.640000] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[ 0.644000] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.652000] SCSI subsystem initialized
[ 0.656000] pps_core: LinuxPPS API ver. 1 registered
[ 0.660000] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 0.664000] PTP clock support registered
[ 0.664000] dmi: Firmware registration failed.
[ 0.672000] NetLabel: Initializing
[ 0.672000] NetLabel: domain hash size = 128
[ 0.676000] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 0.680000] NetLabel: unlabeled traffic allowed by default
[ 0.684000] clocksource: Switched to clocksource refined-jiffies
[ 0.688000] VFS: Disk quotas dquot_6.6.0
[ 0.692000] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.708001] NET: Registered protocol family 2
[ 0.712001] TCP established hash table entries: 1024 (order: 1, 8192 bytes)
[ 0.716002] TCP bind hash table entries: 1024 (order: 2, 16384 bytes)
[ 0.720002] TCP: Hash tables configured (established 1024 bind 1024)
[ 0.724002] UDP hash table entries: 256 (order: 1, 8192 bytes)
[ 0.728002] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[ 0.732003] NET: Registered protocol family 1
[ 0.736003] Unpacking initramfs...
[ 1.228034] Freeing initrd memory: 27500K
[ 1.232034] platform rtc_cmos: registered platform RTC device (no PNP device found)
[ 1.236034] Scanning for low memory corruption every 60 seconds
[ 1.240034] audit: initializing netlink subsys (disabled)
[ 1.244035] Initialise system trusted keyrings
[ 1.248035] Key type blacklist registered
[ 1.252035] audit: type=2000 audit(943920001.244:1): state=initialized audit_enabled=0 res=1
[ 1.256035] workingset: timestamp_bits=36 max_order=15 bucket_order=0
[ 1.264036] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 1.272036] Key type asymmetric registered
[ 1.276037] Asymmetric key parser 'x509' registered
[ 1.280037] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[ 1.288037] io scheduler noop registered (default)
[ 1.292038] io scheduler cfq registered
[ 1.296038] Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled
[ 1.304038] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a U6_16550A
[ 1.312039] loop: module loaded
[ 1.316039] Loading iSCSI transport class v2.0-870.
[ 1.320039] iscsi: registered transport (tcp)
[ 1.324040] tun: Universal TUN/TAP device driver, 1.6
[ 1.336040] i8042: Can't read CTR while initializing i8042
[ 1.340041] i8042: probe of i8042 failed with error -5
[ 1.344041] hidraw: raw HID events driver (C) Jiri Kosina
[ 1.348041] nf_conntrack version 0.5.0 (1024 buckets, 4096 max)
[ 1.356042] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 1.360042] Initializing XFRM netlink socket
[ 1.364042] NET: Registered protocol family 10
[ 1.372043] Segment Routing with IPv6
[ 1.376043] NET: Registered protocol family 17
[ 1.380043] Bridge firewalling registered
[ 1.384043] NET: Registered protocol family 40
[ 1.388044] registered taskstats version 1
[ 1.392044] Loading compiled-in X.509 certificates
[ 1.396044] Loaded X.509 cert 'Build time autogenerated kernel key: e98e9d271da5d0a322cc4d7bfaa8c2c4c3e46010'
[ 1.404045] Key type encrypted registered
[ 1.416045] Freeing unused kernel memory: 1324K
[ 1.424046] Write protecting the kernel read-only data: 12288k
[ 1.440047] Freeing unused kernel memory: 2016K
[ 1.452048] Freeing unused kernel memory: 568K

OpenRC 0.44.10 is starting up Linux 4.14.174 (x86_64)

* Mounting /proc ... [ ok ]
* Mounting /run ... * /run/openrc: creating directory
* /run/lock: creating directory
* /run/lock: correcting owner
* Caching service dependencies ... [ ok ]
* Clock skew detected with `(null)'
* Adjusting mtime of `/run/openrc/deptree' to Fri Sep 23 07:15:15 2022

* WARNING: clock skew detected!
* WARNING: clock skew detected!
* Mounting devtmpfs on /dev ... [ ok ]
* Mounting /dev/mqueue ... [ ok ]
* Mounting /dev/pts ... [ ok ]
* Mounting /dev/shm ... [ ok ]
* Loading modules ...modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
[ ok ]
* Mounting misc binary format filesystem ... [ ok ]
* Mounting /sys ... [ ok ]
* Mounting security filesystem ... [ ok ]
* Mounting debug filesystem ... [ ok ]
* Mounting SELinux filesystem ... [ ok ]
* Mounting persistent storage (pstore) filesystem ... [ ok ]
* WARNING: clock skew detected!
* Starting fcnet ... [ ok ]
* Checking local filesystems ... [ ok ]
* Remounting filesystems ... [ ok ]
* Mounting local filesystems ... [ ok ]
* Setting hostname ... [ ok ]
* Starting networking ... * eth0 ...Cannot find device "eth0"
Device "eth0" does not exist.
[ ok ]
* Starting networking ... * lo ... [ ok ]
* eth0 ... [ ok ]

Welcome to Alpine Linux 3.16
Kernel 4.14.174 on an x86_64 (ttyS0)

[ 2.744128] random: fast init done
localhost login: root
Password:
Welcome to Alpine!

The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org/>.

You can setup the system with the command: setup-alpine

You may change this message by editing /etc/motd.

login[1080]: root login on 'ttyS0'
localhost:~# pwd
/root
localhost:~# reboot -f
[ 15.780943] reboot: Restarting system
[ 15.780943] reboot: machine restart
KVM_EXIT_SHUTDOWN
vcpu stopped, main loop exit
