faicker's personal blog

Focused on cloud computing and network virtualization



Tutorial: using TRex with NAT support

Posted on 2017-10-09 | Categories: trex, nat

Build and install

OS: CentOS 7
Downloads:

  1. Official release: wget https://github.com/cisco-system-traffic-generator/trex-core/archive/v2.29.tar.gz
  2. Fork with NVGRE support: https://github.com/lxu4net/trex-core/tree/feature_nvgre

Build TRex:

cd linux_dpdk
./b configure (only once)
./b build

The executables are under the scripts directory.

NAT support

Modify the source code at around line 3455 of src/bp_sim.h. The original code is:

ipv4->updateIpSrc(node->m_dest_ip);
ipv4->updateIpDst(node->m_src_ip);

SNAT modification (the high 16 bits after SNAT are hard-coded to 2.2.):

ipv4->updateIpSrc(node->m_dest_ip);
ipv4->updateIpDst((node->m_src_ip & 0x0000FFFF) + 0x02020000);

DNAT modification (the high 16 bits after DNAT are hard-coded to 10.0.):

ipv4->updateIpSrc((node->m_dest_ip & 0x0000FFFF) + 0x0A000000);
ipv4->updateIpDst(node->m_src_ip);

Finally, rebuild.
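The masking arithmetic in the patch can be sanity-checked on its own. Below is a minimal Python sketch (the helper name rewrite_prefix is mine, not TRex code) showing that keeping the low 16 bits and adding a /16 prefix maps 2.2.x.y to 10.0.x.y and back:

```python
import ipaddress

def rewrite_prefix(ip_str, prefix_str):
    # Keep the low 16 bits of the address and replace the high 16 bits
    # with the given /16 prefix, mirroring (addr & 0x0000FFFF) + prefix
    # in the patched bp_sim.h.
    ip = int(ipaddress.IPv4Address(ip_str))
    prefix = int(ipaddress.IPv4Address(prefix_str))
    return str(ipaddress.IPv4Address((ip & 0x0000FFFF) + prefix))

# DNAT direction: an EIP in 2.2.0.0/16 maps into 10.0.0.0/16
print(rewrite_prefix("2.2.1.7", "10.0.0.0"))   # 10.0.1.7
# SNAT direction: an internal 10.0.0.0/16 address maps back into 2.2.0.0/16
print(rewrite_prefix("10.0.1.7", "2.2.0.0"))   # 2.2.1.7
```

Here 0x0A000000 is 10.0.0.0 and 0x02020000 is 2.2.0.0, which is why the patch hard-codes those constants.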

TRex configuration files

Taking the DNAT test as an example:

  • Port configuration file, /etc/trex_cfg_dnat.yaml

    - port_limit : 2
      version : 2
      port_bandwidth_gb : 10
      interfaces : ["06:00.0","06:00.1"]
      port_info :
        - ip : 192.168.1.2
          default_gw : 192.168.1.15
          tunnels :
            type : 'nvgre'
            dl_dst : '68:05:CA:34:90:D4'
            dl_src : '6C:92:BF:27:3D:72'
            tun_id : 6057468
            tun_dst: '192.168.1.15'
            tun_src: [ 192.168.2.1,192.168.2.2,192.168.2.3,192.168.2.4,192.168.2.5,192.168.2.6,192.168.2.7,192.168.2.8,192.168.2.9,192.168.2.10,192.168.2.11,192.168.2.12,192.168.2.13,192.168.2.14,192.168.2.15,192.168.2.16,192.168.2.17,192.168.2.18,192.168.2.19,192.168.2.20,192.168.2.21,192.168.2.22,192.168.2.23,192.168.2.24,192.168.2.25,192.168.2.26,192.168.2.27,192.168.2.28,192.168.2.29,192.168.2.30,192.168.2.31,192.168.2.32,192.168.2.33,192.168.2.34,192.168.2.35,192.168.2.36,192.168.2.37,192.168.2.38,192.168.2.39,192.168.2.40,192.168.2.41,192.168.2.42,192.168.2.43,192.168.2.44,192.168.2.45,192.168.2.46,192.168.2.47,192.168.2.48,192.168.2.49,192.168.2.50,192.168.2.51,192.168.2.52,192.168.2.53,192.168.2.54,192.168.2.55,192.168.2.56,192.168.2.57,192.168.2.58,192.168.2.59,192.168.2.60,192.168.2.61,192.168.2.62,192.168.2.63,192.168.2.64,192.168.2.65,192.168.2.66,192.168.2.67,192.168.2.68,192.168.2.69,192.168.2.70,192.168.2.71,192.168.2.72,192.168.2.73,192.168.2.74,192.168.2.75,192.168.2.76,192.168.2.77,192.168.2.78,192.168.2.79,192.168.2.80,192.168.2.81,192.168.2.82,192.168.2.83,192.168.2.84,192.168.2.85,192.168.2.86,192.168.2.87,192.168.2.88,192.168.2.89,192.168.2.90,192.168.2.91,192.168.2.92,192.168.2.93,192.168.2.94,192.168.2.95,192.168.2.96,192.168.2.97,192.168.2.98,192.168.2.99,192.168.2.100,192.168.2.101,192.168.2.102,192.168.2.103,192.168.2.104,192.168.2.105,192.168.2.106,192.168.2.107,192.168.2.108,192.168.2.109,192.168.2.110,192.168.2.111,192.168.2.112,192.168.2.113,192.168.2.114,192.168.2.115,192.168.2.116,192.168.2.117,192.168.2.118,192.168.2.119,192.168.2.120,192.168.2.121,192.168.2.122,192.168.2.123,192.168.2.124,192.168.2.125,192.168.2.126,192.168.2.127,192.168.2.128 ]
        - ip : 192.168.1.3
          default_gw : 192.168.1.15
          tunnels :
            type : 'nvgre'
            dl_dst : '68:05:CA:34:90:D4'
            dl_src : '6C:92:BF:27:3D:73'
            tun_id : 1
            tun_dst: '192.168.1.15'
            tun_src: [ 192.168.3.1,192.168.3.2,192.168.3.3,192.168.3.4,192.168.3.5,192.168.3.6,192.168.3.7,192.168.3.8,192.168.3.9,192.168.3.10,192.168.3.11,192.168.3.12,192.168.3.13,192.168.3.14,192.168.3.15,192.168.3.16,192.168.3.17,192.168.3.18,192.168.3.19,192.168.3.20,192.168.3.21,192.168.3.22,192.168.3.23,192.168.3.24,192.168.3.25,192.168.3.26,192.168.3.27,192.168.3.28,192.168.3.29,192.168.3.30,192.168.3.31,192.168.3.32,192.168.3.33,192.168.3.34,192.168.3.35,192.168.3.36,192.168.3.37,192.168.3.38,192.168.3.39,192.168.3.40,192.168.3.41,192.168.3.42,192.168.3.43,192.168.3.44,192.168.3.45,192.168.3.46,192.168.3.47,192.168.3.48,192.168.3.49,192.168.3.50,192.168.3.51,192.168.3.52,192.168.3.53,192.168.3.54,192.168.3.55,192.168.3.56,192.168.3.57,192.168.3.58,192.168.3.59,192.168.3.60,192.168.3.61,192.168.3.62,192.168.3.63,192.168.3.64,192.168.3.65,192.168.3.66,192.168.3.67,192.168.3.68,192.168.3.69,192.168.3.70,192.168.3.71,192.168.3.72,192.168.3.73,192.168.3.74,192.168.3.75,192.168.3.76,192.168.3.77,192.168.3.78,192.168.3.79,192.168.3.80,192.168.3.81,192.168.3.82,192.168.3.83,192.168.3.84,192.168.3.85,192.168.3.86,192.168.3.87,192.168.3.88,192.168.3.89,192.168.3.90,192.168.3.91,192.168.3.92,192.168.3.93,192.168.3.94,192.168.3.95,192.168.3.96,192.168.3.97,192.168.3.98,192.168.3.99,192.168.3.100,192.168.3.101,192.168.3.102,192.168.3.103,192.168.3.104,192.168.3.105,192.168.3.106,192.168.3.107,192.168.3.108,192.168.3.109,192.168.3.110,192.168.3.111,192.168.3.112,192.168.3.113,192.168.3.114,192.168.3.115,192.168.3.116,192.168.3.117,192.168.3.118,192.168.3.119,192.168.3.120,192.168.3.121,192.168.3.122,192.168.3.123,192.168.3.124,192.168.3.125,192.168.3.126,192.168.3.127,192.168.3.128 ]

Here, 192.168.1.15 is the IP address of the NAT host and 68:05:CA:34:90:D4 is its MAC address.
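The 128-entry tun_src lists above are tedious to maintain by hand; a throwaway Python snippet (the helper name is made up) can generate them:

```python
def tun_src_list(prefix, count=128):
    # Build ["192.168.2.1", ..., "192.168.2.128"]-style address lists
    # for pasting into the tun_src field.
    return ["%s.%d" % (prefix, i) for i in range(1, count + 1)]

ips = tun_src_list("192.168.2")
print(ips[0], ips[-1], len(ips))  # 192.168.2.1 192.168.2.128 128
```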

  • Traffic profile, dnat.yaml

    - duration : 10.0
      generator :
        distribution : "seq"
        clients_start : "15.0.0.1"
        clients_end : "15.10.255.255"
        servers_start : "2.2.1.0"
        servers_end : "2.2.2.255"
        clients_per_gb : 201
        min_clients : 101
        dual_port_mask : "1.0.0.0"
        tcp_aging : 0
        udp_aging : 0
      cap_info :
        - name: 2.pcap
          cps: 20000.0
          ipg : 10000
          rtt : 10000
          w : 1

Here, the 15.x addresses emulate source addresses coming in from the Internet, 2.2. emulates the cloud users' EIP range, and 10.0. emulates the users' internal address range. 2.pcap is cap2/dns.pcap; adjusting the number of request/response exchanges in this pcap is a way to tune the pps.

NAT host configuration

  1. Set up the NAT host environment.
  2. Add static ARP entries.

    arp -i xxx -s 192.168.1.2 6C:92:BF:27:3D:72
    arp -i xxx -s 192.168.1.3 6C:92:BF:27:3D:73
  3. Add static routes.

    ip route add 192.168.2.0/24 via 192.168.1.2
    ip route add 192.168.3.0/24 via 192.168.1.3

Run TRex

./t-rex-64 --checksum-offload-disable --cfg /etc/trex_cfg_dnat.yaml -m 1 -c 4 -f dnat.yaml -d 60

Debugging

  1. With a large number of concurrent flows, increase the hugepages count in the trex config; the default is 2048.
  2. Set cps in dnat.yaml to 0.01 and name to cap2/dns.pcap, then capture packets on the NAT host to verify traffic in both directions.
  3. Once traffic looks right, check Total-pkt-drop after the TRex run for packet loss, then increase the load. The script dropstat.sh helps locate where packets are being dropped on the NAT host.

OVN learning (5): conntrack

Posted on 2017-09-02 | Categories: ovn, openvswitch

conntrack definitions

ctstate: INVALID, NEW, ESTABLISHED, RELATED, UNTRACKED, SNAT, DNAT
ctstatus: NONE, EXPECTED, SEEN_REPLY, ASSURED, CONFIRMED
ctdir: ORIGINAL, REPLY

Notes:

  1. See CONNECTION TRACKING FIELDS in ovs-fields.
  2. CONFIRMED: an entry becomes confirmed once the packet leaves the system.

tcp in conntrack

The TCP states are:
NONE | SYN_SENT | SYN_RECV | ESTABLISHED | FIN_WAIT | CLOSE_WAIT | LAST_ACK | TIME_WAIT | CLOSE | LISTEN

State meanings:

  • NONE: initial state
  • SYN_SENT: SYN-only packet seen
  • SYN_SENT2: SYN-only packet seen from reply dir, simultaneous open
  • SYN_RECV: SYN-ACK packet seen
  • ESTABLISHED: ACK packet seen
  • FIN_WAIT: FIN packet seen
  • CLOSE_WAIT: ACK seen (after FIN)
  • LAST_ACK: FIN seen (after FIN)
  • TIME_WAIT: last ACK seen
  • CLOSE: closed connection (RST)
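The handshake and teardown transitions listed above can be sketched as a tiny table-driven state machine. This is only an illustration of the original-direction transitions described here, not the kernel's real conntrack state table:

```python
# Simplified conntrack TCP transitions (original direction only).
TRANSITIONS = {
    ("NONE", "syn"): "SYN_SENT",
    ("SYN_SENT", "synack"): "SYN_RECV",
    ("SYN_RECV", "ack"): "ESTABLISHED",
    ("ESTABLISHED", "fin"): "FIN_WAIT",
    ("FIN_WAIT", "ack"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "fin"): "LAST_ACK",
    ("LAST_ACK", "ack"): "TIME_WAIT",
}

def track(packets, state="NONE"):
    # Feed a sequence of packet kinds through the transition table.
    for pkt in packets:
        state = TRANSITIONS.get((state, pkt), state)
    return state

print(track(["syn", "synack", "ack"]))  # ESTABLISHED
```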

Worked example

From vm1 (on host1), telnet to port 22 of vm2 (on host2); vm2 is listening on 22.

packet      vm1                   host1        host2        vm2
syn ->      syn_sent              syn_sent     syn_sent     syn_sent -> syn_recv
syn+ack <-  syn_sent -> syn_recv  syn_recv     syn_recv     syn_recv
ack ->      established           established  established  established

The SYN packet

On the host, ARP packets from vm2 are dropped, so the reply fails at the L3->L2 step. It still passes L3 and goes through NF_INET_LOCAL_OUT.
vm1:
tcp 6 108 SYN_SENT src=172.16.255.130 dst=172.16.255.131 sport=41080 dport=22 [UNREPLIED] src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41080 mark=0 use=1

host1:

tcp 6 112 SYN_SENT src=172.16.255.130 dst=172.16.255.131 sport=41080 dport=22 [UNREPLIED] src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41080 mark=0 zone=1 use=1

host2:

tcp 6 114 SYN_SENT src=172.16.255.130 dst=172.16.255.131 sport=41080 dport=22 [UNREPLIED] src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41080 mark=0 zone=1 use=1

vm2 (if nothing is listening on the port, this entry does not appear, presumably because an RST is sent back; on receiving the SYN, vm2 replies with SYN+ACK):

  1. As seen with conntrack -E -e ALL:

    [NEW] tcp 6 120 SYN_SENT src=172.16.255.130 dst=172.16.255.131 sport=41080 dport=22 [UNREPLIED] src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41080
    [UPDATE] tcp 6 60 SYN_RECV src=172.16.255.130 dst=172.16.255.131 sport=41080 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41080
  2.
    tcp 6 57 SYN_RECV src=172.16.255.130 dst=172.16.255.131 sport=41080 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41080 mark=0 use=1

The SYN+ACK packet

Inside vm1: iptables -A OUTPUT -d vm2 -p tcp --tcp-flags ACK ACK -j DROP
vm2:

tcp 6 22 SYN_RECV src=172.16.255.130 dst=172.16.255.131 sport=41098 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41098 mark=0 use=1

host2:

tcp 6 36 SYN_RECV src=172.16.255.130 dst=172.16.255.131 sport=41098 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41098 mark=0 zone=1 use=1

host1:

tcp 6 19 SYN_RECV src=172.16.255.130 dst=172.16.255.131 sport=41098 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41098 mark=0 zone=1 use=1

vm1 (receives the SYN+ACK and replies with an ACK):

  1. As seen with conntrack -E -e ALL:

    [UPDATE] tcp 6 60 SYN_RECV src=172.16.255.130 dst=172.16.255.131 sport=41098 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41098
    [UPDATE] tcp 6 432000 ESTABLISHED src=172.16.255.130 dst=172.16.255.131 sport=41098 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41098 [ASSURED]
  2.
    tcp 6 431989 ESTABLISHED src=172.16.255.130 dst=172.16.255.131 sport=41098 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41098 [ASSURED] mark=0 use=1

The ACK packet

vm1:

tcp 6 431984 ESTABLISHED src=172.16.255.130 dst=172.16.255.131 sport=41102 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41102 [ASSURED] mark=0 use=1

host1:

tcp 6 431987 ESTABLISHED src=172.16.255.130 dst=172.16.255.131 sport=41102 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41102 [ASSURED] mark=0 zone=1 use=1

host2:

tcp 6 431991 ESTABLISHED src=172.16.255.130 dst=172.16.255.131 sport=41102 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41102 [ASSURED] mark=0 zone=1 use=1

vm2:

tcp 6 431997 ESTABLISHED src=172.16.255.130 dst=172.16.255.131 sport=41102 dport=22 src=172.16.255.131 dst=172.16.255.130 sport=22 dport=41102 [ASSURED] mark=0 use=1

Tips

  • UDP: one packet in each direction makes the entry ESTABLISHED; one more in either direction makes it ASSURED.
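That UDP rule can be sketched in a few lines of Python (an illustration of the behaviour described above, not the kernel logic):

```python
def udp_state(packets):
    # packets: sequence of "orig" / "reply" direction markers.
    # One reply makes the entry replied; any packet after that makes it assured.
    seen_reply = False
    assured = False
    for direction in packets:
        if direction == "reply" and not seen_reply:
            seen_reply = True
        elif seen_reply:
            assured = True
    return "ASSURED" if assured else ("REPLIED" if seen_reply else "UNREPLIED")

print(udp_state(["orig", "reply"]))          # REPLIED
print(udp_state(["orig", "reply", "orig"]))  # ASSURED
```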

OVN learning (4): MAC learning

Posted on 2017-08-31 | Categories: ovn, openvswitch

Introduction

For localnet ports, IP-to-MAC mappings are learned dynamically, without going through the kernel, and are stored in the MAC_Binding table.

Analysis

The process works like this:
the logical router determines the forwarding port for the next hop and uses the get_arp action to check whether the next hop's MAC has already been learned (in table 66). If not, it uses the arp action (in table 24) to broadcast an ARP request out of the forwarding port; the logical switch floods it to every one of its ports. If there is an ARP reply, the logical switch passes it back to the logical router, which executes the put_arp action (in table 17); at the same time, ovn-controller adds a row to the MAC_Binding table. When ovn-controller sees the MAC_Binding update, it installs a flow in table 66.
The flows involved are:

table=17, n_packets=1, n_bytes=42, idle_age=65534, hard_age=65534, priority=90,arp,metadata=0x3,arp_op=2 actions=push:NXM_NX_REG0[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],pop:NXM_NX_REG0[],pop:NXM_OF_ETH_SRC[],controller(userdata=00.00.00.01.00.00.00.00),pop:NXM_OF_ETH_SRC[],pop:NXM_NX_REG0[]
table=22, n_packets=4703, n_bytes=460837, idle_age=0, hard_age=65534, priority=0,ip,metadata=0x3 actions=push:NXM_NX_REG0[],push:NXM_NX_XXREG0[96..127],pop:NXM_NX_REG0[],mod_dl_dst:00:00:00:00:00:00,resubmit(,66),pop:NXM_NX_REG0[],resubmit(,23)
table=66, n_packets=4702, n_bytes=460739, idle_age=0, hard_age=65534, priority=100,reg0=0xa7f0082,reg15=0x2,metadata=0x3 actions=mod_dl_dst:f6:ba:43:3d:44:07
table=23, n_packets=5876, n_bytes=575963, idle_age=630, hard_age=65534, priority=0,metadata=0x3 actions=resubmit(,24)
table=24, n_packets=1, n_bytes=98, idle_age=65534, hard_age=65534, priority=100,ip,metadata=0x3,dl_dst=00:00:00:00:00:00 actions=controller(userdata=00.00.00.00.00.00.00.00.00.19.00.10.80.00.06.06.ff.ff.ff.ff.ff.ff.00.00.ff.ff.00.18.00.00.23.20.00.06.00.20.00.40.00.00.00.01.de.10.00.00.20.04.ff.ff.00.18.00.00.23.20.00.06.00.20.00.60.00.00.00.01.de.10.00.00.22.04.00.19.00.10.80.00.2a.02.00.01.00.00.00.00.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.20.00.00.00)

Some explanation of the table=17 flow (the handler function is pinctrl_handle_put_mac_binding):
when the controller action executes, reg0=arp_spa and dl_src=arp_sha; together with the logical input port value, this implements action=(put_arp(inport, arp.spa, arp.sha)).
The remaining actions restore the register values. The first four bytes of userdata are the opcode, here 0x01, i.e. ACTION_OPCODE_PUT_ARP.
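The dotted-hex userdata string can be decoded with a few lines of Python (the helper name is mine) to confirm that the first four bytes carry opcode 0x01:

```python
def userdata_opcode(userdata):
    # Decode a dotted-hex string like "00.00.00.01.00.00.00.00"
    # and read the first four bytes as a big-endian opcode.
    data = bytes(int(b, 16) for b in userdata.split("."))
    return int.from_bytes(data[:4], "big")

print(hex(userdata_opcode("00.00.00.01.00.00.00.00")))  # 0x1, ACTION_OPCODE_PUT_ARP
```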

OVN learning (3): flow table analysis

Posted on 2017-08-29 | Categories: ovn, openvswitch

Concepts

A hypervisor is a machine that runs OVS, e.g. a libvirtd host.
A gateway connects the logical network to the physical network.
Hypervisors and gateways are both referred to as chassis.
The logical network consists of:
Logical switches, the logical version of Ethernet switches.
Logical routers, the logical version of IP routers.
Logical datapaths, the logical version of an OpenFlow switch; logical switches and routers are both implemented as logical datapaths.
Logical ports:

  1. Logical ports representing VIFs
  2. Localnet ports represent the points of connectivity between logical switches and the physical network
  3. Logical patch ports represent the points of connectivity between logical switches and logical routers, and in some cases between peer logical routers

Gateway router: bound to a physical location (chassis). Supports 1-to-N NAT; its port type in Port_Binding is l3gateway, e.g. edge1 here. To configure SNAT or DNAT rules on a gateway router, it must be connected to the distributed router through a switch. NAT rules only work on gateway routers, and on distributed routers with one logical router port with a redirect-chassis specified.
Distributed router: its Distributed Gateway Port is the logical network gateway.

Register usage

  1. tunnel key
  2. logical datapath field: each logical switch/router is one logical datapath, stored in the metadata register (carried in the tunnel key across machines)
  3. logical input port field, stored in reg14
  4. logical output port field, stored in reg15
  5. conntrack zone field for logical ports, stored in reg13
  6. conntrack zone fields for routers: DNAT in reg11, SNAT in reg12
  7. logical flow flags, stored in reg10
  8. VLAN ID

Tips:

  • The metadata value in a flow is the tunnel_key in the Datapath_Binding table (ovn-sbctl list Datapath_Binding).
  • The reg14 value in a flow is the tunnel_key in the Port_Binding table (ovn-sbctl list Port_Binding).
  • The translation between physical ports (ofport) and logical ports is done on the local chassis by ovn-controller.

Overall flow table processing

The switch datapath mainly provides ARP proxying, DHCP replies, L2 lookup, ACLs, and port security.
The router datapath mainly provides gateway ARP proxying, gateway ICMP replies, routing, next-hop ARP resolution, and NAT.
The router performs routing and NAT in the same order as the kernel does.

To begin, a packet sent by a VM enters OVS, and then:

  1. In table 0, in_port is matched and a physical-to-logical translation is performed: the logical datapath field and the logical input port are set, then the packet jumps to table 16. Packets sent by containers inside a VM are distinguished by VLAN id; the VLAN is stripped and processing continues as above. Table 0 also handles packets arriving from tunnels: it sets the logical datapath field and logical input port, and also the logical output port (the output port was already known when the tunnel encapsulation was done), all taken from the tunnel metadata, then jumps to table 33.
  2. Tables 16 to 31 execute the ingress logic of the Logical_Flow table in the southbound DB, corresponding to Logical_Flow tables 0 to 15. Each logical flow maps to one or more actual flows; a packet may match just one of them or several (whose actions are identical). ovn-controller uses the first 32 bits of the logical flow's UUID as the flow cookie. Some logical flows map to OVS conjunctive matches, which use cookie 0. Most logical-flow actions have direct OpenFlow counterparts, e.g. next maps to resubmit. Some special cases: 1) output: implemented by resubmitting to table 32; multiple outputs mean multiple resubmits. 2) get_arp(P, A)/get_nd(P, A): implemented by saving some values and resubmitting to table 66, whose flows are generated from the MAC_Binding table in the southbound DB. 3) put_arp(P, A, E)/put_nd(P, A, E): implemented by saving some values and sending the packet to ovn-controller, which updates the MAC_Binding table; this is how bindings are learned when a gateway connects to an external network.
  3. Tables 32 to 47 implement the output logic of logical ingress. Table 32 handles packets going to other hypervisors (output via tunnel ports), table 33 handles packets destined for the local hypervisor, and table 34 checks whether the logical ingress and egress ports are the same, dropping the packet if so. 1) Table 32 contains both unicast and multicast handling: set the fields and send unicast to the other hypervisor; for multicast, the same, just sending multiple copies; the default jumps to table 33. 2) Table 33 handles packets whose logical port is local: resubmit to 34; for multicast, change the logical port and resubmit to 34 multiple times. 3) Table 34 performs the loopback check (MLF_ALLOW_LOOPBACK flag) and resubmits to table 48.
  4. Tables 48 to 63 execute the egress logic of the southbound DB's Logical_Flow table, then resubmit to table 64 to run the output action. Egress processing may not change the logical output port or perform tunnel encapsulation.
  5. Table 64 checks loopback (the MLF_ALLOW_LOOPBACK flag); OpenFlow forbids loopback by default (unless the IN_PORT action is used). If MLF_ALLOW_LOOPBACK is set, the in_port value is saved, the in_port register is cleared, and the packet is resubmitted to table 65, bypassing the OpenFlow restriction.
  6. Table 65 performs the logical-to-physical translation, the inverse of table 0: it matches on the logical port and sends to the physical port on the local bridge. For containers inside a VM, the VLAN tag is added back before sending.
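The fixed offsets in the mapping above (logical ingress tables 0-15 live at OpenFlow tables 16-31, egress at 48-63) can be written down as a tiny sketch:

```python
# Offsets between southbound Logical_Flow tables and physical OpenFlow tables,
# as described in the pipeline overview above.
INGRESS_OFFSET = 16
EGRESS_OFFSET = 48

def of_table(logical_table, direction):
    # Map a logical table number (0-15) to its OpenFlow table number.
    assert 0 <= logical_table <= 15
    offset = INGRESS_OFFSET if direction == "ingress" else EGRESS_OFFSET
    return logical_table + offset

print(of_table(0, "ingress"), of_table(15, "ingress"), of_table(0, "egress"))  # 16 31 48
```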

Worked example

Environment setup

  • A Primer on OVN
  • An Introduction to OVN Routing
  • The OVN Gateway Router
  • OVS 2.7.0 is used here

Topology

physical network
|
outside(sw)
| <10.127.0.129/25>
edge1(gr) <172.16.255.1/30> physical network
| /
transit(join sw) out(sw)
| <172.16.255.2/30> / <10.127.1.2/24>
tenant1(dr) <172.16.255.129/26,172.16.255.193/26>
/ \
dmz(sw) inside(sw)
/ | \ / \
vm1 vm2 vm5 vm3 vm4
host1,vm1(02:ac:10:ff:01:30/172.16.255.130) vm3(02:ac:10:ff:01:94/172.16.255.194) vm5(02:ac:10:ff:01:32/172.16.255.132)
host2,vm2(02:ac:10:ff:01:31/172.16.255.131) vm4(02:ac:10:ff:01:95/172.16.255.195)

On the left, a centralized router does NAT; on the right, a distributed router does NAT.

Register values

metadata = tun_id
0x6 dmz
0x5 inside
0x4 transit
0x7 tenant1
0x8 outside
0x3 edge1
0x9 out
reg14,
dmz,
vm1 reg14=0x2
vm5 reg14=0x4
vm2 reg14=0x3
dmz-tenant1 reg14=0x1
inside,
vm3 reg14=0x2
vm4 reg14=0x3
inside-tenant1 reg14=0x1
tenant1,
tenant1-transit reg14=0x3
tenant1-dmz reg14=0x1
tenant1-inside reg14=0x2
transit,
transit-tenant1, 0x2
transit-edge1, 0x1
edge1,
edge1-outside 0x2
edge1-transit 0x1

Logical flow analysis

Ping from vm1 to vm4 and run ovn-sbctl dump-flows.
dmz datapath ingress logical flows,

table=16: L2 port security check (does the source MAC match in_port).
table=17: L3 port security check (allow DHCP; does the source IP match the source MAC).
table=18: ARP/ND security check.
Port security checks can be disabled with ovn-nbctl lsp-set-port-security.
table=19: pre_acl. Allow packets entering from dmz-tenant1; for packets sent by local ports (e.g. vm1), set reg0[0]=1.
table=20: pre_lb.
table=21: pre_stateful. If reg0[0]=1, send through ct and jump to 22; otherwise jump straight to 22.
table=22: outbound ACLs. ACLs apply to the whole logical datapath; here a stateful ACL (from-lport drop):
ovn-nbctl acl-list dmz
from-lport 0 (udp.dst==1234) drop
to-lport 100 (tcp.dst==22) allow-related
to-lport 100 (tcp.dst==23) allow-related
The drop is not stateless: ct_label in the flow records whether a packet should be dropped because of a policy change.
table=23: qos_mark.
table=24: lb.
table=25: stateful.
table=26: ARP response. For proxied ARP replies, set reg10=0x1 and jump to 32; for GARP, next.
table=27: dhcp. The DHCP parameters are written into the flow and sent to the controller, which can reply directly.
table=28: DHCP response; jump to 32.
table=29: L2 lookup (across the whole datapath). Multicast: set the multicast outport (reg15=0xffff) and jump to 32. Unicast to a local VM: set the corresponding logical outport (including the gateway) and jump to 32; here reg15=0x1.
reg15=0xfffe means unknown.

dmz datapath egress logical flows,

table=48: pre_lb; next.
table=49: pre_acl. For packets not going to dmz-tenant1, set reg0[0]=1.
table=50: pre_stateful. If reg0[0]=1, ct_next.
table=51: lb.
table=52: acl. Allow DHCP replies.
table=53: qos_mark.
table=54: stateful.
table=55: port_sec_ip, destination security check (do the destination MAC and IP match).
table=56: port_sec_l2 (do the destination MAC and destination port match).
Tables 32, 33, 34: see the overview above.
Tables 64, 65: see the overview above.
table=65: if the destination port is the gateway, set metadata=0x7, reg14=0x1, reg15=0x0 and jump back to table 16.
metadata=0x7 enters the logic of the logical router tenant1.

tenant1 ingress logical flows,

table=21: ip_routing. Match the destination subnet, decrement the TTL, rewrite the source MAC, set outport=tenant1-inside, reg15=0x2.
table=22: arp_resolve. Rewrite the destination MAC based on the outport (reg15) and the destination IP (reg0).

tenant1 egress logical flows,

table=65: set metadata=0x5, reg14=0x1, reg15=0x0 and jump back to table 16.

inside ingress logical flows,

table=32: send out via the tunnel.

On host2, inside egress logical flows.

Cross-subnet traffic traverses three logical datapaths in total.


OVN learning (2): ovsdb

Posted on 2017-08-22 | Categories: ovn, openvswitch

The OVSDB protocol

ovn-arch

  • JSON-based RPC
  • Protocol spec: https://www.rfc-editor.org/rfc/rfc7047.txt
  • The current ovsdb-server implementation (OVS 2.7.0) is single-threaded and slow; HA is active-standby only.

Client libraries

C and Python bindings ship with OVS. Other open-source libraries:

  • golang
    https://github.com/socketplane/libovsdb/blob/master/example/play_with_ovs.go
    https://github.com/contiv/ofnet/blob/master/ovsdbDriver/ovsdbDriver.go
  • python
    ovsdbapp

Here is an example using ovsdbapp:

from ovsdbapp.backend.ovs_idl import connection
from ovsdbapp import constants
from ovsdbapp.schema.open_vswitch import impl_idl

ovsdb_connection = connection.Connection(
    idl=connection.OvsdbIdl.from_server(
        'unix:/var/run/openvswitch/db.sock', 'Open_vSwitch'),
    timeout=constants.DEFAULT_TIMEOUT)
api = impl_idl.OvsdbIdl(ovsdb_connection)

# check whether bridge br0 exists
result = api.br_exists("br0").execute(check_error=True)
print(result)

# delete bridge br1
result = api.del_br("br1").execute(check_error=True)
print(result)

Protocol interaction example

ovs-vsctl operates on the OVS DB. ovs-vswitchd monitors the OVS DB, so every DB change is pushed to ovs-vswitchd. Both ovs-vsctl and ovs-vswitchd watch and update the DB through the ovsdb-server process (over a Unix socket or a TCP connection).

Take ovs-vsctl add-port br0 abc -- set interface abc type=internal as an example:
ovs-vsctl adds one row to each of the Port and Interface tables. When ovs-vswitchd is notified, it creates the interface and writes ofport, mtu, mac, link_state, etc. back into the Interface row. Whether the write succeeded is, again, learned through the monitor notification mechanism.

ovs-vsctl -v prints the detailed JSON-RPC exchange between ovs-vsctl and ovsdb-server.

The command ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json watches change notifications for selected columns of selected tables:

1. Initially, all current rows are returned.
2. Afterwards, only changes are returned.
3. Note that the initial, delete, and new values in the returned action field are not literal protocol content; they are post-processed. The protocol itself carries only old and new; see section 4.1.6 of the RFC.
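Per RFC 7047, the watch that ovsdb-client sets up is a single monitor JSON-RPC request. Below is a minimal Python sketch of its shape (the id value is arbitrary, and the optional select member is omitted):

```python
import json

# Shape of an RFC 7047 "monitor" request:
# params = [db-name, monitor-id (any json-value), monitor-requests].
monitor_request = {
    "method": "monitor",
    "params": [
        "Open_vSwitch",
        None,
        {"Interface": {
            "columns": ["name", "ofport", "external_ids"],
        }},
    ],
    "id": 1,
}
print(json.dumps(monitor_request))
```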

ovs-vswitchd processes these messages as follows.

© 2017 faicker