Copyright @ Lenovo US
Netbox is a nice level 1 tool (for levels, see the capability model), but we need more: managing a lab or a rack takes more than level 1. On top of this, we talk about automation, which implies capabilities up to at least level 4. Therefore, we used the Netbox code as a starting point and built a POC.
It has been a painful experience to witness how we diagnose infrastructure, especially network connections, in what I would call a chasing-the-rabbit fashion: checking a switch, checking which ports the two ends of a cable are plugged into, until everything adds up to a single chain of a valid connection. Part of the work is verifying hardware connections (cabling); part is verifying software configurations (many layers deep). A knowledgeable person can traverse the link from one end to the other, but even that person faces the harsh reality of not knowing which area to focus on.
If you think of it at the highest level, there is a topology design, and its mirror image is the reality. Scanning devices to acquire and compose that reality is what a computer is good at. Therefore, like counting inventory, a tool should be able to replace the typing of repetitive commands by a human hand, and piece an ocean of metadata together into a logical, meaningful view that saves the operator mechanical effort. Even without knowing the design, it should at least produce, and even maintain, the reality view, both on demand and continuously.
Taking this further, if reality can be described, the same syntax can be used to describe the expectation (the design). We are then equipped with both views and can produce a diff. Previously we relied on an experienced operator to know where to look; in the future this diff view highlights and color-codes the differences for anyone who wants to look, at any time, with no DevOps expertise required.
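To make the diff idea concrete, here is a minimal sketch, assuming both the design and the scanned reality can be reduced to a set of links between (device, interface) endpoints. The function names, device names, and port names are hypothetical and are not taken from the POC.

```python
from typing import FrozenSet, Set, Tuple

# An endpoint is (device name, interface name); a link is an unordered pair.
Endpoint = Tuple[str, str]
Link = FrozenSet[Endpoint]


def link(a: Endpoint, b: Endpoint) -> Link:
    """Build an order-independent link so that A-B equals B-A."""
    return frozenset((a, b))


def diff_topology(design: Set[Link], reality: Set[Link]) -> dict:
    """Compare the expected links with the scanned ones."""
    return {
        "missing": design - reality,     # designed but not found by the scan
        "unexpected": reality - design,  # found by the scan but not designed
        "matched": design & reality,     # present in both views
    }


# Hypothetical device and port names, for illustration only.
design = {
    link(("switch-1", "port2"), ("server-1", "eno1")),
    link(("switch-2", "port18"), ("server-1", "ens4f1")),
}
reality = {
    link(("server-1", "eno1"), ("switch-1", "port2")),     # same link, ends swapped
    link(("switch-2", "port19"), ("server-1", "ens4f1")),  # cable in the wrong port
}

result = diff_topology(design, reality)
for kind, links in result.items():
    print(kind, sorted(sorted(l) for l in links))
```

Because both views share one representation, the diff is plain set arithmetic; the "missing" and "unexpected" buckets are what a color-coded view would highlight.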
If you have read the system capability model, you will have recognized this as a capability approach. It shifts the focus of management from bookkeeping to knowledge automation. The goal is not to eliminate the human factor, but to stop wasting people's bandwidth on things that can be well known, well modeled, and scripted.
Analysis requires intelligence; SSH to ten machines does not.
There are essentially two types of models: logical and physical. Logical models describe logical relationships, e.g. device → tenant; a tenant is nothing but a logical concept. Physical models describe a relationship that requires a physical connection, e.g. an interface connected to a switch port (via a cable).
The centerpiece of the Netbox models is the Device, representing a physical device such as a server or a switch. This makes sense, as these devices are certainly the primary physical assets of a data center.
DeviceRole is a user-defined value list that can be assigned to a device. Its common use is to group devices by function, such as "Management Switch" or "Ceph".
A device role also defines a color, thus making a color-coded presentation of a list of devices possible.
DeviceType describes attributes such as manufacturer and model. Two important flags affect how a device can be used:
- is a network device? If not, the device will not have an interface, and therefore cannot be linked to an IP!
- can have child device? If not, it will not have bay devices (we model the BMC controller as a bay device inside a server).
Access a device
How do we access a device? In bare essence we need three things: (IP, username, password) (Netbox's way to control passwords is explained in a later section). Further, we use the platform value to determine the access method:
- Lenovo ENOS/CNOS: these are Lenovo switch operating systems, and the access method is Telnet (see Network switch for details).
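As a rough sketch of how the platform value could select an access method, consider the lookup below. Only the ENOS/CNOS to Telnet mapping comes from the text; the platform slugs, the Access tuple, and the build_access helper are hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Optional


class Access(NamedTuple):
    """The three essentials plus the method chosen from the platform."""
    ip: str
    username: str
    password: str
    method: str


# Hypothetical platform slugs; only "ENOS/CNOS uses Telnet" is from the text.
ACCESS_METHOD_BY_PLATFORM = {
    "lenovo-enos": "telnet",
    "lenovo-cnos": "telnet",
}


def build_access(ip: str, username: str, password: str,
                 platform: str) -> Optional[Access]:
    """Return access details, or None when the platform is not recognized."""
    method = ACCESS_METHOD_BY_PLATFORM.get(platform)
    if method is None:
        return None  # unknown platform: let the caller decide what to do
    return Access(ip, username, password, method)


# 192.0.2.10 is a documentation-only address; the credentials are placeholders.
switch_access = build_access("192.0.2.10", "admin", "secret", "lenovo-cnos")
```

Returning None for an unknown platform, rather than guessing a default, keeps the decision with the caller.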
Interface & topology
A device whose device type sets is_network_device=True can be linked to an Interface, and an interface can be linked to an IP address. An InterfaceConnection linking two interfaces is the cornerstone for describing a network topology:
    CONNECTION_STATUS_PLANNED = False
    CONNECTION_STATUS_CONNECTED = True
    CONNECTION_STATUS_CHOICES = [
        [CONNECTION_STATUS_PLANNED, 'Planned'],
        [CONNECTION_STATUS_CONNECTED, 'Connected'],
    ]


    class InterfaceConnection(models.Model):
        """
        An InterfaceConnection represents a symmetrical, one-to-one connection
        between two Interfaces. There is no significant difference between the
        interface_a and interface_b fields.
        """
        interface_a = models.ForeignKey(
            'Interface', related_name='connected_as_a', on_delete=models.CASCADE)
        interface_b = models.ForeignKey(
            'Interface', related_name='connected_as_b', on_delete=models.CASCADE)
        connection_status = models.BooleanField(
            choices=CONNECTION_STATUS_CHOICES,
            default=CONNECTION_STATUS_CONNECTED,
            verbose_name='Status')
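Because interface_a and interface_b are interchangeable, finding the far end of a connection means checking both fields. The following is a plain-Python sketch of that lookup, not Netbox code: the dataclasses stand in for the Django models, peer_of is a hypothetical helper, and the BMC interface name "bmc0" is invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Interface:
    device: str
    name: str


@dataclass(frozen=True)
class Connection:
    interface_a: Interface
    interface_b: Interface
    connected: bool = True  # mirrors CONNECTION_STATUS_CONNECTED = True


def peer_of(iface: Interface,
            connections: Iterable[Connection]) -> Optional[Interface]:
    """Return the interface at the other end, whichever side iface is on."""
    for conn in connections:
        if conn.interface_a == iface:
            return conn.interface_b
        if conn.interface_b == iface:
            return conn.interface_a
    return None  # the interface is not cabled to anything


# Switch port 34 wired to a BMC; "bmc0" is an assumed interface name.
switch_port = Interface("LCTC-R1U39-SW", "34")
bmc_port = Interface("ceph-1", "bmc0")
connections = [Connection(switch_port, bmc_port)]

print(peer_of(bmc_port, connections))
```

Walking a topology is then a matter of repeating this lookup from one interface to the next.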
In the example topology diagram below, we see:
- A connection between port #34 of a switch (LCTC-R1U39-SW) and a BMC interface (ceph-1 is the BMC controller's name).
- A server (ceph-node-brain2) with two interfaces: eno1 is connected to port 2 of a switch, and ens4f1 is connected to port 18 of another switch.
Child devices
A device can have device bays, which essentially form a parent-child relationship. One use case of this relationship is to register a BMC controller as a bay device inside a server:
Conditions to form a parent-child relationship:
- parent device allows bay/child (defined in
- a device cannot be installed into itself
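The two conditions listed above can be expressed as a small validation function. This is a sketch under assumptions: the allows_children flag and the can_install helper are hypothetical names, and the classes are simplified stand-ins rather than the Netbox models.

```python
from dataclasses import dataclass


@dataclass
class DeviceType:
    model: str
    allows_children: bool = False  # hypothetical name for the "can have child" flag


@dataclass
class Device:
    name: str
    device_type: DeviceType


def can_install(parent: Device, child: Device) -> bool:
    """Check the two conditions for a parent-child (bay) relationship."""
    if not parent.device_type.allows_children:
        return False  # the parent's type does not allow bays/children
    if parent is child:
        return False  # a device cannot be installed into itself
    return True


# Illustrative objects: a server that may hold a BMC, and the BMC itself.
server = Device("ceph-node-brain2", DeviceType("server", allows_children=True))
bmc = Device("ceph-1", DeviceType("bmc"))
```

With these objects, can_install(server, bmc) passes both checks, while can_install(bmc, server) and can_install(server, server) are rejected.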
Related devices are defined as devices that:
- belong to the same
- and have the same device role
In Netbox this is only a convenience for navigation. However, we can extend this idea using other definitions.
Just how many ways can one group devices? Maybe too many. Note that some groups overlap, which is very confusing!
- Rack can also be grouped by
- Both rack and site can be grouped by
    device_type = models.ForeignKey(
        'DeviceType', related_name='instances', on_delete=models.PROTECT)
    device_role = models.ForeignKey(
        'DeviceRole', related_name='devices', on_delete=models.PROTECT)
    tenant = models.ForeignKey(
        "tenancy.Tenant", blank=True, null=True,
        related_name='devices', on_delete=models.PROTECT)
    platform = models.ForeignKey(
        'Platform', related_name='devices', blank=True, null=True,
        on_delete=models.SET_NULL)
    site = models.ForeignKey(
        'Site', related_name='devices', on_delete=models.PROTECT)
    rack = models.ForeignKey(
        'Rack', related_name='devices', blank=True, null=True,
        on_delete=models.PROTECT)
Rack is the physical grouping of devices. The most important thing about a rack is whether it has available space to contain a device:
- rack height
- how much have been reserved (
- what devices does it already have? what are their height and depth?
Depth comes into play because some devices can be half-depth, which allows two devices in the same slot: one facing the front and one facing the back. Also, it is common for a top-of-rack switch to be mounted facing the back so that cables can reach its ports.
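The space check described above (rack height, reserved units, and the height and depth of devices already mounted) might look like the sketch below. It is an illustration under assumptions, not the Netbox implementation: the Mounted class, its field names, and the fits helper are hypothetical, and reservations are modeled simply as a set of unit numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Mounted:
    """A device already in the rack (field names are illustrative)."""
    position: int            # lowest rack unit it occupies, counted from 1
    u_height: int            # height in rack units
    face: str                # "front" or "rear"
    full_depth: bool = True  # a half-depth device blocks only its own face


def blocked_units(mounted: Iterable[Mounted], face: str) -> Set[int]:
    """Rack units that cannot take a device on the given face."""
    units: Set[int] = set()
    for dev in mounted:
        if dev.full_depth or dev.face == face:
            units.update(range(dev.position, dev.position + dev.u_height))
    return units


def fits(rack_height: int, mounted: Iterable[Mounted], reserved: Set[int],
         position: int, u_height: int, face: str,
         full_depth: bool = True) -> bool:
    """Check whether a new device fits at the given position and face."""
    mounted = list(mounted)
    wanted = set(range(position, position + u_height))
    if position < 1 or position + u_height - 1 > rack_height:
        return False  # sticks out of the rack
    if wanted & set(reserved):
        return False  # overlaps a reserved unit
    if full_depth:
        # A full-depth device needs the units free on both faces.
        blocked = blocked_units(mounted, "front") | blocked_units(mounted, "rear")
    else:
        blocked = blocked_units(mounted, face)
    return not (wanted & blocked)


# A 42U rack with a half-depth top-of-rack switch facing the back.
rack_height = 42
mounted = [Mounted(position=42, u_height=1, face="rear", full_depth=False)]
reserved = {1, 2}
```

In this example a half-depth device can still share unit 42 from the front, while a full-depth device cannot, which is exactly the case the depth attribute exists to distinguish.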
There are four ways to group racks:
    site = models.ForeignKey(
        'Site', related_name='racks', on_delete=models.PROTECT)
    group = models.ForeignKey(
        'RackGroup', related_name='racks', blank=True, null=True,
        on_delete=models.SET_NULL)
    tenant = models.ForeignKey(
        Tenant, blank=True, null=True, related_name='racks',
        on_delete=models.PROTECT)
    role = models.ForeignKey(
        'RackRole', related_name='racks', blank=True, null=True,
        on_delete=models.PROTECT)
— by Feng Xia