All posts by Garret West

Cisco Identity Services Engine – Part 1 – Overview

What is the Cisco Identity Services Engine?

Today’s enterprise network is rapidly changing, especially when it comes to employee mobility. Employees are no longer tethered to desktop workstations, but instead access enterprise resources via a variety of devices: tablets, smartphones, and personal laptops, just to name a few. Being able to access resources from anywhere greatly increases productivity, but it also increases the probability of data breaches and security threats because you may not control the security posture of the devices accessing the network. Keeping track of every device accessing the network is a huge task in itself, and the more access the business demands, the more unsustainable it becomes to manage.

The Cisco Identity Services Engine (ISE) is an identity-based network access control and policy enforcement system. ISE allows a network administrator to centrally control access policies for wired and wireless endpoints, using information gathered from RADIUS messages passed between the device and the ISE node to identify the device type, a process known as profiling. The profiling database is updated on a regular basis to keep up with the latest and greatest devices, so there are no gaps in device visibility.

Essentially, ISE attaches an identity to a device based on user, function, or other attributes to provide policy enforcement and security compliance before the device is authorized to access the network. Based on the results of these checks, an endpoint can be allowed onto the network with a specific set of access rules applied to the interface it is connected to, denied entirely, or given guest access, based on your specific company guidelines.
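
To make that a little more concrete, here is a minimal sketch in Python. This is not ISE's actual policy engine, and the attribute names are made up for illustration; the point is simply that endpoint attributes are matched against ordered rules and the first match decides the level of access:

```python
# Hypothetical sketch only: not ISE's policy engine, attribute names invented.

def authorize(endpoint):
    """Map a dict of endpoint attributes to an access decision.
    Rules are evaluated top-down; the first match wins."""
    if endpoint.get("posture") == "non-compliant":
        return {"access": "deny"}
    if (endpoint.get("user_group") == "employees"
            and endpoint.get("device_type") == "corporate-laptop"):
        return {"access": "permit", "dacl": "EMPLOYEE-ACL", "vlan": 10}
    if endpoint.get("device_type") == "personal-device":
        return {"access": "guest", "redirect": "guest-portal"}
    return {"access": "deny"}  # no matching rule means no access

print(authorize({"user_group": "employees",
                 "device_type": "corporate-laptop",
                 "posture": "compliant"}))
# {'access': 'permit', 'dacl': 'EMPLOYEE-ACL', 'vlan': 10}
```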

Let’s analogize LOTR-style for clarification: ISE is Gandalf, and the end-user device is the pursuing Balrog. I think you know where this is going.

YOU SHALL NOT PASS!

ISE is an automated policy enforcement engine that takes care of the mundane day-to-day tasks like BYOD device onboarding, guest onboarding, switchport VLAN changes for end-users, access list management, and many others, so a network administrator can focus on other important tasks (and cool projects!).

ISE Basics

The ISE platform is typically a distributed deployment of nodes made up of three different personas: Policy Administration Node (PAN), Monitoring and Troubleshooting Node (MnT), and Policy Services Node (PSN). All three roles are required for ISE to function.

Policy Administration Node (PAN)

The PAN persona is the interface an administrator logs into in order to configure policies. It is the control center of the deployment. This node allows an administrator to make changes to the entire ISE topology, and those changes are pushed out from the admin node to the Policy Services Nodes (PSN).

Policy Services Node (PSN)

The PSN persona is where policy decisions are made. These are the nodes that network access devices send their authentication traffic to; RADIUS messaging is an example of what is sent to the PSNs. The messages are processed, and the PSN gives the go/no-go for access to the network.

Monitoring and Troubleshooting Node (MnT)

The MnT persona is where logging occurs and reports are generated. All logs are sent to this node, which sorts through them and assembles them into a legible format. It is also used to generate various reports, so you can make management happy with pretty pictures and numbers (*wink wink*), as well as to notify you of any ISE alarms.

How ISE Works

Now that you know what each persona does, let’s take a look at how everything fits together as a complete system. The diagram shows a logical representation of ISE, because the personas may be distributed across many different appliances. Familiarize yourself with the figure below, and I will explain what each piece is doing:

[Figure: ISE Communication Model, from http://www.cisco.com/en/US/solutions/collateral/ns340/ns414/ns742/ns744/docs/howto_50_ise_deployment_tg.pdf]

The figure above is from the Cisco TrustSec How-To Guide: ISE Deployment Types and Guidelines. If you’re considering deploying ISE, I really recommend reading all of the ISE design guides before you plan your implementation.

  1. Communication starts with the endpoint. This could be a laptop, smartphone, tablet, security camera, videoconferencing system — anything that requires network access.
  2. The client must connect through a network access device — a switch, a wireless LAN controller, or a VPN concentrator — in order to gain access to the network. This is where enforcement of all policies takes place.
  3. The endpoint is asked for authentication via an 802.1X request, and that request is sent to the Policy Services Node.
  4. At this point, the PSN has already received its configuration from the admin node. The PSN will process the credentials (it may need to query an external database for this; LDAP or Active Directory, for example), and based on the configured policy the PSN will make an authorization decision.
  5. The PSN sends the decision back to the network access device so it can enforce the decision. The network access device is sent specific actions to take for the session. Many actions can be taken at this point depending on the policy, but a few common features are dynamic access lists, change of authorization (to switch VLANs, for example), and Security Group Tags (part of the Cisco TrustSec solution).
  6. Now the client can access specific resources based on what the PSN has sent back as a rule set. Alternatively, the client can be redirected to the guest login page or completely denied access to the network.
  7. All of these messages passed back and forth are logged to the monitoring node, where they can be viewed from the admin node in an organized format.
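
Here is a toy walkthrough of that flow in Python. It is purely illustrative (no real 802.1X or RADIUS involved), and the returned attribute names are invented stand-ins for the kinds of values ISE hands back, such as a dACL name, a VLAN, or a Security Group Tag:

```python
# Illustrative only: no real 802.1X or RADIUS here; attribute names are invented.

AUDIT_LOG = []  # stands in for the MnT node (step 7)

def psn_authorize(credentials):
    """Steps 3-4: the PSN validates the credentials (possibly against an
    external store like AD or LDAP) and returns an authorization result."""
    AUDIT_LOG.append(("auth-request", credentials["username"]))
    if credentials["password"] != "correct-horse":   # pretend identity-store lookup
        return {"result": "reject"}
    return {"result": "accept", "dacl": "PERMIT-CORP", "vlan": 20, "sgt": 100}

def network_access_device(credentials):
    """Steps 2, 5, 6: the switch/WLC/VPN concentrator relays the request to
    the PSN and enforces whatever decision comes back."""
    decision = psn_authorize(credentials)            # step 3: hand off to the PSN
    AUDIT_LOG.append(("auth-result", decision["result"]))
    if decision["result"] == "accept":
        return f"port authorized: VLAN {decision['vlan']}, dACL {decision['dacl']}"
    return "port unauthorized (or redirected to the guest portal)"

# Step 1: the endpoint's 802.1X supplicant supplies credentials.
print(network_access_device({"username": "gwest", "password": "correct-horse"}))
print(AUDIT_LOG)  # step 7: everything was logged along the way
```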

This is definitely a complex beast with a lot of moving parts, but as long as you keep the fundamentals in mind and break it down into different parts, it’s not too tough to implement and troubleshoot. The most time-consuming part of a deployment is figuring out your policies for authorization. Once you have standard policies across the board, enforcing those policies is a breeze with ISE. I will go into policies in my next post, but let’s move onto the last topic for the overview: Deployment Topologies and Licensing.

Physical Deployment Examples

What I’m about to say here is probably the most important part of a deployment: DO NOT TRY TO SAVE MONEY BY GIVING UP HIGH AVAILABILITY! These nodes control access to your entire network. If they go down, you might as well have a total network failure because nobody will get authenticated or authorized. Design ISE with as much high availability as you can afford. The only time a standalone deployment is acceptable is for a very small proof-of-concept that does not affect production end-users.

The other important part of a deployment is the hardware ISE is implemented on. Now, Cisco does offer an ESX/ESXi option; however, I don’t recommend it for a few reasons. First and foremost, the appliance option is tested and rated to scale to a certain number of endpoints. If you use the ESX/ESXi option, you lose a bit of that predictability. I said it before and I will say it again: these nodes control access to your entire network, so if you have unpredictable performance, you will have unpredictable issues. The other thing I don’t like about the ESX/ESXi option is troubleshooting. If you do have an issue with ISE, you really want it resolved quickly. If you’re using the VM deployment and something goes wrong, you have to open tickets with Cisco, your server manufacturer, VMware, and anything else that may be tied to your deployment. That’s not incredibly efficient, and you’re likely to run into a lot of vendor finger-pointing before the issue finally gets resolved. If you go the Cisco appliance route, you open a ticket with Cisco, and that’s it! SMARTnet covers both the software and the hardware, which makes the resolution process much simpler. I will say it one last time: these nodes control access to your entire network!

With that said, let’s get into the actual deployment options:

  • Standalone
    • One node running all three personas (PAN, MnT, PSN).
    • No redundancy.
    • Limited to a maximum of 2000 endpoints, regardless of hardware type.
  • Two-Node Deployment
    • Two nodes running all three personas (PAN, MnT, PSN).
    • Simple redundancy: one node is assigned the primary admin role and secondary monitoring role, and the other node is assigned the secondary admin role and primary monitoring role. Both run the Policy Services persona, and network access devices are configured to use both PSNs.
    • Still limited to a maximum of 2000 endpoints.
  • Mid-sized Deployment – Separate Policy Services Nodes
    • Two nodes taking on the admin and monitoring persona, but no policy service persona.
    • Policy Services Nodes are standalone boxes. You can have up to 5 PSNs in a deployment where the PAN and MnT personas are co-located.
    • Limited to a maximum of 10,000 endpoints regardless of hardware capacity.
  • Large Deployment
    • Every persona has its own dedicated node: a primary admin node, a secondary admin node, a primary monitoring node, a secondary monitoring node, and up to 40 Policy Services Nodes. A deployment this large will likely need load balancers serving virtual IPs with groups of PSNs sitting behind them.
    • The endpoint capacity depends on the hardware of the nodes. Prior to ISE version 1.2, the maximum capacity was 100,000 endpoints; in version 1.2, that limit was raised to 250,000 endpoints.
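
If you just want a quick gut-check on which deployment model fits a given endpoint count, here is a rough helper based on the limits listed above. Real sizing decisions should come from the Cisco design guides, not from a ten-line sketch:

```python
# Rough gut-check only, using the endpoint limits listed above
# (2,000 / 2,000 / 10,000 / 250,000 as of ISE 1.2).

def suggest_deployment(endpoints, need_redundancy=True):
    if endpoints <= 2000:
        return "two-node deployment" if need_redundancy else "standalone (PoC only)"
    if endpoints <= 10000:
        return "mid-sized deployment with dedicated PSNs (up to 5)"
    if endpoints <= 250000:
        return "large deployment: dedicated PAN/MnT pairs and up to 40 PSNs"
    return "exceeds the documented maximum for a single ISE 1.2 deployment"

print(suggest_deployment(1500))    # two-node deployment
print(suggest_deployment(60000))   # large deployment: dedicated PAN/MnT pairs ...
```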

If you’re looking for the performance metrics for each appliance, take a look at the design guides I linked to above. There are plenty of charts and other goodies to explain everything.

Got it? Good! Onto licensing this bad boy!

Licensing

ISE licensing comes in two flavors: functionality-based and deployment-based.

Functionality-based is the “full-steam ahead” type of licensing, where wired, wireless, and VPN network access devices are all supported and features are licensed in two tiers. You can choose from the Base License or the Base + Advanced License. The Base License is for deployments that only need to authenticate and authorize users and devices, provision guest users, access reporting features, and monitor and troubleshoot access to the network. The base license is perpetual (it has no term subscription limit). The Advanced License expands on the base license and enables organizations to make more advanced decisions (it has all of the cool features that you really want in a deployment). The features include device onboarding and provisioning, device profiling and feed service, posture services, mobile device management integration, and security group access capabilities. This license is term-based with a choice of 3- or 5-year subscriptions. The base license is a prerequisite for the advanced license.

Deployment-based licensing is the slow, phased approach to deploying ISE. This type of licensing allows you to start with wireless endpoints only and expand to wired and VPN later when your organization is ready. Due to the complexity of ISE, I recommend using the phased approach and really getting to know the product before rolling it out to the entirety of the network. The Wireless License includes everything that the base + advanced license does, but it only applies to wireless network access devices. The wireless license is term-based with a choice of 3- or 5-year term subscriptions. This license typically satisfies most BYOD (Bring Your Own Device) policies management may be asking for. Once ISE has been proven effective on the wireless front, it’s typically pretty easy to justify rolling it out to wired and VPN devices as well using the Wireless Upgrade License. The wireless upgrade license is the same as wireless, but it expands the functionality of ISE to wired and VPN network access devices. It is also licensed on 3- or 5-year terms.

Along with the term length, each license has an endpoint limit (100, 250, 500, 1000, 1500, and so on). Keep in mind, this is not total endpoints, but simultaneously authorized endpoints. If an endpoint isn’t authorized, it doesn’t increment the license count. If an endpoint is authorized and then leaves the network, the license count decrements because it is de-authorized.
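
In other words, the license count behaves like a concurrent session counter. A toy illustration:

```python
# Toy illustration: licenses track simultaneously authorized endpoints.

active_sessions = set()

def on_authorized(mac):
    active_sessions.add(mac)       # consumes a license while the session is active

def on_deauthorized(mac):
    active_sessions.discard(mac)   # frees the license again

on_authorized("00:11:22:33:44:55")
on_authorized("66:77:88:99:aa:bb")
on_deauthorized("00:11:22:33:44:55")
print(len(active_sessions))  # 1 license in use, even though 2 endpoints connected today
```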

SO! Hopefully this information was helpful. It’s a lot to take in, and there are many nooks and crannies to navigate in order to have a successful ISE deployment. Keep an eye out for Part 2: Wireless ISE Deployment, where I’ll get into the technical details of the deployment type most organizations decide to start with.

As always, leave a comment below or send me an email at garret@thenetworksurgeon.com with any questions!

Cisco Spine and Leaf Architecture Discussion – Nexus 5500 vs 6001

Spine and Leaf Basics

As virtualization, cloud computing, and distributed computing frameworks (Hadoop, for example) become more popular in the data center, a shift away from the traditional three-tier networking model is taking place as well.

The traditional core-aggregation-access model is efficient for traffic that travels “North-South”, meaning traffic that travels in and out of the data center. This kind of traffic is typically a web service of sorts (HTTP/S, Exchange, and SharePoint, for example) where there is a lot of remote client/server communication. This type of architecture is usually built for redundancy and resiliency against failure. However, 50% of the critical network links are typically blocked by the Spanning Tree Protocol (STP) in order to prevent network loops, sitting idle as backups, which means 50% of your maximum bandwidth is wasted (until something fails). Here is an example:

 

The traditional three-tier network design

This type of architecture is still very widely used for service-oriented types of traffic that travel North-South. However, the trends in traffic patterns are changing with the types of workloads that are common in today’s data centers: East-West traffic, or server-to-server traffic. Take a look at the diagram above. If a server connected to the left-most access switch needs to communicate with a server connected to the right-most access switch, what path does it need to take? It travels all the way to the core switch and back down again. That is not the most efficient path to take, and causes more latency while using more bandwidth. If a cluster of servers (this number can be in the hundreds, or even thousands) is performing a resource-intensive calculation in parallel, the last thing you want to introduce is unpredictable latency or a lack of bandwidth. You can have extremely powerful servers performing these calculations, but if the servers can’t talk to each other efficiently because of a bottleneck in your network architecture, that is wasted capital expenditure.

So how do you design for this shift from North-South to East-West traffic? One way is to create a Spine and Leaf architecture, also known as a Distributed Core. This architecture has two main components: spine switches and leaf switches. You can think of the spine switches as the core, but instead of one large, chassis-based switching platform, the spine is composed of many high-throughput Layer 3 switches with high port density. You can think of the leaf switches as your access layer; they provide network connection points for servers, as well as uplinks to the spine switches. Now, here is the important part of this architecture: every leaf switch connects to every spine switch in the fabric. That point is important because no matter which leaf switch a server is connected to, it always has to cross the same number of devices to get to another server (unless the other server is located on the same leaf). This keeps the latency down to a predictable level, because a payload only has to hop to a spine switch and another leaf switch to get to its destination.

A small-scale leaf and spine architecture

You would typically have many more spine and leaf switches in a deployment, but this small-scale diagram gets the fundamental design points across. The beautiful thing about this design is that instead of relying on one or two monster chassis-based switches at the core, the load is distributed across all spine switches, making each spine individually insignificant as you scale out.
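
Here is a tiny sketch of the “every leaf connects to every spine” rule and why it makes latency predictable: any two servers on different leaves always traverse the same number of switches (ingress leaf, one spine, egress leaf). The switch names are obviously made up:

```python
# Made-up switch names; the point is the wiring rule and the constant path length.

spines = [f"spine{i}" for i in range(1, 5)]   # 4 spine switches
leaves = [f"leaf{i}" for i in range(1, 9)]    # 8 leaf switches

# Build the fabric: a link from every leaf to every spine.
links = [(leaf, spine) for leaf in leaves for spine in spines]
print(len(links))  # 8 leaves x 4 spines = 32 fabric links

def switches_traversed(leaf_a, leaf_b):
    """Switches a packet crosses between servers on two leaves."""
    return 1 if leaf_a == leaf_b else 3   # same leaf, or leaf -> spine -> leaf

print(switches_traversed("leaf1", "leaf8"))  # always 3, no matter which two leaves
```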

Before you design an architecture like this, you will need to know what the current and future needs are. For example, if you have a server count of 100 today that will eventually scale up to 1,000 servers, you need to make sure your fabric can accommodate that growth. There are two variables that determine your maximum scalability: the number of uplinks on a leaf switch and the number of ports on your spine switches. The number of uplinks on a leaf switch determines how many spine switches you can have in your fabric (remember: every leaf switch has to connect to every spine switch in the fabric!). The number of ports on a spine switch determines how many leaf switches you can have; this is why spine switches need high port density. Let’s take the example of 100 servers today with a need to scale to 1,000 servers in the future. If we use a 24-port 10Gbps switch at the leaf layer, with 20 ports for servers and 4 ports for uplinks, we can have a total of 4 spine switches. If each spine switch has 64 10Gbps ports, we can scale out to a maximum of 64 leaf switches. 64 leaf switches x 20 servers per switch = 1,280 maximum servers in this fabric. Keep in mind this is a theoretical maximum, and you will need to account for connecting the fabric to the rest of the data center. Regardless, this design allows for seamless scalability without having to re-architect your fabric. You can start off with 5 leaf switches and 4 spine switches to meet your current need of 100 servers and add leaf switches as more servers are needed.
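
Here is the same sizing math as a quick calculation, using the example numbers above (a 24-port leaf with 4 uplinks and a 64-port spine):

```python
# Reproducing the sizing example: a 24-port leaf using 4 ports as uplinks,
# and a 64-port spine.

leaf_ports, leaf_uplinks = 24, 4
spine_ports = 64

server_ports_per_leaf = leaf_ports - leaf_uplinks   # 20 server-facing ports
max_spines = leaf_uplinks    # every leaf needs one uplink per spine
max_leaves = spine_ports     # every spine needs one port per leaf
max_servers = max_leaves * server_ports_per_leaf

print(max_spines, max_leaves, max_servers)  # 4 64 1280
```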

Another factor to keep in mind when designing your fabric is the oversubscription ratio. This ratio is calculated on the leaf switches, and it is defined as the max throughput of active southbound connections (down to servers) divided by the max throughput of active northbound connections (uplinks). If you have 20 servers each connected with 10Gbps links and 4 10Gbps uplinks to your spine switches, you have a 5:1 oversubscription ratio (200Gbps/40Gbps). It is not likely that all servers are going to be communicating at 100% throughput 100% of the time, so it is okay to be oversubscribed. Keeping that in mind, work with the server team to figure out what an acceptable ratio is for your purpose.
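
And the oversubscription math from the example above:

```python
# Oversubscription at the leaf: server-facing throughput / uplink throughput.

southbound_gbps = 20 * 10   # 20 servers at 10 Gbps each
northbound_gbps = 4 * 10    # 4 x 10 Gbps uplinks
print(f"{southbound_gbps / northbound_gbps:.0f}:1")  # 5:1
```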

Cisco Nexus 5500 vs 6001

A common spine and leaf deployment I have seen uses Cisco Nexus 5548 or 5596 switches with a Layer 3 daughter card as the spine switch. At first glance, this looks like a great switch at a low price point to use as a spine: 960Gbps of forwarding on the 5548 and 1920Gbps on the 5596, as well as 48 or 96 10Gbps ports, which is plenty of density for a small to mid-sized implementation (48 or 96 leaf switches). However, what is commonly missed in the specification sheet is that once you add the Layer 3 daughter card to one of these switches, Layer 3 forwarding drops down to 160Gbps, or 240 mpps (240 million packets per second). This is a huge performance hit and is definitely not sufficient for a spine switch at a large scale. Also of note, the MAC address table can handle 32,000 entries. The list price for a Cisco Nexus 5548 with the 16-port expansion module and Layer 3 daughter card comes out to $41,800 (without SMARTnet services attached).

Now let’s take a look at the Cisco Nexus 6001. The reason I’m comparing it to the Nexus 5548 for this application is that Cisco recently dropped the list price of the 6001 by 42%, which is a huge cut. If you buy through a reseller, the discounts will be even deeper. On to the specifications; take a look at the table below:

| Switch | Port Density | Forwarding Rate (Layer 2) | Forwarding Rate (Layer 3) | MAC Entries | List Price |
|---|---|---|---|---|---|
| Nexus 5548 | 48 SFP+ ports | 960 Gbps or 714.24 mpps | 160 Gbps or 240 mpps | 32,000 | $41,800 |
| Nexus 6001 | 48 SFP+ and 4 QSFP+ ports | 1.28 Tbps | 1.28 Tbps | 256,000 | $40,000 |

The Nexus 6001 beats the 5548 in every comparison above, even in port density, because you can use breakout cables to convert each QSFP+ interface into 4 SFP+ interfaces, adding a total of 16 more 10Gbps interfaces. Performance on the 6001 does not change between Layer 2 and Layer 3 forwarding. The MAC table is 8 times larger. Plus, by the time you add a 16-port expansion module (to bring the port count from 32 to 48) and a Layer 3 daughter card to the Nexus 5548, the 6001 list price (with the recent discount) actually comes in lower. If you already have a Cisco infrastructure and are looking to build a small distributed core architecture, the Nexus 6001 is a no-brainer as a spine switch. For a larger-scale architecture, the Nexus 6004 provides up to 384 10Gbps interfaces or 96 40Gbps interfaces, but at a much larger price point, of course.

This post was a ton of information, so if you have any questions or comments, I will be glad to answer them. Leave a comment or email me at garret@thenetworksurgeon.com. I look forward to hearing some discussion around this!

The Network Surgeon is Operating

Congratulations! You’ve stumbled upon my blog. Now what?

I want to begin this journey into the depths of the all-encompassing blogosphere by introducing who I am, what I do, and why I created this soon-to-be dump of information for your mind-hole.

Oh God, please, no! Not the Blogosphere!

Who Am I, What Do I Do, and Why Should You Care?

My name is Garret West, I’m a network engineer who has spent his career in the VAR space and is now headed to NVIDIA, and you don’t have to care because I’m writing this whether you like it or not! Ha!

In all seriousness, I am writing this blog as a dump of information as I work on various projects. It helps me digest information by writing it down and really explaining the details; if other people can benefit from it as well, then that’s even better! I hope at some point this blog can turn into a small community for sharing ideas, best practices, technology news, and the like; really, an open forum for discussion. One of the best ways to learn is by talking with your peers, so why not open this up to our virtual peers as well?

As I said, I’m a network engineer, and I have worked in the reseller space for about seven years. As of this writing, I have accepted a position at NVIDIA and will be transitioning to the enterprise space. That reseller background means I’ve gotten my hands on a lot of different areas of networking and virtualization, as well as a breadth of different vendors. This includes, but is definitely not limited to:

  • Branch, Campus, and Data Center Routing and Switching
  • Enterprise WLAN
  • Data Center Computing and Virtualization
  • Security
  • Storage

At the moment, I’m mainly focused on data center routing and switching, which for me is primarily Cisco NX-OS. However, my posts won’t be limited to that.

I’m also in the process of studying for Cisco’s Data Center Certification track. There isn’t much information out there about how to study for these exams (there isn’t even an official Cisco Press book for this track yet), so as I find good resources for study material I will definitely be sharing those.

As you read my posts, I urge you to comment and provide feedback! Is the information helpful? Is the information accurate? Do you have questions about something? Open up the discussion, and I’ll do everything I can to answer you.

I’m using the default WordPress theme right now, but I figure the content is more important than the layout. I will be customizing the theme in the future, but bear with me for the time-being!

Thanks for visiting my blog, and I look forward to the future!

 

Garret West
garret@thenetworksurgeon.com

Disclosure: I am an employee of NVIDIA, however, this is my personal blog and any posts reflect my own opinions, not necessarily those of NVIDIA.