In this post I want to share my recent experience with the installation of R75.40VS. Last year we noticed that our old Nokia IP390 appliances were seeing more and more CPU usage, and that was with firewalling only. We couldn't even turn on IPS because of the CPU load.
So we decided to consolidate two IP390 clusters into one Checkpoint 12.400 appliance cluster and run them as virtual clusters inside R75.40VS VSX, with IPS enabled and some room for further growth.
The fresh installation of the appliances was fully automated and thus quite easy. The real issues began during the configuration in SmartDashboard:
1. Problem: Object Creation – Clustering Method
Funnily enough, the first problem really was the simple procedure of creating the new VSX container object.
The first dialogue asks for a clustering method, and I was delighted to see that I could select IPSO VRRP, as we have a lot of experience with VRRP by now and had 6 Nokia clusters running VRRP quite stably.
However, when finishing the object creation wizard, SmartDashboard presented me with this nice and user-friendly error message:
'The value '' is not in the list of valid values' – WTF?
So I had my first encounter with Checkpoint Support on this cluster, which thankfully was resolved quite quickly: they told me that VRRP is not supported. Sure, this is even documented in the Checkpoint Administration Guide, but come on Checkpoint, why do you add a dropdown menu where you can choose the cluster method if there is only support for one cluster method, ClusterXL?!
So problem 1 Resolved!
2. Problem: Object Creation – Cluster Kernel Panic / SecureXL
So I switched to ClusterXL, only to find that I still could not create the object and instead got a whole new world of fun: a kernel panic on the cluster.
Together with Support we figured out that SecureXL seemed to be the underlying cause of the problem, and I was finally able to finish creating the cluster object by disabling SecureXL on both cluster members before creating the VSX object.
It did not make me happy that we had to disable SecureXL, because in our experience Checkpoint performance relies heavily on this feature.
However, after all these weeks time was running out and we were approaching our planned date to put this cluster into production, so I put this issue aside for now, just to get frustrated yet again:
3. Problem: Policy did not match its traffic / SecureXL still crashing the Cluster / Evil Drop Templates
So now, with the VSX cluster object (networking, interfaces, routing, etc.) finally configured in SmartDashboard, I noticed the next problem: the shiny new hit counters showed not a single match for any of our rulebase entries. Not even the final drop rule! SmartView Tracker and SmartLog also logged no traffic!
After a bit of troubleshooting I noticed that the traffic was being dropped silently (coincidentally, we had exactly the same issue with an IP290 cluster we used for testing GAIA):
[Expert@GW]# fw ctl zdebug + drop
;fw_log_drop: Packet proto=6 10.0.0.1:60366 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;
;fw_log_drop: Packet proto=6 10.0.0.1:60367 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;
;fw_log_drop: Packet proto=6 10.0.0.1:60368 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;
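When the debug output scrolls by faster than you can read it, it helps to count the drops per destination and reason. The following is only a rough sketch in Python, assuming the output was saved to a text file or pasted into a string and that every drop line looks like the ones above. The names parse_drop and summarize_drops are my own helpers for illustration, not part of any Checkpoint tool.

```python
import re
from collections import Counter

# Matches drop lines in the format shown above (assumed to be stable).
DROP_RE = re.compile(
    r"fw_log_drop: Packet proto=(?P<proto>\d+) "
    r"(?P<src>[\d.]+):(?P<sport>\d+) -> (?P<dst>[\d.]+):(?P<dport>\d+) "
    r"dropped by (?P<handler>\S+) Reason: (?P<reason>.*?);?\s*$"
)


def parse_drop(line):
    """Return a dict describing one drop line, or None if it does not match."""
    match = DROP_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("proto", "sport", "dport"):
        fields[key] = int(fields[key])
    return fields


def summarize_drops(lines):
    """Count drops per (destination IP, destination port, reason)."""
    counts = Counter()
    for line in lines:
        drop = parse_drop(line)
        if drop is not None:
            counts[(drop["dst"], drop["dport"], drop["reason"])] += 1
    return counts


# The three drop lines from the debug output above.
sample = """\
;fw_log_drop: Packet proto=6 10.0.0.1:60366 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;
;fw_log_drop: Packet proto=6 10.0.0.1:60367 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;
;fw_log_drop: Packet proto=6 10.0.0.1:60368 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;
"""

for (dst, dport, reason), count in summarize_drops(sample.splitlines()).items():
    print(f"{count} x {dst}:{dport} ({reason})")
```

A summary dominated by "Rulebase drop - NO match" for traffic that the policy clearly allows points at the gateway rather than the rulebase, which is what we were seeing here.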
Also enabling SecureXL would still result in the same Kernel Panic that appeared before on VSX Object creation with SecureXL activated.
This time I was in for a proper Support Case that would take me another 3-4 Weeks.
The root cause was the fact that Drop Templates are not supported by R75.40VS. During the support case Checkpoint Support asked me about this a couple of times; however, they always pointed me to Global Properties to check whether the checkbox for Drop Templates was disabled.
Funny thing is that Checkpoint seems to have removed this option from the GUI altogether but does NOT check whether the feature was enabled previously! So I waited a couple of weeks, during which the support technicians rotated and asked me all the same questions again and again, until finally someone came up with the idea to disable Drop Templates directly in the database via GUIDBedit.
Finally, after 5 minutes in GUIDBedit and a final policy push, traffic started to match the policy! A day later SK88040 appeared in the Support Center, and I'm pretty sure we were at least part of the reason it was created.
4. Taking the Cluster online, Ongoing Issues with SecureXL
After those initial problems were resolved we were able to configure and test everything, so we finally put the cluster into production.
So far it is working, and the VSX concept itself is quite nice; however, we still have ongoing software bugs/problems/(maybe configuration errors?):
- SecureXL is still not working correctly: after we disabled Drop Templates I can now turn it on without the cluster panicking; however, when it is turned on, ClusterXL reports some interfaces (even virtual sync interfaces from Virtual Systems) as down and takes the cluster member offline.
- Probably linked to SecureXL not operating: the CPU usage is just too high. On an average day the CPU stays around 60% for maybe 40,000 connections and around 120 Mbit of throughput with Firewall and IPS enabled.
- Strangely, disabling IPS does not reduce the CPU load at all. I suspect some kind of bug that keeps the IPS engine on even if it is disabled on every VS.
- We witnessed cluster failovers on policy push. SmartView Tracker reports interfaces marked as down on these occasions; however, we did not have any network issues, so I assume this is also linked to high CPU load on policy push. This happened a lot when we tested the Application Control and Anti-Bot blades. After disabling everything except FW and IPS, those cluster failovers were reduced to a minimum.
- Latency issues: pings constantly vary between 1 and 30 ms under normal operation (50-60% CPU). Our previous IP390 cluster had a constant 1-2 ms latency under normal operation.
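To put numbers on the latency issue for a support case, a small script can summarize the round-trip times from saved ping output. This is a minimal sketch in Python, assuming the usual Linux ping line format with "time=... ms". The function name latency_summary and the sample values are made up for illustration and are not measurements from our cluster.

```python
import re
import statistics

# Extracts the round-trip time from lines such as "... time=1.2 ms".
TIME_RE = re.compile(r"time[=<]([\d.]+) ?ms")


def latency_summary(ping_output):
    """Return min, max, mean and spread (max - min) of the RTTs in ms."""
    samples = [float(m.group(1)) for m in TIME_RE.finditer(ping_output)]
    if not samples:
        raise ValueError("no 'time=... ms' samples found")
    return {
        "count": len(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
        "mean_ms": statistics.mean(samples),
        "spread_ms": max(samples) - min(samples),
    }


# Illustrative input only: hypothetical values inside the 1-30 ms range.
example_output = """\
64 bytes from 10.66.1.2: icmp_seq=1 ttl=64 time=1.2 ms
64 bytes from 10.66.1.2: icmp_seq=2 ttl=64 time=28.5 ms
64 bytes from 10.66.1.2: icmp_seq=3 ttl=64 time=3.0 ms
64 bytes from 10.66.1.2: icmp_seq=4 ttl=64 time=14.3 ms
"""

summary = latency_summary(example_output)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Running the same summary against the old and the new cluster under comparable load gives a direct comparison of the spread instead of a gut feeling.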
So we are still not at the finish line. Checkpoint Support mentioned they would produce a patch that should resolve the SecureXL issue, and maybe this will resolve the rest of our problems as well. I remain curious and will post updates on this matter when things progress.
May this post help some poor soul out there ;)