R75.40VS – VSX installation Odyssey – My first SK

Hi,

In this post I want to share my recent experience with the installation of R75.40VS. Last year we noticed that our old Nokia IP390 appliances were seeing more and more CPU usage, and that was with firewalling only. We couldn't even turn on IPS because of the CPU load.

So we decided to consolidate two IP390 clusters into one Checkpoint 12400 appliance cluster and run them as virtual clusters inside R75.40VS VSX, with IPS and some room for further growth.

The fresh installation of the appliances was fully automated and thus quite easy. The real issues began during the configuration in SmartDashboard:

1. Problem: Object Creation – Clustering Method

Funnily enough, the first problem really was the simple procedure of creating the new VSX container object.
The first dialogue asks for a clustering method, and I was delighted to see that I could select IPSO VRRP, as we have a lot of experience with VRRP by now and had six Nokia clusters running VRRP quite stably:

[Screenshot: wizard step 2, clustering method selection]

However, when finishing the object creation wizard, SmartDashboard presented me with this nice and user-friendly error message:

[Screenshot: wizard step 13, error message]

‘The value '' is not in the list of valid values’ – WTF?

So I had my first encounter with Checkpoint Support on this cluster, which was thankfully resolved quite quickly: they told me that VRRP is not supported. Sure, this is even documented in the Checkpoint Administration Guide, but come on, Checkpoint: why add a dropdown menu where you can choose the cluster method if only one cluster method is supported, ClusterXL?!

So, problem 1 resolved!

2. Problem: Object Creation – Cluster Kernel Panic / SecureXL

So I switched to ClusterXL, only to find that I still could not create the object and instead got a whole new world of fun:

[Screenshot: wizard failure after switching to ClusterXL]

Together with Support we figured out that SecureXL seemed to be the underlying cause of the problem, and I was finally able to finish creating the cluster object by disabling SecureXL on both cluster members before creating the VSX object.

It did not make me happy that we had to disable SecureXL, because in our experience Checkpoint performance relies heavily on this feature.

However, after all these weeks time was running out and our planned go-live date for this cluster was approaching, so I put this issue aside for now, only to get frustrated yet again:

3. Problem: Policy did not match its traffic / SecureXL still crashing the Cluster / Evil Drop Templates

So now, with the VSX cluster object (networking, interfaces, routing, etc.) finally configured in SmartDashboard, I noticed the next problem: the shiny new hit counters showed not a single match for any of our rulebase entries. Not even the final drop rule! SmartView Tracker and SmartLog also logged no traffic!

After a bit of troubleshooting I noticed that the traffic was being dropped silently (coincidentally, we had exactly the same issue with an IP290 cluster we used for testing Gaia):

[Expert@GW]# fw ctl zdebug + drop
;fw_log_drop: Packet proto=6 10.0.0.1:60366 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;
;fw_log_drop: Packet proto=6 10.0.0.1:60367 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;
;fw_log_drop: Packet proto=6 10.0.0.1:60368 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;
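When a capture like this runs to thousands of lines, it helps to tally the drops by destination and reason. The following Python sketch does that. The regular expression is modelled only on the three sample lines shown here, so other releases may need an adjusted pattern, and `summarize_drops` is a hypothetical helper, not a Checkpoint tool. A pile of "Rulebase drop - NO match" entries despite an explicit final drop rule is exactly the symptom described in this section.

```python
import re
from collections import Counter

# Pattern derived from the sample fw ctl zdebug lines in this post (assumption:
# other releases may add prefixes; re.search ignores anything before "Packet").
DROP_RE = re.compile(
    r"Packet proto=(?P<proto>\d+) "
    r"(?P<src>[\d.]+):(?P<sport>\d+) -> (?P<dst>[\d.]+):(?P<dport>\d+) "
    r"dropped by (?P<handler>\S+) Reason: (?P<reason>[^;]+)"
)

SAMPLE = [
    ";fw_log_drop: Packet proto=6 10.0.0.1:60366 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;",
    ";fw_log_drop: Packet proto=6 10.0.0.1:60367 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;",
    ";fw_log_drop: Packet proto=6 10.0.0.1:60368 -> 10.66.1.2:22 dropped by fw_handler_first_packet Reason: Rulebase drop - NO match;",
]


def summarize_drops(lines):
    """Count drops per (destination IP, destination port, reason).

    Source ports are ignored so retries of the same connection collapse
    into one bucket; lines that do not match the pattern are skipped.
    """
    counts = Counter()
    for line in lines:
        m = DROP_RE.search(line)
        if m:
            counts[(m["dst"], int(m["dport"]), m["reason"].strip())] += 1
    return counts


if __name__ == "__main__":
    for (dst, port, reason), n in summarize_drops(SAMPLE).most_common():
        print(f"{dst}:{port}  {reason}  x{n}")
```

To use it on a real capture, save the zdebug output to a file and pass its lines instead of SAMPLE.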

Also, enabling SecureXL still resulted in the same kernel panic we had seen during VSX object creation with SecureXL activated.

This time I was in for a proper support case that would take me another 3-4 weeks.

The root cause was that drop templates are not supported by R75.40VS. During the support case Checkpoint Support asked me about this a couple of times, but they always pointed me to the global properties to check whether the drop templates checkbox was disabled.

The funny thing is that Checkpoint seems to have removed this option from the GUI altogether but does NOT check whether the feature was enabled previously! So I waited a couple of weeks, during which the support technicians rotated and asked me all the questions again and again, until finally someone came up with the idea to disable drop templates in the database via GuiDBedit.

Finally, after 5 minutes in GuiDBedit and a final policy push, traffic started to match the policy! A day later SK88040 appeared in the Support Center, and I'm pretty sure we were at least part of the reason it was created:

https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk88040&js_peid=P-114a7bc3b09-10006&partition=Advanced&product=Security

4. Taking the Cluster online, Ongoing Issues with SecureXL

After those initial problems were resolved we were able to configure and test everything, so we finally took the cluster into production.

So far it is working, and the VSX concept itself is quite nice; however, we still have ongoing software bugs/problems (or maybe configuration errors?):

  • SecureXL is still not working correctly: after we disabled drop templates I can now turn it on without the cluster panicking; however, when it is turned on, ClusterXL reports some interfaces (even virtual sync interfaces of virtual systems) as down and takes the cluster member offline.
  • Probably linked to SecureXL not operating: the CPU usage is just too high. On an average day the CPU stays around 60% for roughly 40,000 connections and about 120 Mbit/s of throughput with Firewall and IPS enabled.
  • Strangely, disabling IPS does not reduce the CPU load at all. I suspect some kind of bug that keeps the IPS engine running even when it is disabled on every VS.
  • We witnessed cluster failovers on policy push. SmartView Tracker reports interfaces marked as down on these occasions, but we did not have any network issues, so I assume this is also linked to high CPU load during policy push. This happened a lot when we tested the Application Control and Anti-Bot blades. After disabling everything except FW and IPS, those cluster failovers dropped to a minimum.
  • Latency issues: pings constantly vary between 1 and 30 ms under normal operation (50-60% CPU). Our previous IP390 cluster had a constant latency of 1-2 ms under normal operation.

So we are still not at the finish line. Checkpoint Support mentioned they would produce a patch that should resolve the SecureXL issue, and maybe it resolves the rest of our problems too. I remain curious and will post updates as things progress.

May this post help some poor soul out there ;)

Regards
Sebastian


About SebastianB

This entry was posted in Checkpoint.

14 Responses to R75.40VS – VSX installation Odyssey – My first SK

  1. Pingback: R75.40VS – The Saga continues – VSX Cluster Cisco ARP problem | IT-Unsecurity

  2. dreezman says:

    good stuff, love your posts. will link to you

  3. Sam says:

    great work mate! By the way how would one migrate the configuration of the IP390 appliances to be a VSX and then make it work with virtual cluster?
    We recently acquired 2 x 4600 and want to migrate 2 clusters of IP390s to run as virtual clusters, exactly what you have mentioned in your post.
    Any chance you can share the migration process?

    • SebastianB says:

      I guess you can summarize the main steps in the following way:

      – Set up Gaia / OS configuration / VSX activation
      – Create VSX Dashboard object / Basic configuration (stay away from VSwitch in front of the Mgmt Interface!)
      – Create Virtual Systems (Probably best to stay away from VRouter if possible, set interfaces to a temporary IP until you replace the old FW/Default GW of DMZs)
      – Configure Routes in Dashboard VSX Object
      – Prepare policies (probably best to clone existing policies and change cluster specific rules)

      Also, it might be best to go directly to R76 or wait for R76.10 or so, because Checkpoint says that they fixed a lot of bugs in the VSX code since R75.40VS.

      • Sam says:

        Hi Sebastian,

        First of all, thanks for replying so promptly with the information I really needed :-)

        Not long ago I downloaded the IPSO migration tools and successfully migrated two IP390s to two S4600s as replacements, proving the migration works!
        https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk92638&js_peid=P-114a7ba5fd7-10001&partition=Advanced&product=IP

        Our aim is to run two Nokia VRRP clusters with two members each as-is in VSX for now. That’s where I was not sure how to approach it.
        Because we used upgrade_import to migrate the management server from Windows to a Smart-1 appliance, I have all the config ready. So currently the management server and the two S4600s are running R75.40VS and are functional.

        I tried to convert the gateway object to a VSX gateway but encountered errors due to not having deployed the policy; going to try again today.

        As per your latest post, this is what I am going to do now:

        – Set up Gaia / OS configuration / VSX activation
        – Create VSX Dashboard object / Basic configuration (stay away from VSwitch in front of the Mgmt Interface!)
        – Create Virtual Systems (Probably best to stay away from VRouter if possible, set interfaces to a temporary IP until you replace the old FW/Default GW of DMZs)
        – import configuration from IP390 using IPSO-Migrator toolset

        For R76, I would love to wait; however, it all needs to be in production no later than 5 July, so I'm going to bite the bullet :-)

        Pardon my ignorance, I am new to VSX so it's all black art for now :-)

        I will let you know how it goes, thanks for taking time to respond.

        cheers
        sam

      • SebastianB says:

        Hey,

        no worries this was my first VSX experience as well. After you have implemented it successfully you will notice that it is not that complicated.

        I think I read one error in your thinking between the lines:

        You are not supposed to convert your old Nokia Dashboard objects to VSX.
        The purpose of the “convert to VSX” option is to convert a standalone firewall cluster to a VSX cluster, elevating the previously “physical cluster” to the first virtual firewall on the (now VSX) cluster. Then you can add a second virtual firewall beside it, and so on…

        So in your case this could mean that you do a 1:1 conversion of your first IP390 cluster to a standard cluster on S4600 hardware, convert it to VSX, and then manually add the configuration for the second virtual system to replace the second IP390 cluster.

        I hope this was understandable ;)?

        As mentioned earlier, we did not choose this approach:

        – We set up the VSX Cluster parallel to the two IP390 Cluster
        – First configure the basis (VS0)
        – Create two Virtual Systems which each will replace one of the IP390 Cluster
        – Already configure all network interfaces, but give the VS a temporary IP in the DMZ.
        For example, the Nokias have .2 and .3 as physical IPs and .1 as VIP; we gave the VS .4 until the day of the migration, and after the Nokia cluster was shut down we changed the IP to .1 and the VSX cluster got the traffic of the DMZ. Note that VSX virtual systems only have one IP per interface, no physical IPs! Only VS0 gets three IPs.
        – Prepare Routing (Done Inside of Smart Dashboard and not in GAIA!)
        – Prepare the policy so that it is compatible with the VSX cluster (install on, etc.)
        – Wait for the maintenance window, push the final policy to the VSX cluster, shut down Nokia cluster 1, change all DMZ interfaces to .1, save the VSX object (a basic config push will occur and change all IPs), and then the VS will take the traffic.
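        The temporary-address trick in these steps (Nokias on .2/.3 with VIP .1, virtual system staged on .4, then moved to .1 at cutover) can be sanity-checked before the maintenance window. A minimal Python sketch using the standard ipaddress module follows; plan_staging is a hypothetical helper and the 10.3.1.0/24 subnet is only an example. It cannot see other hosts in the DMZ, so confirm separately that the temporary address is really free.

```python
import ipaddress


def plan_staging(subnet, nokia_vip, nokia_physical, temp_ip):
    """Return (staging_ip, final_ip) for one interface of a virtual system.

    The VS runs on temp_ip next to the live Nokia cluster and takes over the
    old VRRP VIP only after the Nokia cluster has been shut down.
    """
    net = ipaddress.ip_network(subnet)
    vip = ipaddress.ip_address(nokia_vip)
    temp = ipaddress.ip_address(temp_ip)
    if vip not in net:
        raise ValueError(f"VIP {vip} is not inside {net}")
    if temp not in net or temp in (net.network_address, net.broadcast_address):
        raise ValueError(f"{temp} is not a usable host address in {net}")
    # Addresses the Nokia cluster still owns until cutover (VIP + member IPs).
    in_use = {vip, *(ipaddress.ip_address(ip) for ip in nokia_physical)}
    if temp in in_use:
        raise ValueError(f"{temp} collides with an address the Nokia cluster still uses")
    return temp, vip


# Example values from the migration steps: VIP .1, physical .2/.3, staging .4.
staging, final = plan_staging("10.3.1.0/24", "10.3.1.1", ["10.3.1.2", "10.3.1.3"], "10.3.1.4")
print(f"stage on {staging}, move to {final} after the Nokia cluster is shut down")
```

        Running it once per DMZ interface gives a simple checklist of which address each virtual system interface starts on and which one it switches to.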

        Even VPN Tunnels came back online right away. The Migration itself was a really smooth experience for us. It took us maybe 2 hours on a sunday to do everything.

        Sadly we encountered problems and stability issues so make sure that ClusterXL is working as expected, that SecureXL is activated etc…

        Also think about migrating the two clusters on separate dates if you have the luxury of multiple maintenance windows. This could give you more time to assess the functionality and stability of the VSX cluster, especially if you are new to it.

        Regards
        Sebastian

  4. SebastianB says:

    Hi,

    I am pleased to hear that this might help you.

    I know that there are tools to migrate the OS configuration from IPSO/clish to Gaia clish.
    You should find them in the Checkpoint support database.

    However, our Nokia configuration was quite simple, with only around 10 interfaces/VLANs, so I decided to configure everything by hand.
    I am not a fan of automated tools if the manual configuration is not that complex.

    The SmartDashboard configuration is quite easy: you create the virtual systems with the same interfaces your Nokias had and then just push the policy that you used to push to the Nokias to the new virtual firewalls (mind the Install On column; duplicate the policy first if you need to alter it but want to keep a fallback).

    Be careful that you don't duplicate the IP of your default gateway by bringing up a virtual firewall interface while the Nokia cluster is still in place. For the migration preparation you can give the virtual firewall a different IP in the same DMZ/subnet and change it to the default gateway IP of the subnet after shutting down the Nokias.
    This will cause a brief downtime (maybe 2-5 minutes), but a zero-downtime migration was not mandatory for us.

    If you have more/specific questions feel free to ask!

    Regards
    Sebastian

  5. Sam says:

    Hi Sebastian,

    Thanks again, I indulged in setting things up on the weekend over 3 times as per your last post and did the following on a separate segment
    – setup 2 x s4600 with Nokia IP390 settings from one cluster (physical IPs)
    – from SmartDashboard, created a new VSX cluster using Virtual IP of the Nokia cluster
    – selected Cluster XL VSLS
    – added those two s4600 as nodes with correct ip addresses for sync interface
    – while installing the policy at the end, it errored out with IPS error

    Installing default Policy – VS0_VSX on VS0…
    VS0_VSX:
    “/opt/CPsuite-R75.40VS/fw1/conf/updates.def”, line 98185: ERROR: cannot find anywhere
    Compilation failed.
    Operation ended with errors.
    Failed to install default policy VS0_VSX on VS0

    Installing VSX default policy operation has finished with errors.
    This could have happen due to time-out while installing security policy.
    Check the modules to see if security policy is installed. if so discard
    this error message.
    If policy is not installed make sure that the failed Virtual System/Router
    is accessible from the management server, and that you have a valid license.
    Try to install security policy manually from the SmartDashboard.
    If the problem persists contact Check Point Technical Support.

    Operation has failed.
    ——————————

    Of course googling didn't say anything :-)

    I think this is due to issues I had with IPS on my old management server
    As my previous setup was running R70.40 on a Windows management server, I did the following to get the old configs:
    – cloned and upgraded the management server to R70.50 -> R75.40 -> R75.40VS
    – upgrade_export database
    – upgrade_import on the management appliance
    – removed old ip390 objects, nokia clusters and all the policy packages
    – updated IPs
    – setup 2 x 4600 with the cluster IPs
    – while creating VSX cluster, it all fell apart

    Looks like I now only have two options
    1. Start fresh, upgrade to R76 and configure network objects and rulebase manually, OR
    2. Fix the IPS issue and proceed with the imported config

    If I understand correctly, you used the same management server to add the new setup, whereas I am using a new appliance, where I need to import the exported config.

    any ideas, suggestions would be helpful

    Thanks again, cheers, SAM
    PS: Someone said to me the other day that using IPS on VSX is a bit of a pain, not sure this is what they meant :-)

    • SebastianB says:

      Hi,

      I just did a quick search for updates.def and found this:
      http://www.cpshared.com/forums/showthread.php?t=812

      Could be that the IPS Pattern are buggy/broken.

      I would try to do an IPS definition update from within SmartDashboard.

      Also go to the Checkpoint Support Center and search for updates.def.
      It seems there are a couple of SKs regarding broken update definitions. As far as I can see, it is possible to reset the IPS definition database by deleting some files and triggering a new IPS pattern update.

      However, I would advise trying a simple pattern update first and then getting more detailed information on this from Checkpoint Support.

      Regards
      Sebastian

  6. Sam says:

    Hi Sebastian,

    Finally it all worked :-). Thanks for the IPS search idea; I tried it, but I was not really worried about starting all over again.

    I started from scratch with R76 and set up VS0 as VSLS; then I could create a test VS, but somehow had only one interface left as an option to add!! I used the third option to have a custom template applied.

    What's getting me now is the IP addressing: do I use the physical IPs from the gateways I had in the Nokia cluster and assign them to the S4600 appliances, and use virtual IPs for the VSs I create to replace the clusters?

    so if i have followed your guidelines correctly, it would look like following

    new clusternode1
    —————-
    Mgmt – 10.10.10.1

    INT eth1 – 10.1.1.1
    DMZ eth2 – 10.3.1.1
    ext eth3 – 10.5.1.1

    these come from the first node in nokia cluster-1

    INT eth4 – 10.11.1.1
    DMZ eth5 – 10.13.1.1
    ext eth6 – 10.15.1.1

    these come from the first node in nokia cluster-2

    sync eth7 – 192.168.1.1

    new clusternode2
    —————-
    Mgmt – 10.10.10.2

    INT eth1 – 10.1.1.2
    DMZ eth2 – 10.3.1.2
    ext eth3 – 10.5.1.2

    these come from the second node in nokia cluster-1

    INT eth4 – 10.11.1.2
    DMZ eth5 – 10.13.1.2
    ext eth6 – 10.15.1.2

    these come from the second node in nokia cluster-2

    sync eth7 – 192.168.1.2

    ——————-
    VSX Cluster
    ——————-

    Create VS0 with 10.10.10.10 ip address and add these two nodes. Select Eth7 as sync interface

    Create a VS1 to replace first NOKIA cluster and use following VIPs (currently in production)

    INT eth1 – 10.1.1.3
    DMZ eth2 – 10.3.1.3
    ext eth3 – 10.5.1.3

    Create a VS2 to replace second NOKIA cluster and use following VIPs (currently in production)

    INT eth1 – 10.11.1.3
    DMZ eth2 – 10.13.1.3
    ext eth3 – 10.15.1.3

    Thanks again mate, I appreciate your time and expertise in getting to understand this more

    cheers
    Sam

    • SebastianB says:

      Hi,

      We also used the custom template. I'm not sure anyone uses the predefined templates, because every network looks a bit different…

      Also we do not use VSLS as we wanted to keep the setup as simple as possible.

      Regarding the Interface settings I would explain it this way:

      The first thing you set up (even during the wizard) is VS0, the “main management virtual system”. It is not supposed to handle production traffic, only the management traffic for administration, the SmartCenter connection, etc…

      It will use ClusterXL and needs 3 IP addresses, like the Nokias did with VRRP: 2 physical IPs for the machines and one virtual cluster IP that gets assigned to the active member.

      This management interface for VS0 should be placed in a safe network, because it has administrative ports like SSH open (though you can create a specific policy to protect it).

      Now when you create a virtual system, it will only get 1 IP address per interface. The cluster will no longer use physical IP addresses; instead, only the cluster VIP will be published via ARP by the active cluster member.

      Your list of IPs looks like you have an error in your logic regarding VSX. Let me put your information in a slightly different form:

      1. You create the VSX Container Object in Smart Dashboard (VS0):

      Mgmt Interface Physical Cluster Node1: 10.10.10.2
      Mgmt Interface Physical Cluster Node2: 10.10.10.3
      Mgmt Interface Physical Cluster VIP: 10.10.10.1

      2. Now you add a virtual firewall system to this cluster and add various interfaces to it (let's assume only physical interfaces for now, no VSwitch or VRouter):

      VS1 (Firewall VS) will get the Following Interfaces:

      INT: eth1 10.1.1.1
      DMZ: eth2 10.3.1.1
      EXT: eth3 10.5.1.1

      That's it, you don't need to think about physical firewalls for your virtual firewalls anymore. This implies that you always need to configure the physical interfaces of the appliances in the same way (e.g. eth1 is INT on both machines, eth2 is DMZ, etc…). But you should do that in any case, even with a simple standalone cluster.

      The IPs should match the VRRP IPs your Nokia cluster has now, so that you don't need to touch any of your servers (to change the default gateway). For the preparation you can give it a different IP, e.g. 10.1.1.11 for INT, until you turn off the Nokia cluster and switch the IP to .1 (the default gateway of the subnet).

      3. Sync is handled a bit differently than on a normal cluster:
      During the wizard you will be asked which interface is the sync interface (e.g. eth7) and which IP addresses it should use (the default IPs should be fine).

      Once you have set up your physical sync interface in the VSX cluster container, you don't need to add sync interfaces to your virtual systems. VSX will take care of syncing the virtual systems.

      4. If you use Virtual Router or Virtual Switches you will see WARP Interfaces and Different Internal IPs between the Virtual Systems. This is normal.
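      The addressing rules in the points above can be written down as a small data model, which makes it easy to check a plan like the one in the previous comment before typing it into SmartDashboard. This is a rough Python sketch, not a Checkpoint API; the class and field names and the "Mgmt" interface name are hypothetical. It encodes three rules from this reply: VS0 needs one management IP per member plus a cluster VIP, a virtual system gets exactly one IP per interface, and both members must expose the same physical interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VirtualSystem:
    name: str
    # Physical interface -> the single address the active member publishes via ARP.
    interfaces: dict[str, str] = field(default_factory=dict)


@dataclass
class VsxCluster:
    mgmt_vip: str                      # VS0 cluster VIP
    member_mgmt_ips: dict[str, str]    # member -> physical management IP (VS0 only)
    member_nics: dict[str, set[str]]   # member -> physical interfaces on that box
    systems: list[VirtualSystem] = field(default_factory=list)

    def vs0_address_count(self) -> int:
        # VS0 is addressed like a classic ClusterXL cluster: one IP per member plus the VIP.
        return len(self.member_mgmt_ips) + 1

    def check(self) -> list[str]:
        """Return a list of human-readable problems; empty means the plan looks consistent."""
        problems = []
        nic_sets = list(self.member_nics.values())
        if nic_sets and any(s != nic_sets[0] for s in nic_sets[1:]):
            problems.append("members do not expose identical physical interfaces")
        seen = {self.mgmt_vip, *self.member_mgmt_ips.values()}
        for vs in self.systems:
            for nic, ip in vs.interfaces.items():
                if any(nic not in s for s in nic_sets):
                    problems.append(f"{vs.name}: {nic} is missing on a member")
                if ip in seen:
                    problems.append(f"{vs.name}: {ip} is already in use")
                seen.add(ip)
        return problems


# Values from the example in this reply; sync (eth7) lives only in the container.
nics = {"Mgmt", "eth1", "eth2", "eth3", "eth7"}
cluster = VsxCluster(
    mgmt_vip="10.10.10.1",
    member_mgmt_ips={"node1": "10.10.10.2", "node2": "10.10.10.3"},
    member_nics={"node1": set(nics), "node2": set(nics)},
    systems=[VirtualSystem("VS1", {"eth1": "10.1.1.1", "eth2": "10.3.1.1", "eth3": "10.5.1.1"})],
)
print(cluster.vs0_address_count(), cluster.check())  # 3 []
```

      The model deliberately has no field for per-member addresses on a virtual system, which mirrors the point that only VS0 uses physical IPs.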

      I hope this was understandable. I would suggest you read the VSX Admin Guide contained in the R75.40VS or R76 documentation package. It has some basic drawings and explains the basics (only superficially, though).

      Once you have wrapped your head around the way VSX is designed, it's actually pretty simple. The GUI/Dashboard side, anyway. Under the hood (console, troubleshooting, FW kernel, etc.) it is quite complex and can be a pain to troubleshoot.

      Regards
      Sebastian

  7. Sam says:

    Many Thanks Sebastian,
    You are correct, it's all about physical vs virtual confusion! I appreciate your feedback and clarification about IPs.
    So far we have setup
    – appliances with physical IPs from previous IP390s
    – VS0 with mgmt address only
    Will now have to setup the VS1 and VS2 to replace cluster1 and cluster2 with the virtual IPs only
    I will let you know how it goes.
    Cheers
    Sam

  8. Pingback: R75.40VS – Light at the end of the tunnel? | IT-Unsecurity
