Project Herbert Documentation

Everything you should know about Project Herbert, including deployment instructions.

About

Project Herbert is a traffic attribution and accounting tool. It is built on a set of open-source tools to give you simple visibility into how much data you're using and who is using it.

More information can be found on the landing About page.


Dependencies

Design

This project uses a base of open-source tools. Although there are a number of dependencies to set up, the design goal was to keep the tool modular, meaning you can add and swap components of the system as you see fit. The system was also designed to scale: you should have no problems deploying these components as clusters and scaling out where you need to.

At the heart of this project is a message queue. This is a powerful way to extend the project: by attaching to exchanges on events like user authentication, you can build your own add-ons. We have, for instance, built add-ons that block or throttle users where desired.

RabbitMQ

RabbitMQ is the message broker of choice. It allows applications to connect to each other and provides a common platform for exchanging messages. RabbitMQ is used in this project to:

  • Act as a buffer for 'unprocessed' netflows before we've had a chance to match them to a user.
  • Allow the solution to be pluggable. All user-auth messages are published to an exchange that you can attach queues to, allowing you to add features without changing any of the existing code or configuration (see the sketch below).
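
For example, a minimal add-on could attach its own queue to the user-auth exchange and react to login events. The sketch below is illustrative only: it assumes the pika client, plus the user_auth exchange name and the example host and credentials used in the configs later in this document; the queue name and the block/throttle logic are hypothetical placeholders.

import json
import pika

# Connect using the example RabbitMQ host and credentials from the
# configuration sections later in this document.
connection = pika.BlockingConnection(pika.ConnectionParameters(
    host='rabbitmq.somedomain.com',
    credentials=pika.PlainCredentials('herbert', 'somePassword')))
channel = connection.channel()

# Declare our own queue and bind it to the existing user_auth exchange so
# we receive a copy of every authentication message without touching any
# Herbert code. 'my_addon' is a hypothetical queue name.
channel.queue_declare(queue='my_addon', durable=True)
channel.queue_bind(exchange='user_auth', queue='my_addon')

def on_auth(ch, method, properties, body):
    event = json.loads(body)
    # Hypothetical hook: decide here whether to block or throttle the user.
    print('%s authenticated from %s' % (event['username'], event['ip_address']))
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='my_addon', on_message_callback=on_auth)
channel.start_consuming()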

MongoDB

MongoDB is used as our data store. It was chosen because its collections are simple to index and shard, so if the data store becomes the bottleneck you can scale out the cluster.

Pmacct

The pmacct project was chosen because of its ability to receive BGP feeds, which we use for traffic classification where desired. It also has the ability to push the netflow messages into RabbitMQ.

Netflow

Netflow is used to send flow records for the traffic observed, along with all the L3 headers. Whilst we use Netflow for this traffic attribution, it is worth noting that pmacct also supports operating in bridge mode or on a port span.


Configuring Dependencies

RabbitMQ

Nothing special needs to happen here. You can find the installation guide on the RabbitMQ website. After installing, you just have to add a user so that the application can access it.
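
As a quick sanity check that the new user can log in, something like the following should connect cleanly. This assumes the pika client and the example hostname and credentials used in the configuration files later in this document.

import pika

# Hostname and credentials here mirror the example values used in the
# configs below; replace them with your own.
params = pika.ConnectionParameters(
    host='rabbitmq.somedomain.com',
    credentials=pika.PlainCredentials('herbert', 'somePassword'))
connection = pika.BlockingConnection(params)
print('Connected OK')
connection.close()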

MongoDB

There are a few tips for the MongoDB install. Follow the appropriate MongoDB install guide to build yourself a cluster. It is recommended that you install at least MongoDB 2.6 for performance reasons.

A database needs to be created, and you should add a user so that your applications can access the counters (an example of reading them back follows the commands below). Once the application is running, there are some indexes you can create to keep it fast:

db.user_daily_totals.ensureIndex( { username: "hashed" } )
db.user_monthly_totals.ensureIndex( { username: "hashed" } )
db.user_weekly_totals.ensureIndex( { username: "hashed" } )
db.user_yearly_totals.ensureIndex( { username: "hashed" } )

db.user_daily_totals.ensureIndex( { username: 1, date: 1 } )
db.user_monthly_totals.ensureIndex( { username: 1, date: 1 } )
db.user_weekly_totals.ensureIndex( { username: 1, date: 1 } )
db.user_yearly_totals.ensureIndex( { username: 1, date: 1 } )

db.daily_totals.ensureIndex( { date: 1 } )
db.monthly_totals.ensureIndex( { date: 1 } )
db.weekly_totals.ensureIndex( { date: 1 } )
db.yearly_totals.ensureIndex( { date: 1 } )

sh.shardCollection("herbert.user_yearly_totals", { username: "hashed" } )
sh.shardCollection("herbert.user_weekly_totals", { username: "hashed" } )
sh.shardCollection("herbert.user_monthly_totals", { username: "hashed" } )
sh.shardCollection("herbert.user_daily_totals", { username: "hashed" } )
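
Once the counters are being populated, your applications can read them back with the same database user. Below is a minimal pymongo sketch; the username and date fields match the indexes above, but the remaining counter fields in each document depend on what the netflow-processor stores, so the loop simply prints whole documents.

from pymongo import MongoClient

# Connection details mirror the example values used elsewhere in this
# document; replace them with your own.
client = MongoClient('mongodb://herbert:somePassword@mongodb.somedomain.com/herbert')
db = client['herbert']

# 'abc123' is the example username used later in this document.
for doc in db.user_daily_totals.find({'username': 'abc123'}).sort('date', 1):
    print(doc)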

pmacct

Pmacct is used to collect the netflow, attach the required BGP attributes, and then send it off to a queue in RabbitMQ so we can later marry the flows up with a user. I am using the following configuration file:

!
! pmacctd configuration example
!
! Did you know CONFIG-KEYS contains the detailed list of all configuration keys
! supported by 'nfacctd' and 'pmacctd' ?
!
nfacctd_port: 2055
daemonize: false

aggregate: src_std_comm,src_host,dst_host,src_port,dst_port,timestamp_start,timestamp_end

plugins: amqp
amqp_exchange: raw_netflow
amqp_exchange_type: fanout
amqp_refresh_time: 2
amqp_host: rabbitmq.somedomain.com
amqp_user: herbert
amqp_passwd: somePassword
amqp_routing_key:

plugin_pipe_size: 10240000
plugin_buffer_size: 10240

! BGP Information
bgp_daemon: true
bgp_agent_map: /usr/local/pmacct/etc/agent_to_peer.map
bgp_daemon_pipe_size: 1310710
nfacctd_as_new: bgp
bgp_src_std_comm_type: bgp
bgp_src_as_path_type: bgp

The agent_to_peer.map file is needed because, in my setup, I'm exporting BGP data from a zebra/quagga instance, so the source of the BGP information is separate from the point I'm collecting Netflow from. Ultimately, I get BGP from our edge routers and collect netflow from a distribution point before NAT occurs. My agent_to_peer.map file looks like this, where 130.130.218.6 is the box sending BGP into pmacct and 10.25.0.1 is the ID of the router sending netflow:

id=130.130.218.6 ip=10.25.0.1

BGP and Netflow

To export BGP into pmacct, you need an iBGP session from your router or routing daemon. The configuration should look much the same from any IOS device (although you may wish to use route-maps to alter the BGP communities you send on). Pmacct will automatically use the same ASN as the device that connects to it and will receive iBGP only. My quagga config looks like this:

router bgp 64698
 bgp router-id 130.130.218.6
 neighbor 130.130.208.103 remote-as 64698
 neighbor 130.130.208.103 description Herbert Remote Collector
 neighbor 130.130.208.103 update-source 130.130.218.6
 neighbor 130.130.208.103 route-reflector-client
...
!

The netflow config I'm using is below. (Note: I pass full netflow information all the way down to be stored in my MongoDB cluster for accountability purposes; you can always lessen the load by sending only a subset of these fields.) 1.1.1.1 should be replaced with the address of the pmacct netflow collector, and the export port should match the nfacctd_port configured above:

ip flow-cache timeout active 1
mls netflow interface
mls flow ip interface-full
ip flow-export version 5
ip flow-export destination 1.1.1.1 2055
interface Te1/1.100
  ! I am the interface going out to the border NAT Device.
  ip flow ingress
!

Configuring Herbert

user-auth

The RabbitMQ user-auth exchange is used by the netflow-processor to keep a local cache of IP-to-user mappings. JSON messages should be pushed into it whenever a user login event occurs somewhere on your network. We have a script that pulls down RADIUS and DHCP logs and passes the resulting messages into this exchange. An example of the message to push into this exchange is:

{
    "ip_address": "10.64.72.10",
    "mac_address": "d0:22:be:33:44:44",
    "method": "acs",
    "timestamp": "2014-05-15T15:40:51",
    "username": "abc123"
}
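
For reference, publishing such an event yourself could look like the sketch below. It assumes the pika client and the user_auth exchange, host and credentials from the example configs in this document; how you harvest the login events from RADIUS/DHCP is up to you.

import json
import pika

# Connect with the example RabbitMQ host and credentials used in the
# configs below.
connection = pika.BlockingConnection(pika.ConnectionParameters(
    host='rabbitmq.somedomain.com',
    credentials=pika.PlainCredentials('herbert', 'somePassword')))
channel = connection.channel()

event = {
    'ip_address': '10.64.72.10',
    'mac_address': 'd0:22:be:33:44:44',
    'method': 'acs',
    'timestamp': '2014-05-15T15:40:51',
    'username': 'abc123',
}

# Publish into the user_auth exchange. The exchange type isn't specified
# in this document; an empty routing key is fine for a fanout exchange.
channel.basic_publish(exchange='user_auth', routing_key='',
                      body=json.dumps(event))
connection.close()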

herbert-user-auth-writer

This little utility has one job: its purpose is to take the user registrations from a queue attached to the user-auth exchange and write them to the Mongo datastore. We do this so instances of the netflow-processor can look up who a flow belonged to, and so we have some history of it. The configuration file looks like this:

amqp_server: rabbitmq.somedomain.com
amqp_exchange: user_auth
amqp_queue: herbert_auth_log
amqp_username: herbert
amqp_password: somePassword

mongodb_server: mongodb.somedomain.com
mongodb_database: herbert
mongodb_auth_log_collection: auth_log
mongodb_username: herbert
mongodb_password: somePassword

To deploy with Python 2.7, just install pip and run 'pip install -r requirements.txt'.

herbert-netflow-processor

The netflow processor is responsible for taking the output of pmacct (that is, Netflow with BGP community information attached), matching each flow with a user, and updating the counters on the appropriate documents in MongoDB. The updates are cached in memory for a time and performed as bulk operations, so it's recommended that you run MongoDB 2.6 or greater to take advantage of the performance gains in bulk updates (a sketch of this pattern is shown below). The settings file looks something like this:

amqp_server: rabbitmq.somedomain.com
amqp_user_auth_exchange: user_auth
amqp_raw_netflow_exchange: raw_netflow
amqp_raw_netflow_queue: raw_netflow
amqp_username: herbert
amqp_password: somePassword

mongodb_server: mongo.somedomain.com
mongodb_database: herbert
mongodb_username: herbert
mongodb_password: somePassword

forks: 10

You should be aware that this process is capable of forking itself to process more flows concurrently. You can always run more than one instance across as many machines as you want for performance and redundancy.
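
To illustrate the bulk-update pattern mentioned above, the general shape of flushing a batch of cached counters with pymongo looks something like the sketch below. Only the username and date fields come from the index commands earlier; the 'bytes' counter field and the cache contents are assumptions for illustration, not the processor's actual schema.

from pymongo import MongoClient, UpdateOne

# Connection details mirror the example settings file above.
client = MongoClient('mongodb://herbert:somePassword@mongo.somedomain.com/herbert')
totals = client['herbert']['user_daily_totals']

# In the real processor this cache is built up in memory from many flows
# before being flushed; the contents here are made up.
cached_counters = {('abc123', '2014-05-15'): 123456}

# One upsert per (user, day), all sent to MongoDB in a single round trip.
ops = [UpdateOne({'username': username, 'date': date},
                 {'$inc': {'bytes': count}},
                 upsert=True)
       for (username, date), count in cached_counters.items()]
if ops:
    totals.bulk_write(ops, ordered=False)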

To deploy with Python 2.7, just install pip and run 'pip install -r requirements.txt'.

Herbert Frontend

The frontend is built on Meteor, an open-source platform for building web apps. This does mean you'll need to spin up a Node server. Once you have done that and installed npm, you can install Meteor and build the bundle:

curl https://install.meteor.com | /bin/sh
npm install -g meteorite
cd /the/directory/you/have/the/frontend
mrt bundle bundle.tgz
tar -zxvf bundle.tgz

To run the server, do the following. You may like to put this in an upstart script or whatever init system you prefer for your distro:

export METEOR_SETTINGS="$(cat /location/to/herbert/src/settings.json)"
export ROOT_URL=http://herbertgui.somedomain.com
export MONGO_URL=mongodb://herbert:somePassword@mongo.somedomain.com/herbert
export PORT=80
/usr/bin/node /location/to/gui/dist/main.js