pmacct (Promiscuous mode IP Accounting package)
pmacct is Copyright (C) 2003-2016 by Paolo Lucente

Q1: What is pmacct project homepage ?
A: pmacct homepage is http://www.pmacct.net/


Q2: 'pmacct', 'pmacctd', 'nfacctd', 'sfacctd', 'uacctd', 'pmtelemetryd',
    'pmbgpd' and 'pmbmpd' -- but what do they mean ?
A: 'pmacct' is intended to be the name of the project; 'pmacctd' is the name of the
   libpcap-based IPv4/IPv6 accounting daemon; 'nfacctd' is the name of the NetFlow
   (versions supported NetFlow v1 to v9) and IPFIX accounting daemon; 'sfacctd' is
   the name of the sFlow v2/v4/v5 accounting daemon; 'uacctd' is the name of the
   Linux Netlink NFLOG-based accounting daemon (historically, it was using ULOG,
   hence its name); 'pmtelemetryd' is the name of the streaming network telemetry
   collector daemon, where quoting Cisco IOS-XR Telemetry Configuration Guide at the
   time of this writing "Streaming telemetry [ .. ] data can be used for analysis
   and troubleshooting purposes to maintain the health of the network. This is
   achieved by leveraging the capabilities of machine-to-machine communication.
   [ .. ]"; 'pmbgpd' is the name of the pmacct BGP collector daemon; 'pmbmpd' is
   the name of the pmacct BMP collector daemon. 


Q3: Does pmacct stand for Promiscuous mode IP Accounting package ?
A: That is not entirely correct today, it was originally though. pmacct born as a
   libpcap-based project only. Over the time it evolved to include NetFlow first,
   sFlow shortly afterwards and NFLOG more recently - this is striving to maintain
   a consistent implementation over the set, unless technical considerations
   prevent that to happen for specific cases.  


Q4: What are pmacct main features?
A: pmacct can collect, replicate and export network data. Collect in memory tables,
   store persistently to RDBMS (MySQL, PostgreSQL, SQLite 3.x), noSQL databases
   (key-value: BerkeleyDB 5.x via SQLite API or document-oriented: MongoDB) and
   flat-files (csv, formatted, JSON, Apache Avro output), publish to AMQP and
   Kafka brokers (ie. to insert in ElasticSearch, Cassandra or CouchDB). Export
   speaking sFlow v5, NetFlow v1/v5/v9 and IPFIX. pmacct is able to perform data
   aggregation, offering a rich set of primitives to choose from; it can also
   filter, sample, renormalize, tag and classify at L7. pmacct integrates a BGP
   daemon join routing visibility and network traffic information.


Q5: Does any of the pmacct daemons logs to flat files?
A: Yes, but in a specific way. In other tools flat-files are typically used to log every
   micro-flow (or whatever aggregation the NetFlow agents have been configured to export
   with) and work in a two-stages fashion: a) write down to persistent storage then b)
   consolidate, on either or both spatial and temporal axes, to build the desired view.
   By inception, pmacct always aimed to a single-stage approach instead, ie. offer data
   reduction tecniques and correlation tools to process network traffic data on the fly,
   so to immediately offer the desired view(s) of the traffic. pmacct writes to files in
   text-format (json, csv or formatted via its 'print' plugin, see QUICKSTART doc for
   further information) so to maximize potential integration with 3rd party applications
   while keeping low the effort of customization.


Q6: Is it feasible for pmacct to scale by making use of either memory tables or RDBMS
    as backend for logging network traffic?
A: pmacct was not originally meant to log network traffic at packet/micro-flow level: it
   allows to get an aggregated view of the traffic -- both in space and in time. On top
   of that, there are layers of filtering, sampling and tagging. These are the keys to
   scale. As these features are fully configurable, data granularity and resolution can
   be traded off in favour of increased scalability or less resources consumption. More
   recently, logging has been introduced, by means of two new primitives timestamp_start
   and timestamp_end, fostered by the development of NetFlow/IPFIX as generic transport
   protocol, ie. as a replacement of syslog; it was then intuitive to generalize the 
   logging support to the more traditional traffic accounting part.


Q7: I see my daemon taking much CPU cycles; is there a way to reduce the load?
A: CPU cycles are proportional to the amount of traffic (packets, flows, samples) that
   the daemon receives; in case of pmacctd it's possible to reduce the CPU share by
   avoiding unnecessary copies of data, also optimizing and buffering the necessary
   ones. Kernel-to-userspace copies are critical and hence the first to be optimized;
   for this purpose you may look at the following solutions: 

   Linux kernel has support for mmap() since 2.4. The kernel needs to
   be 2.6.34+ or compiled with option CONFIG_PACKET_MMAP. You need at
   least a 2.6.27 to get compatibility with 64bit. Starting from 3.10,
   you get 20% increase of performance and packet capture rate. You
   also need a matching libpcap library. mmap() support has been added
   in 1.0.0. To take advantage of the performance boost from Linux
   3.10, you need at least libpcap 1.5.0.
   
   PF_RING, http://www.ntop.org/PF_RING.html : it's a new type of network socket that
   improves the packet capture speed; it's available for Linux kernels 2.[46].x; it's
   kernel based; has libpcap support for seamless integration with existing applications.

   Device polling: it's available since FreeBSD 4.5REL kernel and needs just kernel
   recompilation (with "options DEVICE_POLLING"), and a polling-aware NIC. Linux kernel
   2.6.x also supports device polling. 

   Internal buffering can also help and, contrary to the previous tecniques, applies
   to all daemons. Buffering is enabled with the plugin_buffer_size directive; buffers
   can then be queued and distributed with a choice of an home-grown circolar queue
   implementation (plugin_pipe_size) or a RabbitMQ broker (plugin_pipe_amqp). Check
   CONFIG-KEYS and QUICKSTART for more information. 


Q8: I want to to account both inbound and outbound traffic of my network, with an host
    breakdown; how to do that in a savy fashion ? Do i need to run two daemon instances 
    one per traffic direction ?
A: No, you will be able to leverage the pluggable architecture of the daemons: you will
   run a single daemon with two plugins attached to it; each of these will get part of
   the traffic (aggregate_filter), either outbound or inbound. A sample config snippet
   follows:

   ...
   aggregate[inbound]: dst_host
   aggregate[outbound]: src_host
   aggregate_filter[inbound]: dst net 192.168.0.0/16
   aggregate_filter[outbound]: src net 192.168.0.0/16
   plugins: mysql[inbound], mysql[outbound]
   sql_table[inbound]: acct_in 
   sql_table[outbound]: acct_out 
   ... 

   It will account all traffic directed to your network into the 'acct_in' table and
   all traffic it generates into 'acct_out' table. Furthermore, if you actually need
   totals (inbound plus outbound traffic), you will just need to play around with
   basic SQL queries.

   If you are only interested into having totals instead, you may alternatively use
   the following piece of configuration:

   ...
   aggregate: sum_host
   plugins: mysql
   networks_file: /usr/local/pmacct/etc/networks.lst
   ...

   Where 'networks.lst' is a file where to define local network prefixes.


Q9: I'm intimately fashioned by the idea of storing every single flow flying through my
    network, before making up my mind what to do with such data: i basically would like
    to aggregate my traffic as 'src_host, dst_host, src_port, dst_port, proto'. Is this
    feasible without any filtering ?
A: This is not adviceable. A simple reason being this would result in a huge matrix of
   data, whose behaviour and size would be totally un-predictable over time (ie. impact
   of port scans, DDoS, etc.). Nevertless, it remains a valid configuration.


Q10: I use pmacctd. What portion of the packets is included into the bytes counter ?
A: The portion of the packet accounted starts from the IPv4/IPv6 header (inclusive) and
   ends with the last bit of the packet payload. This means that are excluded from the
   accounting: packet preamble (if any), link layer headers (e.g. ethernet, llc, etc.),
   MPLS stack length, VLAN tags size and trailing FCS (if any). This is the main reason
   of skews reported while comparing pmacct counters to SNMP ones. However, by having
   available a counter of packets, accounting for the missing portion is, in most cases,
   a simple math exercise which depends on the underlying network architecture.
   Example: Ethernet header = 14 bytes, Preamble+SFD (Start Frame Delimiter) = 8 bytes,
   FCS (Framke Check Sequence) = 4 bytes. It results in an addition of a maximum of 26
   bytes (14+8+4) for each packet. The use of VLANs will result in adding 4 more bytes
   to the forementioned 26. 
   If using an SQL plugin, starting from release 0.9.2, such adjustment can be achieved
   directly within pmacct via the 'adjb' action (sql_preprocess).


Q11: How to get the historical accounting enabled ? SQL table have a 'stamp_inserted'
   and 'stamp_updated' fields but they remain empty.
A: Historical accounting is easily enabled by adding to the SQL plugin configuration a
   'sql_history' directive. Associate to it a 'sql_history_roundoff'. For examples and
   syntax, refer to CONFIG-KEYS and QUICKSTART documents. 


Q12: CLI is not enough to me. I would like to graph traffic data: how to do that?
A: RRDtool, MRTG and GNUplot are just some tools which could be easily integrated with
   pmacct operations. 'Memory plugin' is suitable as temporary storage and allows to
   easily retrieve counters:
 
   shell> ./pmacctd -D -c src_host -P memory -i eth0 
   shell> ./pmacct -c src_host -N 192.168.4.133 -r
   2339
   shell>

   Et voila'! This is the bytes counter. Because of the '-r', counters will get reset
   or translating into the RRDTool jargon, each time you will get an 'ABSOLUTE' value.
   Let's now encapsulate our query into, say, RRDtool commandline:

   shell> rrdtool update 192_168_4_133.rrd N:`./pmacct -c src_host -N 192.168.4.133 -r`

   Starting from 0.7.6, you will also be able to spawn as much as 4096 requests into a
   single query; you may write your requests commandline (';' separated) but also read
   them from a file (one per line):

   shell> ./pmacct -c src_host,dst_host -N 192.168.4.133,192.168.0.101;192.168.4.5,192.168.4.1;... -r 
   50905
   1152
   ...

   OR 

   shell> ./pmacct -c src_host,dst_host -N "file:queries.list" -r
   ...

   shell> cat queries.list
   192.168.4.133,192.168.0.101
   192.168.4.5,192.168.4.1
   ...

   Furthermore, SNMP is a widespreaded protocol used (and widely supported) in the IP
   accounting field to gather IP traffic information by network devices. 'pmacct' may
   also be easily connected to Net-SNMP extensible MIB. What follows is an example for
   your 'snmpd.conf':

   exec .1.3.6.1.4.1.2021.50 Description /usr/local/bin/pmacct -c src_host -N 192.168.4.133 -r 

   Then, an 'snmpwalk' does the rest of the work:
   shell> snmpwalk -v 1 localhost -c public .1.3.6.1.4.1.2021.50 
   .1.3.6.1.4.1.2021.50.1.1 = 1
   .1.3.6.1.4.1.2021.50.2.1 = "Description"
   .1.3.6.1.4.1.2021.50.3.1 = "/usr/local/bin/pmacct -c src_host -N 192.168.4.133 -r"
   .1.3.6.1.4.1.2021.50.100.1 = 0 
   .1.3.6.1.4.1.2021.50.101.1 = "92984384"
   .1.3.6.1.4.1.2021.50.102.1 = 0


Q13: The network equipment i'm using supports sFlow but i don't know how to enable it.
     I'm  unable to find any sflow-related command. What to do ?
A: If you are unable to enable sFlow commandline, you have to resort to the SNMP way.
   The sFlow MIB is documented into the RFC 3176; all you will need is to enable a SNMP
   community with both read and write access. Then, continue using the sflowenable tool
   available at the following URL: http://www.inmon.com/technology/sflowenable 


Q14: I've configured the pmacct package in order to support IPv6 via the '--enable-ipv6'
     switch. Now, when i launch either nfacctd or sfacctd i receive the following error
     message: ERROR ( default/core ): socket() failed. What to do ? 
A: When IPv6 code is enabled, sfacctd and nfacctd will try to fire up an IPv6 socket.
   The error message is very likely to be caused by the proper kernel module not being
   loaded. So, try either to load it or specify an IPv4 address to bind to. If using a
   configuration file, add a line like 'nfacctd_ip: 192.168.0.14'; otherwise if going
   commandline, use the following: 'nfacctd [ ... options ... ] -L 192.168.0.14'.  


Q15: SQL table versions, what they are -- why and when do i need them ? Also, can i
     customize SQL tables ?
A: pmacct tarball gets with so called 'default' tables (IP and BGP); they are built
   by SQL scripts stored in the 'sql/' section of the tarball. Default tables enable
   to start quickly with pmacct out-of-the-box; this doesn't imply they are suitable
   as-is to larger installations. SQL table versioning is used to introduce features
   over the time without breaking backward compatibility when upgrading pmacct. The
   most updated guide on which version to use given a required feature-set can be,
   once again, found in the 'sql/' section of the tarball.
   SQL tables *can* be fully customized so that primitives of interest can be freely
   mixed and matched - hence making a SQL table to perfectly adhere to the required
   feature-set. This is achieved by setting the 'sql_optimize_clauses' configuration
   key. You will then be responsible for building the custom schema and indexes.


Q16: What is the best way to kill a running instance of pmacct avoiding data loss ?
A: Two ways. a) Simply kill a specific plugin that you don't need anymore: you will
   have to identify it and use the 'kill -INT <process number> command; b) kill the
   whole pmacct instance: you can either use the 'killall -INT <daemon name>' command
   or identify the Core Process and use the 'kill -INT <process number> command. All
   of these, will do the job for you: will stop receiving new data from the network,
   clear the memory buffers, notify the running plugins to take th exit lane (which
   in turn will clear cached data as required).
   To identify the Core Process you can either take a look to the process list (on
   the Operating Systems where the setproctitle() call is supported by pmacct) or
   use the 'pidfile' (-F) directive. Note also that shutting down nicely the daemon
   improves restart turn-around times: the existing daemon will, first thing, close
   its listening socket while the newly launched one will mostly take advantage of
   the SO_REUSEADDR socket option. 


Q17: I find interesting store network data in a SQL database. But i'm actually hitting
     poor performances. Do you have any tips to improve/optimize things ?
A: Few hints are summed below in order to improve SQL database performances. They are
   not really tailored to a specific SQL engine but rather of general applicability.
   Many thanks to Wim Kerkhoff for the many suggestions he contributed on this topic
   over the time: 

   * Keep the SQL schema lean: include only required fields, strip off all the others.
     Set the 'sql_optimize_clauses' configuration key in order to flag pmacct you are
     going to use a custom-built table.
   * Avoid SQL UPDATEs as much as possible and use only INSERTs. This can be achieved
     by setting the 'sql_dont_try_update' configuration key. A pre-condition is to let
     sql_history == sql_refresh_time. UPDATEs are demanding in terms of resources and
     are, for simplicity, enabled by default.
   * If the previous point holds, then look for and enable database-specific directives
     aimed to optimize performances ie. sql_multi_values for MySQL and sql_use_copy for
     PostgreSQL.
   * Don't rely automagically on standard indexes but enable optimal indexes based on
     clauses you (by means of reports, 3rd party tools, scripts, etc.) and pmacct use
     the most to SELECT data. Then remove every unused index.
   * See if the dynamic table strategy offered by pmacct fits the bill: helps keeping
     SQL tables and indexes of a manageable size by rotating SQL tables (ie. per hour,
     day, week, etc.). See the 'sql_table_schema' configuration directive. 
   * Run all SELECT and UPDATE queries under the "EXPLAIN ANALYZE ..." method to see
     if they are actually hitting the indexes. If not, you need to build indexes that
     better fit the actual scenario.
   * Sometimes setting "SET enable_seqscan=no;" before a SELECT query can make a big
     difference. Also don't underestimate the importance of daily VACUUM queries: 3-5
     VACUUMs + 1 VACUUM FULL is generally a good idea. These tips hold for PostgreSQL.
   * MyISAM is a lean SQL engine; if there is no concurrence, it might be preferred to
     InnoDB. Lack of transactions can reveal painful in case of unsecured shutdowns,
     requiring data recovery. This applies to MySQL only.
   * Disabling fsync() does improve performance. This might have painful consequences
     in case of unsecured shutdowns (remember power failure is a variable ...).


Q18: I've configured the server hosting pmacct with my local timezone - which includes
     DST (Daylight Saving Time). Is this allright?
A: In general, it's good rule to run the backend part of any accounting system as UTC;
   pmacct uses the underlying system clock, expecially in the SQL plugins to calculate
   time-bins and scanner deadlines among the others. The use of timezones is supported
   but not recommended. 


Q19: I'm using the 'tee' plugin with transparent mode set to true and keep receiving
     "Can't bridge Address Families when in transparent mode. Exiting ..." messages,
     why?

A: It means you can't receive packets on an IPv4 address and transparently replicate
   to an IPv6 collector or vice-versa. Less obvious scenarios are: a) some operating
   systems where loopback (127.0.0.1) is considered a different address family hence
   it's not possible to replicate to a 127.0.0.1 address; it's possible though to use
   any locally configured IPv4 address bound to a (sub-)interface in 'up' state; b)
   an  IPv4-mapped IPv6 address is still technically an IPv6 address hence on servers
   running IPv4 and IPv6 it is good practice to explicitely define also the receiving
   IP address (nfacctd_ip), if IPv4 is used.


Q20: I've enabled IPv6 support in pmacct with --enable-ipv6. Even though the daemon
     binds to the "::" address, i don't receive NetFlow/IPFIX/sFlow/BGP data sent via
     IPv4, why?

A: Binding to a "::" address (ie. no [sn]facctd_ip specified when pmacct is compiled
   with --enable-ipv6) should allow to receive both IPv4 and IPv6 senders. IPv4 ones
   should be reportd in 'netstat' as IPv4-mapped IPv6 addresses. Linux has a kernel
   switch to enable/disable the functionality and its status can be checked via the
   /proc/sys/net/ipv6/bindv6only . Historically the default has been '0'. It appears
   over time some distributions have changed the default to be '1'. If you experience
   this issue on Linux, please check your kernel setting.
   

/* EOF */
