Discussion:
[SNMP4J] max-bindings with big tables
Steffen Brüntjen
2018-07-05 15:04:58 UTC
Hi Frank

I believe I found an issue in the TableUtils class. In certain scenarios, the returned List<TableEvent> from getTable(Target target, OID[] columnOIDs, OID lowerBoundIndex, OID upperBoundIndex) will contain incomplete and duplicate rows.


Here's an extract of an exemplary List<TableEvent> for a "good" result:

[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, [...], 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, [...], 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, [...], 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, [...], 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]


But in some specific circumstances, I get results like these:

[ ... 75 normal rows ... ]
[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.283 = 2, 1.3.6.1.2.1.31.1.1.1.15.283 = 0, 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.373 = 2, 1.3.6.1.2.1.31.1.1.1.15.373 = 0, 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.774 = 2, 1.3.6.1.2.1.31.1.1.1.15.774 = 0, 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.783 = 2, 1.3.6.1.2.1.31.1.1.1.15.783 = 0, 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, 1.3.6.1.2.1.31.1.1.1.17.283 = 2, 1.3.6.1.2.1.31.1.1.1.6.283 = 0, 1.3.6.1.2.1.31.1.1.1.10.283 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, 1.3.6.1.2.1.31.1.1.1.17.373 = 2, 1.3.6.1.2.1.31.1.1.1.6.373 = 0, 1.3.6.1.2.1.31.1.1.1.10.373 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, 1.3.6.1.2.1.31.1.1.1.17.774 = 2, 1.3.6.1.2.1.31.1.1.1.6.774 = 0, 1.3.6.1.2.1.31.1.1.1.10.774 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, 1.3.6.1.2.1.31.1.1.1.17.783 = 2, 1.3.6.1.2.1.31.1.1.1.6.783 = 0, 1.3.6.1.2.1.31.1.1.1.10.783 = 0, null, null, null]
[ ... everything normal ... ]


Here we find some rows split into two: One block with the first 4 columns set null, and another block with the last 3 columns set null.


Here's the setting which produces the second result:

- max-bindings is set to 4 - TableUtils.setMaxNumColumnsPerPDU(int)
- max-repetitions is set to 30 - TableUtils.setMaxNumRowsPerPDU(int)
- the device returns many rows (like 120)
- the table request contains more columns than max-bindings
- the number of requested columns is not a multiple of max-bindings
- the problem also depends on the MTU size, but that's not important here


This is what happens:

1. TableUtils requests the first 4 columns
2. The device returns 60 variable bindings, i.e. 15 cells per column
3. TableUtils requests the remaining 3 columns
4. The device returns 60 variable bindings, i.e. 20 cells per column
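
The arithmetic behind steps 2 and 4 can be sketched roughly as follows. This is a simplified model and not SNMP4J code: it assumes the agent always fills a response with a fixed 60 variable bindings (the number from the example), and the class and method names are made up for illustration.

```java
// Simplified model (not SNMP4J code): how many rows each column chunk
// advances per GETBULK response when the agent caps a response at a fixed
// number of variable bindings.
class ChunkProgress {

    /** Rows covered by one response: the bindings are spread evenly over the chunk's columns. */
    static int rowsPerResponse(int bindingsPerResponse, int columnsInChunk) {
        return bindingsPerResponse / columnsInChunk;
    }

    /** Rows a chunk has covered after the given number of request/response rounds. */
    static int rowsAfterRounds(int bindingsPerResponse, int columnsInChunk, int rounds) {
        return rounds * rowsPerResponse(bindingsPerResponse, columnsInChunk);
    }

    public static void main(String[] args) {
        int bindings = 60; // assumed response size, taken from the example above
        for (int round = 1; round <= 4; round++) {
            System.out.printf("round %d: 4-column chunk at row %d, 3-column chunk at row %d%n",
                    round,
                    rowsAfterRounds(bindings, 4, round),
                    rowsAfterRounds(bindings, 3, round));
        }
    }
}
```

Under these assumptions the 3-column chunk gains 5 rows on the 4-column chunk in every round, so the gap between the two keeps growing as the table gets longer.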

This repeats until all bindings are retrieved. So far, so good. The problem is that every second request (step 3) receives more rows per response, so these requests reach index 283 (as in the example above) earlier than the requests for the first 4 columns. I did some debugging and I think I found the reason: when the first results with index 283 are received (step 3), TableUtils creates a row for this index. That row is padded with null values for the first 4 columns, so that its size equals 7 (and not 3). With size=7, the row is considered finished too soon. TableUtils then prunes these incomplete but "finished" rows from rowCache. When TableUtils later receives the other 4 columns for row 283, it creates a new row with the same index.
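
The difference between "the row has 7 cells" and "the row has 7 values" can be shown with a toy model. This is only a sketch of the padding described above, not the actual TableUtils implementation; the class, the method names, and the use of plain strings for cell values are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model of the padding described above; not the TableUtils implementation.
class RowCompleteness {

    /** Builds a row of totalColumns cells, null everywhere except the received values. */
    static List<String> paddedRow(int totalColumns, int firstColumn, List<String> received) {
        List<String> row = new ArrayList<>(Collections.nCopies(totalColumns, (String) null));
        for (int i = 0; i < received.size(); i++) {
            row.set(firstColumn + i, received.get(i));
        }
        return row;
    }

    /** Size-based check: a padded row always passes, even if most cells are null. */
    static boolean finishedBySize(List<String> row, int totalColumns) {
        return row.size() == totalColumns;
    }

    /** Content-based check: passes only once every cell holds a value. */
    static boolean finishedByContent(List<String> row, int totalColumns) {
        return row.size() == totalColumns && !row.contains(null);
    }

    public static void main(String[] args) {
        // Row 283 after only the 3-column chunk (columns 4..6 of 7) has arrived.
        List<String> row283 = paddedRow(7, 4, Arrays.asList("2", "0", "voice"));
        System.out.println(row283);
        System.out.println("finished by size:    " + finishedBySize(row283, 7));
        System.out.println("finished by content: " + finishedByContent(row283, 7));
    }
}
```

Note that a content-based check like this one is not a drop-in fix: in a sparse table some cells are legitimately empty, so such rows would never count as finished.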


How to fix?

I believe a fairly easy, though not ideal, way to fix this is to send the short chunk first, so that it contains the first 3 columns rather than the last 3:

max-bindings = 4
columns: .1, .2, .3, .4, .5, .6, .7
1st packet should contain: .1, .2, and .3
2nd packet should contain: .4, .5, .6, and .7

The number of columns in the first packet is NumColumnsTotal % maxBindings.
The number of columns in every other packet is maxBindings.
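
The proposed split could look roughly like the sketch below. It only shows the column partitioning and is not a patch for TableUtils; the class and method names are hypothetical. One detail is an assumption added here: when the column count is an exact multiple of maxBindings, the modulo is 0, so no short packet is produced and every packet is full.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: partitions column names so that the short chunk comes first.
class ColumnChunker {

    /** Splits columns into chunks of at most maxBindings, remainder chunk first. */
    static List<List<String>> remainderFirst(List<String> columns, int maxBindings) {
        if (maxBindings < 1) {
            throw new IllegalArgumentException("maxBindings must be >= 1");
        }
        List<List<String>> chunks = new ArrayList<>();
        int remainder = columns.size() % maxBindings;
        int start = 0;
        if (remainder > 0) {
            // Short chunk: NumColumnsTotal % maxBindings columns.
            chunks.add(new ArrayList<>(columns.subList(0, remainder)));
            start = remainder;
        }
        // Every following chunk is full: exactly maxBindings columns.
        for (; start < columns.size(); start += maxBindings) {
            chunks.add(new ArrayList<>(columns.subList(start, start + maxBindings)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList(".1", ".2", ".3", ".4", ".5", ".6", ".7");
        System.out.println(remainderFirst(columns, 4)); // [[.1, .2, .3], [.4, .5, .6, .7]]
    }
}
```

Whether this ordering really avoids the split rows is the hypothesis of this post; the sketch does not demonstrate that.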


Please tell me if you need more information or if my method invocation is wrong.


Best regards
Steffen Brüntjen
Frank Fock
2018-07-05 17:37:07 UTC
Hi Steffen
What SNMP4J version are you using?
Best regards
Frank
Steffen Brüntjen
2018-07-06 08:20:21 UTC
Hi!

I'm using SNMP4J version 2.6.2.

Best regards
Steffen

Frank Fock
2018-07-06 16:54:38 UTC
Hi Steffen,
I will try to reproduce this issue.
Regardless of the result, the parameters for TableUtils are not suitable for your setup. maxNumColumnsPerPDU should be as large as possible. Otherwise the overall performance will be poor and the likelihood of incomplete table rows increases significantly (through changes in the agent while TableUtils operates).
Best regards
Frank
Steffen Brüntjen
2018-07-09 17:45:02 UTC
Hi Frank

Thank you for having a look at it. I agree, the performance with many bindings is indeed *much* higher and yes, values should be retrieved row by row in order to avoid data inconsistencies. But there are also problems with many bindings:

1. Since the agent cannot decide how many values to send (in contrast to max-repetitions), the packet size might get too big if you have a table with many (big) columns.

2. There are agents that get into trouble when many columns are requested. This often results in timeouts (no tooBig error), and then there's no option other than requesting fewer bindings.

Maybe the proposed change is the way to go; it's small, but effective (I believe).

Best regards
Steffen


Frank Fock
2018-07-12 06:40:40 UTC
Hi Steffen,

If the agent sends a tooBig error on a GETBULK request, then this is an error in the agent. See RFC 3416, section 4.2.3:

If the size of the message encapsulating the Response-PDU
containing the requested number of variable bindings would be
greater than either a local constraint or the maximum message
size of the originator, then the response is generated with a
lesser number of variable bindings. This lesser number is the
ordered set of variable bindings with some of the variable
bindings at the end of the set removed, such that the size of
the message encapsulating the Response-PDU is approximately
equal to but no greater than either a local constraint or the
maximum message size of the originator. Note that the number
of variable bindings removed has no relationship to the values
of N, M, or R.
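
The truncation rule quoted above can be illustrated with a small sketch. It is not agent code: the byte sizes are invented, real BER-encoded sizes depend on the OIDs and values, and the class and method names are hypothetical.

```java
// Illustration of the RFC 3416 rule: drop bindings from the end of the
// response until the message fits. All sizes here are made-up byte counts.
class GetBulkTruncation {

    /**
     * Returns how many leading variable bindings fit into a message of at most
     * maxMessageSize bytes, given a fixed overhead for the message headers.
     */
    static int bindingsThatFit(int[] encodedSizes, int overheadBytes, int maxMessageSize) {
        int total = overheadBytes;
        int count = 0;
        for (int size : encodedSizes) {
            if (total + size > maxMessageSize) {
                break; // this binding and everything after it is removed
            }
            total += size;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] sizes = {20, 20, 20, 20};
        // 30 bytes overhead + 3 * 20 = 90 fits into 100, a fourth binding would not.
        System.out.println(bindingsThatFit(sizes, 30, 100)); // 3
    }
}
```

The point of the rule is that a conforming agent answers with fewer bindings instead of a tooBig error.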

For the issue you reported, there is no general solution, because it interferes with sparse tables.
A solution would either decrease the performance for sparse tables or filter out sparse rows.
The latter is not acceptable for intentionally sparse tables.
For dense tables, filtering could be the best option, although it would hide new rows even though the command generator has already detected them.

I am currently adding an option to getDenseTable that activates filtering of new rows that appear during the table retrieval and are therefore received incompletely. Would that help you?

Best regards,
Frank
Frank Fock
2018-07-18 07:31:00 UTC
Steffen Brüntjen
2018-07-19 15:20:04 UTC
Hi Frank


I'm not sure whether we're talking about the same thing. The problem I described is *not* a timing problem with rows being added to or removed from the table while rows are being retrieved. The table I am querying doesn't change at all and the problem is highly reproducible. Let's look at the example again:


This is what the List<TableEvent> result should look like, and what it actually looks like (always) when max-bindings is set to 1, 32, or some other value.

[ ... 75 normal rows ... ]
[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, [...], 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, [...], 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, [...], 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, [...], 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
[ ... everything normal ... ]


When I set max-bindings to 4 (I'm requesting 7 columns), I always get these TableEvents:

[ ... 75 normal rows ... ]
[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.283 = 2, 1.3.6.1.2.1.31.1.1.1.15.283 = 0, 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.373 = 2, 1.3.6.1.2.1.31.1.1.1.15.373 = 0, 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.774 = 2, 1.3.6.1.2.1.31.1.1.1.15.774 = 0, 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.783 = 2, 1.3.6.1.2.1.31.1.1.1.15.783 = 0, 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, 1.3.6.1.2.1.31.1.1.1.17.283 = 2, 1.3.6.1.2.1.31.1.1.1.6.283 = 0, 1.3.6.1.2.1.31.1.1.1.10.283 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, 1.3.6.1.2.1.31.1.1.1.17.373 = 2, 1.3.6.1.2.1.31.1.1.1.6.373 = 0, 1.3.6.1.2.1.31.1.1.1.10.373 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, 1.3.6.1.2.1.31.1.1.1.17.774 = 2, 1.3.6.1.2.1.31.1.1.1.6.774 = 0, 1.3.6.1.2.1.31.1.1.1.10.774 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, 1.3.6.1.2.1.31.1.1.1.17.783 = 2, 1.3.6.1.2.1.31.1.1.1.6.783 = 0, 1.3.6.1.2.1.31.1.1.1.10.783 = 0, null, null, null]
[ ... everything normal ... ]


The returned List<TableEvent> contains 4 more entries than expected, because 4 table rows are each split into two TableEvents. These indexes appear twice:
index=283
index=373
index=774
index=783


It's like this table


IDX |  A   |  B   |  C   |  D
----+------+------+------+------
  0 |    1 |    2 |    3 |    4
  1 |    5 |    6 |    7 |    8
  2 |    9 |   10 |   11 |   12
  3 |   13 |   14 |   15 |   16


becomes something like this when obtained by TableUtils:

IDX |  A   |  B   |  C   |  D
----+------+------+------+------
  0 |    1 |    2 |    3 |    4
  1 | null | null |    7 |    8   <-- index=1
  2 | null | null |   11 |   12   <-- index=2
  1 |    5 |    6 | null | null   <-- index=1
  2 |    9 |   10 | null | null   <-- index=2
  3 |   13 |   14 |   15 |   16
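As a caller-side illustration of the table above, split rows can be recombined by index after retrieval. The following is a minimal sketch in plain Java with hypothetical names (`SplitRowMerger`, `Row`); it models each returned row as an index plus an array of cells, where null marks a missing cell. With SNMP4J one would presumably take these values from TableEvent.getIndex() and TableEvent.getColumns(), but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of SNMP4J: merges row fragments that share an index. */
final class SplitRowMerger {

    /** One returned row: its index and its cells (null = cell not received). */
    static final class Row {
        final String index;
        final String[] cells;

        Row(String index, String... cells) {
            this.index = index;
            this.cells = cells;
        }
    }

    /** Merges fragments with the same index; the first-seen order of indexes is kept. */
    static List<Row> merge(List<Row> rows) {
        Map<String, Row> byIndex = new LinkedHashMap<>();
        for (Row row : rows) {
            Row merged = byIndex.get(row.index);
            if (merged == null) {
                // Copy the cells so the input rows are never modified.
                byIndex.put(row.index, new Row(row.index, row.cells.clone()));
                continue;
            }
            // Fill only the gaps; cells that are already present win.
            for (int i = 0; i < row.cells.length && i < merged.cells.length; i++) {
                if (merged.cells[i] == null) {
                    merged.cells[i] = row.cells[i];
                }
            }
        }
        return new ArrayList<>(byIndex.values());
    }
}
```

Applied to the six rows of the second table, this gives back the four rows of the first one. It does not drop rows that are still incomplete after merging, so intentionally sparse rows pass through unchanged.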


I tried to describe the reason for this, but I admit it's a bit complicated. Of course, it's also possible that I didn't understand your answer correctly; sorry for the confusion in that case. If so, I'd like to understand how sparse and dense tables cause this problem.

Thanks for the clarification on tooBig errors with GETBULK requests!


Best regards
Steffen Brüntjen



-----Original Message-----
From: Frank Fock [mailto:***@agentpp.com]
Sent: Thursday, 12 July 2018 08:41
To: Steffen Brüntjen <***@macmon.eu>
Cc: ***@agentpp.org
Subject: Re: [SNMP4J] max-bindings with big tables

Hi Steffen,

If the agent sends a tooBig error on a GETBULK request, then this is an error in the agent. See RFC 3416, section 4.2.3:

If the size of the message encapsulating the Response-PDU
containing the requested number of variable bindings would be
greater than either a local constraint or the maximum message
size of the originator, then the response is generated with a
lesser number of variable bindings. This lesser number is the
ordered set of variable bindings with some of the variable
bindings at the end of the set removed, such that the size of
the message encapsulating the Response-PDU is approximately
equal to but no greater than either a local constraint or the
maximum message size of the originator. Note that the number
of variable bindings removed has no relationship to the values
of N, M, or R.
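
The truncation rule quoted above can be illustrated with a small Java sketch. It assumes a GETBULK request without non-repeaters, rows indexed 0, 1, 2, ..., and a limit counted in variable bindings as a stand-in for the real byte-size limit; `GetBulkTruncation` and its methods are hypothetical names, not SNMP4J API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of GETBULK response ordering and truncation. */
final class GetBulkTruncation {

    /** Builds the response order: repetition by repetition, all requested columns each time. */
    static List<String> responseOrder(String[] columns, int maxRepetitions) {
        List<String> bindings = new ArrayList<>();
        for (int row = 0; row < maxRepetitions; row++) {
            for (String column : columns) {
                bindings.add(column + "." + row);
            }
        }
        return bindings;
    }

    /** Keeps at most limit bindings by removing bindings from the end of the ordered set. */
    static List<String> truncate(List<String> bindings, int limit) {
        int keep = Math.max(0, Math.min(limit, bindings.size()));
        return new ArrayList<>(bindings.subList(0, keep));
    }
}
```

Because bindings are removed only from the end, a truncated response can stop in the middle of a repetition: requesting columns A, B, C with two repetitions and a limit of four bindings leaves A.0, B.0, C.0, A.1.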

For the issue you reported, there is no general solution, because it interferes with sparse tables.
A solution would either decrease the performance for sparse tables or will filter out sparse rows.
The latter is not acceptable for intentionally sparse tables.
For dense tables, filtering could be the best option, although it would hide new rows even though the command generator has already detected them.

I am currently about to add an option for getDenseTable to activate a filtering for new rows that appear during the table retrieval and are therefore incompletely received. Would that help you?

Best regards,
Frank
Post by Steffen Brüntjen
Hi Frank
1. Unlike with max-repetition-count, the agent cannot decide how many values to send, so the packet size might get too big if you have a table with many (big) columns.
2. There are agents that get into trouble when many columns are requested. This often results in timeouts (no tooBig error), and then there's no option other than requesting fewer bindings.
Maybe the proposed change is the way to go; it's decent, but effective (I believe).
Best regards
Steffen
-----Original Message-----
Sent: Friday, 6 July 2018 18:55
Subject: Re: [SNMP4J] max-bindings with big tables
Hi Steffen,
I will try to reproduce this issue.
Regardless of the result, the parameters for TableUtils are not suitable for your setup. The maxNumColumnsPerPDU has to be as large as possible. Otherwise the overall performance will be bad and the likelihood of incomplete table rows increases significantly (through changes in the agent while TableUtils operates).
Best regards
Frank
Post by Steffen Brüntjen
Hi!
I'm using SNMP4J version 2.6.2.
Best regards
Steffen
-----Original Message-----
Sent: Thursday, 5 July 2018 19:37
Subject: Re: [SNMP4J] max-bindings with big tables
Hi Steffen
What SNMP4J version are you using?
Best regards
Frank
Frank Fock
2018-07-19 17:34:53 UTC
Permalink
Hi Steffen
I think I understood your description correctly from the beginning. However, the problem you described should not happen with a static (unchanged) table, because of the inner logic of TableUtils.
I assume that the agent does not return the rows in lexicographic order. That would have the same effect as a row dynamically appearing during retrieval.

I do not want to rule out an off-by-one error in TableUtils, but none of the unit tests I have run so far indicate one.

What agent are you using?

Nevertheless, with the mode denseTableDoubleCheckIncompleteRows, the new version will not show the issue you observed.

Best regards
Frank
Steffen Brüntjen
2018-07-23 17:23:21 UTC
Permalink
Hi!
Post by Frank Fock
However the problem you described should not happen with a static (unchanged) table, because of the inner logic of TableUtils.
I'm sorry, but I believe I still haven't made the problem clear. You wrote that this problem should not appear in tables that don't change, OR that it may appear when the agent doesn't return rows in lexicographic order. The latter case looks just like row creation or deletion happening while the table is being retrieved. I understand that, and I can't rule out an error in the agent, although I have analyzed all the packets in Wireshark. I also debugged TableUtils, and I still think the bug is there. So let me try to explain it one last time.

Let's say we have this configuration:

max-repetition-count = 2
max-bindings = 3
requested table columns = 5

IDX |  A   |  B   |  C   |  D   |  E   |
----+------+------+------+------+------+
  0 |    1 |    2 |    3 |    4 |    5 |
  1 |    6 |    7 |    8 |    9 |   10 |
  2 |   11 |   12 |   13 |   14 |   15 |
  3 |   16 |   17 |   18 |   19 |   20 |

SNMP4J will ask for A, B, C (max-bindings=3)
DEVICE will return A.0=1, B.0=2, C.0=3 (DEVICE decides not to send a second row because of MTU size)
SNMP4J will ask for D, E
DEVICE will return D.0=4, E.0=5, D.1=9, E.1=10 (max-repetition-count = 2)

And here we're running into the problem. TableUtils "creates" an inner table in this state:

IDX |  A   |  B   |  C   |  D   |  E   |
----+------+------+------+------+------+
  0 |    1 |    2 |    3 |    4 |    5 |
  1 | null | null | null |    9 |   10 |


Now we'll continue:

SNMP4J will ask for A.0, B.0, C.0 (GETNEXT)
DEVICE will return A.1=6, B.1=7, C.1=8

What TableUtils now does is:

IDX |  A   |  B   |  C   |  D   |  E   |
----+------+------+------+------+------+
  0 |    1 |    2 |    3 |    4 |    5 |
  1 | null | null | null |    9 |   10 |
  1 |    6 |    7 |    8 | null | null |
...


This is the phenomenon I'm actually observing and trying to describe. So once again: the table doesn't change and the problem is 100% reproducible. In my case:

- 100% of table retrievals with max-bindings != 4 are ok
- 100% of table retrievals with max-bindings == 4 are broken


This problem will never appear with max-bindings=1 or max-bindings=infinite, and it will never appear when the agent always sends the exact requested repetitions.
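
The sequence above can be replayed with a small stand-alone model. This is a deliberately simplified sketch of the behaviour described in this thread, not the actual TableUtils code: `RowCacheModel` is a hypothetical class, and the rule that a row counts as finished as soon as its last column is filled is an assumption made to reproduce the observation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the described behaviour; NOT the real TableUtils implementation. */
final class RowCacheModel {
    private final int numColumns;
    private final Map<Integer, String[]> rowCache = new LinkedHashMap<>();
    private final List<String> emitted = new ArrayList<>();

    RowCacheModel(int numColumns) {
        this.numColumns = numColumns;
    }

    /** Stores cells for one row, starting at column position firstColumn. */
    void receive(int rowIndex, int firstColumn, String... cells) {
        // A new row is padded with nulls up to the full column count.
        String[] row = rowCache.computeIfAbsent(rowIndex, k -> new String[numColumns]);
        System.arraycopy(cells, 0, row, firstColumn, cells.length);
        // Assumed flaw: the row counts as finished once its last column is filled,
        // even if earlier columns are still null.
        if (firstColumn + cells.length == numColumns) {
            emit(rowIndex, rowCache.remove(rowIndex));
        }
    }

    /** Emits whatever is still cached at the end of the retrieval and returns all rows. */
    List<String> finish() {
        rowCache.forEach(this::emit);
        rowCache.clear();
        return emitted;
    }

    private void emit(int rowIndex, String[] row) {
        emitted.add(rowIndex + ": " + Arrays.toString(row));
    }

    public static void main(String[] args) {
        RowCacheModel model = new RowCacheModel(5);
        model.receive(0, 0, "1", "2", "3"); // A.0, B.0, C.0
        model.receive(0, 3, "4", "5");      // D.0, E.0
        model.receive(1, 3, "9", "10");     // D.1, E.1 arrive before A.1, B.1, C.1
        model.receive(1, 0, "6", "7", "8"); // A.1, B.1, C.1
        model.finish().forEach(System.out::println);
    }
}
```

Under that assumption, row 0 is emitted once and row 1 is emitted twice, first with only D and E set and then with only A, B and C set, which matches the table shown above.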

Best regards
Steffen Brüntjen


-----Original Message-----
From: Frank Fock [mailto:***@agentpp.com]
Sent: Donnerstag, 19. Juli 2018 19:35
To: Steffen Brüntjen <***@macmon.eu>
Cc: ***@agentpp.org
Subject: Re: [SNMP4J] max-bindings with big tables

Hi Steffen
I think I understood your description correctly from the beginning. However the problem you described should not happen with a static (unchanged) table, because of the inner logic of TableUtils.
I assume, that the agent does not return the rows in lexicographic order. That would have the same effect as if a row is dynamically appearing during retrieval.

I do not want to exclude an off-by-one error in TableUtils but all unit tests I run so far do not indicate that.

What agent are you using?

Nevertheless, the new version will not show the issue you observed with the mode denseTableDoubleCheckIncompleteRows

Best regards
Frank
Post by Frank Fock
Hi Frank
This is how the List<TableEvent> result should look like and how it actually does - always - when the max-bindings is set to 1 or 32 or some other value.
[ ... 75 normal rows ... ]
[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, [...], 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, [...], 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, [...], 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, [...], 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
[ ... everything normal ... ]
[ ... 75 normal rows ... ]
[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.283 = 2, 1.3.6.1.2.1.31.1.1.1.15.283 = 0, 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.373 = 2, 1.3.6.1.2.1.31.1.1.1.15.373 = 0, 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.774 = 2, 1.3.6.1.2.1.31.1.1.1.15.774 = 0, 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.783 = 2, 1.3.6.1.2.1.31.1.1.1.15.783 = 0, 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, 1.3.6.1.2.1.31.1.1.1.17.283 = 2, 1.3.6.1.2.1.31.1.1.1.6.283 = 0, 1.3.6.1.2.1.31.1.1.1.10.283 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, 1.3.6.1.2.1.31.1.1.1.17.373 = 2, 1.3.6.1.2.1.31.1.1.1.6.373 = 0, 1.3.6.1.2.1.31.1.1.1.10.373 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, 1.3.6.1.2.1.31.1.1.1.17.774 = 2, 1.3.6.1.2.1.31.1.1.1.6.774 = 0, 1.3.6.1.2.1.31.1.1.1.10.774 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, 1.3.6.1.2.1.31.1.1.1.17.783 = 2, 1.3.6.1.2.1.31.1.1.1.6.783 = 0, 1.3.6.1.2.1.31.1.1.1.10.783 = 0, null, null, null]
[ ... everything normal ... ]
index=283
index=373
index=774
index=783
It's like this table
IDX | A | B | C | D
----+-----+-----+-----+-----
0 | 1 | 2 | 3 | 4
1 | 5 | 6 | 7 | 8
2 | 9 | 10 | 11 | 12
3 | 13 | 14 | 15 | 16
IDX | A | B | C | D
----+-----+-----+-----+-----
0 | 1 | 2 | 3 | 4
1 | null| null| 7 | 8 <-- index=1
2 | null| null| 11 | 12 <-- index=2
1 | 5 | 6 | null| null <-- index=1
2 | 9 | 10 | null| null <-- index=2
3 | 13 | 14 | 15 | 16
I tried to describe the reason for this, but it's a bit complicated I admit. Of course it's also possible that I didn't understand your answer correctly. Sorry for the confusion in that case. Then I'd be willing to grasp how sparse and dense tables are the reason for this problem.
Thanks for the clarification on tooBig errors with GETBULK requests!
Best regards
Steffen Brüntjen
-----Original Message-----
Sent: Donnerstag, 12. Juli 2018 08:41
Subject: Re: [SNMP4J] max-bindings with big tables
Hi Steffen,
If the size of the message encapsulating the Response-PDU
containing the requested number of variable bindings would be
greater than either a local constraint or the maximum message
size of the originator, then the response is generated with a
lesser number of variable bindings. This lesser number is the
ordered set of variable bindings with some of the variable
bindings at the end of the set removed, such that the size of
the message encapsulating the Response-PDU is approximately
equal to but no greater than either a local constraint or the
maximum message size of the originator. Note that the number
of variable bindings removed has no relationship to the values
of N, M, or R.
For the issue you reported, there is no general solution, because it interferes with sparse tables.
A solution would either decrease the performance for sparse tables or will filter out sparse rows.
The latter is not acceptable for intentionally sparse tables.
For dense tables, the filtering could be the best option. Although it would hide new rows although the command generator already detected them.
I am currently about to add an option for getDenseTable to activate a filtering for new rows that appear during the table retrieval and are therefore incompletely received. Would that help you?
Best regards,
Frank
Post by Steffen Brüntjen
Hi Frank
1. Since the agent can not - in the contrast to max-repetition-count - decide how many values to send, the packet size might get too big if you have a table with many (big) columns.
2. There are agents that get into trouble when many columns are requested. This often results in timeouts (no tooBig error) and then there's no other option to requesting fewer bindings.
Maybe the proposed change is the way to go, it's decent, but effective (I believe).
Best regards
Steffen
-----Original Message-----
Sent: Freitag, 6. Juli 2018 18:55
Subject: Re: [SNMP4J] max-bindings with big tables
Hi Steffen,
I will try to reproduce this issue.
Independent from the result, the parameters for TableUtils are not suitable for your setup. The maxNumColumnsPerPDU has to be as large as possible. Otherwise the overall performance will be bad and the likelihood of incomplete table rows increases significantly (through changes in the agent while TableUtils operate).
Best regards
Frank
Post by Steffen Brüntjen
Hi!
I'm using SNMP4J version 2.6.2.
Best regards
Steffen
-----Original Message-----
Sent: Donnerstag, 5. Juli 2018 19:37
Subject: Re: [SNMP4J] max-bindings with big tables
Hi Steffen
What SNMP4J version are you using?
Best regards
Frank
Post by Steffen Brüntjen
Hi Frank
I believe I found an issue in the TableUtils class. In certain scenarios, the returned List<TableEvent> from getTable(Target target, OID[] columnOIDs, OID lowerBoundIndex, OID upperBoundIndex) will contain incomplete and duplicate rows.
[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, [...], 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, [...], 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, [...], 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, [...], 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
[ ... 75 normal rows ... ]
[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.283 = 2, 1.3.6.1.2.1.31.1.1.1.15.283 = 0, 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.373 = 2, 1.3.6.1.2.1.31.1.1.1.15.373 = 0, 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.774 = 2, 1.3.6.1.2.1.31.1.1.1.15.774 = 0, 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.783 = 2, 1.3.6.1.2.1.31.1.1.1.15.783 = 0, 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, 1.3.6.1.2.1.31.1.1.1.17.283 = 2, 1.3.6.1.2.1.31.1.1.1.6.283 = 0, 1.3.6.1.2.1.31.1.1.1.10.283 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, 1.3.6.1.2.1.31.1.1.1.17.373 = 2, 1.3.6.1.2.1.31.1.1.1.6.373 = 0, 1.3.6.1.2.1.31.1.1.1.10.373 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, 1.3.6.1.2.1.31.1.1.1.17.774 = 2, 1.3.6.1.2.1.31.1.1.1.6.774 = 0, 1.3.6.1.2.1.31.1.1.1.10.774 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, 1.3.6.1.2.1.31.1.1.1.17.783 = 2, 1.3.6.1.2.1.31.1.1.1.6.783 = 0, 1.3.6.1.2.1.31.1.1.1.10.783 = 0, null, null, null]
[ ... everything normal ... ]
Here we find some rows split into two: One block with the first 4 columns set null, and another block with the last 3 columns set null.
- max-bindings is set to 4 - TableUtils.setMaxNumColumnsPerPDU(int)
- max-repetitions is set to 30 - TableUtils.setMaxNumRowsPerPDU(int)
- the device returns many rows (like 120)
- the table request contains more columns than max-bindings
- the table request contains not a multiple of max-bindings
- the problem will also depend on MTU size, but that's not important here
1. TableUtils will request the first 4 columns
2. device returns 60 variable bindings, that's 15 cells per column
3. TableUtils will request the latter 3 columns
4. device returns 60 variable bindings, that's 20 cells per column
This is repeating until all bindings are retrieved. So far, so good. The problem is now, that all second requests (step 3) will receive more rows, and so these requests will reach index 283 (as in the example above) earlier. I did some debugging and I think I found the reason: When the first results with index 283 are received (step 3), TableUtils creates a row for this index. That row is filled up with null values for the first 4 columns so that it's size equals 7 (and not 3). Having size=7, the row is considered finished too soon. TableUtils then prunes these incomplete but finished rows from rowCache. When TableUtils receives the other 4 columns for row 283, it creates a new row with the same index.
How to fix?
max-bindings = 4
columns: .1, .2, .3, .4, .5, .6, .7
Packet 1 should contain: .1, .2, and .3
Packet 2 should contain: .4, .5, .6, and .7
The number of columns in the first packet is NumColumnsTotal % maxBindings.
The number of columns in each of the other packets is maxBindings.
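The proposed split can be sketched as a small helper. This is only an illustration of the rule stated above, not SNMP4J code. ColumnChunks and chunkSizes are hypothetical names, and treating a remainder of 0 as "no short first packet" is an assumption that the post does not spell out:

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnChunks {
    // Splits totalColumns into request sizes of at most maxBindings,
    // placing the remainder (if there is one) in the first request.
    public static List<Integer> chunkSizes(int totalColumns, int maxBindings) {
        if (totalColumns < 0 || maxBindings < 1) {
            throw new IllegalArgumentException("invalid column or binding count");
        }
        List<Integer> sizes = new ArrayList<>();
        int remainder = totalColumns % maxBindings;
        if (remainder > 0) {
            sizes.add(remainder);               // short first packet
        }
        for (int i = 0; i < totalColumns / maxBindings; i++) {
            sizes.add(maxBindings);             // full packets
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(chunkSizes(7, 4));   // the example from the post
    }
}
```

For the example above (7 columns, max-bindings = 4) this yields one packet of 3 columns followed by one packet of 4.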
Please tell me if you need more information or if my method invocation is wrong.
Best regards
Steffen Brüntjen
_______________________________________________
SNMP4J mailing list
https://oosnmp.net/mailman/listinfo/snmp4j
Frank Fock
2018-07-23 17:57:11 UTC
Permalink
Hi Steffen,

OK, I understand the difference. Nevertheless, the current snapshot already fixes this issue too.
Although SNMP4J TableUtils could probably handle this kind of scenario smarter, the scenario you describe is very rare:
1. You configured max-rep-count*max-bindings = 6 > max columns (=5). The opposite is recommended.
2. The agent seems to cut off a whole row (a whole repetition) to keep the PDU below maxResponsePDUSize (or the MTU). According to the SNMPv2c/v3 standard, only those VBs that actually break the limit should be removed from the response. Thus, in your case the agent should most likely return at least one column of the first part of row “1” instead of returning none.

Have you tried the latest 3.0 SNAPSHOT already?
Both dense table modes:
* denseTableDoubleCheckIncompleteRows
* denseTableDropIncompleteRows
should return your row “1” in one TableEvent.

Best regards,
Frank
Post by Steffen Brüntjen
Hi!
Post by Frank Fock
However the problem you described should not happen with a static (unchanged) table, because of the inner logic of TableUtils.
I'm sorry, but I believe I still haven't managed to make the problem clear. You wrote that this problem should not appear in tables that don't change, OR that it may appear when the agent doesn't return rows in lexicographic order. The latter case looks just as if a row were created or deleted while the table is being retrieved. I understand that, and I can't rule out an error in the agent, although I have analyzed all the packets in Wireshark. I also debugged TableUtils, and I still think the bug is there. So let me try to explain it one last time.
max-repetition-count = 2
max-bindings = 3
requested table columns = 5
IDX |  A  |  B  |  C  |  D  |  E  |
----+-----+-----+-----+-----+-----+
  0 |   1 |   2 |   3 |   4 |   5 |
  1 |   6 |   7 |   8 |   9 |  10 |
  2 |  11 |  12 |  13 |  14 |  15 |
  3 |  16 |  17 |  18 |  19 |  20 |
SNMP4J will ask for A, B, C (max-bindings=3)
DEVICE will return A.0=1, B.0=2, C.0=3 (DEVICE decides not to send a 2nd row because of MTU size)
SNMP4J will ask for D, E
DEVICE will return D.0=4, E.0=5, D.1=9, E.1=10 (max-repetition-count = 2)
IDX |  A  |  B  |  C  |  D  |  E  |
----+-----+-----+-----+-----+-----+
  0 |   1 |   2 |   3 |   4 |   5 |
  1 |null |null |null |   9 |  10 |
SNMP4J will ask for A.0, B.0, C.0 (GETNEXT)
DEVICE will return A.1=6, B.1=7, C.1=8
IDX |  A  |  B  |  C  |  D  |  E  |
----+-----+-----+-----+-----+-----+
  0 |   1 |   2 |   3 |   4 |   5 |
  1 |null |null |null |   9 |  10 |
  1 |   6 |   7 |   8 |null |null |
...
- 100% of table retrievals with max-bindings != 4 are OK
- 100% of table retrievals with max-bindings == 4 are broken
This problem will never appear with max-bindings=1 or max-bindings=infinite, and it will never appear when the agent always sends exactly the requested number of repetitions.
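The sequence above can be replayed with a toy model. This is not the SNMP4J implementation; the real TableUtils logic is more involved. The sketch only assumes the mechanism described in this thread: a row is padded with null up to the column that arrives, and it counts as finished as soon as its length equals the number of columns. SplitRowToyModel and its methods are made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class SplitRowToyModel {
    static final int NUM_COLUMNS = 5;   // columns A..E at positions 0..4
    final TreeMap<Integer, List<Integer>> rowCache = new TreeMap<>();
    final List<String> emitted = new ArrayList<>();

    // Stores one cell. Missing cells to the left are padded with null, and a
    // row whose length reaches NUM_COLUMNS is emitted and dropped from the cache.
    void receive(int index, int column, int value) {
        List<Integer> row = rowCache.computeIfAbsent(index, k -> new ArrayList<>());
        while (row.size() < column) {
            row.add(null);
        }
        if (row.size() == column) {
            row.add(value);
        } else {
            row.set(column, value);
        }
        if (row.size() == NUM_COLUMNS) {
            emitted.add(index + " " + row);
            rowCache.remove(index);
        }
    }

    // Emits whatever is still cached, padding missing cells on the right.
    void finish() {
        rowCache.forEach((index, row) -> {
            while (row.size() < NUM_COLUMNS) {
                row.add(null);
            }
            emitted.add(index + " " + row);
        });
        rowCache.clear();
    }

    public static List<String> run() {
        SplitRowToyModel m = new SplitRowToyModel();
        // Response to the A, B, C request: only row 0 fits.
        m.receive(0, 0, 1); m.receive(0, 1, 2); m.receive(0, 2, 3);
        // Response to the D, E request: rows 0 and 1.
        m.receive(0, 3, 4); m.receive(0, 4, 5);
        m.receive(1, 3, 9); m.receive(1, 4, 10);
        // Next A, B, C response: row 1 arrives after it was already emitted.
        m.receive(1, 0, 6); m.receive(1, 1, 7); m.receive(1, 2, 8);
        m.finish();
        return m.emitted;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Under these assumptions the model emits row 0 complete and index 1 twice, once with A to C null and once with D and E null, which matches the last table above.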
Best regards
Steffen Brüntjen
-----Original Message-----
Sent: Thursday, 19 July 2018 19:35
Subject: Re: [SNMP4J] max-bindings with big tables
Hi Steffen
I think I understood your description correctly from the beginning. However the problem you described should not happen with a static (unchanged) table, because of the inner logic of TableUtils.
I assume, that the agent does not return the rows in lexicographic order. That would have the same effect as if a row is dynamically appearing during retrieval.
I do not want to rule out an off-by-one error in TableUtils, but none of the unit tests I have run so far indicate that.
What agent are you using?
Nevertheless, the new version will not show the issue you observed when using the mode denseTableDoubleCheckIncompleteRows.
Best regards
Frank
Post by Frank Fock
Hi Frank
This is how the List<TableEvent> result should look, and how it actually does look (always) when max-bindings is set to 1 or 32 or some other value.
[ ... 75 normal rows ... ]
[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, [...], 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, [...], 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, [...], 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, [...], 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
[ ... everything normal ... ]
And this is the result with max-bindings set to 4:
[ ... 75 normal rows ... ]
[1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = service]
[1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = reception]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.283 = 2, 1.3.6.1.2.1.31.1.1.1.15.283 = 0, 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.373 = 2, 1.3.6.1.2.1.31.1.1.1.15.373 = 0, 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.774 = 2, 1.3.6.1.2.1.31.1.1.1.15.774 = 0, 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
[null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.783 = 2, 1.3.6.1.2.1.31.1.1.1.15.783 = 0, 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
[1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, 1.3.6.1.2.1.31.1.1.1.17.283 = 2, 1.3.6.1.2.1.31.1.1.1.6.283 = 0, 1.3.6.1.2.1.31.1.1.1.10.283 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, 1.3.6.1.2.1.31.1.1.1.17.373 = 2, 1.3.6.1.2.1.31.1.1.1.6.373 = 0, 1.3.6.1.2.1.31.1.1.1.10.373 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, 1.3.6.1.2.1.31.1.1.1.17.774 = 2, 1.3.6.1.2.1.31.1.1.1.6.774 = 0, 1.3.6.1.2.1.31.1.1.1.10.774 = 0, null, null, null]
[1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, 1.3.6.1.2.1.31.1.1.1.17.783 = 2, 1.3.6.1.2.1.31.1.1.1.6.783 = 0, 1.3.6.1.2.1.31.1.1.1.10.783 = 0, null, null, null]
[ ... everything normal ... ]
index=283
index=373
index=774
index=783
It's like this table
IDX |  A  |  B  |  C  |  D
----+-----+-----+-----+-----
  0 |   1 |   2 |   3 |   4
  1 |   5 |   6 |   7 |   8
  2 |   9 |  10 |  11 |  12
  3 |  13 |  14 |  15 |  16

IDX |  A  |  B  |  C  |  D
----+-----+-----+-----+-----
  0 |   1 |   2 |   3 |   4
  1 |null |null |   7 |   8   <-- index=1
  2 |null |null |  11 |  12   <-- index=2
  1 |   5 |   6 |null |null   <-- index=1
  2 |   9 |  10 |null |null   <-- index=2
  3 |  13 |  14 |  15 |  16
I tried to describe the reason for this, but I admit it's a bit complicated. Of course it's also possible that I didn't understand your answer correctly. Sorry for the confusion in that case. If so, I'd like to understand how sparse and dense tables are the reason for this problem.
Thanks for the clarification on tooBig errors with GETBULK requests!
Best regards
Steffen Brüntjen
-----Original Message-----
Sent: Thursday, 12 July 2018 08:41
Subject: Re: [SNMP4J] max-bindings with big tables
Hi Steffen,
If the size of the message encapsulating the Response-PDU
containing the requested number of variable bindings would be
greater than either a local constraint or the maximum message
size of the originator, then the response is generated with a
lesser number of variable bindings. This lesser number is the
ordered set of variable bindings with some of the variable
bindings at the end of the set removed, such that the size of
the message encapsulating the Response-PDU is approximately
equal to but no greater than either a local constraint or the
maximum message size of the originator. Note that the number
of variable bindings removed has no relationship to the values
of N, M, or R.
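To make the quoted rule concrete: bindings are dropped from the end of the ordered response until the message fits, regardless of row boundaries. A simplified sketch follows, assuming each variable binding has a known encoded size and ignoring message header overhead. BulkTruncation and bindingsThatFit are hypothetical names, not an SNMP4J API:

```java
public class BulkTruncation {
    // Returns how many leading variable bindings fit into maxSize bytes,
    // given each binding's encoded size in response order.
    public static int bindingsThatFit(int[] encodedSizes, int maxSize) {
        int used = 0;
        int count = 0;
        for (int size : encodedSizes) {
            if (used + size > maxSize) {
                break;          // this binding and everything after it is dropped
            }
            used += size;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two repetitions of three columns: A.0, B.0, C.0, A.1, B.1, C.1,
        // each assumed to take 20 bytes, with room for 90 bytes in total.
        int[] sizes = {20, 20, 20, 20, 20, 20};
        System.out.println(bindingsThatFit(sizes, 90));
    }
}
```

In this example four bindings fit, so the response would end after A.1 and the second repetition would be cut in the middle rather than dropped as a whole.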
For the issue you reported, there is no general solution, because it interferes with sparse tables.
A solution would either decrease the performance for sparse tables or filter out sparse rows.
The latter is not acceptable for intentionally sparse tables.
For dense tables, filtering could be the best option, although it would hide new rows even though the command generator has already detected them.
I am currently about to add an option for getDenseTable to activate a filtering for new rows that appear during the table retrieval and are therefore incompletely received. Would that help you?
Best regards,
Frank
Post by Steffen Brüntjen
Hi Frank
1. Unlike with max-repetition-count, the agent cannot decide how many values to send, so the packet size might get too big if you have a table with many (big) columns.
2. There are agents that get into trouble when many columns are requested. This often results in timeouts (no tooBig error), and then there's no option other than requesting fewer bindings.
Maybe the proposed change is the way to go; it's subtle but effective (I believe).
Best regards
Steffen
-----Original Message-----
Sent: Friday, 6 July 2018 18:55
Subject: Re: [SNMP4J] max-bindings with big tables
Hi Steffen,
I will try to reproduce this issue.
Regardless of the result, the parameters for TableUtils are not suitable for your setup. The maxNumColumnsPerPDU has to be as large as possible. Otherwise the overall performance will be bad and the likelihood of incomplete table rows increases significantly (through changes in the agent while TableUtils operates).
Best regards
Frank
Post by Steffen Brüntjen
Hi!
I'm using SNMP4J version 2.6.2.
Best regards
Steffen
-----Original Message-----
Sent: Thursday, 5 July 2018 19:37
Subject: Re: [SNMP4J] max-bindings with big tables
Hi Steffen
What SNMP4J version are you using?
Best regards
Frank
Steffen Brüntjen
2018-07-24 18:53:27 UTC
Permalink
Hi Frank
Post by Frank Fock
1. You configured max-rep-count*max-bindings = 6 > max columns (=5). The opposite is recommended.
Post by Frank Fock
Post by Steffen Brüntjen
Post by Steffen Brüntjen
- max-bindings is set to 4 - TableUtils.setMaxNumColumnsPerPDU(int)
Oh, and I didn't configure these values here; I just made up an example that shows the problem. The original results come from max-repetitions=30 and maxNumColumnsPerPDU=4.
Post by Frank Fock
[...] should most likely return at least column of the first part
The agent doesn't actually cut off variable bindings to the end of the previous row. That again was just the example I gave to point out the problem. But still this WILL simply happen from time to time by pure coincidence. And it did.


Finally, I'm sure this is /not/ a dense-table-only issue. The same problem, namely that the returned List<TableEvent> contains multiple rows with the same index, can happen with sparse tables.


Anyway, I tried to replace 2.6 with 3.0, but that required some more changes on my side. I was using

public List<TableEvent> getTable(Target target,
                                 OID[] columnOIDs,
                                 OID lowerBoundIndex,
                                 OID upperBoundIndex)

But I can't set the SparseTableMode there. So I wrote my own TableListener, and with that I can finally confirm that the dense table modes work properly: the returned table is correct. But as mentioned, the problem is not specific to dense or sparse tables. Now that I've written my own TableListener, I think I can solve the issue like this (no external libs required):


// Needs java.util.{ArrayList, List, Map}, java.util.concurrent.{ConcurrentHashMap, CountDownLatch},
// org.snmp4j.smi.{OID, VariableBinding} and org.snmp4j.util.{TableEvent, TableListener}.
CountDownLatch latch = new CountDownLatch(1);
List<TableEvent> table = new ArrayList<>();
Map<OID, TableEvent> rows = new ConcurrentHashMap<>();
TableListener myListener = new TableListener() {

    @Override
    public boolean next(TableEvent event) {
        OID index = event.getIndex();
        rows.compute(index, (idx, prevEvent) -> {
            if (prevEvent == null) {
                // First event for this index: keep it as the row.
                table.add(event);
                return event;
            }
            // Index seen before: copy the new event's values into the
            // columns that are still null in the stored row.
            VariableBinding[] prevColumns = prevEvent.getColumns();
            VariableBinding[] newColumns = event.getColumns();
            for (int i = 0; i < prevColumns.length; i++) {
                if (prevColumns[i] == null) {
                    prevColumns[i] = newColumns[i];
                }
            }
            return prevEvent;
        });

        return true; // continue the retrieval
    }

    @Override
    public boolean isFinished() {
        // True once finished(..) has been called, i.e. the latch reached zero.
        return latch.getCount() == 0;
    }

    @Override
    public void finished(TableEvent event) {
        latch.countDown();
    }

};

tu.getTable(..., oids, myListener, ...);
latch.await();
return table;
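The merge step of the listener can also be tried in isolation, without SNMP4J. Here is a minimal sketch, assuming rows are plain Integer arrays in which null marks a missing cell. MergeByIndex and PartialRow are made-up stand-ins for illustration, not SNMP4J classes:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeByIndex {
    // Hypothetical stand-in for a TableEvent: row index plus cells,
    // where null marks a cell that was not delivered in this event.
    public static final class PartialRow {
        final String index;
        final Integer[] cells;

        public PartialRow(String index, Integer[] cells) {
            this.index = index;
            this.cells = cells;
        }
    }

    // Merges events that share an index; the first non-null value per cell wins.
    public static Map<String, Integer[]> merge(List<PartialRow> events) {
        Map<String, Integer[]> rows = new LinkedHashMap<>();
        for (PartialRow e : events) {
            Integer[] merged =
                    rows.computeIfAbsent(e.index, k -> new Integer[e.cells.length]);
            for (int i = 0; i < merged.length && i < e.cells.length; i++) {
                if (merged[i] == null) {
                    merged[i] = e.cells[i];
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Integer[]> rows = merge(List.of(
                new PartialRow("0", new Integer[] {1, 2, 3, 4, 5}),
                new PartialRow("1", new Integer[] {null, null, null, 9, 10}),
                new PartialRow("1", new Integer[] {6, 7, 8, null, null})));
        rows.forEach((index, cells) ->
                System.out.println(index + " " + Arrays.toString(cells)));
    }
}
```

Fed with the split rows from the five-column example, the two partial events for index 1 collapse into a single complete row.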


Best regards and thanks a lot for your help
Steffen Brüntjen

