Make parsing from remote sources resilient to dropped connections #288

@JustinLoye

In bgpflux, I parse multiple remote archive files concurrently. Because processing takes time, reads from some files are occasionally paused while other files are being consumed.

Unfortunately, when these reads are delayed, RIPE RIS and RouteViews drop the connection after about 10 and 60 seconds of inactivity, respectively. When this happens, the parser stops silently, without reporting an error, so the output is incomplete.

With this minimal reproducible example, I get around 15k BGP elements instead of the expected 654,472:

```rust
use bgpkit_parser::BgpkitParser;
use std::{thread, time};

fn main() {
    // RIPE RIS RIB dump (bview), gzip-compressed, streamed over HTTP.
    let url = "http://data.ris.ripe.net/rrc06/2010.08/bview.20100831.2359.gz";

    let parser = BgpkitParser::new(url).unwrap();
    let mut n_elems = 0;

    for (i, _elem) in parser.into_iter().enumerate() {
        if i == 1000 {
            // Pause longer than RIPE RIS's 10 s timeout so the server drops the connection.
            println!("Reached 1000 elements. Sleeping 15s to trigger RIPE timeout...");
            thread::sleep(time::Duration::from_secs(15));
        }
        n_elems += 1;
    }

    // Expected 654,472 elements; with the pause, only around 15k are returned.
    println!("Result after artificial delay: {}", n_elems);
}
```

If my use case is not too niche, could you please take a look?

One workaround would be to implement a resumable HTTP reader in the oneio dependency that uses HTTP Range headers to request the remaining data starting from where the parser left off. This is similar to how wandio handles dropped connections to make BGPStream work.

I validated this approach with a proof of concept (POC) and opened a feature request in oneio: bgpkit/oneio#74.
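For illustration, here is a minimal, std-only sketch of the resumable-reader idea under stated assumptions. `ResumableReader` and its `open(offset)` callback are hypothetical names, not part of oneio's API. Over HTTP, `open` would send a GET request with a `Range: bytes={offset}-` header, and `total_len` is assumed to be known up front (for example, from the `Content-Length` of the first response). A read error, or a zero-byte read before `total_len` bytes have arrived, is treated as a dropped connection, and the stream is reopened at the current byte offset, up to `max_retries` consecutive times per `read` call.

```rust
use std::io::{self, Read};

/// Hypothetical resumable reader (not oneio's API). `open(offset)` must
/// return a stream that starts at byte `offset`; over HTTP this would be a
/// GET request with a `Range: bytes={offset}-` header.
pub struct ResumableReader<F>
where
    F: FnMut(u64) -> io::Result<Box<dyn Read>>,
{
    open: F,
    inner: Box<dyn Read>,
    offset: u64,      // bytes already delivered to the caller
    total_len: u64,   // assumed known, e.g. from Content-Length
    max_retries: u32, // reconnect attempts allowed per read() call
}

impl<F> ResumableReader<F>
where
    F: FnMut(u64) -> io::Result<Box<dyn Read>>,
{
    pub fn new(mut open: F, total_len: u64, max_retries: u32) -> io::Result<Self> {
        let inner = open(0)?;
        Ok(Self { open, inner, offset: 0, total_len, max_retries })
    }
}

impl<F> Read for ResumableReader<F>
where
    F: FnMut(u64) -> io::Result<Box<dyn Read>>,
{
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Only report EOF once every expected byte has been delivered.
        if buf.is_empty() || self.offset >= self.total_len {
            return Ok(0);
        }
        let mut retries = 0;
        loop {
            let err = match self.inner.read(buf) {
                Ok(n) if n > 0 => {
                    self.offset += n as u64;
                    return Ok(n);
                }
                // Zero bytes before total_len: the server closed the connection early.
                Ok(_) => io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "connection closed before end of file",
                ),
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => e,
            };
            if retries >= self.max_retries {
                return Err(err);
            }
            retries += 1;
            // Reconnect and resume from the first byte not yet delivered.
            self.inner = (self.open)(self.offset)?;
        }
    }
}
```

Because the bytes handed to the caller stay contiguous across reconnects, a gzip decoder and MRT parser layered on top would not notice the reconnection. Treating a premature zero-byte read as an error also means that, once retries are exhausted, truncation surfaces as an `UnexpectedEof` error instead of a silent stop.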
