Check txindex and use blockhash if disabled by Pantamis · Pull Request #452 · romanz/electrs

Pantamis · 2021-08-02T15:01:28Z

Solve #451

Use value in synced field for now.

Kixunil

Thanks!

There's a crash issue that needs to be fixed - see comments.

Additionally to what I wrote below, I think we should cache the result of getindexinfo somehow - otherwise the performance may be actually worse. However it's not as easy as it may look. We need to somehow invalidate the state when connection drops and also need to occasionally re-check if txindex synced.

I would store the state as:

enum TxIndexState {
    /// Didn't check yet, re-check next time
    Unknown,
    /// assume unavailable but re-check after a minute since `last_check`
    Syncing { last_check: std::time::Instant, },
    /// Available and synced - do not pass blockhash
    Synced,
    /// `txindex=0` not configured - pass blockhash and don't retry check
    Unavailable,
}

Then have a helper method that will update the state if needed and return bool and have some handler for dropped connection to reset the state to Unknown.

Kixunil · 2021-08-02T15:26:51Z

+struct IndexInfo {
+    synced: bool,
+    best_block_height: usize,
+}


I suggest renaming this to TxIndexInfo. (or TxindexInfo? the former reads better IMO, the latter seems formally consistent)

Then have a helper enum:

[derive(Deserialize, Debug)] #[serde(rename_all = "lowercase")] enum IndexInfo { TxIndex(TxIndexInfo), }

Then try to deserialize Vec<IndexInfo> directly from the response:

fn getindexinfo(&self, indexes: Vec<&str>) -> Result<Vec<IndexInfo>> { self.request("getindexinfo", json!(indexes)).map_err(Into::into) }

Why enum instead of struct?

This is preparation for the possibility of adding other index infos.
If this is upstreamed to bitcoincore-rpc it'd be the best way wit additional #[non_exhaustive] attribute (but not sure if its MSRV supports it OTOH maybe it can be bumped). It will force us to add _ => None match arm below, but that's fine.

I have an alternative black magic idea that would statically guarantee specific type (for our use case) but it's not exclusive because I think the file argument is optional.

I really like the enum I am not familiar with it but it really looks much better indeed fo futur index types :)

I have trouble with getindexinfo because self.request return me a Value type.

So I deserialize it by returning from_value(self.request("getindexinfo", json!(indexes))?).chain_err(|| "invalid index info") is that ok or do you think there is a better way to do it ?

It's OK stopgap solution but long term it should be changed to deserialize into the struct directly. This will be probably cleaner in the bitcoincore-rpc crate, so let's wait for it with that for now and only try to solve it later.

Although that method should be changed anyway because whatever is using it may be underperformant. Should be a different PR, I believe.

long term it should be changed to deserialize into the struct directly.

Do you mean it requires to change self.request ? Any way as you said, if it makes it to bitcoincore-rpc then code will looks better, no doubt.

Although that method should be changed anyway because whatever is using it may be underperformant. Should be a different PR, I believe.

I am not sure if you are talking about using the TxIndexState but I am thinking about how to do it now, it is clear that asking for txindex value each time we query a transaction is not good :/

Do you mean it requires to change self.request ?

Yes something along the lines: fn request<T: serde::Deserialize>(...) -> Result<T>

I am not sure if you are talking about using the TxIndexState

No, I meant changing request(). Anyway this is master branch and I'm not sure if there will be a release before p2p gets finished.

it is clear that asking for txindex value each time we query a transaction is not good

Yep, I'm just unsure how to handle failed connection without a huge code rewrite and forgetting something. One thing that comes to mind is wrapping the connection in a helper struct that will contain that cache and any other cache in the future. The struct is initialized with empty values after connecting.

Kixunil · 2021-08-02T15:41:38Z

    ) -> Result<Transaction> {
        let mut args = json!([txhash.to_hex(), /*verbose=*/ false]);
-        if let Some(blockhash) = blockhash {
+        if let (true, Some(blockhash)) = (self.getindexinfo(vec!["txindex"])?[0].synced, blockhash) {


I think the logic is inverted here: it should only add the blockhash if it's not synced.
There's also one more problem: according to what you said: "RPC command getindexinfo "txindex" returns nothing if txindex=0" this will panic if txindex=0 because you access it unconditionally.

Also it'd be nice to not blow up the whole thing if indexinfo can't be obtained - just log it and keep degraded performance.

I suggest:

let txindex_available = self.getindexinfo(vec!["txindex"]) .unwrap_or_else(|error| { error!("failed to get index info: {}, assuming txindex unavailable", error)); Vec::new() }) .find_map(|info| { match info { IndexInfo::TxIndex(txindex) => Some(txindex), } }) .map(|txindex| txindex.synced) .unwrap_or(false); if let (false, Some(blockhash) = (txindex_available, blokhash) { // ...

Note that synced could be mapped directly in find_map but I find this clearer. Do you agree @romanz ?

Aw I made the stupid mistake x).

Yeah indeed I forgot we need to handle the case where the RPC returns nothing, my bad

No problem! :)

Kixunil · 2021-08-02T15:46:35Z

    ) -> Result<Value> {
        let mut args = json!([txhash.to_hex(), verbose]);
-        if let Some(blockhash) = blockhash {
+        if let (true, Some(blockhash)) = (self.getindexinfo(vec!["txindex"])?[0].synced, blockhash) {


Same as above

Kixunil · 2021-08-02T17:53:55Z

+    Synced,
+    /// `txindex=0` not configured - pass blockhash and don't retry check
+    Unavailable,
+}


This is partial change, right?

Yeah, still thinking about how using it now, quite difficult as you already told x)

Just realized I forgot about #[derive(Copy, Clone)] it will not solve the problem but avoid annoyance later. :)

romanz · 2021-08-09T13:15:37Z

Sorry for the delay, hopefully will more available in the next week :(

Pantamis · 2021-08-09T14:01:28Z

In the last commit I put the TxIndexState in the Connection struct because I need a mutable state and the Daemon struct is not supposed to be mutable. This is weird but it should work. Maybe I didn't made all necessary checks with the RPC call to getindexinfo, tell me if it is an issue.

The TxIndexState is Unknown when connection is initialised. It is checked when the daemon must retrieve transactions and only if it is Unknown or was Syncing one hour ago. The daemon do not add the blockhash to the getrawtransaction if the TxIndexState is Synced, which should increase perf.

@romanz It's ok, @Kixunil already made this PR much much better :) it should be easier to review now (I hope so !)

Kixunil · 2021-08-09T14:10:30Z

        let mut args = json!([txhash.to_hex(), verbose]);
-        if let Some(blockhash) = blockhash {
-            args.as_array_mut().unwrap().push(json!(blockhash.to_hex()));
+        let mut conn = self.conn.lock().unwrap();


Making RPC calls under lock makes my hair stand up. The risk of deadlocks is too great IMO and we already had deadlock issues so I'd really, really like to avoid it. (Disclaimer: I think we already have such things and I think they are not good and there's no good reason to repeat such risky design.)

Actually this is unavoidable here because it's master not p2p. I'm not sure if it even makes sense to try to improve master now. @romanz do you expect a master-based release before 0.9 is out?

~~I don't know if there is a simple way to avoid that here. I think this has to do with the fact that we shouldn't do that here at the first place (but Core should have).~~

~~The other solution I see is to add txindex as field of Daemon struct and use mutable references to Daemon each time we need to get a transaction. I find it quite disruptive...~~

Edit: just saw your edit lol x)

The thing is, this perf improvement is really worth it in master not so much in p2p because p2p is not using txid anywhere except when you already know the txid, which needs only one call... (because we index the blockheight instead so we must retrieve full blocks already so it is only an improvement in case the users already know the txid, it's something but meh)

So yeah, it depends if p2p is the next released or not :)

Kixunil · 2021-08-09T14:12:56Z

+                args.as_array_mut().unwrap().push(json!(blockhash.to_hex())); 
+            }
+            (TxIndexState::Syncing{last_check}, Some(blockhash)) => {
+                if last_check.elapsed() > Duration::from_secs(3600) {


1h seems quite pessimistic to me. AFAIR syncing is not terribly slow so I'd prefer trading a little bit of performance during sync for better performance after it. 10 mins maybe?

I think I was worried about the cost of calling getindexinfo for each transaction (the reason why we introduced TxIndexState at the first place). But 10 min seems fair too, this call is quite fast after all :)

Kixunil · 2021-08-09T14:16:21Z

+enum TxIndexState {
+    /// Didn't check yet, re-check next time
+    Unknown,
+    /// assume unavailable but re-check after a minute since `last_check`


This comment also says after a minute while the code waits an hour. They should be the same once we agree on the period.

Gonna make it 10 minutes !

Kixunil · 2021-08-09T15:13:25Z

the Daemon struct is not supposed to be mutable

I think you mean it's shared? You can mutate shared structs if you do it via safe method such as mutex, rwlock or atomic variables but Connection is actually a good place here because it will get cleared automatically when the connection fails.

The best way to do it in p2p I can think of now is having the state inside Mutex and have another state Checking(CondVar) where threads register to be woken up by the checker thread after the check is done so that multiple threads don't attempt to check same thing simultaneously. Obviously one would lock the mutex, check current state and if the state needs updating register condvar and unlock. This should be combined with helper atomic to avoid Mutex in case the state doesn't need checking anymore.

Pantamis · 2021-08-09T17:44:16Z

the Daemon struct is not supposed to be mutable

I think you mean it's shared? You can mutate shared structs if you do it via safe method such as mutex, rwlock or atomic variables but Connection is actually a good place here because it will get cleared automatically when the connection fails.

The best way to do it in p2p I can think of now is having the state inside Mutex and have another state Checking(CondVar) where threads register to be woken up by the checker thread after the check is done so that multiple threads don't attempt to check same thing simultaneously. Obviously one would lock the mutex, check current state and if the state needs updating register condvar and unlock. This should be combined with helper atomic to avoid Mutex in case the state doesn't need checking anymore.

Really interesting, I have so much to learn. Indeed you are right, it seems more appropriate to hold IndexState in Connection struct.

As I said in an other comment, the perf improvement is really worth it in master but not so much in p2p because p2p is not using txid anywhere except when you already know the txid. So we would only see perf improvments when the user must retrieve a transaction from its txid, there is no improvment possible when retrieving address history or spending transaction in p2p because we must already load the whole block that contains it to find it.

In master, load_txn is used to scan the set of candidate transactions and it used its knowledge of the txid we get from the store index. So the perf would greatly benefit from txindex=1

Kixunil · 2021-08-09T20:41:24Z

because we must already load the whole block that contains it to find it.

Ah, I missed this little detail. I think we should measure if slowness of RPC is so big that it's worth loading whole block from P2P and searching and consider using RPC if not.

Pantamis · 2021-08-09T21:15:18Z

because we must already load the whole block that contains it to find it.

Ah, I missed this little detail. I think we should measure if slowness of RPC is so big that it's worth loading whole block from P2P and searching and consider using RPC if not.

There are several tradeoffs here. The p2p branch's index is very very small because we only store the blockheight but this requires retrieving full block when searching for a transaction under certain criteria (funding, spending or txid).

I think we can safely say if txindex is enabled we should discard txid store index, this would reduce electrs' index even more and retrieving transactions with txindex in Core with RPC is very very efficient ! (it directly reads the full transaction, txindex return the position in blockfile of the transaction with the given txid, ElectrumX requires txindex and is faster than ElectRS for example)

This however cannot be used to retrieve transactions based on scripthash and spending outpoint because we only index the blockheight, txindex will not help in perf here because we must retrieve the full block for each transaction anyway (and this is quite efficient with P2P). The only way to increase perf with txindex would be that the index return the txid but it implies it is bigger so...

Kixunil · 2021-08-10T04:57:24Z

Damn, it feels like my knowledge of electrs is diminishing over time. :( You're right.

It got me thinking p2p stores 32 bits of block height which means it can run for 81808 years assuming 210000 blocks every four years. Block size is capped at 4M, which needs 22 bits to represent any offset. If we added 2B to every input and output we could represent block height for up to 1278 years and any offset. This is an increase in size by 11.11-16.67 % traded for possibly significant speed improvement.

Pantamis · 2021-08-10T09:43:47Z

Damn, it feels like my knowledge of electrs is diminishing over time. :( You're right.

I would rather say that the fact that p2p branch and master are so different and coexist since half a year explain the difficulties x)

It got me thinking p2p stores 32 bits of block height which means it can run for 81808 years assuming 210000 blocks every four years. Block size is capped at 4M, which needs 22 bits to represent any offset. If we added 2B to every input and output we could represent block height for up to 1278 years and any offset. This is an increase in size by 11.11-16.67 % traded for possibly significant speed improvement.

It could be interesting to look at what txindex is storing exactly. We may also need to have the block offset in the blockfile too, which exists in block index of Core but if I understand the discussion in #308 well there is no RPC command to ask it.

What do you mean by adding "2B to every input and output" ? What I understand from your proposal: instead of using the 32 bits to store the blockheght, we use 22 bits to store an offset in the block and the 10 other for height.

Kixunil · 2021-08-10T09:50:07Z

Electrs doesn't access block files in p2p directly anymore, so offset is not needed.

What do you mean by adding "2B to every input and output" ? What I understand from your proposal: instead of using the 32 bits to store the blockheght, we use 22 bits to store an offset in the block and the 10 other for height.

4B currently + 2 additional bytes == 6B == 48 bits
That's 22 bits for offset (up to 4 MiB) and 26 bits for block height (~1278 years)

Note that meanwhile I looked at the code and noticed whole block is parsed into transactions, allocating each - the proposed change assumes this gets optimized.

Pantamis · 2021-08-10T10:08:28Z

Electrs doesn't access block files in p2p directly anymore, so offset is not needed.

Aww I think I got it ! You want to get the block through P2P then using the stored offset to not have to scan all the transactions to find which one is spending an outpoint or using the script with the hash we are searching for.

Actually I really like this idea, I think it can improve a lot the perf at low cost !

4B currently + 2 additional bytes == 6B == 48 bits
That's 22 bits for offset (up to 4 MiB) and 26 bits for block height (~1278 years)

Ok, if RocksDB is still performing well with 48 bits instead of 32 bits in stored value, it is clearly fine !

Note that meanwhile I looked at the code and noticed whole block is parsed into transactions, allocating each - the proposed change assumes this gets optimized.

It doesn't seems hard to receive the raw block data and only start processing it when the offset is reach until we get a valid transaction. The biggest part seems too me detecting when the transaction is actually complete soon enough. Then we parse it and check that it is indeed satisfying the criteria we were searching for (scripthash, spending an outpoint or txid)

Kixunil · 2021-08-10T10:19:51Z

if RocksDB is still performing well with 48 bits instead of 32 bits in stored value

AFAIU RocksDB works on raw byte arrays so it shouldn't care about number of bytes.

It doesn't seems hard to receive the raw block data and only start processing it when the offset is reach

We need to implement it ourselves though - bitcoin library is currently hostile to this kind of optimization. Was thinking of proposing alternative APIs some time ago but never got the time. Also would be better to have some measurement before we claim it actually improves anything. But anyway, block parsing is actually quite easy - I did it in the past to learn more about Bitcoin although without SegWit.

Pantamis · 2021-08-11T14:46:46Z

But anyway, block parsing is actually quite easy - I did it in the past to learn more about Bitcoin although without SegWit.

One other thing we can do then (if block parsing is quite fast) is to store the index position of the transaction in the block along with blockheight instead. The smallest possible transaction size seems to be 61 bytes. This means a block has at most 4M/(4*61) = 16394 transactions which means 14 bits are sufficient (in theory 15 but I am highly conservative already on maximum number of transaction in a block already) to store the position of the transaction.

So we can have 14 bits for position in block and 48 - 14 = 34 bits for blockheight.

This avoids having to filter every transaction of the block which already allow a good speed up if it is easy to extract the transaction from its position in the block !

Kixunil · 2021-08-11T15:55:51Z

which already allow a good speed up

Did you actually measure it? As far as I understand you only save the cost of hashing and comparing hashes, which I don't expect to be very high. Also making sure electrs can run for longer than a few hundred years sounds like waste of time. There will be new updates at least to resolve the hardfork required to fix time overflow.

However, with your scheme we could just add a single byte which would be 40 bits - 26 for height (same number, yay!), 14 for index. So only 5.55-8.34% increase.

Just checked that 14 vs 15 bits is just 10 tx difference. To be on the safe side we could store overflow as highest number and continue linear search if tx is not found there.

Pantamis · 2021-08-11T17:46:17Z

Did you actually measure it? As far as I understand you only save the cost of hashing and comparing hashes, which I don't expect to be very high.

In fact you can save a lot more ! This is how we find a transaction with an output with a certain scripthash in blocks that the store index return in p2p branch for this scripthash:

electrs/src/status.rs

Lines 261 to 264 in 94efd3a

    
           self.for_new_blocks(funding_blockhashes, daemon, |blockhash, block| { 
        
               let txids: Vec<Txid> = block.txdata.iter().map(|tx| tx.txid()).collect(); 
        
               for (pos, (tx, txid)) in block.txdata.into_iter().zip(txids.iter()).enumerate() { 
        
                   let funding_outputs = filter_outputs(&tx, &self.scripthash);

electrs/src/status.rs

Lines 395 to 402 in 94efd3a

    
           fn filter_outputs(tx: &Transaction, scripthash: &ScriptHash) -> Vec<u32> { 
        
               let outputs = tx.output.iter().zip(0u32..); 
        
               outputs 
        
                   .filter_map(move |(txo, vout)| { 
        
                       if ScriptHash::new(&txo.script_pubkey) == *scripthash { 
        
                           Some(vout) 
        
                       } else { 
        
                           None

We iterate on all outputs of all transactions of all blocks returned by the index to find which transactions have an output with the same scripthash.

If we had the position in block of the transactions at the first place we wouldn't have to iterate on all outputs of all transactions like this for each block just to find which transaction has an outpoint with the scripthash.

This is the same when you search a transaction with a certain input, I have to scan all inputs of all transactions of a block to find one that spend an outpoint when implementing outpoint.subscribe in #446.

When you want to get a transaction while knowing the txid yeah it is a bit different because Core will find it for us (using the blockhash or not). So you don't even need to ask for the block.

Storing the transaction position in the block avoid all the checks on all other transaction in the same block. So I think it should really be faster !

However, with your scheme we could just add a single byte which would be 40 bits - 26 for height (same number, yay!), 14 for index. So only 5.55-8.34% increase.

Just checked that 14 vs 15 bits is just 10 tx difference. To be on the safe side we could store overflow as highest number and continue linear search if tx is not found there.

Great ideas ! Really good 👍

Kixunil · 2021-08-12T10:33:51Z

Note that one still needs to iterate over all txins and txouts in order to even parse the transaction. (there's no transaction size field, so you must decode input count as varint, decode all script sizes as varints + do some math to compute next offset then decode output count, decode all output script sizes)

So the additional overhead is just hashing and comparing.

I suggest we write an isolated benchmark that will try to find a transaction as fast as possible using all three methods: seek to position, seek to index, exhaustive search. Also maybe consider only optimizing script finding which requires hashing - input finding only requires comparison, so it could be fast enough. Then we can learn how much it helps.

Pantamis · 2021-08-12T16:44:08Z

Note that one still needs to iterate over all txins and txouts in order to even parse the transaction. (there's no transaction size field, so you must decode input count as varint, decode all script sizes as varints + do some math to compute next offset then decode output count, decode all output script sizes)

Ow ok, so you have to do it once, not sure if it improves anything then. So indeed a benchmark is needed to know...

Pantamis · 2021-09-18T09:49:15Z

p2p is merged.

The improvment of this PR was great for old index, with the new one it is less valuable. It will be even less useful when Bitcoin Core 23.0 will be released since the perf loss from specifying blockhash will be resolve at Core level.

The idea is still something interesting for new indexing. In particular, it could be great to add the option to not index txid at all with ElectRS if txindex is used in Bitcoin Core. This way, we don't lose time in searching the blockhash when we have to get a transaction (without the whole block) from ElectRS.

Personnaly, to improve the perf slightly I simply replaced blockhash by None in get_transaction function before compiling ElectRS.

I think the goal is now very different from what this PR is doing, so I close it. Thanks for your time on its review @Kixunil !

Kixunil · 2021-09-18T10:45:21Z

when Bitcoin Core 23.0 will be released since the perf loss from specifying blockhash will be resolve at Core level.

Curious about this, do you have more information?

Thanks for your time

Thank you for your contribution as well! PR is probably the best thing one can do for an Open Source project. I hope you're enjoying Rust and will see PRs from you again. :)

romanz · 2021-09-18T10:57:27Z

Curious about this, do you have more information?

IIUC, this is handled by bitcoin/bitcoin#22383.

Pantamis · 2021-09-18T13:04:24Z

IIUC, this is handled by bitcoin/bitcoin#22383.

Yep, exactly, a bit sad that a such small change (but important !) didn't make it on time for 22.0 release x')

Thank you for your contribution as well! PR is probably the best thing one can do for an Open Source project. I hope you're enjoying Rust and will see PRs from you again. :)

I have a very busy period to go through first, then I will come back to contribute to bitcoin-rust projects :P with RGB and rust-lightning I think there is a bright futur for Rust usage in Bitcoin ! I want to do more !

Check txindex and use blockhash if disabled

ab78a9b

Pantamis mentioned this pull request Aug 2, 2021

Speeding up getrawtransaction when txindex=1 #451

Closed

Kixunil suggested changes Aug 2, 2021

View reviewed changes

Fix and introduce TxIndexState

2ccc575

Kixunil reviewed Aug 2, 2021

View reviewed changes

TxIndex state handling

5f62240

TxIndexState support in Connection

b796fef

Kixunil reviewed Aug 9, 2021

View reviewed changes

Pantamis mentioned this pull request Aug 26, 2021

Optimize batched RPC lookups in P2P branch #464

Closed

Pantamis closed this Sep 18, 2021

Conversation

Pantamis commented Aug 2, 2021

Uh oh!

Kixunil left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

romanz commented Aug 9, 2021

Uh oh!

Pantamis commented Aug 9, 2021

Uh oh!

Kixunil Aug 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Pantamis Aug 9, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Kixunil commented Aug 9, 2021

Uh oh!

Pantamis commented Aug 9, 2021

Uh oh!

Kixunil commented Aug 9, 2021

Uh oh!

Pantamis commented Aug 9, 2021

Uh oh!

Kixunil commented Aug 10, 2021

Uh oh!

Pantamis commented Aug 10, 2021

Uh oh!

Kixunil commented Aug 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Pantamis commented Aug 10, 2021

Uh oh!

Kixunil commented Aug 10, 2021

Uh oh!

Pantamis commented Aug 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Kixunil commented Aug 11, 2021

Uh oh!

Pantamis commented Aug 11, 2021

Uh oh!

Kixunil commented Aug 12, 2021

Uh oh!

Kixunil Aug 9, 2021 •

edited

Loading

Pantamis Aug 9, 2021 •

edited

Loading

Kixunil commented Aug 10, 2021 •

edited

Loading

Pantamis commented Aug 11, 2021 •

edited

Loading