Conversation
…hit up intel when testing
|
|
||
| impl Pccs { | ||
| /// Creates a new PCCS cache using the provided URL or Intel PCS default | ||
| pub fn new(pccs_url: Option<String>) -> Self { |
There was a problem hiding this comment.
question:
maybe use builder pattern here?
| .trim_end_matches("/sgx/certification/v4") | ||
| .trim_end_matches("/tdx/certification/v4") |
There was a problem hiding this comment.
question:
why do we need this?
There was a problem hiding this comment.
In case the user gives a PCCS url with the full route name for collateral fetching - which will not work since we add it later. Strictly speaking we don't really need this.
There was a problem hiding this comment.
i.m.o. (not a strong one, though) it can easily turn into a road paved with good intentions.
possible scenarios:
- there comes around
/tdx/certification/v5,/tdx/certification/v6,/tdx/certification/whatnotand we fail to keep up, or forget, or whatever. - user explicitly wants to use
/tdx/certification/v4and not/tdx/certification/v5but we strip/tdx/certification/v4and add/tdx/certification/v5in the end, which results in somewhat obscure and hard-to-detect misbehaviour (i.e. it doesn't fail right on the spot, but keeps working only to act differently somewhere deep down the road => leading to a spike of wtf/min metric on the user's side, until they dig into source-code, figure out what's really going on, and profoundly thank us for this design choice).
^-- which is why I don't feel at ease with those implicit url path manipulations. today their intent is clear and innocent, tomorrow it might back-fire in unpredictable ways.
in other words, our users (if any) are supposed to know what they're doing. so, we don't need to pad their path with pillows every step of the way.
| tokio::spawn(async move { | ||
| let outcome = pccs_for_prewarm.startup_prewarm_all_tdx().await; | ||
| pccs_for_prewarm.finish_prewarm(outcome); | ||
| }); |
There was a problem hiding this comment.
quesetion:
what happens if we are not in tokio runtime?
There was a problem hiding this comment.
We panic.
Currently we use dcap_qvl to fetch quote collateral which internally uses reqwest's async api for the fetch. Strictly speaking this doesn't need to mean a tokio runtime, but practically, on non-wasm targets, since async-std is now deprecrated, there are not many other options.
My proposal would be to document that this requires a tokio runtime. The alternative would be to re-implement get_collateral_for_fmspc using reqwest's synchronous api if tokio::runtime::Handle::try_current() fails.
That said, there are some other reasons why we might want our own implementation of get_collateral_for_fmspc, as it does quite a bit of unneeded fetching during the initial pre-warm. But the disadvantage is more code to maintain ourselves.
| /// Resolves when cache is pre-warmed with all available collateral | ||
| pub async fn ready(&self) -> Result<PrewarmSummary, PccsError> { | ||
| let mut outcome_rx = self.prewarm_outcome_tx.subscribe(); | ||
| loop { |
There was a problem hiding this comment.
question:
can it happen that startup_prewarm_all_tdx panics (or gets aborted somehow)?
if "yes" then ready() can get stuck forever, right?
There was a problem hiding this comment.
Yeah i guess we could add a timeout and return an error on timeout.
The most likely thing to go wrong is the PCCS requests timeout - and dcap_qvl uses a 180 second timeout for these. But i guess no harm in adding a global timeout for that task.
… relevant to the collateral
… peg/add-pccs-2
0x416e746f6e
left a comment
There was a problem hiding this comment.
🚢 let's merge it and address whatever remains in further issues/PRs
| .trim_end_matches("/sgx/certification/v4") | ||
| .trim_end_matches("/tdx/certification/v4") |
There was a problem hiding this comment.
i.m.o. (not a strong one, though) it can easily turn into a road paved with good intentions.
possible scenarios:
- there comes around
/tdx/certification/v5,/tdx/certification/v6,/tdx/certification/whatnotand we fail to keep up, or forget, or whatever. - user explicitly wants to use
/tdx/certification/v4and not/tdx/certification/v5but we strip/tdx/certification/v4and add/tdx/certification/v5in the end, which results in somewhat obscure and hard-to-detect misbehaviour (i.e. it doesn't fail right on the spot, but keeps working only to act differently somewhere deep down the road => leading to a spike of wtf/min metric on the user's side, until they dig into source-code, figure out what's really going on, and profoundly thank us for this design choice).
^-- which is why I don't feel at ease with those implicit url path manipulations. today their intent is clear and innocent, tomorrow it might back-fire in unpredictable ways.
in other words, our users (if any) are supposed to know what they're doing. so, we don't need to pad their path with pillows every step of the way.
Adds an internal Provisioning Certifcate Cache Service (PCCS) crate which pre-fetches collateral and pre-emptively refreshes it, with the goal of keeping colleteral fetching out of the hot path when verifying attestations.
This copies the functionality from theses PRs to attested-tls-proxy:
API Routes used internally
Here are the routes hit during initial caching - documented in Intel PCCS spec:
GET
https://api.trustedservices.intel.com/sgx/certification/v4/fmspcsSource: src/lib.rs:268
dcap_qvl::collateral::get_collateral_for_fmspcfor each, with both 'processor' and 'platform' as the 'ca' arguement.GET
https://api.trustedservices.intel.com/sgx/certification/v4/pckcrl?ca={processor|platform}&encoding=derSource builder: url_pckcrl():69
GET
https://api.trustedservices.intel.com/tdx/certification/v4/tcb?fmspc={FMSPC}Source builder: url_tcb():77
GET
https://api.trustedservices.intel.com/tdx/certification/v4/qe/identity?update=standardSource builder: url_qe_identity():81
- This gets the identity of the quoting enclave
Example demonstrating pre-warm
Included with the
pccscrate is an example which demonstrate the warm-up cache filling using the Intel PCS.Here is an example output which shows which FMSPCs end up in the cache and which are rejected as being not relevant for TDX quote verification:
Show output revealing which FMSPCs are cached
Possible optimisation
It takes quite a while. We could drastically reduce the number of API calls in the warm-up, as many are not FMSPC dependent and are made redundantly. But this would comes at the cost of rolling more of this ourselves and having less code maintained by Phala.
Show breakdown of how many API calls we could save
Assumingg:
Right now, each of those 74 attempts triggers these PCS calls:
Plus the initial:
So current PCS call count is roughly:
If you cache the shared routes during prewarm:
That becomes:
So the saving is: