Single stream processing #49
… and dataset rows in the filecatalog / datasets table.
…e of the table doesn't change out from under us.
…tput file minus the .root extension) to the production status entry.
…ction setup. We will replace the $(streamname) token with _X_ to shorten it and eliminate characters that might otherwise need to be escaped.
... cleanup ...
…uniquely defines a job. ...
…/slurp into single-stream-rebase
… the latest greatest version of main...
This PR ought to merge in the changes necessary to support single stream processing in the streaming event builder.
def update_production_status( update_query, retries=10, delay=10.0 ):
This looks like a zombie method... removed in previous PRs, but this PR is bringing
it back from the dead.
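For reference, the retries/delay defaults in the signature suggest a retry-on-transient-failure loop. A minimal sketch of that pattern, assuming sqlite3-style exceptions; the real status-DB handle and the dbQuery helper are not in this excerpt, so a throwaway in-memory database stands in for them:

```python
import sqlite3
import time

def update_production_status(update_query, retries=10, delay=10.0):
    """Retry update_query on transient OperationalError (e.g. 'database
    is locked').  Hypothetical sketch only: the production code's DB
    connection and dbQuery helper are outside this PR excerpt."""
    last_error = None
    for attempt in range(retries):
        try:
            conn = sqlite3.connect(":memory:")  # stand-in for the status DB
            conn.execute("create table production_status (status text)")
            conn.execute(update_query)
            conn.commit()
            conn.close()
            return True
        except sqlite3.OperationalError as err:
            last_error = err
            time.sleep(delay)  # back off before the next attempt
    raise last_error
```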
#pprint.pprint(values)
#exit(0)
statusdbw.execute(insert)
TODO: switch to dbQuery.
statusdbw.execute( insert )
statusdbw.commit()
try:
# Build dictionary of DSTs existing in the datasets table of the file catalog. For every DST that is in this list,
# we know that we do not have to produce it if it appears w/in the outputs list.
dsttype="%s_%s_%s"%(name,build,tag) # dsttype aka name above
TODO: May want to use 'name_' in the dsttype... I believe we want streamname substituted here.
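If streamname substitution is what's wanted here, it could follow the token replacement described earlier in the conversation (the $(streamname) token replaced with _X_). A hypothetical sketch; the example values of name, build, and tag below are made up for illustration:

```python
# Hypothetical DST name carrying the $(streamname) token; per the
# earlier comment, the token is replaced with _X_ to shorten it and
# avoid characters that would otherwise need escaping.
name  = "DST_$(streamname)_TRIGGERED"   # assumed example value
build = "new"                           # assumed
tag   = "ana.999"                       # assumed
dsttype = "%s_%s_%s" % (name.replace("$(streamname)", "_X_"), build, tag)
```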
for output_, tuple_ in dstnames.items():
    dt, ds = tuple_
    exists = { c.filename : ( c.runnumber, c.segment ) for c in fccro.execute( f"select filename, runnumber, segment from datasets where runnumber>={runMin} and runnumber<={runMax} and dsttype='{dt}' and dataset='{ds}'" ) }
This is very pythonic... but we should be using the standard dbQuery function for the database access.
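The dbQuery helper's signature isn't shown in this excerpt, so as an interim illustration, the same lookup with bound parameters (standard DB-API placeholders) at least removes the string-interpolation risk of the f-string. The table contents and dataset names below are made up; sqlite3 stands in for the real fccro cursor:

```python
import sqlite3

# Stand-in file catalog with only the columns this query touches; the
# real fccro cursor (and the dbQuery wrapper) are outside this excerpt.
fc = sqlite3.connect(":memory:")
fc.execute("create table datasets"
           " (filename text, runnumber int, segment int, dsttype text, dataset text)")
fc.execute("insert into datasets values"
           " ('DST_X_run42-0000.root', 42, 0, 'DST_X', 'ana.999')")

query = ("select filename, runnumber, segment from datasets"
         " where runnumber>=? and runnumber<=? and dsttype=? and dataset=?")

# Bound parameters instead of f-string interpolation: the driver handles
# quoting, and run ranges / dataset names cannot inject SQL.
exists = {
    filename: (runnumber, segment)
    for filename, runnumber, segment in fc.execute(query, (40, 50, "DST_X", "ana.999"))
}
```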
on lfnlist.filename=files.lfn;
"""
#print(fcquery)
lfn2pfn = { r.lfn : r.pfn for r in fccro.execute( fcquery ) }
Again with the dbQuery comment.
# and (3) the hash of the local github repository where the payload scripts/macros are found.
#
repo_dir = payload #'/'.join(payload.split('/')[1:])
repo_dir = payload
may need to forcibly strip off whitespace and/or a trailing '/' from the path. TBD.
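One possible normalization, if that route is taken; the exact rule (strip surrounding whitespace, drop a trailing '/') is only speculated in the comment above, and the example path is made up:

```python
payload = " /tmp/payload/ "  # hypothetical raw input path
# Strip surrounding whitespace and any trailing '/' so downstream path
# joins and repo-hash lookups see a canonical directory name.
repo_dir = payload.strip().rstrip("/")
```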
klendathu2k left a comment
Should be ready to merge and test.
Note... this may have been superseded by PR #64...
Work in progress. Submits streaming detectors by host.