Bug #4664

jigsaw - runs significantly slower under new OS/isis3 versus old OS/isis3

Added by Lynn Weller 9 months ago. Updated 2 months ago.

Status: Closed
Priority: High
Category: Applications
Target version:
Impact: jigsaw, deltack
Software Version:
Test Reviewer:

Description

The issue is apparent with a moderate sized network if you are paying attention, but it screams at you with a large, global network.
I discovered this while running jigsaw on my Themis IR global network on astrovm4 using isis3production2017-02-07. Under this version my network took over 14 hours to run, whereas the identical run on astrovm3 using isis3production2016-11-22 took just over 2 hours. Since no one wants to work with that much data (33603 images, 1147041 points and 5411726 measures), I verified that the issue also appears with a smaller network (2760 images, 78402 points and 492593 measures) run in the same manner.

NOTE: I did not use a stopwatch, but while running jigsaw under astrovm3 and astrovm4 almost simultaneously I noticed that astrovm4/new isis3 was slower at "Setting input images..." than astrovm3/old isis3. In fact, I started the astrovm4 run first, then started the astrovm3 run at most a minute later, and astrovm3 started iterating before astrovm4 because astrovm4 was still setting input images. Clue? On the same note, I think qnet on astrovm4 takes longer to load images than it does on astrovm3.

Test data and runs can be found here:
/work/users/lweller/Isis3Tests/Jigsaw/TimeDiff/
see astrovm3/ for the output run on astrovm3 using isis3production2016-11-22
astrovm4/ for the output run on astrovm4 using isis3production2017-02-07

The image list and network is available under each subdirectory. See jigproc.scr for what was executed.

The astrovm3/old isis3 run took about 170.6s per iteration and about 13 minutes to run whereas astrovm4/new isis3 took about 288.4s per iteration and about 19.5 minutes to run. I was the only person running anything on either of these systems at the time.

Here's my jigsaw command for one of the runs (the only diff in the other is the file_prefix):
jigsaw from=Cebrenia_DayIR_Final.lis cnet=Cebrenia_DayIR_Final.net onet=JigOut_Cebrenia_DayIR_Final.net radius=yes update=no errorpropagation=no outlier_rejection=no sigma0=1.0e-10 maxits=10 camsolve=accelerations twist=yes overexisting=yes spsolve=position overhermite=yes camera_angles_sigma=.25 camera_angular_velocity_sigma=.1 camera_angular_acceleration_sigma=.01 spacecraft_position_sigma=100 point_radius_sigma=50 file_prefix=Astrovm3_isis3production2016-11-22_workdir

This mimics how I was running my global network.

Testing done when merging the ipce branch of jigsaw into trunk likely did not reveal significant changes in run time because the data sets used were small. I think a time difference was noted, but it was so small that it didn't seem relevant.

History

#1 Updated by Lynn Weller 9 months ago

  • Description updated (diff)

#2 Updated by Tammy Becker 9 months ago

  • Status changed from New to Acknowledged

#3 Updated by Makayla Shepherd 8 months ago

These are the results of timing tests run on the old OS and the new OS. The parts of jigsaw that are not a problem have been omitted:

isis3.4.13
Form Normals: 109.76
Solve: 4.89
Corrections: 1.1
Total: 173.31

isis3.5.0
Form Normals: 285.0
Solve: 34.32
Corrections: 167.8
Total: 550.26
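
The slowdown per phase implied by the figures above can be worked out with a short sketch. The function name `slowdownFactor` is hypothetical and is not part of ISIS; it only divides the newer timing by the older one, in whatever unit the figures use.

```cpp
#include <cassert>

// Hypothetical helper (not an ISIS function): how many times longer the
// new run took than the old run for one phase.
double slowdownFactor(double oldTime, double newTime) {
    assert(oldTime > 0.0);  // avoid dividing by zero
    return newTime / oldTime;
}

// Timings copied from the comment above (isis3.4.13 vs isis3.5.0).
double formNormalsFactor() { return slowdownFactor(109.76, 285.0); }
double solveFactor()       { return slowdownFactor(4.89, 34.32); }
double correctionsFactor() { return slowdownFactor(1.1, 167.8); }
double totalFactor()       { return slowdownFactor(173.31, 550.26); }
```

With these numbers, Form Normals comes out at roughly 2.6x, Solve at roughly 7x, Corrections at roughly 150x, and the total at roughly 3.2x.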

#4 Updated by Lynn Weller 7 months ago

  • Status changed from Acknowledged to In Progress
  • Assignee set to Kenneth Edmundson

Ken has made changes available by setting isis to /work/projects/isis/latest/m04664_Ken/

Elapsed time for iterations has improved and is now quicker than what we were getting via astrovm3 isis3production:

Astrovm4 m04664_Ken: Run Time: 2017-05-08T10:27:35
Total Elapsed Time: 403.7000 (seconds)

Astrovm4 production: Run Time: 2017-03-14T15:54:04
Total Elapsed Time: 866.4200 (seconds)

Astrovm3 isis3production2016-11-22 Run Time: 2017-03-14T15:50:29
Total Elapsed Time: 511.2200 (seconds)

Repeating tests and including error propagation to see if anything has changed there. Will post results when available.

Update including error prop:
m04664_Ken w/ tweak to not create the inverse matrix file:
Error Propagation Elapsed Time: 1299.4600 (seconds)
Total Elapsed Time: 1699.7000 (seconds)

Here are the numbers for the prior m04664_Ken version:
Error Propagation Elapsed Time: 1293.8400 (seconds)
Total Elapsed Time: 1696.7800 (seconds)

Astrovm4 production:
Error Propagation Elapsed Time: 1293.0000 (seconds)
Total Elapsed Time: 2049.6800 (seconds)

Astrovm3 production:
Error Propagation Elapsed Time: 1214.2800 (seconds)
Total Elapsed Time: 1733.9200 (seconds)

Ken will look into the error prop times. I will update when I have an opportunity to test.

#5 Updated by Lynn Weller 7 months ago

Ken made a change having to do with error propagation and that component is now running much quicker. Once this post is resolved I can close it!

Update including error prop:
Astrovm4, m04664_Ken w/ latest and greatest tweaks:
Error Propagation Elapsed Time: 839.4400 (seconds)
Total Elapsed Time: 1247.5200 (seconds)

Astrovm4 production:
Error Propagation Elapsed Time: 1293.0000 (seconds)
Total Elapsed Time: 2049.6800 (seconds)

Astrovm3 production:
Error Propagation Elapsed Time: 1214.2800 (seconds)
Total Elapsed Time: 1733.9200 (seconds)

#6 Updated by Kenneth Edmundson 7 months ago

  • Impact updated (diff)

#7 Updated by Kenneth Edmundson 7 months ago

  • Target version set to N/A
  • Software Version set to N/A
  • Test Reviewer set to Lynn Weller

1) The slowdown during iterations was caused by a naive and costly implementation for computing the current column in the normal equations matrix. This is a bigger concern now that we have implemented the capability to solve for a different number of parameters for different images. Now, the SparseBlockMatrix class keeps track of the starting column for each block column in the matrix. This is set only once, before the bundle starts, instead of being recalculated repeatedly.

2) The slowdown in error propagation was due to a bug where we were creating a copy of the "Q" matrix every time it was retrieved from a BundleControlPoint. Now we are getting a reference to "Q".
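
The two fixes described above can be illustrated with a minimal sketch. This is not the ISIS source: `BlockColumnIndex`, `CountingMatrix`, `PointSketch`, and all of their members are hypothetical names chosen for illustration, and the real SparseBlockMatrix and BundleControlPoint classes may be organized quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fix 1 (sketch): cache the starting scalar column of every block column.
class BlockColumnIndex {
public:
    // blockWidths[i] = number of parameters (scalar columns) in block column i.
    explicit BlockColumnIndex(const std::vector<std::size_t>& blockWidths)
        : m_widths(blockWidths), m_starts(blockWidths.size(), 0) {
        std::size_t offset = 0;  // one-time prefix sum, done before iterating
        for (std::size_t i = 0; i < m_widths.size(); ++i) {
            m_starts[i] = offset;
            offset += m_widths[i];
        }
    }

    // Naive approach: re-sums all preceding widths on every call.
    std::size_t startColumnNaive(std::size_t block) const {
        assert(block < m_widths.size());
        std::size_t offset = 0;
        for (std::size_t i = 0; i < block; ++i) offset += m_widths[i];
        return offset;
    }

    // Cached approach: constant-time lookup.
    std::size_t startColumn(std::size_t block) const {
        assert(block < m_starts.size());
        return m_starts[block];
    }

private:
    std::vector<std::size_t> m_widths;
    std::vector<std::size_t> m_starts;
};

// Fix 2 (sketch): a matrix stand-in that counts how often it is copied.
struct CountingMatrix {
    std::vector<double> values;

    explicit CountingMatrix(std::size_t n) : values(n, 0.0) {}
    CountingMatrix(const CountingMatrix& other) : values(other.values) {
        ++copyCount();
    }
    CountingMatrix& operator=(const CountingMatrix&) = default;

    static int& copyCount() { static int count = 0; return count; }
};

class PointSketch {
public:
    explicit PointSketch(std::size_t n) : m_q(n) {}

    // Returning by value copies the whole matrix on every retrieval.
    CountingMatrix qByValue() const { return m_q; }

    // Returning a const reference hands out the stored matrix without a copy.
    const CountingMatrix& qByReference() const { return m_q; }

private:
    CountingMatrix m_q;
};
```

Because block widths can differ from image to image, the naive lookup cannot be replaced by a simple multiplication, which is why a cached prefix sum is the natural choice here. For the second fix, note that the caller must also bind the result to a reference (`const CountingMatrix& q = point.qByReference();`); assigning it to a plain `CountingMatrix` variable would still make a copy.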

#8 Updated by Kenneth Edmundson 7 months ago

  • Status changed from In Progress to Resolved

#9 Updated by Lynn Weller 7 months ago

  • Status changed from Resolved to Closed

#10 Updated by Stuart Sides 2 months ago

  • Target version changed from N/A to 3.5.1 (2017-08-08 Aug)
