Feature #4842

Control network memory usage calculator/estimator

Added by Lynn Weller 9 months ago. Updated 16 days ago.

Status:
Blocked
Priority:
High
Assignee:
Category:
Applications
Target version:
Impact:

No impact on any ISIS objects or applications.

Software Version:
Test Reviewer:
Story points:
2

Description

It would be useful to have some mechanism (algorithm/script/application) by which a user can estimate how much memory control network applications require to operate on any given network and associated images. This is essential if/when internal users use interactive sessions on the nebula HPC and need to specify how much memory to request in order to get the proper resources (and not have their session killed due to insufficient memory). Larger networks (hundreds of thousands of points and millions of measures) in particular use a large amount of memory (my Themis IR global network uses over 30G of RAM and I believe the Messenger network uses over 200G), but even more moderately sized networks (tens of thousands of points and hundreds of thousands of measures) require more than a couple G of RAM (which is currently the default allocation for a cluster job). I have no idea how to guesstimate how much memory an application like cnetstats, cnetextract, qnet or jigsaw requires, and more often than not I exceed the allowable memory on the cluster, resulting in a cancelled job and lost time. Is there a way a user can estimate memory needs to avoid this sort of thing and ask for appropriate resources?

Not sure what Tracker or Category this request falls under (please change if there is something more appropriate), but I have selected High Priority because this is an ongoing issue that is not going to go away and which currently affects a number of users (not just on the cluster - astrovm4 just about crashed when over 75% of its memory was being used by jigsaw on a portion of the messenger network; the user just simply did not know this could happen) and no doubt others in the near future.

It's possible jigsaw requires a slightly different calculation to include the number of images in the network, but I honestly don't know. Help please!

Running on new OS using latest version of isis (isis3production2017-04-25) though old OS and older versions of isis demonstrate the issue as well.
I'll be happy to point a developer to a larger network if/when necessary.

History

#1 Updated by Tammy Becker 9 months ago

  • Status changed from New to Acknowledged

#2 Updated by Tammy Becker 9 months ago

  • Target version set to 3.5.1 (Sprint 1)

#4 Updated by Adam Paquette 8 months ago

  • Story points set to 2

#5 Updated by Adam Paquette 8 months ago

  • Assignee set to Adam Paquette

#6 Updated by Adam Paquette 8 months ago

  • Status changed from Acknowledged to Assigned

#7 Updated by Adam Paquette 8 months ago

  • Status changed from Assigned to In Progress

#8 Updated by Lynn Weller 8 months ago

Below are some varying sized networks to choose from. You should be able to copy the network and its associated image file list to your working directory without having to grab images. All of the images in the lists provided below reside in the same location.

Moderately sized network (but by no means huge):
/work/projects/themis_control/Completed_Tiles/MergedNetworks/DayIR/FinalTileNetworks/Cebrenia_DayIR_Ground_Final.net
/work/projects/themis_control/Completed_Tiles/MergedNetworks/DayIR/FinalTileNetworks/Cebrenia_DayIR_Ground_Final.lis

Pretty big network (1/4 of a planet):
/work/projects/themis_control/Completed_Tiles/MergedNetworks/DayIR/Themis_DayIR_Merged_TileNet_2017Apr17_Longitude180to270.net
/work/projects/themis_control/Completed_Tiles/MergedNetworks/DayIR/Themis_DayIR_Merged_TileNet_2017Apr17_Longitude180to270.lis

Very big network (global):
/work/projects/themis_control/Completed_Tiles/MergedNetworks/DayIR/Themis_DayIR_Merged_GroundNet_2017Apr17_Edit.net
/work/projects/themis_control/Completed_Tiles/MergedNetworks/DayIR/Themis_DayIR_Merged_GroundNet_2017Apr17_Edit.list

I know that jigsaw may use about 2 GB of memory for the moderately sized network; I have to allot about 5 GB of memory for cnetedit for the pretty big network; and I need to allot about 30 GB of memory to run jigsaw (without updating images, which I think doubles the memory) on the global network. I also noted that I need 15 GB of memory to extract from the very big network.

#9 Updated by Lynn Weller 8 months ago

To start by simplifying things, you might try seeing how much memory is required to run cnetstats on the pretty big network (the middle one in the list).
Here's a command you could use:

cnetstats fromlist=Themis_DayIR_Merged_TileNet_2017Apr17_Longitude180to270.lis cnet=Themis_DayIR_Merged_TileNet_2017Apr17_Longitude180to270.net create_image_stats=yes image_stats_file=ImgStats_Themis_DayIR_Merged_TileNet_2017Apr17_Longitude180to270.csv

Note, this might take 10+ minutes to run. I'm not sure, but it should be less than 30 minutes.
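One way to record how much memory a run like this actually needs is to launch it from a small Python wrapper and read the peak resident memory of the child process afterwards. The sketch below is only an illustration: the helper name peak_child_memory_gb is hypothetical, it assumes a Linux system (where ru_maxrss is reported in kilobytes), and it is not part of ISIS.

```python
import resource
import subprocess
import sys


def peak_child_memory_gb(command):
    """Run `command` (a list of arguments) and return peak resident memory in GB.

    RUSAGE_CHILDREN reports the largest peak among all child processes this
    Python process has waited for so far, so use a fresh Python process for
    each command you want to measure.
    """
    subprocess.run(command, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # Assumption: Linux, where ru_maxrss is in kilobytes (macOS reports bytes).
    return usage.ru_maxrss / (1024 ** 2)


# Example call (arguments taken from the command above, shortened here):
# peak_child_memory_gb(["cnetstats", "fromlist=...", "cnet=...",
#                       "create_image_stats=yes", "image_stats_file=..."])
```

Running this once per network size would give real measurements to compare against whatever an estimator predicts.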

And for something to be ultimately useful, we would sort of need a way of guesstimating memory based on the size of the network, if possible (is there a way to guess the memory needed for certain programs based on the number of points and measures?).
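As a rough illustration of what a points-and-measures estimate could look like, here is a minimal sketch. It assumes memory grows linearly with the number of points and measures, which nothing in this ticket confirms. The function name and the per-point and per-measure byte costs are hypothetical; they would have to be calibrated separately for each application (cnetstats, jigsaw, and so on) from measured runs.

```python
def estimate_memory_gb(num_points, num_measures,
                       bytes_per_point, bytes_per_measure,
                       overhead_gb=0.0):
    """Estimate memory in GB, assuming a linear cost per point and per measure.

    bytes_per_point and bytes_per_measure are calibration values supplied by
    the caller; no defaults are given because none are known.
    overhead_gb is a fixed safety margin added on top of the estimate.
    """
    total_bytes = (num_points * bytes_per_point
                   + num_measures * bytes_per_measure)
    return total_bytes / 1024 ** 3 + overhead_gb


# Example with made-up calibration values (NOT measured from any real network):
# estimate_memory_gb(num_points=50_000, num_measures=400_000,
#                    bytes_per_point=2_000, bytes_per_measure=1_000,
#                    overhead_gb=2.0)
```

If measured runs show that memory does not scale linearly for a given application, the formula would need to change rather than just the coefficients.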

#10 Updated by Lynn Weller 8 months ago

Here's another very big network you can test with your calculator:

/work/projects/laser_a_work/lweller/MergedNetwork/NorthPole_2015Merged_Recovered_Ground_Fix8.net

It's been a while since I've worked with this one, but I believe it will need something to the tune of 30 GB of memory.

#11 Updated by Jesse Mapel 7 months ago

There are some very large control nets from Messenger processing at /scratch/jmapel/messenger/WAC_g_DEM/combinept/nets on astrovm4. These nets are several gigs and should provide a good upper-bound for testing.

#12 Updated by Stuart Sides 7 months ago

  • Status changed from In Progress to Blocked

#13 Updated by Stuart Sides 7 months ago

Blocked until Adam returns

#14 Updated by Stuart Sides 5 months ago

  • Target version changed from 3.5.1 (Sprint 1) to 3.5.2 (2017-01-31 Jan)

#15 Updated by Makayla Shepherd 5 months ago

  • Status changed from Blocked to In Progress

#16 Updated by Adam Paquette 4 months ago

  • Impact updated (diff)

#17 Updated by Adam Paquette 4 months ago

  • Status changed from In Progress to Resolved

#18 Updated by Adam Paquette 4 months ago

The script for running this is in /usgs/cdev/contrib/bin/ and is called cnetcalculator.py. To run the script use:

python cnetcalculator.py file1, file2, ... ,filen

You can optionally include --offset X, where X is the value in gigabytes by which to offset the estimate. This defaults to 2 to be on the safe side.

#19 Updated by Lynn Weller 4 months ago

I get a permission denied error when I try to run the supplied Python script. It doesn't look like it is executable.

#20 Updated by Lynn Weller 4 months ago

  • Status changed from Resolved to Feedback

#21 Updated by Makayla Shepherd 16 days ago

  • Status changed from Feedback to Blocked

This is going to be blocked until the new ControlNet changes are made, as any estimates made now are going to be null and void when those changes come in.
