[Moller_intdet] [EXTERNAL] Question: ADC board software - Too many files open core dump

Michael Gericke Michael.Gericke at umanitoba.ca
Tue Jan 21 11:01:49 EST 2025


Hi everyone,

Sorry if you get this multiple times (I am sending to several email 
lists with significant overlap).

I have an annoying problem. For the PMT testing, I want to run through a
given set of tests for each PMT in one sitting (DAQ and analysis program
running continuously), which means starting the ADC DAQ on the computer
end once and (currently, ideally) letting it collect upwards of about
1000 5-second runs at a time for each PMT. The program runs fine for
about 180 5-second runs and then core dumps with "Too many open files
(src/epoll.cpp:38)".

If you have encountered this problem and figured out how to solve it 
(beyond the band aid suggestions given in online posts), can you please 
let me know.

Thanks,

Michael


Some more details ...


I am writing both ROOT files and binary files for the raw data. The
code uses a separate thread to write the ROOT trees, while it keeps
getting data from the ADC board continuously.

The thread that writes the ROOT trees and stores them to file is started
only once, and within it ROOT files are opened and closed in the same
loop. The main process writes the raw data file and handles the
communication with the ADC board. The raw data files are written in a
function that both opens and closes each file that is being written.
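
One way to make the "opens and closes each file" guarantee hold on every exit path, including early returns on error, is to let a smart pointer own the handle. The following is a minimal sketch under that assumption, not the code in CMData.cxx; FileCloser, FilePtr and WriteRawRun are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <vector>

// Deleter that closes the stream when the owning pointer goes out of scope.
struct FileCloser {
    void operator()(std::FILE* f) const { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical stand-in for a raw-data writer: one file per run.
// Returns true only if every byte was written and flushed.
bool WriteRawRun(const char* path, const std::vector<unsigned char>& data) {
    assert(path != nullptr);
    FilePtr file(std::fopen(path, "wb"));
    if (!file) return false;  // open failed, so there is nothing to close
    const std::size_t written =
        std::fwrite(data.data(), 1, data.size(), file.get());
    if (written != data.size()) return false;  // fclose() still runs here
    return std::fflush(file.get()) == 0;       // fclose() runs on return
}
```

Because the deleter runs whenever the pointer leaves scope, a failed write cannot leave a descriptor behind. The same pattern applies to any handle that has a matching release call.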

There are two zmq sockets for communication with the ADC (control and 
data) and each is opened and closed for each 5 second data chunk that is 
being received from the ADC
(each single run).

I scoured forums and various information sources, but most of the posts
I find suggest that one needs to increase the nofile parameter in
/proc/sys/fs/file-max or similar. I have done that, but it doesn't
really change anything. It just allows me to take more runs, but doesn't
solve the problem.
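
For reference, "Too many open files" is the text for EMFILE, the per-process descriptor limit (RLIMIT_NOFILE, the value `ulimit -n` reports), whereas /proc/sys/fs/file-max is the system-wide total. A minimal sketch for reading the limit that applies to the running DAQ process follows; DescriptorLimits is a hypothetical helper name.

```cpp
#include <sys/resource.h>
#include <cassert>

// Returns the soft and hard limits on open descriptors for this process.
struct rlimit DescriptorLimits() {
    struct rlimit lim{};
    const int rc = getrlimit(RLIMIT_NOFILE, &lim);
    assert(rc == 0);  // getrlimit only fails for invalid arguments
    (void)rc;         // avoid an unused-variable warning when NDEBUG is set
    return lim;       // rlim_cur is the soft limit, rlim_max the hard limit
}
```

Printing rlim_cur once at startup shows how many descriptors the process can hold before the crash. If the number of runs completed before the core dump scales with that value, the cause is a steady per-run leak rather than a limit that is simply too low.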

In principle, I should be able to run this indefinitely (aside from 
storage space issues).

Of course I could just stop the process and restart periodically 
(collecting fewer runs at a time), but I want to move through a series 
of PMT voltages in one run series, with
as little time as possible between changing the HV.

I have encountered this crash every time I start the program and run for 
extended periods of time (1 hour or more) and it is driving me crazy, 
because I think the program is written
such that there should not be "many" open files at a time (I think maybe 
at most 5 at any given time).
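
The "at most 5" estimate can be checked directly. On Linux, /proc/self/fd holds one entry per open descriptor, and that includes sockets, eventfds and epoll instances as well as regular files. The sketch below counts those entries; CountOpenDescriptors is a hypothetical helper, not something in CMData.cxx, and it assumes /proc is mounted.

```cpp
#include <dirent.h>
#include <cassert>
#include <cstddef>

// Counts the descriptors currently open in this process (Linux only).
// Returns 0 if /proc/self/fd cannot be read.
std::size_t CountOpenDescriptors() {
    DIR* dir = opendir("/proc/self/fd");
    if (dir == nullptr) return 0;
    std::size_t n = 0;
    while (const dirent* entry = readdir(dir)) {
        if (entry->d_name[0] != '.') ++n;  // skip "." and ".."
    }
    closedir(dir);
    assert(n > 0);  // the directory's own descriptor is always listed
    return n - 1;   // exclude the descriptor opendir() itself used
}
```

Logging this value after each 5-second run would show whether it stays flat or climbs by a fixed amount per run. Running `ls -l /proc/<pid>/fd` against the live process then shows which kind of descriptor is accumulating.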

The only way around this at the moment is to use a script that restarts 
the ADC software end after some number of runs before this crash occurs. 
That's okay, but doesn't resolve
the basic problem.

Somehow I feel like there is either a bug in ZMQ or I am doing something 
wrong (likely the latter, but some posts I found seem to suggest the 
possibility of the former).

I don't want to throw a bunch of code at you. For now, I am just
wondering if anyone here has encountered a similar issue and could guess
what the problem is.

If anyone wants to take a look (if you have time), the code is located here:

https://github.com/mtgericke/MOLLER-IntElec-PMTGain-Meas

The relevant file is CMData.cxx

The relevant functions are:

void CMData::StartDataCollection() - file lines 712-733, 758-781

void *CMData::GetServerData(void *vargp) - file line 826

void* CMData::FillRootTreeThread(void *vargp) - file lines 1070, 1293

void* CMData::GetSocket(SockType type) - file line 458



