DX cluster stops working

NZ0T
DX cluster stops working

I am running CQRLOG on Mint 19.3 64 bit (just upgraded, but the problem started before the upgrade when I was on 19.2) and over the last couple of weeks my DX cluster runs for a while and then stops. I can restart it and it works for a while but it then stops again. This happens with the cluster I have used for years, DXSpots, and I have also tried VE7CC with the same problem. I am using an old Dell Optiplex 380 desktop and a Panda Wireless PAU06 dongle. This had all worked just fine for a long time. I have also tried an older 802.11b/g dongle with the same problem. I have had no other problems with anything else related to our internet wifi.

Any ideas?

73, Bill NZ0T

oh1kh
DX cluster stops working

Hi Bill!

I remember that the same kind of issue has shown up here before. It sounds so similar.
Yes, I found it: https://www.cqrlog.com/node/2198 It seems to remain unresolved.
I remember now that this happened to me a few times, but then it did not happen again. That makes it hard to catch.

How about limiting the spot rate at the DXCluster end with some rules there? Does it have any effect?

--
Saku
OH1KH

ei2idb (not verified)
DX cluster stops working

Hi Bill and Saku,
I have exactly the same problem with very frequent random disconnects on Ubuntu 18.04. I tried different servers and filters, both on the cluster side and in cqrlog, and it's not helping; I think it's even worse. I also changed the domain name to an IP address in preferences, but to no avail.
That never happens with other software. Sometimes I use tlf for contests and its telnet connection to the DX cluster works fine. Xdx https://github.com/N0NB/xdx works hour after hour without any problems.
Xdx has an option called "Keepalive packets" to turn on in case of such problems. It just sends a backspace to the server every few minutes to keep the connection open. Maybe it would be worth trying the same in cqrlog? What do you think, Saku?
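
The "Keepalive packets" idea can be sketched in a few lines. This is only an illustration in Python, not the actual code of Xdx or cqrlog: the names keepalive_loop and start_keepalive and the 180 second default interval are assumptions, and the backspace byte is taken from the description above.

```python
import socket
import threading

BACKSPACE = b"\x08"  # the byte the post says Xdx sends


def keepalive_loop(sock, interval_s, stop):
    """Send one backspace every interval_s seconds until stop is set."""
    # Event.wait() returns False on timeout and True once stop is set.
    while not stop.wait(interval_s):
        try:
            sock.sendall(BACKSPACE)
        except OSError:
            break  # the connection is gone, nothing left to keep alive


def start_keepalive(sock, interval_s=180.0):
    """Start the loop in a background thread; call .set() on the result to stop it."""
    stop = threading.Event()
    worker = threading.Thread(
        target=keepalive_loop, args=(sock, interval_s, stop), daemon=True
    )
    worker.start()
    return stop
```

Because the byte travels as ordinary data, this approach needs no special socket options, but the interval has to be shorter than whatever idle timeout sits between the client and the cluster.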

oh1kh
DX cluster stops working

Hi Bill and Slav!
Seasons greetings!

There is a document about keepalive here: http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
On the third page there is a description of how to enable keepalive for a program that doesn't do it by default.

On the first page there is a description of how to read and write the values. My Fedora has just those same values. But they are not in use if the program does not activate keepalive (as I understood it).

Making it active without changing the program can be done with libkeepalive.
I installed the package on my Fedora with the command:
sudo dnf install libkeepalive

For Ubuntu it is something like "sudo apt-get install libkeepalive"
After that cqrlog can be started like:

[saku@tpad ~]$ LD_PRELOAD=libkeepalive.so KEEPCNT=20 KEEPIDLE=180 KEEPINTVL=60 cqrlog

The parameters are explained in the web page text. It is possible to change them and do some testing.
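
For comparison, this is roughly what a program does when it activates keepalive itself. It is a minimal Python sketch for Linux, not cqrlog code; the function name enable_keepalive is made up, and the assumption that KEEPIDLE, KEEPINTVL and KEEPCNT correspond to the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options is mine.

```python
import socket


def enable_keepalive(sock, idle_s=180, interval_s=60, count=20):
    """Turn on TCP keepalive for one socket and override the system defaults."""
    # Without SO_KEEPALIVE the three timers below are never used.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Quiet time before the first probe (Linux-specific option).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Gap between probes that get no answer.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Number of unanswered probes before the connection is dropped.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


# Usage: set the options on a TCP socket before or after connecting.
# sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# enable_keepalive(sock)
```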

I do not know whether this helps. I have never needed to worry about disconnects. But it is rather easy to test.

I'm curious to hear results!

--
Saku
OH1KH

ei2idb (not verified)
Working like a charm!

Saku,
thanks for another complete and useful answer. I am busy or lazy - not sure which ;) and just copy-pasted your 'LD_PRELOAD=libkeepalive.so KEEPCNT=20 KEEPIDLE=180 KEEPINTVL=60 cqrlog' after installing the keepalive package. And it's working! No more trouble with disconnecting from the cluster.
Huge thanks,

ei2idb (not verified)
I was too optimistic

Hi Saku,
It's not working :(
I know that this is not a Linux forum and your time is limited, but I don't know if I am right after googling this matter.

AFAIK netstat --timers -tn should show the state of the TCP timers. Below, the first row shows cqrlog's connection to the HamQTH DX cluster, and its timers are always off. The next two rows show other connections with keepalive timers on:

Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       Timer
tcp        0      0 192.168.1.11:56492      94.23.250.107:7300      ESTABLISHED off (0.00/0/0)
tcp        0      0 192.168.1.11:50186      99.86.122.41:443        ESTABLISHED keepalive (1,08/0/0)
tcp        0      0 192.168.1.11:59130      35.163.194.26:443       ESTABLISHED keepalive (519,22/0/0)

Does that mean that for some reason keepalive is not active for cqrlog?
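
Reading the Timer column by eye gets tedious when many connections are open. A small Python sketch that picks out the connections without a keepalive timer, assuming the column layout shown in the listing above; the function names are made up for illustration.

```python
def parse_timers(netstat_output):
    """Return (local, remote, timer) for each TCP row of `netstat --timers -tn`."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp" or "tcp6"; the timer name is the 7th field.
        if len(fields) >= 7 and fields[0].startswith("tcp"):
            rows.append((fields[3], fields[4], fields[6]))
    return rows


def without_keepalive(rows):
    """Remote addresses of connections whose timer is not 'keepalive'."""
    return [remote for _local, remote, timer in rows if timer != "keepalive"]


SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State Timer
tcp 0 0 192.168.1.11:56492 94.23.250.107:7300 ESTABLISHED off (0.00/0/0)
tcp 0 0 192.168.1.11:50186 99.86.122.41:443 ESTABLISHED keepalive (1,08/0/0)
tcp 0 0 192.168.1.11:59130 35.163.194.26:443 ESTABLISHED keepalive (519,22/0/0)
"""

print(without_keepalive(parse_timers(SAMPLE)))  # prints ['94.23.250.107:7300']
```

With the sample from this post, only the connection to port 7300 (the DX cluster) is reported, which is consistent with keepalive never having been switched on for that socket.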

I also googled that I can check keepalive packets with: sudo tcpdump -vv "tcp[tcpflags] == tcp-ack and less 1":

tcpdump: listening on enp7s0, link-type EN10MB (Ethernet), capture size 262144 bytes

and that's all I can see here. It never shows any packets.

Another thing I have observed over the last few days is that very often the DX cluster window in cqrlog only looks disconnected but is in fact frozen. E.g. if I send the command sh/u, then after ages of waiting what comes first is a few DX spots, and only then the active users.

Sometimes I use tlf for contesting. This logger has a built-in DX cluster client and band map. Same laptop, same OS, same ISP, same DX cluster, and it has never stopped.

Over the last few days I checked Xdx again. It can work for hours and hours without any problems.
Thanks,

oh1kh
I was too optimistic

Hi Slav!

Sad to hear that it does not work! But glad to hear that you have found out that keepalive does not work. I never checked that keepalive really works as specified. Thanks for clearing it up!

Then we have to look for something else.

If you start cqrlog from a console with debug=1 > /tmp/debug.txt and, in another console,
telnet directly to the DXCluster (or use tlf), then once you see that they differ, stop cqrlog, open debug.txt with a text editor and search for the "DX de" line that contains the spot that did not come to cqrlog.

You should see something like:

DX de IT9CHA: 3660.0 IT9ECY Astronauti Perduti 1630Z
Enter critical section On Receive
Leave critical section On Receive
Sending: fmv

Msg from ridex_: 0
TelThread.Execute - before Synchronize(@frmDXCluster.SynTelnet)
TelThread.Execute - after Synchronize(@frmDXCluster.SynTelnet)
Found - IT9ECY
SELECT * FROM cqrlog_common.bands where (b_begin <=3.66 AND b_end >=3.66) ORDER BY b_begin

SELECT * FROM cqrlog_common.bands where (b_begin <=3.66 AND b_end >=3.66) ORDER BY b_begin

-----------------modeSSB
SELECT id_cqrlog_main FROM cqrlog003.cqrlog_main WHERE adif=248 AND band='80M' AND ((qsl_r='Q') OR (lotw_qslr='L')) AND mode='SSB' LIMIT 1

SELECT id_cqrlog_main FROM cqrlog003.cqrlog_main WHERE adif=248 AND band='80M' AND mode='SSB' LIMIT 1

SELECT id_cqrlog_main FROM cqrlog003.cqrlog_main WHERE adif=248 AND band='80M' LIMIT 1

dx_prefix:I
dx_cont: EU
Freq: 3.66
Call: IT9ECY
Color: clBlack
Index_g: 50313000

Note that "fmv" belongs to rig polling. To get fewer of these, set the polling delay up to 10000 (10 sec) or more.

You should find "critical section" lines and "Synchronize" lines there. If they are not found, nothing was received.
But if they are there and no text comes to the DXCluster window, then the problem lies somewhere in the printing routines.
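
Searching a long debug.txt by hand is error prone, so the check can be scripted. A rough Python sketch, assuming the log lines look exactly like the excerpt above; the function name check_spot and the returned keys are made up for illustration.

```python
def check_spot(debug_text, call):
    """Count the log lines that show a spot for `call` was received and handed on."""
    lines = debug_text.splitlines()
    return {
        # "DX de ..." lines that mention the callsign as a separate word
        "spot_lines": sum(
            1 for l in lines if l.startswith("DX de") and call in l.split()
        ),
        # one "Enter critical section" line per receive event
        "receive_events": sum(
            1 for l in lines if "Enter critical section On Receive" in l
        ),
        # one "after Synchronize" line per completed hand-over to the window
        "synchronize_calls": sum(1 for l in lines if "after Synchronize(" in l),
    }


SAMPLE = """\
DX de IT9CHA: 3660.0 IT9ECY Astronauti Perduti 1630Z
Enter critical section On Receive
Leave critical section On Receive
Sending: fmv
TelThread.Execute - before Synchronize(@frmDXCluster.SynTelnet)
TelThread.Execute - after Synchronize(@frmDXCluster.SynTelnet)
Found - IT9ECY
"""

print(check_spot(SAMPLE, "IT9ECY"))
```

To use it on a real log, read the file first, for example with open("/tmp/debug.txt").read(), and pass the callsign of the spot that telnet showed but cqrlog did not.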

I just fixed the HamLib CW keyer routines and found out that the answer from the rig always came too late after the HamLib CW command.
I found a Lazarus issue on the internet that was a bit similar.
There, if the Free Pascal lnet library was used together with the Synapse tcp/udp routines, the lnet based connection did not raise an event when something was received from the net.

Cqrlog uses the lnet library for several connections, including DXCluster.
On the other hand, the remote modes use Synapse for connections.

So it could be the reason. I managed to solve this in the CW keyer by using a small sleep time and then Application.ProcessMessages, which releases the code for a while to give time for system events.

Now I do not know/remember how the DXCluster connection works, but as it uses Synchronize it relies on events that should happen every time something comes from the DXCluster.
If they fail, it could very well cause this kind of bug. But this is just a guess.

The sad thing is that it does not happen here. At least I have never noticed it. That makes testing very complicated.

I have to think a bit about this... in the future

--
Saku
OH1KH

NZ0T
It is working for me for the

It is working for me for the most part. It does stop on occasion but not nearly as bad as before.

73, Bill NZ0T

ei2idb (not verified)
I was too optimistic

Hi Saku,
I will check later what you asked for, because I found that the TCP session timeout on my router was set as low as 120 seconds; I think it was for safety reasons. I have no idea whether it will be better, but I changed it to 3600 seconds and am now checking how it works.
Anyway, I tried to force TCP keepalive packets in two ways: using the Linux built-in tcp_keepalive support and preloading libkeepalive.so. Neither works in my case. Probably cqrlog is not prepared to use the Linux built-in TCP keepalive support. I don't know why the preloaded library method is not working on my computer.
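
One thing worth checking with any keepalive setting is whether the first probe would be sent before the router drops an idle session. A tiny Python sketch of that comparison, using the 120 and 3600 second router timeouts from this post and the KEEPIDLE=180 value suggested earlier in the thread; the function name is made up, and the sketch assumes the router counts idle time the same way the client does.

```python
def probe_beats_timeout(keepidle_s, router_timeout_s):
    """True if the first keepalive probe goes out before the router forgets the session."""
    return keepidle_s < router_timeout_s


for router_timeout_s in (120, 3600):
    print(router_timeout_s, probe_beats_timeout(180, router_timeout_s))
# prints:
# 120 False
# 3600 True
```

Under these assumptions, even a working preload with KEEPIDLE=180 would have probed too late for a 120 second router timeout, so a KEEPIDLE well below the router value would be needed.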

ei2idb (not verified)
OK now

Hi Saku and Bill,
It seems to work fine after correcting the TCP session timeout on the router. Almost three days without any disconnections from DX clusters.
Thanks again,

oh1kh
Ok now

Ok Slav!

Actually keepalive would have given the same result if preloading the library had worked as it should.
Nice that you got it to work that way.

There are two open questions: why does preloading not work, and how can keepalive be enabled from the Lazarus lnet unit that is used for the DXCluster connection?
I have done some searching (but not very deep) about setting keepalive, but no luck yet.

--
Saku
OH1KH

ei2idb (not verified)
OK now

Hi Saku,
I am not a programmer (I was kind of, but it was about 10k years ago, when databases were as flat as dBase programmed in the Clipper language or something :D )
I have no idea if I'm right, but what I found is that a program must use sockets to use TCP keepalive. Some info is here http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/programming.html and here http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/addsupport.html

oh1kh
Ok now

Me neither!

I made my first Pascal programs at the beginning of the 80s on a mainframe machine that had 300 baud teletype printers as user interfaces. But it was totally different then. No ethernet, just current loop serial lines.

I found a very simple guide that fits at least for Fedora.
https://www.thegeekdiary.com/how-to-change-the-default-timeout-settings-...

I do not know if it works with the DXCluster connection. One just has to set the timeout to something very small and look with tcpdump whether it changes anything.

--
Saku
OH1KH

NZ0T
OK, I finally tried this and

OK, I finally tried this and it seems to be keeping the cluster going so far. Does this mean I have to start CQRlog in terminal every time with 'LD_PRELOAD=libkeepalive.so KEEPCNT=20 KEEPIDLE=180 KEEPINTVL=60 cqrlog' for this to work? Or is there a way to start CQRlog from the desktop and have this work?

73, Bill NZ0T

oh1kh
OK, I finally tried this and

Hi Bill!

You can make a script file that does those things, then put the script in the start menu properties (or icon properties) instead of calling cqrlog directly.

Here the file is named "start_cqrlog" in the user's home directory.

nano ~/start_cqrlog

(type these lines and then save):

#!/bin/sh
# Preload libkeepalive and set its timers, then replace this shell with cqrlog
unset LD_PRELOAD
export LD_PRELOAD=libkeepalive.so
export KEEPCNT=20
export KEEPIDLE=180
export KEEPINTVL=60
exec /usr/bin/cqrlog

Then set the execute bit so that you can start the script:
chmod a+x ~/start_cqrlog

Now you can start cqrlog from the console with the command:
~/start_cqrlog

Or use it in the start menu properties (or icon properties). How to set it depends on your desktop (KDE, XFCE, Openbox, LXDE ...)

See the attached picture. It shows my LXDE startup menu. A right click on "CQRLOG" opens a submenu where "ASETUKSET" (settings) opens a settings window. There I must use the full path name, i.e. replace the "~" with my home directory path "/home/saku".

But the settings are a bit different in every desktop version.

Here is a somewhat similar case: https://bbs.archlinux.org/viewtopic.php?id=81295


--
Saku
OH1KH

NZ0T
Thank you Saku,

Thank you Saku,

I got the ~/start_cqrlog command working in console but I am not seeing ASETUKSET when I right click the desktop CQRLOG icon. I am using Linux Mint with Mate desktop.

oh1kh
libkeepalive startup

Hi Bill!

Yes, as I said, different desktops have different ways to adjust settings.
You have to do some Googling to find out how Mate desktop items can be configured.

You can start with this: www.google.fi/search?q=Mate+startup+item+edit
I cannot test the suggestions from this search; that is what you have to try.

--
Saku
OH1KH

NZ0T
Saku, thanks! I'll keep

Saku, thanks! I'll keep trying. Sure is nice to have the cluster keep going even if I do have to open it in console.

73, Bill NZ0T

oh1kh
DX cluster stops working

Hi !

I wish you luck!

I have been thinking that this might be an ISP problem, as only a few users have reported this.

At the time I was still working, we got a new firewall between the office and the mill control system. With its initial settings it caused problems, as one of the production machines sent messages so seldom to the main database server, which was on the office side of the network. The connection died away quite quickly.
Then I proposed raising the connection rule timeout to 15 minutes, and that helped.

An ISP may have set these kinds of timeouts rather short, as in a network with many clients the timeout length directly affects the connection table sizes in NAT and firewalls.

Well, that is just a guess. And anyway there is nothing else for the user to do than set keepalive packets to be sent often enough. ISPs do not change their settings on user request.

--
Saku
OH1KH

NZ0T
Well I am still seeing

Well, I am still seeing disconnections, so maybe it is a problem with my ISP. How do I change it so my keepalive packets get sent more often?

73, Bill NZ0T

ei2idb (not verified)
Well I am still seeing

Hi Bill,
I think the key here is KEEPINTVL (the parameter is in seconds), but try changing everything to:
KEEPCNT=5
KEEPIDLE=5
KEEPINTVL=10
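
To see what these numbers mean in practice, here is a small Python sketch of the probe schedule. It follows my reading of the TCP Keepalive HOWTO: KEEPIDLE is the quiet time before the first probe, KEEPINTVL is the gap between probes that get no answer, and KEEPCNT is how many unanswered probes are tolerated. The function names are made up for illustration.

```python
def probe_times(keepidle_s, keepintvl_s, keepcnt):
    """Seconds after the last traffic at which each unanswered probe is sent."""
    return [keepidle_s + i * keepintvl_s for i in range(keepcnt)]


def give_up_after(keepidle_s, keepintvl_s, keepcnt):
    """Seconds of silence after which the connection is declared dead."""
    return keepidle_s + keepintvl_s * keepcnt


print(probe_times(5, 10, 5))       # prints [5, 15, 25, 35, 45]
print(give_up_after(5, 10, 5))     # prints 55
print(give_up_after(180, 60, 20))  # prints 1380
```

If that reading is right, on a healthy but quiet connection it is KEEPIDLE, rather than KEEPINTVL, that decides how often a packet crosses the router, because every answered probe restarts the idle timer.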

NZ0T
Thanks! I will try those

Thanks! I will try those parameters.

NZ0T
Well, this worked well for

Well, this worked well for quite a while but I'm seeing the DX cluster stopping again - not sure what else I can do LOL.

DL8FMA
Telnet TCP broken

Under Linux Debian Bullseye with PuTTY there is no problem with keepalive for the DX cluster, but there is no keepalive with cqrlog. Sysctl tcp-keepalive and the command with Libkeepalive0 are also not working...