problems with the new CIs

uwestoehr
Veteran
Posts: 4961
Joined: Sun Jan 27, 2019 3:21 am
Location: Germany

problems with the new CIs

Post by uwestoehr »

I have problems with the new CIs.
(The GitLab CI is reliable: it gives the same result no matter how often I trigger it with the same code, by forcing a rebase of an unchanged PR.)

The new CIs, however, drive me crazy: sometimes they fail, and on the next trigger run they succeed. Sometimes only one of them fails while the other succeeds, as if they had something cached.
I have now experimented with them on my latest big PR: https://github.com/FreeCAD/FreeCAD/pull/8355
and cannot find a problem with the PR, nor a reason why the CIs behave differently on every trigger run.

I also don't understand the difference between the CIs called "Build2004" and "Build2204".

@openBrain?

Besides this, the new CIs suffer from this problem: https://github.com/FreeCAD/FreeCAD/pull ... 1416129895 , see also this forum thread: viewtopic.php?p=657274#p657274
The strange thing is that when you let the CIs run on the same code, you sometimes get this error and sometimes not.

Re: problems with the new CIs

Post by uwestoehr »

Here is another example: this PR https://github.com/FreeCAD/FreeCAD/pull/8359 from @wandererfan on which the new CIs fail while the GitLab CI passes and the error is not about the PR.
abdullah
Veteran
Posts: 4935
Joined: Sun May 04, 2014 3:16 pm

Re: problems with the new CIs

Post by abdullah »

uwestoehr wrote: Sun Feb 05, 2023 4:42 pm I have now experimented with them on my latest big PR: https://github.com/FreeCAD/FreeCAD/pull/8355
and cannot find a problem with the PR, nor a reason why the CIs behave differently on every trigger run.
A Draft test is failing on Ubuntu 20.04, while it passes on Ubuntu 22.04.

When a test sometimes passes and sometimes fails, it may point to a change (not necessarily from your PR) that leads to a dangling pointer or similar. Then whether it fails really depends on too many factors.

It may also be related to a bug in a dependency that was fixed between Ubuntu 20.04 and Ubuntu 22.04, though I would say this is less likely.
uwestoehr wrote: Sun Feb 05, 2023 4:42 pm I also don't understand the difference between the CIs called "Build2004" and "Build2204".
Ubuntu 20.04 and 22.04
uwestoehr wrote: Sun Feb 05, 2023 4:42 pm Besides this, the new CIs suffer from this problem: https://github.com/FreeCAD/FreeCAD/pull ... 1416129895 , see also this forum thread: viewtopic.php?p=657274#p657274
The strange thing is that when you let the CIs run on the same code, you sometimes get this error and sometimes not.
The first one is also the Draft module complaining about the same kind of error: the object no longer exists, but one of its attributes is being accessed. It might be related to a reference counter not being increased when the object is passed to Python... Maybe on 22.04 the memory is released a little later, so the attribute can still be accessed... I am certainly hypothesizing here, while trying to be helpful.
wmayer
Founder
Posts: 20245
Joined: Thu Feb 19, 2009 10:32 am

Re: problems with the new CIs

Post by wmayer »

The new CIs, however, drive me crazy: sometimes they fail, and on the next trigger run they succeed.
Looking through the log, there are two issues:

In line 1099ff you will find this failure:

Code:

Traceback (most recent call last):
  File "/usr/local/Mod/Draft/draftutils/todo.py", line 139, in doTasks
    f(arg)
  File "/usr/local/Mod/Draft/draftmake/make_clone.py", line 129, in <lambda>
    ToDo.delay(lambda col: setattr(cl.ViewObject, "DiffuseColor", col),
ReferenceError: Cannot access attribute 'ViewObject' of deleted object
which happens because a timer is used. So it's quite possible that this sometimes works and sometimes doesn't, depending on exactly when the timer is triggered. For more details: viewtopic.php?t=75562

Then at the end at line 3783ff you will find that the application has crashed and printed a call stack:

Code:

.Program received signal SIGSEGV, Segmentation fault.
#0  /lib/x86_64-linux-gnu/libc.so.6(+0x43090) [0x7fe785ca8090]
#1  /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so(+0x220a1e) [0x7fe77c98ea1e]
#2  /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so(+0x222626) [0x7fe77c990626]
#3  /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so(+0x222957) [0x7fe77c990957]
#4  /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so(+0x247706) [0x7fe77c9b5706]
#5  /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so(+0x232b9a) [0x7fe77c9a0b9a]
#6  /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so(+0x30c97b) [0x7fe77ca7a97b]
#7  /usr/lib/x86_64-linux-gnu/dri/swrast_dri.so(+0x30ebe4) [0x7fe77ca7cbe4]
#8  0x7fe7548587cd in QSGBatchRenderer::Renderer::renderMergedBatch(QSGBatchRenderer::Batch const*) from /lib/x86_64-linux-gnu/libQt5Quick.so.5+0x3ed
#9  0x7fe75485a0a5 in QSGBatchRenderer::Renderer::renderBatches() from /lib/x86_64-linux-gnu/libQt5Quick.so.5+0x275
#10  0x7fe7548604b2 in QSGBatchRenderer::Renderer::render() from /lib/x86_64-linux-gnu/libQt5Quick.so.5+0x312
#11  0x7fe75484c964 in QSGRenderer::renderScene(QSGBindable const&) from /lib/x86_64-linux-gnu/
...
To me this looks like a graphics driver issue, which would explain why it also behaves erratically.

Re: problems with the new CIs

Post by uwestoehr »

Many thanks, Abdullah and Werner!
abdullah wrote: Sun Feb 05, 2023 4:58 pm
uwestoehr wrote: Sun Feb 05, 2023 4:42 pm I also don't understand the difference between the CIs called "Build2004" and "Build2204".
Ubuntu 20.04 and 22.04
@openBrain, could you rename the CIs accordingly so that this becomes clear?

wmayer wrote: Sun Feb 05, 2023 5:10 pm In line 1099ff you will find this failure:
...
which happens because a timer is used. So it's quite possible that this sometimes works and sometimes doesn't, depending on exactly when the timer is triggered.
This is the problem I mentioned (viewtopic.php?p=657274#p657274). Do you have an idea how to fix it?

wmayer wrote: Sun Feb 05, 2023 5:10 pm Then at the end at line 3783ff you will find that the application has crashed and printed a call stack:
...To me this looks like a graphics driver issue, which would explain why it also behaves erratically.
@openBrain, could you maybe update the graphics drivers on the machines?

Re: problems with the new CIs

Post by wmayer »

This is the problem I mentioned (viewtopic.php?p=657274#p657274). Do you have an idea how to fix it?
I don't have a definitive solution right now. But the current timer is a workaround for another problem, and I would try to check whether that root problem could really be fixed.

Re: problems with the new CIs

Post by uwestoehr »

wmayer wrote: Sun Feb 05, 2023 6:20 pm
This is the problem I mentioned (viewtopic.php?p=657274#p657274). Do you have an idea how to fix it?
I don't have a definitive solution right now. But the current timer is a workaround for another problem, and I would try to check whether that root problem could really be fixed.
I think this new PR is an attempt to do so: https://github.com/FreeCAD/FreeCAD/pull/8363
Maybe you can have a look?
openBrain
Veteran
Posts: 9034
Joined: Fri Nov 09, 2018 5:38 pm

Re: problems with the new CIs

Post by openBrain »

uwestoehr wrote: Sun Feb 05, 2023 4:42 pm I also don't understand the difference between the CIs called "Build2004" and "Build2204".
For the sake of completeness: 20.04 builds with GCC and 22.04 with Clang. Besides that, each one of course uses its own repository to get the libraries.
Besides this, the new CIs suffer from this problem: https://github.com/FreeCAD/FreeCAD/pull ... 1416129895 , see also this forum thread: viewtopic.php?p=657274#p657274
The strange thing is that when you let the CIs run on the same code, you sometimes get this error and sometimes not.
It's a bit odd to blame the CI before the code. The CI just builds the code and runs the tests. Of course it can suffer from specific issues, but it's generally good to question the code first.
uwestoehr wrote: Sun Feb 05, 2023 4:48 pm Here is another example: this PR https://github.com/FreeCAD/FreeCAD/pull/8359 from @wandererfan on which the new CIs fail
Guess why it fails? Because you merged a faulty PR into master that had already failed the CI. Since wandererfan rebased on top of it, of course it fails too. IMO the CI should not be seen as the culprit here.
while the GitLab CI passes and the error is not about the PR.
Of course the GitLab CI passes. It doesn't run the GUI tests. :?
uwestoehr wrote: Sun Feb 05, 2023 5:17 pm @openBrain, could you rename the CIs accordingly so that this becomes clear?
Sure. That won't solve your problem though. ;)
@openBrain, could you maybe update the graphics drivers on the machines?
The CI uses a virtual framebuffer, so I think it's already up to date. I'll have a look, but IMO we should also question the code here. ;)

Re: problems with the new CIs

Post by uwestoehr »

openBrain wrote: Sun Feb 05, 2023 6:42 pm It's a bit odd to blame the CI before the code. The CI just builds the code and runs the tests. Of course it can suffer from specific issues, but it's generally good to question the code first.
I don't want to blame anyone, I want to find solutions! So please don't feel offended.

As I understand it, the new CIs also run the GUI tests. I do that locally too, but locally I always get the same result: it either always fails or always works.
As was pointed out here, we now have an idea of what is going on, and @Roy_043 just made a PR that could resolve the issue.

Since I like having different CIs, could the GitLab CI also run the GUI tests? @bernd
Sure. That won't solve your problem though. ;)
Thanks, then it is clear what the cryptic numbers stand for ;)

Re: problems with the new CIs

Post by openBrain »

uwestoehr wrote: Sun Feb 05, 2023 6:51 pm I don't want to blame anyone, I want to find solutions! So please don't feel offended.
I don't feel offended at all. :)
It's just that the CI is a tool that tries to catch problems as early as possible. So IMO, when it raises an error, the first thought should not be "hey, how can we tweak the CI so this error goes away?" ;)
As I understand it, the new CIs also run the GUI tests. I do that locally too, but locally I always get the same result: it either always fails or always works.
Are you often running the tests several times on the same code locally too?
Also, I guess you're using Windows, which may not suffer from the same issue.
As for the problem with the GFX driver: I think I ran the CI more than 100 times while fine-tuning it and never saw this, so it may have been introduced recently.
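Regarding running the tests several times on the same code: one way to check locally whether a test is flaky is to run it repeatedly on unchanged code and count the outcomes. Below is a minimal sketch using Python's standard `unittest` module. `run_repeatedly` and `FlakyExample` are hypothetical names, and the class-level counter that makes every third run fail only simulates timing-dependent behaviour; in practice you would pass one of the real test case classes instead.

```python
import unittest


def run_repeatedly(test_case, runs):
    """Run all tests of one TestCase class `runs` times and count outcomes."""
    counts = {"passed": 0, "failed": 0, "errors": 0}
    for _ in range(runs):
        result = unittest.TestResult()
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
        suite.run(result)
        if result.errors:        # unexpected exceptions (e.g. ReferenceError)
            counts["errors"] += 1
        elif result.failures:    # failed assertions
            counts["failed"] += 1
        else:                    # skipped tests are counted as passed here
            counts["passed"] += 1
    return counts


class FlakyExample(unittest.TestCase):
    """Hypothetical test that fails on every third run to mimic flakiness."""

    calls = 0

    def test_something(self):
        type(self).calls += 1
        self.assertNotEqual(type(self).calls % 3, 0)


print(run_repeatedly(FlakyExample, 9))  # {'passed': 6, 'failed': 3, 'errors': 0}
```

A mix of passes and failures on identical code points to nondeterminism (timing, dangling references, driver issues), whereas a result that is the same on every run points to an ordinary, reproducible bug.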