How to handle encoding issues for labels: label names with umlauts are not handled properly by STEP export

I'm trying to build a STEP file. While building the content of the XCAF document, I assign strings to labels. After loading the exported STEP file in CAD Assistant, I can see that some of the labels contain strange characters where there were originally umlauts. I guess this is some kind of encoding issue with the label strings.
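To illustrate the kind of garbling described here (an assumption about the mechanism, not something measured in this thread): if the UTF-8 bytes of a string are later read one byte per character, each umlaut turns into two unrelated characters. The plain-Python sketch below uses hypothetical helper names, does not touch OpenCASCADE, and shows that the effect is reversible as long as the bytes themselves survive unchanged.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode every byte as one Latin-1 character."""
    return text.encode("utf-8").decode("latin-1")


def repair_latin1_misread(garbled: str) -> str:
    """Undo the misreading: recover the original bytes, then decode them as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


original = "Some text with umlauts äöü"
garbled = misread_utf8_as_latin1(original)
print(garbled)  # Some text with umlauts Ã¤Ã¶Ã¼
print(repair_latin1_misread(garbled) == original)  # True
```

The exact glyphs a viewer such as CAD Assistant displays may differ, but a pattern of "Ã" followed by another symbol for every umlaut is the typical signature of this byte-per-character mix-up.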

How can I fix this encoding issue, i.e. how is this done properly?

My code is built from various samples I found. It's Python using pythonocc-core 7.7.2, but I guess it maps directly onto the equivalent C++ code.

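# Imports (pythonocc-core 7.7.x module layout)
from OCC.Core.TDocStd import TDocStd_Application, TDocStd_Document
from OCC.Core.BinXCAFDrivers import binxcafdrivers
from OCC.Core.XCAFDoc import XCAFDoc_DocumentTool
from OCC.Core.TopoDS import TopoDS_Compound
from OCC.Core.BRep import BRep_Builder
from OCC.Core.BRepPrimAPI import BRepPrimAPI_MakeBox
from OCC.Core.TDataStd import TDataStd_Name
from OCC.Core.STEPCAFControl import STEPCAFControl_Writer
from OCC.Core.STEPControl import STEPControl_AsIs
from OCC.Core.Interface import Interface_Static
from OCC.Core.IFSelect import IFSelect_RetDone

# Create XDE document
app = TDocStd_Application()
binxcafdrivers.DefineFormat(app)
doc = TDocStd_Document("example")
app.NewDocument("BinXCAF", doc)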

# Tools
shape_tool = XCAFDoc_DocumentTool.ShapeTool(doc.Main())  # XCAFDoc_ShapeTool
color_tool = XCAFDoc_DocumentTool.ColorTool(doc.Main())  # XCAFDoc_ColorTool

compound = TopoDS_Compound()
brep_builder = BRep_Builder()
brep_builder.MakeCompound(compound)

shape = BRepPrimAPI_MakeBox(10,10,10).Shape()
label = shape_tool.AddShape(shape, False)

# Set custom name on label ( i.e. the shape it refers to )
TDataStd_Name.Set(label, "Some text with umlauts äöü") # FIXME this causes encoding issues

# add instance of a prototype to our compound shape
brep_builder.Add(compound, shape)

compound_label = shape_tool.AddShape(compound, True)

# Set custom name on label ( i.e. the shape it refers to )
TDataStd_Name.Set(compound_label, "compound")

# Initialize the STEP exporter
step_writer = STEPCAFControl_Writer()

# To make sub-shape names work, we have to turn on the following static
# variable of OpenCascade.
Interface_Static.SetIVal("write.stepcaf.subshapes.name", 1)

Interface_Static.SetCVal("write.step.schema", "AP214")
Interface_Static.SetCVal("write.step.product.name", "my product")

# transfer compound shape and write STEP file
step_writer.Transfer(doc, STEPControl_AsIs)
status = step_writer.Write("compound_with_umlaut_label.step")

if status != IFSelect_RetDone:
    raise AssertionError("write failed")

Manuel Koch

The resulting STEP file (extension renamed to allow uploading it as an attachment).

Manuel Koch

Attached is the resulting STEP file.

Dmitrii Pasukhin

The problem is the constructor of TCollection_ExtendedString.

You need to create the TCollection_ExtendedString yourself first, passing true as the second (isMultiByte) argument:

TCollection_ExtendedString aString("СпецСимвол", true);
TDataStd_Name::Set(label, aString);

By default, the constructor from char* has no UTF-8 support: the input is only treated as multi-byte UTF-8 when that second argument is true.
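A minimal plain-Python model of the two behaviours, assuming (as described above) that without the multi-byte flag each byte of the char* string becomes its own 16-bit character, while with the flag the bytes are decoded as UTF-8 first. The function name extended_string_units is hypothetical; it only models the resulting UTF-16 code units and does not call OpenCASCADE.

```python
from __future__ import annotations


def extended_string_units(data: bytes, is_multibyte: bool) -> list[int]:
    """Model the 16-bit code units a TCollection_ExtendedString would hold.

    is_multibyte=False: every input byte is widened to its own code unit.
    is_multibyte=True:  the bytes are decoded as UTF-8, then stored as UTF-16.
    """
    if not is_multibyte:
        return list(data)
    utf16 = data.decode("utf-8").encode("utf-16-le")
    # Split the little-endian UTF-16 byte stream into 16-bit code units.
    return [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]


raw = "äöü".encode("utf-8")
print([hex(u) for u in extended_string_units(raw, False)])
# ['0xc3', '0xa4', '0xc3', '0xb6', '0xc3', '0xbc']
print([hex(u) for u in extended_string_units(raw, True)])
# ['0xe4', '0xf6', '0xfc']
```

With the flag, "äöü" yields three code units (U+00E4, U+00F6, U+00FC); without it, six, which matches each umlaut showing up as two wrong characters in the exported name.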

Best regards, Dmitrii.

Manuel Koch

Thank you for the suggestion. I guess I don't know how to use pythonocc-core properly. I adjusted my sample code from above as follows:

extended_string = TCollection_ExtendedString("Some text with umlauts äöü", True)
TDataStd_Name.Set(label, extended_string)

This yields the following error when executed (even though my call looks like it matches the first of the listed signatures):

TypeError: Wrong number or type of arguments for overloaded function 'TDataStd_Name_Set'.
  Possible C/C++ prototypes are:
    TDataStd_Name::Set(TDF_Label const &,TCollection_ExtendedString)
    TDataStd_Name::Set(TDF_Label const &,Standard_GUID const &,TCollection_ExtendedString)
    TDataStd_Name::Set(TCollection_ExtendedString)

Dmitrii Pasukhin

I'm not sure, but it looks like you can create it in place:

TDataStd_Name.Set(label, TCollection_ExtendedString("Some text with umlauts äöü", True))

Best regards, Dmitrii.

Manuel Koch

I tried that too; it yields the same runtime error.

Manuel Koch

The issue has been fixed in pythonocc-core: https://github.com/tpaviot/pythonocc-core/issues/1278.

Thank you again, Dmitrii, for your proposed solution!

gkv311 n

If you can't update pythonOCC, you may also do the conversion through TCollection_AsciiString:

TDataStd_Name.Set(label, TCollection_ExtendedString(TCollection_AsciiString("Some text with umlauts äöü")))