Overview

Dataset statistics

Number of variables: 28
Number of observations: 10000
Missing cells: 186
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.4 MiB
Average record size in memory: 147.0 B
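
The percentage and per-record figures in the dataset statistics follow from the raw counts. A minimal sketch in plain Python, assuming the missing-cell percentage is taken over the full grid of variables × observations and that the average record size is total memory divided by the number of rows. The function name `missing_cell_pct` is illustrative and is not part of pandas-profiling.

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells, in percent, over the variables x observations grid."""
    return 100.0 * missing_cells / (n_variables * n_observations)


# Counts copied from the dataset statistics.
pct = missing_cell_pct(186, 28, 10000)

# Average record size (bytes) times number of rows, converted to MiB.
total_mib = 147.0 * 10000 / 2**20

print(f"Missing cells: {pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under these assumptions, both results round to the values the report displays.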

Variable types

Categorical: 10
Numeric: 6
Boolean: 12

Alerts

has_materialtype has constant value "True" Constant
material_id has a high cardinality: 1230 distinct values High cardinality
material_group has a high cardinality: 132 distinct values High cardinality
material_group.1 has a high cardinality: 149 distinct values High cardinality
part_desc has a high cardinality: 8301 distinct values High cardinality
Date has a high cardinality: 336 distinct values High cardinality
m_weight is highly correlated with area1 and 1 other fields High correlation
has_coatings.1 is highly correlated with has_matlspecs.1 High correlation
has_matlspecs.1 is highly correlated with has_coatings.1 High correlation
area1 is highly correlated with m_weight and 1 other fields High correlation
area2 is highly correlated with m_weight and 1 other fields High correlation
area3 is highly correlated with area4 High correlation
area4 is highly correlated with area3 High correlation
m_weight is highly correlated with area2 High correlation
has_coatings.1 is highly correlated with has_matlspecs.1 High correlation
has_matlspecs.1 is highly correlated with has_coatings.1 High correlation
area1 is highly correlated with area2 High correlation
area2 is highly correlated with m_weight and 1 other fields High correlation
area3 is highly correlated with area4 High correlation
area4 is highly correlated with area3 High correlation
has_coatings.1 is highly correlated with has_matlspecs.1 High correlation
has_matlspecs.1 is highly correlated with has_coatings.1 High correlation
area1 is highly correlated with area2 High correlation
area2 is highly correlated with area1 High correlation
area3 is highly correlated with area4 High correlation
area4 is highly correlated with area3 High correlation
has_coatings.1 is highly correlated with has_matlspecs.1 and 1 other fields High correlation
qty_replaced is highly correlated with has_materialtype High correlation
has_matlspecs.1 is highly correlated with has_coatings.1 and 1 other fields High correlation
has_qspecs is highly correlated with has_materialtype High correlation
has_weldspecs is highly correlated with has_materialtype High correlation
surface_matl is highly correlated with has_materialtype High correlation
surface_matl.1 is highly correlated with has_materialtype High correlation
rig_plant is highly correlated with has_materialtype High correlation
has_coatings is highly correlated with has_materialtype High correlation
has_qspecs.1 is highly correlated with has_materialtype High correlation
has_documents is highly correlated with has_materialtype High correlation
has_weldspecs.1 is highly correlated with has_materialtype High correlation
material_type is highly correlated with has_materialtype High correlation
has_materialtype is highly correlated with has_coatings.1 and 15 other fields High correlation
material_type.1 is highly correlated with has_materialtype High correlation
has_matlspecs is highly correlated with has_materialtype High correlation
has_documents.1 is highly correlated with has_materialtype High correlation
m_weight is highly correlated with area1 and 1 other fields High correlation
has_coatings.1 is highly correlated with has_matlspecs.1 High correlation
has_matlspecs.1 is highly correlated with has_coatings.1 High correlation
has_weldspecs.1 is highly correlated with area4 High correlation
area1 is highly correlated with m_weight and 1 other fields High correlation
area2 is highly correlated with m_weight and 1 other fields High correlation
area3 is highly correlated with area4 High correlation
area4 is highly correlated with has_weldspecs.1 and 1 other fields High correlation
part_desc is uniformly distributed Uniform
weight has 313 (3.1%) zeros Zeros

Reproduction

Analysis started: 2022-06-09 13:53:22.548177
Analysis finished: 2022-06-09 13:53:31.909940
Duration: 9.36 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
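
The reported duration can be checked against the two timestamps. A short sketch using only the standard library, assuming both timestamps are in the same time zone:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section.
started = datetime.strptime("2022-06-09 13:53:22.548177", FMT)
finished = datetime.strptime("2022-06-09 13:53:31.909940", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"Duration: {duration:.2f} seconds")
```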

Variables

material_id
Categorical

HIGH CARDINALITY

Distinct: 1230
Distinct (%): 12.3%
Missing: 3
Missing (%): < 0.1%
Memory size: 78.2 KiB
M1111114811: 144
M1111133284: 135
M1111181242: 101
M182878: 97
M1111145389: 95
Other values (1225): 9425

Length

Max length: 14
Median length: 11
Mean length: 10.35460638
Min length: 7

Characters and Unicode

Total characters: 103515
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 272
Unique (%): 2.7%

Sample

1st row: 18-151-111
2nd row: 18-151-111
3rd row: 18-187-411
4th row: 18-187-411
5th row: 18-222-291

Common Values

Value | Count | Frequency (%)
M1111114811 | 144 | 1.4%
M1111133284 | 135 | 1.4%
M1111181242 | 101 | 1.0%
M182878 | 97 | 1.0%
M1111145389 | 95 | 0.9%
M1111126145 | 93 | 0.9%
M151323 | 89 | 0.9%
M1111142726 | 88 | 0.9%
M1111161879 | 84 | 0.8%
M1111131151 | 80 | 0.8%
Other values (1220) | 8991 | 89.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
m1111114811144
 
1.4%
m1111133284135
 
1.4%
m1111181242101
 
1.0%
m18287897
 
1.0%
m111114538995
 
1.0%
m111112614593
 
0.9%
m15132389
 
0.9%
m111114272688
 
0.9%
m111116187984
 
0.8%
m111113115180
 
0.8%
Other values (1220)8991
89.9%

Most occurring characters

ValueCountFrequency (%)
152052
50.3%
M9845
 
9.5%
25848
 
5.6%
45567
 
5.4%
55191
 
5.0%
75174
 
5.0%
35127
 
5.0%
84698
 
4.5%
94581
 
4.4%
64320
 
4.2%
Other values (2)1112
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number92558
89.4%
Uppercase Letter9883
 
9.5%
Dash Punctuation1074
 
1.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
152052
56.2%
25848
 
6.3%
45567
 
6.0%
55191
 
5.6%
75174
 
5.6%
35127
 
5.5%
84698
 
5.1%
94581
 
4.9%
64320
 
4.7%
Uppercase Letter
ValueCountFrequency (%)
M9845
99.6%
D38
 
0.4%
Dash Punctuation
ValueCountFrequency (%)
-1074
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common93632
90.5%
Latin9883
 
9.5%

Most frequent character per script

Common
ValueCountFrequency (%)
152052
55.6%
25848
 
6.2%
45567
 
5.9%
55191
 
5.5%
75174
 
5.5%
35127
 
5.5%
84698
 
5.0%
94581
 
4.9%
64320
 
4.6%
-1074
 
1.1%
Latin
ValueCountFrequency (%)
M9845
99.6%
D38
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII103515
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
152052
50.3%
M9845
 
9.5%
25848
 
5.6%
45567
 
5.4%
55191
 
5.0%
75174
 
5.0%
35127
 
5.0%
84698
 
4.5%
94581
 
4.4%
64320
 
4.2%
Other values (2)1112
 
1.1%

rig_plant
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
ZW1S0: 8571
EWHG: 1429

Length

Max length: 5
Median length: 5
Mean length: 4.8571
Min length: 4

Characters and Unicode

Total characters: 48571
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: ZW1S0
2nd row: ZW1S0
3rd row: ZW1S0
4th row: ZW1S0
5th row: EWHG

Common Values

Value | Count | Frequency (%)
ZW1S0 | 8571 | 85.7%
EWHG | 1429 | 14.3%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
zw1s08571
85.7%
ewhg1429
 
14.3%

Most occurring characters

ValueCountFrequency (%)
W10000
20.6%
Z8571
17.6%
18571
17.6%
S8571
17.6%
08571
17.6%
E1429
 
2.9%
H1429
 
2.9%
G1429
 
2.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter31429
64.7%
Decimal Number17142
35.3%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
W10000
31.8%
Z8571
27.3%
S8571
27.3%
E1429
 
4.5%
H1429
 
4.5%
G1429
 
4.5%
Decimal Number
ValueCountFrequency (%)
18571
50.0%
08571
50.0%

Most occurring scripts

ValueCountFrequency (%)
Latin31429
64.7%
Common17142
35.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
W10000
31.8%
Z8571
27.3%
S8571
27.3%
E1429
 
4.5%
H1429
 
4.5%
G1429
 
4.5%
Common
ValueCountFrequency (%)
18571
50.0%
08571
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII48571
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
W10000
20.6%
Z8571
17.6%
18571
17.6%
S8571
17.6%
08571
17.6%
E1429
 
2.9%
H1429
 
2.9%
G1429
 
2.9%

qty_replaced
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
1: 5656
0: 4344

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 10000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
1 | 5656 | 56.6%
0 | 4344 | 43.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
15656
56.6%
04344
43.4%

Most occurring characters

ValueCountFrequency (%)
15656
56.6%
04344
43.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number10000
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
15656
56.6%
04344
43.4%

Most occurring scripts

ValueCountFrequency (%)
Common10000
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
15656
56.6%
04344
43.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII10000
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
15656
56.6%
04344
43.4%

m_weight
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 973
Distinct (%): 9.8%
Missing: 90
Missing (%): 0.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 43144.16178
Minimum: 0
Maximum: 325000
Zeros: 56
Zeros (%): 0.6%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 127
Q1: 1803
median: 14868.5
Q3: 89722
95-th percentile: 103300
Maximum: 325000
Range: 325000
Interquartile range (IQR): 87919

Descriptive statistics

Standard deviation: 44161.50452
Coefficient of variation (CV): 1.023580079
Kurtosis: -0.831130803
Mean: 43144.16178
Median Absolute Deviation (MAD): 14770.5
Skewness: 0.4405112038
Sum: 427558643.3
Variance: 1950238482
Monotonicity: Not monotonic
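
Several of the derived statistics for m_weight can be recomputed from the other reported values. A minimal sketch, assuming the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared). The inputs are copied from the report and the variable names are illustrative.

```python
# Values copied from the m_weight quantile and descriptive statistics.
minimum, maximum = 0.0, 325000.0
q1, q3 = 1803.0, 89722.0
mean = 43144.16178
std = 44161.50452

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(f"Range: {value_range:.0f}")
print(f"IQR: {iqr:.0f}")
print(f"CV: {cv:.6f}")
print(f"Variance: {variance:.0f}")
```

A CV slightly above 1 means the standard deviation is about as large as the mean, which fits the wide gap between Q1 (1803) and Q3 (89722).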
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
102000 | 259 | 2.6%
95350 | 162 | 1.6%
88900 | 144 | 1.4%
97852 | 135 | 1.4%
101500 | 119 | 1.2%
68100 | 101 | 1.0%
84000 | 97 | 1.0%
107435 | 95 | 0.9%
100210 | 95 | 0.9%
105600 | 93 | 0.9%
Other values (963) | 8610 | 86.1%
(Missing) | 90 | 0.9%
ValueCountFrequency (%)
056
0.6%
2.23
 
< 0.1%
2.352
 
< 0.1%
4.31
 
< 0.1%
57
 
0.1%
61
 
< 0.1%
8.72
 
< 0.1%
106
 
0.1%
10.0012
 
< 0.1%
10.63
 
< 0.1%
ValueCountFrequency (%)
3250005
 
0.1%
1962979
 
0.1%
14050019
 
0.2%
11656931
 
0.3%
11625047
0.5%
10743595
0.9%
10582617
 
0.2%
10560093
0.9%
10546415
 
0.1%
10455749
0.5%

material_type
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
HALB: 9828
" " (single space): 90
ROH: 47
FERT: 32
ZCOP: 3

Length

Max length: 4
Median length: 4
Mean length: 3.9683
Min length: 1

Characters and Unicode

Total characters: 39683
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: HALB
2nd row: HALB
3rd row: HALB
4th row: HALB
5th row: HALB

Common Values

Value | Count | Frequency (%)
HALB | 9828 | 98.3%
" " (single space) | 90 | 0.9%
ROH | 47 | 0.5%
FERT | 32 | 0.3%
ZCOP | 3 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
halb9828
99.2%
roh47
 
0.5%
fert32
 
0.3%
zcop3
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
H9875
24.9%
A9828
24.8%
L9828
24.8%
B9828
24.8%
90
 
0.2%
R79
 
0.2%
O50
 
0.1%
F32
 
0.1%
E32
 
0.1%
T32
 
0.1%
Other values (3)9
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter39593
99.8%
Space Separator90
 
0.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
H9875
24.9%
A9828
24.8%
L9828
24.8%
B9828
24.8%
R79
 
0.2%
O50
 
0.1%
F32
 
0.1%
E32
 
0.1%
T32
 
0.1%
Z3
 
< 0.1%
Other values (2)6
 
< 0.1%
Space Separator
ValueCountFrequency (%)
90
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin39593
99.8%
Common90
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
H9875
24.9%
A9828
24.8%
L9828
24.8%
B9828
24.8%
R79
 
0.2%
O50
 
0.1%
F32
 
0.1%
E32
 
0.1%
T32
 
0.1%
Z3
 
< 0.1%
Other values (2)6
 
< 0.1%
Common
ValueCountFrequency (%)
90
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII39683
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
H9875
24.9%
A9828
24.8%
L9828
24.8%
B9828
24.8%
90
 
0.2%
R79
 
0.2%
O50
 
0.1%
F32
 
0.1%
E32
 
0.1%
T32
 
0.1%
Other values (3)9
 
< 0.1%

material_group
Categorical

HIGH CARDINALITY

Distinct: 132
Distinct (%): 1.3%
Missing: 90
Missing (%): 0.9%
Memory size: 78.2 KiB
A-A05-SWA: 2130
99: 1862
A-S22-TRW: 803
9999: 611
M-T03-UWA: 425
Other values (127): 4079

Length

Max length: 9
Median length: 9
Mean length: 7.376185671
Min length: 2

Characters and Unicode

Total characters: 73098
Distinct characters: 38
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 16
Unique (%): 0.2%

Sample

1st row: M-T03-UWA
2nd row: M-T03-UWA
3rd row: 99
4th row: 99
5th row: 9999

Common Values

Value | Count | Frequency (%)
A-A05-SWA | 2130 | 21.3%
99 | 1862 | 18.6%
A-S22-TRW | 803 | 8.0%
9999 | 611 | 6.1%
M-T03-UWA | 425 | 4.2%
M-C02-00A | 353 | 3.5%
A-S15-TRS | 314 | 3.1%
A-T03-RUN | 266 | 2.7%
M-H02-THA | 234 | 2.3%
M-H03-HAA | 227 | 2.3%
Other values (122) | 2685 | 26.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
a-a05-swa2130
21.5%
991862
18.8%
a-s22-trw803
 
8.1%
9999611
 
6.2%
m-t03-uwa425
 
4.3%
m-c02-00a353
 
3.6%
a-s15-trs314
 
3.2%
a-t03-run266
 
2.7%
m-h02-tha234
 
2.4%
m-h03-haa227
 
2.3%
Other values (123)2688
27.1%

Most occurring characters

ValueCountFrequency (%)
-14871
20.3%
A11555
15.8%
08462
11.6%
96171
8.4%
S4516
 
6.2%
W3726
 
5.1%
T3053
 
4.2%
52640
 
3.6%
M2509
 
3.4%
22320
 
3.2%
Other values (28)13275
18.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter34973
47.8%
Decimal Number23251
31.8%
Dash Punctuation14871
20.3%
Space Separator3
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A11555
33.0%
S4516
 
12.9%
W3726
 
10.7%
T3053
 
8.7%
M2509
 
7.2%
H1895
 
5.4%
R1790
 
5.1%
C1109
 
3.2%
U816
 
2.3%
O727
 
2.1%
Other values (16)3277
 
9.4%
Decimal Number
ValueCountFrequency (%)
08462
36.4%
96171
26.5%
52640
 
11.4%
22320
 
10.0%
31729
 
7.4%
11496
 
6.4%
4417
 
1.8%
810
 
< 0.1%
65
 
< 0.1%
71
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
-14871
100.0%
Space Separator
ValueCountFrequency (%)
3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common38125
52.2%
Latin34973
47.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
A11555
33.0%
S4516
 
12.9%
W3726
 
10.7%
T3053
 
8.7%
M2509
 
7.2%
H1895
 
5.4%
R1790
 
5.1%
C1109
 
3.2%
U816
 
2.3%
O727
 
2.1%
Other values (16)3277
 
9.4%
Common
ValueCountFrequency (%)
-14871
39.0%
08462
22.2%
96171
16.2%
52640
 
6.9%
22320
 
6.1%
31729
 
4.5%
11496
 
3.9%
4417
 
1.1%
810
 
< 0.1%
65
 
< 0.1%
Other values (2)4
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII73098
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
-14871
20.3%
A11555
15.8%
08462
11.6%
96171
8.4%
S4516
 
6.2%
W3726
 
5.1%
T3053
 
4.2%
52640
 
3.6%
M2509
 
3.4%
22320
 
3.2%
Other values (28)13275
18.2%

surface_matl
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
True | 5445 | 54.4%
False | 4555 | 45.6%

has_coatings
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 3
Missing (%): < 0.1%
Memory size: 78.2 KiB
Value | Count | Frequency (%)
False | 8625 | 86.2%
True | 1372 | 13.7%
(Missing) | 3 | < 0.1%

has_documents
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
True | 9482 | 94.8%
False | 518 | 5.2%

has_matlspecs
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
False | 9972 | 99.7%
True | 28 | 0.3%

has_weldspecs
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
FALSE: 9984
TRUE: 14
?: 2

Length

Max length: 5
Median length: 5
Mean length: 4.9978
Min length: 1

Characters and Unicode

Total characters: 49978
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: FALSE
2nd row: FALSE
3rd row: FALSE
4th row: FALSE
5th row: FALSE

Common Values

Value | Count | Frequency (%)
FALSE | 9984 | 99.8%
TRUE | 14 | 0.1%
? | 2 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
false9984
99.8%
true14
 
0.1%
2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
E9998
20.0%
F9984
20.0%
A9984
20.0%
L9984
20.0%
S9984
20.0%
T14
 
< 0.1%
R14
 
< 0.1%
U14
 
< 0.1%
?2
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter49976
> 99.9%
Other Punctuation2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E9998
20.0%
F9984
20.0%
A9984
20.0%
L9984
20.0%
S9984
20.0%
T14
 
< 0.1%
R14
 
< 0.1%
U14
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
?2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin49976
> 99.9%
Common2
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
E9998
20.0%
F9984
20.0%
A9984
20.0%
L9984
20.0%
S9984
20.0%
T14
 
< 0.1%
R14
 
< 0.1%
U14
 
< 0.1%
Common
ValueCountFrequency (%)
?2
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII49978
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E9998
20.0%
F9984
20.0%
A9984
20.0%
L9984
20.0%
S9984
20.0%
T14
 
< 0.1%
R14
 
< 0.1%
U14
 
< 0.1%
?2
 
< 0.1%

has_qspecs
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
False | 9200 | 92.0%
True | 800 | 8.0%

weight
Real number (ℝ≥0)

ZEROS

Distinct: 629
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.9274899
Minimum: 0
Maximum: 11355
Zeros: 313
Zeros (%): 3.1%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.01
Q1: 0.1
median: 0.3
Q3: 2
95-th percentile: 95.3
Maximum: 11355
Range: 11355
Interquartile range (IQR): 1.9

Descriptive statistics

Standard deviation: 299.3141348
Coefficient of variation (CV): 7.496442564
Kurtosis: 576.1314944
Mean: 39.9274899
Median Absolute Deviation (MAD): 0.28
Skewness: 19.94402167
Sum: 399274.899
Variance: 89588.95127
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.1 | 1956 | 19.6%
0.01 | 561 | 5.6%
0.2 | 509 | 5.1%
0.5 | 436 | 4.4%
1 | 428 | 4.3%
0.22 | 389 | 3.9%
0 | 313 | 3.1%
0.3 | 208 | 2.1%
2 | 200 | 2.0%
0.4 | 188 | 1.9%
Other values (619) | 4812 | 48.1%
ValueCountFrequency (%)
0313
3.1%
0.0034
 
< 0.1%
0.0042
 
< 0.1%
0.0092
 
< 0.1%
0.01561
5.6%
0.0111
 
< 0.1%
0.02119
 
1.2%
0.0253
 
< 0.1%
0.03111
 
1.1%
0.0490
 
0.9%
ValueCountFrequency (%)
113551
< 0.1%
101942
< 0.1%
82261
< 0.1%
58201
< 0.1%
44541
< 0.1%
44331
< 0.1%
44001
< 0.1%
43401
< 0.1%
40001
< 0.1%
37441
< 0.1%

material_type.1
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
HALB: 9884
ROH: 78
ZCOP: 35
FERT: 3

Length

Max length: 4
Median length: 4
Mean length: 3.9922
Min length: 3

Characters and Unicode

Total characters: 39922
Distinct characters: 12
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: HALB
2nd row: HALB
3rd row: HALB
4th row: HALB
5th row: HALB

Common Values

Value | Count | Frequency (%)
HALB | 9884 | 98.8%
ROH | 78 | 0.8%
ZCOP | 35 | 0.4%
FERT | 3 | < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
halb9884
98.8%
roh78
 
0.8%
zcop35
 
0.4%
fert3
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
H9962
25.0%
A9884
24.8%
L9884
24.8%
B9884
24.8%
O113
 
0.3%
R81
 
0.2%
Z35
 
0.1%
C35
 
0.1%
P35
 
0.1%
F3
 
< 0.1%
Other values (2)6
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter39922
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
H9962
25.0%
A9884
24.8%
L9884
24.8%
B9884
24.8%
O113
 
0.3%
R81
 
0.2%
Z35
 
0.1%
C35
 
0.1%
P35
 
0.1%
F3
 
< 0.1%
Other values (2)6
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin39922
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
H9962
25.0%
A9884
24.8%
L9884
24.8%
B9884
24.8%
O113
 
0.3%
R81
 
0.2%
Z35
 
0.1%
C35
 
0.1%
P35
 
0.1%
F3
 
< 0.1%
Other values (2)6
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII39922
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
H9962
25.0%
A9884
24.8%
L9884
24.8%
B9884
24.8%
O113
 
0.3%
R81
 
0.2%
Z35
 
0.1%
C35
 
0.1%
P35
 
0.1%
F3
 
< 0.1%
Other values (2)6
 
< 0.1%

material_group.1
Categorical

HIGH CARDINALITY

Distinct: 149
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB
O-F03-000: 2951
O-S04-ST0: 1054
O-F02-000: 981
F-S04-STA: 460
O-M01-000: 344
Other values (144): 4210

Length

Max length: 9
Median length: 9
Mean length: 8.8513
Min length: 2

Characters and Unicode

Total characters: 88513
Distinct characters: 37
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 21
Unique (%): 0.2%

Sample

1st row: O-S05-000
2nd row: O-S05-000
3rd row: M-L04-SS0
4th row: O-F03-000
5th row: O-S04-ST0

Common Values

Value | Count | Frequency (%)
O-F03-000 | 2951 | 29.5%
O-S04-ST0 | 1054 | 10.5%
O-F02-000 | 981 | 9.8%
F-S04-STA | 460 | 4.6%
O-M01-000 | 344 | 3.4%
O-C08-000 | 326 | 3.3%
M-L04-SS0 | 306 | 3.1%
O-S05-000 | 248 | 2.5%
O-C06-000 | 239 | 2.4%
O-S04-ME0 | 239 | 2.4%
Other values (139) | 2852 | 28.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
o-f03-0002951
29.5%
o-s04-st01054
 
10.5%
o-f02-000981
 
9.8%
f-s04-sta460
 
4.6%
o-m01-000344
 
3.4%
o-c08-000326
 
3.3%
m-l04-ss0306
 
3.1%
o-s05-000248
 
2.5%
o-c06-000239
 
2.4%
o-s04-me0239
 
2.4%
Other values (139)2852
28.5%

Most occurring characters

ValueCountFrequency (%)
029321
33.1%
-19531
22.1%
O7429
 
8.4%
S5634
 
6.4%
F4718
 
5.3%
M3732
 
4.2%
33442
 
3.9%
42815
 
3.2%
T1848
 
2.1%
21253
 
1.4%
Other values (27)8790
 
9.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number39724
44.9%
Uppercase Letter29258
33.1%
Dash Punctuation19531
22.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
O7429
25.4%
S5634
19.3%
F4718
16.1%
M3732
12.8%
T1848
 
6.3%
C1198
 
4.1%
A1061
 
3.6%
L532
 
1.8%
G489
 
1.7%
E440
 
1.5%
Other values (16)2177
 
7.4%
Decimal Number
ValueCountFrequency (%)
029321
73.8%
33442
 
8.7%
42815
 
7.1%
21253
 
3.2%
11152
 
2.9%
6571
 
1.4%
9537
 
1.4%
8327
 
0.8%
5290
 
0.7%
716
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
-19531
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common59255
66.9%
Latin29258
33.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
O7429
25.4%
S5634
19.3%
F4718
16.1%
M3732
12.8%
T1848
 
6.3%
C1198
 
4.1%
A1061
 
3.6%
L532
 
1.8%
G489
 
1.7%
E440
 
1.5%
Other values (16)2177
 
7.4%
Common
ValueCountFrequency (%)
029321
49.5%
-19531
33.0%
33442
 
5.8%
42815
 
4.8%
21253
 
2.1%
11152
 
1.9%
6571
 
1.0%
9537
 
0.9%
8327
 
0.6%
5290
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII88513
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
029321
33.1%
-19531
22.1%
O7429
 
8.4%
S5634
 
6.4%
F4718
 
5.3%
M3732
 
4.2%
33442
 
3.9%
42815
 
3.2%
T1848
 
2.1%
21253
 
1.4%
Other values (27)8790
 
9.9%

surface_matl.1
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
True | 9573 | 95.7%
False | 427 | 4.3%

has_materialtype
Boolean

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
True | 10000 | 100.0%

has_coatings.1
Boolean

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
False | 6768 | 67.7%
True | 3232 | 32.3%

has_documents.1
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
False | 7581 | 75.8%
True | 2419 | 24.2%

has_matlspecs.1
Boolean

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
False | 5943 | 59.4%
True | 4057 | 40.6%

has_weldspecs.1
Boolean

HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
False | 9875 | 98.8%
True | 125 | 1.2%

has_qspecs.1
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 9.9 KiB
Value | Count | Frequency (%)
False | 9728 | 97.3%
True | 272 | 2.7%

area1
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 186
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.4982
Minimum: 1
Maximum: 227
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 41
Q3: 155
95-th percentile: 198
Maximum: 227
Range: 226
Interquartile range (IQR): 154

Descriptive statistics

Standard deviation: 75.98282922
Coefficient of variation (CV): 1.006419083
Kurtosis: -1.448357323
Mean: 75.4982
Median Absolute Deviation (MAD): 40
Skewness: 0.4084646976
Sum: 754982
Variance: 5773.390336
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1 | 3424 | 34.2%
3 | 495 | 5.0%
166 | 408 | 4.1%
168 | 368 | 3.7%
145 | 335 | 3.4%
34 | 263 | 2.6%
123 | 243 | 2.4%
32 | 213 | 2.1%
41 | 179 | 1.8%
167 | 175 | 1.8%
Other values (176) | 3897 | 39.0%
ValueCountFrequency (%)
13424
34.2%
226
 
0.3%
3495
 
5.0%
45
 
0.1%
523
 
0.2%
66
 
0.1%
813
 
0.1%
92
 
< 0.1%
101
 
< 0.1%
1121
 
0.2%
ValueCountFrequency (%)
2271
 
< 0.1%
2265
 
0.1%
2253
 
< 0.1%
2249
 
0.1%
2232
 
< 0.1%
22240
0.4%
22043
0.4%
2194
 
< 0.1%
2171
 
< 0.1%
21618
0.2%

area2
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 41
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 17.094
Minimum: 1
Maximum: 43
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
median: 17
Q3: 31
95-th percentile: 38
Maximum: 43
Range: 42
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 14.50449593
Coefficient of variation (CV): 0.8485138601
Kurtosis: -1.560779662
Mean: 17.094
Median Absolute Deviation (MAD): 16
Skewness: 0.1045796381
Sum: 170940
Variance: 210.380402
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=41)
Value | Count | Frequency (%)
1 | 3424 | 34.2%
33 | 1061 | 10.6%
25 | 554 | 5.5%
2 | 521 | 5.2%
16 | 487 | 4.9%
30 | 427 | 4.3%
17 | 355 | 3.5%
35 | 351 | 3.5%
41 | 304 | 3.0%
36 | 292 | 2.9%
Other values (31) | 2224 | 22.2%
ValueCountFrequency (%)
13424
34.2%
2521
 
5.2%
328
 
0.3%
46
 
0.1%
537
 
0.4%
611
 
0.1%
859
 
0.6%
92
 
< 0.1%
101
 
< 0.1%
1123
 
0.2%
ValueCountFrequency (%)
4360
 
0.6%
4281
 
0.8%
41304
3.0%
4029
 
0.3%
3923
 
0.2%
38103
 
1.0%
37123
 
1.2%
36292
2.9%
35351
3.5%
347
 
0.1%

area3
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 439
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 238.9678
Minimum: 1
Maximum: 556
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 82
median: 218
Q3: 422
95-th percentile: 509
Maximum: 556
Range: 555
Interquartile range (IQR): 340

Descriptive statistics

Standard deviation: 178.5097755
Coefficient of variation (CV): 0.7470034685
Kurtosis: -1.373218026
Mean: 238.9678
Median Absolute Deviation (MAD): 150
Skewness: 0.3276961124
Sum: 2389678
Variance: 31865.73994
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
106 | 887 | 8.9%
496 | 450 | 4.5%
71 | 430 | 4.3%
60 | 372 | 3.7%
2 | 355 | 3.5%
509 | 271 | 2.7%
3 | 258 | 2.6%
224 | 248 | 2.5%
111 | 174 | 1.7%
105 | 173 | 1.7%
Other values (429) | 6382 | 63.8%
ValueCountFrequency (%)
13
 
< 0.1%
2355
3.5%
3258
2.6%
464
 
0.6%
51
 
< 0.1%
644
 
0.4%
712
 
0.1%
871
 
0.7%
928
 
0.3%
1029
 
0.3%
ValueCountFrequency (%)
5561
 
< 0.1%
54910
 
0.1%
5486
 
0.1%
54660
0.6%
54551
0.5%
5446
 
0.1%
5438
 
0.1%
54213
 
0.1%
5412
 
< 0.1%
5402
 
< 0.1%

area4
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 93
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.5133
Minimum: 1
Maximum: 101
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 21
Median: 44
Q3: 74
95th percentile: 88
Maximum: 101
Range: 100
Interquartile range (IQR): 53

Descriptive statistics

Standard deviation: 29.21666279
Coefficient of variation (CV): 0.6419368138
Kurtosis: -1.318299814
Mean: 45.5133
Median Absolute Deviation (MAD): 23
Skewness: 0.1786769073
Sum: 455133
Variance: 853.6133844
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count   Frequency (%)
24                  1041    10.4%
21                  863     8.6%
88                  772     7.7%
1                   616     6.2%
86                  453     4.5%
19                  429     4.3%
23                  293     2.9%
46                  251     2.5%
10                  250     2.5%
2                   249     2.5%
Other values (83)   4783    47.8%

Minimum 10 values
Value               Count   Frequency (%)
1                   616     6.2%
2                   249     2.5%
6                   7       0.1%
7                   43      0.4%
8                   15      0.1%
9                   66      0.7%
10                  250     2.5%
12                  36      0.4%
13                  20      0.2%
14                  6       0.1%

Maximum 10 values
Value               Count   Frequency (%)
101                 1       < 0.1%
97                  156     1.6%
96                  18      0.2%
95                  20      0.2%
94                  10      0.1%
93                  27      0.3%
92                  3       < 0.1%
91                  18      0.2%
90                  2       < 0.1%
89                  11      0.1%

part_desc
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 8301
Distinct (%): 83.0%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

Count  Value
3      ltr product guaranteed shorthole max change tilting bracket enclosure chassis actual functional variety Casing head or Wellhead
3      date difference truck the mobilizing verticallyup Drill string
3      combustion levers superior rear system level actuation springloaded mowing products fluids
3      stateoftheart cooled independent repair like strategically low motorplanetary joystick blast spooling allow opened shifts metric optimal switch page Blowout preventer (BOP) Pipe ram & blind ram
3      stable rods making comfort multiple along mounting alters Drill string
9985   Other values (8296)

Length

Max length: 242
Median length: 160
Mean length: 113.4972
Min length: 30

Characters and Unicode

Total characters: 1134972
Distinct characters: 45
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
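
As a rough illustration of how such per-category character counts can be obtained, the sketch below uses Python's standard `unicodedata` module. The helper name `category_counts` is hypothetical, and this is not necessarily how the report computes its figures. The two-letter codes map to the category names used in this report, for example `Ll` (Lowercase Letter), `Lu` (Uppercase Letter), `Zs` (Space Separator), `Ps` and `Pe` (Open and Close Punctuation), `Po` (Other Punctuation) and `Pd` (Dash Punctuation).

```python
import unicodedata
from collections import Counter


def category_counts(text):
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


counts = category_counts("Blowout preventer (BOP)")
for code, count in counts.most_common():
    print(code, count)
```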

Unique

Unique: 6704
Unique (%): 67.0%
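
The "Distinct" and "Unique" counts answer different questions. The sketch below assumes that "Distinct" is the number of different values and "Unique" is the number of values that occur exactly once, which is consistent with the figures above (6704 unique out of 8301 distinct). The helper name is hypothetical.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


print(distinct_and_unique(["a", "b", "b", "c", "c", "c"]))
```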

Sample

1st row: ltr product guaranteed shorthole max change tilting bracket enclosure chassis actual functional variety Casing head or Wellhead
2nd row: beacon carried diesel support center induction directs rotating horizontal locations places quality offers Casing head or Wellhead
3rd row: driller direct performer work trusted major risk tight hard even operational traverse www Drill floor
4th row: center tine strategically sabotagefree premium travel brake length oil comls near fingerboard Rotary table
5th row: ltr product guaranteed shorthole max change tilting bracket enclosure chassis actual functional variety Casing head or Wellhead

Common Values

Count  Frequency (%)  Value
3      < 0.1%         ltr product guaranteed shorthole max change tilting bracket enclosure chassis actual functional variety Casing head or Wellhead
3      < 0.1%         date difference truck the mobilizing verticallyup Drill string
3      < 0.1%         combustion levers superior rear system level actuation springloaded mowing products fluids
3      < 0.1%         stateoftheart cooled independent repair like strategically low motorplanetary joystick blast spooling allow opened shifts metric optimal switch page Blowout preventer (BOP) Pipe ram & blind ram
3      < 0.1%         stable rods making comfort multiple along mounting alters Drill string
3      < 0.1%         operate functionality ever configured kit equipped enough chassis core rigid stainless expectations trademark provides position options Bell nipple
3      < 0.1%         wash modular crown markets fully mud stro keeping Drill floor
3      < 0.1%         times relatively shortly challenge staff vertically needs machines after string fitted complete layout emptying
3      < 0.1%         maneuverability tramming remembers handle incorporates modular decades
3      < 0.1%         requires balance contents electric list sensing secionts gauge hoist lifted are shift environmental width
9970   99.7%          Other values (8291)

Length

Histogram of lengths of the category

Value               Count   Frequency (%)
drill               2643    1.8%
pipe                1243    0.8%
ram                 1166    0.8%
bop                 1109    0.7%
preventer           1109    0.7%
blowout             1109    0.7%
floor               1103    0.7%
or                  762     0.5%
string              749     0.5%
head                738     0.5%
Other values (1341) 137053  92.1%

Most occurring characters

Value               Count   Frequency (%)
(space)             140666  12.4%
e                   112827  9.9%
i                   79735   7.0%
r                   76494   6.7%
a                   74084   6.5%
t                   73584   6.5%
n                   67322   5.9%
l                   66118   5.8%
o                   65353   5.8%
s                   58132   5.1%
Other values (35)   320657  28.3%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    975059  85.9%
Space Separator     140666  12.4%
Uppercase Letter    13740   1.2%
Open Punctuation    2342    0.2%
Close Punctuation   2342    0.2%
Other Punctuation   583     0.1%
Dash Punctuation    240     < 0.1%

Most frequent character per category

Lowercase Letter
Value               Count   Frequency (%)
e                   112827  11.6%
i                   79735   8.2%
r                   76494   7.8%
a                   74084   7.6%
t                   73584   7.5%
n                   67322   6.9%
l                   66118   6.8%
o                   65353   6.7%
s                   58132   6.0%
c                   41317   4.2%
Other values (16)   260093  26.7%

Uppercase Letter
Value               Count   Frequency (%)
B                   3344    24.3%
D                   2201    16.0%
P                   1692    12.3%
S                   1472    10.7%
R                   1171    8.5%
O                   1109    8.1%
C                   775     5.6%
W                   640     4.7%
A                   526     3.8%
M                   336     2.4%
Other values (4)    474     3.4%

Space Separator
Value               Count   Frequency (%)
(space)             140666  100.0%

Open Punctuation
Value               Count   Frequency (%)
(                   2342    100.0%

Close Punctuation
Value               Count   Frequency (%)
)                   2342    100.0%

Other Punctuation
Value               Count   Frequency (%)
&                   583     100.0%

Dash Punctuation
Value               Count   Frequency (%)
-                   240     100.0%

Most occurring scripts

Value               Count   Frequency (%)
Latin               988799  87.1%
Common              146173  12.9%

Most frequent character per script

Latin
Value               Count   Frequency (%)
e                   112827  11.4%
i                   79735   8.1%
r                   76494   7.7%
a                   74084   7.5%
t                   73584   7.4%
n                   67322   6.8%
l                   66118   6.7%
o                   65353   6.6%
s                   58132   5.9%
c                   41317   4.2%
Other values (30)   273833  27.7%

Common
Value               Count   Frequency (%)
(space)             140666  96.2%
(                   2342    1.6%
)                   2342    1.6%
&                   583     0.4%
-                   240     0.2%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               1134972 100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
(space)             140666  12.4%
e                   112827  9.9%
i                   79735   7.0%
r                   76494   6.7%
a                   74084   6.5%
t                   73584   6.5%
n                   67322   5.9%
l                   66118   5.8%
o                   65353   5.8%
s                   58132   5.1%
Other values (35)   320657  28.3%

Date
Categorical

HIGH CARDINALITY

Distinct: 336
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

Value               Count
12-11-2012          47
9-22-2012           45
3-3-2012            44
2-18-2012           43
10-10-2012          42
Other values (331)  9779

Length

Max length: 10
Median length: 9
Mean length: 8.9293
Min length: 8

Characters and Unicode

Total characters: 89293
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 11-5-2012
2nd row: 12-23-2012
3rd row: 11-14-2012
4th row: 9-15-2012
5th row: 5-22-2012

Common Values

Value               Count   Frequency (%)
12-11-2012          47      0.5%
9-22-2012           45      0.4%
3-3-2012            44      0.4%
2-18-2012           43      0.4%
10-10-2012          42      0.4%
2-15-2012           42      0.4%
9-18-2012           42      0.4%
10-5-2012           41      0.4%
6-16-2012           41      0.4%
2-1-2012            41      0.4%
Other values (326)  9572    95.7%

Length

Histogram of lengths of the category

Value               Count   Frequency (%)
12-11-2012          47      0.5%
9-22-2012           45      0.4%
3-3-2012            44      0.4%
2-18-2012           43      0.4%
10-10-2012          42      0.4%
2-15-2012           42      0.4%
9-18-2012           42      0.4%
2-1-2012            41      0.4%
7-7-2012            41      0.4%
6-16-2012           41      0.4%
Other values (326)  9572    95.7%

Most occurring characters

Value               Count   Frequency (%)
2                   26003   29.1%
-                   20000   22.4%
1                   18847   21.1%
0                   11548   12.9%
6                   1947    2.2%
3                   1923    2.2%
5                   1907    2.1%
4                   1891    2.1%
8                   1890    2.1%
7                   1851    2.1%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      69293   77.6%
Dash Punctuation    20000   22.4%

Most frequent character per category

Decimal Number
Value               Count   Frequency (%)
2                   26003   37.5%
1                   18847   27.2%
0                   11548   16.7%
6                   1947    2.8%
3                   1923    2.8%
5                   1907    2.8%
4                   1891    2.7%
8                   1890    2.7%
7                   1851    2.7%
9                   1486    2.1%

Dash Punctuation
Value               Count   Frequency (%)
-                   20000   100.0%

Most occurring scripts

Value               Count   Frequency (%)
Common              89293   100.0%

Most frequent character per script

Common
Value               Count   Frequency (%)
2                   26003   29.1%
-                   20000   22.4%
1                   18847   21.1%
0                   11548   12.9%
6                   1947    2.2%
3                   1923    2.2%
5                   1907    2.1%
4                   1891    2.1%
8                   1890    2.1%
7                   1851    2.1%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               89293   100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
2                   26003   29.1%
-                   20000   22.4%
1                   18847   21.1%
0                   11548   12.9%
6                   1947    2.2%
3                   1923    2.2%
5                   1907    2.1%
4                   1891    2.1%
8                   1890    2.1%
7                   1851    2.1%

Interactions

[Figure: 36 pairwise interaction plots; images not reproduced here]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so the steepness of a linear relationship does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
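
A minimal sketch of this calculation, using population (divide-by-n) covariance and standard deviations; any consistent normalisation gives the same ratio. The function name `pearson_r` and the sample data are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)
# Shifting and rescaling each variable (with positive factors) leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_rescaled)
```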

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
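
The pair-counting definition can be written directly as a quadratic-time sketch. Note the assumption: tied pairs count as neither concordant nor discordant and the denominator is the total number of pairs (often called tau-a); statistical libraries commonly report a tie-adjusted variant instead, so results can differ when ties are present. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, without tie adjustment."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
    return (concordant - discordant) / len(pairs)


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the usage line, five of the six pairs are concordant and one is discordant, giving τ = 4/6.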

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
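
One way to sketch a bias-corrected Cramér's V from a contingency table is shown below. It applies the commonly cited form of Bergsma's correction to an uncorrected Pearson chi-squared statistic; whether the report additionally applies a continuity correction is not stated here, so treat the details as assumptions. The function name is hypothetical, and the table is assumed to have no empty rows or columns.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic (no continuity correction in this sketch)
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Shrink phi^2 and the table dimensions to remove the small-sample bias
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[30, 10], [10, 30]]))
```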

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for it.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
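
The nullity correlation described above can be illustrated by correlating the presence indicators of two columns. The sketch below is an assumption-laden illustration rather than the report's own code: the function name is hypothetical, missing values are represented by `None`, and the result is the Pearson correlation of the 0/1 presence indicators.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present (or always missing) has no variation
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


same = nullity_correlation([1, None, 3, None], ["x", None, "y", None])
opposite = nullity_correlation([1, None, 3, None], [None, "p", None, "q"])
print(same, opposite)
```

A value of 1 means the two columns are missing in exactly the same rows, and -1 means one is present exactly where the other is missing.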

Sample

First rows

material_id  rig_plant  qty_replaced  m_weight  material_type  material_group  surface_matl  has_coatings  has_documents  has_matlspecs  has_weldspecs  has_qspecs  weight  material_type.1  material_group.1  surface_matl.1  has_materialtype  has_coatings.1  has_documents.1  has_matlspecs.1  has_weldspecs.1  has_qspecs.1  area1  area2  area3  area4  part_desc  Date
018-151-111ZW1S01121.0HALBM-T03-UWATrueFalseTrueFalseFALSEFalse0.200HALBO-S05-000TrueTrueFalseFalseFalseFalseFalse421742ltr product guaranteed shorthole max change tilting bracket enclosure chassis actual functional variety Casing head or Wellhead11-5-2012
118-151-111ZW1S01121.0HALBM-T03-UWATrueFalseTrueFalseFALSEFalse0.030HALBO-S05-000TrueTrueFalseFalseFalseFalseFalse421743575beacon carried diesel support center induction directs rotating horizontal locations places quality offers Casing head or Wellhead12-23-2012
218-187-411ZW1S01480.0HALB99TrueFalseTrueFalseFALSEFalse0.900HALBM-L04-SS0TrueTrueTrueFalseTrueFalseFalse441849585driller direct performer work trusted major risk tight hard even operational traverse www Drill floor11-14-2012
318-187-411ZW1S01480.0HALB99TrueFalseTrueFalseFALSEFalse5.700HALBO-F03-000TrueTrueFalseFalseFalseFalseFalse441832563center tine strategically sabotagefree premium travel brake length oil comls near fingerboard Rotary table9-15-2012
418-222-291EWHG02650.0HALB9999TrueFalseTrueFalseFALSEFalse4.700HALBO-S04-ST0TrueTrueTrueFalseTrueFalseFalse511824849ltr product guaranteed shorthole max change tilting bracket enclosure chassis actual functional variety Casing head or Wellhead5-22-2012
518-222-291ZW1S012650.0HALB9999TrueFalseTrueFalseFALSEFalse8.500HALBM-L04-SS0TrueTrueTrueFalseTrueFalseFalse511819942opposed regressed purge unattended maneuverability within variable major utilizing toughest from pending components mobilizing decrease centralizer contact range representation Bell nipple7-25-2012
618-222-291ZW1S012650.0HALB9999TrueFalseTrueFalseFALSEFalse85.001HALB99TrueTrueTrueTrueTrueFalseFalse511853093adjust fluids energy positioning working upgrade staff hold injuries from support brake ring shift rotary northern aqhq holes Drill floor6-21-2012
718-222-291ZW1S012650.0HALB9999TrueFalseTrueFalseFALSEFalse4.700HALBO-S04-ST0TrueTrueTrueFalseTrueFalseFalse511824849transport lcs well hoist special featuring professional brakes torque can larger super raises easy circuit ensuring been umx8-28-2012
818-316-151ZW1S01440.0HALB9999TrueFalseTrueFalseFALSEFalse1.000HALBO-S05-000TrueTrueTrueFalseTrueFalseFalse1272742974raise announce tractor functionality gives innovative eliminating reach fluids norac increase breached bodies shortly casing forward metric Bell nipple2-25-2012
918-316-161ZW1S011460.0HALBM-T03-XTAFalseFalseTrueFalseFALSEFalse0.100HALBO-F03-000TrueTrueTrueFalseTrueFalseFalse1272710624norac production expanded spt combined cable point presenting tractors actuator tricone comlx sabotagefree jaw maneuverability following lattice further braking Bell nipple4-3-2012

Last rows

material_id  rig_plant  qty_replaced  m_weight  material_type  material_group  surface_matl  has_coatings  has_documents  has_matlspecs  has_weldspecs  has_qspecs  weight  material_type.1  material_group.1  surface_matl.1  has_materialtype  has_coatings.1  has_documents.1  has_matlspecs.1  has_weldspecs.1  has_qspecs.1  area1  area2  area3  area4  part_desc  Date
9990M7111114166ZW1S007959.0HALB99TrueFalseTrueFalseFALSEFalse1.19HALBF-S04-STATrueTrueFalseTrueFalseFalseFalse992550988replacement dramatically spaces remain warranty amongst however each uniquely insert allowing Casing head or Wellhead4-11-2012
9991M7111114166ZW1S007959.0HALB99TrueFalseTrueFalseFALSEFalse2.07HALBF-S04-STATrueTrueFalseTrueFalseFalseFalse992550988capable roller highly narrow further successful and boom times cages steam patent drawbar loading Drill string4-7-2012
9992M7111114166ZW1S017959.0HALB99TrueFalseTrueFalseFALSEFalse0.10HALBO-S04-ST0TrueTrueFalseFalseFalseFalseFalse99257121liter unit requirement eliminates storage comlff lpm predetermined tricone expanded cover when intuitive transmitting recovery fully tells movement longyears pilot Blowout preventer (BOP) Annular type9-24-2012
9993M7111114166ZW1S017959.0HALB99TrueFalseTrueFalseFALSEFalse0.00HALBO-S04-PO0TrueTrueFalseFalseFalseFalseFalse99257221ultramatrix breakdown comlx avoid ability jet handler telescopic work dci Drill bit7-19-2012
9994M7111114166ZW1S007959.0HALB99TrueFalseTrueFalseFALSEFalse1.83HALBF-S04-STATrueTrueFalseTrueFalseFalseFalse992550988nos zone cages articulate automated hyd over another stringent mainly rigs list simple excellence9-28-2012
9995M7111154512EWHG04590.0HALB99TrueFalseTrueFalseFALSETrue0.63HALBO-C06-000TrueTrueFalseTrueFalseFalseFalse1092523648increases chemical versatility regions using versatililty cab Motor or power source1-10-2012
9996M7111154512EWHG04590.0HALB99TrueFalseTrueFalseFALSETrue0.71HALBO-F02-000TrueTrueFalseTrueFalseFalseTrue1092582smallformat mmsec quickconnect feature hopper soft stopemaster reduce capable hour Blowout preventer (BOP) Annular type1-28-2012
9997NaNZW1S0158500.0HALB99FalseFalseTrueFalse?False0.01HALBO-F03-000TrueTrueFalseFalseFalseFalseFalse1663349686comstopemate job actual lesko any jet countries when ability rigs both jets download midsized Standpipe8-7-2012
9998NaNEWHG119501.0HALBO-C04-000FalseFalseTrueFalseFALSEFalse0.10HALBO-F02-000TrueTrueFalseFalseFalseFalseFalse1052521circuits stopemaster towing rotation featuring move loading operator lengths professional general combined remote terrain back ring Traveling block5-26-2012
9999NaNEWHG00.0HALBA-S22-TRWTrueFalseTrueFalseFALSEFalse1.00HALBO-C08-000TrueTrueFalseFalseFalseFalseFalse1144276wwwboartlongyearcomls times environmental applications minimum userfriendly western bar thrust design Traveling block6-16-2012