.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/plot_abnormality_scores.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_abnormality_scores.py:


==============
Anomaly scores
==============

This example computes anomaly (atypicality) scores from the z-scores of a
mass-univariate normative model. It then checks how well each score separates
simulated patients from controls.

.. GENERATED FROM PYTHON SOURCE LINES 10-20

.. code-block:: Python


    import numpy as np
    import matplotlib.pyplot as plt

    from sknormod import MassUnivNormodOLS
    from sknormod.datasets import make_gaussian
    from sknormod.plotting import plot_scatter_with_lines
    from sknormod.atypicality import multi_atypicality_scorer, multi_roc_auc_score

    from sklearn.model_selection import train_test_split

.. GENERATED FROM PYTHON SOURCE LINES 21-24

Generate synthetic data
-----------------------
We generate synthetic data for 200 variables in 2000 subjects, with ages evenly
spaced between 0 and 100. Age and age squared serve as the covariates.

.. GENERATED FROM PYTHON SOURCE LINES 24-35

.. code-block:: Python


    n_subj = 2000  # Number of subjects
    n_cols = 200  # Number of variables
    age = np.linspace(0, 100, n_subj)
    X = np.column_stack((age, age**2))  # Age and age squared as features

    # Generate synthetic Y data for each variable
    Y = np.column_stack([
        make_gaussian(n_subj, interpolate_mu=np.random.normal(loc=100, scale=2, size=3))[1]
        for _ in range(n_cols)
    ])

.. GENERATED FROM PYTHON SOURCE LINES 36-37

Split the data into training and test sets, then simulate a patient group in
the test set. The first half of the test subjects are controls (label 0) and
the second half are simulated patients (label 1). For each patient, one
randomly chosen variable among the first 50 is lowered by 3.

.. GENERATED FROM PYTHON SOURCE LINES 37-52
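The test-set perturbation in this example can also be written without a Python loop. In that perturbation, each simulated patient has one randomly chosen variable among the first 50 lowered by 3. The sketch below is illustrative only. ``inject_negative_shift`` is a hypothetical helper, not part of sknormod. It also assumes a NumPy ``Generator`` for randomness instead of the global ``np.random`` state used in the example.

.. code-block:: Python

    import numpy as np


    def inject_negative_shift(Y, labels, n_affected=50, shift=3.0, rng=None):
        """Hypothetical helper: return a copy of Y in which every row with label 1
        has one randomly chosen column among the first `n_affected` lowered by `shift`."""
        rng = np.random.default_rng() if rng is None else rng
        Y_out = np.array(Y, dtype=float, copy=True)
        rows = np.flatnonzero(np.asarray(labels) == 1)
        # One random column index per patient row, drawn from [0, n_affected)
        cols = rng.integers(0, n_affected, size=rows.size)
        # Pairing rows with cols touches exactly one entry per patient row
        Y_out[rows, cols] -= shift
        return Y_out, cols


    rng = np.random.default_rng(0)
    Y_demo = rng.normal(size=(6, 200))
    labels_demo = np.array([0, 0, 0, 1, 1, 1])
    Y_shifted, cols = inject_negative_shift(Y_demo, labels_demo, rng=rng)

    diff = Y_shifted - Y_demo
    print(np.count_nonzero(diff, axis=1))  # expected: [0 0 0 1 1 1]

A single fancy-indexing assignment replaces the per-row loop, because each patient row appears only once in ``rows``. Returning a copy keeps the unperturbed data available for comparison.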
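.. code-block:: Python


    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)

    n_test = X_test.shape[0]
    # First half of the test subjects are controls (0), the rest simulated patients (1).
    # Using n_test - n_test // 2 keeps the labels aligned with X_test when n_test is odd.
    diag_labels = np.array([0] * (n_test // 2) + [1] * (n_test - n_test // 2))

    num_cols = 50  # Deviations are injected only into the first 50 variables
    rows_to_modify = np.where(diag_labels == 1)[0]

    # For each simulated patient, subtract 3 from one randomly chosen variable
    for row in rows_to_modify:
        # Randomly select a column index in [0, num_cols)
        col_index = np.random.choice(num_cols)
        # Subtract 3 from the selected column in the current row
        Y_test[row, col_index] -= 3

.. GENERATED FROM PYTHON SOURCE LINES 53-55

Fit the model and compute z-scores and abnormality scores
---------------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 55-70

.. code-block:: Python


    # Fit the MassUnivNormodOLS model on the training set
    mass_normod = MassUnivNormodOLS()
    mass_normod.fit(X_train, Y_train)

    # Calculate z-scores for the test set
    Z_test = mass_normod.transform_to_z(X_test, Y_test)

    # Calculate abnormality scores
    atypicality_scores = multi_atypicality_scorer(Z_test)

    # Check how well they discriminate between groups
    auc_scores = multi_roc_auc_score(diag_labels, atypicality_scores)
    for key, value in auc_scores.items():
        print(f"{key}: {value:.2f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    mean_z: 0.56
    mean_abs_z: 0.57
    count<-3: 0.70
    count>3: 0.51
    count_abs>3: 0.68
    max_z: 0.54
    min_z: 0.74
    max_abs_z: 0.72
    trimmed_mean_top_z: 0.55

Each value is the ROC AUC of one score for separating the two groups; 0.5
corresponds to chance. In this run, the best separation comes from scores
driven by large negative deviations (``min_z``, ``count<-3``) or by extreme
absolute deviations (``max_abs_z``, ``count_abs>3``). This fits the
simulation, in which each patient has a single variable lowered. No random
seed is set, so the exact values change between runs.

.. GENERATED FROM PYTHON SOURCE LINES 71-73

Plotting
--------

.. GENERATED FROM PYTHON SOURCE LINES 73-101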
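A ROC AUC summarizes how much the two histograms of a score overlap. For a score where larger values mean more atypical, the AUC equals the probability that a randomly chosen label-1 subject scores higher than a randomly chosen label-0 subject, with ties counted as one half. The sketch below checks this identity against scikit-learn's ``roc_auc_score`` on made-up scores. ``pairwise_auc`` is a hypothetical helper, and nothing here assumes how ``multi_roc_auc_score`` is implemented.

.. code-block:: Python

    import numpy as np
    from sklearn.metrics import roc_auc_score


    def pairwise_auc(labels, scores):
        """Hypothetical helper: P(label-1 score > label-0 score), ties count 1/2."""
        labels = np.asarray(labels)
        scores = np.asarray(scores, dtype=float)
        pos = scores[labels == 1]
        neg = scores[labels == 0]
        # Compare every label-1 score with every label-0 score via broadcasting
        greater = (pos[:, None] > neg[None, :]).mean()
        ties = (pos[:, None] == neg[None, :]).mean()
        return greater + 0.5 * ties


    rng = np.random.default_rng(1)
    labels = np.array([0] * 300 + [1] * 300)
    # Made-up scores: label-1 subjects are shifted upward, so the histograms partly overlap
    scores = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(0.8, 1.0, 300)])
    # Rounding creates ties, which both computations count as one half
    scores = np.round(scores, 1)

    print(pairwise_auc(labels, scores), roc_auc_score(labels, scores))

Comparing all pairs costs O(n1 * n0) time and memory. That is fine for a few hundred subjects. For large samples, a rank-based computation (as in the Mann-Whitney U statistic) scales better.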
.. code-block:: Python


    n_scores = len(atypicality_scores)  # Number of score types to plot

    # Number of rows and columns for the subplot grid (at most 3 columns)
    n_rows = (n_scores + 2) // 3
    n_plot_cols = min(n_scores, 3)

    # Create a figure with subplots; squeeze=False always returns a 2D array of Axes
    fig, axes = plt.subplots(n_rows, n_plot_cols, figsize=(12, 4 * n_rows),
                             tight_layout=True, squeeze=False)

    # Flatten the axes array so each subplot can be indexed with a single integer
    axes = axes.flatten()

    # Plot overlaid histograms of each score type for the two groups
    for i, (score_name, scores) in enumerate(atypicality_scores.items()):
        ax = axes[i]
        # Share the value range (and hence the bin edges) between the two groups
        min_value = np.min(scores)
        max_value = np.max(scores)
        ax.hist(scores[diag_labels == 0], range=(min_value, max_value), color="blue", alpha=0.5, label='Label 0')
        ax.hist(scores[diag_labels == 1], range=(min_value, max_value), color="orange", alpha=0.5, label='Label 1')
        ax.set_title(f'Histogram of {score_name}')
        ax.set_xlabel('Score')
        ax.set_ylabel('Frequency')
        ax.legend()

    # Hide empty subplots if there are fewer scores than subplots
    for ax in axes[n_scores:]:
        ax.axis('off')

    # Show the plot
    plt.show()


.. image-sg:: /auto_examples/images/sphx_glr_plot_abnormality_scores_001.png
   :alt: Histogram of mean_z, Histogram of mean_abs_z, Histogram of count<-3, Histogram of count>3, Histogram of count_abs>3, Histogram of max_z, Histogram of min_z, Histogram of max_abs_z, Histogram of trimmed_mean_top_z
   :srcset: /auto_examples/images/sphx_glr_plot_abnormality_scores_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 0.470 seconds)


.. _sphx_glr_download_auto_examples_plot_abnormality_scores.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_abnormality_scores.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_abnormality_scores.py `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_