
Convert an array of indices to a one-hot encoded array in NumPy

factcode 2022. 9. 23. 00:03

Given a 1D array of indices:

a = array([1, 0, 3])

I want to one-hot encode it as a 2D array:

b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Create a zeroed array b with enough columns, i.e. a.max() + 1.
Then, for each row i, set the a[i]th column to 1.

>>> a = np.array([1, 0, 3])
>>> b = np.zeros((a.size, a.max() + 1))
>>> b[np.arange(a.size), a] = 1

>>> b
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])
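The two steps above (allocate zeros, then set one entry per row using paired row/column fancy indices) can be wrapped in a small helper. This is a sketch; the name `one_hot_1d` and its `n_classes`/`dtype` parameters are hypothetical additions, not part of the original answer.

```python
import numpy as np

def one_hot_1d(a, n_classes=None, dtype=np.int64):
    """Hypothetical helper: one-hot encode a 1-D array of non-negative ints."""
    a = np.asarray(a)
    if n_classes is None:
        n_classes = a.max() + 1  # infer the width from the data
    b = np.zeros((a.size, n_classes), dtype=dtype)
    # Row i is paired with column a[i]: (0, a[0]), (1, a[1]), ...
    b[np.arange(a.size), a] = 1
    return b

print(one_hot_1d([1, 0, 3]))
# [[0 1 0 0]
#  [1 0 0 0]
#  [0 0 0 1]]
```

Passing `n_classes` explicitly is useful when a batch does not contain the highest label.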

Alternatively, index into an identity matrix built with np.eye:

>>> values = [1, 0, 3]
>>> n_values = np.max(values) + 1
>>> np.eye(n_values)[values]
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.]])

If you are using Keras, there is a built-in utility for this:

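from keras.utils import to_categorical  # older Keras releases: keras.utils.np_utils

# int_labels: integer class labels in the range [0, num_classes)
categorical_labels = to_categorical(int_labels, num_classes=3)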

It does essentially the same thing as @YXD's answer (see the source code).

Here is what I find useful:

def one_hot(a, num_classes):
  return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

Here num_classes is the number of classes you have. So if a is a vector of shape (10000,), this function converts it to shape (10000, C). Note that a is zero-indexed, i.e. one_hot(np.array([0, 1]), 2) will give [[1, 0], [0, 1]].
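One caveat with the `np.squeeze` call: it removes every length-1 axis, so a single label comes back as a 1-D vector of length C rather than a (1, C) matrix. A quick check, with the function copied from above so the snippet runs on its own:

```python
import numpy as np

def one_hot(a, num_classes):
    return np.squeeze(np.eye(num_classes)[a.reshape(-1)])

batch = one_hot(np.array([0, 1]), 2)
single = one_hot(np.array([2]), 3)
print(batch.shape)   # (2, 2)
print(single.shape)  # (3,)  (the row axis was squeezed away)

# Without np.squeeze the (N, C) shape is kept even for N == 1
print(np.eye(3)[np.array([2]).reshape(-1)].shape)  # (1, 3)
```

If downstream code always expects two dimensions, dropping `np.squeeze` is the simpler choice.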

That is exactly what you want.

PS: The source is the Sequence Models course from deeplearning.ai.

You can also use NumPy's eye function:

numpy.eye(number of classes)[vector containing the labels]
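A concrete version of the line above. Inferring the number of classes from the data (`max() + 1`) gives a narrower matrix whenever a batch does not contain the highest class, so passing the class count explicitly is safer when it is known. The variable names are illustrative.

```python
import numpy as np

n_classes = 4                 # total number of classes, known in advance
labels = np.array([1, 0, 1])  # this batch contains no label 2 or 3

fixed = np.eye(n_classes)[labels]             # always 4 columns
inferred = np.eye(labels.max() + 1)[labels]   # only 2 columns here
print(fixed.shape, inferred.shape)  # (3, 4) (3, 2)
```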

You can use sklearn.preprocessing.LabelBinarizer:

Example:

import sklearn.preprocessing
a = [1,0,3]
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a)+1))
b = label_binarizer.transform(a)
print('{0}'.format(b))

Output:

[[0 1 0 0]
 [1 0 0 0]
 [0 0 0 1]]

In particular, you can initialize sklearn.preprocessing.LabelBinarizer() so that the output of transform is sparse.
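For example, with `sparse_output=True` the result of `transform` is a SciPy sparse matrix instead of a dense array, which can save memory when there are many classes. A sketch with the same data as above:

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import LabelBinarizer

a = [1, 0, 3]
label_binarizer = LabelBinarizer(sparse_output=True)
label_binarizer.fit(np.arange(max(a) + 1))  # classes 0..3
b = label_binarizer.transform(a)

print(sparse.issparse(b))  # True
print(b.toarray())
# [[0 1 0 0]
#  [1 0 0 0]
#  [0 0 0 1]]
```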

For one-hot encoding:

   one_hot_encode=pandas.get_dummies(array)
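A small sketch of what this gives for the example array. Unlike the NumPy answers, `get_dummies` creates one column per value that actually occurs, so label 2 gets no column. Wrapping the data in a `pandas.Categorical` with explicit categories (an addition here, not part of the original answer) forces the full width. `dtype=int` is passed because recent pandas versions return boolean columns by default.

```python
import numpy as np
import pandas

array = np.array([1, 0, 3])

observed = pandas.get_dummies(array, dtype=int)
print(observed.shape)  # (3, 3): columns 0, 1, 3 only

full = pandas.get_dummies(pandas.Categorical(array, categories=range(4)), dtype=int)
print(full.to_numpy())
# [[0 1 0 0]
#  [1 0 0 0]
#  [0 0 0 1]]
```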

Enjoy coding!

You can use the following code to convert to one-hot vectors.

Let x be a normal class vector with a single column, holding classes from 0 up to some number:

import numpy as np
np.eye(x.max()+1)[x]

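If 0 is not a class (labels start at 1), drop the +1 and shift the labels down by one as well, i.e. np.eye(x.max())[x - 1]; dropping only the +1 would make the largest label index out of bounds.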
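For instance, with toy labels numbered 1 to 3 instead of 0 to 2, shifting the labels down by one keeps the matrix at exactly 3 columns:

```python
import numpy as np

x = np.array([3, 1, 2])           # classes are 1..3, there is no class 0
one_hot = np.eye(x.max())[x - 1]  # column j now stands for class j + 1
print(one_hot)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```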

Here is a function that converts a 1-D vector to a 2-D array of one-hot vectors:

#!/usr/bin/env python
import numpy as np

def convertToOneHot(vector, num_classes=None):
    """
    Converts an input 1-D vector of integers into an output
    2-D array of one-hot vectors, where an i'th input value
    of j will set a '1' in the i'th row, j'th column of the
    output array.

    Example:
        v = np.array((1, 0, 4))
        one_hot_v = convertToOneHot(v)
        print(one_hot_v)

        [[0 1 0 0 0]
         [1 0 0 0 0]
         [0 0 0 0 1]]
    """

    assert isinstance(vector, np.ndarray)
    assert len(vector) > 0

    if num_classes is None:
        num_classes = np.max(vector)+1
    else:
        assert num_classes > 0
        # every label must be a valid column index
        assert num_classes > np.max(vector)

    result = np.zeros(shape=(len(vector), num_classes))
    result[np.arange(len(vector)), vector] = 1
    return result.astype(int)

Here is some example usage:

>>> a = np.array([1, 0, 3])

>>> convertToOneHot(a)
array([[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1]])

>>> convertToOneHot(a, num_classes=10)
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])

I think the short answer is no. For a more general case in n dimensions, I came up with this:

# For 2-dimensional data, 4 values
a = np.array([[0, 1, 2], [3, 2, 1]])
z = np.zeros(list(a.shape) + [4])
# Index with a tuple: one index grid per axis of a, plus a itself for the last axis
z[tuple(np.indices(z.shape[:-1])) + (a,)] = 1

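I wonder whether there is a better solution; I don't like having to build those index sequences in the last two lines. Anyway, I did some measurements with timeit, and the numpy-based (indices/arange) and the iterative versions seem to perform about the same.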
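Current NumPy releases no longer treat a plain list of index arrays as a multi-dimensional index (it is read as a single array index instead), so the index has to be a tuple. Below is a sketch of the same `np.indices` idea as a function for any number of dimensions; the name `one_hot_nd` and its `n` parameter are illustrative, and the result is checked against `np.eye` indexing.

```python
import numpy as np

def one_hot_nd(a, n):
    """Illustrative helper: append a one-hot axis of length n to integer array a."""
    z = np.zeros(a.shape + (n,))
    # np.indices gives one index grid per axis of a; the last index is a itself
    z[tuple(np.indices(a.shape)) + (a,)] = 1
    return z

a = np.array([[0, 1, 2], [3, 2, 1]])
z = one_hot_nd(a, 4)
print(z.shape)                          # (2, 3, 4)
print(np.array_equal(z, np.eye(4)[a]))  # True
```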

To elaborate on the excellent answer from K3-rnc, here is a more generic version:

def onehottify(x, n=None, dtype=float):
    """1-hot encode x with the max value n (computed from data if n is None)."""
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    return np.eye(n, dtype=dtype)[x]

Here is also a quick benchmark of this method against a method from the currently accepted answer by YXD (slightly changed so that it offers the same API, except that the latter works only with 1D ndarrays):

def onehottify_only_1d(x, n=None, dtype=float):
    x = np.asarray(x)
    n = np.max(x) + 1 if n is None else n
    b = np.zeros((len(x), n), dtype=dtype)
    b[np.arange(len(x)), x] = 1
    return b

The latter method is about 35% faster (MacBook Pro 13, 2015), but the former is more general:

>>> import numpy as np
>>> np.random.seed(42)
>>> a = np.random.randint(0, 9, size=(10_000,))
>>> a
array([6, 3, 7, ..., 5, 8, 6])
>>> %timeit onehottify(a, 10)
188 µs ± 5.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit onehottify_only_1d(a, 10)
139 µs ± 2.78 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

A variant that returns the one-hot vectors either as columns (col_wise=True) or as rows:

def one_hot(n, class_num, col_wise=True):
  a = np.eye(class_num)[n.reshape(-1)]
  return a.T if col_wise else a

# Column for different hot
print(one_hot(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 8, 7]), 10))
# Row for different hot
print(one_hot(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 8, 7]), 10, col_wise=False))

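I recently ran into the same kind of problem and found that the solutions above are only satisfactory if your numbers form a compact range such as 0 to n-1. For example, if you want to one-hot encode the following list: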

all_good_list = [0,1,2,3,4]

then go ahead: the solutions posted above already handle it. But what if you consider this data:

problematic_list = [0,23,12,89,10]

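If you apply the methods above to this list, you end up with 90 one-hot columns, because all of them use something like n = np.max(a)+1. I found a more generic solution that worked for me and want to share it: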

import numpy as np
from sklearn.preprocessing import LabelBinarizer
sklb = LabelBinarizer()
a = np.asarray([1,2,44,3,2])
n = np.unique(a)
sklb.fit(n)
b = sklb.transform(a)

I hope this helps anyone who ran into the same limitation with the solutions above.
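If you want the same compact, one-column-per-distinct-value encoding while staying in NumPy, `np.unique` with `return_inverse=True` gives the mapping: it returns the sorted distinct values plus each element's position among them, which can then index an identity matrix. This is an alternative to the LabelBinarizer answer, not a description of how LabelBinarizer works internally.

```python
import numpy as np

a = np.asarray([1, 2, 44, 3, 2])
classes, inverse = np.unique(a, return_inverse=True)
b = np.eye(len(classes), dtype=int)[inverse.ravel()]

print(classes)  # [ 1  2  3 44]
print(b)
# [[1 0 0 0]
#  [0 1 0 0]
#  [0 0 0 1]
#  [0 0 1 0]
#  [0 1 0 0]]
```

Keep `classes` around if you need to map columns back to the original labels.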

This kind of encoding is usually applied to NumPy arrays. If you are using a NumPy array like this:

a = np.array([1,0,3])

then there is a very simple way to convert it to a one-hot encoding:

out = (np.arange(4) == a[:,None]).astype(np.float32)

That's it.
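The comparison works by broadcasting: `a[:, None]` has shape (3, 1) and `np.arange(4)` has shape (4,), so `==` produces a (3, 4) boolean table whose True entries mark each label's column. Using `a[..., None]` instead (a small generalization, not from the original answer) applies the same trick to arrays of any shape; here the class count is taken from the data.

```python
import numpy as np

a = np.array([[1, 0], [3, 2]])
n_classes = a.max() + 1  # 4 here; pass the real class count if known

# The trailing length-1 axis broadcasts against the class axis
out = (a[..., None] == np.arange(n_classes)).astype(np.float32)
print(out.shape)  # (2, 2, 4)
```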

Let p be a 2D array.
We want to know which value is the highest in each row, set it to 1, and set every other value to 0.

A clean and easy solution:

max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)
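The same result can be obtained by indexing an identity matrix with the row-wise argmax. In both versions ties go to the first maximal column, because that is what `np.argmax` returns. A quick check with made-up probabilities:

```python
import numpy as np

p = np.array([[0.1, 0.7, 0.2],
              [0.5, 0.5, 0.0]])  # the second row has a tie

# put_along_axis version from above
max_elements_i = np.expand_dims(np.argmax(p, axis=1), axis=1)
one_hot = np.zeros(p.shape)
np.put_along_axis(one_hot, max_elements_i, 1, axis=1)

# identity-matrix version
alt = np.eye(p.shape[1])[np.argmax(p, axis=1)]

print(one_hot)
# [[0. 1. 0.]
#  [1. 0. 0.]]
print(np.array_equal(one_hot, alt))  # True
```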

If you are using TensorFlow, there is tf.one_hot():

import tensorflow as tf
import numpy as np

a = np.array([1, 0, 3])
depth = 4
b = tf.one_hot(a, depth)
# <tf.Tensor: shape=(3, 4), dtype=float32, numpy=
# array([[0., 1., 0., 0.],
#        [1., 0., 0., 0.],
#        [0., 0., 0., 1.]], dtype=float32)>
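Note that `tf.one_hot` produces an all-zero row for any index outside `[0, depth)` (for example index 3 with depth=3, or a negative index) instead of raising an error. If you want that behavior in plain NumPy, a broadcasting comparison gives it; this is an equivalent sketch with a hypothetical function name, not TensorFlow's implementation, and it ignores the `on_value`/`off_value` options.

```python
import numpy as np

def one_hot_like_tf(indices, depth):
    """Hypothetical NumPy sketch: out-of-range indices yield all-zero rows."""
    indices = np.asarray(indices)
    return (indices[..., None] == np.arange(depth)).astype(np.float32)

print(one_hot_like_tf([1, 0, 3], depth=3))
# [[0. 1. 0.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```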

The easiest solution combines np.take and np.eye:

def one_hot(x, depth: int):
  return np.take(np.eye(depth), x, axis=0)

This works for x of any shape.
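For example, with a 2-D batch of labels the class axis is appended at the end. `np.take` also has a `mode` argument; its default, `'raise'`, reports labels that are out of range instead of silently wrapping or clipping them:

```python
import numpy as np

def one_hot(x, depth: int):
    return np.take(np.eye(depth), x, axis=0)

x = np.array([[0, 2, 1], [1, 1, 3]])
print(one_hot(x, 4).shape)  # (2, 3, 4)

try:
    one_hot(np.array([0, 5]), 4)  # label 5 does not fit in depth 4
except IndexError as err:
    print("out of range:", err)
```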

Here is an example function I wrote, based on the answers above and my own use case:

def label_vector_to_one_hot_vector(vector, one_hot_size=10):
    """
    Use to convert a column vector to a 'one-hot' matrix

    Example:
        vector: [[2], [0], [1]]
        one_hot_size: 3
        returns:
            [[ 0.,  0.,  1.],
             [ 1.,  0.,  0.],
             [ 0.,  1.,  0.]]

    Parameters:
        vector (np.array): of size (n, 1) to be converted
        one_hot_size (int) optional: size of 'one-hot' row vector

    Returns:
        np.array size (vector.size, one_hot_size): converted to a 'one-hot' matrix
    """
    squeezed_vector = np.squeeze(vector, axis=-1)

    one_hot = np.zeros((squeezed_vector.size, one_hot_size))

    one_hot[np.arange(squeezed_vector.size), squeezed_vector] = 1

    return one_hot

label_vector_to_one_hot_vector(vector=[[2], [0], [1]], one_hot_size=3)

I'm adding a simple function that uses only NumPy operations:

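def probs_to_onehot(output_probabilities):
    # Index of the most probable class in each row
    argmax_indices_array = np.argmax(output_probabilities, axis=1)
    # Width = number of classes (columns), not the number of distinct argmax values
    n_classes = np.shape(output_probabilities)[1]
    onehot_output_array = np.eye(n_classes, dtype=int)[argmax_indices_array]
    return onehot_output_array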

It takes a probability matrix as input, e.g.:

[[0.03038822 0.65810204 0.16549407 0.3797123 ] ...[0.02771272 0.2760752 0.3280924 0.33458805]]

and it will return:

[[0 1 0 0] ...[0 0 0 1]]

Here is a standalone solution that does not depend on the number of dimensions.

It converts any N-dimensional array arr of nonnegative integers into a one-hot (N+1)-dimensional array one_hot, where one_hot[i_1,...,i_N,c] = 1 means arr[i_1,...,i_N] = c. You can recover the input via np.argmax(one_hot, -1).

def expand_integer_grid(arr, n_classes):
    """

    :param arr: N dim array of size i_1, ..., i_N
    :param n_classes: C
    :returns: one-hot N+1 dim array of size i_1, ..., i_N, C
    :rtype: ndarray

    """
    one_hot = np.zeros(arr.shape + (n_classes,))
    axes_ranges = [range(arr.shape[i]) for i in range(arr.ndim)]
    flat_grids = [_.ravel() for _ in np.meshgrid(*axes_ranges, indexing='ij')]
    one_hot[tuple(flat_grids + [arr.ravel()])] = 1
    assert((one_hot.sum(-1) == 1).all())
    assert(np.allclose(np.argmax(one_hot, -1), arr))
    return one_hot
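The round-trip property stated above (`np.argmax(one_hot, -1)` recovers `arr`) holds for any one-hot encoding along the last axis. A quick check on random test data, using `np.eye` indexing, which produces the same (N+1)-dimensional layout:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 5
arr = rng.integers(0, n_classes, size=(2, 3, 4))  # N = 3 dimensions

one_hot = np.eye(n_classes)[arr]  # shape (2, 3, 4, 5)
print(one_hot.shape)
print((one_hot.sum(-1) == 1).all())                 # True: exactly one 1 per position
print(np.array_equal(np.argmax(one_hot, -1), arr))  # True: input recovered
```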

Use the following code. It works best.

import numpy as np

def one_hot_encode(x, num_classes=10):
    """
    argument
        - x: a list of labels
        - num_classes: number of classes (defaults to 10)
    return
        - one hot encoding matrix (number of labels, number of classes)
    """
    encoded = np.zeros((len(x), num_classes))

    for idx, val in enumerate(x):
        encoded[idx][val] = 1

    return encoded

I found it here. P.S. You don't need to follow the link.

Using a Neuraxle pipeline step:

Set up your example

import numpy as np
a = np.array([1,0,3])
b = np.array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])

Do the actual conversion

from neuraxle.steps.numpy import OneHotEncoder
encoder = OneHotEncoder(nb_columns=4)
b_pred = encoder.transform(a)

Assert that it works

assert (b_pred == b).all()

Link to documentation: neuraxle.steps.numpy.OneHotEncoder

Source URL: https://stackoverflow.com/questions/29831489/convert-array-of-indices-to-one-hot-encoded-array-in-numpy
