
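This is the second post in a blog series on handling object-oriented datasets with Python and PyTorch.

For Part 1 (raw data and the dataset), see here.

We defined the MyDataset class in Part 1; now let's instantiate a MyDataset object.

This iterable object is the interface to the raw data and plays a major role throughout training.

Part 2: Creating the Dataset Object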

■ In [9]:

mydataset = MyDataset(isValSet_bool=None, raw_data_path=raw_data_path, norm=False, resize=True, newsize=(64, 64))

Here are some examples of using this object:

■ In [10]:

# Examples of operations on the object.
# This calls the __getitem__ method and takes the label of the sample at index 6
mydataset[6][1]

■ Out [10]:

■ In [11]:

# This prints the docstring placed right after the class declaration
MyDataset.__doc__

■ Out [11]:

'Interface class to raw data, providing the total number of samples in the dataset and a preprocessed item'

■ In [12]:

# This calls the __len__ method
len(mydataset)

■ Out [12]:

49100

■ In [13]:

# This triggers the __str__ method
print(mydataset)

Path of raw data is ./raw_data/data_images/<raw samples>
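To make the link between each call and the method it triggers explicit, here is a minimal sketch. ToyDataset is a hypothetical stand-in written for this illustration, not the MyDataset class from Part 1, and its sample list and path are made up.

```python
class ToyDataset:
    """Hypothetical stand-in for MyDataset: an interface to a list of (item, label) pairs."""

    def __init__(self, samples, raw_data_path):
        self.samples = list(samples)
        self.raw_data_path = raw_data_path

    def __len__(self):
        # Called by len(dataset)
        return len(self.samples)

    def __getitem__(self, index):
        # Called by dataset[index]
        return self.samples[index]

    def __str__(self):
        # Called by print(dataset) and str(dataset)
        return "Path of raw data is " + self.raw_data_path


toy = ToyDataset([("img_a", 0), ("img_b", 1), ("img_c", 1)], "./raw_data/toy")

n_samples = len(toy)            # calls ToyDataset.__len__
label = toy[1][1]               # calls ToyDataset.__getitem__, then takes the label
description = str(toy)          # calls ToyDataset.__str__
docstring = ToyDataset.__doc__  # the string literal right after the class line
```

The class itself stays small; the built-ins len, print and the indexing operator do the dispatching.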

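The importance of iterable objects

During training, batches of samples are fed to the model. The iterable mydataset is the key to keeping the code high-level and lightweight.

Below are two examples of using the iterable object.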

Example 1:

We can directly get the tensor of the sample at index 3:

■ In [14]:

mydataset.__getitem__(3)[0].shape

■ Out [14]:

torch.Size([3, 64, 64])

This has the same effect as:

■ In [15]:

mydataset[3][0].shape

■ Out [15]:

torch.Size([3, 64, 64])
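Indexing is not the only thing __getitem__ enables. When a class defines no __iter__ method, a Python for loop falls back to calling __getitem__ with 0, 1, 2, ... until an IndexError is raised. The sketch below shows this with a hypothetical Squares class; whether MyDataset from Part 1 relies on this fallback or defines its own iterator is not shown here, so treat it as an illustration of the language mechanism only.

```python
class Squares:
    """Hypothetical sequence-like class with no __iter__ method."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        # Raising IndexError is what tells a for loop to stop.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index, "sq{}".format(index)


squares = Squares(4)

# The explicit method call and the indexing operator are equivalent.
same = squares.__getitem__(3) == squares[3]

# Iteration falls back to __getitem__, and each returned pair can be unpacked.
pairs = [(value, name) for value, name in squares]
```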

Example 2:

We can parse the images in the folder and remove the black-and-white ones:

■ In [ ]:

# Dataset access example: create a new labels file, leaving out black-and-white images
if os.path.exists(raw_data_path + '/' + "labels_new.txt"):
    os.remove(raw_data_path + '/' + "labels_new.txt")

with open(raw_data_path + '/' + "labels_new.txt", "a") as myfile:
    for item, info in mydataset:
        if item is not None:
            if item.shape[0] == 1:
                # Single-channel image: report it and leave it out of the new file
                # os.remove(raw_data_path + '/' + info.SampleName)
                print('C = {}; H = {}; W = {}; info = {}'.format(item.shape[0], item.shape[1], item.shape[2], info))
            else:
                # print(info.SampleName + ' ' + str(info.SampleLabel))
                # One "name label" pair per line, as getSampleInfoList expects
                myfile.write(info.SampleName + ' ' + str(info.SampleLabel) + '\n')
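The filtering pattern above can be tried without any image data. The following sketch uses hypothetical stand-ins: FakeItem only mimics the .shape attribute of a tensor, Info mimics the SampleName and SampleLabel fields, and the file goes to a temporary directory. It keeps only three-channel items, which is slightly stricter than the cell above.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-ins: FakeItem only mimics the .shape attribute of a tensor.
FakeItem = namedtuple("FakeItem", "shape")
Info = namedtuple("Info", "SampleName SampleLabel")

samples = [
    (FakeItem((3, 64, 64)), Info("img_0.JPEG", 0)),   # RGB: keep
    (FakeItem((1, 64, 64)), Info("img_1.JPEG", 1)),   # single channel: drop
    (None, Info("img_2.JPEG", 1)),                    # unreadable: skip
    (FakeItem((3, 64, 64)), Info("img_3.JPEG", 1)),   # RGB: keep
]

workdir = tempfile.mkdtemp()
labels_new = os.path.join(workdir, "labels_new.txt")

# Mode "w" truncates an existing file, so no separate os.remove is needed.
with open(labels_new, "w") as f:
    for item, info in samples:
        if item is None or item.shape[0] != 3:
            continue
        f.write("{} {}\n".format(info.SampleName, info.SampleLabel))

with open(labels_new) as f:
    kept = f.read().splitlines()
```

Opening the file in "w" mode rather than "a" is a design choice of this sketch: it makes the cell safe to re-run without first deleting the previous output.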

■ In [ ]:

# Find samples with an unexpected format
with open(raw_data_path + '/' + "labels.txt", "a") as myfile:
    for item, info in mydataset:
        if item is not None:
            if item.shape[0] != 3:
                # os.remove(raw_data_path + '/' + info.SampleName)
                print('C = {}; H = {}; W = {}; info = {}'.format(item.shape[0], item.shape[1], item.shape[2], info))

After modifying the labels file, be sure to refresh the cache:

■ In [ ]:

if os.path.exists(raw_data_path + '/' + "labels_new.txt"):
    # Keep the original labels file as a backup, then promote the new one
    os.rename(raw_data_path + '/' + "labels.txt", raw_data_path + '/' + "labels_orig.txt")
    os.rename(raw_data_path + '/' + "labels_new.txt", raw_data_path + '/' + "labels.txt")

# Re-running the decorated definition starts with an empty cache
@functools.lru_cache(1)
def getSampleInfoList(raw_data_path):
    sample_list = []
    with open(str(raw_data_path) + '/labels.txt', "r") as f:
        reader = csv.reader(f, delimiter=' ')
        for i, row in enumerate(reader):
            imgname = row[0]
            label = int(row[1])
            sample_list.append(DataInfoTuple(imgname, label))
    sample_list.sort(reverse=False, key=myFunc)
    return sample_list

del mydataset
mydataset = MyDataset(isValSet_bool=None, raw_data_path='../../raw_data/data_images', norm=False)
len(mydataset)
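Why the cache needs refreshing can be seen in isolation. A function wrapped in functools.lru_cache returns the stored result for arguments it has already seen, even if the file behind it has changed. Re-running the decorated definition, as above, creates a fresh cache; calling cache_clear() on the existing function is another option. The sketch below uses a hypothetical read_labels helper and a temporary file, not the getSampleInfoList function from this series.

```python
import functools
import os
import tempfile

workdir = tempfile.mkdtemp()
labels_path = os.path.join(workdir, "labels.txt")


@functools.lru_cache(1)
def read_labels(path):
    # Hypothetical helper: returns the non-empty lines of a labels file.
    with open(path) as f:
        return tuple(line.strip() for line in f if line.strip())


with open(labels_path, "w") as f:
    f.write("img_0.JPEG 0\nimg_1.JPEG 1\n")
first = read_labels(labels_path)

# Change the file on disk; the cached result does not notice.
with open(labels_path, "w") as f:
    f.write("img_0.JPEG 0\n")
stale = read_labels(labels_path)

# Dropping the cache forces the next call to read the file again.
read_labels.cache_clear()
fresh = read_labels(labels_path)
```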

You can read more about iterable datasets in PyTorch at the following link:

https://pytorch.org/docs/stable/data.html
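The page linked above also documents torch.utils.data.DataLoader, which is the usual way to turn a dataset with __len__ and __getitem__ into batches for training. The sketch below is an assumption about how such a dataset could be batched; RandomImages is a hypothetical dataset of random tensors, and the batch size is arbitrary.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomImages(Dataset):
    """Hypothetical map-style dataset: random 3x64x64 tensors with integer labels."""

    def __init__(self, n=10):
        generator = torch.Generator().manual_seed(0)
        self.images = torch.rand(n, 3, 64, 64, generator=generator)
        self.labels = torch.arange(n) % 2

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return self.images[index], self.labels[index]


# The loader calls __len__ and __getitem__ and collates samples into batches.
loader = DataLoader(RandomImages(), batch_size=4, shuffle=False)
batch_shapes = [tuple(images.shape) for images, labels in loader]
```

With 10 samples and a batch size of 4, the last batch holds the 2 remaining samples.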

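Normalization

The mean and standard deviation should be computed over all sample tensors.

If the dataset is small, you can try working on it directly in memory: torch.stack creates a single stack containing all sample tensors.

The iterable mydataset makes for concise, elegant code.

"view" keeps the three R, G and B channels and merges all remaining dimensions into one.

"mean" computes the average of each channel along dimension 1.

See the appendix for notes on how dim is used.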

■ In [16]:

# Stack all sample tensors along a new last dimension: shape (C, H, W, N)
imgs = torch.stack([img_t for img_t, _ in mydataset], dim=3)

■ In [17]:

# im_mean = imgs.view(3, -1).mean(dim=1).tolist()
im_mean = imgs.view(3, -1).mean(dim=1)
im_mean

■ Out [17]:

tensor([0.4735, 0.4502, 0.4002])

■ In [18]:

im_std = imgs.view(3, -1).std(dim=1).tolist()
im_std

■ Out [18]:

[0.28131285309791565, 0.27447444200515747, 0.2874436378479004]

■ In [19]:

normalize = transforms.Normalize(mean=[0.4735, 0.4502, 0.4002], std=[0.28131, 0.27447, 0.28744])

# free memory
del imgs
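The roles of dim in stack, view and mean are easier to follow on a stack small enough to check by hand. In this sketch the two 3x2x2 "images" are hypothetical: each channel is filled with a single constant so the expected statistics are obvious.

```python
import torch

# Two hypothetical 3x2x2 "images": each channel is filled with one constant value.
img_a = torch.tensor([0.0, 0.5, 1.0]).view(3, 1, 1).expand(3, 2, 2)
img_b = torch.tensor([1.0, 0.5, 0.0]).view(3, 1, 1).expand(3, 2, 2)

# Stacking along a new last dimension gives shape (C, H, W, N).
imgs = torch.stack([img_a, img_b], dim=3)

# Keep the channel dimension, flatten H, W and N into one, then reduce over it.
flat = imgs.view(3, -1)
mean = flat.mean(dim=1)
std = flat.std(dim=1)
```

Stacking with dim=3 keeps the channel dimension first, so view(3, -1) yields one row per channel. With the samples stacked along dim=0 instead, the shape would be (N, C, H, W) and view(3, -1) would mix values from different channels in the same row.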

Next, we build the dataset object again, this time with normalization enabled:

■ In [21]:

mydataset = MyDataset(isValSet_bool=None, raw_data_path=raw_data_path, norm=True, resize=True, newsize=(64, 64))

Because of normalization, the tensor values no longer all lie within the range 0..1, so they are clipped to that range when the image is displayed.

■ In [22]:

original = Image.open('../../raw_data/data_images/img_00009111.JPEG')
fig, axs = plt.subplots(1, 2, figsize=(10, 3))
axs[0].set_title('clipped tensor')
# imshow expects H x W x C, so move the channel dimension to the end
axs[0].imshow(mydataset[5][0].permute(1, 2, 0))
axs[1].set_title('original PIL image')
axs[1].imshow(original)
plt.show()

Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
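The clipping message appears because per-channel normalization, (x - mean) / std, moves values outside the 0..1 range. The sketch below reproduces that arithmetic on a hypothetical random image using the statistics computed earlier, and shows one possible way to undo the normalization before plotting. It is an illustration written for this explanation, not part of the MyDataset class.

```python
import torch

# Per-channel statistics from the cells above, shaped to broadcast over (C, H, W).
mean = torch.tensor([0.4735, 0.4502, 0.4002]).view(3, 1, 1)
std = torch.tensor([0.28131, 0.27447, 0.28744]).view(3, 1, 1)

# Hypothetical image: values in [0, 1], shape (C, H, W).
img = torch.rand(3, 4, 4, generator=torch.Generator().manual_seed(0))
img[:, 0, 0] = 0.0   # force one pure-black pixel
img[:, 0, 1] = 1.0   # and one pure-white pixel

normalized = (img - mean) / std        # same arithmetic as per-channel normalization
clipped = normalized.clamp(0.0, 1.0)   # what the plot effectively shows
restored = normalized * std + mean     # undo normalization before plotting instead
```

Plotting restored rather than normalized avoids the clipping and shows the original colors.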

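Preprocessing with torchvision.transforms

Now that we have created our own transform functions or objects (originally as an exercise to speed up the learning curve), I recommend using the Torch module torchvision.transforms:

"This module defines a set of composable function-like objects that can be passed as arguments to a dataset (such as torchvision.CIFAR10), and that perform transformations on the data after it has been loaded and before __getitem__ returns it."

The available transforms are listed below: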

■ In [23]:

from torchvision import transforms
dir(transforms)

■ Out [23]:

['CenterCrop',
 'ColorJitter',
 'Compose',
 'FiveCrop',
 'Grayscale',
 'Lambda',
 'LinearTransformation',
 'Normalize',
 'Pad',
 'RandomAffine',
 'RandomApply',
 'RandomChoice',
 'RandomCrop',
 'RandomErasing',
 'RandomGrayscale',
 'RandomHorizontalFlip',
 'RandomOrder',
 'RandomPerspective',
 'RandomResizedCrop',
 'RandomRotation',
 'RandomSizedCrop',
 'RandomVerticalFlip',
 'Resize',
 'Scale',
 'TenCrop',
 'ToPILImage',
 'ToTensor',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'functional',
 'transforms']

In this example, we use transforms to perform the following operations:

1) ToTensor - converts a PIL image to a tensor, with the output layout CxHxW

2) Normalize - normalizes the tensor

Editor: haq


Original title: Developer Share | Handling Object-Oriented Datasets with Python and PyTorch - 2: Creating the Dataset Object

Source: WeChat official account FPGA開發圈 (WeChat ID: FPGA-EETrend). Please credit the source when reposting.
