# Part 6: Learning

Psychologists define learning as the process of acquiring new and relatively enduring information or behaviors. There are generally three types of learning: classical conditioning (CC), operant conditioning (OC), and cognitive learning. These three guys, as well as their connections, are today's topic.

## Basic Learning Concepts

Learning is basically a process of association. Classical conditioning associates two stimuli<sup>1</sup>. All animals exhibit classical conditioning. It is crucial because it lets animals anticipate the immediate future and survive. Operant conditioning associates a behavior with its consequence, so that by repeating the act, animals can obtain rewards or avoid punishment. It is commonly seen in complex animals. Cognitive learning is an even more complex mental process that guides our behavior. Observational learning belongs to this category. It lets us learn from others' experiences.

## Classical Conditioning (CC)

Classical conditioning associates two stimuli<sup>1</sup>. It is the most basic form of learning. It grants organisms the ability to adapt to their environment. Animals can gain food, avoid dangers, locate mates and produce offspring thanks to it.

There are five terms that are prerequisites for understanding classical conditioning. Let us explain them in the context of Pavlov's<sup>2</sup> classic experiment, which involves food, a tone, and a dog.

A dog intrinsically has no interest in the tone of a bell. The tone can hardly elicit a meaningful response from the dog. So the tone here is called a neutral stimulus (NS).

$$\text{NS (tone)} \longrightarrow \text{no salivation}$$

Food makes the dog salivate. This is an instinct that does not require learning. The food is called an unconditioned stimulus (US). The response of salivation is called an unconditioned response (UR).

$$\text{US (food)} \longrightarrow \text{UR (salivation)}$$

The experimenter trains the dog by first sounding the tone and then delivering the food. The dog gradually learns to associate the tone with the arrival of food. This training process is called "conditioning".

$$\text{NS (tone)} + \text{US (food)} \longrightarrow \text{UR (salivation)}$$

After conditioning, the dog salivates when it hears the tone, whether or not food is presented. The tone is now called the conditioned stimulus (CS), and the salivation the conditioned response (CR).

$$\text{CS (tone)} \longrightarrow \text{CR (salivation)}$$

With these terms in mind, we can move on to five processes.

#### Five Processes

Acquisition is the initial stage of classical conditioning, during which the animal links an NS with a US. The time between presenting the NS and the US cannot be too long. The longer the gap, the weaker the link. Half a second usually works well. During acquisition, the volume of saliva the dog produces on successive tones keeps increasing, indicating that the dog is learning the association.

Extinction is the diminished response that occurs when the learned CS (the tone in our example) no longer signals an upcoming US (food). During extinction, each successive tone without food triggers less salivation, indicating that the dog somehow "forgets" the conditioning.

Spontaneous recovery is an interesting stage. After extinction, salivation no longer occurs and the volume drops to zero. However, when the CS is withheld for a while and then presented again (still without food), salivation reappears at a mild volume, as if the CS were still remembered. The response can even be stronger than it was during the late stage of the previous extinction!

Generalization is the tendency for stimuli similar to the CS, like a similar tone in our example, to trigger the CR.

Discrimination is the opposite of generalization. It is the ability to distinguish between a CS and other irrelevant stimuli.
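For readers who like to see curves, acquisition and extinction can be sketched with a toy model. The update rule below is a simplified Rescorla-Wagner-style rule that I am assuming for illustration; it is not part of Pavlov's account, and `run_trials`, `learning_rate`, and the trial counts are hypothetical choices. It also does not capture spontaneous recovery.

```python
def run_trials(strength, us_present, n_trials, learning_rate=0.3):
    """Return the CS-US association strength after each trial.

    strength: current association strength, between 0 and 1.
    us_present: True if the US (food) follows the CS (tone) on every trial.
    learning_rate: assumed step size; not a measured value.
    """
    # With food the association is pulled toward 1, without food toward 0.
    target = 1.0 if us_present else 0.0
    history = []
    for _ in range(n_trials):
        # Move the strength a fraction of the way toward the target.
        strength += learning_rate * (target - strength)
        history.append(strength)
    return history


# Acquisition: tone followed by food. Extinction: tone alone.
acquisition = run_trials(0.0, us_present=True, n_trials=10)
extinction = run_trials(acquisition[-1], us_present=False, n_trials=10)
print([round(s, 2) for s in acquisition])
print([round(s, 2) for s in extinction])
```

If we read the strength as a stand-in for saliva volume, the first list rises trial by trial and the second falls, which matches the qualitative description above.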

#### Importance

The importance of classical conditioning is twofold. First, it exists in every kind of animal, from earthworms to people. It is a way for all organisms to adapt to their environment. Second, the method Pavlov used to reveal this process shows that learning can be studied objectively and quantitatively. (In our example the saliva of a dog is measured in cubic centimeters.) It isolates the basic building blocks of complicated behaviors and studies them with objective laboratory procedures. This success offered a scientific model for the young discipline of psychology.

## Operant Conditioning (OC)

By classical conditioning, we can teach a dog to salivate at the sound of a tone. However, we cannot use it to teach an elephant to walk on its hind legs or a child to say please. Classical conditioning can only establish associations between stimuli that are directly relevant to survival and reproduction, and neither walking on hind legs nor saying "please" belongs to that category. To associate such behaviors with their consequences, we resort to operant conditioning.

Like classical conditioning, operant conditioning is a form of associative learning. The difference is as follows. Classical conditioning associates stimuli (a CS and the US it signals). It produces respondent behavior: automatic responses to a stimulus, such as salivating in response to food or a tone. Operant conditioning associates seemingly any action with its consequences, where reinforcers strengthen and punishers suppress the behavior. It produces operant behavior: actions that operate on the environment to produce rewarding or punishing stimuli.

Three aspects make up operant conditioning: shaping, reinforcers and punishers, and schedules.

#### Shaping

Shaping is a method to gradually guide an organism's actions toward the desired behavior. Using successive approximations, we reward responses that are ever closer to the final desired behavior, and ignore all other responses. (I have to say, the loss function in Deep Learning is the first thing that came to my mind when I read this.) Let us walk through shaping step by step, using the example of conditioning a hungry rat to press a bar.

Step one is watching, to reveal how the animal naturally behaves. Here, the behavior you observe is the rat moving around randomly in the Skinner<sup>3</sup> box. You decide to build on this existing behavior.

Step two is providing reinforcement or punishment. Here, you simply choose food as a reinforcer. Each time the rat approaches the bar, you give it a bit of food, so the behavior of approaching the bar is gradually strengthened. Note that you could also choose a shock as a punishment, so that each time the rat moves away from the bar, it gets a shock. To avoid being shocked, the rat's random wandering will gradually converge to staying around the bar.

Step three, if necessary, is to build another behavior that is closer to the desired behavior. Here, it would be pressing the bar. We can repeat the same method from Step two. When the rat occasionally presses the bar, the food appears, so that the pressing behavior is strengthened.
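Since shaping reminded me of optimizing a loss, here is a minimal sketch of steps one and two as a simulation. Everything in it is an assumption for illustration: the rat is a point on a line, `shape_toward_bar`, `p_toward`, the `0.05` increment, and the `0.95` cap are hypothetical, and real shaping is of course far richer than this.

```python
import random


def shape_toward_bar(bar=10, steps=200, seed=0):
    """Toy shaping: reward every move that brings the rat closer to the bar.

    Returns the final position and the learned probability of moving
    toward the bar.
    """
    rng = random.Random(seed)
    position = 0
    p_toward = 0.5  # Step one: the untrained rat moves at random.
    for _ in range(steps):
        toward = 1 if position < bar else -1
        step = toward if rng.random() < p_toward else -toward
        closer = abs(bar - (position + step)) < abs(bar - position)
        position += step
        if closer:
            # Step two: food is delivered, so approaching is strengthened.
            p_toward = min(0.95, p_toward + 0.05)
    return position, p_toward


final_position, final_p = shape_toward_bar()
print(final_position, final_p)
```

The rat starts out wandering, gets fed whenever it happens to move closer, and ends up hovering around the bar, which is the point where step three (rewarding only actual bar presses) would take over.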

Overall, by making rewards contingent on desired behaviors, we can shape complex behaviors in animals, and yes, in each of us, of course. For example, a child's whining annoys his father, who is reading a newspaper, so the father gives in to the whining and pays attention to the child. In this case, the child's whining behavior is reinforced, because he gets what he wants -- the attention of his father. On the other hand, the father's giving in is also reinforced, because he gets what he wants -- an end to the whining.

Another interesting fact about shaping is that it helps us understand what nonverbal organisms perceive. For example, to find out whether a dog can distinguish red from green, we may attempt to shape its behavior using these two colors. If the dog learns to respond to red but not to green, then we know it can perceive the difference. Researchers have even reinforced pigeons for pecking after seeing a human face, but not after seeing other images, and the pigeons' behavior shows that they can recognize human faces.

#### Reinforcers and Punishers

There are four types of reinforcers and punishers. Below we only list the types of reinforcers. Just keep in mind that they can be directly applied to punishers.

Primary reinforcers are innately reinforcing stimuli. They are unlearned. Getting food when hungry belongs to this type.

Conditioned reinforcers, also called secondary reinforcers, are stimuli that gain their reinforcing power through association with primary reinforcers. If a rat in a Skinner box<sup>4</sup> learns that a light reliably signals a food delivery, the rat will work to turn on the light. The light has become a conditioned reinforcer. Money, good grades: our lives are full of conditioned reinforcers.

Immediate reinforcers are stimuli that are presented immediately after the response. The snack that drops right after you put a coin into a vending machine is an immediate reinforcer.

Delayed reinforcers are stimuli that are presented some time after the response. The good grade in the final exam that you obtain by working hard on the coursework is a delayed reinforcer. Learning to control our impulses and delay gratification in order to achieve more valued rewards is a big step toward maturity.

#### Reinforcement and Punishment

Reinforcement strengthens a response. It comes in two forms, positive reinforcement and negative reinforcement.

Positive reinforcement strengthens a response by presenting a pleasant stimulus after that response. Negative reinforcement strengthens a response by reducing or removing an unpleasant stimulus after that response.

In the example where a father who was reading a newspaper gives in and pays attention to his whining son, the son's whining is positively reinforced, because he obtained his father's attention. The father's giving in is negatively reinforced, because it removed his son's whining -- an aversive stimulus. In short, they are shaping each other.

Punishment restrains a response. There are also positive punishment and negative punishment.

Positive punishment restrains a response by presenting an unpleasant (even terrible) stimulus after that response. Negative punishment restrains a response by removing a pleasant stimulus after that response.

Confusing, huh? Just remember, positive punishment gives you something you dislike, while negative punishment takes away something you like. For example, giving a traffic ticket for speeding is positive punishment. Revoking a library card for nonpayment of fines is negative punishment.
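One way I keep the four terms straight is as a two-by-two lookup: "positive/negative" says whether a stimulus is added or removed, and the kind of stimulus decides whether the response is strengthened or restrained. The sketch below only restates the definitions above; the function name `classify_consequence` and the string labels are hypothetical.

```python
def classify_consequence(stimulus_change, stimulus_kind):
    """Name the operant-conditioning term for a consequence.

    stimulus_change: "add" if a stimulus is presented after the response,
                     "remove" if one is taken away.
    stimulus_kind:   "pleasant" or "aversive".
    """
    table = {
        ("add", "pleasant"): "positive reinforcement",
        ("remove", "aversive"): "negative reinforcement",
        ("add", "aversive"): "positive punishment",
        ("remove", "pleasant"): "negative punishment",
    }
    return table[(stimulus_change, stimulus_kind)]


# The examples from the text:
print(classify_consequence("add", "pleasant"))     # son gets attention
print(classify_consequence("remove", "aversive"))  # father stops the whining
print(classify_consequence("add", "aversive"))     # traffic ticket
print(classify_consequence("remove", "pleasant"))  # revoked library card
```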

#### Schedules

Schedules determine how reinforcement or punishment is given. Different schedules affect behavior differently. The explanation below focuses on reinforcement, yet the same idea applies equally to punishment.

In terms of how often a response is reinforced, there are continuous reinforcement and partial reinforcement schedules. The former reinforces the response with $100\%$ probability. It is the best choice for mastering a behavior quickly, yet extinction also happens rapidly once the reinforcement stops. The latter reinforces with a probability $<100\%$, so that sometimes the response is rewarded and sometimes it is not. In this case, the reward is occasional and unpredictable, and learning is slower to appear, but it is also more resistant to extinction. (Is it like Dropout in Deep Learning?)

In terms of when a partial schedule delivers the reinforcement, there are fixed-ratio, variable-ratio, fixed-interval, and variable-interval schedules.

• Fixed-ratio schedules reinforce behavior after a set number of responses. For example, a child is rewarded each time he has obtained full marks three times.
• Variable-ratio schedules reinforce behavior after an unpredictable number of responses. A gambler is usually in this situation.
• Fixed-interval schedules reinforce the first response after a fixed time period. For example, we usually start to feel hungry around noon.
• Variable-interval schedules reinforce the first response after varying time intervals. A Ph.D. student is usually in this situation :(

Generally, a higher response rate is achieved if the reinforcement is linked to the number of responses (a ratio schedule) instead of to time (an interval schedule). An unpredictable schedule (a variable schedule) usually leads to more consistent responding compared to that by a predictable one (a fixed schedule).
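To make the four schedules concrete, here is a small sketch that treats each schedule as a function deciding whether a given response gets reinforced. The function names, the `time` argument, and the way randomness is drawn (a coin flip for variable-ratio, a uniform wait for variable-interval) are my own assumptions for illustration, not how any particular experiment implements them.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response."""
    count = 0

    def on_response(time):  # time is unused; kept for a uniform interface
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return on_response


def variable_ratio(mean_n, rng):
    """Reinforce each response with probability 1/mean_n."""
    def on_response(time):
        return rng.random() < 1.0 / mean_n

    return on_response


def fixed_interval(seconds):
    """Reinforce the first response after `seconds` since the last reward."""
    last_reward = 0.0

    def on_response(time):
        nonlocal last_reward
        if time - last_reward >= seconds:
            last_reward = time
            return True
        return False

    return on_response


def variable_interval(mean_seconds, rng):
    """Like fixed_interval, but the required wait is redrawn each time."""
    last_reward = 0.0
    wait = rng.uniform(0, 2 * mean_seconds)

    def on_response(time):
        nonlocal last_reward, wait
        if time - last_reward >= wait:
            last_reward = time
            wait = rng.uniform(0, 2 * mean_seconds)
            return True
        return False

    return on_response


# One response per second for 12 seconds under two of the schedules.
fr = fixed_ratio(3)
fi = fixed_interval(5)
print([t for t in range(1, 13) if fr(t)])  # rewarded on every third response
print([t for t in range(1, 13) if fi(t)])  # rewarded once 5 seconds have passed
```

Note that under the ratio schedules, responding faster earns rewards faster, while under the interval schedules it does not, which is one way to see why ratio schedules produce higher response rates.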

#### Importance

Skinner gave us a way to administer external consequences toward human betterment. Operant conditioning is applied in every corner of our daily lives, such as in homes, schools, workplaces, sports, and prisons. Yes, anywhere!

In schools, a teacher who takes each student's rate of learning into account and provides prompt positive reinforcement can benefit both slow and fast learners, compared to a teacher who lectures at a fixed pace. In sports, athletes' performance is shaped by first reinforcing small successes and then progressively increasing the challenge. In workplaces, a boss who knows how to specify achievable behaviors and promptly reinforce a job well done can increase productivity efficiently. At home, if parents understand that a pushy command like "Get ready for bed" can evoke protests or defiance and lead to mutual shaping (in which the children's whining/arguing and the parents' menacing yells/gestures are both reinforced), they may be able to avoid developing such a destructive parent-child relationship.

To each of us, operant conditioning can also help reinforce desired behaviors or extinguish undesired ones. Succinct tips are as follows.

1. State a realistic goal in measurable terms.
2. Decide how, when, and where you will work toward your goal.
3. Monitor how often you engage in your desired behavior.
4. Reinforce the desired behavior.

## Contrasting CC and OC

Let us abbreviate classical conditioning and operant conditioning as CC and OC, respectively.

Basically, CC associates events. OC associates behavior and consequences.

The response in CC is involuntary and automatic, whereas the response in OC is voluntary and operates on the environment.

Both CC and OC have acquisition, extinction, spontaneous recovery, generalization, and discrimination. But these processes play out differently in each. The differences are explained as follows.

Suppose that we have applied CC and OC on a dog. For CC, the dog exhibits CR when CS is presented. And for OC, the dog exhibits certain behavior in order to reproduce certain consequences.

• The acquisition for CC pairs NS with US, and for OC it pairs a behavior with a consequence.
• The extinction for CC means a decreased CR when the CS is repeatedly presented alone, and for OC it means a decrease in the behavior when the reinforcement stops.
• The spontaneous recovery means that a previously extinguished CR (for CC) or behavior (for OC) reappears after a rest period.
• The generalization for CC means responding to stimuli similar to the CS, and for OC it means that a behavior reinforced in one situation also occurs in similar situations.
• The discrimination for CC means the ability to distinguish between a CS and other stimuli that do not signal a US, and for OC it means the ability to distinguish which behaviors will be reinforced or punished and which will not.

## The Constraint of Conditioning

Biological, psychological, and social-cultural factors all influence learning. Of these three, biology constrains conditioning at the most fundamental level.

An animal's capacity for conditioning is constrained biologically. Each species' predispositions for enhancing survival determine which associations it can learn. Environments are not the whole story.

#### Limits on CC

Two critical questions are relevant to this topic.

Q1: Can a rat link a stimulus (CS) to the sickness (UR) triggered by the radiation (US)?

Q2: If yes, can the stimulus be just anything the rat can perceive? Say, can a taste, a sight, or a sound all be conditioned equally well?

To answer these questions, let us first recap several facts on CC.

1. CC can work only if the animal can perceive the stimuli to be associated.
2. To establish a CS, a short time lapse ($\sim 0.5\,\text{s}$) between presenting the NS and the US is usually the most effective.

A famous experiment conducted by John Garcia and his colleague answered these questions.

In this experiment, we have:

• Three candidate CSs: a taste, a sight, and a sound.
• One UR: the sickness.

The goals of the experiment correspond exactly to Q1 and Q2 above. Startling findings emerged. First, the rats associated the CS with the UR even when the sickness set in several hours after the CS was presented. This violates Fact 2: because the US took so long to take effect, the rats should not have been able to "recall" a CS presented that long ago. Second, the CS-to-UR association developed only for taste, not for the other candidates.

Both findings make adaptive sense. In order to survive, an animal has to eat. To identify a new food, it simply tastes it, and avoids it if sickened. This is called taste aversion. (That's why it's difficult to eradicate "bait-shy" rats by poisoning.) Sights and sounds are too complex or unreliable as cues about food for evolution to have favored linking them with sickness.

We can therefore answer Q1 and Q2 now. Yes, rats can link a CS to the UR triggered by radiation. (Rats cannot perceive the radiation itself; it is the sickness that is perceived and linked.) No, stimuli are not conditioned equally well. Those with higher adaptive priority win out.

The findings support Darwin's principle that natural selection favors traits that aid survival and reproduction. Humans did, do, and will readily learn taste aversions, as well as the associations to other bad feelings like anxiety and pain. Humans also tend to associate the color red (Valentine's hearts, red-light districts, and red lipstick) with sexuality just because female primates display red when nearing ovulation.

#### Limits on OC

OC has its own natural limits, much like CC. Species most easily learn and retain behaviors that reflect their biological predispositions.

Using food as a reinforcer, we can train a hamster to dig, because digging belongs to the hamster's natural food-searching behaviors. But it is much harder to use food as a reinforcer to shape face washing, which isn't biologically associated with food or hunger. In one experiment, pigs trained to pick up wooden coins and deposit them in a piggy bank began to drift back to their natural ways, dropping the coins on the ground and pushing them with their snouts. This is called instinctive drift. Animals tend to revert to their biologically predisposed patterns.

## Cognitive Learning

There is more to learning than classical conditioning and operant conditioning. Cognition, including thoughts, perceptions, and expectations, also influences the learning process.

In classical conditioning, thoughts count, not simply the CS-US association, especially for humans. When novel cartoon characters are presented to children along with ice cream (delicious) or Brussels sprouts (not so delicious), the children tend to like the characters associated with ice cream. Conditioned likes and dislikes are even stronger when people are aware of the learned associations. Awareness can also weaken conditioning. For example, people receiving therapy for alcohol use disorder are sometimes given nauseating drugs. The hope is that, according to the principle of CC, the patients will link alcohol use with nausea and stop drinking. However, if the patients know that the nausea is caused by the drug, the treatment can have limited effect.

In operant conditioning, evidence of cognitive processes has also come from studying rats in mazes. A rat exploring a maze without obvious rewards can develop a cognitive map, which is a mental representation of the maze. Later, when food is presented as a reward, the rat can find the food as quickly and efficiently as other rats that were reinforced with food all along. Similarly, children may learn from watching a parent but demonstrate the learning only years later, when needed. This is called latent learning, where the learning becomes apparent only when there is some incentive to demonstrate it.

In operant conditioning, the cognitive perspective has also shown the limits of rewards: rewarding people for an already enjoyable task can backfire. Excessive rewards can destroy intrinsic motivation<sup>5</sup>. To establish enduring interest in the desired behavior, people should focus on the intrinsic joys, rather than on extrinsic motivation<sup>6</sup>. Indeed, research suggests that people who concentrate on intrinsic motivation not only do better but eventually enjoy more extrinsic rewards. In addition, giving people choices can enhance their intrinsic motivation.

Example is better than precept.

In observational learning, higher animals can learn without direct experience, by watching and imitating others, thanks to cognition. The process of observing and imitating others is also called modeling. By watching a model, we experience vicarious reinforcement or vicarious punishment, both cognitively and physiologically. Cognitively, we learn to anticipate a behavior's consequences in situations similar to those we are observing. Physiologically, fMRI scans show that people observing someone winning a reward experience activation of their own reward system, as if they themselves were the winners. Chimpanzees observe and imitate all sorts of novel foraging and tool-use behaviors, which are then transmitted to descendants within their local culture.

As a result, observational learning has both benefits and drawbacks. A prosocial (positive and helpful) model can have prosocial effects. For example, employees can learn communication, sales, and other service skills through behavior modeling. They gain these skills faster by observing experienced workers perform them effectively. Parents can be powerful models for their children if their actions and words are consistent. However, an antisocial model usually brings negative and destructive behaviors. Abusive parents might have aggressive children. TV programs containing unpunished violence might elicit the violence-viewing effect, a phenomenon in which violent behavior is triggered by viewing media violence.

Behaviorism claims that psychology should be an objective science that studies behavior without considering mental processes. Though few researchers would ignore mental processes today, most agree that classical conditioning is a very basic form of learning shared by all organisms.

1. A stimulus is any event or situation that evokes a response. "Stimuli" is the plural of "stimulus". Examples of stimuli include bell rings, food, and illumination.

2. Ivan Pavlov (1849-1936) was one of the most influential behaviorists, who laid the foundation of classical conditioning. His name rings a bell.

3. B. F. Skinner (1904-1990) is one of the most influential and controversial figures of modern behaviorism.

4. The Skinner box is a tool for behaviorist research invented by Skinner.

5. Intrinsic motivation is the desire to perform a behavior effectively and for its own sake. Out of interest, for example.

6. Extrinsic motivation is the desire to perform a behavior for the sake of promised rewards or to avoid threatened punishment.